From b89e93efed837cb6bdf2209967b3bd47ae7c101e Mon Sep 17 00:00:00 2001
From: SKocer
Date: Sat, 2 Sep 2023 04:00:23 +0700
Subject: [PATCH] Add model 2023-09-01-finpipe_deid_en

---
 .../SKocer/2023-09-01-finpipe_deid_en.md | 156 ++++++++++++++++++
 1 file changed, 156 insertions(+)
 create mode 100644 docs/_posts/SKocer/2023-09-01-finpipe_deid_en.md

diff --git a/docs/_posts/SKocer/2023-09-01-finpipe_deid_en.md b/docs/_posts/SKocer/2023-09-01-finpipe_deid_en.md
new file mode 100644
index 0000000000..f81826229b
--- /dev/null
+++ b/docs/_posts/SKocer/2023-09-01-finpipe_deid_en.md
@@ -0,0 +1,156 @@
+---
+layout: model
+title: Financial Deidentification Pipeline
+author: John Snow Labs
+name: finpipe_deid
+date: 2023-09-01
+tags: [licensed, en, finance, deid, deidentification, anonymization]
+task: Pipeline Finance
+language: en
+edition: Finance NLP 1.0.0
+spark_version: 3.2
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This pretrained pipeline deidentifies legal and financial documents to help them comply with data privacy regulations such as GDPR and CCPA. Because the models in this pipeline are statistical and can miss entities, use it within a human-in-the-loop process when 100% accuracy is required.
+
+The pipeline supports both masking and obfuscation of the following entities:
+`ALIAS`, `EMAIL`, `PHONE`, `PROFESSION`, `ORG`, `DATE`, `PERSON`, `ADDRESS`, `STREET`, `CITY`, `STATE`, `ZIP`, `COUNTRY`, `TITLE_CLASS`, `TICKER`, `STOCK_EXCHANGE`, `CFN`, `IRS`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finpipe_deid_en_1.0.0_3.2_1693602013381.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finpipe_deid_en_1.0.0_3.2_1693602013381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+deid_pipeline = PretrainedPipeline("finpipe_deid", "en", "finance/models")
+
+result = deid_pipeline.annotate("""CARGILL, INCORPORATED
+
+By: Pirkko Suominen
+
+
+
+Name: Pirkko Suominen Title: Director, Bio Technology Development Center, Date: 10/19/2011
+
+BIOAMBER, SAS
+
+By: Jean-François Huc
+
+
+
+Name: Jean-François Huc Title: President Date: October 15, 2011
+
+email : jeanfran@gmail.com
+phone : 18087339090 """)
+
+```
+
+
+
+## Results
+
+```bash
+Masked with entity labels
+------------------------------
+,
+By:
+Name: : , Date:
+,
+By:
+Name: : Date:
+
+email :
+phone :
+
+Masked with chars
+------------------------------
+[*****], [**********]
+By: [*************]
+Name: [*******************]: [**********************************] Center, Date: [********]
+[******], [*]
+By: [***************]
+Name: [**********************]: [*******]Date: [**************]
+
+email : [****************]
+phone : [********]
+
+Masked with fixed length chars
+------------------------------
+****, ****
+By: ****
+Name: ****: ****, Date: ****
+****, ****
+By: ****
+Name: ****: ****Date: ****
+
+email : ****
+phone : ****
+
+Obfuscated
+------------------------------
+MGT Trust Company, LLC., Clarus llc.
+By: Benjamin Dean
+Name: John Snow Labs Inc: Sales Manager, Date: 03/08/2025
+Clarus llc., SESA CO.
+By: JAMES TURNER
+Name: MGT Trust Company, LLC.: Business ManagerDate: 11/7/2016
+
+email : Tyrus@google.com
+phone : 78 834 854
+
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|finpipe_deid|
+|Type:|pipeline|
+|Compatibility:|Finance NLP 1.0.0+|
+|License:|Licensed|
+|Edition:|Official|
+|Language:|en|
+|Size:|472.3 MB|
+
+## Included Models
+
+- DocumentAssembler
+- SentenceDetector
+- TokenizerModel
+- BertEmbeddings
+- FinanceNerModel
+- NerConverterInternalModel
+- FinanceNerModel
+- NerConverterInternalModel
+- FinanceNerModel
+- NerConverterInternalModel
+- FinanceNerModel
+- NerConverterInternalModel
+- ContextualParserModel
+- ContextualParserModel
+- ContextualParserModel
+- ContextualParserModel
+- ContextualParserModel
+- ChunkMergeModel
+- DeIdentificationModel
+- DeIdentificationModel
+- DeIdentificationModel
+- DeIdentificationModel
\ No newline at end of file