From 2dc1dcc96c8f2c0605293fe1ac17343f5fad716c Mon Sep 17 00:00:00 2001
From: David Cecchini
Date: Fri, 1 Sep 2023 18:11:00 -0300
Subject: [PATCH] Models hub finance (#594)

* Add model 2023-08-03-finner_bert_subpoenas_sm_en (#493)

Co-authored-by: gadde5300

* Delete subpoenas ner finance

* Add model 2023-08-30-finpipe_deid_en (#566)

Co-authored-by: Meryem1425

* Add model 2023-08-30-finpipe_deid_en (#570)

Co-authored-by: SKocer

* Add model 2023-08-30-finpipe_deid_en (#571)

Co-authored-by: SKocer

* Delete 2023-08-30-finpipe_deid_en.md

* Add model 2023-08-30-finpipe_deid_en (#572)

Co-authored-by: gokhanturer

* Add model 2023-08-30-finpipe_deid_en (#574)

Co-authored-by: SKocer

* Add model 2023-09-01-finpipe_deid_en (#586)

Co-authored-by: Meryem1425

* Add model 2023-09-01-finpipe_deid_en (#589)

Co-authored-by: SKocer

* Add model 2023-09-01-finpipe_deid_en (#593)

Co-authored-by: gokhanturer

---------

Co-authored-by: jsl-models <74001263+jsl-models@users.noreply.github.com>
Co-authored-by: gadde5300
Co-authored-by: Meryem1425
Co-authored-by: SKocer
Co-authored-by: Merve Ertas Uslu <67653613+Mary-Sci@users.noreply.github.com>
Co-authored-by: gokhanturer
---
 .../gokhanturer/2023-09-01-finpipe_deid_en.md | 156 ++++++++++++++++++
 1 file changed, 156 insertions(+)
 create mode 100644 docs/_posts/gokhanturer/2023-09-01-finpipe_deid_en.md

diff --git a/docs/_posts/gokhanturer/2023-09-01-finpipe_deid_en.md b/docs/_posts/gokhanturer/2023-09-01-finpipe_deid_en.md
new file mode 100644
index 0000000000..6d2e41062d
--- /dev/null
+++ b/docs/_posts/gokhanturer/2023-09-01-finpipe_deid_en.md
@@ -0,0 +1,156 @@
+---
+layout: model
+title: Financial Deidentification Pipeline
+author: John Snow Labs
+name: finpipe_deid
+date: 2023-09-01
+tags: [licensed, en, finance, deid, deidentification, anonymization]
+task: Pipeline Finance
+language: en
+edition: Finance NLP 1.0.0
+spark_version: 3.4
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This pretrained pipeline deidentifies legal and financial documents to help them comply with data privacy regulations such as GDPR and CCPA. Because the models in this pipeline are statistical, they can miss or mislabel entities; use the pipeline in a human-in-the-loop process when you need to guarantee 100% accuracy.
+
+The pipeline supports both masking and obfuscation of the following entities:
+`ALIAS`, `EMAIL`, `PHONE`, `PROFESSION`, `ORG`, `DATE`, `PERSON`, `ADDRESS`, `STREET`, `CITY`, `STATE`, `ZIP`, `COUNTRY`, `TITLE_CLASS`, `TICKER`, `STOCK_EXCHANGE`, `CFN`, `IRS`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finpipe_deid_en_1.0.0_3.4_1693602582270.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finpipe_deid_en_1.0.0_3.4_1693602582270.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+deid_pipeline = PretrainedPipeline("finpipe_deid", "en", "finance/models")
+
+result = deid_pipeline.annotate("""CARGILL, INCORPORATED
+
+By: Pirkko Suominen
+
+
+
+Name: Pirkko Suominen Title: Director, Bio Technology Development Center, Date: 10/19/2011
+
+BIOAMBER, SAS
+
+By: Jean-François Huc
+
+
+
+Name: Jean-François Huc Title: President Date: October 15, 2011
+
+email : jeanfran@gmail.com
+phone : 18087339090 """)
+
+```
+
+
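The pipeline returns the text in four variants: masked with entity labels, masked with same-length characters, masked with fixed-length characters, and obfuscated. The library-free sketch below illustrates how these variants differ on a toy sentence. It is not the pipeline's implementation: the `Entity` type, the `span_of` and `deidentify` helpers, the mode names, and the `FAKES` table are hypothetical, and the spans are located by hand here, whereas the real pipeline detects them with its NER and contextual parser stages.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Entity:
    """A detected span: [start, end) character offsets plus a label."""
    start: int
    end: int
    label: str


# Hypothetical replacement values, used only by this sketch for obfuscation.
FAKES = {"PERSON": "Benjamin Dean", "EMAIL": "user@example.com"}


def span_of(text: str, value: str, label: str) -> Entity:
    """Locate the first occurrence of `value` and wrap it as an Entity."""
    start = text.index(value)
    return Entity(start, start + len(value), label)


def deidentify(text: str, entities: Sequence[Entity], mode: str) -> str:
    """Rewrite every entity span according to the chosen mode."""
    out = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        out.append(text[cursor:ent.start])  # copy the text before the span
        span = text[ent.start:ent.end]
        if mode == "entity_labels":
            out.append("<" + ent.label + ">")
        elif mode == "same_length_chars":
            # Brackets plus asterisks, so the mask is as long as the span.
            out.append("[" + "*" * max(len(span) - 2, 0) + "]")
        elif mode == "fixed_length_chars":
            out.append("****")
        elif mode == "obfuscate":
            out.append(FAKES.get(ent.label, "****"))
        else:
            raise ValueError("unknown mode: " + mode)
        cursor = ent.end
    out.append(text[cursor:])  # copy the remaining text
    return "".join(out)


text = "By: Jean Huc, email: jh@mail.com"
entities = [
    span_of(text, "Jean Huc", "PERSON"),
    span_of(text, "jh@mail.com", "EMAIL"),
]

for mode in ("entity_labels", "same_length_chars", "fixed_length_chars", "obfuscate"):
    print(mode, "->", deidentify(text, entities, mode))
```

Masking keeps the document layout recognizable while removing the values; obfuscation replaces them with plausible surrogates so the text still reads naturally. Same-length masks preserve character offsets, which fixed-length masks deliberately hide.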
+
+## Results
+
+```bash
+Masked with entity labels
+------------------------------
+,
+By:
+Name: : , Date:
+,
+By:
+Name: : Date:
+
+email :
+phone :
+
+Masked with chars
+------------------------------
+[*****], [**********]
+By: [*************]
+Name: [*******************]: [**********************************] Center, Date: [********]
+[******], [*]
+By: [***************]
+Name: [**********************]: [*******]Date: [**************]
+
+email : [****************]
+phone : [********]
+
+Masked with fixed length chars
+------------------------------
+****, ****
+By: ****
+Name: ****: ****, Date: ****
+****, ****
+By: ****
+Name: ****: ****Date: ****
+
+email : ****
+phone : ****
+
+Obfuscated
+------------------------------
+MGT Trust Company, LLC., Clarus llc.
+By: Benjamin Dean
+Name: John Snow Labs Inc: Sales Manager, Date: 03/08/2025
+Clarus llc., SESA CO.
+By: JAMES TURNER
+Name: MGT Trust Company, LLC.: Business ManagerDate: 11/7/2016
+
+email : Tyrus@google.com
+phone : 78 834 854
+
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|finpipe_deid|
+|Type:|pipeline|
+|Compatibility:|Finance NLP 1.0.0+|
+|License:|Licensed|
+|Edition:|Official|
+|Language:|en|
+|Size:|475.2 MB|
+
+## Included Models
+
+- DocumentAssembler
+- SentenceDetector
+- TokenizerModel
+- BertEmbeddings
+- FinanceNerModel
+- NerConverterInternalModel
+- FinanceNerModel
+- NerConverterInternalModel
+- FinanceNerModel
+- NerConverterInternalModel
+- FinanceNerModel
+- NerConverterInternalModel
+- ContextualParserModel
+- ContextualParserModel
+- ContextualParserModel
+- ContextualParserModel
+- ContextualParserModel
+- ChunkMergeModel
+- DeIdentificationModel
+- DeIdentificationModel
+- DeIdentificationModel
+- DeIdentificationModel
\ No newline at end of file