From 8fd3da23cfb73e9e56384ffae859bd64bc9287af Mon Sep 17 00:00:00 2001 From: David Cecchini Date: Wed, 10 Jul 2024 12:06:32 -0300 Subject: [PATCH] Models hub finance (#1316) * Add model 2023-08-03-finner_bert_subpoenas_sm_en (#493) Co-authored-by: gadde5300 * Delete subpoenas ner finance * Add model 2023-08-30-finpipe_deid_en (#566) Co-authored-by: Meryem1425 * Add model 2023-08-30-finpipe_deid_en (#570) Co-authored-by: SKocer * Add model 2023-08-30-finpipe_deid_en (#571) Co-authored-by: SKocer * Delete 2023-08-30-finpipe_deid_en.md * Add model 2023-08-30-finpipe_deid_en (#572) Co-authored-by: gokhanturer * Add model 2023-08-30-finpipe_deid_en (#574) Co-authored-by: SKocer * Add model 2023-09-01-finpipe_deid_en (#586) Co-authored-by: Meryem1425 * Add model 2023-09-01-finpipe_deid_en (#589) Co-authored-by: SKocer * Add model 2023-09-01-finpipe_deid_en (#593) Co-authored-by: gokhanturer * 2023-10-06-finembedding_e5_base_en (#685) * Add model 2023-10-06-finembedding_e5_base_en * Add model 2023-10-06-finner_absa_sm_en * Add model 2023-10-06-finassertion_absa_sm_en --------- Co-authored-by: dcecchini * Add model 2023-11-09-finembedding_e5_large_en (#745) Co-authored-by: dcecchini * 2023-11-11-finner_aspect_based_sentiment_md_en (#754) * Add model 2023-11-11-finner_aspect_based_sentiment_md_en * Add model 2023-11-11-finassertion_aspect_based_sentiment_md_en * Update 2023-11-11-finner_aspect_based_sentiment_md_en.md * Update 2023-11-11-finassertion_aspect_based_sentiment_md_en.md --------- Co-authored-by: Mary-Sci Co-authored-by: Merve Ertas Uslu <67653613+Mary-Sci@users.noreply.github.com> * Add model 2023-12-07-finembeddings_bge_base_en (#812) Co-authored-by: dcecchini * 2024-05-17-finner_sec_edgar_fe_en (#1211) * Add model 2024-05-17-finner_sec_edgar_fe_en * Add model 2024-05-17-finner_deid_sec_fe_en * Update 2024-05-17-finner_deid_sec_fe_en.md * Add model 2024-05-21-finner_aspect_based_sentiment_fe_en * Add model 2024-05-21-finance_word_embeddings_en * Add model 
2024-06-07-finner_financial_xlarge_fe_en * Update 2024-06-07-finner_financial_xlarge_fe_en.md * Add model 2024-06-10-finel_nasdaq_company_name_stock_screener_fe_en * Add model 2024-06-10-finel_edgar_company_name_fe_en * Add model 2024-06-10-finance_bge_base_embeddings_en * Add model 2024-06-11-finel_names2tickers_fe_en * Add model 2024-06-12-finel_tickers2names_fe_en * Add model 2024-06-21-finassertion_aspect_based_sentiment_md_fe_en --------- Co-authored-by: gadde5300 Co-authored-by: GADDE SAI SHAILESH <69344247+gadde5300@users.noreply.github.com> --------- Co-authored-by: jsl-models <74001263+jsl-models@users.noreply.github.com> Co-authored-by: gadde5300 Co-authored-by: Meryem1425 Co-authored-by: SKocer Co-authored-by: Merve Ertas Uslu <67653613+Mary-Sci@users.noreply.github.com> Co-authored-by: gokhanturer Co-authored-by: Mary-Sci Co-authored-by: GADDE SAI SHAILESH <69344247+gadde5300@users.noreply.github.com> --- .../2024-05-17-finner_deid_sec_fe_en.md | 138 ++++++++++++++++ .../2024-05-17-finner_sec_edgar_fe_en.md | 130 +++++++++++++++ .../2024-05-21-finance_word_embeddings_en.md | 66 ++++++++ ...-21-finner_aspect_based_sentiment_fe_en.md | 125 ++++++++++++++ ...024-06-07-finner_financial_xlarge_fe_en.md | 153 ++++++++++++++++++ ...24-06-10-finance_bge_base_embeddings_en.md | 64 ++++++++ ...24-06-10-finel_edgar_company_name_fe_en.md | 93 +++++++++++ ...asdaq_company_name_stock_screener_fe_en.md | 124 ++++++++++++++ .../2024-06-11-finel_names2tickers_fe_en.md | 89 ++++++++++ .../2024-06-12-finel_tickers2names_fe_en.md | 89 ++++++++++ ...sertion_aspect_based_sentiment_md_fe_en.md | 132 +++++++++++++++ 11 files changed, 1203 insertions(+) create mode 100644 docs/_posts/gadde5300/2024-05-17-finner_deid_sec_fe_en.md create mode 100644 docs/_posts/gadde5300/2024-05-17-finner_sec_edgar_fe_en.md create mode 100644 docs/_posts/gadde5300/2024-05-21-finance_word_embeddings_en.md create mode 100644 docs/_posts/gadde5300/2024-05-21-finner_aspect_based_sentiment_fe_en.md 
create mode 100644 docs/_posts/gadde5300/2024-06-07-finner_financial_xlarge_fe_en.md create mode 100644 docs/_posts/gadde5300/2024-06-10-finance_bge_base_embeddings_en.md create mode 100644 docs/_posts/gadde5300/2024-06-10-finel_edgar_company_name_fe_en.md create mode 100644 docs/_posts/gadde5300/2024-06-10-finel_nasdaq_company_name_stock_screener_fe_en.md create mode 100644 docs/_posts/gadde5300/2024-06-11-finel_names2tickers_fe_en.md create mode 100644 docs/_posts/gadde5300/2024-06-12-finel_tickers2names_fe_en.md create mode 100644 docs/_posts/gadde5300/2024-06-21-finassertion_aspect_based_sentiment_md_fe_en.md diff --git a/docs/_posts/gadde5300/2024-05-17-finner_deid_sec_fe_en.md b/docs/_posts/gadde5300/2024-05-17-finner_deid_sec_fe_en.md new file mode 100644 index 0000000000..51d63dcb0e --- /dev/null +++ b/docs/_posts/gadde5300/2024-05-17-finner_deid_sec_fe_en.md @@ -0,0 +1,138 @@ +--- +layout: model +title: Generic Deidentification NER (Finance) +author: John Snow Labs +name: finner_deid_sec_fe +date: 2024-05-17 +tags: [deid, deidentification, anonymization, en, licensed] +task: Named Entity Recognition +language: en +edition: Finance NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: FinanceNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is a NER model trained using custom finance embeddings. It detects generic entities that may need to be masked or obfuscated to comply with regulations such as GDPR and CCPA. This is only a NER model; make sure you also try the full de-identification pipelines available in Models Hub. 
+ +## Predicted Entities + +`AGE`, `CITY`, `COUNTRY`, `DATE`, `EMAIL`, `LOCATION-OTHER`, `FAX`, `ORG`, `PERSON`, `PHONE`, `PROFESSION`, `STATE`, `STREET`, `URL`, `ZIP` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finner_deid_sec_fe_en_1.0.0_3.0_1715953927003.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finner_deid_sec_fe_en_1.0.0_3.0_1715953927003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.WordEmbeddingsModel.pretrained("finance_word_embeddings", "en", "finance/models")\ + .setInputCols(["sentence","token"])\ + .setOutputCol("embeddings") + +ner_model =finance.NerModel.pretrained("finner_deid_sec_fe", "en", "finance/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence","token","ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + documentAssembler, + sentenceDetector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = [""" This LICENSE AND DEVELOPMENT AGREEMENT (this Agreement) is entered into effective as of Nov. 02, 2019 (the Effective Date) by and between Bioeq IP AG, having its principal place of business at 333 Twin Dolphin Drive, Suite 600, Redwood City, CA, 94065, USA (Licensee). """] + +res = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++----------------------+------+ +|chunk |label | ++----------------------+------+ +|Nov. 02, 2019 |DATE | +|333 Twin Dolphin Drive|STREET| +|Redwood City |CITY | +|CA |STATE | +|94065 |ZIP | ++----------------------+------+ + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finner_deid_sec_fe| +|Compatibility:|Finance NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|14.6 MB| + +## References + +In-house annotated documents with protected information + +## Benchmarking + +```bash + precision recall f1-score support + AGE 0.97 0.95 0.96 266 + CITY 0.86 0.80 0.83 120 + COUNTRY 0.86 0.63 0.73 38 + DATE 0.98 0.98 0.98 2206 + EMAIL 1.00 1.00 1.00 1 + FAX 0.00 0.00 0.00 2 +LOCATION-OTHER 1.00 0.33 0.50 6 + ORG 0.82 0.55 0.66 42 + PERSON 0.95 0.95 0.95 1295 + PHONE 0.89 0.89 0.89 62 + PROFESSION 0.75 0.55 0.64 76 + STATE 0.90 0.92 0.91 90 + STREET 0.92 0.89 0.91 81 + URL 0.00 0.00 0.00 1 + ZIP 0.97 0.94 0.95 67 + micro-avg 0.96 0.94 0.95 4353 + macro-avg 0.79 0.69 0.73 4353 + weighted-avg 0.96 0.94 0.95 4353 +``` diff --git a/docs/_posts/gadde5300/2024-05-17-finner_sec_edgar_fe_en.md b/docs/_posts/gadde5300/2024-05-17-finner_sec_edgar_fe_en.md new file mode 100644 index 0000000000..01a7f27a7c --- /dev/null +++ b/docs/_posts/gadde5300/2024-05-17-finner_sec_edgar_fe_en.md @@ -0,0 +1,130 @@ +--- +layout: model +title: Financial NER on EDGAR Documents +author: John Snow Labs +name: finner_sec_edgar_fe +date: 2024-05-17 +tags: [en, licensed, finance, ner, sec] +task: Named Entity Recognition +language: en +edition: Finance NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: FinanceNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This Financial NER model extracts ORG, INST, LAW, COURT, PER, LOC, MISC, ALIAS, and TICKER entities from the US SEC EDGAR 
documents. It was trained using custom finance word embeddings. + +## Predicted Entities + +`ORG`, `INST`, `LAW`, `COURT`, `PER`, `LOC`, `MISC`, `ALIAS`, `TICKER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finner_sec_edgar_fe_en_1.0.0_3.0_1715948751469.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finner_sec_edgar_fe_en_1.0.0_3.0_1715948751469.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.WordEmbeddingsModel.pretrained("finance_word_embeddings", "en", "finance/models")\ + .setInputCols(["sentence","token"])\ + .setOutputCol("embeddings") + +ner_model = finance.NerModel.pretrained("finner_sec_edgar_fe", "en", "finance/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""In our opinion, the accompanying consolidated balance sheets and the related consolidated statements of operations, of changes in stockholders' equity, and of cash flows present fairly, in all material respects, the financial position of SunGard Capital Corp. II and its subsidiaries ( SCC II ) at December 31, 2010, and 2009, and the results of their operations and their cash flows for each of the three years in the period ended December 31, 2010, in conformity with accounting principles generally accepted in the United States of America."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++----------------------------------------+-----+ +|chunk |label| ++----------------------------------------+-----+ +|SunGard Capital Corp |ORG | +|SCC II |ALIAS| +|accounting principles generally accepted|LAW | +|United States of America |LOC | ++----------------------------------------+-----+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finner_sec_edgar_fe| +|Compatibility:|Finance NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|2.2 MB| + +## References + +In-house annotations + +## Benchmarking + +```bash + precision recall f1-score support +ALIAS 0.91 0.80 0.85 84 +COURT 1.00 1.00 1.00 6 +INST 0.92 0.76 0.83 76 +LAW 0.89 0.86 0.87 166 +LOC 0.87 0.87 0.87 140 +MISC 0.86 0.75 0.80 226 +ORG 0.88 0.91 0.89 430 +PER 0.89 0.88 0.89 66 +TICKER 1.00 0.86 0.92 7 +micro-avg 0.88 0.85 0.87 1201 +macro-avg 0.91 0.85 0.88 1201 +weighted-avg 0.88 0.85 0.86 1201 +``` \ No newline at end of file diff --git a/docs/_posts/gadde5300/2024-05-21-finance_word_embeddings_en.md b/docs/_posts/gadde5300/2024-05-21-finance_word_embeddings_en.md new file mode 100644 index 0000000000..d2ee52117f --- /dev/null +++ b/docs/_posts/gadde5300/2024-05-21-finance_word_embeddings_en.md @@ -0,0 +1,66 @@ +--- +layout: model +title: Finance Word Embeddings +author: John Snow Labs +name: finance_word_embeddings +date: 2024-05-21 +tags: [en, finance, licensed, word_embeddings] +task: Embeddings +language: en +edition: Finance NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: WordEmbeddingsModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The word embedding models were based on Word2Vec, trained on a mix of different datasets. We used public data and in-house annotated documents. 
+ +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finance_word_embeddings_en_1.0.0_3.0_1716300545868.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finance_word_embeddings_en_1.0.0_3.0_1716300545868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +model = nlp.WordEmbeddingsModel.pretrained("finance_word_embeddings","en","finance/models")\ + .setInputCols(["sentence","token"])\ + .setOutputCol("embeddings") +``` + +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finance_word_embeddings| +|Type:|embeddings| +|Compatibility:|Finance NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[word_embeddings]| +|Language:|en| +|Size:|103.4 MB| +|Case sensitive:|false| +|Dimension:|200| + +## References + +Public data and in-house annotated documents \ No newline at end of file diff --git a/docs/_posts/gadde5300/2024-05-21-finner_aspect_based_sentiment_fe_en.md b/docs/_posts/gadde5300/2024-05-21-finner_aspect_based_sentiment_fe_en.md new file mode 100644 index 0000000000..b9a4c157be --- /dev/null +++ b/docs/_posts/gadde5300/2024-05-21-finner_aspect_based_sentiment_fe_en.md @@ -0,0 +1,125 @@ +--- +layout: model +title: Financial NER on Aspect-Based Sentiment Analysis +author: John Snow Labs +name: finner_aspect_based_sentiment_fe +date: 2024-05-21 +tags: [ner, finance, licensed, en] +task: Named Entity Recognition +language: en +edition: Finance NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: FinanceNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This NER model identifies entities that can be associated with a financial sentiment. The model is trained using custom finance embeddings and is designed to be used with the associated Assertion Status model that classifies the entities into a sentiment category. 
+ +## Predicted Entities + +`ASSET`, `CASHFLOW`, `EXPENSE`, `FREE_CASH_FLOW`, `GAINS`, `KPI`, `LIABILITY`, `LOSSES`, `PROFIT`, `REVENUE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finner_aspect_based_sentiment_fe_en_1.0.0_3.0_1716293156004.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finner_aspect_based_sentiment_fe_en_1.0.0_3.0_1716293156004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.WordEmbeddingsModel.pretrained("finance_word_embeddings", "en", "finance/models")\ + .setInputCols(["sentence","token"])\ + .setOutputCol("embeddings") + +ner_model =finance.NerModel.pretrained("finner_aspect_based_sentiment_fe", "en", "finance/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence","token","ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + documentAssembler, + sentenceDetector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""Equity and earnings of affiliates in Latin America increased to $4.8 million in the quarter from $2.2 million in the prior year as the commodity markets in Latin America remain strong through the end of the quarter."""] + +res = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++--------+------+ +|chunk |label | ++--------+------+ +|Equity |GAINS | +|earnings|PROFIT| ++--------+------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finner_aspect_based_sentiment_fe| +|Compatibility:|Finance NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|14.6 MB| + +## Benchmarking + +```bash +label precision recall f1-score support +ASSET 0.72 0.63 0.67 132 +CASHFLOW 0.81 0.73 0.77 64 +EXPENSE 0.76 0.85 0.81 315 +FREE_CASH_FLOW 0.93 0.93 0.93 43 +GAINS 0.78 0.81 0.80 161 +KPI 0.73 0.68 0.70 253 +LIABILITY 0.73 0.67 0.70 93 +LOSSES 0.79 0.80 0.80 56 +PROFIT 0.80 0.91 0.85 223 +REVENUE 0.81 0.80 0.80 492 +micro-avg 0.78 0.79 0.78 1832 +macro-avg 0.79 0.78 0.78 1832 +weighted-avg 0.78 0.79 0.78 1832 +``` \ No newline at end of file diff --git a/docs/_posts/gadde5300/2024-06-07-finner_financial_xlarge_fe_en.md b/docs/_posts/gadde5300/2024-06-07-finner_financial_xlarge_fe_en.md new file mode 100644 index 0000000000..2220202c47 --- /dev/null +++ b/docs/_posts/gadde5300/2024-06-07-finner_financial_xlarge_fe_en.md @@ -0,0 +1,153 @@ +--- +layout: model +title: Financial NER (xlg, XLarge) +author: John Snow Labs +name: finner_financial_xlarge_fe +date: 2024-06-07 +tags: [broker_reports, earning_calls, sec10k, tensorflow, finance, en, licensed] +task: Named Entity Recognition +language: en +edition: Finance NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: FinanceNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This financial model is the xlg (XLarge) version, trained with more general labels than the other versions (such as `md` and `lg`) available in the Models Hub. 
The training corpus used for this model is a combination of Broker Reports, Earning Calls, and 10K filings. The model was trained using custom finance word embeddings. + +## Predicted Entities + +`AMOUNT`, `ASSET`, `CF`, `CF_DECREASE`, `CF_INCREASE`, `COUNT`, `CURRENCY`, `DATE`, `EXPENSE`, `EXPENSE_DECREASE`, `EXPENSE_INCREASE`, `FCF`, `FISCAL_YEAR`, `KPI`, `KPI_DECREASE`, `KPI_INCREASE`, `LIABILITY`, `LIABILITY_DECREASE`, `LIABILITY_INCREASE`, `ORG`, `PERCENTAGE`, `PROFIT`, `PROFIT_DECLINE`, `PROFIT_INCREASE`, `TICKER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finner_financial_xlarge_fe_en_1.0.0_3.0_1717749730843.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finner_financial_xlarge_fe_en_1.0.0_3.0_1717749730843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.WordEmbeddingsModel.pretrained("finance_word_embeddings", "en", "finance/models")\ + .setInputCols(["sentence","token"])\ + .setOutputCol("embeddings") + +ner_model =finance.NerModel.pretrained("finner_financial_xlarge_fe", "en", "finance/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence","token","ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + documentAssembler, + sentenceDetector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ['''We expect Revenue / PAT CAGR of ~ 19 %/~ 22 % over FY2022-FY2024E EPS . Hence , we retain our Buy recommendation on VGIL with an unchanged price target ( PT ) of . This includes $ 1 billion in cash and cash equivalents , $ 2 billion in property and equipment , and $ 2 billion in intangible assets .'''] + +res = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++-------------------------+---------------+ +|chunk |label | ++-------------------------+---------------+ +|PAT CAGR |EXPENSE | +|19 |PERCENTAGE | +|22 |PERCENTAGE | +|EPS |PROFIT_INCREASE| +|$ |CURRENCY | +|1 billion |AMOUNT | +|cash and cash equivalents|CF | +|$ |CURRENCY | +|2 billion |AMOUNT | +|$ |CURRENCY | +|2 billion |AMOUNT | ++-------------------------+---------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finner_financial_xlarge_fe| +|Compatibility:|Finance NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|14.8 MB| + +## References + +In-house dataset + +## Benchmarking + +```bash + precision recall f1-score support +AMOUNT 0.87 0.93 0.90 3206 +ASSET 0.00 0.00 0.00 24 +CF 0.67 0.56 0.61 476 +CF_DECREASE 0.64 0.30 0.41 23 +CF_INCREASE 0.61 0.83 0.71 59 +COUNT 0.33 0.36 0.35 11 +CURRENCY 0.89 0.98 0.93 2130 +DATE 0.90 0.93 0.91 1196 +EXPENSE 0.59 0.59 0.59 367 +EXPENSE_DECREASE 0.59 0.63 0.61 73 +EXPENSE_INCREASE 0.83 0.80 0.82 135 +FCF 0.68 0.94 0.79 16 +FISCAL_YEAR 0.88 0.90 0.89 435 +KPI 0.33 0.08 0.12 13 +KPI_DECREASE 0.33 0.25 0.29 4 +KPI_INCREASE 0.00 0.00 0.00 8 +LIABILITY 0.50 0.42 0.46 227 +LIABILITY_DECREASE 1.00 0.20 0.33 5 +LIABILITY_INCREASE 1.00 1.00 1.00 1 +ORG 0.94 0.89 0.91 18 +PERCENTAGE 0.99 0.96 0.97 774 +PROFIT 0.70 0.62 0.66 377 +PROFIT_DECLINE 0.54 0.41 0.47 63 +PROFIT_INCREASE 0.70 0.57 0.62 201 +TICKER 1.00 0.94 0.97 17 +micro-avg 0.85 0.87 0.86 9859 +macro-avg 0.66 0.60 0.61 9859 +weighted-avg 0.84 0.87 0.85 9859 +``` diff --git a/docs/_posts/gadde5300/2024-06-10-finance_bge_base_embeddings_en.md b/docs/_posts/gadde5300/2024-06-10-finance_bge_base_embeddings_en.md new file mode 100644 index 0000000000..a46b479417 --- /dev/null +++ b/docs/_posts/gadde5300/2024-06-10-finance_bge_base_embeddings_en.md @@ -0,0 +1,64 @@ +--- +layout: model +title: Finance BGE 
Embeddings +author: John Snow Labs +name: finance_bge_base_embeddings +date: 2024-06-10 +tags: [bge, embeddings, finance, licensed, en, onnx] +task: Embeddings +language: en +edition: Finance NLP 1.0.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The BGE embedding model was trained on a mix of different datasets. We used public data and in-house annotated documents. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finance_bge_base_embeddings_en_1.0.0_3.0_1718032885018.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finance_bge_base_embeddings_en_1.0.0_3.0_1718032885018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +embeddings = nlp.BGEEmbeddings.pretrained("finance_bge_base_embeddings","en","finance/models")\ + .setInputCols("document")\ + .setOutputCol("embeddings") +``` + +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finance_bge_base_embeddings| +|Compatibility:|Finance NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[sentence_embeddings]| +|Language:|en| +|Size:|400.6 MB| + +## References + +Public data and in-house annotated documents \ No newline at end of file diff --git a/docs/_posts/gadde5300/2024-06-10-finel_edgar_company_name_fe_en.md b/docs/_posts/gadde5300/2024-06-10-finel_edgar_company_name_fe_en.md new file mode 100644 index 0000000000..a9ce46c867 --- /dev/null +++ b/docs/_posts/gadde5300/2024-06-10-finel_edgar_company_name_fe_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Company Name Normalization (Edgar Database) +author: John Snow Labs +name: finel_edgar_company_name_fe +date: 2024-06-10 +tags: [finance, edgar, licensed, en] +task: Entity Resolution +language: en +edition: Finance NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: SentenceEntityResolverModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is an Entity Linking / Entity Resolution model, which allows you to map an extracted Company Name from any NER model, to the name used by SEC in Edgar Database. This can come in handy to afterwards use Edgar Chunk Mappers with the output of this resolution, to carry out data augmentation and retrieve additional information stored in Edgar Database about a company. For more information about data augmentation, check `Chunk Mapping` task in Models Hub. 
+ +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finel_edgar_company_name_fe_en_1.0.0_3.0_1718020983963.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finel_edgar_company_name_fe_en_1.0.0_3.0_1718020983963.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("ner_chunk") + +embeddings = nlp.BGEEmbeddings.pretrained("finance_bge_base_embeddings", "en", "finance/models")\ + .setInputCols("ner_chunk") \ + .setOutputCol("sentence_embeddings") + +resolver = finance.SentenceEntityResolverModel.pretrained("finel_edgar_company_name_fe", "en", "finance/models") \ + .setInputCols(["sentence_embeddings"]) \ + .setOutputCol("normalized")\ + .setDistanceFunction("EUCLIDEAN") + +pipelineModel = nlp.Pipeline( + stages = [ + documentAssembler, + embeddings, + resolver + ]) + +lp = nlp.LightPipeline(pipelineModel.fit(spark.createDataFrame([[""]]).toDF("text"))) + +lp.fullAnnotate("CONTACT GOLD") +``` + +
+ +## Results + +```bash +| chunks | begin | end | code | all_codes | resolutions | all_distances | +|:----------:|:---------:|:-------:|:---------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| 0 | CONTACT GOLD | 0 | 11 | Contact Gold Corp. 
| [Contact Gold Corp., Contact Minerals Corp., Source Gold Corp., GENERAL GOLD CORP, Gold Alan D, INTERNET GOLD GOLDEN LINES LTD, METALINE CONTACT MINES, GOLD STEPHEN J, AuRico Gold Inc., ISHARES GOLD TRUST, GLOBAL GOLD CORP, Golden Minerals Co, Sprott Physical Gold Trust, FOCUS GOLD Corp, GOLDEN CYCLE GOLD CORP] | [Contact Gold Corp., Contact Minerals Corp., Source Gold Corp., GENERAL GOLD CORP, Gold Alan D, INTERNET GOLD GOLDEN LINES LTD, METALINE CONTACT MINES, GOLD STEPHEN J, AuRico Gold Inc., ISHARES GOLD TRUST, GLOBAL GOLD CORP, Golden Minerals Co, Sprott Physical Gold Trust, FOCUS GOLD Corp, GOLDEN CYCLE GOLD CORP] | [0.0684, 0.3294, 0.3476, 0.3541, 0.3548, 0.3635, 0.3698, 0.3879, 0.3902, 0.3916, 0.3933, 0.3958, 0.3964, 0.3969, 0.3974] | + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finel_edgar_company_name_fe| +|Compatibility:|Finance NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence_embeddings]| +|Output Labels:|[original_company_name]| +|Language:|en| +|Size:|1.2 GB| +|Case sensitive:|false| + +## References + +In-house scraping and postprocessing of SEC Edgar Database \ No newline at end of file diff --git a/docs/_posts/gadde5300/2024-06-10-finel_nasdaq_company_name_stock_screener_fe_en.md b/docs/_posts/gadde5300/2024-06-10-finel_nasdaq_company_name_stock_screener_fe_en.md new file mode 100644 index 0000000000..7d9921dadd --- /dev/null +++ b/docs/_posts/gadde5300/2024-06-10-finel_nasdaq_company_name_stock_screener_fe_en.md @@ -0,0 +1,124 @@ +--- +layout: model +title: Company Name Normalization using Nasdaq Stock Screener +author: John Snow Labs +name: finel_nasdaq_company_name_stock_screener_fe +date: 2024-06-10 +tags: [nasdaq, company, finance, licensed, en] +task: Entity Resolution +language: en +edition: Finance NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: SentenceEntityResolverModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" 
+--- + +## Description + +This is a Financial Entity Resolver model, trained to obtain normalized versions of Company Names, registered in NASDAQ Stock Screener. You can use this model after extracting a company name using any NER, and you will obtain the official name of the company as per NASDAQ Stock Screener. + +After this, you can use `finmapper_nasdaq_company_name_stock_screener` to augment and obtain more information about a company using NASDAQ Stock Screener, including Ticker, Sector, Country, etc. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finel_nasdaq_company_name_stock_screener_fe_en_1.0.0_3.0_1718021800356.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finel_nasdaq_company_name_stock_screener_fe_en_1.0.0_3.0_1718021800356.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["document"])\ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +ner_model = finance.NerModel.pretrained("finner_orgs_prods_alias", "en", "finance/models")\ + .setInputCols(["document", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["document","token","ner"])\ + .setOutputCol("ner_chunk") + +chunkToDoc = nlp.Chunk2Doc()\ + .setInputCols("ner_chunk")\ + .setOutputCol("ner_chunk_doc") + +bge_embeddings = nlp.BGEEmbeddings.pretrained("finance_bge_base_embeddings", "en", "finance/models")\ + .setInputCols("ner_chunk_doc") \ + .setOutputCol("sentence_embeddings") + +fe_er_model = finance.SentenceEntityResolverModel.pretrained("finel_nasdaq_company_name_stock_screener_fe", "en", "finance/models") \ + .setInputCols(["sentence_embeddings"]) \ + .setOutputCol("normalized")\ + .setDistanceFunction("EUCLIDEAN") + +nlpPipeline = nlp.Pipeline(stages=[ + documentAssembler, + tokenizer, + embeddings, + ner_model, + ner_converter, + chunkToDoc, + bge_embeddings, + fe_er_model +]) + +text = """NIKE is an American multinational corporation that is engaged in the design, development, manufacturing, and worldwide marketing and sales of footwear, apparel, equipment, accessories, and services.""" + +test_data = spark.createDataFrame([[text]]).toDF("text") + +model = nlpPipeline.fit(test_data) + +lp = nlp.LightPipeline(model) + +result = lp.annotate(text) + +result["normalized"] +``` + +
+ +## Results + +```bash +['Nike Inc. Common Stock'] +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finel_nasdaq_company_name_stock_screener_fe| +|Compatibility:|Finance NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence_embeddings]| +|Output Labels:|[normalized]| +|Language:|en| +|Size:|115.7 MB| +|Case sensitive:|false| + +## References + +https://www.nasdaq.com/market-activity/stocks/screener \ No newline at end of file diff --git a/docs/_posts/gadde5300/2024-06-11-finel_names2tickers_fe_en.md b/docs/_posts/gadde5300/2024-06-11-finel_names2tickers_fe_en.md new file mode 100644 index 0000000000..829ef38159 --- /dev/null +++ b/docs/_posts/gadde5300/2024-06-11-finel_names2tickers_fe_en.md @@ -0,0 +1,89 @@ +--- +layout: model +title: Resolve Company Names to Tickers +author: John Snow Labs +name: finel_names2tickers_fe +date: 2024-06-11 +tags: [finance, companies, ticker, nasdaq, licensed, en] +task: Entity Resolution +language: en +edition: Finance NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: SentenceEntityResolverModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is an Entity Resolution / Entity Linking model that returns the Ticker / Trading Symbol for a given Company Name. You can run any NER that extracts Organizations / Companies / Parties and send its output to this Entity Linking model to get the Ticker / Trading Symbol (provided the company has one). 
+ +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finel_names2tickers_fe_en_1.0.0_3.0_1718110711125.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finel_names2tickers_fe_en_1.0.0_3.0_1718110711125.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("ner_chunk") + +embeddings = nlp.BGEEmbeddings.pretrained("finance_bge_base_embeddings", "en", "finance/models")\ + .setInputCols("ner_chunk") \ + .setOutputCol("sentence_embeddings") + +resolver = finance.SentenceEntityResolverModel.pretrained("finel_names2tickers_fe", "en", "finance/models") \ + .setInputCols(["ner_chunk", "sentence_embeddings"]) \ + .setOutputCol("name")\ + .setDistanceFunction("EUCLIDEAN") + +pipelineModel = nlp.Pipeline( + stages = [ + documentAssembler, + embeddings, + resolver]) + +lp = nlp.LightPipeline(pipelineModel.fit(spark.createDataFrame([[""]]).toDF("text"))) # fit on an empty DataFrame to obtain a PipelineModel + +lp.fullAnnotate("Tesla") +``` + +
+ +## Results + +```bash +['TSLA'] +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finel_names2tickers_fe| +|Compatibility:|Finance NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence_embeddings]| +|Output Labels:|[normalized]| +|Language:|en| +|Size:|115.6 MB| +|Case sensitive:|false| + +## References + +https://data.world/johnsnowlabs/list-of-companies-in-nasdaq-exchanges \ No newline at end of file diff --git a/docs/_posts/gadde5300/2024-06-12-finel_tickers2names_fe_en.md b/docs/_posts/gadde5300/2024-06-12-finel_tickers2names_fe_en.md new file mode 100644 index 0000000000..a8a67bf381 --- /dev/null +++ b/docs/_posts/gadde5300/2024-06-12-finel_tickers2names_fe_en.md @@ -0,0 +1,89 @@ +--- +layout: model +title: Resolve Tickers to Company Names +author: John Snow Labs +name: finel_tickers2names_fe +date: 2024-06-12 +tags: [nasdaq, companies, finance, licensed, en] +task: Entity Resolution +language: en +edition: Finance NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: SentenceEntityResolverModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is an Entity Resolution / Entity Linking model that returns Company Names given their Ticker / Trading Symbols. You can run any NER that extracts Tickers and send its output to this Entity Linking model to get the Company Name. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finel_tickers2names_fe_en_1.0.0_3.0_1718189884813.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finel_tickers2names_fe_en_1.0.0_3.0_1718189884813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("ner_chunk") + +embeddings = nlp.BGEEmbeddings.pretrained("finance_bge_base_embeddings", "en", "finance/models")\ + .setInputCols("ner_chunk") \ + .setOutputCol("sentence_embeddings") + +resolver = finance.SentenceEntityResolverModel.pretrained("finel_tickers2names_fe", "en", "finance/models") \ + .setInputCols(["ner_chunk", "sentence_embeddings"]) \ + .setOutputCol("name")\ + .setDistanceFunction("EUCLIDEAN") + +pipelineModel = nlp.Pipeline( + stages = [ + documentAssembler, + embeddings, + resolver]) + +lp = nlp.LightPipeline(pipelineModel.fit(spark.createDataFrame([[""]]).toDF("text"))) # fit on an empty DataFrame to obtain a PipelineModel + +lp.fullAnnotate("HP") +``` + +
+ +## Results + +```bash +['HP Inc. Common Stock'] +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finel_tickers2names_fe| +|Compatibility:|Finance NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence_embeddings]| +|Output Labels:|[normalized]| +|Language:|en| +|Size:|115.7 MB| +|Case sensitive:|false| + +## References + +https://data.world/johnsnowlabs/list-of-companies-in-nasdaq-exchanges \ No newline at end of file diff --git a/docs/_posts/gadde5300/2024-06-21-finassertion_aspect_based_sentiment_md_fe_en.md b/docs/_posts/gadde5300/2024-06-21-finassertion_aspect_based_sentiment_md_fe_en.md new file mode 100644 index 0000000000..ce14ddbdd7 --- /dev/null +++ b/docs/_posts/gadde5300/2024-06-21-finassertion_aspect_based_sentiment_md_fe_en.md @@ -0,0 +1,132 @@ +--- +layout: model +title: Financial Assertion of Aspect-Based Sentiment (md, Medium) +author: John Snow Labs +name: finassertion_aspect_based_sentiment_md_fe +date: 2024-06-21 +tags: [assertion, licensed, en, finance] +task: Assertion Status +language: en +edition: Finance NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: AssertionDLModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This assertion model assigns an aspect-based sentiment label to each financial entity. It is designed to be used together with the associated NER model. + +## Predicted Entities + +`POSITIVE`, `NEGATIVE`, `NEUTRAL` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finassertion_aspect_based_sentiment_md_fe_en_1.0.0_3.0_1718963493988.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finassertion_aspect_based_sentiment_md_fe_en_1.0.0_3.0_1718963493988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +# Sentence Detector annotator, processes various sentences per line +sentenceDetector = nlp.SentenceDetector()\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +# Tokenizer splits words in a relevant format for NLP +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.WordEmbeddingsModel.pretrained("finance_word_embeddings", "en", "finance/models")\ + .setInputCols(["sentence","token"])\ + .setOutputCol("embeddings") + +ner_model = finance.NerModel.pretrained("finner_aspect_based_sentiment_fe", "en", "finance/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = finance.NerConverterInternal()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +assertion_model = finance.AssertionDLModel.pretrained("finassertion_aspect_based_sentiment_md_fe", "en", "finance/models")\ + .setInputCols(["sentence", "ner_chunk", "embeddings"])\ + .setOutputCol("assertion") + + +nlpPipeline = nlp.Pipeline( + stages=[documentAssembler, + sentenceDetector, + tokenizer, + embeddings, + ner_model, + ner_converter, + assertion_model]) + + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = "Equity and earnings of affiliates in Latin America increased to $4.8 million in the quarter from $2.2 million in the prior year as the commodity markets in Latin America remain strong through the end of the quarter." 
+ +light_model = nlp.LightPipeline(model) + +light_result = light_model.fullAnnotate(text)[0] + +print(text) + +chunks=[] +entities=[] +status=[] +confidence=[] + +for n,m in zip(light_result['ner_chunk'],light_result['assertion']): + + chunks.append(n.result) + entities.append(n.metadata['entity']) + status.append(m.result) + confidence.append(m.metadata['confidence']) + +df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status, 'confidence':confidence}) +``` + +
+ +## Results + +```bash +| | chunks | entities | assertion | confidence | +|---|----------|----------|-----------|------------| +| 0 | Equity | GAINS | POSITIVE | 0.9463 | +| 1 | earnings | PROFIT | POSITIVE | 0.9144 | + +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finassertion_aspect_based_sentiment_md_fe| +|Compatibility:|Finance NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[document, chunk, embeddings]| +|Output Labels:|[assertion]| +|Language:|en| +|Size:|1.2 MB| \ No newline at end of file