Finance NLP 1.14.0 (#286)
* Add model 2023-05-24-finclf_bert_twitter_financial_news_sentiment_en (#253)

Co-authored-by: gadde5300 <gadde5300@gmail.com>

* 2023-05-25-fingen_flant5_finetuned_alpaca_en (#262)

* Add model 2023-05-25-fingen_flant5_finetuned_alpaca_en

* Update 2023-05-25-fingen_flant5_finetuned_alpaca_en.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* 2023-05-24-finclf_bert_twitter_financial_text_sentiment_en (#255)

* Add model 2023-05-24-finclf_bert_twitter_financial_text_sentiment_en

* Add model 2023-05-25-finclf_bert_twitter_financial_text_sentiment_lg_en

* Update 2023-05-24-finclf_bert_twitter_financial_text_sentiment_en.md

---------

Co-authored-by: gadde5300 <gadde5300@gmail.com>
Co-authored-by: GADDE SAI SHAILESH <69344247+gadde5300@users.noreply.github.com>

* 2023-05-29-fingen_flant5_finetuned_fiqa_en (#277)

* Add model 2023-05-29-fingen_flant5_finetuned_fiqa_en

* Update 2023-05-29-fingen_flant5_finetuned_fiqa_en.md

* Update 2023-05-29-fingen_flant5_finetuned_fiqa_en.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>
Co-authored-by: Juan Martinez <36634572+josejuanmartinez@users.noreply.github.com>

* Add model 2023-05-29-finqa_flant5_finetuned_en (#281)

Co-authored-by: gadde5300 <gadde5300@gmail.com>

---------

Co-authored-by: jsl-models <74001263+jsl-models@users.noreply.github.com>
Co-authored-by: gadde5300 <gadde5300@gmail.com>
Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>
Co-authored-by: GADDE SAI SHAILESH <69344247+gadde5300@users.noreply.github.com>
6 people committed May 30, 2023
1 parent 3b57ca4 commit 9a65615
Showing 6 changed files with 615 additions and 0 deletions.
@@ -0,0 +1,90 @@
---
layout: model
title: Financial Finetuned FLAN-T5 Text Generation (Financial Alpaca)
author: John Snow Labs
name: fingen_flant5_finetuned_alpaca
date: 2023-05-25
tags: [en, finance, generation, licensed, flant5, alpaca, tensorflow]
task: Text Generation
language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: FinanceTextGenerator
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

The `fingen_flant5_finetuned_alpaca` model is a text generation model obtained by fine-tuning FLAN-T5 on the Financial Alpaca dataset. FLAN-T5 is an instruction-tuned language model developed by Google that builds on the T5 encoder-decoder architecture.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/fingen_flant5_finetuned_alpaca_en_1.0.0_3.0_1685016665729.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/fingen_flant5_finetuned_alpaca_en_1.0.0_3.0_1685016665729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python

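# Assumes a licensed John Snow Labs environment is already installed.
from johnsnowlabs import nlp, finance

# Start (or reuse) the Spark session used by the pipeline.
spark = nlp.start()

# Wrap the raw text column into document annotations.
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained generator and configure decoding.
flant5 = finance.TextGenerator.pretrained("fingen_flant5_finetuned_alpaca", "en", "finance/models")\
    .setInputCols(["document"])\
    .setOutputCol("generated")\
    .setMaxNewTokens(256)\
    .setStopAtEos(True)\
    .setDoSample(True)\
    .setTopK(3)

pipeline = nlp.Pipeline(stages=[document_assembler, flant5])

data = spark.createDataFrame([
    [1, "What is the US Fair Tax?"]]).toDF('id', 'text')

results = pipeline.fit(data).transform(data)

results.select("generated.result").show(truncate=False)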

```

</div>
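
The `setDoSample(True)` and `setTopK(3)` calls in the pipeline above suggest that decoding samples each new token from the three highest-scoring candidates. The following is a minimal sketch of that idea in plain Python, assuming the setters follow the usual meaning of top-k sampling; it is independent of Finance NLP, and the function name `top_k_sample` and the toy logits are hypothetical, chosen only for illustration.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample one token index from the k highest-scoring entries of logits."""
    # Keep only the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the surviving logits, shifted by the max for numerical stability.
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    # Draw one index in proportion to its renormalised probability.
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
toy_logits = [2.0, 0.5, 1.7, -1.0, 1.2]  # hypothetical scores for a 5-token vocabulary
draws = [top_k_sample(toy_logits, k=3, rng=rng) for _ in range(1000)]

# Only indices 0, 2 and 4 (the three largest logits) can ever be drawn.
print(sorted(set(draws)))
```

With `k=3`, lower-ranked tokens are never selected, which tends to keep generations on topic while still allowing some variation between runs.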

## Results

```bash
|result |
|[Fair tax in the US is essentially an income tax. Fair taxes are tax on your income, and are not taxeable in any country. Fair taxes are taxed as income. If you have a net gain or if the loss of income from taxable activities is less then the fair value (the loss) of your gross income (the loss) then you have to file an Income Report. This will give the US government an overview and give you an understanding. If your net income is less that your fair share of your gross income (which you are entitled) you have the right to claim a refund.]|
```

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|fingen_flant5_finetuned_alpaca|
|Compatibility:|Finance NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|1.6 GB|

## References

The dataset is available [here](https://huggingface.co/datasets/gbharti/finance-alpaca/viewer/gbharti--finance-alpaca)
@@ -0,0 +1,90 @@
---
layout: model
title: Financial Finetuned FLAN-T5 Text Generation (FIQA dataset)
author: John Snow Labs
name: fingen_flant5_finetuned_fiqa
date: 2023-05-29
tags: [en, finance, generation, licensed, flant5, fiqa, tensorflow]
task: Text Generation
language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: FinanceTextGenerator
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

The `fingen_flant5_finetuned_fiqa` model is a text generation model obtained by fine-tuning FLAN-T5 on the FIQA dataset. FLAN-T5 is an instruction-tuned language model developed by Google that builds on the T5 encoder-decoder architecture.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/fingen_flant5_finetuned_fiqa_en_1.0.0_3.0_1685363340017.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/fingen_flant5_finetuned_fiqa_en_1.0.0_3.0_1685363340017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python

# Assumes a licensed John Snow Labs environment is already installed.
from johnsnowlabs import nlp, finance

# Start (or reuse) the Spark session used by the pipeline.
spark = nlp.start()

# Wrap the raw text column into document annotations.
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained generator and configure decoding.
flant5 = finance.TextGenerator.pretrained("fingen_flant5_finetuned_fiqa", "en", "finance/models")\
    .setInputCols(["document"])\
    .setOutputCol("generated")\
    .setMaxNewTokens(256)\
    .setStopAtEos(True)\
    .setDoSample(True)\
    .setTopK(3)

pipeline = nlp.Pipeline(stages=[document_assembler, flant5])

data = spark.createDataFrame([
    [1, "How to have a small capital investment in US if I am out of the country?"]]).toDF('id', 'text')

results = pipeline.fit(data).transform(data)

results.select("generated.result").show(truncate=False)

```

</div>

## Results

```bash
|result |
|[I would suggest a local broker. They have diversified funds that are diversified and have the same fees as the US market. They also offer diversified portfolios that have the lowest risk.]|
```

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|fingen_flant5_finetuned_fiqa|
|Compatibility:|Finance NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|1.6 GB|

## References

The dataset is available [here](https://huggingface.co/datasets/BeIR/fiqa)
@@ -0,0 +1,115 @@
---
layout: model
title: Financial Twitter News Sentiment Analysis
author: John Snow Labs
name: finclf_bert_twitter_financial_news_sentiment
date: 2023-05-24
tags: [en, finance, twitter, news, sentiment, licensed, tensorflow]
task: Sentiment Analysis
language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: FinanceBertForSequenceClassification
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This model classifies the sentiment of financial news tweets into one of three classes: `Bullish`, `Bearish`, or `Neutral`.

## Predicted Entities

`Bearish`, `Bullish`, `Neutral`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finclf_bert_twitter_financial_news_sentiment_en_1.0.0_3.0_1684923548358.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finclf_bert_twitter_financial_news_sentiment_en_1.0.0_3.0_1684923548358.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Assumes a licensed John Snow Labs environment is already installed.
from johnsnowlabs import nlp, finance

# Start (or reuse) the Spark session used by the pipeline.
spark = nlp.start()

# Wrap the raw text column into document annotations.
document_assembler = nlp.DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = nlp.Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained sentiment classifier.
sequenceClassifier = finance.BertForSequenceClassification.pretrained("finclf_bert_twitter_financial_news_sentiment", "en", "finance/models")\
    .setInputCols(["document", 'token'])\
    .setOutputCol("class")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequenceClassifier
])

# One example tweet per row.
data = [
    ["""$MPLX $MPC - MPLX cut at Credit Suisse on potential dilution from Marathon strategic review https://t.co/0BFQy4ZU6W"""],
    ["""Biogen stock price target raised to $392 from $320 at Instinet"""],
    ["""Luckin Coffee shares halted in premarket; news pending https://t.co/6Kz4NwnNFN"""],
]

# No stage needs training, so fitting on an empty DataFrame is enough.
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

example = model.transform(spark.createDataFrame(data).toDF("text"))

example.select("text", "class.result").show(truncate=False)
```

</div>

## Results

```bash
+-------------------------------------------------------------------------------------------------------------------+---------+
|text |result |
+-------------------------------------------------------------------------------------------------------------------+---------+
|$MPLX $MPC - MPLX cut at Credit Suisse on potential dilution from Marathon strategic review https://t.co/0BFQy4ZU6W|[Bearish]|
|Biogen stock price target raised to $392 from $320 at Instinet |[Bullish]|
|Luckin Coffee shares halted in premarket; news pending https://t.co/6Kz4NwnNFN |[Neutral]|
+-------------------------------------------------------------------------------------------------------------------+---------+
```

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|finclf_bert_twitter_financial_news_sentiment|
|Compatibility:|Finance NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[document, token]|
|Output Labels:|[class]|
|Language:|en|
|Size:|406.4 MB|
|Case sensitive:|true|
|Max sentence length:|512|

## References

In-house annotations on financial reports

## Benchmarking

```bash
label precision recall f1-score support
Bearish 0.80 0.72 0.76 379
Bullish 0.82 0.78 0.80 468
Neutral 0.90 0.94 0.92 1540
accuracy 0.87 2387
macro-avg 0.84 0.81 0.83 2387
weighted-avg 0.87 0.87 0.87 2387

```
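
The `macro-avg` and `weighted-avg` rows can be recomputed from the per-class rows, which makes the effect of class imbalance easier to see. The sketch below copies the rounded figures from the table above; the helper names `macro` and `weighted` are hypothetical, and the calculation assumes the usual definitions (unweighted mean for macro, support-weighted mean for weighted).

```python
# Per-class rows copied from the benchmark table: (precision, recall, f1, support).
report = {
    "Bearish": (0.80, 0.72, 0.76, 379),
    "Bullish": (0.82, 0.78, 0.80, 468),
    "Neutral": (0.90, 0.94, 0.92, 1540),
}

# Total number of evaluated examples.
total = sum(row[3] for row in report.values())


def macro(col):
    """Unweighted mean of one metric column across classes."""
    return sum(row[col] for row in report.values()) / len(report)


def weighted(col):
    """Mean of one metric column, weighted by each class's support."""
    return sum(row[col] * row[3] for row in report.values()) / total


macro_avg = [round(macro(c), 2) for c in range(3)]
weighted_avg = [round(weighted(c), 2) for c in range(3)]

print(total, macro_avg, weighted_avg)
# prints: 2387 [0.84, 0.81, 0.83] [0.87, 0.87, 0.87]
```

Because `Neutral` accounts for 1540 of the 2387 examples, the weighted averages sit closer to its scores, while the macro averages give `Bearish` and `Bullish` equal say.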