Finance NLP 1.12.0 (#179)
* 2023-04-21-fingen_flant5_base_en (#142)

* Add model 2023-04-21-fingen_flant5_base_en

* Update 2023-04-21-fingen_flant5_base_en.md

---------

Co-authored-by: gadde5300 <gadde5300@gmail.com>
Co-authored-by: GADDE SAI SHAILESH <69344247+gadde5300@users.noreply.github.com>

* Add model 2023-04-26-finner_bert_suspicious_activity_reports_en (#150)

Co-authored-by: gadde5300 <gadde5300@gmail.com>

* 2023-04-27-finpipe_suspicious_activity_reports_en (#164)

* Add model 2023-04-27-finpipe_suspicious_activity_reports_en

* Update 2023-04-27-finpipe_suspicious_activity_reports_en.md

* Update 2023-04-27-finpipe_suspicious_activity_reports_en.md

* Update 2023-04-27-finpipe_suspicious_activity_reports_en.md

---------

Co-authored-by: gadde5300 <gadde5300@gmail.com>
Co-authored-by: GADDE SAI SHAILESH <69344247+gadde5300@users.noreply.github.com>

* 2023-04-28-fingen_flant5_finetuned_sec10k_en (#173)

* Add model 2023-04-28-fingen_flant5_finetuned_sec10k_en

* Update 2023-04-28-fingen_flant5_finetuned_sec10k_en.md

* Update 2023-04-28-fingen_flant5_finetuned_sec10k_en.md

---------

Co-authored-by: gadde5300 <gadde5300@gmail.com>
Co-authored-by: GADDE SAI SHAILESH <69344247+gadde5300@users.noreply.github.com>

---------

Co-authored-by: jsl-models <74001263+jsl-models@users.noreply.github.com>
Co-authored-by: gadde5300 <gadde5300@gmail.com>
Co-authored-by: GADDE SAI SHAILESH <69344247+gadde5300@users.noreply.github.com>
4 people committed May 1, 2023
1 parent 72d2bd9 commit c7c06e6
Showing 4 changed files with 491 additions and 0 deletions.
85 changes: 85 additions & 0 deletions docs/_posts/gadde5300/2023-04-21-fingen_flant5_base_en.md
@@ -0,0 +1,85 @@
---
layout: model
title: Financial FLAN-T5 Text Generation (Base)
author: John Snow Labs
name: fingen_flant5_base
date: 2023-04-21
tags: [en, licensed, generation, flan_t5, finance, tensorflow]
task: Text Generation
language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: FinanceTextGenerator
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

FLAN-T5 is an instruction-fine-tuned version of the original T5 model, designed to produce more coherent, higher-quality text. It is trained on a large, diverse text dataset and can summarize articles, documents, and other text inputs. This model can also be used to generate financial text.

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/fingen_flant5_base_en_1.0.0_3.0_1682073956957.zip){:.button.button-orange}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/fingen_flant5_base_en_1.0.0_3.0_1682073956957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
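# Assumes a licensed John Snow Labs session is already running,
# e.g. `from johnsnowlabs import nlp, finance` and `spark = nlp.start()`.
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("question")

flant5 = finance.TextGenerator.pretrained('fingen_flant5_base','en','finance/models')\
    .setInputCols(["question"])\
    .setOutputCol("summary")\
    .setMaxNewTokens(150)\
    .setStopAtEos(True)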


pipeline = nlp.Pipeline(stages=[document_assembler, flant5])

data = spark.createDataFrame([
[1, "Explain what is Sec 10-k filing "]
]).toDF('id', 'text')

results = pipeline.fit(data).transform(data)

results.select("summary.result").show(truncate=False)
```

</div>

## Results

```bash
+--------------------------------------------------------------------------------------------------------------------+
|result                                                                                                              |
+--------------------------------------------------------------------------------------------------------------------+
|[Sec 10k filing is a form of tax filing that requires a party to file jointly or several entities for tax purposes.]|
+--------------------------------------------------------------------------------------------------------------------+
```

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|fingen_flant5_base|
|Compatibility:|Finance NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|920.9 MB|
@@ -0,0 +1,220 @@
---
layout: model
title: Financial Suspicious Activity Reports NER
author: John Snow Labs
name: finner_bert_suspicious_activity_reports
date: 2023-04-26
tags: [finance, suspicious_activity_reports, en, bert, licensed, tensorflow]
task: Named Entity Recognition
language: en
edition: Finance NLP 1.0.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: FinanceBertForTokenClassification
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This is a Financial BertForTokenClassification NER model designed to extract entities from suspicious activity reports, which financial institutions and those associated with their business file with the Financial Crimes Enforcement Network (FinCEN).

## Predicted Entities

`SUSPICIOUS_ITEMS`, `PERSON_NAME`, `SUSPICIOUS_ACTION`, `SUSPICIOUS_KEYWORD`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finner_bert_suspicious_activity_reports_en_1.0.0_3.0_1682502028225.zip){:.button.button-orange}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/finance/models/finner_bert_suspicious_activity_reports_en_1.0.0_3.0_1682502028225.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")

tokenizer = nlp.Tokenizer()\
.setInputCols("document")\
.setOutputCol("token")

tokenClassifier = finance.BertForTokenClassification.pretrained("finner_bert_suspicious_activity_reports", "en", "finance/models")\
.setInputCols("token", "document")\
.setOutputCol("label")\
.setCaseSensitive(True)

ner_converter = nlp.NerConverter()\
.setInputCols(["document","token","label"])\
.setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[
documentAssembler,
tokenizer,
tokenClassifier,
ner_converter
]
)

import pandas as pd
from pyspark.sql import functions as F  # F is needed below for explode/arrays_zip/expr

p_model = pipeline.fit(spark.createDataFrame(pd.DataFrame({'text': ['']})))


text = """SUSPICIOUS ACTIVITY REPORT
Date: [Today's Date]
To: [Financial Institution's Compliance Department]
Subject: Suspicious Activity Related to Business Loan
Account Holder Information:
Name: [Name of Business]
Address: [Business Address]
Account Number: [Business Account Number]
Description of Activity:
On [Date], [Name of Business] submitted a loan application for a substantial amount of money. The loan officer reviewing the application noticed several indications of possible suspicious activity."""

res = p_model.transform(spark.createDataFrame([[text]]).toDF("text"))

result_df = res.select(F.explode(F.arrays_zip(res.token.result,res.label.result, res.label.metadata)).alias("cols"))\
.select(F.expr("cols['0']").alias("token"),
F.expr("cols['1']").alias("label"),
F.expr("cols['2']['confidence']").alias("confidence"))

result_df.show(100, truncate=100)
```

</div>
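
The `arrays_zip` and `explode` calls in the snippet above pair each token with its label and the label's metadata, then emit one row per token. As a rough plain-Python analogy (not Spark code, and the confidence values are made up purely for illustration), the same reshaping looks like this:

```python
# Plain-Python analogy for F.explode(F.arrays_zip(...)); no Spark required.
# The three parallel lists stand in for token.result, label.result and
# label.metadata of a single document. Confidence values are hypothetical.
tokens = ["Business", "Loan", "Account"]
labels = ["B-SUSPICIOUS_ACTION", "I-SUSPICIOUS_ACTION", "O"]
metadata = [{"confidence": "0.98"}, {"confidence": "0.97"}, {"confidence": "0.99"}]

# arrays_zip: combine the i-th element of every array into one tuple.
zipped = list(zip(tokens, labels, metadata))

# explode + select: one output row per tuple, keeping only the needed fields.
rows = [
    {"token": tok, "label": lab, "confidence": meta["confidence"]}
    for tok, lab, meta in zipped
]

for row in rows:
    print(row)
```

Each dictionary corresponds to one row of `result_df`, with the `token`, `label`, and `confidence` columns selected in the Spark version.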

## Results

```bash
+-------------+--------------------+
|chunk        |entity              |
+-------------+--------------------+
|SUSPICIOUS   |B-SUSPICIOUS_KEYWORD|
|ACTIVITY     |O                   |
|REPORT       |O                   |
|Date         |O                   |
|:            |O                   |
|[Today's     |O                   |
|Date]        |O                   |
|To           |O                   |
|:            |O                   |
|[Financial   |O                   |
|Institution's|O                   |
|Compliance   |O                   |
|Department]  |O                   |
|Subject      |O                   |
|:            |O                   |
|Suspicious   |B-SUSPICIOUS_KEYWORD|
|Activity     |O                   |
|Related      |O                   |
|to           |O                   |
|Business     |B-SUSPICIOUS_ACTION |
|Loan         |I-SUSPICIOUS_ACTION |
|Account      |O                   |
|Holder       |O                   |
|Information  |O                   |
|:            |O                   |
|Name         |O                   |
|:            |O                   |
|[Name        |O                   |
|of           |O                   |
|Business]    |O                   |
|Address      |O                   |
|:            |O                   |
|[Business    |O                   |
|Address]     |O                   |
|Account      |O                   |
|Number       |O                   |
|:            |O                   |
|[Business    |O                   |
|Account      |O                   |
|Number]      |O                   |
|Description  |O                   |
|of           |O                   |
|Activity     |O                   |
|:            |O                   |
|On           |O                   |
|[Date]       |O                   |
|,            |O                   |
|[Name        |O                   |
|of           |O                   |
|Business]    |O                   |
|submitted    |O                   |
|a            |O                   |
|loan         |B-SUSPICIOUS_ACTION |
|application  |I-SUSPICIOUS_ACTION |
|for          |O                   |
|a            |O                   |
|substantial  |O                   |
|amount       |O                   |
|of           |O                   |
|money        |O                   |
|.            |O                   |
|The          |O                   |
|loan         |O                   |
|officer      |O                   |
|reviewing    |O                   |
|the          |O                   |
|application  |O                   |
|noticed      |O                   |
|several      |O                   |
|indications  |O                   |
|of           |O                   |
|possible     |O                   |
|suspicious   |B-SUSPICIOUS_KEYWORD|
|activity     |O                   |
|.            |O                   |
+-------------+--------------------+
```
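
The table above lists one BIO tag per token, while the `ner_converter` stage in the pipeline groups such tags into chunks. A minimal plain-Python sketch of that grouping follows. It is a simplification and not the converter's actual implementation; `merge_bio` is a hypothetical helper name, and a stray `I-` tag with no matching open chunk is simply dropped here.

```python
def merge_bio(tokens, tags):
    """Group B-/I- tagged tokens into (chunk_text, entity) pairs."""
    chunks = []
    current_tokens, current_entity = [], None

    def close_chunk():
        # Emit the chunk collected so far, if any.
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_entity))

    for token, tag in zip(tokens, tags):
        prefix, _, entity = tag.partition("-")
        if prefix == "B":
            # A B- tag always starts a new chunk.
            close_chunk()
            current_tokens, current_entity = [token], entity
        elif prefix == "I" and entity == current_entity:
            # An I- tag extends the open chunk of the same entity type.
            current_tokens.append(token)
        else:
            # "O" (or a stray I- tag) ends any open chunk.
            close_chunk()
            current_tokens, current_entity = [], None
    close_chunk()
    return chunks


# Tokens and tags taken from the "Subject" line of the table above.
tokens = ["Subject", ":", "Suspicious", "Activity", "Related", "to",
          "Business", "Loan", "Account"]
tags = ["O", "O", "B-SUSPICIOUS_KEYWORD", "O", "O", "O",
        "B-SUSPICIOUS_ACTION", "I-SUSPICIOUS_ACTION", "O"]

chunks = merge_bio(tokens, tags)
print(chunks)
```

For this excerpt the sketch yields two chunks: `Suspicious` as `SUSPICIOUS_KEYWORD` and `Business Loan` as `SUSPICIOUS_ACTION`.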

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|finner_bert_suspicious_activity_reports|
|Compatibility:|Finance NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence, token]|
|Output Labels:|[ner]|
|Language:|en|
|Size:|404.2 MB|
|Case sensitive:|true|
|Max sentence length:|128|

## References

In-house annotated data

## Benchmarking

```bash

label                 precision  recall  f1-score  support

B-SUSPICIOUS_ITEMS    0.75       0.84    0.79      1079
B-PERSON_NAME         0.97       0.97    0.97      88
I-PERSON_NAME         0.98       0.99    0.98      171
B-SUSPICIOUS_ACTION   0.91       0.87    0.89      752
I-SUSPICIOUS_ACTION   0.93       0.91    0.92      814
B-SUSPICIOUS_KEYWORD  0.91       0.97    0.94      1528
I-SUSPICIOUS_ITEMS    0.77       0.84    0.81      659
micro-avg             0.86       0.90    0.88      5091
macro-avg             0.89       0.91    0.90      5091
weighted-avg          0.86       0.90    0.88      5091

```
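
The summary rows of the report can be sanity-checked from the per-label rows. The sketch below assumes the usual definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean over labels, weighted average as the support-weighted mean). The source does not say how the report was produced, so this is an illustration of those definitions and not the authors' evaluation code. Because the inputs are already rounded to two decimals, recomputed values can differ from the table by 0.01 in places.

```python
# (precision, recall, f1, support) per label, copied from the table above.
report = {
    "B-SUSPICIOUS_ITEMS":   (0.75, 0.84, 0.79, 1079),
    "B-PERSON_NAME":        (0.97, 0.97, 0.97, 88),
    "I-PERSON_NAME":        (0.98, 0.99, 0.98, 171),
    "B-SUSPICIOUS_ACTION":  (0.91, 0.87, 0.89, 752),
    "I-SUSPICIOUS_ACTION":  (0.93, 0.91, 0.92, 814),
    "B-SUSPICIOUS_KEYWORD": (0.91, 0.97, 0.94, 1528),
    "I-SUSPICIOUS_ITEMS":   (0.77, 0.84, 0.81, 659),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


n_labels = len(report)
total_support = sum(s for _, _, _, s in report.values())

# Macro average: unweighted mean over labels.
macro_precision = sum(p for p, _, _, _ in report.values()) / n_labels
macro_recall = sum(r for _, r, _, _ in report.values()) / n_labels
macro_f1 = sum(f for _, _, f, _ in report.values()) / n_labels

# Weighted average: mean over labels, weighted by each label's support.
weighted_f1 = sum(f * s for _, _, f, s in report.values()) / total_support

print("total support:", total_support)
print("macro avg:", round(macro_precision, 2), round(macro_recall, 2), round(macro_f1, 2))
print("weighted f1:", round(weighted_f1, 2))
```

Under these assumptions the per-label supports sum to 5091, and the recomputed macro averages (0.89, 0.91, 0.90) and weighted F1 (0.88) agree with the summary rows.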