We have fine-tuned the pretrained FinBERT model on several financial NLP tasks. On each task it outperforms traditional machine learning models, deep learning models, and fine-tuned BERT models.

All the fine-tuned FinBERT models are publicly hosted on Hugging Face 🤗. Specifically, the following models are available:
- **FinBERT-Sentiment**: for the sentiment classification task
- **FinBERT-ESG**: for the ESG classification task
- **FinBERT-FLS**: for the forward-looking statement (FLS) classification task

*Note: the following code is for demonstration purposes. Please use a GPU for fast inference on large-scale datasets.*
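
The GPU advice above can be sketched as follows. This is a minimal illustration rather than the authors' setup: `pick_device`, `chunks`, and `classify_in_batches` are hypothetical helper names, and the batch size of 32 is an arbitrary assumption. The `device` argument of `pipeline` accepts `-1` for CPU and a non-negative integer for a CUDA device index.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence


def pick_device() -> int:
    """Return 0 (first CUDA GPU) if PyTorch can see one, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # No PyTorch available, so fall back to the CPU device index.
        return -1
    return 0 if torch.cuda.is_available() else -1


def chunks(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def classify_in_batches(
    nlp: Callable[[List[str]], List[Dict[str, Any]]],
    texts: Sequence[str],
    batch_size: int = 32,  # assumed value; tune to the available memory
) -> List[Dict[str, Any]]:
    """Run a classifier callable over `texts` one batch at a time."""
    results: List[Dict[str, Any]] = []
    for batch in chunks(texts, batch_size):
        results.extend(nlp(batch))
    return results


# Intended use with the pipelines built later in this notebook (not run here):
# nlp = pipeline("text-classification", model=finbert, tokenizer=tokenizer,
#                device=pick_device())
# results = classify_in_batches(nlp, sentences)
```

Feeding the pipeline in fixed-size batches keeps memory use bounded on a large dataset, and the same helper works unchanged on CPU.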

In [4]:
from transformers import BertTokenizer, BertForSequenceClassification, pipeline

In [5]:
# originally tested with transformers==4.18.0; the output below shows the version installed here
import transformers
transformers.__version__

'4.32.1'

## Sentiment Analysis
Analyzing the sentiment of financial text is valuable because it can reveal the views and opinions of managers, information intermediaries and investors. FinBERT-Sentiment is a FinBERT model fine-tuned on 10,000 manually annotated sentences from analyst reports of S&P 500 firms.

**Input**: A financial text.

**Output**: Positive, Neutral or Negative.

In [6]:
finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone',num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')

ImportError: 
BertForSequenceClassification requires the PyTorch library but it was not found in your environment. Checkout the instructions on the
installation page: https://pytorch.org/get-started/locally/ and follow the ones that match your environment.
Please note that you may need to restart your runtime after installation.
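
The `ImportError` above indicates that `transformers` is installed but no PyTorch backend was found. One way to confirm this before loading a model is to check which packages are importable. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper name.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["torch", "transformers"])
if missing:
    print("Install before continuing:", ", ".join(missing))
else:
    print("PyTorch and transformers are both importable.")
```

After installing a missing package, restart the runtime so that the new installation is picked up, as the error message suggests.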


In [4]:
nlp = pipeline("text-classification", model=finbert, tokenizer=tokenizer)
results = nlp(['growth is strong and we have plenty of liquidity.', 
               'there is a shortage of capital, and we need extra financing.',
               'formulation patents might protect Vasotec to a limited extent.'])

In [5]:
results

[{'label': 'Positive', 'score': 1.0},
 {'label': 'Negative', 'score': 0.9952379465103149},
 {'label': 'Neutral', 'score': 0.9979718327522278}]
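
The pipeline returns one dictionary per input, in input order, so each prediction can be paired back with its sentence. The sketch below hard-codes the output shown above instead of calling the model, so it runs without downloading anything; `pair_predictions` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Sequence, Tuple

sentences = [
    'growth is strong and we have plenty of liquidity.',
    'there is a shortage of capital, and we need extra financing.',
    'formulation patents might protect Vasotec to a limited extent.',
]
# Copied from the notebook output above rather than recomputed.
results = [
    {'label': 'Positive', 'score': 1.0},
    {'label': 'Negative', 'score': 0.9952379465103149},
    {'label': 'Neutral', 'score': 0.9979718327522278},
]


def pair_predictions(
    texts: Sequence[str], predictions: Sequence[Dict[str, Any]]
) -> List[Tuple[str, str, float]]:
    """Return (text, label, score) triples, one per input sentence."""
    if len(texts) != len(predictions):
        raise ValueError("expected exactly one prediction per input text")
    return [(t, p['label'], p['score']) for t, p in zip(texts, predictions)]


for text, label, score in pair_predictions(sentences, results):
    print(f"{label:<8} {score:.4f}  {text}")
```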

## ESG-Classification
ESG analysis can help investors determine a business' long-term sustainability and identify associated risks. FinBERT-ESG is a FinBERT model fine-tuned on 2,000 manually annotated sentences from firms' ESG reports and annual reports.

**Input**: A financial text.

**Output**: Environmental, Social, Governance or None.

In [6]:
finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-esg',num_labels=4)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-esg')

In [7]:
nlp = pipeline("text-classification", model=finbert, tokenizer=tokenizer)
results = nlp(['Managing and working to mitigate the impact our operations have on the environment is a core element of our business.',
               'Rhonda has been volunteering for several years for a variety of charitable community programs.',
               'Cabot\'s annual statements are audited annually by an independent registered public accounting firm.',
               'As of December 31, 2012, the 2011 Term Loan had a principal balance of $492.5 million.'])

In [8]:
results

[{'label': 'Environmental', 'score': 0.9805498719215393},
 {'label': 'Social', 'score': 0.9906041026115417},
 {'label': 'Governance', 'score': 0.6738429069519043},
 {'label': 'None', 'score': 0.9960240125656128}]
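
The Governance prediction above has a noticeably lower score (about 0.67) than the other three. One possible way to handle such cases is to send low-confidence predictions to manual review. In the sketch below, the threshold of 0.9 and the helper name `low_confidence_indices` are illustrative assumptions, not part of FinBERT-ESG.

```python
from typing import Any, Dict, List, Sequence

# Copied from the notebook output above rather than recomputed.
results = [
    {'label': 'Environmental', 'score': 0.9805498719215393},
    {'label': 'Social', 'score': 0.9906041026115417},
    {'label': 'Governance', 'score': 0.6738429069519043},
    {'label': 'None', 'score': 0.9960240125656128},
]


def low_confidence_indices(
    predictions: Sequence[Dict[str, Any]], threshold: float = 0.9
) -> List[int]:
    """Return positions of predictions whose score is below `threshold`."""
    return [i for i, p in enumerate(predictions) if p['score'] < threshold]


for i in low_confidence_indices(results):
    print(f"review input {i}: {results[i]['label']} ({results[i]['score']:.2f})")
```

A suitable threshold depends on the application and would need to be chosen on held-out data.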

## FLS-Classification
Forward-looking statements (FLS) inform investors of managers' beliefs and opinions about their firm's future events or results. Identifying forward-looking statements in corporate reports can assist investors in financial analysis. FinBERT-FLS is a FinBERT model fine-tuned on 3,500 manually annotated sentences from the Management Discussion and Analysis sections of annual reports of Russell 3000 firms.

**Input**: A financial text.

**Output**: Specific FLS, Non-specific FLS, or Not FLS.

In [9]:
finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-fls',num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-fls')

In [10]:
nlp = pipeline("text-classification", model=finbert, tokenizer=tokenizer)
results = nlp(['we expect the age of our fleet to enhance availability and reliability due to reduced downtime for repairs.',
               'on an equivalent unit of production basis, general and administrative expenses declined 24 percent from 1994 to $.67 per boe.',
               'we will continue to assess the need for a valuation allowance against deferred tax assets considering all available evidence obtained in future reporting periods.'])

In [11]:
results

[{'label': 'Specific FLS', 'score': 0.77278733253479},
 {'label': 'Not FLS', 'score': 0.9905241131782532},
 {'label': 'Non-specific FLS', 'score': 0.975904107093811}]
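
A typical downstream step is to keep only the forward-looking sentences. The sketch below reuses the labels from the output above instead of calling the model. `forward_looking` is a hypothetical helper, and treating both `Specific FLS` and `Non-specific FLS` as forward-looking is an assumption about how the results would be used.

```python
from typing import Any, Dict, List, Sequence

NOT_FLS = 'Not FLS'  # label string as it appears in the pipeline output above

sentences = [
    'we expect the age of our fleet to enhance availability and reliability due to reduced downtime for repairs.',
    'on an equivalent unit of production basis, general and administrative expenses declined 24 percent from 1994 to $.67 per boe.',
    'we will continue to assess the need for a valuation allowance against deferred tax assets considering all available evidence obtained in future reporting periods.',
]
# Copied from the notebook output above rather than recomputed.
results = [
    {'label': 'Specific FLS', 'score': 0.77278733253479},
    {'label': 'Not FLS', 'score': 0.9905241131782532},
    {'label': 'Non-specific FLS', 'score': 0.975904107093811},
]


def forward_looking(
    texts: Sequence[str], predictions: Sequence[Dict[str, Any]]
) -> List[str]:
    """Return the texts whose predicted label is anything other than 'Not FLS'."""
    return [t for t, p in zip(texts, predictions) if p['label'] != NOT_FLS]


for text in forward_looking(sentences, results):
    print(text)
```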