# Introduction

This notebook experiments with sentiment analysis, an NLP text classification task.

The primary model is [ProsusAI/finbert](https://huggingface.co/ProsusAI/finbert) from **Hugging Face**, which is specialized in financial text.

In [19]:
# Import Libraries
import flair
from transformers import BertTokenizer
from transformers import AutoModelForSequenceClassification

import torch.nn.functional as F

# Read Data

In [3]:
# Define input sentence
input_text = 'I like you!'
input_text_2 = "I hate it when I'm not learning"

In [4]:
# Investing Subreddit post
input_text_finbert_1 = ("Given the recent downturn in stocks especially in tech which is likely to persist as yields keep going up, "
                        "I thought it would be prudent to share the risks of investing in ARK ETFs, written up very nicely by "
                        "[The Bear Cave](https://thebearcave.substack.com/p/special-edition-will-ark-invest-blow). The risks comes "
                        "primarily from ARK's illiquid and very large holdings in small cap companies. ARK is forced to sell its "
                        "holdings whenever its liquid ETF gets hit with outflows as is especially the case in market downturns. "
                        "This could force very painful liquidations at unfavorable prices and the ensuing crash goes into a "
                        "positive feedback loop leading into a death spiral enticing even more outflows and predatory shorts.")

# Data Preparation

## Tokenization

In [5]:
# Instantiate a BERT tokenizer
# NOTE: A Transformer model needs the tokenizer that matches its pretrained checkpoint to prepare the input data
bert_tokenizer = BertTokenizer.from_pretrained('ProsusAI/finbert')

In [6]:
# Tokenize the input through Flair
input_text_tokens = flair.data.Sentence(input_text)
input_text_2_tokens = flair.data.Sentence(input_text_2)

In [7]:
input_text_tokens.to_tokenized_string()

'I like you !'

In [13]:
# Tokenize the Finbert input
tokens_finbert_1 = bert_tokenizer.encode_plus(input_text_finbert_1,
                                              max_length=512, # Maximum sequence length in tokens
                                              truncation=True, # Truncate sequences longer than max_length
                                              padding='max_length', # Pad shorter sequences up to max_length
                                              add_special_tokens=True, # CLS = 101, SEP = 102, MASK = 103, UNK = 100, PAD = 0
                                              return_tensors='pt') # Return PyTorch tensors
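
To make the effect of `max_length`, `truncation`, and `padding` concrete, here is a minimal pure-Python sketch of how a list of token IDs could be shaped into a fixed-length input. It is an illustration under stated assumptions, not the tokenizer's actual implementation: the helper `pad_or_truncate` and the example token IDs are hypothetical, while the special token IDs (CLS = 101, SEP = 102, PAD = 0) are the ones noted in the cell above.

```python
# Special token IDs, as listed in the tokenizer call above
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def pad_or_truncate(token_ids, max_length):
    """Hypothetical helper: build a fixed-length input from raw token IDs."""
    # Reserve two positions for [CLS] and [SEP], then cut off anything longer
    body = token_ids[:max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]

    # The attention mask marks real tokens with 1 and padding with 0
    attention_mask = [1] * len(input_ids)

    # Pad on the right until the sequence reaches max_length
    n_pad = max_length - len(input_ids)
    input_ids += [PAD_ID] * n_pad
    attention_mask += [0] * n_pad
    return input_ids, attention_mask


# Made-up token IDs, for illustration only
short_ids, short_mask = pad_or_truncate([2000, 2001, 2002], max_length=8)
long_ids, long_mask = pad_or_truncate(list(range(2000, 2020)), max_length=8)

print(short_ids)   # [101, 2000, 2001, 2002, 102, 0, 0, 0]
print(short_mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
print(long_ids)    # [101, 2000, 2001, 2002, 2003, 2004, 2005, 102]
```

The short sequence is padded on the right and its mask ends in zeros, while the long sequence is cut so that it still ends with the SEP token. With `max_length=512`, the Reddit post is likely well under the limit, so much of `tokens_finbert_1['input_ids']` would be padding.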

# Model Definition

In [8]:
# Initialise the Flair sentiment model (DistilBERT-based)
model_flair = flair.models.TextClassifier.load('en-sentiment')

# Initialise the Finbert model
model_finbert = AutoModelForSequenceClassification.from_pretrained('ProsusAI/finbert')

# Model Inference

In [9]:
# Predict the sentiment (the predicted labels are attached to each Sentence in place)
model_flair.predict(input_text_tokens)
model_flair.predict(input_text_2_tokens)

In [12]:
# Extract rating
print('Sentiment: {}'.format(input_text_2_tokens.get_labels()[0].value))
print('Score: {}'.format(input_text_2_tokens.get_labels()[0].score))

Sentiment: NEGATIVE
Score: 0.9991464614868164


In [17]:
# Finbert Inference - Compute the logits (raw, unnormalized class scores)
output_finbert_1 = model_finbert(**tokens_finbert_1)

In [18]:
# To turn the logits into probabilities, we need to pass them through an activation function (softmax)
output_finbert_1

SequenceClassifierOutput(loss=None, logits=tensor([[-1.8200,  2.4484,  0.0216]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [20]:
# Apply the softmax function along the last dimension (the class dimension) of the logits tensor
F.softmax(output_finbert_1[0], dim=-1)

tensor([[0.0127, 0.9072, 0.0801]], grad_fn=<SoftmaxBackward0>)
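
The probabilities are easier to read once each one is paired with its class label. The sketch below recomputes the softmax in plain Python from the logits printed above and attaches labels. The label order (`positive`, `negative`, `neutral`) is an assumption about this checkpoint; confirm it against `model_finbert.config.id2label` before relying on it.

```python
import math

# Logits copied from the FinBERT output above
logits = [-1.8200, 2.4484, 0.0216]

# ASSUMPTION: label order for ProsusAI/finbert; verify with model_finbert.config.id2label
id2label = {0: 'positive', 1: 'negative', 2: 'neutral'}


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)

# Index of the class with the highest probability
predicted_id = max(range(len(probs)), key=lambda i: probs[i])

for i, p in enumerate(probs):
    print('{}: {:.4f}'.format(id2label[i], p))
print('Predicted: {}'.format(id2label[predicted_id]))
```

Under that assumed mapping, the Reddit post would be classified as negative with a probability of about 0.91, which is in line with its bearish content.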