# Sentiment Analysis with Flair

## Install Flair

In [None]:
# !pip install flair

## Connect to Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# display and HTML live in IPython.display; importing them from IPython.core.display is deprecated
from IPython.display import display, HTML
display(HTML('<style> .container {width:90% !important} </style>'))

import warnings
warnings.filterwarnings('ignore')

import flair
import pandas as pd
import numpy as np
import copy
import re

# Import the data
    - The data comes from an Analytics Vidhya competition.
    - The data includes a target variable (`label`). Since we are not training a model, we do not need it to make predictions.
    - We can, however, use it to check how well Flair's pretrained sentiment model performs.

    - The data can be obtained from the competition page: https://datahack.analyticsvidhya.com/contest/linguipedia-codefest-natural-language-processing-1/#ProblemStatement

In [None]:
df = pd.read_csv('/content/drive/MyDrive/Flair Sentiment Analysis/dataset/train.csv')
print('Shape of the dataframe:', df.shape)
print('Columns:', df.columns)

Shape of the dataframe: (7920, 3)
Columns: Index(['id', 'label', 'tweet'], dtype='object')


In [None]:
print('Head of the dataframe:')
df.head()

Head of the dataframe:


| | id | label | tweet |
|---|---|---|---|
| 0 | 1 | 0 | #fingerprint #Pregnancy Test https://goo.gl/h1... |
| 1 | 2 | 0 | Finally a transparant silicon case ^^ Thanks t... |
| 2 | 3 | 0 | We love this! Would you go? #talk #makememorie... |
| 3 | 4 | 0 | I'm wired I know I'm George I was made that wa... |
| 4 | 5 | 1 | What amazing service! Apple won't even talk to... |


# Under label:
    - 0 - Positive (not negative)
    - 1 - Negative

In [None]:
print('Checking if any null values are there or not...')
df.info()

Checking if any null values are there or not...
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7920 entries, 0 to 7919
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   id      7920 non-null   int64 
 1   label   7920 non-null   int64 
 2   tweet   7920 non-null   object
dtypes: int64(2), object(1)
memory usage: 185.8+ KB


### Observation
    - No null values. Good to go.

### Note: This notebook does no deep analysis or deep cleaning. It only runs basic checks (shape, null counts) and basic cleaning (removing punctuation and digits), then applies the Flair model directly.

In [None]:
# replace every character that is not a letter (punctuation, digits, symbols) with a space
df['clean_tweet'] = df['tweet'].apply(lambda x: re.sub('[^a-zA-Z]', ' ', x))

# collapse repeated spaces and strip leading/trailing whitespace
df['clean_tweet'] = df['clean_tweet'].apply(lambda x: re.sub(' +', ' ', x).strip())
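
To see what this cleaning does to a single tweet, here is a minimal standalone sketch. The helper name `clean_tweet` and the sample text are made up for illustration; the two regular expressions are the ones used in the cell above, followed by the strip that the comment mentions.

```python
import re

def clean_tweet(text):
    """Replace non-letters with spaces, collapse repeated spaces, strip the ends."""
    letters_only = re.sub(r'[^a-zA-Z]', ' ', text)
    return re.sub(r' +', ' ', letters_only).strip()

# Hypothetical sample tweet, not taken from the dataset
sample = "What amazing service! Apple won't even talk to me :( #fail 2day"
cleaned = clean_tweet(sample)
print(cleaned)
# What amazing service Apple won t even talk to me fail day
```

Note that contractions are split ("won't" becomes "won t") and emoticons and digits disappear, so some sentiment cues may be lost before the text reaches the model.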

### Initialize the flair model

In [None]:
flair_model = flair.models.TextClassifier.load('en-sentiment')
print('Flair Model Loaded...')

2021-11-24 11:11:32,520 https://nlp.informatik.hu-berlin.de/resources/models/sentiment-curated-distilbert/sentiment-en-mix-distillbert_4.pt not found in cache, downloading to /tmp/tmpz3ghyqzy


100%|██████████| 265512723/265512723 [00:30<00:00, 8701741.06B/s] 

2021-11-24 11:12:03,544 copying /tmp/tmpz3ghyqzy to cache at /root/.flair/models/sentiment-en-mix-distillbert_4.pt





2021-11-24 11:12:04,046 removing temp file /tmp/tmpz3ghyqzy
2021-11-24 11:12:04,083 loading file /root/.flair/models/sentiment-en-mix-distillbert_4.pt


Flair Model Loaded...


### Writing a function that will get the sentiment label and score for each tweet

In [None]:
def sentiment_analysis(tweet_col, flair_model):
    # tweet_col is the text of a single tweet

    # wrap the text in a Flair Sentence, which tokenizes it
    sentence = flair.data.Sentence(tweet_col)

    # predict the sentiment of the sentence
    # The prediction is stored on the Sentence object itself,
    # as a label holding a value (POSITIVE/NEGATIVE) and a confidence score
    flair_model.predict(sentence)

    top_label = sentence.get_labels()[0]

    return (top_label.value, top_label.score)

## Calling sentiment_analysis

In [None]:
%%time

df['labels_scores'] = df.apply(lambda x: sentiment_analysis(x['clean_tweet'], flair_model), axis=1)

CPU times: user 8min 31s, sys: 2.19 s, total: 8min 34s
Wall time: 8min 32s
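
Predicting one tweet at a time took about 8.5 minutes here. Flair's `predict` also accepts a list of sentences and a `mini_batch_size` argument, so one option is to predict in batches. The sketch below is an assumption, not something this notebook ran: `predict_sentiments` and its `make_sentence` parameter are hypothetical names, and any speed-up is untested. It also returns `(None, None)` when a sentence ends up with no label, for example if the cleaned text is empty.

```python
def predict_sentiments(texts, model, make_sentence, batch_size=32):
    """Return one (label, score) tuple per text, predicting batch_size texts at a time.

    make_sentence is whatever wraps raw text for the model
    (with Flair this would be flair.data.Sentence).
    """
    results = []
    for start in range(0, len(texts), batch_size):
        # Wrap this slice of texts and let the model label all of them in one call
        batch = [make_sentence(text) for text in texts[start:start + batch_size]]
        model.predict(batch, mini_batch_size=batch_size)
        for sentence in batch:
            labels = sentence.get_labels()
            if labels:
                results.append((labels[0].value, labels[0].score))
            else:
                # No prediction was attached to this sentence
                results.append((None, None))
    return results

# Possible use in this notebook (assumes flair and flair_model from the cells above):
# df['labels_scores'] = predict_sentiments(
#     df['clean_tweet'].tolist(), flair_model, flair.data.Sentence)
```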


In [None]:
df.head()

| | id | label | tweet | clean_tweet | labels_scores |
|---|---|---|---|---|---|
| 0 | 1 | 0 | #fingerprint #Pregnancy Test https://goo.gl/h1... | fingerprint Pregnancy Test https goo gl h MfQ... | (POSITIVE, 0.5724270939826965) |
| 1 | 2 | 0 | Finally a transparant silicon case ^^ Thanks t... | Finally a transparant silicon case Thanks to m... | (POSITIVE, 0.9993873834609985) |
| 2 | 3 | 0 | We love this! Would you go? #talk #makememorie... | We love this Would you go talk makememories un... | (POSITIVE, 0.9769493937492371) |
| 3 | 4 | 0 | I'm wired I know I'm George I was made that wa... | I m wired I know I m George I was made that wa... | (POSITIVE, 0.7664701342582703) |
| 4 | 5 | 1 | What amazing service! Apple won't even talk to... | What amazing service Apple won t even talk to ... | (POSITIVE, 0.5792829394340515) |


In [None]:
# separating labels and scores
df['predicted_label_name'] = df['labels_scores'].apply(lambda x: x[0].title())
df['predicted_scores'] = df['labels_scores'].apply(lambda x: np.round(x[1],4))

In [None]:
# converting Positive to 0 and Negative to 1 to match the dataset's label encoding
label_dic = {'Positive':0, 'Negative':1}
df['predicted_label_value'] = df['predicted_label_name'].map(label_dic)

df.head()

| | id | label | tweet | clean_tweet | labels_scores | predicted_label_name | predicted_scores | predicted_label_value |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | #fingerprint #Pregnancy Test https://goo.gl/h1... | fingerprint Pregnancy Test https goo gl h MfQ... | (POSITIVE, 0.5724270939826965) | Positive | 0.5724 | 0 |
| 1 | 2 | 0 | Finally a transparant silicon case ^^ Thanks t... | Finally a transparant silicon case Thanks to m... | (POSITIVE, 0.9993873834609985) | Positive | 0.9994 | 0 |
| 2 | 3 | 0 | We love this! Would you go? #talk #makememorie... | We love this Would you go talk makememories un... | (POSITIVE, 0.9769493937492371) | Positive | 0.9769 | 0 |
| 3 | 4 | 0 | I'm wired I know I'm George I was made that wa... | I m wired I know I m George I was made that wa... | (POSITIVE, 0.7664701342582703) | Positive | 0.7665 | 0 |
| 4 | 5 | 1 | What amazing service! Apple won't even talk to... | What amazing service Apple won t even talk to ... | (POSITIVE, 0.5792829394340515) | Positive | 0.5793 | 0 |
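
One thing to keep in mind with the dictionary mapping: pandas `map` produces `NaN` for any value that is not a key in the dictionary, so an unexpected label name would pass silently. A stricter variant could look like the sketch below. The names `LABEL_TO_VALUE` and `to_dataset_label` are hypothetical, and the mapping is the same one used above.

```python
# Same encoding as label_dic above: the dataset uses 0 for positive and 1 for negative
LABEL_TO_VALUE = {'Positive': 0, 'Negative': 1}

def to_dataset_label(flair_label):
    """Convert a Flair label such as 'POSITIVE' to the dataset's 0/1 encoding."""
    name = flair_label.title()  # 'POSITIVE' -> 'Positive'
    if name not in LABEL_TO_VALUE:
        # Fail loudly instead of silently producing a missing value
        raise ValueError(f'Unexpected sentiment label: {flair_label!r}')
    return LABEL_TO_VALUE[name]

print(to_dataset_label('POSITIVE'), to_dataset_label('NEGATIVE'))
# 0 1
```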


## Validating the predicted labels against the actual labels with the weighted F1 score

In [None]:
from sklearn.metrics import f1_score

print('Weighted F1 Score:', f1_score(df['label'], df['predicted_label_value'], average='weighted'))

Weighted F1 Score: 0.691450951281932
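
For readers unfamiliar with the metric: the weighted F1 score computes an F1 score per class and averages them, weighting each class by its number of true examples. The pure-Python sketch below illustrates that calculation on a tiny made-up example. The function `weighted_f1` is written for illustration only and is not part of scikit-learn.

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for cls in sorted(set(y_true)):
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
        support = tp + fn                      # number of true examples of this class
        score += f1 * support / total
    return score

# Made-up labels: class 0 has F1 = 0.8 (support 3), class 1 has F1 = 2/3 (support 1)
example_score = weighted_f1([0, 0, 0, 1], [0, 0, 1, 1])
print(round(example_score, 4))
# 0.7667
```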


## Conclusion
    - A weighted F1 score of about 0.69 (roughly 70%) is not bad, but not great either.
    - Without any training, in just three simple steps, we can get the sentiment of each tweet.