# Background

The objective of this project is to classify the overall sentiment of a tweet as neutral, negative, or positive using NLP classifiers. To complete this project, we are given a dataset of 27,481 tweets, of which 22,464 were captured as having a neutral, negative, or positive sentiment. Our goal is to use this training data of ~27.5k tweets to predict the sentiment of the 3,534 tweets in our testing data set.

# Objective
Run Flair's pretrained `en-sentiment` classifier on the testing set and measure how accurately it predicts the labeled sentiment.

## Import Libraries

In [1]:
import re

import numpy as np
import pandas as pd
from flair.data import Sentence
from flair.models import SequenceTagger, TextClassifier
from sklearn.metrics import accuracy_score

# Load Flair's pretrained English sentiment classifier
classifier = TextClassifier.load('en-sentiment')


2022-06-07 17:11:08,193 loading file C:\Users\valmh\.flair\models\sentiment-en-mix-distillbert_4.pt


## Read Testing Dataset

In [2]:
test = pd.read_csv("C:/Users/valmh/Documents/GitHub/ENTITY-Final-Project/Data/test.csv")
test.head()

Unnamed: 0,textID,text,sentiment
0,f87dea47db,Last session of the day http://twitpic.com/67ezh,neutral
1,96d74cb729,Shanghai is also really exciting (precisely -...,positive
2,eee518ae67,"Recession hit Veronique Branquinho, she has to...",negative
3,01082688c6,happy bday!,positive
4,33987a8ee5,http://twitpic.com/4w75p - I like it!!,positive


In [3]:
def removepunct(text):
    text = re.sub(r'[^\w\s]', '', text)
    return text

In [4]:
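# Keep only the needed columns; .copy() avoids pandas' SettingWithCopyWarning on the assignments below
test = test[['text', 'sentiment']].copy()
test['text'] = test['text'].astype(str).str.lower()
test['text_clean'] = test['text'].apply(removepunct)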
test.head()

Unnamed: 0,text,sentiment,text_clean
0,last session of the day http://twitpic.com/67ezh,neutral,last session of the day httptwitpiccom67ezh
1,shanghai is also really exciting (precisely -...,positive,shanghai is also really exciting precisely s...
2,"recession hit veronique branquinho, she has to...",negative,recession hit veronique branquinho she has to ...
3,happy bday!,positive,happy bday
4,http://twitpic.com/4w75p - i like it!!,positive,httptwitpiccom4w75p i like it
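
The cleaned column above shows that removing punctuation fuses each URL into a single token such as `httptwitpiccom67ezh`. A minimal sketch of one way to avoid this, assuming it is acceptable to drop links entirely. The helper name `remove_urls_and_punct` and the URL pattern are illustrative assumptions, not part of the original notebook:

```python
import re

# Hypothetical pattern: anything starting with http:// or https:// up to the next space
URL_PATTERN = re.compile(r'https?://\S+')

def remove_urls_and_punct(text):
    text = URL_PATTERN.sub('', text)           # drop links before they get fused
    text = re.sub(r'[^\w\s]', '', text)        # same punctuation rule as removepunct
    return re.sub(r'\s+', ' ', text).strip()   # collapse leftover whitespace

print(remove_urls_and_punct('last session of the day http://twitpic.com/67ezh'))
print(remove_urls_and_punct('http://twitpic.com/4w75p - i like it!!'))
```

Whether dropping links helps or hurts the classifier is an open question here; the sketch only keeps the fused tokens out of the text.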


In [5]:
test.isna().sum()

text          0
sentiment     0
text_clean    0
dtype: int64

In [6]:
test = test.reset_index()

In [7]:
test.dropna(inplace=True)

In [8]:
# Example text
text = 'GrabNGoInfo.com is a great machine learning tutorial website.'
# Flair tokenization
sentence = Sentence(text)
sentence

Sentence: "GrabNGoInfo.com is a great machine learning tutorial website ."

In [9]:
test.head()

Unnamed: 0,index,text,sentiment,text_clean
0,0,last session of the day http://twitpic.com/67ezh,neutral,last session of the day httptwitpiccom67ezh
1,1,shanghai is also really exciting (precisely -...,positive,shanghai is also really exciting precisely s...
2,2,"recession hit veronique branquinho, she has to...",negative,recession hit veronique branquinho she has to ...
3,3,happy bday!,positive,happy bday
4,4,http://twitpic.com/4w75p - i like it!!,positive,httptwitpiccom4w75p i like it


In [10]:
# Flair sentiment prediction
classifier.predict(sentence)
sentence

Sentence: "GrabNGoInfo.com is a great machine learning tutorial website ." → POSITIVE (0.9895)

In [11]:
# Extract sentiment prediction score
print(f'Flair classified the review as {sentence.labels[0].value} with the score of {sentence.labels[0].score:.2f}')

Flair classified the review as POSITIVE with the score of 0.99


In [12]:
# Define a function that returns Flair's confidence score and predicted label
def score_flair(text):
    sentence = Sentence(text)
    classifier.predict(sentence)
    label = sentence.labels[0]
    return label.score, label.value

# Run the classifier once per tweet and keep both outputs
results = test['text_clean'].apply(score_flair)

# Confidence score of the predicted label for each tweet
test['scores_flair'] = [score for score, _ in results]

# Predicted sentiment label for each tweet
test['pred_flair'] = [label for _, label in results]

# Check the distribution of the score
test['scores_flair'].describe()

count    3534.000000
mean        0.931787
std         0.116108
min         0.500121
25%         0.925629
50%         0.990121
75%         0.998368
max         0.999999
Name: scores_flair, dtype: float64

In [13]:
# Check the counts of labels
test['pred_flair'].value_counts()

NEGATIVE    1768
POSITIVE    1766
Name: pred_flair, dtype: int64
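
The counts above contain only NEGATIVE and POSITIVE, while the dataset also has a neutral class. One possible workaround, sketched here as an assumption rather than the notebook's method, is to treat low-confidence predictions as neutral. The function name `to_three_way`, the 0.9 cutoff, and the third example pair are hypothetical; the cutoff would need tuning on labeled data:

```python
def to_three_way(label, score, threshold=0.9):
    """Map a Flair (label, score) pair onto the dataset's three classes."""
    if score < threshold:
        return 'neutral'      # assumption: low confidence is read as neutral
    return label.lower()      # 'NEGATIVE' -> 'negative', 'POSITIVE' -> 'positive'

# The last pair is made up for illustration
examples = [('NEGATIVE', 0.999538), ('POSITIVE', 0.958464), ('POSITIVE', 0.62)]
for label, score in examples:
    print(label, score, '->', to_three_way(label, score))
```

A score cutoff alone will not recover every neutral tweet: the first test tweet is labeled neutral yet Flair scores it NEGATIVE with 0.999538.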

In [14]:
test.head()

Unnamed: 0,index,text,sentiment,text_clean,scores_flair,pred_flair
0,0,last session of the day http://twitpic.com/67ezh,neutral,last session of the day httptwitpiccom67ezh,0.999538,NEGATIVE
1,1,shanghai is also really exciting (precisely -...,positive,shanghai is also really exciting precisely s...,0.997785,POSITIVE
2,2,"recession hit veronique branquinho, she has to...",negative,recession hit veronique branquinho she has to ...,0.999957,NEGATIVE
3,3,happy bday!,positive,happy bday,0.995238,POSITIVE
4,4,http://twitpic.com/4w75p - i like it!!,positive,httptwitpiccom4w75p i like it,0.958464,POSITIVE


In [15]:
# Change the label of flair prediction to 0 if negative and 1 if positive
mapping = {'NEGATIVE': 0, 'POSITIVE': 1}
test['pred_flair'] = test['pred_flair'].map(mapping)

In [16]:
# Check counts
test['pred_flair'].value_counts()

0    1768
1    1766
Name: pred_flair, dtype: int64

In [17]:
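# Compare actual and predicted labels.
# Note: `sentiment` holds strings ('neutral', 'negative', 'positive') while
# `pred_flair` now holds 0/1, so no pair can match and the score is 0.0.
accuracy_score(test['sentiment'], test['pred_flair'])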

0.0
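
A like-for-like comparison needs both columns in the same label space. A minimal sketch, assuming the string labels are encoded with the same 0/1 coding as the Flair mapping and that neutral tweets, which never appear among the predictions, are left out. The helper `binary_accuracy` is illustrative and not part of the original notebook:

```python
def binary_accuracy(actual, predicted):
    """Accuracy over rows whose actual label is negative or positive."""
    encode = {'negative': 0, 'positive': 1}   # same coding as the Flair mapping
    pairs = [(encode[a], p) for a, p in zip(actual, predicted) if a in encode]
    if not pairs:
        return float('nan')                   # nothing comparable
    return sum(a == p for a, p in pairs) / len(pairs)

# First five test rows: actual sentiment and mapped Flair prediction
actual = ['neutral', 'positive', 'negative', 'positive', 'positive']
predicted = [0, 1, 0, 1, 1]
print(binary_accuracy(actual, predicted))
```

On these five rows the four non-neutral tweets all match, but that says nothing about the full test set. Excluding neutral tweets also means the figure is not comparable with a three-class accuracy.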

In [18]:
test.head()

Unnamed: 0,index,text,sentiment,text_clean,scores_flair,pred_flair
0,0,last session of the day http://twitpic.com/67ezh,neutral,last session of the day httptwitpiccom67ezh,0.999538,0
1,1,shanghai is also really exciting (precisely -...,positive,shanghai is also really exciting precisely s...,0.997785,1
2,2,"recession hit veronique branquinho, she has to...",negative,recession hit veronique branquinho she has to ...,0.999957,0
3,3,happy bday!,positive,happy bday,0.995238,1
4,4,http://twitpic.com/4w75p - i like it!!,positive,httptwitpiccom4w75p i like it,0.958464,1
