## **Introduction**
A comparative analysis of ML approaches to emotion recognition in Twitter reviews

### **Team**
Machine Learning Mini Project - Semester VI

Third Year Computer Engineering - B2
1. Devang Shah - 60004200158
2. Ayush Parikh - 60004200162
3. Yash Shah - 60004200146


### **Base paper**
Comparison of Machine Learning Techniques on Twitter Emotions Classification: https://link.springer.com/article/10.1007/s42979-021-00889-x


### **Dataset**
The dataset used consists of 10 thousand Twitter user reviews (input text) and Emotions (output labels).

The features present in the dataset are: Sl no, Tweets, Search key, Feeling.

Link: https://www.kaggle.com/datasets/shainy/twitter-reviews-for-emotion-analysis


### **Emotion**
Six types of emotions were considered in this study:
1. Happy
2. Sad
3. Fear
4. Anger
5. Disgust
6. Surprise


### **Models**
Five different models were used to identify emotions in tweets:
1. Decision Tree Classifier
2. Logistic Regression
3. Support Vector Machine
4. k-Nearest Neighbors
5. Naive Bayes

# **Importing Data and Libraries**

In [None]:
%%capture
import numpy as np 
import pandas as pd 
import nltk
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score, roc_curve
from sklearn.feature_extraction.text import CountVectorizer
from nltk.tokenize import RegexpTokenizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
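# Show full tweet text; None means no truncation (-1 is rejected by recent pandas versions)
pd.set_option('display.max_colwidth', None)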

# **Learning about the data**

In [None]:
dataset=pd.read_csv("/kaggle/input/twitter-reviews-for-emotion-analysis/data.csv")
display(dataset.tail())

Let's take a quick look at the emotion dataset that is imported.

# **Dataset Summary**

In [None]:
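# display() is needed here because only the last expression of a cell is shown automatically
display(dataset.describe(include='all'))
# Character length of each tweet
dataset['length'] = dataset['Tweets'].apply(len)
display(dataset.head())
# Histogram of tweet length, one panel per emotion label
graph = sns.FacetGrid(data=dataset,col='Feeling')
graph.map(plt.hist,'length',bins=50,color='Purple')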

**GETTING THE MEAN VALUES OF THE NUMERIC COLUMNS FOR EACH FEELING**

In [None]:
# numeric_only=True skips the text columns, which cannot be averaged
val = dataset.groupby('Feeling').mean(numeric_only=True)
val


**FINDING THE CORRELATION BETWEEN THE PER-FEELING MEANS OF THE NUMERIC COLUMNS**

In [None]:
val.corr()

In [None]:
dataset.Feeling.value_counts()


**PLOT THE DATASET**

In [None]:
Sentiment_val=dataset.groupby('Feeling').count()
plt.bar(Sentiment_val.index.values, Sentiment_val['Tweets'])
plt.xlabel('Emotion')
plt.ylabel('Number of Tweets')
plt.show()

**Tokenizer to remove unwanted elements from our data, such as symbols and punctuation**

In [None]:
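# Keep only alphanumeric tokens (letters and digits); symbols and punctuation are dropped
token = RegexpTokenizer(r'[a-zA-Z0-9]+')
cv = CountVectorizer(lowercase=True,stop_words='english',ngram_range = (1,1),tokenizer = token.tokenize)
# Bag-of-words counts (not used by the models below)
text_counts= cv.fit_transform(dataset['Tweets'])
# TF-IDF features built with the default tokenizer; these are the model inputs
tf=TfidfVectorizer()
text_tf= tf.fit_transform(dataset['Tweets'])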
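
To make the TF-IDF step more concrete, here is a minimal sketch on a made-up three-sentence corpus. The names `toy_tweets`, `toy_tf` and `toy_matrix` are illustrative and do not come from the dataset. It assumes the default `TfidfVectorizer` settings, which ignore single-character tokens and L2-normalise each row.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, invented for illustration only (not taken from the dataset)
toy_tweets = [
    "I am so happy today",
    "I am so sad today",
    "this is scary",
]

toy_tf = TfidfVectorizer()
toy_matrix = toy_tf.fit_transform(toy_tweets)

# One row per tweet, one column per vocabulary term
print(toy_matrix.shape)
print(sorted(toy_tf.vocabulary_))

# With the default norm='l2', every row has unit length
row_norms = np.linalg.norm(toy_matrix.toarray(), axis=1)
print(row_norms)
```

The single-letter token "I" is absent from the vocabulary because the default token pattern only keeps tokens of two or more characters.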

# **Train and Test data**

In [None]:
x=text_tf
# y=dataset['Emotion'].astype('int')
y=dataset['Feeling']
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.3, random_state=1)
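
Note that the split here is applied to features from a TF-IDF vectorizer that was already fitted on every tweet, so the vocabulary and IDF weights are partly learned from test rows. One possible alternative, sketched on invented toy sentences, is to split the raw text first and let a pipeline fit the vectorizer on the training part only. The names `toy_texts`, `toy_labels` and `leak_free` are hypothetical, and `stratify` is an optional addition that keeps the class proportions similar in both parts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Invented toy data, for illustration only
toy_texts = [
    "what a great day", "i love this so much", "feeling joyful and glad",
    "best news ever today", "so happy right now",
    "this is so sad", "i feel terrible and low", "crying all night again",
    "worst day of my life", "so lonely right now",
]
toy_labels = ["happy"] * 5 + ["sad"] * 5

# Split the raw text before any vectorizer has seen it
raw_train, raw_test, lab_train, lab_test = train_test_split(
    toy_texts, toy_labels, test_size=0.3, random_state=1, stratify=toy_labels
)

# The pipeline fits TF-IDF on the training text only, then the classifier
leak_free = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
leak_free.fit(raw_train, lab_train)
toy_pred = leak_free.predict(raw_test)
print(toy_pred)
```

In the notebook, the same pattern would take `dataset['Tweets']` and `dataset['Feeling']` in place of the toy lists.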

# **Implementing Decision Tree Classifier**

In [None]:
dt = DecisionTreeClassifier(random_state=42)
dt.fit(x_train,y_train)
preddt = dt.predict(x_test)
print("Confusion Matrix for Decision Tree:")
print(confusion_matrix(y_test,preddt))
dtt_score = round(accuracy_score(y_test,preddt)*100,2)
print("Score:",dtt_score)
print("Classification Report:")
print(classification_report(y_test,preddt))
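
The raw confusion matrix is printed without row or column names, which makes it hard to tell which emotion is confused with which. A small sketch of one way to label it, using invented `toy_true` and `toy_pred` lists rather than the notebook's predictions:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Invented labels, for illustration only
toy_true = ["happy", "sad", "happy", "fear", "sad", "happy"]
toy_pred = ["happy", "happy", "happy", "fear", "sad", "sad"]

# Fix the label order explicitly so rows and columns are unambiguous
labels = ["fear", "happy", "sad"]
cm = confusion_matrix(toy_true, toy_pred, labels=labels)

# Rows are true labels, columns are predicted labels
cm_table = pd.DataFrame(cm, index=labels, columns=labels)
print(cm_table)
```

For the notebook's own results, `toy_true` and `toy_pred` would be replaced by `y_test` and `preddt`, and `labels` could be taken from `dt.classes_`.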

# **Implementing Logistic Regression**

In [None]:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(random_state=42)
lr.fit(x_train,y_train)
lr_preddt = lr.predict(x_test)
print("Confusion Matrix for Logistic regression:")
print(confusion_matrix(y_test,lr_preddt))

logi_score = round(accuracy_score(y_test,lr_preddt)*100,2)
print("Score:",logi_score)

print("Classification Report:")
print(classification_report(y_test,lr_preddt))

# **Implementing k Nearest Neighbors**

In [None]:
from sklearn.neighbors import KNeighborsClassifier

lir = KNeighborsClassifier(n_neighbors=6,algorithm='brute')
lir.fit(x_train,y_train)
lir_preddt = lir.predict(x_test)
print("Confusion Matrix for kNN:")
print(confusion_matrix(y_test,lir_preddt))

knn_score = round(accuracy_score(y_test,lir_preddt)*100,2)
print("Score:",knn_score)

print("Classification Report:")
print(classification_report(y_test,lir_preddt))

# **Implementing SVM**

In [None]:
from sklearn.svm import SVC
clf = SVC()
clf.fit(x_train,y_train)
pred2 = clf.predict(x_test)
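print("Confusion Matrix for SVM:")
print(confusion_matrix(y_test,pred2))
# Rounded to two decimals, like the other classifiers' scores
svm_score=round(clf.score(x_test,y_test)*100,2)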
print("Score:",svm_score) 
print("Classification Report:")
print(classification_report(y_test,pred2))

# **Implementing Naive Bayes**

In [None]:
from sklearn.naive_bayes import GaussianNB
nb = GaussianNB()
nb.fit(x_train.toarray(),y_train)
pred3 = nb.predict(x_test.toarray())
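print("Confusion Matrix for Naive Bayes:")
print(confusion_matrix(y_test,pred3))
# Rounded to two decimals, like the other classifiers' scores
naiveBayes_score=round(nb.score(x_test.toarray(),y_test)*100,2)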
print("Score:",naiveBayes_score)
print("Classification Report:")
print(classification_report(y_test,pred3))
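
`GaussianNB` needs a dense array, which is why `.toarray()` is called, and that can use a lot of memory for a large vocabulary. A variant that is commonly used for text, shown here only as a sketch and not as the method of this notebook or the base paper, is `MultinomialNB`, which accepts the sparse TF-IDF matrix directly. The toy sentences and the names `toy_x` and `mnb` are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented toy data, for illustration only
toy_texts = [
    "so happy right now", "what a great day",
    "this is so sad", "worst day of my life",
]
toy_labels = ["happy", "happy", "sad", "sad"]

# The TF-IDF output stays sparse; no .toarray() call is needed
toy_x = TfidfVectorizer().fit_transform(toy_texts)

mnb = MultinomialNB()
mnb.fit(toy_x, toy_labels)
toy_pred = mnb.predict(toy_x)
print(toy_pred)
```

Whether this variant scores better on the tweet dataset is not something the sketch shows; it would have to be measured on the same train/test split.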

# **Model Analysis**

In [None]:
scores=[dtt_score,logi_score,knn_score,svm_score,naiveBayes_score]
title=["DecisionTree","Logistic","KNN","SVM","NaiveBayes"]
plt.bar(title,scores, color ='red')
 
plt.xlabel("Classifiers")
plt.ylabel("Accuracy Scores")

plt.show()
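
Alongside the bar chart, a sorted table can make the ranking easier to read. The helper `rank_scores` below is hypothetical, and the demo numbers are obvious placeholders rather than results from this notebook.

```python
import pandas as pd

def rank_scores(names, scores):
    """Return the scores as a Series indexed by classifier name, best first."""
    return pd.Series(scores, index=names, name="accuracy").sort_values(ascending=False)

# Placeholder values only, chosen to be clearly artificial
demo_names = ["ModelA", "ModelB", "ModelC"]
demo_scores = [50.0, 100.0, 0.0]

ranking = rank_scores(demo_names, demo_scores)
print(ranking)
```

In the notebook, the call would be `rank_scores(title, scores)` with the two lists defined in the cell above.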