<a href="https://colab.research.google.com/github/WafaSanaa/Computer_Vision/blob/main/Emotion_classify.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer   # convert text into numeric vectors
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier    # train a classification model
from sklearn.metrics import accuracy_score, classification_report   # evaluate its accuracy

import spacy

**Exploratory Data Analysis (EDA)**

In [4]:
data = pd.read_csv("/content/drive/MyDrive/Emotion_classify_Data.csv")

In [5]:
print(data.shape)          # number of rows and columns
data.head(5)

(5937, 2)


|   | Comment | Emotion |
|---|---|---|
| 0 | i seriously hate one subject to death but now ... | fear |
| 1 | im so full of life i feel appalled | anger |
| 2 | i sit here to write i start to dig out my feel... | fear |
| 3 | ive been really angry with r and i feel like a... | joy |
| 4 | i feel suspicious if there is no one outside l... | fear |


In [6]:
print(data.isnull().sum())  # number of missing values per column

Comment    0
Emotion    0
dtype: int64


In [7]:
data['Emotion'].value_counts() #check the distribution of Emotion

| Emotion | count |
|---|---|
| anger | 2000 |
| joy | 2000 |
| fear | 1937 |
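The counts above suggest the three classes are close to balanced, which makes plain accuracy a reasonable headline metric here. A minimal sketch that turns the printed counts into proportions and a max/min imbalance ratio (on the real DataFrame, `data['Emotion'].value_counts(normalize=True)` returns the proportions directly):

```python
import pandas as pd

# Class counts copied from the value_counts() output above
counts = pd.Series({"anger": 2000, "joy": 2000, "fear": 1937}, name="count")

proportions = counts / counts.sum()
print(proportions.round(3))

# Imbalance ratio: size of the largest class divided by the smallest
print("max/min ratio:", round(counts.max() / counts.min(), 3))
```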


In [8]:
print(f"{data['Comment'][0]} -> {data['Emotion'][0]}") #Show sample

i seriously hate one subject to death but now i feel reluctant to drop it -> fear


In [9]:
print(f"{data['Comment'][2]} -> {data['Emotion'][2]}") #Show sample

i sit here to write i start to dig out my feelings and i think that i am afraid to accept the possibility that he might not make it -> fear


In [10]:
print(f"{data['Comment'][5]} -> {data['Emotion'][5]}") #Show sample

i feel jealous becasue i wanted that kind of love the true connection between two souls and i wanted that -> anger


In [11]:
print(f"{data['Comment'][50]} -> {data['Emotion'][50]}") #Show sample

i feel like i have been a bit obnoxious in my picture posting -> anger


In [12]:
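%pip install spacy  # spacy is already imported above; if en_core_web_sm is missing, also run: !python -m spacy download en_core_web_sm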



In [13]:
nlp = spacy.load("en_core_web_sm") # load english language model and create nlp object from it


In [14]:
txt = data['Comment'][3]
txt

'ive been really angry with r and i feel like an idiot for trusting him in the first place'

**Sentence Tokenization**

In [15]:
# Run the spaCy pipeline: the resulting Doc holds the tokens, lemmas and sentence boundaries
doc = nlp(txt)
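The cell above only builds the `Doc`; the sentences themselves are available as `doc.sents`. The sketch below assumes spaCy v3 and, so that it runs without downloading a model, uses a blank English pipeline with the rule-based `sentencizer` component. `en_core_web_sm` derives sentence boundaries from its parser instead, so results can differ on harder text. The helper name `split_sentences` is hypothetical.

```python
import spacy

# Blank English pipeline + rule-based sentencizer (no model download needed).
# Assumption for illustration: en_core_web_sm uses its parser for this instead.
nlp_sent = spacy.blank("en")
nlp_sent.add_pipe("sentencizer")


def split_sentences(text):
    """Return the text of each sentence spaCy finds in `text`."""
    return [sent.text for sent in nlp_sent(text).sents]


print(split_sentences("I feel great today. But tomorrow scares me!"))
# The sample comments shown above have no punctuation, so each is one sentence
print(split_sentences("i sit here to write i start to dig out my feelings"))
```

With the notebook's own objects, `[s.text for s in doc.sents]` gives the same kind of list.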

**Word Tokenization**

In [16]:
for token in doc:
    print(token.text)

i
ve
been
really
angry
with
r
and
i
feel
like
an
idiot
for
trusting
him
in
the
first
place


In [17]:
tokens = [token.text for token in doc if not token.is_punct and not token.is_stop]
print(tokens)

['ve', 'angry', 'r', 'feel', 'like', 'idiot', 'trusting', 'place']


**Stemming and Lemmatization**

spaCy provides lemmatization but no stemmer, so the cell below prints each token's lemma.

In [18]:
for token in doc:
  print(f"Word: {token} | -> {token.lemma_}")

Word: i | -> I
Word: ve | -> ve
Word: been | -> be
Word: really | -> really
Word: angry | -> angry
Word: with | -> with
Word: r | -> r
Word: and | -> and
Word: i | -> I
Word: feel | -> feel
Word: like | -> like
Word: an | -> an
Word: idiot | -> idiot
Word: for | -> for
Word: trusting | -> trust
Word: him | -> he
Word: in | -> in
Word: the | -> the
Word: first | -> first
Word: place | -> place
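For comparison with the lemmas above, here is a minimal stemming sketch. It swaps in NLTK's `PorterStemmer` and assumes the `nltk` package is installed; the stemmer needs no corpus download. Stemming chops suffixes by rule, so it can agree with the lemmatizer ("trusting" becomes "trust" in both) or produce non-words ("happiness" becomes "happi").

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["trusting", "feelings", "happiness"]

# Rule-based suffix stripping, no part-of-speech information involved
stems = [stemmer.stem(w) for w in words]
print(stems)
```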


**Stop Words**

In [19]:
for token in doc:
  if token.is_stop or token.is_punct:
    print(token)

i
been
really
with
and
i
an
for
him
in
the
first
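spaCy's English stop list is broad. The output above shows "really" and "first" flagged, and in the preprocessed data further down, "i feel suspicious if there is no one outside..." loses its "no". For emotion classification, negations can carry meaning. The sketch below is a hypothetical variant of the filter that keeps a chosen set of negation words. The `KEEP` set is an assumption, and a blank spaCy v3 English pipeline is used so the example runs without a model (it flags stop words and punctuation but does not lemmatize).

```python
import spacy

# Blank English pipeline: tokenizer plus lexical flags such as is_stop / is_punct
nlp_stop = spacy.blank("en")

# Assumption: negation words we may want to keep for emotion classification
KEEP = frozenset({"no", "not", "never", "nor"})


def filter_tokens(text, keep=frozenset()):
    """Drop stop words and punctuation, except words listed in `keep`."""
    doc = nlp_stop(text)
    return [t.text for t in doc
            if t.text.lower() in keep or not (t.is_stop or t.is_punct)]


text = "i feel suspicious if there is no one outside"
print(filter_tokens(text))        # negation dropped
print(filter_tokens(text, KEEP))  # negation kept
```

Whether keeping negations helps accuracy on this dataset is untested here; it would need to be checked with the models below.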


**Preprocess Function**

In [20]:
#use this utility function to get the preprocessed text data
def preprocess(text):
  #remove stop words and lemmatize the text
  doc = nlp(text)
  filtered_tokens = []
  for token in doc:
    if token.is_stop or token.is_punct:
      continue
    filtered_tokens.append(token.lemma_)

  return " ".join(filtered_tokens)

In [21]:
print(txt)
processed_txt = preprocess(txt)
print(processed_txt)

ive been really angry with r and i feel like an idiot for trusting him in the first place
ve angry r feel like idiot trust place


**Apply preprocess function on dataframe**

In [22]:
data['preprocessed_comment']= data['Comment'].apply(preprocess)
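`Series.apply(preprocess)` calls the pipeline once per comment. spaCy's `nlp.pipe` processes texts in batches and is usually faster on a few thousand rows. Below is a sketch of a batched equivalent; the name `preprocess_many` is hypothetical. When the pipeline has no lemmatizer (as with the blank pipeline used in the demo), `lemma_` is empty, so the sketch falls back to the lowercase form. That fallback is an addition that the notebook's `preprocess` does not have.

```python
import spacy


def preprocess_many(texts, nlp, batch_size=256):
    """Batched version of preprocess(): drop stop words and punctuation and
    join the remaining lemmas with spaces, one output string per input text.
    Falls back to the lowercase form if the pipeline has no lemmatizer."""
    results = []
    for doc in nlp.pipe(texts, batch_size=batch_size):
        tokens = [t.lemma_ or t.lower_ for t in doc if not (t.is_stop or t.is_punct)]
        results.append(" ".join(tokens))
    return results


# Demo with a blank pipeline (no lemmatizer), so only the filtering is visible
nlp_demo = spacy.blank("en")
demo_out = preprocess_many(["i feel suspicious if there is no one outside",
                            "im so full of life i feel appalled"], nlp_demo)
print(demo_out)
```

In the notebook this would read `data['preprocessed_comment'] = preprocess_many(data['Comment'], nlp)`.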

In [23]:
print(data[['Comment', 'preprocessed_comment']].head())

                                             Comment  \
0  i seriously hate one subject to death but now ...   
1                 im so full of life i feel appalled   
2  i sit here to write i start to dig out my feel...   
3  ive been really angry with r and i feel like a...   
4  i feel suspicious if there is no one outside l...   

                                preprocessed_comment  
0   seriously hate subject death feel reluctant drop  
1                               m life feel appalled  
2  sit write start dig feeling think afraid accep...  
3             ve angry r feel like idiot trust place  
4        feel suspicious outside like rapture happen  


In [24]:
preprocessed_list = data['preprocessed_comment'].tolist()
print(preprocessed_list[:10]) # show the first 10

['seriously hate subject death feel reluctant drop', 'm life feel appalled', 'sit write start dig feeling think afraid accept possibility', 've angry r feel like idiot trust place', 'feel suspicious outside like rapture happen', 'feel jealous becasue want kind love true connection soul want', 'friend keep tell morbid thing happen dog', 'finally fall asleep feeling angry useless anxiety', 'feel bit annoyed antsy good way', 'feel like ve regain vital life live']


In [25]:
data

|   | Comment | Emotion | preprocessed_comment |
|---|---|---|---|
| 0 | i seriously hate one subject to death but now ... | fear | seriously hate subject death feel reluctant drop |
| 1 | im so full of life i feel appalled | anger | m life feel appalled |
| 2 | i sit here to write i start to dig out my feel... | fear | sit write start dig feeling think afraid accep... |
| 3 | ive been really angry with r and i feel like a... | joy | ve angry r feel like idiot trust place |
| 4 | i feel suspicious if there is no one outside l... | fear | feel suspicious outside like rapture happen |
| ... | ... | ... | ... |
| 5932 | i begun to feel distressed for you | fear | begin feel distressed |
| 5933 | i left feeling annoyed and angry thinking that... | anger | leave feel annoyed angry thinking center stupi... |
| 5934 | i were to ever get married i d have everything... | joy | marry d ready offer ve get club perfect good l... |
| 5935 | i feel reluctant in applying there because i w... | fear | feel reluctant apply want able find company kn... |


**Encoding the target column**

In [26]:
data['Emotion_num'] = data['Emotion'].map({'joy':0, 'fear':1, 'anger':2})
data.head(5)

|   | Comment | Emotion | preprocessed_comment | Emotion_num |
|---|---|---|---|---|
| 0 | i seriously hate one subject to death but now ... | fear | seriously hate subject death feel reluctant drop | 1 |
| 1 | im so full of life i feel appalled | anger | m life feel appalled | 2 |
| 2 | i sit here to write i start to dig out my feel... | fear | sit write start dig feeling think afraid accep... | 1 |
| 3 | ive been really angry with r and i feel like a... | joy | ve angry r feel like idiot trust place | 0 |
| 4 | i feel suspicious if there is no one outside l... | fear | feel suspicious outside like rapture happen | 1 |
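The manual mapping needs an inverse to turn predicted codes back into emotion names. Note also that `Series.map` with a dict turns any unmapped value into NaN, so `data['Emotion_num'].isna().sum()` should be 0. The sketch below shows the inverse mapping and, for contrast, scikit-learn's `LabelEncoder`, which assigns codes in alphabetical order and would therefore produce a different numbering from the notebook's.

```python
from sklearn.preprocessing import LabelEncoder

mapping = {"joy": 0, "fear": 1, "anger": 2}                 # the notebook's manual mapping
code_to_name = {code: name for name, code in mapping.items()}
decoded = [code_to_name[c] for c in [0, 2, 1]]              # e.g. a few predicted codes
print(decoded)

# LabelEncoder assigns codes in sorted (alphabetical) order instead
le = LabelEncoder().fit(["joy", "fear", "anger"])
print(le.classes_.tolist(), le.transform(["joy", "fear", "anger"]).tolist())
```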


**Split data into train and test**

In [29]:
X_train, X_test, y_train,y_test = train_test_split(data['preprocessed_comment'], data['Emotion_num'],
                                                   test_size=0.2, random_state=42, stratify=data['Emotion_num'])

In [30]:
print("Shape of X_train:", X_train.shape)
print("Shape of X_test:", X_test.shape)

Shape of X_train: (4749,)
Shape of X_test: (1188,)
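`stratify=data['Emotion_num']` keeps the class proportions the same in both splits. The sketch below checks this on hypothetical labels with the same class sizes as the dataset (2000 joy, 1937 fear, 2000 anger) and placeholder features. It reproduces the 4749/1188 split sizes printed above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical labels with the dataset's class sizes (joy=0, fear=1, anger=2)
y = pd.Series([0] * 2000 + [1] * 1937 + [2] * 2000)
X = pd.Series(range(len(y)))  # placeholder features; only the labels matter here

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print(len(y_tr), len(y_te))
# Class proportions in the full data, the training split and the test split
print(pd.DataFrame({"full": y.value_counts(normalize=True),
                    "train": y_tr.value_counts(normalize=True),
                    "test": y_te.value_counts(normalize=True)}).round(3))
```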


**Convert the text column to numeric vectors**

In [31]:
v = TfidfVectorizer()   # create a TfidfVectorizer object (turns text into numeric TF-IDF vectors)

X_train_cv = v.fit_transform(X_train)   # learn the vocabulary from the training texts and transform them into vectors
X_test_cv = v.transform(X_test)    # transform the test texts using the same vocabulary learned above

print(v.vocabulary_)    # display the vocabulary (term -> column index)



**Machine Learning Model**

***Logistic Regression***

In [32]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)
model.fit(X_train_cv, y_train)

In [33]:
y_pred = model.predict(X_test_cv)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Classification report:\n", classification_report(y_test, y_pred))

Accuracy : 0.9191919191919192
Classification report:
               precision    recall  f1-score   support

           0       0.89      0.95      0.92       400
           1       0.94      0.90      0.92       388
           2       0.94      0.90      0.92       400

    accuracy                           0.92      1188
   macro avg       0.92      0.92      0.92      1188
weighted avg       0.92      0.92      0.92      1188
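The report labels classes by their codes (0, 1, 2). Passing `target_names` prints the emotion names instead. The names must be listed in the order of the sorted codes, which for this notebook's mapping is joy, fear, anger. The sketch below uses hypothetical true and predicted codes. In the notebook, the equivalent call is `classification_report(y_test, y_pred, target_names=["joy", "fear", "anger"])`.

```python
from sklearn.metrics import classification_report

# Hypothetical codes using the notebook's mapping: joy=0, fear=1, anger=2
y_true_demo = [0, 0, 1, 1, 2, 2]
y_pred_demo = [0, 0, 1, 2, 2, 2]
names = ["joy", "fear", "anger"]   # index i is the name for code i

print(classification_report(y_true_demo, y_pred_demo, target_names=names))
```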



***Naive Bayes***

In [34]:
NB_model = MultinomialNB()

#model training
NB_model.fit(X_train_cv, y_train)

In [36]:
#get prediction
y_pred = NB_model.predict(X_test_cv)

In [38]:
#print accuracy score
print("Accuracy :",accuracy_score(y_test,y_pred))

Accuracy : 0.9006734006734006


In [41]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.90      0.89      0.90       400
           1       0.90      0.90      0.90       388
           2       0.90      0.92      0.91       400

    accuracy                           0.90      1188
   macro avg       0.90      0.90      0.90      1188
weighted avg       0.90      0.90      0.90      1188



***Random Forest***

In [42]:
RFC_model = RandomForestClassifier()  # no random_state, so scores can vary slightly between runs

RFC_model.fit(X_train_cv, y_train)

In [43]:
#get prediction
y_pred = RFC_model.predict(X_test_cv)

In [44]:
#print accuracy score
print("Accuracy :",accuracy_score(y_test,y_pred))

Accuracy : 0.9242424242424242


In [45]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.92      0.95      0.93       400
           1       0.92      0.92      0.92       388
           2       0.93      0.90      0.92       400

    accuracy                           0.92      1188
   macro avg       0.92      0.92      0.92      1188
weighted avg       0.92      0.92      0.92      1188



***SVM***

In [46]:
# Create and train the SVM model
from sklearn.svm import SVC

SVM_model = SVC(kernel='linear')  # a linear kernel usually works well on high-dimensional, sparse text features
SVM_model.fit(X_train_cv, y_train)

In [47]:
# Make predictions
y_pred = SVM_model.predict(X_test_cv)

In [48]:
# Evaluate performance
print("Accuracy :", accuracy_score(y_test, y_pred))

Accuracy : 0.9276094276094277


In [49]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.91      0.95      0.93       400
           1       0.93      0.93      0.93       388
           2       0.94      0.90      0.92       400

    accuracy                           0.93      1188
   macro avg       0.93      0.93      0.93      1188
weighted avg       0.93      0.93      0.93      1188
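The four models score within about 3 points of each other on a single 1188-comment test split. A difference that small may not hold on a different split. The sketch below compares the models with stratified k-fold cross-validation. `TfidfVectorizer` is placed inside each pipeline so it is refit on every training fold. `compare_models` is a hypothetical helper. `LinearSVC` is swapped in for `SVC(kernel='linear')`: it is usually faster on sparse text but uses a different loss by default, so its scores are similar but not identical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def compare_models(texts, labels, n_splits=5, seed=42):
    """Return {model name: (mean accuracy, std)} from stratified k-fold CV.

    The vectorizer is part of each pipeline, so its vocabulary and IDF weights
    are learned on the training folds only, never on the held-out fold."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "naive_bayes": MultinomialNB(),
        "random_forest": RandomForestClassifier(random_state=seed),
        "linear_svm": LinearSVC(),  # stand-in for SVC(kernel='linear')
    }
    results = {}
    for name, clf in candidates.items():
        pipe = make_pipeline(TfidfVectorizer(), clf)
        scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="accuracy")
        results[name] = (scores.mean(), scores.std())
    return results
```

In the notebook this could be run as `compare_models(data['preprocessed_comment'], data['Emotion_num'])`. The random forest is the slowest of the four.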



**Test Model**

***Get Text***

In [50]:
test_text = data['Comment'][2000]  # taken from the dataset itself, so it may also be in the training split
test_text

'im looking good and feeling good other than this crappy cold im dealing with'

***Apply preprocess***

In [51]:
test_text_processed = [preprocess(test_text)]
test_text_processed

['m look good feel good crappy cold m deal']

***Convert to vector***

In [53]:
test_text_vc = v.transform(test_text_processed)
test_text_vc

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 6 stored elements and shape (1, 6138)>

***Get Prediction***

In [59]:
prediction = RFC_model.predict(test_text_vc)  # array holding one encoded label

***Output***

In [60]:
print(f"True label: {data['Emotion'][2000]} -> {data['Emotion_num'][2000]}")
print("Predicted:", prediction)

True label: joy -> 0
Predicted: [0]
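The steps above (preprocess, vectorize with the fitted vectorizer, predict, decode) can be wrapped in one function so that new, unseen sentences can be classified directly. `predict_emotion` and `CODE_TO_EMOTION` are hypothetical names. To keep the sketch self-contained, it trains a tiny Naive Bayes model on made-up texts and uses `str.lower` as a stand-in for the notebook's `preprocess`. In the notebook the call would be `predict_emotion("some new sentence", preprocess, v, RFC_model)`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Inverse of the notebook's mapping {'joy': 0, 'fear': 1, 'anger': 2}
CODE_TO_EMOTION = {0: "joy", 1: "fear", 2: "anger"}


def predict_emotion(text, preprocess_fn, vectorizer, model):
    """Preprocess one raw comment, vectorize it with an already-fitted
    vectorizer, and return the predicted emotion name."""
    features = vectorizer.transform([preprocess_fn(text)])  # transform() expects a list of documents
    code = model.predict(features)[0]
    return CODE_TO_EMOTION[int(code)]


# Tiny hypothetical training set so the sketch runs on its own
train_texts = ["happy joy", "happy delight", "scared fear",
               "scared dread", "angry rage", "angry fury"]
train_labels = [0, 0, 1, 1, 2, 2]
vec_demo = TfidfVectorizer()
clf_demo = MultinomialNB().fit(vec_demo.fit_transform(train_texts), train_labels)

print(predict_emotion("So HAPPY, pure joy!", str.lower, vec_demo, clf_demo))
```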
