# Part I: Emotion classification
The task is supervised multiclass classification. The target is the emotion, which takes one of six classes: ('sadness', 'anger', 'love', 'surprise', 'fear', 'joy').  
We apply transfer learning by using a model that has already been fine-tuned on this task.  
The model is hosted on Hugging Face:
https://huggingface.co/mrm8488/t5-base-finetuned-emotion

## Prerequisites

In [42]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from transformers import AutoTokenizer, AutoModelWithLMHead

## Transfer learning

In [43]:
# hugging face tokenizer
tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-emotion")
# load the model which is already trained on emotion dataset
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-emotion")
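# take a sentence and return the decoded emotion label
def get_emotion(text):
  # append the end-of-sequence token and encode the text as a PyTorch tensor
  input_ids = tokenizer.encode(text + '</s>', return_tensors='pt')

  # generate at most two tokens: the <pad> start token plus one label token
  output = model.generate(input_ids=input_ids,
               max_length=2)

  # decode the generated ids back to text; special tokens such as <pad> are kept
  dec = [tokenizer.decode(ids) for ids in output]
  label = dec[0]
  return label

#get_emotion("i feel as if i havent blogged in ages are at least truly blogged i am doing an update cute") # label: 'joy'

get_emotion("i have a feeling i kinda lost my best friend") # returns '<pad> sadness'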



'<pad> sadness'
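
The decoded string above still carries T5's `<pad>` start token. The following is a minimal sketch of cleaning such a label without depending on where the first space falls. The helper name `clean_label` and the token tuple are illustrative assumptions, not part of the original notebook. `tokenizer.decode` also accepts `skip_special_tokens=True`, which should have a similar effect at decode time.

```python
# Hypothetical helper (not in the original notebook): strip special tokens
# from a decoded T5 label such as '<pad> sadness'.
SPECIAL_TOKENS = ("<pad>", "</s>", "<unk>")


def clean_label(decoded: str) -> str:
    """Remove special tokens and collapse surrounding whitespace."""
    for token in SPECIAL_TOKENS:
        decoded = decoded.replace(token, " ")
    return " ".join(decoded.split())


print(clean_label("<pad> sadness"))
```

Unlike splitting on the first space, this version leaves a label that has no prefix untouched instead of raising an `IndexError`.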

# Part II: Emoji emotion classification
In this part, we use the pre-trained model from Part I to help predict the emotion associated with each emoji. The feature is the emoji name (e.g., FACE WITH TEARS OF JOY) and the target variable is the emotion.


## Read the tweet-emoji dataset

In [3]:
import pandas as pd
# this dataset stores the names of the emojis, not the emoji characters themselves;
# we solve this by merging it with another dataset
tweets = pd.read_csv("../data/new-emojis/tweets_emojis.csv")


In [6]:
# delete unneeded column
tweets.drop('Unnamed: 0', axis=1,inplace=True)

In [29]:
# replace underscores with spaces and convert to upper case;
# save the result in a new column called 'Unicode name' to help with the merge later
tweets['Unicode name'] = tweets['emoji'].str.replace('_', ' ').str.upper()

In [31]:
# drop old column
tweets.drop('emoji', axis=1, inplace=True)

In [30]:
tweets['Unicode name']

0                 FACE WITH TEARS OF JOY
1                 FACE WITH TEARS OF JOY
2                              THUMBS UP
3                 FACE WITH TEARS OF JOY
4                         CLAPPING HANDS
                       ...              
1320030                        MALE SIGN
1320031    BACKHAND INDEX POINTING RIGHT
1320032                     FLUSHED FACE
1320033                 PERSON SHRUGGING
1320034                    RAISING HANDS
Name: Unicode name, Length: 1320035, dtype: object
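
As a side note, Python's standard library can sometimes resolve an emoji character directly from its name. The sketch below assumes the raw `emoji` column holds lowercase names with underscores (inferred from the cleaning step above); the helper names are hypothetical. Names in this dataset do not always match the formal Unicode character names, so the lookup may return `None` for some of them, which is one reason a merge with a second dataset is still useful.

```python
import unicodedata
from typing import Optional


def normalize_name(raw: str) -> str:
    """Turn e.g. 'face_with_tears_of_joy' into 'FACE WITH TEARS OF JOY'."""
    return raw.replace("_", " ").strip().upper()


def name_to_emoji(name: str) -> Optional[str]:
    """Return the character with this Unicode name, or None if unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


name = normalize_name("face_with_tears_of_joy")
print(name, name_to_emoji(name))
```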

In [32]:
tweets.head()

Unnamed: 0,text,Unicode name
0,Idk who taught my baby this BS ️ IGmeetthesa...,FACE WITH TEARS OF JOY
1,Thats me in every lesson,FACE WITH TEARS OF JOY
2,There are MANY of you 🇺 🇸 🇺 🇸 🇮 🇱,THUMBS UP
3,Partner strategy LLRC Urban naxal theories ar...,FACE WITH TEARS OF JOY
4,Happy Birthday More blessings Matsatsi 🏽 Hop...,CLAPPING HANDS


## Read the emoji dataset

In [26]:
# use this dataset to get the emojis
emojis = pd.read_csv("../data/emojis/Emoji_Sentiment_Data_v1.0.csv")
emojis.head()

Unnamed: 0,Emoji,Unicode codepoint,Occurrences,Position,Negative,Neutral,Positive,Unicode name,Unicode block
0,😂,0x1f602,14622,0.805101,3614,4163,6845,FACE WITH TEARS OF JOY,Emoticons
1,❤,0x2764,8050,0.746943,355,1334,6361,HEAVY BLACK HEART,Dingbats
2,♥,0x2665,7144,0.753806,252,1942,4950,BLACK HEART SUIT,Miscellaneous Symbols
3,😍,0x1f60d,6359,0.765292,329,1390,4640,SMILING FACE WITH HEART-SHAPED EYES,Emoticons
4,😭,0x1f62d,5526,0.803352,2412,1218,1896,LOUDLY CRYING FACE,Emoticons


In [27]:
# since we dont need all features, only save emoji and name
emojis = emojis[['Emoji', 'Unicode name']]

In [28]:
# after
emojis.head()

Unnamed: 0,Emoji,Unicode name
0,😂,FACE WITH TEARS OF JOY
1,❤,HEAVY BLACK HEART
2,♥,BLACK HEART SUIT
3,😍,SMILING FACE WITH HEART-SHAPED EYES
4,😭,LOUDLY CRYING FACE


In [33]:
# merge the two datasets on the Unicode name to get a tweet-emoji dataset
# (inner join: tweets whose emoji name has no match are dropped)
tweets_emojis = emojis.merge(tweets, on='Unicode name')
tweets_emojis.head()

Unnamed: 0,Emoji,Unicode name,text
0,😂,FACE WITH TEARS OF JOY,Idk who taught my baby this BS ️ IGmeetthesa...
1,😂,FACE WITH TEARS OF JOY,Thats me in every lesson
2,😂,FACE WITH TEARS OF JOY,Partner strategy LLRC Urban naxal theories ar...
3,😂,FACE WITH TEARS OF JOY,Dont play with me 🏾 ‍ ️
4,😂,FACE WITH TEARS OF JOY,To the goofiest boy ever who apparently looks ...


In [34]:
tweets_emojis.shape

(741777, 3)
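
The merged frame has 741,777 rows, against 1,320,035 tweets before the merge. Assuming each name appears only once in the emoji sentiment dataset, roughly 44% of the tweets found no matching name and were dropped by the inner join. The sketch below is a dependency-free way to measure that coverage and list the names that fail to match; the function name and the toy data are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def merge_coverage(
    tweet_names: Iterable[str], known_names: Iterable[str]
) -> Tuple[float, List[Tuple[str, int]]]:
    """Return the share of tweets whose name is known, plus unmatched names by count."""
    known = set(known_names)
    counts = Counter(tweet_names)
    unmatched = Counter({n: c for n, c in counts.items() if n not in known})
    total = sum(counts.values())
    matched = total - sum(unmatched.values())
    share = matched / total if total else 0.0
    return share, unmatched.most_common()


# Toy data, not taken from the real datasets.
toy_tweets = ["FACE WITH TEARS OF JOY", "THUMBS UP", "THUMBS UP", "FIRE"]
toy_known = ["FACE WITH TEARS OF JOY", "FIRE", "OK HAND SIGN"]
print(merge_coverage(toy_tweets, toy_known))
```

On the real data, the inputs would be `tweets['Unicode name']` and `emojis['Unicode name']`.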

In [41]:
# explore the distribution of emojis
tweets_emojis['Unicode name'].value_counts()

FACE WITH TEARS OF JOY            210309
LOUDLY CRYING FACE                 82659
FIRE                               51030
FEMALE SIGN                        50941
MALE SIGN                          34149
WEARY FACE                         30489
TWO HEARTS                         30049
SMILING FACE WITH SMILING EYES     28519
SPARKLES                           23532
EYES                               20824
FLEXED BICEPS                      16634
PURPLE HEART                       16457
PARTY POPPER                       16259
WINKING FACE                       16075
BLUE HEART                         15941
SMILING FACE WITH SUNGLASSES       15322
SPARKLING HEART                    14614
SKULL                              11976
CRYING FACE                        11265
YELLOW HEART                       10100
FLUSHED FACE                        8449
WHITE HEAVY CHECK MARK              7192
TROPHY                              6892
GLOWING STAR                        6189
HEAVY CHECK MARK
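
The distribution is heavily skewed: FACE WITH TEARS OF JOY alone accounts for 210,309 of the 741,777 merged rows, about 28%. One common heuristic for compensating, which the original notebook does not apply, is inverse-frequency class weighting, where each class gets the weight `total / (n_classes * count)`. The sketch below uses only the top three counts from the output above, purely for illustration.

```python
from typing import Dict


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * c) for label, c in counts.items()}


# Top three counts copied from the value_counts() output above.
top_counts = {
    "FACE WITH TEARS OF JOY": 210309,
    "LOUDLY CRYING FACE": 82659,
    "FIRE": 51030,
}
for label, weight in balanced_class_weights(top_counts).items():
    print(f"{label}: {weight:.3f}")
```

Rare classes receive weights above 1 and frequent classes weights below 1, so every class contributes equally to a weighted loss in aggregate.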

## Send tweets as input to the pre-trained model

In [44]:
# classify the emotion of each tweet and save it in a new column
tweets_emojis['Emotions'] = tweets_emojis['text'].apply(lambda text: get_emotion(text.lower()))

KeyboardInterrupt: 
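
The cell above was interrupted: calling `get_emotion` once per row over 741,777 tweets is slow. A sketch of one way to reduce the work is to classify each distinct text only once and to send texts to the model in chunks. Here `classify_batch` is a hypothetical stand-in for a batched model call (for example, tokenizing a list of texts with `padding=True` and decoding with `tokenizer.batch_decode`); the toy classifier below only exists to make the example runnable.

```python
from typing import Callable, Dict, List, Sequence


def classify_unique_in_batches(
    texts: Sequence[str],
    classify_batch: Callable[[List[str]], List[str]],
    batch_size: int = 64,
) -> List[str]:
    """Label every text, calling classify_batch only on distinct texts, in chunks."""
    unique = list(dict.fromkeys(texts))  # de-duplicate while preserving order
    labels: Dict[str, str] = {}
    for start in range(0, len(unique), batch_size):
        chunk = unique[start:start + batch_size]
        for text, label in zip(chunk, classify_batch(chunk)):
            labels[text] = label
    return [labels[text] for text in texts]


def toy_classifier(batch: List[str]) -> List[str]:
    """Stand-in for the real model: keyword lookup, for demonstration only."""
    return ["joy" if "happy" in text else "sadness" for text in batch]


print(classify_unique_in_batches(["happy day", "lost it", "happy day"], toy_classifier))
```

Running on a random sample of the tweets first is another inexpensive way to check the pipeline before committing to the full dataset.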

In [25]:
# strip the leading '<pad>' token from each prediction
tweets_emojis['Emotions'] = tweets_emojis['Emotions'].apply(lambda label: label.split(' ', 1)[1])

In [45]:
tweets_emojis.head(20) 

Unnamed: 0,Emoji,Unicode name,text
0,😂,FACE WITH TEARS OF JOY,Idk who taught my baby this BS ️ IGmeetthesa...
1,😂,FACE WITH TEARS OF JOY,Thats me in every lesson
2,😂,FACE WITH TEARS OF JOY,Partner strategy LLRC Urban naxal theories ar...
3,😂,FACE WITH TEARS OF JOY,Dont play with me 🏾 ‍ ️
4,😂,FACE WITH TEARS OF JOY,To the goofiest boy ever who apparently looks ...
5,😂,FACE WITH TEARS OF JOY,Real Real cuz these niggas really b bluffing s...
6,😂,FACE WITH TEARS OF JOY,“ What the heck I ordered an Xbox card ”
7,😂,FACE WITH TEARS OF JOY,thread is absolutely outstanding Silage of th...
8,😂,FACE WITH TEARS OF JOY,how can you say you love her if you cant even ...
9,😂,FACE WITH TEARS OF JOY,Whats wrong with females these days 🏾 ‍ ️


In [28]:
emojis.head(20) 

Unnamed: 0,Emoji,Unicode name,Emotions
0,😂,FACE WITH TEARS OF JOY,joy
1,❤,HEAVY BLACK HEART,sadness
2,♥,BLACK HEART SUIT,anger
3,😍,SMILING FACE WITH HEART-SHAPED EYES,joy
4,😭,LOUDLY CRYING FACE,anger
5,😘,FACE THROWING A KISS,fear
6,😊,SMILING FACE WITH SMILING EYES,joy
7,👌,OK HAND SIGN,joy
8,💕,TWO HEARTS,anger
9,👏,CLAPPING HANDS SIGN,anger


## Merge emoji and emotion dataset

In [20]:
emotion_train = pd.read_csv("../data/emotions/train.txt", delimiter=';', header=None, names=['Sentence','Emotions'])

In [21]:
# show first 5 rows
emotion_train.head()

Unnamed: 0,Sentence,Emotions
0,i didnt feel humiliated,sadness
1,i can go from feeling so hopeless to so damned...,sadness
2,im grabbing a minute to post i feel greedy wrong,anger
3,i am ever feeling nostalgic about the fireplac...,love
4,i am feeling grouchy,anger


In [None]:
tweets_emojis = tweets_emojis[['Emoji', 'Emotions']]
tweets_emojis.head()

In [30]:
emojis = emojis[['Emoji', 'Emotions']]
emojis.head()

Unnamed: 0,Emoji,Emotions
0,😂,joy
1,❤,sadness
2,♥,anger
3,😍,joy
4,😭,anger


In [31]:
# merge the two tables on the 'Emotions' column
emotion_emoji_merged = emotion_train.merge(tweets_emojis, on='Emotions')

In [34]:
emotion_emoji_merged.head(20)

Unnamed: 0,Sentence,Emotions,Emoji
0,i didnt feel humiliated,sadness,❤
1,i didnt feel humiliated,sadness,😩
2,i didnt feel humiliated,sadness,💜
3,i didnt feel humiliated,sadness,💙
4,i didnt feel humiliated,sadness,😢
5,i didnt feel humiliated,sadness,😞
6,i didnt feel humiliated,sadness,😫
7,i didnt feel humiliated,sadness,💔
8,i didnt feel humiliated,sadness,😑
9,i didnt feel humiliated,sadness,🎅


In [33]:
emotion_emoji_merged.shape

(2801783, 3)
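
Because `Emotions` is not unique in either table, this merge is a many-to-many join: every sentence is paired with every emoji row that shares its label. That is why the first sentence repeats in the output above and why the result has 2,801,783 rows. The sketch below (hypothetical function name, toy data) shows how the row count of such an inner join follows from the per-label counts on each side.

```python
from collections import Counter
from typing import Iterable


def inner_join_row_count(left_keys: Iterable[str], right_keys: Iterable[str]) -> int:
    """Rows produced by an inner join on one key: sum of left_count * right_count."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    return sum(n * right[key] for key, n in left.items())


# Toy labels: 2 'sadness' sentences x 1 'sadness' emoji + 1 'joy' x 2 'joy' = 4 rows.
sentence_labels = ["sadness", "sadness", "joy"]
emoji_labels = ["sadness", "joy", "joy", "anger"]
print(inner_join_row_count(sentence_labels, emoji_labels))
```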

# Part III: Text emoji recommendation
In this last part, we train a new model that takes a text and recommends an emoji for it. The feature is the text and the target variable is the emoji.

## Prerequisites

In [5]:
# import required libraries
import os

import nltk
from tkinter import *
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
import scipy

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow import keras
import string
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM ,Conv2D, Dense,GlobalAveragePooling1D,Flatten, Dropout , GRU, TimeDistributed, Conv1D, MaxPool1D, MaxPool2D, Bidirectional

from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report


## Text preprocessing using embeddings

In [None]:
# create an embedding matrix using the GloVe vectors (?)
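
The cell above is left as a placeholder. As a rough sketch of one common approach, and not necessarily what the author intended, the code below parses vectors in the GloVe text format (one word per line followed by its values) and turns a sentence into a fixed-size matrix, padding with zero vectors. The function names are hypothetical, and the three-dimensional toy vectors stand in for real GloVe data.

```python
import io
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe-style lines: a word followed by its vector components."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def embed_sentence(
    sentence: str, vectors: Dict[str, List[float]], max_len: int, dim: int
) -> List[List[float]]:
    """Map tokens to vectors (zeros if unknown), then pad or truncate to max_len rows."""
    rows = [list(vectors.get(tok, [0.0] * dim)) for tok in sentence.lower().split()[:max_len]]
    while len(rows) < max_len:
        rows.append([0.0] * dim)
    return rows


# Toy 3-dimensional vectors for illustration only.
toy_file = io.StringIO("happy 0.1 0.2 0.3\nsad -0.1 -0.2 -0.3\n")
toy_vectors = load_glove(toy_file)
matrix = embed_sentence("Happy and sad", toy_vectors, max_len=5, dim=3)
print(len(matrix), len(matrix[0]))
```

With 50-dimensional vectors and `max_len=168`, each sentence would become a (168, 50) matrix, which matches the `input_shape` of the first LSTM layer in the model below.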

In [None]:
# convert y_train to one-hot vectors so that cross-entropy loss can be used
y_train = to_categorical(y_train)
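
`to_categorical` expects integer class indices, so emoji labels need to be mapped to integers first (the imported `LabelEncoder` is presumably meant for that step). The dependency-free sketch below shows what the two steps compute; the function names are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def encode_labels(labels: Sequence[str]) -> Tuple[List[int], List[str]]:
    """Map each distinct label to an integer index, using sorted label order."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


def one_hot(indices: Sequence[int], num_classes: int) -> List[List[float]]:
    """Turn each class index into a vector with a single 1.0 at that position."""
    return [[1.0 if i == idx else 0.0 for i in range(num_classes)] for idx in indices]


indices, classes = encode_labels(["fire", "joy", "fire"])
print(classes, one_hot(indices, len(classes)))
```

The width of each one-hot vector equals the number of classes, which must also equal the number of units in the final softmax layer.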

## Model building

In [None]:
model = Sequential()

In [None]:
model.add(LSTM(units = 256, return_sequences=True, input_shape = (168,50)))
model.add(Dropout(0.3))
model.add(LSTM(units=128))
model.add(Dropout(0.3))
model.add(Dense(units=128, activation='relu'))
model.add(Dense(units=64, activation='relu'))
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=20, activation='relu'))
model.add(Dense(units=20, activation='softmax'))

In [None]:
model.summary()
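
No output was recorded for `model.summary()`. As a worked check, the parameter counts it should report can be derived by hand, assuming standard Keras `LSTM` and `Dense` layers with biases: an LSTM layer has `4 * (units * (input_dim + units) + units)` parameters and a dense layer has `input_dim * units + units`. Dropout layers add none.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units


# (layer type, input size, units), following the model definition above.
layers = [
    ("lstm", 50, 256),
    ("lstm", 256, 128),
    ("dense", 128, 128),
    ("dense", 128, 64),
    ("dense", 64, 32),
    ("dense", 32, 20),
    ("dense", 20, 20),
]

total = 0
for kind, input_dim, units in layers:
    count = lstm_params(input_dim, units) if kind == "lstm" else dense_params(input_dim, units)
    total += count
    print(f"{kind:5s} {input_dim:4d} -> {units:4d}: {count:7d}")
print("total:", total)  # 539416
```

The two LSTM layers account for 511,488 of the 539,416 parameters, so the recurrent part dominates the model size.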

In [None]:

model.compile(optimizer='adam', loss=keras.losses.categorical_crossentropy, metrics=['acc'])

## Model training

In [None]:
res = model.fit(X_temb, y_train, validation_split=0.2, batch_size=32, epochs=10, verbose=2)

## Model performance overview

In [None]:
# Loss and accuracy plots

## Confusion matrix and classification report
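
This section has no code yet. In practice, the `confusion_matrix` and `classification_report` functions imported from scikit-learn earlier would likely be used here. The sketch below shows the underlying computation on toy labels with the standard library only; the function names and the example labels are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str]) -> Counter:
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def per_class_report(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """Return (precision, recall) for every label seen in either sequence."""
    pairs = confusion_counts(y_true, y_pred)
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        true_pos = pairs[(label, label)]
        predicted = sum(c for (_, p), c in pairs.items() if p == label)
        actual = sum(c for (t, _), c in pairs.items() if t == label)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        report[label] = (precision, recall)
    return report


# Toy labels, not real model output.
y_true = ["joy", "joy", "sadness", "anger"]
y_pred = ["joy", "sadness", "sadness", "joy"]
for label, (precision, recall) in per_class_report(y_true, y_pred).items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```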