# Emoji Predictor 😀 😁 😎
- An LSTM-based model that predicts an emoji for a given text input.
- Uses pretrained GloVe 6B 50-dimensional word embeddings (`glove.6B.50d.txt`).
- Transfer learning: the pretrained embedding vectors are reused as fixed inputs to the LSTM instead of being learned from scratch.

### 1. Installing the Emoji Package

In [189]:
!pip3 install emoji

Collecting emoji
Installing collected packages: emoji
Successfully installed emoji-0.5.4


In [190]:
import emoji

In [191]:
emoji_dictionary = {
    '0' : ':orange_heart:',
    '1' : ':baseball:',
    '2' : ':grinning_face_with_big_eyes:',
    '3' : ':downcast_face_with_sweat:',
    '4' : ':fork_and_knife:'
}

In [192]:
for e in emoji_dictionary.values():
    print(emoji.emojize(e))

🧡
⚾
😃
😓
🍴


### 2. Processing the Dataset 

In [193]:
import pandas as pd
import numpy as np
from keras.utils import to_categorical

In [194]:
train = pd.read_csv('Datasets/train_emoji.csv', header=None)
test = pd.read_csv('Datasets/test_emoji.csv', header=None)

In [195]:
train.head()

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | never talk to me again | 3 |  |  |
| 1 | I am proud of your achievements | 2 |  |  |
| 2 | It is the worst day in my life | 3 |  |  |
| 3 | Miss you so much | 0 |  | [0] |
| 4 | food is life | 4 |  |  |


In [196]:
X_train = train[0]
Y_train = train[1]
X_test = test[0]
Y_test = test[1]

In [197]:
for i in range(10):
    print(X_train[i],emoji.emojize(emoji_dictionary[str(Y_train[i])]))

never talk to me again 😓
I am proud of your achievements 😃
It is the worst day in my life 😓
Miss you so much 🧡
food is life 🍴
I love you mum 🧡
Stop saying bullshit 😓
congratulations on your acceptance 😃
The assignment is too long  😓
I want to go play ⚾


### 3. Converting Words into Embeddings

In [198]:
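# Load the pretrained GloVe vectors into a {word: vector} dictionary.
embeddings_index = {}
with open('glove.6B.50d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]                               # first token is the word
        coefs = np.asarray(values[1:], dtype='float')  # remaining tokens are the vector
        embeddings_index[word] = coefs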
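Each line of a GloVe text file holds a token followed by its vector components, separated by spaces. The following is a minimal sketch of that parsing step on a tiny in-memory sample. The `parse_glove` name and the two sample lines are made up for illustration (they are not real GloVe values), and plain Python lists stand in for NumPy arrays to keep the example dependency-free.

```python
import io

def parse_glove(lines):
    """Parse GloVe-style lines ("word v1 v2 ...") into {word: [floats]}."""
    index = {}
    for line in lines:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        index[parts[0]] = [float(v) for v in parts[1:]]
    return index

# Hypothetical 3-dimensional sample, not real GloVe values.
sample = io.StringIO("food 0.1 -0.2 0.3\nlife 0.4 0.5 -0.6\n")
vectors = parse_glove(sample)
print(vectors["food"])
```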

### 4. Converting Sentences into Vectors

In [199]:
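def embedding_output(X):
    maxLen = 10  # sentences are zero-padded up to this many tokens
    embedding_out = np.zeros((X.shape[0], maxLen, 50))
    for ix in range(X.shape[0]):
        # Tokenize in place: later cells rely on X holding token lists.
        # This assignment is what triggers pandas' SettingWithCopyWarning.
        X[ix] = X[ix].split()
        for jx, word in enumerate(X[ix][:maxLen]):  # ignore tokens beyond maxLen
            # Words missing from GloVe keep their zero vector instead of raising KeyError.
            vec = embeddings_index.get(word.lower())
            if vec is not None:
                embedding_out[ix][jx] = vec
    return embedding_out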

In [200]:
embeddings_matrix_train = embedding_output(X_train)
embeddings_matrix_test = embedding_output(X_test)

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
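The warning comes from `embedding_output` overwriting entries of `X` while it builds the embedding matrix. One possible alternative, sketched below, leaves the input untouched, truncates sentences longer than the padding length, and maps unknown words to zero vectors. The name `embed_sentences`, the toy 2-dimensional vocabulary, and the zero-vector fallback are assumptions for illustration, not part of the original notebook.

```python
def embed_sentences(sentences, index, max_len=10, dim=50):
    """Return one [max_len][dim] matrix per sentence, zero-padded on the right."""
    out = []
    for sentence in sentences:
        rows = [[0.0] * dim for _ in range(max_len)]
        tokens = sentence.lower().split()[:max_len]  # truncate long inputs
        for j, token in enumerate(tokens):
            vec = index.get(token)  # None for out-of-vocabulary words
            if vec is not None:
                rows[j] = list(vec)
        out.append(rows)
    return out

# Hypothetical toy vocabulary; "life" is deliberately missing.
toy_index = {"food": [1.0, 2.0], "is": [3.0, 4.0]}
result = embed_sentences(["Food is life"], toy_index, max_len=4, dim=2)
print(result[0])
```

Wrapping the result in `np.asarray(...)` would give an array of shape `(n_sentences, max_len, dim)`, matching the layout the LSTM expects.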


In [201]:
print(embeddings_matrix_train.shape)
print(embeddings_matrix_test.shape)
Y_train = to_categorical(Y_train,num_classes=5)
Y_test = to_categorical(Y_test,num_classes=5)
print(Y_train.shape)

(132, 10, 50)
(56, 10, 50)
(132, 5)
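The `(132, 5)` shape reflects one-hot encoding: each integer label becomes a length-5 vector with a single 1 at the label's index. The sketch below shows the idea in plain Python; `one_hot` is a hypothetical helper for illustration, not the Keras implementation.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[int(label)] = 1.0  # mark the position of the true class
        encoded.append(row)
    return encoded

# Labels 3, 0, 4 correspond to the first, fourth and fifth training rows above.
encoded = one_hot([3, 0, 4], num_classes=5)
print(encoded)
```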


### 5. Defining the LSTM Model

In [202]:
from keras.layers import LSTM, Dropout, Dense, Activation
from keras.models import Sequential
from keras.callbacks import EarlyStopping,ModelCheckpoint

In [203]:
model = Sequential()
model.add(LSTM(64,input_shape=(10,50),return_sequences=True))
model.add(Dropout(0.4))
model.add(LSTM(64,return_sequences=False))
model.add(Dropout(0.3))
model.add(Dense(5))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()

Model: "sequential_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_14 (LSTM)               (None, 10, 64)            29440     
_________________________________________________________________
dropout_14 (Dropout)         (None, 10, 64)            0         
_________________________________________________________________
lstm_15 (LSTM)               (None, 64)                33024     
_________________________________________________________________
dropout_15 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_10 (Dense)             (None, 5)                 325       
_________________________________________________________________
activation_10 (Activation)   (None, 5)                 0         
Total params: 62,789
Trainable params: 62,789
Non-trainable params: 0
_________________________________________________
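The parameter counts in the summary can be checked by hand. An LSTM layer has four gates, each with an input kernel, a recurrent kernel, and a bias; a dense layer has one weight per input-output pair plus one bias per output. The helper names below are hypothetical and exist only for this check.

```python
def lstm_params(input_dim, units):
    # Four gates, each with an input kernel (input_dim x units),
    # a recurrent kernel (units x units), and a bias (units).
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units

first_lstm = lstm_params(50, 64)   # 50-d GloVe vectors in, 64 units
second_lstm = lstm_params(64, 64)  # 64-d hidden states in, 64 units
dense = dense_params(64, 5)        # 64-d state in, 5 emoji classes out
total = first_lstm + second_lstm + dense
print(first_lstm, second_lstm, dense, total)  # 29440 33024 325 62789
```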

In [None]:
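checkpoint = ModelCheckpoint('best_model.h5', monitor='val_loss', verbose=True, save_best_only=True)
# The metric key is 'val_acc' in older Keras releases and 'val_accuracy' in newer ones.
early_stop = EarlyStopping(monitor='val_acc', patience=10)
# Pass the callbacks to fit(); otherwise they are created but never used.
hist = model.fit(embeddings_matrix_train, Y_train, epochs=40, batch_size=64, shuffle=True,
                 validation_split=0.2, callbacks=[checkpoint, early_stop])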
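`EarlyStopping(patience=10)` is meant to halt training once the monitored quantity has gone 10 consecutive epochs without improving. The sketch below is a simplified illustration of that patience rule for a quantity that should decrease, such as validation loss. It is not Keras's actual implementation, and `stopping_epoch` and the loss values are made up.

```python
def stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses, for illustration only.
losses = [1.5, 1.2, 1.1, 1.15, 1.12, 1.13, 1.0]
stop_at = stopping_epoch(losses, patience=3)
print(stop_at)
```

With `patience=3` the loop stops at epoch 5, before the later improvement at epoch 6 is seen, which is the usual trade-off when choosing a patience value.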

In [205]:
model.evaluate(embeddings_matrix_test,Y_test)



[1.0891593864985876, 0.6071428656578064]
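`model.evaluate` returns the loss followed by the accuracy. Since the test set has 56 sentences, the reported accuracy can be converted back into a count of correct predictions, as in this small check. The tiny gap between the two accuracy values is consistent with single-precision rounding.

```python
test_size = 56                          # rows in embeddings_matrix_test
reported_accuracy = 0.6071428656578064  # second value returned by evaluate()

correct = round(reported_accuracy * test_size)
print(correct, correct / test_size)
```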

### 6. Making Predictions

In [206]:
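# predict_classes is unavailable in newer Keras releases; argmax over the softmax output is equivalent.
pred = np.argmax(model.predict(embeddings_matrix_test), axis=-1)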
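The predicted class for each sentence is the index of the largest of the five softmax probabilities. A minimal sketch of that step in plain Python follows; `argmax_rows` and the probability values are hypothetical.

```python
def argmax_rows(probabilities):
    """Return the index of the largest value in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]

# Hypothetical softmax outputs for two sentences over the five emoji classes.
probs = [
    [0.10, 0.10, 0.10, 0.20, 0.50],
    [0.05, 0.05, 0.70, 0.10, 0.10],
]
classes = argmax_rows(probs)
print(classes)  # [4, 2]
```

Each index is then used as a key into `emoji_dictionary` (after converting it to a string) to look up the emoji to print.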

In [207]:
for i in range(30):
    print(' '.join(X_test[i]),end=" ")
    print(emoji.emojize(emoji_dictionary[str(pred[i])]))

I want to eat 🍴
he did not answer 😓
he got a raise 😓
she got me a present 😓
ha ha ha it was so funny 😃
he is a good friend 😃
I am upset ⚾
We had such a lovely dinner tonight 😃
where is the food 🍴
Stop making this joke ha ha ha 😃
where is the ball ⚾
work is hard 😃
This girl is messing with me 🧡
are you serious ha ha 😓
Let us go play baseball ⚾
This stupid grader is not working 😓
work is horrible 😓
Congratulation for having a baby 😃
stop messing around 😓
any suggestions for dinner 😃
I love taking breaks 🧡
you brighten my day 🧡
I boiled rice 🍴
she is a bully 🧡
Why are you feeling bad 😓
I am upset ⚾
I worked during my birthday 😃
My grandmother is the love of my life 🧡
enjoy your break 🍴
valentine day is near 😃
