## Emoji Predictor ##

### Step 1: Get the Emoji Package ###

In [1]:
!pip install emoji




In [2]:
import emoji

In [3]:
#emoji.EMOJI_UNICODE

In [4]:
emoji_dictionary = {"0": "\u2764\uFE0F",    # explicit code points: :heart: can render as a black heart instead of a red one, depending on the font
                    "1": ":baseball:",
                    "2": ":beaming_face_with_smiling_eyes:",
                    "3": ":downcast_face_with_sweat:",
                    "4": ":fork_and_knife:",
                   }

In [5]:
emoji.emojize(":fork_and_knife:")
emoji.emojize(":fire:")

'🔥'

In [6]:
for e in emoji_dictionary.values():
    print(emoji.emojize(e))

❤️
⚾
😁
😓
🍴
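The label-to-emoji lookup can also be sketched without the `emoji` package by storing the characters directly. The names `EMOJI_BY_LABEL` and `label_to_emoji` below are hypothetical and not part of the notebook; the code points are the standard ones for the five emoji shown above.

```python
# Hypothetical helper: map a class label (0-4) straight to an emoji character.
EMOJI_BY_LABEL = {
    0: "\u2764\uFE0F",   # red heart (heart + variation selector)
    1: "\u26BE",         # baseball
    2: "\U0001F601",     # beaming face with smiling eyes
    3: "\U0001F613",     # downcast face with sweat
    4: "\U0001F374",     # fork and knife
}

def label_to_emoji(label):
    """Return the emoji for an integer-like label; raise KeyError for unknown labels."""
    return EMOJI_BY_LABEL[int(label)]
```

Because the helper casts with `int`, it accepts both the integer labels stored in the dataset and the string keys used in `emoji_dictionary`.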


### Step 2: Processing a Custom Dataset ###

In [7]:
import pandas as pd
import numpy as np

In [8]:
train=pd.read_csv('dataset/train_emoji.csv',header=None)
test=pd.read_csv('dataset/test_emoji.csv',header=None)


In [9]:
train.head()

                                 0  1    2    3
0           never talk to me again  3  NaN  NaN
1  I am proud of your achievements  2  NaN  NaN
2   It is the worst day in my life  3  NaN  NaN
3                 Miss you so much  0  NaN  [0]
4                     food is life  4  NaN  NaN


In [10]:
#Check the shape of the training data
data=train.values
print(data.shape)


(132, 4)
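Only the first two of the four columns carry information (sentence and label). As a rough sketch of reading just those two and checking how the labels are distributed, the snippet below uses the standard `csv` module on an in-memory sample. The sample text is an assumption about how the rows look on disk, and `load_pairs` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Illustrative stand-in for train_emoji.csv: sentence, label, then two unused columns.
sample_csv = io.StringIO(
    "never talk to me again,3,,\n"
    "I am proud of your achievements,2,,\n"
    "It is the worst day in my life,3,,\n"
    "Miss you so much,0,,[0]\n"
    "food is life,4,,\n"
)

def load_pairs(handle):
    """Return (sentence, label) pairs taken from the first two CSV columns."""
    pairs = []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        pairs.append((row[0].strip(), int(row[1])))
    return pairs

pairs = load_pairs(sample_csv)
label_counts = Counter(label for _, label in pairs)
print(len(pairs), dict(label_counts))   # 5 {3: 2, 2: 1, 0: 1, 4: 1}
```

With only 132 training rows, a label count like this is a quick way to spot classes that have very few examples.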


In [11]:
X_train=train[0]
Y_train=train[1]

X_test=test[0]
Y_test=test[1]

In [12]:
for i in range(len(X_train)):
    print(X_train[i],end=" ")
    print(emoji.emojize(emoji_dictionary[str(Y_train[i])]))
    

never talk to me again 😓
I am proud of your achievements 😁
It is the worst day in my life 😓
Miss you so much ❤️
food is life 🍴
I love you mum ❤️
Stop saying bullshit 😓
congratulations on your acceptance 😁
The assignment is too long  😓
I want to go play ⚾
she did not answer my text  😓
Your stupidity has no limit 😓
how many points did he score ⚾
my algorithm performs poorly 😓
I got approved 😁
Stop shouting at me 😓
Sounds like a fun plan ha ha 😁
no one likes him 😓
the game just finished ⚾
I will celebrate soon 😁
So sad you are not coming 😓
She is my dearest love ❤️
Good job 😁
It was funny lol 😁
candy is life  😁
The chicago cubs won again ⚾
I am hungry 🍴
I am so excited to see you after so long 😁
you did well on you exam 😁
lets brunch some day 🍴
he is so cute ❤️
How dare you ask that 😓
do you want to join me for dinner  🍴
I said yes 😁
she is attractive ❤️
you suck 😓
she smiles a lot 😁
he is laughing 😁
she takes forever to get ready  😓
French macaroon is so tasty 🍴
we made it 😁
I am excited

### Step 3: Converting Sentences into Embeddings ###


In [13]:
f=open('dataset/glove.6B.50d.txt',encoding='utf-8')

In [14]:
#each line of the GloVe file holds a word followed by its 50-dimensional vector representation
embedding_index={}
for line in f:
    values=line.split()
    word=values[0]
    coefs=np.asarray(values[1:],dtype='float')
    embedding_index[word]=coefs
   

In [15]:
f.close()
embedding_index['chocolate']


array([ 0.089859,  0.5691  , -0.91323 ,  0.34064 ,  0.7763  ,  1.3755  ,
       -0.6681  , -0.322   , -0.061527,  0.81761 ,  0.1773  , -0.24408 ,
        1.1812  ,  0.65863 ,  0.77332 ,  0.40388 , -0.31354 ,  0.35177 ,
       -0.10074 , -1.6919  ,  0.70704 , -0.14594 ,  0.93264 ,  0.4056  ,
       -0.49499 ,  0.16782 , -1.5197  ,  1.0247  ,  1.282   , -0.33623 ,
        1.2153  , -0.065825, -1.2306  ,  1.4039  , -0.16776 , -0.40948 ,
       -0.92448 ,  0.99141 ,  1.5194  , -0.54659 ,  0.93013 ,  0.17938 ,
       -0.17086 , -0.42733 ,  0.75439 ,  1.4537  , -0.098187, -0.59428 ,
       -0.19965 , -0.49592 ])
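To make the file layout concrete, here is a minimal sketch that parses the same `word v1 v2 ...` format from an in-memory string and compares words by cosine similarity. The three-dimensional vectors are made up for illustration, and `load_vectors` and `cosine` are hypothetical helper names.

```python
import io
import numpy as np

# Toy 3-dimensional vectors in GloVe's text layout; the values are invented.
toy_glove = io.StringIO(
    "food 0.9 0.1 0.0\n"
    "dinner 0.8 0.2 0.1\n"
    "baseball 0.0 0.1 0.9\n"
)

def load_vectors(handle):
    """Parse 'word v1 v2 ...' lines into a dict of word -> float32 vector."""
    index = {}
    for line in handle:
        parts = line.split()
        if not parts:
            continue  # ignore empty lines
        index[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return index

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

vectors = load_vectors(toy_glove)
print(cosine(vectors["food"], vectors["dinner"]) > cosine(vectors["food"], vectors["baseball"]))  # True
```

For a real file, passing the handle from `with open(path, encoding="utf-8") as f:` to `load_vectors` closes the file automatically once parsing is done.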

### Step 4: Converting Sentences into Vectors (Embedding Layer Output) ###


In [16]:
#each GloVe vector has 50 entries, so emb_dim=50
#max_len=10 means only the first 10 words of each sentence are used

def embedding_output(X):
    max_len=10
    emb_dim=50
    embedding_out=np.zeros((X.shape[0],max_len,emb_dim))
    
    for i in range(X.shape[0]):
        #split the sentence into words (note: this overwrites X[i] with the token list)
        X[i]=X[i].split()
        for j in range(min(len(X[i]),max_len)):
            #look up each word of sentence i; words missing from GloVe keep a zero vector
            try:
                embedding_out[i][j]=embedding_index[X[i][j].lower()]
            except KeyError:
                embedding_out[i][j]=np.zeros((emb_dim,))
    return embedding_out

In [17]:
embedding_matrix_train=embedding_output(X_train)
embedding_matrix_test=embedding_output(X_test)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if sys.path[0] == '':
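The pandas warning above appears because the function writes the token lists back into the Series it receives. One possible variant, sketched here with a toy embedding index, keeps the tokens in a local variable so the input is left untouched. The name `embed_sentences` and the four-dimensional vectors are assumptions made for illustration.

```python
import numpy as np

def embed_sentences(sentences, embedding_index, max_len=10, emb_dim=50):
    """Return an array of shape (n_sentences, max_len, emb_dim) without modifying the input."""
    out = np.zeros((len(sentences), max_len, emb_dim))
    for i, sentence in enumerate(sentences):
        tokens = sentence.split()              # local variable, the input stays a string
        for j, token in enumerate(tokens[:max_len]):
            vec = embedding_index.get(token.lower())
            if vec is not None:                # unknown words keep the zero vector
                out[i, j] = vec
    return out

# Toy 4-dimensional index with made-up values, standing in for the GloVe dictionary.
toy_index = {"food": np.ones(4), "is": np.full(4, 0.5)}
sentences = ["Food is life", "hello"]
batch = embed_sentences(sentences, toy_index, max_len=3, emb_dim=4)
print(batch.shape)     # (2, 3, 4)
print(sentences[0])    # Food is life
```

With this variant the sentences remain plain strings, so any later code that expects token lists would need to call `split()` itself.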


In [18]:
print(X_train[1])
print(len(X_train[0]))

['I', 'am', 'proud', 'of', 'your', 'achievements']
5


In [19]:
print(embedding_matrix_train.shape)
print(embedding_matrix_test.shape)

(132, 10, 50)
(56, 10, 50)


In [20]:
X_train[0]

['never', 'talk', 'to', 'me', 'again']

In [21]:
from keras.utils import to_categorical

Using TensorFlow backend.


In [22]:
Y_train=to_categorical(Y_train,num_classes=5)
Y_test=to_categorical(Y_test,num_classes=5)
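For intuition, one-hot encoding turns each integer label into a row with a single 1 at the label's position. The sketch below reproduces that idea in plain NumPy; `one_hot` is a hypothetical helper and is not claimed to match the internals of `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer labels as rows with a single 1.0 at the label index."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Labels of the first five training sentences.
encoded = one_hot([3, 2, 3, 0, 4], num_classes=5)
print(encoded.shape)            # (5, 5)
print(encoded.argmax(axis=1))   # [3 2 3 0 4]
```

Taking `argmax` along axis 1 recovers the original labels, which is how the true emoji is looked up again when predictions are printed.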

### Step 5: Define the RNN/LSTM Model ###

In [79]:
from keras.models import Sequential
from keras.layers import LSTM,Dropout,Dense,Activation

In [80]:
#Stacked LSTM
model=Sequential()
model.add(LSTM(64,input_shape=(10,50),return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(64,return_sequences=False))
model.add(Dropout(0.5))
model.add(Dense(5))
model.add(Activation('softmax'))
model.compile(loss="categorical_crossentropy",optimizer='adam',metrics=['acc'])
model.summary()

Model: "sequential_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_13 (LSTM)               (None, 10, 64)            29440     
_________________________________________________________________
dropout_13 (Dropout)         (None, 10, 64)            0         
_________________________________________________________________
lstm_14 (LSTM)               (None, 64)                33024     
_________________________________________________________________
dropout_14 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_10 (Dense)             (None, 5)                 325       
_________________________________________________________________
activation_10 (Activation)   (None, 5)                 0         
Total params: 62,789
Trainable params: 62,789
Non-trainable params: 0
_________________________________________________
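The parameter counts in the summary can be checked by hand. An LSTM layer has four gates, each with input weights, recurrent weights, and a bias; a dense layer has one weight per input-output pair plus a bias per output. The helper names below are hypothetical and exist only for this arithmetic check.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """One weight per input-output pair plus one bias per output unit."""
    return input_dim * units + units

first_lstm = lstm_params(input_dim=50, units=64)    # 50-d GloVe vectors in
second_lstm = lstm_params(input_dim=64, units=64)   # fed by the first LSTM
dense = dense_params(input_dim=64, units=5)         # five emoji classes out

print(first_lstm, second_lstm, dense, first_lstm + second_lstm + dense)
# 29440 33024 325 62789
```

The Dropout and Activation layers contribute no parameters, which is why the total equals the sum of the three counts above.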

In [81]:
from keras.callbacks import EarlyStopping
from keras.callbacks import ModelCheckpoint

In [82]:
checkpoint=ModelCheckpoint('best_model.h5',monitor='val_acc',verbose=True,save_best_only=True,mode="auto")
#earlystop=EarlyStopping(monitor="val_acc",patience=10)
hist=model.fit(embedding_matrix_train,Y_train,epochs=150,batch_size=64,shuffle=True,validation_split=0.2,callbacks=[checkpoint])

Train on 105 samples, validate on 27 samples
Epoch 1/150

Epoch 00001: val_acc improved from -inf to 0.22222, saving model to best_model.h5
Epoch 2/150

Epoch 00002: val_acc did not improve from 0.22222
Epoch 3/150

Epoch 00003: val_acc did not improve from 0.22222
Epoch 4/150

Epoch 00004: val_acc did not improve from 0.22222
Epoch 5/150

Epoch 00005: val_acc did not improve from 0.22222
Epoch 6/150

Epoch 00006: val_acc did not improve from 0.22222
Epoch 7/150

Epoch 00007: val_acc improved from 0.22222 to 0.25926, saving model to best_model.h5
Epoch 8/150

Epoch 00008: val_acc did not improve from 0.25926
Epoch 9/150

Epoch 00009: val_acc did not improve from 0.25926
Epoch 10/150

Epoch 00010: val_acc did not improve from 0.25926
Epoch 11/150

Epoch 00011: val_acc did not improve from 0.25926
Epoch 12/150

Epoch 00012: val_acc did not improve from 0.25926
Epoch 13/150

Epoch 00013: val_acc did not improve from 0.25926
Epoch 14/150

Epoch 00014: val_acc did not improve from 0.25926
...


Epoch 00043: val_acc did not improve from 0.70370
Epoch 44/150

Epoch 00044: val_acc did not improve from 0.70370
Epoch 45/150

Epoch 00045: val_acc did not improve from 0.70370
Epoch 46/150

Epoch 00046: val_acc did not improve from 0.70370
Epoch 47/150

Epoch 00047: val_acc improved from 0.70370 to 0.77778, saving model to best_model.h5
Epoch 48/150

Epoch 00048: val_acc did not improve from 0.77778
Epoch 49/150

Epoch 00049: val_acc did not improve from 0.77778
Epoch 50/150

Epoch 00050: val_acc did not improve from 0.77778
Epoch 51/150

Epoch 00051: val_acc did not improve from 0.77778
Epoch 52/150

Epoch 00052: val_acc did not improve from 0.77778
Epoch 53/150

Epoch 00053: val_acc did not improve from 0.77778
Epoch 54/150

Epoch 00054: val_acc did not improve from 0.77778
Epoch 55/150

Epoch 00055: val_acc did not improve from 0.77778
Epoch 56/150

Epoch 00056: val_acc did not improve from 0.77778
Epoch 57/150

Epoch 00057: val_acc did not improve from 0.77778
Epoch 58/150

...


Epoch 00086: val_acc did not improve from 0.77778
Epoch 87/150

Epoch 00087: val_acc did not improve from 0.77778
Epoch 88/150

Epoch 00088: val_acc did not improve from 0.77778
Epoch 89/150

Epoch 00089: val_acc did not improve from 0.77778
Epoch 90/150

Epoch 00090: val_acc did not improve from 0.77778
Epoch 91/150

Epoch 00091: val_acc did not improve from 0.77778
Epoch 92/150

Epoch 00092: val_acc did not improve from 0.77778
Epoch 93/150

Epoch 00093: val_acc did not improve from 0.77778
Epoch 94/150

Epoch 00094: val_acc did not improve from 0.77778
Epoch 95/150

Epoch 00095: val_acc did not improve from 0.77778
Epoch 96/150

Epoch 00096: val_acc did not improve from 0.77778
Epoch 97/150

Epoch 00097: val_acc did not improve from 0.77778
Epoch 98/150

Epoch 00098: val_acc did not improve from 0.77778
Epoch 99/150

Epoch 00099: val_acc did not improve from 0.77778
Epoch 100/150

Epoch 00100: val_acc did not improve from 0.77778
Epoch 101/150

Epoch 00101: val_acc did not improve 


Epoch 00130: val_acc did not improve from 0.77778
Epoch 131/150

Epoch 00131: val_acc did not improve from 0.77778
Epoch 132/150

Epoch 00132: val_acc did not improve from 0.77778
Epoch 133/150

Epoch 00133: val_acc did not improve from 0.77778
Epoch 134/150

Epoch 00134: val_acc did not improve from 0.77778
Epoch 135/150

Epoch 00135: val_acc did not improve from 0.77778
Epoch 136/150

Epoch 00136: val_acc did not improve from 0.77778
Epoch 137/150

Epoch 00137: val_acc did not improve from 0.77778
Epoch 138/150

Epoch 00138: val_acc did not improve from 0.77778
Epoch 139/150

Epoch 00139: val_acc did not improve from 0.77778
Epoch 140/150

Epoch 00140: val_acc did not improve from 0.77778
Epoch 141/150

Epoch 00141: val_acc did not improve from 0.77778
Epoch 142/150

Epoch 00142: val_acc did not improve from 0.77778
Epoch 143/150

Epoch 00143: val_acc did not improve from 0.77778
Epoch 144/150

Epoch 00144: val_acc did not improve from 0.77778
Epoch 145/150

Epoch 00145: val_acc did
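The log shows long plateaus followed by late jumps in `val_acc`, which matters when choosing between saving the best model and stopping early. The sketch below mimics both behaviours on a list of per-epoch scores. The function `track_best` is hypothetical, and the history is illustrative: only the improvement values are taken from the log, the rest are invented.

```python
def track_best(val_accs, patience=None):
    """Return (best_epoch, best_value, last_epoch_run) for per-epoch scores.

    Epochs are 1-based. With patience=None every epoch is scanned, like a
    checkpoint that saves only on improvement. Otherwise scanning stops once
    `patience` consecutive epochs pass without a new best value.
    """
    best_value = float("-inf")
    best_epoch = 0
    for epoch, value in enumerate(val_accs, start=1):
        if value > best_value:
            best_value, best_epoch = value, epoch
        elif patience is not None and epoch - best_epoch >= patience:
            return best_epoch, best_value, epoch
    return best_epoch, best_value, len(val_accs)

# Illustrative history, not the real run.
history = [0.22222, 0.2, 0.2, 0.25926, 0.25, 0.7037, 0.6, 0.6, 0.6, 0.77778, 0.7]
print(track_best(history))               # (10, 0.77778, 11)
print(track_best(history, patience=3))   # (6, 0.7037, 9)
```

In this toy history a patience of 3 stops at epoch 9 and misses the later improvement at epoch 10, so a small patience can cut training short on a noisy validation set of only 27 samples.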

In [83]:
!ls

EmojiPredictor.ipynb
README.md
best_model.h5
dataset


In [84]:
model.load_weights('best_model.h5')

In [85]:
model.evaluate(embedding_matrix_test,Y_test)



[1.408058864729745, 0.6428571343421936]
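`model.evaluate` returns the loss followed by the accuracy metric. As a small sanity check, the accuracy can be converted back into a number of correctly classified test sentences, using the 56 test rows reported earlier.

```python
# Values copied from the evaluate output above.
loss, accuracy = 1.408058864729745, 0.6428571343421936
n_test = 56   # rows in embedding_matrix_test

correct = round(accuracy * n_test)
print(correct, n_test - correct)                  # 36 20
print(abs(correct / n_test - accuracy) < 1e-6)    # True
```

So 36 of the 56 test sentences are classified correctly, compared with a chance level of 1 in 5 for five classes.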

In [86]:
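#predict_classes is not available in newer Keras releases; argmax over the softmax output gives the same labels
pred=np.argmax(model.predict(embedding_matrix_test),axis=1)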
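Class labels come from the softmax output by taking the index of the largest probability in each row. The toy example below uses made-up probabilities for three sentences. In newer Keras releases where `Sequential.predict_classes` is unavailable, `np.argmax(model.predict(x), axis=1)` plays the same role.

```python
import numpy as np

# Made-up softmax outputs for three sentences over the five emoji classes.
probs = np.array([
    [0.05, 0.05, 0.10, 0.10, 0.70],
    [0.10, 0.10, 0.15, 0.60, 0.05],
    [0.50, 0.05, 0.35, 0.05, 0.05],
])

pred_labels = np.argmax(probs, axis=1)   # index of the largest probability per row
print(pred_labels)                        # [4 3 0]
```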

In [88]:
embedding_matrix_test.shape

(56, 10, 50)

In [89]:
X_test.shape

(56,)

In [91]:
" ".join(X_test[0])

'I want to eat'

In [92]:
Y_test.shape

(56, 5)

In [94]:
for i in range(X_test.shape[0]):
    print(" ".join(X_test[i]))
    print("TRUE EMOJI:"+emoji.emojize(emoji_dictionary[str(np.argmax(Y_test[i]))]))
    print("Predicted"+emoji.emojize(emoji_dictionary[str(pred[i])]))
    print()


I want to eat
TRUE EMOJI:🍴
Predicted🍴

he did not answer
TRUE EMOJI:😓
Predicted😓

he got a raise
TRUE EMOJI:😁
Predicted😁

she got me a present
TRUE EMOJI:❤️
Predicted❤️

ha ha ha it was so funny
TRUE EMOJI:😁
Predicted😁

he is a good friend
TRUE EMOJI:❤️
Predicted😁

I am upset
TRUE EMOJI:❤️
Predicted😓

We had such a lovely dinner tonight
TRUE EMOJI:❤️
Predicted😁

where is the food
TRUE EMOJI:🍴
Predicted🍴

Stop making this joke ha ha ha
TRUE EMOJI:😁
Predicted😁

where is the ball
TRUE EMOJI:⚾
Predicted⚾

work is hard
TRUE EMOJI:😓
Predicted😁

This girl is messing with me
TRUE EMOJI:😓
Predicted❤️

are you serious ha ha
TRUE EMOJI:😁
Predicted😁

Let us go play baseball
TRUE EMOJI:⚾
Predicted⚾

This stupid grader is not working
TRUE EMOJI:😓
Predicted😓

work is horrible
TRUE EMOJI:😓
Predicted😓

Congratulation for having a baby
TRUE EMOJI:😁
Predicted😁

stop messing around
TRUE EMOJI:😓
Predicted😓

any suggestions for dinner
TRUE EMOJI:🍴
Predicted😁

I love taking breaks
TRUE EMOJI:❤️
Predicted❤️
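To see which classes get confused, the true and predicted labels can be tallied. The pairs below are transcribed from the 21 sentences listed above (not the full 56-sentence test set), using the label numbering from `emoji_dictionary`.

```python
from collections import Counter

# (true label, predicted label) for the 21 sentences shown above, in order.
# 0 heart, 1 baseball, 2 beaming face, 3 downcast face, 4 fork and knife.
pairs = [
    (4, 4), (3, 3), (2, 2), (0, 0), (2, 2), (0, 2), (0, 3),
    (0, 2), (4, 4), (2, 2), (1, 1), (3, 2), (3, 0), (2, 2),
    (1, 1), (3, 3), (3, 3), (2, 2), (3, 3), (4, 2), (0, 0),
]

confusion = Counter(pairs)
correct = sum(count for (true, pred), count in confusion.items() if true == pred)
print(correct, len(pairs))   # 15 21

# List only the misclassified combinations.
for (true, pred), count in sorted(confusion.items()):
    if true != pred:
        print(f"true={true} pred={pred}: {count}")
```

In this sample the heart class is the weakest: three of its five sentences are predicted as another emoji, two of them as the beaming face.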

