## Task 1: Introduction

Welcome to **Sentiment Analysis with Keras and TensorFlow**.

![Sentiment Analysis](images/basic_sentiment_analysis.png)


## Task 2: The IMDB Reviews Dataset


In [5]:
import tensorflow
from tensorflow.keras.datasets import imdb
# Keep only the 10,000 most frequent words; rarer words become the "unknown" token
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)

In [6]:
print(x_train[0]) # each review is encoded as a list of integer token IDs

[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]


In [7]:
print(y_train[0]) # 0 represents negative review and 1 represents positive review

1


In [8]:
class_names = ['Negative', 'Positive']

In [9]:
word_index = imdb.get_word_index() # dictionary mapping each word to its frequency rank
print(word_index['bad']) # 'bad' has rank 75 in word_index (IDs in x_train are shifted by 3 under the load_data defaults)

75
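
The rank printed above is a position in `word_index`, which is not necessarily the ID used inside `x_train`. Assuming `imdb.load_data` was called with its default arguments (`start_char=1`, `oov_char=2`, `index_from=3`), IDs 0, 1 and 2 are reserved for padding, start-of-sequence and unknown words, and every real word is stored as its rank plus 3. The following is a minimal sketch of offset-aware decoding. `toy_word_index`, `SPECIAL_TOKENS` and `decode_with_offset` are hypothetical names, and the toy dictionary stands in for the real one so the example runs without downloading the dataset.

```python
# Hypothetical stand-in for imdb.get_word_index(); the real dictionary
# maps every word in the corpus to its frequency rank, starting at 1.
toy_word_index = {"the": 1, "and": 2, "a": 3, "film": 4, "brilliant": 5}

INDEX_FROM = 3  # assumed default offset applied by imdb.load_data
SPECIAL_TOKENS = {0: "<pad>", 1: "<start>", 2: "<unk>"}

# Invert the dictionary: rank -> word
reverse_toy_index = {rank: word for word, rank in toy_word_index.items()}

def decode_with_offset(review):
    """Map encoded token IDs back to words, undoing the index offset."""
    words = []
    for token in review:
        if token in SPECIAL_TOKENS:
            words.append(SPECIAL_TOKENS[token])
        else:
            # Unknown ranks fall back to a placeholder instead of raising KeyError.
            words.append(reverse_toy_index.get(token - INDEX_FROM, "<unk>"))
    return " ".join(words)

decoded = decode_with_offset([1, 4, 7, 8, 2, 0])
print(decoded)  # <start> the film brilliant <unk> <pad>
```

With the real data, the same idea would amount to looking up `i - 3` in the reversed `word_index` for every ID `i` greater than 2.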


## Task 3: Decoding the Reviews


In [10]:
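reverse_word_index = dict((value, key) for key, value in word_index.items()) # reverse the key-value pairs: index -> word

def decode(review): # review is a list of token IDs
    # Note: IDs are looked up directly. With the load_data defaults the IDs in
    # x_train are shifted by 3 relative to word_index (0, 1 and 2 are reserved),
    # so the decoded words do not reproduce the original review text.
    text = ''
    for i in review: # i is a token ID
        text += reverse_word_index[i]
        text += ' '
    return text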

In [11]:
decode(x_train[0]) # words existing in the first training example

"the as you with out themselves powerful lets loves their becomes reaching had journalist of lot from anyone to have after out atmosphere never more room and it so heart shows to years of every never going and help moments or of every chest visual movie except her was several of enough more with is now current film as you of mine potentially unfortunately of you than him that with out themselves her get for was camp of you movie sometimes movie that with scary but and to story wonderful that in seeing in character to of 70s musicians with heart had shadows they of here that with her serious to have does when from why what have critics they is you that isn't one will very to as itself with other and in of seen over landed for anyone of and br show's to whether from than out themselves history he name half some br of and odd was two most of mean for 1 any an boat she he should is thought frog but of script you not while history he heart to real at barrel but when from one bit then have t

In [12]:
def show_length():
    print('Length of first training example', len(x_train[0]))
    print('Length of second training example', len(x_train[1]))
    print('Length of first test example', len(x_test[0]))
    print('Length of second test example', len(x_test[1]))
    
show_length()

Length of first training example 218
Length of second training example 189
Length of first test example 68
Length of second test example 260



## Task 4: Padding the Examples


In [13]:
word_index['the']

1

In [15]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Pad every review at the end ('post') with the value word_index['the'] (1) so all have 256 tokens
x_train = pad_sequences(x_train, value = word_index['the'], padding = 'post', maxlen = 256)
x_test = pad_sequences(x_test, value = word_index['the'], padding = 'post', maxlen = 256)

In [16]:
show_length()

Length of first training example 256
Length of second training example 256
Length of first test example 256
Length of second test example 256
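
To make the effect of padding concrete, here is a minimal pure-Python sketch of what `pad_sequences(..., padding='post', maxlen=...)` is assumed to do to a single review. The helper `pad_post` is hypothetical. It also assumes the default `truncating='pre'`, under which a review longer than `maxlen` loses tokens from its beginning rather than its end.

```python
def pad_post(sequence, maxlen, value=0):
    """Pad at the end up to maxlen; drop leading tokens if the input is too long."""
    # Assumed to mirror padding='post' combined with the default truncating='pre',
    # which keeps the LAST maxlen tokens of an over-long sequence.
    trimmed = sequence[-maxlen:]
    return trimmed + [value] * (maxlen - len(trimmed))

short_review = pad_post([5, 6, 7], maxlen=5, value=1)
long_review = pad_post([5, 6, 7, 8, 9, 10], maxlen=5, value=1)

print(short_review)  # [5, 6, 7, 1, 1]
print(long_review)   # [6, 7, 8, 9, 10]
```

Under these assumptions, the second test example (260 tokens before padding) would have its first 4 tokens removed to fit the 256-token limit.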


In [17]:
decode(x_train[0])

"the as you with out themselves powerful lets loves their becomes reaching had journalist of lot from anyone to have after out atmosphere never more room and it so heart shows to years of every never going and help moments or of every chest visual movie except her was several of enough more with is now current film as you of mine potentially unfortunately of you than him that with out themselves her get for was camp of you movie sometimes movie that with scary but and to story wonderful that in seeing in character to of 70s musicians with heart had shadows they of here that with her serious to have does when from why what have critics they is you that isn't one will very to as itself with other and in of seen over landed for anyone of and br show's to whether from than out themselves history he name half some br of and odd was two most of mean for 1 any an boat she he should is thought frog but of script you not while history he heart to real at barrel but when from one bit then have t

## Task 5: Word Embeddings
Word Embeddings:

![Word Embeddings](images/word_embeddings.png)

Feature Vectors:

![Learned Embeddings](images/embeddings.png)
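
As a rough illustration of the two ideas pictured above, the sketch below treats an embedding as a lookup table with one feature vector per token ID, then averages the vectors of a review into a single vector. The sizes and the random `embedding_matrix` are toy assumptions; in a trained model these values are learned rather than sampled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration only
vocab_size, embedding_dim = 10, 4
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

# One "review" of 6 token IDs, shaped (batch, sequence_length)
tokens = np.array([[1, 4, 7, 8, 2, 0]])

# Embedding lookup is plain row indexing: each ID selects one row
embedded = embedding_matrix[tokens]   # shape (1, 6, 4)

# Averaging over the sequence axis yields one feature vector per review
pooled = embedded.mean(axis=1)        # shape (1, 4)

print(embedded.shape, pooled.shape)   # (1, 6, 4) (1, 4)
```

Because the average ignores word order, a model built this way behaves like a bag-of-words classifier over learned feature vectors.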


## Task 6: Creating and Training the Model


In [18]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, GlobalAveragePooling1D

model = Sequential([
    Embedding(10000, 16), # first layer: vocabulary size is 10000, feature vector size is 16
    GlobalAveragePooling1D(), # averages the feature vectors over the sequence
    Dense(16, activation = 'relu'), # hidden layer with ReLU activation
    Dense(1, activation = 'sigmoid') # output layer: sigmoid gives the probability of a positive review
])
model.compile(
    loss = 'binary_crossentropy',
    optimizer = 'adam', #variant of stochastic gradient descent
    metrics = ['accuracy']
)

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 16)          160000    
_________________________________________________________________
global_average_pooling1d (Gl (None, 16)                0         
_________________________________________________________________
dense (Dense)                (None, 16)                272       
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 17        
Total params: 160,289
Trainable params: 160,289
Non-trainable params: 0
_________________________________________________________________
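
The parameter counts in the summary can be checked by hand. The short calculation below uses only the layer sizes given in the model definition; the variable names are illustrative.

```python
vocab_size, embedding_dim, hidden_units = 10000, 16, 16

embedding_params = vocab_size * embedding_dim                # one 16-d vector per token ID
pooling_params = 0                                           # averaging has no weights
hidden_params = embedding_dim * hidden_units + hidden_units  # weights + biases
output_params = hidden_units * 1 + 1                         # weights + bias

total = embedding_params + pooling_params + hidden_params + output_params
print(embedding_params, hidden_params, output_params, total)  # 160000 272 17 160289
```

Nearly all of the parameters sit in the embedding table, which is typical for small text models with a large vocabulary.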


In [19]:
from tensorflow.keras.callbacks import LambdaCallback
# Print the epoch index after each epoch (e is the epoch, l is the logs dictionary)
simple_log = LambdaCallback(on_epoch_end = lambda e, l: print(e, end ='.'))
E = 20 # number of epochs
h = model.fit(
    x_train, y_train,
    validation_split =0.2,
    epochs = E,
    callbacks= [simple_log],
    verbose = False
)

0.1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.
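
The `validation_split = 0.2` argument holds out part of the training data for validation. Keras documents that this portion is taken from the end of the arrays, before any shuffling. The sketch below imitates that behavior on a toy array; `split_tail` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def split_tail(x, y, validation_split):
    """Hold out the last fraction of the samples for validation."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = np.arange(10).reshape(10, 1)   # 10 toy samples
y = np.arange(10) % 2              # toy binary labels
(x_fit, y_fit), (x_val, y_val) = split_tail(x, y, validation_split=0.2)

print(len(x_fit), len(x_val))      # 8 2
print(x_val.ravel().tolist())      # [8, 9]
```

Assuming the standard 25,000 training reviews, the same split would leave 20,000 reviews for fitting and 5,000 for validation.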

## Task 7: Predictions and Evaluation



In [20]:
import matplotlib.pyplot as plt 
%matplotlib inline

# TensorFlow 2 stores the metric under 'accuracy' and 'val_accuracy' (not 'acc' and 'val_acc')
plt.plot(range(E), h.history['accuracy'], label = 'Training')
plt.plot(range(E), h.history['val_accuracy'], label = 'Validation')
plt.legend()
plt.show()

In [None]:
# Training set accuracy
loss, acc = model.evaluate(x_train, y_train)
print('Training set accuracy (%): ', acc*100)
# Test set accuracy
loss, acc = model.evaluate(x_test, y_test)
print('Test set accuracy (%): ', acc*100)

In [None]:
import numpy as np

pred = model.predict(np.expand_dims(x_test[0],axis=0))
print(class_names[int(pred[0][0] > 0.5)]) # threshold the sigmoid output; argmax over a single value would always give index 0
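
A model with a single sigmoid output returns one probability per review, so the class comes from comparing that probability with a threshold rather than from `argmax`. The sketch below uses a made-up prediction value to show the difference; the 0.5 threshold is a common default, assumed here rather than taken from the notebook.

```python
import numpy as np

class_names = ['Negative', 'Positive']

# Hypothetical model output for a batch of one review: shape (1, 1)
pred = np.array([[0.93]])

# argmax over a length-1 vector is always 0, whatever the probability
argmax_label = class_names[np.argmax(pred[0])]

# Thresholding the probability at 0.5 gives the intended label
threshold_label = class_names[int(pred[0][0] > 0.5)]

print(argmax_label, threshold_label)  # Negative Positive
```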

In [None]:
decode(x_test[0])