## Machine Learning Approach for Sentiment Classification 

In [1]:
# first we import all the necessary packages for the tutorial
    # for more information, please check the helperfunctions.py file

from helperfunctions import *

import numpy as np
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv1D, MaxPooling1D, Embedding
from keras.preprocessing import sequence

Using TensorFlow backend.


Total running time:  3.899999999923409e-05


In [2]:
# we initialize the random number generator to a constant value so that we can easily reproduce results
seed = 7
np.random.seed(seed)

In [3]:
# load the IMDB dataset and bring every review to a fixed length of 500 words
    # we use the Keras pad_sequences utility so that all documents have the same length
    # by default, longer sequences are truncated and shorter sequences are padded with zeros,
    # both at the beginning of the sequence (padding='pre', truncating='pre')
# we focus only on the 5000 most frequent words in the dataset
    # num_words=top_words keeps these words and maps all rarer words to an out-of-vocabulary index

top_words = 5000
max_words = 500

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=top_words)
x_train = sequence.pad_sequences(x_train, maxlen=max_words)
x_test = sequence.pad_sequences(x_test, maxlen=max_words)

print(x_train.shape)
print(x_test.shape)
print("Documents adjusted: all reviews have the same length of",max_words,"words.")

(25000, 500)
(25000, 500)
Documents adjusted: all reviews have the same length of 500 words.
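
To make the padding behaviour concrete, the following pure-Python sketch mimics what `pad_sequences` does with its default settings (`padding='pre'`, `truncating='pre'`). The helper `pad_pre` and the toy reviews are hypothetical and exist only for illustration; they are not part of Keras.

```python
def pad_pre(sequences, maxlen, value=0):
    """Illustrative stand-in for pad_sequences with 'pre' padding and truncating."""
    padded = []
    for seq in sequences:
        # keep only the last `maxlen` tokens (truncation removes tokens from the front)
        seq = list(seq)[-maxlen:]
        # fill up with `value` at the front until the sequence has `maxlen` entries
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

toy_reviews = [[11, 12, 13, 14, 15], [21, 22]]
print(pad_pre(toy_reviews, maxlen=4))
# → [[12, 13, 14, 15], [0, 0, 21, 22]]
```

Because the zeros end up in front, the last words of a short review always sit at the end of the padded sequence.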


## Keras Embedding Layer

In [4]:
# Keras Embedding Layer
    # it requires integer-encoded input, so that each word is represented by a unique integer
    # the Embedding layer is initialized with random weights and learns an embedding for every word in the vocabulary during training

# below we define an Embedding layer with a vocabulary of 5000
    # integer-encoded words from 0 to 4999,
    # a vector space of 32 dimensions in which the words are embedded,
    # and input documents of 500 words each
Embedding(5000, 32, input_length=500)

<keras.layers.embeddings.Embedding at 0x1a3427c2b0>
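
Conceptually, an embedding layer is a trainable lookup table with one row per word index. The sketch below illustrates the lookup in plain Python; the function `embed`, the toy review and the initial value range are assumptions made for this example and are not taken from the Keras implementation.

```python
import random

random.seed(7)  # fixed seed so the toy table is reproducible

vocab_size, embedding_dim = 5000, 32

# one randomly initialised row of 32 weights per word index (the trainable lookup table)
embedding_table = [[random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
                   for _ in range(vocab_size)]

def embed(token_ids):
    """Replace every integer token by its row of the lookup table."""
    return [embedding_table[t] for t in token_ids]

review = [0, 0, 14, 22, 16]           # toy integer-encoded (and padded) review
vectors = embed(review)
print(len(vectors), len(vectors[0]))  # → 5 32
print(vocab_size * embedding_dim)     # → 160000
```

The second number is the count of trainable weights of such a table: 5000 words times 32 dimensions.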

## Multilayer Perceptron Model

In [5]:
print('Build MLP model...')
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_words))
model.add(Flatten())
model.add(Dense(250, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# we use binary cross-entropy loss because this is a binary classification problem

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

Build MLP model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 500, 32)           160000    
_________________________________________________________________
flatten_1 (Flatten)          (None, 16000)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 250)               4000250   
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 251       
Total params: 4,160,501
Trainable params: 4,160,501
Non-trainable params: 0
_________________________________________________________________
None
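
The parameter counts in the summary can be checked by hand. The short calculation below is only a sanity check of the printed numbers; the variable names are chosen for this example.

```python
vocab_size, embedding_dim, review_length = 5000, 32, 500

embedding_params = vocab_size * embedding_dim     # one 32-d vector per word
flattened_size = review_length * embedding_dim    # 500 x 32 values per review
dense_1_params = flattened_size * 250 + 250       # weights + biases of the hidden layer
dense_2_params = 250 * 1 + 1                      # weights + bias of the output unit

total = embedding_params + dense_1_params + dense_2_params
print(embedding_params, flattened_size, dense_1_params, dense_2_params, total)
# → 160000 16000 4000250 251 4160501
```

Almost all parameters sit in the first Dense layer, because it is connected to all 16000 flattened embedding values.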


In [6]:
# Fit the model
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=2, batch_size=128, verbose=2)

Train on 25000 samples, validate on 25000 samples
Epoch 1/2
 - 31s - loss: 0.5124 - acc: 0.7084 - val_loss: 0.3435 - val_acc: 0.8492
Epoch 2/2
 - 35s - loss: 0.1929 - acc: 0.9260 - val_loss: 0.3006 - val_acc: 0.8740


<keras.callbacks.History at 0x1a30c21b70>

In [7]:
scores = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 87.40%
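
The two quantities reported during training, loss and accuracy, can be reproduced by hand for a few toy predictions. This is a minimal sketch with made-up labels and sigmoid outputs; Keras additionally clips the probabilities away from 0 and 1 for numerical stability, which is omitted here.

```python
import math

def binary_crossentropy(y_true, y_prob):
    """Mean binary cross-entropy over all examples."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for y, p in zip(y_true, y_prob)]
    return sum(losses) / len(losses)

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of examples whose thresholded prediction matches the label."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    return sum(int(y == p) for y, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 0, 1, 0]            # toy labels: 1 = positive, 0 = negative
y_prob = [0.9, 0.2, 0.4, 0.1]    # toy sigmoid outputs

print(round(binary_crossentropy(y_true, y_prob), 4))
print(accuracy(y_true, y_prob))  # → 0.75
```

The third review is predicted with probability 0.4 although it is positive, so it counts as an error for the accuracy and contributes the largest term to the loss.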


____
## Convolutional Neural Networks
* How do we go beyond single words (to sentences and paragraphs)?<br><br>
* This turns out to be a very hard problem.<br><br>
* Simple Approaches:
   * Word Vector Averaging
   * Weighted Word Vector Averaging
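
Both simple approaches can be sketched in a few lines. The 3-dimensional word vectors and the weights below are hand-picked, hypothetical values; in practice the vectors would come from a trained embedding and the weights from a scheme such as TF-IDF.

```python
# hypothetical 3-dimensional word vectors, chosen by hand for illustration
word_vectors = {
    "great": [1.0, 0.0, 2.0],
    "movie": [0.0, 1.0, 0.0],
    "the":   [0.5, 0.5, 1.0],
}

def average_vector(words, weights=None):
    """(Weighted) average of the word vectors of a sentence."""
    if weights is None:
        weights = [1.0] * len(words)   # plain averaging: every word counts equally
    total_weight = sum(weights)
    dim = len(word_vectors[words[0]])
    return [sum(w * word_vectors[word][d] for word, w in zip(words, weights)) / total_weight
            for d in range(dim)]

sentence = ["the", "great", "movie"]
print(average_vector(sentence))                           # → [0.5, 0.5, 1.0]
print(average_vector(sentence, weights=[0.0, 3.0, 1.0]))  # → [0.75, 0.25, 1.5]
```

Either way, the word order is lost: any permutation of the sentence yields the same vector, which is one motivation for the convolutional model that follows.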

____
#### Features
* Excellent feature extractors<br>
* Features are detected regardless of their position in the image<br>
* CNNs for text (NLP): Collobert et al. 2011<br>

<img src="images/CNN_arc.png" alt="CNN_arc" style="width: 900px;"/>
____

### Architecture of the Network:
* Convolutional Layer: <br>
    * In image analysis, our filters slide over local patches of an image, but in NLP we typically use filters that <br>
    slide over full rows of the matrix (words)<br>
    * CNN learns the values of these filters on its own during the training process!<br><br>
* Max Pooling: <br>
    * reduces the spatial size of the input representation<br>
    * pooling makes the input representations (feature dimension) smaller and more manageable<br>
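
The two layer types can be traced on a toy example. The sketch below applies a single hand-picked filter to a six-word "sentence" with 2-dimensional embeddings, using zero padding so the output keeps the input length, and then pools pairs of neighbouring positions. The function names and all numbers are assumptions made for illustration; a real layer learns many such filters.

```python
def conv1d_same(sequence, kernel, bias=0.0):
    """One filter sliding over whole word vectors; zero padding keeps the sequence length."""
    k = len(kernel)                      # kernel_size: words per window
    dim = len(sequence[0])               # embedding dimension
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    padded = [[0.0] * dim] * pad_left + sequence + [[0.0] * dim] * pad_right
    output = []
    for start in range(len(sequence)):
        window = padded[start:start + k]
        # dot product between the window and the kernel over all words and dimensions
        z = bias + sum(w * x for row_k, row_x in zip(kernel, window)
                       for w, x in zip(row_k, row_x))
        output.append(max(0.0, z))       # ReLU activation
    return output

def max_pool1d(values, pool_size=2):
    """Keep the maximum of every block of `pool_size` neighbouring positions."""
    return [max(values[i:i + pool_size])
            for i in range(0, len(values) - pool_size + 1, pool_size)]

# toy input: 6 words, each embedded in 2 dimensions
sentence = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [0.0, 0.0], [1.0, 1.0], [3.0, 0.0]]
# one hand-picked filter covering 3 words x 2 dimensions
kernel = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

features = conv1d_same(sentence, kernel)
pooled = max_pool1d(features)
print(features)  # → [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
print(pooled)    # → [0.0, 1.0, 1.0]
```

The filter always spans the full embedding dimension and only moves along the word axis, which is what "sliding over full rows of the matrix" means.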
____

### CNN for NLP Example

#### 1. Step:
<img src="images/CNN1.png" alt="CNN_arc" style="width: 600px;"/>
____

#### 2. Step:
<img src="images/CNN3.png" alt="CNN_arc" style="width: 600px;"/>
____

#### 3. Step:
<img src="images/CNN2.png" alt="CNN_arc" style="width: 600px;"/>
____

#### 4. Step:
<img src="images/CNN4.png" alt="CNN_arc" style="width: 600px;"/>
____

#### 5. Step:
<img src="images/CNN5.png" alt="CNN_arc" style="width: 600px;"/>
____

#### 6. Step:
<img src="images/CNNa.jpeg" alt="CNN_arc" style="width: 600px;"/>
____

#### 7. Step:
<img src="images/CNNb.jpeg" alt="CNN_arc" style="width: 600px;"/>
____

#### 8. Step:
<img src="images/CNN6.png" alt="CNN_arc" style="width: 600px;"/>
____

#### 9. Step:
<img src="images/CNN7.png" alt="CNN_arc" style="width: 600px;"/>
____

#### CNN Model

In [8]:
# Model
# Embedding: turns positive integers (indexes) into dense vectors of fixed size
# Conv1D:
    # filters: number of filters, i.e. the dimensionality of the output space (32)
    # kernel_size: each filter reads a window of 3 consecutive embedded words at a time
    # padding: "same" pads the input so that the output has the same length as the input
    # activation: 'relu' is cheap to compute and usually speeds up training
# MaxPooling1D:
    # pool_size: 2 keeps the maximum of every 2 neighbouring positions, halving the sequence length
    # and thereby reducing the number of parameters in the following Dense layer
# Flatten:
    # flattens the 2D output matrix of the pooling layer to a 1D vector so that a Dense layer can follow
# Dense (sigmoid): the sigmoid activation produces a float between 0 and 1

print('Build CNN model...')
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_words))
model.add(Conv1D(filters=32, 
                 kernel_size=3, 
                 padding='same', 
                 activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(250, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# we used binary cross entropy loss here because it is a binary classification problem

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

Build CNN model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 500, 32)           160000    
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 500, 32)           3104      
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 250, 32)           0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 8000)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 250)               2000250   
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 251       
Total params: 2,163,605
Trainable params: 2,163,605
Non-trainable params: 0
_______________________________________________
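
As for the MLP, the numbers in the summary can be verified with a short calculation. The variable names are chosen for this example only.

```python
embedding_dim, review_length = 32, 500
filters, kernel_size, pool_size = 32, 3, 2

# each filter has kernel_size x embedding_dim weights plus one bias
conv_params = kernel_size * embedding_dim * filters + filters
# padding='same' keeps 500 positions, pooling halves them, Flatten concatenates the rest
flattened_size = (review_length // pool_size) * filters
dense_params = flattened_size * 250 + 250

total = 5000 * embedding_dim + conv_params + dense_params + (250 + 1)
print(conv_params, flattened_size, dense_params, total)
# → 3104 8000 2000250 2163605
```

The convolution itself adds only 3104 parameters, while the pooling step halves the input of the Dense layer, so the CNN has roughly half as many parameters as the MLP.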

In [9]:
# epochs: 2 = the model passes through the full training set 2 times
# batch_size: the number of training examples in one forward/backward pass
# verbose: 2 = one line per epoch
    # running the example (validation accuracy of 88.77%) offers a small improvement over the MLP model above

history = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=2, batch_size=128, verbose=2)

Train on 25000 samples, validate on 25000 samples
Epoch 1/2
 - 37s - loss: 0.5125 - acc: 0.7139 - val_loss: 0.2862 - val_acc: 0.8810
Epoch 2/2
 - 36s - loss: 0.2244 - acc: 0.9127 - val_loss: 0.2707 - val_acc: 0.8877


In [10]:
scores = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 88.77%


#### Results:
* Running the example offers an improvement over the MLP above (87.40%), with an accuracy of 88.77%.

In [11]:
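score, acc = model.evaluate(x_test, y_test)
# predict_classes was removed from recent Keras releases; thresholding the sigmoid output at 0.5 is equivalent
preds = (model.predict(x_test) > 0.5).astype("int32").ravel()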



#### Confusion Matrix
* Calculating a confusion matrix can give you a better idea of what your classification model is getting right and what types of errors it is making

In [12]:
# we set up and plot a confusion matrix to get a better understanding of the evaluation
import itertools
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix', cmap=plt.cm.Blues):
    # normalize each row first so that the image and the cell labels show the same values
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    print(cm)

    plt.figure()
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # write each value into its cell, in white on dark cells and in black on light ones
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j], horizontalalignment="center", color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

# plot the confusion matrix; class names are listed in label order (0 = negative, 1 = positive)
cm = confusion_matrix(y_test, preds)
plot_confusion_matrix(cm, ['negative', 'positive'])

#### Results:
   * Total of 2841 wrong classifications and 22159 correct classifications<br><br>
   * The classifier predicted "negative" 12503 times and "positive" 12497 times<br><br>
   * True negative: 11081 | False positive: 1419<br><br>
   * False negative: 1422 | True positive: 11078<br><br>
   * Accuracy: 22159 / 25000 = 88.64% (slightly below the 88.77% printed above, presumably because these counts come from a separate training run)
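
The four counts of the confusion matrix are enough to derive the usual evaluation metrics. The following sketch recomputes the totals listed above and adds precision, recall and F1 for the "positive" class; only the counts are taken from the results, the rest is standard arithmetic.

```python
tn, fp = 11081, 1419   # reviews whose true label is "negative"
fn, tp = 1422, 11078   # reviews whose true label is "positive"

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of predicted positives that are truly positive
recall = tp / (tp + fn)      # share of true positives that were found
f1 = 2 * precision * recall / (precision + recall)

print(total, tp + tn, fp + fn)   # → 25000 22159 2841
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Since false positives (1419) and false negatives (1422) are almost equal, the classifier shows no noticeable bias towards either class.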

### Summary

* #### In this tutorial, we discovered the topic of Sentiment Analysis with the Keras IMDB dataset.<br><br>

* #### We learned how to develop deep learning models for sentiment analysis including:
    * How to handle the basic dictionary approach for sentiment analysis<br><br>
    * How to load, review and analyze the IMDB dataset within Keras<br><br>
    * How to use and build word embeddings with the Keras Embedding Layer for deep learning<br><br>
    * How to develop a one-dimensional CNN model for sentiment analysis and how it works for NLP<br><br>
    
* #### How to continue with this tutorial?
    * Try to experiment with hyperparameters such as the number of filters and the kernel size in the convolutional layer<br><br>
    * You can also experiment with several stacked convolutional and max-pooling layers, etc.<br><br>
    * Try to obtain a higher accuracy

___
### Limitations and further Topics

* CNNs with small filters mainly capture local patterns and struggle to encode long-range dependencies. For language modeling tasks where long-distance dependencies matter, other architectures are therefore preferred:<br><br>
    * Recurrent Neural Networks (RNN)<br><br>
    * Long Short-Term Memory (LSTM)
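
One way to quantify how local the convolutional features are is the receptive field, i.e. the number of input words that influence a single output position. The helper below is a hypothetical sketch using the standard recurrence for stacked layers, each described by its kernel size and stride; it assumes that the pooling stride equals the pool size, as in the model of this tutorial.

```python
def receptive_field(layers):
    """Number of input words seen by one output position of a stack of (kernel_size, stride) layers."""
    field, jump = 1, 1
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump   # every layer widens the window
        jump *= stride                      # strides > 1 spread later windows further apart
    return field

conv, pool = (3, 1), (2, 2)   # Conv1D(kernel_size=3) and MaxPooling1D(pool_size=2)

print(receptive_field([conv, pool]))              # → 4
print(receptive_field([conv, pool, conv, pool]))  # → 10
```

With one convolution and one pooling layer, each feature covers only 4 of the 500 words of a review. Stacking more layers widens the window only gradually, which is why recurrent architectures are often considered when distant words have to be related.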
