## Sentiment Analysis using LSTM

#### DESCRIPTION

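Sentiment analysis means deciding whether a piece of text expresses a positive or negative opinion. It is one of the most common problems companies work on (for example, when analyzing product reviews or customer feedback), and one of the most widely used applications of natural language processing (NLP).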

#### Objective: 
Use an LSTM network to perform sentiment analysis in Keras.
#### Note: 
Use the built-in IMDB dataset from keras.datasets for this task. Each movie review is labeled 1 (positive) or 0 (negative).

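Set the vocabulary size and load the training and test data. Only word indices below vocabulary_size are kept (roughly the 5,000 most frequent words). Rarer words are replaced by an out-of-vocabulary marker.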

In [None]:
from keras.datasets import imdb

vocabulary_size = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words = vocabulary_size)
print('Loaded dataset with {} training samples, {} test samples'.format(len(X_train), len(X_test)))
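To make the encoding concrete, here is a minimal pure-Python sketch of how a review becomes a list of integers. It assumes the default arguments of `imdb.load_data` (`start_char=1`, `oov_char=2`, `index_from=3`). The function `encode_review` and the tiny `toy_word_index` are hypothetical and only illustrate the scheme. The real word index comes from `imdb.get_word_index()`.

```python
# Hypothetical sketch of the integer encoding used by keras.datasets.imdb.load_data,
# assuming its default arguments start_char=1, oov_char=2, index_from=3.
START_CHAR = 1  # prepended to every review
OOV_CHAR = 2    # replaces words outside the vocabulary
INDEX_FROM = 3  # offset added to each word's frequency rank (0 is left for padding)


def encode_review(words, word_index, num_words):
    """Encode a list of words roughly the way load_data encodes a review."""
    encoded = [START_CHAR]
    for word in words:
        rank = word_index.get(word)
        if rank is None:
            encoded.append(OOV_CHAR)
            continue
        idx = rank + INDEX_FROM
        # Indices at or above num_words are treated as out of vocabulary.
        encoded.append(idx if idx < num_words else OOV_CHAR)
    return encoded


# Toy word index: word -> frequency rank (1 = most frequent). Made-up values.
toy_word_index = {"the": 1, "was": 13, "movie": 17, "great": 84}
print(encode_review(["the", "movie", "was", "great"], toy_word_index, num_words=20))
# → [1, 4, 2, 16, 2]
```

With `num_words=20`, "movie" (rank 17, index 20) and "great" (index 87) both fall outside the vocabulary and become 2. Mapping indices back to words therefore has to account for the same offset of 3.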

In [None]:
#imdb.get_word_index().keys()

Inspect a sample review and its label.

In [None]:
print('---review---')
print(X_train[6])
print('---label---')
print(y_train[6])

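The dictionary returned by imdb.get_word_index() maps each word to its frequency rank. By default, load_data shifts every rank by 3, because indices 0, 1 and 2 are reserved for padding, the start-of-review marker and out-of-vocabulary words. Add that offset when inverting the dictionary to map a review back to words.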

In [None]:
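word2id = imdb.get_word_index()
# load_data() uses index_from=3 by default, so shift every rank by 3
# and add the reserved indices for padding, start-of-review and unknown words.
id2word = {i + 3: word for word, i in word2id.items()}
id2word.update({0: '<PAD>', 1: '<START>', 2: '<UNK>'})
print('---review with words---')
print(' '.join(id2word.get(i, '<UNK>') for i in X_train[6]))
print('---label---')
print(y_train[6])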

Find the maximum review length.

In [None]:
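# X_train and X_test are NumPy object arrays of lists, so `+` would concatenate
# reviews element-wise; convert them to lists before combining.
all_reviews = list(X_train) + list(X_test)
print('Maximum review length: {}'.format(max(len(x) for x in all_reviews)))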

Find the minimum review length.

In [None]:
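# Combine training and test reviews as lists (not with NumPy `+`, which is element-wise).
print('Minimum review length: {}'.format(
    min(len(x) for x in list(X_train) + list(X_test))))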
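Before choosing a maximum length (max_words, set in the next step), it can help to see how many reviews would actually be truncated. This is a minimal sketch with a hypothetical helper, `review_length_stats`, shown on made-up toy sequences. On the real data you would pass `list(X_train) + list(X_test)`, because `+` on the NumPy object arrays returned by `load_data` combines reviews element-wise instead of concatenating the arrays.

```python
# Hypothetical helper: summary statistics of review lengths.
def review_length_stats(reviews, max_words):
    lengths = [len(r) for r in reviews]
    n_truncated = sum(1 for n in lengths if n > max_words)
    return {
        "max": max(lengths),
        "min": min(lengths),
        "mean": sum(lengths) / len(lengths),
        # Fraction of reviews that padding to max_words would cut short.
        "frac_truncated": n_truncated / len(lengths),
    }


# Made-up toy data standing in for lists of word indices.
toy_train = [[1, 14, 22], [1, 5, 9, 33, 7], [1, 8]]
toy_test = [[1, 3, 3, 3]]
print(review_length_stats(toy_train + toy_test, max_words=4))
# → {'max': 5, 'min': 2, 'mean': 3.5, 'frac_truncated': 0.25}
```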

**Pad Sequences**

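Limit every review to max_words tokens. Longer reviews are truncated, and shorter ones are padded with zeros (index 0 is reserved for padding). By default, pad_sequences both pads and truncates at the start of each sequence, so a long review keeps its last max_words tokens.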

In [None]:
from keras.preprocessing import sequence
max_words = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_words)
X_test = sequence.pad_sequences(X_test, maxlen=max_words)
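The padding rule can be sketched in pure Python. `pad_pre` is a hypothetical helper that mirrors what we understand to be the defaults of `pad_sequences` (`padding='pre'`, `truncating='pre'`, `value=0`). Unlike the Keras function, it returns nested lists rather than a NumPy array.

```python
# Hypothetical pure-Python sketch of pre-padding / pre-truncation.
def pad_pre(sequences, maxlen, value=0):
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # Drop tokens from the start, keeping the last maxlen tokens.
            seq = seq[len(seq) - maxlen:]
        # Fill the front with the padding value up to maxlen.
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


print(pad_pre([[1, 2, 3, 4, 5], [6, 7]], maxlen=3))
# → [[3, 4, 5], [0, 6, 7]]
```

One commonly cited reason for pre-padding with LSTMs is that the real tokens then sit at the end of the sequence, right before the final state that the classifier reads.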

**Define an RNN model for sentiment analysis**

In [None]:
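from keras import Sequential
from keras.layers import Embedding, LSTM, Dense

embedding_size = 32
model = Sequential()
# Map each word index to a dense 32-dimensional vector.
model.add(Embedding(vocabulary_size, embedding_size, input_length=max_words))
# A single LSTM layer with 100 units reads the sequence and returns its final state.
model.add(LSTM(100))
# One sigmoid unit outputs the probability that the review is positive.
model.add(Dense(1, activation='sigmoid'))
# summary() prints the table itself and returns None, so it is not wrapped in print().
model.summary()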
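As a sanity check on the output of `model.summary()`, the parameter counts can be worked out by hand. This sketch uses the standard formulas for the three layer types (the helper names are hypothetical). An embedding table holds one vector per vocabulary entry. An LSTM has four gates, and each gate has an input kernel, a recurrent kernel and a bias. A dense layer has one weight per input plus a bias.

```python
# Hypothetical helpers computing trainable parameters by hand.
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry.
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # 4 gates (input, forget, cell candidate, output); each has an
    # input kernel, a recurrent kernel and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # One weight per input per unit, plus one bias per unit.
    return input_dim * units + units


emb = embedding_params(5000, 32)   # 160,000
lstm = lstm_params(32, 100)        # 53,200
dense = dense_params(100, 1)       # 101
print(emb + lstm + dense)
# → 213301
```

If the summary reports a different total, the layer configuration differs from the one defined above.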

**Train and evaluate your model**

First, compile the model by specifying the loss function and optimizer to use during training, plus the metrics to report.

In [None]:
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
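For a single sigmoid output, binary cross-entropy is the average of $-[y\log p + (1-y)\log(1-p)]$ over the examples, where $y$ is the 0/1 label and $p$ is the predicted probability. Here is a minimal pure-Python sketch using a hypothetical helper. Keras also clips predictions by a small epsilon to avoid $\log 0$, but its actual implementation is vectorized and may differ in detail.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip to avoid log(0) for overconfident predictions.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(binary_crossentropy([1, 0], [0.5, 0.5]))  # log(2), about 0.693
print(binary_crossentropy([1, 0], [0.9, 0.1]))  # about 0.105: confident and correct
```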

In [None]:
batch_size = 64
num_epochs = 2
# Hold out the first batch_size training examples (64 reviews) as a small
# validation set, and train on the remaining reviews.
X_valid, y_valid = X_train[:batch_size], y_train[:batch_size]
X_train2, y_train2 = X_train[batch_size:], y_train[batch_size:]
model.fit(X_train2, y_train2,
          validation_data=(X_valid, y_valid),
          batch_size=batch_size,
          epochs=num_epochs)
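The split in the training cell holds out the first `batch_size` examples. Below is a hypothetical pure-Python sketch of the same slicing. It also computes the number of gradient steps per epoch that the split implies, since Keras runs one step per batch and uses a smaller final batch when the count does not divide evenly.

```python
import math


def holdout_split(xs, ys, n_valid):
    """Return (train, valid) pairs, holding out the first n_valid examples."""
    return (xs[n_valid:], ys[n_valid:]), (xs[:n_valid], ys[:n_valid])


def steps_per_epoch(n_samples, batch_size):
    # The last batch may be smaller than batch_size.
    return math.ceil(n_samples / batch_size)


(train_x, train_y), (valid_x, valid_y) = holdout_split(list(range(10)), [0, 1] * 5, n_valid=3)
print(train_x, valid_x)  # [3, 4, 5, 6, 7, 8, 9] [0, 1, 2]

# IMDB has 25,000 training reviews; 64 are held out for validation.
print(steps_per_epoch(25000 - 64, 64))  # → 390
```

A 64-review validation set is small, so its accuracy will be noisy. Holding out a larger fraction (for example 10-20% of the training data) gives a steadier estimate.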

Evaluate the model on the test data.

In [None]:
scores = model.evaluate(X_test, y_test, verbose=0)
print('Test accuracy:', scores[1])
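With `metrics=['accuracy']` and a binary cross-entropy loss, Keras reports binary accuracy, where a prediction counts as positive when the sigmoid output exceeds 0.5. Here is a minimal sketch of that computation, using a hypothetical helper and toy values. The same helper could be applied to the flattened probabilities returned by `model.predict(X_test)`.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the 0/1 label."""
    correct = sum(1 for y, p in zip(y_true, y_prob) if (p > threshold) == bool(y))
    return correct / len(y_true)


print(binary_accuracy([1, 0, 1, 0], [0.8, 0.3, 0.4, 0.6]))  # → 0.5
```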