# Text Classifier Using LSTM Models
This project classifies the sentiment (positive or negative) of IMDB movie reviews with an LSTM network built in Keras.


In [1]:
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout, Embedding, LSTM
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

**Step 1: Data Preprocessing**


**(a) Loading the Data**

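Call the `imdb.load_data()` function to load the IMDB reviews dataset. With `num_words` set to 5000, only the 5,000 most frequent words are kept; rarer words are replaced by an out-of-vocabulary index.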

In [2]:
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=5000)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
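Each loaded review is a list of integer word indices rather than text. As a rough illustration of that encoding, the sketch below decodes a sequence using a tiny made-up vocabulary. The names `toy_word_index` and `decode_review` are hypothetical; the offset of 3 and the reserved indices 0 (padding), 1 (start) and 2 (out-of-vocabulary) are assumed to match the defaults of `imdb.load_data()`. With the real data, the vocabulary would come from `imdb.get_word_index()` instead.

```python
# Toy stand-in for imdb.get_word_index(): word -> frequency rank (1 = most frequent).
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

INDEX_FROM = 3  # assumed default offset applied by imdb.load_data()
RESERVED = {0: "<pad>", 1: "<start>", 2: "<oov>"}


def decode_review(encoded, word_index):
    """Map a list of integer indices back to a space-separated string."""
    index_to_word = {rank + INDEX_FROM: word for word, rank in word_index.items()}
    index_to_word.update(RESERVED)
    # Indices missing from the vocabulary fall back to the OOV marker.
    return " ".join(index_to_word.get(i, "<oov>") for i in encoded)


encoded_review = [1, 4, 5, 6, 2, 7]
decoded = decode_review(encoded_review, toy_word_index)
print(decoded)  # <start> the movie was <oov> great
```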




**(b) Converting the Raw Labels into Categorical Vectors**

We convert the raw labels, i.e. y_train and y_test, to categorical (one-hot) vectors:


*   If label = 0 then vector = { 1 , 0 }
*   If label = 1 then vector = { 0 , 1 }



In [4]:
y_train = keras.utils.to_categorical(y_train, num_classes=2)
y_test = keras.utils.to_categorical(y_test, num_classes=2)
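To make the label conversion concrete, here is a minimal pure-Python sketch of one-hot encoding on a few toy labels. The helper `one_hot` and the sample `toy_labels` are hypothetical and only mimic the mapping described above; the notebook itself relies on `keras.utils.to_categorical`.

```python
def one_hot(labels, num_classes):
    """Return one row per label with a 1.0 at the label's index and 0.0 elsewhere."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


toy_labels = [0, 1, 1, 0]  # 0 = negative review, 1 = positive review
encoded = one_hot(toy_labels, num_classes=2)
print(encoded)  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
```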

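**(c) Padding the Sequences to a Fixed Length**

Padding brings every sequence to the same length so that the reviews can be stacked into a single array and fed to the model in batches. The reviews are already tokenized: each one is a list of integer word indices.

Here we fix the length of every sequence at 120 integers.

With padding='pre', shorter reviews are filled with zeros at the front. Longer reviews are cut down to 120 integers; by default, pad_sequences also truncates from the front, so the last 120 integers are kept.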

In [5]:
x_train = pad_sequences(x_train, maxlen=120, padding='pre')
x_test = pad_sequences(x_test, maxlen=120, padding='pre')
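The effect of the padding call can be sketched in plain Python on two toy sequences. The helper `pad_pre` is a hypothetical stand-in, written under the assumption that `pad_sequences` is used with `padding='pre'`, its default `truncating='pre'`, and a padding value of 0.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep only the last `maxlen` items of long ones."""
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]  # truncate from the front
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


toy_sequences = [[5, 8], [3, 9, 4, 7, 2, 6]]
result = pad_pre(toy_sequences, maxlen=4)
print(result)  # [[0, 0, 5, 8], [4, 7, 2, 6]]
```

The short sequence gains two leading zeros, while the long one loses its first two entries.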

**Step 2: Defining and Compiling the Model**

Define the hyperparameters for our LSTM model and compile it with:

1. Categorical Crossentropy Loss Function
2. Adam Optimizer

In [9]:
# Hyperparameters
dropout_rate = 0.3
batch_size = 1000
activation_func = keras.activations.relu

# Layer stack: embedding -> LSTM -> dense classifier head
SCHEMA = [
    Embedding(5000, 10, input_length=120),  # 5000-word vocabulary, 10-dimensional vectors
    LSTM(32),
    Dropout(dropout_rate),
    Dense(32, activation=activation_func),
    Dropout(dropout_rate),
    Dense(2, activation=keras.activations.softmax),  # one probability per class
]

model = keras.Sequential(SCHEMA)
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.categorical_crossentropy,
    metrics=['accuracy'],
)
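To get a feel for the size of this network, the following sketch counts its trainable parameters by hand. It assumes the default Keras layer settings (bias terms enabled, a standard LSTM with four gates); if those assumptions hold, the total should agree with what `model.summary()` reports. The variable names are illustrative only.

```python
vocab_size, embed_dim = 5000, 10
lstm_units, dense_units, num_classes = 32, 32, 2

# Embedding: one embed_dim-sized vector per vocabulary entry.
embedding_params = vocab_size * embed_dim

# LSTM: 4 gates, each with input weights, recurrent weights and a bias vector.
lstm_params = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)

# Dense layers: weight matrix plus bias. Dropout layers have no parameters.
dense_params = lstm_units * dense_units + dense_units
output_params = dense_units * num_classes + num_classes

total_params = embedding_params + lstm_params + dense_params + output_params
print(embedding_params, lstm_params, dense_params, output_params, total_params)
# 50000 5504 1056 66 56626
```

Most of the parameters sit in the embedding table, which is typical for small text models with a vocabulary of this size.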

**Step 3: Training the Model**

Now train the model on the training dataset for 10 epochs with a batch size of 1000 samples.

In [10]:
model.fit(x_train, y_train, batch_size=batch_size, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f25f9699630>
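As a quick sanity check on what this training run involves, the arithmetic below works out how many weight updates take place. It assumes the standard Keras IMDB training split of 25,000 reviews; the variable names are illustrative.

```python
import math

num_train_samples = 25000  # assumed size of the Keras IMDB training split
batch_size = 1000
epochs = 10

# Each epoch processes the whole training set once, one batch per update.
steps_per_epoch = math.ceil(num_train_samples / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 25 250
```

With such a large batch size the optimizer takes only 25 steps per epoch, which keeps each epoch fast but gives the model relatively few updates overall.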

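**Step 4: Evaluating the Model**

Evaluate the trained model on the test dataset.

In [11]:
model.evaluate(x_test, y_test)

[0.3906114399433136, 0.8451600074768066]

The two values are the test loss (about 0.39) and the test accuracy (about 84.5%).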
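To make the loss and accuracy figures reported by `model.evaluate` more concrete, the pure-Python sketch below computes categorical cross-entropy and accuracy on a toy batch of four predictions. The data and helper names are made up for illustration, and details of the Keras implementation such as clipping probabilities away from zero are left out.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """Mean over samples of -sum(true * log(predicted))."""
    losses = [
        -sum(t * math.log(p) for t, p in zip(true_row, pred_row))
        for true_row, pred_row in zip(y_true, y_pred)
    ]
    return sum(losses) / len(losses)


def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the true class."""
    hits = sum(
        true_row.index(max(true_row)) == pred_row.index(max(pred_row))
        for true_row, pred_row in zip(y_true, y_pred)
    )
    return hits / len(y_true)


toy_true = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
toy_pred = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]]

toy_loss = categorical_crossentropy(toy_true, toy_pred)
toy_acc = accuracy(toy_true, toy_pred)
print(round(toy_loss, 4), toy_acc)
```

Here three of the four toy predictions pick the right class, so the accuracy is 0.75, while the one confident mistake contributes the largest share of the loss.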