# *********************************************************************************************************************
  
  # File Name 	:   CSLAB_DEEP_LEARNING_KERAS_IMDB_CLASSIFICATION
  # Purpose 	:   A Program in Python for IMDB Movie Classification using Keras Library in Deep Learning
  # Author	:   Deepsphere.ai
  # Reviewer 	:   Jothi Periasamy
  # Date 	:   28/10/2022
  # Version	:   1.0	
  
# ***********************************************************************************************************************

## Program Description : Program in Python for IMDB Movie Classification using Keras Library in Deep Learning

## Python Development Environment & Runtime - Python, Anaconda

from __future__ import print_function

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalMaxPooling1D
from keras.datasets import imdb
from keras.utils import pad_sequences
# On older Keras releases, pad_sequences is imported from here instead:
#from keras.preprocessing.sequence import pad_sequences

# set parameters:
vAR_CSLAB_max_features = 5000
vAR_CSLAB_maxlen = 400
vAR_CSLAB_batch_size = 32
vAR_CSLAB_embedding_dims = 50
vAR_CSLAB_filters = 250
vAR_CSLAB_kernel_size = 3
vAR_CSLAB_hidden_dims = 250
vAR_CSLAB_epochs = 1

print('Loading data...')
(vAR_CSLAB_x_train, vAR_CSLAB_y_train), (vAR_CSLAB_x_test, vAR_CSLAB_y_test) = imdb.load_data(num_words=vAR_CSLAB_max_features)
print(len(vAR_CSLAB_x_train), 'train sequences')
print(len(vAR_CSLAB_x_test), 'test sequences')

#print('Pad sequences (samples x time)')
vAR_CSLAB_x_train = pad_sequences(vAR_CSLAB_x_train, maxlen=vAR_CSLAB_maxlen)
vAR_CSLAB_x_test = pad_sequences(vAR_CSLAB_x_test, maxlen=vAR_CSLAB_maxlen)
print('x_train shape:', vAR_CSLAB_x_train.shape)
print('x_test shape:', vAR_CSLAB_x_test.shape)

print('Build model...')
vAR_CSLAB_model = Sequential()

# we start off with an efficient embedding layer which maps
# our vocab indices into embedding_dims dimensions
vAR_CSLAB_model.add(Embedding(vAR_CSLAB_max_features,
                    vAR_CSLAB_embedding_dims,
                    input_length=vAR_CSLAB_maxlen))
vAR_CSLAB_model.add(Dropout(0.2))

# we add a Conv1D layer, which learns vAR_CSLAB_filters word-group
# filters, each spanning vAR_CSLAB_kernel_size consecutive words:
vAR_CSLAB_model.add(Conv1D(vAR_CSLAB_filters,
                 vAR_CSLAB_kernel_size,
                 padding='valid',
                 activation='relu',
                 strides=1))
# we use global max pooling, keeping each filter's largest activation:
vAR_CSLAB_model.add(GlobalMaxPooling1D())

# We add a vanilla hidden layer:
vAR_CSLAB_model.add(Dense(vAR_CSLAB_hidden_dims))
vAR_CSLAB_model.add(Dropout(0.2))
vAR_CSLAB_model.add(Activation('relu'))

# We project onto a single unit output layer, and squash it with a sigmoid:
vAR_CSLAB_model.add(Dense(1))
vAR_CSLAB_model.add(Activation('sigmoid'))

vAR_CSLAB_model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
vAR_CSLAB_model.fit(vAR_CSLAB_x_train, vAR_CSLAB_y_train,
          batch_size=vAR_CSLAB_batch_size,
          epochs=vAR_CSLAB_epochs,
          validation_data=(vAR_CSLAB_x_test, vAR_CSLAB_y_test))

vAR_CSLAB_model.evaluate(vAR_CSLAB_x_test,vAR_CSLAB_y_test)

# ****************************************************************************************************************************
#   Disclaimer.

# We are providing this code block strictly for learning and research; it is not production-ready
# code. We accept no liability for this code under any circumstances; users should use it
# at their own risk. All software, hardware and other products referenced in these
# materials belong to the respective vendors who developed or own them.

# ****************************************************************************************************************************
  

Loading data...
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
25000 train sequences
25000 test sequences
x_train shape: (25000, 400)
x_test shape: (25000, 400)
Build model...


[0.28475964069366455, 0.8788400292396545]

**Code Explanation**

The code is a script that trains a binary sentiment classification model on the IMDB dataset, which consists of movie reviews labeled as positive or negative (25,000 for training and 25,000 for testing, as the output shows). The model uses a one-dimensional Convolutional Neural Network (CNN) over word embeddings. The script is written in Python and uses the Keras library to build and train the model.

The code starts by importing several components from the Keras library: the Sequential model; the Dense, Dropout, Activation, Embedding, Conv1D and GlobalMaxPooling1D layers; the imdb dataset loader; and the pad_sequences utility. The from __future__ import print_function statement makes print behave as a function under Python 2; in Python 3, where print is already a function, it has no effect.

The script then sets the hyperparameters that control the model's architecture and training: the vocabulary size, i.e. only the 5,000 most frequent words are kept (vAR_CSLAB_max_features); the length every review is padded or truncated to, 400 word indices (vAR_CSLAB_maxlen); the batch size, 32 (vAR_CSLAB_batch_size); the dimensionality of each word embedding, 50 (vAR_CSLAB_embedding_dims); the number of filters in the Conv1D layer, 250 (vAR_CSLAB_filters); the width of each filter, 3 words (vAR_CSLAB_kernel_size); the number of units in the hidden Dense layer, 250 (vAR_CSLAB_hidden_dims); and the number of training epochs, 1 (vAR_CSLAB_epochs).
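To see how these hyperparameters translate into model size, the following plain-Python sketch applies the usual per-layer parameter formulas to the layer stack in the code cell. The short variable names are illustrative copies of the vAR_CSLAB_* values, not part of the original program; the counts are what one would expect model.summary() to report for this architecture.

```python
# Sketch: trainable parameters per layer, derived from the hyperparameters.
max_features = 5000    # vAR_CSLAB_max_features
embedding_dims = 50    # vAR_CSLAB_embedding_dims
filters = 250          # vAR_CSLAB_filters
kernel_size = 3        # vAR_CSLAB_kernel_size
hidden_dims = 250      # vAR_CSLAB_hidden_dims

params = {
    # one embedding_dims-long vector per vocabulary index
    "embedding": max_features * embedding_dims,
    # each filter has kernel_size * embedding_dims weights plus one bias
    "conv1d": kernel_size * embedding_dims * filters + filters,
    # fully connected: inputs * units weights plus one bias per unit
    "dense_hidden": filters * hidden_dims + hidden_dims,
    "dense_output": hidden_dims * 1 + 1,
}
# Dropout, Activation and GlobalMaxPooling1D have no trainable weights.

for name, count in params.items():
    print(f"{name:>13}: {count:,}")
print(f"{'total':>13}: {sum(params.values()):,}")   # total: 350,751
```

Most of the capacity sits in the embedding table (250,000 of roughly 351,000 weights), which is why vAR_CSLAB_max_features is the main lever on model size.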

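The code then loads the IMDB dataset with imdb.load_data, where num_words=vAR_CSLAB_max_features keeps only the 5,000 most frequent words and replaces rarer words with an out-of-vocabulary index. Each review arrives as a list of integer word indices of varying length, so pad_sequences pads shorter reviews with zeros and truncates longer ones to exactly vAR_CSLAB_maxlen (400) indices, producing the (25000, 400) arrays shown in the output. By default, both padding and truncation are applied at the start of each sequence.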
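The padding step can be sketched in plain Python. The function pad_pre below is a hypothetical re-implementation of pad_sequences' default behaviour (padding and truncating both 'pre', pad value 0), written for illustration only; the toy reviews are made up and assume maxlen is at least 1.

```python
def pad_pre(sequences, maxlen, value=0):
    """Hypothetical sketch of pad_sequences' defaults:
    pad with `value` and truncate, both at the start ('pre')."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]               # truncating='pre': keep the last maxlen items
        padding = [value] * (maxlen - len(seq))
        padded.append(padding + seq)            # padding='pre': zeros go in front
    return padded

reviews = [[1, 14, 22], [1, 5, 6, 7, 8, 9]]     # toy word-index lists
print(pad_pre(reviews, maxlen=4))               # [[0, 1, 14, 22], [6, 7, 8, 9]]
```

Because truncation keeps the end of each review, the last 400 words of long reviews are what the model sees.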

The model architecture is defined next using the Keras Sequential model, which stacks layers in order. It starts with an Embedding layer that maps each word index to a learned dense vector (word embedding) of vAR_CSLAB_embedding_dims (50) dimensions, so each review becomes a 400 × 50 matrix. The embedding layer is followed by a Dropout layer that, during training only, randomly sets 20% of its input values to zero to reduce overfitting.
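An embedding layer is essentially a table lookup, and dropout is a random mask. The sketch below shows both in plain Python with toy sizes; the helper names embed and inverted_dropout are hypothetical, and the scaling of surviving values by 1 / (1 - rate) follows the common "inverted dropout" formulation that Keras' Dropout layer uses at training time.

```python
import random

def embed(indices, table):
    """Embedding lookup: word index i selects row i of the table."""
    return [table[i] for i in indices]

def inverted_dropout(values, rate, rng):
    """Training-time dropout: zero each value with probability `rate`
    and scale survivors by 1 / (1 - rate) so the expected value is unchanged."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

vocab_size, dims = 10, 4        # toy sizes (the program uses 5000 and 50)
rng = random.Random(0)
table = [[rng.uniform(-0.05, 0.05) for _ in range(dims)] for _ in range(vocab_size)]

review = [0, 0, 1, 7, 3]                        # a padded toy review of word indices
vectors = embed(review, table)                  # len(review) rows of `dims` floats
dropped = [inverted_dropout(row, 0.2, rng) for row in vectors]
print(len(vectors), len(vectors[0]))            # 5 4
```

Note that the padding index 0 also gets its own learned vector here, since the original model does not set mask_zero.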


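The next layer is a Conv1D layer with 250 filters, each spanning 3 consecutive word embeddings, so it learns detectors for short word groups. With padding='valid' and strides=1, each filter produces 400 - 3 + 1 = 398 output positions. The Conv1D layer is followed by a GlobalMaxPooling1D layer that keeps only the maximum value of each filter's feature map across all positions, reducing every review to a 250-dimensional vector.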
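The convolution and pooling steps can be illustrated with a small plain-Python sketch. The functions conv1d_valid and global_max_pool are hypothetical helpers written for this example; like Keras' Conv1D, the sketch computes a sliding dot product (no kernel flip), uses 'valid' padding with stride 1, and applies ReLU, on a toy 5-step, 1-channel input with two hand-picked filters.

```python
def conv1d_valid(x, kernels, biases):
    """x: time steps, each a list of `channels` floats.
    kernels: one per filter, each kernel_size rows of `channels` weights.
    Returns feature maps as [time][filter] (padding='valid', strides=1, ReLU)."""
    k = len(kernels[0])
    channels = len(x[0])
    out = []
    for t in range(len(x) - k + 1):                  # valid: n - k + 1 positions
        row = []
        for w, b in zip(kernels, biases):
            s = b + sum(w[j][c] * x[t + j][c] for j in range(k) for c in range(channels))
            row.append(max(0.0, s))                   # activation='relu'
        out.append(row)
    return out

def global_max_pool(feature_maps):
    """Max over time for each filter: one value per filter."""
    return [max(col) for col in zip(*feature_maps)]

x = [[1.0], [2.0], [3.0], [0.0], [1.0]]              # 5 time steps, 1 channel
kernels = [[[1.0], [1.0], [1.0]],                    # filter A: sums 3 neighbours
           [[1.0], [-1.0], [0.0]]]                   # filter B: a "drop" detector
maps = conv1d_valid(x, kernels, [0.0, 0.0])
print(maps)                                          # [[6.0, 0.0], [5.0, 0.0], [4.0, 3.0]]
print(global_max_pool(maps))                         # [6.0, 3.0]
```

Global max pooling discards where in the review a pattern occurred and keeps only how strongly it occurred, which is why reviews of any length end up as a fixed-size vector.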

The model then has a dense hidden layer with vAR_CSLAB_hidden_dims units and a dropout layer, followed by an activation layer with a ReLU activation function.
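A Dense layer computes a weighted sum of its inputs plus a bias for each unit, and ReLU then clips negative values to zero. The sketch below uses hypothetical helpers dense and relu with toy sizes (3 inputs, 2 units instead of 250 and 250); the Dropout between them is omitted because it is the identity at inference time.

```python
def dense(x, weights, bias):
    """Fully connected layer: out[u] = bias[u] + sum_i x[i] * weights[i][u]."""
    return [b + sum(xi * w_row[u] for xi, w_row in zip(x, weights))
            for u, b in enumerate(bias)]

def relu(v):
    return [max(0.0, a) for a in v]

pooled = [0.5, -1.0, 2.0]                    # toy pooled features (the program has 250)
W = [[1.0, -1.0], [0.5, 0.5], [0.0, 1.0]]    # 3 inputs -> 2 hidden units
b = [0.0, 0.1]
hidden = relu(dense(pooled, W, b))
print(hidden)
```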


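The final layer is a Dense output layer with a single unit followed by a sigmoid activation, which squashes the output to a value between 0 and 1 that can be read as the probability that the review is positive (label 1); thresholding it at 0.5 gives the binary sentiment label.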
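The sigmoid and the 0.5 threshold can be sketched as follows. The helper predict_label is hypothetical; the sigmoid is written in a numerically stable form so that very negative inputs do not overflow math.exp.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_label(z, threshold=0.5):
    """Map the output unit's raw value to (probability, label); 1 = positive."""
    p = sigmoid(z)
    return p, int(p > threshold)

for z in (-2.0, 0.0, 3.0):
    print(z, predict_label(z))
```

A raw output of exactly 0 gives a probability of 0.5, which falls on the negative side of a strict "greater than 0.5" threshold.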

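The model is then compiled with the binary_crossentropy loss function, the Adam optimizer, and the accuracy metric, and trained on the training data (vAR_CSLAB_x_train and vAR_CSLAB_y_train) for vAR_CSLAB_epochs (1) epoch with a batch size of vAR_CSLAB_batch_size (32). The test set is also passed as validation_data, so Keras reports loss and accuracy on the test data at the end of each epoch; this data is not used to update the weights.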
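Binary cross-entropy averages -[y·log(p) + (1 - y)·log(1 - p)] over the examples, and one epoch with batch size 32 over 25,000 reviews takes ceil(25000 / 32) weight updates. The sketch below computes both in plain Python; the function name is illustrative, and the clipping of p away from 0 and 1 mirrors the small-epsilon clip Keras applies to probabilities before taking logarithms.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped to [eps, 1-eps]."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

train_size, batch_size = 25000, 32
steps_per_epoch = math.ceil(train_size / batch_size)   # the last batch is smaller
print(steps_per_epoch)                                 # 782
print(binary_crossentropy([1, 0, 1], [0.9, 0.2, 0.6]))
```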

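Finally, model.evaluate computes the loss and accuracy on the test data (vAR_CSLAB_x_test and vAR_CSLAB_y_test) and returns them as a list in that order; in the run shown above, this gives a test loss of about 0.285 and a test accuracy of about 87.9%.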
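The accuracy figure is the fraction of reviews whose thresholded prediction matches the label. A minimal sketch with a hypothetical helper and made-up predictions:

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction equals the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

print(binary_accuracy([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.7]))   # 0.5
```

Applied to all 25,000 test reviews, an accuracy of about 0.879 means roughly 22,000 reviews were classified correctly.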