# **Text classification in Python using TensorFlow**
## **Written by:** Aarish Asif Khan
## **Date:** 13 January 2024

> ## **Fetching and creating a model on the IMDB dataset**

In [1]:
from tensorflow.keras.datasets import imdb

# Load the IMDb dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)



Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz


In this example, num_words=10000 keeps only the 10,000 most frequently occurring words in the dataset. Less common words are not removed from the reviews; each one is replaced by a reserved "unknown word" index (2 by default). Adjust this parameter to trade vocabulary coverage against model size.

After loading the data, x_train and x_test contain lists of movie reviews, where each review is a list of integers representing word indices. y_train and y_test contain the corresponding sentiment labels (0 for negative, 1 for positive).

Keep in mind that the IMDb dataset from Keras is already preprocessed, and the reviews are converted into sequences of word indices.
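To make the effect of num_words concrete, here is a minimal sketch that encodes a sentence by hand. It does not call Keras; the toy vocabulary, the `encode` helper, and the constant names are assumptions made for illustration, chosen to mirror the default conventions of `imdb.load_data` (ranks shifted by 3, index 1 marking the start of a review, index 2 standing in for words outside the kept vocabulary).

```python
# Toy vocabulary (hypothetical): word -> frequency rank, 1 = most frequent
toy_word_index = {"the": 1, "film": 2, "was": 3, "brilliant": 4, "scottish": 5}

INDEX_FROM = 3  # ranks are shifted by 3 to leave room for reserved indices
START_CHAR = 1  # marks the start of every review
OOV_CHAR = 2    # replaces any word outside the kept vocabulary


def encode(words, num_words):
    """Encode a list of words, keeping only indices below num_words."""
    encoded = [START_CHAR]
    for word in words:
        index = toy_word_index[word] + INDEX_FROM
        encoded.append(index if index < num_words else OOV_CHAR)
    return encoded


sentence = ["the", "film", "was", "brilliant", "scottish"]
small_vocab = encode(sentence, num_words=7)    # rare words become OOV_CHAR
large_vocab = encode(sentence, num_words=100)  # every word is kept
print(small_vocab)
print(large_vocab)
```

With the smaller limit, the two rarest words collapse into the same unknown-word index, which is the information the model loses when num_words is set too low.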

> **If you want to convert the indices back to words for readability, you can use the following code snippet:**

In [2]:
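# Map each word to its frequency rank, then invert the mapping (rank -> word)
word_index = imdb.get_word_index()
reverse_word_index = {value: key for key, value in word_index.items()}

# Indices are offset by 3 because 0, 1 and 2 are reserved for padding,
# start-of-review and unknown words; those reserved indices decode to '?'
decoded_review = ' '.join(reverse_word_index.get(i - 3, '?') for i in x_train[0])
print(decoded_review)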

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ? is an amazing actor and now the same being director ? father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for ? and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also ? to the two little boy's that played the ? of norman and paul they were just brilliant children are often left out of the ? list i think because the stars that play them all grown up are such a big profile for th
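The question marks in the output come from the reserved indices. The following sketch reproduces the decoding step on a tiny, made-up vocabulary so the offset of 3 is easy to follow; `toy_word_index`, `decode`, and the sample review are hypothetical and exist only for this illustration.

```python
# Hypothetical vocabulary: word -> frequency rank (1 = most frequent)
toy_word_index = {"the": 1, "film": 2, "brilliant": 3}
reverse_toy_index = {rank: word for word, rank in toy_word_index.items()}


def decode(encoded_review, index_from=3):
    """Turn indices back into words; reserved indices (0, 1, 2) become '?'."""
    return " ".join(
        reverse_toy_index.get(i - index_from, "?") for i in encoded_review
    )


# 1 = start-of-review marker, 2 = unknown word, 4..6 = real words (rank + 3)
encoded_review = [1, 4, 5, 2, 6]
decoded = decode(encoded_review)
print(decoded)
```

The leading "?" corresponds to the start-of-review marker, and any later "?" marks a word that fell outside the kept vocabulary.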

> ## **Creating an ML model with the IMDB dataset**

Here's a basic example of how you can create a simple neural network model for sentiment analysis using the IMDb dataset in Python with TensorFlow/Keras:

In [3]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Load the IMDb dataset
max_features = 10000  # Only consider the top 10,000 words
maxlen = 500  # Pad or truncate every review to exactly 500 tokens (by default the last 500 are kept)
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# Preprocess the data
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

# Create the model
model = Sequential()
model.add(Embedding(max_features, 32))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
batch_size = 64
epochs = 3
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_test, y_test))

# Evaluate the model
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test Loss: {loss:.4f}, Test Accuracy: {accuracy:.4f}")




Epoch 1/3
Epoch 2/3
Epoch 3/3
Test Loss: 0.3249, Test Accuracy: 0.8762


> **Explaining the code above**

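Loading Data: We load the IMDb dataset and keep only the 10,000 most frequent words (max_features). We also define the maximum review length of 500 tokens (maxlen), which is applied in the next step.

Preprocessing Data: We use pad_sequences so that every review has exactly maxlen entries. Shorter reviews are padded with zeros and longer ones are truncated. With the default settings, both padding and truncation happen at the start of the sequence.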
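As an illustration of the padding step, the sketch below re-implements the default behaviour of pad_sequences (zeros added at the front, long sequences cut from the front) in plain NumPy. The helper `pad_pre` is a hand-written stand-in for this example, not the Keras function itself.

```python
import numpy as np


def pad_pre(sequences, maxlen, value=0):
    """Pad at the front with `value` and keep only the last `maxlen` items."""
    padded = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        if len(seq) == 0:
            continue  # an empty review stays all padding
        kept = seq[-maxlen:]             # drop items from the start if too long
        padded[row, -len(kept):] = kept  # right-align the kept items
    return padded


reviews = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]
padded = pad_pre(reviews, maxlen=4)
print(padded)
```

The short review gains a leading zero, while the long one loses its first two entries, so both rows end up with the same length and can be stacked into one array for the model.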

Creating the Model:

- We use an Embedding layer to convert each word index into a 32-dimensional vector.
- An LSTM layer with 32 units processes the sequence of vectors and returns its final state.
- Finally, a Dense layer with a sigmoid activation function outputs a value between 0 and 1 for binary classification (negative or positive sentiment).

Compiling the Model: We specify the optimizer (Adam), the loss function (binary cross-entropy), and the metric to track (accuracy).
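To get a feel for the size of this model, the trainable parameters of the three layers can be counted by hand. The sketch below uses the standard formulas for an embedding table, an LSTM layer (four gates, each with input weights, recurrent weights, and a bias), and a dense layer; the variable names are chosen for this illustration.

```python
vocab_size = 10000   # max_features in the model above
embedding_dim = 32   # output size of the Embedding layer
lstm_units = 32      # number of units in the LSTM layer

# One 32-dimensional vector per word index
embedding_params = vocab_size * embedding_dim

# Four gates, each with input weights, recurrent weights and a bias vector
lstm_params = 4 * (lstm_units * (embedding_dim + lstm_units) + lstm_units)

# One weight per LSTM unit plus a single bias for the sigmoid output
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(f"Embedding: {embedding_params}, LSTM: {lstm_params}, "
      f"Dense: {dense_params}, Total: {total_params}")
```

Nearly all of the parameters sit in the embedding table, so lowering max_features is the quickest way to shrink the model. Calling model.summary() on the trained model should give a per-layer breakdown to compare against.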

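Training the Model: We train the model on the training data for 3 epochs and validate it on the test data after each epoch. Note that this reuses the test set for validation, so if you tune the model based on these validation scores, the final test accuracy is no longer measured on fully unseen data.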
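If you would rather keep the test set untouched until the final evaluation, one common alternative is to hold out part of the training data for validation. The sketch below shows the idea with NumPy on small dummy arrays; `holdout_split`, the 20% fraction, and the seed are assumptions for this example rather than part of the original notebook.

```python
import numpy as np


def holdout_split(x, y, val_fraction=0.2, seed=0):
    """Shuffle the samples and set aside a fraction of them for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])


# Dummy data standing in for the padded reviews and their labels
x_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

(x_tr, y_tr), (x_val, y_val) = holdout_split(x_demo, y_demo)
print(x_tr.shape, x_val.shape)
```

The resulting `x_val` and `y_val` could then be passed as `validation_data=(x_val, y_val)` to model.fit, leaving x_test and y_test for the final model.evaluate call only.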

Evaluating the Model: We evaluate the model on the test set and print the test loss and accuracy.
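The two numbers reported by the evaluation can be reproduced by hand on a few made-up predictions. The sketch below is a simplified NumPy version of binary cross-entropy and accuracy, assuming a 0.5 decision threshold; the helper functions and sample values are hypothetical and are not the Keras implementations.

```python
import numpy as np


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    losses = y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)
    return float(-np.mean(losses))


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    predictions = (y_prob > threshold).astype(int)
    return float(np.mean(predictions == y_true))


# Made-up labels and sigmoid outputs for four reviews
y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.1])

loss = binary_cross_entropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(f"Loss: {loss:.4f}, Accuracy: {acc:.4f}")
```

The third review is misclassified (0.4 is below the threshold), which lowers the accuracy and also contributes the largest term to the loss, since the loss penalises confident mistakes more than hesitant ones.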

This is a basic example, and you may need to adjust hyperparameters, model architecture, or use more advanced techniques depending on your specific requirements and the performance you aim to achieve.