# Project 1.2: Sentiment analysis using the IMDB movie reviews dataset

This example project applies logical reasoning and programming within ML to a real-world problem: it trains a deep learning model for sentiment analysis on the IMDB movie reviews dataset.

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from sklearn.metrics import accuracy_score, classification_report

# Set random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Data Preparation
max_features = 10000
max_length = 500
num_classes = 2

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = pad_sequences(x_train, maxlen=max_length)
x_test = pad_sequences(x_test, maxlen=max_length)

# Logical Reasoning: Model Selection and Design
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

# Programming: Model Compilation and Training
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
model.fit(
    x_train,
    y_train,
    batch_size=32,
    epochs=5,
    validation_data=(x_test, y_test)
)

# Model Evaluation: Accuracy and Classification Report
y_pred = np.argmax(model.predict(x_test), axis=1)
accuracy = accuracy_score(y_test, y_pred)
# Store the result under a new name so the imported classification_report function is not overwritten
report = classification_report(y_test, y_pred)
print('Accuracy:', accuracy)
print('Classification Report:')
print(report)


In this code:

The logical-reasoning step is the choice of an architecture suited to sentiment analysis on the IMDB movie reviews dataset: an Embedding layer that maps each of the 10,000 word indices to a 128-dimensional vector, an LSTM layer with 128 units that reads the review in order, a 64-unit dense layer followed by dropout, and a 2-unit softmax output layer.
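To get a feel for the size of this architecture, the trainable parameters of each layer can be counted by hand. The sketch below is plain arithmetic, not Keras code; it assumes the Keras defaults (bias terms enabled, a standard LSTM with four gates), and the helper names are made up for illustration. Dropout layers add no parameters.

```python
# Sizes taken from the model definition in the code cell.
vocab_size = 10000   # max_features
embed_dim = 128      # Embedding output dimension
lstm_units = 128
dense_units = 64
num_classes = 2

def embedding_params(vocab, dim):
    # One dim-sized vector per vocabulary index.
    return vocab * dim

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units

counts = {
    "embedding": embedding_params(vocab_size, embed_dim),
    "lstm": lstm_params(embed_dim, lstm_units),
    "dense_hidden": dense_params(lstm_units, dense_units),
    "dense_output": dense_params(dense_units, num_classes),
}
total_params = sum(counts.values())

for name, n in counts.items():
    print(f"{name:>13}: {n:>9,}")
print(f"{'total':>13}: {total_params:>9,}")
```

Under these assumptions the Embedding layer accounts for roughly 1.28 million of about 1.42 million parameters, so the vocabulary size is the main lever on model size.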

The programming step implements the model with TensorFlow and Keras. The code builds a Sequential model and adds the layers one by one, specifying each layer's size and activation function. No input shape is given explicitly, so Keras infers it when the model first sees data.

The data is loaded with imdb.load_data, which returns each review as a sequence of integer word indices; num_words=max_features restricts the vocabulary to the 10,000 most frequent words. pad_sequences then pads or truncates every review to exactly 500 indices so that all inputs have the same length.
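The effect of pad_sequences with its default settings (zeros added at the front, long sequences cut from the front) can be illustrated in plain Python. This is a minimal sketch of that behaviour, not the Keras implementation; `pad_pre` is a hypothetical helper and the two short "reviews" are made-up index lists.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep the last `maxlen` items of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                            # truncate from the front
        padded.append([value] * (maxlen - len(seq)) + seq)   # pad at the front
    return padded

reviews = [[1, 14, 22], [1, 13, 28, 7, 9, 40]]
print(pad_pre(reviews, maxlen=4))
# [[0, 1, 14, 22], [28, 7, 9, 40]]
```

With the defaults, a review longer than max_length keeps its ending rather than its beginning, which may matter when the verdict of a review tends to come last.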

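The model is compiled with the sparse categorical cross-entropy loss, the Adam optimizer, and accuracy as the evaluation metric. It is then trained with fit for 5 epochs at a batch size of 32. The test set is also passed as validation_data, so the per-epoch validation metrics are computed on the same data used for the final evaluation.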
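The loss named in the compile call, sparse categorical cross-entropy, is the average of the negative log-probability that the softmax output assigns to the true class. The sketch below computes it by hand for two made-up examples; it is an illustration of the formula only, and it omits details of the Keras implementation such as clipping probabilities away from zero.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(probability assigned to the true class)."""
    losses = [-math.log(p[y]) for y, p in zip(labels, probs)]
    return sum(losses) / len(losses)

# Made-up raw scores for two reviews and their integer labels (0 = negative, 1 = positive).
logits = [[2.0, 0.0], [0.0, 0.0]]
labels = [0, 1]

probs = [softmax(row) for row in logits]
loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 4))
```

The labels are plain integers rather than one-hot vectors, which is why the "sparse" variant of the loss pairs with the 2-unit softmax output.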

After training, the model predicts class probabilities for the test data, and np.argmax converts them into sentiment labels, from which the accuracy score is calculated. Additionally, the classification_report function from scikit-learn generates a report of precision, recall, and F1 score for each class.
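The quantities in the classification report can be reproduced by hand on a small example. The sketch below uses made-up probabilities and labels, not results from the model above, and the helper names are hypothetical; it shows the argmax step followed by per-class precision, recall, and F1.

```python
def argmax_rows(prob_rows):
    """Index of the largest probability in each row (the predicted class)."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob_rows]

def per_class_report(y_true, y_pred, classes=(0, 1)):
    """Precision, recall, and F1 for each class, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report

# Made-up softmax outputs for five reviews and their true labels.
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.55, 0.45]]
y_true = [0, 1, 1, 1, 0]

y_pred = argmax_rows(probs)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
report = per_class_report(y_true, y_pred)
print(y_pred, accuracy)
print(report)
```

In this toy case one positive review is misclassified as negative, which lowers recall for class 1 and precision for class 0 while leaving the other two values at 1.0.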

This code is a simplified example and may need modification for your use case, such as preprocessing raw text data, adjusting hyperparameters, or adding layers or other techniques.
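As one example of the text preprocessing mentioned above, a raw review has to be converted into the same integer encoding the model was trained on before it can be scored. The sketch below follows the default conventions of imdb.load_data (index 1 marks the start of a review, index 2 stands for out-of-vocabulary words, and word ranks are shifted by 3). The tokenizer regex, the `encode_review` helper, and the toy vocabulary are assumptions for illustration; in practice the word ranks would come from imdb.get_word_index().

```python
import re

START, OOV, INDEX_FROM = 1, 2, 3  # default conventions of imdb.load_data

def encode_review(text, word_index, num_words=10000):
    """Map raw text to a list of word indices (hypothetical helper)."""
    # Assumed tokenization: lowercase, keep runs of letters, digits, apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    ids = [START]
    for tok in tokens:
        rank = word_index.get(tok)
        idx = rank + INDEX_FROM if rank is not None else OOV
        # Indices outside the kept vocabulary are replaced by the OOV marker.
        ids.append(idx if idx < num_words else OOV)
    return ids

# Toy vocabulary with made-up frequency ranks, standing in for imdb.get_word_index().
toy_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

print(encode_review("The movie was GREAT, truly!", toy_index))
# [1, 4, 5, 6, 7, 2]
```

The resulting list would still need to go through pad_sequences with the same maxlen before being passed to model.predict.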