## Lab 10. Deep Learning for NLP I: Text Classification using Feed Forward Networks

In this tutorial, we repeat what we did in Week 7, i.e., build a sentiment classification model, but this time using a deep feed forward neural architecture.

**This tutorial is graded**. Complete the exercises and turn them in under Week 10.

### 1. What is Sentiment Analysis
Sentiment analysis (SA), also known as opinion mining, is a natural language processing (NLP) task that involves determining and quantifying the emotional tone or sentiment expressed within a piece of text, typically written or spoken language. In simple terms, sentiment analysis aims to classify text into predefined categories that represent the sentiment or emotional polarity conveyed by the text. These categories are typically binary, classifying text as either *positive* or *negative*, but they can also be more fine-grained, such as *positive*, *negative*, or *neutral*.

We will use the IMDB Movie Review dataset again, which contains reviews and binary sentiment values.

**Note:** I have already extracted and provided the training and test data in the form of CSV files.
- `train.csv` - Contains 80% of the IMDB data to be used for training classifiers.

- `test.csv` - Contains 20% of the IMDB data to be used for testing classifiers.

Each CSV file has two columns:

- **text** : containing the movie review
- **sentiment** : containing the original sentiment -- 0 representing negative and 1 representing positive

Let's load the data into DataFrames and vectorize the sentences.


In [2]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from keras.models import Sequential
from keras.layers import Dense

# Load training and testing data
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

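# Extract input texts and labels.
# Note: this code reads the label column as 'label'. If your CSV header names it
# 'sentiment' (as in the column description above), use that name instead.
train_texts = train_data['text']
train_labels = train_data['label']
test_texts = test_data['text']
test_labels = test_data['label']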

# Create a CountVectorizer to convert text into count vectors.
# max_features=1000 keeps only the 1000 most frequent words in the training corpus.
vectorizer = CountVectorizer(max_features=1000)
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Vocabulary size. This should come out to be 1000
vocab_size = len(vectorizer.get_feature_names_out())

## Design a Feed Forward Network using TensorFlow

Let's design a neural network with the following configuration:

1. Input Layer:
   - The input layer of the neural network is implicitly defined by the shape of the input data, which is the count vectors obtained from the text data using `CountVectorizer`.
   - Each input sample is represented as a count vector, where each element corresponds to the frequency of a particular word in the vocabulary.
   - Therefore, the input layer has `vocab_size` neurons, where `vocab_size` is the size of the vocabulary (here capped at the 1000 most frequent words by `max_features`, rather than every unique word in the corpus).

2. Hidden Layers:
   - The network has two hidden layers, with 16 and 8 neurons (units) respectively, both using the ReLU (Rectified Linear Unit) activation function.
   - Each neuron in the first hidden layer takes input from all `vocab_size` neurons of the input layer; each neuron in the second hidden layer takes input from all 16 neurons of the first.
   - The output of each hidden neuron is computed by taking a weighted sum of its inputs plus a bias, followed by the ReLU activation function.
   - The ReLU activation function introduces non-linearity to the network, allowing it to learn complex patterns in the data.

3. Output Layer:
   - The output layer consists of a single neuron, which serves as the binary classifier's output.
   - It uses the sigmoid activation function, which squashes the output into the range (0, 1), so it can be read as the probability of the input belonging to the positive class (in this case, class 1).
   - An output value closer to 1 indicates a higher probability of belonging to the positive class, while a value closer to 0 indicates a higher probability of belonging to the negative class (class 0).

In summary, the neural network architecture can be described as follows:
- Input layer: `vocab_size` neurons (input features)
- Hidden layer 1: 16 neurons with ReLU activation
- Hidden layer 2: 8 neurons with ReLU activation
- Output layer: Single neuron with sigmoid activation

The network learns to map the input count vectors to the binary labels (0 or 1) by adjusting the weights and biases of the connections between neurons during the training process, using the binary cross-entropy loss function and the Adam optimizer.
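
The computation described above can be sketched in plain NumPy. This is an illustrative forward pass only, not how Keras implements it internally. The layer sizes (16, 8, 1) follow the Keras model code in this lab; the random weights, the toy input batch, the `eps` clipping value, and the 0.5 decision cutoff are assumptions made for the example.

```python
import numpy as np

def relu(z):
    # ReLU: keep positive values, zero out negatives
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid: squash any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, weights, bias, activation):
    # Weighted sum of inputs plus bias, followed by the activation
    return activation(x @ weights + bias)

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    # eps is an assumed small constant that avoids log(0)
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1.0 - y_true) * np.log(1.0 - y_prob)))

rng = np.random.default_rng(0)
vocab_size = 1000
layer_sizes = [vocab_size, 16, 8, 1]  # input, hidden 1, hidden 2, output

# Randomly initialised weights and zero biases (untrained, illustration only)
params = [(rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

# Toy batch of 4 count vectors with made-up labels
x = rng.integers(0, 3, size=(4, vocab_size)).astype(float)
y = np.array([1.0, 0.0, 1.0, 0.0])

h = x
for i, (W, b) in enumerate(params):
    is_output = (i == len(params) - 1)
    h = dense(h, W, b, sigmoid if is_output else relu)

probs = h.ravel()                    # one probability per review
preds = (probs >= 0.5).astype(int)   # assumed 0.5 cutoff for class 1
loss = binary_cross_entropy(y, probs)
accuracy = float(np.mean(preds == y))
print(probs.shape, loss, accuracy)
```

Training replaces the random weights with values that lower `loss`; the optimizer (Adam in this lab) uses the gradients from backpropagation to decide each update.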

In [6]:
model = Sequential()
model.add(Dense(16, input_shape=(vocab_size,), activation='relu'))  # First hidden layer: 16 units
model.add(Dense(8, activation='relu'))  # Second hidden layer: 8 units
model.add(Dense(1, activation='sigmoid'))  # Output layer with sigmoid activation for binary classification

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_4 (Dense)             (None, 16)                16016     
                                                                 
 dense_5 (Dense)             (None, 8)                 136       
                                                                 
 dense_6 (Dense)             (None, 1)                 9         
                                                                 
Total params: 16161 (63.13 KB)
Trainable params: 16161 (63.13 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
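
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below reproduces the numbers, assuming `vocab_size` is 1000 and that each parameter is stored as a 4-byte float32 (the helper name `dense_param_count` is made up for this example).

```python
def dense_param_count(n_inputs, n_units):
    # One weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units

layer_sizes = [1000, 16, 8, 1]  # input, hidden 1, hidden 2, output
per_layer = [dense_param_count(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
size_kb = total * 4 / 1024  # assuming 4 bytes per float32 parameter

print(per_layer)           # [16016, 136, 9]
print(total)               # 16161
print(f"{size_kb:.2f} KB") # 63.13 KB
```

Almost all of the parameters sit in the first layer, because it connects every one of the 1000 vocabulary features to each of the 16 hidden units.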


## Training and evaluating the model using backpropagation


In [7]:
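# Train for 50 epochs; .toarray() converts the sparse count matrix to a dense array before it is passed to Keras
model.fit(X_train.toarray(), train_labels, epochs=50, batch_size=16, verbose=2)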

# Evaluate the model on testing data
loss, accuracy = model.evaluate(X_test.toarray(), test_labels, verbose=0)
print(f'Accuracy on testing data: {accuracy * 100:.2f}%')

Epoch 1/50
100/100 - 1s - loss: 0.6385 - accuracy: 0.6475 - 1s/epoch - 13ms/step
Epoch 2/50
100/100 - 0s - loss: 0.4329 - accuracy: 0.8200 - 275ms/epoch - 3ms/step
Epoch 3/50
100/100 - 0s - loss: 0.2951 - accuracy: 0.8856 - 293ms/epoch - 3ms/step
Epoch 4/50
100/100 - 0s - loss: 0.2188 - accuracy: 0.9250 - 318ms/epoch - 3ms/step
Epoch 5/50
100/100 - 0s - loss: 0.1791 - accuracy: 0.9456 - 283ms/epoch - 3ms/step
Epoch 6/50
100/100 - 0s - loss: 0.1355 - accuracy: 0.9644 - 316ms/epoch - 3ms/step
Epoch 7/50
100/100 - 0s - loss: 0.0940 - accuracy: 0.9794 - 265ms/epoch - 3ms/step
Epoch 8/50
100/100 - 0s - loss: 0.0779 - accuracy: 0.9819 - 180ms/epoch - 2ms/step
Epoch 9/50
100/100 - 0s - loss: 0.0527 - accuracy: 0.9912 - 182ms/epoch - 2ms/step
Epoch 10/50
100/100 - 0s - loss: 0.0355 - accuracy: 0.9962 - 171ms/epoch - 2ms/step
Epoch 11/50
100/100 - 0s - loss: 0.0270 - accuracy: 0.9981 - 209ms/epoch - 2ms/step
Epoch 12/50
100/100 - 0s - loss: 0.0206 - accuracy: 0.9987 - 177ms/epoch - 2ms/step
...
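
The `100/100` in each log line is the number of mini-batches (steps) per epoch, which is the training set size divided by the batch size, rounded up. The sketch below shows how the step count changes with batch size. The training set size of 1600 is an assumption chosen to be consistent with the log; any size from 1585 to 1600 would also give 100 steps at batch size 16.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    # The last batch may be smaller, so round up
    return math.ceil(n_samples / batch_size)

n_train = 1600  # hypothetical training set size, consistent with the log above
epochs = 50

steps_16 = steps_per_epoch(n_train, 16)
steps_32 = steps_per_epoch(n_train, 32)
print(steps_16, steps_32)                    # 100 50
print(steps_16 * epochs, steps_32 * epochs)  # 5000 2500
```

A larger batch size therefore means fewer weight updates per epoch, which is worth keeping in mind when comparing training runs.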

## Exercise E1. Design a different neural network and compare the test accuracy with the network given in the example.

Design a Feed Forward Neural Network (FFNN) for text classification with the following architecture:

- Input Layer: `vocab_size` neurons (input features)
- Hidden Layer 1: 32 neurons with ReLU activation
- Hidden Layer 2: 16 neurons with ReLU activation
- Output Layer: Single neuron with sigmoid activation

Also change the batch size to 32 and train the network in the same manner as above. Write down your observations.