<a href="https://colab.research.google.com/github/Guliko24/CE807_Text_Analytics/blob/main/Lab08/lab08.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 08- Building blocks of NN

In this notebook, we work on a toy NLP task. The following resources were used in preparing this notebook:
* ["Word Window Classification" tutorial notebook](https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1204/materials/ww_classifier.ipynb) by Matt Lamm, from Winter 2020 offering of CS224N
* Official PyTorch Documentation on [Deep Learning with PyTorch: A 60 Minute Blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) by Soumith Chintala
* PyTorch Tutorial Notebook, [Build Basic Generative Adversarial Networks (GANs) | Coursera](https://www.coursera.org/learn/build-basic-generative-adversarial-networks-gans) by Sharon Zhou, offered on Coursera
* Official PyTorch Sentiment Analysis [Tutorial](https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html)

Before starting, copy `lab08.ipynb` into your GDrive under `CE807-24-SP/Lab08/` and copy `train.csv`, `valid.csv` and `test.csv` from `lab05`.



In [1]:
import torch
import torch.nn as nn

# Import pprint, module we use for making our print statements prettier
import pprint
pp = pprint.PrettyPrinter()

We are all set to start our tutorial. Let's dive in!

## Neural Network Module

So far we have looked at tensors, their properties and basic operations on them. We will now use the predefined blocks in the `torch.nn` module of `PyTorch` and put them together to create complex networks. Let's start by importing this module with an alias so that we don't have to type `torch.nn` every time we use it.

Think of the layers in `nn` as Lego blocks: we will look at different blocks and at how to combine them.

In [2]:
import torch
import torch.nn as nn

### **Linear Layer**
We can use `nn.Linear(H_in, H_out)` to create a linear layer. It takes a tensor of shape `(N, *, H_in)` and outputs a tensor of shape `(N, *, H_out)`, where `*` denotes an arbitrary number of dimensions in between. The linear layer performs the operation `y = xA^T + b`, where the weight `A` and the bias `b` are initialized randomly. If we don't want the linear layer to learn a bias, we can initialize it with `bias=False`.

In [3]:
# Create the inputs
input = torch.ones(2,3,4)
# (N, *, H_in) -> (N, *, H_out)


# Make a linear layer transforming (N, *, H_in) dimensional inputs to
# (N, *, H_out) dimensional outputs
linear = nn.Linear(4, 2) # the last dimension is mapped from 4 to 2, so the output has shape (2, 3, 2)
linear_output = linear(input)
print(input.shape)

print(linear_output.shape)

# In sentiment analysis, a layer like this can map a large feature vector to 2 class scores

torch.Size([2, 3, 4])
torch.Size([2, 3, 2])


In [4]:
input

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])

In [5]:
linear_output

tensor([[[ 0.2081, -0.4345],
         [ 0.2081, -0.4345],
         [ 0.2081, -0.4345]],

        [[ 0.2081, -0.4345],
         [ 0.2081, -0.4345],
         [ 0.2081, -0.4345]]], grad_fn=<ViewBackward0>)

In [51]:
list(linear.parameters()) # y = xW^T + b

# W and b were initialized randomly when the layer was created

[Parameter containing:
 tensor([[ 0.2787, -0.4403,  0.1815,  0.2074],
         [-0.4668,  0.3468,  0.1456, -0.4095]], requires_grad=True),
 Parameter containing:
 tensor([-0.0191, -0.0506], requires_grad=True)]

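Shapes to keep in mind for `nn.Linear(4, 2)`:
* input data: `[batch_size, feature_dim]`, here `feature_dim = 4`
* output: `[batch_size, output_dim]`, here `output_dim = 2`
* weight of the linear layer: `(output_dim, feature_dim)`, as in the `(2, 4)` tensor printed above; the layer multiplies the input by the transpose of this weight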
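To make the shapes concrete, here is a minimal sketch that reproduces the output of an `nn.Linear` layer by hand. The variable names (`layer`, `manual`, `no_bias`) and the fixed seed are assumptions made for this example only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the check is repeatable

x = torch.randn(2, 3, 4)        # (N, *, H_in) with H_in = 4
layer = nn.Linear(4, 2)         # weight: (2, 4), bias: (2,)

# Reproduce the layer by hand: multiply by the transposed weight, add the bias
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(layer(x), manual, atol=1e-6))

# With bias=False the layer holds only the weight matrix
no_bias = nn.Linear(4, 2, bias=False)
print(no_bias.bias is None)
```

Only the last dimension of the input is transformed; the leading dimensions `(2, 3)` pass through unchanged.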

### **Other Module Layers**
There are several other preconfigured layers in the `nn` module. Some commonly used examples are `nn.RNN`, `nn.LSTM`, `nn.Transformer`, `nn.Embedding`, `nn.Dropout`, `nn.Conv2d`, `nn.ConvTranspose2d`, `nn.BatchNorm1d`, `nn.BatchNorm2d`, `nn.Upsample` and `nn.MaxPool2d`, among many others. You should explore these. For now, the only important thing to remember is that we can treat each of these layers as a plug-and-play component.

### **Activation Function Layer**
We can also use the `nn` module to apply activation functions to our tensors. Activation functions are used to add non-linearity to our network. Some examples of activation functions are `nn.ReLU()`, `nn.Sigmoid()` and `nn.LeakyReLU()`. Activation functions operate on each element separately, so the output tensor has the same shape as the tensor we pass in.

In [7]:
linear_output

tensor([[[ 0.2081, -0.4345],
         [ 0.2081, -0.4345],
         [ 0.2081, -0.4345]],

        [[ 0.2081, -0.4345],
         [ 0.2081, -0.4345],
         [ 0.2081, -0.4345]]], grad_fn=<ViewBackward0>)

In [8]:
sigmoid = nn.Sigmoid()
output = sigmoid(linear_output)
output

tensor([[[0.5518, 0.3931],
         [0.5518, 0.3931],
         [0.5518, 0.3931]],

        [[0.5518, 0.3931],
         [0.5518, 0.3931],
         [0.5518, 0.3931]]], grad_fn=<SigmoidBackward0>)

### **Putting the Layers Together**
So far we have seen that we can create layers and pass the output of one as the input of the next. Instead of creating intermediate tensors and passing them around, we can use `nn.Sequential`, which does exactly that.

In [9]:
block = nn.Sequential(
    nn.Linear(4, 2),
    nn.Sigmoid()
)

input = torch.ones(2,3,4)
output = block(input)
output

tensor([[[0.4831, 0.5242],
         [0.4831, 0.5242],
         [0.4831, 0.5242]],

        [[0.4831, 0.5242],
         [0.4831, 0.5242],
         [0.4831, 0.5242]]], grad_fn=<SigmoidBackward0>)

### Custom Modules

Instead of using the predefined modules, we can also build our own by extending the `nn.Module` class. For example, we can build `nn.Linear` (which also extends `nn.Module`) on our own using the tensors introduced earlier! We can also build new, more complex modules, such as a custom neural network. You will be practicing these in the later assignment.

To create a custom module, the first thing we have to do is to extend `nn.Module`. We can then initialize our parameters in the `__init__` function, starting with a call to the `__init__` function of the super class. All the class attributes we define which are `nn` module objects are treated as parameters, which can be learned during training. Plain tensors are not parameters, but they can be turned into parameters by wrapping them in the `nn.Parameter` class.

All classes extending `nn.Module` are also expected to implement a `forward(x)` function, where `x` is a tensor. This is the function that is called when an input is passed to our module, such as in `model(x)`.

In [10]:
class MultilayerPerceptron(nn.Module):

  def __init__(self, input_size, hidden_size, output_size):
    # Call to the __init__ function of the super class
    super(MultilayerPerceptron, self).__init__()

    # Bookkeeping: Saving the initialization parameters
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.output_size = output_size

    # Defining of our model
    # There isn't anything specific about the naming of `self.model`. It could
    # be something arbitrary.
    self.model = nn.Sequential(
        nn.Linear(self.input_size, self.hidden_size),
        nn.ReLU(),
        nn.Linear(self.hidden_size, self.output_size),
    )

  def forward(self, x):
    output = self.model(x)
    return output
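
The text above mentions that `nn.Linear` could be rebuilt by hand and that tensors become parameters once wrapped in `nn.Parameter`. A minimal sketch of that idea follows; the class name `MyLinear`, the `scale` attribute and the `0.1` initialization factor are hypothetical choices for illustration, not what `nn.Linear` does internally.

```python
import torch
import torch.nn as nn

class MyLinear(nn.Module):
  """Hypothetical re-implementation of a linear layer, for illustration only."""

  def __init__(self, in_features, out_features):
    super().__init__()
    # Wrapping tensors in nn.Parameter registers them as learnable parameters
    self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
    self.bias = nn.Parameter(torch.zeros(out_features))
    # A plain tensor attribute is NOT registered as a parameter
    self.scale = torch.ones(1)

  def forward(self, x):
    return x @ self.weight.T + self.bias

layer = MyLinear(4, 2)
out = layer(torch.ones(2, 3, 4))
param_names = [name for name, _ in layer.named_parameters()]
print(out.shape)
print(param_names)
```

Only `weight` and `bias` appear in `named_parameters()`, so only those two would be updated by an optimizer.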

Here is an alternative way to define a similar class. You can see that we can replace `nn.Sequential` by defining the individual layers in the `__init__` method and connecting them in the `forward` method.

**You should explore this in your own time. We will not use it for now, but it is very useful for later.**

In [54]:
class MultilayerPerceptronOther(nn.Module):

  def __init__(self, input_size, hidden_size, output_size=None):
    # Call to the __init__ function of the super class
    super(MultilayerPerceptronOther, self).__init__()

    # Bookkeeping: Saving the initialization parameters
    self.input_size = input_size
    self.hidden_size = hidden_size
    # If no output size is given, map back to the input size
    self.output_size = input_size if output_size is None else output_size

    # Defining our layers one by one: each layer can then be modified individually here
    self.linear = nn.Linear(self.input_size, self.hidden_size)
    self.relu = nn.ReLU()
    self.linear2 = nn.Linear(self.hidden_size, self.output_size)
    self.sigmoid = nn.Sigmoid()

  def forward(self, x):
    linear = self.linear(x)
    relu = self.relu(linear)
    linear2 = self.linear2(relu)
    output = self.sigmoid(linear2)
    return output

Now that we have defined our class, we can instantiate it and see what it does.

In [55]:
# Make a sample input
input = torch.randn(2, 5) # e.g. 2 sentences, each represented by 5 features; in practice a vectorizer produces these
print(input)

# Create our model
model = MultilayerPerceptronOther(5, 3, output_size=2)

# view your model
print(model)

# Pass our input through our model
out= model(input)
print(out)

tensor([[-1.9230,  0.4782,  1.6176,  0.7878, -1.7590],
        [ 0.2071,  1.3469, -0.9991, -0.0467, -1.2783]])
MultilayerPerceptronOther(
  (linear): Linear(in_features=5, out_features=3, bias=True)
  (relu): ReLU()
  (linear2): Linear(in_features=3, out_features=2, bias=True)
  (sigmoid): Sigmoid()
)
tensor([[0.5107, 0.4819],
        [0.4779, 0.4817]], grad_fn=<SigmoidBackward0>)


We can inspect the parameters of our model with `named_parameters()` and `parameters()` methods.

In [56]:
list(model.named_parameters())

[('linear.weight',
  Parameter containing:
  tensor([[ 0.0349, -0.3513, -0.0706,  0.1802, -0.0018],
          [-0.3174, -0.2573, -0.2898,  0.0943,  0.3882],
          [ 0.3254, -0.2324,  0.4276,  0.4453, -0.0173]], requires_grad=True)),
 ('linear.bias',
  Parameter containing:
  tensor([ 0.2166, -0.1114, -0.1002], requires_grad=True)),
 ('linear2.weight',
  Parameter containing:
  tensor([[ 0.3380, -0.2888,  0.5388],
          [ 0.3218,  0.4531, -0.0139]], requires_grad=True)),
 ('linear2.bias',
  Parameter containing:
  tensor([-0.0884, -0.0731], requires_grad=True))]
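
As a small follow-up, the sketch below counts the learnable values in a network with the same layer sizes as the one printed above (5 inputs, 3 hidden units, 2 outputs). The names `mlp` and `n_trainable` are assumptions for this example.

```python
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(5, 3),   # 5*3 weights + 3 biases = 18
    nn.ReLU(),         # no parameters
    nn.Linear(3, 2),   # 3*2 weights + 2 biases = 8
)

# Print the name, shape and number of elements of every parameter tensor
for name, param in mlp.named_parameters():
  print(name, tuple(param.shape), param.numel())

n_trainable = sum(p.numel() for p in mlp.parameters() if p.requires_grad)
print(n_trainable)
```

Counting parameters this way is a quick sanity check that the layer sizes are what we intended.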

### Optimization
We have shown how gradients are calculated with the `backward()` function. Having the gradients isn't enough for our models to learn; we also need to know how to update the parameters of our models. This is where optimizers come in. The `torch.optim` module contains several optimizers that we can use. Some popular examples are `optim.SGD` and `optim.Adam`. When initializing an optimizer, we pass our model parameters, which can be accessed with `model.parameters()`, telling the optimizer which values it will be optimizing. Optimizers also have a learning rate (`lr`) parameter, which determines how big an update will be made in every step. Different optimizers have different hyperparameters as well.

In [14]:
import torch.optim as optim
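
To see what a single optimizer step does, here is a minimal sketch using plain `optim.SGD` on one parameter. The toy loss `w^2` and the names `w`, `sgd` and `grad` are assumptions made for this example.

```python
import torch
import torch.optim as optim

# One parameter and a toy loss, loss = w^2, whose gradient is 2w
w = torch.tensor([2.0], requires_grad=True)
sgd = optim.SGD([w], lr=0.1)

sgd.zero_grad()
loss = (w ** 2).sum()
loss.backward()          # w.grad is now 2 * 2.0 = 4.0
grad = w.grad.clone()
sgd.step()               # w <- 2.0 - 0.1 * 4.0 = 1.6

print(grad, w)
```

Plain SGD subtracts `lr * gradient` from each parameter; `optim.Adam` follows the same `zero_grad()`, `backward()`, `step()` pattern but rescales the update using running statistics of the gradients.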

After we have our optimizer, we can define a `loss` that we want to optimize. We can either define the loss ourselves or use one of the predefined loss functions in `PyTorch`, such as `nn.BCELoss()`. Let's put everything together now! We will start by creating some dummy data.

In [15]:
# Create the y data
y = torch.ones(10, 5)

# Add some noise to our goal y to generate our x
# We want our model to predict our original data despite the noise
x = y + torch.randn_like(y)
x

tensor([[ 0.2786,  1.8921,  0.7995, -1.2886,  0.0257],
        [-0.2344,  1.7342,  1.0817,  0.0888,  2.8343],
        [ 1.1760,  0.8958,  1.2854,  1.6205, -0.3008],
        [ 0.7822, -1.2502,  2.5650,  1.4337,  1.4476],
        [ 0.8445,  1.8184,  3.0637,  1.3483,  0.8513],
        [ 0.3399,  1.1210,  1.0093,  1.4824,  0.9893],
        [ 1.9022,  0.0725,  0.5891,  2.0505,  1.0934],
        [ 1.5009, -0.5597,  0.4042,  0.7613, -0.0666],
        [ 1.4914,  0.9149,  2.7942,  1.1217,  0.5490],
        [ 0.8789,  1.4303,  1.3690,  0.7378,  4.1539]])

Now, we can define our model, optimizer and the loss function.

In [16]:
# Instantiate the model
model = MultilayerPerceptronOther(5, 3)

# Define the optimizer
adam = optim.Adam(model.parameters(), lr=1e-1)

# Define loss using a predefined loss function
loss_function = nn.BCELoss()

# Calculate how our model is doing now
y_pred = model(x)
loss_function(y_pred, y).item()

0.6406251788139343
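
For intuition about the number printed above, the following sketch compares `nn.BCELoss` with the binary cross-entropy formula written out by hand. The probabilities in `p` and the labels in `target` are made-up values for illustration.

```python
import torch
import torch.nn as nn

p = torch.tensor([0.9, 0.6, 0.2])       # predicted probabilities
target = torch.tensor([1.0, 1.0, 0.0])  # ground-truth labels as floats

builtin = nn.BCELoss()(p, target)
# Binary cross-entropy: -mean(y * log(p) + (1 - y) * log(1 - p))
manual = -(target * torch.log(p) + (1 - target) * torch.log(1 - p)).mean()

print(builtin.item(), manual.item())
```

Because `nn.BCELoss` expects probabilities between 0 and 1, the model above ends with a `Sigmoid` layer.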

Let's see if we can have our model achieve a smaller loss. Now that we have everything we need, we can setup our training loop.

In [17]:
# Set the number of epoch, which determines the number of training iterations
n_epoch = 10

for epoch in range(n_epoch):
  # Set the gradients to 0
  adam.zero_grad()

  # Get the model predictions
  y_pred = model(x)

  # Get the loss
  loss = loss_function(y_pred, y)

  # Print stats
  print(f"Epoch {epoch}: training loss: {loss}")

  # Compute the gradients
  loss.backward()

  # Take a step to optimize the weights
  adam.step()


Epoch 0: training loss: 0.6406251788139343
Epoch 1: training loss: 0.48480308055877686
Epoch 2: training loss: 0.359578013420105
Epoch 3: training loss: 0.255191445350647
Epoch 4: training loss: 0.1731002926826477
Epoch 5: training loss: 0.11153435707092285
Epoch 6: training loss: 0.06772509217262268
Epoch 7: training loss: 0.03912103548645973
Epoch 8: training loss: 0.022084558382630348
Epoch 9: training loss: 0.012538430280983448


You can see that our loss is decreasing. Let's check the predictions of our model now and see if they are close to our original `y`, which was all `1s`.

In [18]:
list(model.parameters())

[Parameter containing:
 tensor([[-0.4208, -0.3934, -0.3607,  0.0577,  0.1004],
         [-0.5230, -0.6468, -0.3240, -0.2652, -0.0835],
         [ 1.2747,  0.5732,  0.8060,  0.5988,  1.2854]], requires_grad=True),
 Parameter containing:
 tensor([-0.1881, -0.6383,  1.2423], requires_grad=True),
 Parameter containing:
 tensor([[-0.5468,  0.1247,  1.2182],
         [ 0.3323,  0.1235,  0.5095],
         [ 0.4330,  0.5799,  1.2415],
         [ 0.2845,  0.7340,  1.2838],
         [ 0.3317,  0.3691,  1.1488]], requires_grad=True),
 Parameter containing:
 tensor([0.8807, 1.0150, 1.1187, 0.6302, 0.8505], requires_grad=True)]

In [19]:
# See how our model performs on the training data
y_pred = model(x)
y_pred

tensor([[0.9826, 0.9116, 0.9870, 0.9812, 0.9786],
        [0.9999, 0.9870, 0.9999, 0.9999, 0.9998],
        [0.9989, 0.9706, 0.9992, 0.9990, 0.9984],
        [0.9998, 0.9856, 0.9999, 0.9998, 0.9997],
        [1.0000, 0.9930, 1.0000, 1.0000, 0.9999],
        [0.9993, 0.9761, 0.9995, 0.9994, 0.9990],
        [0.9999, 0.9889, 0.9999, 0.9999, 0.9998],
        [0.9944, 0.9434, 0.9959, 0.9943, 0.9927],
        [0.9999, 0.9913, 1.0000, 1.0000, 0.9999],
        [1.0000, 0.9979, 1.0000, 1.0000, 1.0000]], grad_fn=<SigmoidBackward0>)

In [20]:
# Create test data and check how our model performs on it
x2 = y + torch.randn_like(y)
y_pred = model(x2)
y_pred

tensor([[0.9995, 0.9788, 0.9997, 0.9996, 0.9993],
        [0.8592, 0.8027, 0.8874, 0.8331, 0.8488],
        [0.9912, 0.9323, 0.9935, 0.9908, 0.9887],
        [0.9572, 0.8751, 0.9674, 0.9516, 0.9503],
        [0.9948, 0.9450, 0.9962, 0.9947, 0.9931],
        [0.9974, 0.9581, 0.9981, 0.9974, 0.9964],
        [0.9997, 0.9839, 0.9998, 0.9998, 0.9996],
        [0.9723, 0.9237, 0.9888, 0.9821, 0.9805],
        [0.9867, 0.9203, 0.9901, 0.9858, 0.9834],
        [0.9992, 0.9747, 0.9995, 0.9993, 0.9989]], grad_fn=<SigmoidBackward0>)

Great! Looks like our model almost perfectly learned to filter out the noise from the `x` that we passed in!

# Text Classification

Until this part of the notebook, we have learned the fundamentals of PyTorch and built a basic network on a toy dataset. Now we will attempt sentiment analysis using part of the [IMDB dataset](https://www.kaggle.com/datasets/columbine/imdb-dataset-sentiment-analysis-in-csv-format).

Here are the things we will learn:

1. Data: Creating a Dataset of Batched Tensors
2. Modeling
3. Training
4. Prediction

In this section, our goal will be to train a model that will predict the sentiment of a sentence, called `Sentiment Classification`.

Let's dive in!

## Data

In NLP tasks, the corpus would generally be a `.txt`, `.json` or `.csv` file where each row corresponds to a sentence or a tabular datapoint. We are using `.csv` format.

Before we start working on the code, let's mount the GDrive and get its path. Make sure that you have all the files in the `Lab08` folder.

In [21]:
%load_ext autoreload
%autoreload 2

In [22]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


In [23]:
import os

# TODO: Fill in the Google Drive path where you uploaded the lab files
# Example: If you put all the files directly under a Lab08 folder
# GOOGLE_DRIVE_PATH_AFTER_MYDRIVE = 'Lab08'
GOOGLE_DRIVE_PATH_AFTER_MYDRIVE = './CE807-24-SU/Lab08/'
GOOGLE_DRIVE_PATH = os.path.join('gdrive', 'MyDrive', GOOGLE_DRIVE_PATH_AFTER_MYDRIVE)
GOOGLE_DRIVE_DATA_PATH = os.path.join(GOOGLE_DRIVE_PATH, 'data')
print(os.listdir(GOOGLE_DRIVE_PATH))
print(os.listdir(GOOGLE_DRIVE_DATA_PATH))

['data', 'lab08.ipynb']
['valid.csv', 'test.csv', 'sample.csv', 'train.csv']


**Note: unlike in the last lab, instead of copying files into the current working directory, we build the path semi-automatically and prepend it when reading the files. This saves the time spent copying.**

In [24]:
# Let's read the train file

import pandas as pd

train_file = os.path.join(GOOGLE_DRIVE_DATA_PATH, 'train.csv')
train_data = pd.read_csv(train_file)
train_data.shape

(40000, 2)

In [25]:
train_data.head(100)


| | text | label |
|---|---|---|
| 0 | I grew up (b. 1965) watching and loving the Th... | 0 |
| 1 | When I put this movie in my DVD player, and sa... | 0 |
| 2 | Why do people who do not know what a particula... | 0 |
| 3 | Even though I have great interest in Biblical ... | 0 |
| 4 | Im a die hard Dads Army fan and nothing will e... | 1 |
| ... | ... | ... |
| 95 | ...I saw this movie when it first came out in ... | 1 |
| 96 | Released in December of 1957, Sayonara went on... | 1 |
| 97 | War, Inc. - Corporations take over war in the ... | 0 |
| 98 | What is your freaking problem? Do you have not... | 1 |


In [26]:
# For sake of simplicity let's use only 10% data points
train_data = train_data.sample(frac=0.1).reset_index(drop=True) # Shuffling and selecting 10% of data
train_data.shape

(4000, 2)

## Preprocessing

To make it easier for our models to learn, we usually apply a few preprocessing steps to our data. This is especially important when dealing with text data. Here are some examples of text preprocessing:
* **Tokenization**: Tokenizing the sentences into words.
* **Lowercasing**: Changing all the letters to be lowercase.
* **Noise removal:** Removing special characters (such as punctuations).
* **Stop words removal**: Removing commonly used words.

Which preprocessing steps are necessary is determined by the task at hand. For example, although it is useful to remove special characters in some tasks, for others they may be important (for example, if we are dealing with multiple languages). For our task, the words are lowercased and tokenized, which `CountVectorizer` does by default.

**We are not going to apply any further preprocessing steps by hand. You should explore which steps are needed.**
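
As a starting point for that exploration, the four steps listed above can be sketched in a few lines of plain Python. The function name `preprocess` and the tiny `STOP_WORDS` set are assumptions for this example; a real stop-word list would be much longer.

```python
import re

# A tiny, hand-picked stop-word list; an assumption for this example only
STOP_WORDS = {"the", "a", "an", "is", "was", "this", "and", "of"}

def preprocess(text):
  text = text.lower()                         # lowercasing
  text = re.sub(r"[^a-z0-9\s]", " ", text)    # noise removal (punctuation etc.)
  tokens = text.split()                       # whitespace tokenization
  return [t for t in tokens if t not in STOP_WORDS]  # stop-word removal

tokens = preprocess("This movie was GREAT, and the ending is a surprise!")
print(tokens)
# → ['movie', 'great', 'ending', 'surprise']
```

Note that the character filter in this sketch would also strip non-English letters, which is exactly the kind of task-dependent choice mentioned above.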


In [27]:
import numpy
from sklearn.feature_extraction.text import CountVectorizer

In [28]:
count_vectorizer = CountVectorizer(stop_words='english',max_features=5000)
train_values = count_vectorizer.fit_transform(train_data['text'].values)
train_labels = train_data['label'].values

In [29]:
type(train_values), type(train_labels)

(scipy.sparse._csr.csr_matrix, numpy.ndarray)

PyTorch needs the data as `Tensor`s. Let's convert it.
Remember, the text representation (here, count-vectorized) needs to be in `float` format and the labels in `int` format.

In [30]:
train_values = torch.tensor(train_values.toarray()).float()

train_labels = torch.tensor(train_labels)
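
To see what these steps produce, here is a minimal sketch of the same pipeline on a made-up three-sentence corpus. The texts and the names `vectorizer`, `vocab`, `features` and `test_features` are assumptions for this example.

```python
import torch
from sklearn.feature_extraction.text import CountVectorizer

train_texts = ["good movie", "bad movie", "good good film"]
test_texts = ["good unseen movie"]

vectorizer = CountVectorizer()
train_counts = vectorizer.fit_transform(train_texts)  # learns the vocabulary
test_counts = vectorizer.transform(test_texts)        # reuses it; "unseen" is dropped

# Vocabulary words ordered by their column index
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
features = torch.tensor(train_counts.toarray()).float()
test_features = torch.tensor(test_counts.toarray()).float()

print(vocab)
print(features)
print(test_features)
```

Each column counts one vocabulary word, so words that never occurred in the training texts have no column and are ignored at test time.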


## Batching Sentences

We have learned about batches in class. Waiting for our whole training corpus to be processed before making an update is costly. On the other hand, updating the parameters after every training example causes the loss to be less stable between updates. To combat these issues, we instead update our parameters after training on a batch of data. This allows us to get a better estimate of the gradient of the global loss. In this section, we will learn how to structure our data into batches using the `torch.utils.data.DataLoader` class.

We will be calling the `DataLoader` class as follows: `DataLoader(data, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)`. The `batch_size` parameter determines the number of examples per batch. In every epoch, we will be iterating over all the batches using the `DataLoader`. The order of the examples is deterministic by default, but we can ask `DataLoader` to reshuffle the data at every epoch by setting the `shuffle` parameter to `True`. This way we ensure that we don't encounter a bad batch multiple times.

**Note:** We are not using `collate_fn` here, but you must explore it. It will be very useful and needed in the exercise and the next labs.

If provided, `DataLoader` passes the batches it prepares to the `collate_fn`. We can write a custom function to pass as the `collate_fn` parameter in order to print stats about our batch or perform extra processing.
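
As one possible use of `collate_fn`, the sketch below pads variable-length sequences so that each batch becomes a single tensor. The toy data, the function name `pad_collate` and the choice of `0` as the padding id are assumptions for this example.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Toy dataset: (token-id sequence, label) pairs of different lengths
data = [
    (torch.tensor([4, 7, 1]), 0),
    (torch.tensor([2, 5]), 1),
    (torch.tensor([9]), 1),
    (torch.tensor([3, 3, 8, 6]), 0),
]

def pad_collate(batch):
  # `batch` is a list of (sequence, label) items chosen by the DataLoader
  sequences = [seq for seq, _ in batch]
  labels = torch.tensor([label for _, label in batch])
  # Pad with 0 so that all sequences in the batch share one length
  padded = pad_sequence(sequences, batch_first=True, padding_value=0)
  return padded, labels

loader = DataLoader(data, batch_size=2, shuffle=False, collate_fn=pad_collate)
batches = list(loader)
for padded, labels in batches:
  print(padded.shape, labels)
```

Without a custom `collate_fn`, the default collation would fail here because sequences of different lengths cannot be stacked into one tensor.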

In [31]:
from torch.utils.data import Dataset, DataLoader, TensorDataset

In [32]:
train_dataset = TensorDataset(train_values, train_labels)

train_loader = DataLoader(train_dataset, batch_size=8,shuffle=True)

For each training example we have, we should also have a corresponding sentiment label. Recall that the goal of our model is to determine the sentiment of a given sentence.

Now, we can see the `DataLoader` in action.

In [33]:
for i, batch in enumerate(train_loader):
  inputs = batch[0]
  labels = batch[1]
  print(i,inputs.shape, labels.shape)

  print("Batched Input:")
  print(inputs)
  print("Batched Labels:")
  print(labels)

  # Let's see only 3 batches (i = 0, 1, 2)
  if i >= 2:
    break

0 torch.Size([8, 5000]) torch.Size([8])
Batched Input:
tensor([[0., 0., 1.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])
Batched Labels:
tensor([0, 1, 0, 0, 0, 0, 0, 0])
1 torch.Size([8, 5000]) torch.Size([8])
Batched Input:
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])
Batched Labels:
tensor([1, 0, 0, 1, 1, 1, 1, 0])
2 torch.Size([8, 5000]) torch.Size([8])
Batched Input:
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0.,

The batched input tensors you see above will be passed into our model.

## Model

Now that we have prepared our data, we are ready to build our model. We have learned how to write custom `nn.Module` classes. We will do the same here and put everything we have learned so far together.

In [34]:
class MultilayerPerceptron(nn.Module):

  def __init__(self, input_size, hidden_size, output_size):
    # Call to the __init__ function of the super class
    super(MultilayerPerceptron, self).__init__()

    # Bookkeeping: Saving the initialization parameters
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.output_size = output_size

    # Defining of our model
    # There isn't anything specific about the naming of `self.model`. It could
    # be something arbitrary.
    self.model = nn.Sequential(
        nn.Linear(self.input_size, self.hidden_size),
        nn.ReLU(),
        # nn.Linear(self.hidden_size, self.hidden_size),
        # nn.ReLU(),
        nn.Linear(self.hidden_size, self.output_size),

    )

  def forward(self, x):
    output = self.model(x)
    return output

We want to run the computation on the GPU, so let's check whether we have GPU access. If the output is `cpu`, go to the `Runtime` menu and change the runtime type to `GPU`.

In [35]:
import torch

if torch.cuda.is_available():
  device = torch.device("cuda")
else:
  device = torch.device("cpu")
device

device(type='cuda')

## Training

We are now ready to put everything together. Let's start with preparing our data and initializing our model. We can then initialize our optimizer and define our loss function.

In [36]:
def prepare_dataset(PATH, file_name, sample_flag=False,count_vectorizer=None):
  # Prepare the data
  file_path = os.path.join(PATH, file_name)
  data = pd.read_csv(file_path)
  if sample_flag:
    # For sake of simplicity let's use only 10% data points
    data = data.sample(frac=0.1).reset_index(drop=True) # Shuffling dataset

  if count_vectorizer is None:
    # Training data: fit a new vectorizer, which learns the vocabulary
    count_vectorizer = CountVectorizer(stop_words='english',max_features=5000)
    values = count_vectorizer.fit_transform(data['text'].values)
  else:
    # Validation/test data: reuse the vectorizer fitted on the training data so that the features match
    values = count_vectorizer.transform(data['text'].values)

  labels = data['label'].values

  # Convert into Tensor
  values = torch.tensor(values.toarray()).float()
  labels = torch.tensor(labels)

  dataset = TensorDataset(values, labels)
  input_size = values.shape[1]
  return dataset, input_size, count_vectorizer

Unlike our earlier example, this time instead of passing all of our training data to the model at once in each epoch, we will be utilizing batches. Hence, in each training epoch iteration, we also iterate over the batches.

In [37]:
train_dataset, input_size, count_vectorizer = prepare_dataset(GOOGLE_DRIVE_DATA_PATH, 'train.csv',sample_flag=True)

batch_size = 16
shuffle = True

# Instantiate a DataLoader
train_loader = DataLoader(train_dataset, batch_size=batch_size,shuffle=shuffle) # In training always make shuffle True



Now we initialize the Model and Optimizers

In [38]:
# Number of Epochs
epochs = 5

# Define loss using a predefined loss function
loss_function = nn.CrossEntropyLoss()

# Initialize a model
input_size = input_size # Returned by prepare_dataset, i.e. derived from the train dataset
hidden_size = 5000
output_size = 2 # TODO: This should be derived automatically from the train dataset

model = MultilayerPerceptron(input_size=input_size,hidden_size=hidden_size, output_size=output_size )
model = model.to(device)

# Define the optimizer
lr = 3e-4
# Remember your optimizer has to come after model definition
optimizer = optim.Adam(model.parameters(), lr=lr)
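
The `nn.CrossEntropyLoss` chosen above works on raw scores (logits) and integer class labels, which is why the model has no final `Sigmoid` or `Softmax` layer. A minimal sketch of what it computes, with made-up `logits` and `labels`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5], [0.1, 1.2], [1.0, 1.0]])  # raw model outputs
labels = torch.tensor([0, 1, 0])                              # class indices (int64)

builtin = nn.CrossEntropyLoss()(logits, labels)

# Same value by hand: log-softmax, then pick the log-probability of the true class
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(labels)), labels].mean()

print(builtin.item(), manual.item())
```

With the default settings the loss is averaged over the examples in the batch, so the per-epoch totals printed during training are sums of these batch averages.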


We can print the model to see its *structure*

In [39]:
print(model)

MultilayerPerceptron(
  (model): Sequential(
    (0): Linear(in_features=5000, out_features=5000, bias=True)
    (1): ReLU()
    (2): Linear(in_features=5000, out_features=2, bias=True)
  )
)


We can list the named parameters of the model to see its *parameters*

In [40]:
list(model.named_parameters())

[('model.0.weight',
  Parameter containing:
  tensor([[-0.0110,  0.0088, -0.0136,  ...,  0.0078, -0.0126,  0.0100],
          [ 0.0057,  0.0072,  0.0048,  ...,  0.0076, -0.0056,  0.0085],
          [-0.0069,  0.0050,  0.0086,  ...,  0.0007, -0.0077,  0.0082],
          ...,
          [ 0.0033,  0.0028, -0.0010,  ..., -0.0030,  0.0113, -0.0043],
          [-0.0070,  0.0122, -0.0116,  ..., -0.0071, -0.0101,  0.0093],
          [-0.0111, -0.0036,  0.0052,  ..., -0.0098, -0.0125,  0.0099]],
         device='cuda:0', requires_grad=True)),
 ('model.0.bias',
  Parameter containing:
  tensor([ 0.0052, -0.0036, -0.0086,  ...,  0.0097, -0.0134,  0.0124],
         device='cuda:0', requires_grad=True)),
 ('model.2.weight',
  Parameter containing:
  tensor([[-0.0091, -0.0096,  0.0063,  ...,  0.0093,  0.0064, -0.0026],
          [-0.0060, -0.0089, -0.0026,  ..., -0.0036,  0.0125,  0.0061]],
         device='cuda:0', requires_grad=True)),
 ('model.2.bias',
  Parameter containing:
  tensor([-0.0016,  

Training contains 2 loops:

* Batch processing: take one batch of inputs and update the model parameters
* Epoch processing: process all the training data once; this is repeated for `n` epochs

We sum the batch losses within each epoch and monitor this total from epoch to epoch. If training is going well, the training loss decreases as training progresses.



In [41]:
# Function that will be called in every epoch
def train_epoch(loss_function, optimizer, model, data_loader):

  # Keep track of the total loss for the epoch
  total_loss = 0
  for i, batch in enumerate(data_loader):
    # We could move the whole dataset to the GPU, but that takes a lot of memory, so we move one batch at a time
    batch_inputs = batch[0].to(device)
    batch_labels = batch[1].to(device)
    # print(batch_inputs.shape)
    # Clear the gradients
    optimizer.zero_grad()
    # Run a forward pass
    outputs = model(batch_inputs)
    # Compute the batch loss
    loss = loss_function(outputs, batch_labels)
    # Calculate the gradients
    loss.backward()
    # Update the parameters
    optimizer.step()
    total_loss += loss.item()

  return total_loss

In [42]:
# Function containing our main training loop
def train(loss_function, optimizer, model, data_loader, num_epochs=5):

  # Iterate through each epoch and call our train_epoch function
  for epoch in range(num_epochs):
    epoch_loss = train_epoch(loss_function, optimizer, model, data_loader)
    print('epoch_loss', epoch_loss)

Let's start training!

In [43]:
print('Start')
# print(list(model.parameters()))

train(loss_function, optimizer, model, train_loader, num_epochs=epochs)

print('End')

Start
epoch_loss 105.95724188536406
epoch_loss 24.900978152640164
epoch_loss 4.912369846919319
epoch_loss 0.8976526413898682
epoch_loss 0.3457113619078882
End


## Prediction

Let's see how well our model is at making predictions. We can start by creating our test data.

In [44]:
# Load test Data
test_dataset, c , _ = prepare_dataset(GOOGLE_DRIVE_DATA_PATH, 'test.csv',sample_flag=True,count_vectorizer=count_vectorizer)

# Instantiate a DataLoader
# in practice you need to keep shuffle false at the testing time, so that you could easily match data with predicted label
test_loader = DataLoader(test_dataset, batch_size=8,shuffle=False)

In [45]:
# Model in evaluation state
model.eval()

MultilayerPerceptron(
  (model): Sequential(
    (0): Linear(in_features=5000, out_features=5000, bias=True)
    (1): ReLU()
    (2): Linear(in_features=5000, out_features=2, bias=True)
  )
)

Let's loop over our test examples to see how well we are doing.

In [46]:
all_gt_labels = []
all_predict_labels = []

# Gradients are not needed for prediction
with torch.no_grad():
    for i, batch in enumerate(test_loader):
        batch_inputs = batch[0].to(device)
        batch_labels = batch[1].to(device)

        outputs = model(batch_inputs) # These are raw scores (logits), not labels yet
        out_labels = torch.argmax(outputs, dim=1) # The index of the highest score is the predicted label

        # Let's save both GT and predicted labels for the accuracy and F1-score calculation
        all_gt_labels.extend(batch_labels.tolist())
        all_predict_labels.extend(out_labels.tolist())
        # print(out_labels)
        # break
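
The model returns raw scores rather than probabilities. The sketch below, using made-up `logits`, shows how `torch.softmax` turns them into probabilities and that `argmax` picks the same label either way.

```python
import torch

logits = torch.tensor([[1.5, -0.3], [-2.0, 0.7], [0.2, 0.1]])

probs = torch.softmax(logits, dim=1)        # each row now sums to 1
pred_from_logits = torch.argmax(logits, dim=1)
pred_from_probs = torch.argmax(probs, dim=1)

print(probs)
print(pred_from_logits.tolist(), pred_from_probs.tolist())
# → [0, 1, 0] [0, 1, 0]
```

Softmax preserves the ordering of the scores within a row, so it is only needed when the probabilities themselves are of interest.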

In [47]:
len(all_gt_labels), len(all_predict_labels)

(500, 500)

In [48]:
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score

In [49]:
test_f1 = f1_score(all_gt_labels, all_predict_labels, average='macro')
test_acc = accuracy_score(all_gt_labels, all_predict_labels)

print("F1 Score : ", test_f1)
print("Accuracy Score : ", test_acc)

F1 Score :  0.8555469953775038
Accuracy Score :  0.856


In [50]:
print('Confusion Matrix')
print(confusion_matrix(all_gt_labels, all_predict_labels))

Confusion Matrix
[[228  33]
 [ 39 200]]


# Exercise



*   **Add Extra Hidden Layer**: Currently we have only one hidden/linear layer in the model. Modify it by adding one more hidden/linear layer. You might have to add another activation function as well.
*   **Use of Validation Set**: We have not used the validation set. Use it to monitor the loss over epochs and stop training when the loss keeps increasing for 3 consecutive epochs.
*   **Save Output**: In practice, we need each test sentence together with its predicted output and ground truth label. Find a way to save the model's output in a `csv` file with the columns `text`, `label` (for the ground truth) and `prediction` (for the predicted output).
*   **Loss vs Epoch Plot**: It is always good practice to plot the training and validation loss against the epoch to visually analyze the training process. Plot it with the epoch on the X-axis and the loss on the Y-axis.
*   **Using Embedding Layer**: The current setup is not optimal because the vocabulary size is large and needs a very large projection. We could reduce this by adding an Embedding layer. Incorporate an Embedding layer into your code. The embedding can be learned in two different ways:
  * Learn from scratch
  * Use a pre-trained embedding like `word2vec`

  Implement both, taking inspiration from https://colab.research.google.com/drive/1WUy4G2SsoLelrZDkO2I0v9tHx9x27NJK?usp=sharing#scrollTo=4wv875jUYtBD
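
As a rough starting point for the embedding exercise, here is a minimal sketch of a classifier that embeds token ids and averages them. It is not the intended solution: the class name `EmbeddingClassifier`, the embedding size of 50, and the use of id `0` for padding are all assumptions, and it expects token-id sequences rather than the count vectors used above.

```python
import torch
import torch.nn as nn

class EmbeddingClassifier(nn.Module):
  """Hypothetical model: embed token ids, average them, then classify."""

  def __init__(self, vocab_size, embed_dim, output_size):
    super().__init__()
    # padding_idx=0 keeps the embedding of the padding id fixed at zero
    self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
    self.classifier = nn.Linear(embed_dim, output_size)

  def forward(self, token_ids):
    embedded = self.embedding(token_ids)             # (batch, seq_len, embed_dim)
    mask = (token_ids != 0).unsqueeze(-1).float()    # 1 for real tokens, 0 for padding
    summed = (embedded * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)          # avoid division by zero
    return self.classifier(summed / counts)          # (batch, output_size)

embed_model = EmbeddingClassifier(vocab_size=5000, embed_dim=50, output_size=2)
token_ids = torch.tensor([[12, 7, 431, 0], [5, 99, 0, 0]])  # 0 is padding
logits = embed_model(token_ids)
print(logits.shape)
```

For the pre-trained variant, `nn.Embedding.from_pretrained` accepts a weight matrix, so vectors loaded from `word2vec` could replace the randomly initialized table in this sketch.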




**Note that all exercises are very important and will help you in Assignment 2. I will ask for submissions in a particular format, and these exercises will help you get there.**


NOTE: Here `CountVectorizer` was used to turn the text into features. You CANNOT use `CountVectorizer` features with an RNN, because word counts discard the word order that an RNN relies on.

In the examples above you could use `word2vec` instead; check the link above.

Use the validation set to check the model: if the validation loss increases for 3 consecutive epochs, stop the training.

Introduce a line of code that stops the loop if the loss of the next epoch is larger than that of the previous one.

Another sign of a problem: when performance on the validation set starts falling after each epoch, the model is overfitting the training data.
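
The stopping rule described in these notes can be sketched without any PyTorch code. The function name `should_stop`, the `patience` argument and the validation losses in `history` are made up for illustration; in the exercise the losses would come from evaluating the model on the validation set after each epoch.

```python
def should_stop(val_losses, patience=3):
  """Return True if the validation loss rose in each of the last `patience` epochs."""
  if len(val_losses) <= patience:
    return False
  recent = val_losses[-(patience + 1):]
  return all(later > earlier for earlier, later in zip(recent, recent[1:]))

# Made-up validation losses, one per epoch, purely for illustration
history = [0.70, 0.55, 0.48, 0.50, 0.53, 0.57, 0.60]

stopped_at = None
seen = []
for epoch, val_loss in enumerate(history):
  seen.append(val_loss)
  if should_stop(seen, patience=3):
    stopped_at = epoch
    break

print(stopped_at)
# → 5
```

Setting `patience=1` gives the simpler rule from the notes, which stops as soon as one epoch has a larger validation loss than the previous one.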