# TP 2: Linear Algebra and Feedforward neural network
Master LiTL - 2021-2022

## Requirements
In this session, we will go through some code to learn how to manipulate matrices and tensors, and we will look at PyTorch code that defines, trains and evaluates a simple neural network.
The modules used are the same as in the previous session, *NumPy* and *scikit-learn*, with the addition of *PyTorch*. They are all already available in Colab.

## Part 1: Linear Algebra

In this section, we will go through some Python code to deal with matrices and with tensors, the data structures used in PyTorch.

Sources:    
* Linear Algebra explained in the context of deep learning: https://towardsdatascience.com/linear-algebra-explained-in-the-context-of-deep-learning-8fcb8fca1494
* PyTorch tutorial: https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py
* PyTorch doc on tensors: https://pytorch.org/docs/stable/torch.html


## 1.1 Numpy arrays

NumPy’s main object is the homogeneous multidimensional array: a table of elements (usually numbers), all of the same type.


In [None]:
import numpy as np

x = np.array([1,2])
print("Our input vector with 2 elements:\n", x)
print( "x shape:", x.shape)
print( "x data type", x.dtype)
# np.array takes a single sequence (e.g. a list), not separate arguments
# a = np.array(1,2,3,4)    # WRONG
# a = np.array([1,2,3,4])  # RIGHT

# Generate a random matrix (with a seeded generator, for reproducible results)
rng = np.random.default_rng(seed=42)
W = rng.random((3, 2))
print("\n Our weight matrix, of shape 3x2:\n", W)
print( "W shape:", W.shape)
print( "W data type", W.dtype)

# Bias, a scalar
b = 1

# Now, try to multiply
h = W.dot(x) + b
print("\n Our h layer:\n", h)
print( "h shape:", h.shape)
print( "h data type", h.dtype)

Our input vector with 2 elements:
 [1 2]
x shape: (2,)
x data type int64

 Our weight matrix, of shape 3x2:
 [[0.77395605 0.43887844]
 [0.85859792 0.69736803]
 [0.09417735 0.97562235]]
W shape: (3, 2)
W data type float64

 Our h layer:
 [2.65171293 3.25333398 3.04542205]
h shape: (3,)
h data type float64
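
To make the product above concrete, here is a small sketch that recomputes `h` one row at a time and compares it with the vectorized result. It reuses the same `x`, `W` and `b` as the cell above; the name `h_manual` is only introduced for this illustration.

```python
import numpy as np

# Same values as in the cell above
x = np.array([1, 2])
rng = np.random.default_rng(seed=42)
W = rng.random((3, 2))
b = 1

# Each row i of W is combined with x:
# h[i] = W[i, 0] * x[0] + W[i, 1] * x[1] + b
h_manual = np.array(
    [W[i, 0] * x[0] + W[i, 1] * x[1] + b for i in range(W.shape[0])]
)

# The @ operator computes the same matrix-vector product as W.dot(x)
h = W @ x + b

print(h)
print(np.allclose(h, h_manual))
```

Note that the scalar bias `b` is broadcast: it is added to each of the 3 components of `W @ x`.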


In [None]:
# Useful transformations
h = h.reshape((3,1))
print("\n h reshape:\n", h)
print( "h shape:", h.shape)

h1 = np.transpose(h)
print("\n h transpose:\n", h1)
print( "h shape:", h1.shape)

h2 = h.T
print("\n h transpose:\n", h2)
print( "h shape:", h2.shape)

Wt = W.T
print("\nW:\n", W)
print("\nW.T:\n", Wt)


 h reshape:
 [[2.65171293]
 [3.25333398]
 [3.04542205]]
h shape: (3, 1)

 h transpose:
 [[2.65171293 3.25333398 3.04542205]]
h shape: (1, 3)

 h transpose:
 [[2.65171293 3.25333398 3.04542205]]
h shape: (1, 3)

W:
 [[0.77395605 0.43887844]
 [0.85859792 0.69736803]
 [0.09417735 0.97562235]]

W.T:
 [[0.77395605 0.85859792 0.09417735]
 [0.43887844 0.69736803 0.97562235]]
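
One detail that explains why `h` was reshaped before being transposed: transposing a 1-D NumPy array has no effect, since there is only one axis. The sketch below illustrates this with a toy vector `v` (a name chosen for this example only).

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # 1-D array, shape (3,)
print(v.T.shape)                # still (3,): nothing to transpose

col = v.reshape(-1, 1)          # explicit column vector, shape (3, 1)
row = col.T                     # row vector, shape (1, 3)
print(col.shape, row.shape)

# Alternative: np.newaxis adds an axis without spelling out the length
col2 = v[:, np.newaxis]
print(col2.shape)
```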


In [None]:
## numpy code to create identity matrix
import numpy as np
a = np.eye(4)
print(a)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


## 1.2 Tensors

To implement neural networks in PyTorch, we use tensors:
* a specialized data structure that is very similar to arrays and matrices
* used to encode the inputs and outputs of a model, as well as the model’s parameters
* similar to NumPy’s ndarrays, except that tensors can run on GPUs or other specialized hardware to accelerate computing

### 1.2.1 Tensor initialization

In [None]:
import torch
import numpy as np

# Tensor initialization

## from data. The data type is automatically inferred.
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)
print( "x_data", x_data)
print( "data type x_data=", x_data.dtype)

## from a numpy array
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
print("\nx_np", x_np)
print( "data type, np_array=", np_array.dtype, "x_data=", x_np.dtype)

## from another tensor
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"\nOnes Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

## with random or constant values
shape = (2, 3,) # shape is a tuple of tensor dimensions
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

x_data tensor([[1, 2],
        [3, 4]])
data type x_data= torch.int64

x_np tensor([[1, 2],
        [3, 4]])
data type, np_array= int64 x_data= torch.int64

Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 

Random Tensor: 
 tensor([[0.0313, 0.8400],
        [0.7177, 0.9185]]) 

Random Tensor: 
 tensor([[0.0546, 0.0047, 0.5284],
        [0.1540, 0.3733, 0.2583]]) 

Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])
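
A point worth knowing about the two initializations from existing data: `torch.from_numpy` shares its memory with the NumPy array, whereas `torch.tensor` makes a copy. The following sketch illustrates the difference; the names `shared` and `copied` are only used for this example.

```python
import numpy as np
import torch

np_array = np.array([[1, 2], [3, 4]])

shared = torch.from_numpy(np_array)   # shares memory with np_array
copied = torch.tensor(np_array)       # independent copy of the data

np_array[0, 0] = 100                  # modify the NumPy array in place

print(shared[0, 0].item())   # the change is visible in the tensor
print(copied[0, 0].item())   # the copy keeps the original value
```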


### 1.2.2 Tensor attributes

In [None]:
# Tensor attributes
tensor = torch.rand(3, 4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


### 1.2.3 Move to GPU

In [None]:
# We move our tensor to the GPU if available
if torch.cuda.is_available():
  tensor = tensor.to('cuda')
  print(f"Device tensor is stored on: {tensor.device}")
else:
  print("no gpu")

print(tensor)

no gpu
tensor([[0.0742, 0.3130, 0.0134, 0.2691],
        [0.8818, 0.9389, 0.1006, 0.9912],
        [0.1168, 0.3271, 0.7650, 0.9646]])


**If you’re using Colab, allocate a GPU by going to Edit > Notebook Settings.**

▶▶ **Move to the GPU, and re-run the last cells.**

In [None]:
# We move our tensor to the GPU if available
if torch.cuda.is_available():
  tensor = tensor.to('cuda')
  print(f"Device tensor is stored on: {tensor.device}")
else:
  print("no gpu")

print(tensor)

NameError: ignored

### 1.2.4 Tensor operations

Doc: https://pytorch.org/docs/stable/torch.html

▶▶ **Look at slicing operations.**
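
As a warm-up, here is a minimal sketch of the indexing syntax on a toy tensor `t` (not the one used in the exercise below). Slicing works as with NumPy arrays: `:` selects a whole axis and negative indices count from the end.

```python
import torch

t = torch.arange(12).reshape(3, 4)   # values 0..11, 3 rows and 4 columns
print(t)

first_row = t[0]        # first row, shape (4,)
last_col = t[:, -1]     # last column, shape (3,)
block = t[1:, :2]       # rows 1 and 2, columns 0 and 1, shape (2, 2)

print(first_row)
print(last_col)
print(block)
```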

In [None]:
# Tensor operations: similar to numpy arrays

tensor = torch.ones(4, 4)
print(tensor)

# ---------------------------------------------------------
# TODO: What do you expect?
# ---------------------------------------------------------
## Slicing
print("\nSlicing")
tensor[:,1] = 0 
print(tensor)

# ---------------------------------------------------------
# TODO: Change the first column with the value in l
# ---------------------------------------------------------
l =[1.,2.,3.,4.] 
l = torch.tensor( l )
tensor[:, 0] = l
print(tensor)


## Concatenation
print("\nConcatenate")
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)

## Multiplication: element_wise
print("\nMultiply")
# This computes the element-wise product
t2 = tensor.mul(tensor)
print(f"tensor.mul(tensor) \n {t2} \n")
# Alternative syntax:
t3 = tensor * tensor
print(f"tensor * tensor \n {t3}")

## Matrix multiplication
t4 = tensor.matmul(tensor.T)
print(f"tensor.matmul(tensor.T) \n {t4} \n")
# Alternative syntax:
t5 = tensor @ tensor.T
print(f"tensor @ tensor.T \n {t5}")

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

Slicing
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
tensor([[1., 0., 1., 1.],
        [2., 0., 1., 1.],
        [3., 0., 1., 1.],
        [4., 0., 1., 1.]])

Concatenate
tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [2., 0., 1., 1., 2., 0., 1., 1., 2., 0., 1., 1.],
        [3., 0., 1., 1., 3., 0., 1., 1., 3., 0., 1., 1.],
        [4., 0., 1., 1., 4., 0., 1., 1., 4., 0., 1., 1.]])

Multiply
tensor.mul(tensor) 
 tensor([[ 1.,  0.,  1.,  1.],
        [ 4.,  0.,  1.,  1.],
        [ 9.,  0.,  1.,  1.],
        [16.,  0.,  1.,  1.]]) 

tensor * tensor 
 tensor([[ 1.,  0.,  1.,  1.],
        [ 4.,  0.,  1.,  1.],
        [ 9.,  0.,  1.,  1.],
        [16.,  0.,  1.,  1.]])
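tensor.matmul(tensor.T) 
 tensor([[ 3.,  4.,  5.,  6.],
        [ 4.,  6.,  8., 10.],
        [ 5.,  8., 11., 14.],
        [ 6., 10., 14., 18.]]) 

tensor @ tensor.T 
 tensor([[ 3.,  4.,  5.,  6.],
        [ 4.,  6.,  8., 10.],
        [ 5.,  8., 11., 14.],
        [ 6., 10., 14., 18.]])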

The tensor is stored on CPU by default.

▶▶ **Initialize the tensor using *device='cuda'*: where are t1, ..., t5 stored?**

In [None]:
# Tensor operations: similar to numpy arrays

tensor = torch.ones(4, 4, device='cuda')
print(tensor)

# ---------------------------------------------------------
# TODO: What do you expect?
# ---------------------------------------------------------
## Slicing
print("\nSlicing")
tensor[:,1] = 0 
print(tensor)

# ---------------------------------------------------------
# TODO: Change the first column with the value in l
# ---------------------------------------------------------
l =[1.,2.,3.,4.] 
l = torch.tensor( l )
tensor[:, 0] = l
print(tensor)


## Concatenation
print("\nConcatenate")
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)

## Multiplication: element_wise
print("\nMultiply")
# This computes the element-wise product
t2 = tensor.mul(tensor)
print(f"tensor.mul(tensor) \n {t2} \n")
# Alternative syntax:
t3 = tensor * tensor
print(f"tensor * tensor \n {t3}")

## Matrix multiplication
t4 = tensor.matmul(tensor.T)
print(f"tensor.matmul(tensor.T) \n {t4} \n")
# Alternative syntax:
t5 = tensor @ tensor.T
print(f"tensor @ tensor.T \n {t5}")

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], device='cuda:0')

Slicing
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]], device='cuda:0')
tensor([[1., 0., 1., 1.],
        [2., 0., 1., 1.],
        [3., 0., 1., 1.],
        [4., 0., 1., 1.]], device='cuda:0')

Concatenate
tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [2., 0., 1., 1., 2., 0., 1., 1., 2., 0., 1., 1.],
        [3., 0., 1., 1., 3., 0., 1., 1., 3., 0., 1., 1.],
        [4., 0., 1., 1., 4., 0., 1., 1., 4., 0., 1., 1.]], device='cuda:0')

Multiply
tensor.mul(tensor) 
 tensor([[ 1.,  0.,  1.,  1.],
        [ 4.,  0.,  1.,  1.],
        [ 9.,  0.,  1.,  1.],
        [16.,  0.,  1.,  1.]], device='cuda:0') 

tensor * tensor 
 tensor([[ 1.,  0.,  1.,  1.],
        [ 4.,  0.,  1.,  1.],
        [ 9.,  0.,  1.,  1.],
        [16.,  0.,  1.,  1.]], device='cuda:0')
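tensor.matmul(tensor.T) 
 tensor([[ 3.,  4.,  5.,  6.],
        [ 4.,  6.,  8., 10.],
        [ 5.,  8., 11., 14.],
        [ 6., 10., 14., 18.]], device='cuda:0') 

tensor @ tensor.T 
 tensor([[ 3.,  4.,  5.,  6.],
        [ 4.,  6.,  8., 10.],
        [ 5.,  8., 11., 14.],
        [ 6., 10., 14., 18.]], device='cuda:0')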

### 1.2.5 Exercise

▶▶ **Compute the tensor h, using the same data for x and W as at the beginning of this TP.**

```
x = np.array([1,2])
rng = np.random.default_rng(seed=42)
W = rng.random((3, 2))
```

In [None]:
# --------------------------------------------------------
# TODO: Write the code to compute h = W.x+b
# --------------------------------------------------------

# h = W.x + b
x = torch.tensor([1,2])
x = x.to( torch.float64) # be careful: using just 'float' here gives float32
## OR
#x = torch.tensor([1,2], dtype=float)
print("Our input vector with 2 elements:\n", x)
print( "x shape:", x.shape)
print( "x type:", x.dtype )

# Generate a random matrix (with a seeded generator, for reproducible results)
rng = np.random.default_rng(seed=42)
W = rng.random((3, 2))
W_t = torch.from_numpy(W)
print("\n Our weight matrix, of shape 3x2:\n", W)
print( "W shape:", W_t.shape)
print( "W type:", W.dtype)

# Bias, a scalar
b = 1.0

# Now, try to multiply
h_t = W_t.matmul(x) + b
print("\n Our h layer:\n", h_t)
print( "h shape:", h_t.shape)

Our input vector with 2 elements:
 tensor([1., 2.], dtype=torch.float64)
x shape: torch.Size([2])
x type: torch.float64

 Our weight matrix, of shape 3x2:
 [[0.77395605 0.43887844]
 [0.85859792 0.69736803]
 [0.09417735 0.97562235]]
W shape: torch.Size([3, 2])
W type: float64

 Our h layer:
 tensor([2.6517, 3.2533, 3.0454], dtype=torch.float64)
h shape: torch.Size([3])


**Note:** when multiplying matrices, both operands need to have the same data type: e.g. multiplying **x** of type *int* with **W** of type *float* raises an error.
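
The sketch below reproduces this situation on purpose, with a random `W` and an integer vector `x_int` (both names are chosen for this illustration). It assumes the behaviour of current PyTorch versions, where `matmul` does not promote mixed data types and raises a `RuntimeError` instead.

```python
import torch

W = torch.rand(3, 2, dtype=torch.float64)
x_int = torch.tensor([1, 2])           # inferred dtype: torch.int64

try:
    W.matmul(x_int)                    # float64 matrix with int64 vector
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("RuntimeError:", err)

# Casting x to the dtype of W fixes the problem
h = W.matmul(x_int.to(W.dtype)) + 1.0
print(h.dtype, h.shape)
```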

In [None]:
# Other notes on tensors

## Operations that have a _ suffix are in-place. For example: x.copy_(y), x.t_(), will change x.
print(tensor, "\n")
tensor.add_(5)
print(tensor)

tensor([[1., 0., 1., 1.],
        [2., 0., 1., 1.],
        [3., 0., 1., 1.],
        [4., 0., 1., 1.]], device='cuda:0') 

tensor([[6., 5., 6., 6.],
        [7., 5., 6., 6.],
        [8., 5., 6., 6.],
        [9., 5., 6., 6.]], device='cuda:0')
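
For comparison, here is a small sketch contrasting the out-of-place `add` with the in-place `add_` on a toy tensor `t` (the names `t`, `u` and `unchanged` are only used here).

```python
import torch

t = torch.ones(2, 2)

u = t.add(5)      # out-of-place: returns a new tensor, t is left untouched
unchanged = torch.equal(t, torch.ones(2, 2))
print(t, unchanged)

t.add_(5)         # in-place: t itself is modified
print(t)
```

In-place operations save memory, but they overwrite values that may still be needed elsewhere, so they should be used with care.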


## Part 2: Feedforward Neural Network

In this practical session, we will explore a simple neural network architecture for NLP applications; specifically, we will train a feedforward neural network for sentiment analysis, using the same dataset of reviews as in the previous session. We will also keep the bag-of-words representation.


Sources:
* This TP is inspired by a TP by Tim van de Cruys
* https://www.deeplearningwizard.com/deep_learning/practical_pytorch/pytorch_feedforward_neuralnetwork/
* https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html
* https://medium.com/swlh/sentiment-classification-using-feed-forward-neural-network-in-pytorch-655811a0913f 

## 2.1 Read and load the data

First, we need to understand how to use text data. Here we will keep the bag-of-words representation, as in the previous session.

You can find different ways of dealing with the input data. The simplest solution is to use the DataLoader from PyTorch:    
* the doc here https://pytorch.org/docs/stable/data.html and here https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
* an example of use, with numpy array: https://www.kaggle.com/arunmohan003/sentiment-analysis-using-lstm-pytorch






You can also find many text datasets ready to load in PyTorch on: https://pytorch.org/text/stable/datasets.html

### 2.1.1 Build BoW vectors

The code below uses the scikit-learn methods you already know to generate the bag-of-words representation.

In [None]:
import pandas as pd
import numpy as np
import re
import sklearn

from sklearn.feature_extraction.text import CountVectorizer

# This will be the size of the vectors representing the input
MAX_FEATURES = 5000 

# Load train and test set
train = pd.read_csv("allocine_train.tsv", header=0,
                    delimiter="\t", quoting=3)
test = pd.read_csv("allocine_test.tsv", header=0,
                   delimiter="\t", quoting=3)

print("Creating features from bag of words...")

# Initialize the "CountVectorizer" object, which is scikit-learn's
# bag of words tool.  
vectorizer = CountVectorizer(
    analyzer = "word",
    max_features = MAX_FEATURES
) 

# fit_transform() performs two operations; first, it fits the model
# and learns the vocabulary; second, it transforms our training data
# into feature vectors. The input to fit_transform should be a list of
# strings.
train_data_features = vectorizer.fit_transform(train["review"])

# output from vectorizer is a sparse array; our classifier needs a
# dense array
x_train = train_data_features.toarray()

# gold labels as a 1-D array of class indices (0 or 1), one per
# review; nn.CrossEntropyLoss expects class indices, not one-hot
# vectors
y_train = np.asarray(train["sentiment"])

print( "TRAIN:", x_train.shape )
count_train = x_train.shape[0]

Creating features from bag of words...
TRAIN: (5027, 5000)


### 2.1.2 Transform to tensors

Now we need to transform our data to tensors, to provide them as input to PyTorch.

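* **torch.utils.data.TensorDataset(*tensors)**: a dataset wrapping tensors. It takes tensors as input, obtained here via **torch.from_numpy(a_numpy_array)**. Note: don't forget to convert the input features to float with **to(torch.float)**; otherwise the model raises a rather cryptic data type error (mentioning *Long*), because the word counts are stored as integers.
* **DataLoader**: wraps a dataset and makes it iterable, batch by batch. Its main parameters are shown below with their default values:

```
DataLoader(
    dataset,
    batch_size=1,
    shuffle=False,
    num_workers=0,
    collate_fn=None,
    pin_memory=False,
)
```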
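
To see what a `DataLoader` actually yields, here is a minimal sketch on toy data. The arrays `x_toy` and `y_toy` are made up for this illustration (10 "documents" with 6 features each), and the batch size of 4 is an arbitrary choice.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy data (hypothetical): 10 examples, 6 integer features, binary labels
x_toy = np.arange(60).reshape(10, 6)
y_toy = np.array([0, 1] * 5, dtype=np.int64)

toy_data = TensorDataset(torch.from_numpy(x_toy).to(torch.float),
                         torch.from_numpy(y_toy))
toy_loader = DataLoader(toy_data, batch_size=4, shuffle=True)

batch_shapes = []
for inputs, labels in toy_loader:
    # inputs: (batch, 6) float32 tensor, labels: (batch,) int64 tensor
    batch_shapes.append((tuple(inputs.shape), tuple(labels.shape)))
    print(inputs.shape, labels.shape, inputs.dtype, labels.dtype)
```

With 10 examples and a batch size of 4, the last batch only contains 2 examples, unless `drop_last=True` is passed to the `DataLoader`.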


In [None]:
import torch
from torch.utils.data import TensorDataset, DataLoader

# create Tensor dataset
train_data = TensorDataset(torch.from_numpy(x_train).to(torch.float), torch.from_numpy(y_train))

# dataloaders
batch_size = 1 #no batch, or batch = 1

# make sure to SHUFFLE your data
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)

## 2.2 Neural Network

Now we can build our learning model.

For this TP, we're going to walk through the code of a simple feedforward neural network, with one hidden layer.

This network takes bag-of-words vectors as input, exactly like our 'classic' models: each review is represented by a vector whose size is the number of tokens in the vocabulary, where each component holds the number of occurrences of the corresponding word in the review ('0' for absent words).
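
As a reminder of what these vectors look like, here is a small sketch on two toy reviews (made-up data, not taken from the Allociné corpus). Note that with its default settings, `CountVectorizer` stores word counts rather than binary presence flags.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two toy reviews (hypothetical data)
docs = ["bon film bon acteur", "mauvais film"]

toy_vectorizer = CountVectorizer(analyzer="word")
bow = toy_vectorizer.fit_transform(docs).toarray()

# vocabulary_ maps each word to its column index
print(sorted(toy_vectorizer.vocabulary_.items(), key=lambda kv: kv[1]))
print(bow)
```

Here "bon" occurs twice in the first review, so its column holds the value 2. Passing `binary=True` to `CountVectorizer` would give 0/1 vectors instead.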

▶▶ **What is the input dimension?** 

▶▶ **What is the output dimension?** 

In [None]:
import torch
import torch.nn as nn

class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetModel, self).__init__()
        # Linear function ==> W1
        self.fc1 = nn.Linear(input_dim, hidden_dim)

        # Non-linearity ==> g
        self.sigmoid = nn.Sigmoid()

        # Linear function (readout) ==> W2
        self.fc2 = nn.Linear(hidden_dim, output_dim)  

    def forward(self, x):
        '''
        y = g(x.W1 + b1).W2 + b2
        '''
        # Linear function  # LINEAR ==> x.W1 + b1
        out = self.fc1(x)

        # Non-linearity  # NON-LINEAR ==> h1 = g(x.W1 + b1)
        out = self.sigmoid(out)

        # Linear function (readout)  # LINEAR ==> y = h1.W2 + b2
        out = self.fc2(out)
        return out

▶▶ **What is the input dimension?** --> MAX_FEATURES = 5000

▶▶ **What is the output dimension?** --> number of classes = 2

We need to set up the values for the hyper-parameters, and define the form of the loss and the optimization methods.

▶▶ **What is the hidden dimension?** 

In [None]:
# Many choices here!
VOCAB_SIZE = MAX_FEATURES
input_dim = VOCAB_SIZE 
hidden_dim = 4
output_dim = 2

learning_rate = 0.1
num_epochs = 5

criterion = nn.CrossEntropyLoss()

▶▶ **What is the hidden dimension?**  --> 4
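
A short sketch of what `nn.CrossEntropyLoss` computes may help here: it takes the raw output scores (logits) and the gold class index, and returns the negative log of the softmax probability of the gold class. The toy values for `logits` and `label` are made up for this illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0]])   # raw scores: one example, 2 classes
label = torch.tensor([0])              # gold class index (dtype int64)

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, label)

# Same value computed by hand: -log(softmax(logits)[gold class])
manual = -torch.log(F.softmax(logits, dim=1)[0, label[0]])

print(loss.item(), manual.item())
```

This is why the model defined above returns the output of its last linear layer directly, without applying a softmax: the loss takes care of it.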

In [None]:
# Initialization of the model
model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
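
To get an idea of what `model.parameters()` hands to the optimizer, the sketch below counts the parameters of two linear layers with the same dimensions as our model (5000, 4 and 2). The layers are rebuilt here as standalone `fc1` and `fc2` so that the example runs on its own.

```python
import torch.nn as nn

input_dim, hidden_dim, output_dim = 5000, 4, 2

fc1 = nn.Linear(input_dim, hidden_dim)
fc2 = nn.Linear(hidden_dim, output_dim)

# nn.Linear stores its weight with shape (out_features, in_features)
print(fc1.weight.shape, fc1.bias.shape)
print(fc2.weight.shape, fc2.bias.shape)

# Total: 5000*4 weights + 4 biases + 4*2 weights + 2 biases
n_params = sum(p.numel() for layer in (fc1, fc2) for p in layer.parameters())
print(n_params)
```

Almost all of the parameters sit in the first layer, because the input dimension (the vocabulary size) is much larger than the hidden dimension.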

### Training the network

▶▶ **Look at the loss after each training step.** 

In [None]:
# Start training
for epoch in range(num_epochs):
    train_loss, total_acc, total_count = 0, 0, 0
    for input, label in train_loader:

        # Clear gradients w.r.t. parameters: PyTorch *accumulates*
        # gradients, so before passing in a new instance you need to
        # zero out the gradients left over from the previous one
        optimizer.zero_grad()

        # Forward pass to get output/logits 
        # = apply all our functions: y = g(x.W1+b).W2
        outputs = model( input )

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, label)

        # Getting gradients w.r.t. parameters
        # Here is the way to find how to modify the parameters in
        # order to lower the loss
        loss.backward()

        # Updating parameters
        optimizer.step()

        # -- a useful print
        # Accumulating the loss over time
        train_loss += loss.item()
        total_acc += (outputs.argmax(1) == label).sum().item()
        total_count += label.size(0)

    # Compute the average loss and the accuracy on the train set at each
    # epoch (the counters are reset at the start of the next epoch)
    print('Epoch: {}. Loss: {}. ACC {} '.format(epoch, train_loss/total_count, total_acc/total_count))

Epoch: 0. Loss: 0.5302336635336692. ACC 0.7306544658842252 
Epoch: 1. Loss: 0.37574898928052036. ACC 0.834692659637955 
Epoch: 2. Loss: 0.3053986730219786. ACC 0.8708971553610503 
Epoch: 3. Loss: 0.2752747689059537. ACC 0.8874079968171872 
Epoch: 4. Loss: 0.2564912591494419. ACC 0.8961607320469465 
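
To make the role of `loss.backward()` and `optimizer.step()` more concrete, here is a minimal sketch of a single training step on a tiny linear layer. It checks that plain SGD (no momentum, no weight decay) updates each weight as `w - lr * grad`. The layer size, the input `x` and the label `y` are arbitrary toy values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)

x = torch.tensor([[1.0, 0.0, 2.0]])
y = torch.tensor([1])

optimizer.zero_grad()
loss = nn.CrossEntropyLoss()(layer(x), y)
loss.backward()                      # fills layer.weight.grad and layer.bias.grad

# What plain SGD should produce: w - lr * grad
expected_weight = layer.weight.detach() - 0.1 * layer.weight.grad

optimizer.step()                     # updates the parameters in place
print(torch.allclose(layer.weight.detach(), expected_weight))
```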


### Evaluate the model 

In [None]:
# ---------------------------------------------
# TODO: Process the Test data
# ---------------------------------------------

#test = pd.read_csv("allocine_test.tsv", header=0,
#                   delimiter="\t", quoting=3)

test_data_features = vectorizer.transform(test["review"])
x_test = test_data_features.toarray()
y_test = np.asarray(test["sentiment"])

print( "TEST:", x_test.shape )

# create Tensor datasets
valid_data = TensorDataset(torch.from_numpy(x_test).to(torch.float), torch.from_numpy(y_test))
valid_loader = DataLoader(valid_data, shuffle=True, batch_size=batch_size)

TEST: (544, 5000)


In [None]:
from sklearn.metrics import classification_report
predictions = []
gold = []

# Disabling gradient calculation is useful for inference, 
# when you are sure that you will not call Tensor.backward(). 
with torch.no_grad():
    for input, label in valid_loader:
        probs = model(input)
        predictions.append( torch.argmax(probs, dim=1).cpu().numpy()[0] )
        gold.append(int(label))

print(classification_report(gold, predictions))

              precision    recall  f1-score   support

           0       0.88      0.89      0.88       272
           1       0.89      0.88      0.88       272

    accuracy                           0.88       544
   macro avg       0.88      0.88      0.88       544
weighted avg       0.88      0.88      0.88       544
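
Note that the values returned by the model are logits, not probabilities. If probabilities are needed, a softmax can be applied to them; the predicted class stays the same, as the following sketch on made-up `logits` shows.

```python
import torch

logits = torch.tensor([[1.5, -0.5],
                       [0.2, 0.9]])

probs = torch.softmax(logits, dim=1)   # each row now sums to 1
print(probs)

# softmax preserves the ordering of the scores, so argmax is unchanged
pred_from_logits = logits.argmax(dim=1)
pred_from_probs = probs.argmax(dim=1)
print(pred_from_logits, pred_from_probs)
```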



## 2.3 Move to GPU

In [None]:
## 1- Define the device to be used

# CUDA for PyTorch
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
print(device)

cuda


In [None]:
## 2- No change here

import torch
import torch.nn as nn

class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetModel, self).__init__()
        # Linear function ==> W1
        self.fc1 = nn.Linear(input_dim, hidden_dim)

        # Non-linearity ==> g
        self.sigmoid = nn.Sigmoid()

        # Linear function (readout) ==> W2
        self.fc2 = nn.Linear(hidden_dim, output_dim)  

    def forward(self, x):
        '''
        y = g(x.W1 + b1).W2 + b2
        '''
        # Linear function  # LINEAR ==> x.W1 + b1
        out = self.fc1(x)

        # Non-linearity  # NON-LINEAR ==> h1 = g(x.W1 + b1)
        out = self.sigmoid(out)

        # Linear function (readout)  # LINEAR ==> y = h1.W2 + b2
        out = self.fc2(out)
        return out

In [None]:
## 3- Move your model to the GPU

# Initialization of the model
model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

# Move the model first, then build the optimizer on its parameters
model = model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [None]:
## 4- Move your data to GPU

# Start training
for epoch in range(num_epochs):
    train_loss, total_acc, total_count = 0, 0, 0
    for input, label in train_loader:
        ## ------------ CHANGE HERE -----------------
        input = input.to(device)
        label = label.to(device)

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        outputs = model( input )

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, label)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        # Accumulating the loss over time
        train_loss += loss.item()
        total_acc += (outputs.argmax(1) == label).sum().item()
        total_count += label.size(0)

    # Compute the average loss and the accuracy on the train set at each
    # epoch (the counters are reset at the start of the next epoch)
    print('Epoch: {}. Loss: {}. ACC {} '.format(epoch, train_loss/total_count, total_acc/total_count))

Epoch: 0. Loss: 0.41117278576548644. ACC 0.8093480934809348 
Epoch: 1. Loss: 0.33078268146318346. ACC 0.8663386633866339 
Epoch: 2. Loss: 0.3004303043457857. ACC 0.8712587125871258 
Epoch: 3. Loss: 0.25682551275738624. ACC 0.8970889708897088 
Epoch: 4. Loss: 0.22718892477737973. ACC 0.9122591225912259 


In [None]:
# -- 5- Again, move your data to GPU

#test = pd.read_csv("allocine_test.tsv", header=0,
#                   delimiter="\t", quoting=3)

test_data_features = vectorizer.transform(test["review"])
x_test = test_data_features.toarray()
y_test = np.asarray(test["sentiment"])

print( "TEST:", x_test.shape )

# create Tensor datasets
valid_data = TensorDataset(torch.from_numpy(x_test).to(torch.float), torch.from_numpy(y_test))
valid_loader = DataLoader(valid_data, shuffle=True, batch_size=batch_size)


from sklearn.metrics import classification_report
predictions = []
gold = []

with torch.no_grad():
    for input, label in valid_loader:
        ## ------------ CHANGE HERE -----------------
        input = input.to(device)
        

        probs = model(input)
        # Here we need the CPU: a CUDA tensor cannot be converted to a
        # NumPy array directly ("can't convert cuda:0 device type tensor
        # to numpy. Use Tensor.cpu() to copy the tensor to host memory
        # first."), so we call .cpu() before .numpy()
        predictions.append( torch.argmax(probs, dim=1).cpu().numpy()[0] )
        #print( probs )
        #print( torch.argmax(probs, dim=1) ) # Return the index of the max value
        #print( torch.argmax(probs, dim=1).cpu().numpy()[0] )
        gold.append(int(label))

print(classification_report(gold, predictions))

TEST: (544, 5000)
              precision    recall  f1-score   support

           0       0.88      0.67      0.76       272
           1       0.73      0.90      0.81       272

    accuracy                           0.79       544
   macro avg       0.80      0.79      0.79       544
weighted avg       0.80      0.79      0.79       544

