# TP 4: Training a Feedforward neural network
Master LiTL - 2021-2022

## Requirements
In this session, we will investigate variations of the setup and the hyper-parameter values of a feedforward NN, still for sentiment analysis on a French dataset of reviews.
Our goal is to find the best model for the task, so this time we will make use of the development set!

We will explore:    
* varied architectures
* varied optimizers
* varied activation functions
* varied values for the hyper-parameters

And in part 2:
* varied representations: sparse and continuous bag-of-words
* Add plots of cost function
* Add plots of number of training examples 

In [None]:
import torch

# If you’re using Colab, allocate a GPU by going to Edit > Notebook Settings.
# Here we only check whether a GPU is available (nothing is moved to it yet)
if torch.cuda.is_available():
  print("GPU ok")
else:
  print("no gpu")
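
The check above only reports whether a GPU is visible. As a minimal sketch of how a tensor could then be placed on the selected device (the names `device` and `x` and the toy tensor are illustrative, not part of the notebook):

```python
import torch

# Pick the GPU when one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy tensor (illustration only), created on the CPU and then moved.
x = torch.zeros(2, 3)
x = x.to(device)

print(x.device.type)  # "cuda" or "cpu", depending on the machine
```

The rest of this notebook keeps everything on the CPU; running on the GPU would also require moving the model and every batch with `.to(device)`.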


## 1. Code for running a FFNN

### 1.1 Read the data

The code below is the same as last time: the input is a BoW representation.

In [None]:
import pandas as pd
import numpy as np
import re
import sklearn
import torch
from torch.utils.data import TensorDataset, DataLoader

from sklearn.feature_extraction.text import CountVectorizer

train_path = "allocine_train.tsv"
dev_path = "allocine_dev.tsv"
test_path = "allocine_test.tsv"

# This will be the size of the vectors representing the input
MAX_FEATURES = 5000 

# Load train set
train_df = pd.read_csv(train_path, header=0, delimiter="\t", quoting=3)
    
# -- VECTORIZE
print("Creating features from bag of words...")  
vectorizer = CountVectorizer( analyzer = "word", max_features = MAX_FEATURES ) 
train_data_features = vectorizer.fit_transform(train_df["review"])
# -- TO DENSE
x_train = train_data_features.toarray()
y_train = np.asarray(train_df["sentiment"])
print( "TRAIN:", x_train.shape )

dev_df = pd.read_csv(dev_path, header=0, delimiter="\t", quoting=3)
dev_data_features = vectorizer.transform(dev_df["review"])
x_dev = dev_data_features.toarray()
y_dev = np.asarray(dev_df["sentiment"])
print( "DEV:", x_dev.shape )

test_df = pd.read_csv(test_path, header=0, delimiter="\t", quoting=3)
test_data_features = vectorizer.transform(test_df["review"])
x_test = test_data_features.toarray()
y_test = np.asarray(test_df["sentiment"])
print( "TEST:", x_test.shape )

count_train = x_train.shape[0]
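
To make the BoW representation concrete, here is a small sketch on two made-up "reviews" (the toy sentences and the `toy_*` names are assumptions for illustration; the real data comes from the `.tsv` files above). Each row of the matrix is a review, each column counts one vocabulary word:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two toy "reviews" (made up for illustration)
toy_reviews = ["un bon film", "un film nul nul"]

toy_vectorizer = CountVectorizer(analyzer="word", max_features=5000)
toy_matrix = toy_vectorizer.fit_transform(toy_reviews).toarray()

# The vocabulary maps each word to a column index (alphabetical order)
vocab = sorted(toy_vectorizer.vocabulary_, key=toy_vectorizer.vocabulary_.get)
print(vocab)       # ['bon', 'film', 'nul', 'un']
print(toy_matrix)  # [[1 1 0 1]
                   #  [0 1 2 1]]
```

Word order is lost: only the counts remain. With `max_features = 5000`, only the 5000 most frequent words of the training set get a column.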

### 1.2 Load the data

Note that batch size is chosen here.

In [None]:
# Load data into TENSORS

def load_data( x_train, y_train, x_dev, y_dev, x_test, y_test, batch_size=1 ):
    #batch_size = 1 # == no batch
    # create Tensor dataset
    train_data = TensorDataset(torch.from_numpy(x_train).to(torch.float), torch.from_numpy(y_train))

    # dataloaders
    # make sure to SHUFFLE your data
    train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
    
    # No need for batches at evaluation time: dev and test use batch_size=1
    dev_data = TensorDataset(torch.from_numpy(x_dev).to(torch.float), torch.from_numpy(y_dev))
    dev_loader = DataLoader(dev_data, shuffle=True, batch_size=1)

    test_data = TensorDataset(torch.from_numpy(x_test).to(torch.float), torch.from_numpy(y_test))
    test_loader = DataLoader(test_data, shuffle=True, batch_size=1)

    return train_loader, dev_loader, test_loader
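
To see what the batch size changes in practice, the sketch below builds a `DataLoader` over 10 random toy examples (assumed data, not the review dataset) and prints the number of batches per epoch and the shape of one batch:

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# 10 toy examples with 3 features each (random values, illustration only)
x_toy = np.random.rand(10, 3)
y_toy = np.array([0, 1] * 5)

toy_data = TensorDataset(torch.from_numpy(x_toy).to(torch.float), torch.from_numpy(y_toy))

for bs in (1, 4, 10):
    loader = DataLoader(toy_data, shuffle=True, batch_size=bs)
    first_inputs, first_labels = next(iter(loader))
    # len(loader) is the number of batches, i.e. parameter updates per epoch
    print(bs, len(loader), tuple(first_inputs.shape))
# 1 10 (1, 3)
# 4 3 (4, 3)
# 10 1 (10, 3)
```

A larger batch size means fewer parameter updates per epoch, each one computed from more examples.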

### 1.3 Neural Network Definition

Now we can build our learning model.

▶▶**What are the elements that can be changed here?**

In [None]:
import torch
import torch.nn as nn

class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        # Calls the init function of nn.Module. Don't get confused by the syntax,
        # just always do it in an nn.Module
        super(FeedforwardNeuralNetModel, self).__init__()

        # Define the parameters that you will need. 
        # Linear function
        self.fc1 = nn.Linear(input_dim, hidden_dim)

        # Non-linearity
        self.sigmoid = nn.Sigmoid()

        # Linear function (readout)
        self.fc2 = nn.Linear(hidden_dim, output_dim)  

    def forward(self, x):
        # Linear function  # LINEAR
        out = self.fc1(x)

        # Non-linearity  # NON-LINEAR
        out = self.sigmoid(out) 

        # Linear function (readout)  # LINEAR
        out = self.fc2(out)
        return out

### 1.4 Training function

Below is the code for a function that trains a model.

▶▶**What are the elements that can be changed here?**

In [None]:
# TRAINING
def train( model, train_loader, optimizer, num_epochs=5 ):
    for epoch in range(num_epochs):
        train_loss, total_acc, total_count = 0, 0, 0
        for input, label in train_loader:
            # Step1. Clearing the accumulated gradients
            optimizer.zero_grad()

            # Step 2. Forward pass to get output/logits
            outputs = model( input )

            # Step 3. Compute the loss, gradients, and update the parameters by
            # calling optimizer.step()
            # - Calculate Loss: softmax --> cross entropy loss
            #   (criterion is a global variable, defined before calling train)
            loss = criterion(outputs, label)
            # - Getting gradients w.r.t. parameters
            loss.backward()
            # - Updating parameters
            optimizer.step()

            # Accumulating the loss over time
            train_loss += loss.item()
            total_acc += (outputs.argmax(1) == label).sum().item()
            total_count += label.size(0)

        # Report the mean loss per batch and the accuracy on the train set at each epoch
        # (the counters are reset at the start of the next epoch)
        print('Epoch: {}. Loss: {}. ACC {} '.format(epoch, train_loss/len(train_loader), total_acc/total_count))
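
The comment "softmax --> cross entropy loss" in Step 3 refers to the fact that `nn.CrossEntropyLoss` takes raw logits and applies the log-softmax itself, which is why the model has no softmax layer. A small sketch on made-up logits (the values and variable names are assumptions for illustration) checks this equivalence:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy logits for 3 examples and 2 classes (values made up for illustration)
logits = torch.tensor([[2.0, -1.0], [0.5, 0.5], [-3.0, 1.0]])
labels = torch.tensor([0, 1, 1])

# What the training loop computes
loss_ce = nn.CrossEntropyLoss()(logits, labels)

# The same quantity written out: log-softmax, then mean negative log-likelihood
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(len(labels)), labels].mean()

print(torch.allclose(loss_ce, loss_manual))  # True
```

By default the loss is averaged over the examples of the batch, so `loss.item()` is a per-batch mean.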

### 1.5 Evaluation

Below is the code for a function that can be used to evaluate the model: it prints the classification report and returns the gold and predicted labels.

In [None]:
from sklearn.metrics import classification_report, accuracy_score


def evaluate( model, dev_loader ):
    predictions = []
    gold = []

    with torch.no_grad():
        for input, label in dev_loader:
            probs = model(input)
            predictions.append( torch.argmax(probs, dim=1).cpu().numpy()[0] )
            gold.append(int(label))

    print(classification_report(gold, predictions))
    return gold, predictions
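
As a quick illustration of the two scikit-learn metrics used here, the sketch below applies them to made-up gold and predicted labels (the `toy_*` lists are assumptions, not model output). Passing `output_dict=True` gives access to individual scores, for instance the macro-averaged F1:

```python
from sklearn.metrics import accuracy_score, classification_report

# Made-up labels for illustration: 3 of the 4 predictions are correct
toy_gold = [0, 1, 1, 0]
toy_pred = [0, 1, 0, 0]

toy_accuracy = accuracy_score(toy_gold, toy_pred)
print(toy_accuracy)  # 0.75

# Same report as a dictionary, to pick out a single score
report = classification_report(toy_gold, toy_pred, output_dict=True)
macro_f1 = report["macro avg"]["f1-score"]
print(macro_f1)
```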

## 2. Running an experiment

Below is a function that can be used to save the results. Don't hesitate to write your own or to modify it.

In [None]:
import os
# Save the scores and settings
my_expe = 'scores_ffnn_bow.txt'

def write_expe_settings( my_file, batch=1, hidden=1, hsize=4, act='sigmoid', lr=0.1, opt='sgd', epochs=5, score=0. ):
  with open( my_file, 'a' ) as f:
    f.write( 'batch:{batch}\thidden:{hidden}\thsize:{hsize}\tact:{act}\tlr:{lr}\topt:{opt}\tepochs:{epochs}\tscore:{score}\n'.format( 
        batch=batch, hidden=hidden, hsize=hsize, act=act, lr=lr, opt=opt, epochs=epochs, score=score ) )
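
Once several experiments have been saved, the file can be read back to compare them. The helper below is a hypothetical companion to `write_expe_settings` (the name `read_expe_settings` is our own), assuming the tab-separated `key:value` format written above; the demo file contains made-up scores, not real results:

```python
import os
import tempfile

def read_expe_settings(my_file):
    """Parse lines written as 'key:value<TAB>key:value...' into dicts."""
    results = []
    with open(my_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split each 'key:value' field on the first ':' only
            fields = dict(field.split(":", 1) for field in line.split("\t"))
            fields["score"] = float(fields["score"])
            results.append(fields)
    return results

# Demo on a temporary file with made-up scores (not real results)
demo_path = os.path.join(tempfile.mkdtemp(), "demo_scores.txt")
with open(demo_path, "w") as f:
    f.write("batch:1\thidden:1\thsize:4\tact:sigmoid\tlr:0.1\topt:sgd\tepochs:5\tscore:0.5\n")
    f.write("batch:10\thidden:1\thsize:4\tact:sigmoid\tlr:0.1\topt:sgd\tepochs:5\tscore:0.6\n")

best = max(read_expe_settings(demo_path), key=lambda r: r["score"])
print(best["batch"], best["score"])  # 10 0.6
```

All values except `score` are kept as strings, which is enough to identify the best setting.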

### TEST #1

Start testing! The first test is the one with the 'default' parameters used in the previous practical session.

▶▶**Describe the setting of the 'default' experiment**

▶▶ **Run the model and evaluate on dev. Save the score for future comparison.**

In [None]:
# Load data
train_loader, dev_loader, test_loader = load_data( x_train, y_train, x_dev, y_dev, x_test, y_test, batch_size=1 )

In [None]:
# Many choices here!
VOCAB_SIZE = MAX_FEATURES
input_dim = VOCAB_SIZE 
hidden_dim = 4
output_dim = 2

learning_rate = 0.1
num_epochs = 5

criterion = nn.CrossEntropyLoss()

# Initialization of the model
# the model uses cross-entropy as a loss function, finds the best
# parameters using stochastic gradient descent, and prints accuracy
# metrics
model_bow = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

optimizer = torch.optim.SGD( model_bow.parameters(), lr=learning_rate )

# Train and evaluate
# now we train the model by feeding it one training sample at a time
# (batch size of 1). When we get to the end of the training set, we repeat
# the process, and we do this num_epochs times (5 epochs here).
train( model_bow, train_loader, optimizer, num_epochs=num_epochs )

# We evaluate on dev set
gold, pred = evaluate( model_bow, dev_loader )
accuracy = accuracy_score( gold, pred )

In [None]:
write_expe_settings( my_expe, batch=1, hidden=1, hsize=hidden_dim, act='sigmoid', lr=learning_rate, opt='sgd', epochs=num_epochs, score=accuracy )

## 3. Exercises

▶▶ **Now, try to change:**
1. Batch size 
2. Max number of epochs (with best batch size)
3. Size of the hidden layer
4. Activation function
5. Optimizer
6. Learning rate
7. Number of hidden layers (add 1 layer)
 

How does this affect the loss and the performance of the model?

---
### 3.1 Batch size
---

Let's try with: 1, 10, 100, 1000

#### TEST #2

▶▶**Describe the setting of the second experiment**

In [None]:
batch_size = 10

# Load data
train_loader, dev_loader, test_loader = load_data( x_train, y_train, x_dev, y_dev, x_test, y_test, batch_size=batch_size )

# Hyper-parameters
hidden_dim = 4
output_dim = 2
learning_rate = 0.1
num_epochs = 5
criterion = nn.CrossEntropyLoss()

# Initialization of the model
model_bow = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)
optimizer = torch.optim.SGD( model_bow.parameters(), lr=learning_rate )

# Train and evaluate
train( model_bow, train_loader, optimizer, num_epochs=num_epochs )
gold, pred = evaluate( model_bow, dev_loader )
accuracy = accuracy_score( gold, pred )

write_expe_settings( my_expe, batch=batch_size, hidden=1, hsize=hidden_dim, act='sigmoid', lr=learning_rate, opt='sgd', epochs=num_epochs, score=accuracy )

#### TEST #3



#### TEST #4

---
### 3.2 Number of epochs
---

Try with: 5, 50

#### TEST #5



---
### 3.3 Size of the hidden layer
---

Try with: 4, 16, 128, (5000 with 5 epochs max)
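
To get a feeling for why the largest size is limited to 5 epochs, the sketch below counts the trainable parameters of a network with one hidden layer, assuming the dimensions used in this notebook (5000 inputs, 2 outputs). The function name `count_parameters` is our own:

```python
def count_parameters(input_dim, hidden_dim, output_dim):
    """Weights + biases of the input->hidden and hidden->output linear layers."""
    first_layer = input_dim * hidden_dim + hidden_dim
    second_layer = hidden_dim * output_dim + output_dim
    return first_layer + second_layer

for hidden_dim in (4, 16, 128, 5000):
    print(hidden_dim, count_parameters(5000, hidden_dim, 2))
# 4 20014
# 16 80050
# 128 640386
# 5000 25015002
```

The count grows linearly with the hidden size, and almost all parameters sit in the first layer.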

---
### 3.4 Activation function
---
Try with Sigmoid, Tanh, ReLU

https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity
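
One possible way to switch between activation functions is a lookup table from a setting name to the PyTorch class. This is only a sketch: the dictionary `ACTIVATIONS` and its keys are our own naming, while `nn.Sigmoid`, `nn.Tanh` and `nn.ReLU` are the PyTorch modules:

```python
import torch
import torch.nn as nn

# Hypothetical lookup table: the keys are our own, the classes are PyTorch's
ACTIVATIONS = {"sigmoid": nn.Sigmoid, "tanh": nn.Tanh, "relu": nn.ReLU}

# Apply each activation to the same toy input to compare their ranges
x = torch.tensor([-2.0, 0.0, 2.0])
for name, act_class in ACTIVATIONS.items():
    print(name, act_class()(x))
```

Sigmoid outputs lie in (0, 1), Tanh outputs in (-1, 1), and ReLU sets negative inputs to 0 and leaves positive ones unchanged. In the model, the line `self.sigmoid = nn.Sigmoid()` could then be replaced by something like `self.act = ACTIVATIONS[act]()`, with `act` passed as an extra constructor argument.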

---
### 3.5 Optimizer
---
Try SGD, RMSProp, Adam

https://pytorch.org/docs/stable/optim.html#module-torch.optim
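
The same lookup idea can be sketched for optimizers. The helper `make_optimizer` and the dictionary `OPTIMIZERS` are hypothetical names; note that the PyTorch class is spelled `torch.optim.RMSprop`:

```python
import torch
import torch.nn as nn

# Hypothetical lookup table from a setting name to an optimizer class
OPTIMIZERS = {
    "sgd": torch.optim.SGD,
    "rmsprop": torch.optim.RMSprop,
    "adam": torch.optim.Adam,
}

def make_optimizer(name, parameters, lr):
    """Build the optimizer registered under `name` (helper name is our own)."""
    return OPTIMIZERS[name](parameters, lr=lr)

toy_model = nn.Linear(3, 2)  # stand-in for the notebook's model
optimizer = make_optimizer("adam", toy_model.parameters(), lr=0.001)
print(type(optimizer).__name__)  # Adam
```

A new optimizer must be created for each newly initialized model, since it holds references to that model's parameters.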

---
### 3.6 Learning rate
---
Try with 0.1, 0.5, 0.001.
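
To build intuition before running the real experiments, here is a toy sketch of plain gradient descent on the one-dimensional function f(w) = w², whose gradient is 2w. This is a deliberately simplified stand-in, not the network's loss, and `gradient_descent` is a hypothetical helper:

```python
def gradient_descent(lr, w=1.0, steps=5):
    """Minimize f(w) = w**2 (gradient 2*w) with a fixed learning rate."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

# The minimum is at w = 0; compare how close each rate gets in 5 steps
for lr in (0.001, 0.1, 0.5):
    print(lr, gradient_descent(lr))
```

With 0.001 the value barely moves from its starting point, with 0.1 it shrinks steadily, and with 0.5 it reaches the minimum in one step on this particular function. On a real loss surface the best rate is not known in advance, and a rate that is too large can make training diverge, which is why it has to be tuned on the development set.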

---
### 3.7 Number of hidden layers
---

Try with 1 and 2.
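
One way to make the number of hidden layers a setting is to build the layers in a loop and wrap them in `nn.Sequential`. The class below is a hypothetical variant of `FeedforwardNeuralNetModel` (the name `DeepFeedforwardModel` and the `num_hidden` argument are our own), sketched under the assumption that all hidden layers share the same size and use a sigmoid:

```python
import torch
import torch.nn as nn

class DeepFeedforwardModel(nn.Module):
    """Hypothetical variant with a configurable number of hidden layers."""
    def __init__(self, input_dim, hidden_dim, output_dim, num_hidden=2):
        super().__init__()
        layers = []
        in_dim = input_dim
        for _ in range(num_hidden):
            # Each hidden layer: linear map followed by a non-linearity
            layers += [nn.Linear(in_dim, hidden_dim), nn.Sigmoid()]
            in_dim = hidden_dim
        # Readout layer producing one logit per class
        layers.append(nn.Linear(in_dim, output_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

toy_model = DeepFeedforwardModel(input_dim=5000, hidden_dim=4, output_dim=2, num_hidden=2)
toy_batch = torch.zeros(3, 5000)  # 3 all-zero toy inputs, for a shape check only
print(tuple(toy_model(toy_batch).shape))  # (3, 2)
```

With `num_hidden=1` this has the same architecture as the model defined in section 1.3, so it can be passed to `train` and `evaluate` unchanged.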