In [2]:
import numpy as np
import torch
import torch.utils.data
import scipy
import torch.nn.functional as F
import datasets
from sklearn.metrics import accuracy_score
from sklearn.feature_extraction.text import TfidfVectorizer
datasets.logging.set_verbosity_error()



# Exercise: Shallow neural networks

In this exercise session, you will be defining and training a multiclass logistic regression model (a.k.a. softmax regression or multinomial logit) as a simple neural net in PyTorch. First, we will set up the model. Then, we will load the `tweet_eval` dataset and turn it into a Torch dataset (the type of data input that PyTorch can work with). Next, we will train our network on the data using a variant of gradient descent called Adam. Finally, we will evaluate the model's performance.

We will provide you with code snippets to help you get started. Where you see a `FILLINTHEBLANK` in the code, or a line that ends in `=`, that is where you complete the code to make it functional.

Note: for this notebook to display the figure in problem 5, you should also download the file `Net for DLI exercise.drawio.png` from Absalon and store it in the same folder as this notebook.

# 1. Define the PyTorch model

Finish the definition of this model.

1. In the `__init__` method of your model, you should define all the layers you are going to use. PyTorch provides a large number of commonly used layers that are very easy to use. Please refer to the [documentation of PyTorch](https://pytorch.org/docs/stable/nn.html) for a complete list of layers. Here, the `__init__` method should contain one layer that linearly combines the inputs, and then a softmax activation function, which will produce the outputs.
2. In the forward pass, the model should do the following:
 - compute the z values of the node (linear combinations of the inputs, using the [Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) neural net building block). There should be as many outputs as classes here (since the probability of each class gets to be based on its own linear combination of the features)
 - pass them through the [Softmax](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html) activation function with `dim=1` to normalize them (squeeze the probabilities of all output classes so that they sum to one)
 - return the outputs
3. Try initializing and inspecting the model as a toy_model instance. It should have 4 features and 3 classes.
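
To see what `dim=1` does in the softmax step, the sketch below pushes a random batch through a `Linear` layer followed by `Softmax(dim=1)` and checks that every row (one row per sample) sums to one. The batch of 5 random samples, the seed, and the variable names are arbitrary choices made for illustration; they are not part of the exercise.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

linear = torch.nn.Linear(4, 3)      # 4 input features -> 3 class scores (z-values)
softmax = torch.nn.Softmax(dim=1)   # normalize across the class dimension

batch = torch.randn(5, 4)           # 5 made-up samples with 4 features each
z = linear(batch)                   # shape (5, 3): one z-value per sample and class
probs = softmax(z)                  # shape (5, 3): each row is a probability distribution
row_sums = probs.sum(dim=1)         # approximately 1 for every sample

print(probs.shape)
print(row_sums)
```

With `dim=0` the normalization would instead run across the samples in the batch, which is not what we want for per-sample class probabilities.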

In [3]:

class LogisticRegression(torch.nn.Module): # With PyTorch, creating a new model means creating a new class.

    # initializing the model with a certain number of input features
    # and output classes
    def __init__(self, n_features, n_classes):
        super().__init__()

        # creating a layer that applies a linear transformation
        # to the incoming data: z = wT * x (here x is input tensor and z will be output)
        self.linear = torch.nn.Linear(n_features, n_classes) # layer applying linear transformation to input

        # creating a softmax activation function
        self.activ = torch.nn.Softmax(dim=1) # Creating a softmax-activation - normalizing output to have sum=1

    # you have to define the forward() method which will specify the forward propagation:
    # how the input values get to the next layer(s)
    def forward(self, inputs):

        # compute the linear z-values for the layer
        z = self.linear(inputs) # z-values are computed using the linear transformation on the input

        # pass them through the softmax activation function
        outputs = self.activ(z) # Outputs are computed using the softmax activation on the z-values

        return outputs

This model is a logistic regression model that takes in input data, applies a linear transformation, and passes it through a softmax activation function to predict the probability of an instance belonging to a class.

In [4]:
# Setting number of input features and classes
n_features=4
n_classes=3

# Initializing a model referring to the class LogisticRegression defined above.
toy_model = LogisticRegression(n_features,n_classes)

In [5]:
# Inspecting the toy_model

print('The definition of the toy_model is: \n \n',toy_model) # Printing the definition of the model

toy_model_dict=toy_model.state_dict() # creating a dictionary containing names and values of model parameters

print('\nThe parameters of the model: ')
for param in toy_model_dict.keys():
    print('\nThe parameter: {} has shape: {}'.format(param,toy_model_dict[param].shape))

The definition of the toy_model is: 
 
 LogisticRegression(
  (linear): Linear(in_features=4, out_features=3, bias=True)
  (activ): Softmax(dim=1)
)

The parameters of the model: 

The parameter: linear.weight has shape: torch.Size([3, 4])

The parameter: linear.bias has shape: torch.Size([3])


# 2. Prepare the *Tweet_eval* dataset

This is a dataset of tweets that are hand-coded as negative, neutral, or positive (labels 0, 1, and 2, respectively).

## Load the data set

1. Have a look at the paper by Rosenthal et al. below. How were the tweets selected? How were they annotated?
2. Load the dataset (train and validation splits) from the huggingface library.
3. Have a look at the dataset on [huggingface's website](https://huggingface.co/datasets/tweet_eval/viewer/sentiment/train), to get a sense of what's in it.  Also have a look at the size of the training and validation sets, and do a count of the number of training samples with each label, to see whether the data is balanced (all classes represented evenly) or unbalanced.

The sentiment task is described in this paper:

> Sara Rosenthal, Noura Farra, and Preslav Nakov. 2017. SemEval-2017 Task 4: Sentiment Analysis in Twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 502–518, Vancouver, Canada. Association for Computational Linguistics. [https://aclanthology.org/S17-2088/](https://aclanthology.org/S17-2088/)



**Selection and annotation of tweets**

English-language tweets were collected if their content was related to popular current events that were trending on Twitter. 

Tweets were annotated for sentiment on 2-point, 3-point and 5-point scales. Annotators labeled both the overall sentiment of a tweet and its sentiment towards a topic. All tweet annotations were performed through CrowdFlower.

The data on Huggingface contains strings from tweets and sentiment scores on a 3-point scale.

In [6]:
# Loading training and validation data
train = datasets.load_dataset('tweet_eval', 'sentiment', split='train')
val = datasets.load_dataset('tweet_eval', 'sentiment', split='validation')

In [7]:
# Inspecting the size of the loaded data:
print('Number of tweets in training data:' ,len(train))
print('Number of tweets in validation data:', len(val))

# Checking whether the data is balanced by counting the labels of each class (training and validation sets combined)
print('\nNumber of tweets labeled with 0: ',(val['label']+train['label']).count(0)) 
print('Number of tweets labeled with 1: ',(val['label']+train['label']).count(1))
print('Number of tweets labeled with 2: ',(val['label']+train['label']).count(2)) 

Number of tweets in training data: 45615
Number of tweets in validation data: 2000

Number of tweets labeled with 0:  7405
Number of tweets labeled with 1:  21542
Number of tweets labeled with 2:  18668
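
The counts above are easier to judge as shares. The sketch below reuses the printed counts (which combine the training and validation sets) and also computes how often a trivial classifier that always predicts the largest class would be right on this combined data. The variable names are made up for illustration.

```python
# Label counts copied from the output above (training and validation sets combined)
label_counts = {0: 7405, 1: 21542, 2: 18668}

total = sum(label_counts.values())
shares = {label: count / total for label, count in label_counts.items()}
majority_label = max(label_counts, key=label_counts.get)

for label, share in shares.items():
    print(f"label {label}: {share:.1%}")
print(f"always predicting label {majority_label} would be right {shares[majority_label]:.1%} of the time")
```

Label 0 makes up well under a fifth of the tweets, so the data is clearly unbalanced.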


## Turn it into Torch tensors

1. Vectorize the tweet texts with the TF-IDF vectorizer from sklearn.
2. Convert this data to Torch tensors. Note that the output of the sklearn vectorizer is not a numpy array but a scipy sparse matrix, which can be converted with its `toarray()` method. Next, you can convert the result to a torch tensor using the torch [from_numpy](https://pytorch.org/docs/stable/generated/torch.from_numpy.html) function. Finally, you will need to convert the feature tensors to float type with the `float()` method.

Labels are lists, and so can be converted to numpy arrays with `np.array(mylist)`. Then we also need to turn those into torch tensors.

If your computer is struggling with the conversion, simply reduce the amount of training data to a slice (e.g. first 10K examples).


In [8]:
# Vectorizing the tweet texts with TF-IDF
vectorizer = TfidfVectorizer()  # the default ngram range is (1,1)

train_corpus = train["text"][:10000] # creating training corpus and subsetting to only 10k tweets
train_labels = train["label"][:10000] # Defining labels for subset of 10K tweets
train_features = vectorizer.fit_transform(train_corpus) # Vectorizing text and learning features for training data 

val_corpus = val["text"] # Creating corpus and labels for validation set
val_labels = val["label"]
val_features = vectorizer.transform(val_corpus)# vectorizing text in validation set 
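
The difference between `fit_transform` and `transform` can be seen on a tiny made-up corpus. In this sketch (the texts and variable names are invented for illustration), the vocabulary is learned from the training texts only, so a word that appears only in the validation text is ignored and both matrices end up with the same number of columns.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

toy_train = ["good movie", "bad movie"]   # made-up training texts
toy_val = ["good film"]                   # "film" never occurs in the training texts

toy_vectorizer = TfidfVectorizer()
toy_train_features = toy_vectorizer.fit_transform(toy_train)  # learns vocabulary and IDF weights
toy_val_features = toy_vectorizer.transform(toy_val)          # reuses them, ignoring unseen words

print(sorted(toy_vectorizer.vocabulary_))   # ['bad', 'good', 'movie']
print(toy_train_features.shape, toy_val_features.shape)
```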

In [9]:
# Converting the vectorized features and the labels to Torch tensors

x_train = torch.from_numpy(train_features.toarray()).float() # toarray() makes features np.arrays and torch.from_numpy makes these torch.tensors
y_train = torch.from_numpy(np.array(train_labels))
x_test = torch.from_numpy(val_features.toarray()).float()
y_test = torch.from_numpy(np.array(val_labels))

print(x_train.size(), x_test.size()) # Feature shapes: (number of tweets, vocabulary size) for the train and validation data
print(y_train.size(), y_test.size()) # Both feature matrices have the same number of columns because the vectorizer was fitted on the train data only

torch.Size([10000, 18484]) torch.Size([2000, 18484])
torch.Size([10000]) torch.Size([2000])
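
The shapes above also explain why the dense conversion can be heavy. A rough estimate, assuming the TF-IDF values use the vectorizer's default 64-bit floats before `.float()` turns them into 32-bit floats:

```python
n_rows, n_cols = 10000, 18484          # shape of x_train printed above

bytes_float64 = n_rows * n_cols * 8    # dense array returned by toarray()
bytes_float32 = n_rows * n_cols * 4    # tensor after .float()

print(f"float64 array: {bytes_float64 / 1e9:.2f} GB")
print(f"float32 tensor: {bytes_float32 / 1e9:.2f} GB")
```

Almost all of these entries are zeros, since each tweet contains only a handful of the vocabulary's words. Using more training tweets would increase both the number of rows and, typically, the vocabulary size.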


## Turn the data into Torch Datasets

We're still not done with the data preparation! The canonical way to handle data in PyTorch is with objects of the [Dataset](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.Dataset) class. More specifically, we are going to define our own subclass `TweetEvalData` that is a special case of the `Dataset` class. There is an official PyTorch [Data Loading and Processing Tutorial](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html).

1. You are provided with the code for the class `TweetEvalData` (a subclass of the torch Dataset class) that can contain our `tweet_eval` data. Fill in the parts of the new class definition where you provide the features and labels. Next, you will need to complete the code that creates two instances of that class: one for the train data and one for the validation data. The features and labels must come in the form of torch tensors. Luckily, you have just done that in the previous step!

In [10]:
# Making a subclass of the torch Datasets class
# You don't need to modify this code; just take a look at
# what it does and run it

class TweetEvalData(torch.utils.data.Dataset): # initiating a class for representing data with three methods

    def __init__(self, X, y): # describes how the dataset is initialized; the arguments (when initializing) are the features and labels
        self.X = X #any instance of the TweetEvalData class will have an attribute X that contains its features 
        self.y = y #same with an attribute y that contains its labels

    def __getitem__(self, index): # getitem allows to retrieve a datapoint from the dataset by its index.
        X = self.X[index] 
        y = self.y[index].unsqueeze(0) #tensor is unsqueezed to ensure correct shape for training
        return X, y # returns the data point at the given index as a pair of tensors (features, label)

    # a helper to check and return the size of the dataset
    def __len__(self):
        return len(self.y) # Returning the number of labels in the data


In [11]:
# Initializing datasets

dataset_class_train = TweetEvalData(x_train, y_train) # initiating an instance of the class that contains the train data
dataset_class_val = TweetEvalData(x_test, y_test) # initiating an instance of the class that contains the validation data

2. Finally, we need so-called Dataloaders for each of these datasets. As you will see in the example code later on, a Dataloader allows us to easily iterate over samples of data and corresponding labels during training of our model. Create Dataloaders for the train and validation sets using `torch.utils.data.DataLoader`, with `batch_size` 64. Batch size is the number of samples that are processed before the model is updated during training; each step of our gradient descent is based on one batch of data.

In [12]:
# Initiating dataloaders 

train_loader = torch.utils.data.DataLoader(dataset_class_train, batch_size = 64)
val_loader = torch.utils.data.DataLoader(dataset_class_val, batch_size = 64)
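
With a batch size of 64, the number of optimization steps per epoch follows from simple arithmetic. The helper below is a hypothetical illustration (not a PyTorch function) and assumes the DataLoader default of keeping the last, smaller batch.

```python
import math

def batch_layout(n_samples, batch_size):
    """Return (number of batches, size of the last batch) when no samples are dropped."""
    n_batches = math.ceil(n_samples / batch_size)
    last_batch = n_samples - (n_batches - 1) * batch_size
    return n_batches, last_batch

print(batch_layout(10000, 64))  # (157, 16)
print(batch_layout(2000, 64))   # (32, 16)
```

So one epoch over the 10,000 training tweets corresponds to 157 gradient steps.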

# 3. Let's train our neural network!

1. Create an instance of our LogisticRegression. The input feature size should correspond to the size of tfidf vectors. We still have 3-class classification.
2. Choose a loss function ([CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html)) and an optimizer, a.k.a. a gradient descent algorithm ([Adam optimizer](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) with learning rate 0.001)
3. Complete and run the provided code for training the model across 5 epochs. Is your loss going down?

In [13]:
# Instantiating model to take input vectors with correct size
model = LogisticRegression(x_train.shape[1], 3) # n_features = TF-IDF vocabulary size (18484 here), n_classes = 3 labels on the 3-point sentiment scale
model

LogisticRegression(
  (linear): Linear(in_features=18484, out_features=3, bias=True)
  (activ): Softmax(dim=1)
)

In [14]:
# Defining loss function and optimizer

loss_function = torch.nn.CrossEntropyLoss() # loss function measuring how far the model outputs are from the true labels
optimizer = torch.optim.Adam(model.parameters(), lr=0.001) # optimizer to update weights and biases 
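
One caveat worth knowing: the PyTorch documentation describes the input to `CrossEntropyLoss` as unnormalized logits, because the loss applies a log-softmax internally. Since our model already ends in a softmax, the normalization is effectively applied twice. The plain-Python sketch below (helper names are made up for illustration) shows the consequence for three classes: even a perfectly confident, correct output cannot push the loss below roughly 0.55, and a uniform output gives a loss of about 1.10.

```python
import math

def softmax(values):
    """Plain-Python softmax of a list of numbers."""
    shifted = [v - max(values) for v in values]
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Per-sample cross-entropy on raw scores: -log(softmax(scores)[target])."""
    return -math.log(softmax(scores)[target])

perfect_probs = [1.0, 0.0, 0.0]          # model output after its own softmax, best case
uniform_probs = [1 / 3, 1 / 3, 1 / 3]    # roughly what an untrained model outputs

floor_loss = cross_entropy(perfect_probs, target=0)
start_loss = cross_entropy(uniform_probs, target=0)
print(round(floor_loss, 4))  # 0.5514
print(round(start_loss, 4))  # 1.0986
```

The exercise still works with this setup, but the loss values are compressed into a narrow range. A common alternative, not required here, is to return the z-values from `forward` and apply softmax only when probabilities are needed.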

In [15]:
# Training the model

for epoch in range(5): # epochs are the number of complete passes through the training dataset 
    losses = [] # storing the loss of each batch in this epoch
    for batch_index, (inputs, targets) in enumerate(train_loader): # looping over elements in train_loader

        # zero the gradients that are stored from the previous optimization step
        optimizer.zero_grad()
        
        # get the model outputs and the original target (true) labels
        outputs = model(inputs) # compute the outputs of the model
        targets = torch.flatten(targets) # converting target tensor from shape (batch_size,1) to (batch_size)
        targets = targets.type(torch.LongTensor) # Converting to torch.LongTensor as required for loss function
    
        # compute the loss here
        loss = loss_function(outputs,targets) #loss function compares model output and the true labels

        # back-propagate
        loss.backward()

        # perform the optimization step
        optimizer.step()
        
        #add this batch's loss to the losses for this epoch
        losses.append(loss.item())
        
    print(f'Epoch {epoch}: loss {np.mean(losses)}')

Epoch 0: loss 1.0828479057664324
Epoch 1: loss 1.0539012995495158
Epoch 2: loss 1.032682115105307
Epoch 3: loss 1.0158119463616875
Epoch 4: loss 1.0016070854891637


# 4. Evaluating the trained model

1. Complete the following code to evaluate the model on the `tweet_eval` validation set. We are looping over batches (subsets) of the validation data using our validation loader, getting the most-probable category for each prediction, and adding them to a list. Then we compare them to the true categories.

In [16]:
predictions = []

with torch.no_grad(): # for evaluation we don't backpropagate and update weights anymore
    for batch_index, (inputs, targets) in enumerate(val_loader):
        outputs = model(inputs) # Compute model outputs
        # getting the index of the highest output probability, which corresponds to the predicted class (as labels 0, 1, 2)
        vals, indices = torch.max(outputs, 1)
        # accumulating the predictions
        predictions += indices.tolist()

# compute accuracy with sklearn accuracy_score, which takes the true labels first and the predictions second
acc = accuracy_score(val_labels, predictions)
print(f'Model accuracy: {acc}')

Model accuracy: 0.558
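
The `torch.max(outputs, 1)` call in the evaluation loop returns two tensors: the largest value in each row and the position where it occurs. A minimal sketch with made-up class probabilities for three samples:

```python
import torch

# Made-up model outputs: one row per sample, one column per class
toy_outputs = torch.tensor([[0.20, 0.50, 0.30],
                            [0.10, 0.30, 0.60],
                            [0.40, 0.35, 0.25]])

vals, indices = torch.max(toy_outputs, 1)   # maximum over dimension 1 (the classes)
predicted = indices.tolist()
print(predicted)  # [1, 2, 0]
```

Only `indices` is needed for the predictions; `toy_outputs.argmax(dim=1)` gives the same result.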


2. When we train a model on an unbalanced dataset, it sometimes does not learn enough to predict the less-represented categories. Instead, it will (almost) always guess one of the larger classes, as this is a "safer bet" in the absence of enough information. Check whether, in our case, the model managed to produce predictions of all classes.

In [17]:
# Checking whether the model is able to predict all three classes
set(predictions)

{1, 2}
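
A class that is never predicted has a recall of zero, no matter how good the overall accuracy looks. The sketch below illustrates this with hypothetical labels and predictions (made up for illustration, not taken from the validation set); the same computation could be applied to `val_labels` and `predictions`.

```python
from collections import Counter

# Hypothetical labels and predictions, made up for illustration only
true_labels = [0, 0, 1, 1, 1, 2, 2, 2]
predicted_labels = [1, 2, 1, 1, 2, 2, 2, 1]

true_counts = Counter(true_labels)
correct_counts = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)

# Recall per class: share of the true examples of that class that were predicted correctly
recall = {label: correct_counts[label] / true_counts[label] for label in sorted(true_counts)}
missing = sorted(set(true_labels) - set(predicted_labels))

print(recall)
print(missing)  # [0]
```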

# 5. Optional: a taste of deep neural nets

Next week in lecture, we will see that deep neural nets are nothing more than neural nets with more than one layer. The layers are stacked on top of each other: the outputs of each layer feed into the next one, and only the last layer produces the prediction. Each layer first combines the outputs from the previous layer (a.k.a. activations) linearly using weights, and then applies an activation function.

The neural net we are implementing here looks like this:

<img src="Net for DLI exercise.drawio.png"> 

1. Finish the definition of this model. The `__init__` method should contain:
- one hidden layer with sigmoid activation functions (this is the "hidden" layer of the model, between inputs and outputs, and the number of nodes in it is a variable `hidden_size` that we should be able to set when we initialize the model)
- one softmax output layer.
2. In the forward pass, the model should do the following:
 - compute the z values for the hidden layer (linear combinations of the inputs, using the [Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) neural net building block)
 - pass them through the [sigmoid](https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html) activation function
 - compute the z values for the output layer (linear combinations of the hidden layer outputs, a.k.a. its activations)
 - pass them through the Softmax activation function
3. Try initializing and inspecting the model as a toy_model instance. It should have 4 features, hidden_size 8, and 3 classes.
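
The forward-pass steps listed above can be traced by hand for a single sample. The plain-Python sketch below uses hypothetical toy sizes and weights (2 inputs, 2 hidden nodes, 3 classes), chosen only to keep the numbers readable; they do not correspond to the figure or to the model you are asked to build.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(values):
    exps = [math.exp(v - max(values)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, biases, inputs):
    """One z-value per row of weights: z_j = sum_i w_ji * x_i + b_j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Hypothetical toy parameters: 2 inputs -> 2 hidden nodes -> 3 classes
w1, b1 = [[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1]
w2, b2 = [[0.3, -0.1], [0.2, 0.2], [-0.4, 0.5]], [0.0, 0.0, 0.0]

x = [1.0, 2.0]
a1 = [sigmoid(z) for z in linear(w1, b1, x)]   # hidden-layer activations
outputs = softmax(linear(w2, b2, a1))          # class probabilities

print(a1)
print(outputs)
```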

In [18]:

class SimpleNN(torch.nn.Module):

    # initializing the model with a certain number of input features
    # output classes, and size of hidden layer
    def __init__(self, n_features, hidden_size, n_classes):
        super().__init__()

        # creating one hidden layer h1
        # that first applies a linear transformation to the incoming data: z1 = w1T * x
        self.h1_linear = torch.nn.Linear(n_features, hidden_size)

        # creating the sigmoid activation function for the hidden layer
        self.h1_activ = torch.nn.Sigmoid()

        # creating one output layer
        # that again applies a linear transformation to the incoming activations: z2 = w2T * a1
        self.output_linear = torch.nn.Linear(hidden_size,n_classes)
        
        # creating a softmax activation function for the output layer
        self.output_activ = torch.nn.Softmax(dim=1)

    # you have to define the forward() method which will specify the forward propagation:
    # how the input values get to the next layer(s)
    def forward(self, inputs):

        # compute the z-values of the hidden layer
        z1 = self.h1_linear(inputs)

        # pass them through the sigmoid activation function
        a1 = self.h1_activ(z1)

        # compute the z-values of the output layer
        z2 = self.output_linear(a1)
        
        # pass them through the softmax activation function to get the final values
        outputs = self.output_activ(z2)

        return outputs

In [None]:
# Initiating model
toymodel = SimpleNN(4,8,3)

# Inspecting the toy_model
print('The definition of the toy_model is: \n \n',toymodel) # Printing the definition of the model - note that there is a bias in each linear layer

toy_model_dict=toymodel.state_dict() # creating a dictionary containing names and values of model parameters

print('\nThe parameters of the model: ')
for param in toy_model_dict.keys():
    print('\nThe parameter: {} has shape: {}'.format(param,toy_model_dict[param].shape))


The definition of the toy_model is: 
 
 SimpleNN(
  (h1_linear): Linear(in_features=4, out_features=8, bias=True)
  (h1_activ): Sigmoid()
  (output_linear): Linear(in_features=8, out_features=3, bias=True)
  (output_activ): Softmax(dim=1)
)

The parameters of the model: 

The parameter: h1_linear.weight has shape: torch.Size([8, 4])

The parameter: h1_linear.bias has shape: torch.Size([8])

The parameter: output_linear.weight has shape: torch.Size([3, 8])

The parameter: output_linear.bias has shape: torch.Size([3])
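
The shapes printed above determine how many trainable parameters the network has. As a rough check, the sketch below counts them with a hypothetical helper (not a PyTorch function) and compares the result with the single-layer toy model from section 1, which had 4 features and 3 classes.

```python
def linear_param_count(n_in, n_out):
    """Weights (n_out x n_in) plus one bias per output node."""
    return n_in * n_out + n_out

hidden_params = linear_param_count(4, 8)    # 8*4 weights + 8 biases
output_params = linear_param_count(8, 3)    # 3*8 weights + 3 biases
simple_nn_params = hidden_params + output_params

logistic_params = linear_param_count(4, 3)  # toy logistic regression from section 1

print(simple_nn_params)  # 67
print(logistic_params)   # 15
```

The sigmoid and softmax activations add no parameters of their own, which is why they do not appear in the `state_dict`.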
