# LELA 60342 Research Methods in Computational and Corpus Linguistics

### Week 5

Today we will continue to expand our knowledge of PyTorch for implementing and training models.

At the end of week 2's session you were asked to implement logistic regression for some simulated data, making use of PyTorch's backward step (loss.backward()) and the appropriate PyTorch loss function. Your code should have looked something like this:

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
## Create simulated data
np.random.seed(10)
w1_center = (2, 3)
w2_center = (3, 2)
batch_size=50

x = np.zeros((batch_size, 2))
y = np.zeros(batch_size)
for i in range(batch_size):
    if np.random.random() > 0.5:
        x[i] = np.random.normal(loc=w1_center)
    else:
        x[i] = np.random.normal(loc=w2_center)
        y[i] = 1
x=np.insert(x, 2, 1,axis=1)
x=torch.tensor(x,dtype=torch.float32)
y=torch.tensor(y,dtype=torch.float32)

In [None]:
num_features=3
weights = torch.randn(num_features, requires_grad=True)
n_iters = 5000
num_samples = len(y)
lr=0.001
logistic_loss=[]
loss_fn = torch.nn.BCELoss()

for i in range(n_iters):
    z=torch.mv(x,weights)
    q=torch.sigmoid(z)

    loss=loss_fn(q,y)
    logistic_loss.append(loss.item())
    loss.backward()

    dw1 =  weights.grad[0]
    dw2 =  weights.grad[1]
    db =   weights.grad[2]
    with torch.no_grad():
      weights[0] = weights[0] - lr * dw1
      weights[1] = weights[1] - lr * dw2
      weights[2] = weights[2] - lr * db
      weights.grad=None


plt.plot(range(1,n_iters),logistic_loss[1:])
plt.xlabel("number of epochs")
plt.ylabel("loss")
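
As a sanity check on what loss.backward() computes, the sketch below compares the gradient stored in .grad with the closed-form gradient of the mean binary cross-entropy loss for logistic regression, $X^\top(q - y)/N$. The small random tensors (x_demo, y_demo, w_demo) are made up for illustration and are not the simulated data above.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy data: 8 samples, 3 features (last column is a constant bias input)
x_demo = torch.randn(8, 3)
x_demo[:, 2] = 1.0
y_demo = torch.randint(0, 2, (8,)).float()
w_demo = torch.randn(3, requires_grad=True)

# Forward pass and loss, as in the training loop above
q_demo = torch.sigmoid(torch.mv(x_demo, w_demo))
loss_demo = torch.nn.BCELoss()(q_demo, y_demo)
loss_demo.backward()

# Closed-form gradient of the mean BCE loss with respect to the weights
with torch.no_grad():
    manual_grad = torch.mv(x_demo.t(), q_demo - y_demo) / len(y_demo)

# Should print True if autograd and the closed form agree
print(torch.allclose(w_demo.grad, manual_grad, atol=1e-5))
```

If the two agree, the manual update in the loop above is ordinary gradient descent on the logistic loss.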

Today we are going to switch to using the full resources of PyTorch in order to implement model training. First it will be useful to learn about a resource in scikit-learn for forming batches:

### Forming batches using scikit-learn

As we have discussed previously, model training is often improved by splitting the data up into batches. scikit-learn provides a function, gen_batches, that is useful for this.

In [None]:
from sklearn.utils import gen_batches
batches = gen_batches(len(x),10)

In [None]:
for s in batches:
     print(x[s])
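
To make clearer what gen_batches returns, here is a minimal sketch on a toy list (the names demo_slices and demo_batches are illustrative only). It yields slice objects rather than the data itself, and the last slice is shorter when the data size is not a multiple of the batch size.

```python
from sklearn.utils import gen_batches

# gen_batches(n, batch_size) yields slice objects that together cover range(n)
demo_slices = list(gen_batches(7, 3))
print(demo_slices)

# Each slice can be used to index a list, array or tensor
data = list(range(7))
demo_batches = [data[s] for s in demo_slices]
print(demo_batches)
```

Note that gen_batches returns a generator, which is exhausted after one pass. This is why the training loops below call it again at the start of every epoch. It also does not shuffle: the batches follow the order of the data.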

In [None]:
num_features=3
weights = torch.randn(num_features, requires_grad=True)
n_iters = 5000
num_samples = len(y)
lr=0.001
logistic_loss=[]
loss_fn = torch.nn.BCELoss()

for i in range(n_iters):
    batches = gen_batches(len(x),10)
    cumul_loss=0.0
    for j in batches:
        inputs = x[j]
        outputs = y[j]
        z=torch.mv(inputs,weights)
        q=torch.sigmoid(z)

        loss=loss_fn(q,outputs)
        cumul_loss += loss.item()
        loss.backward()

        dw1 =  weights.grad[0]
        dw2 =  weights.grad[1]
        db =   weights.grad[2]
        with torch.no_grad():
          weights[0] = weights[0] - lr * dw1
          weights[1] = weights[1] - lr * dw2
          weights[2] = weights[2] - lr * db
          weights.grad=None
    # The batch loss was already added to cumul_loss above, so just record the epoch total
    logistic_loss.append(cumul_loss)

plt.plot(range(1,n_iters),logistic_loss[1:])
plt.xlabel("number of epochs")
plt.ylabel("loss")

### Using torch.nn.Module to define models

torch.nn is a subpackage containing all sorts of tools for working with neural networks. One particularly valuable component is nn.Module, which we can use as a base class for specifying all of our models.

In [None]:
import torch.nn as nn
class logistic_regression(nn.Module):

  def __init__(self):
    super(logistic_regression, self).__init__()
    self.layer_1 = nn.Linear(3,1)

  def forward(self, x):
    return(torch.sigmoid(self.layer_1(x)))
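
The sketch below illustrates two things nn.Module does for us: it registers the parameters of any layer assigned as an attribute, and it runs forward() when the model is called like a function. The class name TinyLogisticRegression and the random input are hypothetical. The class mirrors the one above but takes the number of features as an argument.

```python
import torch
import torch.nn as nn

class TinyLogisticRegression(nn.Module):  # hypothetical name, mirrors the class above
    def __init__(self, num_features):
        super().__init__()
        self.layer_1 = nn.Linear(num_features, 1)

    def forward(self, x):
        return torch.sigmoid(self.layer_1(x))

demo_model = TinyLogisticRegression(3)

# Parameters of layer_1 are registered automatically: a (1, 3) weight and a (1,) bias
param_shapes = {name: tuple(p.shape) for name, p in demo_model.named_parameters()}
print(param_shapes)

# Calling the model runs forward(); 5 inputs give a (5, 1) tensor of probabilities
demo_out = demo_model(torch.randn(5, 3))
print(demo_out.shape)
```

The output has shape (5, 1) rather than (5,), which is why the training code below applies torch.squeeze to the predictions before passing them to the loss function.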

### Using torch.optim


So far we have been updating our model weights (optimizing our models) manually. This becomes fiddly when we work with anything but toy models. Fortunately, PyTorch provides a package, torch.optim, that implements optimization for us (https://pytorch.org/docs/stable/optim.html).

In [None]:
n_iters = 1000
logistic_loss=[]
model = logistic_regression()
loss_fn = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(),lr=0.01)
for i in range(n_iters):
    batches = gen_batches(len(x),10)
    cumul_loss = 0.0
    for j in batches:
        inputs=x[j]
        labels=y[j]
        optimizer.zero_grad()
        pred=model(inputs)
        loss=loss_fn(torch.squeeze(pred),labels)
        loss.backward()
        optimizer.step()
        cumul_loss += loss.item()
    logistic_loss.append(cumul_loss)

plt.plot(range(1,n_iters),logistic_loss[1:])
plt.xlabel("number of epochs")
plt.ylabel("loss")
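
To connect the optimizer to the manual updates used earlier, the following sketch checks that a single optimizer.step() with plain torch.optim.SGD (no momentum or weight decay) gives the same result as w - lr * w.grad. The toy tensors are made up for illustration.

```python
import torch

torch.manual_seed(0)
lr = 0.01  # same learning rate as the cell above

# Hypothetical toy weights and data
w = torch.randn(3, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr)
x_demo = torch.randn(10, 3)
y_demo = torch.randint(0, 2, (10,)).float()

optimizer.zero_grad()  # clear any old gradients (replaces weights.grad=None)
loss = torch.nn.BCELoss()(torch.sigmoid(torch.mv(x_demo, w)), y_demo)
loss.backward()

# What the manual update from the earlier cells would produce
expected = (w - lr * w.grad).detach().clone()

optimizer.step()  # in-place update of w
print(torch.allclose(w.detach(), expected))
```

Other optimizers in torch.optim use different update rules, but they are called through the same zero_grad() / backward() / step() pattern.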

### Using CUDA

In [None]:
n_iters = 500
logistic_loss=[]
model = logistic_regression()
model.to("cuda")
loss_fn = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(),lr=0.01)
for i in range(n_iters):
    batches = gen_batches(len(x),10)
    cumul_loss = 0.0
    for j in batches:
        inputs=x[j].to("cuda")
        labels=y[j].to("cuda")
        optimizer.zero_grad()
        pred=model(inputs)
        loss=loss_fn(torch.squeeze(pred),labels)
        loss.backward()
        optimizer.step()
        cumul_loss += loss.item()
    logistic_loss.append(cumul_loss)

plt.plot(range(1,n_iters),logistic_loss[1:])
plt.xlabel("number of epochs")
plt.ylabel("loss")
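
The cell above assumes a CUDA device is present and will raise an error on a machine without one. A common pattern, sketched here with a stand-in nn.Linear model and random inputs, is to choose the device once and fall back to the CPU when CUDA is unavailable.

```python
import torch
import torch.nn as nn

# Fall back to the CPU when no CUDA device is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

demo_model = nn.Linear(3, 1).to(device)      # stand-in for the model
demo_inputs = torch.randn(4, 3).to(device)   # stand-in for a batch

# Model parameters and inputs must live on the same device
demo_pred = torch.sigmoid(demo_model(demo_inputs))
print(demo_pred.device)

# Move results back to the CPU before converting to NumPy or plotting
demo_pred_cpu = demo_pred.detach().cpu()
```

Note that loss.item() already returns a plain Python number, so the recorded losses can be plotted without any further transfer.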

### Building and testing a Sentiment Classifier in PyTorch

In [None]:
!wget https://raw.githubusercontent.com/cbannard/lela60342/refs/heads/main/yelp_reviews_oh.csv.gz
!gunzip yelp_reviews_oh.csv.gz
import pandas as pd
reviews_df_oh=pd.read_csv("yelp_reviews_oh.csv")
reviews_oh=torch.tensor(reviews_df_oh.iloc[:,2:5002].values,dtype=torch.float32)
labels=torch.tensor((reviews_df_oh.iloc[:,1] == "positive").astype(int),dtype=torch.float32)


In [None]:
from sklearn.model_selection import train_test_split
reviews_train, reviews_test, labels_train, labels_test = train_test_split(reviews_oh, labels, test_size=0.2, random_state=30)

Problem 1: Implement a logistic regression sentiment classifier using PyTorch. Use batch training. Use the GPU via CUDA.

Once you have trained your model you can calculate its precision and recall on the test data with the following lines:

In [None]:
reviews_test=reviews_test.to("cuda")
labels_test=labels_test.to("cuda")

labels_pred=(model(reviews_test) > 0.5).int().flatten().tolist()
TPs=sum([int(s == 1 and labels_test[i].item() == 1) for i,s in enumerate(labels_pred)])
FPs=sum([int(s == 1 and labels_test[i].item() == 0) for i,s in enumerate(labels_pred)])
FNs=sum([int(s == 0 and labels_test[i].item() == 1) for i,s in enumerate(labels_pred)])
precision=TPs/(TPs+FPs)
recall=TPs/(TPs+FNs)
print(precision)
print(recall)
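
The same counts can also be computed with tensor operations rather than a Python loop, which avoids calling .item() once per example. The sketch below uses small made-up prediction and label tensors (demo_pred, demo_true) and adds a guard against division by zero, which the cell above does not have.

```python
import torch

# Hypothetical predictions and gold labels, for illustration only
demo_pred = torch.tensor([1, 1, 0, 0, 1])
demo_true = torch.tensor([1, 0, 1, 0, 1])

# Count true positives, false positives and false negatives with boolean masks
tp = ((demo_pred == 1) & (demo_true == 1)).sum().item()
fp = ((demo_pred == 1) & (demo_true == 0)).sum().item()
fn = ((demo_pred == 0) & (demo_true == 1)).sum().item()

# Guard against division by zero when a class is never predicted or never occurs
demo_precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
demo_recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
print(demo_precision, demo_recall)
```

Here there are 2 true positives, 1 false positive and 1 false negative, so both precision and recall are 2/3.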

In order to check for overfitting it is also sometimes useful to compare performance on the test data to performance on the training data. An overfit model will perform near perfectly on the training data but poorly on the test data.

In [None]:
reviews_train=reviews_train.to("cuda")
labels_train=labels_train.to("cuda")

train_labels_pred=(model(reviews_train) > 0.5).int().flatten().tolist()
TPs=sum([int(s == 1 and labels_train[i].item() == 1) for i,s in enumerate(train_labels_pred)])
FPs=sum([int(s == 1 and labels_train[i].item() == 0) for i,s in enumerate(train_labels_pred)])
FNs=sum([int(s == 0 and labels_train[i].item() == 1) for i,s in enumerate(train_labels_pred)])
precision=TPs/(TPs+FPs)
recall=TPs/(TPs+FNs)
print(precision)
print(recall)

### Building more complex models

The great advantage of using PyTorch like this is that you can easily add greater complexity to your model without having to worry about implementing all the fiddly details.

### Adding layers

You can add multiple layers to your network. You just need to make sure that the input size of each layer matches the output size of the preceding layer.

In [None]:
import torch.nn as nn

class multilayer_network(nn.Module):

  def __init__(self):
    super(multilayer_network, self).__init__()
    self.layer_1 = nn.Linear(5000,250)
    self.relu = nn.ReLU()
    self.layer_2 = nn.Linear(250,1)

  def forward(self, x):
    x = self.relu(self.layer_1(x))
    return(torch.sigmoid(self.layer_2(x)))
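
To see how the shapes have to line up, this sketch passes a made-up batch of 4 inputs through two standalone layers of the same sizes as above and prints the shape at each stage.

```python
import torch
import torch.nn as nn

layer_1 = nn.Linear(5000, 250)   # expects 5000 input features, produces 250
layer_2 = nn.Linear(250, 1)      # must accept the 250 features produced by layer_1

demo_batch = torch.randn(4, 5000)          # 4 hypothetical reviews
hidden = torch.relu(layer_1(demo_batch))   # shape (4, 250)
output = torch.sigmoid(layer_2(hidden))    # shape (4, 1)
print(hidden.shape, output.shape)
```

If the second layer were declared with a different input size, say nn.Linear(200, 1), the call to layer_2 would raise a shape mismatch error.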


Problem 2: Train the model above on the sentiment classification task. This should require minimal changes over the training code you used for problem 1. Examine its performance on the test data and the training data.

Problem 3: Add a second hidden layer to the model above. Train and test this model.

### Dropout

Once your model starts to get bigger, overfitting becomes a substantial problem. One strategy for avoiding overfitting is dropout. This involves randomly dropping neurons from the network at each training iteration, which prevents the network from becoming reliant on any single neuron. To implement dropout in PyTorch we simply set the proportion of neurons that we want to drop from a given layer at each iteration.



In [107]:
class multilayer_network(nn.Module):

  def __init__(self):
    super(multilayer_network, self).__init__()
    self.layer_1 = nn.Linear(5000,250)
    self.relu = nn.ReLU()
    self.dropout1 = nn.Dropout(0.8)
    self.layer_2 = nn.Linear(250,1)

  def forward(self, x):
    x = self.relu(self.layer_1(x))
    x = self.dropout1(x)
    return(torch.sigmoid(self.layer_2(x)))
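
One practical detail: nn.Dropout only drops values while the module is in training mode. The sketch below applies a standalone dropout layer to a vector of ones in both modes. It uses p=0.5 rather than the 0.8 above, purely to keep the numbers round.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(0.5)   # p=0.5 chosen here only to keep the numbers round
ones = torch.ones(1000)

drop.train()             # training mode: dropout is active
train_out = drop(ones)
# Surviving values are scaled by 1/(1-p), so each entry is either 0 or 2
print(train_out.unique())

drop.eval()              # evaluation mode: dropout does nothing
eval_out = drop(ones)
print(torch.equal(eval_out, ones))
```

For a full model, this means calling model.train() before training and model.eval() before computing precision and recall. Otherwise neurons are still being dropped at test time and the evaluation scores will be noisy.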

Problem 4: Update the model above so that dropout is applied to both the input and the hidden layer. Train and test the model. Experiment with different dropout values. What impact does this have on performance on the test and the training data?

Once you have finished all these problems, make a start on the programming exercise here:

https://livemanchesterac-my.sharepoint.com/:w:/g/personal/colin_bannard_manchester_ac_uk/EVgwZiT4urFCr9J6EELCsdIByg-v0bR7T9Ph3O4GFpwiDA?e=eUzh0l

Next week we will look at implementing networks for processing sequences (RNNs and Transformers) in PyTorch.