Alright, welcome to Pytorch 101.

What this notebook and session intends to cover:
1. How to deal with your data
2. How to work with pytorch tensors, and make use of the gpu
3. How to set up a basic model
4. How to use loss functions and optimisers
5. How to set up a basic training loop

What this notebook and session will not cover:
1. How to work with custom architectures
2. How the actual mathematics of autograd works (I'll go over a basic version of it though)
3. How to deal with checkpoints

I'll can cover all this in a second part if you guys are fine with it and want it.

Prerequisites: Python and numpy knowledge, ml/dl basic knowledge

Let us begin with our imports

In [1]:
import torch
import numpy as np

This is a good place to start, let us now pull in our csv file

In [2]:
data=np.genfromtxt("BostonHousing.csv", delimiter=',')
data

array([[       nan,        nan,        nan, ...,        nan,        nan,
               nan],
       [6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 3.9690e+02, 4.9800e+00,
        2.4000e+01],
       [2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 3.9690e+02, 9.1400e+00,
        2.1600e+01],
       ...,
       [6.0760e-02, 0.0000e+00, 1.1930e+01, ..., 3.9690e+02, 5.6400e+00,
        2.3900e+01],
       [1.0959e-01, 0.0000e+00, 1.1930e+01, ..., 3.9345e+02, 6.4800e+00,
        2.2000e+01],
       [4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 3.9690e+02, 7.8800e+00,
        1.1900e+01]])

Now I prefer calling in via numpy, you can use pandas or something else, it really doesn't matter too much

Looking at the top row, we can see they aren't numbers, and they are the headings, so we can get rid of that.

Looking at the way the data is structured on the competition page, the last coloumn is the output we need.

In [3]:
#getting rid of headings
data=np.delete(data,0,axis=0)

In [4]:
#looking at one row of the data
data[1]

array([2.7310e-02, 0.0000e+00, 7.0700e+00,        nan, 4.6900e-01,
       6.4210e+00, 7.8900e+01, 4.9671e+00, 2.0000e+00, 2.4200e+02,
       1.7800e+01, 3.9690e+02, 9.1400e+00, 2.1600e+01])

the fourth coloumn here is a string and a dummy variable as per the competition, so let's get rid of that too

In [5]:
#getting rid of the river bank datapoint cuz idk wtf to do with it
data=np.delete(data,3,axis=1)

In [6]:
#looking at one row of the data
data[1]

array([2.7310e-02, 0.0000e+00, 7.0700e+00, 4.6900e-01, 6.4210e+00,
       7.8900e+01, 4.9671e+00, 2.0000e+00, 2.4200e+02, 1.7800e+01,
       3.9690e+02, 9.1400e+00, 2.1600e+01])

Alright, we have a bunch of numbers now! What next? Let us sort out our input and output

In [7]:
#what does our data look like?
data.shape

(506, 13)

Remember, once we have our data ready, we should convert to torch tensors for further processing. And for the ones over here who might have worked with cuda before, and are wondering why we haven't sent our data over to the device yet; it is for memory optimisation reasons.

In [8]:
#splitting it into x and y
x=data[::,:12]
y=data[::,12:]
x=torch.Tensor(x).float()
y=torch.Tensor(y).float()
#then confirming the shapes of those
x.shape, y.shape

(torch.Size([506, 12]), torch.Size([506, 1]))

So now we have an input and an output. Let us now split it into a train and validation set

In [9]:
#to split it into train and validation parts
#sorting out our indexes
idx=np.arange(506)
#shuffling it
np.random.shuffle(idx)
#putting the corresponding parts into train and test indexes
train_idx, test_idx=idx[:450],idx[450:]
#and finally splitting our our shuffled data
x_train, y_train=x[train_idx],y[train_idx]
x_test, y_test=x[test_idx],y[test_idx]

This can be made a whole lot simpler with the Dataloader package

In [10]:
#need a bunch of imports for this
from torch.utils.data import DataLoader
from torch.utils.data import TensorDataset
from torch.utils.data.dataset import random_split
#you can simply call it from torch itself too, but this 
#is slightly more convinient and legible with the actual code
dataset= TensorDataset(x,y)
train, val= random_split(dataset,[450, 56])
batch_size=20
train_dl=DataLoader(train, batch_size=batch_size, shuffle=True)
val_dl=DataLoader(val, batch_size=batch_size, shuffle=True)

What we have done essentially, is set up the pairing of data here, and it is very useful later on when we will sort out our training loop. We can unpack train_dl to x and y during training in batches, and simplifies the process of splitting up the data for us too

Now let us set up our gpu device

In [11]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda', index=0)

Brilliant, that seems to have worked very well. Now everytime we want something to be run on the gpu, we will simply send it over to the gpu via device

Next step, our model. For this session, I have kept things super simple by using a normal Linear model. We will call it from pytorch itself.

In [12]:
import torch.nn as nn
model= nn.Linear(12,1).to(device)
print(model.parameters)
list(model.parameters())

<bound method Module.parameters of Linear(in_features=12, out_features=1, bias=True)>


[Parameter containing:
 tensor([[ 0.1718,  0.0935, -0.2790, -0.1299, -0.0726, -0.1178, -0.1808,  0.0628,
          -0.2526,  0.2728,  0.0803,  0.0480]], device='cuda:0',
        requires_grad=True),
 Parameter containing:
 tensor([0.0584], device='cuda:0', requires_grad=True)]

There are a lot of things happening here, so lets break it down.

We first called in nn.Linear, and passed two parameters to it, the first one is number of input parameters, and the second one is number of output parameters. We then sent it over to "device", device here being our gpu.
We then saw the general structure of the model, and even saw what the exact random initialisations of the model looks like.

There is a line in there which says requires_grad=True for both the weights and the biases. Keep that in the back of your mind, it is super useful later on.

Now let us set up our Loss function. For this exercise, we are limited to using RMSE, so we will call in mse from the functions of torch, and then apply a square root to it.

In [13]:
import torch.nn.functional as F
def loss_fn(inp,tar):
    loss = F.mse_loss(inp,tar)
    RMSE_loss=loss.sqrt()
    return RMSE_loss

In [14]:
#sanity check of the loss function
loss_fn(model(x_train.to(device)),y_train.to(device))

tensor(109.7696, device='cuda:0', grad_fn=<SqrtBackward>)

So pretty simple, we called in mse loss, from torch's functional package, then sqaure rooted the answer and returned it. When we ran it, we had to send over the data in the form of torch tensors which were on the gpu, since our model is on the gpu as well. Not doing so will cause conflicts and weird ass errors.

Finally, let us set up our optimiser. For this one, we are using the stochastic gradient descent optimiser. For those of you familiar with how SGD works, while I might not be having specific code to sort my data into minibatches, it is happening in the back end of the optimiser automatically.

In [15]:
lr=1e-7
optimizer=torch.optim.SGD(model.parameters(), lr=lr)

And finally, we have moved all our pieces into play, we can now start with out training loop.

In [16]:
#setting up our training loop
import copy
best_model=model
def train(epoch_count, model, loss_method, optimizer=optimizer, train_dl=train_dl):
    lowest_loss=None
    for epoch in range (epoch_count):
        for xb, yb in train_dl:
            xb=xb.to(device)
            yb=yb.to(device)
            hyp=model(xb)
            loss=loss_method(hyp, yb)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        if (epoch==0):
            lowest_loss=loss.item()
        if ((epoch+1)%100==0):
            print(f"Epoch number:{epoch+1}: Loss={loss.item():.2f}")
        if (lowest_loss>loss.item()):
            lowest_loss=loss.item()
            best_model=copy.deepcopy(model)
    print(f"Best model error={lowest_loss}")

In [17]:
train(1000, model, loss_fn)

Epoch number:100: Loss=54.71
Epoch number:200: Loss=26.25
Epoch number:300: Loss=12.64
Epoch number:400: Loss=16.01
Epoch number:500: Loss=7.82
Epoch number:600: Loss=11.66
Epoch number:700: Loss=13.14
Epoch number:800: Loss=6.04
Epoch number:900: Loss=10.62
Epoch number:1000: Loss=9.91
Best model error=2.3848319053649902


So there are a ton of things happening here again, let's break it down.

I first called in an import which would help us extract our best model, we'll come to that later.

I then started my training loop based on the number of epochs I decided to pass to it, in this case, 1000, and started unpacking my data into batches of xb and yb. If we go back and see, I set my batch size as 20, which means if there is a set where there aren't 20 rows available, then it will take as many as there are left, and then move on to the next epoch. 

Then I sent over my batch data to my gpu, this way I am not cluttering it with the entirity of my data at one go. This is super useful when dealing with hundreds of gigabytes of data at once.

Then I set up my hypothesis, passed it through the loss function, and ran loss.backwards. Now this is one of the many places where pytorch really shines. What loss.backwards is doing, is looking through every equation from the loss equation, all the way to the original linear equation which is our model. It will compute the gradients of every parameter which has requires_grad set to True. And this is super helpful for the next step.

When I run optimizer.step(), we essentially update each and every parameter. If we go back to where I defined the optimizer, we can see one of the parameters for it is the parameters of the model (note, former parameter here is the programming one, latter is the parameter of the model).

Finally, I run optimizer.zero_grad(). This is because I want to set all the gradients to zero for the next time it is computed. Pytorch normally has them accumalate to make writing RNNs simpler, and so for any other case, we set them to zero every iteration.

The next bit handles the saving of the best model. I noticed while training that the loss converges a decent amount, then keeps jumping around. So I wanted to save the model with the lowest loss. I essentially deepcopied the model to best_model whenever the current loss was lower than the best loss recorded till then, initialising best_loss with the first epoch's loss. The deepcopying is necessary, otherwise simply setting it to equal will act as a pointer.

Let us now test our model.

In [18]:
loss_fn(best_model(x_test.to(device)),y_test.to(device))

tensor(9.3726, device='cuda:0', grad_fn=<SqrtBackward>)

Let us give this a quick test

In [20]:
outputs=best_model(x_test.to(device))
outputs[11]

tensor([21.8435], device='cuda:0', grad_fn=<SelectBackward>)

In [21]:
y_test[11]

tensor([20.1000])

That's it for today folks! 

*If you liked it, make sure to like, share, subscribe, hit notification bell, forward, put as story, tag, eat, shit and breathe pytorch from now on*