![MLU Logo](../data/MLU_Logo.png)

# <a name="0">Machine Learning Accelerator - Tabular Data - Lecture 3</a>


## PyTorch

1. <a href="#1">PyTorch: Tensors and Autograd</a>
2. <a href="#2">PyTorch: Building a Neural Network</a>


In [1]:
%%capture
%pip install -q -r ../requirements.txt

## 1. <a name="1">PyTorch: Tensors and Autograd</a>
<a href="#0">Go to top</a>

This tutorial covers the same concepts as the original MXNet tutorial, but uses PyTorch instead.

To get started, let's import PyTorch.


In [2]:
import torch

Next, let's create a 2D tensor (also called a matrix) from two rows of numbers: 1, 2, 3 and 5, 6, 7.

In [3]:
torch.tensor([[1,2,3],[5,6,7]])

tensor([[1, 2, 3],
        [5, 6, 7]])

We can also create a very simple matrix with the same shape (2 rows by 3 columns), filled with 1s.

In [4]:
x = torch.ones((2,3))
x

tensor([[1., 1., 1.],
        [1., 1., 1.]])

Often we'll want to create tensors whose values are sampled randomly. For example, torch.rand samples uniformly from [0, 1), so multiplying by 2 and subtracting 1 gives values uniformly distributed between -1 and 1.

In [5]:
y = torch.rand(2, 3) * 2 - 1  # Values between -1 and 1
y

tensor([[ 0.6748,  0.4310,  0.6130],
        [-0.9225, -0.8389, -0.4594]])
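
Because the values are random, they change on every run. As a minimal sketch (the seed value 0 and the names a, b, c are arbitrary choices for illustration), fixing the seed with torch.manual_seed makes the sampling reproducible. The in-place uniform_ method is shown as an alternative way to sample from the same range.

```python
import torch

# Fixing the seed makes the random draws repeatable.
torch.manual_seed(0)
a = torch.rand(2, 3) * 2 - 1  # rescale [0, 1) to [-1, 1)

torch.manual_seed(0)
b = torch.rand(2, 3) * 2 - 1  # same seed, so the same values

print(torch.equal(a, b))

# Alternative: fill an uninitialized tensor in place with uniform samples.
c = torch.empty(2, 3).uniform_(-1, 1)
print(c.min().item() >= -1, c.max().item() <= 1)
```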

You can also fill a tensor of a given shape with a given value, such as 2.0.

In [6]:
x = torch.full((2,3), 2.0)
x

tensor([[2., 2., 2.],
        [2., 2., 2.]])

As with NumPy, the dimensions of each tensor are available through the .shape attribute. We can also query the number of elements and the data type.

In [7]:
(x.shape, x.numel(), x.dtype)

(torch.Size([2, 3]), 6, torch.float32)
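
The data type depends on how the tensor was created. The sketch below (the variable names are illustrative) shows that integer literals produce an int64 tensor while torch.ones produces float32 under PyTorch's default settings, and how to convert explicitly.

```python
import torch

# Integer literals give an integer tensor; torch.ones defaults to float32.
a = torch.tensor([[1, 2, 3], [5, 6, 7]])
b = torch.ones((2, 3))

print(a.dtype)  # torch.int64
print(b.dtype)  # torch.float32 (assuming the default dtype was not changed)

# Convert explicitly when a float tensor is needed, e.g. as network input.
a_float = a.to(torch.float32)
print(a_float.shape, a_float.numel(), a_float.dtype)
```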

### Operations

PyTorch supports a large number of standard mathematical operations, such as element-wise multiplication:

In [8]:
x * y

tensor([[ 1.3496,  0.8619,  1.2259],
        [-1.8450, -1.6778, -0.9188]])

Exponentiation:

In [9]:
y.exp()

tensor([[1.9637, 1.5387, 1.8459],
        [0.3975, 0.4322, 0.6317]])

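And matrix multiplication. Since x and y are both 2x3, we transpose y to 3x2 so that the inner dimensions match, which gives a 2x2 result: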

In [10]:
torch.mm(x, y.t())

tensor([[ 3.4375, -4.4415],
        [ 3.4375, -4.4415]])
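
As a small sketch of the shape rule, the snippet below recreates tensors with the same shapes as above and checks that the @ operator gives the same result as torch.mm. The rows of the product are identical because every row of x is identical.

```python
import torch

x = torch.full((2, 3), 2.0)
y = torch.rand(2, 3) * 2 - 1

# (2, 3) @ (3, 2) -> (2, 2); y.t() swaps the two dimensions of y.
prod_mm = torch.mm(x, y.t())
prod_at = x @ y.t()

print(prod_mm.shape)
print(torch.allclose(prod_mm, prod_at))

# Both rows of x are [2, 2, 2], so both rows of the product are equal.
print(torch.allclose(prod_mm[0], prod_mm[1]))
```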

### Indexing

PyTorch tensors support NumPy-style indexing and slicing. Here's an example of reading a particular element (row 1, column 2, counting from 0), which returns a scalar tensor.

In [11]:
y[1,2]

tensor(-0.4594)

We can also read the second and third columns of y.

In [12]:
y[:,1:3]

tensor([[ 0.4310,  0.6130],
        [-0.8389, -0.4594]])

We can also write to a slice. Here every element in the second and third columns is set to 2.

In [13]:
y[:,1:3] = 2
y

tensor([[ 0.6748,  2.0000,  2.0000],
        [-0.9225,  2.0000,  2.0000]])

A slice can restrict rows and columns at the same time. Here the first two columns of the second row are set to 4.

In [14]:
y[1:2,0:2] = 4
y

tensor([[0.6748, 2.0000, 2.0000],
        [4.0000, 4.0000, 2.0000]])
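
One detail worth knowing when writing through slices: basic slicing returns a view that shares memory with the original tensor, so modifying the slice also modifies the original. The sketch below uses a fresh zero tensor (not the y from above) and clone() to make an independent copy.

```python
import torch

y = torch.zeros(2, 3)

view = y[:, 1:3]          # basic slicing returns a view onto y's storage
copy = y[:, 1:3].clone()  # clone() makes an independent copy

view[0, 0] = 5.0  # also changes y[0, 1]
copy[0, 1] = 9.0  # leaves y untouched

print(y)
```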

### Automatic differentiation with autograd

PyTorch provides automatic differentiation through its autograd package. Let's see how it works with a simple example.

In [15]:
x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
x

tensor([[1., 2.],
        [3., 4.]], requires_grad=True)

Now let's define a function $y=f(x) = 0.6x^2$, applied element-wise to x.

In [16]:
y = 0.6 * x * x
y

tensor([[0.6000, 2.4000],
        [5.4000, 9.6000]], grad_fn=<MulBackward0>)

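Let's compute the gradients. Calling backward() without arguments requires a scalar, so we sum the elements of y first. Each entry of x.grad is then $\frac{dy}{dx} = 1.2x$ evaluated at the corresponding entry of x.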

In [17]:
y.sum().backward()
x.grad

tensor([[1.2000, 2.4000],
        [3.6000, 4.8000]])
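
The result can be checked against the analytic derivative. The sketch below repeats the computation and compares x.grad with 1.2x. It also shows that gradients accumulate across backward() calls, so they should be reset before x is reused (the name expected is illustrative).

```python
import torch

x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)

y = 0.6 * x * x
y.sum().backward()

# Analytic derivative of 0.6 * x^2 is 1.2 * x.
expected = 1.2 * x.detach()
print(torch.allclose(x.grad, expected))

# Gradients accumulate: a second backward pass adds to x.grad.
(0.6 * x * x).sum().backward()
print(torch.allclose(x.grad, 2 * expected))

# Reset the gradient in place before the next computation.
x.grad.zero_()
(0.6 * x * x).sum().backward()
print(torch.allclose(x.grad, expected))
```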

## 2. <a name="2">PyTorch: Building a Neural Network</a>
<a href="#0">Go to top</a>

### Implement a network with sequential mode 

Let's implement a simple neural network with two hidden layers of size 64 and 128 using nn.Sequential. The network takes 5 inputs, produces 1 output through a sigmoid, and applies dropout after each hidden layer.

In [18]:
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(5, 64),
    nn.ReLU(),
    nn.Dropout(0.4),
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(128, 1),
    nn.Sigmoid()
)
net

Sequential(
  (0): Linear(in_features=5, out_features=64, bias=True)
  (1): ReLU()
  (2): Dropout(p=0.4, inplace=False)
  (3): Linear(in_features=64, out_features=128, bias=True)
  (4): ReLU()
  (5): Dropout(p=0.3, inplace=False)
  (6): Linear(in_features=128, out_features=1, bias=True)
  (7): Sigmoid()
)

Let's send a batch of data through this network (the batch size is 4 in this case).

In [19]:
# Input shape is (batch_size, data length)
x = torch.rand(4, 5)
y = net(x)

print("Random input data with shape", x.shape)
print(x)
print("\nOutput shape:", y.shape)
print("Network output: ", y)

Random input data with shape torch.Size([4, 5])
tensor([[0.6891, 0.5221, 0.7773, 0.9408, 0.7547],
        [0.2574, 0.5219, 0.3243, 0.9965, 0.1699],
        [0.5062, 0.1165, 0.5882, 0.4178, 0.0667],
        [0.7801, 0.5441, 0.5210, 0.3496, 0.3415]])

Output shape: torch.Size([4, 1])
Network output:  tensor([[0.4622],
        [0.5201],
        [0.5014],
        [0.4897]], grad_fn=<SigmoidBackward0>)
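
A freshly created module is in training mode, so the dropout layers randomly zero activations and repeated calls on the same input can give different outputs. A minimal sketch of switching to evaluation mode for inference, rebuilding the same architecture so the snippet is self-contained (out1 and out2 are illustrative names):

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(5, 64), nn.ReLU(), nn.Dropout(0.4),
    nn.Linear(64, 128), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(128, 1), nn.Sigmoid(),
)
x = torch.rand(4, 5)

net.eval()             # evaluation mode: dropout layers pass inputs through
with torch.no_grad():  # no autograd graph is built for inference
    out1 = net(x)
    out2 = net(x)

print(torch.equal(out1, out2))  # same input, same output in eval mode
print(out1.requires_grad)

net.train()  # switch back before training
```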


We can also inspect the initialized weights and bias of each layer. Here we look at the first layer (the printed output below is truncated).

In [20]:
print(net[0].weight.shape, net[0].bias.shape)
print(net[0].weight, net[0].bias)

torch.Size([64, 5]) torch.Size([64])
Parameter containing:
tensor([[ 0.4140, -0.2097,  0.1934,  0.0987, -0.3828],
        [-0.3258, -0.1371, -0.2716,  0.2433,  0.3157],
        [ 0.3060,  0.2025, -0.1249, -0.2841, -0.1136],
        [ 0.0635, -0.2865,  0.3451, -0.2566,  0.2379],
        [-0.2022, -0.3182, -0.1616,  0.1147,  0.0196],
        [ 0.0514,  0.4180, -0.1799, -0.3582, -0.3167],
        [-0.2233, -0.0761,  0.3520, -0.1367,  0.0231],
        [ 0.0652,  0.0074, -0.1976,  0.0652, -0.0874],
        [ 0.2888,  0.1323,  0.2426, -0.3566, -0.1998],
        [-0.2552,  0.4010, -0.3824, -0.0141, -0.0860],
        [-0.2668, -0.2012, -0.0907, -0.2436,  0.1911],
        [ 0.1006, -0.0848, -0.3372,  0.4433,  0.1452],
        [ 0.0564,  0.0578, -0.0198, -0.2309, -0.0589],
        [-0.1424,  0.3267, -0.4456,  0.3973, -0.2852],
        [-0.4185, -0.0388,  0.3620,  0.2704, -0.0656],
        [-0.3409,  0.0460, -0.2915, -0.3246,  0.0052],
        [ 0.0496, -0.3019,  0.3156,  0.0079,  0.3143],
      

### Implement the network flexibly

Now let's implement the same network using a custom module, which gives more flexibility in defining the forward pass.

In [21]:
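class MixMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Fully connected layers: 5 -> 64 -> 128 -> 1
        self.fc1 = nn.Linear(5, 64)
        self.fc2 = nn.Linear(64, 128)
        self.fc3 = nn.Linear(128, 1)
        # Dropout applied after each hidden layer
        self.dropout1 = nn.Dropout(0.4)
        self.dropout2 = nn.Dropout(0.3)

    def forward(self, x):
        # First hidden layer: linear -> ReLU -> dropout
        x = torch.relu(self.fc1(x))
        x = self.dropout1(x)
        # Second hidden layer: linear -> ReLU -> dropout
        x = torch.relu(self.fc2(x))
        x = self.dropout2(x)
        # Output layer: sigmoid squashes the result into (0, 1)
        x = torch.sigmoid(self.fc3(x))
        return x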

net = MixMLP()
net

MixMLP(
  (fc1): Linear(in_features=5, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=128, bias=True)
  (fc3): Linear(in_features=128, out_features=1, bias=True)
  (dropout1): Dropout(p=0.4, inplace=False)
  (dropout2): Dropout(p=0.3, inplace=False)
)

Using net works the same way as before.

In [22]:
# Input shape is (batch_size, data length)
x = torch.rand(4, 5)
net(x)

tensor([[0.4729],
        [0.4819],
        [0.4444],
        [0.4414]], grad_fn=<SigmoidBackward0>)
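
With a custom module, the layers are available by attribute name rather than by index. As a sketch, the snippet below redefines the same MixMLP so it runs on its own, lists each parameter with its shape through named_parameters(), and counts the total number of trainable values (the name total is illustrative).

```python
import torch
import torch.nn as nn

class MixMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(5, 64)
        self.fc2 = nn.Linear(64, 128)
        self.fc3 = nn.Linear(128, 1)
        self.dropout1 = nn.Dropout(0.4)
        self.dropout2 = nn.Dropout(0.3)

    def forward(self, x):
        x = self.dropout1(torch.relu(self.fc1(x)))
        x = self.dropout2(torch.relu(self.fc2(x)))
        return torch.sigmoid(self.fc3(x))

net = MixMLP()

# Dropout layers have no parameters, so only the linear layers appear.
for name, param in net.named_parameters():
    print(name, tuple(param.shape))

# (5*64 + 64) + (64*128 + 128) + (128*1 + 1) = 8833
total = sum(p.numel() for p in net.parameters())
print(total)

# Individual layers are reachable as attributes.
print(net.fc1.weight.shape, net.fc1.bias.shape)
```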