In [1]:
import numpy as np
import torch

# Linear Regression

In a linear regression model, each target variable is estimated as a weighted sum of the input variables plus a constant offset (the bias).

_y = w * x + b_

In [2]:
def convert_C_to_F(num):
    return (num * 9/5) + 32
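
The conversion function above is itself an instance of _y = w * x + b_, with a weight of 9/5 and a bias of 32. A minimal sketch of that correspondence follows; the names `true_w` and `true_b` are illustrative assumptions, not part of the original notebook.

```python
import torch

# Assumed names for illustration: the "ground truth" parameters that
# training should eventually approximate.
true_w, true_b = 9 / 5, 32.0

celsius = torch.tensor([0.0, 10.0, 100.0])
fahrenheit = true_w * celsius + true_b  # y = w * x + b, applied elementwise
print(fahrenheit)
```

These are the values a trained model's `w` and `b` should approach.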

In [3]:
inputs = np.array(np.random.uniform(low=0.5, high=13.3, size=(20,)), dtype='float32')
inputs

array([ 4.4394255 ,  7.5692253 ,  3.072012  , 11.790111  ,  1.9905111 ,
        0.90371615,  5.455385  , 12.137819  ,  2.0246139 ,  9.015684  ,
       11.657337  ,  3.6755729 ,  3.372492  , 11.852789  ,  9.210675  ,
        7.7850423 ,  6.697618  ,  1.2883344 ,  7.097615  ,  5.8117223 ],
      dtype=float32)

In [4]:
targets = np.array(list(map(convert_C_to_F, inputs)), dtype='float32')
targets

array([39.990967, 45.624607, 37.52962 , 53.2222  , 35.58292 , 33.62669 ,
       41.819695, 53.848076, 35.644306, 48.228233, 52.983208, 38.61603 ,
       38.070484, 53.335022, 48.579216, 46.013077, 44.055714, 34.319   ,
       44.775707, 42.4611  ], dtype=float32)

Converting numpy arrays into tensors

In [5]:
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

In [6]:
inputs.shape

torch.Size([20])

In [7]:
targets.shape

torch.Size([20])

Building a linear regression model from scratch

In [8]:
w = torch.randn(1, 1, requires_grad=True)
b = torch.randn(1,  requires_grad=True)
print(w)
print(b)

tensor([[-0.0467]], requires_grad=True)
tensor([0.3558], requires_grad=True)


In [9]:
def model(x):
    # x has shape [20] and w has shape [1, 1], so broadcasting
    # produces predictions of shape [1, 20]
    return x * w.t() + b

In [10]:
preds = model(inputs)
preds

tensor([[ 0.1483,  0.0020,  0.2122, -0.1952,  0.2627,  0.3135,  0.1008, -0.2115,
          0.2611, -0.0656, -0.1890,  0.1840,  0.1982, -0.1981, -0.0747, -0.0081,
          0.0428,  0.2955,  0.0241,  0.0842]], grad_fn=<AddBackward0>)

In [11]:
targets

tensor([39.9910, 45.6246, 37.5296, 53.2222, 35.5829, 33.6267, 41.8197, 53.8481,
        35.6443, 48.2282, 52.9832, 38.6160, 38.0705, 53.3350, 48.5792, 46.0131,
        44.0557, 34.3190, 44.7757, 42.4611])

# Loss Function

A loss function measures how well the model is performing. It does so by comparing the model's predictions with the actual targets.

We can use Mean Squared Error (MSE):
- Calculate the difference between `preds` and `targets`
- Square all elements of the difference matrix
- Calculate the average of the squared differences

In [12]:
def MSE(t1, t2):
    diff = t1 - t2
    # .numel() returns the total number of elements in the tensor
    return torch.sum(diff * diff) / diff.numel()

In [13]:
loss = MSE(preds, targets)
loss

tensor(1925.1062, grad_fn=<DivBackward0>)
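
As a sanity check, the three MSE steps can be traced by hand on a tiny example and compared with PyTorch's built-in `torch.nn.functional.mse_loss`. The tensors `example_preds` and `example_targets` are made-up values for illustration only.

```python
import torch
import torch.nn.functional as F

# Made-up values, small enough to verify by hand
example_preds = torch.tensor([1.0, 2.0, 3.0])
example_targets = torch.tensor([2.0, 2.0, 5.0])

diff = example_preds - example_targets          # step 1: [-1, 0, -2]
squared = diff * diff                           # step 2: [1, 0, 4]
manual_mse = squared.sum() / squared.numel()    # step 3: 5 / 3

# The built-in loss uses mean reduction by default
builtin_mse = F.mse_loss(example_preds, example_targets)
print(manual_mse, builtin_mse)
```

Both values should agree, which suggests the hand-written `MSE` function follows the usual definition.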

# Gradients

With PyTorch, we can automatically compute the gradient of the loss with respect to the weights and biases, because they were created with `requires_grad=True`.

In [14]:
loss.backward()

In [15]:
print(w)
print(w.grad)

tensor([[-0.0467]], requires_grad=True)
tensor([[-599.0138]])
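
To make the result of `loss.backward()` less opaque, here is a small sketch that compares autograd with the gradients derived by hand. For MSE on a model `w * x + b`, the gradient is `mean(2 * (pred - y) * x)` for `w` and `mean(2 * (pred - y))` for `b`. The toy data and the names `manual_w_grad` and `manual_b_grad` are assumptions for illustration.

```python
import torch

# Toy data (assumed): the true relation is y = 2 * x
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

preds = w * x + b
loss = ((preds - y) ** 2).mean()
loss.backward()  # fills in w.grad and b.grad

# The same gradients, written out from the MSE formula
with torch.no_grad():
    diff = w * x + b - y
    manual_w_grad = (2 * diff * x).mean()
    manual_b_grad = (2 * diff).mean()

print(w.grad, manual_w_grad)
print(b.grad, manual_b_grad)
```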


If a grad is **positive**:
- increasing the weight will increase the loss
- decreasing the weight will decrease the loss

and vice versa, if a grad is **negative**:
- increasing the weight will decrease the loss
- decreasing the weight will increase the loss

For more background, see [Slope on Wikipedia](https://en.wikipedia.org/wiki/Slope).

The increase or decrease in the loss caused by changing a weight is proportional to the gradient of the loss with respect to that weight. This observation forms the basis of the **gradient descent** optimization algorithm.
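
The proportionality claim can be checked numerically: nudge a weight by a small amount and compare the actual change in loss with `gradient * step`. This is only a sketch on made-up data, and `loss_at`, `actual_change` and `predicted_change` are hypothetical names.

```python
import torch

# Toy data (assumed): the true relation is y = 2 * x
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])

def loss_at(w_value):
    # MSE of the bias-free model w * x on the toy data above
    return ((w_value * x - y) ** 2).mean()

w = torch.tensor(0.5, requires_grad=True)
loss = loss_at(w)
loss.backward()

step = 0.01
with torch.no_grad():
    actual_change = loss_at(w + step) - loss   # measured change in loss
    predicted_change = w.grad * step           # first-order estimate

print(w.grad)
print(actual_change, predicted_change)
```

The gradient here is negative, so increasing `w` lowers the loss, and the measured change should be close to the first-order estimate for a small step.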

In [16]:
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5

Here we multiply the gradients by a small number, `1e-5` (the learning rate), so that the weights and biases change only slightly. This takes a small step in the downhill direction.

`torch.no_grad()` tells PyTorch not to track gradients while we update the weights and biases manually.
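
A short sketch of why the `torch.no_grad()` block matters: PyTorch normally rejects an in-place update of a leaf tensor that requires gradients with a `RuntimeError`, whereas the same update inside `torch.no_grad()` goes through. The tensor and the step size `0.1` are illustrative assumptions.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
(w * 2).sum().backward()  # w.grad is now tensor([2.])

raised = False
try:
    w -= w.grad * 0.1  # in-place update of a leaf tensor that requires grad
except RuntimeError as err:
    raised = True
    print("Without no_grad:", err)

with torch.no_grad():
    w -= w.grad * 0.1  # allowed: the update is not recorded by autograd

# w keeps requires_grad=True, so later forward passes are still tracked
print(w, w.requires_grad)
```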

In [17]:
preds = model(inputs)

In [18]:
updated_loss = MSE(preds, targets)
updated_loss < loss

tensor(True)

We also need to reset the gradients with the `.zero_()` method, because PyTorch accumulates gradients: each call to `.backward()` adds to the existing `.grad` values instead of replacing them.

In [19]:
w.grad.zero_()
b.grad.zero_()

tensor([0.])
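
The accumulation behaviour is easy to see on a single scalar. In this sketch (the names `first`, `accumulated` and `after_reset` are illustrative), calling `.backward()` twice without resetting doubles the stored gradient.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

(w * w).backward()
first = w.grad.clone()         # d(w^2)/dw = 2 * 3 = 6

(w * w).backward()
accumulated = w.grad.clone()   # 6 + 6 = 12: added to the previous value

w.grad.zero_()                 # reset in place
(w * w).backward()
after_reset = w.grad.clone()   # back to 6

print(first, accumulated, after_reset)
```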

# Training

1. Generate predictions
2. Calculate loss
3. Compute gradients with respect to the weights and biases
4. Adjust the weights and biases by subtracting a small quantity proportional to the gradient
5. Reset gradients
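
The five steps above can be sketched as a single function. This is an illustrative version on synthetic data, not the notebook's own code: `train_step`, the toy `x` and `y`, and the zero initialisation are all assumptions.

```python
import torch

# Toy data with a known answer (assumed for illustration): y = 1.8 * x + 32
x = torch.linspace(0.0, 10.0, 20)
y = 1.8 * x + 32.0

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

def train_step(lr):
    preds = w * x + b                   # 1. generate predictions
    loss = ((preds - y) ** 2).mean()    # 2. calculate loss
    loss.backward()                     # 3. compute gradients
    with torch.no_grad():
        # sub_ is the in-place method form of -=; writing `w -= ...` here
        # would make w a local name inside the function
        w.sub_(w.grad * lr)             # 4. adjust weights and biases
        b.sub_(b.grad * lr)
        w.grad.zero_()                  # 5. reset gradients
        b.grad.zero_()
    return loss.item()

losses = [train_step(lr=1e-3) for _ in range(200)]
print(losses[0], losses[-1])
```

Returning the loss from each step makes it easy to confirm that it is going down over time.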

In [20]:
preds = model(inputs)
preds

tensor([[ 0.1758,  0.0482,  0.2315, -0.1237,  0.2755,  0.3198,  0.1344, -0.1379,
          0.2741, -0.0107, -0.1183,  0.2069,  0.2192, -0.1263, -0.0186,  0.0394,
          0.0838,  0.3041,  0.0675,  0.1198]], grad_fn=<AddBackward0>)

In [21]:
loss = MSE(preds, targets)
loss

tensor(1921.4449, grad_fn=<DivBackward0>)

In [22]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[-598.3618]])
tensor([-86.6361])


In [23]:
lr = 1e-5
with torch.no_grad():
    w -= w.grad * lr
    b -= b.grad * lr
    w.grad.zero_()
    b.grad.zero_()

In [24]:
preds = model(inputs)
loss = MSE(preds, targets)
loss

tensor(1917.7914, grad_fn=<DivBackward0>)

## Train for multiple epochs

In [25]:
epochs = 100
for _ in range(epochs):
    preds = model(inputs)
    loss = MSE(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * lr
        b -= b.grad * lr
        w.grad.zero_()
        b.grad.zero_()

In [26]:
preds = model(inputs)
loss = MSE(preds, targets)
loss

tensor(1589.7255, grad_fn=<DivBackward0>)

In [27]:
preds

tensor([[2.8015, 4.4661, 2.0742, 6.7110, 1.4990, 0.9210, 3.3418, 6.8960, 1.5172,
         5.2354, 6.6404, 2.3952, 2.2340, 6.7444, 5.3391, 4.5809, 4.0025, 1.1256,
         4.2153, 3.5314]], grad_fn=<AddBackward0>)

In [28]:
targets

tensor([39.9910, 45.6246, 37.5296, 53.2222, 35.5829, 33.6267, 41.8197, 53.8481,
        35.6443, 48.2282, 52.9832, 38.6160, 38.0705, 53.3350, 48.5792, 46.0131,
        44.0557, 34.3190, 44.7757, 42.4611])

In [29]:
lr = 1e-3
epochs = 10000
for _ in range(epochs):
    preds = model(inputs)
    loss = MSE(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * lr
        b -= b.grad * lr
        w.grad.zero_()
        b.grad.zero_()

In [30]:
targets

tensor([39.9910, 45.6246, 37.5296, 53.2222, 35.5829, 33.6267, 41.8197, 53.8481,
        35.6443, 48.2282, 52.9832, 38.6160, 38.0705, 53.3350, 48.5792, 46.0131,
        44.0557, 34.3190, 44.7757, 42.4611])

In [31]:
preds

tensor([[39.8817, 45.6018, 37.3826, 53.3159, 35.4061, 33.4198, 41.7385, 53.9513,
         35.4684, 48.2453, 53.0732, 38.4857, 37.9318, 53.4304, 48.6017, 45.9962,
         44.0088, 34.1228, 44.7398, 42.3897]], grad_fn=<AddBackward0>)

In [32]:
w

tensor([[1.8276]], requires_grad=True)

In [33]:
b

tensor([31.7683], requires_grad=True)

As you can see, the weight `w` (1.8276) and bias `b` (31.7683) are now very close to the true values, 9/5 = 1.8 and 32, from the `convert_C_to_F` function we defined at the beginning.
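
To get a feel for how close the fit is, the learned parameters printed above can be compared with the exact conversion at a few temperatures inside the training range (the inputs were sampled between 0.5 and 13.3). The chosen temperatures are arbitrary illustrative values.

```python
# Learned parameters as printed in the cells above
learned_w, learned_b = 1.8276, 31.7683
true_w, true_b = 9 / 5, 32.0

# Arbitrary sample points within the training range of 0.5 to 13.3
for celsius in (0.5, 5.0, 10.0, 13.3):
    learned = learned_w * celsius + learned_b
    exact = true_w * celsius + true_b
    print(f"{celsius:5.1f} C  learned={learned:.3f}  exact={exact:.3f}  "
          f"error={learned - exact:+.3f}")
```

The slightly high weight and slightly low bias partly offset each other within this range, which is why the predictions track the targets closely even though neither parameter matches exactly.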