# Batch size, Optimizers, Loss

We will study how the batch size, the optimizer, and the loss function affect training on a linear regression. With only two parameters, this model makes the parameter space easy to visualize.

---

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, TensorDataset
import numpy as np
import matplotlib.pyplot as plt
import time

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

## Fake data & basic linear regression

We create some random points that lie roughly on a line ($y = 1 + 2x$ plus Gaussian noise), so we can perform a linear regression.

In [None]:
np.random.seed(0)

XS = np.random.rand(100, 1)
YS = 1 + 2 * XS + .1 * np.random.randn(100, 1)

plt.scatter(XS, YS)
plt.xlabel("$x$")
plt.ylabel("$y$")
plt.show()

A linear regression is a simple model, where $y_{pred} = ax+b$.
The loss (the function we try to minimize) is usually the mean squared error $\mathcal{L}(a,b) = \frac{1}{N}\sum_{i=1}^N ({y_{pred}}_i-y_i)^2 = \frac{1}{N}\sum_{i=1}^N ((ax_i+b)-y_i)^2$.
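
To make the definition concrete, here is a minimal sketch that evaluates the mean squared error (averaged over the points) for a single pair $(a, b)$. The helper `mse` and the arrays `xs_demo` and `ys_demo` are illustrative names that do not appear elsewhere in this notebook. The three demo points lie exactly on $y = 2x + 1$, so the result can be checked by hand.

```python
import numpy as np

def mse(a, b, xs, ys):
    """Mean squared error of the line y = a*x + b on the points (xs, ys)."""
    y_pred = a * xs + b
    return np.mean((y_pred - ys) ** 2)

# Hypothetical hand-checkable data: three points exactly on y = 2x + 1
xs_demo = np.array([0.0, 1.0, 2.0])
ys_demo = np.array([1.0, 3.0, 5.0])

loss_exact = mse(2.0, 1.0, xs_demo, ys_demo)  # the true line: every residual is 0
loss_off = mse(2.0, 0.0, xs_demo, ys_demo)    # intercept off by 1: every residual is -1
print(loss_exact, loss_off)
```

With the true parameters every residual vanishes, so the loss is zero. Shifting the intercept by 1 makes every residual equal to $-1$, so the loss is the mean of three ones.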

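Let's define a function that calculates the loss for $a \in \left[0,4\right]$ and $b \in \left[-1,3\right]$ (with a grid of $100 \times 100$ points), and plot the loss. The function returns the logarithm of the loss, and `pow_err` sets the exponent applied to the absolute error (2 gives the mean squared error).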

In [None]:
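def loss_map(xs, ys, pow_err=2):
    """Log of the mean |error|**pow_err on a 100x100 grid of (a, b) values."""
    aa = np.linspace(0,4, 100)   # slope values a
    bb = np.linspace(-1,3, 100)  # intercept values b
    A, B = np.meshgrid(aa, bb)   # A[i,j] = aa[j], B[i,j] = bb[i]
    L = np.zeros((100,100))
    for i in range(100):
        for j in range(100):
            a = A[i,j]
            b = B[i,j]
            yhat = a*xs + b  # predictions for this (a, b)
            # log scale; pow_err=2 gives the mean squared error
            L[i,j] = np.log(np.mean(np.abs(ys-yhat)**pow_err))
    return L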

L = loss_map(XS, YS)
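# origin='lower' puts row 0 (b = -1) at the bottom, so the b axis matches the extent
plt.imshow(L, extent=[0, 4, -1, 3], origin='lower')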
plt.xlabel('a')
plt.ylabel('b')
plt.show()
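
The double loop above can also be written with NumPy broadcasting. The following is a sketch rather than a replacement for `loss_map`: it assumes the same data generation (seed 0) and the same grid, and the names `resid`, `L_vec`, `a_best`, and `b_best` are introduced here for illustration. It also reads off the grid point with the lowest loss, which should lie close to the true parameters $a = 2$, $b = 1$.

```python
import numpy as np

# Same synthetic data as above (seed 0), regenerated so the block is self-contained
np.random.seed(0)
xs = np.random.rand(100, 1)
ys = 1 + 2 * xs + .1 * np.random.randn(100, 1)

aa = np.linspace(0, 4, 100)   # slope values a
bb = np.linspace(-1, 3, 100)  # intercept values b
A, B = np.meshgrid(aa, bb)    # A[i, j] = aa[j], B[i, j] = bb[i]

# Broadcast the N points, shaped (N, 1, 1), against the (100, 100) grid:
# resid[n, i, j] is the residual of point n for the parameters (A[i, j], B[i, j])
resid = ys.reshape(-1, 1, 1) - (A * xs.reshape(-1, 1, 1) + B)
L_vec = np.log(np.mean(np.abs(resid) ** 2, axis=0))  # average over the points

# Grid point with the smallest loss
i_min, j_min = np.unravel_index(np.argmin(L_vec), L_vec.shape)
a_best, b_best = A[i_min, j_min], B[i_min, j_min]
print(a_best, b_best)
```

The intermediate array `resid` holds $100 \times 100 \times 100$ values, which is fine at this size but grows quickly with a finer grid or more points. In that case the loop version, or processing the grid in chunks, may be preferable.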