In [1]:
x = [1.,2.,3.]

In [2]:
x[::-1]

[3.0, 2.0, 1.0]

Install PyTorch, TensorBoard, torchvision, graphviz, torchviz, and git. This section itself only uses NumPy and scikit-learn, which the next cell installs.

In [9]:
! pip install -q numpy scikit-learn


In [10]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

Data Generation

In [11]:
np.random.seed(42)

In [12]:
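true_b = 1
true_w = 2
N = 100

# Data generation
np.random.seed(42)
x = np.random.rand(N, 1)
# Uniform noise in [0, 0.1), so its mean is 0.05 (np.random.randn would give zero-mean Gaussian noise)
epsilon = (.1 * np.random.rand(N, 1))
y = true_b + true_w * x + epsilon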

In [13]:
idx = np.arange(N)
np.random.shuffle(idx)

# Uses the first 80 shuffled indices for training and the remaining 20 for validation
train_idx = idx[:int(N*.8)]
val_idx = idx[int(N*.8):]

x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]
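As a sanity check, the same data can be fit in closed form with scikit-learn's `LinearRegression`, which is imported above but not otherwise used in this section. This is a minimal sketch that regenerates the data and split so it runs on its own. Because the noise comes from `np.random.rand` (uniform on [0, 0.1), mean 0.05), the fitted intercept should land near 1.05 rather than exactly at `true_b`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Regenerate the same synthetic data and train/validation split as above
true_b, true_w, N = 1, 2, 100
np.random.seed(42)
x = np.random.rand(N, 1)
epsilon = .1 * np.random.rand(N, 1)  # uniform noise in [0, 0.1)
y = true_b + true_w * x + epsilon

idx = np.arange(N)
np.random.shuffle(idx)
train_idx, val_idx = idx[:int(N * .8)], idx[int(N * .8):]
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]

# Closed-form least-squares fit on the training set only
linr = LinearRegression()
linr.fit(x_train, y_train)
print(linr.intercept_, linr.coef_[0])
```

The gradient descent steps below should move `b` and `w` toward these values.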

In [14]:
# Randomly initializes the parameters b and w
np.random.seed(42)
b = np.random.randn(1)
w = np.random.randn(1)

print(b, w)

[0.49671415] [-0.1382643]


In [15]:
# Model's predicted output - Forward pass
yhat = b + w * x_train

Loss Function

In [16]:
# Error for each training point: prediction minus target
error = (yhat - y_train)

In [17]:
# Mean squared error (MSE) loss
loss = (error ** 2).mean()

In [18]:
loss

np.float64(2.808129216295391)

Gradients

In [19]:
# Partial derivative of the MSE loss with respect to b
b_grad = (2 * error).mean()

In [20]:
# Partial derivative of the MSE loss with respect to w
w_grad = (2 * error * x_train).mean()

In [21]:
b_grad, w_grad

(np.float64(-3.108262701823821), np.float64(-1.8206663430690853))
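These formulas come from differentiating the MSE loss: dL/db = 2 * mean(yhat - y) and dL/dw = 2 * mean(x * (yhat - y)). One way to double-check them is a central finite-difference approximation. In the sketch below, the `mse` helper and the step size `h` are illustrative choices, not part of the original notebook; it compares both gradients at the same initial `b` and `w`.

```python
import numpy as np

# Regenerate the training data and initial parameters as above
np.random.seed(42)
N = 100
x = np.random.rand(N, 1)
y = 1 + 2 * x + .1 * np.random.rand(N, 1)
idx = np.arange(N)
np.random.shuffle(idx)
x_train, y_train = x[idx[:80]], y[idx[:80]]

np.random.seed(42)
b = np.random.randn(1)
w = np.random.randn(1)

def mse(b, w):
    """MSE loss of the linear model b + w * x on the training set (hypothetical helper)."""
    return ((b + w * x_train - y_train) ** 2).mean()

# Analytic gradients, as computed above
error = (b + w * x_train) - y_train
b_grad = (2 * error).mean()
w_grad = (2 * error * x_train).mean()

# Central finite differences: (L(p + h) - L(p - h)) / (2h)
h = 1e-6
b_num = (mse(b + h, w) - mse(b - h, w)) / (2 * h)
w_num = (mse(b, w + h) - mse(b, w - h)) / (2 * h)
print(b_grad, b_num)
print(w_grad, w_num)
```

Because the MSE is quadratic in each parameter, the central difference matches the analytic gradient up to floating-point rounding.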

In [22]:
lr = 0.1
print(b, w)

# Updates parameters using gradients and the learning rate
b = b - lr * b_grad
w = w - lr * w_grad
print(b, w)

[0.49671415] [-0.1382643]
[0.80754042] [0.04380233]
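A single update only moves the parameters slightly. Repeating forward pass, loss, gradients, and update over the whole training set for many epochs is batch gradient descent. A minimal sketch, assuming `n_epochs = 1000` (an illustrative choice) and the same `lr = 0.1`; the parameters should approach the closed-form least-squares solution.

```python
import numpy as np

# Regenerate the training data as above
np.random.seed(42)
N = 100
x = np.random.rand(N, 1)
y = 1 + 2 * x + .1 * np.random.rand(N, 1)
idx = np.arange(N)
np.random.shuffle(idx)
x_train, y_train = x[idx[:80]], y[idx[:80]]

# Random initialization, as above
np.random.seed(42)
b = np.random.randn(1)
w = np.random.randn(1)

lr = 0.1
n_epochs = 1000  # illustrative choice

for epoch in range(n_epochs):
    yhat = b + w * x_train                 # forward pass
    error = yhat - y_train
    loss = (error ** 2).mean()             # MSE loss
    b_grad = (2 * error).mean()            # gradients
    w_grad = (2 * error * x_train).mean()
    b = b - lr * b_grad                    # parameter update
    w = w - lr * w_grad

print(b, w, loss)
```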


Standardization: Transforms a feature to have zero mean and unit standard deviation

First, compute the mean and standard deviation of the feature, then use both values to scale it:

scaled x_i = (x_i - mean) / standard deviation

Zero mean: centers the feature at 0, which helps avoid vanishing gradients
Unit standard deviation: puts all numerical features on a similar scale
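The formula can be applied by hand with NumPy and compared with `StandardScaler`. A sketch on an arbitrary example feature; note that `StandardScaler` divides by the population standard deviation (`ddof=0`), which is also the default of NumPy's `std`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

np.random.seed(42)
x_train = np.random.rand(80, 1)  # arbitrary example feature

# scaled x_i = (x_i - mean) / standard deviation
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)  # population std (ddof=0)
manual = (x_train - mean) / std

scaler = StandardScaler().fit(x_train)
scaled = scaler.transform(x_train)

print(np.allclose(manual, scaled))               # True
print(scaled.mean(axis=0), scaled.std(axis=0))   # approximately 0 and 1
```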


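Preprocessing steps such as standardization must be performed after the train-validation split; otherwise, statistics from the validation data leak into training.

Fit the StandardScaler on the training set only, then use its transform method on all datasets (training, validation, and test).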

In [23]:
scaler = StandardScaler(with_mean=True, with_std=True)
# Fit the scaler on the training set only
scaler.fit(x_train)

scaled_x_train = scaler.transform(x_train)
scaled_x_val = scaler.transform(x_val)
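Fitting on the training set only means the validation set is scaled with the training mean and standard deviation. The scaled training feature therefore has exactly zero mean and unit standard deviation, while the scaled validation feature only approximately does. A sketch that inspects the fitted `mean_` and `scale_` attributes of `StandardScaler`, regenerating the split so it runs on its own:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Regenerate the feature and the train/validation split as above
np.random.seed(42)
N = 100
x = np.random.rand(N, 1)
np.random.rand(N, 1)  # noise draw, kept so the shuffle below matches the original
idx = np.arange(N)
np.random.shuffle(idx)
x_train, x_val = x[idx[:80]], x[idx[80:]]

scaler = StandardScaler(with_mean=True, with_std=True)
scaler.fit(x_train)  # statistics come from the training set only
scaled_x_train = scaler.transform(x_train)
scaled_x_val = scaler.transform(x_val)

# The stored statistics are the training mean and (population) std
print(scaler.mean_, x_train.mean(axis=0))
print(scaler.scale_, x_train.std(axis=0))

# The validation set is transformed with those same training statistics
print(np.allclose(scaled_x_val, (x_val - scaler.mean_) / scaler.scale_))
print(scaled_x_train.mean(), scaled_x_val.mean())
```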

Always standardize your features.

Batch (n = N): one update per epoch
ex: 80 data points; since each update uses all of them, one epoch = one update, so 100 epochs = 100 updates

Stochastic (n = 1): N updates per epoch, 1 data point per update
ex: 80 data points, so one epoch = 80 updates; 100 epochs x 80 updates = 8,000 updates

Mini-batch (1 < n < N): N/n updates per epoch
ex: 80 data points with 16 data points per mini-batch, so one epoch = 80/16 = 5 updates; 100 epochs = 500 updates
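The update counts in these examples follow from one formula: each epoch performs ceil(N / n) updates, assuming the last, smaller mini-batch is kept rather than dropped when n does not divide N. A small sketch; the `updates` helper is hypothetical, for illustration only.

```python
import math

def updates(n_points, batch_size, n_epochs):
    """Total parameter updates: ceil(N / n) mini-batches per epoch (hypothetical helper)."""
    return math.ceil(n_points / batch_size) * n_epochs

N = 80
print(updates(N, N, 100))   # batch:      100
print(updates(N, 1, 100))   # stochastic: 8000
print(updates(N, 16, 100))  # mini-batch: 500
```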

Training a model means repeating four steps for many epochs: the forward pass (computing predictions), computing the loss, computing the gradients, and updating the parameters.
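The four steps can be combined with mini-batches in plain NumPy. This is a minimal sketch, assuming mini-batches of 16 points reshuffled every epoch, `lr = 0.1`, and 200 epochs (all illustrative choices, not taken from the original). Each mini-batch gets its own forward pass, loss, gradients, and update, so one epoch performs 80 / 16 = 5 updates.

```python
import numpy as np

# Regenerate the training data as above
np.random.seed(42)
N = 100
x = np.random.rand(N, 1)
y = 1 + 2 * x + .1 * np.random.rand(N, 1)
idx = np.arange(N)
np.random.shuffle(idx)
x_train, y_train = x[idx[:80]], y[idx[:80]]

# Random initialization, as above
np.random.seed(42)
b = np.random.randn(1)
w = np.random.randn(1)

lr = 0.1
n_epochs = 200    # illustrative choice
batch_size = 16   # 80 / 16 = 5 updates per epoch
n_updates = 0

for epoch in range(n_epochs):
    perm = np.random.permutation(len(x_train))  # reshuffle every epoch
    for start in range(0, len(x_train), batch_size):
        batch = perm[start:start + batch_size]
        xb, yb = x_train[batch], y_train[batch]
        yhat = b + w * xb                  # 1. forward pass
        error = yhat - yb
        loss = (error ** 2).mean()         # 2. MSE loss
        b_grad = (2 * error).mean()        # 3. gradients
        w_grad = (2 * error * xb).mean()
        b = b - lr * b_grad                # 4. parameter update
        w = w - lr * w_grad
        n_updates += 1

print(n_updates)  # 1000
print(b, w)
```

Setting `batch_size = 80` turns this into batch gradient descent and `batch_size = 1` into stochastic gradient descent, matching the update counts above.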