# <center>**Chapter 10 : Building Neural Networks with Pytorch**</center>

## **PyTorch Fundamentals**
**The core data structure of pytorch is a tensor. It's a multidimensional array with a shape and data type, used for numerical computations.**

In [73]:
import torch

In [74]:
X = torch.tensor([[1.0 , 4.0 , 7.0] , [2.0 , 3.0 , 6.0]])
X

tensor([[1., 4., 7.],
        [2., 3., 6.]])

In [75]:
X.shape , X.dtype

(torch.Size([2, 3]), torch.float32)

In [76]:
# indexing in tensor works same as numpy arrays

X[0 , 1], X[: , 1]

(tensor(4.), tensor([4., 3.]))

In [77]:
10 * (X + 1.0)

tensor([[20., 50., 80.],
        [30., 40., 70.]])

**you can also convert a tensor to a numpy array using the ``numpy()``  method and create a tensor from a numpy array**

In [78]:
import numpy as np
X.numpy()

array([[1., 4., 7.],
       [2., 3., 6.]], dtype=float32)

In [79]:
torch.tensor(np.array([[1. , 4. , 7.] , [2. , 3. , 6.]]))

tensor([[1., 4., 7.],
        [2., 3., 6.]], dtype=torch.float64)

**the default precision for floats in 32 bits in pytorch whereas its 64 bits in Numpy. Its generally better to use 32 bits in deep learning because this takes half the RAM and speeds up computations , and neural networks do not actually need the extra precision offered by 64 bit floats.** 

In [80]:
# you can use ``torch.FloatTensor()`` which automatically converts the array to 32 bits

torch.FloatTensor(np.array([[1. , 4. , 7.] , [2. ,3. , 6.]]))

tensor([[1., 4., 7.],
        [2., 3., 6.]])

In [81]:
# you can also modify a tensor in place using indexing and slicing as with a numpy array

X[: , 1] = -99
X

tensor([[  1., -99.,   7.],
        [  2., -99.,   6.]])

In [82]:
# the relu method applies the ReLU activation function in place by replacing all negative values with 0s

X.relu_()
X


tensor([[1., 0., 7.],
        [2., 0., 6.]])

#### **Tip : Pytorch's inplace operations are easy to spot at a glance because their name always ends with an underscore**

## **Hardware Acceleration**

In [83]:
torch.cuda.is_available()

True

In [84]:
if torch.cuda.is_available():
    device = "cuda"
elif torch.backend.mps.is_available():
    device = "mps"
else :
    device = "cpu"

In [85]:
M = torch.tensor([[1. , 2. , 3.] , [4. , 5. , 6.]])
M = M.to(device)

In [86]:
M.device

device(type='cuda', index=0)

In [87]:
R = M @ M.T
R

tensor([[14., 32.],
        [32., 77.]], device='cuda:0')

In [88]:
M = torch.rand((1000 , 1000))  # on the CPU
%timeit M @M.T

4.91 ms ± 79.6 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [89]:
M = torch.rand((1000 , 1000) , device = "cuda")  # on the GPU
%timeit M @ M.T

931 μs ± 81.5 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


## **Autograd**
**PyTorch comes with an efficient implementation of reverse mode auto differentation called autograd which stands for automatic gradients**

In [90]:
x = torch.tensor(5.0 , requires_grad=True)
f = x**2
f

tensor(25., grad_fn=<PowBackward0>)

In [91]:
f.backward
x.grad

In [92]:
# learning_rate = 0.1
# with torch.no_grad():
#     x -= learning_rate * x.grad
#     x.grad.zero_()

In [94]:

# x_detached = x.detach()
# x_detached -= learning_rate * x.grad 

In [96]:
# x.grad.zero_()

### **Warning : if you forget to zero out the gradients at each training iteration , the backward() method will just accumulate them, causing incorrect gradient updates. Since there wont be any explicit error, just low performance (and perhaps infinite or NaN values) , this issue may be hard to debug**

In [97]:
# the whole traning looks like this 

learning_rate  = 0.1
x = torch.tensor(5.0 , requires_grad=True)
for iteration in range(100):
    f = x ** 2  # forward pass
    f.backward()  # backward pass
    with torch.no_grad():
        x -= learning_rate * x.grad   # gradient descent step

    x.grad.zero_() # reset the gradients

In [98]:
t = torch.tensor(2.0 , requires_grad=True)
z = t.exp() # this is an intermediate result
z += 1 # this is an inplace operation
# z.backward() # runtime error

- **Some operations such as ``exp()`` , `relu()` , `rsqrt()` , `sigmoid()` , `sqrt()` , `tan()` , and `tanh()` save their outputs in the computation graph during the forward pass , then use these outputs to compute the gradients during the backward pass.This means that you must not modify such an operation's output in place, or you will get an error during the forward pass**

- **Other operations such as `abs()` , `cos()` , `log()` , `sin()` , `square()` , and `var()` save their inputs instead of their output. Such an operation doesnot care if you modify its output in place , but you must not modify its input in place before the backward pass.**

## **Implementing Linear Regression**

In [99]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [100]:
housing = fetch_california_housing()
X_train_full , X_test , y_train_full , y_test = train_test_split(housing.data , housing.target , random_state = 42)
X_train , X_valid , y_train , y_valid = train_test_split(X_train_full , y_train_full , random_state=42)

In [101]:
X_train = torch.FloatTensor(X_train)
X_valid = torch.FloatTensor(X_valid)
X_test = torch.FloatTensor(X_test)
means = X_train.mean(dim = 0 , keepdim=True)
stds = X_train.std(dim = 0 , keepdim=True)
X_train = (X_train - means) / stds
X_valid = (X_valid - means) / stds
X_test = (X_test - means) / stds

In [102]:
# now lets convert the targerts to tensors too

y_train = torch.FloatTensor(y_train)
y_valid = torch.FloatTensor(y_valid)
y_test = torch.FloatTensor(y_test)

In [103]:
# lets create the parameters of our linear regeression model

torch.manual_seed(42)
n_features = X_train.shape[1] # there are 8 input features
w = torch.randn((n_features , 1), requires_grad = True)
b = torch.tensor(0. , requires_grad=True)

In [104]:
# we will use batch gradient descent (BGD) , using the full training set at each training step

learning_rate = 0.4
n_epochs = 20
for epoch in range(n_epochs):
    y_pred = X_train @ w + b
    loss = ((y_pred - y_train) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        b -= learning_rate * b.grad
        w -= learning_rate * w.grad
        b.grad.zero_()
        w.grad.zero_()
    print(f"Epoch {epoch + 1}/{n_epochs}, Loss : {loss.item()}")

Epoch 1/20, Loss : 16.235748291015625
Epoch 2/20, Loss : 5.34673547744751
Epoch 3/20, Loss : 2.8323731422424316
Epoch 4/20, Loss : 1.944043517112732
Epoch 5/20, Loss : 1.6030147075653076
Epoch 6/20, Loss : 1.4679163694381714
Epoch 7/20, Loss : 1.4126158952713013
Epoch 8/20, Loss : 1.388764500617981
Epoch 9/20, Loss : 1.377511739730835
Epoch 10/20, Loss : 1.3714261054992676
Epoch 11/20, Loss : 1.3675490617752075
Epoch 12/20, Loss : 1.3646897077560425
Epoch 13/20, Loss : 1.3623602390289307
Epoch 14/20, Loss : 1.3603529930114746
Epoch 15/20, Loss : 1.3585747480392456
Epoch 16/20, Loss : 1.3569780588150024
Epoch 17/20, Loss : 1.3555351495742798
Epoch 18/20, Loss : 1.35422682762146
Epoch 19/20, Loss : 1.3530385494232178
Epoch 20/20, Loss : 1.35195791721344


In [105]:
X_new = X_test[:3] # pretend these are new instances
with torch.no_grad():
    y_pred = X_new @ w + b # use the trained parameters to make predictions

y_pred

tensor([[2.1702],
        [2.0141],
        [2.0942]])

## **Linear Regeression using pytorch's high - level API**

In [106]:
import  torch.nn as nn

torch.manual_seed(42)
model = nn.Linear(in_features= n_features , out_features= 1)

In [107]:
model.bias

Parameter containing:
tensor([0.3117], requires_grad=True)

In [108]:
model.weight


Parameter containing:
tensor([[ 0.2703,  0.2935, -0.0828,  0.3248, -0.0775,  0.0713, -0.1721,  0.2076]],
       requires_grad=True)

In [109]:
for param in model.parameters():
    [...]

In [110]:
model(X_train[:2])

tensor([[-0.4718],
        [ 0.1131]], grad_fn=<AddmmBackward0>)

In [111]:
optimizer = torch.optim.SGD(model.parameters() , lr = learning_rate)
mse = nn.MSELoss()

In [114]:
def train_bgd(model , optimizer , criterion , X_train , y_train , n_epochs):
    for epoch in range(n_epochs):
        y_pred = model(X_train)
        loss = criterion(y_pred , y_train)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        print(f"Epoch {epoch + 1} / {n_epochs} , Loss : {loss.item()}")

**we are now using higher level constructs rather than working directly with tensors and autograd.**
- **in pytorch the loss funtion object is commonly referred to as criterion, to distinguish it from the loss value itslef. in this example, its ther MSELoss instance.**
- **the ``optimizer.step()`` line corresponds to the two lines that updated b and w in our earlier code**
- **the ``optimizer.zero_grad()`` line corresponds to the twwo lines that zeroed out b.grad and w.grad.**

In [115]:
# lets call this function to train our model

train_bgd(model , optimizer , mse , X_train , y_train , n_epochs)

  return F.mse_loss(input, target, reduction=self.reduction)


Epoch 1 / 20 , Loss : 4.795647144317627
Epoch 2 / 20 , Loss : 1.513105034828186
Epoch 3 / 20 , Loss : 1.3611774444580078
Epoch 4 / 20 , Loss : 1.3481870889663696
Epoch 5 / 20 , Loss : 1.3454773426055908
Epoch 6 / 20 , Loss : 1.3444790840148926
Epoch 7 / 20 , Loss : 1.3439384698867798
Epoch 8 / 20 , Loss : 1.3435535430908203
Epoch 9 / 20 , Loss : 1.3432371616363525
Epoch 10 / 20 , Loss : 1.3429617881774902
Epoch 11 / 20 , Loss : 1.3427164554595947
Epoch 12 / 20 , Loss : 1.3424957990646362
Epoch 13 / 20 , Loss : 1.3422966003417969
Epoch 14 / 20 , Loss : 1.342116355895996
Epoch 15 / 20 , Loss : 1.3419532775878906
Epoch 16 / 20 , Loss : 1.341805338859558
Epoch 17 / 20 , Loss : 1.3416712284088135
Epoch 18 / 20 , Loss : 1.3415493965148926
Epoch 19 / 20 , Loss : 1.3414386510849
Epoch 20 / 20 , Loss : 1.3413382768630981


In [113]:
X_new = X_test[:3]
with torch.no_grad():
    y_pred = model(X_new)

In [112]:
y_pred

tensor([[2.1702],
        [2.0141],
        [2.0942]])

## **Implementing a Regression MLP**


In [117]:
torch.manual_seed(42)
model = nn.Sequential(
    nn.Linear(n_features , 50),
    nn.ReLU(),
    nn.Linear(50 , 40),
    nn.ReLU(),
    nn.Linear(40,1)
)

In [119]:
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters() , lr = learning_rate)
mse = nn.MSELoss()
train_bgd(model , optimizer , mse , X_train , y_train , n_epochs)

Epoch 1 / 20 , Loss : 1.3674356937408447
Epoch 2 / 20 , Loss : 1.3664956092834473
Epoch 3 / 20 , Loss : 1.3656134605407715
Epoch 4 / 20 , Loss : 1.3647804260253906
Epoch 5 / 20 , Loss : 1.3639895915985107
Epoch 6 / 20 , Loss : 1.3632375001907349
Epoch 7 / 20 , Loss : 1.3625197410583496
Epoch 8 / 20 , Loss : 1.3618351221084595
Epoch 9 / 20 , Loss : 1.3611795902252197
Epoch 10 / 20 , Loss : 1.360555648803711
Epoch 11 / 20 , Loss : 1.359958529472351
Epoch 12 / 20 , Loss : 1.3593848943710327
Epoch 13 / 20 , Loss : 1.358834147453308
Epoch 14 / 20 , Loss : 1.3583043813705444
Epoch 15 / 20 , Loss : 1.3577940464019775
Epoch 16 / 20 , Loss : 1.3573025465011597
Epoch 17 / 20 , Loss : 1.3568295240402222
Epoch 18 / 20 , Loss : 1.3563734292984009
Epoch 19 / 20 , Loss : 1.355933427810669
Epoch 20 / 20 , Loss : 1.355509638786316


## **Implementing Mini-Batch Gradient Decent Using DataLoaders**


In [120]:
from torch.utils.data import TensorDataset, DataLoader

train_dataset = TensorDataset(X_train , y_train)
train_loader = DataLoader(train_dataset , batch_size = 32 , shuffle=True)

In [123]:
torch.manual_seed(42)
model = nn.Sequential(
    nn.Linear(n_features , 50),
    nn.ReLU(),
    nn.Linear(50 , 40),
    nn.ReLU(),
    nn.Linear(40 , 1)
)
model = model.to(device)

In [124]:
# lets create a train() function to implement mini batch GD

def train(model , optimizer , criterion , train_loader , n_epochs):
    model.train()
    for epoch in range(n_epochs):
        total_loss = 0.
        for X_batch , y_batch in train_loader:
            X_batch , y_batch = X.batch.to(device) , y_batch.to(device)
            y_pred = model(X_batch)
            loss = criterion(y_pred , y_batch)
            total_loss += loss.item()
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        mean_loss = total_loss / len(train_loader)
        print(f"Epoch {epoch + 1} / {n_epochs}, Loss : {mean_loss : 4f}")