# Introduction to NNs with PyTorch

torch.autograd is PyTorch’s automatic differentiation engine that powers neural network training. 

## Training a NN

1. Forward Propagation: In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess. 

Basically: input -> nodes/weights -> output (guess)

2. Backward Propagation: In backprop, the NN adjusts its parameters in proportion to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (gradients), and optimizing the parameters using gradient descent (see your Coursera course).
 
Basically: error in the output -> gradients flow backwards through the nodes -> weights get adjusted
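
To make the two steps concrete, here is a minimal sketch with a single weight. The values of `x`, `w`, `target` and `lr` are made up for illustration and are not from the tutorial; the squared error is just one possible way to measure the mistake.

```python
import torch

# Hypothetical toy setup: one input, one weight, one target value.
x = torch.tensor(2.0)
w = torch.tensor(0.5, requires_grad=True)   # the only learnable parameter
target = torch.tensor(3.0)

# Forward propagation: run the input through the function to get a guess.
guess = w * x

# The error measures how far the guess is from the target.
error = (guess - target) ** 2

# Backward propagation: autograd computes d(error)/dw and stores it in w.grad.
# By hand: d(error)/dw = 2 * (guess - target) * x
error.backward()

# Gradient descent: move the weight a small step against the gradient.
lr = 0.1  # assumed learning rate
with torch.no_grad():
    w -= lr * w.grad

print(w.grad, w)
```

The weight moves from 0.5 towards the value that would make `w * x` equal to `target`, which is the whole idea of training in one step.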

## NN sample in pytorch

#### Description: one training step with a pretrained ResNet-18 model from torchvision


Fun fact: ResNet-18 (18 layers) is a CNN that is part of the ResNet (residual network) family, introduced in 2015. It is known for its simplicity and effectiveness in image classification tasks.

Residual learning: 
* uses residual connections (skip connections) that let gradients flow through the network more easily; without them, making a NN very deep can make it harder to train and degrade its performance (the degradation problem)
* consists of convolutional layers, batch normalization layers, ReLU activations, and a final fully connected layer
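
A skip connection is easier to see in code. The block below is a simplified sketch, not the actual torchvision implementation: the class name `TinyResidualBlock` and the channel and image sizes are hypothetical, and it assumes the input and output have the same shape so they can be added directly.

```python
import torch
from torch import nn


class TinyResidualBlock(nn.Module):
    """Hypothetical simplified block: conv -> BN -> ReLU -> conv -> BN, plus a skip connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x                              # keep the input for the skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                      # residual (skip) connection
        return self.relu(out)


block = TinyResidualBlock(channels=8)
x = torch.rand(1, 8, 16, 16)
y = block(x)
print(y.shape)  # same shape as the input
```

Because the input is added back to the output, the gradient has a direct path around the two convolutions during backprop.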

more info: https://towardsdatascience.com/understanding-and-visualizing-resnets-442284831be8

In [6]:
import torch
from torchvision.models import resnet18, ResNet18_Weights
model = resnet18(weights=ResNet18_Weights.DEFAULT)
# create random input data: a batch of 1 image with 3 channels, height and width of 64
data = torch.rand(1, 3, 64, 64)
# the model's output has shape (1, 1000), so the random labels use the same shape
labels = torch.rand(1, 1000)

In [7]:
# forward pass: run the input through the model to get a prediction
prediction = model(data)

# backward pass: compute the error (loss), then backpropagate it
loss = (prediction - labels).sum() # error = loss
# autograd calculates the gradient of the loss for each model parameter and stores it in the parameter's .grad attribute
loss.backward()

In [8]:

# create an SGD optimizer with a learning rate of 0.01 and momentum of 0.9
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

# SGD = Stochastic Gradient Descent
# parameters = learnable weights and biases of the model
# learning rate = the size of the step taken on each gradient descent update
# momentum = keeps a running "velocity" built from past gradients, which helps accelerate the optimizer in a consistent direction. 0.9 is a common choice, meaning 90% of the previous step's velocity is carried into the current update

In [None]:
# we call step() to run one gradient descent update
# the optimizer adjusts each parameter using the gradient stored in its .grad
optim.step() #gradient descent
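
To see what `step()` does with the learning rate and momentum, here is a small sketch that repeats the update by hand on a toy parameter and compares it with `torch.optim.SGD`. The parameter `p`, the toy loss and the loop length are made up for illustration; the hand-written update follows the formula in the `torch.optim.SGD` docs with its default settings (no dampening, no Nesterov, no weight decay).

```python
import torch

lr, momentum = 1e-2, 0.9

# Hypothetical single parameter, updated twice: once by torch.optim.SGD, once by hand.
p = torch.tensor([1.0, -2.0], requires_grad=True)
optim = torch.optim.SGD([p], lr=lr, momentum=momentum)

p_manual = p.detach().clone()
velocity = torch.zeros_like(p_manual)

for _ in range(3):
    # The same update by hand. The gradient of the toy loss sum(p**2) is 2 * p.
    grad = 2 * p_manual
    velocity = momentum * velocity + grad    # carry over 90% of the old velocity
    p_manual = p_manual - lr * velocity      # step against the velocity

    # The update done by the optimizer.
    optim.zero_grad()                        # gradients accumulate, so clear them first
    loss = (p ** 2).sum()
    loss.backward()
    optim.step()

print(torch.allclose(p.detach(), p_manual))
```

Note the `zero_grad()` call: `.grad` accumulates across `backward()` calls, so in a real training loop it is cleared before each backward pass.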


## How autograd helps in building a NN

In [9]:
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
# setting requires_grad=True tells autograd to track every operation on these tensors so it can compute their gradients

In [10]:
# a and b = parameters of the NN
# Q = the error

Q = 3*a**3 - b**2
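
As a sketch of the next step, calling `backward()` on `Q` lets autograd fill in `a.grad` and `b.grad`. By hand, the derivatives are dQ/da = 9a^2 and dQ/db = -2b, so the two can be compared. The name `external_grad` is just a label chosen here; because `Q` is a vector rather than a single number, `backward()` needs a `gradient` argument of the same shape.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3*a**3 - b**2

# Q is a vector, so pass a tensor of ones with the same shape as the starting gradient.
external_grad = torch.ones_like(Q)
Q.backward(gradient=external_grad)

# Compare the gradients autograd collected with the derivatives worked out by hand.
print(torch.allclose(a.grad, 9 * a.detach()**2))   # dQ/da = 9a^2
print(torch.allclose(b.grad, -2 * b.detach()))     # dQ/db = -2b
```

An equivalent option is to reduce `Q` to a single number first, for example `Q.sum().backward()`.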