## Autograd
1. **Feedforward as a Nested Function**:  
   When we perform feedforward in a neural network, we're essentially chaining multiple functions (e.g., matrix multiplications, activation functions, etc.) together. Mathematically, this can be seen as a **composition of functions**, which grows complex as the network deepens.


   Example:
   \[
   y = f(g(h(x)))
   \]
   where \(f\), \(g\), and \(h\) represent different layers or transformations.


2. **Manual Backpropagation is Tedious**:  
   To compute gradients by hand during backpropagation, you would need to apply the **chain rule** repeatedly for every parameter in the network. Deriving and coding these expressions is tedious and error-prone, especially for large models.

3. **How Autograd Helps**:  
   PyTorch's **Autograd** automates the computation of gradients. It does so by:
   - Recording operations performed on tensors in a **computational graph**.
   - Traversing this graph **in reverse** during backpropagation (reverse-mode automatic differentiation) to compute gradients efficiently.

   With `autograd`, you only define the forward pass; calling `.backward()` on the output then computes gradients for every parameter tensor created with `requires_grad=True`.
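
To make the "record, then traverse in reverse" idea concrete, here is a minimal pure-Python sketch of reverse-mode differentiation for scalars. The `Value` class and its attributes are hypothetical names invented for this illustration; this is not how PyTorch implements Autograd internally, only a toy version of the same principle.

```python
import math


class Value:
    """Hypothetical scalar node that records how it was produced."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data                # result of the forward computation
        self.grad = 0.0                 # d(output)/d(this node), filled in by backward()
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(this node)/d(parent), one per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def sin(self):
        return Value(math.sin(self.data), (self,), (math.cos(self.data),))

    def backward(self):
        # Order the graph so each node appears after all of its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk the graph in reverse, applying the chain rule at each node.
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad


x = Value(4.0)
z = (x * x).sin()   # forward pass: z = sin(x^2)
z.backward()        # reverse pass fills in x.grad
print(z.data, x.grad)
```

The forward pass only builds the graph; every derivative is computed in the single reverse sweep. The same function, `sin(x**2)` at `x = 4`, is differentiated with PyTorch later in this notebook, so the two results can be compared.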

### In Summary:
Autograd removes the complexity of manually computing derivatives for every parameter in the network, enabling faster experimentation and model development. It’s especially useful when dealing with deep and complex networks!

In [11]:
import torch
x = torch.tensor(3.0,requires_grad=True)

In [12]:
y = x**2

In [13]:
x

tensor(3., requires_grad=True)

In [14]:
y

tensor(9., grad_fn=<PowBackward0>)

In [15]:
y.backward()

In [16]:
x.grad

tensor(6.)
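
This matches the analytic derivative \(dy/dx = 2x = 6\) at \(x = 3\). As a rough sanity check that does not depend on PyTorch, the same slope can be approximated with a central finite difference. The helper `f` and the step size `h` below are assumptions chosen only for illustration.

```python
def f(x):
    # Same function as in the cell above: y = x^2
    return x ** 2


x0 = 3.0
h = 1e-5  # step size, chosen here only for illustration

# Central difference approximation of dy/dx at x0
numeric_grad = (f(x0 + h) - f(x0 - h)) / (2 * h)
print(numeric_grad)
```

A numerical check like this is much slower than Autograd for many parameters, but it is a handy way to confirm a gradient on a small example.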

In [23]:
x = torch.tensor(4.0,requires_grad=True)
y = x**2
z = torch.sin(y)

In [24]:
print(x)
print(y)
print(z)

tensor(4., requires_grad=True)
tensor(16., grad_fn=<PowBackward0>)
tensor(-0.2879, grad_fn=<SinBackward0>)


In [25]:
z.backward()

In [26]:
x.grad

tensor(-7.6613)
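
The value above can be verified by applying the chain rule by hand to \(y = x^2\) and \(z = \sin(y)\):

\[
\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx} = \cos(y) \cdot 2x = 2x \cos(x^2)
\]

At \(x = 4\):

\[
\frac{dz}{dx} = 8 \cos(16) \approx 8 \times (-0.9577) \approx -7.6613
\]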

In [27]:
# inputs
x = torch.tensor(6.7)

# true label
y = torch.tensor(0.0)


In [31]:
w = torch.tensor(1.0,requires_grad=True)
b = torch.tensor(0.0,requires_grad=True)

print(w)
print(b)

tensor(1., requires_grad=True)
tensor(0., requires_grad=True)


In [32]:
z = w*x + b
z

tensor(6.7000, grad_fn=<AddBackward0>)

In [33]:
y_pred = torch.sigmoid(z)
y_pred

tensor(0.9988, grad_fn=<SigmoidBackward0>)

In [34]:
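def binary_cross_entropy(prediction, target):
    # Clamp predictions away from 0 and 1 so torch.log never receives 0
    epsilon = 1e-7
    prediction = torch.clamp(prediction, epsilon, 1 - epsilon)
    # Binary cross-entropy for a single prediction/target pair
    return -(target * torch.log(prediction) + (1 - target) * torch.log(1 - prediction))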

In [35]:
loss = binary_cross_entropy(y_pred, y)

In [36]:
loss

tensor(6.7012, grad_fn=<NegBackward0>)

In [37]:
loss.backward()

In [38]:
w.grad

tensor(6.6918)

In [39]:
b.grad

tensor(0.9988)
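
Both gradients can be checked by hand. Writing \(\hat{y} = \sigma(z)\) for `y_pred`, with \(z = wx + b\) and \(L = -\left[y \log \hat{y} + (1 - y) \log(1 - \hat{y})\right]\), the chain rule gives the following (the clamp has no effect here, since \(\hat{y}\) lies strictly inside \([\epsilon, 1 - \epsilon]\)):

\[
\frac{\partial L}{\partial \hat{y}} = -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}}, \qquad
\frac{\partial \hat{y}}{\partial z} = \hat{y}(1 - \hat{y})
\quad\Rightarrow\quad
\frac{\partial L}{\partial z} = \hat{y} - y
\]

\[
\frac{\partial L}{\partial w} = (\hat{y} - y)\,x, \qquad
\frac{\partial L}{\partial b} = \hat{y} - y
\]

With \(x = 6.7\), \(y = 0\), and \(\hat{y} = \sigma(6.7) \approx 0.99877\):

\[
\frac{\partial L}{\partial b} \approx 0.9988, \qquad
\frac{\partial L}{\partial w} \approx 0.99877 \times 6.7 \approx 6.6918
\]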

In [41]:
x = torch.tensor([1.,2.,3.],requires_grad=True)
x

tensor([1., 2., 3.], requires_grad=True)

In [42]:
y = (x**2).mean()
y

tensor(4.6667, grad_fn=<MeanBackward0>)

In [43]:
y.backward()

In [44]:
x.grad

tensor([0.6667, 1.3333, 2.0000])
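
For a vector input, `x.grad` holds one partial derivative per element. Here \(y\) is the mean of the squared entries, so:

\[
y = \frac{1}{3} \sum_{i=1}^{3} x_i^2
\quad\Rightarrow\quad
\frac{\partial y}{\partial x_i} = \frac{2 x_i}{3}
\]

For \(x = (1, 2, 3)\) this gives \(\left(\tfrac{2}{3}, \tfrac{4}{3}, 2\right) \approx (0.6667, 1.3333, 2.0000)\), matching the output above.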

### Clearing gradients

Gradients accumulate in `.grad` across successive `backward()` calls, so we need to clear them before the next pass.


In [46]:
x.grad.zero_()

tensor([0., 0., 0.])
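
The accumulation itself is easy to miss, so here is a small sketch that makes it visible: calling `backward()` twice without clearing doubles the stored gradient. The variable names `first`, `accumulated`, and `cleared` are introduced only for this illustration.

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)

# First backward pass: x.grad holds 2*x/3
(x ** 2).mean().backward()
first = x.grad.clone()

# Second backward pass without clearing: the new gradient is added to the old one
(x ** 2).mean().backward()
accumulated = x.grad.clone()

# Clearing in place resets the stored gradient before the next pass
x.grad.zero_()
cleared = x.grad.clone()

print(first, accumulated, cleared)
```

In a training loop that uses a `torch.optim` optimizer, the usual equivalent is to call the optimizer's `zero_grad()` method once per iteration.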