## Backprop in PyTorch

PyTorch is what you would use in production, and micrograd's API is roughly modelled on it.

Micrograd works only on scalars, whereas PyTorch works on tensors by default, much as MATLAB works on matrices.
Tensors are n-dimensional arrays of scalars.

In [3]:
import torch 
import numpy as np

In [8]:
T1 = torch.Tensor([[1,2,3], [-1, 0, 4]])
T1

tensor([[ 1.,  2.,  3.],
        [-1.,  0.,  4.]])

In [7]:
T1.shape

torch.Size([2, 3])

In [11]:
# All xi, wi and b are single-element tensors. torch.Tensor creates float32 values, so cast them with double() to float64, the precision of Python floats.

# These are leaf tensors created by hand, and requires_grad defaults to False for them, so set requires_grad = True explicitly.

x1 = torch.Tensor([2.0]).double(); x1.requires_grad = True
x2 = torch.Tensor([0.0]).double(); x2.requires_grad = True

w1 = torch.Tensor([-3.0]).double(); w1.requires_grad = True
w2 = torch.Tensor([1.0]).double(); w2.requires_grad = True

b = torch.Tensor([6.88137358]).double() ; b.requires_grad = True

In [17]:
# addition, multiplication and similar operations on tensor objects are predefined in PyTorch

n = w1*x1 + w2*x2 + b
o = torch.tanh(n)

In [14]:
print(o.data.item())
o.backward()

0.7071066904050358


In [18]:
print(x2.grad)

tensor([0.5000], dtype=torch.float64)


Use `item()` to extract the plain Python scalar from the single-element tensor that is returned.

In [16]:
print('x1', x1.grad.item())
print('w1', w1.grad.item())
print('x2', x2.grad.item())
print('w2', w2.grad.item())

x1 -1.5000003851533106
w1 1.0000002567688737
x2 0.5000001283844369
w2 0.0


These gradient values match those obtained from micrograd and from manual backpropagation with the chain rule, up to a rounding error of about 1e-7 that comes from `b` being created as float32 before the cast to double.
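
As a cross-check that needs neither library, the chain rule can be applied by hand to the same neuron. The sketch below uses only `math`; the names `do_dn` and `grads` are introduced here for illustration and do not come from the notebook.

```python
import math

# same inputs, weights and bias as the PyTorch cell above
x1, x2 = 2.0, 0.0
w1, w2 = -3.0, 1.0
b = 6.88137358

# forward pass
n = w1 * x1 + w2 * x2 + b
o = math.tanh(n)

# backward pass: d tanh(n)/dn = 1 - tanh(n)^2
do_dn = 1.0 - o ** 2

# n is linear in every input, so each local derivative is the partner factor
grads = {
    "x1": w1 * do_dn,
    "w1": x1 * do_dn,
    "x2": w2 * do_dn,
    "w2": x2 * do_dn,
}

for name, g in grads.items():
    print(name, g)
```

Here the bias stays in float64 throughout, so the results land closer to -1.5, 1.0, 0.5 and 0.0 than the PyTorch output above.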

### Single neuron implementation

using the micrograd `Value` class, with an API modelled on PyTorch

In [76]:
import random
from value_class import Value

try:  # draw_dot is only needed for the optional graph drawing further down
    from value_class import draw_dot
except ImportError:  # value_class may not export it
    draw_dot = None

In [None]:
class Neuron: 

    def __init__(self, nin):
        self.w = [Value(random.uniform(-1,1)) for _ in range(nin)]
        self.b = Value(random.uniform(-1,1))

    def __call__(self, x, *args, **kwds):
        # placeholder: only show how each weight pairs with its input
        print(list(zip(self.w, x)))
        return 0.0

x = [2.0, 3.0]
n = Neuron(2)

# calling the object as n(x) invokes n.__call__(x)
n(x)

[(Value(data=-0.919596969426151), 2.0), (Value(data=-0.47084479556658243), 3.0)]


0.0

In [None]:
class Neuron: 

    def __init__(self, nin):
        self.w = [Value(random.uniform(-1,1)) for _ in range(nin)]
        self.b = Value(random.uniform(-1,1))

    def __call__(self, x, *args, **kwds):
        # weighted sum of the inputs plus the bias: sum(wi*xi) + b
        act = sum(wi*xi for wi, xi in zip(self.w, x)) + self.b
        out = act.tanh()
        return out

# define xi for single neuron 
x = [2.0, 3.0]

# create neuron object
n = Neuron(2)

# output after activation 
n(x)

Value(data=0.9941205898182556)
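
To see what the neuron computes without the autograd machinery, the same forward pass can be written with plain floats. This is only a sketch: `neuron_forward` is a name made up here, and the weights and bias are hand-picked for illustration, so the output differs from the randomly initialised neuron above.

```python
import math

def neuron_forward(w, b, x):
    """tanh(sum(wi*xi) + b) for plain Python floats."""
    act = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.tanh(act)

w = [0.5, -0.25]   # illustrative weights, not from the notebook
b = 0.1            # illustrative bias
x = [2.0, 3.0]     # same input as above

out = neuron_forward(w, b, x)
print(out)
```

The `Value` version performs the same arithmetic, but each intermediate result also records how it was produced so that gradients can be propagated later.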

In [None]:
class Layer:

    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]  # create nout independent neurons, each taking nin inputs
    
    def __call__(self, x):
        outs = [n(x) for n in self.neurons]
        return outs[0] if len(outs)==1 else outs
    
x = [1,-2,5]

# 3-dimensional input x and a layer of 4 neurons with random weights => expect 4 outputs
slp = Layer(3,4)
slp(x)

[Value(data=0.999914335333105),
 Value(data=0.9642613711098029),
 Value(data=-0.9999899434718277),
 Value(data=0.9658675228484803)]

![MLP](images/MLP.jfif)

Let's model the multi-layer perceptron (MLP) shown above: 3 inputs, 2 hidden layers (4 neurons each) and 1 output.

In [None]:
class MLP:

    def __init__(self, nin, nouts):
        # nouts lists the number of neurons in each layer, e.g. [4, 4, 1]: two layers of 4 neurons, then a 1-neuron output layer
        # nin is the number of inputs xi (an integer)
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i+1]) for i in range(len(nouts))]  # layer i maps sz[i] inputs to sz[i+1] outputs
    
    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)  # the output of each layer is the input to the next
        return x 


x = [3.0, -1.0, 4.0]
n = MLP(3, [4,4,1]) # 2 hidden  layers with 4 neurons each and 1 output layer. 
n(x)

Value(data=0.9101997637614062)
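
The list comprehension in `MLP.__init__` pairs consecutive entries of `sz`. A small sketch, independent of the `Value` class, makes the resulting layer shapes explicit; `layer_shapes` is a helper name introduced here for illustration.

```python
def layer_shapes(nin, nouts):
    """Return (inputs, outputs) for each layer of an MLP."""
    sz = [nin] + nouts
    return [(sz[i], sz[i + 1]) for i in range(len(nouts))]

shapes = layer_shapes(3, [4, 4, 1])
print(shapes)  # [(3, 4), (4, 4), (4, 1)]
```

Because the last layer has a single neuron, `Layer.__call__` returns that neuron's output directly, which is why `n(x)` above is a single `Value` rather than a list.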

In [None]:
# Run only if graphviz is installed locally and on the PATH, and draw_dot was imported successfully.

draw_dot(n(x))

## Experiment with toy dataset

In [None]:
xs = [
  [2.0, 3.0, -1.0],
  [3.0, -1.0, 0.5],
  [0.5, 1.0, 1.0],
  [1.0, 1.0, -1.0],
]
ys = [1.0, -1.0, -1.0, 1.0] # desired targets

In [None]:
init = [n(xi) for xi in xs]
init

[Value(data=0.21126605361559753),
 Value(data=0.9093985319093566),
 Value(data=0.8852615962030312),
 Value(data=0.10693136196333881)]

y_target = [1, -1, -1, 1]

y_pred = [Value(data=0.21126605361559753),
 Value(data=0.9093985319093566),
 Value(data=0.8852615962030312),
 Value(data=0.10693136196333881)]

In [86]:
init_loss = [(yp - ygt)**2 for ygt, yp in zip(ys, init)]
loss = np.sum(init_loss)
loss

Value(data=8.619686870199377)
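
The loss value can be checked by hand from the predictions printed above. The sketch below repeats the sum of squared errors with plain floats (the printed `data` values copied in) instead of `Value` objects; `squared_errors` and `total` are names chosen here.

```python
# predictions printed above, copied as plain floats
y_pred = [
    0.21126605361559753,
    0.9093985319093566,
    0.8852615962030312,
    0.10693136196333881,
]
ys = [1.0, -1.0, -1.0, 1.0]  # desired targets

# squared error per example, then the total
squared_errors = [(yp - ygt) ** 2 for ygt, yp in zip(ys, y_pred)]
total = sum(squared_errors)
print(total)  # approximately 8.6197
```

The second and third examples contribute most of the loss (about 3.65 and 3.55), since the untrained network predicts values near +1 where the target is -1.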

Notes:
- Each `yp` is a `Value` object (the output of calling the `MLP` instance `n` on one input), not an `MLP` object.

- In `yp - ygt`, `ygt` is a plain float; the `Value` class wraps it in a `Value` so that the usual subtraction works (see the `Value` class in value_class.py).

- So `init_loss` is a list of `Value` objects, and their sum `loss` is again a `Value`, on which `backward()` can be called. `backward()` builds a topologically sorted list of nodes and computes the gradient for each one.

- Hence, when `loss.backward()` is called, a gradient is computed for every weight and bias of every neuron in every layer!
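
The topological ordering mentioned in the notes can be sketched with a stripped-down scalar autograd node. `TinyValue` below is a hypothetical stand-in written for this illustration; it is not the `Value` class from value_class.py, supports only `+`, `*` and `tanh`, and requires both operands to be `TinyValue` objects.

```python
import math

class TinyValue:
    """Minimal scalar autograd node (illustrative; not the notebook's Value class)."""

    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._prev = set(children)
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = TinyValue(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = TinyValue(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = TinyValue(t, (self,))
        def _backward():
            self.grad += (1.0 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # build a topological order: every node appears after its inputs
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)

        # seed the output gradient, then apply the chain rule from output to inputs
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

x1, x2 = TinyValue(2.0), TinyValue(0.0)
w1, w2 = TinyValue(-3.0), TinyValue(1.0)
b = TinyValue(6.88137358)

o = (w1 * x1 + w2 * x2 + b).tanh()
o.backward()
print(x1.grad, w1.grad, x2.grad, w2.grad)
```

Walking the list in reverse guarantees that a node's own gradient is complete before it is passed on to the nodes it was computed from.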

<img src="images/backprop.jpg" width="50%">

In [88]:
loss.backward()

In [106]:
# n.layers[0].neurons[0].w[0] 
print(n.layers[0].neurons[0].w[0].data) # will be the randomly initialized value

print(n.layers[0].neurons[0].w[0].grad) # returns gradient calculated from micrograd backward() function!

-0.019204102116678
-0.00556858261092813


Now let's make it more convenient and collect all the weights and biases (each a `Value` carrying its `data` and `grad`) in a single list by defining a `parameters()` method on each class.

In [123]:
class Neuron:
  
  def __init__(self, nin):
    self.w = [Value(random.uniform(-1,1)) for _ in range(nin)]
    self.b = Value(random.uniform(-1,1))
  
  def __call__(self, x):
    # w * x + b
    act = sum((wi*xi for wi, xi in zip(self.w, x)), self.b)
    out = act.tanh()
    return out
  
  def parameters(self):
    return self.w + [self.b]

class Layer:
  
  def __init__(self, nin, nout):
    self.neurons = [Neuron(nin) for _ in range(nout)]
  
  def __call__(self, x):
    outs = [n(x) for n in self.neurons]
    return outs[0] if len(outs) == 1 else outs
  
  def parameters(self):
    return [p for neuron in self.neurons for p in neuron.parameters()]

class MLP:
  
  def __init__(self, nin, nouts):
    sz = [nin] + nouts
    self.layers = [Layer(sz[i], sz[i+1]) for i in range(len(nouts))]
  
  def __call__(self, x):
    for layer in self.layers:
      x = layer(x)
    return x
  
  def parameters(self):
    return [p for layer in self.layers for p in layer.parameters()]
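
Each neuron holds `nin` weights and one bias, so the length of the list returned by `parameters()` can be predicted from the layer sizes alone. A quick sketch of that arithmetic follows; `count_parameters` is a name made up here and is not part of the classes above.

```python
def count_parameters(nin, nouts):
    """Number of weights and biases in an MLP with the given layer sizes."""
    sz = [nin] + nouts
    # every neuron in layer i has sz[i] weights plus one bias
    return sum((sz[i] + 1) * sz[i + 1] for i in range(len(nouts)))

total = count_parameters(3, [4, 4, 1])
print(total)  # 41
```

For `MLP(3, [4,4,1])` this gives 16 + 20 + 5 = 41, which `len(n.parameters())` should match.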

In [124]:
n = MLP(3, [4,4,1])

In [None]:
xs = [
  [2.0, 3.0, -1.0],
  [3.0, -1.0, 0.5],
  [0.5, 1.0, 1.0],
  [1.0, 1.0, -1.0],
]
ys = [1.0, -1.0, -1.0, 1.0] # desired targets

l0 = [n(xi) for xi in xs]

local_loss0 = [(yp - ygt)**2 for ygt, yp in zip(ys, l0)]
loss0 = np.sum(local_loss0)

loss0.backward()

In [130]:
print(n.layers[0].neurons[0].w[0].data) 

0.9014365617490046


In [131]:
print(n.layers[0].neurons[0].w[0].grad) 

0.18721938424305554


In [None]:
for p in n.parameters():
    p.data += -0.01 * p.grad  # step against the gradient with learning rate 0.01

In [None]:
print(n.layers[0].neurons[0].w[0].data) 

# decreased from about 0.9014 to about 0.8996: the gradient is positive, so stepping against it with step size 0.01 lowers the weight

0.899564367906574
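
The update rule is plain arithmetic, so the new weight can be reproduced from the `data` and `grad` values printed above. A sketch with those numbers copied in as floats (`w_old`, `grad` and `w_new` are names chosen here):

```python
w_old = 0.9014365617490046   # weight before the update (printed above)
grad = 0.18721938424305554   # gradient of the loss w.r.t. this weight (printed above)
step_size = 0.01

# gradient descent: move against the gradient
w_new = w_old + (-step_size * grad)
print(w_new)  # approximately 0.8995643679
```

A positive gradient means the loss grows as the weight grows, so the update lowers the weight; a negative gradient would raise it.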


In [None]:
# new predictions in l1
l1 = [n(xi) for xi in xs]

local_loss1 = [(yp - ygt)**2 for ygt, yp in zip(ys, l1)]
new_loss = np.sum(local_loss1)
new_loss
# should be lower than loss0

Value(data=2.201736934259056)

In [141]:
new_loss.backward()

In [142]:
for p in n.parameters():
    p.data += -0.01 * p.grad  # step against the gradient with learning rate 0.01

In [145]:
l2 = [n(xi) for xi in xs]

local_loss2 = [(yp - ygt)**2 for ygt, yp in zip(ys, l2)]
new_loss2 = np.sum(local_loss2)
new_loss2

Value(data=1.706491915486441)

The loss decreased from about 2.20 to about 1.71.

In iteration k:
- forward pass: compute the predictions and the loss l[k]
- call `.backward()` on the loss to compute the gradients
- update the parameters (weights and biases) of n

In iteration k+1:
- compute the new loss l[k+1]
- ...

In [151]:
# re-initialize data and corresponding variables

xs = [
  [2.0, 3.0, -1.0],
  [3.0, -1.0, 0.5],
  [0.5, 1.0, 1.0],
  [1.0, 1.0, -1.0],
]
ys = [1.0, -1.0, -1.0, 1.0] # desired targets

In [152]:
n = MLP(3, [4,4,1])

In [None]:
# let's loop the gradient descent process described above:

max_iters = 40
step_size = 0.05

for k in range(max_iters):

    ypred = [n(xi) for xi in xs]
    local_loss = [(yp - ygt)**2 for ygt, yp in zip(ys, ypred)]
    loss = np.sum(local_loss)

    loss.backward()

    for p in n.parameters():
        p.data += -1*step_size*p.grad
    
    print(k, loss.data)



0 5.668765801705549
1 4.024769648069887
2 2.8191944208169097
3 0.565064630324257
4 0.1324156543778897
5 0.08835582660638998
6 0.1162459694392493
7 0.08318254163853375
8 0.025849185331762822
9 0.0057970361238869235
10 0.0012373622348871517
11 0.00027574813645262343
12 6.502687474950696e-05
13 1.582925719791322e-05
14 3.857808868591064e-06
15 9.27737805626855e-07
16 2.2081758595418366e-07
17 5.262502141380421e-08
18 1.2709835367238213e-08
19 3.139035419610271e-09
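
One thing worth checking in a loop like this: in micrograd, `backward()` adds to `.grad` with `+=` rather than overwriting it, so gradients from earlier iterations remain unless they are reset. Whether that applies here depends on how value_class.py implements `backward()`, which is not shown. The sketch below uses a hypothetical one-parameter model (`Param` and `accumulate_grad` are made-up names) to show the effect and the usual remedy of zeroing the gradient before each backward pass.

```python
class Param:
    """Hypothetical stand-in for a micrograd-style parameter."""
    def __init__(self, data):
        self.data = data
        self.grad = 0.0

def accumulate_grad(p):
    # loss = (p.data - 3)^2, so dloss/dp = 2 * (p.data - 3);
    # the gradient is accumulated with +=, as micrograd's backward() does
    p.grad += 2.0 * (p.data - 3.0)

# without a reset, two backward passes at the same point double the gradient
p = Param(0.0)
accumulate_grad(p)
accumulate_grad(p)
stale = p.grad          # -12.0 instead of the true gradient -6.0

# with a reset before each backward pass, plain gradient descent converges to 3
p = Param(0.0)
for k in range(50):
    p.grad = 0.0        # zero the gradient first
    accumulate_grad(p)
    p.data += -0.1 * p.grad

print(stale, p.data)
```

Under that assumption, the equivalent reset in the loop above would be `for p in n.parameters(): p.grad = 0.0` just before `loss.backward()`, and the loss values would then differ from the ones printed.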


In [154]:
ypred

[Value(data=0.9999997685340459),
 Value(data=-0.999965023119551),
 Value(data=-0.9999562324825685),
 Value(data=0.9999999360073136)]

Very close to the desired values of [1, -1, -1, 1]!