# Notes

Now it's time to do the exact same thing using a modern deep neural network library, `PyTorch`. Micrograd's API is roughly modeled on it.

So, we'll build the same expression and run the same backward pass, this time with the `PyTorch` API, a production-grade package.

# Imports

In [1]:
import torch
import random
from value import Value

# PyTorch Node

In [2]:
x1 = torch.Tensor([2.0]).double()  ; x1.requires_grad = True
x2 = torch.Tensor([0.0]).double()  ; x2.requires_grad = True
w1 = torch.Tensor([-3.0]).double() ; w1.requires_grad = True
w2 = torch.Tensor([1.0]).double()  ; w2.requires_grad = True

b = torch.Tensor([6.8813735870195432]).double() ; b.requires_grad = True

n = x1 * w1 + x2 * w2 + b

o = torch.tanh(n)

print(o.data.item())
o.backward()

print("----")
print('x2', x2.grad.item())
print('w2', w2.grad.item())
print('x1', x1.grad.item())
print('w1', w1.grad.item())

0.7071066904050358
----
x2 0.5000001283844369
w2 0.0
x1 -1.5000003851533106
w1 1.0000002567688737
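
As a sanity check, the gradients printed above can be reproduced by hand with the chain rule. The sketch below uses only plain Python floats; the names `do_dn` and `grads` are chosen here for illustration and are not part of Micrograd or PyTorch. Its results should agree with the PyTorch output above to within about 1e-6.

```python
import math

# Same inputs, weights and bias as the PyTorch cell
x1, x2 = 2.0, 0.0
w1, w2 = -3.0, 1.0
b = 6.8813735870195432

# Forward pass: weighted sum, then tanh
n = x1 * w1 + x2 * w2 + b
o = math.tanh(n)

# Local derivative of tanh: d(tanh(n))/dn = 1 - tanh(n)^2
do_dn = 1.0 - o ** 2

# n is linear in every input, so dn/dx1 = w1, dn/dw1 = x1, and so on.
# The chain rule multiplies each of these by do_dn.
grads = {
    "x1": w1 * do_dn,
    "w1": x1 * do_dn,
    "x2": w2 * do_dn,
    "w2": x2 * do_dn,
}

print(o)
for name, g in grads.items():
    print(name, g)
```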


Micrograd is a scalar-valued engine. PyTorch is based on tensors, which are n-dimensional arrays of scalars. In our case, each tensor holds a single element. `torch.Tensor` creates `float32` values by default, so we call `.double()` to cast them to `float64`, the same precision as Python's own floats.
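
One subtlety of casting after construction: the value has already been rounded to `float32` by the time `.double()` runs. This appears to be why the cell above prints `0.7071066904...` rather than `0.7071067811...` (that is, 1/sqrt(2)). The sketch below compares the two ways of building the bias; the variable names are illustrative, and `torch.tensor(..., dtype=torch.float64)` is shown as one possible alternative, not as what the original notebook does.

```python
import torch

# torch.Tensor(...) builds a float32 tensor; .double() widens the
# already-rounded value to float64.
b32 = torch.Tensor([6.8813735870195432])
b_cast = b32.double()

# torch.tensor(..., dtype=torch.float64) keeps full double precision
# from the start.
b64 = torch.tensor([6.8813735870195432], dtype=torch.float64)

print(b32.dtype, b_cast.dtype, b64.dtype)
print(b_cast.item(), b64.item())

# x1 * w1 + x2 * w2 is exactly -6.0 in the example, so only b differs.
pre = torch.tensor([-6.0], dtype=torch.float64)
o_cast = torch.tanh(pre + b_cast)
o64 = torch.tanh(pre + b64)
print(o_cast.item(), o64.item())
```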

`o` is a tensor object. Like Micrograd's `Value`, a PyTorch tensor has `grad` and `data` attributes. The only difference is that we call `.item()` to pull the single element out of the tensor and get it back as a plain Python number.

Note that PyTorch sets `requires_grad` to `False` by default for efficiency reasons: you typically don't need gradients for leaf nodes such as the inputs to a network. Here we enable it explicitly on every tensor so that we can inspect all of the gradients.
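
A minimal sketch of that default, assuming a made-up one-weight expression (the names `x` and `w` are illustrative): the tensor left at the default gets no gradient after `backward()`, while the one that opted in does.

```python
import torch

# Input: requires_grad is left at its default (False)
x = torch.tensor([2.0], dtype=torch.float64)

# Parameter: we explicitly ask PyTorch to track gradients
w = torch.tensor([-3.0], dtype=torch.float64, requires_grad=True)

o = torch.tanh(x * w)
o.backward()

print(x.requires_grad, x.grad)  # no gradient is stored for x
print(w.requires_grad, w.grad)  # w.grad holds do/dw
```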

PyTorch can do the same as Micrograd when the tensors are single-element tensors.

The big deal with PyTorch is that everything is significantly more efficient because we are working with tensors: a single operation is applied across all of a tensor's elements, and much of that work can run in parallel.
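
To illustrate, here is a hedged sketch that rewrites the same neuron with one input vector and one weight vector instead of four separate scalars. The grouping into `x` and `w` is an assumption made for this example; the values are the ones used in the earlier cell.

```python
import torch

# The two inputs and two weights from before, packed into vectors
x = torch.tensor([2.0, 0.0], dtype=torch.float64, requires_grad=True)
w = torch.tensor([-3.0, 1.0], dtype=torch.float64, requires_grad=True)
b = torch.tensor(6.8813735870195432, dtype=torch.float64, requires_grad=True)

# w @ x is the dot product, replacing x1*w1 + x2*w2
o = torch.tanh(w @ x + b)
o.backward()

print(o.item())
print(x.grad)  # gradients for x1 and x2 in one tensor
print(w.grad)  # gradients for w1 and w2 in one tensor
```

A single `backward()` call fills in the gradients for every element of `x` and `w` at once.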

## Tensor Objects

Normally, you would use more complicated tensors like the following 2-by-3 array of scalars:

In [3]:
torch.Tensor([[1, 2, 3], [4, 5, 6]])

tensor([[1., 2., 3.],
        [4., 5., 6.]])

In [4]:
torch.Tensor([[1, 2, 3], [4, 5, 6]]).shape

torch.Size([2, 3])

Multi-dimensional tensors like this are what you would usually work with in practice.

What our output tensor looks like:

In [5]:
o

tensor([0.7071], dtype=torch.float64, grad_fn=<TanhBackward0>)

In [6]:
o.item() # which is the same here as o.data.item()

0.7071066904050358

In [7]:
print(x2.grad)
print(x2.grad.item())

tensor([0.5000], dtype=torch.float64)
0.5000001283844369


# Building Neural Nets

Now that we have the mathematical machinery, we can build out neural nets. Neural nets are just a specific class of mathematical expression.

## A Single Neuron

Let's create a neuron whose interface follows the PyTorch API.

In [8]:
class Neuron:
    def __init__(self, nin):
        """
        nin: number of inputs
        """
        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)] # one random weight between -1 and 1 for every input
        self.b = Value(random.uniform(-1, 1)) # the bias, which controls the overall trigger happiness of the neuron

    def __call__(self, x):
        """
        What we want to do here is the weighted sum, including the bias: w * x + b

        In other words, the dot product of w and x to get the forward pass of the neuron

        What we need to do here:
            1. Multiply all the elements of w, with all of the elements of x, pairwise
            2. Add the bias to the weighted sum
        """
        act = sum((wi*xi for wi, xi in zip(self.w, x)), self.b)
        out = act.tanh()
        return out

In [9]:
x = [2.0, 3.0]
n = Neuron(2)
n(x)

Value(data=0.9825110257057832)

Now we can forward a single neuron!
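
To make the arithmetic inside `__call__` concrete, here is the same forward pass written with plain floats. The weights and bias below are made-up fixed values (the real `Neuron` draws them at random), and plain floats stand in for `Value` objects, so no gradients are tracked.

```python
import math

# Hypothetical fixed parameters, chosen only for illustration
w = [0.5, -0.25]
b = 0.1
x = [2.0, 3.0]

# sum() accepts a start value as its second argument, so the
# accumulation begins at the bias instead of at the integer 0.
act = sum((wi * xi for wi, xi in zip(w, x)), b)

# Squash the weighted sum into the range (-1, 1)
out = math.tanh(act)

print(act, out)
```

Passing the bias as the start value is also what lets the `Neuron` class sum `Value` objects without first adding a `Value` to a plain `0`.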

## Defining a Layer

Here we will define a layer of neurons. A layer is a set of neurons that are not connected to each other. Instead, every neuron is fully connected to the layer's input and is evaluated independently of the others.

In [10]:
class Layer:
    """
    A list of neurons

    nin: number of inputs to each neuron in the layer
    nout: how many neurons we will have in a layer
    """
    
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]

    def __call__(self, x):
        outs = [n(x) for n in self.neurons]
        return outs

In [12]:
x = [2.0, 3.0]
n = Layer(2, 3) # a layer of three neurons, with each having two inputs
n(x)

[Value(data=0.7833953133734779),
 Value(data=0.945892417147608),
 Value(data=-0.996948563512722)]
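
For comparison, a rough PyTorch counterpart of `Layer(2, 3)` can be sketched with `torch.nn.Linear` followed by `torch.tanh`. This is an analogy rather than an exact equivalent: `torch.nn.Linear` uses its own weight initialization, so the numbers will not match the output above, and the seed is set only to make the run repeatable.

```python
import torch

torch.manual_seed(0)  # repeatable random initialization

# 2 inputs -> 3 neurons; all weights live in one (3, 2) matrix
layer = torch.nn.Linear(2, 3).double()

x = torch.tensor([2.0, 3.0], dtype=torch.float64)

# One matrix-vector product evaluates all three neurons at once
out = torch.tanh(layer(x))

print(layer.weight.shape, layer.bias.shape)
print(out.shape)
print(out)
```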

Let's complete the picture and define a complete MLP (multi-layer perceptron). In an MLP, the layers feed into each other sequentially. The constructor takes the number of inputs (`nin`) and a list `nouts` with the sizes of all the layers in the MLP.

In [19]:
class MLP:

    def __init__(self, nin, nouts):
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i + 1]) for i in range(len(nouts))]

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

In [20]:
x = [2.0, 3.0, -1.0]   # three inputs into the MLP
n = MLP(3, [4, 4, 1])  # 3 layers of size 4, 4, and 1 - the last being the output
n(x)

[Value(data=0.6838416421276302)]
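
To see how `sz` wires the layers together, the sketch below reproduces the constructor's index arithmetic for `MLP(3, [4, 4, 1])` without building any neurons. The names `layer_shapes` and `n_params` are introduced here for illustration only.

```python
nin, nouts = 3, [4, 4, 1]

# Same bookkeeping as MLP.__init__: prepend the input size
sz = [nin] + nouts

# Consecutive pairs give (inputs per neuron, neurons) for each layer
layer_shapes = [(sz[i], sz[i + 1]) for i in range(len(nouts))]

# Every neuron has one weight per input plus one bias
n_params = sum((n_in + 1) * n_out for n_in, n_out in layer_shapes)

print(sz)            # [3, 4, 4, 1]
print(layer_shapes)  # [(3, 4), (4, 4), (4, 1)]
print(n_params)      # 41
```

Each layer's output size becomes the next layer's input size, which is why the network ends in a list holding a single `Value`.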