**INITIALIZATION**
- I put these three lines at the top of each notebook. The first two enable IPython's `autoreload` extension, so edits to imported modules take effect without restarting the kernel. The third renders Matplotlib plots inline within the notebook.

In [1]:
#@ INITIALIZATION: 
%reload_ext autoreload
%autoreload 2
%matplotlib inline

**LIBRARIES AND DEPENDENCIES**
- I install and import all the libraries and dependencies required for the project up front, in the cells below.

In [3]:
#@ INSTALLING DEPENDENCIES: UNCOMMENT BELOW: 
# !pip install -Uqq fastbook
# import fastbook
# fastbook.setup_book()

In [32]:
#@ IMPORTING LIBRARIES AND DEPENDENCIES: 
from fastbook import *                                  # Getting all the Libraries. 
from fastai.callback.fp16 import *
from torch.autograd import Function

**MODELING A NEURON**

In [None]:
#@ MODELING A NEURON FROM SCRATCH: PSEUDOCODE (NOT RUNNABLE AS-IS): 
# output = sum([x*w for x,w in zip(inputs, weights)]) + bias          # Weighted Sum of Inputs plus Bias. 
# def relu(x): return x if x >= 0 else 0                              # Defining RELU Activation Function. 
# y[i, j] = sum([a*b for a,b in zip(x[i,:], w[j,:])]) + b[j]          # Output j for Sample i: Row i of x Dotted with Row j of w. 
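The commented pseudocode above can be turned into a runnable sketch in plain Python, with lists standing in for tensors. The helper names `neuron` and `linear_layer` are hypothetical, chosen for illustration; `w` is laid out as (n_out, n_in) to match the `w[j,:]` indexing in the pseudocode.

```python
def relu(x):
    # ReLU: pass positive values through, replace negatives with zero.
    return x if x >= 0 else 0


def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias, then the ReLU activation.
    output = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(output)


def linear_layer(x, w, b):
    # x: batch of rows (n_samples x n_in); w: one row of weights per output
    # neuron (n_out x n_in); b: one bias per output neuron.
    # y[i][j] = sum_k x[i][k] * w[j][k] + b[j]
    return [[sum(a * c for a, c in zip(x_row, w_row)) + b_j
             for w_row, b_j in zip(w, b)]
            for x_row in x]


print(neuron([1.0, 2.0, 3.0], [0.5, -1.0, 0.25], 0.1))  # weighted sum is about -0.65, so ReLU gives 0
print(linear_layer([[1.0, 2.0]], [[1.0, 1.0], [2.0, -1.0]], [0.0, 0.5]))  # [[3.0, 0.5]]
```

Stacking `linear_layer` and an elementwise `relu` is exactly what the tensor code in the rest of the notebook does, just much faster.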

In [5]:
#@ MATRIX MULTIPLICATION FROM SCRATCH: 
def matmul(a, b):                            # Defining Matrix Multiplication Function. 
    ar, ac = a.shape                         # Inspecting Shape. 
    br, bc = b.shape                         # Inspecting Shape. 
    assert ac == br                          # Asserting Rows and Columns. 
    c = torch.zeros(ar, bc)                  # Initializing Tensor. 
    for i in range(ar):                      # Getting Row Indices. 
        for j in range(bc):                  # Getting Column Indices. 
            for k in range(ac):              # Getting Inner Sum. 
                c[i,j] += a[i,k] * b[k,j]    # Getting Matrix Multiplication. 
    return c

#@ IMPLEMENTATION OF MATRIX MULTIPLICATION: 
m1 = torch.randn(5, 28*28)                   # Initializing Matrix. 
m2 = torch.randn(784, 10)                    # Initializing Matrix. 
%time t1 = matmul(m1, m2)                    # Timing the Scratch Implementation. 
%timeit -n 20 t2 = m1@m2                     # Timing PyTorch's Built-in @ Operator. 

CPU times: user 659 ms, sys: 0 ns, total: 659 ms
Wall time: 731 ms
The slowest run took 142.02 times longer than the fastest. This could mean that an intermediate result is being cached.
20 loops, best of 5: 7.82 µs per loop


In [6]:
#@ ELEMENTWISE ARITHMETIC: 
a = tensor([10., 6, -4])                     # Initializing a Tensor. 
b = tensor([2., 8, 7])                       # Initializing a Tensor. 
a + b                                        # Elementwise Addition. 
(a<b).all(), (a==b).all()                    # Reducing Elementwise Comparisons to Booleans. 
(a + b).sum().item()                         # Reducing to a Python Number (Only This Last Value Is Displayed). 

29.0

In [7]:
#@ MATRIX MULTIPLICATION: ELEMENTWISE ARITHMETIC: 
def matmul(a, b):                            # Defining Matrix Multiplication Function. 
    ar, ac = a.shape                         # Inspecting Shape. 
    br, bc = b.shape                         # Inspecting Shape. 
    assert ac == br                          # Asserting Rows and Columns. 
    c = torch.zeros(ar, bc)                  # Initializing Tensor. 
    for i in range(ar):                      # Getting Row Indices. 
        for j in range(bc):                  # Getting Column Indices. 
            c[i,j] = (a[i] * b[:,j]).sum()   # Dot Product of Row i and Column j. 
    return c

#@ IMPLEMENTATION OF MATRIX MULTIPLICATION: 
m1 = torch.randn(5, 28*28)                   # Initializing Matrix. 
m2 = torch.randn(784, 10)                    # Initializing Matrix. 
%time t1 = matmul(m1, m2)                    # Timing the Elementwise Implementation. 

CPU times: user 1.76 ms, sys: 0 ns, total: 1.76 ms
Wall time: 1.87 ms


In [8]:
#@ BROADCASTING A VECTOR TO A MATRIX: 
c = tensor([10., 20, 30])                    # Initializing a Tensor. 
m = tensor([[1.,2,3], [4,5,6], [7,8,9]])     # Initializing a Tensor. 
m.shape, c.shape                             # Inspecting Tensors. 

(torch.Size([3, 3]), torch.Size([3]))

In [9]:
#@ IMPLEMENTATION OF BROADCASTING: 
c = tensor([10., 20, 30])                    # Initializing a Tensor. 
m = tensor([[1.,2,3], [4,5,6], [7,8,9]])     # Initializing a Tensor. 
c = c.unsqueeze(1)                           # Adding a Unit Dimension. 
m.shape, c.shape                             # Inspecting Tensors. 

(torch.Size([3, 3]), torch.Size([3, 1]))
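Which dimension the vector is broadcast along depends on its shape: a `[3]` tensor is treated like `[1, 3]` and added to every row, while the unsqueezed `[3, 1]` tensor is added to every column. The sketch below, reusing the `c` and `m` values from these cells, also uses `expand_as` to show that broadcasting does not copy data: the expanded view has stride 0 along the repeated dimension.

```python
import torch

c = torch.tensor([10., 20, 30])
m = torch.tensor([[1., 2, 3], [4, 5, 6], [7, 8, 9]])

# Shape [3] behaves like [1, 3]: the vector is added to every row.
row_wise = m + c
# Shape [3, 1]: the vector is added to every column.
col_wise = m + c.unsqueeze(1)

# expand_as returns a view, not a copy; stride 0 means "reuse the same row".
expanded = c.expand_as(m)
print(row_wise)
print(col_wise)
print(expanded.stride(), expanded.shape)
```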

In [10]:
#@ MATRIX MULTIPLICATION: BROADCASTING: 
def matmul(a, b):                            # Defining Matrix Multiplication Function. 
    ar, ac = a.shape                         # Inspecting Shape. 
    br, bc = b.shape                         # Inspecting Shape. 
    assert ac == br                          # Asserting Rows and Columns. 
    c = torch.zeros(ar, bc)                  # Initializing Tensor. 
    for i in range(ar):                      # Getting Row Indices. 
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)   # Scaling Row k of b by a[i,k] and Summing over k. 
    return c

#@ IMPLEMENTATION OF MATRIX MULTIPLICATION: 
m1 = torch.randn(5, 28*28)                   # Initializing Matrix. 
m2 = torch.randn(784, 10)                    # Initializing Matrix. 
%time t1 = matmul(m1, m2)                    # Timing the Broadcasting Implementation. 
%timeit -n 20 t4 = m1@m2                     # Timing PyTorch's Built-in @ Operator. 

CPU times: user 671 µs, sys: 0 ns, total: 671 µs
Wall time: 9.57 ms
The slowest run took 9.73 times longer than the fastest. This could mean that an intermediate result is being cached.
20 loops, best of 5: 5.76 µs per loop
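Speed aside, the three scratch versions should all produce the same result as `@`. A quick consistency check, with the functions renamed `matmul_loops`, `matmul_elementwise` and `matmul_broadcast` (hypothetical names, since the notebook reuses `matmul`) and smaller matrices so the triple loop finishes quickly:

```python
import torch


def matmul_loops(a, b):
    # Triple loop: one scalar multiply-add per iteration.
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):
                c[i, j] += a[i, k] * b[k, j]
    return c


def matmul_elementwise(a, b):
    # Inner loop replaced by an elementwise product and a sum.
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            c[i, j] = (a[i] * b[:, j]).sum()
    return c


def matmul_broadcast(a, b):
    # Column loop replaced by broadcasting a[i] (as a column) against all of b.
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c


torch.manual_seed(0)
m1 = torch.randn(5, 64)
m2 = torch.randn(64, 10)
reference = m1 @ m2
results = {f.__name__: torch.allclose(f(m1, m2), reference, rtol=1e-4, atol=1e-4)
           for f in (matmul_loops, matmul_elementwise, matmul_broadcast)}
print(results)
```

The tolerance allows for float32 rounding: the loops and the optimized kernel behind `@` add the same terms in a different order.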


**EINSTEIN SUMMATION**
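- Einstein summation (`torch.einsum`) is a compact notation for combining products and sums: an index that appears in the inputs but not in the output is summed over, so `"ik,kj->ij"` multiplies along the shared index `k` and is exactly a matrix multiplication. 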

In [11]:
#@ EINSTEIN SUMMATION IMPLEMENTATION: 
def matmul(a, b):                            # Defining Matrix Multiplication Function. 
    return torch.einsum("ik,kj->ij",a,b)     # Implementation of Einstein Summation. 
%timeit -n 20 t5 = m1@m2                     # Note: This Times the Built-in @ Operator, Not the einsum matmul. 

20 loops, best of 5: 7.12 µs per loop
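The `%timeit` line in this cell times `m1@m2`, so it does not measure einsum itself. A minimal check that the einsum formulation agrees with `@`, plus a few other index patterns (the variable names are illustrative):

```python
import torch

torch.manual_seed(0)
a = torch.randn(5, 784)
b = torch.randn(784, 10)

# k appears in both inputs but not in the output, so it is summed over.
prod = torch.einsum("ik,kj->ij", a, b)
# No index is dropped, only reordered: a transpose.
transpose = torch.einsum("ij->ji", a)
# j is dropped from the output: sum over each row.
row_sums = torch.einsum("ij->i", a)
# Same index pattern on both inputs, k dropped: per-row dot products.
dots = torch.einsum("ik,ik->i", a, a)

print(torch.allclose(prod, a @ b, rtol=1e-4, atol=1e-4))
```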


**FORWARD AND BACKWARD PASSES**
- Computing the output of the model for a given input, through its sequence of matrix products and activations, is known as the **Forward Pass**. Computing the gradients of the loss with respect to every parameter, by applying the chain rule from the output back to the input, is known as the **Backward Pass**. 
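For the two-layer network built in the following cells, the chain rule expresses each gradient in terms of the one after it. Writing $N$ for the batch size, $\hat{y}$ for the model output (`out` in the code), $\odot$ for the elementwise product and $\mathbb{1}[\cdot]$ for the indicator function, the backward pass works out as follows; these are the formulas that `mse_grad`, `relu_grad` and `lin_grad` implement.

```latex
\begin{aligned}
l_1 &= x W_1 + b_1, \qquad l_2 = \operatorname{relu}(l_1), \qquad
\hat{y} = l_2 W_2 + b_2, \qquad
\mathcal{L} = \frac{1}{N}\sum_{n=1}^{N} (\hat{y}_n - y_n)^2 \\[4pt]
\frac{\partial \mathcal{L}}{\partial \hat{y}} &= \frac{2}{N}\,(\hat{y} - y) \\
\frac{\partial \mathcal{L}}{\partial W_2} &= l_2^{\top}\, \frac{\partial \mathcal{L}}{\partial \hat{y}}, \qquad
\frac{\partial \mathcal{L}}{\partial b_2} = \sum_{n} \Big(\frac{\partial \mathcal{L}}{\partial \hat{y}}\Big)_{n,:}, \qquad
\frac{\partial \mathcal{L}}{\partial l_2} = \frac{\partial \mathcal{L}}{\partial \hat{y}}\, W_2^{\top} \\
\frac{\partial \mathcal{L}}{\partial l_1} &= \mathbb{1}[l_1 > 0] \odot \frac{\partial \mathcal{L}}{\partial l_2} \\
\frac{\partial \mathcal{L}}{\partial W_1} &= x^{\top}\, \frac{\partial \mathcal{L}}{\partial l_1}, \qquad
\frac{\partial \mathcal{L}}{\partial b_1} = \sum_{n} \Big(\frac{\partial \mathcal{L}}{\partial l_1}\Big)_{n,:}
\end{aligned}
```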

In [12]:
#@ DEFINING AND INITIALIZING LAYER: 
def lin(x, w, b): return x @ w + b                  # Defining Linear Layer. 
x = torch.randn(200, 100)                           # Initializing Random Input Tensors. 
y = torch.randn(200)                                # Initializing Random Output Tensors. 
w1 = torch.randn(100, 50)                           # Initializing Random Weights. 
b1 = torch.zeros(50)                                # Initializing Bias to Zeros. 
w2 = torch.randn(50, 1)                             # Initializing Random Weights. 
b2 = torch.zeros(1)                                 # Initializing Bias to Zeros. 

#@ IMPLEMENTATION OF LINEAR FUNCTION: 
l1 = lin(x, w1, b1)                                 # Implementation of Function. 
l1.shape                                            # Inspecting the Shape. 

torch.Size([200, 50])

In [13]:
#@ INSPECTING MEAN AND STANDARD DEVIATION: 
l1.mean(), l1.std()

(tensor(-0.0569), tensor(10.1162))
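The standard deviation of about 10 is not a coincidence. Each entry of `x @ w1` sums 100 products of independent standard normal values, each product having variance 1, so the entry has variance about 100 and standard deviation about $\sqrt{100} = 10$. A quick sketch (the fan-in values are arbitrary) confirms that the standard deviation tracks $\sqrt{n_{in}}$:

```python
import math
import torch

torch.manual_seed(0)
results = {}
for n_in in (10, 100, 1000):
    x = torch.randn(200, n_in)
    w = torch.randn(n_in, 50)
    # Each output entry is a sum of n_in products of independent N(0, 1)
    # values, so its variance is about n_in and its std about sqrt(n_in).
    results[n_in] = (x @ w).std().item()
    print(n_in, round(results[n_in], 2), round(math.sqrt(n_in), 2))
```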

In [14]:
#@ UNDERSTANDING MATRIX MULTIPLICATIONS: 
x = torch.randn(200, 100)                            # Initializing Random Numbers. 
for i in range(50):
    x = x @ torch.randn(100, 100)                    # Unscaled Random Weights: Activations Explode. 
x[0:5, 0:5]

tensor([[nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan]])

In [15]:
#@ UNDERSTANDING MATRIX MULTIPLICATIONS: 
x = torch.randn(200, 100)                            # Initializing Random Numbers. 
for i in range(50):
    x = x @ (torch.randn(100, 100) * 0.01)           # Weights Scaled by 0.01: Activations Vanish. 
x[0:5, 0:5]                                          # Inspection. 

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

In [16]:
#@ UNDERSTANDING MATRIX MULTIPLICATIONS: 
x = torch.randn(200, 100)                            # Initializing Random Numbers. 
for i in range(50):
    x = x @ (torch.randn(100, 100) * 0.1)            # Weights Scaled by 0.1 = 1/sqrt(100): Activations Stay Stable. 
x[0:5, 0:5]                                          # Inspection. 

tensor([[ 2.5783e-02,  5.1167e-01,  1.0287e-03,  1.4076e-01, -1.0012e+00],
        [ 5.4434e-01,  1.2963e+00,  1.0347e+00,  1.5280e+00, -9.2810e-01],
        [-6.0311e-01, -1.0227e+00, -3.7155e-01, -6.7006e-01,  5.1546e-01],
        [-1.4638e-01, -7.7870e-01, -4.5685e-01, -9.4609e-01,  4.8461e-01],
        [-3.0557e-01, -5.9048e-02, -8.6489e-02, -4.2007e-01, -3.3071e-01]])

In [17]:
#@ IMPLEMENTATION OF XAVIER INITIALIZATION: 
x = torch.randn(200, 100)                           # Initializing Random Input Tensors. 
y = torch.randn(200)                                # Initializing Random Output Tensors. 
w1 = torch.randn(100, 50) / math.sqrt(100)          # Scaling Random Weights by 1/sqrt(n_in). 
b1 = torch.zeros(50)                                # Initializing Bias to Zeros. 
w2 = torch.randn(50, 1) / math.sqrt(50)             # Scaling Random Weights by 1/sqrt(n_in). 
b2 = torch.zeros(1)                                 # Initializing Bias to Zeros. 

#@ IMPLEMENTATION OF LINEAR FUNCTION: 
l1 = lin(x, w1, b1)                                 # Implementation of Function. 
l1.mean(), l1.std()                                 # Inspection. 

(tensor(0.0013), tensor(1.0043))
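The three experiments above (unscaled weights, weights times 0.01, weights times 0.1) differ only in the scale factor, and 0.1 is exactly $1/\sqrt{100}$, the Xavier scale for 100 inputs. A small sweep makes this explicit; `final_std` is a hypothetical helper, and the seed and depth are arbitrary choices:

```python
import math
import torch


def final_std(scale, n=100, depth=50, seed=0):
    # Push a random batch through `depth` linear layers whose n x n weights are
    # standard normal values times `scale`, and return the output's std.
    torch.manual_seed(seed)
    x = torch.randn(200, n)
    for _ in range(depth):
        x = x @ (torch.randn(n, n) * scale)
    return x.std().item()


for scale in (1.0, 0.01, 1 / math.sqrt(100)):
    # 1.0 overflows to inf/nan, 0.01 underflows to 0, 1/sqrt(100) stays at a moderate size.
    print(scale, final_std(scale))
```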

In [18]:
#@ IMPLEMENTATION OF RELU ACTIVATION FUNCTION: 
def relu(x):                                        # Defining Relu Function. 
    return x.clamp_min(0.)                          # Replacing Negatives with Zeros. 
l2 = relu(l1)                                       # Implementation of RELU. 
l2.mean(), l2.std()                                 # Inspection. 

(tensor(0.3996), tensor(0.5879))
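These numbers can be predicted. If $z \sim \mathcal{N}(0, 1)$ with density $\varphi$, ReLU keeps only the positive half of the distribution, which gives the mean and standard deviation below (close to the measured 0.3996 and 0.5879). The second moment of $1/2$ is what motivates Kaiming initialization: for a layer with $n_{in}$ inputs that follows a ReLU, keeping the pre-activation variance at 1 requires $\operatorname{Var}(w) = 2/n_{in}$ instead of $1/n_{in}$.

```latex
\begin{aligned}
\mathbb{E}[\operatorname{relu}(z)] &= \int_0^{\infty} z\,\varphi(z)\,dz = \frac{1}{\sqrt{2\pi}} \approx 0.399 \\
\mathbb{E}[\operatorname{relu}(z)^2] &= \frac{1}{2}\,\mathbb{E}[z^2] = \frac{1}{2} \\
\operatorname{std}[\operatorname{relu}(z)] &= \sqrt{\frac{1}{2} - \frac{1}{2\pi}} \approx 0.584 \\
\operatorname{Var}\Big(\sum_{k=1}^{n_{in}} w_k\,\operatorname{relu}(z_k)\Big)
  &= n_{in}\,\operatorname{Var}(w)\,\mathbb{E}[\operatorname{relu}(z)^2]
   = \frac{n_{in}\,\operatorname{Var}(w)}{2} = 1
  \;\Longrightarrow\; \operatorname{Var}(w) = \frac{2}{n_{in}}
\end{aligned}
```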

In [19]:
#@ MATRIX MULTIPLICATIONS AND RELU: 
x = torch.randn(200, 100)                            # Initializing Random Numbers. 
for i in range(50):
    x = relu(x @ (torch.randn(100, 100) * 0.1))      # Xavier Scaling with RELU: Activations Vanish. 
x[0:5, 0:5]                                          # Inspection. 

tensor([[1.3917e-08, 1.9313e-09, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [1.1168e-08, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [2.0183e-08, 1.0267e-09, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [2.0421e-08, 9.1188e-10, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [1.1986e-08, 1.4588e-09, 0.0000e+00, 0.0000e+00, 0.0000e+00]])

In [20]:
#@ MATRIX MULTIPLICATIONS AND RELU: 
x = torch.randn(200, 100)                            # Initializing Random Numbers. 
for i in range(50):
    x = relu(x @ (torch.randn(100, 100) * 
                  math.sqrt(2/100)))                 # Kaiming Scaling sqrt(2/n_in) with RELU. 
x[0:5, 0:5]                                          # Inspection. 

tensor([[0.4409, 0.0000, 0.0000, 0.0000, 0.0000],
        [1.7408, 0.0000, 0.0000, 0.4863, 0.1915],
        [1.0620, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.7964, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.3552, 0.0000, 0.0000, 0.0000, 0.0000]])

In [21]:
#@ IMPLEMENTATION OF KAIMING INITIALIZATION: 
x = torch.randn(200, 100)                             # Initializing Random Input Tensors.
y = torch.randn(200)                                  # Initializing Random Output Tensors.
w1 = torch.randn(100, 50) * math.sqrt(2/100)          # Scaling Normal Weights by sqrt(2/n_in). 
b1 = torch.zeros(50)                                  # Initializing Bias. 
w2 = torch.randn(50, 1) * math.sqrt(2/50)             # Scaling Normal Weights by sqrt(2/n_in). 
b2 = torch.zeros(1)                                   # Initializing Bias. 

#@ IMPLEMENTATION OF LINEAR LAYER AND RELU: 
l1 = lin(x, w1, b1)                                   # Implementation of Linear Function. 
l2 = relu(l1)                                         # Implementation of RELU. 
l2.mean(), l2.std()                                   # Inspection. 

(tensor(0.5623), tensor(0.8257))
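These values fit the ReLU statistics of a standard normal scaled up: with Kaiming weights, `l1` has variance about 2, so the mean and standard deviation of `relu(l1)` are about $\sqrt{2}$ times 0.399 and 0.584, roughly 0.564 and 0.826. PyTorch ships this scheme as `torch.nn.init.kaiming_normal_`. One detail to watch is that `nn.Linear` stores weights as `(n_out, n_in)`, and with `mode="fan_in"` the function reads the fan-in from dimension 1. A sketch that matches this notebook's `x @ w1` layout therefore allocates `(n_out, n_in)` and transposes:

```python
import math
import torch
from torch import nn

torch.manual_seed(0)
n_in, n_out = 100, 50
w = torch.empty(n_out, n_in)
# mode="fan_in" reads the fan-in from dim 1; nonlinearity="relu" uses gain sqrt(2),
# so the target std is sqrt(2) / sqrt(n_in) = sqrt(2 / n_in).
nn.init.kaiming_normal_(w, mode="fan_in", nonlinearity="relu")
w1 = w.t()   # (n_in, n_out), matching the notebook's x @ w1 layout
print(w1.shape, w1.std().item(), math.sqrt(2 / n_in))
```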

In [22]:
#@ DEFINING MODEL: 
def model(x):                                         # Defining Model Function. 
    l1 = lin(x, w1, b1)                               # Implementation of Linear Function. 
    l2 = relu(l1)                                     # Implementation of RELU Function. 
    l3 = lin(l2, w2, b2)                              # Implementation of Linear Function. 
    return l3

#@ IMPLEMENTATION OF MODEL: 
out = model(x)                                        # Running the Forward Pass. 
out.shape                                             # Inspection. 

torch.Size([200, 1])

In [23]:
#@ DEFINING LOSS FUNCTION: 
def mse(output, targ):                                # Defining Loss Function. 
    return (output.squeeze(-1)-targ).pow(2).mean()    # Initializing Mean Squared Error. 
loss = mse(out, y)                                    # Computing the Loss. 

In [24]:
#@ GRADIENT OF LOSS FUNCTION: 
def mse_grad(inp, targ):                              # Defining Gradient Function. 
    inp.g = 2. * (inp.squeeze() - targ
                  ).unsqueeze(-1) / inp.shape[0]      # Calculating Gradients of Loss. 

#@ GRADIENT OF RELU ACTIVATION FUNCTION: 
def relu_grad(inp, out):                              # Defining Gradient Function. 
    inp.g = (inp>0).float() * out.g                   # Calculating Gradients. 

In [25]:
#@ GRADIENT OF MATRIX MULTIPLICATION: 
def lin_grad(inp, out, w, b):                         # Defining the Function. 
    inp.g = out.g @ w.t()                             # Getting the Gradients of Input. 
    w.g = inp.t() @ out.g                             # Getting the Gradients of Weight. 
    b.g = out.g.sum(0)                                # Getting the Gradients of Bias. 

#@ FORWARD AND BACKWARD PROPAGATION FUNCTION: 
def forward_and_backward(inp, targ):                  # Defining the Function. 
    l1 = inp @ w1 + b1                                # First Linear Layer. 
    l2 = relu(l1)                                     # Implementation of RELU. 
    out = l2 @ w2 + b2                                # Second Linear Layer. 
    loss = mse(out, targ)                             # Computing the Loss. 

    mse_grad(out, targ)                               # Calculating Gradients of Loss. 
    lin_grad(l2, out, w2, b2)                         # Getting Gradients. 
    relu_grad(l1, l2)                                 # Getting Gradients. 
    lin_grad(inp, l1, w1, b1)                         # Getting Gradients. 
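Hand-written gradients are easy to get subtly wrong, so it is worth checking them against autograd. The sketch below re-declares the notebook's functions so it runs on its own, calls `forward_and_backward`, and compares every `.g` with the `.grad` PyTorch computes for the same forward pass. The seed and the autograd comparison are additions for illustration, not part of the original notebook.

```python
import math
import torch


def relu(x): return x.clamp_min(0.)
def mse(output, targ): return (output.squeeze(-1) - targ).pow(2).mean()


def mse_grad(inp, targ):
    inp.g = 2. * (inp.squeeze() - targ).unsqueeze(-1) / inp.shape[0]


def relu_grad(inp, out):
    inp.g = (inp > 0).float() * out.g


def lin_grad(inp, out, w, b):
    inp.g = out.g @ w.t()
    w.g = inp.t() @ out.g
    b.g = out.g.sum(0)


def forward_and_backward(inp, targ):
    l1 = inp @ w1 + b1
    l2 = relu(l1)
    out = l2 @ w2 + b2
    loss = mse(out, targ)
    mse_grad(out, targ)
    lin_grad(l2, out, w2, b2)
    relu_grad(l1, l2)
    lin_grad(inp, l1, w1, b1)


torch.manual_seed(0)
x, y = torch.randn(200, 100), torch.randn(200)
w1, b1 = torch.randn(100, 50) * math.sqrt(2 / 100), torch.zeros(50)
w2, b2 = torch.randn(50, 1) * math.sqrt(2 / 50), torch.zeros(1)

forward_and_backward(x, y)   # fills w1.g, b1.g, w2.g, b2.g by hand

# Reference: autograd on copies of the same parameters and the same forward pass.
params = [p.clone().requires_grad_() for p in (w1, b1, w2, b2)]
mse(relu(x @ params[0] + params[1]) @ params[2] + params[3], y).backward()
matches = [torch.allclose(p.g, q.grad, atol=1e-5) for p, q in zip((w1, b1, w2, b2), params)]
print(matches)
```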

**REFACTORING THE MODEL**

In [26]:
#@ INITIALIZING RELU CLASS: 
class Relu():                                         # Defining RELU Class. 
    def __call__(self, inp):                          # Initializing Callable Function. 
        self.inp = inp                                # Storing Input for Backward. 
        self.out = inp.clamp_min(0.)                  # Computing and Storing Output. 
        return self.out
    
    def backward(self):                               # Backward Propagation Function. 
        self.inp.g = (self.inp>0).float()*self.out.g  # Getting Gradients of Inputs. 

In [27]:
#@ INITIALIZING LINEAR CLASS: 
class Lin():                                          # Defining Linear Class. 
    def __init__(self, w, b):                         # Initializing Constructor Function. 
        self.w, self.b = w, b                         # Initialization. 
    
    def __call__(self, inp):                          # Initializing Callable Function. 
        self.inp = inp                                # Initializing Inputs. 
        self.out = inp @ self.w + self.b              # Initializing Outputs. 
        return self.out
    
    def backward(self):                               # Backward Propagation Function. 
        self.inp.g = self.out.g @ self.w.t()          # Calculating Gradients of Inputs. 
        self.w.g = self.inp.t() @ self.out.g          # Calculating Gradients of Weights.
        self.b.g = self.out.g.sum(0)                  # Calculating Gradients of Bias.  

In [28]:
#@ INITIALIZING MEAN SQUARED ERROR: 
class Mse():                                          # Defining MSE Class. 
    def __call__(self, inp, targ):                    # Initializing Callable Function. 
        self.inp = inp                                # Initializing Inputs. 
        self.targ = targ                              # Initializing Targets. 
        self.out = (inp.squeeze()-targ).pow(2).mean() # Initializing Outputs. 
        return self.out 
    
    def backward(self):                               # Backward Propagation Function. 
        x = (self.inp.squeeze()-self.targ
             ).unsqueeze(-1)                          # Residuals Reshaped to (N, 1). 
        self.inp.g = 2.*x / self.targ.shape[0]        # Calculating Gradients of Inputs. 

In [29]:
#@ DEFINING THE MODEL ARCHITECTURE: 
class Model():                                              # Defining Model Class. 
    def __init__(self, w1, b1, w2, b2):                     # Initializing Constructor Function. 
        self.layers = [Lin(w1,b1), Relu(), Lin(w2,b2)]      # Initializing Layers of Network. 
        self.loss = Mse()                                   # Initializing Mean Squared Error Loss. 
    
    def __call__(self, x, targ):                            # Initializing Callable Function. 
        for l in self.layers: x = l(x)                      # Implementation of Layers of Network. 
        return self.loss(x, targ)                           # Implementation of Loss Function. 

    def backward(self):                                     # Initializing Back Propagation Function. 
        self.loss.backward()                                # Calculating Gradients of Loss. 
        for l in reversed(self.layers): l.backward()        # Calculating Gradients. 

#@ IMPLEMENTATION OF MODEL: 
model = Model(w1, b1, w2, b2)                               # Initializing the Model. 
loss = model(x, y)                                          # Forward Propagation Function.
model.backward()                                            # Back Propagation Function. 
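The refactored `Model` exposes exactly what plain gradient descent needs: after `model.backward()`, every weight and bias carries its gradient in `.g`. A minimal training sketch, assuming a learning rate of 0.01 and 100 steps (both arbitrary choices) and re-declaring the classes so it runs on its own:

```python
import math
import torch


class Relu():
    def __call__(self, inp):
        self.inp = inp
        self.out = inp.clamp_min(0.)
        return self.out

    def backward(self):
        self.inp.g = (self.inp > 0).float() * self.out.g


class Lin():
    def __init__(self, w, b):
        self.w, self.b = w, b

    def __call__(self, inp):
        self.inp = inp
        self.out = inp @ self.w + self.b
        return self.out

    def backward(self):
        self.inp.g = self.out.g @ self.w.t()
        self.w.g = self.inp.t() @ self.out.g
        self.b.g = self.out.g.sum(0)


class Mse():
    def __call__(self, inp, targ):
        self.inp, self.targ = inp, targ
        self.out = (inp.squeeze() - targ).pow(2).mean()
        return self.out

    def backward(self):
        x = (self.inp.squeeze() - self.targ).unsqueeze(-1)
        self.inp.g = 2. * x / self.targ.shape[0]


class Model():
    def __init__(self, w1, b1, w2, b2):
        self.layers = [Lin(w1, b1), Relu(), Lin(w2, b2)]
        self.loss = Mse()

    def __call__(self, x, targ):
        for l in self.layers: x = l(x)
        return self.loss(x, targ)

    def backward(self):
        self.loss.backward()
        for l in reversed(self.layers): l.backward()


torch.manual_seed(0)
x, y = torch.randn(200, 100), torch.randn(200)
w1, b1 = torch.randn(100, 50) * math.sqrt(2 / 100), torch.zeros(50)
w2, b2 = torch.randn(50, 1) * math.sqrt(2 / 50), torch.zeros(1)

model = Model(w1, b1, w2, b2)
lr = 0.01
losses = []
for step in range(100):
    losses.append(model(x, y).item())   # forward pass
    model.backward()                    # fills .g on every weight and bias
    for p in (w1, b1, w2, b2):
        p -= lr * p.g                   # in-place, so the Lin layers see the update
print(losses[0], losses[-1])
```

Updating the tensors in place matters here: each `Lin` holds a reference to the same `w` and `b` objects, so rebinding the names instead would leave the model unchanged.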

**PYTORCH IMPLEMENTATION**

In [30]:
#@ DEFINING BASE CLASS FUNCTION: 
class LayerFunction():                                      # Defining Layer Function Class. 
    def __call__(self, *args):                              # Initializing Callable Function. 
        self.args = args                                    # Storing Inputs for Backward. 
        self.out = self.forward(*args)                      # Computing and Storing Output. 
        return self.out
    
    def forward(self): raise Exception("not implemented")   # Forward Propagation Function. 
    def bwd(self): raise Exception("not implemented")       # Gradient Computation, Overridden by Subclasses. 
    def backward(self): self.bwd(self.out, *self.args)      # Backward Propagation Function. 

#@ DEFINING SUBCLASS: RELU FUNCTION: 
class Relu(LayerFunction):                                  # Defining RELU. 
    def forward(self, inp): return inp.clamp_min(0.)        # Forward Propagation Function. 
    def bwd(self, out, inp): inp.g = (inp>0).float()*out.g  # Back Propagation Function. 

In [31]:
#@ DEFINING SUBCLASS: LINEAR LAYER FUNCTION:
class Lin(LayerFunction):                                   # Defining Linear Layer Function. 
    def __init__(self, w, b):                               # Initializing Constructor Function. 
        self.w, self.b = w, b                               # Initialization. 
    
    def forward(self, inp):                                 # Forward Propagation Function. 
        return inp @ self.w + self.b                        # Initializing Linear Function. 

    def bwd(self, out, inp):                                # Initializing Back Propagation. 
        inp.g = out.g @ self.w.t()                          # Calculating the Gradients. 
        self.w.g = inp.t() @ out.g                          # Calculating the Gradients (LayerFunction Sets No self.inp). 
        self.b.g = out.g.sum(0)                             # Calculating the Gradients. 

#@ DEFINING SUBCLASS: MEAN SQUARED ERROR FUNCTION: 
class Mse(LayerFunction):                                   # Defining MSE Function. 
    def forward(self, inp, targ):                           # Forward Propagation Function. 
        return (inp.squeeze() - targ).pow(2).mean()         # Calculating Mean Squared Error. 
    
    def bwd(self, out, inp, targ):
        inp.g = 2*(inp.squeeze() - targ
                   ).unsqueeze(-1) / targ.shape[0]          # Gradient of MSE with Respect to Inputs. 
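`Lin.bwd` has to use its `inp` and `out` arguments, because `LayerFunction` stores the inputs in `self.args` and never sets `self.inp`. A self-contained sketch that chains the three `LayerFunction` subclasses by hand and compares the resulting gradients with autograd (the seed and the comparison are additions for illustration):

```python
import math
import torch


class LayerFunction():
    def __call__(self, *args):
        self.args = args                 # keep the inputs for the backward pass
        self.out = self.forward(*args)   # keep the output for the backward pass
        return self.out

    def forward(self): raise Exception("not implemented")
    def bwd(self): raise Exception("not implemented")
    def backward(self): self.bwd(self.out, *self.args)


class Relu(LayerFunction):
    def forward(self, inp): return inp.clamp_min(0.)
    def bwd(self, out, inp): inp.g = (inp > 0).float() * out.g


class Lin(LayerFunction):
    def __init__(self, w, b): self.w, self.b = w, b
    def forward(self, inp): return inp @ self.w + self.b

    def bwd(self, out, inp):
        inp.g = out.g @ self.w.t()
        self.w.g = inp.t() @ out.g       # uses the arguments, not self.inp
        self.b.g = out.g.sum(0)


class Mse(LayerFunction):
    def forward(self, inp, targ): return (inp.squeeze() - targ).pow(2).mean()

    def bwd(self, out, inp, targ):
        inp.g = 2 * (inp.squeeze() - targ).unsqueeze(-1) / targ.shape[0]


torch.manual_seed(0)
x, y = torch.randn(200, 100), torch.randn(200)
w1, b1 = torch.randn(100, 50) * math.sqrt(2 / 100), torch.zeros(50)
w2, b2 = torch.randn(50, 1) * math.sqrt(2 / 50), torch.zeros(1)

# Forward pass through the layers, then backward in reverse order.
layers, loss_fn = [Lin(w1, b1), Relu(), Lin(w2, b2)], Mse()
h = x
for layer in layers:
    h = layer(h)
loss = loss_fn(h, y)
loss_fn.backward()
for layer in reversed(layers):
    layer.backward()

# Reference gradients from autograd on copies of the same parameters.
params = [p.clone().requires_grad_() for p in (w1, b1, w2, b2)]
pred = torch.relu(x @ params[0] + params[1]) @ params[2] + params[3]
(pred.squeeze() - y).pow(2).mean().backward()
matches = [torch.allclose(p.g, q.grad, atol=1e-5) for p, q in zip((w1, b1, w2, b2), params)]
print(matches)
```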

In [33]:
#@ DEFINING CUSTOM FUNCTION: 
class MyRelu(Function):                                     # Defining RELU Function. 
    @staticmethod
    def forward(ctx, i):                                    # Forward Propagation Function. 
        result = i.clamp_min(0.)
        ctx.save_for_backward(i)
        return result
    
    @staticmethod
    def backward(ctx, grad_output):                         # Back Propagation Function. 
        i, = ctx.saved_tensors
        return grad_output * (i>0).float()
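A custom `Function` is not instantiated; it is called through its `apply` method, and autograd then routes gradients through the `backward` defined above. A minimal usage sketch, re-declaring `MyRelu` so it runs on its own, that checks the gradient by hand and with `torch.autograd.gradcheck`. `gradcheck` expects double-precision inputs, and shifting them away from 0 avoids the kink where finite differences are unreliable.

```python
import torch
from torch.autograd import Function


class MyRelu(Function):
    @staticmethod
    def forward(ctx, i):
        result = i.clamp_min(0.)
        ctx.save_for_backward(i)          # keep the input for the backward pass
        return result

    @staticmethod
    def backward(ctx, grad_output):
        i, = ctx.saved_tensors
        return grad_output * (i > 0).float()


torch.manual_seed(0)
x = torch.randn(8, dtype=torch.double)
x = (x + 0.1 * x.sign()).requires_grad_()   # move inputs away from the kink at 0

y = MyRelu.apply(x)       # custom Functions are called through .apply
y.sum().backward()        # d(sum)/dx is 1 where x > 0 and 0 elsewhere
print(x.grad)

# Compare the analytic backward with finite differences.
ok = torch.autograd.gradcheck(MyRelu.apply, (x.detach().clone().requires_grad_(),))
print(ok)
```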

**NEURAL NETWORK MODULE**

In [34]:
#@ DEFINING LINEAR LAYER FROM SCRATCH: 
class LinearLayer(nn.Module):                               # Defining Linear Layer. 
    def __init__(self, n_in, n_out):                        # Initializing Constructor Function. 
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(n_out, n_in) * math.sqrt(2/n_in))   # Initializing Weight Parameters. 
        self.bias = nn.Parameter(torch.zeros(n_out))        # Initializing Bias Parameters.

    def forward(self, x):                                   # Forward Propagation Function.          
        return x @ self.weight.t() + self.bias              # Initializing Linear Function. 

#@ IMPLEMENTATION OF LINEAR LAYER FUNCTION: 
lin = LinearLayer(10, 2)                                    # Initializing Linear Layer. 
p1, p2 = lin.parameters()                                   # Unpacking Weight and Bias Parameters. 
p1.shape, p2.shape                                          # Inspecting Parameters. 

(torch.Size([2, 10]), torch.Size([2]))
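`LinearLayer` follows the same convention as PyTorch's `nn.Linear`: the weight is stored as `(n_out, n_in)` and the layer computes $x W^{\top} + b$, which is why `forward` transposes. Because `nn.Linear` uses its own default initialization, the sketch below (re-declaring `LinearLayer`) copies the weights across before comparing the two layers, and also compares against `torch.nn.functional.linear`:

```python
import math
import torch
import torch.nn.functional as F
from torch import nn


class LinearLayer(nn.Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_out, n_in) * math.sqrt(2 / n_in))
        self.bias = nn.Parameter(torch.zeros(n_out))

    def forward(self, x):
        return x @ self.weight.t() + self.bias


torch.manual_seed(0)
mine = LinearLayer(10, 2)
ref = nn.Linear(10, 2)
with torch.no_grad():              # copy parameters without recording the ops
    ref.weight.copy_(mine.weight)
    ref.bias.copy_(mine.bias)

x = torch.randn(4, 10)
names = [name for name, _ in mine.named_parameters()]   # registered via nn.Parameter
same_as_linear = torch.allclose(mine(x), ref(x), atol=1e-6)
same_as_functional = torch.allclose(mine(x), F.linear(x, mine.weight, mine.bias), atol=1e-6)
print(names, same_as_linear, same_as_functional)
```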

In [36]:
#@ DEFINING LINEAR MODEL: PYTORCH:
class Model(nn.Module):                                     # Defining Linear Model. 
    def __init__(self, n_in, nh, n_out):                    # Initializing Constructor Function. 
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(n_in, nh),    # Initializing Linear Layer. 
                                    nn.ReLU(),              # Initializing RELU Function. 
                                    nn.Linear(nh, n_out))   # Initializing Linear Layer. 
        self.loss = mse                                     # Initializing MSE Loss Function. 
    
    def forward(self, x, targ):                             # Forward Propagation Function. 
        return self.loss(self.layers(x).squeeze(), targ)    # Calculating Loss. 

#@ DEFINING LINEAR MODEL: FASTAI:
class Model(Module):                                        # Defining Linear Model. 
    def __init__(self, n_in, nh, n_out):                    # No super().__init__() Needed with fastai's Module. 
        self.layers = nn.Sequential(nn.Linear(n_in, nh),    # Initializing Linear Layer. 
                                    nn.ReLU(),              # Initializing RELU Function. 
                                    nn.Linear(nh, n_out))   # Initializing Linear Layer. 
        self.loss = mse                                     # Initializing MSE Loss Function. 
    
    def forward(self, x, targ):                             # Forward Propagation Function. 
        return self.loss(self.layers(x).squeeze(), targ)    # Calculating Loss. 
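Both versions only compute the loss. Because `nn.Module` (and fastai's `Module`, which calls `nn.Module.__init__` for you) registers the parameters of `self.layers` automatically, a standard optimizer can train the model. A minimal sketch using the PyTorch version, with `torch.optim.SGD`, a learning rate of 0.05 and 100 steps as arbitrary choices, and `mse` re-declared from earlier:

```python
import torch
from torch import nn


def mse(output, targ):
    return (output.squeeze(-1) - targ).pow(2).mean()


class Model(nn.Module):
    def __init__(self, n_in, nh, n_out):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(n_in, nh), nn.ReLU(), nn.Linear(nh, n_out))
        self.loss = mse

    def forward(self, x, targ):
        return self.loss(self.layers(x).squeeze(), targ)


torch.manual_seed(0)
x, y = torch.randn(200, 100), torch.randn(200)
model = Model(100, 50, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.05)

losses = []
for step in range(100):
    loss = model(x, y)      # forward pass returns the loss directly
    opt.zero_grad()
    loss.backward()         # autograd fills .grad on all four parameters
    opt.step()
    losses.append(loss.item())
print(losses[0], losses[-1])
```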