**INITIALIZATION**
- I run these three lines at the top of every notebook. The first two enable automatic reloading of imported modules, which prevents stale code when the same project is re-run. The third line renders plots inline in the notebook.

In [1]:
#@ INITIALIZATION: 
%reload_ext autoreload
%autoreload 2
%matplotlib inline

**LIBRARIES AND DEPENDENCIES**
- I import all the libraries and dependencies required for the project in one cell.

In [3]:
#@ INSTALLING DEPENDENCIES: UNCOMMENT BELOW: 
# !pip install -Uqq fastbook
# import fastbook
# fastbook.setup_book()

In [4]:
#@ IMPORTING LIBRARIES AND DEPENDENCIES: 
from fastbook import *                                  # Getting all the Libraries. 
from fastai.callback.fp16 import *

**MODELING NEURON**

In [None]:
#@ MODELING NEURON FROM SCRATCH: UNCOMMENT BELOW: 
# output = sum([x*w for x,w in zip(inputs, weights)]) + bias          # Adding Weighted Inputs and Bias. 
# def relu(x): return x if x >= 0 else 0                              # Defining RELU Activation Function. 
# y[i, j] = sum([a*b for a,b in zip(x[i,:], w[j,:])]) + b[j]          # Initializing Matrix Multiplication. 

In [5]:
#@ MATRIX MULTIPLICATION FROM SCRATCH: 
def matmul(a, b):                            # Defining Matrix Multiplication Function. 
    ar, ac = a.shape                         # Inspecting Shape. 
    br, bc = b.shape                         # Inspecting Shape. 
    assert ac == br                          # Asserting Rows and Columns. 
    c = torch.zeros(ar, bc)                  # Initializing Tensor. 
    for i in range(ar):                      # Getting Row Indices. 
        for j in range(bc):                  # Getting Column Indices. 
            for k in range(ac):              # Getting Inner Sum. 
                c[i,j] += a[i,k] * b[k,j]    # Getting Matrix Multiplication. 
    return c

#@ IMPLEMENTATION OF MATRIX MULTIPLICATION: 
m1 = torch.randn(5, 28*28)                   # Initializing Matrix. 
m2 = torch.randn(784, 10)                    # Initializing Matrix. 
%time t1 = matmul(m1, m2)                    # Implementation of Matrix Multiplication. 
%timeit -n 20 t2 = m1@m2                     # Implementation of Matrix Multiplication. 

CPU times: user 652 ms, sys: 0 ns, total: 652 ms
Wall time: 713 ms
The slowest run took 171.59 times longer than the fastest. This could mean that an intermediate result is being cached.
20 loops, best of 5: 5.61 µs per loop


In [6]:
#@ ELEMENTWISE ARITHMETIC: 
a = tensor([10., 6, -4])                     # Initializing a Tensor. 
b = tensor([2., 8, 7])                       # Initializing a Tensor. 
a + b                                        # Elementwise Addition. 
(a<b).all(), (a==b).all()                    # Combining Elementwise Comparisons. 
(a + b).sum().item()                         # Summing and Converting to a Python Number. 

29.0

In [7]:
#@ MATRIX MULTIPLICATION: SIMPLIFIED: 
def matmul(a, b):                            # Defining Matrix Multiplication Function. 
    ar, ac = a.shape                         # Inspecting Shape. 
    br, bc = b.shape                         # Inspecting Shape. 
    assert ac == br                          # Asserting Rows and Columns. 
    c = torch.zeros(ar, bc)                  # Initializing Tensor. 
    for i in range(ar):                      # Getting Row Indices. 
        for j in range(bc):                  # Getting Column Indices. 
            c[i,j] = (a[i] * b[:,j]).sum()
    return c

#@ IMPLEMENTATION OF MATRIX MULTIPLICATION: 
m1 = torch.randn(5, 28*28)                   # Initializing Matrix. 
m2 = torch.randn(784, 10)                    # Initializing Matrix. 
%time t1 = matmul(m1, m2)                    # Implementation of Matrix Multiplication. 

CPU times: user 1.29 ms, sys: 53 µs, total: 1.34 ms
Wall time: 1.46 ms


In [8]:
#@ BROADCASTING VECTOR TO A MATRIX: 
c = tensor([10., 20, 30])                    # Initializing a Tensor. 
m = tensor([[1.,2,3], [4,5,6], [7,8,9]])     # Initializing a Tensor. 
m.shape, c.shape                             # Inspecting Tensors. 

(torch.Size([3, 3]), torch.Size([3]))

In [9]:
#@ IMPLEMENTATION OF BROADCASTING: 
c = tensor([10., 20, 30])                    # Initializing a Tensor. 
m = tensor([[1.,2,3], [4,5,6], [7,8,9]])     # Initializing a Tensor. 
c = c.unsqueeze(1)                           # Adding a Unit Dimension. 
m.shape, c.shape                             # Inspecting Tensors. 

(torch.Size([3, 3]), torch.Size([3, 1]))
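- A minimal sketch of what the two shapes above mean in practice, using the same `c` and `m`. The names `row_sum`, `col_sum` and `expanded` are illustrative and not part of the notebook.

```python
import torch

c = torch.tensor([10., 20, 30])
m = torch.tensor([[1., 2, 3], [4, 5, 6], [7, 8, 9]])

# Shape [3] is treated as [1, 3], so c is added to every row of m.
row_sum = m + c

# Shape [3, 1] broadcasts along columns, so c[i] is added to every element of row i.
col_sum = m + c.unsqueeze(1)

# expand_as shows the matrix that broadcasting behaves like, without copying data.
expanded = c.unsqueeze(1).expand_as(m)
```

- Adding the unit dimension changes which axis the vector is stretched along, which is what the broadcasting version of `matmul` relies on.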

In [10]:
#@ MATRIX MULTIPLICATION: SIMPLIFIED: 
def matmul(a, b):                            # Defining Matrix Multiplication Function. 
    ar, ac = a.shape                         # Inspecting Shape. 
    br, bc = b.shape                         # Inspecting Shape. 
    assert ac == br                          # Asserting Rows and Columns. 
    c = torch.zeros(ar, bc)                  # Initializing Tensor. 
    for i in range(ar):                      # Getting Row Indices. 
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c

#@ IMPLEMENTATION OF MATRIX MULTIPLICATION: 
m1 = torch.randn(5, 28*28)                   # Initializing Matrix. 
m2 = torch.randn(784, 10)                    # Initializing Matrix. 
%time t1 = matmul(m1, m2)                    # Implementation of Matrix Multiplication. 
%timeit -n 20 t4 = m1@m2                     # Implementation of Matrix Multiplication. 

CPU times: user 2.72 ms, sys: 178 µs, total: 2.9 ms
Wall time: 10.7 ms
20 loops, best of 5: 11.6 µs per loop
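- The timings show speed only, so a quick correctness check may be useful. The sketch below repeats the broadcasting version under the hypothetical name `matmul_broadcast` and compares it with the built-in `@` operator. The tolerance of `1e-3` is an assumption chosen to absorb float32 rounding.

```python
import torch

def matmul_broadcast(a, b):
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        # a[i] has shape [ac]; unsqueeze(-1) makes it [ac, 1],
        # which broadcasts against b of shape [ac, bc].
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c

torch.manual_seed(0)                      # Fixed seed so the check is repeatable.
m1 = torch.randn(5, 28*28)
m2 = torch.randn(784, 10)
t_loop = matmul_broadcast(m1, m2)
t_ref = m1 @ m2
matches = torch.allclose(t_loop, t_ref, atol=1e-3)
```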


**EINSTEIN SUMMATION**
- Einstein Summation is a compact representation for combining products and sums. 

In [11]:
#@ EINSTEIN SUMMATION IMPLEMENTATION: 
def matmul(a, b):                            # Defining Matrix Multiplication Function. 
    return torch.einsum("ik,kj->ij",a,b)     # Implementation of Einstein Summation. 
%timeit -n 20 t5 = m1@m2                     # Timing the Built-in Matrix Multiplication for Reference. 

The slowest run took 4.72 times longer than the fastest. This could mean that an intermediate result is being cached.
20 loops, best of 5: 5.44 µs per loop
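- A short sketch of how the index string is read: an index that appears in the inputs but not after `->` is summed over. The small tensors and the names `prod`, `transposed` and `batch_prod` are illustrative assumptions.

```python
import torch

a = torch.arange(6.).reshape(2, 3)
b = torch.arange(12.).reshape(3, 4)

# k appears in both inputs but not in the output, so it is summed: matrix product.
prod = torch.einsum("ik,kj->ij", a, b)

# No index is dropped; the output only reorders them: transpose.
transposed = torch.einsum("ij->ji", a)

# A leading batch index b is carried through unchanged: batched matrix product.
batch_a = torch.stack([a, a + 1])         # Shape [2, 2, 3].
batch_b = torch.stack([b, b])             # Shape [2, 3, 4].
batch_prod = torch.einsum("bik,bkj->bij", batch_a, batch_b)
```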


**FORWARD AND BACKWARD PASSES**
- Computing the gradients of a given loss with respect to the model's parameters is known as the **Backward Pass**. Similarly, computing the output of the model on a given input, based on the matrix products, is known as the **Forward Pass**.
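- A minimal sketch of the two passes on a single linear layer, using PyTorch autograd rather than hand-written gradients. The tiny shapes and the names `w`, `b`, `out` and `loss` are assumptions for illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 3)                         # Four inputs with three features each.
y = torch.randn(4)                            # Four targets.
w = torch.randn(3, 1, requires_grad=True)     # Parameters tracked by autograd.
b = torch.zeros(1, requires_grad=True)

# Forward pass: compute the output and the loss from the input.
out = x @ w + b
loss = (out.squeeze(-1) - y).pow(2).mean()

# Backward pass: compute the gradient of the loss for every parameter.
loss.backward()
```

- After `loss.backward()`, the gradients are available as `w.grad` and `b.grad`, each with the same shape as its parameter.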

In [12]:
#@ DEFINING AND INITIALIZING LAYER: 
def lin(x, w, b): return x @ w + b                  # Defining Linear Layer. 
x = torch.randn(200, 100)                           # Initializing Random Input Tensors. 
y = torch.randn(200)                                # Initializing Random Output Tensors. 
w1 = torch.randn(100, 50)                           # Initializing Random Weights. 
b1 = torch.zeros(50)                                # Initializing Bias with Zeros. 
w2 = torch.randn(50, 1)                             # Initializing Random Weights. 
b2 = torch.zeros(1)                                 # Initializing Bias with Zeros. 

#@ IMPLEMENTATION OF LINEAR FUNCTION: 
l1 = lin(x, w1, b1)                                 # Implementation of Function. 
l1.shape                                            # Inspecting the Shape. 

torch.Size([200, 50])

In [13]:
#@ INSPECTING MEAN AND STANDARD DEVIATION: 
l1.mean(), l1.std()

(tensor(-0.0569), tensor(10.1162))

In [14]:
#@ UNDERSTANDING MATRIX MULTIPLICATIONS: 
x = torch.randn(200, 100)                            # Initializing Random Numbers. 
for i in range(50):
    x = x @ torch.randn(100, 100)                    # Unscaled Weights: Values Overflow to NaN. 
x[0:5, 0:5]

tensor([[nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan]])

In [15]:
#@ UNDERSTANDING MATRIX MULTIPLICATIONS: 
x = torch.randn(200, 100)                            # Initializing Random Numbers. 
for i in range(50):
    x = x @ (torch.randn(100, 100) * 0.01)           # Weights Scaled Too Small: Values Vanish to Zero. 
x[0:5, 0:5]                                          # Inspection. 

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

In [16]:
#@ UNDERSTANDING MATRIX MULTIPLICATIONS: 
x = torch.randn(200, 100)                            # Initializing Random Numbers. 
for i in range(50):
    x = x @ (torch.randn(100, 100) * 0.1)            # Weights Scaled by 1/sqrt(100): Values Stay in Range. 
x[0:5, 0:5]                                          # Inspection. 

tensor([[ 2.5783e-02,  5.1167e-01,  1.0287e-03,  1.4076e-01, -1.0012e+00],
        [ 5.4434e-01,  1.2963e+00,  1.0347e+00,  1.5280e+00, -9.2810e-01],
        [-6.0311e-01, -1.0227e+00, -3.7155e-01, -6.7006e-01,  5.1546e-01],
        [-1.4638e-01, -7.7870e-01, -4.5685e-01, -9.4609e-01,  4.8461e-01],
        [-3.0557e-01, -5.9048e-02, -8.6489e-02, -4.2007e-01, -3.3071e-01]])

In [17]:
#@ IMPLEMENTATION OF XAVIER INITIALIZATION: 
x = torch.randn(200, 100)                           # Initializing Random Input Tensors. 
y = torch.randn(200)                                # Initializing Random Output Tensors. 
w1 = torch.randn(100, 50) / math.sqrt(100)          # Initializing Random Weights. 
b1 = torch.zeros(50)                                # Initializing Bias with Zeros. 
w2 = torch.randn(50, 1) / math.sqrt(50)             # Initializing Random Weights. 
b2 = torch.zeros(1)                                 # Initializing Bias with Zeros. 

#@ IMPLEMENTATION OF LINEAR FUNCTION: 
l1 = lin(x, w1, b1)                                 # Implementation of Function. 
l1.mean(), l1.std()                                 # Inspection. 

(tensor(0.0013), tensor(1.0043))
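- A sketch of why dividing by `math.sqrt(100)` brings the standard deviation back to about 1. With unit-variance inputs and weights, the output of a linear layer has a standard deviation of roughly the square root of the number of inputs. The layer widths 25, 100 and 400 and the names `stds`, `w_raw` and `w_scaled` are illustrative assumptions.

```python
import math
import torch

torch.manual_seed(0)
stds = {}
for n_in in (25, 100, 400):
    x = torch.randn(1000, n_in)               # Unit-variance inputs.
    w_raw = torch.randn(n_in, 50)             # Unscaled weights.
    w_scaled = w_raw / math.sqrt(n_in)        # Weights scaled by 1/sqrt(n_in).
    # Store (std without scaling, std with scaling) for each input width.
    stds[n_in] = ((x @ w_raw).std().item(), (x @ w_scaled).std().item())
```

- The first value in each pair should be close to `math.sqrt(n_in)` and the second close to 1, which matches the outputs of about 10 and about 1 seen above for 100 inputs.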

In [18]:
#@ IMPLEMENTATION OF RELU ACTIVATION FUNCTION: 
def relu(x):                                        # Defining Relu Function. 
    return x.clamp_min(0.)                          # Replacing Negatives with Zeros. 
l2 = relu(l1)                                       # Implementation of RELU. 
l2.mean(), l2.std()                                 # Inspection. 

(tensor(0.3996), tensor(0.5879))

In [19]:
#@ MATRIX MULTIPLICATIONS AND RELU: 
x = torch.randn(200, 100)                            # Initializing Random Numbers. 
for i in range(50):
    x = relu(x @ (torch.randn(100, 100) * 0.1))      # Initializing Matrix Multiplications. 
x[0:5, 0:5]                                          # Inspection. 

tensor([[1.3917e-08, 1.9313e-09, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [1.1168e-08, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [2.0183e-08, 1.0267e-09, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [2.0421e-08, 9.1188e-10, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [1.1986e-08, 1.4588e-09, 0.0000e+00, 0.0000e+00, 0.0000e+00]])

In [20]:
#@ MATRIX MULTIPLICATIONS AND RELU: 
x = torch.randn(200, 100)                            # Initializing Random Numbers. 
for i in range(50):
    x = relu(x @ (torch.randn(100, 100) * 
                  math.sqrt(2/100)))                 # Initializing Matrix Multiplications. 
x[0:5, 0:5]                                          # Inspection. 

tensor([[0.4409, 0.0000, 0.0000, 0.0000, 0.0000],
        [1.7408, 0.0000, 0.0000, 0.4863, 0.1915],
        [1.0620, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.7964, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.3552, 0.0000, 0.0000, 0.0000, 0.0000]])

In [21]:
#@ IMPLEMENTATION OF KAIMING INITIALIZATION: 
x = torch.randn(200, 100)                             # Initializing Random Input Tensors.
y = torch.randn(200)                                  # Initializing Random Output Tensors.
w1 = torch.randn(100, 50) * math.sqrt(2/100)          # Initializing Weight Tensors. 
b1 = torch.zeros(50)                                  # Initializing Bias. 
w2 = torch.randn(50, 1) * math.sqrt(2/50)             # Initializing Weight Tensors. 
b2 = torch.zeros(1)                                   # Initializing Bias. 

#@ IMPLEMENTATION OF LINEAR LAYER AND RELU: 
l1 = lin(x, w1, b1)                                   # Implementation of Linear Function. 
l2 = relu(l1)                                         # Implementation of RELU. 
l2.mean(), l2.std()                                   # Inspection. 

(tensor(0.5623), tensor(0.8257))
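- A sketch of where the factor of 2 comes from. ReLU zeroes roughly half of a zero-centered activation, which halves the mean of its squares, so doubling the weight variance compensates. The dictionary `second_moment` and the batch size of 2000 are assumptions for illustration.

```python
import math
import torch

torch.manual_seed(0)
n_in = 100
x = torch.randn(2000, n_in)                   # Unit-variance inputs.
second_moment = {}
for name, scale in (("xavier", math.sqrt(1 / n_in)), ("kaiming", math.sqrt(2 / n_in))):
    w = torch.randn(n_in, 50) * scale         # Weights with the given scaling.
    act = (x @ w).clamp_min(0.)               # Linear layer followed by ReLU.
    second_moment[name] = act.pow(2).mean().item()
```

- The mean of squared activations should come out near 0.5 with the `1/n_in` scaling and near 1.0 with the `2/n_in` scaling. That is why the ReLU stack with `math.sqrt(2/100)` keeps its values in range across 50 layers while the one scaled by `0.1` shrinks towards zero.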

In [22]:
#@ DEFINING MODEL: 
def model(x):                                         # Defining Model Function. 
    l1 = lin(x, w1, b1)                               # Implementation of Linear Function. 
    l2 = relu(l1)                                     # Implementation of RELU Function. 
    l3 = lin(l2, w2, b2)                              # Implementation of Linear Function. 
    return l3

#@ IMPLEMENTATION OF MODEL: 
out = model(x)                                        # Initializing the Model. 
out.shape                                             # Inspection. 

torch.Size([200, 1])

In [23]:
#@ DEFINING LOSS FUNCTION: 
def mse(output, targ):                                # Defining Loss Function. 
    return (output.squeeze(-1)-targ).pow(2).mean()    # Initializing Mean Squared Error. 
loss = mse(out, y)                                    # Inspecting Loss. 

In [24]:
#@ GRADIENT OF LOSS FUNCTION: 
def mse_grad(inp, targ):                              # Defining Gradient Function. 
    inp.g = 2. * (inp.squeeze() - targ
                  ).unsqueeze(-1) / inp.shape[0]      # Calculating Gradients of Loss. 

#@ GRADIENT OF RELU ACTIVATION FUNCTION: 
def relu_grad(inp, out):                              # Defining Gradient Function. 
    inp.g = (inp>0).float() * out.g                   # Calculating Gradients. 
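- A sketch that checks the two hand-written gradient functions against PyTorch autograd. The functions are repeated so the block runs on its own. The test tensors `pred`, `pre`, `post` and `upstream` are hypothetical names, not part of the notebook.

```python
import torch

def mse_grad(inp, targ):
    inp.g = 2. * (inp.squeeze() - targ).unsqueeze(-1) / inp.shape[0]

def relu_grad(inp, out):
    inp.g = (inp > 0).float() * out.g

torch.manual_seed(0)

# Checking the gradient of the loss with respect to the model output.
targ = torch.randn(6)
pred = torch.randn(6, 1, requires_grad=True)
loss = (pred.squeeze(-1) - targ).pow(2).mean()
loss.backward()                               # Autograd result lands in pred.grad.
with torch.no_grad():
    mse_grad(pred, targ)                      # Hand-written result lands in pred.g.

# Checking the gradient of RELU given an arbitrary upstream gradient.
pre = torch.randn(6, 4, requires_grad=True)
post = pre.clamp_min(0.)
upstream = torch.randn(6, 4)
(post * upstream).sum().backward()            # Autograd result lands in pre.grad.
post.g = upstream                             # Gradient arriving at the RELU output.
with torch.no_grad():
    relu_grad(pre, post)                      # Hand-written result lands in pre.g.

mse_ok = torch.allclose(pred.g, pred.grad)
relu_ok = torch.allclose(pre.g, pre.grad)
```

- Both `mse_ok` and `relu_ok` should be `True`, which suggests the `.g` attributes hold the same values autograd would compute.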