Question 1: Write the Python code to implement a single neuron.

In [5]:
# import all necessary libraries
from numpy import exp, array, random, dot, tanh
 
# Class to create a neural
# network with single neuron
class NeuralNetwork():
     
    def __init__(self):
         
        # Using seed to make sure it'll 
        # generate same weights in every run
        random.seed(1)
         
        # 3x1 Weight matrix
        self.weight_matrix = 2 * random.random((3, 1)) - 1
    # tanh as activation function
    def tanh(self, x):
        return tanh(x)
 
    # derivative of tanh, written in terms of the
    # activation output y = tanh(x): 1 - y ** 2.
    # Needed to calculate the gradients.
    def tanh_derivative(self, y):
        return 1.0 - y ** 2
 
    # forward propagation
    def forward_propagation(self, inputs):
        return self.tanh(dot(inputs, self.weight_matrix))
     
    # training the neural network.
    def train(self, train_inputs, train_outputs,
                            num_train_iterations):
                                 
        # Number of iterations we want to
        # perform for this set of input.
        for iteration in range(num_train_iterations):
            output = self.forward_propagation(train_inputs)
 
            # Calculate the error in the output.
            error = train_outputs - output
            # Multiply the error by the input and by the
            # gradient of the tanh function to calculate the
            # adjustment that needs to be made to the weights.
            adjustment = dot(train_inputs.T, error *
                             self.tanh_derivative(output))
                              
            # Adjust the weight matrix
            self.weight_matrix += adjustment
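
The gradient of tanh can be written through the activation's own output y = tanh(x) as 1 - y². The following is a minimal sketch of a finite-difference check of that identity; the helper name `tanh_derivative_from_output` and the sample points are assumptions made for illustration, not part of the class above.

```python
import numpy as np

def tanh_derivative_from_output(y):
    # Derivative of tanh expressed through its output y = tanh(x)
    return 1.0 - y ** 2

x = np.linspace(-2.0, 2.0, 9)
y = np.tanh(x)

# Central finite difference as an independent estimate of d tanh(x) / dx
eps = 1e-6
numeric = (np.tanh(x + eps) - np.tanh(x - eps)) / (2 * eps)

analytic = tanh_derivative_from_output(y)
max_gap = np.max(np.abs(numeric - analytic))

# Applying tanh a second time to a value that is already an output
# gives a different (incorrect) gradient
double_applied = 1.0 - np.tanh(y) ** 2
print(max_gap, np.allclose(analytic, double_applied))
```

The check passes only when the derivative receives the activation output itself, which is what the training loop has available after forward propagation.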
 

Question 2: Write the Python code to implement ReLU.

In [6]:
def relu(x):
    # Works for plain Python numbers; naming the argument x
    # avoids shadowing the built-in input()
    return max(0, x)
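
The built-in `max` only handles single numbers. For a whole tensor, ReLU can be applied elementwise; the sketch below assumes PyTorch is available and uses a hypothetical helper name `relu_tensor`.

```python
import torch

def relu_tensor(x):
    # Elementwise max(0, x): negative entries are replaced by zero
    return x.clamp_min(0.)

x = torch.tensor([-2.0, 0.0, 3.5])
out = relu_tensor(x)
print(out)
```

PyTorch also ships this operation as `torch.relu`, which should give the same result.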


Question 3: Write the Python code for a dense layer in terms of matrix multiplication.

In [7]:
import torch

def matmul(a,b):
    ar,ac = a.shape # n_rows * n_cols
    br,bc = b.shape
    assert ac==br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac): c[i,j] += a[i,k] * b[k,j]
    return c
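
Given a matrix multiplication, a dense layer is one product plus a bias. This is a minimal sketch using PyTorch's `@` operator rather than the loop version; the function name `dense` and the shapes are assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F

def dense(x, w, b):
    # x: (batch, n_in), w: (n_in, n_out), b: (n_out,)
    # One matrix multiplication plus a bias gives (batch, n_out)
    return x @ w + b

torch.manual_seed(0)
x = torch.randn(5, 784)
w = torch.randn(784, 10)
b = torch.zeros(10)
out = dense(x, w, b)

# F.linear stores its weights as (n_out, n_in), hence the transpose
reference = F.linear(x, w.t(), b)
print(out.shape)
```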

Question 4: What is the “hidden size” of a layer?

The hidden size of a layer is the number of neurons in it, that is, the number of outputs the layer produces for each input.
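
As an illustration, assuming PyTorch's `nn.Linear` and arbitrary sizes (784 inputs, 50 hidden units, 10 outputs), the hidden size shows up as the output width of the first layer and the input width of the next:

```python
import torch
from torch import nn

# Illustrative sizes, not taken from the text above
n_in, hidden_size, n_out = 784, 50, 10

hidden = nn.Linear(n_in, hidden_size)    # produces hidden_size activations
output = nn.Linear(hidden_size, n_out)   # consumes hidden_size activations

x = torch.randn(5, n_in)
h = hidden(x)
print(h.shape, hidden.weight.shape)
```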

Question 5: What does the t method do in PyTorch?

The t method transposes a tensor. It expects a tensor with at most 2 dimensions and swaps dimensions 0 and 1.

0-D and 1-D tensors are returned as is. When input is a 2-D tensor this is equivalent to transpose(input, 0, 1).
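
A short sketch of this behaviour on a small tensor (the variable names are only for illustration):

```python
import torch

m = torch.arange(6).reshape(2, 3)   # shape (2, 3)
mt = m.t()                          # shape (3, 2)

v = torch.arange(3)                 # 1-D tensor
vt = v.t()                          # returned as is

print(mt.shape, vt.shape)
```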



Question 6: Why is matrix multiplication written in plain Python very slow?

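Python is an interpreted language, so each iteration of the three nested loops, including every tensor indexing operation, pays the interpreter's overhead; for a 5×784 by 784×10 product that is 5 × 10 × 784 = 39,200 iterations. PyTorch didn't write its matrix multiplication in Python, but rather in C++ to make it fast. In general, whenever we do computations on tensors we will need to vectorize them so that we can take advantage of the speed of PyTorch, usually by using two techniques: elementwise arithmetic and broadcasting.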

Question 7: In matmul, why is ac==br?

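In mathematics, matrix multiplication is only defined when the inner dimensions match:

Number of columns of matrix A = number of rows of matrix B

Each entry of the result is the dot product of a row of A (of length ac) with a column of B (of length br), so the two lengths must be equal.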
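
A small sketch of the rule using PyTorch's built-in `@` operator; the shapes are chosen for illustration:

```python
import torch

a = torch.randn(5, 784)
b = torch.randn(784, 10)
c = a @ b                      # inner sizes match: 784 == 784

try:
    a @ a                      # (5, 784) @ (5, 784): 784 != 5
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

print(c.shape, mismatch_raised)
```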

Question 8: In Jupyter Notebook, how do you measure the time taken for a single cell to execute?

In [12]:
import torch
m1 = torch.randn(5,28*28)
m2 = torch.randn(784,10)


In [13]:
%time t1=matmul(m1, m2)

Wall time: 611 ms


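The %time magic command, used above, measures how long a single statement takes to run. Writing %%time on the first line of a cell times the whole cell.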
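
Magic commands only work inside IPython or Jupyter. In a plain Python script, a rough stand-in can be sketched with the standard library's `time.perf_counter`; the helper `timed` below is hypothetical and not part of Jupyter.

```python
import time

def timed(fn, *args):
    # Rough stand-in for %time: wall-clock seconds for one call
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

result, elapsed = timed(sum, range(1_000_000))
print(f"Wall time: {elapsed * 1000:.2f} ms")
```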

Question 9: What is elementwise arithmetic?

All the basic operators (+, -, *, /, >, <, ==) can be applied elementwise. That means if we write a+b for two tensors a and b that have the same shape, we will get a tensor composed of the sums of the corresponding elements of a and b:

In [15]:
from torch import tensor
a = tensor([10., 6, -4])
b = tensor([2., 8, 7])
a + b

tensor([12., 14.,  3.])
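
The other operators behave the same way. A brief sketch with the same two tensors:

```python
from torch import tensor

a = tensor([10., 6, -4])
b = tensor([2., 8, 7])

product = a * b      # elementwise product, not a dot product
greater = a > b      # elementwise comparison gives a Boolean tensor

print(product, greater)
```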

Question 10: Write the PyTorch code to test whether every element of a is greater than the corresponding element of b.

In [16]:
(a>b).all()

tensor(False)

The comparison a>b is applied elementwise and gives a Boolean tensor; .all() then reduces it to a single value that is True only if every element is True.

Question 11: What is a rank-0 tensor? How do you convert it to a plain Python data type?

Reduction operations like all(), sum() and mean() return tensors with only one element and no dimensions, called rank-0 tensors. If you want to convert this to a plain Python Boolean or number, you need to call .item().
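
A short sketch of both steps, reusing the tensors a and b from the earlier questions:

```python
from torch import tensor

a = tensor([10., 6, -4])
b = tensor([2., 8, 7])

total = (a + b).sum()          # rank-0 tensor
all_greater = (a > b).all()    # rank-0 Boolean tensor

total_py = total.item()        # plain Python float
flag_py = all_greater.item()   # plain Python bool

print(total.ndim, total_py, flag_py)
```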

Question 12: How does elementwise arithmetic help us speed up matmul?

With elementwise arithmetic, we can remove one of our three nested loops: we can multiply the tensors that correspond to the i-th row of a and the j-th column of b before summing all the elements, which will speed things up because the inner loop will now be executed by PyTorch at C speed.

To access one column or row, we can simply write a[i,:] or b[:,j]. The : means take everything in that dimension. We could restrict this and take only a slice of that particular dimension by passing a range, like 1:5, instead of just :. In that case, we would take the elements in columns or rows 1 to 4 (the second number is noninclusive).

In [18]:
def matmul(a,b):
    ar,ac = a.shape
    br,bc = b.shape
    assert ac==br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc): c[i,j] = (a[i] * b[:,j]).sum()
    return c

In [19]:
a = torch.randn(5,28*28)
b = torch.randn(784,10)
%time t1=matmul(a, b)

Wall time: 9.32 ms


The execution time dropped drastically, from 611 ms for the three-loop version to 9.32 ms, roughly 65 times faster.
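
Speed only matters if the result is still correct. As a sanity check, the two-loop version can be compared against PyTorch's built-in `@` operator; the seed and the tolerance below are assumptions chosen for illustration.

```python
import torch

def matmul(a, b):
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            # Elementwise product of row i and column j, then a sum
            c[i, j] = (a[i] * b[:, j]).sum()
    return c

torch.manual_seed(0)
a = torch.randn(5, 28 * 28)
b = torch.randn(784, 10)

# Float32 rounding can differ slightly between the two, hence a tolerance
same = torch.allclose(matmul(a, b), a @ b, atol=1e-3)
print(same)
```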

Question 13: What are the broadcasting rules?

Broadcasting is a term introduced by the NumPy library that describes how tensors of different ranks are treated during arithmetic operations. For instance, it's obvious there is no way to add a 3×3 matrix to a 4×5 matrix, but what if we want to add one scalar (which can be represented as a 1×1 tensor) to a matrix? Or a vector of size 3 to a 3×4 matrix? In both cases, we can find a way to make sense of this operation.

Broadcasting gives specific rules to codify when shapes are compatible when trying to do an elementwise operation, and how the tensor of the smaller shape is expanded to match the tensor of the bigger shape. It's essential to master those rules if you want to be able to write code that executes quickly.

When operating on two tensors, PyTorch compares their shapes elementwise. It starts with the trailing dimensions and works its way backward, adding 1 when it meets empty dimensions. Two dimensions are compatible when one of the following is true:

They are equal.
One of them is 1, in which case that dimension is broadcast to make it the same as the other.
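
The rules above can be tried out on small tensors. This is a sketch with illustrative values; note that a vector of size 3 only combines with a 3×4 matrix once it is given shape (3, 1).

```python
import torch

m = torch.ones(3, 4)

# Scalar with a matrix: the scalar is broadcast to every element
s = m + 1

# Shapes (3, 4) and (4,): the trailing dimensions are equal, so the
# vector is treated as (1, 4) and repeated along the rows
row = torch.tensor([1., 2., 3., 4.])
r = m + row

# Shapes (3, 4) and (3,): the trailing dimensions 4 and 3 differ and
# neither is 1, so the shapes are not compatible as written
col = torch.tensor([10., 20., 30.])
try:
    m + col
    incompatible = False
except RuntimeError:
    incompatible = True

# Giving the vector shape (3, 1) makes it compatible with (3, 4)
c = m + col[:, None]

print(s.shape, r.shape, incompatible, c.shape)
```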