<a href="https://colab.research.google.com/github/A-SHIVASAI/A-SHIVASAI/blob/A-SHIVASAI-Deep_learning/pytorch_fundamentals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![picture](https://upload.wikimedia.org/wikipedia/commons/9/96/Pytorch_logo.png)

## Python lists or tuples of numbers are collections of Python objects that are individually allocated in memory.

In [None]:
# first let's define a list and see the size it occupies
a = [1.0, 2.0, 3.0]
print(type(a))
print(a.__sizeof__())

<class 'list'>
64


**Task 1**

In [1]:
# define a list of integers and see the size it occupies
# print exactly like we did in previous cell
## code below
b= [3.5, 2.8, 7.8]
print(type(b))
print(b.__sizeof__())

<class 'list'>
72


## PyTorch tensors and NumPy arrays, on the other hand, are views over (typically) contiguous memory blocks containing unboxed C numeric types rather than Python objects. PyTorch uses 32-bit (4-byte) floats by default, while NumPy defaults to 64-bit (8-byte) floats. This means storing a 1D tensor of 1,000,000 float32 numbers requires exactly 4,000,000 contiguous bytes, plus a small overhead for the metadata (such as dimensions and numeric type).

In [2]:
import numpy as np
a = np.array([1.0, 2.0, 3.0])
print("a: {}, and its dtype: {}".format(a, a.dtype))
print(a.nbytes)

a: [1. 2. 3.], and its dtype: float64
24


In [3]:
import torch as t
a = t.ones(3)
print("a: {}, and its dtype: {}".format(a, a.dtype))
print("size: ", a.element_size()*a.nelement())

a: tensor([1., 1., 1.]), and its dtype: torch.float32
size:  12
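
As a quick check of the 4,000,000-byte figure mentioned above, the sketch below allocates a tensor of 1,000,000 elements and multiplies the element size by the element count. It assumes PyTorch's default dtype has not been changed from `float32`; the variable names are illustrative only.

```python
import torch

# One million elements; torch.zeros uses the default dtype (float32 unless changed).
big = torch.zeros(1_000_000)

bytes_per_element = big.element_size()            # 4 bytes for float32
total_bytes = bytes_per_element * big.nelement()  # element size times element count

print(big.dtype, total_bytes)
print(big.is_contiguous())  # a freshly allocated tensor is stored contiguously
```

The metadata overhead (shape, dtype, strides) is not included in this count.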


In [4]:
# we can create a tensor of zeros and replace its values
my_tensor = t.zeros((3, 2))
my_tensor

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

In [5]:
# replace its zeros with 1., 2., 3., 4., 5., 6.
my_tensor[0, 0] = 1.
my_tensor[0, 1] = 2.
my_tensor[1, 0] = 3.
my_tensor[1, 1] = 4.
my_tensor[2, 0] = 5.
my_tensor[2, 1] = 6.

In [6]:
# print my_tensor
my_tensor

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])

**Task 2**

In [7]:
# create a tensor of ones with size (5,5) and replace each element's values with any value you like
# then print the tensor
## code below
my_tensor2= t.ones((5,5))
my_tensor2

tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])

In [8]:
# we can pass a list of lists to create a tensor
two_D_tensor = t.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
two_D_tensor

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])
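
When a tensor is built from nested lists, the shape follows the nesting and the dtype is inferred from the values. The short sketch below illustrates this with one float list and one integer list; the variable names are made up for this example.

```python
import torch

# Floats in the nested list give a float32 tensor; integers give an int64 tensor.
float_tensor = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
int_tensor = torch.tensor([[4, 1], [5, 3], [2, 1]])

print(float_tensor.shape, float_tensor.dtype)
print(int_tensor.shape, int_tensor.dtype)
```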

**Task 3**

In [10]:
# create a list of lists and pass into t.tensor and print your tensor
## code below
two_D_tensor2 = t.tensor([[5.0, 6.0], [4.0, 7.0], [1.0, 3.0]])
two_D_tensor2

tensor([[5., 6.],
        [4., 7.],
        [1., 3.]])

In [13]:
# creating 3-d tensor
random_tensor = t.randn((3,4,3)) # here 3,4,3 ---> refers to channels, rows and columns
random_tensor

tensor([[[-0.7778,  0.9299, -0.1691],
         [-0.0705,  0.1997,  1.0388],
         [ 0.1231,  0.8448,  1.0463],
         [-0.1381, -0.6701, -1.4659]],

        [[-0.5170,  0.0648, -0.2707],
         [ 0.4349,  0.6208, -0.1516],
         [-0.6418,  1.2714, -1.3335],
         [-0.4179, -0.4850,  0.7496]],

        [[-1.2134, -1.1155,  0.1450],
         [-2.0442, -1.1166,  0.5070],
         [-0.7863, -0.0631, -0.0749],
         [-0.7330, -0.8681,  1.0167]]])

In [14]:
# indexing tensors
print("Printing 4th row last element of first (4,3) matrix: ", random_tensor[0][3][2])
print("Printing 1st row first element of second (4,3) matrix: ", random_tensor[1][0][0])
print("Printing 2nd row 2nd element of second (4,3) matrix: ", random_tensor[1][1][1])
print("Printing 4th row first element of last (4,3) matrix: ", random_tensor[2][3][0])


Printing 4th row last element of first (4,3) matrix:  tensor(-1.4659)
Printing 1st row first element of second (4,3) matrix:  tensor(-0.5170)
Printing 2nd row 2nd element of second (4,3) matrix:  tensor(0.6208)
Printing 4th row first element of last (4,3) matrix:  tensor(-0.7330)
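
The chained indexing used above (`[i][j][k]`) can also be written as a single index `[i, j, k]`. The sketch below uses a tensor filled with the numbers 0 to 35 (instead of random values) so the selected elements are easy to verify by hand; `x` is a name introduced only for this illustration.

```python
import torch

# Shape (3, 4, 3): 3 matrices, each with 4 rows and 3 columns, holding 0..35.
x = torch.arange(36).reshape(3, 4, 3)

# Both forms select the 4th row, last element of the first (4,3) matrix.
chained = x[0][3][2]
single = x[0, 3, 2]

print(chained, single)
print(single.item())   # .item() converts a one-element tensor to a Python number
print(x[1, 0, 0].item(), x[2, 3, 0].item())
```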


**Task 4**

In [None]:
# Print 3rd row last element of first (4,3) matrix
# Print 2nd row first element of second (4,3) matrix
# Print 1st row 2nd element of first (4,3) matrix
# Print 1st row first element of last (4,3) matrix

In [15]:
print("Print 3rd row last element of first (4,3) matrix: ", random_tensor[0][2][2])
print("Print 2nd row first element of second (4,3) matrix: ", random_tensor[1][1][0])
print("Print 1st row 2nd element of first (4,3) matrix: ", random_tensor[0][0][1])
print("Print 1st row first element of last (4,3) matrix: ", random_tensor[2][0][0])

Print 3rd row last element of first (4,3) matrix:  tensor(1.0463)
Print 2nd row first element of second (4,3) matrix:  tensor(0.4349)
Print 1st row 2nd element of first (4,3) matrix:  tensor(0.9299)
Print 1st row first element of last (4,3) matrix:  tensor(-1.2134)


**Task 5**

In [16]:
# create a 3d tensor of channels=5, rows=6, columns=3
## print your tensor and try to access each element with indexing like we did above
## code below
my_3d_tensor = t.randn((5,6,3))
my_3d_tensor

tensor([[[-0.2304,  0.1093, -0.7692],
         [ 0.2112,  2.1351,  0.4098],
         [ 0.0526, -0.3009, -1.0388],
         [-0.2587,  1.0922, -1.8071],
         [ 1.6540, -0.6492,  0.3919],
         [-0.1536, -1.0300, -0.4620]],

        [[ 0.9418,  1.2954,  2.6528],
         [ 0.8947,  0.3401,  1.2587],
         [ 0.1300, -1.5800,  0.0461],
         [ 0.6856,  1.5014, -1.1540],
         [-0.5383, -0.0910,  0.0306],
         [-1.7414, -0.2184, -0.4240]],

        [[-1.0981, -0.1732,  1.1724],
         [ 0.4495, -0.9714, -0.1676],
         [-0.1550,  0.4984, -0.2937],
         [-1.1667, -0.9507,  2.2164],
         [-0.4626, -0.4417, -0.7422],
         [-1.2040, -0.4752,  0.4123]],

        [[-0.8407, -0.9034,  0.2357],
         [ 0.1619, -0.4692, -1.4234],
         [-0.9078, -0.3344, -1.3615],
         [-0.0365,  1.7272, -0.6513],
         [ 0.1957, -0.9921, -1.3553],
         [ 0.7376,  0.9643, -0.4064]],

        [[-0.7267, -1.1101, -0.2916],
         [ 0.4720, -0.9713, -1.1065],
    

In [17]:
# Create a tensor with a fixed value for every element
fixed_tensor = t.full((3, 3), 100)
fixed_tensor

tensor([[100, 100, 100],
        [100, 100, 100],
        [100, 100, 100]])

**Task 6**

In [18]:
# create a tensor of shape of your choice and fill with value 50 like we did above
# print your tensor
fixed_tensor_2 = t.full((4,4), 75)
fixed_tensor_2

tensor([[75, 75, 75, 75],
        [75, 75, 75, 75],
        [75, 75, 75, 75],
        [75, 75, 75, 75]])

In [19]:
# Concatenate two tensors with compatible shapes
concat_tensor = t.cat((fixed_tensor, t.full((3,3), 50)))
concat_tensor

tensor([[100, 100, 100],
        [100, 100, 100],
        [100, 100, 100],
        [ 50,  50,  50],
        [ 50,  50,  50],
        [ 50,  50,  50]])
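
By default `cat` joins tensors along the first dimension (rows), which is why the result above has shape (6, 3). The sketch below shows the `dim` argument explicitly; `top`, `bottom`, `stacked_rows` and `stacked_cols` are names made up for this example.

```python
import torch

top = torch.full((3, 3), 100)
bottom = torch.full((3, 3), 50)

# dim=0 (the default) stacks rows; dim=1 places the tensors side by side.
stacked_rows = torch.cat((top, bottom), dim=0)
stacked_cols = torch.cat((top, bottom), dim=1)

print(stacked_rows.shape)  # all dimensions except dim=0 must match
print(stacked_cols.shape)  # all dimensions except dim=1 must match
```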

**Task 7**

In [20]:
# create two tensors of your favorite shape and concatenate them using t.cat as we did above
# print both tensor before concatenation and also print concatenated tensor
concat_tensor_2 = t.cat((fixed_tensor_2, t.full((4,4), 56)))
concat_tensor_2

tensor([[75, 75, 75, 75],
        [75, 75, 75, 75],
        [75, 75, 75, 75],
        [75, 75, 75, 75],
        [56, 56, 56, 56],
        [56, 56, 56, 56],
        [56, 56, 56, 56],
        [56, 56, 56, 56]])

In [25]:
# Change the shape of a tensor
print(concat_tensor)
reshaped = concat_tensor.reshape(3, 3, 2)
print(reshaped)

tensor([[100, 100, 100],
        [100, 100, 100],
        [100, 100, 100],
        [ 50,  50,  50],
        [ 50,  50,  50],
        [ 50,  50,  50]])
tensor([[[100, 100],
         [100, 100],
         [100, 100]],

        [[100, 100],
         [100,  50],
         [ 50,  50]],

        [[ 50,  50],
         [ 50,  50],
         [ 50,  50]]])
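
Reshaping keeps the elements in the same row-by-row order and only changes how they are grouped, which is why the 100s and 50s above end up mixed in the middle block. A small sketch, using the numbers 0 to 17 so the ordering is visible (names are illustrative):

```python
import torch

x = torch.arange(18).reshape(6, 3)   # 18 elements arranged as 6 rows of 3
y = x.reshape(3, 3, 2)               # same 18 elements, regrouped

# Flattening both gives the same sequence, so the element order is unchanged.
print(torch.equal(x.flatten(), y.flatten()))

# -1 asks PyTorch to infer that dimension from the total number of elements.
z = x.reshape(-1, 9)
print(z.shape)
```

The total number of elements must stay the same; here 6 * 3 = 3 * 3 * 2 = 2 * 9 = 18.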


**Task 8**

In [26]:
# create a tensor of shape (5,4,2) and then reshape it into some other dimension of your choice
tensor_to_reshape = t.randn((5,4,2))
tensor_reshaped = t.reshape(tensor_to_reshape, (10,4))
tensor_reshaped


tensor([[ 1.8738, -0.5317, -1.8175, -0.8872],
        [ 1.0478, -1.2008,  0.1422, -1.0126],
        [-0.2404,  0.2068,  0.0037, -0.9049],
        [-1.7139, -1.5826,  1.5638,  1.6667],
        [ 0.2593,  1.5020,  0.6265,  0.6767],
        [ 0.0904,  0.5417, -0.3495,  0.2505],
        [ 0.7648,  0.1108, -0.7218,  0.1892],
        [-0.1274,  0.0226, -0.0586, -0.7774],
        [-1.4850, -1.1488,  0.2413, -0.0959],
        [-0.5346,  1.4992, -1.7216,  1.4108]])

In [27]:
# creating tensor from numpy array
x = np.array([[1, 2], [3, 4.]])
array_to_tensor = t.from_numpy(x)
array_to_tensor

tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)

In [28]:
# Convert a torch tensor to a numpy array
tensor_to_array = array_to_tensor.numpy()
tensor_to_array

array([[1., 2.],
       [3., 4.]])
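
One detail worth knowing: a tensor created with `from_numpy` shares its memory with the original array, so changing one changes the other. The sketch below demonstrates this with illustrative names (`arr`, `ten`).

```python
import numpy as np
import torch

arr = np.array([[1.0, 2.0], [3.0, 4.0]])
ten = torch.from_numpy(arr)   # no copy is made; both objects view the same memory

arr[0, 0] = 10.0              # modify the NumPy array in place
print(ten)                    # the tensor reflects the change

independent = ten.clone()     # clone() makes a separate copy
arr[0, 0] = -1.0
print(independent[0, 0].item(), ten[0, 0].item())
```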

**Task 9**

In [30]:
# create a 2-D numpy array of your choice and then convert into tensor using from_numpy and convert tensor back to array using .numpy()
y = np.array([[6,7], [8,9]])
array_to_tensor_2 = t.from_numpy(y)
print(array_to_tensor_2)

tensor([[6, 7],
        [8, 9]])


In [31]:
tensor_to_array_2 = array_to_tensor_2.numpy()
print(tensor_to_array_2)

[[6 7]
 [8 9]]


In [32]:
# multiplying tensors
a = t.tensor([[1,2,3], [4,5,6]]) # observe the shape (2,3)
b = t.tensor([[0,1], [4,5], [1,2]]) # observe the shape (3,2)
a.matmul(b)

tensor([[11, 17],
        [26, 41]])
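
Matrix multiplication requires the inner dimensions to agree: a (2,3) tensor times a (3,2) tensor gives a (2,2) result. The sketch below repeats the example above and shows that the `@` operator is equivalent to `matmul`.

```python
import torch

a = torch.tensor([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)
b = torch.tensor([[0, 1], [4, 5], [1, 2]])  # shape (3, 2)

product = a.matmul(b)   # (2, 3) x (3, 2) -> (2, 2)
same = a @ b            # the @ operator performs the same multiplication

print(product)
print(product.shape, torch.equal(product, same))
```

For instance, the top-left entry is 1*0 + 2*4 + 3*1 = 11.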

**Task 10**

In [34]:
# create two tensors like we did above and multiply them using matmul
c = t.tensor([[2,3], [4,5], [8,3]]) # observe the shape (3,2)
d = t.tensor([[0,1,5], [4,5,6]]) # observe the shape (2,3)
print(c)
print(d)

tensor([[2, 3],
        [4, 5],
        [8, 3]])
tensor([[0, 1, 5],
        [4, 5, 6]])


In [35]:
c.matmul(d)

tensor([[12, 17, 28],
        [20, 29, 50],
        [12, 23, 58]])

In [36]:
# check its documentation
? t.linalg.matrix_power

In [37]:
# To compute nth power of a square matrix
sq_matrix = t.ones((2,2))
print("sq matrix: ", sq_matrix)
t.linalg.matrix_power(sq_matrix, 3)

sq matrix:  tensor([[1., 1.],
        [1., 1.]])


tensor([[4., 4.],
        [4., 4.]])
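
The matrix power is repeated matrix multiplication, not an element-wise power. As a sanity check, the sketch below compares `matrix_power` with an explicit `m @ m @ m` on a small made-up matrix.

```python
import torch

m = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # made-up square matrix

by_power = torch.linalg.matrix_power(m, 3)
by_matmul = m @ m @ m        # the same thing written out by hand
elementwise = m ** 3         # NOT a matrix power: cubes each element separately

print(by_power)
print(torch.allclose(by_power, by_matmul))
print(elementwise)
```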

**Tensor Operations and How to do Gradients**

In [38]:
# Create tensors.
x = t.tensor(7.)
w = t.tensor(4., requires_grad=True)
b = t.tensor(3., requires_grad=True)
x, w, b

(tensor(7.), tensor(4., requires_grad=True), tensor(3., requires_grad=True))

In [39]:
# Making the equation by arithmetic operation
y = w * x + b
y

tensor(31., grad_fn=<AddBackward0>)

In [40]:
# Computing derivatives
y.backward()

In [41]:
# Display gradients
print('dy/dx:', x.grad)
print('dy/dw:', w.grad)
print('dy/db:', b.grad)

dy/dx: None
dy/dw: tensor(7.)
dy/db: tensor(1.)


As expected, `dy/dw` has the same value as `x`, i.e., `7`, and `dy/db` has the value `1`. Note that `x.grad` is `None` because `x` doesn't have `requires_grad` set to `True`.

The "grad" in `w.grad` is short for _gradient_, which is another term for derivative. The term _gradient_ is primarily used while dealing with vectors and matrices.
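
To see that autograd is not limited to linear expressions, here is a small sketch with a different, made-up equation, `y = w**2 * x`. By hand, dy/dw = 2 * w * x, so with w = 4 and x = 7 we would expect 56.

```python
import torch

x = torch.tensor(7.0)
w = torch.tensor(4.0, requires_grad=True)

y = w ** 2 * x      # y = w^2 * x
y.backward()        # computes dy/dw and stores it in w.grad

print(w.grad)       # matches the hand-derived value 2 * w * x
print(x.grad)       # None, since x does not require gradients
```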

**Task 11**

In [42]:
# create 3 tensors x,w,b of your choice and then compute y=w*x + b. After defining y, compute gradients like we did in previous steps.
# observe all the steps of tensor operations and gradients. Look into the documentation of tensor if needed.
x = t.tensor(8.)
w = t.tensor(5., requires_grad=True)
b = t.tensor(6., requires_grad=True)
x, w, b

(tensor(8.), tensor(5., requires_grad=True), tensor(6., requires_grad=True))

In [43]:
y = w*x + b
y

tensor(46., grad_fn=<AddBackward0>)

In [44]:
y.backward()

In [45]:
# Display gradients
print('dy/dx:', x.grad)
print('dy/dw:', w.grad)
print('dy/db:', b.grad)

dy/dx: None
dy/dw: tensor(8.)
dy/db: tensor(1.)


**Task 12**

Work through the whole Linear Regression From Scratch section below. Then implement the same thing yourself, taking the area of a house as the input and its price as the output. You can make up some inputs of your own choice, just like we did.

**Linear Regression From Scratch**



In a linear regression model, each target variable is estimated to be a weighted sum of the input variables, offset by some constant known as the bias:

```
percent  = w11 * maths + w12 * physics + w13 * chemistry + b1

```
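
To make the formula concrete, the sketch below evaluates it for a single student. The weights and bias are hypothetical values picked only for illustration; they are not the values the model learns later.

```python
import torch

# Example marks of one student in maths, physics and chemistry.
marks_one = torch.tensor([60.0, 70.0, 75.0])

# Hypothetical weights (w11, w12, w13) and bias (b1), chosen only for illustration.
w = torch.tensor([0.3, 0.3, 0.4])
b = torch.tensor(1.0)

# percent = w11 * maths + w12 * physics + w13 * chemistry + b1
percent_one = torch.dot(w, marks_one) + b
print(percent_one)   # 0.3*60 + 0.3*70 + 0.4*75 + 1, i.e. about 70
```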

![linear-regression-graph](https://i.stack.imgur.com/O5036.png)

In [46]:
# Here I'm taking the marks in maths, physics and chemistry as inputs and creating an array from them
marks = np.array([[60, 70, 75],
                  [70, 75, 80],
                  [70, 72, 77],
                  [90, 91, 95],
                  [95, 95, 99],
                  [50, 60, 65]], dtype='float32')

# target is percentage which a person will get
percent = np.array([68.3, 75.0, 76.3, 92.0, 96.3, 58.3], dtype='float32')

In [47]:
# converting array into tensor
marks = t.from_numpy(marks)
percent = t.from_numpy(percent)

# printing marks and percentage both
print(marks)
print(percent)

tensor([[60., 70., 75.],
        [70., 75., 80.],
        [70., 72., 77.],
        [90., 91., 95.],
        [95., 95., 99.],
        [50., 60., 65.]])
tensor([68.3000, 75.0000, 76.3000, 92.0000, 96.3000, 58.3000])


The weights and bias (w11, w12, w13 and b1) can also be represented as tensors, initialized with random values. The single row of the weight matrix and the single bias element are used to predict the one target variable, `percent`.

Since the input has 3 features and there is 1 output, the weight matrix has shape (1, 3) and the bias has shape (1,).

In [49]:
# creating weights and biases using randn function
marks_wt = t.randn(1, 3, requires_grad=True)
marks_bias = t.randn(1, requires_grad=True)
print(marks_bias)
print(marks_wt)

tensor([0.0031], requires_grad=True)
tensor([[ 0.2833, -0.3065,  0.8854]], requires_grad=True)


In [50]:
# creating the model
#  @ represents matrix multiplication in PyTorch, and the .t method returns the transpose of a tensor.
def marks_model(x):
  return x @ marks_wt.t() + marks_bias

In [51]:
# Generate predictions
percent_pred = marks_model(marks)
print(percent_pred)

tensor([[61.9494],
        [67.6768],
        [65.9402],
        [81.7194],
        [85.4514],
        [53.3276]], grad_fn=<AddBackward0>)


In [52]:
# Compare with percent
print(percent)

tensor([68.3000, 75.0000, 76.3000, 92.0000, 96.3000, 58.3000])


You can see a big difference between our model's predictions and the actual targets because we've initialized our model with random weights and biases. Obviously, we can't expect a randomly initialized model to just work.


## Loss function

How do we define a loss function?


* Calculate the difference between the two tensors (the predictions `percent_pred` and the targets `percent`).
* Square all elements of the difference to remove negative values.
* Calculate the average of the elements in the result.

The final result is a single number known as the **mean squared error** (MSE).
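
The three steps can be traced on a tiny example. The numbers below are made up so that each intermediate result is easy to check by hand.

```python
import torch

preds = torch.tensor([2.0, 4.0, 6.0])     # made-up predictions
targets = torch.tensor([1.0, 6.0, 6.0])   # made-up targets

diff = preds - targets        # step 1: differences  -> [ 1., -2.,  0.]
squared = diff * diff         # step 2: squares      -> [ 1.,  4.,  0.]
mse_value = squared.mean()    # step 3: average      -> (1 + 4 + 0) / 3

print(diff, squared, mse_value)
```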

In [53]:
# t.sum returns the sum of all the elements in a tensor.
# The .numel method of a tensor returns the number of elements in a tensor.
# MSE loss
def mse(t1, t2):
    diff = t1 - t2
    return t.sum(diff * diff) / diff.numel()

In [54]:
# Let's check the loss
percent_loss = mse(percent_pred, percent)
print(percent_loss)

tensor(363.5859, grad_fn=<DivBackward0>)
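
One thing worth checking when using `mse` as defined above: `percent_pred` has shape (6, 1), while `percent` has shape (6,). Under PyTorch's broadcasting rules, subtracting a (6,) tensor from a (6, 1) tensor produces a (6, 6) tensor of all pairwise differences rather than 6 element-wise ones. The sketch below reproduces the effect with made-up 3-element data; reshaping the targets to a column, as in the last line, is one possible way to get an element-wise comparison and is offered as a suggestion, not as part of the original notebook.

```python
import torch

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

pred = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1), like the model output
target = torch.tensor([1.0, 2.0, 3.0])       # shape (3,), like the targets

print((pred - target).shape)                 # broadcast to (3, 3)

pairwise_loss = mse(pred, target)                      # averages 9 pairwise terms
elementwise_loss = mse(pred, target.reshape(-1, 1))    # averages 3 matching terms

print(pairwise_loss, elementwise_loss)
```

Here the predictions equal the targets exactly, yet the broadcast version reports a nonzero loss, while the reshaped version reports zero.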


Here’s how we can interpret the result: the square root of the loss (the root mean squared error) gives a rough measure of how far, on average, each prediction is from its target.




## Compute gradients

With PyTorch, we can automatically compute the gradient (derivative) of the loss w.r.t. the weights and biases because they have `requires_grad` set to `True`. We'll see how this is useful in just a moment.

In [55]:
# Compute gradients
percent_loss.backward()

In [56]:
# check the weights and their gradient
print(marks_wt)
print(marks_wt.grad)

tensor([[ 0.2833, -0.3065,  0.8854]], requires_grad=True)
tensor([[ -863.4450, -1020.3940, -1107.8762]])


## How to adjust weights and biases to reduce the loss?

If a gradient element is **positive**:

* **increasing** the weight element's value slightly will **increase** the loss, moving uphill and away from the minimum.
* **decreasing** the weight element's value slightly will **decrease** the loss, moving downhill towards a local minimum.

![positive-gradient](https://i.imgur.com/WLzJ4xP.png)

If a gradient element is **negative**:


* **increasing** the weight element's value slightly will **decrease** the loss, moving downhill towards a local minimum.
* **decreasing** the weight element's value slightly will **increase** the loss, moving uphill and away from the minimum.

![negative-gradient](https://i.imgur.com/dvG2fxU.png)

In 3-D, local minima and global minima will look like this:
![3-D=gradient](https://blog.paperspace.com/content/images/2018/05/challenges-1.png)
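
The sign rule can be checked on a one-variable toy problem. The sketch below uses a made-up loss, `(w - 3)**2`, whose minimum is at w = 3, and a hypothetical learning rate of 0.1. Starting at w = 5 the gradient is positive, so subtracting a small multiple of it decreases w and lowers the loss.

```python
import torch

w = torch.tensor(5.0, requires_grad=True)
loss = (w - 3) ** 2          # toy loss with its minimum at w = 3
loss.backward()
print(w.grad)                # positive gradient: 2 * (5 - 3) = 4

learning_rate = 0.1          # hypothetical value for illustration
with torch.no_grad():
    w -= learning_rate * w.grad   # gradient is positive, so w decreases

new_loss = (w - 3) ** 2
print(w.item(), loss.item(), new_loss.item())
```

Starting instead at w = 1 would give a negative gradient, and the same update rule would then increase w, again moving towards the minimum.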


In [57]:
# We can subtract from each weight element a small quantity proportional to
# the derivative of the loss w.r.t. that element to reduce the loss slightly.
with t.no_grad():
    marks_wt -= marks_wt.grad * 1e-5
    marks_bias -= marks_bias.grad * 1e-5
    # Reset the gradients to zero: PyTorch accumulates gradients,
    # which may otherwise lead to unexpected results.
    marks_wt.grad.zero_()
    marks_bias.grad.zero_()

Here `1e-5` is the learning rate of the algorithm. We multiply the gradients by this very small number (10^-5 in this case) to ensure that we don't modify the weights by a very large amount. We want to take a small step in the downhill direction of the gradient, not a giant leap.

We use `torch.no_grad` to indicate to PyTorch that it shouldn't track, calculate, or modify gradients while we update the weights and biases.
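
The effect of `torch.no_grad` can be observed directly: results computed inside the block are not connected to the computation graph. A minimal sketch with illustrative names:

```python
import torch

w = torch.tensor(4.0, requires_grad=True)

tracked = w * 2              # recorded in the graph, so gradients can flow back to w
with torch.no_grad():
    untracked = w * 2        # computed without recording anything

print(tracked.requires_grad, tracked.grad_fn is not None)
print(untracked.requires_grad, untracked.grad_fn is None)
```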

In [58]:
# Let's verify that the loss is actually lower
percent_pred = marks_model(marks)
percent_loss = mse(percent_pred, percent)
print(percent_loss)

tensor(338.9765, grad_fn=<DivBackward0>)


Before we proceed, we reset the gradients to zero by invoking the .zero_() method. We need to do this because PyTorch accumulates gradients. Otherwise, the next time we invoke .backward on the loss, the new gradient values are added to the existing gradients, which may lead to unexpected results.
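
The accumulation behaviour is easy to reproduce in isolation. In the sketch below (made-up values), calling `backward` twice without resetting doubles the stored gradient, and `zero_()` clears it.

```python
import torch

x = torch.tensor(7.0)
w = torch.tensor(4.0, requires_grad=True)

(w * x).backward()
first = w.grad.clone()       # gradient after one backward pass: 7

(w * x).backward()           # second pass without resetting
second = w.grad.clone()      # the new gradient was added to the old one: 14

w.grad.zero_()               # reset in place
print(first, second, w.grad)
```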

In [59]:
# You can check if gradients are set to zero or not
print(marks_wt.grad)
print(marks_bias.grad)

tensor([[0., 0., 0.]])
tensor([0.])


## Training the model with these steps:


1) Generate predictions

2) Calculate the loss

3) Compute gradients w.r.t. the weights and biases

4) Adjust the weights and biases by subtracting a small quantity proportional to the gradient

5) Reset the gradients to zero
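
The five steps above can be sketched as a single function. This is only an illustration on made-up data (targets equal to twice the input), with hypothetical names (`inputs`, `targets`, `weight`, `bias`, `train_step`) and a hypothetical learning rate; it is not the notebook's model.

```python
import torch

# Made-up data following y = 2x, used only for illustration.
inputs = torch.tensor([[1.0], [2.0], [3.0]])
targets = torch.tensor([[2.0], [4.0], [6.0]])   # same (3, 1) shape as the predictions

weight = torch.zeros(1, 1, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)

def train_step(learning_rate=0.01):
    preds = inputs @ weight.t() + bias             # 1) generate predictions
    loss = torch.mean((preds - targets) ** 2)      # 2) calculate the loss (MSE)
    loss.backward()                                # 3) compute gradients
    with torch.no_grad():
        weight.sub_(learning_rate * weight.grad)   # 4) adjust weights and bias
        bias.sub_(learning_rate * bias.grad)
        weight.grad.zero_()                        # 5) reset the gradients
        bias.grad.zero_()
    return loss.item()

losses = [train_step() for _ in range(100)]
print(losses[0], losses[-1])
```

The in-place `sub_` is used instead of `-=` because `-=` inside a function would make Python treat `weight` and `bias` as local variables.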

In [60]:
# Generate predictions
percent_pred = marks_model(marks)
print(percent_pred)

tensor([[64.0128],
        [69.9330],
        [68.1325],
        [84.4777],
        [88.3381],
        [55.0918]], grad_fn=<AddBackward0>)


In [61]:
# Calculate the loss
percent_loss = mse(percent_pred, percent)
print(percent_loss)

tensor(338.9765, grad_fn=<DivBackward0>)


In [62]:
# Compute gradients and print gradients of weight and bias
percent_loss.backward()
print(marks_wt.grad)
print(marks_bias.grad)

tensor([[-514.8153, -652.8641, -719.0262]])
tensor([-12.0714])


Let's update the weights and biases using the gradients computed above.

In [63]:
# Adjust weights & reset gradients
with t.no_grad():
    marks_wt -= marks_wt.grad * 1e-5
    marks_bias -= marks_bias.grad * 1e-5
    marks_wt.grad.zero_()
    marks_bias.grad.zero_()

In [64]:
# looking at new weight and bias
print(marks_wt)
print(marks_bias)

tensor([[ 0.2971, -0.2898,  0.9037]], requires_grad=True)
tensor([0.0034], requires_grad=True)


With the new weights and biases, the model should have a lower loss.

In [65]:
# Calculate loss
preds = marks_model(marks)
loss = mse(preds, percent)
print(loss)

tensor(329.0998, grad_fn=<DivBackward0>)


We have achieved a little reduction in the loss merely by adjusting the weights and biases slightly using gradient descent.

To reduce the loss further, we can repeat the process of adjusting the weights and biases using the gradients multiple times. Each iteration is called an epoch. Let's train the model for 150 epochs.

In [66]:
# training for multiple epochs

for i in range(150):
  percent_pred = marks_model(marks)
  percent_loss = mse(percent_pred, percent)
  percent_loss.backward()
  with t.no_grad():
    marks_wt -= marks_wt.grad * 1e-5
    marks_bias -= marks_bias.grad * 1e-5
    marks_wt.grad.zero_()
    marks_bias.grad.zero_()

In [67]:
# calculate loss

percent_pred = marks_model(marks)
percent_loss = mse(percent_pred, percent)
print(percent_loss)

tensor(308.2213, grad_fn=<DivBackward0>)


The loss is now lower than its initial value (about 308.2, down from 363.6). Let's look at the model's predictions and compare them with the targets.

In [68]:
# result is good and we can make it better with more epochs
print(percent_pred)
print(percent)

tensor([[68.3916],
        [73.9884],
        [71.7503],
        [88.6667],
        [92.5842],
        [59.0645]], grad_fn=<AddBackward0>)
tensor([68.3000, 75.0000, 76.3000, 92.0000, 96.3000, 58.3000])

