# **HW 1. PyTorch practice**
### Course: Deep Learning (DSBA and ICEF), 2025, HSE
### Authors: Alexey Boldyrev, ML Teaching Team

Issue Date: 29.01.2025

Deadline: \
(Soft) 23:59 MSK 04.02.2025 \
(Hard) 18:00 MSK 05.02.2025

### About the assignment

The assignment is in two parts. The first part (PyTorch basics) is not assessed, but if you are not familiar with PyTorch or are unsure, we strongly recommend that you start with it. The second part (Practice) has graded and ungraded exercises on working with tensors and building your first neural network.\
The goal of the assignment is to gain practical skills in working with PyTorch.

### Assessment and penalties
Each task has a certain “cost” (indicated in brackets next to the task). The maximum grade for the assignment is 20 points.

You may not submit the assignment after the hard deadline. If a task receives a reduced grade because of errors, the reviewer may give you the opportunity to correct the work under the conditions specified in the response letter.

The assignment is completed independently. “Similar” solutions are considered plagiarism and all students involved (including those who have been copied from) cannot receive more than 0 points for it. If you have found a solution to any of the assignments (or part of it) in an open source, you must provide a link to that source (most likely you will not be the only one who found it, so to exclude suspicion of plagiarism, a link to the source is required).

If you used a generative model (a large language model such as ChatGPT, DeepSeek, the built-in Colab generative AI, or similar) to find a solution, you must provide a link to the service used (browser version, Telegram bot, etc.), the name of the model, the prompt(s) used, and your evaluation of how well the model performed. Without this reflection the home assignment won't be graded.

### Format of submission
Assignments are submitted through the Smart LMS system here https://edu.hse.ru/mod/assign/view.php?id=1544066. You should send a notebook with the completed assignment. Name the notebook itself in the format **HW1-pytorch-NameLastname.ipynb**, where Name and Lastname are your first and last name.

For ease of checking yourself, calculate your maximum grade (based on the set of problems solved) and indicate it below.

Score: **xx**

## 0. PyTorch basics

Inspired by https://nrehiew.github.io/blog/pytorch/

Load necessary libraries:

In [1]:
import torch
import torch.optim as optim
from torch import nn
import torch.nn.functional as F
import pandas as pd
import numpy as np
import random

PyTorch uses its own data representation (i.e. **tensor** objects) for several important reasons that are key to its purpose as an efficient, flexible, and scalable deep learning framework.

PyTorch's custom data representation enables
- High performance on hardware accelerators.
- Support for automatic differentiation.
- Flexibility through dynamic computational graphs.
- Advanced multidimensional data manipulation.
- Multi-device scalability.

These features make PyTorch tensors ideal for deep learning, unlike traditional Python data structures such as lists or arrays.

In the context of machine learning, a tensor is an *n*-dimensional array. OK, but what is a `torch.tensor`? More specifically, what actually happens when the following piece of code is executed: `a = torch.tensor(1.0, requires_grad=True)`? It turns out that PyTorch allocates the data on the heap and returns the pointer to that data as a shared pointer (see more [here](https://discuss.pytorch.org/t/where-does-torch-tensor-create-the-object-stack-or-heap/182753)). To better understand pointers in PyTorch, let's look at some examples.

The following cell creates a tensor with shape `(2, 6)`, two rows and six columns, containing values randomly distributed according to a normal distribution with mean zero and standard deviation one.

In [2]:
features = torch.randn((2, 6))
print(features)

tensor([[ 1.1369, -0.0412,  0.7060,  1.1004, -0.2327,  0.3416],
        [ 0.2624,  0.5259,  0.3270, -0.4666, -0.4540, -0.8650]])


The next cell creates another tensor with the same shape as the features, again containing values from a normal distribution.

In [3]:
weights = torch.randn_like(features)
print(weights)

tensor([[-1.3841, -0.8045, -1.1129,  0.7401, -0.8318,  0.7168],
        [-1.8798,  3.0728,  0.6047,  0.6948,  0.0504,  0.1118]])


PyTorch tensors can be added, multiplied, subtracted, etc., just like Numpy arrays. In general, you'll use PyTorch tensors pretty much the same way you use Numpy arrays.
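As a small illustration of the NumPy-like arithmetic mentioned above, the sketch below uses two hand-written tensors (the values are arbitrary and chosen only so the results are easy to check by hand):

```python
import torch

# Two small 2x2 tensors with hand-picked (illustrative) values
features = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
weights = torch.tensor([[0.5, 0.5], [2.0, 2.0]])

total = features + weights        # element-wise addition
product = features * weights      # element-wise multiplication
matmul = features @ weights.T     # matrix multiplication

print(total)
print(product)
print(matmul)
```

Note that `*` is element-wise, as in NumPy, while `@` performs matrix multiplication.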

In [4]:
a = torch.arange(9).reshape(3, 3) # 3x3 tensor
print(a)

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])


A tensor and a view derived from it (such as its transpose) refer to the same memory area:

In [5]:
b = a.t() # Transpose
b[0, 0] = 123
print(a, '\n\n', b)

tensor([[123,   1,   2],
        [  3,   4,   5],
        [  6,   7,   8]]) 

 tensor([[123,   3,   6],
        [  1,   4,   7],
        [  2,   5,   8]])


Note that `.t()` returns a view rather than a copy: `a` and `b` point to the same underlying data, so any change to that data is visible through both.

Compare with:

In [6]:
a = torch.arange(9).reshape(3, 3) # torch.int64
b = a.to(torch.float16)  # changing the dtype creates a copy
b[0][0] = 42
print(a[0][0].item(), b[0][0].item())
print(a,'\n\n', b)

0 42.0
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]) 

 tensor([[42.,  1.,  2.],
        [ 3.,  4.,  5.],
        [ 6.,  7.,  8.]], dtype=torch.float16)


PyTorch represents `float16` values in memory differently from `int64` values, so in this example it has to copy the data into the new representation.
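One way to check whether two tensors share memory is to compare the addresses returned by `Tensor.data_ptr()`. The following is a minimal sketch of that check for the two cases above (the variable names `view` and `copy` are illustrative):

```python
import torch

a = torch.arange(9).reshape(3, 3)

view = a.t()                  # transpose: same storage, different strides
copy = a.to(torch.float16)    # dtype conversion: new storage

shares_with_view = view.data_ptr() == a.data_ptr()
shares_with_copy = copy.data_ptr() == a.data_ptr()

print(shares_with_view)            # the view starts at the same address
print(shares_with_copy)            # the copy lives elsewhere
print(a.stride(), view.stride())   # only the strides differ for the view
```

The transpose only swaps the strides, which is why it can be created without copying any data.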

### PyTorch broadcasting

If necessary, check the PyTorch [broadcasting semantics](https://pytorch.org/docs/stable/notes/broadcasting.html#general-semantics).


In [7]:
a = torch.tensor([1, 2]).reshape(1, 2) # 1 x 2
b = torch.tensor([[3, 4], [5, 6]]) # 2 x 2
c = torch.zeros((2, 2)) # we know that a + b gives a 2x2 tensor

In [8]:
print(a)
print(b)
print(c)

tensor([[1, 2]])
tensor([[3, 4],
        [5, 6]])
tensor([[0., 0.],
        [0., 0.]])


In [9]:
c[0][0] = a[0][0] + b[0][0]
c[0][1] = a[0][1] + b[0][1]
print(c)

tensor([[4., 6.],
        [0., 0.]])


In [10]:
c[1][0] = a[0][0] + b[1][0]
c[1][1] = a[0][1] + b[1][1]
print(c)

tensor([[4., 6.],
        [6., 8.]])


In [11]:
print(a+b)

tensor([[4, 6],
        [6, 8]])


In [12]:
torch.equal(a + b, c) # true

True
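The manual element-by-element addition above can also be reproduced by making the broadcast explicit. The sketch below uses `torch.broadcast_shapes` to compute the resulting shape and `Tensor.expand` to stretch `a` to that shape before adding:

```python
import torch

a = torch.tensor([1, 2]).reshape(1, 2)   # 1 x 2
b = torch.tensor([[3, 4], [5, 6]])       # 2 x 2

# Shape that a + b will have according to the broadcasting rules
out_shape = torch.broadcast_shapes(a.shape, b.shape)
print(out_shape)

# expand() repeats the single row of `a` without copying memory
a_expanded = a.expand(out_shape)
print(a_expanded)

same = torch.equal(a_expanded + b, a + b)
print(same)
```

Here `expand` returns a view, so the "repeated" row is not actually duplicated in memory.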

#### Matrix Multiplication

In [13]:
a = torch.randn((3, 4, 1, 2)) # 3 x 4 x 1 x 2
b = torch.randn((1, 2, 3)) # 1 x 2 x 3
print(a)
print(b)

tensor([[[[-0.1275, -1.1976]],

         [[ 0.9557, -0.1067]],

         [[-0.3287,  1.1247]],

         [[-0.0341, -0.4208]]],


        [[[ 0.4082, -1.0361]],

         [[ 0.1449,  0.9473]],

         [[-0.0441,  0.0817]],

         [[ 0.7565,  2.0654]]],


        [[[ 0.8078,  1.0138]],

         [[ 0.6545,  0.4114]],

         [[-1.1439, -1.0106]],

         [[-1.4946, -0.3072]]]])
tensor([[[ 0.7409,  0.6773,  1.9137],
         [-1.2731, -1.2322,  0.5689]]])


In [14]:
c = torch.zeros((3, 4, 1, 3))
print(c)

tensor([[[[0., 0., 0.]],

         [[0., 0., 0.]],

         [[0., 0., 0.]],

         [[0., 0., 0.]]],


        [[[0., 0., 0.]],

         [[0., 0., 0.]],

         [[0., 0., 0.]],

         [[0., 0., 0.]]],


        [[[0., 0., 0.]],

         [[0., 0., 0.]],

         [[0., 0., 0.]],

         [[0., 0., 0.]]]])


In [15]:
for i in range(3):
    print(i)
    for j in range(4):
        print(j)
        a_slice = a[i][j]  # shape (1, 2)
        print(a_slice)
        b_slice = b[0]  # shape (2, 3); the same matrix is reused for every (i, j)
        print(b_slice)
        c[i][j] = a_slice @ b_slice  # shape (1, 3)

print(torch.matmul(a, b).shape, c.shape)

0
0
tensor([[-0.1275, -1.1976]])
tensor([[ 0.7409,  0.6773,  1.9137],
        [-1.2731, -1.2322,  0.5689]])
1
tensor([[ 0.9557, -0.1067]])
tensor([[ 0.7409,  0.6773,  1.9137],
        [-1.2731, -1.2322,  0.5689]])
2
tensor([[-0.3287,  1.1247]])
tensor([[ 0.7409,  0.6773,  1.9137],
        [-1.2731, -1.2322,  0.5689]])
3
tensor([[-0.0341, -0.4208]])
tensor([[ 0.7409,  0.6773,  1.9137],
        [-1.2731, -1.2322,  0.5689]])
1
0
tensor([[ 0.4082, -1.0361]])
tensor([[ 0.7409,  0.6773,  1.9137],
        [-1.2731, -1.2322,  0.5689]])
1
tensor([[0.1449, 0.9473]])
tensor([[ 0.7409,  0.6773,  1.9137],
        [-1.2731, -1.2322,  0.5689]])
2
tensor([[-0.0441,  0.0817]])
tensor([[ 0.7409,  0.6773,  1.9137],
        [-1.2731, -1.2322,  0.5689]])
3
tensor([[0.7565, 2.0654]])
tensor([[ 0.7409,  0.6773,  1.9137],
        [-1.2731, -1.2322,  0.5689]])
2
0
tensor([[0.8078, 1.0138]])
tensor([[ 0.7409,  0.6773,  1.9137],
        [-1.2731, -1.2322,  0.5689]])
1
tensor([[0.6545, 0.4114]])
tensor([[ 0.7409,
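The loop above only compares shapes. As a sketch of a stronger check, the block below repeats the computation without the debug prints and verifies that the values produced by the loop agree with the broadcasted `torch.matmul` (the seed and the tolerance `atol=1e-6` are arbitrary choices for this illustration):

```python
import torch

torch.manual_seed(0)                 # arbitrary seed, for reproducibility only
a = torch.randn((3, 4, 1, 2))        # batch dims (3, 4), matrices of shape 1 x 2
b = torch.randn((1, 2, 3))           # batch dim (1,), matrix of shape 2 x 3

# Manual version: multiply every 1 x 2 slice of `a` by the single 2 x 3 matrix in `b`
c = torch.zeros((3, 4, 1, 3))
for i in range(3):
    for j in range(4):
        c[i][j] = a[i][j] @ b[0]

# Broadcasted version: batch dims (3, 4) and (1,) broadcast to (3, 4)
d = torch.matmul(a, b)

print(d.shape)
print(torch.allclose(c, d, atol=1e-6))
```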

### Backpropagation

The core of PyTorch is its automatic differentiation engine (autograd). In general terms, each time a differentiable operation is applied to tensors that require gradients, PyTorch records it and so builds the computational graph on the fly. When `.backward()` is called, the gradients are computed and accumulated in the `.grad` attribute of each leaf tensor. This is PyTorch's biggest abstraction: often `.backward()` is called and we just hope that the gradients flow properly. In this section I will try to build some intuition for visualising gradient flows.

In [16]:
x = torch.tensor([2.0], requires_grad=True)
print(x)

# Define a function y = x^2
y = x**2
print(y)
# Compute the gradient (dy/dx)
y.backward()
z = y.backward
print(z)

# x.grad will hold the derivative of y with respect to x
print(x.grad)  # Output: tensor([4.])

tensor([2.], requires_grad=True)
tensor([4.], grad_fn=<PowBackward0>)
<bound method Tensor.backward of tensor([4.], grad_fn=<PowBackward0>)>
tensor([4.])


A common problem in trying to understand backpropagation is that most people understand derivatives and the chain rule for scalars, but how this translates to higher-dimensional tensors is not particularly obvious. A helpful way to think about it is locally: every scalar element of a tensor has its own scalar gradient, stored at the same position in a gradient tensor of the same shape.

Thinking about gradients from this local scalar perspective has the added benefit of making the effect of tensor operations on gradients intuitive. For example, operations such as `.reshape()`, `.transpose()`, `.cat()` and `.split()` only move values around; they do not change any single value or its gradient. It follows naturally that the effect of these operations on the gradient tensor is the operation itself. For example, flattening a tensor with `.reshape(-1)` has the same effect on its gradient as calling `.reshape(-1)` on the gradient.
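The claim about `.reshape()` can be checked on a toy example. In the sketch below (the tensor values and the weight vector `w` are made up for illustration), each element of the flattened tensor is multiplied by its own weight, so the gradient of the sum with respect to the flattened tensor is `w`, and the gradient with respect to the original tensor is `w` reshaped back:

```python
import torch

x = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]], requires_grad=True)  # shape (2, 3)
w = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])                          # shape (6,)

# Flatten, weight each element, and reduce to a scalar
loss = (x.reshape(-1) * w).sum()
loss.backward()

# The gradient has the shape of x; element-wise it equals w in the original layout
print(x.grad)
print(torch.equal(x.grad, w.reshape(2, 3)))
```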

In [17]:
z = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)  # Shape: (2, 2)

# Define a vector-valued function: f(z) = z^3 + 2z
f = z**3 + 2 * z  # Element-wise
print(f)

# Compute gradients by summing over all outputs (scalarization)
# The .backward() method requires scalar output, hence `sum()`
f.sum().backward()


# Gradients of f w.r.t the input z
print(z.grad)  # Contains derivatives for each element in z

tensor([[ 3., 12.],
        [33., 72.]], grad_fn=<AddBackward0>)
tensor([[ 5., 14.],
        [29., 50.]])


PyTorch also supports higher-order derivatives (e.g., second or third derivatives). To compute these, you need to set `create_graph=True` for the first derivative computation. For example:

In [18]:
x = torch.tensor([2.0], requires_grad=True)

f = x**3

# Compute first derivative df/dx using autograd.grad
first_derivative = torch.autograd.grad(f, x, create_graph=True)[0]

print("First derivative:", first_derivative)

# Compute second derivative d^2f/dx^2
second_derivative = torch.autograd.grad(first_derivative, x)[0]

print("Second derivative:", second_derivative)

First derivative: tensor([12.], grad_fn=<MulBackward0>)
Second derivative: tensor([12.])


To compute the Jacobian (partial derivatives of each output w.r.t each input):

In [19]:
x = torch.tensor([1.0, 2.0], requires_grad=True)

# Define a vector-valued function
y = torch.stack([x[0]**2, x[1]**3])  # Outputs: [x[0]^2, x[1]^3]

# Compute Jacobian (manually populate each row)
jacobian = []
for i in range(y.size(0)):
    x.grad = None  # Clear previous gradients
    y[i].backward(retain_graph=True)  # Compute partial derivatives
    jacobian.append(x.grad.clone())  # Store the gradient for output i

jacobian = torch.stack(jacobian)

print("Jacobian matrix:")
print(jacobian)  # Displays the partial derivatives

Jacobian matrix:
tensor([[ 2.,  0.],
        [ 0., 12.]])
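As an alternative to populating the rows by hand, PyTorch provides `torch.autograd.functional.jacobian`, which takes a function and its input. The sketch below wraps the same function in a helper (the name `f` is chosen here for illustration) and should give the same matrix as the manual loop:

```python
import torch
from torch.autograd.functional import jacobian


def f(x: torch.Tensor) -> torch.Tensor:
    # Same vector-valued function as above: [x[0]^2, x[1]^3]
    return torch.stack([x[0] ** 2, x[1] ** 3])


x = torch.tensor([1.0, 2.0])
J = jacobian(f, x)  # J[i, j] = d f_i / d x_j

print(J)
```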


## 1. Practice

### Documentation
Please use the PyTorch documentation in case of difficulties:
* The documentation on [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html#torch-tensor)
* The documentation on [`torch.cuda`](https://pytorch.org/docs/master/notes/cuda.html#cuda-semantics)


### [0 points] Create a random tensor with shape `(5, 5)`.

In [20]:
t1= torch.randn((5, 5))
print(t1)

tensor([[ 0.1950,  0.7289, -0.6164, -2.0146, -1.1022],
        [ 0.7565, -0.0824,  0.0985, -1.0250,  0.4488],
        [ 0.1991,  0.2152, -1.2486,  1.6180,  3.1256],
        [-0.7890, -1.2829, -1.5522, -1.2321,  0.8292],
        [-0.8057, -1.7435, -0.6823, -0.1751,  0.7122]])


### [0 points] Perform matrix multiplication of the resulting tensor with another random tensor of shape `(1, 5)`.
Hint: you may have to transpose the second tensor.

In [21]:
t2 = torch.randn((1,5))
print(t2)

tensor([[-1.9558, -0.1168,  1.0211,  0.6961,  0.1479]])


Since we can't multiply a 5x5 matrix by a 1x5 matrix, let's transpose the 1x5 matrix into a 5x1 matrix.

In [22]:
t2 = t2.T
print(t2)

tensor([[-1.9558],
        [-0.1168],
        [ 1.0211],
        [ 0.6961],
        [ 0.1479]])


In [23]:
t3 = t1 @ t2
print(t3)

tensor([[-2.6613],
        [-2.0165],
        [-0.1010],
        [-0.6270],
        [ 1.0663]])


### [1 point] Set the random seed to `42` and repeat the code from the two previous cells.
The output should be:
```
tensor([[1.1697],
        [0.9278],
        [0.9534],
        [0.6976],
        [0.7460]])
```

In [24]:
torch.manual_seed(42)

<torch._C.Generator at 0x7cc384aadff0>

In [25]:
t1 = torch.rand(5, 5)
t2 = torch.rand(1, 5)
result = t1 @ t2.T
print(result)

tensor([[1.1697],
        [0.9278],
        [0.9534],
        [0.6976],
        [0.7460]])


### [1 point] Set random seed on the GPU.

Hint: You'll need to read the [`torch.cuda`](https://pytorch.org/docs/master/notes/cuda.html#cuda-semantics) documentation for this.

In [26]:
torch.cuda.manual_seed(42)
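To see what seeding buys you, the sketch below draws the same random tensor twice after resetting the seed and checks that the two draws are identical. The helper name `sample` is hypothetical; the code falls back to the CPU when no GPU is available, and `torch.cuda.manual_seed_all` is used so that every visible GPU is seeded:

```python
import torch


def sample(seed: int, device: torch.device) -> torch.Tensor:
    """Hypothetical helper: reseed the generators, then draw three random numbers."""
    torch.manual_seed(seed)
    if device.type == "cuda":
        torch.cuda.manual_seed_all(seed)  # seed the generators of all GPUs
    return torch.rand(3, device=device)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
first = sample(42, device)
second = sample(42, device)

print(first.device)
print(torch.equal(first, second))
```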

### [1 point] Create two random tensors of shape `(3, 4)` and send them both to the GPU. Set random seed `123` when creating the tensors (this doesn't have to be the GPU random seed).

Hint: The output should contain `device='cuda:0'`.
Hint: How to access GPU in Colab?
Setting up the Runtime: In Google Colab, go to the "Runtime" menu and select "Change runtime type." A dialog box will appear where you can choose the runtime type and hardware accelerator. Select "T4 GPU" as the hardware accelerator and click "Save." This step ensures that your Colab notebook is configured to use the GPU.



In [27]:
torch.manual_seed(123)

t1 = torch.rand(3, 4)
t2 = torch.rand(3, 4)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t1 = t1.to(device)
t2 = t2.to(device)

In [28]:
print(f'Check:{t1.device}, {t2.device}')


Check:cuda:0, cuda:0


### [0 points] Perform a matrix multiplication on the tensors you created in the previous cell.
Hint: you may have to adjust the shapes of one of the tensors.


In [29]:
t2_T = t2.T
mul = t1 @ t2_T
print(mul)

tensor([[0.4802, 1.0135, 1.3960],
        [0.2604, 0.8456, 0.7648],
        [0.5326, 1.1972, 1.4639]], device='cuda:0')


### [0 points] Find the minimum and maximum values of the output of the previous cell.

In [30]:
print(f'Maximum-{mul.max()}')
print(f'Minimum-{mul.min()}')

Maximum-1.4639008045196533
Minimum-0.2603716552257538


### [0 points] Find the indices of these minimum and maximum values.

In [31]:
print(f'Maximum-{torch.argmax(mul)}')
print(f'Minimum-{torch.argmin(mul)}')
print('-' * 10)
print(f'Maximum-{torch.unravel_index(torch.argmax(mul), mul.shape)}')
print(f'Minimum-{torch.unravel_index(torch.argmin(mul), mul.shape)}')

Maximum-8
Minimum-3
----------
Maximum-(tensor(2, device='cuda:0'), tensor(2, device='cuda:0'))
Minimum-(tensor(1, device='cuda:0'), tensor(0, device='cuda:0'))


### [2 points] Make a random tensor with shape `(1, 1, 1, 8)` and then create a new tensor with all the `1` dimensions removed to be left with a tensor of shape `(8,)`. Set the seed to `999` when you create it and print out the first tensor and its shape as well as the second tensor and its shape.

In [32]:
torch.manual_seed(999)

t = torch.rand(1, 1, 1, 8)
print("Original: \n", t)
print("Shape:", t.shape)
print('-' * 15)
t = t.squeeze()
print("fixed: \n", t)
print("Shape:", t.shape)

Original: 
 tensor([[[[0.6776, 0.6531, 0.0457, 0.9424, 0.4925, 0.9985, 0.7585, 0.0317]]]])
Shape: torch.Size([1, 1, 1, 8])
---------------
fixed: 
 tensor([0.6776, 0.6531, 0.0457, 0.9424, 0.4925, 0.9985, 0.7585, 0.0317])
Shape: torch.Size([8])


### [1 point] Print the index of the maximum value of the second tensor.

In [33]:
print(f'Maximum-{torch.argmax(t)}')

Maximum-5


### Your first neural network


In this exercise you will implement a small neural network with two linear layers. The first layer takes an eight-dimensional input and the last layer outputs a one-dimensional tensor.
The following exercises are inspired by the DataCamp course [Introduction to Deep Learning with PyTorch](https://app.datacamp.com/learn/courses/introduction-to-deep-learning-with-pytorch).

### [0 points] Create a neural network of **two linear layers** that takes `input_tensor` as input, representing `8` features, and outputs a tensor of dimensions `1`.
Hint: use [`torch.nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) and [`torch.nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html).

In [34]:
input_tensor = torch.Tensor([[1, 5, 4, 7, 3, 6, 0, 2]])

model = nn.Sequential(
    nn.Linear(8, 8),  # the task does not specify the first layer's output size, so it is kept at 8
    nn.Linear(8, 1)
)

output = model(input_tensor)
print(output)


tensor([[-1.2518]], grad_fn=<AddmmBackward0>)


### [0 points] Create a [`torch.nn.Sigmoid`](https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html) and apply it on `input_tensor` to generate a probability for a binary classification task.

In [35]:
input_tensor = torch.tensor([[1.3]])

sigmoid = nn.Sigmoid()
probability = sigmoid(input_tensor)

print(probability)

tensor([[0.7858]])


### [0 points] Create a [`torch.nn.Softmax`](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html) and apply it on `input_tensor` to generate a probability for a multiclass classification task.

In [36]:
input_tensor = torch.tensor([[6.6, -3.2, -4.3, 0.3, -0.7, -4.7]])

softmax = nn.Softmax(dim=1)

probabilities = softmax(input_tensor)
print(probabilities)

tensor([[9.9741e-01, 5.5308e-05, 1.8410e-05, 1.8315e-03, 6.7379e-04, 1.2341e-05]])


### [0 points] How can you avoid the following warning message when calling the softmax function from PyTorch?

> UserWarning: Implicit dimension choice for softmax has been deprecated.



>**Pass `dim` explicitly (for example, `dim=1`) to ensure that softmax is applied along the feature dimension.**
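To make the role of `dim` concrete, the sketch below applies softmax to a small hand-written batch (the values are arbitrary) along each of the two dimensions. With `dim=1` each row, that is, each sample, sums to one; with `dim=0` each column does:

```python
import torch

# Two samples (rows) with three class scores each; the values are illustrative
scores = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0]])

per_sample = torch.softmax(scores, dim=1)  # normalise across classes
per_column = torch.softmax(scores, dim=0)  # normalise across samples

print(per_sample.sum(dim=1))  # one value per row
print(per_column.sum(dim=0))  # one value per column
```

For a batch of shape `(batch_size, num_classes)`, `dim=1` (or equivalently `dim=-1`) is therefore the usual choice for class probabilities.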


### Neural Networks for Classification and Regression

### [2 points] Create a neural network that takes a tensor of dimension `1x8` as input and returns an output of the correct shape for binary classification.

In [37]:
input_tensor = torch.Tensor([[1, 5, 4, 7, 3, 6, 0, 2]])


model = nn.Sequential(
    nn.Linear(8, 1),
    nn.Sigmoid()
)


output = model(input_tensor)
print(output)

tensor([[0.8761]], grad_fn=<SigmoidBackward0>)


### [2 points] Create a 4-layer linear neural network compatible with input_tensor as input and a regression value as output.

In [38]:
input_tensor = torch.Tensor([[1, 5, 4, 7, 3, 6, 0, 2, 6, 2]])

# Gradually reduce the feature size; the hidden layer sizes are an arbitrary choice
model = nn.Sequential(
    nn.Linear(10, 8),
    nn.Linear(8, 6),
    nn.Linear(6, 4),
    nn.Linear(4, 1)
)


output = model(input_tensor)
print(output)

tensor([[-0.0894]], grad_fn=<AddmmBackward0>)


### [1 point] Create a one-hot encoded vector of the ground truth label `y` using [`torch.nn.functional.one_hot`](https://pytorch.org/docs/stable/generated/torch.nn.functional.one_hot.html).

In [39]:
y = 1
num_classes = 4

one_hot = F.one_hot(torch.tensor(y), num_classes=num_classes)

### [2 points] Calculate the cross entropy loss using the one-hot encoded vector of the ground truth label `y`, with `4` features (one for each class).

In [40]:
from torch.nn import CrossEntropyLoss

y = [2]
scores = torch.tensor([[3.1, -5.0, 1.8, 4.3]])

ohe = F.one_hot(torch.tensor(y), num_classes=4)
ohe = ohe.float()

criterion = CrossEntropyLoss()

loss = criterion(scores, ohe)

print(loss)

tensor(2.8245)
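To see where this number comes from, the sketch below recomputes the loss by hand as the negative sum of the one-hot vector times the log-softmax of the scores, and compares it with `F.cross_entropy` called with the class index. Both values should be close to the `2.8245` printed above:

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([[3.1, -5.0, 1.8, 4.3]])
label = torch.tensor([2])

one_hot = F.one_hot(label, num_classes=4).float()
log_probs = F.log_softmax(scores, dim=1)

# Cross entropy: -sum_k y_k * log p_k, averaged over the batch
manual_loss = -(one_hot * log_probs).sum(dim=1).mean()

# Built-in version with an integer class index as the target
builtin_loss = F.cross_entropy(scores, label)

print(manual_loss, builtin_loss)
```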


### [1 point] Accessing the model parameters.
Hint: model parameters are weights and biases.\
Hint: use [`torch.nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) documentation.\
Hint: try to discover [PyTorch discussion forum](https://discuss.pytorch.org/t/access-weights-of-a-specific-module-in-nn-sequential/3627).

In [41]:
model = nn.Sequential(nn.Linear(16, 8),
                      nn.Linear(8, 4)
                     )

weight_0 = model[0].weight
bias_1 = model[1].bias


print(weight_0, bias_1)

Parameter containing:
tensor([[ 0.0650, -0.1691,  0.2195, -0.1818,  0.1693, -0.2301, -0.1864,  0.0435,
         -0.0817,  0.2321,  0.2254, -0.2284, -0.0452, -0.0728, -0.1676,  0.0484],
        [-0.0461,  0.0330, -0.1811,  0.1718,  0.1790, -0.0036, -0.0160,  0.1063,
         -0.1950, -0.1921, -0.0919,  0.0159,  0.2449,  0.2384,  0.1306, -0.0911],
        [-0.2396,  0.1677, -0.2184,  0.0946, -0.0969, -0.0373, -0.1033,  0.0782,
          0.1053,  0.1502,  0.0607,  0.0010,  0.0079, -0.1084, -0.0202,  0.1200],
        [-0.1736,  0.2232,  0.2257,  0.1896, -0.2084, -0.1083,  0.1057,  0.0876,
         -0.1075,  0.1114, -0.0726, -0.1333, -0.0332,  0.0384,  0.0225, -0.2013],
        [ 0.2192,  0.1386,  0.2435,  0.1542, -0.1793,  0.1354, -0.0907, -0.0218,
         -0.1648, -0.2149,  0.0522,  0.1983, -0.1658,  0.2426,  0.1634,  0.0017],
        [ 0.0899, -0.0246,  0.2204, -0.2235, -0.1836,  0.2353, -0.0810, -0.2213,
          0.0012, -0.0274,  0.2292, -0.1078, -0.2320,  0.1670, -0.0671, -0.1283],


### [3 points] Create a 3-layer neural network.



In [42]:
input_tensor = torch.Tensor([[1, 5, 4, 7, 3, 6, 0, 2, 6, 2, 4, 1, 4, 0, 5, 12]])

model = nn.Sequential(
    nn.Linear(16, 8),
    nn.Linear(8, 4),
    nn.Linear(4, 2),
    nn.Softmax(dim=1)
)


prediction = model(input_tensor)

criterion = CrossEntropyLoss()
target = torch.tensor([[1., 0.]])

loss = criterion(prediction, target)


loss.backward()
print("Prediction:", prediction)
print("Loss:", loss)

Prediction: tensor([[0.2097, 0.7903]], grad_fn=<SoftmaxBackward0>)
Loss: tensor(1.0250, grad_fn=<DivBackward1>)
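One detail worth knowing here: according to the PyTorch documentation, `nn.CrossEntropyLoss` applies log-softmax internally and therefore expects raw, unnormalised scores (logits). The following sketch shows one possible variant, not the required solution, in which the network outputs logits and softmax is applied only to report probabilities. The name `logit_model`, the seed, and the random input are assumptions made for this illustration:

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, for reproducibility only

# Same layer sizes as above, but without a final Softmax layer
logit_model = nn.Sequential(
    nn.Linear(16, 8),
    nn.Linear(8, 4),
    nn.Linear(4, 2),
)

x = torch.rand(1, 16)
target = torch.tensor([0])  # class index of the correct class

logits = logit_model(x)
loss = nn.CrossEntropyLoss()(logits, target)

# The same loss computed by hand from the log-softmax of the logits
manual = -torch.log_softmax(logits, dim=1)[0, target.item()]

# Probabilities for reporting are obtained outside the loss
probs = torch.softmax(logits, dim=1)

print(loss.item(), manual.item())
print(probs)
```

If the model already ends in `nn.Softmax`, the loss effectively applies softmax twice, which still runs but compresses the scores and weakens the gradients.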


### [2 points] Update the weights of the first layer using the gradients scaled by the learning rate `0.001`.

In [43]:
lr = 0.001

weight = model[0].weight
print(weight)

grads = model[0].weight.grad
with torch.no_grad():
    weight -= lr * grads
print(weight)


Parameter containing:
tensor([[ 0.2480, -0.1458,  0.1999, -0.0229,  0.1366,  0.2351, -0.1843,  0.0115,
          0.0452,  0.0154, -0.0792, -0.1693, -0.1519, -0.1825, -0.0966,  0.2236],
        [ 0.0460, -0.1317,  0.2442,  0.0726,  0.1820, -0.0207,  0.0685, -0.1216,
          0.0282,  0.0315,  0.1285,  0.1125,  0.1011,  0.0474, -0.1298,  0.0819],
        [ 0.2447,  0.1693,  0.2194,  0.2487, -0.0641,  0.1796,  0.0121,  0.1662,
         -0.1202, -0.0981, -0.1119,  0.1198,  0.0352,  0.1727, -0.1446,  0.1442],
        [-0.1537,  0.0586, -0.0470,  0.0703,  0.2457, -0.0713,  0.1364, -0.0197,
         -0.2254,  0.0879, -0.1380, -0.1945,  0.1994,  0.0540,  0.0227,  0.0071],
        [ 0.0044,  0.2260, -0.1922,  0.1138,  0.1870,  0.0366, -0.2456, -0.1633,
          0.2149, -0.2070,  0.2131,  0.0132, -0.1513,  0.2164,  0.1883, -0.0600],
        [ 0.0658, -0.2168, -0.0010,  0.0626,  0.1958,  0.0964, -0.0696,  0.1542,
          0.2296, -0.0402,  0.1443, -0.0332, -0.0478, -0.2487,  0.0484,  0.0335],
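The manual update `weight -= lr * grad` is the same rule that plain SGD applies (no momentum, no weight decay). As a sketch of this equivalence, the block below updates one copy of a small layer by hand and an identical copy with `torch.optim.SGD`, then compares the parameters. The layer sizes, seed, and the toy loss are arbitrary choices for this illustration:

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, for reproducibility only
lr = 0.001

layer_manual = nn.Linear(4, 2)
layer_sgd = copy.deepcopy(layer_manual)  # identical initial parameters
x = torch.randn(3, 4)

# Manual gradient step on every parameter
layer_manual(x).sum().backward()
with torch.no_grad():
    for p in layer_manual.parameters():
        p -= lr * p.grad

# The same step performed by the optimizer
optimizer = torch.optim.SGD(layer_sgd.parameters(), lr=lr)
optimizer.zero_grad()
layer_sgd(x).sum().backward()
optimizer.step()

same = all(
    torch.allclose(p, q)
    for p, q in zip(layer_manual.parameters(), layer_sgd.parameters())
)
print(same)
```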


### [1 point] Update the model's parameters using the [`torch.optim.SGD`](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html) optimizer and the learning rate `0.001`.

In [44]:
def train(model, input_tensor, target, criterion, optimizer, n_epoch):
    for epoch in range(n_epoch):
        optimizer.zero_grad()
        y_pred = model(input_tensor)
        loss = criterion(y_pred, target)
        print(f'Epoch {epoch+1}/{n_epoch}, Loss = {loss.item()}')
        loss.backward()
        optimizer.step()

    return model

In [45]:
optimizer = optim.SGD(model.parameters(), lr=lr)

optimizer.zero_grad()                   # clear gradients left over from the manual update
prediction = model(input_tensor)        # forward pass
loss = criterion(prediction, target)
loss.backward()                         # compute fresh gradients
optimizer.step()                        # SGD update of all parameters
optimizer.zero_grad()

print("Final Prediction:", prediction)
print("Final Loss:", loss.item())

Final Prediction: tensor([[0.2121, 0.7879]], grad_fn=<SoftmaxBackward0>)
Final Loss: 1.0218780040740967


In [49]:
model = nn.Sequential(
    nn.Linear(16, 8),
    nn.Linear(8, 4),
    nn.Linear(4, 2),
    nn.Softmax(dim=1)
)
optimizer = optim.SGD(model.parameters(), lr=0.001)
criterion = CrossEntropyLoss()

input_tensor = torch.Tensor([[1, 5, 4, 7, 3, 6, 0, 2, 6, 2, 4, 1, 4, 0, 5, 12]])
target = torch.tensor([1])

model = train(model, input_tensor, target, criterion, optimizer, 20)

Epoch 1/20, Loss = 0.547433614730835
Epoch 2/20, Loss = 0.5449258685112
Epoch 3/20, Loss = 0.5424408316612244
Epoch 4/20, Loss = 0.5399786829948425
Epoch 5/20, Loss = 0.5375396013259888
Epoch 6/20, Loss = 0.5351235866546631
Epoch 7/20, Loss = 0.532730758190155
Epoch 8/20, Loss = 0.5303610563278198
Epoch 9/20, Loss = 0.5280147194862366
Epoch 10/20, Loss = 0.52569180727005
Epoch 11/20, Loss = 0.5233920812606812
Epoch 12/20, Loss = 0.5211158990859985
Epoch 13/20, Loss = 0.5188630819320679
Epoch 14/20, Loss = 0.5166337490081787
Epoch 15/20, Loss = 0.5144277811050415
Epoch 16/20, Loss = 0.5122451782226562
Epoch 17/20, Loss = 0.510085940361023
Epoch 18/20, Loss = 0.5079500675201416
Epoch 19/20, Loss = 0.5058375597000122
Epoch 20/20, Loss = 0.5037481188774109


That's all for this PyTorch practice.

**Don't forget to rename the Jupyter notebook to HW1-pytorch-NameLastname.ipynb, where Name and Lastname are your first and last name. Then submit the notebook file to Smart LMS.**