In [1]:
!pip install numpy
!pip install torch



# 1. What is a PyTorch tensor?

The most basic data structure in PyTorch is the tensor, a multi-dimensional array whose values all share one data type.

In [None]:
import torch

In [None]:
ten = torch.tensor([[[1], [2], [3]], [[4], [5], [6]]])

In [None]:
ten

To understand the shape of a tensor, read the nested lists from the outside in. In this example:
[[[1], [2], [3]], [[4], [5], [6]]]
- the outermost list has 2 items
- each of those 2 items has 3 items internally
- each of those 3 items has 1 item internally

In [None]:
ten.shape

In [None]:
list(ten.shape)

In [None]:
list(ten.shape)[1]
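The outside-in reading described above can be written as a few lines of plain Python. This is only a sketch: `nested_shape` is a hypothetical helper (not part of PyTorch), and it assumes the nested list is regular and non-empty at every level.

```python
def nested_shape(data):
    """Return the shape of a regular nested list, reading from the outside in."""
    shape = []
    level = data
    while isinstance(level, list):
        shape.append(len(level))   # number of items at this nesting level
        level = level[0]           # step one level inward
    return shape

data = [[[1], [2], [3]], [[4], [5], [6]]]
shape = nested_shape(data)
print(shape)
```

The helper visits the outermost list first, which is why the outermost length becomes the first entry of the shape.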

**Exercise 1: Create a PyTorch Tensor and Check Its Shape**
Use the following nested list as your data:
[[[7], [8]], [[9], [10]], [[11], [12]]]
Create a PyTorch tensor from the given data and store it in a variable named myten.

Display myten to confirm it was created correctly.

Display the shape of myten.

Convert the shape into a Python list using exactly this command (with the space):
list (myten.shape)

From the shape list, display the third value (index [2]) using exactly this expression pattern:
list(myten.shape)[2]

In one short sentence, explain what the shape means for this tensor.



# 2. Reshaping with .view()

This example demonstrates how tensor shapes work in PyTorch and how the view() method can reshape a tensor without changing its underlying data. First, we create a tensor and check its original shape. Then we reshape the same 24 values into several valid layouts: the 2D shapes 3×8, 2×12, and 4×6, and the 3D shape 8×1×3. The goal is to understand that reshaping changes only the tensor’s dimensions, not the order or the content of the elements.

In [None]:
ten=torch.tensor([[[1,2,3,4], [5,6,7,8], [9,10,11,12]], [[13,14,15,16],
 [17,18,19,20], [21,22,23,24]]], dtype=torch.float32)

In [None]:
list(ten.shape)

In [None]:
ten.view(3,8)

In [None]:
ten.view(2,12)

In [None]:
ten.view(4,6)

In [None]:
ten.view(8,1,3)

In [None]:
## Note: view() doesn't modify the tensor, so the original still has the same shape
ten.shape
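The claim that view() changes only the shape can be checked directly. The sketch below builds its own 24-element tensor with torch.arange (an assumption, chosen so the block is self-contained) and compares the flattened element order before and after the reshape.

```python
import torch

t = torch.arange(1, 25, dtype=torch.float32).view(2, 3, 4)
v = t.view(3, 8)

# Same values in the same order; only the grouping into rows differs.
same_order = torch.equal(t.flatten(), v.flatten())

# A view shares memory with the original tensor.
shares_memory = t.data_ptr() == v.data_ptr()

# Writing through the view is therefore visible in the original.
v[0, 0] = 100.0
first_value = t[0, 0, 0].item()
print(same_order, shares_memory, first_value)
```

Because a view shares memory with its source, use .clone() when an independent copy is needed.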

**Exercise 2: Reshape a Tensor with view() and Use -1**

Use the following tensor data (keep the values exactly as given). Create a PyTorch tensor from this data and store it in a variable named myten2 (use dtype=torch.float32):
[[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], [[13, 14, 15], [16, 17, 18], [19, 20, 21], [22, 23, 24]]]

Display myten2 to confirm it was created correctly.

Display the shape of myten2 and convert it into a Python list using: list (myten2.shape).

Reshape myten2 using view() and display the reshaped tensor each time:
(-1, 8)
(4, 2, 3)

Try the following reshape and observe that it will fail (this line is only for understanding; it should raise an Exception):
(5, 5)

Convert the shape of each successfully reshaped tensor into a Python list (use the same style as before, e.g., list ( ... .shape)).

In one short sentence, explain why the (5, 5) reshape fails but the other reshapes work.


**Exercise 2b: Reshape a Tensor with view()**

Use the numbers 1–30 as your data (all values must be included exactly once). Create a PyTorch tensor from these values and store it in a variable named myten2.

Display myten2 to confirm it was created correctly.

Display the shape of myten2 and convert it into a Python list using: list (myten2.shape).

Reshape myten2 into each of the following shapes using view() and display the reshaped tensor each time:

(5, 6)
(3, 10)
(2, 15)

Convert the shape of each reshaped tensor into a Python list (use the same style as before, e.g., list ( ... .shape)).

Reshape myten2 using view(6, -1) and display the reshaped tensor. Then display its shape and convert it into a Python list using: list ( ... .shape).

In one short sentence, explain what stays the same and what changes when you use view() on a tensor.

# 3. "Wildcard" dimension length of -1

In view(), you can set one dimension to -1 and PyTorch will compute it automatically. Reshaping is only possible if the product of the new dimensions equals the total number of elements in the tensor.
This example demonstrates how PyTorch’s view() method can reshape a tensor using the special value -1, and why reshaping must always preserve the total number of elements.
First, ten.view(-1, 6) lets PyTorch infer the first dimension (24 / 6 = 4), which gives shape [4, 6].
Then we reshape the same tensor into a valid 3D layout with ten.view(3, 2, 4).
Finally, we try ten.view(3, 2, 3), which fails because the product of the requested dimensions (3 × 2 × 3 = 18) does not match the tensor’s total number of elements (24).

The goal is to understand that view() can only change the tensor’s shape when the element count remains the same, and that -1 can be used to let PyTorch compute one dimension automatically.
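The arithmetic behind -1 can be sketched in plain Python. `infer_view_shape` is a hypothetical helper written for illustration; it mirrors the rule described above and is not how PyTorch implements view() internally.

```python
from math import prod

def infer_view_shape(numel, shape):
    """Resolve a single -1 in `shape` for a tensor with `numel` elements."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be -1")
    known = prod(d for d in shape if d != -1)   # product of the given dimensions
    if -1 in shape:
        if known == 0 or numel % known != 0:
            raise ValueError(f"shape {shape} is invalid for {numel} elements")
        return tuple(numel // known if d == -1 else d for d in shape)
    if known != numel:
        raise ValueError(f"shape {shape} is invalid for {numel} elements")
    return tuple(shape)

print(infer_view_shape(24, (-1, 6)))
print(infer_view_shape(24, (3, 2, 4)))
try:
    infer_view_shape(24, (3, 2, 3))
except ValueError as err:
    print("error:", err)
```

The wildcard is simply the total element count divided by the product of the dimensions that were given, which is why that product must divide the element count evenly.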

In [None]:
ten=torch.tensor([[[1,2,3,4], [5,6,7,8], [9,10,11,12]], [[13,14,15,16], [17,18,19,20], [21,22,23,24]]], dtype=torch.float32)

In [None]:
ten.view(-1, 6)  # PyTorch infers the first dimension: 24 / 6 = 4

In [None]:
ten.view(3, 2, 4)

In [None]:
ten.view(3, 2, 3)  # raises a RuntimeError: 3 * 2 * 3 = 18 != 24

In [None]:
ten.view(4, 2, 3)

In [None]:
ten.view(5, 5)  # raises a RuntimeError: 5 * 5 = 25 != 24

**Exercise 3: Reshape a Tensor with view() and use -1**

Use the following tensor data (keep the values exactly as given). Create a PyTorch tensor from this data and store it in a variable named myten3:

[[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], [[13, 14, 15], [16, 17, 18], [19, 20, 21], [22, 23, 24]]]

Display myten3 to confirm it was created correctly.

Display the shape of myten3 and convert it into a Python list using: list (myten3.shape).

Reshape myten3 using view() and display the reshaped tensor each time:

(-1, 8)

(4, 2, 3)

Try the following reshape and observe that it will fail (this line is only for understanding):

(5, 5)

Convert the shape of each successful reshaped tensor into a Python list (use the same style as before, e.g., list ( ... .shape)).

In one short sentence, explain why the (5, 5) reshape fails but the other reshapes work.


# 4. Reshaping tensors with .squeeze()

This example shows that squeeze() removes dimensions of size 1. To make this effect visible, the tensor is first reshaped with view(24, -1), which creates a shape like [24, 1] (PyTorch computes the -1 automatically). Then squeeze() removes that extra 1 dimension, resulting in a simpler 1D tensor with shape [24].

In [None]:
mytensor = torch.tensor([[1, 2, 3, 4, 5, 6],
                         [7, 8, 9, 10, 11, 12],
                         [13, 14, 15, 16, 17, 18],
                         [19, 20, 21, 22, 23, 24]], dtype=torch.float32)

In [None]:
mytensor.shape

In [None]:
mytensor.squeeze()  # no effect: mytensor has no dimension of size 1

In [None]:
mytensor.view(24, -1).squeeze().shape

In [None]:
mytensor.view(24, -1).shape
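squeeze() also accepts a dimension index, which removes that one dimension only if its size is 1. The short sketch below uses a tensor of zeros with shape [1, 24, 1] (an assumed example, not taken from the cells above) to contrast the variants.

```python
import torch

t = torch.zeros(1, 24, 1)

all_squeezed = t.squeeze()     # removes every size-1 dimension
first_only = t.squeeze(0)      # removes only dimension 0
unchanged = t.squeeze(1)       # dimension 1 has size 24, so nothing is removed

print(list(all_squeezed.shape), list(first_only.shape), list(unchanged.shape))
```

Passing the index is the safer choice when a size-1 dimension, such as a batch of one sample, should survive.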

**Exercise 4: Use view() and squeeze() to Remove a Size-1 Dimension**

Use the following data (do not change the values). Create a PyTorch tensor from this data and store it in a variable named myten4 (use dtype=torch.float32):

[[101, 102, 103, 104, 105], [106, 107, 108, 109, 110], [111, 112, 113, 114, 115]]

Display myten4 to confirm it was created correctly.

Display the shape of myten4 and convert it into a Python list using: list (myten4.shape).

Reshape myten4 using view(15, -1) and display the shape of the reshaped tensor.

Apply squeeze() to the reshaped tensor and display the shape again.

Convert both shapes (before and after squeeze()) into Python lists using the same style as before (e.g., list ( ... .shape)).

In one short sentence, explain what squeeze() removed and why it could remove it in this example.

# 5. Reshaping tensors with .unsqueeze()

This example demonstrates that unsqueeze() is the opposite of squeeze(): it adds a new dimension of size 1, and you must specify where to insert it. After reshaping the tensor to have shape [1, 24], x.unsqueeze(1) inserts a new dimension in the middle to produce shape [1, 1, 24], while x.unsqueeze(2) inserts it at the end to produce shape [1, 24, 1]. The goal is to see that unsqueeze() changes only the shape, not the data.

In [None]:
x = mytensor.view(-1, 24)

In [None]:
x

In [None]:
x.shape

In [None]:
x.unsqueeze(1)

In [None]:
## Note that just like .view(), .unsqueeze() does not overwrite the original tensor
## Here, x still has the old shape
x.shape

In [None]:
x.unsqueeze(2)

In [None]:
x.unsqueeze(2).shape

In [None]:
x.unsqueeze(3)#dimension out of range

**Exercise 5:**

Use the following data (do not change the values). Create a PyTorch tensor from this data and store it in a variable named myten5:

[[31, 32, 33, 34, 35], [36, 37, 38, 39, 40]]

Display myten5 to confirm it was created correctly.

Display the shape of myten5 and convert it into a Python list using: list (myten5.shape).

Reshape myten5 using view(1, 10) and display the shape of the reshaped tensor.

Apply unsqueeze(1) to the reshaped tensor and display the new shape.

Apply unsqueeze(2) to the reshaped tensor and display the new shape.

Convert each resulting shape into a Python list (use the same style as before, e.g., list ( ... .shape)).

In one short sentence, explain the difference between unsqueeze(1) and unsqueeze(2) in this example.

# 6. .unsqueeze() = [:, None, ...]

This example shows an alternative to unsqueeze(): you can add a new dimension by using slicing with None (also called newaxis). The expression mytensor[:, None, :, :] inserts a size-1 dimension at a specific position, producing the same kind of shape change as mytensor.unsqueeze(1). The most important point is that the position of None in the indexing determines where the new dimension is inserted. The goal is to understand that both methods change only the tensor’s shape, not its data.

In [None]:
import torch

In [None]:
mytensor = torch.arange(1, 25).view(2, 3, 4).to(torch.float32)
mytensor

In [None]:
mytensor.shape

In [None]:
mytensor.unsqueeze(1)

In [None]:
mytensor[:, None, :, :].shape

In [None]:
mytensor = torch.arange(1, 25).view(2, 3, 4)
mytensor[:, None, :, :].shape
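To see how the position of None maps to an unsqueeze() index, the following sketch compares three placements on a tensor of shape [2, 3, 4]. The variable names are illustrative only.

```python
import torch

t = torch.arange(1, 25).view(2, 3, 4)

front = t[None]        # same as t.unsqueeze(0)
middle = t[:, None]    # same as t.unsqueeze(1)
end = t[..., None]     # same as t.unsqueeze(-1)

print(list(front.shape), list(middle.shape), list(end.shape))
print(torch.equal(middle, t.unsqueeze(1)))
```

Trailing `:` entries can be omitted, and `...` stands for all remaining dimensions, so the short forms above are equivalent to writing every slice explicitly.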

**Exercise 6: Add a New Dimension with None (Slicing) in Different Positions**

Use the following data (do not change the values). Create a PyTorch tensor from this data and store it in a variable named myten6:

[[201, 202, 203, 204], [205, 206, 207, 208], [209, 210, 211, 212]]

Display myten6 to confirm it was created correctly.

Display the shape of myten6 and convert it into a Python list using: list (myten6.shape).

Create three new tensors using slicing with None (do not use unsqueeze() in this exercise). Display the shape of each result:

Insert a new dimension at the beginning (before all existing dimensions).

Insert a new dimension in the middle (between the first and second dimension).

Insert a new dimension at the end (after the last dimension).

Convert each resulting shape into a Python list using the same style as before (e.g., list ( ... .shape)).

In one short sentence, explain how the position of None affects the resulting shape.

# 7. Slicing tensors, use of .item()

This example demonstrates that PyTorch tensors can be indexed and sliced in a NumPy-like way. mytensor[0, 2, 1] selects a single element using three indices (one per dimension), while mytensor[0, 2, 1:3] uses a slice to select a range of elements from the last dimension. Finally, .item() converts a single-element tensor (a 0-dim tensor) into a standard Python scalar value, which is useful for printing or using the value outside PyTorch.

In [None]:
mytensor

In [None]:
mytensor[0, 2, 1]

In [None]:
mytensor[0, 2, 1:3]

In [None]:
mytensor[0, 2, 1].item()
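The difference between a single element and a slice becomes clearer when both are converted to plain Python values. This sketch rebuilds a [2, 3, 4] tensor with torch.arange (an assumption, to keep the block self-contained) and uses .tolist() for the multi-element case, where .item() raises an error.

```python
import torch

t = torch.arange(1, 25).view(2, 3, 4)

single = t[0, 2, 1]        # 0-dim tensor holding one value
piece = t[0, 2, 1:3]       # 1-dim tensor holding two values

value = single.item()      # plain Python int
values = piece.tolist()    # plain Python list

try:
    piece.item()           # more than one element: not convertible to a scalar
    item_failed = False
except (RuntimeError, ValueError):
    item_failed = True

print(single.dim(), piece.dim(), value, values, item_failed)
```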

**Exercise 7: Indexing, Slicing, and .item() in PyTorch**

Use the following data (do not change the values). Create a PyTorch tensor from this data and store it in a variable named myten_index:

`[[[41, 42, 43, 44],
[45, 46, 47, 48],
[49, 50, 51, 52]],

[[53, 54, 55, 56],
[57, 58, 59, 60],
[61, 62, 63, 64]]]`

Tasks:

Create the tensor exactly as given and store it in myten_index. Display the tensor.

Select and display a single element at index [0, 2, 1].

Select and display a slice at index [0, 2, 1:3].

Convert the single element from step 2 into a Python scalar using .item() and display the scalar value.
Note: .item() works only if the result contains exactly one value.

In one short sentence, explain the difference between selecting a single element and selecting a slice.

# 8. Other ways to create PyTorch tensors

This example shows that PyTorch provides many convenient tensor constructors. torch.zeros(3, 2) creates a 3×2 tensor filled with zeros, torch.randn(3, 2) creates a 3×2 tensor with random values from a standard normal distribution N(0,1), and 3 * torch.rand(3, 2) + 2 draws uniform random values from the range [0, 1) and then scales and shifts them so the final numbers fall in the range [2, 5).

In [None]:
torch.zeros(3, 2)

In [None]:
torch.randn(3, 2)  # standard normal distribution N(0, 1), tensor of the given size

In [None]:
3 * torch.rand(3, 2) + 2  # uniform distribution in the range [2, 5)
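torch.rand and torch.randn are easy to confuse, so a quick range check helps. The sketch below draws 10,000 samples from each (the sample size and seed are arbitrary choices) and compares the extremes after scaling by 3 and shifting by 2.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

u = torch.rand(10000)      # uniform on [0, 1)
scaled_uniform = 3 * u + 2 # stays within roughly [2, 5)

n = torch.randn(10000)     # standard normal, unbounded
scaled_normal = 3 * n + 2  # mean 2, standard deviation 3, still unbounded

print(scaled_uniform.min().item(), scaled_uniform.max().item())
print(scaled_normal.min().item(), scaled_normal.max().item())
```

Only the uniform version is confined to the target interval; the scaled normal samples are centred on 2 but regularly fall below 2 and above 5.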

**Exercise 8: Tensor Constructors (zeros, randn, and scaling)**

Create the following tensors in PyTorch and display each result:

Create a 4×3 tensor filled with zeros.

Create a 4×3 tensor with random values from a standard normal distribution
N(0,1).

Create a 4×3 tensor with random values from N(0,1), then scale and shift it using the formula: 2 * (random_tensor) - 1

In one short sentence, explain the difference between the three tensors you created.

# 9. Use of other types of random distributions

This example demonstrates how to use PyTorch’s Poisson distribution to generate random samples and compute basic statistics and probabilities. It creates a Poisson distribution with a given rate parameter, draws samples as tensors (or as Python scalars using .item()), retrieves the variance, evaluates log-probabilities for specific values, and finally generates a batch of samples with a chosen output shape (e.g., [3, 2]).

In [None]:
from torch.distributions import poisson

In [None]:
p = poisson.Poisson(4.0)

In [None]:
p.sample()

In [None]:
p.sample().item()

In [None]:
p.sample().item()

In [None]:
p.sample().item()

In [None]:
p.variance.item()

In [None]:
-1 * p.log_prob(torch.tensor(4.)).item()

In [None]:
-1 * p.log_prob(torch.tensor(0.)).item()  # negative log-probability of k = 0

In [None]:
p.sample((3, 2))
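The values returned by log_prob can be cross-checked against the Poisson formula log P(X = k) = k·log(λ) − λ − log(k!). In the sketch below, `poisson_log_pmf` is a hypothetical helper written with Python's math module for comparison; it is not part of PyTorch.

```python
import math
import torch
from torch.distributions import Poisson

rate = 4.0
p = Poisson(rate)

def poisson_log_pmf(k, lam):
    """log P(X = k) = k*log(lam) - lam - log(k!), using lgamma(k + 1) = log(k!)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

results = {}
for k in (0, 4):
    from_torch = p.log_prob(torch.tensor(float(k))).item()
    by_hand = poisson_log_pmf(k, rate)
    results[k] = (from_torch, by_hand)
    print(k, from_torch, by_hand)
```

For k = 0 the formula reduces to −λ, so the negative log-probability equals the rate itself.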

**Exercise 9: Sampling and Log-Probabilities with a Poisson Distribution (PyTorch)**

Create a Poisson distribution in PyTorch with rate = 6.0 and store it in a variable named p9.
Tasks:
Draw one sample from p9 and display it as a tensor.
Draw another sample and convert it to a Python scalar using .item() (display the scalar).
Display the variance of p9 as a Python scalar using .variance.item().
Compute and display the log-probability of k = 5 using log_prob(...) (return a Python scalar).
Compute and display the log-probability of k = 0 using log_prob(...) (return a Python scalar).
Draw a batch of samples with shape (2, 4) and display the result. Calculate the negative log probability of the whole batch.

In one short sentence, explain what the rate parameter controls in a Poisson distribution.

# 10. Matrix multiplication using .mm()

This example shows that matrix multiplication in PyTorch can be performed with mm() on 2D tensors, and that the inner dimensions must match. The vector b is created as a 2D row vector of shape [1, 2] and then transposed with .t() into a column vector of shape [2, 1], which makes it compatible with a of shape [2, 2]. Then a.mm(b) computes the matrix-vector product and returns a [2, 1] result.

In [None]:
a = torch.tensor([[1, 2], [3, 4]])

In [None]:
b = torch.tensor([[5, 9]])

In [None]:
a.shape

In [None]:
b.shape

In [None]:
b=b.t()

In [None]:
b.shape

In [None]:
a

In [None]:
a.mm(b)
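The role of the transpose is easiest to see by trying the multiplication without it. This sketch uses the same values as the cells above, stored as floats, and records whether the untransposed call fails.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # shape (2, 2)
b_row = torch.tensor([[5.0, 9.0]])           # shape (1, 2)

try:
    a.mm(b_row)          # inner dimensions 2 and 1 do not match
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

b_col = b_row.t()        # shape (2, 1)
result = a.mm(b_col)     # (2, 2) mm (2, 1) -> (2, 1)
print(mismatch_raised, result)
```

Each output row is the dot product of a row of a with the column vector: 1·5 + 2·9 = 23 and 3·5 + 4·9 = 51.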

**Exercise 10: Matrix Multiplication with mm() and Shape Matching**

Use the following values (do not change them):

Matrix A: [[3, 1], [2, 5]]

Vector b (start as a 1D tensor): [4, 7]

Tasks:
- Create A as a 2D tensor and b as a 1D tensor.
- Display A.shape and b.shape.
- Reshape b into a 2D column vector with shape [2, 1].
- Compute A.mm(b_column) and display the result.
- Display the final result shape.

# 11. Multiplying tensors via temporary reshaping

Example: Matrix Multiplication with a 3D Tensor (Temporary Reshape)
In this example, you learn what to do when a tensor has more than 2 dimensions, but you still want to use matrix multiplication (mm()), which works only with 2D tensors.

We start with a tensor x that is 3D (shape [4, 3, 2]). Each “row” in the last dimension contains 2 values, like [1, 2], [3, 4], etc.
We also create a small column vector y with shape [2, 1].

Because mm() requires 2D tensors, we temporarily reshape x into a 2D tensor with shape [-1, 2]. This keeps the same values in the same order in memory—only the shape changes.
After multiplication, we reshape the result back to a structured form (shape [4, 3, 1]) so it matches the original 3D layout.

Key idea: Reshaping changes only the tensor’s shape, not the order of elements, so the last dimension still contains the pairs of values we expect during multiplication.

In [None]:
x = torch.tensor([[[1.0, 2],   [3, 4],   [5, 6]], [[7, 8],   [9, 10],  [11, 12]], [[13, 14], [15, 16], [17, 18]], [[19, 20], [21, 22], [23, 24]]])

In [None]:
x.shape

In [None]:
y = torch.tensor([[2, 3]], dtype=torch.float32).t()

In [None]:
y.shape

In [None]:
x

In [None]:
y

In [None]:
print(x.shape)
print(x.dtype)
print(y.shape)
print(y.dtype)
print(x.view(-1,2).shape)
new_var = x.view(-1, 2).mm(y)  # (12, 2) mm (2, 1) -> (12, 1); x is already float32
new_var.view(4, 3, 1)          # reshape the result back to the 3D layout
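As a point of comparison, torch.matmul applies the same 2D multiplication to every slice along the leading dimension, so it should give the same result as the manual reshape. The sketch below rebuilds x with torch.arange (an assumption that reproduces the values 1 to 24 in the same order) and checks that both routes agree.

```python
import torch

x = torch.arange(1, 25, dtype=torch.float32).view(4, 3, 2)
y = torch.tensor([[2.0], [3.0]])                 # shape (2, 1)

via_reshape = x.view(-1, 2).mm(y).view(4, 3, 1)  # flatten, multiply, restore
via_matmul = torch.matmul(x, y)                  # broadcasts over the leading dimension

print(list(via_reshape.shape), torch.allclose(via_reshape, via_matmul))
```

The temporary reshape remains useful for seeing why the shapes line up; matmul simply hides that bookkeeping.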

**Exercise 11: Matrix Multiplication with a 3D Tensor**

This exercise shows a common beginner mistake: matrix multiplication fails if the two tensors have different dtypes (e.g., long vs float). You will first run a “wrong” version that produces the error, then fix it by converting the dtypes to match.

Data (use exactly these values):

3D tensor x with shape [2, 2, 2]:
[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

Column vector y (shape [2, 1]):
[[1.5], [2.0]]

Tasks:

Create x and y exactly as given.

Try to compute x.view(-1, 2).mm(y) (this should fail).

Fix the dtype mismatch and run the multiplication again.

Reshape the result back to shape [2, 2, 1].

Expected error message (or very similar): RuntimeError: expected m1 and m2 to have the same dtype, but got: long int != float

In [None]:
# --- Fix: make the dtypes match ---
# Option A (recommended): convert x to float
x_fix = x.to(torch.float32)

# Now the multiplication works
out = x_fix.view(-1, 2).mm(y)
out

In [None]:
# --- Reshape back to the structured 3D layout [2, 2, 1] ---
out_3d = out.reshape(2, 2, 1)
out_3d, out_3d.shape


# 12. Element-wise multiplication

This example shows that tensors can be multiplied in two different ways in PyTorch. First, it demonstrates matrix multiplication with mm(), which combines rows and columns and follows the rules of linear algebra (works for 2D tensors). Then it demonstrates element-wise multiplication with * (or torch.mul()), which multiplies values position by position and works for tensors of any dimension as long as their shapes are compatible. The 3D example confirms that element-wise multiplication keeps the same shape when the input shapes match.

In [None]:
a = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)

In [None]:

b = torch.tensor([[1, 2], [5, 3]], dtype=torch.float32)

In [None]:
a

In [None]:
b

In [None]:
a.mm(b)

In [None]:
a * b

In [None]:
torch.mul(a, b)

In [None]:
x = torch.tensor([[[1]], [[3]]], dtype=torch.float32)

In [None]:
y = torch.tensor([[[6]], [[2]]], dtype=torch.float32)

In [None]:
x.shape

In [None]:
y.shape

In [None]:
x * y

In [None]:
(x * y).shape

In [None]:
torch.mul(x, y)

In [None]:
torch.mul(x, y).shape

**Exercise 12: Matrix Multiplication vs. Element-wise Multiplication**

Use the following values exactly.

Part A (2D tensors)

Create two 2D tensors:

A = [[2, 0], [1, 3]]

B = [[4, 1], [2, 5]]

Tasks:

Create tensors A and B (use dtype=torch.float32) and display them.

Compute and display the matrix multiplication A.mm(B).

Compute and display the element-wise multiplication A * B.

Compute and display the same element-wise result using torch.mul(A, B).

Part B (3D tensors)

Create two 3D tensors:

X = [[[2]], [[5]]]

Y = [[[3]], [[4]]]

Tasks:
5. Create tensors X and Y (use dtype=torch.float32) and display X.shape and Y.shape.
6. Compute and display the element-wise multiplication X * Y.
7. Display the shape of the result using both:

(X * Y).shape

torch.mul(X, Y).shape

In one short sentence: explain the difference between mm() and * for tensors.

# 13. Shape mismatches aren't always the end of the world. Broadcasting is an important mechanism in PyTorch

This example demonstrates broadcasting in PyTorch during element-wise multiplication. Even if two tensors have a different number of dimensions, the operation can still work if their shapes are compatible: PyTorch automatically “expands” size-1 dimensions (and missing leading dimensions) so that both operands reach a common shape. Here, y.squeeze() reduces y to a 1D tensor of shape [2], yet x.mul(y_squeezed) still works: x has shape [2, 1, 1], y_squeezed is treated as shape [1, 1, 2], and the result has shape [2, 1, 2], so every element of x is multiplied by every element of y_squeezed. The example also shows why multiplying a tensor by a scalar (e.g., x * 3) applies the operation to every element: the scalar is broadcast across the whole tensor.

In [None]:
x = torch.tensor([[[1]], [[3]]], dtype=torch.float32)

In [None]:
x

In [None]:
x.shape

In [None]:
y = torch.tensor([[[6]], [[2]]], dtype=torch.float32)

In [None]:
y.shape

In [None]:
y_squeezed = y.squeeze()

In [None]:
y_squeezed

In [None]:
y_squeezed.shape

**Here are the criteria for broadcasting:**

When operating on two tensors:

1. Align shapes from the rightmost dimension

2. For each aligned dimension:

  - The sizes must either be:

    - equal, or

    - one of them is 1

3. Missing leading dimensions are treated as size 1

If all dimensions satisfy this rule → broadcasting works.
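The three rules above can be sketched in plain Python. `broadcast_shape` is a hypothetical helper for illustration (it assumes all sizes are at least 1) and is not PyTorch's own implementation.

```python
def broadcast_shape(a, b):
    """Return the broadcast result shape of two shapes, or raise ValueError."""
    n = max(len(a), len(b))
    # Rule 3: missing leading dimensions are treated as size 1.
    a = (1,) * (n - len(a)) + tuple(a)
    b = (1,) * (n - len(b)) + tuple(b)
    out = []
    # Rule 1: after padding, dimensions are compared position by position.
    for da, db in zip(a, b):
        # Rule 2: sizes must be equal, or one of them must be 1.
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        else:
            raise ValueError(f"sizes {da} and {db} are not compatible")
    return tuple(out)

print(broadcast_shape((2, 1, 1), (2,)))
try:
    broadcast_shape((3, 2), (3,))
except ValueError as err:
    print("error:", err)
```

The second call fails because, aligned from the right, the last dimensions are 2 and 3, and neither of them is 1.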

In [None]:
## In this example, shapes are aligned from the right:
## x          -> (2, 1, 1)
## y_squeezed ->       (2,)
## With the missing leading dims filled in:
## y_squeezed -> (1, 1, 2)
## -> Great! In every aligned dimension the lengths either match or one of them is 1,
##    so the result has shape (2, 1, 2)

z = x.mul(y_squeezed)
z

In [None]:
z.shape

In [None]:
x * 3

**Exercise 13: Broadcasting in Element-wise Multiplication (with squeeze())**

This exercise practices broadcasting in PyTorch: element-wise multiplication can still work even if two tensors have different numbers of dimensions, as long as their shapes are compatible.

Use the following values exactly.

Data

Create a 3D tensor x7 (shape [2, 1, 1]):
[[[2]], [[4]]]

Create a 3D tensor y7 (shape [2, 1, 1]):
[[[5]], [[3]]]

Tasks

Import PyTorch.

Create x7 and display it and its shape.

Create y7, then create y7_squeezed = y7.squeeze(). Display y7_squeezed and its shape.

Compute and display x7.mul(y7_squeezed) (this should work because of broadcasting).

Multiply x7 by the scalar -2 and display the result.

In one short sentence: explain why x7.mul(y7_squeezed) works even though the tensors have different numbers of dimensions.

# 14. Moving Tensors to the GPU (CUDA) for Faster Computation

This example demonstrates how to place PyTorch tensors on the GPU to speed up computations. First, it inspects the tensor’s device and type on the CPU. A tensor can be moved to the GPU with .cuda(), but that call fails when no CUDA-capable GPU is present, so the example uses the recommended, portable approach instead: select a device ("cuda" if available, otherwise "cpu") and move tensors to that device using .to(device).

In [None]:
a = torch.tensor([[1, 2], [3, 4]])

In [None]:
a

In [None]:
a.device

In [None]:
a.dtype

In [None]:
a.type()

In [None]:
torch.cuda.is_available()

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

In [None]:
b = a.to(device)

In [None]:
b

In [None]:
b.device

In [None]:
b.dtype

In [None]:
b.type()

**Exercise 14: Move a Tensor to GPU (CUDA) if Available**

This exercise practices how to check CUDA availability, choose a device, and move a tensor to the selected device using .to(device).

Data

Use this matrix exactly: [[10, 20, 30], [40, 50, 60]]

Tasks

Import PyTorch.

Create a tensor named t1 from the data above (use dtype=torch.float32) and display it.

Print t1.device and t1.dtype.

Check whether CUDA is available using torch.cuda.is_available() and display the result.

Create a variable named device using:
torch.device("cuda" if torch.cuda.is_available() else "cpu")

Move t1 to the selected device and store it in t2. Display t2.

Print t2.device and t2.dtype.

Theory question (one short sentence):
Why do we check torch.cuda.is_available() before using CUDA?

# 15. Automatic Differentiation (Autograd) and Gradient Accumulation in PyTorch

This example demonstrates how PyTorch can automatically compute derivatives using autograd. We treat a tensor w as a learnable variable by setting requires_grad=True, then define the expression
y = a * w**2 + a * w
Calling y.backward() computes the derivative dy/dw = 2 * a * w + a and stores it in w.grad. The example also shows that gradients accumulate by default: calling backward() twice adds the new gradient on top of the old one, so we must reset (zero) the gradient before recomputing it.

Note that y must be recomputed before backward() is called again: by default, backward() frees the intermediate buffers of the computation graph, so a second call on the same y raises an error unless retain_graph=True was passed to the first call.

In [None]:
torch.__version__

In [None]:
a = torch.tensor([10.0])

In [None]:
w = torch.tensor([2.0], requires_grad=True)

In [None]:
a, w

In [None]:
y = a * w**2 + a * w

In [None]:
y

In [None]:
w.grad

In [None]:
y.backward() # derivative w.r.t. w: 2aw + a = 50

In [None]:
w.grad

In [None]:
y = a * w**2 + a * w # must re-compute the graph!!
y.backward()

In [None]:
w.grad

In [None]:
w.grad.zero_()

In [None]:
y = a * w**2 + a * w

In [None]:
y.backward()

In [None]:
w.grad
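The accumulation behaviour can also be observed without rebuilding the graph by passing retain_graph=True to the first backward() call. The sketch below reuses the values a = 10 and w = 2 from this section and stores a copy of the gradient after each step.

```python
import torch

a = torch.tensor([10.0])
w = torch.tensor([2.0], requires_grad=True)

y = a * w**2 + a * w
y.backward(retain_graph=True)    # keep the graph so backward() can run again
first = w.grad.clone()           # dy/dw = 2*a*w + a = 50

y.backward()                     # second pass adds another 50 to w.grad
second = w.grad.clone()

w.grad.zero_()                   # reset before the next computation
y = a * w**2 + a * w             # rebuild the graph (the old one is now freed)
y.backward()
after_reset = w.grad.clone()

print(first, second, after_reset)
```

The clone() calls matter here: w.grad is updated in place, so keeping a reference without copying would show the latest value every time.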

**Exercise 15: Autograd and Gradient Accumulation in PyTorch**

This exercise practices how PyTorch computes derivatives automatically using autograd, how gradients are stored in .grad, and why gradients must be reset before recomputing them.

Tasks

Import PyTorch.

Create two tensors:

a1 = 4.0 (no gradient needed)

w1 = 3.0 with requires_grad=True

Define the function:
y1 = a1 * w1**2 + a1 * w1

Print w1.grad before calling backward().

Call y1.backward() and print w1.grad.

Recompute y1 (same formula), call y1.backward() a second time, and print w1.grad again (observe what happens).

Reset the gradient with w1.grad.zero_().

Recompute y1 (same formula) and call backward() again. Print w1.grad.

Theory question (one short sentence):
Why does w1.grad become larger after calling backward() twice?

# 16. Build a Simple Feed-Forward Neural Network Manually (Focus on Shapes)

This example puts earlier concepts together and shows how a simple feed-forward (FF) neural network can be built manually using matrix multiplication. The goal is not training yet, but to understand how tensor shapes must match at each step.

You will:

choose a device (GPU if available, otherwise CPU),

create a small input dataset and target values,

initialize weights and biases for a 2-layer network,

compute the forward pass step by step,

print the shapes to verify that every matrix multiplication and bias addition is valid.

Key idea: matrix multiplication requires matching inner dimensions, and biases must be broadcastable to the layer output shape.

In [None]:
import torch

In [None]:
# 1) Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", device)

In [None]:
# 2) Training data (5 samples, 2 features) and targets (5 samples, 1 output)
training_inputs = torch.tensor(
    [[1, 0],
     [1, 1],
     [0, 0],
     [0, 1],
     [1, 0]],
    dtype=torch.float32
).to(device)

training_targets = torch.tensor(
    [[1],
     [0],
     [0],
     [1],
     [1]],
    dtype=torch.float32
).to(device)

print("Input dimensions:", repr(training_inputs.shape))
print("Target dimensions:", repr(training_targets.shape))

In [None]:
# 3) Define network sizes
input_dim = training_inputs.shape[1]   # 2
hidden_dim = 1                         # keep it simple
output_dim = 1                         # binary output (one value)


In [None]:
# 4) Initialize weights and biases
# W1: (input_dim, hidden_dim) so inputs.mm(W1) -> (N, hidden_dim)
W1 = torch.randn(input_dim, hidden_dim, device=device)
b1 = torch.randn(hidden_dim, device=device)   # broadcastable bias

# W2: (hidden_dim, output_dim) so hidden.mm(W2) -> (N, output_dim)
W2 = torch.randn(hidden_dim, output_dim, device=device)
b2 = torch.randn(output_dim, device=device)   # broadcastable bias

print("\nShapes:")
print("W1 shape:", repr(W1.shape))
print("b1 shape:", repr(b1.shape))
print("W2 shape:", repr(W2.shape))
print("b2 shape:", repr(b2.shape))

In [None]:
# 5) Forward pass (no training yet)
# Layer 1: linear
z1 = training_inputs.mm(W1) + b1
print("\nShape of (inputs.mm(W1)):", repr(training_inputs.mm(W1).shape))
print("Shape after adding b1:", repr(z1.shape))

In [None]:
# Activation (optional, but typical in FF networks)
h1 = torch.sigmoid(z1)
print("Shape after sigmoid:", repr(h1.shape))


In [None]:
# Layer 2: linear
z2 = h1.mm(W2) + b2
print("\nShape of (h1.mm(W2)):", repr(h1.mm(W2).shape))
print("Shape after adding b2:", repr(z2.shape))

In [None]:
# Final output (e.g., probability-like output)
y_pred = torch.sigmoid(z2)
print("Final output shape:", repr(y_pred.shape))

In [None]:
# 6) (Optional) Show the output values
print("\nPredictions:\n", y_pred)

**Exercise 16: Manual Forward Pass of a Tiny Feed-Forward Network (Check Shapes)**

In this exercise you will build a very small 2-layer feed-forward network manually (no torch.nn yet). The focus is to practice shape matching for mm() and bias broadcasting.

Data (use exactly these values):

Inputs (6 samples, 3 features):

[[1, 0, 1],
 [0, 1, 1],
 [1, 1, 0],
 [0, 0, 1],
 [1, 0, 0],
 [0, 1, 0]]


Targets (6 samples, 1 output):

[[1], [0], [1], [0], [1], [0]]


Network sizes:

input_dim = 3

hidden_dim = 2

output_dim = 1

Tasks

Choose a device (cuda if available, else cpu).

Create training_inputs and training_targets on the device (dtype=torch.float32).

Initialize:

W1 with shape (input_dim, hidden_dim)

b1 with shape (1, hidden_dim) (broadcastable)

W2 with shape (hidden_dim, output_dim)

b2 with shape (1, output_dim) (broadcastable)

Compute a forward pass:

z1 = training_inputs.mm(W1) + b1

h1 = torch.sigmoid(z1)

z2 = h1.mm(W2) + b2

y_pred = torch.sigmoid(z2)

Print the shapes of training_inputs, W1, z1, h1, W2, z2, and y_pred to verify everything matches.

Print y_pred.

Theory question (one short sentence):
Why do we use b1 with shape (1, hidden_dim) instead of (6, hidden_dim)?

# 17. Manual Training Step with Autograd

This example demonstrates a fully manual training step for a tiny 2-layer feed-forward network in PyTorch without using torch.nn or optimizers. The network has two linear layers:

W1 is the weight matrix of the first (input → hidden) layer (shape: (input_dim, hidden_dim)), and b1 is the bias of the first layer (shape: (1, hidden_dim)), added after x.mm(W1).

W2 is the weight matrix of the second (hidden → output) layer (shape: (hidden_dim, output_dim)), and b2 is the bias of the output layer (shape: (1, output_dim)), added after h.mm(W2).

To make these parameters learnable, we set requires_grad=True so PyTorch tracks operations involving them and can compute gradients automatically.

In each training step we:

Zero stored gradients (because gradients accumulate by default).

Run a forward pass to compute predictions.

Compute a sum of squared errors loss.

Call loss.backward() to compute gradients for W1, b1, W2, and b2.

Update parameters using gradient descent with a learning rate.

The goal is to understand the core mechanics behind training: forward pass → loss → gradients → parameter update, and why we must reset gradients between steps.

In [None]:
# 1) Device (optional)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", device)

In [None]:
# 2) Small training set (5 samples, 2 features) + targets (5 samples, 1 output)
training_inputs = torch.tensor(
    [[1, 0],
     [1, 1],
     [0, 0],
     [0, 1],
     [1, 0]],
    dtype=torch.float32,
    device=device
)

training_targets = torch.tensor(
    [[1],
     [0],
     [0],
     [1],
     [1]],
    dtype=torch.float32,
    device=device
)

In [None]:
# 3) Network sizes
input_dim = 2
hidden_dim = 1
output_dim = 1

In [None]:
# 4) Initialize parameters
W1 = torch.randn(input_dim, hidden_dim, device=device, requires_grad=True)
b1 = torch.randn(1, hidden_dim, device=device, requires_grad=True)

W2 = torch.randn(hidden_dim, output_dim, device=device, requires_grad=True)
b2 = torch.randn(1, output_dim, device=device, requires_grad=True)

learning_rate = 0.01

def forward(x: torch.Tensor) -> torch.Tensor:
    """Compute network output (no activation here, just linear layers)."""
    h = x.mm(W1) + b1          # (N, hidden_dim)
    y = h.mm(W2) + b2          # (N, output_dim)
    return y

def compute_loss(y_pred: torch.Tensor, y_true: torch.Tensor) -> torch.Tensor:
    """Sum of squared errors."""
    losses = (y_pred - y_true) ** 2
    loss = losses.sum()
    return loss

def zero_gradients():
    """Zero stored gradients to avoid accumulation across steps."""
    for p in (W1, b1, W2, b2):
        if p.grad is not None:
            p.grad.zero_()

In [None]:
# --- One manual training step ---
zero_gradients()

y_pred = forward(training_inputs)
loss = compute_loss(y_pred, training_targets)
print("Loss:", loss.item())

loss.backward()  # compute gradients

In [None]:
W1

In [None]:
# Gradient descent update (manual)
with torch.no_grad():
    W1 -= learning_rate * W1.grad
    b1 -= learning_rate * b1.grad
    W2 -= learning_rate * W2.grad
    b2 -= learning_rate * b2.grad


In [None]:
W1

In [None]:
# (Optional) show gradients after backward
print("W1.grad:", W1.grad)
print("b1.grad:", b1.grad)
print("W2.grad:", W2.grad)
print("b2.grad:", b2.grad)
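
The update cell above wraps the in-place updates in torch.no_grad(). The sketch below, using a hypothetical parameter w and step size 0.1, illustrates one reason: PyTorch rejects an in-place change to a leaf tensor that requires gradients while tracking is active, whereas the same update is accepted inside the no_grad block.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # hypothetical parameter
(w ** 2).sum().backward()                         # w.grad = 2 * w

failed = False
try:
    w -= 0.1 * w.grad   # in-place update while autograd is tracking w
except RuntimeError as err:
    failed = True
    print("Without no_grad:", err)

with torch.no_grad():
    w -= 0.1 * w.grad   # same update, not recorded by autograd

print("updated w:", w)
print("still learnable:", w.requires_grad)
```

The parameter keeps requires_grad=True after the update, so the next forward pass is tracked as usual.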

**Exercise 17: One Manual Training Step with Autograd (2-Layer Network)**

In this exercise you will perform one full manual training step for a small 2-layer feed-forward network in PyTorch (no torch.nn, no optimizer). The goal is to practice:

- setting requires_grad=True for learnable parameters,
- computing a forward pass using mm() and bias addition,
- computing a loss,
- calling backward() to get gradients,
- updating parameters with gradient descent,
- zeroing gradients to avoid accumulation.

Data (use exactly these values):

Inputs (4 samples, 2 features):

[[1, 0],
 [0, 1],
 [1, 1],
 [0, 0]]


Targets (4 samples, 1 output):

[[1],
 [1],
 [0],
 [0]]


Network sizes:

input_dim = 2

hidden_dim = 2

output_dim = 1

Tasks

Choose a device (cuda if available, else cpu) and create training_inputs and training_targets (dtype=torch.float32) on it.

Initialize W1, b1, W2, b2 with random values and set requires_grad=True.

Write:

forward(x) that computes h = x.mm(W1) + b1 and y_pred = h.mm(W2) + b2

compute_loss(y_pred, y_true) that returns the sum of squared errors

zero_gradients() that zeros .grad for all parameters if it exists

Run one training step in this order:

zero_gradients()

y_pred = forward(training_inputs)

loss = compute_loss(y_pred, training_targets) and print loss.item()

loss.backward()

update all parameters with learning_rate = 0.05 inside a with torch.no_grad(): block

Print W1.grad once (to confirm gradients were computed).

Theory question (one short sentence):
Why must we call zero_gradients() before each new backward pass?

# 18. Complete Manual Training Loop

This example shows a complete manual training loop for a small neural network in PyTorch. It brings together everything you learned earlier:

- Forward pass: compute the model output y from the inputs.
- Loss computation: measure how wrong y is compared to the targets.
- Backward pass: call loss.backward() to compute gradients (.grad) for the learnable parameters (W1, b1, W2, b2).
- Parameter update (gradient descent): move each parameter a small step in the negative gradient direction, scaled by the learning rate.
- Repeat: run this cycle for many iterations (e.g., 2000) to gradually reduce the loss.

Key idea: training is an iterative process:
zero gradients → forward → loss → backward → update → repeat.
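
The same cycle can be tried on a one-parameter problem whose minimum is known in advance, which makes it easy to check that the loop behaves as expected. This is only a sketch: the parameter w, the step size lr, and the loss (w - 3)^2 are assumptions made for illustration, not part of the network used in this section.

```python
import torch

w = torch.tensor(0.0, requires_grad=True)  # hypothetical starting value
lr = 0.1                                   # hypothetical learning rate

for step in range(100):
    # zero gradients (w.grad is None before the first backward call)
    if w.grad is not None:
        w.grad.zero_()

    # forward + loss: the minimum is at w = 3
    loss = (w - 3.0) ** 2

    # backward: d(loss)/dw = 2 * (w - 3)
    loss.backward()

    # update: step against the gradient
    with torch.no_grad():
        w -= lr * w.grad

print("w after training:", w.item())
print("last loss:", loss.item())
```

Each iteration shrinks the distance between w and 3 by a constant factor, so w ends up very close to the known minimum.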

In [None]:
learning_rate = 0.01

In [None]:
training_inputs = torch.tensor(
    [[1, 0],
     [1, 1],
     [0, 0],
     [0, 1],
     [1, 0]],
    dtype=torch.float32,
    device=device
)

training_targets = torch.tensor(
    [[1],
     [0],
     [0],
     [1],
     [1]],
    dtype=torch.float32,
    device=device
)

input_dim = 2
hidden_dim = 1
output_dim = 1

W1 = torch.randn(input_dim, hidden_dim, device=device, requires_grad=True)
b1 = torch.randn(1, hidden_dim, device=device, requires_grad=True)

W2 = torch.randn(hidden_dim, output_dim, device=device, requires_grad=True)
b2 = torch.randn(1, output_dim, device=device, requires_grad=True)

learning_rate = 0.01

def forward(x: torch.Tensor) -> torch.Tensor:
    """Compute network output (no activation here, just linear layers)."""
    h = x.mm(W1) + b1          # (N, hidden_dim)
    y = h.mm(W2) + b2          # (N, output_dim)
    return y

def compute_loss(y_pred: torch.Tensor, y_true: torch.Tensor) -> torch.Tensor:
    """Sum of squared errors."""
    losses = (y_pred - y_true) ** 2
    loss = losses.sum()
    return loss

def zero_gradients():
    """Zero stored gradients to avoid accumulation across steps."""
    for p in (W1, b1, W2, b2):
        if p.grad is not None:
            p.grad.zero_()

def refresh(params):
    # Manual gradient descent update (safe way)
    with torch.no_grad():
        for p in params:
            p -= learning_rate * p.grad

for i in range(2000):
    zero_gradients()

    y = forward(training_inputs)
    loss = compute_loss(y, training_targets)


    if i % 200 == 0:
        print(f"round {i} - loss {loss.item()}")

    loss.backward()
    refresh([W1, b1, W2, b2])

print("output:")
print(repr(y))
print("final loss:", loss.item())
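
Because forward() applies no activation between the two layers, the network computes an affine function of its inputs. The sketch below checks this numerically on random values: the two-layer output matches a single layer with weight W1.mm(W2) and bias b1.mm(W2) + b2. The seed, the random input x, and the local names W, b, two_layer, and one_layer are assumptions used only for this check.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability

# Same shapes as in the example: 5 samples, 2 inputs, 1 hidden unit, 1 output.
x = torch.randn(5, 2)
W1, b1 = torch.randn(2, 1), torch.randn(1, 1)
W2, b2 = torch.randn(1, 1), torch.randn(1, 1)

# Two linear layers without an activation in between
two_layer = (x.mm(W1) + b1).mm(W2) + b2

# Equivalent single linear layer
W = W1.mm(W2)        # shape (2, 1)
b = b1.mm(W2) + b2   # shape (1, 1)
one_layer = x.mm(W) + b

print("same output:", torch.allclose(two_layer, one_layer, atol=1e-5))
```

This means the model in this example can only fit targets that are an affine function of the inputs; adding a nonlinearity such as torch.sigmoid between the layers, as in the earlier forward-pass exercise, removes that restriction.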

**Exercise 18: Train a Tiny Network with a Manual Training Loop (200 Iterations)**

In this exercise you will implement a full manual training loop for a small 2-layer network in PyTorch (no torch.nn, no optimizer). The goal is to practice the standard training cycle:

zero gradients → forward → loss → backward → update → repeat

Data (use exactly these values)

Inputs (4 samples, 2 features):

[[1, 0],
 [0, 1],
 [1, 1],
 [0, 0]]


Targets (4 samples, 1 output):

[[1],
 [1],
 [0],
 [0]]

Network sizes

input_dim = 2

hidden_dim = 2

output_dim = 1

Tasks

Choose a device (cuda if available, else cpu).

Create training_inputs and training_targets on the device (dtype=torch.float32).

Initialize learnable parameters W1, b1, W2, b2 with requires_grad=True.

Write:

forward(x) that computes h = x.mm(W1) + b1 and y_pred = h.mm(W2) + b2

compute_loss(y_pred, y_true) that returns sum of squared errors

zero_gradients() that zeros .grad for all parameters if it exists

refresh() that updates parameters using gradient descent with learning_rate = 0.05

Run a training loop for 200 iterations. Print the loss every 50 iterations.

After training, print the final predictions y_pred.

Theory question (one short sentence):
What are the five main steps repeated in each training iteration?