# PyTorch Notebook for External Learning

---

## Section 1: PyTorch Basics – Tensors


In [2]:
!pip install torch



**Q1.** What is a `torch.Tensor`? How is it different from a NumPy array?


**Definition:** A tensor is a multi-dimensional array and the core data structure in PyTorch.
It stores numerical data (scalars, vectors, matrices, or higher-dimensional arrays) and supports efficient computation on both CPU and GPU.

**How it differs from a NumPy array:**

- It supports GPUs: a tensor can be stored in GPU memory, which usually makes large numerical computations much faster.
- It supports automatic differentiation (autograd), so gradients can be computed without deriving them by hand; NumPy has no built-in equivalent.
- Tensors are designed for deep learning (neural networks), unlike NumPy arrays, which are aimed at general-purpose numerical computing.


In [4]:
# Q1 Code Task: Create a 1D tensor with values [1, 2, 3, 4, 5]
import torch
import numpy

tensor_1d = torch.tensor([1, 2, 3, 4, 5])
numpy_1d = numpy.array([1, 2, 3, 4, 5])

tensor_1d, numpy_1d

(tensor([1, 2, 3, 4, 5]), array([1, 2, 3, 4, 5]))

**Q2.** How can you convert a PyTorch tensor to a NumPy array and vice versa?

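1.  To convert a tensor to a NumPy array: use the built-in `.numpy()` method that every tensor has. The tensor must live on the CPU and must not require gradients.
2.  To convert a NumPy array to a tensor: use `torch.from_numpy()`. In both directions the result shares memory with the original, so no data is copied.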
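
The conversion functions above share memory rather than copying. A minimal sketch of what that means in practice; the names `arr`, `shared`, and `copied` are illustrative and not part of the original notebook:

```python
import numpy as np
import torch

arr = np.array([10, 20, 30])
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # torch.tensor() always copies the data

arr[0] = 99                      # modify the NumPy array in place

print(shared)  # the shared tensor sees the change
print(copied)  # the copy still holds the original values
```

If an independent tensor is needed, copying with `torch.tensor(arr)` or `shared.clone()` avoids surprises from in-place edits.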


In [None]:
# Q2 Code Task: Convert the tensor [10, 20, 30] into a NumPy array and back to a tensor.

tensor_q2 = torch.tensor([10, 20, 30])

num_from_ten = tensor_q2.numpy()  # tensor -> NumPy array
print(type(num_from_ten))

ten_from_num = torch.from_numpy(num_from_ten)  # NumPy array -> tensor
print(type(ten_from_num))

<class 'numpy.ndarray'>
<class 'torch.Tensor'>


**Q3.** Create a **2x3 tensor** filled with random numbers between 0 and 1. Print its shape and data type.


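PyTorch offers several factory functions for creating a tensor of a given shape, most of them random:

- `torch.rand(x, y)` - tensor of shape (x, y) with random floats drawn uniformly from [0, 1)
- `torch.randn(x, y)` - same shape, with values drawn from the standard normal distribution (mean 0, variance 1)
- `torch.zeros(x, y)` - filled with zeros (not random, but often used alongside the others)
- `torch.randint(low, high, size=(x, y))` - random integers from `low` (inclusive) to `high` (exclusive)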


In [6]:
# Q3 Code Task
rand_tensor = torch.rand(2, 3)
print(rand_tensor)
print(rand_tensor.shape)
print(rand_tensor.dtype)

tensor([[0.3014, 0.7377, 0.8437],
        [0.7214, 0.6504, 0.6914]])
torch.Size([2, 3])
torch.float32
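
To compare the factory functions listed above side by side, here is a small sketch. The seed value and variable names are arbitrary choices for illustration:

```python
import torch

torch.manual_seed(0)  # fix the seed so reruns give the same values

uniform = torch.rand(2, 3)                 # uniform on [0, 1)
normal = torch.randn(2, 3)                 # standard normal
zeros = torch.zeros(2, 3)                  # all zeros
ints = torch.randint(0, 10, size=(2, 3))   # integers in [0, 10)

for name, t in [("rand", uniform), ("randn", normal),
                ("zeros", zeros), ("randint", ints)]:
    print(name, tuple(t.shape), t.dtype)
```

Note that `randint` returns an integer tensor (`torch.int64`), while the other three default to `torch.float32`.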


**Q4.** Demonstrate element-wise addition and matrix multiplication with tensors. What is the difference between `*` and `@` operators in PyTorch?


Addition and subtraction use the same `+` and `-` operators as scalar arithmetic and are applied element-wise. PyTorch also provides equivalent functions such as `torch.add()` and `torch.sub()`.

Because we are working with matrices, there are two kinds of multiplication:

- Element-wise multiplication: `*` (or `torch.mul()`) multiplies matching entries.
- Matrix multiplication: `@` (or `torch.matmul()`) computes row-by-column dot products.


In [None]:
# Q4 Code Task: Perform element-wise addition and matrix multiplication on two 2x2 tensors.
ten1 = torch.tensor([[1, 2], [3, 4]])
ten2 = torch.tensor([[5, 6], [7, 8]])

print(ten1 + ten2)  # torch.add()
print(ten1 * ten2)  # torch.mul()
print(ten1 @ ten2)  # torch.matmul()

tensor([[ 6,  8],
        [10, 12]])
tensor([[ 5, 12],
        [21, 32]])
tensor([[19, 22],
        [43, 50]])
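
The two operators also differ in the shapes they accept, which the 2x2 example cannot show. A short sketch with non-square tensors (the tensors `a` and `b` are illustrative):

```python
import torch

a = torch.ones(2, 3)
b = torch.ones(2, 3)

elementwise = a * b      # shapes must match (or broadcast): result is (2, 3)
matmul = a @ b.T         # inner dimensions must agree: (2, 3) @ (3, 2) -> (2, 2)

print(elementwise.shape, matmul.shape)

try:
    a @ b                # (2, 3) @ (2, 3): inner dimensions 3 and 2 differ
except RuntimeError as e:
    print("matmul failed:", e)
```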


**Q5.** Explain broadcasting in PyTorch with an example.


**Broadcasting** allows PyTorch to perform element-wise operations on tensors of different shapes by virtually expanding the smaller tensor to match the larger one, without actually copying data. Shapes are compared from the last dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.


In [9]:
# Q5 Code Task: Add a tensor of shape (3,1) to a tensor of shape (3,4).
ten1 = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
ten2 = torch.tensor([[1], [2], [3]])

res = ten1 + ten2
print(res)

tensor([[ 2,  3,  4,  5],
        [ 7,  8,  9, 10],
        [12, 13, 14, 15]])
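
One way to see what broadcasting does to the (3, 1) tensor is to expand it explicitly. This is a sketch for intuition only; PyTorch does the equivalent internally during the addition:

```python
import torch

col = torch.tensor([[1], [2], [3]])            # shape (3, 1)

# Shape that the two operands broadcast to
print(torch.broadcast_shapes((3, 1), (3, 4)))  # torch.Size([3, 4])

# expand() gives a (3, 4) view of col without copying
expanded = col.expand(3, 4)
print(expanded)
print(expanded.stride())  # stride 0 along dim 1: each value is reused, not copied

# Incompatible shapes raise an error: trailing sizes 4 and 3 differ and neither is 1
try:
    torch.ones(3, 4) + torch.ones(3)
except RuntimeError as e:
    print("broadcast failed:", e)
```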


**Q6.** What is the difference between `view()` and `reshape()` in PyTorch?


**view**

**Purpose:** Returns a new tensor that shares the same underlying data but has a different shape; no data is copied.
**Important:**

- The requested shape must be compatible with the tensor's memory layout (sizes and strides); a contiguous tensor always qualifies.
- If the layout is incompatible, for example after a transpose, `view` raises a `RuntimeError`.

**reshape**

**Purpose:** Returns a tensor with a different shape, just like `view`.
**Key difference:**

- `reshape` can handle non-contiguous tensors.
- It returns a view when possible; otherwise it copies the data under the hood to produce the desired shape.


In [None]:
# Q6 Code Task: Create a tensor of shape (2,3) and reshape it to (3,2).

tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])

print("Original Tensor (2x3):")
print(tensor)
print("Shape:", tensor.shape)

reshaped_tensor = tensor.view(3, 2)  # or tensor.reshape(3,2)
print("\nReshaped Tensor (3x2):")
print(reshaped_tensor)
print("Shape:", reshaped_tensor.shape)

Original Tensor (2x3):
tensor([[1, 2, 3],
        [4, 5, 6]])
Shape: torch.Size([2, 3])

Reshaped Tensor (3x2):
tensor([[1, 2],
        [3, 4],
        [5, 6]])
Shape: torch.Size([3, 2])
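
The example above uses a contiguous tensor, where `view` and `reshape` behave the same. A minimal sketch of the case where they differ, using a transposed tensor (variable names are illustrative):

```python
import torch

t = torch.arange(6).reshape(2, 3)
transposed = t.t()                     # shape (3, 2), no longer contiguous
print(transposed.is_contiguous())      # False

try:
    transposed.view(6)                 # layout is incompatible with a flat view
except RuntimeError as e:
    print("view failed:", e)

flat = transposed.reshape(6)           # works: falls back to copying
print(flat)                            # tensor([0, 3, 1, 4, 2, 5])

# A view shares storage with its source, so in-place edits show up in both
v = t.view(3, 2)
v[0, 0] = 100
print(t[0, 0])                         # tensor(100)
```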


**Q7.** How do you check if a tensor is allocated on **CPU or GPU**?


Every tensor has a `.device` attribute that tells you where it is stored: on the CPU or on a GPU (for example `cuda:0`).


In [None]:
# Q7 Code Task: Create a tensor and move it to GPU (if available).

# Tensor on CPU
a = torch.tensor([1, 2, 3])
print(a.device)  # Output: cpu

# Tensor on GPU (if available)
if torch.cuda.is_available():
    b = a.to("cuda")
    print(b.device)  # Output: cuda:0 (first GPU)

cpu
cuda:0
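
A common pattern is to pick the device once and reuse it, so the same code runs with or without a GPU. A sketch under that assumption (the `device` variable is a convention, not something the notebook defines):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.tensor([1, 2, 3], device=device)        # create directly on the device
b = torch.ones(3, dtype=torch.int64).to(device)   # or move an existing tensor

print(a.device, a.is_cuda)
# Operands must be on the same device; .cpu() moves the result back
print((a + b).cpu())
```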


**Q8.** Create an **identity matrix** of size 4x4 in PyTorch.


`torch.eye(n)` creates an n x n identity matrix (ones on the diagonal, zeros elsewhere).


In [6]:
# Q8 Code Task
I4 = torch.eye(4)
print(I4)

tensor([[1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [0., 0., 1., 0.],
        [0., 0., 0., 1.]])


**Q9.** How do you find the maximum, minimum, and mean values of a tensor?


PyTorch provides built-in methods for all three: `.max()`, `.min()`, and `.mean()`. Note that `.mean()` only works on floating-point (or complex) tensors, so an integer tensor must be converted first, for example with `.float()`.


In [8]:
# Q9 Code Task: Compute max, min, mean of tensor [4, 7, 9, 2, 5].
ten = torch.tensor([4, 7, 9, 2, 5])

print(ten.max())
print(ten.min())
print(ten.float().mean())

tensor(9)
tensor(2)
tensor(5.4000)
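
The sketch below shows what happens without the `.float()` conversion, and how the same reductions work along one dimension of a 2D tensor. The matrix `m` is made up for illustration:

```python
import torch

ints = torch.tensor([4, 7, 9, 2, 5])
try:
    ints.mean()                       # integer dtype is not supported
except RuntimeError as e:
    print("mean failed:", e)

m = torch.tensor([[1.0, 5.0, 3.0],
                  [4.0, 2.0, 6.0]])
print(m.mean(dim=0))                  # column means: tensor([2.5000, 3.5000, 4.5000])

values, indices = m.max(dim=1)        # per-row maximum and its position
print(values, indices)                # tensor([5., 6.]) tensor([1, 2])
```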


**Q10.** Explain slicing and indexing in tensors with an example.


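**Indexing:** Access individual elements with square brackets `[ ]`, using one index per dimension (for example `tensor[0, 1]`).

**Slicing:** Slicing uses the `start:stop:step` syntax, just like Python lists, and can be applied to each dimension independently. Unlike Python lists, negative steps are not supported.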


In [None]:
# Q10 Code Task: Create a 3x3 tensor and extract the first row and last column.

tensor = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print("Original Tensor:")
print(tensor)

first_row = tensor[0, :]
print("\nFirst Row:")
print(first_row)

last_column = tensor[:, -1]
print("\nLast Column:")
print(last_column)

Original Tensor:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

First Row:
tensor([1, 2, 3])

Last Column:
tensor([3, 6, 9])
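
A few more indexing forms on the same 3x3 tensor, as a quick sketch: a single element, a stepped slice, a sub-block, and a boolean mask.

```python
import torch

t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(t[1, 2])        # single element: tensor(6)
print(t[::2, :])      # every second row: rows 0 and 2
print(t[:2, 1:])      # top-right 2x2 block: tensor([[2, 3], [5, 6]])
print(t[t > 5])       # boolean mask: tensor([6, 7, 8, 9])
```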


---

## Section 2: Autograd & Gradients


**Q11.** What is autograd in PyTorch? Why is it useful?


**Autograd** in PyTorch is a module that automatically calculates gradients (derivatives) for tensors that have `requires_grad=True`.

- In deep learning, we need to compute gradients of a loss with respect to model parameters (weights & biases).
- Doing this manually is tedious and error-prone.
- Autograd does it automatically, so you can focus on building models instead of calculating derivatives.


In [None]:
# Q11 Code Task: Create a tensor `x` with requires_grad=True and compute gradient of y = x**2

x = torch.tensor([3.0], requires_grad=True)
y = x**2

y.backward()  # Computes the gradients dy/dx

print("y : ", y)
print("Gradient of y : ", x.grad)

y :  tensor([9.], grad_fn=<PowBackward0>)
Gradient of y :  tensor([6.])
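
Two details are easy to miss in the single-element example: `backward()` expects a scalar, and gradients accumulate across calls. A minimal sketch of both (the names `first` and `second` are illustrative):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

(x ** 2).sum().backward()      # backward() needs a scalar, so reduce with sum()
first = x.grad.clone()
print(first)                   # d/dx sum(x^2) = 2x -> tensor([2., 4., 6.])

(x ** 2).sum().backward()      # a second call adds to the stored gradient
second = x.grad.clone()
print(second)                  # tensor([ 4.,  8., 12.])

x.grad.zero_()                 # reset before the next computation
```

This accumulation is why training loops reset gradients on every iteration.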


**Q12.** Explain the difference between `.backward()` and `.detach()`.


`.backward()`

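**Purpose:** Computes the gradients of a scalar tensor (usually a loss) w.r.t. every leaf tensor in its graph that has `requires_grad=True`, and accumulates them in each leaf's `.grad`

**Used for:** Training neural networks, updating weights

**Requirements:** The tensor must be part of a computational graph, i.e. it must depend on at least one tensor with `requires_grad=True`; for a non-scalar tensor, a `gradient` argument must be passed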

`.detach()`

**Purpose:** Creates a new tensor that shares the same data but is not tracked by Autograd

**Used for:**

- Temporarily stopping gradient computation
- Avoiding memory overhead during inference

**Effect:** The new tensor does not track operations, so `.backward()` does not propagate gradients through it.


In [14]:
# Q12 Code Task: Show how to stop gradient tracking for a tensor.

# Create a tensor with gradient tracking enabled
x = torch.tensor([2.0, 3.0], requires_grad=True)
print("Before detaching: requires_grad =", x.requires_grad)

# Stop gradient tracking
x_detached = x.detach()
print("After detaching: requires_grad =", x_detached.requires_grad)

Before detaching: requires_grad = True
After detaching: requires_grad = False
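
To see the effect of `.detach()` on an actual gradient, the sketch below uses a detached copy as one factor of a product. It also shows `torch.no_grad()`, a related tool that the notebook uses later in its training loops:

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)

# x.detach() is treated as a constant, so only the first factor contributes
z = (x * x.detach()).sum()
z.backward()
print(x.grad)                 # tensor([2., 3.]) rather than 2x = tensor([4., 6.])

# torch.no_grad() disables tracking for everything computed inside the block
with torch.no_grad():
    y = x * 2
print(y.requires_grad)        # False
```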


**Q13.** Compute gradients for y = 3x^3 + 2x^2 + 5 at x=2 using autograd.

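Differentiating by hand gives dy/dx = 9x^2 + 4x, so at x = 2 the expected gradient is 9(4) + 4(2) = 44. Autograd should reproduce this value.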


In [None]:
# Q13 Code Task
x = torch.tensor([2.0], requires_grad=True)
y = 3 * x**3 + 2 * x**2 + 5

y.backward()

print("Gradient of y : ", x.grad)

Gradient of y :  tensor([44.])


**Q14.** What happens if you call `.backward()` on a tensor without `requires_grad=True`?


If you call `.backward()` on a tensor that was not computed from any tensor with `requires_grad=True`, PyTorch raises a `RuntimeError`, whether or not the tensor is a scalar.

**Reason:** PyTorch’s Autograd only tracks operations on tensors with `requires_grad=True`.

Without it, there’s no computational graph, so gradients cannot be computed.


In [None]:
# Q14 Code Task: Demonstrate the error with an example.

x = torch.tensor(2.0)  # requires_grad=False by default
y = x**2

# Try to compute gradient
try:
    y.backward()
except RuntimeError as e:  # raised because y has no grad_fn
    print("Error:", e)

Error: element 0 of tensors does not require grad and does not have a grad_fn


**Q15.** Perform gradient descent on f(w) = (w-3)^2 for 10 iterations with learning rate 0.1.


In [None]:
# Q15 Code Task

w = torch.tensor([0.0], requires_grad=True)
learning_rate = 0.1

for i in range(10):
    f = (w - 3) ** 2

    f.backward()

    with torch.no_grad():  # temporarily stop tracking
        w -= learning_rate * w.grad

    w.grad.zero_()

    # Note: f was computed before the update, so it is the loss at the previous w
    print(f"Iteration {i+1}: w = {w.item()}, f(w) = {f.item()}")

Iteration 1: w = 0.6000000238418579, f(w) = 9.0
Iteration 2: w = 1.0800000429153442, f(w) = 5.760000228881836
Iteration 3: w = 1.4639999866485596, f(w) = 3.6863999366760254
Iteration 4: w = 1.7711999416351318, f(w) = 2.3592960834503174
Iteration 5: w = 2.0169599056243896, f(w) = 1.5099495649337769
Iteration 6: w = 2.2135679721832275, f(w) = 0.9663678407669067
Iteration 7: w = 2.370854377746582, f(w) = 0.6184753179550171
Iteration 8: w = 2.4966835975646973, f(w) = 0.39582422375679016
Iteration 9: w = 2.597346782684326, f(w) = 0.2533273994922638
Iteration 10: w = 2.677877426147461, f(w) = 0.16212961077690125


---

## Section 3: Building Neural Networks


**Q16.** What is `torch.nn.Module` and why is it useful?


`torch.nn.Module` is a **base class** in PyTorch that all **neural network models** inherit from.

It provides a convenient way to define layers, parameters, and the forward pass of a model, and automatically tracks **learnable parameters** for optimization.


In [None]:
# Q16 Code Task: Define a simple linear model y = Wx + b using torch.nn.Linear
import torch.nn as nn

# Define input and output dimensions
input_dim = 1  # x has 1 feature
output_dim = 1  # y has 1 output

# Define the linear model
linear_model = nn.Linear(input_dim, output_dim)

# Print model parameters
print("Model:", linear_model)
print("Weight:", linear_model.weight)
print("Bias:", linear_model.bias)

# Example: Forward pass
x = torch.tensor([[2.0]])
y_pred = linear_model(x)
print("Predicted y:", y_pred)

Model: Linear(in_features=1, out_features=1, bias=True)
Weight: Parameter containing:
tensor([[-0.3079]], requires_grad=True)
Bias: Parameter containing:
tensor([0.0311], requires_grad=True)
Predicted y: tensor([[-0.5847]], grad_fn=<AddmmBackward0>)


**Q17.** Create a feedforward neural network with 2 input features, 1 hidden layer (size=4, ReLU), and 1 output.


In [None]:
# Q17 Code Task
import torch.nn as nn
import torch.nn.functional as F


class FeedForwardNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Input layer to hidden layer
        self.hidden = nn.Linear(2, 4)  # 2 input features, 4 hidden neurons
        # Hidden layer to output
        self.output = nn.Linear(4, 1)  # 1 output

    def forward(self, x):
        x = F.relu(self.hidden(x))  # ReLU activation on hidden layer
        x = self.output(x)  # Output layer (no activation)
        return x


# Instantiate the model
model = FeedForwardNN()
print(model)

# Example forward pass
x = torch.tensor([[1.0, 2.0]])
y_pred = model(x)
print("Predicted y:", y_pred)

FeedForwardNN(
  (hidden): Linear(in_features=2, out_features=4, bias=True)
  (output): Linear(in_features=4, out_features=1, bias=True)
)
Predicted y: tensor([[0.4487]], grad_fn=<AddmmBackward0>)
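
For a plain stack of layers like this one, `nn.Sequential` is a common alternative to writing a subclass. The sketch below builds an equivalent architecture and counts its parameters; `seq_model` and `batch` are illustrative names:

```python
import torch
import torch.nn as nn

seq_model = nn.Sequential(
    nn.Linear(2, 4),   # input -> hidden
    nn.ReLU(),
    nn.Linear(4, 1),   # hidden -> output
)

n_params = sum(p.numel() for p in seq_model.parameters())
print(n_params)        # (2*4 + 4) + (4*1 + 1) = 17

batch = torch.rand(5, 2)          # 5 samples, 2 features each
print(seq_model(batch).shape)     # torch.Size([5, 1])
```

A subclass remains the better fit once the forward pass needs branching or custom logic.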


**Q18.** Explain the role of activation functions. Implement ReLU and Sigmoid manually in PyTorch.


**Role of Activation Functions**

**1. Introduce Non-Linearity**

- Without activation functions, a neural network with multiple layers would behave like a single linear layer.
- Non-linearity allows the network to model complex patterns.

**2. Control Output Range**

- Some activations (like Sigmoid) squash outputs between 0 and 1.
- Useful for probabilities, classification, etc.

**3. Enable Learning**

- Activations like ReLU help avoid vanishing gradients in deep networks.


In [None]:
# Q18 Code Task: Define functions relu(x) and sigmoid(x) using tensors.

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])


# --- ReLU function ---
def relu(x):
    return torch.maximum(torch.tensor(0.0), x)


# --- Sigmoid function ---
def sigmoid(x):
    return 1 / (1 + torch.exp(-x))


# Test the functions
print("Input:", x)
print("ReLU output:", relu(x))
print("Sigmoid output:", sigmoid(x))

Input: tensor([-2.0000, -0.5000,  0.0000,  0.5000,  2.0000])
ReLU output: tensor([0.0000, 0.0000, 0.0000, 0.5000, 2.0000])
Sigmoid output: tensor([0.1192, 0.3775, 0.5000, 0.6225, 0.8808])
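
As a sanity check, the manual definitions can be compared with PyTorch's built-in `torch.relu` and `torch.sigmoid`. A short sketch:

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

manual_relu = torch.maximum(torch.tensor(0.0), x)
manual_sigmoid = 1 / (1 + torch.exp(-x))

print(torch.allclose(manual_relu, torch.relu(x)))        # True
print(torch.allclose(manual_sigmoid, torch.sigmoid(x)))  # True
```

In real models the built-ins are usually preferable, since they are numerically more careful for large inputs.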


**Q19.** What is the difference between `model.parameters()` and `model.state_dict()`?


**model.parameters()**

- Returns an iterator over all learnable parameters (weights and biases) of the model.
- These are nn.Parameter tensors that require gradients.
- Typically used when defining an optimizer.

**model.state_dict()**

- Returns an ordered dictionary that maps each parameter and buffer name (for example `weight` or `hidden.bias`) to its tensor.
- Includes: weights and biases, plus other persistent buffers (e.g., running mean/variance in BatchNorm)
- Commonly used for saving and loading models.


In [None]:
# Q19 Code Task: Print the parameters of a small linear layer.

linear_layer = nn.Linear(2, 1)

print("Learnable parameters:")
for param in linear_layer.parameters():
    print(param)

print("\nParameters with names:")
for name, param in linear_layer.state_dict().items():
    print(name, param)

Learnable parameters:
Parameter containing:
tensor([[-0.1711, -0.1451]], requires_grad=True)
Parameter containing:
tensor([-0.1956], requires_grad=True)

Parameters with names:
weight tensor([[-0.1711, -0.1451]])
bias tensor([-0.1956])


**Q20.** Implement forward pass of a 2-layer network without using nn.Module.


In [None]:
# Q20 Code Task

x = torch.tensor([[1.0, 2.0]])  # shape: [1, 2]

# --- Initialize weights and biases manually ---
# Layer 1 (input → hidden)
W1 = torch.randn(2, 4)  # weights for hidden layer
b1 = torch.randn(4)  # bias for hidden layer

# Layer 2 (hidden → output)
W2 = torch.randn(4, 1)  # weights for output layer
b2 = torch.randn(1)  # bias for output layer


# --- Define ReLU activation ---
def relu(x):
    return torch.maximum(torch.tensor(0.0), x)


# --- Forward pass ---
# Hidden layer computation
hidden = relu(x @ W1 + b1)  # x @ W1 = matrix multiplication

# Output layer computation
output = hidden @ W2 + b2

print("Hidden layer output:", hidden)
print("Network output:", output)

Hidden layer output: tensor([[0.2690, 0.0000, 0.0000, 0.8984]])
Network output: tensor([[-0.8663]])
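
One detail worth knowing when comparing this manual version with `nn.Linear`: the library stores weights as (out_features, in_features), the transpose of the (2, 4) matrix used here. A sketch that checks the first layer against `F.linear`; the seed is arbitrary:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.tensor([[1.0, 2.0]])
W1, b1 = torch.randn(2, 4), torch.randn(4)

manual = x @ W1 + b1
# F.linear computes x @ weight.T + bias, so the (2, 4) matrix must be transposed
library = F.linear(x, W1.T, b1)

print(torch.allclose(manual, library, atol=1e-6))   # True
```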


---

## Section 4: Training a Simple Model (Logic Gates)


**Q21.** What is the purpose of a loss function? Give two common examples.


A **loss function** (or cost function) measures how far the network's predictions are from the actual target values.

- It quantifies the error between predicted outputs and true outputs.
- The network tries to minimize this loss using optimization algorithms like gradient descent.

Examples: Mean Squared Error (typically for regression) and Cross-Entropy (typically for classification)


In [None]:
# Q21 Code Task: Use torch.nn.MSELoss to compute loss between y_true=[1.0, 2.0] and y_pred=[1.5, 2.5].

# True and predicted values
y_true = torch.tensor([1.0, 2.0])
y_pred = torch.tensor([1.5, 2.5])

# Define the MSE loss
mse_loss = nn.MSELoss()

# Compute the loss
loss = mse_loss(y_pred, y_true)

print("MSE Loss:", loss.item())

MSE Loss: 0.25
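
Both example losses can be written out by hand, which makes clear what the library classes compute. In this sketch the `logits` and `target` values are made up for illustration:

```python
import torch
import torch.nn as nn

y_true = torch.tensor([1.0, 2.0])
y_pred = torch.tensor([1.5, 2.5])

manual_mse = ((y_pred - y_true) ** 2).mean()   # (0.25 + 0.25) / 2
print(manual_mse.item())                        # 0.25

# Cross-entropy takes raw scores (logits) and integer class labels
logits = torch.tensor([[2.0, 0.5, 0.1]])        # hypothetical scores for 3 classes
target = torch.tensor([0])                      # index of the true class
ce = nn.CrossEntropyLoss()(logits, target)
manual_ce = -torch.log_softmax(logits, dim=1)[0, 0]
print(torch.allclose(ce, manual_ce))            # True
```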


**Q22.** What is the role of an optimizer in training neural networks?


An **optimizer** is an algorithm that updates the model’s parameters (weights and biases) based on the computed gradients to minimize the loss function.

- During training, the network computes the gradient of the loss w.r.t. each parameter using Autograd.
- The optimizer uses these gradients to adjust the parameters in the direction that reduces the loss.


In [None]:
# Q22 Code Task: Define SGD optimizer for a linear model with learning rate=0.01.

model = nn.Linear(2, 1)  # 2 input features → 1 output

# Define SGD optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Print optimizer to check
print(optimizer)

SGD (
Parameter Group 0
    dampening: 0
    differentiable: False
    foreach: None
    fused: None
    lr: 0.01
    maximize: False
    momentum: 0
    nesterov: False
    weight_decay: 0
)
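
With the default settings printed above (no momentum, no weight decay), one SGD step is simply `param = param - lr * grad`. The sketch below checks this against `optimizer.step()`; the loss is a made-up placeholder used only to produce gradients:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.tensor([[1.0, 2.0]])
loss = model(x).pow(2).sum()      # hypothetical loss, just to get gradients

optimizer.zero_grad()
loss.backward()

# Plain SGD update computed by hand, before the optimizer changes the weights
expected = model.weight.detach() - 0.01 * model.weight.grad
optimizer.step()

print(torch.allclose(model.weight.detach(), expected))   # True
```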


**Q23.** Train a simple linear regression model to fit y = 2x + 1 for x in [1,2,3,4,5].


In [None]:
# Q23 Code Task
import torch.optim as optim

x_train = torch.tensor([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = torch.tensor([[3.0], [5.0], [7.0], [9.0], [11.0]])

model = nn.Linear(1, 1)

criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

num_epochs = 1000

for epoch in range(num_epochs):
    # Forward pass
    y_pred = model(x_train)

    # Compute loss
    loss = criterion(y_pred, y_train)

    # Backward pass
    optimizer.zero_grad()
    loss.backward()

    # Update parameters
    optimizer.step()

    if (epoch + 1) % 100 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

# Example prediction
x_test = torch.tensor([[6.0]])
y_test = model(x_test)
print("Prediction for x=6:", y_test.item())

Epoch 100, Loss: 0.0645
Epoch 200, Loss: 0.0327
Epoch 300, Loss: 0.0166
Epoch 400, Loss: 0.0085
Epoch 500, Loss: 0.0043
Epoch 600, Loss: 0.0022
Epoch 700, Loss: 0.0011
Epoch 800, Loss: 0.0006
Epoch 900, Loss: 0.0003
Epoch 1000, Loss: 0.0001
Prediction for x=6: 13.018630027770996
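
Because this is ordinary least squares, the result of training can be compared with a direct solution. The sketch below uses `torch.linalg.lstsq` as an independent check, not as a replacement for the training loop:

```python
import torch

x = torch.tensor([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = 2 * x + 1

# Design matrix: one column for x, one constant column for the bias
A = torch.cat([x, torch.ones_like(x)], dim=1)
solution = torch.linalg.lstsq(A, y).solution    # least-squares [w, b]

w, b = solution.flatten().tolist()
print(f"w = {w:.4f}, b = {b:.4f}")              # close to w = 2, b = 1
```

The trained model's prediction of about 13.02 for x = 6 is close to the exact value 2 * 6 + 1 = 13, so gradient descent has nearly reached this solution.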


**Q24.** Implement and train a neural network for the AND gate.


In [None]:
# Q24 Code Task

X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
Y = torch.tensor([[0], [0], [0], [1]], dtype=torch.float32)


class ANDNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 2)  # 2 input features → 2 hidden neurons
        self.fc2 = nn.Linear(2, 1)  # 2 hidden neurons → 1 output

    def forward(self, x):
        x = torch.sigmoid(self.fc1(x))  # Sigmoid activation on hidden layer
        x = torch.sigmoid(self.fc2(x))  # Sigmoid activation on output layer
        return x


model = ANDNet()

criterion = nn.BCELoss()  # Binary Cross-Entropy Loss
optimizer = optim.SGD(model.parameters(), lr=0.1)

num_epochs = 5000

for epoch in range(num_epochs):
    # Forward pass
    y_pred = model(X)

    # Compute loss
    loss = criterion(y_pred, Y)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 500 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

# Make predictions
with torch.no_grad():
    predictions = model(X)
    predicted_classes = (predictions > 0.5).float()
    print("Predicted probabilities:\n", predictions)
    print("Predicted classes:\n", predicted_classes)

Epoch 500, Loss: 0.5123
Epoch 1000, Loss: 0.3470
Epoch 1500, Loss: 0.1759
Epoch 2000, Loss: 0.0933
Epoch 2500, Loss: 0.0575
Epoch 3000, Loss: 0.0398
Epoch 3500, Loss: 0.0298
Epoch 4000, Loss: 0.0235
Epoch 4500, Loss: 0.0193
Epoch 5000, Loss: 0.0162
Predicted probabilities:
 tensor([[5.7623e-04],
        [1.3765e-02],
        [1.1081e-02],
        [9.6140e-01]])
Predicted classes:
 tensor([[0.],
        [0.],
        [0.],
        [1.]])
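
AND is linearly separable, so a hidden layer is not strictly required. The sketch below trains a single linear unit instead; `nn.BCEWithLogitsLoss`, the learning rate, and the epoch count are choices made for this illustration, not taken from the notebook:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
Y = torch.tensor([[0], [0], [0], [1]], dtype=torch.float32)

neuron = nn.Linear(2, 1)                    # no hidden layer
criterion = nn.BCEWithLogitsLoss()          # applies the sigmoid internally
optimizer = torch.optim.SGD(neuron.parameters(), lr=1.0)

for _ in range(2000):
    optimizer.zero_grad()
    loss = criterion(neuron(X), Y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    classes = (torch.sigmoid(neuron(X)) > 0.5).float()
print(classes.flatten())
```

The same single-unit model cannot represent XOR, which is why the next question asks for a hidden layer.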


**Q25.** Implement and train a neural network for the XOR gate (with hidden layer).


In [None]:
# Q25 Code Task

X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
Y = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)


class XORNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(2, 2)  # 2 inputs → 2 hidden neurons
        self.output = nn.Linear(2, 1)  # 2 hidden neurons → 1 output

    def forward(self, x):
        x = torch.sigmoid(self.hidden(x))  # hidden layer activation
        x = torch.sigmoid(self.output(x))  # output layer activation
        return x


model = XORNet()

criterion = nn.BCELoss()  # Binary Cross-Entropy for binary classification
optimizer = optim.SGD(model.parameters(), lr=0.1)

num_epochs = 10000

for epoch in range(num_epochs):
    # Forward pass
    y_pred = model(X)

    # Compute loss
    loss = criterion(y_pred, Y)

    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 1000 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

with torch.no_grad():
    predictions = model(X)
    predicted_classes = (predictions > 0.5).float()
    print("Predicted probabilities:\n", predictions)
    print("Predicted classes:\n", predicted_classes)

Epoch 1000, Loss: 0.6932
Epoch 2000, Loss: 0.6931
Epoch 3000, Loss: 0.6931
Epoch 4000, Loss: 0.6930
Epoch 5000, Loss: 0.6929
Epoch 6000, Loss: 0.6927
Epoch 7000, Loss: 0.6920
Epoch 8000, Loss: 0.6886
Epoch 9000, Loss: 0.6612
Epoch 10000, Loss: 0.5298
Predicted probabilities:
 tensor([[0.4046],
        [0.4255],
        [0.7603],
        [0.3758]])
Predicted classes:
 tensor([[0.],
        [0.],
        [1.],
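        [0.]])

Note: this run has not converged. The input [0, 1] is classified as 0 although XOR expects 1, and the loss (0.5298) is still falling at the last epoch. With only 2 sigmoid hidden units and plain SGD, training sat on a plateau near ln 2 ≈ 0.6931 for most of the run, so more epochs, a larger learning rate, or a different initialization would be needed to get all four outputs right.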
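
XOR training is sensitive to the optimizer and to the hidden layer size. The variant below is a sketch under assumed changes (8 tanh hidden units, `nn.BCEWithLogitsLoss`, and the Adam optimizer); none of these come from the original notebook. No expected output is shown because results depend on the seed and PyTorch version:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
Y = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)

# Assumed changes: wider tanh hidden layer, logits output, Adam optimizer
xor_model = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 1))
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(xor_model.parameters(), lr=0.05)

initial_loss = criterion(xor_model(X), Y).item()
for _ in range(2000):
    optimizer.zero_grad()
    loss = criterion(xor_model(X), Y)
    loss.backward()
    optimizer.step()
final_loss = loss.item()

with torch.no_grad():
    classes = (torch.sigmoid(xor_model(X)) > 0.5).float()
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
print(classes.flatten())
```

Comparing the printed classes with `Y` shows whether a given run has actually learned XOR.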
