# Check for Understanding — Autograded (PyTorch)
Run each cell. **Do not delete the asserts.**

**Passing condition:** all asserts pass.

Tip: If an assert fails, read its message, fix your code, and rerun the cell.


In [1]:
# Setup
import torch
import torch.nn as nn

torch.manual_seed(42)

def _is_close(a, b, tol=1e-5):
    return torch.allclose(a, b, atol=tol, rtol=0)

print("PyTorch version:", torch.__version__)


PyTorch version: 2.9.1+cpu




## Part 1 — Tensors & Representations

In [10]:
# Exercise 1: Tensor basics
# TODO:
# 1) Create a 2x3 tensor of random values called X
# 2) Print X, X.shape, X.dtype
# 3) Compute the mean of all elements and store it in x_mean (a 0-d tensor)

X = torch.rand(2, 3)  # YOUR CODE HERE
x_mean = torch.mean(X)  # YOUR CODE HERE

# Print statements (uncomment after implementing)
# print("X=\n", X)
# print("shape:", X.shape)
# print("dtype:", X.dtype)
# print("mean:", x_mean)

# --- autograder asserts (do not delete) ---
assert isinstance(X, torch.Tensor), "X must be a torch.Tensor"
assert X.shape == (2, 3), f"X must have shape (2,3), got {tuple(X.shape)}"
assert X.dtype in (torch.float32, torch.float64), f"X should be float32/float64, got {X.dtype}"
assert isinstance(x_mean, torch.Tensor) and x_mean.shape == (), "x_mean must be a scalar (0-d) tensor"
assert _is_close(x_mean, X.sum() / X.numel()), "x_mean should equal X.sum()/X.numel()"
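Because `x_mean` is a 0-d tensor, it still carries tensor metadata such as dtype and device, and it can take part in further tensor arithmetic. `.item()` converts it to a plain Python number. The short sketch below also shows that passing `dim` reduces only one axis. The names `t`, `overall`, and `per_row` are illustrative:

```python
import torch

t = torch.rand(2, 3)

overall = t.mean()          # reduces over every element -> 0-d tensor
per_row = t.mean(dim=1)     # reduces over the columns -> one value per row

print(overall.shape)        # torch.Size([])
print(per_row.shape)        # torch.Size([2])
print(type(overall.item())) # <class 'float'>

# The 0-d mean matches sum / numel, which is the check the autograder uses
assert torch.allclose(overall, t.sum() / t.numel())
```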


In [13]:
torch.rand(5)

tensor([0.5832, 0.3376, 0.8090, 0.5779, 0.9040])

In [18]:
# Exercise 2: Manual vector operations
# TODO:
# 1) Create v1 and v2 as 1-D tensors of length 5
# 2) Compute element-wise sum: v_sum
# 3) Compute dot product: v_dot (scalar tensor)

v1 = torch.rand(5)  # YOUR CODE HERE
v2 = torch.rand(5)  # YOUR CODE HERE

v_sum = v1 + v2  # YOUR CODE HERE
v_dot = torch.dot(v1, v2)  # YOUR CODE HERE

# Print statements (uncomment after implementing)
print("v1:", v1)
print("v2:", v2)
print("v_sum:", v_sum)
print("v_dot:", v_dot)

# --- autograder asserts (do not delete) ---
assert v1.shape == (5,) and v2.shape == (5,), "v1 and v2 must both be shape (5,)"
assert v_sum.shape == (5,), "v_sum must be a length-5 vector"
assert v_dot.shape == (), "v_dot must be a scalar (0-d) tensor"
manual_dot = (v1 * v2).sum()
assert _is_close(v_dot, manual_dot), "v_dot must equal (v1*v2).sum()"


v1: tensor([0.6790, 0.9155, 0.2418, 0.1591, 0.7653])
v2: tensor([0.2979, 0.8035, 0.3813, 0.7860, 0.1115])
v_sum: tensor([0.9769, 1.7189, 0.6231, 0.9452, 0.8768])
v_dot: tensor(1.2405)
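`torch.dot` is one of several equivalent ways to compute the inner product of two 1-D tensors. Its two inputs must be 1-D, contain the same number of elements, and share a dtype. The `@` operator (`torch.matmul`) applied to two 1-D tensors returns the same 0-d result. The sketch below uses fixed values so the result is easy to check by hand (1·4 + 2·5 + 3·6 = 32):

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

d1 = torch.dot(a, b)   # built-in inner product
d2 = a @ b             # matmul of two 1-D tensors is also a dot product
d3 = (a * b).sum()     # element-wise product, then sum

print(d1, d2, d3)      # tensor(32.) tensor(32.) tensor(32.)
```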


## Part 2 — Embeddings

In [19]:
help(nn.Embedding)

Help on class Embedding in module torch.nn.modules.sparse:

class Embedding(torch.nn.modules.module.Module)
 |  Embedding(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[torch.Tensor] = None, _freeze: bool = False, device=None, dtype=None) -> None
 |  
 |  A simple lookup table that stores embeddings of a fixed dictionary and size.
 |  
 |  This module is often used to store word embeddings and retrieve them using indices.
 |  The input to the module is a list of indices, and the output is the corresponding
 |  word embeddings.
 |  
 |  Args:
 |      num_embeddings (int): size of the dictionary of embeddings
 |      embedding_dim (int): the size of each embedding vector
 |      padding_idx (int, optional): If specified, the entries at :attr:`padding_idx` do not contribute to the gradient;
 |                                   the

In [22]:

help(torch.randint)

Help on built-in function randint in module torch:

randint(...)
    randint(low=0, high, size, \*, generator=None, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor
    
    Returns a tensor filled with random integers generated uniformly
    between :attr:`low` (inclusive) and :attr:`high` (exclusive).
    
    The shape of the tensor is defined by the variable argument :attr:`size`.
    
    .. note::
        With the global dtype default (``torch.float32``), this function returns
        a tensor with dtype ``torch.int64``.
    
    Args:
        low (int, optional): Lowest integer to be drawn from the distribution. Default: 0.
        high (int): One above the highest integer to be drawn from the distribution.
        size (tuple): a tuple defining the shape of the output tensor.
    
    Keyword args:
        generator (:class:`torch.Generator`, optional): a pseudorandom number generator for sampling
        out (Tensor, optional): the output

In [24]:
torch.randint(0, 9, (3,))

tensor([4, 5, 2])
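The `help(torch.randint)` output above states that `high` is exclusive, so `torch.randint(0, 9, ...)` can only produce 0 through 8. To sample every valid index of a vocabulary of size 10, pass `high=10`. The quick empirical check below uses an arbitrary sample size of 10,000:

```python
import torch

vocab_size = 10
ids = torch.randint(0, vocab_size, (10_000,))

print(ids.dtype)                 # torch.int64
print(ids.min().item(), ids.max().item())
# Because high is exclusive, every value lies in [0, vocab_size - 1]
assert 0 <= ids.min().item() and ids.max().item() <= vocab_size - 1
```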

In [27]:
# Exercise 3: Simple embedding lookup
# TODO:
# 1) Create an nn.Embedding called emb with vocab_size=10 and emb_dim=4
# 2) Create token_ids as a LongTensor of shape (3,) with values in [0, 9]
# 3) Lookup embeddings: E = emb(token_ids)
# 4) Print E and E.shape

vocab_size, emb_dim = 10, 4
emb = nn.Embedding(vocab_size, emb_dim)  # YOUR CODE HERE

token_ids = torch.randint(0, vocab_size, (3,))  # YOUR CODE HERE (high is exclusive, so ids fall in [0, 9])
E = emb(token_ids)  # YOUR CODE HERE

# Print statements (uncomment after implementing)
print("token_ids:", token_ids)
print("E=\n", E)
print("E.shape:", E.shape)

# --- autograder asserts (do not delete) ---
assert isinstance(emb, nn.Embedding), "emb must be an nn.Embedding"
assert token_ids.dtype == torch.long, "token_ids must be torch.long"
assert token_ids.shape == (3,), f"token_ids must be shape (3,), got {tuple(token_ids.shape)}"
assert E.shape == (3, 4), f"E must have shape (3,4), got {tuple(E.shape)}"
assert E.requires_grad, "Embedding output should require gradients by default"


token_ids: tensor([7, 7, 4])
E=
 tensor([[ 0.0075, -0.0774,  0.6427,  0.5742],
        [ 0.0075, -0.0774,  0.6427,  0.5742],
        [ 0.6630,  0.7047, -0.0045,  1.6668]], grad_fn=<EmbeddingBackward0>)
E.shape: torch.Size([3, 4])
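`nn.Embedding` is a lookup table. Its `weight` parameter has shape `(num_embeddings, embedding_dim)`, and calling the module selects rows of that matrix by index. That explains why the two occurrences of token 7 in the output above produce identical rows. The minimal sketch below confirms the equivalence with hand-chosen ids. The name `emb_demo` is illustrative and does not overwrite the notebook's `emb`:

```python
import torch
import torch.nn as nn

emb_demo = nn.Embedding(10, 4)
ids = torch.tensor([7, 7, 4])

looked_up = emb_demo(ids)          # module call
indexed = emb_demo.weight[ids]     # plain row indexing into the weight matrix

print(emb_demo.weight.shape)       # torch.Size([10, 4])
assert torch.equal(looked_up, indexed)
assert torch.equal(looked_up[0], looked_up[1])  # same id -> same vector
```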


In [30]:
help(torch.mean)

Help on built-in function mean in module torch:

mean(...)
    mean(input, *, dtype=None) -> Tensor
    
    .. note::
        If the `input` tensor is empty, ``torch.mean()`` returns ``nan``.
        This behavior is consistent with NumPy and follows the definition
        that the mean over an empty set is undefined.
    
    
    Returns the mean value of all elements in the :attr:`input` tensor. Input must be floating point or complex.
    
    Args:
        input (Tensor):
          the input tensor, either of floating point or complex dtype
    
    Keyword args:
        dtype (:class:`torch.dtype`, optional): the desired data type of returned tensor.
            If specified, the input tensor is casted to :attr:`dtype` before the operation
            is performed. This is useful for preventing data type overflows. Default: None.
    
    Example::
    
        >>> a = torch.randn(1, 3)
        >>> a
        tensor([[ 0.2294, -0.5481,  1.3288]])
        >>> torch.mean(a)
       

In [35]:
E

tensor([[ 0.0075, -0.0774,  0.6427,  0.5742],
        [ 0.0075, -0.0774,  0.6427,  0.5742],
        [ 0.6630,  0.7047, -0.0045,  1.6668]], grad_fn=<EmbeddingBackward0>)

In [40]:
torch.mean(E, 0)

tensor([0.2260, 0.1833, 0.4270, 0.9384], grad_fn=<MeanBackward1>)
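When you pass a dimension to `torch.mean`, it reduces only along that axis. For `E` of shape `(3, 4)` (3 tokens, 4 features), `dim=0` averages over tokens and leaves one value per feature, which is the shape Exercise 4 needs. The `help(torch.mean)` output also notes that the input must be floating point or complex, and this matters for integer tensors such as token ids. The sketch below uses hand-picked values and the illustrative name `E_demo`, so it does not overwrite the notebook's `E`:

```python
import torch

E_demo = torch.tensor([[1.0, 2.0, 3.0, 4.0],
                       [3.0, 4.0, 5.0, 6.0],
                       [5.0, 6.0, 7.0, 8.0]])  # (tokens=3, features=4)

per_feature = E_demo.mean(dim=0)   # average over tokens -> shape (4,)
per_token = E_demo.mean(dim=1)     # average over features -> shape (3,)
print(per_feature)                 # tensor([3., 4., 5., 6.])
print(per_token)                   # tensor([2.5000, 4.5000, 6.5000])
print(E_demo.mean(dim=0, keepdim=True).shape)  # torch.Size([1, 4])

# Integer tensors (e.g. token ids from torch.randint) cannot be averaged directly
ids = torch.tensor([7, 7, 4])      # dtype torch.int64
try:
    ids.mean()
    int_mean_failed = False
except RuntimeError:
    int_mean_failed = True
print(int_mean_failed)             # True
print(ids.float().mean())          # tensor(6.)
```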

In [43]:
help(nn.Linear)

Help on class Linear in module torch.nn.modules.linear:

class Linear(torch.nn.modules.module.Module)
 |  Linear(in_features: int, out_features: int, bias: bool = True, device=None, dtype=None) -> None
 |  
 |  Applies an affine linear transformation to the incoming data: :math:`y = xA^T + b`.
 |  
 |  This module supports :ref:`TensorFloat32<tf32_on_ampere>`.
 |  
 |  On certain ROCm devices, when using float16 inputs this module will use :ref:`different precision<fp16_on_mi200>` for backward.
 |  
 |  Args:
 |      in_features: size of each input sample
 |      out_features: size of each output sample
 |      bias: If set to ``False``, the layer will not learn an additive bias.
 |          Default: ``True``
 |  
 |  Shape:
 |      - Input: :math:`(*, H_\text{in})` where :math:`*` means any number of
 |        dimensions including none and :math:`H_\text{in} = \text{in\_features}`.
 |      - Output: :math:`(*, H_\text{out})` where all but the last dimension
 |        are the same shap
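You can check the docstring formula `y = xA^T + b` directly. `nn.Linear` stores `A` as `weight` with shape `(out_features, in_features)` and `b` as `bias` with shape `(out_features,)`. The minimal sketch below reproduces a layer's output by hand. The layer sizes, the batch size, and the names `lin` and `x_demo` are arbitrary:

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 1)                     # in_features=4, out_features=1
x_demo = torch.randn(5, 4)                # batch of 5 inputs

manual = x_demo @ lin.weight.T + lin.bias  # y = x A^T + b
print(lin.weight.shape, lin.bias.shape)    # torch.Size([1, 4]) torch.Size([1])
assert torch.allclose(lin(x_demo), manual, atol=1e-6)

# A 1-D input (no batch dimension) is also accepted: (4,) -> (1,)
print(lin(torch.randn(4)).shape)           # torch.Size([1])
```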

In [46]:
help(torch.max)

Help on built-in function max in module torch:

max(...)
    max(input, *, out=None) -> Tensor
    
    Returns the maximum value of all elements in the ``input`` tensor.
    
    .. note::
        The difference between ``max``/``min`` and ``amax``/``amin`` is:
            - ``amax``/``amin`` supports reducing on multiple dimensions,
            - ``amax``/``amin`` does not return indices.
    
        Both ``amax``/``amin`` evenly distribute gradients between equal values
        when there are multiple input elements with the same minimum or maximum value.
    
        For ``max``/``min``:
            - If reduce over all dimensions(no dim specified), gradients evenly distribute between equally ``max``/``min`` values.
            - If reduce over one specified axis, only propagate to the indexed element.
    
    Args:
        input (Tensor): the input tensor.
    
    Keyword args:
        out (Tensor, optional): the output tensor.
    
    Example::
    
        >>> a = torch.rand
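The note above distinguishes two forms of `torch.max`. Without `dim`, it returns a single 0-d tensor. With `dim`, it returns a named tuple of `(values, indices)`. `torch.amax` returns only the values. The small sketch below uses a hand-written matrix `m`:

```python
import torch

m = torch.tensor([[1.0, 5.0, 2.0],
                  [7.0, 0.0, 3.0]])

print(torch.max(m))                     # tensor(7.)

values, indices = torch.max(m, dim=1)   # reduce along each row
print(values)                           # tensor([5., 7.])
print(indices)                          # tensor([1, 0])

print(torch.amax(m, dim=1))             # tensor([5., 7.]), values only
```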

In [48]:
# Exercise 4: From embeddings to a prediction
# NOTE: This exercise depends on Exercise 3 — complete that first.
# TODO:
# 1) Compute mean embedding across tokens: mean_E of shape (4,)
# 2) Create a Linear layer (4 -> 1) called head
# 3) Produce y_pred as shape (1,) or scalar

mean_E = torch.mean(E, 0)  # YOUR CODE HERE
head = nn.Linear(4, 1)  # YOUR CODE HERE

y_pred = head(mean_E)  # YOUR CODE HERE

# Print statements (uncomment after implementing)
print("mean_E.shape:", mean_E.shape)
# print("y_pred:", y_pred, "shape:", y_pred.shape)

# --- autograder asserts (do not delete) ---
assert mean_E.shape == (4,), f"mean_E must be shape (4,), got {tuple(mean_E.shape)}"
assert isinstance(head, nn.Linear) and head.in_features == 4 and head.out_features == 1, "head must be Linear(4->1)"
assert y_pred.numel() == 1, "y_pred must have exactly 1 element"
assert y_pred.requires_grad, "y_pred should require gradients"


mean_E.shape: torch.Size([4])
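Exercise 4 works on a single sequence, so `mean_E` is 1-D and `head(mean_E)` has shape `(1,)`. The same idea extends to a batch of sequences if you average over the token dimension instead. The sketch below assumes a batch of 2 sequences with 3 tokens each and uses the hypothetical names `emb_b` and `head_b`:

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 10, 4
emb_b = nn.Embedding(vocab_size, emb_dim)
head_b = nn.Linear(emb_dim, 1)

token_batch = torch.randint(0, vocab_size, (2, 3))  # (batch, tokens)
E_b = emb_b(token_batch)                            # (batch, tokens, emb_dim)
pooled = E_b.mean(dim=1)                            # average over tokens -> (batch, emb_dim)
y_b = head_b(pooled)                                # (batch, 1)

print(E_b.shape, pooled.shape, y_b.shape)
# torch.Size([2, 3, 4]) torch.Size([2, 4]) torch.Size([2, 1])
```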


## Part 3 — Build a Tiny Network

In [55]:
# Exercise 5: Define a simple feed-forward network
# Requirements:
# - input_dim = 6
# - hidden_dim = 8
# - output_dim = 1
# - 1 hidden layer + ReLU
# Implement SimpleNet so forward(x) returns shape (batch, 1)

class SimpleNet(nn.Module):
    def __init__(self, input_dim=6, hidden_dim=8, output_dim=1):
        super().__init__()
        # YOUR CODE HERE — define layers (fc1, fc2, activation)
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)
        self.activation = nn.ReLU()
        

    def forward(self, x):
        # YOUR CODE HERE — implement forward pass
        x = self.fc1(x)
        x = self.activation(x)
        x = self.fc2(x)
        return x


model = SimpleNet()
print(model)

# --- autograder asserts (do not delete) ---
assert isinstance(model, nn.Module), "model must be an nn.Module"
params = dict(model.named_parameters())
assert "fc1.weight" in params and "fc2.weight" in params, "Model must have two Linear layers (fc1, fc2)"


SimpleNet(
  (fc1): Linear(in_features=6, out_features=8, bias=True)
  (fc2): Linear(in_features=8, out_features=1, bias=True)
  (activation): ReLU()
)
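You can also write the same architecture with `nn.Sequential`. Counting parameters is a quick sanity check: `fc1` has 6 * 8 weights + 8 biases = 56, and `fc2` has 8 * 1 + 1 = 9, for 65 in total. The sketch below uses the illustrative name `seq_net`. The autograder still needs the `SimpleNet` subclass, because `nn.Sequential` names its parameters `0.weight`, `2.weight`, and so on, not `fc1.weight`/`fc2.weight`:

```python
import torch
import torch.nn as nn

seq_net = nn.Sequential(
    nn.Linear(6, 8),   # plays the role of fc1
    nn.ReLU(),
    nn.Linear(8, 1),   # plays the role of fc2
)

n_params = sum(p.numel() for p in seq_net.parameters())
print(n_params)        # 65

print(seq_net(torch.randn(4, 6)).shape)  # torch.Size([4, 1])
```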


In [57]:
# Exercise 6: Forward pass with dummy data
# NOTE: This exercise depends on Exercise 5 — complete that first.
# TODO:
# 1) Create dummy input x of shape (4, 6)
# 2) Run out = model(x)
# 3) Print out and out.shape

x = torch.randn(4, 6)  # YOUR CODE HERE
out = model(x)  # YOUR CODE HERE

# Print statements (uncomment after implementing)
print("out=\n", out)
print("out.shape:", out.shape)

# --- autograder asserts (do not delete) ---
assert x.shape == (4, 6), f"x must be shape (4,6), got {tuple(x.shape)}"
assert out.shape == (4, 1), f"out must be shape (4,1), got {tuple(out.shape)}"


out=
 tensor([[-0.0907],
        [-0.2239],
        [-0.4359],
        [-0.2379]], grad_fn=<AddmmBackward0>)
out.shape: torch.Size([4, 1])
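The `grad_fn=<AddmmBackward0>` in the output shows that autograd recorded the forward pass. For pure inference, you can wrap the call in `torch.no_grad()` to skip graph construction. The minimal sketch below uses a stand-in network, `net`, an illustrative `nn.Sequential` rather than the notebook's `model`:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(6, 8), nn.ReLU(), nn.Linear(8, 1))
x_demo = torch.randn(4, 6)

out_train = net(x_demo)          # graph recorded for backward()
with torch.no_grad():
    out_eval = net(x_demo)       # same values, no graph

print(out_train.requires_grad, out_eval.requires_grad)  # True False
print(out_eval.squeeze(-1).shape)   # torch.Size([4]), drops the trailing size-1 dim
assert torch.allclose(out_train, out_eval)
```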


## Part 4 — One Training Step

In [58]:
# Exercise 7: One training step
# NOTE: This exercise depends on Exercise 5 — complete that first.
# TODO:
# 1) Create inputs x_train (batch=8, input_dim=6) and targets y_train (shape (8,1))
# 2) Define loss_fn = MSELoss and opt = SGD(model.parameters(), lr=0.1)
# 3) Perform exactly one update step and print loss_before and loss_after

torch.manual_seed(123)  # makes the training data below reproducible (the model's weights come from Exercise 5)

# Create training data (provided for you)
x_train = torch.randn(8, 6)
true_w = torch.tensor([[0.5], [-1.0], [0.3], [0.0], [1.2], [-0.7]])
y_train = x_train @ true_w + 0.01 * torch.randn(8, 1)

loss_fn = nn.MSELoss()  # YOUR CODE HERE
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # YOUR CODE HERE

# YOUR CODE HERE — compute loss_before (forward pass + loss)
loss_before = loss_fn(model(x_train), y_train)

# YOUR CODE HERE — perform backward pass and optimizer step
opt.zero_grad()
loss_before.backward()
opt.step()

# YOUR CODE HERE — compute loss_after (forward pass + loss)
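with torch.no_grad():  # evaluation only, so no autograd graph is needed
    loss_after = loss_fn(model(x_train), y_train)

# backward() has already consumed loss_before's graph; detach it so that
# converting it to a Python float does not warn about requires_grad=True
loss_before = loss_before.detach()

# Print statements (uncomment after implementing)
print("loss_before:", loss_before.item())
print("loss_after :", loss_after.item())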

# --- autograder asserts (do not delete) ---
assert loss_before.shape == (), "loss_before must be a scalar tensor"
assert loss_after.shape == (), "loss_after must be a scalar tensor"
assert float(loss_after) < float(loss_before), "loss_after should be < loss_before after one SGD step"


loss_before: 1.8299105167388916
loss_after : 1.3600938320159912


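Without momentum or weight decay, the `opt.step()` call in Exercise 7 updates each parameter as `p <- p - lr * p.grad`. The minimal sketch below checks one optimizer step against that hand-written update. The layer, data, and the names `net`, `net_manual`, and `opt_demo` are illustrative. Only `lr=0.1` mirrors Exercise 7:

```python
import copy
import torch
import torch.nn as nn

net = nn.Linear(6, 1)
net_manual = copy.deepcopy(net)     # identical starting weights
x_demo = torch.randn(8, 6)
y_demo = torch.randn(8, 1)
loss_fn_demo = nn.MSELoss()
lr = 0.1

# Step 1: let torch.optim.SGD perform the update
opt_demo = torch.optim.SGD(net.parameters(), lr=lr)
opt_demo.zero_grad()
loss_fn_demo(net(x_demo), y_demo).backward()
opt_demo.step()

# Step 2: apply the same update by hand to the copy
loss_fn_demo(net_manual(x_demo), y_demo).backward()
with torch.no_grad():
    for p in net_manual.parameters():
        p -= lr * p.grad            # p <- p - lr * grad

# Both networks should now hold (numerically) the same parameters
for p_opt, p_man in zip(net.parameters(), net_manual.parameters()):
    assert torch.allclose(p_opt, p_man, atol=1e-6)
```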


## Optional Stretch (No grade)
If you finish early:
1. Add a second training step and show loss keeps decreasing.
2. Change activation to Tanh and compare loss curves.
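One possible sketch for both stretch goals is a helper, hypothetically named `train_losses`. It builds a fresh 6 -> 8 -> 1 network with a chosen activation, runs a few SGD steps on data generated the same way as in Exercise 7, and records the loss before each update. Whether the loss falls at every step depends on the learning rate and initialization, so treat the printed values as an experiment, not a guarantee:

```python
import torch
import torch.nn as nn

def train_losses(activation, steps=5, lr=0.1, seed=123):
    """Run `steps` SGD updates on a 6 -> 8 -> 1 net; return the loss measured before each update."""
    torch.manual_seed(seed)                 # same data and initial weights for every activation
    x = torch.randn(8, 6)
    true_w = torch.tensor([[0.5], [-1.0], [0.3], [0.0], [1.2], [-0.7]])
    y = x @ true_w + 0.01 * torch.randn(8, 1)

    net = nn.Sequential(nn.Linear(6, 8), activation, nn.Linear(8, 1))
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()

    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())          # loss before this step's update
    return losses

for name, act in [("ReLU", nn.ReLU()), ("Tanh", nn.Tanh())]:
    print(name, [round(v, 4) for v in train_losses(act)])
```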
