# Tensors: NumPy → PyTorch

This notebook helps you build muscle memory with PyTorch tensors. For each exercise, **predict the output before running the cell.**

**Exercises:**
1. Create training data as PyTorch tensors (translate your NumPy code) — GUIDED
2. Check `.shape`, `.dtype`, and `.device` for each tensor — GUIDED
3. Move your data to GPU and verify the device changed — GUIDED
4. Compute the forward pass: `y_hat = X @ w + b` — SUPPORTED
5. Reshape a batch of 28×28 images into 784-element vectors — SUPPORTED

Remember the **shape/dtype/device trinity** — every tensor has all three, and most bugs come from one of them being wrong.

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt

# For nice plots
plt.style.use('dark_background')
plt.rcParams['figure.figsize'] = [10, 4]

---

## Exercise 1: Create Training Data as PyTorch Tensors (GUIDED)

You know how to generate data with NumPy. Now translate it to PyTorch.

**Task:** Create a simple linear regression dataset using PyTorch tensors.

- 100 samples
- `X` values between -3 and 3
- True relationship: `y = 2.5 * X - 1.0 + noise`

Fill in the blanks below. The NumPy equivalent is shown in comments.

In [None]:
# Set seed for reproducibility
torch.manual_seed(42)

n_samples = 100

# NumPy version:  X = np.random.uniform(-3, 3, (n_samples, 1))
# PyTorch version: torch.rand gives values in [0, 1), so scale and shift
X = ____  # FILL IN: 100x1 tensor with values in [-3, 3)

# True parameters
true_w = 2.5
true_b = -1.0

# NumPy version:  noise = np.random.normal(0, 0.5, (n_samples, 1))
# PyTorch version: torch.randn gives standard normal, so scale it
noise = ____  # FILL IN: 100x1 tensor of Gaussian noise with std=0.5

# Compute y
y = true_w * X + true_b + noise

# Verify
print(f'X shape: {X.shape}')  # Should be [100, 1]
print(f'y shape: {y.shape}')  # Should be [100, 1]
print(f'X range: [{X.min():.2f}, {X.max():.2f}]')

# Plot
plt.scatter(X.numpy(), y.numpy(), alpha=0.6, s=20)
plt.xlabel('X')
plt.ylabel('y')
plt.title('Training Data (PyTorch tensors → NumPy for plotting)')
plt.grid(alpha=0.3)
plt.show()

<details>
<summary><strong>Solution</strong> (click to expand)</summary>

```python
X = torch.rand(n_samples, 1) * 6 - 3      # [0, 1) -> [0, 6) -> [-3, 3)
noise = torch.randn(n_samples, 1) * 0.5    # standard normal * 0.5 = std of 0.5
```

**Key differences from NumPy:**
- `torch.rand` = uniform [0, 1) &nbsp;&nbsp;vs&nbsp;&nbsp; `np.random.uniform(low, high, shape)`
- `torch.randn` = standard normal &nbsp;&nbsp;vs&nbsp;&nbsp; `np.random.normal(mean, std, shape)`
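- Shape: NumPy takes it as a single tuple, `(100, 1)`. `torch.rand` and `torch.randn` take the dimensions as separate arguments, `torch.rand(100, 1)`, and also accept a tuple, `torch.rand((100, 1))`.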

</details>
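
If your data already lives in a NumPy array, you can convert it instead of regenerating it. The sketch below is a minimal illustration, assuming a hypothetical array `X_np` (not part of the exercise). NumPy defaults to `float64`, so the converted tensor usually needs a cast to `float32`.

```python
import numpy as np
import torch

# Hypothetical NumPy data, generated like the NumPy comments in Exercise 1
rng = np.random.default_rng(42)
X_np = rng.uniform(-3, 3, (100, 1))       # NumPy defaults to float64

X_shared = torch.from_numpy(X_np)         # shares memory with X_np, keeps float64
X_f32 = X_shared.float()                  # cast to float32 (this makes a copy)

print(X_shared.dtype, X_f32.dtype)        # torch.float64 torch.float32

# Because memory is shared, an in-place change to the array shows up in the tensor
X_np[0, 0] = 0.0
print(X_shared[0, 0].item())              # 0.0
print(X_f32[0, 0].item())                 # original value: X_f32 is a separate copy
```

The shared memory is easy to overlook: `torch.from_numpy` does not copy, so later edits to the array also change the tensor.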

---

## Exercise 2: Check Shape, Dtype, and Device (GUIDED)

Every tensor has three fundamental properties: **shape**, **dtype**, and **device**. This is the trinity. When something goes wrong in deep learning, one of these three is usually the culprit.

**Task:** Before running the cell below, predict the output for each tensor. Write your predictions as comments, then run the cell to check.

In [None]:
# Create a few different tensors
a = torch.tensor([1, 2, 3])                        # From a Python list of ints
b = torch.tensor([1.0, 2.0, 3.0])                  # From a Python list of floats
c = torch.zeros(3, 4)                               # 3x4 zeros
d = torch.tensor([[True, False], [False, True]])    # Boolean tensor

# YOUR PREDICTIONS (fill these in BEFORE running):
#   a -> shape: ____  dtype: ____  device: ____
#   b -> shape: ____  dtype: ____  device: ____
#   c -> shape: ____  dtype: ____  device: ____
#   d -> shape: ____  dtype: ____  device: ____

for name, tensor in [('a', a), ('b', b), ('c', c), ('d', d)]:
    print(f'{name} -> shape: {str(tensor.shape):<16} dtype: {str(tensor.dtype):<16} device: {tensor.device}')

<details>
<summary><strong>Solution</strong> (click to expand)</summary>

```
a -> shape: torch.Size([3])    dtype: torch.int64    device: cpu
b -> shape: torch.Size([3])    dtype: torch.float32  device: cpu
c -> shape: torch.Size([3, 4]) dtype: torch.float32  device: cpu
d -> shape: torch.Size([2, 2]) dtype: torch.bool     device: cpu
```

**Key insights:**
- Python `int` list → `torch.int64`. Python `float` list → `torch.float32`.
- `torch.zeros` defaults to `float32` — the standard dtype for neural network weights.
- Everything starts on `cpu` unless you explicitly move it.
- Boolean tensors exist and are useful for masking.

</details>
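
The inferred dtypes matter as soon as two tensors meet in one operation. As a small sketch (the names `a_f` and `a_g` are only for illustration), this shows how elementwise operations promote dtypes and two ways to ask for `float32` explicitly:

```python
import torch

a = torch.tensor([1, 2, 3])            # int64, inferred from Python ints
b = torch.tensor([1.0, 2.0, 3.0])      # float32, inferred from Python floats

# Elementwise ops promote int64 + float32 to float32
print((a + b).dtype)                   # torch.float32

# Request the dtype explicitly at creation time...
a_f = torch.tensor([1, 2, 3], dtype=torch.float32)
# ...or convert an existing tensor (returns a new tensor, a is unchanged)
a_g = a.to(torch.float32)

print(a_f.dtype, a_g.dtype, a.dtype)   # torch.float32 torch.float32 torch.int64
```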

Now inspect your training data from Exercise 1:

In [None]:
# Check the trinity for your training data
for name, tensor in [('X', X), ('y', y)]:
    print(f'{name} -> shape: {str(tensor.shape):<16} dtype: {str(tensor.dtype):<16} device: {tensor.device}')

# Quick quiz: why is X float32 and not float64?
# (Hint: what does torch.rand return by default?)

---

## Exercise 3: Move Data to GPU (GUIDED)

The `.to(device)` pattern is how you move tensors between CPU and GPU. This is something you'll do at the start of every training script.

**Task:** Move your training data to GPU (if available) and verify the device changed.

In [None]:
# Step 1: Detect available device
# This pattern works on any machine: GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

# Step 2: Move tensors to the device
X_device = ____  # FILL IN: move X to `device`
y_device = ____  # FILL IN: move y to `device`

# Step 3: Verify
print(f'X_device is on: {X_device.device}')
print(f'y_device is on: {y_device.device}')

# Step 4: Check that the original tensors are unchanged
print(f'\nOriginal X is still on: {X.device}')
print(f'Original y is still on: {y.device}')
print('\n.to() returns a NEW tensor. It does not modify in place.')

<details>
<summary><strong>Solution</strong> (click to expand)</summary>

```python
X_device = X.to(device)
y_device = y.to(device)
```

**Key points:**
- `.to(device)` returns a new tensor on the target device.
- If you're already on the target device, `.to()` returns the same tensor (no copy).
- A common mistake: forgetting to reassign. A bare `X.to(device)` discards the returned tensor, so `X` stays on its original device, because `.to()` doesn't modify in place.
- In training scripts, you'll see this pattern: `X, y = X.to(device), y.to(device)`

</details>
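
The key points above can be checked directly. This is a minimal sketch using a throwaway tensor `t` (a name introduced here, not from the exercise); it runs on CPU-only machines as well:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
t = torch.zeros(3)                    # starts on cpu

t.to(device)                          # result discarded: t itself is untouched
print(t.device)                       # cpu

t_dev = t.to(device)                  # keep the returned tensor
print(t_dev.device)

# Moving to the device a tensor is already on returns the same object
same = t_dev.to(t_dev.device)
print(same is t_dev)                  # True
```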

### NumPy Interop: The CPU Requirement

One thing that trips people up: NumPy only works with CPU tensors. Try this:

In [None]:
# This works (CPU tensor)
cpu_tensor = torch.tensor([1.0, 2.0, 3.0])
print(f'CPU tensor to numpy: {cpu_tensor.numpy()}')

# If you have a GPU tensor, you need .cpu() first
gpu_tensor = cpu_tensor.to(device)
try:
    gpu_tensor.numpy()  # This will fail if device is cuda
    print(f'GPU tensor to numpy: worked (you are on CPU, so .to(device) was a no-op)')
except (TypeError, RuntimeError) as e:  # PyTorch raises TypeError for CUDA tensors
    print(f'GPU tensor to numpy: FAILED \u2014 {e}')
    print(f'Fix: gpu_tensor.cpu().numpy() = {gpu_tensor.cpu().numpy()}')
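
A pattern you will often see wraps the conversion in one place. The helper below is a hypothetical sketch (`to_numpy` is not a PyTorch function). It also calls `.detach()`, because `.numpy()` refuses tensors that track gradients.

```python
import torch

def to_numpy(t: torch.Tensor):
    """Hypothetical helper: convert any tensor to a NumPy array."""
    # detach() drops autograd tracking; cpu() returns the tensor itself if already on CPU
    return t.detach().cpu().numpy()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
w = torch.randn(3, requires_grad=True, device=device)

arr = to_numpy(w * 2)                 # works on CPU and GPU, with or without grad
print(type(arr), arr.shape)
```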

---

## Exercise 4: Compute the Forward Pass (SUPPORTED)

A linear model's forward pass is: **`y_hat = X @ w + b`**

The `@` operator is matrix multiplication. This single line is the entire prediction step for linear regression.

**Task:**
1. Create a weight vector `w` of shape `(1, 1)` and a bias `b` of shape `(1,)`
2. Compute `y_hat = X @ w + b` using your training data
3. Verify that `y_hat` has the same shape as `y`
4. Plot the predictions against the true values

**Hints:**
- Use `torch.randn` for random initialization
- X is `(100, 1)`, so w needs to be `(1, 1)` for the matrix multiply to work
- Think about what shapes result from `(100, 1) @ (1, 1) + (1,)`

In [None]:
# TODO: Initialize w and b with random values
# w = ...
# b = ...

# TODO: Compute the forward pass
# y_hat = ...

# TODO: Verify shapes match
# print(f'X shape: {X.shape}')
# print(f'w shape: {w.shape}')
# print(f'y_hat shape: {y_hat.shape}')
# print(f'y shape: {y.shape}')

# TODO: Plot predictions vs true values
# Sort X for a clean line plot
# sorted_idx = X.squeeze().argsort()
# plt.scatter(X.numpy(), y.numpy(), alpha=0.4, s=20, label='True data')
# plt.plot(X[sorted_idx].numpy(), y_hat[sorted_idx].detach().numpy(), 'r-', linewidth=2, label='Predictions (random w, b)')
# plt.xlabel('X')
# plt.ylabel('y')
# plt.legend()
# plt.title('Forward Pass with Random Parameters')
# plt.grid(alpha=0.3)
# plt.show()

<details>
<summary><strong>Solution</strong> (click to expand)</summary>

```python
# Initialize random parameters
w = torch.randn(1, 1)
b = torch.randn(1)

# Forward pass
y_hat = X @ w + b

# Verify shapes
print(f'X shape: {X.shape}')         # [100, 1]
print(f'w shape: {w.shape}')         # [1, 1]
print(f'y_hat shape: {y_hat.shape}') # [100, 1]
print(f'y shape: {y.shape}')         # [100, 1]

# Plot
sorted_idx = X.squeeze().argsort()
plt.scatter(X.numpy(), y.numpy(), alpha=0.4, s=20, label='True data')
plt.plot(X[sorted_idx].numpy(), y_hat[sorted_idx].detach().numpy(), 'r-', linewidth=2, label='Predictions (random w, b)')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Forward Pass with Random Parameters')
plt.grid(alpha=0.3)
plt.show()
```

**Why the line is wrong:** `w` and `b` are random, so the predictions are garbage. Training adjusts these parameters to minimize the error. But the *shape* is correct, and that's what matters here.

**Shape arithmetic:** `(100, 1) @ (1, 1)` gives `(100, 1)`; adding the `(1,)` bias then broadcasts it across all 100 rows, so the result stays `(100, 1)`. Broadcasting is PyTorch's rule for automatically expanding size-1 (or missing) dimensions so that shapes match.

</details>
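
Broadcasting is convenient, but it can also hide shape bugs. The sketch below assumes a reasonably recent PyTorch (for `torch.broadcast_shapes`) and uses a hypothetical flat target `y_flat` to show the classic `(100, 1)` versus `(100,)` pitfall:

```python
import torch

X = torch.rand(100, 1)
w = torch.randn(1, 1)
b = torch.randn(1)

# Ask PyTorch what shape broadcasting would produce, without computing anything
print(torch.broadcast_shapes((100, 1), (1,)))     # torch.Size([100, 1])

y_hat = X @ w + b                                  # (100, 1)

# Pitfall: a target stored as a flat (100,) vector broadcasts against (100, 1)
y_flat = torch.rand(100)
diff = y_hat - y_flat
print(diff.shape)                                  # torch.Size([100, 100])

# Matching the shapes first gives the intended per-sample difference
diff_ok = y_hat - y_flat.unsqueeze(1)
print(diff_ok.shape)                               # torch.Size([100, 1])
```

No error is raised in the pitfall case, which is why checking `.shape` after each step is a useful habit.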

---

## Exercise 5: Reshape a Batch of Images (SUPPORTED)

In image classification, you often need to flatten 2D images into 1D vectors. For example, MNIST images are 28×28 pixels, and a dense (fully-connected) layer expects a flat vector of 784 values.

**Task:**
1. Create a fake batch of 16 grayscale 28×28 images
2. Reshape the batch from `(16, 1, 28, 28)` to `(16, 784)` using two different methods
3. Verify the total number of elements didn't change

**Hints:**
- `tensor.view(new_shape)` reshapes without copying data
- `tensor.reshape(new_shape)` also works (copies if needed)
- You can use `-1` in one dimension to let PyTorch infer the size
- Think about what `(16, 1, 28, 28)` means: 16 images, 1 channel, 28 height, 28 width

In [None]:
# Create a fake batch of images: 16 images, 1 channel (grayscale), 28x28 pixels
images = torch.randn(16, 1, 28, 28)
print(f'Original shape: {images.shape}')
print(f'Total elements: {images.numel()}')

# TODO: Method 1 - Use .view() to flatten each image to 784 elements
# flat_v = ...
# print(f'After view:    {flat_v.shape}')   # Should be [16, 784]

# TODO: Method 2 - Use .reshape() to do the same thing
# flat_r = ...
# print(f'After reshape: {flat_r.shape}')   # Should be [16, 784]

# TODO: Verify nothing was lost
# print(f'Elements preserved: {flat_v.numel() == images.numel()}')

# Bonus: visualize what flattening does to one image
# fig, axes = plt.subplots(1, 2, figsize=(10, 3))
# axes[0].imshow(images[0, 0].numpy(), cmap='gray')
# axes[0].set_title(f'2D image ({images[0, 0].shape})')
# axes[1].plot(flat_v[0].numpy(), linewidth=0.5)
# axes[1].set_title(f'Flattened vector ({flat_v[0].shape})')
# axes[1].set_xlabel('Index')
# plt.tight_layout()
# plt.show()

<details>
<summary><strong>Solution</strong> (click to expand)</summary>

```python
# Method 1: .view()
flat_v = images.view(16, -1)   # -1 means "infer this dimension" -> 1*28*28 = 784
print(f'After view:    {flat_v.shape}')   # [16, 784]

# Method 2: .reshape()
flat_r = images.reshape(16, -1)
print(f'After reshape: {flat_r.shape}')   # [16, 784]

# Verify nothing was lost
print(f'Elements preserved: {flat_v.numel() == images.numel()}')

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(10, 3))
axes[0].imshow(images[0, 0].numpy(), cmap='gray')
axes[0].set_title(f'2D image ({images[0, 0].shape})')
axes[1].plot(flat_v[0].numpy(), linewidth=0.5)
axes[1].set_title(f'Flattened vector ({flat_v[0].shape})')
axes[1].set_xlabel('Index')
plt.tight_layout()
plt.show()
```

**`.view()` vs `.reshape()`:**
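- `.view()` never copies data. It requires the tensor's memory layout (its strides) to be compatible with the new shape; contiguous tensors always qualify, and `.view()` raises a `RuntimeError` otherwise.
- `.reshape()` works on any tensor. It returns a view if possible, copies if not.
- Use `.view()` when you want a guarantee that no copy happens; use `.reshape()` when you only care about the resulting shape.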

**The `-1` trick:** When you write `.view(16, -1)`, PyTorch computes the missing dimension: `total elements / 16 = 12544 / 16 = 784`. This is safer than hardcoding 784 because it adapts to different image sizes.

</details>
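
The difference between the two methods only shows up on non-contiguous tensors. As a minimal sketch, this uses `transpose` to create one (the names `swapped`, `flat_c`, and `flat_f` are introduced here for illustration):

```python
import torch

images = torch.randn(16, 1, 28, 28)

# A view shares storage with the original tensor
flat_v = images.view(16, -1)
print(flat_v.data_ptr() == images.data_ptr())   # True

# transpose() returns a non-contiguous tensor (same data, different strides)
swapped = images.transpose(2, 3)
print(swapped.is_contiguous())                  # False

try:
    swapped.view(16, -1)
except RuntimeError as e:
    print(f'view failed: {e}')

flat_r = swapped.reshape(16, -1)                # works, copies because it must
flat_c = swapped.contiguous().view(16, -1)      # explicit copy, then view
flat_f = torch.flatten(images, start_dim=1)     # common shortcut for (N, -1)

print(flat_r.shape, flat_c.shape, flat_f.shape)
```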

---

## Key Takeaways

- **The shape/dtype/device trinity**: Every tensor has all three. Most bugs come from one being wrong.
- **NumPy interop**: `torch.from_numpy()` and `.numpy()` convert between the two. GPU tensors need `.cpu()` first.
- **GPU transfer**: `.to(device)` returns a new tensor. It does not modify in place.
- **Matrix multiply with `@`**: The forward pass `y_hat = X @ w + b` is the core computation of a linear (fully connected) layer, a basic building block of most neural networks.
- **Reshaping**: `.view()` and `.reshape()` rearrange dimensions without changing the data. Use `-1` to infer a dimension.
- **PyTorch defaults**: `float32` for most operations, `cpu` for device. Both are intentional choices.