# PyTorch Tensors — Hands-On Tutorial

**Month 2, Week 1** — Sequence Models

Tensors are the fundamental data structure in PyTorch. Think of them as n-dimensional arrays, much like NumPy's `ndarray`, that can also live on a GPU and record operations for automatic differentiation.

## What You'll Learn

1. Creating tensors (multiple ways)
2. Tensor attributes (shape, dtype, device)
3. Operations (math, reshaping, indexing)
4. GPU/MPS acceleration
5. Preview: connection to autograd

In [1]:
import torch
import numpy as np

print(f"PyTorch version: {torch.__version__}")
print(f"MPS available: {torch.backends.mps.is_available()}")

PyTorch version: 2.9.1
MPS available: True


---

## 1. Creating Tensors

### From Python data

In [2]:
# From a Python list
data = [[1, 2, 3], [4, 5, 6]]
t1 = torch.tensor(data)
print("From list:")
print(t1)
print(f"Shape: {t1.shape}")

From list:
tensor([[1, 2, 3],
        [4, 5, 6]])
Shape: torch.Size([2, 3])


In [3]:
# From NumPy: torch.from_numpy always shares memory with the array (no copy!)
np_array = np.array([[1.0, 2.0], [3.0, 4.0]])
t2 = torch.from_numpy(np_array)
print("From NumPy:")
print(t2)

# Modifying numpy changes the tensor!
np_array[0, 0] = 999
print(f"\nAfter modifying numpy: {t2[0, 0]}")

From NumPy:
tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)

After modifying numpy: 999.0
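
By contrast, `torch.tensor()` always copies its input. The short sketch below contrasts the two constructors (the variable names are illustrative). It also shows that `.numpy()` on a CPU tensor shares memory in the other direction:

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])

shared = torch.from_numpy(arr)  # shares arr's buffer, no copy
copied = torch.tensor(arr)      # always an independent copy

arr[0] = 100.0
print(shared[0].item())  # 100.0: the shared tensor sees the change
print(copied[0].item())  # 1.0: the copy does not

# CPU tensor -> NumPy also shares memory
back = shared.numpy()
back[1] = -1.0
print(arr[1])            # -1.0: written through the tensor's buffer
```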


### Factory functions

In [4]:
# Zeros and ones
zeros = torch.zeros(2, 3)
ones = torch.ones(2, 3)
print(f"Zeros:\n{zeros}")
print(f"\nOnes:\n{ones}")

Zeros:
tensor([[0., 0., 0.],
        [0., 0., 0.]])

Ones:
tensor([[1., 1., 1.],
        [1., 1., 1.]])


In [5]:
# Random tensors (very common for initializing weights)
rand_uniform = torch.rand(2, 3)      # Uniform [0, 1)
rand_normal = torch.randn(2, 3)      # Normal (mean=0, std=1)
rand_int = torch.randint(0, 10, (2, 3))  # Integers [0, 10)

print(f"Uniform [0,1):\n{rand_uniform}")
print(f"\nNormal:\n{rand_normal}")
print(f"\nIntegers [0,10):\n{rand_int}")

Uniform [0,1):
tensor([[0.3893, 0.9748, 0.9735],
        [0.4535, 0.9762, 0.4830]])

Normal:
tensor([[ 1.8515, -0.0935,  0.7531],
        [-0.4094, -0.1737, -1.6308]])

Integers [0,10):
tensor([[9, 5, 7],
        [7, 2, 7]])
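
The random values above will be different every time the cell runs. Here is a minimal sketch of making them reproducible with `torch.manual_seed`. Identical draws are only expected on the same hardware and PyTorch version:

```python
import torch

torch.manual_seed(0)
first = torch.randn(2, 3)

torch.manual_seed(0)          # reset the generator to the same state
second = torch.randn(2, 3)

print(torch.equal(first, second))  # True: same seed, same draws
```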


In [6]:
# *_like functions copy shape, dtype, and device from an existing tensor
template = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
zeros_like = torch.zeros_like(template)
rand_like = torch.rand_like(template)

print(f"Template: {template.shape}, dtype={template.dtype}")
print(f"zeros_like:\n{zeros_like}")

Template: torch.Size([2, 2]), dtype=torch.float32
zeros_like:
tensor([[0., 0.],
        [0., 0.]])
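
The `*_like` functions also accept keyword overrides. This sketch keeps the template's shape but switches the dtype, and it uses `torch.full_like` to fill the result with a constant:

```python
import torch

template = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)

ones_i64 = torch.ones_like(template, dtype=torch.int64)  # same shape, new dtype
sevens = torch.full_like(template, 7.0)                   # same shape, constant fill

print(ones_i64.dtype, ones_i64.shape)  # torch.int64 torch.Size([2, 2])
print(sevens.tolist())                 # [[7.0, 7.0], [7.0, 7.0]]
```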


In [7]:
# Sequences
arange = torch.arange(0, 10, 2)  # start, end, step
linspace = torch.linspace(0, 1, 5)  # start, end, num_points

print(f"arange(0, 10, 2): {arange}")
print(f"linspace(0, 1, 5): {linspace}")

arange(0, 10, 2): tensor([0, 2, 4, 6, 8])
linspace(0, 1, 5): tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
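
Keep one difference in mind: `arange` excludes its end value, but `linspace` includes it. The quick check below also covers a float step. In that case the length is computed as `ceil((end - start) / step)`, so floating-point rounding can occasionally add or drop an element:

```python
import torch

evens = torch.arange(0, 10, 2)     # end (10) is excluded
grid = torch.linspace(0, 1, 5)     # end (1) is included
print(evens.tolist())              # [0, 2, 4, 6, 8]
print(grid[-1].item())             # 1.0

# With a float step, arange returns the default float dtype
steps = torch.arange(0.0, 1.0, 0.25)
print(steps.dtype, steps.numel())  # torch.float32 4
```

When you know how many points you need, `linspace` avoids the rounding question entirely.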


---

## 2. Tensor Attributes

Every tensor has three key attributes:

In [8]:
t = torch.rand(3, 4, 5)

print(f"Shape: {t.shape}")       # Dimensions
print(f"Dtype: {t.dtype}")       # Data type
print(f"Device: {t.device}")     # CPU or GPU

Shape: torch.Size([3, 4, 5])
Dtype: torch.float32
Device: cpu
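
Two related attributes come up constantly when debugging shapes: `ndim` and `numel()`. A small sketch:

```python
import torch

t = torch.rand(3, 4, 5)
print(t.ndim)       # 3: number of dimensions
print(t.numel())    # 60: total elements (3 * 4 * 5)
print(t.shape[1])   # 4: torch.Size indexes like a tuple
print(t.size(-1))   # 5: size(dim) is equivalent to shape[dim]
```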


In [9]:
# Common dtypes for deep learning
float32 = torch.tensor([1.0], dtype=torch.float32)  # Default floating-point dtype
float16 = torch.tensor([1.0], dtype=torch.float16)  # Half precision: half the memory, often faster on GPUs
int64 = torch.tensor([1], dtype=torch.int64)        # Labels/indices
bool_t = torch.tensor([True, False], dtype=torch.bool)  # Masks

print(f"float32: {float32.dtype}")
print(f"float16: {float16.dtype}")
print(f"int64: {int64.dtype}")
print(f"bool: {bool_t.dtype}")

float32: torch.float32
float16: torch.float16
int64: torch.int64
bool: torch.bool
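
You can convert between dtypes with `.to(dtype)` or shorthands such as `.float()`. When one operation mixes dtypes, PyTorch promotes the result. The sketch below assumes the default float dtype has not been changed from `float32`:

```python
import torch

ints = torch.tensor([1, 2, 3])    # integers default to int64
floats = ints.float()              # same as ints.to(torch.float32)
print(ints.dtype, floats.dtype)    # torch.int64 torch.float32

# int64 + float32 promotes to float32
mixed = ints + torch.tensor([0.5, 0.5, 0.5])
print(mixed.dtype)                 # torch.float32

# True division of integers also yields a float tensor
print((ints / 2).dtype)            # torch.float32
```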


---

## 3. Basic Operations

### Arithmetic

In [10]:
a = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
b = torch.tensor([[5, 6], [7, 8]], dtype=torch.float32)

# Element-wise operations
print(f"a + b:\n{a + b}")
print(f"\na * b (element-wise):\n{a * b}")
print(f"\na ** 2:\n{a ** 2}")

a + b:
tensor([[ 6.,  8.],
        [10., 12.]])

a * b (element-wise):
tensor([[ 5., 12.],
        [21., 32.]])

a ** 2:
tensor([[ 1.,  4.],
        [ 9., 16.]])
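
Element-wise operations also work when shapes differ but are compatible, which is called broadcasting. PyTorch follows NumPy's rules: dimensions are compared from the right, and each pair must be equal or one of them must be 1. A small sketch:

```python
import torch

a = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)  # shape (2, 2)
row = torch.tensor([10, 20], dtype=torch.float32)         # shape (2,)
col = torch.tensor([[100], [200]], dtype=torch.float32)   # shape (2, 1)

print((a + row).tolist())  # [[11.0, 22.0], [13.0, 24.0]]: row added to each row
print((a + col).tolist())  # [[101.0, 102.0], [203.0, 204.0]]: col added to each column
print((a * 2).tolist())    # [[2.0, 4.0], [6.0, 8.0]]: a scalar broadcasts everywhere
```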


In [11]:
# Matrix multiplication (crucial for neural networks!)
# Three equivalent ways:
result1 = a @ b
result2 = torch.matmul(a, b)
result3 = a.matmul(b)

print(f"a @ b (matrix multiply):\n{result1}")
print(f"\nAll equal: {torch.equal(result1, result2) and torch.equal(result2, result3)}")

a @ b (matrix multiply):
tensor([[19., 22.],
        [43., 50.]])

All equal: True
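
For `@`, the inner dimensions must match: `(m, k) @ (k, n)` produces `(m, n)`. Any leading dimensions are treated as a batch. The sketch below shows both cases and the error raised on a mismatch:

```python
import torch

x = torch.randn(2, 3)
w = torch.randn(3, 4)
print((x @ w).shape)              # torch.Size([2, 4]); inner dim 3 matches

# Leading dims act as a batch: 8 independent (2, 3) @ (3, 4) products
batch_a = torch.randn(8, 2, 3)
batch_b = torch.randn(8, 3, 4)
print((batch_a @ batch_b).shape)  # torch.Size([8, 2, 4])

# (2, 3) @ (2, 3): inner dims 3 and 2 differ
try:
    x @ x
except RuntimeError as err:
    print("shape error:", err)
```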


In [12]:
# Aggregations
t = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32)

print(f"Tensor:\n{t}")
print(f"\nSum all: {t.sum()}")
print(f"Sum per row (dim=1): {t.sum(dim=1)}")
print(f"Sum per column (dim=0): {t.sum(dim=0)}")
print(f"Mean: {t.mean()}")
print(f"Max: {t.max()}")
print(f"Argmax: {t.argmax()}")  # Index into the flattened tensor

Tensor:
tensor([[1., 2., 3.],
        [4., 5., 6.]])

Sum all: 21.0
Sum per row (dim=1): tensor([ 6., 15.])
Sum per column (dim=0): tensor([5., 7., 9.])
Mean: 3.5
Max: 6.0
Argmax: 5
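
Without `dim`, `argmax()` returns an index into the flattened tensor. Above, 5 means row 1, column 2. The sketch below recovers that 2-D position, then shows per-dimension reductions and `keepdim`:

```python
import torch

t = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32)

# Convert the flat index back to (row, col) for a 2-D tensor
flat = t.argmax().item()
row, col = divmod(flat, t.shape[1])
print(flat, (row, col))            # 5 (1, 2)

# With dim, max returns both the values and their indices
values, indices = t.max(dim=1)
print(values.tolist(), indices.tolist())  # [3.0, 6.0] [2, 2]

# keepdim=True keeps the reduced axis (size 1) so the result broadcasts back
row_means = t.mean(dim=1, keepdim=True)
print(row_means.shape)             # torch.Size([2, 1])
print((t - row_means).tolist())    # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
```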


### Reshaping

In [13]:
t = torch.arange(12)
print(f"Original: {t}")
print(f"Shape: {t.shape}")

# Reshape (total elements must match)
reshaped = t.reshape(3, 4)
print(f"\nReshaped to (3, 4):\n{reshaped}")

# Use -1 to infer dimension
auto_reshaped = t.reshape(2, -1)  # -1 becomes 6
print(f"\nReshaped to (2, -1):\n{auto_reshaped}")

Original: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
Shape: torch.Size([12])

Reshaped to (3, 4):
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

Reshaped to (2, -1):
tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11]])
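
`reshape` has a stricter sibling called `view`, which never copies data. The result shares memory with the original, and `view` raises an error if the memory layout cannot be reinterpreted in the new shape. That is exactly what happens after a transpose. A sketch:

```python
import torch

base = torch.arange(6).reshape(2, 3)

# view shares memory: writing through it changes base
v = base.view(3, 2)
v[0, 0] = 100
print(base[0, 0].item())        # 100

# Transposing swaps strides, so the tensor is no longer contiguous
tr = base.T
print(tr.is_contiguous())       # False

try:
    tr.view(6)                  # view cannot reorder memory
except RuntimeError as err:
    print("view failed:", err)

flat = tr.reshape(6)            # reshape falls back to copying here
print(flat.tolist())            # [100, 3, 1, 4, 2, 5]
print(torch.equal(flat, tr.contiguous().view(6)))  # True
```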


In [19]:
# view vs reshape: view never copies and needs a compatible (e.g. contiguous) memory layout; reshape returns a view when it can and copies otherwise
t = torch.arange(6).reshape(2, 3)
print(f"Original:\n{t}")

# Flatten (very common before fully connected layers)
flattened = t.flatten()
print(f"\nFlattened: {flattened}")

# unsqueeze adds a dimension of size 1; squeeze removes dimensions of size 1
t = torch.tensor([1, 2, 3])
print(f"\nOriginal shape: {t.shape}")
print(f"unsqueeze(0) shape: {t.unsqueeze(0).shape}")  # Add batch dim
print(f"unsqueeze(1) shape: {t.unsqueeze(1).shape}")  # Add feature dim

Original:
tensor([[0, 1, 2],
        [3, 4, 5]])

Flattened: tensor([0, 1, 2, 3, 4, 5])

Original shape: torch.Size([3])
unsqueeze(0) shape: torch.Size([1, 3])
unsqueeze(1) shape: torch.Size([3, 1])
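
`squeeze` reverses `unsqueeze` by removing dimensions of size 1. This sketch shows how it behaves with and without a `dim` argument:

```python
import torch

t = torch.zeros(1, 3, 1)

print(t.squeeze().shape)   # torch.Size([3]): drops every size-1 dim
print(t.squeeze(0).shape)  # torch.Size([3, 1]): drops only dim 0
print(t.squeeze(1).shape)  # torch.Size([1, 3, 1]): dim 1 has size 3, so no change

# unsqueeze and squeeze on the same dim undo each other
v = torch.tensor([1, 2, 3])
print(torch.equal(v.unsqueeze(0).squeeze(0), v))  # True
```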


### Indexing and Slicing

In [20]:
t = torch.arange(12).reshape(3, 4)
print(f"Tensor:\n{t}")

# Standard indexing (like NumPy)
print(f"\nt[0]: {t[0]}")
print(f"t[0, 0]: {t[0, 0]}")
print(f"t[:, 0]: {t[:, 0]}")
print(f"t[1:, 1:3]:\n{t[1:, 1:3]}")

Tensor:
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

t[0]: tensor([0, 1, 2, 3])
t[0, 0]: 0
t[:, 0]: tensor([0, 4, 8])
t[1:, 1:3]:
tensor([[ 5,  6],
        [ 9, 10]])
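
As in NumPy, basic indexing and slicing return views, so writing to a slice also changes the original tensor. Call `.clone()` when you need an independent copy:

```python
import torch

t = torch.arange(12).reshape(3, 4)

row = t[0]             # basic slicing returns a view
row[0] = -1
print(t[0, 0].item())  # -1: the original changed too

safe = t[1].clone()    # clone() makes an independent copy
safe[0] = 999
print(t[1, 0].item())  # 4: the original is untouched
```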


In [21]:
# Boolean indexing (for masking)
t = torch.randn(5)
print(f"Tensor: {t}")
print(f"Positive mask: {t > 0}")
print(f"Positive values: {t[t > 0]}")

Tensor: tensor([-1.1492,  0.1502, -0.8677,  2.5607, -0.7654])
Positive mask: tensor([False,  True, False,  True, False])
Positive values: tensor([0.1502, 2.5607])
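
Boolean indexing returns a 1-D tensor that holds only the selected elements, so the shape changes. If you need to keep the shape, use `torch.where` or masked assignment instead. The sketch uses fixed values so the result is predictable:

```python
import torch

t = torch.tensor([-1.5, 0.2, -0.8, 2.5, -0.7])

# Boolean indexing keeps only the selected elements (shape becomes (2,))
print(t[t > 0])       # the two positive values, 0.2 and 2.5

# torch.where keeps the shape, choosing per element between two sources
clipped = torch.where(t > 0, t, torch.zeros_like(t))
print(clipped)        # negatives replaced by 0, like a ReLU

# Masked assignment modifies t in place with the same effect
t[t < 0] = 0.0
print(torch.equal(t, clipped))  # True
```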


---

## 4. GPU/MPS Acceleration

PyTorch can run on:
- `cpu` — Default
- `cuda` — NVIDIA GPUs
- `mps` — Apple Silicon GPUs (M1 and later), via Metal Performance Shaders

In [22]:
# Check available devices
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"MPS available: {torch.backends.mps.is_available()}")

# Choose best available device
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"\nUsing device: {device}")

CUDA available: False
MPS available: True

Using device: mps


In [23]:
# Move tensor to device
t_cpu = torch.randn(1000, 1000)
t_gpu = t_cpu.to(device)

print(f"CPU tensor device: {t_cpu.device}")
print(f"GPU tensor device: {t_gpu.device}")

CPU tensor device: cpu
GPU tensor device: mps:0
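
As a rule, all operands of an operation must live on the same device, and NumPy conversion only works for CPU tensors. This sketch moves a result back with `.cpu()` before calling `.numpy()`, using the same CUDA, then MPS, then CPU fallback:

```python
import torch

cpu_t = torch.ones(3)

# .to() returns the very same tensor when no device/dtype change is needed
same = cpu_t.to("cpu")
print(same is cpu_t)            # True

# Pick an accelerator if present, otherwise stay on the CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

doubled = cpu_t.to(device) * 2  # computed on `device`
result = doubled.cpu().numpy()  # back to CPU memory, then to NumPy
print(result)                   # [2. 2. 2.]
```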


In [24]:
# Create directly on device
t_direct = torch.randn(1000, 1000, device=device)
print(f"Created on device: {t_direct.device}")

Created on device: mps:0


In [25]:
# Speed comparison: matrix multiplication
import time

size = 2000
a_cpu = torch.randn(size, size)
b_cpu = torch.randn(size, size)

# CPU timing
start = time.perf_counter()  # High-resolution clock intended for timing
c_cpu = a_cpu @ b_cpu
cpu_time = time.perf_counter() - start
print(f"CPU matmul: {cpu_time:.4f} seconds")

# GPU timing (if available)
if device.type != "cpu":
    a_gpu = a_cpu.to(device)
    b_gpu = b_cpu.to(device)
    
    # Warm up GPU
    _ = a_gpu @ b_gpu
    if device.type == "mps":
        torch.mps.synchronize()
    elif device.type == "cuda":
        torch.cuda.synchronize()
    
    start = time.perf_counter()
    c_gpu = a_gpu @ b_gpu
    # GPU kernels run asynchronously: wait for them before stopping the clock
    if device.type == "mps":
        torch.mps.synchronize()
    elif device.type == "cuda":
        torch.cuda.synchronize()
    gpu_time = time.perf_counter() - start
    
    print(f"GPU matmul: {gpu_time:.4f} seconds")
    print(f"Speedup: {cpu_time / gpu_time:.1f}x")

CPU matmul: 0.0070 seconds
GPU matmul: 0.0047 seconds
Speedup: 1.5x
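
Each of these numbers comes from a single run, which is noisy, and the results depend heavily on hardware and matrix size. Below is a minimal sketch of a more careful measurement: it warms up, repeats the operation, and synchronizes before reading the clock. `sync` and `time_matmul` are hypothetical helper names, not PyTorch APIs:

```python
import time
import torch

def sync(device: torch.device) -> None:
    """Block until queued GPU work finishes (no-op on CPU)."""
    if device.type == "cuda":
        torch.cuda.synchronize()
    elif device.type == "mps":
        torch.mps.synchronize()

def time_matmul(a: torch.Tensor, b: torch.Tensor, repeats: int = 10) -> float:
    """Return mean seconds per a @ b, measured after one warm-up call."""
    result = a @ b                  # warm-up: kernel selection, allocation
    sync(a.device)
    start = time.perf_counter()
    for _ in range(repeats):
        result = a @ b
    sync(a.device)                  # make sure all repeats have completed
    return (time.perf_counter() - start) / repeats

a = torch.randn(500, 500)
b = torch.randn(500, 500)
print(f"CPU mean per matmul: {time_matmul(a, b) * 1e3:.3f} ms")
```

For more rigorous benchmarks, PyTorch also ships the `torch.utils.benchmark` module.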


---

## 5. Preview: Autograd Connection

Tomorrow we'll dive deep into autograd. Here's the key insight:

In [26]:
# requires_grad=True tells PyTorch to track operations for gradients
x = torch.tensor([2.0], requires_grad=True)

# Forward pass: compute y = x^2 + 3x
y = x ** 2 + 3 * x
print(f"x = {x.item()}")
print(f"y = x² + 3x = {y.item()}")

# Backward pass: compute dy/dx
y.backward()
print(f"dy/dx = 2x + 3 = {x.grad.item()}")
print("Expected: 2*2 + 3 = 7")

x = 2.0
y = x² + 3x = 10.0
dy/dx = 2x + 3 = 7.0
Expected: 2*2 + 3 = 7
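
The same idea works element-wise on a vector. `backward()` needs a scalar, so a common trick is to sum the outputs first. Because each `y[i]` depends only on `x[i]`, `x.grad[i]` is then `dy[i]/dx[i]`. This sketch checks autograd against the analytic derivative $2x + 3$:

```python
import torch

x = torch.tensor([-1.0, 0.0, 2.0, 5.0], requires_grad=True)
y = x ** 2 + 3 * x

y.sum().backward()       # reduce to a scalar, then backpropagate
print(x.grad)            # 2x + 3 at each point: 1, 3, 7, 13
print(torch.allclose(x.grad, 2 * x.detach() + 3))  # True
```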


This automatic differentiation is what makes PyTorch powerful for training neural networks. We'll explore it fully tomorrow.

---

## Exercises

Try these to solidify your understanding:

In [27]:
# Exercise 1: Create a 3x3 identity matrix
# Hint: torch.eye()

identity = torch.eye(3)  # Your code here
print(identity)

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])


In [35]:
# Exercise 2: Create a tensor of shape (2, 3, 4) filled with random values,
# then reshape it to (6, 4)

t = torch.randn(2, 3, 4)  # Your code here
reshaped = t.reshape(6, -1)  # Your code here
print(f"Original shape: {t.shape}")
print(f"Reshaped: {reshaped.shape}")

Original shape: torch.Size([2, 3, 4])
Reshaped: torch.Size([6, 4])


In [36]:
# Exercise 3: Given two matrices, compute their matrix product on GPU (if available)

a = torch.randn(100, 50)
b = torch.randn(50, 100)

# Move to device and multiply
result = a.to(device) @ b.to(device)  # Your code here
print(f"Result shape: {result.shape}")
print(f"Result device: {result.device}")

Result shape: torch.Size([100, 100])
Result device: mps:0


In [37]:
# Exercise 4: Normalize a tensor to have mean=0 and std=1
# Formula: (x - mean) / std

t = torch.tensor([1.0, 5.0, 3.0, 9.0, 2.0])
normalized = (t - t.mean()) / t.std()  # Your code here

print(f"Original: {t}")
print(f"Normalized: {normalized}")
print(f"Mean: {normalized.mean():.6f}")
print(f"Std: {normalized.std():.6f}")

Original: tensor([1., 5., 3., 9., 2.])
Normalized: tensor([-0.9487,  0.3162, -0.3162,  1.5811, -0.6325])
Mean: 0.000000
Std: 1.000000
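
By default, `t.std()` uses the sample (Bessel-corrected) estimator, which divides by n - 1. The check above uses the same default, which is why the normalized std comes out as exactly 1.0. The sketch below compares it with the population std via `correction=0` (the keyword is available in PyTorch 2.x; older code uses `unbiased=False`). It then normalizes each column of a small batch:

```python
import torch

t = torch.tensor([1.0, 5.0, 3.0, 9.0, 2.0])  # squared deviations sum to 40

print(t.std().item())              # sample std, sqrt(40 / 4) ≈ 3.1623
print(t.std(correction=0).item())  # population std, sqrt(40 / 5) ≈ 2.8284

# Per-feature normalization: reduce over the batch dim, keep dims to broadcast
batch = torch.tensor([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
normed = (batch - batch.mean(dim=0, keepdim=True)) / batch.std(dim=0, keepdim=True)
print(normed)                      # both columns become -1, 0, 1
```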


---

## Summary

| Concept | Key Functions |
|---------|---------------|
| Creation | `tensor()`, `zeros()`, `ones()`, `rand()`, `randn()` |
| Attributes | `.shape`, `.dtype`, `.device` |
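| Math | `+`, `*`, `@`, `.sum()`, `.mean()`, `.max()`, `.argmax()` |
| Reshaping | `.reshape()`, `.view()`, `.flatten()`, `.unsqueeze()`, `.squeeze()` |
| Indexing | `t[i, j]`, slices like `t[1:, 1:3]`, boolean masks like `t[t > 0]` |
| GPU | `.to(device)`, `device=` parameter |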

## Tomorrow

**Autograd deep dive** — How PyTorch computes gradients automatically, which is the foundation of training neural networks.