# 01a: NumPy & PyTorch Primer — Core

**Week 1, Days 1-2** | Foundations

**Prerequisites**: Basic Python (you're a systems programmer, so ✓)

**Time**: ~60 minutes

---

## Learning Objectives

By the end of this notebook, you will be able to:

- [ ] Create and manipulate tensors in NumPy and PyTorch
- [ ] Use broadcasting to operate on arrays of different shapes
- [ ] Apply advanced indexing to slice and filter data
- [ ] Compute gradients automatically with autograd
- [ ] Move tensors between CPU and GPU

In [None]:
# Provided Code - Do NOT Edit
import numpy as np
import torch
import matplotlib.pyplot as plt

plt.style.use('seaborn-v0_8-whitegrid')
%matplotlib inline

# Check what we're working with
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.backends.mps.is_available():
    print("MPS (Apple Silicon) available: True")

# ═══════════════════════════════════════════════════════════════════════════════
# INTRO
# ═══════════════════════════════════════════════════════════════════════════════

![Code becoming vectors](hero-intro.png)

## The Setup

You're building a code search tool. Not grep—*semantic* search. The kind that knows `allocate_buffer()` and `malloc()` are related even though they share zero characters.

The trick? Turn code into vectors. Functions that do similar things end up as similar vectors. Search becomes "find the vectors closest to my query."

Your prototype needs to:
1. Store code embeddings (vectors, lots of them)
2. Compute similarity between query and all snippets (fast matrix math)
3. Extract the top results (slicing and filtering)
4. Eventually: learn better embeddings from user feedback (gradients)
5. Eventually: not take forever (GPU)

Today we're building the foundation. By the end of this notebook, you'll have a working similarity search over synthetic embeddings—all the tensor operations you need for the real thing.

Let's go.

# ═══════════════════════════════════════════════════════════════════════════════
# LAYER 0: THE PROBLEM
# ═══════════════════════════════════════════════════════════════════════════════

## What Are We Actually Building?

Semantic code search works like this:

```
1. Encode all code snippets as vectors (embeddings)
2. Encode the search query as a vector
3. Find code vectors closest to query vector
4. Return those snippets
```

The "encoding" part is a neural network (we'll get there in Week 2). Today's job is everything else: storing vectors, computing distances, extracting results.

Here's what the data looks like:

- **Code corpus**: 1000 functions, each embedded as a 384-dimensional vector
- **Query**: A single 384-dimensional vector
- **Similarity**: Cosine similarity (dot product of normalized vectors)
- **Output**: Top 10 most similar functions
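
Why can cosine similarity be "just a dot product"? Here is a minimal sketch with two hand-picked 2D vectors (the names `cosine_similarity`, `a`, and `b` are made up for this illustration). Once both vectors are scaled to unit length, the denominator of the cosine formula is 1, so only the dot product is left.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([3.0, 4.0])   # length 5
b = np.array([4.0, 3.0])   # length 5

# Full formula: 24 / (5 * 5) = 0.96
full = cosine_similarity(a, b)

# Normalize once up front, then a plain dot product gives the same number.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
shortcut = np.dot(a_unit, b_unit)

print(full, shortcut)
```

This is why the notebook normalizes the corpus once: every later query costs only dot products.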

Simple enough, right? Let's see what tools we need.

# ═══════════════════════════════════════════════════════════════════════════════
# LAYER 1: INTUITION — Tensors
# ═══════════════════════════════════════════════════════════════════════════════

## What *is* a tensor?

Imagine organizing data in space:

- **A single number** (similarity score: 0.87) is a **scalar** — a point
- **A list of numbers** (one embedding: [0.1, -0.3, 0.8, ...]) is a **vector** — a line
- **A grid of numbers** (all 1000 embeddings stacked) is a **matrix** — a sheet
- **A cube** (batch of users × batch of queries × embedding dims) is a **3D tensor**

And it keeps going. A **tensor** is just an n-dimensional array of numbers.

```
0D: scalar       → shape: ()        → similarity score
1D: vector       → shape: (384,)    → one embedding
2D: matrix       → shape: (1000, 384)   → all embeddings
3D: 3-tensor     → shape: (32, 1000, 384)   → batched search
```

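The **rank** of a tensor is how many indices you need to grab a single number (NumPy and PyTorch expose it as `.ndim`). A matrix needs two (row, column). A 3D tensor needs three. This is not the same thing as the rank of a matrix in linear algebra.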

For our code search:
- Query embedding: 1D tensor, shape (384,)
- Corpus embeddings: 2D tensor, shape (1000, 384)
- Similarity scores: 1D tensor, shape (1000,)
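
To make the shapes concrete, here is a small sketch that builds one placeholder array per rank and prints its `ndim` and `shape`. The arrays are filled with zeros and the `*_demo` names are invented for this example; only the shapes matter.

```python
import numpy as np

score_demo = np.array(0.87)                                # 0D: one similarity score
embedding_demo = np.zeros(384)                             # 1D: one embedding
corpus_demo = np.zeros((1000, 384))                        # 2D: all embeddings
batch_demo = np.zeros((32, 1000, 384), dtype=np.float32)   # 3D: batched search

for name, t in [("scalar", score_demo), ("vector", embedding_demo),
                ("matrix", corpus_demo), ("3-tensor", batch_demo)]:
    print(f"{name:<9} ndim={t.ndim}  shape={t.shape}")

# The rank is the number of indices needed to reach a single number.
one_number = corpus_demo[12, 5]   # two indices for a 2D tensor
```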

# ═══════════════════════════════════════════════════════════════════════════════
# LAYER 2: CODE + VIZ — Tensors
# ═══════════════════════════════════════════════════════════════════════════════

# -------------------------------------------------------------------------------
# Problem 1: Create the Corpus Embeddings
# -------------------------------------------------------------------------------

Let's start by creating synthetic embeddings for our code corpus. In the real system, these would come from a neural network. For now, we'll use random vectors (normalized, so they live on the unit sphere).

Your task:
1. Create a `corpus` tensor of shape (1000, 384) with random values
2. Normalize each row to have unit length (L2 norm = 1)
3. Verify the shape and that rows are normalized

Hint: To normalize, divide each row by its L2 norm. Use `np.linalg.norm(x, axis=1, keepdims=True)`
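
Why `keepdims=True`? A toy sketch on a 3×2 array (values chosen so the norms come out to 5, 10, and 2; the names are illustrative). Keeping the reduced axis as size 1 gives a (3, 1) column that lines up with the (3, 2) data, so each row is divided by its own norm.

```python
import numpy as np

x = np.array([[3.0, 4.0],
              [6.0, 8.0],
              [0.0, 2.0]])                               # shape (3, 2)

norms_flat = np.linalg.norm(x, axis=1)                   # shape (3,)
norms_col = np.linalg.norm(x, axis=1, keepdims=True)     # shape (3, 1)

# (3, 2) / (3, 1): the size-1 axis is stretched across the columns.
x_unit = x / norms_col
print(x_unit)

# (3, 2) / (3,): trailing dimensions are 2 vs 3, so NumPy refuses.
try:
    x / norms_flat
except ValueError as err:
    print("Without keepdims:", err)
```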

In [None]:
# Provided Code - Do NOT Edit
np.random.seed(42)  # For reproducibility
n_functions = 1000
embedding_dim = 384

In [None]:
# --- YOUR CODE BELOW ---

# Create random embeddings
corpus_raw = None  # TODO: shape (1000, 384)

# Normalize each row
corpus = None  # TODO: divide by row norms

# Verify
print(f"Corpus shape: {corpus.shape}")
print(f"First row norm: {np.linalg.norm(corpus[0]):.4f}")
print(f"Last row norm: {np.linalg.norm(corpus[-1]):.4f}")

In [None]:
# >>> SOLUTION (collapsed by default)
# ┌─────────────────────────────────────────────────────────────────────────────
# │ corpus_raw = np.random.randn(n_functions, embedding_dim)
# │ norms = np.linalg.norm(corpus_raw, axis=1, keepdims=True)
# │ corpus = corpus_raw / norms
# └─────────────────────────────────────────────────────────────────────────────

In [None]:
# Test
assert corpus.shape == (1000, 384), f"Expected shape (1000, 384), got {corpus.shape}"
assert np.allclose(np.linalg.norm(corpus, axis=1), 1.0), "Rows should have unit norm"
# Tests pass. Moving on.

# -------------------------------------------------------------------------------
# Problem 2: Create the Query Embedding
# -------------------------------------------------------------------------------

Now let's create a query embedding. In the real system, this would be the embedding of the user's search query (like "memory allocation function"). We'll simulate one.

Your task:
1. Create a `query` tensor of shape (384,) with random values
2. Normalize it to unit length
3. Verify the shape and norm

In [None]:
# --- YOUR CODE BELOW ---

# Create random query embedding
query_raw = None  # TODO: shape (384,)

# Normalize
query = None  # TODO

# Verify
print(f"Query shape: {query.shape}")
print(f"Query norm: {np.linalg.norm(query):.4f}")

In [None]:
# >>> SOLUTION (collapsed by default)
# ┌─────────────────────────────────────────────────────────────────────────────
# │ query_raw = np.random.randn(embedding_dim)
# │ query = query_raw / np.linalg.norm(query_raw)
# └─────────────────────────────────────────────────────────────────────────────

In [None]:
# Test
assert query.shape == (384,), f"Expected shape (384,), got {query.shape}"
assert np.isclose(np.linalg.norm(query), 1.0), "Query should have unit norm"
# Tests pass. Moving on.

Let's visualize what we've built. High-dimensional vectors are hard to picture, so we'll project down to 2D.

In [None]:
# Provided Code - Do NOT Edit
# Visualize using PCA projection to 2D (just for intuition)
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
corpus_2d = pca.fit_transform(corpus)
query_2d = pca.transform(query.reshape(1, -1))[0]

plt.figure(figsize=(10, 8))
plt.scatter(corpus_2d[:, 0], corpus_2d[:, 1], alpha=0.3, s=10, label='Code snippets')
plt.scatter(query_2d[0], query_2d[1], color='red', s=200, marker='*', label='Query', zorder=5)
plt.xlabel('PCA dimension 1')
plt.ylabel('PCA dimension 2')
plt.title('Code Embeddings (projected to 2D)\nRed star = search query')
plt.legend()
plt.show()

# ═══════════════════════════════════════════════════════════════════════════════
# LAYER 1: INTUITION — Broadcasting
# ═══════════════════════════════════════════════════════════════════════════════

## Computing similarity: the naive way vs the smart way

To find similar code, we need to compute the cosine similarity between our query and *every* snippet in the corpus. Cosine similarity between normalized vectors is just the dot product.

The naive way:
```python
similarities = []
for i in range(1000):
    sim = np.dot(query, corpus[i])
    similarities.append(sim)
```

This works, but it's slow. We're calling `np.dot` 1000 times, and Python loops are molasses.

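The smart way: **vectorization**.

Instead of looping in Python, hand NumPy/PyTorch the whole array and let it run the loop in compiled code. When we write `corpus @ query`, we get all 1000 dot products in one operation.

```
corpus: (1000, 384)
query:        (384,)
─────────────────────
result: (1000,)       ← each row dotted with query
```

The inner dimensions (384) match and are summed away. The `@` operator does matrix-vector multiplication, computing 1000 dot products at once.

**Broadcasting** is the related mechanism for *element-wise* operations: shapes are aligned from the right, and any size-1 (or missing) dimension is stretched to match. You already used it when normalizing: `corpus_raw / norms` divides a (1000, 384) array by a (1000, 1) column, so each row is divided by its own norm.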
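
A quick sanity check on toy shapes that the loop and the `@` product agree, and how element-wise broadcasting relates to them. The names `corpus_demo` and `query_demo` are made up for this sketch, and the data is random.

```python
import numpy as np

rng = np.random.default_rng(0)
corpus_demo = rng.standard_normal((5, 4))   # 5 toy "snippets", 4 dims each
query_demo = rng.standard_normal(4)

# Loop version: one dot product per row.
looped = np.array([np.dot(query_demo, row) for row in corpus_demo])

# Vectorized version: a single matrix-vector product.
vectorized = corpus_demo @ query_demo

# Element-wise broadcasting: (5, 4) * (4,) stretches the query across
# all 5 rows; summing over axis 1 reproduces the same dot products.
via_broadcast = (corpus_demo * query_demo).sum(axis=1)

print(np.allclose(looped, vectorized), np.allclose(vectorized, via_broadcast))
print(np.broadcast_shapes((5, 4), (4,)))    # shape of the element-wise result
```

All three give the same numbers; `@` is usually the one to reach for because it never materializes the intermediate (5, 4) product.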

# ═══════════════════════════════════════════════════════════════════════════════
# LAYER 2: CODE + VIZ — Broadcasting
# ═══════════════════════════════════════════════════════════════════════════════

# -------------------------------------------------------------------------------
# Problem 3: Compute Similarity Scores
# -------------------------------------------------------------------------------

Now let's compute the similarity between our query and every code snippet in the corpus. Since both are normalized, cosine similarity = dot product.

Your task:
1. Compute `similarities` using matrix-vector multiplication: `corpus @ query`
2. Verify the shape is (1000,)
3. Find the range of similarity scores

In [None]:
# --- YOUR CODE BELOW ---

# Compute all similarities at once
similarities = None  # TODO: corpus @ query

# Verify
print(f"Similarities shape: {similarities.shape}")
print(f"Min similarity: {similarities.min():.4f}")
print(f"Max similarity: {similarities.max():.4f}")
print(f"Mean similarity: {similarities.mean():.4f}")

In [None]:
# >>> SOLUTION (collapsed by default)
# ┌─────────────────────────────────────────────────────────────────────────────
# │ similarities = corpus @ query
# └─────────────────────────────────────────────────────────────────────────────

In [None]:
# Test
assert similarities.shape == (1000,), f"Expected shape (1000,), got {similarities.shape}"
# Tests pass. Moving on.

In [None]:
# Provided Code - Do NOT Edit
# Visualize the similarity distribution
plt.figure(figsize=(10, 4))
plt.hist(similarities, bins=50, edgecolor='black', alpha=0.7)
plt.axvline(similarities.mean(), color='red', linestyle='--', label=f'Mean: {similarities.mean():.3f}')
plt.xlabel('Cosine Similarity')
plt.ylabel('Count')
plt.title('Distribution of Similarity Scores\n(Query vs All Code Snippets)')
plt.legend()
plt.show()

The similarity scores form a bell curve around 0, with a standard deviation of roughly 1/√384 ≈ 0.05. This makes sense: random vectors in high dimensions are nearly orthogonal (uncorrelated). The ones with high scores happened to point in a similar direction by chance.

With real embeddings, similar code would cluster, creating meaningful high-similarity matches.
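
You can check the "nearly orthogonal" claim numerically. The sketch below (an illustration on fresh random data, not part of the search pipeline) measures the spread of similarity scores in 3 and in 384 dimensions and compares it with 1/√d, the value you would expect for random unit vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
spread = {}  # dimension -> measured std of the similarity scores

for d in (3, 384):
    vecs = rng.standard_normal((1000, d))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)   # unit-length rows
    q = rng.standard_normal(d)
    q /= np.linalg.norm(q)                                # unit-length query
    spread[d] = (vecs @ q).std()
    print(f"d={d:>3}: measured std={spread[d]:.3f}, 1/sqrt(d)={1 / np.sqrt(d):.3f}")
```

The higher the dimension, the tighter the scores huddle around 0, which is why a random score of 0.15 already counts as an outlier here.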

# ═══════════════════════════════════════════════════════════════════════════════
# LAYER 1: INTUITION — Indexing
# ═══════════════════════════════════════════════════════════════════════════════

## Extracting the results we care about

We have 1000 similarity scores. But users don't want all 1000—they want the top 10 most similar snippets.

NumPy's indexing is incredibly powerful. You can:
- Grab specific elements: `arr[5]`
- Grab slices: `arr[10:20]`
- Grab by condition: `arr[arr > 0.1]`
- Grab by index array: `arr[[3, 7, 12, 99]]`
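
Here are the four styles side by side on a six-element toy array (the values and the name `arr` are arbitrary). One detail worth knowing: a plain slice is a view into the original data, while an index array gives you an independent copy.

```python
import numpy as np

arr = np.array([0.05, 0.30, -0.10, 0.12, 0.25, -0.02])

print(arr[1])            # single element
print(arr[2:5])          # slice: elements 2, 3, 4
print(arr[arr > 0.1])    # boolean mask: only the values above 0.1
print(arr[[0, 3, 5]])    # index array: elements 0, 3, 5

window = arr[2:5]        # basic slice: a view into arr
picked = arr[[2, 3, 4]]  # index array: a copy

window[0] = 9.9          # writes through to arr[2]...
print(arr[2], picked[0]) # ...but the copy keeps the old value
```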

For our search, we need to:
1. Find the *indices* of the top 10 scores: `np.argsort()` + slicing
2. Extract those scores: fancy indexing with the indices

# ═══════════════════════════════════════════════════════════════════════════════
# LAYER 2: CODE + VIZ — Indexing
# ═══════════════════════════════════════════════════════════════════════════════

# -------------------------------------------------------------------------------
# Problem 4: Find Top-K Results
# -------------------------------------------------------------------------------

Now let's extract the top 10 search results. We need the indices (which snippets?) and the scores (how similar?).

Your task:
1. Use `np.argsort(similarities)` to get indices that would sort by similarity
2. Take the last 10 indices (highest scores) and reverse them (descending order)
3. Use those indices to get the corresponding similarity scores

Hint: `argsort` returns indices in ascending order. `[-10:][::-1]` gets the last 10, reversed.
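
To see the `argsort` idiom in action without giving away the exercise, here is a sketch on six toy scores (`toy_scores` and `k_demo` are hypothetical names). It also shows `np.argpartition`, a different technique that avoids sorting the whole array; that is an optional alternative, not what the exercise asks for.

```python
import numpy as np

toy_scores = np.array([0.10, 0.70, 0.30, 0.90, 0.50, 0.20])
k_demo = 3

# Full sort, ascending: indices [0, 5, 2, 4, 1, 3]
order = np.argsort(toy_scores)
top_k = order[-k_demo:][::-1]          # last 3, reversed: [3, 1, 4]

# Alternative: partition so the k largest land in the last k slots
# (in no particular order), then sort only those k.
part = np.argpartition(toy_scores, -k_demo)[-k_demo:]
top_k_fast = part[np.argsort(toy_scores[part])[::-1]]

print(top_k, top_k_fast)
```

For 1000 scores the difference is negligible; with millions of snippets, sorting only the top k starts to matter.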

In [None]:
# --- YOUR CODE BELOW ---

k = 10  # Number of results to return

# Get indices that would sort similarities (ascending)
sorted_indices = None  # TODO: np.argsort(...)

# Get top-k indices (highest scores, descending)
top_k_indices = None  # TODO: last k, reversed

# Get corresponding scores
top_k_scores = None  # TODO: fancy indexing

# Display results
print("Top 10 Search Results:")
print(f"{'Rank':<6} {'Index':<10} {'Similarity':<12}")
print("-" * 28)
for rank, (idx, score) in enumerate(zip(top_k_indices, top_k_scores), 1):
    print(f"{rank:<6} {idx:<10} {score:<12.4f}")

In [None]:
# >>> SOLUTION (collapsed by default)
# ┌─────────────────────────────────────────────────────────────────────────────
# │ sorted_indices = np.argsort(similarities)
# │ top_k_indices = sorted_indices[-k:][::-1]
# │ top_k_scores = similarities[top_k_indices]
# └─────────────────────────────────────────────────────────────────────────────

In [None]:
# Test
assert len(top_k_indices) == 10, f"Expected 10 indices, got {len(top_k_indices)}"
assert top_k_scores[0] >= top_k_scores[-1], "Scores should be in descending order"
assert top_k_scores[0] == similarities.max(), "First score should be the maximum"
# Tests pass. Moving on.

# -------------------------------------------------------------------------------
# Problem 5: Threshold Filtering
# -------------------------------------------------------------------------------

Sometimes we want to filter by a minimum similarity threshold instead of (or in addition to) top-k. Let's find all snippets with similarity > 0.15.

Your task:
1. Create a boolean mask: `similarities > 0.15`
2. Use the mask to get matching indices: `np.where(mask)[0]`
3. Count how many matches we have
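
The same steps on a toy array, plus one way to combine the threshold with ranking. The names `toy_scores` and `toy_threshold` are invented for this sketch.

```python
import numpy as np

toy_scores = np.array([0.02, 0.21, -0.08, 0.16, 0.11, 0.30])
toy_threshold = 0.15

mask = toy_scores > toy_threshold      # [False, True, False, True, False, True]
hits = np.where(mask)[0]               # indices of the True entries: [1, 3, 5]
print(f"{mask.sum()} matches at indices {hits}")

# Threshold AND ranking: sort only the survivors, best first.
ranked_hits = hits[np.argsort(toy_scores[hits])[::-1]]   # [5, 1, 3]
print(ranked_hits, toy_scores[ranked_hits])
```

`np.where(mask)` returns a tuple with one array per dimension, which is why the `[0]` is there.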

In [None]:
# --- YOUR CODE BELOW ---

threshold = 0.15

# Create boolean mask
above_threshold = None  # TODO: similarities > threshold

# Get indices of True values
matching_indices = None  # TODO: np.where(...)

# Get matching scores  
matching_scores = None  # TODO: fancy indexing

print(f"Snippets above threshold ({threshold}): {len(matching_indices)}")
print(f"Scores: {matching_scores}")

In [None]:
# >>> SOLUTION (collapsed by default)
# ┌─────────────────────────────────────────────────────────────────────────────
# │ above_threshold = similarities > threshold
# │ matching_indices = np.where(above_threshold)[0]
# │ matching_scores = similarities[matching_indices]
# └─────────────────────────────────────────────────────────────────────────────

In [None]:
# Test
assert all(similarities[matching_indices] > threshold), "All matches should be above threshold"
# Tests pass. Moving on.

In [None]:
# Provided Code - Do NOT Edit
# Visualize: highlight top results in 2D projection
plt.figure(figsize=(10, 8))
plt.scatter(corpus_2d[:, 0], corpus_2d[:, 1], alpha=0.3, s=10, c='gray', label='All snippets')
plt.scatter(corpus_2d[top_k_indices, 0], corpus_2d[top_k_indices, 1], 
            c='blue', s=100, marker='o', label='Top 10 results', zorder=4)
plt.scatter(query_2d[0], query_2d[1], color='red', s=200, marker='*', label='Query', zorder=5)

# Add rank labels
for rank, idx in enumerate(top_k_indices[:5], 1):
    plt.annotate(f'#{rank}', (corpus_2d[idx, 0], corpus_2d[idx, 1]), fontsize=10)

plt.xlabel('PCA dimension 1')
plt.ylabel('PCA dimension 2')
plt.title('Search Results Highlighted')
plt.legend()
plt.show()

# ═══════════════════════════════════════════════════════════════════════════════
# LAYER 1: INTUITION — Autograd
# ═══════════════════════════════════════════════════════════════════════════════

## Making the embeddings better

So far we've been using random embeddings. They work, but they're not meaningful. Real semantic search needs embeddings where similar code → similar vectors.

How do we learn better embeddings? **Gradients**.

Here's the idea:
1. Define what "good" means (a loss function)
2. Compute how to tweak each embedding to make the loss smaller (gradients)
3. Tweak and repeat

Computing gradients by hand is tedious and error-prone. **Autograd** does it automatically:
1. You write the forward computation (query → similarity → loss)
2. PyTorch records every operation
3. Call `loss.backward()` and PyTorch replays the recording backwards, computing gradients via the chain rule

Think of it like a tape recorder. Forward pass: record. Backward pass: replay in reverse.
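
The smallest possible tape, as a sketch: three numbers, one squaring, one sum. The function y = Σ xᵢ² is chosen only because its gradient (2x) is easy to check by hand.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Forward pass: PyTorch records the operations as they run.
y = (x ** 2).sum()        # 1 + 4 + 9 = 14
print(y.grad_fn)          # the last recorded operation

# Backward pass: replay the tape in reverse, applying the chain rule.
y.backward()
print(x.grad)             # dy/dx = 2x
```

Note that `backward()` returns nothing; the result is stored in `x.grad`.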

# ═══════════════════════════════════════════════════════════════════════════════
# LAYER 2: CODE + VIZ — Autograd
# ═══════════════════════════════════════════════════════════════════════════════

# -------------------------------------------------------------------------------
# Problem 6: Convert to PyTorch and Track Gradients
# -------------------------------------------------------------------------------

Let's switch to PyTorch and make our embeddings learnable. We'll convert the corpus and query to PyTorch tensors, and tell PyTorch to track gradients for the corpus embeddings.

Your task:
1. Convert `corpus` (NumPy) to `corpus_pt` (PyTorch) with `requires_grad=True`
2. Convert `query` to `query_pt` (no gradients needed for query)
3. Verify both are PyTorch tensors

In [None]:
# --- YOUR CODE BELOW ---

# Convert to PyTorch tensors
corpus_pt = None  # TODO: torch.tensor(..., requires_grad=True, dtype=torch.float32)
query_pt = None   # TODO: torch.tensor(..., dtype=torch.float32)

# Verify
print(f"corpus_pt type: {type(corpus_pt)}")
print(f"corpus_pt requires_grad: {corpus_pt.requires_grad}")
print(f"query_pt requires_grad: {query_pt.requires_grad}")

In [None]:
# >>> SOLUTION (collapsed by default)
# ┌─────────────────────────────────────────────────────────────────────────────
# │ corpus_pt = torch.tensor(corpus, requires_grad=True, dtype=torch.float32)
# │ query_pt = torch.tensor(query, dtype=torch.float32)
# └─────────────────────────────────────────────────────────────────────────────

In [None]:
# Test
assert isinstance(corpus_pt, torch.Tensor), "corpus_pt should be a PyTorch tensor"
assert corpus_pt.requires_grad, "corpus_pt should have requires_grad=True"
assert not query_pt.requires_grad, "query_pt should not require gradients"
# Tests pass. Moving on.

# -------------------------------------------------------------------------------
# Problem 7: Compute a Loss and Backpropagate
# -------------------------------------------------------------------------------

Let's define a simple learning objective: we want snippet #42 to be the most similar to the query. (In reality, this signal would come from user clicks.)

Our loss will be: `-similarity[42]` (negative because we want to maximize similarity, but optimization minimizes loss)

Your task:
1. Compute similarities: `corpus_pt @ query_pt`
2. Define loss: negative of the similarity for snippet #42
3. Call `loss.backward()` to compute gradients
4. Print the gradient shape and a sample of values

In [None]:
# --- YOUR CODE BELOW ---

target_idx = 42  # We want this snippet to be most similar

# Compute similarities
sims_pt = None  # TODO: matrix-vector product

# Define loss (we want to MAXIMIZE similarity, so MINIMIZE negative similarity)
loss = None  # TODO: -sims_pt[target_idx]

# Backpropagate
# TODO: loss.backward()

# Inspect gradients
print(f"Loss value: {loss.item():.4f}")
print(f"Gradient shape: {corpus_pt.grad.shape}")
print(f"Gradient for snippet 42 (first 10 dims): {corpus_pt.grad[42, :10]}")
print(f"Gradient for snippet 0 (first 10 dims): {corpus_pt.grad[0, :10]}")

In [None]:
# >>> SOLUTION (collapsed by default)
# ┌─────────────────────────────────────────────────────────────────────────────
# │ sims_pt = corpus_pt @ query_pt
# │ loss = -sims_pt[target_idx]
# │ loss.backward()
# └─────────────────────────────────────────────────────────────────────────────

In [None]:
# Test
assert corpus_pt.grad is not None, "Gradients should have been computed"
assert corpus_pt.grad.shape == (1000, 384), "Gradient shape should match corpus"
# Tests pass. Moving on.

Notice something interesting:
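- Snippet 42's gradient is non-zero: it equals `-query`. The gradient points in the direction that *increases* the loss, so gradient descent steps the opposite way (`+query`), which increases the similarity
- Other snippets' gradients are zero (they don't affect the loss)

If we used a more complex loss (like a contrastive loss over all snippets), the other snippets would receive non-zero gradients too.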
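
A toy check of both observations, followed by one manual gradient-descent step. This is a sketch on a 4×3 random matrix; `emb`, `q`, `target`, and the learning rate `lr = 0.1` are all made up for the example.

```python
import torch

torch.manual_seed(0)
emb = torch.randn(4, 3, requires_grad=True)   # 4 toy snippets, 3 dims
q = torch.randn(3)                            # toy query, no gradient needed
target = 2

loss = -(emb @ q)[target]
loss.backward()

# Only the target row gets a gradient, and it equals -q.
print(torch.allclose(emb.grad[target], -q))
print(emb.grad[0])                            # all zeros

# One gradient-descent step: move AGAINST the gradient.
lr = 0.1
with torch.no_grad():
    before = (emb @ q)[target].item()
    emb -= lr * emb.grad
    after = (emb @ q)[target].item()

print(f"similarity before: {before:.4f}, after: {after:.4f}")
```

A real training loop would also re-normalize the updated row so it stays on the unit sphere; that step is left out here to keep the sketch short.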

# ═══════════════════════════════════════════════════════════════════════════════
# LAYER 1: INTUITION — GPU
# ═══════════════════════════════════════════════════════════════════════════════

## Scaling up

Our corpus has 1000 embeddings. A real codebase might have millions. The GPU is how we make this fast.

Your CPU is like a brilliant professor: one thing at a time, very well.

Your GPU is like a classroom of 1000 students: each can only do simple math, but they all work simultaneously.

Matrix multiplication (which is what `corpus @ query` does) is "embarrassingly parallel"—each output element can be computed independently. Perfect for the GPU.

Moving data to the GPU is easy: `.to(device)` works for any backend, and `.cuda()` is a CUDA-only shorthand (there is no `.mps()` method; use `.to('mps')`). The catch: all tensors in an operation must be on the same device.
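
The same-device rule, sketched on tiny tensors. This runs on whatever hardware is present and falls back to the CPU; the mismatch branch can only trigger when an accelerator is actually available. The names `dev`, `a`, and `b` are illustrative.

```python
import torch

# Pick the best available device, falling back to the CPU.
if torch.cuda.is_available():
    dev = torch.device("cuda")
elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
    dev = torch.device("mps")
else:
    dev = torch.device("cpu")

a = torch.ones(3, 4)                          # created on the CPU
b = torch.arange(4, dtype=torch.float32)      # [0, 1, 2, 3], also on the CPU

a_dev, b_dev = a.to(dev), b.to(dev)
out = a_dev @ b_dev                           # fine: both operands share a device

# Mixing devices raises a RuntimeError (only observable off-CPU).
if dev.type != "cpu":
    try:
        a_dev @ b
    except RuntimeError as err:
        print("Device mismatch:", err)

# NumPy lives on the CPU, so bring results back before converting.
result = out.cpu().numpy()
print(result)                                 # each row sums 0 + 1 + 2 + 3 = 6
```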

# ═══════════════════════════════════════════════════════════════════════════════
# LAYER 2: CODE + VIZ — GPU
# ═══════════════════════════════════════════════════════════════════════════════

# -------------------------------------------------------------------------------
# Problem 8: Move to GPU and Benchmark
# -------------------------------------------------------------------------------

Let's move our tensors to the GPU (if available) and compare performance.

Your task:
1. Detect the best available device (cuda, mps, or cpu)
2. Move corpus and query to that device
3. Run a timing comparison

In [None]:
# --- YOUR CODE BELOW ---

# Detect best device
if torch.cuda.is_available():
    device = torch.device('cuda')
elif torch.backends.mps.is_available():
    device = torch.device('mps')
else:
    device = torch.device('cpu')

print(f"Using device: {device}")

# Move tensors to device (create fresh tensors without gradient history)
corpus_gpu = None  # TODO: torch.tensor(corpus, device=device, dtype=torch.float32)
query_gpu = None   # TODO: torch.tensor(query, device=device, dtype=torch.float32)

# Verify
print(f"corpus_gpu device: {corpus_gpu.device}")
print(f"query_gpu device: {query_gpu.device}")

In [None]:
# >>> SOLUTION (collapsed by default)
# ┌─────────────────────────────────────────────────────────────────────────────
# │ corpus_gpu = torch.tensor(corpus, device=device, dtype=torch.float32)
# │ query_gpu = torch.tensor(query, device=device, dtype=torch.float32)
# └─────────────────────────────────────────────────────────────────────────────

In [None]:
# Provided Code - Do NOT Edit
# Benchmark: CPU vs GPU (or whatever we have)
import time

# Larger corpus for meaningful benchmark
large_corpus_np = np.random.randn(100000, 384).astype(np.float32)
large_corpus_np = large_corpus_np / np.linalg.norm(large_corpus_np, axis=1, keepdims=True)

# CPU timing
large_corpus_cpu = torch.tensor(large_corpus_np)
query_cpu = torch.tensor(query, dtype=torch.float32)

start = time.time()
for _ in range(10):
    _ = large_corpus_cpu @ query_cpu
cpu_time = (time.time() - start) / 10

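# GPU timing (if not cpu)
if device.type != 'cpu':
    large_corpus_device = torch.tensor(large_corpus_np, device=device)
    query_device = torch.tensor(query, device=device, dtype=torch.float32)

    def sync():
        # GPU work is launched asynchronously; wait for it before reading the clock
        if device.type == 'cuda':
            torch.cuda.synchronize()
        elif device.type == 'mps' and hasattr(torch, 'mps'):
            torch.mps.synchronize()  # available in PyTorch 2.0+

    # Warmup (and make sure it has finished before timing starts)
    _ = large_corpus_device @ query_device
    sync()

    start = time.time()
    for _ in range(10):
        _ = large_corpus_device @ query_device
    sync()
    device_time = (time.time() - start) / 10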
    
    print(f"\nBenchmark (100K × 384 matrix-vector product):")
    print(f"  CPU: {cpu_time*1000:.2f} ms")
    print(f"  {device.type.upper()}: {device_time*1000:.2f} ms")
    print(f"  Speedup: {cpu_time/device_time:.1f}x")
else:
    print(f"\nCPU-only timing: {cpu_time*1000:.2f} ms per query over 100K snippets")

# ═══════════════════════════════════════════════════════════════════════════════
# EXERCISES
# ═══════════════════════════════════════════════════════════════════════════════

These exercises extend what you've learned. Complete them to solidify your understanding.

## Exercise 1: Batch Search

Modify the search to handle multiple queries at once. Given a batch of 5 queries, compute the similarity of each query against all corpus embeddings.

Expected output shape: (5, 1000) — each row is the similarities for one query.

In [None]:
# --- YOUR CODE BELOW ---

# Create 5 random queries
queries_raw = np.random.randn(5, embedding_dim)
queries = queries_raw / np.linalg.norm(queries_raw, axis=1, keepdims=True)

# Compute all similarities at once
# Hint: (5, 384) @ (384, 1000) → (5, 1000)
# You need to transpose the corpus: corpus.T
batch_similarities = None  # TODO

print(f"Batch similarities shape: {batch_similarities.shape}")
print(f"Query 0 top match: snippet {batch_similarities[0].argmax()} (sim: {batch_similarities[0].max():.4f})")
print(f"Query 4 top match: snippet {batch_similarities[4].argmax()} (sim: {batch_similarities[4].max():.4f})")

## Exercise 2: Contrastive Loss

Implement a simple contrastive loss: given a query and a "positive" snippet (the one we want to rank highly) and "negative" snippets (all others), maximize the margin between positive and negative similarities.

Loss = max(0, margin - (sim_positive - sim_negative_mean))

This loss is 0 when the positive snippet is sufficiently more similar than the average negative.
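
A worked example of the formula with plain numbers, before you write the tensor version. The helper `hinge_margin_loss` is a hypothetical name used only for this arithmetic check.

```python
def hinge_margin_loss(sim_positive, sim_negative_mean, margin=0.5):
    """max(0, margin - gap), where gap = positive minus mean negative similarity."""
    gap = sim_positive - sim_negative_mean
    return max(0.0, margin - gap)

# Gap of 0.8 already exceeds the 0.5 margin: nothing left to fix.
satisfied = hinge_margin_loss(0.9, 0.1)

# Gap of only 0.2: the loss is the missing 0.3.
violated = hinge_margin_loss(0.3, 0.1)

print(satisfied, violated)
```

One caveat for the PyTorch version: Python's built-in `max(0, tensor)` returns the plain integer `0` whenever the margin is satisfied, and an integer has no `.item()` and no gradient history. `torch.clamp(value, min=0)` does the same clipping but always returns a tensor.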

In [None]:
# --- YOUR CODE BELOW ---

def contrastive_loss(corpus_pt, query_pt, positive_idx, margin=0.5):
    """
    Compute contrastive loss.
    
    Args:
        corpus_pt: (N, D) tensor of corpus embeddings
        query_pt: (D,) tensor of query embedding
        positive_idx: index of the "correct" snippet
        margin: desired gap between positive and negative similarities
    
    Returns:
        Scalar loss value
    """
    # TODO: Compute similarities
    # TODO: Get positive similarity (at positive_idx)
    # TODO: Compute mean of negative similarities (all except positive_idx)
    # TODO: Return max(0, margin - (positive - negative_mean))
    #       Hint: torch.clamp(x, min=0) keeps the result a tensor, so .item() below works
    pass

# Test
corpus_fresh = torch.tensor(corpus, requires_grad=True, dtype=torch.float32)
query_fresh = torch.tensor(query, dtype=torch.float32)
loss = contrastive_loss(corpus_fresh, query_fresh, positive_idx=42)
print(f"Contrastive loss: {loss.item():.4f}")

# ═══════════════════════════════════════════════════════════════════════════════
# OUTRO
# ═══════════════════════════════════════════════════════════════════════════════

![The search network works](hero-outro.png)

## What Just Happened

You built the computational backbone of a semantic code search system. Along the way, you:

- Created high-dimensional embeddings as tensors
- Used a matrix-vector product to compute 1000 dot products in one line (and broadcasting to normalize the rows)
- Extracted top-k results using fancy indexing
- Made embeddings learnable with autograd
- Moved everything to GPU for speed

These aren't toy examples. Production semantic search runs on the same operations, just with learned embeddings instead of random ones and, at larger scale, an approximate nearest-neighbor index in place of the brute-force scan.

## What's Next

In **01b: Linear Algebra Refresh**, we'll dig into *why* these operations work geometrically:
- Vectors as arrows, matrices as transformations
- Eigenvalues and why they matter for stability
- SVD for finding the "important directions" in your embedding space

The intuition you build there will pay off when we design actual embedding models in Week 2.

→ [01b: Linear Algebra Refresh](01b-linear-algebra-core.ipynb)

# ═══════════════════════════════════════════════════════════════════════════════
# RESOURCES
# ═══════════════════════════════════════════════════════════════════════════════

**Official Documentation**:
- [PyTorch Tensors Tutorial](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html)
- [NumPy Broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html)
- [PyTorch Autograd](https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html)

**Deep Dives**:
- [Jay Alammar: A Visual Intro to NumPy](https://jalammar.github.io/visual-numpy/)
- [PyTorch Internals: Autograd](http://blog.ezyang.com/2019/05/pytorch-internals/)

**Depth Notebook**:
- [01a: NumPy & PyTorch — Depth](01a-numpy-pytorch-depth.ipynb) — CS terminology, complexity analysis, mathematical formalism