In [None]:
#@title 🎧 Download Narration Audio & Play Introduction
import os as _os
if not _os.path.exists("/content/narration"):
    !pip install -q gdown
    import gdown
    gdown.download(id="17YjRH6IoE878oNGb8X3rv81ByYlgYkJj", output="/content/narration.zip", quiet=False)
    !unzip -q /content/narration.zip -d /content/narration
    !rm /content/narration.zip
    print(f"Loaded {len(_os.listdir('/content/narration'))} narration segments")
else:
    print("Narration audio already loaded.")

from IPython.display import Audio, display
display(Audio("/content/narration/00_intro.mp3"))

In [None]:
# 🔧 Setup: Run this cell first!
# Check GPU availability and install dependencies

import torch
import sys

# Check GPU
if torch.cuda.is_available():
    device = torch.device('cuda')
    print(f"✅ GPU available: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    device = torch.device('cpu')
    print("⚠️ No GPU detected. Some cells may run slowly.")
    print("   Go to Runtime → Change runtime type → GPU")

print(f"\n📦 Python {sys.version.split()[0]}")
print(f"🔥 PyTorch {torch.__version__}")

# Set random seeds for reproducibility
import random
import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)

print(f"🎲 Random seed set to {SEED}")

%matplotlib inline

# 🚀 Controller + CMA-ES: Deciding with 867 Parameters from First Principles

*Part 3 of the Vizuara series on World Models*
*Estimated time: 45 minutes*

In [None]:
#@title 🎧 Listen: Why It Matters
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_why_it_matters.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

## 1. Why Does This Matter?

We have built the eyes (VAE) and the memory (MDN-RNN) of our World Model agent. Now we need a brain: something that takes in what the agent sees and remembers, and decides what to do.

Here is the surprise: **the Controller is a single linear layer with only 867 parameters.**

That is not a typo. In the original World Models paper, the entire decision-making logic, the part that drives a car around a racetrack, fits in fewer parameters than a single layer of a typical neural network. The secret is that all the complexity lives in V and M. By the time information reaches the Controller, it has already been compressed and contextualized so well that a simple linear mapping is enough.

But here is the second surprise: we do not train this Controller with backpropagation. Instead, we use **CMA-ES** (Covariance Matrix Adaptation Evolution Strategy), a derivative-free optimization method inspired by natural selection. We literally *evolve* a population of controllers by scoring each one on complete rollouts, which in the full pipeline can happen inside the learned dream.

By the end of this notebook, you will:
- Build a Controller from scratch
- Implement CMA-ES (evolutionary optimization) from first principles
- Evolve controllers on a simple control task
- Watch reward curves climb as controllers get better over generations

In [None]:
# üîß Setup
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch
import gymnasium as gym

%matplotlib inline

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

torch.manual_seed(42)
np.random.seed(42)

In [None]:
#@title 🎧 Listen: Building Intuition
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/02_building_intuition.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

## 2. Building Intuition

### Why a Linear Controller?

Think about a CEO running a company. The CEO does not personally analyze every sales report, every customer email, every financial statement. Instead, they have teams that digest this information and present summarized reports. The CEO then makes decisions based on these already-processed summaries.

The Controller is the CEO. The VAE and MDN-RNN are the teams. The latent code $z_t$ summarizes what the agent currently sees, and the LSTM hidden state $h_t$ summarizes everything the agent has experienced. Given these two compressed inputs, the Controller just needs to pick an action, and a simple linear mapping is enough.

### Why Evolution Instead of Backpropagation?

There are two reasons the original paper uses CMA-ES instead of gradient descent:

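1. **The reward signal is sparse and non-differentiable.** The Controller is optimized for total episode reward, which depends on the entire trajectory and is only known once a rollout ends. The environment itself provides no gradients, and backpropagating through hundreds of time steps of dream rollouts is difficult, so an optimizer that needs only the final score is a natural fit.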

2. **The Controller has very few parameters.** CMA-ES scales well to problems with up to a few thousand parameters. With only 867 parameters, it is a perfect fit.
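
To make the first point concrete, here is a minimal sketch on a made-up one-dimensional problem. The `reward` function, its target at `x = 3`, and every hyperparameter are assumptions chosen for illustration, not values from the paper. The reward is piecewise constant, so a finite-difference gradient is zero almost everywhere, yet a search that only ranks candidates still finds the high-reward region.

```python
import math
import random

def reward(x):
    """Piecewise-constant reward: 0 within distance 1 of x = 3, more negative further away."""
    return -math.floor(abs(x - 3.0))

def finite_difference_grad(f, x, eps=1e-4):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def evolve_1d(f, mean=-5.5, sigma=1.0, pop_size=20, elite_size=5, generations=30, seed=0):
    """Rank-based search: sample around the mean, keep the elite, move the mean to their average."""
    rng = random.Random(seed)
    for _ in range(generations):
        population = [mean + sigma * rng.gauss(0.0, 1.0) for _ in range(pop_size)]
        population.sort(key=f, reverse=True)      # best candidates first
        elite = population[:elite_size]
        mean = sum(elite) / elite_size
    return mean

start = -5.5
grad_at_start = finite_difference_grad(reward, start)   # flat plateau, so no gradient signal
final_mean = evolve_1d(reward, mean=start)

print(f"Finite-difference gradient at x = {start}: {grad_at_start}")
print(f"Reward at start: {reward(start)}, reward after evolution: {reward(final_mean)}")
```

Gradient descent has nothing to follow on the plateau, while the rank-based update only needs to know which candidates scored better than others.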

### 🤔 Think About This

If you had to choose between (a) a deep neural network with 1 million parameters trained with gradient descent, or (b) a single linear layer with 867 parameters trained with evolution, which would you guess performs better for a car racing game?

Surprisingly, (b) can win. The World Models paper reports that this kind of linear controller, sitting on top of V and M, reached an average score of roughly 906 on CarRacing-v0, above the 900 usually taken as solving the task. The key insight is that all the intelligence is in the representation, not the decision function. A good representation makes the decision trivial.

In [None]:
#@title 🎧 Listen: Mathematics
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/03_mathematics.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

## 3. The Mathematics

### The Controller Equation

The controller is a single linear layer:

$$a_t = \tanh(W_c \cdot [z_t ; h_t] + b_c)$$

This equation says: concatenate the latent state $z_t$ (32 dims) and the LSTM hidden state $h_t$ (256 dims) into a single 288-dimensional vector. Multiply by a weight matrix $W_c$ and add a bias $b_c$. Apply tanh to squash outputs to $[-1, 1]$.

Computationally: for CarRacing with 3 action dimensions (steering, gas, brake):
- $W_c$ has shape $(3, 288)$ → 864 parameters
- $b_c$ has shape $(3,)$ → 3 parameters
- **Total: 867 parameters**
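
The equation and the parameter count can be checked with a few lines of plain Python. This is a sketch only: the random weights and inputs are placeholders (a real $z_t$ and $h_t$ would come from the VAE and the MDN-RNN), and the helper name `controller_forward` is made up for this example.

```python
import math
import random

def controller_forward(W, b, z, h):
    """a = tanh(W @ [z; h] + b) for a weight matrix W given as a list of rows."""
    x = z + h                                    # list concatenation: [z_t ; h_t]
    return [math.tanh(sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]

latent_dim, hidden_dim, action_dim = 32, 256, 3
input_dim = latent_dim + hidden_dim              # 288

rng = random.Random(0)
W = [[rng.gauss(0.0, 0.1) for _ in range(input_dim)] for _ in range(action_dim)]
b = [0.0] * action_dim
z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]    # placeholder latent code
h = [rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]    # placeholder hidden state

action = controller_forward(W, b, z, h)
num_params = action_dim * input_dim + action_dim

print(f"Parameters: {action_dim} x {input_dim} + {action_dim} = {num_params}")  # 867
print(f"Action: {[round(a, 3) for a in action]}")
```

Each of the three outputs lands in $[-1, 1]$ because of the tanh, regardless of how large the weighted sum is.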

### CMA-ES: Evolution in Parameter Space

CMA-ES maintains a multivariate Gaussian distribution over controller parameters:

$$\theta \sim \mathcal{N}(m, \sigma^2 C)$$

Where:
- $m$ is the **mean** of the distribution (the current "best guess" for the optimal parameters)
- $\sigma$ is the **step size** (how far to explore)
- $C$ is the **covariance matrix** (the shape of the search distribution)

Each generation:
1. **Sample** $\lambda$ candidate solutions from $\mathcal{N}(m, \sigma^2 C)$
2. **Evaluate** each candidate (run it in the environment, measure total reward)
3. **Rank** candidates by reward
4. **Update** $m$ toward the best candidates (weighted mean of top performers)
5. **Update** $\sigma$ and $C$ to adapt the search distribution

Computationally: think of it as "natural selection for neural network weights." Each candidate is a complete set of Controller parameters. The fittest survive and shape the next generation.
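
To see what the covariance matrix $C$ contributes, the sketch below draws candidates from $\mathcal{N}(m, \sigma^2 C)$ in two dimensions using a Cholesky factor, then checks that the empirical covariance matches $\sigma^2 C$. The particular values of `m`, `sigma`, and `C` are arbitrary choices for illustration, and the sketch covers only the sampling step, not how CMA-ES adapts $C$.

```python
import math
import random

def cholesky_2x2(C):
    """Lower-triangular L with L @ L.T == C for a symmetric positive-definite 2x2 matrix."""
    l11 = math.sqrt(C[0][0])
    l21 = C[1][0] / l11
    l22 = math.sqrt(C[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def sample_candidates(m, sigma, C, n, rng):
    """Draw n candidates theta = m + sigma * L @ eps, with eps ~ N(0, I)."""
    L = cholesky_2x2(C)
    candidates = []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        candidates.append((m[0] + sigma * (L[0][0] * e1),
                           m[1] + sigma * (L[1][0] * e1 + L[1][1] * e2)))
    return candidates

def sample_covariance(points):
    """Empirical 2x2 covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    cyy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[cxx, cxy], [cxy, cyy]]

rng = random.Random(0)
m, sigma = [1.0, -2.0], 0.5
C = [[1.0, 0.8], [0.8, 1.0]]                     # correlated search directions
thetas = sample_candidates(m, sigma, C, 20000, rng)
empirical = sample_covariance(thetas)

print("Target sigma^2 * C:  ", [[sigma ** 2 * c for c in row] for row in C])
print("Empirical covariance:", [[round(c, 3) for c in row] for row in empirical])
```

With $C = I$ the candidates form a round cloud. Off-diagonal entries stretch the cloud along directions where parameters should change together, which is what covariance adaptation learns over generations.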

In [None]:
#@title 🎧 Listen: Controller Code
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/04_controller_code.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

## 4. Let's Build It: Component by Component

### 4.1 The Controller

In [None]:
class Controller(nn.Module):
    """
    Linear controller: takes concatenated [z_t, h_t] and outputs an action.
    This is the simplest possible decision-maker.
    """
    def __init__(self, input_dim, action_dim):
        super().__init__()
        self.fc = nn.Linear(input_dim, action_dim)

    def forward(self, z, h):
        """
        Args:
            z: latent state, shape (latent_dim,) or (batch, latent_dim)
            h: hidden state, shape (hidden_dim,) or (batch, hidden_dim)
        Returns:
            action: shape (action_dim,) or (batch, action_dim), values in [-1, 1]
        """
        x = torch.cat([z, h], dim=-1)
        return torch.tanh(self.fc(x))

    def get_num_params(self):
        return sum(p.numel() for p in self.parameters())

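    def set_params(self, flat_params):
        """Set all parameters from a flat numpy array (weights first, then bias)."""
        idx = 0
        with torch.no_grad():
            for p in self.parameters():
                n = p.numel()
                # Copy in place so each parameter keeps its dtype and device
                chunk = torch.as_tensor(flat_params[idx:idx+n], dtype=p.dtype, device=p.device)
                p.copy_(chunk.reshape(p.shape))
                idx += n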

    def get_params(self):
        """Get all parameters as a flat numpy array."""
        return np.concatenate([p.data.cpu().numpy().flatten()
                              for p in self.parameters()])

# Example: World Models scale (32 + 256 = 288 input, 3 actions)
controller = Controller(input_dim=288, action_dim=3)
print(f"Controller parameters: {controller.get_num_params()}")
print(f"  Weight shape: {controller.fc.weight.shape}")
print(f"  Bias shape: {controller.fc.bias.shape}")

In [None]:
#@title 🎧 Listen: Controller Viz
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/05_controller_viz.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

In [None]:
# 📊 Visualize the controller's simplicity
fig, ax = plt.subplots(figsize=(14, 3))
ax.set_xlim(0, 14)
ax.set_ylim(0, 3)
ax.axis('off')

# z_t box
rect = plt.Rectangle((0.5, 1.0), 2, 1, facecolor='#2196F3', alpha=0.7, edgecolor='black')
ax.add_patch(rect)
ax.text(1.5, 1.5, 'z_t\n(32 dim)', ha='center', va='center', fontsize=10, color='white', fontweight='bold')

# h_t box
rect = plt.Rectangle((0.5, 0), 2, 0.8, facecolor='#FF9800', alpha=0.7, edgecolor='black')
ax.add_patch(rect)
ax.text(1.5, 0.4, 'h_t (256 dim)', ha='center', va='center', fontsize=9, color='white', fontweight='bold')

# Concat
ax.annotate('', xy=(4.0, 1.2), xytext=(2.7, 1.5),
            arrowprops=dict(arrowstyle='->', lw=1.5))
ax.annotate('', xy=(4.0, 1.0), xytext=(2.7, 0.4),
            arrowprops=dict(arrowstyle='->', lw=1.5))

# Concat box
rect = plt.Rectangle((4, 0.5), 2.5, 1.2, facecolor='#9C27B0', alpha=0.7, edgecolor='black')
ax.add_patch(rect)
ax.text(5.25, 1.1, 'Concatenate\n(288 dim)', ha='center', va='center', fontsize=10, color='white', fontweight='bold')

# Arrow to linear
ax.annotate('', xy=(7.5, 1.1), xytext=(6.7, 1.1),
            arrowprops=dict(arrowstyle='->', lw=2))

# Linear box
rect = plt.Rectangle((7.5, 0.6), 2.5, 1, facecolor='#4CAF50', alpha=0.7, edgecolor='black')
ax.add_patch(rect)
ax.text(8.75, 1.1, 'Linear + tanh\n867 params', ha='center', va='center', fontsize=10, color='white', fontweight='bold')

# Arrow to action
ax.annotate('', xy=(11, 1.1), xytext=(10.2, 1.1),
            arrowprops=dict(arrowstyle='->', lw=2))

# Action box
rect = plt.Rectangle((11, 0.7), 2, 0.8, facecolor='#E91E63', alpha=0.7, edgecolor='black')
ax.add_patch(rect)
ax.text(12, 1.1, 'Action a_t\n(3 dim)', ha='center', va='center', fontsize=10, color='white', fontweight='bold')

plt.title('The Controller: Just One Linear Layer!', fontsize=14, pad=10)
plt.tight_layout()
plt.show()

In [None]:
#@title 🎧 Listen: CMA-ES Implementation
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/06_cmaes_implementation.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

### 4.2 CMA-ES from Scratch

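Now let us implement the evolutionary strategy. We will build a simplified evolution strategy in the spirit of CMA-ES: it keeps the sample, rank, and update loop with a rank-weighted mean, but it holds the step size $\sigma$ fixed and uses an isotropic covariance ($C = I$) instead of adapting either one.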

In [None]:
class SimpleCMAES:
    """
    Simplified CMA-ES (Covariance Matrix Adaptation Evolution Strategy).

    This is a derivative-free optimizer inspired by natural selection:
    1. Sample a population of candidate solutions
    2. Evaluate each on the task
    3. Update the search distribution toward the best candidates
    """

    def __init__(self, num_params, population_size=64, sigma_init=0.5, elite_ratio=0.25):
        """
        Args:
            num_params: dimensionality of the search space
            population_size: number of candidates per generation
            sigma_init: initial step size (exploration radius)
            elite_ratio: fraction of top performers to keep
        """
        self.num_params = num_params
        self.pop_size = population_size
        self.sigma = sigma_init
        self.elite_size = max(1, int(population_size * elite_ratio))

        # The mean of the search distribution (our "best guess")
        self.mean = np.zeros(num_params)

        # Track history
        self.best_rewards = []
        self.mean_rewards = []

    def sample_population(self):
        """Sample a population of candidate parameter vectors."""
        noise = np.random.randn(self.pop_size, self.num_params)
        population = self.mean + self.sigma * noise
        return population

    def update(self, population, rewards):
        """
        Update the search distribution based on rewards.

        Args:
            population: array of shape (pop_size, num_params)
            rewards: array of shape (pop_size,), higher is better
        """
        # Rank by reward (descending)
        ranked_idx = np.argsort(rewards)[::-1]
        elite_idx = ranked_idx[:self.elite_size]

        # Update mean toward the elite (weighted by rank)
        weights = np.log(self.elite_size + 0.5) - np.log(np.arange(1, self.elite_size + 1))
        weights = weights / weights.sum()

        self.mean = np.sum(
            weights[:, np.newaxis] * population[elite_idx], axis=0
        )

        # Record history (sigma and the covariance are not adapted in this simplified version)
        self.best_rewards.append(rewards[ranked_idx[0]])
        self.mean_rewards.append(np.mean(rewards))

    def get_best(self):
        """Return the current best estimate (the mean)."""
        return self.mean.copy()

# Test
cmaes = SimpleCMAES(num_params=10, population_size=20)
pop = cmaes.sample_population()
print(f"Population shape: {pop.shape}")
print(f"Each individual is a {pop.shape[1]}-dimensional parameter vector")
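
The rank weights used in `SimpleCMAES.update` are easier to judge once printed. The sketch below recomputes them in plain Python for an elite of 8, which is what a population of 32 gives with the default `elite_ratio=0.25`. The helper name `rank_weights` is made up for this example.

```python
import math

def rank_weights(elite_size):
    """Log-rank weights: w_i proportional to log(elite_size + 0.5) - log(i), i = 1..elite_size."""
    raw = [math.log(elite_size + 0.5) - math.log(rank) for rank in range(1, elite_size + 1)]
    total = sum(raw)
    return [w / total for w in raw]

weights = rank_weights(8)   # elite size for a population of 32 with elite_ratio=0.25
for rank, w in enumerate(weights, start=1):
    print(f"rank {rank}: weight {w:.3f}")
print(f"sum of weights: {sum(weights):.3f}")
```

The best candidate pulls the new mean hardest and the last elite member barely counts, so the update depends only on the ordering of rewards and not on their scale.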

In [None]:
#@title 🎧 Listen: Todo1 Fitness
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/07_todo1_fitness.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

### 4.3 The Fitness Evaluation

In the full World Model, controllers can be evaluated inside the *dream*. For now, let us use a real environment, CartPole, to demonstrate the evolution process. CartPole is a classic control task where you balance a pole on a cart.

## 5. 🔧 Your Turn: Implement the Fitness Evaluation

The fitness function runs the controller in the environment and measures how well it performs.
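
Before writing the CartPole version, it may help to see the same pattern on a toy problem. The sketch below is not the solution to the exercise: `ToyBalanceEnv` is a hypothetical one-dimensional environment invented for this example, and it only imitates the `reset()` and `step()` return shapes of Gymnasium. The policy is a single weight and bias passed through tanh and thresholded at zero.

```python
import math
import random

class ToyBalanceEnv:
    """Hypothetical 1-D task: keep a drifting point inside [-1, 1]. Not a Gymnasium environment."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.x = 0.0

    def reset(self):
        self.x = self.rng.uniform(-0.5, 0.5)
        return [self.x], {}

    def step(self, action):
        push = 0.1 if action == 1 else -0.1      # action 1 pushes right, 0 pushes left
        self.x += push + self.rng.gauss(0.0, 0.02)
        terminated = abs(self.x) > 1.0           # episode ends when the point leaves the band
        return [self.x], 1.0, terminated, False, {}

def evaluate(weight, bias, n_episodes=3, max_steps=200, seed=0):
    """Average total reward of the policy: action = 1 if tanh(weight * x + bias) > 0 else 0."""
    env = ToyBalanceEnv(seed)
    totals = []
    for episode in range(n_episodes):
        obs, info = env.reset()
        total = 0.0
        for step in range(max_steps):
            action = 1 if math.tanh(weight * obs[0] + bias) > 0 else 0
            obs, reward, terminated, truncated, info = env.step(action)
            total += reward
            if terminated or truncated:
                break
        totals.append(total)
    return sum(totals) / len(totals)

good = evaluate(weight=-1.0, bias=0.0)   # pushes back toward the centre
bad = evaluate(weight=+1.0, bias=0.0)    # pushes away from the centre
print(f"restoring policy: {good:.1f}, destabilising policy: {bad:.1f}")
```

Averaging over several episodes matters because a single lucky start can make a poor candidate look good, and the optimizer only ever sees this one number per candidate.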

In [None]:
def evaluate_controller(controller, env_name='CartPole-v1', n_episodes=3, max_steps=500):
    """
    Evaluate a controller on an environment.
    Returns the average total reward across episodes.

    Args:
        controller: Controller module with forward(z, h) -> action
        env_name: Gymnasium environment name
        n_episodes: Number of episodes to average over
        max_steps: Maximum steps per episode

    Returns:
        Average total reward across episodes
    """
    env = gym.make(env_name)
    total_rewards = []

    # ============ TODO ============
    # For each episode:
    #   Step 1: Reset the environment: obs, _ = env.reset()
    #   Step 2: Loop for max_steps:
    #     a) Convert obs to a tensor
    #     b) Use obs as z, and zeros as h (dummy hidden state):
    #        z = obs_tensor
    #        h = torch.zeros(controller.fc.in_features - len(obs))
    #     c) Get action from controller (with torch.no_grad()):
    #        action_continuous = controller(z, h)
    #     d) CartPole has discrete actions, so convert:
    #        action = 1 if action_continuous[0].item() > 0 else 0
    #     e) Step the environment: obs, reward, terminated, truncated, _ = env.step(action)
    #     f) Accumulate reward
    #     g) Break if terminated or truncated
    #   Step 3: Append episode reward to total_rewards
    # ==============================

    ???  # YOUR CODE HERE

    env.close()
    return np.mean(total_rewards)

In [None]:
#@title 🎧 Listen: Todo1 Followup
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/08_todo1_followup.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

In [None]:
# ✅ Verification
test_controller = Controller(input_dim=4 + 12, action_dim=1)  # 4 obs + 12 dummy hidden
reward = evaluate_controller(test_controller)
assert isinstance(reward, (int, float, np.floating)), "❌ Should return a number"
assert reward >= 0, "❌ Reward should be non-negative"
print("✅ evaluate_controller works!")
print(f"Random controller reward: {reward:.1f}")
print("(CartPole max is 500; a random controller gets around 20-50)")

In [None]:
#@title 🎧 Listen: Todo2 Evolution Loop
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/09_todo2_evolution_loop.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

## 6. 🔧 Your Turn: Implement the Evolution Loop

Now it is your turn. Put the pieces together: sample controllers, evaluate them, update the distribution.
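
As a warm-up, here is the same loop on a problem where the answer is known in advance. This is a dependency-free sketch, not the exercise solution: `toy_fitness` and its `target` are hypothetical stand-ins for an episode reward, and the update mirrors the rank-weighted mean from `SimpleCMAES` with a fixed step size.

```python
import math
import random

def toy_fitness(params, target=(0.5, -1.0, 2.0)):
    """Hypothetical stand-in for an episode reward: higher is better, maximum 0 at `target`."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def evolve(fitness, num_params, n_generations=40, pop_size=32, elite_size=8, sigma=0.5, seed=0):
    rng = random.Random(seed)
    mean = [0.0] * num_params
    # Log-rank weights, normalised to sum to 1 (same form as SimpleCMAES.update)
    raw = [math.log(elite_size + 0.5) - math.log(r) for r in range(1, elite_size + 1)]
    total = sum(raw)
    weights = [w / total for w in raw]
    history = []
    for _ in range(n_generations):
        # (a) sample a population around the current mean
        population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(pop_size)]
        # (b) evaluate every individual and rank them, best first
        ranked = sorted(population, key=fitness, reverse=True)
        # (c) move the mean to the weighted average of the elite
        elite = ranked[:elite_size]
        mean = [sum(w * ind[d] for w, ind in zip(weights, elite)) for d in range(num_params)]
        history.append(fitness(ranked[0]))
    return mean, history

best, history = evolve(toy_fitness, num_params=3)
print(f"fitness of initial mean: {toy_fitness([0.0, 0.0, 0.0]):.3f}")
print(f"fitness of evolved mean: {toy_fitness(best):.3f}")
```

In the exercise, step (b) is where each parameter vector is loaded into a `Controller` and scored with `evaluate_controller`; everything else keeps the same shape.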

In [None]:
def evolve_controllers(input_dim, action_dim, n_generations=50, pop_size=32):
    """
    Evolve controllers using CMA-ES on CartPole.

    Args:
        input_dim: input dimension for the controller
        action_dim: action dimension
        n_generations: number of evolutionary generations
        pop_size: population size

    Returns:
        best_controller: the evolved controller
        cmaes: the CMA-ES instance (for plotting history)
    """
    # Create a template controller to know the number of parameters
    template = Controller(input_dim, action_dim)
    num_params = template.get_num_params()
    print(f"Evolving {num_params} parameters over {n_generations} generations")
    print(f"Population size: {pop_size}")

    # ============ TODO ============
    # Step 1: Initialize CMA-ES with the right num_params and pop_size
    #         cmaes = SimpleCMAES(num_params=..., population_size=..., sigma_init=0.5)
    #
    # Step 2: For each generation:
    #   a) Sample a population: pop = cmaes.sample_population()
    #   b) Evaluate each individual:
    #      - Create a Controller(input_dim, action_dim)
    #      - Set its parameters: controller.set_params(pop[i])
    #      - Evaluate: reward = evaluate_controller(controller)
    #   c) Update CMA-ES: cmaes.update(pop, rewards)
    #   d) Print progress every 10 generations
    #
    # Step 3: After evolution, create the best controller using cmaes.get_best()
    # ==============================

    cmaes = ???     # YOUR CODE HERE

    for gen in range(n_generations):
        ???  # YOUR CODE HERE

    # Create and return the best controller
    best_controller = Controller(input_dim, action_dim)
    best_controller.set_params(cmaes.get_best())

    return best_controller, cmaes

In [None]:
#@title 🎧 Listen: Evolution Verification
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/10_evolution_verification.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

In [None]:
# ✅ Verification: Run the evolution (this takes ~2 minutes)
print("Starting evolution on CartPole...")
print("=" * 50)

best_controller, cmaes = evolve_controllers(
    input_dim=4 + 12,  # 4 obs dims + 12 dummy hidden dims
    action_dim=1,
    n_generations=50,
    pop_size=32
)

# Test the evolved controller
final_reward = evaluate_controller(best_controller, n_episodes=10)
print(f"\n{'=' * 50}")
print(f"Final evolved controller reward: {final_reward:.1f} / 500")
assert final_reward > 100, f"❌ Controller should achieve > 100 reward, got {final_reward:.1f}"
print("✅ Controller evolved successfully!")

if final_reward > 400:
    print("🎉 Excellent! Near-optimal controller!")
elif final_reward > 200:
    print("👍 Good! Try more generations for better results.")

In [None]:
#@title 🎧 Listen: Evolution Viz
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/11_evolution_viz.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

## 7. Visualizing the Evolution

In [None]:
# 📊 Reward curves over generations
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 5))

gens = range(1, len(cmaes.best_rewards) + 1)

ax1.plot(gens, cmaes.best_rewards, 'b-', linewidth=2, label='Best in generation')
ax1.plot(gens, cmaes.mean_rewards, 'r--', linewidth=2, alpha=0.7, label='Mean of generation')
ax1.axhline(y=500, color='green', linestyle=':', alpha=0.5, label='Max possible (500)')
ax1.set_xlabel('Generation', fontsize=12)
ax1.set_ylabel('Reward', fontsize=12)
ax1.set_title('Evolution Progress', fontsize=14)
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# Improvement rate
improvements = [cmaes.best_rewards[i] - cmaes.best_rewards[max(0, i-1)]
                for i in range(len(cmaes.best_rewards))]
ax2.bar(gens, improvements, color=['green' if x > 0 else 'red' for x in improvements], alpha=0.7)
ax2.set_xlabel('Generation', fontsize=12)
ax2.set_ylabel('Reward Improvement', fontsize=12)
ax2.set_title('Per-Generation Improvement', fontsize=14)
ax2.axhline(y=0, color='black', linewidth=0.5)
ax2.grid(True, alpha=0.3)

plt.suptitle('CMA-ES Evolution: From Random to Skilled', fontsize=15, y=1.02)
plt.tight_layout()
plt.show()

In [None]:
#@title 🎧 Listen: Watching Controller Play
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/12_watching_controller_play.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

### Watching the Evolved Controller Play

In [None]:
# 📊 Visualize a few episodes
def visualize_episode(controller, env_name='CartPole-v1', max_steps=500):
    """Record an episode and plot the observations over time."""
    env = gym.make(env_name)
    obs, _ = env.reset()

    positions = []
    angles = []
    actions_taken = []
    rewards = []

    total_reward = 0
    for step in range(max_steps):
        obs_tensor = torch.tensor(obs, dtype=torch.float32)
        z = obs_tensor
        h = torch.zeros(controller.fc.in_features - len(obs))

        with torch.no_grad():
            action_continuous = controller(z, h)
        action = 1 if action_continuous[0].item() > 0 else 0

        positions.append(obs[0])
        angles.append(obs[2])
        actions_taken.append(action)

        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        rewards.append(total_reward)

        if terminated or truncated:
            break

    env.close()
    return positions, angles, actions_taken, rewards

fig, axes = plt.subplots(2, 2, figsize=(14, 8))

# Run 4 episodes
for i in range(4):
    pos, ang, acts, rews = visualize_episode(best_controller)

    row, col = i // 2, i % 2
    ax = axes[row, col]
    ax2 = ax.twinx()

    ax.plot(pos, 'b-', alpha=0.7, label='Cart Position')
    ax.plot(ang, 'r-', alpha=0.7, label='Pole Angle')
    ax2.step(range(len(acts)), acts, 'g-', alpha=0.3, label='Action')

    ax.set_xlabel('Step', fontsize=10)
    ax.set_ylabel('Position / Angle', fontsize=10, color='blue')
    ax2.set_ylabel('Action', fontsize=10, color='green')
    ax.set_title(f'Episode {i+1}: {rews[-1]:.0f} reward', fontsize=12)
    ax.legend(loc='upper left', fontsize=9)
    ax.grid(True, alpha=0.3)

plt.suptitle('Evolved Controller Balancing CartPole', fontsize=14)
plt.tight_layout()
plt.show()

In [None]:
#@title 🎧 Listen: Comparison
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/13_comparison.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

## 8. Comparing Evolution vs Random

In [None]:
# 📊 Compare evolved vs random controller performance
random_rewards = []
evolved_rewards = []

for _ in range(20):
    # Random controller
    rand_ctrl = Controller(input_dim=4 + 12, action_dim=1)
    random_rewards.append(evaluate_controller(rand_ctrl, n_episodes=1))

    # Evolved controller
    evolved_rewards.append(evaluate_controller(best_controller, n_episodes=1))

fig, ax = plt.subplots(figsize=(10, 5))
positions = [1, 2]
bp = ax.boxplot([random_rewards, evolved_rewards], positions=positions,
                widths=0.5, patch_artist=True)

bp['boxes'][0].set_facecolor('#FF6B6B')
bp['boxes'][1].set_facecolor('#4CAF50')

ax.set_xticks(positions)
ax.set_xticklabels(['Random Controller', 'Evolved Controller'], fontsize=12)
ax.set_ylabel('Episode Reward', fontsize=12)
ax.set_title('Random vs Evolved Controller (20 episodes each)', fontsize=14)
ax.axhline(y=500, color='gold', linestyle='--', alpha=0.5, label='Maximum possible')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

print(f"Random controller:  {np.mean(random_rewards):.1f} ± {np.std(random_rewards):.1f}")
print(f"Evolved controller: {np.mean(evolved_rewards):.1f} ± {np.std(evolved_rewards):.1f}")

In [None]:
#@title 🎧 Listen: Final Animation
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/14_final_animation.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

## 9. 🎯 Final Output: The Full CMA-ES Evolution Animation

In [None]:
# 📊 A comprehensive view: parameter distribution evolving over time
# We will re-run a quick evolution and track parameter distributions

mini_controller = Controller(input_dim=4 + 12, action_dim=1)
num_params = mini_controller.get_num_params()

cmaes_viz = SimpleCMAES(num_params=num_params, population_size=40, sigma_init=1.0)
param_history = []  # Track population parameters at each generation
reward_history = []

for gen in range(30):
    pop = cmaes_viz.sample_population()
    param_history.append(pop.copy())

    rewards = np.zeros(len(pop))
    for i in range(len(pop)):
        ctrl = Controller(input_dim=4 + 12, action_dim=1)
        ctrl.set_params(pop[i])
        rewards[i] = evaluate_controller(ctrl, n_episodes=1)

    reward_history.append(rewards.copy())
    cmaes_viz.update(pop, rewards)

# Visualize: where the population sits in parameter space at selected generations
fig, axes = plt.subplots(2, 3, figsize=(18, 10))

snapshot_gens = [0, 5, 10, 15, 20, 29]
for ax, gen in zip(axes.flatten(), snapshot_gens):
    # Plot first 2 parameters of each individual
    pop = param_history[gen]
    rewards = reward_history[gen]

    scatter = ax.scatter(pop[:, 0], pop[:, 1], c=rewards, cmap='RdYlGn',
                        s=50, alpha=0.7, edgecolors='black', linewidths=0.5)
    ax.set_title(f'Generation {gen+1}\nBest reward: {rewards.max():.0f}', fontsize=11)
    ax.set_xlabel('Parameter 1', fontsize=10)
    ax.set_ylabel('Parameter 2', fontsize=10)
    ax.grid(True, alpha=0.3)
    plt.colorbar(scatter, ax=ax, label='Reward')

plt.suptitle('CMA-ES Evolution: Population Mean Moves Toward High-Reward Regions',
             fontsize=15, y=1.02)
plt.tight_layout()
plt.show()

print("🎉 Watch how the population moves! The mean drifts toward high-reward regions")
print("   while the spread stays roughly constant (sigma is fixed in this simplified version).")
print("   This is natural selection for neural network weights!")

In [None]:
#@title 🎧 Listen: Closing
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/15_closing.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")

## 10. Reflection and Next Steps

### 🤔 Reflection Questions
1. Why can a linear controller with 867 parameters outperform deep networks with millions of parameters? What is doing the "heavy lifting"?
2. What happens to CMA-ES if the number of parameters is 1 million instead of 867? Why does this matter for the World Model design?
3. In the original paper, the Controller is evaluated inside the *dream* (the learned world model), not the real environment. What are the advantages and risks of this?

### 🏆 Optional Challenges
1. **Sigma adaptation**: Implement adaptive step size ‚Äî start with large sigma and decay it as the population converges.
2. **Full covariance**: Our CMA-ES uses a fixed isotropic covariance ($C = I$, so every parameter is perturbed independently with the same variance). Implement full covariance matrix adaptation for better search.
3. **LunarLander**: Try evolving a controller on LunarLander (8-dim observation, 4 discrete actions; registered as `LunarLander-v2` in older Gymnasium releases and `LunarLander-v3` from Gymnasium 1.0). Does it work?
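
One possible starting point for the first challenge is a plain multiplicative decay of the step size. The schedule below (`decay`, `sigma_min`, and the helper `population_spread`) is a hypothetical, untuned sketch. It is a simple heuristic, not the cumulative step-size adaptation that full CMA-ES uses.

```python
import random
import statistics

def population_spread(sigma, pop_size=200, seed=0):
    """Standard deviation of one coordinate of a population sampled as mean + sigma * noise."""
    rng = random.Random(seed)
    samples = [sigma * rng.gauss(0.0, 1.0) for _ in range(pop_size)]
    return statistics.stdev(samples)

sigma, decay, sigma_min = 1.0, 0.9, 0.05         # hypothetical schedule, not tuned
spreads = []
for generation in range(30):
    spreads.append(population_spread(sigma, seed=generation))
    sigma = max(sigma_min, sigma * decay)        # shrink the step size, but never below sigma_min

print(f"spread at generation 1:  {spreads[0]:.3f}")
print(f"spread at generation 30: {spreads[-1]:.3f}")
```

A fixed schedule shrinks the search whether or not progress is being made, so a natural next step is to tie the decay to how much the best reward improved over recent generations.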

### What is Next?

We have all three components: V (VAE), M (MDN-RNN), and C (Controller + CMA-ES). In the final notebook, we will wire them all together into a complete World Model pipeline: collect data, train V and M, then evolve C inside the learned dream. The agent will learn to drive by practicing in its own imagination!

In [None]:
#@title 💬 AI Teaching Assistant: Click ▶ to start
#@markdown This AI chatbot reads your notebook and can answer questions about any concept, code, or exercise.

import json as _json
import requests as _requests
from google.colab import output as _output
from IPython.display import display, HTML as _HTML, Markdown as _Markdown

# --- Read notebook content for context ---
def _get_notebook_context():
    try:
        from google.colab import _message
        nb = _message.blocking_request("get_ipynb", request="", timeout_sec=10)
        cells = nb.get("ipynb", {}).get("cells", [])
        parts = []
        for cell in cells:
            src = "".join(cell.get("source", []))
            tags = cell.get("metadata", {}).get("tags", [])
            if "chatbot" in tags:
                continue
            if src.strip():
                ct = cell.get("cell_type", "unknown")
                parts.append(f"[{ct.upper()}]\n{src}")
        return "\n\n---\n\n".join(parts)
    except Exception:
        return "Notebook content unavailable."

_NOTEBOOK_CONTEXT = _get_notebook_context()
_CHAT_HISTORY = []
_API_URL = "https://course-creator-brown.vercel.app/api/chat"

def _notebook_chat(question):
    global _CHAT_HISTORY
    try:
        resp = _requests.post(_API_URL, json={
            'question': question,
            'context': _NOTEBOOK_CONTEXT[:100000],
            'history': _CHAT_HISTORY[-10:],
        }, timeout=60)
        data = resp.json()
        answer = data.get('answer', 'Sorry, I could not generate a response.')
        _CHAT_HISTORY.append({'role': 'user', 'content': question})
        _CHAT_HISTORY.append({'role': 'assistant', 'content': answer})
        return answer
    except Exception as e:
        return f'Error connecting to teaching assistant: {str(e)}'

_output.register_callback('notebook_chat', _notebook_chat)

def ask(question):
    """Ask the AI teaching assistant a question about this notebook."""
    answer = _notebook_chat(question)
    display(_Markdown(answer))

print("\u2705 AI Teaching Assistant is ready!")
print("\U0001f4a1 Use the chat below, or call ask(\'your question\') in any cell.")

# --- Display chat widget ---
display(_HTML('''<style>
  .vc-wrap{font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;max-width:100%;border-radius:16px;overflow:hidden;box-shadow:0 4px 24px rgba(0,0,0,.12);background:#fff;border:1px solid #e5e7eb}
  .vc-hdr{background:linear-gradient(135deg,#667eea 0%,#764ba2 100%);color:#fff;padding:16px 20px;display:flex;align-items:center;gap:12px}
  .vc-avatar{width:42px;height:42px;background:rgba(255,255,255,.2);border-radius:50%;display:flex;align-items:center;justify-content:center;font-size:22px}
  .vc-hdr h3{font-size:16px;font-weight:600;margin:0}
  .vc-hdr p{font-size:12px;opacity:.85;margin:2px 0 0}
  .vc-msgs{height:420px;overflow-y:auto;padding:16px;background:#f8f9fb;display:flex;flex-direction:column;gap:10px}
  .vc-msg{display:flex;flex-direction:column;animation:vc-fade .25s ease}
  .vc-msg.user{align-items:flex-end}
  .vc-msg.bot{align-items:flex-start}
  .vc-bbl{max-width:85%;padding:10px 14px;border-radius:16px;font-size:14px;line-height:1.55;word-wrap:break-word}
  .vc-msg.user .vc-bbl{background:linear-gradient(135deg,#667eea 0%,#764ba2 100%);color:#fff;border-bottom-right-radius:4px}
  .vc-msg.bot .vc-bbl{background:#fff;color:#1a1a2e;border:1px solid #e8e8e8;border-bottom-left-radius:4px}
  .vc-bbl code{background:rgba(0,0,0,.07);padding:2px 6px;border-radius:4px;font-size:13px;font-family:'Fira Code',monospace}
  .vc-bbl pre{background:#1e1e2e;color:#cdd6f4;padding:12px;border-radius:8px;overflow-x:auto;margin:8px 0;font-size:13px}
  .vc-bbl pre code{background:none;padding:0;color:inherit}
  .vc-bbl h3,.vc-bbl h4{margin:10px 0 4px;font-size:15px}
  .vc-bbl ul,.vc-bbl ol{margin:4px 0;padding-left:20px}
  .vc-bbl li{margin:2px 0}
  .vc-chips{display:flex;flex-wrap:wrap;gap:8px;padding:0 16px 12px;background:#f8f9fb}
  .vc-chip{background:#fff;border:1px solid #d1d5db;border-radius:20px;padding:6px 14px;font-size:12px;cursor:pointer;transition:all .15s;color:#4b5563}
  .vc-chip:hover{border-color:#667eea;color:#667eea;background:#f0f0ff}
  .vc-input{display:flex;padding:12px 16px;background:#fff;border-top:1px solid #eee;gap:8px}
  .vc-input input{flex:1;padding:10px 16px;border:2px solid #e8e8e8;border-radius:24px;font-size:14px;outline:none;transition:border-color .2s}
  .vc-input input:focus{border-color:#667eea}
  .vc-input button{background:linear-gradient(135deg,#667eea 0%,#764ba2 100%);color:#fff;border:none;border-radius:50%;width:42px;height:42px;cursor:pointer;display:flex;align-items:center;justify-content:center;font-size:18px;transition:transform .1s}
  .vc-input button:hover{transform:scale(1.05)}
  .vc-input button:disabled{opacity:.5;cursor:not-allowed;transform:none}
  .vc-typing{display:flex;gap:5px;padding:4px 0}
  .vc-typing span{width:8px;height:8px;background:#667eea;border-radius:50%;animation:vc-bounce 1.4s infinite ease-in-out}
  .vc-typing span:nth-child(2){animation-delay:.2s}
  .vc-typing span:nth-child(3){animation-delay:.4s}
  @keyframes vc-bounce{0%,80%,100%{transform:scale(0)}40%{transform:scale(1)}}
  @keyframes vc-fade{from{opacity:0;transform:translateY(8px)}to{opacity:1;transform:translateY(0)}}
  .vc-note{text-align:center;font-size:11px;color:#9ca3af;padding:8px 16px 12px;background:#fff}
</style>
<div class="vc-wrap">
  <div class="vc-hdr">
    <div class="vc-avatar">&#129302;</div>
    <div>
      <h3>Vizuara Teaching Assistant</h3>
      <p>Ask me anything about this notebook</p>
    </div>
  </div>
  <div class="vc-msgs" id="vcMsgs">
    <div class="vc-msg bot">
      <div class="vc-bbl">&#128075; Hi! I've read through this entire notebook. Ask me about any concept, code block, or exercise &mdash; I'm here to help you learn!</div>
    </div>
  </div>
  <div class="vc-chips" id="vcChips">
    <span class="vc-chip" onclick="vcAsk(this.textContent)">Explain the main concept</span>
    <span class="vc-chip" onclick="vcAsk(this.textContent)">Help with the TODO exercise</span>
    <span class="vc-chip" onclick="vcAsk(this.textContent)">Summarize what I learned</span>
  </div>
  <div class="vc-input">
    <input type="text" id="vcIn" placeholder="Ask about concepts, code, exercises..." />
    <button id="vcSend" onclick="vcSendMsg()">&#10148;</button>
  </div>
  <div class="vc-note">AI-generated &middot; Verify important information &middot; <a href="#" onclick="vcClear();return false" style="color:#667eea">Clear chat</a></div>
</div>
<script>
(function(){
  var msgs=document.getElementById('vcMsgs'),inp=document.getElementById('vcIn'),
      btn=document.getElementById('vcSend'),chips=document.getElementById('vcChips');

  function esc(s){var d=document.createElement('div');d.textContent=s;return d.innerHTML}

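  function md(t){
    // Escape HTML first so assistant output cannot inject markup; fenced code is then already escaped.
    return esc(t)
      .replace(/```(\w*)\n([\s\S]*?)```/g,function(_,l,c){return '<pre><code>'+c+'</code></pre>'})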
      .replace(/`([^`]+)`/g,'<code>$1</code>')
      .replace(/\*\*([^*]+)\*\*/g,'<strong>$1</strong>')
      .replace(/\*([^*]+)\*/g,'<em>$1</em>')
      .replace(/^#### (.+)$/gm,'<h4>$1</h4>')
      .replace(/^### (.+)$/gm,'<h4>$1</h4>')
      .replace(/^## (.+)$/gm,'<h3>$1</h3>')
      .replace(/^\d+\. (.+)$/gm,'<li>$1</li>')
      .replace(/^- (.+)$/gm,'<li>$1</li>')
      .replace(/\n\n/g,'<br><br>')
      .replace(/\n/g,'<br>');
  }

  function addMsg(text,isUser){
    var m=document.createElement('div');m.className='vc-msg '+(isUser?'user':'bot');
    var b=document.createElement('div');b.className='vc-bbl';
    b.innerHTML=isUser?esc(text):md(text);
    m.appendChild(b);msgs.appendChild(m);msgs.scrollTop=msgs.scrollHeight;
  }

  function showTyping(){
    var m=document.createElement('div');m.className='vc-msg bot';m.id='vcTyping';
    m.innerHTML='<div class="vc-bbl"><div class="vc-typing"><span></span><span></span><span></span></div></div>';
    msgs.appendChild(m);msgs.scrollTop=msgs.scrollHeight;
  }

  function hideTyping(){var e=document.getElementById('vcTyping');if(e)e.remove()}

  window.vcSendMsg=function(){
    var q=inp.value.trim();if(!q)return;
    inp.value='';chips.style.display='none';
    addMsg(q,true);showTyping();btn.disabled=true;
    google.colab.kernel.invokeFunction('notebook_chat',[q],{})
      .then(function(r){
        hideTyping();
        var a=r.data['application/json'];
        addMsg(typeof a==='string'?a:JSON.stringify(a),false);
      })
      .catch(function(){
        hideTyping();
        addMsg('Sorry, I encountered an error. Please check your internet connection and try again.',false);
      })
      .finally(function(){btn.disabled=false;inp.focus()});
  };

  window.vcAsk=function(q){inp.value=q;vcSendMsg()};
  window.vcClear=function(){
    msgs.innerHTML='<div class="vc-msg bot"><div class="vc-bbl">&#128075; Chat cleared. Ask me anything!</div></div>';
    chips.style.display='flex';
  };

  inp.addEventListener('keypress',function(e){if(e.key==='Enter')vcSendMsg()});
  inp.focus();
})();
</script>'''))
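
The context-building step in the cell above can be tried offline, without Colab or the chat API. The sketch below is a minimal stand-in, assuming the notebook cells are available as a plain list of dicts; `build_context` and `sample_cells` are hypothetical names used only for illustration and are not part of the widget code. It shows how cells tagged `chatbot` and empty cells are skipped, and how the remaining cells are labeled by type and joined with a separator.

```python
def build_context(cells):
    """Flatten notebook cells into one labeled text blob (illustrative helper)."""
    parts = []
    for cell in cells:
        src = "".join(cell.get("source", []))
        tags = cell.get("metadata", {}).get("tags", [])
        if "chatbot" in tags:
            continue  # leave the assistant's own cell out of the context
        if src.strip():  # ignore cells that are empty or whitespace only
            ct = cell.get("cell_type", "unknown")
            parts.append(f"[{ct.upper()}]\n{src}")
    return "\n\n---\n\n".join(parts)


# Hypothetical sample data standing in for a real notebook.
sample_cells = [
    {"cell_type": "markdown", "source": ["## Intro\n", "Some text."]},
    {"cell_type": "code", "source": ["print('hi')"], "metadata": {"tags": ["chatbot"]}},
    {"cell_type": "code", "source": ["   \n"]},
    {"cell_type": "code", "source": ["x = 1"]},
]

context = build_context(sample_cells)
print(context)
```

Only the markdown cell and the last code cell survive here: the tagged cell is dropped by the `chatbot` check and the whitespace-only cell by the `src.strip()` check.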