# Problem 4: Vector Fields - Visualizing Optimization Dynamics

## Learning Objectives
By the end of this problem, you will:
- Understand optimization as flow in vector fields defined by negative gradients
- Visualize and analyze different optimizer behaviors through vector field theory
- Connect mathematical concepts like divergence and curl to optimization properties
- Apply dynamical systems theory to understand convergence and stability

## Task Overview

1. **Vector Field Fundamentals** - Gradients as vector fields and flow dynamics
2. **Optimizer Vector Fields** - How different optimizers create different flows
3. **Divergence and Curl Analysis** - Mathematical properties of optimization flows
4. **Dynamical Systems Theory** - Fixed points, stability, and basins of attraction

---

## From Static Analysis to Dynamic Flows

In Problems 1-3, you analyzed static properties of the "Go Dolphins!" loss landscape:
- **Problem 1**: Loss surface topology and critical points
- **Problem 2**: Gradient flow through network layers
- **Problem 3**: Sensitivity analysis via Jacobian matrices

But optimization is fundamentally a **dynamic process** - weights evolve over time following paths through the landscape. To understand this evolution, we need **vector field theory**.

**The Vector Field Perspective**:
```
Traditional view: w(t+1) = w(t) - α∇L(w(t))
Vector field view: dw/dt = -∇L(w) (continuous gradient flow)
```

This transforms optimization from discrete updates to continuous flow in a vector field defined by the negative gradient.
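
To make the link between the two views concrete, the sketch below runs discrete gradient descent with a shrinking step size and compares the result with the exact gradient-flow solution. It assumes a toy quadratic loss $L(\mathbf{w}) = \tfrac{1}{2}\mathbf{w}^\top A \mathbf{w}$ (chosen for illustration, not the "Go Dolphins!" loss), for which the flow has the closed form $\mathbf{w}(t) = e^{-At}\mathbf{w}(0)$. The names `gradient_descent` and `exact_flow` are hypothetical helpers.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w (illustrative assumption).
A = np.diag([1.0, 4.0])
w0 = np.array([1.0, 1.0])

def grad(w):
    return A @ w

def gradient_descent(w_start, alpha, num_steps):
    """Discrete view: w(k+1) = w(k) - alpha * grad(w(k))."""
    w = w_start.copy()
    for _ in range(num_steps):
        w = w - alpha * grad(w)
    return w

def exact_flow(w_start, t):
    """Continuous view: dw/dt = -A w has solution w(t) = exp(-A t) w(0).
    A is diagonal here, so the matrix exponential acts elementwise."""
    return np.exp(-np.diag(A) * t) * w_start

T = 1.0
flow_end = exact_flow(w0, T)
for alpha in (0.1, 0.01, 0.001):
    steps = int(round(T / alpha))
    gd_end = gradient_descent(w0, alpha, steps)
    gap = np.linalg.norm(gd_end - flow_end)
    print(f"alpha={alpha:<6} steps={steps:<5} |GD - flow| = {gap:.2e}")
```

The gap shrinks roughly in proportion to the step size, which is the behavior expected if gradient descent is read as a forward-Euler discretization of the flow with time step α.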

## Mathematical Foundation: Vector Fields

**Definition**: A vector field $\mathbf{F}: \mathbb{R}^n \rightarrow \mathbb{R}^n$ assigns a vector to every point in space.

**For optimization**: $\mathbf{F}(\mathbf{w}) = -\nabla L(\mathbf{w})$ (negative gradient field)

**Flow lines**: Curves $\mathbf{w}(t)$ satisfying $\frac{d\mathbf{w}}{dt} = \mathbf{F}(\mathbf{w})$

**Key Properties**:
- **Divergence**: $\nabla \cdot \mathbf{F} = \sum_i \frac{\partial F_i}{\partial w_i}$ (expansion/contraction of flow)
- **Curl**: $\nabla \times \mathbf{F}$ (rotation in the flow)
- **Fixed points**: Points where $\mathbf{F}(\mathbf{w}^*) = \mathbf{0}$ (critical points)
- **Stability**: Eigenvalues of the Jacobian $\mathbf{J}_F$ at a fixed point determine local behavior (the fixed point is stable when every eigenvalue has negative real part)
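
All four properties can be read off the Jacobian of the field. The sketch below estimates $\mathbf{J}_F$ by central differences for a toy two-weight loss; `toy_loss_grad` and `numerical_jacobian` are illustrative helpers, not part of this notebook's utilities. The divergence is the trace of the Jacobian, the 2D curl is the difference of its off-diagonal entries, and stability follows from its eigenvalues.

```python
import numpy as np

def toy_loss_grad(w):
    """Gradient of the toy loss L(w) = w1^2 + w1*w2 + 2*w2^2 (illustrative)."""
    return np.array([2.0 * w[0] + w[1], w[0] + 4.0 * w[1]])

def field(w):
    """Gradient-descent field F(w) = -grad L(w)."""
    return -toy_loss_grad(w)

def numerical_jacobian(f, w, h=1e-5):
    """Central-difference Jacobian: J[i, j] = dF_i / dw_j."""
    n = len(w)
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        J[:, j] = (f(w + step) - f(w - step)) / (2.0 * h)
    return J

w_star = np.array([0.0, 0.0])            # fixed point: F(w*) = 0
J = numerical_jacobian(field, w_star)

divergence = np.trace(J)                 # dF1/dw1 + dF2/dw2
curl_2d = J[1, 0] - J[0, 1]              # dF2/dw1 - dF1/dw2
eigenvalues = np.linalg.eigvals(J)

print("Jacobian:\n", J)
print("divergence:", divergence)
print("curl:", curl_2d)
print("eigenvalues:", eigenvalues)
print("stable fixed point:", bool(np.all(eigenvalues.real < 0)))
```

Because $\mathbf{F}$ is a gradient field, its Jacobian is the negated Hessian of $L$ and is therefore symmetric, which is why the curl vanishes in this example.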

**Different Optimizers = Different Vector Fields**:
- **SGD**: $\mathbf{F} = -\nabla L$
- **Momentum**: $\mathbf{F} = -\nabla L + \beta \mathbf{v}$ (includes velocity)
- **Adam**: $\mathbf{F} = -\frac{\hat{\mathbf{m}}}{\sqrt{\hat{\mathbf{v}}} + \epsilon}$ (adaptive field)
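
One consequence worth spelling out: for momentum (and likewise for Adam's moment estimates), the arrow at a point $\mathbf{w}$ is not determined by $\mathbf{w}$ alone, because the update also depends on the stored state. A minimal sketch, assuming a toy quadratic loss and a hypothetical helper `momentum_step`, shows the same $\mathbf{w}$ producing two different displacements:

```python
import numpy as np

def toy_grad(w):
    """Gradient of a toy quadratic loss 0.5 * (w1^2 + 10 * w2^2) (illustrative)."""
    return np.array([1.0, 10.0]) * w

def momentum_step(w, v, lr=0.05, beta=0.9):
    """One momentum update on the joint state (w, v)."""
    v_new = beta * v - lr * toy_grad(w)
    w_new = w + v_new
    return w_new, v_new

w = np.array([1.0, 1.0])
displacements = []
for v in (np.zeros(2), np.array([0.2, 0.0])):
    w_new, _ = momentum_step(w, v)
    displacements.append(w_new - w)
    print(f"v = {v} -> displacement = {w_new - w}")
```

Strictly speaking, then, the momentum "field" lives on the joint space of $(\mathbf{w}, \mathbf{v})$; any 2D picture of it has to fix a convention for the velocity, such as starting from rest.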

## Why Vector Fields Matter for "Go Dolphins!"

Understanding optimization as vector field flow reveals:
- **Convergence patterns**: Which paths lead to optimal weights?
- **Optimizer differences**: How do SGD, momentum, and Adam create different flows?
- **Stability regions**: Where is the flow stable vs chaotic?
- **Escape mechanisms**: How do optimizers escape local minima?

This connects our sentiment classifier to the deepest mathematical principles governing learning!

In [None]:
# Setup for vector field analysis
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch
from mpl_toolkits.mplot3d import proj3d
import seaborn as sns
from scipy.integrate import solve_ivp

# Import our utilities
import sys
sys.path.append('./utils')
from data_generators import load_sports_dataset

# Load our "Go Dolphins!" dataset
features, labels, feature_names, texts = load_sports_dataset()

print("VECTOR FIELD ANALYSIS OF 'GO DOLPHINS!' OPTIMIZATION")
print("=" * 58)
print(f"Dataset: {len(texts)} sports tweets")
print(f"Parameter space: ℝ³ (weights for {feature_names})")
print()
print("Vector fields we'll analyze:")
print("• Gradient field: F(w) = -∇L(w)")
print("• Momentum field: F(w,v) = -∇L(w) + βv")
print("• Adam field: F(w) = -m̂/(√v̂ + ε)")
print("• RMSprop field: F(w) = -∇L(w)/√v")
print()
print("Mathematical analysis:")
print("• Flow line integration")
print("• Divergence and curl computation")
print("• Fixed point stability analysis")
print("• Basin of attraction mapping")

# Define our loss function and gradient for vector field analysis
def sigmoid(x):
    """Sigmoid with numerical stability"""
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

def loss_function(weights):
    """Binary cross-entropy loss for our sentiment classifier"""
    total_loss = 0.0
    for i in range(len(features)):
        z = np.dot(features[i], weights)
        p = sigmoid(z)
        p = np.clip(p, 1e-15, 1 - 1e-15)
        loss = -labels[i] * np.log(p) - (1 - labels[i]) * np.log(1 - p)
        total_loss += loss
    return total_loss / len(features)

def gradient_function(weights):
    """Analytical gradient of BCE loss"""
    total_gradient = np.zeros_like(weights)
    for i in range(len(features)):
        z = np.dot(features[i], weights)
        p = sigmoid(z)
        error = p - labels[i]
        total_gradient += error * features[i]
    return total_gradient / len(features)

def gradient_vector_field(weights):
    """Vector field F(w) = -∇L(w) for gradient descent"""
    return -gradient_function(weights)

# Test the vector field
test_point = np.array([0.3, 0.5, 0.4])
test_gradient = gradient_function(test_point)
test_field = gradient_vector_field(test_point)

print(f"\nTest evaluation at w = {test_point}:")
print(f"Loss: {loss_function(test_point):.6f}")
print(f"Gradient: {test_gradient}")
print(f"Vector field: {test_field}")
print(f"Field magnitude: {np.linalg.norm(test_field):.6f}")

print("\n✅ Vector field analysis setup complete!")

## Task 1: Vector Field Fundamentals

Let's start by visualizing the gradient vector field and understanding flow dynamics in our loss landscape.
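
Before plotting the field, it can be worth confirming that an analytical BCE gradient matches a finite-difference estimate, since every arrow depends on it. The sketch below is self-contained: it uses synthetic stand-in data rather than the sports dataset, and `bce_loss`, `bce_grad`, and `finite_difference_grad` are illustrative names that mirror (but are not) the notebook's `loss_function` and `gradient_function`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                     # synthetic stand-in features
y = (rng.random(20) < 0.5).astype(float)         # synthetic binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def bce_loss(w):
    """Mean binary cross-entropy for a linear-logistic model."""
    p = np.clip(sigmoid(X @ w), 1e-15, 1 - 1e-15)
    return float(np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p)))

def bce_grad(w):
    """Analytical gradient: X^T (p - y) / n."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def finite_difference_grad(f, w, h=1e-6):
    """Central-difference estimate of the gradient of f at w."""
    g = np.zeros_like(w)
    for k in range(len(w)):
        e = np.zeros_like(w)
        e[k] = h
        g[k] = (f(w + e) - f(w - e)) / (2.0 * h)
    return g

w_test = np.array([0.3, 0.5, 0.4])
analytic = bce_grad(w_test)
numeric = finite_difference_grad(bce_loss, w_test)
print("analytic gradient:", analytic)
print("numeric gradient: ", numeric)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```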

In [None]:
# TODO: Visualize gradient vector field
def create_2d_vector_field_slice(w1_range, w2_range, fixed_w3=0.4, resolution=15):
    """
    Create 2D slice of the 3D vector field for visualization.
    """
    w1_vals = np.linspace(w1_range[0], w1_range[1], resolution)
    w2_vals = np.linspace(w2_range[0], w2_range[1], resolution)
    
    W1, W2 = np.meshgrid(w1_vals, w2_vals)
    
    # Compute vector field and loss at each point
    U = np.zeros_like(W1)  # w1 component of vector field
    V = np.zeros_like(W1)  # w2 component of vector field
    Loss = np.zeros_like(W1)
    
    for i in range(resolution):
        for j in range(resolution):
            w = np.array([W1[i, j], W2[i, j], fixed_w3])
            field_vector = gradient_vector_field(w)
            loss_val = loss_function(w)
            
            U[i, j] = field_vector[0]
            V[i, j] = field_vector[1]
            Loss[i, j] = loss_val
    
    return W1, W2, U, V, Loss

def plot_vector_field_analysis():
    """
    Create comprehensive vector field visualization.
    """
    print("GRADIENT VECTOR FIELD VISUALIZATION")
    print("=" * 40)
    
    # Generate vector field data
    W1, W2, U, V, Loss = create_2d_vector_field_slice(
        w1_range=(-1.0, 1.5), w2_range=(-1.0, 1.5), resolution=12
    )
    
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    
    # Plot 1: Vector field with loss contours
    ax = axes[0, 0]
    contour = ax.contourf(W1, W2, Loss, levels=20, cmap='viridis', alpha=0.6)
    ax.contour(W1, W2, Loss, levels=15, colors='white', alpha=0.8, linewidths=0.5)
    
    # Add vector field arrows
    skip = 1  # Show every arrow
    ax.quiver(W1[::skip, ::skip], W2[::skip, ::skip], 
              U[::skip, ::skip], V[::skip, ::skip],
              color='red', alpha=0.8, scale=15, width=0.003, headwidth=3)
    
    fig.colorbar(contour, ax=ax, label='Loss Value')
    ax.set_xlabel('w₁ (word_count weight)')
    ax.set_ylabel('w₂ (has_team weight)')
    ax.set_title('Gradient Vector Field\n(Red arrows = -∇L direction)')
    
    # Plot 2: Vector field magnitude
    ax = axes[0, 1]
    magnitude = np.sqrt(U**2 + V**2)
    magnitude_plot = ax.contourf(W1, W2, magnitude, levels=20, cmap='plasma')
    fig.colorbar(magnitude_plot, ax=ax, label='Field Magnitude')
    ax.set_xlabel('w₁ (word_count weight)')
    ax.set_ylabel('w₂ (has_team weight)')
    ax.set_title('Vector Field Magnitude\n|(F₁, F₂)| in the w₁-w₂ slice')
    
    # Plot 3: Streamlines (flow lines)
    ax = axes[1, 0]
    ax.contourf(W1, W2, Loss, levels=20, cmap='viridis', alpha=0.3)
    
    # Create streamlines
    ax.streamplot(W1, W2, U, V, color='red', linewidth=2, density=1.5, arrowsize=1.5)
    ax.set_xlabel('w₁ (word_count weight)')
    ax.set_ylabel('w₂ (has_team weight)')
    ax.set_title('Flow Lines (Streamlines)\nPaths of gradient descent')
    
    # Plot 4: Divergence of vector field
    ax = axes[1, 1]
    
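    # Compute divergence numerically, passing the grid spacing so the
    # derivatives are per unit of weight rather than per grid cell
    dw1 = W1[0, 1] - W1[0, 0]
    dw2 = W2[1, 0] - W2[0, 0]
    dU_dw1 = np.gradient(U, dw1, axis=1)
    dV_dw2 = np.gradient(V, dw2, axis=0)
    divergence = dU_dw1 + dV_dw2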
    
    div_plot = ax.contourf(W1, W2, divergence, levels=20, cmap='RdBu_r', 
                          vmin=-np.max(np.abs(divergence)), vmax=np.max(np.abs(divergence)))
    fig.colorbar(div_plot, ax=ax, label='Divergence')
    ax.set_xlabel('w₁ (word_count weight)')
    ax.set_ylabel('w₂ (has_team weight)')
    ax.set_title('Divergence of Vector Field\n∇·F (expansion/contraction)')
    
    plt.tight_layout()
    plt.show()
    
    return W1, W2, U, V, Loss, magnitude, divergence

# Create vector field visualization
field_data = plot_vector_field_analysis()
W1, W2, U, V, Loss, magnitude, divergence = field_data

# Analyze vector field properties
print("\nVECTOR FIELD ANALYSIS:")
print(f"• Average field magnitude: {np.mean(magnitude):.6f}")
print(f"• Maximum field magnitude: {np.max(magnitude):.6f}")
print(f"• Minimum field magnitude: {np.min(magnitude):.6f}")
print(f"• Average divergence: {np.mean(divergence):.6f}")
print(f"• Divergence range: [{np.min(divergence):.6f}, {np.max(divergence):.6f}]")

# Find regions of high/low field magnitude
high_magnitude_threshold = np.percentile(magnitude, 90)
low_magnitude_threshold = np.percentile(magnitude, 10)

print(f"\nREGION ANALYSIS:")
print(f"• High magnitude regions (>90th percentile): Strong gradients, fast optimization")
print(f"• Low magnitude regions (<10th percentile): Weak gradients, slow optimization")
print(f"• High magnitude threshold: {high_magnitude_threshold:.6f}")
print(f"• Low magnitude threshold: {low_magnitude_threshold:.6f}")

print("\n✅ Vector field fundamentals complete!")
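
When divergence is estimated on a grid with `np.gradient`, the grid spacing has to be passed explicitly; otherwise a unit step between samples is assumed and the result is scaled by the spacing. The sketch below checks this on a field whose divergence is known exactly (the field $F(x, y) = (-x, -2y)$ is an illustrative choice, not one of the optimizer fields).

```python
import numpy as np

# Known test field F(x, y) = (-x, -2y): divergence is exactly -3.
x = np.linspace(-1.0, 1.5, 12)
y = np.linspace(-1.0, 1.5, 12)
X, Y = np.meshgrid(x, y)          # X varies along axis 1, Y along axis 0
U, V = -X, -2.0 * Y

dx = x[1] - x[0]
dy = y[1] - y[0]

# Without spacing, np.gradient assumes a unit step between samples.
div_unit = np.gradient(U, axis=1) + np.gradient(V, axis=0)
# With spacing, the result is in the units of the coordinate axes.
div_scaled = np.gradient(U, dx, axis=1) + np.gradient(V, dy, axis=0)

print("grid spacing:", dx)
print("divergence (unit spacing):", div_unit.mean())
print("divergence (true spacing):", div_scaled.mean())
```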

In [None]:
# TODO: Integrate flow lines to show optimization paths
def integrate_flow_lines(starting_points, field_function, t_span=(0, 10), method='RK45'):
    """
    Integrate flow lines in the vector field to show optimization trajectories.
    """
    def vector_field_ode(t, w):
        """ODE for the vector field: dw/dt = F(w)"""
        if len(w) == 2:
            # 2D slice - add fixed third component
            w_3d = np.array([w[0], w[1], 0.4])
            field_3d = field_function(w_3d)
            return field_3d[:2]  # Return only first two components
        else:
            return field_function(w)
    
    trajectories = []
    
    for start_point in starting_points:
        try:
            # Integrate the ODE
            sol = solve_ivp(vector_field_ode, t_span, start_point, 
                          method=method, dense_output=True, 
                          rtol=1e-8, atol=1e-10)
            
            if sol.success:
                # Evaluate at regular time intervals
                t_eval = np.linspace(t_span[0], t_span[1], 100)
                trajectory = sol.sol(t_eval).T
                
                trajectories.append({
                    'start': start_point,
                    'trajectory': trajectory,
                    'times': t_eval,
                    'success': True
                })
            else:
                trajectories.append({
                    'start': start_point,
                    'success': False,
                    'message': sol.message
                })
        except Exception as e:
            trajectories.append({
                'start': start_point,
                'success': False,
                'message': str(e)
            })
    
    return trajectories

def visualize_optimization_trajectories():
    """
    Visualize optimization trajectories as integrated flow lines.
    """
    print("\nOPTIMIZATION TRAJECTORY ANALYSIS")
    print("=" * 40)
    
    # Define starting points for different optimization runs
    starting_points = [
        np.array([-0.5, -0.5]),  # Bottom left
        np.array([1.0, 1.0]),    # Top right
        np.array([-0.5, 1.0]),   # Top left
        np.array([1.0, -0.5]),   # Bottom right
        np.array([0.0, 0.0]),    # Center
        np.array([0.5, 0.5]),    # Off-center
    ]
    
    # Integrate trajectories
    print("Integrating flow lines...")
    trajectories = integrate_flow_lines(starting_points, gradient_vector_field, t_span=(0, 5))
    
    # Visualize trajectories
    fig, axes = plt.subplots(1, 2, figsize=(15, 6))
    
    # Plot 1: Trajectories on loss surface
    ax = axes[0]
    
    # Background: loss contours
    ax.contourf(W1, W2, Loss, levels=20, cmap='viridis', alpha=0.6)
    ax.contour(W1, W2, Loss, levels=15, colors='white', alpha=0.8, linewidths=0.5)
    
    # Plot trajectories
    colors = ['red', 'blue', 'green', 'orange', 'purple', 'brown']
    
    for i, (traj, color) in enumerate(zip(trajectories, colors)):
        if traj['success']:
            trajectory = traj['trajectory']
            start = traj['start']
            
            # Plot trajectory line
            ax.plot(trajectory[:, 0], trajectory[:, 1], color=color, 
                   linewidth=2, alpha=0.8, label=f'Start {i+1}')
            
            # Mark starting point
            ax.plot(start[0], start[1], 'o', color=color, markersize=8, 
                   markeredgecolor='black', markeredgewidth=1)
            
            # Mark ending point
            ax.plot(trajectory[-1, 0], trajectory[-1, 1], 's', color=color, 
                   markersize=8, markeredgecolor='black', markeredgewidth=1)
            
            print(f"Trajectory {i+1}: {start} → {trajectory[-1]}")
        else:
            print(f"Trajectory {i+1}: Failed - {traj.get('message', 'Unknown error')}")
    
    ax.set_xlabel('w₁ (word_count weight)')
    ax.set_ylabel('w₂ (has_team weight)')
    ax.set_title('Optimization Trajectories\n(Circles = start, Squares = end)')
    ax.legend()
    
    # Plot 2: Loss evolution along trajectories
    ax = axes[1]
    
    for i, (traj, color) in enumerate(zip(trajectories, colors)):
        if traj['success']:
            trajectory = traj['trajectory']
            times = traj['times']
            
            # Compute loss along trajectory
            losses = []
            for point_2d in trajectory:
                point_3d = np.array([point_2d[0], point_2d[1], 0.4])
                loss_val = loss_function(point_3d)
                losses.append(loss_val)
            
            ax.plot(times, losses, color=color, linewidth=2, label=f'Start {i+1}')
    
    ax.set_xlabel('Time')
    ax.set_ylabel('Loss Value')
    ax.set_title('Loss Evolution Along Trajectories')
    ax.set_yscale('log')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    return trajectories

# Visualize optimization trajectories
optimization_trajectories = visualize_optimization_trajectories()

print("\n✅ Flow line integration complete!")
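
Integrating over a fixed time span can waste effort once a trajectory has effectively converged. One option is a terminal event in `solve_ivp` that stops integration when the gradient norm falls below a threshold. The sketch below assumes a toy quadratic loss and a hypothetical threshold of `1e-3`; it is not wired into `integrate_flow_lines`.

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.diag([1.0, 3.0])           # toy quadratic loss L(w) = 0.5 * w^T A w

def flow(t, w):
    """Gradient flow dw/dt = -grad L(w) = -A w."""
    return -A @ w

def converged(t, w):
    """Event function: crosses zero when |grad L(w)| drops to 1e-3."""
    return np.linalg.norm(A @ w) - 1e-3

converged.terminal = True         # stop integrating at the first crossing
converged.direction = -1          # only trigger while the norm is decreasing

sol = solve_ivp(flow, (0.0, 50.0), np.array([1.0, -0.5]),
                events=converged, rtol=1e-8, atol=1e-10)

print("status (1 = stopped by event):", sol.status)
print("stopped at t =", sol.t[-1])
print("final point:", sol.y[:, -1])
print("final gradient norm:", np.linalg.norm(A @ sol.y[:, -1]))
```

The stopping time also serves as a simple convergence-time measure for comparing starting points.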

## Task 2: Optimizer Vector Fields

Different optimizers create different vector fields. Let's compare SGD, momentum, and adaptive optimizers.

In [None]:
# TODO: Implement different optimizer vector fields
class OptimizerVectorField:
    """
    Base class for optimizer vector fields.
    """
    
    def __init__(self, loss_fn, grad_fn):
        self.loss_fn = loss_fn
        self.grad_fn = grad_fn
        self.reset_state()
    
    def reset_state(self):
        """Reset optimizer state"""
        pass
    
    def field(self, weights):
        """Compute vector field at given point"""
        raise NotImplementedError

class SGDField(OptimizerVectorField):
    """Gradient descent step field: F(w) = -α∇L(w), with learning rate α = lr"""
    
    def __init__(self, loss_fn, grad_fn, lr=0.1):
        super().__init__(loss_fn, grad_fn)
        self.lr = lr
    
    def field(self, weights):
        return -self.lr * self.grad_fn(weights)

class MomentumField(OptimizerVectorField):
    """Momentum: v ← βv - α∇L(w); the field returns the updated velocity v"""
    
    def __init__(self, loss_fn, grad_fn, lr=0.1, momentum=0.9):
        super().__init__(loss_fn, grad_fn)
        self.lr = lr
        self.momentum = momentum
        self.velocity = None
    
    def reset_state(self):
        self.velocity = None
    
    def field(self, weights):
        grad = self.grad_fn(weights)
        
        if self.velocity is None:
            self.velocity = np.zeros_like(weights)
        
        # Update velocity: v = βv - α∇L
        self.velocity = self.momentum * self.velocity - self.lr * grad
        
        return self.velocity

class AdamField(OptimizerVectorField):
    """Adam: F(w) = -α * m̂/(√v̂ + ε)"""
    
    def __init__(self, loss_fn, grad_fn, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        super().__init__(loss_fn, grad_fn)
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.eps = eps
        self.reset_state()
    
    def reset_state(self):
        self.m = None  # First moment
        self.v = None  # Second moment
        self.t = 0     # Time step
    
    def field(self, weights):
        grad = self.grad_fn(weights)
        self.t += 1
        
        if self.m is None:
            self.m = np.zeros_like(weights)
            self.v = np.zeros_like(weights)
        
        # Update moments
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad**2
        
        # Bias correction
        m_hat = self.m / (1 - self.beta1**self.t)
        v_hat = self.v / (1 - self.beta2**self.t)
        
        # Adam update
        return -self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

class RMSpropField(OptimizerVectorField):
    """RMSprop: F(w) = -α * ∇L/(√v + ε)"""
    
    def __init__(self, loss_fn, grad_fn, lr=0.01, alpha=0.99, eps=1e-8):
        super().__init__(loss_fn, grad_fn)
        self.lr = lr
        self.alpha = alpha
        self.eps = eps
        self.reset_state()
    
    def reset_state(self):
        self.v = None
    
    def field(self, weights):
        grad = self.grad_fn(weights)
        
        if self.v is None:
            self.v = np.zeros_like(weights)
        
        # Update running average of squared gradients
        self.v = self.alpha * self.v + (1 - self.alpha) * grad**2
        
        # RMSprop update
        return -self.lr * grad / (np.sqrt(self.v) + self.eps)

# Create optimizer vector fields
print("\nCREATING OPTIMIZER VECTOR FIELDS")
print("=" * 40)

optimizers = {
    'SGD': SGDField(loss_function, gradient_function, lr=0.5),
    'Momentum': MomentumField(loss_function, gradient_function, lr=0.1, momentum=0.9),
    'Adam': AdamField(loss_function, gradient_function, lr=0.1),
    'RMSprop': RMSpropField(loss_function, gradient_function, lr=0.1)
}

# Test each optimizer at a test point
test_point = np.array([0.5, 0.5, 0.4])
print(f"\nTesting optimizers at w = {test_point}:")

for name, optimizer in optimizers.items():
    optimizer.reset_state()
    field_value = optimizer.field(test_point)
    field_magnitude = np.linalg.norm(field_value)
    
    print(f"{name:<10}: F(w) = {field_value}, |F| = {field_magnitude:.6f}")

print("\n✅ Optimizer vector fields created!")
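
A property worth noting when reading the Adam numbers: evaluated from freshly reset state, Adam's update depends on the sign of each gradient component but hardly on its size, because the bias-corrected moments reduce to the gradient and its square. The helper `adam_first_step` below is a standalone restatement of the first call to `AdamField.field` after `reset_state()`, written here only for illustration.

```python
import numpy as np

def adam_first_step(grad, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam update from freshly reset state (m = v = 0, t = 1)."""
    m = (1 - beta1) * grad
    v = (1 - beta2) * grad**2
    m_hat = m / (1 - beta1)       # bias correction recovers grad
    v_hat = v / (1 - beta2)       # bias correction recovers grad**2
    return -lr * m_hat / (np.sqrt(v_hat) + eps)

for g in (np.array([0.5, -0.02, 0.001]), np.array([50.0, -2.0, 0.1])):
    print("gradient:", g, "-> step:", adam_first_step(g))
```

Both gradients give a step of roughly `-lr * sign(grad)`, so a first-step Adam field has nearly uniform arrow length across the whole slice.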

In [None]:
# TODO: Compare optimizer vector fields visually
def compare_optimizer_fields():
    """
    Visualize and compare different optimizer vector fields.
    """
    print("\nCOMPARING OPTIMIZER VECTOR FIELDS")
    print("=" * 40)
    
    # Create grid for visualization
    resolution = 10
    w1_range, w2_range = (-1.0, 1.5), (-1.0, 1.5)
    w1_vals = np.linspace(w1_range[0], w1_range[1], resolution)
    w2_vals = np.linspace(w2_range[0], w2_range[1], resolution)
    W1, W2 = np.meshgrid(w1_vals, w2_vals)
    
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    axes = axes.flatten()
    
    optimizer_names = ['SGD', 'Momentum', 'Adam', 'RMSprop']
    
    for idx, name in enumerate(optimizer_names):
        ax = axes[idx]
        optimizer = optimizers[name]
        optimizer.reset_state()
        
        # Compute vector field
        U = np.zeros_like(W1)
        V = np.zeros_like(W1)
        Magnitude = np.zeros_like(W1)
        
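        for i in range(resolution):
            for j in range(resolution):
                w = np.array([W1[i, j], W2[i, j], 0.4])
                # Reset state at every grid point so the arrow depends only on w,
                # not on velocity/moment estimates left over from the previous
                # grid point (this shows each optimizer's first step from rest)
                optimizer.reset_state()
                field = optimizer.field(w)
                
                U[i, j] = field[0]
                V[i, j] = field[1]
                Magnitude[i, j] = np.linalg.norm(field[:2])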
        
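        # Plot background loss contours, evaluated on this function's own grid
        # (the global Loss array was computed at a different resolution)
        loss_grid = np.array([[loss_function(np.array([W1[i, j], W2[i, j], 0.4]))
                               for j in range(resolution)]
                              for i in range(resolution)])
        ax.contourf(W1, W2, loss_grid, levels=15, cmap='viridis', alpha=0.3)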
        
        # Plot vector field
        ax.quiver(W1, W2, U, V, Magnitude, cmap='plasma', 
                 scale=20, width=0.004, alpha=0.8)
        
        ax.set_xlabel('w₁')
        ax.set_ylabel('w₂')
        ax.set_title(f'{name} Vector Field')
        
        # Add some analysis
        avg_magnitude = np.mean(Magnitude)
        max_magnitude = np.max(Magnitude)
        print(f"{name} field - Avg magnitude: {avg_magnitude:.4f}, Max: {max_magnitude:.4f}")
    
    plt.tight_layout()
    plt.show()

def simulate_optimizer_trajectories():
    """
    Simulate and compare optimization trajectories for different optimizers.
    """
    print("\nSIMULATING OPTIMIZER TRAJECTORIES")
    print("=" * 40)
    
    # Starting point
    start_point = np.array([1.0, -0.5, 0.4])
    num_steps = 50
    
    fig, axes = plt.subplots(1, 2, figsize=(15, 6))
    
    # Simulate each optimizer
    trajectories = {}
    colors = ['red', 'blue', 'green', 'orange']
    
    for i, (name, optimizer) in enumerate(optimizers.items()):
        optimizer.reset_state()
        
        # Simulate trajectory
        trajectory = [start_point.copy()]
        losses = [loss_function(start_point)]
        current_point = start_point.copy()
        
        for step in range(num_steps):
            field_vector = optimizer.field(current_point)
            current_point = current_point + field_vector
            trajectory.append(current_point.copy())
            losses.append(loss_function(current_point))
        
        trajectory = np.array(trajectory)
        trajectories[name] = {'path': trajectory, 'losses': losses}
        
        # Plot trajectory (2D projection)
        ax = axes[0]
        ax.plot(trajectory[:, 0], trajectory[:, 1], color=colors[i], 
               linewidth=2, marker='o', markersize=3, label=name, alpha=0.8)
        
        # Mark start and end
        ax.plot(start_point[0], start_point[1], 'o', color=colors[i], 
               markersize=10, markeredgecolor='black')
        ax.plot(trajectory[-1, 0], trajectory[-1, 1], 's', color=colors[i], 
               markersize=8, markeredgecolor='black')
        
        # Plot loss evolution
        ax = axes[1]
        ax.plot(range(len(losses)), losses, color=colors[i], 
               linewidth=2, label=name, alpha=0.8)
        
        print(f"{name}: Final point = {trajectory[-1]}, Final loss = {losses[-1]:.6f}")
    
    # Customize plots
    axes[0].contourf(W1, W2, Loss, levels=15, cmap='viridis', alpha=0.3)
    axes[0].set_xlabel('w₁')
    axes[0].set_ylabel('w₂')
    axes[0].set_title('Optimizer Trajectories\n(Circles = start, Squares = end)')
    axes[0].legend()
    
    axes[1].set_xlabel('Step')
    axes[1].set_ylabel('Loss')
    axes[1].set_title('Loss Evolution by Optimizer')
    axes[1].set_yscale('log')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    return trajectories

# Compare optimizer fields and trajectories
compare_optimizer_fields()
optimizer_trajectories = simulate_optimizer_trajectories()

print("\n✅ Optimizer comparison complete!")

## Task 3: Divergence and Curl Analysis

Let's analyze the mathematical properties of our vector fields using divergence and curl.
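
For the plain gradient field, these properties can be predicted before any grid computation. For a linear-logistic model with mean BCE loss (ignoring the numerical clipping of probabilities), the Hessian is $\mathbf{H} = \frac{1}{n} X^\top \mathrm{diag}\big(p_i(1-p_i)\big) X$, which is symmetric and positive semidefinite. Hence $\nabla \cdot \mathbf{F} = -\mathrm{tr}(\mathbf{H}) \le 0$ and the curl is zero. The sketch below checks this on synthetic stand-in features; `bce_hessian` is an illustrative helper, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))      # synthetic stand-in features

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_hessian(w):
    """Hessian of mean BCE for a linear-logistic model: X^T diag(p(1-p)) X / n.
    The labels do not appear in this expression."""
    p = sigmoid(X @ w)
    return (X * (p * (1 - p))[:, None]).T @ X / len(X)

test_weights = (np.zeros(3), np.array([0.3, 0.5, 0.4]), np.array([-2.0, 1.0, 3.0]))
for w in test_weights:
    H = bce_hessian(w)
    J_F = -H                      # Jacobian of F = -grad L
    print("w =", w)
    print("  divergence of F:", np.trace(J_F))
    print("  Hessian eigenvalues:", np.linalg.eigvalsh(H))
    print("  Jacobian symmetric (zero curl):", np.allclose(J_F, J_F.T))
```

Numerical curl values for the gradient-descent field that are far from zero would therefore point to discretization error rather than genuine rotation.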

In [None]:
# TODO: Compute divergence and curl of vector fields
def compute_field_properties(field_function, w1_range=(-1, 1.5), w2_range=(-1, 1.5), 
                            fixed_w3=0.4, resolution=20):
    """
    Compute divergence and curl of a 2D vector field slice.
    """
    w1_vals = np.linspace(w1_range[0], w1_range[1], resolution)
    w2_vals = np.linspace(w2_range[0], w2_range[1], resolution)
    W1, W2 = np.meshgrid(w1_vals, w2_vals)
    
    h = w1_vals[1] - w1_vals[0]  # Grid spacing
    
    # Compute vector field components
    U = np.zeros_like(W1)
    V = np.zeros_like(W1)
    
    for i in range(resolution):
        for j in range(resolution):
            w = np.array([W1[i, j], W2[i, j], fixed_w3])
            field = field_function(w)
            U[i, j] = field[0]
            V[i, j] = field[1]
    
    # Compute divergence: ∇·F = ∂U/∂w1 + ∂V/∂w2
    dU_dw1 = np.gradient(U, h, axis=1)
    dV_dw2 = np.gradient(V, h, axis=0)
    divergence = dU_dw1 + dV_dw2
    
    # Compute curl (2D): ∇×F = ∂V/∂w1 - ∂U/∂w2
    dV_dw1 = np.gradient(V, h, axis=1)
    dU_dw2 = np.gradient(U, h, axis=0)
    curl = dV_dw1 - dU_dw2
    
    return W1, W2, U, V, divergence, curl

def analyze_vector_field_properties():
    """
    Analyze divergence and curl properties of different optimizers.
    """
    print("\nVECTOR FIELD MATHEMATICAL ANALYSIS")
    print("=" * 45)
    
    fig, axes = plt.subplots(2, 4, figsize=(20, 10))
    
    optimizer_names = ['SGD', 'Momentum', 'Adam', 'RMSprop']
    properties = {}
    
    for idx, name in enumerate(optimizer_names):
        optimizer = optimizers[name]
        optimizer.reset_state()
        
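        # Compute field properties. Reset optimizer state before every
        # evaluation so the field depends only on w, not on the order in
        # which the grid is traversed.
        def stateless_field(w, opt=optimizer):
            opt.reset_state()
            return opt.field(w)
        
        W1, W2, U, V, div, curl = compute_field_properties(stateless_field)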
        
        properties[name] = {
            'divergence': div,
            'curl': curl,
            'field_magnitude': np.sqrt(U**2 + V**2)
        }
        
        # Plot divergence
        ax = axes[0, idx]
        div_max = np.max(np.abs(div))
        div_plot = ax.contourf(W1, W2, div, levels=20, cmap='RdBu_r', 
                              vmin=-div_max, vmax=div_max)
        ax.contour(W1, W2, div, levels=10, colors='black', alpha=0.3, linewidths=0.5)
        plt.colorbar(div_plot, ax=ax, label='Divergence')
        ax.set_title(f'{name} Divergence\n∇·F')
        ax.set_xlabel('w₁')
        ax.set_ylabel('w₂')
        
        # Plot curl
        ax = axes[1, idx]
        curl_max = np.max(np.abs(curl))
        curl_plot = ax.contourf(W1, W2, curl, levels=20, cmap='RdBu_r', 
                               vmin=-curl_max, vmax=curl_max)
        ax.contour(W1, W2, curl, levels=10, colors='black', alpha=0.3, linewidths=0.5)
        plt.colorbar(curl_plot, ax=ax, label='Curl')
        ax.set_title(f'{name} Curl\n∇×F')
        ax.set_xlabel('w₁')
        ax.set_ylabel('w₂')
        
        # Print statistics
        print(f"\n{name} Field Properties:")
        print(f"  Divergence - Mean: {np.mean(div):.6f}, Std: {np.std(div):.6f}")
        print(f"  Curl - Mean: {np.mean(curl):.6f}, Std: {np.std(curl):.6f}")
        print(f"  Max |divergence|: {div_max:.6f}")
        print(f"  Max |curl|: {curl_max:.6f}")
        
        # Interpret the results
        if np.mean(div) > 0.01:
            print(f"  → Predominantly expanding flow (sources)")
        elif np.mean(div) < -0.01:
            print(f"  → Predominantly contracting flow (sinks)")
        else:
            print(f"  → Nearly incompressible flow")
        
        if np.std(curl) > 0.01:
            print(f"  → Significant rotational components")
        else:
            print(f"  → Mostly irrotational flow")
    
    plt.tight_layout()
    plt.show()
    
    return properties

# Analyze vector field properties
field_properties = analyze_vector_field_properties()

print("\n" + "="*50)
print("MATHEMATICAL INTERPRETATION")
print("="*50)
print("\n• DIVERGENCE (∇·F):")
print("  - Positive: Flow expansion (sources)")
print("  - Negative: Flow contraction (sinks)")
print("  - Zero: Incompressible flow")
print("\n• CURL (∇×F):")
print("  - Positive: Counterclockwise rotation")
print("  - Negative: Clockwise rotation")
print("  - Zero: Irrotational flow")
print("\n• OPTIMIZATION IMPLICATIONS:")
print("  - Large |divergence|: Strong focusing (negative) or dispersing (positive)")
print("  - High curl: Potential oscillatory behavior")
print("  - Low curl: Direct convergence to minima")

print("\n✅ Divergence and curl analysis complete!")

## Task 4: Dynamical Systems Theory

Finally, let's apply dynamical systems theory to understand fixed points, stability, and long-term behavior.
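
One point to keep in mind is that stability of the continuous flow and stability of the discrete update are not the same condition. Near a minimum with Hessian $\mathbf{H}$, the update $\mathbf{w} \leftarrow \mathbf{w} - \alpha \nabla L(\mathbf{w})$ has Jacobian $\mathbf{I} - \alpha \mathbf{H}$, so it is locally stable only when $\alpha < 2/\lambda_{\max}(\mathbf{H})$. The sketch below illustrates this on a toy quadratic; the matrix `H` and the helper `run_gd` are assumptions for illustration.

```python
import numpy as np

H = np.array([[2.0, 0.5],
              [0.5, 1.0]])        # toy Hessian at a minimum (illustrative)
lam_max = np.linalg.eigvalsh(H).max()
critical_lr = 2.0 / lam_max

def run_gd(lr, steps=200):
    """Iterate w <- w - lr * H w and return the final distance from w* = 0."""
    w = np.array([1.0, 1.0])
    for _ in range(steps):
        w = w - lr * (H @ w)
    return np.linalg.norm(w)

print("largest Hessian eigenvalue:", lam_max)
print("critical learning rate 2/lam_max:", critical_lr)
for lr in (0.5 * critical_lr, 0.95 * critical_lr, 1.05 * critical_lr):
    spectral_radius = np.max(np.abs(np.linalg.eigvals(np.eye(2) - lr * H)))
    print(f"lr={lr:.3f}  spectral radius={spectral_radius:.3f}  "
          f"|w_200|={run_gd(lr):.3e}")
```

A minimum that is a stable fixed point of the gradient flow can thus still repel the discrete iterates when the learning rate is too large.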

In [None]:
# TODO: Apply dynamical systems theory to optimization
def analyze_fixed_points_stability():
    """
    Analyze fixed points and their stability using dynamical systems theory.
    """
    print("\nDYNAMICAL SYSTEMS ANALYSIS")
    print("=" * 35)
    
    # For gradient descent, fixed points are where ∇L(w) = 0
    # Let's find approximate fixed points
    from scipy.optimize import minimize
    
    # Find critical points (fixed points of gradient field)
    starting_points = [
        np.array([0.0, 0.0, 0.0]),
        np.array([0.5, 0.5, 0.5]),
        np.array([1.0, 1.0, 1.0]),
        np.array([-0.5, -0.5, -0.5])
    ]
    
    fixed_points = []
    
    for start in starting_points:
        try:
            result = minimize(loss_function, start, jac=gradient_function, 
                            method='BFGS', options={'gtol': 1e-8})
            
            if result.success:
                candidate = result.x
                grad_norm = np.linalg.norm(gradient_function(candidate))
                
                if grad_norm < 1e-6:
                    # Check if we already found this point
                    is_new = True
                    for existing in fixed_points:
                        if np.linalg.norm(candidate - existing['point']) < 1e-4:
                            is_new = False
                            break
                    
                    if is_new:
                        fixed_points.append({
                            'point': candidate,
                            'loss': loss_function(candidate),
                            'gradient_norm': grad_norm
                        })
        except Exception:
            continue
    
    print(f"Found {len(fixed_points)} fixed point(s):")
    
    for i, fp in enumerate(fixed_points):
        point = fp['point']
        loss_val = fp['loss']
        
        print(f"\nFixed Point {i+1}:")
        print(f"  Location: {point}")
        print(f"  Loss: {loss_val:.8f}")
        print(f"  Gradient norm: {fp['gradient_norm']:.2e}")
        
        # Analyze stability using Hessian eigenvalues
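        hessian = compute_hessian_numerical(point)
        # Symmetrize away finite-difference round-off, then use the symmetric
        # eigenvalue solver so the eigenvalues are guaranteed real
        hessian = 0.5 * (hessian + hessian.T)
        eigenvalues = np.linalg.eigvalsh(hessian)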
        
        print(f"  Hessian eigenvalues: {eigenvalues}")
        
        # Stability classification
        if np.all(eigenvalues > 1e-8):
            stability = "Stable (local minimum)"
        elif np.all(eigenvalues < -1e-8):
            stability = "Unstable (local maximum)"
        elif np.any(eigenvalues > 1e-8) and np.any(eigenvalues < -1e-8):
            stability = "Saddle point (unstable)"
        else:
            stability = "Marginal stability"
        
        print(f"  Stability: {stability}")
    
    return fixed_points

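def compute_hessian_numerical(weights, h=1e-4):
    """Compute Hessian matrix numerically with central differences.

    The step h is kept moderate: the formula divides by 4*h**2, so a very
    small h amplifies floating-point round-off in the loss values.
    """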
    n = len(weights)
    hessian = np.zeros((n, n))
    
    for i in range(n):
        for j in range(n):
            # Compute ∂²f/∂wi∂wj using finite differences
            w_pp = weights.copy()
            w_pm = weights.copy()
            w_mp = weights.copy()
            w_mm = weights.copy()
            
            w_pp[i] += h
            w_pp[j] += h
            
            w_pm[i] += h
            w_pm[j] -= h
            
            w_mp[i] -= h
            w_mp[j] += h
            
            w_mm[i] -= h
            w_mm[j] -= h
            
            hessian[i, j] = (loss_function(w_pp) - loss_function(w_pm) - 
                           loss_function(w_mp) + loss_function(w_mm)) / (4 * h**2)
    
    return hessian

def analyze_basins_of_attraction(fixed_points):
    """
    Analyze basins of attraction for different fixed points.
    """
    if not fixed_points:
        print("No fixed points found for basin analysis.")
        return
    
    print("\nBASIN OF ATTRACTION ANALYSIS")
    print("=" * 35)
    
    # Create grid of starting points
    resolution = 15
    w1_range, w2_range = (-1.5, 1.5), (-1.5, 1.5)
    w1_vals = np.linspace(w1_range[0], w1_range[1], resolution)
    w2_vals = np.linspace(w2_range[0], w2_range[1], resolution)
    W1, W2 = np.meshgrid(w1_vals, w2_vals)
    
    # For each starting point, see which fixed point it converges to
    basin_map = np.zeros_like(W1, dtype=int)
    convergence_time = np.zeros_like(W1)
    
    print("Computing basins of attraction...")
    
    for i in range(resolution):
        for j in range(resolution):
            # Vary (w1, w2) over the grid and hold the third weight fixed at 0.4
            start_3d = np.array([W1[i, j], W2[i, j], 0.4])
            
            # Simulate gradient descent
            current_point = start_3d.copy()
            max_steps = 100
            lr = 0.1
            tolerance = 1e-4
            
            for step in range(max_steps):
                grad = gradient_function(current_point)
                if np.linalg.norm(grad) < tolerance:
                    break
                current_point -= lr * grad
            
            # Find which fixed point this converged to
            min_distance = float('inf')
            closest_fp = 0
            
            for fp_idx, fp in enumerate(fixed_points):
                distance = np.linalg.norm(current_point - fp['point'])
                if distance < min_distance:
                    min_distance = distance
                    closest_fp = fp_idx
            
            basin_map[i, j] = closest_fp
            convergence_time[i, j] = step
    
    # Visualize basins
    fig, axes = plt.subplots(1, 2, figsize=(15, 6))
    
    # Plot 1: Basin map
    ax = axes[0]
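    # Explicit half-integer boundaries give each fixed-point index its own band
    # (an integer `levels` lets matplotlib pick boundaries that can merge basins)
    basin_levels = np.arange(len(fixed_points) + 1) - 0.5
    basin_plot = ax.contourf(W1, W2, basin_map, levels=basin_levels,
                            cmap='tab10', alpha=0.7)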
    
    # Mark fixed points
    for i, fp in enumerate(fixed_points):
        ax.plot(fp['point'][0], fp['point'][1], 'o', color='black', 
               markersize=12, markeredgewidth=2, markerfacecolor='white')
        ax.text(fp['point'][0], fp['point'][1], str(i+1), 
               ha='center', va='center', fontweight='bold')
    
    ax.set_xlabel('w₁')
    ax.set_ylabel('w₂')
    ax.set_title('Basins of Attraction\n(Numbers = Fixed Points)')
    
    # Plot 2: Convergence time
    ax = axes[1]
    time_plot = ax.contourf(W1, W2, convergence_time, levels=20, cmap='viridis')
    plt.colorbar(time_plot, ax=ax, label='Steps to Converge')
    ax.set_xlabel('w₁')
    ax.set_ylabel('w₂')
    ax.set_title('Convergence Time Map')
    
    plt.tight_layout()
    plt.show()
    
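    print("\nBasin Analysis Summary:")
    for i, fp in enumerate(fixed_points):
        in_basin = basin_map == i
        basin_size = np.sum(in_basin)
        basin_percentage = basin_size / basin_map.size * 100
        if basin_size == 0:
            # Unstable fixed points (saddles, maxima) may attract no grid point
            print(f"Fixed Point {i+1}: 0.0% of space, no starting points ended here")
            continue
        avg_convergence = np.mean(convergence_time[in_basin])
        
        print(f"Fixed Point {i+1}: {basin_percentage:.1f}% of space, "
              f"avg convergence: {avg_convergence:.1f} steps")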

# Perform dynamical systems analysis
fixed_points = analyze_fixed_points_stability()
analyze_basins_of_attraction(fixed_points)

print("\n✅ Dynamical systems analysis complete!")

In [None]:
# TODO: Summarize vector field insights
def summarize_vector_field_insights():
    """
    Summarize key insights from vector field analysis.
    """
    print("\n" + "="*60)
    print("VECTOR FIELD ANALYSIS SUMMARY")
    print("="*60)
    
    print("\n🌊 WHAT WE DISCOVERED:")
    print("-" * 25)
    
    print("\n1. OPTIMIZATION AS FLOW:")
    print("   • Gradient descent = flow in vector field F(w) = -∇L(w)")
    print("   • Different optimizers create different vector fields")
    print("   • Flow lines reveal optimization paths and convergence")
    
    print("\n2. OPTIMIZER DIFFERENCES:")
    print("   • SGD: Direct gradient flow, can be slow")
    print("   • Momentum: Smoothed flow with inertia, faster convergence")
    print("   • Adam: Adaptive flow scaling, good for varied landscapes")
    print("   • RMSprop: Normalized flow, handles different scales")
    
    print("\n3. MATHEMATICAL PROPERTIES:")
    print("   • Divergence reveals flow expansion/contraction")
    print("   • Curl indicates rotational behavior (zero for a pure gradient field)")
    print("   • Fixed points correspond to critical points")
    print("   • Hessian eigenvalues determine local stability")
    
    print("\n4. DYNAMICAL BEHAVIOR:")
    print("   • Basins of attraction show convergence regions")
    print("   • Stability analysis predicts long-term behavior")
    print("   • Flow patterns explain optimization efficiency")
    
    print("\n🧮 MATHEMATICAL SIGNIFICANCE:")
    print("-" * 30)
    
    print("\n• VECTOR FIELDS unify optimization and dynamical systems")
    print("• FLOW INTEGRATION shows actual optimization paths")
    print("• DIVERGENCE/CURL reveal geometric properties")
    print("• STABILITY THEORY predicts convergence behavior")
    
    print("\n🎯 PRACTICAL IMPLICATIONS:")
    print("-" * 25)
    
    print("\n1. OPTIMIZER SELECTION:")
    print("   → Vector field analysis reveals why certain optimizers work better")
    print("   → Flow patterns predict convergence speed and stability")
    
    print("\n2. HYPERPARAMETER TUNING:")
    print("   → Learning rate affects flow magnitude and stability")
    print("   → Momentum parameters change flow smoothness")
    
    print("\n3. ARCHITECTURE DESIGN:")
    print("   → Loss landscape topology affects vector field properties")
    print("   → Network design influences optimization dynamics")
    
    print("\n4. CONVERGENCE PREDICTION:")
    print("   → Basin analysis shows which initializations lead where")
    print("   → Stability analysis predicts robustness to perturbations")
    
    print("\n🚀 CONNECTION TO MODERN AI:")
    print("-" * 25)
    
    print("\n• NEURAL ODEs: Network layers modeled as continuous-time flows")
    print("• ADVERSARIAL TRAINING: Understanding optimization conflicts")
    print("• META-LEARNING: Optimizing the optimization process")
    print("• DISTRIBUTED OPTIMIZATION: Coordinated multi-agent flows")
    
    print("\n" + "="*60)
    print("Vector field theory reveals optimization as a beautiful")
    print("mathematical flow process, connecting AI to the deepest")
    print("principles of dynamical systems and differential geometry!")
    print("="*60)

summarize_vector_field_insights()
print("\n✅ Vector field analysis complete!")

## What's Next?

You've now mastered vector field analysis - the mathematical framework that reveals optimization as dynamic flow! Here's what we discovered:

**🔑 Key Mathematical Insights:**
1. **Optimization as Flow** - Gradient descent is flow in the vector field F(w) = -∇L(w)
2. **Optimizer Differences** - Each optimizer creates a unique vector field with distinct properties
3. **Divergence and Curl** - Divergence measures expansion/contraction and curl measures rotation; a pure gradient field -∇L is curl-free
4. **Dynamical Systems Theory** - Fixed points, stability, and basins of attraction
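
To make insight 3 concrete, here is a minimal sketch that estimates divergence and curl numerically. It uses a toy 2D quadratic loss chosen only for illustration (it is not the "Go Dolphins!" loss), and the names `toy_flow_field` and `field_jacobian` are hypothetical helpers, not functions defined earlier in this notebook.

```python
import numpy as np

def toy_flow_field(w):
    """F(w) = -grad L(w) for the illustrative loss
    L(w) = w1^2 + 3*w2^2 + 0.5*w1*w2 (an assumption, not the notebook's loss)."""
    return -np.array([2.0 * w[0] + 0.5 * w[1], 6.0 * w[1] + 0.5 * w[0]])

def field_jacobian(field, w, h=1e-5):
    """Central-difference Jacobian J[i, j] = dF_i / dw_j."""
    n = len(w)
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        J[:, j] = (field(w + step) - field(w - step)) / (2 * h)
    return J

w0 = np.array([0.7, -0.3])
J = field_jacobian(toy_flow_field, w0)

divergence = np.trace(J)        # dF1/dw1 + dF2/dw2
curl_z = J[1, 0] - J[0, 1]      # dF2/dw1 - dF1/dw2 (scalar curl in 2D)

print(f"divergence = {divergence:.4f}")
print(f"|curl|     = {abs(curl_z):.2e}")
```

For this loss the Hessian is [[2, 0.5], [0.5, 6]], so the divergence should come out close to -8 (the negative trace of the Hessian, meaning the flow contracts everywhere), and the curl should vanish up to round-off because the Jacobian of a gradient field is symmetric.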

**🧮 Mathematical Tools Mastered:**
- **Vector field theory** for continuous optimization analysis
- **Flow line integration** for trajectory prediction
- **Divergence and curl** for geometric property analysis
- **Stability theory** for convergence prediction
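
As a compact recap of flow line integration and stability theory, the sketch below integrates dw/dt = -∇L(w) with forward Euler on a double-well loss and classifies fixed points by Hessian eigenvalues. The loss L(w) = (w₁² - 1)² + w₂² and all function names here are assumptions made for illustration; they are independent of the `loss_function` and `gradient_function` used above.

```python
import numpy as np

def double_well_grad(w):
    """Gradient of the illustrative loss L(w) = (w1^2 - 1)^2 + w2^2."""
    return np.array([4.0 * w[0] * (w[0]**2 - 1.0), 2.0 * w[1]])

def double_well_hessian(w):
    """Analytic Hessian of the same loss (diagonal for this example)."""
    return np.diag([12.0 * w[0]**2 - 4.0, 2.0])

def integrate_flow(grad, w0, dt=0.05, max_steps=2000, tol=1e-8):
    """Forward-Euler integration of dw/dt = -grad(w); returns visited points."""
    w = np.asarray(w0, dtype=float).copy()
    path = [w.copy()]
    for _ in range(max_steps):
        g = grad(w)
        if np.linalg.norm(g) < tol:
            break
        w = w - dt * g
        path.append(w.copy())
    return np.array(path)

def classify(hessian, tol=1e-8):
    """Label a fixed point from the signs of its Hessian eigenvalues."""
    eig = np.linalg.eigvalsh(hessian)
    if np.all(eig > tol):
        return "stable minimum"
    if np.all(eig < -tol):
        return "unstable maximum"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle"
    return "marginal"

# Two starting points on opposite sides of the ridge at w1 = 0
end_right = integrate_flow(double_well_grad, [0.2, 0.8])[-1]
end_left = integrate_flow(double_well_grad, [-0.2, 0.8])[-1]

print("end points:", end_right, end_left)
print("end point type:", classify(double_well_hessian(end_right)))
print("origin type:   ", classify(double_well_hessian(np.zeros(2))))
```

The two trajectories start close together but land in different minima, near (1, 0) and (-1, 0), because the line w₁ = 0 separates the two basins of attraction. The origin is also a fixed point, but its Hessian has one negative eigenvalue, so it is a saddle that attracts only points lying exactly on that line.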

**🎯 Why This Matters:**
Vector field analysis transforms our understanding of optimization:
- **Optimizer Selection**: Understanding why certain optimizers work better
- **Hyperparameter Tuning**: Predicting the effects of learning rates and momentum
- **Convergence Prediction**: Knowing which initializations lead to which solutions
- **Robustness Analysis**: Understanding stability to perturbations
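
The learning-rate point can be checked on the simplest possible case. For a one-dimensional quadratic loss $L(w) = \frac{1}{2}\lambda w^2$, each gradient descent step multiplies $w$ by $(1 - \alpha\lambda)$, so the discrete iteration converges only when $0 < \alpha < 2/\lambda$, even though the continuous flow converges for any $\lambda > 0$. The sketch below assumes a curvature of 4 (threshold 0.5); `run_gd` is a hypothetical helper, not part of the notebook.

```python
def run_gd(curvature, lr, w0=1.0, steps=50):
    """Gradient descent on L(w) = 0.5 * curvature * w**2, starting from w0."""
    w = w0
    for _ in range(steps):
        grad = curvature * w      # dL/dw
        w -= lr * grad            # same as w *= (1 - lr * curvature)
    return w

curvature = 4.0                   # stability threshold: lr < 2 / 4 = 0.5
for lr in (0.10, 0.45, 0.55):
    final = run_gd(curvature, lr)
    status = "converges" if abs(final) < 1.0 else "diverges"
    print(f"lr = {lr:.2f}: |w| after 50 steps = {abs(final):.3e} ({status})")
```

Learning rates 0.10 and 0.45 shrink |w| toward zero (the second one oscillates in sign because 1 - αλ is negative), while 0.55 sits just past the threshold and grows without bound. In higher dimensions the same bound applies per direction, with λ replaced by the largest Hessian eigenvalue.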

**🚀 Coming in Problem 5: Optimization Landscapes**
- How do we analyze the complete topology of loss landscapes?
- What are the mathematical principles governing convergence guarantees?
- How do modern optimization techniques escape local minima?
- What can advanced analysis tell us about deep learning theory?

You're about to complete the deepest mathematical journey through AI! 🐬➡️📊➡️🎯➡️⚡➡️🚀➡️🧮➡️🔗➡️📐➡️🌊➡️🏔️