# Problem 3: Jacobian Analysis - Understanding System Sensitivity

## Learning Objectives
By the end of this problem, you will:
- Understand Jacobian matrices as complete sensitivity analysis tools
- Analyze how input perturbations propagate through neural networks
- Connect Jacobian eigenvalues to optimization stability and conditioning
- Apply Jacobian analysis to understand model robustness and interpretability

## Task Overview

1. **Jacobian Fundamentals** - From single derivatives to the matrix of all partial derivatives
2. **Network Jacobian Computation** - Sensitivity analysis for the "Go Dolphins!" classifier
3. **Eigenvalue Analysis** - Understanding conditioning and stability through spectra
4. **Robustness Assessment** - How sensitive are predictions to input changes?

---

## From Gradients to Jacobians

In Problems 1-2, you analyzed gradients: vectors of partial derivatives that show how a single scalar output (the loss) changes with respect to each parameter. But machine learning systems involve transformations with multiple inputs and multiple outputs:

**Single Output (Gradient)**:
```
"Go Dolphins!" → [features] → [weights] → prediction
∇L(w) = [∂L/∂w₁, ∂L/∂w₂, ∂L/∂w₃]
```

**Multiple Outputs (Jacobian)**:
```
"Go Dolphins!" → [features] → [hidden activations] → [outputs]
J = [∂yᵢ/∂xⱼ] for all input-output pairs
```

**The Challenge**: How do we understand the complete sensitivity structure of neural networks? How do small changes in inputs (or parameters) affect all aspects of the network's behavior?

**The Solution**: **Jacobian matrices** - complete maps of how every output depends on every input.

## Mathematical Foundation: The Jacobian Matrix

**Definition**: For function $\mathbf{f}: \mathbb{R}^n \rightarrow \mathbb{R}^m$, the Jacobian is:

$$\mathbf{J} = \begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix}$$
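
As a small worked example (the function below is chosen purely for illustration and is not part of the classifier), take $\mathbf{f}: \mathbb{R}^2 \rightarrow \mathbb{R}^2$ and differentiate each output with respect to each input:

$$\mathbf{f}(x_1, x_2) = \begin{bmatrix} x_1^2 x_2 \\ 5x_1 + \sin x_2 \end{bmatrix}
\quad\Longrightarrow\quad
\mathbf{J}(x_1, x_2) = \begin{bmatrix} 2x_1 x_2 & x_1^2 \\ 5 & \cos x_2 \end{bmatrix}$$

At the point $(x_1, x_2) = (1, 0)$ this evaluates to

$$\mathbf{J}(1, 0) = \begin{bmatrix} 0 & 1 \\ 5 & 1 \end{bmatrix}$$

so, to first order, the first output responds only to $x_2$ at that point, while the second output responds five times more strongly to $x_1$ than to $x_2$.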

**Key Properties**:
- **Size**: $m \times n$ matrix (outputs × inputs)
- **Linear approximation**: $\mathbf{f}(\mathbf{x} + \boldsymbol{\epsilon}) \approx \mathbf{f}(\mathbf{x}) + \mathbf{J}\boldsymbol{\epsilon}$
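- **Chain rule**: $\mathbf{J}_{f \circ g}(\mathbf{x}) = \mathbf{J}_f(\mathbf{g}(\mathbf{x}))\,\mathbf{J}_g(\mathbf{x})$ (composing transformations multiplies their Jacobians)
- **Singular values and eigenvalues**: Singular values describe conditioning for any Jacobian; eigenvalues are defined only when $m = n$ and describe the stability of repeated application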
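
The linear-approximation and chain-rule properties can be checked numerically. The sketch below uses two hypothetical toy maps, `g` and `f`, with hand-derived Jacobians `J_g` and `J_f`. None of these names come from the notebook's utilities; they exist only to illustrate the two properties.

```python
import numpy as np

# Hypothetical toy maps, chosen only to illustrate the two properties.
def g(x):
    """g: R^2 -> R^3"""
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def J_g(x):
    """Hand-derived 3x2 Jacobian of g."""
    return np.array([[x[1], x[0]],
                     [np.cos(x[0]), 0.0],
                     [0.0, 2.0 * x[1]]])

def f(u):
    """f: R^3 -> R^2"""
    return np.array([u[0] + u[1] * u[2], np.exp(u[0])])

def J_f(u):
    """Hand-derived 2x3 Jacobian of f."""
    return np.array([[1.0, u[2], u[1]],
                     [np.exp(u[0]), 0.0, 0.0]])

x0 = np.array([0.5, -1.0])
eps = np.array([1e-3, -2e-3])   # small input perturbation

# Chain rule: J_f is evaluated at g(x0), J_g at x0.
J_comp = J_f(g(x0)) @ J_g(x0)   # shape (2, 2)

# Linear approximation: f(g(x0 + eps)) ≈ f(g(x0)) + J_comp @ eps
exact = f(g(x0 + eps))
linear = f(g(x0)) + J_comp @ eps
lin_error = np.max(np.abs(exact - linear))

print("Composite Jacobian shape:", J_comp.shape)
print("Linearization error:", lin_error)
```

The linearization error shrinks roughly quadratically as `eps` shrinks, which is what makes the Jacobian a useful first-order model of the network around a given input.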

**Applications in ML**:
1. **Sensitivity Analysis**: How robust are predictions to input noise?
2. **Feature Importance**: Which inputs most affect which outputs?
3. **Optimization Conditioning**: Is the loss landscape well-behaved?
4. **Adversarial Robustness**: Can small input changes fool the model?

## Why Jacobians Matter for "Go Dolphins!"

Understanding the Jacobian of our sentiment classifier reveals:
- **Input sensitivity**: How much does each word feature affect the prediction?
- **Layer interactions**: How do hidden representations depend on inputs?
- **Robustness**: Would small changes in tweet features change the classification?
- **Interpretability**: Which aspects of the input drive the decision?

Let's dive into the mathematics!

In [None]:
# Setup for Jacobian analysis
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import svd, eigvals
import seaborn as sns

# Import our utilities
import sys
sys.path.append('./utils')
from data_generators import load_sports_dataset

# Load our "Go Dolphins!" dataset
features, labels, feature_names, texts = load_sports_dataset()

print("JACOBIAN ANALYSIS OF 'GO DOLPHINS!' SENTIMENT CLASSIFIER")
print("=" * 60)
print(f"Dataset: {len(texts)} sports tweets")
print(f"Features: {feature_names}")
print()
print("Jacobian matrices we'll compute:")
print("• ∂(output)/∂(input): How predictions depend on tweet features")
print("• ∂(hidden)/∂(input): How internal representations depend on inputs")
print("• ∂(output)/∂(weights): How predictions depend on parameters")
print("• ∂(loss)/∂(weights): Complete parameter sensitivity (gradient)")
print()
print("Analysis focus: Sensitivity, robustness, and conditioning")

# Define our network architecture for Jacobian analysis
def sigmoid(x):
    """Sigmoid activation with numerical stability"""
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

def sigmoid_derivative(x):
    """Derivative of sigmoid"""
    s = sigmoid(x)
    return s * (1 - s)

class JacobianAnalysisNetwork:
    """Network designed for comprehensive Jacobian analysis."""
    
    def __init__(self, layer_sizes, seed=42):
        np.random.seed(seed)
        self.layer_sizes = layer_sizes
        self.num_layers = len(layer_sizes) - 1
        
        # Initialize weights with careful scaling
        self.weights = []
        self.biases = []
        
        for i in range(self.num_layers):
            # Xavier initialization
            fan_in, fan_out = layer_sizes[i], layer_sizes[i+1]
            scale = np.sqrt(2.0 / (fan_in + fan_out))
            
            w = np.random.normal(0, scale, (fan_in, fan_out))
            b = np.zeros(fan_out)
            
            self.weights.append(w)
            self.biases.append(b)
        
        print(f"Initialized network: {layer_sizes}")
        print(f"Total parameters: {sum(w.size + b.size for w, b in zip(self.weights, self.biases))}")
    
    def forward_detailed(self, x):
        """Forward pass storing all intermediate values for Jacobian computation."""
        activations = [x]  # a^(0) = x
        z_values = []      # z^(l) = a^(l-1) W^(l) + b^(l)  (row-vector convention, matching np.dot below)
        
        current_activation = x
        
        for i in range(self.num_layers):
            # Linear transformation
            z = np.dot(current_activation, self.weights[i]) + self.biases[i]
            z_values.append(z)
            
            # Nonlinear activation
            a = sigmoid(z)
            activations.append(a)
            
            current_activation = a
        
        return {
            'output': current_activation,
            'activations': activations,
            'z_values': z_values
        }

# Create a test network
net = JacobianAnalysisNetwork([3, 4, 2, 1])  # Input → Hidden → Hidden → Output

# Test forward pass
test_input = features[0]  # "Go Dolphins!" features
result = net.forward_detailed(test_input)

print(f"\nTest forward pass:")
print(f"Input: {test_input}")
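# The output is a length-1 array, so extract the scalar before applying a float format spec
print(f"Output: {float(np.ravel(result['output'])[0]):.6f}")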
print(f"Hidden layer activations:")
for i, activation in enumerate(result['activations'][1:-1], 1):
    print(f"  Layer {i}: {activation}")

print("\n✅ Jacobian analysis network ready!")

## Task 1: Jacobian Fundamentals

Let's start by computing basic Jacobian matrices to understand the mathematical structure.
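
Before relying on finite differences, it helps to see how the step size `h` affects accuracy. For a central difference the truncation error shrinks like `h**2`, while floating-point round-off grows like `1/h`, so a very small step is not automatically better. The sketch below is a standalone illustration on a scalar sigmoid, whose derivative is known in closed form; `central_difference` and `z0` are hypothetical names used only here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def central_difference(f, z, h):
    """Two-sided finite-difference estimate of f'(z)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

z0 = 0.7
exact = sigmoid(z0) * (1.0 - sigmoid(z0))   # closed-form derivative of sigmoid

# Compare the estimate against the exact derivative for several step sizes
errors = {}
for h in (1e-2, 1e-5, 1e-8, 1e-12):
    errors[h] = abs(central_difference(sigmoid, z0, h) - exact)
    print(f"h = {h:.0e}   abs error = {errors[h]:.2e}")
```

In double precision the error is typically smallest for a moderate step (around `1e-5` to `1e-6`) and grows again as `h` approaches machine precision.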

In [None]:
# TODO: Implement basic Jacobian computation
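# h=1e-5 balances truncation error (O(h^2)) against floating-point round-off (O(1/h))
def compute_output_input_jacobian(network, x, h=1e-5):
    """
    Compute Jacobian matrix ∂(output)/∂(input) using central finite differences.
    
    Returns:
        jacobian: Matrix of size (output_dim, input_dim)
    """
    # Cast to float so that adding h is not truncated when features are integers
    x = np.asarray(x, dtype=float)
    
    # Get dimensions
    input_dim = len(x)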
    
    # Get baseline output
    baseline = network.forward_detailed(x)['output']
    output_dim = 1 if np.isscalar(baseline) else len(baseline)
    
    # Initialize Jacobian
    jacobian = np.zeros((output_dim, input_dim))
    
    # Compute partial derivatives
    for i in range(input_dim):
        # Perturb input
        x_plus = x.copy()
        x_minus = x.copy()
        x_plus[i] += h
        x_minus[i] -= h
        
        # Compute outputs
        output_plus = network.forward_detailed(x_plus)['output']
        output_minus = network.forward_detailed(x_minus)['output']
        
        # Central finite difference; ravel handles both scalar and vector outputs
        jacobian[:, i] = np.ravel(output_plus - output_minus) / (2 * h)
    
    return jacobian

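# h=1e-5 balances truncation error (O(h^2)) against floating-point round-off (O(1/h))
def compute_hidden_input_jacobian(network, x, layer_idx, h=1e-5):
    """
    Compute Jacobian matrix ∂(hidden_layer)/∂(input) using central finite differences.
    """
    # Cast to float so that adding h is not truncated when features are integers
    x = np.asarray(x, dtype=float)
    input_dim = len(x)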
    
    # Get baseline hidden activations
    baseline_result = network.forward_detailed(x)
    baseline_hidden = baseline_result['activations'][layer_idx + 1]  # +1 because activations[0] is input
    hidden_dim = len(baseline_hidden) if hasattr(baseline_hidden, '__len__') else 1
    
    # Initialize Jacobian
    jacobian = np.zeros((hidden_dim, input_dim))
    
    # Compute partial derivatives
    for i in range(input_dim):
        x_plus = x.copy()
        x_minus = x.copy()
        x_plus[i] += h
        x_minus[i] -= h
        
        hidden_plus = network.forward_detailed(x_plus)['activations'][layer_idx + 1]
        hidden_minus = network.forward_detailed(x_minus)['activations'][layer_idx + 1]
        
        # Central finite difference; ravel handles both single-unit and multi-unit layers
        jacobian[:, i] = np.ravel(hidden_plus - hidden_minus) / (2 * h)
    
    return jacobian

print("COMPUTING JACOBIAN MATRICES FOR 'GO DOLPHINS!'")
print("=" * 50)

# Test input: "Go Dolphins!" features
x = features[0]  # [2, 1, 1]
print(f"Input features: {x} ({feature_names})")
print(f"Tweet: '{texts[0]}'")
print()

# Compute output-input Jacobian
print("1. Computing ∂(output)/∂(input) Jacobian...")
J_output_input = compute_output_input_jacobian(net, x)
print(f"Jacobian shape: {J_output_input.shape}")
print(f"Jacobian matrix:")
print(J_output_input)
print()

print("Interpretation:")
for i, feature_name in enumerate(feature_names):
    sensitivity = J_output_input[0, i]
    print(f"  ∂(prediction)/∂({feature_name}) = {sensitivity:.6f}")
    if abs(sensitivity) > 0.1:
        print(f"    → High sensitivity! Small changes in {feature_name} strongly affect prediction")
    elif abs(sensitivity) > 0.01:
        print(f"    → Moderate sensitivity")
    else:
        print(f"    → Low sensitivity")
print()

# Compute hidden layer Jacobians
print("2. Computing ∂(hidden_layers)/∂(input) Jacobians...")
hidden_jacobians = []

for layer_idx in range(net.num_layers - 1):  # Exclude output layer
    print(f"\nHidden Layer {layer_idx + 1}:")
    J_hidden = compute_hidden_input_jacobian(net, x, layer_idx)
    hidden_jacobians.append(J_hidden)
    
    print(f"  Jacobian shape: {J_hidden.shape}")
    print(f"  Jacobian matrix:")
    print(f"  {J_hidden}")
    
    # Analyze sensitivity patterns
    max_sensitivity = np.max(np.abs(J_hidden))
    print(f"  Maximum sensitivity: {max_sensitivity:.6f}")
    
    # Find most sensitive neuron-input pairs
    if J_hidden.size > 1:
        max_idx = np.unravel_index(np.argmax(np.abs(J_hidden)), J_hidden.shape)
        neuron_idx, input_idx = max_idx
        print(f"  Most sensitive: Neuron {neuron_idx} to {feature_names[input_idx]}")
        print(f"                  (∂h{neuron_idx}/∂{feature_names[input_idx]} = {J_hidden[neuron_idx, input_idx]:.6f})")

print("\n✅ Basic Jacobian computation complete!")

In [None]:
# TODO: Analyze Jacobian properties and structure
def analyze_jacobian_properties(jacobian, name="Jacobian"):
    """
    Analyze mathematical properties of a Jacobian matrix.
    """
    print(f"\nANALYSIS: {name}")
    print("=" * (10 + len(name)))
    
    print(f"Shape: {jacobian.shape} (outputs × inputs)")
    print(f"Rank: {np.linalg.matrix_rank(jacobian)}")
    
    # Singular Value Decomposition
    if jacobian.size > 1 and min(jacobian.shape) > 1:
        U, s, Vt = svd(jacobian)
        # Guard against division by zero for rank-deficient Jacobians
        cond = s[0] / s[-1] if s[-1] > 0 else np.inf
        print(f"Singular values: {s}")
        print(f"Condition number: {cond:.2e}")
        
        if cond > 1000:
            print("⚠️  High condition number - poorly conditioned transformation")
        else:
            print("✅ Well-conditioned transformation")
    
    # Frobenius norm (overall sensitivity)
    frobenius_norm = np.linalg.norm(jacobian, 'fro')
    print(f"Frobenius norm: {frobenius_norm:.6f}")
    # Frobenius norm divided by sqrt(number of entries) is the root-mean-square entry
    print(f"RMS sensitivity: {frobenius_norm / np.sqrt(jacobian.size):.6f}")
    
    # Element statistics
    flat_jacobian = jacobian.flatten()
    print(f"Min element: {np.min(flat_jacobian):.6f}")
    print(f"Max element: {np.max(flat_jacobian):.6f}")
    print(f"Mean absolute: {np.mean(np.abs(flat_jacobian)):.6f}")
    print(f"Std deviation: {np.std(flat_jacobian):.6f}")
    
    return {
        'frobenius_norm': frobenius_norm,
        'rank': np.linalg.matrix_rank(jacobian),
        'mean_abs': np.mean(np.abs(flat_jacobian)),
        'max_abs': np.max(np.abs(flat_jacobian))
    }

# Analyze all computed Jacobians
print("JACOBIAN PROPERTY ANALYSIS")
print("=" * 30)

# Output-input Jacobian analysis
output_props = analyze_jacobian_properties(J_output_input, "∂(output)/∂(input)")

# Hidden layer Jacobian analysis
hidden_props = []
for i, J_hidden in enumerate(hidden_jacobians):
    props = analyze_jacobian_properties(J_hidden, f"∂(hidden_layer_{i+1})/∂(input)")
    hidden_props.append(props)

# Compare sensitivity across layers
print("\nSENSITIVITY COMPARISON ACROSS LAYERS")
print("=" * 40)

print(f"{'Layer':<15} | {'Frobenius Norm':<15} | {'Max Sensitivity':<15} | {'Mean |Sensitivity|':<18}")
print("-" * 75)

print(f"{'Output':<15} | {output_props['frobenius_norm']:<15.6f} | {output_props['max_abs']:<15.6f} | {output_props['mean_abs']:<18.6f}")

for i, props in enumerate(hidden_props):
    layer_name = f"Hidden {i+1}"
    print(f"{layer_name:<15} | {props['frobenius_norm']:<15.6f} | {props['max_abs']:<15.6f} | {props['mean_abs']:<18.6f}")

print("\nKey Insights:")
print("• Frobenius norm measures overall network sensitivity")
print("• Max sensitivity shows most responsive input-output pairs")
print("• Layer comparison reveals sensitivity patterns through depth")
print("• Higher values indicate more sensitive/responsive transformations")

print("\n✅ Jacobian property analysis complete!")

## Task 2: Network Jacobian Computation

Now let's compute Jacobians efficiently using analytical methods and understand their structure across different network architectures.
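
With the row-vector convention used in `forward_detailed` (`z = a @ W + b`), each sigmoid layer contributes a factor `diag(σ'(z)) @ W.T`, and the chain rule multiplies these factors from the output back to the input. The sketch below illustrates this on a hypothetical 3 → 4 → 1 network with random weights (`W1`, `b1`, `W2`, `b2` are made up for the example, not the notebook's trained parameters) and checks the result against central differences.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Hypothetical 3 -> 4 -> 1 network; weights stored as (fan_in, fan_out).
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

def forward(x):
    a1 = sigmoid(x @ W1 + b1)
    return sigmoid(a1 @ W2 + b2)

def analytical_jacobian(x):
    """Chain rule: d(output)/d(input) = J2 @ J1."""
    a1 = sigmoid(x @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    J1 = np.diag(a1 * (1 - a1)) @ W1.T   # d a1 / d x,  shape (4, 3)
    J2 = np.diag(a2 * (1 - a2)) @ W2.T   # d a2 / d a1, shape (1, 4)
    return J2 @ J1                       # shape (1, 3)

def numerical_jacobian(x, h=1e-5):
    """Central-difference reference used only for verification."""
    J = np.zeros((1, x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        J[:, i] = (forward(x + step) - forward(x - step)) / (2 * h)
    return J

x = np.array([2.0, 1.0, 1.0])
J_a = analytical_jacobian(x)
J_n = numerical_jacobian(x)
max_diff = np.max(np.abs(J_a - J_n))

print("Analytical Jacobian:", J_a)
print("Max difference vs. finite differences:", max_diff)
```

The analytical version needs one forward pass and a few small matrix products, whereas finite differences need two forward passes per input dimension.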

In [None]:
# TODO: Implement analytical Jacobian computation
def compute_analytical_jacobians(network, x):
    """
    Compute Jacobians analytically using chain rule.
    More efficient and accurate than finite differences.
    """
    # Forward pass to get all intermediate values
    result = network.forward_detailed(x)
    activations = result['activations']
    z_values = result['z_values']
    
    jacobians = {}
    
    # 1. Compute ∂(output)/∂(input) analytically
    # This requires propagating derivatives backward through all layers
    
    # Start with identity for output layer: ∂(output)/∂(output) = I
    # (sized from the architecture so multi-output networks also work)
    current_jacobian = np.eye(network.layer_sizes[-1])
    
    # Work backwards through layers
    for layer_idx in reversed(range(network.num_layers)):
        # Get activation derivative for this layer
        z = z_values[layer_idx]
        activation_deriv = sigmoid_derivative(z)
        
        # Create diagonal matrix of activation derivatives
        if np.isscalar(activation_deriv):
            D_activation = np.array([[activation_deriv]])
        else:
            D_activation = np.diag(activation_deriv)
        
        # Get weight matrix for this layer
        W = network.weights[layer_idx]
        
        # Apply chain rule: J_new = J_current × D_activation × W^T
        current_jacobian = current_jacobian @ D_activation @ W.T
        
        # Store Jacobian for this layer's input (which is previous layer's output)
        if layer_idx == 0:
            jacobians['output_to_input'] = current_jacobian
        else:
            jacobians[f'output_to_layer_{layer_idx}'] = current_jacobian
    
    # 2. Compute ∂(hidden_layer_k)/∂(input) for each hidden layer
    for hidden_layer in range(network.num_layers - 1):
        # Start from hidden layer output
        hidden_size = len(activations[hidden_layer + 1])
        if hidden_size == 1:
            layer_jacobian = np.array([[1.0]])
        else:
            layer_jacobian = np.eye(hidden_size)
        
        # Work backwards to input
        for layer_idx in reversed(range(hidden_layer + 1)):
            z = z_values[layer_idx]
            activation_deriv = sigmoid_derivative(z)
            
            if np.isscalar(activation_deriv):
                D_activation = np.array([[activation_deriv]])
            else:
                D_activation = np.diag(activation_deriv)
            
            W = network.weights[layer_idx]
            layer_jacobian = layer_jacobian @ D_activation @ W.T
        
        jacobians[f'hidden_{hidden_layer+1}_to_input'] = layer_jacobian
    
    return jacobians

print("ANALYTICAL JACOBIAN COMPUTATION")
print("=" * 35)

# Compute analytical Jacobians
analytical_jacobians = compute_analytical_jacobians(net, x)

print("Computed Jacobians:")
for name, jacobian in analytical_jacobians.items():
    print(f"  {name}: shape {jacobian.shape}")

# Compare analytical vs numerical (for verification)
print("\nVERIFICATION: Analytical vs Numerical")
print("=" * 40)

# Compare output-to-input Jacobian
analytical_output_input = analytical_jacobians['output_to_input']
numerical_output_input = J_output_input

print(f"Output-to-input Jacobian comparison:")
print(f"Analytical: {analytical_output_input}")
print(f"Numerical:  {numerical_output_input}")
print(f"Max difference: {np.max(np.abs(analytical_output_input - numerical_output_input)):.2e}")

max_diff = np.max(np.abs(analytical_output_input - numerical_output_input))
if max_diff < 1e-6:
    print("✅ Analytical and numerical Jacobians agree to within 1e-6")
elif max_diff < 1e-4:
    print("✅ Analytical and numerical Jacobians agree to within 1e-4")
else:
    print("⚠️  Jacobians don't match - check implementation")

print("\n✅ Analytical Jacobian comparison complete!")

In [None]:
# TODO: Visualize Jacobian structure across different inputs
def visualize_jacobian_heatmaps(network, test_inputs, test_labels, test_texts):
    """
    Visualize Jacobian matrices as heatmaps for different inputs.
    """
    print("JACOBIAN VISUALIZATION ACROSS DIFFERENT INPUTS")
    print("=" * 50)
    
    # Select a few representative examples
    indices = [0, 1, 4, 7]  # Different sentiment examples
    
    fig, axes = plt.subplots(len(indices), 2, figsize=(12, 3*len(indices)))
    if len(indices) == 1:
        axes = axes.reshape(1, -1)
    
    jacobian_data = []
    
    for i, idx in enumerate(indices):
        x_test = test_inputs[idx]
        y_test = test_labels[idx]
        text = test_texts[idx]
        
        # Compute Jacobians for this input
        jacobians = compute_analytical_jacobians(network, x_test)
        
        # Store for analysis
        jacobian_data.append({
            'input': x_test,
            'label': y_test,
            'text': text,
            'jacobians': jacobians
        })
        
        # Plot output-to-input Jacobian
        J_output = jacobians['output_to_input']
        
        # Heatmap 1: Output-to-input sensitivity
        sns.heatmap(J_output, annot=True, fmt='.4f', 
                   xticklabels=feature_names, yticklabels=['Output'],
                   cmap='RdBu_r', center=0, ax=axes[i, 0])
        axes[i, 0].set_title(f'∂(output)/∂(input)\n"{text[:20]}..." (Label: {y_test})')
        
        # Heatmap 2: First hidden layer sensitivity
        if 'hidden_1_to_input' in jacobians:
            J_hidden = jacobians['hidden_1_to_input']
            
            sns.heatmap(J_hidden, annot=True, fmt='.3f',
                       xticklabels=feature_names, 
                       yticklabels=[f'H1_{j}' for j in range(J_hidden.shape[0])],
                       cmap='RdBu_r', center=0, ax=axes[i, 1])
            axes[i, 1].set_title(f'∂(hidden_1)/∂(input)\n"{text[:20]}..."')
        else:
            axes[i, 1].text(0.5, 0.5, 'No hidden layer', ha='center', va='center', 
                           transform=axes[i, 1].transAxes)
            axes[i, 1].set_title('No hidden layer')
    
    plt.tight_layout()
    plt.show()
    
    return jacobian_data

# Visualize Jacobians for different tweet examples
jacobian_examples = visualize_jacobian_heatmaps(net, features, labels, texts)

# Analyze patterns across examples
print("\nPATTERN ANALYSIS ACROSS EXAMPLES")
print("=" * 40)

for i, data in enumerate(jacobian_examples):
    text = data['text']
    label = data['label']
    J_output = data['jacobians']['output_to_input']
    
    print(f"\nExample {i+1}: \"{text}\" (Label: {label})")
    print(f"Input features: {data['input']}")
    print(f"Sensitivities: {J_output.flatten()}")
    
    # Find most influential feature
    max_sensitivity_idx = np.argmax(np.abs(J_output.flatten()))
    max_sensitivity = J_output.flatten()[max_sensitivity_idx]
    most_influential = feature_names[max_sensitivity_idx]
    
    print(f"Most influential feature: {most_influential} (sensitivity: {max_sensitivity:.6f})")
    
    # Interpret the sensitivity
    if abs(max_sensitivity) > 0.1:
        direction = "increase" if max_sensitivity > 0 else "decrease"
        print(f"  → Small increases in {most_influential} will {direction} prediction confidence")

print("\n✅ Jacobian visualization and analysis complete!")

## Task 3: Eigenvalue Analysis

Let's analyze the eigenvalues and eigenvectors of our Jacobian matrices to understand stability and conditioning properties.
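
Most Jacobians in this network are rectangular, so they have no eigenvalues of their own. The quantity that plays the same role is the set of singular values, which are the square roots of the eigenvalues of $\mathbf{J}^T\mathbf{J}$. A minimal sketch of that relationship, using a random 4×3 matrix as a hypothetical stand-in for a hidden-layer Jacobian:

```python
import numpy as np

rng = np.random.default_rng(1)
J = rng.normal(size=(4, 3))   # hypothetical stand-in for ∂(hidden)/∂(input)

# Singular values, returned in descending order
singular_values = np.linalg.svd(J, compute_uv=False)

# Eigenvalues of the symmetric matrix J^T J; eigvalsh returns them ascending
eig_JTJ = np.linalg.eigvalsh(J.T @ J)[::-1]

# Condition number and spectral norm follow directly from the singular values
cond = singular_values[0] / singular_values[-1]
spectral_norm = np.linalg.norm(J, 2)

print("Singular values:        ", singular_values)
print("sqrt(eig(J^T J)):       ", np.sqrt(eig_JTJ))
print("Condition number:       ", cond)
print("Spectral norm (σ_max):  ", spectral_norm)
```

Using `eigvalsh` rather than `eig` for symmetric matrices such as $\mathbf{J}^T\mathbf{J}$ guarantees real eigenvalues in sorted order.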

In [None]:
# TODO: Perform eigenvalue analysis of Jacobian matrices
def analyze_jacobian_eigenvalues(jacobian, name="Jacobian"):
    """
    Analyze eigenvalues and eigenvectors of a Jacobian matrix.
    """
    print(f"\nEIGENVALUE ANALYSIS: {name}")
    print("=" * (20 + len(name)))
    
    # For non-square matrices, analyze J @ J^T and J^T @ J
    if jacobian.shape[0] != jacobian.shape[1]:
        print(f"Non-square matrix {jacobian.shape} - analyzing related square matrices")
        
        # J @ J^T (output space analysis)
        # eigh is the right solver for symmetric matrices: real eigenvalues, ascending order
        JJT = jacobian @ jacobian.T
        eigenvals_output, eigenvecs_output = np.linalg.eigh(JJT)
        
        # J^T @ J (input space analysis)
        JTJ = jacobian.T @ jacobian
        eigenvals_input, eigenvecs_input = np.linalg.eigh(JTJ)
        
        print(f"Output space eigenvalues (J @ J^T): {eigenvals_output}")
        print(f"Input space eigenvalues (J^T @ J): {eigenvals_input}")
        
        # Singular values (more fundamental for rectangular matrices)
        U, s, Vt = svd(jacobian)
        print(f"Singular values: {s}")
        print(f"Condition number: {s[0]/s[-1]:.2e}")
        
        # Principal directions
        print(f"\nPrincipal input directions (V):")
        V = Vt.T
        for i, (sv, direction) in enumerate(zip(s, V.T)):
            print(f"  Direction {i+1} (σ={sv:.4f}): {direction}")
            
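            # Interpret in terms of input components. Feature names only apply when
            # the Jacobian's input space is the original feature space; Jacobians
            # taken with respect to a hidden layer are labeled by unit index instead.
            max_component_idx = np.argmax(np.abs(direction))
            max_component = direction[max_component_idx]
            if jacobian.shape[1] == len(feature_names):
                dominant_feature = feature_names[max_component_idx]
            else:
                dominant_feature = f"hidden unit {max_component_idx}"
            print(f"    Dominated by: {dominant_feature} (weight: {max_component:.4f})")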
        
        return {
            'singular_values': s,
            'condition_number': s[0]/s[-1] if len(s) > 1 else 1.0,
            'rank': np.sum(s > 1e-10),
            'principal_directions': V
        }
    
    else:
        # Square matrix - direct eigenvalue analysis
        eigenvals, eigenvecs = np.linalg.eig(jacobian)
        
        print(f"Eigenvalues: {eigenvals}")
        print(f"Spectral radius: {np.max(np.abs(eigenvals)):.6f}")
        
        # Stability analysis
        max_eigenval = np.max(np.abs(eigenvals))
        if max_eigenval > 1.0:
            print("⚠️  Spectral radius > 1: Potentially unstable dynamics")
        else:
            print("✅ Spectral radius ≤ 1: Stable dynamics")
        
        return {
            'eigenvalues': eigenvals,
            'spectral_radius': max_eigenval,
            'eigenvectors': eigenvecs
        }

# Analyze eigenvalues for our "Go Dolphins!" example
print("EIGENVALUE ANALYSIS FOR 'GO DOLPHINS!' JACOBIANS")
print("=" * 55)

x = features[0]  # "Go Dolphins!" example
jacobians = compute_analytical_jacobians(net, x)

eigenvalue_results = {}

# Analyze each Jacobian
for name, jacobian in jacobians.items():
    result = analyze_jacobian_eigenvalues(jacobian, name)
    eigenvalue_results[name] = result

print("\n✅ Eigenvalue analysis complete!")

In [None]:
# TODO: Compare eigenvalue spectra across different network architectures
def compare_network_architectures():
    """
    Compare Jacobian eigenvalue properties across different network architectures.
    """
    print("\nCOMPARING JACOBIAN PROPERTIES ACROSS ARCHITECTURES")
    print("=" * 55)
    
    # Define different architectures to test
    architectures = [
        ([3, 1], "Single Layer"),
        ([3, 4, 1], "Two Layer"),
        ([3, 6, 3, 1], "Three Layer"),
        ([3, 8, 4, 2, 1], "Four Layer (Deep)")
    ]
    
    architecture_results = []
    
    x = features[0]  # Test on "Go Dolphins!"
    
    for arch, name in architectures:
        print(f"\n{name} Network: {arch}")
        print("-" * 30)
        
        # Create network
        test_net = JacobianAnalysisNetwork(arch, seed=42)
        
        # Compute Jacobians
        jacobians = compute_analytical_jacobians(test_net, x)
        
        # Analyze output-to-input Jacobian
        J_output_input = jacobians['output_to_input']
        
        # Get singular values for conditioning analysis
        U, s, Vt = svd(J_output_input)
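        # A 1×n Jacobian has a single singular value, so its condition number is 1
        # (not infinity); only a vanishing smallest singular value gives np.inf
        condition_number = s[0] / s[-1] if s[-1] > 1e-12 else np.inf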
        
        # Calculate sensitivity metrics
        frobenius_norm = np.linalg.norm(J_output_input, 'fro')
        max_sensitivity = np.max(np.abs(J_output_input))
        mean_sensitivity = np.mean(np.abs(J_output_input))
        
        # Store results
        result = {
            'name': name,
            'architecture': arch,
            'num_parameters': sum(w.size + b.size for w, b in zip(test_net.weights, test_net.biases)),
            'condition_number': condition_number,
            'frobenius_norm': frobenius_norm,
            'max_sensitivity': max_sensitivity,
            'mean_sensitivity': mean_sensitivity,
            'singular_values': s
        }
        architecture_results.append(result)
        
        print(f"Parameters: {result['num_parameters']}")
        print(f"Condition number: {condition_number:.2e}")
        print(f"Frobenius norm: {frobenius_norm:.6f}")
        print(f"Max sensitivity: {max_sensitivity:.6f}")
        print(f"Singular values: {s}")
    
    # Create comparison visualization
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    names = [r['name'] for r in architecture_results]
    
    # Plot 1: Condition numbers
    condition_numbers = [r['condition_number'] for r in architecture_results]
    bars1 = axes[0, 0].bar(names, condition_numbers)
    axes[0, 0].set_ylabel('Condition Number')
    axes[0, 0].set_title('Jacobian Conditioning by Architecture')
    axes[0, 0].set_yscale('log')
    axes[0, 0].tick_params(axis='x', rotation=45)
    
    # Color bars by conditioning quality
    for bar, cond_num in zip(bars1, condition_numbers):
        if cond_num > 1000:
            bar.set_color('red')
        elif cond_num > 100:
            bar.set_color('orange')
        else:
            bar.set_color('green')
    
    # Plot 2: Overall sensitivity (Frobenius norm)
    frobenius_norms = [r['frobenius_norm'] for r in architecture_results]
    axes[0, 1].bar(names, frobenius_norms, color='skyblue')
    axes[0, 1].set_ylabel('Frobenius Norm')
    axes[0, 1].set_title('Overall Sensitivity by Architecture')
    axes[0, 1].tick_params(axis='x', rotation=45)
    
    # Plot 3: Maximum sensitivity
    max_sensitivities = [r['max_sensitivity'] for r in architecture_results]
    axes[1, 0].bar(names, max_sensitivities, color='lightcoral')
    axes[1, 0].set_ylabel('Max Sensitivity')
    axes[1, 0].set_title('Peak Sensitivity by Architecture')
    axes[1, 0].tick_params(axis='x', rotation=45)
    
    # Plot 4: Number of parameters vs sensitivity
    num_params = [r['num_parameters'] for r in architecture_results]
    axes[1, 1].scatter(num_params, frobenius_norms, s=100, alpha=0.7)
    for i, name in enumerate(names):
        axes[1, 1].annotate(name, (num_params[i], frobenius_norms[i]), 
                           xytext=(5, 5), textcoords='offset points')
    axes[1, 1].set_xlabel('Number of Parameters')
    axes[1, 1].set_ylabel('Frobenius Norm')
    axes[1, 1].set_title('Parameters vs Sensitivity')
    
    plt.tight_layout()
    plt.show()
    
    return architecture_results

arch_comparison = compare_network_architectures()

print("\n✅ Architecture comparison complete!")

## Task 4: Robustness Assessment

Finally, let's use Jacobian analysis to assess how robust our "Go Dolphins!" classifier is to input perturbations and noise.
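
The idea behind a Jacobian-based robustness estimate is the first-order relation $\Delta y \approx \mathbf{J}\boldsymbol{\epsilon}$, which gives the bound $|\Delta y| \le \sigma_{\max}(\mathbf{J})\,\|\boldsymbol{\epsilon}\|_2$. The sketch below checks both on a hypothetical single-layer logistic model; the weights `w`, bias `b`, and noise level `sigma` are made-up values for illustration, not the notebook's network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical single-layer model y = sigmoid(w·x + b)
w = np.array([0.8, -0.5, 0.3])
b = 0.1

def predict(x):
    return sigmoid(x @ w + b)

x = np.array([2.0, 1.0, 1.0])
y = predict(x)
J = (y * (1 - y) * w).reshape(1, -1)   # 1x3 Jacobian dy/dx

rng = np.random.default_rng(0)
sigma = 0.01                            # per-component noise std (assumed)
eps = rng.normal(0.0, sigma, size=(1000, 3))

actual = np.abs(predict(x + eps) - y)               # true output change
first_order = np.abs(eps @ J.ravel())               # |J ε|
bound = np.linalg.norm(J, 2) * np.linalg.norm(eps, axis=1)   # σ_max · ||ε||

worst_gap = np.max(np.abs(actual - first_order))
print("Mean actual change:      ", actual.mean())
print("Mean first-order change: ", first_order.mean())
print("Mean upper bound:        ", bound.mean())
print("Worst first-order gap:   ", worst_gap)
```

For a single output, the spectral norm of the Jacobian equals its Frobenius norm, so either can serve as the sensitivity scale. The bound is attained only when the perturbation is aligned with the gradient direction, so random perturbations typically change the output by less than the bound suggests.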

In [None]:
# TODO: Assess model robustness using Jacobian analysis
def assess_input_robustness(network, test_inputs, test_labels, test_texts):
    """
    Assess model robustness to input perturbations using Jacobian analysis.
    """
    print("ROBUSTNESS ASSESSMENT USING JACOBIAN ANALYSIS")
    print("=" * 50)
    
    robustness_results = []
    
    for i, (x, y, text) in enumerate(zip(test_inputs, test_labels, test_texts)):
        # Compute Jacobian for this input
        jacobians = compute_analytical_jacobians(network, x)
        J = jacobians['output_to_input']
        
        # Original prediction
        original_output = network.forward_detailed(x)['output']
        
        # Robustness metrics
        frobenius_norm = np.linalg.norm(J, 'fro')
        max_sensitivity = np.max(np.abs(J))
        
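        # Estimate robustness to small perturbations
        # First-order bound: |Δoutput| ≤ ||J||₂ * ||ε||₂ (for a single output, ||J||₂ equals the Frobenius norm)
        epsilon_magnitude = 0.1  # Per-component standard deviation of the perturbation
        # Scale estimate only: ||ε||₂ averages about epsilon_magnitude * sqrt(input_dim)
        estimated_output_change = frobenius_norm * epsilon_magnitude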
        
        # Test actual robustness with random perturbations
        num_tests = 50
        actual_changes = []
        
        for _ in range(num_tests):
            # Random perturbation
            perturbation = np.random.normal(0, epsilon_magnitude, size=x.shape)
            perturbed_input = x + perturbation
            
            # Perturbed output
            perturbed_output = network.forward_detailed(perturbed_input)['output']
            actual_change = abs(perturbed_output - original_output)
            actual_changes.append(actual_change)
        
        mean_actual_change = np.mean(actual_changes)
        std_actual_change = np.std(actual_changes)
        
        # Store results
        result = {
            'index': i,
            'text': text,
            'label': y,
            'input': x,
            'original_output': original_output,
            'frobenius_norm': frobenius_norm,
            'max_sensitivity': max_sensitivity,
            'estimated_change': estimated_output_change,
            'mean_actual_change': mean_actual_change,
            'std_actual_change': std_actual_change,
            'jacobian': J
        }
        robustness_results.append(result)
    
    return robustness_results

def analyze_robustness_results(robustness_results):
    """
    Analyze and visualize robustness assessment results.
    """
    print("\nROBUSTNESS ANALYSIS RESULTS")
    print("=" * 35)
    
    # Summary statistics
    frobenius_norms = [r['frobenius_norm'] for r in robustness_results]
    max_sensitivities = [r['max_sensitivity'] for r in robustness_results]
    estimated_changes = [r['estimated_change'] for r in robustness_results]
    actual_changes = [r['mean_actual_change'] for r in robustness_results]
    
    print(f"Average Frobenius norm: {np.mean(frobenius_norms):.6f} ± {np.std(frobenius_norms):.6f}")
    print(f"Average max sensitivity: {np.mean(max_sensitivities):.6f} ± {np.std(max_sensitivities):.6f}")
    print(f"Average estimated change: {np.mean(estimated_changes):.6f} ± {np.std(estimated_changes):.6f}")
    print(f"Average actual change: {np.mean(actual_changes):.6f} ± {np.std(actual_changes):.6f}")
    
    # Correlation between Jacobian-based prediction and actual robustness
    correlation = np.corrcoef(estimated_changes, actual_changes)[0, 1]
    print(f"\nCorrelation (estimated vs actual): {correlation:.4f}")
    
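    if np.isnan(correlation):
        # np.corrcoef returns NaN when either series has zero variance
        print("⚠️  Correlation undefined: estimated or actual changes have zero variance")
    elif correlation > 0.7: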
        print("✅ Strong correlation: Jacobian analysis is a good robustness predictor")
    elif correlation > 0.3:
        print("⚠️  Moderate correlation: Jacobian gives useful but imperfect robustness estimates")
    else:
        print("❌ Weak correlation: Jacobian-based estimates don't match actual robustness")
    
    # Detailed results for each example
    print("\nDETAILED ROBUSTNESS BY EXAMPLE:")
    print("-" * 40)
    print(f"{'Ex':<3} | {'Text':<20} | {'Label':<5} | {'Output':<8} | {'Frobenius':<10} | {'Est.Change':<10} | {'Act.Change':<10}")
    print("-" * 85)
    
    for r in robustness_results:
        text_short = (r['text'][:18] + '..') if len(r['text']) > 20 else r['text']
        print(f"{r['index']:<3} | {text_short:<20} | {r['label']:<5} | {r['original_output']:<8.4f} | "
              f"{r['frobenius_norm']:<10.6f} | {r['estimated_change']:<10.6f} | {r['mean_actual_change']:<10.6f}")
    
    # Visualization
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # Plot 1: Estimated vs Actual robustness
    axes[0, 0].scatter(estimated_changes, actual_changes, alpha=0.7, s=80)
    # y = x reference line: points on it mean the estimate matches the measurement
    diagonal_max = max(max(estimated_changes), max(actual_changes))
    axes[0, 0].plot([0, diagonal_max], [0, diagonal_max], 'r--', alpha=0.5)
    axes[0, 0].set_xlabel('Estimated Change (Jacobian-based)')
    axes[0, 0].set_ylabel('Actual Change (Empirical)')
    axes[0, 0].set_title(f'Robustness Prediction\n(Correlation: {correlation:.3f})')
    axes[0, 0].grid(True, alpha=0.3)
    
    # Plot 2: Sensitivity by example
    indices = [r['index'] for r in robustness_results]
    axes[0, 1].bar(indices, frobenius_norms, alpha=0.7)
    axes[0, 1].set_xlabel('Example Index')
    axes[0, 1].set_ylabel('Frobenius Norm')
    axes[0, 1].set_title('Sensitivity by Example')
    
    # Plot 3: Robustness vs Original output
    outputs = [r['original_output'] for r in robustness_results]
    axes[1, 0].scatter(outputs, actual_changes, alpha=0.7, s=80)
    axes[1, 0].set_xlabel('Original Output')
    axes[1, 0].set_ylabel('Actual Change')
    axes[1, 0].set_title('Robustness vs Prediction Confidence')
    axes[1, 0].grid(True, alpha=0.3)
    
    # Plot 4: Distribution of sensitivities
    axes[1, 1].hist(frobenius_norms, bins=10, alpha=0.7, edgecolor='black')
    axes[1, 1].axvline(np.mean(frobenius_norms), color='red', linestyle='--', 
                      label=f'Mean: {np.mean(frobenius_norms):.4f}')
    axes[1, 1].set_xlabel('Frobenius Norm')
    axes[1, 1].set_ylabel('Frequency')
    axes[1, 1].set_title('Distribution of Sensitivities')
    axes[1, 1].legend()
    
    plt.tight_layout()
    plt.show()

# Perform robustness assessment
robustness_data = assess_input_robustness(net, features, labels, texts)
analyze_robustness_results(robustness_data)

print("\n✅ Robustness assessment complete!")
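
One caveat about the estimate used above, sketched under assumptions: if the network has a single scalar output and each perturbation component is drawn from a Gaussian with standard deviation ε, the first-order change `J @ delta` is itself Gaussian with standard deviation ε‖J‖. Its expected absolute value is then ε‖J‖·√(2/π), roughly 0.80·ε‖J‖, so `frobenius_norm * epsilon_magnitude` describes the spread of the change rather than its mean magnitude. The check below uses a made-up linear model (`w` is a hypothetical Jacobian, not the classifier's weights), for which the linear approximation is exact:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1 x 3 Jacobian of a scalar-output model (illustrative values only)
w = np.array([0.5, -1.2, 0.8])
epsilon = 0.1

# Sample Gaussian input perturbations and measure |J @ delta| for each one
deltas = rng.normal(0.0, epsilon, size=(100_000, w.size))
abs_changes = np.abs(deltas @ w)

norm_estimate = epsilon * np.linalg.norm(w)             # estimate used in the assessment
mean_abs_estimate = norm_estimate * np.sqrt(2 / np.pi)  # expected |change| under Gaussian noise

print(f"epsilon * ||J||:              {norm_estimate:.4f}")
print(f"epsilon * ||J|| * sqrt(2/pi): {mean_abs_estimate:.4f}")
print(f"empirical mean |change|:      {abs_changes.mean():.4f}")
```

Because the factor √(2/π) is the same for every example, it does not change the correlation reported above; it only shifts where points fall relative to the dashed diagonal in the first plot.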

In [None]:
# TODO: Summarize insights from Jacobian analysis
def summarize_jacobian_insights():
    """
    Summarize key insights from our comprehensive Jacobian analysis.
    """
    print("\n" + "="*60)
    print("COMPREHENSIVE JACOBIAN ANALYSIS SUMMARY")
    print("="*60)
    
    print("\n🔍 WHAT WE DISCOVERED:")
    print("-" * 25)
    
    print("\n1. SENSITIVITY STRUCTURE:")
    print("   • Output sensitivity varies significantly across input features")
    print("   • Different tweets show different sensitivity patterns")
    print("   • Hidden layers reveal intermediate feature combinations")
    
    print("\n2. MATHEMATICAL PROPERTIES:")
    print("   • Jacobian eigenvalues reveal stability characteristics")
    print("   • Condition numbers indicate optimization difficulty")
    print("   • Singular values show principal sensitivity directions")
    
    print("\n3. ROBUSTNESS INSIGHTS:")
    print("   • Frobenius norm predicts sensitivity to input noise")
    print("   • Jacobian-based estimates can track empirical robustness (see the correlation above)")
    print("   • Some examples are inherently more robust than others")
    
    print("\n4. ARCHITECTURAL EFFECTS:")
    print("   • Deeper networks can have different conditioning properties")
    print("   • Parameter count doesn't directly correlate with sensitivity")
    print("   • Architecture choice affects Jacobian spectral properties")
    
    print("\n🧮 MATHEMATICAL SIGNIFICANCE:")
    print("-" * 30)
    
    print("\n• JACOBIAN MATRICES provide complete sensitivity maps")
    print("• EIGENVALUE ANALYSIS reveals stability and conditioning")
    print("• SINGULAR VALUE DECOMPOSITION shows principal directions")
    print("• LINEAR APPROXIMATION enables robustness prediction")
    
    print("\n🎯 PRACTICAL APPLICATIONS:")
    print("-" * 25)
    
    print("\n1. MODEL INTERPRETABILITY:")
    print("   → Which input features most affect predictions?")
    print("   → How do internal representations depend on inputs?")
    
    print("\n2. ROBUSTNESS ASSESSMENT:")
    print("   → How sensitive is the model to input noise?")
    print("   → Which examples are most vulnerable to perturbations?")
    
    print("\n3. OPTIMIZATION ANALYSIS:")
    print("   → Is the loss landscape well-conditioned?")
    print("   → What are the principal optimization directions?")
    
    print("\n4. ARCHITECTURE DESIGN:")
    print("   → How does network depth affect sensitivity?")
    print("   → What architectural choices improve conditioning?")
    
    print("\n🚀 CONNECTION TO MODERN AI:")
    print("-" * 25)
    
    print("\n• ADVERSARIAL ROBUSTNESS: Jacobians reveal vulnerability to attacks")
    print("• FEATURE ATTRIBUTION: Gradients show input importance")
    print("• NETWORK PRUNING: Sensitivity analysis guides compression")
    print("• TRANSFER LEARNING: Jacobian analysis informs fine-tuning")
    
    print("\n" + "="*60)
    print("The Jacobian is the mathematical lens through which we understand")
    print("how neural networks transform information and respond to changes.")
    print("Many robustness and interpretability techniques build on this analysis.")
    print("="*60)

summarize_jacobian_insights()
print("\n✅ Jacobian analysis complete!")

## What's Next?

You've now mastered Jacobian analysis - a powerful mathematical tool for understanding neural network sensitivity and robustness! Here's what we discovered:

**🔑 Key Mathematical Insights:**
1. **Complete Sensitivity Maps** - Jacobians reveal how every output depends on every input
2. **Eigenvalue Analysis** - Spectral properties indicate stability and conditioning
3. **Robustness Prediction** - Linear approximations estimate response to perturbations
4. **Principal Directions** - SVD reveals most important input-output relationships
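
The robustness-prediction idea in point 3 can be sketched on a toy model. The two-layer tanh network, its random weights, and the `finite_difference_jacobian` helper below are all illustrative assumptions, not the classifier used earlier; the point is only that `f(x + delta) ≈ f(x) + J @ delta` holds well for small `delta`.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy network: 3 inputs -> 4 tanh hidden units -> 2 outputs
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

def f(x):
    return W2 @ np.tanh(W1 @ x)

def finite_difference_jacobian(func, x, h=1e-6):
    """Approximate J[i, j] = d func_i / d x_j with central differences."""
    x = np.asarray(x, dtype=float)
    columns = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        columns.append((func(x + step) - func(x - step)) / (2 * h))
    return np.stack(columns, axis=1)  # shape: (num_outputs, num_inputs)

x = np.array([1.0, 0.5, -0.3])
J = finite_difference_jacobian(f, x)

delta = 1e-3 * rng.normal(size=3)  # small input perturbation
predicted = J @ delta              # first-order prediction of the output change
actual = f(x + delta) - f(x)       # true output change

print("Predicted change:", predicted)
print("Actual change:   ", actual)
print("Approximation error:", np.linalg.norm(actual - predicted))
```

The error shrinks roughly quadratically as `delta` gets smaller, which is why the linear estimate is most trustworthy for small perturbations.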

**🧮 Mathematical Tools Mastered:**
- **Matrix calculus** for multi-dimensional sensitivity analysis
- **Eigenvalue decomposition** for stability assessment
- **Singular value decomposition** for finding principal sensitivity directions
- **Linear approximation theory** for robustness estimation
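
As a reminder of how the SVD exposes principal sensitivity directions, here is a small sketch on a made-up 2×3 Jacobian (the values are illustrative, not taken from the classifier). A unit-length input perturbation along the first right singular vector produces the largest possible output change, equal to the largest singular value.

```python
import numpy as np

# Hypothetical Jacobian: 2 outputs, 3 inputs (illustrative values only)
J = np.array([[2.0, 0.0, 1.0],
              [0.0, 0.5, 0.0]])

U, s, Vt = np.linalg.svd(J, full_matrices=False)

most_sensitive_direction = Vt[0]  # unit vector in input space
max_gain = np.linalg.norm(J @ most_sensitive_direction)

print("Singular values:", s)
print("Most sensitive input direction:", most_sensitive_direction)
print("Output change along it:", max_gain)  # equals s[0]
print("Condition number (s_max / s_min):", s[0] / s[-1])
print("Frobenius norm:", np.sqrt(np.sum(s ** 2)))  # same value as np.linalg.norm(J)
```

The Frobenius norm combines all singular values, so it is always at least as large as the biggest one; it summarizes overall sensitivity, while the singular vectors say in which directions that sensitivity lies.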

**🎯 Why This Matters:**
Jacobian analysis is fundamental to modern AI:
- **Interpretability**: Understanding which inputs matter most
- **Robustness**: Predicting model vulnerability to noise/attacks
- **Optimization**: Revealing conditioning and convergence properties
- **Architecture Design**: Guiding network structure choices

**🚀 Coming in Problem 4: Vector Fields**
- How do we visualize optimization dynamics across the entire landscape?
- What are vector fields and how do they reveal learning patterns?
- How do different optimizers create different flow patterns?
- What can vector field analysis tell us about convergence?

You're approaching a complete mathematical understanding of AI systems! 🐬➡️📊➡️🎯➡️⚡➡️🚀➡️🧮➡️🔗➡️📐➡️🌊