# Day 6: The First Law of Complexodynamics

> **Interactive Tutorial: Why Complexity Increases Over Time**

**Paper:** Christoph Adami (2011) - [The First Law of Complexodynamics](https://arxiv.org/abs/0912.0368)

---

## 🎯 Learning Objectives

By the end of this notebook, you'll understand:
1. Why complexity increases (it's physics, not random!)
2. Information equilibration: I_E = I_L at steady state
3. The fidelity-complexity trade-off
4. Why different organisms have different complexity ceilings
5. How to measure and simulate complexity evolution

**Time:** ~90 minutes

In [None]:
# Setup: imports (assumes implementation.py is importable from the working directory)
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter

# Import our implementations
from implementation import (
    shannon_complexity,
    InformationFlow,
    ComplexityTrajectory,
    EvolutionarySimulator,
    channel_capacity_simple,
    fidelity_complexity_curve,
    compare_organisms,
    fitness_counting_ones
)

# Configure plots
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ Setup complete!")

---

## Part 1: Shannon Complexity

### What is Complexity?

In this paper, **complexity** = **Shannon entropy** of a sequence:

$$
C = -\sum_{i} p_i \log_2 p_i \quad \text{(bits per symbol)}
$$

**Intuition:** How unpredictable is each symbol?
- All same letter → 0 bits (totally predictable)
- All letters equal frequency → 2 bits for DNA (maximum surprise)
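The notebook imports `shannon_complexity` from `implementation`, whose source is not shown here. As a minimal sketch of the formula above, assuming complexity is computed from symbol frequencies within a single sequence, a stand-alone version could look like this (`entropy_bits_per_symbol` is a hypothetical name, not the notebook's function):

```python
import math
from collections import Counter

def entropy_bits_per_symbol(sequence):
    """Shannon entropy of the symbol frequencies in `sequence`, in bits per symbol."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    n = len(sequence)
    # p * log2(1/p) equals -p * log2(p) and avoids returning -0.0 for a single symbol
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(entropy_bits_per_symbol("AAAAAAAAAA"))    # 0.0 (one symbol, fully predictable)
print(entropy_bits_per_symbol("ACGTACGTACGT"))  # 2.0 (four equally frequent symbols)
```

Only the symbol frequencies matter in this sketch, so reordering the letters of a sequence leaves the value unchanged.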

In [None]:
# Example 1: Uniform sequence (low complexity)
seq_uniform = "AAAAAAAAAA"
C_uniform = shannon_complexity(seq_uniform)
print(f"Sequence: {seq_uniform}")
print(f"Complexity: {C_uniform:.4f} bits/site")
print(f"→ Completely predictable! (minimum complexity)\n")

# Example 2: Maximum diversity (high complexity)
seq_diverse = "ACGTACGTACGT"
C_diverse = shannon_complexity(seq_diverse)
print(f"Sequence: {seq_diverse}")
print(f"Complexity: {C_diverse:.4f} bits/site")
print(f"→ Maximum diversity! (maximum complexity for DNA)\n")

# Example 3: Intermediate
seq_intermediate = "AAACCGGGTTTT"
C_intermediate = shannon_complexity(seq_intermediate)
print(f"Sequence: {seq_intermediate}")
print(f"Complexity: {C_intermediate:.4f} bits/site")
print(f"→ Moderate diversity (intermediate complexity)")

### 🧪 Exercise 1: Your Turn!

Create sequences with different complexities:

In [None]:
# TODO: Create a sequence with complexity ≈ 1.5 bits/site
# Hint: Use mostly A's and C's, fewer G's and T's

my_sequence = "AACCAACC"  # Modify this!
my_complexity = shannon_complexity(my_sequence)
print(f"Your sequence complexity: {my_complexity:.4f} bits/site")
print(f"Target: 1.5 bits/site")
print(f"Difference: {abs(my_complexity - 1.5):.4f}")

---

## Part 2: The First Law

### Information Equilibration

**The core equation:**

$$
\frac{dC}{dt} = I_E - I_L
$$

Where:
- $I_E$ = Information **gain** from environment (selection)
- $I_L$ = Information **loss** from mutations (copying errors)

**At equilibrium:** $I_E = I_L \implies \frac{dC}{dt} = 0$ (complexity plateaus!)
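To see how a balance point arises, here is a toy model. It is not the paper's model or the notebook's `InformationFlow`: it assumes a constant gain `gain` and a loss that grows linearly with stored complexity, `loss_rate * C`. Both functional forms and all names are illustrative assumptions.

```python
def net_flow(C, gain, loss_rate):
    """dC/dt = I_E - I_L for the toy model I_E = gain, I_L = loss_rate * C."""
    return gain - loss_rate * C

def equilibrium_complexity(gain, loss_rate):
    """Complexity at which I_E = I_L in the toy model."""
    if loss_rate <= 0:
        raise ValueError("loss_rate must be positive")
    return gain / loss_rate

gain, loss_rate = 0.5, 0.25   # arbitrary illustrative values
C_eq = equilibrium_complexity(gain, loss_rate)

# Probe the net flow below, at, and above the balance point
for C in (0.5 * C_eq, C_eq, 1.5 * C_eq):
    flow = net_flow(C, gain, loss_rate)
    trend = "rising" if flow > 0 else "falling" if flow < 0 else "steady"
    print(f"C = {C:.2f}: dC/dt = {flow:+.3f} ({trend})")
```

Below the balance point the flow is positive and above it the flow is negative, so the toy system is pulled back toward `C_eq` from either side.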

In [None]:
# Create an information flow calculator
info_flow = InformationFlow(
    mutation_rate=1e-6,
    genome_length=1000000,
    selection_strength=0.01
)

# Calculate information gain and loss
I_E = info_flow.calculate_information_gain()
I_L = info_flow.calculate_information_loss()

print("Information Flow Analysis:")
print(f"  I_E (gain from selection): {I_E:.6f} bits/generation")
print(f"  I_L (loss from mutation):  {I_L:.6f} bits/generation")
print(f"  Net flow (dC/dt):          {I_E - I_L:.6f} bits/generation")
print()
if np.isclose(I_E, I_L):
    # Exact float equality is almost never hit, so compare within a tolerance
    print("⚖️ EQUILIBRIUM reached (I_E ≈ I_L)")
elif I_E > I_L:
    print("✅ Complexity is INCREASING (I_E > I_L)")
else:
    print("❌ Complexity is DECREASING (I_E < I_L) - Error catastrophe!")

### Visualizing the Balance

In [None]:
# Sweep selection strength
selection_values = np.linspace(0, 0.05, 100)
I_E_values = []
I_L_constant = info_flow.calculate_information_loss()

for sel in selection_values:
    temp_flow = InformationFlow(mutation_rate=1e-6, genome_length=1000000, selection_strength=sel)
    I_E_values.append(temp_flow.calculate_information_gain())

# Plot
plt.figure(figsize=(10, 6))
plt.plot(selection_values, I_E_values, label='I_E (gain)', linewidth=3, color='green')
plt.axhline(I_L_constant, label='I_L (loss)', linewidth=3, color='red', linestyle='--')
plt.fill_between(selection_values, 0, I_E_values, where=(np.array(I_E_values) < I_L_constant), 
                 alpha=0.3, color='red', label='Complexity decreasing')
plt.fill_between(selection_values, 0, I_E_values, where=(np.array(I_E_values) >= I_L_constant), 
                 alpha=0.3, color='green', label='Complexity increasing')

# Mark the grid point where I_E is closest to I_L (approximate equilibrium)
idx_eq = np.argmin(np.abs(np.array(I_E_values) - I_L_constant))
plt.scatter([selection_values[idx_eq]], [I_E_values[idx_eq]], s=200, color='blue', 
            edgecolors='black', linewidths=2, zorder=5, label='Equilibrium')

plt.xlabel('Selection Strength', fontsize=12)
plt.ylabel('Information Flow (bits/generation)', fontsize=12)
plt.title('Information Equilibration: I_E = I_L', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

---

## Part 3: The Fidelity-Complexity Trade-off

### Channel Capacity

**Maximum sustainable complexity:**

$$
C_{\max} = -\log_2(\mu \cdot L)
$$

Where:
- $\mu$ = mutation rate (per base per generation)
- $L$ = genome length

**Key insight:** Lower mutation rate → Higher ceiling!
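A small worked example of the ceiling formula. This sketch evaluates $-\log_2(\mu L)$ directly; whether `channel_capacity_simple` does exactly this is an assumption, and `c_max_bits` is a hypothetical helper name.

```python
import math

def c_max_bits(mu, L):
    """Evaluate C_max = -log2(mu * L), written as log2(1 / (mu * L))."""
    if mu <= 0 or L <= 0:
        raise ValueError("mu and L must be positive")
    return math.log2(1.0 / (mu * L))

# mu * L is the expected number of copying errors per genome per generation
for mu, L in [(1e-6, 1e3), (1e-5, 1e3), (1e-3, 1e3), (1e-2, 1e3)]:
    print(f"mu*L = {mu * L:.0e} -> C_max = {c_max_bits(mu, L):6.2f} bits")
```

Under this formula the ceiling depends only on the product $\mu L$: it is positive when $\mu L < 1$, zero at $\mu L = 1$, and negative when $\mu L > 1$.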

In [None]:
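# Compare organisms (illustrative round numbers).
# Note: mu * L = 1 in every row, so if channel_capacity_simple implements
# C_max = -log2(mu * L) literally, each row evaluates to 0 bits.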
organisms = {
    'RNA Virus': {'mu': 1e-4, 'L': 1e4, 'color': 'red'},
    'Bacteria': {'mu': 1e-6, 'L': 1e6, 'color': 'blue'},
    'Insect': {'mu': 1e-8, 'L': 1e8, 'color': 'green'},
    'Human': {'mu': 1e-9, 'L': 1e9, 'color': 'purple'},
}

print("Complexity Ceilings by Organism:")
print("="*60)
print(f"{'Organism':<15} {'μ (mutation)':<15} {'L (genome)':<15} {'C_max (bits)':<15}")
print("-"*60)

for name, params in organisms.items():
    C_max = channel_capacity_simple(params['mu'], params['L'])
    print(f"{name:<15} {params['mu']:<15.0e} {params['L']:<15.0e} {C_max:<15.2f}")

print("="*60)
print("\n💡 Key insight: Better copying (lower μ) → More complexity allowed!")

### Plotting the Trade-off Curve

In [None]:
# Sweep mutation rates
mutation_rates = np.logspace(-10, -3, 100)
genome_length = 1e6

C_max_values = fidelity_complexity_curve(mutation_rates, genome_length)

# Plot
plt.figure(figsize=(12, 6))
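# C_max is zero or negative when μ·L >= 1, which a log-scaled y-axis cannot show,
# so only the x-axis is log-scaled here
plt.semilogx(mutation_rates, C_max_values, linewidth=3, color='darkblue', label='C_max = -log₂(μ·L)')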

# Mark organisms
for name, params in organisms.items():
    C = channel_capacity_simple(params['mu'], genome_length)
    plt.scatter([params['mu']], [C], s=300, color=params['color'], 
                edgecolors='black', linewidths=2, label=name, zorder=5)

plt.xlabel('Mutation Rate μ (per base per generation)', fontsize=12)
plt.ylabel('Maximum Complexity C_max (bits)', fontsize=12)
plt.title('Fidelity-Complexity Trade-off: Better Copying → More Complexity', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3, which='both')
plt.tight_layout()
plt.show()

---

## Part 4: Complexity Trajectories

### Time Evolution

**Solution to the differential equation**, assuming the net flow relaxes linearly, $I_E - I_L = (C_{\max} - C)/\tau$, with $C(0) = 0$:

$$
C(t) = C_{\max}\left(1 - e^{-t/\tau}\right)
$$

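This is **exponential saturation**: fast growth initially, then a plateau at $C_{\max}$. The time constant $\tau$ sets the pace: at $t = \tau$ the complexity has reached about 63% of the ceiling.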
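As a numerical sanity check of the saturation formula, the sketch below integrates $dC/dt = (C_{\max} - C)/\tau$ with forward Euler steps from $C(0) = 0$ and compares the result with the closed form. The linear relaxation term is the assumption under which the closed form holds; the values of `C_max`, `tau`, and `dt` are arbitrary.

```python
import numpy as np

C_max, tau = 2.0, 100.0      # illustrative ceiling (bits) and time constant (generations)
dt, n_steps = 0.1, 5000      # step size and number of Euler steps (covers t = 0..500)

t = np.arange(n_steps + 1) * dt
C_euler = np.zeros(n_steps + 1)
for k in range(n_steps):
    # Forward Euler: C(t + dt) = C(t) + dt * dC/dt
    C_euler[k + 1] = C_euler[k] + dt * (C_max - C_euler[k]) / tau

C_exact = C_max * (1.0 - np.exp(-t / tau))
max_error = np.max(np.abs(C_euler - C_exact))

idx_tau = round(tau / dt)    # index of the sample at t = tau
print(f"C(tau) / C_max = {C_exact[idx_tau] / C_max:.3f}")   # 0.632
print(f"largest Euler error = {max_error:.2e} bits")
```

The two curves agree closely for a step size much smaller than `tau`, which is what one would expect if the closed form really solves the relaxation equation.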

In [None]:
# Simulate complexity evolution for bacteria
print("Simulating bacterial evolution...")
print("This may take a minute...\n")

sim = EvolutionarySimulator(
    population_size=500,
    genome_length=1000,
    mutation_rate=1e-4,
    fitness_function=fitness_counting_ones,
    alphabet_size=4  # DNA
)

# Track complexity over time
generations = 1000
complexity_history = []

for gen in range(generations):
    sim.step()
    if gen % 10 == 0:
        complexity_history.append(sim.get_complexity())

time_points = np.arange(0, generations, 10)

print(f"✅ Simulation complete!")
print(f"Initial complexity: {complexity_history[0]:.4f} bits/site")
print(f"Final complexity:   {complexity_history[-1]:.4f} bits/site")
print(f"Increase:           {complexity_history[-1] - complexity_history[0]:.4f} bits/site")

### Visualizing Growth

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(time_points, complexity_history, linewidth=3, color='blue', label='Simulated')

# Fit exponential saturation
from scipy.optimize import curve_fit

def saturation_curve(t, C_max, tau):
    return C_max * (1 - np.exp(-t / tau))

params, _ = curve_fit(saturation_curve, time_points, complexity_history, 
                      p0=[max(complexity_history), 100])
C_max_fit, tau_fit = params

plt.plot(time_points, saturation_curve(time_points, *params), 
         'r--', linewidth=2, label=f'Theory: C_max={C_max_fit:.2f}, τ={tau_fit:.0f}')

plt.axhline(C_max_fit, color='red', linestyle=':', alpha=0.5, label='Equilibrium plateau')
plt.xlabel('Generation', fontsize=12)
plt.ylabel('Complexity (bits/site)', fontsize=12)
plt.title('Complexity Trajectory: Exponential Saturation', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

---

## Part 5: Multi-Organism Comparison

Let's compare how different organisms evolve complexity.
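`EvolutionarySimulator` is imported from `implementation` and its internals are not shown in this notebook. As a rough mental model only, one generation of a mutation-selection loop might look like the sketch below. Everything in it is an assumption for illustration: the helper names (`count_ones`, `one_generation`, `mean_site_entropy`), fitness-proportional selection, uniform point mutations, a clonal starting population, and measuring complexity as per-site entropy across the population.

```python
import numpy as np

def count_ones(genome):
    """Stand-in fitness: 1 + number of sites holding symbol 1 (always positive)."""
    return 1.0 + np.count_nonzero(genome == 1)

def one_generation(population, mutation_rate, fitness_fn, rng, alphabet_size=4):
    """One generation: fitness-proportional resampling followed by point mutations."""
    n = population.shape[0]
    fitness = np.array([fitness_fn(g) for g in population], dtype=float)
    parents = rng.choice(n, size=n, p=fitness / fitness.sum())
    offspring = population[parents].copy()
    # Each site is redrawn uniformly with probability mutation_rate
    # (a redraw can land on the same symbol, so the effective rate is slightly lower).
    hit = rng.random(offspring.shape) < mutation_rate
    offspring[hit] = rng.integers(0, alphabet_size, size=int(hit.sum()))
    return offspring

def mean_site_entropy(population, alphabet_size=4):
    """Average over sites of the across-population Shannon entropy, in bits/site."""
    n = population.shape[0]
    total = 0.0
    for site in population.T:
        counts = np.bincount(site, minlength=alphabet_size)
        p = counts[counts > 0] / n
        total += float(np.sum(p * np.log2(1.0 / p)))
    return total / population.shape[1]

rng = np.random.default_rng(0)
pop = np.zeros((200, 50), dtype=np.int64)   # clonal start: every genome is all zeros
start = mean_site_entropy(pop)
for _ in range(50):
    pop = one_generation(pop, mutation_rate=1e-3, fitness_fn=count_ones, rng=rng)
end = mean_site_entropy(pop)
print(f"bits/site: start = {start:.4f}, after 50 generations = {end:.4f}")
```

In this sketch the mutation rate controls how fast variation enters the population, while selection decides which variants persist, which are the two knobs varied across the organisms below.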

In [None]:
# Define organisms with illustrative, scaled-down parameters
# (the 'tau' entries are not used by the simulation loop below)
organism_params = {
    'Virus': {'mu': 1e-3, 'L': 100, 'color': 'red', 'tau': 50},
    'Bacteria': {'mu': 1e-4, 'L': 1000, 'color': 'blue', 'tau': 200},
    'Eukaryote': {'mu': 1e-5, 'L': 10000, 'color': 'green', 'tau': 500},
}

plt.figure(figsize=(14, 6))

for name, params in organism_params.items():
    print(f"Simulating {name}...")
    
    sim = EvolutionarySimulator(
        population_size=500,
        genome_length=params['L'],
        mutation_rate=params['mu'],
        fitness_function=fitness_counting_ones,
        alphabet_size=4
    )
    
    complexity_history = []
    for gen in range(1000):
        sim.step()
        if gen % 10 == 0:
            complexity_history.append(sim.get_complexity())
    
    time_points = np.arange(0, 1000, 10)
    plt.plot(time_points, complexity_history, linewidth=3, 
             color=params['color'], label=name)

plt.xlabel('Generation', fontsize=12)
plt.ylabel('Complexity (bits/site)', fontsize=12)
plt.title('Multi-Organism Comparison: Complexity Evolution', fontsize=14, fontweight='bold')
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\n✅ All simulations complete!")

---

## Part 6: Key Insights

### What We've Learned

1. **Complexity is measurable** - Shannon entropy quantifies it
2. **Complexity increases while I_E > I_L** - It's a law of physics, not luck!
3. **There's always a ceiling** - Set by mutation rate and genome length (C_max)
4. **Evolution equilibrates** - I_E = I_L at steady state
5. **Trade-offs are fundamental** - Speed vs complexity, fidelity vs genome size

### Connection to Machine Learning

| Evolution | Machine Learning |
|-----------|------------------|
| Genome | Model weights |
| Mutation | Weight decay/noise |
| Selection | Loss gradient |
| C_max | Model capacity |
| I_E | Information from data |
| I_L | Regularization |

**Insight:** Loosely speaking, training behaves like fast evolution. Treat the table as an analogy, not an exact mapping.

---

## Part 7: Challenge Problems

### Challenge 1: Optimal Mutation Rate

For a given genome length, what mutation rate maximizes complexity while staying below the error threshold?

In [None]:
# TODO: Implement this!
def find_optimal_mutation_rate(genome_length, selection_strength):
    """
    Find mutation rate that maximizes equilibrium complexity.
    
    Hint: Balance I_E and I_L
    """
    pass

# Test it
# optimal_mu = find_optimal_mutation_rate(1000, 0.01)
# print(f"Optimal mutation rate: {optimal_mu:.2e}")

### Challenge 2: Varying Environment

What happens when the environment changes periodically? Does complexity still increase?

In [None]:
# TODO: Simulate with changing fitness function
# Hint: Switch between fitness_counting_ones and fitness_max_entropy every N generations
# (fitness_max_entropy is not imported in the setup cell, so import it first)

### Challenge 3: Sexual Reproduction

How does recombination affect complexity evolution?

In [None]:
# TODO: Add crossover operation to EvolutionarySimulator
# Compare sexual vs asexual evolution

---

## Summary

**The First Law of Complexodynamics:**

> *"The information content of a replicator will increase up to the limit imposed by the accuracy of its replication machinery."*

**In equations:**
$$
\frac{dC}{dt} = I_E - I_L
$$
$$
C(t) = C_{\max}\left(1 - e^{-t/\tau}\right)
$$
$$
C_{\max} = -\log_2(\mu \cdot L)
$$

**Key takeaways:**
- Complexity increase is **inevitable** (given replication + selection, and as long as gain outpaces loss)
- There's **always a ceiling** (set by copying fidelity)
- Evolution is **equilibration** (automatic balancing of gain and loss)
- This is **physics**, not mysticism!

---

### Next Steps

1. Try the exercises in `exercises/` folder
2. Run `train_minimal.py` for more experiments
3. Generate visualizations with `visualization.py`
4. Read the original paper: [arXiv:0912.0368](https://arxiv.org/abs/0912.0368)

**Questions? Open an issue on GitHub!**

⭐ **Star the repo** if this helped you understand complexodynamics!