## 9. Conclusion

This notebook demonstrates a complete implementation of a DKW-guided fusion/fission controller. The key advantages of this approach include:

- **Statistical rigor**: Finite-sample confidence bounds on the observed error rate
- **Adaptability**: Switches between modes as the observed error rate changes
- **Robustness**: Hysteresis prevents unstable behavior
- **Configurability**: Parameters can be tuned for different risk profiles

### Next Steps

You can extend this implementation by:
1. Adding more sophisticated error models
2. Implementing different statistical bounds (Hoeffding, Bennett, etc.)
3. Creating multi-armed bandit variants
4. Adding online parameter adaptation
5. Incorporating contextual information beyond difficulty
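
As a possible starting point for item 2, the sketch below compares the DKW margin used in this notebook with a one-sided Hoeffding margin for the mean of observations in [0, 1]. The helper `hoeffding_epsilon` is hypothetical and not part of the controller; `dkw_epsilon` here is a standalone copy of the controller's formula.

```python
import math

def hoeffding_epsilon(n: int, delta: float = 0.05) -> float:
    """One-sided Hoeffding margin for the mean of n observations in [0, 1]."""
    if n < 1:
        return 1.0
    return math.sqrt(math.log(1 / delta) / (2 * n))

def dkw_epsilon(n: int, delta: float = 0.05) -> float:
    """Two-sided DKW margin, same formula as the controller's method."""
    if n < 2:
        return 1.0
    return math.sqrt(math.log(2 / delta) / (2 * n))

for n in (100, 1000):
    print(f"n={n}: hoeffding={hoeffding_epsilon(n):.4f}, dkw={dkw_epsilon(n):.4f}")
```

Because the controller only needs an upper bound on the error rate, the one-sided margin is slightly smaller at the same `delta`. Whether that trade is appropriate depends on the guarantee you want to state.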

### Experimental Design

To use this controller in practice:
1. Define your fusion/fission scenarios
2. Collect training data with known difficulty levels
3. Tune parameters based on your risk tolerance
4. Deploy with monitoring and periodic recalibration

**Happy experimenting!** 🚀

## 8. Understanding the DKW Inequality

The **Dvoretzky-Kiefer-Wolfowitz (DKW) inequality** provides a theoretical foundation for our controller:

$$P\left(\sup_x |F_n(x) - F(x)| > \epsilon\right) \leq 2e^{-2n\epsilon^2}$$

Where:
- $F_n(x)$ is the empirical distribution function
- $F(x)$ is the true distribution function  
- $n$ is the number of samples
- $\epsilon$ is the error bound

In our controller:
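- We treat each observation as a 0/1 error indicator and use the inequality to bound the true error rate
- Setting the right-hand side equal to $\delta$ and solving for $\epsilon$ gives the formula implemented by the `dkw_epsilon` method: $\epsilon = \sqrt{\frac{\ln(2/\delta)}{2n}}$
- With probability at least $1-\delta$, the true error rate is then within $\epsilon$ of our empirical estimate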
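
To make the formula concrete, the sketch below evaluates $\epsilon$ for a few sample sizes and inverts the formula to find how many samples a given $\epsilon$ requires. Both functions are standalone illustrations; `samples_needed` is a hypothetical helper that does not exist in the controller.

```python
import math

def dkw_epsilon(n: int, delta: float = 0.05) -> float:
    """Half-width of the DKW confidence band for n samples."""
    return math.sqrt(math.log(2 / delta) / (2 * n))

def samples_needed(epsilon: float, delta: float = 0.05) -> int:
    """Smallest n for which dkw_epsilon(n, delta) <= epsilon."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

for n in (100, 1000, 10000):
    print(f"n={n:>5}: epsilon={dkw_epsilon(n):.4f}")

# Samples needed before epsilon alone drops to 0.05 at delta = 0.05
print(samples_needed(0.05))
```

With the default settings (`epsilon_target=0.10`, `hysteresis=0.05`), the controller enters fusion only when the upper bound falls below 0.05. The margin alone must therefore be under 0.05, which takes 738 observations even if no errors are observed.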

## Key Features

1. **Statistical Guarantee**: The DKW bound provides theoretical confidence in our error estimates
2. **Adaptive Behavior**: The controller switches between fusion/fission based on observed data
3. **Hysteresis**: Prevents oscillation between modes by requiring clear evidence for transitions
4. **Minimum Samples**: Ensures sufficient data before making adaptive decisions

## Usage Tips

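- **Lower `epsilon_target`**: Stricter and more conservative; the upper error bound must fall further before the controller enters fusion, so it stays in fission longer
- **Higher `epsilon_target`**: More permissive and more aggressive; the controller switches to fusion more readily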
- **Lower `hysteresis`**: More sensitive to changes, switches more frequently
- **Higher `hysteresis`**: More stable, requires stronger evidence to switch
- **Higher `min_samples`**: Delays adaptive behavior until more data is collected
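
The interaction between `epsilon_target` and `hysteresis` can be seen by isolating the threshold comparison that `decide` applies to the upper error bound. The functions `switch_thresholds` and `next_state` below are hypothetical helpers written for illustration, and the sequence of bounds is made up.

```python
def switch_thresholds(epsilon_target: float = 0.10, hysteresis: float = 0.05):
    """Return (enter_fusion_below, exit_fusion_above) for the upper error bound."""
    return epsilon_target - hysteresis, epsilon_target + hysteresis

def next_state(state: str, upper_bound: float,
               epsilon_target: float = 0.10, hysteresis: float = 0.05) -> str:
    """Apply one step of the hysteresis rule to the current state."""
    enter_below, exit_above = switch_thresholds(epsilon_target, hysteresis)
    if state == "fusion" and upper_bound > exit_above:
        return "fission"
    if state == "fission" and upper_bound < enter_below:
        return "fusion"
    return state

# Illustrative (made-up) sequence of upper error bounds
state = "fission"
trace = []
for bound in (0.12, 0.04, 0.09, 0.14, 0.16, 0.08):
    state = next_state(state, bound)
    trace.append(state)

print(trace)
```

Bounds between the two thresholds (0.05 and 0.15 with the defaults) leave the state unchanged, which is what keeps the controller from flipping on every small fluctuation.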

In [None]:
# Interactive parameter exploration
def explore_parameters(epsilon_target=0.10, delta=0.05, min_samples=100, hysteresis=0.05):
    """Explore how different parameters affect the controller's behavior."""
    print(f"Testing with parameters:")
    print(f"  epsilon_target: {epsilon_target}")
    print(f"  delta: {delta}")
    print(f"  min_samples: {min_samples}")
    print(f"  hysteresis: {hysteresis}")
    
    # Create controller with custom parameters
    custom_controller = DKWController(
        epsilon_target=epsilon_target,
        delta=delta,
        min_samples=min_samples,
        hysteresis=hysteresis
    )
    
    # Run experiment with custom controller
    custom_results = {"baseline": [], "proposed": []}
    
    for example in sample_data:
        error = np.random.random() < example["difficulty"]
        custom_controller.add_observation(float(error))
        decision = custom_controller.decide()
        
        custom_results["proposed"].append({
            "id": example["id"],
            "decision": decision,
            "error": error,
        })
        custom_results["baseline"].append({
            "id": example["id"],
            "decision": "fission",
            "error": error,
        })
    
    # Calculate metrics for custom controller
    custom_df = pd.DataFrame(custom_results['proposed'])
    custom_metrics = calculate_metrics(custom_df)
    
    fusion_ratio = custom_metrics['fusion_decisions'] / len(custom_results['proposed'])
    
    print(f"\nResults:")
    print(f"  Fusion decisions: {custom_metrics['fusion_decisions']}/{len(sample_data)} ({fusion_ratio:.1%})")
    print(f"  Overall error rate: {custom_metrics['overall_error_rate']:.3f}")
    if custom_metrics['fusion_decisions'] > 0:
        print(f"  Fusion error rate: {custom_metrics['fusion_error_rate']:.3f}")
    print(f"  Fission error rate: {custom_metrics['fission_error_rate']:.3f}")
    
    return custom_results

# Try different parameter combinations
print("=== PARAMETER EXPLORATION ===")

print("\n1. Stricter target (lower epsilon_target, harder to enter fusion):")
explore_parameters(epsilon_target=0.05)

print("\n2. Looser target (higher epsilon_target, easier to enter fusion):")
explore_parameters(epsilon_target=0.20)

print("\n3. Less hysteresis (more sensitive to changes):")
explore_parameters(hysteresis=0.01)

print("\n4. More hysteresis (less sensitive to changes):")
explore_parameters(hysteresis=0.10)

## 7. Interactive Exploration

Try modifying the controller parameters and see how they affect the decision-making process!

In [None]:
# Create visualizations
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))

# 1. Decision comparison
decisions_comparison = pd.DataFrame({
    'Baseline': [baseline_metrics['fusion_decisions'], baseline_metrics['fission_decisions']],
    'Proposed': [proposed_metrics['fusion_decisions'], proposed_metrics['fission_decisions']]
}, index=['Fusion', 'Fission'])

decisions_comparison.plot(kind='bar', ax=ax1, color=['skyblue', 'lightcoral'])
ax1.set_title('Decision Frequency Comparison')
ax1.set_ylabel('Number of Decisions')
ax1.legend()
ax1.tick_params(axis='x', rotation=45)

# 2. Error rate comparison
error_rates = pd.DataFrame({
    'Baseline': [baseline_metrics['overall_error_rate']],
    'Proposed': [proposed_metrics['overall_error_rate']]
}, index=['Overall Error Rate'])

error_rates.plot(kind='bar', ax=ax2, color=['orange', 'green'])
ax2.set_title('Overall Error Rate Comparison')
ax2.set_ylabel('Error Rate')
ax2.legend()
ax2.tick_params(axis='x', rotation=45)

# 3. Decision timeline for proposed method
decisions_timeline = [1 if dec == 'fusion' else 0 for dec in proposed_df['decision']]
ax3.plot(range(len(decisions_timeline)), decisions_timeline, 'o-', color='purple', alpha=0.7)
ax3.set_title('Decision Timeline (DKW Controller)')
ax3.set_xlabel('Example Index')
ax3.set_ylabel('Decision (0=Fission, 1=Fusion)')
ax3.set_ylim(-0.1, 1.1)
ax3.grid(True, alpha=0.3)

# 4. Error occurrence vs difficulty
difficulties = [ex['difficulty'] for ex in sample_data]
errors = proposed_df['error'].astype(int)
ax4.scatter(difficulties, errors, alpha=0.6, c=['red' if err else 'blue' for err in errors])
ax4.set_title('Error Occurrence vs Difficulty')
ax4.set_xlabel('Difficulty Level')
ax4.set_ylabel('Error Occurred (0=No, 1=Yes)')
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Summary statistics
print("\n=== SUMMARY ===")
print(f"Total examples processed: {len(sample_data)}")
print(f"Average difficulty: {np.mean(difficulties):.3f}")
print(f"Baseline approach: Always fission (conservative)")
print(f"DKW approach: Adaptive based on error observations")
print(f"Fusion decisions made by DKW: {proposed_metrics['fusion_decisions']}/{len(sample_data)} ({fusion_ratio:.1%})")

## 6. Visualization

Let's create visualizations to better understand the controller's behavior and performance.

In [None]:
# Convert results to DataFrames for easier analysis
baseline_df = pd.DataFrame(results['baseline'])
proposed_df = pd.DataFrame(results['proposed'])

# Calculate performance metrics
def calculate_metrics(df):
    fusion_count = (df['decision'] == 'fusion').sum()
    fission_count = (df['decision'] == 'fission').sum()
    error_rate = df['error'].mean()
    fusion_error_rate = df[df['decision'] == 'fusion']['error'].mean() if fusion_count > 0 else 0
    fission_error_rate = df[df['decision'] == 'fission']['error'].mean() if fission_count > 0 else 0
    
    return {
        'fusion_decisions': fusion_count,
        'fission_decisions': fission_count,
        'overall_error_rate': error_rate,
        'fusion_error_rate': fusion_error_rate,
        'fission_error_rate': fission_error_rate
    }

baseline_metrics = calculate_metrics(baseline_df)
proposed_metrics = calculate_metrics(proposed_df)

print("=== PERFORMANCE COMPARISON ===")
print(f"\nBaseline (Always Fission):")
for key, value in baseline_metrics.items():
    if 'rate' in key:
        print(f"  {key}: {value:.3f}")
    else:
        print(f"  {key}: {value}")

print(f"\nProposed (DKW Controller):")
for key, value in proposed_metrics.items():
    if 'rate' in key:
        print(f"  {key}: {value:.3f}")
    else:
        print(f"  {key}: {value}")

# Fraction of decisions in which the DKW controller chose fusion
fusion_ratio = proposed_metrics['fusion_decisions'] / len(results['proposed'])
print(f"\nThe DKW controller chose fusion {fusion_ratio:.1%} of the time")

## 5. Results Analysis

Let's analyze the performance of both approaches and visualize the controller's decision-making process.

In [None]:
def run_experiment(data):
    """Run DKW controller experiment with inlined data."""
    controller = DKWController()
    results = {"baseline": [], "proposed": []}

    for example in data:
        # Simulate error occurrence based on difficulty
        error = np.random.random() < example["difficulty"]
        controller.add_observation(float(error))
        decision = controller.decide()

        results["proposed"].append({
            "id": example["id"],
            "decision": decision,
            "error": error,
        })
        results["baseline"].append({
            "id": example["id"],
            "decision": "fission",  # Always conservative
            "error": error,
        })

    return results

# Run the experiment
print("Running DKW controller experiment...")
results = run_experiment(sample_data)

print(f"Baseline decisions: {len(results['baseline'])} results")
print(f"Proposed decisions: {len(results['proposed'])} results")

# Show first few results
print("\nFirst few baseline results:")
for result in results['baseline'][:5]:
    print(f"  {result['id']}: {result['decision']}, error={result['error']}")
    
print("\nFirst few proposed results:")
for result in results['proposed'][:5]:
    print(f"  {result['id']}: {result['decision']}, error={result['error']}")

## 4. Experiment Function

The `run_experiment` function simulates the controller's behavior on our sample data. It compares:
- **Baseline**: Always uses fission (conservative approach)
- **Proposed**: Uses DKW controller to adaptively choose between fusion and fission

In [None]:
# Sample input data (replaces reading from ../dataset_001/data_out.json)
# Each example has an ID and a difficulty level (0.0 = easy, 1.0 = very hard)
sample_data = [
    {"id": "example_000", "difficulty": 0.1},
    {"id": "example_001", "difficulty": 0.05},
    {"id": "example_002", "difficulty": 0.8},
    {"id": "example_003", "difficulty": 0.2},
    {"id": "example_004", "difficulty": 0.15},
    {"id": "example_005", "difficulty": 0.9},
    {"id": "example_006", "difficulty": 0.3},
    {"id": "example_007", "difficulty": 0.1},
    {"id": "example_008", "difficulty": 0.7},
    {"id": "example_009", "difficulty": 0.05},
    {"id": "example_010", "difficulty": 0.4},
    {"id": "example_011", "difficulty": 0.6},
    {"id": "example_012", "difficulty": 0.2},
    {"id": "example_013", "difficulty": 0.3},
    {"id": "example_014", "difficulty": 0.8},
]

# Expected output structure (for reference)
expected_output = {
    "baseline": [
        {"id": "example_000", "decision": "fission", "error": False},
        {"id": "example_001", "decision": "fission", "error": False},
        {"id": "example_002", "decision": "fission", "error": True}
    ],
    "proposed": [
        {"id": "example_000", "decision": "fission", "error": False},
        {"id": "example_001", "decision": "fusion", "error": False},
        {"id": "example_002", "decision": "fusion", "error": True}
    ]
}

print(f"Sample data contains {len(sample_data)} examples")
print(f"Difficulty range: {min(ex['difficulty'] for ex in sample_data):.2f} - {max(ex['difficulty'] for ex in sample_data):.2f}")
print("\nFirst few examples:")
for ex in sample_data[:3]:
    print(f"  {ex['id']}: difficulty = {ex['difficulty']}")

## 3. Sample Data (Inlined)

Instead of reading from external JSON files, we'll define our sample data directly in the notebook. This data represents examples with varying difficulty levels that affect error probability.
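
The inlined sample has 15 examples, fewer than the controller's default `min_samples=100`, so a controller with default settings stays in its initial fission state for the whole run. To observe mode switches, one option is a larger synthetic set in the same format. The sketch below assumes uniformly distributed difficulties; `make_synthetic_data`, its parameters, and that distribution are illustrative choices, not part of the original experiment.

```python
import numpy as np

def make_synthetic_data(n_examples: int, max_difficulty: float = 0.05, seed: int = 0):
    """Build examples in the same {"id", "difficulty"} format as sample_data."""
    rng = np.random.default_rng(seed)
    # Assumption: difficulties are uniform on [0, max_difficulty)
    difficulties = rng.uniform(0.0, max_difficulty, size=n_examples)
    return [
        {"id": f"example_{i:03d}", "difficulty": float(d)}
        for i, d in enumerate(difficulties)
    ]

large_sample = make_synthetic_data(1000)
print(f"Generated {len(large_sample)} examples")
print(f"First example id: {large_sample[0]['id']}")
```

The resulting list can be passed to `run_experiment` in place of `sample_data`.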

In [None]:
@dataclass
class DKWController:
    """DKW-guided fusion/fission controller."""
    epsilon_target: float = 0.10
    delta: float = 0.05
    min_samples: int = 100
    hysteresis: float = 0.05

    samples: list = field(default_factory=list)
    current_state: str = "fission"

    def dkw_epsilon(self, n: int) -> float:
        """Compute DKW epsilon for n samples."""
        if n < 2:
            return 1.0
        return np.sqrt(np.log(2 / self.delta) / (2 * n))

    def add_observation(self, error: float) -> None:
        """Add error observation for calibration."""
        self.samples.append(error)

    def decide(self) -> str:
        """Make fusion/fission decision with DKW guarantee."""
        n = len(self.samples)
        if n < self.min_samples:
            return self.current_state

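        # Note: epsilon is computed from the total sample count n, while the
        # empirical error uses only the most recent min_samples observations.
        epsilon = self.dkw_epsilon(n)
        empirical_error = np.mean(self.samples[-self.min_samples:])
        error_upper_bound = empirical_error + epsilon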

        if self.current_state == "fusion":
            if error_upper_bound > self.epsilon_target + self.hysteresis:
                self.current_state = "fission"
        else:
            if error_upper_bound < self.epsilon_target - self.hysteresis:
                self.current_state = "fusion"

        return self.current_state

# Create an instance to demonstrate
controller = DKWController()
print(f"Controller initialized in {controller.current_state} mode")
print(f"Target error rate: {controller.epsilon_target}")
print(f"Confidence level: {1 - controller.delta}")
print(f"Minimum samples required: {controller.min_samples}")

## 2. DKW Controller Class

The `DKWController` class implements a decision-making system that uses the DKW inequality to maintain statistical guarantees on error bounds. Here's what each parameter does:

- **epsilon_target**: Target error rate threshold (10% in this example)
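- **delta**: Failure probability of the DKW bound; the confidence level is `1 - delta` (95% in this example)
- **min_samples**: Minimum number of observations before the controller may change mode; also the size of the recent window used to estimate the error rate
- **hysteresis**: Buffer around `epsilon_target` that prevents oscillation between modes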

In [None]:
"""DKW Controller Implementation."""
import json
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from dataclasses import dataclass, field

# Set random seed for reproducibility
np.random.seed(42)
print("Libraries imported successfully!")

## 1. Setup and Imports

First, let's import the required libraries and set up our environment.

# DKW Controller Implementation Demo

This notebook demonstrates a **DKW-guided fusion/fission controller** implementation. The controller uses the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality to make statistically-grounded decisions between fusion and fission modes based on observed error rates.

## Overview
- **Fusion mode**: Aggressive approach with potentially higher performance but more risk
- **Fission mode**: Conservative approach with lower risk but potentially reduced performance
- **DKW guarantee**: Statistical confidence bounds on error rates to guide decision making