## Output Data Structure

The final results follow the same structure as the original `method_out.json` file. Here's a sample of the output for reference:
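
As an illustration only, the sketch below serializes records shaped like the `results` dictionaries built in this notebook. The exact layout of `method_out.json` is assumed from those dictionaries, and the record values are made up. `to_jsonable` is a hypothetical helper, not part of the original code.

```python
import json

# Hypothetical records: ids, decisions, and error flags are made up for illustration.
results = {
    "baseline": [{"id": "example_000", "decision": "fission", "error": False}],
    "proposed": [{"id": "example_000", "decision": "fission", "error": False}],
}

def to_jsonable(results):
    """Coerce every field to a built-in type so json.dumps accepts it."""
    return {
        method: [
            {
                "id": str(r["id"]),
                "decision": str(r["decision"]),
                "error": bool(r["error"]),  # NumPy booleans are not JSON serializable
            }
            for r in records
        ]
        for method, records in results.items()
    }

payload = json.dumps(to_jsonable(results), indent=2)
print(payload)
```

The `bool(...)` coercion matters because comparing a NumPy random draw against a threshold yields a NumPy boolean, which `json.dumps` rejects.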

In [None]:
# Interactive parameter tuning - modify these values and re-run to see the effect!
print("=== INTERACTIVE PARAMETER TUNING ===")
print("Modify the parameters below and re-run this cell to experiment:")
print()

# Adjustable parameters - users can modify these
EPSILON_TARGET = 0.08  # Try values like 0.05, 0.10, 0.15
DELTA = 0.01          # Try values like 0.01, 0.05, 0.10  
MIN_SAMPLES = 50      # Try values like 20, 50, 100, 200
HYSTERESIS = 0.03     # Try values like 0.01, 0.03, 0.05

def run_custom_experiment(epsilon_target, delta, min_samples, hysteresis):
    """Run experiment with custom parameters."""
    controller = DKWController(
        epsilon_target=epsilon_target,
        delta=delta,
        min_samples=min_samples,
        hysteresis=hysteresis
    )
    
    results = {"baseline": [], "proposed": []}
    
    for example in sample_data:
        error = np.random.random() < example["difficulty"]
        controller.add_observation(float(error))
        decision = controller.decide()

        results["proposed"].append({
            "id": example["id"],
            "decision": decision,
            "error": error,
        })
        results["baseline"].append({
            "id": example["id"],
            "decision": "fission",
            "error": error,
        })
    
    return results

# Run experiment with custom parameters
print("Running experiment with custom parameters:")
print(f"  • epsilon_target = {EPSILON_TARGET}")
print(f"  • delta = {DELTA} (confidence = {1-DELTA:.1%})")
print(f"  • min_samples = {MIN_SAMPLES}")
print(f"  • hysteresis = {HYSTERESIS}")
print()

custom_results = run_custom_experiment(EPSILON_TARGET, DELTA, MIN_SAMPLES, HYSTERESIS)

# Quick analysis
proposed = custom_results["proposed"]
baseline = custom_results["baseline"]

proposed_fusion = sum(1 for r in proposed if r["decision"] == "fusion")
proposed_errors = sum(1 for r in proposed if r["error"])
baseline_fusion = sum(1 for r in baseline if r["decision"] == "fusion")
baseline_errors = sum(1 for r in baseline if r["error"])

total = len(proposed)

print(f"Results with custom parameters:")
print(f"  Proposed: {proposed_fusion/total:.1%} fusion, {proposed_errors/total:.3f} error rate")
print(f"  Baseline: {baseline_fusion/total:.1%} fusion, {baseline_errors/total:.3f} error rate")
print(f"  Improvement: {(proposed_fusion - baseline_fusion)} more fusion decisions")
print()
print("💡 Try modifying the parameters above and re-running to see different behaviors!")
print("💡 Lower epsilon_target = stricter error budget, so fusion is harder to reach")
print("💡 Lower delta = higher confidence requirements")
print("💡 Higher min_samples = more conservative initially")
print("💡 Higher hysteresis = less frequent switching")

## Interactive Experimentation

Modify the controller parameters in the tuning cell and re-run it to see how each one changes the controller's fusion/fission decisions.
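
Before tuning, it can help to check whether a given parameter set can reach fusion at all. The sketch below assumes the switching rule in `decide`: fusion requires the empirical error plus the DKW half-width to fall below `epsilon_target - hysteresis`. `min_samples_for_fusion` is a hypothetical helper introduced here. It returns the smallest sample count at which the half-width alone drops under that threshold, which is the best case of zero observed errors.

```python
import math

def dkw_epsilon(n: int, delta: float) -> float:
    """DKW half-width for n samples at failure probability delta."""
    return math.sqrt(math.log(2 / delta) / (2 * n))

def min_samples_for_fusion(epsilon_target: float, delta: float, hysteresis: float):
    """Smallest n with dkw_epsilon(n, delta) < epsilon_target - hysteresis.

    Returns None when the threshold is not positive, because fusion can then
    never be reached under the assumed rule.
    """
    threshold = epsilon_target - hysteresis
    if threshold <= 0:
        return None
    # Strict inequality: n must exceed log(2/delta) / (2 * threshold^2).
    return math.floor(math.log(2 / delta) / (2 * threshold ** 2)) + 1

for params in [(0.08, 0.01, 0.03), (0.10, 0.05, 0.05), (0.15, 0.10, 0.01)]:
    print(params, "->", min_samples_for_fusion(*params))
```

If the rule is as assumed, the tuning-cell values (`epsilon_target=0.08`, `delta=0.01`, `hysteresis=0.03`) need about a thousand observations. That is far more than the 140 examples in the inline sample list, so the controller would stay in its initial fission state for the whole run.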

In [None]:
# Create visualization of the controller's behavior
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('DKW Controller Analysis', fontsize=16)

# 1. Decision timeline
decisions = experiment_results["tracking"]["decisions"]
decision_numeric = [1 if d == "fusion" else 0 for d in decisions]
ax1.plot(decision_numeric, linewidth=2, color='blue', alpha=0.7)
ax1.fill_between(range(len(decision_numeric)), decision_numeric, alpha=0.3, color='blue')
ax1.set_title('Decision Timeline (1=Fusion, 0=Fission)')
ax1.set_xlabel('Example #')
ax1.set_ylabel('Decision')
ax1.set_ylim(-0.1, 1.1)
ax1.grid(True, alpha=0.3)

# 2. Error rate evolution  
error_rates = experiment_results["tracking"]["error_rates"]
valid_indices = [i for i, rate in enumerate(error_rates) if rate is not None]
valid_rates = [rate for rate in error_rates if rate is not None]

if valid_rates:
    ax2.plot(valid_indices, valid_rates, 'r-', linewidth=2, label='Empirical Error Rate')
    ax2.axhline(y=0.10, color='black', linestyle='--', label='Target (ε=0.10)')
    ax2.axhline(y=0.15, color='red', linestyle=':', alpha=0.7, label='Upper Threshold')
    ax2.axhline(y=0.05, color='green', linestyle=':', alpha=0.7, label='Lower Threshold')
    ax2.set_title('Error Rate Evolution')
    ax2.set_xlabel('Example #')
    ax2.set_ylabel('Error Rate')
    ax2.legend()
    ax2.grid(True, alpha=0.3)

# 3. Decision distribution comparison
methods = ['Baseline\n(Always Fission)', 'Proposed\n(DKW Controller)']
fusion_rates = [metrics['baseline_fusion_rate'], metrics['proposed_fusion_rate']]
error_rates_bar = [metrics['baseline_error_rate'], metrics['proposed_error_rate']]

x = np.arange(len(methods))
width = 0.35

bars1 = ax3.bar(x - width/2, fusion_rates, width, label='Fusion Rate', alpha=0.8, color='blue')
bars2 = ax3.bar(x + width/2, error_rates_bar, width, label='Error Rate', alpha=0.8, color='red')

ax3.set_xlabel('Method')
ax3.set_ylabel('Rate')
ax3.set_title('Performance Comparison')
ax3.set_xticks(x)
ax3.set_xticklabels(methods)
ax3.legend()
ax3.grid(True, alpha=0.3)

# Add value labels on bars
for bars in [bars1, bars2]:
    for bar in bars:
        height = bar.get_height()
        ax3.annotate(f'{height:.3f}',
                    xy=(bar.get_x() + bar.get_width() / 2, height),
                    xytext=(0, 3),  # 3 points vertical offset
                    textcoords="offset points",
                    ha='center', va='bottom', fontsize=10)

# 4. Difficulty vs Error correlation
difficulties = [r["difficulty"] for r in experiment_results["proposed"]]
actual_errors = [1 if r["error"] else 0 for r in experiment_results["proposed"]]

ax4.scatter(difficulties, actual_errors, alpha=0.6, s=20)
ax4.set_xlabel('Example Difficulty')
ax4.set_ylabel('Actual Error (1=Error, 0=Success)')
ax4.set_title('Difficulty vs Actual Errors')
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n=== Key Insights ===")
print(f"• Controller achieved {metrics['proposed_fusion_rate']:.1%} fusion usage vs {metrics['baseline_fusion_rate']:.1%} baseline")
print(f"• Error rates: {metrics['proposed_error_rate']:.3f} (proposed) vs {metrics['baseline_error_rate']:.3f} (baseline)")
if len(valid_rates) > 0:
    print(f"• Final empirical error rate: {valid_rates[-1]:.3f}")
print(f"• Total decision switches: {sum(1 for i in range(1, len(decisions)) if decisions[i] != decisions[i-1])}")

## Visualization

Let's visualize the controller's behavior over time to understand how it adapts to the changing error patterns.
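
Alongside the plots, a run-length summary of the decision sequence can show how long the controller stays in each state. This is a minimal sketch on a made-up sequence. `decision_runs` is a hypothetical helper, not part of the original notebook.

```python
from itertools import groupby

def decision_runs(decisions):
    """Collapse a decision sequence into (state, run_length) pairs."""
    return [(state, sum(1 for _ in group)) for state, group in groupby(decisions)]

# Made-up sequence for illustration only.
decisions = ["fission"] * 3 + ["fusion"] * 4 + ["fission"] * 2

runs = decision_runs(decisions)
switches = max(len(runs) - 1, 0)  # each boundary between runs is one switch
print(runs)
print("switches:", switches)
```

The number of switches is one less than the number of runs, which matches counting the positions where consecutive decisions differ.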

In [None]:
# Calculate performance metrics
def analyze_results(results):
    proposed = results["proposed"]
    baseline = results["baseline"]
    
    # Count decisions and errors
    proposed_fusion_count = sum(1 for r in proposed if r["decision"] == "fusion")
    proposed_fission_count = sum(1 for r in proposed if r["decision"] == "fission")
    proposed_error_count = sum(1 for r in proposed if r["error"])
    
    baseline_fusion_count = sum(1 for r in baseline if r["decision"] == "fusion")
    baseline_fission_count = sum(1 for r in baseline if r["decision"] == "fission")
    baseline_error_count = sum(1 for r in baseline if r["error"])
    
    total_examples = len(proposed)
    
    print("=== EXPERIMENT RESULTS ===")
    print(f"Total examples processed: {total_examples}")
    print()
    
    print("PROPOSED (DKW Controller):")
    print(f"  Fusion decisions: {proposed_fusion_count} ({proposed_fusion_count/total_examples:.1%})")
    print(f"  Fission decisions: {proposed_fission_count} ({proposed_fission_count/total_examples:.1%})")
    print(f"  Total errors: {proposed_error_count} ({proposed_error_count/total_examples:.3f} error rate)")
    print()
    
    print("BASELINE (Always Fission):")
    print(f"  Fusion decisions: {baseline_fusion_count} ({baseline_fusion_count/total_examples:.1%})")
    print(f"  Fission decisions: {baseline_fission_count} ({baseline_fission_count/total_examples:.1%})")
    print(f"  Total errors: {baseline_error_count} ({baseline_error_count/total_examples:.3f} error rate)")
    print()
    
    # Performance comparison
    fusion_advantage = proposed_fusion_count - baseline_fusion_count
    print(f"Performance advantage: {fusion_advantage} more fusion decisions")
    print(f"Error rate difference: {(proposed_error_count - baseline_error_count)/total_examples:.3f}")
    
    return {
        "proposed_fusion_rate": proposed_fusion_count / total_examples,
        "proposed_error_rate": proposed_error_count / total_examples,
        "baseline_fusion_rate": baseline_fusion_count / total_examples,
        "baseline_error_rate": baseline_error_count / total_examples
    }

metrics = analyze_results(experiment_results)

## Results Analysis

Let's analyze the results comparing the DKW controller (proposed method) against the baseline conservative approach.

In [None]:
def run_experiment(data):
    """Run DKW controller experiment with inline data."""
    controller = DKWController()
    results = {"baseline": [], "proposed": []}
    decisions_over_time = []
    error_rates_over_time = []
    
    for example in data:
        # Simulate error occurrence based on difficulty
        error = np.random.random() < example["difficulty"]
        controller.add_observation(float(error))
        decision = controller.decide()
        
        # Track decisions and error rates over time
        decisions_over_time.append(decision)
        if len(controller.samples) >= controller.min_samples:
            recent_error_rate = np.mean(controller.samples[-controller.min_samples:])
            error_rates_over_time.append(recent_error_rate)
        else:
            error_rates_over_time.append(None)

        results["proposed"].append({
            "id": example["id"],
            "decision": decision,
            "error": error,
            "difficulty": example["difficulty"]
        })
        results["baseline"].append({
            "id": example["id"],
            "decision": "fission",  # Always conservative
            "error": error,
            "difficulty": example["difficulty"]
        })
    
    # Add tracking information for visualization
    results["tracking"] = {
        "decisions": decisions_over_time,
        "error_rates": error_rates_over_time,
        "samples_count": list(range(1, len(data) + 1))
    }
    
    return results

# Run the experiment
print("Running DKW controller experiment...")
experiment_results = run_experiment(sample_data)
print(f"✓ Processed {len(experiment_results['proposed'])} examples")

## Experiment Function

The main experiment function that runs the DKW controller on the sample data and compares it against a baseline conservative approach.
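
The experiment also tracks a windowed error rate that stays undefined until enough observations have arrived. A minimal sketch of that bookkeeping follows, assuming the window length plays the role of `min_samples`. `rolling_error_rates` is a hypothetical name used only for this illustration.

```python
from collections import deque

def rolling_error_rates(errors, window):
    """Return the mean of the last `window` observations per step, or None until the window is full."""
    recent = deque(maxlen=window)  # oldest observation is dropped automatically
    rates = []
    for e in errors:
        recent.append(float(e))
        rates.append(sum(recent) / window if len(recent) == window else None)
    return rates

rates = rolling_error_rates([0, 1, 0, 0, 1, 1], window=3)
print(rates)
```

Leading `None` entries mark the warm-up period, which is why the plotting code filters them out before drawing the error-rate curve.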

In [None]:
# Sample data that replaces the need for external JSON files
# This simulates a dataset with examples of varying difficulty
sample_data = [
    {"id": f"example_{i:03d}", "difficulty": difficulty}
    for i, difficulty in enumerate([
        # Easy examples (low error probability)
        0.02, 0.01, 0.03, 0.02, 0.01, 0.04, 0.02, 0.01, 0.03, 0.02,
        0.01, 0.02, 0.03, 0.01, 0.02, 0.04, 0.01, 0.02, 0.03, 0.01,
        
        # Medium difficulty examples
        0.08, 0.07, 0.09, 0.08, 0.06, 0.10, 0.07, 0.08, 0.09, 0.07,
        0.06, 0.08, 0.09, 0.07, 0.08, 0.10, 0.06, 0.07, 0.09, 0.08,
        
        # Gradually increasing difficulty
        0.12, 0.11, 0.13, 0.12, 0.10, 0.14, 0.11, 0.12, 0.13, 0.11,
        0.15, 0.14, 0.16, 0.15, 0.13, 0.17, 0.14, 0.15, 0.16, 0.14,
        
        # High difficulty examples (higher error probability) 
        0.20, 0.19, 0.21, 0.20, 0.18, 0.22, 0.19, 0.20, 0.21, 0.19,
        0.25, 0.24, 0.26, 0.25, 0.23, 0.27, 0.24, 0.25, 0.26, 0.24,
        
        # Very challenging examples
        0.30, 0.29, 0.31, 0.30, 0.28, 0.32, 0.29, 0.30, 0.31, 0.29,
        0.35, 0.34, 0.36, 0.35, 0.33, 0.37, 0.34, 0.35, 0.36, 0.34,
        
        # Additional samples to reach minimum required
        *[0.1 + 0.01 * (i % 20) for i in range(40)]
    ])
]

print(f"Created {len(sample_data)} sample examples")
print("First 5 examples:")
for i in range(5):
    print(f"  {sample_data[i]}")
print("...")
print("Last 5 examples:")
for i in range(-5, 0):
    print(f"  {sample_data[i]}")

## Sample Data

Instead of reading from external files, we'll create sample data inline. This represents examples with varying difficulty levels that influence the probability of errors occurring.
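
To make the role of `difficulty` concrete, the sketch below draws one Bernoulli error per example, with the error probability equal to the example's difficulty. It uses the standard-library `random` module and an arbitrary seed rather than the notebook's NumPy generator. `simulate_errors` is a hypothetical helper.

```python
import random

def simulate_errors(difficulties, rng):
    """Draw one Bernoulli outcome per example with P(error) = difficulty."""
    return [rng.random() < d for d in difficulties]

rng = random.Random(0)  # arbitrary seed, for repeatability of this sketch only

easy = simulate_errors([0.02] * 10_000, rng)
hard = simulate_errors([0.30] * 10_000, rng)

easy_rate = sum(easy) / len(easy)
hard_rate = sum(hard) / len(hard)
print(f"easy error rate: {easy_rate:.3f}, hard error rate: {hard_rate:.3f}")
```

With many draws, the observed error rate should settle near the difficulty value. This is the quantity the controller tries to bound from a much smaller sample.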

In [None]:
@dataclass
class DKWController:
    """DKW-guided fusion/fission controller."""
    epsilon_target: float = 0.10
    delta: float = 0.05
    min_samples: int = 100
    hysteresis: float = 0.05

    samples: list = field(default_factory=list)
    current_state: str = "fission"

    def dkw_epsilon(self, n: int) -> float:
        """Compute DKW epsilon for n samples."""
        if n < 2:
            return 1.0
        return np.sqrt(np.log(2 / self.delta) / (2 * n))

    def add_observation(self, error: float) -> None:
        """Add error observation for calibration."""
        self.samples.append(error)

    def decide(self) -> str:
        """Make fusion/fission decision with DKW guarantee."""
        n = len(self.samples)
        if n < self.min_samples:
            return self.current_state

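        # DKW half-width computed from the total number of observations so far
        epsilon = self.dkw_epsilon(n)
        # Empirical error over the most recent min_samples observations only;
        # note that this window is smaller than the n used for epsilon above
        empirical_error = np.mean(self.samples[-self.min_samples:])
        # Upper confidence bound on the error rate
        error_upper_bound = empirical_error + epsilon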

        if self.current_state == "fusion":
            if error_upper_bound > self.epsilon_target + self.hysteresis:
                self.current_state = "fission"
        else:
            if error_upper_bound < self.epsilon_target - self.hysteresis:
                self.current_state = "fusion"

        return self.current_state

# Create a controller instance for demonstration
controller = DKWController()
print(f"Initial state: {controller.current_state}")
print(f"Target error rate: {controller.epsilon_target}")
print(f"Confidence level: {1 - controller.delta}")
print(f"Minimum samples needed: {controller.min_samples}")

## DKW Controller Class

The core controller that implements the DKW-guided decision making logic. It maintains a history of error observations and uses statistical bounds to make fusion/fission decisions.
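
The switching logic can be viewed in isolation as a two-threshold (hysteresis) rule applied to an upper bound on the error rate. The sketch below assumes the default `epsilon_target=0.10` and `hysteresis=0.05`. `next_state` is a hypothetical standalone function, and the sequence of bounds is made up to exercise both transitions.

```python
def next_state(current: str, upper_bound: float,
               epsilon_target: float = 0.10, hysteresis: float = 0.05) -> str:
    """Apply the two-threshold rule to a single upper bound on the error rate."""
    if current == "fusion" and upper_bound > epsilon_target + hysteresis:
        return "fission"  # bound too high: fall back to the conservative strategy
    if current == "fission" and upper_bound < epsilon_target - hysteresis:
        return "fusion"   # bound comfortably low: switch to the aggressive strategy
    return current        # inside the hysteresis band: keep the current state

state = "fission"
trace = []
for bound in [0.12, 0.04, 0.08, 0.14, 0.16, 0.08]:  # made-up bounds
    state = next_state(state, bound)
    trace.append(state)
print(trace)
```

Bounds between the two thresholds never cause a switch, so the state depends on history as well as on the current bound. This is what damps rapid toggling.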

In [None]:
"""Required imports for the DKW Controller."""
import json
import numpy as np
from dataclasses import dataclass, field
import matplotlib.pyplot as plt

# Set random seed for reproducible results
np.random.seed(42)

## Overview

The **Dvoretzky-Kiefer-Wolfowitz (DKW) inequality** bounds, with a chosen probability, the largest gap between an empirical distribution function and the true one, using only the sample size. In this implementation:

- **Fusion**: Aggressive strategy that may have higher error rates but better performance
- **Fission**: Conservative strategy with lower error rates but potentially slower performance
- **DKW Controller**: Uses statistical guarantees to switch between strategies based on observed error rates
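
For reference, the bound can be written out explicitly. For $n$ independent, identically distributed samples with empirical distribution function $F_n$ and true distribution function $F$, the DKW inequality (with Massart's constant) states

$$
P\left(\sup_x \lvert F_n(x) - F(x) \rvert > \varepsilon\right) \le 2 e^{-2 n \varepsilon^2}.
$$

Setting the right-hand side equal to $\delta$ and solving for $\varepsilon$ gives the half-width that `dkw_epsilon` computes:

$$
\varepsilon(n, \delta) = \sqrt{\frac{\ln(2/\delta)}{2n}}.
$$

The guarantee assumes identically distributed samples. When example difficulty drifts over time, as in the sample data here, it is better read as a heuristic than as a strict guarantee.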

### Key Parameters:
- `epsilon_target`: Target error rate threshold
- `delta`: Failure probability of the bound (the bound holds with confidence 1-δ)
- `min_samples`: Minimum samples before making decisions
- `hysteresis`: Buffer to prevent rapid switching between states
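
To get a feel for how `delta` and the sample count interact, the sketch below evaluates the DKW half-width for a few sample sizes. It mirrors the formula in the controller's `dkw_epsilon` method but is written as a standalone function with only the standard library. The sample sizes are arbitrary.

```python
import math

def dkw_epsilon(n: int, delta: float) -> float:
    """Half-width of the DKW band for n samples at failure probability delta."""
    if n < 2:
        return 1.0  # same guard as the controller: no useful bound from one sample
    return math.sqrt(math.log(2 / delta) / (2 * n))

for n in (50, 100, 500, 1000):
    print(f"n={n:5d}  epsilon={dkw_epsilon(n, delta=0.05):.4f}")
```

The half-width shrinks in proportion to $1/\sqrt{n}$, so halving it takes four times as many observations.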

# DKW Controller Implementation

**Artifact ID:** experiment_001  
**Name:** method.py

This notebook demonstrates a DKW-guided fusion/fission controller implementation. The controller uses the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality to make statistically-sound decisions about when to use fusion vs fission strategies based on error observations.