# DKW Controller Implementation - Interactive Demo

**Artifact ID:** experiment_001  
**Original File:** method.py

This notebook implements a **DKW-guided fusion/fission controller** based on the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality for statistical confidence bounds.

## Overview
- **DKW Controller**: Makes adaptive decisions between "fusion" and "fission" modes based on error observations
- **Statistical Guarantee**: Uses the DKW inequality to place a confidence bound on the empirical error rate
- **Self-contained**: All data is inlined - no external files required

## Key Features
- Adaptive decision making with statistical guarantees
- Hysteresis to prevent rapid switching
- Configurable parameters for different use cases

In [None]:
"""DKW Controller Implementation."""
import json
import numpy as np
import matplotlib.pyplot as plt
from dataclasses import dataclass, field

print("✓ Libraries imported successfully!")
print(f"NumPy version: {np.__version__}")

## DKW Controller Class

The `DKWController` uses the **Dvoretzky-Kiefer-Wolfowitz inequality** to provide statistical confidence bounds on empirical error rates.

### Key Parameters:
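- `epsilon_target`: Target error threshold (default: 0.10)
- `delta`: Failure probability of the DKW bound; the bound holds with probability at least 1 − `delta` (default: 0.05)
- `min_samples`: Minimum number of samples before any decision is made, and the size of the recent window used for the empirical error (default: 100)
- `hysteresis`: Margin around `epsilon_target` that the upper bound must cross before the state switches, which prevents oscillation (default: 0.05)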

### States:
- **Fusion**: Aggressive mode (lower latency, higher risk)
- **Fission**: Conservative mode (higher latency, lower risk)
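
To get a feel for how these parameters interact, here is a minimal sketch of the DKW margin formula used by the controller, sqrt(ln(2/δ) / (2n)), written as standalone functions. The helper `samples_needed` is a hypothetical name introduced only for this illustration; it is not part of the original script.

```python
import math

def dkw_epsilon(n: int, delta: float = 0.05) -> float:
    """Two-sided DKW margin for n samples: sqrt(ln(2/delta) / (2n))."""
    return math.sqrt(math.log(2 / delta) / (2 * n))

def samples_needed(margin: float, delta: float = 0.05) -> int:
    """Hypothetical helper: smallest n whose DKW margin is at most `margin`."""
    return math.ceil(math.log(2 / delta) / (2 * margin ** 2))

# The margin shrinks slowly, proportional to 1/sqrt(n).
for n in (100, 500, 1000):
    print(f"n={n:5d}  epsilon={dkw_epsilon(n):.3f}")

# With the defaults, switching to fusion requires an upper bound below
# epsilon_target - hysteresis = 0.05, so the margin alone must fall
# below 0.05 even when no errors are observed at all.
print("samples needed for margin 0.05:", samples_needed(0.05))
```

Under the default settings this works out to 738 samples, so a run with only 120 examples cannot leave the initial fission state. Lowering `hysteresis` or raising `epsilon_target` reduces the number of samples required.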

In [None]:
@dataclass
class DKWController:
    """DKW-guided fusion/fission controller."""
    epsilon_target: float = 0.10
    delta: float = 0.05
    min_samples: int = 100
    hysteresis: float = 0.05

    samples: list = field(default_factory=list)
    current_state: str = "fission"

    def dkw_epsilon(self, n: int) -> float:
        """Compute DKW epsilon for n samples."""
        if n < 2:
            return 1.0
        return np.sqrt(np.log(2 / self.delta) / (2 * n))

    def add_observation(self, error: float) -> None:
        """Add error observation for calibration."""
        self.samples.append(error)

    def decide(self) -> str:
        """Make fusion/fission decision with DKW guarantee."""
        n = len(self.samples)
        if n < self.min_samples:
            return self.current_state

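        # DKW margin computed from the total number of samples seen so far.
        epsilon = self.dkw_epsilon(n)
        # Empirical error over the most recent min_samples observations only.
        # Note: the margin above assumes n samples, so the bound is only
        # matched to this mean while n == min_samples.
        empirical_error = np.mean(self.samples[-self.min_samples:])
        error_upper_bound = empirical_error + epsilon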

        if self.current_state == "fusion":
            if error_upper_bound > self.epsilon_target + self.hysteresis:
                self.current_state = "fission"
        else:
            if error_upper_bound < self.epsilon_target - self.hysteresis:
                self.current_state = "fusion"

        return self.current_state

# Display controller parameters
print("DKW Controller initialized with default parameters:")
controller = DKWController()
print(f"- Target error rate (ε): {controller.epsilon_target}")
print(f"- Confidence parameter (δ): {controller.delta}")
print(f"- Minimum samples: {controller.min_samples}")
print(f"- Hysteresis: {controller.hysteresis}")
print(f"- Initial state: {controller.current_state}")

## Sample Dataset

The experiment requires input data with example IDs and difficulty levels. Below is the inline sample dataset that replaces the external JSON file dependency.

**Note**: The original script read from `../dataset_001/data_out.json`, but for self-containment, we create sample data with the expected structure.
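
If you want the generated difficulties to be identical on every run, one option is a seeded NumPy generator. The sketch below is an assumption-laden alternative, not the original script's approach: `make_sample_data` is a hypothetical helper, and it draws every difficulty at random rather than hand-specifying the first ten.

```python
import numpy as np

def make_sample_data(n_total: int = 120, seed: int = 0) -> list:
    """Hypothetical helper: build examples with seeded random difficulties."""
    rng = np.random.default_rng(seed)  # local generator, independent of np.random.seed
    return [
        {"id": f"example_{i:03d}", "difficulty": float(rng.uniform(0.01, 0.20))}
        for i in range(n_total)
    ]

data = make_sample_data()
print(len(data), data[0]["id"])
```

Because the generator is created inside the function, calling it twice with the same `seed` yields the same dataset regardless of any other random calls in the notebook.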

In [None]:
# Inline sample data (replaces external JSON file)
# Each example has an 'id' and a 'difficulty' level (probability of error)
sample_data = [
    {"id": "example_000", "difficulty": 0.05},
    {"id": "example_001", "difficulty": 0.03},  
    {"id": "example_002", "difficulty": 0.15},
    {"id": "example_003", "difficulty": 0.08},
    {"id": "example_004", "difficulty": 0.12},
    {"id": "example_005", "difficulty": 0.02},
    {"id": "example_006", "difficulty": 0.18},
    {"id": "example_007", "difficulty": 0.06},
    {"id": "example_008", "difficulty": 0.09},
    {"id": "example_009", "difficulty": 0.04},
    # Add more samples to reach min_samples threshold
] + [
    {"id": f"example_{i:03d}", "difficulty": np.random.uniform(0.01, 0.20)}
    for i in range(10, 120)  # Generate 110 more samples for testing
]

print(f"Created {len(sample_data)} sample data points")
print("Sample entries:")
for i in range(3):
    print(f"  {sample_data[i]}")
print("...")
print(f"  {sample_data[-1]}")

## Experiment Function

The experiment compares two approaches:
- **Baseline**: Always uses "fission" (conservative approach)
- **Proposed**: Uses the DKW controller for adaptive decision making

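For each example, an error is simulated from its difficulty level, and each method records its decision. The simulated error does not depend on the decision, so both methods log the same errors and differ only in the decisions they make.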

In [None]:
def run_experiment(data):
    """Run DKW controller experiment with inline data."""
    controller = DKWController()
    results = {"baseline": [], "proposed": []}

    for example in data:
        # Simulate error occurrence based on difficulty
        error = np.random.random() < example["difficulty"]
        controller.add_observation(float(error))
        decision = controller.decide()

        results["proposed"].append({
            "id": example["id"],
            "decision": decision,
            "error": error,
        })
        results["baseline"].append({
            "id": example["id"],
            "decision": "fission",  # Always conservative
            "error": error,
        })

    return results

print("Experiment function defined successfully!")

## Running the Experiment

Now let's execute the experiment with our sample data and analyze the results.

In [None]:
# Run the experiment
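# Seed the error simulation for reproducible results.
# Note: sample_data was generated before this call, so its random
# difficulties are not covered by this seed.
np.random.seed(42)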
results = run_experiment(sample_data)

# Display basic statistics
proposed_results = results["proposed"]
baseline_results = results["baseline"]

print("=== Experiment Results ===")
print(f"Total examples processed: {len(proposed_results)}")
print(f"Baseline (always fission) errors: {sum(1 for r in baseline_results if r['error'])}")
print(f"Proposed (DKW controller) errors: {sum(1 for r in proposed_results if r['error'])}")

print("\nDecision distribution:")
print(f"Baseline - Fission: {sum(1 for r in baseline_results if r['decision'] == 'fission')}")
print(f"Baseline - Fusion: {sum(1 for r in baseline_results if r['decision'] == 'fusion')}")
print(f"Proposed - Fission: {sum(1 for r in proposed_results if r['decision'] == 'fission')}")
print(f"Proposed - Fusion: {sum(1 for r in proposed_results if r['decision'] == 'fusion')}")

print(f"\nFirst 5 results from proposed method:")
for i in range(5):
    r = proposed_results[i]
    print(f"  {r['id']}: {r['decision']} -> Error: {r['error']}")

## Analysis and Visualization

Let's analyze the controller behavior over time and visualize how decisions change as more samples are collected.

In [None]:
# Analyze controller behavior over time
# Extract decision timeline
decisions = [r['decision'] for r in proposed_results]
errors = [r['error'] for r in proposed_results]
indices = list(range(len(decisions)))

# Convert decisions to numeric for plotting
decision_values = [1 if d == 'fusion' else 0 for d in decisions]

# Calculate running error rate
running_errors = []
running_error_rate = []
error_count = 0
for i, error in enumerate(errors):
    error_count += int(error)
    running_errors.append(error_count)
    running_error_rate.append(error_count / (i + 1))

# Create visualization
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(12, 10))

# Plot 1: Decision timeline
ax1.plot(indices, decision_values, 'b-', linewidth=2, label='Decision (1=Fusion, 0=Fission)')
ax1.scatter([i for i, e in enumerate(errors) if e], 
           [decision_values[i] for i, e in enumerate(errors) if e], 
           color='red', s=30, label='Error occurred', alpha=0.7)
ax1.set_ylabel('Decision')
ax1.set_title('Controller Decisions Over Time')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot 2: Running error rate
ax2.plot(indices, running_error_rate, 'r-', linewidth=2, label='Running Error Rate')
ax2.axhline(y=0.10, color='g', linestyle='--', label='Target ε=0.10')
ax2.axhline(y=0.15, color='orange', linestyle='--', label='ε + hysteresis')
ax2.axhline(y=0.05, color='orange', linestyle='--', label='ε - hysteresis')
ax2.set_ylabel('Error Rate')
ax2.set_title('Running Error Rate vs Target')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Plot 3: Sample count and DKW epsilon
controller_test = DKWController()
sample_counts = list(range(1, len(proposed_results) + 1))
dkw_epsilons = [controller_test.dkw_epsilon(n) for n in sample_counts]

ax3.plot(sample_counts, dkw_epsilons, 'purple', linewidth=2, label='DKW ε(n)')
ax3.axhline(y=controller_test.epsilon_target, color='g', linestyle='--', label='Target ε')
ax3.set_xlabel('Sample Count')
ax3.set_ylabel('DKW Epsilon')
ax3.set_title('DKW Confidence Bound vs Sample Count')
ax3.legend()
ax3.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Summary statistics
fusion_decisions = sum(decision_values)
total_decisions = len(decision_values)
final_error_rate = running_error_rate[-1]

print(f"\n=== Final Summary ===")
print(f"Fusion decisions: {fusion_decisions}/{total_decisions} ({100*fusion_decisions/total_decisions:.1f}%)")
print(f"Final empirical error rate: {final_error_rate:.3f}")
print(f"Target error rate: {controller_test.epsilon_target}")
print(f"DKW epsilon at end: {dkw_epsilons[-1]:.3f}")

# Show results inline (stands in for the original output file)
output_data = {
    "baseline": results["baseline"][:3],  # Show first 3 for comparison with original
    "proposed": results["proposed"][:3]
}

print(f"\n=== Sample Output Data (first 3 examples) ===")
print(json.dumps(output_data, indent=2))

## Interactive Exploration

Try modifying the parameters to see how the DKW controller behaves:

### 🔧 Experiment Ideas:
1. **Adjust controller parameters**: Change `epsilon_target`, `delta`, `min_samples`, or `hysteresis`
2. **Modify the dataset**: Add more examples or change difficulty values
3. **Test edge cases**: What happens with very high or very low error rates?
4. **Analyze convergence**: How many samples does the controller need to stabilize?
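
As a starting point for the convergence question, here is a simplified, standalone re-implementation of the decision rule that reports when a stream of simulated errors first reaches fusion. The function `first_fusion_step` and its `error_rate` and `seed` parameters are assumptions made for this sketch; it uses a constant error probability instead of per-example difficulties.

```python
import math
import random

def first_fusion_step(error_rate, n_steps, epsilon_target=0.10, delta=0.05,
                      min_samples=100, hysteresis=0.05, seed=0):
    """Return the 1-based step at which the state first becomes 'fusion', or None."""
    rng = random.Random(seed)
    samples = []
    for step in range(1, n_steps + 1):
        # Simulate one Bernoulli error observation.
        samples.append(1.0 if rng.random() < error_rate else 0.0)
        n = len(samples)
        if n < min_samples:
            continue  # not enough data yet; stay in the initial fission state
        # Same rule as the controller: margin from n, mean over the recent window.
        epsilon = math.sqrt(math.log(2 / delta) / (2 * n))
        window = samples[-min_samples:]
        upper = sum(window) / len(window) + epsilon
        if upper < epsilon_target - hysteresis:
            return step
    return None

print(first_fusion_step(error_rate=0.0, n_steps=1000))
print(first_fusion_step(error_rate=0.0, n_steps=1000, hysteresis=0.0))
print(first_fusion_step(error_rate=1.0, n_steps=1000))
```

Even with no errors at all, the default settings need several hundred observations before fusion is reachable, and removing the hysteresis margin shortens that considerably. Try intermediate values of `error_rate` to see how noise in the recent window delays the switch.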

### 📚 Key Insights:
- The **DKW bound** adds a margin ε(n) to the empirical error rate; for i.i.d. observations, the true rate stays below that upper bound with probability at least 1 − δ
- **Hysteresis** prevents decision oscillation  
- The controller adapts to **empirical error patterns**
- **Fusion mode** enables efficiency gains when error rates are acceptable
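
The hysteresis point is easiest to see in isolation. Below is a minimal sketch of the switching rule as a pure function, fed a hand-picked sequence of upper-bound values. The function name `next_state` and the input values are illustrative assumptions, not output from the experiment.

```python
def next_state(state, upper_bound, target=0.10, hysteresis=0.05):
    """Apply the controller's switching rule to a single upper-bound value."""
    if state == "fusion" and upper_bound > target + hysteresis:
        return "fission"
    if state == "fission" and upper_bound < target - hysteresis:
        return "fusion"
    return state  # inside the dead band: keep the current state

state = "fission"
trace = []
for ub in [0.12, 0.08, 0.04, 0.08, 0.12, 0.16, 0.12]:
    state = next_state(state, ub)
    trace.append(state)

print(trace)
# ['fission', 'fission', 'fusion', 'fusion', 'fusion', 'fission', 'fission']
```

Values between 0.05 and 0.15 never trigger a switch, so an upper bound that hovers around the 0.10 target leaves the state unchanged instead of flipping on every step.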

### 🚀 Next Steps:
This notebook is fully self-contained and ready for experimentation. Modify any cell above and re-run to explore different scenarios!