# Adaptive Operator Selection (AOS)

Adaptive Operator Selection (AOS) lets the algorithm choose its variation operators (crossover, mutation) from a pool at run time, based on how well each one has performed so far. This helps on difficult problems where the best operator is not known in advance or changes between stages of the search.

In this notebook, we will:
1.  Configure NSGA-II with an **Operator Pool** containing different crossover/mutation strategies.
2.  Enable AOS with an **Epsilon-Greedy** policy to balance exploration and exploitation.
3.  Run the optimization and visualize how often each operator is selected over time.

In [None]:
import shutil
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from vamos.optimize import optimize, OptimizeConfig, NSGAIIConfig

# Create a directory for outputs
output_dir = Path("results/aos_demo")
if output_dir.exists():
    shutil.rmtree(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)

## Configuration

We will define a pool of two distinct operator combinations:
1.  **SBX + Polynomial Mutation** (Standard NSGA-II)
2.  **BLX-Alpha + Gaussian Mutation** (Alternative strategy)

The AOS method is set to `epsilon_greedy` with `epsilon=0.2`, meaning 20% of the time it chooses a random operator, and 80% of the time it chooses the best-performing one based on recent rewards.
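
The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the `vamos` implementation: the class `EpsilonGreedySelector`, its method names, the sliding-window mean used as the quality estimate, and the fixed per-operator rewards are all assumptions made for the example. The parameters mirror the `epsilon`, `window_size`, and `min_usage` settings used in this notebook.

```python
import random
from collections import deque


class EpsilonGreedySelector:
    """Illustrative epsilon-greedy selector over a fixed operator pool (hypothetical)."""

    def __init__(self, n_ops, epsilon=0.2, window_size=50, min_usage=1, seed=None):
        self.n_ops = n_ops
        self.epsilon = epsilon
        self.min_usage = min_usage
        self.usage = [0] * n_ops
        # One bounded reward history per operator (sliding window).
        self.rewards = [deque(maxlen=window_size) for _ in range(n_ops)]
        self.rng = random.Random(seed)

    def mean_reward(self, op):
        history = self.rewards[op]
        return sum(history) / len(history) if history else 0.0

    def select(self):
        # Use every operator at least `min_usage` times before exploiting.
        for op in range(self.n_ops):
            if self.usage[op] < self.min_usage:
                return op
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_ops)  # explore: uniform over the pool
        return max(range(self.n_ops), key=self.mean_reward)  # exploit: best recent mean

    def update(self, op, reward):
        self.usage[op] += 1
        self.rewards[op].append(reward)


# Hypothetical fixed rewards: operator 1 is the better one.
fixed_reward = [0.2, 0.6]
selector = EpsilonGreedySelector(n_ops=2, epsilon=0.2, window_size=50, seed=0)
counts = [0, 0]
for _ in range(2000):
    op = selector.select()
    selector.update(op, fixed_reward[op])
    counts[op] += 1
print(counts)
```

Under these assumptions, exploration draws uniformly from the whole pool, so with two operators the currently best one is picked with probability 0.8 + 0.2/2 = 0.9 once both have been tried. Real rewards are noisy, which is why a bounded window of recent rewards is a common choice for the estimate.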

In [None]:
problem_name = "zdt4"  # ZDT4 is multimodal and sensitive to operator choice
pop_size = 200
max_evals = 20000

aos_config = {
    "enabled": True,
    "method": "epsilon_greedy",
    "reward_scope": "survival",  # Reward based on survival of offspring
    "epsilon": 0.2,
    "min_usage": 1,
    "window_size": 50,  # Moving window for reward calculation
    "operator_pool": [
        # Operator 0: Standard SBX + PM
        {
            "crossover": ["sbx", {"prob": 0.9, "eta": 20}],
            "mutation": ["pm", {"prob": "1/n", "eta": 20, "var_type": "real"}]
        },
        # Operator 1: BLX-Alpha + Gaussian
        {
            "crossover": ["blx_alpha", {"prob": 0.9, "alpha": 0.5}],
            "mutation": ["gaussian", {"prob": "1/n", "sigma": 0.1, "var_type": "real"}]
        }
    ]
}

config = OptimizeConfig(
    problem=problem_name,
    algorithm=NSGAIIConfig()
        .pop_size(pop_size)
        .adaptive_operator_selection(aos_config)
        .result_mode("non_dominated"),
    max_evaluations=max_evals,
    seed=42,
    output_root=str(output_dir)  # Important: AOS trace is currently file-based
)

## Execution

We now run the optimization. Because `output_root` is set, `vamos` writes `aos_trace.csv` to the results directory.

In [None]:
print(f"Running optimization on {problem_name}...")
result = optimize(config)
print("Optimization complete.")
print(f"Solutions found: {len(result.F)}")

## Visualizing Operator Dynamics

We load the `aos_trace.csv` file to analyze which operators were chosen throughout the optimization process.

In [None]:
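# Locate the run directory for this seed; fall back to output_dir if none matches,
# so that trace_path is always defined and the check below fails gracefully.
run_dirs = sorted(output_dir.glob(f"*/nsgaii/numpy/seed_{config.seed}"))
run_dir = run_dirs[0] if run_dirs else output_dir
trace_path = run_dir / "aos_trace.csv"

if trace_path.exists():
    df_trace = pd.read_csv(trace_path)
    print("Trace data loaded.")
    print(df_trace.head())
else:
    print(f"Warning: aos_trace.csv not found under {run_dir}.")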

In [None]:
if trace_path.exists():
    # Label each row by operator name for the plot legends
    df_trace['op_label'] = df_trace['op_name']

    # Share of selections per operator within each bin of steps
    plt.figure(figsize=(12, 6))
    sns.histplot(data=df_trace, x='step', hue='op_label', multiple="fill", bins=50, palette="viridis")
    plt.title(f"Operator Selection Share over Time ({problem_name})")
    plt.xlabel("Generation / Step")
    plt.ylabel("Share of selections")
    plt.show()

    # Reward per step for each operator
    plt.figure(figsize=(12, 6))
    sns.lineplot(data=df_trace, x='step', y='reward', hue='op_label', alpha=0.3)
    plt.title("Operator Rewards over Time")
    plt.xlabel("Step")
    plt.ylabel("Reward (Survival contribution)")
    plt.show()

## Summary

The plots above show how the selection shifts between operators over the run. When one operator consistently earns higher rewards (for example, more of its offspring survive), the epsilon-greedy policy selects it more often, while the remaining exploration keeps the other operator in play. This lets the algorithm "learn" a search strategy suited to the specific problem landscape.
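
The trend can also be checked numerically rather than read off the plots. The sketch below assumes only the trace columns used above (`step`, `op_name`, `reward`). The helper names `summarize_operators` and `usage_share_by_phase`, the operator labels, and the small synthetic trace are hypothetical stand-ins so the example runs on its own. In the notebook, pass `df_trace` instead.

```python
import pandas as pd

# Synthetic stand-in for df_trace (operator names and rewards are made up).
trace = pd.DataFrame({
    "step": [0, 1, 2, 3, 4, 5, 6, 7],
    "op_name": ["sbx+pm", "blx+gauss", "blx+gauss", "sbx+pm",
                "sbx+pm", "sbx+pm", "blx+gauss", "sbx+pm"],
    "reward": [0.5, 0.1, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8],
})


def summarize_operators(df):
    """Per-operator usage count, usage share, and mean reward."""
    summary = df.groupby("op_name").agg(
        uses=("step", "size"),
        mean_reward=("reward", "mean"),
    )
    summary["share"] = summary["uses"] / summary["uses"].sum()
    return summary.sort_values("share", ascending=False)


def usage_share_by_phase(df, n_phases=2):
    """Usage share of each operator within equal-width phases of the run."""
    phase = pd.cut(df["step"], bins=n_phases, labels=False).rename("phase")
    return pd.crosstab(phase, df["op_name"], normalize="index")


print(summarize_operators(trace))
print(usage_share_by_phase(trace))
```

If adaptation is working, the operator with the higher mean reward should hold a larger usage share in later phases than in earlier ones.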