# 🎯 Adaptive Strategy Game

*Explore multi-dimensional strategy alignment through cooperative gameplay*

This notebook implements:
- Multi-dimensional strategy states instead of scalar trust values
- Adaptive cooperation mechanisms based on state alignment
- Interpretable visualization of strategy evolution

The design draws on Multi-Objective Reinforcement Learning (MORL) and on research into dynamic preference adjustment.

In [None]:
# Install required packages
!pip install plotly pandas numpy ipywidgets -q

In [None]:
#@title 🎯 Adaptive Strategy Game { display-mode: "form" }
#@markdown Work with AI to reach target states through coordinated strategies!

import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import ipywidgets as widgets
from IPython.display import display, clear_output, HTML
import pandas as pd

# Strategy-inspired colors
STRATEGY_COLORS = {
    'human': '#FF6B6B',         # Warm red
    'ai': '#4ECDC4',           # Cool teal
    'aligned': '#95E1D3',      # Mint green - high alignment
    'divergent': '#F38181',    # Coral - low alignment
    'neutral': '#3D5A80',      # Deep blue
    'target': '#FFE66D'        # Golden - target state
}

class StrategyState:
    """Represents a multi-dimensional strategy state"""
    def __init__(self, exploration=0.5, exploitation=0.5, adaptation=0.5):
        """
        Initialize strategy state with three dimensions:
        - exploration: tendency to try new approaches
        - exploitation: tendency to use known good strategies  
        - adaptation: rate of strategy adjustment
        """
        self.exploration = exploration
        self.exploitation = exploitation
        self.adaptation = adaptation
        
    def merge_with(self, other, weight=0.5):
        """Merge two strategy states with configurable weighting"""
        return StrategyState(
            exploration=weight * self.exploration + (1-weight) * other.exploration,
            exploitation=weight * self.exploitation + (1-weight) * other.exploitation,
            adaptation=weight * self.adaptation + (1-weight) * other.adaptation
        )
    
    def alignment_with(self, other):
        """Calculate alignment between strategy states (0-1)"""
        diff = np.sqrt(
            (self.exploration - other.exploration)**2 +
            (self.exploitation - other.exploitation)**2 +
            (self.adaptation - other.adaptation)**2
        )
        return 1 - (diff / np.sqrt(3))  # Normalize to [0,1]
    
    def as_vector(self):
        """Convert to numpy vector for calculations"""
        return np.array([self.exploration, self.exploitation, self.adaptation])
    
    def distance_to(self, other):
        """Euclidean distance to another state"""
        return np.linalg.norm(self.as_vector() - other.as_vector())

class AdaptiveStrategyGame:
    """Cooperative strategy alignment game"""
    def __init__(self):
        # Initialize level first, before calling generate_target()
        self.level = 1
        self.score = 0
        self.history = []
        
        # Initialize with complementary starting states
        self.human_state = StrategyState(0.7, 0.3, 0.5)  # Explorer
        self.ai_state = StrategyState(0.3, 0.7, 0.5)     # Exploiter
        self.target_state = self.generate_target()
        
        # Nowak's cooperation mechanisms tracking
        self.cooperation_metrics = {
            'direct_reciprocity': 0.5,
            'indirect_reciprocity': 0.5,
            'spatial_structure': 0.5
        }
        
    def generate_target(self):
        """Create target strategy state for current level"""
        # Targets cluster more tightly around 0.5 as levels increase.
        # Note: np.random.normal takes a standard deviation, not a variance.
        std_dev = 0.3 / np.sqrt(self.level)

        # Ensure targets are achievable (not at extremes)
        return StrategyState(
            exploration=np.clip(np.random.normal(0.5, std_dev), 0.2, 0.8),
            exploitation=np.clip(np.random.normal(0.5, std_dev), 0.2, 0.8),
            adaptation=np.clip(np.random.normal(0.5, std_dev), 0.2, 0.8)
        )
    
    def apply_action(self, human_action, ai_action):
        """Apply one turn of actions, score the merged state, and return 'LEVEL_UP' or 'CONTINUE'"""
        # Human actions affect different dimensions
        if human_action == 'explore':
            self.human_state.exploration = min(1, self.human_state.exploration + 0.1)
            self.human_state.exploitation = max(0, self.human_state.exploitation - 0.05)
        elif human_action == 'exploit':
            self.human_state.exploitation = min(1, self.human_state.exploitation + 0.1)
            self.human_state.exploration = max(0, self.human_state.exploration - 0.05)
        elif human_action == 'adapt':
            self.human_state.adaptation = min(1, self.human_state.adaptation + 0.1)
        elif human_action == 'balance':
            # Move all dimensions toward center
            self.human_state.exploration = 0.9 * self.human_state.exploration + 0.1 * 0.5
            self.human_state.exploitation = 0.9 * self.human_state.exploitation + 0.1 * 0.5
            self.human_state.adaptation = 0.9 * self.human_state.adaptation + 0.1 * 0.5
            
        # AI actions with strategic focus
        if ai_action == 'analyze':
            self.ai_state.exploitation = min(1, self.ai_state.exploitation + 0.1)
            self.ai_state.exploration = max(0, self.ai_state.exploration - 0.05)
        elif ai_action == 'innovate':
            self.ai_state.exploration = min(1, self.ai_state.exploration + 0.1)
            self.ai_state.exploitation = max(0, self.ai_state.exploitation - 0.05)
        elif ai_action == 'optimize':
            self.ai_state.adaptation = min(1, self.ai_state.adaptation + 0.1)
        elif ai_action == 'cooperate':
            # Move the AI 10% of the way toward the human state (cooperation)
            step = 0.1
            self.ai_state = self.ai_state.merge_with(self.human_state, 1 - step)
            
        # Merge the two states; the human's weight grows from 0.5 to 0.7 as alignment rises from 0 to 1
        alignment = self.human_state.alignment_with(self.ai_state)
        merged = self.human_state.merge_with(self.ai_state, 0.5 + 0.2 * alignment)
        
        # Update cooperation metrics
        self.update_cooperation_metrics(human_action, ai_action, alignment)
        
        # Score based on proximity to target
        target_distance = merged.distance_to(self.target_state)
        max_distance = np.sqrt(3)  # Maximum possible distance
        target_alignment = 1 - (target_distance / max_distance)
        
        # Points scale with level and alignment
        points = int(target_alignment * 100 * self.level * (1 + alignment))
        self.score += points
        
        # Record history
        self.history.append({
            'turn': len(self.history) + 1,
            'human_action': human_action,
            'ai_action': ai_action,
            'human_state': dict(vars(self.human_state)),
            'ai_state': dict(vars(self.ai_state)),
            'merged_state': dict(vars(merged)),
            'alignment': alignment,
            'target_match': target_alignment,
            'points': points,
            'cooperation_metrics': dict(self.cooperation_metrics)
        })
        
        # Level up when the merged state is close to the target and the two players are well aligned
        if target_alignment > 0.85 and alignment > 0.7:
            self.level += 1
            self.target_state = self.generate_target()
            return 'LEVEL_UP'
            
        return 'CONTINUE'
    
    def update_cooperation_metrics(self, human_action, ai_action, alignment):
        """Track cooperation mechanisms from Nowak's framework"""
        # Direct reciprocity: immediate cooperation
        if (human_action in ['adapt', 'balance'] and 
            ai_action in ['cooperate', 'optimize']):
            self.cooperation_metrics['direct_reciprocity'] = min(1,
                self.cooperation_metrics['direct_reciprocity'] + 0.05)
        
        # Indirect reciprocity: reputation effects
        if alignment > 0.6:
            self.cooperation_metrics['indirect_reciprocity'] = min(1,
                self.cooperation_metrics['indirect_reciprocity'] + 0.02)
        
        # Spatial structure: local stability over the three previous turns
        # (history does not yet include the current turn)
        if len(self.history) >= 3:
            recent_alignments = [h['alignment'] for h in self.history[-3:]]
            if np.std(recent_alignments) < 0.1:  # Stable cooperation
                self.cooperation_metrics['spatial_structure'] = min(1,
                    self.cooperation_metrics['spatial_structure'] + 0.03)

class StrategyVisualizer:
    """Visualizations for strategy states and dynamics"""
    
    @staticmethod
    def create_strategy_radar(states_dict, title="Strategy States"):
        """Create radar chart of strategy dimensions"""
        categories = ['Exploration', 'Exploitation', 'Adaptation']
        
        fig = go.Figure()
        
        color_map = {
            'human': STRATEGY_COLORS['human'],
            'ai': STRATEGY_COLORS['ai'],
            'merged': STRATEGY_COLORS['aligned'],
            'target': STRATEGY_COLORS['target']
        }
        
        for name, state in states_dict.items():
            values = [state['exploration'], state['exploitation'], state['adaptation']]
            values.append(values[0])  # Complete the circle
            
            fig.add_trace(go.Scatterpolar(
                r=values,
                theta=categories + [categories[0]],
                fill='toself',
                fillcolor=color_map.get(name, STRATEGY_COLORS['neutral']),
                opacity=0.3,
                line=dict(color=color_map.get(name, STRATEGY_COLORS['neutral']), width=3),
                name=name.title()
            ))
        
        fig.update_layout(
            polar=dict(
                radialaxis=dict(
                    visible=True,
                    range=[0, 1],
                    tickfont=dict(size=12)
                ),
                angularaxis=dict(
                    tickfont=dict(size=14)
                )
            ),
            showlegend=True,
            title=title,
            height=400
        )
        
        return fig
    
    @staticmethod
    def create_alignment_flow(history):
        """Visualize alignment over time"""
        turns = [h['turn'] for h in history]
        alignments = [h['alignment'] for h in history]
        target_matches = [h['target_match'] for h in history]
        
        fig = go.Figure()
        
        # Alignment line
        fig.add_trace(go.Scatter(
            x=turns,
            y=alignments,
            mode='lines+markers',
            name='Human-AI Alignment',
            line=dict(color=STRATEGY_COLORS['aligned'], width=3),
            marker=dict(size=8)
        ))
        
        # Target match line
        fig.add_trace(go.Scatter(
            x=turns,
            y=target_matches,
            mode='lines+markers',
            name='Target Match',
            line=dict(color=STRATEGY_COLORS['target'], width=3, dash='dash'),
            marker=dict(size=8, symbol='star')
        ))
        
        # Color background by alignment level
        for turn, align in zip(turns, alignments):
            if align > 0.8:
                color = STRATEGY_COLORS['aligned']
            elif align > 0.5:
                color = STRATEGY_COLORS['neutral']
            else:
                color = STRATEGY_COLORS['divergent']
                
            fig.add_vrect(
                x0=turn-0.5, x1=turn+0.5,
                fillcolor=color,
                opacity=0.2,
                line_width=0
            )
        
        fig.update_layout(
            title="Strategy Alignment Evolution",
            xaxis_title="Turn",
            yaxis_title="Alignment Level",
            yaxis_range=[0, 1],
            height=400,
            plot_bgcolor='white'
        )
        
        return fig
    
    @staticmethod
    def create_cooperation_metrics(metrics):
        """Visualize Nowak's cooperation mechanisms"""
        fig = go.Figure()
        
        mechanisms = list(metrics.keys())
        values = list(metrics.values())
        
        fig.add_trace(go.Bar(
            x=mechanisms,
            y=values,
            marker_color=[STRATEGY_COLORS['aligned'], 
                         STRATEGY_COLORS['neutral'],
                         STRATEGY_COLORS['target']],
            text=[f'{v:.2f}' for v in values],
            textposition='auto'
        ))
        
        fig.update_layout(
            title="Cooperation Mechanisms (Nowak's Framework)",
            xaxis_title="Mechanism",
            yaxis_title="Strength",
            yaxis_range=[0, 1],
            height=300,
            plot_bgcolor='white'
        )
        
        return fig

# Initialize game
game = AdaptiveStrategyGame()
viz = StrategyVisualizer()

# UI Components
output = widgets.Output()
status = widgets.HTML(value=f"<h2>Level {game.level} | Score: {game.score}</h2>")

# Human action buttons with clear descriptions
human_actions = widgets.ToggleButtons(
    options=['explore', 'exploit', 'adapt', 'balance'],
    description='Your Action:',
    button_style='info',
    tooltips=[
        'Increase exploration tendency',
        'Focus on exploitation', 
        'Enhance adaptation rate',
        'Balance all dimensions'
    ]
)

# AI strategy selector
ai_strategy = widgets.RadioButtons(
    options=['adaptive', 'complementary', 'random'],
    value='adaptive',
    description='AI Strategy:',
)

play_button = widgets.Button(
    description='Execute Strategies',
    button_style='primary',
    icon='play'
)

def get_ai_action(strategy, human_action, game_state):
    """AI chooses action based on strategy"""
    if strategy == 'adaptive':
        # AI adapts to complement human choice
        if human_action == 'explore':
            return 'analyze'  # Balance exploration with analysis
        elif human_action == 'exploit':
            return 'innovate' # Counter exploitation with innovation
        elif human_action == 'adapt':
            return 'optimize' # Support adaptation with optimization
        else:  # balance
            return 'cooperate'
    elif strategy == 'complementary':
        # AI fills gaps in merged state
        merged = game_state.human_state.merge_with(game_state.ai_state)
        if merged.exploration < 0.4:
            return 'innovate'
        elif merged.exploitation < 0.4:
            return 'analyze'
        elif merged.adaptation < 0.4:
            return 'optimize'
        else:
            return 'cooperate'
    else:  # random
        return np.random.choice(['analyze', 'innovate', 'optimize', 'cooperate'])

def on_play_click(b):
    human_action = human_actions.value
    ai_action = get_ai_action(ai_strategy.value, human_action, game)
    
    result = game.apply_action(human_action, ai_action)
    
    with output:
        clear_output(wait=True)
        
        # Get latest state
        latest = game.history[-1]
        
        # Create visualizations
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=('Strategy States', 'Alignment Evolution', 
                          'Cooperation Mechanisms', 'Action Effects'),
            specs=[[{"type": "polar"}, {"type": "xy"}],
                   [{"type": "bar"}, {"type": "bar"}]],
            row_heights=[0.6, 0.4]
        )
        
        # Strategy radar
        states = {
            'human': latest['human_state'],
            'ai': latest['ai_state'],
            'merged': latest['merged_state'],
            'target': dict(vars(game.target_state))
        }
        
        radar_fig = viz.create_strategy_radar(states)
        for trace in radar_fig.data:
            fig.add_trace(trace, row=1, col=1)
            
        # Alignment flow
        flow_fig = viz.create_alignment_flow(game.history)
        for trace in flow_fig.data:
            fig.add_trace(trace, row=1, col=2)
            
        # Cooperation metrics
        coop_fig = viz.create_cooperation_metrics(latest['cooperation_metrics'])
        for trace in coop_fig.data:
            fig.add_trace(trace, row=2, col=1)
            
        # Action effects
        dimensions = ['Exploration', 'Exploitation', 'Adaptation']
        human_values = [latest['human_state']['exploration'],
                       latest['human_state']['exploitation'],
                       latest['human_state']['adaptation']]
        ai_values = [latest['ai_state']['exploration'],
                    latest['ai_state']['exploitation'],
                    latest['ai_state']['adaptation']]
        
        fig.add_trace(
            go.Bar(
                x=dimensions,
                y=human_values,
                name='Human',
                marker_color=STRATEGY_COLORS['human']
            ),
            row=2, col=2
        )
        
        fig.add_trace(
            go.Bar(
                x=dimensions,
                y=ai_values,
                name='AI',
                marker_color=STRATEGY_COLORS['ai']
            ),
            row=2, col=2
        )
        
        fig.update_layout(height=800, showlegend=False, title_text="Adaptive Strategy Dashboard")
        fig.show()
        
        # Status update
        status.value = f"""
        <h2>Level {game.level} | Score: {game.score}</h2>
        <p>Last Action - You: {human_action} | AI: {ai_action}</p>
        <p>Alignment: {latest['alignment']:.2f} | Target Match: {latest['target_match']:.2f}</p>
        <p>Points Earned: {latest['points']}</p>
        """
        
        if result == 'LEVEL_UP':
            print("🎉 LEVEL UP! New target strategy state generated!")
            print(f"New Level: {game.level}")
        
        # Strategic insights
        if latest['alignment'] > 0.8:
            print("💫 Excellent alignment! Your strategies complement each other perfectly.")
        elif latest['alignment'] > 0.5:
            print("🌊 Good coordination. Keep exploring complementary actions.")
        else:
            print("🌀 Low alignment. Try strategies that complement the AI's approach.")
            
        # Cooperation analysis
        print("\n📊 Cooperation Analysis:")
        for mechanism, value in latest['cooperation_metrics'].items():
            print(f"  {mechanism.replace('_', ' ').title()}: {value:.2f}")

play_button.on_click(on_play_click)

# Display UI
display(HTML("""
<h1>🎯 Adaptive Strategy Game</h1>
<p>Work with AI to reach target strategy states through coordinated actions!</p>
"""))
display(status)
display(widgets.HBox([human_actions, ai_strategy]))
display(play_button)
display(output)

#@markdown ---
#@markdown ### 🎮 How to Play:
#@markdown 1. **Choose your strategy action** (explore, exploit, adapt, balance)
#@markdown 2. **Select AI strategy** (adaptive, complementary, random)
#@markdown 3. **Execute strategies** and try to match the target state
#@markdown 4. **Level up** by reaching a target match above 0.85 while keeping human-AI alignment above 0.7
#@markdown 
#@markdown ### 🧪 Experiments:
#@markdown - Which AI strategy creates the most stable cooperation?
#@markdown - Can you maximize all three cooperation mechanisms?
#@markdown - What's the fastest path to reach level 10?

## 🧠 Understanding Multi-Dimensional Strategies

This game models strategy alignment through three key dimensions:

### Strategy Dimensions
- **Exploration**: Tendency to try new approaches and discover novel solutions
- **Exploitation**: Focus on using known effective strategies
- **Adaptation**: Rate at which strategies adjust to new information
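
To make the three dimensions concrete, the sketch below treats a state as a point in the unit cube and compares two such points. It mirrors the normalized Euclidean distance used by `StrategyState.alignment_with` in the game cell; the standalone `alignment` helper is an illustrative assumption, not part of the notebook's classes.

```python
import numpy as np

def alignment(a, b):
    """Alignment in [0, 1]: 1 minus the Euclidean distance divided by sqrt(3),
    the longest possible distance between two points in the unit cube."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - np.linalg.norm(a - b) / np.sqrt(3)

# (exploration, exploitation, adaptation), using the game's starting states
human = (0.7, 0.3, 0.5)  # explorer
ai = (0.3, 0.7, 0.5)     # exploiter

print(f"{alignment(human, ai):.3f}")  # 0.673
```

Identical states score 1.0, and opposite corners of the cube, such as `(0, 0, 0)` and `(1, 1, 1)`, score 0.0.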

### Cooperation Mechanisms (Nowak's Framework)
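1. **Direct Reciprocity**: Immediate mutual benefit from coordinated actions; the metric rises when a human `adapt` or `balance` is paired with an AI `cooperate` or `optimize`
2. **Indirect Reciprocity**: Building reputation through consistent alignment; the metric rises on every turn with alignment above 0.6
3. **Spatial Structure**: Creating stable patterns of cooperation; the metric rises when recent alignment values vary little (standard deviation below 0.1)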
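
The three metrics can be sketched as a single update step. The increments (0.05, 0.02, 0.03) and thresholds follow the game cell; the `update_metrics` name, the pure-function style, and the requirement of exactly three earlier alignment values are assumptions made for this illustration.

```python
import statistics

def update_metrics(metrics, human_action, ai_action, alignment, recent_alignments):
    """Return an updated copy of the metrics dict; every value is capped at 1.0."""
    updated = dict(metrics)

    # Direct reciprocity: both sides chose a cooperative action this turn.
    if human_action in ("adapt", "balance") and ai_action in ("cooperate", "optimize"):
        updated["direct_reciprocity"] = min(1.0, updated["direct_reciprocity"] + 0.05)

    # Indirect reciprocity: the current alignment is above 0.6.
    if alignment > 0.6:
        updated["indirect_reciprocity"] = min(1.0, updated["indirect_reciprocity"] + 0.02)

    # Spatial structure: the last three alignment values are stable.
    # pstdev is the population standard deviation, like np.std's default.
    if len(recent_alignments) >= 3 and statistics.pstdev(recent_alignments[-3:]) < 0.1:
        updated["spatial_structure"] = min(1.0, updated["spatial_structure"] + 0.03)

    return updated

start = {"direct_reciprocity": 0.5, "indirect_reciprocity": 0.5, "spatial_structure": 0.5}
result = update_metrics(start, "balance", "cooperate", 0.7, [0.70, 0.72, 0.71])
print({name: round(value, 2) for name, value in result.items()})
# {'direct_reciprocity': 0.55, 'indirect_reciprocity': 0.52, 'spatial_structure': 0.53}
```

The metrics only ever increase, so they measure how often each condition has been met rather than the current quality of cooperation.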

### Key Insights
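- Perfect alignment is not required to progress: leveling up needs human-AI alignment above 0.7 together with a target match above 0.85, although the score multiplier `(1 + alignment)` still rewards higher alignment
- Complementary strategies can still reach the target, because the target match is computed from the merged state rather than from either player's state alone
- Small adaptations accumulate into significant strategic shifts: each action moves a dimension by at most 0.1, so reaching a distant target takes several coordinated turns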
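
The last point can be illustrated with the AI's `cooperate` action, which moves the AI state 10% of the way toward the human state each turn. The sketch below is a simplified model that assumes the human state stays fixed and the AI cooperates every turn; it uses plain NumPy vectors rather than the notebook's `StrategyState` class.

```python
import numpy as np

human = np.array([0.7, 0.3, 0.5])  # (exploration, exploitation, adaptation)
ai = np.array([0.3, 0.7, 0.5])
step = 0.1  # fraction of the remaining gap closed per 'cooperate' turn

gaps = []
for _ in range(10):
    ai = (1 - step) * ai + step * human  # same blend as merge_with(human, 0.9)
    gaps.append(float(np.linalg.norm(human - ai)))

print([round(g, 3) for g in gaps[:3]])  # [0.509, 0.458, 0.412]
```

Because the gap shrinks by a constant factor of 0.9 per turn, ten turns leave roughly 35% of the original distance, so no single step looks dramatic but the cumulative shift is large.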

The radar chart overlays the human, AI, merged, and target states, which makes it easier to see which dimension of the merged state is furthest from the target and should be adjusted next.
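
The same reading can be done numerically. The sketch below assumes states are stored as plain dictionaries, as in the `history` entries of the game cell; the `dimension_gaps` and `largest_gap` helpers and the example target values are hypothetical and not part of the notebook.

```python
DIMENSIONS = ("exploration", "exploitation", "adaptation")

def dimension_gaps(merged, target):
    """Signed gap per dimension; positive means the merged state is below the target."""
    return {name: target[name] - merged[name] for name in DIMENSIONS}

def largest_gap(merged, target):
    """Return the dimension furthest from the target and its signed gap."""
    gaps = dimension_gaps(merged, target)
    name = max(gaps, key=lambda k: abs(gaps[k]))
    return name, gaps[name]

merged = {"exploration": 0.5, "exploitation": 0.5, "adaptation": 0.5}
target = {"exploration": 0.6, "exploitation": 0.45, "adaptation": 0.8}  # made-up target

name, gap = largest_gap(merged, target)
print(name, round(gap, 2))  # adaptation 0.3
```

A positive gap on adaptation suggests actions that raise it, such as `adapt` for the human or `optimize` for the AI.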