# Game Theory Deep Learning Experiments

This notebook trains neural networks to play several game-theoretic setups and analyzes the equilibria they reach.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import torch

%load_ext autoreload
%autoreload 2


from game_theory import *
from models import *
from experiments import *

## 1. Basic Game Setups

In [2]:
# Create different game instances
pd_game = prisoners_dilemma()
coord_game = coordination_game()
mp_game = matching_pennies()

print("Prisoner's Dilemma payoffs:")
print("Player 1:", pd_game.payoff_p1)
print("Player 2:", pd_game.payoff_p2)

Prisoner's Dilemma payoffs:
Player 1: [[3 0]
 [5 1]]
Player 2: [[3 5]
 [0 1]]
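As a reference point for the training runs, the pure-strategy Nash equilibria of a bimatrix game can be read directly off the payoff matrices: a cell is an equilibrium when each player's action is a best response to the other's. The sketch below assumes rows index Player 1's actions and columns index Player 2's, and that action 0 is "cooperate" and action 1 is "defect" (inferred from the payoff values, not confirmed by the `game_theory` module). `pure_nash_equilibria` is a hypothetical helper, not part of the notebook's imports.

```python
import numpy as np

def pure_nash_equilibria(payoff_p1, payoff_p2):
    """Return all (row, col) cells where both actions are mutual best responses."""
    payoff_p1 = np.asarray(payoff_p1)
    payoff_p2 = np.asarray(payoff_p2)
    # Player 1 picks the row, so compare within each column.
    best_p1 = payoff_p1 == payoff_p1.max(axis=0, keepdims=True)
    # Player 2 picks the column, so compare within each row.
    best_p2 = payoff_p2 == payoff_p2.max(axis=1, keepdims=True)
    return [(int(i), int(j)) for i, j in np.argwhere(best_p1 & best_p2)]

# Payoff matrices copied from the output above.
pd_p1 = np.array([[3, 0], [5, 1]])
pd_p2 = np.array([[3, 5], [0, 1]])
print(pure_nash_equilibria(pd_p1, pd_p2))  # [(1, 1)]
```

Under the assumed action labels, the only pure equilibrium is mutual defection, with payoffs (1, 1).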


## 2. Train MLP Players

In [3]:
# Train on Prisoner's Dilemma
print("Training on Prisoner's Dilemma...")
p1, p2 = train_players(pd_game, n_episodes=1000)

# Evaluate final strategies
results = evaluate_equilibrium(pd_game, p1, p2)
print("\nFinal Results:")
print(f"Player 1 strategy: {results['strategy_p1']}")
print(f"Player 2 strategy: {results['strategy_p2']}")
print(f"Is Nash Equilibrium: {results['is_nash']}")

Training on Prisoner's Dilemma...
Episode 0: P1=2.215, P2=2.332
Episode 100: P1=3.000, P2=3.000
Episode 200: P1=3.000, P2=3.000
Episode 300: P1=3.000, P2=3.000
Episode 400: P1=3.000, P2=3.000
Episode 500: P1=3.000, P2=3.000
Episode 600: P1=3.000, P2=3.000
Episode 700: P1=3.000, P2=3.000
Episode 800: P1=3.000, P2=3.000
Episode 900: P1=3.000, P2=3.000

Final Results:
Player 1 strategy: [1.0000000e+00 8.9574576e-10]
Player 2 strategy: [1.0000000e+00 1.3644568e-08]
Is Nash Equilibrium: False
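Both learned strategies put nearly all probability on the first action, and the check reports that this profile is not a Nash equilibrium. One way to see why is to measure how much each player could gain by switching unilaterally to a best response. The sketch below is a minimal illustration using the payoff matrices printed earlier; `nash_gaps` is a hypothetical helper and may differ from what `evaluate_equilibrium` does internally.

```python
import numpy as np

def nash_gaps(payoff_p1, payoff_p2, s1, s2):
    """Gain each player gets from the best unilateral deviation (0 at a Nash equilibrium)."""
    payoff_p1 = np.asarray(payoff_p1, dtype=float)
    payoff_p2 = np.asarray(payoff_p2, dtype=float)
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    u1 = s1 @ payoff_p1 @ s2  # Player 1's current expected payoff
    u2 = s1 @ payoff_p2 @ s2  # Player 2's current expected payoff
    gap1 = (payoff_p1 @ s2).max() - u1  # best pure row against s2
    gap2 = (s1 @ payoff_p2).max() - u2  # best pure column against s1
    return float(gap1), float(gap2)

pd_p1 = np.array([[3, 0], [5, 1]])
pd_p2 = np.array([[3, 5], [0, 1]])

# Learned profile, rounded: both players choose action 0 with probability 1.
print(nash_gaps(pd_p1, pd_p2, [1, 0], [1, 0]))  # (2.0, 2.0)
# Both players choosing action 1 leaves no profitable deviation.
print(nash_gaps(pd_p1, pd_p2, [0, 1], [0, 1]))  # (0.0, 0.0)
```

Each player could raise their payoff from 3 to 5 by deviating, which is consistent with the `False` reported above.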


## 3. Matrix Games with RL Agents

We create arbitrary matrix games and train the agents with RL methods.

In [4]:
from models import RLAgent, train_rl_agents
from game_theory import MatrixGame

# Create a 3x3 matrix game
payoff_p1 = np.array([
    [5, 4, 3],
    [4, 3, 2],
    [3, 2, 10]
])

payoff_p2 = np.array([
    [4, 3, 2],
    [3, 2, 3],
    [2, 3, 10]
])

matrix_game = MatrixGame(payoff_p1, payoff_p2)

print("Player 1 payoff matrix:")
print(payoff_p1)
print("\nPlayer 2 payoff matrix:")
print(payoff_p2)

Player 1 payoff matrix:
[[ 5  4  3]
 [ 4  3  2]
 [ 3  2 10]]

Player 2 payoff matrix:
[[ 4  3  2]
 [ 3  2  3]
 [ 2  3 10]]
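The training calls in this notebook take `epsilon1` and `epsilon2` arguments that control exploration. The internals of `RLAgent` are not shown here, so the following is only a sketch of the common epsilon-greedy rule those names suggest: with probability epsilon the agent picks a uniformly random action, otherwise it picks the action with the highest current value estimate. The function name and the example values are assumptions made for illustration.

```python
import numpy as np

def epsilon_greedy(action_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(action_values)))
    return int(np.argmax(action_values))

rng = np.random.default_rng(0)
values = np.array([1.0, 0.5, 2.0])  # hypothetical value estimates for 3 actions

# With epsilon = 0.2 the greedy action is chosen about 0.8 + 0.2/3 of the time.
picks = [epsilon_greedy(values, 0.2, rng) for _ in range(10_000)]
freq = np.bincount(picks, minlength=3) / len(picks)
print(freq)

# With epsilon = 0 the agent never explores.
greedy_only = {epsilon_greedy(values, 0.0, rng) for _ in range(100)}
print(greedy_only)  # {2}
```

An agent with epsilon set to 0 commits to whatever currently looks best, while a positive epsilon keeps sampling the other actions throughout training.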


In [5]:
# Train RL agents with different exploration settings
print("Training RL agents...")
# Agent 1 without exploration, agent 2 with exploration
agent1, agent2, history = train_rl_agents(matrix_game, n_episodes=3000, epsilon1=0.0, epsilon2=0.2)

# Get the final strategies
final_probs_p1 = agent1.get_action_probs()
final_probs_p2 = agent2.get_action_probs()

print("\nFinal strategies:")
print(f"Player 1: {final_probs_p1}")
print(f"Player 2: {final_probs_p2}")

# Expected payoffs
exp_payoff1, exp_payoff2 = matrix_game.get_expected_payoffs(final_probs_p1, final_probs_p2)
print(f"\nExpected payoffs: P1={exp_payoff1:.3f}, P2={exp_payoff2:.3f}")

Training RL agents...
Episode 0: Avg rewards P1=3.000, P2=2.000
Episode 1000: Avg rewards P1=9.090, P2=9.110
Episode 2000: Avg rewards P1=9.000, P2=9.050

Final strategies:
Player 1: [0.00254804 0.00153028 0.9959216 ]
Player 2: [0.0188419  0.0352649  0.94589317]

Expected payoffs: P1=9.558, P2=9.573
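For mixed strategies, the expected payoff of a bimatrix game is the bilinear form of the two strategy vectors with each payoff matrix. The sketch below recomputes the expected payoffs from the final strategies printed above. It assumes that `get_expected_payoffs` follows this standard definition, which is not confirmed by the notebook; `expected_payoffs` is a hypothetical stand-in.

```python
import numpy as np

def expected_payoffs(payoff_p1, payoff_p2, s1, s2):
    """Expected payoffs when Player 1 mixes rows with s1 and Player 2 mixes columns with s2."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    return float(s1 @ payoff_p1 @ s2), float(s1 @ payoff_p2 @ s2)

payoff_p1 = np.array([[5, 4, 3], [4, 3, 2], [3, 2, 10]])
payoff_p2 = np.array([[4, 3, 2], [3, 2, 3], [2, 3, 10]])

# Final strategies copied from the training output.
s1 = [0.00254804, 0.00153028, 0.9959216]
s2 = [0.0188419, 0.0352649, 0.94589317]

u1, u2 = expected_payoffs(payoff_p1, payoff_p2, s1, s2)
print(f"P1={u1:.3f}, P2={u2:.3f}")  # P1=9.558, P2=9.573
```

Both agents concentrate on the third action, so the expected payoffs sit just below the (10, 10) cell.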


## 4. Experiment with Different Exploration Strategies

In [6]:
# Compare different exploration combinations
strategies = [
    (0.0, 0.0),  # Neither agent explores
    (0.0, 0.2),  # Only the second agent explores
    (0.2, 0.0),  # Only the first agent explores
    (0.1, 0.1)   # Both agents explore
]

results = {}

for i, (eps1, eps2) in enumerate(strategies):
    print(f"\nStrategy {i+1}: eps1={eps1}, eps2={eps2}")

    agent1, agent2, _ = train_rl_agents(matrix_game, n_episodes=2000,
                                        epsilon1=eps1, epsilon2=eps2)

    probs1 = agent1.get_action_probs()
    probs2 = agent2.get_action_probs()

    exp_p1, exp_p2 = matrix_game.get_expected_payoffs(probs1, probs2)

    results[(eps1, eps2)] = {
        'probs1': probs1,
        'probs2': probs2,
        'payoffs': (exp_p1, exp_p2),
        'total': exp_p1 + exp_p2
    }

    print(f"Payoffs: P1={exp_p1:.3f}, P2={exp_p2:.3f}, Total={exp_p1+exp_p2:.3f}")

# Find the best strategy
best_strategy = max(results.items(), key=lambda x: x[1]['total'])
print(f"\nBest strategy: eps1={best_strategy[0][0]}, eps2={best_strategy[0][1]}")
print(f"Total payoff: {best_strategy[1]['total']:.3f}")


Strategy 1: eps1=0.0, eps2=0.0
Episode 0: Avg rewards P1=4.000, P2=3.000
Episode 1000: Avg rewards P1=9.720, P2=9.700
Payoffs: P1=9.937, P2=9.937, Total=19.874

Strategy 2: eps1=0.0, eps2=0.2
Episode 0: Avg rewards P1=2.000, P2=3.000
Episode 1000: Avg rewards P1=8.790, P2=8.810
Payoffs: P1=9.609, P2=9.619, Total=19.229

Strategy 3: eps1=0.2, eps2=0.0
Episode 0: Avg rewards P1=5.000, P2=4.000
Episode 1000: Avg rewards P1=8.270, P2=8.130
Payoffs: P1=9.654, P2=9.642, Total=19.296

Strategy 4: eps1=0.1, eps2=0.1
Episode 0: Avg rewards P1=3.000, P2=2.000
Episode 1000: Avg rewards P1=8.720, P2=8.750
Payoffs: P1=9.630, P2=9.633, Total=19.263

Best strategy: eps1=0.0, eps2=0.0
Total payoff: 19.874
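To put these totals in context, it helps to know the largest total payoff the game allows. The expected total under any pair of mixed strategies is a weighted average of the per-cell totals, so it cannot exceed the largest cell of the summed payoff matrices. The sketch below computes that bound from the matrices defined in section 3; it is an illustrative check, not part of the `experiments` module.

```python
import numpy as np

payoff_p1 = np.array([[5, 4, 3], [4, 3, 2], [3, 2, 10]])
payoff_p2 = np.array([[4, 3, 2], [3, 2, 3], [2, 3, 10]])

# Total payoff of each pure action profile.
total = payoff_p1 + payoff_p2

# Locate the cell with the highest combined payoff.
best_cell = tuple(int(k) for k in np.unravel_index(np.argmax(total), total.shape))
print(best_cell, int(total[best_cell]))  # (2, 2) 20
```

The best run's total of 19.874 is therefore within about 0.13 of the maximum of 20, reached when both players always choose the third action.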
