# Understanding Nash equilibrium - GT
GT stands for game theory

This Jupyter Notebook provides a simple simulation of different strategies used to reach Nash equilibria. The accompanying Python code records the steps taken to build a personal conceptual understanding, with the aim of applying these ideas to reinforcement learning agent training.

Welcome to the journey!

---

## 1. Prisoner's Dilemma

A one-shot game. A player's strategy can be either:
- **Pure strategy:** A player chooses a specific action with certainty; each player has made a definite choice without randomization.
- **Mixed strategy:** A player assigns a probability to each possible action and then randomly selects an action based on these probabilities.

  >> In a one-shot game, players make their choices simultaneously and only once. There is no opportunity to adjust strategies based on the other player's past actions, as the game ends after this single round.

Possible strategies in iterated versions:
  --> Tit for tat, grim trigger, always defect, always cooperate.
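
The iterated strategies named above can be sketched as functions of the opponent's move history. This is a minimal illustration rather than part of the original notebook: the function names (`tit_for_tat`, `grim_trigger`, `play_iterated`) are assumptions, and the payoff values are the ones used in the code cells of this notebook.

```python
# Payoff table: (player 1 payoff, player 2 payoff) for each pair of moves.
PAYOFFS = {
    ('Cooperate', 'Cooperate'): (3, 3),
    ('Cooperate', 'Defect'): (0, 5),
    ('Defect', 'Cooperate'): (5, 0),
    ('Defect', 'Defect'): (1, 1),
}

def always_cooperate(opponent_history):
    return 'Cooperate'

def always_defect(opponent_history):
    return 'Defect'

def tit_for_tat(opponent_history):
    # Cooperate first, then copy the opponent's previous move.
    return opponent_history[-1] if opponent_history else 'Cooperate'

def grim_trigger(opponent_history):
    # Cooperate until the opponent defects once, then defect forever.
    return 'Defect' if 'Defect' in opponent_history else 'Cooperate'

def play_iterated(strategy1, strategy2, rounds):
    """Play `rounds` rounds; each strategy sees only the other's past moves."""
    history1, history2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        move1 = strategy1(history2)
        move2 = strategy2(history1)
        payoff1, payoff2 = PAYOFFS[(move1, move2)]
        total1 += payoff1
        total2 += payoff2
        history1.append(move1)
        history2.append(move2)
    return total1, total2

print(play_iterated(tit_for_tat, always_defect, 5))  # (4, 9)
print(play_iterated(tit_for_tat, grim_trigger, 5))   # (15, 15)
```

Tit for tat loses only the first round to a constant defector, while two conditionally cooperative strategies sustain mutual cooperation for the whole match.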


In [None]:
# Basic setup

# Define payoffs
PAYOFFS = {
    ('Cooperate', 'Cooperate'): (3, 3),
    ('Cooperate', 'Defect'): (0, 5),
    ('Defect', 'Cooperate'): (5, 0),
    ('Defect', 'Defect'): (1, 1)
}

def play_game(player1_strategy, player2_strategy):
    return PAYOFFS[(player1_strategy, player2_strategy)]

# Example usage
player1_choice = 'Cooperate'
player2_choice = 'Defect'

payoff = play_game(player1_choice, player2_choice)
print(f"Player 1 chooses {player1_choice}, Player 2 chooses {player2_choice}")
print(f"Payoffs: Player 1: {payoff[0]}, Player 2: {payoff[1]}")


In [None]:
# Introduce rounds with some randomness --> doesn't guarantee a game-like feeling

import numpy as np
from numpy.random import choice

# Define payoffs
PAYOFFS = {
    ('Cooperate', 'Cooperate'): (3, 3),
    ('Cooperate', 'Defect'): (0, 5),
    ('Defect', 'Cooperate'): (5, 0),
    ('Defect', 'Defect'): (1, 1)
}

def play_game(player1_strategy, player2_strategy, rounds):
    if (player1_strategy, player2_strategy) not in PAYOFFS:
        raise ValueError("Invalid strategy. Choose from 'Cooperate' or 'Defect'.")

    # Initialize cumulative payoffs
    total_payoff_player1 = 0
    total_payoff_player2 = 0

    # Loop through the number of rounds
    for current_round in range(1, rounds + 1):
        payoff = PAYOFFS[(player1_strategy, player2_strategy)]

        # Print round details
        print(f"Round {current_round}: Player 1 chooses {player1_strategy}, Player 2 chooses {player2_strategy}")
        print(f"Payoffs: Player 1: {payoff[0]}, Player 2: {payoff[1]}")

        total_payoff_player1 += payoff[0]
        total_payoff_player2 += payoff[1]

    return total_payoff_player1, total_payoff_player2

# Possible choices (tuples)
possible_choices = list(PAYOFFS.keys())

# Flatten the strategy pairs into a sorted list of unique strategy names
strategies = sorted({s for pair in possible_choices for s in pair})

# Randomly select strategies
player1_choice = choice(strategies)
player2_choice = choice(strategies)

# Rounds to be played
rounds = 10

# Play the game
payoff = play_game(player1_choice, player2_choice, rounds)
print(f"\nTotal Payoffs after {rounds} rounds: Player 1: {payoff[0]}, Player 2: {payoff[1]}")


Round 1: Player 1 chooses Defect, Player 2 chooses Cooperate
Payoffs: Player 1: 5, Player 2: 0
Round 2: Player 1 chooses Defect, Player 2 chooses Cooperate
Payoffs: Player 1: 5, Player 2: 0
Round 3: Player 1 chooses Defect, Player 2 chooses Cooperate
Payoffs: Player 1: 5, Player 2: 0
Round 4: Player 1 chooses Defect, Player 2 chooses Cooperate
Payoffs: Player 1: 5, Player 2: 0
Round 5: Player 1 chooses Defect, Player 2 chooses Cooperate
Payoffs: Player 1: 5, Player 2: 0
Round 6: Player 1 chooses Defect, Player 2 chooses Cooperate
Payoffs: Player 1: 5, Player 2: 0
Round 7: Player 1 chooses Defect, Player 2 chooses Cooperate
Payoffs: Player 1: 5, Player 2: 0
Round 8: Player 1 chooses Defect, Player 2 chooses Cooperate
Payoffs: Player 1: 5, Player 2: 0
Round 9: Player 1 chooses Defect, Player 2 chooses Cooperate
Payoffs: Player 1: 5, Player 2: 0
Round 10: Player 1 chooses Defect, Player 2 chooses Cooperate
Payoffs: Player 1: 5, Player 2: 0

Total Payoffs after 10 rounds: Player 1: 50, Pla

### > Explore strategies

In [None]:
import numpy as np
from numpy.random import choice

# Define payoffs
PAYOFFS = {
    ('Cooperate', 'Cooperate'): (3, 3),
    ('Cooperate', 'Defect'): (0, 5),
    ('Defect', 'Cooperate'): (5, 0),
    ('Defect', 'Defect'): (1, 1)
}

def explore_payoffs():
    print("Exploring all possible payoffs:")
    for strategies, payoff in PAYOFFS.items():
        print(f"Strategies: {strategies} -> Payoffs: Player 1: {payoff[0]}, Player 2: {payoff[1]}")
    print()

def play_game(player1_strategy, player2_strategy, rounds):
    if (player1_strategy, player2_strategy) not in PAYOFFS:
        raise ValueError("Invalid strategy. Choose from 'Cooperate' or 'Defect'.")

    total_payoff_player1 = 0
    total_payoff_player2 = 0

    for current_round in range(1, rounds + 1):
        payoff = PAYOFFS[(player1_strategy, player2_strategy)]

        print(f"Round {current_round}: Player 1 chooses {player1_strategy}, Player 2 chooses {player2_strategy}")
        print(f"Payoffs: Player 1: {payoff[0]}, Player 2: {payoff[1]}")

        total_payoff_player1 += payoff[0]
        total_payoff_player2 += payoff[1]

    return total_payoff_player1, total_payoff_player2

# Explore all possible payoffs
explore_payoffs()

# Possible choices (tuples)
possible_choices = list(PAYOFFS.keys())

# Flatten the strategy pairs into a sorted list of unique strategy names
strategies = sorted({s for pair in possible_choices for s in pair})

# Randomly select strategies for simulation
player1_choice = choice(strategies)
player2_choice = choice(strategies)

# Rounds to be played
rounds = 3

# Play the game
payoff = play_game(player1_choice, player2_choice, rounds)
print(f"\nTotal Payoffs after {rounds} rounds: Player 1: {payoff[0]}, Player 2: {payoff[1]}")


Exploring all possible payoffs:
Strategies: ('Cooperate', 'Cooperate') -> Payoffs: Player 1: 3, Player 2: 3
Strategies: ('Cooperate', 'Defect') -> Payoffs: Player 1: 0, Player 2: 5
Strategies: ('Defect', 'Cooperate') -> Payoffs: Player 1: 5, Player 2: 0
Strategies: ('Defect', 'Defect') -> Payoffs: Player 1: 1, Player 2: 1

Round 1: Player 1 chooses Defect, Player 2 chooses Cooperate
Payoffs: Player 1: 5, Player 2: 0
Round 2: Player 1 chooses Defect, Player 2 chooses Cooperate
Payoffs: Player 1: 5, Player 2: 0
Round 3: Player 1 chooses Defect, Player 2 chooses Cooperate
Payoffs: Player 1: 5, Player 2: 0

Total Payoffs after 3 rounds: Player 1: 15, Player 2: 0


#### I suppose we can now deduce that we require a more dynamic strategy and choice implementation. From here on we will review simple reinforcement learning (RL) and/or Evolutionary Game Theory (EGT) methods.

# EGT vs RL Application

Evolutionary Game Theory and Reinforcement Learning are both approaches used to study and optimize decision-making in environments with strategic interactions, but they are grounded in different theoretical frameworks and are used for different purposes.

----

## 1. **EGT based dilemma**

Implementation of EGT where strategies evolve based on their success.

---

### Summary

#### Initialization
- **Population:** Begins with a randomly chosen set of strategies, either 'Cooperate' or 'Defect'.

#### Fitness Evaluation
- **play_game:** Simulates interactions between strategies and calculates the resulting payoffs.
- **evolve_population:** Assesses the fitness of each strategy by summing its payoffs against every available strategy ('Cooperate' and 'Defect'), then builds the next generation. The current mix of the population is not used as a weight.

#### Strategy Selection
- **Best Strategy:** The strategy with the highest fitness score is selected as the most successful.
- **Mutation:** Introduces genetic diversity by randomly altering some strategies.

#### Evolution Simulation
- **Generations:** The population evolves over multiple generations. Strategies are updated based on their fitness and the mutation process.

#### Results
- **Population Evolution:** Displays the population of strategies for each generation, illustrating how strategies develop over time.
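
One common way to make fitness depend on the current population mix is a discrete replicator update. The sketch below is offered as an alternative technique for comparison, not as a description of what this notebook's `evolve_population` does. The function name `replicator_step` and the starting share of cooperators are assumptions chosen for illustration; the payoffs are the row player's values from `PAYOFFS`.

```python
# Row player's payoff for each (own move, opponent move), same values as PAYOFFS.
ROW_PAYOFF = {
    ('Cooperate', 'Cooperate'): 3,
    ('Cooperate', 'Defect'): 0,
    ('Defect', 'Cooperate'): 5,
    ('Defect', 'Defect'): 1,
}

def replicator_step(share_cooperate):
    """One discrete replicator update for the share of cooperators."""
    x = share_cooperate
    # Expected payoff of each strategy against a randomly drawn opponent.
    fitness_c = (x * ROW_PAYOFF[('Cooperate', 'Cooperate')]
                 + (1 - x) * ROW_PAYOFF[('Cooperate', 'Defect')])
    fitness_d = (x * ROW_PAYOFF[('Defect', 'Cooperate')]
                 + (1 - x) * ROW_PAYOFF[('Defect', 'Defect')])
    mean_fitness = x * fitness_c + (1 - x) * fitness_d
    # A strategy's share grows in proportion to its fitness relative to the mean.
    return x * fitness_c / mean_fitness

share = 0.9  # hypothetical starting share of cooperators
trajectory = [share]
for _ in range(20):
    share = replicator_step(share)
    trajectory.append(share)

print([round(s, 3) for s in trajectory])
```

Because 'Defect' earns more than 'Cooperate' against either opponent, the cooperator share shrinks at every step for any starting share below 1.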

---


In [None]:
import numpy as np
from numpy.random import choice

# Define payoffs
PAYOFFS = {
    ('Cooperate', 'Cooperate'): (3, 3),
    ('Cooperate', 'Defect'): (0, 5),
    ('Defect', 'Cooperate'): (5, 0),
    ('Defect', 'Defect'): (1, 1)
}

def explore_payoffs():
    print("Exploring all possible payoffs:")
    for strategies, payoff in PAYOFFS.items():
        print(f"Strategies: {strategies} -> Payoffs: Player 1: {payoff[0]}, Player 2: {payoff[1]}")
    print()

def play_game(player1_strategy, player2_strategy, rounds):
    if (player1_strategy, player2_strategy) not in PAYOFFS:
        raise ValueError("Invalid strategy. Choose from 'Cooperate' or 'Defect'.")

    total_payoff_player1 = 0
    total_payoff_player2 = 0

    for current_round in range(1, rounds + 1):
        payoff = PAYOFFS[(player1_strategy, player2_strategy)]

        total_payoff_player1 += payoff[0]
        total_payoff_player2 += payoff[1]

    return total_payoff_player1, total_payoff_player2

def evolve_population(population, rounds, mutation_rate=0.1):
    strategies = list(set([s for sublist in PAYOFFS.keys() for s in sublist]))
    new_population = []

    for individual in population:
        # Evaluate fitness
        fitness = {strategy: 0 for strategy in strategies}
        for strategy in strategies:
            total_payoff = 0
            for opponent_strategy in strategies:
                payoff = play_game(strategy, opponent_strategy, rounds)
                total_payoff += payoff[0]
            fitness[strategy] = total_payoff

        # Select the best strategy based on fitness
        best_strategy = max(fitness, key=fitness.get)

        # Mutation: with some probability, change the strategy
        if np.random.rand() < mutation_rate:
            best_strategy = np.random.choice(strategies)

        new_population.append(best_strategy)

    return new_population

# Initialize population with random strategies
initial_population_size = 10
strategies = list(set([s for sublist in PAYOFFS.keys() for s in sublist]))
population = [np.random.choice(strategies) for _ in range(initial_population_size)]

# Simulate evolution
rounds = 3
generations = 5

print("Initial population:", population)
for generation in range(generations):
    population = evolve_population(population, rounds)
    print(f"Generation {generation + 1}: {population}")

# Explore all possible payoffs
explore_payoffs()


Initial population: ['Defect', 'Defect', 'Cooperate', 'Defect', 'Cooperate', 'Cooperate', 'Cooperate', 'Cooperate', 'Cooperate', 'Defect']
Generation 1: ['Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect']
Generation 2: ['Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Cooperate', 'Defect', 'Defect', 'Defect', 'Defect']
Generation 3: ['Defect', 'Cooperate', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect']
Generation 4: ['Defect', 'Cooperate', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect']
Generation 5: ['Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect']
Exploring all possible payoffs:
Strategies: ('Cooperate', 'Cooperate') -> Payoffs: Player 1: 3, Player 2: 3
Strategies: ('Cooperate', 'Defect') -> Payoffs: Player 1: 0, Player 2: 5
Strategies: ('Defect', 'Cooperate') -> Payoffs: Player 1: 5, Player 2: 0
Strategies: ('Defect', 

## 2. **RL based dilemma**

We enhance the agents' ability to learn and adapt their strategies based on interactions with the environment.

Integrating RL into the EGT model is intended to:
- Let agents continuously learn and optimize their strategies based on feedback.
- Keep the population evolving through generational changes.
- Enhance the model's ability to simulate adaptive and competitive behavior in complex environments.

---

## Explanation of Q-Learning Integration into EGT Model

### Q-Learning Integration

- **Q-Values:**
  - Track the estimated value of each strategy pair. Initially, all Q-values are set to zero.
  - Q-values are updated based on the rewards received from interactions, reflecting the effectiveness of each strategy.

- **q_learning_update Function:**
  - This function updates the Q-value for a given strategy pair based on the received reward.
  - The update moves the Q-value toward the observed reward plus the discounted maximum Q-value currently stored. The learning rate (alpha) sets the step size and the discount factor (gamma) sets the weight of that maximum.
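
For reference, the standard tabular Q-learning update has the form below, where $\alpha$ is the learning rate, $\gamma$ the discount factor, $r$ the reward, and $s'$ the next state. The one-shot game has no real state transitions, so treating the strategy pair as the table key and using the largest stored Q-value as the bootstrap term is an assumption about how the rule maps onto this notebook's code.

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

As a worked example with a hypothetical reward of $r = 15$: if every Q-value is still zero, $\alpha = 0.01$ and $\gamma = 0.95$, the first update gives $Q = 0 + 0.01 \times (15 + 0.95 \times 0 - 0) = 0.15$.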

### Policy for Strategy Selection

- **Epsilon-Greedy:**
  - **Exploration:** With probability epsilon (ε), strategies are chosen randomly to explore new possibilities.
  - **Exploitation:** With probability 1-ε, strategies are selected based on the highest Q-value to exploit known successful strategies.
  - This balance helps agents explore new strategies while optimizing based on learned experiences.
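
The epsilon-greedy rule can be sketched in a few lines. This is a minimal illustration, assuming one value estimate per action; the function name `epsilon_greedy` and the numbers in `q_by_action` are hypothetical and not taken from the notebook.

```python
import random

def epsilon_greedy(q_by_action, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    actions = list(q_by_action)
    if rng.random() < epsilon:
        return rng.choice(actions)              # explore
    return max(actions, key=q_by_action.get)    # exploit

# Hypothetical per-action value estimates, for illustration only.
q_by_action = {'Cooperate': 1.5, 'Defect': 3.0}

rng = random.Random(0)  # seeded so the run is repeatable
picks = [epsilon_greedy(q_by_action, epsilon=0.1, rng=rng) for _ in range(1000)]
print(picks.count('Defect') / len(picks))
```

With `epsilon=0.1` most picks should be the greedy action, while `epsilon=1` makes every pick random and `epsilon=0` removes exploration entirely.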

### Evolution Process

- **Fitness Calculation:**
  - Assesses how well strategies perform in the environment, typically using accumulated payoffs.
  - Fitness values guide the evolutionary process, determining which strategies are more successful.

- **Strategy Update:**
  - Strategies are adjusted using Q-learning based on their performance in interactions.
  - The Q-learning algorithm updates strategy values, influencing future strategy choices.

- **Mutation:**
  - Introduces random changes to strategies to simulate genetic diversity.
  - Mutation helps maintain diversity in the population and can lead to discovering new effective strategies.

### Simulation

- **Generations:**
  - The population evolves over multiple generations.
  - Strategies are continuously updated based on both the learning outcomes (from Q-learning) and evolutionary principles.
  - The simulation tracks how strategies change over time, incorporating both learned and evolved adaptations.


---

In [None]:
import numpy as np
from numpy.random import choice

# Define payoffs
PAYOFFS = {
    ('Cooperate', 'Cooperate'): (3, 3),
    ('Cooperate', 'Defect'): (0, 5),
    ('Defect', 'Cooperate'): (5, 0),
    ('Defect', 'Defect'): (1, 1)
}

# Define Q-learning parameters
alpha = 0.01  # Learning rate
gamma = 0.95  # Discount factor
epsilon = 1  # Exploration rate (1 means every choice is random; the greedy branch never runs)

# Initialize Q-values for strategies
Q_values = {
    ('Cooperate', 'Cooperate'): 0,
    ('Cooperate', 'Defect'): 0,
    ('Defect', 'Cooperate'): 0,
    ('Defect', 'Defect'): 0
}

def explore_payoffs():
    print("Exploring all possible payoffs:")
    for strategies, payoff in PAYOFFS.items():
        print(f"Strategies: {strategies} -> Payoffs: Player 1: {payoff[0]}, Player 2: {payoff[1]}")
    print()

def play_game(player1_strategy, player2_strategy, rounds):
    if (player1_strategy, player2_strategy) not in PAYOFFS:
        raise ValueError("Invalid strategy. Choose from 'Cooperate' or 'Defect'.")

    total_payoff_player1 = 0
    total_payoff_player2 = 0

    for current_round in range(1, rounds + 1):
        payoff = PAYOFFS[(player1_strategy, player2_strategy)]
        total_payoff_player1 += payoff[0]
        total_payoff_player2 += payoff[1]

    return total_payoff_player1, total_payoff_player2

def q_learning_update(strategy, opponent_strategy, reward):
    # Q_values is mutated in place, so no `global` declaration is needed
    state_action_pair = (strategy, opponent_strategy)
    # The game has no explicit next state, so the bootstrap term uses the
    # largest Q-value currently stored across all strategy pairs
    best_next_action = max(Q_values, key=Q_values.get)
    td_target = reward + gamma * Q_values[best_next_action]
    Q_values[state_action_pair] += alpha * (td_target - Q_values[state_action_pair])

def evolve_population(population, rounds, mutation_rate=0.1):
    strategies = list(set([s for sublist in PAYOFFS.keys() for s in sublist]))
    new_population = []

    for individual in population:
        # Choose strategy using epsilon-greedy policy
        if np.random.rand() < epsilon:
            strategy = np.random.choice(strategies)
        else:
            strategy = max(Q_values, key=Q_values.get)[0]  # Choose best strategy based on Q-values

        # Evaluate fitness (only the chosen strategy's entry is accumulated)
        fitness = {s: 0 for s in strategies}
        for opponent_strategy in strategies:
            payoff = play_game(strategy, opponent_strategy, rounds)
            fitness[strategy] += payoff[0]

        # Select the best strategy based on fitness; the other entry stays at 0,
        # so with these positive payoffs this is the chosen strategy itself
        best_strategy = max(fitness, key=fitness.get)

        # Mutation: with some probability, change the strategy
        if np.random.rand() < mutation_rate:
            best_strategy = np.random.choice(strategies)

        # Update Q-values based on the chosen strategy
        reward = fitness[best_strategy] / len(strategies)
        q_learning_update(strategy, best_strategy, reward)

        new_population.append(best_strategy)

    return new_population

# Initialize population with random strategies
initial_population_size = 10
strategies = list(set([s for sublist in PAYOFFS.keys() for s in sublist]))
population = [np.random.choice(strategies) for _ in range(initial_population_size)]

# Simulate evolution
rounds = 10
generations = 15

print("Initial population:", population)
for generation in range(generations):
    population = evolve_population(population, rounds)
    print(f"Generation {generation + 1}: {population}")

# Explore all possible payoffs
explore_payoffs()


Initial population: ['Cooperate', 'Defect', 'Defect', 'Defect', 'Defect', 'Cooperate', 'Defect', 'Defect', 'Defect', 'Cooperate']
Generation 1: ['Defect', 'Cooperate', 'Defect', 'Defect', 'Defect', 'Cooperate', 'Cooperate', 'Defect', 'Defect', 'Cooperate']
Generation 2: ['Defect', 'Defect', 'Cooperate', 'Cooperate', 'Cooperate', 'Cooperate', 'Cooperate', 'Cooperate', 'Defect', 'Defect']
Generation 3: ['Defect', 'Defect', 'Cooperate', 'Cooperate', 'Defect', 'Defect', 'Defect', 'Cooperate', 'Defect', 'Defect']
Generation 4: ['Cooperate', 'Defect', 'Cooperate', 'Cooperate', 'Defect', 'Defect', 'Defect', 'Defect', 'Cooperate', 'Defect']
Generation 5: ['Defect', 'Defect', 'Defect', 'Cooperate', 'Cooperate', 'Defect', 'Cooperate', 'Defect', 'Cooperate', 'Defect']
Generation 6: ['Cooperate', 'Defect', 'Cooperate', 'Cooperate', 'Defect', 'Cooperate', 'Defect', 'Defect', 'Defect', 'Cooperate']
Generation 7: ['Defect', 'Cooperate', 'Defect', 'Defect', 'Cooperate', 'Defect', 'Cooperate', 'Defect'

# We can now define the success criteria!!!
---

- **Individual Winner:** The strategy that accumulates the most payoff in its interactions.

- **Generation Winner:** The generation where the population contains the highest-performing strategy.

- **Winning Criteria:** Defined as the strategy with the highest total payoff or fitness score.
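
One possible reading of these criteria is a round robin in which every individual plays one round against every other individual. The sketch below follows that reading; it is an assumption rather than the notebook's own scoring, and the helper names (`individual_scores`, `individual_winner`, `generation_winner`) and the sample populations are hypothetical.

```python
PAYOFFS = {
    ('Cooperate', 'Cooperate'): (3, 3),
    ('Cooperate', 'Defect'): (0, 5),
    ('Defect', 'Cooperate'): (5, 0),
    ('Defect', 'Defect'): (1, 1),
}

def individual_scores(population):
    """Total payoff each individual earns from one round against every other individual."""
    scores = [0] * len(population)
    for i, mine in enumerate(population):
        for j, theirs in enumerate(population):
            if i != j:
                scores[i] += PAYOFFS[(mine, theirs)][0]
    return scores

def individual_winner(population):
    """Return (index, strategy, score) of the highest-scoring individual."""
    scores = individual_scores(population)
    best = max(range(len(population)), key=scores.__getitem__)
    return best, population[best], scores[best]

def generation_winner(history):
    """Index of the generation whose top individual scored highest."""
    return max(range(len(history)), key=lambda g: max(individual_scores(history[g])))

population = ['Cooperate', 'Cooperate', 'Defect', 'Cooperate']  # hypothetical
print(individual_scores(population))   # [6, 6, 15, 6]
print(individual_winner(population))   # (2, 'Defect', 15)

history = [['Cooperate', 'Cooperate'], ['Cooperate', 'Defect', 'Cooperate']]  # hypothetical
print(generation_winner(history))      # 1
```

Under this scoring a lone defector among cooperators wins the generation, which is the same pressure toward 'Defect' that the simulations show.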


In [None]:
import numpy as np
from numpy.random import choice

# Define payoffs
PAYOFFS = {
    ('Cooperate', 'Cooperate'): (3, 3),
    ('Cooperate', 'Defect'): (0, 5),
    ('Defect', 'Cooperate'): (5, 0),
    ('Defect', 'Defect'): (1, 1)
}

# Define Q-learning parameters
alpha = 0.01  # Learning rate
gamma = 0.9  # Discount factor
epsilon = 0.01  # Exploration rate

# Initialize Q-values for strategies
Q_values = {
    ('Cooperate', 'Cooperate'): 0,
    ('Cooperate', 'Defect'): 0,
    ('Defect', 'Cooperate'): 0,
    ('Defect', 'Defect'): 0
}

def explore_payoffs():
    print("Exploring all possible payoffs:")
    for strategies, payoff in PAYOFFS.items():
        print(f"Strategies: {strategies} -> Payoffs: Player 1: {payoff[0]}, Player 2: {payoff[1]}")
    print()

def play_game(player1_strategy, player2_strategy, rounds):
    if (player1_strategy, player2_strategy) not in PAYOFFS:
        raise ValueError("Invalid strategy. Choose from 'Cooperate' or 'Defect'.")

    total_payoff_player1 = 0
    total_payoff_player2 = 0

    for current_round in range(1, rounds + 1):
        payoff = PAYOFFS[(player1_strategy, player2_strategy)]
        total_payoff_player1 += payoff[0]
        total_payoff_player2 += payoff[1]

    return total_payoff_player1, total_payoff_player2

def q_learning_update(strategy, opponent_strategy, reward):
    # Q_values is mutated in place, so no `global` declaration is needed
    state_action_pair = (strategy, opponent_strategy)
    # The game has no explicit next state, so the bootstrap term uses the
    # largest Q-value currently stored across all strategy pairs
    best_next_action = max(Q_values, key=Q_values.get)
    td_target = reward + gamma * Q_values[best_next_action]
    Q_values[state_action_pair] += alpha * (td_target - Q_values[state_action_pair])

def evolve_population(population, rounds, mutation_rate=0.1):
    strategies = list(set([s for sublist in PAYOFFS.keys() for s in sublist]))
    new_population = []

    fitness_scores = {strategy: 0 for strategy in strategies}
    for strategy in strategies:
        total_payoff = 0
        for opponent_strategy in strategies:
            payoff = play_game(strategy, opponent_strategy, rounds)
            total_payoff += payoff[0]
        fitness_scores[strategy] = total_payoff

    for individual in population:
        # Choose strategy using an epsilon-greedy policy over fitness scores
        # (Q_values are updated below but are not consulted for selection)
        if np.random.rand() < epsilon:
            strategy = np.random.choice(strategies)
        else:
            strategy = max(fitness_scores, key=fitness_scores.get)

        # Update Q-values based on the chosen strategy
        reward = fitness_scores[strategy] / len(strategies)
        q_learning_update(strategy, max(fitness_scores, key=fitness_scores.get), reward)

        # Mutation: with some probability, change the strategy
        if np.random.rand() < mutation_rate:
            strategy = np.random.choice(strategies)

        new_population.append(strategy)

    return new_population

def evaluate_population(population, rounds):
    # Calculate total payoffs for each individual
    payoffs = {strategy: 0 for strategy in set(population)}
    for strategy in payoffs:
        total_payoff = 0
        for opponent_strategy in set(population):
            payoff = play_game(strategy, opponent_strategy, rounds)
            total_payoff += payoff[0]
        payoffs[strategy] = total_payoff

    # Determine the best strategy
    best_strategy = max(payoffs, key=payoffs.get)
    return best_strategy, payoffs[best_strategy]

# Initialize population with random strategies
initial_population_size = 10
strategies = list(set([s for sublist in PAYOFFS.keys() for s in sublist]))
population = [np.random.choice(strategies) for _ in range(initial_population_size)]

# Simulate evolution
rounds = 5
generations = 5

for generation in range(generations):
    population = evolve_population(population, rounds)
    best_strategy, best_payoff = evaluate_population(population, rounds)
    print(f"Generation {generation + 1}:")
    print(f"Population: {population}")
    print(f"Best Strategy: {best_strategy} with Payoff: {best_payoff}")

# Explore all possible payoffs
explore_payoffs()


Generation 1:
Population: ['Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect']
Best Strategy: Defect with Payoff: 5
Generation 2:
Population: ['Defect', 'Cooperate', 'Defect', 'Defect', 'Defect', 'Defect', 'Cooperate', 'Defect', 'Defect', 'Defect']
Best Strategy: Defect with Payoff: 30
Generation 3:
Population: ['Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect']
Best Strategy: Defect with Payoff: 5
Generation 4:
Population: ['Defect', 'Defect', 'Defect', 'Defect', 'Cooperate', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect']
Best Strategy: Defect with Payoff: 30
Generation 5:
Population: ['Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Defect', 'Cooperate']
Best Strategy: Defect with Payoff: 30
Exploring all possible payoffs:
Strategies: ('Cooperate', 'Cooperate') -> Payoffs: Player 1: 3, Player 2: 3
Strategies: ('Cooperate', 'Defect') -> Payoffs: Player 1:

----
----

# Ending note: Let's Understand the Point of the Simulation

1. **Exploring Strategic Interactions:**
   - The notebook demonstrates how different strategies can be applied in a game theory scenario, particularly the Prisoner's Dilemma. The main point is to understand how strategies like "Cooperate" and "Defect" can lead to different outcomes based on the payoffs associated with each choice.

2. **Nash Equilibrium Concept:**
   - By running simulations, we explore the concept of Nash equilibrium—where no player can benefit from changing their strategy unilaterally, assuming the other player's strategy remains the same. This is crucial for understanding stable outcomes in strategic interactions.

3. **Strategy Evolution and Adaptation:**
   - The notebook uses evolutionary game theory and reinforcement learning to simulate how strategies evolve over time. It showcases:
     - **Evolutionary Game Theory (EGT):** How strategies change based on their success in the population. The aim is to see which strategies become dominant over generations.
     - **Reinforcement Learning (RL):** How agents can learn and adapt their strategies based on feedback and interactions, improving their performance over time.

4. **Practical Applications:**
   - These simulations provide insights into real-world scenarios where strategic decision-making is essential. For instance:
     - **Economic Models:** Understanding competitive behaviors and market dynamics.
     - **Social Interactions:** Analyzing cooperation and competition in social or organizational settings.
     - **Algorithm Design:** Improving algorithms for optimization problems where decision-making under uncertainty is involved.

5. **Visualization of Outcomes:**
   - The notebook helps visualize the outcomes of different strategies and how they affect payoffs. It allows you to see how certain strategies (e.g., always defecting) can dominate others and how evolutionary processes influence strategy selection over time.
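
The Nash equilibrium condition in point 2 can be checked directly against the payoff table. This is a minimal sketch that tests only pure strategies; the names `ACTIONS` and `is_nash` are assumptions introduced for illustration.

```python
from itertools import product

PAYOFFS = {
    ('Cooperate', 'Cooperate'): (3, 3),
    ('Cooperate', 'Defect'): (0, 5),
    ('Defect', 'Cooperate'): (5, 0),
    ('Defect', 'Defect'): (1, 1),
}
ACTIONS = ('Cooperate', 'Defect')

def is_nash(a1, a2):
    """True if neither player gains by deviating alone from (a1, a2)."""
    p1, p2 = PAYOFFS[(a1, a2)]
    p1_cannot_improve = all(PAYOFFS[(alt, a2)][0] <= p1 for alt in ACTIONS)
    p2_cannot_improve = all(PAYOFFS[(a1, alt)][1] <= p2 for alt in ACTIONS)
    return p1_cannot_improve and p2_cannot_improve

equilibria = [pair for pair in product(ACTIONS, repeat=2) if is_nash(*pair)]
print(equilibria)  # [('Defect', 'Defect')]
```

Mutual defection is the only pure-strategy equilibrium even though mutual cooperation pays both players more (3 each instead of 1 each), which is what makes the game a dilemma.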

**Summary:** The primary goal is to provide a practical understanding of game theory concepts through simulations. This includes how Nash equilibrium works, how strategies evolve and adapt, and how reinforcement learning can be used to enhance decision-making strategies.

In essence, the point is to explore and visualize the dynamics of strategic interactions, strategy evolution, and learning, which have broad implications for various fields including economics, social sciences, and artificial intelligence.

---