# (Not so) Markov vs Nash Equilibrium: Rock Paper Scissors


### 100 seasons of Markov vs Nash on Rock Paper Scissors
### 1000 episodes per season

### Bonus: Dataset generation

<a id="1"></a>
<h2 style='background:#FBE338; border:0; color:black'><center>Agent: Nash Equilibrium</center></h2>

![](https://upload.wikimedia.org/wikipedia/commons/thumb/a/a9/John_Forbes_Nash%2C_Jr._by_Peter_Badge.jpg/220px-John_Forbes_Nash%2C_Jr._by_Peter_Badge.jpg)

*...if we all go for the blonde we are blocking each other.*

In [None]:
%%writefile nash_equilibrium.py

import random

def nash_equilibrium(observation, configuration):
    return random.randint(0, 2)
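
Uniform random play is the mixed-strategy Nash equilibrium of Rock Paper Scissors: against it, every opponent strategy has an expected reward of zero. The sketch below checks this numerically. The `payoff` and `expected_reward` helpers are illustrative names, not part of the agent, and the move encoding (0 = rock, 1 = paper, 2 = scissors) is an assumption.

```python
from fractions import Fraction

def payoff(mine, theirs):
    """Reward for `mine` against `theirs`: +1 win, 0 tie, -1 loss.

    Assumed encoding: 0 = rock, 1 = paper, 2 = scissors,
    so each move beats the one just below it (mod 3).
    """
    diff = (mine - theirs) % 3
    if diff == 0:
        return 0
    return 1 if diff == 1 else -1

def expected_reward(my_probs, their_probs):
    """Expected reward of one mixed strategy against another."""
    return sum(p * q * payoff(i, j)
               for i, p in enumerate(my_probs)
               for j, q in enumerate(their_probs))

uniform = [Fraction(1, 3)] * 3
always_rock = [Fraction(1), Fraction(0), Fraction(0)]
biased = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]  # made-up opponent

# Whatever the opponent does, its expected reward against uniform play is 0.
for opponent in (always_rock, biased, uniform):
    print(expected_reward(opponent, uniform))
```

Exact fractions are used instead of floats so the zero result is not blurred by rounding.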

<a id="1"></a>
<h2 style='background:#FBE338; border:0; color:black'><center>Agent: (Not so) Markov</center></h2>

![](https://upload.wikimedia.org/wikipedia/commons/thumb/a/a8/Andrei_Markov.jpg/220px-Andrei_Markov.jpg) 

*...breaking chains excites me!*

In [None]:
%%writefile markov_agent.py

import numpy as np
import collections

def markov_agent(observation, configuration):
    k = 2
    global table, action_seq
    if observation.step % 250 == 0: # refresh table every 250 steps
        action_seq, table = [], collections.defaultdict(lambda: [1, 1, 1])    
    if len(action_seq) <= 2 * k + 1:
        action = int(np.random.randint(3))
        if observation.step > 0:
            action_seq.extend([observation.lastOpponentAction, action])
        else:
            action_seq.append(action)
        return action
    # update table
    key = ''.join([str(a) for a in action_seq[:-1]])
    table[key][observation.lastOpponentAction] += 1
    # update action seq
    action_seq[:-2] = action_seq[2:]
    action_seq[-2] = observation.lastOpponentAction
    # predict opponent next move
    key = ''.join([str(a) for a in action_seq[:-1]])
    if observation.step < 500:
        next_opponent_action_pred = np.argmax(table[key])
    else:
        scores = np.array(table[key])
        next_opponent_action_pred = np.random.choice(3, p=scores/scores.sum()) # add stochasticity for second part of the game
    # make an action
    action = (next_opponent_action_pred + 1) % 3
    # after step 900, switch strategy unconditionally: play the predicted move itself instead of its counter
    if observation.step > 900:
        action = next_opponent_action_pred
    action_seq[-1] = action
    return int(action)
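
The core of the agent above is a frequency table keyed by the recent move history: for each history string it counts what the opponent played next, predicts the most frequent follow-up, and answers with the move that beats it. The sketch below isolates that idea on a hand-made sequence. `predict_next` and `counter` are hypothetical helper names, and the history here holds only the opponent's moves, a simplification of the interleaved own/opponent sequence the agent keeps.

```python
import collections

def counter(move):
    """Move that beats `move` (assumed encoding: 0 = rock, 1 = paper, 2 = scissors)."""
    return (move + 1) % 3

def predict_next(opponent_moves, k=2):
    """Predict the opponent's next move from the last k moves.

    Simplified illustration: the table is rebuilt from scratch on each call,
    whereas the agent updates it incrementally.
    """
    # Every history starts with a count of 1 per move, as in the agent.
    table = collections.defaultdict(lambda: [1, 1, 1])
    for i in range(k, len(opponent_moves)):
        key = ''.join(str(m) for m in opponent_moves[i - k:i])
        table[key][opponent_moves[i]] += 1
    key = ''.join(str(m) for m in opponent_moves[-k:])
    counts = table[key]
    return counts.index(max(counts))

moves = [0, 1, 2, 0, 1, 2, 0, 1]  # made-up opponent cycling rock, paper, scissors
prediction = predict_next(moves)
print(prediction, counter(prediction))
```

On this cyclic sequence the history `01` has always been followed by `2`, so the sketch predicts scissors and answers with rock.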

<a id="11"></a>
<h2 style='background:#FBE338; border:0; color:black'><center>Validate</center></h2>




In [None]:
from kaggle_environments import make, evaluate

env = make("rps", configuration={"episodeSteps": 1000})

env.run(["markov_agent.py", "nash_equilibrium.py"])

env.render(mode="ipython", width=800, height=800)

<a id="11"></a>
<h2 style='background:#FBE338; border:0; color:black'><center>Action</center></h2>



In [None]:
seasons = 100
episodes = 1000

In [None]:
import numpy as np
import pandas as pd
import json

import matplotlib.pyplot as plt
import seaborn as sns

from kaggle_environments import make

from IPython.display import Markdown as md

# Collect rows in plain lists and build the DataFrames once at the end.
# DataFrame.append was removed in pandas 2.0 and was slow when called per row.
action_rows = []
leaderboard_rows = []

env = make("rps", configuration={"episodeSteps": episodes})

for season in range(seasons):
    env.reset()
    results = env.run(["markov_agent.py", "nash_equilibrium.py"])
    for result in results:
        # step 0 is the initial state: no actions have been played yet
        if result[0].observation.step == 0:
            continue
        action_rows.append({"season": season,
                            "episode": result[0].observation.step,
                            "Markov Action": result[0].action,
                            "Nash Action": result[1].action,
                            "Markov Reward": result[0].reward,
                            "Nash Reward": result[1].reward})
        # the final step carries the season's final rewards
        if result[0].status == "DONE":
            leaderboard_rows.append({"season": season,
                                     "Markov Reward": result[0].reward,
                                     "Nash Reward": result[1].reward})

action_board = pd.DataFrame(action_rows,
                            columns=["season", "episode",
                                     "Markov Action", "Nash Action",
                                     "Markov Reward", "Nash Reward"])
leaderboard = pd.DataFrame(leaderboard_rows,
                           columns=["season", "Markov Reward", "Nash Reward"])

<h1 style='background:#FBE338; border:0; color:black'><center>Result</center></h1>


In [None]:
md(r'# \(Not so\) Markov - Nash Equilibrium : {} - {}'.format(len(leaderboard[leaderboard["Markov Reward"] > 0]), len(leaderboard[leaderboard["Nash Reward"] > 0])))

In [None]:
md('# Tie : {}'.format(len(leaderboard[leaderboard["Markov Reward"] == 0])))

In [None]:
if (len(leaderboard[leaderboard["Markov Reward"] > 0]) == len(leaderboard[leaderboard["Nash Reward"] > 0])):
    winner = "Tie!"
elif (len(leaderboard[leaderboard["Markov Reward"] > 0]) > len(leaderboard[leaderboard["Nash Reward"] > 0])):
    winner = "Winner is Markov!"
else:
    winner = "Winner is Nash!"
md('<a id="11"></a><h1 style=\'background:#FBE338; border:0; color:black\'><center>{}</center></h1>'.format(winner))
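
The result cells above repeat the same filters on `leaderboard`. A minimal sketch of the same tally in one place, written against plain reward pairs so it runs without the simulation. `tally` is a hypothetical helper and the sample pairs are made up. With the real data the pairs could come from `leaderboard[["Markov Reward", "Nash Reward"]].itertuples(index=False)`. The sketch assumes rewards are zero-sum, so a season in which neither reward is positive counts as a tie.

```python
def tally(reward_pairs):
    """Count season outcomes from (Markov reward, Nash reward) pairs.

    A season is a win for the agent whose final reward is positive;
    under the zero-sum assumption anything else is a tie.
    """
    counts = {"Markov": 0, "Nash": 0, "Tie": 0}
    for markov_reward, nash_reward in reward_pairs:
        if markov_reward > 0:
            counts["Markov"] += 1
        elif nash_reward > 0:
            counts["Nash"] += 1
        else:
            counts["Tie"] += 1
    return counts

# Made-up final rewards for four seasons, for illustration only.
sample = [(25, -25), (0, 0), (-31, 31), (0, 0)]
result = tally(sample)
print(result)
```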

<h1 style='background:#FBE338; border:0; color:black'><center>Analysis</center></h1>

## Season results

In [None]:
leaderboard.plot(subplots=True, figsize=(15,10))

## Season reward histogram

In [None]:
leaderboard[['Markov Reward', 'Nash Reward']].plot.hist(bins=10,  alpha=0.5, figsize=(15,10))

## Actions histogram

In [None]:
action_board[['Markov Action', 'Nash Action']].plot.hist(bins=3, alpha=0.5, xticks=[0,1,2], figsize=(15,10))

## All episodes reward

In [None]:
fig, ax = plt.subplots(figsize=(20,10))
for i, g in action_board.groupby('season'):
    g.plot(x='episode', y='Markov Reward', ax=ax, legend=False )

## First half rewards

In [None]:
fig, ax = plt.subplots(figsize=(20,15))
for i, g in action_board[(action_board['episode']<episodes/2)].groupby('season'):
    g.plot(x='episode', y='Markov Reward', ax=ax, legend=False )

## Mid-episodes reward

In [None]:
fig, ax = plt.subplots(figsize=(20,15))
for i, g in action_board[((action_board['episode']>episodes/3) & (action_board['episode']<2*episodes/3))].groupby('season'):
    g.plot(x='episode', y='Markov Reward', ax=ax, legend=False )

## Last half rewards

In [None]:
fig, ax = plt.subplots(figsize=(20,15))
for i, g in action_board[action_board['episode']>episodes/2].groupby('season'):
    g.plot(x='episode', y='Markov Reward', ax=ax, legend=False )

<h1 style='background:#FBE338; border:0; color:black'><center>Conclusion</center></h1>

* Agent `(Not so) Markov` shows a clear tendency to play `paper`.
* The strategy changes that `(Not so) Markov` makes at episodes `500` and `900` do not always pay off.
* A reward of around `-20` looks like a good point to change strategy.
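
The first observation can be quantified by comparing each move's share of an agent's actions with the uniform one third. A minimal sketch using only the standard library: `action_shares` is a hypothetical helper, the sample list is made up, and the encoding 0 = rock, 1 = paper, 2 = scissors is an assumption. The real input would be something like `action_board["Markov Action"].tolist()`.

```python
from collections import Counter

MOVES = ("rock", "paper", "scissors")  # assumed encoding: 0, 1, 2

def action_shares(actions):
    """Return the fraction of plays for each move."""
    counts = Counter(int(a) for a in actions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no actions to summarise")
    return {name: counts[i] / total for i, name in enumerate(MOVES)}

# Made-up actions, for illustration only.
sample = [1, 1, 0, 2, 1, 1, 0, 1, 2, 1]
shares = action_shares(sample)
print(shares)
```

A share well above one third for `paper` across many seasons would support the first bullet; a share close to one third would not.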

<h1 style='background:#FBE338; border:0; color:black'><center>Dataset</center></h1>

The data generated above is exported and publicly shared in the [Rock Paper Scissors Agents Battles](https://www.kaggle.com/jumaru/rock-paper-scissors-agents-battles) dataset.

## Leaderboard

### First 5 seasons rewards

In [None]:
leaderboard.head()

### Last 5 seasons rewards

In [None]:
leaderboard.tail()

### Rewards statistics

In [None]:
leaderboard.describe()

## Action board

### First 5 actions

In [None]:
action_board.head()

### Last 5 actions

In [None]:
action_board.tail()

### Actions statistics

In [None]:
action_board.drop(columns='season').describe()

## Data export

In [None]:
# Report boards
leaderboard_csv = 'Not_so_Markov_leaderboard_S' + str(seasons) + 'E' + str(episodes) + '.csv'
action_board_csv = 'Not_so_Markov_action_board_S'+ str(seasons) + 'E' + str(episodes) + '.csv'
leaderboard.to_csv(leaderboard_csv)
action_board.to_csv(action_board_csv)
print(leaderboard_csv)
print(action_board_csv)
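
`DataFrame.to_csv` writes the index as an unnamed first column by default, so anything reading the exported files has to drop or consume that column. A minimal sketch of reading a leaderboard file with the standard library only; the two data rows are made up and merely mimic the exported layout.

```python
import csv
import io

# Stand-in for an exported leaderboard file (made-up values).
exported = io.StringIO(
    ",season,Markov Reward,Nash Reward\n"
    "0,0,23,-23\n"
    "1,1,0,0\n"
)

rows = []
for record in csv.DictReader(exported):
    record.pop("", None)  # drop the unnamed index column written by to_csv
    rows.append({name: float(value) for name, value in record.items()})

print(rows)
```

When reading with pandas instead, `pd.read_csv(path, index_col=0)` consumes the same column as the index.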

# References

* [Rock Paper Scissors - Nash Equilibrium Strategy](https://www.kaggle.com/ihelon/rock-paper-scissors-nash-equilibrium-strategy) & [Rock Paper Scissors - Agents Comparison](https://www.kaggle.com/ihelon/rock-paper-scissors-agents-comparison) by [Yaroslav Isaienkov](https://www.kaggle.com/ihelon)
* [(Not so) Markov](https://www.kaggle.com/alexandersamarin/not-so-markov) by [Alexander Samarin](https://www.kaggle.com/alexandersamarin)
* [LB simulation](https://www.kaggle.com/superant/lb-simulation) by [Ant 🐜](https://www.kaggle.com/superant)