# Trading Interview Game

Now that we have introduced CFR and understand the basics, let's try to apply it to a problem inspired by a real interview question from a top quantitative trading firm.

## The Game - Part 1

The game has two players, each of whom submits a number between 0 and 100 on a piece of paper. Once both numbers are submitted, the papers are checked: whoever submitted the larger number loses and must pay the other player the amount that the other player wrote down (the smaller of the two numbers).

A few examples:
* Player A submits 90, Player B submits 10. Player B wins so receives \$10.
* Player A submits 15, Player B submits 20. Player A wins so receives \$15.
* Player A submits 1, Player B submits 99. Player A wins so receives \$1.

Imagine you are playing this game against an opponent who submits a uniformly random number between 1 and 100.
The warm-up question, which can be solved with just pen and paper, is to calculate the optimal number for you to submit, i.e. the one that maximises your expected value in this game.
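If you want to check your pen-and-paper working numerically, the expected value of each possible bid can be brute-forced. The sketch below is only one way to set this up: the function name `warmup_ev` is made up for illustration, and it assumes that an exact tie pays nothing, since the rules above do not say what happens in that case.

```python
import numpy as np

def warmup_ev(bid: int, low: int = 1, high: int = 100) -> float:
    """Expected payoff of `bid` against a uniform random opponent on [low, high]."""
    opponent = np.arange(low, high + 1)
    # The lower number wins and is paid their own number by the loser.
    # Assumption: an exact tie pays nothing (the rules do not cover it).
    payoffs = np.where(bid < opponent, bid,
                       np.where(bid > opponent, -opponent, 0))
    return float(payoffs.mean())

# Expected value of every bid from 1 to 100.
evs = {bid: warmup_ev(bid) for bid in range(1, 101)}
```

Taking `max(evs, key=evs.get)` then gives a candidate answer to compare against your own derivation.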




## The Game - Part 2

Now we introduce a third player to the game who, like you, is perfectly rational; there is still one player who submits a random number. The penalty for losing is now greater, as the player who submits the highest number must pay each of the other players their number. If two players tie for the highest number, they each pay the third player half of the third player's number.

For example:
* Player A submits 50, Player B submits 40, Player C submits 30. A loses so pays B \$40 and pays C \$30.
* Player A submits 20, Player B submits 50, Player C submits 50. B and C tie to lose, so both pay A \$10 each.
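The rules and examples above can be sketched in code from the point of view of one player (the "hero"). This is a hedged illustration rather than the reference solution: the names `outcome_sketch` and `expected_outcome_vs_random` are invented here, and the three-way tie is assumed to pay nothing because the rules do not mention it. Since one player bids uniformly at random, the second helper averages the hero's result over that player's 100 possible numbers instead of sampling.

```python
def outcome_sketch(hero: int, v1: int, v2: int) -> float:
    """Hero's win (positive) or loss (negative) for one round."""
    top = max(hero, v1, v2)
    n_top = [hero, v1, v2].count(top)
    if n_top == 3:
        return 0.0  # Assumption: a three-way tie pays nothing.
    if hero < top:
        # Hero is not the highest bidder, so hero is paid their own number
        # (in full by one loser, or half each by two tied losers).
        return float(hero)
    if n_top == 1:
        # Hero is the sole highest bidder and pays both others their numbers.
        return -float(v1 + v2)
    # Hero ties for highest with one other player: each pays half of the
    # third player's (lower) number.
    return -min(v1, v2) / 2

def expected_outcome_vs_random(hero: int, villain: int) -> float:
    """Average hero result over a uniform random third bid in 1..100."""
    return sum(outcome_sketch(hero, villain, c) for c in range(1, 101)) / 100
```

For instance, `outcome_sketch(50, 40, 30)` reproduces Player A's loss of 70 in the first example, and `outcome_sketch(50, 20, 50)` reproduces Player B's loss of 10 in the second.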


Our previous strategy of picking a single number is no longer profitable, as we will get exploited by the new player. It is likely that we will now need to play a mixed strategy. Let's try to solve this problem using CFR: we will find the Nash equilibrium between player A and player B, and play this strategy to maximise our expected value.

It is worth trying to solve this problem with just pen and paper too, as would be expected in an interview. We have therefore left out the graph showing the answer; it can be found in the solutions document.

### Extension
In order to calculate a solution that converged fast enough, I had to use a slight modification of CFR called [CFR+](https://arxiv.org/pdf/1407.5042).
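As I understand the paper, the central change in CFR+ is "regret matching+": after each update the accumulated regrets are floored at zero, so an action that looked bad early on can recover quickly. (The paper also uses alternating player updates and a weighted average of strategies, which are not shown here.) The sketch below illustrates just the regret update; the function names are my own and it is not necessarily how the solutions document implements it.

```python
import numpy as np

def regret_matching(acc_regrets: np.ndarray) -> np.ndarray:
    """Play each action in proportion to its positive accumulated regret."""
    positive = np.maximum(acc_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # No positive regret anywhere: fall back to the uniform strategy.
    return np.full(len(acc_regrets), 1.0 / len(acc_regrets))

def cfr_plus_accumulate(acc_regrets: np.ndarray,
                        immediate_regrets: np.ndarray) -> np.ndarray:
    """Regret matching+ update: add the new regrets, then floor at zero."""
    return np.maximum(acc_regrets + immediate_regrets, 0.0)
```

In vanilla CFR the accumulation step would simply be `acc_regrets + immediate_regrets`, so negative totals are allowed to build up; the flooring is the only difference in this sketch.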

To check your Nash equilibrium solution, you can compare its EV with that of each pure strategy (always choosing a single number) when the opponent plays the equilibrium strategy. This is an easy way to check if you have made a mistake, as no pure strategy should earn more than the Nash strategy.

Further, you can compute the *exploitability* of a strategy to get a numerical value for the maximum amount your strategy can theoretically be exploited. You can also compare this to the exploitability of pure strategies. I'll leave it up to you to figure out how to do this.
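As a starting point for the pure-strategy check, here is a small sketch on rock-paper-scissors rather than on the trading game itself. It assumes you have a matrix of expected payoffs (`payoff_matrix[i, j]` is the hero's payoff for action `i` against villain action `j`), and it measures how much the best pure strategy would gain over the hero's current strategy. The name `deviation_gain` is made up, and this quantity is only a simple proxy, not necessarily the definition of exploitability used in the solutions.

```python
import numpy as np

def deviation_gain(payoff_matrix: np.ndarray,
                   hero_strategy: np.ndarray,
                   villain_strategy: np.ndarray) -> float:
    """Gain from switching to the best pure strategy against the villain."""
    # Expected payoff of each pure hero action against the villain's mix.
    action_values = payoff_matrix @ villain_strategy
    # Expected payoff of the hero's current mixed strategy.
    current_value = hero_strategy @ action_values
    return float(action_values.max() - current_value)

# Rock-paper-scissors payoffs for the row player (rock, paper, scissors).
rps = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])
uniform = np.ones(3) / 3
always_rock = np.array([1.0, 0.0, 0.0])

gain_at_equilibrium = deviation_gain(rps, uniform, uniform)
gain_vs_always_rock = deviation_gain(rps, always_rock, always_rock)
```

At the uniform equilibrium the gain is zero, while two players who both always pick rock each leave a full unit on the table by not switching to paper.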

In [41]:
import numpy as np
import matplotlib.pyplot as plt
N = 101
actions = np.arange(1, N)

In [79]:
# hero number is first
def game_outcome(numbers: list[int]) -> float:  # float: a tie for highest pays half a number
    '''
    numbers: [hero_number, villain1_number, villain2_number]
    
    Return the win or loss for the hero given the submitted numbers
    according to the rules of the game
    '''

In [80]:
'''
Action is the number to submit (100 possible actions)

Tip: to speed up the computation, we can incorporate the randomness 
of player C's bid into our payoff function by taking an average over 
all 100 possible choices, instead of actually choosing a random number for player C.
'''

def payoff(hero_action: int, villain_strategy: np.ndarray) -> float:
    '''
    PAYOFF FUNCTION HERE
    '''    

In [81]:
'''
Calculate immediate regret for every action
'''
def calculate_immediate_regret(hero_strategy: np.ndarray, villain_strategy: np.ndarray) -> np.ndarray:
    '''
    IMMEDIATE REGRET 
    '''


In [82]:
'''
Calculate new strategy based on accumulated regret for the hero
'''

def calculate_strategy(acc_regrets: np.ndarray) -> np.ndarray:
    '''
    CALCULATE NEW STRATEGY
    '''


In [83]:
'''
Run CFR algorithm.
We set the initial strategies for players A and B to be uniform over all actions.
'''

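# One entry per action (the numbers 1 to 100), so size by len(actions) rather than N
n_actions = len(actions)

strategyA = np.ones(n_actions) / n_actions
strategyB = np.ones(n_actions) / n_actions

acc_regretsA = np.zeros(n_actions)
acc_regretsB = np.zeros(n_actions)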

steps = 100

strat_history = []

for t in range(steps):
    acc_regretsA += calculate_immediate_regret(strategyA, strategyB)
    strategyA = calculate_strategy(acc_regretsA)
    
    acc_regretsB += calculate_immediate_regret(strategyB, strategyA)
    strategyB = calculate_strategy(acc_regretsB)

    strat_history.append(strategyA)


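Note: if this cell is run while the functions above are still unimplemented, `calculate_immediate_regret` returns `None` and the loop fails with:

UFuncTypeError: Cannot cast ufunc 'add' output from dtype('O') to dtype('float64') with casting rule 'same_kind'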