# Coins in a Box - Problem 852
<p>This game has a box of $N$ unfair coins and $N$ fair coins. Fair coins have probability 50% of landing heads while unfair coins have probability 75% of landing heads.</p>

<p>The player begins with a score of 0 which may become negative during play.</p>

<p>At each round the player randomly picks a coin from the box and guesses its type: fair or unfair. Before guessing they may toss the coin any number of times; however, each toss subtracts 1 from their score. The decision to stop tossing and make a guess can be made at any time. After guessing the player's score is increased by 20 if they are right and decreased by 50 if they are wrong. Then the coin type is revealed to the player and the coin is discarded.</p>

<p>After $2N$ rounds the box will be empty and the game is over. Let $S(N)$ be the expected score of the player at the end of the game assuming that they play optimally in order to maximize their expected score.</p>

<p>You are given $S(1) = 20.558591$ rounded to 6 digits after the decimal point.</p>

<p>Find $S(50)$. Give your answer rounded to 6 digits after the decimal point.</p>

## Solution.

In [1]:
from functools import cache
from math import gcd
from tqdm import tqdm

In [2]:
@cache
def ev_stay(f, u, H, T):
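    '''Expected score of this round if we stop and guess now, with f fair and u unfair coins left, after seeing H heads and T tails.'''
    p = f/(f+u)  # prior probability that the drawn coin is fair
    q = 1-p

    t = H + T

    # Bayes' rule: the likelihood ratio unfair:fair is (3/4)^H (1/4)^T / (1/2)^t = 3^H / 2^t
    P_F_given_HT = p / (p + q * 3**H/2**t)
    P_U_given_HT = 1 - P_F_given_HT

    # value of guessing "fair" or "unfair", net of the t tosses already paid for
    ev_f = P_F_given_HT * 20 + P_U_given_HT * (-50) - t
    ev_u = P_F_given_HT * (-50) + P_U_given_HT * (20) - t

    return max(ev_f, ev_u)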
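
As a sanity check on the posterior used in `ev_stay`, the closed form $p / (p + q \cdot 3^H / 2^{H+T})$ can be compared against Bayes' rule written out term by term. This is a minimal sketch in exact arithmetic; the helper names `posterior_fair_bayes` and `posterior_fair_closed` are illustrative and not part of the notebook.

```python
from fractions import Fraction


def posterior_fair_bayes(f, u, H, T):
    """P(fair | H heads, T tails) from Bayes' rule, term by term."""
    prior_fair = Fraction(f, f + u)
    prior_unfair = 1 - prior_fair
    like_fair = Fraction(1, 2) ** (H + T)                    # fair: each toss has probability 1/2
    like_unfair = Fraction(3, 4) ** H * Fraction(1, 4) ** T  # unfair: heads 3/4, tails 1/4
    evidence = prior_fair * like_fair + prior_unfair * like_unfair
    return prior_fair * like_fair / evidence


def posterior_fair_closed(f, u, H, T):
    """Same posterior via the simplified expression used in ev_stay."""
    p = Fraction(f, f + u)
    q = 1 - p
    return p / (p + q * Fraction(3 ** H, 2 ** (H + T)))


# The two forms agree exactly on a small grid of observations.
for H in range(6):
    for T in range(6):
        assert posterior_fair_bayes(2, 3, H, T) == posterior_fair_closed(2, 3, H, T)
```

For example, with one fair and one unfair coin, a single head lowers the probability of "fair" from 1/2 to 2/5, while a single tail raises it to 2/3.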

In [3]:
@cache
def P_head(f, u, H, T):
    '''Probability that the next toss lands heads, with f fair and u unfair coins left, after seeing H heads and T tails.'''
    p = f/(f+u)
    q = 1-p

    t = H + T

    P_F_given_HT = p / (p + q * 3**H/2**t)
    P_U_given_HT = 1 - P_F_given_HT
    
    return 1/2 * P_F_given_HT + 3/4 * P_U_given_HT

In [4]:
@cache
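def generate_tree(depth):
    '''Returns levels 0..depth of the toss tree; level k lists every (heads, tails) pair with heads + tails == k.'''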
    if depth == 0:
        return [[(0,0)]]

    prev = generate_tree(depth-1)
    last = []
    for t in range(depth, -1, -1):
        last.append((t, depth-t))
    
    return prev + [last]

In [5]:
@cache
def profit(f, u, d=150):
    '''
    Approximates the expected profit of a single round when there are f fair and u unfair coins left.

    The strategy is only near-optimal: we build the tree of toss outcomes down to a fixed depth d and force a guess at the leaves.
    In principle an optimal player could keep tossing beyond any fixed depth, e.g. with f=u=1 and an unlucky pattern of heads and tails
    that never gives enough evidence for an educated guess. I claim this happens only with negligible probability, so a fixed depth
    is enough for numerical purposes.
    '''
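    # the result depends only on the ratio f:u, so reduce to lowest terms and reuse the cached value
    # (the reduced call uses the default depth d)
    g = gcd(f, u)
    if g > 1:
        return profit(f//g, u//g)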
    
    tree = generate_tree(d)
    
    # value of stopping and guessing at each node of the tree
    ev = {}
    for level in tree:
        for node in level:
            H, T = node
            ev[node] = ev_stay(f, u, H, T)

    
    # Force a guess at the leaves (harmless given sufficient depth), then work bottom-up:
    # each node's value is the better of guessing now and tossing once more.
    for level in tree[::-1][1:]:
        for node in level:
            H, T = node
            left = (H+1, T)
            right = (H, T+1)

            p_H = P_head(f, u, H, T)
            p_T = 1 - p_H
            ev[node] = max(p_H*ev[left] + p_T*ev[right], ev[node])
        
    
    return ev[(0,0)]
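
The bottom-up pass in `profit` can also be condensed into a self-contained sketch that keeps only one level of the tree in memory, which makes it easy to probe how the result depends on the depth cap. This is an illustration under the same assumptions as above (a forced guess at the cap); the name `round_value` and its parameters `p_fair` and `depth` are introduced here and are not part of the notebook.

```python
def round_value(p_fair, depth=150):
    """Approximate optimal expected score of one round by backward induction.

    p_fair is the prior probability that the drawn coin is fair; depth is an
    assumed cap on the number of tosses, after which a guess is forced.
    """
    def posterior_fair(H, T):
        # Likelihood ratio unfair:fair is 3**H / 2**(H+T).
        return p_fair / (p_fair + (1 - p_fair) * 3 ** H / 2 ** (H + T))

    def guess_now(H, T):
        pf = posterior_fair(H, T)
        best = max(pf, 1 - pf)  # guess the more likely type
        return 20 * best - 50 * (1 - best) - (H + T)

    # Deepest level: a guess is forced. Index = number of heads seen.
    values = [guess_now(H, depth - H) for H in range(depth + 1)]

    # Walk up one level at a time, comparing "guess now" with "toss once more".
    for t in range(depth - 1, -1, -1):
        level = []
        for H in range(t + 1):
            T = t - H
            pf = posterior_fair(H, T)
            p_head = 0.5 * pf + 0.75 * (1 - pf)
            toss = p_head * values[H + 1] + (1 - p_head) * values[H]
            level.append(max(guess_now(H, T), toss))
        values = level

    return values[0]
```

Calling `round_value(0.5, depth=0)` gives -15, the value of guessing blindly at even odds, and with `depth=150` the result should agree with `profit(1, 1)`. Comparing a few depths is one way to check the claim that truncating the tree costs a negligible amount.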

In [6]:
@cache
def S(f, u):
    ''' Solves the question for f fair coins and u unfair coins'''
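    # only one type of coin left: every remaining guess is certain and needs no tosses
    if u == 0:
        return 20 * f

    if f == 0:
        return 20 * u

    p = f/(f+u)  # probability that the coin drawn this round is fair
    q = 1-p

    # this round's profit plus the expected value of the rest of the game, given the revealed coin type
    return p * S(f-1, u) + q * S(f, u-1) + profit(f, u)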
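
The recursion in `S` can be written out as follows, where $V(f,u)$ stands for `profit(f, u)` (the symbol $V$ is introduced here for illustration only). Because the coin's type is revealed after every guess, the player always knows how many coins of each kind remain, so each round can be optimized on its own:

$$
S(f,u) = V(f,u) + \frac{f}{f+u}\,S(f-1,u) + \frac{u}{f+u}\,S(f,u-1), \qquad S(f,0) = 20f, \quad S(0,u) = 20u.
$$

The quantity asked for in the problem is $S(N) = S(N,N)$.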

In [7]:
round(profit(1,1), 6)

0.558591

In [8]:
round(S(1,1), 6)

20.558591

In [None]:
S(50, 50)

In [None]:
# optional: warm the profit cache with a progress bar before calling S(50, 50)
n = 51
for i in tqdm(range(1, n)):
    for j in range(1, n):
        profit(i, j)