https://fivethirtyeight.com/features/how-good-are-you-at-guess-who/

Rule Summary:
- The board has `N` characters.
- Players choose a character as their own.
- Players alternate turns.
- At each turn, a player may:
  - Make a specific guess about the opponent's character (correct = win, incorrect = lose).
  - Ask a yes/no question to reduce the possible characters.

Simple Example:
- The board has 4 characters.
- Player one chooses to eliminate 2 characters; player two does the same.
  - Bisecting the board should produce the fastest game.
- Player one has two characters remaining. Player one can:
  1. Choose to eliminate 1 character, then guess correctly on the next turn.
  2. Guess a character at random, with a 50% chance of winning and a 50% chance of losing.
- If player one chooses option #1, player two can:
  1. Choose to eliminate 1 character, and lose when player one guesses correctly.
  2. Guess a character at random, with a 50% chance of winning and a 50% chance of losing.
- From player two's perspective, option #2 is clearly better.
- Therefore, from player one's perspective, option #1 and option #2 have the same win probability: 50% either way, because after option #1 player two guesses and is wrong half the time.
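
The reasoning above is backward induction: each option is valued by assuming the opponent then picks their own best reply. A minimal sketch in Julia, assuming the opponent's character is equally likely to be any remaining candidate and that both players maximize their own win probability. `win_probability` and its `memo` argument are names introduced here for illustration; they are not part of the notebook's environment code.

```julia
# Exact win probability for the player about to move (illustrative sketch).
# c1: candidates the mover still has for the opponent's character
# c2: candidates the opponent still has for the mover's character
function win_probability(c1::Int, c2::Int, memo = Dict{Tuple{Int,Int},Rational{Int}}())
    haskey(memo, (c1, c2)) && return memo[(c1, c2)]
    # Guess now: correct with probability 1/c1, otherwise the game is lost.
    best = 1 // c1
    # Or ask a question that splits the c1 candidates into groups of a and c1 - a.
    # Afterwards the opponent moves, so the mover wins whenever the opponent does not.
    for a in 1:(c1 ÷ 2)
        p = (a // c1) * (1 - win_probability(c2, a, memo)) +
            ((c1 - a) // c1) * (1 - win_probability(c2, c1 - a, memo))
        best = max(best, p)
    end
    memo[(c1, c2)] = best
    return best
end

println(win_probability(2, 2))  # 1//2: two candidates each, as in the example
```

Rational arithmetic (`//`) keeps the probabilities exact, so the result can be compared directly with the fractions worked out by hand.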

Any approach other than bisecting the board should be inferior. For instance, suppose a player asks a question that divides the board into groups of `1` and `N-1` characters. `1/N` of the time, the player narrows the board to a single character and can guess correctly on the next turn. `(N-1)/N` of the time, the player has eliminated only one character.
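
One way to put numbers on this is the expected count of characters left after a question. The sketch below assumes the opponent's character is equally likely to be any of the `N` candidates; `expected_remaining` is a name introduced here. The expected count is only a proxy for speed and does not by itself establish a win probability.

```julia
# Expected number of candidates left after a question that splits n
# candidates into groups of a and n - a (uniform prior over candidates).
expected_remaining(n, a) = (a // n) * a + ((n - a) // n) * (n - a)

n = 24
for a in (1, 6, 12)
    println("split ", a, " / ", n - a, " leaves ",
            float(expected_remaining(n, a)), " candidates on average")
end
```

With 24 characters, the even split leaves 12 candidates on average, while the 1 / 23 split leaves just over 22.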

Alternative Example:
- The board has 4 characters.
- Player one chooses to eliminate 2 characters.
- Player two chooses to divide the board into groups of 1 and 3 characters.
  - 1/4 of the time, there will be 1 character left, and player two can guess correctly on the next turn.
    - This forces player one into guessing one of the two remaining characters, winning 50% of the time.
  - 3/4 of the time, there will be 3 characters left.
    - Player one can proceed to eliminate 1 character, leaving only one remaining.
    - To avoid losing, player two has to guess, with a 1/3 chance of winning.

Looking at the alternative example, player two has a (1/4) * (1/2) + (3/4) * (1/3), or 3/8 chance of winning. This is less than the 50% from the first example.
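
The same arithmetic can be checked exactly with Julia's rational numbers. This is only a restatement of the two branches from the alternative example; the variable names are introduced here.

```julia
# Each pair is (probability of the branch, player two's win chance in that branch).
branches = [(1 // 4, 1 // 2),   # one character left; player one must guess between two
            (3 // 4, 1 // 3)]   # three characters left; player two must guess among three
p2_win = sum(p * w for (p, w) in branches)
println(p2_win)            # 3//8
println(p2_win < 1 // 2)   # true
```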

Extended Example:
- The board has 24 characters.
- The first turn for each player reduces this to 12.
- The second turn for each player reduces this to 6.
- The third turn for each player reduces this to 3.
- Player one then has a:
  - 1/3 chance of forcing player two into a 1/3 guess: 1/3 * 2/3 = 2/9
  - 2/3 chance of leaving player two a choice:
    - Eliminate a character, leaving a (1/3 * 1/2 + 2/3 * 1/2

In [1]:
import Pkg; Pkg.activate("..")

[32m[1mActivating[22m[39m environment at `~/vcs/notebooks/Project.toml`


In [2]:
using Reinforce

In [3]:
import Reinforce: action, actions, finished, ismdp, maxsteps, reset!, reward, state, step!

struct GuessWhoState
    c₁::Int
    c₂::Int
end

GuessWhoState(n) = GuessWhoState(n, n)

flip(s::GuessWhoState) = GuessWhoState(s.c₂, s.c₁)

mutable struct GuessWhoEnvironment <: AbstractEnvironment
    n::Int  # number of characters
    s::GuessWhoState  # state
    r::Int  # reward
    GuessWhoEnvironment(n) = new(n, GuessWhoState(n), 0)
end

reset!(env::GuessWhoEnvironment) = (env.s = GuessWhoState(env.n); env.r = 0; env)

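# Apply action `a` for the player whose candidate count is `s.c₁`.
# a == 0 is a guess that ends the game; a > 0 asks a question that
# splits the s.c₁ candidates into groups of a and s.c₁ - a.
function makestep(s, a)
    if a == 0
        # Guess: correct with probability 1/c₁ (win), otherwise lose
        r = rand() < (1 / s.c₁) ? 1 : -1
        s = GuessWhoState(0, 0)  # terminal state
    else
        r = 0
        # The opponent's character is in the group of size a with probability a/c₁
        remaining = rand() < (a / s.c₁) ? a : (s.c₁ - a)
        s = GuessWhoState(remaining, s.c₂)
    end
    return r, s
end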

function step!(env::GuessWhoEnvironment, s, a)
    r, s = makestep(s, a)
    if !finished(env, s)
        # The opponent replies with a uniformly random action, seen from its own side
        os = flip(s)
        A = actions(env, os)
        or, os = makestep(os, rand(A))
        r = -or  # zero-sum: the opponent's reward is our loss (stays an Int, like env.r)
        s = flip(os)
    end
    env.r = r
    env.s = s
    return (env.r, env.s)
end
state(env::GuessWhoEnvironment) = env.s
reward(env::GuessWhoEnvironment) = env.r
maxsteps(env::GuessWhoEnvironment) = env.n + 1
# Action 0 is a guess; actions 1:(c₁ ÷ 2) ask a question that splits off that many characters
actions(env::GuessWhoEnvironment, s) = collect(0:(s.c₁ ÷ 2))
finished(env::GuessWhoEnvironment, s′) = (s′.c₁ == 0 || s′.c₂ == 0)

struct RandomGuessWhoPolicy <: Reinforce.AbstractPolicy end
Reinforce.action(p::RandomGuessWhoPolicy, r, s, A) = rand(A)

### Run Episode

In [4]:
env = GuessWhoEnvironment(4)

GuessWhoEnvironment(4, GuessWhoState(4, 4), 0)

In [5]:
p = RandomGuessWhoPolicy()

RandomGuessWhoPolicy()

In [6]:
ep = Episode(env, p)

Episode{GuessWhoEnvironment,RandomGuessWhoPolicy,Float64}(GuessWhoEnvironment(4, GuessWhoState(4, 4), 0), RandomGuessWhoPolicy(), 0.0, 0.0, 1, 1, 5)

In [7]:
for (s, a, r, s′) in ep
    # do some custom processing of the sars-tuple
    println("Initial state: $s")
    println("Action taken : $a")
    println("Result       : $r")
    println("Final state  : $s′")
    println()
end
println("Performed $(ep.niter) iterations with a result of $(ep.total_reward)");

Initial state: GuessWhoState(4, 4)
Action taken : 1
Result       : 0
Final state  : GuessWhoState(3, 2)

Initial state: GuessWhoState(3, 2)
Action taken : 0
Result       : 1
Final state  : GuessWhoState(0, 0)

Performed 2 iterations with a result of 1.0


In [8]:
R = Reinforce.run_episode(env, p) do (s, a, r, s′)
    # anything you want... this section is called after each step
    println("Initial state: $s")
    println("Action taken : $a")
    println("Result       : $r")
    println("Final state  : $s′")
    println()
end

Initial state: GuessWhoState(4, 4)
Action taken : 2
Result       : 0
Final state  : GuessWhoState(2, 3)

Initial state: GuessWhoState(2, 3)
Action taken : 1
Result       : 0
Final state  : GuessWhoState(1, 2)

Initial state: GuessWhoState(1, 2)
Action taken : 0
Result       : 1
Final state  : GuessWhoState(0, 0)



1.0
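
Each episode above gives a single outcome, so comparing strategies means averaging over many games. The sketch below is a stand-alone approximation of the notebook's move logic and does not use Reinforce, so it can be run on its own. `move`, `bisect`, `random_action`, and `play` are hypothetical helpers introduced here; they follow the same convention that action `0` is a guess and a positive action splits off that many characters.

```julia
using Random

# c is the mover's candidate count. Returns (reward, new candidate count);
# a guess (a == 0) ends the game with reward +1 or -1.
function move(rng, c, a)
    if a == 0
        r = rand(rng) < 1 / c ? 1 : -1
        return r, 0
    end
    remaining = rand(rng) < a / c ? a : c - a
    return 0, remaining
end

# Bisecting policy: split as evenly as possible, guess only when one candidate is left.
bisect(rng, c) = c == 1 ? 0 : c ÷ 2
# Random policy: uniform over guessing and every allowed split.
random_action(rng, c) = rand(rng, 0:(c ÷ 2))

# Play one game on a board of n characters; returns +1 if player one wins, -1 otherwise.
function play(rng, n, policy1, policy2)
    c1, c2 = n, n
    while true
        r, c1 = move(rng, c1, policy1(rng, c1))
        r != 0 && return r
        r, c2 = move(rng, c2, policy2(rng, c2))
        r != 0 && return -r
    end
end

rng = MersenneTwister(1)
trials = 10_000
wins = count(_ -> play(rng, 24, bisect, random_action) == 1, 1:trials)
println("bisecting player one won ", wins / trials, " of games against a random opponent")
```

Swapping the two policy arguments, or passing `bisect` for both players, gives a quick way to compare strategies under the same assumptions.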