## Playground for RSA for Hanabi with 'intention representation' ##

In [29]:
import numpy as np

For this example, let's imagine a 2-player game of Hanabi.

Situation: Player 2 has a red card in the chop position that Player 1 wants her "to save". Player 1 gives the hint "red card (only) on 1st position" to Player 2. Player 2 then reasons about Player 1's intention behind this hint. For now, this means she only reasons about the 1st card in her hand.


In [91]:
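# list of all available actions (colour hints 'red'/'blue', or 'discard')
all_actions = ['red', 'blue', 'discard']
# list of all intentions the listener considers as explanations for an action
all_intentions = ['to play', 'to save', 'to discard']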


# using fixed utilities that I made up, indexed as utilities[intention][action]
utilities = {
    'to play': {'red': 0.4, 'blue': -3, 'discard': -1},
    'to save': {'red': 0.9, 'blue': -2, 'discard': -3},
    'to discard': {'red': -1, 'blue': 0.01, 'discard': 0.9}
}

# assuming flat priors for now (unnormalised; the Bayes denominator in the listener rescales them)
priors = {
    'to play': 1,
    'to save': 1,
    'to discard': 1
}


Pragmatic Listener 

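Formula: the listener applies Bayes' rule to the speaker's likelihoods,

$$P_L(i \mid a) = \frac{P_S(a \mid i)\, P(i)}{\sum_{i'} P_S(a \mid i')\, P(i')}$$

where $a$ is the observed action, $i$ is an intention, $P_S(a \mid i)$ is `prag_speaker(action, intention)`, $P(i)$ is `priors[intention]`, and the sum runs over all intentions (including $i$ itself).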

In [49]:
def prag_listener(action, intention):
    # P(intention | action) via Bayes' rule over the speaker's likelihoods
    bayes_numerator = prag_speaker(action, intention) * priors[intention]
    bayes_denominator = 0
    for specific_intention in all_intentions:
        # weight each likelihood by the prior of that same intention
        bayes_denominator += prag_speaker(action, specific_intention) * priors[specific_intention]
    return bayes_numerator / bayes_denominator


Pragmatic speaker

In [50]:
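def prag_speaker(action, intention):
    # P(action | intention): softmax over the utilities of all actions
    alpha = 1  # rationality parameter; higher values make the speaker more deterministic
    softmax_numerator = np.exp(alpha * utilities[intention][action])
    # normalise over every action the speaker could have chosen
    softmax_denominator = 0
    for specific_action in all_actions:
        softmax_denominator += np.exp(alpha * utilities[intention][specific_action])
    return softmax_numerator / softmax_denominator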
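
The speaker and listener can also be computed for all intention/action pairs at once. The following is a minimal sketch of that idea, assuming the same made-up utilities and flat priors as above; the names `speaker_matrix` and `listener_matrix` are hypothetical helpers introduced only for this illustration and are not part of the notebook.

```python
import numpy as np

# Same made-up utilities and flat priors as in the cells above.
all_actions = ['red', 'blue', 'discard']
all_intentions = ['to play', 'to save', 'to discard']
utilities = {
    'to play': {'red': 0.4, 'blue': -3, 'discard': -1},
    'to save': {'red': 0.9, 'blue': -2, 'discard': -3},
    'to discard': {'red': -1, 'blue': 0.01, 'discard': 0.9},
}
priors = {'to play': 1, 'to save': 1, 'to discard': 1}

# Utility matrix U[i, a]: rows are intentions, columns are actions.
U = np.array([[utilities[i][a] for a in all_actions] for i in all_intentions],
             dtype=float)
prior_vec = np.array([priors[i] for i in all_intentions], dtype=float)


def speaker_matrix(U, alpha=1.0):
    """Hypothetical helper: S[i, a] = P(action a | intention i), a row-wise softmax."""
    scaled = alpha * U
    # Subtracting the row maximum keeps exp() stable and does not change the result.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    weights = np.exp(scaled)
    return weights / weights.sum(axis=1, keepdims=True)


def listener_matrix(S, prior_vec):
    """Hypothetical helper: L[i, a] = P(intention i | action a), Bayes' rule per column."""
    joint = S * prior_vec[:, None]
    return joint / joint.sum(axis=0, keepdims=True)


S = speaker_matrix(U)
L = listener_matrix(S, prior_vec)

# Column 0 is the 'red' hint; print the posterior over intentions for it.
for name, prob in zip(all_intentions, L[:, 0]):
    print(name, prob)
```

Each row of `S` sums to 1 (a distribution over actions for one intention) and each column of `L` sums to 1 (a distribution over intentions for one action), which is a quick sanity check on both functions.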

## Experiments ##

In [93]:
prob_to_play = prag_listener('red', 'to play')
prob_to_save = prag_listener('red', 'to save')
prob_to_discard = prag_listener('red', 'to discard')

print("Hint is: 'red'\n")
print("probability to save: ",prob_to_save)
print("probability to play: ",prob_to_play)
print("probability to discard: ",prob_to_discard)

Hint is: 'red'

probability to save:  0.5146285392914093
probability to play:  0.4323242129538188
probability to discard:  0.053047247754771897


## What to do next? ##
- how to get reasonable priors / how to update them
- integrate more of the board state
- integrate: card positions, multiple cards
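
For the first item, one possible way to update priors is to reuse the listener's posterior after one hint as the prior for the next. The sketch below illustrates only that idea. It assumes successive hints are independent given the intention, which is a simplification, and `update_priors` and `speaker` are hypothetical names that do not appear in the cells above.

```python
import numpy as np

# Same made-up utilities as in the cells above.
all_actions = ['red', 'blue', 'discard']
all_intentions = ['to play', 'to save', 'to discard']
utilities = {
    'to play': {'red': 0.4, 'blue': -3, 'discard': -1},
    'to save': {'red': 0.9, 'blue': -2, 'discard': -3},
    'to discard': {'red': -1, 'blue': 0.01, 'discard': 0.9},
}


def speaker(action, intention, alpha=1.0):
    """Softmax probability of `action` given `intention`."""
    weights = {a: np.exp(alpha * utilities[intention][a]) for a in all_actions}
    return weights[action] / sum(weights.values())


def update_priors(current_priors, action):
    """One Bayesian update: the normalised posterior becomes the next prior."""
    joint = {i: speaker(action, i) * current_priors[i] for i in all_intentions}
    total = sum(joint.values())
    return {i: joint[i] / total for i in all_intentions}


# Start from the flat priors and observe the 'red' hint twice in a row.
beliefs = {'to play': 1.0, 'to save': 1.0, 'to discard': 1.0}
after_one = update_priors(beliefs, 'red')
after_two = update_priors(after_one, 'red')

print("after one 'red' hint: ", after_one)
print("after two 'red' hints:", after_two)
```

Under these assumptions, a repeated 'red' hint pushes more probability towards 'to save', since that intention assigns 'red' the highest speaker likelihood.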