# Modeling Internal Representation

A useful abstraction is to think of each of the opponent's keywords as a random variable. We know the opponent has a keyword card with a certain number of fixed keywords, but not what the keywords are. However, we know each one must be *a* word. So we can treat each of their keywords as having some probability of being each possible word, and refine those probabilities based on context and revealed information.

Somewhat less obvious is that we may do the same for our own keywords; for each keyword, the distribution simply has probability 1 for the actual word, and 0 for every other word.

I am going to use log probabilities because some of the probabilities we are working with might get very small, and working in log space helps preserve precision. Additionally, adding and subtracting log probabilities may yield some speed benefits compared to multiplying and dividing probabilities. This should also help later if we explore this in an information-theoretic context.
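To illustrate the precision point, here is a minimal sketch (the probability value and the count are made up for illustration): multiplying many small probabilities underflows to zero in float64, while summing their logs stays representable.

```python
import numpy as np

# Hypothetical values: 200 independent events, each with probability 1e-3.
probabilities = np.full(200, 1e-3)

# The true product is 1e-600, far below the smallest positive float64,
# so the direct product underflows to 0.
direct_product = np.prod(probabilities)

# The sum of logs represents the same quantity without underflow.
log_product = np.sum(np.log(probabilities))

print(direct_product)  # 0.0
print(log_product)     # approximately -1381.55
```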

For the sake of example, let's pretend only 4 words exist: "BOAT", "CAT", "RUM", and "DRAG". We might have a keyword card that looks like ("CAT", "BOAT", "RUM", "DRAG"), and be given the code (1, 3, 0). To keep things simple, let's break the rules and naively use the keywords themselves as the clues. Then we might model it as follows:

In [40]:
from dataclasses import dataclass
from itertools import permutations
import numpy as np

ALL_WORDS = ("BOAT", "CAT", "RUM", "DRAG")
ALL_WORD_INDEX = {word: index for index, word in enumerate(ALL_WORDS)}
ALL_CODES = np.array(list(permutations(range(4), 3)))
LOG_PROB_ZERO = np.float64(-100.0)

@dataclass
class NumpyRandomVariable:
    log_probabilities: np.ndarray  # log P(X = word) for each entry of keyword_indices
    keyword_indices: np.ndarray  # indices into ALL_WORDS


keyword_card = (ALL_WORDS[1], ALL_WORDS[0], ALL_WORDS[2], ALL_WORDS[3])
code = (1, 3, 0)
naive_clue = tuple(keyword_card[i] for i in code)

def guesser_random_variable(keyword, word_index=ALL_WORD_INDEX):
    # log(0) = -inf for every word; np.full avoids the divide-by-zero warning from np.log(0)
    log_probabilities = np.full(len(word_index), -np.inf)
    log_probabilities[word_index[keyword]] = 0
    keyword_indices = np.arange(len(word_index))
    return NumpyRandomVariable(log_probabilities, keyword_indices)

random_variables = [guesser_random_variable(keyword) for keyword in keyword_card]
clue_indices = np.array([ALL_WORD_INDEX[clue] for clue in naive_clue])

print(random_variables)
print(clue_indices)

[NumpyRandomVariable(log_probabilities=array([-inf,   0., -inf, -inf]), keyword_indices=array([0, 1, 2, 3])), NumpyRandomVariable(log_probabilities=array([  0., -inf, -inf, -inf]), keyword_indices=array([0, 1, 2, 3])), NumpyRandomVariable(log_probabilities=array([-inf, -inf,   0., -inf]), keyword_indices=array([0, 1, 2, 3])), NumpyRandomVariable(log_probabilities=array([-inf, -inf, -inf,   0.]), keyword_indices=array([0, 1, 2, 3]))]
[0 3 1]



Now we can find the log expected likelihood for each code, and guess the one with the highest likelihood. We can find the log expected probability of a code by evaluating a heuristic for the log expected probability of each clue and keyword pairing specified by that code, and adding them together.

For example, a code might be (2, 1, 0), and a clue might be ("CAT", "RUM", "BOAT"). To find the likelihood of the code with our clue, we would use a heuristic to evaluate the log expected likelihood of keyword 2 and "CAT" going together, keyword 1 and "RUM" going together, and keyword 0 and "BOAT" going together, and add them all together. This is equivalent to multiplying the individual expected likelihoods together.
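As a minimal, non-vectorized sketch of that sum: the helper names `pair_log_likelihood` and `code_log_likelihood` are made up here, the keyword card is a hypothetical one chosen so that (2, 1, 0) is the correct code, and the heuristic is a stand-in that only checks equality, with -100 standing in for log 0.

```python
LOG_PROB_ZERO = -100.0  # stand-in for log(0), so sums stay finite

def pair_log_likelihood(keyword: str, clue_word: str) -> float:
    # Hypothetical stand-in heuristic: log(1) if the words match, else the log(0) stand-in.
    return 0.0 if keyword == clue_word else LOG_PROB_ZERO

def code_log_likelihood(keyword_card, clue, code) -> float:
    # Position i of the code says which keyword the i-th clue word refers to.
    return sum(pair_log_likelihood(keyword_card[k], clue[i]) for i, k in enumerate(code))

# Hypothetical card for this illustration only.
keyword_card = ("BOAT", "RUM", "CAT", "DRAG")
clue = ("CAT", "RUM", "BOAT")

print(code_log_likelihood(keyword_card, clue, (2, 1, 0)))  # 0.0: every pairing matches
print(code_log_likelihood(keyword_card, clue, (0, 1, 2)))  # -200.0: two pairings mismatch
```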

Note: The heuristic yields an expectation because we are modeling the keywords as random variables. This will prove useful when we explore Intercepter strategies, as they will not have the knowledge our guesser has.
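To make the expectation concrete, suppose (hypothetically) that we are unsure about a keyword and believe it is "CAT" with probability 0.7 and "BOAT" with probability 0.3. Under an equality-style heuristic, the expected probability that the clue "CAT" matches is the probability-weighted average over both possibilities. A sketch with made-up numbers:

```python
import numpy as np

LOG_PROB_ZERO = -100.0  # stand-in for log(0)

# Hypothetical belief over the 4-word vocabulary ("BOAT", "CAT", "RUM", "DRAG").
belief = np.array([0.3, 0.7, 0.0, 0.0])
clue_index = 1  # "CAT"

# Equality heuristic in log space: log(1) for the matching word, log(0) stand-in otherwise.
log_f = np.where(np.arange(len(belief)) == clue_index, 0.0, LOG_PROB_ZERO)

# E[f(X)] = sum over words of P(X = word) * f(word), then back to log space.
log_expectation = np.log(np.sum(belief * np.exp(log_f)))

print(log_expectation)  # approximately log(0.7) = -0.357
```

When the distribution puts probability 1 on a single word, as it does for our own keywords, the expectation collapses to the heuristic value for that word.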

We can use numpy to vectorize this operation and perform it for each possible code, and to find our best guess. For the sake of example, we will use a naive heuristic which just checks if the clue and keyword are equal.

In [41]:
from functools import partial

# vectorized functions

def log_expected_probability(keyword_index_to_log_prob_func, log_probabilities: np.ndarray): # this is the log equivalent of E[f(X)]
    keyword_indices = np.indices(log_probabilities.shape)[-1]
    # calculate terms of expectation sum definition
    log_terms = log_probabilities + keyword_index_to_log_prob_func(keyword_indices)
    # subtract max term to mitigate error
    # note: if we lose a lot of precision, we can omit the conversion and reduce, but it will be slower
    max_log_term = np.max(log_terms, axis=-1)
    log_offset_terms = log_terms - np.expand_dims(max_log_term, axis=-1)
    # convert to regular probability world and evaluate sums to get expectation
    offset_expectation = np.sum(np.exp(log_offset_terms), axis=-1)
    # bring back to log world and add max term back
    log_expectation = np.log(offset_expectation) + max_log_term
    return log_expectation

def naive_clue_and_keyword_log_prob(clue_index, keyword_index):
    # 0.0 is log(1); np.NZERO was removed in NumPy 2.0, so use a plain float instead
    return np.where(keyword_index == np.expand_dims(clue_index, axis=-1), 0.0, LOG_PROB_ZERO)

def log_expected_probabilities_codes(clue_and_keyword_to_log_probability_func, random_variables: list[NumpyRandomVariable], clue_indices: np.ndarray, codes: np.ndarray = ALL_CODES):
    var_log_probabilities = np.array([random_variable.log_probabilities for random_variable in random_variables])
    r_i, c_i = np.ogrid[slice(len(random_variables)), slice(len(clue_indices))]
    log_expected_probabilities = log_expected_probability(partial(clue_and_keyword_to_log_probability_func, clue_indices[c_i]), var_log_probabilities[r_i])
    return log_expected_probabilities[codes].trace(axis1=1, axis2=2)

naive_log_expected_probabilities_codes = partial(log_expected_probabilities_codes, naive_clue_and_keyword_log_prob)
log_expectations = naive_log_expected_probabilities_codes(random_variables, clue_indices)

print("log expectations")
print(log_expectations.shape)
print(log_expectations)
guess = ALL_CODES[np.argmax(log_expectations)]
print(guess)
print(guess == code)

log expectations
(24,)
[-300. -300. -300. -300. -200. -200. -200. -200. -100. -200.    0. -100.
 -300. -300. -200. -300. -100. -200. -300. -300. -200. -300. -200. -300.]
[1 3 0]
[ True  True  True]
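The subtract-the-max step in `log_expected_probability` is the usual log-sum-exp trick. As a sanity check, the sketch below (with arbitrary made-up log terms) compares it against `np.logaddexp.reduce`, which is one way to stay in log space throughout; this is probably what the "omit the conversion and reduce" comment in the function has in mind, though that reading is an assumption.

```python
import numpy as np

# Arbitrary log terms, small enough that exponentiating them directly underflows to 0.
log_terms = np.array([-1000.0, -1001.0, -1002.0])

# Naive approach: exp underflows, so the log of the sum is -inf.
with np.errstate(divide="ignore"):
    naive = np.log(np.sum(np.exp(log_terms)))

# Log-sum-exp trick: factor out the max term before exponentiating.
max_term = np.max(log_terms)
stable = max_term + np.log(np.sum(np.exp(log_terms - max_term)))

# Pairwise reduction that never leaves log space.
reduced = np.logaddexp.reduce(log_terms)

print(naive)            # -inf
print(stable, reduced)  # both approximately -999.59
```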


At scale, each random variable will range over a large number of words. A simple way to handle this is to ignore the lowest-probability words, forming a random variable reduced in size. For example, we might keep only the most probable words that together account for 90% of the probability; this should still give us good predictions.

This requires that our functions accept both log probabilities and keyword indices, since the probabilities will only cover a subset of all of the words.
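As a small illustration for a single variable (the distribution and threshold below are made up), sorting by decreasing probability and cutting at the first point where the cumulative sum reaches the threshold keeps only the most likely words, along with the indices needed to identify them:

```python
import numpy as np

# Hypothetical distribution over ("BOAT", "CAT", "RUM", "DRAG").
log_probabilities = np.log(np.array([0.15, 0.5, 0.05, 0.3]))
threshold = 0.9

# Word indices ordered from most to least probable.
order = np.argsort(-log_probabilities)
cumulative = np.cumsum(np.exp(log_probabilities[order]))

# Number of words needed for the cumulative probability to reach the threshold.
num_kept = int(np.argmax(cumulative >= threshold)) + 1

kept_indices = order[:num_kept]
kept_log_probabilities = log_probabilities[kept_indices]

print(kept_indices)                    # [1 3 0]
print(np.exp(kept_log_probabilities))  # approximately [0.5 0.3 0.15]
```

Note that the kept probabilities sum to about 0.95 here, not 1, so the reduced variable is an approximation of the original one.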

In [42]:


# random variable size-reducing functions

def num_indices_for_cumulative_probability(log_probabilities: np.ndarray, keyword_indices_by_decreasing_probability: np.ndarray, cumulative_probability=1.0):
    # polling from a max heap until we reach cumulative_probability would be better for this theoretically
    # but lost NumPy speed may dominate
    row_indices = np.expand_dims(np.arange(len(log_probabilities)), axis=-1)
    probabilities_by_decreasing_probability = np.exp(log_probabilities[row_indices, keyword_indices_by_decreasing_probability])
    # index of the first position where the running total reaches cumulative_probability
    earliest_index = np.argmax(probabilities_by_decreasing_probability.cumsum(axis=-1) >= cumulative_probability, axis=-1)
    return earliest_index + 1


def random_vars_at_least_cumulative_probability(random_variables: list[NumpyRandomVariable], cumulative_probability=1.0):
    var_log_probabilities = np.array([random_variable.log_probabilities for random_variable in random_variables])
    keyword_indices_by_decreasing_probability = (-var_log_probabilities).argsort()
    num_indices = num_indices_for_cumulative_probability(var_log_probabilities, keyword_indices_by_decreasing_probability, cumulative_probability)
    reduced_keyword_indices = keyword_indices_by_decreasing_probability[:, :np.max(num_indices)]
    reduced_var_log_probabilities = var_log_probabilities[np.expand_dims(np.arange(len(var_log_probabilities)), axis=-1), reduced_keyword_indices]
    return [NumpyRandomVariable(log_probabilities, keyword_indices) for log_probabilities, keyword_indices in zip(reduced_var_log_probabilities, reduced_keyword_indices)]

# refactored guessing functions

def log_expected_probability(keyword_index_to_log_prob_func, log_probabilities: np.ndarray, keyword_indices: np.ndarray): # this is log equivalent of E[f(X)]
    # calculate terms of expectation sum definition
    log_terms = log_probabilities + keyword_index_to_log_prob_func(keyword_indices)
    # subtract max term to mitigate error
    # note: if we lose a lot of precision, we can omit the conversion and reduce, but it will be slower
    max_log_term = np.max(log_terms, axis=-1)
    log_offset_terms = log_terms - np.expand_dims(max_log_term, axis=-1)
    # convert to regular probability world and evaluate sums to get expectation
    offset_expectation = np.sum(np.exp(log_offset_terms), axis=-1)
    # bring back to log world and add max term back
    log_expectation = np.log(offset_expectation) + max_log_term
    return log_expectation

def log_expected_probabilities_codes(clue_and_keyword_to_log_probability_func, random_variables: list[NumpyRandomVariable], clue_indices: np.ndarray, codes: np.ndarray = ALL_CODES):
    var_log_probabilities = np.array([random_variable.log_probabilities for random_variable in random_variables])
    var_keyword_indices = np.array([random_variable.keyword_indices for random_variable in random_variables])
    r_i, c_i = np.ogrid[slice(len(random_variables)), slice(len(clue_indices))]
    keyword_to_log_prob_vectorized = partial(clue_and_keyword_to_log_probability_func, clue_indices[c_i])
    log_expected_probabilities = log_expected_probability(keyword_to_log_prob_vectorized, var_log_probabilities[r_i], var_keyword_indices[r_i])
    return log_expected_probabilities[codes].trace(axis1=1, axis2=2)

print(random_variables)
reduced_random_variables = random_vars_at_least_cumulative_probability(random_variables, 0.95)
print(reduced_random_variables)
naive_log_expected_probabilities_codes = partial(log_expected_probabilities_codes, naive_clue_and_keyword_log_prob)
log_expectations = naive_log_expected_probabilities_codes(reduced_random_variables, clue_indices)

print("log expectations")
print(log_expectations.shape)
print(log_expectations)
guess = ALL_CODES[np.argmax(log_expectations)]
print(guess)
print(guess == code)

[NumpyRandomVariable(log_probabilities=array([-inf,   0., -inf, -inf]), keyword_indices=array([0, 1, 2, 3])), NumpyRandomVariable(log_probabilities=array([  0., -inf, -inf, -inf]), keyword_indices=array([0, 1, 2, 3])), NumpyRandomVariable(log_probabilities=array([-inf, -inf,   0., -inf]), keyword_indices=array([0, 1, 2, 3])), NumpyRandomVariable(log_probabilities=array([-inf, -inf, -inf,   0.]), keyword_indices=array([0, 1, 2, 3]))]
[NumpyRandomVariable(log_probabilities=array([0.]), keyword_indices=array([1])), NumpyRandomVariable(log_probabilities=array([0.]), keyword_indices=array([0])), NumpyRandomVariable(log_probabilities=array([0.]), keyword_indices=array([2])), NumpyRandomVariable(log_probabilities=array([0.]), keyword_indices=array([3]))]
log expectations
(24,)
[-300. -300. -300. -300. -200. -200. -200. -200. -100. -200.    0. -100.
 -300. -300. -200. -300. -100. -200. -300. -300. -200. -300. -200. -300.]
[1 3 0]
[ True  True  True]

In the case of the Guesser, all of the expectation lies on the single keyword which we know for certain; we can make a special constructor for the guesser that allows it to bypass work.

In [43]:
def guesser_random_variables(keyword_card, word_index=ALL_WORD_INDEX):
    return [NumpyRandomVariable(np.zeros(1), np.array([word_index[keyword]])) for keyword in keyword_card]

random_variables = guesser_random_variables(keyword_card)
print(random_variables)

naive_log_expected_probabilities_codes = partial(log_expected_probabilities_codes, naive_clue_and_keyword_log_prob)
log_expectations = naive_log_expected_probabilities_codes(random_variables, clue_indices)

print("log expectations")
print(log_expectations.shape)
print(log_expectations)
guess = ALL_CODES[np.argmax(log_expectations)]
print(guess)
print(guess == code)

[NumpyRandomVariable(log_probabilities=array([0.]), keyword_indices=array([1])), NumpyRandomVariable(log_probabilities=array([0.]), keyword_indices=array([0])), NumpyRandomVariable(log_probabilities=array([0.]), keyword_indices=array([2])), NumpyRandomVariable(log_probabilities=array([0.]), keyword_indices=array([3]))]
log expectations
(24,)
[-300. -300. -300. -300. -200. -200. -200. -200. -100. -200.    0. -100.
 -300. -300. -200. -300. -100. -200. -300. -300. -200. -300. -200. -300.]
[1 3 0]
[ True  True  True]
