# Riddler Classic 
**2021-01-29**: https://fivethirtyeight.com/features/can-you-guess-the-mystery-word/

You are a contestant on the game show [Lingo](https://en.wikipedia.org/wiki/Lingo_(American_game_show)), where your objective is to determine a five-letter mystery word. You are told this word’s first letter, after which you have five attempts to guess the word. You can guess any five-letter word, even one that has a different first letter.

After each of your guesses, you are told which letters of your guess are also in the mystery word and whether any of the letters are in the correct position. In the example below, T is in the correct position (remember, the first letter is provided to you), while A and C are in the mystery word but not in the correct positions.

<img src='img/mystery-word-1.png' align='center' style='width: 500px;'>

For this example, here’s how you might have figured out the mystery word (TACOS) using all five guesses:

<img src='img/mystery-word-2.png' align='center' style='width: 500px;'>

The mystery word and guesses can contain multiple instances of a letter. For example, the mystery word MISOS contains one O, so a guess with more than one O (like MOSSO) will only have the first O marked as correct (but in this case, in the wrong position).

<img src='img/mystery-word-3.png' align='center' style='width: 500px;'>


As a contestant, your plan is to make a mockery of the game show by adopting a bold strategy: No matter what, before you are even told what the first letter of the mystery word is, you have decided what your first four guesses will be. Then, with your fifth guess, you will use the results of your first four guesses (and your encyclopedic knowledge of five-letter words!) to determine all remaining possibilities for the mystery word. If multiple mystery words are still possible, you will pick one of these at random.

Which four five-letter words would you choose to maximize your chances of victory? Assume that the mystery word is selected randomly from [this word list](https://norvig.com/ngrams/enable1.txt), which is also the list your guesses must be chosen from.

*Extra credit:* For the four five-letter words you chose, what are your chances of victory?

In [1]:
import numpy as np

Let's read in the data. We can ignore any words that aren't exactly 5 letters long.

In [2]:
with open('data/word-list.txt') as f:
    WORDS = tuple(word.strip().upper() for word in f if len(word.strip()) == 5)

I'll store all the words in a `numpy` array with five columns, where each column stores one letter of a word. This will allow me to easily select words with matching characters in certain positions.

In [4]:
words = np.array([tuple(word) for word in WORDS])
words

array([['A', 'A', 'H', 'E', 'D'],
       ['A', 'A', 'L', 'I', 'I'],
       ['A', 'A', 'R', 'G', 'H'],
       ...,
       ['Z', 'O', 'R', 'I', 'S'],
       ['Z', 'O', 'W', 'I', 'E'],
       ['Z', 'Y', 'M', 'E', 'S']], dtype='<U1')

In [5]:
# Words whose five letters are all distinct (no repeated letters)
words_five_characters = tuple(word for word in WORDS if len(set(word)) == 5)

In [6]:
words[:, 1] == 'A'

array([ True,  True,  True, ..., False, False, False])
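Column masks like the one above can be combined with `&` to select words that satisfy several positional conditions at once. The following is a minimal sketch on a toy word list; `toy_words` and the mask names are illustrative and are not part of the notebook's data.

```python
import numpy as np

# Toy word list (illustrative only, not the real ENABLE list).
toy_words = np.array([tuple(w) for w in ("TACOS", "TALKS", "MISOS", "HYPOS", "HELPS")])

# One boolean mask per condition, each with one entry per word.
starts_with_t = toy_words[:, 0] == "T"
second_is_a = toy_words[:, 1] == "A"
ends_with_s = toy_words[:, 4] == "S"

# Element-wise AND keeps only the rows where every condition holds.
selected = toy_words[starts_with_t & second_is_a & ends_with_s]

# Join each row of letters back into a string for readability.
selected_words = ["".join(row) for row in selected]
print(selected_words)  # ['TACOS', 'TALKS']
```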

In [7]:
len(words)

8636

In [16]:
target = np.array(tuple('HYPOS'))
guess = np.array(tuple('HELPS'))
guess

array(['H', 'E', 'L', 'P', 'S'], dtype='<U1')

In [30]:
target

array(['H', 'Y', 'P', 'O', 'S'], dtype='<U1')

In [40]:
# True wherever the guess has the same letter as the target in the same position
position_match = (guess == target)
np.where(position_match, guess, None)

array(['H', None, None, None, 'S'], dtype=object)

In [41]:
# Keep only the words that agree with the guess at every exactly matched position
potential_words = words[np.all(words[:, position_match] == guess[position_match], axis=1)]
potential_words

array([['H', 'A', 'A', 'F', 'S'],
       ['H', 'A', 'A', 'R', 'S'],
       ['H', 'A', 'B', 'U', 'S'],
       ['H', 'A', 'C', 'K', 'S'],
       ['H', 'A', 'D', 'E', 'S'],
       ['H', 'A', 'E', 'M', 'S'],
       ['H', 'A', 'E', 'T', 'S'],
       ['H', 'A', 'F', 'I', 'S'],
       ['H', 'A', 'F', 'T', 'S'],
       ['H', 'A', 'H', 'A', 'S'],
       ['H', 'A', 'I', 'K', 'S'],
       ['H', 'A', 'I', 'L', 'S'],
       ['H', 'A', 'I', 'R', 'S'],
       ['H', 'A', 'J', 'E', 'S'],
       ['H', 'A', 'J', 'I', 'S'],
       ['H', 'A', 'K', 'E', 'S'],
       ['H', 'A', 'L', 'E', 'S'],
       ['H', 'A', 'L', 'L', 'S'],
       ['H', 'A', 'L', 'M', 'S'],
       ['H', 'A', 'L', 'O', 'S'],
       ['H', 'A', 'L', 'T', 'S'],
       ['H', 'A', 'M', 'E', 'S'],
       ['H', 'A', 'N', 'D', 'S'],
       ['H', 'A', 'N', 'G', 'S'],
       ['H', 'A', 'N', 'K', 'S'],
       ['H', 'A', 'N', 'T', 'S'],
       ['H', 'A', 'R', 'D', 'S'],
       ['H', 'A', 'R', 'E', 'S'],
       ['H', 'A', 'R', 'K', 'S'],
       ...

In [37]:
# True for each guess letter that occurs anywhere in the target
letters_in_word = np.isin(guess, target)

In [45]:
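# Letters present in the target but not in the matched position.
# position_match implies letters_in_word, so XOR here means "in word AND NOT matched".
letters_in_incorrect_position = guess[np.logical_xor(letters_in_word, position_match)]
letters_in_incorrect_position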

array(['P'], dtype='<U1')
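The `np.isin` approach flags every guess letter that occurs anywhere in the target, so a repeated guess letter would be flagged each time it appears. The puzzle statement says that for MISOS only the first O of MOSSO is marked. Below is a sketch of a feedback function that respects that rule. It assumes exact matches are claimed first and the remaining letters are marked left to right, which the puzzle text suggests but does not spell out. The name `lingo_feedback` and the `G`/`Y`/`.` encoding are my own choices for illustration.

```python
from collections import Counter


def lingo_feedback(guess: str, target: str) -> str:
    """Return one mark per guess letter:
    'G' = right letter, right position; 'Y' = in word, wrong position; '.' = absent.
    """
    marks = ["."] * len(guess)

    # Pass 1: mark exact matches and count the target letters left unmatched.
    unmatched = Counter()
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            marks[i] = "G"
        else:
            unmatched[t] += 1

    # Pass 2: left to right, a letter counts as misplaced only while
    # unmatched copies of it remain in the target (assumed rule).
    for i, g in enumerate(guess):
        if marks[i] == "." and unmatched[g] > 0:
            marks[i] = "Y"
            unmatched[g] -= 1

    return "".join(marks)


print(lingo_feedback("HELPS", "HYPOS"))  # G..YG
print(lingo_feedback("MOSSO", "MISOS"))  # GYGY.
```

In the second call, the first O is marked as misplaced and the final O is marked absent, because MISOS contains only one O.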

In [50]:
np.logical_xor(letters_in_word, position_match)

array([False, False, False,  True, False])
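With the first four guesses fixed in advance, two mystery words are indistinguishable on the fifth guess exactly when they share a first letter and produce the same feedback for every fixed guess. If the mystery word is uniform over $N$ words and lands in a group of size $k$, the random fifth guess wins with probability $1/k$, so each group contributes $\frac{k}{N}\cdot\frac{1}{k} = \frac{1}{N}$ and the chance of victory is the number of groups divided by $N$. The sketch below computes that quantity. The function names and the toy word list are hypothetical, and the duplicate-letter handling is the same assumed rule (exact matches first, then left to right).

```python
from collections import Counter


def feedback(guess: str, target: str) -> str:
    """'G' = right position, 'Y' = wrong position, '.' = absent (assumed duplicate rule)."""
    marks = ["."] * len(guess)
    unmatched = Counter()
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            marks[i] = "G"
        else:
            unmatched[t] += 1
    for i, g in enumerate(guess):
        if marks[i] == "." and unmatched[g] > 0:
            marks[i] = "Y"
            unmatched[g] -= 1
    return "".join(marks)


def win_probability(guesses, word_list):
    """Chance of winning on the final guess when picking uniformly among
    the words still consistent with the first letter and all feedback."""
    # Group mystery words by everything the contestant observes.
    groups = Counter(
        (target[0], tuple(feedback(g, target) for g in guesses))
        for target in word_list
    )
    # Each group contributes 1/N regardless of its size.
    return len(groups) / len(word_list)


# Toy word list (illustrative only, not the real ENABLE list).
toy_list = ["TACOS", "TALKS", "TANKS", "MISOS", "HYPOS", "HELPS"]

p_one_guess = win_probability(["HELPS"], toy_list)
p_two_guesses = win_probability(["HELPS", "TANKS"], toy_list)
print(p_one_guess, p_two_guesses)
```

On the toy list, HELPS alone cannot separate TACOS from TANKS, giving five groups among six words. Adding TANKS as a second fixed guess separates all six.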