# Find Candidate Words

Suppose you are working on a puzzle that is one of the many variants of Wordle and are stuck on a word. Maybe you know some of the letters but can't figure out how to arrange them to find the correct word.

Below is a method you can use to create word hints.

Cesar Mugnatto  
2024-03-03

In [1]:
# Import the necessary libraries
import itertools
import numpy as np
import requests
import json

In [2]:
# We will use this API to obtain a list of possible matching words given the known letters in your word
# Full documentation of the API endpoints and syntax is located at https://www.datamuse.com/api/
api_url = "https://api.datamuse.com/words?sp="

As an example, suppose you made 3 guesses which resulted in the following response:

![Guesses](images/guesses.png)

We know that the "r" in position 1 (using a 0-based index) is in the correct spot, and the "a" in position 2 is also correct. We also know that the letter "t" appears somewhere in the word, but not in position 3, which is where we guessed it.

Examining the "keyboard" where our guesses have been registered, we see that the letters Q, W, F, H, J, K, Z, X, C, V, B have not been used in any guesses yet, so all are still available.

![Available](images/remaining_letters.png)

We would fill out the parameters `known_letters_loc` and `avail_letters` below with the relevant info.

In [3]:
# The length of the word you want to generate
word_length = 5 # Typical for Wordle

# The letters of the word you already know in the form (letter, position)
# where letter is a single alphabetic character, and position is a value
# between 0 and the length of the word - 1 (0-based indexing)
# or -1 (signifying an unknown position)

known_letters_loc = [
      ('a', 2)
    , ('r', 1)
    , ('t', -1)
]

# Letters that could still be used - we will add the known letters to these since letters can repeat within words
avail_letters = ['q', 'w', 'f', 'h', 'j', 'k', 'z', 'x', 'c', 'v', 'b']

In [4]:
# First extract the letters for which we know their location
fixed_letters = sorted([l for l in known_letters_loc if l[1] >= 0], key=lambda k: k[1])
fixed_letters

[('r', 1), ('a', 2)]

In [5]:
# We also need just the letters without their position for calling the API
known_letters = [l[0] for l in known_letters_loc]
known_letters

['a', 'r', 't']

In [6]:
avail_letters.extend(known_letters)
avail_letters

['q', 'w', 'f', 'h', 'j', 'k', 'z', 'x', 'c', 'v', 'b', 'a', 'r', 't']

In [7]:
# To call the API, we need to pass every possible ordering (permutation) of the letters we know are in the word
known_combos = list(itertools.permutations(known_letters))
known_combos

[('a', 'r', 't'),
 ('a', 't', 'r'),
 ('r', 'a', 't'),
 ('r', 't', 'a'),
 ('t', 'a', 'r'),
 ('t', 'r', 'a')]

In [8]:
# From those permutations, build the query portion of the API
known_patterns = ['*'+('*'.join(c))+'*' for c in known_combos]
known_patterns

['*a*r*t*', '*a*t*r*', '*r*a*t*', '*r*t*a*', '*t*a*r*', '*t*r*a*']
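
The patterns above use only `*`, which matches any number of characters. The Datamuse documentation also describes `?` as a placeholder for exactly one character, so the letters whose positions are known could be encoded directly in a fixed-length pattern. The following is a minimal sketch of that idea; the helper name `positional_pattern` is hypothetical and is not part of the notebook or the API.

```python
def positional_pattern(word_length, fixed_letters):
    """Build a wildcard pattern such as '?ra??' from (letter, position) pairs."""
    slots = ["?"] * word_length  # '?' stands for exactly one unknown character
    for letter, position in fixed_letters:
        if not 0 <= position < word_length:
            raise ValueError(f"position {position} is outside the word")
        slots[position] = letter
    return "".join(slots)


print(positional_pattern(5, [("r", 1), ("a", 2)]))  # ?ra??
```

A pattern built this way only constrains the letters with known positions, so letters with unknown positions (such as "t" in the example) would still need to be checked separately.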

In [9]:
# Now, loop through each of the query patterns calling the API
candidate_words = []
for p in known_patterns:
    req_url = api_url + p
    resp = requests.get(req_url)
    words = json.loads(resp.text)
    # The API will return many words that do not match our required length so we filter out the non-matching lengths
    good_words = [w['word'] for w in words if len(w['word'])==word_length]
    if len(good_words) > 0:
        # If we still have some words of the correct length remaining, add them to the list
        candidate_words.extend(good_words)

print(candidate_words)

['darth', 'tarot', 'tatar', 'treat', 'tract', 'trait', 'wrath', 'taper', 'tarry', 'tardy', 'tiara', 'taser', 'tears', 'tapir', 'treat', 'tread', 'trial', 'ultra', 'tetra', 'tiara', 'terra', 'torah']
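
The printed list contains repeats ("treat" and "tiara" each appear twice) because a word can match more than one pattern. As a sketch of one way to handle this at collection time, the loop can be wrapped in a function that drops repeats while keeping the first-seen order. The names `fetch_words` and `collect_candidates` are hypothetical, the standard-library `urllib` is swapped in for `requests` only to keep the example self-contained, and the `max` query parameter is assumed to behave as described in the Datamuse documentation.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.datamuse.com/words"


def fetch_words(pattern, max_results=1000, timeout=10):
    """Query the API for one pattern and return the matching spellings."""
    # 'max' is assumed to raise the number of results returned per query
    query = urllib.parse.urlencode({"sp": pattern, "max": max_results})
    with urllib.request.urlopen(f"{API_URL}?{query}", timeout=timeout) as resp:
        return [entry["word"] for entry in json.load(resp)]


def collect_candidates(patterns, word_length, fetch=fetch_words):
    """Gather words of the right length across all patterns, without repeats."""
    seen = {}
    for pattern in patterns:
        for word in fetch(pattern):
            if len(word) == word_length:
                seen.setdefault(word, None)  # a dict keeps first-seen order
    return list(seen)
```

Passing the fetch function as an argument makes it possible to try the logic offline, for example with `collect_candidates(patterns, 5, fetch=lambda p: ["treat", "tract"])`.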


In [10]:
# Now we can apply filtering on the remaining words where the letter and position are both known
# First convert the letters in all of the words to a numpy array (syntactically easier to filter)
candidate_arr = np.unique(np.array([list(w) for w in candidate_words]), axis=0)
candidate_arr

array([['d', 'a', 'r', 't', 'h'],
       ['t', 'a', 'p', 'e', 'r'],
       ['t', 'a', 'p', 'i', 'r'],
       ['t', 'a', 'r', 'd', 'y'],
       ['t', 'a', 'r', 'o', 't'],
       ['t', 'a', 'r', 'r', 'y'],
       ['t', 'a', 's', 'e', 'r'],
       ['t', 'a', 't', 'a', 'r'],
       ['t', 'e', 'a', 'r', 's'],
       ['t', 'e', 'r', 'r', 'a'],
       ['t', 'e', 't', 'r', 'a'],
       ['t', 'i', 'a', 'r', 'a'],
       ['t', 'o', 'r', 'a', 'h'],
       ['t', 'r', 'a', 'c', 't'],
       ['t', 'r', 'a', 'i', 't'],
       ['t', 'r', 'e', 'a', 'd'],
       ['t', 'r', 'e', 'a', 't'],
       ['t', 'r', 'i', 'a', 'l'],
       ['u', 'l', 't', 'r', 'a'],
       ['w', 'r', 'a', 't', 'h']], dtype='<U1')

In [11]:
# We filter out any words that contain letters that have been determined not to exist in the word
filtered_candidate_arr = np.array([x for x in candidate_arr.tolist() if len(set(x).difference(set(avail_letters))) == 0])
filtered_candidate_arr

array([['t', 'a', 't', 'a', 'r'],
       ['t', 'r', 'a', 'c', 't'],
       ['w', 'r', 'a', 't', 'h']], dtype='<U1')

In [12]:
# Loop through each of the known letter positions and apply the filter of this letter to its position in the word
for f in fixed_letters:
    filtered_candidate_arr = filtered_candidate_arr[np.argwhere(filtered_candidate_arr[:, f[1]] == f[0]).reshape(1, -1)[0]]

# Convert the array of letters remaining into words remaining and display them
list_candidate_words = [''.join(l) for l in filtered_candidate_arr]
print('\n'.join(list_candidate_words))

tract
wrath
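
The filters above do not yet use one fact from the guesses: the letter "t" is known not to be in position 3. A minimal pure-Python sketch of the same filtering, extended with that extra check, might look like the following. The function `filter_candidates` and its `excluded_positions` parameter are hypothetical additions for illustration and are not part of the notebook.

```python
def filter_candidates(words, avail_letters, fixed_letters, excluded_positions=()):
    """Keep words that use only allowed letters and respect position clues."""
    allowed = set(avail_letters)
    kept = []
    for word in words:
        # Reject words containing any letter that has been ruled out
        if not set(word) <= allowed:
            continue
        # Every letter with a known position must sit in that position
        if any(word[pos] != letter for letter, pos in fixed_letters):
            continue
        # A letter known to be misplaced must not sit where it was guessed
        if any(word[pos] == letter for letter, pos in excluded_positions):
            continue
        kept.append(word)
    return kept


words = ["tatar", "tract", "wrath", "darth"]
avail = list("qwfhjkzxcvbart")
fixed = [("r", 1), ("a", 2)]
print(filter_candidates(words, avail, fixed))              # ['tract', 'wrath']
print(filter_candidates(words, avail, fixed, [("t", 3)]))  # ['tract']
```

With the extra clue applied, "wrath" is ruled out because it has "t" in position 3, leaving "tract" as the only candidate in this example.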


If we had chosen:

```python
known_letters_loc = [
      ('a', -1)
    , ('i', 1)
    , ('c', 4)
    , ('l', -1)
]
```

and:

```python
avail_letters = ['q', 'z', 'x', 'j']
```

our result would have been the only word possible given our criteria:

```text
lilac
```