# **Spelling Bee (version without S)**

## [Riddler Classic](https://fivethirtyeight.com/features/can-you-solve-the-vexing-vexillology/), Jan 3, 2020

### solution by [Laurent Lessard](https://laurentlessard.com)

The New York Times recently launched some new word puzzles, one of which is [Spelling Bee](https://www.nytimes.com/puzzles/spelling-bee). In this game, seven letters are arranged in a honeycomb lattice, with one letter in the center. Here’s the lattice from December 24, 2019:

<img src="https://fivethirtyeight.com/wp-content/uploads/2020/01/Screen-Shot-2019-12-24-at-5.46.55-PM.png?w=1136" width="250">

The goal is to identify as many words as possible that meet the following criteria:
- The word must be at least four letters long.
- The word must include the central letter.
- The word cannot include any letter beyond the seven given letters.

Note that letters can be repeated. For example, the words GAME and AMALGAM are both acceptable words. Four-letter words are worth 1 point each, while five-letter words are worth 5 points, six-letter words are worth 6 points, seven-letter words are worth 7 points, etc. Words that use all of the seven letters in the honeycomb are known as “pangrams” and earn 7 bonus points (in addition to the points for the length of the word). So in the above example, MEGAPLEX is worth 15 points.
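The scoring rule above can be checked with a minimal sketch. The function name `spelling_bee_score` is hypothetical, and the sketch assumes the word has already been validated against the honeycomb, so that having exactly 7 distinct letters means it is a pangram.

```python
def spelling_bee_score(word: str) -> int:
    """Score one already-valid word (assumption: it only uses honeycomb letters)."""
    word = word.lower()
    if len(word) == 4:
        return 1                  # four-letter words are worth 1 point
    score = len(word)             # longer words are worth their length
    if len(set(word)) == 7:       # uses all seven letters: pangram bonus
        score += 7
    return score

for w in ("GAME", "AMALGAM", "MEGAPLEX"):
    print(w, spelling_bee_score(w))
# GAME 1
# AMALGAM 7
# MEGAPLEX 15
```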

Which seven-letter honeycomb results in the highest possible game score? To be a valid choice of seven letters, no letter can be repeated, it must not contain the letter S (that would be too easy) and there must be at least one pangram.

For consistency, please use [this word list](https://norvig.com/ngrams/enable1.txt) to check your game score.

---


# My solution

I used a brute-force approach. There are 3,364,900 different game boards and 44,585 eligible words. Solving this puzzle amounts to enumerating all possible words for each board and adding up their scores. In order to make this as efficient as possible, I used a few tricks:
- I filtered the dictionary to exclude words shorter than 4 letters, words containing 's', and words that use more than 7 different letters, since these can never occur on a valid board.
- I pre-computed the scores of all remaining words in the dictionary.
- I created separate sub-dictionaries for each letter of the alphabet (all words that contain 'a', all words that contain 'b', etc.).
- I used for loops that exit early when filtering the dictionary, which results in a 2x speed-up.

After all was said and done, the function I wrote took on average 5.22 ms to compute the score of each board, which meant it took about 4 hours 53 min to solve the whole problem.
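The board count and the running time quoted above can be sanity-checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and `math.comb` assumes Python 3.8 or later. A board is a center letter (25 choices, since S is excluded) plus 6 of the remaining 24 letters.

```python
from math import comb

n_letters = 25                                   # alphabet without 's'
n_boards = n_letters * comb(n_letters - 1, 6)    # center letter, then 6 of the other 24
print(n_boards)                                  # 3364900

ms_per_board = 5.22                              # average time per board quoted above
total_seconds = n_boards * ms_per_board / 1000
hours, remainder = divmod(total_seconds, 3600)
print(f"{int(hours)} h {remainder / 60:.0f} min")  # 4 h 53 min
```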

The winning board was "RAEGINT" (with R in the center), for a total score of 3898.
This board yields a total of 537 words, 50 of which are pangrams.
The highest-scoring words are REAGGREGATING and REINTEGRATING (20 points each).

Here is the code I wrote to solve the problem:

In [72]:
from itertools import combinations
import json

### Methods that get run only once (efficiency is not important)

In [146]:
# read word list
with open('enable1.txt', 'r') as f:
    wlist = f.read().splitlines()
    
# all letters available (exclude 's')
letters = 'abcdefghijklmnopqrtuvwxyz'
    
# keep only the words that have length at least 4
wlist = [word for word in wlist if len(word) >= 4]

# eliminate words that contain the letter 's'
wlist = [word for word in wlist if 's' not in word]

# keep only the words that have at most 7 distinct letters
wlist = [word for word in wlist if len(set(word)) <= 7]

# create a list of words for each center letter
wlistc = {}
for center_letter in letters:
    wlistc[center_letter] = [word for word in wlist if center_letter in word]

# what is the score of a given word?
def word_score(word):
    if len(word) == 4:
        return 1
    else:
        score = len(word)        
        # if pangram (uses 7 different letters)
        if len(set(word)) == 7:
            score += 7
        return score

# dictionary of all words and their scores
wdict = {word:word_score(word) for word in wlist}
print("There are", len(wdict), "possible words that use at most 7 different letters")

There are 44585 possible words that use at most 7 different letters


### Methods that get run a ton of times (make as efficient as possible!)

In [109]:
# total score of all words that can be played on a given board (center letter first)
def score_set(seven_letters):
    ssl = set(seven_letters)
    # words that contain the center letter
    return sum( [wdict[word] for word in wlistc[seven_letters[0]] if set(word).issubset(ssl)] )


# more efficient implementation

def subset_eff(word,letterset):
    for lett in word:
        if lett not in letterset:
            return False
    return True

def score_set_eff(seven_letters):
    score = 0
    for word in wlistc[seven_letters[0]]:
        if subset_eff(word,seven_letters):
            score += wdict[word]
    return score
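To illustrate that the two implementations above compute the same thing, here is a self-contained sketch on a toy dictionary. Everything prefixed with `toy_` is hypothetical and is not taken from enable1.txt; the early-exit check uses `all()`, which stops at the first missing letter just like the explicit loop.

```python
# Toy data for illustration only (not the real word list)
toy_words = ["game", "gamma", "mega", "plex", "megaplex", "amalgam"]

def toy_word_score(word):
    if len(word) == 4:
        return 1
    return len(word) + (7 if len(set(word)) == 7 else 0)

toy_scores = {w: toy_word_score(w) for w in toy_words}
# one sub-list per possible center letter
toy_by_center = {c: [w for w in toy_words if c in w] for c in set("".join(toy_words))}

def toy_score_issubset(board):
    allowed = set(board)
    return sum(toy_scores[w] for w in toy_by_center.get(board[0], [])
               if set(w).issubset(allowed))

def toy_score_early_exit(board):
    total = 0
    for w in toy_by_center.get(board[0], []):
        if all(ch in board for ch in w):   # stops at the first letter not on the board
            total += toy_scores[w]
    return total

for board in ("gamelpx", "gamexyz"):
    print(board, toy_score_issubset(board), toy_score_early_exit(board))
# gamelpx 29 29
# gamexyz 7 7
```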

### Full set of seven-letter boards that must be tested

In [110]:
%%time
# set of seven-letter boards that we must test
test_seven_letters = [ c + "".join(w) for c in letters for w in combinations(letters.replace(c,''),6)]
len(test_seven_letters)

Wall time: 1.65 s


3364900
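The same enumeration pattern is easier to inspect on a tiny alphabet. The sketch below uses a hypothetical helper `enumerate_boards` and a four-letter alphabet with three-letter boards, so the full list can be read at a glance; the first character of each board is the center letter.

```python
from itertools import combinations

def enumerate_boards(alphabet, size):
    """All boards of `size` letters: a center letter followed by the other letters."""
    return [c + "".join(rest)
            for c in alphabet
            for rest in combinations(alphabet.replace(c, ""), size - 1)]

small = enumerate_boards("abcd", 3)
print(len(small))    # 12  (4 centers x 3 ways to choose 2 of the other 3 letters)
print(small[:3])     # ['abc', 'abd', 'acd']
```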

### Test each of the 7-letter boards

In [113]:
%%time
test_scores = { seven_letters: score_set_eff(seven_letters) for seven_letters in test_seven_letters }
best_board = max(test_scores, key=test_scores.get)
print("best board:", best_board, "\nbest score:", test_scores[best_board])
with open('test_scores.json', 'w') as file:
    json.dump(test_scores, file)

best board: raegint 
best score: 3898
Wall time: 4h 53min 9s


### Take a closer look at the winner(s)

In [147]:
test_scores_list = list(test_scores.items())
test_scores_list.sort( key = lambda x:x[1], reverse=True )

In [148]:
# top scoring boards
test_scores_list[:5]

[('raegint', 3898),
 ('naegirt', 3782),
 ('eaginrt', 3769),
 ('eadinrt', 3672),
 ('taeginr', 3421)]

In [136]:
# extract words from this winner
r = [(word,wdict[word]) for word in wlistc['r'] if set(word).issubset('raegint')]
r.sort(key=lambda x: x[1], reverse=True)

In [138]:
# total number of words that can be made
print(len(r))

537
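The pangram count mentioned earlier could be obtained from a list shaped like `r` along the following lines. This is a sketch, not the code that produced the quoted figure: `count_pangrams` is a hypothetical helper, and `sample` is a hand-picked handful of entries (the 4-letter entry `rate` is added purely as a non-pangram for contrast).

```python
def count_pangrams(scored_words, board):
    """Count the (word, score) pairs whose word uses every letter of the board."""
    board_letters = set(board)
    return sum(1 for word, _ in scored_words if set(word) == board_letters)

sample = [("granite", 14), ("treating", 15), ("aerating", 15), ("rate", 1)]
print(count_pangrams(sample, "raegint"))   # 3
```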


In [155]:
r[:60]

[('reaggregating', 20),
 ('reintegrating', 20),
 ('entertaining', 19),
 ('intenerating', 19),
 ('regenerating', 19),
 ('reinitiating', 19),
 ('aggregating', 18),
 ('gratineeing', 18),
 ('integrating', 18),
 ('itinerating', 18),
 ('reattaining', 18),
 ('reintegrate', 18),
 ('reiterating', 18),
 ('retargeting', 18),
 ('entraining', 17),
 ('entreating', 17),
 ('garnierite', 17),
 ('generating', 17),
 ('greatening', 17),
 ('ingratiate', 17),
 ('interregna', 17),
 ('intreating', 17),
 ('regranting', 17),
 ('retraining', 17),
 ('retreating', 17),
 ('argentine', 16),
 ('argentite', 16),
 ('gartering', 16),
 ('integrate', 16),
 ('intergang', 16),
 ('iterating', 16),
 ('nattering', 16),
 ('rattening', 16),
 ('regrating', 16),
 ('retagging', 16),
 ('retaining', 16),
 ('retearing', 16),
 ('tangerine', 16),
 ('targeting', 16),
 ('tattering', 16),
 ('aerating', 15),
 ('gnattier', 15),
 ('gratinee', 15),
 ('interage', 15),
 ('treating', 15),
 ('granite', 14),
 ('gratine', 14),
 ('ingrate', 14),
 ('t