# Anigrams

#### Riddler Classic for Friday September 16, 2022. Solution by [Laurent Lessard](https://laurentlessard.com)

This week's [Riddler Classic](https://fivethirtyeight.com/features/can-you-build-the-biggest-anigram/) is a word puzzle about chains of nested anagrams.

> If you like Wordle, then you might also enjoy Anigrams, a game created by Friend-of-the-Riddler™ Adam Wagner.
> In the game of Anigrams, you unscramble successively larger, nested collections of letters to create a valid “chain” of six English words between four and nine letters in length.
> For example, a chain of five words (sadly, less than the six needed for a valid game of Anigrams) can be constructed using the following sequence, with each term after the first including one additional letter than the previous term:
>
> - DEIR (which unscrambles to make the words DIRE, IRED or RIDE)
> - DEIRD (DRIED or REDID)
> - DEIRDL (DIRLED, DREIDL or RIDDLE)
> - DEIRDLR (RIDDLER)
> - DEIRDLRS (RIDDLERS)
> 
> What is the longest chain of such nested anagrams you can create, starting with four letters?
>
> For specificity, all valid words must come from Peter Norvig’s [word list](https://norvig.com/ngrams/enable1.txt) (a list we’ve used [previously](https://fivethirtyeight.com/features/can-you-solve-the-vexing-vexillology/) here at The Riddler).

---

In [80]:
import time

In [51]:
# read full word list
with open('enable1.txt', 'r') as f:
    wordlist = f.read().splitlines()
print(len(wordlist), 'words in total')
    
# keep only the words that have length at least 4
wordlist = [word for word in wordlist if len(word) >= 4]
print(len(wordlist), 'words of length at least 4')

# sort each word alphabetically and remove duplicates
wordset = { ''.join(sorted(word)) for word in wordlist }
print(len(wordset), 'equivalence classes')

# find length of longest word
longest_word = max(wordset, key=len)
MAX_LENGTH = len(longest_word)
print('the longest word is', longest_word, 'and it has length', MAX_LENGTH)

# split the words by length into lists
wordlists = []
for length in range(4, MAX_LENGTH+1):
    wordlists.append( [ word for word in wordset if len(word) == length ] )
for wlist in wordlists:
    wlist.sort()

# print out word lengths
for length in range(4, MAX_LENGTH+1):
    print('Length', '{:2d}'.format(length), ':', len(wordlists[length-4]), 'total words')

172820 words in total
171752 words of length at least 4
155689 equivalence classes
the longest word is aaaacdeeeeeeehiilmnnrsttttty and it has length 28
Length  4 : 2674 total words
Length  5 : 6303 total words
Length  6 : 11958 total words
Length  7 : 19424 total words
Length  8 : 25434 total words
Length  9 : 23396 total words
Length 10 : 19708 total words
Length 11 : 15242 total words
Length 12 : 11231 total words
Length 13 : 7777 total words
Length 14 : 5102 total words
Length 15 : 3182 total words
Length 16 : 1935 total words
Length 17 : 1123 total words
Length 18 : 592 total words
Length 19 : 329 total words
Length 20 : 160 total words
Length 21 : 62 total words
Length 22 : 30 total words
Length 23 : 13 total words
Length 24 : 9 total words
Length 25 : 2 total words
Length 26 : 0 total words
Length 27 : 2 total words
Length 28 : 1 total words
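
The cell above collapses anagrams into a single "equivalence class" by sorting each word's letters. A minimal sketch of that idea on a tiny hand-picked sample (the helper name `signature` and the sample list are illustrative assumptions, not part of the notebook):

```python
from collections import defaultdict

def signature(word):
    """Return the letters of `word` in sorted order; anagrams share a signature."""
    return ''.join(sorted(word))

# tiny illustrative sample taken from the puzzle statement, not the full word list
sample = ['dire', 'ired', 'ride', 'dried', 'redid', 'riddle']

# group the sample words by signature
classes = defaultdict(list)
for word in sample:
    classes[signature(word)].append(word)

for sig, words in sorted(classes.items()):
    print(sig, words)
```

Keeping a dictionary like `classes` around would also let a signature be mapped back to its words with a single lookup instead of a scan over the whole list.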


In [91]:
# Since there are no words of length 26, it is impossible to have a chain any longer than 25
# so we can restrict our attention to words that are length 25 or less.

def isconnected(w1, w2):
    '''
    Return True if the sorted strings w1 and w2 (len(w1)+1 == len(w2)) are valid neighbors
    in a chain; i.e. w2 has the same letters as w1 but contains one extra letter.
    '''
    i = 0
    while i < len(w1):
        if w1[i] == w2[i]:
            i += 1
        else:
            # first mismatch: w2[i] must be the extra letter, and the rest must line up
            return w1[i:] == w2[i+1:]
    # no mismatch: the extra letter is the last character of w2
    return True
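
A quick sanity check of this comparison, as a sketch. The function is repeated so the snippet runs on its own; the test strings are sorted letter collections based on the RIDDLE example from the puzzle, plus two made-up non-matches:

```python
def isconnected(w1, w2):
    """True if sorted string w2 equals sorted string w1 plus exactly one extra letter."""
    i = 0
    while i < len(w1):
        if w1[i] == w2[i]:
            i += 1
        else:
            # first mismatch: skip one letter of w2 and require the rest to match
            return w1[i:] == w2[i+1:]
    return True

# sorted letters of RIDE -> DRIED -> RIDDLE
print(isconnected('deir', 'ddeir'))    # extra 'd' near the front
print(isconnected('ddeir', 'ddeilr'))  # extra 'l' in the middle
print(isconnected('deir', 'deirs'))    # extra 's' at the very end
print(isconnected('deir', 'deist'))    # 'r' is missing from the longer string
print(isconnected('deir', 'aabcd'))    # unrelated letters
```

The first three calls should report a valid link and the last two should not. The check relies on both arguments already being sorted.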

In [92]:
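# graph[w1] lists every sorted string that is one letter longer than w1 and contains its letters
# range(20) covers the steps from length 4 --> 5 up to length 23 --> 24
graph = {}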

for i in range(20):
    start_time = time.time()
    for w1 in wordlists[i]:
        graph[w1] = []
        for w2 in wordlists[i+1]:
            if isconnected(w1,w2):
                graph[w1].append(w2)

    end_time = time.time()
    print('Length', i+4, '-->', i+5, ': ', '{0:.3g}'.format(end_time-start_time), 'seconds')

Length 4 --> 5 :  7.23 seconds
Length 5 --> 6 :  36.7 seconds
Length 6 --> 7 :  110 seconds
Length 7 --> 8 :  235 seconds
Length 8 --> 9 :  280 seconds
Length 9 --> 10 :  232 seconds
Length 10 --> 11 :  162 seconds
Length 11 --> 12 :  82.7 seconds
Length 12 --> 13 :  43.4 seconds
Length 13 --> 14 :  19.8 seconds
Length 14 --> 15 :  8.1 seconds
Length 15 --> 16 :  3.14 seconds
Length 16 --> 17 :  1.12 seconds
Length 17 --> 18 :  0.344 seconds
Length 18 --> 19 :  0.102 seconds
Length 19 --> 20 :  0.029 seconds
Length 20 --> 21 :  0.00599 seconds
Length 21 --> 22 :  0.000994 seconds
Length 22 --> 23 :  0 seconds
Length 23 --> 24 :  0 seconds
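
The loop above compares every sorted string against every sorted string one letter longer, which is why the middle lengths take minutes each. A possible alternative, sketched here under the assumption that the inputs are sorted letter strings as in `wordlists`: delete each letter of the longer string in turn and look the result up in a set. The function name `build_graph` is hypothetical, and this variant has not been timed on the full list.

```python
def build_graph(shorter, longer):
    """Map each sorted string in `shorter` to the strings in `longer` that add one letter."""
    shorter_set = set(shorter)
    graph = {w: [] for w in shorter}
    for w2 in longer:
        # deleting one character from a sorted string leaves a sorted string,
        # so each candidate can be looked up directly; the set removes duplicates
        # that arise when w2 contains a repeated letter
        candidates = {w2[:k] + w2[k+1:] for k in range(len(w2))}
        for w1 in candidates & shorter_set:
            graph[w1].append(w2)
    return graph

# toy input: sorted letters of RIDE and BEAR, then three five-letter strings
toy = build_graph(['deir', 'aber'], ['ddeir', 'deirs', 'aabcd'])
print(toy)
```

Each longer string costs a number of set lookups equal to its length, rather than one comparison per shorter string.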


In [108]:
# initialize with all words of length 4
# only keep those that are reachable as we progress
pool_levels = [wordlists[0]]
for i in range(20):
    pool_levels.append( set().union(*[ graph[w] for w in pool_levels[i] ]) )
    print('Length', i+4, ': ', len(pool_levels[i]), 'words remaining')

Length 4 :  2674 words remaining
Length 5 :  5872 words remaining
Length 6 :  10047 words remaining
Length 7 :  14827 words remaining
Length 8 :  16868 words remaining
Length 9 :  13114 words remaining
Length 10 :  7808 words remaining
Length 11 :  3678 words remaining
Length 12 :  1430 words remaining
Length 13 :  441 words remaining
Length 14 :  126 words remaining
Length 15 :  26 words remaining
Length 16 :  2 words remaining
Length 17 :  0 words remaining
Length 18 :  0 words remaining
Length 19 :  0 words remaining
Length 20 :  0 words remaining
Length 21 :  0 words remaining
Length 22 :  0 words remaining
Length 23 :  0 words remaining
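
The level-by-level pruning can be illustrated on a toy graph. This is a sketch with made-up data (`toy_graph` is hypothetical), and unlike the cell above it stops as soon as a level comes up empty rather than running a fixed number of steps:

```python
# toy adjacency map: each sorted string points to the strings one letter longer
toy_graph = {
    'deir': ['ddeir'],    # RIDE -> DRIED
    'aber': [],           # BEAR has no successor in this toy example
    'ddeir': ['ddeilr'],  # DRIED -> RIDDLE
    'ddeilr': [],
}

# start from every four-letter string and keep only what is reachable
levels = [{'deir', 'aber'}]
while levels[-1]:
    reachable = set()
    for w in levels[-1]:
        reachable.update(toy_graph.get(w, []))
    levels.append(reachable)
levels.pop()  # drop the final empty level

print('longest chain has', len(levels), 'words')
print('possible endings:', sorted(levels[-1]))
```

The number of non-empty levels is the length of the longest chain, and the last non-empty level holds its possible endings.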


In [115]:
def unsort_word(wstr):
    '''
    given a sorted string, return a list of words that could produce that string
    '''
    return [ word for word in wordlist if ''.join(sorted(word)) == wstr ]

In [117]:
# these are the possible ending words
[ unsort_word(wstr) for wstr in pool_levels[12] ]

[['underestimations'], ['indeterminations']]

In [168]:
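# reconstruct the optimal chain, working backward from one of the two ending words
# (pool_levels[12] is a set, so which of the two comes first is arbitrary)
optimal_chain = [ list(pool_levels[12])[0] ]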

for i in range(12):
    optimal_chain.append(  [ word for word in pool_levels[12-i-1] if isconnected(word,optimal_chain[i]) ][0] )

optimal_chain = optimal_chain[::-1]
print(optimal_chain)

['einn', 'aeinn', 'aeinnt', 'aeinnst', 'aeinnost', 'aeiinnost', 'aeiimnnost', 'aeiimnnorst', 'aeiimnnorstt', 'adeiimnnorstt', 'adeeiimnnorstt', 'adeeiimnnorsttu', 'adeeiimnnorssttu']


In [169]:
def get_extra_letter(w1, w2):
    '''
    find extra letter present in second word (both arguments are sorted strings)
    '''
    L = len(w1)
    for i in range(L):
        if w1[i] != w2[i]:
            return w2[i]
    # no mismatch: the extra letter is the last character of w2
    return w2[-1]

In [170]:
# get sequence of "extra" letters added at each step
letter_sequence = [ get_extra_letter( optimal_chain[i], optimal_chain[i+1] ) for i in range(len(optimal_chain)-1) ]
print(letter_sequence)

['a', 't', 's', 'o', 'i', 'm', 'r', 't', 'd', 'e', 'u', 's']


In [171]:
# build the chain of letter collections by appending each extra letter in turn
final_chain = [optimal_chain[0]]
for lett in letter_sequence:
    final_chain.append(final_chain[-1] + lett)
    
print(final_chain)

['einn', 'einna', 'einnat', 'einnats', 'einnatso', 'einnatsoi', 'einnatsoim', 'einnatsoimr', 'einnatsoimrt', 'einnatsoimrtd', 'einnatsoimrtde', 'einnatsoimrtdeu', 'einnatsoimrtdeus']


In [172]:
# print formatted list for the solution:
for word in final_chain:
    print(word.upper(), ':', unsort_word(''.join(sorted(word))) )

EINN : ['nine']
EINNA : ['inane']
EINNAT : ['innate']
EINNATS : ['inanest', 'stanine']
EINNATSO : ['enations', 'sonatine']
EINNATSOI : ['antinoise']
EINNATSOIM : ['antimonies', 'antinomies']
EINNATSOIMR : ['inseminator', 'nitrosamine']
EINNATSOIMRT : ['terminations']
EINNATSOIMRTD : ['antimodernist']
EINNATSOIMRTDE : ['determinations']
EINNATSOIMRTDEU : ['underestimation']
EINNATSOIMRTDEUS : ['underestimations']
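
As an independent check on the result, the chain can be verified directly from the unscrambled words using letter counts. This is a sketch: the helper name `adds_one_letter` is an assumption, and where a step has several anagrams one of them is picked arbitrarily from the output above.

```python
from collections import Counter

# one word per step, taken from the output above
chain = ['nine', 'inane', 'innate', 'stanine', 'enations', 'antinoise',
         'antinomies', 'nitrosamine', 'terminations', 'antimodernist',
         'determinations', 'underestimation', 'underestimations']

def adds_one_letter(w1, w2):
    """True if w2 uses every letter of w1 plus exactly one more."""
    c1, c2 = Counter(w1), Counter(w2)
    missing = c1 - c2  # letters of w1 that w2 lacks
    extra = c2 - c1    # letters of w2 beyond those in w1
    return not missing and sum(extra.values()) == 1

valid = all(adds_one_letter(a, b) for a, b in zip(chain, chain[1:]))

# the single letter added at each step
added = [next(iter(Counter(b) - Counter(a))) for a, b in zip(chain, chain[1:])]

print(len(chain), 'words, valid chain:', valid)
print('letters added:', ''.join(added))
```

Because the check works on letter counts rather than sorted strings, it does not depend on `isconnected` or on the order in which letters were appended.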
