# **Can you decipher the secret message?**

## [Riddler Classic](https://fivethirtyeight.com/features/can-you-decipher-the-secret-message/), June 4, 2021

### solution by [Laurent Lessard](https://laurentlessard.com)

Alexander has long had an interest in topology, which just might be related to his submitted puzzle. Consider the following image showing a particular uppercase sans serif font:

<img src="https://fivethirtyeight.com/wp-content/uploads/2021/06/Screen-Shot-2021-05-31-at-6.53.29-PM.png?w=1400" width=500>

Alexander thinks many of these letters are equivalent, but he leaves it to you to figure out how and why. He also has a message for you:

<img src="https://fivethirtyeight.com/wp-content/uploads/2021/06/Screen-Shot-2021-05-31-at-6.56.54-PM.png?w=1400" width=500>

It may not look like much, but Alexander assures me that it is equivalent to exactly one word in the English language.

What is Alexander’s message?

---

## Solution

We start by creating a partition of the alphabet, which determines which groups of letters are considered to be equivalent. For example, we could use:
```python
['ABCDEFG','HIJKLMNOP','QRSTUVWXYZ']
```
Each letter maps to a group number (`A:1`, `K:2`, `X:3`, etc.), so each English word maps to an integer, for example `"GRAPE":13121`. Two words that map to the same integer are equivalent. We take our target word `"YIRTHA"` and count how many matches it has in the entire word list. The goal is to find a grouping that leads to a single match for the target word. The grouping should be thematic (based on the "topology" clue in the puzzle), and the word itself is also likely thematic.
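As a quick check of the encoding just described, here is a minimal sketch that builds the letter-to-group map for the example partition and encodes a word. The helper name `encode` is mine and is not part of the solution code further down.

```python
# Example partition from the text: three contiguous blocks of the alphabet
grouping = ['ABCDEFG', 'HIJKLMNOP', 'QRSTUVWXYZ']

# Map each letter to its 1-based group number (A:1, K:2, X:3, ...)
lettermap = {letter: ix
             for ix, group in enumerate(grouping, start=1)
             for letter in group}

def encode(word):
    """Concatenate the group digits of a word into a single integer.

    Hypothetical helper for illustration; assumes fewer than 10 groups.
    """
    return int(''.join(str(lettermap[letter]) for letter in word))

print(encode('GRAPE'))  # 13121
print(encode('CRANE'))  # 13121, so CRANE and GRAPE are equivalent here
```

Under this (deliberately non-thematic) partition, `GRAPE` and `CRANE` land in the same equivalence class, which is exactly the kind of collision the search counts.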

The most thematic grouping comes from topology: count the number of "holes" that each letter contains. (Strictly speaking, this classifies letters up to homotopy equivalence, which is coarser than homeomorphism.) This leads to the grouping:
```python
['CEFGHIJKLMNSTUVWXYZ', 'ADOPQR', 'B']
```
Unfortunately, this leads to 301 matches, likely because the grouping is so lopsided: most letters have no holes. So we need to refine the grouping. I tried several others (see the code below). The solution turns out to be grouping letters by number of holes _and_ number of "dead ends". For example, `A` and `R` are still equivalent (both have one hole and two dead ends), but `A` and `Q` are no longer equivalent, because `Q` has only one dead end. This leads us to the grouping:
```python
['AR', 'B', 'CGIJLMNSUVWZ', 'DO', 'PQ', 'EFTY', 'HKX']
```
And this works! The only match is `EUREKA`, which makes sense. The code is below.
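To make the final grouping concrete, the sketch below rebuilds it from (holes, dead ends) pairs and confirms that `YIRTHA` and `EUREKA` share a signature. Only the counts for `A`, `R`, and `Q` are stated in the text; the remaining counts are my reading of the grouping for this font and should be treated as assumptions. The names `features` and `signature` are illustrative.

```python
# (holes, dead ends) -> letters. Counts other than those for A, R and Q
# are inferred from the final grouping, not stated in the puzzle.
features = {
    (1, 2): 'AR',
    (2, 0): 'B',
    (0, 2): 'CGIJLMNSUVWZ',
    (1, 0): 'DO',
    (1, 1): 'PQ',
    (0, 3): 'EFTY',
    (0, 4): 'HKX',
}

# Sanity check: the groups should partition the 26-letter alphabet
assert sorted(''.join(features.values())) == list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')

# Invert to letter -> (holes, dead ends)
letter_features = {letter: key
                   for key, group in features.items()
                   for letter in group}

def signature(word):
    """Tuple of (holes, dead ends) pairs, one per letter."""
    return tuple(letter_features[letter] for letter in word)

print(signature('YIRTHA') == signature('EUREKA'))  # True
```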

### Main code

In [101]:
# download a list of English words
import urllib.request
target_url = "https://norvig.com/ngrams/enable1.txt"
wordlist = urllib.request.urlopen(target_url).read().decode("utf-8").upper().splitlines()

# all letters in the alphabet
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [142]:
def findmatches(targetword, grouping):
    # filter the list to only keep words of the appropriate length
    wordlength = len(targetword)
    shortlist = [ word for word in wordlist if len(word) == wordlength ] 

    # create a map of each letter to its corresponding group number
    lettermap = { letter : ix for ix,group in enumerate(grouping,start=1) for letter in group }

    # function that maps words to groups (works for < 10 groups)
    def myhash(word):
        h = 0
        for letter in word:
            h = 10*h + lettermap[letter]
        return h

    # convert each word into its equivalence class identifier
    hashlist = [myhash(word) for word in shortlist]

    # get matching words in the list
    targethash = myhash(targetword)
    return [shortlist[ix] for ix,h in enumerate(hashlist) if h == targethash] 
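
The base-10 encoding in `myhash` only keeps groups distinct while there are fewer than ten of them. If a finer partition were ever needed, one option (a sketch, not what the solution here uses) is to key on a tuple of group indices instead. The function name `findmatches_tuple`, the explicit `wordlist` argument, and the toy word list are my own choices so that the example runs without downloading anything.

```python
def findmatches_tuple(targetword, grouping, wordlist):
    """Variant of findmatches that works for any number of groups."""
    lettermap = {letter: ix
                 for ix, group in enumerate(grouping)
                 for letter in group}

    def key(word):
        # a tuple of group indices cannot collide the way multi-digit
        # group numbers would inside a base-10 integer
        return tuple(lettermap[letter] for letter in word)

    targetkey = key(targetword)
    return [word for word in wordlist
            if len(word) == len(targetword) and key(word) == targetkey]

# tiny hand-picked word list, for illustration only
toy_words = ['EUREKA', 'BANANA', 'TOPOLOGY', 'GRAPES']
grouping = ['AR', 'B', 'CGIJLMNSUVWZ', 'DO', 'PQ', 'EFTY', 'HKX']
print(findmatches_tuple('YIRTHA', grouping, toy_words))  # ['EUREKA']
```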

In [147]:
# specify the target word
targetword = 'YIRTHA'

# some partitions that did not work
grouping = ['CEFGHIJKLMNSTUVWXYZ', 'ADOPQR', 'B']  # topological homeomorphism (number of holes)
grouping = ['BCDGIJLMNOPSTUVWZ', 'AEFQRXY', 'HK']  # number of pen strokes required to draw the letter
grouping = ['CIJOSU', 'DLPQTVX', 'ABFGHKNRYZ', 'EMW']  # number of distinct {curve,line} objects in each letter
grouping = ['BDO', 'PQ', 'ACGIJLMNRSUVWZ', 'EFTY', 'HKX']  # number of "dead ends" in each letter

# the winner!
grouping = ['AR', 'B', 'CGIJLMNSUVWZ', 'DO', 'PQ', 'EFTY', 'HKX']  # continuous deformation (preserving dead ends)

# matching list of words
matchlist = findmatches(targetword, grouping)

print(f'Target word is {targetword}')
print(f'There are {len(matchlist)} matches')
print('first 10 results:')
matchlist[:10]

Target word is YIRTHA
There are 1 matches
first 10 results:


['EUREKA']

### Supplementary functions (encoding/decoding)

In [316]:
import random

numgroups = len(grouping)
lettermap = { letter : ix for ix,group in enumerate(grouping) for letter in group }
inversemap = [ [] for _ in range(numgroups) ]
for letter in alphabet:
    inversemap[lettermap[letter]].append(letter)
    
def scramble(sentence):
    output = []
    for letter in sentence:
        if letter in ' ()':
            output.append(letter)
        else:
            output.append(random.choice(inversemap[lettermap[letter]]))
    return ''.join(output)

def unscramble(sentence):
    output = []
    words = sentence.split()
    for word in words:
        matches = findmatches(word, grouping)
        if len(matches) == 1:
            output.append(matches[0])
        else:
            output.append('?' * len(word))
    return ' '.join(output)
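
One property worth checking is that `scramble` only ever swaps a letter for another letter in the same group, so the scrambled text keeps the group signature of the original. The sketch below restates the pieces it needs so that it runs on its own; `signature` is a helper name I introduced, and indexing `grouping` directly stands in for `inversemap`.

```python
import random

grouping = ['AR', 'B', 'CGIJLMNSUVWZ', 'DO', 'PQ', 'EFTY', 'HKX']
lettermap = {letter: ix
             for ix, group in enumerate(grouping)
             for letter in group}

def scramble(sentence):
    # replace each letter with a random member of its own group;
    # spaces and parentheses pass through unchanged
    return ''.join(c if c in ' ()' else random.choice(grouping[lettermap[c]])
                   for c in sentence)

def signature(sentence):
    # group index per letter, with non-letters kept as-is
    return [lettermap.get(c, c) for c in sentence]

original = 'TOPOLOGICAL DEFORMATION'
scrambled = scramble(original)
print(signature(scrambled) == signature(original))  # True
```

Because the signature is preserved, decoding reduces to the same word-list lookup used for `YIRTHA`, and it succeeds only for words whose signature is unique in the list.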

In [324]:
random.seed(0)
encoded = scramble('DEFORMATION TOPOLOGICAL HOMEOMORPHISM WITHOUT COLLAPSING PROTRUDING PORTIONS')
print(encoded)

OYEORNRYMDU FOPDVOUZVAL HDWTOUDRQKVWJ USYXOCE GOZWAQMJZM PADFAUOGGM QDRTZDUM


In [325]:
decoded = unscramble(encoded)
print(decoded)

DEFORMATION TOPOLOGICAL HOMEOMORPHISM WITHOUT COLLAPSING PROTRUDING PORTIONS
