# Can you find the fish in state names?

This is my solution to the Riddler Classic from May 22nd, 2020:

https://fivethirtyeight.com/features/somethings-fishy-in-the-state-of-the-riddler/

From Mark Bradwin comes a fishy puzzle about state names:

Ohio is the only state whose name doesn’t share any letters with the word “mackerel.” It’s strange, but it’s true.

But that isn’t the only pairing of a state and a word you can say that about — it’s not even the only fish! Kentucky has “goldfish” to itself, Montana has “jellyfish” and Delaware has “monkfish,” just to name a few.

What is the longest “mackerel?” That is, what is the longest word that doesn’t share any letters with exactly one state? (If multiple “mackerels” are tied for being the longest, can you find them all?)

Extra credit: Which state has the most “mackerels?” That is, which state has the most words for which it is the only state without any letters in common with those words?

For both the Riddler and the extra credit, please refer to Friend of the Riddler™ Peter Norvig’s word list:

https://norvig.com/ngrams/word.list
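The definition can be checked directly with sets: a word is a "mackerel" for a state when the two have no letters in common, and that is true for exactly one state. Below is a minimal sketch, independent of the notebook cells that follow. The `STATES` list and the helper name `states_without_shared_letters` are my own illustration and not part of the puzzle.

```python
# The 50 state names, lower-cased (typed in by hand for this sketch).
STATES = [
    "alabama", "alaska", "arizona", "arkansas", "california", "colorado",
    "connecticut", "delaware", "florida", "georgia", "hawaii", "idaho",
    "illinois", "indiana", "iowa", "kansas", "kentucky", "louisiana",
    "maine", "maryland", "massachusetts", "michigan", "minnesota",
    "mississippi", "missouri", "montana", "nebraska", "nevada",
    "new hampshire", "new jersey", "new mexico", "new york",
    "north carolina", "north dakota", "ohio", "oklahoma", "oregon",
    "pennsylvania", "rhode island", "south carolina", "south dakota",
    "tennessee", "texas", "utah", "vermont", "virginia", "washington",
    "west virginia", "wisconsin", "wyoming",
]


def states_without_shared_letters(word, states=STATES):
    """Return the states that have no letter in common with `word`."""
    letters = set(word.lower())
    # spaces are stripped so that two-word states are compared on letters only
    return [s for s in states if letters.isdisjoint(s.replace(" ", ""))]


print(states_without_shared_letters("mackerel"))  # ['ohio']
print(states_without_shared_letters("goldfish"))  # ['kentucky']
```

A word counts as a "mackerel" only when the returned list has exactly one entry, which is the case for both examples quoted in the puzzle.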

In [1]:
# import pandas and set display options
import pandas as pd
pd.set_option('display.max_columns', None)

# declare filepaths
word_path = r"C:\Users\************************\ANALYSIS\538 riddles\- data\wordlist.csv"
state_path = r"C:\Users\***********************\- ANALYSIS\538 riddles\- data\US states.csv"

# read in the data to dataframes, and convert all letters to lower case
wordlist = pd.read_csv(word_path, header=None, names =['word'], keep_default_na=False)
wordlist['word'] = wordlist['word'].str.lower()
stateslist = pd.read_csv(state_path, header=None, names = ['state'])
stateslist['state'] = stateslist['state'].str.lower()

# display head of each dataframe
display(wordlist.head())
display(stateslist.head())

,word
0,aa
1,aah
2,aahed
3,aahing
4,aahs


,state
0,alabama
1,alaska
2,arizona
3,arkansas
4,california


In [2]:
# initialise a list to store dictionaries of the letters of each word
letters_in_words = []

# loop through the wordlist, create a dictionary of the letters of each word, and append to letters_in_words
for word in wordlist['word']:
    letter_dict = dict.fromkeys(word, True)  
    letters_in_words.append(letter_dict)

# create a dataframe to store the results    
df = pd.DataFrame(letters_in_words)
df.set_index(wordlist['word'], inplace=True)

# display the head and tail of the dataframe
display(df.head())
display(df.tail())

word,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z
aa,True,,,,,,,,,,,,,,,,,,,,,,,,,
aah,True,,,,,,,True,,,,,,,,,,,,,,,,,,
aahed,True,,,True,True,,,True,,,,,,,,,,,,,,,,,,
aahing,True,,,,,,True,True,True,,,,,True,,,,,,,,,,,,
aahs,True,,,,,,,True,,,,,,,,,,,True,,,,,,,


word,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z
zythum,,,,,,,,True,,,,,True,,,,,,,True,True,,,,True,True
zythums,,,,,,,,True,,,,,True,,,,,,True,True,True,,,,True,True
zyzzyva,True,,,,,,,,,,,,,,,,,,,,,True,,,True,True
zyzzyvas,True,,,,,,,,,,,,,,,,,,True,,,True,,,True,True
zzz,,,,,,,,,,,,,,,,,,,,,,,,,,True


In [3]:
# collect one dataframe of candidate words per state
per_state_results = []

# loop through the states, and create a sorted list of the unique letters in each state, ignoring spaces
for state in stateslist['state']:
    state_letters = sorted(set(state) - {" "})

    # filter columns for only the letters that the state contains,
    # then keep the rows (words) that are all NaN, i.e. they don't share any letters with that state
    no_shared_letters = df[state_letters].isnull().all(axis=1)
    state_in_question = pd.DataFrame({'word': list(df.index[no_shared_letters])})
    state_in_question['state'] = state
    state_in_question['word_length'] = state_in_question['word'].str.len()
    per_state_results.append(state_in_question)

# combine the per-state dataframes into initial_results
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
initial_results = pd.concat(per_state_results, ignore_index=True)[['state', 'word', 'word_length']]
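
The cell above tests each word against each state by looking for all-NaN rows in the state's letter columns. The same "no shared letters" test can also be sketched without pandas by packing the letters of each string into a 26-bit integer, so that two strings share a letter exactly when the bitwise AND of their masks is non-zero. This is an alternative technique, not what the notebook does. The names `letter_mask` and `disjoint_states` and the tiny sample lists are hypothetical.

```python
def letter_mask(text):
    """Pack the distinct letters a-z of `text` into a 26-bit integer."""
    mask = 0
    for ch in text.lower():
        if "a" <= ch <= "z":  # skips spaces and any other non-letters
            mask |= 1 << (ord(ch) - ord("a"))
    return mask


def disjoint_states(word, state_masks):
    """Return the states whose letter mask has no bit in common with the word's."""
    word_mask = letter_mask(word)
    return [s for s, m in state_masks.items() if (word_mask & m) == 0]


# a small sample only; the real run would use all states and the full word list
sample_states = ["ohio", "utah", "texas"]
state_masks = {state: letter_mask(state) for state in sample_states}

for word in ["mackerel", "goldfish", "zzz"]:
    print(word, disjoint_states(word, state_masks))
```

With these three sample states, "mackerel" matches only ohio, "goldfish" matches none of them, and "zzz" matches all three, so it would not count as a "mackerel".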

In [4]:
# sort values in initial_results by state and then word_length
initial_results.sort_values(by = ['state', 'word_length'], ascending = False, inplace = True)

# create a final_results dataframe with duplicate words removed
final_results = initial_results.drop_duplicates(subset = 'word', keep = False)

# display the twenty longest mackerels
display(final_results.sort_values(by = 'word_length', ascending = False).head(20))

,state,word,word_length
3520,alabama,counterproductivenesses,23
268792,mississippi,hydrochlorofluorocarbon,23
3519,alabama,counterproductiveness,21
46431,alabama,unconscientiousnesses,21
42119,alabama,supposititiousnesses,20
46555,alabama,underconsciousnesses,20
133240,hawaii,overscrupulousnesses,20
370344,ohio,transcendentalnesses,20
373218,ohio,untranslatablenesses,20
441845,utah,incompressiblenesses,20
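
The `drop_duplicates(subset = 'word', keep = False)` call is what enforces the "exactly one state" rule: any word listed under more than one state is removed entirely, rather than kept once. A rough pure-Python sketch of that behaviour follows. The `pairs` data is a made-up sample and `unique_to_one_state` is a hypothetical helper name.

```python
from collections import Counter


def unique_to_one_state(pairs):
    """Keep only the (state, word) pairs whose word appears under a single state."""
    occurrences = Counter(word for _, word in pairs)
    return [(state, word) for state, word in pairs if occurrences[word] == 1]


# made-up sample: "zzz" shares no letters with any of these states,
# so it shows up once per state and must be dropped completely
pairs = [
    ("ohio", "mackerel"),
    ("ohio", "zzz"),
    ("kentucky", "goldfish"),
    ("kentucky", "zzz"),
    ("utah", "zzz"),
]

print(unique_to_one_state(pairs))
# [('ohio', 'mackerel'), ('kentucky', 'goldfish')]
```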


In [5]:
# display the number of 'mackerels' for each state 
for state in stateslist['state']:
    no_of_mackerels = final_results[final_results['state'] == state].shape[0]
    display(state + " has " + str(no_of_mackerels) + " mackerels")

'alabama has 8274 mackerels'

'alaska has 1261 mackerels'

'arizona has 0 mackerels'

'arkansas has 0 mackerels'

'california has 0 mackerels'

'colorado has 481 mackerels'

'connecticut has 9 mackerels'

'delaware has 399 mackerels'

'florida has 0 mackerels'

'georgia has 0 mackerels'

'hawaii has 1763 mackerels'

'idaho has 0 mackerels'

'illinois has 79 mackerels'

'indiana has 482 mackerels'

'iowa has 201 mackerels'

'kansas has 884 mackerels'

'kentucky has 1580 mackerels'

'louisiana has 0 mackerels'

'maine has 14 mackerels'

'maryland has 67 mackerels'

'massachusetts has 0 mackerels'

'michigan has 4 mackerels'

'minnesota has 0 mackerels'

'mississippi has 4863 mackerels'

'missouri has 73 mackerels'

'montana has 648 mackerels'

'nebraska has 0 mackerels'

'nevada has 1229 mackerels'

'new hampshire has 0 mackerels'

'new jersey has 337 mackerels'

'new mexico has 30 mackerels'

'new york has 105 mackerels'

'north carolina has 0 mackerels'

'north dakota has 54 mackerels'

'ohio has 11342 mackerels'

'oklahoma has 369 mackerels'

'oregon has 682 mackerels'

'pennsylvania has 0 mackerels'

'rhode island has 0 mackerels'

'south carolina has 0 mackerels'

'south dakota has 0 mackerels'

'tennessee has 1339 mackerels'

'texas has 639 mackerels'

'utah has 6619 mackerels'

'vermont has 27 mackerels'

'virginia has 107 mackerels'

'washington has 0 mackerels'

'west virginia has 0 mackerels'

'wisconsin has 60 mackerels'

'wyoming has 1364 mackerels'
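
To read the extra-credit answer off the output above, the printed counts can be ranked. This is a small sketch with the counts transcribed by hand from that output. The names `mackerel_counts` and `ranked` are mine.

```python
# counts copied from the per-state output above
mackerel_counts = {
    "alabama": 8274, "alaska": 1261, "arizona": 0, "arkansas": 0,
    "california": 0, "colorado": 481, "connecticut": 9, "delaware": 399,
    "florida": 0, "georgia": 0, "hawaii": 1763, "idaho": 0,
    "illinois": 79, "indiana": 482, "iowa": 201, "kansas": 884,
    "kentucky": 1580, "louisiana": 0, "maine": 14, "maryland": 67,
    "massachusetts": 0, "michigan": 4, "minnesota": 0, "mississippi": 4863,
    "missouri": 73, "montana": 648, "nebraska": 0, "nevada": 1229,
    "new hampshire": 0, "new jersey": 337, "new mexico": 30, "new york": 105,
    "north carolina": 0, "north dakota": 54, "ohio": 11342, "oklahoma": 369,
    "oregon": 682, "pennsylvania": 0, "rhode island": 0, "south carolina": 0,
    "south dakota": 0, "tennessee": 1339, "texas": 639, "utah": 6619,
    "vermont": 27, "virginia": 107, "washington": 0, "west virginia": 0,
    "wisconsin": 60, "wyoming": 1364,
}

# rank the states from most to fewest mackerels
ranked = sorted(mackerel_counts.items(), key=lambda item: item[1], reverse=True)

for state, count in ranked[:5]:
    print(f"{state}: {count}")
# ohio: 11342
# alabama: 8274
# utah: 6619
# mississippi: 4863
# hawaii: 1763
```

On these numbers Ohio has the most "mackerels", followed by Alabama and Utah.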