# The Spelling Bee - *Solution*

<img src="https://raw.githubusercontent.com/civilinifr/fc_python_class/master/medium_challenges/images/calvin_spelling_bee.jpg" alt="Drawing" style="width: 350px;">
* Image created using Microsoft Designer

## Problem Description

Calvin is back home after a lengthy space voyage and is relaxing in his library. After finishing the New York Times, he turns to the games section and plays the "Spelling Bee" game:

https://www.nytimes.com/puzzles/spelling-bee

The objective of the game is to form as many words as possible using only the 7 displayed letters. Letters may be reused. One of the letters is fixed, meaning every word must contain it at least once. Only words of four or more letters count.

The problem is that Calvin is really bad at this game and quickly gives up. Can you write a program that solves the Spelling Bee puzzle?

<img src="https://raw.githubusercontent.com/civilinifr/fc_python_class/master/medium_challenges/images/bee_1.png" alt="Drawing" style="width: 400px;">



## Solution

We will use the Natural Language Toolkit (NLTK) package for this task. The idea is to load an English word list and then discard every word that is too short, contains a letter outside the seven displayed ones, or lacks the fixed letter.
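As a minimal sketch of that filtering idea, independent of NLTK, the puzzle rules reduce to three checks per word. The helper name `is_valid_word` and the small sample list are illustrative assumptions and are not part of the solution that follows.

```python
def is_valid_word(word, fixed_letter, possible_letters, min_length=4):
    """Return True if `word` satisfies the Spelling Bee rules (hypothetical helper)."""
    allowed = set(possible_letters) | {fixed_letter}
    word = word.lower()
    return (
        len(word) >= min_length      # at least four letters
        and fixed_letter in word     # must contain the fixed letter
        and set(word) <= allowed     # uses only the seven displayed letters
    )


# Hand-picked sample words, assumed for illustration only
sample = ["ultra", "rant", "tun", "unruly", "lunch"]
valid = [w for w in sample if is_valid_word(w, "u", ["r", "n", "y", "l", "a", "t"])]
print(valid)
```

Here "rant" fails because it lacks the fixed letter, "tun" is too short, and "lunch" uses letters outside the puzzle, so only "ultra" and "unruly" remain.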

In [3]:
# Import the packages
from nltk.corpus import words
import pandas as pd
import numpy as np

In [4]:
# Download the word list (this only needs to be run once)
import nltk
nltk.download('words')

[nltk_data] Downloading package words to
[nltk_data]     C:\Users\fcivilin\AppData\Roaming\nltk_data...
[nltk_data]   Package words is already up-to-date!


True

In [6]:
# Example function using a fixed letter "u" and letters "r", "n", "y", "l", "a", "t". 
def main():
    """
    Build and print the Spelling Bee word list for a hard-coded set of letters.
    :return: None
    """
    # Fixed letter
    fixed_letter = 'u'

    # The 6 possible letters
    possible_letters = ['r', 'n', 'y', 'l', 'a', 't']

    # Load the NLTK word list and lowercase every entry
    wordlist = words.words()
    wordlist = [x.lower() for x in wordlist]

    # Store the words in a single-column DataFrame (column 0)
    df = pd.DataFrame(wordlist)

    # Get the length of words
    mylen = np.vectorize(len)
    words_len = mylen(df[0].values)

    # Remove from the dataframe words with length less than 4
    short_ind = np.where(words_len < 4)[0]
    df.drop(df.index[short_ind], inplace=True)
    df = df.reset_index(drop=True)

    # Find the letters that will not be used in the matching
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p',
                'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    alphabet.remove(fixed_letter)
    for val in possible_letters:
        alphabet.remove(val)
    unused_letters = alphabet

    # Remove words from the dictionary which include the unused letters
    unused_words_ind = []
    for wordind in np.arange(len(df[0].values)):
        word = df[0].values[wordind]
        if any(character in unused_letters for character in word):
            unused_words_ind.append(wordind)

    df.drop(df.index[unused_words_ind], inplace=True)
    df = df.reset_index(drop=True)

    # Lastly, remove all the words that do not contain the fixed letter
    dropind = []
    for wordind in np.arange(len(df[0].values)):
        word = df[0].values[wordind]
        if fixed_letter not in word:
            dropind.append(wordind)

    df.drop(df.index[dropind], inplace=True)
    df = df.reset_index(drop=True)

    df = df.sort_values([0])
    df = df.reset_index(drop=True)

    # Print out the wordlist
    print(df)

    # We can output the list to a path with a simple pandas function
    # outfolder = '/output/folder/path/'
    # df.to_csv(f'{outfolder}wordlist.txt', index=False, header=None)

    return


# Main
main()

          0
0      aaru
1    alraun
2    alruna
3     altun
4     alula
..      ...
175   yulan
176    yurt
177   yurta
178  yuruna
179    yutu

[180 rows x 1 columns]
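
For other puzzles it may be convenient to pass the letters as arguments instead of hard-coding them inside `main()`. The following is a sketch of one way to do that in plain Python, without pandas or NumPy; the function name `solve_spelling_bee`, its `min_length` parameter, and the demo word list are assumptions made for illustration. Unlike `main()`, it also removes duplicate entries.

```python
def solve_spelling_bee(wordlist, fixed_letter, possible_letters, min_length=4):
    """Return the sorted, de-duplicated words that satisfy the puzzle rules."""
    allowed = set(possible_letters) | {fixed_letter}
    matches = {
        w.lower()
        for w in wordlist
        if len(w) >= min_length              # four letters or more
        and fixed_letter in w.lower()        # contains the fixed letter
        and set(w.lower()) <= allowed        # no letters outside the puzzle
    }
    return sorted(matches)


# Small demo list (assumed); the real input would be the NLTK word list
demo_words = ["Yurt", "yurt", "aaru", "tan", "lunch", "ultra"]
result = solve_spelling_bee(demo_words, "u", "rnylat")
print(result)
```

With the NLTK corpus loaded, the call would be `solve_spelling_bee(words.words(), 'u', ['r', 'n', 'y', 'l', 'a', 't'])`. Because duplicates are dropped, the number of results may differ from the 180 rows printed above.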
