# Foreword

I want to start by thanking you all for being here. I also want to acknowledge the privilege that has allowed me to take time off work and attempt my current career change.

### About Me

Hi, I am Daniel Cohen (he/him), reformed geologist and aspiring data scientist! I started my journey into data science in October 2019 when I enrolled in a data science accelerator program. Since then I have completed countless online tutorials and plenty of collaborations with fellow coders. My objective is to pivot my career from environmental consulting to data science. Without a degree in computer science or a related discipline, how do you convince potential employers of your abilities? A portfolio!

### My Ask

Hopefully I am reaching the end of the beginning of my journey, but I still have a long way to go. My ask is for a code review: please volunteer suggestions for improvements, or for how to make this project more professional.


# New York Times Spelling Bee

Impress your friends and co-workers with your ability to find the answers to the New York Times Spelling Bee word game!

For the uninitiated, the NYTSB is a daily word puzzle in which you get 7 letters: 6 outer letters and 1 center letter. You score points by making words from the available letters! Each word must include the center letter and be at least 4 characters long. In this notebook and its supporting files you will see 3 solutions.


![](http://www.jeanyoungkim.com/projects/assets/bee/spelling-bee.png)


In [2]:
import time
%run spelling_bee.py  # runs the supporting script, which prints the timings below

BASE
0.7044458389282227 54

Method 1
277.05665826797485 54

Method 2
0.8110339641571045 54

Method 3
0.5704441070556641

Pro
0.290848970413208 24


# List of Words

The code has no way of knowing which combinations of letters constitute a word, so I used the `words` corpus from the Natural Language Toolkit (NLTK).


In [23]:
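import nltk
from nltk.corpus import words  # needed for words.words() below

nltk.download('words')  # downloads the corpus once; later calls only confirm it is up to date

center = p.centerLetter   # `p` is assumed to come from `run spelling_bee` above
letters = p.validLetters

vocab = words.words()
print("Number of words in vocab to evaluate: {}. Today's valid letters are {} and the center letter is {}".format(len(vocab), letters, center))
vocab[:10]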

Number of words in vocab to evaluate: 236736. Today's valid letters are ['n', 'a', 'c', 'l', 't', 'v', 'y'] and the center letter is n


[nltk_data] Downloading package words to /Users/drahcir1/nltk_data...
[nltk_data]   Package words is already up-to-date!


['A',
 'a',
 'aa',
 'aal',
 'aalii',
 'aam',
 'Aani',
 'aardvark',
 'aardwolf',
 'Aaron']

There are four criteria for a word to be considered a possible solution.

1. It must be 4 or more characters
2. No proper nouns (no capitals)
3. Contains only letters from the puzzle (letters can be repeated)
4. Contains the center letter at least once
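All four criteria can be checked in one function. This is a minimal sketch, assuming the puzzle letters are held in a set of lowercase characters; `is_candidate` is a hypothetical helper name, not part of the original code.

```python
def is_candidate(word: str, letters: set[str], center: str) -> bool:
    """Return True if `word` satisfies all four solution criteria."""
    return (
        len(word) >= 4              # 1. four or more characters
        and word.islower()          # 2. no proper nouns (no capitals)
        and set(word) <= letters    # 3. only puzzle letters; repeats allowed
        and center in word          # 4. center letter at least once
    )


letters = set("nacltvy")  # today's letters from the output above
print(is_candidate("vacantly", letters, "n"))  # True
print(is_candidate("tall", letters, "n"))      # False: no center letter
```

Because `and` short-circuits, the cheap length check runs first and the set is never built for words that are too short.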

# Stupid Method 1

### The Simple Son: loads the entire vocabulary and loops over it 4 times, once per solution criterion.

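This method is stupid. It makes four full passes over the vocabulary, so for n words it needs at least 4n checks, and every `list.remove()` inside those passes is itself O(n), which pushes the worst case toward O(n²). But I did learn a few things.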

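1. I needed to iterate through a list of more than 200k words. I wanted to edit the list in place to save memory and keep my code clean. When the Python interpreter iterates through a list, it starts at the beginning and moves towards the end. The issue is that removing a word during this process shifts every later word one index to the left, so the loop skips the word that follows each removal. To fix this, I iterated over `temp[::-1]`, which is a reversed copy of the list; because the loop walks over the copy, removing items from the original no longer disturbs the iteration.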
     

In [20]:
temp = list('abcdefghijklmnopqrstuvwxyz')
vowels = list('aeiou')

for c in temp:
    if c not in vowels:
        temp.remove(c)
print(temp)

# iterate over a reversed copy, so removals from `temp` do not skip elements
temp = list('abcdefghijklmnopqrstuvwxyz')
for c in temp[::-1]:
    if c not in vowels:
        temp.remove(c)
print(temp)

['a', 'c', 'e', 'g', 'i', 'k', 'm', 'o', 'q', 's', 'u', 'w', 'y']
['a', 'e', 'i', 'o', 'u']
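The fix works because `temp[::-1]` is a copy, not because it runs backwards. A small sketch comparing a forward copy with a single-pass comprehension; `alias` is a hypothetical name that exists only to show that slice assignment changes the original list object in place.

```python
vowels = set("aeiou")

# Option A: loop over a copy (here a forward copy) and mutate the original.
# The loop never sees the removals, so no element is skipped.
temp = list("abcdefghijklmnopqrstuvwxyz")
for c in temp[:]:
    if c not in vowels:
        temp.remove(c)
print(temp)  # ['a', 'e', 'i', 'o', 'u']

# Option B: build the filtered list in one pass and assign it back.
# Slice assignment keeps the same list object, so other references see the change.
letters_list = list("abcdefghijklmnopqrstuvwxyz")
alias = letters_list
letters_list[:] = [c for c in letters_list if c in vowels]
print(alias)  # ['a', 'e', 'i', 'o', 'u']
```

Option B also avoids the O(n) cost of each `remove()` call, which is what makes the loop in the next cell so slow on a 200k-word list.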


2. Work smarter, not harder: embrace the built-in data types. I think my overconfident male ego wanted to code everything myself. It turns out that Python's built-in types are much better than anything I could write, for instance the `set` data type.

In [28]:
start = time.time()
for word in vocab[::-1]:
    if not set(list(word)).issubset(letters):
        vocab.remove(word)
end = time.time()
print(end-start)

301.7398669719696


In [33]:
vocab = words.words()
start = time.time()
vocab[:] = [word for word in vocab[::-1] if set(list(word)).issubset(letters)]
end = time.time()
print(end-start, len(vocab))

0.19579100608825684 165


In [34]:
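# hand-rolled membership check instead of building a set for every word
vocab = words.words()
start = time.time()

def evaluate(word):
    """Return True if every letter of `word` is one of the puzzle letters."""
    for letter in word:
        if letter not in letters:
            return False  # stop at the first letter outside the puzzle
    return True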
    
vocab[:] = [word for word in vocab[::-1] if evaluate(word)]
end = time.time()
print(end-start, len(vocab))

0.06855511665344238 165
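The hand-rolled `evaluate` is likely faster than the `set` version because it stops at the first invalid letter and never builds a set for each word. The built-in `all()` short-circuits the same way and says it in one line. A minimal sketch, using a small hypothetical sample list in place of the NLTK vocabulary:

```python
letters = set("nacltvy")  # puzzle letters; a set gives O(1) membership tests


def evaluate(word: str) -> bool:
    """Return True if every letter of `word` is a puzzle letter."""
    # all() stops at the first False, just like the early `return False`
    return all(letter in letters for letter in word)


sample = ["tall", "cat", "Aaron", "vacantly", "zebra"]  # hypothetical sample
valid = [word for word in sample if evaluate(word)]
print(valid)  # ['tall', 'cat', 'vacantly']
```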


# Lazy Method 2

### Lazy: only loads words that meet the criteria

Three insights in this method. 
1. Instead of iterating through the list 4 times, once per solution criterion, iterate through it once and check all four criteria at the same time.
2. Instead of loading the entire vocab into memory, read the corpus one word at a time and only keep a word in memory if it passes the solution criteria check.
3. Slice into the list instead of iterating through it.

        Slice operations require more thought. To access the slice [a:b] of a list, we
        must iterate over every element between indices a and b. So, slice access is
        O(k), where k is the size of the slice. Deleting a slice is O(n) for
        the same reason that deleting a single element is O(n): n subsequent
        elements must be shifted toward the list's beginning.[1]
        
        [1] https://bradfieldcs.com/algos/analysis/performance-of-python-types/
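Insights 1 and 2 can be combined by streaming the word list and checking all four criteria in a single pass, so only passing words are ever stored. A minimal sketch, assuming a plain-text word list with one word per line; `stream_candidates` and the file name are hypothetical, and `io.StringIO` stands in for the file so the example runs on its own.

```python
import io
from typing import Iterable, Iterator


def stream_candidates(lines: Iterable[str], letters: set[str], center: str) -> Iterator[str]:
    """Yield words that meet all four criteria, reading one line at a time."""
    for line in lines:
        word = line.strip()
        if (
            len(word) >= 4
            and word.islower()
            and set(word) <= letters
            and center in word
        ):
            yield word


# Stand-in for open("words.txt"): a hypothetical one-word-per-line file.
fake_file = io.StringIO("Aaron\nant\ncanal\ntall\nvacantly\nzebra\n")
answers = list(stream_candidates(fake_file, set("nacltvy"), "n"))
print(answers)  # ['canal', 'vacantly']
```

With a real file, `with open(path) as f: list(stream_candidates(f, letters, center))` reads the file lazily, line by line, so the full vocabulary is never held as a list.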


In [43]:
start = time.time()
vocab = words.words()
for word in vocab[:-1000:-1]:  # iterates over the last 999 words ONLY (the stop index -1000 is excluded)
    if not set(list(word)).issubset(letters):
        vocab.remove(word)
end = time.time()
print(end-start)

1.6309876441955566


In [46]:
vocab = words.words()
start = time.time()
vocab[:] = [word for word in vocab[::-1] if set(list(word)).issubset(letters)]
end = time.time()
print(end-start, len(vocab))
vocab[:10]

0.19491004943847656 165


['tall', 'cat', 'at', 'any', 'ant', 'all', 'act', 'a', 'yn', 'yaya']

In [45]:
vocab = words.words()
start = time.time()

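def evaluate(word):
    """Return True if every letter of `word` is one of the puzzle letters."""
    for letter in word:
        if letter not in letters:
            return False  # stop at the first letter outside the puzzle
    return True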
    
vocab[:] = [word for word in vocab[::-1] if evaluate(word)]
end = time.time()
print(end-start, len(vocab))
vocab[:10]

0.0689091682434082 165


# Method 3 (deprecated)

### A trie data structure

![](https://media.geeksforgeeks.org/wp-content/cdn-uploads/Trie.png)

A trie is an efficient information reTrieval data structure. Using a trie, search complexity can be brought down to the optimal limit, the key length. If we store keys in a binary search tree, a well-balanced BST needs time proportional to M * log N, where M is the maximum string length and N is the number of keys in the tree. Using a trie, we can search for a key in O(M) time. The penalty is the trie's storage requirements. [2]

[2] https://www.geeksforgeeks.org/trie-insert-and-search/
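The structure described above can be sketched with nested dictionaries. This is a minimal sketch, not the deprecated implementation from the supporting files: `TrieNode`, `Trie`, and `words_from` are hypothetical names, and the search simply walks only the branches labelled with puzzle letters.

```python
class TrieNode:
    """One node per prefix; children are keyed by the next character."""

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_word = False


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """Add `word`; costs O(M) for a word of length M."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word: str) -> bool:
        """Exact-match lookup; O(M), independent of how many words are stored."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def words_from(self, letters: set[str], center: str, min_len: int = 4) -> list[str]:
        """Collect stored words built only from `letters` that contain `center`.

        The depth-first search only follows edges labelled with puzzle letters,
        so whole branches of the vocabulary are skipped without being visited.
        """
        found: list[str] = []

        def dfs(node: TrieNode, prefix: str) -> None:
            if node.is_word and len(prefix) >= min_len and center in prefix:
                found.append(prefix)
            for ch, child in node.children.items():
                if ch in letters:
                    dfs(child, prefix + ch)

        dfs(self.root, "")
        return sorted(found)


trie = Trie()
for w in ["cant", "canal", "cat", "nanny", "tall", "zebra", "Aaron"]:
    trie.insert(w)

print(trie.contains("canal"))  # True
print(trie.contains("can"))    # False: a prefix, not a stored word
print(trie.words_from(set("nacltvy"), "n"))  # ['canal', 'cant', 'nanny']
```

The storage penalty from the quote shows up here: every distinct prefix gets its own node and dictionary, so a trie of the full vocabulary typically uses much more memory than a flat list of strings.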

# Pro method

I was writing a script that would scrape the daily puzzle and automate the solution generation process. While parsing the page source, I noticed that the solutions are passed to the page in a JavaScript dictionary (object).
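Pulling that dictionary out of the page source can be sketched with a regular expression and `json.loads`. This assumes the page embeds the data as `window.gameData = {...}` inside a `<script>` tag and that the object is valid JSON; the variable name and the sample HTML below are hypothetical, so check them against the real page source before relying on this.

```python
import json
import re

# Hypothetical page snippet; the real page's markup and variable name may differ.
html = """
<script type="text/javascript">window.gameData = {"today": {"centerLetter": "n",
"validLetters": ["n", "a", "c", "l", "t", "v", "y"], "answers": ["vacantly", "anal"]}}</script>
"""

# Capture everything between the assignment and the closing </script> tag.
match = re.search(r"window\.gameData\s*=\s*(\{.*?\})\s*</script>", html, re.DOTALL)
if match is None:
    raise ValueError("gameData assignment not found in page")

game_data = json.loads(match.group(1))
today = game_data["today"]
print(today["centerLetter"], len(today["answers"]))  # n 2
```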


In [35]:
p.gameDataDict

{'today': {'expiration': 1592290800,
  'displayWeekday': 'Monday',
  'displayDate': 'June 15, 2020',
  'printDate': '2020-06-15',
  'centerLetter': 'n',
  'outerLetters': ['a', 'c', 'l', 't', 'v', 'y'],
  'validLetters': ['n', 'a', 'c', 'l', 't', 'v', 'y'],
  'pangrams': ['vacantly'],
  'answers': ['vacantly',
   'anal',
   'anally',
   'annal',
   'canal',
   'cancan',
   'canna',
   'canny',
   'cant',
   'cantata',
   'clan',
   'cyan',
   'lantana',
   'naan',
   'nana',
   'nanny',
   'natal',
   'natant',
   'natty',
   'naval',
   'navally',
   'navy',
   'vacancy',
   'vacant'],
  'id': 5593,
  'freeExpiration': 0,
  'editor': 'Sam Ezersky'},
 'yesterday': {'displayWeekday': 'Sunday',
  'displayDate': 'June 14, 2020',
  'printDate': '2020-06-14',
  'centerLetter': 'l',
  'outerLetters': ['g', 'h', 'm', 'o', 't', 'y'],
  'validLetters': ['l', 'g', 'h', 'm', 'o', 't', 'y'],
  'pangrams': ['mythology'],
  'answers': ['mythology',
   'glom',
   'gloom',
   'gloomy',
   'goggly',
  