### Riddler Classic: 

Ahh, all about wordle! <add notes> 
    
    
### General Approach: 
    
- Build process that can play the game
- Determine optimal start word based on win-probability across all "mystery" words (thankfully sorted in alphabetical order so that there aren't too many spoilers)
    
    
### Status:
    
Able to win a few games in 3 or less guesses, but seems rare. There are a few bugs to clean up such as:
    - my dictionaries with indices are super ugly and I can't seem to fix it....annoying
    - i don't have an optimal strategy, just choosing based on most diverse words. curious if I need to consider vowels here

In [1]:
import pandas as pd
import random
from collections import defaultdict

In [2]:
# Functions for checking guess against mystery word
def excludeChars(guess, mystery_word):
    """Determine which chars should not be guessed again
       Return set of these chars
    """
    exclude_chars = set()
    for letter in guess:
        if letter not in mystery_word:
            exclude_chars.update(letter)
    return exclude_chars


def includeChars(guess, mystery_word):
    """Determine which chars should be guessed again
       Return set of these chars
    """
    include_chars = set()
    for letter in guess:
        if letter in mystery_word:
            include_chars.update(letter)
    return include_chars

def properPos(guess, mystery_word):
    """Determine if any chars properly guessed are in the proper index for the mystery word
        Return dict with index
    """
    pos_dict = defaultdict(set)
    for i, char in enumerate(guess):
        if char == mystery_word[i]:
            pos_dict[char].add(i)
    return pos_dict
    
def improperPos(guess, mystery_word, pos_dict):
    """Determine if any chars properly guessed are eligible but in the improper index for the mystery word
        Return dict with index
        
        Note: Minor bug - if a mystery word has 2 chars that are the same there could be issues in return
    """
    impos_dict = defaultdict(set)
    for i, char in enumerate(guess):
        if (char in mystery_word) and (char != mystery_word[i]) and (i not in pos_dict[char]):
            impos_dict[char].add(i)     
    return impos_dict

# functions for search

def charRemoveWords(ex_set, in_set, guess_words):
    """Remove words from eligible guesses based on ex_set (excluded chars) and
       in_set (included chars). 
       
       Return new set
    """
    new_set = set()
    for guess in guess_words:
        # check if illicit chars are in guess, then remove
        ex_len = len([char for char in guess if char in ex_set])
            
        # check if we have any eligible chars, if not remove
        in_len = len([char for char in guess if char in in_set])
        
        # ensure 0 ex_len and > 0 in_len
        if (ex_len > 0):
            continue
        elif (in_len < len(in_set)):
            continue
        else:
            new_set.add(guess)  
    return new_set

def idxRemoveWords(proper_pos, improper_pos, guess_words):
    """Limit words from eligible guesses based on index information
    
        I think this will be the toughest....
        
        Nice thing with default is we can check for keys that don't exist 
       
       Return new set
    """
    new_set = set()
    
    # first we need to know all the correct indices found
    len_idx = 0
    for k,v in proper_pos.items():
        len_idx += len(v)
    
    for guess in guess_words: 
        
        # track if a word has a bad_idx
        bad_idx = len([i for i,char in enumerate(guess) if i in improper_pos[char]])
        
        # track if a word has proper_idx 
        proper_idx = len([i for i,char in enumerate(guess) if i in proper_pos[char]])
        
        if (bad_idx > 0) or (proper_idx != len_idx):
            continue
        else:
            new_set.add(guess)
    return new_set

# Next function: Need to determine an optimal next guess
# All words are equal in terms of matching indices 
# - look for most diversity -> how many unique chars outside of what is in inclusion list
def nextGuess(include_chars, guess_words):
    """Maximize diversity of characters"""
    max_dict = defaultdict(set)
    
    # iterate and find new chars from each guess
    for guess in guess_words:
        new_chars = len(set([c for c in guess]) - include_chars)
        max_dict[new_chars].add(guess)
        
    # find max
    max_idx = max(max_dict.keys())
    
    # rndomly choose from
    return random.choice(tuple(max_dict[max_idx]))

In [3]:
# sample process:
# read in myster words
mystery_corpus = pd.read_csv("data/mystery_words.csv", header = None)
mystery_list = [w[0] for w in mystery_corpus.values]
myster_words = set(mystery_list)

# read in eligible guess words
guess_corpus = pd.read_csv("data/guess_words.csv", header = None)
guess_list = [w[0] for w in guess_corpus.values]
guess_words = set(guess_list)

# we make a guess and start to collection information
guess = random.choice(tuple(guess_words))
mystery_word = random.choice(tuple(myster_words))

# at initial guess we need the following:
exclude_chars = set() # words can't include these 
include_chars = set() # words need to include these
proper_pos = defaultdict(set) # dictionary that stores the proper indices for a letter...note it needs to be val of list for multiple letters
improper_pos = defaultdict(set) # dictionary that stores the improper indices for a letter...note it needs to be val of list for multiple letters

### We can now figure out which chars are accepted and which are not
i = 0
while i < 3:
    print(f"My guess is {guess}")
    print(f"My mystery word is {mystery_word}")
    
    # check if we guesses correct:
    if guess == mystery_word:
        print("You win!")
        break
    
    
    # not accepted:
    exclude_chars.update(excludeChars(guess, mystery_word))
    print(f"After my guess, my exclusion list is: {exclude_chars}")

    # accepted:
    include_chars.update(includeChars(guess, mystery_word))
    print(f"After my guess, my inclusion list is: {include_chars}")

    #### We can now think through positions: 

    # proper pos
    temp_dict = properPos(guess,mystery_word)
    for k, v in temp_dict.items():
        proper_pos[k] = proper_pos[k].union(v)
    
    # ugly cleanup step - TODO: Fix
    new_dict = defaultdict(set)
    for k, v in proper_pos.items():
        if len(proper_pos[k]) != 0:
            new_dict[k] = v
    proper_pos = new_dict
    
    print(f"After my guess, my proper indices are: {proper_pos}")

    # improper pos
    temp_dict = improperPos(guess,mystery_word, proper_pos)
    for k, v in temp_dict.items():
        improper_pos[k] = improper_pos[k].union(v)

    # ugly cleanup step - TODO: Fix
    new_dict = defaultdict(set)
    for k, v in improper_pos.items():
        if len(improper_pos[k]) != 0:
            new_dict[k] = v
    improper_pos = new_dict

    print(f"After my guess, my improper indices are: {improper_pos}")

    # Move into reduce step 
    # we first reduce our set of words down based on excluded & included chars 
    print(f"Prior to char removal, total guess words: {len(guess_words)}")
    new_words = charRemoveWords(exclude_chars, include_chars, guess_words)
    print(f"After char removal, total guess words: {len(new_words)}")

    # we then reduce our set of words down based on proper & improper infices
    print(f"Prior to idx removal, total guess words: {len(new_words)}")
    guess_words = idxRemoveWords(proper_pos, improper_pos, new_words)
    print(f"After idx removal, total guess words: {len(guess_words)}")


    ### Make another guess 
    guess = nextGuess(include_chars, guess_words)
    
    print(f"next guess is {guess} \n")
    i += 1

My guess is riots
My mystery word is upset
After my guess, my exclusion list is: {'o', 'r', 'i'}
After my guess, my inclusion list is: {'s', 't'}
After my guess, my proper indices are: defaultdict(<class 'set'>, {})
After my guess, my improper indices are: defaultdict(<class 'set'>, {'t': {3}, 's': {4}})
Prior to char removal, total guess words: 12972
After char removal, total guess words: 875
Prior to idx removal, total guess words: 875
After idx removal, total guess words: 219
next guess is stuck 

My guess is stuck
My mystery word is upset
After my guess, my exclusion list is: {'c', 'o', 'r', 'i', 'k'}
After my guess, my inclusion list is: {'t', 'u', 's'}
After my guess, my proper indices are: defaultdict(<class 'set'>, {})
After my guess, my improper indices are: defaultdict(<class 'set'>, {'t': {1, 3}, 's': {0, 4}, 'u': {2}})
Prior to char removal, total guess words: 219
After char removal, total guess words: 61
Prior to idx removal, total guess words: 61
After idx removal, total 