# Wordle Solver
## Curtis Peterson

This notebook contains a solver for the [Wordle](https://www.powerlanguage.co.uk/wordle/) puzzle. The solver one-hot encodes each word into $130$ (i.e. $5 \times 26$) binary variables, one for each letter of the alphabet in each of the 5 letter positions, so each word in the list of valid words is represented by a $130$-dimensional vector. Summing the vectors of all valid words gives a score for each of the $130$ letter-position variables: the number of words that have that letter in that position. A word's score is the sum of the scores of its five letter-position pairs. After each guess, the solver eliminates words that are inconsistent with the feedback from the game and rescores every remaining word using only the remaining words.
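
The scoring idea can be illustrated without the encoder. The sketch below is a pure-Python equivalent, not the notebook's implementation: counting (position, letter) pairs gives the same numbers as the column sums of the one-hot matrix. The four-word list and the names `pair_counts` and `word_score` are made up for illustration.

```python
from collections import Counter

# Hypothetical toy word list (the notebook uses cleaned_word_list.csv instead).
words = ["crane", "crony", "proxy", "irony"]

# Count how often each (position, letter) pair occurs across all words.
# These counts equal the column sums of the one-hot encoded matrix.
pair_counts = Counter(
    (pos, lett) for word in words for pos, lett in enumerate(word)
)

def word_score(word):
    """Sum the counts of the word's five (position, letter) pairs."""
    return sum(pair_counts[(pos, lett)] for pos, lett in enumerate(word))

# Highest-scoring words share the most letter positions with the rest of the list.
ranked = sorted(words, key=word_score, reverse=True)
for word in ranked:
    print(word, word_score(word))
```

Words that share common letters in common positions with many other candidates rise to the top, which is why the notebook's first suggestions are dominated by patterns such as `s?ree`.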

This is a work in progress and may be difficult to use.

In [2]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
import numpy as np

In [3]:
word_df = pd.read_csv('cleaned_word_list.csv')
word_df.head()

Unnamed: 0,0,1,2,3,4
0,a,a,h,e,d
1,a,a,l,i,i
2,a,a,r,g,h
3,a,a,r,t,i
4,a,b,a,c,a


In [4]:
enc = OneHotEncoder(sparse=False)  # scikit-learn >= 1.2 names this parameter sparse_output
encoded_word_list = enc.fit_transform(word_df)
encoded_word_list.shape

(9330, 130)

In [5]:
letters = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
positions = [1,2,3,4,5]

encoding_index = []
n1 = 0
for pos in positions:
    for lett in letters:
        encoding_index.append((lett,pos,n1))
        n1 += 1
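
The list built above pairs each letter and position with a column number. As a minimal sketch, the same mapping can be computed directly, assuming the encoded columns are ordered position by position with letters in alphabetical order within each position (which the 130-column shape above is consistent with). The helper names `column_of` and `pair_at` are hypothetical.

```python
import string

letters = string.ascii_lowercase  # 'a' .. 'z'

def column_of(lett, pos):
    """Column index for letter `lett` at 1-based position `pos`."""
    return (pos - 1) * len(letters) + letters.index(lett)

def pair_at(col):
    """Inverse lookup: (letter, 1-based position) for a column index."""
    pos, offset = divmod(col, len(letters))
    return letters[offset], pos + 1

print(column_of('a', 1), column_of('z', 5))
print(pair_at(26))
```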

In [6]:
letter_scores = encoded_word_list.sum(axis=0)

In [7]:
scores_temp = letter_scores*encoded_word_list
scores = scores_temp.sum(axis=1)
word_index = np.arange(len(scores))
ranked = sorted(zip(scores, word_index), reverse=True)  # highest score first
scores_sorted = [s for s, _ in ranked]
index_sorted = [i for _, i in ranked]
print(scores_sorted[:10])
print(word_df.iloc[index_sorted[:10]])



[6804.0, 6676.0, 6364.0, 6175.0, 6127.0, 6122.0, 6105.0, 6085.0, 6069.0, 6016.0]
      0  1  2  3  4
6847  s  a  r  e  e
7507  s  o  r  e  e
7492  s  o  o  e  y
7230  s  i  r  e  e
1020  b  o  r  e  e
6827  s  a  m  e  y
6846  s  a  r  e  d
1688  c  o  o  e  e
6369  r  a  r  e  e
4370  l  a  r  e  e


In [8]:
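def word_df_elim(incorrect_lett, correct_lett, correct_pos_lett, incorrect_pos_lett, max_occurrence_count, correct_lett_threshold, word_df):
    """Return a boolean mask over the rows of word_df that are consistent with the feedback.

    Positions are zero-based column indices. correct_pos_lett and incorrect_pos_lett
    hold (position, letter) tuples; max_occurrence_count holds (max count, letter) tuples.
    """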
    filter1 = (word_df.isin(incorrect_lett).sum(axis=1) == 0)
    
    if len(correct_lett) > 0:
        filter2 = (word_df.isin(correct_lett).sum(axis=1) >= correct_lett_threshold)
    else:
        filter2 = filter1
    
    filter3_temp = []
    if len(correct_pos_lett) > 0:
        for lett in correct_pos_lett:
            filter3_temp.append(word_df.iloc[:,lett[0]] == lett[1])
        filter3 = (pd.DataFrame(filter3_temp).all(axis=0))
    else:
        filter3 = filter1

    filter4_temp = []
    if len(incorrect_pos_lett) > 0:
        for lett in incorrect_pos_lett:
            filter4_temp.append(word_df.iloc[:,lett[0]] != lett[1])
        filter4 = (pd.DataFrame(filter4_temp).all(axis=0))
    else:
        filter4=filter1

    filter5_temp = []
    if len(max_occurrence_count) > 0:
        for lett in max_occurrence_count:
            count_temp = (word_df == lett[1]).sum(axis=1)
            filter5_temp.append(count_temp <= lett[0])
        filter5 = pd.DataFrame(filter5_temp).all(axis=0)
    else:
        filter5 = filter1

    filter = filter1 & filter2 & filter3 & filter4 & filter5
    return filter
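
The five filters are easier to follow when applied to a single word. The sketch below is a pure-Python restatement of the same checks, not a replacement for the vectorised function above. The name `word_passes` and its keyword-only signature are hypothetical. Positions are zero-based, as in the DataFrame columns.

```python
def word_passes(word, *, incorrect_lett, correct_lett, correct_lett_pos,
                incorrect_lett_pos, max_occurrence_count,
                correct_lett_threshold):
    """Return True if `word` is consistent with all of the feedback."""
    # 1. The word contains no letter known to be absent.
    if any(lett in incorrect_lett for lett in word):
        return False
    # 2. The word contains enough letters from correct_lett.
    if correct_lett and sum(lett in correct_lett for lett in word) < correct_lett_threshold:
        return False
    # 3. Every known (position, letter) pair matches.
    if any(word[pos] != lett for pos, lett in correct_lett_pos):
        return False
    # 4. No correct letter sits in a position already ruled out for it.
    if any(word[pos] == lett for pos, lett in incorrect_lett_pos):
        return False
    # 5. No letter appears more often than its (max count, letter) cap allows.
    if any(word.count(lett) > cap for cap, lett in max_occurrence_count):
        return False
    return True

# Feedback values taken from the solver example in this notebook.
feedback = dict(
    incorrect_lett=['s', 'a', 'e', 'f'],
    correct_lett=['r'],
    correct_lett_pos=[(1, 'r'), (2, 'o'), (4, 'y')],
    incorrect_lett_pos=[(2, 'r')],
    max_occurrence_count=[(1, 'r')],
    correct_lett_threshold=1,
)
print([w for w in ['proxy', 'irony', 'crane', 'sorry'] if word_passes(w, **feedback)])
```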



## Solver
The solver is in the cell below. The six variables at the top of the cell hold the feedback from the game: five lists and one integer threshold. Extend the lists as the game continues. As an example, they are populated below with the feedback from the game shown in the following image:

![Wordle example game](wordle_example.png)

In [20]:
incorrect_lett = ['s','a','e','f'] # Lower-case letters that are not in the word (grey).
correct_lett = ['r'] # Lower-case letters that are in the word but at an unknown position. This is the yellow R in the image above.
incorrect_lett_pos = [(2,'r')] # (position, letter) tuples, zero-based: positions where a correct letter does not belong. Here, R is not in the third position.
correct_lett_pos = [(1,'r'),(2,'o'),(4,'y')] # (position, letter) tuples, zero-based: letters in their correct positions. These are the green R, O, and Y in the image above.
max_occurrence_count = [(1,'r')] # (max count, letter) tuples capping how often a letter may appear. This is the grey second R in the image above.
correct_lett_threshold = 1 # Minimum number of letters in a word that must come from correct_lett. Letters listed only in correct_lett_pos do not count toward it.

filter = word_df_elim(incorrect_lett,correct_lett,correct_lett_pos,incorrect_lett_pos,max_occurrence_count,correct_lett_threshold,word_df)
good_index = word_index[filter]

new_letter_scores = encoded_word_list[filter].sum(axis=0)
new_scores_temp = new_letter_scores*encoded_word_list[filter,:]
new_scores = new_scores_temp.sum(axis=1)
new_word_index = np.arange(len(new_scores))

good_ranked = sorted(zip(new_scores, new_word_index), reverse=True)  # highest score first
good_scores_sorted = [s for s, _ in good_ranked]
good_index_sorted = [i for _, i in good_ranked]
print(word_df[filter].iloc[good_index_sorted].head(20))
print(good_scores_sorted[:7])



      0  1  2  3  4
3952  i  r  o  n  y
2312  d  r  o  n  y
1828  c  r  o  n  y
3426  g  r  o  v  y
3413  g  r  o  d  y
6149  p  r  o  x  y
[22.0, 22.0, 22.0, 21.0, 21.0, 20.0]


The correct answer in this case was PROXY.
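
Filling in the feedback lists by hand is error-prone, so one option is to derive them from a guess and its colour pattern. The helper below is a hypothetical sketch and is not part of the notebook. It assumes the pattern is written with `g` for green, `y` for yellow, and `x` for grey. The guess `saree` with pattern `xxyxx` is an assumed example, not necessarily a guess from the game pictured above.

```python
from collections import Counter

def feedback_to_lists(guess, pattern):
    """Convert one guess and its colours ('g', 'y', 'x') into the solver's lists."""
    # Number of green or yellow copies of each letter in this guess.
    present = Counter(l for l, mark in zip(guess, pattern) if mark != 'x')
    out = {
        'incorrect_lett': [],
        'correct_lett': [],
        'incorrect_lett_pos': [],
        'correct_lett_pos': [],
        'max_occurrence_count': [],
    }
    for pos, (lett, mark) in enumerate(zip(guess, pattern)):
        if mark == 'g':
            out['correct_lett_pos'].append((pos, lett))
        elif mark == 'y':
            if lett not in out['correct_lett']:
                out['correct_lett'].append(lett)
            out['incorrect_lett_pos'].append((pos, lett))
        elif lett in present:
            # Grey copy of a letter that is green or yellow elsewhere:
            # the word holds exactly as many copies as were coloured.
            cap = (present[lett], lett)
            if cap not in out['max_occurrence_count']:
                out['max_occurrence_count'].append(cap)
            out['incorrect_lett_pos'].append((pos, lett))
        elif lett not in out['incorrect_lett']:
            out['incorrect_lett'].append(lett)
    return out

example = feedback_to_lists('saree', 'xxyxx')
print(example)
```

The result covers a single guess. Combining the lists from several guesses (and setting `correct_lett_threshold`) is left to the caller.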