# Algorithm for Collecting Wordle Solver Data
Now that we have a method for scoring candidate Wordle guesses, we can test the accuracy of our solver by recording how many guesses each version of the algorithm needs against a list of Wordle solutions. To do this we need code that compares the current guess to a solution word and generates the gray, yellow, or green response for each letter in the guess.

The basic rules of Wordle are as follows. If a guessed letter matches the solution letter in the same position, the response is green. For example, if the solution is TENET and the guess is TRACE, the first letter T gets a green response. If a guessed letter is in the solution word but not in that position, the response is yellow. Continuing our example, the E in TRACE gets a yellow response since TENET contains an E, but not as the fifth letter. Finally, if a guessed letter is not in the solution word, the response is gray. In our example the R, A, and C all get gray responses since none of those letters are in TENET. Pulling this all together, our guess of TRACE gives the response: green, gray, gray, gray, yellow.

The rules get slightly more complicated when the guess contains a repeated letter. Take a letter that appears twice in the guess. If that letter does not appear in the solution at all, both copies in the guess will be gray. If the solution contains exactly 1 instance of the letter, one copy in the guess will be gray and the other will be either green or yellow. If the solution contains 2 or more instances of the letter, both copies in the guess will be some combination of yellow and green, with no gray.
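To make these rules concrete, here is a minimal sketch of one common way to express them: mark the greens first, then hand out yellows from left to right while unmatched copies of the letter remain in the solution. The name `score_guess` is hypothetical, and giving the yellow to the leftmost copy is an assumption about how the game orders its colors. The number of gray, yellow, and green responses per letter follows the rules above either way.

```python
from collections import Counter

def score_guess(guess, solution):
    """Return one character per guess letter: G (green), Y (yellow), X (gray)."""
    response = ['X'] * len(guess)
    # Solution letters that are not matched exactly; only these can earn a yellow.
    unmatched = Counter(s for g, s in zip(guess, solution) if g != s)
    # First pass: exact position matches are green.
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            response[i] = 'G'
    # Second pass (assumed ordering): yellows go to the leftmost copies first.
    for i, g in enumerate(guess):
        if response[i] != 'G' and unmatched[g] > 0:
            response[i] = 'Y'
            unmatched[g] -= 1
    return ''.join(response)

print(score_guess('trace', 'tenet'))  # GXXXY
print(score_guess('speed', 'abide'))  # XXYXY
```

In the second example the guess has two E's and the solution has one, so exactly one E is yellow and the other is gray.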

In [4]:
import collections

solutionword='comma'
guess='canon'

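def response_guess(guess , solutionword):
    #build the response one letter at a time: G = green, Y = yellow, X = gray
    response = ''
    guesscount = collections.Counter(guess)
    solutionwordcount = collections.Counter(solutionword)
    for i in range(5):
        #the letter does not appear anywhere in the solution
        if guess[i] not in solutionwordcount:
            response += 'X'
        #the letter is in the correct position
        elif guess[i] == solutionword[i]:
            response += 'G'
        #only one copy of the letter is left to account for in the guess
        elif guesscount.get(guess[i]) == 1:
            response += 'Y'
        #more copies in the guess than in the solution: mark this copy gray and
        #lower the guess count by one, so the yellow lands on a later copy
        elif guesscount.get(guess[i]) == 2 and solutionwordcount.get(guess[i]) == 1:
            response += 'X'
            guesscount.subtract(guess[i])
        #the solution has enough copies to cover every copy in the guess
        elif guesscount.get(guess[i]) == 2 and solutionwordcount.get(guess[i]) >= 2:
            response += 'Y'
        #same logic for a letter that appears three times in the guess
        elif guesscount.get(guess[i]) == 3 and solutionwordcount.get(guess[i]) == 1:
            response += 'X'
            guesscount.subtract(guess[i])
        elif guesscount.get(guess[i]) == 3 and solutionwordcount.get(guess[i]) == 2:
            response += 'X'
            guesscount.subtract(guess[i])
        elif guesscount.get(guess[i]) == 3 and solutionwordcount.get(guess[i]) >= 3:
            response += 'Y'
    return response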

print(solutionword)
print(guess)
print(response_guess(guess,solutionword))

comma
canon
GYXYX


Here we ignore any case where the guess word has more than 3 instances of the same letter, as there are no five-letter words where four or more of the letters are the same. Now that we have a method for generating the Wordle response to any guess for a given solution word, we can combine it with our Wordle solver algorithm and record how many guesses a given version of the algorithm takes to reach the solution word.
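The step that drives this loop is discarding every candidate word that contradicts a response. As a rough sketch of that idea, and not the rule-by-rule trimming used in the cell below, we can keep only the words that would have produced the same response had they been the solution. Both `feedback` and `keep_consistent` are hypothetical names, and `feedback` is a compact stand-in scorer written only so the example runs on its own.

```python
from collections import Counter

def feedback(guess, solution):
    # Stand-in scorer (an assumption): greens are exact matches, yellows are
    # handed out left to right while unmatched copies remain in the solution.
    left = Counter(s for g, s in zip(guess, solution) if g != s)
    out = []
    for g, s in zip(guess, solution):
        if g == s:
            out.append('G')
        elif left[g] > 0:
            out.append('Y')
            left[g] -= 1
        else:
            out.append('X')
    return ''.join(out)

def keep_consistent(guess, response, candidates, scorer=feedback):
    """Keep only the candidates that would give `response` for `guess`."""
    return [w for w in candidates if scorer(guess, w) == response]

# Small made-up candidate list, for illustration only.
words = ['comma', 'canon', 'cocoa', 'mocha', 'coral']
print(keep_consistent('canon', 'GYXYX', words))  # ['comma', 'coral']
```

A filter like this stays correct for repeated letters as long as the same scorer generates the response and checks the candidates, at the cost of scoring every remaining word on each turn.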

In [19]:
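import string
import collections

#read every whitespace-separated word from the dictionary file
with open('WordleDict.txt', 'r') as file:
    word_list=[word for line in file for word in line.split()]

#uppercase letters, used below to filter out capitalized (proper noun) entries
alphabet_string = string.ascii_uppercase
alphabet_list = list(alphabet_string)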

#Collect all 5 letter non-proper noun words from the dictionary
fiveletterwords = [x for x in word_list if len(x)==5 and x[0] not in alphabet_list]

#collect lists of each letter in each position
l_1 = [x[0] for x in fiveletterwords]
l_2 = [x[1] for x in fiveletterwords]
l_3 = [x[2] for x in fiveletterwords]
l_4 = [x[3] for x in fiveletterwords]
l_5 = [x[4] for x in fiveletterwords]
all_letters = l_1+l_2+l_3+l_4+l_5

#count up the number of occurrences of each letter in each list
c_1=collections.Counter(l_1)
c_2=collections.Counter(l_2)
c_3=collections.Counter(l_3)
c_4=collections.Counter(l_4)
c_5=collections.Counter(l_5)
alc=collections.Counter(all_letters)

#define next guess based off of word score algorithm calculator
def next_guess(fiveletterwords , guessnumber):
    guess_score=[]
    #for first 2 guesses punish repeated letters and use total letter count
    if guessnumber<2:
        for i in fiveletterwords:
            guess_score.append((alc[i[0]]+alc[i[1]]+alc[i[2]]+alc[i[3]]+alc[i[4]])*len(set(i))/5)
    #for remaining guesses don't penalize repeated letters and use positional letter count
    else:
        for i in fiveletterwords:
            guess_score.append((c_1[i[0]]+c_2[i[1]]+c_3[i[2]]+c_4[i[3]]+c_5[i[4]]))
    #sort the word scores and return the highest-scoring word as the next guess
    wordtuple = list(zip(fiveletterwords , guess_score ))
    sortedwordtuple = sorted(wordtuple, key=lambda pair: pair[1], reverse=True)
    return sortedwordtuple[0][0]

#define Wordle response generation function
def response_guess(guess , solutionword):
    responses = ''
    guesscount = collections.Counter(guess)
    solutionwordcount = collections.Counter(solutionword)
    for i in range(5):
        if guess[i] not in solutionwordcount:
            responses += 'X'
        elif guess[i] == solutionword[i]:
            responses += 'G'
        elif guesscount.get(guess[i]) == 1:
            responses += 'Y'
        elif guesscount.get(guess[i]) == 2 and solutionwordcount.get(guess[i]) == 1:
            responses += 'X'
            guesscount.subtract(guess[i])
        elif guesscount.get(guess[i]) == 2 and solutionwordcount.get(guess[i]) >= 2:
            responses += 'Y'
        elif guesscount.get(guess[i]) == 3 and solutionwordcount.get(guess[i]) == 1:
            responses += 'X'
            guesscount.subtract(guess[i])
        elif guesscount.get(guess[i]) == 3 and solutionwordcount.get(guess[i]) == 2:
            responses += 'X'
            guesscount.subtract(guess[i])
        elif guesscount.get(guess[i]) == 3 and solutionwordcount.get(guess[i]) >= 3:
            responses += 'Y'
    return responses

#define function which takes the Wordle response and trims the list of remaining possible guesses
def trim_guess_list(guess , responses , fiveletterwords):
    guessletters = [char for char in guess]
    repeats = [x for x, count in collections.Counter(guess).items() if count > 1]
    repeatsindex=[]
    for rl in repeats:
        repeatsindex.append([i for i, j in enumerate(guessletters) if j==rl])
    repeatsresponses=[]
    for j in repeatsindex:
        repeatsresponses.append([responses[i] for i in j])
    repeatresponsedict=dict(zip(repeats , repeatsresponses))
    for i in range(5):
        if guess[i] in repeats:
            #all repeated letters are gray
            if (repeatresponsedict.get(guess[i]).count('G')+repeatresponsedict.get(guess[i]).count('Y'))==0:
                fiveletterwords = [x for x in fiveletterwords if guess[i] not in x]
            #the current letter is gray and there is exactly 1 non-gray repeated letter
            elif responses[i]=='X' and (repeatresponsedict.get(guess[i]).count('G')+repeatresponsedict.get(guess[i]).count('Y'))==1:
                fiveletterwords = [x for x in fiveletterwords if (x.count(guess[i])==1) and x[i] != guess[i]]
            #the current letter is gray and there are exactly 2 non-gray repeated letters
            elif responses[i]=='X' and (repeatresponsedict.get(guess[i]).count('G')+repeatresponsedict.get(guess[i]).count('Y'))==2:
                fiveletterwords = [x for x in fiveletterwords if (x.count(guess[i]) == 2) and x[i] != guess[i]]
            #the current letter is yellow and it is the only non-gray copy of this letter in the guess
            elif responses[i] == 'Y' and (repeatresponsedict.get(guess[i]).count('G') + repeatresponsedict.get(guess[i]).count('Y')) == 1:
                fiveletterwords = [x for x in fiveletterwords if (x.count(guess[i]) == 1) and x[i] != guess[i]]
            #the current letter is yellow and exactly 1 other copy of this letter in the guess is non-gray
            elif responses[i]=='Y' and (repeatresponsedict.get(guess[i]).count('G')+repeatresponsedict.get(guess[i]).count('Y'))==2:
                fiveletterwords = [x for x in fiveletterwords if (x.count(guess[i]) >= 2) and x[i] != guess[i]]
            #the current letter is yellow and exactly 2 other copies of this letter in the guess are non-gray
            elif responses[i]=='Y' and (repeatresponsedict.get(guess[i]).count('G')+repeatresponsedict.get(guess[i]).count('Y'))==3:
                fiveletterwords = [x for x in fiveletterwords if (x.count(guess[i]) >= 3) and x[i] != guess[i]]
            #the current letter is green and it is the only non-gray copy of this letter in the guess
            elif responses[i] == 'G' and (repeatresponsedict.get(guess[i]).count('G') + repeatresponsedict.get(guess[i]).count('Y')) == 1:
                fiveletterwords = [x for x in fiveletterwords if (x.count(guess[i]) == 1) and x[i] == guess[i]]
            #the current letter is green and exactly 1 other copy of this letter in the guess is non-gray
            elif responses[i] == 'G' and (repeatresponsedict.get(guess[i]).count('G') + repeatresponsedict.get(guess[i]).count('Y')) == 2:
                fiveletterwords = [x for x in fiveletterwords if (x.count(guess[i]) >= 2) and x[i] == guess[i]]
            #the current letter is green and exactly 2 other copies of this letter in the guess are non-gray
            elif responses[i] == 'G' and (repeatresponsedict.get(guess[i]).count('G') + repeatresponsedict.get(guess[i]).count('Y')) == 3:
                fiveletterwords = [x for x in fiveletterwords if (x.count(guess[i]) >= 3) and x[i] == guess[i]]
        # old rules for non-repeated letter guesses
        elif responses[i] == 'X':
            fiveletterwords = [x for x in fiveletterwords if guess[i] not in x ]
        elif responses[i] == 'G':
            fiveletterwords = [x for x in fiveletterwords if guess[i] == x[i] ]
        elif responses[i] == 'Y':
            fiveletterwords = [x for x in fiveletterwords if ( guess[i] in x and guess[i] != x[i] )]    
    return fiveletterwords

#define solution word list to test the algorithm
#solutionwordlist=['apple' , 'quote' , 'comma' , 'train' , 'arose']
#print(fiveletterwords)
solutionwordlist=[x for x in word_list if len(x)==5 and x[0] not in alphabet_list]
solutionwordlist=solutionwordlist[:500]
solutionguesscount=[]

#loop through the solution word list and collect algorithm results
for solution in solutionwordlist:
    
    #Repopulate the dictionary
    fiveletterwords = [x for x in word_list if len(x)==5 and x[0] not in alphabet_list]
    
    solutionword=solution
    guessnumber=0
    while True:
        guess=next_guess(fiveletterwords , guessnumber)
        guessresponse=response_guess(guess , solutionword)
        guessnumber+=1
        if guessresponse=='GGGGG':
            solutionguesscount.append(guessnumber)
            break
        else:
            fiveletterwords=trim_guess_list(guess , guessresponse , fiveletterwords)
            
print(collections.Counter(solutionguesscount))

Counter({4: 196, 3: 138, 5: 92, 6: 29, 2: 27, 7: 12, 8: 4, 9: 1, 1: 1})
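The raw counter is easier to read once it is reduced to a couple of summary numbers. The short sketch below copies the distribution printed above and computes the average number of guesses and the share of words solved within the six guesses that Wordle allows. The variable names are our own.

```python
from collections import Counter

# Distribution printed above: number of guesses -> number of solution words.
guess_counts = Counter({4: 196, 3: 138, 5: 92, 6: 29, 2: 27, 7: 12, 8: 4, 9: 1, 1: 1})

total = sum(guess_counts.values())
mean_guesses = sum(n * c for n, c in guess_counts.items()) / total
within_six = sum(c for n, c in guess_counts.items() if n <= 6)

print(total)               # 500
print(mean_guesses)        # 4.024
print(within_six / total)  # 0.966
```

For this run the algorithm averages about 4.02 guesses per word and solves 483 of the 500 test words (96.6%) within six guesses, leaving 17 words that would have been losses in the real game.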
