Explanation regarding the double call to calculate_entropies() #9

woctezuma · 2022-07-23T15:39:11Z

I don't understand the double call to calculate_entropies(). Could you clarify the reason behind it?

Lines 112 to 117 in f633495

    
    candidates = all_dictionary
    entropies = calculate_entropies(candidates, all_words, pattern_dict, all_patterns)
    if max(entropies.values()) < 0.1:
        candidates = all_words
        entropies = calculate_entropies(candidates, all_words, pattern_dict, all_patterns)

Indeed:

the set all_words is a subset of all_dictionary,
the variable entropies is a dictionary whose keys are candidates by design.

So the second call to calculate_entropies() recomputes entropy values which were already computed by the first call.

If the objective is to decrease the number of items fed to max(), it could be done by filtering entropies returned by the first call.
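A minimal sketch of that filtering idea, assuming entropies maps each candidate word to a float and all_words is the set of remaining words, as in the snippet above. The helper name filter_entropies and the numeric values are made up for illustration and are not part of wordle.py.

```python
def filter_entropies(entropies, remaining_words):
    """Keep only the entries whose key is still a remaining word."""
    return {word: h for word, h in entropies.items() if word in remaining_words}


# Toy values, invented for illustration only.
entropies = {"crane": 1.5, "slate": 0.0, "zesty": 0.9}
all_words = {"crane", "zesty"}

# Reuse the values from the first call instead of recomputing them.
filtered = filter_entropies(entropies, all_words)
guess_word = max(filtered.items(), key=lambda x: x[1])[0]
```

This avoids the second call entirely, since every value it would return is already present in the first result.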

Wordle-Bot/wordle.py

Lines 119 to 120 in f633495

    
    # Guess the candidate with highest entropy
    guess_word = max(entropies.items(), key=lambda x: x[1])[0]


GillesVandewiele · 2022-07-23T18:57:02Z

Wordle keeps track of two lists: (i) a list of possible correct answers and (ii) a much bigger list of valid words that the user can enter. The first call will try to find a good guess from list (i), but in case there are still many options, it might be better to guess a word from list (ii) to have better outcomes for the next round.
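To illustrate why a guess that cannot be the answer may still be the better choice, here is a toy sketch. It assumes that each guess maps feedback patterns to the set of words producing that pattern, similar in shape to pattern_dict[guess_word][info] in wordle.py. The function guess_entropy, the G/Y/B pattern strings, and toy_pattern_dict are hypothetical and are not the repository's calculate_entropies().

```python
from math import log2


def guess_entropy(patterns_for_guess, remaining):
    """Entropy (in bits) of the feedback a guess produces over `remaining`."""
    total = len(remaining)
    entropy = 0.0
    for matching_words in patterns_for_guess.values():
        count = len(matching_words & remaining)
        if count:
            p = count / total
            entropy -= p * log2(p)
    return entropy


# Four answers are still possible; they differ only in the first letter.
remaining = {"bills", "fills", "hills", "mills"}

# Feedback patterns (G = green, Y = yellow, B = gray), written out by hand.
toy_pattern_dict = {
    # A possible answer: only tells us whether the first letter is "b".
    "bills": {"GGGGG": {"bills"}, "BGGGG": {"fills", "hills", "mills"}},
    # Not a possible answer, but it tests "h", "m" and "b" at once.
    "thumb": {
        "BBBBY": {"bills"},
        "BBBBB": {"fills"},
        "BYBBB": {"hills"},
        "BBBYB": {"mills"},
    },
}

for guess, patterns in toy_pattern_dict.items():
    print(guess, round(guess_entropy(patterns, remaining), 3))
```

In this toy case "thumb" separates all four remaining answers, while "bills" only separates one of them from the other three.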

woctezuma · 2022-07-23T19:05:04Z

There are indeed two lists:

2315 entries in words.txt, which corresponds to (i),
12972 entries in all_words.txt, which corresponds to (ii).

Wordle-Bot/wordle.py

Lines 11 to 12 in f633495

    
    DICT_FILE_all = 'all_words.txt'
    DICT_FILE = 'words.txt'

Here:

DICT_FILE corresponds to (i),
DICT_FILE_all corresponds to (ii).

Wordle-Bot/wordle.py

Lines 71 to 77 in f633495

    
    # load all 5-letter-words for making patterns
    with open(DICT_FILE_all) as ifp:
        all_dictionary = list(map(lambda x: x.strip(), ifp.readlines()))
    # Load 2315 words for solutions
    with open(DICT_FILE) as ifp:
        dictionary = list(map(lambda x: x.strip(), ifp.readlines()))

Here:

dictionary corresponds to (i),
all_dictionary corresponds to (ii).

Wordle-Bot/wordle.py

Lines 112 to 113 in f633495

    
    candidates = all_dictionary
    entropies = calculate_entropies(candidates, all_words, pattern_dict, all_patterns)

And here, we have all_dictionary which corresponds to (ii), not to (i).

Regarding the all_words variable:

the all_words variable is initialized with all_dictionary, so it should not matter:

Wordle-Bot/wordle.py

Line 107 in f633495

all_words = set(all_dictionary)

then it gets smaller with iterations and the calls to intersection(), so it should be a subset of all_dictionary in the end:

Wordle-Bot/wordle.py

Lines 130 to 132 in f633495

    
    # Filter our list of remaining possible words
    words = pattern_dict[guess_word][info]
    all_words = all_words.intersection(words)

Regarding the dictionary variable, it is only used once, in the for-loop to evaluate the bot.

Wordle-Bot/wordle.py

Line 97 in f633495

for WORD_TO_GUESS in tqdm(dictionary):

woctezuma · 2022-07-24T17:50:49Z

I wonder whether what you wanted to do would have been:

candidates = all_words
entropies = calculate_entropies(candidates, all_words, pattern_dict, all_patterns)

if max(entropies.values()) < 0.1:
    candidates = all_dictionary
    entropies = calculate_entropies(candidates, all_words, pattern_dict, all_patterns)

instead of:

Wordle-Bot/wordle.py

Lines 112 to 117 in f633495

    
    candidates = all_dictionary
    entropies = calculate_entropies(candidates, all_words, pattern_dict, all_patterns)
    if max(entropies.values()) < 0.1:
        candidates = all_words
        entropies = calculate_entropies(candidates, all_words, pattern_dict, all_patterns)

GillesVandewiele · 2022-07-25T10:22:10Z

Ok yes, you are correct! The logic does seem to be switched, and the calculations in the if-block are redundant. I think it should be switched as you suggested, but we could benchmark the code to be sure the results are the same (or better)? We could iterate over every possible word and calculate how many guesses the bot needs on average.
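A rough sketch of such a benchmark, assuming each variant of the bot can be wrapped in a function that takes the word to guess and returns the number of guesses it needed. The names average_guesses, solve_current and solve_switched, and the guess counts, are hypothetical stand-ins and not measurements from wordle.py.

```python
from statistics import mean


def average_guesses(solve, answers):
    """Average number of guesses `solve` needs over every word in `answers`."""
    return mean(solve(answer) for answer in answers)


# Stand-in solvers with made-up guess counts, for illustration only.
# In practice each would run the bot's loop with one version of the if-block.
fake_counts_current = {"crane": 4, "slate": 3, "zesty": 5}
fake_counts_switched = {"crane": 3, "slate": 3, "zesty": 4}


def solve_current(answer):
    return fake_counts_current[answer]


def solve_switched(answer):
    return fake_counts_switched[answer]


answers = ["crane", "slate", "zesty"]
print("current :", average_guesses(solve_current, answers))
print("switched:", average_guesses(solve_switched, answers))
```

Running both variants over the same answer list gives two averages that can be compared directly before merging.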

I will merge your PRs later this week. Once again, many thanks for your contributions!

woctezuma · 2022-07-25T15:13:39Z

No problem. I agree it is better to test before merging:

that I have not broken anything with the chunks,
whether the logic with the if-block works better if it is switched.

woctezuma mentioned this issue Jul 23, 2022

Remove useless code #10

Open

This was referenced Jul 24, 2022

Compute entropy in chunks #13

Closed

Switch entropy logic #14

Closed
