Explanation regarding the double call to calculate_entropies() #9

Open
woctezuma opened this issue Jul 23, 2022 · 5 comments

Comments

@woctezuma
Contributor

woctezuma commented Jul 23, 2022

I don't understand the double call to calculate_entropies(). Could you clarify the reason behind it?

Wordle-Bot/wordle.py

Lines 112 to 117 in f633495

candidates = all_dictionary
entropies = calculate_entropies(candidates, all_words, pattern_dict, all_patterns)
if max(entropies.values()) < 0.1:
    candidates = all_words
    entropies = calculate_entropies(candidates, all_words, pattern_dict, all_patterns)

Indeed:

  • the set all_words is a subset of all_dictionary,
  • the variable entropies is a dictionary whose keys are candidates by design.

So the second call to calculate_entropies() recomputes entropy values which were already computed by the first call.

If the objective is to decrease the number of items fed to max(), it could be done by filtering entropies returned by the first call.

Wordle-Bot/wordle.py

Lines 119 to 120 in f633495

# Guess the candidate with highest entropy
guess_word = max(entropies.items(), key=lambda x: x[1])[0]
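
The filtering idea mentioned above could look roughly like the following. This is only a sketch: `filter_entropies` is a hypothetical helper that does not exist in the repository, and the entropy values are toy numbers rather than output from `calculate_entropies()`.

```python
def filter_entropies(entropies, remaining_words):
    """Keep only the entries whose word is still a possible answer.

    Hypothetical helper: it reuses the values from the first call
    instead of calling calculate_entropies() a second time.
    """
    return {word: value for word, value in entropies.items()
            if word in remaining_words}


# Toy values for illustration only (not real entropies from the bot).
entropies = {"crane": 0.05, "slate": 0.02, "zesty": 0.08}
all_words = {"crane", "slate"}

filtered = filter_entropies(entropies, all_words)
# Same selection rule as in wordle.py, applied to the filtered dictionary.
guess_word = max(filtered.items(), key=lambda x: x[1])[0]
```

Since the keys of `entropies` are the candidates, the filtered dictionary holds exactly the values a second call restricted to `all_words` would return, assuming the entropy of a candidate depends only on the remaining words.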

@GillesVandewiele
Owner

Wordle keeps track of two lists: (i) a list of possible correct answers and (ii) a much bigger list of valid words that the user can enter. The first call will try to find a good guess from list (i), but in case there are still many options, it might be better to guess a word from list (ii) to have better outcomes for the next round.

@woctezuma
Contributor Author

woctezuma commented Jul 23, 2022

There are indeed two lists:

  • 2315 entries in words.txt, which corresponds to (i),
  • 12972 entries in all_words.txt, which corresponds to (ii).

Wordle-Bot/wordle.py

Lines 11 to 12 in f633495

DICT_FILE_all = 'all_words.txt'
DICT_FILE = 'words.txt'

Here:

  • DICT_FILE corresponds to (i),
  • DICT_FILE_all corresponds to (ii).

Wordle-Bot/wordle.py

Lines 71 to 77 in f633495

# load all 5-letter-words for making patterns
with open(DICT_FILE_all) as ifp:
    all_dictionary = list(map(lambda x: x.strip(), ifp.readlines()))
# Load 2315 words for solutions
with open(DICT_FILE) as ifp:
    dictionary = list(map(lambda x: x.strip(), ifp.readlines()))

Here:

  • dictionary corresponds to (i),
  • all_dictionary corresponds to (ii).

Wordle-Bot/wordle.py

Lines 112 to 113 in f633495

candidates = all_dictionary
entropies = calculate_entropies(candidates, all_words, pattern_dict, all_patterns)

And here, we have all_dictionary which corresponds to (ii), not to (i).


Regarding the all_words variable:

  • the all_words variable is initialized with all_dictionary, so it should not matter:

all_words = set(all_dictionary)

  • then it gets smaller with iterations and the calls to intersection(), so it should be a subset of all_dictionary in the end:

Wordle-Bot/wordle.py

Lines 130 to 132 in f633495

# Filter our list of remaining possible words
words = pattern_dict[guess_word][info]
all_words = all_words.intersection(words)
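
To illustrate why `all_words` stays a subset of `all_dictionary`, here is a small self-contained sketch. The word list and the feedback buckets are made up for the example; in the bot they would come from `all_words.txt` and `pattern_dict[guess_word][info]`.

```python
# Made-up word list standing in for all_words.txt.
all_dictionary = ["crane", "slate", "zesty", "pious"]
all_words = set(all_dictionary)

# Hypothetical buckets of words compatible with the feedback of each round.
feedback_buckets = [{"crane", "slate", "pious"}, {"crane", "pious"}]

for words in feedback_buckets:
    # Intersection can only remove words, never add new ones.
    all_words = all_words.intersection(words)
    assert all_words <= set(all_dictionary)
```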


Regarding the dictionary variable, it is only used once, in the for-loop to evaluate the bot.

for WORD_TO_GUESS in tqdm(dictionary):

@woctezuma
Contributor Author

woctezuma commented Jul 24, 2022

I wonder whether what you wanted to do would have been:

candidates = all_words
entropies = calculate_entropies(candidates, all_words, pattern_dict, all_patterns)

if max(entropies.values()) < 0.1:
    candidates = all_dictionary
    entropies = calculate_entropies(candidates, all_words, pattern_dict, all_patterns)

instead of:

Wordle-Bot/wordle.py

Lines 112 to 117 in f633495

candidates = all_dictionary
entropies = calculate_entropies(candidates, all_words, pattern_dict, all_patterns)
if max(entropies.values()) < 0.1:
    candidates = all_words
    entropies = calculate_entropies(candidates, all_words, pattern_dict, all_patterns)

This was referenced Jul 24, 2022
@GillesVandewiele
Owner

Ok yes, you are correct! This logic does seem to be switched, and the calculations in the if-block are redundant. I think it should be switched as you suggested, but we could benchmark the code to be sure the results are the same (or better). We could iterate over every possible word and calculate how many guesses the bot needs on average.

I will merge your PRs later this week. Once again, many thanks for your contributions!
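
A minimal sketch of such a benchmark, assuming each variant of the bot can be wrapped in a function that takes the word to guess and returns the number of guesses it needed. The names `average_guesses`, `solver_current`, and `solver_switched` are hypothetical, and the guess counts are placeholders rather than measured results.

```python
from statistics import mean


def average_guesses(solve, answers):
    """Return the mean number of guesses `solve` needs over all answers."""
    return mean(solve(answer) for answer in answers)


# Placeholder solvers: each returns a fixed, made-up guess count per word.
# A real benchmark would run the bot's main loop for each variant instead.
def solver_current(answer):
    return {"crane": 3, "slate": 4, "zesty": 5}[answer]


def solver_switched(answer):
    return {"crane": 3, "slate": 3, "zesty": 4}[answer]


answers = ["crane", "slate", "zesty"]
avg_current = average_guesses(solver_current, answers)
avg_switched = average_guesses(solver_switched, answers)
```

Running both variants over the same answer list (for instance the 2315 entries of words.txt) would make the two averages directly comparable.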

@woctezuma
Contributor Author

No problem. I agree it is better to test before merging:

  • that I have not broken anything with the chunks,
  • whether the logic with the if-block works better if it is switched.
