Bart Massey 2022
YouTuber Matt Parker proposed and solved an interesting problem inspired by the game Wordle a while back: find a set of five five-letter words from a dictionary that collectively contain 25 of the 26 letters of the English alphabet. I'm calling this problem Wordle5, because names are convenient.
For a summary history of the problem, see the README in Philip Meier's solution repo: that solution inspired me to create my own in Rust; specifically, I started by responding to this Reddit thread.
Parker's original solution took about a month: over 2.7 million seconds. This program currently solves the problem in about 5ms for me, more than 8.5 orders of magnitude faster. See the Performance section for much more information.
Build the program with

```
cargo build --release
```
See the Features section for a list of build features.
Invoke the program with a list of the dictionary files to be read. Dictionary files should consist of ASCII lowercase words, one per line. The easy invocation is

```
cargo run --release words-nyt-wordle.txt
```
My approach is "brute-force" state-space search, which is the standard method for solvers of this problem. (Apparently somebody used a graph clique finder, which is très cool, but I can't imagine it's faster.)
The key is in the selection of the state space and in pruning. For the state space, I choose to search for words by contained letter, from "hardest" letter to "easiest": a letter is considered harder if fewer words contain it. I thus start my depth-first search by selecting a word containing a `q` and proceed from there.
We may and must use only 25 of the 26 letters in our solution. This complicates the search space a bit: we must allow for "skipping" a letter during the search.
Several kinds of pruning are applied to this search.
- Reuse Checking: This is the simplest case: a word is not considered if previously-selected words contain any of its letters. Any search that fails to do this is ridiculous. Reuse checks are what make searching from hardest to easiest letter a good candidate for the search space.
- Candidate Filtering: The word list for each letter is filtered before the search starts, to remove words that can't possibly work given the letters known to be chosen earlier in the search order. If a word `w` at a given candidate letter `l` in the search contains more than one letter that was previously selected, `w` is removed from the word list for `l`. (The search order is statically known, making this calculation possible.) Words with one previously-selected letter may still be candidates: because only 25 letters are to be selected, that letter may have been skipped in the search. (I experimented with building separate zero-overlap and one-overlap candidate word lists, but it showed no speed improvement.)
- Pseudovowel Pruning: "Pseudovowels" for a word list `l` are sets `p` of letters such that every word in `l` contains at least one letter of `p`. For the dictionaries used here, the vowel set "aeiouyw" has this property. The program automatically calculates a set of pseudovowels for the given dictionary up front: the current approach is just to take letters in decreasing dictionary frequency until examining the dictionary shows they form a pseudovowel set. For the NYT Wordle dictionary, the program uses the pseudovowels "aeilnorsu". (A sketch of this computation appears just after this list.)
Having a set of n pseudovowels for a dictionary allows filtering out words that contain more than n - 5 pseudovowels up front.
Given a set of n pseudovowels, the program can check during the search that these pseudovowels have not been "overused". If there are fewer than n - (5 - d) pseudovowels remaining after selecting a word w at depth d, there are not enough remaining pseudovowels to make the remaining words. Thus, w can be omitted from the search.
The program finds the standard vowels "aeiouyw" itself and uses them as pseudovowels. This is "fair" because the standard vowels are discovered by the program rather than supplied as expert knowledge.
The program also uses a second set of pseudovowels for pruning. These are the "global pseudovowels", which are calculated greedily using letter frequencies over the dictionary rather than iteratively. The program finds global pseudovowels "aeilnorsu" for the standard dictionary. Note that these are only pseudovowels for a dictionary of words with no duplicate letters: "pygmy" is a counterexample otherwise.
Pruning against both the global pseudovowels and the standard vowels gives a large reduction in search space. Why it helps this much is a bit of a mystery to me at this point.
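Here's a minimal sketch of the greedy pseudovowel computation described above, assuming words have already been encoded as 26-bit letter sets (that encoding is sketched further below). The function and names here are mine, not the program's:

```rust
/// Greedily find a "global pseudovowel" set for `words` (each word a
/// 26-bit letter set): add letters in decreasing dictionary frequency
/// until every word contains at least one letter of the set.
fn global_pseudovowels(words: &[u32]) -> u32 {
    // Count how many words contain each letter.
    let mut freq = [0usize; 26];
    for &w in words {
        for c in 0..26 {
            if w & (1 << c) != 0 {
                freq[c] += 1;
            }
        }
    }
    // Letters in decreasing frequency order.
    let mut letters: Vec<usize> = (0..26).collect();
    letters.sort_by_key(|&c| std::cmp::Reverse(freq[c]));

    let mut p: u32 = 0;
    for c in letters {
        p |= 1 << c;
        // p is a pseudovowel set once no word is disjoint from it.
        if words.iter().all(|&w| w & p != 0) {
            break;
        }
    }
    p
}
```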
The resulting algorithm looks something like this:
```
calculate and sort a sequence of letter groups g
  and a set of pseudovowels p

to search at position i in g at depth d,
with a set of seen letters, a partial soln, and a skipped indicator:
    if d == 5:
        save soln
        return
    find the next position j ≥ i whose g[j].letter is not seen
    for each word w in g[j].words:
        if w contains seen letters:
            continue
        if pseudovowel pruning eliminates w:
            continue
        search at position j + 1, depth d + 1, skipped,
          with updated seen and soln
    if not skipped:
        search at position j + 1, depth d, skipped = true,
          with seen and soln (skipping g[j].letter)
```
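In Rust, that recursion might look roughly like the sketch below. This is my own illustrative rendering, not the program's actual code; here skipping is realized by marking the group's letter as seen without choosing a word, one way of treating it as the single unused letter:

```rust
/// A letter group: one letter (as a one-bit mask) plus the words
/// containing it, each word a 26-bit letter set. Hypothetical types.
struct Group {
    letter: u32,
    words: Vec<u32>,
}

/// Depth-first search mirroring the pseudocode above. `seen` is the
/// set of letters already used, `soln` the partial solution, and
/// `skipped` records whether a letter has been skipped yet.
fn search(
    groups: &[Group],
    i: usize,
    d: usize,
    seen: u32,
    soln: &mut Vec<u32>,
    skipped: bool,
    solns: &mut Vec<Vec<u32>>,
) {
    if d == 5 {
        solns.push(soln.clone());
        return;
    }
    // Find the next group whose letter is not yet covered.
    let Some(j) = (i..groups.len()).find(|&j| groups[j].letter & seen == 0) else {
        return;
    };
    for &w in &groups[j].words {
        // Reuse check: none of w's letters may already be used.
        if w & seen != 0 {
            continue;
        }
        // (Pseudovowel pruning would go here.)
        soln.push(w);
        search(groups, j + 1, d + 1, seen | w, soln, skipped, solns);
        soln.pop();
    }
    if !skipped {
        // Skip this group's letter: treat it as the one unused letter.
        search(groups, j + 1, d, seen | groups[j].letter, soln, true, solns);
    }
}
```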
The "obvious" performance-improving techniques have been used in the implementation of this algorithm. Letters are represented as integers 0..26. Letter sets are represented as bitsets in 32 bits. Words are represented as letter sets (since they are guaranteed to contain five unique letters). The dictionary of text words is replaced with a dictionary of letter-set words early on, with a map back to text kept around to print the solution.
The "POPCOUNT" primitive, which counts the number of set
bits in a 32-bit word, plays a key role in performance
here. See my
popcount micro-benchmarks
repo for for an account of how much slower this operation
can be if not provided by your processor. Thank goodness
modern processors have finally incorporated the damn thing,
and that Rust's count_ones()
intrinsic provides easy
access to it.
Various features have been hidden behind Rust feature gates and must be turned on at compile time. See below for specifics.
My solution is blazingly fast, solving the standard problem in about 5ms single-threaded on my Ryzen 9 3900X desktop.
The comment thread on this YouTube video seems to be the source of the fastest solutions right now: I'm about 4× faster than the next-best reported solution.
Timing shows that much of the runtime of the single-threaded version is spent in the solver proper: about 1.5ms for init, 2.5ms for the solver, 1ms of unknown overhead. This leaves little room for improvement by solver speedup.
When building for best performance, you may want to build a statically-linked binary for more reproducible best times. On my box I use the `x86_64-unknown-linux-musl` build target for this. You may also want to use `RUSTFLAGS="-C target-cpu=native"`, although it doesn't make a difference for me. Note that you definitely want to time the binary: don't use `cargo run` when timing, as it adds major overhead.
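If you haven't targeted musl before, the standard commands are roughly:

```
rustup target add x86_64-unknown-linux-musl
cargo build --release --target x86_64-unknown-linux-musl
```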
On Linux, at least, the standard timing options are garbage. I recommend installing http://github.com/BartMassey/realtime and then using

```
realtime target/x86_64-unknown-linux-musl/release/wordle5 words-nyt-wordle.txt
```
Some of the things I have tried to improve performance:
- I've tried profile-guided optimization, but it doesn't seem to help.
- I've tried putting everything on a RAMdisk, but it doesn't seem to help.
- I tried going to `no_std` to get rid of the 2-3ms of startup overhead, thinking it was due to Rust startup. This was a massive uglification of the code, and produced no noticeable speedup.
- I used parallelism for the longest time. As the program got faster, the speedups got microscopic. At this point we are probably limited by data access rather than CPU.
At this point, the performance is really fragile; small tweaks make hard-to-understand differences. Further, setup time and search time are looking pretty balanced. I think it's unlikely that further tuning of the existing approach can make this code dramatically quicker: a whole new solver algorithm would be needed.
In terms of overheads, the remaining possibilities are ugly.
- One could cheat massively by compiling the pre-digested dictionaries into the program to save a millisecond or two, but ugh.
The `main` branch code uses `std::fs::read_to_string()` followed by line splitting of the string to read the dictionaries. The `bufread` branch in this repo moves to using `std::fs::File::open()` and `std::io::BufRead::lines()`. It is dramatically slower, taking about 20ms extra just to process the dictionaries. Rust I/O performance is a bit wack.
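For reference, here's a sketch of the two reading styles, reusing the hypothetical `word_to_bits` from above; presumably the per-line `String` allocations in the second account for much of the gap:

```rust
use std::io::BufRead;

/// main-branch style: slurp the whole file, then split lines in place.
fn read_dict_fast(path: &str) -> std::io::Result<Vec<u32>> {
    let text = std::fs::read_to_string(path)?;
    Ok(text.lines().filter_map(word_to_bits).collect())
}

/// bufread-branch style: a fresh String allocation for every line.
fn read_dict_slow(path: &str) -> std::io::Result<Vec<u32>> {
    let file = std::fs::File::open(path)?;
    let mut words = Vec::new();
    for line in std::io::BufReader::new(file).lines() {
        if let Some(w) = word_to_bits(&line?) {
            words.push(w);
        }
    }
    Ok(words)
}
```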
The `main` branch uses a recursive solver. The `nonrecursive-solvify` branch in this repo makes the solver iterative using an explicit stack. It is not noticeably faster currently, but that appears to be a function of the already-rapid solver time.
The `no_std` branch in this repo allows building a `no_std` version of the program. It is not faster than the `std` version, but was sure more work to produce.
The `parallelized` branch in this repo includes both a `rayon`-parallelized and a custom scoped-thread-parallelized version of the program. These provided less than a millisecond of speedup, and thus were removed to simplify the program.
The `retro2` branch in this repo includes both Python and Rust versions of all the stages of the construction of `wordle5`, as recreated for expository purposes.
The `retro2-par` and `retro2-par-fast` branches in this repo contain parallelized versions of the `retro2` code with naïve and pruned search, respectively. The `retro2-par-fast` branch has the fastest Python version currently, at around 120ms.
To see node counts from the solver, build with the `instrument` feature. This will display node counts at each search tree depth, as well as a total node count, to stderr.
To get times for initialization, solver, and output, build with the `timing` feature. This will display the wall-clock time for each of these pieces to stderr.
To get the end-to-end runtime of `main()`, build with the `full-timing` feature. This will display the wall-clock time for this piece to stderr.
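For example, the instrumentation and stage-timing features should combine, since cargo features are additive:

```
cargo build --release --features instrument,timing
```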
- `words-nyt-wordle.txt` (12945 words): I've taken the liberty of combining the NYT Wordle word lists into a single file and filtering duplicates. It doesn't change performance notably, and it's way more convenient.
- `words-alpha.txt` (15913 words): I've filtered `words_alpha.txt` (wherever that came from) to just five-letter words and removed all consonant-only words (all words contain at least one of "aeiouyw").
- `words-bart.txt` (17560 words): I've built my own wordlist as the union of the wordlists in my `wordlists` repo, filtered the same as `words-alpha.txt`.
Thanks to @stew675 on YouTube for informative and thoughtful discussions.
This work is made available under the "MIT License". Please see the file `LICENSE.txt` in this distribution for license terms. The provided dictionaries are used without permission: no license is provided, express or implied, for these.