wordchain is a command-line tool (and library) that can find the longest chain of partially overlapping words in a given list of words. Why you would ever want that? I don't know. But now you have it.
Let's say you have a text file with a bunch of names in each line:
Jacob
Michael
Joshua
Matthew
Christopher
Andrew
Daniel
Ethan
Joseph
William
Anthony
Nicholas
David
Alexander
Ryan
Tyler
James
John
Jonathan
Brandon
then you can execute wordchain and you will get:
Longest chain (6): JoshuAlexandeRyAndreWilliaMichael
Isn't that just great?
The tool has pretty good command-line help text. Just try wordchain -h
.
Try it out for some small workloads before giving it a long list.
Note that because of reasons, the application won't accept lists longer than 256 words. But that would take an eternity to calculate anyway, trust me.
- Tweaking the value of the
--granularity
flag is pretty important. Try finding a good value for small and intermediate workloads first and increase that value for larger workloads. - The
--quick-results
switch really degrades the overall execution time, so use it only when you can live with a good guess for the longest chain.
You will need a more-or-less recent version of the rust toolchain, available from rustup.
git clone https://github.com/1HPorange/wordchain.git
cd wordchain
cargo build --release
The executable can be found in target/release/
.
For secret omega-turbo ultra-boost, consider compiling
specifically for your native environment by setting the
environment variable RUSTFLAGS="-C target-cpu=native"
. Just
remember to not share the executable after that, or your PC
will explode.
To run as fast as possible, wordchain uses some optimizations:
- never operates on strings directly, only byte-size vector indices
- is non-recursive to avoid stack-related performance issues
- is parallelized with a configurable granularity to avoid load-balancing issues
- builds lookup structures up-front to avoid recalculation
- sorts words heuristically to shorten expected runtime
- runs entirely lock-free
- uses bitmasks to emulate a hashset with perfect hashing, which is used to avoid cycles
- uses safe estimates of longest chains with a specific first word to potentially abort later words early
- Validate CL arguments. Right now I trust you. I shouldn't.