Watch the entire English language blossom from Wiktionary + Google Books N-grams, rendered as a living, breathing prefix galaxy.
- Zero-config takeover –
./setup.sh
spins up the virtualenv, fetches every dataset, caches the heavy lifts, and ships final MP4/GIF output. - Radial growth cinematics – the trie erupts from the core alphabet, framing decades of linguistic evolution as a neon fractal.
- Repeatable science – every artifact (lemmata, first-year inference, trie counts, layouts) checkpoints to disk and into a reusable tarball for instant re-renders.
- Battle-tested – streams 26 full 1-gram shards, handles 1.4GB Wiktionary dumps, and renders 220 frames in glorious 1080p.
Share it, remix it, drop it in your next data-viz thread.
cd /Users/grey/Projects/graph-visualizations
bash setup.sh
The script will:
- Create/upgrade
venv/
with Python 3. - Download Wiktionary + Google Books 1-gram shards (
a
–z
). - Extract English lemmas, infer first-use years, aggregate prefix counts.
- Render 220 radial frames (
outputs/frames/frame-0000.png
→frame-0219.png
). - Encode
outputs/english_trie_timelapse.mp4
and a share-ready GIF.
Rerun the script anytime—artifact caching means future passes jump straight to rendering.
Stage | Script | Output |
---|---|---|
Lemma extraction | src/ingest/wiktionary_extract.py |
artifacts/lemmas/lemmas.tsv |
First-year inference | src/ingest/ngram_first_year.py |
artifacts/years/first_years.tsv |
Prefix aggregation | src/build/build_prefix_trie.py |
artifacts/trie/prefix_counts.jsonl |
Layout generation | src/viz/layout.py |
artifacts/layout/prefix_positions.json (legacy back-compat) |
Frame rendering | src/viz/render_frames.py |
outputs/frames/ |
Encoding | src/viz/encode.py |
outputs/english_trie_timelapse.mp4 + .gif |
source venv/bin/activate
python -m src.viz.render_frames artifacts/trie/prefix_counts.jsonl outputs/frames
python -m src.viz.encode outputs/frames outputs/english_trie_timelapse.mp4 outputs/english_trie_timelapse.gif
Use flags such as --min-radius
, --max-radius
, --base-edge-alpha
, or --start-progress
to tune the vibe.
Load artifacts/years/first_years.tsv
to explore in Neo4j (Community & Enterprise safe):
:param batch => $rows;
UNWIND $rows AS row
WITH row WHERE row.word IS NOT NULL AND row.word <> ""
MERGE (w:Word {text: row.word})
SET w.first_year = CASE
WHEN row.first_year = "" THEN NULL
ELSE toInteger(row.first_year)
END;
- Drop the GIF in language history threads (#linguistics #dataart).
- Remix the radial layout with alternative color ramps or depth cutoffs.
- Pair the timelapse with poetry readings for maximum feels.
- Wiktionary community & Google Books N-gram team for open data.
- You, for showing the world how beautifully language grows.
For more open source software and content on Knowledge Graphs, GNNs, and Graph Databases, Join our community on X!