LLMs are one branch of a much larger tree. This repo is a growing knowledge graph of the statistics, data science, and AI that lives beyond — and beneath — the chatbot.
Every loss function, every optimizer, every probabilistic primitive, every classical ML method, every signal-processing trick a modern chatbot stands on top of has a name, a history, and a place in a larger map. Most of that map is invisible in the day-to-day discourse around LLMs.
This project makes the map navigable, one atomic node at a time:
- A node is a single concept — an algorithm, model class, framework, method, system, or mathematical construct.
- Each node has a one-sentence technical descriptor and a list of concrete real-world deployments. No hand-waving.
- Edges between nodes are deliberately omitted. You draw the graph. Different consumers want different topologies — taxonomic, historical, dependency, pedagogical — and a fixed edge set would foreclose that.
| File | What it is |
|---|---|
| `ai-nodes.yaml` | The catalog. 165 nodes across 11 branches, from symbolic AI through reinforcement learning, with statistics, optimization, information theory, and signal processing as substrates. |
| `dist/data/graph.json` | Browser artifact (generated by build). Immutable snapshot of nodes + edges consumed by the visualization. |
| `schema/graph.cypher` | LadybugDB schema (Node and Edge tables). |
| `scripts/import-yaml.ts` | YAML → LadybugDB importer. Validates all nodes and edges. |
| `scripts/export-json.ts` | LadybugDB → JSON exporter. Produces deterministic `dist/data/graph.json`. |
| `scripts/export-yaml.ts` | LadybugDB → YAML exporter. Proves round-trip: import → export → diff has no semantic diff. |
| `graph.html` | Cytoscape.js visualization (WIP: will fetch `dist/data/graph.json` instead of YAML at runtime). |

The catalog is the substance; the build pipeline (YAML → LadybugDB → JSON) is the machinery.
- Node.js ≥14.15.0
- pnpm ≥8.0.0 (enforced via `.npmrc`)

```bash
# Install dependencies
pnpm install
# Compile TypeScript and generate dist/data/graph.json
pnpm run build
```

What the build does:
- Compile TypeScript → JavaScript (`scripts/`, `test/`)
- Import `ai-nodes.yaml` + `ai-edges.yaml` → `.ladybugdb/` (embedded columnar DB)
- Export LadybugDB → `dist/data/graph.json` (browser artifact)

Output artifacts:
- `dist/data/graph.json`: nodes (with degree), edges, metadata; ~1.3 KB
- `.ladybugdb/`: transient database; gitignored

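To illustrate the "nodes (with degree)" field, here is a minimal sketch of how a per-node degree could be derived from an edge list. The edge keys (`source`, `target`, `type`) are inferred from the sort order this README describes, `node_degrees` is a hypothetical helper rather than the project's exporter, and counting both incoming and outgoing edges is an assumption.

```python
from collections import Counter


def node_degrees(node_ids, edges):
    """Return {node_id: degree}, counting each edge once per endpoint.

    Assumption: degree means in-degree plus out-degree, so a self-loop
    contributes 2 to its node.
    """
    counts = Counter()
    for edge in edges:
        counts[edge["source"]] += 1
        counts[edge["target"]] += 1
    # Nodes without any edge still appear, with degree 0.
    return {node_id: counts.get(node_id, 0) for node_id in sorted(node_ids)}


# Hypothetical IDs and edge types, for illustration only.
edges = [
    {"source": "adam", "target": "sgd", "type": "extends"},
    {"source": "sgd", "target": "backpropagation", "type": "used-by"},
]
degrees = node_degrees(["adam", "sgd", "backpropagation", "kalman-filter"], edges)
print(degrees)
```

Isolated nodes are kept deliberately, since a catalog node with no edges is still a valid node.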
Performance: <30 seconds from clean checkout.

```bash
pnpm test              # Run test suite (46 tests: schema, importer, round-trip validation)
pnpm test:watch        # Watch mode
pnpm test:coverage     # Coverage report (v8 provider)
pnpm run import        # Just YAML → LadybugDB
pnpm run export        # Just LadybugDB → JSON
pnpm run export:yaml   # LadybugDB → YAML (round-trip validation)
```

YAML → LadybugDB → JSON

- YAML (`ai-nodes.yaml`, `ai-edges.yaml`) is the human-editable source of truth. All changes land here.
- LadybugDB (embedded columnar graph DB) is the canonical storage. Validation happens at import time.
- JSON (`dist/data/graph.json`) is the immutable artifact consumed by the browser. Byte-stable for identical input; sorted deterministically (nodes by ID, edges by source→target→type).

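The determinism contract in the last bullet can be sketched in a few lines. This is an illustration under stated assumptions, not the actual `scripts/export-json.ts`: the function name `export_graph`, the top-level `nodes` / `edges` keys, and the sample records are all hypothetical.

```python
import json


def export_graph(nodes, edges):
    """Serialize nodes and edges so identical input yields identical text."""
    # Sort nodes by ID and edges by (source, target, type), as described above.
    ordered_nodes = sorted(nodes, key=lambda n: n["id"])
    ordered_edges = sorted(
        edges, key=lambda e: (e["source"], e["target"], e["type"])
    )
    # sort_keys fixes key order inside every object; compact separators
    # remove whitespace as a source of variation.
    return json.dumps(
        {"nodes": ordered_nodes, "edges": ordered_edges},
        sort_keys=True,
        separators=(",", ":"),
    )


# Hypothetical records, for illustration only.
nodes = [{"id": "sgd", "name": "SGD"}, {"name": "Adam", "id": "adam"}]
edges = [
    {"source": "sgd", "target": "adam", "type": "precedes"},
    {"source": "adam", "target": "sgd", "type": "extends"},
]

first = export_graph(nodes, edges)
second = export_graph(list(reversed(nodes)), list(reversed(edges)))
print(first == second)
```

Feeding the same records in a different order produces the same string, which is the property a byte-stable artifact needs for clean diffs and reproducible builds.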
See docs/spec/00-project-spec.md for the full spec (§3: Storage Model, §4: Pipeline Contracts).
The full schema is documented in the header of ai-nodes.yaml. Briefly:
| Field | Notes |
|---|---|
| `id` | kebab-case slug, stable identifier |
| `name` | canonical display name |
| `aliases` (optional) | common alternative names |
| `branch` | one of 11 fixed groupings (see file header) |
| `type` | algorithm · method · model-class · system · framework · math-construct |
| `era` | year or decade of significant introduction |
| `status` | foundational · active · legacy · emerging · dormant |
| `descriptor` | one technical sentence |
| `anchors` | list of concrete, verifiable real-world uses |

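As a rough illustration of the fields above, the sketch below checks a single node dictionary against the table. It is not the project's importer: `validate_node` is a hypothetical helper, the branch name in the example is made up (the 11 real branch names live in the file header, so `branch` is only checked for presence), and requiring a non-empty `anchors` list is an assumption.

```python
import re

NODE_TYPES = {
    "algorithm", "method", "model-class", "system", "framework", "math-construct",
}
STATUSES = {"foundational", "active", "legacy", "emerging", "dormant"}
REQUIRED = ("id", "name", "branch", "type", "era", "status", "descriptor", "anchors")
KEBAB_CASE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def validate_node(node):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [f"missing field: {f}" for f in REQUIRED if f not in node]
    if "id" in node and not KEBAB_CASE.fullmatch(str(node["id"])):
        errors.append("id is not kebab-case")
    if "type" in node and node["type"] not in NODE_TYPES:
        errors.append(f"unknown type: {node['type']}")
    if "status" in node and node["status"] not in STATUSES:
        errors.append(f"unknown status: {node['status']}")
    if "anchors" in node:
        anchors = node["anchors"]
        # Assumption: at least one anchor is required.
        if not isinstance(anchors, list) or not anchors:
            errors.append("anchors must be a non-empty list")
    return errors


# Hypothetical node; the branch value is illustrative, not a real branch name.
example = {
    "id": "kalman-filter",
    "name": "Kalman filter",
    "branch": "signal-processing",
    "type": "algorithm",
    "era": "1960",
    "status": "foundational",
    "descriptor": "Recursive estimator for the state of a linear system under Gaussian noise.",
    "anchors": ["Apollo guidance computer navigation"],
}
print(validate_node(example))
print(validate_node({**example, "id": "Kalman Filter", "status": "retired"}))
```

Returning a list of problems rather than raising on the first one lets a contributor see every issue with a node in a single pass.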
PRs are welcome. The bar for node inclusion:
- Atomic. One concept per node. If you find yourself writing "…and also…", it's probably two nodes.
- Concrete `anchors`. Named systems, products, papers, or deployments, not "used in industry" or "widely applied".
- One-sentence `descriptor`. Resist the urge to expand.
- Existing branches only. If a concept doesn't obviously fit, it belongs in `cross-cutting`, not a new branch.
- YAML must parse. Quick check: `python3 -c "import yaml; yaml.safe_load(open('ai-nodes.yaml'))"`

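Some of the bar above can be approximated mechanically before opening a PR. The following is a heuristic sketch only: `lint_node` and its phrase list are hypothetical and not part of this repo's tooling, and the sentence check will misfire on abbreviations such as "e.g." inside a descriptor.

```python
import re

# Phrases this README names as too vague to count as anchors.
VAGUE_ANCHORS = ("used in industry", "widely applied")


def lint_node(node):
    """Return warnings for a node dict; an empty list means nothing was flagged."""
    warnings = []
    descriptor = node.get("descriptor", "").strip()
    # Heuristic: sentence-ending punctuation followed by more text.
    if re.search(r"[.!?]\s+\S", descriptor):
        warnings.append("descriptor looks like more than one sentence")
    if "and also" in descriptor.lower():
        warnings.append("descriptor says 'and also'; consider splitting the node")
    for anchor in node.get("anchors", []):
        if any(phrase in anchor.lower() for phrase in VAGUE_ANCHORS):
            warnings.append(f"vague anchor: {anchor!r}")
    return warnings


# Hypothetical nodes, for illustration only.
clean = {
    "descriptor": "Adaptive first-order optimizer using running moment estimates of the gradient.",
    "anchors": ["Optimizer used to train the original Transformer"],
}
sloppy = {
    "descriptor": "First-order optimizer. Also popular.",
    "anchors": ["Widely applied in vision"],
}
print(lint_node(clean))
print(lint_node(sloppy))
```

A check like this only catches surface patterns; whether a node is truly atomic or an anchor truly verifiable still needs human review.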
Good PRs to open:
- Missing nodes within an existing branch.
- Better `anchors` for an existing node (more concrete, more verifiable).
- Tightening a `descriptor` that has drifted into two sentences.
- New `aliases` that people actually use in the wild.

Edge contributions are not yet open. Edges are populated via a three-stage pipeline (bulk-seed from Wikidata → LLM-densify → human curation). Details will be documented in Phase 2.
Open an issue first if you want to argue for:
- A new branch
- A schema change
- Removing or merging an existing node
Version 0.1.0. Early, growing, opinionated about conciseness.
Dual:
- Content: `ai-nodes.yaml` and any future data / docs are licensed under CC BY 4.0. See `LICENSE-CC-BY-4.0`. Use and adapt freely with attribution.
- Code: any code that lands in this repo is licensed under MIT. See `LICENSE-MIT`. No code lives here yet, but the license is in place for when it does.