beyond-chatbot

LLMs are one branch of a much larger tree. This repo is a growing knowledge graph of the statistics, data science, and AI that lives beyond — and beneath — the chatbot.

The idea

Every loss function, every optimizer, every probabilistic primitive, every classical ML method, every signal-processing trick a modern chatbot stands on top of has a name, a history, and a place in a larger map. Most of that map is invisible in the day-to-day discourse around LLMs.

This project makes the map navigable, one atomic node at a time:

A node is a single concept — an algorithm, model class, framework, method, system, or mathematical construct.
Each node has a one-sentence technical descriptor and a list of concrete real-world deployments. No hand-waving.
Edges between nodes are deliberately omitted. You draw the graph. Different consumers want different topologies — taxonomic, historical, dependency, pedagogical — and a fixed edge set would foreclose that.
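
As a rough illustration of "you draw the graph", the sketch below derives two different edge sets from the same nodes. The three nodes, their branch names, and the integer `era` values are hypothetical stand-ins rather than entries copied from the catalog, and both edge rules are only examples of what a consumer might choose.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical nodes, trimmed to the fields this sketch needs.
nodes = [
    {"id": "adam", "branch": "optimization", "era": 2014},
    {"id": "sgd", "branch": "optimization", "era": 1951},
    {"id": "kalman-filter", "branch": "signal-processing", "era": 1960},
]

def taxonomic_edges(nodes):
    """Link every pair of nodes that share a branch."""
    by_branch = defaultdict(list)
    for node in nodes:
        by_branch[node["branch"]].append(node["id"])
    edges = []
    for ids in by_branch.values():
        edges.extend(combinations(sorted(ids), 2))
    return sorted(edges)

def historical_edges(nodes):
    """Chain nodes in order of era, oldest first."""
    ordered = sorted(nodes, key=lambda n: (n["era"], n["id"]))
    return [(a["id"], b["id"]) for a, b in zip(ordered, ordered[1:])]

print(taxonomic_edges(nodes))
print(historical_edges(nodes))
```

The same node list yields one taxonomic edge and a two-step historical chain, which is the point: the topology is a choice made by the consumer, not a property of the catalog.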

What's in the repo today

| File | What it is |
| --- | --- |
| `ai-nodes.yaml` | The catalog. 165 nodes across 11 branches, from symbolic AI through reinforcement learning, with statistics, optimization, information theory, and signal processing as substrates. |
| `dist/data/graph.json` | Browser artifact (generated by the build). Immutable snapshot of nodes and edges consumed by the visualization. |
| `schema/graph.cypher` | LadybugDB schema (Node and Edge tables). |
| `scripts/import-yaml.ts` | YAML → LadybugDB importer. Validates all nodes and edges. |
| `scripts/export-json.ts` | LadybugDB → JSON exporter. Produces a deterministic `dist/data/graph.json`. |
| `scripts/export-yaml.ts` | LadybugDB → YAML exporter. Verifies the round trip: import followed by export must produce no semantic diff against the source YAML. |
| `graph.html` | Cytoscape.js visualization (WIP: will fetch `dist/data/graph.json` instead of YAML at runtime). |

The catalog is the substance; the build pipeline (YAML → LadybugDB → JSON) is the machinery.
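
To make "no semantic diff" concrete, here is a minimal sketch of an order-insensitive comparison between two parsed catalogs. It is not the project's actual round-trip check. It assumes each catalog is a list of node dicts keyed by `id`, and that `aliases` and `anchors` can be compared as unordered lists; both function names are hypothetical.

```python
def normalize(catalog):
    """Reduce a parsed catalog to an order-independent form keyed by node id."""
    normalized = {}
    for node in catalog:
        entry = dict(node)
        # Assumption: list-valued fields are treated as unordered.
        for key in ("aliases", "anchors"):
            if key in entry:
                entry[key] = sorted(entry[key])
        normalized[entry["id"]] = entry
    return normalized

def semantic_diff(before, after):
    """Return the ids that were added, removed, or changed."""
    a, b = normalize(before), normalize(after)
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }

original = [
    {"id": "a", "name": "A", "anchors": ["x", "y"]},
    {"id": "b", "name": "B", "anchors": ["z"]},
]
# Same content, different node order, key order, and anchor order.
round_tripped = [
    {"name": "B", "id": "b", "anchors": ["z"]},
    {"anchors": ["y", "x"], "name": "A", "id": "a"},
]
print(semantic_diff(original, round_tripped))
```

Reordering alone produces three empty lists, while an edited field would show up under `changed`.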

Building the project

Requirements

Node.js ≥14.15.0 (pnpm 8 itself requires Node.js ≥16.14, so in practice use 16.14 or newer)
pnpm ≥8.0.0 (enforced via .npmrc)

Build steps

```sh
# Install dependencies
pnpm install

# Compile TypeScript and generate dist/data/graph.json
pnpm run build
```

What the build does:

Compile TypeScript → JavaScript (scripts/, test/)
Import ai-nodes.yaml + ai-edges.yaml → .ladybugdb/ (embedded columnar DB)
Export LadybugDB → dist/data/graph.json (browser artifact)

Output artifacts:

dist/data/graph.json — nodes (with degree), edges, metadata; ~1.3 KB
.ladybugdb/ — transient database; gitignored

Performance: <30 seconds from clean checkout.

Development commands

```sh
pnpm test             # Run test suite (46 tests: schema, importer, round-trip validation)
pnpm test:watch       # Watch mode
pnpm test:coverage    # Coverage report (v8 provider)
pnpm run import       # Just YAML → LadybugDB
pnpm run export       # Just LadybugDB → JSON
pnpm run export:yaml  # LadybugDB → YAML (round-trip validation)
```

The build pipeline

YAML → LadybugDB → JSON

YAML (ai-nodes.yaml, ai-edges.yaml) is the human-editable source of truth. All changes land here.
LadybugDB (embedded columnar graph DB) is the canonical storage. Validation happens at import time.
JSON (dist/data/graph.json) is the immutable artifact consumed by the browser. Byte-stable for identical input; sorted deterministically (nodes by ID, edges by source→target→type).
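
The determinism described above can be sketched as follows. This is an illustration under assumptions, not the code in `scripts/export-json.ts`: the edge field names `source`, `target`, and `type` are inferred from the stated sort order, the metadata block is omitted, and `export_graph` is a hypothetical name.

```python
import json
from collections import Counter

def export_graph(nodes, edges):
    """Serialize nodes and edges to a byte-stable JSON string."""
    # Degree = number of edge endpoints that touch each node.
    degree = Counter()
    for edge in edges:
        degree[edge["source"]] += 1
        degree[edge["target"]] += 1

    sorted_nodes = sorted(
        ({**node, "degree": degree[node["id"]]} for node in nodes),
        key=lambda n: n["id"],
    )
    sorted_edges = sorted(
        edges, key=lambda e: (e["source"], e["target"], e["type"])
    )
    payload = {"nodes": sorted_nodes, "edges": sorted_edges}
    # sort_keys fixes key order; fixed separators avoid whitespace drift.
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )

nodes = [{"id": "b"}, {"id": "a"}]
edges = [{"source": "b", "target": "a", "type": "uses"}]
# Input order does not affect the output bytes.
print(export_graph(nodes, edges) == export_graph(nodes[::-1], edges))
```

Sorting both collections and the object keys before serializing is what makes the output independent of input order, so a diff of the artifact reflects a real change in the data.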

See docs/spec/00-project-spec.md for the full spec (§3: Storage Model, §4: Pipeline Contracts).

The node schema

The full schema is documented in the header of ai-nodes.yaml. Briefly:

| Field | Notes |
| --- | --- |
| `id` | kebab-case slug, stable identifier |
| `name` | canonical display name |
| `aliases` (optional) | common alternative names |
| `branch` | one of 11 fixed groupings (see file header) |
| `type` | `algorithm` · `method` · `model-class` · `system` · `framework` · `math-construct` |
| `era` | year or decade of significant introduction |
| `status` | `foundational` · `active` · `legacy` · `emerging` · `dormant` |
| `descriptor` | one technical sentence |
| `anchors` | list of concrete, verifiable real-world uses |
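
A minimal validator sketch for the fields above, working on an already-parsed node. It is not the importer's real validation logic. It assumes every field except `aliases` is required and that `anchors` must be non-empty, and it skips the `branch` check because the 11 branch names live in the file header. The example node and its `signal-processing` branch are hypothetical.

```python
import re

TYPES = {"algorithm", "method", "model-class", "system", "framework", "math-construct"}
STATUSES = {"foundational", "active", "legacy", "emerging", "dormant"}
# Assumption: everything except `aliases` is required.
REQUIRED = ("id", "name", "branch", "type", "era", "status", "descriptor", "anchors")
KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def validate_node(node):
    """Return a list of problems; an empty list means the node passes."""
    problems = [f"missing field: {f}" for f in REQUIRED if f not in node]
    if "id" in node and not KEBAB.match(str(node["id"])):
        problems.append("id is not kebab-case")
    if "type" in node and node["type"] not in TYPES:
        problems.append(f"unknown type: {node['type']}")
    if "status" in node and node["status"] not in STATUSES:
        problems.append(f"unknown status: {node['status']}")
    if "anchors" in node and not (isinstance(node["anchors"], list) and node["anchors"]):
        problems.append("anchors must be a non-empty list")
    return problems

# Hypothetical node, for illustration only.
example = {
    "id": "kalman-filter",
    "name": "Kalman filter",
    "branch": "signal-processing",
    "type": "algorithm",
    "era": 1960,
    "status": "foundational",
    "descriptor": "Recursive estimator of the state of a linear dynamical system from noisy measurements.",
    "anchors": ["Apollo guidance computer navigation"],
}
print(validate_node(example))
print(validate_node({**example, "id": "Kalman Filter", "status": "old"}))
```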

Contributing

Adding or improving nodes

PRs are welcome. The bar for node inclusion:

Atomic. One concept per node. If you find yourself writing "…and also…", it's probably two nodes.
Concrete anchors. Named systems, products, papers, or deployments — not "used in industry" or "widely applied".
One-sentence descriptor. Resist the urge to expand.
Existing branches only. If a concept doesn't obviously fit, it belongs in cross-cutting, not a new branch.
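
Parts of this bar can be checked mechanically before opening a PR. The sketch below is a rough heuristic lint, not a tool that ships with the repo. The function name is hypothetical, the vague-phrase list contains only the two phrases quoted above, and the sentence-count check will misfire on abbreviations such as "e.g.".

```python
import re

# Only the two phrases named in the inclusion bar; extend as needed.
VAGUE_PHRASES = ("used in industry", "widely applied")

def lint_node(node):
    """Flag descriptors and anchors that appear to miss the inclusion bar."""
    warnings = []
    descriptor = node.get("descriptor", "").strip()
    # Heuristic: end punctuation followed by more text suggests a second sentence.
    if re.search(r"[.!?]\s+\S", descriptor):
        warnings.append("descriptor looks like more than one sentence")
    if "and also" in descriptor.lower():
        warnings.append("descriptor says 'and also'; consider splitting the node")
    for anchor in node.get("anchors", []):
        if any(phrase in anchor.lower() for phrase in VAGUE_PHRASES):
            warnings.append(f"vague anchor: {anchor}")
    return warnings

# Hypothetical node that fails two of the checks.
bad = {
    "descriptor": "Fits a line. Also clusters points.",
    "anchors": ["Widely applied in finance"],
}
for warning in lint_node(bad):
    print(warning)
```

A clean result does not prove that a node is atomic or that its anchors are verifiable; it only catches the most obvious slips.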

YAML must parse. Quick check:

```sh
python3 -c "import yaml; yaml.safe_load(open('ai-nodes.yaml'))"
```

Good PRs to open:

Missing nodes within an existing branch.
Better anchors for an existing node (more concrete, more verifiable).
Tightening a descriptor that has drifted into two sentences.
New aliases that people actually use in the wild.

Contributing edges (Phase 2)

Not yet open. Edges are populated via a three-stage pipeline (bulk-seed from Wikidata → LLM-densify → human curation). Details will be documented in Phase 2.

Structural changes

Open an issue first if you want to argue for:

A new branch
A schema change
Removing or merging an existing node

Status

Version 0.1.0. Early, growing, opinionated about conciseness.

License

Dual:

Content: ai-nodes.yaml and any other data or docs are licensed under CC BY 4.0. See LICENSE-CC-BY-4.0. Use and adapt freely with attribution.
Code: all code in this repo, including the build scripts and tests, is licensed under MIT. See LICENSE-MIT.
