Statistical Learning Knowledge Graph

An interactive knowledge graph covering statistical learning concepts, methods, and models — and the relationships between them.

307 concepts · 497 relationships · 6 domains · 8 relationship types

What this is

This project maps the conceptual structure of statistical learning as a directed graph. Every concept (OLS, MLE, confidence intervals, logistic regression...) is a node. Every meaningful relationship between concepts (A requires B, A assumes B, A produces B...) is a typed, directed edge.

The result is a navigable knowledge structure you can explore by concept, by domain, or by relationship type — and update every quarter as new material is covered.
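
As a rough illustration of the structure described above, a graph of this kind can be held as a list of typed, directed edges between node ids. The sketch below is not the project's code. The `Edge` class, the `outgoing` helper, and the four node ids are hypothetical; the real ids live in kg-project/domains/.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str     # id of concept A
    target: str     # id of concept B
    edge_type: str  # one of the eight relationship types, e.g. "requires"


# Hypothetical ids, chosen only to mirror the examples in this README.
edges = [
    Edge("mle", "likelihood", "requires"),
    Edge("ols", "homoscedasticity", "assumes"),
]


def outgoing(node_id, edges, edge_type=None):
    """Return the targets of edges leaving node_id, optionally filtered by type."""
    return [
        e.target
        for e in edges
        if e.source == node_id and (edge_type is None or e.edge_type == edge_type)
    ]


print(outgoing("mle", edges))              # ['likelihood']
print(outgoing("ols", edges, "requires"))  # []
```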

Repository structure

/
├── kg-project/              ← data layer (source of truth)
│   ├── domains/             ← one .txt file per domain, lists all nodes
│   │   ├── probability_theory.txt
│   │   ├── probability_distributions.txt
│   │   ├── statistical_inference.txt
│   │   ├── regression_and_linear_models.txt
│   │   ├── generalized_linear_models.txt
│   │   └── model_evaluation_and_selection.txt
│   ├── edges/               ← one .json file per relationship type
│   │   ├── instance_of.json
│   │   ├── requires.json
│   │   ├── assumes.json
│   │   ├── uses_distribution.json
│   │   ├── measures.json
│   │   ├── produces.json
│   │   ├── corresponds_to.json
│   │   └── implemented_by.json
│   ├── output/              ← auto-generated, do not edit manually
│   │   ├── statistical_learning_kg.json
│   │   └── kg_visualization.html
│   ├── build.py             ← validates and rebuilds the graph
│   ├── update_site.py       ← syncs data into the React app
│   └── README.md            ← data layer docs
│
└── slkg/                    ← presentation layer (React app)
    ├── src/
    │   ├── data/graph.ts    ← auto-generated by update_site.py
    │   ├── pages/           ← Graph, Explore, Concept, Domains, Path, Compare, Edges
    │   ├── components/      ← KGCanvas, NodePanel, Layout, UI components
    │   ├── lib/             ← graphUtils.ts, constants.ts
    │   └── types/           ← TypeScript type definitions
    ├── package.json
    └── vite.config.ts

kg-project owns the data. slkg owns the presentation. update_site.py is the only bridge between them.

Domains

Domain	Concepts
Probability Theory	34
Probability Distributions	22
Statistical Inference	107
Regression and Linear Models	92
Generalized Linear Models	16
Model Evaluation and Selection	36

These six domains are designed to cover the full scope of statistical learning. New concepts are always added into an existing domain — no new domains should be needed.

Relationship types

Type	Direction	Meaning
`instance_of`	A → B	A is a specific case or member of B
`requires`	A → B	A cannot be defined without B (hard logical dependency)
`assumes`	A → B	A is optimal only when B holds; violating B weakens but doesn't break A
`uses_distribution`	A → B	A operates under or derives from distribution B
`measures`	A → B	A quantifies or diagnoses B
`produces`	A → B	Executing A yields B as direct output
`corresponds_to`	A ↔ B	A and B are structurally dual or symmetric
`implemented_by`	A → B	Abstract concept A is concretely realized through method B

Key distinction — requires vs assumes:

requires: B's absence makes A undefined or incoherent (e.g. MLE requires likelihood)
assumes: B's violation makes A suboptimal but still computable (e.g. OLS assumes homoscedasticity)

Running locally

Prerequisites

Python 3.8+
Node.js 18+

React app (slkg)

cd slkg
npm install
npm run dev
# open http://localhost:5173

Standalone HTML visualization (no build step)

# open directly in a browser; no server needed
# (open is the macOS command; use xdg-open on Linux or start on Windows)
open kg-project/output/kg_visualization.html

Quarterly update workflow

Do this at the start of each new quarter, after covering new material.

Step 1 — Add new nodes

Open the relevant domain file in kg-project/domains/ and append new lines at the bottom (above any # need to audit section if present).

Format:

id | Canonical Name | node_type | structural_role

Rules:

id must be snake_case, all lowercase, unique across all domains
node_type: one of Concept, Method, Model, Conceptual Organizer
structural_role: one of Core, Branch, Subbranch, Leaf

Example (adding to statistical_inference.txt):

variational_inference | Variational Inference | Method | Subbranch
elbo | Evidence Lower Bound | Concept | Leaf
mean_field_approximation | Mean Field Approximation | Method | Leaf

Role guidance:

Core — top-level organising concept for a cluster (use sparingly)
Branch — mid-level grouping concept
Subbranch — intermediate grouping, more specific than Branch
Leaf — concrete concept, method, or model (most new nodes will be Leaf)
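
The format and rules above can be checked mechanically before running build.py. The following is a minimal sketch and not the parser build.py actually uses. The function names are hypothetical, and two behaviours are assumptions: digits are allowed in ids, and any line starting with # is treated as a comment. It checks uniqueness only within the text it is given, so pass the concatenated contents of all domain files to check uniqueness across domains.

```python
import re

NODE_TYPES = {"Concept", "Method", "Model", "Conceptual Organizer"}
ROLES = {"Core", "Branch", "Subbranch", "Leaf"}
# Lowercase snake_case; allowing digits is an assumption of this sketch.
ID_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def parse_node_line(line):
    """Parse 'id | Canonical Name | node_type | structural_role' into a dict."""
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields: {line!r}")
    node_id, name, node_type, role = parts
    if not ID_PATTERN.fullmatch(node_id):
        raise ValueError(f"id must be lowercase snake_case: {node_id!r}")
    if not name:
        raise ValueError(f"canonical name is empty: {line!r}")
    if node_type not in NODE_TYPES:
        raise ValueError(f"unknown node_type: {node_type!r}")
    if role not in ROLES:
        raise ValueError(f"unknown structural_role: {role!r}")
    return {
        "id": node_id,
        "canonical_name": name,
        "node_type": node_type,
        "structural_role": role,
    }


def parse_domain_text(text):
    """Parse a domain file's text, skipping blank and '#' lines; reject duplicate ids."""
    nodes, seen = [], set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        node = parse_node_line(line)
        if node["id"] in seen:
            raise ValueError(f"duplicate id: {node['id']!r}")
        seen.add(node["id"])
        nodes.append(node)
    return nodes


sample = """\
variational_inference | Variational Inference | Method | Subbranch
elbo | Evidence Lower Bound | Concept | Leaf
"""
for node in parse_domain_text(sample):
    print(node["id"], "->", node["structural_role"])
```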

Step 2 — Validate the new nodes

cd kg-project
python build.py --validate

Fix any errors (duplicate ids, invalid formats) before continuing. Warnings can be left for now.

Step 3 — Generate edges with Claude

Open a new conversation with Claude and send this message:

I'm updating my Statistical Learning Knowledge Graph with new nodes.
Please generate edges for the new nodes listed below.

New nodes:
[paste the lines you added in Step 1]

Use the edge schema:
- instance_of: A is a specific case of B
- requires: A cannot be defined without B
- assumes: A needs B for optimality but works without it
- uses_distribution: A uses distribution B
- measures: A quantifies B
- produces: executing A yields B
- corresponds_to: A and B are dual/symmetric
- implemented_by: abstract A is realized by method B

For each edge, provide JSON in this format:
{
  "source": {"id": "...", "canonical_name": "...", "domain": "..."},
  "target": {"id": "...", "canonical_name": "...", "domain": "..."},
  "edge_type": "...",
  "confidence": 0.0–1.0,
  "generated_by": "llm",
  "notes": ""
}

Run Agent 1 (instance_of), Agent 2 (uses_distribution), Agent 3 (requires + assumes),
Agent 4 (measures + produces), Agent 5 (corresponds_to + implemented_by),
then Agent 6 semantic review.

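Claude will return edge JSON grouped by type. Before merging, check the output against the validation rules listed later in this README: instance_of edges must use generated_by auto or human with confidence 1.0 (L1-08), so review any that come back labeled llm and relabel them, and every edge with confidence below 0.85 needs a non-empty note (L1-05).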

Step 4 — Merge new edges

For each edge type that has new edges, open kg-project/edges/<type>.json and append the new edge objects to the "edges" array.

Example — adding to kg-project/edges/requires.json:

{
  "edge_type": "requires",
  "description": "...",
  "edges": [
    ...existing edges...,
    {
      "source": {"id": "variational_inference", "canonical_name": "Variational Inference", "domain": "Statistical Inference"},
      "target": {"id": "posterior_distribution", "canonical_name": "Posterior Distribution", "domain": "Statistical Inference"},
      "edge_type": "requires",
      "confidence": 0.95,
      "generated_by": "llm",
      "notes": "Variational inference approximates the posterior; requires the posterior concept to be defined."
    }
  ]
}
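
If there are many edges to merge, appending by hand gets error-prone. The sketch below shows one way to script the append, assuming each edge file has the top-level "edge_type" and "edges" keys shown in the example. The `merge_edges` function is hypothetical and not part of the repository, and it re-serializes the whole file with two-space indentation, which may differ from the file's existing formatting.

```python
import json
from pathlib import Path


def merge_edges(edge_file, new_edges):
    """Append new_edges to the "edges" array of edge_file, skipping duplicates.

    Returns the number of edges actually added.
    """
    path = Path(edge_file)
    data = json.loads(path.read_text(encoding="utf-8"))
    # Within one file the edge type is fixed, so (source, target) identifies an edge.
    seen = {(e["source"]["id"], e["target"]["id"]) for e in data["edges"]}
    added = 0
    for edge in new_edges:
        if edge["edge_type"] != data["edge_type"]:
            raise ValueError(
                f"{edge['edge_type']!r} edge does not belong in {path.name}"
            )
        key = (edge["source"]["id"], edge["target"]["id"])
        if key in seen:
            continue
        data["edges"].append(edge)
        seen.add(key)
        added += 1
    path.write_text(
        json.dumps(data, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return added
```

A call such as merge_edges("kg-project/edges/requires.json", new_requires_edges) would then replace the manual edit, where new_requires_edges is the list of edge objects Claude returned for that type. Run python build.py --validate afterwards either way.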

Step 5 — Rebuild the graph

cd kg-project
python build.py

If there are errors, fix them. Warnings are acceptable (low-confidence edges, borderline cases).

Expected output:

[1/5] Loading nodes ...   307+ nodes loaded, 0 errors
[2/5] Loading edges ...   497+ edges loaded
[3/5] Validating ...      All checks passed.
[4/5] Building graph ...  output/statistical_learning_kg.json
[5/5] Updating viz ...    output/kg_visualization.html
Build complete.

Step 6 — Sync to the React app

python update_site.py

This rewrites slkg/src/data/graph.ts with the new data.
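
To give a sense of what a sync step like this involves, the sketch below renders a Python dict as the text of a TypeScript module. It is only an illustration of the idea. This README does not show the contents of graph.ts, so the export name `graph`, the header comment, and the `render_graph_ts` function are all placeholders and not what update_site.py actually emits.

```python
import json


def render_graph_ts(graph):
    """Render a graph dict as the source text of a TypeScript module."""
    # JSON is valid object-literal syntax in TypeScript, so it can be embedded as-is.
    body = json.dumps(graph, indent=2, ensure_ascii=False)
    return (
        "// AUTO-GENERATED FILE. Do not edit by hand.\n"
        f"export const graph = {body} as const;\n"
    )


print(render_graph_ts({"nodes": [{"id": "elbo"}], "edges": []}))
```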

Step 7 — Test locally

cd ../slkg
npm run dev
# open http://localhost:5173
# verify new concepts appear in Graph and Explore pages

Step 8 — Deploy

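cd slkg            # skip if you are still in slkg/ from Step 7
npm run build
cd ..              # back to the repository root so kg-project/ changes are staged too
git add .
git commit -m "Q[N] update: add [X] new concepts"
git push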
# Vercel detects the push and auto-deploys

Validation rules

python build.py --validate checks all of these automatically:

Code	Rule
L1-01	No self-loops (source ≠ target)
L1-02	All domain names are valid (one of the six)
L1-03	All node ids exist in domain list
L1-04	Confidence in [0.0, 1.0]
L1-05	Low-confidence edges (< 0.85) must have a note explaining why
L1-06	`generated_by` is one of: `auto`, `llm`, `human`
L1-07	`edge_type` field matches the filename it lives in
L1-08	`instance_of` edges use `generated_by: auto` or `human`, confidence 1.0
L1-09	`uses_distribution` targets must be in Probability Distributions domain
L1-10	No duplicate (source, target, edge_type) triples across all files
L1-11	Same node pair cannot appear in both `requires` and `assumes`
L2-01	`corresponds_to` edges follow alphabetical source < target convention
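
To make a few of these rules concrete, here is a minimal sketch of how L1-01, L1-04, L1-05, L1-06, L1-10, L1-11, and L2-01 could be checked over a flat list of edge objects. It is not the code in build.py. The `check_edges` function is hypothetical, L1-11 is read here as applying to ordered (source, target) pairs, and the rules that need the node list or filenames (L1-02, L1-03, L1-07, L1-08, L1-09) are left out.

```python
from collections import Counter

VALID_GENERATED_BY = {"auto", "llm", "human"}
LOW_CONFIDENCE = 0.85


def check_edges(edges):
    """Return a list of (rule_code, message) findings for a list of edge dicts."""
    findings = []
    for e in edges:
        s, t, kind = e["source"]["id"], e["target"]["id"], e["edge_type"]
        label = f"{s} -> {t} ({kind})"
        conf = e["confidence"]
        if s == t:
            findings.append(("L1-01", f"self-loop: {label}"))
        if not 0.0 <= conf <= 1.0:
            findings.append(("L1-04", f"confidence {conf} out of range: {label}"))
        elif conf < LOW_CONFIDENCE and not e.get("notes", "").strip():
            findings.append(("L1-05", f"low confidence without a note: {label}"))
        if e["generated_by"] not in VALID_GENERATED_BY:
            findings.append(("L1-06", f"bad generated_by: {label}"))
        if kind == "corresponds_to" and s > t:
            findings.append(("L2-01", f"source should sort before target: {label}"))

    # L1-10: the same (source, target, edge_type) triple must not repeat.
    triples = Counter(
        (e["source"]["id"], e["target"]["id"], e["edge_type"]) for e in edges
    )
    for (s, t, kind), count in triples.items():
        if count > 1:
            findings.append(("L1-10", f"{count} copies of {s} -> {t} ({kind})"))

    # L1-11: a pair may be linked by requires or by assumes, not both.
    def pairs(kind):
        return {
            (e["source"]["id"], e["target"]["id"])
            for e in edges
            if e["edge_type"] == kind
        }

    for s, t in sorted(pairs("requires") & pairs("assumes")):
        findings.append(("L1-11", f"{s} -> {t} is in both requires and assumes"))
    return findings
```

Returning findings as data instead of raising on the first problem lets a caller report every issue in one pass and decide separately which codes count as errors and which as warnings.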

Tech stack

Data layer (kg-project)

Python 3 — no dependencies beyond the standard library
build.py — validation, graph construction, visualization update
update_site.py — data sync bridge to React app

Presentation layer (slkg)

React 18 + TypeScript
Vite (build tool)
Tailwind CSS (styling)
D3 v7 (force simulation, loaded via CDN)
React Router v6 (page routing)

Pages

Page	Route	Description
Graph	`/graph`	Interactive force-directed graph, ego (radial) view on double-click
Explore	`/explore`	Browse all concepts with domain/role/type filters
Concept	`/concept/:id`	Full detail page for a single concept — all relationships grouped by type
Domains	`/domains`	Domain overview with node counts, role breakdowns, cross-domain edge stats
Learning Path	`/path`	BFS shortest-path finder between any two concepts
Compare	`/compare`	Side-by-side structural comparison of two concepts
Edge Explorer	`/edges`	Browse all edges by type, sortable and filterable
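
For readers unfamiliar with the BFS shortest-path search behind the Learning Path page, the sketch below shows the general technique in Python. The app itself is written in TypeScript and its implementation is not reproduced here. The `shortest_path` function and the sample edges are hypothetical, and following edges only in their stated direction is an assumption of this sketch.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return the shortest list of node ids from start to goal, or None."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
    if start == goal:
        return [start]
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour in parent:
                continue
            parent[neighbour] = node
            if neighbour == goal:
                # Walk the parent links back to the start, then reverse.
                path = [neighbour]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical (source, target) pairs, for illustration only.
sample = [
    ("logistic_regression", "glm"),
    ("glm", "mle"),
    ("logistic_regression", "mle"),
    ("mle", "likelihood"),
]
print(shortest_path(sample, "logistic_regression", "likelihood"))
# ['logistic_regression', 'mle', 'likelihood']
print(shortest_path(sample, "likelihood", "logistic_regression"))
# None
```

Because BFS visits nodes in order of distance from the start, the first time it reaches the goal it has found a path with the fewest edges.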
