An interactive knowledge graph covering statistical learning concepts, methods, and models — and the relationships between them.
307 concepts · 497 relationships · 6 domains · 8 relationship types
This project maps the conceptual structure of statistical learning as a directed graph. Every concept (OLS, MLE, confidence intervals, logistic regression...) is a node. Every meaningful relationship between concepts (A requires B, A assumes B, A produces B...) is a typed, directed edge.
The result is a navigable knowledge structure you can explore by concept, by domain, or by relationship type — and update every quarter as new material is covered.
/
├── kg-project/  ← data layer (source of truth)
│   ├── domains/  ← one .txt file per domain, lists all nodes
│   │   ├── probability_theory.txt
│   │   ├── probability_distributions.txt
│   │   ├── statistical_inference.txt
│   │   ├── regression_and_linear_models.txt
│   │   ├── generalized_linear_models.txt
│   │   └── model_evaluation_and_selection.txt
│   ├── edges/  ← one .json file per relationship type
│   │   ├── instance_of.json
│   │   ├── requires.json
│   │   ├── assumes.json
│   │   ├── uses_distribution.json
│   │   ├── measures.json
│   │   ├── produces.json
│   │   ├── corresponds_to.json
│   │   └── implemented_by.json
│   ├── output/  ← auto-generated, do not edit manually
│   │   ├── statistical_learning_kg.json
│   │   └── kg_visualization.html
│   ├── build.py  ← validates and rebuilds the graph
│   ├── update_site.py  ← syncs data into the React app
│   └── README.md  ← data layer docs
│
└── slkg/  ← presentation layer (React app)
    ├── src/
    │   ├── data/graph.ts  ← auto-generated by update_site.py
    │   ├── pages/  ← Graph, Explore, Concept, Domains, Path, Compare, Edges
    │   ├── components/  ← KGCanvas, NodePanel, Layout, UI components
    │   ├── lib/  ← graphUtils.ts, constants.ts
    │   └── types/  ← TypeScript type definitions
    ├── package.json
    └── vite.config.ts
kg-project owns the data. slkg owns the presentation. update_site.py is the only bridge between them.
| Domain | Concepts |
|---|---|
| Probability Theory | 34 |
| Probability Distributions | 22 |
| Statistical Inference | 107 |
| Regression and Linear Models | 92 |
| Generalized Linear Models | 16 |
| Model Evaluation and Selection | 36 |
These six domains are designed to cover the full scope of statistical learning. New concepts are always added into an existing domain — no new domains should be needed.
| Type | Direction | Meaning |
|---|---|---|
| instance_of | A → B | A is a specific case or member of B |
| requires | A → B | A cannot be defined without B (hard logical dependency) |
| assumes | A → B | A is optimal only when B holds; violating B weakens but doesn't break A |
| uses_distribution | A → B | A operates under or derives from distribution B |
| measures | A → B | A quantifies or diagnoses B |
| produces | A → B | Executing A yields B as direct output |
| corresponds_to | A ↔ B | A and B are structurally dual or symmetric |
| implemented_by | A → B | Abstract concept A is concretely realized through method B |
Key distinction — requires vs assumes:
- requires: B's absence makes A undefined or incoherent (e.g. MLE requires likelihood)
- assumes: B's violation makes A suboptimal but still computable (e.g. OLS assumes homoscedasticity)
- Python 3.8+
- Node.js 18+
cd slkg
npm install
npm run dev
# open http://localhost:5173

# open directly in browser, no server needed
open kg-project/output/kg_visualization.html

Do this at the start of each new quarter, after covering new material.
Open the relevant domain file in kg-project/domains/ and append new lines at the bottom (above any # need to audit section if present).
Format:
id | Canonical Name | node_type | structural_role
Rules:
- id must be snake_case, all lowercase, unique across all domains
- node_type: one of Concept, Method, Model, Conceptual Organizer
- structural_role: one of Core, Branch, Subbranch, Leaf
Example (adding to statistical_inference.txt):
variational_inference | Variational Inference | Method | Subbranch
elbo | Evidence Lower Bound | Concept | Leaf
mean_field_approximation | Mean Field Approximation | Method | Leaf
Role guidance:
- Core: top-level organising concept for a cluster (use sparingly)
- Branch: mid-level grouping concept
- Subbranch: intermediate grouping, more specific than Branch
- Leaf: concrete concept, method, or model (most new nodes will be Leaf)
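
The line format and rules above can be checked before running the build. The following is a minimal sketch, not the logic in build.py: the `Node` tuple, the `parse_node_line` helper, and the exact snake_case regular expression are illustrative assumptions. It does not check uniqueness across domains, which requires reading every domain file.

```python
import re
from typing import NamedTuple

# Allowed values, copied from the rules above.
NODE_TYPES = {"Concept", "Method", "Model", "Conceptual Organizer"}
STRUCTURAL_ROLES = {"Core", "Branch", "Subbranch", "Leaf"}

# Assumed interpretation of snake_case: lowercase words joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


class Node(NamedTuple):
    id: str
    canonical_name: str
    node_type: str
    structural_role: str


def parse_node_line(line: str) -> Node:
    """Parse one 'id | Canonical Name | node_type | structural_role' line."""
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    node = Node(*parts)
    if not SNAKE_CASE.match(node.id):
        raise ValueError(f"id is not snake_case: {node.id!r}")
    if node.node_type not in NODE_TYPES:
        raise ValueError(f"unknown node_type: {node.node_type!r}")
    if node.structural_role not in STRUCTURAL_ROLES:
        raise ValueError(f"unknown structural_role: {node.structural_role!r}")
    return node


if __name__ == "__main__":
    print(parse_node_line("elbo | Evidence Lower Bound | Concept | Leaf"))
```
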
cd kg-project
python build.py --validate

Fix any errors (duplicate ids, invalid formats) before continuing. Warnings can be left for now.
Open a new conversation with Claude and send this message:
I'm updating my Statistical Learning Knowledge Graph with new nodes.
Please generate edges for the new nodes listed below.
New nodes:
[paste the lines you added in Step 1]
Use the edge schema:
- instance_of: A is a specific case of B
- requires: A cannot be defined without B
- assumes: A needs B for optimality but works without it
- uses_distribution: A uses distribution B
- measures: A quantifies B
- produces: executing A yields B
- corresponds_to: A and B are dual/symmetric
- implemented_by: abstract A is realized by method B
For each edge, provide JSON in this format:
{
"source": {"id": "...", "canonical_name": "...", "domain": "..."},
"target": {"id": "...", "canonical_name": "...", "domain": "..."},
"edge_type": "...",
"confidence": 0.0–1.0,
"generated_by": "llm",
"notes": ""
}
Run Agent 1 (instance_of), Agent 2 (uses_distribution), Agent 3 (requires + assumes),
Agent 4 (measures + produces), Agent 5 (corresponds_to + implemented_by),
then Agent 6 semantic review.
Claude will return edge JSON grouped by type.
For each edge type that has new edges, open kg-project/edges/<type>.json and append the new edge objects to the "edges" array.
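
If appending by hand becomes tedious, the step can be scripted. This is a hedged sketch that assumes each edge file is a JSON object with top-level "edge_type" and "edges" keys; `append_edges` is a hypothetical helper, not part of build.py or update_site.py.

```python
import json
from pathlib import Path


def append_edges(edge_file: Path, new_edges: list) -> int:
    """Append edge objects to the "edges" array of one edge file.

    Returns the total number of edges after the append.
    """
    data = json.loads(edge_file.read_text(encoding="utf-8"))
    expected_type = data["edge_type"]
    # Refuse edges whose type does not match the file they are going into.
    for edge in new_edges:
        if edge["edge_type"] != expected_type:
            raise ValueError(
                f"{edge_file.name} holds {expected_type!r} edges, "
                f"got {edge['edge_type']!r}"
            )
    data["edges"].extend(new_edges)
    edge_file.write_text(
        json.dumps(data, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return len(data["edges"])
```

The type check mirrors the validation rule that an edge's edge_type must match the file it lives in, so a misfiled edge fails here rather than at build time.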
Example — adding to kg-project/edges/requires.json:
{
"edge_type": "requires",
"description": "...",
"edges": [
...existing edges...,
{
"source": {"id": "variational_inference", "canonical_name": "Variational Inference", "domain": "Statistical Inference"},
"target": {"id": "posterior_distribution", "canonical_name": "Posterior Distribution", "domain": "Statistical Inference"},
"edge_type": "requires",
"confidence": 0.95,
"generated_by": "llm",
"notes": "Variational inference approximates the posterior; requires the posterior concept to be defined."
}
]
}

cd kg-project
python build.py

If there are errors, fix them. Warnings are acceptable (low-confidence edges, borderline cases).
Expected output:
[1/5] Loading nodes ... 307+ nodes loaded, 0 errors
[2/5] Loading edges ... 497+ edges loaded
[3/5] Validating ... All checks passed.
[4/5] Building graph ... output/statistical_learning_kg.json
[5/5] Updating viz ... output/kg_visualization.html
Build complete.
python update_site.py

This rewrites slkg/src/data/graph.ts with the new data.
cd ../slkg
npm run dev
# open http://localhost:5173
# verify new concepts appear in Graph and Explore pages

cd slkg
npm run build
git add .
git commit -m "Q[N] update: add [X] new concepts"
git push
# Vercel detects the push and auto-deploys

python build.py --validate checks all of these automatically:
| Code | Rule |
|---|---|
| L1-01 | No self-loops (source ≠ target) |
| L1-02 | All domain names are valid (one of the six) |
| L1-03 | All node ids exist in domain list |
| L1-04 | Confidence in [0.0, 1.0] |
| L1-05 | Low-confidence edges (< 0.85) must have a note explaining why |
| L1-06 | generated_by is one of: auto, llm, human |
| L1-07 | edge_type field matches the filename it lives in |
| L1-08 | instance_of edges use generated_by: auto or human, confidence 1.0 |
| L1-09 | uses_distribution targets must be in Probability Distributions domain |
| L1-10 | No duplicate (source, target, edge_type) triples across all files |
| L1-11 | Same node pair cannot appear in both requires and assumes |
| L2-01 | corresponds_to edges follow alphabetical source < target convention |
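
To make a few of these rules concrete, here is a minimal sketch of how L1-01, L1-04, L1-05, L1-06, L1-10, L1-11 and L2-01 could be checked over a flat list of edge objects. It is an illustration, not the code in build.py: `check_edges` is a hypothetical name, L1-11 is assumed to apply to ordered (source, target) pairs, and L2-01 is assumed to compare node ids rather than canonical names.

```python
VALID_GENERATED_BY = {"auto", "llm", "human"}


def check_edges(edges: list) -> list:
    """Return (rule_code, message) tuples for a subset of the rules above."""
    problems = []
    seen_triples = set()
    pairs_by_type = {"requires": set(), "assumes": set()}

    for edge in edges:
        src = edge["source"]["id"]
        tgt = edge["target"]["id"]
        etype = edge["edge_type"]
        label = f"{src} -[{etype}]-> {tgt}"

        if src == tgt:
            problems.append(("L1-01", f"self-loop: {label}"))

        confidence = edge["confidence"]
        if not 0.0 <= confidence <= 1.0:
            problems.append(("L1-04", f"confidence out of range: {label}"))
        elif confidence < 0.85 and not edge.get("notes", "").strip():
            problems.append(("L1-05", f"low confidence without a note: {label}"))

        if edge["generated_by"] not in VALID_GENERATED_BY:
            problems.append(("L1-06", f"invalid generated_by: {label}"))

        triple = (src, tgt, etype)
        if triple in seen_triples:
            problems.append(("L1-10", f"duplicate edge: {label}"))
        seen_triples.add(triple)

        # Assumption: the requires/assumes conflict is about the ordered pair.
        if etype in pairs_by_type:
            other = "assumes" if etype == "requires" else "requires"
            if (src, tgt) in pairs_by_type[other]:
                problems.append(("L1-11", f"pair also appears in {other}: {label}"))
            pairs_by_type[etype].add((src, tgt))

        # Assumption: the alphabetical convention compares node ids.
        if etype == "corresponds_to" and not src < tgt:
            problems.append(("L2-01", f"source should sort before target: {label}"))

    return problems
```

Rules that need the node list or the file name (L1-02, L1-03, L1-07, L1-09) are left out because they depend on data this function does not receive.
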
Data layer (kg-project)
- Python 3 — no dependencies beyond the standard library
- build.py: validation, graph construction, visualization update
- update_site.py: data sync bridge to React app
Presentation layer (slkg)
- React 18 + TypeScript
- Vite (build tool)
- Tailwind CSS (styling)
- D3 v7 (force simulation, loaded via CDN)
- React Router v6 (page routing)
| Page | Route | Description |
|---|---|---|
| Graph | /graph | Interactive force-directed graph, ego (radial) view on double-click |
| Explore | /explore | Browse all concepts with domain/role/type filters |
| Concept | /concept/:id | Full detail page for a single concept, with all relationships grouped by type |
| Domains | /domains | Domain overview with node counts, role breakdowns, cross-domain edge stats |
| Learning Path | /path | BFS shortest-path finder between any two concepts |
| Compare | /compare | Side-by-side structural comparison of two concepts |
| Edge Explorer | /edges | Browse all edges by type, sortable and filterable |
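
For readers unfamiliar with the technique behind the Learning Path page, the sketch below shows a breadth-first shortest-path search over (source, target) id pairs. It is written in Python for illustration only and is not the app's TypeScript implementation; `shortest_path` is a hypothetical name, and following edges only in their stated direction is an assumption about how the page treats the graph.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return the shortest list of node ids from start to goal, or None."""
    # Build a directed adjacency list from (source, target) pairs.
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    if start == goal:
        return [start]

    # parents doubles as the visited set and as the back-pointer map.
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour in parents:
                continue
            parents[neighbour] = node
            if neighbour == goal:
                # Walk the back-pointers to reconstruct the path.
                path = []
                current = goal
                while current is not None:
                    path.append(current)
                    current = parents[current]
                return path[::-1]
            queue.append(neighbour)
    return None
```

Because BFS visits nodes in order of hop count, the first time the goal is reached the path found has the fewest edges. An undirected variant would add each pair to the adjacency list in both directions.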