Turn any text into a knowledge graph. Find the gaps. Get AI-grounded insight.
Self-hostable. AGPL. Algorithm parity with the academic state-of-the-art on every benchmark we've measured.
Reading isn't the bottleneck anymore. Synthesizing is.
You have a hundred research notes, a dozen tabs of articles on the same topic, a transcript of an interview, a draft you can't quite finish — and somewhere in that pile is a structure you can't see. Which concepts are central? Which are isolated? What bridges what? What's missing?
Noosphere answers those questions. Feed it text — anything from a paragraph to a corpus — and it returns a concept graph with cluster structure, structural gaps, centrality rankings, and (if you want) LLM-generated advice grounded in the graph itself rather than hallucinated.
Same idea as InfraNodus. Self-hosted. AGPL. No subscription. Real algorithmic rigour underneath.
| Use case | What Noosphere gives you |
|---|---|
| Personal knowledge management | Map your Obsidian/Logseq/Roam notes as a network. See dominant concepts, isolated ideas, bridges between sub-topics. |
| Research / literature review | Drop in 50 paper abstracts, get a cluster map of sub-fields and the gaps where research is sparse. |
| Content & SEO strategy | webSearchVsIntent returns the gap between what users ask and what top-ranking content covers — a content brief in JSON. |
| Discourse analysis | Cluster and bridge structure of interview transcripts, focus groups, and social-media discourse — for qualitative researchers. |
| Writing | Paste your draft. See which concepts are central, which are underdeveloped, where bridges are weak. |
| Teaching | Students map their understanding; teachers see what's covered vs. missing. |
| AI workflows | Wire Noosphere as an MCP server (planned) so Claude/Cursor/Zed can read your notes as a graph and answer "what's the bridge between my project notes about X and Y?" |
- 14 InfraNodus-compatible REST endpoints under `/api/v1/`
- Graph engine — co-occurrence graph from sliding-window text analysis with deterministic UUIDv5 node ids
- Two community-detection algorithms: Louvain + Leiden refinement (default) and Infomap (opt-in, better ground-truth recovery on benchmarks). Multi-resolution sweep, 64 deterministic restarts in parallel.
- Five centralities: betweenness (Brandes), degree, closeness, eigenvector, PageRank
- Structural gap detection between communities (Burt's structural holes)
- Force-directed layout (Fruchterman–Reingold, Barnes–Hut quadtree) for visualization coordinates
- Eight built-in language modules with auto-detection: English, German, French, Spanish, Portuguese, Russian, Japanese, Chinese
- Web search integration via SearXNG — fetch SERP results / search-intent suggestions, build graphs from them, compare them
- LLM advice through any OpenAI-compatible endpoint (OpenAI, LM Studio, Ollama, vLLM, llama.cpp server) — six "lens" modes: develop, reinforce, gaps, latent, imagine, optimize
- Comparison engine — merge / overlap / difference between any number of texts, optionally with AI advice grounded in the comparison
- API key auth + Redis rate limiting with optional Authentik OIDC layered on
- Multilingual — script-aware tokenization handles Latin, Cyrillic, Hiragana/Katakana, and CJK ideographs without whitespace heuristics
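The graph-engine bullet above — sliding-window co-occurrence with deterministic UUIDv5 node ids — can be sketched in a few lines of Python. This is an illustrative sketch, not the C# implementation; the window size and namespace UUID are assumptions:

```python
import uuid
from itertools import combinations
from collections import Counter

# Illustrative namespace; the project's actual namespace UUID is an assumption here.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "noosphere.example")

def cooccurrence_graph(tokens, window=4):
    """Slide a fixed-size window over the token stream; every pair of
    distinct tokens sharing a window gets (or strengthens) an edge."""
    edges = Counter()
    for i in range(len(tokens)):
        for a, b in combinations(sorted(set(tokens[i:i + window])), 2):
            edges[(a, b)] += 1
    # Deterministic node ids: uuid5 is a pure function of (namespace, name),
    # so the same concept maps to the same id on every run and every machine.
    nodes = {t: str(uuid.uuid5(NAMESPACE, t)) for t in tokens}
    return nodes, dict(edges)

nodes, edges = cooccurrence_graph("graphs reveal structure structure hides gaps".split())
```

Determinism is what makes saved graphs comparable across runs: re-analyzing the same text never produces new node identities.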
We test against canonical benchmarks with hand-crafted edge lists, real datasets from Mark Newman's network archive, and ground-truth-labeled graphs:
| Benchmark | Metric | Our value | Literature reference |
|---|---|---|---|
| Zachary's Karate Club | Modularity | 0.4198 | 0.4188 (NetworkX) |
| Karate Club | Communities | 4 | 4 |
| Lusseau Dolphins | Modularity | 0.5277 | ~0.52 |
| NCAA Football | Modularity | 0.6046 | ~0.60 |
| NCAA Football | ARI vs. ground truth (Louvain) | 0.807 | 0.80–0.85 |
| NCAA Football | ARI vs. ground truth (Infomap) | 0.897 | 0.85–0.92 |
We match or exceed published values on every benchmark we've measured.
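The ARI rows in the table use the adjusted Rand index: agreement between detected communities and ground-truth labels, corrected for chance. A minimal stdlib sketch of the metric (not the project's implementation):

```python
from math import comb
from collections import Counter

def adjusted_rand_index(labels_a, labels_b):
    """ARI between two partitions of the same items: 1.0 for identical
    partitions (up to relabeling), about 0.0 for chance-level agreement."""
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # identical up to relabeling: 1.0
```

Because ARI is relabel-invariant, it measures whether the algorithm found the same *groupings* as the ground truth, regardless of what it named the clusters.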
```shell
# Prerequisites: .NET 10 SDK, Docker
git clone https://github.com/CySpiegel/noosphere
cd noosphere
dotnet run --project Noosphere.AppHost
```

The Aspire dashboard opens with health and telemetry for Postgres, Redis, the API, and (when configured) SearXNG. The API is reachable on the URL the dashboard prints.
```shell
curl -X POST http://localhost:<port>/api/v1/graphAndStatements \
  -H "Content-Type: application/json" \
  -d '{"text": "Knowledge graphs organize concepts into clusters. Networks reveal patterns. Bridges expose gaps."}'
```

You get back the full Graphology JSON — nodes with centralities, edges with weights, communities, gaps, and structural metrics.
Noosphere talks to any OpenAI-compatible chat-completions endpoint. Set in appsettings.json:
```json
{
  "Llm": {
    "BaseUrl": "https://api.openai.com",  // or http://localhost:1234 for LM Studio
    "DefaultModel": "gpt-4o",
    "ApiKey": "sk-..."
  }
}
```

Verified live against OpenAI, LM Studio, and Ollama. `MaxTokens = 0` lets the server decide.
Run any SearXNG instance, then set WebSearch:BaseUrl in appsettings.json (or WebSearch__BaseUrl env var). Endpoints 9–14 (/api/v1/import/webSearch*) light up automatically; without the config they stay disabled.
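A minimal `appsettings.json` fragment — the section name comes from the `WebSearch:BaseUrl` key above, but the port is an assumption (use whatever your SearXNG instance listens on):

```json
{
  "WebSearch": {
    "BaseUrl": "http://localhost:8080"
  }
}
```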
All endpoints under /api/v1/:
| Endpoint | What it does |
|---|---|
| `POST /graphAndStatements` | Text → graph + statements + summary |
| `POST /graphAndAdvice` | Text → graph + AI advice |
| `POST /dotGraph` | Graphology JSON → DOT (Graphviz) |
| `POST /dotGraphFromText` | Text → DOT |
| `POST /graphAiAdvice` | Existing graph → AI advice |
| `POST /listGraphs` | List your saved graphs (filterable) |
| `POST /search` | Search statement content across saved graphs → build graph from results |
| `POST /compareGraphs` | Multi-context merge / overlap / difference |
| `POST /graphsAndAiAdvice` | Same comparison + AI advice grounded in the merged graph |
| `POST /import/webSearchResultsGraph` | SERP results → graph |
| `POST /import/webSearchResultsAiAdvice` | SERP results → graph + advice |
| `POST /import/webSearchIntentGraph` | Related-queries / "people also ask" → graph |
| `POST /import/webSearchIntentAiAdvice` | Intent → graph + advice |
| `POST /import/webSearchVsIntentGraph` | Content vs. demand gap — what users want that content doesn't cover |
| `POST /import/webSearchVsIntentAiAdvice` | Same + AI advice |
Response shape is the standard Graphology JSON wrapped in InfraNodus-compatible envelope keys.
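For orientation, a trimmed, illustrative response. The node/edge layout follows Graphology's serialized format; the envelope key names here are assumptions — the pinned schema-conformance tests are the source of truth:

```json
{
  "graph": {
    "nodes": [
      { "key": "…uuid…", "attributes": { "label": "knowledge", "community": 0, "betweenness": 0.42 } }
    ],
    "edges": [
      { "source": "…uuid…", "target": "…uuid…", "attributes": { "weight": 2 } }
    ]
  },
  "statements": ["Knowledge graphs organize concepts into clusters."],
  "summary": "…"
}
```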
| Concern | Self-host (Noosphere) | Hosted alternatives |
|---|---|---|
| Cost | Free | $14–$45 / month |
| Privacy | Your text never leaves your machine | Sent to a third party |
| LLM choice | Any OpenAI-compatible endpoint, including local | Their LLM, their pricing |
| Customization | Source-available, AGPL — extend or fork | Closed |
| Search backend | Self-hosted SearXNG (no API key) | Google API key required |
| Multi-tenancy | API keys + per-user rate limits built in | Per-seat pricing |
- MCP server — expose every endpoint as MCP tools so Claude/Cursor/Zed can read your notes as a knowledge graph (Phase 5)
- React frontend — interactive Sigma.js graph canvas with cluster/gap/statement panels (Phase 5)
- Obsidian plugin — analyze the active note or your whole vault (planned, separate repo)
- Docker production stack + Traefik + EF migrations + Authentik OIDC (Phase 6)
- Statement-aware community detection prior — bias clusters by which sentences tokens co-occur in (Phase 6)
- Overlapping communities — concepts that legitimately belong to multiple clusters (Phase 6)
See docs/phases/ for the full roadmap and what's done.
| Layer | Tech |
|---|---|
| Runtime | .NET 10 |
| Orchestration | .NET Aspire 13 |
| API | ASP.NET Core Minimal API |
| Database | PostgreSQL 17 + EF Core 10 |
| Cache / rate limiting | Redis 7 |
| LLM | Any OpenAI-compatible endpoint |
| Web search | SearXNG (optional) |
| Frontend (planned) | React 19 + TypeScript + Sigma.js v3 |
Zero external NLP dependencies. Tokenization, stopwords, lemmatization, POS tagging — all pure C# with pluggable language modules. No Python sidecar, no spaCy, no NLTK.
Zero external graph library. TextGraph is a custom adjacency-list with O(1) lookups. No Neo4j, no JGraphT.
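The dict-of-dicts idea behind an adjacency list with O(1) lookups, sketched in Python (the real `TextGraph` is C#; the class and method names here are illustrative):

```python
from collections import defaultdict

class AdjacencyGraph:
    """Undirected weighted graph: a dict of dicts gives O(1) average-case
    node lookup, edge-existence checks, and weight access."""
    def __init__(self):
        self.adj = defaultdict(dict)   # node -> {neighbor: weight}

    def add_edge(self, u, v, weight=1):
        # Store both directions so lookups are symmetric.
        self.adj[u][v] = self.adj[u].get(v, 0) + weight
        self.adj[v][u] = self.adj[v].get(u, 0) + weight

    def has_edge(self, u, v):          # O(1) average
        return v in self.adj[u]

    def weight(self, u, v):
        return self.adj[u].get(v, 0)

    def degree(self, u):
        return len(self.adj[u])

g = AdjacencyGraph()
g.add_edge("graph", "cluster")
g.add_edge("graph", "gap", weight=2)
```

Hash-map adjacency trades the cache locality of an adjacency matrix for O(1) edge checks on sparse graphs, which is the common case for text co-occurrence networks.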
547 non-live tests + 9 live integration tests against real Postgres, Redis, OpenAI-compatible LLMs, and SearXNG. CI-friendly: live tests skip silently when their backends are unreachable.
| Suite | Tests | Verifies |
|---|---|---|
| Algorithm correctness | Karate / Florentine / Krackhardt | Match academic literature on canonical small graphs |
| Ground-truth parity | Dolphins / NCAA Football | Match published modularity + ARI on real datasets |
| Self-consistency invariants | Multiple | Determinism, conservation laws, comparison-engine identities |
| Schema conformance | Per endpoint | InfraNodus-compatible JSON shape pinned |
| Property-based | Erdős–Rényi, Barabási–Albert, two-cliques | Universal graph properties hold for randomly-generated input |
| NLP edge cases | 22 | Emoji, RTL, mixed scripts, very long sentences, hashtags |
| Full stack E2E | API + Postgres + Redis + fake LLM | /listGraphs, /search, saved-graph compare end-to-end |
| Live LLM | OpenAI-compatible | Real API calls against your configured endpoint |
| Live web search | SearXNG | Real SERPs + intent extraction |
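The two-cliques property-test idea — the endpoints of the bridge edge must carry the highest betweenness — can be reproduced with a compact Brandes sketch. This is an illustrative Python version of the standard algorithm, not the project's C# code:

```python
from collections import deque

def brandes_betweenness(adj):
    """Unweighted betweenness centrality (Brandes 2001) for an
    undirected graph given as {node: [neighbors]}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1   # shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:                                    # BFS from s
            v = q.popleft(); stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                                # dependency accumulation
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    for v in bc:                                    # undirected: pairs counted twice
        bc[v] /= 2
    return bc

# Two 3-cliques joined by the single bridge edge 2-3.
two_cliques = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
bc = brandes_betweenness(two_cliques)
# Nodes 2 and 3 carry every inter-clique shortest path; all others carry none.
```

This is the shape of the property-based suite: build a graph whose correct answer is known by construction, then assert the invariant rather than a magic number.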
- docs/ARCHITECTURE.md — full system design
- docs/CODING_STANDARDS.md — required style and conventions
- docs/phases/ — phased implementation roadmap with completion status
- CLAUDE.md — concise guide for AI assistants working in this repo
- docs/API_REFERENCE.md (planned, Phase 6)
- docs/ALGORITHMS.md (planned, Phase 6)
- docs/DEPLOYMENT.md (planned, Phase 6)
AGPL-3.0. Use it, fork it, run it on your own infrastructure. Network use counts as distribution — if you host a public instance, your modifications must be source-available too.
Issues and PRs welcome. The codebase is structured around small, well-tested slices — see docs/CODING_STANDARDS.md for what "well-tested" means here (every algorithmic claim is pinned by a test against a published reference value).