Skip to content

CySpiegel/Noosphere

Repository files navigation

Noosphere

Turn any text into a knowledge graph. Find the gaps. Get AI-grounded insight.

Self-hostable. AGPL. Algorithm parity with the academic state-of-the-art on every benchmark we've measured.


Why it exists

Reading isn't the bottleneck anymore. Synthesizing is.

You have a hundred research notes, a dozen tabs of articles on the same topic, a transcript of an interview, a draft you can't quite finish — and somewhere in that pile is a structure you can't see. Which concepts are central? Which are isolated? What bridges what? What's missing?

Noosphere answers those questions. Feed it text — anything from a paragraph to a corpus — and it returns a concept graph with cluster structure, structural gaps, centrality rankings, and (if you want) LLM-generated advice grounded in the graph itself rather than hallucinated.

Same idea as InfraNodus. Self-hosted. AGPL. No subscription. Real algorithm rigour underneath.


What you can do with it

Use case What Noosphere gives you
Personal knowledge management Map your Obsidian/Logseq/Roam notes as a network. See dominant concepts, isolated ideas, bridges between sub-topics.
Research / literature review Drop in 50 paper abstracts, get a cluster map of sub-fields and the gaps where research is sparse.
Content & SEO strategy webSearchVsIntent returns the gap between what users ask and what top-ranking content covers — a content brief in JSON.
Discourse analysis Qualitative researchers analyzing transcripts, focus groups, social media discourse.
Writing Paste your draft. See which concepts are central, which are underdeveloped, where bridges are weak.
Teaching Students map their understanding; teachers see what's covered vs. missing.
AI workflows Wire Noosphere as an MCP server (planned) so Claude/Cursor/Zed can read your notes as a graph and answer "what's the bridge between my project notes about X and Y?"

What's in the box (today)

  • 14 InfraNodus-compatible REST endpoints under /api/v1/
  • Graph engine — co-occurrence graph from sliding-window text analysis with deterministic UUIDv5 node ids
  • Two community-detection algorithms: Louvain + Leiden refinement (default) and Infomap (opt-in, better ground-truth recovery on benchmarks). Multi-resolution sweep, 64 deterministic restarts in parallel.
  • Five centralities: betweenness (Brandes), degree, closeness, eigenvector, PageRank
  • Structural gap detection between communities (Burt's structural holes)
  • Force-directed layout (Fruchterman–Reingold, Barnes–Hut quadtree) for visualization coordinates
  • Eight built-in language modules with auto-detection: English, German, French, Spanish, Portuguese, Russian, Japanese, Chinese
  • Web search integration via SearXNG — fetch SERP results / search-intent suggestions, build graphs from them, compare them
  • LLM advice through any OpenAI-compatible endpoint (OpenAI, LM Studio, Ollama, vLLM, llama.cpp server) — six "lens" modes: develop, reinforce, gaps, latent, imagine, optimize
  • Comparison engine — merge / overlap / difference between any number of texts, optionally with AI advice grounded in the comparison
  • API key auth + Redis rate limiting with optional Authentik OIDC layered on
  • Multilingual — script-aware tokenization handles Latin, Cyrillic, Hiragana/Katakana, and CJK ideographs without whitespace heuristics

Algorithm quality vs. literature

We test against canonical benchmarks with hand-crafted edge lists, real datasets from Mark Newman's network archive, and ground-truth-labeled graphs:

Benchmark Metric Our value Literature reference
Zachary's Karate Club Modularity 0.4198 0.4188 (NetworkX)
Karate Club Communities 4 4
Lusseau Dolphins Modularity 0.5277 ~0.52
NCAA Football Modularity 0.6046 ~0.60
NCAA Football ARI vs. ground truth (Louvain) 0.807 0.80–0.85
NCAA Football ARI vs. ground truth (Infomap) 0.897 0.85–0.92

We match or exceed published values on every benchmark we've measured.


Getting started

Quickstart (Aspire dashboard, full stack)

# Prerequisites: .NET 10 SDK, Docker
git clone https://github.com/CySpiegel/noosphere
cd noosphere
dotnet run --project Noosphere.AppHost

The Aspire dashboard opens with health and telemetry for Postgres, Redis, the API, and (when configured) SearXNG. The API is reachable on the URL the dashboard prints.

First request

curl -X POST http://localhost:<port>/api/v1/graphAndStatements \
  -H "Content-Type: application/json" \
  -d '{"text": "Knowledge graphs organize concepts into clusters. Networks reveal patterns. Bridges expose gaps."}'

You get back the full Graphology JSON — nodes with centralities, edges with weights, communities, gaps, and structural metrics.

Pointing it at your LLM

Noosphere talks to any OpenAI-compatible chat-completions endpoint. Set in appsettings.json:

{
  "Llm": {
    "BaseUrl": "https://api.openai.com",   // or http://localhost:1234 for LM Studio
    "DefaultModel": "gpt-4o",
    "ApiKey": "sk-..."
  }
}

Verified live against OpenAI, LM Studio, and Ollama. MaxTokens = 0 lets the server decide.

Adding web search

Run any SearXNG instance, then set WebSearch:BaseUrl in appsettings.json (or WebSearch__BaseUrl env var). Endpoints 9–14 (/api/v1/import/webSearch*) light up automatically; without the config they stay disabled.


API at a glance

All endpoints under /api/v1/:

Endpoint What it does
POST /graphAndStatements Text → graph + statements + summary
POST /graphAndAdvice Text → graph + AI advice
POST /dotGraph Graphology JSON → DOT (Graphviz)
POST /dotGraphFromText Text → DOT
POST /graphAiAdvice Existing graph → AI advice
POST /listGraphs List your saved graphs (filterable)
POST /search Search statement content across saved graphs → build graph from results
POST /compareGraphs Multi-context merge / overlap / difference
POST /graphsAndAiAdvice Same comparison + AI advice grounded in the merged graph
POST /import/webSearchResultsGraph SERP results → graph
POST /import/webSearchResultsAiAdvice SERP results → graph + advice
POST /import/webSearchIntentGraph Related-queries / "people also ask" → graph
POST /import/webSearchIntentAiAdvice Intent → graph + advice
POST /import/webSearchVsIntentGraph Content vs. demand gap — what users want that content doesn't cover
POST /import/webSearchVsIntentAiAdvice Same + AI advice

Response shape is the standard Graphology JSON wrapped in InfraNodus-compatible envelope keys.


Why self-host

Concern Self-host (Noosphere) Hosted alternatives
Cost Free $14–$45 / month
Privacy Your text never leaves your machine Sent to a third party
LLM choice Any OpenAI-compatible endpoint, including local Their LLM, their pricing
Customization Source-available, AGPL — extend or fork Closed
Search backend Self-hosted SearXNG (no API key) Google API key required
Multi-tenancy API keys + per-user rate limits built in Per-seat pricing

What's coming

  • MCP server — expose every endpoint as MCP tools so Claude/Cursor/Zed can read your notes as a knowledge graph (Phase 5)
  • React frontend — interactive Sigma.js graph canvas with cluster/gap/statement panels (Phase 5)
  • Obsidian plugin — analyze the active note or your whole vault (planned, separate repo)
  • Docker production stack + Traefik + EF migrations + Authentik OIDC (Phase 6)
  • Statement-aware community detection prior — bias clusters by which sentences tokens co-occur in (Phase 6)
  • Overlapping communities — concepts that legitimately belong to multiple clusters (Phase 6)

See docs/phases/ for the full roadmap and what's done.


Tech stack

Layer Tech
Runtime .NET 10
Orchestration .NET Aspire 13
API ASP.NET Core Minimal API
Database PostgreSQL 17 + EF Core 10
Cache / rate limiting Redis 7
LLM Any OpenAI-compatible endpoint
Web search SearXNG (optional)
Frontend (planned) React 19 + TypeScript + Sigma.js v3

Zero external NLP dependencies. Tokenization, stopwords, lemmatization, POS tagging — all pure C# with pluggable language modules. No Python sidecar, no spaCy, no NLTK.

Zero external graph library. TextGraph is a custom adjacency-list with O(1) lookups. No Neo4j, no JGraphT.


Test coverage

547 non-live tests + 9 live integration tests against real Postgres, Redis, OpenAI-compatible LLMs, and SearXNG. CI-friendly: live tests skip silently when their backends are unreachable.

Suite Tests Verifies
Algorithm correctness Karate / Florentine / Krackhardt Match academic literature on canonical small graphs
Ground-truth parity Dolphins / NCAA Football Match published modularity + ARI on real datasets
Self-consistency invariants Multiple Determinism, conservation laws, comparison-engine identities
Schema conformance Per endpoint InfraNodus-compatible JSON shape pinned
Property-based Erdős–Rényi, Barabási–Albert, two-cliques Universal graph properties hold for randomly-generated input
NLP edge cases 22 Emoji, RTL, mixed scripts, very long sentences, hashtags
Full stack E2E API + Postgres + Redis + fake LLM /listGraphs, /search, saved-graph compare end-to-end
Live LLM OpenAI-compatible Real API calls against your configured endpoint
Live web search SearXNG Real SERPs + intent extraction

Documentation

  • docs/ARCHITECTURE.md — full system design
  • docs/CODING_STANDARDS.md — required style and conventions
  • docs/phases/ — phased implementation roadmap with completion status
  • CLAUDE.md — concise guide for AI assistants working in this repo
  • docs/API_REFERENCE.md (planned, Phase 6)
  • docs/ALGORITHMS.md (planned, Phase 6)
  • docs/DEPLOYMENT.md (planned, Phase 6)

License

AGPL-3.0. Use it, fork it, run it on your own infrastructure. Network use counts as distribution — if you host a public instance, your modifications must be source-available too.


Contributing

Issues and PRs welcome. The codebase is structured around small, well-tested slices — see docs/CODING_STANDARDS.md for what "well-tested" means here (every algorithmic claim is pinned by a test against a published reference value).

About

Turn any text into a knowledge graph. Find the conceptual gaps. Get AI-grounded insight. Self-hostable, AGPL, drop-in InfraNodus-compatible REST API. Louvain+Leiden+Infomap community detection at academic literature parity. 8 languages, web-search integration, OpenAI-compatible LLM. .NET 10 + Postgres.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors