Stop guessing. Start decomposing. ThinkGraph is structured decomposition for LLM prompts: it breaks complex questions into a dependency graph of atomic facts, resolves them sequentially, and synthesizes a grounded answer.
When you ask an LLM a complex question, it tries to answer the whole thing at once: hallucinating details, missing constraints, and guessing facts it should verify first.
"Compare React and Vue for enterprise SSR dashboard"
-> LLM guesses React is better without checking SSR maturity,
enterprise adoption, team size trade-offs, or bundle size.
ThinkGraph intercepts the prompt and forces structured thinking before answering:
LLM first resolves: "What is React's SSR maturity?" + "What is Vue's SSR maturity?"
+ "What are enterprise adoption rates?" + "Which fits a 10-person team?"
Then synthesizes from verified facts only.
Result: 50%+ accuracy improvement on multi-hop prompts.
Your Prompt -> TRIAGE -> DECOMPOSE -> RESOLVE -> SELF-CONSISTENCY VOTE (if enabled) -> SYNTHESIZE

- TRIAGE: Is this complex? Skip if trivial.
- DECOMPOSE: Emit DAG of atomic sub-Qs with deps.
- RESOLVE: Answer each sub-Q in topo order. Web search.
- SELF-CONSISTENCY VOTE (if enabled): Multiple attempts -> centroid.
- SYNTHESIZE: Build answer from verified facts only.
| Stage | What it does |
|---|---|
| Triage | Classify prompt: trivial / single-hop / multi-hop / planning / creative |
| Decompose | Break into atomic sub-questions with explicit dependency DAG |
| Resolve | Answer each node in topological order, with caching |
| Synthesize | Build final answer from verified facts only |
| Present | Answer with uncertainty notes if any fact was low-confidence |
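
The Decompose and Resolve stages can be illustrated with a small sketch. It assumes a hypothetical DAG format (a dict mapping each node id to the ids it depends on) and a caller-supplied `answer_fn`; neither is taken from ThinkGraph's actual schema. It groups nodes into execution batches with Kahn's algorithm and resolves them in topological order, reusing cached facts.

```python
def execution_batches(deps):
    """Group DAG nodes into batches; each batch depends only on earlier ones.

    `deps` maps a node id to the list of node ids it depends on
    (a hypothetical schema used only for this sketch).
    """
    indegree = {node: len(parents) for node, parents in deps.items()}
    children = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            if parent not in deps:
                raise ValueError(f"unknown dependency: {parent}")
            children[parent].append(node)

    ready = sorted(n for n, d in indegree.items() if d == 0)
    batches, done = [], 0
    while ready:
        batches.append(ready)
        done += len(ready)
        next_ready = []
        for node in ready:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)
    if done != len(deps):
        raise ValueError("cycle detected: not a DAG")
    return batches


def resolve_all(deps, answer_fn, cache=None):
    """Resolve every node in topological order, reusing cached facts."""
    cache = {} if cache is None else cache
    facts = {}
    for batch in execution_batches(deps):
        for node in batch:
            if node not in cache:
                # Parents are always resolved first, so their facts exist.
                parent_facts = {p: facts[p] for p in deps[node]}
                cache[node] = answer_fn(node, parent_facts)
            facts[node] = cache[node]
    return facts
```

Nodes inside one batch have no dependencies on each other, so a real resolver could answer them in parallel; this sketch keeps them sequential for clarity.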
Run 2-3 synthesis attempts and vote on the most consistent one via Jaccard centroid. The vote itself is plain text comparison, so it catches hallucinations without extra LLM calls.
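
One way to read "Jaccard centroid" is: pick the response whose word set is, on average, most similar to all the others. The sketch below implements that reading; the tokenizer and the tie-breaking rule are assumptions, and the result keys simply mirror the CLI output shown in this README rather than the real implementation.

```python
import re


def tokenize(text):
    """Lowercase word set used for the Jaccard comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Size of the intersection over size of the union; two empty sets match."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def vote(responses):
    """Return the response with the highest mean similarity to the others."""
    if not responses:
        raise ValueError("need at least one response")
    token_sets = [tokenize(r) for r in responses]
    best_index, best_score = 0, -1.0
    for i, current in enumerate(token_sets):
        sims = [jaccard(current, other)
                for j, other in enumerate(token_sets) if j != i]
        score = sum(sims) / len(sims) if sims else 1.0
        if score > best_score:  # ties keep the earliest response
            best_index, best_score = i, score
    return {
        "winner": responses[best_index],
        "score": best_score,
        "response_count": len(responses),
    }
```

An answer that contains details the other attempts do not share ends up far from the centroid and loses the vote, which is how outliers get filtered.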
python thinkgraph.py vote "answer variant 1" "answer variant 2" "answer variant 3"
# {"winner": "...", "score": 0.72, "response_count": 3}

Auto-search for low-confidence facts. No API key needed: pure HTTP + HTML parsing.
python thinkgraph.py web-search "React 19 streaming SSR benchmark" --num-results 5

Compress long context before feeding synthesis. Keeps the most important sentences by TF-IDF weight.
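
A minimal sketch of TF-IDF sentence extraction follows. It treats each sentence as a document, scores a sentence by the average weight of its words, and keeps the top share in original order. The sentence splitter, the IDF formula, and the rounding of `ratio` are all assumptions of this sketch, not ThinkGraph's actual behavior.

```python
import math
import re
from collections import Counter

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
_WORD = re.compile(r"[a-z0-9]+")


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def compress(text, ratio=0.4):
    """Keep roughly `ratio` of the sentences, ranked by mean TF-IDF weight."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    tokenized = [_WORD.findall(s.lower()) for s in sentences]
    total = len(sentences)
    # Document frequency: number of sentences in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    def score(tokens):
        if not tokens:
            return 0.0
        # Relative term frequency times IDF, summed over the sentence.
        return sum(math.log(total / doc_freq[t]) + 1.0 for t in tokens) / len(tokens)

    keep = max(1, round(total * ratio))
    ranked = sorted(range(total), key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:keep])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)
```

Sentences made of words that appear everywhere score low and are dropped first, while sentences carrying rare terms survive.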
python thinkgraph.py compress long_text.txt --ratio 0.4
# Compressed: 200 -> 80 words (kept 40%)

After resolving parent nodes, automatically prune children whose questions are already answered by their parents.
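
The pruning idea can be sketched with a crude keyword-coverage heuristic: a child is dropped when nearly all of its question keywords already appear in its parents' resolved claims. The node and fact shapes, the stopword list, and the `threshold` parameter are hypothetical; how ThinkGraph actually decides that a question is "already answered" is not specified here.

```python
import re

# Small illustrative stopword list (an assumption of this sketch).
STOPWORDS = frozenset({
    "what", "is", "the", "a", "an", "of", "for", "in", "does", "do",
    "are", "which", "how", "to", "and",
})


def keywords(text):
    """Lowercase content words with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def prune_dag(nodes, facts, threshold=0.8):
    """Return ids of unresolved nodes whose parents' facts already cover them.

    nodes: {id: {"question": str, "deps": [parent ids]}}  (hypothetical shape)
    facts: {id: {"claim": str}} for nodes resolved so far (hypothetical shape)
    """
    pruned = []
    for node_id, node in nodes.items():
        parents = node["deps"]
        if node_id in facts or not parents:
            continue  # already resolved, or a root that has no parents
        if any(p not in facts for p in parents):
            continue  # wait until every parent is resolved
        wanted = keywords(node["question"])
        if not wanted:
            continue
        known = set()
        for p in parents:
            known |= keywords(facts[p]["claim"])
        coverage = len(wanted & known) / len(wanted)
        if coverage >= threshold:
            pruned.append(node_id)
    return pruned
```

Keyword overlap only approximates "answered"; a stricter check would compare the claim against the question semantically.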
python thinkgraph.py prune-dag graph.json --facts facts.json --prompt "your original question"

Score answers on keyword recall, precision, claim count, and uncertainty markers.
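
The four metrics can be sketched as set arithmetic over keywords. The stopword list, the uncertainty markers, and the sentence-based claim count below are assumptions made for illustration, so the numbers will not necessarily match what `ab-score` prints.

```python
import re

# Illustrative lists only; the real tool may use different ones.
STOPWORDS = frozenset({
    "a", "an", "the", "and", "with", "has", "is", "are", "of", "for", "to", "both",
})
UNCERTAINTY_MARKERS = ("might", "may", "possibly", "probably", "unclear")


def keywords(text):
    """Lowercase content words with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def ab_score(answer, ground_truth):
    """Score an answer against a reference using keyword overlap."""
    answer_kw = keywords(answer)
    truth_kw = keywords(ground_truth)
    overlap = answer_kw & truth_kw
    # Recall: share of reference keywords the answer mentions.
    recall = len(overlap) / len(truth_kw) if truth_kw else 0.0
    # Precision: share of answer keywords that the reference supports.
    precision = len(overlap) / len(answer_kw) if answer_kw else 0.0
    # Treat each non-empty sentence as one claim.
    claims = [s for s in re.split(r"[.!?]+", answer) if s.strip()]
    lowered = answer.lower()
    hedges = sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", lowered))
        for marker in UNCERTAINTY_MARKERS
    )
    return {
        "keyword_recall": recall,
        "precision": precision,
        "claim_count": len(claims),
        "uncertainty_markers": hedges,
    }
```

High precision with low recall suggests an answer that is correct but incomplete; the reverse suggests an answer padded with unsupported claims.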
python thinkgraph.py ab-score "React has better SSR support" \
--ground-truth "React and Vue both support SSR with React 18 offering streaming"
# keyword_recall: 50.00% precision: 71.40%

Register custom resolve functions (API calls, database lookups, shell commands).
python thinkgraph.py plugin-register my_api <<'EOF'
# `api` stands in for your own client object; it is not defined here.
def my_api(question, ctx):
    return {"claim": api.lookup(question), "confidence": 0.95}
EOF
python thinkgraph.py plugin-list
# shell, weblookup, my_api

Export pipeline results as JSON, YAML, or a Markdown report.
python thinkgraph.py export results.json --format markdown > report.md

Expose ThinkGraph as an MCP server. Compatible with Claude Desktop, Cursor, and any MCP client.
python mcp/thinkgraph_mcp.py

Configure in your MCP client:
{
  "mcpServers": {
    "thinkgraph": {
      "command": "python",
      "args": ["/path/to/mcp/thinkgraph_mcp.py"]
    }
  }
}

| MCP Tool | Description |
|---|---|
| `thinkgraph_triage` | Classify prompt complexity |
| `thinkgraph_validate_dag` | Validate DAG, get execution batches |
| `thinkgraph_vote` | Self-consistency voting |
| `thinkgraph_web_search` | DuckDuckGo web search |
| `thinkgraph_cache_get` | Look up cached facts |
| `thinkgraph_cache_set` | Store resolved facts |
| `thinkgraph_tokens` | Estimate token count |
| Agent | Setup | Auto-loaded? |
|---|---|---|
| OpenCode | `.opencode/skills/thinkgraph/SKILL.md` | Yes |
| Claude Code | `~/.claude/skills/thinkgraph/SKILL.md` | Yes |
| Cursor | `.cursor/rules/thinkgraph.mdc` | Via install script |
| Codex | `AGENTS.md` section | Via install script |
| Copilot | `.github/copilot-instructions.md` | Via install script |
| Gemini CLI | `GEMINI.md` section | Via install script |
git clone https://github.com/Mayne-X/thinkgraph.git
cd thinkgraph
# Install adapters for all detected agents
python install.py
# Or dry-run first
python install.py --dry-run
# Restart your agent. ThinkGraph activates automatically on complex prompts.

# Core pipeline
python thinkgraph.py triage "compare React and Vue"
python thinkgraph.py validate-dag graph.json
python thinkgraph.py cache-get "what is react ssr maturity"
python thinkgraph.py tokens "your text"
# New features
python thinkgraph.py vote "resp1" "resp2" "resp3"
python thinkgraph.py web-search "query" --num-results 5
python thinkgraph.py compress file.txt --ratio 0.4
python thinkgraph.py prune-dag graph.json --facts facts.json
python thinkgraph.py ab-score "answer" --ground-truth "reference"
python thinkgraph.py export results.json --format markdown
python thinkgraph.py plugin-list
python thinkgraph.py plugin-register myname "python code"

| Stage | Max tokens |
|---|---|
| Triage | 50 |
| Decompose | 200 |
| Per sub-question | 300 |
| Synthesize | 600 |
| Hard ceiling | 4× direct answer |

The pipeline aborts and falls back to a direct answer if the ceiling is breached.
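
The budget rule can be sketched as a small tracker. The per-stage numbers come from the table above; the class, the exception, and the `answer_with_budget` helper are hypothetical names, and how ThinkGraph really measures the "direct answer" baseline is not specified here.

```python
# Per-stage caps from the table above; everything else is illustrative.
STAGE_CAPS = {"triage": 50, "decompose": 200, "sub_question": 300, "synthesize": 600}
CEILING_MULTIPLIER = 4


class BudgetExceeded(Exception):
    """Raised when total spend passes the hard ceiling."""


class TokenBudget:
    def __init__(self, direct_answer_tokens):
        # Hard ceiling: 4x the estimated cost of answering directly.
        self.ceiling = CEILING_MULTIPLIER * direct_answer_tokens
        self.spent = 0

    def allowance(self, stage):
        """Tokens a stage may use: its own cap or what is left, whichever is less."""
        return max(0, min(STAGE_CAPS[stage], self.ceiling - self.spent))

    def charge(self, tokens):
        """Record usage and fail once the ceiling is breached."""
        self.spent += tokens
        if self.spent > self.ceiling:
            raise BudgetExceeded(f"{self.spent} > {self.ceiling}")


def answer_with_budget(pipeline, direct_answer, budget):
    """Run the pipeline, falling back to a direct answer on a breach."""
    try:
        return pipeline(budget)
    except BudgetExceeded:
        return direct_answer()
```

Tying the ceiling to the direct-answer cost keeps decomposition from costing far more than the question is worth.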
- Precompiled regex: all patterns compiled once at import, not per-call
- LRU memoization: `normalize_question`, `question_hash`, `estimate_tokens`, and `compute_term_freq` are all cached
- Global cache: facts persist at `~/.thinkgraph/cache.json` across projects and sessions
- Unicode-safe output: all CLI output uses `sys.stdout.write` to avoid cp1252 encoding errors
- Efficient data structures: sets for membership, tuples for cached composite values
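
The first two points above can be illustrated together. The function names match the ones listed, but the bodies are assumptions: the actual normalization rules and hash format used by ThinkGraph may differ.

```python
import hashlib
import re
from functools import lru_cache

# Compiled once at import time rather than on every call.
_NON_WORD = re.compile(r"[^a-z0-9\s]")
_WHITESPACE = re.compile(r"\s+")


@lru_cache(maxsize=1024)
def normalize_question(question):
    """Lowercase, strip punctuation, and collapse whitespace (assumed rules)."""
    lowered = question.lower()
    return _WHITESPACE.sub(" ", _NON_WORD.sub(" ", lowered)).strip()


@lru_cache(maxsize=1024)
def question_hash(question):
    """Stable cache key: a short SHA-256 prefix of the normalized question."""
    digest = hashlib.sha256(normalize_question(question).encode("utf-8"))
    return digest.hexdigest()[:16]
```

Because differently punctuated variants of the same question normalize to one string, they share a hash and therefore a cache entry.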
# All tests
python tests/test_golden.py        # 15/15 passing: triage, normalization, hashing
python tests/test_new_features.py  # 14/14 passing: voting, web search, MCP server
python tests/benchmark.py          # 10 prompts: quality scoring (compression 70%, vote 64%)
# Quick smoke test
python cli/thinkgraph.py triage "compare React and Vue"
python cli/thinkgraph.py vote "React is fast" "React is very fast" "React is quick"

thinkgraph/
├── SKILL.md                      # Canonical protocol (the source of truth)
├── protocol/
│   ├── prompts.md                # Verbatim prompt templates for each stage
│   ├── dag.md                    # DAG schema, topo-sort pseudocode, cache format
│   └── questions.md              # Onboarding + per-invocation question templates
├── adapters/
│   ├── opencode/SKILL.md         # OpenCode skill
│   ├── claude/SKILL.md           # Claude Code (auto-loaded by OpenCode too)
│   ├── cursor/thinkgraph.mdc     # Cursor rules
│   ├── codex/AGENTS.md           # Codex section
│   ├── copilot/                  # Copilot section
│   └── gemini/GEMINI.md          # Gemini CLI section
├── cli/
│   └── thinkgraph.py             # Helper CLI (Python 3.8+, stdlib only)
├── mcp/
│   ├── thinkgraph_mcp.py         # MCP server (JSON-RPC 2.0 over stdio)
│   └── README.md                 # MCP setup guide
├── .github/workflows/
│   └── thinkgraph.yml            # GitHub Action: auto-analyze issues
├── tests/
│   ├── test_golden.py            # 15 tests
│   ├── test_new_features.py      # 14 tests
│   └── benchmark.py              # Quality benchmark suite
├── install.py                    # Multi-agent installer (auto-detect, idempotent)
├── README.md                     # This file
└── LICENSE
| # | Feature | Status | Notes |
|---|---|---|---|
| 1 | Self-consistency voting | Done | Jaccard centroid, thinkgraph.py vote |
| 2 | Web grounding | Done | DuckDuckGo HTML, zero API key, web-search |
| 3 | MCP server | Done | JSON-RPC 2.0 stdio, 7 tools, mcp/thinkgraph_mcp.py |
| 4 | Prompt compression | Done | TF-IDF sentence extraction, compress |
| 5 | Benchmark suite | Done | 10 prompts, quality scoring, tests/benchmark.py |
| 6 | Cache sync | Done | Global ~/.thinkgraph/cache.json, per-project .pcg/ |
| 7 | Streaming support | Planned | Incremental DAG + fact emission |
| 8 | Multi-model routing | Planned | Cheap sub-nodes, expensive synthesis |
| 9 | Dynamic DAG pruning | Done | Auto-remove nodes whose parents answer them |
| 10 | Export formats | Done | JSON, YAML, Markdown |
| 11 | Recursive depth (3+) | Planned | Max depth configurable, budget warnings |
| 12 | A/B testing mode | Done | ab-score, keyword recall + precision |
| 13 | Plugin hooks | Done | Custom resolve fns, plugin-register |
| 14 | CLI interactive mode | Planned | thinkgraph interactive REPL |
| 15 | GitHub Action | Done | Auto-comment on issues, thinkgraph.yml |

Status values: Done (implemented), Planned, In progress.
- Fork -> branch -> commit -> PR
- Run tests: `python tests/test_golden.py && python tests/test_new_features.py`
- Benchmark: `python tests/benchmark.py`
MIT. Do whatever you want.