Mayne-X/thinkgraph

🧠 ThinkGraph

Stop guessing. Start decomposing. Structured decomposition for LLM prompts: breaks complex questions into a dependency graph of atomic facts, resolves them sequentially, and synthesizes a grounded answer.


The Problem

When you ask an LLM a complex question, it tries to answer the whole thing at once: hallucinating details, missing constraints, and guessing facts it should verify first.

❌ "Compare React and Vue for enterprise SSR dashboard"
   → LLM guesses React is better without checking SSR maturity,
     enterprise adoption, team size trade-offs, or bundle size.

ThinkGraph intercepts the prompt and forces structured thinking before answering:

✅ LLM first resolves: "What is React's SSR maturity?" + "What is Vue's SSR maturity?"
   + "What are enterprise adoption rates?" + "Which fits a 10-person team?"
   Then synthesizes from verified facts only.

Result: 50%+ accuracy improvement on multi-hop prompts.


How It Works

Your Prompt
    │
    ▼
┌─────────┐     ┌───────────┐     ┌────────────┐     ┌───────────────────┐     ┌───────────────┐
│ TRIAGE  │────>│ DECOMPOSE │────>│  RESOLVE   │────>│ SELF-CONSISTENCY  │────>│  SYNTHESIZE   │
│         │     │           │     │            │     │ VOTE (if enabled) │     │               │
│ Is this │     │ Emit DAG  │     │ Answer     │     │                   │     │ Build answer  │
│ complex?│     │ of atomic │     │ each       │     │ Multiple          │     │ from verified │
│         │     │ sub-Qs    │     │ sub-Q in   │     │ attempts →        │     │ facts only    │
│ Skip if │     │ with deps │     │ topo order │     │ centroid          │     │               │
│ trivial │     │           │     │            │     │                   │     │               │
│         │     │           │     │ Web search │     │                   │     │               │
└─────────┘     └───────────┘     └────────────┘     └───────────────────┘     └───────────────┘

Features

Core Protocol (5-stage pipeline)

| Stage | What it does |
| --- | --- |
| Triage | Classify prompt: trivial / single-hop / multi-hop / planning / creative |
| Decompose | Break into atomic sub-questions with explicit dependency DAG |
| Resolve | Answer each node in topological order, with caching |
| Synthesize | Build final answer from verified facts only |
| Present | Answer with uncertainty notes if any fact was low-confidence |
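
To make "topological order" concrete, here is a minimal sketch of how a dependency DAG can be grouped into execution batches using Kahn's algorithm. The graph shape (node id mapped to a list of dependency ids), the node names, and the function name `execution_batches` are assumptions for illustration; the actual schema is defined in protocol/dag.md and may differ.

```python
from collections import defaultdict


def execution_batches(deps):
    """Group DAG nodes into batches that can be resolved in order.

    `deps` maps a node id to the list of node ids it depends on
    (hypothetical shape, not the project's documented schema).
    Raises ValueError on an unknown dependency or a cycle.
    """
    indegree = {node: 0 for node in deps}
    children = defaultdict(list)
    for node, parents in deps.items():
        for parent in parents:
            if parent not in deps:
                raise ValueError(f"unknown dependency: {parent!r}")
            indegree[node] += 1
            children[parent].append(node)

    # Start with nodes that have no dependencies, then peel the graph layer by layer.
    batch = sorted(node for node, degree in indegree.items() if degree == 0)
    batches = []
    resolved = 0
    while batch:
        batches.append(batch)
        resolved += len(batch)
        next_batch = []
        for node in batch:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_batch.append(child)
        batch = sorted(next_batch)

    # Any node left unresolved sits on a cycle.
    if resolved != len(deps):
        raise ValueError("dependency cycle detected")
    return batches


graph = {
    "react_ssr": [],
    "vue_ssr": [],
    "adoption": [],
    "team_fit": ["react_ssr", "vue_ssr"],
    "verdict": ["team_fit", "adoption"],
}
print(execution_batches(graph))
# [['adoption', 'react_ssr', 'vue_ssr'], ['team_fit'], ['verdict']]
```

Nodes in the same batch do not depend on each other, so they could be resolved in any order within the batch.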

Self-Consistency Voting

Run 2-3 synthesis attempts, then vote for the most consistent one via Jaccard centroid. The vote itself needs no extra LLM call and helps catch hallucinations that show up in only one attempt.

python cli/thinkgraph.py vote "answer variant 1" "answer variant 2" "answer variant 3"
# {"winner": "...", "score": 0.72, "response_count": 3}
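
As a rough illustration of centroid voting, the sketch below picks the response with the highest average Jaccard similarity to the other responses. The tokenization (lowercased alphanumeric words) and the function names are assumptions; the CLI's actual scoring may differ. The result keys mirror the CLI output shown above.

```python
import re


def tokens(text):
    """Lowercased alphanumeric word set (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def vote(responses):
    """Return the response closest, on average, to all the others."""
    if not responses:
        raise ValueError("need at least one response")
    sets = [tokens(r) for r in responses]
    best_index, best_score = 0, -1.0
    for i, current in enumerate(sets):
        sims = [jaccard(current, other) for j, other in enumerate(sets) if j != i]
        score = sum(sims) / len(sims) if sims else 1.0
        if score > best_score:
            best_index, best_score = i, score
    return {
        "winner": responses[best_index],
        "score": best_score,
        "response_count": len(responses),
    }


result = vote(["React is fast", "React is very fast", "React is quick"])
print(result["winner"])  # React is fast
```

An answer containing a claim that no other attempt repeats shares fewer tokens with the rest, so it scores lower and is less likely to win.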

Web Grounding (DuckDuckGo, zero API key)

Auto-search for low-confidence facts. No API key needed: pure HTTP + HTML parsing.

python cli/thinkgraph.py web-search "React 19 streaming SSR benchmark" --num-results 5

Prompt Compression (TF-IDF sentence extraction)

Compress long context before feeding synthesis. Keeps the most important sentences by TF-IDF weight.

python cli/thinkgraph.py compress long_text.txt --ratio 0.4
# Compressed: 200 -> 80 words (kept 40%)
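
The idea behind TF-IDF sentence extraction can be sketched as follows: score each sentence by the average IDF of its tokens (treating each sentence as a document), then keep the highest-scoring sentences until a word budget is used up. The sentence splitter, the smoothed IDF formula, and the word-budget interpretation of `ratio` are all assumptions, not the CLI's confirmed behavior.

```python
import math
import re
from collections import Counter

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
_WORD = re.compile(r"[a-z0-9]+")


def compress(text, ratio=0.4):
    """Keep the highest-scoring sentences within roughly `ratio` of the word count."""
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    if not sentences:
        return ""
    tokenized = [_WORD.findall(s.lower()) for s in sentences]
    n = len(sentences)

    # Document frequency: in how many sentences does each term appear?
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log((1 + n) / (1 + df)) + 1.0 for term, df in doc_freq.items()}

    # Sentence score: mean IDF over its tokens (repeated terms count each time).
    scores = [sum(idf[t] for t in toks) / len(toks) if toks else 0.0 for toks in tokenized]

    budget = ratio * sum(len(s.split()) for s in sentences)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    kept, used = [], 0
    for i in ranked:
        words = len(sentences[i].split())
        if kept and used + words > budget:
            continue  # skip sentences that would exceed the budget
        kept.append(i)  # the top sentence is always kept
        used += words

    # Emit the kept sentences in their original order.
    return " ".join(sentences[i] for i in sorted(kept))


text = (
    "The cache stores facts on disk. The cache is shared across sessions. "
    "React 18 added streaming SSR. The cache is a JSON file."
)
print(compress(text, ratio=0.4))
```

Sentences made of terms that recur throughout the text score lower, so the sketch favors sentences that carry distinctive information.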

Dynamic DAG Pruning

After resolving parent nodes, automatically prune children whose questions are already answered by their parents.

python cli/thinkgraph.py prune-dag graph.json --facts facts.json --prompt "your original question"

A/B Testing Mode

Score answers on keyword recall, precision, claim count, and uncertainty markers.

python cli/thinkgraph.py ab-score "React has better SSR support" \
  --ground-truth "React and Vue both support SSR with React 18 offering streaming"
# keyword_recall: 50.00%  precision: 71.40%
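
For intuition, keyword recall and precision can be computed from two keyword sets as in the sketch below. It treats every lowercased alphanumeric word as a keyword, which is an assumption: the CLI's keyword extraction is not documented here, so its numbers (such as those above) will generally differ. Claim count and uncertainty markers are not covered.

```python
import re


def keywords(text):
    """Every lowercased alphanumeric word counts as a keyword (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ab_score(answer, ground_truth):
    """Recall: share of reference keywords found in the answer.
    Precision: share of answer keywords found in the reference."""
    got, want = keywords(answer), keywords(ground_truth)
    overlap = got & want
    return {
        "keyword_recall": len(overlap) / len(want) if want else 0.0,
        "precision": len(overlap) / len(got) if got else 0.0,
    }


score = ab_score("Vue supports SSR", "Vue supports SSR through Nuxt")
print(score)  # recall 3/5, precision 3/3
```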

Plugin Hooks

Register custom resolve functions (API calls, database lookups, shell commands).

python cli/thinkgraph.py plugin-register my_api <<'EOF'
def my_api(question, ctx):
    return {"claim": api.lookup(question), "confidence": 0.95}
EOF
python cli/thinkgraph.py plugin-list
# shell, weblookup, my_api

Export Formats

Export pipeline results as JSON, YAML, or Markdown report.

python cli/thinkgraph.py export results.json --format markdown > report.md

MCP Server (7 tools)

Expose ThinkGraph as an MCP server. Compatible with Claude Desktop, Cursor, and any MCP client.

python mcp/thinkgraph_mcp.py

Configure in your MCP client:

{
  "mcpServers": {
    "thinkgraph": {
      "command": "python",
      "args": ["/path/to/mcp/thinkgraph_mcp.py"]
    }
  }
}

| MCP Tool | Description |
| --- | --- |
| thinkgraph_triage | Classify prompt complexity |
| thinkgraph_validate_dag | Validate DAG, get execution batches |
| thinkgraph_vote | Self-consistency voting |
| thinkgraph_web_search | DuckDuckGo web search |
| thinkgraph_cache_get | Look up cached facts |
| thinkgraph_cache_set | Store resolved facts |
| thinkgraph_tokens | Estimate token count |
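
MCP clients normally build the requests for you, but it can help to see what a tool call looks like on the wire. The sketch below constructs a JSON-RPC 2.0 `tools/call` request as a single line of JSON, the framing used by MCP's stdio transport. The argument name `prompt` is a guess for illustration; check the tool's input schema (via `tools/list`) for the real parameter names. A real session also starts with an `initialize` handshake, which is omitted here.

```python
import json


def tools_call_request(request_id, tool, arguments):
    """Build one newline-free JSON-RPC 2.0 message for an MCP tools/call."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# "prompt" is a hypothetical argument name, not confirmed by this README.
line = tools_call_request(1, "thinkgraph_triage", {"prompt": "compare React and Vue"})
print(line)
```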

Works With Everything

| Agent | Setup | Auto-loaded? |
| --- | --- | --- |
| OpenCode | .opencode/skills/thinkgraph/SKILL.md | ✅ Yes |
| Claude Code | ~/.claude/skills/thinkgraph/SKILL.md | ✅ Yes |
| Cursor | .cursor/rules/thinkgraph.mdc | Via install script |
| Codex | AGENTS.md section | Via install script |
| Copilot | .github/copilot-instructions.md | Via install script |
| Gemini CLI | GEMINI.md section | Via install script |

Quick Start

git clone https://github.com/Mayne-X/thinkgraph.git
cd thinkgraph

# Install adapters for all detected agents
python install.py

# Or dry-run first
python install.py --dry-run

# Restart your agent. ThinkGraph activates automatically on complex prompts.

CLI Reference

# Core pipeline (run from the repository root; the CLI lives in cli/)
python cli/thinkgraph.py triage "compare React and Vue"
python cli/thinkgraph.py validate-dag graph.json
python cli/thinkgraph.py cache-get "what is react ssr maturity"
python cli/thinkgraph.py tokens "your text"

# New features
python cli/thinkgraph.py vote "resp1" "resp2" "resp3"
python cli/thinkgraph.py web-search "query" --num-results 5
python cli/thinkgraph.py compress file.txt --ratio 0.4
python cli/thinkgraph.py prune-dag graph.json --facts facts.json
python cli/thinkgraph.py ab-score "answer" --ground-truth "reference"
python cli/thinkgraph.py export results.json --format markdown
python cli/thinkgraph.py plugin-list
python cli/thinkgraph.py plugin-register myname "python code"

Token Budgets

| Stage | Max tokens |
| --- | --- |
| Triage | 50 |
| Decompose | 200 |
| Per sub-question | 300 |
| Synthesize | 600 |
| Hard ceiling | 4× direct answer |

The pipeline aborts and falls back to a direct answer if the ceiling is breached.
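
A minimal sketch of the budget arithmetic, assuming the hard ceiling means "total pipeline tokens may not exceed 4× the tokens a direct answer would cost". That reading, along with the function and constant names, is an assumption; only the per-stage numbers come from the table above.

```python
# Per-stage maxima from the table above.
STAGE_BUDGETS = {"triage": 50, "decompose": 200, "sub_question": 300, "synthesize": 600}
CEILING_MULTIPLIER = 4  # hard ceiling: 4x the direct-answer cost (assumed reading)


def planned_cost(num_sub_questions):
    """Worst-case token spend if every stage uses its full budget."""
    return (
        STAGE_BUDGETS["triage"]
        + STAGE_BUDGETS["decompose"]
        + num_sub_questions * STAGE_BUDGETS["sub_question"]
        + STAGE_BUDGETS["synthesize"]
    )


def within_ceiling(spent_tokens, direct_answer_tokens):
    """True while the pipeline may continue; False means fall back to a direct answer."""
    return spent_tokens <= CEILING_MULTIPLIER * direct_answer_tokens


cost = planned_cost(4)            # 50 + 200 + 4 * 300 + 600 = 2050
print(within_ceiling(cost, 500))  # False: 2050 > 2000
print(within_ceiling(cost, 600))  # True: 2050 <= 2400
```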


Performance Optimizations

  • Precompiled regex: all patterns compiled once at import, not per call
  • LRU memoization: normalize_question, question_hash, estimate_tokens, and compute_term_freq are all cached
  • Global cache: facts persist at ~/.thinkgraph/cache.json across projects and sessions
  • Unicode-safe output: all CLI output uses sys.stdout.write to avoid cp1252 encoding errors
  • Efficient data structures: sets for membership, tuples for cached composite values
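
The first two optimizations combine naturally. The sketch below shows the pattern with a module-level compiled regex and `functools.lru_cache`. The function names match those listed above, but the bodies (the normalization rule, the SHA-256 hash truncated to 16 hex characters, the cache size) are assumptions, not the project's actual code.

```python
import hashlib
import re
from functools import lru_cache

# Compiled once at import time rather than on every call.
_NON_ALNUM = re.compile(r"[^a-z0-9]+")


@lru_cache(maxsize=1024)
def normalize_question(question):
    """Lowercase and collapse punctuation/whitespace runs (assumed rule)."""
    return _NON_ALNUM.sub(" ", question.lower()).strip()


@lru_cache(maxsize=1024)
def question_hash(question):
    """Stable cache key derived from the normalized question (assumed scheme)."""
    normalized = normalize_question(question)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


print(normalize_question("What is React's SSR maturity?"))
# what is react s ssr maturity
print(question_hash("What is React's SSR maturity?") == question_hash("what is react s SSR maturity"))
# True
```

Because differently punctuated or cased variants normalize to the same string, they map to the same key, so a cached fact can be reused across phrasings.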

Tests

# All tests
python tests/test_golden.py       # 15/15 passing: triage, normalization, hashing
python tests/test_new_features.py # 14/14 passing: voting, web search, MCP server
python tests/benchmark.py         # 10 prompts: quality scoring (compression 70%, vote 64%)

# Quick smoke test
python cli/thinkgraph.py triage "compare React and Vue"
python cli/thinkgraph.py vote "React is fast" "React is very fast" "React is quick"

Project Structure

thinkgraph/
├── SKILL.md                    # Canonical protocol (the source of truth)
├── protocol/
│   ├── prompts.md              # Verbatim prompt templates for each stage
│   ├── dag.md                  # DAG schema, topo-sort pseudocode, cache format
│   └── questions.md            # Onboarding + per-invocation question templates
├── adapters/
│   ├── opencode/SKILL.md       # OpenCode skill
│   ├── claude/SKILL.md         # Claude Code (auto-loaded by OpenCode too)
│   ├── cursor/thinkgraph.mdc   # Cursor rules
│   ├── codex/AGENTS.md         # Codex section
│   ├── copilot/                # Copilot section
│   └── gemini/GEMINI.md        # Gemini CLI section
├── cli/
│   └── thinkgraph.py           # Helper CLI (Python 3.8+, stdlib only)
├── mcp/
│   ├── thinkgraph_mcp.py       # MCP server (JSON-RPC 2.0 over stdio)
│   └── README.md               # MCP setup guide
├── .github/workflows/
│   └── thinkgraph.yml          # GitHub Action: auto-analyze issues
├── tests/
│   ├── test_golden.py          # 15 tests
│   ├── test_new_features.py    # 14 tests
│   └── benchmark.py            # Quality benchmark suite
├── install.py                  # Multi-agent installer (auto-detect, idempotent)
├── README.md                   # This file
└── LICENSE

Roadmap

| # | Feature | Status | Notes |
| --- | --- | --- | --- |
| 1 | Self-consistency voting | ✅ Done | Jaccard centroid, thinkgraph.py vote |
| 2 | Web grounding | ✅ Done | DuckDuckGo HTML, zero API key, web-search |
| 3 | MCP server | ✅ Done | JSON-RPC 2.0 stdio, 7 tools, mcp/thinkgraph_mcp.py |
| 4 | Prompt compression | ✅ Done | TF-IDF sentence extraction, compress |
| 5 | Benchmark suite | ✅ Done | 10 prompts, quality scoring, tests/benchmark.py |
| 6 | Cache sync | ✅ Done | Global ~/.thinkgraph/cache.json, per-project .pcg/ |
| 7 | Streaming support | 🔜 Planned | Incremental DAG + fact emission |
| 8 | Multi-model routing | 🔜 Planned | Cheap sub-nodes, expensive synthesis |
| 9 | Dynamic DAG pruning | ✅ Done | Auto-remove nodes whose parents answer them |
| 10 | Export formats | ✅ Done | JSON, YAML, Markdown |
| 11 | Recursive depth (3+) | 🔜 Planned | Max depth configurable, budget warnings |
| 12 | A/B testing mode | ✅ Done | ab-score, keyword recall + precision |
| 13 | Plugin hooks | ✅ Done | Custom resolve fns, plugin-register |
| 14 | CLI interactive mode | 🔜 Planned | thinkgraph interactive REPL |
| 15 | GitHub Action | ✅ Done | Auto-comment on issues, thinkgraph.yml |

✅ = Implemented 🔜 = Planned 🚧 = In progress


Contributing

  1. Fork → branch → commit → PR
  2. Run tests: python tests/test_golden.py && python tests/test_new_features.py
  3. Benchmark: python tests/benchmark.py

License

MIT. Do whatever you want.
