Open-source knowledge graph for AI research ideas. During training, you feed it papers and an LLM distills reusable method inspirations and research questions into a Neo4j graph. At inference time, external AI agents query the graph via a CLI to compose novel research directions.
Most paper-to-idea pipelines stop at summaries or require manual curation. IdeaForgeX builds a structured, graph-backed knowledge base where ideas are first-class entities — traceable, reusable, and improvable over time. The inference layer is intentionally thin (just a CLI query API) so external AI agents can assemble their own innovation workflows on top.
- 🧠 One LLM, clean loop: a single LLM-A judge decides whether a paper yields extractable ideas or just needs recording. No feedback-loop noise.
- 🕸️ Neo4j knowledge graph: dual node types (`Inspiration` + `Question`), four edge types, multi-granularity refinement chains
- 🔌 Agent-friendly CLI: four query commands (`retrieve`, `inspect`, `random`, `relate`) return structured JSON for external agents
- 🐳 Local Docker setup: test and personal Neo4j instances via `docker compose`
- 📄 OpenAlex + arXiv: tiered paper resolution with automatic fallback
- ✅ Full test suite: `uv run pytest -v` covers training, retrieval, and CLI output
| Directory | Purpose |
|---|---|
| `src/agent/` | LangGraph training workflow |
| `src/llm/` | Prompt templates and chat/embedding client |
| `src/paper/` | OpenAlex discovery, arXiv PDF extraction, paper resolver |
| `src/neo4j/` | Schema bootstrap, retrieval traversal, graph maintenance |
| `src/cli/` | CLI query commands (retrieve / inspect / random / relate) |
| `tests/` | Regression coverage for training, CLI, config, and LLM client |
| `docs/` | Design, architecture, data model, CLI usage guide |
- Python 3.11+
- `uv` package manager
- Docker + Docker Compose (for Neo4j)

```bash
# 1. Clone and install dependencies
git clone https://github.com/<your-org>/IdeaForgeX.git
cd IdeaForgeX
uv sync

# 2. Start Neo4j
docker compose up -d

# 3. Create config from template
cp config.example.yaml config.yaml
```

Edit `config.yaml` and fill in your API keys:

- `llm_api_key`: your LLM provider (DeepSeek, OpenAI, etc.)
- `embedding_api_key`: your embedding provider
- `openalex_api_key`: OpenAlex API key for paper discovery
- `neo4j_password`: Neo4j database password

```bash
# 4. Bootstrap the graph schema (idempotent)
uv run ifx bootstrap
```

```bash
# Train a paper into the knowledge graph
uv run ifx train 1706.03762                    # arXiv ID
uv run ifx train "Attention Is All You Need"   # title search

# Query the graph (external agents call these commands)
uv run ifx retrieve "few-shot learning with diffusion models"
uv run ifx inspect INSP_4
uv run ifx random --count 5
uv run ifx random --query "cross-modal attention" --count 3
uv run ifx relate INSP_1 INSP_10
```

For the CLI usage guide, see docs/use_cli.md. For full JSON schemas, see docs/superpowers/specs/2025-05-30-cli-spec.md.
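
An external agent would typically shell out to these commands and parse the JSON they print. The following is a minimal sketch under assumptions: it assumes each query command writes a single JSON document to stdout and exits non-zero on failure. The helper name `run_ifx` and its injectable `runner` parameter are illustrative and not part of IdeaForgeX.

```python
import json
import subprocess
from typing import Any, Callable, Sequence

# Hypothetical type alias: a runner takes an argv list and returns stdout text.
Runner = Callable[[Sequence[str]], str]


def _default_runner(argv: Sequence[str]) -> str:
    """Run the command and return its stdout.

    check=True raises subprocess.CalledProcessError on a non-zero exit code.
    """
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


def run_ifx(command: str, *args: str, runner: Runner = _default_runner) -> Any:
    """Invoke `uv run ifx <command> <args...>` and decode its stdout as JSON.

    Assumption: the command prints exactly one JSON document to stdout.
    """
    argv = ["uv", "run", "ifx", command, *args]
    return json.loads(runner(argv))
```

With the project installed, `run_ifx("retrieve", "few-shot learning with diffusion models")` or `run_ifx("random", "--count", "5")` would return the decoded payload. Passing a custom `runner` keeps the wrapper testable without a live Neo4j instance.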
Developer setup, testing, and graph reset instructions: docs/dev_setup.md.
All parameters live in config.yaml (see config.example.yaml for defaults). Environment variables prefixed with IDEAFORGEX_ can override any field at runtime.
| Section | Key fields |
|---|---|
| LLM | llm_base_url, llm_api_key, llm_model_name |
| Embedding | embedding_base_url, embedding_api_key, embedding_model_name, embedding_dim |
| Paper | openalex_api_key, short_abstract_threshold |
| Neo4j | neo4j_uri, neo4j_user, neo4j_password, neo4j_database |
| Retrieval | k_hits, max_neighbors, max_depth, score_decay, final_k |
| Logging | log_level — DEBUG / INFO / WARNING / ERROR (overridden by LOG_LEVEL env var) |
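
The environment override rule can be illustrated with a short sketch. It assumes the variable name is the `IDEAFORGEX_` prefix followed by the upper-cased field name (for example `IDEAFORGEX_K_HITS` for `k_hits`) and that values are coerced to the type of the existing default. The project's actual naming and coercion rules may differ, and `apply_env_overrides` is a hypothetical helper.

```python
from typing import Any, Mapping

PREFIX = "IDEAFORGEX_"


def apply_env_overrides(
    config: Mapping[str, Any], environ: Mapping[str, str]
) -> dict[str, Any]:
    """Return a copy of `config` with fields replaced by matching env variables.

    Assumed mapping: field `k_hits` is overridden by `IDEAFORGEX_K_HITS`.
    """
    merged = dict(config)
    for field, default in config.items():
        raw = environ.get(PREFIX + field.upper())
        if raw is None:
            continue
        # Coerce to the default's type; bool is checked first because
        # bool is a subclass of int in Python.
        if isinstance(default, bool):
            merged[field] = raw.strip().lower() in {"1", "true", "yes", "on"}
        elif isinstance(default, int):
            merged[field] = int(raw)
        elif isinstance(default, float):
            merged[field] = float(raw)
        else:
            merged[field] = raw
    return merged
```

In practice the second argument would be `os.environ`; taking it as a parameter keeps the function pure and easy to test.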
Training: A LangGraph state machine loads a paper, generates a retrieval query, searches the graph, and calls the LLM to decide whether the paper introduces novel ideas worth extracting. New Inspiration and Question nodes are written transactionally into Neo4j.
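
The control flow described above can be sketched in plain Python, without LangGraph or Neo4j. Everything here is a hypothetical stand-in: the step functions are passed in as callables, the `Decision` shape is assumed, and the real workflow is a LangGraph state machine whose node names and state fields are not documented in this README.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Decision:
    """Assumed shape of the LLM judge's verdict (illustrative only)."""
    extract: bool
    inspirations: list[str] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)


def train_paper(
    paper_id: str,
    load_paper: Callable[[str], str],
    make_query: Callable[[str], str],
    search_graph: Callable[[str], list[str]],
    judge: Callable[[str, list[str]], Decision],
    write_nodes: Callable[[Decision], int],
    record_only: Callable[[str], None],
) -> int:
    """Run one paper through load -> query -> search -> judge -> write.

    Returns the number of nodes written (0 when the paper is only recorded).
    """
    text = load_paper(paper_id)
    query = make_query(text)
    existing = search_graph(query)
    decision = judge(text, existing)
    if not decision.extract:
        # Nothing novel to extract: just note that the paper was seen.
        record_only(paper_id)
        return 0
    # The README says writes are transactional; here that is a single call.
    return write_nodes(decision)
```

Keeping each step behind a callable mirrors how a state machine isolates nodes: the judge can be swapped for a stub in tests, and the write step stays the only place that touches the database.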
Inference: External AI agents call CLI commands to explore the graph — retrieve for relevance-ranked search, inspect for deep dives, random for serendipity, and relate for path discovery. The agent then composes novelty proposals using its own LLM.
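
To make the retrieval parameters from the configuration table more concrete, here is one plausible way they could interact. This is a sketch under assumptions, not the project's traversal code: it assumes the top `k_hits` similarity matches act as seeds, each hop follows at most `max_neighbors` edges and multiplies the score by `score_decay`, expansion stops after `max_depth` hops, and the best `final_k` nodes are returned. The function name and the node IDs in the usage note are illustrative.

```python
from typing import Mapping, Sequence


def expand_and_rank(
    seed_scores: Mapping[str, float],
    neighbors: Mapping[str, Sequence[str]],
    *,
    k_hits: int,
    max_neighbors: int,
    max_depth: int,
    score_decay: float,
    final_k: int,
) -> list[tuple[str, float]]:
    """Expand from the best seeds with per-hop score decay, then rank."""
    # Keep only the k_hits highest-scoring seeds.
    seeds = sorted(seed_scores.items(), key=lambda kv: kv[1], reverse=True)[:k_hits]
    best = dict(seeds)
    frontier = list(seeds)
    for _ in range(max_depth):
        next_frontier = []
        for node, score in frontier:
            for nb in list(neighbors.get(node, ()))[:max_neighbors]:
                nb_score = score * score_decay
                # Keep the best score seen for each node; re-expand only on improvement.
                if nb_score > best.get(nb, float("-inf")):
                    best[nb] = nb_score
                    next_frontier.append((nb, nb_score))
        frontier = next_frontier
    # Sort by descending score, breaking ties by node ID for determinism.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:final_k]
```

For example, with seeds `{"INSP_1": 0.9, "INSP_2": 0.4}`, an edge from `INSP_1` to `Q_7`, and `score_decay=0.5`, `Q_7` enters the ranking with half of `INSP_1`'s score.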
This project is licensed under the GNU AGPLv3. See LICENSE for details.
Issues and pull requests are welcome. Before opening a PR:
- `uv run pytest -v` must pass
- `bootstrap`, `train`, `retrieve`, `inspect`, `random`, and `relate` should work locally
- New behavior should be covered by tests