An agentic retrieval assistant that pulls from six distinct knowledge surfaces and picks the right tool itself. Talk to it through a CLI or a browser UI; drag a file in and it'll route it into the right backend automatically.
v0.2 is the current focus. v0.1 (LlamaIndex
FunctionAgentinmain.py/server.py/modules/) still works for comparison. New code lives underdatamind/. For an end-to-end walkthrough seeGETTING_STARTED.mdor the docs site.
| Capability | Backend | Tools the agent gets |
|---|---|---|
| KB (RAG) | Chroma + BM25 with Reciprocal Rank Fusion | kb_search, kb_list_documents, kb_count, kb_reindex |
| Graph | NetworkX, JSON-persisted | graph_search_entities, graph_traverse, graph_neighbors, graph_upsert_triples |
| Database | SQLAlchemy (SQLite / MySQL / Postgres) | db_list_tables, db_describe_table, db_query_sql, db_query_nl |
| Skills | .claude/skills/<name>/SKILL.md + safe Python tools |
skill_search, skill_get, skill_list, calculator, unit_convert, get_current_time, analyze_text |
| Memory | SQLite with cosine recall + LLM fact extraction; scope-typed (global / profile / session) for multi-tenant isolation |
memory_save, memory_recall, memory_forget, memory_list_profiles |
| Ingest ✨ | Conversational data import — drop a file in via chat or the browser drag-drop zone | kb_add_file, kb_add_path, db_import_csv, graph_add_triples_from_text |
| Hooks ✨ v0.3 | Sandboxed tool dispatch — every call is intercepted; Allow / Deny / AskUser / Rewrite; tamper-evident audit log per profile |
PathAllowlistHook, DestructiveSqlHook, AuditLogHook (built-in; user hooks pluggable) |
27 tools total. All routed through one ToolRegistry; the agent decides what to call and in what order.
git clone https://github.com/your-org/DataMind.git && cd DataMind
python -m venv .venv && source .venv/bin/activate
pip install -e .
cp .env.datamind.example .env.datamind
$EDITOR .env.datamind # set DATAMIND__LLM__API_KEY at minimum
# 1. Smoke-test the gateway (~2 s)
python -m datamind.scripts.hello_sdk
# 2. Seed a realistic enterprise dataset (17 docs / 64 graph nodes / 6 tables / 101 rows)
python -m datamind.scripts.seed_enterprise_demo
# 3. Watch the agent answer 8 cross-backend questions on its own
DATAMIND__DATA__PROFILE=enterprise_demo \
python -m datamind.scripts.hello_enterprise
# 4. Or just open the browser UI
DATAMIND__DATA__PROFILE=enterprise_demo \
python -m uvicorn datamind.server:app --port 8000
# → http://127.0.0.1:8000 — drag any .md / .csv / .txt into the dropzone, ask questions, watch tools fireMore detail in GETTING_STARTED.md.
Ask: "工程部 Shanghai 的员工工资加起来是多少?"
The agent figures out it needs SQL, tries db_query_nl, gets an empty result, recovers by inspecting the schema (db_list_tables → db_describe_table), discovers the column is Eng not Engineering, rewrites the SQL itself, and answers ¥26,000 — without any of that being hard-coded. Same agent picks graph_search_entities + graph_neighbors for relationship questions, kb_search + skill_get for SOP questions, memory_save for "remember this for me" requests.
Frontend stays the same regardless. The 27 tools, the streaming SSE protocol, and the chat UI work identically across two interchangeable agent backends:
DATAMIND__AGENT__BACKEND=native # default — pure-Python anthropic SDK + self-written loop
DATAMIND__AGENT__BACKEND=sdk # claude-agent-sdk + claude-code-router (CCR)
# unlocks Hooks / Subagents / Compaction / Plan mode
Both verified end-to-end against the same 8 enterprise-demo questions (numbers here).
The 4 ingest tools turn the agent into a read-and-write surface:
you → "把 /Users/foo/sales-q2.csv 导入成数据表 q2_sales"
agent → calls db_import_csv(path=..., table='q2_sales') ✓ 18 rows inserted
you → "Q2 sales pipeline 里 in-pipeline 单子总额是多少?哪个 sales rep 单子最多?"
agent → calls db_query_sql(...) ✓ answers from the freshly-imported table
Or drop the file into the browser dropzone and click 导入. Or say "把这段加进图谱:陈诚晋升 Tech Lead,向 Ann 汇报" → agent calls graph_add_triples_from_text, LLM extracts triples, graph upserts them. No restart, no reindex.
v0.1 was functional but coupled: a global AppState, hard-wired modules, vendor-locked to the claude CLI. v0.2 reshapes it around:
- Protocols + registries — every capability is a
Protocol; concrete classes register under a short name. New DB dialect / embedding provider / retriever strategy = one file. - Pluggable agent loop —
native(anthropic SDK) orsdk(claude-agent-sdk + CCR), one ENV switch. - Real SSE streaming through FastAPI — not v0.1's fake character-sliced streaming.
- Zero global state — every request owns its own
RequestContextwith a trace id. - Side-by-side with v0.1 — old code paths untouched, easy comparison.
See Architecture for full detail.
DataMind/
├── datamind/ # ── v0.2 (new code) ──────────────────
│ ├── agent/ # base.py + loop_native.py + loop_sdk.py
│ ├── capabilities/ # kb / graph / db / skills / memory /
│ │ # ingest / embedding
│ ├── core/ # Protocol, Registry, Config, Logging, Tools
│ ├── scripts/ # hello_*.py + seed_enterprise_demo.py
│ ├── cli.py # `python -m datamind ...`
│ ├── server.py # FastAPI + real SSE + /api/upload
│ └── tests/ # 95 passing tests (no network required)
│
├── .claude/skills/ # SDK-style knowledge skills (SKILL.md)
├── static/app.html # browser UI (drag-drop + tool cards + sidebar)
├── scripts/start_ccr.sh # one-line CCR launcher (for sdk backend)
├── demo-uploads/ # 6 sample files to drag-drop into the UI
│
├── modules/ core/ main.py server.py benchmark/ # ── v0.1 legacy ─
│
├── data/profiles/<profile>/ # per-profile raw inputs
├── storage/<profile>/ # per-profile indexes & DBs
├── pyproject.toml # v0.2 install + CLI entry
└── .env.datamind.example # nested env template
One environment variable switches data + storage directories in lockstep:
DATAMIND__DATA__PROFILE=customer_a python -m datamind chatMaps to data/profiles/customer_a/ and storage/customer_a/.
pytest datamind/tests/
# 95 passed in ~0.6s — no network requiredPlus live smoke + benchmark scripts:
hello_sdk, hello_kb, hello_db, hello_graph, hello_skills, hello_memory, hello_agent,
seed_enterprise_demo, hello_enterprise (8 cross-backend questions).
See DataMind-Doc for architecture, configuration reference, per-capability deep dives, and tutorials in English and Chinese.