Agentic RAG for Drug Knowledge Retrieval, Reasoning, and Evidence Synthesis
DrugClaw is a drug-centered multi-agent RAG system designed for queries that generic assistants often handle poorly: drug targets, adverse drug reactions, drug-drug interactions, mechanisms of action, pharmacogenomics, repurposing, and evidence synthesis across heterogeneous biomedical resources.
It is not a generic RAG stack with a biomedical prompt on top. DrugClaw is opinionated around drug-native tasks from resource organization to retrieval strategy, reasoning flow, and final answer structure.
Most biomedical QA systems stop at "retrieve a few passages and summarize them." Drug questions are harder: they require precise handling of target evidence, ADR provenance, DDI mechanisms, labeling details, and PGx constraints. Some tools connect many databases but flatten them into a single rigid interface; others optimize for conversational UX while relying on weak, thin, or poorly traceable evidence.
- Organizes drug resources through a registry-backed 15-subcategory skill tree
- Uses a Code Agent to query each source in its native style instead of forcing one rigid abstraction
- Supports graph-based reasoning for multi-hop evidence synthesis
- Keeps Web Search as a fallback for recent literature and external evidence
- Built around drug-native tasks, not generic biomedical branding
The runtime resource registry is the source of truth for what is currently enabled, degraded, missing local metadata, or disabled. Availability depends on the environment, local files under resources_metadata/, optional dependencies, and API reachability.
In short, DrugClaw is not trying to be just another fluent assistant. Its goal is to raise resource density, retrieval fidelity, and evidence-grounded reasoning at the same time.
Run the commands below from the cloned repository root.
pip install langgraph openai
Optional dependencies for selected CLI-based skills:
pip install chembl_webresource_client
pip install libchebipy
pip install bioservices
DrugClaw works with any OpenAI-compatible API endpoint, including OpenAI, Azure OpenAI, LLaMA served via vLLM or Ollama, and other OpenAI-compatible providers.
First copy the template:
cp api_keys.example.json api_keys.json
Then fill in your real credentials:
{
"api_key": "your-api-key-here",
"base_url": "https://your-endpoint.com/v1",
"model": "gpt-4o",
"max_tokens": 20000,
"timeout": 60,
"temperature": 0.7
}
Example configurations for common providers:
| Provider | base_url | model |
|---|---|---|
| OpenAI | https://api.openai.com/v1 | gpt-4o, gpt-4o-mini |
| Azure OpenAI | https://YOUR.openai.azure.com/v1 | your deployment name |
| vLLM (local LLaMA) | http://localhost:8000/v1 | meta-llama/Llama-3.1-8B-Instruct |
| Ollama | http://localhost:11434/v1 | llama3.1, qwen2.5 |
| Together AI | https://api.together.xyz/v1 | meta-llama/Llama-3.1-70B-Instruct-Turbo |
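Before the first run it can help to confirm that api_keys.json is complete. The sketch below is one hypothetical way to do that with the standard library only. The helper name check_llm_config and the choice of which fields count as required are assumptions made for illustration; they are not part of DrugClaw, whose own check is python -m drugclaw doctor.

```python
import json
from pathlib import Path

# Field names come from the api_keys.json template above.
# Treating exactly these three as required is an assumption of this sketch.
REQUIRED_FIELDS = ("api_key", "base_url", "model")
PLACEHOLDER_API_KEY = "your-api-key-here"


def check_llm_config(path):
    """Return a list of problems; an empty list means the file looks usable."""
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return [f"{path} must contain a JSON object"]

    problems = []
    for field in REQUIRED_FIELDS:
        if not config.get(field):
            problems.append(f"missing or empty field: {field}")
    if config.get("api_key") == PLACEHOLDER_API_KEY:
        problems.append("api_key still holds the template placeholder")
    return problems
```

Calling check_llm_config("api_keys.json") from the repository root returns an empty list when the three fields are filled in and the placeholder key has been replaced.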
This is the recommended first experience. At this stage you only need a working LLM config; local resource packs can wait until later.
You can run it without installation:
python -m drugclaw list
python -m drugclaw doctor
python -m drugclaw demo
The default demo uses:
- SIMPLE mode
- online labeling-style resources
- a metformin labeling and safety query
You can also run your own query:
python -m drugclaw run --query "What are the known drug targets of imatinib?"
If the demo runs successfully, you already have a minimal usable setup. The next step is optional and only matters when you want broader coverage from LOCAL_FILE skills and local datasets.
Many skills use LOCAL_FILE access mode. Those resources are not required for the first demo, but they improve coverage and unlock skills that depend on local datasets.
Recommended resolution order:
- Use files already present under resources_metadata/...
- If missing, sync from the maintained mirror first
- Only fall back to the original source site if the mirror does not contain the data
- Do not commit private credentials, local snapshots, or temporary downloads under resources_metadata/; only keep curated minimal fixtures that are required for tests
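The resolution order above can be sketched as a small decision function. This is a minimal illustration, not DrugClaw's implementation: resolve_resource, its parameters, and the mirror_paths set (a caller-supplied listing of what the mirror contains) are hypothetical names introduced here.

```python
from pathlib import Path

# Maintained mirror named in this README.
MIRROR_URL = "https://huggingface.co/datasets/Mike2481/DrugClaw_resources_data"


def resolve_resource(relative_path, root="resources_metadata", mirror_paths=frozenset()):
    """Return (source, location) following the recommended resolution order.

    relative_path: path of the file below the local root (hypothetical example values).
    mirror_paths:  relative paths the caller knows to exist in the mirror (assumption).
    """
    local = Path(root) / relative_path
    if local.exists():
        # 1. Prefer files that are already present locally.
        return ("local", str(local))
    if relative_path in mirror_paths:
        # 2. Otherwise sync from the maintained mirror.
        return ("mirror", MIRROR_URL)
    # 3. Only then fall back to the original source site, which this sketch cannot locate.
    return ("original_source", None)
```

The function only decides where to look; downloading and placing the file under resources_metadata/ is left to the caller.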
Maintained mirror:
https://huggingface.co/datasets/Mike2481/DrugClaw_resources_data
Directory examples:
- resources_metadata/dti/...
- resources_metadata/adr/...
- resources_metadata/drug_knowledgebase/...
- resources_metadata/drug_repurposing/...
- resources_metadata/ddi/...
If some old SKILL.md, example.py, or archived docs still show absolute paths, treat them as legacy examples. The active convention is the repository-local resources_metadata/... layout.
Downloading local resources is recommended if you want more stable retrieval from dataset-backed skills and better overall coverage than the minimal online-first demo path.
pip install -e . --no-build-isolation
git config core.hooksPath .githooks
drugclaw list
drugclaw doctor
drugclaw demo
drugclaw run --query "What are the known drug targets of imatinib?"
The main entrypoint is still the CLI. If you want sample scripts, they now live under examples/.
The lightweight demo wrapper is:
python examples/run_minimal.py
You can also pass CLI arguments through it:
python examples/run_minimal.py demo --preset label
python examples/run_minimal.py run --query "What prescribing and safety information is available for metformin?"
python -m drugclaw doctor
It checks:
- whether api_keys.json (or navigator_api_keys.json) exists and is complete
- whether langgraph and openai are importable
- whether built-in demo presets have the resources they need
- whether the drugclaw command is installed
- whether repository Git hooks are enabled
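To make some of these checks concrete, here is a rough standard-library sketch of three of them: config file presence, module importability, and whether the command is on PATH. The function environment_report and its return shape are hypothetical; the real doctor command may check these differently and covers more than this.

```python
import importlib.util
import shutil
from pathlib import Path


def environment_report(
    config_candidates=("api_keys.json", "navigator_api_keys.json"),
    modules=("langgraph", "openai"),
    command="drugclaw",
):
    """Collect a few doctor-style facts about the current environment."""
    return {
        # True if either accepted config file exists in the working directory.
        "config_file_present": any(Path(name).is_file() for name in config_candidates),
        # find_spec returns None for a top-level module that cannot be imported.
        "importable_modules": {
            name: importlib.util.find_spec(name) is not None for name in modules
        },
        # shutil.which returns None when the command is not on PATH.
        "command_on_path": shutil.which(command) is not None,
    }
```

Running it from the repository root gives a quick dictionary to inspect before reaching for the full doctor output.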
git config core.hooksPath .githooks
These hooks block committing API key files.
python -m drugclaw list
It shows:
- built-in demo presets
- supported thinking modes
- recommended first commands
- common resource filter combinations
from drugclaw.config import Config
from drugclaw.main_system import DrugClawSystem
from drugclaw.models import ThinkingMode

# Load the LLM settings and build the system once.
config = Config(key_file="api_keys.json")
system = DrugClawSystem(config)

# Run a labeling query against three specific resources in SIMPLE mode.
result = system.query(
    "What prescribing and safety information is available for metformin?",
    thinking_mode=ThinkingMode.SIMPLE,
    resource_filter=["DailyMed", "openFDA Human Drug", "MedlinePlus Drug Info"],
)
print(result["answer"])

# The thinking mode is chosen per query.
from drugclaw.models import ThinkingMode
system.query("...", thinking_mode=ThinkingMode.GRAPH)
system.query("...", thinking_mode=ThinkingMode.SIMPLE)
system.query("...", thinking_mode=ThinkingMode.WEB_ONLY)
Each skill ships with its own SKILL.md and example.py. The Code Agent reads both, learns the source's native usage pattern, and generates query code dynamically for the current task.
That means DrugClaw does not require every database, API, or dataset to pretend to be the same thing.
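As a rough illustration of the idea, the sketch below assembles a prompt from a skill's SKILL.md and example.py plus the current task. Everything here is hypothetical: the function name, the prompt wording, and the way the two files are combined are assumptions, and the actual Code Agent may work quite differently.

```python
from pathlib import Path


def build_code_agent_prompt(skill_dir, task):
    """Combine a skill's own documentation and example into one prompt string."""
    skill_dir = Path(skill_dir)
    skill_doc = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    example_code = (skill_dir / "example.py").read_text(encoding="utf-8")
    # Keep the source's native usage pattern visible instead of
    # translating it into a shared, lowest-common-denominator interface.
    sections = [
        "You are writing query code for one drug resource.",
        "## Resource notes (SKILL.md)\n" + skill_doc.strip(),
        "## Working example (example.py)\n" + example_code.strip(),
        "## Task\n" + task.strip(),
    ]
    return "\n\n".join(sections)
```

Because each prompt is built from that skill's own files, a REST API, a CLI client, and a local CSV can each be queried in the style their example demonstrates.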
For LOCAL_FILE skills, the recommended default behavior is:
- check
resources_metadata/...first - if missing, guide the user to the maintained Hugging Face mirror
- do not assume the original download endpoint is still reliable
DrugClaw covers all 15 subcategories through the runtime resource registry:
- drug targets and activity (DTI)
- adverse drug reactions and pharmacovigilance (ADR)
- drug knowledgebases
- mechanisms of action
- labeling and prescribing information
- ontology and normalization
- drug repurposing
- pharmacogenomics
- drug-drug interactions
- drug toxicity
- drug combinations
- drug molecular properties
- drug-disease associations
- patient reviews
- drug NLP / text mining
- GRAPH: retrieve -> graph build -> rerank -> respond -> reflect
- SIMPLE: retrieve and answer directly
- WEB_ONLY: use only online search and literature retrieval
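The three modes can be read as different stage sequences. The sketch below encodes them in a lookup table. The ThinkingMode enum here is a local stand-in for drugclaw.models.ThinkingMode with assumed values, the stage names are labels chosen for this example, and the final respond stage for WEB_ONLY is an assumption.

```python
from enum import Enum


class ThinkingMode(Enum):
    # Stand-in for drugclaw.models.ThinkingMode; the string values are assumptions.
    GRAPH = "graph"
    SIMPLE = "simple"
    WEB_ONLY = "web_only"


# Stage order per mode, following the descriptions in this README.
STAGES = {
    ThinkingMode.GRAPH: ["retrieve", "graph_build", "rerank", "respond", "reflect"],
    ThinkingMode.SIMPLE: ["retrieve", "respond"],
    # Assumed: web-only retrieval is still followed by a response stage.
    ThinkingMode.WEB_ONLY: ["web_search", "respond"],
}


def plan(mode):
    """Return a fresh list of stage names for the given mode."""
    return list(STAGES[mode])
```

SIMPLE is the shortest path and suits a first run; GRAPH adds the graph, rerank, and reflect stages for multi-hop questions.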
DrugClaw is suitable for questions such as:
- "What are the known targets, adverse effects, and interaction risks of imatinib?"
- "Which approved drugs may be repurposed for triple-negative breast cancer?"
- "What pharmacogenomic guidance exists for clopidogrel and CYP2C19?"
- "Are there clinically meaningful interactions between warfarin and NSAIDs?"
User Query
    |
    v
Retriever Agent
    |- navigates the 15-subcategory skill tree
    |- extracts key entities
    |- selects relevant resources
    |
    v
Code Agent
    |- reads SKILL.md + example.py
    |- writes custom query code
    |- executes resource-specific retrieval
    |
    +--> SIMPLE mode --> Responder --> Final Answer
    |
    +--> GRAPH mode
            -> Graph Builder
            -> Reranker
            -> Responder
            -> Reflector
            -> optional Web Search
            -> Final Answer
Use the CLI to inspect the current registry summary and per-resource status:
python -m drugclaw list
python -m drugclaw doctor
The list command shows registry-derived totals and a status line for each resource. The doctor command explains why a resource is unavailable, including missing metadata paths and missing dependencies when detectable.
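For intuition about what registry-derived totals might look like, here is a toy summary over a list of entries. The entry format, the status spellings, and the statuses assigned to the sample resources are all invented for illustration; they do not reflect the real registry or the actual state of any resource.

```python
from collections import Counter

# Status names follow the four states this README mentions; spellings are assumptions.
STATUSES = ("enabled", "degraded", "missing_metadata", "disabled")


def summarize(registry):
    """Count resources per status, reporting zero for statuses with no entries."""
    counts = Counter(entry["status"] for entry in registry)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown status values: {sorted(unknown)}")
    return {status: counts.get(status, 0) for status in STATUSES}


# Made-up sample entries; the statuses are not real measurements.
sample_registry = [
    {"name": "DailyMed", "status": "enabled"},
    {"name": "ChEMBL", "status": "enabled"},
    {"name": "DDInter", "status": "missing_metadata"},
]
print(summarize(sample_registry))
```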
If you want the human-readable answer plus the structured claim/evidence summary, use:
python -m drugclaw run --query "What does imatinib target?" --show-evidence

| Category | Skills |
|---|---|
| DTI | ChEMBL, BindingDB, DGIdb, Open Targets Platform, TTD, STITCH, TarKG, GDKD, Molecular Targets, Molecular Targets Data |
| ADR | FAERS, SIDER, nSIDES, ADReCS |
| Drug Knowledgebase | UniD3, DrugBank, IUPHAR/BPS, DrugCentral, CPIC, PharmKG, WHO Essential Medicines List, FDA Orange Book |
| Drug Mechanism | DRUGMECHDB |
| Drug Labeling | openFDA Human Drug, DailyMed, MedlinePlus Drug Info |
| Drug Ontology | RxNorm, ChEBI, ATC/DDD, NDF-RT |
| Drug Repurposing | RepoDB, DRKG, OREGANO, Drug Repurposing Hub, DrugRepoBank, RepurposeDrugs |
| Pharmacogenomics | PharmGKB |
| DDI | MecDDI, DDInter, KEGG Drug |
| Drug Toxicity | UniTox, LiverTox, DILIrank, DILI |
| Drug Combination | DrugCombDB, DrugComb |
| Drug Molecular Property | GDSC |
| Drug Disease | SemaTyP |
| Drug Review | WebMD Drug Reviews, Drug Reviews (Drugs.com) |
| Drug NLP | DDI Corpus 2013, DrugProt, ADE Corpus, CADEC, PsyTAR, TAC 2017 ADR, PHEE |
WebSearch is also available as an external retrieval supplement built around DuckDuckGo + PubMed style search.
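The table above can also drive the resource_filter argument. The sketch below copies three rows into a dictionary and builds a filter list from category names. It assumes resource_filter accepts skill names exactly as spelled in the table, which matches the metformin example in this README but is not otherwise confirmed; resource_filter_for is a hypothetical helper.

```python
# Three rows copied from the skills table; extend with other rows as needed.
SKILLS_BY_CATEGORY = {
    "Drug Labeling": ["openFDA Human Drug", "DailyMed", "MedlinePlus Drug Info"],
    "DDI": ["MecDDI", "DDInter", "KEGG Drug"],
    "Pharmacogenomics": ["PharmGKB"],
}


def resource_filter_for(*categories):
    """Return the skill names for the given categories, in order, without duplicates."""
    names = []
    for category in categories:
        for name in SKILLS_BY_CATEGORY[category]:
            if name not in names:
                names.append(name)
    return names
```

For instance, resource_filter_for("DDI", "Pharmacogenomics") yields a four-name list that could be passed as resource_filter for an interaction question with a PGx angle.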
README.md / README_CN.md
    GitHub-facing entry docs
examples/
    Optional runnable examples and compatibility wrappers
scripts/legacy/
    Historical maintainer helpers, not part of the public interface
drugclaw/
    Package runtime and CLI entrypoints
skills/
    <subcategory>/<skill_name>/
        *_skill.py
        example.py
        SKILL.md
        README.md
skillexamples/
    Resource-specific usage examples and operator notes
tools/
    Maintainer smoke checks and resource validation scripts
resources_metadata/
    local data files
For a short contributor-facing directory guide, see docs/repository-guide.md.
DrugClaw does not require every biomedical source to be flattened into a single interface.
DrugClaw can turn free-form retrieval results into triples, subgraphs, ranked paths, and evidence-aware answers rather than simply stitching together excerpts.
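A toy version of that idea: store evidence as (head, relation, tail) triples, index them by head entity, and search for a bounded multi-hop path. This is a generic breadth-first search written for illustration, not DrugClaw's graph builder or reranker, and the two triples are hand-written sample data rather than retrieved output.

```python
from collections import defaultdict, deque


def build_graph(triples):
    """Index (head, relation, tail) triples by head entity."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def find_path(graph, start, goal, max_hops=3):
    """Breadth-first search for the shortest chain of triples from start to goal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


# Hand-written sample triples for illustration only.
triples = [
    ("imatinib", "inhibits", "ABL1"),
    ("ABL1", "associated_with", "chronic myeloid leukemia"),
]
graph = build_graph(triples)
print(find_path(graph, "imatinib", "chronic myeloid leukemia"))
```

Each step of the returned path is itself a triple, so an answer built on it can point back to the evidence behind every hop.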
This system is built around drug tasks: DTI, ADR, DDI, labeling, repurposing, PGx, and mechanism reasoning.
- The repository can be imported directly from the project root.
- pyproject.toml is now aligned with the current package layout for local CLI usage.
- Some skills still depend on local files under resources_metadata/.
- Multi-iteration GRAPH behavior still depends on further configuration such as MAX_ITERATIONS.
If you use DrugClaw in research or product work, cite this repository and the upstream resources used by the selected skills.
MIT License

