Search, read, and analyze 8M+ biomedical papers from the command line.
Paperclip is a CLI and MCP server for AI agents where every paper is a directory containing full text, sections, figures, and supplements on a virtual filesystem at /papers/.
This repository hosts the source code for the Paperclip CLI client. Use it to:
- Search with natural language or regex across 8M+ papers from bioRxiv, medRxiv, and PubMed Central
- Run parallel AI readers across papers with map and synthesize with reduce
- Pipe results through standard Unix tools (grep, awk, sed, jq, etc.)
- Ask questions about figures with vision AI
- Query the database directly with SQL
Full documentation: paperclip.gxl.ai
Python 3.8+ required.
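Since Python 3.8+ is required, a pre-flight check before installing might look like the sketch below. version_at_least is a hypothetical helper written for this example, not part of Paperclip, and it compares only the major and minor components.

```shell
# Succeed (exit status 0) when version $1 is at least version $2.
# Only "major.minor" is compared; a patch component such as ".4" is ignored.
# The body runs in a subshell, so the helper variables do not leak.
version_at_least() (
  have_major=${1%%.*}
  have_rest=${1#*.}
  have_minor=${have_rest%%.*}
  want_major=${2%%.*}
  want_rest=${2#*.}
  want_minor=${want_rest%%.*}
  if [ "$have_major" -ne "$want_major" ]; then
    [ "$have_major" -gt "$want_major" ]
  else
    # Numeric comparison, so 3.11 correctly counts as newer than 3.8.
    [ "$have_minor" -ge "$want_minor" ]
  fi
)

# Example use (assumes python3 is on PATH):
#   have=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
#   version_at_least "$have" 3.8 || echo "Python 3.8+ is required" >&2
```

The comparison is numeric rather than lexical because a plain string comparison would rank 3.11 below 3.8.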
curl -fsSL https://paperclip.gxl.ai/install.sh | bash
Installs to ~/.paperclip/ with a wrapper at ~/.local/bin/paperclip.
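Because the wrapper lives in ~/.local/bin, the paperclip command is only found if that directory is on your PATH. A minimal sketch of that check follows; dir_on_path is a hypothetical helper for illustration, not something the installer provides.

```shell
# Succeed when directory $1 appears as an entry in a colon-separated
# path list. The list defaults to $PATH; $2 overrides it (handy for testing).
dir_on_path() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Example use:
#   dir_on_path "$HOME/.local/bin" || echo "$HOME/.local/bin is not on PATH" >&2
```

Wrapping the list in colons lets the pattern match whole entries only, so /home/me/.local does not falsely match /home/me/.local/bin.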
Or install via pip:
pip install https://paperclip.gxl.ai/paperclip.whl
paperclip setup
Sign-in happens automatically on first use, or run manually:
paperclip login
paperclip config
# Server: https://paperclip.gxl.ai
# Auth: ✓ you@example.com
# Config: ~/.paperclip
Use Paperclip as an MCP server directly; no local install needed.
claude mcp add --transport http paperclip https://paperclip.gxl.ai/mcp
Then start claude, enter /mcp, and select Authenticate under the paperclip server.
Add to ~/.cursor/mcp.json (or .cursor/mcp.json in your project):
{
  "mcpServers": {
    "paperclip": {
      "url": "https://paperclip.gxl.ai/mcp",
      "type": "http"
    }
  }
}
Then Cmd/Ctrl + Shift + P → Tools & MCPs, enable the paperclip server, and authenticate.
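If you prefer to script the Cursor setup, the sketch below writes the same JSON to a path you choose. write_cursor_mcp_config is a hypothetical helper for this example. It deliberately refuses to touch an existing file, because merging into a config that already lists other servers needs a real JSON tool.

```shell
# Write the Paperclip MCP entry shown above to the file given as $1.
# Fails without writing anything if the file already exists.
write_cursor_mcp_config() {
  mcp_target=$1
  if [ -e "$mcp_target" ]; then
    echo "refusing to overwrite existing $mcp_target" >&2
    return 1
  fi
  mkdir -p "$(dirname "$mcp_target")" || return 1
  # Quoted heredoc delimiter: the JSON is written verbatim, no expansion.
  cat > "$mcp_target" <<'JSON'
{
  "mcpServers": {
    "paperclip": {
      "url": "https://paperclip.gxl.ai/mcp",
      "type": "http"
    }
  }
}
JSON
}

# Example use:
#   write_cursor_mcp_config "$HOME/.cursor/mcp.json"
```

You still need to enable the server and authenticate inside Cursor afterwards.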
# Search for papers
paperclip search "CRISPR base editing efficiency"
# Read a paper's metadata
paperclip cat /papers/bio_4f78753a6feb/meta.json
# Preview the first 50 lines
paperclip head -50 /papers/bio_4f78753a6feb/content.lines
# Grep within a single paper
paperclip grep -i "binding affinity" /papers/bio_4f78753a6feb/content.lines
# Regex search across the entire corpus (sub-second)
paperclip grep "alphamissense" /papers/
# Map over search results with an AI reader
paperclip map --from s_abc123 "What methods were used?"
# Run SQL queries
paperclip sql "SELECT title, doi FROM documents WHERE authors ILIKE '%Doudna%' LIMIT 5"
# Save results to a local file
paperclip search "CRISPR" -n 5 > results.txt
Use paperclip bash '...' for pipes and chains:
paperclip bash 'search "protein folding" | grep "deep learning"'

| Command | Description |
|---|---|
| search | Hybrid search (BM25 + vector) across 8M+ papers |
| searches | Run multiple queries in parallel and merge results |
| grep | Regex search within a paper or across the entire corpus |
| scan | Multi-pattern grep in a single pass |
| lookup | Find papers by DOI, PMC ID, PMID, author, title, journal |
| sql | Read-only SQL queries against the papers database |
| export | Export SQL results to CSV |
| map | Parallel AI reader across multiple papers |
| reduce | Synthesize map results into summaries, tables, or themes |
| filter | Filter search results for relevance |
| ask-image | Analyze figures with vision AI |
| cat | Read files from the paper filesystem |
| head / tail | Preview first or last lines |
| ls / tree | List directory contents |
| grep / scan | Search within papers |
| sed / awk / jq | Text processing |
| results | View, browse, and export saved results |
| pull | Download papers or files locally |
| config | Show or set configuration |
| install | Install agent skill for Claude Code, Cursor, or Codex |
| update | Update to the latest version |
Install a skill so your coding agent can use Paperclip automatically:
paperclip install
Supports Claude Code, Cursor, and Codex. The skill teaches the agent the full command set. Then just mention /paperclip in your prompt:
Using /paperclip, find recent papers on GLP-1 receptor agonists and summarize the primary endpoints.
Each paper lives at /papers/<id>/:
meta.json — title, authors, doi, date, abstract, journal
content.lines — full text, line-numbered (L<n>: <text>)
sections/ — named section files (Introduction.lines, Methods.lines, ...)
figures/ — figure files (PMC papers)
supplements/ — supplementary files (PMC papers)
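Since content.lines prefixes every line with L<n>:, a small filter can pull out a line range and drop the prefixes. This is a sketch that assumes the format is exactly as described above; lines_range is a hypothetical helper, not a Paperclip command.

```shell
# Read content.lines-style input on stdin and print the text of lines
# $1 through $2 (inclusive), with the "L<n>: " prefix removed.
lines_range() {
  awk -v first="$1" -v last="$2" '
    # Only consider lines that start with the "L<n>:" prefix.
    match($0, /^L[0-9]+: ?/) {
      # Numeric conversion keeps the leading digits and ignores the rest.
      n = substr($0, 2) + 0
      if (n >= first && n <= last)
        print substr($0, RSTART + RLENGTH)
    }
  '
}

# Example use (assumes paperclip cat writes the raw file to stdout):
#   paperclip cat /papers/bio_4f78753a6feb/content.lines | lines_range 10 20
```

Filtering on the embedded number rather than on the physical line position keeps the result correct even if the input was already trimmed by head or grep.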
Paper IDs use prefixes by source: bio_ (bioRxiv), med_ (medRxiv), PMC (PubMed Central).
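The prefix rule above makes it easy to tell where a paper came from without a lookup. A minimal sketch follows; paper_source is a hypothetical helper, and the PMC ID in the example is made up for illustration.

```shell
# Print the source name for a paper ID based on its prefix.
# Returns a non-zero status for IDs that match none of the known prefixes.
paper_source() {
  case "$1" in
    bio_*) echo "bioRxiv" ;;
    med_*) echo "medRxiv" ;;
    PMC*)  echo "PubMed Central" ;;
    *)     echo "unknown"; return 1 ;;
  esac
}

# Example use:
#   paper_source bio_4f78753a6feb   # bioRxiv
#   paper_source PMC1234567         # PubMed Central (hypothetical ID)
```

This matters in practice because, per the layout above, figures/ and supplements/ are listed for PMC papers, so a script can check the source before looking for them.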
Apache-2.0 — see LICENSE.