Search, read, and analyze biomedical papers, regulatory documents, and clinical trials from the command line.
Paperclip is a CLI and MCP server for AI agents where every document is a directory containing full text, sections, figures, and supplements on a virtual filesystem.
- Search with natural language or regex across biomedical papers from bioRxiv, medRxiv, arXiv, and PubMed Central, plus FDA regulatory documents, ClinicalTrials.gov, and international regulatory and trial registries
- Run parallel AI readers across papers with `map` and synthesize with `reduce`
- Pipe results through standard Unix tools (`grep`, `awk`, `sed`, `jq`, etc.)
- Ask questions about figures with vision AI
- Query the database directly with SQL
Full documentation: paperclip.gxl.ai
This repository hosts the source code for the Paperclip CLI client. Use it to:
Python 3.8+ required.
curl -fsSL https://paperclip.gxl.ai/install.sh | bash

Installs to ~/.paperclip/ with a wrapper at ~/.local/bin/paperclip.
Or install via pip:
pip install https://paperclip.gxl.ai/paperclip.whl
paperclip setup

Sign-in happens automatically on first use, or run manually:
paperclip login

paperclip config
# Server: https://paperclip.gxl.ai
# Auth: ✓ you@example.com
# Config: ~/.paperclip

Use Paperclip directly as an MCP server; no local install is needed.
claude mcp add --transport http paperclip https://paperclip.gxl.ai/mcp

Then start claude, enter /mcp, and select Authenticate under the paperclip server.
Add to ~/.cursor/mcp.json (or .cursor/mcp.json in your project):
{
"mcpServers": {
"paperclip": {
"url": "https://paperclip.gxl.ai/mcp",
"type": "http"
}
}
}

Then Cmd/Ctrl + Shift + P → Tools & MCPs, enable the paperclip server, and authenticate.
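If `~/.cursor/mcp.json` already lists other servers, hand-editing it is easy to get wrong. The sketch below merges the `paperclip` entry shown above into an existing config while leaving other servers untouched. The helper names (`merge_paperclip_server`, `update_mcp_file`) are hypothetical, and it assumes the file is plain JSON without comments.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Entry copied from the Cursor config above; helper names are hypothetical.
PAPERCLIP_ENTRY: Dict[str, str] = {"url": "https://paperclip.gxl.ai/mcp", "type": "http"}


def merge_paperclip_server(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an mcp.json dict with the "paperclip" server added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep any other configured servers
    servers["paperclip"] = dict(PAPERCLIP_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_mcp_file(path: Path) -> Dict[str, Any]:
    """Read path if it exists (assumed plain JSON), merge the entry, and write it back."""
    existing: Dict[str, Any] = {}
    if path.exists() and path.read_text(encoding="utf-8").strip():
        existing = json.loads(path.read_text(encoding="utf-8"))
    merged = merge_paperclip_server(existing)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged
```

For example, `update_mcp_file(Path.home() / ".cursor" / "mcp.json")` updates the global config; pass a project's `.cursor/mcp.json` path instead for a per-project setup.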
# Search for papers
paperclip search "CRISPR base editing efficiency"
# Read a paper's metadata
paperclip cat /papers/bio_4f78753a6feb/meta.json
# Preview the first 50 lines
paperclip head -50 /papers/bio_4f78753a6feb/content.lines
# Grep within a single paper
paperclip grep -i "binding affinity" /papers/bio_4f78753a6feb/content.lines
# Regex search across the entire corpus (sub-second)
paperclip grep "alphamissense" /papers/
# Map over search results with an AI reader
paperclip map --from s_abc123 "What methods were used?"
# Run SQL queries
paperclip sql "SELECT title, doi FROM documents WHERE authors ILIKE '%Doudna%' LIMIT 5"
# Save results to a local file
paperclip search "CRISPR" -n 5 > results.txt

Use paperclip bash '...' for pipes and chains:
paperclip bash 'search "protein folding" | grep "deep learning"'

| Command | Description |
|---|---|
| `search` | Hybrid search (BM25 + vector) across papers, regulatory documents, and trials |
| `searches` | Run multiple queries in parallel and merge results |
| `grep` | Regex search within a paper or across the entire corpus |
| `scan` | Multi-pattern grep in a single pass |
| `lookup` | Find papers by DOI, PMC ID, PMID, author, title, journal |
| `sql` | Read-only SQL queries against the papers database |
| `map` | Parallel AI reader across multiple papers |
| `reduce` | Synthesize map results into summaries, tables, or themes |
| `filter` | Filter search results for relevance |
| `ask-image` | Analyze figures with vision AI |
| `cat` | Read files from the paper filesystem |
| `head` / `tail` | Preview first or last lines |
| `ls` / `tree` | List directory contents |
| `grep` / `scan` | Search within papers |
| `sed` / `awk` / `jq` | Text processing |
| `results` | View, browse, and export saved results |
| `config` | Show or set configuration, connection diagnostics |
| `install` | Install agent skill for Claude Code, Cursor, or Codex |
| `update` | Update to the latest version |
| **Paper Repos** | |
| `init` | Create a new paper repo |
| `checkout` | List repos, switch repos or branches |
| `add` / `remove` | Add or remove papers |
| `import` | Seed repo from a paper's bibliography |
| `commit` | Snapshot with reasoning message |
| `annotate` | Pin notes to specific papers |
| `status` | Repo state: papers, branches, annotations |
| `log` | Commit history |
| `diff` | Compare commits or branches |
| `export` | Export to BibTeX, RIS, Markdown, or CSV |
| `branch` / `merge` | Branching and merging |
| `cite` | Citation counts and relationships |
Install a skill so your coding agent can use Paperclip automatically:
paperclip install

Supports Claude Code, Cursor, and Codex. The skill teaches the agent the full command set. Then just mention /paperclip in your prompt:
Using /paperclip, find recent papers on GLP-1 receptor agonists and summarize the primary endpoints.
Each paper lives at /papers/<id>/:
- `meta.json`: title, authors, doi, date, abstract, journal
- `content.lines`: full text, line-numbered (`L<n>: <text>`)
- `sections/`: named section files (`Introduction.lines`, `Methods.lines`, ...)
- `figures/`: figure files (PMC papers)
- `supplements/`: supplementary files (PMC papers)
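Because `content.lines` prefixes every line with `L<n>: `, line numbers survive when you pull text into your own code. A minimal parser, assuming exactly that `L<n>: <text>` layout; `parse_content_lines` and `find_lines` are illustrative names, not part of the SDK:

```python
import re
from typing import Dict

# Assumes each line looks like "L<n>: <text>"; the text after the colon may be empty.
_LINE = re.compile(r"^L(\d+): ?(.*)$")


def parse_content_lines(raw: str) -> Dict[int, str]:
    """Map line numbers to text; lines without the L<n>: prefix are skipped."""
    parsed: Dict[int, str] = {}
    for line in raw.splitlines():
        match = _LINE.match(line)
        if match:
            parsed[int(match.group(1))] = match.group(2)
    return parsed


def find_lines(parsed: Dict[int, str], needle: str) -> Dict[int, str]:
    """Case-insensitive substring search over parsed lines, similar in spirit to grep -i."""
    lowered = needle.lower()
    return {n: text for n, text in parsed.items() if lowered in text.lower()}
```

The raw text could come from `paperclip cat /papers/<id>/content.lines > paper.lines`; whether the SDK's `papers.cat(...).output` matches the file byte for byte is an assumption worth checking before relying on it.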
Paper IDs use prefixes by source: bio_ (bioRxiv), med_ (medRxiv), PMC (PubMed Central), arx_ (arXiv). Regulatory documents and clinical trials are accessed via /fda/ and /clinicaltrials/ virtual directories.
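The prefix rules above fit in a small lookup, which can help when routing IDs taken from search output. This is a sketch: the mapping mirrors the sentence above, `paper_source` and `paper_path` are hypothetical helpers, and regulatory documents and trials (under `/fda/` and `/clinicaltrials/`) are deliberately not handled.

```python
from typing import Optional

# Prefix-to-source mapping taken from the paragraph above.
_SOURCE_PREFIXES = (
    ("bio_", "bioRxiv"),
    ("med_", "medRxiv"),
    ("arx_", "arXiv"),
    ("PMC", "PubMed Central"),
)


def paper_source(paper_id: str) -> Optional[str]:
    """Return the source name for a paper ID, or None if the prefix is not recognised."""
    for prefix, source in _SOURCE_PREFIXES:
        if paper_id.startswith(prefix):
            return source
    return None


def paper_path(paper_id: str, *parts: str) -> str:
    """Build a path under /papers/<id>/, e.g. paper_path(pid, "sections", "Methods.lines")."""
    return "/".join(["/papers", paper_id, *parts])
```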
Build versioned, annotated collections of papers with git-like workflows:
# Create a repo and seed from a key paper's references
paperclip init my-review "Systematic review of XYZ"
paperclip import PMC11271413 --min-cites 50
paperclip import refs.bib # import .bib/.ris → library + repo
# View your personal library (persists across repos)
paperclip library
# Curate: annotate, commit
paperclip annotate PMC123 "Key finding on mechanism X"
paperclip commit -m "Initial seed from review + manual curation"
# Review your work
paperclip repo # list all repos
paperclip repo <name> # repo overview: papers, branches, annotations
paperclip log # commit history
paperclip diff 9a6d..559a # compare commits
# Export to reference managers
paperclip export bib -o refs.bib # BibTeX (annotations in note field)
paperclip export ris -o refs.ris # RIS (Zotero, Paperpile, Mendeley, EndNote)
paperclip export md -o review.md # structured markdown report
paperclip export csv -o papers.csv # tabular data

Redirect cat to write any paper file to disk. Text files come back as text; figures and other binaries stream as raw bytes when stdout is redirected (no base64 wrapping):
paperclip cat /papers/PMC10791696/meta.json > meta.json
paperclip cat /papers/PMC10791696/figures/fig1.tif > fig1.tif

For bulk downloads, loop over ls:
mkdir -p figures
# Word-splits the ls output, so this assumes file names without spaces
for f in $(paperclip ls /papers/PMC10791696/figures/); do
  paperclip cat "/papers/PMC10791696/figures/$f" > "figures/$f"
done

The gxl-paperclip package ships a Python SDK alongside the CLI, so you can call Paperclip directly from scripts, notebooks, and other tools. Installing the package (via pip install or the installer script above) gives you both the paperclip command and the gxl_paperclip module.
The SDK uses API keys (OAuth is reserved for interactive CLI sign-in). Create a key from the dashboard and make it available to your code:
export PAPERCLIP_API_KEY="pk_..."

from gxl_paperclip import PaperclipClient
client = PaperclipClient.from_env() # picks up PAPERCLIP_API_KEY
# — or pass an explicit strategy —
from gxl_paperclip import APIKeyAuth
client = PaperclipClient(auth=APIKeyAuth("pk_..."))

from_env() falls back to the credentials saved by paperclip login (~/.paperclip/credentials.json) via FileCredentialsAuth when no API key is set, which is handy on a workstation where you've already signed in.
from gxl_paperclip import PaperclipClient
client = PaperclipClient.from_env()
result = client.search("CRISPR lipid nanoparticle", limit=5, source="pmc")
print(result.output) # same formatted text the CLI prints
print(result.result_id) # e.g. "s_14bebc10" — pass to map_()
for event in client.map_("What delivery methods were used?", from_results=result.result_id):
    if event.type == "progress":
        print(f"{event.completed}/{event.total} papers done")
    else:
        print(event.output)

Every optional kwarg defaults to None (or False for flags) on the client, which means the flag is omitted from the underlying command; the server then applies its own default.
client.search(query, *, limit=None, source=None, exact=False, since=None, sort=None, author=None, journal=None, year=None, type=None, category=None, mode=None, all=False, timeout=None) -> ExecuteResult
Hybrid search across bioRxiv, medRxiv, arXiv, PubMed Central, FDA, ClinicalTrials.gov, and international registries.
| Argument | Default when omitted | Notes |
|---|---|---|
| `query` | required | Natural-language query string. |
| `limit` | 100 | Server caps at 1000. |
| `source` | PMC, bioRxiv, medRxiv, arXiv | Pass "pmc", "biorxiv", "medrxiv", "arxiv", "abstracts", "fda", "trials", or a comma-separated list. |
| `exact` | False | True switches search mode to phrase matching. |
| `since` | no recency filter | e.g. "7d", "30d", "6m", "1y". |
| `sort` | "relevance" | Pass "date" for newest-first. |
| `author` | no filter | Substring match on authors. |
| `journal` | no filter | PMC only. |
| `year` | no filter | e.g. 2024. |
| `type` | no filter | e.g. "review-article" (PMC). |
| `category` | no filter | e.g. "Neuroscience" (bioRxiv). |
| `mode` | "any" | Also supports "all", "50%", "75%". |
| `all` | False | When True, searches the full corpus instead of the default recency-weighted slice. |
| `timeout` | 120 s | Seconds before the request aborts. |
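Since `source` takes a comma-separated string, a small client-side check can catch typos before a request is sent. The `source_arg` helper below is hypothetical (the SDK does not ship it): it validates names against the values listed in the table and joins them, and the server may accept values beyond this list.

```python
from typing import Iterable, List

# Source names taken from the `source` row of the table above.
KNOWN_SOURCES = ("pmc", "biorxiv", "medrxiv", "arxiv", "abstracts", "fda", "trials")


def source_arg(sources: Iterable[str]) -> str:
    """Validate source names and join them into one comma-separated string."""
    chosen: List[str] = []
    for raw in sources:
        name = raw.strip().lower()
        if name not in KNOWN_SOURCES:
            raise ValueError(f"unknown source {raw!r}; expected one of {KNOWN_SOURCES}")
        if name not in chosen:  # drop duplicates, keep first-seen order
            chosen.append(name)
    if not chosen:
        raise ValueError("pass at least one source")
    return ",".join(chosen)
```

Usage: `client.search("CRISPR lipid nanoparticle", source=source_arg(["pmc", "biorxiv"]))`.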
Look up papers by a metadata field.
| Argument | Default when omitted | Notes |
|---|---|---|
| `field` | required | "doi", "pmc", "pmid", "author", "title", "journal", "year", "keywords", etc. |
| `value` | required | The value to match (partial, case-insensitive). |
| `limit` | 25 | |
| `timeout` | 120 s | |
Read-only SQL over the documents table. 15s server-side timeout, 200-row cap.
| Argument | Default when omitted | Notes |
|---|---|---|
| `query` | required | Must be a SELECT against documents. |
| `source` | "all" | Pass "pmc" or "biorxiv" to restrict. |
| `timeout` | 120 s | |
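The SQL call takes the query as a plain string, so interpolating user input (an author name such as O'Neill) needs quoting. A sketch of building the quickstart's author query safely, assuming the backend follows standard SQL string-literal rules (single quotes doubled). `authors_query` is a hypothetical helper, and `%` or `_` inside the name still act as `ILIKE` wildcards.

```python
def sql_string(value: str) -> str:
    """Render value as a standard SQL string literal (embedded single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def authors_query(author_substring: str, limit: int = 5) -> str:
    """Build the README's author-search query for an arbitrary name."""
    pattern = sql_string("%" + author_substring + "%")
    # The server caps results at 200 rows, so larger limits gain nothing.
    capped = max(1, min(int(limit), 200))
    return f"SELECT title, doi FROM documents WHERE authors ILIKE {pattern} LIMIT {capped}"
```

Pass the resulting string as the `query` argument of the SDK's SQL call, or to `paperclip sql` on the command line.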
Run an AI reader against every paper in a prior search/lookup result set. Yields MapProgressEvent objects (OAuth streaming path) followed by a single MapResultEvent.
| Argument | Default when omitted | Notes |
|---|---|---|
| `question` | required | Question asked against each paper. |
| `from_results` | required | Pass the result_id returned by search or lookup. |
| `timeout` | 300 s | Map defaults to the slow-command timeout. |
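The quickstart loop over `map_()` events can be factored into a reusable consumer. This sketch assumes, as the quickstart does, that progress events have `type == "progress"` and the fields listed for `MapProgressEvent`, and it treats any other event as the final `MapResultEvent`. The helper name and the progress-callback signature are our own.

```python
from typing import Any, Callable, Iterable, Optional

# Hypothetical callback signature: (completed, total, failed)
ProgressHook = Callable[[int, int, int], None]


def collect_map_output(events: Iterable[Any], on_progress: Optional[ProgressHook] = None) -> Optional[str]:
    """Drain a map_() event stream; return the final event's output, or None if none arrived."""
    final_output: Optional[str] = None
    for event in events:
        if getattr(event, "type", None) == "progress":
            # MapProgressEvent fields: total, completed, failed, elapsed_s
            if on_progress is not None:
                on_progress(event.completed, event.total, event.failed)
        else:
            # Any other event is treated as the terminal MapResultEvent
            final_output = event.output
    return final_output
```

Usage: `collect_map_output(client.map_("What delivery methods were used?", from_results=result.result_id), on_progress=lambda done, total, failed: print(f"{done}/{total}"))`.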
Download a paper or single file from the virtual filesystem.
| Argument | Default when omitted | Notes |
|---|---|---|
| `target` | required | e.g. "PMC10791696" or "PMC10791696/figures/fig1.jpg". |
| `dest` | current directory | Output directory on the server's side of the command. |
| `timeout` | 120 s | |
Analyze a paper figure with vision AI.

| Argument | Default when omitted | Notes |
|---|---|---|
| `path` | required | Figure path, e.g. "PMC11576387/figures/fx1.jpg". |
| `question` | "Describe this figure in detail." | Custom prompt. |
| `fn` | free-form prompt | Pass "describe" or "extract-data" for canned flows. |
| `timeout` | 300 s | Uses the slow-command default. |
Run an arbitrary server-side pipeline, exactly like paperclip bash '...'.
result = client.bash('search "protein folding" | grep -i "deep learning"')

| Argument | Default when omitted | Notes |
|---|---|---|
| `script` | required | A single shell-style command string. |
| `timeout` | 120 s | |
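When a `bash` script embeds user-supplied text, building it by string concatenation breaks on spaces and quotes. One option is to quote each stage with Python's `shlex.join` (Python 3.8+), assuming the server's parser honours POSIX single-quote quoting; the README only shows double quotes, so verify this first. `pipeline` is a hypothetical helper; for a single command, `client.execute` already quotes argv tokens for you.

```python
import shlex
from typing import Sequence


def pipeline(*stages: Sequence[str]) -> str:
    """Quote each stage's argv tokens POSIX-style and join the stages with pipes."""
    if not stages:
        raise ValueError("at least one stage is required")
    return " | ".join(shlex.join(list(stage)) for stage in stages)
```

Usage: `client.bash(pipeline(["search", "protein folding"], ["grep", "-i", "deep learning"]))`.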
Ping the server and confirm auth works. Returns HealthStatus(reachable: bool, output: str, exit_code: int).
- `client.results.list(*, limit=None) -> list[ResultRow]`: recent saved results for the authenticated user. The server default `limit` is 20.
- `client.results.get(result_id) -> ResultData`: raw saved output for a specific result ID (e.g. "s_14bebc10", "m_ec2c9cc9").
Typed wrappers over the virtual filesystem commands. Each returns an ExecuteResult.
| Method | Defaults |
|---|---|
| `papers.cat(path)` | no options |
| `papers.head(path, *, lines=None)` | lines defaults to the CLI's head default (10). |
| `papers.tail(path, *, lines=None)` | lines defaults to the CLI's tail default (10). |
| `papers.ls(path)` | no options |
| `papers.grep(pattern, path, *, ignore_case=False, extended=False)` | no flags passed when both are False. |
| `papers.scan(path, patterns)` | multiple patterns OR'd in a single pass. |
Escape hatch for any command without a typed wrapper (sed, awk, sort, cut, tr, jq, new server commands, ...). args is a list of argv tokens — the SDK quotes them for you.
result = client.execute("awk", ["-F", "\t", "{print $1}", "/papers/PMC1/content.lines"])

Streaming escape hatch. Currently only "map" streams; other commands raise ValueError.
All HTTP and network failures raise a subclass of PaperclipError:
from gxl_paperclip import (
AuthError, RateLimitError, NotFoundError, ServerError,
RequestTimeoutError, NetworkError,
)
try:
    client.search("AlphaFold")
except AuthError:
    ...  # invalid API key or expired credentials
except RateLimitError:
    ...  # HTTP 429
except RequestTimeoutError:
    ...  # client-side timeout

Types returned by the SDK:

- ExecuteResult(output, exit_code, elapsed_ms, result_id, download_url, download_filename, cwd, raw)
- MapProgressEvent(total, completed, failed, elapsed_s)
- MapResultEvent(output, result_id, elapsed_ms, exit_code)
- ResultRow(result_id, command, raw_input, latency_ms, created_at, raw)
- ResultData(result_id, output, command, raw_input, latency_ms, created_at, raw)
- HealthStatus(reachable, output, exit_code, elapsed_ms)
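`RateLimitError` signals HTTP 429, so a common pattern is to retry with exponential backoff. The wrapper below is a generic sketch: the schedule (1 s, 2 s, 4 s by default) is our choice rather than documented server guidance, and `with_retries` is a hypothetical helper that takes the exception types to retry as a parameter, so it does not depend on the SDK.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on the given exception types with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ... with the defaults
    raise AssertionError("unreachable")
```

Usage: `with_retries(lambda: client.search("AlphaFold"), (RateLimitError,))`. The injectable `sleep` keeps the wrapper easy to test without real delays.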
Apache-2.0 — see LICENSE.