Paperclip

Search, read, and analyze biomedical papers, regulatory documents, and clinical trials from the command line.

Paperclip is a CLI and MCP server for AI agents. Every document is exposed as a directory on a virtual filesystem, containing its full text, sections, figures, and supplements.

Search with natural language or regex across biomedical papers from bioRxiv, medRxiv, arXiv, and PubMed Central, plus FDA regulatory documents, ClinicalTrials.gov, and international regulatory and trial registries
Run parallel AI readers across papers with map and synthesize with reduce
Pipe results through standard Unix tools (grep, awk, sed, jq, etc.)
Ask questions about figures with vision AI
Query the database directly with SQL

Full documentation: paperclip.gxl.ai

Community

This repository hosts the source code for the Paperclip CLI client.

Install

Python 3.8+ required.

curl -fsSL https://paperclip.gxl.ai/install.sh | bash

Installs to ~/.paperclip/ with a wrapper at ~/.local/bin/paperclip.

Or install via pip:

pip install https://paperclip.gxl.ai/paperclip.whl
paperclip setup

Sign in

Sign-in happens automatically on first use, or run manually:

paperclip login

Verify

paperclip config
# Server:  https://paperclip.gxl.ai
# Auth:    ✓ you@example.com
# Config:  ~/.paperclip

MCP Server (alternative)

Use Paperclip as an MCP server directly — no local install needed.

Claude Code

claude mcp add --transport http paperclip https://paperclip.gxl.ai/mcp

Then start claude, enter /mcp, and select Authenticate under the paperclip server.

Cursor

Add to ~/.cursor/mcp.json (or .cursor/mcp.json in your project):

{
  "mcpServers": {
    "paperclip": {
      "url": "https://paperclip.gxl.ai/mcp",
      "type": "http"
    }
  }
}

Then open Cmd/Ctrl + Shift + P → Tools & MCPs, enable the paperclip server, and authenticate.
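
If the Cursor config file already lists other MCP servers, the entry can be merged in rather than pasted over the file. A minimal Python sketch, assuming the file follows the mcpServers layout shown above; the helper names add_paperclip_server and update_config_file are hypothetical and not part of Paperclip:

```python
import json
from pathlib import Path

# Entry copied from the Cursor configuration shown above.
PAPERCLIP_ENTRY = {"url": "https://paperclip.gxl.ai/mcp", "type": "http"}


def add_paperclip_server(config: dict) -> dict:
    """Return a copy of an mcp.json dict with the paperclip server added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["paperclip"] = dict(PAPERCLIP_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Merge the entry into the file at `path`, creating it if missing."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_paperclip_server(config), indent=2) + "\n")
```

Calling update_config_file(Path.home() / ".cursor" / "mcp.json") would apply it to the user-level config; other servers already in the file are left untouched.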

Quick Start

# Search for papers
paperclip search "CRISPR base editing efficiency"

# Read a paper's metadata
paperclip cat /papers/bio_4f78753a6feb/meta.json

# Preview the first 50 lines
paperclip head -50 /papers/bio_4f78753a6feb/content.lines

# Grep within a single paper
paperclip grep -i "binding affinity" /papers/bio_4f78753a6feb/content.lines

# Regex search across the entire corpus (sub-second)
paperclip grep "alphamissense" /papers/

# Map over search results with an AI reader
paperclip map --from s_abc123 "What methods were used?"

# Run SQL queries
paperclip sql "SELECT title, doi FROM documents WHERE authors ILIKE '%Doudna%' LIMIT 5"

# Save results to a local file
paperclip search "CRISPR" -n 5 > results.txt

Use paperclip bash '...' for pipes and chains:

paperclip bash 'search "protein folding" | grep "deep learning"'

Commands

Command	Description
`search`	Hybrid search (BM25 + vector) across papers, regulatory documents, and trials
`searches`	Run multiple queries in parallel and merge results
`grep`	Regex search within a paper or across the entire corpus
`scan`	Multi-pattern grep in a single pass
`lookup`	Find papers by DOI, PMC ID, PMID, author, title, journal
`sql`	Read-only SQL queries against the papers database
`map`	Parallel AI reader across multiple papers
`reduce`	Synthesize map results into summaries, tables, or themes
`filter`	Filter search results for relevance
`ask-image`	Analyze figures with vision AI
`cat`	Read files from the paper filesystem
`head` / `tail`	Preview first or last lines
`ls` / `tree`	List directory contents
`grep` / `scan`	Search within papers
`sed` / `awk` / `jq`	Text processing
`results`	View, browse, and export saved results
`config`	Show or set configuration, connection diagnostics
`install`	Install agent skill for Claude Code, Cursor, or Codex
`update`	Update to the latest version
Paper Repos
`init`	Create a new paper repo
`checkout`	List repos, switch repos or branches
`add` / `remove`	Add or remove papers
`import`	Seed repo from a paper's bibliography
`commit`	Snapshot with reasoning message
`annotate`	Pin notes to specific papers
`status`	Repo state: papers, branches, annotations
`log`	Commit history
`diff`	Compare commits or branches
`export`	Export to BibTeX, RIS, Markdown, or CSV
`branch` / `merge`	Branching and merging
`cite`	Citation counts and relationships

Agent Integration

Install a skill so your coding agent can use Paperclip automatically:

paperclip install

Supports Claude Code, Cursor, and Codex. The skill teaches the agent the full command set. Then just mention /paperclip in your prompt:

Using /paperclip, find recent papers on GLP-1 receptor agonists and summarize the primary endpoints.

Paper Filesystem

Each paper lives at /papers/<id>/:

meta.json        — title, authors, doi, date, abstract, journal
content.lines    — full text, line-numbered (L<n>: <text>)
sections/        — named section files (Introduction.lines, Methods.lines, ...)
figures/         — figure files (PMC papers)
supplements/     — supplementary files (PMC papers)
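
The content.lines layout listed above (L<n>: <text>) can be split back into line numbers and text once a file has been saved locally. A minimal sketch, assuming every line carries the prefix exactly as described; parse_content_lines is a hypothetical helper, not part of the SDK, and the sample text is made up:

```python
import re
from typing import Iterator, Tuple

# Matches the documented "L<n>: <text>" layout; the exact spacing is an assumption.
LINE_RE = re.compile(r"^L(\d+): ?(.*)$")


def parse_content_lines(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) pairs, skipping lines that lack the prefix."""
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if match:
            yield int(match.group(1)), match.group(2)


# Made-up sample in the documented format.
sample = "L1: Introduction\nL2: Base editors enable precise edits.\n"
for number, line in parse_content_lines(sample):
    print(number, line)
```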

Paper IDs use prefixes by source: bio_ (bioRxiv), med_ (medRxiv), PMC (PubMed Central), arx_ (arXiv). Regulatory documents and clinical trials are accessed via /fda/ and /clinicaltrials/ virtual directories.
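
The prefix convention makes it possible to tell a paper's source from its ID alone. A small sketch of that lookup, using only the prefixes listed above; source_for is a hypothetical helper name:

```python
from typing import Optional

# Prefix-to-source pairs as listed above; PMC IDs have no underscore.
SOURCE_PREFIXES = {
    "bio_": "bioRxiv",
    "med_": "medRxiv",
    "arx_": "arXiv",
    "PMC": "PubMed Central",
}


def source_for(paper_id: str) -> Optional[str]:
    """Return the source name for a paper ID, or None if the prefix is unknown."""
    for prefix, source in SOURCE_PREFIXES.items():
        if paper_id.startswith(prefix):
            return source
    return None


print(source_for("bio_4f78753a6feb"))  # bioRxiv
print(source_for("PMC10791696"))       # PubMed Central
```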

Paper Repos

Build versioned, annotated collections of papers with git-like workflows:

# Create a repo and seed from a key paper's references
paperclip init my-review "Systematic review of XYZ"
paperclip import PMC11271413 --min-cites 50
paperclip import refs.bib                    # import .bib/.ris → library + repo

# View your personal library (persists across repos)
paperclip library

# Curate: annotate, commit
paperclip annotate PMC123 "Key finding on mechanism X"
paperclip commit -m "Initial seed from review + manual curation"

# Review your work
paperclip repo                       # list all repos
paperclip repo <name>                # repo overview: papers, branches, annotations
paperclip log                        # commit history
paperclip diff 9a6d..559a            # compare commits

# Export to reference managers
paperclip export bib -o refs.bib     # BibTeX (annotations in note field)
paperclip export ris -o refs.ris     # RIS (Zotero, Paperpile, Mendeley, EndNote)
paperclip export md -o review.md     # structured markdown report
paperclip export csv -o papers.csv   # tabular data

Saving files locally

Redirect cat to write any paper file to disk. Text files come back as text; figures and other binaries stream as raw bytes when stdout is redirected (no base64 wrapping):

paperclip cat /papers/PMC10791696/meta.json > meta.json
paperclip cat /papers/PMC10791696/figures/fig1.tif > fig1.tif

For bulk, loop over ls:

mkdir -p figures
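# Quote "$f" so special characters in a filename are passed through literally;
# this loop assumes filenames contain no whitespace.
for f in $(paperclip ls /papers/PMC10791696/figures/); do
  paperclip cat "/papers/PMC10791696/figures/$f" > "figures/$f"
done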

Python SDK

The gxl-paperclip package ships a Python SDK alongside the CLI, so you can call Paperclip directly from scripts, notebooks, and other tools. Installing the package (via pip install or the installer script above) gives you both the paperclip command and the gxl_paperclip module.

Authentication

The SDK uses API keys (OAuth is reserved for interactive CLI sign-in). Create a key from the dashboard and make it available to your code:

export PAPERCLIP_API_KEY="pk_..."

from gxl_paperclip import PaperclipClient

client = PaperclipClient.from_env()           # picks up PAPERCLIP_API_KEY
# — or pass an explicit strategy —
from gxl_paperclip import APIKeyAuth
client = PaperclipClient(auth=APIKeyAuth("pk_..."))

from_env() falls back to the credentials saved by paperclip login (~/.paperclip/credentials.json) via FileCredentialsAuth when no API key is set — handy on a workstation where you've already signed in.

Quick start

from gxl_paperclip import PaperclipClient

client = PaperclipClient.from_env()

result = client.search("CRISPR lipid nanoparticle", limit=5, source="pmc")
print(result.output)           # same formatted text the CLI prints
print(result.result_id)        # e.g. "s_14bebc10" — pass to map_()

for event in client.map_("What delivery methods were used?", from_results=result.result_id):
    if event.type == "progress":
        print(f"{event.completed}/{event.total} papers done")
    else:
        print(event.output)

Method reference

Every optional kwarg defaults to None (or False for flags) on the client, which means the flag is omitted from the underlying command — the server then applies its own default.
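
The omission rule can be pictured as a small argument builder: anything left at None or False never reaches the command line, so the server default applies. This is an illustrative sketch, not the SDK's implementation; build_flags and the --name flag spelling are assumptions:

```python
from typing import Any, List


def build_flags(**kwargs: Any) -> List[str]:
    """Turn keyword arguments into CLI-style flags, dropping unset ones."""
    flags: List[str] = []
    for name, value in kwargs.items():
        if value is None or value is False:
            continue  # omitted: the server applies its own default
        flag = "--" + name.replace("_", "-")
        if value is True:
            flags.append(flag)  # boolean switch, no value
        else:
            flags.extend([flag, str(value)])
    return flags


print(build_flags(limit=5, source=None, exact=True, sort=None))
# ['--limit', '5', '--exact']
```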

`client.search(query, *, limit=None, source=None, exact=False, since=None, sort=None, author=None, journal=None, year=None, type=None, category=None, mode=None, all=False, timeout=None) -> ExecuteResult`

Hybrid search across bioRxiv, medRxiv, arXiv, PubMed Central, FDA, ClinicalTrials.gov, and international registries.

Argument	Default when omitted	Notes
`query`	required	Natural-language query string.
`limit`	`100`	Server caps at 1000.
`source`	PMC, bioRxiv, medRxiv, arXiv	Pass `"pmc"`, `"biorxiv"`, `"medrxiv"`, `"arxiv"`, `"abstracts"`, `"fda"`, `"trials"`, or a comma-separated list.
`exact`	`False`	`True` switches search mode to phrase matching.
`since`	no recency filter	e.g. `"7d"`, `"30d"`, `"6m"`, `"1y"`.
`sort`	`"relevance"`	Pass `"date"` for newest-first.
`author`	no filter	Substring match on authors.
`journal`	no filter	PMC only.
`year`	no filter	e.g. `2024`.
`type`	no filter	e.g. `"review-article"` (PMC).
`category`	no filter	e.g. `"Neuroscience"` (bioRxiv).
`mode`	`"any"`	Also supports `"all"`, `"50%"`, `"75%"`.
`all`	`False`	When `True`, searches the full corpus instead of the default recency-weighted slice.
`timeout`	`120` s	Seconds before the request aborts.
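
Because since takes compact strings such as "7d" or "1y", a typo would only surface once the request reaches the server. A small client-side check can catch it earlier. The accepted shape here (a positive integer followed by d, m, or y) is inferred from the examples in the table and is not confirmed server behavior; is_valid_since is a hypothetical helper:

```python
import re

# Shape inferred from the documented examples "7d", "30d", "6m", "1y".
SINCE_RE = re.compile(r"[1-9]\d*[dmy]")


def is_valid_since(value: str) -> bool:
    """Return True if value looks like one of the documented recency windows."""
    return SINCE_RE.fullmatch(value) is not None


for candidate in ["7d", "30d", "6m", "1y", "2 weeks", "0d"]:
    print(candidate, is_valid_since(candidate))
```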

`client.lookup(field, value, *, limit=None, timeout=None) -> ExecuteResult`

Look up papers by a metadata field.

Argument	Default when omitted	Notes
`field`	required	`"doi"`, `"pmc"`, `"pmid"`, `"author"`, `"title"`, `"journal"`, `"year"`, `"keywords"`, etc.
`value`	required	The value to match (partial, case-insensitive).
`limit`	`25`
`timeout`	`120` s

`client.sql(query, *, source=None, timeout=None) -> ExecuteResult`

Read-only SQL over the documents table. 15s server-side timeout, 200-row cap.

Argument	Default when omitted	Notes
`query`	required	Must be a `SELECT` against `documents`.
`source`	`"all"`	Pass `"pmc"` or `"biorxiv"` to restrict.
`timeout`	`120` s
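
Given the 200-row cap, larger result sets have to be fetched in pages. One option is to generate LIMIT/OFFSET variants of a base query. The sketch below only builds the query strings, and it assumes the server's SQL dialect accepts OFFSET, which the documentation above does not confirm; paged_queries is a hypothetical helper:

```python
from typing import Iterator

ROW_CAP = 200  # server-side row cap stated above


def paged_queries(base_query: str, pages: int, page_size: int = ROW_CAP) -> Iterator[str]:
    """Yield copies of base_query with LIMIT/OFFSET appended, one per page."""
    if not 1 <= page_size <= ROW_CAP:
        raise ValueError(f"page_size must be between 1 and {ROW_CAP}")
    stripped = base_query.rstrip().rstrip(";")
    for page in range(pages):
        yield f"{stripped} LIMIT {page_size} OFFSET {page * page_size}"


base = "SELECT title, doi FROM documents WHERE authors ILIKE '%Doudna%' ORDER BY doi"
for query in paged_queries(base, pages=2):
    print(query)
```

Each generated string could then be passed to client.sql(query). The ORDER BY clause in the base query keeps page boundaries stable between requests.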

`client.map_(question, *, from_results, timeout=None) -> Iterator[MapEvent]`

Run an AI reader against every paper in a prior search/lookup result set. Yields MapProgressEvent objects (OAuth streaming path) followed by a single MapResultEvent.

Argument	Default when omitted	Notes
`question`	required	Question asked against each paper.
`from_results`	required	Pass the `result_id` returned by `search` or `lookup`.
`timeout`	`300` s	Map defaults to the slow-command timeout.

`client.pull(target, dest=None, *, timeout=None) -> ExecuteResult`

Download a paper or single file from the virtual filesystem.

Argument	Default when omitted	Notes
`target`	required	e.g. `"PMC10791696"` or `"PMC10791696/figures/fig1.jpg"`.
`dest`	current directory	Output directory on the server's side of the command.
`timeout`	`120` s

`client.ask_image(path, question=None, *, fn=None, timeout=None) -> ExecuteResult`

Analyse a paper figure with vision AI.

Argument	Default when omitted	Notes
`path`	required	Figure path, e.g. `"PMC11576387/figures/fx1.jpg"`.
`question`	`"Describe this figure in detail."`	Custom prompt.
`fn`	free-form prompt	Pass `"describe"` or `"extract-data"` for canned flows.
`timeout`	`300` s	Uses the slow-command default.

`client.bash(script, *, timeout=None) -> ExecuteResult`

Run an arbitrary server-side pipeline, exactly like paperclip bash '...'.

result = client.bash('search "protein folding" | grep -i "deep learning"')

Argument	Default when omitted	Notes
`script`	required	A single shell-style command string.
`timeout`	`120` s

`client.health(*, timeout=None) -> HealthStatus`

Ping the server and confirm auth works. Returns a HealthStatus with reachable (bool), output (str), exit_code (int), and elapsed_ms fields.

`client.results`

client.results.list(*, limit=None) -> list[ResultRow] — recent saved results for the authenticated user. Server default limit is 20.
client.results.get(result_id) -> ResultData — raw saved output for a specific result ID (e.g. "s_14bebc10", "m_ec2c9cc9").

`client.papers.*`

Typed wrappers over the virtual filesystem commands. Each returns an ExecuteResult.

Method	Defaults
`papers.cat(path)`	no options
`papers.head(path, *, lines=None)`	`lines` defaults to the CLI's `head` default (`10`).
`papers.tail(path, *, lines=None)`	`lines` defaults to the CLI's `tail` default (`10`).
`papers.ls(path)`	no options
`papers.grep(pattern, path, *, ignore_case=False, extended=False)`	no flags passed when both are `False`.
`papers.scan(path, patterns)`	multiple patterns OR'd in a single pass.

`client.execute(command, args=None, *, timeout=None) -> ExecuteResult`

Escape hatch for any command without a typed wrapper (sed, awk, sort, cut, tr, jq, new server commands, ...). args is a list of argv tokens — the SDK quotes them for you.

result = client.execute("awk", ["-F", "\t", "{print $1}", "/papers/PMC1/content.lines"])
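
To see why passing argv tokens is safer than hand-building a command string, the standard library's shlex module shows the kind of quoting involved. This illustrates shell-style quoting in general; whether the SDK uses shlex internally is an assumption:

```python
import shlex

tokens = ["awk", "-F", "\t", "{print $1}", "/papers/PMC1/content.lines"]

# shlex.join quotes any token containing whitespace or shell metacharacters.
command = shlex.join(tokens)
print(command)

# Splitting the quoted string recovers the original tokens unchanged.
assert shlex.split(command) == tokens
```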

`client.stream(command, args=None, *, timeout=None) -> Iterator[MapEvent]`

Streaming escape hatch. Currently only "map" streams; other commands raise ValueError.

Error handling

All HTTP and network failures raise a subclass of PaperclipError:

from gxl_paperclip import (
    AuthError, RateLimitError, NotFoundError, ServerError,
    RequestTimeoutError, NetworkError,
)

try:
    client.search("AlphaFold")
except AuthError:
    ...  # invalid API key or expired credentials
except RateLimitError:
    ...  # HTTP 429
except RequestTimeoutError:
    ...  # client-side timeout
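
For RateLimitError in particular, a common pattern is to retry with exponential backoff. The helper below is a generic sketch rather than an SDK feature: with_retries is hypothetical, and a local stand-in exception is used so the example runs without the package installed. In real code, pass the RateLimitError imported from gxl_paperclip instead:

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


class FakeRateLimitError(Exception):
    """Stand-in for gxl_paperclip.RateLimitError, for demonstration only."""


def with_retries(
    call: Callable[[], T],
    retry_on: Type[Exception],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on retry_on with delays of base_delay * 2**n."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Demo: fail twice, then succeed; delays are recorded instead of slept.
outcomes = [FakeRateLimitError(), FakeRateLimitError(), "ok"]
delays = []


def flaky() -> str:
    outcome = outcomes.pop(0)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


print(with_retries(flaky, FakeRateLimitError, sleep=delays.append), delays)
```

With the real client this would look like with_retries(lambda: client.search("AlphaFold"), RateLimitError). Injecting sleep keeps the helper testable without real waiting.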

Result types

ExecuteResult(output, exit_code, elapsed_ms, result_id, download_url, download_filename, cwd, raw)
MapProgressEvent(total, completed, failed, elapsed_s)
MapResultEvent(output, result_id, elapsed_ms, exit_code)
ResultRow(result_id, command, raw_input, latency_ms, created_at, raw)
ResultData(result_id, output, command, raw_input, latency_ms, created_at, raw)
HealthStatus(reachable, output, exit_code, elapsed_ms)

License

Apache-2.0 — see LICENSE.
