GitHub - GXL-ai/paperclip: Paperclip — search, read, and analyze 8M+ biomedical papers from the command line

Search, read, and analyze 8M+ biomedical papers from the command line.

Paperclip is a CLI and MCP server for AI agents. Every paper is exposed as a directory on a virtual filesystem at /papers/, containing its full text, sections, figures, and supplements.

Search with natural language or regex across 8M+ papers from bioRxiv, medRxiv, and PubMed Central
Run parallel AI readers across papers with map and synthesize with reduce
Pipe results through standard Unix tools (grep, awk, sed, jq, etc.)
Ask questions about figures with vision AI
Query the database directly with SQL

Full documentation: paperclip.gxl.ai

Community

This repository hosts the source code for the Paperclip CLI client.

Install

Python 3.8+ required.

curl -fsSL https://paperclip.gxl.ai/install.sh | bash

Installs to ~/.paperclip/ with a wrapper at ~/.local/bin/paperclip.

Or install via pip:

pip install https://paperclip.gxl.ai/paperclip.whl
paperclip setup

Sign in

Sign-in happens automatically on first use. To sign in manually, run:

paperclip login

Verify

paperclip config
# Server:  https://paperclip.gxl.ai
# Auth:    ✓ you@example.com
# Config:  ~/.paperclip

MCP Server (alternative)

Use Paperclip as an MCP server directly — no local install needed.

Claude Code

claude mcp add --transport http paperclip https://paperclip.gxl.ai/mcp

Then start claude, enter /mcp, and select Authenticate under the paperclip server.

Cursor

Add to ~/.cursor/mcp.json (or .cursor/mcp.json in your project):

{
  "mcpServers": {
    "paperclip": {
      "url": "https://paperclip.gxl.ai/mcp",
      "type": "http"
    }
  }
}

Then press Cmd/Ctrl + Shift + P, open Tools & MCPs, enable the paperclip server, and authenticate.
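If ~/.cursor/mcp.json already lists other servers, the paperclip entry has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge, not part of Paperclip itself: the helper names `merge_mcp_entry` and `update_config_file` are hypothetical, and only the URL and `type` values come from the snippet above.

```python
import json
from pathlib import Path

# Server entry copied from the Cursor snippet in this README.
PAPERCLIP_ENTRY = {"url": "https://paperclip.gxl.ai/mcp", "type": "http"}


def merge_mcp_entry(config, name="paperclip", entry=PAPERCLIP_ENTRY):
    """Return a copy of `config` with `entry` registered under mcpServers[name].

    Hypothetical helper: existing servers are preserved and the input
    dictionary is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = dict(entry)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path):
    """Read `path` if it exists, merge the paperclip entry, and write it back."""
    path = Path(path)
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_mcp_entry(existing), indent=2) + "\n")
```

Calling `update_config_file(Path.home() / ".cursor" / "mcp.json")` would apply the merge to the user-level config; pass a project-level .cursor/mcp.json path instead to scope it to one project.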

Quick Start

# Search for papers
paperclip search "CRISPR base editing efficiency"

# Read a paper's metadata
paperclip cat /papers/bio_4f78753a6feb/meta.json

# Preview the first 50 lines
paperclip head -50 /papers/bio_4f78753a6feb/content.lines

# Grep within a single paper
paperclip grep -i "binding affinity" /papers/bio_4f78753a6feb/content.lines

# Regex search across the entire corpus (sub-second)
paperclip grep "alphamissense" /papers/

# Map over search results with an AI reader
paperclip map --from s_abc123 "What methods were used?"

# Run SQL queries
paperclip sql "SELECT title, doi FROM documents WHERE authors ILIKE '%Doudna%' LIMIT 5"

# Save results to a local file
paperclip search "CRISPR" -n 5 > results.txt

Use paperclip bash '...' for pipes and chains:

paperclip bash 'search "protein folding" | grep "deep learning"'
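When a script drives Paperclip, the pipeline string can be built programmatically and passed to `paperclip bash` as a single argument. The sketch below assumes `paperclip` is on PATH and that `paperclip bash` accepts single-quoted arguments the way a POSIX shell does (the README example uses double quotes). The function names are hypothetical.

```python
import shlex
import subprocess
from typing import List


def build_pipeline(query: str, pattern: str) -> str:
    """Compose a 'search ... | grep ...' pipeline like the README example.

    shlex.quote guards against spaces and shell metacharacters in the
    query and pattern. Assumption: single-quoted arguments are accepted.
    """
    return f"search {shlex.quote(query)} | grep {shlex.quote(pattern)}"


def paperclip_bash_argv(pipeline: str) -> List[str]:
    """Argument vector for `paperclip bash '<pipeline>'`."""
    return ["paperclip", "bash", pipeline]


def run_pipeline(pipeline: str) -> str:
    """Run the pipeline and return stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        paperclip_bash_argv(pipeline),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_pipeline(build_pipeline("protein folding", "deep learning"))` issues the same pipeline as the command above. Passing the pipeline as one list element avoids a second layer of local shell quoting.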

Commands

Command	Description
`search`	Hybrid search (BM25 + vector) across 8M+ papers
`searches`	Run multiple queries in parallel and merge results
`grep`	Regex search within a paper or across the entire corpus
`scan`	Multi-pattern grep in a single pass
`lookup`	Find papers by DOI, PMC ID, PMID, author, title, journal
`sql`	Read-only SQL queries against the papers database
`export`	Export SQL results to CSV
`map`	Parallel AI reader across multiple papers
`reduce`	Synthesize map results into summaries, tables, or themes
`filter`	Filter search results for relevance
`ask-image`	Analyze figures with vision AI
`cat`	Read files from the paper filesystem
`head` / `tail`	Preview first or last lines
`ls` / `tree`	List directory contents
`grep` / `scan`	Search within papers
`sed` / `awk` / `jq`	Text processing
`results`	View, browse, and export saved results
`pull`	Download papers or files locally
`config`	Show or set configuration
`install`	Install agent skill for Claude Code, Cursor, or Codex
`update`	Update to the latest version

Agent Integration

Install a skill so your coding agent can use Paperclip automatically:

paperclip install

Supports Claude Code, Cursor, and Codex. The skill teaches the agent the full command set. Then just mention /paperclip in your prompt:

Using /paperclip, find recent papers on GLP-1 receptor agonists and summarize the primary endpoints.

Paper Filesystem

Each paper lives at /papers/<id>/:

meta.json        — title, authors, doi, date, abstract, journal
content.lines    — full text, line-numbered (L<n>: <text>)
sections/        — named section files (Introduction.lines, Methods.lines, ...)
figures/         — figure files (PMC papers)
supplements/     — supplementary files (PMC papers)
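Once a content.lines file has been fetched (for example with `paperclip cat`), its line-numbered format can be split back into numbers and text. This is a minimal sketch that assumes every line follows the `L<n>: <text>` shape shown above. The regular expression and the function name `parse_content_lines` are illustrative, not part of the CLI.

```python
import re
from typing import Iterator, Tuple

# Assumed line shape, taken from the listing above: "L<n>: <text>".
LINE_RE = re.compile(r"^L(\d+): ?(.*)$")


def parse_content_lines(raw: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) pairs; lines that do not match are skipped."""
    for line in raw.splitlines():
        match = LINE_RE.match(line)
        if match:
            yield int(match.group(1)), match.group(2)
```

Keeping the original line numbers lets a script cite the same positions that `grep` or `head` would report for the paper.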

Paper IDs use prefixes by source: bio_ (bioRxiv), med_ (medRxiv), PMC (PubMed Central).
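The prefix convention makes it possible to tell a paper's source from its ID alone. A minimal sketch, assuming the three prefixes listed above are the only ones in use; `source_of` is a hypothetical helper name.

```python
# Prefix-to-source mapping as stated in this README.
SOURCE_BY_PREFIX = {
    "bio_": "bioRxiv",
    "med_": "medRxiv",
    "PMC": "PubMed Central",
}


def source_of(paper_id: str) -> str:
    """Return the source name for a paper ID, or "unknown" if no prefix matches."""
    for prefix, source in SOURCE_BY_PREFIX.items():
        if paper_id.startswith(prefix):
            return source
    return "unknown"
```

For instance, `source_of("bio_4f78753a6feb")` returns "bioRxiv" for the ID used in the Quick Start examples.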

License

Apache-2.0 — see LICENSE.

