GXL-ai/paperclip

Paperclip

Search, read, and analyze 8M+ biomedical papers from the command line.

Paperclip is a CLI and MCP server for AI agents. Each paper is a directory on a virtual filesystem at /papers/, containing its full text, sections, figures, and supplements.

  • Search with natural language or regex across 8M+ papers from bioRxiv, medRxiv, and PubMed Central
  • Run parallel AI readers across papers with map, then synthesize their answers with reduce
  • Pipe results through standard Unix tools (grep, awk, sed, jq, etc.)
  • Ask questions about figures with vision AI
  • Query the database directly with SQL

Full documentation: paperclip.gxl.ai

Community

This repository hosts the source code for the Paperclip CLI client. Use it to:

Install

Python 3.8+ required.

curl -fsSL https://paperclip.gxl.ai/install.sh | bash

Installs to ~/.paperclip/ with a wrapper at ~/.local/bin/paperclip.

Or install via pip:

pip install https://paperclip.gxl.ai/paperclip.whl
paperclip setup

Sign in

Sign-in happens automatically on first use. To sign in manually, run:

paperclip login

Verify

paperclip config
# Server:  https://paperclip.gxl.ai
# Auth:    ✓ you@example.com
# Config:  ~/.paperclip
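If `paperclip config` reports "command not found" after using the curl installer, check whether ~/.local/bin (where the installer places its wrapper) is on your PATH. The following is a minimal Python sketch for that check; `on_path` is a hypothetical helper, not part of Paperclip. If you only need to know whether the command resolves at all, `shutil.which("paperclip")` is a quicker alternative.

```python
import os
from typing import Optional


def on_path(directory: str, path_var: Optional[str] = None) -> bool:
    """Return True if `directory` is one of the entries in a PATH-style string.

    `path_var` defaults to the current PATH. Both sides are passed through
    expanduser() and normpath(), so "~/.local/bin" also matches an entry
    written as "/home/<you>/.local/bin/".
    """
    if path_var is None:
        path_var = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_var.split(os.pathsep)
        if entry  # skip empty entries from "::" or a trailing separator
    ]
    return target in entries
```

Call `on_path("~/.local/bin")`; if it returns False, add that directory to PATH in your shell profile.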

MCP Server (alternative)

Use Paperclip directly as an MCP server, with no local install needed.

Claude Code

claude mcp add --transport http paperclip https://paperclip.gxl.ai/mcp

Then start claude, enter /mcp, and select Authenticate under the paperclip server.

Cursor

Add to ~/.cursor/mcp.json (or .cursor/mcp.json in your project):

{
  "mcpServers": {
    "paperclip": {
      "url": "https://paperclip.gxl.ai/mcp",
      "type": "http"
    }
  }
}

Then press Cmd/Ctrl + Shift + P, open Tools & MCPs, enable the paperclip server, and authenticate.
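If ~/.cursor/mcp.json already lists other servers, add the paperclip entry rather than overwriting the file. Below is a minimal Python sketch, assuming the file holds a JSON object with an optional top-level "mcpServers" key as in the Cursor snippet above; `add_paperclip_server` is a hypothetical helper, not part of Paperclip or Cursor.

```python
import json
from pathlib import Path

# Entry copied from the Cursor configuration shown above.
PAPERCLIP_ENTRY = {"url": "https://paperclip.gxl.ai/mcp", "type": "http"}


def add_paperclip_server(config_path: Path) -> dict:
    """Insert or update the "paperclip" server entry, keeping any other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():  # treat an empty file like a missing one
            config = json.loads(text)
    servers = config.setdefault("mcpServers", {})
    servers["paperclip"] = dict(PAPERCLIP_ENTRY)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Usage: `add_paperclip_server(Path.home() / ".cursor" / "mcp.json")`, or pass `.cursor/mcp.json` inside your project for a project-level config.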

Quick Start

# Search for papers
paperclip search "CRISPR base editing efficiency"

# Read a paper's metadata
paperclip cat /papers/bio_4f78753a6feb/meta.json

# Preview the first 50 lines
paperclip head -50 /papers/bio_4f78753a6feb/content.lines

# Grep within a single paper
paperclip grep -i "binding affinity" /papers/bio_4f78753a6feb/content.lines

# Regex search across the entire corpus (sub-second)
paperclip grep "alphamissense" /papers/

# Map over search results with an AI reader
paperclip map --from s_abc123 "What methods were used?"

# Run SQL queries
paperclip sql "SELECT title, doi FROM documents WHERE authors ILIKE '%Doudna%' LIMIT 5"

# Save results to a local file
paperclip search "CRISPR" -n 5 > results.txt
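`paperclip sql` takes the whole query as one string, so a value interpolated into it (for example an author name containing an apostrophe) needs standard SQL quoting, that is, doubled single quotes. The sketch below assumes the server accepts standard single-quoted literals, as in the ILIKE example above; the helper names are hypothetical, and % and _ in the value are not escaped, so they still act as ILIKE wildcards.

```python
import subprocess


def sql_literal(value: str) -> str:
    """Quote a string as a standard SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def author_query(author: str, limit: int = 5) -> str:
    """Build the Quick Start author search for an arbitrary name."""
    pattern = sql_literal(f"%{author}%")
    return f"SELECT title, doi FROM documents WHERE authors ILIKE {pattern} LIMIT {int(limit)}"


def run_sql(query: str) -> str:
    """Run `paperclip sql` with the query as a single argv element (no shell quoting)."""
    result = subprocess.run(
        ["paperclip", "sql", query], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `run_sql(author_query("O'Brien"))` sends `... ILIKE '%O''Brien%' LIMIT 5`.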

To chain commands with pipes, wrap them in paperclip bash '...':

paperclip bash 'search "protein folding" | grep "deep learning"'
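The single-quote wrapping above breaks if the pipeline itself contains a single quote (for example a query about Alzheimer's disease). When generating these command lines from a script, Python's standard `shlex.quote` produces a correctly quoted argument. A minimal sketch; `bash_command` is a hypothetical helper.

```python
import shlex


def bash_command(pipeline: str) -> str:
    """Return a shell-safe `paperclip bash ...` command line for `pipeline`."""
    # shlex.quote wraps the text in single quotes and escapes any embedded ones.
    return "paperclip bash " + shlex.quote(pipeline)
```

For the pipeline above this yields exactly the command shown; for `search "Alzheimer's disease"` it escapes the apostrophe so the shell still sees a single argument.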

Commands

Command          Description
search           Hybrid search (BM25 + vector) across 8M+ papers
searches         Run multiple queries in parallel and merge results
grep             Regex search within a paper or across the entire corpus
scan             Multi-pattern grep in a single pass
lookup           Find papers by DOI, PMC ID, PMID, author, title, journal
sql              Read-only SQL queries against the papers database
export           Export SQL results to CSV
map              Parallel AI reader across multiple papers
reduce           Synthesize map results into summaries, tables, or themes
filter           Filter search results for relevance
ask-image        Analyze figures with vision AI
cat              Read files from the paper filesystem
head / tail      Preview first or last lines
ls / tree        List directory contents
grep / scan      Search within papers
sed / awk / jq   Text processing
results          View, browse, and export saved results
pull             Download papers or files locally
config           Show or set configuration
install          Install agent skill for Claude Code, Cursor, or Codex
update           Update to the latest version

Agent Integration

Install a skill so your coding agent can use Paperclip automatically:

paperclip install

This supports Claude Code, Cursor, and Codex, and the skill teaches the agent the full command set. After installing it, mention /paperclip in your prompt:

Using /paperclip, find recent papers on GLP-1 receptor agonists and summarize the primary endpoints.

Paper Filesystem

Each paper lives at /papers/<id>/:

meta.json        — title, authors, doi, date, abstract, journal
content.lines    — full text, line-numbered (L<n>: <text>)
sections/        — named section files (Introduction.lines, Methods.lines, ...)
figures/         — figure files (PMC papers)
supplements/     — supplementary files (PMC papers)
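Because content.lines prefixes every line with `L<n>: `, a copy saved locally (for example with `paperclip pull`, or by redirecting `paperclip cat` output to a file) can be split back into line numbers and text. A minimal Python sketch, assuming exactly the `L<n>: <text>` format described above; the function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches the documented "L<n>: <text>" line format.
LINE_RE = re.compile(r"^L(\d+): ?(.*)$")


def parse_content_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) pairs, skipping lines that don't match the format."""
    for raw in lines:
        m = LINE_RE.match(raw.rstrip("\r\n"))
        if m:
            yield int(m.group(1)), m.group(2)


def line_range(lines: Iterable[str], start: int, end: int) -> str:
    """Return the text of lines start..end (inclusive) without the L<n>: prefixes."""
    return "\n".join(text for n, text in parse_content_lines(lines) if start <= n <= end)
```

Usage: `with open("content.lines") as f: print(line_range(f, 10, 20))`.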

Paper IDs are prefixed by source: bio_ (bioRxiv), med_ (medRxiv), and PMC (PubMed Central).
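When post-processing results (for example to count hits per source), the source can be read off the ID prefix. A minimal Python sketch based only on the three prefixes listed above; `paper_source` is a hypothetical helper that accepts a bare ID or a /papers/<id>/... path and returns None for any other prefix.

```python
from typing import Optional

# Prefixes as documented: bio_ (bioRxiv), med_ (medRxiv), PMC (PubMed Central).
SOURCE_PREFIXES = (("bio_", "bioRxiv"), ("med_", "medRxiv"), ("PMC", "PubMed Central"))


def paper_source(paper_id: str) -> Optional[str]:
    """Return the source name for a paper ID or /papers/ path, or None if unknown."""
    parts = [p for p in paper_id.split("/") if p]
    if len(parts) >= 2 and parts[0] == "papers":
        paper_id = parts[1]  # "/papers/<id>/meta.json" -> "<id>"
    for prefix, source in SOURCE_PREFIXES:
        if paper_id.startswith(prefix):
            return source
    return None
```

For example, `paper_source("/papers/bio_4f78753a6feb/meta.json")` returns "bioRxiv".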

License

Apache-2.0 — see LICENSE.
