Hoemr/paper2code-qa

paper2code-qa

Locate the code implementation of a research paper's core method — just by talking to your agent.

Give the agent a paper URL, optionally a repo URL, and it finds exactly where the paper's method is implemented: files, classes, functions, line numbers, call chains, and paper-to-code alignment.

How it works (for the user)

You talk to your agent naturally. The agent runs the analysis behind the scenes.

Examples of what you say:

"Help me find the code implementation of this paper: https://arxiv.org/abs/2305.xxxxx"

"Where is the loss function in this paper? Paper: https://arxiv.org/pdf/2402.xxxxx, Repo: https://github.com/user/repo"

"Which file in the code contains this paper's trainer? https://openreview.net/forum?id=xxxx"

The agent will:

  1. Download and parse the paper
  2. Find the code repository (if you didn't provide one)
  3. Clone and index the code
  4. Search for matching implementations
  5. Present file paths, line numbers, confidence scores, and explanations
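As an illustration of step 1, the sketch below shows one way a script might turn an arXiv link into a direct PDF link before downloading. The helper name `arxiv_pdf_url` and the regular expression are assumptions made for this example, not the skill's actual code, and only new-style arXiv identifiers are handled.

```python
import re
from typing import Optional

# Hypothetical pattern: new-style arXiv IDs such as 2305.12345 or 2305.12345v2.
_ARXIV_RE = re.compile(
    r"arxiv\.org/(?:abs|pdf)/(?P<id>\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE
)


def arxiv_pdf_url(url: str) -> Optional[str]:
    """Return the PDF URL for an arXiv abs/pdf link, or None if it is not one."""
    match = _ARXIV_RE.search(url)
    if match is None:
        return None
    return f"https://arxiv.org/pdf/{match.group('id')}"
```

Links from other hosts (for example OpenReview) return `None` here and would need their own handling.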

You never need to run a command yourself.

Install

Claude Code

cd paper2code-qa
bash install_claude.sh

Then just talk to it:

> Help me find the code for this paper: https://arxiv.org/abs/2305.xxxxx

Codex

cd paper2code-qa
bash install_codex.sh

What you get

The agent presents a structured summary:

📄 Paper: Online Policy Distillation
Core method: OPD

📁 Repository: github.com/user/opd
Key files: train.py, configs/default.yaml

🎯 Top Code Matches:

1. src/trainer/opd_trainer.py — OPDTrainer.compute_loss [HIGH]
   → Implements Eq.4 (reverse KL divergence)
   → Variables match: beta, teacher_logits, student_logits

2. src/model/teacher.py — TeacherModel.forward [MEDIUM]
   → Teacher logits generation (Section 3.1)

🔗 Call Chain:
train.py → build_trainer() → OPDTrainer.compute_loss() → F.kl_div()

⚠️ Missing: rollout sampling not found (may use external library)
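The summary above could be backed by a small record per match. The following is a minimal sketch under assumptions: the `CodeMatch` class, its field names, and the `LOW` label are hypothetical (the sample output only shows `HIGH` and `MEDIUM`), and the renderer uses plain ASCII separators rather than the exact symbols shown above.

```python
from dataclasses import dataclass, field


@dataclass
class CodeMatch:
    path: str                  # file containing the candidate implementation
    symbol: str                # e.g. "OPDTrainer.compute_loss"
    confidence: str            # "HIGH", "MEDIUM", or "LOW" (assumed labels)
    notes: list = field(default_factory=list)  # short alignment notes


def render_matches(matches: list) -> str:
    """Format matches as a numbered plain-text list."""
    lines = ["Top Code Matches:", ""]
    for i, m in enumerate(matches, start=1):
        lines.append(f"{i}. {m.path} - {m.symbol} [{m.confidence}]")
        lines.extend(f"   -> {note}" for note in m.notes)
        lines.append("")
    return "\n".join(lines).rstrip()
```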

Requirements

The agent needs:

  • Python ≥ 3.9
  • git

For better results (optional, auto-detected):

  • ripgrep — faster code search
  • pdftotext / pymupdf — better PDF text extraction
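Auto-detection of the optional tools can be done with the standard library alone. This is a sketch of one possible check, not the skill's actual logic; the function name `detect_optional_tools` and the returned keys are assumptions.

```python
import importlib.util
import shutil


def detect_optional_tools() -> dict:
    """Report which optional helpers are available in the current environment."""
    return {
        # ripgrep installs its executable as "rg".
        "ripgrep": shutil.which("rg") is not None,
        "pdftotext": shutil.which("pdftotext") is not None,
        # PyMuPDF is importable as "pymupdf" in recent releases and "fitz" in older ones.
        "pymupdf": any(
            importlib.util.find_spec(name) is not None
            for name in ("pymupdf", "fitz")
        ),
    }
```

A script could fall back to a regex-based search or a simpler PDF parser when a tool is reported as missing.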

Privacy

All output files go to the skill's internal directory. Nothing is written to your project.

Design

This skill follows a scripts + agent architecture:

  • Scripts (deterministic): Fetch paper, parse PDF, clone repo, index code, search with ripgrep/regex/AST, score candidates.
  • Agent (judgment): Reads the actual code, verifies matches against paper equations, explains alignment in natural language.
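To make the deterministic half concrete, here is a minimal sketch of AST-based indexing and candidate scoring using Python's `ast` module. The `Symbol` class, the `index_source` and `score` helpers, and the term-overlap score are all assumptions for illustration; the skill's real scoring may differ. The sample source reuses names from the example output, with an invented function body.

```python
import ast
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str    # qualified name, e.g. "OPDTrainer.compute_loss"
    lineno: int  # 1-based line where the definition starts
    names: set   # identifiers (variables and parameters) used in the definition


def index_source(source: str) -> list:
    """Collect top-level functions and class methods from Python source."""
    symbols = []

    def visit(node, prefix=""):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                visit(child, prefix + child.name + ".")
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                used = {n.id for n in ast.walk(child) if isinstance(n, ast.Name)}
                used |= {a.arg for a in ast.walk(child) if isinstance(a, ast.arg)}
                symbols.append(Symbol(prefix + child.name, child.lineno, used))

    visit(ast.parse(source))
    return symbols


def score(symbol: Symbol, paper_terms: set) -> float:
    """Fraction of paper terms that appear as identifiers in the symbol."""
    if not paper_terms:
        return 0.0
    return len(paper_terms & symbol.names) / len(paper_terms)


# Invented sample source; only the names come from the example output.
SAMPLE = '''
class OPDTrainer:
    def compute_loss(self, teacher_logits, student_logits, beta):
        return beta * (student_logits - teacher_logits)

def helper(x):
    return x
'''

terms = {"beta", "teacher_logits", "student_logits"}
best = max(index_source(SAMPLE), key=lambda s: score(s, terms))
print(best.name, best.lineno)
```

A score like this only ranks candidates; checking that the code actually implements the paper's equation remains the agent's job.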

The split keeps fetching, indexing, and search reproducible, because those steps run as deterministic scripts, and leaves reading, comparing, and explaining the code to the agent.

About

Locate the code implementation of a research paper's core method. A conversation-driven agent skill with no API dependency. Works with Claude Code and Codex.
