A curated literature database of 1,663 research papers on Agentic Software Engineering, drawn from top-tier venues in Software Engineering, Programming Languages, Security, NLP, and Machine Learning. This repository also provides an automated paper-labeling skill — a Claude Code pipeline that extracts, filters, and classifies new papers from raw proceedings files in various formats (e.g., BibTeX and HTML), keeping the database up to date with minimal manual effort.
- Browse the Website
- Tracked Venues
- Taxonomy
- Paper Selection
- Adding New Papers
- Contributing
- Extending the Taxonomy
- Disclaimer and Contact
Open `web/index.html` locally or browse online. The interface supports:
- Full-text search across titles and abstracts
- Year and venue filters — independent single-select dropdowns; venue names are normalized (e.g., "ICSE" matches all ICSE years)
- Label filter — select one or more research topics from the sidebar or by clicking label pills on paper cards; multiple labels combine with AND logic
- Expandable abstracts — click any paper card to reveal its abstract
- Active filter summary — each active constraint is shown as a removable tag below the toolbar
All filter dimensions (year, venue, labels) are optional and combine with AND logic: only papers satisfying every active constraint are shown.
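The combination rule above can be sketched as a single predicate (an illustrative Python sketch; the site itself implements filtering in JavaScript, and the field names follow the `labeldata.json` schema):

```python
def matches(paper, year=None, venue=None, labels=()):
    """True only if the paper satisfies every active constraint (AND logic)."""
    if year is not None and paper["year"] != year:
        return False
    # Venue names are normalized, e.g. the "ICSE" filter matches "ICSE2025".
    if venue is not None and not paper["venue"].startswith(venue):
        return False
    # Multiple selected labels also combine with AND: all must be present.
    return all(lab in paper["labels"] for lab in labels)

papers = [
    {"year": "2025", "venue": "ICSE2025", "labels": ["Static Analysis", "Bug Detection"]},
    {"year": "2024", "venue": "FSE2024", "labels": ["Code Generation"]},
]
hits = [p for p in papers if matches(p, venue="ICSE", labels=["Bug Detection"])]
```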
Papers are systematically collected for all proceedings from 2023–2026 that have been publicly released. The database additionally includes selected papers from earlier years (2020–2022) and other venues on a best-effort basis.
Tracked venues:
Software Engineering (SE)
- ICSE (2023–2025), FSE (2023–2025), ASE (2023–2025), ISSTA (2022–2025)
- TSE (2023–2024), TOSEM (2023–2024)
Programming Languages (PL)
- PLDI (2023, 2025), OOPSLA (2023–2025), POPL (2025), CC (2025), COLM (2025)
Security
- S&P (2023–2025), USENIX Security (2023–2025), CCS (2023–2025), NDSS (2024–2026)
- RAID (2023)
Natural Language Processing (NLP)
- ACL (2023–2025), EMNLP (2020, 2023–2025), NAACL (2024–2025)
Machine Learning (ML)
- ICML (2021, 2023–2025), NeurIPS (2022–2024), ICLR (2021, 2023–2025)
Each █ block represents ~9 papers. Bars are scaled within each track independently.
| Venue | 2023 | 2024 | 2025 | Total |
|---|---|---|---|---|
| ICSE | ███░░░░░░░░░░░░░░░░░ 23 | ██████░░░░░░░░░░░░░░ 53 | ██████████░░░░░░░░░░ 90 | 166 |
| FSE | ███░░░░░░░░░░░░░░░░░ 31 | █████░░░░░░░░░░░░░░░ 45 | ██████░░░░░░░░░░░░░░ 54 | 130 |
| ASE | ████░░░░░░░░░░░░░░░░ 36 | █████████░░░░░░░░░░░ 78 | ████████████████████ 178 | 292 |
| ISSTA | █░░░░░░░░░░░░░░░░░░░ 10 | █████░░░░░░░░░░░░░░░ 45 | █████░░░░░░░░░░░░░░░ 43 | 98 |
| Total | 100 | 221 | 365 | 686 |
| Venue | 2023 | 2024 | 2025 | Total |
|---|---|---|---|---|
| PLDI | ██░░░░░░░░░░░░░░░░░░ 2 | ░░░░░░░░░░░░░░░░░░░░ 0 | ████░░░░░░░░░░░░░░░░ 4 | 6 |
| OOPSLA | ████░░░░░░░░░░░░░░░░ 4 | █████████████░░░░░░░ 13 | █████████████████░░░ 17 | 34 |
| POPL | ░░░░░░░░░░░░░░░░░░░░ 0 | ░░░░░░░░░░░░░░░░░░░░ 0 | █░░░░░░░░░░░░░░░░░░░ 1 | 1 |
| Total | 6 | 13 | 22 | 41 |
| Venue | 2023 | 2024 | 2025 | 2026 | Total |
|---|---|---|---|---|---|
| CCS | ██░░░░░░░░░░░░░░░░░░ 4 | ██████████████░░░░░░ 24 | ███████████░░░░░░░░░ 19 | — | 47 |
| USENIXSec | ██░░░░░░░░░░░░░░░░░░ 3 | █████████░░░░░░░░░░░ 16 | █████████████░░░░░░░ 22 | — | 41 |
| S&P | █░░░░░░░░░░░░░░░░░░░ 1 | █████░░░░░░░░░░░░░░░ 9 | ███████░░░░░░░░░░░░░ 12 | — | 22 |
| NDSS | — | ██░░░░░░░░░░░░░░░░░░ 3 | ████████████░░░░░░░░ 21 | ██████████████████░░ 31 | 55 |
| Total | 8 | 52 | 74 | 31 | 165 |
| Venue | 2023 | 2024 | 2025 | Total |
|---|---|---|---|---|
| ACL | ██░░░░░░░░░░░░░░░░░░ 23 | ████████░░░░░░░░░░░░ 79 | ███████████████████░ 192 | 294 |
| EMNLP | ████░░░░░░░░░░░░░░░░ 39 | ██████░░░░░░░░░░░░░░ 59 | ███████████████░░░░░ 152 | 250 |
| NAACL | ░░░░░░░░░░░░░░░░░░░░ 0 | █░░░░░░░░░░░░░░░░░░░ 6 | ██░░░░░░░░░░░░░░░░░░ 16 | 22 |
| Total | 62 | 144 | 360 | 566 |
Papers are classified using a two-level taxonomy with 9 top-level categories and 47 sub-categories. A paper may carry multiple labels. The taxonomy is organized into three super-groups:
Papers where LLMs or AI agents are applied to core software engineering tasks.
| Category | Sub-Categories | Papers |
|---|---|---|
| Code Generation | Program Synthesis (410), Code Completion (68), Program Repair (223), Code Translation (69), Decompilation (23), Refactoring (37) | 750 |
| Static Analysis | Bug Detection (249), Program Verification (44), Specification Inference (33), Type Inference (18), Data-flow Analysis (23), Taint Analysis (16), Code Summarization (67), Code Search (51), Clone Detection (21), Call Graph Analysis (8), Symbolic Execution (7), Pointer Analysis (3), Abstract Interpretation (3) | 446 |
| Dynamic Analysis | Test Case Generation (118), Fuzzing (58), Domain-Specific Testing (56), Debugging (40), PoC and Exploit Generation (23), Test Oracle (19), Bug Reproduction (19), Mutation Testing (6) | 287 |
| Code Model | Model Training (407), Binary and IR Model (35) | 427 |
| Other SE Tasks | Doc/Comment/Commit Message Generation (35), Log Analysis (34), Code Review (29) | 96 |
Research on agent architectures and the safety and security properties of code-oriented LLMs.
| Category | Sub-Categories | Papers |
|---|---|---|
| Agent Design | Planning (206), Tool Use (156), Multi-Agent (96), Memory Management (30) | 328 |
| Model Safety and Security | Adversarial Attack (85), Jailbreaking (66), Secure Code Generation (65), Memorization (40), Backdoor Detection (36), Watermarking (35) | 267 |
| Agent Safety and Security | Prompt Injection (73), Agent Defense (36), Access Control (12) | 97 |
Benchmarks, empirical studies, and surveys that assess LLM/agent capabilities for code.
| Category | Sub-Categories | Papers |
|---|---|---|
| Evaluation | Empirical Study (620), Benchmark (392), Survey (26) | 914 |
Each venue's proceedings are processed through a four-stage pipeline:
- Extract — parse titles and abstracts from BibTeX or HTML files.
- Filter — retain papers whose title or abstract contains both LLM-related terms (e.g., "large language model", "GPT", "agent") and code-related terms (e.g., "program", "software", "testing", "verification"). This keyword pass is deliberately permissive (high recall).
- Classify — pass each candidate to the Claude API, which verifies relevance and assigns taxonomy labels. A paper is included only if LLMs or AI agents constitute a central contribution, not merely a baseline or comparison point.
- Merge — add the classified papers to the canonical database (`data/labeldata/labeldata.json`) and regenerate the website.
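The Filter stage can be sketched as follows (a minimal illustration; the keyword lists here are small excerpts, not the full lists used by `label_papers.py`):

```python
# Illustrative excerpts of the two keyword lists; the real lists are longer.
LLM_TERMS = ["large language model", "llm", "gpt", "agent"]
CODE_TERMS = ["program", "software", "testing", "verification", "code"]

def keyword_filter(paper):
    """High-recall pass: keep a paper only if title+abstract mention both
    an LLM-related and a code-related term (case-insensitive)."""
    text = (paper.get("title", "") + " " + paper.get("abstract", "")).lower()
    return any(t in text for t in LLM_TERMS) and any(t in text for t in CODE_TERMS)

papers = [
    {"title": "An LLM Agent for Program Repair", "abstract": "We repair bugs."},
    {"title": "A Study of Build Systems", "abstract": "We survey Make internals."},
]
candidates = [p for p in papers if keyword_filter(p)]
```

Because this pass is deliberately permissive, false positives are expected; the Classify stage removes them.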
This repository ships a paper-labeler skill that automates the full pipeline: extract → filter → label → merge → rebuild. All scripts live in `.claude/skills/paper-labeler/scripts/`.
```bash
pip install boto3 requests  # boto3 for Claude API via AWS Bedrock; requests for NDSS scraping
```

AWS credentials must be configured (`~/.aws/credentials`, environment variables, or an IAM role) for the labeling step. The filter-only step needs no credentials.
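For orientation, a Bedrock call to Claude takes an Anthropic Messages payload. The sketch below shows the general shape only; the model ID and prompt wording are placeholders, and the real ones live in the skill's scripts:

```python
def build_request(title, abstract):
    """Anthropic Messages body for a Bedrock invoke_model call.
    The prompt text here is a placeholder, not the one used by label_papers.py."""
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [
            {
                "role": "user",
                "content": (
                    "Is this paper about LLMs or AI agents applied to "
                    f"software engineering?\nTitle: {title}\nAbstract: {abstract}"
                ),
            }
        ],
    }

# With credentials configured, the call itself looks roughly like:
# import json, boto3
# client = boto3.client("bedrock-runtime")
# resp = client.invoke_model(modelId="<model-id>",
#                            body=json.dumps(build_request(title, abstract)))
```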
Scans a rawdata folder, skips venues already recorded in `data/venues.json`, runs the full pipeline for each new venue, and rebuilds the website.
```bash
# Preview what would be processed
python .claude/skills/paper-labeler/scripts/process_folder.py --dry-run

# Process all new venues under data/rawdata/
python .claude/skills/paper-labeler/scripts/process_folder.py

# Process a specific year only
python .claude/skills/paper-labeler/scripts/process_folder.py data/rawdata/2025/

# Keyword filter only — no API calls, no merge (useful for a quick check)
python .claude/skills/paper-labeler/scripts/process_folder.py --filter-only
```

Key options: `--model MODEL`, `--region REGION`, `--delay SECONDS`, `--no-rebuild`.
Use when finer control over individual steps is required.
```bash
# BibTeX (most venues: ASE, ICSE, FSE, CCS, S&P, OOPSLA, …)
python .claude/skills/paper-labeler/scripts/extract_papers.py \
  data/rawdata/2025/ASE2025.bib > /tmp/extracted.json

# ACL Anthology HTML (ACL, EMNLP, NAACL)
python .claude/skills/paper-labeler/scripts/extract_papers.py \
  data/rawdata/2025/ACL2025.html > /tmp/extracted.json

# NDSS HTML — titles only; fetch abstracts separately
python .claude/skills/paper-labeler/scripts/extract_papers.py \
  data/rawdata/2025/NDSS2025.html > /tmp/ndss_raw.json
python .claude/skills/paper-labeler/scripts/fetch_ndss_abstracts.py \
  /tmp/ndss_raw.json -o /tmp/extracted.json
```

```bash
# Keyword filter only (no AWS credentials needed)
python .claude/skills/paper-labeler/scripts/label_papers.py \
  /tmp/extracted.json --phase filter -o /tmp/filtered.json

# Claude labeling only (requires AWS credentials)
python .claude/skills/paper-labeler/scripts/label_papers.py \
  /tmp/filtered.json --phase label -o /tmp/labeled.json

# Both phases in one go
python .claude/skills/paper-labeler/scripts/label_papers.py \
  /tmp/extracted.json --phase all -o /tmp/labeled.json
```

```bash
# Preview first (no writes)
python .claude/skills/paper-labeler/scripts/merge_labeldata.py \
  /tmp/labeled.json --dry-run

# Merge
python .claude/skills/paper-labeler/scripts/merge_labeldata.py \
  /tmp/labeled.json
```

```bash
python .claude/skills/paper-labeler/scripts/build_site.py
# Output: web/index.html
```

With Claude Code, the pipeline can be invoked conversationally — no need to remember script names or flags:
"Process the ASE2025 rawdata"
"Label the papers in data/rawdata/2025/CCS2025.bib"
"Process the entire 2025 folder"
"Run a dry-run for all unprocessed venues"
"Rebuild the website"
Claude Code will invoke the paper-labeler skill and run the appropriate commands automatically.
| Format | Extension | Example venues | Abstracts |
|---|---|---|---|
| BibTeX | `.bib` | ASE, ICSE, FSE, ISSTA, CCS, S&P, OOPSLA, PLDI, TOSEM, TSE, USENIXSec, NAACL | Inline |
| ACL Anthology HTML | `.html` | ACL, EMNLP, NAACL (some years) | Inline |
| NDSS HTML | `.html` | NDSS | Scraped separately |
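Dispatching on file type could be sketched like this (a hypothetical helper mirroring the table above; the real logic lives in `extract_papers.py` and may differ in detail):

```python
from pathlib import Path

def pick_parser(path):
    """Choose an extractor based on the proceedings file format."""
    p = Path(path)
    ext = p.suffix.lower()
    if ext == ".bib":
        return "bibtex"  # abstracts are inline in the .bib entries
    if ext == ".html":
        # NDSS pages list titles only; abstracts are scraped separately by
        # fetch_ndss_abstracts.py. Other HTML is ACL Anthology style.
        return "ndss" if "ndss" in p.stem.lower() else "acl_anthology"
    raise ValueError(f"unsupported proceedings format: {ext}")
```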
| Script | Purpose |
|---|---|
| `process_folder.py` | Batch mode — scan folder, skip processed venues, run full pipeline |
| `extract_papers.py` | Step 1 — parse `.bib`/`.html` into uniform JSON |
| `fetch_ndss_abstracts.py` | Step 1b — scrape abstracts from NDSS paper pages |
| `label_papers.py` | Step 2 — keyword filter + Claude API labeling |
| `merge_labeldata.py` | Step 3 — merge labeled JSON into `labeldata.json` |
| `build_site.py` | Step 4 — regenerate `web/index.html` from `labeldata.json` |
| `import_original.py` | One-time import of legacy papers from `original.json` |
Full documentation: `.claude/skills/paper-labeler/USAGE.md`
- Append an entry to `data/labeldata/labeldata.json`:

  ```json
  {
    "Paper Title": {
      "type": "INPROCEEDINGS",
      "author": "...",
      "title": "...",
      "booktitle": "...",
      "year": "2025",
      "abstract": "...",
      "url": "https://doi.org/...",
      "venue": "ICSE2025",
      "labels": ["Static Analysis", "Bug Detection"]
    }
  }
  ```

- Labels must be drawn from the taxonomy above.
- Rebuild the website: `python .claude/skills/paper-labeler/scripts/build_site.py`.
- Open a pull request.
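Before opening a pull request, it may help to sanity-check the new entry's labels against the taxonomy. This sketch uses an illustrative two-category excerpt of the taxonomy, not the full nine-category hierarchy:

```python
# Illustrative excerpt; the full taxonomy has 9 top-level categories.
TAXONOMY = {
    "Static Analysis": ["Bug Detection", "Program Verification"],
    "Code Generation": ["Program Synthesis", "Program Repair"],
}
# Both category names and sub-category names are valid labels.
VALID_LABELS = set(TAXONOMY) | {s for subs in TAXONOMY.values() for s in subs}

entry = {"labels": ["Static Analysis", "Bug Detection"]}
unknown = [lab for lab in entry["labels"] if lab not in VALID_LABELS]
assert not unknown, f"labels not in taxonomy: {unknown}"
```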
- Place the `.bib` or `.html` proceedings file under `data/rawdata/<year>/`.
- Run the batch pipeline: `python .claude/skills/paper-labeler/scripts/process_folder.py`
- Open a pull request containing the rawdata file and updated `labeldata.json`.
If a tracked venue's proceedings have been published but are not yet reflected in the database, please open an issue with the venue name and a link to the proceedings. You may also suggest specific papers with labels.
The pipeline is fully configurable. To track a different research topic across the same venues, edit two sections in `.claude/skills/paper-labeler/SKILL.md`:
- `## Relevance Criteria` — keyword lists and the natural-language prompt used by Claude to decide whether a paper is relevant. For example, to track LLM-for-theorem-proving, add proof-related keywords and update the relevance description.
- `## Label Taxonomy` — the two-level category hierarchy. Add, remove, or rename categories as needed. After editing, keep the `TAXONOMY` dict in `build_site.py` and `label_papers.py` in sync.
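A quick consistency check between the two dicts could be sketched as follows (a hypothetical helper; the actual variable names in the scripts may differ):

```python
def assert_taxonomies_match(a, b):
    """Raise if two TAXONOMY dicts diverge (order of sub-categories ignored)."""
    if a.keys() != b.keys():
        raise AssertionError(f"top-level category mismatch: {a.keys() ^ b.keys()}")
    for cat in a:
        if set(a[cat]) != set(b[cat]):
            raise AssertionError(f"sub-category mismatch under {cat!r}")

# Example: passes because sub-category order does not matter.
assert_taxonomies_match(
    {"Static Analysis": ["Bug Detection", "Program Verification"]},
    {"Static Analysis": ["Program Verification", "Bug Detection"]},
)
```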
Re-run the pipeline on existing rawdata to reclassify papers under the updated taxonomy:
```bash
# Re-label a single extracted file
python .claude/skills/paper-labeler/scripts/label_papers.py \
  /tmp/extracted.json --phase all -o /tmp/relabeled.json

# Or reprocess all rawdata from scratch
python .claude/skills/paper-labeler/scripts/process_folder.py
```

This repository is intended solely for research purposes. All metadata is sourced from publicly available proceedings pages on ACM, IEEE, and corresponding conference websites. Full-text PDFs are not included or redistributed.
For questions or suggestions, please reach out via stephenw.wangcp@gmail.com or wang6590@purdue.edu.
