reviewer3: compile FULL source + page-trim PDF (fix pdflatex failures) by dangng2004 · Pull Request #86 · ChicagoHAI/OpenAIReview

dangng2004 · 2026-05-15T21:06:08Z

Stacked on top of #85.

Problem

The reviewer3 perturbation run hit a ~37% per-paper failure rate, with three distinct failure modes:

- pdflatex produced no usable PDF (4 cells across cs_CC paper_005 and paper_008): token truncation in `prepare_units` cuts the LaTeX-as-md source mid-environment, so pdflatex fails on the staged file.
- Some affected papers also had `\input{mypreamble.tex}` referencing files we don't have, which is a fatal error for pdflatex. The included preamble defined common shortcuts (`\bbC`, `\calA`, `\vvirg`, `\ootimes`, …), so even if the include is stripped the body fails on undefined commands.
- `UnicodeDecodeError: 'utf-8' can't decode byte 0xaa` (1 cell, cs_CC paper_009): `subprocess.run(text=True)` raised while decoding pdflatex's own log bytes.
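The decode failure can be reproduced without pdflatex. The following is a minimal sketch, not the adapter's code: `run_capture_bytes` and `FAKE_PDFLATEX` are hypothetical names, and a small Python child process stands in for pdflatex by writing a single 0xaa byte to its output. Capturing bytes and decoding with `errors="replace"` keeps the log readable instead of raising.

```python
import subprocess
import sys


def run_capture_bytes(cmd, timeout=120.0):
    """Run cmd in bytes mode and return (returncode, leniently decoded log)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured log
        timeout=timeout,
        check=False,  # a non-zero exit is reported to the caller, not raised
    )
    # Bytes that are not valid UTF-8 become U+FFFD instead of raising.
    return proc.returncode, proc.stdout.decode("utf-8", errors="replace")


# Stand-in for pdflatex (assumption): a child process that emits byte 0xaa.
FAKE_PDFLATEX = [
    sys.executable,
    "-c",
    r"import sys; sys.stdout.buffer.write(b'Overfull \xaa hbox\n')",
]

if __name__ == "__main__":
    code, log = run_capture_bytes(FAKE_PDFLATEX)
    print(code, repr(log))
```

With `text=True` the same child output would be decoded inside `subprocess.run` using the locale encoding, which is where a UTF-8 locale raises on byte 0xaa.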

Fix

Switch the reviewer3 system to compile the full pre-truncation source and trim the rendered PDF to its first N pages. This matches the `max_pages: 20` convention that coarse already uses.
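A minimal sketch of the trim step, assuming PyMuPDF is installed (the commit message names pymupdf for the page cap). `pages_to_keep` and `trim_pdf_in_place` are illustrative names, not the adapter's `_trim_pages_to`, and treating a non-positive `max_pages` as "no limit" is an assumption made here.

```python
import os
from pathlib import Path
from typing import List, Optional


def pages_to_keep(page_count: int, max_pages: Optional[int]) -> List[int]:
    """Return the 0-based page indices that survive the trim."""
    # Assumption: None or a non-positive cap means "keep everything".
    if max_pages is None or max_pages <= 0 or page_count <= max_pages:
        return list(range(page_count))
    return list(range(max_pages))


def trim_pdf_in_place(pdf_path: Path, max_pages: Optional[int]) -> int:
    """Cap pdf_path at max_pages pages and return the resulting page count."""
    import pymupdf  # imported lazily so the index logic works without it

    doc = pymupdf.open(str(pdf_path))
    try:
        keep = pages_to_keep(doc.page_count, max_pages)
        if len(keep) == doc.page_count:
            return doc.page_count  # already short enough, leave file untouched
        doc.select(keep)  # drop every page not listed in keep
        tmp = pdf_path.with_suffix(".tmp.pdf")
        doc.save(str(tmp))
    finally:
        doc.close()
    # Write to a temp file, then swap, so a failed save never corrupts the PDF.
    os.replace(tmp, pdf_path)
    return len(keep)
```

Compiling the full source and trimming afterwards means pdflatex always sees a complete document, while the reviewer still receives a bounded page window.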

Three additional pdflatex robustness layers:

- Strip orphan `\input`/`\include` whose target file isn't bundled.
- Inject a defensive `\providecommand` preamble for common author shortcuts (`\bb[A-Z]`, `\cal[A-Z]`, `\bf[a-z]`, `\eps`, `\vvirg`, `\ootimes`, …). `\providecommand` is a no-op when the command is already defined, so it's safe blanket coverage.
- Switch `subprocess.run` to bytes mode so pdflatex's non-UTF8 accent bytes don't kill the Python side.
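The first two layers can be sketched as plain string preprocessing. This is a hedged illustration, not the adapter's `_strip_orphan_includes` / `_inject_rescue_preamble`: the function names, the regex, and every fallback definition below are assumptions (in particular the meanings chosen for `\eps`, `\vvirg`, and `\ootimes` are placeholders, and `\mathbb` assumes the paper loads amssymb).

```python
import re
import string
from pathlib import Path

# Matches \input{...} and \include{...} but not \includegraphics{...}.
_INCLUDE_RE = re.compile(r"\\(input|include)\s*\{([^}]+)\}")


def strip_orphan_includes(tex: str, base_dir: Path) -> str:
    """Remove \\input/\\include calls whose target file is not in base_dir."""
    def keep_or_drop(match):
        target = match.group(2).strip()
        candidates = [base_dir / target, base_dir / (target + ".tex")]
        if any(p.is_file() for p in candidates):
            return match.group(0)  # target is bundled: keep the include
        return ""  # orphan: drop it so pdflatex does not abort
    return _INCLUDE_RE.sub(keep_or_drop, tex)


def _provide(name: str, body: str) -> str:
    return "\\providecommand{\\" + name + "}{" + body + "}"


def build_rescue_preamble() -> str:
    """Fallbacks for common shortcuts; no-ops if the author defined them."""
    lines = ["% rescue preamble (fallback definitions are guesses)"]
    for c in string.ascii_uppercase:
        lines.append(_provide("bb" + c, "\\mathbb{" + c + "}"))
        lines.append(_provide("cal" + c, "\\mathcal{" + c + "}"))
    for c in string.ascii_lowercase:
        lines.append(_provide("bf" + c, "\\mathbf{" + c + "}"))
    # Placeholder meanings (assumption): the real definitions are unknown.
    lines.append(_provide("eps", "\\varepsilon"))
    lines.append(_provide("vvirg", ","))
    lines.append(_provide("ootimes", "\\otimes"))
    return "\n".join(lines) + "\n"


def inject_rescue_preamble(tex: str) -> str:
    """Insert the fallbacks immediately before \\begin{document}."""
    marker = "\\begin{document}"
    idx = tex.find(marker)
    if idx == -1:
        return tex  # not a full document; leave it alone
    return tex[:idx] + build_rescue_preamble() + tex[idx:]
```

Placing the block directly before `\begin{document}` is a deliberate choice in this sketch: an author's own `\newcommand` earlier in the preamble would error if the fallback had already claimed the name, whereas a late `\providecommand` simply yields to it.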

Files

- `benchmarks/perturbation/systems/reviewer3.py`: thread `src_corrupted` + `max_pages` into job payload
- `benchmarks/perturbation/systems/reviewer3_adapter.py`: `Reviewer3Job.source` + `.max_pages` fields; `_ensure_pdf` accepts and honors them; helpers `_strip_orphan_includes`, `_inject_rescue_preamble`, `_trim_pages_to`, `_maybe_trim_pages`
- `benchmarks/perturbation/run_benchmark.py`: `Config.max_pages: int | None = None`
- `benchmarks/perturbation/configs/full_*_reviewer3.yaml` (8 files): `max_pages: 20`
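How the new fields might interact can be sketched as follows. This is an assumption-laden illustration rather than the real `Reviewer3Job` or `_ensure_pdf`: the class name `Reviewer3JobSketch`, the helper names, and the example paths are hypothetical. It only reflects what the commit message states, namely that `source` is preferred over `paper` when set and that a trimmed PDF is cached alongside the source with a `.trim.pdf` suffix.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Reviewer3JobSketch:
    """Hypothetical stand-in for the job payload described above."""
    paper: Path                       # staged (possibly truncated) file
    source: Optional[Path] = None     # full pre-truncation source, if known
    max_pages: Optional[int] = None   # None means "do not trim"


def compile_input(job: Reviewer3JobSketch) -> Path:
    """Prefer the full source over the staged paper when it is set."""
    return job.source if job.source is not None else job.paper


def cached_pdf_path(job: Reviewer3JobSketch) -> Path:
    """Cache next to the compile input; trimmed output gets .trim.pdf."""
    src = compile_input(job)
    suffix = ".trim.pdf" if job.max_pages is not None else ".pdf"
    # Assumes the input has a file extension for with_suffix to replace.
    return src.with_suffix(suffix)
```

Keeping trimmed and untrimmed outputs under different suffixes avoids one cache entry being mistaken for the other when `max_pages` changes between runs.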

Test plan

- Smoke-validated end-to-end on three previously-failing cs_CC papers: all three now produce 20-page trimmed PDFs in 2–4s.
- Re-launch the perturbation runner against the same configs; the idempotent skip-completed logic preserves the cells that already succeeded and retries the failed ones.

🤖 Generated with Claude Code

…ilures)

The token-based truncation in `prepare_units` cuts the LaTeX-as-md staged file at a token boundary, which routinely leaves the document mid-environment. pdflatex on the staged file then "produces no usable PDF" for a fraction of papers, surfacing as a hard failure in the reviewer3 run.

Switches the reviewer3 system to compile the FULL pre-truncation source (`u.src_corrupted`) and then trim the rendered PDF to its first N pages. This matches the `max_pages: 20` convention conference_study/configs/coarse.yaml already uses for coarse, so reviewer3 sees roughly the same content window the other systems see.

Three pdflatex robustness fixes layered in:

1. Strip orphan `\input{...}` / `\include{...}` whose target file isn't bundled. pdflatex aborts hard on a missing `\input`, killing the compile for the whole paper even when the body is fine (paper_005 cs_CC had `\input{mypreamble.tex}`).
2. Inject a defensive preamble of `\providecommand` fallbacks for common author-defined shortcuts (`\bbR`, `\calA`, `\bfx`, `\eps`, `\vvirg`, `\ootimes`, etc.). Authors typically define these in private preamble files we don't have; `\providecommand` is a no-op when the command is already defined, so the injection is safe blanket coverage.
3. `subprocess.run` uses bytes (`text=False`) instead of `text=True` so the pdflatex log's non-UTF-8 accent bytes don't blow up Python's decoder (paper_009 cs_CC had byte 0xaa at offset ~57k).

Changes

- `Reviewer3System.build_jobs` threads `u.src_corrupted` (full path) and `cfg["max_pages"]` into the job payload.
- `Reviewer3Job` adopts `source` + `max_pages` fields; `_submit` / `_ensure_pdf` forward them.
- `_ensure_pdf` prefers `source` over `paper` for the compile when set; caches alongside the source with `.trim.pdf` suffix when trimmed.
- `_trim_pages_to` (in-place) and `_maybe_trim_pages` (for already-PDF inputs) use pymupdf to cap pages.
- `max_pages: 20` added to all 8 `full_*_reviewer3.yaml` configs.
- run_benchmark.py `Config` gains `max_pages: int | None = None`.

Smoke-validated on three previously-failing cs_CC papers (2604.19872v1 with missing `\input` + custom commands, 2604.24325v1 with same pattern, 2604.24879v1 with non-UTF8 bytes in pdflatex output); all three now produce 20-page trimmed PDFs in 2–4s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dangng2004 · 2026-05-15T21:14:27Z

Auto-closed: fast-forward push moved this commit onto reviewer3-integration (now part of #85). Branch deleted.

dangng2004 merged commit 8c6238f into reviewer3-integration May 15, 2026

dangng2004 deleted the reviewer3-pdf-pagetrim branch May 15, 2026 21:14

dangng2004 merged 1 commit into reviewer3-integration from reviewer3-pdf-pagetrim
