
reviewer3: compile FULL source + page-trim PDF (fix pdflatex failures)#86

Merged
dangng2004 merged 1 commit into reviewer3-integration from reviewer3-pdf-pagetrim on May 15, 2026

@dangng2004
Contributor

Stacked on top of #85.

Problem

The reviewer3 perturbation run hit a ~37% per-paper failure rate, with three distinct failure modes:

  • pdflatex produced no usable PDF (4 cells across cs_CC paper_005 and paper_008): token truncation in `prepare_units` cuts the LaTeX-as-md source mid-environment, so pdflatex fails on the staged file.
  • Some affected papers also used `\input{mypreamble.tex}` to pull in a file we don't have, which is a fatal error for pdflatex. That preamble defined common shortcuts (`\bbC`, `\calA`, `\vvirg`, `\ootimes`, …), so even with the include stripped, the body fails on undefined commands.
  • `UnicodeDecodeError: 'utf-8' can't decode byte 0xaa` (1 cell, cs_CC paper_009): `subprocess.run(text=True)` blew up trying to decode pdflatex's own log bytes.

Fix

Switch the reviewer3 system to compile the full pre-truncation source and trim the rendered PDF to its first N pages. This matches the `max_pages: 20` convention the coarse system already uses.
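A minimal sketch of the trim step, assuming PyMuPDF is installed (the commit message says the helpers use pymupdf). `pages_to_keep` and `trim_pdf_in_place` are hypothetical names for illustration, not the adapter's actual `_trim_pages_to`; saving to a sibling temp file and swapping it in with `os.replace` is one way to get an in-place trim.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def pages_to_keep(page_count: int, max_pages: int | None) -> list[int] | None:
    """Return the 0-based page indices to keep, or None when no trim is needed."""
    if max_pages is None:
        return None  # trimming disabled for this config
    if max_pages < 1:
        raise ValueError("max_pages must be a positive integer")
    if page_count <= max_pages:
        return None  # already short enough
    return list(range(max_pages))


def trim_pdf_in_place(path: Path, max_pages: int | None) -> bool:
    """Cap the PDF at `path` to its first `max_pages` pages; return True if it was trimmed."""
    try:
        import pymupdf  # module name in recent PyMuPDF releases
    except ImportError:
        import fitz as pymupdf  # older releases only ship the `fitz` name

    with pymupdf.open(str(path)) as doc:
        keep = pages_to_keep(doc.page_count, max_pages)
        if keep is None:
            return False
        doc.select(keep)  # drop every page not listed
        # Save to a sibling temp file first so a failed save never touches the original.
        fd, tmp = tempfile.mkstemp(suffix=".pdf", dir=path.parent)
        os.close(fd)
        doc.save(tmp, garbage=3, deflate=True)
    os.replace(tmp, path)  # swap in the trimmed copy after the document is closed
    return True
```

Keeping the page-selection arithmetic in a pure function makes the `max_pages: 20` rule testable without a PDF on disk.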

Three additional pdflatex robustness layers:

  1. Strip orphan `\input`/`\include` whose target file isn't bundled.
  2. Inject a defensive `\providecommand` preamble for common author shortcuts (`\bb[A-Z]`, `\cal[A-Z]`, `\bf[a-z]`, `\eps`, `\vvirg`, `\ootimes`, …) — `\providecommand` is a no-op when already defined, so it's safe blanket coverage.
  3. `subprocess.run` switched to bytes mode so pdflatex's non-UTF8 accent bytes don't kill the Python side.
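The first two layers can be sketched with the standard library alone. This is a hypothetical illustration, not the adapter's `_strip_orphan_includes` / `_inject_rescue_preamble`: the fallback set below is an assumed subset covering the `\bb[A-Z]`, `\cal[A-Z]`, `\bf[a-z]` and `\eps` families (it makes no guess at what `\vvirg` or `\ootimes` expand to), and only the braced `\input{...}` form is handled.

```python
from __future__ import annotations

import re
import string
from pathlib import Path

# \input{name} or \include{name}; group 2 is the target exactly as written.
_INCLUDE_RE = re.compile(r"\\(input|include)\{([^}]+)\}")

# Assumed fallback definitions; \providecommand leaves existing definitions alone.
RESCUE_PREAMBLE = "\n".join(
    [r"\providecommand{\bb%s}{\mathbb{%s}}" % (c, c) for c in string.ascii_uppercase]
    + [r"\providecommand{\cal%s}{\mathcal{%s}}" % (c, c) for c in string.ascii_uppercase]
    + [r"\providecommand{\bf%s}{\mathbf{%s}}" % (c, c) for c in string.ascii_lowercase]
    + [r"\providecommand{\eps}{\varepsilon}"]
)


def strip_orphan_includes(tex: str, source_dir: Path) -> str:
    """Drop \\input/\\include commands whose target file is not present in source_dir."""

    def keep_or_drop(m: re.Match[str]) -> str:
        target = source_dir / m.group(2).strip()
        # LaTeX also tries the name with ".tex" appended, so accept either form.
        if target.is_file() or Path(f"{target}.tex").is_file():
            return m.group(0)
        return ""

    return _INCLUDE_RE.sub(keep_or_drop, tex)


def inject_rescue_preamble(tex: str) -> str:
    """Insert the fallbacks right before \\begin{document}, after the author's own preamble."""
    marker = r"\begin{document}"
    idx = tex.find(marker)
    if idx == -1:
        return tex  # no document body found; leave the source untouched
    return tex[:idx] + RESCUE_PREAMBLE + "\n" + tex[idx:]
```

Placing the fallbacks immediately before `\begin{document}` is what makes `\providecommand` safe here: any definition the author's preamble already made wins, and the fallback only fills the gap left by a stripped include.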

Files

  • `benchmarks/perturbation/systems/reviewer3.py` — thread `src_corrupted` + `max_pages` into job payload
  • `benchmarks/perturbation/systems/reviewer3_adapter.py` — `Reviewer3Job.source` + `.max_pages` fields; `_ensure_pdf` accepts and honors them; helpers `_strip_orphan_includes`, `_inject_rescue_preamble`, `_trim_pages_to`, `_maybe_trim_pages`
  • `benchmarks/perturbation/run_benchmark.py` — `Config.max_pages: int | None = None`
  • `benchmarks/perturbation/configs/full_*_reviewer3.yaml` × 8 — `max_pages: 20`
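How the new fields might thread from config to job, as a hypothetical sketch: the names `paper`, `source`, `max_pages`, `src_corrupted` and `Config.max_pages: int | None = None` come from this PR, while `build_job` and the exact field types are assumptions. Reading the key with a `None` default keeps configs that lack `max_pages` on the old, untrimmed behavior.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class Config:
    max_pages: int | None = None  # None disables trimming


@dataclass
class Reviewer3Job:
    paper: Path                  # truncated, staged file
    source: Path | None = None   # full pre-truncation source; compiled when set
    max_pages: int | None = None


def build_job(paper: Path, src_corrupted: Path, cfg: dict) -> Reviewer3Job:
    """Hypothetical stand-in for the payload built in Reviewer3System.build_jobs."""
    return Reviewer3Job(
        paper=paper,
        source=src_corrupted,
        max_pages=cfg.get("max_pages", Config.max_pages),
    )
```

For example, a config carrying `max_pages: 20` yields a job with `max_pages == 20`, while an older config without the key yields `max_pages is None`.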

Test plan

  • Smoke-validated end-to-end on three previously-failing cs_CC papers — all three now produce 20-page trimmed PDFs in 2–4s.
  • Re-launch the perturbation runner against the same configs; the idempotent skip-completed logic preserves the cells that already succeeded and retries the failed ones.

🤖 Generated with Claude Code

…ilures)

The token-based truncation in `prepare_units` cuts the LaTeX-as-md staged file
at a token boundary, which routinely leaves the document mid-environment.
pdflatex on the staged file then "produces no usable PDF" for a fraction of
papers, surfacing as a hard failure in the reviewer3 run.
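A quick way to see the failure: cut a document at a token boundary and list the environments left open. This hypothetical checker is not part of the PR, and whitespace splitting stands in for the real tokenizer used by `prepare_units`.

```python
import re

# Matches \begin{name} and \end{name}.
_ENV_RE = re.compile(r"\\(begin|end)\{([^}]+)\}")


def unclosed_environments(tex: str) -> list[str]:
    """Return environments that were opened but never closed, outermost first."""
    stack: list[str] = []
    for kind, name in _ENV_RE.findall(tex):
        if kind == "begin":
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()  # matching \end closes the innermost open environment
    return stack


doc = "\\begin{document}\n\\begin{theorem}\nLet $x>0$.\n\\end{theorem}\n\\end{document}\n"
cut = " ".join(doc.split()[:4])  # keep only the first 4 whitespace "tokens"
```

`unclosed_environments(doc)` is empty, while the truncated `cut` still has `document` and `theorem` open, which is exactly the state pdflatex receives from the staged file.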

Switches the reviewer3 system to compile the FULL pre-truncation source
(`u.src_corrupted`) and then trim the rendered PDF to its first N pages.
This matches the `max_pages: 20` convention that conference_study/configs/coarse.yaml
already uses for the coarse system, so reviewer3 sees roughly the same content
window as the other systems.

Three pdflatex robustness fixes layered in:

1. Strip orphan `\input{...}` / `\include{...}` whose target file isn't
   bundled. pdflatex aborts hard on a missing \input, killing the compile
   for the whole paper even when the body is fine (paper_005 cs_CC had
   `\input{mypreamble.tex}`).

2. Inject a defensive preamble of `\providecommand` fallbacks for common
   author-defined shortcuts (\bbR, \calA, \bfx, \eps, \vvirg, \ootimes,
   etc.). Authors typically define these in private preamble files we
   don't have; \providecommand is a no-op when the command is already
   defined, so the injection is safe blanket coverage.

3. subprocess.run uses bytes (text=False) instead of text=True so the
   pdflatex log's non-UTF-8 accent bytes don't blow up Python's decoder
   (paper_009 cs_CC had byte 0xaa at offset ~57k).
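The bytes-mode fix amounts to capturing raw output and decoding it leniently afterwards. A minimal sketch: `run_decoded` and `pdflatex_cmd` are hypothetical helper names, and decoding with `errors="replace"` (invalid bytes become U+FFFD) is one reasonable choice, not necessarily what the adapter does with the bytes.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def pdflatex_cmd(tex_name: str) -> list[str]:
    # nonstopmode: report errors and keep going instead of prompting on stdin.
    return ["pdflatex", "-interaction=nonstopmode", tex_name]


def run_decoded(cmd: list[str], cwd: Path | None = None, timeout: float = 300.0) -> tuple[int, str]:
    """Run cmd in bytes mode and return (returncode, combined output as text)."""
    # No text=True: stdout/stderr come back as bytes, so a stray non-UTF-8 byte
    # such as 0xaa cannot raise UnicodeDecodeError inside subprocess.run.
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, timeout=timeout)
    raw = proc.stdout + proc.stderr
    return proc.returncode, raw.decode("utf-8", errors="replace")
```

A call would look like `code, log = run_decoded(pdflatex_cmd("main.tex"), cwd=build_dir)`, where `build_dir` is a hypothetical staging directory.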

Changes
- Reviewer3System.build_jobs threads `u.src_corrupted` (full path) and
  `cfg["max_pages"]` into the job payload.
- Reviewer3Job gains `source` + `max_pages` fields; `_submit` / `_ensure_pdf`
  forward them.
- `_ensure_pdf` prefers `source` over `paper` for the compile when set, and
  caches the result alongside the source with a `.trim.pdf` suffix when trimmed.
- `_trim_pages_to` (in-place) and `_maybe_trim_pages` (for already-PDF inputs)
  use pymupdf to cap pages.
- `max_pages: 20` added to all 8 `full_*_reviewer3.yaml` configs.
- run_benchmark.py Config gains `max_pages: int | None = None`.
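The compile-input choice and cache naming described in this list might look like the following. A hypothetical sketch: only "prefer `source` over `paper`" and the `.trim.pdf` suffix for trimmed output come from the commit; the plain `.pdf` name for untrimmed output and the set of recognized extensions are assumptions.

```python
from __future__ import annotations

from pathlib import Path

# Extensions stripped before adding the PDF suffix (assumed set).
KNOWN_EXTS = {".tex", ".md", ".pdf"}


def compile_input(paper: Path, source: Path | None) -> Path:
    """Prefer the full pre-truncation source when the job carries one."""
    return source if source is not None else paper


def cached_pdf_path(compile_from: Path, max_pages: int | None) -> Path:
    """Place the PDF next to its input; trimmed output gets a `.trim.pdf` suffix."""
    # Strip only a known extension: arXiv-style names such as "2604.19872v1"
    # contain dots, so Path.with_suffix would clobber part of the ID.
    stem = compile_from.name
    if compile_from.suffix.lower() in KNOWN_EXTS:
        stem = compile_from.stem
    suffix = ".trim.pdf" if max_pages is not None else ".pdf"
    return compile_from.with_name(stem + suffix)
```

For example, compiling `2604.19872v1.tex` with `max_pages=20` would cache to `2604.19872v1.trim.pdf` in the same directory.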

Smoke-validated on three previously-failing cs_CC papers (2604.19872v1 with
missing \input + custom commands, 2604.24325v1 with same pattern, 2604.24879v1
with non-UTF8 bytes in pdflatex output) — all three now produce 20-page
trimmed PDFs in 2–4s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dangng2004 merged commit 8c6238f into reviewer3-integration on May 15, 2026
@dangng2004
Contributor Author

Auto-closed: fast-forward push moved this commit onto reviewer3-integration (now part of #85). Branch deleted.

@dangng2004 deleted the reviewer3-pdf-pagetrim branch on May 15, 2026 at 21:14