Conversation
Docs preview: https://5a7383c0.dd-docs-preview.pages.dev
Greptile Summary

This PR adds a developer note and 9 self-contained recipe scripts documenting how ~11.4M synthetic visual QA pairs were generated with Data Designer to improve long-document VLM reasoning. The documentation and recipe pipeline are well-structured, but two logic bugs were found in the recipe scripts.
| Filename | Overview |
|---|---|
| docs/assets/recipes/vlm_long_doc/01-seed-dataset-preparation.py | Seed prep script with a boundary gap in adaptive_window_size — docs with exactly 20, 30, 40, 50, or 60 pages silently get window size 2 instead of 3–7. |
| docs/assets/recipes/vlm_long_doc/07-multi-page-windowed-qa-sdg.py | Multi-page windowed QA recipe; non-reasoning branch for Qwen3.5-122B-A10B sets temperature=0.7/top_p=0.8 in extra_body but passes temperature=1.0/top_p=0.95 to ChatCompletionInferenceParams. |
| docs/assets/recipes/vlm_long_doc/02-nemotron-parse-ocr-sdg.py | Nemotron-Parse OCR pipeline; regex-based bbox parsing and custom column generator look correct. |
| docs/assets/recipes/vlm_long_doc/05-visual-qa-sdg.py | Single-page visual QA pipeline with relevance and correctness judges; no logic issues found. |
| docs/assets/recipes/vlm_long_doc/09-frontier-judge-sdg.py | Frontier judge with 5-rubric scoring; weighted composite score computation and config wiring are correct. |
| docs/assets/recipes/vlm_long_doc/08-whole-document-qa-sdg.py | Whole-document QA recipe targeting full-document multi-page reasoning; no issues found. |
| docs/devnotes/posts/vlm-long-document-understanding.md | Developer note blog post; content and structure look accurate. |
| mkdocs.yml | Navigation entries for dev note and recipe pages added correctly. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[01-seed-dataset-preparation.py\nDownload PDFs → render pages → parquet] --> B[seed_per_page.parquet]
    A --> C[seed_windowed.parquet]
    A --> D[seed_whole_document.parquet]
    B --> E[02-nemotron-parse-ocr-sdg.py\nOCR → transcribed_texts]
    B --> F[03-text-qa-sdg.py\nText QA]
    B --> G[04-page-classification-sdg.py\nPage classification]
    G --> H[05-visual-qa-sdg.py\nSingle-page Visual QA]
    B --> I[06-single-page-qa-sdg.py\nAnchored Single-page QA]
    C --> J[07-multi-page-windowed-qa-sdg.py\nMulti-page windowed QA]
    D --> K[08-whole-document-qa-sdg.py\nWhole-document QA]
    H --> L[09-frontier-judge-sdg.py\nFrontier model judge\n5-rubric scoring + weighted composite]
    I --> L
    J --> L
    K --> L
    F --> L
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: docs/assets/recipes/vlm_long_doc/01-seed-dataset-preparation.py
Line: 113-125
Comment:
**Boundary gap in `adaptive_window_size` silently returns wrong window size**
All of the conditions use strict `>` and `<`, so documents with exactly 20, 30, 40, 50, or 60 pages fall through every branch and return the default `2` instead of the expected 3, 4, 5, 6, or 7. For example, a 20-page document gets a window of 2 instead of 3, causing the windowed seed to produce far smaller windows than intended.
```suggestion
if n_pages > 10 and n_pages <= 20:
return 3
elif n_pages > 20 and n_pages <= 30:
return 4
elif n_pages > 30 and n_pages <= 40:
return 5
elif n_pages > 40 and n_pages <= 50:
return 6
elif n_pages > 50 and n_pages <= 60:
return 7
elif n_pages > 60:
return 8
return 2
```
How can I resolve this? If you propose a fix, please make it concise.
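The boundary gap is easy to verify in isolation. The sketch below is an illustrative reconstruction of both versions of the function (the real `adaptive_window_size` lives in `01-seed-dataset-preparation.py`); it shows that exact multiples of 10 fall through every branch in the buggy version:

```python
def adaptive_window_size_buggy(n_pages: int) -> int:
    """Strict < on the upper bounds leaves 20, 30, 40, 50, 60 uncovered."""
    if n_pages > 10 and n_pages < 20:
        return 3
    elif n_pages > 20 and n_pages < 30:
        return 4
    elif n_pages > 30 and n_pages < 40:
        return 5
    elif n_pages > 40 and n_pages < 50:
        return 6
    elif n_pages > 50 and n_pages < 60:
        return 7
    elif n_pages > 60:
        return 8
    return 2


def adaptive_window_size_fixed(n_pages: int) -> int:
    """Inclusive <= upper bounds: every page count lands in exactly one bucket."""
    if n_pages > 10 and n_pages <= 20:
        return 3
    elif n_pages > 20 and n_pages <= 30:
        return 4
    elif n_pages > 30 and n_pages <= 40:
        return 5
    elif n_pages > 40 and n_pages <= 50:
        return 6
    elif n_pages > 50 and n_pages <= 60:
        return 7
    elif n_pages > 60:
        return 8
    return 2


# Exact multiples of 10 fall through every branch in the buggy version.
for n in (20, 30, 40, 50, 60):
    print(n, adaptive_window_size_buggy(n), adaptive_window_size_fixed(n))
```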
---
This is a comment left during a code review.
Path: docs/assets/recipes/vlm_long_doc/07-multi-page-windowed-qa-sdg.py
Line: 96-107
Comment:
**Non-reasoning branch sets mismatched `temperature`/`top_p` for `Qwen3.5-122B-A10B`**
In the `else` (non-reasoning) branch for `Qwen/Qwen3.5-122B-A10B`, `extra_body` is populated with `temperature=0.7, top_p=0.8`, but the outer variables `temperature` and `top_p` are left at `1.0` and `0.95` (copied from the reasoning branch). `ChatCompletionInferenceParams` receives the outer variables, so the actual inference runs at temperature=1.0/top_p=0.95 — the non-reasoning settings in `extra_body` have no effect.
```suggestion
else:
extra_body = {
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"min_p": 0.0,
"presence_penalty": 1.5,
"repetition_penalty": 1.0,
}
temperature = 0.7
top_p = 0.8
```
How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "added links"
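The shadowing bug can be reduced to a few lines. In this sketch, `InferenceParams` is a hypothetical stand-in for `ChatCompletionInferenceParams` and `build_params` is an illustrative helper (neither exists in the recipe under this name); it shows why setting values only in `extra_body` has no effect on what the params object receives:

```python
from dataclasses import dataclass


@dataclass
class InferenceParams:
    temperature: float
    top_p: float
    extra_body: dict


def build_params(reasoning: bool, apply_fix: bool) -> InferenceParams:
    # Defaults copied from the reasoning branch.
    temperature = 1.0
    top_p = 0.95
    extra_body = {}
    if not reasoning:
        extra_body = {"temperature": 0.7, "top_p": 0.8, "top_k": 20}
        if apply_fix:
            # The fix: keep the outer variables in sync, since they (not
            # extra_body) are what the params object actually receives.
            temperature = 0.7
            top_p = 0.8
    return InferenceParams(temperature=temperature, top_p=top_p, extra_body=extra_body)


buggy = build_params(reasoning=False, apply_fix=False)
fixed = build_params(reasoning=False, apply_fix=True)
print(buggy.temperature, fixed.temperature)  # the buggy version still runs at 1.0
```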
The flagged code in `adaptive_window_size` (`01-seed-dataset-preparation.py`):

```python
if n_pages > 10 and n_pages < 20:
    return 3
elif n_pages > 20 and n_pages < 30:
    return 4
elif n_pages > 30 and n_pages < 40:
    return 5
elif n_pages > 40 and n_pages < 50:
    return 6
elif n_pages > 50 and n_pages < 60:
    return 7
elif n_pages > 60:
    return 8
return 2
```
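If a refactor is acceptable, a table-driven formulation (a sketch, not the project's code) removes the hand-written boundary conditions that caused the gap in the first place:

```python
import bisect

# Inclusive upper edge of each bucket; anything above the last edge maps to 8.
_UPPER_BOUNDS = [10, 20, 30, 40, 50, 60]
_WINDOW_SIZES = [2, 3, 4, 5, 6, 7, 8]  # one longer than _UPPER_BOUNDS


def adaptive_window_size(n_pages: int) -> int:
    # bisect_left finds the first bucket whose upper edge is >= n_pages,
    # so exact edges (20, 30, ...) fall inside a bucket instead of through it.
    return _WINDOW_SIZES[bisect.bisect_left(_UPPER_BOUNDS, n_pages)]
```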
The flagged code in `07-multi-page-windowed-qa-sdg.py`:

```python
    top_p = 0.95
else:
    extra_body = {
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,
        "min_p": 0.0,
        "presence_penalty": 1.5,
        "repetition_penalty": 1.0,
    }
    temperature = 1.0
    top_p = 0.95
```
Code Review: PR #579 — docs: add VLM long-document understanding dev note and recipes

Summary

This PR is docs-only: it adds a substantial dev-note blog post.

Overall this is a high-quality, well-organized contribution: the dev note is genuinely useful (reads like a lessons-learned write-up rather than a press release), the recipes are self-contained with clear prerequisites and vLLM launch examples, and the SPDX/copyright headers.

Findings

Correctness / Accuracy
Consistency / Style
Security
Test Coverage

N/A — documentation and recipe scripts only. The recipes depend on private/gated vLLM deployments and frontier endpoints, so end-to-end CI is not practical. A lightweight import/syntax-smoke test remains an option.

Minor nits
Verdict

Approve with nits. The content is solid and the recipes follow project conventions. The only item I'd gate on is fixing the malformed cells in the MMLongBench-Doc results table (leading-zero typos).
📋 Summary
Adds a developer note and 9 runnable recipe scripts documenting how we generated ~11.4M synthetic visual QA pairs with Data Designer to improve long-document visual reasoning in Nemotron-3-Nano-Omni-30B-A3B (MMLongBench-Doc: 26% → 57.5%).
🔗 Related Issue
N/A
🔄 Changes
✨ Added
- Dev note (`docs/devnotes/posts/vlm-long-document-understanding.md`) covering the iterative pipeline development process, evaluation-driven design, and lessons learned
- Assets under `docs/devnotes/posts/assets/vlm-long-document-understanding/`
- Recipe scripts in `docs/assets/recipes/vlm_long_doc/` (01 through 09): seed prep, Nemotron-Parse OCR, text QA, page classification, visual QA, single-page QA, multi-page windowed QA, whole-document QA, and frontier judge filtering
- Recipe pages under `docs/recipes/vlm_long_doc/` with download links
- Recipe cards (`docs/recipes/cards.md`)
- Navigation entries in `mkdocs.yml` for both the dev note and recipe pages
- Author entry in `docs/devnotes/.authors.yml`

🧪 Testing
✅ Checklist
Made with Cursor