docs: sync 26.05 docs/docs with main #2179
Greptile Summary

This PR syncs

| Filename | Overview |
|---|---|
| docs/docs/extraction/audio-video.md | Removal of the `!!! important` admonition leaves the GPU-pinning note rendered as a code block; a stray `)` produces a `SyntaxError` in a code sample; two near-duplicate `segment_audio` paragraphs use conflicting API names. |
| docs/docs/extraction/custom-metadata.md | New TOC references 6 non-existent section anchors; the "How metadata is stored" heading was renamed without updating its content; variable definitions were removed but are still referenced in a code example. |
| docs/docs/extraction/releasenotes.md | RC1 install boilerplate fully replaced with GA 26.05 release notes. |
| docs/docs/extraction/prerequisites-support-matrix.md | CUDA/driver requirements updated; OCR NIM corrected; Nemotron Parse extra documented; caption-scope note added. |
| docs/mkdocs.yml | Nav and redirect updated for `notebooks/index.md`; `exclude_docs` pattern fixed. |
Prompt To Fix All With AI
Fix the following 6 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 6
docs/docs/extraction/audio-video.md:66-68
**GPU pinning note silently rendered as a code block**
After the `!!! important` admonition was removed, the paragraph at line 68 (`Pin the Parakeet workload…`) is now indented by 4 spaces with no enclosing list item. In Markdown (including MkDocs Material), a 4-space-indented paragraph outside a list context is treated as an **indented code block**, so this critical deployment warning will render as `<pre><code>` text rather than readable prose — readers following the setup steps will miss the GPU pinning requirement entirely.
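A rough pre-publish check for this regression can be sketched in Python. The helper `find_orphan_indented_lines` and its rules are assumptions for illustration, not part of the repository's tooling: it treats a top-level list item or a `!!!` / `???` block as the only containers whose 4-space-indented body is safe, and reports the first line of every other indented block that follows a blank line. It is a heuristic, not a full CommonMark parser (tabs and mixed fence styles are ignored).

```python
import re

# Hypothetical lint heuristics, not repository tooling.
# A top-level list item ("- ", "* ", "+ ", "1. ", "1) ") opens a container.
LIST_MARKER = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")
# "!!! note" admonitions and "???" / "???+" collapsible blocks also open one.
ADMONITION = re.compile(r"^\s*(?:!!!|\?\?\?\+?)\s*\w+")
FENCE = re.compile(r"^\s*(?:`{3}|~{3})")


def find_orphan_indented_lines(text: str) -> list[int]:
    """Return the 1-based line number that starts each 4-space-indented block
    with no enclosing list item or admonition, i.e. each block Markdown would
    render as an indented code block."""
    orphans = []
    in_fence = False
    container_open = False  # inside a list item or admonition body
    prev_blank = True  # the start of the file behaves like a blank line
    for number, line in enumerate(text.splitlines(), start=1):
        if FENCE.match(line):
            in_fence = not in_fence  # fenced code is never reinterpreted
            prev_blank = False
            continue
        if in_fence:
            continue
        if not line.strip():
            prev_blank = True
            continue
        if not line.startswith("    "):
            # Any top-level line either opens a new container or closes the old one.
            container_open = bool(LIST_MARKER.match(line) or ADMONITION.match(line))
        elif prev_blank and not container_open:
            # Indented text after a blank line, outside any container: code block.
            orphans.append(number)
        prev_blank = False
    return orphans


page = (
    "3. Start the service.\n"
    "\n"
    "Some unindented note.\n"
    "\n"
    "    <GPU-pinning note>\n"
)
print(find_orphan_indented_lines(page))  # [5]
```

An indented line directly after a non-blank paragraph line is deliberately not flagged, because an indented code block cannot interrupt a paragraph; such a line is a lazy continuation of the paragraph instead.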
### Issue 2 of 6
docs/docs/extraction/audio-video.md:88-91
**Stray `)` produces a `SyntaxError` in the code sample**
The closing `)` at line 90 is placed outside the code fence, making it part of the rendered code content. The code block therefore ends with two consecutive `)` characters — one closing `extract_audio(...)` and an extra one below `ingestor = (...)`. Anyone copying this snippet will get a `SyntaxError` immediately.
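Stray brackets like this are easy to catch mechanically. The helper below is a hypothetical sketch, not anything in the repository: it finds each fenced block tagged `python` in a Markdown page, strips the list-item indentation, and calls `ast.parse`; any `SyntaxError`, such as an unmatched `)`, is reported with the line on which the block opens. It expects the closing fence at the same indentation as the opener, and it checks syntax only, so a snippet that references undefined variables still passes.

```python
import ast
import re

# Hypothetical docs check. `{3} spells the fence so this block contains no literal fence.
PY_FENCE = re.compile(r"^([ \t]*)`{3}python[^\n]*\n(.*?)^\1`{3}", re.M | re.S)


def python_blocks_with_syntax_errors(markdown_text: str) -> list[tuple[int, str]]:
    """Return (opening_line, message) for each python-tagged fenced block that fails to parse."""
    problems = []
    for match in PY_FENCE.finditer(markdown_text):
        indent, body = match.group(1), match.group(2)
        # Remove the list-item indentation so the snippet parses as top-level code.
        code = "\n".join(
            line[len(indent):] if line.startswith(indent) else line
            for line in body.splitlines()
        )
        opening_line = markdown_text.count("\n", 0, match.start()) + 1
        try:
            ast.parse(code)
        except SyntaxError as err:
            problems.append((opening_line, err.msg))
    return problems
```

Running it over each page under `docs/docs/` in CI would fail the build on snippets that readers cannot copy and run.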
````suggestion
)
)
```
````
### Issue 3 of 6
docs/docs/extraction/audio-video.md:93-97
**Duplicate near-identical `segment_audio` paragraphs with conflicting API names**
Line 93 (unindented) says to use `extract_audio_params={"segment_audio": True}` with `.extract(...)`, while line 95 (indented continuation of step 3) says to use `asr_params=ASRParams(segment_audio=True)` with `.extract_audio(...)`. These look like two different API call styles that both appeared after the admonition block was removed. One of them should be removed, or it should be clarified which applies to library mode vs. the service ingestor.
### Issue 4 of 6
docs/docs/extraction/custom-metadata.md:40-42
**Undefined variables make the code example un-runnable**
The diff removes the `hostname`, `table_name`, and `lancedb_uri` variable definitions that previously preceded the `ingestor = (...)` block, but the `create_ingestor(...)` call still references all three. Copying this snippet results in a `NameError` on `hostname`. The variable definitions need to be restored.
```suggestion
hostname = "localhost"
table_name = "nemo_retriever_collection"
lancedb_uri = "./lancedb_data"
ingestor = (
create_ingestor(run_mode="service", base_url=f"http://{hostname}:7670")
.files(["data/woods_frost.pdf", "data/multimodal_test.pdf"])
```
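Because the snippet is syntactically valid, this class of bug only surfaces as a `NameError` when someone runs it. A rough static check can flag it earlier. The helper below is a hypothetical sketch, not repository tooling: it parses a snippet with `ast`, collects every name that is read, and subtracts names that are assigned, imported, defined, or provided by `builtins`. It ignores scoping (a name bound anywhere counts as bound everywhere), so treat its output as a hint rather than proof.

```python
import ast
import builtins


def undefined_names(code: str) -> set[str]:
    """Return names that a snippet reads but never binds.

    Flat-scope heuristic: a name bound anywhere in the snippet counts as
    bound everywhere, so use-before-assignment bugs can slip through.
    """
    tree = ast.parse(code)
    bound = set(dir(builtins))  # print, len, range, ... are always available
    loaded = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            # Load means the name is read; Store/Del mean the snippet binds it.
            (loaded if isinstance(node.ctx, ast.Load) else bound).add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as x" binds "x".
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
    return loaded - bound


snippet = '''
ingestor = (
    create_ingestor(run_mode="service", base_url=f"http://{hostname}:7670")
    .files(["data/woods_frost.pdf", "data/multimodal_test.pdf"])
)
'''
print(sorted(undefined_names(snippet)))  # ['create_ingestor', 'hostname']
```

`create_ingestor` is reported only because this excerpt omits the page's import; `hostname` is the name the removed definitions used to supply.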
### Issue 5 of 6
docs/docs/extraction/custom-metadata.md:5-14
**"On this page" TOC contains 6 broken anchor links**
The table of contents added in this PR references `#filter-results-at-query-time`, `#writing-where-predicates`, `#server-side-vs-client-side-filters`, `#inspect-hit-metadata`, `#limitations`, and `#related-content`. None of these section headings exist in the current file body (128 lines). The body still contains the old 26.05 structure (`## Best Practices`, `## Use Custom Metadata to Filter Results During Retrieval`, etc.) rather than the restructured sections the TOC was written for. Clicking any of these six links in the published docs will silently scroll to the top of the page.
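Mismatches like these can be caught before publishing by comparing each in-page link against the anchors the page's headings produce. The sketch below is hypothetical tooling: it approximates Python-Markdown's default `toc` slugify (ASCII-fold, drop punctuation, lowercase, collapse runs of spaces and hyphens to `-`) and honors explicit `{ #id }` attributes. A project that configures a custom `slugify` in `mkdocs.yml` would need to mirror it here, and `#` lines inside code fences are treated as headings, so the check can miss some broken links.

```python
import re
import unicodedata

HEADING = re.compile(r"^#{1,6}[ \t]+(.*?)[ \t]*$", re.M)
EXPLICIT_ID = re.compile(r"\{\s*#([\w-]+)\s*\}$")  # attr_list form: { #custom-id }
LINK_ANCHOR = re.compile(r"\]\(#([\w-]+)\)")  # in-page links: [text](#anchor)


def slugify(title: str) -> str:
    """Approximate Python-Markdown's default toc slugify."""
    value = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(r"[-\s]+", "-", value)


def broken_anchor_links(markdown_text: str) -> list[str]:
    """Return in-page #anchors that no heading on the page defines."""
    anchors = set()
    for title in HEADING.findall(markdown_text):
        explicit = EXPLICIT_ID.search(title)
        anchors.add(explicit.group(1) if explicit else slugify(title))
    return [a for a in LINK_ANCHOR.findall(markdown_text) if a not in anchors]


page = (
    "## On this page { #on-this-page }\n\n"
    "- [Attach metadata at ingestion](#attach-metadata-at-ingestion)\n"
    "- [Limitations](#limitations)\n\n"
    "## Attach metadata at ingestion\n"
)
print(broken_anchor_links(page))  # ['limitations']
```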
### Issue 6 of 6
docs/docs/extraction/custom-metadata.md:125-128
**Section heading "How metadata is stored" contains only cross-reference bullets**
The heading at line 125 was renamed from `## Related Content` to `## How metadata is stored`, but its body was not updated — it still contains just two reference links. Readers navigating to this section via the TOC will find no explanation of how metadata is persisted (e.g., serialized into the `metadata` column, how `content_metadata` fields are mapped). Either restore the "Related content" heading or replace the bullets with the intended storage explanation.
Reviews (1): Last reviewed commit: "docs: sync 26.05 docs/docs with main"
## On this page { #on-this-page }

- [Attach metadata at ingestion](#attach-metadata-at-ingestion)
- [How metadata is stored](#how-metadata-is-stored)
- [Filter results at query time](#filter-results-at-query-time)
- [Writing `where` predicates](#writing-where-predicates)
- [Server-side vs client-side filters](#server-side-vs-client-side-filters)
- [Inspect hit metadata](#inspect-hit-metadata)
- [Limitations](#limitations)
- [Related content](#related-content)
Prompt To Fix With AI
This is a comment left during a code review.
Path: docs/docs/extraction/custom-metadata.md
Line: 5-14
Comment:
**"On this page" TOC contains 6 broken anchor links**
The table of contents added in this PR references `#filter-results-at-query-time`, `#writing-where-predicates`, `#server-side-vs-client-side-filters`, `#inspect-hit-metadata`, `#limitations`, and `#related-content`. None of these section headings exist in the current file body (128 lines). The body still contains the old 26.05 structure (`## Best Practices`, `## Use Custom Metadata to Filter Results During Retrieval`, etc.) rather than the restructured sections the TOC was written for. Clicking any of these six links in the published docs will silently scroll to the top of the page.
How can I resolve this? If you propose a fix, please make it concise.

PR NVIDIA#2194 merged into 26.05 on 2026-06-02 but never reached main. This backport keeps main aligned with the release branch and the published docs.nvidia.com site after Randy's follow-up review.

Timeline:

- Friday: 26.05 docs built for docs.nvidia upload; branch differed from NRL GitHub Pages source and the uploaded docs were incorrect.
- Saturday: diff main vs 26.05 produced PR NVIDIA#2179 to sync extraction docs.
- Monday: PR NVIDIA#2179 merged and docs uploaded to the public site.
- Follow-up: Randy opened PR NVIDIA#2194 on 26.05 with additional fixes found after the NVIDIA#2179 sync. Those fixes landed on 26.05 only.
- This commit: cherry-pick of c5b257e onto main (five extraction doc files only).

Changes from NVIDIA#2194:

- Fix `audio-video.md` indented code block rendering
- Restore custom-metadata example service variables and storage prose
- Move caption scope admonition to `multimodal-extraction.md`
- Trim redundant Helm/OCR deploy detail per review feedback
- Restore FAQ Docker Compose note and support-matrix section anchors
Summary
`docs/docs/` on `main` and `26.05` differ in 13 extraction pages plus `docs/mkdocs.yml` nav/redirects. `main` is authoritative: it has the GA 26.05 release notes, updated support matrix (CUDA 13.0 / driver 580, Nemotron Parse extra), caption-scope FAQ, and `open_clip` troubleshooting. `26.05` branch so `docs/docs/` matches `main` exactly (`git diff upstream/main -- docs/docs/` is empty on this branch).

Notable content restored on 26.05

- `releasenotes.md`: Full GA 26.05 highlights (upgrade notes, pipeline, CLI, service, models, multimodal, RAG, VDB, evaluation, packaging, Helm, documentation) instead of RC1 install boilerplate
- `prerequisites-support-matrix.md`: Current CUDA/driver requirements and Nemotron Parse dependency note
- `faq.md` / `troubleshoot.md`: Caption scope FAQ and `open_clip` install guidance
- `custom-metadata.md`: Restructured filtering doc from `main`
- `notebooks/index.md`: Restored `main` nav path (with matching `mkdocs.yml` redirect)

Test plan

- `git diff upstream/main -- docs/docs/` is empty on this branch
- `26.05` succeeds with updated nav