docs: sync 26.05 docs/docs with main #2179

Merged: sosahi merged 1 commit into NVIDIA:26.05 from kheiss-uwzoo:docs/sync-26.05-docs-with-main on Jun 1, 2026

Conversation

kheiss-uwzoo (Collaborator) commented May 30, 2026

Summary

  • Audit result: docs/docs/ on main and 26.05 differ in 13 extraction pages plus docs/mkdocs.yml nav/redirects. main is authoritative — it has the GA 26.05 release notes, updated support matrix (CUDA 13.0 / driver 580, Nemotron Parse extra), caption-scope FAQ, and open_clip troubleshooting.
  • This PR updates the 26.05 branch so docs/docs/ matches main exactly (git diff upstream/main -- docs/docs/ is empty on this branch).

Notable content restored on 26.05

  • releasenotes.md: Full GA 26.05 highlights (upgrade notes, pipeline, CLI, service, models, multimodal, RAG, VDB, evaluation, packaging, Helm, documentation) instead of RC1 install boilerplate
  • prerequisites-support-matrix.md: Current CUDA/driver requirements and Nemotron Parse dependency note
  • faq.md / troubleshoot.md: Caption scope FAQ and open_clip install guidance
  • custom-metadata.md: Restructured filtering doc from main
  • notebooks/index.md: Restored main nav path (with matching mkdocs.yml redirect)

Test plan

  • git diff upstream/main -- docs/docs/ is empty on this branch
  • MkDocs build on 26.05 succeeds with updated nav
  • Release notes page shows GA content, not RC1 install-only text
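
The first test-plan item can be scripted. The sketch below is one possible way to do it, assuming the script runs from the repository root and that a remote named `upstream` has been fetched; the helper names `diff_against` and `docs_in_sync` are illustrative and are not part of the repository.

```python
import subprocess


def docs_in_sync(diff_output: str) -> bool:
    """Return True when `git diff` printed nothing, i.e. the two trees match."""
    return diff_output.strip() == ""


def diff_against(ref: str, path: str, repo_dir: str = ".") -> str:
    """Run `git diff <ref> -- <path>` and return its stdout as text."""
    result = subprocess.run(
        ["git", "diff", ref, "--", path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git itself fails (for example, an unknown ref)
    )
    return result.stdout
```

Calling `docs_in_sync(diff_against("upstream/main", "docs/docs/"))` on the 26.05 branch should return `True` if the sync is complete.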

kheiss-uwzoo requested review from a team as code owners May 30, 2026 13:45
kheiss-uwzoo requested review from jioffe502 and removed request for a team May 30, 2026 13:45
kheiss-uwzoo added the doc (Improvements or additions to documentation) label May 30, 2026
greptile-apps (Bot, Contributor) commented May 30, 2026

Greptile Summary

This PR syncs docs/docs/ on the 26.05 branch with main, replacing RC1 install boilerplate with GA release notes and carrying over content updates including updated CUDA/driver requirements (12.2/535 → 13.0/580), OCR NIM clarifications, Nemotron Parse dependency docs, a new chart-captioning FAQ, and open_clip troubleshooting. Most of the 14 files are clean, but two files — audio-video.md and custom-metadata.md — have defects introduced during the sync that will break the published documentation.

  • audio-video.md: The removal of an !!! important admonition left a critical GPU-pinning note 4-space-indented outside a list (renders as a code block), and the code fence restructuring inserted a stray ) that produces a Python SyntaxError in the copyable example; two near-duplicate segment_audio paragraphs also appeared.
  • custom-metadata.md: The new "On this page" TOC references 6 section anchors that don't exist in the document body; the ## How metadata is stored heading was renamed from "Related Content" without updating its content; and variable definitions (hostname, lancedb_uri, table_name) were removed but are still referenced in the ingestor code example, causing a NameError.

Confidence Score: 3/5

Not safe to merge as-is: two files have doc defects that will ship broken code examples and broken navigation to 26.05 users.

The majority of files in this sync are clean and accurate, but audio-video.md ships a Python SyntaxError in a copyable code block and hides a critical GPU-pinning deployment note as a code block. custom-metadata.md ships an ingestor snippet that throws NameError on first run and an On this page TOC with six dead anchor links. These are visible, immediately reproducible defects in the published documentation that will affect users following the 26.05 setup guides.

docs/docs/extraction/audio-video.md and docs/docs/extraction/custom-metadata.md both need fixes before merge; all other files look correct.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/docs/extraction/audio-video.md | Removal of !!! important admonition leaves GPU-pinning note as a code block; stray ) produces SyntaxError in code sample; two near-duplicate segment_audio paragraphs with conflicting API names. |
| docs/docs/extraction/custom-metadata.md | New TOC references 6 non-existent section anchors; How metadata is stored heading renamed without updating content; variable definitions removed but still referenced in code example. |
| docs/docs/extraction/releasenotes.md | RC1 install boilerplate fully replaced with GA 26.05 release notes. |
| docs/docs/extraction/prerequisites-support-matrix.md | CUDA/driver requirements updated; OCR NIM corrected; Nemotron Parse extra documented; caption-scope note added. |
| docs/mkdocs.yml | Nav and redirect updated for notebooks/index.md; exclude_docs pattern fixed. |
Prompt To Fix All With AI
Fix the following 6 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 6
docs/docs/extraction/audio-video.md:66-68
**GPU pinning note silently rendered as a code block**

After the `!!! important` admonition was removed, the paragraph at line 68 (`Pin the Parakeet workload…`) is now indented by 4 spaces with no enclosing list item. In Markdown (including MkDocs Material), a 4-space-indented paragraph outside a list context is treated as an **indented code block**, so this critical deployment warning will render as `<pre><code>` text rather than readable prose — readers following the setup steps will miss the GPU pinning requirement entirely.
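
A rough way to catch this class of defect before publishing is to scan for indented paragraphs that have no enclosing container. The sketch below is a heuristic, not a Markdown parser: it assumes that list items and `!!!`/`???` admonition lines are the only containers, and the function name `find_stray_indented_blocks` is made up for illustration.

```python
import re

# Lines that open a container whose indented children are ordinary content.
CONTAINER_START = re.compile(r"^\s*(?:[-*+]\s+|\d+[.)]\s+|!!!|\?\?\?)")
FENCE = re.compile(r"^\s*(?:```|~~~)")


def find_stray_indented_blocks(markdown_text: str) -> list[int]:
    """Return 1-based line numbers that start an indented code block outside a container.

    Heuristic: a line indented by 4+ spaces that follows a blank line (or the
    start of the file) while no list item or admonition is open.
    """
    flagged = []
    in_fence = False
    in_container = False
    previous_blank = True  # the start of the file behaves like a blank line
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if FENCE.match(line):
            in_fence = not in_fence
            previous_blank = False
            continue
        if in_fence:
            continue  # fenced content is never an indented code block
        if not line.strip():
            previous_blank = True
            continue
        if CONTAINER_START.match(line):
            in_container = True
        elif not line.startswith(" "):
            in_container = False  # flush-left prose closes any open container
        elif line.startswith("    ") and previous_blank and not in_container:
            flagged.append(number)
        previous_blank = False
    return flagged
```

Run against audio-video.md, a check like this would be expected to flag the `Pin the Parakeet workload…` line once the admonition header is gone, and to stay silent when the paragraph sits under a list item or an admonition.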

### Issue 2 of 6
docs/docs/extraction/audio-video.md:88-91
**Stray `)` produces a `SyntaxError` in the code sample**

The closing `)` at line 90 is placed outside the code fence, making it part of the rendered code content. The code block therefore ends with two consecutive `)` characters — one closing `extract_audio(...)` and an extra one below `ingestor = (...)`. Anyone copying this snippet will get a `SyntaxError` immediately.

```suggestion
        )
    )
```
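
Syntax errors of this kind can be caught mechanically by compiling each Python fence without executing it. The following is a minimal sketch under the assumption that code samples use a fence opened with the `python` info string; `python_fence_errors` is a hypothetical helper name.

```python
import re
import textwrap

# Matches a python fence, optionally indented (for example, inside a list item).
PYTHON_FENCE = re.compile(
    r"^[ \t]*```python[^\n]*\n(.*?)^[ \t]*```[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def python_fence_errors(markdown_text: str) -> list[str]:
    """Compile every python fence and return one message per SyntaxError."""
    errors = []
    for index, match in enumerate(PYTHON_FENCE.finditer(markdown_text), start=1):
        source = textwrap.dedent(match.group(1))
        try:
            # compile() only parses; nothing in the snippet is executed.
            compile(source, f"<fence {index}>", "exec")
        except SyntaxError as exc:
            errors.append(f"fence {index}, line {exc.lineno}: {exc.msg}")
    return errors
```

Because the snippet is only parsed, undefined names such as a missing import do not trigger a report here; only structural problems like an unmatched `)` do.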

### Issue 3 of 6
docs/docs/extraction/audio-video.md:93-97
**Duplicate near-identical `segment_audio` paragraphs with conflicting API names**

Line 93 (unindented) says to use `extract_audio_params={"segment_audio": True}` with `.extract(...)`, while line 95 (indented continuation of step 3) says to use `asr_params=ASRParams(segment_audio=True)` with `.extract_audio(...)`. These look like two different API call styles that both appeared after the admonition block was removed. One of them should be removed, or it should be clarified which applies to library mode vs. the service ingestor.

### Issue 4 of 6
docs/docs/extraction/custom-metadata.md:40-42
**Undefined variables make the code example un-runnable**

The diff removes the `hostname`, `table_name`, and `lancedb_uri` variable definitions that previously preceded the `ingestor = (...)` block, but the `create_ingestor(...)` call still references all three. Copying this snippet results in a `NameError` on `hostname`. The variable definitions need to be restored.

```suggestion
hostname = "localhost"
table_name = "nemo_retriever_collection"
lancedb_uri = "./lancedb_data"

ingestor = (
    create_ingestor(run_mode="service", base_url=f"http://{hostname}:7670")
        .files(["data/woods_frost.pdf", "data/multimodal_test.pdf"])
```
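
A `NameError` like this one can also be spotted statically. The sketch below walks a snippet's top-level statements with the standard `ast` module and reports names that are read before anything binds them. It is a rough approximation: it does not model function parameters or comprehension scopes, so it may report false positives there, and `undefined_names` is an illustrative name only.

```python
import ast
import builtins


def undefined_names(source: str) -> list[str]:
    """List names read at module level before any assignment, import, or def binds them."""
    defined = set(dir(builtins))
    missing = []
    for statement in ast.parse(source).body:
        # Check reads first, so `x = x + 1` still reports `x` when it is unbound.
        for node in ast.walk(statement):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id not in defined and node.id not in missing:
                    missing.append(node.id)
        # Then record everything this statement binds for later statements.
        for node in ast.walk(statement):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                for alias in node.names:
                    defined.add(alias.asname or alias.name.split(".")[0])
            elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
                defined.add(node.name)
    return missing
```

Applied to the ingestor example without the three assignments, a check like this would be expected to report `hostname`, along with `table_name` and `lancedb_uri` if the rest of the snippet reads them.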

### Issue 5 of 6
docs/docs/extraction/custom-metadata.md:5-14
**"On this page" TOC contains 6 broken anchor links**

The table of contents added in this PR references `#filter-results-at-query-time`, `#writing-where-predicates`, `#server-side-vs-client-side-filters`, `#inspect-hit-metadata`, `#limitations`, and `#related-content`. None of these section headings exist in the current file body (128 lines). The body still contains the old 26.05 structure (`## Best Practices`, `## Use Custom Metadata to Filter Results During Retrieval`, etc.) rather than the restructured sections the TOC was written for. Clicking any of these six links in the published docs will silently scroll to the top of the page.
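
Dead same-page links can be detected by comparing link targets with the anchors the headings would produce. The sketch below approximates the heading-to-anchor conversion (lowercase, drop punctuation, hyphenate) and honors explicit `{ #id }` attributes; the real slug rules used by the docs build may differ, and headings that appear inside code fences are not excluded. `slugify` and `broken_anchors` are illustrative names.

```python
import re

HEADING = re.compile(r"^#{1,6}[ \t]+(.*?)[ \t]*$", re.MULTILINE)
EXPLICIT_ID = re.compile(r"\{\s*#([\w-]+)\s*\}\s*$")
SAME_PAGE_LINK = re.compile(r"\]\(#([^)\s]+)\)")


def slugify(title: str) -> str:
    """Approximate heading-to-anchor conversion: lowercase, drop punctuation, hyphenate."""
    cleaned = re.sub(r"[^\w\s-]", "", title).strip().lower()
    return re.sub(r"[\s-]+", "-", cleaned)


def broken_anchors(markdown_text: str) -> list[str]:
    """Return same-page link targets that match no heading anchor."""
    anchors = set()
    for match in HEADING.finditer(markdown_text):
        title = match.group(1)
        explicit = EXPLICIT_ID.search(title)
        # An explicit { #id } attribute overrides the generated slug.
        anchors.add(explicit.group(1) if explicit else slugify(title))
    targets = SAME_PAGE_LINK.findall(markdown_text)
    return [target for target in targets if target not in anchors]
```

For custom-metadata.md as described above, the expected result would be the six targets listed in this issue, since the body still carries the old headings.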

### Issue 6 of 6
docs/docs/extraction/custom-metadata.md:125-128
**Section heading "How metadata is stored" contains only cross-reference bullets**

The heading at line 125 was renamed from `## Related Content` to `## How metadata is stored`, but its body was not updated — it still contains just two reference links. Readers navigating to this section via the TOC will find no explanation of how metadata is persisted (e.g., serialized into the `metadata` column, how `content_metadata` fields are mapped). Either restore the "Related content" heading or replace the bullets with the intended storage explanation.

Reviews (1): Last reviewed commit: "docs: sync 26.05 docs/docs with main" | Re-trigger Greptile

Comment thread docs/docs/extraction/audio-video.md
Comment thread docs/docs/extraction/audio-video.md
Comment thread docs/docs/extraction/audio-video.md
Comment thread docs/docs/extraction/custom-metadata.md
Comment on lines +5 to +14
## On this page { #on-this-page }

- [Attach metadata at ingestion](#attach-metadata-at-ingestion)
- [How metadata is stored](#how-metadata-is-stored)
- [Filter results at query time](#filter-results-at-query-time)
- [Writing `where` predicates](#writing-where-predicates)
- [Server-side vs client-side filters](#server-side-vs-client-side-filters)
- [Inspect hit metadata](#inspect-hit-metadata)
- [Limitations](#limitations)
- [Related content](#related-content)

P1 "On this page" TOC contains 6 broken anchor links

The table of contents added in this PR references #filter-results-at-query-time, #writing-where-predicates, #server-side-vs-client-side-filters, #inspect-hit-metadata, #limitations, and #related-content. None of these section headings exist in the current file body (128 lines). The body still contains the old 26.05 structure (## Best Practices, ## Use Custom Metadata to Filter Results During Retrieval, etc.) rather than the restructured sections the TOC was written for. Clicking any of these six links in the published docs will silently scroll to the top of the page.

Comment thread docs/docs/extraction/custom-metadata.md
sosahi merged commit 3ce45d8 into NVIDIA:26.05 on Jun 1, 2026
14 of 17 checks passed
Comment thread docs/docs/extraction/audio-video.md
Comment thread docs/docs/extraction/audio-video.md
Comment thread docs/docs/extraction/multimodal-extraction.md
Comment thread docs/docs/extraction/prerequisites-support-matrix.md
Comment thread docs/docs/extraction/prerequisites-support-matrix.md
kheiss-uwzoo added a commit to kheiss-uwzoo/nv-ingest that referenced this pull request Jun 2, 2026
PR NVIDIA#2194 merged into 26.05 on 2026-06-02 but never reached main. This
backport keeps main aligned with the release branch and the published
docs.nvidia.com site after Randy's follow-up review.

Timeline:
- Friday: 26.05 docs built for docs.nvidia upload; branch differed from
  NRL GitHub Pages source and the uploaded docs were incorrect.
- Saturday: diff main vs 26.05 produced PR NVIDIA#2179 to sync extraction docs.
- Monday: PR NVIDIA#2179 merged and docs uploaded to the public site.
- Follow-up: Randy opened PR NVIDIA#2194 on 26.05 with additional fixes found
  after the NVIDIA#2179 sync. Those fixes landed on 26.05 only.
- This commit: cherry-pick of c5b257e onto main (five extraction doc
  files only).

Changes from NVIDIA#2194:
- Fix audio-video.md indented code block rendering
- Restore custom-metadata example service variables and storage prose
- Move caption scope admonition to multimodal-extraction.md
- Trim redundant Helm/OCR deploy detail per review feedback
- Restore FAQ Docker Compose note and support-matrix section anchors

Labels

doc Improvements or additions to documentation

3 participants