fix: prevent gh-pages repo bloat from doc preview artifacts #1309
kevalmorabia97 merged 3 commits into main
…99503)
- Pass `-d /tmp/doctrees` to sphinx-build so the .doctrees/ cache is never written into build/html and never uploaded to gh-pages
- Add `paths` filter to pull_request trigger so the docs workflow only runs on PRs touching docs/** or modelopt/**
- Set `single-commit: true` on JamesIves deploy action so main-site pushes squash into one commit instead of accumulating forever
- Deduplicate docs build: deploy-preview now downloads the artifact produced by build-docs instead of running a second sphinx-build
- Set retention-days: 1 on the artifact since it is only needed for the duration of the workflow run
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
📝 Walkthrough
CI docs workflow updated: added path-filtered PR detection and a changes job, guarded builds on closed PRs, reduced job timeouts, and changed artifact upload/retention and deploy conditions (including single-commit Pages deploy). Sphinx doctree output is redirected out of the HTML build directory.
Sequence Diagram(s)
sequenceDiagram
participant PR as Pull Request
participant Actions as GitHub Actions
participant Paths as Paths Filter
participant Builder as build-docs job
participant Artifact as Artifact Storage
participant Preview as deploy-preview job
participant Pages as GitHub Pages
PR->>Actions: push / open / update
Actions->>Paths: run `dorny/paths-filter`
Paths-->>Actions: outputs.docs
Actions->>Builder: run build-docs (skip if action == "closed")
Builder->>Artifact: upload `docs-html` (retention-days:1)
Artifact-->>Preview: provide artifact when needed
Actions->>Preview: run deploy-preview (depends on build-docs + changes)
Preview->>Pages: deploy to gh-pages (single-commit:true)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ 5 passed | ❌ 1 failed (1 warning)
🧹 Nitpick comments (1)
.github/workflows/pages.yml (1)
54-59: Consider guarding artifact download against build failure.
When `build-docs` fails (not just skipped), no artifact is uploaded, but the download step will still attempt to run, producing a confusing "artifact not found" error instead of clearly indicating the build failure.
Proposed improvement to check build outcome:
  - name: Download docs artifact
-   if: github.event.action != 'closed'
+   if: github.event.action != 'closed' && needs.build-docs.result == 'success'
    uses: actions/download-artifact@v4
    with:
      name: docs-html
      path: docs/build/html
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/pages.yml around lines 54 - 59, The "Download docs artifact" step tries to fetch "docs-html" even when the build-docs job failed, causing an "artifact not found" error; update that step's conditional to run only when the build-docs job succeeded (for example replace or extend its if with needs.build-docs.result == 'success'), so the Download docs artifact step runs only when the "build-docs" job produced and uploaded the "docs-html" artifact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 8022e0fb-c96b-43e1-a50a-7b0502ee2f13
📒 Files selected for processing (2)
.github/workflows/pages.yml
noxfile.py
shengliangxu left a comment
LGTM, and let's see if it solves the issue.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1309      +/-   ##
==========================================
- Coverage   75.60%   74.90%   -0.70%
==========================================
  Files         462      464       +2
  Lines       49960    50230     +270
==========================================
- Hits        37771    37624     -147
- Misses      12189    12606     +417
Flags with carried forward coverage won't be shown.
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Remove the workflow-level paths filter so build-docs always runs as a required CI check on every PR. Add a lightweight 'changes' job using dorny/paths-filter to detect whether docs-relevant files changed, and condition deploy-preview on that output so previews are still only deployed for PRs that touch docs/**, modelopt/**, or pages.yml.
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/pages.yml:
- Around line 50-54: The deploy-preview filter list currently omits noxfile.py
so changes to the Sphinx invocation (nox -s docs in noxfile.py) won't trigger
the preview; update the filters block to include 'noxfile.py' alongside
'docs/**', 'modelopt/**', and '.github/workflows/pages.yml' so edits to
noxfile.py (which modify docs/build/html output) will correctly gate
deploy-preview.
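The gating described in this comment can be illustrated with a small Python sketch. It only approximates the behaviour: `dorny/paths-filter` uses its own glob matcher, whereas this sketch uses the standard library's `fnmatch`, and the function name `docs_changed` is hypothetical. The pattern list is the one quoted above, plus the `noxfile.py` entry the comment proposes.

```python
from fnmatch import fnmatch

# Patterns quoted in the review comment; "noxfile.py" is the proposed addition.
# Assumption: fnmatch is a stand-in for the action's real glob matcher.
DOCS_PATTERNS = [
    "docs/**",
    "modelopt/**",
    ".github/workflows/pages.yml",
    "noxfile.py",
]


def docs_changed(changed_files, patterns=DOCS_PATTERNS):
    """Return True if any changed file matches a docs-relevant pattern."""
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in patterns
    )


if __name__ == "__main__":
    # A PR that only edits the Sphinx invocation now counts as docs-relevant.
    print(docs_changed(["noxfile.py"]))
    # A PR that only touches unrelated files does not.
    print(docs_changed(["tests/unit/test_x.py"]))
```

Without the `noxfile.py` entry, a PR that only changes how Sphinx is invoked would not match any pattern, which is the gap the comment points out.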
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 2cecc1c4-05c4-4f76-99c2-8b551b4efc0a
📒 Files selected for processing (1)
.github/workflows/pages.yml
What does this PR do?
Type of change: Bug fix
Fixes gh-pages branch bloat that grew from ~26 MB to ~441 MB in four weeks (nvbug 6099503). Three compounding causes were identified and addressed:
1. `.doctrees/` cache published to gh-pages: `sphinx-build` was writing its build cache inside `build/html/`, which was then uploaded verbatim. Accounts for ~3.3 GB uncompressed across history.
2. `JamesIves/github-pages-deploy-action` appending a commit on every push: main-site files accumulated forever with `single-commit: false` (default).
3. `synchronize` event for all PRs: `rossjrw/pr-preview-action` re-deployed the full site for every push to any PR regardless of whether docs changed (e.g. PR "add: DFlash block diffusion speculative decoding" #1128 triggered 64 preview deploys × ~11 MB each).

Changes:
- Pass `-d /tmp/doctrees` to `sphinx-build` so `.doctrees/` is never written into `build/html/`
- Add `paths: [docs/**, modelopt/**]` filter to the `pull_request` trigger so the docs workflow only runs on PRs that touch docs or source code
- Set `single-commit: true` on the deploy action so main-site pushes squash into one commit
- Deduplicate the docs build: `deploy-preview` now downloads the artifact from `build-docs` instead of running a second `sphinx-build`
- Set `retention-days: 1` on the artifact since it is only needed for the duration of the workflow run

The one-time cleanup (force-push squashed orphan to gh-pages) was already applied separately; the repo is now ~59 MB for a full clone vs ~441 MB before.
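As a rough illustration of the first change, the sketch below builds a `sphinx-build` argument list that keeps the doctree cache outside the published HTML directory. It is not the code in `noxfile.py`: the helper name `sphinx_build_command` and the `docs/source` source directory are assumptions made for the example. Only the `-d /tmp/doctrees` flag and the `docs/build/html` output path come from this PR.

```python
from pathlib import Path


def sphinx_build_command(source_dir, html_dir, doctree_dir="/tmp/doctrees"):
    """Return a sphinx-build argv whose doctree cache lives outside html_dir.

    Hypothetical helper for illustration; paths are compared as given,
    without resolving symlinks or relative segments.
    """
    html_dir = Path(html_dir)
    doctree_dir = Path(doctree_dir)
    # Refuse a cache location inside the published tree: that is the failure
    # mode described above (.doctrees/ uploaded together with the HTML).
    if doctree_dir == html_dir or html_dir in doctree_dir.parents:
        raise ValueError(f"{doctree_dir} is inside {html_dir}")
    return [
        "sphinx-build",
        "-b", "html",            # use the HTML builder
        "-d", str(doctree_dir),  # doctree (pickle cache) directory
        str(source_dir),
        str(html_dir),
    ]


if __name__ == "__main__":
    # "docs/source" is an assumed source directory, not taken from the PR.
    print(" ".join(sphinx_build_command("docs/source", "docs/build/html")))
```

When `-d` is omitted, `sphinx-build` places the cache in `.doctrees` under the output directory, which is why uploading `build/html/` verbatim also uploaded the cache.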
Usage
N/A — CI/workflow change only.
Testing
`git rev-list --objects --disk-usage origin/gh-pages` now reports ~28 MB; full clone is ~59 MB.
Before your PR is "Ready for review"
CONTRIBUTING.md: N/A
Additional Information
nvbug 6099503
Summary by CodeRabbit