Overview
Build on the recently shipped admin_jammy/software/url_check/ audit tool (PyAutoLabs/PyAutoLens#508) by wiring a live HTTP URL check into PyAutoBuild as a reusable CI job. After the URL cleanup landed, the audit still reports 104 broken URLs across the 12 PyAuto repos. Most are external paywalled or dead links, plus ~10 internal readthedocs page renames that need editorial fixes. We want these grandfathered into an allowlist so that CI flags only newly broken URLs, not the existing ones.
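Grandfathering is a filtering step over the audit output: every URL already reported broken for a repo is copied into that repo's allowlist. The sketch below shows that step. It assumes report.json is a JSON list of records with url, repo, broken and category fields. Those field names, the category labels and the build_allowlist helper are all hypothetical, so adapt them to whatever the audit tool actually writes.

```python
import json
from collections import defaultdict

# Hypothetical report.json shape (field names are assumptions, not the audit
# tool's actual schema): a list of {"url", "repo", "broken", "category"} records.
SAMPLE_REPORT = json.dumps([
    {"url": "https://example.org/dead", "repo": "PyAutoLens", "broken": True,
     "category": "external dead links"},
    {"url": "https://pyautolens.readthedocs.io/en/latest/old_page.html",
     "repo": "PyAutoLens", "broken": True, "category": "internal readthedocs renames"},
    {"url": "https://example.org/ok", "repo": "PyAutoLens", "broken": False,
     "category": "external dead links"},
    {"url": "https://example.org/other", "repo": "PyAutoFit", "broken": True,
     "category": "external dead links"},
])


def build_allowlist(report_json: str, repo: str) -> str:
    """Return .url_check_allowlist.txt contents for one repo, grouped by category."""
    groups = defaultdict(set)
    for record in json.loads(report_json):
        # Keep only URLs flagged broken in this specific repo.
        if record["repo"] == repo and record["broken"]:
            groups[record["category"]].add(record["url"])

    lines = [f"# Grandfathered broken URLs for {repo} (auto-generated from report.json)"]
    for category in sorted(groups):
        # Each category becomes a "# section" comment header followed by its URLs.
        lines.append("")
        lines.append(f"# {category}")
        lines.extend(sorted(groups[category]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_allowlist(SAMPLE_REPORT, "PyAutoLens"), end="")
```

Sorting both categories and URLs keeps the generated file stable between runs, so regenerating it produces a clean diff.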
Plan
- Add autobuild/url_check_live.py to PyAutoBuild: a port of the existing tool with --allowlist, --strict, and --format markdown-issue modes, plus an autobuild/url_check_live.sh wrapper.
- Extend autobuild/url_check.sh (the existing regex guard) with ~15 additional forbidden patterns the audit surfaced: hhttps://, joshspeagle/[Nn]autilus, rhayes777/PyAutoBuild, sphinx-doc.org/en/main, the bokeh and numfocus CoC paths, tree/release/, etc. This check is PR-blocking, fast, and offline.
- Each of the 11 consumer repos gets a .url_check_allowlist.txt (auto-generated from the current report.json) and an extended .github/workflows/url_check.yml with a weekly cron job that runs the live audit and opens or updates a "[url-check] N broken URLs detected" tracking issue.
- Surface the status in the pyauto-status skill: a new section showing each repo's broken-URL count, read from its open tracking issue.
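The three new modes reduce to a small amount of pure logic once the live HTTP checks have produced a list of broken URLs. The following is a minimal sketch under that assumption. The function names (scan_root, parse_allowlist, unlisted, markdown_issue_body, strict_exit_code) and the issue-body wording are illustrative, not taken from the existing tool.

```python
from __future__ import annotations

import os
from pathlib import Path


def scan_root() -> Path:
    """Scan the checked-out repo, not wherever this script's symlink resolves to."""
    # In GitHub Actions, GITHUB_WORKSPACE is the checkout directory; locally fall back to cwd.
    return Path(os.environ.get("GITHUB_WORKSPACE") or Path.cwd())


def parse_allowlist(text: str) -> set[str]:
    """One URL per line; blank lines and whole-line '#' comments are ignored."""
    urls = set()
    for line in text.splitlines():
        line = line.strip()
        # Only whole-line comments: a URL may legitimately contain a '#' fragment.
        if line and not line.startswith("#"):
            urls.add(line)
    return urls


def unlisted(broken: list[str], allowlist: set[str]) -> list[str]:
    """Broken URLs that are NOT grandfathered, i.e. new regressions."""
    return sorted(set(broken) - allowlist)


def markdown_issue_body(new_broken: list[str]) -> str:
    """Markdown body intended to be piped into `gh issue create --body-file -`."""
    lines = [f"{len(new_broken)} broken URL(s) not in `.url_check_allowlist.txt`:", ""]
    lines += [f"- {url}" for url in new_broken]
    return "\n".join(lines) + "\n"


def strict_exit_code(new_broken: list[str]) -> int:
    """--strict semantics: fail only on broken URLs missing from the allowlist."""
    return 1 if new_broken else 0


if __name__ == "__main__":
    allow = parse_allowlist("# external dead links\nhttps://example.org/dead\n")
    new = unlisted(["https://example.org/dead", "https://example.org/new"], allow)
    print(markdown_issue_body(new), end="")
    print("exit code:", strict_exit_code(new))
```

Keeping the allowlist as a plain set makes the strict check a single set difference, so the existing 104 URLs never trigger a failure while any new one does.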
Detailed implementation plan
Affected Repositories
- PyAutoBuild (primary — tool + regex guard)
- PyAutoConf, PyAutoFit, PyAutoArray, PyAutoGalaxy, PyAutoLens
- HowToFit, HowToGalaxy, HowToLens
- autofit_workspace, autogalaxy_workspace, autolens_workspace
- PyAutoPrompt (pyauto-status skill update)
Work Classification
Tooling / infrastructure. Doc/CI-only — no library code changes.
Branch Survey
| Repository | Current Branch | Dirty? |
|---|---|---|
| PyAutoBuild | main | clean |
| PyAutoConf | main | clean |
| PyAutoFit | main | clean |
| PyAutoArray | main | clean |
| PyAutoGalaxy | main | clean |
| PyAutoLens | main | clean |
| HowToFit | main | clean |
| HowToGalaxy | main | clean |
| HowToLens | main | 3 (regenerated datasets, ignored) |
| autofit_workspace | main | clean |
| autogalaxy_workspace | main | 28 (regenerated datasets, ignored) |
| autolens_workspace | main | 17 (regenerated datasets, ignored) |
Suggested branch: feature/url-check-ci
Worktree root: ~/Code/PyAutoLabs-wt/url-check-ci/ (created by /start_library)
Conflict status: No conflicts — no other tasks claim any of these repos.
Implementation Steps
- PyAutoBuild (autobuild/url_check_live.py):
  - Direct port of admin_jammy/software/url_check/url_check.py (now superseded; the admin_jammy copy can be deleted in a follow-up cleanup or left as a backup).
  - New flags:
    - --allowlist <file>: load URLs to ignore from this file (one per line, # comments allowed)
    - --strict: exit 1 if any URL is broken AND not in the allowlist
    - --format markdown-issue: emit a Markdown body suitable for gh issue create --body-file -
  - Fix the symlink/canonical bug: derive the scan root from $GITHUB_WORKSPACE (or cwd) rather than Path(__file__).resolve().parents[N]. resolve() follows symlinks, so parents[N] is computed relative to where the script physically lives (e.g. the PyAutoBuild checkout), not the repo being scanned. See [memory feedback_path_file_resolve_symlink].
- PyAutoBuild (autobuild/url_check_live.sh):
  - Wrapper: bash url_check_live.sh <repo-dir> finds .url_check_allowlist.txt at the repo root, runs the live audit in strict mode, and emits the issue body to stdout.
- PyAutoBuild: extend autobuild/url_check.sh with the ~15 additional forbidden patterns surfaced by the url-check task (hhttps://, joshspeagle/[Nn]autilus, rhayes777/PyAutoBuild, tree/release/, sphinx-doc.org/en/main, bokeh/bokeh/blob/main/CODE_OF_CONDUCT.md, numfocus/numfocus/blob/main/manual/numfocus-coc.md, Fiterence_anti-harassment, etc.).
- Per-repo (11 repos):
  - Generate .url_check_allowlist.txt at the repo root by filtering report.json to the URLs flagged broken in that specific repo. Group entries under # section comment headers (external dead links, internal readthedocs renames, etc.).
  - Update .github/workflows/url_check.yml:
    - Keep the existing url_check_patterns job (every PR + push); it now exercises the expanded regex set.
    - Add a url_check_live job (cron 0 4 * * 1, i.e. Mondays at 04:00 UTC, plus workflow_dispatch). Steps: check out the repo and PyAutoBuild, run url_check_live.sh <repo-dir> > issue_body.md and capture its exit code (the default Actions bash shell runs with -e, so an uncaptured non-zero exit would end the step before the issue is filed). If the body is non-empty and the exit code is non-zero, run gh issue create, or gh issue comment if an open [url-check] issue already exists.
  - Add the workflow to PyAutoConf and PyAutoArray, which currently lack it.
- PyAutoPrompt (pyauto-status skill):
  - Add a new "URL Check Status" section after "Idle Repos", near the bottom of the dashboard.
  - For each PyAuto repo, query gh issue list --repo <repo> --search "[url-check]" --state open --json title,number,updatedAt and show the repo, the broken-URL count parsed from the issue title, and the last-update time.
  - If there is no open tracking issue (e.g. it has been closed), show "✓ clean" or omit the repo.
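The workflow and the pyauto-status section share one contract: the tracking-issue title. The sketch below covers both sides, assuming the "[url-check] N broken URLs detected" title format from the plan and the JSON array that gh issue list --json title,number,updatedAt prints. The helper names and the status-line wording are hypothetical.

```python
from __future__ import annotations

import json
import re

TITLE_PREFIX = "[url-check]"
# Matches titles of the form "[url-check] N broken URLs detected".
TITLE_RE = re.compile(r"^\[url-check\] (\d+) broken URLs? detected$")


def issue_title(count: int) -> str:
    """Title the workflow would pass to gh issue create."""
    return f"{TITLE_PREFIX} {count} broken URLs detected"


def broken_count(title: str) -> int | None:
    """Parse N back out of a tracking-issue title; None if it is not one."""
    match = TITLE_RE.match(title)
    return int(match.group(1)) if match else None


def choose_action(open_issues_json: str) -> tuple[str, int | None]:
    """Workflow side: comment on an existing open tracking issue, else create one."""
    for issue in json.loads(open_issues_json):
        # Re-check the prefix locally: GitHub search may not match brackets literally.
        if issue["title"].startswith(TITLE_PREFIX):
            return "comment", issue["number"]
    return "create", None


def status_row(repo: str, open_issues_json: str) -> str:
    """pyauto-status side: one line of the "URL Check Status" section."""
    for issue in json.loads(open_issues_json):
        count = broken_count(issue["title"])
        if count is not None:
            return f"{repo}: {count} broken (updated {issue['updatedAt']})"
    # No open tracking issue means the repo has no unallowlisted broken URLs.
    return f"{repo}: ✓ clean"
```

Parsing the count from the title keeps the status skill to one gh call per repo, at the cost of making the title format a stable interface that both sides must agree on.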
Key Files
- PyAutoBuild/autobuild/url_check_live.py (new)
- PyAutoBuild/autobuild/url_check_live.sh (new)
- PyAutoBuild/autobuild/url_check.sh (extended regex patterns)
- Each consumer repo: .url_check_allowlist.txt, .github/workflows/url_check.yml
- PyAutoPrompt/skills/pyauto-status/pyauto-status.md (new section)
Testing
- Run url_check_live.py --strict against PyAutoLens with its allowlist in the worktree; expect exit 0, since all current broken URLs are allowlisted.
- Manually remove one URL from the allowlist and re-run; expect exit 1 and a Markdown issue body.
- Once shipped, trigger the cron job in PyAutoLens via workflow_dispatch to verify the issue-create path end-to-end.
Original Prompt
(Direct user request from the previous url-check session — no separate prompt file.)
Can we put a URL check in the CI of PyAutoBuild, I guess we need a list of acceptable failures which would be your 104 broken? … also make bad broken urls appear in the pyauto-status command