feat: add live URL audit to CI (weekly cron + per-repo allowlist + tracking issue) #87

@Jammy2211

Overview

Build on the recently shipped admin_jammy/software/url_check/ audit tool (PyAutoLabs/PyAutoLens#508) by wiring a live HTTP URL check into PyAutoBuild as a reusable CI job. After the URL cleanup landed, the audit shows 104 broken URLs remaining across the 12 PyAuto repos: mostly external paywalled / dead links, plus ~10 internal readthedocs page renames that need editorial fixes. We want those grandfathered into an allowlist so that CI reports only newly broken URLs, not the existing ones.

Plan

  • Add autobuild/url_check_live.py to PyAutoBuild — a port of the existing tool with --allowlist, --strict, and --format markdown-issue modes, plus an autobuild/url_check_live.sh wrapper.
  • Extend autobuild/url_check.sh (existing regex guard) with ~15 additional forbidden patterns the audit surfaced — hhttps://, joshspeagle/[Nn]autilus, rhayes777/PyAutoBuild, sphinx-doc.org/en/main, bokeh + numfocus CoC paths, tree/release/, etc. PR-blocking, fast, offline.
  • Each of 11 consumer repos gets .url_check_allowlist.txt (auto-generated from the current report.json) and an extended .github/workflows/url_check.yml with a weekly cron job that runs the live audit and opens/updates a [url-check] N broken URLs detected tracking issue.
  • Surface the status in the pyauto-status skill — a new section showing each repo's broken-URL count from its open tracking issue.
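
The regex guard in the second bullet lives in a shell script, but its matching logic can be sketched in Python for illustration. This is a minimal sketch, not the contents of url_check.sh: it uses only the patterns quoted in this issue (the full set of ~15 is not listed here), the function name find_forbidden is hypothetical, and escaping the dots in sphinx-doc.org is a choice made for this sketch.

```python
import re

# Patterns quoted in this issue; the remaining ones are not listed here.
FORBIDDEN_PATTERNS = [
    r"hhttps://",
    r"joshspeagle/[Nn]autilus",
    r"rhayes777/PyAutoBuild",
    r"sphinx-doc\.org/en/main",  # dot escaped here; an assumption of this sketch
    r"tree/release/",
]

# One combined alternation so each line is scanned once.
FORBIDDEN_RE = re.compile("|".join(f"(?:{p})" for p in FORBIDDEN_PATTERNS))


def find_forbidden(text):
    """Return (line_number, matched_text) pairs for every forbidden match."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in FORBIDDEN_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Because the check is a pure string scan, it needs no network access, which is what keeps it fast enough to block PRs.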

Detailed implementation plan

Affected Repositories

  • PyAutoBuild (primary — tool + regex guard)
  • PyAutoConf, PyAutoFit, PyAutoArray, PyAutoGalaxy, PyAutoLens
  • HowToFit, HowToGalaxy, HowToLens
  • autofit_workspace, autogalaxy_workspace, autolens_workspace
  • PyAutoPrompt (pyauto-status skill update)

Work Classification

Tooling / infrastructure. Doc/CI-only — no library code changes.

Branch Survey

Repository            Current Branch  Dirty?
--------------------  --------------  ----------------------------------
PyAutoBuild           main            clean
PyAutoConf            main            clean
PyAutoFit             main            clean
PyAutoArray           main            clean
PyAutoGalaxy          main            clean
PyAutoLens            main            clean
HowToFit              main            clean
HowToGalaxy           main            clean
HowToLens             main            3 (regenerated datasets, ignored)
autofit_workspace     main            clean
autogalaxy_workspace  main            28 (regenerated datasets, ignored)
autolens_workspace    main            17 (regenerated datasets, ignored)

Suggested branch: feature/url-check-ci
Worktree root: ~/Code/PyAutoLabs-wt/url-check-ci/ (created by /start_library)
Conflict status: No conflicts — no other tasks claim any of these repos.

Implementation Steps

  1. PyAutoBuild — autobuild/url_check_live.py:

    • Direct port of admin_jammy/software/url_check/url_check.py (now superseded; admin_jammy copy can be deleted in a follow-up cleanup, or left as a backup).
    • New flags:
      • --allowlist <file> — load URLs to ignore from this file (one per line, # comments allowed)
      • --strict — exit 1 if any URL is broken AND not in the allowlist
      • --format markdown-issue — emit a Markdown body suitable for gh issue create --body-file -
    • Fix the symlink/canonical bug — derive the scan root from $GITHUB_WORKSPACE (or cwd) rather than Path(__file__).resolve().parents[N]. See [memory feedback_path_file_resolve_symlink].
  2. PyAutoBuild — autobuild/url_check_live.sh:

    • Wrapper: bash url_check_live.sh <repo-dir> — finds .url_check_allowlist.txt at repo root, runs live audit in strict mode, emits issue body to stdout.
  3. PyAutoBuild — extend autobuild/url_check.sh with the additional ~15 forbidden patterns surfaced by the url-check task (hhttps://, joshspeagle/[Nn]autilus, rhayes777/PyAutoBuild, tree/release/, sphinx-doc.org/en/main, bokeh/bokeh/blob/main/CODE_OF_CONDUCT.md, numfocus/numfocus/blob/main/manual/numfocus-coc.md, Fiterence_anti-harassment, etc.).

  4. Per-repo (11 repos):

    • Generate .url_check_allowlist.txt at the repo root by filtering report.json to the URLs flagged broken in that specific repo. Group the entries under # comment headers by category (external dead links, internal readthedocs renames, etc.).
    • Update .github/workflows/url_check.yml:
      • Keep the existing url_check_patterns job (every PR + push). It now exercises the expanded regex set.
      • Add a url_check_live job (cron 0 4 * * 1 + workflow_dispatch). Steps: check out the repo and PyAutoBuild; run url_check_live.sh repo > issue_body.md; if the script exits non-zero and issue_body.md is non-empty, run gh issue create, or gh issue comment if an open [url-check] issue already exists.
    • Add the workflow to PyAutoConf and PyAutoArray (currently missing).
  5. PyAutoPrompt — pyauto-status skill:

    • Add a new "URL Check Status" section between "Idle Repos" and the bottom of the dashboard.
    • For each PyAuto repo, query gh issue list --repo <repo> --search "[url-check]" --state open --json title,number,updatedAt and show: repo / count from title / last update.
    • If a repo has no open tracking issue (for example, because it was closed), show "✓ clean" or omit the repo.
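
The allowlist and --strict behaviour from step 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the code of url_check_live.py: all function names are hypothetical, the allowlist format (one URL per line, # comments) and the exit-code rule are taken from the flag descriptions above, and the scan root follows the $GITHUB_WORKSPACE-or-cwd rule from the symlink fix.

```python
import os
from pathlib import Path


def scan_root():
    """Scan root from $GITHUB_WORKSPACE, falling back to the current directory."""
    return Path(os.environ.get("GITHUB_WORKSPACE") or os.getcwd())


def parse_allowlist(text):
    """Parse allowlist text: one URL per line; blank and '#' lines are skipped."""
    urls = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Only whole-line comments are dropped, since URLs may contain '#'.
        if line and not line.startswith("#"):
            urls.add(line)
    return urls


def new_broken(broken_urls, allowlist):
    """Broken URLs that are not grandfathered by the allowlist, sorted."""
    return sorted(set(broken_urls) - allowlist)


def strict_exit_code(broken_urls, allowlist):
    """Exit 1 if any URL is broken AND not in the allowlist, else 0."""
    return 1 if new_broken(broken_urls, allowlist) else 0
```

Treating only whole lines as comments is a deliberate choice in this sketch: stripping everything after a # would silently truncate allowlisted URLs that carry a fragment.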

Key Files

  • PyAutoBuild/autobuild/url_check_live.py — new
  • PyAutoBuild/autobuild/url_check_live.sh — new
  • PyAutoBuild/autobuild/url_check.sh — extended regex patterns
  • Each consumer repo: .url_check_allowlist.txt, .github/workflows/url_check.yml
  • PyAutoPrompt/skills/pyauto-status/pyauto-status.md — new section

Testing

  • Run url_check_live.py --strict against PyAutoLens with its allowlist on the worktree — expect exit 0 (all current broken URLs are allowlisted).
  • Manually remove one URL from the allowlist and re-run — expect exit 1 and a Markdown body.
  • workflow_dispatch the cron job in PyAutoLens once shipped to verify the issue-create path works end-to-end.
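
The second and third tests depend on the tracking-issue text, so here is a sketch of how the title and body could be produced and read back. The title format follows the "[url-check] N broken URLs detected" wording in the plan; the body layout and every function name are hypothetical, since this issue does not specify them.

```python
import re

TITLE_PREFIX = "[url-check]"
TITLE_RE = re.compile(r"^\[url-check\]\s+(\d+)\s+broken URLs? detected")


def issue_title(count):
    """Tracking-issue title in the format given in the plan."""
    return f"{TITLE_PREFIX} {count} broken URLs detected"


def issue_body(new_broken):
    """Markdown body listing each URL; this layout is an assumption."""
    lines = [f"{len(new_broken)} broken URLs are not in the allowlist:", ""]
    lines += [f"- {url}" for url in new_broken]
    return "\n".join(lines) + "\n"


def count_from_title(title):
    """Broken-URL count parsed from a tracking-issue title, or None."""
    match = TITLE_RE.match(title)
    return int(match.group(1)) if match else None


def issue_action(open_issue_titles):
    """'comment' if an open tracking issue already exists, else 'create'."""
    has_open = any(t.startswith(TITLE_PREFIX) for t in open_issue_titles)
    return "comment" if has_open else "create"
```

Keeping the count in the title means the pyauto-status section can recover it from the gh issue list JSON output without fetching any issue body.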

Original Prompt

(Direct user request from the previous url-check session — no separate prompt file.)

Can we put a URL check in the CI of PyAutoBuild, I guess we need a list of acceptable failures which would be your 104 broken? … also make bad broken urls appear in the pyauto-status command
