Skip to content

[build] Add update_cddl script to refresh pinned w3c/webref CDDL files#17743

Open
titusfortner wants to merge 3 commits into
trunkfrom
cddl-update
Open

[build] Add update_cddl script to refresh pinned w3c/webref CDDL files#17743
titusfortner wants to merge 3 commits into
trunkfrom
cddl-update

Conversation

@titusfortner

@titusfortner titusfortner commented Jul 2, 2026

Copy link
Copy Markdown
Member

🔗 Related Issues

💥 What does this PR do?

  • Adds tooling to refresh the pinned w3c/webref CDDL spec files, and narrows the pin to what's usable. Currently, everything must be updated manually.
  • //scripts:update_cddl resolves the latest webref main commit and file hashes for common/webref_cddl.bzl plus the extension's use_repo(...) list in MODULE.bazel if a new spec is added to the source directory.
  • Wired into the release_updates rake task and the pre-release GitHub workflow alongside the other update_* tasks.

🔧 Implementation Notes

  • Only pinning the -all (union) file of each protocol — dropping the -local/-remote splits, reducing the pin from 17 files to 6.

🤖 AI assistance

  • AI assisted (complete below)
    • Tool(s): Claude Code
    • What was generated: scripts/update_cddl.py and the Rakefile/workflow/BUILD wiring
    • I reviewed all AI output and can explain the change

💡 Additional Considerations

The pin is still a superset of what's consumed: the py/ and javascript/ BUILD.bazel merge lists use 5 of the 6 (webdriver_bidi, permissions, prefetch, ua_client_hints, web_bluetooth). at_driver_all is pinned but unused — a separate protocol (AT Driver), we're pinning it but not merging it for use in our schema generation.

Each scripts/*.py update tool spins up its own bare urllib3.PoolManager() (pinned_browsers, update_cdp, selenium_manager, update_docfx, and this one). Worth considering a shared configured http_client library so timeout/retry/auth policy lives in one place instead of being repeated per script.

🔄 Types of changes

  • New feature (non-breaking change which adds functionality and tests!)

@selenium-ci selenium-ci added the B-build Includes scripting, bazel and CI integrations label Jul 2, 2026
@qodo-code-review

Copy link
Copy Markdown
Contributor

PR Summary by Qodo

Add Bazel/Rake tooling to refresh pinned w3c/webref CDDL specs

✨ Enhancement ⚙️ Configuration changes 🕐 20-40 Minutes

Grey Divider

AI Description

• Add a Bazel-driven script to repin w3c/webref CDDL specs at the latest commit.
• Reduce the pinned set to protocol "-all" union CDDL files (drop local/remote splits).
• Wire CDDL updates into release automation (Rake release_updates + pre-release workflow).
Diagram

graph TD
  A["CI / Maintainer"] --> B["./go update_cddl"] --> C["Rake: update_cddl"] --> D["Bazel: //scripts:update_cddl"] --> E["update_cddl.py"] --> F{{"GitHub API + raw webref"}} --> G["webref_cddl.bzl"] --> H["MODULE.bazel"]
  H --> I["Bazel module ext"] --> J["Pinned spec repos"]

  subgraph Legend
    direction LR
    _actor["Invoker"] ~~~ _proc["Tooling step"] ~~~ _file["Repo file"] ~~~ _ext{{"External source"}}
  end
Loading
High-Level Assessment

The following are alternative approaches to this PR:

1. Bazel repository_rule to auto-track webref main
  • ➕ Fully Bazel-native (no regex rewriting of repo files).
  • ➕ Could integrate update checks into Bazel fetch phase.
  • ➖ Reduces auditability/reproducibility if it implicitly follows a moving branch.
  • ➖ Harder to coordinate with release automation that expects explicit commits/patches.
2. Pin a webref tarball/archive instead of per-file http_file
  • ➕ Single download + single sha256 to manage.
  • ➕ Avoids per-spec hashing and repeated network calls.
  • ➖ Still need logic to extract/locate specific CDDL files.
  • ➖ Bigger artifacts and less granular diffs when only one spec changes.
3. Treat CDDL specs as vendored sources in-repo
  • ➕ Eliminates external fetch variability during builds.
  • ➕ Simplifies Bazel wiring (no module extension or http_file list).
  • ➖ Bloats repo and increases churn on spec updates.
  • ➖ Harder to attribute changes to upstream webref commits cleanly.

Recommendation: The PR’s approach (explicit commit pin + explicit per-file hashes, with an update tool) is the best fit for reproducible builds and reviewable upstream drift. The main improvement to consider later would be switching from regex-based rewriting to a small structured parser for the MODULE.bazel/use_repo block, but the current implementation is reasonable given the narrow, well-defined patterns being updated.

Files changed (6) +186 / -26

Enhancement (2) +163 / -0
RakefileAdd update_cddl task and include it in release_updates +7/-0

Add update_cddl task and include it in release_updates

• Introduces a new update_cddl Rake task that runs Bazel //scripts:update_cddl. Wires it into the release_updates flow so CDDL pins refresh during release preparation.

Rakefile

update_cddl.pyAdd script to repin webref CDDL commit, hashes, and module repos +156/-0

Add script to repin webref CDDL commit, hashes, and module repos

• Implements a tool that resolves a target webref commit (branch tip or explicit commit), lists available -all CDDL files, downloads them to compute sha256, and rewrites common/webref_cddl.bzl and the MODULE.bazel use_repo block to match. Prints added/removed repo names to make upstream file set drift visible.

scripts/update_cddl.py

Other (4) +23 / -26
pre-release.ymlRun CDDL pin refresh during pre-release updates +5/-0

Run CDDL pin refresh during pre-release updates

• Adds a new "CDDL Spec Files" task to the release-updates matrix to run ./go update_cddl. Ensures the generated patch is applied/committed and reported in the PR summary table.

.github/workflows/pre-release.yml

MODULE.bazelNarrow webref CDDL repos to union (-all) specs only +0/-11

Narrow webref CDDL repos to union (-all) specs only

• Updates the webref_cddl_extension use_repo(...) list to drop -local/-remote CDDL repos. Keeps only the pinned -all union CDDL specs that generation consumes.

MODULE.bazel

webref_cddl.bzlPin only -all CDDL files and document regeneration workflow +10/-15

Pin only -all CDDL files and document regeneration workflow

• Adds documentation clarifying the pinning strategy (single webref main commit) and that pins are regenerated via //scripts:update_cddl. Reduces the pinned file list from local/remote/all splits down to only -all union files (and updates the commit/hash list accordingly).

common/webref_cddl.bzl

BUILD.bazelExpose update_cddl as a Bazel py_binary +8/-0

Expose update_cddl as a Bazel py_binary

• Adds a new py_binary target (//scripts:update_cddl) with urllib3 dependency so the update tool can be run via Bazel and from Rake/CI.

scripts/BUILD.bazel

@qodo-code-review

qodo-code-review Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (1) 📜 Skill insights (0)

Context used
✅ Compliance rules (platform): 11 rules

Grey Divider


Action required

1. update_cddl.py missing tests 📘 Rule violation ▣ Testability
Description
New functionality was added to update pinned webref CDDL inputs (including rewriting
common/webref_cddl.bzl and MODULE.bazel), but this PR adds no corresponding automated tests to
prevent regressions. Breakage here can silently impact release/update workflows and downstream Bazel
fetch reproducibility.
Code

scripts/update_cddl.py[R1-156]

+#!/usr/bin/env python
+"""Update the pinned CDDL spec files downloaded from w3c/webref.
+
+The WebDriver BiDi (and related) CDDL grammars are not published as an npm
+package; they are extracted from the edited specs and committed to the
+``ed/cddl`` directory of https://github.com/w3c/webref . We pin a single
+webref commit plus a sha256 for the ``-all`` (union) CDDL file of each
+protocol in that directory (see ``common/webref_cddl.bzl``) so the Bazel
+build fetches them reproducibly. The per-end ``-local``/``-remote`` splits are
+skipped: generation merges the union, so only the ``-all`` files are consumed.
+
+This script repoints that pin at the tip of webref's "main" branch (the
+continuous reffy extraction; the "curated" branch is a separate published
+lineage with no shared history, so main keeps pin-to-pin diffs auditable),
+refreshes every hash, and picks up files that upstream has added or removed.
+Because the file set drifts over time, it regenerates both:
+
+  - the ``_COMMIT`` and ``_CDDL_FILES`` entries in ``common/webref_cddl.bzl``
+  - the matching ``use_repo(...)`` list for the extension in ``MODULE.bazel``
+
+-----------------------------------------------------------------------------
+usage: update_cddl.py [-h] [--commit COMMIT] [--branch BRANCH]
+
+options:
+  -h, --help       show this help message and exit
+  --commit COMMIT  pin this exact webref commit instead of the branch tip
+  --branch BRANCH  webref branch to resolve when --commit is omitted (default: main)
+-----------------------------------------------------------------------------
+"""
+
+import argparse
+import hashlib
+import json
+import os
+import re
+from pathlib import Path
+
+import urllib3
+
+http = urllib3.PoolManager()
+root_dir = Path(os.path.realpath(__file__)).parent.parent
+
+REPO = "w3c/webref"
+CDDL_PATH = "ed/cddl"
+API_HEADERS = {"Accept": "application/vnd.github+json", "User-Agent": "selenium-update-cddl"}
+
+BZL_FILE = root_dir / "common" / "webref_cddl.bzl"
+MODULE_FILE = root_dir / "MODULE.bazel"
+
+
+def resolve_commit(branch):
+    r = http.request("GET", f"https://api.github.com/repos/{REPO}/commits/{branch}", headers=API_HEADERS)
+    if r.status != 200:
+        raise RuntimeError(f"Failed to resolve {REPO}@{branch}: HTTP {r.status}")
+    return json.loads(r.data)["sha"]
+
+
+def list_cddl_files(commit):
+    r = http.request("GET", f"https://api.github.com/repos/{REPO}/contents/{CDDL_PATH}?ref={commit}", headers=API_HEADERS)
+    if r.status != 200:
+        raise RuntimeError(f"Failed to list {CDDL_PATH} at {commit}: HTTP {r.status}")
+    entries = json.loads(r.data)
+    # Only the "-all" union of each protocol is consumed; the local/remote splits
+    # feed nothing (BiDi generation merges the union), so they are not pinned.
+    return sorted(e["name"] for e in entries if e["type"] == "file" and e["name"].endswith("-all.cddl"))
+
+
+def repo_name(filename):
+    """Derive the Bazel repo name from a CDDL filename.
+
+    ``at-driver-all.cddl`` -> ``at_driver_all_cddl``
+    """
+    return filename[: -len(".cddl")].replace("-", "_") + "_cddl"
+
+
+def sha256_of(commit, filename):
+    url = f"https://raw.githubusercontent.com/{REPO}/{commit}/{CDDL_PATH}/{filename}"
+    r = http.request("GET", url)
+    if r.status != 200:
+        raise RuntimeError(f"Failed to download {url}: HTTP {r.status}")
+    return hashlib.sha256(r.data).hexdigest()
+
+
+def build_entries(commit, filenames):
+    return [(repo_name(name), name, sha256_of(commit, name)) for name in filenames]
+
+
+def existing_repo_names(content):
+    return set(re.findall(r'\(\s*"([a-z0-9_]+)"\s*,\s*"[^"]+\.cddl"', content))
+
+
+def render_cddl_files(entries):
+    lines = ["_CDDL_FILES = ["]
+    for name, filename, sha256 in entries:
+        lines.append(f'    ("{name}", "{filename}", "{sha256}"),')
+    lines.append("]")
+    return "\n".join(lines)
+
+
+def update_pin(commit, entries):
+    content = BZL_FILE.read_text()
+    before = existing_repo_names(content)
+
+    content = re.sub(r'_COMMIT = "[0-9a-f]+"', f'_COMMIT = "{commit}"', content)
+    content = re.sub(r"_CDDL_FILES = \[.*?\n\]", render_cddl_files(entries), content, flags=re.S)
+
+    BZL_FILE.write_text(content)
+
+    after = {name for name, _, _ in entries}
+    return sorted(after - before), sorted(before - after)
+
+
+def update_module(entries):
+    content = MODULE_FILE.read_text()
+    repo_list = "\n".join(f'    "{name}",' for name, _, _ in sorted(entries))
+    new_block = f"use_repo(\n    webref_cddl_extension,\n{repo_list}\n)"
+    content, count = re.subn(
+        r"use_repo\(\n    webref_cddl_extension,\n.*?\n\)",
+        new_block,
+        content,
+        flags=re.S,
+    )
+    if count != 1:
+        raise RuntimeError(f"Expected exactly one webref_cddl_extension use_repo block, found {count}")
+    MODULE_FILE.write_text(content)
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--commit", help="pin this exact webref commit instead of the branch tip")
+    parser.add_argument(
+        "--branch",
+        default="main",
+        help="webref branch to resolve when --commit is omitted (default: main)",
+    )
+    args = parser.parse_args()
+
+    commit = args.commit or resolve_commit(args.branch)
+    print(f"Pinning {REPO}@{commit}")
+
+    filenames = list_cddl_files(commit)
+    print(f"Found {len(filenames)} CDDL files in {CDDL_PATH}")
+
+    entries = build_entries(commit, filenames)
+    added, removed = update_pin(commit, entries)
+    update_module(entries)
+
+    for name in added:
+        print(f"  added: {name}")
+    for name in removed:
+        print(f"  removed: {name}")
+    print(f"Updated {BZL_FILE.relative_to(root_dir)} and {MODULE_FILE.relative_to(root_dir)}")
+
+
+if __name__ == "__main__":
+    main()
Evidence
PR Compliance ID 389273 requires tests for new functionality. The PR adds a new py_binary target
(update_cddl) and a new implementation file (scripts/update_cddl.py) that updates pinned inputs,
but no tests were added/updated alongside this new behavior in the change set.

Rule 389273: Require tests for all new functionality and bug fixes
scripts/update_cddl.py[1-156]
scripts/BUILD.bazel[31-37]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
This PR introduces new release/update tooling (`//scripts:update_cddl`) without any automated test coverage, violating the requirement that new functionality includes tests.

## Issue Context
The new script performs non-trivial logic (deriving repo names, rendering `_CDDL_FILES`, and rewriting `common/webref_cddl.bzl` + `MODULE.bazel`). A regression could break release automation or pin incorrect hashes without being caught.

## Fix Focus Areas
- scripts/update_cddl.py[1-156]
- scripts/BUILD.bazel[31-37]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. No auth/timeout on API ✗ Dismissed 🐞 Bug ☼ Reliability
Description
scripts/update_cddl.py performs GitHub API requests without any timeout and without using an
available CI GITHUB_TOKEN, increasing the chance of hangs or transient 403/rate-limit failures
during automated release updates. This makes the new pre-release “CDDL Spec Files” job less reliable
than necessary.
Code

scripts/update_cddl.py[R40-65]

+http = urllib3.PoolManager()
+root_dir = Path(os.path.realpath(__file__)).parent.parent
+
+REPO = "w3c/webref"
+CDDL_PATH = "ed/cddl"
+API_HEADERS = {"Accept": "application/vnd.github+json", "User-Agent": "selenium-update-cddl"}
+
+BZL_FILE = root_dir / "common" / "webref_cddl.bzl"
+MODULE_FILE = root_dir / "MODULE.bazel"
+
+
+def resolve_commit(branch):
+    r = http.request("GET", f"https://api.github.com/repos/{REPO}/commits/{branch}", headers=API_HEADERS)
+    if r.status != 200:
+        raise RuntimeError(f"Failed to resolve {REPO}@{branch}: HTTP {r.status}")
+    return json.loads(r.data)["sha"]
+
+
+def list_cddl_files(commit):
+    r = http.request("GET", f"https://api.github.com/repos/{REPO}/contents/{CDDL_PATH}?ref={commit}", headers=API_HEADERS)
+    if r.status != 200:
+        raise RuntimeError(f"Failed to list {CDDL_PATH} at {commit}: HTTP {r.status}")
+    entries = json.loads(r.data)
+    # Only the "-all" union of each protocol is consumed; the local/remote splits
+    # feed nothing (BiDi generation merges the union), so they are not pinned.
+    return sorted(e["name"] for e in entries if e["type"] == "file" and e["name"].endswith("-all.cddl"))
Evidence
The updater makes GitHub API calls via a default PoolManager with no explicit timeout and only sets
Accept/User-Agent headers, while the CI runner environment already provides a GitHub token that
could be used to avoid rate limits and improve reliability.

scripts/update_cddl.py[40-65]
.github/workflows/bazel.yml[89-96]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`update_cddl.py` uses `urllib3.PoolManager()` with default timeout/retry behavior and makes unauthenticated requests to `api.github.com`. This can hang indefinitely on network stalls and is more likely to fail under GitHub API throttling.

## Issue Context
The GitHub Actions Bazel workflow exports `GITHUB_TOKEN` into the environment, but the updater never uses it.

## Fix Focus Areas
- scripts/update_cddl.py[40-65]
- .github/workflows/bazel.yml[89-96]

## Implementation notes
- Read `GITHUB_TOKEN` (and/or `GH_TOKEN`) from `os.environ` and, when present, add `Authorization: Bearer <token>` to `API_HEADERS`.
- Configure a default timeout and retries, e.g.:
 - `timeout=urllib3.Timeout(connect=5.0, read=30.0, total=60.0)`
 - `retries=urllib3.Retry(total=3, backoff_factor=0.5, status_forcelist=[429, 500, 502, 503, 504])`
- When raising on non-200 responses, include the response body (or JSON `message`) to make failures diagnosable in CI logs.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Unchecked regex rewrites ✓ Resolved 🐞 Bug ≡ Correctness
Description
update_pin() rewrites common/webref_cddl.bzl with re.sub() but never verifies that the _COMMIT and
_CDDL_FILES patterns matched, so the script can report success while leaving stale pins if the file
format changes. This can silently break the intended “refresh pinned CDDL” behavior in release
automation.
Code

scripts/update_cddl.py[R100-110]

+def update_pin(commit, entries):
+    content = BZL_FILE.read_text()
+    before = existing_repo_names(content)
+
+    content = re.sub(r'_COMMIT = "[0-9a-f]+"', f'_COMMIT = "{commit}"', content)
+    content = re.sub(r"_CDDL_FILES = \[.*?\n\]", render_cddl_files(entries), content, flags=re.S)
+
+    BZL_FILE.write_text(content)
+
+    after = {name for name, _, _ in entries}
+    return sorted(after - before), sorted(before - after)
Evidence
The pin update function uses re.sub without checking replacement counts, while the module update
function does count-checking; this inconsistency means pin updates can fail silently whereas module
updates fail loudly.

scripts/update_cddl.py[100-111]
scripts/update_cddl.py[113-125]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`update_pin()` uses `re.sub()` for `_COMMIT` and `_CDDL_FILES` updates but does not check whether replacements occurred. If `common/webref_cddl.bzl` is reformatted (even slightly), the updater can become a no-op without failing.

## Issue Context
`update_module()` already uses `re.subn()` and validates `count == 1`, showing the intended robustness pattern.

## Fix Focus Areas
- scripts/update_cddl.py[100-111]
- scripts/update_cddl.py[113-125]

## Implementation notes
- Replace the two `re.sub(...)` calls with `re.subn(...)` and assert `count == 1` for each substitution.
- If either count is not 1, raise a `RuntimeError` that includes which pattern failed (and ideally the file path) so CI failures are actionable.
- (Optional hardening) Make the regexes whitespace-tolerant (e.g., `\s*`) and/or add explicit sentinel comments in `common/webref_cddl.bzl` to anchor the generated block.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Previous review results

Review updated until commit 6ae54e5

Results up to commit 5912632


🐞 Bugs (0) 📘 Rule violations (1) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)


Action required
1. update_cddl.py missing tests 📘 Rule violation ▣ Testability
Description
New functionality was added to update pinned webref CDDL inputs (including rewriting
common/webref_cddl.bzl and MODULE.bazel), but this PR adds no corresponding automated tests to
prevent regressions. Breakage here can silently impact release/update workflows and downstream Bazel
fetch reproducibility.
Code

scripts/update_cddl.py[R1-156]

+#!/usr/bin/env python
+"""Update the pinned CDDL spec files downloaded from w3c/webref.
+
+The WebDriver BiDi (and related) CDDL grammars are not published as an npm
+package; they are extracted from the edited specs and committed to the
+``ed/cddl`` directory of https://github.com/w3c/webref . We pin a single
+webref commit plus a sha256 for the ``-all`` (union) CDDL file of each
+protocol in that directory (see ``common/webref_cddl.bzl``) so the Bazel
+build fetches them reproducibly. The per-end ``-local``/``-remote`` splits are
+skipped: generation merges the union, so only the ``-all`` files are consumed.
+
+This script repoints that pin at the tip of webref's "main" branch (the
+continuous reffy extraction; the "curated" branch is a separate published
+lineage with no shared history, so main keeps pin-to-pin diffs auditable),
+refreshes every hash, and picks up files that upstream has added or removed.
+Because the file set drifts over time, it regenerates both:
+
+  - the ``_COMMIT`` and ``_CDDL_FILES`` entries in ``common/webref_cddl.bzl``
+  - the matching ``use_repo(...)`` list for the extension in ``MODULE.bazel``
+
+-----------------------------------------------------------------------------
+usage: update_cddl.py [-h] [--commit COMMIT] [--branch BRANCH]
+
+options:
+  -h, --help       show this help message and exit
+  --commit COMMIT  pin this exact webref commit instead of the branch tip
+  --branch BRANCH  webref branch to resolve when --commit is omitted (default: main)
+-----------------------------------------------------------------------------
+"""
+
+import argparse
+import hashlib
+import json
+import os
+import re
+from pathlib import Path
+
+import urllib3
+
+http = urllib3.PoolManager()
+root_dir = Path(os.path.realpath(__file__)).parent.parent
+
+REPO = "w3c/webref"
+CDDL_PATH = "ed/cddl"
+API_HEADERS = {"Accept": "application/vnd.github+json", "User-Agent": "selenium-update-cddl"}
+
+BZL_FILE = root_dir / "common" / "webref_cddl.bzl"
+MODULE_FILE = root_dir / "MODULE.bazel"
+
+
+def resolve_commit(branch):
+    r = http.request("GET", f"https://api.github.com/repos/{REPO}/commits/{branch}", headers=API_HEADERS)
+    if r.status != 200:
+        raise RuntimeError(f"Failed to resolve {REPO}@{branch}: HTTP {r.status}")
+    return json.loads(r.data)["sha"]
+
+
+def list_cddl_files(commit):
+    r = http.request("GET", f"https://api.github.com/repos/{REPO}/contents/{CDDL_PATH}?ref={commit}", headers=API_HEADERS)
+    if r.status != 200:
+        raise RuntimeError(f"Failed to list {CDDL_PATH} at {commit}: HTTP {r.status}")
+    entries = json.loads(r.data)
+    # Only the "-all" union of each protocol is consumed; the local/remote splits
+    # feed nothing (BiDi generation merges the union), so they are not pinned.
+    return sorted(e["name"] for e in entries if e["type"] == "file" and e["name"].endswith("-all.cddl"))
+
+
+def repo_name(filename):
+    """Derive the Bazel repo name from a CDDL filename.
+
+    ``at-driver-all.cddl`` -> ``at_driver_all_cddl``
+    """
+    return filename[: -len(".cddl")].replace("-", "_") + "_cddl"
+
+
+def sha256_of(commit, filename):
+    url = f"https://raw.githubusercontent.com/{REPO}/{commit}/{CDDL_PATH}/{filename}"
+    r = http.request("GET", url)
+    if r.status != 200:
+        raise RuntimeError(f"Failed to download {url}: HTTP {r.status}")
+    return hashlib.sha256(r.data).hexdigest()
+
+
+def build_entries(commit, filenames):
+    return [(repo_name(name), name, sha256_of(commit, name)) for name in filenames]
+
+
+def existing_repo_names(content):
+    return set(re.findall(r'\(\s*"([a-z0-9_]+)"\s*,\s*"[^"]+\.cddl"', content))
+
+
+def render_cddl_files(entries):
+    lines = ["_CDDL_FILES = ["]
+    for name, filename, sha256 in entries:
+        lines.append(f'    ("{name}", "{filename}", "{sha256}"),')
+    lines.append("]")
+    return "\n".join(lines)
+
+
+def update_pin(commit, entries):
+    content = BZL_FILE.read_text()
+    before = existing_repo_names(content)
+
+    content = re.sub(r'_COMMIT = "[0-9a-f]+"', f'_COMMIT = "{commit}"', content)
+    content = re.sub(r"_CDDL_FILES = \[.*?\n\]", render_cddl_files(entries), content, flags=re.S)
+
+    BZL_FILE.write_text(content)
+
+    after = {name for name, _, _ in entries}
+    return sorted(after - before), sorted(before - after)
+
+
+def update_module(entries):
+    content = MODULE_FILE.read_text()
+    repo_list = "\n".join(f'    "{name}",' for name, _, _ in sorted(entries))
+    new_block = f"use_repo(\n    webref_cddl_extension,\n{repo_list}\n)"
+    content, count = re.subn(
+        r"use_repo\(\n    webref_cddl_extension,\n.*?\n\)",
+        new_block,
+        content,
+        flags=re.S,
+    )
+    if count != 1:
+        raise RuntimeError(f"Expected exactly one webref_cddl_extension use_repo block, found {count}")
+    MODULE_FILE.write_text(content)
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--commit", help="pin this exact webref commit instead of the branch tip")
+    parser.add_argument(
+        "--branch",
+        default="main",
+        help="webref branch to resolve when --commit is omitted (default: main)",
+    )
+    args = parser.parse_args()
+
+    commit = args.commit or resolve_commit(args.branch)
+    print(f"Pinning {REPO}@{commit}")
+
+    filenames = list_cddl_files(commit)
+    print(f"Found {len(filenames)} CDDL files in {CDDL_PATH}")
+
+    entries = build_entries(commit, filenames)
+    added, removed = update_pin(commit, entries)
+    update_module(entries)
+
+    for name in added:
+        print(f"  added: {name}")
+    for name in removed:
+        print(f"  removed: {name}")
+    print(f"Updated {BZL_FILE.relative_to(root_dir)} and {MODULE_FILE.relative_to(root_dir)}")
+
+
+if __name__ == "__main__":
+    main()
Evidence
PR Compliance ID 389273 requires tests for new functionality. The PR adds a new py_binary target
(update_cddl) and a new implementation file (scripts/update_cddl.py) that updates pinned inputs,
but no tests were added/updated alongside this new behavior in the change set.

Rule 389273: Require tests for all new functionality and bug fixes
scripts/update_cddl.py[1-156]
scripts/BUILD.bazel[31-37]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
This PR introduces new release/update tooling (`//scripts:update_cddl`) without any automated test coverage, violating the requirement that new functionality includes tests.

## Issue Context
The new script performs non-trivial logic (deriving repo names, rendering `_CDDL_FILES`, and rewriting `common/webref_cddl.bzl` + `MODULE.bazel`). A regression could break release automation or pin incorrect hashes without being caught.

## Fix Focus Areas
- scripts/update_cddl.py[1-156]
- scripts/BUILD.bazel[31-37]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended
2. No auth/timeout on API ✗ Dismissed 🐞 Bug ☼ Reliability
Description
scripts/update_cddl.py performs GitHub API requests without any timeout and without using an
available CI GITHUB_TOKEN, increasing the chance of hangs or transient 403/rate-limit failures
during automated release updates. This makes the new pre-release “CDDL Spec Files” job less reliable
than necessary.
Code

scripts/update_cddl.py[R40-65]

+http = urllib3.PoolManager()
+root_dir = Path(os.path.realpath(__file__)).parent.parent
+
+REPO = "w3c/webref"
+CDDL_PATH = "ed/cddl"
+API_HEADERS = {"Accept": "application/vnd.github+json", "User-Agent": "selenium-update-cddl"}
+
+BZL_FILE = root_dir / "common" / "webref_cddl.bzl"
+MODULE_FILE = root_dir / "MODULE.bazel"
+
+
+def resolve_commit(branch):
+    r = http.request("GET", f"https://api.github.com/repos/{REPO}/commits/{branch}", headers=API_HEADERS)
+    if r.status != 200:
+        raise RuntimeError(f"Failed to resolve {REPO}@{branch}: HTTP {r.status}")
+    return json.loads(r.data)["sha"]
+
+
+def list_cddl_files(commit):
+    r = http.request("GET", f"https://api.github.com/repos/{REPO}/contents/{CDDL_PATH}?ref={commit}", headers=API_HEADERS)
+    if r.status != 200:
+        raise RuntimeError(f"Failed to list {CDDL_PATH} at {commit}: HTTP {r.status}")
+    entries = json.loads(r.data)
+    # Only the "-all" union of each protocol is consumed; the local/remote splits
+    # feed nothing (BiDi generation merges the union), so they are not pinned.
+    return sorted(e["name"] for e in entries if e["type"] == "file" and e["name"].endswith("-all.cddl"))
Evidence
The updater makes GitHub API calls via a default PoolManager with no explicit timeout and only sets
Accept/User-Agent headers, while the CI runner environment already provides a GitHub token that
could be used to avoid rate limits and improve reliability.

scripts/update_cddl.py[40-65]
.github/workflows/bazel.yml[89-96]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`update_cddl.py` uses `urllib3.PoolManager()` with default timeout/retry behavior and makes unauthenticated requests to `api.github.com`. This can hang indefinitely on network stalls and is more likely to fail under GitHub API throttling.

## Issue Context
The GitHub Actions Bazel workflow exports `GITHUB_TOKEN` into the environment, but the updater never uses it.

## Fix Focus Areas
- scripts/update_cddl.py[40-65]
- .github/workflows/bazel.yml[89-96]

## Implementation notes
- Read `GITHUB_TOKEN` (and/or `GH_TOKEN`) from `os.environ` and, when present, add `Authorization: Bearer <token>` to `API_HEADERS`.
- Configure a default timeout and retries, e.g.:
 - `timeout=urllib3.Timeout(connect=5.0, read=30.0, total=60.0)`
 - `retries=urllib3.Retry(total=3, backoff_factor=0.5, status_forcelist=[429, 500, 502, 503, 504])`
- When raising on non-200 responses, include the response body (or JSON `message`) to make failures diagnosable in CI logs.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Unchecked regex rewrites ✓ Resolved 🐞 Bug ≡ Correctness
Description
update_pin() rewrites common/webref_cddl.bzl with re.sub() but never verifies that the _COMMIT and
_CDDL_FILES patterns matched, so the script can report success while leaving stale pins if the file
format changes. This can silently break the intended “refresh pinned CDDL” behavior in release
automation.
Code

scripts/update_cddl.py[R100-110]

+def update_pin(commit, entries):
+    content = BZL_FILE.read_text()
+    before = existing_repo_names(content)
+
+    content = re.sub(r'_COMMIT = "[0-9a-f]+"', f'_COMMIT = "{commit}"', content)
+    content = re.sub(r"_CDDL_FILES = \[.*?\n\]", render_cddl_files(entries), content, flags=re.S)
+
+    BZL_FILE.write_text(content)
+
+    after = {name for name, _, _ in entries}
+    return sorted(after - before), sorted(before - after)
Evidence
The pin update function uses re.sub without checking replacement counts, while the module update
function does count-checking; this inconsistency means pin updates can fail silently whereas module
updates fail loudly.

scripts/update_cddl.py[100-111]
scripts/update_cddl.py[113-125]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`update_pin()` uses `re.sub()` for `_COMMIT` and `_CDDL_FILES` updates but does not check whether replacements occurred. If `common/webref_cddl.bzl` is reformatted (even slightly), the updater can become a no-op without failing.

## Issue Context
`update_module()` already uses `re.subn()` and validates `count == 1`, showing the intended robustness pattern.

## Fix Focus Areas
- scripts/update_cddl.py[100-111]
- scripts/update_cddl.py[113-125]

## Implementation notes
- Replace the two `re.sub(...)` calls with `re.subn(...)` and assert `count == 1` for each substitution.
- If either count is not 1, raise a `RuntimeError` that includes which pattern failed (and ideally the file path) so CI failures are actionable.
- (Optional hardening) Make the regexes whitespace-tolerant (e.g., `\s*`) and/or add explicit sentinel comments in `common/webref_cddl.bzl` to anchor the generated block.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Qodo Logo

Comment thread scripts/update_cddl.py
Comment thread scripts/update_cddl.py
Comment thread scripts/update_cddl.py
@qodo-code-review

Copy link
Copy Markdown
Contributor

Code review by qodo was updated up to the latest commit 3c5376b

@qodo-code-review

Copy link
Copy Markdown
Contributor

Code review by qodo was updated up to the latest commit 6ae54e5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

B-build Includes scripting, bazel and CI integrations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants