Parity Auto Trigger: run parity.yml per upstream commit once CI finishes#3176

Draft
ethanwee1 wants to merge 11 commits into develop from parity-auto-trigger


@ethanwee1 ethanwee1 commented Apr 23, 2026

Summary

Adds .github/workflows/parity-auto.yml so ROCm/pytorch automatically dispatches parity.yml for upstream pytorch/pytorch:main commits once the CI jobs needed for the parity report have finished.

The workflow currently:

  1. Polls recent upstream commits on a cron and via workflow_dispatch.
  2. Reads upstream check-runs for each SHA rather than relying on parent workflow status.
  3. Waits for every in-scope ROCm parity test shard to be status=completed.
  4. Waits for the CUDA jobs consumed by download_testlogs to be status=completed:
    • linux-jammy-cuda13.0-py3.10-gcc11 / test-osdc (default, ...)
    • linux-jammy-cuda13.0-py3.10-gcc11 / test-osdc (distributed, ...)
    • unit-test / inductor-test / test (inductor, ...)
  5. Dispatches parity.yml once for the ready, unprocessed arch subset.
  6. Embeds the upstream SHA in csv_name/run title so the next scan can avoid duplicates.
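
The readiness gate in steps 2 to 4 can be sketched roughly as follows. This is a minimal Python illustration, not the workflow's actual script: `CUDA_PATTERNS`, `in_scope`, and `is_ready` are hypothetical names, the shard suffixes in the sample data are made up, and requiring at least one in-scope check-run is an assumption. Only the `name` and `status` fields of the GitHub check-runs payload are used; the per-arch ROCm regexes would be appended to the same pattern list.

```python
import re

# Hypothetical patterns built from the job names listed above; the real
# regexes live in parity-auto.yml and may differ.
CUDA_PATTERNS = [
    r"^linux-jammy-cuda13\.0-py3\.10-gcc11 / test-osdc \((default|distributed),",
    r"^unit-test / inductor-test / test \(inductor,",
]

def in_scope(name, patterns):
    """True if the check-run name matches any gate pattern."""
    return any(re.search(p, name) for p in patterns)

def is_ready(check_runs, patterns):
    """True when in-scope check-runs exist and every one has status=completed.

    The conclusion field is deliberately ignored: a failed shard still
    produces logs and artifacts.
    """
    scoped = [r for r in check_runs if in_scope(r["name"], patterns)]
    return bool(scoped) and all(r["status"] == "completed" for r in scoped)

# Sample data with made-up shard suffixes.
sample_runs = [
    {"name": "linux-jammy-cuda13.0-py3.10-gcc11 / test-osdc (default, 1, 5)",
     "status": "completed", "conclusion": "failure"},
    {"name": "unit-test / inductor-test / test (inductor, 1, 2)",
     "status": "completed", "conclusion": "success"},
    {"name": "labeler", "status": "in_progress", "conclusion": None},
]
pending_shard = {
    "name": "linux-jammy-cuda13.0-py3.10-gcc11 / test-osdc (distributed, 1, 3)",
    "status": "in_progress", "conclusion": None,
}

print(is_ready(sample_runs, CUDA_PATTERNS))                    # True
print(is_ready(sample_runs + [pending_shard], CUDA_PATTERNS))  # False
```

The unrelated `labeler` run is still in progress but does not block the gate, while a pending in-scope shard does.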

The cron is set to every 10 minutes to reduce dispatch latency after upstream CI finishes.

Notable details

  • Readiness is based on check-run status=completed, not conclusion=success; failing test shards are still useful because they produce logs/artifacts.
  • ROCm readiness is scoped to the configured arch test-shard regexes, so unrelated ROCm benchmark/periodic jobs do not block parity reports.
  • CUDA default/distributed now uses the upstream OSDC CUDA jobs and test-reports-test-osdc-* artifact prefixes.
  • download_testlogs normalizes extracted CUDA OSDC artifact folders back to test-default-* / test-distributed-* so the existing XML summarizer keeps producing the same test_config values.
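
The folder normalization in the last bullet might look something like the sketch below. The exact directory names produced by download_testlogs are not shown in this PR, so the `test-osdc-default-*` input shape and the helper name `normalize_artifact_dir` are assumptions for illustration only.

```python
import re

def normalize_artifact_dir(name: str) -> str:
    """Map an assumed test-osdc-<config>-* folder name back to test-<config>-*.

    Names that do not start with the assumed OSDC prefix are returned unchanged.
    """
    return re.sub(r"^test-osdc-(default|distributed)-", r"test-\1-", name)

print(normalize_artifact_dir("test-osdc-default-1-5"))      # test-default-1-5
print(normalize_artifact_dir("test-osdc-distributed-2-3"))  # test-distributed-2-3
print(normalize_artifact_dir("test-inductor-1-2"))          # test-inductor-1-2
```

Keeping the rename in one place means a summarizer that derives test_config from the folder prefix does not need to know about the OSDC job names.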

Testing on fork

This version has been deployed on ethanwee1/pytorch:main for live testing.

Recent successful scheduled auto-trigger runs on the latest fork head b490444...:

Recent parity reports dispatched by the auto-trigger after the latest fixes:

Earlier failures on the fork were from older revisions before the CSV field-size and CUDA OSDC fixes. The latest completed reports on the current fork head are green.

Follow-up after merge

After this lands on ROCm/pytorch develop, disable the fork cron to avoid duplicate polling/dispatching:

gh workflow disable parity-auto.yml --repo ethanwee1/pytorch

Base automatically changed from parity-summary-improvements to develop April 23, 2026 14:35
Adds .github/workflows/parity-auto.yml, which runs on a 30-minute
cron (and workflow_dispatch for testing) and:

  1. Pulls the most recent commits from pytorch/pytorch:main.
  2. Skips commits that are too new (CI not started), too old
     (back-fill limit), or already have a parity.yml run in this
     repo (detected by matching the full SHA in prior run titles).
  3. For the first remaining commit whose upstream check-runs are
     all "completed", dispatches parity.yml with that SHA so
     download_testlogs pulls the artifacts and logs for that exact
     build and generate_summary.py produces the per-arch report.

csv_name is set to "autoparity-YYYYMMDD-<full SHA>" so the SHA ends
up in the dispatched run's display title, which is what this
workflow queries to avoid re-dispatching.

Inputs expose max_commits, lookback_hours, max_age_hours, arch, and
dry_run for tuning / debugging without code changes.
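
A minimal Python sketch of the naming and de-duplication scheme described above. The helper names `make_csv_name` and `already_dispatched` are hypothetical, and the sample run title is made up; only the `autoparity-YYYYMMDD-<full SHA>` format comes from this commit message.

```python
from datetime import datetime, timezone

def make_csv_name(sha, now=None):
    """Build autoparity-YYYYMMDD-<full SHA> using the current UTC date."""
    now = now or datetime.now(timezone.utc)
    return f"autoparity-{now:%Y%m%d}-{sha}"

def already_dispatched(sha, run_titles):
    """True if any prior run title contains the full SHA."""
    return any(sha in title for title in run_titles)

sha = "a" * 40  # placeholder SHA
name = make_csv_name(sha, datetime(2026, 4, 23, tzinfo=timezone.utc))
print(name)  # autoparity-20260423-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
print(already_dispatched(sha, [name]))       # True
print(already_dispatched("b" * 40, [name]))  # False
```

Matching on the full 40-character SHA rather than a short prefix avoids false positives between commits that share leading characters.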
@ethanwee1 ethanwee1 force-pushed the parity-auto-trigger branch from dbe8b5f to 82e48da Compare April 23, 2026 14:38

rocm-repo-management-api Bot commented Apr 23, 2026

Jenkins build for b926457d84f7482481aa8f1fdfced65247ad8882 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

…ow completion

Previously the workflow used a blunt "all upstream check-runs
completed" gate and dispatched parity.yml with a fixed arch list
(mi355, mi300, mi200). That meant:

  * We blocked on hundreds of unrelated upstream check-runs
    (labeler bots, etc.).
  * We'd dispatch with arch="mi355, mi300, mi200" for a commit
    where only `trunk` had run, so mi300/mi200 had no data and
    the parity report came out nearly empty.

Per-arch rewrite:

  * Query `repos/pytorch/pytorch/actions/runs?head_sha=<SHA>` to
    see which upstream workflows actually completed on the commit.
  * Map each arch to its default-tier upstream workflow (mi355→
    trunk, mi300→rocm-mi300, mi200→trunk-rocm-sandbox, navi31→
    rocm-navi31, nightly→rocm-nightly), exposed as
    `arch_workflow_map` input.
  * For each SHA newest→oldest, compute ready archs = archs whose
    required workflow is completed, minus archs already dispatched
    for that SHA (parsed from prior parity run titles after " · ").
  * If the remaining set is non-empty, dispatch parity.yml with
    arch=<that subset> and csv_name embedding the full SHA.

Effect: mi355 gets a parity report per upstream commit (trunk
runs per-commit). mi300/mi200 get dispatched separately whenever
their less-frequent periodic workflow finishes on a given SHA.
Each (SHA, arch) pair is dispatched at most once.

Also adds a `target_ref` input so the dispatched parity.yml can
run off a specific branch (useful for testing against a branch
that has the up-to-date parity scripts while the workflow file
itself lives on the default branch).
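
The per-arch selection can be sketched in Python as below. The arch-to-workflow mapping is taken from this commit message; the function names, the assumed run-title shape `<csv_name> · <comma-separated archs>`, and the sample data are hypothetical.

```python
# Mapping from the commit message (exposed there as the arch_workflow_map input).
ARCH_WORKFLOW_MAP = {
    "mi355": "trunk",
    "mi300": "rocm-mi300",
    "mi200": "trunk-rocm-sandbox",
    "navi31": "rocm-navi31",
    "nightly": "rocm-nightly",
}

def dispatched_archs(sha, run_titles):
    """Collect archs already dispatched for sha, parsed after ' · ' in titles."""
    done = set()
    for title in run_titles:
        if sha not in title or " · " not in title:
            continue
        arch_part = title.split(" · ", 1)[1]
        done.update(a.strip() for a in arch_part.split(",") if a.strip())
    return done

def ready_archs(sha, completed_workflows, run_titles, arch_map=ARCH_WORKFLOW_MAP):
    """Archs whose required workflow completed, minus archs already dispatched."""
    done = dispatched_archs(sha, run_titles)
    return [arch for arch, workflow in arch_map.items()
            if workflow in completed_workflows and arch not in done]

sha = "b" * 40  # placeholder SHA
titles = [f"autoparity-20260423-{sha} · mi355"]
print(ready_archs(sha, {"trunk", "rocm-mi300"}, titles))  # ['mi300']
print(ready_archs(sha, {"trunk", "rocm-mi300"}, []))      # ['mi355', 'mi300']
```

An empty result means nothing is dispatched for that SHA on this scan, which is how each (SHA, arch) pair ends up dispatched at most once.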

rocm-repo-management-api Bot commented Apr 23, 2026

Jenkins build for b926457d84f7482481aa8f1fdfced65247ad8882 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

@ethanwee1 ethanwee1 marked this pull request as draft April 29, 2026 16:22
The loop was silently aborting after printing 'no ready archs' for
the first commit, because set -e was catching a non-zero exit in
the next iteration (most likely date -u -d failing on an edge-case
DATE string, or a gh api pagination call hitting a transient error).
Drop -e (we already guard the pipelines that matter with || true),
and make COMMIT_EPOCH fall back to 0 + skip the age check if
date -d parsing fails.
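
The fallback behaviour described above can be illustrated in Python (the workflow itself uses `date -u -d` in shell). `commit_epoch` and `too_old` are hypothetical helper names, and the ISO 8601 `...Z` timestamp format is an assumption about what the commits API returns to the script.

```python
from datetime import datetime, timezone

def commit_epoch(date_str):
    """Return the commit time as a Unix epoch, or 0 if it cannot be parsed."""
    try:
        parsed = datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%SZ")
        return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    except (ValueError, TypeError):
        return 0

def too_old(date_str, now_epoch, max_age_hours):
    """Age check that is skipped (returns False) when the date is unparseable."""
    epoch = commit_epoch(date_str)
    if epoch == 0:
        return False  # do not abort or skip the commit over a bad date string
    return now_epoch - epoch > max_age_hours * 3600

print(commit_epoch("1970-01-01T00:00:10Z"))     # 10
print(commit_epoch("not a date"))               # 0
print(too_old("not a date", 10**9, 1))          # False
print(too_old("1970-01-01T00:00:10Z", 7300, 1)) # True
```

Treating 0 as a sentinel keeps one bad date from ending the scan, at the cost of letting that commit through the age filter.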
…ult)

GitHub Actions runs our script with 'shell: /usr/bin/bash -e {0}', so
errexit is enabled by the shell invocation itself, regardless of what
we put in the script. 'set -uo pipefail' only adds options; it does not
remove -e. Use 'set +e' before 'set -uo pipefail' so a non-zero exit
from a pipeline (grep -q with no match, etc.) in the middle of scanning
multiple commits no longer silently kills the loop.
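
The difference can be reproduced locally with a small Python harness that invokes bash with `-e`, roughly as the runner does. This is only an approximation: it uses `bash -e -c` rather than the script-file form, `run_bash_e` is a hypothetical helper, and `false` stands in for any failing command in the loop. It assumes bash is on the PATH.

```python
import subprocess

def run_bash_e(script):
    """Run script under 'bash -e' and return (exit code, stripped stdout)."""
    proc = subprocess.run(
        ["bash", "-e", "-c", script],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip()

# errexit from the command line is still active: the script stops at 'false'.
without_reset = run_bash_e("set -uo pipefail; false; echo reached")

# 'set +e' clears errexit first, so execution continues past 'false'.
with_reset = run_bash_e("set +e; set -uo pipefail; false; echo reached")

print(without_reset)  # (1, '')
print(with_reset)     # (0, 'reached')
```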

rocm-repo-management-api Bot commented Apr 29, 2026

Jenkins build for 7e331d97cb23b9ba937aa56d586a886740fd4a99 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

The auto-trigger previously waited for every ROCm check-run on an upstream SHA to complete before dispatching parity.yml, but download_testlogs also consumes CUDA default/distributed shards from trunk and CUDA inductor shards from the inductor workflow. If those CUDA jobs were still running, the parity report could be generated with partial CUDA data.

Fetch all check-runs for the SHA, split out ROCm check-runs plus the CUDA test check-runs used by download_testlogs, and require the combined set to be status=completed before dispatching. Conclusions may still be failure; we only need the shards to have finished so their logs/artifacts are available.

rocm-repo-management-api Bot commented Apr 29, 2026

Jenkins build for bb046b388cdf6d2fa2f12fe8d0dc785aba3badd5 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

The ROCm side of the readiness gate should wait only for the jobs that parity.yml actually consumes, not every upstream check-run containing "rocm". Some unrelated ROCm benchmark/periodic jobs can still be pending on the same SHA and would otherwise block reports unnecessarily.

Build the ROCm side of the gate from the configured per-arch test shard regexes, then combine that with CUDA default/distributed/inductor checks. This preserves the "wait until the jobs we compare are finished" invariant without waiting on unrelated ROCm jobs.

rocm-repo-management-api Bot commented Apr 29, 2026

Jenkins build for 3f0fa62ba5d8141b952cc3af902fd2331f791792 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

Upstream trunk now provides the CUDA default/distributed coverage we need through linux-jammy-cuda13.0-py3.10-gcc11 test-osdc shards rather than the older normal test shards. The old lookup matched test-osdc loosely as '/ test', then failed to find logs/artifacts because it still searched for '/ test (' job names and test-reports-test-default/distributed prefixes.

Switch CUDA default/distributed log matching to test-osdc, use the test-reports-test-osdc-default/distributed artifact prefixes, and normalize extracted test-osdc artifact directories back to test-default/test-distributed so summarize_xml_testreports keeps assigning the existing test_config values. Also update parity-auto's CUDA readiness regex to wait for the same OSDC shards before dispatching.
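
The mismatch between the loose and strict lookups can be seen with a short Python check. Whether the real scripts use regexes or substring tests is not shown here, so the three patterns below are assumptions, and the shard suffix in the sample job name is made up.

```python
import re

# Job name shape from the PR description, with a made-up shard suffix.
job = "linux-jammy-cuda13.0-py3.10-gcc11 / test-osdc (default, 1, 5)"

# Loose pattern: matches because "/ test" is a prefix of "/ test-osdc".
loose = re.search(r"/ test", job) is not None

# Old strict pattern: expects "/ test (" and so misses the OSDC job name.
strict_old = re.search(r"/ test \(", job) is not None

# OSDC-aware pattern: matches the default and distributed OSDC shards.
osdc = re.search(r"/ test-osdc \((default|distributed),", job) is not None

print(loose, strict_old, osdc)  # True False True
```

A job that passes the loose check but fails the strict one is selected and then yields no logs or artifacts, which matches the failure described above.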

rocm-repo-management-api Bot commented Apr 29, 2026

Jenkins build for 4d77e114a308071dd31fc1d665d44e3933d6f0bb commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

The auto-trigger only does lightweight API polling, and a 30-minute cron leaves too much latency after the last ROCm/CUDA parity shard finishes. Tighten the schedule to every 10 minutes so completed upstream commits are picked up sooner while still avoiding excessive schedule noise.

rocm-repo-management-api Bot commented Apr 29, 2026

Jenkins build for 4d77e114a308071dd31fc1d665d44e3933d6f0bb commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

rocm-repo-management-api Bot commented Apr 30, 2026

Jenkins build for be9768a43660294dcbb1187bc1ab07ff95cedefc commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

Detected error during PyTorch build:

/var/lib/jenkins/pytorch/aten/src/ATen/hip/CublasHandlePool.cpp:60:11: warning: enumeration value ‘rocblas_status_excluded_from_build’ not handled in switch [-Wswitch]
/var/lib/jenkins/pytorch/aten/src/ATen/hip/CublasHandlePool.cpp:60:11: warning: enumeration value ‘rocblas_status_arch_mismatch’ not handled in switch [-Wswitch]
[7625/8176] Building CXX object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/cudnn/hip/BatchNorm.cpp.o
cc1plus: warning: command-line option ‘-Wno-duplicate-decl-specifier’ is valid for C/ObjC but not for C++
[7626/8176] Building CXX object caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/UCCUtils.cpp.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/UCCUtils.cpp.o 
/opt/cache/bin/sccache /opt/cache/bin/c++ -DAT_PER_OPERATOR_HEADERS -DFLASHATTENTION_DISABLE_ALIBI -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASH_NAMESPACE=pytorch_flash -DFMT_HEADER_ONLY=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_POSIX_FALLOCATE=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DHIPBLASLT_USE_ROCROLLER -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DROCM_VERSION=70202 -DTORCH_CUDA_BUILD_MAIN_LIB -DTORCH_HIP_VERSION=702 -DUNFUSE_FMA -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_FLASH_ATTENTION -DUSE_LAYERNORM_FAST_RECIPROCAL -DUSE_MEM_EFF_ATTENTION -DUSE_NCCL -DUSE_PROF_API=1 -DUSE_ROCM -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -D__HIP_PLATFORM_AMD__ -D__HIP_PLATFORM_AMD__=1 -Dtorch_hip_EXPORTS -I/var/lib/jenkins/pytorch/build/aten/src -I/var/lib/jenkins/pytorch/aten/src -I/var/lib/jenkins/pytorch/build -I/var/lib/jenkins/pytorch -I/var/lib/jenkins/pytorch/nlohmann -I/var/lib/jenkins/pytorch/moodycamel -I/var/lib/jenkins/pytorch/aten/src/THH -I/var/lib/jenkins/pytorch/third_party/mslk/include -I/var/lib/jenkins/pytorch/aten/src/ATen/hip -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/composable_kernel/include -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/composable_kernel/library/include -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/composable_kernel/example/ck_tile/01_fmha -I/var/lib/jenkins/pytorch/build/caffe2/aten/src/ATen/composable_kernel -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/aiter/csrc/include -I/var/lib/jenkins/pytorch/third_party/fmt/include -I/var/lib/jenkins/pytorch/build/caffe2/aten/src -I/var/lib/jenkins/pytorch/aten/src/ATen/.. -I/var/lib/jenkins/pytorch/torch/include -I/var/lib/jenkins/pytorch/c10/hip/../.. -I/var/lib/jenkins/pytorch/c10/.. 
-I/var/lib/jenkins/pytorch/torch/csrc/api -I/var/lib/jenkins/pytorch/torch/csrc/api/include -I/var/lib/jenkins/pytorch/build/third_party/gloo/hip -isystem /opt/rocm-7.2.2/include -isystem /var/lib/jenkins/pytorch/build/third_party/gloo -isystem /var/lib/jenkins/pytorch/cmake/../third_party/gloo -isystem /var/lib/jenkins/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/googletest/googletest/include -isystem /var/lib/jenkins/pytorch/third_party/protobuf/src -isystem /opt/conda/envs/py_3.12/include -isystem /var/lib/jenkins/pytorch/third_party/XNNPACK/include -isystem /var/lib/jenkins/pytorch/third_party/ittapi/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/eigen -isystem /opt/rocm/include -isystem /var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /var/lib/jenkins/pytorch/third_party/ideep/include -isystem /var/lib/jenkins/pytorch/INTERFACE -isystem /var/lib/jenkins/pytorch/third_party/nlohmann/include -isystem /var/lib/jenkins/pytorch/third_party/concurrentqueue -isystem /opt/rocm-7.2.2/include/hiprand -isystem /opt/rocm-7.2.2/include/rocrand -isystem /opt/rocm/magma/include -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_MSLK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-dangling-reference -Wno-error=dangling-reference 
-Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -fPIC -fdiagnostics-color=always -DMKL_HAS_SBGEMM -DMKL_HAS_SHGEMM -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -Wall -Wextra -Wdeprecated -Wunused -Wno-unused-parameter -Wno-missing-field-initializers -Wno-array-bounds -Wno-unknown-pragmas -Wno-strict-overflow -Wno-strict-aliasing -Wredundant-move -Wno-interference-size -Wno-maybe-uninitialized -fvisibility=hidden -fPIC -D__HIP_PLATFORM_AMD__=1 -DCUDA_HAS_FP16=1 -DUSE_ROCM -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DTORCH_HIP_VERSION=702 -Wno-shift-count-negative -Wno-shift-count-overflow -DCAFFE2_USE_MIOPEN -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_HIP -DHIPBLAS_V2 -DHIP_ENABLE_WARP_SYNC_BUILTINS -DHIPBLASLT_OUTER_VEC -DUSE_ROCM_CK_GEMM -DHIP_VERSION=7 -Wno-duplicate-decl-specifier -DUSE_MIOPEN -MD -MT caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/UCCUtils.cpp.o -MF caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/UCCUtils.cpp.o.d -o caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/UCCUtils.cpp.o -c /var/lib/jenkins/pytorch/torch/csrc/distributed/c10d/UCCUtils.cpp
sccache: encountered fatal error
sccache: error : corrupt deflate stream
sccache:  cause: corrupt deflate stream
[7627/8176] Building CXX object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/hip/HIPSparseDescriptors.cpp.o
