[CI] Show only ROCm failures in parity summary and add cross-arch column #3153
jithunnair-amd merged 8 commits into develop
Conversation
Only display tests where the ROCm status is FAILED (the CUDA status is shown as a context column). Add a cross-architecture lookup so each failure shows which other architectures have the same test failing.
Jenkins build for fad16aed2f09fcc9366270a28d73d228b5629220 commit finished as NOT_BUILT
Remove cuda, cuda_dist, cuda_inductor, and baseline entries from LOG_FILE_MAP since only ROCm failures are relevant to the parity report.
Jenkins build for fad16aed2f09fcc9366270a28d73d228b5629220 commit finished as FAILURE
- Restore CUDA and baseline log parsing so their failures can be cross-referenced, but keep the LOG-BASED FAILURES table's Arch column limited to ROCm entries (CUDA rows are hidden from the table itself).
- Add an "Also Failing In" column to LOG-BASED FAILURES, and include "cuda" in the FAILED TESTS "Also Failing In" column when a CUDA log failure exists for the same test tuple. This lets us spot tests failing on both platforms so we can revert upstream changes instead of filing a ROCm DISABLED issue.
- Split the single Shard column in FAILED TESTS into Shard (rocm) and Shard (cuda) so each failure can be looked up in either CI job.
- Propagate the active test-file shard to CONSISTENT_FAILURE log entries so shard info is no longer blank in the log-based failures table.
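The cross-arch lookup described above amounts to indexing every failure by its test tuple and collecting the architectures seen for each. A minimal sketch, assuming a simple dict-per-row record shape; the function names (`build_cross_arch_index`, `also_failing_in`) are hypothetical and not the actual generate_summary.py code:

```python
from collections import defaultdict

def build_cross_arch_index(failures):
    """Map each (test_file, test_class, test_name) tuple to the set of
    architectures on which that test is failing."""
    index = defaultdict(set)
    for f in failures:
        index[(f["test_file"], f["test_class"], f["test_name"])].add(f["arch"])
    return index

def also_failing_in(index, arch, key):
    """Render the 'Also Failing In' cell: every *other* arch with the same
    failing test tuple, comma-separated (empty string if none)."""
    return ", ".join(sorted(index.get(key, set()) - {arch}))
```

With this shape, including CUDA rows in the index (while hiding them from the table) is what makes "cuda" appear in the "Also Failing In" column.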
Jenkins build for 902d7cfa0b3c35044138c88548e5991bf5c43049 commit finished as ABORTED
- detect_log_failures.py now computes job-level shard totals by counting log files per (platform, test_config) and emits both job_shard (e.g. 3/6, derived from filename + file count) and test_shard (e.g. 10/15, the intra-file pytest "Running ... N/M" value) for each failure, including CONSISTENT_FAILURE entries.
- The generate_summary.py LOG-BASED FAILURES table now has separate "Job-Level Shard" and "Test-Level Shard" columns so reviewers can jump directly to the CI job and any intra-file shard.
- FAILED TESTS table columns renamed from "Shard (rocm/cuda)" to "Job-Level Shard (rocm/cuda)" for consistency with the log-based table (these values are already derived from the XML report dir name, e.g. test-default-3-6).
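The "filename + file count" derivation of job_shard might look like the following sketch. The directory layout (`<platform>/<test_config>/<name>-<k>-<n>.log`), the regex, and all function names here are assumptions for illustration, not the real detect_log_failures.py:

```python
import re
from collections import Counter
from pathlib import Path

# Assumed filename shape, e.g. 'default-3-6.log' -> shard index 3.
SHARD_RE = re.compile(r"-(\d+)-\d+\.log$")

def job_shard_totals(log_paths):
    """Count log files per (platform, test_config) so the total in 'k/N'
    comes from what was actually downloaded, not just the filename."""
    totals = Counter()
    for p in log_paths:
        platform, test_config = Path(p).parts[-3:-1]
        totals[(platform, test_config)] += 1
    return totals

def job_shard_label(path, totals):
    """Render 'k/N' for one log file: k from the filename, N from the count."""
    platform, test_config = Path(path).parts[-3:-1]
    m = SHARD_RE.search(Path(path).name)
    index = m.group(1) if m else "?"
    return f"{index}/{totals[(platform, test_config)]}"
```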
Jenkins build for 81c66e65c6790db005efda1c3918f359684ffd5b commit finished as FAILURE
When a test failure is already reported in the XML-based FAILED TESTS table, it would also appear in LOG-BASED FAILURES whenever the same shard's log contained a "failed!" or "FAILED CONSISTENTLY" line. That made the summary look like two separate failures when there was only one.

The LOG-BASED section is meant for failures *not* captured by XML (timeouts, crashes, process kills), so skip any entry whose (arch, test_file, test_class, test_name) tuple already appears in the FAILED TESTS table. Also normalize test_file before comparing, since XML uses dotted paths (e.g. distributed.test_symmetric_memory) while logs use slash paths (distributed/test_symmetric_memory, sometimes with a trailing .py).

On run 24735028060 this drops the LOG-BASED section from 21 rows to 6 truly XML-missing timeouts.
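The normalize-then-dedup step can be illustrated as below. This is a sketch with hypothetical helper names (and `str.removesuffix` requires Python 3.9+); the real logic lives in generate_summary.py:

```python
def normalize_test_file(name):
    """Make XML's dotted form and the logs' slash form compare equal:
    'distributed.test_symmetric_memory' and
    'distributed/test_symmetric_memory.py' both become
    'distributed/test_symmetric_memory'."""
    return name.removesuffix(".py").replace(".", "/")

def dedup_log_failures(log_failures, xml_failures):
    """Keep only log-based entries whose (arch, test_file, test_class,
    test_name) tuple is NOT already in the XML FAILED TESTS table."""
    xml_keys = {
        (f["arch"], normalize_test_file(f["test_file"]),
         f["test_class"], f["test_name"])
        for f in xml_failures
    }
    return [
        f for f in log_failures
        if (f["arch"], normalize_test_file(f["test_file"]),
            f["test_class"], f["test_name"]) not in xml_keys
    ]
```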
Jenkins build for 8d063809e99a31079baa98d915540d6f88df8a1b commit finished as NOT_BUILT
…shards inventory

- detect_log_failures.py now emits a sibling log_shards_<arch>.csv alongside log_failures_<arch>.csv, capturing every (platform, test_config, job_shard, test_file) -> observed test-level shards combination seen in the raw CI logs.
- generate_summary.py consumes the inventory to back-fill "Test-Level Shard (rocm)" and "Test-Level Shard (cuda)" columns in the XML-based FAILED TESTS table (XML artifacts don't contain test-level shard metadata, so we recover it by matching the job-level shard + test file to the log inventory). For intra-file-sharded test files (e.g. test_torchinductor_opinfo_properties split into 14 pytest shards), the value is rendered compactly as "1,6,12/14".
- The LOG-BASED FAILURES table already displayed the test-level shard per entry; no change there beyond the existing column.
- parity.yml: exclude log_shards_*.csv from the CSV discovery glob in the summarize step so the new inventory file isn't mistaken for a parity CSV.
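The compact "1,6,12/14" rendering amounts to joining the sorted observed shard numbers over the total. A minimal sketch, with an assumed function name:

```python
def render_test_shards(observed, total):
    """Render observed intra-file pytest shards compactly:
    {12, 1, 6} of 14 -> '1,6,12/14'."""
    return ",".join(str(s) for s in sorted(observed)) + f"/{total}"
```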
Jenkins build for 8d063809e99a31079baa98d915540d6f88df8a1b commit finished as NOT_BUILT
… in LOG-BASED FAILURES

detect_log_failures.py:
- parse_log_file now also returns a flaky_tests list. When CI's "Test succeeded in new process, continuing with the rest of the tests" marker follows an individual-test PASSED line, the corresponding test is recorded as flaky (the preceding normal-process run failed, hence the rerun).
- scan_logs emits these as structured records with platform, test_config, test_file, test_class, test_name, job_shard, and test_shard.
- A sibling flaky_tests_<arch>.csv is written next to log_failures_<arch>.csv, via the generalized _derive_sibling_path().

generate_summary.py:
- load_flaky_tests_as_log_failures() reads the flaky CSV and shapes it like log-failure rows with category='FLAKY'; main() appends these to the log_failures list.
- FLAKY entries are exempted from the XML-vs-log dedup filter in the LOG-BASED FAILURES table, since a rerun-passed signal is orthogonal to any hard failure recorded in XML.
- Cross-arch "Also Failing In" now naturally links matching flaky tests across architectures.

Verified locally on run 24735028060 artifacts: 20 flaky entries for mi200 and 9 for mi355 (exact 1:1 with "Test succeeded in new process" log lines), including tests like test_flex_attention_with_dynamic_max_autotune_graph_partition_cuda and test_template_epilogue_fusion_static_analysis_...use_async_compile_True that the dashboard owner flagged from run 24796654604.
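The marker-based detection described for parse_log_file can be sketched roughly as below. This is a deliberate simplification with a hypothetical function name; the real parser tracks more state (shards, test_config, etc.):

```python
FLAKY_MARKER = "Test succeeded in new process, continuing with the rest of the tests"

def find_flaky_tests(lines):
    """Record a test as flaky when the rerun-succeeded marker follows an
    individual-test PASSED line that carries the test id."""
    flaky, last_passed = [], None
    for line in lines:
        if " PASSED" in line and "::" in line:
            # e.g. 'inductor/test_foo.py::TestFoo::test_bar PASSED [1.2s]'
            last_passed = line.split(" PASSED")[0].split()[-1]
        elif FLAKY_MARKER in line and last_passed is not None:
            flaky.append(last_passed)
            last_passed = None
    return flaky
```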
Jenkins build for 2051d20f0c6d70d2d7b9ca0644c3a2f1a6f5d9ab commit finished as NOT_BUILT
The summarize job picks the first matching *.csv in the per-arch artifact dir, filtering out auxiliary files. Now that detect_log_failures.py also emits a sibling flaky_tests_<arch>.csv, it can be mistakenly picked up as the parity CSV (e.g. when ordering puts it first), causing generate_summary.py to crash with KeyError: 'status_set1'. Add it to the exclusion list alongside log_failures_*.csv and log_shards_*.csv.
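The fix amounts to skipping any CSV whose name matches one of the auxiliary prefixes before treating it as the parity CSV. A sketch under assumed names (the actual exclusion lives in the parity.yml summarize step's glob):

```python
from pathlib import Path

# Auxiliary CSVs that detect_log_failures.py emits next to the parity CSV.
AUX_PREFIXES = ("log_failures_", "log_shards_", "flaky_tests_")

def pick_parity_csv(artifact_dir):
    """Return the first *.csv in the per-arch artifact dir that is not an
    auxiliary file, or None if nothing matches."""
    for p in sorted(Path(artifact_dir).glob("*.csv")):
        if not p.name.startswith(AUX_PREFIXES):
            return p
    return None
```

Without the filter, sorted order would put flaky_tests_*.csv first here, reproducing the KeyError described above.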
Jenkins build for 2051d20f0c6d70d2d7b9ca0644c3a2f1a6f5d9ab commit finished as FAILURE
jithunnair-amd left a comment
These are some awesome improvements! I did cross-check, though, and the flaky-test detection was unable to catch some tests, such as the following in shard 5 for mi355:
2026-04-22T10:26:36.9749161Z inductor/test_max_autotune.py::TestMaxAutotuneAsyncPipelined::test_triton_error_precompilation_and_autotuning E0422 10:25:44.014000 1154211 site-packages/torch/_inductor/select_algorithm.py:3854] [0/0] Runtime error for autotuning triton choices, defaulting to extern kernels.
2026-04-22T10:26:36.9750048Z W0422 10:25:46.007000 1155334 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
2026-04-22T10:26:36.9750731Z /var/lib/jenkins/pytorch/test/inductor/test_max_autotune.py:123: FutureWarning: torch.cuda._set_allocator_settings is deprecated. Use torch._C._accelerator_setAllocatorSettings instead.
2026-04-22T10:26:36.9751116Z torch.cuda.memory._set_allocator_settings("expandable_segments:False")
2026-04-22T10:26:36.9751479Z E0422 10:26:02.003000 1154211 site-packages/torch/_inductor/select_algorithm.py:3854] [0/0] Runtime error for autotuning triton choices, defaulting to extern kernels.
2026-04-22T10:26:36.9751791Z PASSED [19.3006s] [100%]
2026-04-22T10:26:36.9751858Z
2026-04-22T10:26:36.9752076Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_max_autotune/inductor.test_max_autotune-93dbff3468a90d16.xml -
2026-04-22T10:26:36.9752409Z ====================== 1 passed, 276 deselected in 19.34s ======================
2026-04-22T10:26:36.9753715Z Got exit code 0
2026-04-22T10:26:36.9753859Z Test succeeded in new process, continuing with the rest of the tests
We can refine the flaky-test detection logic to be more robust.
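One possible refinement, suggested by the log above: the test id and its PASSED verdict land on separate lines there ('PASSED [19.3006s] [100%]' stands alone), so a heuristic that requires both on one line misses the pairing. Tracking the most recently seen `file.py::Class::test` token anywhere in the log survives that layout. A hedged sketch, not the actual fix:

```python
FLAKY_MARKER = "Test succeeded in new process, continuing with the rest of the tests"

def find_flaky_tests_robust(lines):
    """Pair the rerun-succeeded marker with the most recently seen test-id
    token, rather than requiring id and PASSED on the same line."""
    flaky, current = [], None
    for line in lines:
        for token in line.split():
            # A pytest node id looks like 'path/file.py::Class::test_name'.
            if "::" in token and ".py" in token:
                current = token
        if FLAKY_MARKER in line and current is not None:
            flaky.append(current)
            current = None
    return flaky
```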
Summary
Test plan
Latest run https://github.com/ROCm/pytorch/actions/runs/24798004968
Run without this PR on the same commit: https://github.com/ROCm/pytorch/actions/runs/24796654604