[Quake] Fold veq_size through cc.if to activate ForwardEmptyVeqSizePattern#4633
[Quake] Fold veq_size through cc.if to activate ForwardEmptyVeqSizePattern#46331tnguyen wants to merge 2 commits into
veq_size through cc.if to activate ForwardEmptyVeqSizePattern#4633Conversation
Fixes issue NVIDIA#4631, where PR NVIDIA#4610's Python slice lowering could leave quake.veq_size operating on a cc.if result whose else branch yielded cc.undef. Before: %v = cc.if(%cond) -> !quake.veq<?> { ... cc.continue %sub } else { %u = cc.undef !quake.veq<?>; cc.continue %u } %n = quake.veq_size %v : (!quake.veq<?>) -> i64 After: %v:2 = cc.if(%cond) -> (!quake.veq<?>, i64) { %n0 = quake.veq_size %sub; cc.continue %sub, %n0 } else { %u = cc.undef !quake.veq<?>; cc.continue %u, %c0_i64 } The branch-local undef size now canonicalizes to arith.constant 0, while the original veq result is preserved as the prefix result. Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
CI Summary (
|
| Job | Result |
|---|---|
binaries |
⏩ skipped |
build_and_test |
✅ success |
config_devdeps |
✅ success |
config_source_build |
⏩ skipped |
config_wheeldeps |
✅ success |
devdeps |
✅ success |
docker_image |
⏩ skipped |
gen_code_coverage |
⏩ skipped |
metadata |
✅ success |
python_metapackages |
⏩ skipped |
python_wheels |
⏩ skipped |
source_build |
⏩ skipped |
wheeldeps |
✅ success |
⏩ Skipped jobs (7) — intentionally skipped on PR builds; run on merge_group / workflow_dispatch
| Job |
|---|
binaries |
config_source_build |
docker_image |
gen_code_coverage |
python_metapackages |
python_wheels |
source_build |
All sub-jobs (42) — every matrix leg, with links
| Job | Status | Link |
|---|---|---|
| Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Python) | ✅ success | view |
| Build and test (amd64, llvm, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (amd64, llvm, openmpi) / Dev environment (Python) | ✅ success | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Python) | ✅ success | view |
| CI Summary | ❔ in_progress | view |
| Configure build (devdeps) | ✅ success | view |
| Configure build (source_build) | ⏩ skipped | view |
| Configure build (wheeldeps) | ✅ success | view |
| Create CUDA Quantum installer | ⏩ skipped | view |
| Create Docker images | ⏩ skipped | view |
| Create Python metapackages | ⏩ skipped | view |
| Create Python wheels | ⏩ skipped | view |
| Gen code coverage | ⏩ skipped | view |
| Load dependencies (amd64, gcc12) / Caching | ✅ success | view |
| Load dependencies (amd64, gcc12) / Finalize | ✅ success | view |
| Load dependencies (amd64, gcc12) / Metadata | ✅ success | view |
| Load dependencies (amd64, llvm) / Caching | ✅ success | view |
| Load dependencies (amd64, llvm) / Finalize | ✅ success | view |
| Load dependencies (amd64, llvm) / Metadata | ✅ success | view |
| Load dependencies (arm64, gcc12) / Caching | ✅ success | view |
| Load dependencies (arm64, gcc12) / Finalize | ✅ success | view |
| Load dependencies (arm64, gcc12) / Metadata | ✅ success | view |
| Load dependencies (arm64, llvm) / Caching | ✅ success | view |
| Load dependencies (arm64, llvm) / Finalize | ✅ success | view |
| Load dependencies (arm64, llvm) / Metadata | ✅ success | view |
| Load source build cache | ⏩ skipped | view |
| Load wheel dependencies (amd64, 12.6) / Caching | ✅ success | view |
| Load wheel dependencies (amd64, 12.6) / Finalize | ✅ success | view |
| Load wheel dependencies (amd64, 12.6) / Metadata | ✅ success | view |
| Load wheel dependencies (amd64, 13.0) / Caching | ✅ success | view |
| Load wheel dependencies (amd64, 13.0) / Finalize | ✅ success | view |
| Load wheel dependencies (amd64, 13.0) / Metadata | ✅ success | view |
| Load wheel dependencies (arm64, 12.6) / Caching | ✅ success | view |
| Load wheel dependencies (arm64, 12.6) / Finalize | ✅ success | view |
| Load wheel dependencies (arm64, 12.6) / Metadata | ✅ success | view |
| Load wheel dependencies (arm64, 13.0) / Caching | ✅ success | view |
| Load wheel dependencies (arm64, 13.0) / Finalize | ✅ success | view |
| Load wheel dependencies (arm64, 13.0) / Metadata | ✅ success | view |
| Prepare cache clean-up | ❔ in_progress | view |
| Retrieve PR info | ✅ success | view |
✅ Required checks (6/6) — declared in .github/required-checks.yml for push
| Required check | Status | Link |
|---|---|---|
| Build and test (amd64, llvm, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (amd64, llvm, openmpi) / Dev environment (Python) | ✅ success | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Python) | ✅ success | view |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Python) | ✅ success | view |
| } | ||
| }; | ||
|
|
||
| // %0 = cc.if(%cond) -> !quake.veq<?> { |
There was a problem hiding this comment.
Is the input pattern coming from an explicit place in lowering? Since this is such a special case, I wonder if we can't handle it at the source of the issue.
There was a problem hiding this comment.
Indeed, I believe it came from this if/else op:
cuda-quantum/python/cudaq/kernel/ast_bridge.py
Lines 4631 to 4635 in 2a80ac4
I also pondered the idea of changing it at the bridge side, but I couldn't think of a clean solution to support dynamical slice and be compatible with ForwardEmptyVeqSizePattern. Do you have an idea?
There was a problem hiding this comment.
Yeah, that's sort of gross. Eek!
What if the Python bridge looks at the argument its about to use to quake.veq_size, seeing if it is a cc.if and does your rewrite (as in this PR) immediately at that point?
That's sort of odd too, though. Another idea would be to put your pattern in a new Python bridge cleanup pass. I like that a bit better as we can add other "cool" patterns there to cleanup squirrelly residue from the front end. :)
There was a problem hiding this comment.
BTW, ForwardEmptyVeqSizePattern was supposed to kick in after arg synth and aggressive const prop, which would be when the if condition ought to be a constant.
Were you seeing a case where that wasn't true?
Fixes #4631, where Python slice lowering could leave
quake.veq_sizeon acc.ifresult whose else branch yieldedcc.undef.The
ForwardEmptyVeqSizePatterndid not fire because it saw thecc.ifresult, not the branch-localcc.undef.Add a canonicalization pattern that hoists
quake.veq_sizethrough thecc.ifby appending the computed size as an extracc.ifresult. This exposes the else-branchquake.veq_size(cc.undef)to the existing empty-veq fold, producing0for the empty slice case.Re-enable the previously disabled
cudaq.runslice tests.Note: I reproduced the segfault on an ARM machine. The issue appears target-sensitive: x86 happened to tolerate the
undefvalue, while ARM did not.