Skip to content

[Quake] Fold veq_size through cc.if to activate ForwardEmptyVeqSizePattern#4633

Open
1tnguyen wants to merge 2 commits into
NVIDIA:mainfrom
1tnguyen:tnguyen/4631
Open

[Quake] Fold veq_size through cc.if to activate ForwardEmptyVeqSizePattern#4633
1tnguyen wants to merge 2 commits into
NVIDIA:mainfrom
1tnguyen:tnguyen/4631

Conversation

@1tnguyen
Copy link
Copy Markdown
Collaborator

Fixes #4631, where Python slice lowering could leave quake.veq_size on a cc.if result whose else branch yielded cc.undef.

The ForwardEmptyVeqSizePattern did not fire because it saw the cc.if result, not the branch-local cc.undef.

  • Add a canonicalization pattern that hoists quake.veq_size through the cc.if by appending the computed size as an extra cc.if result. This exposes the else-branch quake.veq_size(cc.undef) to the existing empty-veq fold, producing 0 for the empty slice case.

  • Re-enable the previously disabled cudaq.run slice tests.

Note: I reproduced the segfault on an ARM machine. The issue appears target-sensitive: x86 happened to tolerate the undef value, while ARM did not.

Fixes issue NVIDIA#4631, where PR NVIDIA#4610's Python slice lowering could leave quake.veq_size operating on a cc.if result whose else branch yielded cc.undef.

Before:

%v = cc.if(%cond) -> !quake.veq<?> { ... cc.continue %sub } else { %u = cc.undef !quake.veq<?>; cc.continue %u }

%n = quake.veq_size %v : (!quake.veq<?>) -> i64

After:

%v:2 = cc.if(%cond) -> (!quake.veq<?>, i64) { %n0 = quake.veq_size %sub; cc.continue %sub, %n0 } else { %u = cc.undef !quake.veq<?>; cc.continue %u, %c0_i64 }

The branch-local undef size now canonicalizes to arith.constant 0, while the original veq result is preserved as the prefix result.

Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
@1tnguyen 1tnguyen requested a review from schweitzpgi May 29, 2026 07:17
@1tnguyen 1tnguyen marked this pull request as ready for review May 29, 2026 07:17
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 29, 2026

CI Summary (push) — ✅ passed

Run #26625742638 · ✅ 6 · ⏩ 7 · ❌ 0 · ⛔ 0

Top-level jobs (13)
Job Result
binaries ⏩ skipped
build_and_test ✅ success
config_devdeps ✅ success
config_source_build ⏩ skipped
config_wheeldeps ✅ success
devdeps ✅ success
docker_image ⏩ skipped
gen_code_coverage ⏩ skipped
metadata ✅ success
python_metapackages ⏩ skipped
python_wheels ⏩ skipped
source_build ⏩ skipped
wheeldeps ✅ success
⏩ Skipped jobs (7) — intentionally skipped on PR builds; run on merge_group / workflow_dispatch
Job
binaries
config_source_build
docker_image
gen_code_coverage
python_metapackages
python_wheels
source_build
All sub-jobs (42) — every matrix leg, with links
Job Status Link
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) ✅ success view
Build and test (amd64, gcc12, openmpi) / Dev environment (Python) ✅ success view
Build and test (amd64, llvm, openmpi) / Dev environment (Debug) ✅ success view
Build and test (amd64, llvm, openmpi) / Dev environment (Python) ✅ success view
Build and test (arm64, llvm, openmpi) / Dev environment (Debug) ✅ success view
Build and test (arm64, llvm, openmpi) / Dev environment (Python) ✅ success view
CI Summary ❔ in_progress view
Configure build (devdeps) ✅ success view
Configure build (source_build) ⏩ skipped view
Configure build (wheeldeps) ✅ success view
Create CUDA Quantum installer ⏩ skipped view
Create Docker images ⏩ skipped view
Create Python metapackages ⏩ skipped view
Create Python wheels ⏩ skipped view
Gen code coverage ⏩ skipped view
Load dependencies (amd64, gcc12) / Caching ✅ success view
Load dependencies (amd64, gcc12) / Finalize ✅ success view
Load dependencies (amd64, gcc12) / Metadata ✅ success view
Load dependencies (amd64, llvm) / Caching ✅ success view
Load dependencies (amd64, llvm) / Finalize ✅ success view
Load dependencies (amd64, llvm) / Metadata ✅ success view
Load dependencies (arm64, gcc12) / Caching ✅ success view
Load dependencies (arm64, gcc12) / Finalize ✅ success view
Load dependencies (arm64, gcc12) / Metadata ✅ success view
Load dependencies (arm64, llvm) / Caching ✅ success view
Load dependencies (arm64, llvm) / Finalize ✅ success view
Load dependencies (arm64, llvm) / Metadata ✅ success view
Load source build cache ⏩ skipped view
Load wheel dependencies (amd64, 12.6) / Caching ✅ success view
Load wheel dependencies (amd64, 12.6) / Finalize ✅ success view
Load wheel dependencies (amd64, 12.6) / Metadata ✅ success view
Load wheel dependencies (amd64, 13.0) / Caching ✅ success view
Load wheel dependencies (amd64, 13.0) / Finalize ✅ success view
Load wheel dependencies (amd64, 13.0) / Metadata ✅ success view
Load wheel dependencies (arm64, 12.6) / Caching ✅ success view
Load wheel dependencies (arm64, 12.6) / Finalize ✅ success view
Load wheel dependencies (arm64, 12.6) / Metadata ✅ success view
Load wheel dependencies (arm64, 13.0) / Caching ✅ success view
Load wheel dependencies (arm64, 13.0) / Finalize ✅ success view
Load wheel dependencies (arm64, 13.0) / Metadata ✅ success view
Prepare cache clean-up ❔ in_progress view
Retrieve PR info ✅ success view
✅ Required checks (6/6) — declared in .github/required-checks.yml for push
Required check Status Link
Build and test (amd64, llvm, openmpi) / Dev environment (Debug) ✅ success view
Build and test (amd64, llvm, openmpi) / Dev environment (Python) ✅ success view
Build and test (arm64, llvm, openmpi) / Dev environment (Debug) ✅ success view
Build and test (arm64, llvm, openmpi) / Dev environment (Python) ✅ success view
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) ✅ success view
Build and test (amd64, gcc12, openmpi) / Dev environment (Python) ✅ success view

}
};

// %0 = cc.if(%cond) -> !quake.veq<?> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the input pattern coming from an explicit place in lowering? Since this is such a special case, I wonder if we can't handle it at the source of the issue.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I believe it came from this if/else op:

elseBlock = Block.create_at_start(ifOp.elseRegion, [])
with InsertionPoint(elseBlock):
subv = cc.UndefOp(self.getVeqType())
cc.ContinueOp([subv.result])
self.pushValue(ifOp.result)

I also pondered the idea of changing it at the bridge side, but I couldn't think of a clean solution to support dynamical slice and be compatible with ForwardEmptyVeqSizePattern. Do you have an idea?

Copy link
Copy Markdown
Collaborator

@schweitzpgi schweitzpgi May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's sort of gross. Eek!

What if the Python bridge looks at the argument its about to use to quake.veq_size, seeing if it is a cc.if and does your rewrite (as in this PR) immediately at that point?

That's sort of odd too, though. Another idea would be to put your pattern in a new Python bridge cleanup pass. I like that a bit better as we can add other "cool" patterns there to cleanup squirrelly residue from the front end. :)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, ForwardEmptyVeqSizePattern was supposed to kick in after arg synth and aggressive const prop, which would be when the if condition ought to be a constant.

Were you seeing a case where that wasn't true?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Mystery crash with cudaq::run in CI

2 participants