[AMDGPU] comgr: read SGPR count from msgpack metadata instead of reserved KD field#2873
Merged
chinmaydd merged 2 commits intoJun 15, 2026
Conversation
jmmartinez
approved these changes
Jun 12, 2026
jmmartinez
left a comment
There was a problem hiding this comment.
The getKernelSgprCount related changes look good to me. Thanks!
10 tasks
|
@suryajasper could you take a look at the merge conflicts here ? Thanks ! This one should be good to go. |
…rved KD field On GFX10+ GRANULATED_WAVEFRONT_SGPR_COUNT in the kernel descriptor is architecturally reserved (must be zero). The scratch SGPR allocator was reading this field to determine the allocation base, which is unreliable. Replace getKernelSgprCount() with a metadata-first approach: parse the NT_AMDGPU_METADATA msgpack note and read .sgpr_count from the amdhsa.kernels array. Fall back to the KD field (with the correct encoding granule of 8) for minimal ELFs that lack a metadata note. Also remove the SGPR write path from updateKernelDescriptor() since bumping the reserved field has no effect on hardware, and remove SgprGranuleSize from RewriteConfig since the metadata provides the raw count directly.
98519e0 to
a985d11
Compare
Member
Author
|
Rebased and addressed merge conflicts. Also fixed |
GFX1250 has Feature1024AddressableVGPRs. With wave32, getAddressableNumVGPRs returns 1024, not 256. The incorrect limit artificially constrained the VgprAllocator's headroom.
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
On GFX10+,
GRANULATED_WAVEFRONT_SGPR_COUNTin the kernel descriptor is architecturally reserved and must be zero — the hardware ignores it. The scratch SGPR allocator introduced in #2328 was reading this field to determine where to start allocating scratch SGPRs, which is unreliable: a code object that correctly zeroes the field would cause the allocator to start at s8, potentially clobbering live registers.This PR replaces the KD-field approach with a metadata-first strategy: read
.sgpr_countfrom theamdhsa.kernelsmsgpack metadata note (NT_AMDGPU_METADATA), falling back to the KD field for minimal ELFs that lack a metadata note (e.g. lit test objects assembled with-nostdlib).Also fixes
Gfx1250MaxVgprsfrom 256 to 1024 — GFX1250 hasFeature1024AddressableVGPRsand with wave32,getAddressableNumVGPRsreturns 1024.Follow-up to #2328 (specifically @jmmartinez's review comment noting the SGPR field is reserved on GFX10+, and comment noting gfx1250 has 1024 VGPRs).
Implementation Details
comgr-hotswap-elf.cpp—getKernelSgprCount()rewritten:PT_NOTEprogram headers → findNT_AMDGPU_METADATAnote → parse msgpack blob viallvm::msgpack::Document::readFromBlob()→ walkamdhsa.kernelsarray → match.nameto the kernel → return.sgpr_count.GRANULATED_WAVEFRONT_SGPR_COUNTfrom the kernel descriptor with the correct encoding granule (8), matchinggetSGPREncodingGranule()inAMDGPUBaseInfo.cpp.std::nullopton failure; the caller falls back toMaxSgprs(106), which is conservative (no headroom → patch refused rather than silent clobber).comgr-hotswap-elf.cpp—updateKernelDescriptor()simplified:ExtraSgprsandSgprGranuleSizeparameters.comgr-hotswap-internal.h:getKernelSgprCount()signature: removedSgprGranuleSizeparameter (metadata provides the raw count directly).updateKernelDescriptor()signature: removedExtraSgprsandSgprGranuleSize.RewriteConfig: removedSgprGranuleSizefield.comgr-hotswap-patch-f32-to-e5m3.cpp:allocateScratch(): updated call togetKernelSgprCount()(no granule argument).comgr-hotswap-b0a0.cpp:Gfx1250SgprGranuleSizeconstant.Gfx1250MaxVgprsfrom 256 to 1024 (Feature1024AddressableVGPRs+ wave32).updateKernelDescriptor()call site to match new signature.No build system changes —
LLVMBinaryFormat(which includes msgpack) is already linked by COMGR.Test Coverage
s_mov_b32without specifying SGPR numbers, so they are agnostic to which specific SGPRs the allocator selects.-nostdlib, which produces ELFs without metadata notes). The metadata path will be exercised by production code objects compiled withhipcc/clangwhich always includeNT_AMDGPU_METADATA.