[GPU] Fixup gemm scale dims init #3566

Open
wants to merge 2 commits into base: main

Conversation

@kealan-barbieri (Contributor) commented on Jul 9, 2025

Description

  • Directly initialize memory descriptors for scales, both for use as post-ops and for checking internal gemmstone handling (a minimal sketch of the scale setup follows this list)
  • Fix handling of batch offsets for per_tensor scales in the swap_ab case
  • Fix handling of the accumulator (acc) type
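
The sketch below is a minimal, hypothetical illustration (not the PR's internal gemmstone code) of how the grouped per_tensor scales used in the repro below can be expressed through the public oneDNN API; the mask and group values are assumptions derived from --attr-scales=src:per_tensor:f16:1x32+wei:per_tensor:f16:32x1 with shapes 16x32x64:16x64x1.

#include "oneapi/dnnl/dnnl.hpp"
using namespace dnnl;

primitive_attr make_scale_attr() {
    primitive_attr attr;
    // src 16x32x64 (u8): f16 scales with group sizes 1x32 over the last two
    // dims; the mask covering those dims is an assumption for illustration.
    attr.set_scales(DNNL_ARG_SRC, /*mask=*/(1 << 1) | (1 << 2),
            /*groups=*/{1, 32}, memory::data_type::f16);
    // wei 16x64x1 (s8): f16 scales with group sizes 32x1 over the last two dims.
    attr.set_scales(DNNL_ARG_WEIGHTS, /*mask=*/(1 << 1) | (1 << 2),
            /*groups=*/{32, 1}, memory::data_type::f16);
    return attr;
}

The diff excerpt further down shows the corresponding scale memory descriptor (a_scale_md_) being initialized directly in jit_gemm_pd_t::init_attrs() via quant_entry_init, which is the direct initialization the first bullet refers to.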

Offset fix example:

 --> ../../dnn/build/tests/benchdnn/benchdnn --matmul --engine=gpu --allow-enum-tags-only=false --dt=u8:s8:f16 --stag=abc --wtag=acb --dtag=abc --attr-scales=src:per_tensor:f16:1x32+wei:per_tensor:f16:32x1 16x32x64:16x64x1
[  32][DST][1:0:0] exp_f32:          17 exp:          17 got:        -nan diff:     nan rdiff:     nan
[  33][DST][1:1:0] exp_f32:        2240 exp:        2240 got:        -nan diff:     nan rdiff:     nan
[  34][DST][1:2:0] exp_f32:         362 exp:         362 got:        -nan diff:     nan rdiff:     nan
[  35][DST][1:3:0] exp_f32:        -547 exp:        -547 got:        -nan diff:     nan rdiff:     nan
[  36][DST][1:4:0] exp_f32:        2014 exp:        2014 got:        -nan diff:     nan rdiff:     nan
[  37][DST][1:5:0] exp_f32:        1284 exp:        1284 got:        -nan diff:     nan rdiff:     nan
[  38][DST][1:6:0] exp_f32:         616 exp:         616 got:        -nan diff:     nan rdiff:     nan
[  39][DST][1:7:0] exp_f32:         872 exp:         872 got:        -nan diff:     nan rdiff:     nan
[  40][DST][1:8:0] exp_f32:         634 exp:         634 got:        -nan diff:     nan rdiff:     nan
[  41][DST][1:9:0] exp_f32:        1186 exp:        1186 got:        -nan diff:     nan rdiff:     nan
[COMPARE_STATS][DST]: trh=0 err_max_diff:     nan err_max_rdiff:     nan all_max_diff:       0 all_max_rdiff:       0
0:FAILED (errors:480 total:512) (330 ms) __REPRO: --matmul --engine=gpu --allow-enum-tags-only=false --dt=u8:s8:f16 --stag=abc --wtag=acb --dtag=abc --attr-scales=src:per_tensor:f16:1x32+wei:per_tensor:f16:32x1 16x32x64:16x64x1
===========================================================
= Failed cases summary (--summary=no-failures to disable) =
===========================================================
0:FAILED (errors:480 total:512) (330 ms) __REPRO: --matmul --engine=gpu --allow-enum-tags-only=false --dt=u8:s8:f16 --stag=abc --wtag=acb --dtag=abc --attr-scales=src:per_tensor:f16:1x32+wei:per_tensor:f16:32x1 16x32x64:16x64x1
============================
tests:1 passed:0 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:1 listed:0
total: 0.34s; create_pd: 0.00s (0%); create_prim: 0.07s (21%); fill: 0.04s (11%); execute: 0.00s (0%); compute_ref: 0.02s (6%); compare: 0.02s (7%);

 --> ./tests/benchdnn/benchdnn --matmul --engine=gpu --allow-enum-tags-only=false --dt=u8:s8:f16 --stag=abc --wtag=acb --dtag=abc --attr-scales=src:per_tensor:f16:1x32+wei:per_tensor:f16:32x1 16x32x64:16x64x1
0:PASSED (612 ms) __REPRO: --matmul --engine=gpu --allow-enum-tags-only=false --dt=u8:s8:f16 --stag=abc --wtag=acb --dtag=abc --attr-scales=src:per_tensor:f16:1x32+wei:per_tensor:f16:32x1 16x32x64:16x64x1
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 0.61s; create_pd: 0.00s (0%); create_prim: 0.07s (11%); fill: 0.00s (0%); execute: 0.00s (0%); compute_ref: 0.00s (0%); compare: 0.01s (1%);

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

@kealan-barbieri requested a review from a team as a code owner on July 9, 2025 20:59
@github-actions bot added the platform:gpu-intel label (Codeowner: @oneapi-src/onednn-gpu-intel) on Jul 9, 2025
@kealan-barbieri changed the title from "Kealanba/fixup dims init" to "[GPU] Fixup gemm scale dims init" on Jul 9, 2025
@kealan-barbieri (Contributor, Author) commented:

make test
set test_scope=NIGHTLY
disable test_device_cpu
disable benchdnn_all
enable benchdnn_matmul
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lp
enable arch_gpu_xe-lpg
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg
enable arch_gpu_xe3-lpg

@@ -249,6 +259,11 @@ void jit_gemm_pd_t::init_attrs() {
auto ndims = d->c_desc.ndims;
ao_dims_ = quant_entry_ndims(a_zps, d->b_desc, ndims - 2);
bo_dims_ = quant_entry_ndims(b_zps, d->a_desc, ndims - 1);

quant_entry_init(a_scales, d->b_desc, a_scale_md_, ndims - 2);
Review comment (Contributor):

why is it ndims -2 instead of ndims -1?

@kealan-barbieri force-pushed the kealanba/fixup_dims_init branch from 4cfb6a7 to 9e7bd3c on July 11, 2025 21:46