[GPU] Fixup gemm scale dims init #3566

Open
wants to merge 2 commits into base: main

Conversation

@kealan-barbieri (Contributor) commented on Jul 9, 2025

Description

  • Directly initialize memory descriptors for scales, both for use as post-ops and for checking internal gemmstone handling (a minimal sketch of the scale setup follows this list)
  • Fix handling of batch offsets for per_tensor scales in the swap_ab case
  • Fix handling of the accumulator (acc) type
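
The sketch below is a minimal, hypothetical illustration (not the PR's internal gemmstone code) of how the grouped per_tensor scales used in the repro below can be expressed through the public oneDNN API; the mask and group values are assumptions derived from --attr-scales=src:per_tensor:f16:1x32+wei:per_tensor:f16:32x1 with shapes 16x32x64:16x64x1.

#include "oneapi/dnnl/dnnl.hpp"
using namespace dnnl;

primitive_attr make_scale_attr() {
    primitive_attr attr;
    // src 16x32x64 (u8): f16 scales with group sizes 1x32 over the last two
    // dims; the mask covering those dims is an assumption for illustration.
    attr.set_scales(DNNL_ARG_SRC, /*mask=*/(1 << 1) | (1 << 2),
            /*groups=*/{1, 32}, memory::data_type::f16);
    // wei 16x64x1 (s8): f16 scales with group sizes 32x1 over the last two dims.
    attr.set_scales(DNNL_ARG_WEIGHTS, /*mask=*/(1 << 1) | (1 << 2),
            /*groups=*/{32, 1}, memory::data_type::f16);
    return attr;
}

The diff excerpt further down shows the corresponding scale memory descriptor (a_scale_md_) being initialized directly in jit_gemm_pd_t::init_attrs() via quant_entry_init, which is the direct initialization the first bullet refers to.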

Offset fix example:

 --> ../../dnn/build/tests/benchdnn/benchdnn --matmul --engine=gpu --allow-enum-tags-only=false --dt=u8:s8:f16 --stag=abc --wtag=acb --dtag=abc --attr-scales=src:per_tensor:f16:1x32+wei:per_tensor:f16:32x1 16x32x64:16x64x1
[  32][DST][1:0:0] exp_f32:          17 exp:          17 got:        -nan diff:     nan rdiff:     nan
[  33][DST][1:1:0] exp_f32:        2240 exp:        2240 got:        -nan diff:     nan rdiff:     nan
[  34][DST][1:2:0] exp_f32:         362 exp:         362 got:        -nan diff:     nan rdiff:     nan
[  35][DST][1:3:0] exp_f32:        -547 exp:        -547 got:        -nan diff:     nan rdiff:     nan
[  36][DST][1:4:0] exp_f32:        2014 exp:        2014 got:        -nan diff:     nan rdiff:     nan
[  37][DST][1:5:0] exp_f32:        1284 exp:        1284 got:        -nan diff:     nan rdiff:     nan
[  38][DST][1:6:0] exp_f32:         616 exp:         616 got:        -nan diff:     nan rdiff:     nan
[  39][DST][1:7:0] exp_f32:         872 exp:         872 got:        -nan diff:     nan rdiff:     nan
[  40][DST][1:8:0] exp_f32:         634 exp:         634 got:        -nan diff:     nan rdiff:     nan
[  41][DST][1:9:0] exp_f32:        1186 exp:        1186 got:        -nan diff:     nan rdiff:     nan
[COMPARE_STATS][DST]: trh=0 err_max_diff:     nan err_max_rdiff:     nan all_max_diff:       0 all_max_rdiff:       0
0:FAILED (errors:480 total:512) (330 ms) __REPRO: --matmul --engine=gpu --allow-enum-tags-only=false --dt=u8:s8:f16 --stag=abc --wtag=acb --dtag=abc --attr-scales=src:per_tensor:f16:1x32+wei:per_tensor:f16:32x1 16x32x64:16x64x1
===========================================================
= Failed cases summary (--summary=no-failures to disable) =
===========================================================
0:FAILED (errors:480 total:512) (330 ms) __REPRO: --matmul --engine=gpu --allow-enum-tags-only=false --dt=u8:s8:f16 --stag=abc --wtag=acb --dtag=abc --attr-scales=src:per_tensor:f16:1x32+wei:per_tensor:f16:32x1 16x32x64:16x64x1
============================
tests:1 passed:0 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:1 listed:0
total: 0.34s; create_pd: 0.00s (0%); create_prim: 0.07s (21%); fill: 0.04s (11%); execute: 0.00s (0%); compute_ref: 0.02s (6%); compare: 0.02s (7%);

 --> ./tests/benchdnn/benchdnn --matmul --engine=gpu --allow-enum-tags-only=false --dt=u8:s8:f16 --stag=abc --wtag=acb --dtag=abc --attr-scales=src:per_tensor:f16:1x32+wei:per_tensor:f16:32x1 16x32x64:16x64x1
0:PASSED (612 ms) __REPRO: --matmul --engine=gpu --allow-enum-tags-only=false --dt=u8:s8:f16 --stag=abc --wtag=acb --dtag=abc --attr-scales=src:per_tensor:f16:1x32+wei:per_tensor:f16:32x1 16x32x64:16x64x1
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 0.61s; create_pd: 0.00s (0%); create_prim: 0.07s (11%); fill: 0.00s (0%); execute: 0.00s (0%); compute_ref: 0.00s (0%); compare: 0.01s (1%);

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

@kealan-barbieri requested a review from a team as a code owner on July 9, 2025 20:59
@github-actions bot added the platform:gpu-intel label (Codeowner: @oneapi-src/onednn-gpu-intel) on Jul 9, 2025
@kealan-barbieri changed the title from "Kealanba/fixup dims init" to "[GPU] Fixup gemm scale dims init" on Jul 9, 2025
@kealan-barbieri (Contributor, Author) commented:

make test
set test_scope=NIGHTLY
disable test_device_cpu
disable benchdnn_all
enable benchdnn_matmul
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lp
enable arch_gpu_xe-lpg
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg
enable arch_gpu_xe3-lpg

@@ -249,6 +259,11 @@ void jit_gemm_pd_t::init_attrs() {
auto ndims = d->c_desc.ndims;
ao_dims_ = quant_entry_ndims(a_zps, d->b_desc, ndims - 2);
bo_dims_ = quant_entry_ndims(b_zps, d->a_desc, ndims - 1);

quant_entry_init(a_scales, d->b_desc, a_scale_md_, ndims - 2);
Review comment (Contributor):

why is it ndims -2 instead of ndims -1?

@kealan-barbieri force-pushed the kealanba/fixup_dims_init branch from 4cfb6a7 to 9e7bd3c on July 11, 2025 21:46