Description
Background
This is a follow-up to issue 2674 and shares its background, but it concerns a different test case, shown below. The case is skipped in #2168 because it fails.
test_graph_unit_dnnl_large_partition_usm_cpu(test_large_partition_execute.Int8Resnet50Stage2Block)
Summary
While analyzing the issue, I found that the failure can also be reproduced with benchdnn alone, without the graph API component.
The failing primitive is convolution, with implementation gemm_s8u8s32:ref.
See the following log:
ONEDNN_VERBOSE=1 ./tests/benchdnn/benchdnn --conv --reset --allow-enum-tags-only=0 --engine=cpu --dir=FWD_I --alg=direct --dt=u8:s8:u8 --bia-dt=f32 \
--stag=acdb --wtag=any --dtag=acdb --attr-post-ops=eltwise_relu --attr-scales=src0:common:0.5+dst:common:0.5+wei:per_oc --attr-zero-points=src0:common:1+dst:common:1 --attr-scratchpad=user \
mb1_ic8oc8_ih12oh12kh1sh1dh0ph0_iw12ow12kw1sw1dw0pw0
onednn_verbose,v1,info,oneDNN v3.8.0 (commit af1410c21a7455af587ae496c719ac7896d8ed95)
onednn_verbose,v1,info,cpu,runtime:OpenMP,nthr:4
onednn_verbose,v1,info,cpu,isa:AArch64 SVE (256 bits)
onednn_verbose,v1,info,gpu,runtime:none
onednn_verbose,v1,info,graph,backend,0:dnnl_backend
onednn_verbose,v1,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,v1,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,implementation,backend,exec_time
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:a::f0 dst:f32::blocked:a::f0,,,8,0.0109863
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:abcd::f0 dst:s8::blocked:cdba::f8:zpm1,,,8x8x1x1,0.163086
onednn_verbose,v1,primitive,exec,cpu,reorder,rnn_data_reorder,undef,src:f32::blocked:abcd::f0 dst:s8::blocked:abcd::f0,,,8x8x1x1,0.0268555
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:s8::blocked:abcd::f0 dst:s8::blocked:cdba::f8:zpm1,,,8x8x1x1,0.0200195
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:abcd::f0 dst:u8::blocked:acdb::f0,,,1x8x12x12,0.104004
onednn_verbose,v1,primitive,exec,cpu,convolution,gemm_s8u8s32:ref,forward_inference,src:u8::blocked:acdb::f0 wei:s8:a:blocked:cdba::f8:zpm1 bia:f32:a:blocked:a::f0 dst:u8::blocked:acdb::f0,attr-scratchpad:user attr-scales:src0:0:f32+dst:0:f32+wei:1:f32 attr-zero-points:src0:0:s32+dst:0:s32 attr-post-ops:eltwise_relu,alg:convolution_direct,mb1_ic8oc8_ih12oh12kh1sh1dh0ph0_iw12ow12kw1sw1dw0pw0,0.177002
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:u8::blocked:acdb::f0 dst:f32::blocked:abcd::f0,,,1x8x12x12,0.0229492
[ 0][DST][0:0:0:0] exp_f32: 14 exp: 14 got: 16 diff: 2 rdiff:0.142857
[ 1][DST][0:0:0:1] exp_f32: 21.5 exp: 22 got: 23 diff: 1 rdiff:0.0454545
[ 2][DST][0:0:0:2] exp_f32: 15.5 exp: 16 got: 17 diff: 1 rdiff: 0.0625
[ 3][DST][0:0:0:3] exp_f32: 14.75 exp: 15 got: 16 diff: 1 rdiff:0.0666667
[ 4][DST][0:0:0:4] exp_f32: 19 exp: 19 got: 20 diff: 1 rdiff:0.0526316
[ 5][DST][0:0:0:5] exp_f32: 11.75 exp: 12 got: 13 diff: 1 rdiff:0.0833333
[ 6][DST][0:0:0:6] exp_f32: 15 exp: 15 got: 16 diff: 1 rdiff:0.0666667
[ 7][DST][0:0:0:7] exp_f32: 15.5 exp: 16 got: 17 diff: 1 rdiff: 0.0625
[ 8][DST][0:0:0:8] exp_f32: 8 exp: 8 got: 10 diff: 2 rdiff: 0.25
[ 9][DST][0:0:0:9] exp_f32: 20.25 exp: 20 got: 22 diff: 2 rdiff: 0.1
[COMPARE_STATS][DST]: trh=0 err_max_diff: 32 err_max_rdiff: 32 all_max_diff: 32 all_max_rdiff: 32
0:FAILED (errors:897 total:1152) __REPRO: --conv --allow-enum-tags-only=false --dir=FWD_I --dt=u8:s8:u8 --bia-dt=f32 --stag=acdb --dtag=acdb --attr-scales=src:common:0.5+dst:common:0.5+wei:per_oc --attr-zero-points=src:common:1+dst:common:1 --attr-post-ops=relu --attr-scratchpad=user mb1ic8ih12oc8oh12kh1ph0
tests:1 passed:0 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:1 listed:0
total: 0.01s; fill: 0.00s (52%); compute_ref: 0.00s (5%); compare: 0.00s (11%);
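To see whether the mismatches follow a pattern, the per-element lines in the log above can be tallied by signed difference. The following is a minimal sketch, not part of benchdnn: the helper names `parse_mismatches` and `signed_diff_histogram` are hypothetical, and the regular expression assumes the line layout shown in this log (`[idx][DST][n:c:h:w] exp_f32: ... exp: ... got: ... diff: ... rdiff: ...`), which may differ in other benchdnn versions.

```python
import re

# Assumed layout of a benchdnn per-element mismatch line, taken from the log above:
# "[ 0][DST][0:0:0:0] exp_f32: 14 exp: 14 got: 16 diff: 2 rdiff:0.142857"
MISMATCH_RE = re.compile(
    r"\[\s*(?P<idx>\d+)\]\[(?P<kind>\w+)\]\[(?P<pos>[\d:]+)\]\s*"
    r"exp_f32:\s*(?P<exp_f32>\S+)\s+exp:\s*(?P<exp>\S+)\s+"
    r"got:\s*(?P<got>\S+)\s+diff:\s*(?P<diff>\S+)\s+rdiff:\s*(?P<rdiff>\S+)"
)


def parse_mismatches(log_text):
    """Return one dict per mismatch line; all other log lines are ignored."""
    rows = []
    for line in log_text.splitlines():
        m = MISMATCH_RE.search(line)
        if m is None:
            continue
        rows.append({
            "idx": int(m["idx"]),
            "pos": tuple(int(p) for p in m["pos"].split(":")),
            "exp_f32": float(m["exp_f32"]),
            "exp": float(m["exp"]),
            "got": float(m["got"]),
            "diff": float(m["diff"]),
        })
    return rows


def signed_diff_histogram(rows):
    """Count how often each signed difference (got - exp) occurs."""
    hist = {}
    for r in rows:
        d = r["got"] - r["exp"]
        hist[d] = hist.get(d, 0) + 1
    return hist


# Three lines copied from the log above, used as sample input.
SAMPLE = """\
[ 0][DST][0:0:0:0] exp_f32: 14 exp: 14 got: 16 diff: 2 rdiff:0.142857
[ 1][DST][0:0:0:1] exp_f32: 21.5 exp: 22 got: 23 diff: 1 rdiff:0.0454545
[ 8][DST][0:0:0:8] exp_f32: 8 exp: 8 got: 10 diff: 2 rdiff: 0.25
"""

if __name__ == "__main__":
    print(signed_diff_histogram(parse_mismatches(SAMPLE)))
```

In the ten mismatch lines printed above, `got` exceeds `exp` by 1 or 2 in every case. Running the same tally over the full output would show whether that holds for all 897 failing elements.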
Environment
- system: Linux 22.04.1-Ubuntu SMP aarch64 aarch64 aarch64 GNU/Linux
- gcc: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
- cmake: version 3.22.1
Steps to reproduce
- Build library:
1. setup ACL library
git clone --branch v24.11.1 --depth 1 https://github.com/ARM-software/ComputeLibrary.git
cd ComputeLibrary
git checkout 1f3bf6bbc4a1a57b5915fc0a19b195ae53acc06d
scons -j4 Werror=0 debug=0 neon=1 opencl=0 embed_kernels=0 os=linux arch=armv8.2-a build=native multi_isa=1 fixed_format_kernels=1 cppthreads=0 openmp=1 examples=0 validation_tests=0
2. export ACL_ROOT_DIR=/path/to/ComputeLibrary
3. build oneDNN
cmake .. -DDNNL_AARCH64_USE_ACL=ON -DONEDNN_BUILD_GRAPH=ON -DDNNL_CPU_RUNTIME=OMP -DONEDNN_WERROR=ON -DDNNL_BUILD_FOR_CI=ON -DONEDNN_TEST_SET=NIGHTLY -DCMAKE_BUILD_TYPE=Debug
make -j 4
- Run test:
ONEDNN_VERBOSE=1 ./tests/benchdnn/benchdnn --conv --reset --allow-enum-tags-only=0 --engine=cpu --dir=FWD_I --alg=direct --dt=u8:s8:u8 --bia-dt=f32 \
--stag=acdb --wtag=any --dtag=acdb --attr-post-ops=eltwise_relu --attr-scales=src0:common:0.5+dst:common:0.5+wei:per_oc --attr-zero-points=src0:common:1+dst:common:1 --attr-scratchpad=user \
mb1_ic8oc8_ih12oh12kh1sh1dh0ph0_iw12ow12kw1sw1dw0pw0