Hello, I'm trying to run cblas_sbgemm on WSL & AMD CPU but find it extremely slow, e.g. 30x slower than cblas_sgemm. Anyone knows how to debug and solve this issue?
Related issue (run on Anaconda Prompt): #4672
System info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 5 7640HS w/ Radeon 760M Graphics
CPU family: 25
Model: 116
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 1
BogoMIPS: 8583.32
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse ss
e2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nons
top_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx
f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch o
svw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid
avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl
xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr arat npt nrip_save tsc_scale vmcb_
clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload avx512vbmi umip avx512_
vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid
Virtualization features:
Virtualization: AMD-V
Hypervisor vendor: Microsoft
Virtualization type: full
Caches (sum of all):
L1d: 192 KiB (6 instances)
L1i: 192 KiB (6 instances)
L2: 6 MiB (6 instances)
L3: 16 MiB (1 instance)
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec rstack overflow: Mitigation; safe RET, no microcode
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS
Not affected
Srbds: Not affected
Tsx async abort: Not affected
Build command:
cmake -DBUILD_BFLOAT16=ON -DBUILD_WITHOUT_LAPACK=yes -DNOFORTRAN=1 ..
make -j16
make install
Hello, I'm trying to run
cblas_sbgemmon WSL & AMD CPU but find it extremely slow, e.g. 30x slower thancblas_sgemm. Anyone knows how to debug and solve this issue?Related issue (run on Anaconda Prompt): #4672
System info:
Build command: