I had trouble getting this to build with GCC 16.1.1; compilation fails on _mm512_maskz_loadu_ph/_mm512_mask_storeu_ph:
zendnnl/src/common/float16.hpp error: '_mm512_mask_storeu_ph' was not declared in this scope; did you mean '_mm512_mask_storeu_ps'?
I worked around this by always using the epi16 intrinsics (the path previously taken only when __GNUC__ < 14), which gets the build past this error for me:
diff --git a/zendnnl/src/common/float16.hpp b/zendnnl/src/common/float16.hpp
index f8be2ef..31c38cd 100644
--- a/zendnnl/src/common/float16.hpp
+++ b/zendnnl/src/common/float16.hpp
@@ -199,20 +199,12 @@ class float16_t {
__attribute__((always_inline, target("avx512f,avx512vl,avx512bw,avx512fp16")))
static inline __m512h f16_maskz_loadu_vec(__mmask32 k, const void *addr) {
-#if (__GNUC__ < 14)
return (__m512h)_mm512_maskz_loadu_epi16(k, addr);
-#else
- return _mm512_maskz_loadu_ph(k, addr);
-#endif
}
__attribute__((always_inline, target("avx512f,avx512vl,avx512bw,avx512fp16")))
static inline void f16_mask_storeu_vec(void *addr, __mmask32 k, __m512h val) {
-#if (__GNUC__ < 14)
_mm512_mask_storeu_epi16(addr, k, (__m512i)val);
-#else
- _mm512_mask_storeu_ph(addr, k, val);
-#endif
}
#endif // __GNUC__ >= 12
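For anyone who wants to check the epi16 path in isolation, here is a minimal sketch of the same masked load/store pattern outside ZenDNN. The names masked_copy_u16 and masked_copy_roundtrip_ok are hypothetical and not part of ZenDNN. It assumes GCC or Clang on x86-64, uses only the AVX512BW integer intrinsics (so no __m512h casts are involved), and skips the check at runtime if the CPU does not report avx512bw.

```cpp
#include <immintrin.h>
#include <cstdint>

// Hypothetical helper (not from ZenDNN): copy only the 16-bit lanes selected
// by k, using the same two intrinsics as the workaround in the diff.
__attribute__((target("avx512f,avx512bw")))
static void masked_copy_u16(uint16_t *dst, const uint16_t *src, __mmask32 k) {
    __m512i v = _mm512_maskz_loadu_epi16(k, src);  // unselected lanes load as zero
    _mm512_mask_storeu_epi16(dst, k, v);           // unselected lanes of dst stay untouched
}

// Returns true if the masked copy behaves as expected, or if the CPU lacks
// AVX512BW and the check has to be skipped.
bool masked_copy_roundtrip_ok() {
    if (!__builtin_cpu_supports("avx512bw")) {
        return true;  // cannot execute the intrinsics on this machine
    }
    uint16_t src[32];
    uint16_t dst[32];
    for (int i = 0; i < 32; ++i) {
        src[i] = static_cast<uint16_t>(0x3C00 + i);  // arbitrary 16-bit payloads
        dst[i] = 0xFFFF;                             // sentinel for untouched lanes
    }
    const __mmask32 k = 0x0000FFFFu;  // select the low 16 lanes only
    masked_copy_u16(dst, src, k);
    for (int i = 0; i < 32; ++i) {
        const uint16_t expected = (i < 16) ? src[i] : static_cast<uint16_t>(0xFFFF);
        if (dst[i] != expected) {
            return false;
        }
    }
    return true;
}
```

Calling masked_copy_roundtrip_ok() from a small main is enough to exercise it. Because the target attribute is set on the function, this should compile without passing -mavx512bw on the command line.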
~/opensrc/ZenDNN main* devoidfury@ai365
❯ gcc --version
gcc (GCC) 16.1.1 20260430
Copyright (C) 2026 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
~/opensrc/ZenDNN main* devoidfury@ai365
❯ g++ --version
g++ (GCC) 16.1.1 20260430
Copyright (C) 2026 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
~/opensrc/ZenDNN main* devoidfury@ai365
❯ cat /proc/cpuinfo | head -n 27
processor : 0
vendor_id : AuthenticAMD
cpu family : 26
model : 112
model name : AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
stepping : 0
microcode : 0xb70001e
cpu MHz : 2000.000
cache size : 1024 KB
physical id : 0
siblings : 32
core id : 0
cpu cores : 16
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpuid_fault cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx_vnni avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid bus_lock_detect movdiri movdir64b overflow_recov succor smca fsrm avx512_vp2intersect flush_l1d amd_lbr_pmc_freeze
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso spectre_v2_user vmscape
bogomips : 5990.02
TLB size : 192 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]