This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
Hello @nmeisburger, @uyongw & @iitkgpanshu,
The MLSys '21 paper doesn't seem to mention how many cores (and hence threads) were used on each machine to gather data. Based on the README file in this repo, though, it seems the experiments were run with a different number of cores (and hence threads) on each of the two machines.
Besides the data reported in the paper, did you also compare performance (without BF16) on Cascade Lake and Cooper Lake using an equal number of cores on both?
I'm curious whether you observed any improvement in AVX512 performance (apart from BF16 support) on Cooper Lake over Cascade Lake. Ice Lake SP (like Cooper Lake, it's 3rd gen Xeon SP, but for 1 or 2 sockets and with a 48 KB L1D cache per core) reportedly has improvements pertaining to frequency (downclocking) when AVX512 instructions are used. Since GCP, AWS, and Microsoft Azure don't offer Cooper Lake, I can't gauge its performance myself.
Thank you!