This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Query regarding comparison of Cascade Lake & Cooper Lake performance #4

Open
imaginary-person opened this issue Jun 15, 2021 · 0 comments

Comments

@imaginary-person

Hello @nmeisburger, @uyongw & @iitkgpanshu,

The MLSys '21 paper doesn't seem to mention how many cores (and hence threads) were used on each machine to gather the data. Based on the README in this repo, however, it seems the experiments were run with a different number of cores (and hence threads) on each of the two machines.

Besides the data reported in the paper, did you also compare performance (without BF16) on Cascade Lake and Cooper Lake using an equal number of cores on both?

I'm curious whether you observed any improvement in AVX512 performance (apart from BF16 support) on Cooper Lake over Cascade Lake. Ice Lake SP (which, like Cooper Lake, is 3rd-gen Xeon SP, but with 1 or 2 sockets and a 48 KB L1D cache on each core) reportedly has improvements pertaining to frequency (downclocking) when AVX512 instructions are used. Since GCP, AWS, and Microsoft Azure don't offer Cooper Lake, I can't gauge its performance myself.

Thank you!
