M128-28 not getting better scores than M96-28 #11
Using RAM provided by Ampere (thanks!), I've upgraded the system to 384 GB of RAM:
For tinymembench results, see geerlingguy/sbc-reviews#19 (comment). And running again:
I still think something is bottlenecking the CPU. Maximum SoC temperature was around 70°C, and max CPU power was around 144W. It should have more headroom, so I'm wondering if something on the system or in the BIOS might be limiting power.
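One rough way to see whether cores are being held back is to tally the current frequency of every core from sysfs (the same scaling_cur_freq files checked in the issue description). The Python sketch below assumes the standard Linux cpufreq sysfs layout, where values are reported in kHz (so 2800000 means 2.8 GHz); the helper names are hypothetical. Cores sitting below their rated clock under load are only a hint, not proof, of a power or thermal limit.

```python
import glob
import re
from collections import Counter


def tally_freqs(readings):
    """Count how many CPUs report each frequency (values in kHz, as cpufreq reports them)."""
    return Counter(readings.values())


def read_cur_freqs(pattern="/sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq"):
    """Read scaling_cur_freq for every CPU matching the glob; returns {cpu_name: kHz}."""
    readings = {}
    for path in glob.glob(pattern):
        match = re.search(r"/(cpu\d+)/cpufreq/", path)
        if match is None:
            continue  # skip non-CPU entries such as cpuidle/ or cpufreq/
        try:
            with open(path) as f:
                readings[match.group(1)] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric value: ignore this CPU
    return readings


if __name__ == "__main__":
    readings = read_cur_freqs()
    if not readings:
        print("No cpufreq data found (is cpufreq exposed on this system?)")
    # Highest frequency first, e.g. "118 cores at 2.80 GHz"
    for khz, count in sorted(tally_freqs(readings).items(), reverse=True):
        print(f"{count:4d} cores at {khz / 1e6:.2f} GHz")
```

Run it a few times during an HPL run; if most cores report the rated clock and only a handful move, the cores themselves are probably not being throttled.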
Running a
And the NUMA configuration is set to Monolithic:
Given your stream results and the fact that this is a 6-memory-channel platform, you might be hitting the limits of memory bandwidth, Jeff. If that is the case, the extra cores will not help, since both processors use DDR4-3200. A quick way to find your memory bandwidth usage while running HPL is to use the PMUs. The following command reports the number of memory requests every second:
perf stat -e hnf_mc_reqs -I 1000
Multiply each count by 64 (the cache line size) to get bytes/second.
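The count-to-bandwidth conversion above can be scripted. The Python sketch below assumes perf stat's default human-readable interval output (a timestamp, a comma-grouped count, then the event name on each line) from the command above with a 1000 ms interval. The parsing and function names are illustrative assumptions, not part of perf, and the sample line in the docstring is hypothetical.

```python
CACHE_LINE_BYTES = 64  # cache line size used in the conversion above


def count_to_gb_per_s(count, interval_s=1.0, line_bytes=CACHE_LINE_BYTES):
    """Convert a memory-request count for one interval into decimal GB/s."""
    return count * line_bytes / interval_s / 1e9


def parse_interval_line(line):
    """Parse one line of `perf stat -I` default output, e.g. (hypothetical):
    '     1.001082245      2,718,404,089      hnf_mc_reqs'
    Returns (timestamp_s, count), or None for headers and '<not counted>' lines.
    Assumes the default (non -x/CSV) output layout."""
    fields = line.split()
    if len(fields) < 3 or fields[0].startswith("#"):
        return None
    try:
        return float(fields[0]), int(fields[1].replace(",", ""))
    except ValueError:
        return None
```

For example, count_to_gb_per_s(2_718_404_089) gives about 174.0 GB/s, matching the hand calculation in the reply, and the 2.0-2.4 billion range works out to roughly 128-154 GB/s. Note that perf stat writes its counts to stderr, so redirect it (2>&1) when piping into a script.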
@naren-ampere When I started the HPL run, the count quickly went from around 300,000 to 2.6 billion and seemed to top out around there:
2,718,404,089 * 64 = 173,977,861,696 bytes/sec? (Is that correct?) As the test continued, the counts hovered between 2.0 and 2.4 billion (down from the max of around 2.7 billion). While monitoring with
Looking at AnandTech's article, it seems the 128-core CPU can max out around 175 GB/s at lower thread counts, but dips to the 140s-150s GB/s as you hit 120+ threads. 150-160 GB/sec seems to be in that range, at least? (Or am I interpreting it incorrectly?)
@geerlingguy, yes, the math is correct. I'm impressed you're getting 174 GB/sec with your config.
Those numbers came from the M128-28 part. I haven't tested with perf on the M96-28 part yet. I'm currently a bit space-constrained, but I'm going to try to set up my Dev Kit with the M96-28 CPU so I can run both for comparison without doing a full CPU swap each time :)
I'm also going to test with a Q64-22 to see how things scale (I'm presuming it will not be memory-bound). See: geerlingguy/top500-benchmark#19
That CPU has similar efficiency and scores about half as much as the 128-core part, so I'm going to count it as a win and say memory is the bottleneck for getting the maximum possible scores. Not that the 6-channel board is a slouch; we would just need a server-grade motherboard to get any higher.
Hi Jeff! Great post and video on YT. Would you mind sharing the model number of the RAM that you used?
Samsung DDR4-3200 ECC RAM. Specifically, I have about 12 of these sticks now, as I've ordered a few sets for testing.
I've bumped Ps/Qs and tweaked Ns a bit, but for some reason I can't get the M128-28 CPU I swapped into my system to score any higher than my M96-28 CPU...
For the 96-core CPU, see: #10
I have kept everything else in the system identical (same Samsung 96 GB RAM, no additional PCIe cards, same USB network adapter plugged in, using VGA monitor output), but with the M128-28 CPU, I changed HPL.dat to use:
That resulted in 1118.5 Gflops, nearly identical to the 96-core result. I installed lm-sensors and ran watch sensors, and the SoC temp never rose above 65-67°C. CPU power hovered around 125-130W and never went any higher. Checking cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq, I found almost all cores locked at 2800000, but maybe 10-15 would go up and down.

Here's the full HPL result. I also tried 105000 for N, with almost identical results.

Is there something I could be missing? Do I need to change anything else in the BIOS or on the ADLINK COM-HPC carrier to unlock the additional performance / power? I believe the chip should go up to 170W, or maybe even a little more at full blast...
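For the Ns and Ps/Qs tuning mentioned above, a commonly cited rule of thumb is to size N so the N×N matrix of 8-byte doubles fills roughly 80-90% of RAM, rounded down to a multiple of the block size NB, and to pick a process grid P×Q equal to the number of MPI ranks, with P ≤ Q and as close to square as possible. The Python sketch below applies those heuristics. The 0.85 memory fraction, NB = 256, and one rank per core are assumptions for illustration; the actual HPL.dat values aren't shown here.

```python
import math

BYTES_PER_DOUBLE = 8


def matrix_bytes(n):
    """Memory used by HPL's N x N double-precision matrix."""
    return n * n * BYTES_PER_DOUBLE


def suggest_n(mem_bytes, fraction=0.85, nb=256):
    """Largest N (a multiple of nb) whose matrix fits in `fraction` of mem_bytes.
    fraction and nb are illustrative assumptions, not values from this issue."""
    n = int(math.sqrt(mem_bytes * fraction / BYTES_PER_DOUBLE))
    return (n // nb) * nb


def suggest_grid(ranks):
    """Factor ranks into P x Q with P <= Q and P as close to sqrt(ranks) as possible."""
    if ranks < 1:
        raise ValueError("ranks must be >= 1")
    p = math.isqrt(ranks)
    while ranks % p != 0:
        p -= 1
    return p, ranks // p
```

With these helpers, N = 105000 needs 105000² × 8 = 88.2 GB (about 82 GiB) of the 96 GB installed, and a 128-rank run maps to an 8 × 16 grid, versus 8 × 12 for 96 ranks. If the run is memory-bandwidth-bound, as suggested in the comments, neither knob is likely to change the result much.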