[BUG] Core-CCD pairing is wrong for 3950x #148

Closed
Kodikuu opened this issue Oct 17, 2020 · 18 comments
@Kodikuu

Kodikuu commented Oct 17, 2020

Describe the bug

On the 3950X, CCD1 holds logical CPUs 0-7 and 16-23, and CCD2 holds 8-15 and 24-31.

bpytop treats them as 0-15 and 16-31.

As a result, the two CCD temperatures are swapped for CPUs 8-23.

To Reproduce

Have a 3950x

Expected behavior

C1-C8 should use Tccd1
C9-C16 should use Tccd2
C17-C24 should use Tccd1
C25-C32 should use Tccd2
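
The expected pairing can be written as a small function. This is only a sketch under the layout reported here (16 physical cores, 2 CCDs, SMT siblings numbered after all first threads); `expected_ccd` is a hypothetical helper, not bpytop code.

```python
def expected_ccd(cpu: int, physical_cores: int = 16, ccds: int = 2) -> int:
    """Return the 1-based CCD number for a 0-based logical CPU.

    Assumption: SMT siblings are numbered after all first threads,
    so cpu N and cpu N + physical_cores share one physical core.
    """
    core = cpu % physical_cores            # fold the SMT sibling onto its core
    cores_per_ccd = physical_cores // ccds
    return core // cores_per_ccd + 1


# Print the pairing for the first CPU of each block of eight.
for first in (0, 8, 16, 24):
    print(f"C{first + 1}-C{first + 8}: Tccd{expected_ccd(first)}")
```

With the defaults this prints Tccd1, Tccd2, Tccd1, Tccd2 for the four blocks, matching the list above.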

Screenshots

Cores marked yellow and pink need their temperatures flipped
image

CCD grouping; Orange is CCD1, Yellow is CCD2
image

Info (please complete the following information):

  • bpytop version: 1.0.42
  • psutil version: 5.7.0
  • (Linux) Linux distribution and version: Opensuse Tumbleweed
  • Terminal used: Windows Terminal (VM), Tilda
  • Font used: Cascadia Code PL
  • Python version: 3.8.5

Additional context

contents of ~/.config/bpytop/error.log

17/10/20 (16:15:07) | INFO: New instance of bpytop version 1.0.42 started with pid 14860
17/10/20 (16:15:07) | INFO: Loglevel set to DEBUG
17/10/20 (16:15:07) | DEBUG: Using psutil version 5.7.0
17/10/20 (16:15:07) | DEBUG: CMD: /usr/local/bin/bpytop --debug
17/10/20 (16:15:08) | DEBUG: Collect and draw completed in 0.064445 seconds
17/10/20 (16:15:09) | DEBUG: Init completed in 2.309829 seconds
17/10/20 (16:15:12) | INFO: Exiting. Runtime 0:00:06

Kodikuu added the bug label Oct 17, 2020
Kodikuu changed the title from "[BUG] CPU-Temperature pairing is wrong for 3950x" to "[BUG] Core-CCD pairing is wrong for 3950x" Oct 17, 2020
@aristocratos
Owner

@Kodikuu
The ordering was based on discussion in aristocratos/bashtop#48
I don't have a Ryzen CPU myself, so I can only base it on other people's experience.
Do you have any documentation or screenshots from other software to verify that your ordering is correct?

@Kodikuu
Author

Kodikuu commented Oct 17, 2020

The ordering appears correct for pre-ComboPi AGESA bios, but with ComboPi it matches Intel's layout;
https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#XML_examples

The ComboPi AGESA was released in March of 2019, bringing Zen2 support.

The link above does agree with my personal experience with the 3950x. I have a somewhat significant CCD temperature gap, as Windows runs on one CCD, and Linux on the other.

@aristocratos
Owner

@Kodikuu
Would need a reliable way for python to distinguish between the two then.
Is "Zen 1/2" mentioned in /proc/cpuinfo?
If that's the case, I could add a check and use the newer ordering if the Zen architecture is detected and the version isn't 1.
Or is Zen 1 ordered this way also if the newer bios is used?

@Kodikuu
Author

Kodikuu commented Oct 19, 2020

I believe all are ordered this way if the newer bios is used. "Zen 1/2" is not mentioned in /proc/cpuinfo.

Perhaps a config toggle?

@Kodikuu
Author

Kodikuu commented Oct 19, 2020

However, /proc/cpuinfo also shows the correct mapping of processor to core id, perhaps you could use that;
image
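
Reading that mapping from Python could look roughly like this. It is a sketch that only relies on the `processor` and `core id` fields of /proc/cpuinfo; `parse_core_ids` is a hypothetical helper, and the sample text is a made-up four-CPU excerpt, not output from the 3950x.

```python
from typing import Dict


def parse_core_ids(cpuinfo_text: str) -> Dict[int, int]:
    """Map each 'processor' number to its 'core id' from /proc/cpuinfo text."""
    mapping: Dict[int, int] = {}
    processor = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            processor = int(value)
        elif key == "core id" and processor is not None:
            mapping[processor] = int(value)
    return mapping


# Made-up excerpt: 2 cores with SMT, siblings numbered after the first threads.
sample = """\
processor\t: 0
core id\t\t: 0

processor\t: 1
core id\t\t: 1

processor\t: 2
core id\t\t: 0

processor\t: 3
core id\t\t: 1
"""
print(parse_core_ids(sample))
```

On a real system the text would come from `open("/proc/cpuinfo").read()`.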

@Kodikuu
Author

Kodikuu commented Oct 19, 2020

And just to be certain; Stressing the windows side with htop and bpytop up
image

@aristocratos
Owner

aristocratos commented Oct 19, 2020

@Kodikuu

However, /proc/cpuinfo also shows the correct mapping of processor to core id, perhaps you could use that;

Don't know if that's much help; the mapping looks the same for Intel CPUs with hyper-threading.
The information needed to calculate the mapping correctly would be:

  1. Number of CCDs
  2. If hyperthreading (or what AMD calls their version of it) is enabled
  3. Zen version
  4. Bios version if Zen == 1

Would you mind exploring your own system a bit and see if you can find any of that information from either /proc or /sys?

Right now the only temps collected are for the CPU die and for the CCDs (which could change to per core in a future kernel/sensors update), so I don't know if that would be a reliable source for the number of CCDs either.

The alternative as you said would be to have a toggle, but I'm not sure if I like that, since it isn't really user friendly and would likely leave people not knowledgeable about this with an incorrect CCD temperature representation.

@Kodikuu
Author

Kodikuu commented Oct 19, 2020

cpupower agrees with my mapping. And yeah, it looks like Intel because that's what it was changed to.

image
/proc/cpuinfo also shows the total number of cores, and the microcode version

image
sensors can be parsed for the number of CCDs, which you can then split evenly among core IDs rather than processor number

/sys/devices/system/cpu/smt reports whether "hyperthreading"/SMT is active
/sys/devices/system/cpu/online lists the range of CPUs that are online
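
A rough sketch of reading those two sysfs entries from Python. It assumes the `smt` directory exposes an `active` file containing `0` or `1`, and that `online` holds a kernel CPU list such as `0-31`; the helper names are hypothetical.

```python
from pathlib import Path
from typing import List

CPU_SYSFS = Path("/sys/devices/system/cpu")


def parse_cpu_list(text: str) -> List[int]:
    """Expand a kernel CPU list such as '0-7,16-23' into CPU numbers."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


def smt_active(base: Path = CPU_SYSFS) -> bool:
    """True if smt/active contains '1'; False if it is '0' or unreadable."""
    try:
        return (base / "smt" / "active").read_text().strip() == "1"
    except OSError:
        return False


def online_cpus(base: Path = CPU_SYSFS) -> List[int]:
    """Return the online CPU numbers, or an empty list if unreadable."""
    try:
        return parse_cpu_list((base / "online").read_text())
    except OSError:
        return []
```

`parse_cpu_list("0-31")` gives the 32 CPU numbers 0 through 31, which can then be matched against core IDs.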

A super messy script;
image

@Kodikuu
Author

Kodikuu commented Oct 19, 2020

And a slightly less messy script
image

Prints how each core maps to each CCD

Mind you, this ignores the old AGESA; that really needs /sys/devices/system/cpu/cpuX/topology/core_id
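
A sketch of the core_id approach, which should cover both numbering schemes. It is not the script from the screenshots; `read_core_ids` and `ccd_per_cpu` are hypothetical helpers, and the code assumes that sorted core IDs follow CCD order and that cores are split evenly between CCDs.

```python
from pathlib import Path
from typing import Dict


def read_core_ids(base: Path = Path("/sys/devices/system/cpu")) -> Dict[int, int]:
    """Read cpuN/topology/core_id for every CPU directory that has one."""
    core_ids: Dict[int, int] = {}
    for entry in base.glob("cpu[0-9]*"):
        try:
            cpu = int(entry.name[3:])
            core_ids[cpu] = int((entry / "topology" / "core_id").read_text())
        except (OSError, ValueError):
            continue  # offline CPU or an entry that is not cpuN
    return core_ids


def ccd_per_cpu(core_ids: Dict[int, int], ccd_count: int) -> Dict[int, int]:
    """Split the sorted distinct core IDs evenly into ccd_count groups (1-based).

    ccd_count must be at least 1.
    """
    distinct = sorted(set(core_ids.values()))
    per_ccd = max(1, -(-len(distinct) // ccd_count))  # ceiling division
    rank = {core: i for i, core in enumerate(distinct)}
    return {cpu: rank[core] // per_ccd + 1 for cpu, core in core_ids.items()}
```

Because the grouping works on core IDs rather than processor numbers, it does not need to know which AGESA numbering is in use.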

@aristocratos
Owner

@Kodikuu
Could you run
python3 -c "import psutil; print(psutil.cpu_count(logical=True), psutil.cpu_count(logical=False))"
Realised SMT can be detected when the logical and physical core counts don't match.
and
python3 -c "import psutil; print(psutil.sensors_temperatures())"
just to see that the labels for the sensors are carried over.
I believe most of the values can be collected through psutil.
What's missing is the Zen version and the BIOS version.

@Kodikuu
Author

Kodikuu commented Oct 19, 2020

python3 -c "import psutil; print(psutil.cpu_count(logical=True), psutil.cpu_count(logical=False))"

32, 16

python3 -c "import psutil; print(psutil.sensors_temperatures())"

{'it8792': [shwtemp(label='', current=40.0, high=127.0, critical=127.0), shwtemp(label='', current=-55.0, high=127.0, critical=127.0), shwtemp(label='', current=35.0, high=127.0, critical=127.0)], 'nvme': [shwtemp(label='Composite', current=33.85, high=76.85, critical=79.85), shwtemp(label='Composite', current=35.85, high=76.85, critical=79.85)], 'acpitz': [shwtemp(label='', current=16.8, high=20.8, critical=20.8)], 'k10temp': [shwtemp(label='Tctl', current=43.875, high=None, critical=None), shwtemp(label='Tdie', current=43.875, high=None, critical=None), shwtemp(label='Tccd1', current=49.5, high=None, critical=None), shwtemp(label='Tccd2', current=42.5, high=None, critical=None)]}
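
Those two outputs are enough to derive the SMT state and the CCD count. A minimal sketch: the `shwtemp` namedtuple is a stand-in filled with the k10temp values printed above so the snippet runs without psutil, and `ccd_temps` and `smt_enabled` are hypothetical helpers. With psutil installed, `psutil.sensors_temperatures()` would be passed instead of `sample`.

```python
from collections import namedtuple

# Stand-in for psutil's entries, using the k10temp values from the output above.
shwtemp = namedtuple("shwtemp", "label current high critical")
sample = {
    "k10temp": [
        shwtemp("Tctl", 43.875, None, None),
        shwtemp("Tdie", 43.875, None, None),
        shwtemp("Tccd1", 49.5, None, None),
        shwtemp("Tccd2", 42.5, None, None),
    ]
}


def ccd_temps(temps: dict) -> dict:
    """Return {ccd_number: current_temp} for k10temp entries labelled 'TccdN'."""
    result = {}
    for entry in temps.get("k10temp", []):
        suffix = entry.label[4:]
        if entry.label.startswith("Tccd") and suffix.isdigit():
            result[int(suffix)] = entry.current
    return result


def smt_enabled(logical: int, physical: int) -> bool:
    """Assume SMT is on when there are more logical than physical cores."""
    return logical > physical


print(ccd_temps(sample))      # {1: 49.5, 2: 42.5}
print(smt_enabled(32, 16))    # True
```

The number of CCDs is then `len(ccd_temps(...))`, which is 2 for this output.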

@Kodikuu
Author

Kodikuu commented Oct 19, 2020

image

This does away with the need to know Zen/BIOS

@aristocratos
Owner

@Kodikuu
Will take a look later this week when I've got some time.
But if you're feeling productive and up for it, feel free to make a PR.
The code for the Ryzen temp mapping starts at line 2762 in bpytop.py
You might wanna split up elif cpu_type == "ryzen" or cpu_type == "other": into two separate branches though.

Any code needed to set a variable with mapping information could go in an if statement after line 4720, since the CPU_NAME variable should contain "Ryzen" if it is a Ryzen CPU.

@88mm

88mm commented Oct 20, 2020

I guess this is a "me too", except I am using an Intel i9-7940X. When running a CPU-bound single-threaded app, bpytop correctly shows the "core" that is running at 100%, but the high temperature is shown on a different core. Using i7z ( https://github.com/ajaiantilal/i7z ) I see the same "core" running at 100%, and the high temperature is also displayed on that core.

@aristocratos
Owner

@88mm
This is about a specific problem related to Ryzen CPUs. If you are having problems, you're gonna have to create your own bug report. Neither the cause nor the solution will be the same.

@aristocratos
Owner

@Kodikuu
Try out v1.0.45. It's only been tested with virtual Ryzen stats, so let me know if it's off.

@Kodikuu
Author

Kodikuu commented Oct 25, 2020

Gave it a whirl, it's perfect!
Screenshot_2020-10-25_20-33-26

Thanks very much for the fix

@Kodikuu Kodikuu closed this as completed Oct 25, 2020
@aristocratos
Owner

@Kodikuu
No problem :)
