-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
incoherent power settings and behaviour of Vega frontier edition #67
Comments
For the fan, it was an issue with the SMI taking values at different times and giving different results. That part should be fine in 2.2 now that I handled a race condition where the value and the percent weren't matching up, giving skewed percentages. As for the incorrect settings, I am unsure as it could just be the quirk of the GPU itself in terms of how it handles the voltages and what "optimal" values are. Maybe @Jgreathouse has an idea. |
IIRC, I've observed the same behavior but I have not had the bandwidth to root cause it. Hunting down power idiosyncrasies takes a tremendous amount of work since it crosses half a dozen layers of software and firmware. |
I suspect that this is issue with firmware because I remember some strange results doing my ill-fated windows attempts one year ago. Given that difference is more than 25% it must be worth for AMD to invest the resources and figure out what is doing crap. It will be a huge performance/power advantage of AMD if its cards start working more efficiently just by fixing some power profile selection algo or something. |
It's been a while since I tested this, but IIRC I was only able to reproduce the problem when setting custom powerplay tables rather than using the tables we ship in the GPU's VBIOS. This significantly reduces the value proposition, since the vast majority of normal users aren't setting custom powerplay tables. |
Do you mean powerplay tables like changing in the driver? I didn't change them. Only run the commands in description. Given different use cases can't be covered well with default power/clock settings, in case user can easily tune them, that will be a huge advantage for AMD. I think for compute use cases this would be used a lot. 25% difference was only between "properly" optimized settings and settings that seem improper but effectively work awesome. If I run on vanilla default settings (no clock setting) card tops at 250W with lower performance. So changing power/clock settings can result in 50% improved characteristics. IMHO that's worth it. On the other hand I trust that AMD has a clue what must be a priority. Regards. |
To set the panel orientation property with quirk, we need the mode size provided by EDID. This info is available after EDID is read by dc_link_detect() and updated by amdgpu_dm_update_connector_after_detect(). The detection happens at driver load in amdgpu_dm_initialize_drm_device() and, therefore, we can get modes and set panel orientation before drm_dev_register() to avoid DRM warns on creating the connector property after device registration: [ 2.563969] ------------[ cut here ]------------ [ 2.563971] WARNING: CPU: 6 PID: 325 at drivers/gpu/drm/drm_mode_object.c:45 drm_mode_object_add+0x72/0x80 [drm] [ 2.563997] Modules linked in: btusb btrtl btbcm btintel btmtk bluetooth rfkill ecdh_generic ecc usbhid crc16 amdgpu(+) drm_ttm_helper ttm agpgart gpu_sched i2c_algo_bit drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm serio_raw sdhci_pci atkbd libps2 cqhci vivaldi_fmap ccp sdhci i8042 crct10dif_pclmul crc32_pclmul hid_multitouch ghash_clmulni_intel aesni_intel crypto_simd cryptd wdat_wdt mmc_core cec xhci_pci sp5100_tco rng_core xhci_pci_renesas serio 8250_dw i2c_hid_acpi i2c_hid btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq dm_mirror dm_region_hash dm_log dm_mod pkcs8_key_parser crypto_user [ 2.564032] CPU: 6 PID: 325 Comm: systemd-udevd Not tainted 5.18.0-amd-staging-drm-next+ #67 [ 2.564034] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0105 03/21/2022 [ 2.564036] RIP: 0010:drm_mode_object_add+0x72/0x80 [drm] [ 2.564053] Code: f0 89 c3 85 c0 78 07 89 45 00 44 89 65 04 4c 89 ef e8 e2 99 04 f1 31 c0 85 db 0f 4e c3 5b 5d 41 5c 41 5d c3 80 7f 50 00 74 ac <0f> 0b eb a8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 4c [ 2.564055] RSP: 0018:ffffb2e880413860 EFLAGS: 00010202 [ 2.564056] RAX: ffffffffc0ba1440 RBX: ffff99508a860010 RCX: 0000000000000001 [ 2.564057] RDX: 00000000b0b0b0b0 RSI: ffff99508c050110 RDI: ffff99508a860010 [ 2.564058] RBP: ffff99508c050110 R08: 0000000000000020 R09: ffff99508c292c20 [ 2.564059] R10: 0000000000000000 R11: ffff99508c0507d8 R12: 00000000b0b0b0b0 [ 2.564060] R13: 0000000000000004 R14: ffffffffc068a4b6 R15: ffffffffc068a47f [ 2.564061] FS: 00007fc69b5f1a40(0000) GS:ffff9953aff80000(0000) knlGS:0000000000000000 [ 2.564063] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2.564063] CR2: 00007f9506804000 CR3: 0000000107f92000 CR4: 0000000000350ee0 [ 2.564065] Call Trace: [ 2.564068] <TASK> [ 2.564070] drm_property_create+0xc9/0x170 [drm] [ 2.564088] drm_property_create_enum+0x1f/0x70 [drm] [ 2.564105] drm_connector_set_panel_orientation_with_quirk+0x96/0xc0 [drm] [ 2.564123] get_modes+0x4fb/0x530 [amdgpu] [ 2.564378] drm_helper_probe_single_connector_modes+0x1ad/0x850 [drm_kms_helper] [ 2.564390] drm_client_modeset_probe+0x229/0x1400 [drm] [ 2.564411] ? xas_store+0x52/0x5e0 [ 2.564416] ? kmem_cache_alloc_trace+0x177/0x2c0 [ 2.564420] __drm_fb_helper_initial_config_and_unlock+0x44/0x4e0 [drm_kms_helper] [ 2.564430] drm_fbdev_client_hotplug+0x173/0x210 [drm_kms_helper] [ 2.564438] drm_fbdev_generic_setup+0xa5/0x166 [drm_kms_helper] [ 2.564446] amdgpu_pci_probe+0x35e/0x370 [amdgpu] [ 2.564621] local_pci_probe+0x45/0x80 [ 2.564625] ? pci_match_device+0xd7/0x130 [ 2.564627] pci_device_probe+0xbf/0x220 [ 2.564629] ? sysfs_do_create_link_sd+0x69/0xd0 [ 2.564633] really_probe+0x19c/0x380 [ 2.564637] __driver_probe_device+0xfe/0x180 [ 2.564639] driver_probe_device+0x1e/0x90 [ 2.564641] __driver_attach+0xc0/0x1c0 [ 2.564643] ? __device_attach_driver+0xe0/0xe0 [ 2.564644] ? __device_attach_driver+0xe0/0xe0 [ 2.564646] bus_for_each_dev+0x78/0xc0 [ 2.564648] bus_add_driver+0x149/0x1e0 [ 2.564650] driver_register+0x8f/0xe0 [ 2.564652] ? 0xffffffffc1023000 [ 2.564654] do_one_initcall+0x44/0x200 [ 2.564657] ? kmem_cache_alloc_trace+0x177/0x2c0 [ 2.564659] do_init_module+0x4c/0x250 [ 2.564663] __do_sys_init_module+0x12e/0x1b0 [ 2.564666] do_syscall_64+0x3b/0x90 [ 2.564670] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 2.564673] RIP: 0033:0x7fc69bff232e [ 2.564674] Code: 48 8b 0d 45 0b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 12 0b 0c 00 f7 d8 64 89 01 48 [ 2.564676] RSP: 002b:00007ffe872ba3e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af [ 2.564677] RAX: ffffffffffffffda RBX: 000055873f797820 RCX: 00007fc69bff232e [ 2.564678] RDX: 000055873f7bf390 RSI: 0000000001155e81 RDI: 00007fc699e4d010 [ 2.564679] RBP: 00007fc699e4d010 R08: 000055873f7bfe20 R09: 0000000001155e90 [ 2.564680] R10: 000000055873f7bf R11: 0000000000000246 R12: 000055873f7bf390 [ 2.564681] R13: 000000000000000d R14: 000055873f7c4cb0 R15: 000055873f797820 [ 2.564683] </TASK> [ 2.564683] ---[ end trace 0000000000000000 ]--- [ 2.564696] ------------[ cut here ]------------ [ 2.564696] WARNING: CPU: 6 PID: 325 at drivers/gpu/drm/drm_mode_object.c:242 drm_object_attach_property+0x52/0x80 [drm] [ 2.564717] Modules linked in: btusb btrtl btbcm btintel btmtk bluetooth rfkill ecdh_generic ecc usbhid crc16 amdgpu(+) drm_ttm_helper ttm agpgart gpu_sched i2c_algo_bit drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm serio_raw sdhci_pci atkbd libps2 cqhci vivaldi_fmap ccp sdhci i8042 crct10dif_pclmul crc32_pclmul hid_multitouch ghash_clmulni_intel aesni_intel crypto_simd cryptd wdat_wdt mmc_core cec xhci_pci sp5100_tco rng_core xhci_pci_renesas serio 8250_dw i2c_hid_acpi i2c_hid btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq dm_mirror dm_region_hash dm_log dm_mod pkcs8_key_parser crypto_user [ 2.564738] CPU: 6 PID: 325 Comm: systemd-udevd Tainted: G W 5.18.0-amd-staging-drm-next+ #67 [ 2.564740] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0105 03/21/2022 [ 2.564741] RIP: 0010:drm_object_attach_property+0x52/0x80 [drm] [ 2.564759] Code: 2d 83 f8 18 74 33 48 89 74 c1 08 48 8b 4f 08 48 89 94 c1 c8 00 00 00 48 8b 47 08 83 00 01 c3 4d 85 d2 75 dd 83 7f 58 01 75 d7 <0f> 0b eb d3 41 80 78 50 00 74 cc 0f 0b eb c8 44 89 ce 48 c7 c7 28 [ 2.564760] RSP: 0018:ffffb2e8804138d8 EFLAGS: 00010246 [ 2.564761] RAX: 0000000000000010 RBX: ffff99508c1a2000 RCX: ffff99508c1a2180 [ 2.564762] RDX: 0000000000000003 RSI: ffff99508c050100 RDI: ffff99508c1a2040 [ 2.564763] RBP: 00000000ffffffff R08: ffff99508a860010 R09: 00000000c0c0c0c0 [ 2.564763] R10: 0000000000000000 R11: 0000000000000020 R12: ffff99508a860010 [ 2.564764] R13: ffff995088733008 R14: ffff99508c1a2000 R15: ffffffffc068a47f [ 2.564765] FS: 00007fc69b5f1a40(0000) GS:ffff9953aff80000(0000) knlGS:0000000000000000 [ 2.564766] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2.564767] CR2: 00007f9506804000 CR3: 0000000107f92000 CR4: 0000000000350ee0 [ 2.564768] Call Trace: [ 2.564769] <TASK> [ 2.564770] drm_connector_set_panel_orientation_with_quirk+0x4a/0xc0 [drm] [ 2.564789] get_modes+0x4fb/0x530 [amdgpu] [ 2.565024] drm_helper_probe_single_connector_modes+0x1ad/0x850 [drm_kms_helper] [ 2.565036] drm_client_modeset_probe+0x229/0x1400 [drm] [ 2.565056] ? xas_store+0x52/0x5e0 [ 2.565060] ? kmem_cache_alloc_trace+0x177/0x2c0 [ 2.565062] __drm_fb_helper_initial_config_and_unlock+0x44/0x4e0 [drm_kms_helper] [ 2.565072] drm_fbdev_client_hotplug+0x173/0x210 [drm_kms_helper] [ 2.565080] drm_fbdev_generic_setup+0xa5/0x166 [drm_kms_helper] [ 2.565088] amdgpu_pci_probe+0x35e/0x370 [amdgpu] [ 2.565261] local_pci_probe+0x45/0x80 [ 2.565263] ? pci_match_device+0xd7/0x130 [ 2.565265] pci_device_probe+0xbf/0x220 [ 2.565267] ? sysfs_do_create_link_sd+0x69/0xd0 [ 2.565268] really_probe+0x19c/0x380 [ 2.565270] __driver_probe_device+0xfe/0x180 [ 2.565272] driver_probe_device+0x1e/0x90 [ 2.565274] __driver_attach+0xc0/0x1c0 [ 2.565276] ? __device_attach_driver+0xe0/0xe0 [ 2.565278] ? __device_attach_driver+0xe0/0xe0 [ 2.565279] bus_for_each_dev+0x78/0xc0 [ 2.565281] bus_add_driver+0x149/0x1e0 [ 2.565283] driver_register+0x8f/0xe0 [ 2.565285] ? 0xffffffffc1023000 [ 2.565286] do_one_initcall+0x44/0x200 [ 2.565288] ? kmem_cache_alloc_trace+0x177/0x2c0 [ 2.565290] do_init_module+0x4c/0x250 [ 2.565291] __do_sys_init_module+0x12e/0x1b0 [ 2.565294] do_syscall_64+0x3b/0x90 [ 2.565296] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 2.565297] RIP: 0033:0x7fc69bff232e [ 2.565298] Code: 48 8b 0d 45 0b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 12 0b 0c 00 f7 d8 64 89 01 48 [ 2.565299] RSP: 002b:00007ffe872ba3e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af [ 2.565301] RAX: ffffffffffffffda RBX: 000055873f797820 RCX: 00007fc69bff232e [ 2.565302] RDX: 000055873f7bf390 RSI: 0000000001155e81 RDI: 00007fc699e4d010 [ 2.565303] RBP: 00007fc699e4d010 R08: 000055873f7bfe20 R09: 0000000001155e90 [ 2.565303] R10: 000000055873f7bf R11: 0000000000000246 R12: 000055873f7bf390 [ 2.565304] R13: 000000000000000d R14: 000055873f7c4cb0 R15: 000055873f797820 [ 2.565306] </TASK> [ 2.565307] ---[ end trace 0000000000000000 ]--- -- v2: - call amdgpu_dm_connector_get_modes() instead of ddc_get_modes() (Harry) Fixes: d77de78 ("amd/display: enable panel orientation quirks") Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
@akostadinov Apologies for the lack of response. Vega Frontier Edition boards are no longer supported on the latest ROCm 6.2. Thanks! |
Hi, I was playing with overdrive settings on my Vega Frontier Edition vs rocm-dkms-2.0.89-1.x86_64
I enabled overdrive settings and used
rocm-smi
to set some settings. By pure luck I'm observing a very strange behaviour. First of all I'm aware of ROCm/ROCm#463 (comment) and ROCm/ROCm#564 (comment) and these are the commands I prepared:With these settings my benchmark (ethminer) worked fine initially hitting 41.5MH/s but quickly card was throttled to 37MH/s. smi showed:
Earlier by mistake I tried different (not according to recommendations) settings which turn out to work much better though:
With the above settings the card works cooler and lower power at 41.5MH/s stable. smi shows:
The only difference is
AvgPwr
which dropped to 141W from 207 and card is cooler as a result thus performance not thermal throttled.I have verified that inside
/sys/class/drm/card0/device/pp_od_clk_voltage
settings are what were expected by myrocm-smi
commands (luck_settings.txt vs bad_settings.txt).So question is why the supposedly incorrect settings work way better than the recommended ones although gpu/mem clocks appear to be the same?
P.S. another interesting thing is why Fan works at 66% instead of 60% as command should set it to?
The text was updated successfully, but these errors were encountered: