
CSGO running smooth for a couple seconds, then HEAVILY dropping, then going back to normal, repeat #335

Closed
duckyondiscord opened this issue Jul 25, 2022 · 30 comments
Labels: bug (Something isn't working), NV-Triaged (An NVBug has been created for dev to investigate)

Comments

@duckyondiscord

NVIDIA Open GPU Kernel Modules Version

515.57-9

Does this happen with the proprietary driver (of the same version) as well?

No

Operating System and Version

Arch Linux

Kernel Release

5.19.0-rc7-1-mainline

Hardware: GPU

GPU 0: NVIDIA GeForce RTX 3050 Laptop GPU (UUID: GPU-712fbdf4-63a5-5e55-3624-58bcb8b9aac3)

Describe the bug

CS:GO reaches about the same peak framerate with the open-source driver as it does with the proprietary driver. With the open-source driver, however, the game runs smoothly (200-250 FPS) for around 3-5 seconds, then drops to around 16-20 FPS for about the same amount of time, then goes back to normal, and the cycle repeats indefinitely.
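
To put numbers on a cycle like this, one option is to log an FPS sample at a fixed interval and collapse the log into alternating "smooth" and "slow" phases. The sketch below is only an illustration: the function name `fps_phases`, the 60 FPS threshold, and the one-second sampling interval are assumptions, not something taken from the driver or the game.

```python
from itertools import groupby


def fps_phases(fps_samples, threshold_fps=60.0, sample_interval_s=1.0):
    """Collapse FPS samples into runs of ("smooth" | "slow", duration in seconds).

    threshold_fps and sample_interval_s are illustrative defaults (assumptions).
    """
    labels = ("slow" if fps < threshold_fps else "smooth" for fps in fps_samples)
    return [
        (label, sum(1 for _ in run) * sample_interval_s)
        for label, run in groupby(labels)
    ]


# Hypothetical samples taken once per second.
print(fps_phases([230, 240, 18, 17, 20, 250]))
# → [('smooth', 2.0), ('slow', 3.0), ('smooth', 1.0)]
```

A regular alternation of phases of similar length would match the pattern described here, while irregular dips would point to something else.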

To Reproduce

  • Open CS:GO
  • Start a game, preferably with bots on Mirage because that's what I tested it on.
  • Hopefully see the same results as I did.

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

No response

duckyondiscord added the bug label Jul 25, 2022

@niv
Member

niv commented Jul 26, 2022

Thanks for the report. This is on a clean system with no significant background processes on your end (e.g. backup running)?

@duckyondiscord
Author

duckyondiscord commented Jul 26, 2022

By backup you mean a program that constantly backs up files? If so, there is no program manipulating files in the background that I know of. KDE Plasma's baloo file indexer is also disabled.

There's absolutely nothing impacting framerate running in the background that I know of. I usually do checks every week/day to see if there's stuff running that I don't want.

@niv
Member

niv commented Jul 26, 2022

Yep, I was just asking if there's something that eats significant system resources, to explain the hiccups.

@duckyondiscord
Author

With the same set of background programs (not a lot, and none of them significantly impacts resource usage), the proprietary driver does not have these hiccups.

@niv
Member

niv commented Jul 26, 2022

Do you perchance know if you saw this happen on a previous OpenRM release (not .57)?

@duckyondiscord
Author

This one's the first open-source driver I tried, if that's what you mean.

@niv
Member

niv commented Jul 26, 2022

Thanks for the report. Tracking internally in bug 3732803.

This issue will be updated when there's progress.

niv added the NV-Triaged label Jul 26, 2022

@duckyondiscord
Author

Alright, thanks a lot!

@aritger
Collaborator

aritger commented Jul 26, 2022

@duckyondiscord : One experiment that would be useful is if you could test with the proprietary driver but with GSP enabled (the open kernel modules unconditionally use GSP firmware, but the proprietary driver defaults to not yet using GSP firmware on GeForce RTX 3050). It would help to know if the performance problem reproduces with the proprietary driver + GSP.

http://us.download.nvidia.com/XFree86/Linux-x86_64/515.57/README/gsp.html

Add options nvidia NVreg_EnableGpuFirmware=1 to a modprobe.d configuration file.
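
A minimal Python sketch of that step, for anyone who has not written a modprobe.d file before. The helper names `render_options_line` and `write_modprobe_conf` are hypothetical, and so is the file name `/etc/modprobe.d/nvidia.conf` mentioned in the comment (any file ending in `.conf` in that directory should work). Only the `options nvidia NVreg_EnableGpuFirmware=1` line itself comes from the instruction above.

```python
from pathlib import Path


def render_options_line(module: str, params: dict) -> str:
    """Build one modprobe.d 'options' line: 'options <module> key=value ...'."""
    if not params:
        raise ValueError("at least one parameter is required")
    pairs = " ".join(f"{key}={value}" for key, value in params.items())
    return f"options {module} {pairs}\n"


def write_modprobe_conf(path: Path, module: str, params: dict) -> None:
    """Write the options line to 'path' (needs root for /etc/modprobe.d)."""
    path.write_text(render_options_line(module, params))


# Preview the line without touching the system.
print(render_options_line("nvidia", {"NVreg_EnableGpuFirmware": 1}), end="")
# → options nvidia NVreg_EnableGpuFirmware=1

# Hypothetical real use, run as root:
# write_modprobe_conf(Path("/etc/modprobe.d/nvidia.conf"),
#                     "nvidia", {"NVreg_EnableGpuFirmware": 1})
```

Module parameters are read when the module is loaded, so the setting only takes effect the next time the nvidia module loads, for example after a reboot.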

@duckyondiscord
Author

duckyondiscord commented Jul 27, 2022

Sure, I can try that

@duckyondiscord
Author

I don't really know how modprobe configs work, but I'm speculating that I just create a config file named nvidia.conf in /etc/modprobe.d/ and type those options in, right?

@aritger
Collaborator

aritger commented Jul 27, 2022

Correct. See also the modprobe.d(5) man page for more details.

You can follow the same pattern you're using to set the NVreg_OpenRmEnableUnsupportedGpus kernel module parameter to enable the open kernel modules on a notebook GPU... or is it the Arch Linux package that sets that?

@duckyondiscord
Author

duckyondiscord commented Jul 27, 2022

Okay, I tested it with the GSP enabled, and I'm actually seeing a performance BENEFIT of about 10-20FPS instead of the lag I see with the open driver.
And, yes, nvidia-smi -q | grep GSP shows that it's in fact enabled.
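
The same check can be scripted. This is a rough Python sketch that filters `nvidia-smi -q` output for lines mentioning GSP, like the `grep` above. The function names are hypothetical, and two things are assumptions rather than documented behaviour: that the relevant field is labelled something like `GSP Firmware Version`, and that a value of `N/A` means GSP is not in use.

```python
import subprocess


def gsp_fields(query_output: str) -> dict:
    """Return {field name: value} for every 'name : value' line mentioning GSP."""
    fields = {}
    for line in query_output.splitlines():
        if "GSP" in line and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


def gsp_in_use(query_output: str) -> bool:
    """Assumption: GSP is in use when some GSP field has a value other than N/A."""
    return any(v and v.upper() != "N/A" for v in gsp_fields(query_output).values())


def query_nvidia_smi() -> str:
    """Run 'nvidia-smi -q' and return its text output."""
    result = subprocess.run(
        ["nvidia-smi", "-q"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a machine with the driver loaded, `gsp_in_use(query_nvidia_smi())` would give a yes/no answer.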

@duckyondiscord
Author

duckyondiscord commented Jul 27, 2022

I'll add this to my original issue:
In 515.57-9, the -9 is added by the Arch package, so, take it as if it was just 515.57

@aritger
Collaborator

aritger commented Jul 28, 2022

Well, that's not what I expected, but good to know. To be clear: with GSP enabled, you see stable performance (10-20fps greater than without GSP), rather than 3-5 seconds at one performance and then 3-5 seconds at the faster performance?

@duckyondiscord
Author

duckyondiscord commented Jul 28, 2022

It's sort of hard to measure, as CSGO's performance fluctuates a lot depending on the map and, obviously, on how many people, guns, and particles are in the scene, but I'd say that on average I have better performance with the GSP enabled.

What is interesting, though, is that I see a new issue with the GSP and the proprietary driver: the game freezes completely around 15-30 minutes into playing. I have no idea if I can report issues with the proprietary driver on this GitHub page, though.

@mtijanic
Collaborator

mtijanic commented Aug 3, 2022

The unexpected part is that the proprietary module running in GSP-offload is faster than the open source module. Those two should behave mostly identically.

Is it possible to get an nvidia-bug-report.log from the proprietary driver running GSP-offload and running CS:GO?

@duckyondiscord
Author

sure, I could manage that

@duckyondiscord
Author

duckyondiscord commented Aug 5, 2022

nvidia-bug-report.log.gz

@duckyondiscord
Author

Is this still not addressed? How is the GSP interface different in the open driver? BTW, is GSP faster in other applications (I am talking about the proprietary driver here, since it is faster in CSGO)?

I will try this again shortly and will test some other apps with the GSP

@bno1

bno1 commented Sep 11, 2022

I have a Lenovo Legion 7 15IMHg05 (RTX 2060) and in basically every game I get random fps drops every ~2 minutes. I discovered that during those drops the CPU clock drops from 3-4GHz to 800MHz.

I found 2 different solutions to this problem:
a) Disable CPU turbo
b) Switch my laptop from quiet/balanced mode to performance mode

I don't know the source of this problem. In case I missed something here, I have a few more details on an Arch forum thread: https://bbs.archlinux.org/viewtopic.php?id=273136
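
To check whether a drop lines up with CPU clocks collapsing like this, the per-core frequency can be sampled from the kernel's cpufreq sysfs files, which report `scaling_cur_freq` in kHz. The sketch below is only an illustration: the function names and the 1000 MHz floor are assumptions chosen to catch an 800MHz dip.

```python
import glob
from pathlib import Path

CPUFREQ_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"


def read_cpu_freqs_mhz(pattern=CPUFREQ_GLOB):
    """Read the current frequency of every core, converted from kHz to MHz."""
    return [
        int(Path(path).read_text().strip()) / 1000.0
        for path in sorted(glob.glob(pattern))
    ]


def all_cores_throttled(freqs_mhz, floor_mhz=1000.0):
    """True when there is at least one reading and every core is below the floor."""
    return bool(freqs_mhz) and max(freqs_mhz) < floor_mhz


# Hypothetical polling loop, run alongside the game:
# import time
# while True:
#     freqs = read_cpu_freqs_mhz()
#     if all_cores_throttled(freqs):
#         print(time.strftime("%H:%M:%S"), "all cores below floor:", freqs)
#     time.sleep(0.5)
```

Timestamps from the loop can then be compared against the moments the framerate drops.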

@mtijanic
Collaborator

Hi @duckyondiscord, we did use CS:GO in a lot of our internal testing (and some folks that dogfood this driver also play CS:GO), and while we did get our fair share of issues, we were unable to reproduce this particular instance.

Typically when we see such severe frame stutter, it is because there's a different process polling some GPU state (e.g. MangoHUD or similar overlay using NVML to get GPU stats). I assume that is not the case here, since you did mention no background processes, but just checking?
That would not explain the difference between Open-GSP and Proprietary-GSP versions anyway.

Anyway, would it be possible for you to run some additional diagnostics on your machine? We need some way to correlate these stutters with what the driver is doing, and from just the log it's hard to say since we have no idea what the game is doing at the time.
I can think of two ways to get this data:

  1. A bpftrace script that will instrument both csgo and the kernel driver
  2. A small lib to LD_PRELOAD when running csgo and (possibly) a patch to the kernel driver

Please let me know if you're comfortable with either of these options (obviously source available for all of it) and I can prepare it.

Either way, thanks a bunch for the report and your testing so far!
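
As a quick first check for other processes touching the GPU, one rough approach is to list which processes currently hold an NVIDIA device node open. This Python sketch walks `/proc/<pid>/fd` and is not the bpftrace or LD_PRELOAD tooling offered above. The function name is hypothetical, and an open device node does not prove that a process is polling GPU state; it only narrows down the candidates.

```python
import os


def nvidia_device_users(proc_root="/proc"):
    """Map PID -> sorted list of /dev/nvidia* nodes that the process has open."""
    users = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        targets = set()
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were looking
            if target.startswith("/dev/nvidia"):
                targets.add(target)
        if targets:
            users[int(entry)] = sorted(targets)
    return users
```

Without root, only the current user's processes can be inspected, and the game itself and the display server will normally show up in the result.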

@duckyondiscord
Author

duckyondiscord commented Sep 29, 2022

Sorry for the late response. I can do those if you tell me where I can get the bpftrace script, the driver patch, and the library I should load with LD_PRELOAD.
Edit: I don't have anything polling GPU state/stats running.

@mtijanic
Collaborator

Thanks! I'll get back to you once I've prepared the scripts. Installing CS:GO now :)

@duckyondiscord
Author

duckyondiscord commented Oct 2, 2022

I've been having another issue with render offloading recently, which may stop me from being able to debug this for a while.
It's a really strange one: CS:GO seems to be using the NVIDIA GPU, according to nvidia-smi, but my frame rate is stuck around 60-75 FPS, and it feels even lower.
It also occurs on the proprietary driver, so I don't know where to report this one.

@duckyondiscord
Author

Closing, as this got fixed. I don't know which update fixed it, though.

6 participants
@niv @aritger @bno1 @mtijanic @duckyondiscord and others