High CPU usage while idle #66
Comments
Can you please provide more info? For example, paste an htop screenshot here showing the CPU usage? |
Is it possible to adjust this timeout, say by calculating it precisely, or making it adapt with load? Going by powertop, my (already busy) laptop with all the usual heavyweight apps open (Firefox, Thunderbird, Signal) still experiences 40% more wakeups just with the idle DF running. Thanks :)
Hardware:
Kernel:
OS: Ubuntu 21.10
/proc/cmdline:
Htop:
Powertop with:
Powertop without:
perf trace -s
|
It's really weird. Dragonfly should not use this much CPU in the idle state.
|
I had to zip them due to attachment restrictions on GitHub |
FYI dropping kSpinLimit to 0 on this hardware brings the CPU usage closer to 30% (as reported by top across all CPUs; previously this was closer to 400/500%).
timeout -sINT 10 perf trace -s -p 440125
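For readers unfamiliar with the trade-off behind a spin limit, here is a minimal sketch of a "spin, then block" wait. It is not Dragonfly's or helio's actual code: the class `SpinThenBlockEvent` and its `spin_limit` parameter are hypothetical names chosen for illustration. With a limit of 0 the waiter skips polling entirely and sleeps until it is notified or the timeout expires, which is roughly the effect described above.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical illustration only; not the real Dragonfly/helio implementation.
class SpinThenBlockEvent {
 public:
  explicit SpinThenBlockEvent(unsigned spin_limit) : spin_limit_(spin_limit) {}

  // Marks the event as signalled and wakes any blocked waiter.
  void Notify() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      ready_.store(true, std::memory_order_release);
    }
    cv_.notify_all();
  }

  // Polls the flag up to spin_limit_ times, then sleeps until Notify() or
  // until the timeout expires. Returns true if the event was signalled.
  // If spins_used is non-null, it receives the number of polls spent.
  bool WaitFor(std::chrono::milliseconds timeout, unsigned* spins_used = nullptr) {
    unsigned spins = 0;
    bool seen = ready_.load(std::memory_order_acquire);
    while (!seen && spins < spin_limit_) {
      ++spins;  // busy polling: burns CPU but reacts with low latency
      seen = ready_.load(std::memory_order_acquire);
    }
    if (spins_used != nullptr) *spins_used = spins;
    if (seen) return true;

    // Blocking path: the thread sleeps and uses no CPU while idle.
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout,
                        [this] { return ready_.load(std::memory_order_acquire); });
  }

 private:
  const unsigned spin_limit_;
  std::atomic<bool> ready_{false};
  std::mutex mu_;
  std::condition_variable cv_;
};
```

A larger limit favours latency under load, while a limit of 0 favours idle CPU and battery; which one is right depends on the deployment.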
|
I attached the SVG that I managed to get from the profiler. I see you built the binary yourself. Could you build it in debug mode and without changes? If you run |
Note GitHub strips interactivity from the SVG when you upload it here |
Yes, you are absolutely correct. It did not show such extreme behavior on my AMD machine, on my Intel laptop, or on any of the cloud instances where I run Dragonfly. |
Just curious what generation / chip your Intel laptop is? I have extremely vague memories of the pause instruction changing at some point. This CPU is from Q3 2015, and (due to a chipset bug with this particular Dell laptop) is running without any microcode updates. Out of curiosity, I am going to try rebooting with microcode updates enabled. I haven't tried recently, but this likely still causes the machine to hang. Will report if there is any discernible difference |
|
Hmm, so your CPU is Kaby Lake and mine is Skylake. Going to try that reboot-with-microcode now |
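Since the cost of the pause instruction is the open question here, a rough way to compare machines is to time it directly. The sketch below is an assumption-laden illustration, not the measurement Dragonfly performs: `CpuRelax` and `MeasurePauseNanos` are hypothetical names, and `_mm_pause()` is only used on x86 builds.

```cpp
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
// On x86, issue the PAUSE instruction via the compiler intrinsic.
inline void CpuRelax() { _mm_pause(); }
#else
// Fallback for other architectures: a no-op, so the result is not meaningful
// there (the compiler may remove the loop entirely).
inline void CpuRelax() {}
#endif

// Returns the average wall-clock cost of one CpuRelax() call in nanoseconds.
// The figure includes loop overhead, so treat it as an approximation.
double MeasurePauseNanos(std::uint32_t iterations) {
  if (iterations == 0) return 0.0;
  const auto start = std::chrono::steady_clock::now();
  for (std::uint32_t i = 0; i < iterations; ++i) {
    CpuRelax();
  }
  const auto elapsed = std::chrono::steady_clock::now() - start;
  return std::chrono::duration<double, std::nano>(elapsed).count() / iterations;
}
```

Calling `MeasurePauseNanos(1000000)` on each machine gives numbers that can be compared across CPU generations; the result also varies with frequency scaling, so it is worth repeating on AC power and on battery.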
I am trying to figure out what the right fix will be. I need to poll in cloud environments, so we probably need a runtime flag. |
Can you reduce the multiplicative factor in |
Crazy suggestion (and probably surprising behaviour), but since you're Linux-only already, what about |
It would be best if we could calibrate automatically. On my laptop, for example, Dragonfly runs fine without any changes. |
Reducing the factor from 100 to 1 reduces top-measured %CPU from 400-500% to around 80% with the debug build. Powertop-measured wake count is approximately identical |
Just to report, no change in behaviour with microcode updates applied:
|
I think the easiest solution will be to expose run-time arguments directly. You can find what works best for you and report it here. I will try to implement a simple calibration procedure that adjusts those arguments according to what you tell me. I think that 80% is still too high. |
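One possible shape for such a calibration step, sketched under assumptions: given a measured per-pause cost and a time budget for spinning, derive how many spin iterations fit in the budget. The function name `CalibrateSpinLimit`, its parameters, and the cap are hypothetical and are not taken from Dragonfly or helio.

```cpp
#include <cstdint>

// Hypothetical helper: converts a spin time budget into an iteration count.
//   pause_ns  - measured average cost of one pause/relax call, in nanoseconds
//   budget_ns - how long the caller is willing to spin before sleeping
//   max_limit - upper cap on the returned iteration count
std::uint32_t CalibrateSpinLimit(double pause_ns, double budget_ns,
                                 std::uint32_t max_limit) {
  if (budget_ns <= 0.0) return 0;         // spinning disabled: sleep immediately
  if (pause_ns <= 0.0) return max_limit;  // unusable measurement: fall back to the cap
  const double iterations = budget_ns / pause_ns;
  if (iterations >= static_cast<double>(max_limit)) return max_limit;
  return static_cast<std::uint32_t>(iterations);  // truncates toward zero
}
```

With a 1000 ns budget, a machine where one pause costs 5 ns would spin 200 times, while one where it costs 50 ns would spin only 20 times, so both spend about the same wall-clock time polling before they go to sleep.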
Some X/Y problem missing in my original report: I just want to develop against DF locally and don't care about performance at all. A single thread that sleeps politely and offers 10 requests/sec but is extremely kind to battery is much more desirable in this environment :) I know it's difficult to squeeze both use cases in one binary, but really I think maybe that is what this bug was asking for. Ideally maybe something like a |
Let's try with the spinlock arguments, and if these don't work to your satisfaction we will add |
I created a branch named SpinArgs2 for dragonfly. You will also need to update the helio submodule with Once you rebuild, please run with Thanks! |
It's probably something to do with me, but I can't find "took" from that new log line anywhere:
|
Oh shit, I forgot to update the helio submodule. Please pull and update again |
Pause measurement:
And on battery:
|
It doesn't seem possible on this laptop to get usage below 30% (reported by top) regardless of the values given in the flags. I haven't used io_uring in anger yet, but note |
I think reducing from 800 to 30 is great progress already. The rest can be attributed to the expiry policy. Please change line engine_shard_set.cc:58 to use a value greater than 1ms. Does that fix the problem? |
Also, are you still in debug mode? |
I ran with:
And bumped the interval to 1000ms, and now DF has disappeared into the background noise even in powertop output. I don't know what this breaks, though. Nope, these are opt builds |
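A back-of-the-envelope sketch of why the interval matters so much for powertop: a periodic timer wakes its thread once per interval, so the wakeup rate scales inversely with the interval. The function below is hypothetical, and it assumes one such timer per thread, which the thread does not confirm.

```cpp
#include <cstdint>

// Hypothetical arithmetic helper: wakeups per second caused by periodic
// timers, assuming each of `threads` threads runs one timer that fires
// every `interval_ms` milliseconds.
constexpr double TimerWakeupsPerSecond(double interval_ms, std::uint32_t threads) {
  return interval_ms <= 0.0 ? 0.0 : (1000.0 / interval_ms) * threads;
}

// Example with an assumed 8 threads: a 1 ms interval gives 8000 wakeups/s,
// while a 1000 ms interval gives 8 wakeups/s.
static_assert(TimerWakeupsPerSecond(1.0, 8) == 8000.0, "1 ms interval");
static_assert(TimerWakeupsPerSecond(1000.0, 8) == 8.0, "1000 ms interval");
```

The cost of the longer interval is that whatever work the timer drives (expiry, per the comment above) runs less often.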
and are you saying that increasing any of the flags will bring it back to front? what |
With the default flags, top reports ~5% CPU and pause timing is:
With the jinxed flags from above, ~0.7% CPU and:
I have to head out for a while, will pick this up tomorrow. Thanks a ton for your help and time |
…ker architectures. Motivation: dragonflydb/dragonfly#66
Cheers, mate!
Can you, please, try with |
Good morning :)
Thanks for all your help |
Good idea. Will do. I will send you the PR shortly. |
Created #70. Feel free to use this branch and if everything is fine I will submit it later today. |
This is perfect, thank you so much! What a fun journey :)
timeout -sINT 10 perf trace -p `pidof dragonfly` -s
Final powertop: |
DragonflyDB appears to be continuously polling, with no sleeps or wakeups, on all CPUs even when completely idle.
Beyond energy efficiency, this makes it a liability on any battery-powered laptop, or in any environment where Dragonfly must run alongside other servers (say, a classic LAMP-style VPS environment).
Is there any way to adjust the design so that CPU usage is proportional to load? Ideally there are no wakeups when there are no connections or background work, although this is something Redis never managed to achieve either.