
High CPU usage while idle #66

Closed · occasionallydavid opened this issue May 31, 2022 · 37 comments

@occasionallydavid

DragonflyDB appears to poll continuously on all CPUs, never sleeping, even when completely idle.

Beyond energy efficiency, this makes it a liability on any laptop running on battery, or in any environment where Dragonfly must run alongside other servers (say, a classic LAMP-style VPS).

Is there any way to adjust the design so that CPU usage is proportional to load? Ideally there would be no wakeups at all when there are no connections and no background work, although that is something Redis never managed to achieve either.

@romange
Collaborator

romange commented May 31, 2022

Can you please provide more info? Say, paste an htop screenshot here showing the CPU usage?
I am asking because on my 8-CPU machine I see only 10% CPU usage (i.e. 0.1 of a CPU) compared to 800% at full load.
The reason there is some CPU usage at all is the 1ms-resolution timer that handles expiry.
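
For context, a periodic timer at 1ms resolution wakes each proactor thread roughly 1000 times per second even with no clients connected, which is why some baseline CPU usage is expected. A minimal liburing sketch of that wakeup pattern (illustrative only, not the actual helio proactor code):

```cpp
// Illustrative sketch: a 1ms housekeeping tick driven by an io_uring timeout.
// Not the helio proactor; just the wakeup pattern being described above.
#include <liburing.h>

int main() {
  struct io_uring ring;
  if (io_uring_queue_init(8, &ring, 0) < 0) return 1;

  struct __kernel_timespec ts;
  ts.tv_sec = 0;
  ts.tv_nsec = 1000000;  // 1ms

  for (int tick = 0; tick < 5; ++tick) {
    struct io_uring_sqe* sqe = io_uring_get_sqe(&ring);
    io_uring_prep_timeout(sqe, &ts, /*count=*/0, /*flags=*/0);  // one-shot, re-armed each tick
    io_uring_submit(&ring);

    struct io_uring_cqe* cqe;
    io_uring_wait_cqe(&ring, &cqe);  // the thread wakes here every ~1ms (cqe->res is -ETIME)
    io_uring_cqe_seen(&ring, cqe);
    // ... expiry / housekeeping work would run here ...
  }
  io_uring_queue_exit(&ring);
  return 0;
}
```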

@occasionallydavid
Author

occasionallydavid commented May 31, 2022

Is it possible to adjust this timeout, say by calculating it precisely, or by making it adapt to load?

Going by powertop, my (already busy) laptop with all the usual heavyweight apps open (Firefox, Thunderbird, Signal) still experiences 40% more wakeups just from the idle DF running.

Thanks :)

Hardware: model name : Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz

Kernel: Linux x 5.13.0-41-generic #46-Ubuntu SMP Thu Apr 14 20:06:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

OS: Ubuntu 21.10

/proc/cmdline: BOOT_IMAGE=/boot/vmlinuz-5.13.0-41-generic root=/dev/mapper/root ro noibrs noibpb nopti spectre_v2=off nospectre_v2 spectre_v1=off nospectre_v1 l1tf=off nospec_store_bypass_disable no_stf_barrier mds=off mitigations=off dis_ucode_ldr noibrs noibpb nopti spectre_v2=off nospectre_v2 spectre_v1=off nospectre_v1 l1tf=off nospec_store_bypass_disable no_stf_barrier mds=off mitigations=off dis_ucode_ldr

Htop:

[htop screenshot]

Powertop with:

[powertop screenshot]

Powertop without:

[powertop screenshot]

perf trace -s
timeout -sINT 10 perf trace  -s -p 435769  

 Summary of events:

 Proactor0 (435770), 54604 events, 14.1%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter     27305      0  3819.403     0.000     0.140     2.856      0.93%


 Proactor1 (435771), 56848 events, 14.7%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter     28433      0  3784.537     0.000     0.133     1.450      0.93%


 Proactor2 (435772), 55079 events, 14.3%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter     27543      0  3802.154     0.000     0.138     4.179      0.92%


 Proactor3 (435773), 55007 events, 14.2%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter     27502      0  3779.044     0.000     0.137    15.439      1.03%


 Proactor4 (435774), 54784 events, 14.2%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter     27388      0  3813.848     0.000     0.139    16.186      1.01%


 Proactor5 (435775), 54896 events, 14.2%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter     27444      0  3798.395     0.000     0.138     3.085      0.92%


 Proactor6 (435776), 55140 events, 14.3%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter     27571      0  3797.901     0.000     0.138     3.022      0.92%

@occasionallydavid changed the title from "100% CPU usage while idle" to "High CPU usage while idle" on May 31, 2022
@romange
Collaborator

romange commented May 31, 2022

It's really weird; Dragonfly should not use this much CPU in the idle state.
I am trying to think of what causes it. Can you please do the following?

  1. Run ./dragonfly --logtostderr if running natively, or docker run ... - it should show some Dragonfly logs in the terminal.
  2. Go to localhost:6379 in your browser - you should see an HTTP page.
  3. Go to http://localhost:6379/profilez?profile=on
  4. 10 seconds later, go to http://localhost:6379/profilez?profile=off
  5. This should display a graph and create a .perf file in the /tmp folder; you should see its name in the log. Can you please attach it here?

@occasionallydavid
Author

I had to zip them due to attachment restrictions on GitHub

issue66.zip

@occasionallydavid
Author

FYI, dropping kSpinLimit to 0 on this hardware brings the CPU usage closer to 30% as reported by top across all CPUs (previously it was closer to 400-500%).

timeout -sINT 10 perf trace -s -p 440125
 Summary of events:

 Proactor0 (440126), 52655 events, 14.2%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter     26332      0  9436.170     0.000     0.358    15.424      0.89%


 Proactor1 (440127), 56237 events, 15.2%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter     28121      0  9401.739     0.000     0.334    14.731      0.89%


 Proactor2 (440128), 52172 events, 14.1%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter     26084      0  9417.545     0.000     0.361     6.671      0.86%


 Proactor3 (440129), 52638 events, 14.2%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter     26324      0  9419.898     0.000     0.358     4.138      0.86%


 Proactor4 (440130), 52910 events, 14.3%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter     26457      0  9429.960     0.000     0.356     6.960      0.86%


 Proactor5 (440131), 52175 events, 14.1%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter     26087      0  9417.025     0.000     0.361    13.400      0.87%


 Proactor6 (440132), 52182 events, 14.1%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter     26094      0  9415.588     0.000     0.361     3.270      0.86%

@romange
Collaborator

romange commented May 31, 2022

I attached the SVG that I managed to get from the profiler.
It does not seem that the problem is with syscalls. I am 99.9% sure it's some artifact of this laptop's architecture.

I see you built the binary yourself. Could you build it in debug mode and without changes? If you run ./helio/blaze.sh from the project dir, it will create a build-dbg subfolder where you can build dragonfly. Could you repeat the profiling cycle and send me the perf file again, please? I think there is a single hotspot that causes this.

[pprof profile graph]

@occasionallydavid
Author

occasionallydavid commented May 31, 2022

issue66-dbg.zip

Note GitHub strips interactivity from the SVG when you upload it here

dragonfly_31052022_175930 prof

@occasionallydavid
Author

Debug build perf top:

[perf top screenshot]

Annotated pause():

[annotated pause() screenshot]

Run() has very little usage; it seems to be related to that spin.

@romange
Collaborator

romange commented May 31, 2022

Yes, you are absolutely correct. It did not show such extreme behavior on my AMD machine, on my Intel laptop, or on any of the cloud instances where I run Dragonfly.

@occasionallydavid
Author

occasionallydavid commented May 31, 2022

Just curious, what generation/chip is your Intel laptop? I have extremely vague memories of the pause instruction changing at some point. This CPU is from Q3 2015 and (due to a chipset bug with this particular Dell laptop) is running without any microcode updates.

Out of curiosity, I am going to try rebooting with microcode updates enabled. I haven't tried recently, but this likely still causes the machine to hang. Will report if there is any discernible difference.

@romange
Collaborator

romange commented May 31, 2022

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 142
model name	: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz

@occasionallydavid
Author

Hmm, so your CPU is Kaby Lake and mine is Skylake. Going to try that reboot-with-microcode now

@romange
Collaborator

romange commented May 31, 2022

I am trying to figure out what the right fix will be. I need to poll in a cloud environment, so probably we need a runtime flag.
Ideally it could be auto-configured, but I am not sure how to detect that pause is so CPU-consuming...

@romange
Collaborator

romange commented May 31, 2022

Can you reduce the multiplicative factor in Pause() from 100 to 1?
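
For reference, the construct under discussion is a busy-wait helper that issues the x86 pause instruction in a loop whose length is scaled by a constant factor. A rough sketch of the shape of such a helper (the names and the factor are illustrative, not the exact helio code):

```cpp
// Rough sketch of a spin helper scaled by a multiplicative factor.
// Illustrative only; not the exact helio Pause() implementation.
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

inline void CpuPause() {
#if defined(__x86_64__) || defined(__i386__)
  _mm_pause();  // the latency of PAUSE varies a lot between CPU generations
#else
  __asm__ __volatile__("" ::: "memory");
#endif
}

constexpr uint32_t kFactor = 100;  // dropping this to 1 shortens each spin round ~100x

void Pause(uint32_t count) {
  for (uint32_t i = 0; i < count * kFactor; ++i) CpuPause();
}
```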

@occasionallydavid
Author

Crazy suggestion (and probably surprising behaviour), but since you're Linux-only already, what about /proc/sys/vm/laptop_mode? Lots of power frameworks configure it automatically

@romange
Collaborator

romange commented May 31, 2022

It would be best if we could calibrate automatically. On my laptop, for example, Dragonfly can run without any changes.
First we need to understand what to change. If reducing the factor from 100 to 1 solves the issue, I hope I will be able to write the calibrator.
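
One plausible way such a calibrator could work is to time a fixed number of Pause() rounds at startup and pick spin parameters from the result, which is essentially what the "Running 10 Pause(10) took ...us" log line further down measures. A hedged sketch (the thresholds and return values are invented for illustration):

```cpp
// Illustrative calibration sketch: time a fixed number of spin rounds and
// derive a spin budget from the elapsed time. Thresholds are invented.
#include <chrono>
#include <cstdint>
#include <cstdio>

void Pause(uint32_t count);  // the spin helper sketched earlier

uint32_t CalibrateSpinLimit() {
  using clock = std::chrono::steady_clock;
  auto start = clock::now();
  for (int i = 0; i < 10; ++i) Pause(10);  // "Running 10 Pause(10)"
  auto us = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - start).count();
  std::printf("Running 10 Pause(10) took %lldus\n", static_cast<long long>(us));

  // If pausing is expensive on this CPU, spin less before falling back to a blocking wait.
  if (us > 500) return 1;   // e.g. the Skylake laptop in this thread
  if (us > 100) return 5;
  return 10;                // cheap PAUSE: spin longer to keep latency low
}
```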

@occasionallydavid
Author

Reducing the factor from 100 to 1 reduces top-measured %CPU from 400-500% to around 80% with the debug build. Powertop-measured wake count is approximately identical

@occasionallydavid
Author

Just to report, no change in behaviour with microcode updates applied:

[    0.000000] microcode: microcode updated early to revision 0xea, date = 2021-01-25

@romange
Collaborator

romange commented May 31, 2022

I think the easiest solution will be to expose run-time arguments directly. You will find what works best for you and report here, and I will try to implement a simple calibration procedure that adjusts those arguments according to what you tell me. I think that 80% is still too high.

@occasionallydavid
Author

Some X/Y-problem context was missing from my original report: I just want to develop against DF locally and don't care about performance at all. A single thread that sleeps politely and offers 10 requests/sec but is extremely kind to the battery is much more desirable in this environment :) I know it's difficult to squeeze both use cases into one binary, but really I think that is what this bug was asking for. Ideally maybe something like a --dev flag?

@romange
Collaborator

romange commented May 31, 2022

Let's try with the spin arguments, and if those don't work to your satisfaction we will add a --dev flag! :)

@romange
Collaborator

romange commented May 31, 2022

I created a branch named SpinArgs2 for dragonfly. You will also need to update the helio submodule with git submodule update after the switch. There I introduced two flags, proactor_pause_count and proactor_spin_limit, that should control the spinning.
Please also remove _deps/glog-* before rebuilding dragonfly - I fixed a small bug there as well.

Once you rebuild, please run with ./dragonfly --vmodule=proactor=1 --logtostderr.
Among other lines, it will print something like this:
proactor.cc:111] Running 10 Pause(10) took 707us
Please run dragonfly with smaller values for the flags, whatever keeps your CPU usage calm, say proactor_pause_count=2 and proactor_spin_limit=2, but please keep both flags non-zero. If you are happy with the CPU usage, please tell me the parameter values and the measurements you got in the log.

Thanks!

@occasionallydavid
Author

occasionallydavid commented May 31, 2022

It's probably something I'm doing wrong, but I can't find the "took" from that new log line anywhere:

dragonfly] ag -w took |cat
src/server/db_slice.h:129:  // Returns second=true if insertion took place, false otherwise.
doc/dashtable.md:150:As you can see on the graph, Redis uses 50% more memory even before BGSAVE starts. Around second 14, BGSAVE kicks off on both servers. Visually you can not see this event on Dragonfly graph, but it's seen very well on Redis graph. It took just few seconds for Dragonfly to finish its snapshot (again, not possible to see) and around second 20 Dragonfly is already behind BGSAVE. You can see a distinguishable cliff at second 39
helio/util/fiber_sched_algo.cc:104:      VLOG(1) << "Execution of " << fibers_ext::short_id(prev_picked_) << " took too long "
helio/util/fiber_sched_algo.cc:112:      LOG(INFO) << "Took " << delta_micros / 1000 << " ms since it woke and till became active "
helio/util/uring/proactor.cc:423:    VLOG(1) << "io_uring_register_files took " << absl::ToInt64Microseconds(duration) << " usec";

dragonfly] git status
On branch SpinArgs2
Your branch is up-to-date with 'origin/SpinArgs2'.

nothing to commit, working tree clean
dragonfly] git fetch
dragonfly] git submodule summary
dragonfly] git status
On branch SpinArgs2
Your branch is up-to-date with 'origin/SpinArgs2'.

nothing to commit, working tree clean

@romange
Collaborator

romange commented May 31, 2022

Oh shit, I forgot to update the helio submodule. Please pull and update again.

@occasionallydavid
Author

occasionallydavid commented May 31, 2022

Pause measurement:

I20220531 21:37:26.346498 12184 proactor.cc:111] Running 10 Pause(10) took 900us

And on battery:

I20220531 21:37:58.672793 12354 proactor.cc:111] Running 10 Pause(10) took 1164us

@occasionallydavid
Author

It doesn't seem possible on this laptop to get usage below 30% (as reported by top) regardless of the values given for the flags.

I haven't used io_uring in anger yet, but I note that io_uring_wait_cqes() supports what looks like a select-like timeout, with the proviso that "this function isn't safe to use for applications that split SQ and CQ handling between two threads". In the case of DF, is there no possibility of just having the kernel handle wakeups in a non-polling fashion?
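
For reference, a minimal sketch of what a kernel-side blocking wait with a timeout looks like with liburing (illustrative only, not how the helio proactor is actually structured; the helper name is invented):

```cpp
// Illustrative sketch: block in the kernel until a completion arrives or a
// timeout elapses, instead of busy polling. Helper name is invented.
#include <liburing.h>

// Returns 0 when a completion was handled, -ETIME when the wait timed out.
int WaitForWorkOrTimeout(struct io_uring* ring, long timeout_ms) {
  struct __kernel_timespec ts;
  ts.tv_sec = timeout_ms / 1000;
  ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

  struct io_uring_cqe* cqe = nullptr;
  // The thread sleeps inside the kernel; no wakeups until an event or the timeout.
  int ret = io_uring_wait_cqe_timeout(ring, &cqe, &ts);
  if (ret == 0 && cqe != nullptr) io_uring_cqe_seen(ring, cqe);
  return ret;
}
```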

@romange
Collaborator

romange commented May 31, 2022

I think reducing from 800 to 30 is already great progress. The rest can be attributed to the expiry policy. Please change line engine_shard_set.cc:58 to use a value greater than 1ms. Will it fix the problem?

@romange
Collaborator

romange commented May 31, 2022

Also, are you still in debug mode?

@occasionallydavid
Author

I ran with:

./dragonfly --proactor_threads=1 --proactor_pause_count=1 --proactor_spin_limit=1

And I bumped the interval to 1000ms, and now DF has disappeared into the background noise, even in powertop output. I don't know what this breaks, though; redis-cli seems to do what I expect.

Nope, these are opt builds

@romange
Collaborator

romange commented May 31, 2022

And are you saying that increasing either of the flags brings it back to the foreground? What does the "Running 10 Pause(10)" line print?

@occasionallydavid
Author

occasionallydavid commented May 31, 2022

With the default flags, top reports ~5% CPU and pause timing is:

I20220531 22:18:54.884318 15522 proactor.cc:111] Running 10 Pause(10) took 277us

With the jinxed flags from above, ~0.7% CPU and:

I20220531 22:21:30.137244 15615 proactor.cc:111] Running 10 Pause(10) took 18us

I have to head out for a while, will pick this up tomorrow. Thanks a ton for your help and time

romange added a commit to romange/helio that referenced this issue Jun 1, 2022
romange added a commit to romange/helio that referenced this issue Jun 1, 2022
romange added a commit to romange/helio that referenced this issue Jun 1, 2022
@romange
Collaborator

romange commented Jun 1, 2022

Cheers, mate!
Meanwhile, I've updated the SpinArgs2 branch:

  1. Introduced auto-tuning in helio: romange/helio@439b47b
  2. Introduced an hz flag, which has a meaning somewhat similar to Redis's hz. The default is 1000, and it should stay 1000 for a prod config.

Can you please try with dragonfly --proactor_threads=1 --hz=1 ?
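
For anyone following along, an hz-style flag typically just maps a frequency onto the period of the housekeeping timer; a tiny illustrative helper (not the actual dragonfly code):

```cpp
// Illustrative only: mapping an "hz" frequency flag to a timer period.
#include <chrono>
#include <cstdint>

std::chrono::microseconds PeriodFromHz(uint32_t hz) {
  // hz=1000 keeps the original 1ms tick; hz=1 wakes up only once per second.
  // Assumes hz >= 1.
  return std::chrono::microseconds(1'000'000 / hz);
}
```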

@occasionallydavid
Author

Good morning :)

--hz=1 works well. I also found ServerFamily::Init()'s 10ms stats timer; is there any way to make it relative to --hz too? Bumping it locally to 1000 doesn't seem to break anything; it looks like the memory stat is only used for informational content that ends up in the RDB file.

Thanks for all your help

@romange
Collaborator

romange commented Jun 1, 2022

Good idea. Will do. I will send you the PR shortly.

romange added a commit that referenced this issue Jun 1, 2022
1. Introduce `hz` flag to control the frequency of periodically running tasks.
2. Update helio dependency.
@romange
Collaborator

romange commented Jun 1, 2022

Created #70.

Feel free to use this branch, and if everything is fine I will submit it later today.

@occasionallydavid
Author

occasionallydavid commented Jun 1, 2022

This is perfect, thank you so much! What a fun journey :)

timeout -sINT 10 perf trace -p `pidof dragonfly` -s
 Summary of events:

 Proactor0 (6215), 54 events, 14.3%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter        27      0  8001.716     0.000   296.360  1000.737     30.22%


 Proactor1 (6216), 54 events, 14.3%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter        27      0  8000.784     0.000   296.325  1000.240     30.22%


 Proactor2 (6217), 54 events, 14.3%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter        27      0  8000.759     0.000   296.324  1000.198     30.22%


 Proactor3 (6218), 54 events, 14.3%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter        27      0  8000.889     0.000   296.329  1000.206     30.22%


 Proactor4 (6219), 54 events, 14.3%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter        27      0  8000.545     0.000   296.316  1000.184     30.22%


 Proactor5 (6220), 54 events, 14.3%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter        27      0  8000.760     0.000   296.324  1000.198     30.22%


 Proactor6 (6221), 54 events, 14.3%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   io_uring_enter        27      0  8000.909     0.000   296.330  1000.280     30.22%

Final powertop:

[powertop screenshot]

romange added a commit that referenced this issue Jun 1, 2022
1. Introduce `hz` flag to control the frequency of periodically running tasks.
2. Update helio dependency.
@romange romange closed this as completed in d5f2c23 Jun 1, 2022
valerauko added a commit to valerauko/kitsune that referenced this issue Nov 22, 2022