high system cpu usage on c6gn instances with high throughput loads #195

Closed
romange opened this issue Dec 3, 2021 · 11 comments

@romange commented Dec 3, 2021

When running a memcached load test on c6gn.16xlarge I noticed that softirq processing takes a lot of CPU. As far as I remember it was not like this before, so it looks like a possible regression in the hypervisor, maybe?

I checked with Ubuntu 21.04 and 21.10, with both the native ENA driver that comes with the distribution and with the 2.6 driver. It is always the same.

(screenshot attached)

To reproduce (using 2 c6gn.16xlarge):

  1. /usr/bin/memcached -t 32 -m 640 -p 11211 -u memcache -l 0.0.0.0 -c 10240 on the server side.
  2. memtier_benchmark -s <private_ip> -p 11211 --ratio 0:1 -t 32 -c 50 -n 2000000 -P memcache_text on the load-test instance.
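
To see where the CPU time is going while the benchmark runs, a minimal sketch (assuming sysstat and the linux-tools package for the running kernel are installed on the server):

    # per-CPU utilization once per second; the %soft column is softirq time
    mpstat -P ALL 1
    # live profile of kernel and user symbols; __do_softirq is the symbol discussed below
    sudo perf top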
@romange commented Dec 3, 2021

(screenshot attached)

@akiyano commented Dec 5, 2021

Hi @romange,

Thanks for reporting this.
A few questions:

  1. You show two different results in your two comments - one with 13.96% in __do_softirq, the other with 28.58%. What is the difference between the two setups (instance type, instance size, AMI, OS distribution and version, driver version, preinstalled driver vs. driver taken from GitHub)?

  2. You say "As far as I remember". Is there a chance you are remembering a different instance type (not c6gn, which is a fairly new instance type)?

  3. The preinstalled driver in ubuntu has adaptive interrupt coalescing off by default, and the github 2.6.0g driver has it on. This setting should make at least some difference, as it should change the number of interrupts you are getting (thus change the overhead of handling them). Can you please try turning adaptive interrupt coalescing on and off to see if it makes a difference? Assuming the network device is ens5, the command to see if it is on:
    sudo ethtool -c ens5
    And the command to turn it on/off is:
    sudo ethtool -C ens5 adaptive-rx on/off
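    Spelled out explicitly, assuming the device is ens5 (a sketch; re-check the current value after each change):
    sudo ethtool -c ens5                      # show current coalescing settings
    sudo ethtool -C ens5 adaptive-rx on       # enable adaptive RX coalescing
    sudo ethtool -C ens5 adaptive-rx off      # disable it again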

Thanks,
Arthur

@romange commented Dec 5, 2021

Hi Arthur,

Thanks for responding so quickly. Replying to each question:

  1. I have now done an exact reproduction with Ubuntu 21.04, Linux 5.11.0-1022-aws, on c6gn.8xlarge - no custom driver, no custom ethtool settings. See the htop output from the server side below.
    The client side is another instance with exactly the same configuration, running
    memtier_benchmark -s <private_ip> -p 11211 --ratio 0:1 -t 32 -c 50 -n 2000000 -P memcache_text.

(htop screenshot)

Attaching perf top output from the server:
(perf top screenshot)

> ethtool -c ens5 
Coalesce parameters for ens5:
Adaptive RX: off  TX: n/a
stats-block-usecs: n/a
sample-interval: n/a
pkt-rate-low: n/a
pkt-rate-high: n/a

rx-usecs: 0
rx-frames: n/a
rx-usecs-irq: n/a
rx-frames-irq: n/a

tx-usecs: 0
  2. I feel uncomfortable expanding on this in an open channel since it relates to my work at AWS up until September this year (I was an AWS employee until recently). If you want to hear more details, please send me an email to romange at gmail ... In any case, I am pretty sure the state of things was much better with c6gn instances in August 2021.
  3. Setting adaptive-rx to on does not change a thing, and that is expected with the memcached benchmark. This benchmark puts a lot of stress on system interrupts since each client sends a ping (a short message) and waits for a pong back. The high throughput is created by using 32 threads * 50 connections each. Each ping creates an interrupt, but since it is not followed by any other packet (the client is waiting for a pong), there is nothing to coalesce on the server side. It is different from iperf, which sends large chunks of data in one direction.
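
One way to check whether the coalescing setting changes the interrupt rate at all (a rough sketch, assuming the device is ens5):

    # print the ENA queue interrupt counters every second while the benchmark runs;
    # if coalescing had an effect, the per-second growth of these counters would drop
    watch -n 1 'grep ens5 /proc/interrupts'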

@talawahtech commented Dec 14, 2021

@romange it is possible that you are running into this issue: #159. One way to confirm would be to run the same test on a c6gn vs a c6g (or a c5n) and compare the output of dstat --cpu -y -i -I 27,28,29,30 --net-packets to see if interrupt coalescing is active.

With high-throughput benchmarks, interrupt moderation can have a big impact, even for request/response workloads, because the coalescing happens across multiple connections. It prevents all incoming packets from triggering an interrupt for x microseconds and then handles a group of them all at once. It led to a 14% performance improvement in my high-throughput HTTP benchmark: https://talawah.io/blog/extreme-http-performance-tuning-one-point-two-million/#interrupt-moderation.
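
For reference, the static interrupt moderation described in that post can be approximated with ethtool (a sketch, assuming the device is ens5; the 300 µs value is illustrative and needs tuning per workload):

    # turn off adaptive coalescing and hold RX interrupts for up to 300 microseconds,
    # so packets from many connections are handled in one softirq pass
    sudo ethtool -C ens5 adaptive-rx off rx-usecs 300
    # verify the new settings
    sudo ethtool -c ens5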

@romange commented Dec 15, 2021

I will check. Thanks Mark!

@romange commented Dec 30, 2021

Following @talawahtech's suggestion, I ran the test again today in us-east-2.
To my surprise, the CPU usage looked excellent, and perf top looked healthy.

(screenshot from 2021-12-30 12-42-46)

@akiyano did you guys fix something, or is it a random thing?
@talawahtech I am attaching the dstat output for the record, but I am not familiar with dstat, unfortunately.

(screenshot from 2021-12-30 12-47-15)

@akiyano commented Dec 30, 2021

@romange
No deliberate change was made yet to fix this issue.
Can you please explain what you did differently this time compared to the original run, which suddenly gave you good CPU usage?

@romange commented Dec 30, 2021

Nothing. Same image, same VM. The only thing I did differently was choosing us-east-2.
I do not remember where I ran it before, but not in Ohio; it could have been us-east-1 or Oregon. I suggest you try reproducing the results in different US regions. A run takes about 10 minutes: you just need two c6gn.8xlarge instances in the same zone and the commands above. Unfortunately, memtier_benchmark cannot be installed from the Ubuntu repositories, so you need to build it from source once and then copy the binary each time you start a VM.
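
For reference, a rough sketch of building memtier_benchmark from source on Ubuntu, following the autotools flow from its README (package names may vary slightly between releases):

    sudo apt-get install -y build-essential autoconf automake libpcre3-dev libevent-dev pkg-config zlib1g-dev libssl-dev
    git clone https://github.com/RedisLabs/memtier_benchmark.git
    cd memtier_benchmark
    autoreconf -ivf && ./configure && make
    # copy the resulting ./memtier_benchmark binary to each fresh load-test instance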

@talawahtech

@romange based on that image it looks like you are only doing around 1800 requests/packets per second. At that request rate you won't see much softirq activity.

Also, the IRQ numbers in the dstat command that I gave you are wrong for the c6gn. I believe the IRQ numbers for the individual network queues are in the 40+ range on the c6gn vs. the 20+ range on the c5n. You can run cat /proc/interrupts | grep eth0 to confirm, and then use the numbers from that output in the dstat command to see the per-queue interrupt data.
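
A rough sketch of putting that together (assuming the interface shows up as ens5 on Ubuntu; on Amazon Linux it is typically eth0):

    # list the IRQ numbers assigned to the ENA queues
    grep ens5 /proc/interrupts | awk -F: '{print $1}' | xargs
    # then plug those numbers into dstat, e.g. if they turn out to be 40-43:
    dstat --cpu -y -i -I 40,41,42,43 --net-packets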

@romange commented Dec 30, 2021 via email

@romange commented May 26, 2022

It seems that the problem has been fixed.

romange closed this as completed on May 26, 2022