
Socket bufs lock support (TCP performance decrease after migration fix) #1568

Merged
5 commits merged into checkpoint-restore:criu-dev on Nov 8, 2021

Conversation

Snorch
Member

@Snorch Snorch commented Jul 30, 2021

When one sets socket buffer sizes with setsockopt(SO_{SND,RCV}BUF*), the kernel sets the corresponding SOCK_SNDBUF_LOCK or SOCK_RCVBUF_LOCK flag on struct sock. It means that such a socket with an explicitly changed buffer size cannot be auto-adjusted by the kernel (normally, if there is free memory, the kernel can auto-increase the default socket buffers to improve performance); see tcp_fixup_rcvbuf() and tcp_sndbuf_expand().

CRIU always changes buffer sizes on restore, which means that all sockets get the lock flags set on struct sock and become non-auto-adjusted after migration. In some cases this can decrease the performance of network connections quite a lot.

So let's c/r socket buf locks (SO_BUF_LOCKS), so that sockets for which auto-adjustment is available do not lose it.
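
For illustration, here is a minimal userspace sketch of the mechanism (not CRIU code). It assumes the new option is named SO_BUF_LOCK with value 72 and that the flag bits mirror the kernel's SOCK_{SND,RCV}BUF_LOCK values, as in the proposed kernel patch; recent kernel headers may already provide SO_BUF_LOCK:

```c
/* A minimal sketch, not CRIU code. SO_BUF_LOCK (72) and the two flag
 * values are assumptions mirroring the proposed kernel patch. */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_BUF_LOCK
#define SO_BUF_LOCK 72
#endif
#ifndef SOCK_SNDBUF_LOCK
#define SOCK_SNDBUF_LOCK 1 /* sndbuf pinned by user, no auto-tuning */
#endif
#ifndef SOCK_RCVBUF_LOCK
#define SOCK_RCVBUF_LOCK 2 /* rcvbuf pinned by user, no auto-tuning */
#endif

int main(void)
{
	int sk = socket(AF_INET, SOCK_STREAM, 0);
	int size = 256 * 1024, locks = 0;
	socklen_t len = sizeof(locks);

	if (sk < 0)
		return 1;

	/* Explicitly setting a buffer size makes the kernel set
	 * SOCK_SNDBUF_LOCK, i.e. this socket is no longer auto-adjusted. */
	setsockopt(sk, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size));

	if (getsockopt(sk, SOL_SOCKET, SO_BUF_LOCK, &locks, &len) == 0)
		printf("buf locks: %d (sndbuf locked: %d, rcvbuf locked: %d)\n",
		       locks, !!(locks & SOCK_SNDBUF_LOCK),
		       !!(locks & SOCK_RCVBUF_LOCK));

	/* Clearing the lock bits hands the buffers back to kernel
	 * auto-tuning. */
	locks = 0;
	setsockopt(sk, SOL_SOCKET, SO_BUF_LOCK, &locks, sizeof(locks));

	close(sk);
	return 0;
}
```

The last setsockopt is the interesting part for c/r: sockets that never pinned their buffers should come out of restore with the lock bits still clear.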

Origin of the problem:

We have a customer in Virtuozzo who mentioned that an nginx server becomes slower after migration. It is especially easy to notice when you wget some big file via localhost from the same container that was just migrated.

By strace-ing all nginx processes I see that before c/r the nginx worker process sends data to the local wget in big chunks of ~1.5MB, but after c/r it only manages to send small chunks of ~64KB (for a remote wget it is around ~128KB). That can explain the decrease in download speed.

Before:
sendfile(12, 13, [7984974] => [9425600], 11479629) = 1440626 <0.000180>

After:
sendfile(8, 13, [1507275] => [1568768], 17957328) = 61493 <0.000675>

As a POC (this way one can check it without a kernel patch) I just commented out all buffer-setting manipulations. I got big chunks after c/r again, and the download speed became the same (fast) as before migration.

This is still a draft, as the kernel patch has not hit the mainline kernel yet. See https://lkml.org/lkml/2021/7/30/346

@codecov-commenter

codecov-commenter commented Jul 30, 2021

Codecov Report

Merging #1568 (ec6bcda) into criu-dev (4a17318) will decrease coverage by 0.20%.
The diff coverage is 23.48%.

❗ Current head ec6bcda differs from pull request most recent head d33cb4d. Consider uploading reports for the commit d33cb4d to get more accurate results
Impacted file tree graph

@@             Coverage Diff              @@
##           criu-dev    #1568      +/-   ##
============================================
- Coverage     69.07%   68.86%   -0.21%     
============================================
  Files           136      137       +1     
  Lines         33248    33391     +143     
============================================
+ Hits          22967    22996      +29     
- Misses        10281    10395     +114     
| Impacted Files | Coverage Δ |
|---|---|
| criu/include/sockets.h | 100.00% <ø> (ø) |
| criu/plugin.c | 10.52% <0.00%> (-0.25%) ⬇️ |
| flog/src/flog.c | 0.00% <0.00%> (ø) |
| criu/files-reg.c | 76.38% <9.09%> (-0.87%) ⬇️ |
| criu/proc_parse.c | 70.17% <11.76%> (-0.61%) ⬇️ |
| criu/tty.c | 75.60% <33.33%> (-0.15%) ⬇️ |
| criu/sockets.c | 84.54% <42.85%> (-0.73%) ⬇️ |
| criu/kerndat.c | 57.85% <66.66%> (+0.27%) ⬆️ |
| criu/cr-check.c | 61.50% <100.00%> (+0.19%) ⬆️ |
| criu/cr-restore.c | 66.62% <100.00%> (+0.08%) ⬆️ |
| ... and 7 more | |


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Snorch Snorch requested a review from mihalicyn August 9, 2021 07:26
@Snorch Snorch marked this pull request as ready for review August 9, 2021 07:27
@Snorch
Member Author

Snorch commented Aug 9, 2021

Now that the kernel patch is in net-next (https://lore.kernel.org/linux-mips/162807840601.323.12461942403559350811.git-patchwork-notify@kernel.org/) we can start reviewing it. Note that on older kernels this functionality/test is automatically disabled by the corresponding kerndat checks.
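
For reference, such a runtime check could look roughly like this (a hedged sketch; the function name is illustrative and CRIU's real check lives in criu/kerndat.c, and SO_BUF_LOCK is assumed to be defined as in the kernel patch):

```c
/* A hedged sketch of a runtime feature probe, not CRIU's actual code:
 * try getsockopt(SO_BUF_LOCK) on a throwaway socket. ENOPROTOOPT means
 * the running kernel predates the feature, so c/r of the buf locks and
 * the corresponding zdtm test get disabled. */
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_BUF_LOCK
#define SO_BUF_LOCK 72 /* assumed value, as in the kernel patch */
#endif

static int kerndat_has_so_buf_lock(void)
{
	int sk, ret = -1, locks = 0;
	socklen_t len = sizeof(locks);

	sk = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sk < 0)
		return -1;

	if (getsockopt(sk, SOL_SOCKET, SO_BUF_LOCK, &locks, &len) == 0)
		ret = 1;                 /* kernel supports SO_BUF_LOCK */
	else if (errno == ENOPROTOOPT)
		ret = 0;                 /* old kernel, feature disabled */

	close(sk);
	return ret;
}
```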

@github-actions

github-actions bot commented Sep 9, 2021

A friendly reminder that this PR had no activity for 30 days.

@0x7f454c46
Member

LGTM, but vz_so_buf_lock is a bit questionable for mainstream criu.

@Snorch
Member Author

Snorch commented Sep 13, 2021

> LGTM, but vz_so_buf_lock is a bit questionable for mainstream criu.

Sure, the vz_ prefix is a leftover, I will fix it after I'm back from vacation.

@0x7f454c46
Member

> LGTM, but vz_so_buf_lock is a bit questionable for mainstream criu.

> Sure, the vz_ prefix is a leftover, I will fix it after I'm back from vacation.

Enjoy your quest!

@avagin
Member

avagin commented Sep 21, 2021

@Snorch could you rebase the PR and fix what you wanted to fix?

@Snorch
Member Author

Snorch commented Sep 22, 2021

Thanks for the reminder! Rebased + removed the "vz_" prefix.

@Snorch
Member Author

Snorch commented Sep 23, 2021

  • fix indentation in test

@Snorch
Member Author

Snorch commented Oct 21, 2021

@checkpoint-restore/maintainers Please give this some review, I believe it's important as it fixes a network (TCP) performance decrease after migration.

@Snorch Snorch changed the title from "Socket bufs lock support" to "Socket bufs lock support (TCP performance decrease after migration fix)" on Oct 21, 2021
@Snorch
Member Author

Snorch commented Oct 21, 2021

Same error as in #1628 (comment) on CircleCI: test-local-gcc

@adrianreber
Member

> Same error as in #1628 (comment) on CircleCI: test-local-gcc

Can you try to rerun the failed tests? If you click on 'Details' of the failed CircleCI test, there should be a button to rerun the tests. Usually I can trigger reruns, but it does not work this time. Not sure why.

@Snorch
Member Author

Snorch commented Oct 25, 2021

I've rebased and pushed, but just re-running the CircleCI test I kept seeing the error again and again...

@adrianreber
Member

> I've rebased and pushed, but just re-running the CircleCI test I kept seeing the error again and again...

It is really strange. It only happens on this PR but seems unrelated to this PR.

@Snorch
Member Author

Snorch commented Oct 25, 2021

Not only this one:

> I've rebased and pushed, but just re-running the CircleCI test I kept seeing the error again and again...

> It is really strange. It only happens on this PR but seems unrelated to this PR.

And #1628 as well. Both mine, obviously I'm damn lucky =)

(upd: let me try re-running rebased on master)

We want to also c/r socket buf locks (SO_BUF_LOCKS), which are also
implicitly set by setsockopt(SO_{SND,RCV}BUF*), so we need to order
these two properly. That's why we need to wait for sk_setbufs to finish.
And there is not much point in setting buffer sizes asynchronously anyway.

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
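
To illustrate the ordering described in the commit message above: buffer sizes must be restored before the lock flags, because setting the sizes re-sets the flags. A minimal sketch (the helper name and the SO_BUF_LOCK define are assumptions, not CRIU's actual restore code):

```c
/* Hedged sketch of the restore-side ordering, not CRIU's actual code:
 * restoring SO_{SND,RCV}BUF itself sets the lock flags, so the dumped
 * SO_BUF_LOCK value has to be applied afterwards, or it would be lost. */
#include <sys/socket.h>

#ifndef SO_BUF_LOCK
#define SO_BUF_LOCK 72 /* assumed value, as in the kernel patch */
#endif

static int restore_bufs_then_locks(int sk, int sndbuf, int rcvbuf, int locks)
{
	/* 1. Buffer sizes (this implicitly sets both lock flags). */
	if (setsockopt(sk, SOL_SOCKET, SO_SNDBUFFORCE, &sndbuf, sizeof(sndbuf)) ||
	    setsockopt(sk, SOL_SOCKET, SO_RCVBUFFORCE, &rcvbuf, sizeof(rcvbuf)))
		return -1;

	/* 2. The dumped sk_userlocks bits, possibly clearing the flags
	 *    again so the kernel can keep auto-adjusting the buffers. */
	return setsockopt(sk, SOL_SOCKET, SO_BUF_LOCK, &locks, sizeof(locks));
}
```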
This is a new kernel feature to let criu restore sockets with kernel
auto-adjusted buffer sizes.

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
When one sets socket buffer sizes with setsockopt(SO_{SND,RCV}BUF*), the
kernel sets the corresponding SOCK_SNDBUF_LOCK or SOCK_RCVBUF_LOCK flag on
struct sock. It means that such a socket with an explicitly changed buffer
size cannot be auto-adjusted by the kernel (e.g. if there is free memory
the kernel can auto-increase default socket buffers to improve performance).
(see tcp_fixup_rcvbuf() and tcp_sndbuf_expand())

CRIU always changes buf sizes on restore, which means that all
sockets receive lock flags on struct sock and become non-auto-adjusted
after migration. In some cases it can decrease the performance of network
connections quite a lot.

So let's c/r socket buf locks (SO_BUF_LOCKS), so that sockets for which
auto-adjustment is available do not lose it.

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Just set all possible values 0-3 and check whether each persists (see the sketch after the commit messages below).

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
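
To illustrate the test commit above ("set all possible values 0-3 and check whether each persists"), here is a standalone sketch, not the actual zdtm test; in the real test the read-back check happens after criu dump/restore, and SO_BUF_LOCK is an assumed define:

```c
/* A standalone illustration of the test idea, not the actual zdtm test:
 * for each possible value of the two lock bits (0-3), set it through
 * SO_BUF_LOCK and check it reads back unchanged. */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_BUF_LOCK
#define SO_BUF_LOCK 72 /* assumed value, as in the kernel patch */
#endif

int main(void)
{
	int val;

	for (val = 0; val <= 3; val++) {
		int sk = socket(AF_INET, SOCK_STREAM, 0);
		int got = -1;
		socklen_t len = sizeof(got);

		if (sk < 0)
			return 1;

		if (setsockopt(sk, SOL_SOCKET, SO_BUF_LOCK, &val, sizeof(val)) ||
		    getsockopt(sk, SOL_SOCKET, SO_BUF_LOCK, &got, &len) ||
		    got != val) {
			printf("FAIL: set %d, got back %d\n", val, got);
			close(sk);
			return 1;
		}
		close(sk);
	}
	printf("PASS\n");
	return 0;
}
```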
@adrianreber
Member

> Not only this one:
>
> I've rebased and pushed, but just re-running the CircleCI test I kept seeing the error again and again...
>
> It is really strange. It only happens on this PR but seems unrelated to this PR.
>
> And #1628 as well. Both mine, obviously I'm damn lucky =)
>
> (upd: let me try re-running rebased on master)

For some reason your runs have 'EC2: true' and you are running on much newer CPUs. It works when running on Google Compute Engine but it fails when running on EC2:

-BOOT_IMAGE=/boot/vmlinuz-5.4.0-1025-aws root=PARTUUID=ea9aeff6-01 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
+BOOT_IMAGE=/boot/vmlinuz-5.4.0-1021-gcp root=PARTUUID=2707a646-6a60-476b-805c-ac3388be9375 ro console=ttyS0 panic=-1
-Vendor ID:                       GenuineIntel
-CPU family:                      6
-Model:                           85
-Model name:                      Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
-Stepping:                        4
-CPU MHz:                         3104.109
-BogoMIPS:                        4999.99
-Hypervisor vendor:               KVM
-Virtualization type:             full
+Vendor ID:                       GenuineIntel
+CPU family:                      6
+Model:                           63
+Model name:                      Intel(R) Xeon(R) CPU @ 2.30GHz
+Stepping:                        0
+CPU MHz:                         2299.998
+BogoMIPS:                        4599.99
+Hypervisor vendor:               KVM
+Virtualization type:             full

The first one is always from the logs of your CI run. Do you have anything specially configured for Circle CI?

@adrianreber
Member

With the newer CPUs the test injection fails for some reason.

@Snorch
Member Author

Snorch commented Oct 25, 2021

> Do you have anything specially configured for Circle CI?

I believe I did nothing special in my Circle CI setup... Probably they just attach the same node for reruns, and if a PR was lucky enough to get a new CPU on the first test run it would continue running on it, just guessing...

e.g. my last PR passed all tests: #1631

@adrianreber
Member

@mihalicyn your review has been requested. Can you please take a look?

@avagin You also requested changes. Is the PR ready from your point of view?

I also see an LGTM from @0x7f454c46.

I would say almost ready to be merged.

@avagin avagin merged commit b4cc856 into checkpoint-restore:criu-dev Nov 8, 2021
@Snorch
Member Author

Snorch commented Nov 10, 2021

Thanks!
