
Socket bufs lock support (TCP performance decrease after migration fix) #1568

Merged
5 commits merged into checkpoint-restore:criu-dev on Nov 8, 2021

Conversation

Snorch
Member

@Snorch Snorch commented Jul 30, 2021

When one sets socket buffer sizes with setsockopt(SO_{SND,RCV}BUF*), the kernel sets the corresponding SOCK_SNDBUF_LOCK or SOCK_RCVBUF_LOCK flag on struct sock. It means that such a socket with an explicitly changed buffer size cannot be auto-adjusted by the kernel (normally, if there is free memory, the kernel can auto-increase the default socket buffers to improve performance); see tcp_fixup_rcvbuf() and tcp_sndbuf_expand().

CRIU always changes buffer sizes on restore, which means that all sockets get the lock flags set on struct sock and become non-auto-adjusted after migration. In some cases this can decrease the performance of network connections quite a lot.

So let's c/r socket buf locks (SO_BUF_LOCKS), so that sockets for which auto-adjustment is available do not lose it.
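
For illustration, here is a minimal userspace sketch of the mechanism (not CRIU code). It assumes the new option is named SO_BUF_LOCK with value 72 and that the flag bits mirror the kernel's SOCK_{SND,RCV}BUF_LOCK values, as in the proposed kernel patch; recent kernel headers may already provide SO_BUF_LOCK:

```c
/* A minimal sketch, not CRIU code. SO_BUF_LOCK (72) and the two flag
 * values are assumptions mirroring the proposed kernel patch. */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_BUF_LOCK
#define SO_BUF_LOCK 72
#endif
#ifndef SOCK_SNDBUF_LOCK
#define SOCK_SNDBUF_LOCK 1 /* sndbuf pinned by user, no auto-tuning */
#endif
#ifndef SOCK_RCVBUF_LOCK
#define SOCK_RCVBUF_LOCK 2 /* rcvbuf pinned by user, no auto-tuning */
#endif

int main(void)
{
	int sk = socket(AF_INET, SOCK_STREAM, 0);
	int size = 256 * 1024, locks = 0;
	socklen_t len = sizeof(locks);

	if (sk < 0)
		return 1;

	/* Explicitly setting a buffer size makes the kernel set
	 * SOCK_SNDBUF_LOCK, i.e. this socket is no longer auto-adjusted. */
	setsockopt(sk, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size));

	if (getsockopt(sk, SOL_SOCKET, SO_BUF_LOCK, &locks, &len) == 0)
		printf("buf locks: %d (sndbuf locked: %d, rcvbuf locked: %d)\n",
		       locks, !!(locks & SOCK_SNDBUF_LOCK),
		       !!(locks & SOCK_RCVBUF_LOCK));

	/* Clearing the lock bits hands the buffers back to kernel
	 * auto-tuning. */
	locks = 0;
	setsockopt(sk, SOL_SOCKET, SO_BUF_LOCK, &locks, sizeof(locks));

	close(sk);
	return 0;
}
```

The last setsockopt is the interesting part for c/r: sockets that never pinned their buffers should come out of restore with the lock bits still clear.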

Origin of the problem:

We have a customer in Virtuozzo who mentioned that an nginx server becomes slower after migration. It is especially easy to notice when you wget some big file via localhost from the same container that was just migrated.

By strace-ing all nginx processes I see that before c/r the nginx worker process sends data to the local wget in big chunks of ~1.5MB, but after c/r it only manages to send small chunks of ~64KB (for a remote wget it is around ~128KB). That can explain the decrease in download speed.

Before:
sendfile(12, 13, [7984974] => [9425600], 11479629) = 1440626 <0.000180>

After:
sendfile(8, 13, [1507275] => [1568768], 17957328) = 61493 <0.000675>

As a POC (this way one can check it without a kernel patch) I just commented out all buffer-setting manipulations. I got big chunks after c/r again, and the download speed became the same (fast) as before migration.

This is still a draft, as the kernel patch has not hit the mainline kernel yet. See https://lkml.org/lkml/2021/7/30/346

@codecov-commenter

codecov-commenter commented Jul 30, 2021

Codecov Report

Merging #1568 (ec6bcda) into criu-dev (4a17318) will decrease coverage by 0.20%.
The diff coverage is 23.48%.

❗ Current head ec6bcda differs from pull request most recent head d33cb4d. Consider uploading reports for the commit d33cb4d to get more accurate results
Impacted file tree graph

@@             Coverage Diff              @@
##           criu-dev    #1568      +/-   ##
============================================
- Coverage     69.07%   68.86%   -0.21%     
============================================
  Files           136      137       +1     
  Lines         33248    33391     +143     
============================================
+ Hits          22967    22996      +29     
- Misses        10281    10395     +114     
| Impacted Files | Coverage Δ |
|---|---|
| criu/include/sockets.h | 100.00% <ø> (ø) |
| criu/plugin.c | 10.52% <0.00%> (-0.25%) ⬇️ |
| flog/src/flog.c | 0.00% <0.00%> (ø) |
| criu/files-reg.c | 76.38% <9.09%> (-0.87%) ⬇️ |
| criu/proc_parse.c | 70.17% <11.76%> (-0.61%) ⬇️ |
| criu/tty.c | 75.60% <33.33%> (-0.15%) ⬇️ |
| criu/sockets.c | 84.54% <42.85%> (-0.73%) ⬇️ |
| criu/kerndat.c | 57.85% <66.66%> (+0.27%) ⬆️ |
| criu/cr-check.c | 61.50% <100.00%> (+0.19%) ⬆️ |
| criu/cr-restore.c | 66.62% <100.00%> (+0.08%) ⬆️ |
| ... and 7 more | |


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Snorch Snorch requested a review from mihalicyn August 9, 2021 07:26
@Snorch Snorch marked this pull request as ready for review August 9, 2021 07:27
@Snorch
Member Author

Snorch commented Aug 9, 2021

Now that the kernel patch is in net-next (https://lore.kernel.org/linux-mips/162807840601.323.12461942403559350811.git-patchwork-notify@kernel.org/) we can start reviewing it. Note that on older kernels this functionality/test is automatically disabled by the corresponding kerndat checks.
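
For reference, such a runtime check could look roughly like this (a hedged sketch; the function name is illustrative and CRIU's real check lives in criu/kerndat.c, and SO_BUF_LOCK is assumed to be defined as in the kernel patch):

```c
/* A hedged sketch of a runtime feature probe, not CRIU's actual code:
 * try getsockopt(SO_BUF_LOCK) on a throwaway socket. ENOPROTOOPT means
 * the running kernel predates the feature, so c/r of the buf locks and
 * the corresponding zdtm test get disabled. */
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_BUF_LOCK
#define SO_BUF_LOCK 72 /* assumed value, as in the kernel patch */
#endif

static int kerndat_has_so_buf_lock(void)
{
	int sk, ret = -1, locks = 0;
	socklen_t len = sizeof(locks);

	sk = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sk < 0)
		return -1;

	if (getsockopt(sk, SOL_SOCKET, SO_BUF_LOCK, &locks, &len) == 0)
		ret = 1;                 /* kernel supports SO_BUF_LOCK */
	else if (errno == ENOPROTOOPT)
		ret = 0;                 /* old kernel, feature disabled */

	close(sk);
	return ret;
}
```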

@github-actions

github-actions bot commented Sep 9, 2021

A friendly reminder that this PR had no activity for 30 days.

@0x7f454c46
Member

LGTM, but vz_so_buf_lock is a bit questionable for mainstream criu.

@Snorch
Member Author

Snorch commented Sep 13, 2021

> LGTM, but vz_so_buf_lock is a bit questionable for mainstream criu.

Sure, the vz_ prefix is a leftover, I will fix it after I'm back from vacation.

@0x7f454c46
Member

> LGTM, but vz_so_buf_lock is a bit questionable for mainstream criu.

> Sure, the vz_ prefix is a leftover, I will fix it after I'm back from vacation.

Enjoy your quest!

@avagin
Member

avagin commented Sep 21, 2021

@Snorch could you rebase the PR and fix what you wanted to fix?

@Snorch
Member Author

Snorch commented Sep 22, 2021

Thanks for the reminder! Rebased + removed the "vz_" prefix.

@Snorch
Member Author

Snorch commented Sep 23, 2021

  • fix indentation in test

@Snorch
Member Author

Snorch commented Oct 21, 2021

@checkpoint-restore/maintainers Please give this some review, I believe it's important as it fixes a network (TCP) performance decrease after migration.

@Snorch Snorch changed the title from "Socket bufs lock support" to "Socket bufs lock support (TCP performance decrease after migration fix)" on Oct 21, 2021
@Snorch
Member Author

Snorch commented Oct 21, 2021

Same error as in #1628 (comment) on CircleCI: test-local-gcc

@adrianreber
Member

> Same error as in #1628 (comment) on CircleCI: test-local-gcc

Can you try to rerun the failed tests? If you click on 'Details' of the failed CircleCI test, there should be a button to rerun the tests. Usually I can trigger reruns, but it does not work this time. Not sure why.

@Snorch
Member Author

Snorch commented Oct 25, 2021

I've rebased and pushed, but just re-running the CircleCI test I kept seeing the error again and again...

@adrianreber
Member

> I've rebased and pushed, but just re-running the CircleCI test I kept seeing the error again and again...

It is really strange. It only happens on this PR but seems unrelated to this PR.

@Snorch
Member Author

Snorch commented Oct 25, 2021

Not only this one:

> I've rebased and pushed, but just re-running the CircleCI test I kept seeing the error again and again...

> It is really strange. It only happens on this PR but seems unrelated to this PR.

And #1628 as well. Both mine, obviously I'm damn lucky =)

(upd: let me try re-running rebased on master)

We want to also c/r socket buf locks (SO_BUF_LOCKS), which are also
implicitly set by setsockopt(SO_{SND,RCV}BUF*), so we need to order
these two properly. That's why we need to wait for sk_setbufs to finish.
And there is not much point in setting buffer sizes asynchronously anyway.

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
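
To illustrate the ordering described in the commit message above: buffer sizes must be restored before the lock flags, because setting the sizes re-sets the flags. A minimal sketch (the helper name and the SO_BUF_LOCK define are assumptions, not CRIU's actual restore code):

```c
/* Hedged sketch of the restore-side ordering, not CRIU's actual code:
 * restoring SO_{SND,RCV}BUF itself sets the lock flags, so the dumped
 * SO_BUF_LOCK value has to be applied afterwards, or it would be lost. */
#include <sys/socket.h>

#ifndef SO_BUF_LOCK
#define SO_BUF_LOCK 72 /* assumed value, as in the kernel patch */
#endif

static int restore_bufs_then_locks(int sk, int sndbuf, int rcvbuf, int locks)
{
	/* 1. Buffer sizes (this implicitly sets both lock flags). */
	if (setsockopt(sk, SOL_SOCKET, SO_SNDBUFFORCE, &sndbuf, sizeof(sndbuf)) ||
	    setsockopt(sk, SOL_SOCKET, SO_RCVBUFFORCE, &rcvbuf, sizeof(rcvbuf)))
		return -1;

	/* 2. The dumped sk_userlocks bits, possibly clearing the flags
	 *    again so the kernel can keep auto-adjusting the buffers. */
	return setsockopt(sk, SOL_SOCKET, SO_BUF_LOCK, &locks, sizeof(locks));
}
```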
This is a new kernel feature to let criu restore sockets with kernel
auto-adjusted buffer sizes.

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
When one sets socket buffer sizes with setsockopt(SO_{SND,RCV}BUF*), the
kernel sets the corresponding SOCK_SNDBUF_LOCK or SOCK_RCVBUF_LOCK flag on
struct sock. It means that such a socket with an explicitly changed buffer
size cannot be auto-adjusted by the kernel (e.g. if there is free memory
the kernel can auto-increase default socket buffers to improve performance).
(see tcp_fixup_rcvbuf() and tcp_sndbuf_expand())

CRIU always changes buf sizes on restore, which means that all
sockets receive lock flags on struct sock and become non-auto-adjusted
after migration. In some cases it can decrease the performance of network
connections quite a lot.

So let's c/r socket buf locks (SO_BUF_LOCKS), so that sockets for which
auto-adjustment is available do not lose it.

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Just set all possible values 0-3 and check whether each persists (see the sketch after the commit messages below).

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
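
To illustrate the test commit above ("set all possible values 0-3 and check whether each persists"), here is a standalone sketch, not the actual zdtm test; in the real test the read-back check happens after criu dump/restore, and SO_BUF_LOCK is an assumed define:

```c
/* A standalone illustration of the test idea, not the actual zdtm test:
 * for each possible value of the two lock bits (0-3), set it through
 * SO_BUF_LOCK and check it reads back unchanged. */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_BUF_LOCK
#define SO_BUF_LOCK 72 /* assumed value, as in the kernel patch */
#endif

int main(void)
{
	int val;

	for (val = 0; val <= 3; val++) {
		int sk = socket(AF_INET, SOCK_STREAM, 0);
		int got = -1;
		socklen_t len = sizeof(got);

		if (sk < 0)
			return 1;

		if (setsockopt(sk, SOL_SOCKET, SO_BUF_LOCK, &val, sizeof(val)) ||
		    getsockopt(sk, SOL_SOCKET, SO_BUF_LOCK, &got, &len) ||
		    got != val) {
			printf("FAIL: set %d, got back %d\n", val, got);
			close(sk);
			return 1;
		}
		close(sk);
	}
	printf("PASS\n");
	return 0;
}
```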
@adrianreber
Member

> Not only this one:
>
> I've rebased and pushed, but just re-running the CircleCI test I kept seeing the error again and again...
>
> It is really strange. It only happens on this PR but seems unrelated to this PR.
>
> And #1628 as well. Both mine, obviously I'm damn lucky =)
>
> (upd: let me try re-running rebased on master)

For some reason your runs have 'EC2: true' and you are running on much newer CPUs. It works when running on Google Compute Engine but it fails when running on EC2:

-BOOT_IMAGE=/boot/vmlinuz-5.4.0-1025-aws root=PARTUUID=ea9aeff6-01 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
+BOOT_IMAGE=/boot/vmlinuz-5.4.0-1021-gcp root=PARTUUID=2707a646-6a60-476b-805c-ac3388be9375 ro console=ttyS0 panic=-1
-Vendor ID:                       GenuineIntel
-CPU family:                      6
-Model:                           85
-Model name:                      Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
-Stepping:                        4
-CPU MHz:                         3104.109
-BogoMIPS:                        4999.99
-Hypervisor vendor:               KVM
-Virtualization type:             full
+Vendor ID:                       GenuineIntel
+CPU family:                      6
+Model:                           63
+Model name:                      Intel(R) Xeon(R) CPU @ 2.30GHz
+Stepping:                        0
+CPU MHz:                         2299.998
+BogoMIPS:                        4599.99
+Hypervisor vendor:               KVM
+Virtualization type:             full

The first one is always from the logs of your CI run. Do you have anything specially configured for Circle CI?

@adrianreber
Member

With the newer CPUs the test injection fails for some reason.

@Snorch
Member Author

Snorch commented Oct 25, 2021

> Do you have anything specially configured for Circle CI?

I believe I did nothing special in my Circle CI setup... Probably they just attach the same node for reruns, and if a PR was lucky enough to get a new CPU on the first test run it would continue running on it, just guessing...

e.g. my last PR passed all tests: #1631

@adrianreber
Member

@mihalicyn your review has been requested. Can you please take a look?

@avagin You also requested changes. Is the PR ready from your point of view?

I also see an LGTM from @0x7f454c46.

I would say almost ready to be merged.

@avagin avagin merged commit b4cc856 into checkpoint-restore:criu-dev Nov 8, 2021
@Snorch
Member Author

Snorch commented Nov 10, 2021

Thanks!
