Fix incorrect reading of somaxconn for tcp backlog on linux #2430

Merged
merged 1 commit into from Jul 12, 2018

nickbabcock commented Jul 12, 2018

Problem:

Dropwizard drops connections during a burst of short-lived connections because the accept queue is set to a very small value (most commonly 1 or 2). This happens because Files.readAllBytes is unable to work with pseudo-files like somaxconn.

Solution:

Adopt Netty's implementation

Result:

Tests pass and Dropwizard should be more burst-resistant. Closes #2429

Dropwizard users currently affected by this bug can override the default behavior by specifying an acceptQueueSize in their configuration.
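As a rough illustration of the approach named under "Solution", the value can be read line by line with a buffered reader instead of Files.readAllBytes. This is only a minimal sketch loosely modeled on that idea, not the code merged in this PR: the class name `SomaxconnReader`, the method `readBacklog`, and the fallback value of 128 are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not the class used by Dropwizard or Netty.
final class SomaxconnReader {
    // Assumed fallback for this sketch, used when the file is missing or unparsable.
    static final int DEFAULT_BACKLOG = 128;

    // Reads the first line of the given file and parses it as an integer.
    // Returns the fallback if the file cannot be read or does not hold a number.
    static int readBacklog(Path file, int fallback) {
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.US_ASCII)) {
            String line = in.readLine();
            return line == null ? fallback : Integer.parseInt(line.trim());
        } catch (IOException | NumberFormatException | SecurityException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // On Linux the limit is exposed through this pseudo-file.
        Path somaxconn = Paths.get("/proc/sys/net/core/somaxconn");
        System.out.println(readBacklog(somaxconn, DEFAULT_BACKLOG));
    }
}
```

On systems without the pseudo-file (for example macOS or Windows), the sketch simply returns the fallback.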

@jplock

jplock approved these changes Jul 12, 2018

@jplock jplock added the bug label Jul 12, 2018

@jplock jplock added this to the 2.0.0 milestone Jul 12, 2018

@joschi

joschi approved these changes Jul 12, 2018

@joschi joschi merged commit 49e961c into dropwizard:master Jul 12, 2018

2 of 3 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
ci/circleci Your tests passed on CircleCI!
continuous-integration/appveyor/pr AppVeyor build succeeded
OndraZizka commented Jul 13, 2018

Just a note: you say that affected users can override it in the settings. However, the system limit still applies, so users who configure a value beyond that limit will not effectively get a larger queue, AFAICT. Maybe a WARNING could be issued when a larger value is used?

To actually change the limit on Linux, the user needs to run:

sysctl -w net.core.somaxconn=2048

This is typically added to /etc/rc.local (e.g. via sudo nano /etc/rc.local) so that it is reapplied on boot.
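The warning suggested in this comment could look roughly like the sketch below. It relies on the fact that Linux silently caps a listen backlog at net.core.somaxconn. The class `BacklogCheck` and the method `effectiveBacklog` are hypothetical names for illustration and are not part of Dropwizard or Jetty.

```java
import java.util.logging.Logger;

// Hypothetical helper for illustration; not an existing Dropwizard or Jetty API.
final class BacklogCheck {
    private static final Logger LOG = Logger.getLogger(BacklogCheck.class.getName());

    // Returns the backlog the kernel would effectively honor and logs a
    // warning when the requested value exceeds the system limit.
    static int effectiveBacklog(int requested, int systemLimit) {
        if (requested > systemLimit) {
            LOG.warning("acceptQueueSize " + requested
                    + " exceeds net.core.somaxconn (" + systemLimit
                    + "); the kernel will cap the accept queue at the lower value");
            return systemLimit;
        }
        return requested;
    }

    public static void main(String[] args) {
        // Example: a configured value of 2048 against a system limit of 128.
        System.out.println(effectiveBacklog(2048, 128));
    }
}
```

The system limit passed in here would have to come from reading somaxconn at startup, so the check is only meaningful on Linux.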

OndraZizka commented Jul 13, 2018

I suspect there is still another bug in Jetty in how it handles sockets with refused connections. Setting a higher accept queue size will make it close to impossible to hit in real-world scenarios, though.

When we hit this issue, we see connection timeouts on the client side after 60 seconds. If the problem were only a full accept queue, the client would get the rejection and fail the connection right away. But it does not; instead, it keeps waiting. I think what is happening is described here: https://blog.cloudflare.com/this-is-strictly-a-violation-of-the-tcp-specification/
In short, Jetty doesn't close() the rejected connections properly, which blocks the next reuse of the client port.

OndraZizka commented Jul 13, 2018

Or maybe not. Petr Kacer said he checked with ss and there were no connections pending in CLOSE_WAIT.

arteam added a commit that referenced this pull request Oct 2, 2018

Fix incorrect reading of somaxconn for tcp backlog on linux (#2430)
###### Problem:
Dropwizard drops connections during a burst of short-lived connections because the accept queue is set to a very small value (most commonly 1 or 2). This happens because `Files.readAllBytes` is unable to work with pseudo-files like somaxconn.

###### Solution:
[Adopt Netty's implementation](https://github.com/netty/netty/blob/77ec8397927e3ceb9b9a447a74e718f625ed9976/common/src/main/java/io/netty/util/NetUtil.java#L261-L269)

###### Result:
Test passed and dropwizard should be more burst resistant. Closes #2429

Dropwizard users currently affected by this bug can override the default behavior by specifying an `acceptQueueSize` in their configuration.

rhowe-gds added a commit to alphagov/pay-connector that referenced this pull request Jan 13, 2019

BAU: Upgrade dropwizard from 1.3.5 to 1.3.8
Release notes: https://www.dropwizard.io/1.3.8/docs/about/release-notes.html

1.3.6 fixes a DoS issue in Jackson:
      FasterXML/jackson-databind#2141

1.3.7 fixes incorrect reading of somaxconn on Linux:
      dropwizard/dropwizard#2430

1.3.8 upgrades Guava to fix a DoS (CVE-2018-10237)

rhowe-gds added a commit to alphagov/pay-connector that referenced this pull request Jan 15, 2019

PP-4627: Upgrade dropwizard from 1.3.5 to 1.3.8
Release notes: https://www.dropwizard.io/1.3.8/docs/about/release-notes.html

1.3.6 fixes a DoS issue in Jackson:
      FasterXML/jackson-databind#2141

1.3.7 fixes incorrect reading of somaxconn on Linux:
      dropwizard/dropwizard#2430

1.3.8 upgrades Guava to fix a DoS (CVE-2018-10237)