-
Notifications
You must be signed in to change notification settings - Fork 719
propagate socket errors on .reset #325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Sources/NIO/BaseSocketChannel.swift
Outdated
| self.close0(error: ChannelError.eof, mode: .all, promise: nil) | ||
|
|
||
| if self.lifecycleManager.isRegistered { | ||
| // if the socket is still open, let's try to get the actual socket error from the socket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weissi hmm this comment is confusing as we not really check for open but for registered above. Can you clarify ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. We close the socket when unregistering it but I should probably do something like this:
if self.socket.isOpen {
assert(self.lifecycleManager.isRegistered)
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
ff4e63b to
1e960ff
Compare
Sources/NIO/BaseSocketChannel.swift
Outdated
|
|
||
| if self.lifecycleManager.isRegistered { | ||
| // if the socket is still open, let's try to get the actual socket error from the socket | ||
| let result: Int32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be declared outside the do? It isn't used elsewhere.
| } | ||
| } catch { | ||
| self.close0(error: error, mode: .all, promise: nil) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code could be cleaned up, I think, either by stack allocating an existential Error that each branch sets, and then calls close0 with it, or by having the top two branches stack allocate a concrete IOError.
Either way we should aim to have fewer of these calls to close0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reducing to 2 as we chatted about
Sources/NIO/BaseSocketChannel.swift
Outdated
| if self.socket.isOpen { | ||
| assert(self.lifecycleManager.isRegistered) | ||
| // if the socket is still open, let's try to get the actual socket error from the socket | ||
| let result: Int32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need to be declared outside the block, it's not used outside it.
Tests/NIOTests/ChannelTests.swift
Outdated
| }) | ||
| return sock | ||
| } | ||
| let serverSockAddress = serverSockSyncQ.sync { try! serverSock.localAddress() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two synchronous queue operations appear to be unnecessary.
Tests/NIOTests/ChannelTests.swift
Outdated
| let g = DispatchGroup() | ||
| let q = DispatchQueue(label: "q") | ||
| do { | ||
| // spawn 64 connects, likely some will already get ECONNREFUSED as the backlog isn't large enough |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: think you're looking for ECONNRESET here.
Tests/NIOTests/ChannelTests.swift
Outdated
| var pfd = pollfd(fd: fd, events: Int16(POLLIN | POLLERR | POLLHUP), revents: 0) | ||
| // wait a maximum of 100ms to accept a new client, just to be guaranteed we don't get stuck here. | ||
| if poll(&pfd, 1, 100) == 1 { | ||
| return accept(fd, nil, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to just block in accept here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that would hang forever so we need the 100ms timeout here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what circumstance will it hang forever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lukasa well, it calls accept on an open socket. That would only ever return if there was someone connecting. But if there are no further connections (remember the 64 client connections each might either fail or (at least partially) succeed)
Tests/NIOTests/ChannelTests.swift
Outdated
| if let accepted = accepted { | ||
| // set SO_LINGER to 0, which will cause most TCP stacks to send an RST packet | ||
| var lin = linger(l_onoff: 1, l_linger: 0) | ||
| setsockopt(accepted, SOL_SOCKET, SO_LINGER, &lin, socklen_t(MemoryLayout.size(ofValue: lin))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If SO_LINGER can get a RST packet, why do we need 64 connections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lukasa the SO_LINGER works well for Darwin, the 64 connections works well for Linux ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, can you update the comments to make that clear?
1e960ff to
c88dbf4
Compare
|
@Lukasa feedback addressed |
| if result != 0 { | ||
| // we have a socket error, let's forward | ||
| // this path will be executed on Linux (EPOLLERR) & Darwin (ev.fflags != 0) | ||
| error = IOError(errnoCode: result, reason: "connection reset (error set)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than passing reason can you pass function? That'll give us better debug strings here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lukasa there's no function that errors here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just make one up. ;) The key advantage is that we get the textual description of the underlying errno, rather than your hard-coded guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's not how it's intended. The reason is there so it gives you a better message. I acknowledge that we had a bug here where that was worse. In case where you don't want to provide a reason as it would be "connect failed, errno \(errno)", you can use function: #function as that's shorter.
Tests/NIOTests/ChannelTests.swift
Outdated
| break | ||
| } catch { | ||
| XCTFail("unexpected error \(error)") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my question: is there any reason we can't do this with a regular server socket channel? Create it, set the child channel options, turn off auto-read so that we only accept a maximum of one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initially I just closed that socket from elsewhere so I needed to tolerate EBADF. I have now changed that so possibly I could go back to a ServerSocketChannel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lukasa I actually will scrap this whole test. It just isn't causing the ECONNRESET reliably enough on all the platforms and there's always room for random failure of the test (unless I don't assert that at least one actually gets ECONNRESET). But tbh I think pretty much everything is covered by the ECONNREFUSED test already...
Tests/NIOTests/ChannelTests.swift
Outdated
| // we deliberately don't set SO_REUSEADDR | ||
| XCTAssertNoThrow(try serverSock.bind(to: SocketAddress(ipAddress: "127.0.0.1", port: 0))) | ||
| XCTAssertNoThrow(try serverSock.withUnsafeFileDescriptor { fd in | ||
| try Posix.listen(descriptor: fd, backlog: 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you calling listen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, actually don't need that
adf7d72 to
0b91305
Compare
| let serverSockAddress = try! serverSock.localAddress() | ||
| XCTAssertNoThrow(try serverSock.close()) | ||
|
|
||
| for _ in 0..<64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the loop? We fail in all cases other than the throw, which suggests we always throw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lukasa in my tests it made the probability that we also hit the async connect path a lot higher (as in pretty much always). Therefore I thought that's a good idea :)
0b91305 to
640cf11
Compare
Lukasa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Motivation: Since apple#286, we do eagerly detect connection resets which is good. To resolve the situation we first try to read everything off the socket. If something fails there, everything is good and the client gets the correct error. If that however doesn't lead to any errors, we would close the connection with `error: ChannelError.eof` which the client would then see for example on a connect that happened to get `ECONNRESET` or `ECONNREFUSED`. That's not right so these situations are fixed here. Modifications: - on reset, try to get the socket error if any - if no socket error was detected, we send a `ECONNRESET` because we know we have been reset but not why and didn't encounter any errors - write tests for these situations Result: connection resets that don't hit any other errors will now see the correct error
640cf11 to
7a048cc
Compare
Motivation:
Since #286, we do eagerly detect connection resets which is good. To
resolve the situation we first try to read everything off the socket. If
something fails there, everything is good and the client gets the
correct error. If that however doesn't lead to any errors, we would
close the connection with
error: ChannelError.eofwhich the clientwould then see for example on a connect that happened to get
ECONNRESETorECONNREFUSED. That's not right so these situations arefixed here.
Modifications:
ECONNRESETbecause weknow we have been reset but not why and didn't encounter any errors
Result:
connection resets that don't hit any other errors will now see the
correct error