don't crash when Channel goes inactive in read triggered by write error #594

weissi · 2018-08-23T15:29:24Z

Motivation:

Previously we asserted that a Channel cannot go inactive after calling
out for a read that was triggered by (draining the receive buffer after)
a write error.

This assertion was put in place to guard against readComplete events
sent on inactive channels. It did that job just fine but crashes aren't
great so we now conditionally fire the readComplete event if the
Channel stays active.

Modifications:

make the readComplete event firing conditional

Result:

fewer crashes, more happy faces

weissi · 2018-08-23T15:30:04Z

Sources/NIO/ChannelPipeline.swift

    var error: ChannelError {
        switch self {
        case .all:
-            return .alreadyClosed
+            return .ioOnClosedChannel


this was just the wrong error here, .alreadyClosed means close on previously closed channel. For all others, .ioOnClosedChannel is right

vlm · 2018-08-23T16:38:18Z

Sources/NIO/BaseSocketChannel.swift

-                    assert(self.lifecycleManager.isActive)
-                    pipeline.fireChannelReadComplete()
+                    if self.lifecycleManager.isActive {
+                        pipeline.fireChannelReadComplete()


No race here?

No, this is thread-safe.

I think a more important question is why we fire readComplete more than once. I feel like we shouldn't be doing that: we should probably move the fireChannelReadComplete outside the while loop.

@Lukasa I have no idea either why we're doing that but left it because it was that way. But let me just change it

@vlm thread-safety is fine here because we're on the Channel's event loop. So (unless we call out), the value of self.lifecycleManager.isActive can't change. So if it's still active after readFromSocket returns, it'll remain active until we return or call out again.

@Lukasa done

Motivation: Previously we asserted that a Channel cannot go inactive after calling out for a read that was triggered by (draining the receive buffer after) a write error. This assertion was put in place to guard against `readComplete` events sent on inactive channels. It did that job just fine but crashes aren't great so we now conditionally fire the `readComplete` event if the Channel stays active. Modifications: make the readComplete event firing conditional Result: fewer crashes, more happy faces

normanmaurer

LGTM

weissi · 2018-08-24T15:57:42Z

CI seriously unhappy: https://ci.swiftnio.io/job/swift-nio-master-410-prb/519/console

11:46:21 hudson.plugins.git.GitException: Command "git clean -fdx" returned status code 1:
11:46:21 stdout: Skipping repository .build/checkouts/swift-nio-zlib-support.git--8552607645869553644
11:46:21 Skipping repository .build/checkouts/swift-nio-zlib-support.git-914925017792929693
11:46:21 Skipping repository .build/checkouts/swift-nio-zlib-support.git--5201548940310384603
11:46:21 Skipping repository .build/checkouts/swift-nio-zlib-support.git-7278503767583922314
11:46:21 Skipping repository .build/checkouts/swift-nio-zlib-support.git-1529492468280462262
11:46:21 Skipping repository .build/checkouts/swift-nio-zlib-support.git-2261880114035040650
11:46:21 Skipping repository .build/checkouts/swift-nio-zlib-support.git--3716276613919170701
11:46:21 Skipping repository .build/checkouts/swift-nio-zlib-support.git-9184634203830310054
11:46:21 Skipping repository .build/checkouts/swift-nio-zlib-support.git-5975790777597269952
11:46:21 Skipping repository .build/checkouts/swift-nio-zlib-support.git-9137596717834041680
[...]
11:46:21 Skipping repository .build/checkouts/swift-nio-zlib-support.git-7681731979200588851
11:46:21 Skipping repository .build/checkouts/swift-nio-zlib-support.git--6741020296262781494
11:46:21 Skipping repository .build/checkouts/swift-nio-zlib-support.git--388884417884296308
11:46:21 Skipping repository .build/checkouts/swift-nio-zlib-support.git--1868812526729017302
11:46:21 Removing .build/debug
11:46:21 Removing .build/junit-sh-tests.xml
11:46:21 Removing .build/debug.yaml
11:46:21 Removing .build/build.db
11:46:21 Removing .build/dependencies-state.json
11:46:21 Removing .build/regenerate-token
11:46:21 Removing Package.resolved
11:46:21 
11:46:21 stderr: warning: failed to remove .build/repositories
11:46:21 warning: failed to remove .build/x86_64-unknown-linux
11:46:21 
11:46:21 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1996)
11:46:21 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1964)
11:46:21 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1960)
11:46:21 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1597)
11:46:21 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1609)
11:46:21 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.clean(CliGitAPIImpl.java:787)
11:46:21 	at hudson.plugins.git.GitAPI.clean(GitAPI.java:311)
11:46:21 	at hudson.plugins.git.extensions.impl.CleanCheckout.onCheckoutCompleted(CleanCheckout.java:30)
11:46:21 	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1238)
11:46:21 	at hudson.scm.SCM.checkout(SCM.java:504)
11:46:21 	at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
11:46:21 	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
11:46:21 	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
11:46:21 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
11:46:21 	at hudson.model.Run.execute(Run.java:1727)
11:46:21 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
11:46:21 	at hudson.model.ResourceController.execute(ResourceController.java:97)
11:46:21 	at hudson.model.Executor.run(Executor.java:429)
11:46:21 [Set GitHub commit status (universal)] ERROR on repos [] (sha:eee3aa3) with context:swift-nio-master-410-prb

weissi · 2018-08-24T15:57:46Z

@tomerd ^^^

weissi · 2018-08-24T15:57:52Z

@swift-nio-bot test this please

tomerd · 2018-08-24T16:14:46Z

yea looks like ci broke, will look into it

weissi · 2018-08-24T16:46:38Z

thanks @tomerd

tomerd · 2018-08-24T18:30:10Z

ugh, the ebs volume got detached. working to fix

tomerd · 2018-08-24T18:37:25Z

@swift-nio-bot test this please

tomerd · 2018-08-24T18:39:02Z

@weissi should be okay now, monitoring too

weissi · 2018-08-24T18:48:13Z

awesome, thanks so much @tomerd

weissi · 2018-08-27T10:00:45Z

also hit the confused kernel

18:46:36 ++ lsof -a -d 0-1024 -p 5196
18:46:36 COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
18:46:36 NIOHTTP1S 5196 root    0r   CHR    1,3      0t0   88849 /dev/null
18:46:36 NIOHTTP1S 5196 root    1w   REG 202,80    10106 2101813 /tmp/.swift-nio-http1-server-sh-tests_pB8Eyp/test.out_VT1Hgb
18:46:36 NIOHTTP1S 5196 root    2w   REG 202,80    10106 2101813 /tmp/.swift-nio-http1-server-sh-tests_pB8Eyp/test.out_VT1Hgb
18:46:36 NIOHTTP1S 5196 root    3w  FIFO    0,8      0t0   88578 pipe
18:46:36 NIOHTTP1S 5196 root    4w  FIFO    0,8      0t0   88579 pipe
18:46:36 NIOHTTP1S 5196 root    5u  0000    0,9        0    5986 anon_inode
18:46:36 NIOHTTP1S 5196 root    6u  0000    0,9        0    5986 anon_inode
18:46:36 NIOHTTP1S 5196 root    7u  0000    0,9        0    5986 anon_inode
18:46:36 NIOHTTP1S 5196 root    8u  0000    0,9        0    5986 anon_inode
18:46:36 NIOHTTP1S 5196 root    9u  0000    0,9        0    5986 anon_inode
18:46:36 NIOHTTP1S 5196 root   10u  0000    0,9        0    5986 anon_inode
18:46:36 NIOHTTP1S 5196 root   11u  IPv4 202998      0t0     TCP localhost:36302 (LISTEN)
18:46:36 NIOHTTP1S 5196 root   12u  sock    0,7      0t0  202631 can't identify protocol
18:46:36 ++ do_netstat inet
18:46:36 ++ pf=inet
18:46:36 ++ netstat_options=()
18:46:36 ++ case "$(uname -s)" in
18:46:36 +++ uname -s
18:46:36 ++ netstat_options+=("-A" "$pf" -p)
18:46:36 ++ netstat -an -A inet -p
18:46:36 Active Internet connections (servers and established)
18:46:36 Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
18:46:36 tcp        0      0 127.0.0.1:36302         0.0.0.0:*               LISTEN      5196/NIOHTTP1Server
18:46:36 tcp        0      0 127.0.0.11:46673        0.0.0.0:*               LISTEN      -               
18:46:36 tcp        0      0 127.0.0.1:37096         127.0.0.1:43894         TIME_WAIT   -               
18:46:36 tcp        0      0 127.0.0.1:37092         127.0.0.1:43894         TIME_WAIT   -               
18:46:36 tcp        0      0 127.0.0.1:47122         127.0.0.1:36302         TIME_WAIT   -               
18:46:36 tcp        0      0 127.0.0.1:47124         127.0.0.1:36302         TIME_WAIT   -               
18:46:36 tcp        0      0 127.0.0.1:37094         127.0.0.1:43894         TIME_WAIT   -               
18:46:36 udp        0      0 127.0.0.11:55168        0.0.0.0:*                           -               
18:46:36 ++ local open_fds
18:46:36 +++ server_lsof 5196

which is #563

weissi · 2018-08-27T10:00:52Z

@swift-nio-bot test this please

…or (#594) Motivation: Previously we asserted that a Channel cannot go inactive after calling out for a read that was triggered by (draining the receive buffer after) a write error. This assertion was put in place to guard against `readComplete` events sent on inactive channels. It did that job just fine but crashes aren't great so we now conditionally fire the `readComplete` event if the Channel stays active. Modifications: make the readComplete event firing conditional Result: fewer crashes, more happy faces Motivation: Explain here the context, and why you're making that change. What is the problem you're trying to solve. Modifications: Describe the modifications you've done. Result: After your change, what will change.

weissi requested review from Lukasa and normanmaurer and removed request for Lukasa August 23, 2018 15:29

weissi commented Aug 23, 2018

View reviewed changes

weissi force-pushed the jw-chan-inactive-write-error branch from 1fc982b to 30fce55 Compare August 23, 2018 15:41

weissi mentioned this pull request Aug 23, 2018

we should drain the receive buffer on write error even if auto-read is off (probably) #593

Open

weissi force-pushed the jw-chan-inactive-write-error branch 2 times, most recently from 06c4eae to 271b6fd Compare August 23, 2018 16:32

vlm reviewed Aug 23, 2018

View reviewed changes

weissi force-pushed the jw-chan-inactive-write-error branch from 271b6fd to 57eb1f4 Compare August 23, 2018 17:37

weissi added the patch-version-bump-only For PRs that when merged will only cause a bump of the patch version, ie. 1.0.x -> 1.0.(x+1) label Aug 23, 2018

normanmaurer approved these changes Aug 24, 2018

View reviewed changes

Merge branch 'master' into jw-chan-inactive-write-error

0d8dd37

weissi added this to the 1.9.3 milestone Aug 24, 2018

Merge branch 'master' into jw-chan-inactive-write-error

e4ba36b

weissi merged commit df764fe into apple:master Aug 27, 2018

weissi deleted the jw-chan-inactive-write-error branch August 27, 2018 10:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

don't crash when Channel goes inactive in read triggered by write error #594

don't crash when Channel goes inactive in read triggered by write error #594

weissi commented Aug 23, 2018

weissi Aug 23, 2018

vlm Aug 23, 2018

Lukasa Aug 23, 2018

weissi Aug 23, 2018

weissi Aug 23, 2018

weissi Aug 23, 2018

normanmaurer left a comment

weissi commented Aug 24, 2018

weissi commented Aug 24, 2018

weissi commented Aug 24, 2018

tomerd commented Aug 24, 2018

weissi commented Aug 24, 2018

tomerd commented Aug 24, 2018

tomerd commented Aug 24, 2018

tomerd commented Aug 24, 2018

weissi commented Aug 24, 2018

weissi commented Aug 27, 2018

weissi commented Aug 27, 2018

don't crash when Channel goes inactive in read triggered by write error #594

don't crash when Channel goes inactive in read triggered by write error #594

Conversation

weissi commented Aug 23, 2018

weissi Aug 23, 2018

Choose a reason for hiding this comment

vlm Aug 23, 2018

Choose a reason for hiding this comment

Lukasa Aug 23, 2018

Choose a reason for hiding this comment

weissi Aug 23, 2018

Choose a reason for hiding this comment

weissi Aug 23, 2018

Choose a reason for hiding this comment

weissi Aug 23, 2018

Choose a reason for hiding this comment

normanmaurer left a comment

Choose a reason for hiding this comment

weissi commented Aug 24, 2018

weissi commented Aug 24, 2018

weissi commented Aug 24, 2018

tomerd commented Aug 24, 2018

weissi commented Aug 24, 2018

tomerd commented Aug 24, 2018

tomerd commented Aug 24, 2018

tomerd commented Aug 24, 2018

weissi commented Aug 24, 2018

weissi commented Aug 27, 2018

weissi commented Aug 27, 2018