New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probable Akka HTTP bug in server side WebSockets handling #1515

Closed
jroper opened this Issue Nov 6, 2017 · 4 comments

Comments

Projects
None yet
3 participants
@jroper
Contributor

jroper commented Nov 6, 2017

Lagom is seeing a lot of these errors in its integration tests when using WebSockets:

2017-11-06 12:56:54,422 DEBUG akka.actor.RepointableActorRef - Aborting tcp connection to /127.0.0.1:42630 because of upstream failure: requirement failed: Cannot push port (SubSource.out(326501899)) twice
scala.Predef$.require(Predef.scala:224)
akka.stream.stage.GraphStageLogic.push(GraphStage.scala:548)
akka.stream.impl.fusing.SubSource$$anon$4$$anonfun$1.apply(StreamOfStreams.scala:724)
akka.stream.impl.fusing.SubSource$$anon$4$$anonfun$1.apply(StreamOfStreams.scala:721)
akka.stream.impl.fusing.GraphInterpreter.runAsyncInput(GraphInterpreter.scala:443)
akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:453)
akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:546)
akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:728)
akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$shortCircuitBatch(ActorGraphInterpreter.scala:718)
akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:744)
akka.actor.Actor$class.aroundReceive(Actor.scala:517)
akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:653)
akka.actor.ActorCell.receiveMessage(ActorCell.scala:527)
akka.actor.ActorCell.invoke(ActorCell.scala:496)
akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
akka.dispatch.Mailbox.run(Mailbox.scala:224)
akka.dispatch.Mailbox.exec(Mailbox.scala:234)
akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

The problem with this stack trace is that we don't know what invoked SubSourceOutlet.push due to the asynchronous boundary between this method and the SubSource handling. However, neither Play nor Lagom use any SubSourceOutlet's (it's supposed to be an internal Akka streams API anyway), so the most likely cause of this is something wrong in Akka HTTP's WebSocket support which does create a SubSourceOutlet. I can't think of any other way this error could end up happening.

Also note that because SubSource just creates a generic name SubSource.out for its outlet, there's no way to map this error back to the original source. I'll raise a bug about that.

More details about the bug are reported in lagom/lagom#1054.

These tests are doing a lot of starting up WebSockets and then terminating them immediately, so it could be a race condition on startup.

@jroper

This comment has been minimized.

Show comment
Hide comment
@jroper

jroper Nov 6, 2017

Contributor

There does seem to be a race condition in HttpServerBluePrint.scala.

fromNet is pulled whenever toHttp receives a pull. There is no reason why it can't have been pulled before switchToWebSocket is invoked, whatever is handling HTTP requests might pull many times before emitting an HTTP response, plus even if only one HTTP request is handled at a time, there's still a possible race condition between when the response is handled and when the fromNet is pulled. So then the sub source is connected, but it may not pull immediately, and even if it does, if fromNet pushes before it's able to execute that pull, then you'll get the above exception, due to this line not checking if sourceOut is available before pushing to it:

override def onPush(): Unit = sourceOut.push(grab(fromNet).bytes)

Essentially, we have a race condition between a client sending its first web socket frame, and the web socket handler pulling for it.

Contributor

jroper commented Nov 6, 2017

There does seem to be a race condition in HttpServerBluePrint.scala.

fromNet is pulled whenever toHttp receives a pull. There is no reason why it can't have been pulled before switchToWebSocket is invoked, whatever is handling HTTP requests might pull many times before emitting an HTTP response, plus even if only one HTTP request is handled at a time, there's still a possible race condition between when the response is handled and when the fromNet is pulled. So then the sub source is connected, but it may not pull immediately, and even if it does, if fromNet pushes before it's able to execute that pull, then you'll get the above exception, due to this line not checking if sourceOut is available before pushing to it:

override def onPush(): Unit = sourceOut.push(grab(fromNet).bytes)

Essentially, we have a race condition between a client sending its first web socket frame, and the web socket handler pulling for it.

@jroper jroper referenced this issue Nov 6, 2017

Merged

Sets Akka-HTTP as default backend #1054

6 of 7 tasks complete

jroper added a commit to jroper/akka-http that referenced this issue Nov 6, 2017

Fix race condition in WebSocket switch
Fixes akka#1515

Ensures that we only push to the WebSocket handling flow if it has first
pulled.
@jroper

This comment has been minimized.

Show comment
Hide comment
@jroper

jroper Nov 6, 2017

Contributor

Fixed in #1516.

Contributor

jroper commented Nov 6, 2017

Fixed in #1516.

@jrudolph

This comment has been minimized.

Show comment
Hide comment
@jrudolph

jrudolph Nov 6, 2017

Member

I created another ticket over there akka/akka#23929 to track the bad stack trace. There doesn't seem to be a reason to fail eagerly with more context.

Member

jrudolph commented Nov 6, 2017

I created another ticket over there akka/akka#23929 to track the bad stack trace. There doesn't seem to be a reason to fail eagerly with more context.

@dispalt

This comment has been minimized.

Show comment
Hide comment
@dispalt

dispalt Dec 6, 2017

@jroper thank you this fixed a big problem for me!

dispalt commented Dec 6, 2017

@jroper thank you this fixed a big problem for me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment