Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Requests with Large POST Body Immediately Goes into Delay Cancellation Right After the Server Received the Request #460

Closed
raphaelNguyen opened this issue Jan 30, 2024 · 9 comments
Labels
question Further information is requested

Comments

@raphaelNguyen
Copy link

raphaelNguyen commented Jan 30, 2024

Environments

  • pekko-http:1.0.0
  • pekko:1.0.1
  • play-framework:3.0.0

Issue also observed with the following dependencies

  • akka-http:10.2.10
  • akka:2.6.21
  • play-framework:2.9.0

Issue description

I use pekko and pekko-http via play-framework and the following issue seems to only be reproducible when I receive the request via my nginx. If I hit my server through localhost:<server-port> with the same large request, the request does not suffers the same issue.

When server receives a request with a large POST body (~18400 bytes or more) from nginx, the request immediately goes into delayed cancellation after the request was received with the below debug log. Some debugging shows that this happens before the request is routed to the corresponding play Action and gets processed. If the server takes longer than the linger-timeout (default of 1 minute) to process the request, the connection is cancelled by the server.

[DEBUG] o.a.p.h.i.u.StreamUtils$DelayCancellationStage$$anon$11: Delaying cancellation for 1 minute

In the past, I've encountered these symptoms when using play-framework:2.8.x (akka:2.6.x and akka-http:10.1.x). At the time, I found that upgrading akka-http to 10.2.1 fixed the issue. However, after the upgrading to playframework:3.0.0 and adopting pekko, the same symptoms can be observed again. I've confirmed that the issue also happen on playframework:2.9.0 and its akka versions (listed above).

I'm not sure if this issue has been reintroduced in akka-http:10.2.10 and pekko has inheritted it and would like to ask for your assistance.

Debug Information

While I do not have a good minimum reproducible case to report, I've obtained some debug information I hope could help in figuring out the issue.
These logs contains the information that would have been printed by the following lines...

https://github.com/apache/incubator-pekko/blob/cfff9c53df859bb0f4407caf4821e7831dabeb19/stream/src/main/scala/org/apache/pekko/stream/impl/fusing/GraphInterpreter.scala#L551-L553

  println(
    s"$Name PUSH ${outOwnerName(connection)} -> ${inOwnerName(
        connection)}, ${connection.slot} (${connection.inHandler}) [${inLogicName(connection)}]")

https://github.com/apache/incubator-pekko/blob/cfff9c53df859bb0f4407caf4821e7831dabeb19/stream/src/main/scala/org/apache/pekko/stream/impl/fusing/GraphInterpreter.scala#L563-L564

  println(
    s"$Name PULL ${inOwnerName(connection)} -> ${outOwnerName(

https://github.com/apache/incubator-pekko/blob/cfff9c53df859bb0f4407caf4821e7831dabeb19/stream/src/main/scala/org/apache/pekko/stream/impl/fusing/GraphInterpreter.scala#L517-L519

    println(
      s"$Name CANCEL ${inOwnerName(connection)} -> ${outOwnerName(
          connection)} (${connection.outHandler}) [${outLogicName(connection)}]")

... and the cause for cancellation at
https://github.com/apache/incubator-pekko/blob/cfff9c53df859bb0f4407caf4821e7831dabeb19/stream/src/main/scala/org/apache/pekko/stream/impl/fusing/GraphInterpreter.scala#L522

The POST body sent is a text/plain body with the content data=0000000000[...] with the number of zero being specified by me to control the size of the body. The log contains all PUSH, PULL and CANCEL up to the time when the request gets to the play Controller.

Thank you very much for your time and if you require more information, please do not hesitate to ask.

@jrudolph jrudolph added the question Further information is requested label Jan 30, 2024
@jrudolph
Copy link
Contributor

Hi @raphaelNguyen, thanks for the report. Can you give a more high-level description of the problem? What are you trying to achieve and with which code and what happens in the error case?

The difference between the cases is that in the error case, the data comes in multiple chunks of data. There is no guarantee on how data is split up in the entity data stream. On local host and without TLS, it is far more likely that data is received in bigger chunks than with an actual connection with higher latency and maybe chunking caused by TLS and other networking components. In any case, as said before, there's no guarantee on how the data is split up. Making a guess, the problem here is that some component expects data to be split up in a certain way (or not at all). Any previous solution might have been accidental based on lucky timing.

Your logs show play.api.mvc.BodyParsers$TakeUpTo, fold and headOption stream components. Are these based on your code?

@jrudolph
Copy link
Contributor

This seems to be an HTTP/1.0 request with Connection: close which is somewhat uncommon, so maybe something else is going on as well.

@jrudolph
Copy link
Contributor

I can somewhat reproduce your issue, assuming that you cancel the request stream after having read all bytes (which is somewhat expected based on the TakeUpTo -> fold -> headOption flow but cancellation is somehow not shown in your logs?). If that's really the case, then the behavior is somewhat expected but probably under-documented. You must not cancel a request entity stream if you want to send out a response (but slurp the rest in, in this case, the rest would actually be empty but you still need to request it instead of cancelling the stream).

Here's the test, to be put into HttpServerSpec:

    "round-trip with HTTP/1.0 POST request with Default entity (entity received in bits) and Connection: close" in assertAllStagesStopped(new TestSetup {
      send(
        """POST / HTTP/1.0
          |Host: example.com
          |Connection: close
          |Content-Length: 12
          |
          |abcdef""")

      val entity =
      inside(expectRequest()) {
        case HttpRequest(POST, _, _, entity, _) =>
          entity.contentLengthOption shouldEqual Some(12)
          entity
      }
      requests.request(1) // emulate that the server with pull in more requests early

      val dataProbe = ByteStringSinkProbe()
      entity.dataBytes.runWith(dataProbe.sink)
      dataProbe.expectUtf8EncodedString("abcdef")

      send("ghijkl")
      dataProbe.expectUtf8EncodedString("ghijkl")
      dataProbe.cancel()
      // proper user behavior instead of cancel makes the test pass:
      // dataProbe.request(1)
      // dataProbe.expectComplete()

      // let the server some time to propagate messages / prepare the response
      Thread.sleep(2000)

      responses.sendNext(HttpResponse(protocol = HttpProtocols.`HTTP/1.0`))
      expectResponseWithWipedDate(
        """HTTP/1.0 200 OK
          |Server: pekko-http/test
          |Date: XXXX
          |Content-Length: 0
          |
          |""")

      shutdownBlueprint()
    })

Somewhat similar previous instance: akka/akka-http#3458

(Above explanation about why localhost version work is still valid, request is received fully -> HttpEntity.Strict, data is now fully detached from the http connection, so whatever you do has no consequence on the actual connection).

I can remember a previous issue of head being used with Play leading to issues but cannot find the reference any more (years ago).

@raphaelNguyen
Copy link
Author

raphaelNguyen commented Jan 31, 2024

Hi @jrudolph. Thank you for spending the time diving into this issue. I'm learning a lot here.

I'll try to answer your questions to the best of my abilities but let me preface all this by stating that I'm using pekko and pekko-http stock configuration out of play-framework:3.0.0. The only configurations that I put on top is configuring some time-outs to be longer. In the application, I never touch pekko/pekko-http code directly. The logs I obtained was me putting down a few breakpoints at the specified lines and logging when the breakpoint is hit.

Can you give a more high-level description of the problem? What are you trying to achieve and with which code and what happens in the error case?

After upgrading to play-framework:3.0.0 from 2.8.20, I noticed that larger requests time out at around 1 minute.

Turning on all debug loggings shows me the debug message from StreamUtils stating that the connection is being put into delayed cancellation right after the request data is received by the server. In these error cases, the connection is killed by the server after linger-timeout even though the server was still working on processing the request normally. This was not what I expected as requests that are a few bytes less in size is being received and processed normally.

I can reproduce the issue with a bare-boned play-framework application (https://github.com/integrated-application-development/play-premature-close-connection), as long as the request is coming through nginx. A direct request does not seem to trigger this issue. Happens both on play-framework 3.0.0 and 2.9.0.

After posting this issue, I was also able to reproduce the problem in play-framework:2.9.0 but locking the akka-http version to 10.2.9 which I was on previously when using play-framework:2.8.20. This is making me think the problem might have been in how play 2.9.0/3.0.0 is using akka/pekko instead of a problem in akka/pekko itself.

The difference between the cases is that in the error case, the data comes in multiple chunks of data.

Monitoring the TCP/TLS comminucation, I can see that when directly hitting localhost, the data is being sent in in 1 chunk.

However, the case where I sent 18356 bytes request via nginx (1 byte less than error case), the TCP/TLS communication shows the same pattern of traffic as the error case, but my breakpoint loggings show that it's also being received in 1 chunk and did not error out, which I find interesting.

Your logs show play.api.mvc.BodyParsers$TakeUpTo, fold and headOption stream components. Are these based on your code?

To the best of my knowledge, these are not based on my code as I do not interact with pekko directly.

This seems to be an HTTP/1.0 request with Connection: close which is somewhat uncommon, so maybe something else is going on as well.

As far as I know, my nginx is performing the same as before I upgraded to play-framework:3.0.0 and using pekko. I will investigate this further. Maybe the way play-framework/pekko handling this has changed in the upgrade.

You must not cancel a request entity stream if you want to send out a response (but slurp the rest in, in this case, the rest would actually be empty but you still need to request it instead of cancelling the stream).

Potentially new play behaviour as I don't touch the stream directly. I will dig into play code a bit and maybe take this problem to them as well.

Thank you again for all your assistance thus far in diagnosing this issue. If I find any more information, I'll continue to update you here.

@raphaelNguyen
Copy link
Author

After some tweaking, I was able to reproduce the issue on my minimal play application without going through nginx. KeepAlive needs to be disabled and the request needs to be sufficiently large.

@mkurz
Copy link
Contributor

mkurz commented Feb 22, 2024

I can reproduce this in Play, it's because we have HTTP pipelining enabled be default, see

@pjfanning
Copy link
Contributor

@mkurz does this work with Akka HTTP, ie have we broken this in Pekko HTTP?

@mkurz
Copy link
Contributor

mkurz commented Feb 22, 2024

@pjfanning Behaviour in Akka HTTP 10.2.x and Pekko HTTP v1 is the same, so you did not break anything.

@raphaelNguyen
Copy link
Author

raphaelNguyen commented Feb 27, 2024

Thank you @jrudolph and @mkurz and others for looking into this issue. This issue has been resolved for me by playframework/playframework#12351

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants