Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long lived clients from grpc eventual oom an akka grpc/http2 server #2551

Closed
Marcus-Rosti opened this issue May 31, 2019 · 10 comments

Comments

@Marcus-Rosti
Copy link

commented May 31, 2019

Please refer to akka/akka-grpc#591

In brief, grpc encourages creating a long lived client to complete all http2 requests. However, these long lived clients don't allow GC to happen on the OutStreams from https://github.com/akka/akka-http/blob/master/akka-http2-support/src/main/scala/akka/http/impl/engine/http2/Http2Multiplexer.scala#158

Since the client is never closed, it keeps all of the objects active, and thus we eventually run out of heap space.

@jrudolph

This comment has been minimized.

Copy link
Member

commented Jun 5, 2019

Thanks for the report, @Marcus-Rosti. It would be good if you could provide some reproducer, so we could debug the issue. Otherwise, DEBUG log output for the HTTP/2 parts would be helpful as well.

It seems, in the file you linked, streamFor can create (or recreate) OutStreams whenever requested, even if the stream has been closed before. It would be good to find out which frame would have recreated a stream that was closed before.

Some background information:

There's some inherent difficulty in implementing HTTP/2 that is to keep track of the state of streams, especially when they are inactive, i.e. still idle or already closed. In both cases, you cannot just throw away all state because you might want to keep prioritization information even for closed streams (currently, we don't implement much of prioritization, so we should not keep prioritization information for closed streams).

One important thing, we don't seem to implement yet, is to keep a running counter of what the highest stream id is that we have seen yet. Because of https://tools.ietf.org/html/rfc7540#section-5.1.1:

   The first use of a new stream identifier implicitly closes all
   streams in the "idle" state that might have been initiated by that
   peer with a lower-valued stream identifier.  For example, if a client
   sends a HEADERS frame on stream 7 without ever sending a frame on
   stream 5, then stream 5 transitions to the "closed" state when the
   first frame for stream 7 is sent or received.

we can then maintain that any stream that's not in the outStreams map is either idle or closed and distinguish those by comparing to the highest established stream id. We can then make sure not to put those streams back in the map if they have already been closed.

@Marcus-Rosti

This comment has been minimized.

Copy link
Author

commented Jun 5, 2019

@jrudolph I have a repro case here and a screenshot of the effect. Marcus-Rosti/akka-grpc-sample-kubernetes-scala#1

@Marcus-Rosti

This comment has been minimized.

Copy link
Author

commented Jun 5, 2019

also that's very informative! Thank you.

I think the idea behind this is solid; however, it makes it impossible to release a service that will receive sustained traffic. If it's never able to release those objects the JVM will always oom eventually.

@Marcus-Rosti

This comment has been minimized.

Copy link
Author

commented Jun 5, 2019

As per my example it happens also with unary traffic

@jrudolph

This comment has been minimized.

Copy link
Member

commented Jun 6, 2019

Thanks, I verified your reproducer. The cause seems to be something different in your case than what I suggested above. It seems the incomingStreams map keeps all entries in the state HalfClosedRemote. I.e. we are still waiting for the response to be finished. Maybe we are missing to report back from the outgoing side that the stream is complete?

@jrudolph

This comment has been minimized.

Copy link
Member

commented Jun 6, 2019

@Marcus-Rosti I don't see any OutStreams remaining in the heap, so there might be another issue with those? If you observed those as well, it could make sense to add this case to the reproducer so I could have a look at this as well.

@Marcus-Rosti

This comment has been minimized.

Copy link
Author

commented Jun 6, 2019

Right my case doesn't show the outstreams. If we change the call to the streaming case we'll see it. I can add that for you.

raboof added a commit that referenced this issue Jul 1, 2019

Close outgoing HTTP/2 streams earlier (#2551)
Before they would be closed when the connection closes, which caused
unnecessary memory use for long-lived HTTP/2 connections with many streams
(as is common with gRPC).
@raboof

This comment has been minimized.

Copy link
Member

commented Jul 1, 2019

Looking at the code I indeed see the risk of incoming streams remaining in HalfClosedRemote state. I started on a fix in the closeIncomingHttp2streams branch (#2585) - it seems to work functionally speaking, but I haven't been able to reproduce the problem in VisualVM very convincingly. Could you give this branch a try?

@jrudolph

This comment has been minimized.

Copy link
Member

commented Jul 3, 2019

We have fixed the incoming state part.

Regarding

One important thing, we don't seem to implement yet, is to keep a running counter of what the highest stream id is that we have seen yet. Because of tools.ietf.org/html/rfc7540#section-5.1.1:

We have implemented that logic already for the incoming side. However, it's unclear if it could still happen for outStreams. I created #2587 to verify/fix that issue.

Now that we've merged #2585, @Marcus-Rosti can you check if that already fixes your issue or if you still see Outstreams (in which case we must also fix #2587)?

@jrudolph

This comment has been minimized.

Copy link
Member

commented Jul 3, 2019

Now that we've merged #2585, @Marcus-Rosti can you check if that already fixes your issue or if you still see Outstreams (in which case we must also fix #2587)?

We publish snapshots for every master commit at https://dl.bintray.com/akka/snapshots. The version for this fix will be 10.1.8+52-50c5faa6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.