
AsyncIterator causes memory leaks in production #124

Closed

TimSusa opened this issue Aug 22, 2018 · 18 comments

Comments

@TimSusa

TimSusa commented Aug 22, 2018

Hi,

this is not an issue directly related to your repository, but as a contributor to https://github.com/axelspringer/graphql-google-pubsub I can say we use an AsyncIterator very similar to yours. I wonder if you have run into similar issues?

We are seeing serious memory leaks in production, with about 100 users on the system in parallel. It takes about one hour until memory reaches its limit and a restart of the container is triggered. We only caught this because of our monitoring; otherwise nobody would have noticed.

Heap dumps suggest the async iterator is the culprit here:

[Screenshot: heap dump, 2018-08-22 11:33:41]

Here are my questions:

  1. Is somebody using this project in production?
  2. Did anybody observe similar problems?
  3. What could be the right way in rewriting that AsyncIterator?

By the way, I wonder why you are not interested in this pull request, which seems to address the problem described: #114
--> We will try that out tomorrow and come back with results.

@davidyaha
Owner

Hey @TimSusa! Thanks for opening this issue. I have been following #114, and it seems I would rather keep this package in line with the graphql-subscriptions package. The corresponding PR over there, apollographql/graphql-subscriptions#147, was never merged either.
Also, it seems the leak has a lot to do with how you implement your asyncIterator/asyncIterable, and I haven't faced issues with it thus far.

If that change will end up fixing the issues you had, let me know and I'll gladly merge it.
I think the concept of asyncIterator is not yet well known to the majority of JS developers, and that's a big part of the issue. I've only used it in the context of graphql-subscriptions, so I'm certainly no expert with it.

I would love to help try to reproduce your issue with this package, and if it reproduces, I will give it my full attention.

Thanks again and best of luck!

@dobesv

dobesv commented Sep 18, 2018

I am having the same issue, by the looks of it. I think it does have to do with how you use the asyncIterator.

@dobesv

dobesv commented Sep 18, 2018

From reading through:

tc39/proposal-async-iteration#126

I believe this issue is exposed if you are using async generator syntax sugar, e.g.:

const pubsub = new RedisPubSub(...);

async function* exampleSubscriptionResolveFunction() {
  const asyncIter = pubsub.asyncIterator('topic');
  for await (const elt of asyncIter) {
    if (elt.whatever !== 'bla') continue;
    yield elt;
  }
}

In this case, the return() call from graphql into the asyncIterable returned by this function isn't properly propagated to the pubsub-returned asyncIterator, so it never unsubscribes.
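For illustration, a minimal sketch of one workaround: skip the async generator and hand-roll a filtering iterator that forwards return()/throw() to the underlying pubsub iterator (filteredAsyncIterator, source, and predicate are names I made up, not part of this library):

const filteredAsyncIterator = (source, predicate) => ({
  async next() {
    // Keep pulling from the source until an event passes the filter or the source ends.
    for (;;) {
      const result = await source.next();
      if (result.done || predicate(result.value)) return result;
    }
  },
  return(value) {
    // Forward termination so the pubsub client unsubscribes immediately.
    return source.return
      ? source.return(value)
      : Promise.resolve({ value, done: true });
  },
  throw(error) {
    return source.throw ? source.throw(error) : Promise.reject(error);
  },
  [Symbol.asyncIterator]() {
    return this;
  },
});

// Hypothetical usage in a subscription resolver:
// subscribe: () => filteredAsyncIterator(pubsub.asyncIterator('topic'), elt => elt.whatever === 'bla'),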

@TimSusa
Author

TimSusa commented Sep 21, 2018

Hi,

First, I'm sorry for the delay; a few things came up on our side in the meantime. We found we could improve the situation by sending keep-alive messages to the client. This helped to reduce the surprisingly large number of dangling connections.
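(For context, a minimal sketch of what I mean by keep-alive, assuming the subscriptions-transport-ws transport; the 10-second interval and the schema/httpServer names are placeholders, not our exact configuration:)

const { SubscriptionServer } = require('subscriptions-transport-ws');
const { execute, subscribe } = require('graphql');

SubscriptionServer.create(
  {
    schema,           // placeholder: your executable schema
    execute,
    subscribe,
    keepAlive: 10000, // send a keep-alive message to every client every 10 seconds
  },
  { server: httpServer, path: '/graphql' } // placeholder: your HTTP server
);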

Furthermore, an AWS load balancer caused a lot of trouble by cancelling every connection after a fixed timeout. We are keeping an eye on that.

Second, we gave our Node process the "--optimize_for_size" flag, as discussed in this article: https://medium.com/@snird/do-not-use-node-js-optimization-flags-blindly-3cc8dfdf76fd

This helps the garbage collector run more frequently, which matters because we constantly receive new data.

Third, we improved our open-source project by adding unit tests and converting it to TypeScript: https://github.com/axelspringer/graphql-google-pubsub/commits/master

For now we will keep watching how things evolve.

Best,

Tim Susa

@TimSusa TimSusa closed this as completed Sep 21, 2018
@groundmuffin

groundmuffin commented Dec 19, 2018

I have the same issue in production: every time a message is received, memory increases.
This is the code of my subscribe function:

subscribe: (_, args, context, info) => {
	const { pubsub, REDIS_NOTIFICATIONS_CHANNEL, user_id } = context;
	const userIdAsStr = user_id.toString();
	const filterByUser = ({ channel } = {}) => channel === userIdAsStr;
	const iterator = pubsub.asyncIterator(REDIS_NOTIFICATIONS_CHANNEL);
	const filtered = withFilter(
		() => iterator,
		filterByUser
	);
	return filtered(_, args, context, info);
},

@dobesv

dobesv commented Dec 19, 2018

You might have to move your call to pubsub.asyncIterator into the callback passed to withFilter, to ensure that the iterator is not created if it is never used.
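For illustration, here is your snippet with that change applied (just a sketch of the suggestion, not a guaranteed fix):

subscribe: (_, args, context, info) => {
  const { pubsub, REDIS_NOTIFICATIONS_CHANNEL, user_id } = context;
  const userIdAsStr = user_id.toString();
  const filterByUser = ({ channel } = {}) => channel === userIdAsStr;
  // Create the iterator lazily, only when withFilter actually subscribes.
  const filtered = withFilter(
    () => pubsub.asyncIterator(REDIS_NOTIFICATIONS_CHANNEL),
    filterByUser
  );
  return filtered(_, args, context, info);
},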

@groundmuffin

> You might have to move your call to pubsub.asyncIterator into the callback passed to withFilter, to ensure that the iterator is not created if it is never used.

Unfortunately, the same result.
I tried combining Node's "--optimize_for_size" CLI option with the "NODE_OPTIONS=--max-old-space-size=XXX" environment variable (both V8 options), as @TimSusa mentioned, to encourage more aggressive garbage collection. This seems to improve things, but the leak persists.

I use similar code in another project, but with the graphql-rabbitmq-subscriptions pubsub instead (also based on AsyncIterator), and it does not leak.

@davidyaha
Owner

@groundmuffin @dobesv @TimSusa Please have a look at v2.1.2 and let us know if that fixed your issue. If it does, praise @jedwards1211 for the fix!

@jedwards1211
Contributor

It probably doesn't... I discovered that the main cause of memory leaks in our application was using async generators for things like:

async function * subscribe() {
  await checkUserPermissions()
  for await (const event of redisPubSub.asyncIterator('foo')) {
    yield {Foo: event}
  }
}

I'm considering making a babel plugin to fix this use case, but right now the only solution is to not use async generators.
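For comparison, a minimal sketch of the same resolver without an async generator, assuming the permission check can run up front and the event-to-payload mapping can move into a resolve function (fooSubscription is a made-up name):

const fooSubscription = {
  subscribe: async () => {
    await checkUserPermissions();
    // Return the pubsub iterator directly, so graphql can call return() on it
    // as soon as the client unsubscribes.
    return redisPubSub.asyncIterator('foo');
  },
  // Map the raw event into the shape the schema expects.
  resolve: (event) => ({ Foo: event }),
};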

@davidyaha
Owner

@jedwards1211 Not sure I am following; should we roll back the fix? Or are you saying that those are not directly related issues?

@dobesv

dobesv commented Dec 25, 2019

The fix might be helpful, but it won't fix the issue fully. The async generator and for await loop still won't pass through the fact that the caller is not iterating any more.

@jedwards1211
Contributor

No rollback needed; the issue I'm talking about is a separate thing. My PR still lowers the risk of memory leaks.

@ursualexandr

Having the same issue in production: when there are many subscribers, the service keeps getting restarted.

@jedwards1211
Contributor

@ursualexandr are you using any for await loops?

@ursualexandr

ursualexandr commented Feb 10, 2020

> @ursualexandr are you using any for await loops?

No, I don't think so:

Subscription: {
    NewSubscription: {
      subscribe: withFilter(
        () => pubsub.asyncIterator('subscription_topic'),
        (payload, variables, context: Context) => {
          const hasAccess = validation(context.profile!, variables.channelId);
          if (!hasAccess) {
            log.error('User does not have access');
            throw new Error('user does not have access');
          }
          return payload.channelId === variables.channelId;
        }
      )
    }
}

Does it matter if my pubsub.publish is async?

await pubsub.publish('subscription_topic', {
          data,
          channelId
})

@jedwards1211
Contributor

Whether publish is async definitely doesn't matter. It looks like withFilter swallows rejected promises from your filter function, treating them the same as returning false: https://github.com/apollographql/graphql-subscriptions/blob/master/src/with-filter.ts#L19

So this might be causing subs to pile up if you were expecting your error to terminate the subscription.
I'm not aware of any other risk of a memory leak in withFilter though...
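If you want the error handling to be explicit rather than swallowed, a sketch of the filter (reusing the validation and log names from your snippet) would be:

(payload, variables, context) => {
  try {
    const hasAccess = validation(context.profile, variables.channelId);
    // Returning false skips the event instead of throwing.
    return hasAccess && payload.channelId === variables.channelId;
  } catch (err) {
    log.error('Subscription filter failed', err);
    return false;
  }
}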

@ursualexandr

> https://github.com/apollographql/graphql-subscriptions/blob/master/src/with-filter.ts#L19

I've tried returning false instead of throwing an error, and the service still keeps restarting...

@julianguinard

julianguinard commented Apr 7, 2020

Any update on this? We also have the issue with the following code in production on our chat server:

NodeJS V13.2.0
graphql-redis-subscriptions: V2.2.1
graphql-subscriptions: V1.1.0

{
  joinedRoomWatcher: {
    subscribe: withFilter(
      (root, args, context) =>
        pubsub.asyncIterator(
          `${config.get('pubsub.prefix')}PARTICIPANT_JOINED_ROOM`
        ),
      async (payload, args, { identity }, info) => {
        // Assert identity matches w/ company rooms
        const { participant_id: participantId } = payload;
        const companiesIds = (await identity.getUser).companiesIds;
        return participantId && companiesIds.indexOf(participantId) !== -1;
      }
    ),
    resolve: (payload) => new Room(payload.room),
  }
}

Scenario:

  • Open a webpage that triggers that subscription once
  • Force GC in chrome devtools & make snapshot 1
  • Refresh the same webpage 10 times, then close it
  • Force GC in chrome devtools & make snapshot 2

=> Leaky result: after calling the same GraphQL subscription 10 times by refreshing the same window before closing it, PubSubAsyncIterator has retainers in memory that are never evicted, even though GC was forced before creating snapshots 1 & 2.

Chrome devtools analysis below:

[Screenshot: Chrome devtools memory snapshot comparison]

Memory consumption in production below:

[Screenshot: CloudWatch memory consumption graph]

UPDATE: apparently this had something to do with the WebSocket payload originating from the client not specifying a fixed ID when opening the subscription. Adding this ID, as in the image below, results in PubSubAsyncIterator no longer retaining memory after GC.

[Screenshot: subscription start payload with a fixed id]

Note that I was not using a particular GraphQL client library on the frontend for this leaky case, but the raw WebSocket API. Clients such as Apollo auto-add these IDs to subscriptions.
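For illustration, a sketch of the raw WebSocket flow with a fixed operation id, assuming the subscriptions-transport-ws ('graphql-ws') protocol; the endpoint, id, and query are placeholders:

const ws = new WebSocket('ws://localhost:4000/graphql', 'graphql-ws');

ws.onopen = () => {
  ws.send(JSON.stringify({ type: 'connection_init', payload: {} }));
  ws.send(JSON.stringify({
    id: '1', // fixed id identifying this subscription
    type: 'start',
    payload: { query: 'subscription { joinedRoomWatcher { id } }' },
  }));
};

// Before navigating away, stop the operation by the same id so the server
// can tear down the matching PubSubAsyncIterator:
// ws.send(JSON.stringify({ id: '1', type: 'stop' }));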
