Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory leak through plumpy's PROCESS_STACK context variable #4698

Closed
ltalirz opened this issue Feb 1, 2021 · 9 comments · May be fixed by aiidateam/plumpy#201
Closed

memory leak through plumpy's PROCESS_STACK context variable #4698

ltalirz opened this issue Feb 1, 2021 · 9 comments · May be fixed by aiidateam/plumpy#201

Comments

@ltalirz
Copy link
Member

ltalirz commented Feb 1, 2021

Describe the bug

plumpy uses the contextvars module to enable access to the process itself (via Process.current()) at any point in time during a task run by the process.

https://github.com/aiidateam/plumpy/blob/33f4e9f029e0573f168237b8db030de537453878/plumpy/processes.py#L516-L534

When used in combination with asyncio's call_soon, call_later or call_at methods, each individual function execution actually gets their own copy of this context.
This means that as long as a handle to these scheduled executions remains in memory, the copy of the 'process stack' context var (and thus the process itself) remain in memory [1].

One example of this happening is the Transport future opened by aiida-core for file transfers - it is passed around to several classes inside aiida-core and even outside (e.g. paramiko).

def do_open():
""" Actually open the transport """
if transport_request and transport_request.count > 0:
# The user still wants the transport so open it
_LOGGER.debug('Transport request opening transport for %s', authinfo)
try:
transport.open()
except Exception as exception: # pylint: disable=broad-except
_LOGGER.error('exception occurred while trying to open transport:\n %s', exception)
transport_request.future.set_exception(exception)
# Cleanup of the stale TransportRequest with the excepted transport future
self._transport_requests.pop(authinfo.id, None)
else:
transport_request.future.set_result(transport)
# Save the handle so that we can cancel the callback if the user no longer wants it
open_callback_handle = self._loop.call_later(safe_open_interval, do_open)

The problem is: if any of these places forgets to shed the reference to this handle, not only does the transport itself remain in memory but also the Process that was responsible for creating it (and everything it references).

One can work around this issue by explicitly passing an empty context to call_soon, call_later or call_at where the context is not needed, but I would like to point out that this behaviour may be unexpected to users of plumpy and can be very difficult to debug.

Your environment

  • aiida-core Current develop

@muhrin

[1] Not sure whether memory consumption during execution should be a concern as well - if it's a shallow copy, it should be ok.

@greschd
Copy link
Member

greschd commented Feb 1, 2021

Maybe the context (or the process stack) should only hold a weak reference to the process? Or is there a use case where the process needs to be kept "in memory" by the context?

It doesn't completely eliminate the possibility of someone keeping the process alive - the weak reference can be evaluated into a "regular" reference. But probably it's less likely to happen in parts of the code that don't need the process in the first place.

In [1]: import weakref

In [2]: class TestObject:
   ...:     """needed because built-in types do not support weak references"""
   ...:

In [3]: f = TestObject()

In [4]: g = weakref.ref(f)

In [5]: g
Out[5]: <weakref at 0x7f219514d048; to 'TestObject' at 0x7f21928fd048>

In [6]: del f

In [7]: g  # dead after f has been deleted
Out[7]: <weakref at 0x7f219514d048; dead>

In [8]: f = TestObject()

In [9]: g = weakref.ref(f)

In [10]: h = g() # turn back into a normal reference

In [11]: del f

In [12]: g
Out[12]: <weakref at 0x7f219333b7c8; to 'TestObject' at 0x7f21928f0ef0>

In [13]: del h

In [14]: g # dead only after both f and h have been deleted
Out[14]: <weakref at 0x7f219333b7c8; dead>

@ltalirz
Copy link
Member Author

ltalirz commented Feb 1, 2021

@greschd Thanks, I wasn't aware of the weakref option. That sounds like a great solution here.
I'll prepare a PR to plumpy

@chrisjsewell
Copy link
Member

chrisjsewell commented Feb 1, 2021

Err, I don't think using weak references is a good solution, which could non-deterministically dissappear when they are actually required, leading to potentially just as bad side-effects.
I think it is better to first actually clarify where Process.current() and hence these context vars are required, and see if perhaps there is a more "selective" way we can pass them around.

Really the only use for weak references is for caches, that are only kept for performance needs but won't break the code if they are garbage collected

@greschd
Copy link
Member

greschd commented Feb 1, 2021

which could non-deterministically dissappear

Right, if they need to be kept in memory by the context in general I agree it's not a useful solution. I wasn't sure if there might be a different part of the system that has "ownership" of the process; from the description above it seemed like the context shouldn't own the process.

Use cases for weakref are pretty rare, but I wouldn't go so far to say it's only for caches; it can also be useful to avoid circular referencing.

@chrisjsewell
Copy link
Member

chrisjsewell commented Feb 1, 2021

it can also be useful to avoid circular referencing.

Python already has a cycle-detecting garbage collector though, so you should not need it for that

@ltalirz
Copy link
Member Author

ltalirz commented Feb 1, 2021

which could non-deterministically dissappear when they are actually required, leading to potentially just as bad side-effects.

Let's put it this way: a broken reference in Process.current() will be way easier to debug than a hidden chunk of memory references dragged around by the process context.

If one can make the passing around of the context more explicit/obvious, that would be fine as well.

The only thing I would like to avoid is to have to rely on the diligence of plumpy users to always be aware of what is going on in the background - in order to avoid surprises like the one we had now in aiida.

@greschd
Copy link
Member

greschd commented Feb 1, 2021

should not need it for that

It can be needed if you want to implement a non-trivial __del__ that needs to be called in the right order.

Anyway, I agree (sort of) with @chrisjsewell, in that we don't know exactly what the consequence of making it a weakref is. We should definitely be careful with that, as it could seriously break things.
Since the automatic memory management hasn't really worked out here, we need to figure out who should own the process, and how best to fix that.

I also didn't understand 100% why the futures stay in memory in the first place, after they have been cancelled.

@chrisjsewell
Copy link
Member

in that we don't know exactly what the consequence of making it a weakref is. We should definitely be careful with that, as it could seriously break things.

yep thats my key point; lets not jump out of the frying pan, and into the fire 😉

@ltalirz
Copy link
Member Author

ltalirz commented Feb 9, 2021

fixed for the time being in #4699
to be fixed more generally as described in #4704 (comment)

@ltalirz ltalirz closed this as completed Feb 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants