Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Input/Scalar AutoCloseable; fix UDF resource leaks #15765

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mfussenegger
Copy link
Member

@mfussenegger mfussenegger commented Mar 26, 2024

Within the PolyglotScalar we create a GraalVM polyglot Context that
can allocate native resources, these native resources are not
automatically released - leading to potential leaks.

We didn't notice this with JS as embedded language, but trying to add
Python surfaced this issue as the thread leak detection in the test
suite triggered.

Relevant part from the Context.close docs:

Closes this context and frees up potentially allocated native resources. A context may not
free all native resources allocated automatically. For this reason it is recommended to close
contexts after use. If the context is currently being executed on another thread, then an
{@link IllegalStateException} is thrown. To close concurrently executing contexts see
{@link #close(boolean)}.

This is a first, but incomplete step towards addressing the problem:

  • Makes Input/Scalar closeable
  • Updates the PolyglotScalar to properly close the context

What's incomplete is to go through all the Input usages and actually
close them after use. (But it would already be an improvement over the status quo, and we could already merge this and iterate on it)

@mfussenegger mfussenegger changed the title Make Input/Scalar AutoCloseable Make Input/Scalar AutoCloseable; fix UDF resource leaks Mar 26, 2024
Within the `PolyglotScalar` we create a GraalVM polyglot Context that
can allocate native resources, these native resources are not
automatically released - leading to potential leaks.

We didn't notice this with JS as embedded language, but trying to add
Python surfaced this issue as the thread leak detection in the test
suite triggered.

Relevant part from the `Context.close` docs:

    Closes this context and frees up potentially allocated native resources. A context may not
    free all native resources allocated automatically. For this reason it is recommended to close
    contexts after use. If the context is currently being executed on another thread, then an
    {@link IllegalStateException} is thrown. To close concurrently executing contexts see
    {@link #close(boolean)}.

This is a first, but incomplete step towards addressing the problem:

- Makes Input/Scalar closeable
- Updates the PolyglotScalar to properly close the context

What's incomplete is to go through all the `Input` usages and actually
close them after use.
@mfussenegger
Copy link
Member Author

@seut @matriv @romseygeek thoughts on this approach?

Alternatives:

  1. Add some kind of per-query Context that needs to be provided to Scalar.compile, and that can be bound to the life-cycle of components like CollectTask. But I'm not sure if this is any simpler
  2. Introduce some kind of ContextsPool that uses thread-locals. They'd exist over the lifetime of a node. But this wouldn't solve the leaks, only restrict how much we leak - and is dangerous if accessed by scaling thread pools. Also wouldn't solve the thread leak triggered from the test framework if we add python udf test cases.
  3. Variant of ^ but with manual acquire & release instead of thread-locals. Would have the advantage that we can properly clean up the contexts on stop and it would fix the thread leak issue in the test layer. But depending on when we close we'd still have the Context resources allocated over the lifetime of a node. If we release more eagerly, we need to add close calls all over the place, same as with the approach this PR takes currently.

Currently I'm tending to either the approach in this PR, or 3)

@romseygeek
Copy link
Contributor

An alternative to option 1) could be to make TransactionContext implement AutoCloseable, and add a trackResource(Closeable c) method that just adds c to an internally held list, which is closed when the context's close method is called. Then PolyglotLanguage.getFunctionValue can take the transaction context as an argument, and register its language Context on it.

We create transaction contexts in a lot of places so I don't know if it would be any less invasive than making Input closeable. But it feels more like a lifecycle object somehow?

@mfussenegger mfussenegger requested review from seut and matriv April 3, 2024 08:11
@matriv
Copy link
Contributor

matriv commented Apr 3, 2024

I kind of prefer @romseygeek's suggestion, or something similar. I'm sceptical regarding the Input solution, when I see this: https://github.com/crate/crate/pull/15765/files#diff-8b855ba352a12dde36505be36a145486feefcc303ab2f45e5c76019b1dd944f6R213
which means apart from going through existing usages, we don't have a "hard" api that enforces to use Input correctly (try-with-resources).

@mfussenegger
Copy link
Member Author

I'm sceptical regarding the Input solution, when I see this: https://github.com/crate/crate/pull/15765/files#diff-8b855ba352a12dde36505be36a145486feefcc303ab2f45e5c76019b1dd944f6R213
which means apart from going through existing usages, we don't have a "hard" api that enforces to use Input correctly (try-with-resources).

There's no way we can force that something gets closed. This problem will exist no matter what.

So the main criteria here are:

  • how difficult it is to reason about and verify if something gets closed. The bigger the gap between acquire and release, the harder.
  • does it fit into the component from a responsibility perspective
  • can we piggyback onto something that already gets closed (like Task, where we also have leak detections in place - but that's afaik not available everywhere we need to evaluate functions)

The life-cycle & responsibility of the transaction context would fit, but looking at how and where we create transaction contexts I don't get the impression that it's much easier to reason about if and where they would get closed.

@matriv
Copy link
Contributor

matriv commented Apr 3, 2024

Apologies, I didn't phrase it correctly, of course you have to somehow use try-with-resources or explicitly call close(), I'm skeptical about the fine grain level, of following this, if it's added on Input. I haven't looked into how easy/difficult is to do something with TxContext, I'm only thinking about it from a high-level view. If doing it at a higher-level is not easy then let's go with the Input approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants