
Fix memory leak #48

Merged · 1 commit · Jan 9, 2023

Conversation

doctor-pi
Contributor

We observed a memory leak in long-running connections, traced to this code using memtrace.

To give an indication, before the fix:

[memtrace trace: memory usage before the fix]

And a similar run after the fix:

[memtrace trace: memory usage after the fix]

Backporting on top of 0.3.0:

There's also a branch, fix/async-memory-leak-backport, which applies the fix on top of 0.3.0 and can be useful as a starting point for a possible 0.3.1 release.

If you create a branch starting from the 0.3.0 release (12703c9), I can open another PR to merge it there.

@anmonteiro
Owner

I was reading the differences between Pipe.downstream_flushed and Writer.flushed, and it's not clear to me why the former causes a memory leak.

I believe, nevertheless, that your fix is correct. Though now I'm curious to understand these differences, and my first hypothesis is that we're not closing something that we should be. I haven't yet dug into the code, so consider these preliminary thoughts. Is my hypothesis possible? (it's the next thing I'm going to check)

@anmonteiro
Owner

(Is your memory leak on the client or server?)

@doctor-pi
Contributor Author

> (Is your memory leak on the client or server?)

Client

@doctor-pi
Contributor Author

> I was reading the differences between Pipe.downstream_flushed and Writer.flushed, and it's not clear to me why the former causes a memory leak.
>
> I believe, nevertheless, that your fix is correct. Though now I'm curious to understand these differences, and my first hypothesis is that we're not closing something that we should be. I haven't yet dug into the code, so consider these preliminary thoughts. Is my hypothesis possible? (it's the next thing I'm going to check)

I also don't know exactly why this happens; I'm still investigating...

@doctor-pi
Contributor Author

> I was reading the differences between Pipe.downstream_flushed and Writer.flushed, and it's not clear to me why the former causes a memory leak.
>
> I believe, nevertheless, that your fix is correct. Though now I'm curious to understand these differences, and my first hypothesis is that we're not closing something that we should be. I haven't yet dug into the code, so consider these preliminary thoughts. Is my hypothesis possible? (it's the next thing I'm going to check)

To give a bit more context, this also happens in connections where the client receives streaming data from a server (we use grpc). Nothing should be sent by the client.

These traces are from such a setup.

I'll investigate what exactly is being written, because this is a bit unexpected.

@anmonteiro
Owner

The writes are probably coming from OpenSSL / OCaml-TLS, since the TLS protocol includes control packets sent by both peers.

@anmonteiro
Owner

I don't think backporting is necessary, since most changes from 0.3 were adding the EIO backend.

I'm merging this as it seems to work the same as the previous implementation, sans the memory leak.

Thanks!

anmonteiro merged commit 28f6c53 into anmonteiro:master on Jan 9, 2023
anmonteiro added a commit that referenced this pull request Jan 9, 2023
@doctor-pi
Contributor Author

> I don't think backporting is necessary, since most changes from 0.3 were adding the EIO backend.
>
> I'm merging this as it seems to work the same as the previous implementation, sans the memory leak.
>
> Thanks!

Thanks!

Yeah, backporting is not strictly necessary, but we were hoping to be able to use a (fixed) opam package.

@anmonteiro
Owner

I can just make a new release. Stand by.

anmonteiro added a commit that referenced this pull request Jan 9, 2023
* add changelog entry for #48
anmonteiro added a commit to anmonteiro/opam-repository that referenced this pull request Jan 21, 2023
…ten-eio and gluten-async (0.4.0)

CHANGES:

- gluten-eio: Add `gluten-eio` package, a gluten backend for
  [eio](https://github.com/ocaml-multicore/eio)
  ([anmonteiro/gluten#35](anmonteiro/gluten#35))
- gluten-async: Allow connecting to a UNIX domain socket
  ([anmonteiro/gluten#40](anmonteiro/gluten#40))
- gluten-async: Fix memory leak in the SSL / TLS implementations
  ([anmonteiro/gluten#48](anmonteiro/gluten#48))
@anuragsoni
Contributor

> I was reading the differences between Pipe.downstream_flushed and Writer.flushed, and it's not clear to me why the former causes a memory leak.

I think the problem is with the call to Writer.pipe and not Pipe.downstream_flushed. Writer.pipe creates a new Async pipe each time, and closing a writer does not close the pipe (and vice versa). My guess is that this is what caused the leak: you'd end up creating a pipe each time gluten-async's writev function was called, and since the pipe never gets closed, you end up with a lot of live objects that are never cleaned up.
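
As a minimal sketch of the pattern being described (not the actual gluten-async source; the function name is illustrative and it assumes Async's `Writer` and `Pipe` APIs):

```ocaml
open Async

(* Hypothetical sketch of the leaking pattern: every call allocates a
   fresh pipe via [Writer.pipe], and closing the writer later does not
   close that pipe, so each call leaves another live pipe behind. *)
let wait_for_writes_leaky (writer : Writer.t) : unit Deferred.t =
  let pipe = Writer.pipe writer in
  (* Waits until everything pushed into [pipe] has been consumed, but
     [pipe] itself is never closed. *)
  Pipe.downstream_flushed pipe >>| fun (_ : Pipe.Flushed_result.t) -> ()
```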

@doctor-pi
Contributor Author

> I think the problem is with the call to Writer.pipe and not Pipe.downstream_flushed. Writer.pipe creates a new Async pipe each time, and closing a writer does not close the pipe (and vice versa). My guess is that this is what caused the leak: you'd end up creating a pipe each time gluten-async's writev function was called, and since the pipe never gets closed, you end up with a lot of live objects that are never cleaned up.

I suspect you might be right; I might test this eventually (mostly to satisfy my curiosity, since the current solution seems cleaner anyway).

@anuragsoni
Contributor

> because the current solution seems cleaner anyway).

The current solution is indeed the right thing to do here. If you have a writer and want to check that all pending writes have finished, Writer.flushed should be preferred over creating a pipe and checking whether that pipe has flushed all of its pending writes.
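
A sketch of the preferred form, again assuming Async's `Writer` API (the function name is illustrative):

```ocaml
open Async

(* Ask the existing writer directly whether its pending writes have
   completed; no intermediate pipe is allocated. *)
let wait_for_writes (writer : Writer.t) : unit Deferred.t =
  Writer.flushed writer
```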

@doctor-pi
Contributor Author

> I don't think backporting is necessary, since most changes from 0.3 were adding the EIO backend.

I recall testing our code against the version on master, and compilation failed because some types changed since 0.3.0. So we would need to wait for other libraries that use gluten to be updated.

For example, ocaml-grpc would need more work than just using a new gluten 0.3.1.

So I recommend reconsidering a backported fix and a 0.3.1 release. (This does not affect us; we already use the backported version: https://github.com/doctor-pi/gluten/tree/fix/async-memory-leak-backport)

anmonteiro added a commit to anmonteiro/opam-repository that referenced this pull request Feb 11, 2023
…ten-eio and gluten-async (0.4.0)
