
Connections leak when restarting Huey. #374

Closed
alexandredufour opened this issue Nov 19, 2018 · 6 comments

Comments

@alexandredufour commented Nov 19, 2018

Preamble
Due to a memory leak inside our backend, we've developed a handy reload script to work around the problem.
In doing so, we've come across a small problem when Huey restarts.

The problem
It seems that the consumer is not properly closing its connections and/or open files when restarting.

Here's how to reproduce it:

  • Start Huey.

  • Get the process PID.

  • List all Redis open files (11 open files in this specific test).

  • Manually restart Huey by killing the PID.

  • List all Redis open files (22 in total).

  • Manually restart Huey by killing the PID (again).

  • List all Redis open files (33 in total).

Highly probable cause
In consumer.py, inside the run method, there's this bit of code.
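
Roughly, the restart re-execs the interpreter in place; a paraphrased sketch of what the screenshot showed (not the exact source):

```python
import os
import sys

def restart_consumer():
    # Replace the current process image with a fresh consumer. The PID
    # is preserved, and on Python 2.7 any descriptor without FD_CLOEXEC
    # set (including open Redis connections) survives the exec.
    python = sys.executable
    os.execl(python, python, *sys.argv)
```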

When restarting Huey using os.execl, the original process gets overridden by the new instance of Huey, and open file handles (including open connections) are never properly closed (most likely because the PID is preserved and the descriptors survive the exec).

Solution
I will link a PR with what I think is a viable solution to this problem.
Looking forward to your answer.

@coleifer (Owner) commented Nov 19, 2018

OK, so if you're using the multiprocess worker/scheduler, presumably you won't experience this issue, since the worker/scheduler processes are stopped before the restart occurs. This would be a "normal" process exit, so their resources, such as open connections, would be closed. I mention this because I don't think the solution described in your PR would work for the multiprocess situation: the Linux implementation relies on enumerating /proc/self/fd/ from the main consumer process, and all the open fds in the child processes (workers + scheduler) would need to be handled separately. It doesn't matter, though, because I'm guessing the OS handles the cleanup when the worker/scheduler procs terminate.
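
For reference, the en-masse cleanup being discussed boils down to something like this (a minimal sketch assuming Linux and a single consumer process; the linked PR's actual implementation may differ):

```python
import os

def close_inherited_fds(min_fd=3):
    # Enumerate this process's open descriptors via /proc/self/fd and
    # close everything above stdin/stdout/stderr before re-exec'ing.
    for name in os.listdir('/proc/self/fd'):
        fd = int(name)
        if fd < min_fd:
            continue
        try:
            os.close(fd)
        except OSError:
            pass  # already closed (e.g. the fd listdir itself used)
```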

So we're just concerned with threads+greenlets, yeah?

@alexandredufour (Author) commented Nov 19, 2018

@coleifer
I forgot to explicitly mention that we're using the threaded consumer, but yes, that's our concern. The multiprocess worker/scheduler is not an option for us, as we are already struggling with memory usage. Do you want me to edit the PR so it only affects consumers running with threads? I haven't tried with greenlets, so I'm not sure it's needed there.

@coleifer (Owner) commented Nov 19, 2018

Are you not able to run huey under a true process manager, something like systemd, supervisor, or upstart? A tool like that would probably be the correct approach, versus closing fds en masse.
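
For example, a minimal supervisor program entry might look like this (my_app.huey and the directory are hypothetical placeholders; adjust to your project):

```ini
[program:huey]
command=huey_consumer.py my_app.huey
directory=/path/to/project
autostart=true
autorestart=true
stopsignal=TERM
```

Restarting under a process manager is a clean stop/start rather than an in-place exec, so nothing leaks across the restart.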

@laurrentt commented Nov 19, 2018

@coleifer (I work with @alexandredufour)
First of all, thanks for answering so fast 😄.

I'm not against using a process manager; however, in my opinion that's not the correct approach. Here's a rundown of why I think this should be handled by Huey:

  • The SIGHUP signal is supported and documented by Huey, so I think that even when using threads it should not leak fds by default.
  • We found other projects using os.execl that relied on the proposed code change for cleaning up open fds (1, 2).
  • We run this in a Docker container, where it's recommended to run the main process as PID 1; using a process manager there is not recommended.

In short, my recommendation would be to either implement the proposed changes, or stop supporting SIGHUP restarts when using threads or gevent and document why, along with how to work around the issue.

@coleifer (Owner) commented Nov 20, 2018

Well, I think the answer is: don't use Python 2.7. I was reading up on fcntl and FD_CLOEXEC and ran across PEP 446, which states:

This PEP proposes to make all file descriptors created by Python non-inheritable by default to reduce the risk of these issues.

It was accepted and is available from 3.4 onwards. I see from your logs that you're using 2.7, so that explains the issue.

I verified that it behaves as advertised -- running a threaded consumer and restarting it multiple times did not lead to a growth in the number of connections as reported by Redis.
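
A quick way to see this behavior outside of huey (a minimal check; redis-py connections are plain sockets, so the same applies to them):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# On Python 3.4+ this prints False: the descriptor is non-inheritable
# (close-on-exec), so the kernel closes it when os.execl() replaces the
# process; on 2.7 there is no such default.
print(s.get_inheritable())
```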

To all the people who thumbs-up'd the issue -- are you all running 2.7?

@coleifer (Owner) commented Nov 20, 2018

I think I'll mention this in the docs as a potential gotcha and close the issue.

A couple thoughts:

  1. If using the consumer with multiprocessing then you won't have issues, as the child processes exit and their resources are closed by the OS.
  2. If you are using Python 3.4 or newer then you won't have issues, because Python creates file descriptors as non-inheritable (the FD_CLOEXEC flag is set, per PEP 446). The OS will handle closing the file descriptors when an exec occurs.
  3. If you are using Python 2.7 + threads + a process manager of some kind then you won't have issues.

If you still want to implement some kind of cleanup routine, I'd suggest setting the close-on-exec flag with fcntl.fcntl(fd, fcntl.F_SETFD, fcntl.FD_CLOEXEC) rather than calling os.close(fd). Letting the OS handle it seems like a much safer option.
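
Something along those lines could look like this (a sketch of the suggestion above, not code from the linked PR; how the fds are collected is left out):

```python
import fcntl

def mark_cloexec(fd):
    # Set the close-on-exec flag so the kernel closes this descriptor
    # automatically when os.execl() replaces the process image, instead
    # of closing it out from under a thread that may still be using it.
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
```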

coleifer closed this Nov 20, 2018

coleifer added a commit that referenced this issue Nov 20, 2018