Eventfd not closed after executor finish #448
Thanks @thirstycrow. I reproduced this, and I will hunt down where this is coming from. Leave this to me. I suggest you manually raise the file descriptor limit to a very high number so you can test your PR, and I'll fix this later.
Ok, I know why this happens. We keep a clone of the sleep notifier inside each task, and there is a problem we have been aware of for a long time now, though it has been a minor bother: tasks that are not runnable do not have their destructors run when the executor drops. So that reference count never drops.
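The mechanism described above can be illustrated with a minimal sketch in plain Rust (no glommio). The `notifier` Rc here is a stand-in for the executor's sleep notifier, which owns the eventfd, and `Box::leak` simulates a non-runnable task whose destructor never runs:

```rust
use std::rc::Rc;

fn main() {
    // Stand-in for the executor's sleep notifier (the real one wraps an eventfd).
    let notifier = Rc::new("sleep notifier");
    assert_eq!(Rc::strong_count(&notifier), 1);

    // A task that never becomes runnable holds a clone of the notifier.
    // If the task's destructor never runs (simulated here with Box::leak),
    // the clone is never dropped, the strong count never returns to 1,
    // and the underlying eventfd would therefore never be closed.
    let task_clone = Box::new(Rc::clone(&notifier));
    Box::leak(task_clone);
    assert_eq!(Rc::strong_count(&notifier), 2);
    println!("strong_count = {}", Rc::strong_count(&notifier));
}
```

In the real bug the leaked clone lives inside the task structures, so dropping the executor is not enough to bring the count down.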
I raised the limit of open files, and the test lasted until round 2723, then panicked with
As a status update, I spent some time trying to fix this, but it is really hard because tasks often get destroyed under our noses. This brought me back to the refcount hell in the task structures. I'll keep looking at it.
This turned out to be pretty hard to fix, so I didn't get back to it =( I will note, though, that creating executors is a fairly expensive operation, and this bug only manifests when executors are created and destroyed very frequently. Outside of test cases like the one described in this bug, I'd recommend working with long-lived executors when possible, which would not manifest the issue.
So I'm new to Rust, and maybe there is an easy way around this. My service is basically a cronjob that runs periodically and does the following:
What I'm trying to use glommio for is reading files using Direct I/O (to bypass the page cache, since I need to read basically all the files on disk, so caching won't help and would just use more resources). So for each file, I'm basically doing:
I tried to work around this by moving the executor.run call up the chain to contain all the logic, so that I run it once when the service starts, but then I ran into other problems: mainly, any tokio-based function stopped working, such as timers, HTTP requests (using hyper), etc. I found a glommio version of timers but didn't get to fixing the HTTP request part, and wanted to get some feedback to see if there is a better way. Maybe I can do
at the start of the service and call
per file? Not sure if that would work around the eventfd leak.
My recommendation would be to try to move the timer logic inside the executor as well. Everything that does I/O (including timers) has to use glommio primitives. The one thing that works from Tokio is channels: so the way to do it is to add a channel for communication, and then the timers and files go inside the executor. Hyper is indeed a bit harder, but if you can, read the requests in tokio and pass just what you need through channels. Forget for a moment that these are Rust executors: the recommended architecture would still be something like a thread pool. Long-lived executors are essentially a thread pool, so you just have to pass data to them when you need it.
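The channel-based architecture recommended above can be sketched with plain `std` primitives (the `Job` enum and `spawn_worker` names are illustrative, not glommio API). In the real setup the worker thread's loop body would live inside a single glommio `LocalExecutor` created once for the thread's lifetime, so only one eventfd is ever opened:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical job type: in the real service this would carry the file path
// to read with Direct I/O inside the glommio executor.
enum Job {
    Process(String),
    Shutdown,
}

// Spawn one long-lived worker thread. In the real setup, the glommio executor
// would be created once here and `run` the async work for every job received.
fn spawn_worker() -> (mpsc::Sender<Job>, mpsc::Receiver<String>, thread::JoinHandle<()>) {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let (res_tx, res_rx) = mpsc::channel::<String>();
    let handle = thread::spawn(move || {
        for job in job_rx {
            match job {
                Job::Process(path) => {
                    // Stand-in for the async Direct I/O read of `path`.
                    let _ = res_tx.send(format!("processed {path}"));
                }
                Job::Shutdown => break,
            }
        }
        // res_tx drops here, which terminates the receiver's iterator.
    });
    (job_tx, res_rx, handle)
}

fn main() {
    let (tx, rx, handle) = spawn_worker();
    for name in ["a.dat", "b.dat"] {
        tx.send(Job::Process(name.to_string())).unwrap();
    }
    tx.send(Job::Shutdown).unwrap();
    let results: Vec<String> = rx.iter().collect();
    handle.join().unwrap();
    assert_eq!(results, vec!["processed a.dat", "processed b.dat"]);
}
```

The tokio side (timers, hyper) stays on the main runtime and only file paths and results cross the channel boundary, so the two runtimes never share I/O primitives.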
Got it, thank you!
Came across this issue trying to write a property test via proptest with a lot of generated test cases (since each test invocation would create a LocalExecutor and leak). Here's the workaround I came up with for my tests.
Blocking versions can't be implemented because they would require spawning a new thread to avoid jamming up the background thread used for other syscalls (the jamming up can generate a deadlock, which is what testing found). The blocking version would require spawning a new thread with a standalone executor, but that's horribly expensive AND runs into DataDog#448.
As the following benchmark runs, there's an increasing number of eventfds listed by lsof. With 2 new executors created in each round of the test, there are 12 more open eventfds after the executors finish.
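A minimal way to observe the growth between benchmark rounds, without lsof, is to count eventfd descriptors through `/proc` (Linux only; `count_eventfds` is a hypothetical helper, not part of the benchmark):

```shell
# Count anon-inode eventfd descriptors held by a process (default: this shell).
# Running this between benchmark rounds would show the count growing as
# executors are created and dropped without closing their eventfds.
count_eventfds() {
  pid=${1:-$$}
  count=0
  for fd in /proc/"$pid"/fd/*; do
    case "$(readlink "$fd" 2>/dev/null)" in
      *eventfd*) count=$((count + 1)) ;;
    esac
  done
  echo "$count"
}

count_eventfds $$
```

`readlink` on an fd entry prints `anon_inode:[eventfd]` for eventfds, which is what the pattern matches; `lsof -p <pid>` shows the same entries with more detail.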