-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Program hangs after successful shutdown #50
Comments
Theoretically, after the timeout of Did you spawn it through a I could theoretically wait for it then through guard objects, but I wouldn't have any possibility of cancelling it. The shutdown manager can only cancel futures it knows, for obvious reasons :) But none of this matters if you didn't use |
Hmm. It seems that |
I was just looking into lookup_host() as well for reproduction, and figured the same thing. I really wonder how this could be resolved nicely. I do use vanilla threads in two other places in my program, but they all react to oneshot receivers being dropped and exit immediately. |
The only thing I found so far would be So spawning another runtime inside of |
This seems to exhibit the behaviour you described: use std::{thread, time::Duration};
use env_logger::{Builder, Env};
use miette::Result;
use tokio::select;
use tokio_graceful_shutdown::{SubsystemHandle, Toplevel};
async fn hanging_task(subsys: SubsystemHandle) -> Result<()> {
let mut hanging_task = tokio::task::spawn_blocking(|| {
thread::sleep(Duration::from_secs(10));
});
let joinhandle_ref = &mut hanging_task;
select! {
e = joinhandle_ref => e.unwrap(),
_ = subsys.on_shutdown_requested() => {
hanging_task.abort()
},
}
Ok(())
}
#[tokio::main]
async fn main() -> Result<()> {
// Init logging
Builder::from_env(Env::default().default_filter_or("debug")).init();
Toplevel::new()
.catch_signals()
.start("Hanging Task", hanging_task)
.handle_shutdown_requests(Duration::from_millis(2000))
.await
.map_err(Into::into)
} |
Yeah, that seems to be pretty much it. I'm screening the tokio docs and code, but haven't found anything for it yet. |
I could add something along those lines in use std::{thread, time::Duration};
use env_logger::{Builder, Env};
use miette::Result;
use tokio::select;
use tokio_graceful_shutdown::{SubsystemHandle, Toplevel};
async fn hanging_task(subsys: SubsystemHandle) -> Result<()> {
let hanging_task = tokio::task::spawn_blocking(|| {
thread::sleep(Duration::from_secs(10));
});
select! {
e = hanging_task => e.unwrap(),
_ = subsys.on_shutdown_requested() => {
// Spawn thread that kills leftover tasks if necessary
thread::spawn(|| {
thread::sleep(Duration::from_millis(1000));
println!("Shutdown seems to hang. Killing program ...");
std::process::exit(1);
});
},
}
Ok(())
}
#[tokio::main]
async fn main() -> Result<()> {
// Init logging
Builder::from_env(Env::default().default_filter_or("debug")).init();
Toplevel::new()
.catch_signals()
.start("Hanging Task", hanging_task)
.handle_shutdown_requests(Duration::from_millis(200))
.await
.map_err(Into::into)
} What do you think? The idea is that we spawn a thread. If |
I think that's a sane way to handle it, as long as the kill timeout is configurable. Though technically we already have timeout defined in |
The only problem is that it isn't guaranteed that We could add a Might have to sleep over it, though. I'm not perfectly happy with this yet, it seems too hacky. |
Sure thing, take your time (not too much please, I definitely like this resolved ;)). I agree that it should be an opt-in feature though. |
@DASPRiD Until I/we come up with a solution, you can work around it like this: use std::{thread, time::Duration};
use env_logger::{Builder, Env};
use miette::Result;
use tokio::select;
use tokio_graceful_shutdown::{SubsystemHandle, Toplevel};
async fn hanging_task(subsys: SubsystemHandle) -> Result<()> {
let hanging_task = tokio::task::spawn_blocking(|| {
thread::sleep(Duration::from_secs(10));
});
select! {
e = hanging_task => e.unwrap(),
_ = subsys.on_shutdown_requested() => (),
}
Ok(())
}
#[tokio::main]
async fn main() -> Result<()> {
// Init logging
Builder::from_env(Env::default().default_filter_or("debug")).init();
Toplevel::new()
.catch_signals()
.start("Hanging Task", hanging_task)
.handle_shutdown_requests(Duration::from_millis(200))
.await?;
thread::spawn(|| {
thread::sleep(Duration::from_millis(1000));
log::error!("Shutdown seems to hang. Killing program ...");
std::process::exit(1);
});
Ok(())
} |
Feel free to post further thoughts or ideas, I'm open for everything on this problem. I'm struggling to think of something graceful right now, I'll also do some further research. |
I agree that this is definitely not the nicest solution. Considering that such a blocking thread should usually just be ignored and an exit should happen immediately, this feels like an issue on the tokio side. Granted though, I was only running into this issue because I had to restart my program because of network issues, and that's exactly when I noticed this problem of course. As long as the network just works, the shutdown is normal. So this is definitely an edge-case. |
The best solution I can come up with right now is to put above chunk of code in a function, like
So that people can use it like this: use std::{thread, time::Duration};
use env_logger::{Builder, Env};
use miette::Result;
use tokio::select;
use tokio_graceful_shutdown::{SubsystemHandle, Toplevel};
async fn hanging_task(subsys: SubsystemHandle) -> Result<()> {
let hanging_task = tokio::task::spawn_blocking(|| {
thread::sleep(Duration::from_secs(10));
});
select! {
e = hanging_task => e.unwrap(),
_ = subsys.on_shutdown_requested() => (),
}
Ok(())
}
#[tokio::main]
async fn main() -> Result<()> {
// Init logging
Builder::from_env(Env::default().default_filter_or("debug")).init();
Toplevel::new()
.catch_signals()
.start("Hanging Task", hanging_task)
.handle_shutdown_requests(Duration::from_millis(200))
.await?;
kill_on_hang(Duration::from_secs(1));
Ok(())
} But it doesn't feel right. I agree that I think the solution should be on tokio side somewhere. And it is, if you use tokio's runtime directly and can call |
@DASPRiD In case you want to report it to tokio, this is the minimal reproducible example: use std::{thread, time::Duration};
#[tokio::main]
async fn main() {
tokio::task::spawn_blocking(|| {
thread::sleep(Duration::from_secs(10));
});
} In my opinion it should exit immediately, but it doesn't. My reasoning is that this also exits immediately: use std::{thread, time::Duration};
fn main() {
thread::spawn(|| {
thread::sleep(Duration::from_secs(10));
});
} Further, this also exits immediately: use std::time::Duration;
#[tokio::main]
async fn main() {
tokio::spawn(async {
tokio::time::sleep(Duration::from_secs(10)).await;
});
} |
I should do that, thanks a lot! I'll reference this issue over there. |
I've created an issue on the tokio repository. Maybe you can chime in there as well if you have anything to add ❤️ |
@DASPRiD I prototyped a solution in tokio: Cargo.toml:
use std::{thread, time::Duration};
use env_logger::{Builder, Env};
use miette::Result;
use tokio::select;
use tokio_graceful_shutdown::{SubsystemHandle, Toplevel};
async fn hanging_task(subsys: SubsystemHandle) -> Result<()> {
let mut hanging_task = tokio::task::spawn_blocking(|| {
thread::sleep(Duration::from_secs(100));
});
let joinhandle_ref = &mut hanging_task;
select! {
e = joinhandle_ref => e.unwrap(),
_ = subsys.on_shutdown_requested() => (),
}
Ok(())
}
#[tokio::main(ignore_hanging_threads = true)]
async fn main() -> Result<()> {
// Init logging
Builder::from_env(Env::default().default_filter_or("debug")).init();
Toplevel::new()
.catch_signals()
.start("Hanging Task", hanging_task)
.handle_shutdown_requests(Duration::from_millis(2000))
.await
.map_err(Into::into)
} What are your thoughts? |
That looks quite good to me. The parameter name is a little ambiguous though maybe, as this is quite specific to tokio spawned threads. Apart from that, it looks like a really clean solution. |
Yah, it's still open for debate. Now I just need two things:
The second one could take a while, tokio is notoriously understaffed with many open pull requests. They are doing an excellent job, just to clarify, but there's just too many people wanting things done :D So be patient I guess |
Maybe |
I feel you on that, coming up with short but still descriptive names is… hard. Maybe something like |
Sounds good, might use it. Gotta leave for today though, can't promise when I'll get to it. Greetz |
Thanks a lot for your effort! :) |
@DASPRiD I opened a PR for the first part of the fix, fyi: tokio-rs/tokio#5360 |
Thanks for keeping me in the loop. I assume a second PR would add that to the tokio-main macro? |
Exactly :) it's currently completely impossible to fix that reliably, with our without |
@DASPRiD It might be unlikely that we get the parameter for |
I see that there was no movement on the PR yet, I really hope they get around to it eventually :) |
I'm not sure, they seem to be quite busy ... I'll ask again in their discord soon. |
@DASPRiD After a long discussion they rejected the pull request and instead proposed a solution on our side. Their proposal was to wrap the runtime in a newtype that calls Here is the documentation on what With that, this is how such a newtype thing would look like: use std::{future::Future, thread, time::Duration};
use env_logger::{Builder, Env};
use miette::Result;
use tokio::{runtime::Runtime, select};
use tokio_graceful_shutdown::{SubsystemHandle, Toplevel};
struct RuntimeWithInstantShutdown(Option<Runtime>);
impl RuntimeWithInstantShutdown {
pub fn new() -> Self {
Self(Some(
tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()
.unwrap(),
))
}
pub fn block_on<F: Future>(&self, future: F) -> F::Output {
self.0.as_ref().unwrap().block_on(future)
}
}
impl Drop for RuntimeWithInstantShutdown {
fn drop(&mut self) {
self.0.take().unwrap().shutdown_background()
}
}
async fn hanging_task(subsys: SubsystemHandle) -> Result<()> {
let hanging_task = tokio::task::spawn_blocking(|| {
thread::sleep(Duration::from_secs(10));
});
select! {
e = hanging_task => e.unwrap(),
_ = subsys.on_shutdown_requested() => (),
}
Ok(())
}
fn main() -> Result<()> {
// Init logging
Builder::from_env(Env::default().default_filter_or("debug")).init();
RuntimeWithInstantShutdown::new().block_on(async {
Toplevel::new()
.catch_signals()
.start("Hanging Task", hanging_task)
.handle_shutdown_requests(Duration::from_millis(200))
.await
.map_err(Into::into)
})
} |
@Finomnis Interesting, thanks a lot for your effort! Do you have any plan to have a macro for that within this library to make it easier for users? |
Not really, to be honest. It's different enough that it shouldn't be part of this crate, in my opinion. It's just a few lines of code, so I don't think it's even worth abstracting it much. Rather the opposite: because runtime configuration is so highly context dependent, I think this is already the best and most generic API. Writing a macro would only make my crate more opinionated, and I'm trying to keep it as generic and unopinionated as possible. |
Hi there,
During some recent testing I ran into an "interesting" issue with SIGINT and similar shutdowns. The exact situation I identified is the following:
Deep inside one of my subsystems, I do an async
lookup_host()
, which has a timeout of apparently 20 seconds, which is await there. When hitting CTRL+C, all subsystems shut down and I gettokio_graceful_shutdown::toplevel] Shutdown successful.
.At this point the program does not terminate though – it only exits after the
lookup_host()
future resolves/rejects. Do you happen to have an idea on how to force tokio to not wait for such things before shutting down itself?The text was updated successfully, but these errors were encountered: