Server does not clean up resources after calling ServerHandle::stop in certain cases #2759

julianskartor · 2022-05-13T08:32:40Z

If a service takes long time to complete (longer than value specified byHttpServer::shutdown_timeout to be precise), it seems the server does not properly clean up after itself. In particular, I have an issue with this when the server holds on to a tokio channel sender and I'm waiting for the channel's receiver to get "channel closed" message in a separate tokio task after stopping the server.

Expected Behavior

After calling ServerHandle::stop and waiting until the end of the shutdown_timeout period, all resources held by the server should be dropped.

Current Behavior

Server seems to still hold on to resources.

Possible Solution

None known...

Steps to Reproduce (for bugs)

Given the following example application:

use actix_web::{Responder, HttpResponse, web::{Data}};
use tokio::sync::{mpsc, broadcast};
use log::*;

#[actix_web::post("/sleep10")]
pub async fn sleep10(tx: Data<mpsc::UnboundedSender<()>>) -> impl Responder {
    info!("sleeping 10 seconds");
    tokio::time::sleep(std::time::Duration::from_secs(10)).await;
    info!("slept 10 seconds");
    tx.send(()).unwrap();
    HttpResponse::Ok()
}

#[actix_web::post("/sleep30")]
pub async fn sleep30(tx: Data<mpsc::UnboundedSender<()>>) -> impl Responder {
    info!("sleeping 30 seconds");
    tokio::time::sleep(std::time::Duration::from_secs(30)).await;
    info!("slept 30 seconds");
    tx.send(()).unwrap();
    HttpResponse::Ok()
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    env_logger::init();

    let (stop_tx, mut stop_rx) = broadcast::channel(1);
    let (tx, mut rx) = mpsc::unbounded_channel();

    // My spawned task that consumes the channel’s receiver.
    let consumer_handle = tokio::spawn(async move {
        loop {
            tokio::select! {
                Some(()) = rx.recv() => info!("task received item"),
                Ok(()) = stop_rx.recv() => {
                    info!("task draining items");
                    while let Some(()) = rx.recv().await { // This will block as sender is never dropped.
                        info!("task drained item");
                    }
                    info!("task finished draining");
                    break;
                }
            }
        }
    });

    let tx = Data::new(tx);
    let server = actix_web::HttpServer::new(move || {
        let tx = tx.clone();
        actix_web::App::new()
            .service(sleep10)
            .service(sleep30)
            .app_data(tx) // This sender is never dropped.
    })
    .bind("localhost:20000")?
    .workers(4)
    .disable_signals()
    .shutdown_timeout(20) // Modified shutdown timeout, less than 30 seconds.
    .run();

    let server_handle = server.handle();
    let signal = tokio::signal::ctrl_c();

    tokio::pin!(server);
    tokio::select! {
        r = signal => {
            info!("received interrupt signal");
            r.unwrap();
            let ((), r) = tokio::join!(server_handle.stop(true), server);
            r.unwrap();
            info!("stopping task");
            stop_tx.send(()).unwrap();
            consumer_handle.await.unwrap();
        }
        r = &mut server => {
            info!("server finished");
            r.unwrap();
            stop_tx.send(()).unwrap();
            consumer_handle.await.unwrap();
        }
    }

    Ok(())
}

When I call curl -X POST localhost:20000/sleep10 and send SIGINT to the application, it cleanly shutsdown (OK).

$ RUST_LOG=debug cargo run
[2022-05-13T08:18:56Z INFO  actix_server::builder] Starting 4 workers
[2022-05-13T08:18:56Z INFO  actix_server::server] Tokio runtime found; starting in existing Tokio runtime
[2022-05-13T08:19:00Z INFO  shutdown] sleeping 10 seconds
^C[2022-05-13T08:19:01Z INFO  shutdown] received interrupt signal
[2022-05-13T08:19:01Z INFO  actix_server::worker] Graceful worker shutdown; finishing 1 connections
[2022-05-13T08:19:01Z DEBUG actix_server::accept] Paused accepting connections on [::1]:20000
[2022-05-13T08:19:01Z DEBUG actix_server::accept] Paused accepting connections on 127.0.0.1:20000
[2022-05-13T08:19:01Z INFO  actix_server::accept] Accept thread stopped
[2022-05-13T08:19:01Z INFO  actix_server::worker] Shutting down idle worker
[2022-05-13T08:19:01Z INFO  actix_server::worker] Shutting down idle worker
[2022-05-13T08:19:01Z INFO  actix_server::worker] Shutting down idle worker
[2022-05-13T08:19:10Z INFO  shutdown] slept 10 seconds
[2022-05-13T08:19:10Z INFO  shutdown] task received item
[2022-05-13T08:19:11Z INFO  shutdown] stopping task
[2022-05-13T08:19:11Z INFO  shutdown] task draining items
[2022-05-13T08:19:11Z INFO  shutdown] task finished draining
$

If I call curl -X POST localhost:20000/sleep30 and send SIGINT, the application never terminates (BUG).

$ RUST_LOG=debug cargo run
[2022-05-13T08:21:14Z INFO  actix_server::builder] Starting 4 workers
[2022-05-13T08:21:14Z INFO  actix_server::server] Tokio runtime found; starting in existing Tokio runtime
[2022-05-13T08:21:17Z INFO  shutdown] sleeping 30 seconds
^C[2022-05-13T08:21:19Z INFO  shutdown] received interrupt signal
[2022-05-13T08:21:19Z DEBUG actix_server::accept] Paused accepting connections on [::1]:20000
[2022-05-13T08:21:19Z DEBUG actix_server::accept] Paused accepting connections on 127.0.0.1:20000
[2022-05-13T08:21:19Z INFO  actix_server::worker] Shutting down idle worker
[2022-05-13T08:21:19Z INFO  actix_server::worker] Shutting down idle worker
[2022-05-13T08:21:19Z INFO  actix_server::worker] Shutting down idle worker
[2022-05-13T08:21:19Z INFO  actix_server::worker] Graceful worker shutdown; finishing 1 connections
[2022-05-13T08:21:19Z INFO  actix_server::accept] Accept thread stopped
[2022-05-13T08:21:39Z INFO  shutdown] stopping task
[2022-05-13T08:21:39Z INFO  shutdown] task draining items
... never terminates ...

Context

The end goal for me is to implement graceful shutdown of my application that also runs some other tokio spawned and spawn_blocking tasks.

Your Environment

Rust Version: rustc 1.60.0 (7737e0b5c 2022-04-04)
Actix Web Version: 4.0.1
OS: macOS (12.3.1), Apple M1

The text was updated successfully, but these errors were encountered:

gxtaillon · 2022-10-10T19:42:11Z

Edit: Applied the workaround from #2739 (comment) and actix is now shutting down gracefully after a few seconds.

@julianskartor did you find a workaround since submitting the issue?

I've hit the same symptom of hanging on shutdown after spawning a pair of mpsc. A single channel shuts down fine, but not 2, each rx in their respective spawned block.

julianskartor · 2022-10-11T01:54:33Z

@gxtaillon no there is no real workaround for this that I have found. Luckily the condition for the bug is rare (only if handler is very long running) so we don't experience this in practice. We also have infrastructure to handle this if it would happen (e.g. stop forwarding new requests to the service).

kevincox · 2024-04-25T01:16:18Z

I'm also seeing something like this. When running my application in kubernetes it never shuts down. However I haven't been able to reproduce locally. My first thought was that this is something to do with connection keepalive but that doesn't seem to be the case. At least the simple case of HTTP/1.1 pipelining doesn't appear to keep the connection open.

% nc localhost 8000 <<<$'GET / HTTP/1.1\r\nContent-Length:2\r\n\r\nhi\r\n'

I didn't manage to find a good test for HTTP/2. However I don't think ingress-nginx should be using HTTP/2 so that is probably not the problem anyways.

My code looks something like this:

	let server = actix_web::HttpServer::new(move || {
			actix_web::App::new()
				.default_service(actix_web::web::to(handle))
		})
		.bind("0.0.0.0:8000")).unwrap()
		.disable_signals()
		.run();
	let server_handle = server.handle();
	let server = tokio::spawn(server);

	let mut sigterm = tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate()).unwrap();

	tokio::select! {
		r = server => tracing::warn!(
			error = ?r.as_ref().err(),
			"http server exited {:#?}", r),
		_ = sigterm.recv() => tracing::info!("SIGTERM received, exiting."),
		() = run_forever(&global, crate::some_background_job) => unreachable!(),
		() = run_forever(&global, crate::other_background_job) => unreachable!(),
	};

	server_handle.stop(true).await;

But even after the last line I can make new requests and the resources associated are never released. (Which prevents the server from shutting down for other reasons).

robjtede added the needs-investigation label Jul 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Server does not clean up resources after calling ServerHandle::stop in certain cases #2759

Server does not clean up resources after calling ServerHandle::stop in certain cases #2759

julianskartor commented May 13, 2022 •

edited

gxtaillon commented Oct 10, 2022 •

edited

julianskartor commented Oct 11, 2022

kevincox commented Apr 25, 2024

Server does not clean up resources after calling ServerHandle::stop in certain cases #2759

Server does not clean up resources after calling ServerHandle::stop in certain cases #2759

Comments

julianskartor commented May 13, 2022 • edited

Expected Behavior

Current Behavior

Possible Solution

Steps to Reproduce (for bugs)

Context

Your Environment

gxtaillon commented Oct 10, 2022 • edited

julianskartor commented Oct 11, 2022

kevincox commented Apr 25, 2024

julianskartor commented May 13, 2022 •

edited

gxtaillon commented Oct 10, 2022 •

edited