Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server does not clean up resources after calling ServerHandle::stop in certain cases #2759

Open
julianskartor opened this issue May 13, 2022 · 3 comments

Comments

@julianskartor
Copy link

julianskartor commented May 13, 2022

If a service takes long time to complete (longer than value specified byHttpServer::shutdown_timeout to be precise), it seems the server does not properly clean up after itself. In particular, I have an issue with this when the server holds on to a tokio channel sender and I'm waiting for the channel's receiver to get "channel closed" message in a separate tokio task after stopping the server.

Expected Behavior

After calling ServerHandle::stop and waiting until the end of the shutdown_timeout period, all resources held by the server should be dropped.

Current Behavior

Server seems to still hold on to resources.

Possible Solution

None known...

Steps to Reproduce (for bugs)

Given the following example application:

use actix_web::{Responder, HttpResponse, web::{Data}};
use tokio::sync::{mpsc, broadcast};
use log::*;

#[actix_web::post("/sleep10")]
pub async fn sleep10(tx: Data<mpsc::UnboundedSender<()>>) -> impl Responder {
    info!("sleeping 10 seconds");
    tokio::time::sleep(std::time::Duration::from_secs(10)).await;
    info!("slept 10 seconds");
    tx.send(()).unwrap();
    HttpResponse::Ok()
}

#[actix_web::post("/sleep30")]
pub async fn sleep30(tx: Data<mpsc::UnboundedSender<()>>) -> impl Responder {
    info!("sleeping 30 seconds");
    tokio::time::sleep(std::time::Duration::from_secs(30)).await;
    info!("slept 30 seconds");
    tx.send(()).unwrap();
    HttpResponse::Ok()
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    env_logger::init();

    let (stop_tx, mut stop_rx) = broadcast::channel(1);
    let (tx, mut rx) = mpsc::unbounded_channel();

    // My spawned task that consumes the channel’s receiver.
    let consumer_handle = tokio::spawn(async move {
        loop {
            tokio::select! {
                Some(()) = rx.recv() => info!("task received item"),
                Ok(()) = stop_rx.recv() => {
                    info!("task draining items");
                    while let Some(()) = rx.recv().await { // This will block as sender is never dropped.
                        info!("task drained item");
                    }
                    info!("task finished draining");
                    break;
                }
            }
        }
    });

    let tx = Data::new(tx);
    let server = actix_web::HttpServer::new(move || {
        let tx = tx.clone();
        actix_web::App::new()
            .service(sleep10)
            .service(sleep30)
            .app_data(tx) // This sender is never dropped.
    })
    .bind("localhost:20000")?
    .workers(4)
    .disable_signals()
    .shutdown_timeout(20) // Modified shutdown timeout, less than 30 seconds.
    .run();

    let server_handle = server.handle();
    let signal = tokio::signal::ctrl_c();

    tokio::pin!(server);
    tokio::select! {
        r = signal => {
            info!("received interrupt signal");
            r.unwrap();
            let ((), r) = tokio::join!(server_handle.stop(true), server);
            r.unwrap();
            info!("stopping task");
            stop_tx.send(()).unwrap();
            consumer_handle.await.unwrap();
        }
        r = &mut server => {
            info!("server finished");
            r.unwrap();
            stop_tx.send(()).unwrap();
            consumer_handle.await.unwrap();
        }
    }

    Ok(())
}

When I call curl -X POST localhost:20000/sleep10 and send SIGINT to the application, it cleanly shutsdown (OK).

$ RUST_LOG=debug cargo run
[2022-05-13T08:18:56Z INFO  actix_server::builder] Starting 4 workers
[2022-05-13T08:18:56Z INFO  actix_server::server] Tokio runtime found; starting in existing Tokio runtime
[2022-05-13T08:19:00Z INFO  shutdown] sleeping 10 seconds
^C[2022-05-13T08:19:01Z INFO  shutdown] received interrupt signal
[2022-05-13T08:19:01Z INFO  actix_server::worker] Graceful worker shutdown; finishing 1 connections
[2022-05-13T08:19:01Z DEBUG actix_server::accept] Paused accepting connections on [::1]:20000
[2022-05-13T08:19:01Z DEBUG actix_server::accept] Paused accepting connections on 127.0.0.1:20000
[2022-05-13T08:19:01Z INFO  actix_server::accept] Accept thread stopped
[2022-05-13T08:19:01Z INFO  actix_server::worker] Shutting down idle worker
[2022-05-13T08:19:01Z INFO  actix_server::worker] Shutting down idle worker
[2022-05-13T08:19:01Z INFO  actix_server::worker] Shutting down idle worker
[2022-05-13T08:19:10Z INFO  shutdown] slept 10 seconds
[2022-05-13T08:19:10Z INFO  shutdown] task received item
[2022-05-13T08:19:11Z INFO  shutdown] stopping task
[2022-05-13T08:19:11Z INFO  shutdown] task draining items
[2022-05-13T08:19:11Z INFO  shutdown] task finished draining
$

If I call curl -X POST localhost:20000/sleep30 and send SIGINT, the application never terminates (BUG).

$ RUST_LOG=debug cargo run
[2022-05-13T08:21:14Z INFO  actix_server::builder] Starting 4 workers
[2022-05-13T08:21:14Z INFO  actix_server::server] Tokio runtime found; starting in existing Tokio runtime
[2022-05-13T08:21:17Z INFO  shutdown] sleeping 30 seconds
^C[2022-05-13T08:21:19Z INFO  shutdown] received interrupt signal
[2022-05-13T08:21:19Z DEBUG actix_server::accept] Paused accepting connections on [::1]:20000
[2022-05-13T08:21:19Z DEBUG actix_server::accept] Paused accepting connections on 127.0.0.1:20000
[2022-05-13T08:21:19Z INFO  actix_server::worker] Shutting down idle worker
[2022-05-13T08:21:19Z INFO  actix_server::worker] Shutting down idle worker
[2022-05-13T08:21:19Z INFO  actix_server::worker] Shutting down idle worker
[2022-05-13T08:21:19Z INFO  actix_server::worker] Graceful worker shutdown; finishing 1 connections
[2022-05-13T08:21:19Z INFO  actix_server::accept] Accept thread stopped
[2022-05-13T08:21:39Z INFO  shutdown] stopping task
[2022-05-13T08:21:39Z INFO  shutdown] task draining items
... never terminates ...

Context

The end goal for me is to implement graceful shutdown of my application that also runs some other tokio spawned and spawn_blocking tasks.

Your Environment

  • Rust Version: rustc 1.60.0 (7737e0b5c 2022-04-04)
  • Actix Web Version: 4.0.1
  • OS: macOS (12.3.1), Apple M1
@gxtaillon
Copy link

gxtaillon commented Oct 10, 2022

Edit: Applied the workaround from #2739 (comment) and actix is now shutting down gracefully after a few seconds.


@julianskartor did you find a workaround since submitting the issue?

I've hit the same symptom of hanging on shutdown after spawning a pair of mpsc. A single channel shuts down fine, but not 2, each rx in their respective spawned block.

@julianskartor
Copy link
Author

@gxtaillon no there is no real workaround for this that I have found. Luckily the condition for the bug is rare (only if handler is very long running) so we don't experience this in practice. We also have infrastructure to handle this if it would happen (e.g. stop forwarding new requests to the service).

@kevincox
Copy link

I'm also seeing something like this. When running my application in kubernetes it never shuts down. However I haven't been able to reproduce locally. My first thought was that this is something to do with connection keepalive but that doesn't seem to be the case. At least the simple case of HTTP/1.1 pipelining doesn't appear to keep the connection open.

% nc localhost 8000 <<<$'GET / HTTP/1.1\r\nContent-Length:2\r\n\r\nhi\r\n'

I didn't manage to find a good test for HTTP/2. However I don't think ingress-nginx should be using HTTP/2 so that is probably not the problem anyways.

My code looks something like this:

	let server = actix_web::HttpServer::new(move || {
			actix_web::App::new()
				.default_service(actix_web::web::to(handle))
		})
		.bind("0.0.0.0:8000")).unwrap()
		.disable_signals()
		.run();
	let server_handle = server.handle();
	let server = tokio::spawn(server);

	let mut sigterm = tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate()).unwrap();

	tokio::select! {
		r = server => tracing::warn!(
			error = ?r.as_ref().err(),
			"http server exited {:#?}", r),
		_ = sigterm.recv() => tracing::info!("SIGTERM received, exiting."),
		() = run_forever(&global, crate::some_background_job) => unreachable!(),
		() = run_forever(&global, crate::other_background_job) => unreachable!(),
	};

	server_handle.stop(true).await;

But even after the last line I can make new requests and the resources associated are never released. (Which prevents the server from shutting down for other reasons).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants