
keep alive timer causing truncated responses when using rustls #1497

Closed
stevemk14ebr opened this issue May 8, 2020 · 24 comments
Labels: C-bug Category: bug

@stevemk14ebr
Contributor

stevemk14ebr commented May 8, 2020

Expected Behavior

Response is not truncated

Current Behavior

The JSON response is truncated when more than ~1 MB of JSON text is returned. I get back a variable number of bytes, ranging from ~700 KB to ~950 KB, which suggests a race condition of some sort.

Possible Solution

Unknown

Steps to Reproduce (for bugs)

async fn view_mystuff(state: ActixAppState, _req: HttpRequest,
    queryinfo: web::Query<MyQuery>, json: web::Json<MyInternalStruct>) -> HttpResponse {

    let resp = match /* ...call internal thing to get data... */ {
        Err(e) => {
            return build_404_error(&format!("Error fetching sample for features given: {}", e));
        },
        Ok(r) => r,
    };

    HttpResponse::Ok().json(resp) // <--- truncates
}

let srv = HttpServer::new(move || {
            App::new()
                .wrap(NormalizeSlash)
                .wrap(middleware::Logger::default())
                .wrap(Cors::default())
                .app_data(app_state.clone())
                .app_data(web::PayloadConfig::new(1000000 * 250))
                .app_data(web::Json::<MyInternalStruct>::configure(|cfg| {
                    cfg.limit(1000000 * 25).error_handler(json_error_handler) 
                }))
                .service(web::resource("/features/").route(web::put().to(view_mystuff)))
                .default_service(
                    web::route().to(view_upload)
                )
        })
        .bind(address_http).expect("failed to start server on given address:httpport")
        .bind_rustls(address_https, config).expect("failed to start server on given address:httpsport")
        .shutdown_timeout(5)
        .run();

JSON serialization works; this prints the entire body, but it is not properly sent to the client:

let j = serde_json::to_string(&resp).expect("basdf");
info!("{}", j);
HttpResponse::Ok().body(j) // <--- truncates

Context

Your Environment

  • Rust Version (i.e., output of rustc -V): 1.43
  • Actix Web Version:

actix-web = {version = "2", features=["rustls"]}
actix-rt = "1"
actix-files = "^0.2"
actix-cors = "^0.2"

futures = "^0.3"
async-std = "^1.5"

@stevemk14ebr changed the title from "Json response truncated" to "Async response truncated" May 8, 2020
@robjtede
Member

robjtede commented May 8, 2020

I've been able to echo back a 1 MB JSON payload using the following:

use actix_web::{get, middleware, web, App, HttpRequest, HttpResponse, HttpServer, FromRequest};

async fn view_mystuff(_req: HttpRequest, json: web::Json<serde_json::Value>) -> HttpResponse {
    HttpResponse::Ok().json(json.into_inner())
}

#[actix_rt::main]
async fn main() -> std::io::Result<()> {
    std::env::set_var("RUST_LOG", "actix_server=info,actix_web=info");
    env_logger::init();

    HttpServer::new(move || {
        App::new()
            .wrap(middleware::Logger::default())
            .app_data(web::PayloadConfig::new(1000000 * 250))
            .app_data(web::Json::<serde_json::Value>::configure(|cfg| {
                cfg.limit(1000000 * 25)
            }))
            .service(web::resource("/features").route(web::put().to(view_mystuff)))
    })
    .bind("127.0.0.1:8080")
    .expect("failed to start server on given address:httpport")
    .shutdown_timeout(5)
    .run()
    .await
}
curl -i -H "Content-Type: application/json" -T 1mb-payload.json localhost:8080/features

Can you provide a complete minimal reproduction?

@stevemk14ebr
Contributor Author

stevemk14ebr commented May 10, 2020

It's very hard for me to share a scenario that reproduces this; the issue is in a proprietary codebase. I seem to get this only with python-requests and Postman; the exact same request succeeds with curl. I'm sorry I can't be of more help, but this is very odd: I have no proxies between me and the server or anything like that. I have also tested skipping the JSON decoding on the client side, in case it was a JSON decoder error, but the actual raw text is cut off at what seem to me to be random intervals.

I'm also using another app_data that your test does not duplicate.

@stevemk14ebr
Contributor Author

stevemk14ebr commented May 10, 2020

Aha, I figured out the issue! python-requests (and Postman) make an HTTP/1.1 request, while curl follows the upgrade to HTTP/2. If you force curl to HTTP/1.1, the same truncation is observed:

* Expire in 1000 ms for 0 (transfer 0x55daf2c9ef50)
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [81 bytes data]
< HTTP/1.1 100 Continue
} [5 bytes data]
* We are completely uploaded and fine
{ [5 bytes data]
< HTTP/1.1 200 OK
< content-length: 3205363
< date: Sun, 10 May 2020 19:20:52 GMT
<
{ [16303 bytes data]
* transfer closed with 2517316 bytes remaining to read

curl -k -vs --http1.1 --location --request PUT 'https://blah/features/' --header 'Content-Type: application/json' --data-raw '...the json...'

Versus:

* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x55882b6f4f50)
} [5 bytes data]
> PUT /features/ HTTP/2

curl -k -vs --location --request PUT 'https://blah/features/' --header 'Content-Type: application/json' --data-raw '...the json...'

@Lesiuk
Contributor

Lesiuk commented May 11, 2020

Cannot reproduce it. I used a 3 MB JSON file and it was echoed correctly with the curl -k -vs --http1.1 --location --request PUT 'http://localhost:8080/features' --header 'Content-Type: application/json' --data "@1mb-test_json.json" command.

* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> PUT /features HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 3124138
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
} [65536 bytes data]
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< content-length: 2826988
< content-type: application/json
< date: Mon, 11 May 2020 16:32:27 GMT
< 
{ [102287 bytes data]
* Connection #0 to host localhost left intact
* Closing connection 0

@robjtede
Member

It seems we really need a complete minimal reproduction case from you @stevemk14ebr to help any further, unfortunately.

@stevemk14ebr
Contributor Author

stevemk14ebr commented May 11, 2020

Are you using the rustls feature and the versions I have specified? The example given by robjtede does NOT replicate the complexity of the code I shared.

@Lesiuk
Contributor

Lesiuk commented May 11, 2020

You are right. I will try to reproduce it with rustls and the middleware you specified.

@Lesiuk
Contributor

Lesiuk commented May 11, 2020

Just checked
curl -k -vs --http1.1 --location --request PUT 'https://localhost:8443/features' --header 'Content-Type: application/json' --data "@1mb-test_json.json"

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [220 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [100 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [1356 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [556 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: C=US; ST=CA; L=SF; O=Company; OU=Org; CN=www.example.com
*  start date: Jan 25 17:46:01 2018 GMT
*  expire date: Jan 25 17:46:01 2019 GMT
*  issuer: C=US; ST=CA; L=SF; O=Company; OU=Org; CN=www.example.com
*  SSL certificate verify result: self signed certificate (18), continuing anyway.
> PUT /features HTTP/1.1
> Host: localhost:8443
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 3124138
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
} [65536 bytes data]
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< content-length: 2826988
< content-type: application/json
< date: Mon, 11 May 2020 16:57:52 GMT
< 
{ [16271 bytes data]
* Connection #0 to host localhost left intact
* Closing connection 0

With this code

use actix_web::{middleware, web, App, FromRequest, HttpRequest, HttpResponse, HttpServer};
use rustls::internal::pemfile::{certs, rsa_private_keys};
use rustls::{NoClientAuth, ServerConfig};
use std::fs::File;
use std::io::BufReader;
use actix_cors::Cors;

async fn view_mystuff(_req: HttpRequest, json: web::Json<serde_json::Value>) -> HttpResponse {
    HttpResponse::Ok().json(json.into_inner())
}

#[actix_rt::main]
async fn main() -> std::io::Result<()> {
    std::env::set_var("RUST_LOG", "actix_server=info,actix_web=info");
    env_logger::init();

    let mut config = ServerConfig::new(NoClientAuth::new());
    let cert_file = &mut BufReader::new(File::open("cert.pem").unwrap());
    let key_file = &mut BufReader::new(File::open("key.pem").unwrap());
    let cert_chain = certs(cert_file).unwrap();
    let mut keys = rsa_private_keys(key_file).unwrap();
    config.set_single_cert(cert_chain, keys.remove(0)).unwrap();

    HttpServer::new(move || {
        App::new()
            .wrap(middleware::Logger::default())
            .wrap(Cors::default())
            .wrap(middleware::Logger::default())
            .app_data(web::PayloadConfig::new(1000000 * 250))
            .app_data(web::Json::<serde_json::Value>::configure(|cfg| {
                cfg.limit(1000000 * 25)
            }))
            .service(web::resource("/features").route(web::put().to(view_mystuff)))
    })
    .bind_rustls("127.0.0.1:8443", config)?
    .run()
    .await
}

You have to give us something we can reproduce.

@stevemk14ebr
Contributor Author

stevemk14ebr commented May 11, 2020

I understand you can't fix the bug if it's not reproducible, but this is inherently a very difficult bug for anyone to reproduce, as it appears to disappear depending on multiple factors. The code you shared still doesn't replicate the environment exactly: there's a missing app_data bind, which I know can cause problems with how configurations are read (app_datas at the route level do NOT work when that kind of app_data is used, for example: #1469). Additionally, I have two binds, one on HTTP and one on HTTPS.

I will try to narrow down a limited repro case when I have free time, but this is very difficult to replicate. It DOES reliably occur on the application I am testing in production, in line with the HTTP/1.1 vs HTTP/2 behavior above.

@stevemk14ebr
Contributor Author

stevemk14ebr commented May 13, 2020

So I'm back, and I've learned some things; perhaps they're useful, maybe not. I have found another necessary but not sufficient condition for the bug: I only get this over a slow network connection when I'm on the corporate VPN, and other developers on fast networks on the same VPN do not observe it (also still HTTP/1.1). I cannot get this to repro over localhost; I've tried artificially slowing my internet, but that has not worked. I did try to stand up a localhost TLS server and hit a weird issue: the following code always resets the HTTP connection for me before a response can be returned, which I believe may be a related bug. The connection is only reset on the 8443 port; 8080 works fine:

My totally uneducated theory is that some weird interaction is occurring between rustls, HTTP/2, and sockets closing at the wrong time. I will continue to try to hunt this down, but I'm coming up dry.

use actix_web::{middleware, web, App, HttpRequest, HttpResponse, HttpServer, FromRequest};
use rustls::internal::pemfile::{certs, pkcs8_private_keys};
use rustls::{NoClientAuth, ServerConfig};
use std::fs::File;
use std::io::BufReader;

use std::iter;
use rand::{Rng, thread_rng};
use rand::distributions::Alphanumeric;

#[macro_use]
extern crate serde_derive;
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
pub struct Test {
    data: Vec<String>,
}

async fn view_mystuff(_req: HttpRequest) -> HttpResponse {
    let mut rng = thread_rng();
    
    let mut d = Test {
        data: Vec::new()
    };

    let s: String = iter::repeat(())
    .map(|()| rng.sample(Alphanumeric))
    .take(512)
    .collect();

    for _i in 0..10000 {
        d.data.push(s.clone());
    }
    HttpResponse::Ok().json(d)
}

#[actix_rt::main]
async fn main() -> std::io::Result<()> {
    simplelog::TermLogger::init(simplelog::LevelFilter::Debug, simplelog::Config::default(), simplelog::TerminalMode::Mixed)
    .or_else(|_| {
        // when run non-interactively, e.g. in docker, termlogger can't open the tty
        // so we fall back to simple formatting (i.e. no colors)
        simplelog::SimpleLogger::init(simplelog::LevelFilter::Debug, simplelog::Config::default())
    })
    .expect("failed to setup logging");

    // load SSL keys
    let mut config = ServerConfig::new(NoClientAuth::new());

    let cert_file = &mut BufReader::new(std::fs::File::open("cert_chain.pem").expect("failed to find cert"));
    let key_file = &mut BufReader::new(std::fs::File::open("private.pem").expect("failed to find private key"));
    let cert_chain = certs(cert_file).expect("failed to parse cert");
    let mut keys = pkcs8_private_keys(key_file).expect("failed to parse private key");

    // if the key file is wrong format, it wont get added to vec
    assert!(keys.len() > 0);
    config.set_single_cert(cert_chain, keys.remove(0)).expect("failed to set single cert");

    HttpServer::new(move || {
        App::new()
            .wrap(middleware::Logger::default())
            .app_data(web::PayloadConfig::new(1000000 * 250))
            .app_data(web::Json::<serde_json::Value>::configure(|cfg| {
                cfg.limit(1000000 * 25)
            }))
            .service(web::resource("/test").route(web::get().to(view_mystuff)))
    })
    .bind("127.0.0.1:8080")?
    .bind_rustls("127.0.0.1:8443", config)?
    .run()
    .await
}
[package]
name = "slow"
version = "0.1.0"
authors = ["Stephen Eckels"]
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
actix-web = {version = "2", features=["rustls"]}
actix-rt = "1"
serde_derive = "^1"
serde_json = "^1"
serde = "^1"
rand = "0.7.3"
rustls = "0.16"
simplelog = "^0.7"

futures = "^0.3"
async-std = "^1.5"

@fafhrd91
Member

Try enabling trace logging on the server.

@stevemk14ebr
Contributor Author

stevemk14ebr commented May 13, 2020

Here are my trace logs for a connection that exhibits this truncation. I've redacted some of the large TLS payload byte arrays for brevity. I see a keep-alive timeout is hit, which seems unexpected to me; could this cause the truncation?

trace!("Keep-alive timeout, close connection");

12:54:04 [ INFO] 10.35.132.171:56252 "GET /samples/f9686467a99cdb3928ccf40042d3e18451a9db97ef60f098656725a9fc3d9025/?minimum_occur=2&maximum_occur=200 HTTP/1.1" 200 29971 "-" "python-requests/2.23.0" 39.892484
12:54:04 [TRACE] (3) actix_http::h1::dispatcher: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-http-1.0.1/src/h1/dispatcher.rs:634] Keep-alive timeout, close connection
12:54:04 [DEBUG] (3) rustls::session: Sending warning alert CloseNotify
12:54:04 [TRACE] (3) mio::poll: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.21/src/poll.rs:907] deregistering handle with poller
12:54:05 [TRACE] (4) mio::poll: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.21/src/poll.rs:787] registering with poller
12:54:05 [TRACE] (4) rustls::server::hs: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.16.0/src/server/hs.rs:538] we got a clienthello ClientHelloPayload { client_version: TLSv1_2, random: Random([137, 251, 198, 110, 78, 90, 132, 30, 150, 204, 9, 176, 196, 198, 185, 36, 135, 251, 99, 153, 25, 7, 185, 26, 181, 13, 38, 20, 230, 242, 4, 222]), session_id: SessionID(0, 11, 118, 174, 22, 90, 152, 129, 143, 251, 207, 138, 241, 223, 163, 136, 200, 218, 242, 36, 93, 130, 106, 76, 161, 110, 12, 112, 93, 1, 142, 225), cipher_suites: [TLS13_AES_256_GCM_SHA384, TLS13_CHACHA20_POLY1305_SHA256, TLS13_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256, TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256, TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256, Unknown(49327), Unknown(49325), Unknown(49326), Unknown(49324), TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_256_CCM_8, TLS_DHE_RSA_WITH_AES_256_CCM, TLS_DHE_RSA_WITH_AES_128_CCM_8, TLS_DHE_RSA_WITH_AES_128_CCM, TLS_DHE_RSA_WITH_AES_256_CBC_SHA256, TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_GCM_SHA384, TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_256_CCM_8, TLS_RSA_WITH_AES_256_CCM, TLS_RSA_WITH_AES_128_CCM_8, TLS_RSA_WITH_AES_128_CCM, TLS_RSA_WITH_AES_256_CBC_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA256, TLS_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA, TLS_EMPTY_RENEGOTIATION_INFO_SCSV], compression_methods: [Null], extensions: [ServerName([ServerName { typ: HostName, payload: HostName(DNSName("flareup.flare.services")) }]), ECPointFormats([Uncompressed, ANSIX962CompressedPrime, ANSIX962CompressedChar2]), NamedGroups([X25519, secp256r1, X448, secp521r1, secp384r1]), SessionTicketRequest, Unknown(UnknownExtension { typ: Unknown(22), payload: Payload([]) }), ExtendedMasterSecretRequest, SignatureAlgorithms([ECDSA_NISTP256_SHA256, ECDSA_NISTP384_SHA384, ECDSA_NISTP521_SHA512, ED25519, ED448, Unknown(2057), Unknown(2058), Unknown(2059), RSA_PSS_SHA256, RSA_PSS_SHA384, RSA_PSS_SHA512, RSA_PKCS1_SHA256, RSA_PKCS1_SHA384, RSA_PKCS1_SHA512, Unknown(771), Unknown(769), Unknown(770), Unknown(1026), Unknown(1282), Unknown(1538)]), SupportedVersions([TLSv1_3, TLSv1_2]), PresharedKeyModes([PSK_DHE_KE]), KeyShare([KeyShareEntry { group: X25519, payload: PayloadU16([1, 178, 130, 74, 210, 95, 145, 87, 216, 46, 95, 129, 82, 27, 119, 70, 232, 142, 203, 164, 198, 204, 232, 28, 185, 177, 109, 110, 211, 197, 134, 56]) }]), Unknown(UnknownExtension { typ: Padding, payload: Payload([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) })] }
12:54:05 [TRACE] (4) rustls::server::hs: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.16.0/src/server/hs.rs:595] sni Some(DNSNameRef("flareup.flare.services"))
12:54:05 [TRACE] (4) rustls::server::hs: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.16.0/src/server/hs.rs:596] sig schemes [ECDSA_NISTP256_SHA256, ECDSA_NISTP384_SHA384, ECDSA_NISTP521_SHA512, ED25519, ED448, Unknown(2057), Unknown(2058), Unknown(2059), RSA_PSS_SHA256, RSA_PSS_SHA384, RSA_PSS_SHA512, RSA_PKCS1_SHA256, RSA_PKCS1_SHA384, RSA_PKCS1_SHA512, Unknown(771), Unknown(769), Unknown(770), Unknown(1026), Unknown(1282), Unknown(1538)]
12:54:05 [DEBUG] (4) rustls::server::hs: decided upon suite SupportedCipherSuite { suite: TLS13_AES_256_GCM_SHA384, kx: BulkOnly, bulk: AES_256_GCM, hash: SHA384, sign: Anonymous, enc_key_len: 32, fixed_iv_len: 12, explicit_nonce_len: 0, hkdf_algorithm: Algorithm(Algorithm(SHA384)) }
12:54:05 [TRACE] (4) rustls::server::tls13: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.16.0/src/server/tls13.rs:150] sending server hello Message { typ: Handshake, version: TLSv1_2, payload: Handshake(HandshakeMessagePayload { typ: ServerHello, payload: ServerHello(ServerHelloPayload { legacy_version: TLSv1_2, random: Random([77, 204, 148, 189, 195, 245, 6, 7, 62, 116, 5, 152, 59, 80, 168, 111, 50, 107, 125, 119, 73, 28, 52, 208, 50, 20, 48, 189, 124, 132, 127, 168]), session_id: SessionID(0, 11, 118, 174, 22, 90, 152, 129, 143, 251, 207, 138, 241, 223, 163, 136, 200, 218, 242, 36, 93, 130, 106, 76, 161, 110, 12, 112, 93, 1, 142, 225), cipher_suite: TLS13_AES_256_GCM_SHA384, compression_method: Null, extensions: [KeyShare(KeyShareEntry { group: X25519, payload: PayloadU16([189, 224, 230, 145, 232, 77, 103, 9, 224, 8, 77, 207, 95, 225, 217, 224, 54, 160, 241, 125, 228, 208, 252, 122, 45, 197, 38, 49, 63, 237, 242, 3]) }), SupportedVersions(TLSv1_3)] }) }) }
12:54:05 [TRACE] (4) rustls::server::tls13: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.16.0/src/server/tls13.rs:269] sending encrypted extensions Message { typ: Handshake, version: TLSv1_3, payload: Handshake(HandshakeMessagePayload { typ: EncryptedExtensions, payload: EncryptedExtensions([ServerNameAck]) }) }
12:54:05 [TRACE] (4) rustls::server::tls13: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.16.0/src/server/tls13.rs:349] sending certificate Message { typ: Handshake, version: TLSv1_3, payload: Handshake(HandshakeMessagePayload { typ: Certificate, payload: CertificateTLS13(CertificatePayloadTLS13 { context: PayloadU8([]), entries: [CertificateEntry { cert:  }
12:54:05 [TRACE] (4) rustls::server::tls13: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.16.0/src/server/tls13.rs:382] sending certificate-verify Message { typ: Handshake, version: TLSv1_3, payload: }
12:54:05 [TRACE] (4) rustls::server::tls13: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.16.0/src/server/tls13.rs:404] sending finished Message { typ: Handshake, version: TLSv1_3, payload: Handshake(HandshakeMessagePayload { typ: Finished, payload: Finished(Payload()) }) }
12:54:06 [TRACE] (4) rustls::server: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.16.0/src/server/mod.rs:320] Dropping CCS
12:54:06 [TRACE] (4) rustls::server::tls13: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.16.0/src/server/tls13.rs:806] sending new stateful ticket Message { typ: Handshake, version: TLSv1_3, payload: Handshake(HandshakeMessagePayload { typ: NewSessionTicket, payload: NewSessionTicketTLS13(NewSessionTicketPayloadTLS13 { lifetime: 86400, age_add: 621722163, nonce: PayloadU8([121, 63, 186, 77, 60, 97, 102, 191, 236, 7, 77, 152, 0, 255, 123, 205, 18, 144, 208, 37, 154, 195, 219, 123, 165, 92, 61, 229, 46, 214, 1, 2]), ticket: PayloadU16([225, 229, 158, 36, 8, 132, 195, 135, 61, 116, 191, 100, 122, 237, 191, 107, 62, 65, 101, 214, 25, 112, 100, 237, 58, 175, 206, 100, 75, 204, 32, 67]), exts: [] }) }) }
12:54:06 [TRACE] (4) actix_http::h1::decoder: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-http-1.0.1/src/h1/decoder.rs:440] Length read: 16384
12:54:06 [TRACE] (4) actix_http::h1::decoder: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-http-1.0.1/src/h1/decoder.rs:440] Length read: 16384
12:54:06 [TRACE] (4) actix_http::h1::decoder: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-http-1.0.1/src/h1/decoder.rs:440] Length read: 16384
12:54:06 [TRACE] (4) actix_http::h1::decoder: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-http-1.0.1/src/h1/decoder.rs:440] Length read: 17308
12:54:07 [ INFO] 10.35.132.171:56254 "PUT /features/ HTTP/1.1" 200 3205363 "-" "python-requests/2.23.0" 1.146606
12:54:11 [TRACE] (4) actix_http::h1::dispatcher: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-http-1.0.1/src/h1/dispatcher.rs:634] Keep-alive timeout, close connection
12:54:11 [DEBUG] (4) rustls::session: Sending warning alert CloseNotify
12:54:16 [TRACE] (4) mio::poll: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.21/src/poll.rs:907] deregistering handle with poller

@stevemk14ebr
Contributor Author

Here's a trace for a successful, non-truncated HTTP/2 request; the logs look completely different:

13:03:25 [DEBUG] (3) h2::codec::framed_write: send; frame=Data { stream_id: StreamId(1) }
13:03:25 [TRACE] (3) h2::codec::framed_write: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/codec/framed_write.rs:186] flush
13:03:25 [TRACE] (3) h2::codec::framed_write: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/codec/framed_write.rs:192]   -> queued data frame
13:03:25 [TRACE] (3) h2::codec::framed_write: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/codec/framed_write.rs:192]   -> queued data frame
13:03:25 [TRACE] (3) h2::codec::framed_write: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/codec/framed_write.rs:237] flushing buffer
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:541] try reclaim frame
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:545]   -> reclaimed; frame=Data { stream_id: StreamId(1) }; sz=0
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:827] schedule_pending_open
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:647] pop_frame
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:652] pop_frame; stream=StreamId(1); stream.state=State { inner: Closed(EndStream) }
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:665]  --> stream=StreamId(1); is_pending_reset=false;
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:678]  --> data frame; stream=StreamId(1); sz=0; eos=true; window=0; available=0; requested=0; buffered=0;
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:724]  --> sending data frame; len=0
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:727]  -- updating stream flow --
13:03:25 [TRACE] (3) h2::proto::streams::flow_control: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/flow_control.rs:168] send_data; sz=0; window=1070536461; available=0
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:740]  -- updating connection flow --
13:03:25 [TRACE] (3) h2::proto::streams::flow_control: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/flow_control.rs:168] send_data; sz=0; window=1070536461; available=1070536461
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:802] pop_frame; frame=Data { stream_id: StreamId(1), flags: (0x1: END_STREAM) }
13:03:25 [TRACE] (3) h2::proto::streams::counts: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/counts.rs:136] transition_after; stream=StreamId(1); state=State { inner: Closed(EndStream) }; is_closed=true; pending_send_empty=true; buffered_send_data=0; num_recv=1; num_send=0
13:03:25 [TRACE] (3) h2::proto::streams::counts: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/counts.rs:158] dec_num_streams; stream=StreamId(1)
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:495] writing frame=Data { stream_id: StreamId(1), flags: (0x1: END_STREAM) }
13:03:25 [DEBUG] (3) h2::codec::framed_write: send; frame=Data { stream_id: StreamId(1), flags: (0x1: END_STREAM) }
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:541] try reclaim frame
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:545]   -> reclaimed; frame=Data { stream_id: StreamId(1), flags: (0x1: END_STREAM) }; sz=0
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:827] schedule_pending_open
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:647] pop_frame
13:03:25 [TRACE] (3) h2::codec::framed_write: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/codec/framed_write.rs:186] flush
13:03:25 [TRACE] (3) h2::codec::framed_write: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/codec/framed_write.rs:197]   -> not a queued data frame
13:03:25 [TRACE] (3) h2::codec::framed_write: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/codec/framed_write.rs:237] flushing buffer
13:03:25 [TRACE] (3) h2::proto::streams::prioritize: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/prioritize.rs:541] try reclaim frame
13:03:26 [TRACE] (3) h2::codec::framed_read: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/codec/framed_read.rs:342] poll
13:03:26 [DEBUG] (3) rustls::session: Sending warning alert CloseNotify
13:03:26 [TRACE] (3) h2::proto::connection: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/connection.rs:371] codec closed
13:03:26 [TRACE] (3) h2::proto::streams::streams: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/streams.rs:821] Streams::recv_eof
13:03:26 [TRACE] (3) h2::proto::connection: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/connection.rs:273] connection closing after flush
13:03:26 [TRACE] (3) h2::codec::framed_write: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/codec/framed_write.rs:186] flush
13:03:26 [TRACE] (3) h2::codec::framed_write: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/codec/framed_write.rs:237] flushing buffer
13:03:26 [TRACE] (3) h2::proto::streams::streams: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/h2-0.2.4/src/proto/streams/streams.rs:821] Streams::recv_eof
13:03:26 [TRACE] (3) mio::poll: [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.21/src/poll.rs:907] deregistering handle with poller

@robjtede
Member

robjtede commented May 13, 2020

So, worth checking with (much) higher values for keep_alive (and client_timeout for good measure).
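For reference, a minimal sketch of where those two knobs sit on the actix-web 2.x HttpServer builder used elsewhere in this thread (the 300-second / 60-second values are arbitrary test values, not recommendations):

use actix_web::{web, App, HttpResponse, HttpServer};

#[actix_rt::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new().route("/ping", web::get().to(|| async { HttpResponse::Ok().body("pong") }))
    })
    // Keep-alive timeout in seconds (the default is 5).
    .keep_alive(300)
    // Client request head timeout in milliseconds (the default is 5000).
    .client_timeout(60_000)
    .bind("127.0.0.1:8080")?
    .run()
    .await
}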

@stevemk14ebr
Contributor Author

stevemk14ebr commented May 13, 2020

I will obviously try this as a workaround (will get back to you tonight on that). But from my understanding, that timeout should not take effect until the request has been fully handled and the server is idle; I don't see why it would truncate a response that is in the process of being sent.

@stevemk14ebr
Contributor Author

stevemk14ebr commented May 13, 2020

There is 100% an issue with how Connection: Keep-Alive works, and it is the cause of this truncation:

  • When the server's keep-alive value is set to 300 (5 minutes) and the client provides the Connection: keep-alive header, the client hangs until the server's keep-alive timeout is hit (the same happens when no Connection header is specified, since keep-alive is implicit in HTTP/1.1).
  • When the server's keep-alive is set to 300 and the client provides Connection: close, the correct, non-truncated response is received in full and the client does not hang.
  • When the server's keep-alive is left at the default 5 seconds and the client provides Connection: keep-alive, the response is truncated and the socket is incorrectly closed before the client receives all the data.

There are therefore two valid workarounds:

  1. Always use HTTP/2 requests client side, as the keep-alive bug is not observed in that case.
  2. If using HTTP/1.1 requests, set the server keep-alive very high and have clients provide the Connection: close header (see the client-side sketch below).
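A minimal client-side sketch of workaround 2, assuming awc 1.x (the client crate that ships alongside actix-web 2, with a TLS connector feature enabled); the endpoint URL and certificate setup are placeholders for illustration only:

use awc::Client;

#[actix_rt::main]
async fn main() {
    let client = Client::default();
    let payload = serde_json::json!({ "hello": "world" });

    let mut resp = client
        .put("https://localhost:8443/features/") // placeholder endpoint
        .header("Connection", "close")           // opt out of keep-alive entirely
        .send_json(&payload)
        .await
        .expect("request failed");

    // Large responses are what trigger the truncation, so raise the body limit.
    let body = resp.body().limit(25 * 1_000_000).await.expect("failed to read body");
    println!("received {} bytes", body.len());
}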

To recap, here are my conditions to replicate:

  1. Must use HTTP/1.1
  2. Must use rustls and must connect via the TLS-enabled route (I have two binds)
     • The HTTP route does not exhibit this behavior
  3. Server keep_alive must be small
  4. A slow connection makes this occur more often

@stevemk14ebr
Contributor Author

stevemk14ebr commented May 17, 2020

Potentially related: #514, where keep-alive is explicitly mentioned. Though I still don't know exactly what's occurring.

Another theory is that the timeout's check for the write buffer being empty doesn't guarantee the socket has actually sent the data. The connection could be closed while the kernel is still trying to send data, and there are complications there: https://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable

if self.state.is_empty() && self.write_buf.is_empty() {
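Illustration only, not actix internals: the distinction the article draws is that an empty user-space write buffer does not mean the kernel has finished sending. A minimal sketch, assuming the socket2 crate for the setsockopt call, of how SO_LINGER makes closing a socket wait for the kernel's send buffer to drain:

use std::net::TcpStream;
use std::time::Duration;

use socket2::Socket;

fn close_after_kernel_flush(stream: TcpStream) -> std::io::Result<()> {
    let socket = Socket::from(stream); // reach setsockopt through socket2
    // With SO_LINGER set, close(2) blocks until unsent data is transmitted
    // (or the 5-second window expires) instead of racing ahead of it.
    socket.set_linger(Some(Duration::from_secs(5)))?;
    drop(socket); // the close happens here
    Ok(())
}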

@stevemk14ebr
Contributor Author

Switching to openssl fixed all my issues without messing with keep-alive at all.

@Lesiuk
Contributor

Lesiuk commented May 31, 2020

Now I have an issue with keep-alive. The connection is closed in the middle of one of the requests in about 50% of my e2e test suite runs. I will investigate it further. I am pretty sure something is wrong with the keep-alive logic.


@Lesiuk
Contributor

Lesiuk commented May 31, 2020

After increasing keep_alive to 30s.

It seems the connection was closed because there was an 11s delay between the two last reads, so it looks fine. Never mind, no issue here.

@stevemk14ebr
Contributor Author

stevemk14ebr commented Jun 2, 2020

I still maintain there is a serious bug. Simply switching to openssl makes this behavior disappear.

@robjtede mentioned this issue Jun 6, 2020
@robjtede added the C-bug Category: bug label Jun 9, 2020
@robjtede self-assigned this Jun 21, 2020
@robjtede changed the title from "Async response truncated" to "keep alive timer causing truncated responses when using rustls" Jun 21, 2020
@robjtede
Member

robjtede commented Jun 21, 2020

I have a reproduction case using openssl and rustls bindings on the same HttpServer, and yes, it does appear that the keep-alive timer is not being extended as data is sent into the socket.

Requests to the openssl port last as long as they need to for the transfer to complete, but the rustls port takes exactly the keep-alive time every time, causing truncation when the response takes longer than that to receive.

Looking into a fix now.
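A rough sketch of that dual-bind setup (cert/key paths, ports, and the response body are placeholders; actix-web needs both the "rustls" and "openssl" features enabled):

use std::fs::File;
use std::io::BufReader;

use actix_web::{web, App, HttpResponse, HttpServer};
use openssl::ssl::{SslAcceptor, SslFiletype, SslMethod};
use rustls::internal::pemfile::{certs, pkcs8_private_keys};
use rustls::{NoClientAuth, ServerConfig};

#[actix_rt::main]
async fn main() -> std::io::Result<()> {
    // rustls acceptor
    let mut rustls_config = ServerConfig::new(NoClientAuth::new());
    let cert_file = &mut BufReader::new(File::open("cert.pem")?);
    let key_file = &mut BufReader::new(File::open("key.pem")?);
    let cert_chain = certs(cert_file).unwrap();
    let mut keys = pkcs8_private_keys(key_file).unwrap();
    rustls_config.set_single_cert(cert_chain, keys.remove(0)).unwrap();

    // openssl acceptor
    let mut openssl_builder = SslAcceptor::mozilla_intermediate(SslMethod::tls()).unwrap();
    openssl_builder.set_private_key_file("key.pem", SslFiletype::PEM).unwrap();
    openssl_builder.set_certificate_chain_file("cert.pem").unwrap();

    HttpServer::new(|| {
        App::new().route(
            "/test",
            // A few MB of body so a slow client takes longer than the keep-alive window.
            web::get().to(|| async { HttpResponse::Ok().body("x".repeat(5_000_000)) }),
        )
    })
    .bind_rustls("127.0.0.1:8443", rustls_config)?    // port showing the truncation
    .bind_openssl("127.0.0.1:8444", openssl_builder)? // port completing normally
    .run()
    .await
}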

Edit: The effect seems different (in a good way?) on the master branch; I'm still getting "Keep-alive timeout, close connection" in the logs, yet there is no truncation when going over the keep-alive time.

@mocsy

mocsy commented Aug 4, 2020

So if it works on master, with the "Keep-alive timeout, close connection" in the logs, is this issue still a release stopper?

@robjtede
Member

robjtede commented Aug 8, 2020

My test case ceased being able to detect this on master. So I think we will mark this as done. If it shows up again in v3, feel free to open a new issue for it @stevemk14ebr.

@robjtede closed this as completed Aug 8, 2020