Description
I did this
This started out as making several thousand requests to some company's API and getting occasional errors stating "Error in the HTTP2 framing layer". I found that this always happened on stream id 0x7d1 (2001); since HTTP/2 client-initiated streams use odd ids (1, 3, 5, ...), that is the 1001st request on the connection. This led me to nginx's http2_max_requests directive, which defaults to 1000. A few hours of minimizing the test case later, I've found two ways to reproduce this bug; they generate different errors for the same scenario, when both should simply work.
Method one
The sleep server from my previous bug report returns :) nginx runs in front of it with http2_max_requests set to 10, for easier debugging. The sleep server simply sleeps for the given number of milliseconds; the issue becomes reproducible at around 5ms and fails consistently at around 100ms.
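For reference, the relevant nginx configuration looks roughly like this (a sketch; the listen and proxy_pass details are assumptions for illustration, only the http2_max_requests value is the actual setting from above):

    server {
        listen 443 ssl http2;        # HTTP/2 must be enabled for this bug
        server_name home.xifon.eu;

        # default is 1000; lowered to 10 so the bug shows on request 11
        http2_max_requests 10;

        location /sleep/ {
            # assumed upstream address for the sleep server
            proxy_pass http://127.0.0.1:8080;
        }
    }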
curl -v $(yes https://home.xifon.eu/sleep/100 | head -n 11) -d0 -m5
The yes/head substitution expands to eleven copies of the URL, so the command above performs 11 requests over a single connection (one more than the limit); -d0 gives each request the one-byte body "0", and -m5 caps each transfer at 5 seconds. The last request times out, but there is no reason it should.
Method two
This method uses the multi socket API and produces the "Error in the HTTP2 framing layer" error message: https://gist.github.com/TvdW/a8bf6eabc99bf812d2fd0be1d32f9313
It's a crude event loop implementation doing the bare minimum to make the multi socket API work, and it's the last version of my code before I managed to reproduce the bug with the curl command line utility. What's notable is that it triggers the same issue, but with a different error message (a sketch of the mechanics follows).
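For context, here is a minimal single-transfer sketch of the multi socket mechanics using poll(). This is not the gist above: the URL and body mirror the curl command, the 100ms fallback timeout is a simplification, and actually reproducing the bug would require driving 11 such requests over one connection. On failure, the printed result would be curl_easy_strerror(CURLE_HTTP2), i.e. "Error in the HTTP2 framing layer".

    #include <curl/curl.h>
    #include <poll.h>
    #include <stdio.h>

    static struct pollfd pfd = { .fd = -1 };
    static long timeout_ms = -1;

    /* libcurl tells us which socket to watch and for which events */
    static int socket_cb(CURL *easy, curl_socket_t s, int what,
                         void *userp, void *socketp)
    {
        (void)easy; (void)userp; (void)socketp;
        if (what == CURL_POLL_REMOVE) {
            pfd.fd = -1;
        } else {
            pfd.fd = (int)s;
            pfd.events = 0;
            if (what & CURL_POLL_IN)  pfd.events |= POLLIN;
            if (what & CURL_POLL_OUT) pfd.events |= POLLOUT;
        }
        return 0;
    }

    /* libcurl tells us how long we may wait before calling it again */
    static int timer_cb(CURLM *multi, long t_ms, void *userp)
    {
        (void)multi; (void)userp;
        timeout_ms = t_ms;
        return 0;
    }

    int main(void)
    {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURLM *multi = curl_multi_init();
        curl_multi_setopt(multi, CURLMOPT_SOCKETFUNCTION, socket_cb);
        curl_multi_setopt(multi, CURLMOPT_TIMERFUNCTION, timer_cb);

        CURL *easy = curl_easy_init();
        curl_easy_setopt(easy, CURLOPT_URL, "https://home.xifon.eu/sleep/100");
        curl_easy_setopt(easy, CURLOPT_POSTFIELDS, "0"); /* non-empty body, like -d0 */
        curl_multi_add_handle(multi, easy);

        int running = 1;
        /* kick off the transfer */
        curl_multi_socket_action(multi, CURL_SOCKET_TIMEOUT, 0, &running);

        while (running) {
            /* crude fallback: never block longer than 100ms without a timer */
            int t = (timeout_ms < 0) ? 100 : (int)timeout_ms;
            int n = poll(pfd.fd == -1 ? NULL : &pfd, pfd.fd == -1 ? 0 : 1, t);
            if (n > 0) {
                int ev = 0;
                if (pfd.revents & POLLIN)  ev |= CURL_CSELECT_IN;
                if (pfd.revents & POLLOUT) ev |= CURL_CSELECT_OUT;
                curl_multi_socket_action(multi, pfd.fd, ev, &running);
            } else {
                curl_multi_socket_action(multi, CURL_SOCKET_TIMEOUT, 0, &running);
            }
        }

        /* report the per-transfer result */
        CURLMsg *msg;
        int left;
        while ((msg = curl_multi_info_read(multi, &left)))
            if (msg->msg == CURLMSG_DONE)
                printf("done: %s\n", curl_easy_strerror(msg->data.result));

        curl_multi_remove_handle(multi, easy);
        curl_easy_cleanup(easy);
        curl_multi_cleanup(multi);
        curl_global_cleanup();
        return 0;
    }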
curl/libcurl version
curl 7.64.1 (x86_64-pc-linux-gnu) libcurl/7.64.1 OpenSSL/1.0.2k zlib/1.2.7 brotli/1.0.1 c-ares/1.15.0 nghttp2/1.36.0
Release-Date: 2019-03-27
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS brotli HTTP2 HTTPS-proxy IPv6 Largefile libz NTLM NTLM_WB SSL UnixSockets
operating system
Linux, CentOS 7
thoughts
Some things stand out:
- this only reproduces for requests with a body, and only if the body is non-zero-length
- timing seems to matter: with the sleep server's 'time' component set to 0ms, the final request succeeds quite often (maybe half the time), but at 100ms it fails in 95% of cases or more
- the two methods produce very different error messages