
[THREESCALE-9542] Part 2: Add support to proxy request with Transfer-Encoding: chunked #1403

Merged
merged 8 commits into 3scale:master on Jan 22, 2024

Conversation


@tkan145 tkan145 commented Jun 1, 2023

What:

Fix https://issues.redhat.com/browse/THREESCALE-9542

This PR adds support for proxying requests with "Transfer-Encoding: chunked" when a proxy server is used.

Note to reviewers

Please review only the last 2 commits. I will rebase once part 1 is merged.

Verification steps:

  • Check out this branch

  • Build the runtime image:

make runtime-image IMAGE_NAME=apicast-test
  • Apply the following patch to add the request_unbuffered policy to the dev environment config, then run the gateway with the built image:
diff --git a/dev-environments/https-proxy-upstream-tlsv1.3/apicast-config.json b/dev-environments/https-proxy-upstream-tlsv1.3/apicast-config.json
index 5227c5aa..24c45338 100644
--- a/dev-environments/https-proxy-upstream-tlsv1.3/apicast-config.json
+++ b/dev-environments/https-proxy-upstream-tlsv1.3/apicast-config.json
@@ -44,6 +44,11 @@
           "host": "backend"
         },
         "policy_chain": [
+          {
+              "name": "request_unbuffered",
+              "version": "builtin",
+              "configuration": {}
+          },
           {
             "name": "apicast.policy.http_proxy",
             "configuration": {
cd dev-environments/https-proxy-upstream-tlsv1.3
make certs
make gateway IMAGE_NAME=apicast-test
  • Send a chunked request with a single-chunk body:
curl --resolve post.example.com:8080:127.0.0.1 -v -H "Transfer-Encoding: chunked"   -H "Content-Type: application/json"  -d @my-data.json "http://post.example.com:8080/?user_key=123"

The request should return 200 OK. Note that the upstream echo API reports that the request included the Transfer-Encoding: chunked header and the expected body.

 ▲  curl --resolve post.example.com:8080:127.0.0.1 -v -H "Transfer-Encoding: chunked"   -H "Content-Type: application/json"  -d 'hello, world' "http://post.example.com:8080/?user_key=123"
* Added post.example.com:8080:127.0.0.1 to DNS cache
* Hostname post.example.com was found in DNS cache
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to post.example.com (127.0.0.1) port 8080 (#0)
> POST /?user_key=123 HTTP/1.1
> Host: post.example.com:8080
> User-Agent: curl/7.61.1
> Accept: */*
> Transfer-Encoding: chunked
> Content-Type: application/json
>
> c
* upload completely sent off: 19 out of 12 bytes
< HTTP/1.1 200 OK
< {
<   "args": {
<     "user_key": "123"
<   },
<   "data": "hello, world",
<   "files": {},
<   "form": {},
<   "headers": {
<     "Accept": "*/*",
<     "Content-Type": "application/json",
<     "Host": "example.com",
<     "Transfer-Encoding": "chunked",
<     "User-Agent": "curl/7.61.1"
<   },
<   "json": null,
<   "origin": "172.25.0.2",
<   "url": "http://example.com/post?user_key=123"
< }
* Connection #0 to host post.example.com left intact
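For reference, the `> c` line in the trace above is the chunked framing at work: each chunk is preceded by its size in hexadecimal, and a zero-size chunk terminates the body. A minimal Lua illustration of the framing (illustration only, not APIcast code):

local function encode_chunk(data)
  -- chunk-size in hex, CRLF, chunk data, CRLF
  return string.format("%x\r\n%s\r\n", #data, data)
end

print(encode_chunk("hello, world"))  -- "c\r\nhello, world\r\n" (12 bytes = 0xc)
print("0\r\n\r\n")                   -- the zero-size chunk ends the body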
  • Send a chunked request whose body is split into a few time-delayed chunks. Python 3 is required (http.client sends the body with "Transfer-Encoding: chunked" when given an iterator without a Content-Length).

First, get the APIcast IP address:

 ▲ docker inspect https-proxy-upstream-tlsv13-gateway-run-d76ff72726ec | grep IPAddress
cat <<EOF >chunked-request.py
import http.client
import time

def gen():
    yield bytes('hi', "utf-8")
    time.sleep(2)
    yield bytes('there', "utf-8")
    time.sleep(2)
    yield bytes('bye', "utf-8")

http.client.HTTPConnection.debuglevel = 1
conn = http.client.HTTPConnection('127.0.0.1', 8080)

headers = {'Content-type': 'application/octet-stream', 'Host': 'post.example.com'}

conn.request('POST', '/?user_key=foo', gen(), headers)

response = conn.getresponse()
print(response.read().decode())
EOF

Replace 127.0.0.1 in the script with the APIcast gateway IP obtained above, then run it:

> python3 chunked-request.py
send: b'POST /?user_key=foo HTTP/1.1\r\nAccept-Encoding: identity\r\nTransfer-Encoding: chunked\r\nContent-type: application/octet-stream\r\nHost: post.example.com\r\n\r\n'
send: b'2\r\nhi\r\n'                                                                                                                                                        
send: b'5\r\nthere\r\n'                                                                                                                                                     
send: b'3\r\nbye\r\n'                                                                                                                                                       
send: b'0\r\n\r\n'                                                                                                                                                          
reply: 'HTTP/1.1 200 OK\r\n'                                                                                                                                                
header: Access-Control-Allow-Credentials: true                                                                                                                              
header: Access-Control-Allow-Origin: *                                                                                                                                      
header: Date: Tue, 09 Jan 2024 03:40:18 GMT                                                                                                                                 
header: Content-Type: application/json                                                                                                                                      
header: Server: gunicorn/19.9.0                                                                                                                                             
{
  "args": {
    "user_key": "foo"
  }, 
  "data": "hitherebye", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Content-Type": "application/octet-stream", 
    "Host": "example.com", 
    "Transfer-Encoding": "chunked",
    "User-Agent": "lua-resty-http/0.14 (Lua) ngx_lua/10019"
  }, 
  "json": null, 
  "origin": "172.18.0.4"
  "url": "http://example.com/post?user_key=foo"
}
  • Note that the upstream service received a Transfer-Encoding: chunked request, with each body chunk preceded by its size in hexadecimal.
> 2024/01/09 03:40:14.000960414  length=203 from=0 to=202 
POST /post?user_key=foo HTTP/1.1\r                        
User-Agent: lua-resty-http/0.14 (Lua) ngx_lua/10019\r     
Transfer-Encoding: chunked\r                              
Host: example.com\r                                       
Accept-Encoding: identity\r                               
Content-type: application/octet-stream\r                  
\r                                                        
> 2024/01/09 03:40:14.000960570  length=7 from=203 to=209 
2\r
hi\r
> 2024/01/09 03:40:16.000952052  length=10 from=210 to=219
5\r
there\r
> 2024/01/09 03:40:18.000954073  length=8 from=220 to=227
3\r
bye\r
> 2024/01/09 03:40:18.000954120  length=5 from=228 to=232
0\r
\r
< 2024/01/09 03:40:18.000954722  length=653 from=0 to=652
HTTP/1.1 200 OK\r
Server: gunicorn/19.9.0\r
Date: Tue, 09 Jan 2024 03:40:18 GMT\r
Connection: keep-alive\r
Content-Type: application/json\r
Content-Length: 423\r
Access-Control-Allow-Origin: *\r
Access-Control-Allow-Credentials: true\r
\r
{
  "args": {
    "user_key": "foo"
  }, 
  "data": "hitherebye", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Content-Type": "application/octet-stream", 
    "Host": "example.com", 
    "Transfer-Encoding": "chunked", 
    "User-Agent": "lua-resty-http/0.14 (Lua) ngx_lua/10019"
  }, 
  "json": null, 
  "origin": "172.18.0.4", 
  "url": "http://example.com/post?user_key=foo"
}
  • Send a chunked request with the Expect: 100-continue header.
cat <<EOF >chunked-request.py
import http.client
import time

def gen():
    yield bytes('hi', "utf-8")
    time.sleep(2)
    yield bytes('there', "utf-8")
    time.sleep(2)
    yield bytes('bye', "utf-8")

http.client.HTTPConnection.debuglevel = 1
conn = http.client.HTTPConnection('127.0.0.1', 8080)

headers = {'Content-type': 'application/octet-stream', 'Host': 'post.example.com', 'Expect': '100-continue'}

conn.request('POST', '/?user_key=foo', gen(), headers)

response = conn.getresponse()
print(response.read().decode())
EOF
  • Note that the upstream service received the Transfer-Encoding: chunked request and a 100 Continue reply was returned before the 200 OK.
▲ python3 ./chunked-request.py

send: b'POST /?user_key=foo HTTP/1.1\r\nAccept-Encoding: identity\r\nTransfer-Encoding: chunked\r\nContent-type: application/octet-stream\r\nHost: post.example.com\r\nExpect: 100-continue\r\n\r\n'
send: b'2\r\nhi\r\n'                          
send: b'5\r\nthere\r\n'                       
send: b'3\r\nbye\r\n'                         
send: b'0\r\n\r\n'                            
reply: 'HTTP/1.1 100 Continue\r\n'            
headers: [b'\r\n']                            
reply: 'HTTP/1.1 200 OK\r\n'                  
header: Access-Control-Allow-Credentials: true
header: Access-Control-Allow-Origin: *        
header: Date: Tue, 09 Jan 2024 03:47:02 GMT   
header: Content-Type: application/json        
header: Server: gunicorn/19.9.0
{                                                          
  "args": {                                                
    "user_key": "foo"                                      
  },                                                       
  "data": "hitherebye",                                    
  "files": {},                                             
  "form": {},                                              
  "headers": {                                             
    "Accept-Encoding": "identity",                         
    "Content-Type": "application/octet-stream",            
    "Expect": "100-continue",                              
    "Host": "example.com",                                 
    "Transfer-Encoding": "chunked",                        
    "User-Agent": "lua-resty-http/0.14 (Lua) ngx_lua/10019"
  },                                                       
  "json": null,                                            
  "origin": "172.18.0.4",                                  
  "url": "http://example.com/post?user_key=foo"            
}

@kevprice83 kevprice83 left a comment

We need to add unit tests and integration tests for all the scenarios that reproduce the reported issue:

  • http proxy env vars (over TLS)
  • http_proxy policy (over TLS)
  • camel_proxy policy (over TLS)

gateway/src/resty/http/chunked.lua
@tkan145 tkan145 force-pushed the THREESCALE-9542-chunked-request branch from e3f8888 to c1cae29 on June 26, 2023
t/http-proxy.t
}
--- backend env
server_name test-backend.lvh.me;
listen $TEST_NGINX_RANDOM_PORT ssl;
Member

I wonder why the 3scale backend is configured with a TLS connection.

t/http-proxy.t
@tkan145 tkan145 force-pushed the THREESCALE-9542-chunked-request branch from 04a9b0c to addb342 on October 10, 2023
@tkan145 tkan145 changed the title [THREESCALE-9542] Add support to proxy request with Transfer-Encoding: chunked [THREESCALE-9542] Part 2: Add support to proxy request with Transfer-Encoding: chunked Oct 10, 2023
@tkan145 tkan145 force-pushed the THREESCALE-9542-chunked-request branch from addb342 to 94a825f on November 21, 2023
@tkan145 tkan145 marked this pull request as ready for review November 22, 2023 08:22
@tkan145 tkan145 requested a review from a team as a code owner November 22, 2023 08:22
@tkan145 tkan145 requested a review from eguzki November 22, 2023 08:24
When a request with the HTTP "Transfer-Encoding: chunked" header is sent, APIcast
buffers the entire request because, by default, it does not support sending chunked
requests. However, when sending via a proxy, APIcast does not remove the header from
the initial request, which tells the server that the client is sending a chunked
request. This causes a Bad Request error, because the upstream will not be able to
determine where the chunked message ends.

This commit removes the "Transfer-Encoding: chunked" header from the request when
sending through a proxy.
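A minimal sketch of the commit's idea (assumed names, not the exact PR code): once the body has been fully buffered, the hop-by-hop chunked encoding no longer applies, so the header is dropped and a plain Content-Length is sent instead.

local function prepare_buffered_headers(headers, body)
  headers["Transfer-Encoding"] = nil            -- do not forward "chunked"
  headers["Content-Length"] = tostring(#body)   -- length is known once buffered
  return headers
end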
@tkan145 tkan145 force-pushed the THREESCALE-9542-chunked-request branch from 94a825f to b02a888 on November 27, 2023
@eguzki eguzki left a comment

Looking good.

However, this is complex and also hard to maintain. I want to try some other approach relying on lua-resty-http 0.17.1 or some other library. If we cannot find a simpler one (from the APIcast code-base perspective), we can always use this code.

gateway/src/apicast/http_proxy.lua
gateway/src/resty/http/response_writer.lua

eguzki commented Dec 14, 2023

When using the Python client, the response body is not shown. I wonder if APIcast is not handling the response correctly.


eguzki commented Dec 14, 2023

I am going to try https://github.com/ledgetech/lua-resty-http#set_proxy_options, which looks promising.


if http_methods_with_body[req_method] then
if opts.request_unbuffered and ngx_http_version() == 1.1 then
local _, err = handle_expect()
Member

I wonder if Expect needs to be handled when buffering is enabled.

Member

Actually, I do not think we should be doing this. The lua-resty-http lib is doing that for us. WDYT?

@tkan145 tkan145 Dec 14, 2023

lua-resty-http is a client library, and it handles the Expect returned from the server, while we are acting as a server here and need to process the Expect header from the client.

When I sent a large payload using cURL, the request hung; I later found out it was due to the Expect header.

I will run some more tests to see whether we really need it here

@eguzki eguzki Dec 18, 2023

Ok I think I understand now.

I think that when buffering is on, APIcast should protect upstream and should handle the Expect: 100-Continue. That is, it is APIcast that returns the HTTP response 100 Continue and then consumes the body before opening the connection to upstream. I think this is how it works right now in master. The request Expect: 100-Continue and the response 100 Continue happen twice: first between downstream and APIcast, and then between APIcast and upstream (done by the lua-resty-http lib, because the Expect header is still there). We might consider removing the Expect header in "buffered" mode, unless we want to keep the Expect protocol with upstream to avoid sending the body if upstream does not want it, which also makes sense to me. It is actually a requirement of rfc2616#section-8.2.3 to be like this; check the "Requirements for HTTP/1.1 proxies" section.

When unbuffered is on, APIcast does not read the body with ngx.req.read_body(), thus it does not send 100 Continue to downstream. I think that is the reason you saw the request hang. Ideally, we should let upstream decide whether it wants to continue or not, and propagate the response to downstream. Downstream would start sending the body only when upstream tells it to. I think that is quite hard to implement, basically because the lua-resty-http lib consumes the 100 Continue response of the upstream and then tries to send the body. I do not see a way to do this other than manually sending the 100 Continue response to downstream and creating a body reader that will be consumed by the lua-resty-http library. But I can see some issues there as well. What if upstream says 302 Redirect or 400 Bad Request instead of 100 Continue? The downstream client would have already written the body to the downstream socket, and that socket would be unusable for follow-up HTTP sessions. I do not know how to proceed regarding this.

Member

I have re-written the message above. In case you read it previously, please re-read it 🙏

Contributor (Author)

I'm a bit confused here. I haven't read the openresty code but do you mean ngx.req.read_body() will send 100 Continue downstream? Doesn't that also mean that APIcast returns 100 Continue to the downstream application before establishing the upstream connection?

Regarding the 400, please correct me if I'm wrong, but I think the only case where the upstream server returns this error is if there is data in the request body. In my head the flow will be as follows:

client -> Expect: 100-Continue -> upstream -> 100 Continue -> client
client -> start sending body -> upstream read body -> return 400

@eguzki eguzki Dec 19, 2023

I haven't read the openresty code but do you mean ngx.req.read_body() will send 100 Continue downstream?

Yes!

Doesn't that also mean that APIcast returns 100 Continue to the downstream application before establishing the upstream connection?

Exactly (when buffered mode is on)

the only case where the upstream server returns this error is if there is data in the request body

400 Bad Request is just an example. It could be a 5XX error as well. In unbuffered mode, the workflow would be as follows (in my head):

client -> Expect: 100-Continue -> apicast
client <- 100 Continue <- apicast
client -> write body to socket -> apicast 
# Apicast did not read the body yet, it just created a body reader from the socket
apicast -> create connection via proxy -> TLS upstream
apicast (lua resty http) -> Expect: 100-Continue -> TLS upstream
apicast (lua resty http) <- 100 Continue <- TLS upstream
apicast (lua resty http) -> send body from the body reader -> TLS upstream

So let's say that upstream does not want the client to start the upload:

client -> Expect: 100-Continue -> apicast
client <- 100 Continue <- apicast
client -> write body to socket -> apicast 
# Apicast did not read the body yet, it just created a body reader from the socket
apicast -> create connection via proxy -> TLS upstream
apicast (lua resty http) -> Expect: 100-Continue -> TLS upstream
apicast (lua resty http) <- 5XX Error <- TLS upstream
client <-  5XX Error <- apicast

My issue with this is that the client has sent the body and nobody has consumed it. I need to try this scenario to see what we can do.

Contributor (Author)

From this nginx thread, https://mailman.nginx.org/pipermail/nginx/2021-May/060643.html, I think nginx does not handle this well either.

How about we send back an error response, discard the body and close the socket?

Member

How about we send back an error response, discard the body and close the socket?

It's aggressive, but it can be a way out.


if is_chunked then
-- If the body is smaller than "client_body_buffer_size" the Content-Length header is
-- set based on the size of the buffer. However, when the body is rendered to a file,
Member

If the body is smaller than "client_body_buffer_size" the Content-Length header is set based on the size of the buffer

Who is doing that? In other words, when all of these conditions are met:

  • the request is chunked,
  • buffering is enabled
  • the request body is small

Who sets the Content-Length header?

Contributor (Author)

lua-resty-http will set the Content-Length based on the body that we pass in. But good catch, I should have put more details in the comment.

Member

Ok, I see. It's because the body is a string and resty-http gets the length out of it. It happens here. I would make it explicit, but this is good enough.

Member

Yeah, I agree; this is something that will come up again in the future when troubleshooting, but it doesn't need to be done in this PR. It can be added at a later date: if headers["Content-Length"] == nil then headers["Content-Length"] = #body (this will at least be a useful reference for now).
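A sketch of the suggested fallback (hypothetical, not in this PR); `headers` is the upstream request headers table and `body` the string body already read by APIcast:

if headers["Content-Length"] == nil and type(body) == "string" then
  headers["Content-Length"] = #body   -- make the implicit length explicit
end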


tkan145 commented Dec 14, 2023

I am going to try https://github.com/ledgetech/lua-resty-http#set_proxy_options, which looks promising.

I tried it in #1434 in /http/proxy.lua


eguzki commented Dec 20, 2023

I am ok going with these changes for now. First #1403 (comment) needs to be fixed.

However, in follow-up PRs, this (spaghetti) code needs some simplification. It is hard to maintain and understand. The forward_https_request method should implement the diagram below:

                                               ┌────────────────────────┐
                       YES                     │                        │                   NO
                                               │                        │
      ┌────────────────────────────────────────┤                        ├────────────────────────────────────────────┐
      │                                        │   Request Buffering?   │                                            │
      │                                        │                        │                                            │
      │                                        │                        │                                            │
      │                                        │                        │                                            │
      │                                        └────────────────────────┘                                            │
      │                                                                                                              │
      │                                                                          ┌───────────────────────────────────▼───────────────────────────────┐
      │                                                                          │                                                                   │
      │                                                                          │                                                                   │
      │                                                                          │                                                                   │
      ▼                                                                          │            Set up body reader from downstream socket              │
┌──────────────┐                                                                 │                                                                   │
│              │                                                                 │                                                                   │
│              │                                                                 │                                                                   │
│  Read body   │                                                                 │            Handle Expect: 100-Continue                            │
│              │                                                                 │                                                                   │
└────┬─────────┘                                                                 │            For Transfer-Encoding: chunked,                        │
     │                                                                           │                                                                   │
     │                                                                           │              set wrapper to encode (back) the request body        │
     │                                                                           │                                                                   │
     │                                                                           └──────────────────────────────────┬────────────────────────────────┘
┌────▼────────────────┐                                                                                             │
│                     │                                                                                             │
│ Set Content-Length  │                                                                                             │
│                     ├──────────────────────────────────┐                                                          │
│ Remove TE header    │                                  │                                                          │
│                     │                                  │                                                          │
└─────────────────────┘                                  │                                                          │
                                                         │                                                          │
                                                         ◄──────────────────────────────────────────────────────────┘
                                                         │
                                                         ▼
                                           ┌─────────────▼──────────────┐
                                           │                            │
                                           │   Connect  TLS Upstream    │
                                           │                            │
                                           │   via proxy                │
                                           │                            │
                                           │                            │
                                           └──────────────┬─────────────┘
                                                          │
                                                          │
                                                          │
                                           ┌──────────────▼──────────────┐
                                           │                             │
                                           │   Send Request              │
                                           │                             │
                                           │                             │
                                           └────────────────┬────────────┘
                                                            │
                                                            │
                                                            │
                                                            ▼
                                           ┌──────────────────────────────┐
                                           │                              │
                                           │     Write Downstream response│
                                           │                              │
                                           └──────────────────────────────┘

With a regular socket, OpenResty will process the Expect header on socket
read, so we only need to send back "100 Continue" when using a raw socket.
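A minimal sketch of the idea in the commit message above (assumed names, not the exact PR code): when APIcast reads the body from the raw downstream socket, nothing answers the Expect header for us, so the 100 Continue status line is sent back manually.

local function handle_expect()
  local expect = ngx.var.http_expect
  if expect and expect:lower() == "100-continue" then
    local sock, err = ngx.req.socket(true)  -- raw downstream socket
    if not sock then return nil, err end
    local bytes, send_err = sock:send("HTTP/1.1 100 Continue\r\n\r\n")
    if not bytes then return nil, send_err end
  end
  return true
end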

tkan145 commented Jan 9, 2024

Fixed #1403 (comment)

Also added 2 commits on top to handle Expect: 100-continue and Content-Type: application/x-www-form-urlencoded

@tkan145 tkan145 requested a review from eguzki January 10, 2024 00:33
@eguzki eguzki left a comment

This was a hard one, wasn't it!

Great job 🎖️

@kevprice83 kevprice83 left a comment

I think I might need a summary of the expected behaviour with the different combinations here to understand how this works. I am not sure I get this, to be honest, which is probably expected, but then I would expect the README to describe it in more detail so customers and Support can understand it better.

If the tests are indeed correct then only an update to the README is needed.

One note, though: why are there no unit tests added or modified?

Final comment: I think we are introducing I/O-blocking operations via the file_size() function, and this is going to be executed on every request that meets the conditions. Executing things like os.execute, io.open, etc. is generally safe in the init and init_worker phases because it's a one-time execution, but in this scenario we are doing it on every request. Have we considered this already? How much is it harming performance as a result?



=== TEST 15: https_proxy with request_unbuffered policy, only upstream and proxy_pass will buffer
the request
Member

What do we mean by "only upstream & proxy_pass will buffer the request"? Are we saying that the initial request to the camel proxy will be unbuffered? Why would that be, given that this is over TLS, so the tunnel should be established directly between APIcast & upstream? Is there another part of the code you are referring to that would be unbuffered?

Contributor (Author)

It works like this:

Without the request_unbuffered policy (default behavior):

HTTPS request ---> APIcast ---> [reach proxy code] ---> [call ngx.req.get_body_data()] ---> [ request is buffered to a file] ---> [construct a new request and a body reader from buffered file] ---> [ use lua-resty-http to perform handshake and send new request to camel server]  ---> Camel

With the request_unbuffered policy:

HTTPS request ---> APIcast  ---> [reach proxy code] ---> [construct a new request and set up body reader from downstream socket (without reading the body first)] ---> [ use lua-resty-http to perform handshake and send new request to camel server]  ---> Camel

Upstream here is the upstream block in the test. It was a bit tricky to set up a test for this, so I relied on the "a client request body is buffered to a temporary file" message in the log.

  • proxy_pass will buffer the request body by default
  • echo_read_request_body will also buffer the request. echo_request_body is used to echo the request body so we can check whether the request was sent to the upstream server

But I will update the README file with more details.
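As a rough illustration of the buffered path described above (assumed names, not the PR's actual code), the body reader over the temporary file could look like this; lua-resty-http consumes such an iterator as the request body:

local function file_body_reader(path, chunk_size)
  local file, err = io.open(path, "rb")
  if not file then return nil, err end
  return function()
    local chunk = file:read(chunk_size or 8192)
    if not chunk then file:close() end   -- EOF: release the descriptor
    return chunk
  end
end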

-- set by openresty based on the size of the buffer. However, when the body is rendered
-- to a file, we will need to calculate and manually set the Content-Length header based
-- on the file size
local contentLength, err = file_size(temp_file_path)
Member

Is this safe to do for ALL requests that meet these conditions? I see that the calls in the file_size function are I/O-blocking calls, so I am wondering how harmful to performance they could be, given that they are not executed within a coroutine. If a coroutine cannot be used, then we should consider using the lua-io-nginx module, for example.

Contributor (Author)

Is it enough to wrap that functionality in a coroutine? I don't know how useful that would be, since it would yield on the first call anyway. Also, the file_reader calls io.open on every request that has its body buffered to a file, so I guess we pay the price of calling io.open one more time?

But I totally agree with you that it is an I/O-blocking function and should be avoided.

Checking the lua-io-nginx module, I can see that it is currently considered experimental, and it seems to run the task on another thread. However, I'm not so sure about it, because we have to pay for context switching, threads, locking, etc.

It's worth to mention that the cost time of a single I/O operation won't be reduced, it was just
transferred from the main thread (the one executes the event loop) to another exclusive thread.
Indeed, the overhead might be a little higher, because of the extra tasks transferring, lock waiting,
Lua coroutine resumption (and can only be resumed in the next event loop) and so forth. Nevertheless,
after the offloading, the main thread doesn't block due to the I/O operation, and this is the fundamental
advantage compared with the native Lua I/O library.

Member

Not sure how expensive this is:

function fsize(filename)
  local handle, err = io.open(filename)
  if not handle then return nil, err end
  local current = handle:seek()      -- get current position
  local size = handle:seek("end")    -- get file size
  handle:seek("set", current)        -- restore position
  handle:close()                     -- do not leak the descriptor
  return size
end

Theoretically, any I/O operation could block the thread. We could try coroutines or any other means to make it non-blocking. The lua-nginx-module introduction says:

Disk operations with relatively small amount of data can be done using the standard Lua io library but huge file reading and writing should be avoided wherever possible as they may block the Nginx process significantly. Delegating all network and disk I/O operations to Nginx's subrequests (via the [ngx.location.capture](https://github.com/openresty/lua-nginx-module#ngxlocationcapture) method and similar) is strongly recommended for maximum performance.

Not sure if we can follow that recommendation, though. @tkan145, we can try to discuss this in a short call.

Anyway, AFAIK, we have never measured the capacity of APIcast to handle traffic with request bodies big enough to be persisted to disk. All the tests performed were "simple" GET requests.

Contributor (Author)

Fixed, and also updated the README file. @kevprice83, can you help review the README file and let me know if I need to add anything else?


eguzki commented Jan 17, 2024

I think I might need a summary of the expected behaviour with the different combinations here to understand how this works. I am not sure I get this, to be honest, which is probably expected, but then I would expect the README to describe it in more detail so customers and Support can understand it better.

No need to say a single word more about this. If you do not understand it, nobody else will. We definitely need to update the request_unbuffered README with some docs highlighting the different scenarios.

The big value, IMO, of the new request_unbuffered policy is that it provides consistent behavior across multiple scenarios:

- APIcast <> upstream HTTP 1.1 plain
- APIcast <> upstream TLS
- APIcast <> HTTP Proxy (env var) <> upstream TLS
- APIcast <> HTTP Proxy (policy) <> upstream TLS
- APIcast <> HTTP Proxy (camel proxy) <> upstream TLS
- APIcast <> HTTP Proxy (env var) <> upstream HTTP 1.1 plain
- APIcast <> HTTP Proxy (policy) <> upstream HTTP 1.1 plain
- APIcast <> HTTP Proxy (camel proxy) <> upstream HTTP 1.1 plain

The behavior is the one described by the nginx doc when proxy_request_buffering is off:

When buffering is disabled, the request body is sent to the proxied server immediately as it is received. 

"Proxied server" here is either the configured proxy or the upstream (which can in turn also be a proxy).

Whereas when this policy is not in the chain, the behavior (for all the scenarios above) is the one described for proxy_request_buffering on. I quote here:

When buffering is enabled, the entire request body is [read](http://nginx.org/en/docs/http/ngx_http_core_module.html#client_body_buffer_size) from the client before sending the request to a proxied server.

There are caveats and corner cases, like when the request is application/x-www-form-urlencoded, in which case APIcast will always read the entire body and thus buffer the request, regardless of the request buffering policy.

Furthermore, the behavior is also consistent regardless of the transfer encoding used. Whether the request has a known body length (the "Content-Length" header) or is chunked, the request buffering (or not) semantics will apply.

`request_unbuffered` policy is enabled or not.
- For a request with "small" body that fits into [`client_body_buffer_size`](https://nginx.org/en/docs/http/ngx_http_core_module.html#client_body_buffer_size) and with header "Transfer-Encoding: chunked", NGINX will always read and know the length of the body.
Member

This is nice to note, but it is not a caveat (a limitation); it is expected and correct. The whole point of an unbuffered request is, as you correctly pointed out, that the request body is sent to the proxied server immediately as it is received. The transfer encoding is a hop-by-hop encoding, and nothing prevents it from changing from one hop to the next.

@tkan145 tkan145 force-pushed the THREESCALE-9542-chunked-request branch from 6fd6470 to f19b9e5 on January 19, 2024
For example, when the client sends 10GB, NGINX will buffer the entire 10GB to disk before sending anything to
the upstream server.

When `proxy_request_buffering` is in the chain, request buffering will be disabled and the request body will be sent to the proxied server immediately as it received. This can help minimize time spent sending data to a service and disk I/O for requests with big body. However, there are caveats and corner cases applied, [**Caveats**](#caveats)
Member

proxy_request_buffering is not the name of the policy, is it?

Contributor (Author)

oops

@eguzki eguzki requested a review from a team January 19, 2024 15:27

eguzki commented Jan 19, 2024

I kindly requested a review from the @3scale/documentation team.


## Technical details

By default, NGINX reads the entire request body into memory (or buffers large requests into disk) before proxying it to the upstream server. However, reading bodies can become expensive, especially when requests with large payloads are sent.
@dfennessy dfennessy Jan 19, 2024

Suggested change
By default, NGINX reads the entire request body into memory (or buffers large requests into disk) before proxying it to the upstream server. However, reading bodies can become expensive, especially when requests with large payloads are sent.
By default, NGINX reads the entire request body into memory or buffers large requests to disk before forwarding them to the upstream server. Reading bodies can become expensive, especially when sending requests containing large payloads.

For example, when the client sends 10GB, NGINX will buffer the entire 10GB to disk before sending anything to
the upstream server.

When `proxy_request_buffering` is in the chain, request buffering will be disabled and the request body will be sent to the proxied server immediately as it received. This can help minimize time spent sending data to a service and disk I/O for requests with big body. However, there are caveats and corner cases applied, [**Caveats**](#caveats)
@dfennessy dfennessy Jan 19, 2024

Suggested change
When `proxy_request_buffering` is in the chain, request buffering will be disabled and the request body will be sent to the proxied server immediately as it received. This can help minimize time spent sending data to a service and disk I/O for requests with big body. However, there are caveats and corner cases applied, [**Caveats**](#caveats)
When the `proxy_request_buffering` is in the chain, request buffering is disabled, sending the request body to the proxied server immediately upon receiving it. This can help minimize time spent sending data to a service and disk I/O for requests with big body. However, there are caveats and corner cases applied, [**Caveats**](#caveats)

The response buffering is enabled by default in NGINX (the [`proxy_buffering: on`]() directive). It does
this to shield the backend against slow clients ([slowloris attack](https://en.wikipedia.org/wiki/Slowloris_(computer_security))).

If the `proxy_buffering` is disabled, the upstream server will be forced to keep the connection open until all data has been received by the client. Thereforce, NGINX [advises](https://www.nginx.com/blog/avoiding-top-10-nginx-configuration-mistakes/#proxy_buffering-off) against disabling `proxy_buffering` as it will potentially waste upstream server resources.
Contributor

Suggested change
If the `proxy_buffering` is disabled, the upstream server will be forced to keep the connection open until all data has been received by the client. Thereforce, NGINX [advises](https://www.nginx.com/blog/avoiding-top-10-nginx-configuration-mistakes/#proxy_buffering-off) against disabling `proxy_buffering` as it will potentially waste upstream server resources.
If the `proxy_buffering` is disabled, the upstream server keeps the connection open until all data is received by the client. NGINX [advises](https://www.nginx.com/blog/avoiding-top-10-nginx-configuration-mistakes/#proxy_buffering-off) against disabling `proxy_buffering` as it will potentially waste upstream server resources.


## Caveats

- Because APIcast allows defining mapping rules based on request content, ie `POST /some_path?a_param={a_value}`
Contributor

Suggested change
- Because APIcast allows defining mapping rules based on request content, ie `POST /some_path?a_param={a_value}`
- APIcast allows defining of mapping rules based on request content. For example, `POST /some_path?a_param={a_value}`

@dfennessy dfennessy left a comment

I've added a few suggestions.

@tkan145 tkan145 force-pushed the THREESCALE-9542-chunked-request branch from c982788 to b494ca8 on January 22, 2024

eguzki commented Jan 22, 2024

After running the verification steps again: the first request with a small body (chunked TE) reaches upstream still with the TE chunked, and I think this is expected. For this use case (TE chunked and TLS upstream through a proxy), APIcast generates a socket reader that decodes the chunked encoding and re-encodes it back to be forwarded, so the request reaching upstream has to be TE chunked.

❯ curl --resolve post.example.com:8080:127.0.0.1 -v -H "Transfer-Encoding: chunked"   -H "Content-Type: application/json"  -d @my-data.json "http://post.example.com:8080/?user_key=123"
Warning: Couldn't read data from file "my-data.json", this makes an empty 
Warning: POST.
* Added post.example.com:8080:127.0.0.1 to DNS cache
* Hostname post.example.com was found in DNS cache
*   Trying 127.0.0.1:8080...
* Connected to post.example.com (127.0.0.1) port 8080 (#0)
> POST /?user_key=123 HTTP/1.1
> Host: post.example.com:8080
> User-Agent: curl/7.81.0
> Accept: */*
> Transfer-Encoding: chunked
> Content-Type: application/json
> 
* upload completely sent off: 5 out of 0 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Mon, 22 Jan 2024 09:14:35 GMT
< Content-Type: application/json
< Server: gunicorn/19.9.0
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
* no chunk, no close, no size. Assume close to signal end
< 
{
  "args": {
    "user_key": "123"
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Content-Type": "application/json", 
    "Host": "example.com", 
    "Transfer-Encoding": "chunked", 
    "User-Agent": "curl/7.81.0"
  }, 
  "json": null, 
  "origin": "172.18.0.4", 
  "url": "http://example.com/post?user_key=123"
}
* Closing connection 0

You might want to update the verification steps.


## Why does upstream receive a "Content-Length" header when the original request is sent with "Transfer-Encoding: chunked"

For a request with "small" body that fits into [`client_body_buffer_size`](https://nginx.org/en/docs/http/ngx_http_core_module.html#client_body_buffer_size) and with header "Transfer-Encoding: chunked", NGINX will always read and know the length of the body. Then it will send the request to upstream with the "Content-Length" header.
Member

This is not true for the use case this PR takes care of; it only holds when nginx handles the proxying task.

Actually, as a future enhancement, we could do the same: read one buffer from the socket of (configurable) size S. If the full body has been read, send it upstream with Content-Length. If the full body has not been read, proxy the request with TE: chunked.
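A sketch of that proposed enhancement (not implemented in this PR; all names are assumptions). Probe one buffer of configurable size S from the body reader: if that exhausts the body, forward it with Content-Length; otherwise keep Transfer-Encoding: chunked and stream the rest:

local function choose_forwarding_mode(body_reader, s)
  local first = body_reader(s)
  if not first then
    return { content_length = 0, body = "" }          -- empty body
  end
  local rest = body_reader(s)
  if not rest then
    return { content_length = #first, body = first }  -- full body read
  end
  -- more data pending: keep the chunked encoding and stream it
  return { chunked = true, prefix = { first, rest }, reader = body_reader }
end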

Contributor (Author)

Updated the verification steps. I will address this in a future PR.

@eguzki eguzki left a comment

For me this is mergeable.

I left a couple of comments that are just nitpicks.

@kevprice83

I think I might need a summary of the expected behaviour with the different combinations here [...]

No need to say a single word more about this. If you do not understand it, nobody else will. We definitely need to update the request_unbuffered README with some docs highlighting the different scenarios. [...] The big value, IMO, of the new request_unbuffered policy is that it provides consistent behavior across multiple scenarios [...]

This is exactly what we need. Unfortunately, the way the integration tests are written makes that hard to understand; not sure if they can be worded or written in another way, but honestly the explanation above should be enough for most situations, I think. It's perfect.

@kevprice83 kevprice83 left a comment

This looks really good to me now, especially with the new README. I'd still like to see the unit tests in there, but it's not urgent.

Another question I had to ensure my understanding:

So when an HTTPS proxy is used, with TE: chunked and a body size greater than the body_buffer_size, APIcast will buffer, decode, then re-encode the request, so that upstream will still receive the original TE: chunked request format, right?


eguzki commented Jan 22, 2024

So when an HTTPS proxy is used, with TE: chunked and a body size greater than the body_buffer_size, APIcast will buffer, decode, then re-encode the request, so that upstream will still receive the original TE: chunked request format, right?

No need to twist it too much. It is much simpler. When an HTTPS proxy is used and TE is chunked:

  • With the request_unbuffered policy in place (i.e. proxy_request_buffering off): the connection with upstream will be created as soon as the request headers are received. Request headers and transfer encoding will be propagated as-is, regardless of the body size. We might change this propagation in the future, as there is no obligation to keep the encoding. Same as nginx does, we might implement that, for small request bodies, TE chunked is replaced by Content-Length and the body is decoded. For big bodies, or slow connections (request bodies being slow to reach APIcast), APIcast will keep the encoding, as it is obliged to open the connection and propagate headers as soon as the downstream request is received (as proxy_request_buffering off dictates).

NOTE: Internally (customers do not need to know this), for TE chunked, APIcast will decode the encoding and re-encode it again to keep the same encoding. This is done in streaming mode, chunk by chunk, as they are read from the socket. It is an implementation detail and could be improved in the future. We implemented this decode <-> re-encode because the socket reading library decodes it and we cannot disable that.

  • Without the request_unbuffered policy (i.e. proxy_request_buffering on): the request body will be read (buffered) by APIcast. The TE header will be replaced by the "Content-Length" header for the upstream connection. Only when the entire body has been read will APIcast initiate the connection with upstream, sending the request with the Content-Length header and without the chunked encoding.

NOTE: Internally, when the body is bigger than the client_body_buffer_size, APIcast will write the body to a file after removing the chunked encoding, and the upstream HTTP client will receive a body reader which is an iterator reading from the file.
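A rough illustration (assumed names, not the PR's actual reader) of the decode <-> re-encode streaming described in the first NOTE above: wrap a reader that yields decoded chunks and re-apply the chunked framing before sending upstream.

local format = string.format

local function chunked_encoder(reader)
  local eof = false
  return function(...)
    if eof then return nil end
    local chunk, err = reader(...)
    if err then return nil, err end
    if chunk then
      return format("%x\r\n%s\r\n", #chunk, chunk)  -- re-encode one chunk
    end
    eof = true
    return "0\r\n\r\n"                              -- last-chunk marker
  end
end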


eguzki commented Jan 22, 2024

@dfennessy your approval is required as you requested changes.

@dfennessy dfennessy left a comment

LGTM!

@tkan145 tkan145 merged commit c38418c into 3scale:master Jan 22, 2024
12 checks passed