
[THREESCALE-9542] Part 2: Add support to proxy request with Transfer-Encoding: chunked #1403

Merged
merged 8 commits into 3scale:master on Jan 22, 2024

Conversation


@tkan145 tkan145 commented Jun 1, 2023

What:

Fix https://issues.redhat.com/browse/THREESCALE-9542

This PR adds support for proxying requests with "Transfer-Encoding: chunked" when a proxy server is used.

Note to reviewers

Please review only the last 2 commits. I will rebase once part 1 is merged.

Verification steps:

  • Check out this branch

  • Build the runtime image:

make runtime-image IMAGE_NAME=apicast-test
  • Apply the following patch to add the request_unbuffered policy to the dev environment config, then run the gateway with the built image:
diff --git a/dev-environments/https-proxy-upstream-tlsv1.3/apicast-config.json b/dev-environments/https-proxy-upstream-tlsv1.3/apicast-config.json
index 5227c5aa..24c45338 100644
--- a/dev-environments/https-proxy-upstream-tlsv1.3/apicast-config.json
+++ b/dev-environments/https-proxy-upstream-tlsv1.3/apicast-config.json
@@ -44,6 +44,11 @@
           "host": "backend"
         },
         "policy_chain": [
+          {
+              "name": "request_unbuffered",
+              "version": "builtin",
+              "configuration": {}
+          },
           {
             "name": "apicast.policy.http_proxy",
             "configuration": {
cd dev-environments/https-proxy-upstream-tlsv1.3
make certs
make gateway IMAGE_NAME=apicast-test
  • Send a chunked request with a single-chunk body:
curl --resolve post.example.com:8080:127.0.0.1 -v -H "Transfer-Encoding: chunked"   -H "Content-Type: application/json"  -d @my-data.json "http://post.example.com:8080/?user_key=123"

The request should return 200 OK. Note that the upstream echo API reports that the request included the Transfer-Encoding: chunked header and the expected body.

 ▲  curl --resolve post.example.com:8080:127.0.0.1 -v -H "Transfer-Encoding: chunked"   -H "Content-Type: application/json"  -d 'hello, world' "http://post.example.com:8080/?user_key=123"
* Added post.example.com:8080:127.0.0.1 to DNS cache
* Hostname post.example.com was found in DNS cache
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to post.example.com (127.0.0.1) port 8080 (#0)
> POST /?user_key=123 HTTP/1.1
> Host: post.example.com:8080
> User-Agent: curl/7.61.1
> Accept: */*
> Transfer-Encoding: chunked
> Content-Type: application/json
>
> c
* upload completely sent off: 19 out of 12 bytes
< HTTP/1.1 200 OK
< {
<   "args": {
<     "user_key": "123"
<   },
<   "data": "hello, world",
<   "files": {},
<   "form": {},
<   "headers": {
<     "Accept": "*/*",
<     "Content-Type": "application/json",
<     "Host": "example.com",
<     "Transfer-Encoding": "chunked",
<     "User-Agent": "curl/7.61.1"
<   },
<   "json": null,
<   "origin": "172.25.0.2",
<   "url": "http://example.com/post?user_key=123"
< }
* Connection #0 to host post.example.com left intact
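For reference, the `> c` line in the trace above is the chunked framing at work: each chunk is preceded by its size in hexadecimal, and a zero-size chunk terminates the body. A minimal Lua illustration of the framing (illustration only, not APIcast code):

local function encode_chunk(data)
  -- chunk-size in hex, CRLF, chunk data, CRLF
  return string.format("%x\r\n%s\r\n", #data, data)
end

print(encode_chunk("hello, world"))  -- "c\r\nhello, world\r\n" (12 bytes = 0xc)
print("0\r\n\r\n")                   -- the zero-size chunk ends the body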
  • Send a chunked request whose body is split into a few time-delayed chunks. Python 3 is required (http.client sends the body with "Transfer-Encoding: chunked" when given an iterator without a Content-Length).

First, get the APIcast IP address:

 ▲ docker inspect https-proxy-upstream-tlsv13-gateway-run-d76ff72726ec | grep IPAddress
cat <<EOF >chunked-request.py
import http.client
import time

def gen():
    yield bytes('hi', "utf-8")
    time.sleep(2)
    yield bytes('there', "utf-8")
    time.sleep(2)
    yield bytes('bye', "utf-8")

http.client.HTTPConnection.debuglevel = 1
conn = http.client.HTTPConnection('127.0.0.1', 8080)

headers = {'Content-type': 'application/octet-stream', 'Host': 'post.example.com'}

conn.request('POST', '/?user_key=foo', gen(), headers)

response = conn.getresponse()
print(response.read().decode())
EOF

Replace 127.0.0.1 in the script with the APIcast gateway IP obtained above, then run it:

> python3 chunked-request.py
send: b'POST /?user_key=foo HTTP/1.1\r\nAccept-Encoding: identity\r\nTransfer-Encoding: chunked\r\nContent-type: application/octet-stream\r\nHost: post.example.com\r\n\r\n'
send: b'2\r\nhi\r\n'                                                                                                                                                        
send: b'5\r\nthere\r\n'                                                                                                                                                     
send: b'3\r\nbye\r\n'                                                                                                                                                       
send: b'0\r\n\r\n'                                                                                                                                                          
reply: 'HTTP/1.1 200 OK\r\n'                                                                                                                                                
header: Access-Control-Allow-Credentials: true                                                                                                                              
header: Access-Control-Allow-Origin: *                                                                                                                                      
header: Date: Tue, 09 Jan 2024 03:40:18 GMT                                                                                                                                 
header: Content-Type: application/json                                                                                                                                      
header: Server: gunicorn/19.9.0                                                                                                                                             
{
  "args": {
    "user_key": "foo"
  }, 
  "data": "hitherebye", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Content-Type": "application/octet-stream", 
    "Host": "example.com", 
    "Transfer-Encoding": "chunked",
    "User-Agent": "lua-resty-http/0.14 (Lua) ngx_lua/10019"
  }, 
  "json": null, 
  "origin": "172.18.0.4"
  "url": "http://example.com/post?user_key=foo"
}
  • Note that the upstream service received a Transfer-Encoding: chunked request, with each body chunk preceded by its size in hexadecimal.
> 2024/01/09 03:40:14.000960414  length=203 from=0 to=202 
POST /post?user_key=foo HTTP/1.1\r                        
User-Agent: lua-resty-http/0.14 (Lua) ngx_lua/10019\r     
Transfer-Encoding: chunked\r                              
Host: example.com\r                                       
Accept-Encoding: identity\r                               
Content-type: application/octet-stream\r                  
\r                                                        
> 2024/01/09 03:40:14.000960570  length=7 from=203 to=209 
2\r
hi\r
> 2024/01/09 03:40:16.000952052  length=10 from=210 to=219
5\r
there\r
> 2024/01/09 03:40:18.000954073  length=8 from=220 to=227
3\r
bye\r
> 2024/01/09 03:40:18.000954120  length=5 from=228 to=232
0\r
\r
< 2024/01/09 03:40:18.000954722  length=653 from=0 to=652
HTTP/1.1 200 OK\r
Server: gunicorn/19.9.0\r
Date: Tue, 09 Jan 2024 03:40:18 GMT\r
Connection: keep-alive\r
Content-Type: application/json\r
Content-Length: 423\r
Access-Control-Allow-Origin: *\r
Access-Control-Allow-Credentials: true\r
\r
{
  "args": {
    "user_key": "foo"
  }, 
  "data": "hitherebye", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Content-Type": "application/octet-stream", 
    "Host": "example.com", 
    "Transfer-Encoding": "chunked", 
    "User-Agent": "lua-resty-http/0.14 (Lua) ngx_lua/10019"
  }, 
  "json": null, 
  "origin": "172.18.0.4", 
  "url": "http://example.com/post?user_key=foo"
}
  • Send a chunked request with the Expect: 100-continue header.
cat <<EOF >chunked-request.py
import http.client
import time

def gen():
    yield bytes('hi', "utf-8")
    time.sleep(2)
    yield bytes('there', "utf-8")
    time.sleep(2)
    yield bytes('bye', "utf-8")

http.client.HTTPConnection.debuglevel = 1
conn = http.client.HTTPConnection('127.0.0.1', 8080)

headers = {'Content-type': 'application/octet-stream', 'Host': 'post.example.com', 'Expect': '100-continue'}

conn.request('POST', '/?user_key=foo', gen(), headers)

response = conn.getresponse()
print(response.read().decode())
EOF
  • Note that the upstream service received the Transfer-Encoding: chunked request and a 100 Continue reply was returned before the 200 OK.
▲ python3 ./chunked-request.py

send: b'POST /?user_key=foo HTTP/1.1\r\nAccept-Encoding: identity\r\nTransfer-Encoding: chunked\r\nContent-type: application/octet-stream\r\nHost: post.example.com\r\nExpect: 100-continue\r\n\r\n'
send: b'2\r\nhi\r\n'                          
send: b'5\r\nthere\r\n'                       
send: b'3\r\nbye\r\n'                         
send: b'0\r\n\r\n'                            
reply: 'HTTP/1.1 100 Continue\r\n'            
headers: [b'\r\n']                            
reply: 'HTTP/1.1 200 OK\r\n'                  
header: Access-Control-Allow-Credentials: true
header: Access-Control-Allow-Origin: *        
header: Date: Tue, 09 Jan 2024 03:47:02 GMT   
header: Content-Type: application/json        
header: Server: gunicorn/19.9.0
{                                                          
  "args": {                                                
    "user_key": "foo"                                      
  },                                                       
  "data": "hitherebye",                                    
  "files": {},                                             
  "form": {},                                              
  "headers": {                                             
    "Accept-Encoding": "identity",                         
    "Content-Type": "application/octet-stream",            
    "Expect": "100-continue",                              
    "Host": "example.com",                                 
    "Transfer-Encoding": "chunked",                        
    "User-Agent": "lua-resty-http/0.14 (Lua) ngx_lua/10019"
  },                                                       
  "json": null,                                            
  "origin": "172.18.0.4",                                  
  "url": "http://example.com/post?user_key=foo"            
}

@kevprice83 kevprice83 left a comment

We need to add unit tests and integration tests for all the scenarios that reproduce the reported issue:

  • http proxy env vars (over TLS)
  • http_proxy policy (over TLS)
  • camel_proxy policy (over TLS)

gateway/src/resty/http/chunked.lua
@tkan145 tkan145 force-pushed the THREESCALE-9542-chunked-request branch from e3f8888 to c1cae29 on June 26, 2023
t/http-proxy.t
}
--- backend env
server_name test-backend.lvh.me;
listen $TEST_NGINX_RANDOM_PORT ssl;
Member

I wonder why the 3scale backend is configured with a TLS connection.

t/http-proxy.t
@tkan145 tkan145 force-pushed the THREESCALE-9542-chunked-request branch from 04a9b0c to addb342 on October 10, 2023
@tkan145 tkan145 changed the title [THREESCALE-9542] Add support to proxy request with Transfer-Encoding: chunked [THREESCALE-9542] Part 2: Add support to proxy request with Transfer-Encoding: chunked Oct 10, 2023
@tkan145 tkan145 force-pushed the THREESCALE-9542-chunked-request branch from addb342 to 94a825f on November 21, 2023
@tkan145 tkan145 marked this pull request as ready for review November 22, 2023 08:22
@tkan145 tkan145 requested a review from a team as a code owner November 22, 2023 08:22
@tkan145 tkan145 requested a review from eguzki November 22, 2023 08:24
When a request with the HTTP "Transfer-Encoding: chunked" header is sent, APIcast
buffers the entire request because, by default, it does not support sending chunked
requests. However, when sending via a proxy, APIcast does not remove the header from
the initial request, which tells the server that the client is sending a chunked
request. This causes a Bad Request error, because the upstream will not be able to
determine where the chunked message ends.

This commit removes the "Transfer-Encoding: chunked" header from the request when
sending through a proxy.
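A minimal sketch of the commit's idea (assumed names, not the exact PR code): once the body has been fully buffered, the hop-by-hop chunked encoding no longer applies, so the header is dropped and a plain Content-Length is sent instead.

local function prepare_buffered_headers(headers, body)
  headers["Transfer-Encoding"] = nil            -- do not forward "chunked"
  headers["Content-Length"] = tostring(#body)   -- length is known once buffered
  return headers
end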
@tkan145 tkan145 force-pushed the THREESCALE-9542-chunked-request branch from 94a825f to b02a888 on November 27, 2023
@eguzki eguzki left a comment

Looking good.

However, this is complex and also hard to maintain. I want to try some other approach relying on lua-resty-http 0.17.1 or some other library. If we cannot find a simpler one (from the APIcast code-base perspective), we can always use this code.

gateway/src/apicast/http_proxy.lua
gateway/src/resty/http/response_writer.lua

eguzki commented Dec 14, 2023

When using the Python client, the response body is not shown. I wonder if APIcast is not handling the response correctly.


eguzki commented Dec 14, 2023

I am going to try https://github.com/ledgetech/lua-resty-http#set_proxy_options, which looks promising.


if http_methods_with_body[req_method] then
if opts.request_unbuffered and ngx_http_version() == 1.1 then
local _, err = handle_expect()
Member

I wonder if Expect needs to be handled when buffering is enabled.

Member

Actually, I do not think we should be doing this. The lua-resty-http lib is doing that for us. WDYT?

@tkan145 tkan145 Dec 14, 2023

lua-resty-http is a client library, and it handles the Expect returned from the server, while we are acting as a server here and need to process the Expect header from the client.

When I sent a large payload using cURL, the request hung; I later found out it was due to the Expect header.

I will run some more tests to see whether we really need it here

@eguzki eguzki Dec 18, 2023

Ok I think I understand now.

I think that when buffering is on, APIcast should protect upstream and should handle the Expect: 100-Continue. That is, it is APIcast that returns the HTTP response 100 Continue and then consumes the body before opening the connection to upstream. I think this is how it works right now in master. The request Expect: 100-Continue and the response 100 Continue happen twice: first between downstream and APIcast, and then between APIcast and upstream (done by the lua-resty-http lib, because the Expect header is still there). We might consider removing the Expect header in "buffered" mode, unless we want to keep the Expect protocol with upstream to avoid sending the body if upstream does not want it, which also makes sense to me. It is actually a requirement of rfc2616#section-8.2.3 to be like this; check the "Requirements for HTTP/1.1 proxies" section.

When unbuffered is on, APIcast does not read the body with ngx.req.read_body(), thus it does not send 100 Continue to downstream. I think that is the reason you saw the request hang. Ideally, we should let upstream decide whether it wants to continue or not, and propagate the response to downstream. Downstream would start sending the body only when upstream tells it to. I think that is quite hard to implement, basically because the lua-resty-http lib consumes the 100 Continue response of the upstream and then tries to send the body. I do not see a way to do this other than manually sending the 100 Continue response to downstream and creating a body reader that will be consumed by the lua-resty-http library. But I can see some issues there as well. What if upstream says 302 Redirect or 400 Bad Request instead of 100 Continue? The downstream client would have already written the body to the downstream socket, and that socket would be unusable for follow-up HTTP sessions. I do not know how to proceed regarding this.

Member

I have re-written the message above. In case you read it previously, please re-read it 🙏

Contributor (Author)

I'm a bit confused here. I haven't read the openresty code but do you mean ngx.req.read_body() will send 100 Continue downstream? Doesn't that also mean that APIcast returns 100 Continue to the downstream application before establishing the upstream connection?

Regarding the 400, please correct me if I'm wrong, but I think the only case where the upstream server returns this error is if there is data in the request body. In my head the flow will be as follows:

client -> Expect: 100-Continue -> upstream -> 100 Continue -> client
client -> start sending body -> upstream read body -> return 400

@eguzki eguzki Dec 19, 2023

I haven't read the openresty code but do you mean ngx.req.read_body() will send 100 Continue downstream?

Yes!

Doesn't that also mean that APIcast returns 100 Continue to the downstream application before establishing the upstream connection?

Exactly (when buffered mode is on)

the only case where the upstream server returns this error is if there is data in the request body

400 Bad Request is just an example. It could be a 5XX error as well. In unbuffered mode, the workflow would be as follows (in my head):

client -> Expect: 100-Continue -> apicast
client <- 100 Continue <- apicast
client -> write body to socket -> apicast 
# Apicast did not read the body yet, it just created a body reader from the socket
apicast -> create connection via proxy -> TLS upstream
apicast (lua resty http) -> Expect: 100-Continue -> TLS upstream
apicast (lua resty http) <- 100 Continue <- TLS upstream
apicast (lua resty http) -> send body from the body reader -> TLS upstream

So let's say that upstream does not want the client to start the upload:

client -> Expect: 100-Continue -> apicast
client <- 100 Continue <- apicast
client -> write body to socket -> apicast 
# Apicast did not read the body yet, it just created a body reader from the socket
apicast -> create connection via proxy -> TLS upstream
apicast (lua resty http) -> Expect: 100-Continue -> TLS upstream
apicast (lua resty http) <- 5XX Error <- TLS upstream
client <-  5XX Error <- apicast

My issue with this is that the client has sent the body and nobody has consumed it. I need to try this scenario to see what we can do.

Contributor (Author)

From this nginx thread, https://mailman.nginx.org/pipermail/nginx/2021-May/060643.html, I think nginx does not handle this well either.

How about we send back an error response, discard the body and close the socket?

Member

How about we send back an error response, discard the body and close the socket?

It's aggressive, but it can be a way out.


if is_chunked then
-- If the body is smaller than "client_body_buffer_size" the Content-Length header is
-- set based on the size of the buffer. However, when the body is rendered to a file,
Member

If the body is smaller than "client_body_buffer_size" the Content-Length header is set based on the size of the buffer

Who is doing that? In other words, when all of these conditions are met:

  • the request is chunked,
  • buffering is enabled
  • the request body is small

Who sets the Content-Length header?

Contributor (Author)

lua-resty-http will set the Content-Length based on the body that we pass in. But good catch, I should have put more details in the comment.

Member

Ok, I see. It's because the body is a string and resty-http gets the length out of it. It happens here. I would make it explicit, but this is good enough.

Member

Yeah, I agree; this is something that will come up again in the future when troubleshooting, but it doesn't need to be done in this PR. It can be added at a later date: if headers["Content-Length"] == nil then headers["Content-Length"] = #body (this will at least be a useful reference for now).
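A sketch of the suggested fallback (hypothetical, not in this PR); `headers` is the upstream request headers table and `body` the string body already read by APIcast:

if headers["Content-Length"] == nil and type(body) == "string" then
  headers["Content-Length"] = #body   -- make the implicit length explicit
end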


tkan145 commented Dec 14, 2023

I am going to try https://github.com/ledgetech/lua-resty-http#set_proxy_options, which looks promising.

I tried it in #1434 in /http/proxy.lua


eguzki commented Dec 20, 2023

I am ok going with these changes for now. First #1403 (comment) needs to be fixed.

However, in follow-up PRs, this (spaghetti) code needs some simplification. It is hard to maintain and understand. The forward_https_request method should implement the diagram below:

                                               ┌────────────────────────┐
                       YES                     │                        │                   NO
                                               │                        │
      ┌────────────────────────────────────────┤                        ├────────────────────────────────────────────┐
      │                                        │   Request Buffering?   │                                            │
      │                                        │                        │                                            │
      │                                        │                        │                                            │
      │                                        │                        │                                            │
      │                                        └────────────────────────┘                                            │
      │                                                                                                              │
      │                                                                          ┌───────────────────────────────────▼───────────────────────────────┐
      │                                                                          │                                                                   │
      │                                                                          │                                                                   │
      │                                                                          │                                                                   │
      ▼                                                                          │            Set up body reader from downstream socket              │
┌──────────────┐                                                                 │                                                                   │
│              │                                                                 │                                                                   │
│              │                                                                 │                                                                   │
│  Read body   │                                                                 │            Handle Expect: 100-Continue                            │
│              │                                                                 │                                                                   │
└────┬─────────┘                                                                 │            For Transfer-Encoding: chunked,                        │
     │                                                                           │                                                                   │
     │                                                                           │              set wrapper to encode (back) the request body        │
     │                                                                           │                                                                   │
     │                                                                           └──────────────────────────────────┬────────────────────────────────┘
┌────▼────────────────┐                                                                                             │
│                     │                                                                                             │
│ Set Content-Length  │                                                                                             │
│                     ├──────────────────────────────────┐                                                          │
│ Remove TE header    │                                  │                                                          │
│                     │                                  │                                                          │
└─────────────────────┘                                  │                                                          │
                                                         │                                                          │
                                                         ◄──────────────────────────────────────────────────────────┘
                                                         │
                                                         ▼
                                           ┌─────────────▼──────────────┐
                                           │                            │
                                           │   Connect  TLS Upstream    │
                                           │                            │
                                           │   via proxy                │
                                           │                            │
                                           │                            │
                                           └──────────────┬─────────────┘
                                                          │
                                                          │
                                                          │
                                           ┌──────────────▼──────────────┐
                                           │                             │
                                           │   Send Request              │
                                           │                             │
                                           │                             │
                                           └────────────────┬────────────┘
                                                            │
                                                            │
                                                            │
                                                            ▼
                                           ┌──────────────────────────────┐
                                           │                              │
                                           │     Write Downstream response│
                                           │                              │
                                           └──────────────────────────────┘

With a regular socket, OpenResty will process the Expect header on socket
read, so we only need to send back "100 Continue" when using a raw socket.
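A minimal sketch of the idea in the commit message above (assumed names, not the exact PR code): when APIcast reads the body from the raw downstream socket, nothing answers the Expect header for us, so the 100 Continue status line is sent back manually.

local function handle_expect()
  local expect = ngx.var.http_expect
  if expect and expect:lower() == "100-continue" then
    local sock, err = ngx.req.socket(true)  -- raw downstream socket
    if not sock then return nil, err end
    local bytes, send_err = sock:send("HTTP/1.1 100 Continue\r\n\r\n")
    if not bytes then return nil, send_err end
  end
  return true
end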

tkan145 commented Jan 9, 2024

Fixed #1403 (comment)

Also added 2 commits on top to handle Expect: 100-continue and Content-Type: application/x-www-form-urlencoded

@tkan145 tkan145 requested a review from eguzki January 10, 2024 00:33
@eguzki eguzki left a comment

This was a hard one, wasn't it!

Great job 🎖️

@kevprice83 kevprice83 left a comment

I think I might need a summary of the expected behaviour with the different combinations here to understand how this works. I am not sure I get this, to be honest, which is probably expected, but then I would expect the README to describe it in more detail so customers and Support can understand it better.

If the tests are indeed correct then only an update to the README is needed.

One note, though: why are there no unit tests added or modified?

Final comment: I think we are introducing I/O-blocking operations via the file_size() function, and this is going to be executed on every request that meets the conditions. Executing things like os.execute, io.open, etc. is generally safe in the init and init_worker phases because it's a one-time execution, but in this scenario we are doing it on every request. Have we considered this already? How much is it harming performance as a result?



=== TEST 15: https_proxy with request_unbuffered policy, only upstream and proxy_pass will buffer
the request
Member

What do we mean by "only upstream & proxy_pass will buffer the request"? Are we saying that the initial request to the camel proxy will be unbuffered? Why would that be, given that this is over TLS, so the tunnel should be established directly between APIcast & upstream? Is there another part of the code you are referring to that would be unbuffered?

Contributor (Author)

It works like this:

Without the request_unbuffered policy (default behavior):

HTTPS request ---> APIcast ---> [reach proxy code] ---> [call ngx.req.get_body_data()] ---> [ request is buffered to a file] ---> [construct a new request and a body reader from buffered file] ---> [ use lua-resty-http to perform handshake and send new request to camel server]  ---> Camel

With the request_unbuffered policy:

HTTPS request ---> APIcast  ---> [reach proxy code] ---> [construct a new request and set up body reader from downstream socket (without reading the body first)] ---> [ use lua-resty-http to perform handshake and send new request to camel server]  ---> Camel

Upstream here is the upstream block in the test. It was a bit tricky to set up a test for this, so I relied on the "a client request body is buffered to a temporary file" message in the log.

  • proxy_pass will buffer the request body by default
  • echo_read_request_body will also buffer the request. echo_request_body is used to echo the request body so we can check whether the request was sent to the upstream server

But I will update the README file with more details.
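As a rough illustration of the buffered path described above (assumed names, not the PR's actual code), the body reader over the temporary file could look like this; lua-resty-http consumes such an iterator as the request body:

local function file_body_reader(path, chunk_size)
  local file, err = io.open(path, "rb")
  if not file then return nil, err end
  return function()
    local chunk = file:read(chunk_size or 8192)
    if not chunk then file:close() end   -- EOF: release the descriptor
    return chunk
  end
end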

-- set by openresty based on the size of the buffer. However, when the body is rendered
-- to a file, we will need to calculate and manually set the Content-Length header based
-- on the file size
local contentLength, err = file_size(temp_file_path)
Member

Is this safe to do for ALL requests that meet these conditions? I see that the calls in the file_size function are I/O-blocking calls, so I am wondering how harmful to performance they could be, given that they are not executed within a coroutine. If a coroutine cannot be used, then we should consider using the lua-io-nginx module, for example.

Contributor (Author)

Is it enough to wrap that functionality in a coroutine? I don't know how useful that would be, since it would yield on the first call anyway. Also, the file_reader calls io.open on every request that has its body buffered to a file, so I guess we pay the price of calling io.open one more time?

But I totally agree with you that it is an I/O-blocking function and should be avoided.

Checking the lua-io-nginx module, I can see that it is currently considered experimental, and it seems to run the task on another thread. However, I'm not so sure about it, because we have to pay for context switching, threads, locking, etc.

It's worth to mention that the cost time of a single I/O operation won't be reduced, it was just
transferred from the main thread (the one executes the event loop) to another exclusive thread.
Indeed, the overhead might be a little higher, because of the extra tasks transferring, lock waiting,
Lua coroutine resumption (and can only be resumed in the next event loop) and so forth. Nevertheless,
after the offloading, the main thread doesn't block due to the I/O operation, and this is the fundamental
advantage compared with the native Lua I/O library.

Member

Not sure how expensive this is:

function fsize(filename)
  local handle, err = io.open(filename)
  if not handle then return nil, err end
  local current = handle:seek()      -- get current position
  local size = handle:seek("end")    -- get file size
  handle:seek("set", current)        -- restore position
  handle:close()                     -- do not leak the descriptor
  return size
end

Theoretically, any I/O operation could block the thread. We could try coroutines or any other means to make it non-blocking. The lua-nginx-module introduction says:

Disk operations with relatively small amount of data can be done using the standard Lua io library but huge file reading and writing should be avoided wherever possible as they may block the Nginx process significantly. Delegating all network and disk I/O operations to Nginx's subrequests (via the [ngx.location.capture](https://github.com/openresty/lua-nginx-module#ngxlocationcapture) method and similar) is strongly recommended for maximum performance.

Not sure if we can follow that recommendation, though. @tkan145, we can try to discuss this in a short call.

Anyway, AFAIK, we have never measured the capacity of APIcast to handle traffic with request bodies big enough to be persisted to disk. All the tests performed were "simple" GET requests.

Contributor (Author)

Fixed, and also updated the README file. @kevprice83, can you help review the README file and let me know if I need to add anything else?


eguzki commented Jan 17, 2024

I think I might need a summary of the expected behaviour with the different combinations here to understand how this works. I am not sure I get this, to be honest, which is probably expected, but then I would expect the README to describe it in more detail so customers and Support can understand it better.

No need to say a single word more about this. If you do not understand it, nobody else will. We definitely need to update the request_unbuffered README with some docs highlighting the different scenarios.

The big value, IMO, of the new request_unbuffered policy is that it provides consistent behavior across multiple scenarios:

- APIcast <> upstream HTTP 1.1 plain
- APIcast <> upstream TLS
- APIcast <> HTTP Proxy (env var) <> upstream TLS
- APIcast <> HTTP Proxy (policy) <> upstream TLS
- APIcast <> HTTP Proxy (camel proxy) <> upstream TLS
- APIcast <> HTTP Proxy (env var) <> upstream HTTP 1.1 plain
- APIcast <> HTTP Proxy (policy) <> upstream HTTP 1.1 plain
- APIcast <> HTTP Proxy (camel proxy) <> upstream HTTP 1.1 plain

The behavior is the one described by the nginx doc when proxy_request_buffering is off:

When buffering is disabled, the request body is sent to the proxied server immediately as it is received. 

"Proxied server" here is either the configured proxy or the upstream (which can in turn also be a proxy).

Whereas when this policy is not in the chain, the behavior (for all the scenarios above) is the one described for proxy_request_buffering on. I quote here:

When buffering is enabled, the entire request body is [read](http://nginx.org/en/docs/http/ngx_http_core_module.html#client_body_buffer_size) from the client before sending the request to a proxied server.

There are caveats and corner cases, like when the request is application/x-www-form-urlencoded, in which case APIcast will always read the entire body and thus buffer the request, regardless of the request buffering policy.

Furthermore, the behavior is also consistent regardless of the transfer encoding used. Whether the request has a known body length (the "Content-Length" header) or is chunked, the request buffering (or not) semantics will apply.

`request_unbuffered` policy is enabled or not.
- For a request with "small" body that fits into [`client_body_buffer_size`](https://nginx.org/en/docs/http/ngx_http_core_module.html#client_body_buffer_size) and with header "Transfer-Encoding: chunked", NGINX will always read and know the length of the body.
Member

This is nice to note, but it is not a caveat (a limitation); it is expected and correct. The whole point of an unbuffered request is, as you correctly pointed out, that the request body is sent to the proxied server immediately as it is received. The transfer encoding is a hop-by-hop encoding, and nothing prevents it from changing from one hop to the next.

@tkan145 tkan145 force-pushed the THREESCALE-9542-chunked-request branch from 6fd6470 to f19b9e5 on January 19, 2024
For example, when the client sends 10GB, NGINX will buffer the entire 10GB to disk before sending anything to
the upstream server.

When `proxy_request_buffering` is in the chain, request buffering will be disabled and the request body will be sent to the proxied server immediately as it received. This can help minimize time spent sending data to a service and disk I/O for requests with big body. However, there are caveats and corner cases applied, [**Caveats**](#caveats)
Member

proxy_request_buffering is not the name of the policy, is it?

Contributor (Author)

oops

@eguzki eguzki requested a review from a team January 19, 2024 15:27

eguzki commented Jan 19, 2024

I kindly requested a review from the @3scale/documentation team.


## Technical details

By default, NGINX reads the entire request body into memory (or buffers large requests into disk) before proxying it to the upstream server. However, reading bodies can become expensive, especially when requests with large payloads are sent.
@dfennessy dfennessy Jan 19, 2024

Suggested change
By default, NGINX reads the entire request body into memory (or buffers large requests into disk) before proxying it to the upstream server. However, reading bodies can become expensive, especially when requests with large payloads are sent.
By default, NGINX reads the entire request body into memory or buffers large requests to disk before forwarding them to the upstream server. Reading bodies can become expensive, especially when sending requests containing large payloads.

For example, when the client sends 10GB, NGINX will buffer the entire 10GB to disk before sending anything to
the upstream server.

When `proxy_request_buffering` is in the chain, request buffering will be disabled and the request body will be sent to the proxied server immediately as it received. This can help minimize time spent sending data to a service and disk I/O for requests with big body. However, there are caveats and corner cases applied, [**Caveats**](#caveats)
@dfennessy dfennessy Jan 19, 2024

Suggested change
When `proxy_request_buffering` is in the chain, request buffering will be disabled and the request body will be sent to the proxied server immediately as it received. This can help minimize time spent sending data to a service and disk I/O for requests with big body. However, there are caveats and corner cases applied, [**Caveats**](#caveats)
When the `proxy_request_buffering` is in the chain, request buffering is disabled, sending the request body to the proxied server immediately upon receiving it. This can help minimize time spent sending data to a service and disk I/O for requests with big body. However, there are caveats and corner cases applied, [**Caveats**](#caveats)

The response buffering is enabled by default in NGINX (the [`proxy_buffering: on`]() directive). It does
this to shield the backend against slow clients ([slowloris attack](https://en.wikipedia.org/wiki/Slowloris_(computer_security))).

If the `proxy_buffering` is disabled, the upstream server will be forced to keep the connection open until all data has been received by the client. Thereforce, NGINX [advises](https://www.nginx.com/blog/avoiding-top-10-nginx-configuration-mistakes/#proxy_buffering-off) against disabling `proxy_buffering` as it will potentially waste upstream server resources.
Contributor

Suggested change
If the `proxy_buffering` is disabled, the upstream server will be forced to keep the connection open until all data has been received by the client. Thereforce, NGINX [advises](https://www.nginx.com/blog/avoiding-top-10-nginx-configuration-mistakes/#proxy_buffering-off) against disabling `proxy_buffering` as it will potentially waste upstream server resources.
If the `proxy_buffering` is disabled, the upstream server keeps the connection open until all data is received by the client. NGINX [advises](https://www.nginx.com/blog/avoiding-top-10-nginx-configuration-mistakes/#proxy_buffering-off) against disabling `proxy_buffering` as it will potentially waste upstream server resources.


## Caveats

- Because APIcast allows defining mapping rules based on request content, ie `POST /some_path?a_param={a_value}`
Contributor

Suggested change
- Because APIcast allows defining mapping rules based on request content, ie `POST /some_path?a_param={a_value}`
- APIcast allows defining of mapping rules based on request content. For example, `POST /some_path?a_param={a_value}`

@dfennessy dfennessy left a comment

I've added a few suggestions.

@tkan145 tkan145 force-pushed the THREESCALE-9542-chunked-request branch from c982788 to b494ca8 on January 22, 2024

eguzki commented Jan 22, 2024

After running the verification steps again: the first request with a small body (chunked TE) reaches upstream still with the TE chunked, and I think this is expected. For this use case (TE chunked and TLS upstream through a proxy), APIcast generates a socket reader that decodes the chunked encoding and re-encodes it back to be forwarded, so the request reaching upstream has to be TE chunked.

❯ curl --resolve post.example.com:8080:127.0.0.1 -v -H "Transfer-Encoding: chunked"   -H "Content-Type: application/json"  -d @my-data.json "http://post.example.com:8080/?user_key=123"
Warning: Couldn't read data from file "my-data.json", this makes an empty 
Warning: POST.
* Added post.example.com:8080:127.0.0.1 to DNS cache
* Hostname post.example.com was found in DNS cache
*   Trying 127.0.0.1:8080...
* Connected to post.example.com (127.0.0.1) port 8080 (#0)
> POST /?user_key=123 HTTP/1.1
> Host: post.example.com:8080
> User-Agent: curl/7.81.0
> Accept: */*
> Transfer-Encoding: chunked
> Content-Type: application/json
> 
* upload completely sent off: 5 out of 0 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Mon, 22 Jan 2024 09:14:35 GMT
< Content-Type: application/json
< Server: gunicorn/19.9.0
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
* no chunk, no close, no size. Assume close to signal end
< 
{
  "args": {
    "user_key": "123"
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Content-Type": "application/json", 
    "Host": "example.com", 
    "Transfer-Encoding": "chunked", 
    "User-Agent": "curl/7.81.0"
  }, 
  "json": null, 
  "origin": "172.18.0.4", 
  "url": "http://example.com/post?user_key=123"
}
* Closing connection 0

You might want to update the verification steps.


## Why does upstream receive a "Content-Length" header when the original request is sent with "Transfer-Encoding: chunked"

For a request with "small" body that fits into [`client_body_buffer_size`](https://nginx.org/en/docs/http/ngx_http_core_module.html#client_body_buffer_size) and with header "Transfer-Encoding: chunked", NGINX will always read and know the length of the body. Then it will send the request to upstream with the "Content-Length" header.
Member

This is not true for the use case this PR takes care of; it only holds when nginx handles the proxying task.

Actually, as a future enhancement, we could do the same: read one buffer from the socket of (configurable) size S. If the full body has been read, send it upstream with Content-Length. If the full body has not been read, proxy the request with TE: chunked.
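A sketch of that proposed enhancement (not implemented in this PR; all names are assumptions). Probe one buffer of configurable size S from the body reader: if that exhausts the body, forward it with Content-Length; otherwise keep Transfer-Encoding: chunked and stream the rest:

local function choose_forwarding_mode(body_reader, s)
  local first = body_reader(s)
  if not first then
    return { content_length = 0, body = "" }          -- empty body
  end
  local rest = body_reader(s)
  if not rest then
    return { content_length = #first, body = first }  -- full body read
  end
  -- more data pending: keep the chunked encoding and stream it
  return { chunked = true, prefix = { first, rest }, reader = body_reader }
end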

Contributor (Author)

Updated the verification steps. I will address this in a future PR.

@eguzki eguzki left a comment

For me this is mergeable.

I left a couple of comments that are just nitpicks.

@kevprice83

I think I might need a summary of the expected behaviour with the different combinations here [...]

No need to say a single word more about this. If you do not understand it, nobody else will. We definitely need to update the request_unbuffered README with some docs highlighting the different scenarios. [...] The big value, IMO, of the new request_unbuffered policy is that it provides consistent behavior across multiple scenarios [...]

This is exactly what we need. Unfortunately, the way the integration tests are written makes that hard to understand; not sure if they can be worded or written in another way, but honestly the explanation above should be enough for most situations, I think. It's perfect.

@kevprice83 kevprice83 left a comment

This looks really good to me now, especially with the new README. I'd still like to see the unit tests in there, but it's not urgent.

Another question I had to ensure my understanding:

So when an HTTPS proxy is used, with TE: chunked and a body size greater than the body_buffer_size, APIcast will buffer, decode, then re-encode the request, so that upstream will still receive the original TE: chunked request format, right?


eguzki commented Jan 22, 2024

So when an HTTPS proxy is used, with TE: chunked and a body size greater than the body_buffer_size, APIcast will buffer, decode, then re-encode the request, so that upstream will still receive the original TE: chunked request format, right?

No need to twist it too much. It is much simpler. When an HTTPS proxy is used and TE is chunked:

  • With the request_unbuffered policy in place (i.e. proxy_request_buffering off): the connection with upstream will be created as soon as the request headers are received. Request headers and transfer encoding will be propagated as-is, regardless of the body size. We might change this propagation in the future, as there is no obligation to keep the encoding. Same as nginx does, we might implement that, for small request bodies, TE chunked is replaced by Content-Length and the body is decoded. For big bodies, or slow connections (request bodies being slow to reach APIcast), APIcast will keep the encoding, as it is obliged to open the connection and propagate headers as soon as the downstream request is received (as proxy_request_buffering off dictates).

NOTE: Internally (customers do not need to know this), for TE chunked, APIcast will decode the encoding and re-encode it again to keep the same encoding. This is done in streaming mode, chunk by chunk, as they are read from the socket. It is an implementation detail and could be improved in the future. We implemented this decode <-> re-encode because the socket reading library decodes it and we cannot disable that.

  • Without the request_unbuffered policy (i.e. proxy_request_buffering on): the request body will be read (buffered) by APIcast. The TE header will be replaced by the "Content-Length" header for the upstream connection. Only when the entire body has been read will APIcast initiate the connection with upstream, sending the request with the Content-Length header and without the chunked encoding.

NOTE: Internally, when the body is bigger than the client_body_buffer_size, APIcast will write the body to a file after removing the chunked encoding, and the upstream HTTP client will receive a body reader which is an iterator reading from the file.
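A rough illustration (assumed names, not the PR's actual reader) of the decode <-> re-encode streaming described in the first NOTE above: wrap a reader that yields decoded chunks and re-apply the chunked framing before sending upstream.

local format = string.format

local function chunked_encoder(reader)
  local eof = false
  return function(...)
    if eof then return nil end
    local chunk, err = reader(...)
    if err then return nil, err end
    if chunk then
      return format("%x\r\n%s\r\n", #chunk, chunk)  -- re-encode one chunk
    end
    eof = true
    return "0\r\n\r\n"                              -- last-chunk marker
  end
end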


eguzki commented Jan 22, 2024

@dfennessy your approval is required as you requested changes.

@dfennessy dfennessy left a comment

LGTM!

@tkan145 tkan145 merged commit c38418c into 3scale:master Jan 22, 2024
12 checks passed