
keepalive issue when mixing http2 (grpc) and http1.1 which are routing to the same upstream ip:port #10757

Closed
LeszekBlazewski opened this issue Apr 26, 2023 · 3 comments
Labels
bug core/proxy stale/pending revisit Too old ticket. Closed, but we may revisit later.

Comments

@LeszekBlazewski

Is there an existing issue for this?

  • I have searched the existing issues

Kong version ($ kong version)

3.1.1

Current Behavior

When Kong proxies http2 (gRPC) and http1.1 traffic to a service that provides both gRPC and http1.1 functionality on a single port, the underlying keepalive connections in the pool are reused without taking into account the scheme of the traffic that opened the connection.

This results in intermittent errors whenever the schemes get mixed.
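To make the failure mode concrete, here is a minimal, purely illustrative model (not Kong's actual code) of a keepalive pool keyed only by `ip:port`, versus one whose key also includes the scheme. The class and field names are my own invention for the sketch:

```python
# Illustrative model of an upstream keepalive pool. When the pool key
# ignores the scheme, an idle HTTP/1.1 connection can be handed to a
# gRPC (HTTP/2) request against the same ip:port -- the mismatch this
# issue reports.

class KeepalivePool:
    def __init__(self, key_includes_scheme):
        self.key_includes_scheme = key_includes_scheme
        self.idle = {}  # pool key -> list of idle connections

    def _key(self, scheme, ip, port):
        if self.key_includes_scheme:
            return (scheme, ip, port)
        return (ip, port)  # scheme ignored: what the issue describes

    def acquire(self, scheme, ip, port):
        conns = self.idle.get(self._key(scheme, ip, port))
        if conns:
            return conns.pop()      # reuse an idle keepalive connection
        return {"scheme": scheme}   # otherwise open a fresh connection

    def release(self, conn, ip, port):
        key = self._key(conn["scheme"], ip, port)
        self.idle.setdefault(key, []).append(conn)

# Pool keyed by ip:port only: a gRPC call picks up an http1.1 connection.
pool = KeepalivePool(key_includes_scheme=False)
c = pool.acquire("http", "10.3.100.118", 8080)
pool.release(c, "10.3.100.118", 8080)
reused = pool.acquire("grpc", "10.3.100.118", 8080)
assert reused["scheme"] == "http"  # protocol mismatch -> errors like the 502 below

# Scheme-aware key: the gRPC call gets its own connection instead.
pool2 = KeepalivePool(key_includes_scheme=True)
c2 = pool2.acquire("http", "10.3.100.118", 8080)
pool2.release(c2, "10.3.100.118", 8080)
fresh = pool2.acquire("grpc", "10.3.100.118", 8080)
assert fresh["scheme"] == "grpc"
```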

In kong proxy logs I am seeing the following errors:

2023/04/26 13:49:11 [error] 1131#0: *26615 no connection data found for keepalive http2 connection while sending request to upstream, client: X.X.X.X, server: kong, request: "POST /cluster.SettingsService/Get HTTP/2.0", upstream: "grpc://10.3.100.118:8080", host: "my.private.host.com:443"

and the client which issued the grpc call sees:

FATA[0003] rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); malformed header: missing HTTP content-type

I have checked this by redeploying the Kong proxy with upstream_keepalive_pool_size=0, which disables keepalive on all connections from the Kong proxy to upstreams (other k8s pods). I can confirm that this fully fixes the issue: each request opens a new connection, and there are neither error logs in Kong nor 502s on the client side. Of course this solution is not acceptable to me, because I would like to keep using keepalive.
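For reference, the workaround above corresponds to the following setting, either as a kong.conf line or in its environment-variable form (e.g. for the Docker image or a KIC deployment):

```
# kong.conf
upstream_keepalive_pool_size = 0

# or as an environment variable
KONG_UPSTREAM_KEEPALIVE_POOL_SIZE=0
```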

The issue has been described in detail here:

and a sample fix for it has been implemented here: apache/apisix#8364.

As in the above issues, nginx itself also does not properly handle a single port serving two traffic schemes, but since Kong overrides the load-balancing algorithm (if I am not mistaken, some of the relevant parts are here: https://github.com/Kong/kong/blob/master/kong/init.lua#L1144), I was wondering whether it could be patched in Kong.

I am not familiar with the Kong codebase, so I am not aware of any implications of making the connection pool take the traffic scheme into account, but doing so should make the issue go away, since protocols would no longer be mixed.

Let me know if you need more details.

Expected Behavior

When proxying http2 (gRPC) and http1.1 traffic to an upstream that handles both schemes on a single port, a keepalive connection is only reused when the scheme that opened it matches the scheme of the incoming request. (I am not fully sure the sentence is accurate, but I hope it explains what is expected.)

Steps To Reproduce

  1. Deploy a service that handles both gRPC and http1.1 traffic on a single port.
  2. Deploy Kong.
  3. Configure 2 upstreams in Kong (they can be different hosts), where one upstream handles the HTTP traffic and the other handles gRPC, both pointing at the same backend ip:port.
  4. Start issuing both HTTP and gRPC requests to Kong so they hit the backend service, and observe the Kong logs.
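Step 3 above could be sketched as declarative Kong configuration along these lines (service names, host, and paths are placeholders I made up; both services point at the same backend ip:port):

```yaml
_format_version: "3.0"
services:
  - name: demo-http          # plain http1.1 traffic
    protocol: http
    host: demo.default.svc
    port: 8080
    routes:
      - name: demo-http-route
        paths: ["/api"]
  - name: demo-grpc          # gRPC traffic to the same host:port
    protocol: grpc
    host: demo.default.svc
    port: 8080
    routes:
      - name: demo-grpc-route
        protocols: ["grpc", "grpcs"]
        paths: ["/cluster.SettingsService"]
```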

Anything else?

In my case I would like to have this fixed for KIC (Kong Ingress Controller), since that is how I currently deploy Kong.

Disabling keepalive for all traffic between Kong and other pods (it is just this one use case of mine that does not work) would be a big performance hit, so I am not pursuing that solution.

The workaround also does not apply to the Kong Ingress Controller, because it does not allow injecting such a setting for a specific upstream.

Right now I am avoiding this issue by not proxying the gRPC calls via Kong and only sending http1.1 traffic there. The gRPC scheme is handled by a fully separate load balancer, which is a bit of a shame, since Kong is capable of handling gRPC just fine (and it works great)!

@bungle
Member

bungle commented Apr 27, 2023

I think the fix should be implemented here like you, @LeszekBlazewski, said:
https://github.com/Kong/kong/blob/master/kong/init.lua#L1135-L1156

@LeszekBlazewski
Author

LeszekBlazewski commented Apr 27, 2023

Would a fix for this issue be considered then? If yes, is there anyone who could do that, or is the issue opener (me) expected to patch it?

I am asking because I am not that familiar with Lua or the whole Kong ecosystem, so I am not sure I can provide a proper fix along with the tests needed to cover the current faulty behaviour. I could try, but I am not fully sure I will be able to handle this task.

@StarlightIbuki
Contributor

Dear contributor,
We're closing this issue as there hasn't been any update to it for a long time. If the issue is still relevant in the latest version, please feel free to reopen it. We're more than happy to revisit it again. Your contribution is greatly appreciated!
Please have a look at our pledge to the community for more information.
Sincerely,
Kong Gateway Team

@StarlightIbuki StarlightIbuki added the stale/pending revisit Too old ticket. Closed, but we may revisit later. label Oct 11, 2023