Kafka L7 visibility policy causes connection errors due to parser errors #21813

Open
2 tasks done
chancez opened this issue Oct 19, 2022 · 6 comments
Labels
help-wanted Please volunteer for this by adding yourself as an assignee! kind/bug This is a bug in the Cilium logic. needs/triage This issue requires triaging to establish severity and next steps. pinned These issues are not marked stale by our issue bot. sig/agent Cilium agent related. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.

Comments

@chancez
Contributor

chancez commented Oct 19, 2022

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When using Kafka L7 visibility policies, my Kafka consumers get disconnected roughly every 30-60 seconds. I enabled debug logging and verbose Envoy logs and found the error behind the disconnect. It seems the Kafka parser hits a parse error and closes the connection as a result.

Cilium Version

I'm using a recent master build, but this is present in all versions I've tried.

Kernel Version

Linux lima-docker 5.15.0-27-generic #28-Ubuntu SMP Thu Apr 14 12:56:31 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

Kubernetes Version

Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.4", GitCommit:"95ee5ab382d64cfe6c28967f36b53970b8374491", GitTreeState:"clean", BuildDate:"2022-09-01T23:50:12Z", GoVersion:"go1.18.5", Compiler:"gc", Platform:"linux/arm64"}

Sysdump

cilium-sysdump-20221019-110146.zip

Relevant log output

level=debug msg="time=\"2022-10-19T17:59:56Z\" level=debug msg=\"Ignoring Kafka message apiKey=8,apiVersion=5,len=156: null due to parse error\" error=\"invalid array length\"" subsys=envoy-filter threadID=638
level=debug msg="time=\"2022-10-19T17:59:56Z\" level=warning msg=\"Unable to parse Kafka request; closing Kafka connection\" error=\"invalid array length\"" subsys=envoy-filter threadID=638
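
For anyone triaging similar logs, here is a minimal Python sketch that pulls the request fields out of lines like the ones above and names the API involved. The helper name `summarize` and the `API_NAMES` table are illustrative assumptions (not part of Cilium); the table is a small subset of the API keys listed in the public Kafka protocol guide.

```python
import re

# Subset of Kafka API keys from the public Kafka protocol guide.
API_NAMES = {
    0: "Produce", 1: "Fetch", 2: "ListOffsets", 3: "Metadata",
    8: "OffsetCommit", 9: "OffsetFetch", 10: "FindCoordinator",
    11: "JoinGroup", 12: "Heartbeat",
}

# Matches the "apiKey=...,apiVersion=...,len=..." fragment seen in the log.
PATTERN = re.compile(r"apiKey=(\d+),apiVersion=(\d+),len=(\d+)")


def summarize(line):
    """Return the request fields found in one log line, or None if absent."""
    match = PATTERN.search(line)
    if match is None:
        return None
    api_key, api_version, length = (int(g) for g in match.groups())
    return {
        "api": API_NAMES.get(api_key, "unknown({})".format(api_key)),
        "api_key": api_key,
        "api_version": api_version,
        "len": length,
    }


# First log line from this report, copied verbatim.
LINE = r'level=debug msg="time=\"2022-10-19T17:59:56Z\" level=debug msg=\"Ignoring Kafka message apiKey=8,apiVersion=5,len=156: null due to parse error\" error=\"invalid array length\"" subsys=envoy-filter threadID=638'

print(summarize(LINE))
```

For the line above this identifies the rejected message as an OffsetCommit request at API version 5.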

Anything else?

I have a demo app to reproduce but it's private. PM me if you need a way to reproduce the error.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@chancez chancez added kind/bug This is a bug in the Cilium logic. needs/triage This issue requires triaging to establish severity and next steps. labels Oct 19, 2022
@aanm aanm closed this as completed Oct 19, 2022
@aanm aanm reopened this Oct 19, 2022
@aanm aanm added help-wanted Please volunteer for this by adding yourself as an assignee! sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. sig/agent Cilium agent related. labels Nov 8, 2022
@github-actions

github-actions bot commented Jan 8, 2023

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jan 8, 2023
@chancez chancez added pinned These issues are not marked stale by our issue bot. and removed stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. labels Jan 9, 2023
@daviddyball

Can confirm this behaviour on my own Cilium 1.12.7 deployment.

@caiobegotti

caiobegotti commented May 19, 2023

Got here through #25489 as I'm seeing a similar issue with Kafka L7 policies with Cilium 1.13.1.

@daviddyball

Is this an issue with Envoy or with Cilium itself? Do we have to wait for upstream Envoy to resolve an issue there, or is it Cilium's use of Envoy?

@chancez
Contributor Author

chancez commented Jun 26, 2023

@daviddyball The Kafka protocol handling is a custom-implemented Envoy filter, so it's something we need to resolve on our side. I believe this is the current implementation of the filter: https://github.com/cilium/proxy/tree/main/proxylib/kafka
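
As background on what such a parser has to handle, here is a rough Python sketch of decoding the start of a Kafka request. It is not taken from cilium/proxy. It assumes the non-flexible request header from the Kafka protocol guide (a 4-byte big-endian size, then `api_key` INT16, `api_version` INT16, `correlation_id` INT32, and a nullable `client_id` string). The function names, the correlation id, the client id, and the dummy body are all made up for illustration; only `api_key=8` and `api_version=5` come from the log in this issue.

```python
import struct


def encode_request(api_key, api_version, correlation_id, client_id, body):
    """Build a size-prefixed Kafka request with a non-flexible header."""
    cid = client_id.encode("utf-8")
    header = struct.pack(">hhih", api_key, api_version, correlation_id, len(cid)) + cid
    payload = header + body
    return struct.pack(">i", len(payload)) + payload


def decode_request_header(frame):
    """Decode the size prefix and header; return (fields, body_offset)."""
    (size,) = struct.unpack_from(">i", frame, 0)
    if size != len(frame) - 4:
        raise ValueError("size prefix does not match frame length")
    api_key, api_version, correlation_id, cid_len = struct.unpack_from(">hhih", frame, 4)
    offset = 4 + 10  # size prefix plus the fixed-width header fields
    client_id = None  # a length of -1 encodes a null client_id
    if cid_len >= 0:
        client_id = frame[offset:offset + cid_len].decode("utf-8")
        offset += cid_len
    fields = {
        "size": size,
        "api_key": api_key,
        "api_version": api_version,
        "correlation_id": correlation_id,
        "client_id": client_id,
    }
    return fields, offset


# Hypothetical request: only api_key and api_version mirror the log above.
frame = encode_request(8, 5, 42, "consumer-1", b"\x00" * 16)
fields, body_offset = decode_request_header(frame)
print(fields, body_offset)
```

Everything after `body_offset` depends on the `(api_key, api_version)` pair, which is why a parser needs a schema for each version it wants to accept.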

@alexpirtea-flowx

Got the same issue on version 1.14.4.
It turns out the culprit is the Kafka version.
Example:
docker.io/wurstmeister/kafka:1.1.0 - works
docker.io/bitnami/kafka:3.4.1-debian-11-r0 - reproduces issue
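
One way a newer Kafka version can produce "invalid array length" is a schema mismatch: the client sends a newer request version, and the parser reads it with an older field layout. The Python sketch below illustrates this with OffsetCommit, the API from the log above. In the Kafka message schema, versions 2-4 carry a `retention_time_ms` INT64 before the topics array and version 5 drops it. This is only an illustration under those assumptions. Nothing in this thread establishes that this field is what trips the Cilium filter, and the function names and the sanity bound on the array length are hypothetical, not Cilium's code.

```python
import struct


def kstr(text):
    """Encode a Kafka STRING: int16 length followed by UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack(">h", len(raw)) + raw


def encode_offset_commit_v5(group, generation, member, topic, partition, offset):
    """Encode an OffsetCommit v5 body with one topic and one partition."""
    body = kstr(group) + struct.pack(">i", generation) + kstr(member)
    body += struct.pack(">i", 1) + kstr(topic)      # topics array, 1 element
    body += struct.pack(">i", 1)                    # partitions array, 1 element
    body += struct.pack(">iq", partition, offset)   # partition index, offset
    body += struct.pack(">h", -1)                   # null committed_metadata
    return body


def read_topics_count(body, has_retention_time):
    """Return the topics array length, assuming the given field layout."""
    pos = 0
    (n,) = struct.unpack_from(">h", body, pos)      # group_id
    pos += 2 + n
    pos += 4                                        # generation_id
    (n,) = struct.unpack_from(">h", body, pos)      # member_id
    pos += 2 + n
    if has_retention_time:
        pos += 8                                    # retention_time_ms (v2-v4 only)
    (count,) = struct.unpack_from(">i", body, pos)
    pos += 4
    # Hypothetical sanity bound: an array cannot have more elements than bytes left.
    if count < -1 or count > len(body) - pos:
        raise ValueError("invalid array length")
    return count


body = encode_offset_commit_v5("g1", 7, "m1", "events", 0, 1234)
print(read_topics_count(body, has_retention_time=False))  # v5 layout
try:
    read_topics_count(body, has_retention_time=True)      # v2-v4 layout on v5 bytes
except ValueError as err:
    print("parse error:", err)
```

With the wrong layout, the eight bytes skipped for `retention_time_ms` swallow the real array length and part of the topic name, so the next four bytes (topic-name characters) are read as an implausibly large length.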
