Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CFP: HTTP request and response size in HTTP L7 flows/metrics #21253

Open
chancez opened this issue Sep 8, 2022 · 8 comments
Open

CFP: HTTP request and response size in HTTP L7 flows/metrics #21253

chancez opened this issue Sep 8, 2022 · 8 comments
Labels
kind/feature This introduces new functionality. pinned These issues are not marked stale by our issue bot. sig/agent Cilium agent related. sig/hubble Impacts hubble server or relay

Comments

@chancez
Copy link
Contributor

chancez commented Sep 8, 2022

Cilium Feature Proposal

Is your feature request related to a problem?

No

Describe the feature you'd like

Same idea as #21252 but for HTTP.

Knowing how big a request or response is can be very useful when debugging issues with web services. A large request could cause a service to timeout, and a large response could cause a client to timeout, or simply increase the tail latency of the service when looking at request durations.

(Optional) Describe your proposed solution

Add the request/response size fields to the L7 HTTP protobufs and configure cilium proxy (envoy) to send this information back to cilium/hubble so it can be associated with the Flow.

Once this information is in the flow, I can easily make a PR to add metrics for this. The metrics would be a histogram or summary so we can track the distribution of the request/response sizes.

@chancez chancez added the kind/feature This introduces new functionality. label Sep 8, 2022
@jrajahalme
Copy link
Member

Would value of the content-length header be enough? If so we might already have it in the logged headers. This is bit harder for chunked encoding and if you'd want the size of the headers to be included. Currently we emit access logs when headers have been processed, streaming the body right after.

@chancez
Copy link
Contributor Author

chancez commented Sep 14, 2022

@jrajahalme I thought of that, but it can be incorrect. A client can send a large body and set the Content-Length of the request to a small value, similarly a server can do the same but for a response. Most well behaved clients and servers shouldn't do this, but it's possible, and can lead to misleading data if dealing with less-trusted clients, or poorly implemented clients/servers.

@jrajahalme
Copy link
Member

OK, so the request here is to actually count the bytes of the payload and delay access logging until the payload is done and report the actual payload length with the log record? What if the payload is streamed and it never ends?

@jrajahalme
Copy link
Member

The stated main benefit is debugging, and for that IMO the value of content length header is likely enough. I would start with that and do more only if/when evidence for the need to dig deeper surfaces.

@chancez
Copy link
Contributor Author

chancez commented Sep 14, 2022

@jrajahalme I think the problem is that the request/response flow model is mis-aligned with how requests actually work. In Istio/Envoy they have metrics for TCP natively, and I believe it's just incrementing a counter each time a packet flows through envoy. In cilium + envoy, we're sending request/response access logs and calculating metrics based on that, but that doesn't work as well for streaming use-cases. Perhaps we should have a new access log type for in-progress (streaming) requests.

@aanm aanm added the sig/agent Cilium agent related. label Nov 9, 2022
@github-actions
Copy link

github-actions bot commented Jan 9, 2023

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jan 9, 2023
@rolinh rolinh added the sig/hubble Impacts hubble server or relay label Jan 9, 2023
@github-actions github-actions bot removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jan 10, 2023
@github-actions
Copy link

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Mar 12, 2023
@chancez
Copy link
Contributor Author

chancez commented Mar 13, 2023

Not stale. As an aside, I think this would be important to have to be comparable to other service meshes which offer this as a metric already.

@chancez chancez removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Mar 13, 2023
@rolinh rolinh added the pinned These issues are not marked stale by our issue bot. label Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature This introduces new functionality. pinned These issues are not marked stale by our issue bot. sig/agent Cilium agent related. sig/hubble Impacts hubble server or relay
Projects
None yet
Development

No branches or pull requests

4 participants