Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hubble: add option to sanitize sensitive L7 data from flows #23887

Closed
4 of 5 tasks
rolinh opened this issue Feb 20, 2023 · 18 comments · Fixed by #28848
Closed
4 of 5 tasks

Hubble: add option to sanitize sensitive L7 data from flows #23887

rolinh opened this issue Feb 20, 2023 · 18 comments · Fixed by #28848
Assignees
Labels
kind/feature This introduces new functionality. pinned These issues are not marked stale by our issue bot. sig/agent Cilium agent related. sig/hubble Impacts hubble server or relay

Comments

@rolinh
Copy link
Member

rolinh commented Feb 20, 2023

Hubble has the capability of providing visibility on L7 protocols such as HTTP or Kafka. This layer 7 protocol visibility feature is opt-in and requires users to either create a L7 policy or to add explicit pod annotations to be enabled. Layer 7 Hubble flows, however, may contain sensitive information, for instance as part of some HTTP headers or in a URL itself.

Hubble should provide an option for users to decide which potentially sensitive L7 data to keep in Hubble flows and it should be finely configurable.

@rolinh rolinh added kind/feature This introduces new functionality. sig/hubble Impacts hubble server or relay labels Feb 20, 2023
@github-actions
Copy link

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Apr 22, 2023
@rolinh rolinh added pinned These issues are not marked stale by our issue bot. and removed stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. labels Apr 24, 2023
@aanm aanm added the sig/agent Cilium agent related. label Apr 25, 2023
@ioandr
Copy link
Contributor

ioandr commented May 27, 2023

Hi @rolinh,

we would like to take a stab on this with @ChrsMark, maybe we could split the items you listed.

Looking at the codebase we understand that we need to extend Hubble's Layer 7 parser and implement the described logic there. This is similar to how Hubble's Layer 7 parser already redacts the user's password to not include it in the network flow. Does this hold?

Thanks!

@rolinh
Copy link
Member Author

rolinh commented Jun 5, 2023

Hi @ioandr,

we would like to take a stab on this with @ChrsMark, maybe we could split the items you listed.

Nice! I'm happy to create sub-issues if that helps but we can otherwise simply add your github handle next to each of the task.

Looking at the codebase we understand that we need to extend Hubble's Layer 7 parser and implement the described logic there. This is similar to how Hubble's Layer 7 parser already redacts the user's password to not include it in the network flow. Does this hold?

Yes, that's the way to do it indeed.

Thanks for your contributions!

@ioandr
Copy link
Contributor

ioandr commented Jun 21, 2023

Hi @rolinh

we would like to continue working on the rest of the items listed in #23887 (comment) so that we can deliver the whole L7 redaction/sanitization feature in Hubble.

At your convenience please advise on the following items to make sure we are on the same page and we start with the implementation:

  1. Redact HTTP password: in Hubble: improve security by adding an option to redact API key in Kafka requests (L7) #25844 (comment) @kaworu suggested that redaction of HTTP passwords should be configurable. This makes sense and seems to be aligned with what we did for Kafka API key and URL query parameters. As @ChrsMark suggested in Hubble: improve security by adding an option to redact API key in Kafka requests (L7) #25844 (comment) we can configure Hubble to redact HTTP passwords by default - if you agree we add an extra item in the description of the current issue for this as it came up as a follow-up.
  2. Sanitize user info part in URL: we interpret sanitize here as:
    • username: ensure lowercase, check for disallowed/suspicious special characters (https://xkcd.com/327/ 😬)
    • password: ensure minimum length, check for standard values that one should avoid (e.g., "root", "admin", "12345", etc.)
  3. Sanitize HTTP headers: we understand that Hubble should maintain a list of "trusted" HTTP headers that it does not need to sanitize. Then. Hubble should sanitize the value of any other header it captures in L7 flows, leaving its key unchecked/untouched. Similar to 2, we interpret sanitize here as:
    • HTTP header value: check for max length, check for disallowed/suspicious special characters

Maybe we could keep sanitization logic in a single place and reuse it from multiple places. Other than the above, it seems that Envoy itself provides a list of HTTP headers that it potentially sanitizes:

https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_conn_man/header_sanitizing

Maybe we could get some inspiration from this, given that Cilium itself ships with Envoy.

We are looking forward to your thoughts on the above - once again, we can split tasks with @ChrsMark

@kaworu
Copy link
Member

kaworu commented Jun 30, 2023

Thanks @ioandr for the detailed comment!

  1. Redact HTTP password: in Hubble: improve security by adding an option to redact API key in Kafka requests (L7) #25844 (comment) @kaworu suggested that redaction of HTTP passwords should be configurable. This makes sense and seems to be aligned with what we did for Kafka API key and URL query parameters. As @ChrsMark suggested in Hubble: improve security by adding an option to redact API key in Kafka requests (L7) #25844 (comment) we can configure Hubble to redact HTTP passwords by default - if you agree we add an extra item in the description of the current issue for this as it came up as a follow-up.

Sounds good to me but again, I'd like some consistency around defaults. In my view, both Kafka API keys and HTTP passwords are secrets (in every context) and we should apply the same default redact policy. Redacting Kafka API keys by default would be the less intrusive option as changing the default for HTTP password would be a kind of "breaking change" and has security implications.

Agree that the issue description is lacking in terms of what "sanitize" means concretely. In my opinion: because at the Hubble level we don't know in which context theses value are going to be used we shouldn't modify them. It's up to consumers inserting into a SQL database to take the usual precautions about theses tainted values, so should do consumers generating HTML, printing to a Terminal, etc. cc @rolinh to clarify the intent.

[…] t seems that Envoy itself provides a list of HTTP headers that it potentially sanitizes:

https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_conn_man/header_sanitizing

While the list is interesting, I think it covers a different use-case. In a reverse-proxy setup (edge node / front-proxy in Envoy documentation) it is critical to "sanitize" some headers (e.g. x-forwarded-for) while passing other as-is (e.g. authentication). In the Hubble case we should do nothing about the former but care about the latter (since it can include sensitive information, e.g. JWT or user/password credentials).

@ioandr
Copy link
Contributor

ioandr commented Jun 30, 2023

Sounds good to me but again, I'd like some consistency around defaults. In my view, both Kafka API keys and HTTP passwords are secrets (in every context) and we should apply the same default redact policy. Redacting Kafka API keys by default would be the less intrusive option as changing the default for HTTP password would be a kind of "breaking change" and has security implications.

With @ChrsMark we agree on this, so we can do the following:

  1. Expose a new option to redact HTTP password and enable it by default
  2. Modify the existing option to redact Kafka API key to enable it by default
  3. Leave the existing option to redact URL query params as is (disabled by default)

I can send a small, dedicated PR for 1 and 2 listed above.

Agree that the issue description is lacking in terms of what "sanitize" means concretely.

While the list is interesting, I think it covers a different use-case

Regarding sanitization let's wait for some feedback by @rolinh who originally opened the issue to decide how we will move forward.

@rolinh
Copy link
Member Author

rolinh commented Jul 6, 2023

Sorry @ioandr, @ChrsMark and @kaworu, I missed the updates on this issue. Don't hesitate to ping me on Slack if I'm unresponsive as it's sometimes hard to keep up with everything going on 🙂

at the Hubble level we don't know in which context theses value are going to be used we shouldn't modify them.

Fully agree with this. Without context, it's impossible to do the correct job. Sanitization in this context means, for example, given a list of allowed HTTP headers, leave these untouched and remove all others.

Sanitize user info part in URL: we interpret sanitize here as:

username: ensure lowercase, check for disallowed/suspicious special characters (https://xkcd.com/327/ grimacing)
password: ensure minimum length, check for standard values that one should avoid (e.g., "root", "admin", "12345", etc.)

I should have used the term "redact" rather than sanitize, sorry about this. I don't think it's on Hubble to take the call and decide what is a strong password, etc. What Hubble should do is simply to replace usernames and passwords with HUBBLE_REDACTED or similar when the option is enabled.

Sanitize HTTP headers: we understand that Hubble should maintain a list of "trusted" HTTP headers that it does not need to sanitize. Then. Hubble should sanitize the value of any other header it captures in L7 flows, leaving its key unchecked/untouched. Similar to 2, we interpret sanitize here as:

HTTP header value: check for max length, check for disallowed/suspicious special characters

Same answer here. We can provide a list of default allowed headers (e.g. traceparent, tracestate, Cache-Control, Content-Encoding, Accept, ...) and when the option is enabled, every header not in the list has its value HUBBLE_REDACTED.

With @ChrsMark we agree on this, so we can do the following:

  1. Expose a new option to redact HTTP password and enable it by default
  2. Modify the existing option to redact Kafka API key to enable it by default
  3. Leave the existing option to redact URL query params as is (disabled by default)

Sounds like a good plan to me and doing 1. will make the behavior consistent.

I can send a small, dedicated PR for 1 and 2 listed above.

🚀

@ChrsMark
Copy link
Contributor

ChrsMark commented Jul 6, 2023

Thanks @rolinh! Makes sense to me.

Regarding the http-headers' redaction: how about instead of defining the allowed headers, to define the list of "sensitive" headers to be redacted (optionally)?

The following headers could be those we can start with and ofc we can extend:

  • Authorization
  • Proxy-Authorization
  • Cookie
  • Set-Cookie
  • X-Vault-Token
  • X-Auth-Token
  • X-Forwarded-Client-Cert

@rolinh
Copy link
Member Author

rolinh commented Jul 6, 2023

Regarding the http-headers' redaction: how about instead of defining the allowed headers, to define the list of "sensitive" headers to be redacted (optionally)?

I'd strongly prefer to have users to provide a list of allowed headers rather than a list of headers to redact. The reason being that there could be a lot of custom headers and what not that aren't covered and it would give a false sense of security ("secrets are sanitized, it's safe to share the L7 flows"). I think it's simply not possible to be exhaustive. On the other hand, if a header not on the allow-list is redacted, it's easy to add it to the list. This implies that we need to keep all header keys and only redact header values which seems totally fine to me.

@chancez
Copy link
Contributor

chancez commented Jul 10, 2023

@rolinh @ChrsMark it seems to me that given the variance in headers, we shouldn't be hardcoding headers to keep/redact, instead it should be user supplied. So rather than a boolean option, as it is in #26724 , it would be a list of headers to either redact/keep.

@rolinh
Copy link
Member Author

rolinh commented Jul 12, 2023

we shouldn't be hardcoding headers to keep/redact, instead it should be user supplied

Definitely.

In the values file, I picture something like the following:

hubble:
  redact:
    enabled: false
    http:
      userInfo: true
      urlQuery: true
      headers:
        allow:
          - traceparent
          - tracestate
          - Cache-Control
          - ...
    kafka:
      apiKey: true

So setting hubble.redact.enabled=true would apply the config that follows. We could even think of adding support for hubble.redact.http.headers.deny with a deny approach rather than an allow approach. I think this approach would provide the best flexibility for the user yet with reasonable default, all the user has to do is to set hubble.redact.enabled=true.

@ChrsMark
Copy link
Contributor

Thanks for the feedback @chancez and @rolinh!

Just to verify what was suggested at #23887 (comment), would that mean that we need to change the format of the hubble-redact flags/settings?

So far we have the following:

--hubble-redact=kafa-api-key
--hubble-redact=http-basic-auth-password
--hubble-redact=http-url-query

or in terms of values as

hubble:
 redact:
  - http-url-query
  - kafka-api-key

So would your proposal @rolinh mean that we need to change the schema to sth like the following:

--hubble-redact=true
--hubble-redact-http-user-info=true
--hubble-redact-http-url-query=true
--hubble-redact-http-headers-allow=foo,bar,baz
--hubble-redact-kafka-api-key=true

@chancez
Copy link
Contributor

chancez commented Jul 12, 2023

@ChrsMark Correct, that looks reasonable.

@rolinh
Copy link
Member Author

rolinh commented Jul 21, 2023

@ChrsMark Given that #25746 made it to the v1.14 branch but #25844 did not and that we have plan to rework the values file, I suggest to pull #25746 out of the v1.14 branch. I believe this would provide a better experience than shipping only a portion of this meta issue. If we ship in v1.14 and then change the values options, we'll also have to deprecate in the next version which mean the whole improvement would land in v1.16 only.

@ChrsMark
Copy link
Contributor

Yes @rolinh let's do it like this and revert #25844 from v1.14 branch.

@ChrsMark
Copy link
Contributor

Hey folks. I have filed #26989 to implement the discussed changes.

For now, only the http-query and kafka-api-key options will be supported.
I suggest that once we finalize and merge #26989 we will update #26724 to implement the http-headers part and @ioandr can add the http-userInfo part in a different PR. The default values' logic will be discussed separately in these follow-up PRs.

@rolinh @chancez let me know what you think :).

@ChrsMark
Copy link
Contributor

A heads-up on this one.

Now that #27553 has been merged, we can resume the work on the following pending tasks:

  1. We'll be adding an option to configure the redaction of HTTP passwords, as discussed in more detail here: Hubble: add option to sanitize sensitive L7 data from flows #23887 (comment). -> @ioandr.

  2. The default values of all options related to redaction will be revisited. The aim is to simplify the process by making redact.enabled=true sufficient for users who don't require specific fine-tuning. This discussion took place in the comments here: Hubble: add option to sanitize sensitive L7 data from flows #23887 (comment) and Refactor hubble redact settings schema [v2] #27553 (comment). -> @ioandr

  3. Resume the work on the http headers redact Add option to redact http headers #26724: -> @ChrsMark

@ioandr and everyone feel free to add anything that I might miss here.

@ioandr
Copy link
Contributor

ioandr commented Oct 25, 2023

Hey there, it seems that the only pending item to close this is to add business logic to control whether we redact the user info part from URLs, if present.

I have started working on this however I bumped into some unexpected behavior that I describe in #28798. As soon as we build context around it and have a way forward I should be able to proceed and file a PR to:

Expose a new option to redact HTTP password and enable it by default

CC @rolinh @ChrsMark

ioandr added a commit to ioandr/cilium that referenced this issue Oct 27, 2023
Add business logic to L7 HTTP parser to conditionally redact sensitive
user info (e.g., password used in basic authentication) when present
in observed URLs.

* Add the '--hubble-redact-http-userinfo' option to the Cilium CLI.
  Preserve existing functionality by setting the default value to true.
* Add unit tests for redacting user info in L7 parser
* Update the `visibility.rst` document
* Update Helm chart templates, values and docs

Finally, ensure that values are redacted both in L7 HTTP flows and
corresponding (HTTP) summaries.

Closes cilium#23887

Signed-off-by: Ioannis Androulidakis <androulidakis.ioannis@gmail.com>
ioandr added a commit to ioandr/cilium that referenced this issue Oct 27, 2023
Add business logic to L7 HTTP parser to conditionally redact sensitive
user info (e.g., password used in basic authentication) when present
in observed URLs.

* Add the '--hubble-redact-http-userinfo' option to the Cilium CLI.
  Preserve existing functionality by setting the default value to true.
* Add unit tests for redacting user info in L7 parser
* Update the `visibility.rst` document
* Update Helm chart templates, values and docs

Finally, ensure that values are redacted both in L7 HTTP flows and
corresponding (HTTP) summaries.

Closes cilium#23887

Signed-off-by: Ioannis Androulidakis <androulidakis.ioannis@gmail.com>
ioandr added a commit to ioandr/cilium that referenced this issue Oct 27, 2023
Add business logic to L7 HTTP parser to conditionally redact sensitive
user info (e.g., password used in basic authentication) when present
in observed URLs.

* Add the '--hubble-redact-http-userinfo' option to the Cilium CLI.
  Preserve existing functionality by setting it to true by default.
* Add unit tests to verify that password in observed URL is redacted.
* Fix issue in L7 HTTP parser where sensitive values were redacted in
  (L7) HTTP flows, but not in (L7) HTTP summaries.
* Update documentation as needed.
* Update Helm chart templates, values and docs as needed.

Closes cilium#23887

Signed-off-by: Ioannis Androulidakis <androulidakis.ioannis@gmail.com>
ioandr added a commit to ioandr/cilium that referenced this issue Oct 27, 2023
Add business logic to L7 HTTP parser to conditionally redact sensitive
user info (e.g., password used in basic authentication) when present
in observed URLs.

* Add the '--hubble-redact-http-userinfo' option to the Cilium CLI.
  Preserve existing functionality by setting it to true by default.
* Add unit tests to verify that password in observed URL is redacted.
* Fix issue in L7 HTTP parser where sensitive values were redacted in
  (L7) HTTP flows, but not in (L7) HTTP summaries.
* Update documentation as needed.
* Update Helm chart templates, values and docs as needed.

Closes cilium#23887

Signed-off-by: Ioannis Androulidakis <androulidakis.ioannis@gmail.com>
ioandr added a commit to ioandr/cilium that referenced this issue Oct 31, 2023
Add business logic to L7 HTTP parser to conditionally redact sensitive
user info (e.g., password used in basic authentication) when present
in observed URLs.

* Add the '--hubble-redact-http-userinfo' option to the Cilium CLI.
  Preserve existing functionality by setting it to true by default.
* Add unit tests to verify that password in observed URL is redacted.
* Fix issue in L7 HTTP parser where sensitive values were redacted in
  (L7) HTTP flows, but not in (L7) HTTP summaries.
* Update documentation as needed.
* Update Helm chart templates, values and docs as needed.

Closes cilium#23887

Signed-off-by: Ioannis Androulidakis <androulidakis.ioannis@gmail.com>
aanm pushed a commit that referenced this issue Nov 3, 2023
Add business logic to L7 HTTP parser to conditionally redact sensitive
user info (e.g., password used in basic authentication) when present
in observed URLs.

* Add the '--hubble-redact-http-userinfo' option to the Cilium CLI.
  Preserve existing functionality by setting it to true by default.
* Add unit tests to verify that password in observed URL is redacted.
* Fix issue in L7 HTTP parser where sensitive values were redacted in
  (L7) HTTP flows, but not in (L7) HTTP summaries.
* Update documentation as needed.
* Update Helm chart templates, values and docs as needed.

Closes #23887

Signed-off-by: Ioannis Androulidakis <androulidakis.ioannis@gmail.com>
pjablonski123 pushed a commit to pjablonski123/cilium that referenced this issue Dec 15, 2023
Add business logic to L7 HTTP parser to conditionally redact sensitive
user info (e.g., password used in basic authentication) when present
in observed URLs.

* Add the '--hubble-redact-http-userinfo' option to the Cilium CLI.
  Preserve existing functionality by setting it to true by default.
* Add unit tests to verify that password in observed URL is redacted.
* Fix issue in L7 HTTP parser where sensitive values were redacted in
  (L7) HTTP flows, but not in (L7) HTTP summaries.
* Update documentation as needed.
* Update Helm chart templates, values and docs as needed.

Closes cilium#23887

Signed-off-by: Ioannis Androulidakis <androulidakis.ioannis@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature This introduces new functionality. pinned These issues are not marked stale by our issue bot. sig/agent Cilium agent related. sig/hubble Impacts hubble server or relay
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants