Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROPOSAL] Identity Management Overhaul #5756

Closed
JoshVanL opened this issue Jan 13, 2023 · 18 comments
Closed

[PROPOSAL] Identity Management Overhaul #5756

JoshVanL opened this issue Jan 13, 2023 · 18 comments

Comments

@JoshVanL
Copy link
Contributor

JoshVanL commented Jan 13, 2023

PRs

/area runtime
/area operator
/area placement

This proposal describes a strategy for overhauling the identity story in Dapr. It involves several distinct points of interest, but have been put into the same proposal as they all serve the same goal and together will have a greater sum than their parts.

Motivation

In Kubernetes mode, Dapr is made up from a set of privileged control plane components (sentry, operator, injector, dashboard, placement (also self hosted)), and lower privileged workload daprd sidecars. In order for Dapr to maintain a strong security posture to ensure malicious, or otherwise, actors cannot read/write/get/post data it should not have access to, Dapr must have a strong authentication and authorization story. Today, Dapr uses the sentry control plane component as an identity provider and certificate authority to the rest of the Dapr components and workloads.

I've identified a number of things Dapr can improve which should give us:

  • Ensure mTLS cannot be compromised,
  • Locked down and Principle Of Least Privilege (POLP) prevents bad actors accessing data they shouldn’t,
  • Stronger identity improves reliability of metrics/tracing,
  • Stronger identity allows for developing richer authorization features in future,
  • The current identity management is inherently weak so no reliance on workload identity can be made.

Goals

  • Overhaul identity management in Dapr so workload identities are secure and mTLS between dapr components can be trusted.
  • Upgrades to this new identity system should be seamless for users, with no manual intervention required.

Non-Goals

  • Implement a mechanism for an external CA system (though don't write the new system in a way which wouldn't allow this to be added in future).

Current Shortfalls

Dapr currently has a number of shortcomings which prevent it from having a strong identity story, and undermines the mTLS of Dapr components.

1. Issuer Key Sharing

The dapr sidecar injector currently adds the Sentry issuing certificate and private key to Dapr workloads as environment variables. This means that all Dapr sidecars have access to the issuer private key and so can mint any certificate identity they want, allowing workloads to impersonate anyone (including the control plane). This is perhaps the most critical problem to fix.

2. Broken CA Auth

Sentry is the component that acts as the Certificate Authority, and is responsible for authenticating dapr workloads and minting them appropriate X.509 identities. This flow follows a common pattern whereby workloads send an audience scoped Kubernetes ServiceAccount token which Sentry verifies against the Kubernetes API server.

Reviewing the sentry flow:

  1. Sentry starts and generates the root CA anchor private key and self signed cert.
  2. Sentry signs an issuer cert with new issuer key. Root CA anchor key is thrown away.
  3. Root CA cert, issuer cert and pk stored in Secret
  4. Pod injector injects root CA cert into dapr sidecar env vars
  5. Daprds request identity certs from sentry using the root CA trust anchor for TLS, and a Kubernetes token which comes from a projected volume.
  6. Sentry validates incoming CSRs by doing a token review against the Kube API.

The current implementation does not currently validate that the ServiceAccount associated with the reviewed token matches the identity requested by the client, and will instead validate the App ID which the client can set any value to.

err = s.validator.Validate(req.GetId(), req.GetToken(), req.GetNamespace())

func (c *defaultCA) ValidateCSR(csr *x509.CertificateRequest) error {

func (v *validator) Validate(id, token, namespace string) error {

identity := identity.NewBundle(csr.Subject.CommonName, req.GetNamespace(), req.GetTrustDomain())

This means that clients have unbounded impersonation (i.e. Alice and request a cert containing Bob).

3. Control Plane components uses same ServiceAccount

Currently (bar the dashboard), all control plane components use the same ServiceAccount. This means that not only does each component share the same Kubernetes identity, they all share the same RBAC, violating POLP.

4. Injector uses separate trust anchor for MutatingWebhook

A lesser problem in Dapr is that the sidecar injector currently uses a separate root of trust for serving the MutatingWebhook server, with the certificate being signed and managed through Helm. This is somewhat disjointed as all other certificates are managed and observed through sentry. We basically have 2 systems of identity management.

Solutions

a) Removing Issuer certificate key pair from workloads

Change the injector to no longer set the DAPR_CERT_CHAIN and DAPR_CERT_KEY environment variables.

Deprecate the flags --issuer-key-secret-key and --issuer-certificate-secret-key on daprd and all control plane components except the sentry component. These flags can still be set for backwards compatibility however functionally do nothing. If set in future versions, a warning log output should describe that these flags no longer do anything.

Add the --sentry-trust-domain flag which will be used by sentry clients to validate sentry connections. This flag gets set by the injector for daprd workloads.

Refactor the runtime so that all components request their identity from sentry on boot.

b) App ID

App ID is a string identifier that is used for service invocation/communicating with other dapr workloads, and scoping access.

In Kubernetes, workloads can set their own App ID (via the dapr.io/app-id annotation), or will it will be defaulted to the Pods name. This App ID has no relation to the Pod's ServiceAccount which results in sentry having no way of validating the authenticity of incoming App identity requests. Some possible solutions to this:

  • Give sentry the ability to get Pods. Sentry will validate the incoming App ID request with the annotation which is set on the Pod (and defaulting to the Pods name if it does not exist). Sentry would then issue identities in some variation of spiffe://<trust domain>/ns/<namespace>/appid/<app ID name>. We should consider sentry GETting Pods as an expensive operation and would slow down init time of dapr workloads.
  • Drop the ability for workloads to self select their APP ID name, in favor of enforcing they be identified by their ServiceAccount. This would eliminate the overhead of GETting pods, however would result in a breaking change to the Dapr API and highly likely disruptive to existing code paths/tooling. Going forward with the first option.

Both solutions here are not great.

c) Separate Service Acounts for control plane components.

Each dapr component should be given their own distinct Service Account. All but the Sentry component should request their identity certificate from Sentry. This is also a good opportunity to review the existing RBAC and ensure POLP.

d) Injector uses identity certificate for Mutating Webhook

The injector identity will be special cased so Sentry will add the injector DNS server name to its identity document. This will allow the injector to serve its Dapr identity document to the Kubernetes API server. Change injector (or Sentry) to update its own Mutating Webhook with the CA trust bundle of the Dapr trust domain. Remove webhook cert management from helm.


misc:

@yaron2
Copy link
Member

yaron2 commented Jan 13, 2023

Thanks, this all looks good to me and most solutions (a, c, d) are straightforward to implement.

Re: b, as you correctly mentioned, dropping the ability to select an app-id is disruptive and thus wouldn't be possible as it's very core to Dapr and we wouldn't want to drop custom application identities only to mitigate a Kubernetes API operation (referencing the alternative of the injector getting pods).

Another important thing to remember is that workload identities should work outside of Kubernetes, as Dapr runs in self-hosted mode, so any change needs to be compatible with this requirement. For this (main) reason I think relying on Service Accounts for identities is not the best way to go.

@JoshVanL
Copy link
Contributor Author

JoshVanL commented Jan 13, 2023

Thanks @yaron2, makes sense. I agree that we should stay clear of tying Dapr APP IDs with Service Accounts in favor of getting the ID from the kube API. A new spiffe ID could look something like: spiffe://<trust domain>/ns/<namespace>/appid/<app id>.

I'll update the proposal with these changes.

@yaron2
Copy link
Member

yaron2 commented Jan 13, 2023

Thanks, as discussed - the current SPIFFE ID already contains the app id.

@ItalyPaleAle
Copy link
Contributor

Give sentry the ability to get Pods. Sentry will validate the incoming App ID request with the annotation which is set on the Pod (and defaulting to the Pods name if it does not exist)

Lately we've seen that services that request data about pods have a much increased memory requirement. Users have reported that the Dapr operator can use up to 800MB of memory even without a single Dapr-enabled pod. Since the K8s SDK keep a cache of data in memory, this can mean that sentry would have to cache a lot of data too.

Can I propose a slightly different approach to this?

  1. The Injector, which is responsible for reading the app-id annotation in the admission webhook and thus populate the -app-id CLI flag, would also create a signed token (could be a JWT, but the actual format doesn't matter) that "attests" the app-id value.
  2. The token is signed with a key (perhaps asymmetric would work best here?) that is known to the sentry service.
  3. The token is added to the pod either as a new volume (ConfigMap maybe) or as an env var
  4. When requesting a certificate from sentry, the app presents this token to authenticate the app ID

@JoshVanL
Copy link
Contributor Author

@ItalyPaleAle I don't think the memory consumption is a problem here as we will not be using an informer in sentry, instead we will be using a single GET request to the API server on incoming requests. If using an informer is a requirement, then we could look at using a metadata only watch since we only need the pod metadata- my fear with that though is that sentry will often have an out of date state of the pod or not observe a new pod before the dapr container requests it's identity, forcing a retries and increasing error rates.

I'm also quite against having secrets be readable through the API (env var / ConfigMap).

@ItalyPaleAle
Copy link
Contributor

Ok, if this doesn't cache data locally in the app/SDK, it would probably be OK.

A self-contained token would still be faster however at verification time. It wouldn't be a "secret" but just a bearer token. Issued by the injector (which has the private key), it attests that the app has the app-id it claims to have. The app presents that to sentry and sentry validates it. The token isn't secret because it contains only public claims (it shouldn't be a concern if the app knows its app-id) and using signatures we can guarantee it won't be tampered with. (In my last message I wrote this could be a JWT; I was wrong there on second thought and it could be a JWS but not JWT, which would require certain claims to be present)

@JoshVanL
Copy link
Contributor Author

I'd definitely consider the app ID token to be a secret since it is basically a password- even though it def contains public information, having access to the is token allows actors to request for dapr identities for an App ID which is not itself (impersonation).

@ItalyPaleAle
Copy link
Contributor

That makes sense

@ItalyPaleAle
Copy link
Contributor

@JoshVanL I've been thinking about this more and while you're right, I actually don't think it matters in the sense that it doesn't make a difference.

Your proposal is to have certificates issued if:

  • Sentry is presented with a valid token signed by K8s (in Dapr 1.10 this will be a bound token, but still issue by K8s)
  • Sentry validates the pod the request comes in by querying the K8s APIs

In this case, the relevant thing is still the bound token which is stored in /var/run/secrets/dapr.io/sentrytoken (legacy path is /var/run/secrets/kubernetes.io/serviceaccount/token"). Using bindings.localstorage an application could read that (related issue) and then using bindings.http could invoke the K8s APIs and request a certificate.

If we use a separate JWS issued by the Dapr Injector, for Sentry to issue a token you will need:

  • The K8s bound token (just like in the previous case)
  • The JWS issued by the Injector

We could store the JWS in a path like /var/run/secrets/dapr.io/pod-identity and with dapr/components-contrib#2444 implemented, it should have the same security profile as the other proposal.

The benefits of using a separate, Dapr-issued JWS are:

  • It's faster as the verification happens locally and doesn't increase the load on the K8s API server
  • it's a bit more decoupled from K8s and will be easier in the future to support self-hosted scenarios (maybe - it still depends on the injector)

@JoshVanL
Copy link
Contributor Author

We could store the JWS in a path like /var/run/secrets/dapr.io/pod-identity and with dapr/components-contrib#2444 implemented, it should have the same security profile as the other proposal.

The problem with this is that sentry or injector needs to create that volume. In vanilla Kubernetes, the only real way to do this is by creating ConfigMaps or Secrets. This is problematic because they are accessible through the API and would also mean granting sentry/injector the permissions to access those resources in all namespaces. Not to mention the overhead of ensuring the contents of those volumes is kept up to date and correct- forcing the writer to watch pods.. The creation of these resources is comparable to the GET Pods operation.

The other option is to create a dapr CSI driver which is OTT, even more resource intensive, and comes with its own security model to manage/maintain.


As an aside, it is quite problematic that we can set the App ID as a Pod annotation since annotations are mutable on Pods, meaning that a dapr sidecar can be using an identity document which does not match the k8s API assumption.

@JoshVanL
Copy link
Contributor Author

Another thing of note implementing the above- sentry is also going to have to GET Configurations in order to determine the trust domain of the app.

@ItalyPaleAle
Copy link
Contributor

it is quite problematic that we can set the App ID as a Pod annotation since annotations are mutable on Pods,

This seems another point in the case of using pre-signed token :)

The problem with this is that sentry or injector needs to create that volume.

True, then maybe setting an env var may be better. The env var secret store could be changed to disable access to that variable.

sentry is also going to have to GET Configurations in order to determine the trust domain of the app.

Is the trust domain of the app a new value that's being added to the Dapr Configuration CRD (specifically daprsystem)? If so, does it need to be in the Configuration CRD because you expect that to change frequently?

If you think this should not change frequently, I'd recommend passing it to Sentry as a CLI flag that is set via variables in the Helm chart.

@JoshVanL
Copy link
Contributor Author

Is the trust domain of the app a new value that's being added to the Dapr Configuration CRD (specifically daprsystem)? If so, does it need to be in the Configuration CRD because you expect that to change frequently?

The trust domain already exists in the configuration API.

@dapr-bot
Copy link
Collaborator

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@dapr-bot dapr-bot added the stale Issues and PRs without response label Jul 25, 2023
@JoshVanL
Copy link
Contributor Author

JoshVanL commented Sep 7, 2023

In v1.11 runtime requires that the certificate, private key, and trust anchor environment variable be available when running in Kubernetes mode and mTLS is enabled. Because of this, it is a requirement that the injector continue to patch a valid daprd certificate chain as environment variables on the Pod spec in order to maintain the n-1 version coexistence guarantee.

This unfortunately means that in v1.12, the injector will continue to inject a valid certificate key pair (including the private key) on the Pod spec. To improve security, rather than the injector patching in the intermediate certificate, it will instead request and inject the identity certificate of the Pod it is patching. In v1.13 we will be able to remove adding certificate environment variables completely, as planned.

To summarize the security posture:

v1.11

  • Any daprd, dapr enabled app, and any user/entity which is able to get pods is able to impersonate/sign for any dapr identity they want.
  • All control plane services can impersonate/sign for any dapr identity they want.

v1.12

  • daprd, dapr enabled apps and control plane services can only communicate with their given dapr identities
  • The injector service is able to request for any app identity
  • Any user/entity is able to impersonate any app which it has get pods permissions for

v1.13

  • Final state of this issue- no private key material is discoverable & no entity can be impersonated.

@dapr-bot
Copy link
Collaborator

dapr-bot commented Nov 6, 2023

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@mukundansundar
Copy link
Contributor

moving tracking issue to 1.14 for there are still open issues/PRs linked to it.

@mukundansundar mukundansundar modified the milestones: v1.13, v1.14 Feb 8, 2024
@JoshVanL
Copy link
Contributor Author

Done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

6 participants