[PROPOSAL] Identity Management Overhaul #5756

JoshVanL · 2023-01-13T12:21:13Z

PRs

/area runtime
/area operator
/area placement

This proposal describes a strategy for overhauling the identity story in Dapr. It involves several distinct points of interest, but have been put into the same proposal as they all serve the same goal and together will have a greater sum than their parts.

Motivation

In Kubernetes mode, Dapr is made up from a set of privileged control plane components (sentry, operator, injector, dashboard, placement (also self hosted)), and lower privileged workload daprd sidecars. In order for Dapr to maintain a strong security posture to ensure malicious, or otherwise, actors cannot read/write/get/post data it should not have access to, Dapr must have a strong authentication and authorization story. Today, Dapr uses the sentry control plane component as an identity provider and certificate authority to the rest of the Dapr components and workloads.

I've identified a number of things Dapr can improve which should give us:

Ensure mTLS cannot be compromised,
Locked down and Principle Of Least Privilege (POLP) prevents bad actors accessing data they shouldn’t,
Stronger identity improves reliability of metrics/tracing,
Stronger identity allows for developing richer authorization features in future,
The current identity management is inherently weak so no reliance on workload identity can be made.

Goals

Overhaul identity management in Dapr so workload identities are secure and mTLS between dapr components can be trusted.
Upgrades to this new identity system should be seamless for users, with no manual intervention required.

Non-Goals

Implement a mechanism for an external CA system (though don't write the new system in a way which wouldn't allow this to be added in future).

Current Shortfalls

Dapr currently has a number of shortcomings which prevent it from having a strong identity story, and undermines the mTLS of Dapr components.

1. Issuer Key Sharing

The dapr sidecar injector currently adds the Sentry issuing certificate and private key to Dapr workloads as environment variables. This means that all Dapr sidecars have access to the issuer private key and so can mint any certificate identity they want, allowing workloads to impersonate anyone (including the control plane). This is perhaps the most critical problem to fix.

2. Broken CA Auth

Sentry is the component that acts as the Certificate Authority, and is responsible for authenticating dapr workloads and minting them appropriate X.509 identities. This flow follows a common pattern whereby workloads send an audience scoped Kubernetes ServiceAccount token which Sentry verifies against the Kubernetes API server.

Reviewing the sentry flow:

Sentry starts and generates the root CA anchor private key and self signed cert.
Sentry signs an issuer cert with new issuer key. Root CA anchor key is thrown away.
Root CA cert, issuer cert and pk stored in Secret
Pod injector injects root CA cert into dapr sidecar env vars
Daprds request identity certs from sentry using the root CA trust anchor for TLS, and a Kubernetes token which comes from a projected volume.
Sentry validates incoming CSRs by doing a token review against the Kube API.

The current implementation does not currently validate that the ServiceAccount associated with the reviewed token matches the identity requested by the client, and will instead validate the App ID which the client can set any value to.

dapr/pkg/sentry/server/server.go

Line 150 in 9f143d8

err = s.validator.Validate(req.GetId(), req.GetToken(), req.GetNamespace())

dapr/pkg/sentry/ca/certificate_authority.go

Line 125 in 9f143d8

func (c *defaultCA) ValidateCSR(csr *x509.CertificateRequest) error {

dapr/pkg/sentry/identity/kubernetes/validator.go

Line 40 in 9f143d8

func (v *validator) Validate(id, token, namespace string) error {

dapr/pkg/sentry/server/server.go

Line 158 in 1c95ad1

    
           identity := identity.NewBundle(csr.Subject.CommonName, req.GetNamespace(), req.GetTrustDomain())

This means that clients have unbounded impersonation (i.e. Alice and request a cert containing Bob).

3. Control Plane components uses same ServiceAccount

Currently (bar the dashboard), all control plane components use the same ServiceAccount. This means that not only does each component share the same Kubernetes identity, they all share the same RBAC, violating POLP.

4. Injector uses separate trust anchor for MutatingWebhook

A lesser problem in Dapr is that the sidecar injector currently uses a separate root of trust for serving the MutatingWebhook server, with the certificate being signed and managed through Helm. This is somewhat disjointed as all other certificates are managed and observed through sentry. We basically have 2 systems of identity management.

Solutions

a) Removing Issuer certificate key pair from workloads

Change the injector to no longer set the DAPR_CERT_CHAIN and DAPR_CERT_KEY environment variables.

Deprecate the flags --issuer-key-secret-key and --issuer-certificate-secret-key on daprd and all control plane components except the sentry component. These flags can still be set for backwards compatibility however functionally do nothing. If set in future versions, a warning log output should describe that these flags no longer do anything.

Add the --sentry-trust-domain flag which will be used by sentry clients to validate sentry connections. This flag gets set by the injector for daprd workloads.

Refactor the runtime so that all components request their identity from sentry on boot.

b) App ID

App ID is a string identifier that is used for service invocation/communicating with other dapr workloads, and scoping access.

In Kubernetes, workloads can set their own App ID (via the dapr.io/app-id annotation), or will it will be defaulted to the Pods name. This App ID has no relation to the Pod's ServiceAccount which results in sentry having no way of validating the authenticity of incoming App identity requests. Some possible solutions to this:

Give sentry the ability to get Pods. Sentry will validate the incoming App ID request with the annotation which is set on the Pod (and defaulting to the Pods name if it does not exist). Sentry would then issue identities in some variation of spiffe://<trust domain>/ns/<namespace>/appid/<app ID name>. We should consider sentry GETting Pods as an expensive operation and would slow down init time of dapr workloads.
Drop the ability for workloads to self select their APP ID name, in favor of enforcing they be identified by their ServiceAccount. This would eliminate the overhead of GETting pods, however would result in a breaking change to the Dapr API and highly likely disruptive to existing code paths/tooling. Going forward with the first option.

Both solutions here are not great.

c) Separate Service Acounts for control plane components.

Each dapr component should be given their own distinct Service Account. All but the Sentry component should request their identity certificate from Sentry. This is also a good opportunity to review the existing RBAC and ensure POLP.

d) Injector uses identity certificate for Mutating Webhook

The injector identity will be special cased so Sentry will add the injector DNS server name to its identity document. This will allow the injector to serve its Dapr identity document to the Kubernetes API server. Change injector (or Sentry) to update its own Mutating Webhook with the CA trust bundle of the Dapr trust domain. Remove webhook cert management from helm.

misc:

Make use of the upstream go-spiffe Go SDK
PoC
Create a centralized security package which handles TLS middleware/auth for all server/clients in Dapr

The text was updated successfully, but these errors were encountered:

yaron2 · 2023-01-13T15:44:28Z

Thanks, this all looks good to me and most solutions (a, c, d) are straightforward to implement.

Re: b, as you correctly mentioned, dropping the ability to select an app-id is disruptive and thus wouldn't be possible as it's very core to Dapr and we wouldn't want to drop custom application identities only to mitigate a Kubernetes API operation (referencing the alternative of the injector getting pods).

Another important thing to remember is that workload identities should work outside of Kubernetes, as Dapr runs in self-hosted mode, so any change needs to be compatible with this requirement. For this (main) reason I think relying on Service Accounts for identities is not the best way to go.

JoshVanL · 2023-01-13T16:00:26Z

Thanks @yaron2, makes sense. I agree that we should stay clear of tying Dapr APP IDs with Service Accounts in favor of getting the ID from the kube API. A new spiffe ID could look something like: spiffe://<trust domain>/ns/<namespace>/appid/<app id>.

I'll update the proposal with these changes.

yaron2 · 2023-01-13T16:22:13Z

Thanks, as discussed - the current SPIFFE ID already contains the app id.

ItalyPaleAle · 2023-01-16T17:34:33Z

Give sentry the ability to get Pods. Sentry will validate the incoming App ID request with the annotation which is set on the Pod (and defaulting to the Pods name if it does not exist)

Lately we've seen that services that request data about pods have a much increased memory requirement. Users have reported that the Dapr operator can use up to 800MB of memory even without a single Dapr-enabled pod. Since the K8s SDK keep a cache of data in memory, this can mean that sentry would have to cache a lot of data too.

Can I propose a slightly different approach to this?

The Injector, which is responsible for reading the app-id annotation in the admission webhook and thus populate the -app-id CLI flag, would also create a signed token (could be a JWT, but the actual format doesn't matter) that "attests" the app-id value.
The token is signed with a key (perhaps asymmetric would work best here?) that is known to the sentry service.
The token is added to the pod either as a new volume (ConfigMap maybe) or as an env var
When requesting a certificate from sentry, the app presents this token to authenticate the app ID

JoshVanL · 2023-01-17T13:17:52Z

@ItalyPaleAle I don't think the memory consumption is a problem here as we will not be using an informer in sentry, instead we will be using a single GET request to the API server on incoming requests. If using an informer is a requirement, then we could look at using a metadata only watch since we only need the pod metadata- my fear with that though is that sentry will often have an out of date state of the pod or not observe a new pod before the dapr container requests it's identity, forcing a retries and increasing error rates.

I'm also quite against having secrets be readable through the API (env var / ConfigMap).

ItalyPaleAle · 2023-01-17T15:50:39Z

Ok, if this doesn't cache data locally in the app/SDK, it would probably be OK.

A self-contained token would still be faster however at verification time. It wouldn't be a "secret" but just a bearer token. Issued by the injector (which has the private key), it attests that the app has the app-id it claims to have. The app presents that to sentry and sentry validates it. The token isn't secret because it contains only public claims (it shouldn't be a concern if the app knows its app-id) and using signatures we can guarantee it won't be tampered with. (In my last message I wrote this could be a JWT; I was wrong there on second thought and it could be a JWS but not JWT, which would require certain claims to be present)

JoshVanL · 2023-01-17T16:56:26Z

I'd definitely consider the app ID token to be a secret since it is basically a password- even though it def contains public information, having access to the is token allows actors to request for dapr identities for an App ID which is not itself (impersonation).

ItalyPaleAle · 2023-01-17T17:04:10Z

That makes sense

ItalyPaleAle · 2023-01-17T19:29:12Z

@JoshVanL I've been thinking about this more and while you're right, I actually don't think it matters in the sense that it doesn't make a difference.

Your proposal is to have certificates issued if:

Sentry is presented with a valid token signed by K8s (in Dapr 1.10 this will be a bound token, but still issue by K8s)
Sentry validates the pod the request comes in by querying the K8s APIs

In this case, the relevant thing is still the bound token which is stored in /var/run/secrets/dapr.io/sentrytoken (legacy path is /var/run/secrets/kubernetes.io/serviceaccount/token"). Using bindings.localstorage an application could read that (related issue) and then using bindings.http could invoke the K8s APIs and request a certificate.

If we use a separate JWS issued by the Dapr Injector, for Sentry to issue a token you will need:

The K8s bound token (just like in the previous case)
The JWS issued by the Injector

We could store the JWS in a path like /var/run/secrets/dapr.io/pod-identity and with dapr/components-contrib#2444 implemented, it should have the same security profile as the other proposal.

The benefits of using a separate, Dapr-issued JWS are:

It's faster as the verification happens locally and doesn't increase the load on the K8s API server
it's a bit more decoupled from K8s and will be easier in the future to support self-hosted scenarios (maybe - it still depends on the injector)

JoshVanL · 2023-01-18T15:34:08Z

We could store the JWS in a path like /var/run/secrets/dapr.io/pod-identity and with dapr/components-contrib#2444 implemented, it should have the same security profile as the other proposal.

The problem with this is that sentry or injector needs to create that volume. In vanilla Kubernetes, the only real way to do this is by creating ConfigMaps or Secrets. This is problematic because they are accessible through the API and would also mean granting sentry/injector the permissions to access those resources in all namespaces. Not to mention the overhead of ensuring the contents of those volumes is kept up to date and correct- forcing the writer to watch pods.. The creation of these resources is comparable to the GET Pods operation.

The other option is to create a dapr CSI driver which is OTT, even more resource intensive, and comes with its own security model to manage/maintain.

As an aside, it is quite problematic that we can set the App ID as a Pod annotation since annotations are mutable on Pods, meaning that a dapr sidecar can be using an identity document which does not match the k8s API assumption.

JoshVanL · 2023-01-18T18:05:56Z

Another thing of note implementing the above- sentry is also going to have to GET Configurations in order to determine the trust domain of the app.

ItalyPaleAle · 2023-01-19T20:08:39Z

it is quite problematic that we can set the App ID as a Pod annotation since annotations are mutable on Pods,

This seems another point in the case of using pre-signed token :)

The problem with this is that sentry or injector needs to create that volume.

True, then maybe setting an env var may be better. The env var secret store could be changed to disable access to that variable.

sentry is also going to have to GET Configurations in order to determine the trust domain of the app.

Is the trust domain of the app a new value that's being added to the Dapr Configuration CRD (specifically daprsystem)? If so, does it need to be in the Configuration CRD because you expect that to change frequently?

If you think this should not change frequently, I'd recommend passing it to Sentry as a CLI flag that is set via variables in the Helm chart.

JoshVanL · 2023-01-20T13:57:46Z

Is the trust domain of the app a new value that's being added to the Dapr Configuration CRD (specifically daprsystem)? If so, does it need to be in the Configuration CRD because you expect that to change frequently?

The trust domain already exists in the configuration API.

dapr-bot · 2023-07-25T02:52:20Z

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

JoshVanL · 2023-09-07T18:38:16Z

In v1.11 runtime requires that the certificate, private key, and trust anchor environment variable be available when running in Kubernetes mode and mTLS is enabled. Because of this, it is a requirement that the injector continue to patch a valid daprd certificate chain as environment variables on the Pod spec in order to maintain the n-1 version coexistence guarantee.

This unfortunately means that in v1.12, the injector will continue to inject a valid certificate key pair (including the private key) on the Pod spec. To improve security, rather than the injector patching in the intermediate certificate, it will instead request and inject the identity certificate of the Pod it is patching. In v1.13 we will be able to remove adding certificate environment variables completely, as planned.

To summarize the security posture:

v1.11

Any daprd, dapr enabled app, and any user/entity which is able to get pods is able to impersonate/sign for any dapr identity they want.
All control plane services can impersonate/sign for any dapr identity they want.

v1.12

daprd, dapr enabled apps and control plane services can only communicate with their given dapr identities
The injector service is able to request for any app identity
Any user/entity is able to impersonate any app which it has get pods permissions for

v1.13

Final state of this issue- no private key material is discoverable & no entity can be impersonated.

dapr-bot · 2023-11-06T20:14:29Z

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

mukundansundar · 2024-02-08T13:36:15Z

moving tracking issue to 1.14 for there are still open issues/PRs linked to it.

JoshVanL · 2024-04-16T16:11:41Z

Done!

JoshVanL mentioned this issue Jan 16, 2023

Security Overhaul: Seperate out ServiceAccounts #5767

Merged

artursouza added this to the v1.11 milestone Feb 17, 2023

This was referenced Feb 28, 2023

[Proposal] Automatic root certificate rotation #5958

Open

REQUEST: New membership for @joshvanl dapr/community#292

Closed

artursouza mentioned this issue Mar 7, 2023

v1.11 Release Planning #5966

Closed

pravinpushkar mentioned this issue Mar 14, 2023

[WIP] Increase retry time interval for service invocation #4542

Closed

7 tasks

artursouza assigned JoshVanL Mar 28, 2023

artursouza removed this from the v1.11 milestone May 26, 2023

JoshVanL mentioned this issue Jun 12, 2023

Support existing secrets for the dapr helm chart #6500

Closed

brunocascio mentioned this issue Jun 13, 2023

Helm chart includes sentry package even if it's disabled in the values.yml #6507

Closed

mukundansundar mentioned this issue Jun 27, 2023

v1.12 Release Planning #6508

Closed

dapr-bot added the stale Issues and PRs without response label Jul 25, 2023

dapr-bot removed the stale Issues and PRs without response label Jul 25, 2023

JoshVanL mentioned this issue Aug 3, 2023

[5756 Identity Management Overhaul] Injector & daprd #6761

Merged

JoshVanL mentioned this issue Aug 25, 2023

Encrypt and Authenticate Placement Raft Network #6835

Merged

JoshVanL mentioned this issue Sep 7, 2023

Allow the injector control plane service to request any identity from sentry #6902

Merged

JoshVanL mentioned this issue Oct 24, 2023

v1.13 Release Planning #7093

Closed

31 tasks

dapr-bot added the stale Issues and PRs without response label Nov 6, 2023

ItalyPaleAle added pinned and removed stale Issues and PRs without response labels Nov 6, 2023

JoshVanL mentioned this issue Nov 18, 2023

Operator: Adds authorization to the operator API #7111

Merged

JoshVanL added this to the v1.13 milestone Nov 27, 2023

JoshVanL added the P0 label Nov 27, 2023

mukundansundar added the tracking-issue label Feb 8, 2024

mukundansundar modified the milestones: v1.13, v1.14 Feb 8, 2024

JoshVanL closed this as completed Apr 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PROPOSAL] Identity Management Overhaul #5756

[PROPOSAL] Identity Management Overhaul #5756

JoshVanL commented Jan 13, 2023 •

edited

yaron2 commented Jan 13, 2023

JoshVanL commented Jan 13, 2023 •

edited

yaron2 commented Jan 13, 2023

ItalyPaleAle commented Jan 16, 2023

JoshVanL commented Jan 17, 2023

ItalyPaleAle commented Jan 17, 2023

JoshVanL commented Jan 17, 2023

ItalyPaleAle commented Jan 17, 2023

ItalyPaleAle commented Jan 17, 2023

JoshVanL commented Jan 18, 2023

JoshVanL commented Jan 18, 2023

ItalyPaleAle commented Jan 19, 2023

JoshVanL commented Jan 20, 2023

dapr-bot commented Jul 25, 2023

JoshVanL commented Sep 7, 2023

dapr-bot commented Nov 6, 2023

mukundansundar commented Feb 8, 2024

JoshVanL commented Apr 16, 2024

[PROPOSAL] Identity Management Overhaul #5756

[PROPOSAL] Identity Management Overhaul #5756

Comments

JoshVanL commented Jan 13, 2023 • edited

PRs

Motivation

Goals

Non-Goals

Current Shortfalls

1. Issuer Key Sharing

2. Broken CA Auth

3. Control Plane components uses same ServiceAccount

4. Injector uses separate trust anchor for MutatingWebhook

Solutions

a) Removing Issuer certificate key pair from workloads

b) App ID

c) Separate Service Acounts for control plane components.

d) Injector uses identity certificate for Mutating Webhook

yaron2 commented Jan 13, 2023

JoshVanL commented Jan 13, 2023 • edited

yaron2 commented Jan 13, 2023

ItalyPaleAle commented Jan 16, 2023

JoshVanL commented Jan 17, 2023

ItalyPaleAle commented Jan 17, 2023

JoshVanL commented Jan 17, 2023

ItalyPaleAle commented Jan 17, 2023

ItalyPaleAle commented Jan 17, 2023

JoshVanL commented Jan 18, 2023

JoshVanL commented Jan 18, 2023

ItalyPaleAle commented Jan 19, 2023

JoshVanL commented Jan 20, 2023

dapr-bot commented Jul 25, 2023

JoshVanL commented Sep 7, 2023

v1.11

v1.12

v1.13

dapr-bot commented Nov 6, 2023

mukundansundar commented Feb 8, 2024

JoshVanL commented Apr 16, 2024

JoshVanL commented Jan 13, 2023 •

edited

JoshVanL commented Jan 13, 2023 •

edited