Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide deterministic bundle #310

Closed
Jiawei0227 opened this issue Feb 28, 2024 · 12 comments · Fixed by #380
Closed

Provide deterministic bundle #310

Jiawei0227 opened this issue Feb 28, 2024 · 12 comments · Fixed by #380

Comments

@Jiawei0227
Copy link
Contributor

The generated trust bundle from trust-manager should provide deterministic bundle in the sense that if the source of the CA order changes, or there are multiple CA sources from multiple secret and they are shuffled in the spec. The generated final trust bundle should always be the same with the same ordering. Mayby by alphabetic order or expiration time order or something.

@Jiawei0227
Copy link
Contributor Author

Cross link to: #303 (comment)

This means if a Bundle A looks like:

apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: trust-store
spec:
  sources:
  - secret:
      key: ca.crt
      name ca1
  - secret:
      key: ca.crt
      name: ca2
  target:
    configMap:
      key: ca.crt

and bundle B looks like this

apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: trust-store
spec:
  sources:
  - secret:
      key: ca.crt
      name ca2
  - secret:
      key: ca.crt
      name: ca1
  target:
    configMap:
      key: ca.crt

it should always produce the same content. Also if we move one CA from ca1 -> ca2, the result should be the same.

@Jiawei0227
Copy link
Contributor Author

@SgtCoDFish thoughts on potentially get this done? It sounds to me we can just do alphabetic order during producing final bundle which should be good enough.

@SgtCoDFish
Copy link
Member

I don't currently have the bandwidth to implement this, but I'd be happy to review a PR which does it! My 2c would be to hash the DER-encoded certs and then order them alphanumerically based on the hex-encoded hash

@Jiawei0227
Copy link
Contributor Author

not sure if anyone would be interested to pick it up but this will be a critical feature. Reason is our component is mounting the trust bundle configmap and the other automation is reconciling the bundle. But if the bundle data keep reordering it will be very expensive and unnecessary.

@erikgb
Copy link
Contributor

erikgb commented Mar 1, 2024

But the order is consistent now, and that's good/required. Are you planning to shuffle the sources around @Jiawei0227? I don't say this shouldn't be fixed, but I don't consider it critical. 😸

@sebEg
Copy link

sebEg commented Jul 9, 2024

Hi, we have a Bundle which includes six configMaps using a label selector:

apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: cluster-default-ca
spec:
  sources:
  - useDefaultCAs: true
  - configMap:
      selector:
        matchLabels: 
          cluster-default-ca: "true"
      key: "root-ca.pem"
  target:
    configMap:
      key: "root-certs.pem"
    additionalFormats:
      jks:
        key: "root-certs.jks"
      pkcs12:
        key: "root-certs.p12"

We also observe that the order of certificates changes with every reconciliation, leading to many unnecessary updates of the generated configmaps. While the output for root-certs.jks seems to be consistent, root-certs.pem and root-certs.p12 change with almost every reconciliaton.

@jan-kantert
Copy link

jan-kantert commented Jul 9, 2024

@erikgb This caused three near misses for us. In two cases it caused etcd to run out of space within a short timespan (A). In two cases (one case had both) it caused etcd to use so much memory that out masters went OOM (B).

We use trust-manager to inject three configmaps (full CA certs) into each namespace. This happened in fairly small clusters (<30 namespaces). You can reproduce it by simply restarting trust-manager a few times. It will recreate those configmaps, etcd will grow by 1-2 GB and memory on the master will rise by about 3-6 GB.

Personally, I would consider this a critical issue. Currently, we have to massively overprovision our masters. On Azure this prevents you from using the smallest Kubernetes tier at all as it will kill the API. On AWS we had to roughly double our costs.

@erikgb
Copy link
Contributor

erikgb commented Jul 9, 2024

I agree this is a serious issue, but probably relatively easy to fix. Any watchers that would like to try a PR to fix this?

@jan-kantert
Copy link

We investigated this a bit more. It turns out that our issue is caused by non-deterministic ordering in label selectors. This is how our bundle looks like:

apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: xxx-default-ca
  namespace: my-ns
spec:
  sources:
  - useDefaultCAs: true
  - configMap:
      selector:
        matchLabels: 
          certs.infrastructure.mydomain.com/inject-default-ca: "true"
      key: "root-ca.pem"
  target:
    configMap:
      key: "root-certs.pem"
    additionalFormats:
      jks:
        key: "root-certs.jks"
      pkcs12:
        key: "root-certs.p12"

If we replace our selector with a static list this becomes less of an issue. With the selector the content seems to change every time.

I agree this is a serious issue, but probably relatively easy to fix. Any watchers that would like to try a PR to fix this?

I will have a look if I can reproduce this in a test.

@arsenalzp
Copy link
Contributor

What caused that issue?

@maelvls
Copy link
Member

maelvls commented Jul 12, 2024

I see two different issues here:

  • On one hand, @Jiawei0227 was requesting a feature where reordering the sources array in the Bundle spec would not result in an update to the configmaps.
  • On the other hand, @jan-kantert and @sebEg found a serious bug that forces the target configmaps to be updated for no reason and seems to only occur when label selectors are used. This bug still needs a minimal working example.

I suggest that we don't conflate the feature request with the bug. I propose that the current issue (#310) keeps track of @Jiawei0227's feature request. @jan-kantert @sebEg can you create a separate issue for the bug you found?

@jabdoa2
Copy link
Contributor

jabdoa2 commented Jul 12, 2024

I fixed both cases in in #380. Both issues have the same underlying cause. The second case just triggers the issue far more frequently since kubernetes will not order configmaps when loaded via a lebel selector. We could also sort those configmaps. That would fix the second issue (unless you rename configmaps). However, just ordering the certs will fix both issues at once. I can also add a test for the second case but it will be fixed as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants