[epic] Secure communication with x509 certificates #76

Closed
1 task
kvaps opened this issue Mar 23, 2024 · 27 comments

@kvaps
Member

kvaps commented Mar 23, 2024

Secure communication between etcd members and etcd clients with x509 certificates.

This is the last remaining task before we start using etcd-operator in our cozystack platform.

@kvaps kvaps changed the title Certificates Secure communication with x509 certificates Mar 23, 2024
@Kirill-Garbar Kirill-Garbar self-assigned this Mar 23, 2024
@kvaps
Member Author

kvaps commented Mar 26, 2024

@gecube could you please elaborate on this, and write user stories and user spec for certificates

@kvaps
Member Author

kvaps commented Mar 26, 2024

@Kirill-Garbar
Collaborator

How to create certificates infrastructure

Permalink to kamaji-etcd repo.

cfssl gencert -initca /csr/ca-csr.json | cfssljson -bare /certs/ca &&
mv /certs/ca.pem /certs/ca.crt && mv /certs/ca-key.pem /certs/ca.key &&
cfssl gencert -ca=/certs/ca.crt -ca-key=/certs/ca.key -config=/csr/config.json -profile=peer-authentication /csr/peer-csr.json | cfssljson -bare /certs/peer &&
cfssl gencert -ca=/certs/ca.crt -ca-key=/certs/ca.key -config=/csr/config.json -profile=peer-authentication /csr/server-csr.json | cfssljson -bare /certs/server &&
cfssl gencert -ca=/certs/ca.crt -ca-key=/certs/ca.key -config=/csr/config.json -profile=client-authentication /csr/root-client-csr.json | cfssljson -bare /certs/root-client

Generate CA

cfssl gencert -initca /csr/ca-csr.json | cfssljson -bare /certs/ca - initializes the CA key and certificate. We need to create the ca-csr configuration. Kamaji example:

{
  "CN": "Clastix CA",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "IT",
      "ST": "Italy",
      "L": "Milan"
    }
  ]
}

CSR for our etcd (if we need it with cert-manager):

{
  "CN": "AEnix etcd cluster <<cluster-name>> CA",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "?",
      "ST": "?",
      "L": "?"
    }
  ]
}

Important points:

  • Reissue CA certificate automatically.
  • Key size 4096 bits.
  • We can't rotate CA key. If CA key is rotated, then all client and peer certificates must be signed by new CA key at the same time => possible long downtime. Old certificates signed by old CA key will not work.
  • Different CAs for peer and client authentication. Some customers will need to issue their own certificates for clients applications and will not want to touch peer certificates.
  • We should not place ca.key in the etcd-cluster namespace or mount it into the etcd cluster (kamaji mounts it into the etcd cluster, defrag and backup jobs). ca.key is needed only for issuing peer and client certificates, so it can be placed in the etcd-operator namespace and must be protected.
  • Separate CA for every etcd cluster.
    • Why: Want to make etcd-clusters isolated. No need for different etcd clusters to have common CA.
    • Possible risk: Will applications be able to use different certificates to different etcd clusters if they need? Do libraries allow that?

Questions:

  • How to fill C, ST, L fields? Are they mandatory?

Generate certificate for peer authentication

Peer authentication certs authenticate etcd nodes to other etcd nodes and are also used as server certificates.

cfssl gencert -ca=/certs/ca.crt -ca-key=/certs/ca.key -config=/csr/config.json -profile=peer-authentication /csr/peer-csr.json | cfssljson -bare /certs/peer

config.json - configuration for certificate requests (1 year validity):

{
  "signing": {
    "default": {
      "expiry": "8760h"
    },
    "profiles": {
      "server-authentication": {
        "usages": ["signing", "key encipherment", "server auth"],
        "expiry": "8760h"
      },
      "client-authentication": {
        "usages": ["signing", "key encipherment", "client auth"],
        "expiry": "8760h"
      },
      "peer-authentication": {
        "usages": ["signing", "key encipherment", "server auth", "client auth"],
        "expiry": "8760h"
      }
    }
  }
}

peer-csr.json - certificate request config. We need to create it. Kamaji example:

{
  "CN": "etcd",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "hosts": [
    "test-test-etcd-0",
    "test-test-etcd-0.test-test-etcd",
    "test-test-etcd-0.test-test-etcd.etcd-operator-system.svc",
    "test-test-etcd-0.test-test-etcd.etcd-operator-system.svc.cluster.local",
    "test-test-etcd-1",
    "test-test-etcd-1.test-test-etcd",
    "test-test-etcd-1.test-test-etcd.etcd-operator-system.svc",
    "test-test-etcd-1.test-test-etcd.etcd-operator-system.svc.cluster.local",
    "test-test-etcd-2",
    "test-test-etcd-2.test-test-etcd",
    "test-test-etcd-2.test-test-etcd.etcd-operator-system.svc",
    "test-test-etcd-2.test-test-etcd.etcd-operator-system.svc.cluster.local",
    "127.0.0.1"
  ]
}

  • test-test-etcd - helm release name
  • etcd-operator-system - namespace name
  • 3 replicas

Hostnames helm template:

{{- range $count := until (int $.Values.replicas) -}}
    {{ printf "\"%s-%d\"," ( include "etcd.stsName" $outer ) $count }}
    {{ printf "\"%s-%d.%s\"," ( include "etcd.stsName" $outer ) $count (include "etcd.serviceName" $outer) }}
    {{ printf "\"%s-%d.%s.%s.svc\"," ( include "etcd.stsName" $outer ) $count (include "etcd.serviceName" $outer) $.Release.Namespace }}
    {{ printf "\"%s-%d.%s.%s.svc.cluster.local\"," ( include "etcd.stsName" $outer ) $count (include "etcd.serviceName" $outer) $.Release.Namespace }}
{{- end }}
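
For readers who do not use Helm, the same per-replica hostname expansion can be sketched in Python. This is only an illustration: peer_sans and its cluster_domain parameter are hypothetical names, and appending 127.0.0.1 follows the Kamaji peer-csr.json example rather than the template itself.

```python
def peer_sans(sts_name: str, service_name: str, namespace: str, replicas: int,
              cluster_domain: str = "cluster.local") -> list[str]:
    """Return the SAN list for a peer certificate, four names per replica."""
    sans = []
    for i in range(replicas):
        pod = f"{sts_name}-{i}"
        sans += [
            pod,
            f"{pod}.{service_name}",
            f"{pod}.{service_name}.{namespace}.svc",
            f"{pod}.{service_name}.{namespace}.svc.{cluster_domain}",
        ]
    # Loopback entry, as in the Kamaji example above (not produced by the template).
    sans.append("127.0.0.1")
    return sans


# Same inputs as the example: release test-test-etcd, 3 replicas.
hosts = peer_sans("test-test-etcd", "test-test-etcd", "etcd-operator-system", 3)
```

With three replicas this yields twelve DNS names plus the loopback address, matching the hosts list in peer-csr.json.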

Important points:

  • Rotate private keys. They are not rotated by cert-manager by default. If we enable this feature, private key and certificate will be written at the same time => no downtime is expected. See cert-manager docs.

Questions:

  • Key length? It is not recommended to use 2048, but 4096 can bring higher CPU usage. 3072? Cert-manager does not recreate private key if key length setting is changed. It was true before - most probably now length can be changed on-the-fly.
  • Expiration period? Hardcoded 1 year for the first implementation (30 days reissue before expiration).
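
To make the "1 year, reissue 30 days before expiration" arithmetic concrete, here is a minimal sketch. parse_hours and renewal_time are hypothetical helpers, and only hours-only strings such as 8760h are handled; this is not how cert-manager or the operator computes it.

```python
from datetime import datetime, timedelta, timezone


def parse_hours(value: str) -> timedelta:
    """Parse an hours-only duration such as "8760h" (other units are not handled)."""
    if not value.endswith("h"):
        raise ValueError(f"expected an hours-only duration, got {value!r}")
    return timedelta(hours=int(value[:-1]))


def renewal_time(not_before: datetime, duration: str, renew_before: str) -> datetime:
    """Return the moment at which the certificate should be reissued."""
    not_after = not_before + parse_hours(duration)
    return not_after - parse_hours(renew_before)


issued = datetime(2024, 4, 1, tzinfo=timezone.utc)
# 8760h is 365 days; 720h is 30 days.
renew_at = renewal_time(issued, "8760h", "720h")
```

For a certificate issued on 2024-04-01 this gives a reissue date of 2025-03-02.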

Generate server certificate for client trust

cfssl gencert -ca=/certs/ca.crt -ca-key=/certs/ca.key -config=/csr/config.json -profile=peer-authentication /csr/server-csr.json | cfssljson -bare /certs/server

server-csr.json - certificate request config. We need to create it. Kamaji example:

{
 "CN": "etcd",
 "key": {
   "algo": "rsa",
   "size": 4096
 },
 "hosts": [
   "test-test-etcd-0.test-test-etcd.etcd-operator-system.svc.cluster.local",
   "test-test-etcd-1.test-test-etcd.etcd-operator-system.svc.cluster.local",
   "test-test-etcd-2.test-test-etcd.etcd-operator-system.svc.cluster.local",
   "etcd-server.etcd-operator-system.svc.cluster.local",
   "etcd-server.etcd-operator-system.svc",
   "etcd-server",
   "127.0.0.1"
 ]
}

Hostnames helm template:

{{- range $count := until (int $.Values.replicas) -}}
    {{ printf "\"%s-%d.%s.%s.svc.cluster.local\"," ( include "etcd.fullname" $outer ) $count (include "etcd.serviceName" $outer) $.Release.Namespace }}
{{- end }}

Important points:

  • Rotate private keys. They are not rotated by cert-manager by default. If we enable this feature, private key and certificate will be written at the same time => no downtime is expected. See cert-manager docs.

Questions:

  • Key length? It is not recommended to use 2048, but 4096 can bring higher CPU usage. 3072? Cert-manager does not recreate private key if key length setting is changed. It was true before - most probably now length can be changed on-the-fly.
  • Expiration period? Hardcoded 1 year for the first implementation (30 days reissue before expiration).

Generate root client certificate

cfssl gencert -ca=/certs/ca.crt -ca-key=/certs/ca.key -config=/csr/config.json -profile=client-authentication /csr/root-client-csr.json | cfssljson -bare /certs/root-client

root-client-csr.json request config. We need to create it. Kamaji example:

{
  "CN": "root",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "O": "system:masters"
    }
  ]
}

Important points:

  • Root certificate will be used only by operator for maintenance tasks. Certificate for Kubernetes is generated separately.
  • Rotate private keys. They are not rotated by cert-manager by default. If we enable this feature, private key and certificate will be written at the same time => no downtime is expected. See cert-manager docs.
  • Root client certificate can be used for kubernetes api server application during the beta test period.

Questions:

  • Key length? It is not recommended to use 2048, but 4096 can bring higher CPU usage. 3072? Cert-manager does not recreate private key if key length setting is changed. It was true before - most probably now length can be changed on-the-fly.
  • Expiration period? Hardcoded 1 year for the first implementation (30 days reissue before expiration).

Etcd reference for v3.5

Which certificates and private keys go with which configuration keys (see the kamaji-etcd chart):

Client-to-server communication:

--cert-file=<path>: Certificate used for SSL/TLS connections to etcd. When this option is set, advertise-client-urls can use the HTTPS schema.

--key-file=<path>: Key for the certificate. Must be unencrypted.

--client-cert-auth: When this is set etcd will check all incoming HTTPS requests for a client certificate signed by the trusted CA, requests that don’t supply a valid client certificate will fail. If authentication is enabled, the certificate provides credentials for the user name given by the Common Name field.

--trusted-ca-file=<path>: Trusted certificate authority.

--auto-tls: Use automatically generated self-signed certificates for TLS connections with clients.

Peer (server-to-server / cluster) communication:

The peer options work the same way as the client-to-server options:

--peer-cert-file=<path>: Certificate used for SSL/TLS connections between peers. This will be used both for listening on the peer address as well as sending requests to other peers.

--peer-key-file=<path>: Key for the certificate. Must be unencrypted.

--peer-client-cert-auth: When set, etcd will check all incoming peer requests from the cluster for valid client certificates signed by the supplied CA.

--peer-trusted-ca-file=<path>: Trusted certificate authority.

--peer-auto-tls: Use automatically generated self-signed certificates for TLS connections between peers.

If either a client-to-server or peer certificate is supplied the key must also be set. All of these configuration options are also available through the environment variables, ETCD_CA_FILE, ETCD_PEER_CA_FILE and so on.
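
As a small illustration of the flag-to-environment-variable mapping, the sketch below assumes etcd's documented convention of an ETCD_ prefix followed by the flag name in upper snake case. flag_to_env is a hypothetical helper, not part of etcd.

```python
def flag_to_env(flag: str) -> str:
    """Convert an etcd flag such as "--peer-cert-file" to its ETCD_* variable name."""
    return "ETCD_" + flag.lstrip("-").replace("-", "_").upper()


tls_flags = [
    "--cert-file", "--key-file", "--client-cert-auth", "--trusted-ca-file",
    "--peer-cert-file", "--peer-key-file", "--peer-client-cert-auth",
    "--peer-trusted-ca-file",
]
env_names = {flag: flag_to_env(flag) for flag in tls_flags}
```

Note that for --trusted-ca-file this yields ETCD_TRUSTED_CA_FILE rather than the ETCD_CA_FILE spelling quoted above, so the names should be checked against the etcd version in use.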

Common options:

--cipher-suites: Comma-separated list of supported TLS cipher suites between server/client and peers (empty will be auto-populated by Go).

--tls-min-version=<version>: Sets the minimum TLS version supported by etcd.

--tls-max-version=<version>: Sets the maximum TLS version supported by etcd. If not set the maximum version supported by Go will be used.

@kvaps kvaps added this to the v0.0.3 milestone Mar 31, 2024
@kvaps kvaps mentioned this issue Mar 31, 2024
@Kirill-Garbar
Collaborator

Kirill-Garbar commented Apr 2, 2024

For now it was decided to create secret reference sections in the etcdCluster spec and expect secrets with certificates from the user of etcd-operator.

Basic spec from @lllamnyp (TBD)

spec:
  security:
    serverTLSSecretRef: # secretRef
      name: server-tls-secret
    clientCertAuth: true # bool
    trustedCAFile: # secretRef
      name: trusted-tls-secret
    peerTLSSecretRef: # secretRef
      name: peer-tls-secret
    peerClientCertAuth: true # bool
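
A rough sketch of how such a spec could translate into etcd flags is shown below. Everything about the mapping is an assumption for illustration: etcd_tls_args is a hypothetical function, and the mount paths and file names under /etc/etcd/pki are made up; only the flag names come from the etcd reference above.

```python
def etcd_tls_args(security: dict, mount_root: str = "/etc/etcd/pki") -> list[str]:
    """Build etcd TLS flags from a spec.security mapping (paths are hypothetical)."""
    args = []
    if "serverTLSSecretRef" in security:
        args += [f"--cert-file={mount_root}/server/tls.crt",
                 f"--key-file={mount_root}/server/tls.key"]
    if "trustedCAFile" in security:
        args.append(f"--trusted-ca-file={mount_root}/ca/ca.crt")
    if security.get("clientCertAuth"):
        args.append("--client-cert-auth")
    if "peerTLSSecretRef" in security:
        args += [f"--peer-cert-file={mount_root}/peer/tls.crt",
                 f"--peer-key-file={mount_root}/peer/tls.key"]
    if security.get("peerClientCertAuth"):
        args.append("--peer-client-cert-auth")
    return args


example = {
    "serverTLSSecretRef": {"name": "server-tls-secret"},
    "clientCertAuth": True,
    "trustedCAFile": {"name": "trusted-tls-secret"},
    "peerTLSSecretRef": {"name": "peer-tls-secret"},
    "peerClientCertAuth": True,
}
args = etcd_tls_args(example)
```

An empty security mapping produces no TLS flags at all, which corresponds to running etcd without TLS.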

FYI @kvaps

@kvaps kvaps changed the title Secure communication with x509 certificates [epic] Secure communication with x509 certificates Apr 3, 2024
@sergeyshevch sergeyshevch pinned this issue Apr 4, 2024
@Kirill-Garbar
Collaborator

Kirill-Garbar commented Apr 6, 2024

Peer.ca

| Option | Description |
|---|---|
| secretName | Secret name of user-provided secret. If not specified then operator generates certificate by the spec below |
| metadata | Metadata of generated secret. |
| duration | Expiration time of generated secret. |
| renewBefore | Time period before expiration time when certificate will be reissued. |
| privateKey | Private key configuration: algorithm and key size. |

Peer.cert

| Option | Description |
|---|---|
| secretName | Secret name of user-provided secret. If not specified then operator generates certificate by the spec below. If peer.ca.secretName is provided, then this certificate is generated from the CA that was provided by the user. You can't define the secret name in this section without also defining peer.ca.secretName. |
| metadata | Metadata of generated secret. |
| duration | Expiration time of generated secret. |
| renewBefore | Time period before expiration time when certificate will be reissued. |
| privateKey | Private key configuration: algorithm, key size, and a boolean that controls whether the private key is rotated when the certificate expires. |

ClientServer section has the same fields as peer section.

Rbac

| Option | Description |
|---|---|
| enabled | Enables role-based access control: creates root user in etcd, gives him root role and enables authentication in etcd. |

spec:
  security:
    peer:
      enabled: true # optional
      ca:
        # if not defined, then operator generates CA by the spec below
        secretName: ext-peer-ca-tls-secret # oneof secretName or secretTemplate
        secretTemplate: # oneof secretName or secretTemplate
          annotations: {} # optional
          labels: {} # optional
        duration: 86400h # optional
        renewBefore: 720h # optional
        privateKey:
          algorithm: RSA # optional
          size: 4096 # optional
      cert:
        secretName: ext-peer-tls-secret
        secretTemplate:
          annotations: {}
          labels: {}
        duration: 720h
        renewBefore: 180h
        privateKey:
          rotate: true # optional
          algorithm: RSA
          size: 4096
    clientServer:
      enabled: true
      ca:
        secretName: ext-server-ca-tls-secret
        secretTemplate:
          annotations: {}
          labels: {}
        duration: 86400h
        renewBefore: 720h
        privateKey:
          algorithm: RSA
          size: 4096
      serverCert:
        secretName: ext-server-tls-secret
        secretTemplate:
          annotations: {}
          labels: {}
        extraSANs: []
        duration: 720h
        renewBefore: 180h
        privateKey:
          rotate: true
          algorithm: RSA
          size: 4096
      rootClientCert:
        secretName: ext-client-tls-secret
        secretTemplate:
          annotations: {}
          labels: {}
        duration: 720h
        renewBefore: 180h
        privateKey:
          rotate: true
          algorithm: RSA
          size: 4096
      auth:
        enabled: true # optional

Important points:

  • If an optional field has a value in the spec above, then that value is the default.
  • peer:
    • If ca.secretName is not defined, operator generates its own CA.
    • If ca.secretName is defined, then none of the fields below secretName should be defined.
    • If cert.secretName is not defined, then the certificate is generated by the operator from the CA defined in the section above (user-managed or operator-managed).
    • User must define ca.secretName if cert.secretName is defined.
    • Algorithm takes one value from a fixed list. NOTE: look into the lib that generates certs (or into cert-manager) to see what values exist.
  • clientServer:
    • See peer logic.
    • RootClientCert uses server ca and has the same logic as server.cert.
    • Rbac.enabled enables role-based access control: creates root user in etcd, gives him root role and enables authentication in etcd.
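
The peer rules above could be checked along these lines. This is a sketch under assumptions: validate_tls_section and GENERATION_FIELDS are hypothetical names, and the section is modelled as a plain dict with "ca" and "cert" keys rather than the operator's real types.

```python
# Fields that only make sense when the operator generates the CA itself.
GENERATION_FIELDS = ("secretTemplate", "duration", "renewBefore", "privateKey")


def validate_tls_section(section: dict) -> list[str]:
    """Return a list of rule violations for a section with "ca" and "cert" keys."""
    errors = []
    ca = section.get("ca", {})
    cert = section.get("cert", {})
    # Rule: a user-provided CA excludes all generation settings.
    if "secretName" in ca:
        extra = [field for field in GENERATION_FIELDS if field in ca]
        if extra:
            errors.append(f"ca.secretName excludes: {', '.join(extra)}")
    # Rule: a user-provided certificate requires a user-provided CA.
    if "secretName" in cert and "secretName" not in ca:
        errors.append("cert.secretName requires ca.secretName")
    return errors


ok = validate_tls_section({"ca": {"secretName": "ext-peer-ca-tls-secret"},
                           "cert": {"secretName": "ext-peer-tls-secret"}})
bad = validate_tls_section({"cert": {"secretName": "ext-peer-tls-secret"}})
```

An empty section is valid under these rules, since the operator then generates both the CA and the certificate.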

@kvaps
Member Author

kvaps commented Apr 6, 2024

I would suggest reusing the logic of the resource templates we introduced before, e.g. for every resource-specific configuration:

secretTemplate:
  metadata:
    name: ""
    annotations: {}
    labels: {}

Instead of this:

secretName: ""
metadata:
  annotations: {}
  labels: {}

@kvaps
Member Author

kvaps commented Apr 6, 2024

Also I'd like to ask if we really need to mount CA secret to the etcd containers? - or is it just to manage it?

@gecube
Collaborator

gecube commented Apr 6, 2024

@kvaps We discussed it with @Kirill-Garbar and there is no obvious reason to do it. As ca.crt is already present in peer and server certificate secrets. And its authenticity is guaranteed by cert-manager

@Kirill-Garbar
Collaborator

Of course I do not want to mount the CA secret to the etcd container - it is not necessary.

@kvaps
Member Author

kvaps commented Apr 6, 2024

I don't like spec.security.clientServer, I think spec.security.client would be more correct. As there are always certs for peer and client communication

@Kirill-Garbar
Collaborator

Client-to-server communication is written in etcd security documents: https://etcd.io/docs/v3.5/op-guide/security/
Took it from there.

In this section root client and etcd server certificates are defined.

@kvaps
Member Author

kvaps commented Apr 6, 2024

@kvaps We discussed it with @Kirill-Garbar and there is no obvious reason to do it. As ca.crt is already present in peer and server certificate secrets. And its authenticity is guaranteed by cert-manager

Yeah, and I like this. However despite the fact it is guaranteed by cert-manager, it is not guaranteed by kubernetes.io/tls secret type, just be aware and keep this in mind.

@Kirill-Garbar
Collaborator

Kirill-Garbar commented Apr 6, 2024

I do not want to make CA and CERTificate one secret despite the fact that certificate secret has CA.crt in one of the sections. It is not secure - if certificate secret is hijacked, then BAD ACTOR can place its own CA and etcd will automatically trust it.

But I agree that in operator functionality CA secret (that is mounted to etcd) should not include ca.key.

@kvaps
Member Author

kvaps commented Apr 6, 2024

What is the difference between clientServer.rootClientCert and clientServer.cert?

Also option clientServer.rbac=true is not obvious to me. From my point of view there should be just one client certificate with root access to the cluster.

If you want to manage additional users and their access to the cluster, it must be implemented later using CRs. IMHO

@gecube
Collaborator

gecube commented Apr 6, 2024

if certificate secret is hijacked, then BAD ACTOR can place its own CA and etcd will automatically trust it.

I don't agree as if hi-jacker has the access to the secrets with the certificates and private keys, the game is over already.

@Kirill-Garbar
Collaborator

Kirill-Garbar commented Apr 6, 2024

ClientServer cert is server certificate.
RootClientCert is certificate for the root user. Even if RBAC is not enabled, user needs client certificate.

I agree with the point about the CRs for users. But the root cert is necessary to manage the etcd cluster from the operator

@gecube
Collaborator

gecube commented Apr 6, 2024

Also option clientServer.rbac=true is not obvious to me. From my point of view there should be just one client certificate with root access to the cluster.

I may guess that it is for different use-cases, like patroni etc.

@Kirill-Garbar
Collaborator

if certificate secret is hijacked, then BAD ACTOR can place its own CA and etcd will automatically trust it.

I don't agree as if hi-jacker has the access to the secrets with the certificates and private keys, the game is over already.

You can't remove one security layer because you have other security layers. Security layers are independent. It is not an argument.

@kvaps
Member Author

kvaps commented Apr 6, 2024

I do not want to make CA and CERTificate one secret despite the fact that certificate secret has CA.crt in one of the sections. It is not secure - if certificate secret is hijacked, then BAD ACTOR can place its own CA and etcd will automatically trust it.

It's okay, client certificates usually contain the CA certificate but do not contain the CA secret. Just keep in mind that this field is introduced by cert-manager and is not standardized in the kubernetes.io/tls secret type.

So it is widely used, but not standardized in Kubernetes yet.
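
Since ca.crt is a convention rather than part of the kubernetes.io/tls type (which only requires tls.crt and tls.key), an operator relying on it would probably want an explicit check. A minimal sketch, with check_tls_secret as a hypothetical helper operating on a Secret's data map:

```python
REQUIRED_TLS_KEYS = {"tls.crt", "tls.key"}  # required by the kubernetes.io/tls type


def check_tls_secret(data: dict, need_ca: bool = True) -> list[str]:
    """Return the sorted list of keys missing from a Secret's data map."""
    wanted = set(REQUIRED_TLS_KEYS)
    if need_ca:
        # ca.crt is written by cert-manager; the secret type itself does not require it.
        wanted.add("ca.crt")
    return sorted(wanted - set(data))


user_secret = {"tls.crt": "<pem>", "tls.key": "<pem>"}
missing = check_tls_secret(user_secret)
```

Here a user-provided secret that is perfectly valid for Kubernetes still reports ca.crt as missing, which is the case to keep in mind.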

@gecube
Collaborator

gecube commented Apr 6, 2024

Just keep in mind that it is introduced by cert-manager and not required by kubernetes.io/tls secret type.

agree, it's a good remark, but it could be a part of our "API" or "convention"

@kvaps
Member Author

kvaps commented Apr 6, 2024

ClientServer cert is server certificate.
RootClientCert is certificate for the root user. Even if RBAC is not enabled, user needs client certificate.

Ah got it! I would like to move this into the same level then, eg:

security:
  peerCertificate: {}
  peerTrustedCACertficate: {}
  clientCertificate: {}
  serverCertificate: {}
  trustedCACertificate: {}

This would bring it closer to the etcd configuration keys.

@Kirill-Garbar
Collaborator

Kirill-Garbar commented Apr 6, 2024

Ah got it! I would like to move this into the same level then, eg:

security:
  peerCertificate: {}
  peerTrustedCACertficate: {}
  clientCertificate: {}
  serverCertificate: {}
  trustedCACertificate: {}

I personally prefer nested structures, since a hierarchy usually shows relations and is usually easier to extend. But I do not have an argument against "closer to etcd keys". Being semantically closer to etcd is important.

I prepared structure with your proposal.
2 questions:

  • First version of the structure has enablers for peer and client-server communication. Not clear where to put them to enable separately. Or we make decision that we have one enabler under the security section that enables everything.
  • Where to put RBAC enabler?
spec:
  security:
    clientTLS: false # Disables client-server tls communication
    rbac: false # Disables etcd role-based access control
    peerCertificate:
      secretName: ext-peer-tls-secret # oneof secretName or secretTemplate
      secretTemplate: # oneof secretName or secretTemplate
        annotations: {}
        labels: {}
      duration: 720h
      renewBefore: 180h
      privateKey:
        rotate: true # optional
        algorithm: RSA
        size: 4096
    peerTrustedCACertficate:
      # if not defined, then operator generates CA by the spec below
      secretName: ext-peer-ca-tls-secret
      secretTemplate:
        annotations: {}
        labels: {}
      duration: 86400h # optional
      renewBefore: 720h # optional
      privateKey:
        algorithm: RSA # optional
        size: 4096 # optional
    serverCertificate:
      secretName: ext-server-tls-secret
      secretTemplate:
        annotations: {}
        labels: {}
      duration: 720h
      renewBefore: 180h
      privateKey:
        rotate: true
        algorithm: RSA
        size: 4096
    trustedCACertificate:
      secretName: ext-server-ca-tls-secret
      secretTemplate:
        annotations: {}
        labels: {}
      duration: 86400h
      renewBefore: 720h
      privateKey:
        algorithm: RSA
        size: 4096
    clientCertificate:
      secretName: ext-client-tls-secret
      secretTemplate:
        annotations: {}
        labels: {}
      duration: 720h
      renewBefore: 180h
      privateKey:
        rotate: true
        algorithm: RSA
        size: 4096

@kvaps
Member Author

kvaps commented Apr 6, 2024

Cool! Let's now add there

secretTemplate: {}

If it's specified, it means our operator manages secrets

Similarly to volumeClaimTemplate in STS or podDisruptionBudgetTemplate logic we have

@Kirill-Garbar
Collaborator

Kirill-Garbar commented Apr 6, 2024

If it's specified, it means our operator manages secrets

I would say that operator should manage secrets by default without any additional configurations. So I would base this decision (manage by operator or attach external ones) on another field: secretName or we can call it externalSecretName. If it is defined, then do not generate own certs.

P.S. Changed both specs by your request: first nested version and flat map.

@kvaps
Member Author

kvaps commented Apr 7, 2024

I would suggest:

  1. Make secretTemplate and secretName mutually exclusive. It's not allowed to have both options defined simultaneously.
  2. If secretName is specified, then use the external secret with that name.
  3. If secretName is not specified, then use automatic certificate generation from secretTemplate.
  4. Let's remove .metadata.name from secretTemplate. I don't see any cases where the user would need to override the name of the secrets.

In this way, to enable automatic certificate management the user would only need to add:

---
apiVersion: etcd.aenix.io/v1alpha1
kind: EtcdCluster
metadata:
  name: test
  namespace: default
spec:
  replicas: 3
  security: {}
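
The four rules, together with the empty security: {} default, could be resolved roughly as follows. This is a sketch, not the operator's code: resolve_secret is a hypothetical function, and default_name stands for whatever name the operator would derive for a generated secret.

```python
def resolve_secret(cfg: dict, default_name: str) -> tuple[str, bool]:
    """Return (secret name, managed by operator) for one certificate entry."""
    name = cfg.get("secretName")
    template = cfg.get("secretTemplate")
    # Rule 1: the two options are mutually exclusive.
    if name and template is not None:
        raise ValueError("secretName and secretTemplate are mutually exclusive")
    # Rule 2: a non-empty secretName points at an external, user-managed secret.
    if name:
        return name, False
    # Rules 3 and 4: otherwise the operator generates the secret under its own name.
    return default_name, True


managed = resolve_secret({}, "test-peer-tls")
external = resolve_secret({"secretName": "ext-peer-tls-secret"}, "test-peer-tls")
```

With an empty entry the operator-managed path is taken, so security: {} is enough to get generated certificates.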

@kvaps
Member Author

kvaps commented Apr 8, 2024

Another option is that we can introduce tls option and place all the certs under it:

security:
  clientAuth: true
  clientTLS: true
  tls:
    peer:
      secretName: ""
      secretTemplate: {}
    peerCA:
      secretName: ""
      secretTemplate: {}
    ca:
      secretName: ""
      secretTemplate: {}
    client:
      secretName: ""
      secretTemplate: {}
    server:
      secretName: ""
      secretTemplate: {}
      extraSANs: []

@kvaps
Member Author

kvaps commented Apr 8, 2024

All right I agree with the variant from #76 (comment)
