Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Service mesh: add mTLS auth method #24263

Merged
merged 3 commits into from Mar 20, 2023
Merged

Conversation

meyskens
Copy link
Member

@meyskens meyskens commented Mar 9, 2023

This adds an mTLS auth handler to the Serice Mesh auth package.
It will listen on a given port and does a mutual TLS handshake with
SPIFFE IDs it received. This will assure the both sides got the needed
certificates.

In order to integrate with the datapath tables it also improves the SPIFFE
interface to use the Cilium Numeric Identities. And convert them from and
to valid SNI fields. As well as implement code to validate the URI SANS
inside the certificates.

This can be enabled in the in the Helm chart.

Fixes: #23807

Add mtls-spiffe as auth mode in the CiliumNetworkPolicy

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 9, 2023
@meyskens meyskens force-pushed the mtls-spiffe branch 2 times, most recently from aecb3fb to bde87f1 Compare March 9, 2023 13:58
@meyskens
Copy link
Member Author

meyskens commented Mar 9, 2023

How to test this (I know you cannot wait ;) ):

Enable it via Helm:

auth:
  mTLS:
    enabled: true

Install SPIRE: https://github.com/meyskens/cilium-spiffe-poc/tree/meyskens/cilium-mtls (make install)

Deploy a policy to use auth, I used the connectivity test pods for this

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: auth-egress
  namespace: cilium-test
spec:
  endpointSelector:
    matchLabels:
      kind: client
  egress:
    - toEndpoints:
        - matchLabels:
            kind: echo
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
      auth:
        type: "mtls-spiffe"
    - toEndpoints:
      - matchLabels:
          "k8s:io.kubernetes.pod.namespace": "kube-system"
          "k8s:k8s-app": "kube-dns"
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"

Then set up a connection between two pods and scan the logs of the Cilium agent for logs like

level=debug msg="auth: egress policy is requiring authentication type mtls-spiffe for identity 30086->41044, endpoint 10.244.1.148->10.244.0.130" subsys=auth

Create a SPIFFE ID for the identities:

kubectl exec -n spire spire-server-0 -- \
		/opt/spire/bin/spire-server entry create \
		-spiffeID spiffe://spiffe.cilium.io/cilium-id/ENTER-ID-HERE \
		-parentID spiffe://spiffe.cilium.io/dclient \
		-selector cilium:mtls

(this will not be needed once #23802 is built)

Try the connection again and enjoy a brand new mTLS experience <3

@meyskens meyskens added kind/feature This introduces new functionality. release-note/major This PR introduces major new functionality to Cilium. area/servicemesh GH issues or PRs regarding servicemesh labels Mar 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 9, 2023
@meyskens meyskens marked this pull request as ready for review March 9, 2023 14:05
@meyskens meyskens requested review from a team as code owners March 9, 2023 14:05
@meyskens meyskens force-pushed the mtls-spiffe branch 5 times, most recently from bd51500 to 57d3468 Compare March 9, 2023 15:35
@meyskens meyskens requested a review from a team as a code owner March 9, 2023 15:35
@meyskens meyskens requested a review from qmonnet March 9, 2023 15:35
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

One major concern I have is that I don't see any tests. I realize from reviewing that this seems to be more of integration code so unit testing probably wasn't obvious. However, I can imagine a test that can spin off two TLS clients / servers and validate the code paths without too much complication.

Otherwise, I only have code nits.

pkg/auth/mtls_authhandler.go Outdated Show resolved Hide resolved
pkg/auth/mtls_authhandler.go Outdated Show resolved Hide resolved
pkg/auth/mtls_authhandler.go Outdated Show resolved Hide resolved
pkg/auth/mtls_authhandler.go Outdated Show resolved Hide resolved
pkg/auth/spire/certificate_provider.go Outdated Show resolved Hide resolved
pkg/auth/spire/delegate.go Outdated Show resolved Hide resolved
pkg/auth/mtls_authhandler.go Outdated Show resolved Hide resolved
@meyskens
Copy link
Member Author

meyskens commented Mar 9, 2023

@christarazi i agree on the testing side, I have been thinking about this for a bit, it would be good to have it covered with testing on some kind of integration test in the package between the mTLS and SPIFFE package. Need to discuss however if we make it part of this PR or a separate work item.

Copy link
Contributor

@youngnick youngnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work @meyskens! I'm really impressed with how quickly you've pulled functional code together.

However, I'm in a similar boat to @christarazi here, the changes look great, but I'd like to see unit tests, particularly for mtls_authhandler.go and certificate_provider.go. I think just tests that exercise the code paths inside the if statements are a great place to start, even for the internal functions. Let's lock in tests while this is a small change, and then we can have greater confidence as the changesets and assoicated testing get bigger.

Copy link
Member

@kaworu kaworu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @meyskens!

Couple of comments about context handling and locking, but overall LGTM.

pkg/auth/mtls_authhandler.go Outdated Show resolved Hide resolved
pkg/auth/mtls_authhandler.go Outdated Show resolved Hide resolved
pkg/auth/spire/delegate.go Show resolved Hide resolved
Copy link
Member

@mhofstetter mhofstetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great 🎉 just some small suggestions

pkg/auth/certs/provider.go Outdated Show resolved Hide resolved
pkg/auth/mtls_authhandler.go Outdated Show resolved Hide resolved
Copy link
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me!

Copy link
Member

@nathanjsweet nathanjsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for API changes

Copy link
Member

@kaworu kaworu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @meyskens! Couple of nitpicking comments but the patch LGTM 🎉

pkg/auth/mtls_authhandler.go Outdated Show resolved Hide resolved
pkg/auth/mtls_authhandler.go Show resolved Hide resolved
@kaworu
Copy link
Member

kaworu commented Mar 15, 2023

@nathanjsweet question: will the socket mount shared between the spire agent and cilium agent be a problem in the context of OpenShift?

@meyskens
Copy link
Member Author

The SPIRE installation finalization will be part of #23806
For now we have this socket setup but for some clusters it might be that we need to change this with a more complex setup. It isn't yet final

@christarazi
Copy link
Member

christarazi commented Mar 15, 2023

/test

Job 'Cilium-PR-K8s-1.16-kernel-4.19' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: migrate-svc restart count values do not match

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.19 so I can create one.

This adds an mTLS auth handler to the Serice Mesh auth package.
It will listen on a given port and does a mutual TLS handshake with
SPIFFE IDs it received. This will assure the both sides got the needed
certificates.

In order to integrate with the datapath tables it also improves the SPIFFE
interface to use the Cilium Numeric Identities. And convert them from and
to valid SNI fields. As well as implement code to validate the URI SANS
inside the certificates.

Signed-off-by: Maartje Eyskens <maartje.eyskens@isovalent.com>
Removing the debuig line that tells which SVID it received.
The delegate API will send the state of the world on every sync.
In very large deployments this will make a lot of debug logs.

Signed-off-by: Maartje Eyskens <maartje.eyskens@isovalent.com>
This adds the ability to enable mtls-spiffe support in the Helm chart,
It will set the required config flags to the defaults as wel
as mount the spire socket to the agent pods.

Signed-off-by: Maartje Eyskens <maartje.eyskens@isovalent.com>
@meyskens
Copy link
Member Author

meyskens commented Mar 16, 2023

/test

Job 'Cilium-PR-K8s-1.16-kernel-4.19' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: migrate-svc restart count values do not match

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.19 so I can create one.

Copy link
Contributor

@youngnick youngnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work @meyskens !

@meyskens
Copy link
Member Author

CI fails seem to be a flake or not related to this PR

@sayboras
Copy link
Member

test-1.16-4.19 failure should be fixed as part of #24336

test-1.24-5.4 failure is related to new test, which could be a flake #22950.

@sayboras
Copy link
Member

sayboras commented Mar 20, 2023

/mlh new-flake Cilium-PR-K8s-1.24-kernel-5.4

👍 created #24453

@sayboras sayboras added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 20, 2023
@aditighag aditighag merged commit 971ec32 into cilium:master Mar 20, 2023
41 checks passed
@meyskens meyskens deleted the mtls-spiffe branch March 21, 2023 07:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/servicemesh GH issues or PRs regarding servicemesh kind/feature This introduces new functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/major This PR introduces major new functionality to Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable mTLS handshake between agents
10 participants