Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mtls: SPIRE server and agent installation #24765

Merged
merged 6 commits into from Apr 19, 2023

Conversation

sayboras
Copy link
Member

@sayboras sayboras commented Apr 5, 2023

Description

This is to install SPIRE server and agent for cilium mTLS.

TODO

  • Figure out the way to register the below identities as part of the installation
    • init sidecar
Temp script
kubectl exec -n cilium-spire spire-server-0 -- \
  /opt/spire/bin/spire-server entry create \
  -spiffeID spiffe://spiffe.cilium/ns/spire/sa/spire-agent \
  -selector k8s_psat:agent_ns:cilium-spire \
  -selector k8s_psat:agent_sa:spire-agent \
  -node

# Then add the cilium-agent identity to the server (required for the delegated identity to work)
kubectl exec -n cilium-spire spire-server-0 -- \
  /opt/spire/bin/spire-server entry create \
  -spiffeID spiffe://spiffe.cilium/cilium-agent \
  -parentID spiffe://spiffe.cilium/ns/spire/sa/spire-agent \
  -selector k8s:ns:kube-system \
  -selector k8s:label:k8s-app:spiffetest

kubectl exec -n cilium-spire spire-server-0 -- \
  /opt/spire/bin/spire-server entry create \
  -spiffeID spiffe://spiffe.cilium/dclient \
  -parentID spiffe://spiffe.cilium/ns/spire/sa/spire-agent \
  -selector k8s:ns:kube-system \
  -selector k8s:sa:cilium

kubectl exec -n cilium-spire spire-server-0 -- \
  /opt/spire/bin/spire-server entry create \
  -spiffeID spiffe://spiffe.cilium/cilium-operator \
  -parentID spiffe://spiffe.cilium/ns/spire/sa/spire-agent \
  -selector k8s:ns:kube-system \
  -selector k8s:sa:cilium-operator

Fixes: #23806

Testing

Testing was done with fresh cilium installation with mTLS enabled.

  • auth.mTLS.spire.enabled=true
  • auth.mTLS.spire.install.enabled=true
$ cilium connectivity test --test auth
ℹ️  Single-node environment detected, enabling single-node connectivity test
ℹ️  Monitor aggregation detected, will skip some flow validation steps
...
[=] Skipping Test [client-egress-l7-set-header]
[=] Test [echo-ingress-auth-always-fail]
........
[=] Test [echo-ingress-auth-mtls-spiffe]
........
✅ All 2 tests (16 actions) successful, 35 tests skipped, 0 scenarios skipped.
mtls: SPIRE server and agent installation

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 5, 2023
@sayboras sayboras added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Apr 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 5, 2023
@sayboras sayboras added area/helm Impacts helm charts and user deployment experience dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Apr 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 5, 2023
@sayboras sayboras added the area/servicemesh GH issues or PRs regarding servicemesh label Apr 5, 2023
@sayboras sayboras force-pushed the tam/spire-installation branch 7 times, most recently from a315cd7 to 5a1d388 Compare April 10, 2023 15:02
@sayboras sayboras marked this pull request as ready for review April 10, 2023 15:03
@sayboras sayboras requested review from a team as code owners April 10, 2023 15:03
@sayboras
Copy link
Member Author

/test

@sayboras
Copy link
Member Author

/test-1.16-4.19

@sayboras
Copy link
Member Author

/test-1.24-5.4

@sayboras
Copy link
Member Author

/test-1.25-4.19

@sayboras
Copy link
Member Author

/test-1.26-net-next

@sayboras
Copy link
Member Author

/test

@sayboras
Copy link
Member Author

/test-1.26-net-next

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 18, 2023
@meyskens
Copy link
Member

Adding a note that i got this error on my system with this PR on the spire agent, it could be me but better verify a later change did not break it.

time="2023-04-18T12:46:11Z" level=error msg="no identity issued" method=SubscribeToX509SVIDs service=spire.api.agent.delegatedidentity.v1.DelegatedIdentity subsystem_name=debug_api
time="2023-04-18T12:46:11Z" level=error msg="Failed to collect all selectors for PID" error="workload attestor \"k8s\" failed: rpc error: code = Internal desc = workloadattestor(k8s): unable to perform request: Get \"https://127.0.0.1:10250/pods\": x509: certificate signed by unknown authority" pid=5656 subsystem_name=workload_attestor
time="2023-04-18T12:46:11Z" level=error msg="no identity issued" method=SubscribeToX509Bundles service=spire.api.agent.delegatedidentity.v1.DelegatedIdentity subsystem_name=debug_api

# -- SPIRE agent init containers
initContainers:
- name: init
image: cgr.dev/chainguard/wait-for-it@sha256:ecb58e3a2ffbdb732bb9049987e06eaf826d945410e167f31d6ffe28fab259f4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this need to be configurable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FY I plan to revert this change in #24897

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what @meyskens says 👍, we probably either change the default here, or just remove it from being configurable.

# -- SPIRE agent configuration
agent:
# -- SPIRE agent image
image: ghcr.io/spiffe/spire-agent:1.5.1@sha256:40228af4d9a094f0fef2d7a303a3b6a689c4b4eba2fa9f7da5125b81d2d68ec8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to mirror these containers to a repository in the cilium organization?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of users are kind of doing that i.e. copy public images to private registry (e.g. ecr, google, etc), so I think we don't need to do it ourselves.

@squeed
Copy link
Contributor

squeed commented Apr 18, 2023

How does this affect cluster bootstrap? Is a running SPIRE agent + server required for cilium agent start / restart / upgrade?

If a cluster is using an in-cluster-hosted CSI driver, the statefulset volume will not be available until after the network starts up, for example. Is that going to be a problem here?

This is highlighted as part of CFP.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
This is to avoid any confusion on valid DNS name, which could cause any
assumption on external traffic/lookup as part of Auth process.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
This is to simplify the identity registration as part of bootstrap.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
@sayboras
Copy link
Member Author

How does this affect cluster bootstrap? Is a running SPIRE agent + server required for cilium agent start / restart / upgrade?

The current implementation makes sure that cilium will not crash or wait for SPIRE component to bootstrap. Cilium will perform connection retry with back-off to SPIRE.

If a cluster is using an in-cluster-hosted CSI driver, the statefulset volume will not be available until after the network starts up, for example. Is that going to be a problem here?

Same as above, Cilium will automatically retry with backoff, so once SPIRE components are up, things will be converged.

@maintainer-s-little-helper
Copy link

Commit fbc1e1e75f0131da33c167ce451da94633a4b4d0 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Apr 18, 2023
@maintainer-s-little-helper
Copy link

Commit fbc1e1e75f0131da33c167ce451da94633a4b4d0 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@sayboras sayboras removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Apr 18, 2023
@ldelossa
Copy link
Contributor

@sayboras should this be marked ready to merge? Some tests seem to be still running, and we are failing a smoke testing.

@sayboras
Copy link
Member Author

should this be marked ready to merge? Some tests seem to be still running, and we are failing a smoke testing.

Thanks, I didn't know that we need to remove ready-to-merge manually. I pushed one small change to incorporate review comments.

@sayboras sayboras removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 18, 2023
This commit is just to put the layout for current structure of spire
configuration. All generated docs and files are included.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
This commit is to add all required manifests for both spire agent and
server components.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
This commit is to register all cilium related identities in spire, so
that other components (e.g. spire-agent, cilium agent, cilium operator)
are having required permissions. Currently, a small bash script is used
for simplicity, once things are getting more complicated, we can move
these steps to a small golang utility, like what we have with cilium-mount
or cilium-sysctlfix.

Kind note that shareProcessNamespace is enabled, so that init container can
cooperate with main spire-server and perform required identity registration.

https://kubernetes.io/docs/tasks/configure-pod-container/share-process-namespace/

Signed-off-by: Tam Mach <tam.mach@cilium.io>
@sayboras
Copy link
Member Author

/test

@sayboras sayboras requested a review from squeed April 19, 2023 10:10
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 19, 2023
@sayboras sayboras merged commit c268e97 into cilium:main Apr 19, 2023
57 checks passed
@sayboras sayboras deleted the tam/spire-installation branch April 19, 2023 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/helm Impacts helm charts and user deployment experience area/servicemesh GH issues or PRs regarding servicemesh ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mTLS enablement, SPIRE server and agent installation
7 participants