Extend connectivity-check for HTTP policy validation via CUE #12599
Conversation
test-focus K8sConformance.*connectivity
Well, it's also because then it would be exactly the same as the egress-only reject policy. I should update the commit message to reflect this.
The two tests that use this connectivity-check passed:
So from a CI perspective this should be good to go. The only question might be whether we want to add a dedicated CI test for this so we can more easily focus on just the proxy failures if a regression is introduced.
The general approach looks good; I did not really review the YAMLs in detail. There are some additional dimensions that have been problematic in the past:
- Service access via hostport or not
- Hostport backend pod on the same vs. a different node from the one that was addressed.
Then there are a bunch of datapath config dimensions (tunnel/no tunnel, endpoint routes, kube proxy/free, encryption/no, etc.), but these are out of scope for the connectivity checks that apply to any given config.
examples/kubernetes/connectivity-check/pod-to-c-intra-node-proxy-ingress-reject.yaml
Minor nit
examples/kubernetes/connectivity-check/connectivity-check-single-node.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-c
nit: the conformance test passes right now, but we might need to add steps to dump details and logs in the GitHub Action.
This https://github.com/cilium/cilium/blob/master/.github/workflows/smoke-test.yaml#L153 is my naive attempt, and a little verbose.
@sayboras is there something we can do from the YAMLs to assist this?
My typical debugging involves basically kubectl describe [...], which hopefully tells me which pod + which container is failing, then from there I can try to reason about why that specific check will have failed.
Thanks for your offer, but it seems unlikely we can do much.
I would love to get to the bottom of #12279, but it's not failing on my self-hosted GitHub Action, and the logs we currently have from GitHub Actions didn't help much.
I've added a new commit on top that aims to improve the debugging, and also covers dumping logs for all of the services (by iterating the CUE definitions rather than listing them one-by-one).
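For a sense of the mechanism, iterating the definitions from a CUE tool file could look roughly like this. This is a hypothetical sketch of a dump-logs command, not the contents of the actual commit; it assumes a top-level `deployment` struct like the one declared in the connectivity-check CUE files:

```cue
package connectivity_check

import "tool/exec"

command: "dump-logs": {
	// Generate one log-gathering task per deployment declared in the
	// CUE definitions, instead of listing deployments one-by-one.
	for name, _ in deployment {
		"logs-\(name)": exec.Run & {
			cmd: ["kubectl", "logs", "-l", "name=\(name)", "--all-containers"]
		}
	}
}
```

Because the tasks are derived from the data, any newly declared check is picked up automatically.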
Force-pushed f9955b5 to df2a3a9
deployment: "pod-to-c-multi-node-hostport-proxy-egress": _hostPortProxyResource
egressCNP: "pod-to-c-multi-node-hostport-proxy-egress": _hostPortProxyPolicy
deployment: "pod-to-c-intra-node-hostport-proxy-egress": _hostPortProxyResource
egressCNP: "pod-to-c-intra-node-hostport-proxy-egress": _hostPortProxyPolicy
@jrajahalme I've added hostport checks here too.
These lines will generate the YAML to deploy pods with egress L7 policy per the declaration at the top of this file (effectively allowing requests to /public), with the destination "echo-c-host-headless:40001", a service that is declared in echo-servers.cue. Both intra-node and multi-node checks are added here, with affinity derived from the name of the check (*-{intra,multi}-node-*) via defaults.cue.
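As a rough sketch of how that name-derived behaviour can work in CUE (this is illustrative only, not the in-tree defaults.cue; the field names here are hypothetical):

```cue
package connectivity_check

// Hypothetical sketch: derive topology and destination from the
// declaration's name, in the spirit of defaults.cue.
deployment: [Name=string]: {
	metadata: name: Name

	// Affinity/anti-affinity is chosen by matching "-intra-node-" or
	// "-multi-node-" in the check name; "any" is the default.
	_topology: *"any" | string
	if Name =~ "-intra-node-" {_topology: "intra-node"}
	if Name =~ "-multi-node-" {_topology: "multi-node"}

	// The destination defaults to echo-a; "pod-to-c" checks target the
	// ingress-policied echo-c instead.
	_target: *"echo-a" | string
	if Name =~ "^pod-to-c" {_target: "echo-c"}
}
```

Every concrete declaration such as `deployment: "pod-to-c-intra-node-proxy-egress": {...}` then unifies with this pattern constraint, so the affinity and destination fall out of the name alone.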
test-focus K8sConformance.*connectivity
There's one failure occurring in the hostport proxy case, which is consistent with what I'm observing in a test cluster. I believe that's actually a Cilium bug. Looking into adding quarantine labels to allow us to add such YAMLs but not automatically deploy them.
Force-pushed df2a3a9 to bf8c5c9
test-focus K8sConformance.*connectivity
This works in manual testing + CI so it's probably good enough for review. I'll follow up on the smoke tests next week.
Had a look at the CUE-based version, and everything looks great to me!
One suggestion would be to place some parts of the very informative commit messages in a README file in the connectivity-check directory.
For example:
- all of the description of how to use the CLI tools from connectivity-check: Add cue CLI tools
- the make targets description from connectivity-check: Support generating YAMLs via cue
- the implementation details from descriptions of connectivity-check: Introduce proxy checks
In the smoke test, we are relying on kube-proxy for service connectivity, so it doesn't make sense to enable BPF masquerading. In fact, this causes issues for connectivity from a node to a pod on a remote node via ClusterIP (see related issue). For the moment, disable BPF masquerading while we figure out the longer-term solution to that issue.

Related: cilium#12699

Signed-off-by: Joe Stringer <joe@cilium.io>
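The change itself boils down to turning one agent option off. A sketch of what that looks like at the Helm values level (the exact key name varies across chart versions; `bpfMasquerade` here is illustrative, check the chart in use for the real key):

```yaml
# Illustrative only: disable BPF-based masquerading so the smoke test
# relies on kube-proxy (and iptables masquerading) for service
# connectivity, avoiding the node-to-remote-pod ClusterIP issue.
global:
  bpfMasquerade: false
```
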
Force-pushed b16daef to d36adae
test-focus K8sConformance.*connectivity
All code owners have signed off, and all CI checks that rely on connectivity checks have passed: https://github.com/cilium/cilium/pull/12599/checks?check_run_id=925168850. Marking
Commit 8a03d54 ("Extend connectivity-check for HTTP policy validation via CUE (#12599)") introduced the usage of the CUE language to make the management of the connectivity checks easier and more consistent. But two deployments, "pod-to-b-intra-node-nodeport" and "pod-to-b-multi-node-nodeport", were accidentally left out of the process. This commit reintroduces them. Only the cue file was hand-edited; the changes in the yaml files were generated by running "make".

Fixes: 8a03d54 ("Extend connectivity-check for HTTP policy validation via CUE (#12599)")
Fixes: #12599
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 8a03d54 ]

* connectivity-check: Add 'make clean' support

  Factor out the targets for all YAMLs so they can be reused by a new phony target, 'clean'.

  Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Introduce cuelang framework

  CUE (https://cuelang.org/) is a data constraint language defined as a superset of JSON which aims to "simplify tasks involving defining and using data". In the context of the connectivity check YAMLs, CUE is useful to allow us to "evaporate" the boilerplate necessary to define Kubernetes YAMLs for Deployments, Services and CiliumNetworkPolicies, and to allow developers to specify the various permutations of connectivity checks concisely.

  Why should we use it?

  * It's more concise: one template definition, multiple reuses. This is useful for introducing new connectivity checks, as upcoming commits will demonstrate, since the developer doesn't need to perform the tedious and error-prone process of copying and modifying the YAMLs to implement various permutations of a check. Furthermore, this helps reviewers, as they will not have to read through swathes of YAMLs but can instead focus on the diffs in the templating that are introduced and compare them to existing data definitions.

  * Consolidated constant declarations. When a core change needs to be made to something like the readinessProbe for probes that expect a success or failure, we can update one definition in the main CUE file and all YAMLs will subsequently be generated with this change in mind. During the process of preparing these changes, I noticed inconsistencies between various existing YAMLs which appear to be unintentional, where some YAMLs were improved with better timeout behaviour or error rendering, but other YAMLs were left out.

  * The data is more structured. Upcoming commits will introduce simple CLI tools that allow matching on different classes of connectivity checks to generate the corresponding YAMLs. Previously we have depended upon file naming schemes and Makefile globbing magic to implement this, which quickly reaches a limit in deciding which checks should be selected for a specific case.

  What are the dangers?

  * It's relatively immature. At the current version, v0.2.2, it is subject to language changes. Upcoming commits will pin the CLI tool usage to a docker container derived from this version to ensure compatibility.

  * One more language in the tree to understand, review and interact with. Mitigating circumstances: this language comes out of the Golang community and as such brings some commonalities; furthermore it is beginning to be used in other Kubernetes projects, so there is some broader community alignment.

  * Its power allows you to hide as much or as little complexity as you want. It's tricky to strike a fine balance between explicitly declaring (and duplicating) relevant fields in the local file vs. hiding convenient templating language in common files. For examples, see defaults.cue, which automatically derives connectivity check destinations based on object name declarations matching regexes of "pod-to-X", and applies affinity/anti-affinity via matches on "intra-host" or "multi-host".

  * All declarations are additive, i.e. there is no ordering based upon the layout in code; instead, data dependencies are determined using the declarations, and all data is arranged into a lattice to determine the evaluation ordering[0]. This can be counter-intuitive to reason about for the uninitiated.

  The general approach used in this commit was to `cue import` various existing YAML files to generate JSON equivalents, then iteratively combine & consolidate existing definitions using the language constructs provided by CUE. CUE also provides mechanisms to generate schemas and autogenerate the structures used here directly from API definitions (eg from the k8s source or the Cilium tree), however this area was not explored in this PR yet. While this doesn't take advantage of one major aspect of the language, upcoming commits will demonstrate the way that these changes were validated without the use of standardized schemas from the underlying Kubernetes resource definitions (TL;DR: `kubectl diff ...` with kubectl validation on a live cluster). This was sufficient to extend the connectivity checks and does not preclude future exploration of the use of schemas for these definitions.

  This commit introduces usage of CUE in a relatively minimal way into the tree, which was useful for my goals of extending the connectivity checks. If we find that it is useful and powerful, we may consider whether to extend its usage to other areas of the code (such as for test manifest generation).

  [0] https://cuelang.org/docs/concepts/logic/#the-value-lattice

  Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Add cue CLI tools

  Add some basic tooling around connectivity-check YAML generation:

    $ cue cmd help
    List connectivity-check resources specified in this directory

    Usage:
      cue [-t component=<component>] [-t name=<name>] [-t topology=<topology>] <command>

    Available Commands:
      dump  Generate connectivity-check YAMLs from the cuelang scripts
      ls    List connectivity-check resources specified in this directory

  List available connectivity-check components:

    $ cue cmd ls
    KIND                 COMPONENT       TOPOLOGY    NAME
    Service              network-check   any         echo-a
    Service              services-check  any         echo-b
    Service              services-check  any         echo-b-headless
    Service              services-check  any         echo-b-host-headless
    Deployment           network-check   any         echo-a
    Deployment           services-check  any         echo-b
    Deployment           services-check  any         echo-b-host
    Deployment           network-check   any         pod-to-a
    Deployment           network-check   any         pod-to-external-1111
    Deployment           policy-check    any         pod-to-a-allowed-cnp
    Deployment           policy-check    any         pod-to-a-denied-cnp
    Deployment           policy-check    any         pod-to-external-fqdn-allow-google-cnp
    Deployment           services-check  multi-node  pod-to-b-multi-node-clusterip
    Deployment           services-check  multi-node  pod-to-b-multi-node-headless
    Deployment           services-check  intra-node  pod-to-b-intra-node-clusterip
    Deployment           services-check  intra-node  pod-to-b-intra-node-headless
    Deployment           services-check  multi-node  host-to-b-multi-node-clusterip
    Deployment           services-check  multi-node  host-to-b-multi-node-headless
    CiliumNetworkPolicy  policy-check    any         pod-to-a-allowed-cnp
    CiliumNetworkPolicy  policy-check    any         pod-to-a-denied-cnp
    CiliumNetworkPolicy  policy-check    any         pod-to-external-fqdn-allow-google-cnp

  These can be filtered by component, topology or name. For example:

    $ cue cmd -t component=network ls
    KIND        COMPONENT      TOPOLOGY  NAME
    Service     network-check  any       echo-a
    Deployment  network-check  any       echo-a
    Deployment  network-check  any       pod-to-a
    Deployment  network-check  any       pod-to-external-1111

  Finally, to gather the (filtered) YAMLs for the specified resources:

    $ cue cmd dump | head -n 20
    metadata:
      name: echo-a
      labels:
        name: echo-a
        topology: any
        component: network-check
    spec:
      ports:
      - port: 80
      selector:
        name: echo-a
      type: ClusterIP
    apiVersion: v1
    kind: Service
    ---
    ...

  Or, with an upcoming commit, you can just use the Makefile, which now depends on the cuelang/cue:v0.2.2 Docker image:

    $ make connectivity-check.yaml

  Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Support generating YAMLs via cue

  Replace the existing YAML generation from individual YAML declarations for each service with generating YAMLs from the CUE definitions. Three new targets will assist in validating the migration from the existing definitions over to CUE:

  * make generate_all: For each object declared in CUE, generate a file corresponding to that definition. For most of the existing YAMLs, this will overwrite the copy of the YAML in the tree. This can allow manual inspection of individual YAMLs, though the 'inspect' approach is broadly more useful for evaluating the overall diff.

  * make deploy: Deploy the hostport connectivity checks YAML into an existing cluster.

  * make inspect: Generate the YAML file for all connectivity checks, then use kubectl to diff these newly generated definitions against the running cluster (assuming it was deployed via make deploy).

  This commit is purely the makefile changes, for easier review & inspection. Upcoming commits will use these targets to demonstrate that there is no meaningful change in the generated YAMLs for existing YAMLs in the tree. In particular, `make inspect` can be used in an iterative manner by initially deploying the current version of the YAMLs from the tree, then making changes to the CUE files and inspecting each time a change is made. When the diff in the cluster represents the changes that the developer intends to make, the developer can commit the changes to the CUE files and re-generate the tree versions of the YAMLs.

  Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Replace YAMLs with cue-generated YAMLs

  Prior commits introduced CUE definitions that are equivalent to these YAML files, so we can now:

  * Remove the individual declarations which were previously the source of truth for the connectivity checks
  * Update the overall connectivity-check YAMLs to reflect the minor changes that the CUE definitions represent.

  To validate this, heavy use was made of `make inspect`. As described in the prior commit message where this was introduced, this allows diffing the latest CUE-based YAML definitions against a running copy of the YAMLs in a cluster. There are a few meaningful changes in this commit which are hard to assess directly from the git diff, but are easier to see using `make inspect`:

  * All containers are converted to use readinessProbe and not livenessProbe.
  * All readiness probes now specify a --connect-timeout of 5s.
  * Readiness probes access `/public` or `/private` per the underlying container HTTP server paths, rather than just accessing `/`.
  * DNS allow policies are converted to consistently allow both TCP- and UDP-based DNS.
  * Container names are derived from pod names.
  * The new YAMLs declare additional labels for all resources, such as 'component' and 'topology'.

  Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Introduce proxy checks

  These new checks configure various L7 proxy paths to validate connectivity via L7 proxies, in the following dimensions:

  - Apply policy on egress; ingress; or both (proxy-to-proxy)
  - Intra-node / Multi-node
  - Allow / Deny

  Note that proxy-to-proxy always configures egress allow policy to ensure that the traffic goes via the proxy, and in the drop case the requests are only rejected at the destination. This is because applying egress deny at the source would prevent proxy-to-proxy connectivity, meaning the test would be equivalent to the egress-only reject policy case. This way, we ensure that the path via the egress proxy to the destination is tested in the reject case.

  These are implemented partially through a new 'echo-c' pod which always has ingress policy applied to allow GET requests to '/public'. Depending on whether ingress policy is needed to check the particular permutation, the new checks may connect to 'echo-a' or 'echo-c'. These are implemented by adding pods for each permutation of policy apply point and topology, then by adding allow / deny containers within each pod to test the allow / deny cases. The 'connectivity-check-proxy.yaml' includes all of the above.

  Finally, the omissions: this commit does not attempt to address variations in datapath configuration. This includes IPv4 vs. IPv6; tunnel/direct-routing; endpoint config; kube proxy/free; encryption. These are left up to the cluster operator configuring Cilium in specific modes and subsequently deploying these YAMLs.

  Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Minor naming fixups

  Make some of these resource names a bit more consistent.

  Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Add quarantine label to metadata

  This new label will be used during YAML generation to ensure that resources which we are still working on fixes for are kept in a separate category apart from the regular connectivity checks, to allow us to check them in & distribute them without causing CI to instantly fail.

  Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Add hostport + proxy checks

  Introduces checks for egress proxy policy when accessing a hostport on a remote node. These are added as part of component=hostport-check to ensure they are not pulled in when running connectivity checks in environments without hostport support. Additionally, these new tests are quarantined for now as they are known to fail in some environments.

  Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Expand readme for latest checks

  Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Re-add liveness probes

  It appears that some of these checks require liveness probes rather than readiness probes to pass on the github actions smoke-test, so ensure all containers are checked with both.

  Signed-off-by: Joe Stringer <joe@cilium.io>

* smoke-test: Improve state gathering upon failure

  Commit bb91571 ("smoke-test: Print pod/deploy state on failure") attempted to improve the information available during a failure from the smoke-tests, but only added it to the quick-install test and not the conformance test. Add the same output also to the conformance test so we can more easily debug failures there.

  Signed-off-by: Joe Stringer <joe@cilium.io>

* smoke-test: Disable bpf masquerading

  In the smoke test, we are relying on kube-proxy for service connectivity, so it doesn't make sense to enable BPF masquerading. In fact, this causes issues for connectivity from a node to a pod on a remote node via ClusterIP (see related issue). For the moment, disable BPF masquerading while we figure out the longer-term solution to that issue.

  Related: #12699

  Signed-off-by: Joe Stringer <joe@cilium.io>

* docs: Update connectivity-check examples

  Signed-off-by: Joe Stringer <joe@cilium.io>

Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
For the moment, disable BPF masquerading while we figure out the longer-term solution to that issue. Related: #12699 Signed-off-by: Joe Stringer <joe@cilium.io> * docs: Update connectivity-check examples Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
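The ingress side of the proxy checks described in the commit log above (the 'echo-c' policy allowing GET requests to '/public') would look roughly like the following CiliumNetworkPolicy. This is a hand-written sketch rather than the exact CUE-generated YAML; the selector labels and policy name here are illustrative:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  # Illustrative name; the generated policies follow the pod naming scheme.
  name: echo-c
spec:
  endpointSelector:
    matchLabels:
      name: echo-c
  ingress:
  - toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      # L7 rule: only GET requests to /public are allowed; anything else
      # (e.g. GET /private) is rejected by the proxy with a 403.
      rules:
        http:
        - method: "GET"
          path: "/public"
```

Requests that match the L7 rule succeed, while requests to `/private` are answered by the proxy with "Access denied", which is what the deny containers assert on.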
[ upstream commit 5d3babd ]

Commit 8a03d54 ("Extend connectivity-check for HTTP policy validation via CUE (#12599)") introduced the usage of the CUE language to make the management of the connectivity checks easier and more consistent. But two deployments, "pod-to-b-intra-node-nodeport" and "pod-to-b-multi-node-nodeport", were accidentally left out of the process. This commit reintroduces them.

Only the cue file was hand-edited; the changes in the yaml files were generated by running "make".

Fixes: 8a03d54 ("Extend connectivity-check for HTTP policy validation via CUE (#12599)")
Fixes: #12599

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
@joestringer I've changed the release note from |
This PR introduces a CUE-based framework that consolidates the definitions of the various connectivity checks and supports simpler, more concise declarations of new checks.
Review commit-by-commit.
Why CUE?
CUE is a data constraint language defined as a superset of JSON which aims to "simplify tasks involving defining and using data". See the second commit for a more detailed discussion; the talking points below summarize it:
Based on the above and the experience of converting these checks over to CUE, I believe this is a useful and powerful tool for declaring YAML variations, perfect for a use case like the connectivity checks. We may in the future even be able to explore using CUE for defining manifests in the CI.
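To illustrate the kind of boilerplate evaporation involved, a single CUE template can fan out into multiple Deployments. This is a hypothetical sketch, not the actual definitions from this PR; the field names (`_probe`, `_checks`, `target`) are invented for illustration:

```cue
package connectivity

// Shared constant: one probe definition reused by every check.
_probe: command: ["curl", "-sS", "--fail", "--connect-timeout", "5", "-o", "/dev/null"]

// Each entry here becomes a full Deployment below.
_checks: {
    "pod-to-a": target:            "echo-a"
    "pod-to-b-intra-node": target: "echo-b"
}

// Comprehension: expand every check into a Deployment manifest.
deployment: {
    for name, c in _checks {
        "\(name)": {
            apiVersion: "apps/v1"
            kind:       "Deployment"
            metadata: name: name
            spec: template: spec: containers: [{
                "name":  name
                command: _probe.command + [c.target]
            }]
        }
    }
}
```

Adding a new permutation then means adding one line to `_checks` instead of copying and hand-editing a full YAML manifest, and a change to `_probe` propagates to every generated check.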
How do we know these changes are correct?
This PR was validated against the existing "hostport" connectivity check YAML from the tree using the
make inspect
target introduced in this PR. I've attached the diff for existing checks below for reference; the commit message for the relevant changes also highlights the specifics of what has generally changed:

connectivity-check-hostport.yaml.diff.txt
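The validation workflow with the Makefile targets from this PR looks roughly like the following (a sketch; it assumes a running cluster with Cilium installed, and `defaults.cue` stands in for whichever CUE file you are editing):

```
$ make deploy          # apply the current in-tree YAMLs to the cluster
$ $EDITOR defaults.cue # change the CUE definitions
$ make inspect         # kubectl-diff freshly generated YAMLs vs. the cluster
$ make generate_all    # once the diff looks right, regenerate the tree YAMLs
```

Because `kubectl diff` performs server-side validation, this catches malformed resources as well as unintended changes, without needing a separate schema for the CUE definitions.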
Tested on a 2-node GKE cluster per the Cilium getting-started guide.
Extending the connectivity checks for HTTP policy validation
See the commit log for details:
Tooling
One interesting aspect of CUE is the ability to write tailored CLIs against the data, filtering for the resources you want and emitting them in the format you're after. See the commit message for more details of the new tooling:
Q: Do I need to install CUE?
No. The Makefile uses the cue Docker image to run the YAML generation commands. For more advanced usage like the tooling described above, however, you would need to either install CUE or run it via the Docker container in the same way the Makefile does.
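For reference, running the CLI tooling without a local CUE install might look like this. This is a sketch mirroring what the Makefile does, assuming the cuelang/cue:v0.2.2 image's entrypoint is the cue binary; the mount path `/cue` is illustrative:

```
$ cd examples/kubernetes/connectivity-check
$ docker run --rm -v "$(pwd)":/cue -w /cue cuelang/cue:v0.2.2 cmd ls
$ docker run --rm -v "$(pwd)":/cue -w /cue cuelang/cue:v0.2.2 cmd -t component=proxy-check dump
```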
Outstanding questions
Related: #8491