Extend connectivity-check for HTTP policy validation via CUE (#12599)
* connectivity-check: Add 'make clean' support

Factor out the targets for all YAMLs so they can be reused by a new
phony target, 'clean'.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Introduce cuelang framework

CUE (https://cuelang.org/) is a data constraint language defined as a
superset of JSON which aims to "simplify tasks involving defining and
using data". In the context of the connectivity-check YAMLs, CUE lets us
"evaporate" the boilerplate needed to define the Kubernetes YAMLs for
Deployments, Services and CiliumNetworkPolicies, so that developers can
specify the various permutations of a connectivity check concisely.

Why should we use it?

- It's more concise: one template definition, multiple reuses. This is
  useful for introducing new connectivity checks, as upcoming commits
  will demonstrate, since the developer doesn't need to perform the
  tedious and error-prone process of copying and modifying YAMLs to
  implement each permutation of a check. It also helps reviewers, who no
  longer have to read through swathes of YAML but can instead focus on
  the diffs in the templating and compare them to the existing data
  definitions.

- Consolidated constant declarations. When a core change needs to be
  made to something like the readinessProbe for probes that expect a
  success or failure, we can update one definition in the main CUE file
  and all YAMLs will subsequently be generated with this change in mind.
  While preparing these changes, I noticed apparently unintentional
  inconsistencies between existing YAMLs, where some had been improved
  with better timeout behaviour or error rendering while others were
  left out.

- The data is more structured. Upcoming commits will introduce simple
  CLI tools that match on different classes of connectivity checks to
  generate the corresponding YAMLs. Previously we have depended upon
  file naming schemes and Makefile globbing magic, which quickly reaches
  its limits when selecting which checks apply to a specific scenario.

What are the dangers?

- It's relatively immature. At the current version, v0.2.2, it is
  subject to language changes. Upcoming commits pin the CLI tool usage
  to a Docker container derived from this version to ensure
  compatibility.

- One more language in the tree to understand, review and interact with.
  Mitigating circumstances: this language comes out of the Golang
  community and as such shares some commonalities with Go; furthermore,
  it is beginning to be used in other Kubernetes projects, so there is
  some broader community alignment.

- Its power allows you to hide as much or as little complexity as you
  want. It's tricky to strike a balance between explicitly declaring
  (and duplicating) relevant fields in the local file vs. hiding
  convenient templating logic in common files. For examples, see
  defaults.cue, which automatically derives connectivity-check
  destinations from object name declarations matching regexes of
  "pod-to-X", and applies affinity/anti-affinity via matches on
  "intra-host" or "multi-host".

- All declarations are additive, i.e. there is no ordering based upon
  the layout in code; instead, data dependencies are determined from the
  declarations, and all data is arranged into a lattice to determine the
  evaluation ordering[0]. This can be counter-intuitive to reason about
  for the uninitiated.

The general approach used in this commit was to `cue import` various
existing YAML files to generate JSON equivalents, then iteratively
combine and consolidate the existing definitions using the language
constructs that CUE provides. CUE also provides mechanisms to generate
schemas and to autogenerate the structures used here directly from API
definitions (e.g. from the k8s source or the Cilium tree), however this
area was not explored in this PR yet.
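To make the templating idea concrete, here is a minimal, hypothetical
CUE sketch in the spirit of these definitions; the actual in-tree .cue
files differ in detail, and all names below are illustrative:

```cue
package example

import "strings"

// One template: every deployment picks up the common boilerplate, and
// (as in the defaults.cue idea above) fields can be derived from the
// object name itself.
deployment: [Name=_]: {
	apiVersion: "apps/v1"
	kind:       "Deployment"
	metadata: name: Name
	metadata: labels: name: Name

	// Derive the destination for "pod-to-X" checks from the name.
	if Name =~ "^pod-to-" {
		_target: strings.TrimPrefix(Name, "pod-to-")
	}
}

// Each concrete check is now a one-line declaration.
deployment: "pod-to-a": {}
deployment: "pod-to-external-1111": {}
```

Evaluating such a file (e.g. with `cue eval`) expands each one-line
declaration into a full object, which is what makes the boilerplate
"evaporate".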
While this doesn't take advantage of one major aspect of the language,
upcoming commits will demonstrate how these changes were validated
without the use of standardized schemas from the underlying Kubernetes
resource definitions. (TL;DR: `kubectl diff ...` with kubectl validation
on a live cluster.) This was sufficient to extend the connectivity
checks, and does not preclude future exploration of schemas for these
definitions.

This commit introduces CUE into the tree in a relatively minimal way,
which was useful for my goal of extending the connectivity checks. If we
find that it is useful and powerful, we may consider whether to extend
its usage to other areas of the code (such as test manifest generation).

[0] https://cuelang.org/docs/concepts/logic/#the-value-lattice

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Add cue CLI tools

Add some basic tooling around connectivity-check YAML generation:

```
$ cue cmd help
List connectivity-check resources specified in this directory

Usage:
  cue [-t component=<component>] [-t name=<name>] [-t topology=<topology>] <command>

Available Commands:
  dump  Generate connectivity-check YAMLs from the cuelang scripts
  ls    List connectivity-check resources specified in this directory
```

List available connectivity-check components:

```
$ cue cmd ls
KIND                 COMPONENT       TOPOLOGY    NAME
Service              network-check   any         echo-a
Service              services-check  any         echo-b
Service              services-check  any         echo-b-headless
Service              services-check  any         echo-b-host-headless
Deployment           network-check   any         echo-a
Deployment           services-check  any         echo-b
Deployment           services-check  any         echo-b-host
Deployment           network-check   any         pod-to-a
Deployment           network-check   any         pod-to-external-1111
Deployment           policy-check    any         pod-to-a-allowed-cnp
Deployment           policy-check    any         pod-to-a-denied-cnp
Deployment           policy-check    any         pod-to-external-fqdn-allow-google-cnp
Deployment           services-check  multi-node  pod-to-b-multi-node-clusterip
Deployment           services-check  multi-node  pod-to-b-multi-node-headless
Deployment           services-check  intra-node  pod-to-b-intra-node-clusterip
Deployment           services-check  intra-node  pod-to-b-intra-node-headless
Deployment           services-check  multi-node  host-to-b-multi-node-clusterip
Deployment           services-check  multi-node  host-to-b-multi-node-headless
CiliumNetworkPolicy  policy-check    any         pod-to-a-allowed-cnp
CiliumNetworkPolicy  policy-check    any         pod-to-a-denied-cnp
CiliumNetworkPolicy  policy-check    any         pod-to-external-fqdn-allow-google-cnp
```

These can be filtered by component, topology or name. For example:

```
$ cue cmd -t component=network ls
KIND        COMPONENT      TOPOLOGY  NAME
Service     network-check  any       echo-a
Deployment  network-check  any       echo-a
Deployment  network-check  any       pod-to-a
Deployment  network-check  any       pod-to-external-1111
```

Finally, to gather the (filtered) YAMLs for the specified resources:

```
$ cue cmd dump | head -n 20
metadata:
  name: echo-a
  labels:
    name: echo-a
    topology: any
    component: network-check
spec:
  ports:
  - port: 80
  selector:
    name: echo-a
  type: ClusterIP
apiVersion: v1
kind: Service
---
...
```

Or, with an upcoming commit, you can just use the Makefile, which now
depends on the cuelang/cue:v0.2.2 Docker image:

```
$ make connectivity-check.yaml
```

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Support generating YAMLs via cue

Replace the existing YAML generation from individual YAML declarations
for each service with generating YAMLs from the CUE definitions. Three
new targets will assist in validating the migration from the existing
definitions over to CUE:

- make generate_all: For each object declared in CUE, generate a file
  corresponding to that definition. For most of the existing YAMLs, this
  will overwrite the copy of the YAML in the tree. This allows manual
  inspection of individual YAMLs, though the 'inspect' approach is
  broadly more useful for evaluating the overall diff.

- make deploy: Deploy the hostport connectivity checks YAML into an
  existing cluster.
- make inspect: Generate the YAML file for all connectivity checks, then
  use kubectl to diff these newly generated definitions against the
  running cluster (assuming it was deployed via make deploy).

This commit contains purely the Makefile changes, for easier review and
inspection. Upcoming commits will use these targets to demonstrate that
there is no meaningful change in the generated YAMLs for the existing
YAMLs in the tree. In particular, `make inspect` can be used
iteratively: initially deploy the current version of the YAMLs from the
tree, then make changes to the CUE files and inspect each time a change
is made. When the diff in the cluster represents the changes that the
developer intends to make, the developer can commit the changes to the
CUE files and re-generate the tree versions of the YAMLs.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Replace YAMLs with cue-generated YAMLs

Prior commits introduced CUE definitions that are equivalent to these
YAML files, so we can now:

- Remove the individual declarations which were previously the
  source-of-truth for the connectivity checks.
- Update the overall connectivity-check YAMLs to reflect the minor
  changes that the CUE definitions represent.

To validate this, heavy use was made of `make inspect`. As described in
the prior commit message where this target was introduced, it allows
diffing the latest CUE-based YAML definitions against a running copy of
the YAMLs in a cluster.

There are a few meaningful changes in this commit which are hard to
assess directly from the git diff, but are easier to see using
`make inspect`:

- All containers are converted to use a readinessProbe rather than a
  livenessProbe.
- All readiness probes now specify a --connect-timeout of 5s.
- Readiness probes access `/public` or `/private` per the underlying
  container HTTP server paths, rather than just accessing `/`.
- DNS allow policies are converted to consistently allow both TCP- and
  UDP-based DNS.
- Container names are derived from pod names.
- The new YAMLs declare additional labels for all resources, such as
  'component' and 'topology'.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Introduce proxy checks

These new checks configure various L7 proxy paths to validate
connectivity via L7 proxies, in the following dimensions:

- Apply policy on egress; ingress; or both (proxy-to-proxy)
- Intra-node / Multi-node
- Allow / Deny

Note that proxy-to-proxy always configures egress allow policy to
ensure that the traffic goes via the proxy, so in the drop case the
requests are only rejected at the destination. This is because applying
egress deny at the source would prevent proxy-to-proxy connectivity,
meaning the test would be equivalent to the egress-only reject policy
case. This way, we ensure that the path via the egress proxy to the
destination is tested in the reject case.

These are implemented partially through a new 'echo-c' pod which always
has ingress policy applied to allow GET requests to '/public'.
Depending on whether ingress policy is needed to check the particular
permutation, the new checks may connect to 'echo-a' or 'echo-c'. Pods
are added for each permutation of policy apply point and topology, with
allow/deny containers within each pod to test the allow and deny cases.
The 'connectivity-check-proxy.yaml' includes all of the above.

Finally, the omissions: this commit does not attempt to address
variations in datapath configuration. This includes IPv4 vs. IPv6,
tunnelling vs. direct routing, endpoint config, kube-proxy vs.
kube-proxy-free, and encryption. These are left up to the cluster
operator configuring Cilium in specific modes and subsequently
deploying these YAMLs.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Minor naming fixups

Make some of these resource names a bit more consistent.
Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Add quarantine label to metadata

This new label will be used during YAML generation to ensure that
resources for which we are still working on fixes are kept in a
separate category from the regular connectivity checks, allowing us to
check them in and distribute them without causing CI to instantly fail.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Add hostport + proxy checks

Introduce checks for egress proxy policy when accessing a hostport on a
remote node. These are added as part of component=hostport-check to
ensure they are not pulled in when running connectivity checks in
environments without hostport support. Additionally, these new tests
are quarantined for now, as they are known to fail in some
environments.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Expand readme for latest checks

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Re-add liveness probes

It appears that some of these checks require liveness probes rather
than readiness probes to pass on the GitHub Actions smoke-test, so
ensure all containers are checked with both.

Signed-off-by: Joe Stringer <joe@cilium.io>

* smoke-test: Improve state gathering upon failure

Commit bb91571 ("smoke-test: Print pod/deploy state on failure")
attempted to improve the information available during a failure of the
smoke-tests, but only added it to the quick-install test and not the
conformance test. Add the same output to the conformance test so we can
more easily debug failures there.

Signed-off-by: Joe Stringer <joe@cilium.io>

* smoke-test: Disable bpf masquerading

In the smoke test we are relying on kube-proxy for service
connectivity, so it doesn't make sense to enable BPF masquerading. In
fact, this causes issues for connectivity from a node to a pod on a
remote node via ClusterIP (see related issue). For the moment, disable
BPF masquerading while we figure out the longer-term solution to that
issue.

Related: #12699

Signed-off-by: Joe Stringer <joe@cilium.io>

* docs: Update connectivity-check examples

Signed-off-by: Joe Stringer <joe@cilium.io>
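As an aside, the proxy-check permutations described above (policy apply
point x topology) are the kind of structured data that CUE
comprehensions capture well. A hypothetical sketch, with illustrative
names rather than the in-tree definitions:

```cue
package example

// One pod per (policy apply point, topology) pair; the allow and deny
// cases then live as separate containers within each generated pod.
_policyPoints: ["egress", "ingress", "proxy-to-proxy"]
_topologies:   ["intra-node", "multi-node"]

deployment: {
	for p in _policyPoints
	for t in _topologies {
		"pod-to-echo-\(t)-\(p)": {}
	}
}
```

Adding a new dimension to the checks then means extending one list
rather than hand-writing another set of YAML files.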