
Extend connectivity-check for HTTP policy validation via CUE #12599

Merged

Conversation

@joestringer joestringer commented Jul 21, 2020

This PR introduces a CUE-based framework that consolidates the definitions of the various connectivity checks and supports simpler, more concise declarations of new checks.

Review commit-by-commit.

Why CUE?

CUE is a data constraint language defined as a superset of JSON which aims to "simplify tasks involving defining and using data". See the second commit for a more detailed discussion; the talking points below serve as a summary:

In the context of the connectivity check YAMLs, CUE is useful to allow us to
"evaporate" the boilerplate necessary to define Kubernetes YAMLs for
Deployments, Services and CiliumNetworkPolicies and allow developers to
specify various permutations for connectivity checks concisely.
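
As a rough sketch of the idea (the template below is illustrative, not the
actual definition from this PR's .cue files), a single CUE template can expand
one-line entries into full Deployments:

  // Template applied to every member of 'deployment'; Name binds to the
  // member's label, e.g. "pod-to-a".
  deployment: [Name=_]: {
      apiVersion: "apps/v1"
      kind:       "Deployment"
      metadata: name: Name
      spec: template: spec: containers: [{
          name: Name
          readinessProbe: exec: command: [
              "curl", "-sS", "--fail", "--connect-timeout", "5",
              "-o", "/dev/null", "http://echo-a/public",
          ]
      }]
  }

  // Each one-line entry picks up all of the boilerplate above.
  deployment: "pod-to-a": {}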

Why should we use it?
* It's more concise. [...]
* Consolidate constant declaration. [...]
* More structured data declarations for k8s resources of checks [...]. Can build powerful tooling around this.

What are the dangers (+mitigations)?
* It's relatively immature (v0.2.2). This PR pins CUE usage to this version. [...]
* Another language in the tree. [...] This, however, aligns with Go and k8s community exploration of the language.
* It's easy to hide all complexity, templating away all parameters in a way that makes the declarations implicit. It'll be up to us to figure out how explicitly to declare our check definitions. [...]
* Additive declarations. Evaluation of language statements is not ordered by line ordering, which can be counter-intuitive (see the sketch below). [...]
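
For instance, these two CUE declarations unify into a single struct no matter
which line comes first; conflicting values are an error rather than a
last-write-wins override:

  echo: metadata: labels: component: "network-check"
  echo: metadata: labels: topology:  "any"

  // Either ordering evaluates to:
  //   echo: metadata: labels: {component: "network-check", topology: "any"}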

Based on the above and the experience of converting these checks over to CUE, I believe this is a useful and powerful tool for declaring YAML variations, perfect for a use case like the connectivity checks. We may in the future even explore using CUE for defining manifests in CI.

How do we know these changes are correct?

This PR was validated against the existing "hostport" connectivity check YAML from the tree using the make inspect target introduced in this PR. I've attached the diff for the existing checks below for reference; the commit message for the relevant changes also summarizes what has changed:

 * All containers are converted to use readinessProbe and not
   livenessProbe.
 * All readiness probes now specify --connect-timeout of 5s.
 * Readiness probes access `/public` or `/private` per the underlying
   container HTTP server paths rather than just accessing `/`.
 * DNS allow policies are converted to consistently allow both TCP and
   UDP-based DNS.
 * Container names are derived from pod names.
 * The new YAMLs declare additional labels for all resources, such as
   'component' and 'topology'.

connectivity-check-hostport.yaml.diff.txt

Tested on a 2-node GKE cluster per the Cilium getting-started guide.

Extending the connectivity checks for HTTP policy validation

See the commit log for details:

These new checks configure various L7 proxy paths to validate
connectivity via L7 proxies, in the following dimensions:

- Apply policy on egress; ingress; or both (proxy-to-proxy)
- Intra-node / Multi-node
- Allow / Deny

Note that proxy-to-proxy always configures egress allow policy to ensure
that the traffic goes via the proxy and in the drop case the requests
are only rejected at the destination. This is because applying egress
deny at the source would prevent proxy-to-proxy connectivity, meaning
the test would be equivalent to the egress-only reject policy case. This
way, we ensure that the path via the egress proxy to the destination is
tested in the reject case.

These are implemented partially through a new 'echo-c' pod which always
has ingress policy applied to allow GET requests to '/public'. Depending
on whether ingress policy is needed to check the particular permutation,
the new checks may connect to 'echo-a' or 'echo-c'.

These are implemented by adding pods for each permutation of policy
apply point and topology; then by adding allow / deny containers within
that pod to test the allow/deny cases.

The 'connectivity-check-proxy.yaml' includes all of the above.
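
To make the 'echo-c' ingress side concrete, the policy amounts to roughly the
following (a hand-written CUE sketch; the actual definitions in this PR may be
factored differently):

  ingressCNP: "echo-c": {
      apiVersion: "cilium.io/v2"
      kind:       "CiliumNetworkPolicy"
      metadata: name: "echo-c"
      spec: {
          endpointSelector: matchLabels: name: "echo-c"
          ingress: [{
              toPorts: [{
                  ports: [{port: "80", protocol: "TCP"}]
                  rules: http: [{method: "GET", path: "/public"}]
              }]
          }]
      }
  }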

Finally, the omissions: This commit does not attempt to address
variations in datapath configuration. This includes IPv4 vs. IPv6;
tunnel/direct-routing; endpoint config; kube proxy/free; encryption.
These are left up to the cluster operator configuring Cilium in specific
modes and subsequently deploying these YAMLs.

Tooling

One interesting aspect of CUE is the ability to write tailored CLIs against the data, to filter out the data you want and output it in the format you're after. See the commit message for more details of the new tooling:

  $ cue cmd help
  List connectivity-check resources specified in this directory

  Usage:
    cue [-t component=<component>] [-t name=<name>] [-t topology=<topology>] <command>

  Available Commands:
    dump   Generate connectivity-check YAMLs from the cuelang scripts
    ls     List connectivity-check resources specified in this directory

List available connectivity-check components:

  $ cue cmd ls
  KIND                  COMPONENT        TOPOLOGY     NAME
  Service               network-check    any          echo-a
  Service               services-check   any          echo-b
  Service               services-check   any          echo-b-headless
  Service               services-check   any          echo-b-host-headless
  Deployment            network-check    any          echo-a
  Deployment            services-check   any          echo-b
  Deployment            services-check   any          echo-b-host
  Deployment            network-check    any          pod-to-a
  Deployment            network-check    any          pod-to-external-1111
  Deployment            policy-check     any          pod-to-a-allowed-cnp
  Deployment            policy-check     any          pod-to-a-denied-cnp
  Deployment            policy-check     any          pod-to-external-fqdn-allow-google-cnp
  Deployment            services-check   multi-node   pod-to-b-multi-node-clusterip
  Deployment            services-check   multi-node   pod-to-b-multi-node-headless
  Deployment            services-check   intra-node   pod-to-b-intra-node-clusterip
  Deployment            services-check   intra-node   pod-to-b-intra-node-headless
  Deployment            services-check   multi-node   host-to-b-multi-node-clusterip
  Deployment            services-check   multi-node   host-to-b-multi-node-headless
  CiliumNetworkPolicy   policy-check     any          pod-to-a-allowed-cnp
  CiliumNetworkPolicy   policy-check     any          pod-to-a-denied-cnp
  CiliumNetworkPolicy   policy-check     any          pod-to-external-fqdn-allow-google-cnp

These can be filtered by component, topology or name. For example:

  $ cue cmd -t component=network ls
  KIND         COMPONENT       TOPOLOGY    NAME
  Service      network-check   any         echo-a
  Deployment   network-check   any         echo-a
  Deployment   network-check   any         pod-to-a
  Deployment   network-check   any         pod-to-external-1111

Finally, to gather the (filtered) YAMLs for the specified resources:

  $ cue cmd dump | head -n 20
  metadata:
    name: echo-a
    labels:
      name: echo-a
      topology: any
      component: network-check
  spec:
    ports:
      - port: 80
    selector:
      name: echo-a
    type: ClusterIP
  apiVersion: v1
  kind: Service
  ---
  ...
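
Under the hood, such commands live in CUE "tool files". A minimal sketch of how
a command like dump could be wired up (hypothetical file and field names; the
actual tool files in this PR may differ):

  // dump_tool.cue (hypothetical)
  package connectivity_check

  import (
      "encoding/yaml"
      "tool/cli"
  )

  command: dump: task: print: cli.Print & {
      // 'deployment' is assumed to hold every Deployment declared in
      // this package's .cue files.
      text: yaml.MarshalStream([ for d in deployment { d } ])
  }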

Q: Do I need to install CUE?

No; the Makefile uses the cue Docker image to run the YAML generation commands. For more advanced usage, like the tooling described above, you would need to either install CUE or run it via the Docker container in a similar way to how the Makefile does.

Outstanding questions

  • For now I have introduced the new checks (proxy-based, and hostport+proxy, which are quarantined) into separate YAMLs to allow them to be tested independently. They are not part of the simple YAML. This keeps the regular connectivity-check YAML small and contained so that it can reliably pass in smaller environments. We should consider how to expand the testing of these YAMLs in CI.

Related: #8491

@joestringer joestringer added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Jul 21, 2020
@joestringer joestringer requested review from kkourt, tgraf and a team July 21, 2020 00:40
@joestringer

test-focus K8sConformance.*connectivity

@joestringer

This is because applying egress deny at the source would prevent proxy-to-proxy connectivity.

Well, it's also because then it would be exactly the same as the egress-only reject policy. I should update the commit message to reflect this.

coveralls commented Jul 21, 2020

Coverage Status

Coverage decreased (-0.01%) to 36.673% when pulling d36adae on joestringer:submit/proxy-connectivity-check into 3de518a on cilium:master.

@joestringer

The two tests that use this connectivity-check passed:

So from a CI perspective this should be good to go. Only question might be whether we want to add a dedicated CI test for this so we can more easily focus on just the proxy failures if a regression is introduced.

@jrajahalme jrajahalme left a comment

The general approach looks good; I did not really review the YAMLs in detail. There are some additional dimensions that have been problematic in the past:

  • Service access via hostport or not
  • Hostport backend pod on the same vs. a different node from the one that was addressed

Then there are a bunch of datapath config dimensions (tunnel/no tunnel, endpoint routes, kube proxy/free, encryption/no, etc.), but these are out of scope for the connectivity checks that apply to any given config.

@aanm aanm left a comment

Minor nit

@joestringer joestringer marked this pull request as draft July 22, 2020 02:36
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: echo-c

nit: the conformance test passes right now, but we might need to add steps to dump details and logs in the GitHub Action.
This https://github.com/cilium/cilium/blob/master/.github/workflows/smoke-test.yaml#L153 is my naive attempt, and it's a little verbose.

@sayboras is there something we can do from the YAMLs to assist this?

My typical debugging involves basically kubectl describe [...] which hopefully tells me which pod + which container is failing, then from there I can try to reason about why that specific check will have failed.

Thanks for your offer, but it seems unlikely we can do much.

I would love to get to the bottom of #12279, but it's not failing on my self-hosted GitHub Action, and the logs we currently have from GitHub Actions didn't help much.

I've added a new commit on top that aims to improve the debugging, and it also covers dumping logs for all of the services (by iterating over the cue definitions rather than listing them one-by-one).

@joestringer joestringer force-pushed the submit/proxy-connectivity-check branch 2 times, most recently from f9955b5 to df2a3a9 Compare July 24, 2020 18:20
@joestringer joestringer changed the title Extend connectivity-check for HTTP policy validation Extend connectivity-check for HTTP policy validation via CUE Jul 24, 2020
Comment on lines 77 to 80
deployment: "pod-to-c-multi-node-hostport-proxy-egress": _hostPortProxyResource
egressCNP: "pod-to-c-multi-node-hostport-proxy-egress": _hostPortProxyPolicy
deployment: "pod-to-c-intra-node-hostport-proxy-egress": _hostPortProxyResource
egressCNP: "pod-to-c-intra-node-hostport-proxy-egress": _hostPortProxyPolicy

@joestringer joestringer Jul 24, 2020

@jrajahalme I've added hostport checks here too.

These lines will generate the YAML to deploy pods with egress L7 policy per the declaration at the top of this file (effectively allowing requests to /public), with the destination "echo-c-host-headless:40001", a service that is declared in echo-servers.cue. Both intra-node and multi-node checks are added here, with affinity derived from the check name matching *-{intra,multi}-node-* (via defaults.cue).
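
For reference, the egress side of such a policy boils down to roughly this
sketch (a hypothetical rendering of _hostPortProxyPolicy; the real definition
may differ):

  _hostPortProxyPolicy: spec: egress: [{
      toPorts: [{
          ports: [{port: "40001", protocol: "TCP"}]
          rules: http: [{method: "GET", path: "/public"}]
      }]
  }]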

@joestringer

test-focus K8sConformance.*connectivity

@joestringer

There's one failure occurring in the hostport proxy case, which is consistent with what I'm observing in a test cluster. I believe that's actually a Cilium bug. I'm looking into adding quarantine labels to allow us to check in such YAMLs without automatically deploying them.

@joestringer joestringer force-pushed the submit/proxy-connectivity-check branch from df2a3a9 to bf8c5c9 Compare July 25, 2020 01:58
@joestringer

test-focus K8sConformance.*connectivity

@joestringer
Copy link
Member Author

This works in manual testing + CI, so it's probably good enough for review. I'll follow up on the smoke tests next week.

@joestringer joestringer marked this pull request as ready for review July 25, 2020 03:15

@kkourt kkourt left a comment

Had a look at the CUE-based version, and everything looks great to me!

One suggestion would be to place some parts of the very informative commit messages in a README file in the connectivity-check directory.

For example:

  • all of the description of how to use the CLI tools from connectivity-check: Add cue CLI tools
  • the make targets description from connectivity-check: Support generating YAMLs via cue
  • the implementation details from descriptions of connectivity-check: Introduce proxy checks

smoke-test: Disable bpf masquerading

In the smoke test, we are relying on kube-proxy for service connectivity
so it doesn't make sense to enable BPF masquerading. In fact, this
causes issues for connectivity from a node to a pod on a remote node via
ClusterIP (see related issue).

For the moment, disable BPF masquerading while we figure out the
longer-term solution to that issue.

Related: cilium#12699

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer joestringer force-pushed the submit/proxy-connectivity-check branch from b16daef to d36adae Compare July 29, 2020 21:39
@joestringer

test-focus K8sConformance.*connectivity

@joestringer

All code owners have signed off, and all CI checks that rely on the connectivity checks have passed:

https://github.com/cilium/cilium/pull/12599/checks?check_run_id=925168850
https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated-Focus/355/

Marking ready-to-merge.

@joestringer joestringer added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 29, 2020
@nathanjsweet nathanjsweet merged commit 8a03d54 into cilium:master Jul 29, 2020
@joestringer joestringer deleted the submit/proxy-connectivity-check branch July 29, 2020 22:28
qmonnet added a commit that referenced this pull request Aug 5, 2020
Commit 8a03d54 ("Extend connectivity-check for HTTP policy
validation via CUE (#12599)") introduced the usage of the CUE language
to make the management of the connectivity checks easier and more
consistent. But two deployments, "pod-to-b-intra-node-nodeport" and
"pod-to-b-multi-node-nodeport", have accidentally been left out of the
process.

This commit reintroduces them. Only the cue file was hand-edited; the
changes in the yaml files were generated by running "make".

Fixes: 8a03d54 ("Extend connectivity-check for HTTP policy validation via CUE (#12599)")
Fixes: #12599
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
aanm pushed a commit that referenced this pull request Aug 5, 2020
kaworu pushed a commit that referenced this pull request Aug 28, 2020
[ upstream commit 8a03d54 ]

* connectivity-check: Add 'make clean' support

Factor out the targets for all YAMLs so it can be reused by a new phony
target, 'clean'.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Introduce cuelang framework

CUE (https://cuelang.org/) is a data constraint language defined as a
superset of JSON which aims to "simplify tasks involving defining
and using data".

In the context of the connectivity check YAMLs, CUE is useful to allow us to
"evaporate" the boilerplate necessary to define Kubernetes YAMLs for
Deployments, Services and CiliumNetworkPolicies and allow developers to
specify various permutations for connectivity checks concisely.

Why should we use it?

* It's more concise: One template definition, multiple uses. This is
  useful for introducing new connectivity checks as upcoming commits
  will demonstrate as the developer doesn't need to perform the tedious
  and error-prone process of copying and modifying the YAMLs to
  implement various permutations of a check. Furthermore this helps
  reviewers as they will not have to read through swathes of YAMLs but
  can instead focus on the diffs in the templating that are introduced
  and compare to existing data definitions.

* Consolidate constant declaration. When a core change needs to be made
  to something like the readinessProbe for probes that expect a
  success or failure, we can update one definition in the main CUE file
  and all YAMLs will subsequently be generated with this change in mind.
  During the process of preparing these changes, I noticed
  inconsistencies between various existing YAMLs which appear to just be
  unintentional, where some YAMLs were improved with better timeout
  behaviour or error rendering, but other YAMLs were left out (see the
  sketch after this list).

* The data is more structured. Upcoming commits will introduce simple
  CLI tools that allow matching on different classes of connectivity
  checks to generate the corresponding YAMLs. Previously we have
  depended upon file naming schemes and Makefile globbing magic to
  implement this, which quickly reaches its limits when deciding which
  checks should be selected for a specific case.
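
As a sketch of the consolidation point above (illustrative names, not the
PR's actual fields), a single shared definition can be referenced by every
check's probe:

  // Shared constants: change these once, regenerate, and every check's
  // probe is updated consistently.
  _connectTimeout: "5"
  _probeCommand: ["curl", "-sS", "--fail",
      "--connect-timeout", _connectTimeout,
      "-o", "/dev/null"]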

What are the dangers?

* It's relatively immature. At current version v0.2.2 it is subject to
  language changes. Upcoming commits will pin the CLI tool usage to a
  docker container derived from this version to ensure compatibility.
* One more language in the tree to understand, review and interact with.
  Mitigating circumstances: This language comes out of the Golang
  community and as such brings some commonalities; furthermore it is
  beginning to be used in other Kubernetes projects, so there is some
  broader community alignment.
* Its power allows you to hide as much or as little complexity as you
  want. It's tricky to strike a fine balance between explicitly declaring
  (and duplicating) relevant fields in the local file vs. hiding
  convenient templating language in common files. For examples, see
  defaults.cue (sketched below), which automatically derives connectivity
  check destinations based on object name declarations matching regexes
  of "pod-to-X", and applies affinity/anti-affinity via matches on
  "intra-node" or "multi-node".
* All declarations are additive, i.e. there is no ordering based upon the
  layout in code; instead, data dependencies are determined using the
  declarations, and all data is arranged into a lattice to determine the
  evaluation ordering[0]. This can be counter-intuitive to reason about
  for the uninitiated.
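
As an illustration of the defaults.cue approach mentioned above (a sketch with
made-up field names, not the file's actual contents):

  deployment: [Name=_]: {
      // Derive the curl destination from the check's name.
      if Name =~ "^pod-to-a" {_destination: "echo-a"}
      if Name =~ "^pod-to-c" {_destination: "echo-c"}

      // Schedule the client away from its server for multi-node checks.
      if Name =~ "-multi-node-" {_affinity: "podAntiAffinity"}
  }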

The general approach used in this commit was to `cue import` various
existing YAML files to generate JSON equivalents, then to iteratively
combine & consolidate the existing definitions using the language
constructs provided by CUE. CUE also provides mechanisms to generate
schemas and autogenerate the structures used here directly from API
definitions (eg from k8s source or Cilium tree), however this area was
not explored in this PR yet. While this doesn't take advantage of one
major aspect of the language, upcoming commits will demonstrate the way
that these changes were validated without the use of standardized
schemas from the underlying Kubernetes resource definitions. (TL;DR:
`kubectl diff ...` with kubectl validation on a live cluster). This was
sufficient to extend the connectivity checks and does not preclude
future exploration of the use of schemas for these definitions.

This commit introduces usage of CUE in a relatively minimal way into the
tree which was useful for my goals of extending the connectivity checks.
If we find that it is useful and powerful, we may consider whether to
extend its usage to other areas of the code (such as for test manifest
generation).

[0] https://cuelang.org/docs/concepts/logic/#the-value-lattice

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Add cue CLI tools

Add some basic tooling around connectivity-check YAML generation:

  $ cue cmd help
  List connectivity-check resources specified in this directory

  Usage:
    cue [-t component=<component>] [-t name=<name>] [-t topology=<topology>] <command>

  Available Commands:
    dump   Generate connectivity-check YAMLs from the cuelang scripts
    ls     List connectivity-check resources specified in this directory

List available connectivity-check components:

  $ cue cmd ls
  KIND                  COMPONENT        TOPOLOGY     NAME
  Service               network-check    any          echo-a
  Service               services-check   any          echo-b
  Service               services-check   any          echo-b-headless
  Service               services-check   any          echo-b-host-headless
  Deployment            network-check    any          echo-a
  Deployment            services-check   any          echo-b
  Deployment            services-check   any          echo-b-host
  Deployment            network-check    any          pod-to-a
  Deployment            network-check    any          pod-to-external-1111
  Deployment            policy-check     any          pod-to-a-allowed-cnp
  Deployment            policy-check     any          pod-to-a-denied-cnp
  Deployment            policy-check     any          pod-to-external-fqdn-allow-google-cnp
  Deployment            services-check   multi-node   pod-to-b-multi-node-clusterip
  Deployment            services-check   multi-node   pod-to-b-multi-node-headless
  Deployment            services-check   intra-node   pod-to-b-intra-node-clusterip
  Deployment            services-check   intra-node   pod-to-b-intra-node-headless
  Deployment            services-check   multi-node   host-to-b-multi-node-clusterip
  Deployment            services-check   multi-node   host-to-b-multi-node-headless
  CiliumNetworkPolicy   policy-check     any          pod-to-a-allowed-cnp
  CiliumNetworkPolicy   policy-check     any          pod-to-a-denied-cnp
  CiliumNetworkPolicy   policy-check     any          pod-to-external-fqdn-allow-google-cnp

These can be filtered by component, topology or name. For example:

  $ cue cmd -t component=network ls
  KIND         COMPONENT       TOPOLOGY    NAME
  Service      network-check   any         echo-a
  Deployment   network-check   any         echo-a
  Deployment   network-check   any         pod-to-a
  Deployment   network-check   any         pod-to-external-1111

Finally, to gather the (filtered) YAMLs for the specified resources:

  $ cue cmd dump | head -n 20
  metadata:
    name: echo-a
    labels:
      name: echo-a
      topology: any
      component: network-check
  spec:
    ports:
      - port: 80
    selector:
      name: echo-a
    type: ClusterIP
  apiVersion: v1
  kind: Service
  ---
  ...

Or with an upcoming commit you can just use the Makefile, which now depends
on the cuelang/cue:v0.2.2 Docker image:

  $ make connectivity-check.yaml

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Support generating YAMLs via cue

Replace the existing YAML generation from individual YAML declarations
for each service with generating YAMLs from the CUE definitions.

Three new targets will assist in validating the migration from the
existing definitions over to CUE:

 * make generate_all
   * For each object declared in CUE, generate a file corresponding to that
     definition. For most of the existing YAMLs, this will overwrite the
     copy of the YAML in the tree. This can allow manual inspection of
     individual YAMLs, though the 'inspect' approach is broadly more
     useful for evaluating the overall diff.
 * make deploy
   * Deploy the hostport connectivity checks YAML into an existing cluster.
 * make inspect
   * Generate the YAML file for all connectivity checks, then use
     kubectl to diff these newly generated definitions against the
     running cluster (assuming it was deployed via make deploy).

This commit contains purely the Makefile changes, for easier review &
inspection. Upcoming commits will use these targets to demonstrate that
there is no meaningful change in the generated YAMLs for existing YAMLs
in the tree. In particular, `make inspect` can be used in an iterative
manner by initially deploying the current version of the YAMLs from the
tree, then making changes to the CUE files and inspecting each time a
change is made. When the diff in the cluster represents the changes that
the developer intends to make, the developer can commit the changes to
the CUE files and re-generate the tree versions of the YAMLs.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Replace YAMLs with cue-generated YAMLs

Prior commits introduced CUE definitions that are equivalent to these
YAML files, so we can now:

 * Remove the individual declarations which were previously
   source-of-truth for the connectivity checks
 * Update the overall connectivity-check YAMLs to reflect the minor
   changes that the CUE definitions represent.

To validate this, `make inspect` was used heavily. As described in
the prior commit message where this was introduced, this allows diffing
the latest CUE-based YAML definitions against a running copy of the
YAMLs in a cluster.

There are a few meaningful changes in this commit which are hard to assess
directly from the git diff, but are easier using `make inspect`:

 * All containers are converted to use readinessProbe and not
   livenessProbe.
 * All readiness probes now specify --connect-timeout of 5s.
 * Readiness probes access `/public` or `/private` per the underlying
   container HTTP server paths rather than just accessing `/`.
 * DNS allow policies are converted to consistently allow both TCP and
   UDP-based DNS.
 * Container names are derived from pod names.
 * The new YAMLs declare additional labels for all resources, such as
   'component' and 'topology'.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Introduce proxy checks

These new checks configure various L7 proxy paths to validate
connectivity via L7 proxies, in the following dimensions:

- Apply policy on egress; ingress; or both (proxy-to-proxy)
- Intra-node / Multi-node
- Allow / Deny

Note that proxy-to-proxy always configures egress allow policy to ensure
that the traffic goes via the proxy and in the drop case the requests
are only rejected at the destination. This is because applying egress
deny at the source would prevent proxy-to-proxy connectivity, meaning
the test would be equivalent to the egress-only reject policy case. This
way, we ensure that the path via the egress proxy to the destination is
tested in the reject case.

These are implemented partially through a new 'echo-c' pod which always
has ingress policy applied to allow GET requests to '/public'. Depending
on whether ingress policy is needed to check the particular permutation,
the new checks may connect to 'echo-a' or 'echo-c'.

These are implemented by adding pods for each permutation of policy
apply point and topology; then by adding allow / deny containers within
that pod to test the allow/deny cases.

The 'connectivity-check-proxy.yaml' includes all of the above.

Finally, the omissions: This commit does not attempt to address
variations in datapath configuration. This includes IPv4 vs. IPv6;
tunnel/direct-routing; endpoint config; kube proxy/free; encryption.
These are left up to the cluster operator configuring Cilium in specific
modes and subsequently deploying these YAMLs.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Minor naming fixups

Make some of these resource names a bit more consistent.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Add quarantine label to metadata

This new label will be used during YAML generation to ensure that
resources for which we are still working on fixes are kept in a separate
category from the regular connectivity checks, allowing us to
check them in & distribute them without causing CI to instantly fail.
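
A sketch of how this can look in the CUE definitions and at generation time
(illustrative names only):

  // Every resource gets a quarantine label, defaulting to "false".
  deployment: [Name=_]: metadata: labels: quarantine: *"false" | string

  // Known-failing checks opt in.
  deployment: "pod-to-c-intra-node-hostport-proxy-egress": metadata: labels: quarantine: "true"

  // YAML generation then keeps only non-quarantined resources.
  _active: [ for d in deployment if d.metadata.labels.quarantine != "true" { d } ]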

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Add hostport + proxy checks

Introduces checks for egress proxy policy when accessing a hostport on a
remote node. These are added as part of the component=hostport-check to
ensure they are not pulled in when running connectivity checks in
environments without hostport support.

Additionally, these new tests are quarantined for now as they are known
to fail in some environments.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Expand readme for latest checks

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Re-add liveness probes

It appears that some of these checks require liveness probes rather than
readiness probes to pass on the GitHub Actions smoke-test, so ensure all
containers are checked with both.

Signed-off-by: Joe Stringer <joe@cilium.io>

* smoke-test: Improve state gathering upon failure

Commit bb91571 ("smoke-test: Print pod/deploy state on failure")
attempted to improve the information available during a failure from the
smoke-tests, but only added it to the quick-install test and not the
conformance test. Add the same output also to the conformance test so we
can more easily debug failures there.

Signed-off-by: Joe Stringer <joe@cilium.io>

* smoke-test: Disable bpf masquerading

In the smoke test, we are relying on kube-proxy for service connectivity
so it doesn't make sense to enable BPF masquerading. In fact, this
causes issues for connectivity from a node to a pod on a remote node via
ClusterIP (see related issue).

For the moment, disable BPF masquerading while we figure out the
longer-term solution to that issue.

Related: #12699

Signed-off-by: Joe Stringer <joe@cilium.io>

* docs: Update connectivity-check examples

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
kaworu pushed a commit that referenced this pull request Aug 28, 2020
[ upstream commit 5d3babd ]

@maintainer-s-little-helper maintainer-s-little-helper bot added this to Backport pending to v1.8 in 1.8.3 Sep 1, 2020
christarazi pushed a commit that referenced this pull request Sep 3, 2020
[ upstream commit 8a03d54 ]

christarazi pushed a commit that referenced this pull request Sep 3, 2020
[ upstream commit 5d3babd ]

joestringer added a commit that referenced this pull request Sep 4, 2020
[ upstream commit 8a03d54 ]

* connectivity-check: Add 'make clean' support

Factor out the targets for all YAMLs so it can be reused by a new phony
target, 'clean'.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Introduce cuelang framework

CUE (https://cuelang.org/) is a data constraint language built defined
as a superset of JSON which aims to "simplify tasks involving defining
and using data".

In context of the connectivity check YAMLs, CUE is useful to allow us to
"evaporate" the boilerplate necessary to define Kubernetes YAMLs for
Deployments, Services and CiliumNetworkPolicies and allow developers to
specify various permutations for connectivity checks concisely.

Why should we use it?

* It's more concise: One template definition, multiple reuse. This is
  useful for introducing new connectivity checks as upcoming commits
  will demonstrate as the developer doesn't need to perform the tedious
  and error-prone process of copying and modifying the YAMLs to
  implement various permutations of a check. Furthermore this helps
  reviewers as they will not have to read through swathes of YAMLs but
  can instead focus on the diffs in the templating that are introduced
  and compare to existing data definitions.

* Consolidate constant declaration. When a core change needs to be made
  to something like the readinessProbe for probes that expect a
  success or failure, we can update one definition in the main CUE file
  and all YAMLs will subsequently be generated with this change in mind.
  During the process of preparing these changes, I noticed
  inconsistencies between various existing YAMLs which appear to just be
  unintentional, where some YAMLs were improved with better timeoute
  behaviour or error rendering, but other YAMLs were left out.

* The data is more structured. Upcoming commits will introduce simple
  CLI tools that allow matching on different classes of connectivity
  checks to generate the corresponding YAMLs. Previously we have
  depended upon file naming schemes and Makefile globbing magic to
  implement this which quickly reaches a limit in which checks should be
  selected for a specific check.

What are the dangers?

* It's relatively immature. At current version v0.2.2 it is subject to
  language changes. Upcoming commits will pin the CLI tool usage to a
  docker container derived from this version to ensure compatibility.
* One more language in the tree to understand, review and interact with.
  Mitigating circumstances: This language comes out of the Golang
  community and as such brings some commonalities; furthermore it is
  beginning to be used in other Kubernetes projects, so there is some
  broader community alignment.
* Its power allows you to hide as much or as little complexity as you
  want. It's tricky to strike a fine balance between explicitly declaring
  (and duplicating) relevant fields in the local file vs. hiding
  convenient templating language in common files. For examples, see
  defaults.cue which automatically derives connectivity check destinations
  based on object name declarations matching regexes of "pod-to-X", and
  applies affinity/anti-affinity via matches on "intra-host" or
  "multi-host".
* All declarations are additive, ie there is no ordering based upon the
  layout in code; instead, data dependencies are determined using the
  declarations, and all data is arranged into a lattice to determine the
  evaluation ordering[0]. This can be counter-intuitive to reason about
  for the uninitiated.

The general approach used in this commit was to `cue import` various
existing YAML files to generate JSON equivalents, then iteratively
combining & consolidating existing definitions using the language
constructs provided by CUE. CUE also provides mechanisms to generate
schemas and autogenerate the structures used here directly from API
definitions (eg from k8s source or Cilium tree), however this area was
not explored in this PR yet. While this doesn't take advantage of one
major aspect of the language, upcoming commits will demonstrate the way
that these changes were validated without the use of standardized
schemas from the underlying Kubernetes resource definitions. (TL;DR:
`kubectl diff ...` with kubectl validation on a live cluster). This was
sufficient to extend the connectivity checks and does not preclude
future explanation of the use of schemas for these definitions.

This commit introduces usage of CUE in a relatively minimal way into the
tree which was useful for my goals of extending the connectivity checks.
If we find that it is useful and powerful, we may consider whether to
extend its usage to other areas of the code (such as for test manifest
generation).

[0] https://cuelang.org/docs/concepts/logic/#the-value-lattice

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Add cue CLI tools

Add some basic tooling around connectivity-check YAML generation:

  $ cue cmd help
  List connectivity-check resources specified in this directory

  Usage:
    cue [-t component=<component>] [-t name=<name>] [-t topology=<topology>] <command>

  Available Commands:
    dump   Generate connectivity-check YAMLs from the cuelang scripts
    ls     List connectivity-check resources specified in this directory

List available connectivity-check components:

  $ cue cmd ls
  KIND                  COMPONENT        TOPOLOGY     NAME
  Service               network-check    any          echo-a
  Service               services-check   any          echo-b
  Service               services-check   any          echo-b-headless
  Service               services-check   any          echo-b-host-headless
  Deployment            network-check    any          echo-a
  Deployment            services-check   any          echo-b
  Deployment            services-check   any          echo-b-host
  Deployment            network-check    any          pod-to-a
  Deployment            network-check    any          pod-to-external-1111
  Deployment            policy-check     any          pod-to-a-allowed-cnp
  Deployment            policy-check     any          pod-to-a-denied-cnp
  Deployment            policy-check     any          pod-to-external-fqdn-allow-google-cnp
  Deployment            services-check   multi-node   pod-to-b-multi-node-clusterip
  Deployment            services-check   multi-node   pod-to-b-multi-node-headless
  Deployment            services-check   intra-node   pod-to-b-intra-node-clusterip
  Deployment            services-check   intra-node   pod-to-b-intra-node-headless
  Deployment            services-check   multi-node   host-to-b-multi-node-clusterip
  Deployment            services-check   multi-node   host-to-b-multi-node-headless
  CiliumNetworkPolicy   policy-check     any          pod-to-a-allowed-cnp
  CiliumNetworkPolicy   policy-check     any          pod-to-a-denied-cnp
  CiliumNetworkPolicy   policy-check     any          pod-to-external-fqdn-allow-google-cnp

These can be filtered by component, topology or name. For example:

  $ cue cmd -t component=network ls
  KIND         COMPONENT       TOPOLOGY    NAME
  Service      network-check   any         echo-a
  Deployment   network-check   any         echo-a
  Deployment   network-check   any         pod-to-a
  Deployment   network-check   any         pod-to-external-1111

Finally, to gather the (filtered) YAMLs for the specified resources:

  $ cue cmd dump | head -n 20
  metadata:
    name: echo-a
    labels:
      name: echo-a
      topology: any
      component: network-check
  spec:
    ports:
      - port: 80
    selector:
      name: echo-a
    type: ClusterIP
  apiVersion: v1
  kind: Service
  ---
  ...

Or, with an upcoming commit, you can just use the Makefile, which now
depends on the cuelang/cue:v0.2.2 Docker image:

  $ make connectivity-check.yaml
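
Under the hood this runs the pinned image; a minimal sketch of an
equivalent manual invocation (the mount path and working directory are
assumptions, not the committed recipe):

  $ docker run --rm -v "$(pwd)":/wd -w /wd cuelang/cue:v0.2.2 \
      cmd dump > connectivity-check.yaml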

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Support generating YAMLs via cue

Replace the existing YAML generation from individual YAML declarations
for each service with generating YAMLs from the CUE definitions.

Three new targets will assist in validating the migration from the
existing definitions over to CUE:

 * make generate_all
   * For each object declared in CUE, generate a file corresponding to that
     definition. For most of the existing YAMLs, this will overwrite the
     copy of the YAML in the tree. This can allow manual inspection of
     individual YAMLs, though the 'inspect' approach is broadly more
     useful for evaluating the overall diff.
 * make deploy
   * Deploy the hostport connectivity checks YAML into an existing cluster.
 * make inspect
   * Generate the YAML file for all connectivity checks, then use
     kubectl to diff these newly generated definitions against the
     running cluster (assuming it was deployed via make deploy).

This commit contains only the Makefile changes, for easier review &
inspection. Upcoming commits will use these targets to demonstrate that
there is no meaningful change in the generated YAMLs relative to the
existing YAMLs in the tree. In particular, `make inspect` can be used in an iterative
manner by initially deploying the current version of the YAMLs from the
tree, then making changes to the CUE files and inspecting each time a
change is made. When the diff in the cluster represents the changes that
the developer intends to make, the developer can commit the changes to
the CUE files and re-generate the tree versions of the YAMLs.
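
An illustrative iteration of that workflow (file names are
hypothetical):

  $ make deploy                    # apply the current in-tree YAMLs
  $ $EDITOR defaults.cue           # edit the CUE definitions
  $ make inspect                   # kubectl diff against the cluster
  # ...repeat edit/inspect until the diff shows only intended changes...
  $ make connectivity-check.yaml   # regenerate the in-tree YAML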

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Replace YAMLs with cue-generated YAMLs

Prior commits introduced CUE definitions that are equivalent to these
YAML files, so we can now:

 * Remove the individual declarations which were previously
   source-of-truth for the connectivity checks
 * Update the overall connectivity-check YAMLs to reflect the minor
   changes that the CUE definitions represent.

To validate this, `make inspect` was used heavily. As described in
the prior commit message where it was introduced, this allows diffing
the latest CUE-based YAML definitions against a running copy of the
YAMLs in a cluster.

There are a few meaningful changes in this commit which are hard to assess
directly from the git diff, but are easier to see using `make inspect`:

 * All containers are converted to use readinessProbe and not
   livenessProbe (see the illustrative fragment below).
 * All readiness probes now specify --connect-timeout of 5s.
 * Readiness probes access `/public` or `/private` per the underlying
   container HTTP server paths rather than just accessing `/`.
 * DNS allow policies are converted to consistently allow both TCP and
   UDP-based DNS.
 * Container names are derived from pod names.
 * The new YAMLs declare additional labels for all resources, such as
   'component' and 'topology'.
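
For example, the converted probes look roughly like the following
fragment (reconstructed for illustration, not copied verbatim from the
generated YAML):

  readinessProbe:
    exec:
      command:
        - curl
        - -sS
        - --fail
        - --connect-timeout
        - "5"
        - -o
        - /dev/null
        - echo-a/public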

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Introduce proxy checks

These new checks configure various L7 proxy paths to validate
connectivity via L7 proxies, in the following dimensions:

- Apply policy on egress; ingress; or both (proxy-to-proxy)
- Intra-node / Multi-node
- Allow / Deny

Note that proxy-to-proxy always configures egress allow policy to ensure
that the traffic goes via the proxy; in the drop case, the requests are
only rejected at the destination. This is because applying egress
deny at the source would prevent proxy-to-proxy connectivity, meaning
the test would be equivalent to the egress-only reject policy case. This
way, we ensure that the path via the egress proxy to the destination is
tested in the reject case.

These are implemented partially through a new 'echo-c' pod which always
has ingress policy applied to allow GET requests to '/public'. Depending
on whether ingress policy is needed to check the particular permutation,
the new checks connect to either 'echo-a' or 'echo-c'.

These are implemented by adding pods for each permutation of policy
apply point and topology; then by adding allow / deny containers within
that pod to test the allow/deny cases.
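
As a sketch of what one such permutation's policy looks like, an
egress-allow variant could be declared along these lines (the resource
name is hypothetical; the shape follows the CiliumNetworkPolicy schema):

  apiVersion: cilium.io/v2
  kind: CiliumNetworkPolicy
  metadata:
    name: pod-to-c-intra-node-proxy-egress-allow
  spec:
    endpointSelector:
      matchLabels:
        name: pod-to-c-intra-node-proxy-egress-allow
    egress:
      - toEndpoints:
          - matchLabels:
              name: echo-c
        toPorts:
          - ports:
              - port: "80"
                protocol: TCP
            rules:
              http:
                - method: GET
                  path: /public
  # (a matching DNS egress rule is omitted here for brevity)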

The 'connectivity-check-proxy.yaml' includes all of the above.

Finally, the omissions: This commit does not attempt to address
variations in datapath configuration. This includes IPv4 vs. IPv6;
tunnel vs. direct routing; endpoint config; kube-proxy vs. kube-proxy-free;
encryption.
These are left up to the cluster operator configuring Cilium in specific
modes and subsequently deploying these YAMLs.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Minor naming fixups

Make some of these resource names a bit more consistent.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Add quarantine label to metadata

This new label will be used during YAML generation to ensure that
resources we are still working on fixing are kept in a separate
category from the regular connectivity checks, allowing us to check
them in and distribute them without causing CI to instantly fail.
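
This allows the generation tooling to exclude quarantined resources via
a tag filter, e.g. (a hypothetical invocation; the exact tag name is an
assumption):

  $ cue cmd -t quarantine=false dump > connectivity-check.yaml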

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Add hostport + proxy checks

Introduces checks for egress proxy policy when accessing a hostport on a
remote node. These are added under component=hostport-check to ensure
they are not pulled in when running connectivity checks in environments
without hostport support.
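
Following the existing tag filters, these checks can be listed or
skipped via the component tag, e.g. (illustrative):

  $ cue cmd -t component=hostport ls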

Additionally, these new tests are quarantined for now as they are known
to fail in some environments.

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Expand readme for latest checks

Signed-off-by: Joe Stringer <joe@cilium.io>

* connectivity-check: Re-add liveness probes

It appears that some of these checks require liveness probes rather than
readiness probes to pass on the GitHub Actions smoke-test, so ensure all
containers are checked with both.

Signed-off-by: Joe Stringer <joe@cilium.io>

* smoke-test: Improve state gathering upon failure

Commit bb91571 ("smoke-test: Print pod/deploy state on failure")
attempted to improve the information available during a failure from the
smoke-tests, but only added it to the quick-install test and not the
conformance test. Add the same output also to the conformance test so we
can more easily debug failures there.
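
The gathered state amounts to something along these lines (an
illustrative sketch; the exact commands live in the workflow
definition):

  $ kubectl get pods --all-namespaces -o wide
  $ kubectl get deploy --all-namespaces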

Signed-off-by: Joe Stringer <joe@cilium.io>

* smoke-test: Disable bpf masquerading

In the smoke test, we are relying on kube-proxy for service connectivity
so it doesn't make sense to enable BPF masquerading. In fact, this
causes issues for connectivity from a node to a pod on a remote node via
ClusterIP (see related issue).

For the moment, disable BPF masquerading while we figure out the
longer-term solution to that issue.
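
One way to express this (a sketch; the exact configuration key and the
install method used by the workflow are assumptions) is via the agent's
ConfigMap:

  $ kubectl -n kube-system patch configmap cilium-config --type merge \
      -p '{"data":{"enable-bpf-masquerade":"false"}}'
  # the agents pick up the change after a restart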

Related: #12699

Signed-off-by: Joe Stringer <joe@cilium.io>

* docs: Update connectivity-check examples

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
joestringer pushed a commit that referenced this pull request Sep 4, 2020
[ upstream commit 5d3babd ]

Commit 8a03d54 ("Extend connectivity-check for HTTP policy
validation via CUE (#12599)") introduced the usage of the CUE language
to make the management of the connectivity checks easier and more
consistent. But two deployments, "pod-to-b-intra-node-nodeport" and
"pod-to-b-multi-node-nodeport", have accidentally been left out of the
process.

This commit reintroduces them. Only the cue file was hand-edited; the
changes in the yaml files were generated by running "make".

Fixes: 8a03d54 ("Extend connectivity-check for HTTP policy validation via CUE (#12599)")
Fixes: #12599
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.3 Sep 4, 2020
@aanm aanm added release-note/misc This PR makes changes that have no direct user impact. and removed release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Sep 4, 2020
aanm commented Sep 4, 2020

@joestringer I've changed the release note from minor to misc since it's not user-facing.
