Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VM support fixes #123

Merged
merged 11 commits into from
Mar 25, 2021
Merged

VM support fixes #123

merged 11 commits into from
Mar 25, 2021

Conversation

jrajahalme
Copy link
Member

Various changes to allow external workloads to be used when installing Cilium with the Cilium CLI tool:

  • Relax clustermesh validation on "enable"
  • Change default service type to NodePort on port 32379
  • Set etcd client Common Names to the corresponding user names
  • Add external-workload certs
  • Auto-detect datapath mode for Kind

These changes are required to run the updated GSG at cilium/cilium#15320.

@jrajahalme
Copy link
Member Author

Fixed format.

Copy link
Member

@tgraf tgraf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome!

clustermesh/clustermesh.go Outdated Show resolved Hide resolved
clustermesh/clustermesh.go Outdated Show resolved Hide resolved
@jrajahalme jrajahalme force-pushed the vm-support branch 2 times, most recently from d7f0f11 to a921cc4 Compare March 18, 2021 02:08
@jrajahalme jrajahalme requested a review from tgraf March 18, 2021 02:08
@jrajahalme
Copy link
Member Author

@tgraf Updated 1st and 3rd commit as per your comments above, and added the last commit since your review, please re-review :-)

Allow enabling clustermesh with default cluster name and ID (zero) to
allow external workloads to be used without explicitly setting cluster
ID and/or name. Validate both local and remote cluster config fully
when connecting to remote cluster instead.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
@tklauser
Copy link
Member

@jrajahalme FYI, I've pushed two fixup commits addressing the staticcheck errors in the Go build GH action. You can apply them to the respective commits automatically by rebasing with git rebase --interactive --autosquash.

Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did an initial test of this on GKE using the docs in cilium/cilium#15320. Some of my review comments might be applicable here too cilium/cilium#15320 (review)

clustermesh/clustermesh.go Outdated Show resolved Hide resolved
clustermesh/clustermesh.go Show resolved Hide resolved
Add support for extracting service's ClusterIP and port rather than
erroring out.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
…er type can not be used

Error out if service type is not explicitly set or can not be
auto-detected as LoadBalancer type. Warn if service type is set to
HostPort.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Client certificate's Common Name is used as etcd user account name
once TLS based user auth (--client-cert-auth) is enabled. Use the user
account names as CNs as follows:

- Admin cert: root
- Client cert: remote

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Add a new cert to be used by External Workloads. Common Name is set to
the etcd user account name that has write access to the registation
key (externalworkload).

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Auto-detect tunnel mode for Kind and disable kube-proxy replacement to
be able to access NodePort services.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
This makes 'cilium clustermesh status' succeed with a warning message
instead of failing when Cluster ID and/or Cluster Name has not been
set when Cilium was installed. In that case warn like this:

✅ Service "clustermesh-apiserver" of type "NodePort" found
⚠️  Cluster not configured for clustermesh, use '--cluster-id' and '--cluster-name' with 'cilium install'. External workloads may still be configured.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Add 'external-workload' (alias 'vm') subcommands to 'clustermesh':

'cilium clustermesh external-workload status' - Show the status of external workloads

'cilium clustermesh external-workload create <name...>' - Create new Cilium External Workload resource to allow a VM to join

    A new CEW resource with name <name> is created with a "default" namespace label. Options:

    '--namespace string' (alias '-n')   Specify other than "default" as the namespace label
    '--labels'                          Pass a comma separated list of other labels for the identity of the external workload
    '--ipv4-alloc-cidr string'          IPv4 allocation CIDR to be used instead the default picked by the VM (e.g., 10.15.0.0/30)
    '--ipv6-alloc-cidr string'          IPv6 allocation CIDR to be used instead the default picked by the VM (e.g., f00d::a0f:0:0:0/126)

'cilium clustermesh external-workload delete <name...>' - Delete Cilium External Workload resources

    The named CEW resources will be deleted. External Workloads that have
    already registered may continue to communicate with the cluster, but may not
    rergister again. Options:

    '--all'   Delete all CEW resources if none are named on the command line.

'cilium clustermesh external-workload install <file>' - Create an installation script to be used in external workloads to install or uninstall Cilium

    Write an installation script to the named file. Note that the script inlines
    the TLS credentials for external workload registration as well as the access
    details to the your k8s cluster. The file needs to be copied to the external
    workload (such as a VM) and executed there to install Cilium as a Docker
    container and connect to your k8s cluster. 'uninstall' parameter to the
    script will cause the script to uninstall Cilium from the external workload.

All these commands require clustermesh to be enabled (via 'cilium clustermesh enable').

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
@jrajahalme
Copy link
Member Author

Tested with a GCP VM against GKE cluster, fixed the GSG PR by adding --config tunnel=vxlan. The generated install script needed fixes for:

  • Do not assume docker is suid root (use sudo explicitly in all docker invocations)
  • Use force option (-f) with mv and ln to avoid confusing sudo with a removed /etc/resolv.conf
  • Do not assume `/etc/resolv.conf', when a symbolic link, links to an absolute path.

…tunneling disabled

As of now external workload installs rely on vxlan tunneling. Fail the
install script generation if Cilium has tunneling disabled of not set
to vxlan.

In future consider testing with geneve and non-tunneled datapaths.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
@jrajahalme
Copy link
Member Author

Added commit to fail external workload install script generation if Cilium datapath config in cluster is not using vxlan.

Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just tested this with GKE and a GCE VM and it worked like a charm 🚀

There is one complaint from the staticcheck GitHub action, otherwise LGTM.

…slog

Define $SUDO as an empty string if running as root.

Use 'local' docker log driver to not depend on syslog.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
@jrajahalme
Copy link
Member Author

Added one more commit: clustermesh: Allow VM install script to run as root, do not assume syslog

This allows the script to run as root and in environments where syslog is not available. Needed this to mock a VM with a Ubuntu 20.10 docker image, and running cilium agent inside as a "docker-in-docker" image.

@jrajahalme
Copy link
Member Author

@tgraf It should be pretty straightforward to add a GH action run to test VM support using kind and docker. We have also successfully tested with GKE & GCP, would that be desirable for a GH CI run as well?

@tklauser
Copy link
Member

@tgraf It should be pretty straightforward to add a GH action run to test VM support using kind and docker. We have also successfully tested with GKE & GCP, would that be desirable for a GH CI run as well?

I'll take care of adding GH actions to test VM support on GKE & GCP in a follow-up PR. I can try to add one covering kind & docker as well.

'make staticcheck' does not allow error messages starting with a
capital letter, so do not use 'Cilium' to start an error message.

Correctly spell 'DaemonSet' in error messages.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
@jrajahalme
Copy link
Member Author

Fixed error message spelling and capitalization.

@tklauser tklauser merged commit ac78a56 into master Mar 25, 2021
@tklauser tklauser deleted the vm-support branch March 25, 2021 11:23
jrajahalme added a commit that referenced this pull request Apr 20, 2021
`cilium clustermesh status` was inadvertently broken by
#123 when clusters have not
been connected yet. Fix this by expliticly checking cluster's name and
id instead of checking the number of connections.

Add a warning about missing cluster name and/or id to `cilium
clustermesh enable` with a note that external workloads do not need
either.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
jrajahalme added a commit that referenced this pull request Apr 20, 2021
`cilium clustermesh status` was inadvertently broken by
#123 when clusters have not
been connected yet. Fix this by expliticly checking cluster's name and
id instead of checking the number of connections.

Add a warning about missing cluster name and/or id to `cilium
clustermesh enable` with a note that external workloads do not need
either.

Fixes: #166
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
jrajahalme added a commit that referenced this pull request Apr 20, 2021
`cilium clustermesh status` was inadvertently broken by
#123 when clusters have not
been connected yet. Fix this by expliticly checking cluster's name and
id instead of checking the number of connections.

Add the same warning about missing cluster name and/or id to `cilium
clustermesh enable` as well with a note that external workloads do not
need either.

Fixes: #166
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
tgraf pushed a commit that referenced this pull request Apr 20, 2021
`cilium clustermesh status` was inadvertently broken by
#123 when clusters have not
been connected yet. Fix this by expliticly checking cluster's name and
id instead of checking the number of connections.

Add the same warning about missing cluster name and/or id to `cilium
clustermesh enable` as well with a note that external workloads do not
need either.

Fixes: #166
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
aditighag pushed a commit to aditighag/cilium-cli that referenced this pull request Apr 21, 2023
`cilium clustermesh status` was inadvertently broken by
cilium#123 when clusters have not
been connected yet. Fix this by expliticly checking cluster's name and
id instead of checking the number of connections.

Add the same warning about missing cluster name and/or id to `cilium
clustermesh enable` as well with a note that external workloads do not
need either.

Fixes: cilium#166
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
michi-covalent pushed a commit to michi-covalent/cilium that referenced this pull request May 30, 2023
`cilium clustermesh status` was inadvertently broken by
cilium/cilium-cli#123 when clusters have not
been connected yet. Fix this by expliticly checking cluster's name and
id instead of checking the number of connections.

Add the same warning about missing cluster name and/or id to `cilium
clustermesh enable` as well with a note that external workloads do not
need either.

Fixes: #166
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants