[Action] Bootstrap Kubernetes cluster with IaC tooling #1
@nikimanoledaki I've been investigating using CAPI / CAPEM and I think our lack of access to a permanent management cluster is a problem. In the Equinix docs they have an alternative approach intended for management clusters, using K3s managed by Pulumi. I've added more detail in the design doc. PTAL. Do you think this is a good direction?
Update following the WG meeting and discussions I've had since with @nikimanoledaki. We think an important factor is how we isolate between test runs to get accurate results.
We think we'll need an IaC tool to manage the management cluster if we use CAPEM, or the whole cluster if we don't. At the moment we're leaning toward Ansible for that: https://github.com/equinix/ansible-collection-metal However, we both think it's important to continue work on the pipeline design. We can then design the cluster topology to support the pipeline rather than the other way round. Lastly, I added some notes to the design doc for #1 (comment) on K3s / Pulumi that are now outdated and would have been better here. 🤦♂️ I've removed them and updated the CAPI / CAPEM section. Sorry about that!
WG meeting recap: We had thoughtful discussions and agreed on actionable steps toward the main objective of developing an end-to-end proof of concept for the Green Reviews Working Group. In the design doc you can find a first draft of the workflow:
We also thought of testing Falco in two ways:
The roadmap ahead is well-defined, with practical next steps that will soon be documented as issues under the designated milestone. Thanks to @rossf7 for his invaluable documentation on manual Equinix cluster creation; his pull request is open for comments. The team can't wait for the initial measurements. Let's continue to collaborate and innovate as always!
If any help is needed I am happy to contribute 👍🏼
I added some more detail in the design doc. We want to use an IaC tool we can run in a GitHub action. Ansible and OpenTofu have both been discussed. I'd be fine with using Ansible (although full disclosure I don't have much experience with it). We will need to provision the control plane and worker nodes as Equinix servers and they have integrations for Ansible and Terraform. For each server we need to configure user_data that will bootstrap
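To make the user_data idea more concrete, here is a minimal sketch of generating the bootstrap file we might attach to a server. Everything here is an assumption for illustration: the package names follow the upstream kubeadm install docs (and assume the Kubernetes apt repository is already configured), and the pod network CIDR is a placeholder.

```shell
#!/bin/sh
# Sketch only: generate a user_data bootstrap script for a control-plane node.
# Assumes the Kubernetes apt repository has already been configured on the
# image; the pod CIDR is a placeholder and must match whichever CNI we pick.
cat > user_data.sh <<'EOF'
#!/bin/sh
set -eu
# Install a container runtime plus the Kubernetes tooling.
apt-get update
apt-get install -y containerd kubelet kubeadm kubectl
# Control-plane node only: initialise the cluster.
kubeadm init --pod-network-cidr=10.244.0.0/16
EOF
chmod +x user_data.sh
```

Worker nodes would run the matching `kubeadm join` instead of `kubeadm init`; how the join token reaches them is an open question in this sketch.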
For provisioning Kubernetes we could use Kubeadm. Unless anyone can suggest a better approach? @dipankardas011 help with this would be much appreciated. I think we first need to agree on the design. Would you like to work on that? cc @nikimanoledaki @guidemetothemoon @leonardpahlke @AntonioDiTuri
Also, I'm not sure what is meant by the design. Is it deciding on the Infrastructure as Code part, or is it just the diagrams?
@dipankardas011 If you would like to investigate how you would do the Infrastructure as Code part that would be great. But please don't spend too much time on it until we've heard from the rest of the team. I'm happy to help with the Equinix Metal integration as I've worked with their infra quite a bit.
Okay, I will create a basic diagram of the workflow.
Should I create it on Excalidraw or draw.io?
Here is my iter 1 -> https://gitlab.com/dipankardas011/draw.io/-/tree/main/CNCF%20WG%20Green%20Review
I cannot access it. It says I don't have the permissions. |
fixed the link |
Hi @dipankardas011, thanks, the diagram is looking good! In the diagram, OpenTofu (Terraform) is used to provision the Equinix servers and Ansible is used to provision Kubernetes with Kubeadm. Do you think we could use a single tool for both? Or are there advantages to using separate tools? For the GitOps part you have this described as "GitOps for CNCF projects". I think this should be "GitOps for pipeline components". Could you update that? This is because we want to use Flux to manage the components that should always be running, like Prometheus. The CNCF projects like Falco, and any workload-specific test workloads, will be managed by the pipeline.
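For the "GitOps for pipeline components" part, a Flux Kustomization along these lines could keep the always-running components such as Prometheus reconciled. This is only a sketch: the name, path, and namespace below are hypothetical placeholders, not anything agreed in this thread.

```yaml
# Hypothetical Flux Kustomization for the always-running pipeline components.
# All names and paths are placeholders.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: pipeline-components
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/prod/monitoring
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```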
In my experience, we can add the script to the user_data section when we provision the infra with the IaC tools. I think this method involving two tools is good when the infra needs a lot of configuration. Another issue I have seen is that if an error occurs in the user_data section, we don't get any signal that an error occurred; just wanted to point that out.
Yes |
Updated! |
Yes, exactly that: the script in the user_data can run the IaC tool. I agree using two tools makes sense, provided we can use the Equinix Terraform module with OpenTofu.
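As a sketch of the two-tool split, the OpenTofu side could pass the bootstrap script through user_data on the Equinix device resource. The snippet below just writes a placeholder main.tf; the resource attributes follow the Equinix Terraform provider docs, and every value (hostname, plan, metro, variable names) is hypothetical.

```shell
#!/bin/sh
# Sketch: write a placeholder main.tf showing how user_data could be wired
# into an Equinix Metal device via OpenTofu / Terraform. All values are
# hypothetical; attributes follow the Equinix provider documentation.
cat > main.tf <<'EOF'
resource "equinix_metal_device" "control_plane" {
  hostname         = "green-reviews-cp"
  plan             = "c3.small.x86"
  metro            = "am"
  operating_system = "ubuntu_22_04"
  billing_cycle    = "hourly"
  project_id       = var.project_id
  user_data        = file("user_data.sh")
}
EOF
```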
Good catch 👍 we will need to handle that. We have some contacts at Equinix. So we can try asking them for some guidance if needed.
As suggested by @nikimanoledaki we could use this directory structure with the IaC code under infrastructure and the Kubernetes manifests under clusters managed by Flux.
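A sketch of that layout; the subdirectory names below are placeholders, only the top-level split between infrastructure and clusters comes from the suggestion above.

```
.
├── infrastructure/     # IaC code (OpenTofu / Ansible) for the Equinix servers
│   ├── equinix-metal/
│   └── kubernetes/
└── clusters/           # Kubernetes manifests reconciled by Flux
    └── prod/
        └── monitoring/
```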
See #5 (comment)
I did a spike to investigate this and created a WIP PR to get feedback: #6. Dipankar, I think the original design you proposed, using OpenTofu / Terraform to manage the Equinix infra and Ansible to provision Kubernetes, is good. I don't see a benefit to using Ansible to manage both. OpenTofu has a GitHub Action that works well and I think does everything we need: https://github.com/opentofu/setup-opentofu I'm using an S3 bucket to store the state. It looks like we can request an S3 bucket and credentials via the service desk. @dipankardas011 Would you like to work on the Ansible playbook? @nikimanoledaki @guidemetothemoon @leonardpahlke @AntonioDiTuri Please take a look at the PR when you have time. Leo / Niki, no worries if that is after KubeCon!
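A sketch of how the setup-opentofu action could be wired up in a workflow, with state in S3. This is an assumed workflow, not the one in the PR: the trigger, secret names, and working directory are all placeholders.

```yaml
# Hypothetical GitHub Actions workflow using opentofu/setup-opentofu with
# S3-backed state. Secret names and paths are placeholders.
name: provision
on: workflow_dispatch
jobs:
  provision:
    runs-on: ubuntu-latest
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: opentofu/setup-opentofu@v1
      - run: tofu init
        working-directory: infrastructure
      - run: tofu apply -auto-approve
        working-directory: infrastructure
```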
Okay then we can use the user_data section 👍
Discussed with @wrkode and @dipankardas011 in the WG slack channel. We think there may be some advantages to using K3s instead of Kubeadm. It makes it easier to provision the cluster and we could run the K3s steps in the The main challenge is we need to get the
We can also use the K3s install shell script to bring up the cluster; this will pass tokens and stand up the workers.
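A sketch of what those K3s steps could look like inside user_data, following the K3s quick-start docs. The snippet below only generates the bootstrap fragment; SERVER_IP is a placeholder, and how the join token is distributed to the workers is deliberately left out of this sketch.

```shell
#!/bin/sh
# Sketch only: generate a K3s bootstrap fragment. Commands follow the K3s
# quick-start docs; SERVER_IP and token distribution are placeholders.
cat > k3s-bootstrap.sh <<'EOF'
#!/bin/sh
set -eu
# Control-plane node: install the K3s server.
curl -sfL https://get.k3s.io | sh -
# The join token is written to /var/lib/rancher/k3s/server/node-token;
# distributing it securely to the workers is not covered by this sketch.

# Worker nodes: install the K3s agent, pointing at the server.
curl -sfL https://get.k3s.io | K3S_URL="https://SERVER_IP:6443" K3S_TOKEN="${TOKEN}" sh -
EOF
```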
I can take a look at this early next week.
This should be unblocked once we get AWS access to use an S3 bucket: #8
The PR is updated with user_data to provision K8s with K3s, added by @dipankardas011. Next step is installing Cilium for CNI using its Helm chart. Once we have the AWS credentials for S3 we can add the secrets to the repo. There is an extra secret needed for the
Helm install script for cilium
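A sketch of that Cilium install step via its Helm chart; the chart repository URL is from the Cilium docs, and in practice we would want to pin a chart version rather than install latest. Note that a K3s server normally ships its own default CNI, so the K3s install would likely need flags to disable it before Cilium takes over.

```shell
#!/bin/sh
# Sketch: generate the Cilium install step using its Helm chart. The chart
# repo URL is from the Cilium docs; pinning a chart version is advisable.
cat > install-cilium.sh <<'EOF'
#!/bin/sh
set -eu
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium --namespace kube-system
EOF
```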
btw, you can also integrate tenv, which supports Terraform as well as OpenTofu (and Terragrunt :) ) in one tool. It allows you to simplify version management.
Cluster API
We may want to use the Equinix Metal Cluster API Provider (CAPEM) for our cluster bootstrapping on the community cluster. Alternatives such as Ansible or Firecracker microVMs are being considered, to work with Falco's setup: cncf/tag-env-sustainability#182
Requirements
The cluster requirements are listed in the design doc.
Equinix infrastructure access
This issue will help us know more about the kind of access that will be needed for individual contributors to the infra. Please see this for some of the available options, and follow up in that thread with any questions/issues.
Documentation
We should document this process as we go.
Development environment
Dev environment setup tracked in this issue: #3