Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add hetzner baremetal cluster stack #125

Merged
merged 13 commits into from
Sep 9, 2024
Merged

✨ Add hetzner baremetal cluster stack #125

merged 13 commits into from
Sep 9, 2024

Conversation

chess-knight
Copy link
Member

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #108

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squash commits
  • include documentation
  • add unit tests

Signed-off-by: Roman Hros <roman.hros@dnation.cloud>
Signed-off-by: Roman Hros <roman.hros@dnation.cloud>
Signed-off-by: Roman Hros <roman.hros@dnation.cloud>
Signed-off-by: Roman Hros <roman.hros@dnation.cloud>
@chess-knight chess-knight marked this pull request as ready for review July 3, 2024 12:00
@chess-knight chess-knight requested a review from janiskemper July 3, 2024 12:00
@chess-knight
Copy link
Member Author

chess-knight commented Jul 3, 2024

Hi @janiskemper. As discussed in the chat, I added you as the reviewer because you are an expert on Hetzner. This is the first version of pure Hetzner baremetal clusterstack(control plane and worker nodes in robot, loadbalancer in the hcloud). The manifests are mostly taken from CAPH(without machine health checks and remediation(missing namespace defaulting kubernetes-sigs/cluster-api#10805)), but I read that it is only for demonstration purposes. Maybe we can collaborate here, and tune this cluster stack by removing/adding/editing kubeadm settings or cluster variables, playing with cluster addons(e.g. cilium values are taken from CAPH, and CCM from syself/charts) and so on. Thanks

@janiskemper
Copy link
Member

Thanks @chess-knight ! One thing you can maybe look at: The CCM we use still mainly is our fork under the syself organization. However, hetzner supports bare metal in the upstream one as well for some time now. You need to configure it in a certain way, but I believe that there should be some documentation about it. If not, you can surely open an issue and ask for it.

This might be a better option for SCS, since our fork usually lacks behind a bit ;)

Otherwise, I would really advice you to integrate custom remediation and MHC. There are a lot of cases where a reboot of the machine fixes it already, especially with bare metal.

@chess-knight
Copy link
Member Author

chess-knight commented Jul 8, 2024

Otherwise, I would really advice you to integrate custom remediation and MHC. There are a lot of cases where a reboot of the machine fixes it already, especially with bare metal.

Right now, the remediationTemplate cannot be added to the cluster stacks(clusterclass), because you need to know the namespace of HetznerBareMetalRemediationTemplate in advance. I fixed it upstream kubernetes-sigs/cluster-api#10843 but we have to wait for the merge and release.
Until the upstream patch, the user can specify MHC in Cluster.spec.topology in the same way(the user will know the namespace of HetznerBareMetalRemediationTemplate in advance).

EDIT: Namespace defaulting for the remediationTemplate will be available in CAPI v1.8.x

Signed-off-by: Roman Hros <roman.hros@dnation.cloud>
Signed-off-by: Roman Hros <roman.hros@dnation.cloud>
Signed-off-by: Roman Hros <roman.hros@dnation.cloud>
@chess-knight chess-knight marked this pull request as draft July 9, 2024 09:57
@chess-knight
Copy link
Member Author

chess-knight commented Jul 9, 2024

Thanks @chess-knight ! One thing you can maybe look at: The CCM we use still mainly is our fork under the syself organization. However, hetzner supports bare metal in the upstream one as well for some time now. You need to configure it in a certain way, but I believe that there should be some documentation about it. If not, you can surely open an issue and ask for it.

This might be a better option for SCS, since our fork usually lacks behind a bit ;)

Hi @janiskemper, I tried to replace Syself's hccm with Hetzner's but one thing doesn't work out of the box. CAPH is setting providerID field hcloud://bm-$SERVER_NUMBER for the machines and hccm is setting hrobot://$SERVER_NUMBER for the nodes. Then IMO CAPI cannot finish joining worker nodes, because the providerID field is different. E.g. it doesn't remove CABPK taint node.cluster.x-k8s.io/uninitialized:NoSchedule, see also hetznercloud/hcloud-cloud-controller-manager#611 which is maybe related. I am falling into this https://github.com/kubernetes-sigs/cluster-api/blob/v1.7.4/internal/controllers/machine/machine_controller_noderef.go#L80.

@janiskemper
Copy link
Member

There is a config I believe. Someone from Hetzner tried this out and uses it. But I don't know where to find it

@chess-knight
Copy link
Member Author

There is a config I believe. Someone from Hetzner tried this out and uses it. But I don't know where to find it

If I see correctly, everything is hardcoded:

Right now the only way I see is to manually change it for all hetznerbaremetalmachines with e.g. KUBE_EDITOR="sed -i 's#hcloud://bm-#hrobot://#'" kubectl edit hetznerbaremetalmachine

@janiskemper
Copy link
Member

search for "prefixRobotLegacy" in the providerid.go. I just had a quick look at the links you provided

@chess-knight
Copy link
Member Author

search for "prefixRobotLegacy" in the providerid.go. I just had a quick look at the links you provided

Yeah I see, but it is there only for migration purposes and not used for the new nodes. Instead "prefixRobot" is used.

@janiskemper
Copy link
Member

I don't know, sorry. I know that they got it running, but I don't know how exactly it works. I haven't tried. You can open an issue I guess and ask for a short guide to use the hcloud ccm with CAPH bare metal

Signed-off-by: Roman Hros <roman.hros@dnation.cloud>
Signed-off-by: Roman Hros <roman.hros@dnation.cloud>
Signed-off-by: Roman Hros <roman.hros@dnation.cloud>
@chess-knight chess-knight marked this pull request as ready for review July 23, 2024 12:01
@chess-knight
Copy link
Member Author

Hi @janiskemper, can you recheck this PR, please?

@chess-knight chess-knight requested a review from guettli July 29, 2024 09:33
Signed-off-by: Roman Hros <roman.hros@dnation.cloud>
@chess-knight chess-knight requested a review from batistein August 15, 2024 12:14
Copy link

@guettli guettli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chess-knight I had a look at the changes. I have not tested it. But it looks good to me.

@chess-knight chess-knight merged commit 373f560 into main Sep 9, 2024
2 checks passed
@chess-knight chess-knight deleted the feat/hetzner branch September 9, 2024 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cluster Stacks for Hetzner
3 participants