Skip to content
This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

feat: flatcar nodes #3380

Merged
merged 24 commits into from Jun 10, 2020
Merged

feat: flatcar nodes #3380

merged 24 commits into from Jun 10, 2020

Conversation

jackfrancis
Copy link
Member

@jackfrancis jackfrancis commented Jun 1, 2020

Reason for Change:

This PR introduces flatcar-backed nodes by specifying the "flatcar" distro. E.g., in an agentPoolProfile definition:

"distro": "flatcar"

Derived from @invidian's original changeset in #3267.

Issue Fixed:

Requirements:

Notes:

@jackfrancis jackfrancis changed the title Flatcar [WIP] Flatcar Jun 1, 2020
@acs-bot
Copy link

acs-bot commented Jun 1, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov
Copy link

codecov bot commented Jun 1, 2020

Codecov Report

Merging #3380 into master will decrease coverage by 0.05%.
The diff coverage is 35.89%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3380      +/-   ##
==========================================
- Coverage   73.20%   73.14%   -0.06%     
==========================================
  Files         147      147              
  Lines       24975    25008      +33     
==========================================
+ Hits        18282    18293      +11     
- Misses       5563     5582      +19     
- Partials     1130     1133       +3     
Impacted Files Coverage Δ
pkg/api/defaults.go 92.58% <ø> (ø)
pkg/armhelpers/support_validator.go 64.06% <0.00%> (-2.07%) ⬇️
pkg/engine/template_generator.go 81.49% <0.00%> (-0.33%) ⬇️
pkg/engine/templates_generated.go 54.08% <ø> (ø)
pkg/engine/virtualmachines.go 80.04% <0.00%> (-0.94%) ⬇️
pkg/engine/virtualmachinescalesets.go 80.30% <0.00%> (-0.98%) ⬇️
pkg/engine/vmextensions.go 94.55% <0.00%> (-0.95%) ⬇️
pkg/api/vlabs/types.go 71.83% <28.57%> (-1.47%) ⬇️
pkg/api/vlabs/validate.go 80.29% <80.00%> (+0.03%) ⬆️
pkg/api/mocks.go 100.00% <100.00%> (ø)
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e3259e0...0147f76. Read the comment docs.

ExecStart=/opt/bin/etcd $DAEMON_ARGS
- name: etcd-member.service
mask: true
{{else}}
disk_setup:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnsonshi it looks like flatcar doesn't support these cloud-init interfaces? How should we be mounting /var/lib/etcddisk on flatcar-enabled master VMs?

Alternatively, we can only add flatcar support as a worker node, and require Ubuntu for master nodes.

cc @rbitia @ritazh

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Coreos-cloudinit (the CoreOS/Flatcar Golang fork of upstream Python cloud-init) does not contain the necessary udev rules to automatically create the well-defined symlinks for the etcd data disk. See cloud-init udev rules

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@invidian can you confirm that it makes sense to scope this work to worker nodes only? I can fix up the PR so that flatcar is enabled as a node distro, but not available for the master (control plane) VMs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other alternative would be for Flatcar to include those cloud-init udev rules so that the symlinks are created. However that may exceed this PR's scope.

Copy link

@johnsonshi johnsonshi Jun 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waagent also has the udev rules that creates symlinks to data disks at /dev/disk/azure/scsi/lun#.

I deployed a Flatcar VM. I confirmed that the udev rules for data disks are present. I then used udevadm monitor to monitor for udev events, then used a separate console to attach a data disk. From the monitor I see that the udev rule is fired. After that, I see that the symlink has been created.

https://paste.ubuntu.com/p/rqCqh7BPcT/

@jackfrancis because the symlink is created, that should mean that Flatcar should be okay for master nodes right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it worked by the time I submitted the PR, but febebda changed the way etcddisk is mounted 😞

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@invidian Understood, moving etcd data disk mounting to cloud-init was a long-standing goal, in order to improve determinism (for Ubuntu, cloud-init actively engages in fs operations during boot time, and doing related operations orthogonally/concurrently via shell script can introduce edge cases).

We're hesitant to add more maintenance surface area for the control plane implementation. How about we scope this change down to just worker nodes? Having a streamlined OS supporting nodes is the primary user value here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explanation @jackfrancis. I understand the reasoning behind this change and it seem indeed reasonable. It's just unfortunate that it happen and this time and that Flatcar does not support required options.

I think it's fine if we add support only for workers for now and I will collaborate with Flatcar team to maybe add those missing options to Flatcar itself.

Who should revert the changes done in this PR for master nodes? Also, is there a way to keep this revert as a separate commit? Once Flatcar has support for missing options, it would be easier to add support for master nodes then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to fix up this PR to make it worker node-only.

I'll file a backlog item that documents that flatcar isn't yet implemented for control plane VMs, and will reference your original PR, which is probably the cleanest changeset representation of how we may want to revisit that work in the future.

How does that sound?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, thank you for your help!

@jackfrancis
Copy link
Member Author

Pro tip for flatcar cloud-init debugging: journalctl -u oem-cloudinit

- path: "/etc/kubernetes/manifests/.keep"

{{- if .OrchestratorProfile.KubernetesConfig.RequiresDocker}}
groups:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@invidian, oem-cloudinit is complaining that groups is not recognized:

Jun 04 23:25:23 k8s-master-36217359-0 coreos-cloudinit[871]: 2020/06/04 23:25:23 line 362: warning: unrecognized key "groups"

if [[ ! -f "/usr/local/bin/kubectl-${KUBERNETES_VERSION}" ]]; then
path=/usr/local/bin
if [[ $OS == $FLATCAR_OS_NAME ]]; then
path=/opt
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@invidian is there a reason we're installing kubelet and kubectl in /opt/ instead of /opt/bin? Should we change this to /opt/bin?

@@ -175,12 +188,16 @@ installImg() {
extractHyperkube() {
CLI_TOOL=$1
path="/home/hyperkube-downloads/${KUBERNETES_VERSION}"
targetpath="/usr/local/bin"
if [[ $OS == $FLATCAR_OS_NAME ]]; then
targetpath="/opt"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry @invidian , earlier comment was for kublet only, same question for kubectl here, should we install in /opt/bin?

Copy link
Contributor

@invidian invidian Jun 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it should be OK to install it in /opt/bin. I used /opt, as it was used for some other binary IIRC.

}
extractKubeBinaries() {
KUBE_BINARY_URL=${KUBE_BINARY_URL:-"https://kubernetesartifacts.azureedge.net/kubernetes/v${KUBERNETES_VERSION}/binaries/kubernetes-node-linux-amd64.tar.gz"}
K8S_TGZ_TMP=${KUBE_BINARY_URL##*/}
mkdir -p "${K8S_DOWNLOADS_DIR}"
retrycmd_get_tarball 120 5 "$K8S_DOWNLOADS_DIR/${K8S_TGZ_TMP}" ${KUBE_BINARY_URL} || exit 31
path=/usr/local/bin
if [[ $OS == $FLATCAR_OS_NAME ]]; then
path=/opt
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@invidian and same comment here

@acs-bot acs-bot added size/XL and removed size/XXL labels Jun 5, 2020
@jackfrancis jackfrancis changed the title [WIP] Flatcar feat: flatcar nodes Jun 5, 2020
@@ -21,6 +21,15 @@ spec:
{{- end}}
spec:
priorityClassName: system-cluster-critical
affinity:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@invidian adding these declarations instead of not allowing the addon to be installed in a flatcar scenario, as we support heterogenous node pool configurations

@@ -0,0 +1,2 @@
maintainers:
- invidian
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@invidian is there anyone else at kinvolk that would like to maintain this going forward?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll ask around, perhaps it would be good to have more than one person around. Maybe @iaguis?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for volunteering me :D

Happy to be added here.

"dnsPrefix": "",
"vmSize": "Standard_D2_v3"
},
"agentPoolProfiles": [

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackfrancis Just reviewed these + the options defined in https://github.com/Azure/aks-engine/blob/master/docs/topics/clusterdefinitions.md#agentPoolProfiles

LGTM for agentPoolProfiles test configs.

"ssh": {
"publicKeys": [
{
"keyData": ""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackfrancis Is the doc at https://github.com/Azure/aks-engine/blob/master/docs/topics/clusterdefinitions.md#notes-on-ssh-public-keys out of date? That doc suggests that at least 1 ssh pubkey is required for linuxProfile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The test runner that we ship as part of the AKS Engine project interprets these empty strings as "I'll create ssh keys for you", so that we don't have to publish public keys in public.

Something similar is happening for the empty string service principal values below.

@jackfrancis jackfrancis force-pushed the flatcar branch 2 times, most recently from fdffd43 to 3ac7339 Compare June 10, 2020 23:29
As CoreOS will be EOL soon.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
invidian and others added 21 commits June 10, 2020 16:30
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
If OS is different than UBUNTU_OS_NAME, then UBUNTU_RELEASE is empty,
which breaks the if condition.

UBUNTU_RELEASE should only be checked if OS == UBUNTU_OS_NAME.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Otherwise creating VMs fails.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
@jackfrancis
Copy link
Member Author

Observed consistent failures in HPA functionality in 1.19-beta.1, and filed an issue here:

#3446

Tests look good across 1.15-1.18, so this is ready to merge.

@jackfrancis jackfrancis merged commit 87e89f6 into Azure:master Jun 10, 2020
backlog automation moved this from In progress to Done Jun 10, 2020
@jackfrancis jackfrancis deleted the flatcar branch June 10, 2020 23:37
penggu pushed a commit to penggu/aks-engine that referenced this pull request Oct 28, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
backlog
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants