Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jackfrancis The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Codecov Report
@@ Coverage Diff @@
## master #3380 +/- ##
==========================================
- Coverage 73.20% 73.14% -0.06%
==========================================
Files 147 147
Lines 24975 25008 +33
==========================================
+ Hits 18282 18293 +11
- Misses 5563 5582 +19
- Partials 1130 1133 +3
Continue to review full report at Codecov.
ExecStart=/opt/bin/etcd $DAEMON_ARGS
- name: etcd-member.service
  mask: true
{{else}}
disk_setup:
@johnsonshi it looks like flatcar doesn't support these cloud-init interfaces? How should we be mounting /var/lib/etcddisk on flatcar-enabled master VMs?
Alternatively, we can only add flatcar support as a worker node, and require Ubuntu for master nodes.
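For reference, a minimal sketch of the upstream cloud-init interface being discussed (`disk_setup`/`fs_setup`/`mounts` are real cloud-init module keys that Ubuntu's cloud-init supports and coreos-cloudinit does not); the device path and filesystem choice here are illustrative, not the exact values from this PR:

```yaml
# Partition, format, and mount the etcd data disk via cloud-init modules
# (hypothetical values; assumes the Azure udev symlink for LUN 0 exists).
disk_setup:
  /dev/disk/azure/scsi1/lun0:
    table_type: gpt
    layout: true
    overwrite: false

fs_setup:
  - device: /dev/disk/azure/scsi1/lun0
    partition: 1
    filesystem: ext4

mounts:
  - [/dev/disk/azure/scsi1/lun0-part1, /var/lib/etcddisk]
```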
Agree. Coreos-cloudinit (the CoreOS/Flatcar Golang fork of upstream Python cloud-init) does not contain the necessary udev rules to automatically create the well-defined symlinks for the etcd data disk. See cloud-init udev rules
@invidian can you confirm that it makes sense to scope this work to worker nodes only? I can fix up the PR so that flatcar is enabled as a node distro, but not available for the master (control plane) VMs.
The other alternative would be for Flatcar to include those cloud-init udev rules so that the symlinks are created. However that may exceed this PR's scope.
Waagent also has udev rules that create symlinks to data disks at /dev/disk/azure/scsi/lun#.
I deployed a Flatcar VM and confirmed that the udev rules for data disks are present. I then used udevadm monitor to watch for udev events, and used a separate console to attach a data disk. From the monitor I saw the udev rule fire, and afterwards the symlink had been created.
https://paste.ubuntu.com/p/rqCqh7BPcT/
@jackfrancis because the symlink is created, that should mean Flatcar is okay for master nodes, right?
Oh, it worked at the time I submitted the PR, but febebda changed the way etcddisk is mounted 😞
@invidian Understood, moving etcd data disk mounting to cloud-init was a long-standing goal, in order to improve determinism (for Ubuntu, cloud-init actively engages in fs operations during boot time, and doing related operations orthogonally/concurrently via shell script can introduce edge cases).
We're hesitant to add more maintenance surface area for the control plane implementation. How about we scope this change down to just worker nodes? Having a streamlined OS supporting nodes is the primary user value here.
Thanks for the explanation @jackfrancis. I understand the reasoning behind this change and it does seem reasonable. It's just unfortunate that it happened at this time and that Flatcar does not support the required options.
I think it's fine if we add support only for workers for now, and I will work with the Flatcar team to maybe add those missing options to Flatcar itself.
Who should revert the changes done in this PR for master nodes? Also, is there a way to keep this revert as a separate commit? Once Flatcar has support for the missing options, it would be easier to add support for master nodes later.
I'm happy to fix up this PR to make it worker node-only.
I'll file a backlog item that documents that flatcar isn't yet implemented for control plane VMs, and will reference your original PR, which is probably the cleanest changeset representation of how we may want to revisit that work in the future.
How does that sound?
Sounds good to me, thank you for your help!
Pro tip for flatcar cloud-init debugging:
- path: "/etc/kubernetes/manifests/.keep"
{{- if .OrchestratorProfile.KubernetesConfig.RequiresDocker}}
groups:
@invidian, oem-cloudinit is complaining that groups is not recognized:
Jun 04 23:25:23 k8s-master-36217359-0 coreos-cloudinit[871]: 2020/06/04 23:25:23 line 362: warning: unrecognized key "groups"
if [[ ! -f "/usr/local/bin/kubectl-${KUBERNETES_VERSION}" ]]; then
path=/usr/local/bin
if [[ $OS == $FLATCAR_OS_NAME ]]; then
  path=/opt
@invidian is there a reason we're installing kubelet and kubectl in /opt/ instead of /opt/bin? Should we change this to /opt/bin?
@@ -175,12 +188,16 @@ installImg() {
extractHyperkube() {
  CLI_TOOL=$1
  path="/home/hyperkube-downloads/${KUBERNETES_VERSION}"
  targetpath="/usr/local/bin"
  if [[ $OS == $FLATCAR_OS_NAME ]]; then
    targetpath="/opt"
sorry @invidian, the earlier comment was for kubelet only. Same question for kubectl here: should we install in /opt/bin?
Yeah, it should be OK to install it in /opt/bin. I used /opt, as it was used for some other binary IIRC.
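To summarize this thread, a sketch of the distro-aware install-path selection (the helper function here is hypothetical; the PR inlines this logic in the provisioning scripts):

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the path choice discussed above:
# /usr is read-only on Flatcar, so binaries are installed under /opt/bin
# (on Flatcar's default PATH) instead of the usual /usr/local/bin.
FLATCAR_OS_NAME="flatcar"

install_path_for() {
  local os="$1"
  local path="/usr/local/bin"
  if [[ "$os" == "$FLATCAR_OS_NAME" ]]; then
    path="/opt/bin"
  fi
  echo "$path"
}

install_path_for "ubuntu"             # prints /usr/local/bin
install_path_for "$FLATCAR_OS_NAME"   # prints /opt/bin
```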
}
extractKubeBinaries() {
  KUBE_BINARY_URL=${KUBE_BINARY_URL:-"https://kubernetesartifacts.azureedge.net/kubernetes/v${KUBERNETES_VERSION}/binaries/kubernetes-node-linux-amd64.tar.gz"}
  K8S_TGZ_TMP=${KUBE_BINARY_URL##*/}
  mkdir -p "${K8S_DOWNLOADS_DIR}"
  retrycmd_get_tarball 120 5 "$K8S_DOWNLOADS_DIR/${K8S_TGZ_TMP}" ${KUBE_BINARY_URL} || exit 31
  path=/usr/local/bin
  if [[ $OS == $FLATCAR_OS_NAME ]]; then
    path=/opt
@invidian and same comment here
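As an aside, the `${KUBE_BINARY_URL##*/}` expansion in the snippet above derives the tarball filename from the URL; a standalone sketch (the version value is illustrative):

```shell
#!/usr/bin/env bash
# Illustrative values; KUBERNETES_VERSION is whatever the cluster targets.
KUBERNETES_VERSION="1.18.3"
KUBE_BINARY_URL="https://kubernetesartifacts.azureedge.net/kubernetes/v${KUBERNETES_VERSION}/binaries/kubernetes-node-linux-amd64.tar.gz"
# ${VAR##*/} greedily removes everything up to and including the last '/',
# leaving only the file name.
K8S_TGZ_TMP=${KUBE_BINARY_URL##*/}
echo "$K8S_TGZ_TMP"   # prints kubernetes-node-linux-amd64.tar.gz
```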
@@ -21,6 +21,15 @@ spec:
{{- end}}
spec:
  priorityClassName: system-cluster-critical
  affinity:
@invidian adding these declarations instead of not allowing the addon to be installed in a flatcar scenario, as we support heterogeneous node pool configurations
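A hedged sketch of what such a scheduling constraint can look like in a Kubernetes pod spec (the field names are standard Kubernetes API; the exact key and values used by this addon may differ):

```yaml
# Hypothetical example: constrain the addon to Linux nodes via node
# affinity rather than disabling it when a Flatcar pool is present.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/os
              operator: In
              values:
                - linux
```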
@@ -0,0 +1,2 @@
maintainers:
- invidian
@invidian is there anyone else at kinvolk that would like to maintain this going forward?
I'll ask around, perhaps it would be good to have more than one person around. Maybe @iaguis?
Thanks for volunteering me :D
Happy to be added here.
"dnsPrefix": "",
"vmSize": "Standard_D2_v3"
},
"agentPoolProfiles": [
@jackfrancis Just reviewed these + the options defined in https://github.com/Azure/aks-engine/blob/master/docs/topics/clusterdefinitions.md#agentPoolProfiles
LGTM for agentPoolProfiles test configs.
"ssh": {
  "publicKeys": [
    {
      "keyData": ""
@jackfrancis Is the doc at https://github.com/Azure/aks-engine/blob/master/docs/topics/clusterdefinitions.md#notes-on-ssh-public-keys out of date? That doc suggests that at least 1 ssh pubkey is required for linuxProfile.
Good question. The test runner that we ship as part of the AKS Engine project interprets these empty strings as "I'll create ssh keys for you", so that we don't have to publish public keys in public.
Something similar is happening for the empty string service principal values below.
Force-pushed from fdffd43 to 3ac7339 (compare)
This reverts commit b1284c7.
As CoreOS will be EOL soon. Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
If OS is different from UBUNTU_OS_NAME, then UBUNTU_RELEASE is empty, which breaks the if condition. UBUNTU_RELEASE should only be checked if OS == UBUNTU_OS_NAME. Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
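The fix described in this commit message can be sketched as follows (the function and values are illustrative, not the exact script from the PR):

```shell
#!/usr/bin/env bash
# On non-Ubuntu distros UBUNTU_RELEASE is empty, so the release must only
# be consulted after confirming the OS is Ubuntu.
UBUNTU_OS_NAME="ubuntu"

needs_release_specific_setup() {
  local os="$1" release="$2"
  # Short-circuit: the release comparison only runs for Ubuntu hosts.
  [[ "$os" == "$UBUNTU_OS_NAME" && "$release" == "18.04" ]]
}

if needs_release_specific_setup "flatcar" ""; then
  echo "run 18.04-specific setup"
else
  echo "skip"   # a Flatcar host never inspects the (empty) release
fi
```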
Otherwise creating VMs fails. Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Observed consistent failures in HPA functionality in 1.19-beta.1, and filed an issue here:
Tests look good across 1.15-1.18, so this is ready to merge.
Reason for Change:
This PR introduces flatcar-backed nodes by specifying the "flatcar" distro. E.g., in an agentPoolProfile definition: "distro": "flatcar"
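For example, a minimal agentPoolProfiles fragment using the new distro (all values other than "distro" are illustrative):

```json
{
  "agentPoolProfiles": [
    {
      "name": "agentpool1",
      "count": 3,
      "vmSize": "Standard_D2_v3",
      "distro": "flatcar"
    }
  ]
}
```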
Derived from @invidian's original changeset in #3267.
Issue Fixed:
Requirements:
Notes: