kubeadm UX improvements for the v1.5 stable release #37568

Merged

Conversation

luxas
Member

@luxas luxas commented Nov 28, 2016

This PR targets the next stable kubeadm release.

It's work in progress, but please comment on it and review, since there are many changes.

I tried to group the commits logically, so you can review them separately.

Q: Why this large PR? Why not many small?
A: Because of the Submit Queue and the time it takes.

PTAL @kubernetes/sig-cluster-lifecycle

Edit: This work was split up into three PRs in total


@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 28, 2016
@pires
Contributor

pires commented Nov 28, 2016

@luxas while I understand your frustration, the process is in place for everyone and so everyone should follow it. Even if I'm not assigned to this, I wanted to review it. However, in the end, it's a hard task. If at least we had unit/e2e tests..

@k8s-github-robot k8s-github-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. release-note-label-needed labels Nov 28, 2016
@luxas
Member Author

luxas commented Nov 28, 2016

@pires I'm not frustrated at all.
I'm just following the approach we discussed in kubernetes-dev: the fewer kubeadm PRs we have to merge this week, the better, given the rush of PRs we have right now.

This was a decision, but I definitely don't think this is too overwhelming to review: we reviewed ~5000 LOC in the initial PR, and now I've grouped the changes into some commits.

And in comparison: #36263 is twice the size of this one.

Also, this code is battle-tested e2e-wise (manually), I've used it for spinning up a lot of DigitalOcean clusters last week.

I will continue to work on it and rebase upon all changes I'm merging this week, and then get this up for final review and merge. But please choose a commit and start looking at it.

@luxas luxas added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Nov 28, 2016
@pires
Contributor

pires commented Nov 28, 2016

And in comparison: #36263 is twice the size of this one.

I won't measure PR complexity with LOC. The PR you linked is purely unit-testing, this is not. It touches a lot of the different pieces of code and so, to me, it's complicated to review properly.

Anyway, this is just my two cents. If you guys decided to do it, by all means! 👍

@luxas
Member Author

luxas commented Nov 28, 2016

Read kubernetes-dev on Slack :), but again, commits are reviewable separately

@errordeveloper
Member

I agree with @pires.

@marun
Contributor

marun commented Nov 29, 2016

I don't think this PR is reviewable in its current form. Even if it were acceptable to lump everything together - and I don't think it is - it is essential to ensure that the message associated with each commit clearly documents the intent of the code changes. Without that coherency, a reviewer will have a difficult time providing useful feedback for commits like 'wip' and 'a lot of changes'.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 1, 2016
@luxas luxas force-pushed the various_kubeadm_improvements branch from a0113b3 to d7e49d4 Compare December 4, 2016 20:20
@luxas
Member Author

luxas commented Dec 4, 2016

Rebased and updated, this is still WIP, but the 7th, 8th and maybe 9th commit can now be reviewed.
The first 6 commits are from #37835 and #37831, so don't be scared of the size.

@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 4, 2016
@k8s-ci-robot
Contributor

Jenkins kops AWS e2e failed for commit d7e49d40c27b3a67f958dfd1fec23187b668cb7d. Full PR test history.

The magic incantation to run this job again is @k8s-bot kops aws e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@luxas
Member Author

luxas commented Dec 6, 2016

As soon as #37831 and #37835 merge, I'm gonna rebase/update this PR to be mergeable (probably tomorrow), but here is a sneak peek of what it now looks like:

Old kubeadm init (terrible!):

Running pre-flight checks
I1206 14:53:12.397845   15515 validators.go:50] Validating os...
I1206 14:53:12.398782   15515 validators.go:50] Validating kernel...
I1206 14:53:12.399455   15515 kernel_validator.go:77] Validating kernel version
I1206 14:53:12.399539   15515 kernel_validator.go:92] Validating kernel config
I1206 14:53:12.409318   15515 validators.go:50] Validating cgroups...
I1206 14:53:12.409391   15515 validators.go:50] Validating docker...
Using Kubernetes version: v1.4.6
<master/tokens> generated token: "579bd7.3d2b5a1cd24b6964"
<master/pki> generated Certificate Authority key and certificate:
Issuer: CN=kubernetes | Subject: CN=kubernetes | CA: true
Not before: 2016-12-06 12:53:13 +0000 UTC Not After: 2026-12-04 12:53:13 +0000 UTC
Public: /etc/kubernetes/pki/ca-pub.pem
Private: /etc/kubernetes/pki/ca-key.pem
Cert: /etc/kubernetes/pki/ca.pem
<master/pki> generated API Server key and certificate:
Issuer: CN=kubernetes | Subject: CN=kube-apiserver | CA: false
Not before: 2016-12-06 12:53:13 +0000 UTC Not After: 2017-12-06 12:53:14 +0000 UTC
Alternate Names: [192.168.1.115 10.96.0.1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local]
Public: /etc/kubernetes/pki/apiserver-pub.pem
Private: /etc/kubernetes/pki/apiserver-key.pem
Cert: /etc/kubernetes/pki/apiserver.pem
<master/pki> generated Service Account Signing keys:
Public: /etc/kubernetes/pki/sa-pub.pem
Private: /etc/kubernetes/pki/sa-key.pem
<master/pki> created keys and certificates in "/etc/kubernetes/pki"
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"
<util/kubeconfig> created "/etc/kubernetes/admin.conf"
<master/apiclient> created API client configuration
<master/apiclient> created API client, waiting for the control plane to become ready
<master/apiclient> all control plane components are healthy after 71.296436 seconds
<master/apiclient> waiting for at least one node to register and become ready
<master/apiclient> first node is ready after 1.502372 seconds
<master/apiclient> attempting a test deployment
<master/apiclient> test deployment succeeded
<master/apiclient> failed to delete test deployment [no kind "DeleteOptions" is registered for version "kubeadm.k8s.io/v1alpha1"] (will ignore)<master/discovery> created essential addon: kube-discovery, waiting for it to become ready
<master/discovery> kube-discovery is ready after 3.002425 seconds
<master/addons> created essential addon: kube-proxy
<master/addons> created essential addon: kube-dns

Kubernetes master initialised successfully!

You can now join any number of machines by running the following on each node:

kubeadm join --token=579bd7.3d2b5a1cd24b6964 192.168.1.115

New kubeadm init:

[kubeadm] Bear in mind that kubeadm is in alpha, do not use it in production clusters.
[preflight] Running pre-flight checks...
[preflight] Starting the kubelet service by running "systemctl start kubelet"
[init] Using Kubernetes version: v1.4.6
[tokens] Generated token: "e69295.cc499553b867df2c"
[certificates] Generated Certificate Authority key and certificate.
[certificates] Generated API Server key and certificate
[certificates] Generated Service Account signing keys
[certificates] Created keys and certificates in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[apiclient] Created API client, waiting for the control plane to become ready
[apiclient] All control plane components are healthy after 26.777194 seconds
[apiclient] Waiting for at least one node to register and become ready
[apiclient] First node is ready after 0.503150 seconds
[apiclient] Creating a test deployment
[apiclient] Test deployment succeeded
[token-discovery] Created the kube-discovery deployment, waiting for it to become ready
[token-discovery] kube-discovery is ready after 3.002083 seconds
[addons] Created essential addon: kube-proxy
[addons] Created essential addon: kube-dns

Your Kubernetes master has initialized successfully!

But you still need to deploy a pod network to the cluster.
You should "kubectl apply -f" some pod network yaml file that's listed at:
    http://kubernetes.io/docs/admin/addons/

You can now join any number of machines by running the following on each node:

kubeadm join --token=e69295.cc499553b867df2c 192.168.255.6

There are similar improvements to kubeadm join and kubeadm reset, and the output stays user-friendly in the various cases where things fail.

As said earlier, the last three commits are the real ones. Please only look at them.
Here are the changes in a human-readable format:

  • Mark socat, ethtool and ebtables as soft deps, since kubelet can be run in a container.
  • Auto-start the kubelet service if it isn't active. This is really convenient. If kubeadm does it, it informs the user that it ran that command so the user knows what's happening.
  • Renamed /etc/kubernetes/cloud-config.json to /etc/kubernetes/cloud-config since it shouldn't be a json file
  • A lot of logging improvements
  • Removed dead code
  • Refactored the code so setting KUBE_KUBERNETES_DIR and KUBE_HOST_PKI_PATH actually works
  • Simplification of the code
  • Made a small logging/output framework:
    • fmt.Println("[the-stage-here] Capital first letter of this message. Tell the user what the current state is")
    • fmt.Printf("[the-stage-here] Capital first letter. Maybe a [%v] in the end if an error should be displayed. Always ends with \n")
    • fmt.Errorf("Never starts with []. Includes a short error message plus the underlying error in [%v]. Never ends with \n")
    • In short: made everything consistent, since now everything is done differently which is a mess...

k8s-github-robot pushed a commit that referenced this pull request Dec 7, 2016
Automatic merge from submit-queue (batch tested with PRs 38194, 37594, 38123, 37831, 37084)

Improve kubeadm reset

Depends on: #36474
Broken out from: #37568
Carries: #35709, @camilocot

This makes the `kubeadm reset` command more robust and user-friendly.
I'll rebase after #36474 merges...

cc-ing reviewers: @mikedanese @errordeveloper @dgoodwin @jbeda
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 7, 2016
@luxas luxas force-pushed the various_kubeadm_improvements branch from d7e49d4 to 784f276 Compare December 7, 2016 07:04
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 7, 2016
@luxas
Member Author

luxas commented Dec 7, 2016

@mikedanese Please review this today and if possible add lgtm. See: #37568 (comment) for an explanation of what changes. This fixes a lot of bugs currently at HEAD as well.

@dgoodwin Ready for yet another pass.

When this is merged, which should be today or tomorrow, we can cut the next kubeadm release for v1.5 from current master.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 8, 2016
Contributor

@dgoodwin dgoodwin left a comment

Couple more small changes, mostly text, testing looks good on my vms.

Your Kubernetes master has initialized successfully!

But you still need to deploy a pod network to the cluster.
You should "kubectl apply -f" some pod network yaml file that's listed at:
Contributor

Suggest:

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:

Member Author

Fixed, thanks


fmt.Printf("[preflight] Starting the kubelet systemd service by running %q\n", "systemctl start kubelet")
if err := initSystem.ServiceStart("kubelet"); err != nil {
fmt.Println("[preflight] Couldn't start the kubelet service via systemd. Please start the kubelet service manually and try again.")
Contributor

Should we include the "err" here? Might be something relevant to the user in there.

Contributor

Also not great that we're mentioning systemd multiple times here despite an attempt to remain init system agnostic.

Why don't we just go to:

Starting kubelet service...
WARNING: Unable to start kubelet service: %s
WARNING: Please ensure kubelet is running manually.

Drop "try again" as I think we just proceed here, and kubeadm will hang if it's not actually running.

Member Author

Fixing

@@ -22,6 +22,7 @@ import (
"html/template"
"io"
"io/ioutil"
"os"
Contributor

With testing I notice that in this file we're outputting:

[preflight] Starting the kubelet systemd service by running "systemctl start kubelet"
Using Kubernetes version: v1.4.6

Where that is about the only line in all output that doesn't have a [something] prefix.

Member Author

That was a rebase conflict, thanks for catching

@@ -66,7 +66,7 @@ func NewReset(skipPreFlight, removeNode bool) (*Reset, error) {
if !skipPreFlight {
fmt.Println("[preflight] Running pre-flight checks...")

-if err := preflight.RunResetCheck(); err != nil {
+if err := preflight.RunChecks([]preflight.PreFlightCheck{preflight.IsRootCheck{}}, os.Stderr); err != nil {
Contributor

Do you want to slip one more fix into this PR? :)

We clean directories, then shut down all running containers. This can result in etcd writing more to the directory after we cleaned it; your next init then fails and you have to run reset a second time, which will work.

We need to clean up directories after shutting down all containers.

I can get this in a followup PR if you prefer.

Member Author

I'll fix it
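
The ordering issue raised in this thread can be illustrated with a small sketch in which containers are stopped before any directory is cleaned. Everything here (`resetStep`, `runReset`, `resetSteps` and the step names) is hypothetical and only models the sequencing, not kubeadm's actual reset code.

```go
package main

import "fmt"

// resetStep is one named action in a reset sequence. The type and the
// step names used below are hypothetical and only illustrate ordering.
type resetStep struct {
	name string
	run  func() error
}

// runReset executes the steps in order and stops at the first failure,
// returning the names of the steps that completed.
func runReset(steps []resetStep) ([]string, error) {
	var done []string
	for _, s := range steps {
		if err := s.run(); err != nil {
			return done, fmt.Errorf("reset step %q failed [%v]", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

// resetSteps puts containers first and directories second, so that nothing
// (etcd, for example) can write into a directory after it has been cleaned.
func resetSteps(stopContainers, cleanDirs func() error) []resetStep {
	return []resetStep{
		{name: "stop-containers", run: stopContainers},
		{name: "clean-directories", run: cleanDirs},
	}
}

func main() {
	noop := func() error { return nil }
	done, err := runReset(resetSteps(noop, noop))
	fmt.Println(done, err)
}
```

Stopping at the first failed step also means directories are left alone if the containers could not be shut down, which avoids the half-cleaned state described above.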

}

// Then continue with the others...
if err := preflight.RunJoinNodeChecks(cfg); err != nil {
return nil, &preflight.PreFlightError{Msg: err.Error()}
Member

Why don't we just return a preflight.PreFlightError from RunJoinNodeChecks?

Member Author

Seems like no one noticed it before, fixing

fmt.Println("[preflight] Running pre-flight checks...")

// First, check if we're root separately from the other preflight checks and fail fast
if err := preflight.RunChecks([]preflight.PreFlightCheck{preflight.IsRootCheck{}}, os.Stderr); err != nil {
Member

Consider moving into preflight package.

Member Author

Yes, gonna do that


if err := client.Extensions().Deployments(api.NamespaceSystem).Delete("dummy", &v1.DeleteOptions{}); err != nil {
Member

Let's just not create if we are not going to delete it.

Member

Have you checked if #38330 fixes your problem? Can we just put a priority on getting that merged?

Member Author

It does, tested it now.
Gonna revert this and wait for #38330, but hopefully it will merge very soon

@luxas
Member Author

luxas commented Dec 8, 2016

Thanks for the reviews @mikedanese and @dgoodwin

If there's anything else you think should be fixed, please comment now. Otherwise I'll rebase and make it ready for merge tomorrow so we can get it in in time.

…un in a container. Also refactor preflight.go a little bit and improve logging
…on-root to avoid strange errors. Also auto-start the kubelet if inactive
… I just hacked on this and modified everything I thought was messy or could be done better.

Fix boilerplates, comments in the code and make the output of kubeadm more user-friendly
Start using HostPKIPath and KubernetesDir everywhere in the code, so they can be changed for real
More robust kubeadm reset code now.
Removed old glog-things from app.Run()
Renamed /etc/kubernetes/cloud-config.json to /etc/kubernetes/cloud-config since it shouldn't be a json file
Simplification of the code
Less verbose output from master/pki.go
Cleaned up dead code

Start a small logging/output framework:
 - fmt.Println("[the-stage-here] Capital first letter of this message. Tell the user what the current state is")
 - fmt.Printf("[the-stage-here] Capital first letter. Maybe a [%v] in the end if an error should be displayed. Always ends with \n")
 - fmt.Errorf("Never starts with []. Includes a short error message plus the underlying error in [%v]. Never ends with \n")
@luxas luxas force-pushed the various_kubeadm_improvements branch from f1cf640 to 50b1077 Compare December 9, 2016 12:34
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 9, 2016
@k8s-ci-robot
Contributor

Jenkins GCI GKE smoke e2e failed for commit 50b1077625f4c5f75d7aded501b5c9634db79396. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@luxas
Member Author

luxas commented Dec 9, 2016

@dgoodwin Looks ok?

@dgoodwin
Contributor

dgoodwin commented Dec 9, 2016

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 9, 2016
@mikedanese
Member

Please squash fixup commits

@luxas
Member Author

luxas commented Dec 9, 2016

@mikedanese The fourth and the fifth one? Sure.
Also, seems like I have to update bazel as well...

@luxas luxas force-pushed the various_kubeadm_improvements branch from 50b1077 to db4ab53 Compare December 9, 2016 19:42
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 9, 2016
@k8s-ci-robot
Contributor

Jenkins verification failed for commit db4ab532a096420a5ce87eb374b9b591df29e8f0. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

… Set --kubelet-preferred-address-types on v1.5 and higher clusters
@luxas luxas force-pushed the various_kubeadm_improvements branch from db4ab53 to b060304 Compare December 9, 2016 20:17
@k8s-ci-robot
Contributor

Jenkins GKE smoke e2e failed for commit db4ab532a096420a5ce87eb374b9b591df29e8f0. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@luxas luxas added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 9, 2016
@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 37270, 38309, 37568, 34554)

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
8 participants