New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kubeadm UX improvements for the v1.5 stable release #37568
kubeadm UX improvements for the v1.5 stable release #37568
Conversation
@luxas while I understand your frustration, the process is in place for everyone and so everyone should follow it. Even if I'm not assigned to this, I wanted to review it. However, in the end, it's a hard task. If at least we had unit/e2e tests.. |
@pires I'm not frustrated at all. This was a decision, but I definitely don't think this is too overwhelming to review, we reviewed ~5000 LOC in the initial, and now I've grouped the changes in some commits. And in comparision: #36263 is twice the size of this one. Also, this code is battle-tested e2e-wise (manually), I've used it for spinning up a lot of DigitalOcean clusters last week. I will continue to work on it and rebase upon all changes I'm merging this week, and then get this up for final review and merge. But please choose a commit and start looking at it. |
I won't measure PR complexity with LOC. The PR you linked is purely unit-testing, this is not. It touches a lot of the different pieces of code and so, to me, it's complicated to review properly. Anyway, this is just my two cents. If you guys decided to do it, by all means! 👍 |
Read kubernetes-dev on Slack :), but again, commits are reviewable separately |
I agree with @pires. |
I don't think this PR is reviewable in its current form. Even if it were acceptable to lump everything together - and I don't think it is - it is essential to ensure that the message associated with each commit clearly documents the intent of the code changes. Without that coherency, a reviewer will have a difficult time providing useful feedback for commits like 'wip' and 'a lot of changes'. |
a0113b3
to
d7e49d4
Compare
Jenkins kops AWS e2e failed for commit d7e49d40c27b3a67f958dfd1fec23187b668cb7d. Full PR test history. The magic incantation to run this job again is |
As soon as #37831 and #37835 merge, I'm gonna rebase/update this PR to be mergeable (probably tomorrow), but here is a sneak peek what it now looks like: Old
New
And similar improvements to As said earlier, the three last commits are the real ones. Please only look at them.
|
Automatic merge from submit-queue (batch tested with PRs 38194, 37594, 38123, 37831, 37084) Improve kubeadm reset Depends on: #36474 Broken out from: #37568 Carries: #35709, @camilocot This makes the `kubeadm reset` command more robust and user-friendly. I'll rebase after #36474 merges... cc-ing reviewers: @mikedanese @errordeveloper @dgoodwin @jbeda
d7e49d4
to
784f276
Compare
@mikedanese Please review this today and if possible add lgtm. See: #37568 (comment) for an explanation of what changes. This fixes a lot of bugs currently at HEAD as well. @dgoodwin Ready for yet another pass. When this is merged, which should be today or tomorrow, we can cut the next kubeadm release for v1.5 from current master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple more small changes, mostly text, testing looks good on my vms.
Your Kubernetes master has initialized successfully! | ||
|
||
But you still need to deploy a pod network to the cluster. | ||
You should "kubectl apply -f" some pod network yaml file that's listed at: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest:
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks
|
||
fmt.Printf("[preflight] Starting the kubelet systemd service by running %q\n", "systemctl start kubelet") | ||
if err := initSystem.ServiceStart("kubelet"); err != nil { | ||
fmt.Println("[preflight] Couldn't start the kubelet service via systemd. Please start the kubelet service manually and try again.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we include the "err" here? Might be something relevant to the user in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also not great that we're mentioning systemd multiple times here despite an attempt to remain init system agnostic.
Why don't we just go to:
Starting kubelet service...
WARNING: Unable to start kubelet service: %s
WARNING: Please ensure kubelet is running manually.
Drop "try again" as I think we just proceed here, and kubeadm will hang if it's not actually running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing
@@ -22,6 +22,7 @@ import ( | |||
"html/template" | |||
"io" | |||
"io/ioutil" | |||
"os" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With testing I notice that in this file we're outputting:
[preflight] Starting the kubelet systemd service by running "systemctl start kubelet"
Using Kubernetes version: v1.4.6
Where that is about the only line in all output that doesn't have a [something] prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a rebase conflict, thanks for catching
@@ -66,7 +66,7 @@ func NewReset(skipPreFlight, removeNode bool) (*Reset, error) { | |||
if !skipPreFlight { | |||
fmt.Println("[preflight] Running pre-flight checks...") | |||
|
|||
if err := preflight.RunResetCheck(); err != nil { | |||
if err := preflight.RunChecks([]preflight.PreFlightCheck{preflight.IsRootCheck{}}, os.Stderr); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to slip one more fix into this PR? :)
We clean directories, then shutdown all running containers. This can result in etcd writing more to the directory after we cleaned it, and then your next init fails and you have to run reset a second time, which will work.
We need to cleanup directories after shutting down all containers.
I can get this in a followup PR if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix it
} | ||
|
||
// Then continue with the others... | ||
if err := preflight.RunJoinNodeChecks(cfg); err != nil { | ||
return nil, &preflight.PreFlightError{Msg: err.Error()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't we just return a preflight.PreFlightError from RunJoinNodeChecks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like no one just saw it before, fixing
fmt.Println("[preflight] Running pre-flight checks...") | ||
|
||
// First, check if we're root separately from the other preflight checks and fail fast | ||
if err := preflight.RunChecks([]preflight.PreFlightCheck{preflight.IsRootCheck{}}, os.Stderr); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider moving into preflight package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, gonna do that
|
||
if err := client.Extensions().Deployments(api.NamespaceSystem).Delete("dummy", &v1.DeleteOptions{}); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just not create if we are not going to delete it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you checked if #38330 fixes your problem? Can we just put a priority on getting that merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, tested it now.
Gonna revert this and wait for #38330, but hopefully it will merge very soon
Thanks for the reviews @mikedanese and @dgoodwin If there's anything else you think should be fixed, please comment now. Otherwise I'll rebase and make it ready for merge tomorrow so we can get it in in time. |
…un in a container. Also refactor preflight.go a little bit and improve logging
…on-root to avoid strange errors. Also auto-start the kubelet if inactive
… I just hacked on this and modified everything I thought was messy or could be done better. Fix boilerplates, comments in the code and make the output of kubeadm more user-friendly Start using HostPKIPath and KubernetesDir everywhere in the code, so they can be changed for real More robust kubeadm reset code now. Removed old glog-things from app.Run() Renamed /etc/kubernetes/cloud-config.json to /etc/kubernetes/cloud-config since it shouldn't be a json file Simplification of the code Less verbose output from master/pki.go Cleaned up dead code Start a small logging/output framework: - fmt.Println("[the-stage-here] Capital first letter of this message. Tell the user what the current state is") - fmt.Printf("[the-stage-here] Capital first letter. Maybe a [%v] in the end if an error should be displayed. Always ends with \n") - fmt.Errorf("Never starts with []. Includes a short error message plus the underlying error in [%v]. Never ends with \n")
f1cf640
to
50b1077
Compare
Jenkins GCI GKE smoke e2e failed for commit 50b1077625f4c5f75d7aded501b5c9634db79396. Full PR test history. The magic incantation to run this job again is |
@dgoodwin Looks ok? |
/lgtm |
Please squash fixup commits |
@mikedanese The fourth and the fifth one? Sure. |
50b1077
to
db4ab53
Compare
Jenkins verification failed for commit db4ab532a096420a5ce87eb374b9b591df29e8f0. Full PR test history. The magic incantation to run this job again is |
… Set --kubelet-preferred-address-types on v1.5 and higher clusters
db4ab53
to
b060304
Compare
Jenkins GKE smoke e2e failed for commit db4ab532a096420a5ce87eb374b9b591df29e8f0. Full PR test history. The magic incantation to run this job again is |
Automatic merge from submit-queue (batch tested with PRs 37270, 38309, 37568, 34554) |
This PR targets the next stable kubeadm release.
It's work in progress, but please comment on it and review, since there are many changes.
I tried to group the commits logically, so you can review them separately.
Q: Why this large PR? Why not many small?
A: Because of the Submit Queue and the time it takes.
PTAL @kubernetes/sig-cluster-lifecycle
Edit: This work was splitted up in three PRs in total
This change is