
bringing CoreOS cloud-configs up-to-date (against 0.15.x and latest OS' alpha) #6973

Merged
merged 1 commit into from May 1, 2015

Conversation

AntonioMeireles
Contributor

work in progress, result of the discussion in #6281.

missing is tweaking the docs and updating the aws CoreOS cloud-configs.

@pires et al., can you please triple-check that I didn't miss anything?

@pires
Contributor

pires commented Apr 17, 2015

So far LGTM. Still missing: the updated doc (make yourself the maintainer) and the AWS CloudFormation template.

@AntonioMeireles
Contributor Author

Docs tweaked. Still missing are the AWS and bare-metal bits.

@bussyjd
Contributor

bussyjd commented Apr 21, 2015

Works for me after changing the kubelet hostname_override parameter to the public IP.

@AntonioMeireles
Contributor Author

  • @hobti01 can you please check if the latest commit gets things to work out of the box?
  • @bussyjd what exactly was not working, so that you had to change kubelet's hostname_override?

Thanks, all.

@hobti01

hobti01 commented Apr 21, 2015

@AntonioMeireles thank you for the change to use default. I had made an equivalent change locally which works.

@AntonioMeireles
Contributor Author

Still missing now are just the CloudFormation changes and the bare-metal ones. Unless someone steps up and offers to test/help on those fronts, I don't feel brave enough to do them in this round. Other than that, I think this is hopefully ready.

@pires @kelseyhightower et al., please validate.

@bussyjd
Contributor

bussyjd commented Apr 22, 2015

@AntonioMeireles
I get the following error with the private IP:

kubelet.go:1561] error getting node: node 192.168.133.14 not found
[...]
kubelet.go:1735] error updating node status, will retry: error getting node "192.168.133.14": minion "192.168.133.14" not found

I believe it is the same as #3185

@AntonioMeireles
Contributor Author

@bussyjd what cloud provider are you using?

@bussyjd
Contributor

bussyjd commented Apr 22, 2015

@AntonioMeireles I am using ESXi VMs with two networks. I boot CoreOS 653.0.0 and run the cloud-init files manually.

@AntonioMeireles
Contributor Author

@bussyjd according to the upstream CoreOS docs here, "The $private_ipv4 and $public_ipv4 substitution variables referenced in other documents are not supported on VMware." (sic) So you'll have to edit them manually. [AFAICT there is additional upstream work going on to eventually get rid of the present cloud-config limitations.]
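Since those variables are not substituted on VMware, the node IPs have to be hardcoded in the cloud-config before it is fed to coreos-cloudinit. A minimal sketch (the addresses are placeholders for your actual node IP, not values from this PR):

```yaml
#cloud-config
coreos:
  etcd2:
    # $private_ipv4 is not expanded on VMware, so spell the node IP out by hand
    advertise-client-urls: http://192.168.133.14:2379
    initial-advertise-peer-urls: http://192.168.133.14:2380
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://192.168.133.14:2380
```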

@bussyjd
Contributor

bussyjd commented Apr 22, 2015

@AntonioMeireles I am aware of it and triple-checked for mismatch. Replacing the values manually could get me to a running cluster, but only with hostname_override set to the public IP. I reckon OEM metadata will be useful in my use case.

@AntonioMeireles
Contributor Author

@bussyjd OK, thanks. Sorry I can't help you more, as I don't currently have access to the tools to try to replicate your issue locally. :/

@kelseyhightower you (when back from €uro tour) perhaps ?

@bussyjd
Contributor

bussyjd commented Apr 28, 2015

Working fine for me

@AntonioMeireles
Contributor Author

@erictune ping. (Getting this in would probably be handy for a few people, as 0.15.0 has been out for a while...)

coreos:
etcd2:
name: ${DEFAULT_IPV4}

This will not work. You need to depend on the setup-network-environment tool. That's why I've always used real unit files vs. the cloud-init shortcuts.

${DEFAULT_IPV4} is set by the env file produced by setup-network-environment
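For context, a sketch of how a unit can consume that env file. This assumes the setup-network-environment binary is already installed at /opt/bin; the unit names, paths, and flags are illustrative, not the exact ones from this PR:

```yaml
#cloud-config
coreos:
  units:
    - name: setup-network-environment.service
      command: start
      content: |
        [Unit]
        Description=Setup Network Environment
        Requires=network-online.target
        After=network-online.target

        [Service]
        # writes /etc/network-environment, including DEFAULT_IPV4=<default-route IP>
        ExecStart=/opt/bin/setup-network-environment
        RemainAfterExit=yes
        Type=oneshot
    - name: kube-kubelet.service
      command: start
      content: |
        [Unit]
        After=setup-network-environment.service

        [Service]
        # the env file makes ${DEFAULT_IPV4} available to ExecStart below
        EnvironmentFile=/etc/network-environment
        ExecStart=/opt/bin/kubelet --hostname_override=${DEFAULT_IPV4}
        Restart=always
```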


Why not just set it to master @AntonioMeireles?

@vmarmol
Contributor

vmarmol commented Apr 29, 2015

Potentially relevant to this PR: #7445

@AntonioMeireles
Contributor Author

@kelseyhightower et al. - just dropped the over-reliance on cloud-config shortcuts and rebased to accommodate f5e81c25c, so could you all please check, when time permits, whether there are any remaining oversights on my side. (Thanks in advance.)

@AntonioMeireles
Contributor Author

@hobti01 that commit applies mods to the currently shipping etcd (v1) setup, while this PR moves everything onto the etcd2 bandwagon.
To explain: now (as per this PR) all nodes run a local etcd in proxy mode, which points/relays to the 'real' etcd on the master. So in most cases the nodes don't need to specify/point to the master, as by default the tooling will talk to the local one, which transparently forwards to/relays from the master. Hope this clarifies things.
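As a sketch, a node's cloud-config for that proxy setup could look roughly like this (the master address is a placeholder; key names follow the upstream etcd2 cloud-config conventions, not necessarily this PR's exact files):

```yaml
#cloud-config
coreos:
  etcd2:
    # run as a proxy only: relay client requests to the real etcd on the master
    proxy: on
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    initial-cluster: master=http://<master-ip>:2380
  units:
    - name: etcd2.service
      command: start
```

With this in place, kubelet and friends on the node can keep talking to http://127.0.0.1:2379 and never need to know the master's address directly.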

@pires
Contributor

pires commented Apr 30, 2015

@erictune @kelseyhightower this LGTM but probably @AntonioMeireles should squash?

@AntonioMeireles
Contributor Author

@pires FWIW I didn't squash it yet, in order to keep the timeline a bit easier and more granular to read/follow. As soon as the stakeholders are happy I'll do it...

@erictune
Member

erictune commented May 1, 2015

Please squash and I will merge.

  - allow payloads to run in privileged mode.
  - update kube-register to latest upstream (v0.0.3).
  - jump into the etcd2 bandwagon.
    - etcd master on master node.
    - etcd proxies in nodes.
  - update docs to reflect minimum required CoreOS version.
    - 653.0.0 is the first to ship with etcd2, which we now consume.
  - propagate changes on coreos/cloud-configs/ also to aws/cloud-configs/.
  - update the tested k8s versions that this addresses in the
    getting-started-guides table, hence making sure we are consistent
    across it regarding the versions we claim to have tested; add myself
    there as contact too.
  - do not assume that cloud-init shortcuts will get everything right.
    - they won't (as setup-network-environment, which populates *_ipv4,
      etc., only runs much later).
  - use flannel's plain defaults, as they should just be enough for the
    common case.

Signed-off-by: António Meireles <antonio.meireles@reformi.st>
@AntonioMeireles
Contributor Author

@erictune done 'n' thanks.

erictune added a commit that referenced this pull request May 1, 2015
bringing CoreOS cloud-configs up-to-date (against 0.15.x and latest OS' alpha)
@erictune erictune merged commit 285a990 into kubernetes:master May 1, 2015
@AntonioMeireles AntonioMeireles deleted the etcd2 branch May 1, 2015 20:51
@ngConsulti

@bussyjd, how is it "working fine now"? I am seeing the same pattern you described when --hostname_override is explicitly set to the private network IP.

kubelet.go:1561] error getting node: node 10.99.0.14 not found
kubelet.go:1735] error updating node status, will retry: error getting node "10.99.0.14": minion "10.99.0.14" not found

If I peek at the registry, I see the public IP listed:

$ etcdctl ls /registry/minions
/registry/minions/108.61.224.76

Empirically, it appears that the --hostname_override flag is ignored when first registering the minion/node, but respected when attempting to update the same.

Can anyone offer an explanation, or better yet, a workaround?

@roberthbailey
Contributor

@ngConsulti My guess is that this is a mismatch between the nodeID created by kube-register and the nodeID that the node is using itself while sending node status updates to the apiserver. Once we get #6949 merged the kubelets will register themselves so there shouldn't be a mismatch between the nodeIDs any longer.

@ngConsulti

Thanks, @roberthbailey. I switched to using the public IP to get unstuck. This allowed me to register a single minion/node. However, any additional nodes fail with the following error:

kubelet[874]: I0505 05:25:23.260569     874 event.go:200] Event(api.ObjectReference{Kind:"Node", Namespace:"", Name:"104.238.147.238", UID:"104.238.147.238", APIVersion:"", ResourceVersion:"", FieldPath:""}): reason: 'starting' Starting kubelet.
kubelet[874]: E0505 05:25:23.277562     874 event.go:182] Server rejected event '&api.Event{TypeMeta:api.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"104.238.147.238.13db3c394341f287", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", CreationTimestamp:util.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*util.Time)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Node", Namespace:"", Name:"104.238.147.238", UID:"104.238.147.238", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"starting", Message:"Starting kubelet.", Source:api.EventSource{Component:"kubelet", Host:"104.238.147.238"}, FirstTimestamp:util.Time{Time:time.Time{sec:63566400323, nsec:259462279, loc:(*time.Location)(0x176b160)}}, LastTimestamp:util.Time{Time:time.Time{sec:63566400323, nsec:259462279, loc:(*time.Location)(0x176b160)}}, Count:1}': 'the server responded with the status code 405 but did not return more information (post events)' (will not retry!)
...
kubelet[874]: E0505 05:25:23.409338     874 kubelet.go:1735] error updating node status, will retry: error getting node "104.238.147.238": minion "104.238.147.238" not found

Why the server says "Method Not Allowed" (HTTP 405) is beyond me...since it worked for the first node!

@ngConsulti

Ah... good old user error. I had failed to explicitly set fleet's public-ip to the private IP, which caused kube-register to get confused. This solved my issue:

coreos:
...
  fleet:
    public-ip: $V4_PRIVATE_IP
...

@roberthbailey
Contributor

@ngConsulti Great!
