Bring up a kubernetes cluster using coreos image as worker nodes #7445

Merged
merged 4 commits on Apr 29, 2015

Conversation

dchen1107
Member

By default, the gce provider uses the ContainerVM image. I ran the e2e tests against the default configuration, and all tests passed:

Ran 36 of 41 Specs in 1098.626 seconds
SUCCESS! -- 36 Passed | 0 Failed | 1 Pending | 4 Skipped I0428 01:48:18.127752   15780 driver.go:96] All tests pass

To bring up a kubernetes cluster using the coreos image with rkt installed, export the following variables first, then call kube-up.sh:

export KUBE_OS_DISTRIBUTION=coreos
export KUBE_GCE_MINION_IMAGE=coreos-stable-633-1-0-v20150414
export KUBE_GCE_MINION_PROJECT=coreos-cloud
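
For example, a minimal sketch of the full sequence (cluster/kube-up.sh and cluster/kubectl.sh are the standard cluster scripts already used elsewhere in this PR; the image and project values are the ones given above):

export KUBE_OS_DISTRIBUTION=coreos
export KUBE_GCE_MINION_IMAGE=coreos-stable-633-1-0-v20150414
export KUBE_GCE_MINION_PROJECT=coreos-cloud
cluster/kube-up.sh                      # bring up the cluster with coreos worker nodes
cluster/kubectl.sh get -o json nodes    # nodeInfo.osImage should report CoreOS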

The new cloud provider (gce-coreos) worked until I rebased onto the latest master. I believe the breakage was introduced by the kube_proxy_token change that merged yesterday afternoon. Please note that "works" here means:

  • a cluster with rkt installed
  • master can schedule work to node
  • node can still start containers through the docker daemon, since the current Kubelet doesn't enable the rkt runtime yet
  • master can query node / pod status

The next step is integrating the kubelet with the rkt runtime, so that we can announce experimental support for rkt.

cc/ @bgrant0607 @vmarmol @yifan-gu

OS_DISTRIBUTION=${KUBE_OS_DISTRIBUTION:-debian}
MASTER_IMAGE=${KUBE_GCE_MASTER_IMAGE:-container-vm-v20150317}
MASTER_IMAGE_PROJECT=${KUBE_GCE_MASTER_PROJECT:-google-containers}
MINION_IMAGE=${KUBE_GCE_MINION_IMAGE:-container-vm-v20150317}
Contributor

nit: s/MINION/NODE/g
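
For illustration only, a hypothetical sketch of what the suggested s/MINION/NODE/ rename would look like applied to the defaults above (the NODE_* names are not part of this PR):

OS_DISTRIBUTION=${KUBE_OS_DISTRIBUTION:-debian}
MASTER_IMAGE=${KUBE_GCE_MASTER_IMAGE:-container-vm-v20150317}
MASTER_IMAGE_PROJECT=${KUBE_GCE_MASTER_PROJECT:-google-containers}
NODE_IMAGE=${KUBE_GCE_NODE_IMAGE:-container-vm-v20150317}   # hypothetical NODE_* spelling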

@yifan-gu
Contributor

cc @bakins

@bgrant0607
Member

Looks reasonable to me. A comment about how to enable coreos would be useful.
cc @zmerlynn

@bakins

bakins commented Apr 28, 2015

Looks much better than the approach I was taking: creating a completely new provider.

@dchen1107
Member Author

Fixed the issue related to kube-proxy-token. Now a coreos cluster (worker nodes only) is up and running:

$ cluster/kubectl.sh get -o json nodes shows the node is running coreos and is ready, but the ContainerRuntime is still docker for now.

Starting cluster using os distro: debian
current-context: "golden-system-455_kubernetes"
Running: cluster/../cluster/gce/../../cluster/../_output/dockerized/bin/linux/amd64/kubectl get -o json nodes
{
    "kind": "List",
    "apiVersion": "v1beta3",
    "metadata": {},
    "items": [
        {
            "kind": "Node",
            "apiVersion": "v1beta3",
            "metadata": {
                "name": "kubernetes-minion-8q2o",
                "selfLink": "/api/v1beta1/nodes/kubernetes-minion-8q2o",
                "uid": "b758703c-edf1-11e4-b47c-42010af0d8b5",
                "resourceVersion": "191",
                "creationTimestamp": "2015-04-28T21:58:34Z"
            },
            "spec": {
                "externalID": "7765620702215044789"
            },
            "status": {
                "capacity": {
                    "cpu": "1",
                    "memory": "3794428Ki"
                },
                "conditions": [
                    {
                        "type": "Ready",
                        "status": "True",
                        "lastHeartbeatTime": "2015-04-28T22:02:24Z",
                        "lastTransitionTime": "2015-04-28T21:58:36Z",
                        "reason": "kubelet is posting ready status"
                    }
                ],
                "addresses": [
                    {
                        "type": "ExternalIP",
                        "address": "104.154.86.58"
                    },
                    {
                        "type": "LegacyHostIP",
                        "address": "104.197.5.143"
                    }
                ],
                "nodeInfo": {
                    "machineID": "56a63500ff0945d4536c0bce20b744c8",
                    "systemUUID": "56A63500-FF09-45D4-536C-0BCE20B744C8",
                    "bootID": "5dec80dd-81ed-4c02-ade9-69f352b41277",
                    "kernelVersion": "3.19.0",
                    "osImage": "CoreOS 633.1.0",
                    "containerRuntimeVersion": "docker://1.5.0",
                    "kubeletVersion": "v0.15.0-826-g8c098be0f83bce-dirty",
                    "KubeProxyVersion": "v0.15.0-826-g8c098be0f83bce-dirty"
                }
            }
        }
    ]
}

Scheduled a pod to the node (there is only one node in my cluster):

$ cluster/kubectl.sh create -f examples/guestbook/redis-master-controller.json
Starting cluster using os distro: debian
current-context: "golden-system-455_kubernetes"
Running: cluster/../cluster/gce/../../cluster/../_output/dockerized/bin/linux/amd64/kubectl create -f examples/guestbook/redis-master-controller.json
replicationcontrollers/redis-master

$ cluster/kubectl.sh get rc
Starting cluster using os distro: debian
current-context: "golden-system-455_kubernetes"
Running: cluster/../cluster/gce/../../cluster/../_output/dockerized/bin/linux/amd64/kubectl get rc
CONTROLLER                             CONTAINER(S)            IMAGE(S)                                          SELECTOR                     REPLICAS
elasticsearch-logging                  elasticsearch-logging   gcr.io/google_containers/elasticsearch:1.0        name=elasticsearch-logging   1
kibana-logging                         kibana-logging          gcr.io/google_containers/kibana:1.2               name=kibana-logging          1
kube-dns                               etcd                    quay.io/coreos/etcd:v2.0.3                        k8s-app=kube-dns             1
                                       kube2sky                gcr.io/google_containers/kube2sky:1.2                                          
                                       skydns                  gcr.io/google_containers/skydns:2015-03-11-001                                 
monitoring-heapster-controller         heapster                gcr.io/google_containers/heapster:v0.10.0         name=heapster                1
monitoring-influx-grafana-controller   influxdb                gcr.io/google_containers/heapster_influxdb:v0.3   name=influxGrafana           1
                                       grafana                 gcr.io/google_containers/heapster_grafana:v0.6                                 
redis-master                           master                  redis                                             name=redis-master            1

$ cluster/kubectl.sh get pods -l name=redis-master
Starting cluster using os distro: debian
current-context: "golden-system-455_kubernetes"
Running: cluster/../cluster/gce/../../cluster/../_output/dockerized/bin/linux/amd64/kubectl get pods -l name=redis-master
POD                  IP           CONTAINER(S)   IMAGE(S)   HOST                                   LABELS              STATUS    CREATED          MESSAGE
redis-master-g44nt   172.17.0.7                             kubernetes-minion-8q2o/104.197.5.143   name=redis-master   Running   About a minute   
                                  master         redis                                                                 Running   36 seconds      

cc/ @brendandburns @thockin

@yifan-gu could you please send a PR to enable the rkt runtime for the kubelet? Once that is merged, I will disable docker and test rkt support throughout. Thanks!

@dchen1107 dchen1107 changed the title from "WIP: Bring up a kubernetes cluster using coreos image as worker nodes" to "Bring up a kubernetes cluster using coreos image as worker nodes" on Apr 28, 2015
@yifan-gu
Contributor

@dchen1107 Thanks for this!! I just finished the basic implementation of the missing functions. I will clean it up for review. We still need to refactor the kubelet a bit to let the runtime provide a syncPod interface, so that I can enable rkt.

@bgrant0607
Member

I guess Travis doesn't run the tests if we only change cluster turnup scripts/config?

Merging since @dchen1107 says e2e passed.

@bgrant0607
Member

Needs rebase, actually.

@yifan-gu
Contributor

@dchen1107 Or we can skip the refactor for now and just hijack a rkt.syncPod() in kubelet.syncPod(), which would be faster; then we can refactor next week?

@@ -646,6 +577,7 @@ function kube-up {
for (( i=0; i<${#MINION_NAMES[@]}; i++)); do
create-route "${MINION_NAMES[$i]}" "${MINION_IP_RANGES[$i]}" &
add-instance-metadata "${MINION_NAMES[$i]}" "node-ip-range=${MINION_IP_RANGES[$i]}" &
add-instance-metadata "${MINION_NAMES[$i]}" "node-name=${MINION_NAMES[$i]}" &
Member

This won't work; it's racy if you have a lot of nodes. You need to use the multi-KV form of add-instance-metadata that does it in one GCE command, otherwise the command may fail.
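
A minimal sketch of the suggested fix, assuming the multi-KV form of add-instance-metadata accepts several key=value arguments in a single call (hypothetical; adapt to the actual helper signature):

# Hypothetical: one metadata update per node instead of two racing ones
for (( i=0; i<${#MINION_NAMES[@]}; i++)); do
  create-route "${MINION_NAMES[$i]}" "${MINION_IP_RANGES[$i]}" &
  add-instance-metadata "${MINION_NAMES[$i]}" \
    "node-ip-range=${MINION_IP_RANGES[$i]}" \
    "node-name=${MINION_NAMES[$i]}" &
done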

Member

(The instance metadata update uses an opportunistic locking approach, so this just pounds the same metadata from two processes and it fails. It's easy to see even with a tiny node count.)

Member

Wait, where is node-name even used? I'm not seeing this new metadata even consumed.

Member Author

Yes, it is unused due to the issue you pointed out here. I removed it.

@zmerlynn
Member

Please don't merge this as is. You're moving a file that is actually pushed as part of the release. (You could try to run https://github.com/GoogleCloudPlatform/kubernetes/blob/master/build/push-devel-build.sh and you'd see an error; that mimics how the build pushes files today.) configure-vm.sh is relied on internally by GKE.

@dchen1107
Member Author

@zmerlynn If I move configure-vm.sh back to cluster/gce/, does that resolve the problem here? configure-vm.sh is not required for the coreos worker node now; we can clean it up later.

@zmerlynn
Member

Yes, if it stays in place that's fine.

@dchen1107
Member Author

Ok, I did the following:

  1. rebased the PR
  2. moved configure-vm.sh back to its original directory and ran build/push-devel-build.sh; it works (see the command sketch after the e2e output below)
  3. re-ran e2e against the GCE debian cluster, and it passed:

Ran 36 of 41 Specs in 1124.775 seconds
SUCCESS! -- 36 Passed | 0 Failed | 1 Pending | 4 Skipped I0428 17:23:51.407086 8669 driver.go:96] All tests pass
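
A sketch of the verification commands for steps 2 and 3 (the push script path is the one linked above; the e2e invocation via hack/e2e.go is an assumption and may differ from the exact command used):

build/push-devel-build.sh        # verify the release push still finds configure-vm.sh in place
go run hack/e2e.go -v --test     # assumed e2e driver invocation against the gce debian cluster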

I think we can merge it now.

@vmarmol
Contributor

vmarmol commented Apr 29, 2015

Thanks @dchen1107! Merging. If anyone hits any issues due to this PR, please ping me and @dchen1107.

vmarmol added a commit that referenced this pull request Apr 29, 2015
Bring up a kubernetes cluster using coreos image as worker nodes
@vmarmol vmarmol merged commit fc34277 into kubernetes:master Apr 29, 2015
@dchen1107 dchen1107 added area/rkt sig/node Categorizes an issue or PR as relevant to SIG Node. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Apr 29, 2015
@dchen1107
Member Author

@andyzheng0831 here is what I have so far. Hope this helps you with your project.

@dchen1107
Member Author

@andyzheng0831 again :-)

@bakins bakins mentioned this pull request May 7, 2015