Bring up a kubernetes cluster using coreos image as worker nodes #7445

Merged
merged 4 commits on Apr 29, 2015

Conversation

dchen1107
Member

By default, the gce provider uses the ContainerVM image. I ran the e2e tests against the default configuration, and all tests passed:

Ran 36 of 41 Specs in 1098.626 seconds
SUCCESS! -- 36 Passed | 0 Failed | 1 Pending | 4 Skipped I0428 01:48:18.127752   15780 driver.go:96] All tests pass

To bring up a kubernetes cluster using the coreos image with rkt installed, export the following variables first, then call kube-up.sh:

export KUBE_OS_DISTRIBUTION=coreos
export KUBE_GCE_MINION_IMAGE=coreos-stable-633-1-0-v20150414
export KUBE_GCE_MINION_PROJECT=coreos-cloud
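
For example, a minimal sketch of the full sequence (cluster/kube-up.sh and cluster/kubectl.sh are the standard cluster scripts already used elsewhere in this PR; the image and project values are the ones given above):

export KUBE_OS_DISTRIBUTION=coreos
export KUBE_GCE_MINION_IMAGE=coreos-stable-633-1-0-v20150414
export KUBE_GCE_MINION_PROJECT=coreos-cloud
cluster/kube-up.sh                      # bring up the cluster with coreos worker nodes
cluster/kubectl.sh get -o json nodes    # nodeInfo.osImage should report CoreOS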

The new cloud provider (gce-coreos) worked until I rebased onto the latest master. I believe the breakage was introduced by the kube_proxy_token change that merged yesterday afternoon. Please note that "works" here means:

  • a cluster with rkt installed
  • master can schedule work to node
  • node can still start containers through the docker daemon, since the current Kubelet doesn't enable the rkt runtime yet
  • master can query node / pod status

The next step is integrating the kubelet with the rkt runtime, so that we can announce experimental support for rkt.

cc/ @bgrant0607 @vmarmol @yifan-gu

OS_DISTRIBUTION=${KUBE_OS_DISTRIBUTION:-debian}
MASTER_IMAGE=${KUBE_GCE_MASTER_IMAGE:-container-vm-v20150317}
MASTER_IMAGE_PROJECT=${KUBE_GCE_MASTER_PROJECT:-google-containers}
MINION_IMAGE=${KUBE_GCE_MINION_IMAGE:-container-vm-v20150317}
Contributor

nit: s/MINION/NODE/g
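
For illustration only, a hypothetical sketch of what the suggested s/MINION/NODE/ rename would look like applied to the defaults above (the NODE_* names are not part of this PR):

OS_DISTRIBUTION=${KUBE_OS_DISTRIBUTION:-debian}
MASTER_IMAGE=${KUBE_GCE_MASTER_IMAGE:-container-vm-v20150317}
MASTER_IMAGE_PROJECT=${KUBE_GCE_MASTER_PROJECT:-google-containers}
NODE_IMAGE=${KUBE_GCE_NODE_IMAGE:-container-vm-v20150317}   # hypothetical NODE_* spelling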

@yifan-gu
Contributor

cc @bakins

@bgrant0607
Member

Looks reasonable to me. A comment about how to enable coreos would be useful.
cc @zmerlynn

@bakins

bakins commented Apr 28, 2015

Looks much better than the approach I was taking: creating a completely new provider.

@dchen1107
Member Author

Fixed the issue related to kube-proxy-token. Now a coreos cluster (worker nodes only) is up and running:

$ cluster/kubectl.sh get -o json nodes shows the node is running coreos and is ready, but the ContainerRuntime is still docker for now.

Starting cluster using os distro: debian
current-context: "golden-system-455_kubernetes"
Running: cluster/../cluster/gce/../../cluster/../_output/dockerized/bin/linux/amd64/kubectl get -o json nodes
{
    "kind": "List",
    "apiVersion": "v1beta3",
    "metadata": {},
    "items": [
        {
            "kind": "Node",
            "apiVersion": "v1beta3",
            "metadata": {
                "name": "kubernetes-minion-8q2o",
                "selfLink": "/api/v1beta1/nodes/kubernetes-minion-8q2o",
                "uid": "b758703c-edf1-11e4-b47c-42010af0d8b5",
                "resourceVersion": "191",
                "creationTimestamp": "2015-04-28T21:58:34Z"
            },
            "spec": {
                "externalID": "7765620702215044789"
            },
            "status": {
                "capacity": {
                    "cpu": "1",
                    "memory": "3794428Ki"
                },
                "conditions": [
                    {
                        "type": "Ready",
                        "status": "True",
                        "lastHeartbeatTime": "2015-04-28T22:02:24Z",
                        "lastTransitionTime": "2015-04-28T21:58:36Z",
                        "reason": "kubelet is posting ready status"
                    }
                ],
                "addresses": [
                    {
                        "type": "ExternalIP",
                        "address": "104.154.86.58"
                    },
                    {
                        "type": "LegacyHostIP",
                        "address": "104.197.5.143"
                    }
                ],
                "nodeInfo": {
                    "machineID": "56a63500ff0945d4536c0bce20b744c8",
                    "systemUUID": "56A63500-FF09-45D4-536C-0BCE20B744C8",
                    "bootID": "5dec80dd-81ed-4c02-ade9-69f352b41277",
                    "kernelVersion": "3.19.0",
                    "osImage": "CoreOS 633.1.0",
                    "containerRuntimeVersion": "docker://1.5.0",
                    "kubeletVersion": "v0.15.0-826-g8c098be0f83bce-dirty",
                    "KubeProxyVersion": "v0.15.0-826-g8c098be0f83bce-dirty"
                }
            }
        }
    ]
}

Scheduled a pod to the node (there is only one node in my cluster):

$ cluster/kubectl.sh create -f examples/guestbook/redis-master-controller.json
Starting cluster using os distro: debian
current-context: "golden-system-455_kubernetes"
Running: cluster/../cluster/gce/../../cluster/../_output/dockerized/bin/linux/amd64/kubectl create -f examples/guestbook/redis-master-controller.json
replicationcontrollers/redis-master

$ cluster/kubectl.sh get rc
Starting cluster using os distro: debian
current-context: "golden-system-455_kubernetes"
Running: cluster/../cluster/gce/../../cluster/../_output/dockerized/bin/linux/amd64/kubectl get rc
CONTROLLER                             CONTAINER(S)            IMAGE(S)                                          SELECTOR                     REPLICAS
elasticsearch-logging                  elasticsearch-logging   gcr.io/google_containers/elasticsearch:1.0        name=elasticsearch-logging   1
kibana-logging                         kibana-logging          gcr.io/google_containers/kibana:1.2               name=kibana-logging          1
kube-dns                               etcd                    quay.io/coreos/etcd:v2.0.3                        k8s-app=kube-dns             1
                                       kube2sky                gcr.io/google_containers/kube2sky:1.2                                          
                                       skydns                  gcr.io/google_containers/skydns:2015-03-11-001                                 
monitoring-heapster-controller         heapster                gcr.io/google_containers/heapster:v0.10.0         name=heapster                1
monitoring-influx-grafana-controller   influxdb                gcr.io/google_containers/heapster_influxdb:v0.3   name=influxGrafana           1
                                       grafana                 gcr.io/google_containers/heapster_grafana:v0.6                                 
redis-master                           master                  redis                                             name=redis-master            1

$ cluster/kubectl.sh get pods -l name=redis-master
Starting cluster using os distro: debian
current-context: "golden-system-455_kubernetes"
Running: cluster/../cluster/gce/../../cluster/../_output/dockerized/bin/linux/amd64/kubectl get pods -l name=redis-master
POD                  IP           CONTAINER(S)   IMAGE(S)   HOST                                   LABELS              STATUS    CREATED          MESSAGE
redis-master-g44nt   172.17.0.7                             kubernetes-minion-8q2o/104.197.5.143   name=redis-master   Running   About a minute   
                                  master         redis                                                                 Running   36 seconds      

cc/ @brendandburns @thockin

@yifan-gu could you please send a PR to enable the rkt runtime for the kubelet? Once that is merged, I will disable docker and test rkt support throughout. Thanks!

@dchen1107 dchen1107 changed the title from "WIP: Bring up a kubernetes cluster using coreos image as worker nodes" to "Bring up a kubernetes cluster using coreos image as worker nodes" on Apr 28, 2015
@yifan-gu
Contributor

@dchen1107 Thanks for this!! I just finished the basic implementation of the missing functions. I will clean it up for review. We still need to refactor the kubelet a bit to let the runtime provide a syncPod interface, so that I can enable rkt.

@bgrant0607
Member

I guess Travis doesn't run the tests if we only change cluster turnup scripts/config?

Merging since @dchen1107 says e2e passed.

@bgrant0607
Member

Needs rebase, actually.

@yifan-gu
Contributor

@dchen1107 Or we can skip the refactor for now and just hijack a rkt.syncPod() in kubelet.syncPod(), which would be faster; then we can refactor next week?

@@ -646,6 +577,7 @@ function kube-up {
for (( i=0; i<${#MINION_NAMES[@]}; i++)); do
create-route "${MINION_NAMES[$i]}" "${MINION_IP_RANGES[$i]}" &
add-instance-metadata "${MINION_NAMES[$i]}" "node-ip-range=${MINION_IP_RANGES[$i]}" &
add-instance-metadata "${MINION_NAMES[$i]}" "node-name=${MINION_NAMES[$i]}" &
Member

This won't work; it's racy if you have a lot of nodes. You need to use the multi-KV form of add-instance-metadata that does it in one GCE command, otherwise the command may fail.
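
A minimal sketch of the suggested fix, assuming the multi-KV form of add-instance-metadata accepts several key=value arguments in a single call (hypothetical; adapt to the actual helper signature):

# Hypothetical: one metadata update per node instead of two racing ones
for (( i=0; i<${#MINION_NAMES[@]}; i++)); do
  create-route "${MINION_NAMES[$i]}" "${MINION_IP_RANGES[$i]}" &
  add-instance-metadata "${MINION_NAMES[$i]}" \
    "node-ip-range=${MINION_IP_RANGES[$i]}" \
    "node-name=${MINION_NAMES[$i]}" &
done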

Member

(The instance metadata update uses an opportunistic locking approach, so this just pounds the same metadata from two processes and it fails. It's easy to see even with a tiny node count.)

Member

Wait, where is node-name even used? I'm not seeing this new metadata even consumed.

Member Author

Yes, it is unused due to the issue you pointed out here. I removed it.

@zmerlynn
Member

Please don't merge this as is. You're moving a file that is actually pushed as part of the release. (You could try to run https://github.com/GoogleCloudPlatform/kubernetes/blob/master/build/push-devel-build.sh and you'd see an error; that mimics how the build pushes files today.) configure-vm.sh is relied on internally by GKE.

@dchen1107
Member Author

@zmerlynn If I move configure-vm.sh back to cluster/gce/, does that resolve the problem here? configure-vm.sh is not required for the coreos worker node now; we can clean it up later.

@zmerlynn
Member

Yes, if it stays in place that's fine.

@dchen1107
Member Author

Ok, I did the following:

  1. rebased the PR
  2. moved configure-vm.sh back to its original directory and ran build/push-devel-build.sh; it works (see the command sketch after the e2e output below)
  3. re-ran e2e against the GCE debian cluster, and it passed:

Ran 36 of 41 Specs in 1124.775 seconds
SUCCESS! -- 36 Passed | 0 Failed | 1 Pending | 4 Skipped I0428 17:23:51.407086 8669 driver.go:96] All tests pass
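
A sketch of the verification commands for steps 2 and 3 (the push script path is the one linked above; the e2e invocation via hack/e2e.go is an assumption and may differ from the exact command used):

build/push-devel-build.sh        # verify the release push still finds configure-vm.sh in place
go run hack/e2e.go -v --test     # assumed e2e driver invocation against the gce debian cluster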

I think we can merge it now.

@vmarmol
Contributor

vmarmol commented Apr 29, 2015

Thanks @dchen1107! Merging. If anyone hits any issues due to this PR, please ping me and @dchen1107.

vmarmol added a commit that referenced this pull request Apr 29, 2015
Bring up a kubernetes cluster using coreos image as worker nodes
@vmarmol vmarmol merged commit fc34277 into kubernetes:master Apr 29, 2015
@dchen1107 dchen1107 added area/rkt sig/node Categorizes an issue or PR as relevant to SIG Node. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Apr 29, 2015
@dchen1107
Member Author

@andyzheng0831 here is what I have so far. Hope this helps you with your project.

@dchen1107
Member Author

@andyzheng0831 again :-)

@bakins bakins mentioned this pull request May 7, 2015