Skip to content
This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

Tnc bootstrap #3053

Merged
merged 1 commit into from
Mar 7, 2018
Merged

Tnc bootstrap #3053

merged 1 commit into from
Mar 7, 2018

Conversation

thorfour
Copy link
Contributor

@thorfour thorfour commented Mar 1, 2018

No description provided.

@coreosbot
Copy link

Can one of the admins verify this patch?

- /bootstrap
- --config=/etc/cluster-config
- --port=45900
- --cert=/assets/tls/ca.crt
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enxebre where can I mount the CA assets from?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ca certs are dropped into https://github.com/coreos/tectonic-installer/blob/master/modules/ignition/ca_certs.tf#L19
Other kube certs a temporary dropped into /opt/tectonic/tls
This probably relates to https://github.com/coreos/tectonic-installer/pull/2972/files

Copy link
Member

@enxebre enxebre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thorfour I rebased this branch onto master and got this to a semi-working state in this commit enxebre@d6ff12c, which addresses the comments here and add/remove some missing parts.
Feel free to cherry pick it. Things TBD:

config.tf Outdated
@@ -71,6 +71,7 @@ variable "tectonic_container_images" {
awscli = "quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600"
gcloudsdk = "google/cloud-sdk:178.0.0-alpine"
bootkube = "quay.io/coreos/bootkube:v0.10.0"
tnc_bootstrap = "quay.io/coreos/tectonic-node-controller-dev:fad3a8e284e2c414fdf1713c7e0ae9d1e1e487ba"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't download this image with my tectonic license

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's weird since I added your user to have read access on quay

@@ -61,7 +61,7 @@ resource "aws_autoscaling_group" "masters" {

data "ignition_config" "ncg_master" {
append {
source = "http://${var.cluster_name}-ncg.${var.base_domain}/ignition?profile=master"
source = "http://${var.cluster_name}-ncg.${var.base_domain}/ign/v1/role/master"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docker image seems to be created from a branch which does not support rest like, but querystrings https://github.com/thorfour/tectonic-operators/blob/templates/controller/node/pkg/ignition/server.go#L81

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I tried to modify and bazel build //controller/node/cmd/bootstrap but got running external/local_config_cc/cc_wrapper.sh failed: exit status 1 ld: library not found for -lcrt0.o

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea the image was old. I pushed a new dev image that uses the rest path

@@ -27,7 +27,7 @@ data "aws_ami" "coreos_ami" {

data "ignition_config" "ncg_worker" {
append {
source = "http://${var.cluster_name}-ncg.${var.base_domain}/ignition?profile=worker"
source = "http://${var.cluster_name}-ncg.${var.base_domain}/ign/v1/role/worker"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docker image seems to be created from a branch which does not support rest like, but querystrings https://github.com/thorfour/tectonic-operators/blob/templates/controller/node/pkg/ignition/server.go#L81

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in image: cb561ffb2b7c747a037231fab07a64fe6ebc7322

@@ -13,3 +13,24 @@ data:
networkProfile: ${tectonic_networking}
calicoConfig:
mtu: ${calico_mtu}
tnc-config: |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster-config.yaml is now generated at runtime by the cli, so this template is ignored, there's already a PR to delete it.
To get this config through the cluster-config will need to modify
https://github.com/coreos/tectonic-installer/blob/master/installer/pkg/config-generator/generator.go#L66
and
https://github.com/coreos/tectonic-config

kind: DaemonSet
metadata:
name: tectonic-node-controller
namespace: tectonic-system
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this lives in tectonic-system but the config in kube-system so it's not accessible for the pod

containers:
- name: tectonic-node-controller
image: ${tnc_bootstrap_image}
command:
Copy link
Member

@enxebre enxebre Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not like to the current image, docker inspect shows:

           "Entrypoint": [
                "/app/controller/node/cmd/bootstrap/bootstrap"
            ],

So command should be removed and need to add args:

        args:
        - --config=/etc/cluster-config/tnc-config
        - --port=45900
        - --cert=/opt/tectonic/tls/ca.crt
        - --key=/opt/tectonic/tls/ca.key

@thorfour
Copy link
Contributor Author

thorfour commented Mar 2, 2018

@enxebre latest dev image e10b61c01c8b297da029d517922c72cbd71017f7 supports the rest paths, and has a new flag called debug which turns off tls

@thorfour
Copy link
Contributor Author

thorfour commented Mar 2, 2018

current state of affairs is bootstrapping is stuck with 2018/03/02 13:51:08 Waiting for TNC to be running, this might take a while...

@enxebre
Copy link
Member

enxebre commented Mar 5, 2018

Hey @thorfour just need to set the s3 key for the bootstrap node back to the rest like url enxebre@c331064
Now we can focus on templates correctness, first issue with current config for either masters and workers is k8s-node-bootstrap.service failing because of tectonic_torcx_image_url and tectonic_torcx_image_tag are missing
Also noticed other variables not being set. The config yaml gets properly populated so this seems like tnc templates interpolation not working

@thorfour
Copy link
Contributor Author

thorfour commented Mar 5, 2018

🎉 It bootstraps a cluster! (kind of)

tectonic comes up just fine. But I ended up with a single node in the cluster

@thorfour
Copy link
Contributor Author

thorfour commented Mar 5, 2018

I'm wondering if that has to do with the fact that I had to run the join step after tnc bootstrapping was already torn down?

@coreosbot
Copy link

Can one of the admins verify this patch?

@enxebre
Copy link
Member

enxebre commented Mar 6, 2018

Hey @thorfour tectonic install bootstrap will always deploy a single node cluster independently from the TNC (the bootstrap node which gets config from s3).
The rest of the nodes will get the ign config from the TNC running on the bootstrap node when running tectonic install join
Or you can just run it all in one go with tectonic install

Currently when running tectonic install bootstrap and then sshing into the bootstrap node and curl http://0.0.0.0:49500/ign/v1/role/master or curl http://0.0.0.0:49500/ign/v1/role/worker I get unable to parse node config: failed to unmarshal systemd unit into struct: yaml: unmarshal errors: so no nodes will ever be able to join the cluster.

We know this pr already take cares of deploying the tnc successfully so what needs to be done now is to ensure the TNC serves the right config so other nodes can join the cluster successfully

@enxebre
Copy link
Member

enxebre commented Mar 6, 2018

this fixes the issue mentioned above, please cherry pick it enxebre@d7141f1
And these are the issues with the current templates https://github.com/coreos-inc/tectonic-operators/pull/286#issuecomment-370751593. By fixing those and #3053 (comment), nodes are able to join the cluster

https_proxy = "${var.tectonic_https_proxy_address}"
image_re = "${var.tectonic_image_re}"
kube_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
kubelet_node_label = "node-role.kubernetes.io/master"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is setting master label for both master and workers

@thorfour thorfour changed the title Tnc bootstrap WIP Tnc bootstrap Mar 6, 2018
@thorfour
Copy link
Contributor Author

thorfour commented Mar 6, 2018

@enxebre all green. Let's do TLS and long-running switchover in a follow-up PR

@enxebre
Copy link
Member

enxebre commented Mar 7, 2018

hey @thorfour thanks a lot! This looks a good start for fully transitioning to the TNC.
TBD in follow up PRs:

kubelet_image_url = "${replace(var.container_images["hyperkube"],var.image_re,"$1")}"
kubelet_image_tag = "${replace(var.container_images["hyperkube"],var.image_re,"$2")}"
iscsi_enabled = "${var.iscsi_enabled}"
kubeconfig_fetch_cmd = "${var.kubeconfig_fetch_cmd != "" ? "ExecStartPre=${var.kubeconfig_fetch_cmd}" : ""}"
Copy link
Member

@enxebre enxebre Mar 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it be to have "ExecStartPre=" as part of the go template a bit less error prone?

@enxebre enxebre merged commit f4bca81 into coreos:master Mar 7, 2018
wking added a commit to wking/openshift-installer that referenced this pull request Aug 21, 2018
In 251147e (TNC bootstrapping, 2018-02-13,
coreos/tectonic-installer#3053), the resource was renamed from 'ncg'
to 'tnc', but the name property wasn't updated to match.
wking added a commit to wking/openshift-installer that referenced this pull request Aug 21, 2018
In 251147e (TNC bootstrapping, 2018-02-13,
coreos/tectonic-installer#3053), the resource was renamed from 'ncg'
to 'tnc', but the name property wasn't updated to match.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants