This repository has been archived by the owner on Sep 2, 2021. It is now read-only.

Rebar KRIB plugin fails with Task etcd-config failed during cert setup stage (or just after) #133

Closed
taylor opened this issue Sep 5, 2018 · 1 comment

Comments

taylor commented Sep 5, 2018

KRIB fails with "Task etcd-config failed".

Full error:

Log for Job: aa7e391c-87fe-4344-ba5b-3e7ce81e7d42
Starting task krib-install-cluster:etcd-config:etcd-config on machine-2
Starting command ./etcd-config-etcd-config.sh.tmpl


Command running
Configure the etcd cluster
Add initial variables to track members.
[]
[]
Creating 1 servers
Electing etcd members to cluster profile: k8s
Certs plugin detected....setting up CA
We are first machine in cluster, setting up the root certs...
  Client CA Exists, but we did not set password.  Need to reset!!
Command exited with status 1
Action etcd-config.sh.tmpl finished
Task etcd-config failed
Marked machine machine-2 as not runnable
Updated job for krib-install-cluster:etcd-config:etcd-config to failed
Task signalled that it failed

It looks like it's falling into this block: https://github.com/digitalrebar/provision-content/blob/master/krib/templates/etcd-config.sh.tmpl#L68

It seems that the command

drpcli machines runaction $RS_UUID getca certs/root $CLIENT_CA 2>/dev/null

is failing, so the script assumes the certs have already been created when they have not.
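To tell a failed lookup apart from a CA that genuinely exists, one option is to rerun the same getca call without discarding stderr. The sketch below is a hypothetical diagnostic helper (check_client_ca is not part of KRIB or drpcli); it assumes RS_UUID and CLIENT_CA are set as in the template and that drpcli exits non-zero when the action fails. It does not reproduce the template's actual branching logic.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper, not part of KRIB.
# Runs the same getca call as etcd-config.sh.tmpl, but keeps its output
# and exit status instead of sending stderr to /dev/null.
# Assumes RS_UUID and CLIENT_CA are already set, as in the template.
check_client_ca() {
  local out rc
  # Capture stdout and stderr together so any error message is preserved.
  out=$(drpcli machines runaction "$RS_UUID" getca certs/root "$CLIENT_CA" 2>&1)
  rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "getca succeeded: CA '$CLIENT_CA' already exists"
  else
    echo "getca failed (exit $rc): $out" >&2
  fi
  return "$rc"
}
```

Running this on the failing machine shows whether getca itself errors, and with what message, rather than silently taking one branch or the other.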

The reproduction steps below assume Rebar is already installed. We started with the KRIB content plugin, following the video https://youtu.be/rzBq3BsYQTM?t=1295. We also assume you have created a portal.rackn.io account (required for bulk actions and for adding the content package).

Steps to reproduce:

  1. Go to https://portal.rackn.io/#/e/147.75.196.129:8092/machines and log in.

  2. Go to https://portal.rackn.io/#/e/147.75.196.129:8092/plugins/packet-ipmi, enter a machine name and count, then click Create. This sends an API request to create the nodes.

  3. Go to https://portal.rackn.io/#/e/147.75.196.129:8092/machines and wait for the nodes to be discovered.

  4. Go to the KRIB profile: https://portal.rackn.io/#/e/147.75.196.129:8092/profiles/example-ha-krib

  5. Click Clone.

  6. Enter a name for the new profile, e.g. k8s.

  7. Set the etcd/cluster-profile param to match the profile name "k8s"

  8. Set the krib/cluster-profile param to match the profile name "k8s"

  9. Save the new profile.

  10. Go to https://portal.rackn.io/#/e/147.75.196.129:8092/bulk

  11. Select all machines to be used in the new k8s cluster.

  12. Use the profile drop-down to choose the new "k8s" profile.

  13. Click the + symbol to add the profile to the machines

  14. Use the Workflows drop-down to choose the krib-install-cluster workflow.

  15. Click the Change Workflow button (play/skip button). This starts the k8s deployment.

Expected:

  • Running Kubernetes cluster

Result:

  • KRIB is failing with Task etcd-config failed
Member

zehicle commented Sep 5, 2018

The "Client CA Exists, but we did not set password. Need to reset!!" warning in the script indicates that the certificate information in the cert-data profile does not match the profile. Running the "krib-reset-cluster" stage clears those values and removes the data left over from the previous run.

If you are using Sledgehammer instances, you will need to reboot the machines to detach the drives.

Please try that and see if it clears the error.

zehicle closed this as completed Nov 17, 2018