conjure-up isn't properly handling a failed bootstrap #641

Closed
battlemidget opened this Issue Jan 31, 2017 · 10 comments

3 participants
Contributor

battlemidget commented Jan 31, 2017

In some cases where a bootstrap doesn't complete fully, some of the juju config files may be left in an inconsistent state, which in turn causes conjure-up to fail when attempting to log in to the controller.

Contributor

mikemccracken commented Jan 31, 2017

Contributor

battlemidget commented Jan 31, 2017

This is the current error seen if the accounts.yaml file is either missing or incomplete: http://pastebin.com/2HwvdqPn

I think the controllers.yaml file exists but a matching account for that controller does not. Still waiting on the contents of this particular user's juju files.
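
The suspected mismatch can be checked for directly. The following is a minimal sketch, assuming both files have already been parsed into dictionaries (for example with a YAML library) and that each keeps its entries under a top-level `controllers` key, keyed by controller name. That layout, the sample data, and the helper name `controllers_missing_accounts` are assumptions for illustration, not conjure-up's actual code.

```python
def controllers_missing_accounts(controllers_data, accounts_data):
    """Return names of controllers that have no matching account entry.

    Both arguments are assumed to be parsed YAML mappings (or None when
    the file is missing or empty) with a top-level 'controllers' key.
    """
    controllers = (controllers_data or {}).get("controllers") or {}
    accounts = (accounts_data or {}).get("controllers") or {}
    return sorted(name for name in controllers if name not in accounts)


# Hypothetical data: the controller is recorded, but its account is not.
controllers_data = {"controllers": {"conjure-up-localhost": {"uuid": "example"}}}
accounts_data = None  # stands in for a missing accounts.yaml

missing = controllers_missing_accounts(controllers_data, accounts_data)
if missing:
    print("No account found for controller(s): " + ", ".join(missing))
```

Treating a missing or empty file the same as an empty mapping means the check reports the affected controller instead of raising on a lookup.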

Contributor

mikemccracken commented Jan 31, 2017

mc0e commented Jan 31, 2017

I'm about to try deploying kubernetes core to localhost a third time. In each of the attempts so far the process has been incomplete, and it's at least unclear how to restart it.

In the first attempt, one of the etcd processes and one of the workers had still not come up after 3 hours, so I terminated conjure-up (ctrl-c). It was far from clear how I might recover the situation, so in the end I used lxc stop <id> and lxc delete <id> to remove everything, and then, since conjure-up wouldn't run, I deleted the .local/share/juju directory. No doubt there's a better way, but the "It's all so easy" sales pitch is already looking dubious.

I tried again. It takes a long time to run, so I left it running while I went off to bed. The next day I saw that it had gotten to a point where it appeared to be about to install and configure kubectl, but that hadn't happened (waiting for user input?), and the network link had had a glitch sometime overnight.

What I'd like to see is conjure-up having the ability on a subsequent invocation to recognise that it has an incomplete build operation, and ask if it should pick up where it left off. Failing that it should at least assist with tearing down the failed installation.

After my first attempt, I thought that maybe conjure-up would find the remains of the previous run, but that just led to me having many lxd containers running by the time I decided to just wipe them out and start over.

Contributor

battlemidget commented Jan 31, 2017

@mc0e while we address your other points in the code, can you give us some additional information on your environment? Are you running conjure-up inside a VM? If so, what are its hardware specs? If not, what are the hardware specs of the system you're running conjure-up on? If it's a laptop, does it meet the recommended requirements from http://conjure-up.io/docs/en/users/#hardware-requirements?

Contributor

battlemidget commented Jan 31, 2017

Also for this particular issue this is the error that isn't being surfaced properly:

ERROR enabling HTTPS listener: cannot listen on https socket: listen tcp [::]:8443: socket: address family not supported by protocol

So it's giving us a false error thinking it has something to do with missing juju files.
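
One way to avoid this kind of misdiagnosis is to show the bootstrap command's own error output before falling back to guesses about missing files. A minimal sketch, assuming the bootstrap output has been captured as text and that the lines of interest start with `ERROR` as in the message above. The helper name `extract_error_lines` is hypothetical, and this is not the fix that landed in conjure-up.

```python
def extract_error_lines(output):
    """Return the lines of captured command output that start with 'ERROR'."""
    return [
        line.strip()
        for line in output.splitlines()
        if line.strip().startswith("ERROR")
    ]


# Example using the message quoted above, preceded by a made-up noise line.
captured = "\n".join([
    "some earlier output",
    "ERROR enabling HTTPS listener: cannot listen on https socket: "
    "listen tcp [::]:8443: socket: address family not supported by protocol",
])

for line in extract_error_lines(captured):
    print(line)
```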

@battlemidget battlemidget self-assigned this Jan 31, 2017

@battlemidget battlemidget added the bug label Jan 31, 2017

@battlemidget battlemidget added this to the 2.1.0 milestone Jan 31, 2017

mc0e commented Jan 31, 2017

I'm running in a VM with 8GB RAM and a 200GB SSD. That's less than the hardware requirements you pointed me to, which might explain the varied points at which the process fails.

Should I expect 8GB of RAM to prevent me from even installing, or would it just mean there's not much room to do things once it's up? At this point I'm primarily interested in exploring how it works.

@battlemidget battlemidget changed the title from "incomplete juju bootstraps leaves some incomplete configs; causes conjure-up to fail" to "conjure-up isn't properly handling a failed bootstrap" Feb 9, 2017

Contributor

battlemidget commented Feb 9, 2017

@mikemccracken this is the bug I wanted to address by properly handling a failed bootstrap and showing the user the errorview with a correct traceback.

Contributor

battlemidget commented Feb 10, 2017

This is handled with 2403933
