Certificate validation issues when provisioning with version > v1.10.18 of this cookbook #32

Closed
jperville opened this issue Dec 18, 2016 · 2 comments

Collaborator

jperville commented Dec 18, 2016

Hello @IshentRas and @ianmiell,

As promised in #29 (comment), here is the bug report for the issue I have been having with this cookbook since v1.10.19. This is becoming critical, since I now have to choose between staying on v1.10.18 (and backporting my fixes by hand) and enjoying the new features added since v1.10.20.

I have a Vagrantfile featuring several VMs (a master and minions); the IP addresses of the OpenShift master and nodes are stored in role attributes, as this cookbook requires (see the demo project Vagrantfile below). Up to and including v1.10.18, I could provision just fine (even if the first run failed because the origin-master service had not yet been restarted after configuration, a second chef-run finished the job). Since v1.10.19, I run into certificate validation issues that make the origin-node service on the master VM fail forever, with the following message in the journal:

Dec 18 18:36:18 master systemd[1]: Starting Origin Node...
Dec 18 18:36:18 master origin-node[24838]: F1218 18:36:18.150550   24838 start_node.go:126] cannot fetch "default" cluster network: Get https://192.168.33.220.xip.io:8443/oapi/v1/clusternetworks/default: x509: certificate signed by unknown authority
Dec 18 18:36:18 master systemd[1]: origin-node.service: main process exited, code=exited, status=255/n/a
Dec 18 18:36:18 master systemd[1]: Failed to start Origin Node.
Dec 18 18:36:18 master systemd[1]: Unit origin-node.service entered failed state.
Dec 18 18:36:18 master systemd[1]: origin-node.service failed.

The origin-master journal is spammed with messages like this:

Dec 18 18:50:10 master origin-master[24505]: I1218 18:50:10.486666   24505 nodecontroller.go:609] NodeController is entering network segmentation mode.
Dec 18 18:50:13 master origin-master[24505]: I1218 18:50:13.262304   24505 server.go:2161] http: TLS handshake error from 192.168.33.220:38170: remote error: bad certificate
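To narrow down which side rejects which certificate, a check like the following could be run on the master VM. It is a minimal Python sketch (Python 3.7+, for ssl.SSLCertVerificationError), not part of the cookbook: it attempts a TLS handshake and reports whether the server certificate verifies against a given CA file. The function name check_tls is hypothetical.

```python
import socket
import ssl
from typing import Optional


def check_tls(host: str, port: int, cafile: Optional[str] = None,
              server_hostname: Optional[str] = None, timeout: float = 5.0) -> str:
    """Attempt a TLS handshake and report whether the server cert is trusted.

    Returns "ok", "untrusted: <reason>" or "error: <reason>".
    With cafile=None the system trust store is used instead of a custom CA.
    """
    # create_default_context enables certificate and hostname verification.
    ctx = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=server_hostname or host):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        # Raised e.g. for a certificate signed by an unknown CA or a hostname mismatch.
        return f"untrusted: {exc.verify_message}"
    except OSError as exc:
        # Connection refused, timeout, DNS failure or another TLS error.
        return f"error: {exc}"
```

For example, check_tls("192.168.33.220", 8443, cafile="/etc/origin/node/ca.crt", server_hostname="192.168.33.220.xip.io") would test the API endpoint from the log above against a node-side CA file; that CA path is an assumption and should be taken from the node's own configuration. An "untrusted" result would suggest the node and master disagree about the CA, which is consistent with the "unknown authority" and "bad certificate" messages.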

This problem is 100% reproducible. I have prepared a demo project that reproduces the issue: https://github.com/PerfectMemory/origin-provision-bug-demo. Just git clone it and run vagrant provision master (assuming you have Vagrant installed). I use the latest stable versions of Vagrant, VirtualBox and chef-dk.

I also included a full log of vagrant up master on my system here: https://github.com/PerfectMemory/origin-provision-bug-demo/blob/master/vagrant-master.log

Thank you in advance.

Owner

IshentRas commented Dec 18, 2016

Thanks very much, @jperville.
I have now submitted a PR to your project: PerfectMemory/origin-provision-bug-demo#1
I have also fixed the issue regarding the node, etc.
In a nutshell, everything should be working now :-)

In addition to the fix we put in v1.10.22, I believe your main issues were a wrong run_list and bad options.
Check the CHANGELOG: https://github.com/IshentRas/cookbook-openshift3/blob/master/CHANGELOG.md

IshentRas self-assigned this Dec 18, 2016

IshentRas added the bug label Dec 18, 2016

Collaborator

jperville commented Dec 19, 2016

Thanks for investigating my issue, @IshentRas.

After merging your PR, I tried to provision my VM and still ran into the certificate issue. Then I found a suspicious line in my Squid log (TCP_REFRESH_FAIL_OLD for http://192.168.33.220:9999/node/generated-configs/master.tgz), which made me add 192.168.33.220 to ENV['no_proxy'].
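Stale downloads like this one can be spotted by scanning the proxy's access log. As we understand Squid's log codes, TCP_REFRESH_FAIL_OLD means Squid held a cached copy, failed to revalidate it with the origin server, and served the old copy anyway; a stale master.tgz would presumably carry certificates from an earlier provisioning run. Below is a minimal Python filter that assumes Squid's default native access.log layout; stale_hits is a hypothetical name.

```python
from typing import Iterable, Iterator, Tuple


def stale_hits(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (timestamp, URL) for requests Squid answered from a stale cached copy.

    Assumes Squid's default native access.log layout:
    time elapsed client action/status bytes method URL ident hierarchy/peer type
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        action = fields[3].split("/", 1)[0]  # e.g. "TCP_REFRESH_FAIL_OLD/200"
        if action == "TCP_REFRESH_FAIL_OLD":
            yield fields[0], fields[6]
```

Typical use would be iterating over an open log file, e.g. for ts, url in stale_hits(open("/var/log/squid/access.log")), where the log path is an assumption that varies by distribution.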

After making sure that Chef does not go through the proxy to talk to the master VM, I was able to provision both the master and the minions without having to manually restart the chef-run in between.
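Whether a given request bypasses the proxy depends on how each HTTP client interprets no_proxy, and clients differ (ports, CIDR ranges and wildcards are handled inconsistently). The Python sketch below implements one common, simplified rule, exact host match or dot-suffix match; bypasses_proxy is a hypothetical helper, not Chef's own logic.

```python
from urllib.parse import urlsplit


def bypasses_proxy(url: str, no_proxy: str) -> bool:
    """Return True if `url` should skip the proxy under a simplified no_proxy rule.

    Assumed semantics (clients differ): comma-separated entries, "*" matches
    everything, and an entry matches when it equals the host or is a
    dot-separated suffix of it. Ports and CIDR ranges are ignored.
    """
    host = (urlsplit(url).hostname or "").lower()
    for entry in no_proxy.split(","):
        entry = entry.strip().lower().lstrip(".")  # ".example.com" == "example.com"
        if not entry:
            continue
        if entry == "*":
            return True
        if host == entry or host.endswith("." + entry):
            return True
    return False
```

Under this rule, an entry for the bare IP covers http://192.168.33.220:9999/node/generated-configs/master.tgz but not the 192.168.33.220.xip.io hostname used for the API server, so both forms may need listing depending on which client makes the request.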

Thank you very much again for the quick and efficient troubleshooting.

jperville closed this Dec 19, 2016
