The job cloud_controller_ng failed when updating instance api #1047

Closed
bingosummer opened this issue Dec 26, 2017 · 5 comments

@bingosummer
Contributor

bingosummer commented Dec 26, 2017

Issue

The job cloud_controller_ng failed when updating instance api.

Context

The instance 'api' is not running after the update. The update took more than 20 minutes, which is much longer than usual.

When I SSH into the api instance and run monit summary, the result is:

Process 'consul_agent'              running
Process 'cloud_controller_ng'       Execution failed
Process 'cloud_controller_worker_local_1' running
Process 'cloud_controller_worker_local_2' running
Process 'nginx_cc'                  running
Process 'route_registrar'           running
Process 'statsd_injector'           running
Process 'file_server'               running
Process 'routing-api'               running
Process 'policy-server'             running
Process 'policy-server-internal'    running
Process 'cc_uploader'               running
Process 'metron_agent'              running
System 'system_localhost'           running

However, I did not find any useful information in /var/vcap/sys/log/cloud_controller_ng/cloud_controller_ng.log.

After I restarted the cloud_controller_ng process with monit restart cloud_controller_ng, everything was OK: the api instance is running, and the deployment is successful.
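
To spot the failed job quickly in a longer monit summary listing, the output can be filtered for processes whose status is anything other than running. This is only a minimal sketch: the function name not_running is hypothetical, and it assumes the summary format shown above (a quoted process name followed by a status column).

```shell
# Print the name of every monit-managed process whose status is not "running".
# Reads `monit summary` output on stdin; "System ..." lines are ignored.
# Usage on the VM (as root): monit summary | not_running
not_running() {
  awk -F"'" '/^Process / {
    status = $3
    gsub(/^[ \t]+|[ \t]+$/, "", status)   # trim padding around the status column
    if (status != "running") print $2
  }'
}
```

For the summary in this issue, the filter would print only cloud_controller_ng.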

Steps to Reproduce

  1. Deploy CF using cf-deployment.yml v1.5.0.
  2. The deployment fails at the step that updates the api instance.

Expected result

CF should be deployed successfully.

Current result

The task log:

Task 42 | 05:46:51 | Updating instance consul: consul/9e8fd131-1835-4beb-8f18-b330c6fa5b2d (0) (canary) (00:02:34)
Task 42 | 05:49:25 | Updating instance adapter: adapter/2cfebb1d-6f30-40ff-80d5-87b168ebd916 (0) (canary)
Task 42 | 05:49:25 | Updating instance nats: nats/17e03f6a-0fa6-4727-9ad4-09cd0e932c0f (0) (canary)
Task 42 | 05:50:14 | Updating instance adapter: adapter/2cfebb1d-6f30-40ff-80d5-87b168ebd916 (0) (canary) (00:00:49)
Task 42 | 05:50:14 | Updating instance nats: nats/17e03f6a-0fa6-4727-9ad4-09cd0e932c0f (0) (canary) (00:00:49)
Task 42 | 05:50:14 | Updating instance database: database/f4b765b3-6d2e-42b4-8f77-5eceb4cc95e6 (0) (canary) (00:05:04)
Task 42 | 05:55:18 | Updating instance diego-api: diego-api/71483cb8-ad54-481f-a47b-56198b526d48 (0) (canary) (00:00:52)
Task 42 | 05:56:10 | Updating instance singleton-blobstore: singleton-blobstore/380fc426-1869-489d-93b0-4e9418057591 (0) (canary)
Task 42 | 05:56:10 | Updating instance uaa: uaa/1d636b7b-415a-4bf8-bfcf-091a17999e64 (0) (canary)
Task 42 | 05:56:10 | Updating instance api: api/17ea48e1-5083-4a21-b7a4-1536c1bfb066 (0) (canary)
Task 42 | 05:56:10 | Updating instance cc-worker: cc-worker/e040fcbb-c80c-47fc-8e06-3320f39220ff (0) (canary) (00:01:10)
Task 42 | 05:58:58 | Updating instance uaa: uaa/1d636b7b-415a-4bf8-bfcf-091a17999e64 (0) (canary) (00:02:48)
Task 42 | 05:59:03 | Updating instance singleton-blobstore: singleton-blobstore/380fc426-1869-489d-93b0-4e9418057591 (0) (canary) (00:02:53)
Task 42 | 06:18:11 | Updating instance api: api/17ea48e1-5083-4a21-b7a4-1536c1bfb066 (0) (canary) (00:22:01)
                   L Error: 'api/17ea48e1-5083-4a21-b7a4-1536c1bfb066 (0)' is not running after update. Review logs for failed jobs: cloud_controller_ng
Task 42 | 06:18:11 | Error: 'api/17ea48e1-5083-4a21-b7a4-1536c1bfb066 (0)' is not running after update. Review logs for failed jobs: cloud_controller_ng

Task 42 Started  Wed Dec 27 05:38:12 UTC 2017
Task 42 Finished Wed Dec 27 06:18:11 UTC 2017
Task 42 Duration 00:39:59
Task 42 error

Updating deployment:
  Expected task '42' to succeed but state is 'error'

Exit code 1

The monit.log:

api/17ea48e1-5083-4a21-b7a4-1536c1bfb066:/var/vcap/sys/log/cloud_controller_ng# grep cloud_controller_ng  /var/vcap/monit/monit.log
[UTC Dec 27 05:58:04] error    : 'cloud_controller_ng' process is not running
[UTC Dec 27 05:58:04] info     : 'cloud_controller_ng' trying to restart
[UTC Dec 27 05:58:04] info     : 'cloud_controller_ng' start: /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_ng_ctl
[UTC Dec 27 05:58:09] info     : start service 'cloud_controller_ng' on user request
[UTC Dec 27 05:58:34] error    : 'cloud_controller_ng' failed to start
[UTC Dec 27 05:58:34] info     : 'nginx_cc' start: /var/vcap/jobs/cloud_controller_ng/bin/nginx_ctl
[UTC Dec 27 05:58:35] info     : 'cloud_controller_worker_local_1' start: /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_worker_ctl
[UTC Dec 27 05:58:36] info     : 'cloud_controller_worker_local_2' start: /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_worker_ctl
[UTC Dec 27 05:59:46] info     : 'cloud_controller_ng' start action done
[UTC Dec 27 05:59:46] info     : 'cloud_controller_ng' process is running with pid 8673

The logs of the job cloud_controller_ng:

api/17ea48e1-5083-4a21-b7a4-1536c1bfb066:/var/vcap/sys/log/cloud_controller_ng# cat cloud_controller_ng_ctl.log
[2017-12-27 05:58:04+0000] ------------ STARTING cloud_controller_ng_ctl at Wed Dec 27 05:58:04 UTC 2017 --------------
[2017-12-27 05:58:04+0000] Checking for blobstore availability
[2017-12-27 05:58:40+0000] Blobstore is available
[2017-12-27 05:59:12+0000] Thin web server (v1.7.0 codename Dunder Mifflin)
[2017-12-27 05:59:12+0000] Maximum connections set to 1024
[2017-12-27 05:59:12+0000] Listening on /var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock, CTRL+C to stop

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/153909902

The labels on this github issue will be updated when the story is started.

@bingosummer
Contributor Author

bingosummer commented Dec 30, 2017

monit marked cloud_controller_ng as failed at UTC Dec 27 05:58:34, 30 seconds after the start at 05:58:04. But the blobstore only became available at 2017-12-27 05:58:40+0000, and cloud_controller_ng was finally up at 2017-12-27 05:59:12+0000.
So the process was still starting when monit gave up on it: the status of cloud_controller_ng is not monitored correctly.
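
The timing can be checked directly from the timestamps in monit.log and cloud_controller_ng_ctl.log. The sketch below assumes GNU date (for the -d option) and uses a hypothetical helper named elapsed_seconds; the year 2017 is taken from the task log because monit.log omits it.

```shell
# Seconds elapsed between two UTC timestamps ("YYYY-MM-DD HH:MM:SS").
# Assumes GNU date, which supports -d.
elapsed_seconds() {
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $((end - start))
}

# ctl script start -> monit reports "failed to start"
elapsed_seconds "2017-12-27 05:58:04" "2017-12-27 05:58:34"   # prints 30
# ctl script start -> "Blobstore is available"
elapsed_seconds "2017-12-27 05:58:04" "2017-12-27 05:58:40"   # prints 36
# ctl script start -> Thin web server listening
elapsed_seconds "2017-12-27 05:58:04" "2017-12-27 05:59:12"   # prints 68
```

The 30-second gap is consistent with a 30-second start timeout in monit, although the logs in this issue do not show the configured value.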

@slenky

slenky commented Jan 18, 2018

Are there any possible solutions? I faced the same issue in a bosh-lite environment.

@kukgini

kukgini commented Apr 11, 2018

@slenky Use bosh ssh to log in to the api instance and restart the process with monit:

monit restart cloud_controller_ng

After that, running bosh deploy again completed the deployment successfully (in my case).
It is just a workaround.
Anyway, I was able to move on to the next version.
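
If the restart needs to be scripted, the workaround above could be wrapped in a small helper that restarts the process and then polls monit summary until the process is reported as running. This is only a sketch: restart_and_wait, the attempt count, and the POLL_INTERVAL variable are hypothetical names, and it assumes the script runs as root on the api VM, as in the shell sessions earlier in this issue.

```shell
# Restart a monit-managed process and wait until monit reports it as running.
# $1: process name, $2: maximum number of polls (default 30).
# POLL_INTERVAL: seconds to sleep between polls (default 10).
restart_and_wait() {
  process="$1"
  attempts="${2:-30}"
  monit restart "$process" || return 1
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Match the summary line for this process only when its status is "running".
    if monit summary | grep -Eq "^Process '$process'[[:space:]]+running[[:space:]]*\$"; then
      return 0
    fi
    i=$((i + 1))
    sleep "${POLL_INTERVAL:-10}"
  done
  return 1   # still not running after all polls
}
```

On the VM this would be called as restart_and_wait cloud_controller_ng, followed by another bosh deploy from the workstation once it returns successfully.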

@christarazi
Contributor

Closing due to inactivity. Please reopen if this is still an issue.

@christarazi && @cwlbraa
