The job `cloud_controller_ng` failed when updating instance api #1047

bingosummer · 2017-12-26T16:13:18Z

Issue

The job cloud_controller_ng failed when updating instance api.

Context

The instance 'api' is not running after the update. The update takes more than 20 minutes, which is much longer than usual.

When I SSH into the api instance and run `monit summary`, the result is:

Process 'consul_agent'              running
Process 'cloud_controller_ng'       Execution failed
Process 'cloud_controller_worker_local_1' running
Process 'cloud_controller_worker_local_2' running
Process 'nginx_cc'                  running
Process 'route_registrar'           running
Process 'statsd_injector'           running
Process 'file_server'               running
Process 'routing-api'               running
Process 'policy-server'             running
Process 'policy-server-internal'    running
Process 'cc_uploader'               running
Process 'metron_agent'              running
System 'system_localhost'           running

However, I didn't find any useful information in /var/vcap/sys/log/cloud_controller_ng/cloud_controller_ng.log.

After I restarted the cloud_controller_ng process with `monit restart cloud_controller_ng`, everything was OK: the api instance was running and the deployment succeeded.
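To pick the failing job out of `monit summary` output like the one above, a small parser can flag every entry whose status is not `running`. This is a minimal sketch that assumes the layout shown above (entry type, quoted name, whitespace, status text); the function name `non_running` is hypothetical and not part of monit or BOSH.

```python
import re

# Matches lines like: Process 'cloud_controller_ng'       Execution failed
# Assumption: each entry is "<type> '<name>'" followed by whitespace and a status.
ENTRY = re.compile(r"^(Process|System)\s+'([^']+)'\s+(.+?)\s*$")


def non_running(summary: str) -> dict[str, str]:
    """Return {name: status} for every monit entry whose status is not 'running'."""
    failing = {}
    for line in summary.splitlines():
        match = ENTRY.match(line.strip())
        if match and match.group(3) != "running":
            failing[match.group(2)] = match.group(3)
    return failing


# A few lines copied from the summary above.
SAMPLE = """\
Process 'consul_agent'              running
Process 'cloud_controller_ng'       Execution failed
Process 'cloud_controller_worker_local_1' running
System 'system_localhost'           running
"""

print(non_running(SAMPLE))  # {'cloud_controller_ng': 'Execution failed'}
```

Matching on the quoted name rather than on fixed column widths keeps the sketch working for long names such as `cloud_controller_worker_local_1`, which overflow monit's alignment.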

Steps to Reproduce

Deploy CF using cf-deployment.yml v1.5.0.
The deployment fails while updating the api instance.

Expected result

CF should be deployed successfully.

Current result

The task log:

Task 42 | 05:46:51 | Updating instance consul: consul/9e8fd131-1835-4beb-8f18-b330c6fa5b2d (0) (canary) (00:02:34)
Task 42 | 05:49:25 | Updating instance adapter: adapter/2cfebb1d-6f30-40ff-80d5-87b168ebd916 (0) (canary)
Task 42 | 05:49:25 | Updating instance nats: nats/17e03f6a-0fa6-4727-9ad4-09cd0e932c0f (0) (canary)
Task 42 | 05:50:14 | Updating instance adapter: adapter/2cfebb1d-6f30-40ff-80d5-87b168ebd916 (0) (canary) (00:00:49)
Task 42 | 05:50:14 | Updating instance nats: nats/17e03f6a-0fa6-4727-9ad4-09cd0e932c0f (0) (canary) (00:00:49)
Task 42 | 05:50:14 | Updating instance database: database/f4b765b3-6d2e-42b4-8f77-5eceb4cc95e6 (0) (canary) (00:05:04)
Task 42 | 05:55:18 | Updating instance diego-api: diego-api/71483cb8-ad54-481f-a47b-56198b526d48 (0) (canary) (00:00:52)
Task 42 | 05:56:10 | Updating instance singleton-blobstore: singleton-blobstore/380fc426-1869-489d-93b0-4e9418057591 (0) (canary)
Task 42 | 05:56:10 | Updating instance uaa: uaa/1d636b7b-415a-4bf8-bfcf-091a17999e64 (0) (canary)
Task 42 | 05:56:10 | Updating instance api: api/17ea48e1-5083-4a21-b7a4-1536c1bfb066 (0) (canary)
Task 42 | 05:56:10 | Updating instance cc-worker: cc-worker/e040fcbb-c80c-47fc-8e06-3320f39220ff (0) (canary) (00:01:10)
Task 42 | 05:58:58 | Updating instance uaa: uaa/1d636b7b-415a-4bf8-bfcf-091a17999e64 (0) (canary) (00:02:48)
Task 42 | 05:59:03 | Updating instance singleton-blobstore: singleton-blobstore/380fc426-1869-489d-93b0-4e9418057591 (0) (canary) (00:02:53)
Task 42 | 06:18:11 | Updating instance api: api/17ea48e1-5083-4a21-b7a4-1536c1bfb066 (0) (canary) (00:22:01)
                   L Error: 'api/17ea48e1-5083-4a21-b7a4-1536c1bfb066 (0)' is not running after update. Review logs for failed jobs: cloud_controller_ng
Task 42 | 06:18:11 | Error: 'api/17ea48e1-5083-4a21-b7a4-1536c1bfb066 (0)' is not running after update. Review logs for failed jobs: cloud_controller_ng

Task 42 Started  Wed Dec 27 05:38:12 UTC 2017
Task 42 Finished Wed Dec 27 06:18:11 UTC 2017
Task 42 Duration 00:39:59
Task 42 error

Updating deployment:
  Expected task '42' to succeed but state is 'error'

Exit code 1

The monit.log:

api/17ea48e1-5083-4a21-b7a4-1536c1bfb066:/var/vcap/sys/log/cloud_controller_ng# grep cloud_controller_ng  /var/vcap/monit/monit.log
[UTC Dec 27 05:58:04] error    : 'cloud_controller_ng' process is not running
[UTC Dec 27 05:58:04] info     : 'cloud_controller_ng' trying to restart
[UTC Dec 27 05:58:04] info     : 'cloud_controller_ng' start: /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_ng_ctl
[UTC Dec 27 05:58:09] info     : start service 'cloud_controller_ng' on user request
[UTC Dec 27 05:58:34] error    : 'cloud_controller_ng' failed to start
[UTC Dec 27 05:58:34] info     : 'nginx_cc' start: /var/vcap/jobs/cloud_controller_ng/bin/nginx_ctl
[UTC Dec 27 05:58:35] info     : 'cloud_controller_worker_local_1' start: /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_worker_ctl
[UTC Dec 27 05:58:36] info     : 'cloud_controller_worker_local_2' start: /var/vcap/jobs/cloud_controller_ng/bin/cloud_controller_worker_ctl
[UTC Dec 27 05:59:46] info     : 'cloud_controller_ng' start action done
[UTC Dec 27 05:59:46] info     : 'cloud_controller_ng' process is running with pid 8673

The logs of the job cloud_controller_ng:

api/17ea48e1-5083-4a21-b7a4-1536c1bfb066:/var/vcap/sys/log/cloud_controller_ng# cat cloud_controller_ng_ctl.log
[2017-12-27 05:58:04+0000] ------------ STARTING cloud_controller_ng_ctl at Wed Dec 27 05:58:04 UTC 2017 --------------
[2017-12-27 05:58:04+0000] Checking for blobstore availability
[2017-12-27 05:58:40+0000] Blobstore is available
[2017-12-27 05:59:12+0000] Thin web server (v1.7.0 codename Dunder Mifflin)
[2017-12-27 05:59:12+0000] Maximum connections set to 1024
[2017-12-27 05:59:12+0000] Listening on /var/vcap/sys/run/cloud_controller_ng/cloud_controller.sock, CTRL+C to stop


cf-gitbot · 2017-12-26T16:13:20Z

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/153909902

The labels on this github issue will be updated when the story is started.

bingosummer · 2017-12-30T17:09:26Z

monit marked cloud_controller_ng as failed at 05:58:34 UTC on Dec 27, 30 seconds after it launched the start script at 05:58:04. However, the blobstore only became available at 05:58:40, and cloud_controller_ng was finally up at 05:59:12.
monit is not tracking the status of cloud_controller_ng correctly.
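The timing argument can be checked by lining up the timestamps from monit.log and cloud_controller_ng_ctl.log relative to the moment monit launched the start script. This is a small worked sketch; the event labels are paraphrases of the log lines, not literal log text. It shows monit giving up after 30 s, while the blobstore check alone took 36 s and the web server only listened on its socket after 68 s. The 30-second gap appears to match monit's default start timeout, but that is an inference from the logs, not something confirmed in this thread.

```python
from datetime import datetime

# Timestamps copied from monit.log and cloud_controller_ng_ctl.log above (UTC, Dec 27 2017).
FMT = "%H:%M:%S"
events = {
    "monit starts cloud_controller_ng_ctl": "05:58:04",
    "monit reports 'failed to start'": "05:58:34",
    "ctl log: Blobstore is available": "05:58:40",
    "ctl log: Thin listening on socket": "05:59:12",
    "monit: start action done": "05:59:46",
}

start = datetime.strptime(events["monit starts cloud_controller_ng_ctl"], FMT)


def offsets() -> dict[str, int]:
    """Seconds elapsed since monit launched the start script, per event."""
    return {
        name: int((datetime.strptime(ts, FMT) - start).total_seconds())
        for name, ts in events.items()
    }


for name, secs in offsets().items():
    print(f"{secs:>4}s  {name}")
```

The printed offsets are 0, 30, 36, 68 and 102 seconds, so the process became healthy well after monit had already recorded the failure.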

slenky · 2018-01-18T09:36:30Z

Are there any possible solutions? I faced the same issue in a bosh-lite environment.

kukgini · 2018-04-11T07:26:40Z

@slenky `bosh ssh` into the api instance and run:

monit restart cloud_controller_ng

Then running `bosh deploy` again completed the deployment successfully (in my case).
It is just a workaround.
Anyway, I was able to move on to the next version.
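The workaround above can be scripted. This is a sketch that only prints the commands by default; the deployment name `cf`, instance `api/0`, manifest path, and the `/var/vcap/bosh/bin/monit` location are assumptions, and a real cf-deployment deploy normally also passes ops files and vars that are omitted here.

```python
import shlex
import subprocess

# Hypothetical defaults: adjust to your deployment name, instance, and manifest path.
DEPLOYMENT = "cf"
INSTANCE = "api/0"
MANIFEST = "cf-deployment.yml"
# Assumption: monit lives at the standard BOSH stemcell path and needs root.
RESTART = "sudo /var/vcap/bosh/bin/monit restart cloud_controller_ng"


def workaround_commands(deployment: str = DEPLOYMENT, instance: str = INSTANCE,
                        manifest: str = MANIFEST) -> list[list[str]]:
    """Build the two workaround steps: restart the job on the VM, then redeploy."""
    return [
        ["bosh", "-d", deployment, "ssh", instance, "-c", RESTART],
        # -n skips the interactive "Continue?" prompt of bosh deploy.
        ["bosh", "-n", "-d", deployment, "deploy", manifest],
    ]


def run(dry_run: bool = True) -> None:
    for cmd in workaround_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops before redeploying if the restart step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(dry_run=True)
```

`monit restart` may return before cloud_controller_ng is actually up, so it is worth confirming with `monit summary` on the VM that the process shows `running` before redeploying.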

christarazi · 2019-10-29T22:40:53Z

Closing due to inactivity. Please reopen if this is still an issue.

@christarazi && @cwlbraa

cf-gitbot added the unscheduled label Dec 26, 2017

bingosummer mentioned this issue Dec 30, 2017

The job cloud_controller_ng failed when updating instance api cloudfoundry/capi-release#71

Closed

christarazi closed this as completed Oct 29, 2019

cf-gitbot added delivered accepted and removed unscheduled delivered labels Oct 29, 2019
