This repository has been archived by the owner on Apr 3, 2020. It is now read-only.

How to access BOSH Director on GCP when concourse-up deploy fails deploying Concourse? #83

Closed
gerhard opened this issue Jan 21, 2019 · 10 comments

@gerhard

gerhard commented Jan 21, 2019

concourse-up deploy ci --iaas GCP --region europe-west1 fails with:

Task 10 | 16:48:55 | Preparing deployment: Preparing deployment (00:00:01)
Task 10 | 16:48:56 | Preparing deployment: Rendering templates (00:00:01)
Task 10 | 16:48:58 | Preparing package compilation: Finding packages to compile (00:00:01)
Task 10 | 16:48:59 | Creating missing vms: web/27568ac3-125a-4ace-8d19-a8e37b60e55c (0)
Task 10 | 16:48:59 | Creating missing vms: worker/cf010b93-25f5-4c61-b7f2-c865a71c36ac (0)
Task 10 | 16:48:59 | Creating missing vms: worker/8f7677d0-a97a-44b3-bb12-82ddf127680d (1)
Task 10 | 16:49:52 | Creating missing vms: web/27568ac3-125a-4ace-8d19-a8e37b60e55c (0) (00:00:53)
Task 10 | 16:50:16 | Creating missing vms: worker/cf010b93-25f5-4c61-b7f2-c865a71c36ac (0) (00:01:17)
Task 10 | 16:50:16 | Creating missing vms: worker/8f7677d0-a97a-44b3-bb12-82ddf127680d (1) (00:01:17)
Task 10 | 16:50:17 | Updating instance web: web/27568ac3-125a-4ace-8d19-a8e37b60e55c (0) (canary)
Task 10 | 16:50:17 | Updating instance worker: worker/cf010b93-25f5-4c61-b7f2-c865a71c36ac (0) (canary) (00:01:02)
Task 10 | 16:51:19 | Updating instance worker: worker/8f7677d0-a97a-44b3-bb12-82ddf127680d (1) (00:00:43)
Task 10 | 17:02:13 | Updating instance web: web/27568ac3-125a-4ace-8d19-a8e37b60e55c (0) (canary) (00:11:56)
                   L Error: 'web/27568ac3-125a-4ace-8d19-a8e37b60e55c (0)' is not running after update. Review logs for failed jobs: atc, grafana
Task 10 | 17:02:13 | Error: 'web/27568ac3-125a-4ace-8d19-a8e37b60e55c (0)' is not running after update. Review logs for failed jobs: atc, grafana

How can I access the BOSH Director to see what the failure is?

@gerhard gerhard changed the title How to access BOSH Director on GCP when concourse-up deploy fails deploying Concourse How to access BOSH Director on GCP when concourse-up deploy fails deploying Concourse? Jan 21, 2019
@crsimmons
Contributor

@gerhard You should be able to export the necessary env vars using `eval "$(concourse-up info ci --iaas GCP --region europe-west1 --env)"`.

Failing that, the director credentials should be in one of the files in your concourse-up-ci-europe-west1-config GCS bucket.
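
One caveat with the `eval` form above: if `concourse-up info` exits non-zero, the command substitution is empty and `eval` of an empty string succeeds, so the shell ends up with no BOSH variables set and no obvious error. A minimal sketch of a wrapper that surfaces the failure is below. The function name `load_director_env` is made up for illustration; only the `concourse-up info ... --env` invocation comes from this thread.

```shell
# Hypothetical helper: load the director environment for a concourse-up
# deployment into the current shell, failing loudly if `info` fails.
# Usage: load_director_env <deployment> <iaas> <region>
load_director_env() {
  # Capture the output first so a non-zero exit status is not swallowed by eval.
  _cup_env="$(concourse-up info "$1" --iaas "$2" --region "$3" --env)" || {
    echo "concourse-up info failed; director environment not loaded" >&2
    return 1
  }
  eval "$_cup_env"
}
```

After `load_director_env ci GCP europe-west1` succeeds, standard BOSH CLI commands such as `bosh deployments` and `bosh -d <deployment> ssh web` should work against the director, where `<deployment>` is whatever name `bosh deployments` reports.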

@gerhard
Author

gerhard commented Jan 21, 2019

concourse-up info ci --iaas GCP --region europe-west1 --env is failing with exit status 1 on:

instances, err := boshClient.Instances()
if err != nil {
	return nil, err
}

config.json in GCS looks like it's missing values, such as director_{ca_cert,cert,key}:

{
  "allow_ips": "\"0.0.0.0/0\"",
  "availability_zone": "europe-west1-b",
  "concourse_ca_cert": "",
  "concourse_cert": "",
  "concourse_db_name": "concourse_atc",
  "concourse_key": "",
  "concourse_password": "",
  "concourse_username": "",
  "concourse_user_provided_cert": false,
  "concourse_web_size": "small",
  "concourse_worker_count": 2,
  "concourse_worker_size": "xlarge",
  "config_bucket": "concourse-up-37x-europe-west1-config",
  "credhub_admin_client_secret": "",
  "credhub_ca_cert": "",
  "credhub_password": "",
  "credhub_url": "",
  "credhub_username": "",
  "deployment": "concourse-up-37x",
  "director_ca_cert": "",
  "director_cert": "",
  "director_hm_user_password": "***",
  "director_key": "",
  "director_mbus_password": "***",
  "director_nats_password": "***",
  "director_password": "***",
  "director_public_ip": "",
  "director_registry_password": "***",
  "director_username": "admin",
  "domain": "37x.concourse.rabbitmq.com",
  "encryption_key": "***",
  "github_auth_is_set": true,
  "github_client_id": "***",
  "github_client_secret": "***",
  "grafana_password": "",
  "grafana_username": "",
  "hosted_zone_id": "concourse",
  "hosted_zone_record_prefix": "37x",
  "iaas": "GCP",
  "multi_az_rds": false,
  "namespace": "europe-west1",
  "private_key": "***",
  "project": "37x",
  "public_key": "***",
  "rds_default_database_name": "bosh-ccesxzzt",
  "rds_instance_class": "db-g1-small",
  "rds_password": "***",
  "rds_username": "***",
  "region": "europe-west1",
  "source_access_ip": "82.39.214.211",
  "spot": true,
  "tags": null,
  "tf_state_path": "terraform.tfstate",
  "version": "",
  "worker_type": "m4"
}
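
To spot which values are missing without reading the whole file, a small filter can list the keys whose value is an empty string. This is only a sketch: `empty_config_keys` is a hypothetical name, and it assumes the file is pretty-printed with one key per line, as in the dump above.

```shell
# Hypothetical helper: print the keys of a pretty-printed JSON file whose
# value is the empty string. Assumes one `"key": value` pair per line.
# Usage: empty_config_keys <path-to-config.json>
empty_config_keys() {
  # Match lines of the form   "some_key": "",   (trailing comma optional)
  # and print only the key name.
  sed -n 's/^[[:space:]]*"\([^"]*\)":[[:space:]]*"",\{0,1\}[[:space:]]*$/\1/p' "$1"
}
```

For example, after copying the file locally with `gsutil cp gs://<config-bucket>/config.json .`, running `empty_config_keys config.json` on the file above would print `concourse_ca_cert`, `director_ca_cert`, `director_cert`, `director_key`, `director_public_ip` and the other blank entries, one per line.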

@crsimmons
Contributor

There should be files called director-creds.yml and director-state.json in your bucket that are updated after create-env.

It looks like the config was not being saved after the first successful create-env, which is why yours was mostly empty. We've made a change to save it to the bucket even if the initial Concourse deployment fails. You can try building from commit e56b71a.

@gerhard
Author

gerhard commented Jan 22, 2019

Trying now, thank you!

@gerhard
Author

gerhard commented Jan 22, 2019

Now that I've managed to bosh ssh into the concourse/web instance, I can see why atc & grafana jobs are failing:

tail /var/vcap/sys/log/{atc,grafana}/*.log
==> /var/vcap/sys/log/atc/atc.stderr.log <==
tls: failed to find any PEM data in certificate input
tls: failed to find any PEM data in certificate input
tls: failed to find any PEM data in certificate input
tls: failed to find any PEM data in certificate input
tls: failed to find any PEM data in certificate input
tls: failed to find any PEM data in certificate input
tls: failed to find any PEM data in certificate input
tls: failed to find any PEM data in certificate input
tls: failed to find any PEM data in certificate input
tls: failed to find any PEM data in certificate input

==> /var/vcap/sys/log/atc/atc.stdout.log <==

==> /var/vcap/sys/log/grafana/grafana.log <==
t=2019-01-22T14:10:57+0000 lvl=info msg="Initializing HTTP Server" logger=http.server address=0.0.0.0:3000 protocol=https subUrl= socket=
t=2019-01-22T14:10:57+0000 lvl=eror msg="Fail to start server" logger=server error="tls: failed to find any PEM data in certificate input"
t=2019-01-22T14:10:57+0000 lvl=info msg="Shutdown started" logger=server code=1 reason="Startup failed"
t=2019-01-22T14:10:57+0000 lvl=info msg="stopped http server" logger=http.server
t=2019-01-22T14:10:57+0000 lvl=info msg="Initializing Alerting" logger=alerting.engine
t=2019-01-22T14:10:57+0000 lvl=info msg="Initializing CleanUpService" logger=cleanup
t=2019-01-22T14:10:57+0000 lvl=info msg="Stopped CleanUpService" logger=cleanup reason="context canceled"
t=2019-01-22T14:10:57+0000 lvl=info msg="Stopped Stream Manager"
t=2019-01-22T14:10:57+0000 lvl=info msg="Stopped Alerting" logger=alerting.engine reason="context canceled"
t=2019-01-22T14:10:57+0000 lvl=info msg="Shutdown completed" logger=server reason="context canceled"

==> /var/vcap/sys/log/grafana/stderr.log <==

==> /var/vcap/sys/log/grafana/stdout.log <==
t=2019-01-22T14:10:57+0000 lvl=info msg="Initializing HTTP Server" logger=http.server address=0.0.0.0:3000 protocol=https subUrl= socket=
t=2019-01-22T14:10:57+0000 lvl=eror msg="Fail to start server" logger=server error="tls: failed to find any PEM data in certificate input"
t=2019-01-22T14:10:57+0000 lvl=info msg="Shutdown started" logger=server code=1 reason="Startup failed"
t=2019-01-22T14:10:57+0000 lvl=info msg="stopped http server" logger=http.server
t=2019-01-22T14:10:57+0000 lvl=info msg="Initializing Alerting" logger=alerting.engine
t=2019-01-22T14:10:57+0000 lvl=info msg="Initializing CleanUpService" logger=cleanup
t=2019-01-22T14:10:57+0000 lvl=info msg="Stopped CleanUpService" logger=cleanup reason="context canceled"
t=2019-01-22T14:10:57+0000 lvl=info msg="Stopped Stream Manager"
t=2019-01-22T14:10:57+0000 lvl=info msg="Stopped Alerting" logger=alerting.engine reason="context canceled"
t=2019-01-22T14:10:57+0000 lvl=info msg="Shutdown completed" logger=server reason="context canceled"

OK, got it:

cat /var/vcap/jobs/atc/config/tls_cert
../letsencrypt/live/concourse.rabbitmq.com/fullchain.pem

So --tls-cert & --tls-key require the actual value, not paths to files... Attempting a fix.

@crsimmons
Contributor

This has been raised before on the AWS side (#77). We haven't gotten around to looking at it, so PRs would be welcome!

@gerhard
Author

gerhard commented Jan 22, 2019

Closing this in favour of #77

Using values instead of paths made the Concourse deployment on GCP go through. Thank you!
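
For anyone hitting the same error: since `--tls-cert` and `--tls-key` expect PEM contents rather than file paths, the files can be read at the call site. A minimal sketch follows; `is_pem` and `pem_value` are hypothetical helper names, not part of concourse-up.

```shell
# Hypothetical helper: succeed if the argument itself contains a PEM header.
is_pem() {
  case "$1" in
    *"-----BEGIN "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print PEM contents for an argument that is either
# inline PEM data or a path to a file holding PEM data.
pem_value() {
  if is_pem "$1"; then
    printf '%s\n' "$1"
  elif [ -f "$1" ]; then
    cat "$1"
  else
    echo "not PEM data and not a readable file: $1" >&2
    return 1
  fi
}
```

With these defined, a deploy can pass `--tls-cert "$(pem_value ../letsencrypt/live/concourse.rabbitmq.com/fullchain.pem)"`; the matching `--tls-key` argument would point at the private key file in the same way.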

@gerhard gerhard closed this as completed Jan 22, 2019
@gerhard
Author

gerhard commented Jan 23, 2019

Any chance you could cut a new release with all the latest GCP-related fixes, maybe 0.17.1?

I am currently using a dev build; it would be nice to depend on an official release.

@crsimmons
Contributor

0.18.1 that we cut yesterday should have all the GCP-related fixes.

@gerhard
Author

gerhard commented Jan 23, 2019

That's super duper, switching to it now. 💯
