Documentation and tools for creating a Cromwell installation on GCP compatible with DNAstack's Workbench.

DNAstack/cromwell-on-gcp-workbench-engine-installer


Cromwell on GCP

Overview

The Cromwell-on-GCP installer uses Terraform to create an installation of Cromwell using Google Compute Engine, Google Cloud Run, Google Cloud SQL, and the Google Pipelines API, in either a single project or multi-project architecture.

A Cloud Run service is used as the ingress for all requests to Cromwell, and calling this service requires a GCP identity token of a principal (user or service account) that has been granted permission to call that service.

Resource Layout

The following represents the layout of resources. The Billing and Compute GCP projects can be pre-existing projects not generated by this script.

  • Deployment GCP Project
    • Compute VM (with running Cromwell deployment)
    • Service Account for Cromwell
    • Cloud SQL Instance (MySQL)
    • GCS bucket (for Cromwell output)
    • Cloud Run service (with Nginx reverse proxy to Cromwell)
    • VPC and Cloud NAT (networks for Cromwell, Database, etc.)
  • Billing GCP Project (for GCS bucket billing)
  • Compute GCP Project
    • VPC and Cloud NAT (used by pipeline tasks)
    • Service Account for pipeline tasks

Enabled APIs

This script enables all APIs required for running and configuring Cromwell, including Compute, Networking, and Lifesciences Pipelines. When a pre-existing compute project is used, required APIs will be enabled in this project but they will not be disabled if you run terraform destroy.

Prerequisites

To run this script, you must have the following prepared:

  1. Pick an existing GCP Billing Account ID

    The given ID will be set as the billing account for the generated GCP project. To find or create billing accounts go to the billing account page in the Google Cloud Console.

  2. Pick a new GCP project ID and name

    These are the ID and name for the generated project. You can set them to whatever valid values you wish, but the project ID must be globally unique.

  3. Install the gcloud and terraform command-line tools.

  4. (Optional) Pick an existing GCP Project Folder ID

    When provided, the GCP project containing the Cromwell deployment will be generated in the given folder ID. To find or create a folder, go to the cloud resource manager in the Google Cloud Console.

  5. (Optional) Install jq. It is used in the documented commands for testing your installation and is not required for installing the engine.

Apply Configuration with Single Project

  1. Set the gcp directory that contains this README as your working directory:

    cd gcp
  2. Authorize your application-default credentials:

    gcloud auth application-default login
  3. Create a file with variable assignments for your installation, cromwell.tfvars, replacing $FOLDER_ID, $BILLING_ACCOUNT, $PROJECT_ID, and $PROJECT_NAME with literal values:

    deployment_project_id              = "$PROJECT_ID"
    deployment_project_name            = "$PROJECT_NAME"
    deployment_project_billing_account = "$BILLING_ACCOUNT"
    # Delete the line below to create a project without a project folder
    deployment_project_folder_id       = "$FOLDER_ID"
  4. Apply the configuration with your variable assignments:

    terraform apply -var-file=cromwell.tfvars

    Terraform will print out a plan and ask you to type yes before starting. If you are running this for the first time, the plan should only add resources (no changes or removals). Make sure the plan only adds resources before accepting!
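
If you prefer to generate cromwell.tfvars from a script rather than by hand, the following is a minimal sketch. The function name write_tfvars and its argument order are hypothetical and not part of this installer; only the variable names come from the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the installer): write cromwell.tfvars.
# Usage: write_tfvars PROJECT_ID PROJECT_NAME BILLING_ACCOUNT [FOLDER_ID] [OUTPUT_FILE]
write_tfvars() {
  local project_id="$1" project_name="$2" billing_account="$3"
  local folder_id="${4:-}" out="${5:-cromwell.tfvars}"

  {
    printf 'deployment_project_id              = "%s"\n' "$project_id"
    printf 'deployment_project_name            = "%s"\n' "$project_name"
    printf 'deployment_project_billing_account = "%s"\n' "$billing_account"
    # The folder ID is optional: leave it out to create a project without a folder.
    if [ -n "$folder_id" ]; then
      printf 'deployment_project_folder_id       = "%s"\n' "$folder_id"
    fi
  } > "$out"
}
```

Calling write_tfvars with three arguments writes a file without the folder line; passing a fourth argument adds deployment_project_folder_id. The values are written verbatim, so they should not contain double quotes.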

Apply Configuration with Multiple Projects

Follow the steps to set up with a single project, but in addition to the variables mentioned above you must also add these variables to your cromwell.tfvars file:

compute_project_id = "$COMPUTE_PROJECT_ID"
billing_project_id = "$BILLING_PROJECT_ID"

When either of these variables is unassigned, it defaults to the deployment_project_id project, which is always generated as part of applying the configuration.

Apply Cromwell Configurations

Add the following variable to your cromwell.tfvars file, replacing $CROMWELL_VERSION with a literal value:

cromwell_version = "$CROMWELL_VERSION" // defaults to 85

Destroying Configuration

  1. Before you can destroy resources, you must update stateful resources so that they can be destroyed:

    terraform apply -var-file=cromwell.tfvars -var=allow_deletion=true
  2. Once the above step succeeds, you can destroy the configured resources. Important: this deletes the entire deployment project. It does not delete the compute or billing projects, and will not disable any APIs that were enabled in those projects. Run:

    terraform destroy -var-file=cromwell.tfvars -var=allow_deletion=true
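
The two steps can be chained so that the destroy only runs if the preparatory apply succeeds. This is a sketch only; the wrapper name destroy_cromwell is hypothetical, and Terraform will still prompt for confirmation on each command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two documented steps.
# Usage: destroy_cromwell [VAR_FILE]   (VAR_FILE defaults to cromwell.tfvars)
destroy_cromwell() {
  local var_file="${1:-cromwell.tfvars}"
  # Step 1: update stateful resources so they can be destroyed. Stop on failure.
  terraform apply -var-file="$var_file" -var=allow_deletion=true || return 1
  # Step 2: destroy the configured resources (this deletes the deployment project).
  terraform destroy -var-file="$var_file" -var=allow_deletion=true
}
```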

Deploying to a different location

The Cloud Life Sciences API supports a number of different locations for submitting jobs. The location does not correspond to the actual zone used for executing a task (that can be controlled via the zone attribute in the WDL runtime, or by setting the zone variable in your cromwell.tfvars); instead, it is where the job metadata is stored.

The deployment location is controlled via the region variable (default value of us-central1). If you would like to use one of the alternative regions supported by the Cloud Life Sciences API, simply set the region and zone variables in your cromwell.tfvars file to a supported value.

Currently supported values for region are:

  • us-central1
  • us-west2
  • northamerica-northeast1
  • europe-west2
  • europe-west4
  • asia-southeast1
  • asia-southeast2

To find the available zones for a given region, run:

gcloud compute zones list | grep <region>

For example, in your cromwell.tfvars file:

region = "northamerica-northeast1"
zone   = "northamerica-northeast1-c"
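
Before applying, a quick local check can catch a mismatched region and zone. The sketch below uses the region list from this section and assumes the usual GCP naming convention in which a zone is its region followed by a dash and one letter. The function name check_location is hypothetical, and the check does not confirm that the zone actually exists; use the gcloud command above for that.

```shell
#!/usr/bin/env bash
# Regions listed in this README as supported; update this if the list changes.
SUPPORTED_REGIONS="us-central1 us-west2 northamerica-northeast1 europe-west2 europe-west4 asia-southeast1 asia-southeast2"

# Hypothetical helper: returns 0 if REGION is in the list and ZONE looks like "<REGION>-<letter>".
# Usage: check_location REGION ZONE
check_location() {
  local region="$1" zone="$2" r
  for r in $SUPPORTED_REGIONS; do
    if [ "$r" = "$region" ]; then
      case "$zone" in
        "$region"-?) return 0 ;;
        *) echo "zone $zone is not in region $region" >&2; return 1 ;;
      esac
    fi
  done
  echo "region $region is not in the supported list" >&2
  return 1
}
```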

Importing an existing Project

Applying the default configuration of the engine installer will create a new project to deploy the resources to. This may not always be desirable, especially where billing user and project creation permissions are an issue. To work around this, you can import an existing project and deploy the resources to it.

  1. Add the ID of the project that you want to import to the cromwell.tfvars file:
    deployment_project_id = "$PROJECT_ID"
    # This should be the only value in your tfvars
  2. Run the terraform command to import the GCP project and start managing it with Terraform. This will sync the remote project to your local Terraform state and allow you to reference the values defined in it.
    terraform import -var-file=cromwell.tfvars google_project.project $PROJECT_ID
  3. Extract the required variables from terraform.tfstate:
    • deployment_project_name
      1. In the terraform.tfstate file, find the resource with "type": "google_project"
      2. In instances[0].attributes, find the name attribute and copy the value
      3. Set your deployment_project_name to the extracted value in your cromwell.tfvars
    • deployment_project_billing_account
      1. In the terraform.tfstate file, find the resource with "type": "google_project"
      2. In instances[0].attributes, find the billing_account attribute and copy the value
      3. Set your deployment_project_billing_account to the extracted value in your cromwell.tfvars
  4. Once you have updated your cromwell.tfvars, apply the configuration:
    terraform apply -var-file=cromwell.tfvars
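
If jq is installed, the manual lookup in step 3 can be sketched as a small helper. This assumes the state is stored in a local terraform.tfstate file and that it contains a single google_project resource. The function name project_attr is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one attribute of the imported google_project resource.
# Usage: project_attr ATTRIBUTE [STATE_FILE]   (STATE_FILE defaults to terraform.tfstate)
project_attr() {
  local attr="$1" state="${2:-terraform.tfstate}"
  # Select the google_project resource and read the attribute from its first instance.
  jq -r --arg attr "$attr" \
    '.resources[] | select(.type == "google_project") | .instances[0].attributes[$attr]' \
    "$state"
}
```

For example, project_attr name prints the value to use for deployment_project_name, and project_attr billing_account prints the value for deployment_project_billing_account.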

Using the Cromwell Installation

Getting Deployment Information

Run terraform output to show information on the installation. This output will include:

  • The URL, name, and location of the Cloud Run service used as an ingress for your Cromwell installation
  • The email of a service account that can be used by external services to send requests to Cromwell

Sending Requests to Cromwell with Developer Credentials

To send a request to the ingress service with your own identity token, run the following:

curl --request GET \
  --url "$(terraform output -json service | jq -r '.urls[0]')/api/ga4gh/wes/v1/service-info" \
  --header "Authorization: Bearer $(gcloud auth print-identity-token)"

If you receive a 403 response, you may need to grant yourself permission to call the Cloud Run service:

gcloud run services add-iam-policy-binding "$(terraform output -json service | jq -r '.name')" \
  --member="user:$(gcloud config get account)" \
  --role='roles/run.invoker' \
  --region="$(terraform output -json service | jq -r '.location')" \
  --project="$(terraform output -json service | jq -r '.project')"
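
To tell a 403 apart from other failures, it can help to print only the HTTP status code. The following sketch wraps the same request. The function name wes_status is hypothetical, and it takes the base URL and token as arguments rather than looking them up itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the HTTP status code of the WES service-info endpoint.
# Usage: wes_status BASE_URL IDENTITY_TOKEN
wes_status() {
  # ${1%/} strips one trailing slash so the path is not doubled up.
  curl --silent --output /dev/null --write-out '%{http_code}' \
    --request GET \
    --url "${1%/}/api/ga4gh/wes/v1/service-info" \
    --header "Authorization: Bearer $2"
}
```

A typical call would be wes_status "$(terraform output -json service | jq -r '.urls[0]')" "$(gcloud auth print-identity-token)".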

Authenticating with a Service Account

To allow an external service to access Cromwell, you must use a service account with the appropriate permissions, and a service account key to authenticate. One such service account is generated for you by this Terraform module.

To get the service account email:

terraform output -raw generated_service_account_email

To get the service account JSON:

terraform output -raw generated_service_account_private_key | base64 --decode

You can generate additional keys for this service account using gcloud or the Google Cloud Console.

If you need to create additional service accounts, the only required setup is granting them the roles/run.invoker role on the Cloud Run service.
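
Granting that role to an additional service account can be sketched by reusing the binding command shown earlier with a serviceAccount member. The function name grant_invoker is hypothetical. It assumes jq is installed and that it is run from the directory holding your Terraform state.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grant roles/run.invoker on the ingress service to a service account.
# Usage: grant_invoker SERVICE_ACCOUNT_EMAIL
grant_invoker() {
  local email="$1" service
  # Read the service details once from the Terraform outputs.
  service="$(terraform output -json service)" || return 1
  gcloud run services add-iam-policy-binding "$(printf '%s' "$service" | jq -r '.name')" \
    --member="serviceAccount:${email}" \
    --role='roles/run.invoker' \
    --region="$(printf '%s' "$service" | jq -r '.location')" \
    --project="$(printf '%s' "$service" | jq -r '.project')"
}
```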
