# Setup Cromwell GVS Input

Starting a job on a Cromwell server requires a source WDL file and a configured set of inputs. This notebook builds the inputs file and submits the job.

In [None]:
import json
import os
from ipywidgets import widgets

## Setup variables

In [None]:
# This notebook will run the GVS workflow on the first NUM_OF_INPUTS samples in the INPUT_SOURCE location
NUM_OF_INPUTS = 300  # CHANGE THIS NUMBER!
CALLSET_IDENTIFIER = '300_samples_batch_id'  # CHANGE THIS NAME! 

# CHANGE THIS SOURCE LOCATION OF INPUT FILES
INPUT_SOURCE = 'gs://{EXAMPLE_BUCKET}/PATH/TO/SAMPLES'

# CHANGE THIS TO NAME YOUR BQ DATASET
GVS_BQ_DATASET = 'gvs_300'

MAIN_WORKFLOW = "GvsJointVariantCalling"
WDL_FILE = f"{MAIN_WORKFLOW}.wdl"

GOOGLE_CLOUD_PROJECT = os.getenv('GOOGLE_CLOUD_PROJECT')

The cell below creates the `~/terra-tutorials/cromwell` directory if it doesn't already exist. The directory holds files such as the Cromwell server log, which another notebook may have created.

In [None]:
CROMWELL_EXAMPLES_DIR=os.path.expanduser('~/terra-tutorials/cromwell')
CROMWELL_SERVER_LOG=f'{CROMWELL_EXAMPLES_DIR}/cromwell.server.log'

!mkdir -p {CROMWELL_EXAMPLES_DIR}

In [None]:
# Copy the "main" workflow WDL into the working directory so it can be submitted
!cp gvs_wdls/GvsJointVariantCalling.wdl .

In [None]:
!terra resource create bq-dataset --name={GVS_BQ_DATASET}

## Build JSON input file

In [None]:
# The gsutil ls call lists every object under INPUT_SOURCE, including both the
# .vcf.gz files and their .vcf.gz.tbi indexes. Keep only the .vcf.gz files.
input_source_list = !gsutil ls "{INPUT_SOURCE}/**"
input_source_list = [input_source for input_source in input_source_list if input_source.endswith('.vcf.gz')]

In [None]:
input_vcfs = []
input_vcf_indexes = []
sample_names = []

for vcf_path in input_source_list[:NUM_OF_INPUTS]:
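    # The sample name is taken from the parent "directory" of the VCF,
    # i.e. paths are expected to look like .../<sample_name>/<file>.vcf.gz
    sample_name = vcf_path.split('/')[-2]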
    input_vcfs.append(vcf_path)
    input_vcf_indexes.append(f'{vcf_path}.tbi')
    sample_names.append(sample_name)
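
Because the sample name comes from the second-to-last path component, two VCFs in the same directory would produce the same name. The sketch below is one way to spot that before submitting. The helper names and the example paths are hypothetical and not part of the GVS tooling; it only assumes the `.../<sample_name>/<file>.vcf.gz` layout used above.

```python
from collections import Counter


def sample_name_from_path(vcf_path):
    """Return the parent "directory" of a VCF path, used here as the sample name."""
    return vcf_path.split('/')[-2]


def find_duplicate_sample_names(vcf_paths):
    """Return a sorted list of sample names that occur more than once."""
    counts = Counter(sample_name_from_path(path) for path in vcf_paths)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical paths, for illustration only.
example_paths = [
    'gs://example-bucket/samples/NA0001/NA0001.vcf.gz',
    'gs://example-bucket/samples/NA0002/NA0002.vcf.gz',
    'gs://example-bucket/samples/NA0002/NA0002.rerun.vcf.gz',
]

print(find_duplicate_sample_names(example_paths))
```

In the notebook, the same check could be run as `find_duplicate_sample_names(input_source_list[:NUM_OF_INPUTS])`; an empty list means every selected VCF maps to a distinct sample name.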

In [None]:
input_dict = {
    'GvsJointVariantCalling.input_vcfs': input_vcfs,
    'GvsJointVariantCalling.call_set_identifier': CALLSET_IDENTIFIER,
    'GvsJointVariantCalling.external_sample_names': sample_names,
    'GvsJointVariantCalling.dataset_name': GVS_BQ_DATASET,
    'GvsJointVariantCalling.input_vcf_indexes': input_vcf_indexes,
    'GvsJointVariantCalling.project_id': GOOGLE_CLOUD_PROJECT
}

with open('gvs.inputs', 'w') as outfile:
    json.dump(input_dict, outfile, indent=4)

!head gvs.inputs
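
Before submitting, it can help to confirm that the three per-sample lists line up. The following is a minimal sketch of such a check, not part of the GVS workflow itself; the function name `check_gvs_inputs` is hypothetical, and the only assumptions are the input keys written above and the `<vcf>.tbi` index naming used when the lists were built.

```python
def check_gvs_inputs(inputs, prefix='GvsJointVariantCalling'):
    """Return a list of human-readable problems found in the inputs dictionary."""
    vcfs = inputs[f'{prefix}.input_vcfs']
    indexes = inputs[f'{prefix}.input_vcf_indexes']
    names = inputs[f'{prefix}.external_sample_names']

    problems = []
    if not vcfs:
        problems.append('no input VCFs were found')
    if not (len(vcfs) == len(indexes) == len(names)):
        problems.append(
            f'list lengths differ: {len(vcfs)} VCFs, '
            f'{len(indexes)} indexes, {len(names)} sample names'
        )
    # Each index is expected to sit next to its VCF with a .tbi suffix.
    for vcf, index in zip(vcfs, indexes):
        if index != f'{vcf}.tbi':
            problems.append(f'index {index} does not match VCF {vcf}')
    return problems


# Hypothetical inputs, for illustration only.
example_inputs = {
    'GvsJointVariantCalling.input_vcfs': ['gs://example-bucket/s1/s1.vcf.gz'],
    'GvsJointVariantCalling.input_vcf_indexes': ['gs://example-bucket/s1/s1.vcf.gz.tbi'],
    'GvsJointVariantCalling.external_sample_names': ['s1'],
}
print(check_gvs_inputs(example_inputs))
```

Calling `check_gvs_inputs(input_dict)` in the notebook should return an empty list when the inputs are consistent. Note that this does not verify that the index files actually exist in the bucket.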

## Build empty options file

In [None]:
with open('gvs_options.json', 'w') as outfile:
    json.dump({}, outfile, indent=4)

## Submit job to server

#### Submitting jobs with Cromshell

[Cromshell](https://github.com/broadinstitute/cromshell) is a script for submitting workflows to a Cromwell server and monitoring / querying their results. Cromshell is preinstalled on Terra cloud environments.

##### Configure the Cromshell host and port

Prior to use, Cromshell needs to know what host and port the Cromwell server is running on.

Run the cell below to write the Cromshell server configuration file.

In [None]:
%%bash

mkdir -p ~/.cromshell

echo 'localhost:8000' > ~/.cromshell/cromwell_server.config

In [None]:
!cromshell submit GvsJointVariantCalling.wdl gvs.inputs gvs_options.json gvs_wdls.zip

The commented-out cells below show how to make the same Cromwell submission with Python and with curl. They are included only as examples.

In [None]:
# import requests

# url = "http://localhost:8000/api/workflows/v1"

# files = {
#     'workflowSource': ('file', open(WDL_FILE, 'rb')),
#     'workflowDependencies': ('file', open('gvs_wdls.zip', 'rb')),
#     'workflowInputs': ('file', open('gvs.inputs', 'rb'))
# }

# headers = {
#     'Accept': 'application/json'
# }

# response = requests.post(url, headers=headers, files=files)
# response.content

In [None]:
# %%bash -s {WDL_FILE}

# WDL_FILE="$1"
# curl -X POST --header "Accept: application/json" \
#     -v "localhost:8000/api/workflows/v1" \
#     -F workflowSource=@"${WDL_FILE}" \
#     -F workflowDependencies=@gvs_wdls.zip \
#     -F workflowInputs=@gvs.inputs

### Check status of job

In [None]:
!cromshell status

In [None]:
!tail -n 5 {CROMWELL_SERVER_LOG}
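
As an alternative to rerunning `cromshell status` by hand, the status can be polled from Python. This is a rough sketch, assuming the Cromwell server is reachable at `http://localhost:8000` as configured above and that the workflow ID returned at submission is at hand. The function names, the polling interval, and the poll limit are illustrative choices, not Cromshell or Cromwell defaults.

```python
import json
import time
import urllib.request

# Workflow statuses after which Cromwell does no further work.
TERMINAL_STATUSES = {'Succeeded', 'Failed', 'Aborted'}


def fetch_status(workflow_id, base_url='http://localhost:8000'):
    """Ask the Cromwell REST API for the current status of one workflow."""
    url = f'{base_url}/api/workflows/v1/{workflow_id}/status'
    with urllib.request.urlopen(url) as response:
        return json.load(response)['status']


def wait_for_workflow(workflow_id, fetch=fetch_status, poll_seconds=60, max_polls=1000):
    """Poll until the workflow reaches a terminal status or max_polls is used up.

    Returns the last status seen, or None if no poll was made.
    """
    status = None
    for _ in range(max_polls):
        status = fetch(workflow_id)
        if status in TERMINAL_STATUSES:
            break
        time.sleep(poll_seconds)
    return status
```

A call such as `wait_for_workflow('<workflow-id>')` blocks until the job finishes. The `fetch` parameter exists so the loop can be exercised without a running server, for example by passing a function that returns canned statuses.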

## Observe Cromwell output

In [None]:
!cromshell list-outputs > gvs_output_list.txt

In [None]:
!grep CreateManifest/manifest.txt gvs_output_list.txt