# Setup Cromwell GVS Input

Starting a job on Cromwell requires a source WDL and a configured inputs file. This notebook builds the inputs and submits the job.

- Build the zip file (same as before)
  - Clone from the checked-in repo
- Build the input file
  - Maybe take a number between 1-300 as input, and pick out that number from the bucket
- Submit with cromshell?
- View outputs with cromshell
- View billing info

In [None]:
import json
import os
from ipywidgets import widgets

## Set up variables

In [None]:
NUM_OF_INPUTS = 2  # CHANGE THIS NUMBER!
CALLSET_IDENTIFIER = 'willyn_three_hundred_samples'  # CHANGE THIS NAME! 

# Change this if the source location of input changes
INPUT_SOURCE = 'gs://cloned-ws-files-terra-happy-plum-360/SRA8739/trujila6_ehive_rosaprd_2852'

MAIN_WORKFLOW = "GvsJointVariantCalling"
WDL_FILE = f"{MAIN_WORKFLOW}.wdl"

GOOGLE_CLOUD_PROJECT = os.getenv('GOOGLE_CLOUD_PROJECT')
GVS_BQ_DATASET = 'gvs_testing'

The cell below creates the `~/terra-tutorials/cromwell` directory if it doesn't already exist. The directory holds files such as the Cromwell server log, which another notebook may have created.

In [None]:
CROMWELL_EXAMPLES_DIR=os.path.expanduser('~/terra-tutorials/cromwell')
CROMWELL_SERVER_LOG=f'{CROMWELL_EXAMPLES_DIR}/cromwell.server.log'

!mkdir -p {CROMWELL_EXAMPLES_DIR}

In [None]:
# Copy the "main" WDL into the working directory so it can be submitted
!cp gvs_wdls/{WDL_FILE} .

In [None]:
!terra resource create bq-dataset --name={GVS_BQ_DATASET}

## Build json input file

In [None]:
# List the per-sample directories under INPUT_SOURCE (defined in the variables cell)
input_source_list = !gsutil ls {INPUT_SOURCE}

In [None]:
input_vcfs = []
input_vcf_indexes = []
sample_names = []

for input_source in input_source_list[:NUM_OF_INPUTS]:
    sample_name = input_source.split('/')[-2]
    input_vcfs.append(f'{input_source}{sample_name}.g.vcf.gz')
    input_vcf_indexes.append(f'{input_source}{sample_name}.g.vcf.gz.tbi')
    sample_names.append(sample_name)
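
The loop above assumes that every `gsutil ls` entry is a per-sample directory ending in `/` and that the directory name is the sample name. A minimal sketch of the same logic as a function follows; `build_gvs_inputs` and the example bucket paths are hypothetical and only illustrate the assumed layout.

```python
def build_gvs_inputs(listing, limit):
    """Return (vcfs, indexes, sample_names) for the first `limit` entries."""
    vcfs, indexes, names = [], [], []
    for entry in listing[:limit]:
        # The sample name is the last non-empty path component, so this works
        # whether or not the entry carries a trailing slash.
        sample = entry.rstrip('/').split('/')[-1]
        prefix = entry if entry.endswith('/') else entry + '/'
        vcfs.append(f'{prefix}{sample}.g.vcf.gz')
        indexes.append(f'{prefix}{sample}.g.vcf.gz.tbi')
        names.append(sample)
    return vcfs, indexes, names


# Hypothetical listing, standing in for the output of `gsutil ls`.
example_listing = [
    'gs://example-bucket/samples/SAMPLE_A/',
    'gs://example-bucket/samples/SAMPLE_B/',
]
example_vcfs, example_indexes, example_names = build_gvs_inputs(example_listing, 1)
print(example_vcfs, example_indexes, example_names)
```

Keeping the derivation in one function makes it easy to try on a short, hand-written listing before pointing it at the real bucket.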

In [None]:
# Each workflow input must receive the matching list:
# sample names go to external_sample_names, .tbi paths to input_vcf_indexes.
input_dict = {
    'GvsJointVariantCalling.input_vcfs': input_vcfs,
    'GvsJointVariantCalling.call_set_identifier': CALLSET_IDENTIFIER,
    'GvsJointVariantCalling.external_sample_names': sample_names,
    'GvsJointVariantCalling.dataset_name': GVS_BQ_DATASET,
    'GvsJointVariantCalling.input_vcf_indexes': input_vcf_indexes,
    'GvsJointVariantCalling.project_id': GOOGLE_CLOUD_PROJECT
}

with open('gvs.inputs', 'w') as outfile:
    json.dump(input_dict, outfile, indent=4)
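
Because three parallel lists feed the inputs file, a quick consistency check before submitting can catch a list assigned to the wrong key. The sketch below is illustrative only: `check_gvs_inputs` is a hypothetical helper, and its pairing rules (index path equals VCF path plus `.tbi`, sample name appears in the VCF path) are assumptions that mirror how the lists are built in this notebook.

```python
def check_gvs_inputs(inputs, workflow='GvsJointVariantCalling'):
    """Return a list of problems found in a GVS inputs dict (empty if none)."""
    vcfs = inputs[f'{workflow}.input_vcfs']
    indexes = inputs[f'{workflow}.input_vcf_indexes']
    names = inputs[f'{workflow}.external_sample_names']
    problems = []
    if not (len(vcfs) == len(indexes) == len(names)):
        problems.append('input lists have different lengths')
    for vcf, index, name in zip(vcfs, indexes, names):
        # Assumed convention: the index sits next to the VCF with a .tbi suffix.
        if index != vcf + '.tbi':
            problems.append(f'index {index!r} does not match VCF {vcf!r}')
        # Assumed convention: the sample name is part of the VCF path.
        if name not in vcf:
            problems.append(f'sample name {name!r} not found in {vcf!r}')
    return problems


# Hypothetical inputs used only to exercise the check.
good_inputs = {
    'GvsJointVariantCalling.input_vcfs': ['gs://example-bucket/S1/S1.g.vcf.gz'],
    'GvsJointVariantCalling.input_vcf_indexes': ['gs://example-bucket/S1/S1.g.vcf.gz.tbi'],
    'GvsJointVariantCalling.external_sample_names': ['S1'],
}
print(check_gvs_inputs(good_inputs))
```

An empty list means the inputs look consistent under these assumptions; anything else is worth reading before the job is submitted.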


## Build empty options file

In [None]:
with open('gvs_options.json', 'w') as outfile:
    json.dump({}, outfile, indent=4)

## Submit job to server

In [None]:
!cromshell submit GvsJointVariantCalling.wdl gvs.inputs gvs_options.json gvs_wdls.zip

The following commented-out cells show the same Cromwell submission made with Python (`requests`) and with `curl`. They are included only as examples and are not needed when submitting with `cromshell`.

In [None]:
# import requests

# url = "http://localhost:8000/api/workflows/v1"

# files = {
#     'workflowSource': ('file', open(WDL_FILE, 'rb')),
#     'workflowDependencies': ('file', open('gvs_wdls.zip', 'rb')),
#     'workflowInputs': ('file', open('gvs.inputs', 'rb'))
# }

# headers = {
#     'Accept': 'application/json'
# }

# response = requests.post(url, headers=headers, files=files)
# response.content

In [None]:
# %%bash -s {WDL_FILE}

# WDL_FILE="$1"
# curl -X POST --header "Accept: application/json"\
#     -v "localhost:8000/api/workflows/v1" \
#     -F workflowSource=@"${WDL_FILE}" \
#     -F workflowDependencies=@gvs_wdls.zip \
#     -F workflowInputs=@gvs.inputs
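
When submitting through the REST endpoint directly, the workflow ID needed by the later cells has to be read from the response. Cromwell answers a successful submission with a small JSON object containing `id` and `status`; the sketch below parses such a body. `parse_submission` is a hypothetical helper, and the example body is hand-written around the workflow ID used later in this notebook rather than captured from a server.

```python
import json


def parse_submission(body):
    """Return (workflow_id, status) from a Cromwell submission response body."""
    payload = json.loads(body)
    return payload['id'], payload['status']


# Hand-written example body, not real server output.
example_body = '{"id": "0a38709c-2bad-4099-aafa-a0180d2bf896", "status": "Submitted"}'
workflow_id, submit_status = parse_submission(example_body)
print(workflow_id, submit_status)
```

With the `requests` example above, `response.content` could be passed to this function as-is, since `json.loads` accepts bytes as well as strings.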

### Check status of job

In [None]:
!cromshell status

In [None]:
!tail -n 5 {CROMWELL_SERVER_LOG}
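
Rather than re-running the status cell by hand, the check can be wrapped in a polling loop. The sketch below is one possible shape, not part of cromshell: `wait_for_workflow` is a hypothetical helper that takes any zero-argument function returning the current status string, and it treats `Succeeded`, `Failed`, and `Aborted` as terminal. The fake status source exists only so the example runs without a server.

```python
import time

# Cromwell workflow statuses after which no further change is expected.
TERMINAL_STATUSES = {'Succeeded', 'Failed', 'Aborted'}


def wait_for_workflow(get_status, poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Call get_status() until it returns a terminal status, then return it."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f'workflow still not finished after {max_polls} polls')


# Stand-in for a real status lookup so the sketch runs without a server.
fake_statuses = iter(['Submitted', 'Running', 'Running', 'Succeeded'])
final_status = wait_for_workflow(lambda: next(fake_statuses), sleep=lambda _: None)
print(final_status)
```

In practice `get_status` would wrap whatever status lookup is available, for example a request to the server's workflow status endpoint; passing the lookup in as a function keeps the loop independent of that choice.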

## Observe Cromwell output

In [None]:
!cromshell list-outputs 0a38709c-2bad-4099-aafa-a0180d2bf896 > gvs_output_list.txt

In [None]:
!grep CreateManifest/manifest.txt gvs_output_list.txt

In [None]:
!gsutil cat gs://cloned-ws-files-autodelete-after-two-weeks-terra-lambent-hazeln/workflows/cromwell-executions/GvsJointVariantCalling/0a38709c-2bad-4099-aafa-a0180d2bf896/call-GvsUnified/GvsUnified/2412a680-c990-4b42-9ec0-9f5f6e392199/call-GvsExtractCallset/GvsExtractCallset/43243bde-8536-44d0-a42f-1693d7068476/call-CreateManifest/manifest.txt

In [None]:
!grep sample-name-list gvs_output_list.txt

In [None]:
!gsutil cat gs://cloned-ws-files-autodelete-after-two-weeks-terra-lambent-hazeln/workflows/cromwell-executions/GvsJointVariantCalling/0a38709c-2bad-4099-aafa-a0180d2bf896/call-GvsUnified/GvsUnified/2412a680-c990-4b42-9ec0-9f5f6e392199/call-GvsExtractCallset/GvsExtractCallset/43243bde-8536-44d0-a42f-1693d7068476/call-GenerateSampleListFile/sample-name-list.txt
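
The two `gsutil cat` cells above use paths copied by hand from the `grep` results. A small Python helper can pull those paths out of `gvs_output_list.txt` instead. The exact format of the `cromshell list-outputs` output is not shown in this notebook, so this sketch only assumes that each output appears as a `gs://` URI somewhere on a line; `find_output_uris` and the example lines are hypothetical.

```python
import re

# A gs:// URI runs until whitespace, a quote, or a comma.
GS_URI = re.compile(r'gs://[^\s"\',]+')


def find_output_uris(lines, suffix):
    """Return every gs:// URI in `lines` whose path ends with `suffix`."""
    uris = []
    for line in lines:
        for uri in GS_URI.findall(line):
            if uri.endswith(suffix):
                uris.append(uri)
    return uris


# Made-up lines in two plausible layouts; real output may differ.
example_lines = [
    'GvsJointVariantCalling.manifest: gs://example-bucket/call-CreateManifest/manifest.txt',
    '    "sample_list": "gs://example-bucket/call-GenerateSampleListFile/sample-name-list.txt",',
]
manifest_uris = find_output_uris(example_lines, 'manifest.txt')
print(manifest_uris)
```

Reading the file with `open('gvs_output_list.txt')` and passing the file object as `lines` would apply the same search to the real listing.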


In [None]:
!grep "\-willyn-ten-samples" gvs_output_list.txt > final_output_list.txt

In [None]:
%%bash
# -r keeps backslashes literal; IFS= preserves leading and trailing whitespace
while IFS= read -r line; do
  gsutil cp "${line}" gs://cloned-ws-files-autodelete-after-two-weeks-terra-lambent-hazeln/workflow_outputs/0a38709c-2bad-4099-aafa-a0180d2bf896
done <final_output_list.txt

In [None]:
!gsutil cp gs://cloned-ws-files-autodelete-after-two-weeks-terra-lambent-hazeln/workflow_outputs/0a38709c-2bad-4099-aafa-a0180d2bf896/0000000000-willyn-ten-samples.vcf.gz .

In [None]:
!gzip -d 0000000000-willyn-ten-samples.vcf.gz

In [None]:
# Anchor the pattern so only header lines (starting with "#") are dropped
!grep -v "^#" 0000000000-willyn-ten-samples.vcf

In [None]:
!tail 0000000000-willyn-ten-samples.vcf
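
The same inspection can be done in Python when the records need further processing. The sketch below skips VCF header lines, which are the lines that start with `#`, and splits each remaining record on tabs. `vcf_records` is a hypothetical helper and the example lines are hand-written, not taken from the workflow output.

```python
def vcf_records(lines):
    """Yield each non-header VCF line as a list of tab-separated fields."""
    for line in lines:
        # Header and metadata lines start with '#'; blank lines are ignored.
        if line.startswith('#') or not line.strip():
            continue
        yield line.rstrip('\n').split('\t')


# Hand-written example; the INFO field deliberately contains a '#'.
example_vcf_lines = [
    '##fileformat=VCFv4.2\n',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n',
    'chr1\t100\t.\tA\tG\t.\t.\tAC=1;NOTE=a#b\n',
]
records = list(vcf_records(example_vcf_lines))
print(len(records), records[0][:2])
```

Because only the start of the line is tested, a record that happens to contain `#` in one of its fields is still kept. Passing an open file object for the decompressed `.vcf` as `lines` would run this over the real output.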