## Running samtools view via the WES API

#### Learning Objectives
Workshop attendees will learn how to use the GA4GH Workflow Execution Service (WES).

What will participants do as part of the exercise?

 - Understand how to run a workflow via WES
 - Adjust some parameters of the workflow
 - Check the status of the runs
 - Access the workflow results via DRS
 
 #### Icons in this Guide

 🖐 A hands-on section where you will code something or interact with the server
 
 
 
Just as we used a Python client to submit DRS requests in the previous notebook, we will use a similar client from the fasp package to run workflows.

Setting the debug flag to True on the client shows the actual HTTP calls it makes.

#### Step 1: Set some variables for your specific settings
 🖐 Set SB_PROJECT to the username and project you created earlier.
You will only need to change SB_API_KEY_PATH if you changed the location of the key file.


In [4]:
from fasp.workflow import sbcgcWESClient
SB_PROJECT = 'ianfore/ian-tutorial'
SB_API_KEY_PATH = '~/tutkeys/sbcgc_key.json'
DOWNLOAD_LOCATION = '~/Downloads'

cl = sbcgcWESClient(SB_PROJECT, SB_API_KEY_PATH, debug=False)

#### Step 2: Set up workflow run
Note that you may need to use one of the commented-out alternatives below for the SAMTools View application.

In [16]:
task_name = "Tutorial run 1 test via WES - header only"

drs_uris  = ['drs://cgc-ga4gh-api.sbgenomics.com/62b0a8daf08fea4770469454',
'drs://cgc-ga4gh-api.sbgenomics.com/62bd910d14b0e420a0e2dc8f']

#sam_view_app = 'sbg://admin/sbg-public-data/samtools-view-1-9-cwl1-0'
#sam_view_app = 'sbg://yasasvinip/test-1/samtools-view-1-9-cwl1-0'
sam_view_app = 'sbg://forei/ismb-tutorial/samtools-view-1-9-cwl1-0-tut'
#sam_view_app = 'sbg://ianfore/ian-tutorial/samtools-view-1-9-cwl1-0-tut'

params = {
    "project": SB_PROJECT,
    "name": task_name,
    "inputs": {
        "output_header_only": True,
        "include_header": True,
        "in_alignments": {
            "path": drs_uris[0],
            "class": "File"
        }
    }
}
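
The same request body can be generated for each of the URIs in `drs_uris`. The following is a minimal sketch: `make_params` is a hypothetical helper (it is not part of the fasp package), and the project and task names are placeholders.

```python
def make_params(project, name, drs_uri, **tool_inputs):
    """Build a run request body for a single input file (hypothetical helper)."""
    inputs = dict(tool_inputs)
    # The input file is referenced by its DRS URI, as in the cell above.
    inputs["in_alignments"] = {"path": drs_uri, "class": "File"}
    return {"project": project, "name": name, "inputs": inputs}


# Placeholder values; in the notebook these come from Steps 1 and 2.
example_project = "my-username/my-project"
example_uris = [
    "drs://cgc-ga4gh-api.sbgenomics.com/62b0a8daf08fea4770469454",
    "drs://cgc-ga4gh-api.sbgenomics.com/62bd910d14b0e420a0e2dc8f",
]

all_params = [
    make_params(example_project, f"Header only, file {i + 1}", uri,
                output_header_only=True, include_header=True)
    for i, uri in enumerate(example_uris)
]
print(all_params[1]["inputs"]["in_alignments"]["path"])
```

Each entry of `all_params` could then be submitted in the same way as `params` in the next step.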

#### Step 3: Submit the workflow

The request body is now in a form that can be serialized to JSON and passed to the client function as follows.

In [17]:
import json



run_id = cl.run_generic_workflow(
    workflow_url=sam_view_app,
    workflow_params = json.dumps(params),
    workflow_type = "CWL",
    workflow_type_version = "v1.0",
    verbose=False
)
run_id

'62decebc-550f-4023-9ab7-b5ef505dd87e'
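
For reference, the GA4GH WES specification defines run submission as a `POST` to the `/runs` endpoint with form fields whose names match the arguments above. The sketch below only assembles those fields and sends nothing. How `sbcgcWESClient` builds its request internally is not shown in this notebook, so treat the mapping as an assumption; `build_wes_form` is a hypothetical helper and the project name is a placeholder.

```python
import json


def build_wes_form(workflow_url, params, workflow_type="CWL",
                   workflow_type_version="v1.0"):
    """Assemble the form fields of a WES run request (hypothetical helper)."""
    return {
        "workflow_url": workflow_url,
        # workflow_params is sent as a JSON-encoded string.
        "workflow_params": json.dumps(params),
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
    }


form = build_wes_form(
    "sbg://forei/ismb-tutorial/samtools-view-1-9-cwl1-0-tut",
    {
        "project": "my-username/my-project",  # placeholder project
        "name": "example",
        "inputs": {"output_header_only": True},
    },
)
for field, value in form.items():
    print(f"{field}: {value}")
```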

#### Step 4: Check the status of the workflow

In [22]:
cl.get_task_status(run_id)

'COMPLETE'

The task takes a few minutes to complete, so the status may initially be something other than 'COMPLETE'.

You can go ahead with the next step while it runs.
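
If you would rather wait for a run to finish than check by hand, the status call can be polled. This is a sketch under assumptions: `wait_for_run` is a hypothetical helper, the terminal state names are those listed in the GA4GH WES specification, and beyond 'COMPLETE' this notebook does not show which of them the Seven Bridges service returns. A stand-in status function is used so the example runs without a server.

```python
import time

# Terminal run states as named in the GA4GH WES specification.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def wait_for_run(get_status, run_id, interval=30, timeout=1800, sleep=time.sleep):
    """Poll get_status(run_id) until a terminal state is reached or timeout expires."""
    waited = 0
    while True:
        state = get_status(run_id)
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout:
            raise TimeoutError(f"Run {run_id} still {state} after {waited} seconds")
        sleep(interval)
        waited += interval


# Stand-in for cl.get_task_status so the example runs without a server.
_fake_states = iter(["QUEUED", "RUNNING", "RUNNING", "COMPLETE"])
final_state = wait_for_run(lambda _id: next(_fake_states), "example-run-id", interval=0)
print(final_state)
```

In the notebook, the call would look like `wait_for_run(cl.get_task_status, run_id)`.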

#### Step 5: Adjust a parameter of the tool

Using the description of the app on the Seven Bridges Platform,
identify the parameter that directs samtools view to output only the count of matching records:
https://cgc.sbgenomics.com/u/forei/ismb-tutorial/apps/#forei/ismb-tutorial/samtools-view-1-9-cwl1-0

Alter the details in the following copy of the previous run:
* Edit the parameters section below to set the value of the parameter you have identified to True.
* Delete the other parameters from the previous run.
* Enter a task name that will help you identify the task.

In [19]:
task_name2 = "new copy fixed samtools view count only"

params2 = {
    "project": SB_PROJECT,
    "name": task_name2,
    "inputs": {
        "count_alignments": True,
        "in_alignments": {
            "path": drs_uris[0],
            "class": "File"
        }
    }
}

#### Step 6: Submit the revised task and make a note of the run_id

In [20]:
run_id2 = cl.run_generic_workflow(
    workflow_url=sam_view_app,
    workflow_params = json.dumps(params2),
    workflow_type = "CWL",
    workflow_type_version = "v1.0",
    verbose=False
)
run_id2

'0d73f669-ff42-4229-a711-8a830e35ecb8'

#### Step 7: Check the status of the second run
🖐 Noting the name of the variable in which the id of the new run was stored, write a line to check the status of the run.

In [34]:
cl.get_task_status(run_id2)

'COMPLETE'

## Getting the results - via DRS

Once the first run is complete, further steps can use DRS to obtain the file output from the workflow.

#### Step 8: Show the full response from the WES server
Note the outputs section, which shows outputs specific to the SAMTools View command.

In [24]:
runLog = cl.get_run_log(run_id)
runLog

{'request': {'tags': {},
  'workflow_params': {'name': 'Tutorial run 1 test via WES - header only',
   'project': 'ianfore/ian-tutorial',
   'inputs': {'count_alignments': None,
    'filter_exclude_all': None,
    'output_fmt_option': None,
    'in_reference': None,
    'regions_array': None,
    'filter_mapq': None,
    'in_alignments': {'path': 'drs://cgc-ga4gh-api.sbgenomics.com/62bd93e814b0e420a0e2dea9',
     'basename': 'G20479.HCC1143_1M.aligned.bam',
     'nameext': '.bam',
     'class': 'File',
     'nameroot': 'G20479.HCC1143_1M.aligned'},
    'read_tag_to_strip': None,
    'subsample_fraction': None,
    'output_format': None,
    'output_header_only': True,
    'fast_bam_compression': None,
    'multi_region_iterator': None,
    'in_index': None,
    'filter_include': None,
    'mem_per_job': None,
    'collapse_cigar': None,
    'min_cigar_operations': None,
    'read_group': None,
    'threads': None,
    'cpu_per_job': None,
    'read_group_list': None,
    'reference_fil

#### Step 9: Extract the value of the DRS URI for the output file

In [25]:
results_drs_uri = runLog['outputs']['out_alignments']['path']
results_drs_uri

'drs://cgc-ga4gh-api.sbgenomics.com/62c9fc947e075536654152d5'

We'll pass over the question of how one would determine which DRS server that URI needs to be sent to because:
* In this case it's fairly obvious: it's the CGC DRS server.
* We want to get something up and working.
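
For the hostname-based form of DRS URI used here, the server is simply the host part of the URI. The sketch below splits a URI into host and object id and builds the object URL following the pattern in the GA4GH DRS specification (`https://<hostname>/ga4gh/drs/v1/objects/<id>`). `parse_drs_uri` and `drs_object_url` are hypothetical helpers, and whether the CGC server serves objects at exactly that path is not verified here.

```python
from urllib.parse import urlparse


def parse_drs_uri(drs_uri):
    """Split a hostname-based DRS URI into (host, object_id)."""
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs":
        raise ValueError(f"Not a DRS URI: {drs_uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def drs_object_url(drs_uri):
    """Build the object URL using the path pattern from the DRS specification."""
    host, object_id = parse_drs_uri(drs_uri)
    return f"https://{host}/ga4gh/drs/v1/objects/{object_id}"


example_uri = "drs://cgc-ga4gh-api.sbgenomics.com/62c9fc947e075536654152d5"
print(parse_drs_uri(example_uri))
print(drs_object_url(example_uri))
```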

#### Step 10: Use DRS to get file details
First we set up a DRS client to access the results files.

In [27]:
from fasp.loc import sbcgcDRSClient
drsClient = sbcgcDRSClient(SB_API_KEY_PATH, 's3')

As in Example 2-1, DRS can be used to get details of the file. Note that only the id portion of the DRS URI is passed.

In [28]:
# get the id part of the URI
out_alignments_drs_id = results_drs_uri.split('/')[-1]
print(f"Getting {out_alignments_drs_id} from DRS Client")
fileDetails = drsClient.get_object(out_alignments_drs_id)
fileDetails

Getting 62c9fc947e075536654152d5 from DRS Client


{'id': '62c9fc947e075536654152d5',
 'name': '_3_G20479.HCC1143_1M.aligned.header.sam',
 'size': 3598,
 'checksums': [{'type': 'etag',
   'checksum': '159acae2ce81efee40f0f89ee58f2f95-1'}],
 'self_uri': 'drs://cgc-ga4gh-api.sbgenomics.com/62c9fc947e075536654152d5',
 'created_time': '2022-07-09T22:09:24Z',
 'updated_time': '2022-07-09T22:09:24Z',
 'mime_type': 'application/json',
 'access_methods': [{'type': 's3',
   'region': 'us-east-1',
   'access_id': 'aws-us-east-1'}]}
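
The `access_methods` list in this response describes how the file can be fetched. As a small sketch, the helper below looks up the `access_id` for a given access type. `find_access_id` is a hypothetical helper, and the dictionary is a trimmed copy of the response above; how the fasp client itself uses these values is not shown in this notebook.

```python
def find_access_id(object_details, access_type):
    """Return the access_id of the first access method of the given type, or None."""
    for method in object_details.get("access_methods", []):
        if method.get("type") == access_type:
            return method.get("access_id")
    return None


# Trimmed copy of the DRS response shown above.
example_details = {
    "id": "62c9fc947e075536654152d5",
    "size": 3598,
    "access_methods": [
        {"type": "s3", "region": "us-east-1", "access_id": "aws-us-east-1"}
    ],
}
print(find_access_id(example_details, "s3"))
print(find_access_id(example_details, "gs"))
```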

#### Note we can check the size of results files before downloading

In [29]:
print(fileDetails['size'])

3598


#### Step 11: Downloading the file
DRS is used again, this time to get a URL for the results file. That URL is then used to download the file.

In [30]:
url = drsClient.get_access_url(out_alignments_drs_id,'s3')

We'll create a small function to encapsulate the download.

In [31]:
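import os
import requests

def download(url, file_path):
    """Download url and save it to file_path (a leading ~ is expanded)."""
    response = requests.get(url)
    # Stop before writing anything if the request failed
    response.raise_for_status()
    with open(os.path.expanduser(file_path), "wb") as file:
        file.write(response.content)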

In [33]:
fullPath = DOWNLOAD_LOCATION + '/' + fileDetails['name']
download(url, fullPath)
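
Since DRS reported the file size, a simple sanity check after downloading is to compare the size on disk with that value. A minimal sketch follows; `verify_download_size` is a hypothetical helper, and a temporary file stands in for the downloaded result so the example runs on its own.

```python
import os
import tempfile


def verify_download_size(file_path, expected_size):
    """Return True if the file on disk has the size reported by DRS."""
    actual_size = os.path.getsize(os.path.expanduser(file_path))
    return actual_size == expected_size


# Demonstration with a temporary file standing in for a downloaded result.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 3598)
demo_path = tmp.name
size_ok = verify_download_size(demo_path, 3598)
print(size_ok)
os.remove(demo_path)
```

In the notebook, the equivalent check would be `verify_download_size(fullPath, fileDetails['size'])`.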

#### Step 12: Repeat the steps above to retrieve the results of the second run

In [36]:
runLog = cl.get_run_log(run_id2)
# Note the difference in the name of the output
results_drs_uri = runLog['outputs']['alignment_count']['path']
# get the id part of the URI
alignment_count_drs_id = results_drs_uri.split('/')[-1]
print(f"Getting {alignment_count_drs_id} from DRS Client")
fileDetails = drsClient.get_object(alignment_count_drs_id)
url = drsClient.get_access_url(alignment_count_drs_id,'s3')
fullPath = DOWNLOAD_LOCATION + '/' + fileDetails['name']
download(url, fullPath)

Getting 62c9fd8a7e07553665415319 from DRS Client
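
Because the retrieval steps are the same for every run, they can be wrapped in one function. The sketch below is an assumption-laden illustration: `fetch_output` is a hypothetical helper, it relies only on the client methods used earlier in this notebook (`get_run_log`, `get_object`, `get_access_url`), and stub clients with made-up values replace the real ones so that it runs without a server.

```python
import os


def fetch_output(wes_client, drs_client, download_fn, run_id, output_name,
                 download_dir, access_id="s3"):
    """Look up a named run output, resolve it via DRS and download it."""
    run_log = wes_client.get_run_log(run_id)
    drs_uri = run_log["outputs"][output_name]["path"]
    drs_id = drs_uri.split("/")[-1]  # the id part of the URI
    details = drs_client.get_object(drs_id)
    url = drs_client.get_access_url(drs_id, access_id)
    full_path = os.path.join(download_dir, details["name"])
    download_fn(url, full_path)
    return full_path


class _StubWES:
    """Stand-in for the WES client; returns a made-up run log."""
    def get_run_log(self, run_id):
        return {"outputs": {"alignment_count": {"path": "drs://example.org/abc123"}}}


class _StubDRS:
    """Stand-in for the DRS client; returns made-up details and URL."""
    def get_object(self, drs_id):
        return {"id": drs_id, "name": "counts.txt"}

    def get_access_url(self, drs_id, access_id):
        return f"https://example.org/download/{drs_id}"


downloads = []  # records (url, path) pairs instead of fetching anything
saved = fetch_output(_StubWES(), _StubDRS(),
                     lambda url, path: downloads.append((url, path)),
                     "example-run-id", "alignment_count", "downloads")
print(saved)
```

With the real clients, the second run's result could be fetched with `fetch_output(cl, drsClient, download, run_id2, 'alignment_count', DOWNLOAD_LOCATION)`.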
