<a href="https://colab.research.google.com/github/vkt1414/Cloud-Resources-Workflows/blob/main/Notebooks/Totalsegmentator/executionAnalytics/Terra-Cromwell%20Workflow%20Metadata.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# An introduction to using the FISS API in Python in BioData Catalyst

This notebook uses the FireCloud API, through the `firecloud` Python package (FISS), from a Python Jupyter notebook to collect execution metadata for workflows run in a Terra workspace. It lists every submission in the workspace, retrieves the Cromwell metadata for each workflow in those submissions, flattens the per-task execution events into a pandas DataFrame, and saves the result as a CSV file.

Note: for a more scalable way to work with Terra data tables, see the [terra_data_table_util](https://app.terra.bio/#workspaces/biodata-catalyst/BioData%20Catalyst%20Collection/notebooks/launch/terra_data_table_util.ipynb) notebook.

## Notebook Runtime

We suggest using the default environment and the default compute resources.

## Install packages

In [1]:
%%capture
!pip install firecloud 

In [2]:
# Mount Google Drive; the credentials file used below is stored there.
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Load packages

In [3]:
from firecloud import fiss
import firecloud.api as fapi
import os
import pandas as pd

In [4]:
import os

# Application Default Credentials file stored on the mounted Google Drive.
credentials_file = '/content/drive/MyDrive/idc/application_default_credentials.json'

# FISS authenticates through google-auth, which reads this environment variable,
# so set it before making the first API call.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credentials_file
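A missing or malformed credentials file only surfaces as an error on the first API call. A minimal sketch of an early check, assuming the file is a standard Google credentials JSON whose `type` field is, for example, `authorized_user` or `service_account`; the helper name `describe_credentials_file` is hypothetical and it does not contact Google:

```python
import json
import os


def describe_credentials_file(path):
    """Return the credential type recorded in a Google credentials JSON file.

    Hypothetical helper: it only reads the file's "type" field and does not
    verify the credentials with Google.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f'Credentials file not found: {path}')
    with open(path) as f:
        info = json.load(f)
    cred_type = info.get('type')
    if cred_type is None:
        raise ValueError(f'{path} has no "type" field; is it a credentials file?')
    return cred_type
```

Usage in this notebook would be `describe_credentials_file(credentials_file)`.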


## Set the workspace variables that the FISS API calls require

In [5]:
# Inside a Terra notebook, the billing project, workspace name and bucket are
# available as environment variables:
# billing_project = os.environ['WORKSPACE_NAMESPACE']
# workspace = os.environ['WORKSPACE_NAME']
# bucket = os.environ['WORKSPACE_BUCKET'] + "/"

# In Colab these variables are not set, so specify the values directly.
billing_project = 'terra-billing-datester'
workspace = 'TotalSegmentator'
bucket = 'gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/'

# Print the values to confirm them
print("Billing project: " + billing_project)
print("Workspace: " + workspace)
print("Workspace storage bucket: " + bucket)

Billing project: terra-billing-datester
Workspace: TotalSegmentator
Workspace storage bucket: gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/
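The commented-out lines in the cell above show that, inside a Terra notebook, the same values come from the `WORKSPACE_NAMESPACE`, `WORKSPACE_NAME` and `WORKSPACE_BUCKET` environment variables. A minimal sketch (the helper `resolve_workspace_settings` is hypothetical) that prefers those variables when present and otherwise falls back to hard-coded values, so one cell could work in both Terra and Colab:

```python
import os


def resolve_workspace_settings(defaults, environ=None):
    """Prefer Terra's workspace environment variables, falling back to `defaults`.

    `defaults` maps 'billing_project', 'workspace' and 'bucket' to fallback values.
    The bucket is normalized to end with '/'.
    """
    environ = os.environ if environ is None else environ
    billing_project = environ.get('WORKSPACE_NAMESPACE', defaults['billing_project'])
    workspace = environ.get('WORKSPACE_NAME', defaults['workspace'])
    bucket = environ.get('WORKSPACE_BUCKET', defaults['bucket'])
    if not bucket.endswith('/'):
        bucket += '/'
    return billing_project, workspace, bucket
```

The `environ` parameter defaults to `os.environ`; passing a plain dictionary makes the function easy to check without touching the real environment.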


In [6]:
import time

MAX_ATTEMPTS = 3
RETRY_DELAY_SECONDS = 30


def _call_with_retry(api_call, start_msg, success_msg, failure_msg):
    """Call `api_call()` until it returns a 2xx response, at most MAX_ATTEMPTS times."""
    print(start_msg)
    for i in range(MAX_ATTEMPTS):
        print(f'Attempt {i+1} of {MAX_ATTEMPTS}...')
        response = api_call()
        if 200 <= response.status_code < 300:
            print(success_msg)
            return response
        # Only wait if another attempt will follow.
        if i < MAX_ATTEMPTS - 1:
            print(f'Received response with status code {response.status_code}. '
                  f'Retrying after {RETRY_DELAY_SECONDS} seconds...')
            time.sleep(RETRY_DELAY_SECONDS)
        else:
            print(f'Received response with status code {response.status_code}.')
    raise Exception(failure_msg)


def get_workflow_metadata_with_retry(billing_project, workspace, submission_id, workflow_id):
    """Fetch the Cromwell metadata of one workflow in a Terra submission."""
    return _call_with_retry(
        lambda: fapi.get_workflow_metadata(billing_project, workspace, submission_id, workflow_id),
        f'Getting workflow metadata for workflow {workflow_id} in submission {submission_id}...',
        f'Successfully retrieved workflow metadata for workflow {workflow_id} in submission {submission_id}.',
        f'Failed to get workflow metadata for workflow {workflow_id} after {MAX_ATTEMPTS} attempts')


def get_submission_with_retry(billing_project, workspace, submission_id):
    """Fetch one submission, including the list of its workflows."""
    return _call_with_retry(
        lambda: fapi.get_submission(billing_project, workspace, submission_id),
        f'Getting submission {submission_id}...',
        f'Successfully retrieved submission {submission_id}.',
        f'Failed to get submission {submission_id} after {MAX_ATTEMPTS} attempts')


def list_submissions_with_retry(billing_project, workspace):
    """List all submissions in the workspace."""
    return _call_with_retry(
        lambda: fapi.list_submissions(billing_project, workspace),
        f'Listing submissions for workspace {workspace} in billing project {billing_project}...',
        f'Successfully retrieved list of submissions for workspace {workspace} in billing project {billing_project}.',
        f'Failed to list submissions for workspace {workspace} in billing project {billing_project} after {MAX_ATTEMPTS} attempts')
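The helpers above wait a fixed 30 seconds between attempts. A common alternative for transient HTTP errors is exponential backoff with random jitter. The sketch below is a generic, hypothetical helper (`call_with_backoff` is not part of FISS): it accepts any zero-argument callable returning an object with a `status_code` attribute, and the `sleep` parameter exists only so the waiting can be replaced in tests.

```python
import random
import time


def call_with_backoff(api_call, max_attempts=5, base_delay=2.0, max_delay=60.0, sleep=time.sleep):
    """Retry `api_call()` with exponential backoff plus jitter until a 2xx response.

    Assumes max_attempts >= 1.
    """
    for attempt in range(max_attempts):
        response = api_call()
        if 200 <= response.status_code < 300:
            return response
        if attempt < max_attempts - 1:
            # Wait base_delay * 2**attempt seconds (capped at max_delay),
            # plus up to one second of random jitter.
            delay = min(max_delay, base_delay * 2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
    raise RuntimeError(f'Call failed after {max_attempts} attempts '
                       f'(last status code {response.status_code})')
```

For example, `call_with_backoff(lambda: fapi.get_submission(billing_project, workspace, submission_id))` would wrap one of the calls used above. Jitter spreads out retries when many requests fail at the same moment.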

In [7]:
# Keep only the submission IDs from the list of submission records.
submissions_list_response = list_submissions_with_retry(billing_project, workspace)
submissions_list = [submission['submissionId'] for submission in submissions_list_response.json()]
submissions_list

Listing submissions for workspace TotalSegmentator in billing project terra-billing-datester...
Attempt 1 of 3...
Successfully retrieved list of submissions for workspace TotalSegmentator in billing project terra-billing-datester.


['00813d38-99bd-45f2-9c4f-5218fb7bdbfc',
 '012a732c-fe81-48c2-8ce6-35728bdc1328',
 '06dd6a3f-c124-4aaa-9161-b5fef94451c0',
 '0b190bd4-3072-49ed-bd0e-beab0887fbd3',
 '0bad3c79-b9d1-43d1-a3a4-b38c89040c06',
 '0dfe7244-5ea8-4c46-abcc-a7f34abce611',
 '10a9c913-c517-431f-8460-d0106af489a9',
 '12807b09-6a75-428c-aa89-951efc9606fd',
 '12c37d37-524f-43ac-99ba-1d60937624c4',
 '144bff4f-b630-4fda-9b29-cbf573d4861d',
 '1629ef8d-96df-466f-93e5-2253eea8bd11',
 '1970cdda-c4e0-4b99-ba54-75167044eb75',
 '1e1e4c7e-d4ca-422b-8ec2-afc8ca2030ef',
 '25691caa-64d1-4cb6-a9be-682677d66c8c',
 '26fd9e93-397a-431d-8a81-d72a568d4108',
 '31e9b428-a65c-49ba-9846-9da6fe1914f3',
 '353e334f-3dfd-46de-97c8-c54024e5948c',
 '35b368c1-d3df-414e-a137-1988195640b1',
 '361caca4-0c32-49fc-b5ec-d4901a9edcaf',
 '367ddbd0-6ef2-4403-af12-23cfc0b7d31e',
 '3c4c7112-2f30-4d41-aaf7-44b73ca64a95',
 '3d833a12-0cee-4715-8890-13ff5679d59d',
 '40a4894b-cc95-4654-8c37-9716d746c439',
 '4367015e-74da-4bb7-892b-ed009edc1ff6',
 '49f585a8-1e06-
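The next cell walks the Cromwell metadata returned by `fapi.get_workflow_metadata`: `calls` maps each task name to a list of call attempts, and each attempt carries a list of `executionEvents`. The loop emits one row per (attempt, event) pair and repeats the attempt-level fields on every event row. A minimal sketch of that flattening on a small hand-written dictionary; the values and the task name are invented for illustration, and only the key names follow the code in the next cell:

```python
import pandas as pd

# Invented, heavily trimmed metadata; only the key layout mirrors the real response.
sample_metadata = {
    'calls': {
        'wf.taskA': [
            {
                'attempt': 1,
                'executionStatus': 'Done',
                'executionEvents': [
                    {'description': 'Pending', 'startTime': '2023-01-01T00:00:00.000Z',
                     'endTime': '2023-01-01T00:00:00.000Z'},
                    {'description': 'PreparingJob', 'startTime': '2023-01-01T00:00:04.000Z',
                     'endTime': '2023-01-01T00:00:05.000Z'},
                ],
            },
        ],
    },
}


def flatten_calls(metadata):
    """One row per (task, attempt, execution event), with attempt fields repeated on each row."""
    rows = []
    for task_name, attempts in metadata.get('calls', {}).items():
        for attempt in attempts:
            for event in attempt.get('executionEvents', []):
                rows.append({
                    'task': task_name,
                    'attempt': attempt.get('attempt'),
                    'executionStatus': attempt.get('executionStatus'),
                    'description': event.get('description'),
                    'startTime': event.get('startTime'),
                    'endTime': event.get('endTime'),
                })
    return pd.DataFrame(rows)


flatten_calls(sample_metadata)
```

Collecting plain dictionaries and building the DataFrame once keeps the loop simple and avoids repeated `pd.concat` calls.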

In [8]:
# One row per (call attempt, execution event). 'failures' is listed explicitly
# so that it is part of the declared column order.
columns = ['attempt', 'backend', 'cromwell-workflow-id', 'terra-submission-id', 'wdl-task-name',
           'compressedDockerSize', 'dockerImageUsed', 'end', 'start', 'executionStatus',
           'executionBucket', 'googleProject', 'instanceName', 'machineType', 'zone',
           'runtimeAttributes', 'description', 'startTime', 'endTime', 'failures']
BYTES_PER_GIB = 1024 ** 3

rows = []
for submission in submissions_list:
    submissions_response = get_submission_with_retry(billing_project, workspace, submission)
    workflows = submissions_response.json()['workflows']
    # Some workflow entries have no workflowId; skip them.
    workflows_ids = [workflow['workflowId'] for workflow in workflows if 'workflowId' in workflow]
    for workflow in workflows_ids:
        workflow_response = get_workflow_metadata_with_retry(billing_project, workspace, submission, workflow)
        workflow_data = workflow_response.json()
        # 'calls' maps each task name to the list of its call attempts.
        for attempts in workflow_data.get('calls', {}).values():
            for attempt in attempts:
                # Skip attempts that have no backendLabels.
                if 'backendLabels' not in attempt:
                    continue
                labels = attempt['backendLabels']
                jes = attempt.get('jes', {})  # Google Pipelines API job details
                for event in attempt.get('executionEvents', []):
                    rows.append({
                        'attempt': attempt['attempt'],
                        'backend': attempt['backend'],
                        'cromwell-workflow-id': labels['cromwell-workflow-id'],
                        'terra-submission-id': labels['terra-submission-id'],
                        'wdl-task-name': labels['wdl-task-name'],
                        # Reported in bytes; converted to GiB.
                        'compressedDockerSize': float(attempt.get('compressedDockerSize', 0)) / BYTES_PER_GIB,
                        'dockerImageUsed': attempt.get('dockerImageUsed', 'default_value'),
                        'end': attempt.get('end'),
                        'start': attempt['start'],
                        'executionStatus': attempt['executionStatus'],
                        'executionBucket': jes.get('executionBucket'),
                        'googleProject': jes.get('googleProject'),
                        'instanceName': jes.get('instanceName'),
                        'machineType': jes.get('machineType'),
                        'zone': jes.get('zone'),
                        'runtimeAttributes': attempt['runtimeAttributes'],
                        'description': event['description'],
                        'startTime': event['startTime'],
                        'endTime': event['endTime'],
                        'failures': attempt.get('failures', 'default'),
                    })

# Build the DataFrame once instead of concatenating inside the loop.
final = pd.DataFrame(rows, columns=columns)
final

Streaming output truncated to the last 5000 lines.
Attempt 1 of 3...
Successfully retrieved workflow metadata for workflow 96598e02-8cc1-4699-b3e1-af76516c0b0d in submission 55f6e13e-27d6-416b-ae33-a9c92e014819.
Getting workflow metadata for workflow 1d46feb3-186c-4709-bb44-d87e9a1cf3fb in submission 55f6e13e-27d6-416b-ae33-a9c92e014819...
Attempt 1 of 3...
Successfully retrieved workflow metadata for workflow 1d46feb3-186c-4709-bb44-d87e9a1cf3fb in submission 55f6e13e-27d6-416b-ae33-a9c92e014819.
Getting workflow metadata for workflow 1b80e5fb-0b38-4b12-99a8-7343befd5fb4 in submission 55f6e13e-27d6-416b-ae33-a9c92e014819...
Attempt 1 of 3...
Successfully retrieved workflow metadata for workflow 1b80e5fb-0b38-4b12-99a8-7343befd5fb4 in submission 55f6e13e-27d6-416b-ae33-a9c92e014819.
Getting workflow metadata for workflow 134ecd5f-efb9-42ce-acb4-0cd27f6cfee8 in submission 55f6e13e-27d6-416b-ae33-a9c92e014819...
Attempt 1 of 3...
Successfully retrieved workflow metadata for

,attempt,backend,cromwell-workflow-id,terra-submission-id,wdl-task-name,compressedDockerSize,dockerImageUsed,end,start,executionStatus,executionBucket,googleProject,instanceName,machineType,zone,runtimeAttributes,description,startTime,endTime,failures
0,1,PAPIv2-CloudNAT,cromwell-b8811db6-b436-4bc3-b8a3-e880d54c1827,terra-00813d38-99bd-45f2-9c4f-5218fb7bdbfc,downloaddicomandconvertandinferencetotalsegmen...,5.449095,default_value,2023-05-25T21:50:55.450Z,2023-05-25T21:24:55.331Z,RetryableFailure,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-202358b29676eee81785d8...,custom-2-13312,us-central1-f,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",Pending,2023-05-25T21:24:55.331Z,2023-05-25T21:24:55.331Z,"[{'causedBy': [], 'message': 'Task TotalSegmen..."
1,1,PAPIv2-CloudNAT,cromwell-b8811db6-b436-4bc3-b8a3-e880d54c1827,terra-00813d38-99bd-45f2-9c4f-5218fb7bdbfc,downloaddicomandconvertandinferencetotalsegmen...,5.449095,default_value,2023-05-25T21:50:55.450Z,2023-05-25T21:24:55.331Z,RetryableFailure,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-202358b29676eee81785d8...,custom-2-13312,us-central1-f,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",PreparingJob,2023-05-25T21:24:59.391Z,2023-05-25T21:25:00.085Z,"[{'causedBy': [], 'message': 'Task TotalSegmen..."
2,1,PAPIv2-CloudNAT,cromwell-b8811db6-b436-4bc3-b8a3-e880d54c1827,terra-00813d38-99bd-45f2-9c4f-5218fb7bdbfc,downloaddicomandconvertandinferencetotalsegmen...,5.449095,default_value,2023-05-25T21:50:55.450Z,2023-05-25T21:24:55.331Z,RetryableFailure,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-202358b29676eee81785d8...,custom-2-13312,us-central1-f,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",RequestingExecutionToken,2023-05-25T21:24:55.331Z,2023-05-25T21:24:59.391Z,"[{'causedBy': [], 'message': 'Task TotalSegmen..."
3,1,PAPIv2-CloudNAT,cromwell-b8811db6-b436-4bc3-b8a3-e880d54c1827,terra-00813d38-99bd-45f2-9c4f-5218fb7bdbfc,downloaddicomandconvertandinferencetotalsegmen...,5.449095,default_value,2023-05-25T21:50:55.450Z,2023-05-25T21:24:55.331Z,RetryableFailure,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-202358b29676eee81785d8...,custom-2-13312,us-central1-f,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",UpdatingJobStore,2023-05-25T21:50:54.434Z,2023-05-25T21:50:55.450Z,"[{'causedBy': [], 'message': 'Task TotalSegmen..."
4,1,PAPIv2-CloudNAT,cromwell-b8811db6-b436-4bc3-b8a3-e880d54c1827,terra-00813d38-99bd-45f2-9c4f-5218fb7bdbfc,downloaddicomandconvertandinferencetotalsegmen...,5.449095,default_value,2023-05-25T21:50:55.450Z,2023-05-25T21:24:55.331Z,RetryableFailure,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-202358b29676eee81785d8...,custom-2-13312,us-central1-f,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",CallCacheReading,2023-05-25T21:25:00.085Z,2023-05-25T21:25:00.094Z,"[{'causedBy': [], 'message': 'Task TotalSegmen..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59902,1,PAPIv2-CloudNAT,cromwell-c0913919-5ab6-4928-ab01-35dd51be305c,terra-ffb85344-8150-48ad-be9f-32b0e155f838,totalsegmentatorendtoend,5.464721,vamsithiriveedhi/totalsegmentator@sha256:a5e62...,2023-05-31T00:30:26.091Z,2023-05-30T21:31:56.020Z,Done,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-30ff14bcae7918fd4c47ef...,custom-2-13312,us-west4-a,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",Delocalization,2023-05-31T00:25:49.566Z,2023-05-31T00:26:18.738Z,default
59903,1,PAPIv2-CloudNAT,cromwell-c0913919-5ab6-4928-ab01-35dd51be305c,terra-ffb85344-8150-48ad-be9f-32b0e155f838,totalsegmentatorendtoend,5.464721,vamsithiriveedhi/totalsegmentator@sha256:a5e62...,2023-05-31T00:30:26.091Z,2023-05-30T21:31:56.020Z,Done,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-30ff14bcae7918fd4c47ef...,custom-2-13312,us-west4-a,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",PreparingJob,2023-05-30T21:31:59.100Z,2023-05-30T21:31:59.211Z,default
59904,1,PAPIv2-CloudNAT,cromwell-c0913919-5ab6-4928-ab01-35dd51be305c,terra-ffb85344-8150-48ad-be9f-32b0e155f838,totalsegmentatorendtoend,5.464721,vamsithiriveedhi/totalsegmentator@sha256:a5e62...,2023-05-31T00:30:26.091Z,2023-05-30T21:31:56.020Z,Done,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-30ff14bcae7918fd4c47ef...,custom-2-13312,us-west4-a,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...","Worker ""google-pipelines-worker-30ff14bcae7918...",2023-05-30T21:32:12.840Z,2023-05-30T21:35:27.651Z,default
59905,1,PAPIv2-CloudNAT,cromwell-c0913919-5ab6-4928-ab01-35dd51be305c,terra-ffb85344-8150-48ad-be9f-32b0e155f838,totalsegmentatorendtoend,5.464721,vamsithiriveedhi/totalsegmentator@sha256:a5e62...,2023-05-31T00:30:26.091Z,2023-05-30T21:31:56.020Z,Done,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-30ff14bcae7918fd4c47ef...,custom-2-13312,us-west4-a,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",Background,2023-05-30T21:39:04.763Z,2023-05-30T21:39:04.877Z,default
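Each row carries ISO 8601 `startTime` and `endTime` strings, so per-event durations can be derived directly. A minimal sketch, assuming `final` has the columns shown above; the helper names and the `durationSeconds` column are hypothetical:

```python
import pandas as pd


def add_event_durations(df):
    """Return a copy of `df` with each event's duration in seconds."""
    out = df.copy()
    # Timestamps such as '2023-05-25T21:24:55.331Z' parse as UTC datetimes.
    start = pd.to_datetime(out['startTime'], utc=True)
    end = pd.to_datetime(out['endTime'], utc=True)
    out['durationSeconds'] = (end - start).dt.total_seconds()
    return out


def summarize_event_durations(df):
    """Count, total and median duration per event description, largest total first."""
    return (add_event_durations(df)
            .groupby('description')['durationSeconds']
            .agg(['count', 'sum', 'median'])
            .sort_values('sum', ascending=False))
```

With the table above, `summarize_event_durations(final).head(10)` would list the event types that consumed the most time. Descriptions such as `Worker "google-pipelines-worker-..."` embed the instance name, so each of them forms its own group unless normalized first.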


In [None]:
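# Write the table without the pandas index. The file is saved in the current
# working directory, not in the workspace bucket.
final.to_csv('data.csv', index=False)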
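`to_csv` writes the dictionary and list columns (`runtimeAttributes`, `failures`) as their Python `repr` strings, so after `pd.read_csv` those columns are plain text. A minimal sketch for restoring them with `ast.literal_eval`, assuming the CSV was written from `final`; the helper names are hypothetical. Values that are not Python literals, such as the `'default'` placeholder, are left as strings.

```python
import ast

import pandas as pd

NESTED_COLUMNS = ['runtimeAttributes', 'failures']


def _parse_literal(value):
    """Turn a repr string such as "{'zones': 'us-central1-f'}" back into a Python object."""
    if not isinstance(value, str):
        return value  # e.g. NaN for empty cells
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value  # plain strings such as 'default'


def read_workflow_csv(path, nested_columns=NESTED_COLUMNS):
    """Read the exported CSV and restore dict/list columns written as repr strings."""
    df = pd.read_csv(path)
    for column in nested_columns:
        if column in df.columns:
            df[column] = df[column].map(_parse_literal)
    return df
```

Writing with `final.to_json('data.jsonl', orient='records', lines=True)` and reading with `pd.read_json(..., lines=True)` would keep the nested values as JSON and avoid the string round trip.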