
# An introduction to using the Fiss API in Python in BioData Catalyst

This notebook introduces the FireCloud API (FISS) from a Python Jupyter notebook. It lists the submissions in a Terra workspace and retrieves the Cromwell metadata for every workflow in each submission. It then flattens the per-call execution events (attempt, backend, machine type, zone, Docker image, timings) into a pandas DataFrame and saves the result as a CSV file.

Note: for loading, subsetting, and saving Terra data tables at scale, see the [terra_data_table_util](https://app.terra.bio/#workspaces/biodata-catalyst/BioData%20Catalyst%20Collection/notebooks/launch/terra_data_table_util.ipynb) notebook.

## Notebook Runtime

We suggest using the default environment and the default compute settings.

## Install packages

In [13]:
%%capture
!pip install firecloud 

## Load packages

In [12]:
from firecloud import fiss
import firecloud.api as fapi
import os
import io
import pandas as pd
import glob

In [19]:
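import os

# FISS authenticates through Google's default credentials, which honor GOOGLE_APPLICATION_CREDENTIALS.
# Set it with os.environ: a shell `!export` runs in a child process and never reaches this kernel.
FILEPATH = '/content/drive/MyDrive/idc/idc-external-025-service-account.json'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = FILEPATH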
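Why a `!export` line has no effect: in Jupyter and Colab, each `!` command runs in its own child shell, and a child process cannot change its parent's environment. Setting `os.environ` (or using IPython's `%env` magic) changes the kernel's own environment, which later child processes inherit. The sketch below demonstrates this with the standard library only; `DEMO_CREDENTIALS_PATH` and `child_sees` are hypothetical names, and it assumes a POSIX `sh` is available.

```python
import os
import subprocess
import sys

# Hypothetical variable name used only for this demonstration.
VAR = 'DEMO_CREDENTIALS_PATH'


def child_sees(var_name):
    """Return the value of var_name as seen by a freshly started child process."""
    result = subprocess.run(
        [sys.executable, '-c', f'import os; print(os.environ.get({var_name!r}, ""))'],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()


# An `export` inside a child shell dies with that shell (this is what `!export` does).
subprocess.run(['sh', '-c', f'export {VAR}=/tmp/key.json'], check=True)
print(repr(os.environ.get(VAR)))  # None: the parent process never saw it

# Setting os.environ changes this process, and child processes inherit it.
os.environ[VAR] = '/tmp/key.json'
print(child_sees(VAR))            # /tmp/key.json
```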

## Set the workspace variables that the FISS API requires

In [16]:
# On a Terra notebook runtime, read the billing project, workspace, and bucket from environment variables:
#billing_project = os.environ['WORKSPACE_NAMESPACE']
#workspace = os.environ['WORKSPACE_NAME']
#bucket = os.environ['WORKSPACE_BUCKET'] + "/"

# Outside Terra (for example, on Colab), set them explicitly
billing_project = 'terra-billing-datester'
workspace = 'TotalSegmentator'
bucket = 'gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/'

# Verify the values
print("Billing project: " + billing_project)
print("Workspace: " + workspace)
print("Workspace storage bucket: " + bucket)

Billing project: terra-billing-datester
Workspace: TotalSegmentator
Workspace storage bucket: gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/
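The commented-out lines read the workspace settings from environment variables that Terra notebook runtimes define, while the hard-coded values are for running elsewhere. A minimal sketch that supports both, assuming the variable names from the commented lines; `workspace_setting` is a hypothetical helper, and the fallbacks are the values used above.

```python
import os


def workspace_setting(env_name, fallback):
    """Prefer the Terra-provided environment variable; otherwise use the fallback."""
    return os.environ.get(env_name, fallback)


billing_project = workspace_setting('WORKSPACE_NAMESPACE', 'terra-billing-datester')
workspace = workspace_setting('WORKSPACE_NAME', 'TotalSegmentator')
# Normalize to exactly one trailing slash so object paths can be appended directly.
bucket = workspace_setting(
    'WORKSPACE_BUCKET', 'gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642').rstrip('/') + '/'

print('Billing project: ' + billing_project)
print('Workspace: ' + workspace)
print('Workspace storage bucket: ' + bucket)
```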


In [20]:
import time

import firecloud.api as fapi

MAX_ATTEMPTS = 3
RETRY_DELAY_SECONDS = 30


def call_with_retry(api_call, args, success_message, failure_message):
    """Call a FISS API function and retry on any non-2xx response."""
    for i in range(MAX_ATTEMPTS):
        print(f'Attempt {i+1} of {MAX_ATTEMPTS}...')
        response = api_call(*args)
        if 200 <= response.status_code < 300:
            print(success_message)
            return response
        print(f'Received response with status code {response.status_code}.')
        # Only wait if another attempt follows
        if i < MAX_ATTEMPTS - 1:
            print(f'Retrying after {RETRY_DELAY_SECONDS} seconds...')
            time.sleep(RETRY_DELAY_SECONDS)
    raise RuntimeError(failure_message)


def get_workflow_metadata_with_retry(billing_project, workspace, submission_id, workflow_id):
    print(f'Getting workflow metadata for workflow {workflow_id} in submission {submission_id}...')
    return call_with_retry(
        fapi.get_workflow_metadata,
        (billing_project, workspace, submission_id, workflow_id),
        f'Successfully retrieved workflow metadata for workflow {workflow_id} in submission {submission_id}.',
        f'Failed to get workflow metadata for workflow {workflow_id} after {MAX_ATTEMPTS} attempts')


def get_submission_with_retry(billing_project, workspace, submission_id):
    print(f'Getting submission {submission_id}...')
    return call_with_retry(
        fapi.get_submission,
        (billing_project, workspace, submission_id),
        f'Successfully retrieved submission {submission_id}.',
        f'Failed to get submission {submission_id} after {MAX_ATTEMPTS} attempts')


def list_submissions_with_retry(billing_project, workspace):
    print(f'Listing submissions for workspace {workspace} in billing project {billing_project}...')
    return call_with_retry(
        fapi.list_submissions,
        (billing_project, workspace),
        f'Successfully retrieved list of submissions for workspace {workspace} in billing project {billing_project}.',
        f'Failed to list submissions for workspace {workspace} in billing project {billing_project} after {MAX_ATTEMPTS} attempts')
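The retry helpers for listing submissions, getting a submission, and getting workflow metadata all follow one pattern: call the API, accept any 2xx status, otherwise wait and try again. A minimal, self-contained sketch of that pattern as a single function, swapping in exponential backoff instead of a fixed 30-second wait. `retry_call` and `FakeResponse` are hypothetical names, and injecting `sleep` lets the example run without actually waiting.

```python
import time


class FakeResponse:
    """Hypothetical stand-in for a requests.Response; only status_code is used."""

    def __init__(self, status_code):
        self.status_code = status_code


def retry_call(api_call, *args, max_attempts=3, base_delay=30, sleep=time.sleep):
    """Call api_call(*args) until it returns a 2xx response.

    Waits base_delay, 2*base_delay, ... between attempts (exponential backoff)
    and never sleeps after the final attempt.
    """
    for attempt in range(1, max_attempts + 1):
        response = api_call(*args)
        if 200 <= response.status_code < 300:
            return response
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f'{api_call.__name__} failed after {max_attempts} attempts '
                       f'(last status {response.status_code})')


# Usage with a stub that fails twice with HTTP 503, then succeeds.
codes = iter([503, 503, 200])


def flaky_call():
    return FakeResponse(next(codes))


delays = []
result = retry_call(flaky_call, sleep=delays.append)
print(result.status_code, delays)  # 200 [30, 60]
```

With FISS, the same function would be called as `retry_call(fapi.get_submission, billing_project, workspace, submission_id)`. Backoff spaces out retries more as failures repeat, which is gentler on a service that is already struggling.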

In [21]:
submissions_list_response=list_submissions_with_retry(billing_project, workspace)
submissions_list=submissions_list_response.json()
submissions_list = [submission['submissionId'] for submission in submissions_list]
submissions_list

Listing submissions for workspace TotalSegmentator in billing project terra-billing-datester...
Attempt 1 of 3...
Successfully retrieved list of submissions for workspace TotalSegmentator in billing project terra-billing-datester.


['012a732c-fe81-48c2-8ce6-35728bdc1328',
 '06dd6a3f-c124-4aaa-9161-b5fef94451c0',
 '0b190bd4-3072-49ed-bd0e-beab0887fbd3',
 '0bad3c79-b9d1-43d1-a3a4-b38c89040c06',
 '0dfe7244-5ea8-4c46-abcc-a7f34abce611',
 '12807b09-6a75-428c-aa89-951efc9606fd',
 '12c37d37-524f-43ac-99ba-1d60937624c4',
 '1970cdda-c4e0-4b99-ba54-75167044eb75',
 '1e1e4c7e-d4ca-422b-8ec2-afc8ca2030ef',
 '25691caa-64d1-4cb6-a9be-682677d66c8c',
 '26fd9e93-397a-431d-8a81-d72a568d4108',
 '35b368c1-d3df-414e-a137-1988195640b1',
 '367ddbd0-6ef2-4403-af12-23cfc0b7d31e',
 '3d833a12-0cee-4715-8890-13ff5679d59d',
 '40a4894b-cc95-4654-8c37-9716d746c439',
 '4f2556f9-6496-421f-83b2-fc345080e083',
 '57ecea46-3aeb-45c6-9db7-344ae6c9667f',
 '63f0383c-b544-459c-a575-217c17bc5b91',
 '6ad3ce9c-807f-40ff-bb04-7fbbd165ba9d',
 '6c60c83d-28c7-4dd4-b7ab-3d2e1a0d52f8',
 '7f4da724-5b7c-497b-a071-4f6d4f7ee171',
 '8a53de40-e270-49e6-abdc-d174f1bc855b',
 '8ece60a6-9ada-4483-aed7-6d197ff5476c',
 '921af976-3942-442f-887c-9c5029420b8f',
 '9594107f-5126-
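The cell above keeps only the submission IDs, so every submission in the workspace is processed in the next step. To restrict the analysis, the list can be filtered before extracting IDs. This sketch assumes each entry also carries `status` and `submissionDate` fields alongside `submissionId`; check the actual JSON returned by `fapi.list_submissions` before relying on it. The sample records are made up, and `submission_ids_since` is a hypothetical helper.

```python
from datetime import datetime, timezone


def submission_ids_since(submissions, since, statuses=None):
    """Return IDs of submissions made at or after `since`, optionally filtered by status.

    Assumes each entry has 'submissionId', 'submissionDate' (ISO 8601) and 'status'.
    """
    ids = []
    for submission in submissions:
        submitted = datetime.fromisoformat(submission['submissionDate'].replace('Z', '+00:00'))
        if submitted < since:
            continue
        if statuses is not None and submission['status'] not in statuses:
            continue
        ids.append(submission['submissionId'])
    return ids


# Made-up records shaped like the assumed list_submissions response.
sample = [
    {'submissionId': 'a', 'submissionDate': '2023-01-21T16:00:00.000Z', 'status': 'Done'},
    {'submissionId': 'b', 'submissionDate': '2023-04-13T15:40:00.000Z', 'status': 'Aborted'},
    {'submissionId': 'c', 'submissionDate': '2023-04-14T09:00:00.000Z', 'status': 'Done'},
]
cutoff = datetime(2023, 4, 1, tzinfo=timezone.utc)
print(submission_ids_since(sample, cutoff))                     # ['b', 'c']
print(submission_ids_since(sample, cutoff, statuses={'Done'}))  # ['c']
```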

In [22]:
columns = ['attempt', 'backend', 'cromwell-workflow-id', 'terra-submission-id', 'wdl-task-name',
           'compressedDockerSize', 'dockerImageUsed', 'end', 'start', 'executionStatus',
           'executionBucket', 'googleProject', 'instanceName', 'machineType', 'zone',
           'runtimeAttributes', 'description', 'startTime', 'endTime']
BYTES_PER_GIB = 1024 ** 3  # compressedDockerSize is reported in bytes and stored here in GiB

# Collect one dict per execution event and build the DataFrame once at the end.
# This avoids repeated pd.concat calls (and the ValueError pd.concat raises on an empty list).
rows = []
for submission in submissions_list:
    submissions_response = get_submission_with_retry(billing_project, workspace, submission)
    workflows = submissions_response.json()['workflows']
    # Only workflows with a workflowId have Cromwell metadata to fetch
    workflows_ids = [workflow['workflowId'] for workflow in workflows if 'workflowId' in workflow]
    for workflow in workflows_ids:
        workflow_response = get_workflow_metadata_with_retry(billing_project, workspace, submission, workflow)
        workflow_data = workflow_response.json()
        # 'calls' maps each WDL call name to a list of call attempts
        for attempts in workflow_data.get('calls', {}).values():
            for attempt in attempts:
                # Skip attempts without backend labels, as before
                if 'backendLabels' not in attempt:
                    continue
                labels = attempt['backendLabels']
                jes = attempt.get('jes', {})  # backend VM details; .get avoids a KeyError if absent
                for event in attempt.get('executionEvents', []):
                    rows.append({
                        'attempt': attempt['attempt'],
                        'backend': attempt['backend'],
                        'cromwell-workflow-id': labels['cromwell-workflow-id'],
                        'terra-submission-id': labels['terra-submission-id'],
                        'wdl-task-name': labels['wdl-task-name'],
                        'compressedDockerSize': float(attempt.get('compressedDockerSize', 0)) / BYTES_PER_GIB,
                        'dockerImageUsed': attempt.get('dockerImageUsed', 'default_value'),
                        'end': attempt.get('end'),
                        'start': attempt['start'],
                        'executionStatus': attempt['executionStatus'],
                        'executionBucket': jes.get('executionBucket'),
                        'googleProject': jes.get('googleProject'),
                        'instanceName': jes.get('instanceName'),
                        'machineType': jes.get('machineType'),
                        'zone': jes.get('zone'),
                        'runtimeAttributes': attempt['runtimeAttributes'],
                        'description': event['description'],
                        'startTime': event['startTime'],
                        'endTime': event['endTime'],
                    })

final = pd.DataFrame(rows, columns=columns)
final

Streaming output truncated to the last 5000 lines.
Attempt 1 of 3...
Successfully retrieved workflow metadata for workflow 92c1990b-fc79-4772-bd59-dd403efc6365 in submission 06dd6a3f-c124-4aaa-9161-b5fef94451c0.
Getting workflow metadata for workflow d952a7b5-73ab-4d08-a06c-93db8ce70956 in submission 06dd6a3f-c124-4aaa-9161-b5fef94451c0...
Attempt 1 of 3...
Successfully retrieved workflow metadata for workflow d952a7b5-73ab-4d08-a06c-93db8ce70956 in submission 06dd6a3f-c124-4aaa-9161-b5fef94451c0.
Getting workflow metadata for workflow 8886a57f-8127-48b6-bae5-e4ba142ea1dd in submission 06dd6a3f-c124-4aaa-9161-b5fef94451c0...
Attempt 1 of 3...
Successfully retrieved workflow metadata for workflow 8886a57f-8127-48b6-bae5-e4ba142ea1dd in submission 06dd6a3f-c124-4aaa-9161-b5fef94451c0.
Getting workflow metadata for workflow 048f4a6b-e257-4c9c-96ef-1f536093ba63 in submission 06dd6a3f-c124-4aaa-9161-b5fef94451c0...
Attempt 1 of 3...
Successfully retrieved workflow metadata for

,attempt,backend,cromwell-workflow-id,terra-submission-id,wdl-task-name,compressedDockerSize,dockerImageUsed,end,start,executionStatus,executionBucket,googleProject,instanceName,machineType,zone,runtimeAttributes,description,startTime,endTime
0,1,PAPIv2-CloudNAT,cromwell-46fe754a-1148-44e9-b816-daf89b198cc1,terra-012a732c-fe81-48c2-8ce6-35728bdc1328,papermill,7.818606,us-central1-docker.pkg.dev/erudite-bonbon-3739...,2023-01-21T17:05:24.848Z,2023-01-21T16:00:39.292Z,Done,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-40688f65473a19a0b030ba...,custom-2-13312,us-central1-b,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",UpdatingCallCache,2023-01-21T17:05:23.285Z,2023-01-21T17:05:23.848Z
1,1,PAPIv2-CloudNAT,cromwell-46fe754a-1148-44e9-b816-daf89b198cc1,terra-012a732c-fe81-48c2-8ce6-35728bdc1328,papermill,7.818606,us-central1-docker.pkg.dev/erudite-bonbon-3739...,2023-01-21T17:05:24.848Z,2023-01-21T16:00:39.292Z,Done,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-40688f65473a19a0b030ba...,custom-2-13312,us-central1-b,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",waiting for quota,2023-01-21T16:00:41.399Z,2023-01-21T16:00:58.675Z
2,1,PAPIv2-CloudNAT,cromwell-46fe754a-1148-44e9-b816-daf89b198cc1,terra-012a732c-fe81-48c2-8ce6-35728bdc1328,papermill,7.818606,us-central1-docker.pkg.dev/erudite-bonbon-3739...,2023-01-21T17:05:24.848Z,2023-01-21T16:00:39.292Z,Done,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-40688f65473a19a0b030ba...,custom-2-13312,us-central1-b,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...","Pulling ""gcr.io/google.com/cloudsdktool/cloud-...",2023-01-21T16:01:33.313Z,2023-01-21T16:01:46.197Z
3,1,PAPIv2-CloudNAT,cromwell-46fe754a-1148-44e9-b816-daf89b198cc1,terra-012a732c-fe81-48c2-8ce6-35728bdc1328,papermill,7.818606,us-central1-docker.pkg.dev/erudite-bonbon-3739...,2023-01-21T17:05:24.848Z,2023-01-21T16:00:39.292Z,Done,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-40688f65473a19a0b030ba...,custom-2-13312,us-central1-b,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...","Worker ""google-pipelines-worker-40688f65473a19...",2023-01-21T16:00:58.675Z,2023-01-21T16:01:33.313Z
4,1,PAPIv2-CloudNAT,cromwell-46fe754a-1148-44e9-b816-daf89b198cc1,terra-012a732c-fe81-48c2-8ce6-35728bdc1328,papermill,7.818606,us-central1-docker.pkg.dev/erudite-bonbon-3739...,2023-01-21T17:05:24.848Z,2023-01-21T16:00:39.292Z,Done,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-40688f65473a19a0b030ba...,custom-2-13312,us-central1-b,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",Complete in GCE / Cromwell Poll Interval,2023-01-21T16:59:48.051Z,2023-01-21T17:05:23.285Z
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35018,1,PAPIv2-CloudNAT,cromwell-816dacf5-22db-477a-9e49-095c2f02e562,terra-fe58406b-9501-4168-846e-7cc97c60574c,downloaddicomandconvertandinferencetotalsegmen...,5.446280,default_value,2023-04-13T16:28:56.644Z,2023-04-13T15:44:31.588Z,Aborted,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-2d015c53ae7bb3acf87e04...,custom-4-20480,europe-west1-b,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",RequestingExecutionToken,2023-04-13T15:44:31.588Z,2023-04-13T15:44:33.469Z
35019,1,PAPIv2-CloudNAT,cromwell-816dacf5-22db-477a-9e49-095c2f02e562,terra-fe58406b-9501-4168-846e-7cc97c60574c,downloaddicomandconvertandinferencetotalsegmen...,5.446280,default_value,2023-04-13T16:28:56.644Z,2023-04-13T15:44:31.588Z,Aborted,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-2d015c53ae7bb3acf87e04...,custom-4-20480,europe-west1-b,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",Pending,2023-04-13T15:44:31.588Z,2023-04-13T15:44:31.588Z
35020,1,PAPIv2-CloudNAT,cromwell-816dacf5-22db-477a-9e49-095c2f02e562,terra-fe58406b-9501-4168-846e-7cc97c60574c,downloaddicomandconvertandinferencetotalsegmen...,5.446280,default_value,2023-04-13T16:28:56.644Z,2023-04-13T15:44:31.588Z,Aborted,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-2d015c53ae7bb3acf87e04...,custom-4-20480,europe-west1-b,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",WaitingForValueStore,2023-04-13T15:44:33.469Z,2023-04-13T15:44:33.469Z
35021,1,PAPIv2-CloudNAT,cromwell-816dacf5-22db-477a-9e49-095c2f02e562,terra-fe58406b-9501-4168-846e-7cc97c60574c,downloaddicomandconvertandinferencetotalsegmen...,5.446280,default_value,2023-04-13T16:28:56.644Z,2023-04-13T15:44:31.588Z,Aborted,gs://fc-5af492dc-6993-4c91-bbf6-3e2747868642/s...,terra-fd442854,google-pipelines-worker-2d015c53ae7bb3acf87e04...,custom-4-20480,europe-west1-b,"{'bootDiskSizeGb': '10', 'continueOnReturnCode...",PreparingJob,2023-04-13T15:44:33.469Z,2023-04-13T15:44:33.884Z
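Each row of `final` is one Cromwell execution event (for example `waiting for quota` or `PreparingJob`) with its own `startTime` and `endTime`, so the time spent in each stage can be computed directly from the table. A minimal sketch using two events copied from the output above; the column names match `final`, while `event_durations` is a hypothetical helper.

```python
import pandas as pd


def event_durations(events):
    """Return a copy of an events table with a duration_s column (endTime - startTime, in seconds)."""
    out = events.copy()
    start = pd.to_datetime(out['startTime'], utc=True)
    end = pd.to_datetime(out['endTime'], utc=True)
    out['duration_s'] = (end - start).dt.total_seconds()
    return out


# Two events taken from the first rows of the table above.
events = pd.DataFrame({
    'wdl-task-name': ['papermill', 'papermill'],
    'description': ['waiting for quota', 'Complete in GCE / Cromwell Poll Interval'],
    'startTime': ['2023-01-21T16:00:41.399Z', '2023-01-21T16:59:48.051Z'],
    'endTime': ['2023-01-21T16:00:58.675Z', '2023-01-21T17:05:23.285Z'],
})
timed = event_durations(events)
print(timed[['description', 'duration_s']])

# Total seconds per task and stage, e.g. to see where time goes across the whole table.
print(timed.groupby(['wdl-task-name', 'description'])['duration_s'].sum())
```

Because the table repeats each attempt's fields once per event, attempt-level columns such as `start`, `end`, and `compressedDockerSize` should be deduplicated (for example with `drop_duplicates`) before they are summed.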


In [23]:
# index=False keeps the row index from being written as an extra unnamed column
final.to_csv('data.csv', index=False)
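By default `DataFrame.to_csv` also writes the row index as an unnamed first column, which is where an `Unnamed: 0` header comes from when such a file is read back with `pd.read_csv`. A small self-contained sketch of the two usual remedies, on a made-up two-row frame. To keep a copy in the workspace bucket, a notebook shell command such as `!gsutil cp data.csv {bucket}` can then copy the file there, since IPython expands `{bucket}` in `!` commands.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'attempt': [1, 1], 'executionStatus': ['Done', 'Aborted']})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, 'with_index.csv')
    without_index = os.path.join(tmp, 'without_index.csv')

    df.to_csv(with_index)                  # default: index written as an unnamed first column
    df.to_csv(without_index, index=False)  # only the real columns

    cols_with_index = list(pd.read_csv(with_index).columns)
    cols_without_index = list(pd.read_csv(without_index).columns)
    # Alternatively, keep the index in the file and tell read_csv which column it is.
    cols_index_col = list(pd.read_csv(with_index, index_col=0).columns)

print(cols_with_index)     # ['Unnamed: 0', 'attempt', 'executionStatus']
print(cols_without_index)  # ['attempt', 'executionStatus']
print(cols_index_col)      # ['attempt', 'executionStatus']
```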