In [1]:
import os
import sys
import json

In [2]:
import pprint
pp = pprint.PrettyPrinter(indent=1, sort_dicts=False)

In [3]:
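# Make the parent directory importable so the local library can be loaded
module_path = os.path.abspath('..')
if module_path not in sys.path:
    sys.path.append(module_path)

# Import only the names this notebook uses
from src.cromwell.utils import (
    CROMWELL_METADATA_MODEL_EXPLANATION,
    WorkflowMinimumDiagnosisMetadata,
)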

In [4]:
CROMWELL_METADATA_MODEL_EXPLANATION.print()

Here we explain the assumptions and the model used in interpreting the Cromwell metadata tree. We assume there are three types of computing nodes:
  1) simple tasks, which are computing units defined in WDL with the keyword "task" and which map to concrete computing instances;
  2) subworkflows (subWF), which are computing units defined in WDL with the keyword "workflow" and which do not map to concrete computing instances but delegate to other computing nodes;
  3) scatters, which are computing units signified in WDL with the keyword "scatter" and which do not map to concrete computing instances but have homogeneous shards. Each shard may be composed of several computing nodes.

We assume a simple task's JSON is representable by a list of (relatively) simple dicts. For preemptible tasks, the length of the list is the number of attempts made. The JSON contains almost all of the following keys:
['attempt', 'backend', 'backendLabels', 'backendLogs', 'backendStatus', 'callCaching', 'callRoot', 'commandLine', 'compressedDoc
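
The three node types described above can be told apart roughly as follows. This is only a minimal sketch, not the implementation in `src.cromwell.utils`. It assumes each value in a metadata `calls` dict is a list of attempt dicts carrying a `shardIndex` key (with -1 meaning "not scattered") and, for subworkflows, a `subWorkflowMetadata` key. The names `classify_call`, `attempts_per_shard`, and `toy_calls` are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def classify_call(entries):
    """Classify one value of a metadata 'calls' dict (a list of dicts).

    Assumption: 'shardIndex' >= 0 marks a scatter shard, and the presence
    of 'subWorkflowMetadata' marks a subworkflow.
    """
    if any(e.get("shardIndex", -1) >= 0 for e in entries):
        return "scatter"
    if any("subWorkflowMetadata" in e for e in entries):
        return "subworkflow"
    return "simple task"


def attempts_per_shard(entries):
    """Count how many list entries (attempts) exist for each shard index."""
    counts = defaultdict(int)
    for e in entries:
        counts[e.get("shardIndex", -1)] += 1
    return dict(counts)


# Toy data, invented for illustration only
toy_calls = {
    "WF.TaskA": [
        {"shardIndex": -1, "attempt": 1},
        {"shardIndex": -1, "attempt": 2},
    ],
    "WF.Sub": [
        {"shardIndex": -1, "attempt": 1, "subWorkflowMetadata": {"calls": {}}},
    ],
    "WF.Scattered": [
        {"shardIndex": 0, "attempt": 1},
        {"shardIndex": 1, "attempt": 1},
        {"shardIndex": 1, "attempt": 2},
    ],
}

kinds = {name: classify_call(entries) for name, entries in toy_calls.items()}
scattered_attempts = attempts_per_shard(toy_calls["WF.Scattered"])
```

For a simple task, the list length is the attempt count, so `len(toy_calls["WF.TaskA"])` is 2 here.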

## Data

In [5]:
with open('/Users/shuang/Desktop/jonn.big.metadata.json', 'r') as ff:
    jonns_workflow = json.load(ff)

## Build the tree

In [6]:
jonns_model = WorkflowMinimumDiagnosisMetadata(jonns_workflow)

## Present the tree

## Show topology in text form

In [7]:
jonns_model.topology()

Workflow:   MASseqStarcodeCbcParameterSweep

Workflow: MASseqStarcodeCbcParameterSweep, 23 leaves, 19 simple scatters, 0 subworkflows, 2 complex scatters

  Level-0 simple scatter: MASseqStarcodeCbcParameterSweep.t_07_RemoveKineticsTags, 300 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

  Level-0 simple scatter: MASseqStarcodeCbcParameterSweep.t_08_FindCCSReport, 300 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

  Level-0 simple scatter: MASseqStarcodeCbcParameterSweep.t_09_FilterS2EByMinReadQuality, 300 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

  Level-0 simple scatter: MASseqStarcodeCbcParameterSweep.t_10_GetS2ERCcsRejectedReads, 300 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

  Level-0 simple scatter: MASseqStarcodeCbcParameterSweep.t_11_ExtractS2ECcsReclaimableReads, 300 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

  Level-0 simple scatter: MASseqStarc

## Show diagnosis on failed tasks

In [8]:
jonns_model.diagnose(show_success_too=False)

Workflow:   MASseqStarcodeCbcParameterSweep
            (uuid: b33df17f-59de-4141-8175-d58e7851892d)
Status:     Failed
Wall-Clock: 8:59


Diagnosis

    MASseqStarcodeCbcParameterSweep.t_33_CorrectBarcodesWithStarcodeSeedCounts.t_33_CorrectBarcodesWithStarcodeSeedCounts has 600 shards,
      shard 0 was attempted 2 times, ultimately failed. PAPI codes for all attempts in order: []. Last attempt log file: gs://broad-dsp-lrma-cromwell/MASseqStarcodeCbcParameterSweep/b33df17f-59de-4141-8175-d58e7851892d/call-t_33_CorrectBarcodesWithStarcodeSeedCounts/shard-0/attempt-2/t_33_CorrectBarcodesWithStarcodeSeedCounts-0.log
      shard 1 was attempted 2 times, ultimately failed. PAPI codes for all attempts in order: []. Last attempt log file: gs://broad-dsp-lrma-cromwell/MASseqStarcodeCbcParameterSweep/b33df17f-59de-4141-8175-d58e7851892d/call-t_33_CorrectBarcodesWithStarcodeSeedCounts/shard-1/attempt-2/t_33_CorrectBarcodesWithStarcodeSeedCounts-1.log
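
A per-shard summary like the one in this diagnosis could be derived along the following lines. This is a hedged sketch rather than what `diagnose` actually does: it assumes each attempt dict carries `shardIndex`, `attempt`, and `executionStatus` keys, and the names `summarize_attempts` and `toy_entries` are hypothetical.

```python
from collections import defaultdict


def summarize_attempts(entries):
    """Map each shard index to (number of attempts, status of last attempt)."""
    by_shard = defaultdict(list)
    for e in entries:
        by_shard[e.get("shardIndex", -1)].append(e)

    summary = {}
    for shard, attempts in sorted(by_shard.items()):
        # The attempt with the highest 'attempt' number decides the outcome
        last = max(attempts, key=lambda e: e.get("attempt", 1))
        summary[shard] = (len(attempts), last.get("executionStatus"))
    return summary


# Toy data, invented for illustration only
toy_entries = [
    {"shardIndex": 0, "attempt": 1, "executionStatus": "RetryableFailure"},
    {"shardIndex": 0, "attempt": 2, "executionStatus": "Failed"},
    {"shardIndex": 1, "attempt": 1, "executionStatus": "Done"},
]

summary = summarize_attempts(toy_entries)
for shard, (n_attempts, status) in summary.items():
    print(f"shard {shard}: {n_attempts} attempt(s), last status {status}")
```

Filtering `summary` to entries whose last status is `"Failed"` would mimic the effect of `show_success_too=False`.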
      

# A different type: my model

## My first scenario

In [9]:
with open('/Users/shuang/Desktop/withshards.big.workflow.json', 'r') as ff:
    my_workflow = json.load(ff)
my_model = WorkflowMinimumDiagnosisMetadata(my_workflow)

In [10]:
my_model.topology()

Workflow:   PBCCSWholeGenome

Workflow: PBCCSWholeGenome, 10 leaves, 0 simple scatters, 1 subworkflows, 0 complex scatters

  Level-0 subworkflow: PBCCSWholeGenome.CallVariants, 3 leaves, 4 simple scatters, 1 subworkflows, 0 complex scatters

    Level-1 simple scatter: PBCCSWholeGenome.CallVariants.Call, 25 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

    Level-1 simple scatter: PBCCSWholeGenome.CallVariants.Discover, 25 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

    Level-1 simple scatter: PBCCSWholeGenome.CallVariants.Sniffles, 25 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

    Level-1 simple scatter: PBCCSWholeGenome.CallVariants.SubsetBam, 25 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

    Level-1 subworkflow: PBCCSWholeGenome.CallVariants.DVP, 2 leaves, 3 simple scatters, 0 subworkflows, 0 complex scatters

      Level-2 simple scatter: PBCCSWholeGenome.CallVariants.DVP.DV,
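
The indented, level-tagged layout of this listing can be produced by a short recursive walk. The sketch below is an illustration only and does not reflect how `topology()` is implemented. It assumes a nested dict with hypothetical `kind`, `name`, and `children` keys, and `render_topology` and `toy_tree` are made-up names.

```python
def render_topology(node, level=0):
    """Return indented text lines for the descendants of a nested node dict."""
    lines = []
    for child in node.get("children", []):
        # Two spaces of indentation per nesting level, starting at level 0
        indent = "  " * (level + 1)
        lines.append(f"{indent}Level-{level} {child['kind']}: {child['name']}")
        lines.extend(render_topology(child, level + 1))
    return lines


# Toy tree, invented for illustration only
toy_tree = {
    "name": "WF",
    "children": [
        {
            "kind": "subworkflow",
            "name": "WF.Sub",
            "children": [
                {"kind": "simple scatter", "name": "WF.Sub.Call", "children": []},
            ],
        },
    ],
}

lines = render_topology(toy_tree)
print("\n".join(lines))
```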

In [11]:
my_model.diagnose(show_success_too=True)

Workflow:   PBCCSWholeGenome
            (uuid: 18378026-20e0-40a2-a824-341e7db9017c)
Status:     Succeeded
Wall-Clock: 6:44


Diagnosis

  PBCCSWholeGenome.FinalizeAlignedBai is not sharded, was attempted 1 times, ultimately succeeded.
  PBCCSWholeGenome.FinalizeAlignedBam is not sharded, was attempted 1 times, ultimately succeeded.
  PBCCSWholeGenome.FinalizeAlignedPbi is not sharded, was attempted 1 times, ultimately succeeded.
  PBCCSWholeGenome.FinalizeDVPEPPERGVcf is not sharded, was attempted 1 times, ultimately succeeded.
  PBCCSWholeGenome.FinalizeDVPEPPERTbi is not sharded, was attempted 1 times, ultimately succeeded.
  PBCCSWholeGenome.FinalizeDVPEPPERVcf is not sharded, was attempted 1 times, ultimately succeeded.
  PBCCSWholeGenome.FinalizePBSV is not sharded, was attempted 1 times, ultimately succeeded.
  PBCCSWholeGenome.FinalizeSniffles is not sharded, was attempted 1 times, ultimately 

## My second scenario

In [12]:
with open('/Users/shuang/Desktop/failed.big.workflow.json', 'r') as ff:
    my_2nd_workflow = json.load(ff)

In [13]:
my_2nd_model = WorkflowMinimumDiagnosisMetadata(my_2nd_workflow)

In [14]:
my_2nd_model.topology()

Workflow:   PBAssembleWithHifiasm

Workflow: PBAssembleWithHifiasm, 15 leaves, 0 simple scatters, 2 subworkflows, 0 complex scatters

  Level-0 subworkflow: PBAssembleWithHifiasm.CallAssemblyVariants, 2 leaves, 0 simple scatters, 0 subworkflows, 0 complex scatters

  Level-0 subworkflow: PBAssembleWithHifiasm.Hifiasm, 2 leaves, 0 simple scatters, 0 subworkflows, 0 complex scatters



In [15]:
my_2nd_model.diagnose(show_success_too=False)

Workflow:   PBAssembleWithHifiasm
            (uuid: 86585cc9-1a3d-46cc-a616-8aed202a7acd)
Status:     Failed
Wall-Clock: 2:28


Diagnosis

  PBAssembleWithHifiasm.FinalizeHifiasmAlternateFa is not sharded, was attempted 3 times, ultimately failed. PAPI codes for all attempts in order: []. Last attempt log file: gs://broad-dsp-lrma-cromwell/PBAssembleWithHifiasm/86585cc9-1a3d-46cc-a616-8aed202a7acd/call-FinalizeHifiasmAlternateFa/attempt-3/FinalizeHifiasmAlternateFa.log
  PBAssembleWithHifiasm.FinalizeHifiasmAlternateGfa is not sharded, was attempted 3 times, ultimately failed. PAPI codes for all attempts in order: []. Last attempt log file: gs://broad-dsp-lrma-cromwell/PBAssembleWithHifiasm/86585cc9-1a3d-46cc-a616-8aed202a7acd/call-FinalizeHifiasmAlternateGfa/attempt-3/FinalizeHifiasmAlternateGfa.log
  PBAssembleWithHifiasm.FinalizeHifiasmPrimaryFa is not sharded, was attempted 3 times, ultimately failed. PAPI codes for all attem