In [1]:
# auto reloading of local scripts under dev
%load_ext autoreload
%autoreload 2

In [2]:
import os
import sys
import json

In [3]:
import pprint
pp = pprint.PrettyPrinter(indent=1, sort_dicts=False)

In [4]:
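# add the parent directory to sys.path so the local `src` package is importable
module_path = os.path.abspath('..')
if module_path not in sys.path:
    sys.path.append(module_path)

# import only the names this notebook uses instead of a wildcard import
from src.cromwell_utils import (
    CROMWELL_METADATA_MODEL_EXPLANATION,
    WorkflowMinimumDiagnosisMetadata,
)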

In [5]:
CROMWELL_METADATA_MODEL_EXPLANATION.print()

Here we explain the assumptions and the model used in interpreting the Cromwell metadata tree. We assume there exist three types of computing nodes:
  1) simple tasks, which are computing units defined in WDL with the keyword "task" and map to concrete computing instances;
  2) subworkflows (subWF), which are computing units defined in WDL with the keyword "workflow"; they do not map to concrete computing instances but delegate to other computing nodes;
  3) scatters, which are computing units signified in WDL with the keyword "scatter"; they do not map to concrete computing instances but have homogeneous shards. Each shard may be composed of several computing nodes.

We assume a simple task's JSON is representable by a list of (relatively) simple dicts. The length of the list is the number of attempts made for preemptible tasks. The JSON contains almost all of the following keys:
['attempt', 'backend', 'backendLabels', 'backendLogs', 'backendStatus', 'callCaching', 'callRoot', 'commandLine', 'compressedDoc
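
As a rough illustration of the list-of-attempts assumption above, the sketch below groups a hand-made `calls` dictionary by shard and reports how many attempts each shard took and how the last one ended. The sample data, the helper name `summarize_attempts`, the use of the `shardIndex` and `executionStatus` keys, and the convention that an unsharded call has `shardIndex` -1 are all assumptions made for illustration; this is not how `src.cromwell_utils` is implemented.

```python
from collections import defaultdict

# Hand-made stand-in for the "calls" section of a Cromwell metadata JSON.
# Every entry in a list is one attempt (assumed layout, for illustration only).
SAMPLE_CALLS = {
    "MyWorkflow.MyTask": [
        {"shardIndex": -1, "attempt": 1, "executionStatus": "RetryableFailure"},
        {"shardIndex": -1, "attempt": 2, "executionStatus": "Done"},
    ],
    "MyWorkflow.MyScatteredTask": [
        {"shardIndex": 0, "attempt": 1, "executionStatus": "Done"},
        {"shardIndex": 1, "attempt": 1, "executionStatus": "RetryableFailure"},
        {"shardIndex": 1, "attempt": 2, "executionStatus": "Failed"},
    ],
}


def summarize_attempts(calls):
    """Return {call_name: {shard_index: (n_attempts, final_status)}}."""
    summary = {}
    for call_name, entries in calls.items():
        # Collect all attempts that belong to the same shard.
        by_shard = defaultdict(list)
        for entry in entries:
            by_shard[entry["shardIndex"]].append(entry)
        summary[call_name] = {}
        for shard, attempts in by_shard.items():
            # The attempt with the highest number decides the final status.
            last = max(attempts, key=lambda e: e["attempt"])
            summary[call_name][shard] = (len(attempts), last["executionStatus"])
    return summary


for name, shards in summarize_attempts(SAMPLE_CALLS).items():
    for shard, (n_attempts, status) in sorted(shards.items()):
        label = "not sharded" if shard == -1 else f"shard {shard}"
        print(f"{name}: {label}, attempted {n_attempts} time(s), final status {status}")
```

Keying the summary by shard index keeps sharded and unsharded calls in one structure, which mirrors the "is not sharded" / "shard N" wording of the diagnosis output later in this notebook.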

## Build the tree

In [6]:
with open('/Users/shuang/Desktop/jonn.big.metadata.json', 'r') as ff:
    jonns_workflow = json.load(ff)

In [7]:
jonns_model = WorkflowMinimumDiagnosisMetadata(jonns_workflow)

In [8]:
len(jonns_model.tree)

44

In [9]:
jonns_model.tree[1]

<src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e6cb50>

In [10]:
jonns_model.tree[0]

{'t_22_MergeS2ECcsReclaimedArrayElementSubshards': [<src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e6cb20>,
  <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e6ce50>,
  <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e6cd00>,
  <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e6ccd0>,
  <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e6cac0>,
  <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e6ca60>,
  <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e6cee0>,
  <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e6cf40>,
  <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e6cfa0>,
  <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e6ce20>,
  <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e7d0a0>,
  <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e7d100>,
  <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e7d160>,
  <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x107e7d1c0>

In [11]:
# pick the top-level node that represents the scatter block 'ScatterAt296_18'
x = [e for e in jonns_model.tree if isinstance(e, dict) and 'ScatterAt296_18' in e]
print(f"len x: {len(x)} (expects 1).")
# the scatter's value is the list of its shards
y = list(x[0].values())[0]
print(f"len y: {len(y)} (expects 300).")
# each shard is a (callables, shard_index) tuple
print(f"type of an element in y: {type(y[1])} (tuple expected).")
s = y[1]
print(f"And its shard index is expected to be 1: {s[1]}")
# the first tuple element lists the callables inside the shard
z = s[0]
print(f"The shard is expected to hold a single callable, represented by a length-1 list: type {type(z)}, length {len(z)}")
u = z[0]
print(f"Type of the only callable in a shard {type(u)} (dict expected), its len {len(u)} (expects 1).")
# a callable is a one-entry dict: name -> list of task metadata objects
k = next(iter(u.keys()))
v = next(iter(u.values()))
print(f"the only callable in a shard is named {k} (expects t_21_SegmentS2ECcsReclaimedReads), its value is a {type(v)} (list expected), and len {len(v)} (expects 10).")

len x: 1 (expects 1).
len y: 300 (expects 300).
type of an element in y: <class 'tuple'> (tuple expected).
And its shard index is expected to be 1: 1
The shard is expected to hold a single callable, represented by a length-1 list: type <class 'list'>, length 1
Type of the only callable in a shard <class 'dict'> (dict expected), its len 1 (expects 1).
the only callable in a shard is named t_21_SegmentS2ECcsReclaimedReads (expects t_21_SegmentS2ECcsReclaimedReads), its value is a <class 'list'> (list expected), and len 10 (expects 10).
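
The nesting checked above (a dict keyed by name, holding a list of `(callables, shard_index)` tuples, each holding further dicts and lists) can be walked recursively. The sketch below counts leaf objects in a toy structure of the same shape. `FakeTask`, `count_leaves`, and `toy_tree` are hypothetical names introduced here; the real `tree` holds `TaskMinimalDiagnosisMetadata` objects, and the library may traverse it differently.

```python
class FakeTask:
    """Stand-in for a leaf object; NOT the real TaskMinimalDiagnosisMetadata."""

    def __init__(self, name):
        self.name = name


def count_leaves(node):
    """Count leaf objects in a nested list/dict/tuple structure."""
    if isinstance(node, dict):
        # name -> children: descend into every value
        return sum(count_leaves(children) for children in node.values())
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    if isinstance(node, tuple):
        # assumed shard layout: (list_of_callables, shard_index)
        callables, _shard_index = node
        return count_leaves(callables)
    return 1  # anything else is treated as a leaf


toy_tree = [
    FakeTask("a"),
    {"simple": [FakeTask("b0"), FakeTask("b1")]},
    {"ScatterAtX": [
        ([{"inner": [FakeTask("c00"), FakeTask("c01")]}], 0),
        ([{"inner": [FakeTask("c10"), FakeTask("c11")]}], 1),
    ]},
]

print(count_leaves(toy_tree))  # prints 7
```

With the real model, a call such as `count_leaves(jonns_model.tree)` would count every task object, including repeated attempts, so it need not match the "leaves" figure that `topology()` prints.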


In [12]:
x[0]

{'ScatterAt296_18': [([{'t_21_SegmentS2ECcsReclaimedReads': [<src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f3f0160>,
      <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f3f01f0>,
      <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f3f0250>,
      <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f3f02b0>,
      <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f3f0310>,
      <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f3f0370>,
      <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f3f03d0>,
      <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f3f0430>,
      <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f3f0490>,
      <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f3f04f0>]}],
   0),
  ([{'t_21_SegmentS2ECcsReclaimedReads': [<src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f3f0550>,
      <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f3f05b0>,
      <src.cromwell_utils.TaskMinim

## Present the tree

## Show topology in text form

In [13]:
jonns_model.topology()

Workflow:   MASseqStarcodeCbcParameterSweep

Workflow: MASseqStarcodeCbcParameterSweep, 23 leaves, 19 simple scatters, 0 subworkflows, 2 complex scatters

  Level-0 simple scatter: MASseqStarcodeCbcParameterSweep.t_07_RemoveKineticsTags, 300 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

  Level-0 simple scatter: MASseqStarcodeCbcParameterSweep.t_08_FindCCSReport, 300 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

  Level-0 simple scatter: MASseqStarcodeCbcParameterSweep.t_09_FilterS2EByMinReadQuality, 300 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

  Level-0 simple scatter: MASseqStarcodeCbcParameterSweep.t_10_GetS2ERCcsRejectedReads, 300 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

  Level-0 simple scatter: MASseqStarcodeCbcParameterSweep.t_11_ExtractS2ECcsReclaimableReads, 300 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

  Level-0 simple scatter: MASseqStarc

## Show diagnosis on failed tasks

In [14]:
jonns_model.diagnose(show_success_too=False)

Workflow:   MASseqStarcodeCbcParameterSweep
            (uuid: b33df17f-59de-4141-8175-d58e7851892d)
Status:     Failed
Wall-Clock: 8:59


Diagnosis

    MASseqStarcodeCbcParameterSweep.t_33_CorrectBarcodesWithStarcodeSeedCounts.t_33_CorrectBarcodesWithStarcodeSeedCounts has 600 shards,
      shard 0 was attempted 2 times, ultimately failed. PAPI codes for all attempts in order: []. Last attempt log file: gs://broad-dsp-lrma-cromwell/MASseqStarcodeCbcParameterSweep/b33df17f-59de-4141-8175-d58e7851892d/call-t_33_CorrectBarcodesWithStarcodeSeedCounts/shard-0/attempt-2/t_33_CorrectBarcodesWithStarcodeSeedCounts-0.log
      shard 1 was attempted 2 times, ultimately failed. PAPI codes for all attempts in order: []. Last attempt log file: gs://broad-dsp-lrma-cromwell/MASseqStarcodeCbcParameterSweep/b33df17f-59de-4141-8175-d58e7851892d/call-t_33_CorrectBarcodesWithStarcodeSeedCounts/shard-1/attempt-2/t_33_CorrectBarcodesWithStarcodeSeedCounts-1.log
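
The log locations in the diagnosis follow a visible pattern: bucket, workflow name, workflow uuid, `call-<name>`, then optional `shard-<n>` and `attempt-<n>` directories. The sketch below splits such a path into its parts with a regular expression. The pattern and the helper `parse_log_path` are assumptions derived only from the paths printed in this notebook; they are not part of `src.cromwell_utils`, and paths nested under subworkflows would likely need a different pattern.

```python
import re

# Pattern inferred from the log paths shown in the diagnosis output (assumption).
LOG_PATH_PATTERN = re.compile(
    r"^gs://(?P<bucket>[^/]+)/"
    r"(?P<workflow>[^/]+)/"
    r"(?P<uuid>[0-9a-f-]{36})/"
    r"call-(?P<call>[^/]+)/"
    r"(?:shard-(?P<shard>\d+)/)?"      # absent for unsharded calls
    r"(?:attempt-(?P<attempt>\d+)/)?"  # treated as optional here
    r"(?P<log>[^/]+\.log)$"
)


def parse_log_path(path):
    """Split a log path into named parts, or return None if it does not match."""
    match = LOG_PATH_PATTERN.match(path)
    if match is None:
        return None
    parts = match.groupdict()
    # Convert the numeric fields when they are present.
    for key in ("shard", "attempt"):
        if parts[key] is not None:
            parts[key] = int(parts[key])
    return parts


EXAMPLE = (
    "gs://broad-dsp-lrma-cromwell/MASseqStarcodeCbcParameterSweep/"
    "b33df17f-59de-4141-8175-d58e7851892d/"
    "call-t_33_CorrectBarcodesWithStarcodeSeedCounts/shard-0/attempt-2/"
    "t_33_CorrectBarcodesWithStarcodeSeedCounts-0.log"
)

parts = parse_log_path(EXAMPLE)
print(parts["call"], parts["shard"], parts["attempt"])
```

Returning `None` for non-matching input lets a caller skip unexpected paths instead of failing partway through a long diagnosis.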
      

# My model

In [15]:
with open('/Users/shuang/Desktop/withshards.big.workflow.json', 'r') as ff:
    my_workflow = json.load(ff)

In [16]:
my_model = WorkflowMinimumDiagnosisMetadata(my_workflow)

In [17]:
my_model.tree

[{'CallVariants': [<src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f9aea00>,
   <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f9ae130>,
   {'DVP': [{'PEPPER': [<src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f9ae460>,
       <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f9aee80>,
       <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f9ae400>,
       <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f9ae250>,
       <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f9aea30>,
       <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f9ae820>,
       <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f9ae160>,
       <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f9ae670>,
       <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f9ae850>,
       <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f9ae7f0>,
       <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11f9ae430>,
       <src.cromwell_utils

In [18]:
my_model.topology()

Workflow:   PBCCSWholeGenome

Workflow: PBCCSWholeGenome, 10 leaves, 0 simple scatters, 1 subworkflows, 0 complex scatters

  Level-0 subworkflow: PBCCSWholeGenome.CallVariants, 3 leaves, 4 simple scatters, 1 subworkflows, 0 complex scatters

    Level-1 simple scatter: PBCCSWholeGenome.CallVariants.Call, 25 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

    Level-1 simple scatter: PBCCSWholeGenome.CallVariants.Discover, 25 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

    Level-1 simple scatter: PBCCSWholeGenome.CallVariants.Sniffles, 25 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

    Level-1 simple scatter: PBCCSWholeGenome.CallVariants.SubsetBam, 25 shard-attempts, 0 simple scatters, 0 subworkflows, 0 complex scatters

    Level-1 subworkflow: PBCCSWholeGenome.CallVariants.DVP, 2 leaves, 3 simple scatters, 0 subworkflows, 0 complex scatters

      Level-2 simple scatter: PBCCSWholeGenome.CallVariants.DVP.DV,

In [19]:
my_model.diagnose(show_success_too=True)

Workflow:   PBCCSWholeGenome
            (uuid: 18378026-20e0-40a2-a824-341e7db9017c)
Status:     Succeeded
Wall-Clock: 6:44


Diagnosis

  PBCCSWholeGenome.FinalizeAlignedBai is not sharded, was attempted 1 times, ultimately succeeded.
  PBCCSWholeGenome.FinalizeAlignedBam is not sharded, was attempted 1 times, ultimately succeeded.
  PBCCSWholeGenome.FinalizeAlignedPbi is not sharded, was attempted 1 times, ultimately succeeded.
  PBCCSWholeGenome.FinalizeDVPEPPERGVcf is not sharded, was attempted 1 times, ultimately succeeded.
  PBCCSWholeGenome.FinalizeDVPEPPERTbi is not sharded, was attempted 1 times, ultimately succeeded.
  PBCCSWholeGenome.FinalizeDVPEPPERVcf is not sharded, was attempted 1 times, ultimately succeeded.
  PBCCSWholeGenome.FinalizePBSV is not sharded, was attempted 1 times, ultimately succeeded.
  PBCCSWholeGenome.FinalizeSniffles is not sharded, was attempted 1 times, ultimately 

In [20]:
with open('/Users/shuang/Desktop/failed.big.workflow.json', 'r') as ff:
    my_2nd_workflow = json.load(ff)

In [21]:
my_2nd_model = WorkflowMinimumDiagnosisMetadata(my_2nd_workflow)

In [22]:
my_2nd_model.tree

[<src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11fd35970>,
 <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11fd35850>,
 {'CallAssemblyVariants': [<src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11fd358b0>,
   <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11fd35a30>]},
 {'FinalizeHifiasmAlternateFa': [<src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11fd35a00>,
   <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11fd35580>,
   <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11fd354c0>]},
 {'FinalizeHifiasmAlternateGfa': [<src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11fd35460>,
   <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11fd35040>,
   <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11fd35100>]},
 {'Hifiasm': [<src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11fd35220>,
   <src.cromwell_utils.TaskMinimalDiagnosisMetadata at 0x11fd35910>]},
 {'FinalizeHifiasmPrimaryGfa': [<src.cromwell_utils.TaskMinimalDiagnosis

In [23]:
my_2nd_model.topology()

Workflow:   PBAssembleWithHifiasm

Workflow: PBAssembleWithHifiasm, 15 leaves, 0 simple scatters, 2 subworkflows, 0 complex scatters

  Level-0 subworkflow: PBAssembleWithHifiasm.CallAssemblyVariants, 2 leaves, 0 simple scatters, 0 subworkflows, 0 complex scatters

  Level-0 subworkflow: PBAssembleWithHifiasm.Hifiasm, 2 leaves, 0 simple scatters, 0 subworkflows, 0 complex scatters



In [24]:
my_2nd_model.diagnose(show_success_too=False)

Workflow:   PBAssembleWithHifiasm
            (uuid: 86585cc9-1a3d-46cc-a616-8aed202a7acd)
Status:     Failed
Wall-Clock: 2:28


Diagnosis

  PBAssembleWithHifiasm.FinalizeHifiasmAlternateFa is not sharded, was attempted 3 times, ultimately failed. PAPI codes for all attempts in order: []. Last attempt log file: gs://broad-dsp-lrma-cromwell/PBAssembleWithHifiasm/86585cc9-1a3d-46cc-a616-8aed202a7acd/call-FinalizeHifiasmAlternateFa/attempt-3/FinalizeHifiasmAlternateFa.log
  PBAssembleWithHifiasm.FinalizeHifiasmAlternateGfa is not sharded, was attempted 3 times, ultimately failed. PAPI codes for all attempts in order: []. Last attempt log file: gs://broad-dsp-lrma-cromwell/PBAssembleWithHifiasm/86585cc9-1a3d-46cc-a616-8aed202a7acd/call-FinalizeHifiasmAlternateGfa/attempt-3/FinalizeHifiasmAlternateGfa.log
  PBAssembleWithHifiasm.FinalizeHifiasmPrimaryFa is not sharded, was attempted 3 times, ultimately failed. PAPI codes for all attem