WDLs for trivial bam manipulation #82

Merged
merged 1 commit into from Jan 21, 2017

Conversation

Projects
None yet
2 participants
Collaborator

vdauwera commented Jan 8, 2017

  • Grab headers from a list of SAM/BAMs (with caveat and a better way to do it on GCP with gsutil)
  • Validate a list of SAM/BAMs in SUMMARY mode
Collaborator

vdauwera commented Jan 8, 2017

Note that this builds on #81

Will run full tests before requesting review.

Collaborator

vdauwera commented Jan 8, 2017 edited

  • Test ExtractSamHeadersWf: fa16e964-ef39-4cbe-ba10-f104fc81c538
  • Test ValidateBamsWf: 0fbe8061-a1e9-4684-9d83-3b057badaebc
Collaborator

vdauwera commented Jan 8, 2017

Both tests pass; this is ready to review.

+
+ # Outputs that will be retained when execution is complete
+ output {
+ ExtractSAMHeader.*
@cjllanwarne

cjllanwarne Jan 13, 2017

Contributor

The new style syntax would be File output_bam = ExtractSAMHeader.output_bam. That'd make it much easier to compose this WDL into other WDLs in the future

@vdauwera

vdauwera Jan 13, 2017

Collaborator

Oh interesting. What does this "new style" hang on? Is it a purely stylistic recommendation or an actual engine support thing? If so, at what version of Cromwell does this kick in?

+
+ # Outputs that will be retained when execution is complete
+ output {
+ ValidateBAM.*
@cjllanwarne

cjllanwarne Jan 13, 2017

Contributor

Again, I think our samples should use the new style syntax as much as possible

Collaborator

vdauwera commented Jan 15, 2017

@cjllanwarne I updated the output syntax as you recommended, but I'm getting this validation error (with wdltool 0.9):

ERROR: validation_report is declared as a File but the expression evaluates to a Array[String]:

    File validation_report = ValidateBAM.validation_report

The task output is defined as File validation_report = "${output_basename}.txt"

Is it because the new syntax is not yet supported by wdltool, or did I misunderstand your recommendation?

Collaborator

vdauwera commented Jan 15, 2017 edited

And while we're on this code, I have a question re: using the value of optional args in output filenames.

I have an optional arg declared as String ? validation_mode, then I use it in my command as MODE=${default="SUMMARY" validation_mode}. Is there any way to use the value of validation_mode in the output filename, so that it will be e.g. myFoo.SUMMARY.txt if I don't set a mode, but myFOO.BAR.txt if I set ValidateBAM.validation_mode: "BAR" in the inputs.json?


UPDATE 1: I tried

command {
    ...
    OUTPUT=${output_basename}.${default='SUMMARY' validation_mode}.txt \
    ...
}
...
output {
    File validation_report = "${output_basename}.${default='SUMMARY' validation_mode}.txt"
  }

and that's not throwing up any validation errors, so I will run a test and see if it works. If it does we should document that this can be done, @katevoss and @knoblett.


UPDATE 2: The above didn't seem to work in a real test. Although the job seems to proceed normally and there is no error anywhere (rc is 0), the outputs never materialize. Going back to the boring way of naming my output file fixes it. So either the two expressions I used above are not equivalent once they're evaluated, or there's something more subtle that's breaking here.

Contributor

cjllanwarne commented Jan 17, 2017

@vdauwera

(1) Oops, I didn't see the scatter! Cromwell is correct in telling me I got this wrong. Because it's coming from a call inside a scatter the output is indeed an Array[File]. e.g. Array[File] validation_reports = ValidateBAM.validation_report

(2) That looks like it should work, so might be a bug in Cromwell. You could maybe try doing it with an upfront input declaration instead of the two interpolations later on? eg:

String? validation_mode
String validation_mode_text = "${default="SUMMARY"validation_mode}"

and then use validation_mode_text instead of the default stuff later on?

Collaborator

vdauwera commented Jan 18, 2017

Aah, you illuminate my path like the North Star, @cjllanwarne. Thank you!

Collaborator

vdauwera commented Jan 18, 2017

Hmm. I updated the relevant lines and now my WDLs pass validation locally, but the Cromwell in the dsde-methods SwaggerUI is choking with this error:

{
  "status": "fail",
  "message": "Workflow input processing failed.",
  "errors": [
    "Unable to load namespace from workflow: Unrecognized token on line 77, column 10:\n\n    Array[File] output_headers = ExtractSAMHeader.output_header\n         ^"
  ]
}

With reformatting it looks like this (approximate caret position):

Array[File] output_headers = ExtractSAMHeader.output_header
         ^

Any idea what's wrong? Could it be a Cromwell version problem? I think this Swagger is running C19.

Contributor

cjllanwarne commented Jan 18, 2017

Oh no! Cromwell 19 is not going to support new style output format. I think that's C23+

Contributor

cjllanwarne commented Jan 18, 2017

FWIW that part 2 might also be a C19 bug that's fixed later.

Also as a side-note, I wouldn't be tooo surprised if the space between String ? is not liked in later versions so I'd stick to String? if possible

Collaborator

vdauwera commented Jan 20, 2017

OK, the new style outputs are working correctly on the updated swagger-cromwell-24.

However the optional/default input declaration for the validation mode has not been working from me. I decided for now to just require the validation_mode to be specified explicitly as an input, since it's a very reasonable requirement. But I would be curious to see if the other way could be made to work.

Collaborator

vdauwera commented Jan 20, 2017

Oh and is it fair to say

Note that you may also see the ? modifier used next to the Type with a space, like this: Type ? variableName. Both formats are currently allowed (up to Cromwell v24), but the extra space may be disallowed in future versions.

Collaborator

vdauwera commented Jan 20, 2017

vdauwera self-assigned this Jan 20, 2017

@vdauwera vdauwera WDLs for trivial BAM manipulations
    - WDL to extract the SAM headers from a list of BAMs
    - WDL to validate a list of SAM/BAM files
dc8d09d

@vdauwera vdauwera merged commit 9f7bfa8 into develop Jan 21, 2017

1 check was pending

code-review/pullapprove Approval required by 2 of: cjllanwarne, geoffjentry, Horneth, kshakir, mcovarr
Details

vdauwera deleted the gvda_trivial_bam_manipulation branch Jan 21, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment