Standardize wdl description #392
Changes from all commits
f1a7620
0e4c2e7
09bb301
0081ec7
74698cf
87ad08a
eb9273f
e1dfd97
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,5 @@ | ||
version 1.0 | ||
|
||
########################################################################################## | ||
## A workflow that performs CCS correction on PacBio HiFi reads from a single flow cell. | ||
## The workflow shards the subreads into clusters and performs CCS in parallel on each cluster. | ||
## Ultimately, all the corrected reads (and uncorrected) are gathered into a single BAM. | ||
## Various metrics are produced along the way. | ||
########################################################################################## | ||
|
||
import "../../../tasks/Utility/PBUtils.wdl" as PB | ||
import "../../../tasks/Alignment/AlignReads.wdl" as AR | ||
import "../../../tasks/Utility/Utils.wdl" as Utils | ||
|
@@ -20,6 +13,30 @@ import "../../../tasks/Transcriptomics/MASSeq.wdl" as MAS | |
import "../../../tasks/Utility/JupyterNotebooks.wdl" as JUPYTER | ||
|
||
workflow PBFlowcell { | ||
|
||
meta { | ||
@jonn-smith can you please add your description of the MASseq part?
For the genome side, here's my proposed description:
Long term, we should break this workflow apart and update it to match Revio outputs, since Revio is expected to be the main instrument down the road.
@SHuang-Broad how's this: For MAS-seq transcriptome data, this workflow will determine the most likely MAS-seq model, then it will use that model to annotate, segment, and filter the CCS reads. These CCS reads will then be aligned to the reference in transcriptome alignment mode. Note: Currently the MAS-seq workflow separates CLR reads, but does not process them.
||
description: "The workflow performs the alignment of an SMRT cell's worth of data to a reference. For genomic sequencing data, the workflow also optionally performs CCS correction if the data is from a CCS library but did not get corrected on-instrument. For MAS-seq transcriptome data, this workflow will determine the most likely MAS-seq model, then it will use that model to annotate, segment, and filter the CCS reads. These CCS reads will then be aligned to the reference in transcriptome alignment mode. Note: Currently the MAS-seq workflow separates CLR reads, but does not process them." | ||
} | ||
parameter_meta { | ||
bam: "GCS path to raw subread bam" | ||
ccs_report_txt: "GCS path to CCS report txt, required if on-instrument corrected, otherwise CCS is run in this workflow for CCS libraries" | ||
pbi: "GCS path to pbi index for raw subread bam" | ||
ref_map_file: "table indicating reference sequence and auxiliary file locations" | ||
|
||
SM: "the value to place in the BAM read group's SM field" | ||
LB: "the value to place in the BAM read group's LB (library) field" | ||
|
||
num_shards: "number of shards into which fastq files should be batched" | ||
experiment_type: "type of experiment run (CLR, CCS, ISOSEQ, MASSEQ)" | ||
dir_prefix: "directory prefix for output files" | ||
|
||
mas_seq_model: "Longbow model to use for MAS-seq data." | ||
|
||
DEBUG_MODE: "[default valued] enables debugging tasks / subworkflows (default: false)" | ||
|
||
gcs_out_root_dir: "GCS bucket to store the reads, variants, and metrics files" | ||
} | ||
|
||
input { | ||
File bam | ||
File pbi | ||
|
@@ -45,26 +62,6 @@ workflow PBFlowcell { | |
Boolean DEBUG_MODE = false | ||
} | ||
|
||
parameter_meta { | ||
bam: "GCS path to raw subread bam" | ||
ccs_report_txt: "GCS path to CCS report txt, required if on-instrument corrected, otherwise CCS is run in this workflow for CCS libraries" | ||
pbi: "GCS path to pbi index for raw subread bam" | ||
ref_map_file: "table indicating reference sequence and auxillary file locations" | ||
|
||
SM: "the value to place in the BAM read group's SM field" | ||
LB: "the value to place in the BAM read group's LB (library) field" | ||
|
||
num_shards: "number of shards into which fastq files should be batched" | ||
experiment_type: "type of experiment run (CLR, CCS, ISOSEQ, MASSEQ)" | ||
dir_prefix: "directory prefix for output files" | ||
|
||
mas_seq_model: "Longbow model to use for MAS-seq data." | ||
|
||
DEBUG_MODE: "[default valued] enables debugging tasks / subworkflows (default: false)" | ||
|
||
gcs_out_root_dir: "GCS bucket to store the reads, variants, and metrics files" | ||
} | ||
|
||
# Call our timestamp so we can store outputs without clobbering previous runs: | ||
call Utils.GetCurrentTimestampString as WdlExecutionStartTimestamp { input: } | ||
|
||
|
Thank you!
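For reference, the layout this diff moves toward (a `meta` block with a `description`, then `parameter_meta`, then `input`) can be sketched as a minimal WDL 1.0 workflow. This is only an illustrative skeleton: the workflow name `ExampleFlowcell`, the placeholder description text, and the pass-through output are hypothetical and not part of the repository. The parameter descriptions are copied from the diff above.

```wdl
version 1.0

# Hypothetical skeleton illustrating the section order used in this PR;
# it is not the actual PBFlowcell workflow.
workflow ExampleFlowcell {

    meta {
        # Placeholder text: a real workflow would summarize what it does here.
        description: "One-paragraph summary of what the workflow does."
    }
    parameter_meta {
        bam: "GCS path to raw subread bam"
        pbi: "GCS path to pbi index for raw subread bam"
        DEBUG_MODE: "[default valued] enables debugging tasks / subworkflows (default: false)"
    }

    input {
        File bam
        File pbi

        Boolean DEBUG_MODE = false
    }

    # Hypothetical pass-through output so the skeleton does something on its own.
    output {
        File input_bam = bam
    }
}
```

Placing `meta` and `parameter_meta` ahead of `input` puts the human-readable description at the top of the workflow body, which is where the header comment removed in this diff used to live.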