option in depletion to revert tags in aligned files #751
The goal is to revert the bam to an unaligned state. The tags present in representative bams were listed via: samtools view input.bam | cut -f 12- | tr '\t' '\n' | cut -d ':' -f 1 | awk '{ if(!x[$1]++) { print }}'
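For readers who prefer Python, the shell pipeline above can be mirrored roughly as follows. This is a minimal sketch that works on SAM text lines (for example, the output of samtools view); the function name `unique_tags` is hypothetical and not part of this PR, and unlike the shell version it skips records that have no optional fields.

```python
def unique_tags(sam_lines):
    """Return optional-field tag names in first-seen order."""
    seen = set()
    ordered = []
    for line in sam_lines:
        # Header lines only appear when samtools view is run with -h.
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # The first 11 SAM columns are mandatory; tags start at column 12.
        for field in fields[11:]:
            tag = field.split(":", 1)[0]
            if tag and tag not in seen:
                seen.add(tag)
                ordered.append(tag)
    return ordered
```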
the DX web interface does not seem to render a UI element for Array[String]?
taxon_filter.py
help='When supplying an aligned input file, clear the per-read attribute tags')
parser.add_argument("--tagsToClear", type=str, nargs='+', dest="tags_to_clear", default=["XT", "X0", "X1", "XA",
"AM", "SM", "BQ", "CT", "XN", "OC", "OP"],
help='blastn chunk size (default: %(default)s)')
this argparse help string looks incorrect!
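One way the pair of options could look with a help string that describes the tag list rather than the blastn chunk size. This is a sketch only: the flag name `--clearTags`, the constant `DEFAULT_TAGS_TO_CLEAR`, the function `build_parser`, and the replacement help wording are all assumptions, since the diff shows only part of the first `add_argument` call.

```python
import argparse

# Default list copied from the diff under review.
DEFAULT_TAGS_TO_CLEAR = ["XT", "X0", "X1", "XA", "AM", "SM", "BQ", "CT", "XN", "OC", "OP"]


def build_parser():
    parser = argparse.ArgumentParser(description="tag-clearing options (illustrative)")
    # "--clearTags" is an assumed flag name; only its help text appears in the diff.
    parser.add_argument("--clearTags", dest="clear_tags", action="store_true", default=False,
                        help="When supplying an aligned input file, clear the per-read attribute tags")
    # Help wording here is a suggestion, not the text that was merged.
    parser.add_argument("--tagsToClear", type=str, nargs="+", dest="tags_to_clear",
                        default=DEFAULT_TAGS_TO_CLEAR,
                        help="Space-separated list of tags to clear (default: %(default)s)")
    return parser
```

With `nargs='+'`, tags given on the command line replace the default list rather than extend it.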
@@ -7,6 +7,8 @@ task deplete_taxa {
Array[File]? bmtaggerDbs # .tar.gz, .tgz, .tar.bz2, .tar.lz4, .fasta, or .fasta.gz
Array[File]? blastDbs # .tar.gz, .tgz, .tar.bz2, .tar.lz4, .fasta, or .fasta.gz
Int? query_chunk_size
Boolean? clear_tags = false
String? tags_to_clear_space_separated
Array[String]? perhaps? Then further below in the task command section you can use the ${sep=' ' tags_to_clear} WDL string join notation.
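To illustrate why the join notation fits the `nargs='+'` argument: the array is rendered as one space-separated string in the command, and argparse splits it back into a list. The sketch below imitates that round trip in Python; `render_sep` is a hypothetical stand-in for the WDL expansion, not a real WDL or dxWDL API, and the tag values are examples.

```python
import argparse
import shlex


def render_sep(values, sep=" "):
    """Imitate the ${sep=' ' tags_to_clear} expansion: join array elements with sep."""
    return sep.join(values)


# Hypothetical workflow input.
tags_to_clear = ["XT", "X0", "OC"]

# The fragment of the command line that the task would produce.
arg_string = "--tagsToClear " + render_sep(tags_to_clear)

parser = argparse.ArgumentParser()
parser.add_argument("--tagsToClear", type=str, nargs="+", dest="tags_to_clear")

# The shell splits on whitespace, so argparse receives one token per tag.
parsed = parser.parse_args(shlex.split(arg_string))
```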
Also, because of the behavior of this pair of parameters, maybe it would be clearer to the user of this workflow if tags_to_clear were actually set to a default list that matches the argparse default in python, otherwise it's not clear that leaving it unspecified actually implies a decent default list (or why this requires two parameters instead of just one).
I initially had it as Array[String], but DNAnexus rendered a simple text box in the optional settings, and the expected input format was not clear (comma-separated, or space-separated?). Setting a default makes sense though, thanks!
Ok, sounds like something to ask the dxWDL folks about later on.
I'll want to run this in DNAnexus prior to merging.
taxon_filter.py
@@ -371,6 +371,7 @@ def multi_db_deplete_bam(inBam, refDbs, deplete_method, outBam, **kwargs):
for db in refDbs:
if not samtools.isEmpty(tmpBamIn):
tmpBamOut = mkstempfname('.bam')
print("db", db)
What's this print line for?
Depletion calls RevertSam on already-aligned input. This adds an option to clear tags in the bam file as well, with a default tag list and an optional argument to explicitly specify the tags. These options are passed through to the WDL. Also fixes blastdb prefix finding for large multipart blastdbs; parallelizes mvicuna rmdup by library in response to #754.
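As a text-level illustration of what clearing tags means for a single record, the sketch below drops the named optional fields from one SAM line. It is not how this PR does it (the description above says depletion calls RevertSam); the function name `clear_tags_from_line` is hypothetical.

```python
def clear_tags_from_line(sam_line, tags_to_clear):
    """Return the SAM record with the listed optional tags removed."""
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1-11 are mandatory and are always kept.
    mandatory, optional = fields[:11], fields[11:]
    drop = set(tags_to_clear)
    # Each optional field has the form TAG:TYPE:VALUE.
    kept = [f for f in optional if f.split(":", 1)[0] not in drop]
    return "\t".join(mandatory + kept)
```

Tags not named in the list, such as the read group, pass through unchanged.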