option in depletion to revert tags in aligned files #751

Merged: 13 commits into master from ct-depletion-revert-read-tags, Jan 23, 2018


@tomkinsc tomkinsc commented Jan 8, 2018

Depletion calls RevertSam on already-aligned input. This adds an option to clear tags in the BAM file as well, with a default tag list and an optional argument to specify the tags explicitly. These options are passed through to the WDL. Also fixes blastdb prefix finding for large multipart blastdbs, and parallelizes mvicuna rmdup by library in response to #754.
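
To make "clearing tags" concrete, here is a minimal sketch of what removing per-read attribute tags means at the level of a single SAM record. It is purely illustrative: the PR does this through RevertSam, not through a helper like this, and `clear_tags` and the example read are hypothetical names and data introduced only for this sketch.

```python
def clear_tags(sam_line, tags_to_clear):
    """Return sam_line with the named optional fields removed.

    Hypothetical helper for illustration only; the PR itself relies on RevertSam.
    """
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; everything after is TAG:TYPE:VALUE.
    mandatory, optional = fields[:11], fields[11:]
    drop = set(tags_to_clear)
    kept = [f for f in optional if f.split(":", 1)[0] not in drop]
    return "\t".join(mandatory + kept)


# Made-up aligned read carrying a mix of tags.
read = "\t".join([
    "read1", "0", "ref", "1", "37", "4M", "*", "0", "0", "ACGT", "IIII",
    "XT:A:U", "NM:i:0", "X0:i:1", "RG:Z:lib1",
])
cleared = clear_tags(read, ["XT", "X0", "X1", "XA"])
print(cleared.split("\t")[11:])
```

Tags that are not in the list (here `NM` and `RG`) are left untouched, which is why the choice of default list matters.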

The goal is to revert the BAM to an unaligned state. Tags from representative BAMs were listed via:
samtools view input.bam | cut -f 12- | tr '\t' '\n' | cut -d ':' -f 1 | awk '{ if(!x[$1]++) { print }}'
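
For readers less used to shell pipelines, the tag-listing command above can be sketched in Python roughly as follows. This assumes headerless SAM text lines as input (what `samtools view` prints without `-h`); `list_tags` and the sample lines are hypothetical. Unlike the pipeline, it does not emit an empty entry for reads that carry no tags.

```python
def list_tags(sam_lines):
    """Return the distinct tag names in order of first appearance."""
    seen = {}  # dicts preserve insertion order, mirroring the awk de-duplication
    for line in sam_lines:
        # Columns 12 onward are optional TAG:TYPE:VALUE fields (cut -f 12-).
        for field in line.rstrip("\n").split("\t")[11:]:
            seen.setdefault(field.split(":", 1)[0], None)  # cut -d ':' -f 1
    return list(seen)


# Made-up records for illustration.
lines = [
    "r1\t0\tref\t1\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tNM:i:0",
    "r2\t0\tref\t5\t37\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1\tRG:Z:lib1",
]
tags = list_tags(lines)
print(tags)
```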
The DX web interface does not seem to render a UI element for Array[String]?
taxon_filter.py Outdated
help='When supplying an aligned input file, clear the per-read attribute tags')
parser.add_argument("--tagsToClear", type=str, nargs='+', dest="tags_to_clear", default=["XT", "X0", "X1", "XA",
"AM", "SM", "BQ", "CT", "XN", "OC", "OP"],
help='blastn chunk size (default: %(default)s)')

this argparse help string looks incorrect!
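
One way the help string could be corrected is sketched below. The `--clearTags` flag name and the replacement help wording are assumptions made for this example; only `--tagsToClear`, its `dest`, its default list, and the first help string appear in the diff above.

```python
import argparse

# Default list copied from the diff under review.
DEFAULT_TAGS = ["XT", "X0", "X1", "XA", "AM", "SM", "BQ", "CT", "XN", "OC", "OP"]

# Stand-alone parser for illustration, not the real taxon_filter.py parser.
parser = argparse.ArgumentParser(description="illustrative depletion options")
parser.add_argument("--clearTags", dest="clear_tags", action="store_true", default=False,  # flag name assumed
                    help="When supplying an aligned input file, clear the per-read attribute tags")
parser.add_argument("--tagsToClear", type=str, nargs="+", dest="tags_to_clear", default=DEFAULT_TAGS,
                    help="Space-separated list of tags to remove from all reads "
                         "in the input bam file (default: %(default)s)")  # suggested wording, not the merged text

args = parser.parse_args(["--clearTags", "--tagsToClear", "XT", "NM"])
print(args.clear_tags, args.tags_to_clear)
```

With `nargs="+"`, omitting `--tagsToClear` entirely falls back to the default list, while passing it requires at least one tag.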

@@ -7,6 +7,8 @@ task deplete_taxa {
Array[File]? bmtaggerDbs # .tar.gz, .tgz, .tar.bz2, .tar.lz4, .fasta, or .fasta.gz
Array[File]? blastDbs # .tar.gz, .tgz, .tar.bz2, .tar.lz4, .fasta, or .fasta.gz
Int? query_chunk_size
Boolean? clear_tags = false
String? tags_to_clear_space_separated

Array[String]? perhaps? Then further below in the task command section you can use the ${sep=' ' tags_to_clear} WDL string join notation.
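
As a rough Python analogy for the join being suggested: an array joined with spaces and a single space-separated string both end up as the same argument list once the shell splits the command line. `tags_to_argv` is a hypothetical helper, and `shlex.split` only approximates the shell's word splitting.

```python
import shlex


def tags_to_argv(tags_to_clear):
    """Mimic `--tagsToClear ${sep=' ' tags_to_clear}` for a list of tags.

    Hypothetical helper for illustration; returns nothing when no tags are
    given, so the Python-side default list would apply.
    """
    if not tags_to_clear:
        return []
    command_fragment = "--tagsToClear " + " ".join(tags_to_clear)
    return shlex.split(command_fragment)


from_array = tags_to_argv(["XT", "X0", "XA"])
# The single-string variant yields the same tokens after shell-style splitting.
from_string = shlex.split("--tagsToClear " + "XT X0 XA")
print(from_array)
```

Either form reaches argparse as separate values for `nargs='+'`, so the choice between them is mostly about how the workflow UI presents the input.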

Also, because of the behavior of this pair of parameters, maybe it would be clearer to the user of this workflow if tags_to_clear were actually set to a default list that matches the argparse default in Python. Otherwise it's not clear that leaving it unspecified actually implies a decent default list (or why this requires two parameters instead of just one).

I initially had it as Array[String], but DNAnexus rendered a simple text box in the optional settings, and the expected input format was not clear (comma-separated, or space-separated?). Setting a default makes sense though, thanks!

Ok, sounds like something to ask the dxWDL folks about later on.

tomkinsc commented Jan 9, 2018

I'll want to run this in DNAnexus prior to merging.

@tomkinsc tomkinsc requested a review from dpark01 January 11, 2018 21:15
taxon_filter.py Outdated
@@ -371,6 +371,7 @@ def multi_db_deplete_bam(inBam, refDbs, deplete_method, outBam, **kwargs):
for db in refDbs:
if not samtools.isEmpty(tmpBamIn):
tmpBamOut = mkstempfname('.bam')
print("db", db)
What's this print line for?

@dpark01 dpark01 added this to In progress in v1.19.1 Jan 18, 2018
@tomkinsc tomkinsc merged commit 5d8c65f into master Jan 23, 2018
v1.19.1 automation moved this from In progress to Done Jan 23, 2018
@tomkinsc tomkinsc deleted the ct-depletion-revert-read-tags branch January 23, 2018 18:06