-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
making aMeta less pathogen oriented #123
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LeandroRitter A couple of comments regarding the bowtie2 alignment.
"""bowtie2 --large-index -x {params.PATHO_DB} --end-to-end --threads {threads} --very-sensitive -U {input.fastq} 2> {log} | samtools view -bS -q 1 -h -@ {threads} - | samtools sort - -@ {threads} -o {output.bam} >> {log};""" | ||
"""samtools index {output.bam}""" | ||
"""bowtie2 --large-index -x {params.BOWTIE2_DB} --end-to-end --threads {threads} --very-sensitive -U {input.fastq} > $(dirname {output.bam})/AlignedToBowtie2DB.sam 2> {log};""" | ||
"""grep @ $(dirname {output.bam})/AlignedToBowtie2DB.sam | awk '!seen[$2]++' > $(dirname {output.bam})/header_nodups.txt;""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since samtools is used further down, why not replace grep @
by samtools head
to view the header, and grep -v '^@'
below to samtools view --no-header
? This would make it easier to understand the logic of the commands.
Are all the commands here meant to rewrite the samtools header to remove duplicates? Why not use samtools reheader
? Speed issues? This shouldn't be the case; see http://www.htslib.org/doc/samtools-reheader.html, where the description reads "This command is much faster than replacing the header with a BAM→SAM→BAM conversion. "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See issue #122
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@percyfal all the dark magic I used with grep and not samtools was because one can't use samtools on a quasi-sam that has duplicate records in its header. So the whole point was that what you get from a bowtie2 (using full NT index) is not possible to process with samtools, so one has to "manually" fix the header before being able to use samtools
shell: | ||
"""bowtie2 --large-index -x {params.PATHO_DB} --end-to-end --threads {threads} --very-sensitive -U {input.fastq} 2> {log} | samtools view -bS -q 1 -h -@ {threads} - | samtools sort - -@ {threads} -o {output.bam} >> {log};""" | ||
"""samtools index {output.bam}""" | ||
"""bowtie2 --large-index -x {params.BOWTIE2_DB} --end-to-end --threads {threads} --very-sensitive -U {input.fastq} > $(dirname {output.bam})/AlignedToBowtie2DB.sam 2> {log};""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the end result here is a bam file, why not directly write to bam using the -b
option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not possible unfortunately because of the duplicate records in the header
I made a number of changes, trying to make aMeta follow up all detected by KrakenUniq species instead of pathogens only. Previously it was too pathogen oriented, now we deliver all followup stats for all detected microbes (including pathogens, if present, of course). This commit should fix a few issues such as #122, #120, #119, #84