Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

making aMeta less pathogen oriented #123

Merged
merged 4 commits into from
Mar 3, 2023
Merged

Conversation

LeandroRitter
Copy link
Collaborator

I made a number of changes, trying to make aMeta follow up all detected by KrakenUniq species instead of pathogens only. Previously it was too pathogen oriented, now we deliver all followup stats for all detected microbes (including pathogens, if present, of course). This commit should fix a few issues such as #122, #120, #119, #84

Copy link
Collaborator

@percyfal percyfal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LeandroRitter A couple of comments regarding the bowtie2 alignment.

"""bowtie2 --large-index -x {params.PATHO_DB} --end-to-end --threads {threads} --very-sensitive -U {input.fastq} 2> {log} | samtools view -bS -q 1 -h -@ {threads} - | samtools sort - -@ {threads} -o {output.bam} >> {log};"""
"""samtools index {output.bam}"""
"""bowtie2 --large-index -x {params.BOWTIE2_DB} --end-to-end --threads {threads} --very-sensitive -U {input.fastq} > $(dirname {output.bam})/AlignedToBowtie2DB.sam 2> {log};"""
"""grep @ $(dirname {output.bam})/AlignedToBowtie2DB.sam | awk '!seen[$2]++' > $(dirname {output.bam})/header_nodups.txt;"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since samtools is used further down, why not replace grep @ by samtools head to view the header, and grep -v '^@' below to samtools view --no-header? This would make it easier to understand the logic of the commands.

Are all the commands here meant to rewrite the samtools header to remove duplicates? Why not use samtools reheader? Speed issues? This shouldn't be the case; see http://www.htslib.org/doc/samtools-reheader.html, where the description reads "This command is much faster than replacing the header with a BAM→SAM→BAM conversion. "

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See issue #122

Copy link
Collaborator Author

@LeandroRitter LeandroRitter Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@percyfal all the dark magic I used with grep and not samtools was because one can't use samtools on a quasi-sam that has duplicate records in its header. So the whole point was that what you get from a bowtie2 (using full NT index) is not possible to process with samtools, so one has to "manually" fix the header before being able to use samtools

shell:
"""bowtie2 --large-index -x {params.PATHO_DB} --end-to-end --threads {threads} --very-sensitive -U {input.fastq} 2> {log} | samtools view -bS -q 1 -h -@ {threads} - | samtools sort - -@ {threads} -o {output.bam} >> {log};"""
"""samtools index {output.bam}"""
"""bowtie2 --large-index -x {params.BOWTIE2_DB} --end-to-end --threads {threads} --very-sensitive -U {input.fastq} > $(dirname {output.bam})/AlignedToBowtie2DB.sam 2> {log};"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the end result here is a bam file, why not directly write to bam using the -b option?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not possible unfortunately because of the duplicate records in the header

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants