overwrites coverage table with relative abundance table when run twice #24

and3k opened this Issue Nov 13, 2018 · 1 comment



and3k commented Nov 13, 2018


I noticed that when the tool is run twice (e.g., first collating genomes, then samples), the coverage information gets lost.

Here is a minimal example:

input: a.txt b.txt

When I run -t a.txt b.txt > ab.txt, the result is ab.txt.

Notice how the coverage and relative abundance sections are suddenly almost identical:

## coverage
# genome	sample_a.sam	sample_b.sam
genome1.fna	80.57256734444161	81.80919187285467
genome2.fna	19.427432655558402	18.031204027824025
genome3.fna	0.0	0.15960409932130895
## relative abundance
# genome	sample_a.sam	sample_b.sam
genome1.fna	80.5725673444416	81.80919187285467
genome2.fna	19.4274326555584	18.031204027824025
genome3.fna	0.0	0.15960409932130895

It’s not shown in this example, but I have observed that once the coverage values are replaced with the wrong "relative abundance" values, they seem to be used for coverage filtering, which can result in losing valid iRep values.




and3k commented Nov 15, 2018

As a workaround, removing the relative abundance section from each file before merging seems to fix it.

This is the command I use to process the files before merging them:

awk '/## index of replication/{ flag=1 } /## relative abundance/{ flag=0 } /## % windows passing filter/{ flag=1 } flag' a.tsv > a_tmp.tsv
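To make it clearer what the filter does, here is a small sketch that runs the same awk command on a toy file. The section headers are the ones mentioned in this thread, but the section order and all values are made up for illustration and are not real output. The flag is switched on at "## index of replication", off at "## relative abundance", and on again at "## % windows passing filter", so only the relative abundance block is dropped.

```shell
#!/bin/sh
# Toy input: headers taken from this thread; values and section order are
# hypothetical and only serve to show which lines the filter keeps.
workdir=$(mktemp -d)
cat > "$workdir/a.tsv" <<'EOF'
## index of replication
genome1.fna	1.5
## coverage
genome1.fna	10.0
## relative abundance
genome1.fna	100.0
## % windows passing filter
genome1.fna	98.0
EOF

# Same filter as above: print lines only while flag is set.
awk '/## index of replication/{ flag=1 } /## relative abundance/{ flag=0 } /## % windows passing filter/{ flag=1 } flag' \
  "$workdir/a.tsv" > "$workdir/a_tmp.tsv"

# Show what survived the filter.
cat "$workdir/a_tmp.tsv"
```

The "## relative abundance" header and the lines under it are skipped, while the coverage section is kept as it was.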