To pair or not to pair #50

Closed
rcedgar opened this issue Apr 24, 2020 · 1 comment
Labels
Bioinformatics Bioinformatics task

Comments

@rcedgar
Collaborator

rcedgar commented Apr 24, 2020

From Artem by email: "I'd be happy to abandon paired-end analysis and it would make our lives much simpler, if there is data showing that it's not going to be informative for detecting CoVs."

It seems certain to me that read pairing has no meaningful benefit in the big compute, though it might in a second pass to analyze candidate datasets. If a read in a pair has at least one alignment, bowtie2 will find and report at least one alignment for it independently of the other read; if it has none, bowtie2 will report none. This is equally true in paired and unpaired mode. Essentially the only benefit of pairing is a higher MAPQ when the two reads of a pair map to locations that are close together, consistent with the estimated range of construct lengths (say, 200 - 500nt for a typical Illumina shotgun library). In rare cases, this resolves the location of one of the reads if it maps to repetitive sequence. For example, if R1 has a unique mapping but R2 maps to repeats that are 300nt and 1000nt distant, then the first repeat is probably right. In unpaired mode, bowtie2 would make an arbitrary choice between the two R2 alignments and would assign a very low MAPQ.
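To make the repeat example concrete, here is a toy sketch in Python. It is not how bowtie2 scores pairs internally; it only illustrates the idea that a uniquely mapped R1 can disambiguate R2 candidates by distance. The function name `resolve_mate`, the coordinates, and the use of start-to-start distance as a stand-in for construct length are all assumptions made for illustration.

```python
def resolve_mate(r1_pos, r2_candidates, min_len=200, max_len=500):
    """Keep the R2 candidate positions whose distance from R1 falls
    inside the expected construct-length range (hypothetical helper)."""
    return [
        pos for pos in r2_candidates
        if min_len <= abs(pos - r1_pos) <= max_len
    ]


# R1 maps uniquely; R2 hits two copies of a repeat, 300nt and 1000nt away.
r1 = 10_000
r2_hits = [r1 + 300, r1 + 1000]

# Only the nearer copy is consistent with a 200 - 500nt construct.
consistent = resolve_mate(r1, r2_hits)
print(consistent)
```

Without the R1 anchor there is nothing to filter on, which is why the choice between the two R2 candidates is arbitrary in unpaired mode.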

To restate: if bowtie2 finds alignments for a read pair in paired mode, it will almost always find alignments for R1 and R2 separately in unpaired mode. The only difference is that in paired mode the location may be better resolved and the MAPQ may be higher. We don't care about location or MAPQ in the first round, so these benefits have no value there.

You can verify this by comparing paired and unpaired mode in the benchmark tests.
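One way such a comparison could be done is sketched below in Python. It assumes the benchmark has been run twice, once in paired mode (bowtie2 `-1`/`-2`) and once with R1 and R2 aligned separately in unpaired mode (`-U`), with SAM output saved for each run. The function name `mapped_reads` and the tiny inline SAM records are hypothetical; only the SAM flag bits (0x4 unmapped, 0x40 first in pair, 0x80 second in pair) come from the SAM format itself.

```python
def mapped_reads(sam_lines, mate=None):
    """Return the set of (read name, mate number) with a reported alignment.

    In paired-mode SAM the mate number is taken from the flag bits.
    For unpaired runs the flag carries no mate information, so the
    caller passes mate=1 or mate=2 for the R1 or R2 run.
    """
    mapped = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        if mate is None:
            m = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        else:
            m = mate
        mapped.add((qname, m))
    return mapped


# Made-up records, truncated to the first five SAM columns.
paired = [
    "@HD\tVN:1.0",
    "pairA\t99\tref\t100\t42",   # R1 mapped
    "pairA\t147\tref\t400\t42",  # R2 mapped
    "pairB\t69\t*\t0\t0",        # R1 unmapped
    "pairB\t137\tref\t900\t1",   # R2 mapped, mate unmapped
]
unpaired_r1 = ["pairA\t0\tref\t100\t42", "pairB\t4\t*\t0\t0"]
unpaired_r2 = ["pairA\t16\tref\t400\t42", "pairB\t0\tref\t900\t1"]

in_paired = mapped_reads(paired)
in_unpaired = mapped_reads(unpaired_r1, mate=1) | mapped_reads(unpaired_r2, mate=2)

# Reads found in only one of the two modes; the claim above predicts few or none.
print(sorted(in_paired ^ in_unpaired))
```

On real benchmark output the lists would come from reading the SAM files line by line; the symmetric difference then shows directly how many reads are detected in one mode but not the other.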

@ababaian
Owner

We will revisit this in the future if the need arises.
