Losing 50% reads after merging #831
Comments
Can you share with us a table of the sequences that were not merged?
Is it possible that heterogeneity spacers were used in sequencing these samples?
Hi! Thank you very much for your answers! First, @jgrzadziel, I have tried to retrieve the not-merged sequences but I have failed... All I get is this: merg2 <- mergePairs(dadaFs, derepFs, dadaRs, derepRs, verbose=TRUE, returnRejects = TRUE)
So there is no sequence attached to them, while for the merged ones there is. I guess that is because there is no consensus sequence... @benjjneb, I am not sure about the heterogeneity spacers... Maybe? The data they sent me had all the indexes and, I guess, any other tags removed, but not the primers. How will that affect my sequences? I did notice something weird happening in the first bases of the forward primer (low quality) and I removed the first 10, just in case.
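For reference, a minimal sketch of how the rejected pairs can be inspected, assuming the standard workflow objects (dadaFs, derepFs, dadaRs, derepRs) from the DADA2 tutorial:

```r
# Sketch: inspect rejected merges; assumes dadaFs/derepFs/dadaRs/derepRs exist.
library(dada2)

merg2 <- mergePairs(dadaFs, derepFs, dadaRs, derepRs,
                    verbose = TRUE, returnRejects = TRUE)

# With multiple samples, mergePairs() returns a list of data.frames;
# pick one sample to look at (index 1 here is arbitrary).
df <- if (is.data.frame(merg2)) merg2 else merg2[[1]]

# Rejected pairs have accept == FALSE; their sequence column is empty,
# but nmatch/nmismatch still show what the aligner found in the overlap.
rejects <- df[!df$accept, ]
table(rejects$nmatch)      # 0 here means no overlap was found at all
table(rejects$nmismatch)
```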
Sorry, I meant on the forward sequences after removing the primer. Then I removed another 10 nt.
If heterogeneity spacers were left on the reads they sent you, it will interfere with DADA2 and can lead to results like what you are seeing, where a large fraction of reads end up failing to merge successfully. I don't know if that is what is going on here, but it would be worth confirming the exact library setup the sequencing provider used, and perhaps asking them directly if they use heterogeneity spacers in the amplicon libraries.
Ok, thanks! Yes, I will contact them and see what I can find out. In the meantime... if the problem is those spacers, will removing the primers with cutadapt solve it? If I am correct, the spacers will be before the primer, and cutadapt will look for the primer and remove everything before it.
If you are right, then yes. But if the spacers are after the primers, then it won't solve the problem. And that second library design (spacers after primers) is one that we have seen more than once.
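One way to see where the primer sits in the raw reads (and therefore whether anything, such as spacers, precedes it) is to search for the primer sequence directly in R. This is only a sketch: the primer shown is the commonly published 341F sequence and the file name is a placeholder, so adjust both to your data.

```r
# Sketch: locate the forward primer in the raw R1 reads to see whether
# spacers or other bases sit in front of it. The primer below is the commonly
# published 341F (CCTACGGGNGGCWGCAG); verify it matches your protocol.
library(ShortRead)
library(Biostrings)

fwd_primer <- DNAString("CCTACGGGNGGCWGCAG")
reads <- sread(readFastq("sample_R1.fastq.gz"))   # placeholder file name

# fixed = FALSE lets IUPAC ambiguity codes (N, W) in the primer match
hits <- vcountPattern(fwd_primer, reads, fixed = FALSE)
mean(hits > 0)                                    # fraction of reads containing the primer

# Start positions of the primer: all 1s means nothing precedes the primer;
# a spread of start positions suggests variable-length spacers are present.
starts <- unlist(startIndex(vmatchPattern(fwd_primer, reads, fixed = FALSE)))
table(starts)
```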
Um, Ok. Thanks. I hope that is not the case. But I will wait for their reply. Thanks a lot! Irene
Hi! I finally got an answer from the sequencing company. They say that they don't use heterogeneity spacers... I tried cutadapt just in case and, as expected, I got the same results as just trimming a fixed number of nucleotides to remove the primers. Not sure what to do now... Thanks!
Hi again. In some cases only about 30-40% of reads make it through (so I'm losing over 50% of reads, just like you). I don't know what the reason is, but I suspect some errors during sequencing, since this problem is always tied to a whole sequencing "run". For example, in one run I have 20 samples and each of them is properly filtered etc., but in another run (sequenced on another day, on another machine) the ratio of retained to input sequences is just the ~30-40% mentioned above. Interestingly, when the same samples are merged with other software (for example fastq-join) using the same parameters as in mergePairs (the same minimum overlap and maximum mismatch), they merge much "better": 90-98% of reads are merged. Just to be clear, when testing the other software I used the dada2-filtered sequences (from the fastq/filtered folder). In conclusion, I assume that the problem is linked to:
Examples (the same primers, the same sequencing technology, the same company and protocol, but a different run):
Bad sequencing run:
Good sequencing run:
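For comparing runs like this, the read-tracking table from the DADA2 tutorial makes the per-step losses explicit; below is a sketch assuming the usual tutorial object names (out, dadaFs, dadaRs, mergers, seqtab.nochim).

```r
# Standard read-tracking table (adapted from the DADA2 tutorial); assumes the
# usual object names: out (filterAndTrim), dadaFs/dadaRs (dada), mergers
# (mergePairs), seqtab.nochim (removeBimeraDenovo).
getN <- function(x) sum(getUniques(x))
track <- cbind(out,
               sapply(dadaFs, getN),
               sapply(dadaRs, getN),
               sapply(mergers, getN),
               rowSums(seqtab.nochim))
colnames(track) <- c("input", "filtered", "denoisedF", "denoisedR", "merged", "nonchim")
head(track)
```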
@MicroIrene I'm not sure unfortunately. You could try extending truncLen even a bit more, e.g.
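As an illustration only, extending truncLen in filterAndTrim might look like the sketch below; the truncation lengths shown are placeholder values, not a specific recommendation from the thread.

```r
# Illustrative only: a filterAndTrim call with somewhat longer truncation
# lengths than the 270/180 used originally. The values 280/200 are placeholders;
# they must still leave >12 nt of overlap and acceptable read quality.
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs,
                     truncLen = c(280, 200),
                     maxN = 0, maxEE = c(2, 2), truncQ = 2,
                     rm.phix = TRUE, compress = TRUE, multithread = TRUE)
head(out)
```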
Huh. That is a major difference between runs from the same sequencing provider!
@benjjneb
Oh, that's a shame. Well, thank you very much anyway. I tried trimming a bit more already and it is still not working. I think my only option now is to accept that I lose that many reads and keep going with the analysis... As @jgrzadziel suggests, it might be an issue with the quality of the run. Unfortunately the run cannot be repeated, so I'll stick to what I have.
@MicroIrene
Interesting... Yes, I could do that. Actually, I tried before with different software (with obitools) and the reads merged very well (99% of them merged successfully). But I thought that dada2 needed to work with the reads before merging, for the error-learning and sample-inference steps... Is it going to be a problem to work with already-merged reads?
My suggestion was to apply dada2 for filterAndTrim, then use other software to merge the sequences, then move back to dada2 for seqtab.nochim (or remove the chimeras in the other software as well). What do you think, @benjjneb?
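For what it's worth, one way the proposed hybrid could be wired up on the dada2 side is sketched below; the merged file name is hypothetical, the external merging step happens outside R, and this is only an outline of the suggestion above, not an endorsed dada2 workflow.

```r
# Sketch of the hybrid workflow proposed above (not an endorsed dada2 workflow).
# 1) Quality filter in dada2 as usual.
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs,
                     truncLen = c(270, 180), maxEE = c(2, 2), multithread = TRUE)

# 2) Merge the filtered read pairs with an external tool (e.g. fastq-join), outside R.

# 3) Bring the externally merged reads back into dada2 ("sample_merged.fastq.gz"
#    is a hypothetical file produced by the external merger).
err_merged  <- learnErrors("sample_merged.fastq.gz", multithread = TRUE)
dada_merged <- dada(derepFastq("sample_merged.fastq.gz"), err = err_merged,
                    multithread = TRUE)
seqtab        <- makeSequenceTable(dada_merged)
seqtab.nochim <- removeBimeraDenovo(seqtab, method = "consensus", multithread = TRUE)
```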
Hello @MicroIrene, I'm having the same problem with my dataset (I'm using DADA2 to process it).
Hi,
I have a problem with my dataset: I am losing a lot of reads after merging, approximately 50% of them. I have read all the threads related to this problem, but they don't seem to give me a solution... Can you please help? Is this a real problem, or shall I just accept that I lose so many and continue working with what is left?
My sequences are bacterial 16S, V3-V4 region, with primers 341F and 805R. The expected fragment size (after removing primers) is approx. 426 nt. I sequenced Illumina 2x300. I truncated the forward reads to 270 nt and the reverse to 180 nt, which removed almost all of the bad-quality regions of the sequences. The overlap region should be approx. 24 nt (more than the minimum 12 needed). The primers have definitely been removed from the sequences.
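A quick sanity check of that expected overlap, using the numbers above:

```r
# Expected overlap after truncation, with the numbers from this post:
# ~426 nt amplicon (primers removed), truncLen = c(270, 180).
amplicon_len <- 426
overlap <- 270 + 180 - amplicon_len
overlap   # 24 nt, above the mergePairs() default minOverlap = 12
```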
I have tried to check whether it was a problem with the number of mismatches, but when I looked at the non-merged sequences (with mergePairs, returnRejects = TRUE), I can see that all the ones that were not merged had 0 matched nucleotides... Does this mean that they do not overlap? I tried to increase the overlap region by truncating the reverse reads less (down to 210 nt), but the result was exactly the same.
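For reference, re-running the merge with rejects returned and a relaxed mismatch allowance helps separate the two failure modes (no overlap found vs. too many mismatches in the overlap); the parameter values below are illustrative only.

```r
# Illustrative sketch: relax maxMismatch (default 0) while keeping the default
# minOverlap of 12, and return rejects so nmatch/nmismatch can be examined.
mergers_relaxed <- mergePairs(dadaFs, derepFs, dadaRs, derepRs,
                              minOverlap = 12, maxMismatch = 2,
                              returnRejects = TRUE, verbose = TRUE)
```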
Any advice?
Thank you very very much!
Irene