Adapters Not Being Trimmed (Apparently)? #54
Comments
Dear George,
I've looked at your data, and as far as I can see there are a couple of issues. Both relate to the way AdapterRemoval aligns reads and adapters in paired-end mode, which is done by combining the read and adapter sequences and then performing a gap-less pairwise alignment between the two sequences:
Once these combined sequences have been aligned, AdapterRemoval can then use alignment information to accurately trim the adapter sequence from the reads. However, because the alignment is ungapped, indels early in one of the reads can result in a proper alignment not being found, which appears to be the case for 3 of the pairs of reads that you included:
However, it is also possible that you are using the wrong sequence for --adapter2. You could try running AdapterRemoval with the --identify-adapters option and see what AdapterRemoval reports. That option prints consensus adapter sequences obtained by aligning the mate 1 and mate 2 reads, and should hopefully correspond (with some uncertainty) to your own --adapter1/--adapter2 values. The second problem is a bit trickier:
While --adapter1 is clearly present in the reads, the --adapter2 sequence is not. And if you look at the reverse complements of the reads, it doesn't seem like the two reads overlap at all. Possibly you are looking at some sort of dimer or other non-biological sequence with the primer sequence embedded near one end. Either way, since the two sequences are not complementary, AdapterRemoval correctly fails to align the two reads and therefore does not trim the embedded adapter sequence. You could maybe filter out reads like this after you've performed adapter trimming, which would also catch the kind of false negatives described above, but I don't have specific advice in that regard.
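To illustrate the gap-less alignment described above, here is a minimal Python sketch. It is not AdapterRemoval's actual implementation: the function names, the `min_overlap` threshold, and scoring by raw mismatch count are all assumptions made for illustration, and the test sequence is simply the start of one of the reads quoted in this thread. It shows that once a single base is deleted, no ungapped offset brings the whole read back into register.

```python
# Illustrative sketch only; names and thresholds are assumptions,
# not AdapterRemoval's real code.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_mismatches(ref: str, query: str, offset: int):
    """Compare query[j] with ref[j + offset] without gaps.

    Returns (mismatches, overlap_length) for the overlapping columns.
    """
    start = max(0, -offset)
    end = min(len(query), len(ref) - offset)
    overlap = max(0, end - start)
    mismatches = sum(
        1 for j in range(start, end) if query[j] != ref[j + offset]
    )
    return mismatches, overlap


def best_ungapped_offset(ref: str, query: str, min_overlap: int = 20):
    """Try every offset with enough overlap; keep the fewest mismatches."""
    best = None
    for offset in range(-(len(query) - min_overlap), len(ref) - min_overlap + 1):
        mismatches, overlap = count_mismatches(ref, query, offset)
        if overlap < min_overlap:
            continue
        candidate = (mismatches, -overlap, offset)
        if best is None or candidate < best:
            best = candidate
    mismatches, neg_overlap, offset = best
    return offset, mismatches, -neg_overlap


template = "ATCGTTAATCGATTTTCCTCGTAATGCGCA"
with_deletion = template[:3] + template[4:]  # drop one base near the start

# A mate that is the exact reverse complement overlaps perfectly.
mate = reverse_complement(template)
print(count_mismatches(template, reverse_complement(mate), 0))  # (0, 30)

# With one deleted base, every ungapped offset leaves some columns shifted.
print(count_mismatches(template, with_deletion, 0))  # (19, 29)
print(count_mismatches(template, with_deletion, 1))  # (3, 29)
print(best_ungapped_offset(template, with_deletion))
```

At offset 0 everything after the deletion is shifted by one base, and at offset 1 the bases before the deletion are. A gapped aligner would place a single gap instead, which is the trade-off the ungapped approach makes.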
Dear Mikkel,
Thanks very much for your quick reply, and apologies for my delayed one. I have run AR with the --identify-adapters flag as you indicated on three of my samples, and it found the following sequences:
Adapter1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
For Adapter 2, I reckon it makes sense and the sequence does seem to be correct. However, I have noticed that I get a better result when I use just the first part of it (AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA).
As for Adapter 1, from what I could see this initial A should not really be there, but I get a much better result when I include it. Moreover, I also get a better result when I use just the first part of this adapter (GATCGGAAGAGCACACGTCTGAACTCCAGTCAC), and I include this initial A. I have run some tests using this configuration, and I think AR is working as I would expect now.
Many thanks once again, George.
Dear Mikkel,
I might have noticed what could be some sort of misbehaviour. I have the attached test files (T14_Test_{Pair}.fastq), and I have tried to process them using the following command (v2.3.1):
AdapterRemoval --file1 T14_Test_1.fastq --file2 T14_Test_2.fastq --trimns --trimqualities --collapse --minlength 5 --minquality 20 --maxns 30 --adapter1 GATCGGAAGAGCACACGTCTGAACTCCAGTCAC --adapter2 ATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --basename Test_AR
My understanding is that the output should look something like this (Pair 1):
@HISEQ:247:C87NTANXX:7:1101:2220:2855 1:N:0:TAATGC
ATCGTTAATCGATTTTCCTCG
+
BBBBBFFFBFFFBFFBFFFFF
@HISEQ:247:C87NTANXX:7:1101:2220:2856 1:N:0:TAATGC
ATCGTTAATCGATTTTCCTCGTAATGCGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAACAA
+
BBBBBFFFBFFFBFFBFFFFFFFFFFFFFFFFFFFFFF/<FF//<BFFFFB<FBBBFF/FFFB/<FB
But that is not what I have been getting. On the contrary, all the reads survive almost untouched. Could you please let me know if I am missing something here? I am really sorry if I am (probably I am), but I have tried different options and I cannot see what I could do differently.
Please let me know should you need any extra information from my end.
Thanks in advance, George.
T14_Test_1-2.zip
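A quick way to check whether reads still carry adapter sequence after a run such as the one above is to search the output for the start of the adapter. The Python sketch below is only an illustration: `find_untrimmed` and `seed_len` are hypothetical names, and the two records are synthetic, built from the insert shown above and the --adapter1 value from the command. It uses exact matching on a short seed, so adapters containing sequencing errors in the seed would be missed.

```python
from typing import Iterable, List, Tuple


def find_untrimmed(
    lines: Iterable[str], adapter: str, seed_len: int = 12
) -> List[Tuple[str, int]]:
    """Return (header, position) for reads containing the adapter seed.

    Assumes plain four-line FASTQ records; an incomplete trailing
    record is silently ignored.
    """
    seed = adapter[:seed_len]
    stripped = (line.rstrip("\n") for line in lines)
    hits = []
    # zip over the same iterator four times groups lines into records.
    for header, seq, _plus, _qual in zip(stripped, stripped, stripped, stripped):
        pos = seq.find(seed)
        if pos != -1:
            hits.append((header, pos))
    return hits


# Synthetic example data (not taken from the attached test files).
adapter1 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
insert = "ATCGTTAATCGATTTTCCTCG"
records = [
    "@read1", insert, "+", "I" * len(insert),
    "@read2", insert + adapter1, "+", "I" * len(insert + adapter1),
]

hits = find_untrimmed(records, adapter1)
print(hits)  # [('@read2', 21)]
```

To run this on real output, pass an open file handle for the trimmed FASTQ instead of the `records` list.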