Adapters Not Being Trimmed (Apparently)? #54

Closed
g-pacheco opened this issue Apr 12, 2021 · 2 comments

@g-pacheco

Dear Mikkel,

I might have noticed what could be some sort of misbehaviour. I have the attached test files (T14_Test_{Pair}.fastq), and I have tried to process them using the following command (v2.3.1):

AdapterRemoval --file1 T14_Test_1.fastq --file2 T14_Test_2.fastq --trimns --trimqualities --collapse --minlength 5 --minquality 20 --maxns 30 --adapter1 GATCGGAAGAGCACACGTCTGAACTCCAGTCAC --adapter2 ATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --basename Test_AR

My understanding is that the output should look something like this (Pair 1):

@HISEQ:247:C87NTANXX:7:1101:2220:2855 1:N:0:TAATGC
ATCGTTAATCGATTTTCCTCG
+
BBBBBFFFBFFFBFFBFFFFF
@HISEQ:247:C87NTANXX:7:1101:2220:2856 1:N:0:TAATGC
ATCGTTAATCGATTTTCCTCGTAATGCGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAACAA
+
BBBBBFFFBFFFBFFBFFFFFFFFFFFFFFFFFFFFFF/<FF//<BFFFFB<FBBBFF/FFFB/<FB

But that is not what I have been getting. On the contrary, all the reads survive almost untouched. Could you please let me know if I am missing something here? I am really sorry if I am (probably I am), but I have tried different options and I cannot see what I could do differently.

Please let me know should you need any extra information from my end.

Thanks in advance, George.
T14_Test_1-2.zip

@MikkelSchubert
Owner

Dear George,

I've looked at your data, and as far as I can see there are a couple of issues:

Both issues relate to the way AdapterRemoval aligns reads and adapters in paired-end mode, which is done by combining the read and adapter sequences and then performing a gap-less pair-wise alignment between the two sequences:

 Adapter2' + Read1
  aligned to
 Read2' + Adapter1

Once these combined sequences have been aligned, AdapterRemoval can then use alignment information to accurately trim the adapter sequence from the reads.
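
For illustration, here is a minimal Python sketch of that idea. It assumes the prime marks above denote reverse complements, and it uses a simple match-minus-mismatch score; AdapterRemoval's actual scoring and thresholds are not reproduced here. The insert sequence, read length, and function names are made up for the example.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def best_ungapped_alignment(seq1, seq2, min_overlap=11):
    """Slide seq2 along seq1 without gaps and return (offset, score).

    offset is the position in seq1 that seq2[0] lines up with (may be negative).
    score is matches minus mismatches over the overlapping region; this is an
    illustrative scoring scheme, not the one AdapterRemoval uses.
    """
    best_offset, best_score = None, None
    for offset in range(-len(seq2) + 1, len(seq1)):
        start1, start2 = max(offset, 0), max(-offset, 0)
        length = min(len(seq1) - start1, len(seq2) - start2)
        if length < min_overlap:
            continue
        matches = sum(a == b for a, b in zip(seq1[start1:], seq2[start2:]))
        score = 2 * matches - length
        if best_score is None or score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

ADAPTER1 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
ADAPTER2 = "ATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Hypothetical 12 bp insert sequenced with 30 bp reads, so both mates
# run into their adapter.
insert = "CTGAAGTCCATG"
read_len = 30
read1 = (insert + ADAPTER1)[:read_len]
read2 = (revcomp(insert) + ADAPTER2)[:read_len]

combined1 = revcomp(ADAPTER2) + read1   # Adapter2' + Read1
combined2 = revcomp(read2) + ADAPTER1   # Read2' + Adapter1

offset, score = best_ungapped_alignment(combined1, combined2)

# Valid when 0 <= offset <= len(ADAPTER2): the offset tells us how many
# adapter bases sit at the end of read2, and therefore the insert length.
adapter2_bases_in_read2 = len(ADAPTER2) - offset
insert_length = len(read2) - adapter2_bases_in_read2
print(offset, score, insert_length)
```

Because both combined sequences contain the same insert flanked by the same adapters, a single offset lines up everything at once, and the insert boundaries fall out of that offset.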

However, because the alignment is ungapped, indels early in one of the reads can result in a proper alignment not being found, which appears to be the case for 3 of the pairs of reads that you included:

HISEQ:247:C87NTANXX:7:1101:2120:2851
	R1	GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAATGC[...]
	A1	GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
	
	R2	GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGTCAGTA[...]
	A2	 ATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

HISEQ:247:C87NTANXX:7:1101:3751:2506
	R1	GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAATGC[...]
	A1	GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
	
	R2	GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGACAGTA[...]
	A2	 ATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

HISEQ:247:C87NTANXX:7:1101:9224:2232
	R1	GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAATGC[...]
	A1	GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
	
	R2	GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGTCAGTA[...]
	A2	 ATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
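
A quick way to see the mismatch in these pairs is to look up where each adapter starts in its read. The sketch below uses plain exact string matching on the read prefixes shown above for pair 2120:2851 (truncated at the [...]); it is only a sanity check, not how AdapterRemoval searches.

```python
ADAPTER1 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
ADAPTER2 = "ATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Read prefixes copied from HISEQ:247:C87NTANXX:7:1101:2120:2851
r1_prefix = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAATGC"
r2_prefix = "GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGTCAGTA"

# str.find returns the 0-based start of the first exact match, or -1.
offset1 = r1_prefix.find(ADAPTER1)
offset2 = r2_prefix.find(ADAPTER2)
print(offset1, offset2)  # 0 1
```

With the sequences as supplied, adapter 1 starts at position 0 of mate 1 but adapter 2 starts at position 1 of mate 2. In a clean pair both mates cover the same insert, so the adapters should start at the same position in both reads; a one-base difference cannot be reconciled by a single ungapped alignment.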

However, it is also possible that you are using the wrong sequence for --adapter2. You could try running AdapterRemoval with the --identify-adapters option and see what AdapterRemoval reports. That option prints consensus adapter sequences obtained by aligning the mate 1 and mate 2 reads, and should hopefully correspond (with some uncertainty) to your own --adapter1/--adapter2 values.

The second problem is a bit trickier:

HISEQ:247:C87NTANXX:7:1101:2220:2855
	R1	ATCGTTAATCGATTTTCCTCGGATCGGAAGAGCACACGTCTGAACTCCAGTCACTAATGCGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAACAA
	A1	                     GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
	
	R2	TAATGCGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAACAACCCGCGACAGCAGTTTGGTTAGGATGCGGCTTAGGGTCTTAGGTCGATCGGTAA
	A2	???

HISEQ:247:C87NTANXX:7:1101:2220:2856
	R1	ATCGTTAATCGATTTTCCTCGTAATGCGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAACAAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
	A1	                                                                   GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
	
	R2	TAATGCGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAACAACCCGCGACAGCAGTTTGGTTAGGATGCGGCTTAGGGTCTTAGGTCGAAAAATAA
	A2	???

While --adapter1 is clearly present in the reads, the --adapter2 sequence is not. And if you look at the reverse complements of the reads, it doesn't seem like the two reads overlap at all. Possibly you are looking at some sort of dimer or other non-biological sequence with the primer sequence embedded near one end. Either way, since the two sequences are not complementary, AdapterRemoval correctly fails to align the two reads and therefore does not trim the embedded adapter sequence.
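
One rough way to check this yourself is to count the k-mers that mate 1 shares with the reverse complement of mate 2. This is a sketch only: the helper names and the choice of k = 12 are arbitrary assumptions, and the comparison against the forward orientation of mate 2 is an extra added here for contrast, not something AdapterRemoval does.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmers(a, b, k=12):
    """Count distinct k-mers present in both sequences."""
    return len(kmers(a, k) & kmers(b, k))

# Reads copied from HISEQ:247:C87NTANXX:7:1101:2220:2855
r1 = ("ATCGTTAATCGATTTTCCTCGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
      "TAATGCGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAACAA")
r2 = ("TAATGCGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAACAA"
      "CCCGCGACAGCAGTTTGGTTAGGATGCGGCTTAGGGTCTTAGGTCGATCGGTAA")

# A genuine overlapping pair should share many k-mers in this orientation.
print("shared with revcomp(R2):", shared_kmers(r1, revcomp(r2)))
# For contrast: k-mers shared with mate 2 as sequenced (same strand).
print("shared with R2 as-is:   ", shared_kmers(r1, r2))
```

A pair that truly overlaps should share a long run of k-mers between mate 1 and the reverse complement of mate 2; a count at or near zero there is what "not complementary" looks like in practice.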

You could maybe filter out reads like this after you've performed adapter trimming, which would also catch the kind of false negatives described above, but I don't have specific advice in that regard.

@g-pacheco
Author

g-pacheco commented Apr 20, 2021

Dear Mikkel,

Thanks very much for your quick reply, and apologies for my delayed one.

I have run AR with the --identify-adapters flag, as you indicated, on three of my samples, and it found the following sequences:

Adapter1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
Adapter2 (i5): AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

For Adapter 2, I reckon it makes sense and the sequence does seem to be correct. However, I have noticed that I get a better result when I use just the first part of it (AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA). As for Adapter 1, from what I could see, this initial A should not really be there, but I get a much better result when I include it. Moreover, I also get a better result when I use just the first part of this adapter (GATCGGAAGAGCACACGTCTGAACTCCAGTCAC) with this initial A included.
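
As a small cross-check, the originally supplied --adapter1/--adapter2 values can be located inside the identified consensus sequences with exact string matching. This is only a sketch using the sequences quoted in this thread; the variable names are made up.

```python
# Consensus sequences reported by --identify-adapters (from this comment)
identified1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG"
identified2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"

# Values passed to --adapter1 / --adapter2 in the original command
supplied1 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
supplied2 = "ATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# 0-based position of each supplied adapter within the consensus (-1 if absent)
shift1 = identified1.find(supplied1)
shift2 = identified2.find(supplied2)
print(shift1, shift2)  # 1 2

# Bases of the consensus that the supplied sequences were missing at the 5' end
print(identified1[:shift1], identified2[:shift2])  # A AG
```

Both supplied sequences occur in the consensus, but shifted: relative to the consensus, the original --adapter1 lacks the leading A and the original --adapter2 lacks the leading AG. The two values were therefore out of step with each other by one base.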

I have run some tests using this configuration, and I think AR is working as I would expect now.

Many thanks once again, George.
