Do not cut the base at the read 5' head #32

Closed
lingbl opened this issue Jan 25, 2019 · 10 comments

Comments

@lingbl

lingbl commented Jan 25, 2019

When "--trimqualities" is used, AdapterRemoval also trims bases at the 5' end of reads.
This affects the "MarkDuplicates" step in GATK4, and I need to align reads by UMI, so for now I have to use Trimmomatic to trim low-quality bases.

Is there any way to avoid trimming bases at the 5' end when using "--trimqualities"?

Thanks, AdapterRemoval is really useful.

@MikkelSchubert
Owner

I am glad to hear that you've found AdapterRemoval to be useful.
Unfortunately, it is currently not possible to trim only the 3' end of reads using the --trimqualities option. But it shouldn't be much trouble to add an option for that, so I'll include it in the next update to AdapterRemoval.

Best,
Mikkel

@sc13-bioinf

It turns out that trimming the base at the 5' end has serious consequences for low coverage genomes because duplicate removal will no longer recognise PCR duplicates.
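To illustrate the point above, here is a toy sketch of why a trimmed 5' base hides a PCR duplicate. Duplicate-marking tools generally group reads by where their 5' ends map; the sketch assumes a much simplified key of (chromosome, 5' start, strand). The tuple layout, the read names, and the function `duplicates_found` are hypothetical and are not taken from MarkDuplicates or any other tool.

```python
from collections import defaultdict


def duplicates_found(alignments):
    """Count reads that share a (chrom, start, strand) key with an earlier read.

    This is a simplified stand-in for duplicate marking, not a real tool's logic.
    """
    groups = defaultdict(list)
    for name, chrom, start, strand in alignments:
        groups[(chrom, start, strand)].append(name)
    # Every read beyond the first in a group counts as a duplicate.
    return sum(len(names) - 1 for names in groups.values())


# Two PCR copies of the same forward-strand fragment, both untrimmed.
untrimmed = [
    ("copy_a", "chr1", 1000, "+"),
    ("copy_b", "chr1", 1000, "+"),
]

# copy_b lost one low-quality base at its 5' end before alignment,
# so its alignment now starts one position later.
trimmed_5p = [
    ("copy_a", "chr1", 1000, "+"),
    ("copy_b", "chr1", 1001, "+"),
]

print(duplicates_found(untrimmed))   # 1
print(duplicates_found(trimmed_5p))  # 0
```

With low coverage, every missed duplicate like this is counted as independent evidence, which is why the effect matters more there.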

@maxibor
Contributor

maxibor commented Jun 21, 2019

> I am glad to hear that you've found AdapterRemoval to be useful.
> Unfortunately, it is currently not possible to trim only the 3' end of reads using the --trimqualities option. But it shouldn't be much trouble to add an option for that, so I'll include it in the next update to AdapterRemoval.
>
> Best,
> Mikkel

Could this be an add-on to the --trim5p / --trim3p options?

@MikkelSchubert
Owner

Hi all,

I've just released AdapterRemoval 2.3.1 which adds a new option (--preserve5p) that prevents quality based trimming at the 5p termini when any of the --trimns, --trimqualities, or --trimwindows options are used. This also entirely disables quality based trimming of collapsed reads, since both ends of these are informative for PCR duplicate filtering (see [1] and [2] for scripts that can be used for this).

Thank you for your patience and feel free to re-open this issue or open a new issue if you run into any (related) problems.

Best,
Mikkel

[1] FilterUniqueBAM.py/FilterUniqueSAMCons.py from https://www.ncbi.nlm.nih.gov/pubmed/22237537
[2] paleomix rmdup_collapsed from https://www.ncbi.nlm.nih.gov/pubmed/24722405
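As a rough illustration of what preserving the 5' end means for a single, non-collapsed read, the sketch below models terminal quality trimming. It is not AdapterRemoval's implementation: the function `trim_by_quality`, its parameters, and the example read are hypothetical, and it assumes that terminal bases with a quality at or below the threshold are removed. It does not model adapter trimming, window trimming, or collapsed reads.

```python
def trim_by_quality(seq, quals, min_quality, preserve5p=False):
    """Trim terminal bases whose quality is <= min_quality (simplified model).

    seq is the base string and quals is a list of integer Phred scores.
    With preserve5p=True only the 3' end is trimmed.
    """
    start, end = 0, len(seq)
    # Always trim low-quality bases from the 3' end.
    while end > start and quals[end - 1] <= min_quality:
        end -= 1
    # Trim the 5' end only if it is not being preserved.
    if not preserve5p:
        while start < end and quals[start] <= min_quality:
            start += 1
    return seq[start:end], quals[start:end]


# Made-up read with one low-quality base at each end.
seq = "ACGTACGT"
quals = [10, 30, 30, 30, 30, 30, 30, 12]

print(trim_by_quality(seq, quals, 20)[0])                   # CGTACG
print(trim_by_quality(seq, quals, 20, preserve5p=True)[0])  # ACGTACG
```

In the second call the read keeps its original 5' base, so its alignment start, and therefore its duplicate key, is unchanged.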

@aidaanva

Hi!
We just noticed that, with the --preserve5p option, merged reads are still trimmed at the 3' end, which shouldn't happen because that end was originally a 5' end. Is there any further setting to prevent this from happening?

@MikkelSchubert
Owner

MikkelSchubert commented Jul 17, 2019 via email

@aidaanva

aidaanva commented Aug 6, 2019

Sorry for the late reply.
Here are the command-line settings:

AdapterRemoval --file1 sample1.fq.gz --basename sample1.fq --gzip --threads 4 --trimns --trimqualities --preserve5p --adapter1 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC --adapter2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA --minlength 30 --minquality 20 --minadapteroverlap 1

An example of a trimmed read (the last T is trimmed):
After AdapterRemoval with --minquality 0:
@M_NS500382:27:HJH3LBGXX:2:13211:2150:9711 1:N:0:GTACTCGA+AACCTCAG
CACGGTATCGGCCGCAACGTTTTCAGCACGTGTTGGGTCAGAAGTTTGTAGTGGCAACACTGTAAAAATCTCTTGAGGAGT
+
AAAAAAEEEEAEEEEEAEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEAEEEAEEEAEEEEE/EEEEEEEAEEEAAAAA/
After AdapterRemoval with the command above:
@M_NS500382:27:HJH3LBGXX:2:13211:2150:9711 1:N:0:GTACTCGA+AACCTCAG
CACGGTATCGGCCGCAACGTTTTCAGCACGTGTTGGGTCAGAAGTTTGTAGTGGCAACACTGTAAAAATCTCTTGAGGAG
+
AAAAAAEEEEAEEEEEAEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEAEEEAEEEAEEEEE/EEEEEEEAEEEAAAAA

I want to mention that this data had already been trimmed and merged before it was run through AdapterRemoval again. Maybe that would explain why the trimming is happening?

@MikkelSchubert
Owner

MikkelSchubert commented Aug 7, 2019 via email

@aidaanva

Dear Mikkel,

I am sorry for taking so long to reply. My colleague was reprocessing the data and, to keep it consistent, he re-ran all the steps as he had done previously, without realising that the reads were already trimmed and merged.

I understand that rerunning the data may cause some trimming because of resemblance to the adapters. However, in the data from the example I was talking about, we have reads that were duplicates before rerunning AdapterRemoval: they have the same sequence and differ only in the quality of the first base at the 5' end. After we ran AdapterRemoval, one of the reads was trimmed (the example above) while the other was not. If that trimming were due to the base resembling the adapter, I would expect both reads to be trimmed, the one with low quality at the 5' end and the one with good quality at the 5' end. Is that correct, or is there another behaviour that I am not taking into account that could explain this?

Thank you for your time!

Best,

Aida

@MikkelSchubert
Owner

Dear Aida,

You are right that the T does not match the adapter sequence. Looking closer, the cause of the base being trimmed is the "--minquality 20" option. The T has a Phred-encoded quality score of "/", which corresponds to a quality of 14 (Phred+33), and because AdapterRemoval is treating the reads as SE data, the --preserve5p option does not stop it from trimming that base.

Best,
Mikkel
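The quality of the trimmed base can be checked by hand. The short sketch below decodes the quality string from the example read, assuming Phred+33 (offset 33) encoding; the helper name `phred33` is hypothetical. Under that assumption the character "/" decodes to 14, which is below the --minquality 20 threshold.

```python
def phred33(char):
    """Decode one FASTQ quality character, assuming a Phred+33 offset."""
    return ord(char) - 33


# Quality string of the example read before trimming (copied from the thread).
qual = "AAAAAAEEEEAEEEEEAEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEAEEEAEEEAEEEEE/EEEEEEEAEEEAAAAA/"

last_base_quality = phred33(qual[-1])
print(last_base_quality)        # 14
print(last_base_quality <= 20)  # True

# The other characters in the string decode to higher scores.
print(phred33("A"), phred33("E"))  # 32 36
```

The same "/" also appears once inside the read, but that base is kept because it is not at a terminus.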
