No trimming when no adapter is found #64

Closed
Maarten-vd-Sande opened this issue Jul 31, 2019 · 9 comments

Comments

@Maarten-vd-Sande

When no adapter is found, Trim Galore defaults to the Illumina adapter sequence and tries to remove it.

I have a pipeline that trims automatically using Trim Galore; however, people sometimes send me data that is already trimmed. Since I do not want to make assumptions about whether the data is trimmed or not, I still run it through Trim Galore. I was wondering whether there is an option I failed to find so that, if no adapter is found, quality control is still performed but adapter trimming is skipped.

I am trying to avoid manually checking if adapters exist and adding -a -X (#51)

@FelixKrueger
Owner

Hi Maarten,

I am afraid that, as it stands, Trim Galore is not set up to skip trimming altogether.

One option would of course be to do the checking externally. For example, I believe FastQC writes out a table with the counts for the Illumina, Nextera and small RNA adapters anyway, so one could in theory read that and base the trim/no-trim decision on it.

We could of course add an option that skips the adapter trimming (or sets -a X itself) if really absolutely no adapter is found. The question would then be, though: where do you draw the line? At really 0 / 0 / 0 counts for all three adapters? What happens if there is 1 count for one of them? Just to remind you, the auto-detection works on the first 1 million sequences, and the adapter sequences are 12-13 bp long, so very occasionally you might find one of the sequences in a read by chance, or because it occurs somewhere in the genome.

I could of course leave that thresholding problem up to you by adding an integer threshold count that you need to set yourself (anything between 0 and ?), e.g. --consider_already_trimmed [INT]. Is that something that would help in your case?
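
To make the thresholding idea above concrete, here is a minimal Python sketch of the external check, not Trim Galore's own code. It counts how many of the first reads contain each adapter and compares the counts against a user-chosen threshold. The adapter prefixes in `ADAPTERS` are the ones I understand Trim Galore's auto-detection to use, but treat them as an assumption and verify them against your version. The function names, and the reading that "already trimmed" means no count exceeds the threshold, are likewise assumptions for illustration.

```python
import gzip
import itertools

# Assumed 12-13 bp adapter prefixes; verify against your Trim Galore version.
ADAPTERS = {
    "Illumina": "AGATCGGAAGAGC",
    "Nextera": "CTGTCTCTTATA",
    "smallRNA": "TGGAATTCTCGG",
}


def count_adapters(fastq_path, max_reads=1_000_000):
    """Count reads containing each adapter among the first max_reads records."""
    counts = {name: 0 for name in ADAPTERS}
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        # A FASTQ record spans four lines; the sequence is the second one.
        for seq in itertools.islice(handle, 1, 4 * max_reads, 4):
            for name, adapter in ADAPTERS.items():
                if adapter in seq:
                    counts[name] += 1
    return counts


def looks_already_trimmed(counts, threshold):
    """Return True if no adapter count exceeds the user-chosen threshold."""
    return all(n <= threshold for n in counts.values())
```

With a threshold of 0, a single chance hit in one read is enough to trigger adapter trimming, which is exactly the edge case raised above; a small positive threshold tolerates such hits.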

@Maarten-vd-Sande
Author

Thanks for the fast reply.

I would be very happy with the solution you propose, and leaving the 'thresholding problem' to me/the user.

@FelixKrueger
Owner

Alright, let me see what I can do (probably tomorrow though).

@Maarten-vd-Sande
Author

No hurry, thanks a lot!

@FelixKrueger
Owner

Sorry for not being entirely truthful: I have now added the option --consider_already_trimmed INT, which should do pretty much exactly what we discussed. Could you clone the latest development version and see if it works in your hands? Addressed here: 0662279.

A description should come up with trim_galore --help.
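
For anyone calling this from a pipeline, here is a minimal sketch of how the new option might be wired into a Python wrapper. Only `--consider_already_trimmed` comes from this thread; the helper name `build_trim_galore_command`, the file name and the threshold of 100 are hypothetical placeholders, and `--paired` is included on the assumption that your Trim Galore version accepts it in this combination. The sketch only assembles the argument list and prints it; nothing is executed.

```python
import shlex


def build_trim_galore_command(fastq_files, already_trimmed_threshold, paired=False):
    """Assemble a Trim Galore argument list (hypothetical helper, runs nothing)."""
    cmd = [
        "trim_galore",
        "--consider_already_trimmed",
        str(int(already_trimmed_threshold)),  # the option takes an integer count
    ]
    if paired:
        cmd.append("--paired")
    cmd.extend(fastq_files)
    return cmd


# Placeholder file name and threshold, for illustration only.
cmd = build_trim_galore_command(["sample.fastq.gz"], 100)
print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would launch the tool. Keeping the arguments as a list rather than a shell string avoids quoting problems with unusual file names.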

@Maarten-vd-Sande
Author

Can't complain!
It seems to work as intended (tried on two samples, one above threshold and one below).

Thanks a lot

@Maarten-vd-Sande
Author

p.s. any idea when this will be 'released'?

@FelixKrueger
Owner

Thanks for the feedback. Given that there are some four changes already I suppose we could make a release soon. Let me just grab a coffee...

@FelixKrueger
Owner

Here we go, v0.6.4 has just been posted. Enjoy!

https://github.com/FelixKrueger/TrimGalore/releases/tag/0.6.4
