Limiting the search to the first n characters #709

Open
ramongallego opened this issue May 29, 2023 · 2 comments

Comments

@ramongallego

Hi there,

I was wondering if there is a way to limit the adapter search to the first n characters (or the last n characters) of each sequence. I find this particularly useful when demultiplexing: when there are many barcodes to match, one of them often matches somewhere in the middle of the read. Since many sequencing experiments produce reads with a known structure, the demultiplexing information can be expected within the first n characters, so restricting the search to those characters would make it both more precise and faster.
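The idea in the question can be illustrated with a small Python sketch. Python is chosen only because Cutadapt is a Python tool; `hamming` and `find_barcode_in_window` are hypothetical helpers, not part of Cutadapt's API. The sketch slides each barcode over only the first n (or last n) characters of a read, counts mismatches with a plain Hamming distance rather than the error-tolerant alignment Cutadapt performs, and declines to assign a read when two barcodes tie.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_barcode_in_window(
    read: str,
    barcodes: dict[str, str],
    n: int,
    max_mismatches: int = 0,
    from_end: bool = False,
) -> Optional[str]:
    """Return the name of the barcode that best matches somewhere inside the
    first n characters of `read` (or the last n if `from_end` is True).

    Returns None if no barcode matches within `max_mismatches`, or if two
    barcodes tie for the best match (ambiguous assignment).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Only this window is searched; the rest of the read is ignored.
    window = read[-n:] if from_end else read[:n]

    best: dict[str, int] = {}
    for name, seq in barcodes.items():
        # Slide the barcode over every start position that fits in the window.
        dists = [
            hamming(window[i:i + len(seq)], seq)
            for i in range(len(window) - len(seq) + 1)
        ]
        if dists and min(dists) <= max_mismatches:
            best[name] = min(dists)

    if not best:
        return None
    lowest = min(best.values())
    winners = [name for name, d in best.items() if d == lowest]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    barcodes = {"bc1": "ACGTACGT", "bc2": "TTGGCCAA"}
    # bc1 sits near the 5' end; bc2 happens to occur deeper in the read.
    read = "GG" + "ACGTACGT" + "T" * 20 + "TTGGCCAA" + "GG"
    print(find_barcode_in_window(read, barcodes, n=12))
    print(find_barcode_in_window(read, barcodes, n=len(read)))
    print(find_barcode_in_window(read, barcodes, n=12, from_end=True))
```

Here, searching the whole read (`n=len(read)`) finds both barcodes and the assignment becomes ambiguous, which is the failure mode described in the question; with `n=12` only `bc1` is seen, and with `from_end=True` only `bc2`.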

@marcelm
Owner

marcelm commented May 29, 2023

Barcodes in the data I have worked with were usually located directly at the 5' end, and Cutadapt offers anchored 5' adapters for these. If you need a bit more flexibility, you could use a non-internal 5' adapter with a couple of N characters at the beginning and set the minimum overlap so that only full occurrences of the barcode are allowed.

For example, if you have barcode ACGTACGT (length 8) and you want to allow up to 5 bases preceding it:

cutadapt -g 'XN{5}ACGTACGT;min_overlap=8' ...

Allowing a large number of N bases like this is slower than it could be, so please let me know if that becomes a bottleneck, and I can look into optimizing it.
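To make the behaviour of `XN{5}ACGTACGT;min_overlap=8` more concrete, here is an exact-match Python sketch of a non-internal 5' adapter under simplifying assumptions: `N` matches any base, no mismatches or indels are tolerated (Cutadapt does tolerate them, according to its error-rate settings), and when several overlaps fit, the sketch takes the shortest one. The helpers `expand_repeats`, `bases_match` and `trim_noninternal_5prime` are hypothetical illustrations, not Cutadapt internals. The read may only begin somewhere inside the adapter, with at least `min_overlap` bases overlapping; with 5 leading N bases and an 8-base barcode, that means the barcode must start at one of offsets 0 to 5 of the read.

```python
import re
from typing import Optional


def expand_repeats(pattern: str) -> str:
    """Expand run-length notation such as 'N{5}' into 'NNNNN'."""
    return re.sub(r"([A-Za-z])\{(\d+)\}",
                  lambda m: m.group(1) * int(m.group(2)), pattern)


def bases_match(adapter_base: str, read_base: str) -> bool:
    """An 'N' in the adapter acts as a wildcard for any read base."""
    return adapter_base == "N" or adapter_base == read_base


def trim_noninternal_5prime(read: str, adapter: str,
                            min_overlap: int) -> Optional[str]:
    """Exact-match sketch of a non-internal 5' adapter.

    The read may only start inside the adapter: its first k bases must equal
    the last k adapter bases, with min_overlap <= k <= len(adapter). Returns
    the read with those k bases removed, or None if no k fits.
    """
    # Try the shortest overlap first; with leading N bases this prefers
    # the barcode position closest to the 5' end.
    for k in range(min_overlap, len(adapter) + 1):
        if k > len(read):
            break
        tail = adapter[len(adapter) - k:]
        if all(bases_match(a, r) for a, r in zip(tail, read[:k])):
            return read[k:]
    return None


if __name__ == "__main__":
    # The 'X' marker is implied by the function itself; expand the rest.
    adapter = expand_repeats("N{5}ACGTACGT")  # 'NNNNNACGTACGT'
    for read in ["ACGTACGTTTTT",           # barcode at offset 0
                 "GGGGGACGTACGTTTTT",      # barcode at offset 5
                 "GGGGGGACGTACGTTTTT",     # barcode at offset 6
                 "T" * 20 + "ACGTACGT"]:   # internal occurrence
        print(read, trim_noninternal_5prime(read, adapter, min_overlap=8))
```

In this sketch, a read with six or more bases before the barcode, or with the barcode deeper in the read, is rejected. Dropping the N bases (adapter `ACGTACGT` with `min_overlap=8`) leaves only offset 0, which roughly mirrors what an anchored 5' adapter requires.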

@ramongallego
Author

ramongallego commented May 29, 2023 via email


2 participants