Feature request: ONT Chimeric read splitting #747

Open
rhpvorderman opened this issue Dec 12, 2023 · 1 comment

Comments

@rhpvorderman
Collaborator

I am currently looking into how well cutadapt handles ONT (Oxford Nanopore) data, and it seems that the most basic functionality can be achieved. Unfortunately, even after the adapters have been adequately cut, sequali still finds adapter sequences.

These are most likely due to chimeric reads, in which two or more reads are joined by adapter sequences. Such reads should be split. With the newest chemistry, the proportion of chimeric reads is estimated at 10% (previously around 2%). Chimeric reads are not always split by the sequencing provider, and historic data may still contain that roughly 2% of chimeric reads because splitting was not available back then.

Since cutadapt already has a decent alignment algorithm that can detect sequences anywhere in the read, it should be possible to write a routine that detects chimeric reads.
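To illustrate the idea, here is a minimal pure-Python sketch of detecting an adapter in the interior of a read. It uses a plain semi-global edit distance, not cutadapt's actual alignment code. The names `best_adapter_match` and `is_chimeric`, the placeholder adapter sequence, and the `max_error_rate` and `margin` defaults are all assumptions made up for this example.

```python
def best_adapter_match(read: str, adapter: str) -> tuple[int, int]:
    """Return (errors, end) for the best approximate adapter occurrence.

    Semi-global edit distance: the whole adapter must be aligned, but the
    match may start at any read position at no cost.
    """
    m = len(adapter)
    # column[i] = fewest edits aligning adapter[:i] to a read substring
    # that ends at the current read position.
    column = list(range(m + 1))
    best_errors, best_end = m, 0
    for j, base in enumerate(read, start=1):
        prev_diag = column[0]  # always 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if adapter[i - 1] == base else 1
            new = min(prev_diag + cost, column[i] + 1, column[i - 1] + 1)
            prev_diag = column[i]
            column[i] = new
        if column[m] < best_errors:
            best_errors, best_end = column[m], j
    return best_errors, best_end


def is_chimeric(read: str, adapter: str,
                max_error_rate: float = 0.2, margin: int = 5) -> bool:
    """Hypothetical rule: a good adapter match away from both read ends."""
    errors, end = best_adapter_match(read, adapter)
    if errors > max_error_rate * len(adapter):
        return False
    approx_start = end - len(adapter)
    return approx_start >= margin and end <= len(read) - margin


ADAPTER = "TTACGTATTGCTAAG"  # placeholder, not a real ONT adapter
read = "ACGGATCCAT" + ADAPTER + "CCGTAGGATC"
print(best_adapter_match(read, ADAPTER))
print(is_chimeric(read, ADAPTER))
```

A real implementation would also have to check the reverse complement of the adapter and handle more than one adapter occurrence per read.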

The hard part, I guess, will be the actual splitting, where one read becomes two or more reads that are fed back into the pipeline. I can imagine that this was not a consideration when cutadapt was designed.
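As a rough sketch of what the splitting itself could look like, assuming the adapter positions are already known: the `Read` class, the `split_read` function, and the `_1`, `_2` name suffixes below are hypothetical and not part of cutadapt. Feeding the resulting reads back into the pipeline is the part this sketch does not address.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    sequence: str
    qualities: str


def split_read(read: Read, adapter_spans: list[tuple[int, int]],
               min_length: int = 1) -> list[Read]:
    """Cut out the (start, stop) adapter spans and return the remaining pieces.

    Spans are half-open intervals on the read sequence. Pieces shorter than
    min_length (including empty ones) are dropped.
    """
    segments = []
    start = 0
    for span_start, span_stop in sorted(adapter_spans):
        segments.append((start, span_start))
        start = max(start, span_stop)
    segments.append((start, len(read.sequence)))

    kept = [(a, b) for a, b in segments if b - a >= min_length]
    return [
        Read(f"{read.name}_{n}", read.sequence[a:b], read.qualities[a:b])
        for n, (a, b) in enumerate(kept, start=1)
    ]


chimera = Read("r1", "AAAA" + "NNN" + "CCCCC", "IIII" + "###" + "FFFFF")
for piece in split_read(chimera, [(4, 7)]):
    print(piece.name, piece.sequence, piece.qualities)
```

Keeping sequence and qualities sliced with the same coordinates is the main point here; how the new read names should be chosen is an open design question.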

@rhpvorderman
Collaborator Author

I did some thinking and research. The best way to approach this is as follows:

  1. Publish the user guide with the current cutadapt code. Chimeric reads are detected using adapter detection and thrown away with --discard.
  2. Make a dedicated read splitter. Rather than actually splitting the read, it presents the longest segment as canonical.
  3. Look at how read splitting can be incorporated into the cutadapt single-end pipeline.

Step 3 is quite challenging, but by following these steps in order, cutadapt will already be useful for nanopore data with chimeric reads at step 1, without requiring extra code.
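For step 2, choosing the longest segment could be as simple as the following sketch. The function name `longest_adapter_free_segment` is made up for illustration, and it assumes the adapter positions have already been found as half-open (start, stop) intervals.

```python
def longest_adapter_free_segment(
    read_length: int, adapter_spans: list[tuple[int, int]]
) -> tuple[int, int]:
    """Return (start, stop) of the longest stretch not covered by an adapter.

    On ties, the segment closest to the start of the read wins.
    """
    best = (0, 0)
    start = 0
    # A zero-length sentinel span at the read end closes the final segment.
    for span_start, span_stop in sorted(adapter_spans) + [(read_length, read_length)]:
        if span_start - start > best[1] - best[0]:
            best = (start, span_start)
        start = max(start, span_stop)
    return best


print(longest_adapter_free_segment(100, [(30, 45)]))
```

Because each read still produces exactly one output read, this fits a one-read-in, one-read-out pipeline, which is presumably why it is easier than true splitting.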
