Random select merged reads #20

Closed
meglecz opened this issue Nov 22, 2021 · 0 comments
Comments

@meglecz
Collaborator

meglecz commented Nov 22, 2021

NovaSeq runs often produce far more reads than needed. According to my tests, 5-10 million reads per run-replicate (ca. 96 samples) are enough. Above this, the number of variants and the average number of variants and reads per sample no longer increase after VTAM filtering.
On the other hand, too many reads increase run time and can cause memory issues.
It would be nice to have either a separate command (run after merge) or an option in merge that randomly selects a user-defined number of reads from each output file of merge. The selected reads would then be the input of sortreads.
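
To make the request concrete, here is a minimal sketch of what such a random selection could look like. It is not existing VTAM code: the function names `read_fasta` and `sample_reads` are hypothetical, and it assumes the merged reads are stored as uncompressed FASTA files. It uses reservoir sampling, so each file is read only once and the total number of reads does not need to be known in advance.

```python
import random
from typing import Iterator, List, Optional, TextIO


def read_fasta(handle: TextIO) -> Iterator[str]:
    """Yield each FASTA record (header line plus sequence lines) as one string."""
    record: List[str] = []
    for line in handle:
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)


def sample_reads(in_path: str, out_path: str, n_reads: int,
                 seed: Optional[int] = None) -> int:
    """Write a uniform random sample of at most n_reads records.

    Hypothetical helper (assumption, not part of VTAM). Returns the
    number of records written, which is smaller than n_reads when the
    input file contains fewer records.
    """
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    reservoir: List[str] = []
    with open(in_path) as handle:
        for i, record in enumerate(read_fasta(handle)):
            if i < n_reads:
                # Fill the reservoir with the first n_reads records.
                reservoir.append(record)
            else:
                # Keep the new record with probability n_reads / (i + 1).
                j = rng.randint(0, i)
                if j < n_reads:
                    reservoir[j] = record
    with open(out_path, "w") as out:
        for record in reservoir:
            # Guard against a missing newline at the end of the input file.
            out.write(record if record.endswith("\n") else record + "\n")
    return len(reservoir)
```

A call such as `sample_reads("merged.fasta", "merged_sampled.fasta", 5_000_000, seed=42)` (file names are placeholders) would be run once per merge output file. The sketch keeps all selected records in memory and writes them in reservoir order rather than input order, so for very large selections a two-pass approach that picks record indices first may be preferable.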
