Seq error correction after dedup #23
Hi Perch, we just output a single "best" read. This read should match the consensus, since it will have the highest count. If you really want to retain all the duplicates but have them marked in some way so you can manually derive the consensus, I guess we could add this as an option?
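To illustrate the idea that the single output read carries the most abundant UMI, here is a minimal sketch. It is not the project's code. The helper name `pick_representative` is hypothetical, and it assumes that reads whose UMIs differ only by PCR/sequencing errors have already been clustered together.

```python
from collections import Counter


def pick_representative(umis: list[str]) -> str:
    """Return the most frequent UMI within one cluster of reads.

    Hypothetical helper: assumes the reads have already been clustered so
    that UMIs differing only by PCR/sequencing errors share one cluster.
    """
    counts = Counter(umis)
    # most_common(1) returns [(umi, count)]; ties keep first-seen order
    umi, _ = counts.most_common(1)[0]
    return umi


print(pick_representative(["ATCG", "ATCG", "ATCA", "ATCG"]))  # ATCG
```

Because the erroneous UMI variants are rarer than the true one, the highest-count UMI is usually also the consensus UMI. Per-base errors elsewhere in that one read are not corrected, though.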
That sounds good.
Hi guys, I'm a bit worried about what this might do to memory usage. We keep a buffer of reads to ensure that we get all the reads from a region before outputting, because we use the start of the read in read orientation rather than genome orientation (i.e. for a read on the reverse strand we care about the 3'-most coordinate, not the BAM POS field). At the moment we only retain the representative read. If we keep all reads, memory usage could be hundreds of times higher. We could still add this as an option, but with a warning that memory usage might shoot through the roof.
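The buffering described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the tool's implementation. The `Read` record, `read_start` and `buffered_groups` are hypothetical names. Soft clipping is ignored, and reads are assumed to arrive sorted by BAM POS. The key point is that a reverse-strand read "starts" at its alignment end. A bucket of reads sharing a start coordinate s can therefore only be flushed once a read with POS > s appears, because every read's start is at or after its own POS.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Read:
    # Hypothetical minimal record; real values would come from the BAM
    # alignment (e.g. leftmost position and aligned reference length).
    name: str
    pos: int          # 0-based leftmost aligned coordinate (BAM POS)
    aligned_len: int  # reference bases covered (soft clipping ignored)
    is_reverse: bool


def read_start(read: Read) -> int:
    """5' start in read orientation: POS for forward reads,
    the 3'-most reference coordinate for reverse-strand reads."""
    if read.is_reverse:
        return read.pos + read.aligned_len - 1
    return read.pos


def buffered_groups(reads):
    """Yield (start, is_reverse, reads) buckets from POS-sorted reads.

    A bucket with start s is complete once a read with POS > s is seen:
    every later read starts at or after its own POS, which is > s.
    Note this keeps *every* read per bucket, which is the memory cost
    of retaining all duplicates instead of one representative.
    """
    buffer = defaultdict(list)
    for read in reads:
        # collect finished keys first, then pop, so we never mutate
        # the dict while iterating over it
        done = sorted(k for k in buffer if k[0] < read.pos)
        for key in done:
            yield key[0], key[1], buffer.pop(key)
        buffer[(read_start(read), read.is_reverse)].append(read)
    for key in sorted(buffer):
        yield key[0], key[1], buffer.pop(key)


reads = [
    Read("a", 100, 50, False),
    Read("b", 100, 50, False),
    Read("c", 120, 50, True),   # starts at 169
    Read("d", 160, 50, True),   # starts at 209
    Read("e", 200, 50, False),
]
for start, rev, group in buffered_groups(reads):
    print(start, rev, [r.name for r in group])
```

Keeping only one representative per bucket would cap memory at one read per start coordinate. Keeping whole lists grows with the duplication level, which is the concern raised above.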
The group command can now be used to group reads by their UMI for downstream processing, such as deriving a consensus to correct for PCR/sequencing errors.
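As one possible downstream step, a per-position majority vote can be run over the reads of each UMI group. This is a deliberately simple stand-in, not a method provided by the project. The `consensus` helper and the `groups` mapping are hypothetical. The sketch assumes the reads in a group have already been extracted from the group command's output, share the same start, and have been trimmed to a common length. Base qualities and indels are ignored.

```python
from collections import Counter


def consensus(seqs: list[str]) -> str:
    """Per-position majority-vote consensus of equal-length sequences.

    Assumes reads in one UMI group share a start and a common length;
    ties go to the base encountered first. Base qualities are ignored.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    # zip(*seqs) walks the sequences column by column
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*seqs)
    )


# Hypothetical mapping of group ID -> read sequences for that group
groups = {0: ["ACGT", "ACGT", "ACTT"], 1: ["GGCA"]}
print({gid: consensus(seqs) for gid, seqs in groups.items()})
```

A production approach would more likely weight each base by its quality score and handle reads of different lengths. The majority vote only shows the principle.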
I was wondering: how do you merge the duplicate reads, or do you simply discard them? I would be interested in looking at the full sequences and possibly deriving a consensus to correct for PCR/sequencing errors.