Seq error correction after dedup #23

Closed
peterch405 opened this issue Apr 22, 2016 · 4 comments

Comments

@peterch405
Contributor

I was wondering, how do you merge the duplicate reads, or do you simply discard them? I would be interested in looking at the full sequences and possibly deriving a consensus to correct for PCR/sequencing errors.

@TomSmithCGAT
Member

Hi Perch, we just output a single "best" read. This read should be the consensus since it will have the highest counts.

If you really want to retain all the duplicates but have them marked in some way so you can manually derive the consensus, I guess we could add this as an option?

@peterch405
Contributor Author

That sounds good.

@IanSudbery
Member

Hi Guys, I'm a bit worried about what this might do to the memory usage. We keep a buffer of reads to ensure that we get all the reads from a region before outputting (because we use the start of the read in read orientation, rather than genome orientation - i.e. for a read on the reverse strand we care about the 3'-most coordinate, not the BAM pos field). At the moment we only retain the representative read. If we keep all reads, we could have hundreds of times more memory usage. We could still add this as an option, but with a warning that the memory usage might shoot through the roof.
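To illustrate the orientation point, here is a minimal sketch of how a read's start could be picked depending on strand. The function name `read_start` and its parameters are hypothetical and not taken from the tool's code; the sketch also ignores soft-clipping, which a real implementation would probably need to handle.

```python
def read_start(pos, ref_end, is_reverse):
    """Return the coordinate that defines a read's start in read orientation.

    pos        -- 0-based leftmost aligned coordinate (the BAM pos field)
    ref_end    -- 0-based coordinate one past the rightmost aligned base
    is_reverse -- True if the read maps to the reverse strand

    Hypothetical helper for illustration only; soft-clipping is ignored.
    """
    # Forward reads start at their leftmost base. Reverse reads start at
    # their rightmost genomic base, which is not what the pos field stores.
    return ref_end - 1 if is_reverse else pos


forward = read_start(pos=100, ref_end=150, is_reverse=False)
reverse = read_start(pos=100, ref_end=150, is_reverse=True)
print(forward, reverse)
```

Because a BAM file is sorted by the pos field, two reverse-strand reads with the same start under this definition can sit some distance apart in the file, which is why reads have to be buffered before a region can be written out.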

@TomSmithCGAT
Member

The group command can now be used to group reads by their UMI for downstream processing, such as deriving a consensus to correct for PCR/sequencing errors.
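As a rough idea of what that downstream step might look like, the sketch below takes reads that have already been grouped and derives a per-position majority-vote consensus for each group. It does not parse the output of the group command; the `(umi, sequence)` input format and the function names are assumptions made for illustration, and it assumes all reads in a group have the same length and are grouped by exact UMI match only.

```python
from collections import Counter, defaultdict


def consensus(seqs):
    """Per-position majority vote over equal-length sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*seqs)
    )


def consensus_by_umi(reads):
    """Map each UMI to the consensus of its reads.

    reads -- iterable of (umi, sequence) pairs; a hypothetical input
             format, not the format written by the group command.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return {umi: consensus(seqs) for umi, seqs in groups.items()}


reads = [
    ("AACG", "ACGTAC"),
    ("AACG", "ACGTAC"),
    ("AACG", "ACCTAC"),  # one mismatch at the third base
    ("TTGA", "GGGTTT"),
]
result = consensus_by_umi(reads)
print(result)
```

A simple majority vote ignores base qualities; weighting each vote by quality score would be a natural refinement if the error correction needs to be more careful.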


3 participants