Computing a consensus sequence #181
Comments
hi @AK-WIMM. Yes, this should be possible, although it will require some straightforward additional bespoke code. You can use the By the way, are you working in the WIMM, Oxford by any chance? |
Hi. Thanks for getting back to me and yes I am based in the WIMM, Oxford. |
Hi @AK-WIMM. I believe Nils Kölling (@koelling) is still at the WIMM and this is exactly what he was using UMI-tools for if I remember correctly |
@AK-WIMM & @koelling - If either or both of you want to help us add consensus sequence identification to UMI-tools, this is something I'd be happy to assist with. |
At the moment I am not actually outputting any consensus sequences. However, I am making a "consensus call" for each UMI group at each position in the genome, which I then use to calculate allele fractions. I was indeed planning on extending this to generate a BAM file with consensus sequences though, to help with visualisation and integrating this with downstream tools. If @AK-WIMM is who I think it is then we will have a meeting soon to discuss this anyway. If not, please let me know! :) |
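To make the "consensus call per UMI group, then allele fractions" idea above concrete, here is a minimal sketch for a single genomic position. It is not the code from this thread: the function name `allele_fractions`, the `(umi_group_id, base)` input shape, and the choice to skip groups with a tied vote are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def allele_fractions(observations):
    """Allele fractions at one position from (umi_group_id, base) pairs.

    Hypothetical helper: each UMI group contributes one majority-vote
    call, and fractions are computed over groups rather than raw reads.
    """
    # Count the bases seen within each UMI group.
    by_group = defaultdict(Counter)
    for group_id, base in observations:
        by_group[group_id][base] += 1

    # One consensus call per group; groups with a tied vote are skipped
    # (an assumption, other tie-breaking rules are possible).
    calls = Counter()
    for counts in by_group.values():
        top = counts.most_common(2)
        if len(top) > 1 and top[0][1] == top[1][1]:
            continue
        calls[top[0][0]] += 1

    total = sum(calls.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in calls.items()}


if __name__ == "__main__":
    obs = [(1, "A"), (1, "A"), (1, "G"), (2, "G"), (3, "A"), (4, "A"), (4, "G")]
    print(allele_fractions(obs))
```

Counting groups instead of reads means a heavily amplified molecule contributes one vote, which is the point of using UMIs here.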
I've been looking into doing this in my local pipeline and now have an early prototype that calculates consensus sequences given a set of UMI-tagged reads. The algorithm is quite simple (basically a majority vote at every nucleotide) but it seems to work well enough and runs very fast. I ended up writing this as a separate script that just uses UMI-tools for the UMI grouping, since that was a lot easier for me to implement. However, I would think that it makes sense to add something like this to UMI-tools too, since there is clearly some demand for this. I think it would also be quite straightforward to add this to the I would imagine you could just add a I think this would be quite straightforward to implement, but it would probably be easier if one of you adds this, since I'm not very familiar with the codebase. I'd be happy to share my consensus calling code though! |
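For readers wondering what "a majority vote at every nucleotide" might look like, the following is a rough sketch and not the prototype described above. It assumes the reads in a UMI group are already aligned to the same start and have equal length (so no indels), and it emits `N` on ties; the name `majority_consensus` and the `ambiguous` parameter are hypothetical.

```python
from collections import Counter


def majority_consensus(reads, ambiguous="N"):
    """Per-position majority-vote consensus for equal-length reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        # Indels are out of scope for this sketch.
        raise ValueError("reads must all have the same length")

    consensus = []
    # zip(*reads) yields one column of bases per position.
    for column in zip(*reads):
        top = Counter(column).most_common(2)
        if len(top) > 1 and top[0][1] == top[1][1]:
            # The two most common bases are tied: no clear majority.
            consensus.append(ambiguous)
        else:
            consensus.append(top[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    group = ["ACGT", "ACGA", "ACGT"]
    print(majority_consensus(group))
```

A real implementation would also need to consider base qualities, soft clipping, and reads with different alignment coordinates within a group, none of which is handled here.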
I'm also interested in this feature, and would be happy to see it implemented in UMI-tools! |
@IanSudbery - My preference would be to have a separate command to perform consensus calling since I consider this to be a distinct task. What do you think? Internally, it's essentially just the same as group but each group is reduced to a consensus read sequence prior to writing out. EDIT - Of course it's not that simple. |
We would also be happy to help test the consensus calling functionality.
Cheers,
dave
--
---
David O'Connor, Ph.D.
Professor, UW-Madison
(608) 301-5710 • @dho
http://labs.pathology.wisc.edu/oconnor
|
Hi all, Just out of curiosity, what is the current status of this issue? I am finding ways to utilize this as well. Thanks! |
After some consideration, we've decided not to implement consensus sequence computation within UMI-tools for the time being. We're hoping to release |
Hey all, I'm interested in implementing consensus functionality in UMITools. Thoughts:
Let me know if I should open a new issue for this project, and if anyone else would like to get involved. Thanks! |
With regard to the second point, this is a long standing issue with the way UMITools deals with pairs. In the group command, all read 2s are just output, unaltered, as soon as they are encountered. Adding grouping information to read2 would require knowing the group of the corresponding read1 when dealing with the read2. This means either:
All this is complicated by multi-mappers - there may be more than one read2 for each read1, even though read1 only points to one of them (and vice versa). Currently I do have a potential solution:
This way we keep the read1s in genome order, but keep the read2s next to them. I've had this idea for a while, but never had chance to implement. |
This is currently customisable - you can either choose to keep or discard unpaired reads. In dedup and count, in paired mode, unpaired read2s will always be discarded, but read1s can be kept or discarded depending on configuration. |
Hi, |
Unfortunately I don't think either Tom or I have the capacity to deal with this at the moment. It is on the list of improvements to make if we ever secure funding to maintain UMI-tools. I have to say, though, that we hadn't considered the question of indels, which would take the problem from being computationally trivial to being highly demanding (probably the reason others don't do it), although it would also be a motivating factor for why such a development is necessary. |
Hey Nils ! Would you be so kind and share your script ? |
Closing due to inactivity |
Hi,
I would like to use UMI-tools to compute a consensus sequence from a 'read group'. Is this modification possible?
Thanks!