Methylation per read #56
I suppose something equivalent to how
Out of curiosity, what sort of project would benefit from something like this?
The output from Bismark was a good format for us: a string the same length as the read sequence, with a letter at each C indicating its context and methylation status (a different letter per context, with upper or lower case indicating status). But even a ratio of (CpG) methylation per read would be useful.
Research is ongoing and still unpublished, but hopefully I’ll be able to share more about the application soon.
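For reference, the Bismark string described above (the XM tag) uses one character per read base: `z`/`Z` for CpG, `x`/`X` for CHG, `h`/`H` for CHH, `u`/`U` for unknown context, and `.` for positions that are not cytosines, with uppercase meaning methylated. The minimal Python sketch below reduces such a string to per-context counts and a per-read CpG methylation ratio. The function name `summarize_xm` is hypothetical; it is not part of Bismark or MethylDackel.

```python
from collections import Counter

# Bismark XM letters: lowercase = unmethylated, uppercase = methylated.
CONTEXTS = {"z": "CpG", "x": "CHG", "h": "CHH", "u": "unknown"}


def summarize_xm(xm: str) -> dict:
    """Count methylated/unmethylated calls per context in one read's XM string."""
    counts = Counter()
    for ch in xm:
        context = CONTEXTS.get(ch.lower())
        if context is None:
            continue  # '.' marks a read position that is not a cytosine
        state = "meth" if ch.isupper() else "unmeth"
        counts[(context, state)] += 1
    meth = counts[("CpG", "meth")]
    informative = meth + counts[("CpG", "unmeth")]
    return {
        "counts": dict(counts),
        "cpg_informative": informative,
        # None rather than 0.0 when the read covers no CpG, so "no evidence"
        # stays distinguishable from "fully unmethylated".
        "cpg_ratio": meth / informative if informative else None,
    }
```

For example, `summarize_xm("..Z...z..h.X..Z")` reports 3 informative CpGs, 2 of them methylated.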
Would it be better for me to modify the mbias code to extract this information instead?
@nchernia Actually, I just double-checked and I use a pileup there as well. I can hack something together easily enough to do this, but do you really need it to be multithreaded? It's easy enough to write a single-threaded application to do this (one could even use Python with pysam), but it'll take a lot more time to write it to handle threads. Could you privately send me the basic reason for wanting to do this? (I've modified versions of this for in-the-pipeline commercial products before, so I'm used to not blabbing about others' projects.) There almost has to be a more efficient way than going over a gigantic text file with read names and either a bunch of
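As a rough illustration of the single-threaded pysam route mentioned above, here is a sketch of the CpG call for one read. It consumes `(query_pos, ref_pos)` pairs in the form returned by pysam's `AlignedSegment.get_aligned_pairs(matches_only=True)`, so the core logic can be tested without a BAM file. Assumptions, all mine and not MethylDackel's implementation: the caller already knows whether the read derives from the original top strand (the CpG's C reads as C if methylated, T if converted) or the original bottom strand (the CpG's G reads as G if methylated, A if converted), and `ref_seq` is the uppercase contig sequence in 0-based coordinates. `call_cpgs_in_read` is a hypothetical name.

```python
from typing import Iterable, Tuple


def call_cpgs_in_read(
    query_seq: str,
    aligned_pairs: Iterable[Tuple[int, int]],
    ref_seq: str,
    top_strand: bool,
) -> Tuple[int, int]:
    """Return (methylated, unmethylated) CpG calls for one bisulfite read.

    aligned_pairs: (query_pos, ref_pos) tuples, e.g. from pysam's
    AlignedSegment.get_aligned_pairs(matches_only=True).
    ref_seq: uppercase contig sequence, indexed with 0-based ref_pos.
    """
    meth = unmeth = 0
    for qpos, rpos in aligned_pairs:
        base = query_seq[qpos].upper()
        if top_strand:
            # Read base over the C of a forward-strand CpG.
            if ref_seq[rpos:rpos + 2] != "CG":
                continue
            if base == "C":
                meth += 1
            elif base == "T":
                unmeth += 1
        else:
            # Read base over the G of a CpG (the C on the reverse strand).
            if rpos < 1 or ref_seq[rpos - 1:rpos + 1] != "CG":
                continue
            if base == "G":
                meth += 1
            elif base == "A":
                unmeth += 1
    return meth, unmeth
```

With pysam, you would loop over `pysam.AlignmentFile(path).fetch(...)`, load the contig with `pysam.FastaFile(...).fetch(contig).upper()`, and pass `read.query_sequence` and `read.get_aligned_pairs(matches_only=True)`. Deciding `top_strand` from the flags depends on the library type and is left to the caller.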
FYI, the perRead stuff is now going into the
Thanks! I will give it a try very soon.
I've been working with this function, thanks. One question: should I expect reads to appear more than once in the output if they align in multiple places?
Good question. You should see each instance, since it doesn't use
I can confirm that you should see each instance. If you run
Yes, I should have clarified that I do see multiple instances; I was just double-checking that I understood how it worked. And yes, I need --ignoreFlags and --requireFlags, though this isn't urgent.
Both those options should now be supported (they both work properly when I test them locally). |
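The --ignoreFlags/--requireFlags behaviour discussed above amounts to two bitmask tests on the SAM FLAG field: keep an alignment only if it has every bit in the require mask and none of the bits in the ignore mask. A hedged sketch of that logic follows. The bit constants are the standard SAM FLAG values; the function name and the example mask are assumptions, so check the perRead usage text for the tool's actual defaults.

```python
# Standard SAM FLAG bits (SAM/BAM format specification).
SECONDARY = 0x100      # not the primary alignment
QCFAIL = 0x200         # fails platform/vendor quality checks
DUPLICATE = 0x400      # PCR or optical duplicate
SUPPLEMENTARY = 0x800  # supplementary (chimeric) alignment


def keep_alignment(flag: int, ignore_flags: int = 0, require_flags: int = 0) -> bool:
    """Keep a record only if it has all required bits and none of the ignored bits."""
    if flag & ignore_flags:
        return False
    return (flag & require_flags) == require_flags


# Hypothetical mask: drop secondary, QC-fail, duplicate and supplementary records,
# so a multi-mapping read is reported only at its primary alignment.
IGNORE = SECONDARY | QCFAIL | DUPLICATE | SUPPLEMENTARY
```

For example, `keep_alignment(0x100, IGNORE)` is `False` (a secondary alignment), while `keep_alignment(0x41, require_flags=0x40)` is `True` (a paired read flagged as first in pair).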
Hi, perRead seems to include MAPQ 0 reads. These are filtered out by default in "extract", right?
I'm a little confused. In the usage for extract it says: So I thought reads with MAPQ < 10 were not included in extract. Is that correct? I had assumed perRead would have the same defaults. For me it's not strictly necessary, but it may be useful for others.
You're correct regarding
Thanks; whatever you think is best. Just one more clarification question: perRead looks only at CpG methylation, right? Not CHG or CHH (unless these are passed in)?
Correct, it's just CpG at this point. I'll try to get MAPQ and base quality filtering added today. |
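A minimal sketch of what MAPQ and base-quality filtering could look like in a per-read tool. The MAPQ cutoff of 10 follows the extract default mentioned above; the base-quality cutoff of 5 and the name `usable_pairs` are assumptions for illustration. In pysam, the inputs correspond to `read.mapping_quality`, `read.query_qualities`, and `read.get_aligned_pairs(matches_only=True)`.

```python
from typing import List, Sequence, Tuple


def usable_pairs(
    mapq: int,
    base_quals: Sequence[int],
    aligned_pairs: Sequence[Tuple[int, int]],
    min_mapq: int = 10,
    min_phred: int = 5,
) -> List[Tuple[int, int]]:
    """Drop the whole read if its MAPQ is too low, otherwise drop low-quality bases.

    base_quals: per-base Phred scores, indexed by query position.
    aligned_pairs: (query_pos, ref_pos) tuples.
    """
    if mapq < min_mapq:
        return []  # the read contributes no calls at all
    return [(q, r) for q, r in aligned_pairs if base_quals[q] >= min_phred]
```

Note that a read can pass the MAPQ test and still lose every base to the quality filter, leaving it with zero informative CpGs.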
Question: sometimes the output is "readname chr position 0 0". Why does it output "no evidence"? Is it just seeing that the read is in the appropriate region but doesn't cover a CpG?
My guess is that this happens when all of the bases are ignored due to quality filtering, but I'm not sure offhand.
Hello, would it also be possible to add CHG and CHH options for perRead?
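Supporting CHG and CHH mostly means classifying each reference cytosine by the two bases that follow it on the same strand (H stands for A, C, or T). A minimal forward-strand sketch; the function is hypothetical, and a reverse-strand version would apply the same rule to the reverse complement.

```python
def cytosine_context(ref_seq: str, pos: int) -> str:
    """Classify the forward-strand base at 0-based `pos` as CpG, CHG or CHH.

    Returns "" if the base is not a C, and "unknown" when the context runs off
    the end of the sequence or contains an ambiguous base such as N.
    """
    window = ref_seq[pos:pos + 3].upper()
    if not window or window[0] != "C":
        return ""
    nxt = window[1:]
    if nxt[:1] == "G":
        return "CpG"
    if len(nxt) < 2 or any(b not in "ACGT" for b in nxt):
        return "unknown"
    return "CHG" if nxt[1] == "G" else "CHH"
```

For example, the first C of "CCG" is in CHG context, while the second is a CpG.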
Hi there! Thanks for making this! I just wanted to ask about column 5 of the output, the number of informative bases. Is it the sum of the methylated and unmethylated Cs in CpGs in a given read?
@amatthopkins Yes, exactly. |
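To make the column 5 answer concrete: the informative count is methylated plus unmethylated CpG calls, and the methylation value is the methylated fraction of that count. The sketch below assembles one tab-separated line under a hypothetical column layout (name, chromosome, position, fraction, informative count). The exact formatting of the real perRead output is not confirmed here, and printing "NA" for reads with no CpG evidence is a design choice of this sketch. It avoids both a division by zero and a fraction of 0 that looks like full demethylation.

```python
def per_read_line(read_name: str, chrom: str, pos: int, meth: int, unmeth: int) -> str:
    """Format one per-read record (hypothetical layout, tab-separated)."""
    informative = meth + unmeth  # column 5: methylated + unmethylated CpG calls
    if informative == 0:
        fraction = "NA"  # no CpG evidence in this read
    else:
        fraction = f"{meth / informative:.4f}"
    return "\t".join([read_name, chrom, str(pos), fraction, str(informative)])
```

For example, a read with 3 methylated and 1 unmethylated CpG gets a fraction of 0.7500 and an informative count of 4.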
Thank you for this tool; it's been really nice to work with (it runs very fast and the code is very readable).
We are working on a project for which we need to know the methylation per read. In Bismark, this is given by the XM tag, which is essentially a string that represents, for every cytosine covered, its context and whether or not it is methylated.
I'm currently getting this info via print statements at lines 424-426 of extract.c and then processing the resulting text files, but obviously this is less than ideal (multiple lines are printed per read name and then need to be combined, the files are big, etc.). Is there any way to make this an option in MethylDackel? Alternatively, do you have suggestions for a better way to gather this info on a per-read basis?
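Until a per-read mode exists, the "multiple lines per read name" step of this workaround can be done in a single streaming pass. The format of the printed lines from extract.c isn't shown here, so this sketch assumes a hypothetical tab-separated layout of read name, chromosome, position, and a call of `M` or `U`. Counts are kept per read name, so memory grows with the number of distinct reads rather than the number of lines. A read aligned in several places would be merged under one name, so key on (name, chromosome) if that matters.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def aggregate_calls(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Collapse per-CpG lines into [methylated, unmethylated] counts per read.

    Assumed input per line: read_name, chrom, pos, call ("M" or "U"), tab-separated.
    """
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        read_name, call = fields[0], fields[3]
        if call == "M":
            counts[read_name][0] += 1
        elif call == "U":
            counts[read_name][1] += 1
    return dict(counts)
```

Usage: `aggregate_calls(open("calls.txt"))` returns a dict from read name to `[methylated, unmethylated]`, from which the per-read ratio follows directly.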
Thanks!