Methylation per read #56

nchernia · 2018-02-28T16:16:11Z

Thank you for this tool - it's been really nice to work with (runs very fast and the code is very readable).

We are working on a project for which we need to know the methylation per read. In Bismark, this is given via the XM tag, where there is essentially a string that represents (for every cytosine covered), the context and whether or not it is methylated.

I'm currently getting this info by print out statements in lines 424-426 in extract.c and then processing the resulting text files but obviously this is less than ideal (multiple lines printed per read name that then need to be combined, files are big, etc). Is there any way to make this an option in MethylDackel? Alternatively, do you have suggestions for a better way to gather this info on a per read basis?

Thanks!

dpryan79 · 2018-02-28T19:37:14Z

I suppose something equivalent to how MethylDackel mbias works internally would work here as well. There, instead of the status for each C being written to the output, it's stored in a vector according to whether it's C or T. What sort of output format are you looking for with this?

Out of curiosity, what sort of project would benefit from something like this?

nchernia · 2018-02-28T19:42:41Z

The output from bismark was a good format for us - string the same length as the sequence string with a letter where the Cs are indicating context and methylation status. Different letter per context, upper or lower case indicates status. But even a ratio of (CpG) methylation per read would be useful. Research is ongoing and still unpublished but hopefully I’ll be able to share more about the application soon.

nchernia · 2018-02-28T19:57:07Z

Would it be better for me to modify the mbias code instead to extract this information?

dpryan79 · 2018-02-28T20:48:40Z

@nchernia Actually, I just double checked and I use a pileup there as well. I can hack something together easily enough to do this, but do you really need it to be multithreaded? It's easy enough to write a single-threaded application to do this (one could even use python with pysam), but it'll take a lot more time to write it to handle threads. Could you privately send me the basic reason for wanting to do this (I've modified versions of this for in-the-pipeline commercial products before, so I'm used to not blabbing about other's projects)? There almost has to be a more efficient way than going over a gigantic text file with read names and either a bunch of ....h..H... or percentages.

dpryan79 · 2018-03-09T15:58:42Z

FYI, the perRead stuff is now going into the perRead branch. One can now MethylDackel perRead ... with that and it seems not to crash. But I haven't really tested it yet, so hold off on trying it unless you're a glutton for punishment. It's written to be multithreaded, though I don't think the actual options for that are there at the moment.

dpryan79 · 2018-03-20T13:50:03Z

The perRead branch seems to produce correct results now. Multithreading works properly and I added some other rudimentary options. It should be easy enough to add other options in. Let me know if you need anything else from that or need the output in a different format.

nchernia · 2018-03-20T20:24:02Z

Thanks! I will give it a try very soon.

nchernia · 2018-04-27T16:05:15Z

I've been working with this function, thanks. One question - I should expect the reads to appear more than once in the output if they align in multiple places, right?

dpryan79 · 2018-04-27T20:26:44Z

Good question. You should see each instance, since it doesn't use filter_func(). I'll try to make a short test of that, though.

dpryan79 · 2018-04-27T20:37:46Z

I can confirm that you should see each instance. If you run MethylDackel perRead tests/cg100.fa tests/cg_aln.bam in the source tree you'll see each instance of read1 in the output. I should add a method to filter reads, though. If you need --ignoreFlags or --requireFlags now then let me know.

nchernia · 2018-04-27T20:40:41Z

Yes, I should have clarified that I do see multiple instances, just double checking that I understood how it worked. Yes, I need --ignoreFlags and --requireFlags though this isn't urgent.

dpryan79 · 2018-04-27T20:56:56Z

Both those options should now be supported (they both work properly when I test them locally).

nchernia · 2018-05-03T18:49:35Z

Hi - perRead seems to include MAPQ 0 reads. These are filtered out by default in "extract", right?

dpryan79 · 2018-05-03T18:57:41Z

extract filters base score of 0 to avoid double calling overlapping ends of mates. The default -q will also skip MAPQ 0, but that can be changed. If I never added a MAPQ thresholding option to perRead I can do that, just let me know if you need it.

nchernia · 2018-05-03T19:01:42Z

I'm a little confused. In the usage for extract it says:
Options:
-q INT Minimum MAPQ threshold to include an alignment (default 10)

So I thought reads with MAPQ < 10 were not included in extract. Is that correct?

I had assumed perRead would have these same defaults. For me it's not strictly necessary but may be useful for others.

dpryan79 · 2018-05-03T19:04:35Z

You're correct regarding extract. I wrote perRead to be pretty bare-bones, so it includes most things by default. I can add the -q and -p options, though, with the defaults from extract.

nchernia · 2018-05-03T19:29:28Z

Thanks; whatever you think is best.

Just one more clarification question, perRead looks just at CpG methylation, right? Not CHH or CHG (unless these are passed in)?

dpryan79 · 2018-05-04T06:36:19Z

Correct, it's just CpG at this point. I'll try to get MAPQ and base quality filtering added today.

dpryan79 · 2018-05-04T06:59:36Z

The -q and -p options should now be present with the same default values as extract. I've not thoroughly tested these, so let me know if they're buggy.

nchernia · 2018-11-08T20:35:40Z

Question - sometimes the output is

readname chr position 0 0

Why does it output "no evidence"? Is it just seeing that the read is in the appropriate region but doesn't cover a CpG?

dpryan79 · 2018-11-09T14:32:56Z

My guess is that this happens when all of the bases are ignored due to quality filtering, but I'm not sure off-hand.

SamERoss · 2018-11-21T11:03:39Z

Hello,

Would it also be possible to add CHG and CHH options for perRead?

amatthopkins · 2019-04-03T05:04:46Z

Hi there!

Thanks for making this! Just wanted to ask about column 5 of the output - number of informative bases. Is the number of informative bases the sum of the number of methylated Cs and unmethylated Cs in CpGs in a given read?

dpryan79 · 2019-04-03T06:55:16Z

@amatthopkins Yes, exactly.

avilella mentioned this issue May 31, 2018

remove/exclude reads with densely un-converted non-CpG sites #61

Closed

dpryan79 closed this as completed Apr 3, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Methylation per read #56

Methylation per read #56

nchernia commented Feb 28, 2018

dpryan79 commented Feb 28, 2018

nchernia commented Feb 28, 2018 via email •

edited

Loading

nchernia commented Feb 28, 2018

dpryan79 commented Feb 28, 2018

dpryan79 commented Mar 9, 2018

dpryan79 commented Mar 20, 2018

nchernia commented Mar 20, 2018

nchernia commented Apr 27, 2018

dpryan79 commented Apr 27, 2018

dpryan79 commented Apr 27, 2018

nchernia commented Apr 27, 2018

dpryan79 commented Apr 27, 2018

nchernia commented May 3, 2018

dpryan79 commented May 3, 2018 •

edited

Loading

nchernia commented May 3, 2018

dpryan79 commented May 3, 2018

nchernia commented May 3, 2018

dpryan79 commented May 4, 2018

dpryan79 commented May 4, 2018

nchernia commented Nov 8, 2018

dpryan79 commented Nov 9, 2018

SamERoss commented Nov 21, 2018 •

edited

Loading

amatthopkins commented Apr 3, 2019

dpryan79 commented Apr 3, 2019

Methylation per read #56

Methylation per read #56

Comments

nchernia commented Feb 28, 2018

dpryan79 commented Feb 28, 2018

nchernia commented Feb 28, 2018 via email • edited Loading

nchernia commented Feb 28, 2018

dpryan79 commented Feb 28, 2018

dpryan79 commented Mar 9, 2018

dpryan79 commented Mar 20, 2018

nchernia commented Mar 20, 2018

nchernia commented Apr 27, 2018

dpryan79 commented Apr 27, 2018

dpryan79 commented Apr 27, 2018

nchernia commented Apr 27, 2018

dpryan79 commented Apr 27, 2018

nchernia commented May 3, 2018

dpryan79 commented May 3, 2018 • edited Loading

nchernia commented May 3, 2018

dpryan79 commented May 3, 2018

nchernia commented May 3, 2018

dpryan79 commented May 4, 2018

dpryan79 commented May 4, 2018

nchernia commented Nov 8, 2018

dpryan79 commented Nov 9, 2018

SamERoss commented Nov 21, 2018 • edited Loading

amatthopkins commented Apr 3, 2019

dpryan79 commented Apr 3, 2019

nchernia commented Feb 28, 2018 via email •

edited

Loading

dpryan79 commented May 3, 2018 •

edited

Loading

SamERoss commented Nov 21, 2018 •

edited

Loading