Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

computeMatrix output matrix file is all 0 #792

Closed
semacu opened this issue Jan 17, 2019 · 4 comments
Closed

computeMatrix output matrix file is all 0 #792

semacu opened this issue Jan 17, 2019 · 4 comments

Comments

@semacu
Copy link

semacu commented Jan 17, 2019

Hi deepTools team,

Many thanks for maintaining such a great suite of tools. Here is my latest issue.

deeptools 2.4.2-5-f439d22
Python 2.7.12

I am trying to profile methylation around CpG sites in the lambda genome (48kb). I generated the input bigwig file for computeMatrix using bedGraphToBigWig as follows:

head input.bedgraph -n20
#J02459.1	0	0	0.0
#J02459.1	1	1	0.0
#J02459.1	2	2	0.0
#J02459.1	3	3	0.0
#J02459.1	4	4	0.0
#J02459.1	5	5	0.0
#J02459.1	6	6	0.0
#J02459.1	7	7	0.0
#J02459.1	8	8	0.0
#J02459.1	9	9	0.0
#J02459.1	10	10	9.09
#J02459.1	11	11	0.0
#J02459.1	12	12	0.0
#J02459.1	13	13	0.0
#J02459.1	14	14	0.0
#J02459.1	15	15	0.0
#J02459.1	16	16	0.0
#J02459.1	17	17	0.0
#J02459.1	18	18	0.0
#J02459.1	19	19	0.0

ref=lambda.sizes
bedGraphToBigWig input.bedgraph $ref input.bw

where e.g. at position 10 there is a methylation level of 9.09%. I have visualised the resulting input.bw in IGV and it looks fine.

The CpG sites are in a bed file:

head lambda.allCpG.bed
#J02459.1	3	5
#J02459.1	6	8
#J02459.1	12	14
#J02459.1	14	16
#J02459.1	22	24
#J02459.1	42	44
#J02459.1	52	54
#J02459.1	58	60
#J02459.1	68	70
#J02459.1	118	120

But when I run computeMatrix scale-regions as follows:

computeMatrix scale-regions \
-S input.bw \
-R lambda.allCpG.bed \
-out data.mat.gz \
--outFileNameMatrix data.tab \
-m 2 \
--startLabel "C" \
--endLabel "G" \
-b 30 \
-a 30 \
-bs 1 \
--missingDataAsZero \
-p "max"

The resulting data.mat.gz is all full of 0s. Then, the plotProfile just looks like a flat line.

zcat data.mat.gz | head
#@{"verbose":true,"scale":1,"skip zeros":false,"nan after end":false,"sort using":"mean","unscaled 5 prime":0,"body":2,"sample_labels":["SLX-15861.DNAA011.HJ23HBGX7.s_1.r_1_bismark_bt2.deduplicated"],"downstream":30,"unscaled 3 prime":0,"group_labels":["genes"],"bin size":1,"upstream":30,"group_boundaries":[0,3113],"sample_boundaries":[0,62],"missing data as zero":true,"ref point":null,"min threshold":null,"sort regions":"keep","proc number":40,"bin avg type":"mean","max threshold":null}
#J02459.1	3	5	J02459.1:3-5	.	.	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
#J02459.1	6	8	J02459.1:6-8	.	.	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
#J02459.1	12	14	J02459.1:12-14	.	.	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
#J02459.1	14	16	J02459.1:14-16	.	.	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
#J02459.1	22	24	J02459.1:22-24	.	.	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
#J02459.1	42	44	J02459.1:42-44	.	.	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
#J02459.1	52	54	J02459.1:52-54	.	.	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
#J02459.1	58	60	J02459.1:58-60	.	.	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
#J02459.1	68	70	J02459.1:68-70	.	.	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000

Perhaps there is something I am missing, but I don't understand why computeMatrix is just giving 0s. The bedgraph contains many positions with numbers ranging between 0-100% so I don't know what could be going on.

Any hints would be useful.

@dpryan79
Copy link
Collaborator

I strongly encourage you to not use --missingDataAsZero for methylation data. That converts all of the non-CpG and uncovered positions to look like they have 0 methylation, which will profoundly skew the results. What you expect to then see is a matrix of mostly NaN values. That then makes sense given the relative sparsity of CpGs.

@semacu
Copy link
Author

semacu commented Jan 21, 2019

Thanks @dpryan79 ,

I tried removing --missingDataAsZero and changing options -a and -b however I have the same issue: the resulting matrix is not sparse, it is only made up of NaNs. I do not quite understand why this is the case since my original bedgraph has e.g. a methylation value different from 0 at position 10 - see above.

computeMatrix scale-regions \
-S input.bw \
-R lambda.allCpG.bed \
-out data.mat.gz \
--outFileNameMatrix data.tab \
-m 2 \
--startLabel "C" \
--endLabel "G" \
-b 5 \
-a 5 \
-bs 1 \
-p "max"

zcat data.mat.gz | head
#@{"verbose":true,"scale":1,"skip zeros":false,"nan after end":false,"sort using":"mean","unscaled 5 prime":0,"body":2,"sample_labels":["SLX-15861.DNAA011.HJ23HBGX7.s_1.r_1_bismark_bt2.deduplicated"],"downstream":5,"unscaled 3 prime":0,"group_labels":["genes"],"bin size":1,"upstream":5,"group_boundaries":[0,3113],"sample_boundaries":[0,12],"missing data as zero":false,"ref point":null,"min threshold":null,"sort regions":"keep","proc number":40,"bin avg type":"mean","max threshold":null}
#J02459.1	3	5	J02459.1:3-5	.	.	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan
#J02459.1	6	8	J02459.1:6-8	.	.	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan
#J02459.1	12	14	J02459.1:12-14	.	.	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan
#J02459.1	14	16	J02459.1:14-16	.	.	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan
#J02459.1	22	24	J02459.1:22-24	.	.	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan
#J02459.1	42	44	J02459.1:42-44	.	.	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan
#J02459.1	52	54	J02459.1:52-54	.	.	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan
#J02459.1	58	60	J02459.1:58-60	.	.	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan
#J02459.1	68	70	J02459.1:68-70	.	.	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan

Am I missing something?

@dpryan79
Copy link
Collaborator

I imagine the problem is that your bedGraph needs to have all of its start or end coordinates shifted by 1. BedGraph files are 0-based, half open, so the first entry should have ended at 1. So the bigWig file likely contains no entries, which is what's causing this.

@semacu
Copy link
Author

semacu commented Feb 4, 2019

Thanks @dpryan79 , yes - that was the issue. It seems it works ok now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants