Error when running calculateDiffMeth function #206

Open
danicic7 opened this issue Jun 10, 2020 · 18 comments

Comments

@danicic7

danicic7 commented Jun 10, 2020

Hi,
I'm experiencing the following issue when running the calculateDiffMeth function:
calculateDiffMeth(data_obj, mc.cores = 32, overdispersion = "MN", test = "Chisq")
and I get the error:
two groups detected:
 will calculate methylation difference as the difference of
treatment (group: 1) - control (group: 0)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  0 (non-NA) cases

I am using methylKit version 1.10, but I get the same error when I run the same code with methylKit version 1.14. It also fails when I run the code without specifying the test and/or overdispersion parameters: calculateDiffMeth(data_obj, mc.cores = 32).
The data_obj includes ~39 million positions, so I tried subsetting the data object I am running this function with. Some subsets fail with the same error when tested with methylKit 1.10, even though they are processed successfully with methylKit 1.14.
However, the data object cannot be processed in its entirety with any methylKit version.

I saw some closed issues related to the same error, but according to those the problem should already be fixed in later versions.
Any help is appreciated.
I can share my code and data if it helps.

Thank you!
Aleksandar

@alexg9010
Collaborator

Hi @danicic7,

It would be great if you could provide us with a reproducible example so that we can test the error ourselves. Ideally, this would be a subset of your data that triggers the error.

Best,
Alex

@pooja19862

Hello @alexg9010 ,

My name is Pooja Shah and I am a colleague of @danicic7.
I can send you the whole dataset, which includes ~39 million positions.
@danicic7 tried to subset the data, but we don't see the error on a smaller subset.
Since it is company data generated for internal use, is there another way I can send the data to you directly instead of sharing it on GitHub?

Your help is highly appreciated.

Thank you,
Pooja

@alexg9010
Collaborator

alexg9010 commented Jun 16, 2020 via email

@pooja19862

Thanks for your reply.
I just emailed you the dataset.
Let me know if you have any trouble accessing it.

@pooja19862

Hello @alexg9010 ,

Any update on this open ticket?

Thanks,
Pooja

@alexg9010
Collaborator

Hi @pooja19862 ,

I downloaded the data and was able to reproduce your error. I am still working on a way to figure out which rows are causing the issue, but given the size of your dataset this will take some time.

One simple but general solution to your problem would be to not set the min.per.group=1L argument in the unite function, which would only consider bases that have been called by all of your samples. While this will lead to loss of some bases, the remaining ones are more reliable, also in terms of differential analysis.
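To make the trade-off concrete, here is a toy base-R illustration of the two filtering rules as they are described in this thread. It does not call methylKit; the table `cov`, the sample names, and the variables `keep_strict` and `keep_relaxed` are made up for the example, and the relaxed rule is only my reading of what min.per.group=1L means.

```r
# Toy coverage table: rows are bases, columns are samples; NA = base not covered.
cov <- data.frame(
  s1 = c(10, NA, 12, NA),
  s2 = c(11, 9, NA, NA),
  s3 = c(15, 14, 13, 8),
  s4 = c(12, NA, 10, 7)
)
treatment <- c(1, 1, 0, 0)  # s1, s2 = treatment; s3, s4 = control

# Number of samples covering each base, per group.
covered <- !is.na(cov)
n_trt <- rowSums(covered[, treatment == 1, drop = FALSE])
n_ctl <- rowSums(covered[, treatment == 0, drop = FALSE])

# Strict rule: every sample in both groups must cover the base.
keep_strict <- n_trt == sum(treatment == 1) & n_ctl == sum(treatment == 0)

# Relaxed rule (assumed analogue of min.per.group = 1L):
# at least one covering sample per group is enough.
keep_relaxed <- n_trt >= 1 & n_ctl >= 1

print(data.frame(n_trt, n_ctl, keep_strict, keep_relaxed))
```

In this toy table the relaxed rule keeps bases that are backed by a single sample in one group, which is the reliability concern raised above.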

Best,
Alex

@al2na
Owner

al2na commented Jun 23, 2020

@pooja19862 please share the full code, including the differential methylation call. This error occurs when all of the groups, or one of them, contain only NAs. This wouldn't normally happen even if you use min.per.group=1L. You may have altered the treatment vector after a unite() operation.

@alexg9010 I would wait to see the code before I spend more time on it :)
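For readers unfamiliar with the message itself: in plain R, lm() stops with "0 (non-NA) cases" when no complete observations remain after NA removal. The following minimal sketch reproduces only that base-R behaviour on a made-up data frame `d`; it is not a reconstruction of what methylKit does internally.

```r
# Every response value is NA, so no complete cases survive NA removal.
d <- data.frame(
  y = rep(NA_real_, 4),
  treatment = c(1, 1, 0, 0)
)

# Capture the error message instead of letting the script stop.
msg <- tryCatch(
  {
    lm(y ~ treatment, data = d)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```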

@pooja19862

pooja19862 commented Jun 23, 2020

Hello @al2na @alexg9010

I have shared the whole code with you.
No changes were made to the treatment vector after the unite() operation.
As shown in the shared code, I ran calculateDiffMeth() directly after the unite() operation.
I ran it with and without overdispersion and get the same error.

my_diff = calculateDiffMeth(data_obj, mc.cores = 16, overdispersion = "MN", test = "Chisq")
my_diff = calculateDiffMeth(data_obj, mc.cores = 16)

@alexg9010
Collaborator

@al2na he shared the code and files in private, I downloaded them already.

@roshmisarma

Hello,
I'm encountering a similar problem.
Is there any update on this error?

@alexg9010
Collaborator

Hi @roshmisarma ,

Unfortunately, I do not have an update on this issue yet.
To mitigate the problem until a fix is available, please see my previous message:

One simple but general solution to your problem would be to not set the min.per.group=1L argument in the unite function, which would only consider bases that have been called by all of your samples. While this will lead to loss of some bases, the remaining ones are more reliable, also in terms of differential analysis.

Best,
Alex

@r-mashoodh

r-mashoodh commented Jan 29, 2021

One simple but general solution to your problem would be to not set the min.per.group=1L argument in the unite function, which would only consider bases that have been called by all of your samples. While this will lead to loss of some bases, the remaining ones are more reliable, also in terms of differential analysis.

I have the opposite problem. I'm looking at region counts of a subset of genes I'm interested in.

So I bring in cov files, filter by coverage, normalise by coverage and then unite (min.per.group=1L). Then I calculate region counts. This works: I can reorganise and do a contrast within a subset, e.g. the main effect is population 1 vs population 2, and then I can do pop1-treat vs pop1-control.

The idea is that even though all sites are not represented there's enough within a region to make an estimate. However, there is a risk of low confidence in some of the samples because one sample could have only 1 CpG represented etc. So when I try to increase min.per.group that's when I start to get errors.

two groups detected:
 will calculate methylation difference as the difference of
treatment (group: 1) - control (group: 0)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

I assume this is to do with too many NAs too?

@yaaminiv

I did not set min.per.group when using unite, but I also encountered the same error when running calculateDiffMeth with version 1.17.4 while also specifying a covariate:

two groups detected:
 will calculate methylation difference as the difference of
treatment (group: 1) - control (group: 0)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

My script, which includes instructions to download data, can be found here. Because I did not set a min.per.group when running unite, I'm not sure if the issue is related to NAs in my dataset. Is there anything I should be doing differently?

@al2na
Owner

al2na commented Mar 12, 2021 via email

@yaaminiv

I haven't figured out the smallest dataset that reproduces the error yet, and will send the data and code along when I do so.

However, I think the error is actually related to mc.cores. I tried differentialMethylationStatsTreatment <- methylKit::calculateDiffMeth(methylationInformationFilteredCov5, overdispersion = "MN", test = "Chisq", mc.cores = 2) and got the same lm.fit error.

I ran differentialMethylationStatsTreatment <- methylKit::calculateDiffMeth(methylationInformationFilteredCov5, overdispersion = "MN", test = "Chisq") without the lm.fit error.


I am now running the code below on all of my samples and have not encountered an lm.fit error.

differentialMethylationStatsTreatment <- methylKit::calculateDiffMeth(methylationInformationFilteredCov5, covariates = covariateMetadata, overdispersion = "MN", test = "Chisq")
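If the parallel run is the one that fails, one possible stopgap is to retry the same call serially. The sketch below is only an illustration of that idea: `run_with_serial_fallback` is a hypothetical helper, and `fake_diff` is a stand-in that fails whenever mc.cores > 1 so that the example runs without methylKit.

```r
# Hypothetical helper: call fun(...) with the requested mc.cores and,
# if that raises an error, retry once with mc.cores = 1.
run_with_serial_fallback <- function(fun, ..., mc.cores = 2) {
  tryCatch(
    fun(..., mc.cores = mc.cores),
    error = function(e) {
      message(
        "Parallel run failed (", conditionMessage(e),
        "); retrying with mc.cores = 1"
      )
      fun(..., mc.cores = 1)
    }
  )
}

# Stand-in for the real call; it mimics the failure reported in this thread.
fake_diff <- function(x, mc.cores = 1) {
  if (mc.cores > 1) stop("0 (non-NA) cases")
  sum(x)
}

result <- run_with_serial_fallback(fake_diff, 1:10, mc.cores = 2)
print(result)
```

With methylKit, `fun` would presumably be calculateDiffMeth and the first argument the united object. A serial retry hides the underlying cause rather than fixing it, so it is worth logging when the fallback is taken.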

@jadonWong

I haven't figured out the smallest dataset that reproduces the error yet, and will send the data and code along when I do so.

However, I think the error is actually related to mc.cores. I tried differentialMethylationStatsTreatment <- methylKit::calculateDiffMeth(methylationInformationFilteredCov5, overdispersion = "MN", test = "Chisq", mc.cores = 2) and got the same lm.fit error.

I ran differentialMethylationStatsTreatment <- methylKit::calculateDiffMeth(methylationInformationFilteredCov5, overdispersion = "MN", test = "Chisq") without the lm.fit error.


I am now running the code below on all of my samples and have not encountered an lm.fit error.

differentialMethylationStatsTreatment <- methylKit::calculateDiffMeth(methylationInformationFilteredCov5, covariates = covariateMetadata, overdispersion = "MN", test = "Chisq")

This is helpful. Without setting mc.cores, no lm.fit error occurred.

@RJDan

RJDan commented Nov 4, 2022

Hello,
I had the same issue which was resolved by removing the NAs from the dataset before running the diffmeth. I tested this by removing the NAs after running unite allowing for 0.75 overlap to retain loci. Posting just FYI.

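library(dplyr)  # provides %>%, select() and contains()
# Row indices of loci where any numCs/numTs column contains an NA
subset.drop = getData(meth.united.db) %>%
  as.data.frame %>%
  dplyr::select(contains("num")) %>%
  apply(., 1, function(x){is.na(x) %>% any}) %>%
  which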
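The snippet above only computes the row indices to drop. As a rough illustration of the remaining step, here is the same idea in base R on a toy table; the data frame `meth` and its values are invented, with column names chosen to resemble the count columns of a united object.

```r
# Toy table standing in for getData() output (values are made up).
meth <- data.frame(
  chr = c("chr1", "chr1", "chr2", "chr2"),
  start = c(100, 200, 300, 400),
  numCs1 = c(5, NA, 7, 2),
  numTs1 = c(5, NA, 3, 8),
  numCs2 = c(6, 4, NA, 1),
  numTs2 = c(4, 6, NA, 9)
)

# Rows with an NA in any "num" column, as in the pipeline above.
num_cols <- grep("num", names(meth), value = TRUE)
subset.drop <- which(!complete.cases(meth[, num_cols]))

# Drop those rows. The empty case is guarded because x[-integer(0), ]
# selects zero rows instead of keeping everything.
meth_clean <- if (length(subset.drop) > 0) meth[-subset.drop, ] else meth
print(meth_clean)
```

The guard matters when a dataset happens to contain no NAs: negative indexing with an empty vector would otherwise silently return an empty table.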

@therealgenna

therealgenna commented Aug 19, 2024

It does seem to be related to cores used, in a way. When running with 20 cores, I got

#    Error in `lm.fit()`:
#    ! 0 (non-NA) cases
#    Backtrace:
#     1. methylKit::calculateDiffMeth(obj, mc.cores = mc.cores, covariates = covariates)
#     2. methylKit::calculateDiffMeth(obj, mc.cores = mc.cores, covariates = covariates)
#     3. methylKit:::.calculateDiffMeth(...)
#     5. methylKit:::p.adjusted(tmp$q.value, method = adjust)
#     7. methylKit:::SLIMfunc(...)
#     8. stats::lm(gamma_mtx ~ lambda)
#     9. stats::lm.fit(...)

But it was working fine with 4 cores and with 8 cores.

However, I think the most likely underlying problem is insufficient memory, which does NOT lead to job cancellation, for whatever reason. Here are the timelines I got with 20 cores:
[two images not reproduced]
That's kind of a sneaky problem, as it's almost impossible to detect; it might be that you don't get an error but it was hitting insufficient memory and that changed the output ...

I've allocated more memory when running with 4 and 8 cores, so I had enough (allocated 350GB with 8 cores; max reached 339GB; I am running calculateDiffMeth six times on different comparisons):
[image not reproduced]

It might be that using DB object versions in methylKit will require less RAM - I haven't tested that.
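One way such silent failures could be surfaced is to inspect the results of forked workers before using them. The sketch below assumes a Unix-alike system and uses parallel::mclapply directly; whether methylKit's parallel code path behaves the same way is an assumption I have not verified, and the `work` function is a made-up stand-in for a chunk that cannot finish.

```r
library(parallel)

# Simulated per-chunk work: chunk 3 fails, standing in for a worker that
# runs out of memory or otherwise cannot finish.
work <- function(i) {
  if (i == 3) stop("simulated worker failure")
  i^2
}

# mc.preschedule = FALSE forks one process per element, so a failure
# affects only that element. mclapply reports failures as a warning,
# not an error, which is why the script keeps running.
res <- suppressWarnings(
  mclapply(1:4, work, mc.cores = 2, mc.preschedule = FALSE)
)

# A failed element comes back as a "try-error" object; NULL is checked too,
# on the assumption that a killed worker delivers no result at all.
failed <- vapply(
  res,
  function(x) is.null(x) || inherits(x, "try-error"),
  logical(1)
)

if (any(failed)) {
  message("Chunks without a valid result: ", paste(which(failed), collapse = ", "))
}
```

Checking results this way does not fix a memory shortage, but it turns a quiet warning into something a pipeline can act on, for example by stopping or rerunning with fewer cores.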
