ordLORgee got killed due to memory issue #1
Comments
It is hard for me to comment on the nature of the presumably memory-related issue without the code and data that cause the R session to crash. My advice is to first fit the GEE model with the independence working model rather than a more complicated odds ratio structure. If the R session still crashes, you should either move to a more powerful PC or an HPC system, or use a subset of your data (if there is a reasonable way to do this).
Hi Anestis
Thanks a lot for your reply. I feel that our problem is more about the choice of method than the implementation (code etc.). We actually tried a supercomputer with a very large amount of memory, but it still couldn’t fit the model with 2000 repeated measures under the independence structure. If I may, please allow me to describe the problem more formally.
Here the “patients” are actually drugs. At FDA we are developing a method to predict a drug’s safety based on in vitro measurement (testing a drug’s effect on a dozen cultured cells) and an in silico cardiac model (systems biology type model to “translate” in vitro effects to in vivo effects).
For the in vitro measurements, even though we typically use only 10-20 cells per drug, that doesn’t mean we have only 10-20 “repeats”. These measurements are fitted with a Hill equation to estimate two in vitro parameters (IC50 and h), and we use MCMC to estimate the joint distribution of these two parameters. In the process we generate 2000 IC50-h parameter pairs to represent this joint distribution.
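As a rough illustration of the Hill step, here is a minimal Python sketch. The thread does not state the exact form of the Hill equation used, so the common fractional-block form below, and the names `hill_block`, `conc`, `ic50` and `h`, are assumptions made for illustration only.

```python
def hill_block(conc, ic50, h):
    """Fractional block at drug concentration `conc` (same units as `ic50`).

    Assumed form: block = 1 / (1 + (IC50 / conc) ** h).
    This is a common parameterisation, not necessarily the one used in the thread.
    """
    if conc <= 0:
        # No drug present: no block.
        return 0.0
    return 1.0 / (1.0 + (ic50 / conc) ** h)


# Each MCMC draw is one (IC50, h) pair; evaluating the curve for every draw
# gives one predicted value per draw, i.e. 2000 values per drug.
draws = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]  # hypothetical draws
blocks = [hill_block(1.0, ic50, h) for ic50, h in draws]
```

Under this form, the block is exactly one half when the concentration equals IC50, whatever the value of h.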
Now we feed these 2000 parameter pairs (sets) into the cardiac model, and naturally we get 2000 predicted in vivo measurements per drug.
Finally, we want to fit an ordinal logistic regression model that uses the in vivo measurements to explain drug safety: each drug has a known safety label (low, intermediate, or high) and 2000 in vivo “measurements”. These 2000 are evidently correlated (in fact perfectly correlated, because they all carry the same safety class label for the same drug).
I was wondering whether, other than GEE, there is another way to do such a regression (safety class ~ in vivo measurement) that accounts for the within-drug correlation. Or is there no need to account for it, since the correlation is always perfect within each drug?
Any suggestion is highly appreciated.
Thanks!
Zhihua
From: Anestis Touloumis [mailto:notifications@github.com]
Sent: Monday, October 09, 2017 7:47 AM
To: AnestisTouloumis/multgee
Cc: Li, Zhihua; Author
Subject: Re: [AnestisTouloumis/multgee] ordLORgee got killed due to memory issue (#1)
Hi Zhihua, As I mentioned in my previous response, if you think there is a reasonable way to reduce the data, you might want to try multgee again. However, I trust that you are in a better position than I am to judge whether any "data reduction" is feasible. I will now close this issue, as this is a limitation of R rather than of the multgee package.
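One possible form of the "data reduction" mentioned above is to keep a random subset of the repeated measurements for each cluster (drug). The sketch below is only an illustration: the helper `thin_draws`, the row layout and the choice of 50 draws per drug are all hypothetical, and whether such thinning is statistically reasonable is a judgment for the analyst.

```python
import random


def thin_draws(rows, id_key, k, seed=0):
    """Keep at most `k` randomly chosen rows per cluster identified by `id_key`."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_id = {}
    for row in rows:
        by_id.setdefault(row[id_key], []).append(row)
    thinned = []
    for members in by_id.values():
        if len(members) <= k:
            thinned.extend(members)
        else:
            # Sample without replacement within the cluster.
            thinned.extend(rng.sample(members, k))
    return thinned


# Hypothetical data: 12 drugs with 2000 simulated measurements each.
rows = [{"drug": d, "draw": i} for d in range(12) for i in range(2000)]
subset = thin_draws(rows, "drug", 50)
```

The thinned data keep the same cluster structure (12 drugs), but each cluster has 50 repeats instead of 2000.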
I have a dataset with 12 patients. Each patient has one explanatory variable and one response variable, and each was measured 2000 times (I know it's huge!). When I ran ordLORgee, the whole R session crashed, presumably because of a memory issue. I was wondering if there is any way to sidestep this memory issue when the number of repeated measurements is huge.
Thanks a lot!
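To give a feel for why 2000 repeats per cluster can be heavy, here is a back-of-the-envelope Python sketch. It assumes three ordinal categories (low, intermediate, high, as described elsewhere in the thread) and a dense double-precision matrix with one row per repeat and non-reference category. Whether multgee actually allocates such a matrix is an assumption, not something confirmed here, and the function names are hypothetical.

```python
def dense_cluster_matrix_bytes(n_repeats, n_categories, bytes_per_value=8):
    """Bytes needed for one dense square matrix of side n_repeats * (n_categories - 1)."""
    dim = n_repeats * (n_categories - 1)
    return dim * dim * bytes_per_value


def n_time_pairs(n_repeats):
    """Number of distinct pairs of repeated measurements within one cluster."""
    return n_repeats * (n_repeats - 1) // 2


per_cluster = dense_cluster_matrix_bytes(2000, 3)  # 128,000,000 bytes
pairs = n_time_pairs(2000)                         # 1,999,000 pairs
```

Under these assumptions a single such matrix takes about 128 MB per cluster, and any structure that models pairwise association has to cover roughly two million time pairs per cluster.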