Error: cannot allocate vector of size 18.6 Gb #444
Hey @erigdon, unfortunately, the dataset is not attached, so I cannot replicate the error. Could you send me your file via email: f.schuberth@utwente.nl
It sounds like this is a problem of R or of your machine rather than a problem of cSEM. The error message means that a produced object is too large for R to handle. In general, R keeps all objects in your RAM; once the RAM is full, you receive such an error. Of course, there might be more clever ways of implementing GSCA that avoid creating such big objects and thus overcome this issue. If you send me your dataset, I will have a closer look.
Best regards,
The problem is identified: it is caused by the qr.Q function, used to obtain the Q matrix of the QR decomposition in GSCAm (line 671 in estimators_weights.R). To solve this issue, we need to find another way to perform the QR decomposition.
Perhaps the bigalgebra package can be used.
This back from Heungsun.
--Ed R.
From: ***@***.***
Sent: Friday, August 13, 2021 3:23 PM
To: Ed Rigdon ***@***.***>
Subject: Re: FW: cSEM error: cannot allocate
Hi Ed,
I found that my coding of the QR decomposition was not efficient for such big data, taking too much memory. I rewrote the code, reformulating the decomposition, and checked that it ran well for your data. I attached the results.
I just sent the new MATLAB code to other team members, who will convert it in C and Python. I think we may release an updated version next week. I will keep you posted. Thanks for your patience and letting us know about the bug.
Have a nice weekend,
Heungsun
On Wed, Aug 11, 2021 at 10:02 AM Ed Rigdon ***@***.***> wrote:
Heungsun—
I have heard from Florian Schuberth. Recall, I said that csem was having the same problem as GSCA Pro with the same data and model. Here is his explanation of the problem in csem. Maybe GSCA Pro has the same issue.
--Ed Rigdon
From: ***@***.***
Sent: Wednesday, August 11, 2021 9:58 AM
To: Ed Rigdon ***@***.***>
Subject: RE: cSEM error: cannot allocate
Hey Ed,
Thank you for the syntax, I found the problem.
If you set .disattenuate to TRUE, GSCAm is applied. For the implementation of GSCAm we mainly followed the procedure described in the Supplementary Material of Hwang et al. (2017); see the file attached. Below Equation A5, a QR decomposition is applied to the Gamma matrix: Gamma = QR. In our implementation we use the qr.Q function to obtain the Q matrix (line 671 in estimators_weights.R). This function produces the error.
An interesting side fact: this function has the argument “complete”. Here is the documentation of that argument:
logical expression of length 1. Indicates whether an arbitrary orthogonal completion of the \bold{Q} or \bold{X} matrices is to be made, or whether the \bold{R} matrix is to be completed by binding zero-value rows beneath the square upper triangle.
Currently, this argument is set to TRUE. As a consequence, one obtains a Q matrix of dimension N x N, from which we select columns P+1 to N, as suggested in Hwang et al. (2017). In your case, this matrix would be of dimension 50,000 x 50,000, which is simply too large to be stored in RAM. You can try to create such a matrix manually, e.g., test=matrix(0,50000,50000); on my machine, I get a similar error in that case.
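[As a back-of-the-envelope check added here, not part of the original email: the reported 18.6 Gb matches exactly the size of a dense double-precision 50,000 x 50,000 matrix.]

```r
# A dense double-precision matrix stores 8 bytes per entry,
# so an N x N matrix needs N^2 * 8 bytes.
n <- 50000
size_gib <- n^2 * 8 / 2^30
size_gib  # ~18.6, matching "cannot allocate vector of size 18.6 Gb"
```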
If complete is set to FALSE, the qr.Q function works; however, I only obtain an N x P matrix as output, which conflicts with Hwang et al.’s proposal.
I must admit that I have not fully understood the difference between complete = TRUE and complete = FALSE; however, it seems that for GSCAm we need columns P+1 to N of the Q matrix. I also searched briefly for another function to perform the QR decomposition, but without success. Therefore, I am a bit lost now :/
The reason why the problem does not occur in GSCA or PLS(c) is that there we do not apply a QR decomposition, and thus we do not need the qr.Q function.
A general question: why do you work with such a big dataset? Is it because you want a dataset with moments close to the population moments? If so, a potential workaround might be the mvrnorm function of the MASS package. This function has an argument ‘empirical’; if it is set to TRUE, the dataset generated from the multivariate normal distribution has exactly the moments provided to the function. Perhaps this works for you.
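[A minimal sketch of this workaround, with illustrative moments; the mu and Sigma values here are examples, not taken from the issue.]

```r
library(MASS)

# Example population moments (illustrative values only)
mu    <- rep(0, 3)
Sigma <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.4,
                  0.3, 0.4, 1.0), 3, 3)

# empirical = TRUE rescales the draws so that the sample mean and
# sample covariance of the generated data equal mu and Sigma exactly
dat <- mvrnorm(n = 1000, mu = mu, Sigma = Sigma, empirical = TRUE)
all.equal(cov(dat), Sigma)  # TRUE (up to numerical tolerance)
```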
Best regards,
Flo
From: Ed Rigdon ***@***.***>
Sent: Wednesday, 11 August 2021 14:57
To: Schuberth, F. (ET) ***@***.***>
Subject: RE: cSEM error: cannot allocate
Florian—
I have been working on something related, which I am a bit excited about. Also, my course is coming up this Fall.
The data set that produces the error, datacc, was generated using Cho and Choi’s (2020) method. The other data set, bigovdat5, was created starting with common and specific factors.
Here is R code for the model (lavaan syntax):
# model
model2<-'f1=~y1+y2+y3
f2=~y4+y5+y6
f3=~y7+y8+y9
f1~~f2
f2~~f3
f1~~f3'
# and here is the call to cSEM:
# This works fine
csemout.datacc<-csem(.data=datacc,.model=model2,.approach_weights = "GSCA",.disattenuate=F)
# but this produces an error message:
csemout.datacc<-csem(.data=datacc,.model=model2,.approach_weights = "GSCA")
I can run both data files using PLS approach weights, both with and without disattenuation, and it runs fine. For PLS, though, I must specify a structural model, so I use:
model2str<-'f1=~y1+y2+y3
f2=~y4+y5+y6
f3=~y7+y8+y9
f2~f1
f3~f2'
csem.out.plsa<-csem(.data=datacc,.model=model2str,.approach_weights = "PLS-PM",.disattenuate = F)
csem.out.plsa<-csem(.data=datacc,.model=model2str,.approach_weights = "PLS-PM")
--Ed Rigdon
From: ***@***.***
Sent: Wednesday, August 11, 2021 8:42 AM
To: Ed Rigdon ***@***.***>
Subject: RE: cSEM error: cannot allocate
Hey Ed,
By chance, can you also send me your R code, then I don’t have to write it myself.
Best regards,
Flo
From: Ed Rigdon ***@***.***>
Sent: Wednesday, 11 August 2021 14:38
To: Schuberth, F. (ET) ***@***.***>
Subject: cSEM error: cannot allocate
Florian—
Interestingly:
This dataset works fine until I turn disattenuation on (.disattenuate = TRUE); then I get the error.
This dataset also crashes Hwang et al.’s GSCA Pro under the same conditions.
I generated a dataset of the same size under a factor-based population and that works fine—does not produce an error. I will attach that dataset in a second email.
--Ed Rigdon
Hey Ed,
Thank you for letting me know. I have contacted Heungsun about sharing his code with me (the last email where you were in the CC). Once he replies, I will have a look at whether we can use the same approach in cSEM.
Best regards and have a nice weekend,
Flo
@erigdon: I have now implemented the new GSCAm version using singular value decomposition. Now your example should work. Note that you have to download the new version from the master branch. Best regards,
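One way such an SVD-based reformulation can sidestep the N x N matrix (an illustrative sketch, not the actual cSEM code): keep only the thin orthonormal basis of Gamma's column space and apply the projection onto its orthogonal complement implicitly, instead of materializing the complete Q from qr.Q(..., complete = TRUE).

```r
set.seed(1)
N <- 1000; P <- 5                       # small stand-ins for N = 50,000
Gamma <- matrix(rnorm(N * P), N, P)

# Thin SVD: U is N x P and spans the same column space as Gamma,
# so it needs O(N*P) memory instead of the N x N complete Q.
U <- svd(Gamma, nu = P, nv = 0)$u

# Projection onto the orthogonal complement of span(Gamma),
# computed as x - U (U' x) without ever forming I - U U'.
proj_comp <- function(x) x - U %*% crossprod(U, x)

x <- rnorm(N)
max(abs(crossprod(Gamma, proj_comp(x))))  # numerically zero
```

The projected vector is orthogonal to every column of Gamma, which is the property the P+1:N columns of the complete Q were being used for.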
Estimating a model with n = 50,000 fabricated observations.
Using lavaan syntax:
model2<-'f1=~y1+y2+y3
f2=~y4+y5+y6
f3=~y7+y8+y9
f1~~f2
f2~~f3
f1~~f3'
I can run csem with "GSCA" approach weights and disattenuation off:
csemout.datacc<-csem(.data=datacc,.model=model2,.approach_weights = "GSCA",.disattenuate=F)
but if I try that with disattenuation on:
csemout.datacc.dis<-csem(.data=datacc.df,.model=model2,.approach_weights = "GSCA")
I get the error message:
Error: cannot allocate vector of size 18.6 Gb
The fabricated dataset is attached (I think).