Correlation matrix is not positive definite #126
Hi Steafy, This depends entirely on which estimator you want to use. For time series, I would always recommend a graphical VAR routine that includes temporal effects, but then things get trickier. If we aim to estimate a Gaussian graphical model (partial correlation matrix), then most methods take a correlation matrix as input and invert it (coupled with regularization or model selection) to get the estimated network. Inverting a correlation matrix only works if the matrix is positive definite, and a sample correlation matrix is never positive definite (it is singular) if the number of variables exceeds the number of observations. There are some methods that can handle this situation though, especially the LASSO estimators. For example, EBICglasso can in theory handle a situation with more nodes than observations. However, I definitely do not recommend doing this, as in practice the results will be highly unstable. To this end, I included checks in both the bootnet default wrapper and the underlying estimation function in qgraph. If you really want to do this, then you need to disable both checks using the argument
Again, I don't recommend this. For time series, I also recommend modeling temporal effects to handle violations of independence of cases. In this paper we already see that network analysis for N = 100 time series is hard with 8 nodes, so 100 nodes would be ambitious indeed. Best,
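The positive-definiteness point in the reply above can be checked numerically. The following is a minimal sketch in plain Python, not the check that bootnet or qgraph actually runs: the helper names (`correlation_matrix`, `is_positive_definite`, `simulate`), the tolerance, and the simulated independent Gaussian data are all assumptions made for illustration. It attempts a Cholesky factorization of a sample correlation matrix and reports failure when a pivot is numerically zero, which is what happens once there are more variables than observations.

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlation matrix of a list of rows (observations x variables)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Euclidean norm of each centered column
    norms = [math.sqrt(sum(row[j] ** 2 for row in centered)) for j in range(p)]
    return [
        [sum(row[i] * row[j] for row in centered) / (norms[i] * norms[j])
         for j in range(p)]
        for i in range(p)
    ]


def is_positive_definite(matrix, tol=1e-8):
    """Attempt a Cholesky factorization; a pivot at or below tol counts as not PD.

    The tolerance is an illustrative choice, not a value taken from any package.
    """
    p = len(matrix)
    lower = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= tol:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True


def simulate(n_obs, n_vars, seed=1):
    """Independent standard-normal data (hypothetical stand-in for real data)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_obs)]


# More variables than observations: rank is at most n_obs - 1, so the matrix is singular.
wide = correlation_matrix(simulate(n_obs=5, n_vars=8))
# Many more observations than variables: the matrix is comfortably invertible.
tall = correlation_matrix(simulate(n_obs=200, n_vars=8))

wide_is_pd = is_positive_definite(wide)
tall_is_pd = is_positive_definite(tall)
print("5 observations, 8 variables, positive definite:", wide_is_pd)
print("200 observations, 8 variables, positive definite:", tall_is_pd)
```

After centering, 5 observations span at most 4 dimensions, so the 8 x 8 correlation matrix cannot have full rank. This is a property of the data dimensions rather than of any particular implementation, which is why the limitation is theoretical and only regularized estimators can work around it.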
Hi Sacha,
First, I want to thank you for your great work and your very good explanation series on YouTube.
I encounter a problem when trying to estimate an adjacency matrix from synthetic time-series data that has more nodes than measurement points. With, say, 100 nodes and ≤ 100 data points, I get the "Correlation matrix is not positive definite" error.
Is this issue situated at the theoretical or the implementation level, i.e. do I always need more measurement points than nodes to get the A matrix, or do I have to tweak the function in some way?
best,
steafy
Edit: I just saw that someone asked this already. Is there any method to solve this?