
Correlation matrix is not positive definite #126

Closed
steafy opened this issue Feb 5, 2024 · 1 comment

Comments

@steafy

steafy commented Feb 5, 2024

Hi Sacha,

First, I want to thank you for your great work and your very good explanation series on YouTube.

I encounter a problem when trying to estimate an adjacency matrix from synthetic time-series data that have more nodes than measurement points. For example, when I have 100 nodes and ≤ 100 data points, I get the "Correlation matrix is not positive definite" error.

Is this issue situated on the theoretical or the implementation level, i.e., do I always need more measurement points than nodes to estimate the A matrix, or do I have to tweak the function in some way?

best,
steafy

Edit: I just saw that someone asked this already. Is there any method that can solve this?

@steafy steafy closed this as completed Feb 5, 2024
@steafy steafy reopened this Feb 5, 2024
@SachaEpskamp
Owner

Hi Steafy,

This fully depends on which estimator you want to use. For time series, I would always recommend using a graphical VAR routine that includes temporal effects, but then things get trickier. If we aim to estimate a Gaussian graphical model (partial correlation matrix), then most methods take a correlation matrix as input and invert it (coupled with regularization or model selection) to obtain the estimated network. Inverting a correlation matrix only works if the matrix is positive definite. A sample correlation matrix is always positive semi-definite, but because its rank is at most the number of observations minus one, it is singular (and therefore not positive definite) whenever the number of variables is at least the number of observations.
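As a minimal illustration of the rank argument (a base-R sketch, not code from bootnet or qgraph; the simulated data are hypothetical), the following draws 90 observations of 100 independent variables and inspects the eigenvalues of the sample correlation matrix. A symmetric matrix is positive definite only if all of its eigenvalues are strictly positive; here the rank is at most 90 - 1 = 89, so at least 11 eigenvalues are zero up to rounding error. The relative tolerance of 1e-8 used to call an eigenvalue "zero" is an arbitrary choice.

```r
set.seed(1)
n <- 90   # observations
p <- 100  # variables (nodes)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
R <- cor(X)

# All eigenvalues of the symmetric correlation matrix
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values

# Treat eigenvalues below a relative tolerance as numerically zero
tol <- 1e-8 * max(ev)
rank_R <- sum(ev > tol)
n_zero <- sum(ev <= tol)

cat("numerical rank:", rank_R, "\n")
cat("near-zero eigenvalues:", n_zero, "\n")
# Positive definite requires every eigenvalue to be strictly positive
cat("positive definite:", all(ev > tol), "\n")
```

With continuous random data the numerical rank should come out as n - 1, so any method that needs the plain inverse of R fails unless it regularizes.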

There are some methods that can handle this situation, though, especially LASSO estimators. For example, EBICglasso can in theory handle a situation with more nodes than observations. However, I definitely do not recommend doing this, as in practice the results will be highly unstable. For this reason, I included checks in both the bootnet default wrapper and the underlying estimation function in qgraph.

If you really want to do this, then you need to disable both checks using the argument nonPositiveDefinite for the bootnet check and the argument checkPD for the qgraph check. Here is a reproducible example:

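```r
library("bootnet")
# 90 observations (rows) of 100 variables (columns): more nodes than observations
df <- as.data.frame(matrix(rnorm(100 * 90), nrow = 90, ncol = 100))
# nonPositiveDefinite disables the bootnet check; checkPD disables the qgraph check
net <- estimateNetwork(df, default = "EBICglasso", nonPositiveDefinite = "continue", checkPD = FALSE)
plot(net)
```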

Again, I don't recommend this. For time series, I also recommend modeling temporal effects to handle violations of the independence of cases. In this paper, we already see that network analysis for N = 100 time series is hard with 8 nodes, so 100 nodes would be ambitious indeed.
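To make the temporal-effects point concrete, here is a minimal base-R sketch of a lag-1 vector autoregression (VAR) fitted by ordinary least squares, followed by partial correlations of the residuals as a rough contemporaneous network. It is an illustration under simplifying assumptions, not the regularized estimator used by graphical VAR routines; the simulated data and the helper functions `n_temporal` and `n_contemporaneous` are hypothetical. It also counts parameters (excluding intercepts): p^2 temporal effects plus p(p - 1)/2 contemporaneous partial correlations, which gives 92 for 8 nodes but 14,950 for 100 nodes.

```r
set.seed(1)
n_time <- 100  # time points
p <- 8         # nodes

# Hypothetical data: a stable lag-1 VAR, y_t = B y_{t-1} + noise
B <- diag(0.3, p)
Y <- matrix(0, nrow = n_time, ncol = p)
for (i in 2:n_time) {
  Y[i, ] <- B %*% Y[i - 1, ] + rnorm(p)
}

# Regress every node at time t on all nodes at time t - 1
Y_now <- Y[-1, , drop = FALSE]
Y_lag <- Y[-n_time, , drop = FALSE]
fit <- lm(Y_now ~ Y_lag)  # multivariate OLS, p + 1 coefficients per node

# Temporal network: rows = outcome at t, columns = predictor at t - 1
B_hat <- t(coef(fit)[-1, ])

# Contemporaneous network: partial correlations of the residuals
Theta <- solve(cov(residuals(fit)))
PCC <- -cov2cor(Theta)
diag(PCC) <- 0

# Parameter counts (excluding intercepts) grow quadratically in p
n_temporal <- function(p) p^2
n_contemporaneous <- function(p) p * (p - 1) / 2
counts <- c(p8 = n_temporal(8) + n_contemporaneous(8),
            p100 = n_temporal(100) + n_contemporaneous(100))
print(counts)
```

Each of the p regressions estimates p + 1 coefficients from n_time - 1 rows, so OLS is only defined when n_time is at least p + 2, and inverting the residual covariance requires still more time points; with 100 nodes that is already out of reach at 100 measurements, before any concern about stability.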

Best,
Sacha
