Correlation matrix is not positive definite #126

steafy · 2024-02-05T15:21:05Z

Hi Sacha,

First, I want to thank you for your great work and your very good explanation series on YouTube.

I encounter a problem when trying to estimate an adjacency matrix from synthetic time-series data that has more nodes than measurement points. So when I have, say, 100 nodes and ≤ 100 data points, I get the "Correlation matrix is not positive definite" error.

Is this issue situated at the theoretical or the implementation level, i.e. do I always need more
measurement points than nodes to get the A matrix, or do I have to tweak the function in some way?

best,
steafy

Edit: I just saw that someone already asked this. Is there any method to solve this?

SachaEpskamp · 2024-02-07T02:38:12Z

Hi Steafy,

This fully depends on which estimator you want to use. For time series, I would always recommend a graphical VAR routine that includes temporal effects, but then things get trickier. If we aim to estimate a Gaussian graphical model (partial correlation matrix), then most methods take a correlation matrix as input and invert it (coupled with regularization or model selection) to get the estimated network. Inverting a correlation matrix only works if the matrix is positive definite, and a sample correlation matrix cannot be positive definite if the number of variables is at least the number of observations: it is still positive semi-definite, but it has zero eigenvalues and is therefore singular.
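To see why the check triggers, here is a minimal base R sketch (the variable names and the tolerance `1e-8` are illustrative assumptions, not part of bootnet or qgraph). It builds a sample correlation matrix from 90 observations of 100 variables and inspects its eigenvalues. After centering, the rank can be at most n - 1, so at least p - (n - 1) eigenvalues are numerically zero.

```r
# Illustrative only: simulated data with more variables (p) than observations (n)
set.seed(1)
n <- 90
p <- 100
X <- matrix(rnorm(n * p), n, p)

# Sample correlation matrix (p x p)
R <- cor(X)

# Eigenvalues of a symmetric matrix; zeros show up as values near machine precision
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values

# Numerical rank: count eigenvalues above a small (assumed) tolerance
rank_R <- sum(ev > 1e-8)

min(ev)   # numerically zero, so R is singular and not positive definite
rank_R    # at most n - 1, which is smaller than p
```

Because the smallest eigenvalues are zero up to rounding error, the matrix has no usable inverse, which is what the "not positive definite" error reports.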

There are some methods that can handle this situation though, especially the LASSO estimators. For example, EBICglasso can in theory handle a situation with more nodes than observations. However, I definitely do not recommend doing this, as in practice the results will be highly unstable. To this end, I included checks in both the bootnet default wrapper and the underlying estimation function in qgraph.
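As a rough illustration of why regularization makes estimation possible at all, the base R sketch below uses a simple ridge-style shrinkage toward the identity matrix. This is not the graphical LASSO and not what EBICglasso does (it produces no sparsity and does no model selection); it only shows that a penalized version of a singular correlation matrix can be inverted. The weight `lambda = 0.1` and all variable names are assumptions chosen for the example.

```r
# Illustrative only: same setup as the question, more variables than observations
set.seed(1)
n <- 90
p <- 100
X <- matrix(rnorm(n * p), n, p)
R <- cor(X)                      # singular sample correlation matrix

# Ridge-style shrinkage toward the identity (assumed weight, not a recommendation)
lambda <- 0.1
R_shrunk <- (1 - lambda) * R + lambda * diag(p)

# All eigenvalues are now at least about lambda, so the matrix is positive definite
ev_shrunk <- eigen(R_shrunk, symmetric = TRUE, only.values = TRUE)$values

# Invert to get a precision matrix, then standardize to partial correlations
K <- solve(R_shrunk)
pcor <- -cov2cor(K)
diag(pcor) <- 0
```

The inversion succeeds, but the resulting partial correlations depend heavily on the arbitrary choice of `lambda`, which is one way to see why estimates in this setting are unstable.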

If you really want to do this, then you need to disable both checks using the argument nonPositiveDefinite for the bootnet check and the argument checkPD for the qgraph check. Here is a reproducible example:

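# Simulate 90 observations of 100 variables (more nodes than observations)
df <- as.data.frame(matrix(rnorm(100*90),90,100))
library("bootnet")
# Disable the bootnet check (nonPositiveDefinite) and the qgraph check (checkPD)
net <- estimateNetwork(df, default = "EBICglasso", nonPositiveDefinite = "continue", checkPD = FALSE)
plot(net)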

Again, I don't recommend this. For time series, I also recommend modeling temporal effects to handle violations of the independence of cases. In this paper we already see that network analysis for N = 100 time series is hard with 8 nodes, so 100 nodes would be ambitious indeed.
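To make "modeling temporal effects" concrete, here is a minimal base R sketch of an unregularized lag-1 vector autoregression on a small simulated system. It is not the graphical VAR routine mentioned above, which adds regularization and model selection; the matrix `B`, the series length, and all variable names are assumptions made for the example. Each variable is regressed on all variables at the previous time point, and the residuals are then used for the contemporaneous partial correlations.

```r
# Illustrative only: simulate a stable 3-node lag-1 VAR with many time points
set.seed(123)
p <- 3
n_time <- 2000
B <- matrix(c(0.5, 0.0, 0.0,
              0.2, 0.3, 0.0,
              0.0, 0.0, 0.4), p, p, byrow = TRUE)  # rows = outcome, columns = lagged predictor

Y <- matrix(0, n_time, p)
for (i in 2:n_time) {
  Y[i, ] <- B %*% Y[i - 1, ] + rnorm(p)
}

# Align each time point with the one before it
lagged  <- Y[-n_time, , drop = FALSE]
current <- Y[-1, , drop = FALSE]

# Multivariate least squares: every variable regressed on all lagged variables
fit <- lm(current ~ lagged)
B_hat <- t(coef(fit)[-1, ])      # drop the intercept row; rows = outcome

# Contemporaneous network: partial correlations of the residuals
K <- solve(cov(residuals(fit)))
pcor_contemp <- -cov2cor(K)
diag(pcor_contemp) <- 0
```

With 3 nodes and 2000 time points the temporal matrix has only 9 coefficients to estimate. With 100 nodes it has 10,000, which is why short series are a problem for this kind of model.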

Best,
Sacha

steafy closed this as completed Feb 5, 2024

steafy reopened this Feb 5, 2024

SachaEpskamp closed this as completed Feb 7, 2024

SachaEpskamp mentioned this issue Apr 12, 2024

Correlation matrix is not positive definite #129

Closed
