Understanding correlation dimension #20
Hi Aleksejs, I have to admit that of all the measures in nolds, the correlation dimension is the one where I am least confident in interpreting the results. This is also reflected by the fact that I currently have no good unit tests of "real" applications for this measure; that is one of the TODOs for version 1.0. It took me so long to answer because I wanted to at least perform some small experiments so I could give you some meaningful insight. However, I currently struggle to find the time for that, so I will just tell you my current understanding.

I wrote a test script calculating the correlation dimension following the Grassberger-Procaccia algorithm that is also implemented in nolds, but along the way I noticed some discrepancies between the implementation in nolds and the Scholarpedia article, which was written by Grassberger himself. I found two potential sources of error in the nolds implementation:
So in conclusion, I think there is definitely something off here, which I will need to investigate further when I find the time. In the meantime, please let me know if you find out more about this issue or if you come across a nice test case with a known correlation dimension that I could use for debugging. I will also drop a quick note to @DominiqueMakowski here, since he uses the correlation dimension from nolds in NeuroKit2 and has proven to be excellent at spotting discrepancies between different implementations of fractal measures. 😉
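For readers following along, the Grassberger-Procaccia correlation sum discussed above can be sketched from scratch with NumPy. This is an illustrative reimplementation for reference, not the nolds code: the correlation sum C(ε) is the fraction of distinct point pairs closer than ε, and the correlation dimension is the slope of log C(ε) against log ε.

```python
import numpy as np

def corr_sum(points, eps):
    """Fraction of distinct point pairs closer than eps (Euclidean norm)."""
    dists = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    return np.mean(dists[np.triu_indices(len(points), k=1)] < eps)

def corr_dim(points, eps_values):
    """Correlation dimension: slope of log C(eps) vs. log eps."""
    c = np.array([corr_sum(points, e) for e in eps_values])
    mask = c > 0  # log(0) is undefined; skip radii that capture no pairs
    return np.polyfit(np.log(eps_values[mask]), np.log(c[mask]), 1)[0]

# Sanity check: points on a unit circle form a 1-D manifold in 2-D space,
# so the estimate should come out close to 1.
rng = np.random.default_rng(42)
angles = rng.uniform(0, 2 * np.pi, 1000)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
d_circle = corr_dim(circle, np.logspace(-1.5, -0.5, 10))
print(d_circle)  # close to 1
```

Note that the choice of the ε range is part of the estimator: the slope is only meaningful inside the scaling region where log C(ε) is actually linear in log ε.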
Hi Christopher, thanks for paying attention to this. For me it is not urgent; I just read about the correlation dimension and was fascinated by it. I really like the idea of metrics that can indicate that multivariate data may occupy a low-dimensional subdomain/manifold, especially when that structure is nonlinear and does not get picked up by PCA. Unsurprisingly, I also work in neuroscience at the moment: where else would you find high-dimensional time series that almost certainly implement some sophisticated nonlinear low-dimensional code (for reasons like error correction and speed), while at the same time there is little prior knowledge on how this is actually done, especially in highly plastic parts of the brain such as the cortex and hippocampus? So input from Dominique is highly appreciated :)

As I understand it, the tests for this metric should check that it converges to the theoretical result, namely that the correlation dimension approaches the real dimension of the subdomain/manifold for a large number of data points.
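The convergence idea above can be turned into a concrete test. Here is a sketch, assuming a brute-force Grassberger-Procaccia estimator (function names are illustrative, not nolds APIs): sample a nonlinear 1-D manifold, here a helix embedded in 3-D, and check that the estimate stays near 1 rather than 3 as the number of points grows.

```python
import numpy as np

def corr_dim(points, eps_values):
    """Brute-force Grassberger-Procaccia estimate: slope of log C vs. log eps."""
    dists = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    pairs = dists[np.triu_indices(len(points), k=1)]
    c = np.array([np.mean(pairs < e) for e in eps_values])
    mask = c > 0
    return np.polyfit(np.log(eps_values[mask]), np.log(c[mask]), 1)[0]

rng = np.random.default_rng(0)
eps = np.logspace(-1.5, -0.5, 10)
estimates = {}
for n in (200, 800):
    t = rng.uniform(0, 4 * np.pi, n)
    # A helix is a nonlinear 1-D curve embedded in 3-D; its covariance has
    # full rank, so a PCA-style rank check would not reveal the 1-D structure.
    helix = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])
    estimates[n] = corr_dim(helix, eps)
print(estimates)  # both values should sit near 1, not near 3
```

A family of such manifolds with known intrinsic dimension (circle, helix, torus, a 2-D surface in higher dimensions) would make a reasonable regression-test suite for this measure.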
Thank you very much for sharing your understanding and your ideas for tests! That does sound like a good approach. I will also go through the literature and see if there are some plots and/or results that I can try to replicate, and I will report back here when I have found the time to do so.

As a side note, since you also asked about the stability of these measures: from my limited experience, I get the impression that one has to be very careful to come up with an objective and well-justified way of determining the parameters for these algorithms. You can get vastly different answers with different parameter settings, which makes it possible to tweak the parameters until you get a result that "feels" right, but that is of course not scientific or objective at all. This makes me quite sceptical of papers that use these algorithms without providing a rationale for their choice of parameter values. One possible countermeasure is to look at intermediate results, especially the plot in which the line fitting takes place. In nolds this can be done by setting the parameter `debug_plot`.
Yes, I experienced exactly what you suggest: the curve was not a line. I will now say a few things that are speculation, but I suspect they might be true. I think there is actually a fundamental problem in the underlying theory. I assume the theory is correct in that C(ε) ~ ε^ν for all reasonable ε.
Hi. I installed your library (Python 3) and wrote a basic script to calculate the correlation dimension, plugging in some random parameters that came to mind.
I get the answer 0.01. What does that mean? I thought the correlation dimension was supposed to estimate the true dimension in which the points are located. Since an uncorrelated Gaussian blob should have full rank, I expected a result somewhere close to the embedding dimension, namely 5 in this case. Am I doing something wrong? When I try higher embedding dimensions, like 10, I get answers on the order of 10^{-15}; is this expected? Also, do you have a version of this algorithm that works for arbitrary dimensions?
Below I provide an algorithm I have written myself to calculate the correlation dimension for inputs of arbitrary dimension (no embedding dimension, just 2-D arrays as input). It behaves more like what I expect, but it still underestimates the true dimension by a factor of ~2. How stable are such estimators in general?
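The original script is not reproduced in this copy of the thread. The sketch below shows what an estimator of this kind might look like (an (n_points, n_dims) array in, no delay embedding), using the identity |x−y|² = |x|² + |y|² − 2x·y to compute pairwise distances without a large intermediate tensor; the function name is hypothetical. On a full-rank 5-D Gaussian blob, the estimate should land near 5, provided the ε values sit inside the scaling region.

```python
import numpy as np

def corr_dim_nd(X, eps_values):
    """Correlation dimension of an (n_points, n_dims) array, no delay embedding."""
    sq = (X ** 2).sum(axis=1)
    # squared pairwise distances via |x - y|^2 = |x|^2 + |y|^2 - 2 x.y
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    dists = np.sqrt(np.clip(d2, 0.0, None)[np.triu_indices(len(X), k=1)])
    c = np.array([np.mean(dists < e) for e in eps_values])
    mask = c > 0  # only fit radii that actually capture pairs
    return np.polyfit(np.log(eps_values[mask]), np.log(c[mask]), 1)[0]

rng = np.random.default_rng(1)
blob = rng.standard_normal((2000, 5))  # uncorrelated 5-D Gaussian blob
d_blob = corr_dim_nd(blob, np.logspace(np.log10(0.4), np.log10(1.0), 8))
print(d_blob)  # should come out near 5; mild underestimation at larger eps is typical
```

As a side observation on the near-zero results mentioned above: a fitted slope like 0.01 or 10^{-15} usually means the chosen ε values lie outside the scaling region, so log C(ε) is essentially flat and the slope is an artifact of the fit rather than a genuine dimension estimate.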
Best,
Aleksejs