KCI-tests: Remove TestKCI and add assertions to TestCIT_KCI #51
Conversation
Thanks for the great work!! :) Some general comments that need to be fixed:
For this PR, your test plan should be: the test you run and the test results (your tests should all pass). A test pass screenshot would be great here.
thanks for the great work!!!
Thanks @aoqiz for your great work! Some of my comments:
Thanks @MarkDana for your comments. Regarding "making sure the code logic is unchanged after every update", I am a little worried about the current code's correctness. This only makes sense if we can ensure the current version gives correct output with no randomness. I am thinking of another way to make sure the test is doing as expected from a statistical point of view: under the null, the p-value should follow a uniform distribution. We can run the current KCI code multiple times when the null is true to see whether the p-value follows a uniform distribution.
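The calibration check suggested here can be sketched in a few lines. This is a minimal harness, using Pearson correlation from scipy as a fast stand-in for KCI; the same loop applies once causallearn's KCI p-value call is swapped in.

```python
# Sketch of the calibration check: under the null (X and Y independent),
# a valid test's p-values should follow Uniform(0, 1).
# Pearson correlation is used as a fast stand-in for KCI here; swapping in
# causallearn's KCI p-value call gives the actual check.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pvals = []
for _ in range(200):
    x = rng.standard_normal(100)
    y = rng.standard_normal(100)  # independent of x, so the null holds
    pvals.append(stats.pearsonr(x, y)[1])

# One-sample KS test against Uniform(0, 1): a large p-value means we cannot
# reject uniformity, i.e. the test is well calibrated under the null.
ks_stat, ks_p = stats.kstest(pvals, 'uniform')
print(f"KS statistic = {ks_stat:.3f}, p-value = {ks_p:.3f}")
```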
Hi @aoqiz, Could you address @MarkDana's previous comments and push the PR to be accepted? Currently there are still some issues in this PR. Some important ones:
Let me know if you have more questions --- we can meet to discuss! :) I know it's usually very hard to get to know how to write and design tests in the beginning (and to write industry-level code). :)
thanks for the great work! Overall the structure looks quite good, but I have some statistics-related questions.
Also, please update your test plan to include the screenshot after you run the command.
@aoqiz Do you mind sharing some updates here? What's the current status of comparing the Matlab version and the Python version? Maybe the first step would be to make sure the Python version can return identical results to the Matlab version.
causallearn/utils/KCI/KCI.py
Outdated
elif self.est_width == 'empirical':
    kernelX.set_width_empirical_kci(data_x)
    kernelX.set_width_empirical_kci(data_z)
Oh I'm not sure: you mean kernelX's and kernelY's widths should both be set empirically from data_z, instead of from data_x and data_y themselves?
But this doesn't matter for empirical width, right? data_z and data_x, data_y have the same sample size?
Yes, according to the Matlab version, the empirical kernel width of kernelX and kernelY should depend on the shape of data_z, not data_x or data_y.
Yes, they have the same sample size but a different column size. Here, data_x = np.concatenate((data_x, 0.5 * data_z), axis=1), so data_x.shape[1] is 2, while data_z.shape[1] is 1.
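For clarity, the shapes under discussion can be reproduced with a few lines of numpy (the construction mirrors the test setup quoted above; variable names follow that setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30                                # shared sample size
data_x = rng.standard_normal((n, 1))
data_z = rng.standard_normal((n, 1))

# The test concatenates 0.5 * data_z onto data_x, so data_x gains a column:
data_x = np.concatenate((data_x, 0.5 * data_z), axis=1)

print(data_x.shape)  # (30, 2)
print(data_z.shape)  # (30, 1)
# Same number of rows (samples), but different numbers of columns -- so an
# empirical width computed from data_x's shape vs. data_z's shape can differ.
```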
I see! Wow, thanks for being so careful!!
Then,
- Just for clarity, could you please add a comment and reference to the original code (with file and line number)? (so that readers won't feel like it's a typo).
- Is there any empirical performance gain / theoretical guarantee for this change? (so that we can ensure that it's not a typo in the matlab version).
- After this change, will KCI in python and in matlab return the same value over the same dataset (try to fix randomness)?
- And does this have anything to do with "The performance gap between KCI_UInd and KCI_CInd under a similar setting" #56?
@aoqiz Could you address Haoyue's question? I am also curious....
Sure, I'm working on that.
@aoqiz Could you answer @MarkDana's question? I am also curious haha
- "Is there any empirical performance gain / theoretical guarantee for this change? (so that we can ensure that it's not a typo in the matlab version)." I am also worried about this, and I don't know KCI much. Just to make sure: in the paper, we should use data_z's shape here, right? And is it theoretically sound?
- Based on your updated description, if I understand correctly, we can now ensure almost identical results between Matlab and Python, right?
- Could you take a look at the issue @MarkDana mentioned? I am also curious whether it's related.
@tofuwen @MarkDana I'm sorry, I think I cannot answer these questions now, and I'm still investigating. I hope I can give you an answer before this weekend. Thanks for your patience!
Is there any empirical performance gain / theoretical guarantee for this change? (so that we can ensure that it's not a typo in the Matlab version).
Actually, I didn't find anything about this in the paper, and I'm discussing it with Mingming now.
After this change, will KCI in python and Matlab return the same value over the same dataset (try to fix randomness)?
After fixing that bug, they can return almost the same p-value over the same dataset. For three datasets, the program was run 100 times on each; the means of the p-values returned by Python and Matlab, respectively, are: 0.2494 vs. 0.2463 (unconditional KCI) and 0.36024 vs. 0.3585 (conditional KCI) on the first dataset; 0.34238 vs. 0.3381 (unconditional) and 0.4971 vs. 0.4910 (conditional) on the second; 0.62922 vs. 0.6281 (unconditional) and 0.3795 vs. 0.3793 (conditional) on the third.
But I think the mean of the p-value should be the same for Python and Matlab if it's only due to the randomness. So what I'm doing now is checking whether there is any difference in the deterministic part.
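Since each run's p-value is random, comparing only the means of the two implementations is a weak check. A two-sample Kolmogorov-Smirnov test on the two sets of 100 p-values would test whether they plausibly come from the same distribution. The arrays below are synthetic placeholders, not the actual recorded runs:

```python
# Hedged sketch: a two-sample KS test checks whether the Python and Matlab
# p-value samples are plausibly draws from the same distribution, which is
# stronger evidence than matching means. The arrays here are synthetic
# placeholders standing in for the recorded runs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pvals_python = rng.uniform(0.2, 0.3, size=100)  # placeholder for Python runs
pvals_matlab = rng.uniform(0.2, 0.3, size=100)  # placeholder for Matlab runs

stat, p = stats.ks_2samp(pvals_python, pvals_matlab)
# A large p here means the two distributions are statistically
# indistinguishable.
print(f"KS-2samp statistic = {stat:.3f}, p = {p:.3f}")
```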
Could you take a look at the issue @MarkDana mentioned? I am also curious whether it's related
Once I fix the difference in the deterministic part, I'll look into this issue.
Thanks for your patience again!
Thanks for the great work, I think we are almost done!
And thanks for finishing debugging this very tricky KCI bug! Fixing the bug makes causal-learn much better --- KCI is so fundamental that we must ensure our implementation is correct.
tests/TestCIT_KCI.py
Outdated
pvalue03_truth = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.138, 0.138, 0.143, 0.113,
                  0.138, 0.138, 0.141, 0.15, 0.051, 0.051, 0.066, 0.057, 0.051, 0.051, 0.056, 0.048, 0.004,
here the result is a little weird.
Is this expected?
tests/TestCIT_KCI.py
Outdated
0.142, 0.132, 0.13, 0.133, 0.133, 0.131, 0.12, 0.142, 0.142, 0.141, 0.135, 0.12, 0.12,
0.114, 0.116, 0.116, 0.116, 0.109, 0.108, 0.12, 0.12, 0.105, 0.105, 0.116, 0.116, 0.105,
0.101, 0.12, 0.12, 0.108, 0.108, 0.116, 0.116, 0.108, 0.105, 0.504, 0.504, 0.494, 0.518,
0.005, 0.005, 0.005, 0.005, 0.477, 0.477, 0.473, 0.479, 0.076, 0.076, 0.076, 0.077, 0.549,
hmmm, 0.005 seems weird. so small
This and the above are two cases where the Matlab version itself gives weird results. The details are listed in Weird_test_cases.pdf.
I've asked Mingming and Kun for help and am waiting for their answer. I hope I can give you the answer before this weekend.
Thanks, that's super clear. I think we can leave it as it is for now, and after Kun and Mingming respond, we can change it if necessary.
Super nice!! Thanks!!
Hi, @tofuwen @MarkDana, here is a summary of the comparison of the current Python version and the Matlab version on the KCI test. Since the Matlab version cannot support very flexible arguments, we only test the basic setting: applying a Gaussian kernel to all variables X, Y and Z and setting the kernel width using empirical rules. The results for Python are as follows:

```python
import numpy as np
import causallearn.utils.cit as cit

# np.random.seed(50)
# X = np.random.randn(30, 1)
# X_prime = np.random.randn(30, 1)
# Y = X + 0.5 * np.random.randn(30, 1)
# Z = Y + 0.5 * np.random.randn(30, 1)
# data = np.hstack((X, X_prime, Y, Z))
# np.savetxt("GaussianData.csv", data, delimiter=",")
data = np.loadtxt("GaussianData.csv", delimiter=",")

approx = True
for use_gp in [True, False]:
    # The argument 'use_gp' only relates to the conditional KCI, so only 'pXZY' will be affected.
    cit_CIT = cit.CIT(data, 'kci', kernelX='Gaussian', kernelY='Gaussian', kernelZ='Gaussian',
                      est_width='empirical', approx=approx, use_gp=use_gp)
    pXX = cit_CIT(0, 1)
    pXZ = cit_CIT(0, 3)
    pXZY = cit_CIT(0, 3, {2})
    print('\napprox =', approx, ', use_gp =', use_gp, ':')
    print('pXX:', round(pXX, 4))
    print('pXZ:', format(pXZ, '.4e'))
    print('pXZY', round(pXZY, 4))
```

Output:
```python
approx = False
for use_gp in [True, False]:
    np.random.seed(50)
    pXX = []
    pXZ = []
    pXZY = []
    for i in range(500):
        cit_CIT = cit.CIT(data, 'kci', kernelX='Gaussian', kernelY='Gaussian', kernelZ='Gaussian',
                          est_width='empirical', approx=approx, use_gp=use_gp)
        p01 = cit_CIT(0, 1)
        pXX.append(p01)
        p03 = cit_CIT(0, 3)
        pXZ.append(p03)
        p032 = cit_CIT(0, 3, {2})
        pXZY.append(p032)
    print('\napprox =', approx, ', use_gp =', use_gp, ':')
    print('pXX: mean = {}, var = {}'.format(round(np.mean(pXX), 4), format(np.var(pXX), '.4e')))
    print('pXZ: mean = {}, var = {}'.format(round(np.mean(pXZ), 4), format(np.var(pXZ), '.4e')))
    print('pXZY: mean = {}, var = {}'.format(round(np.mean(pXZY), 4), format(np.var(pXZY), '.4e')))
```

Output:
Using the same dataset GaussianData.csv generated by Python, the Matlab version's results are provided in Matlab_results.pdf. We can see that they give the same p-value in deterministic cases and almost the same p-value in non-deterministic cases. This conclusion also applies to the other datasets I tried.
Thanks, this is awesome @aoqiz! I think the only thing left is that we need to debug the Matlab KCI version when it outputs weird values, right? What's the progress there? But the Matlab KCI issue is not very relevant to our code here, so I think we can merge this code first. Do you mind adding a TODO in your code to remind us to debug the Matlab KCI version, and updating the "issue tracking doc" accordingly? :) Thanks so much for spending a lot of time debugging this very tricky bug!! :)
super nice!
I think we are really close to merge this PR.
But since our codebase changed quite a lot, I noticed that you may need to rebase your code.
You can ask @MarkDana if you need help, I believe he should be familiar with it (?). I know for people who don't have much coding experience, rebasing is usually painful. So don't hesitate to ask for help!
Oh, btw, after your fix, how about the KCI issue we mentioned previously?
Thank you, @tofuwen. I haven't submitted the current test file yet in case of new changes. But I'll do this now. Once finished, I'll remind you. Thanks!
What we have left now is:
@tofuwen After checking with the expert, the weird results in the Matlab version are expected; more care should be taken in how the kernel width is set manually. So I think there's no more issue remaining in this PR, and it's ready to be merged. @MarkDana I'm not familiar with resolving conflicts and rebasing the code, so could you please help me with this? Many thanks! I will look into issue #56 after #62 is merged, since I need the code for multi-dimensional variables in PR #62 --- #62 is ready to merge, right?
@aoqiz Great to know! Then in the tests in your code, could you also set the kernel width correctly so that we don't have weird p-value results in your tests? Also, how should we set the KCI width? Are there any instructions? I guess users may not be able to set the kernel width correctly either.
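On instructions for choosing a width: one common rule of thumb for a Gaussian kernel is the median heuristic (set the bandwidth to the median pairwise distance between samples). This is a general-purpose sketch, not necessarily the exact 'empirical' rule that causallearn's KCI implements:

```python
# General-purpose median heuristic for a Gaussian kernel bandwidth.
# Note: this is a common rule of thumb, not necessarily the exact
# 'empirical' width rule used inside causallearn's KCI.
import numpy as np

def median_heuristic_width(data):
    """Median of the pairwise Euclidean distances between rows of data."""
    n = data.shape[0]
    # Pairwise squared distances via broadcasting: an (n, n) matrix.
    sq = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    # The strictly upper triangle excludes the zero self-distances.
    dists = np.sqrt(sq[np.triu_indices(n, k=1)])
    return np.median(dists)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1))
w = median_heuristic_width(X)
print(f"median-heuristic width: {w:.3f}")
```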
@tofuwen The weird p-values only happen when users set the kernel width manually, in the two cases listed in Weird_test_cases.pdf. But, usually, they may set the parameter. I don't think we need to set the kernel width 'correctly' when we're testing manual kernel-width setting, since it's just an exhaustive test over all possible argument values we could have. The weird p-values are expected in those two cases, while in the other cases, manually setting the kernel width works quite well.
@aoqiz I see. I still prefer to set a reasonable kernel width, and we shouldn't allow weird p-values to exist in our tests. If users check our code and see these extremely weird p-values, they will lose trust. And the tests should cover real use cases --- if our test returns a super weird p-value, it should never be a real use case.
@tofuwen Ok, let me think about it.
@tofuwen Fixed. There are no more weird p-values now. Please double-check, thanks!
Thanks, looks super nice now. :) We can merge it after resolving conflicts.
Aoqi kci rebase
Updates:
- Fix bugs (or typos) in causallearn/utils/KCI/KCI.py to conform to the original Matlab implementation in http://people.tuebingen.mpg.de/kzhang/KCI-test.zip:
  - KCI_CInd.kernel_matrix: KernelX's and KernelY's empirical widths depend on data_z's shape (Line 349, 373).
  - KCI_CInd.kernel_matrix: center the kernel matrix kzx (Line 404, 457, 462) before feeding it into KCI_V_statistic. After double-checking, the optimization @MarkDana did previously (https://github.com/cmu-phil/causal-learn/pull/49) still applies.
  - KCI_CInd.kernel_matrix: change the data normalization by adjusting the degrees-of-freedom correction in calculating the standard deviation.
- Add test cases where the datasets are generated by non-Gaussian distributions, e.g., the exponential distribution, the uniform distribution, and mixtures of these distributions.
- Remove print commands and add assertions on the exact p-values. The ground-truth p-values are computed from the current version 26d8e36.
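The degrees-of-freedom fix above can be illustrated in a few lines: NumPy's np.std defaults to ddof=0 (divide by n), while Matlab's std defaults to the n-1 (sample) normalization, so standardizing with one convention versus the other rescales the data by a constant factor:

```python
# Illustration of the ddof mismatch: standardizing with NumPy's default
# (ddof=0, divide by n) vs. Matlab's default (ddof=1, divide by n-1)
# rescales the standardized data by a constant factor.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((30, 1))

z_biased   = (x - x.mean(0)) / x.std(0)          # ddof=0: NumPy's default
z_unbiased = (x - x.mean(0)) / x.std(0, ddof=1)  # ddof=1: Matlab's default

# The two standardizations differ by the constant factor sqrt(n / (n - 1)).
n = x.shape[0]
ratio = z_biased / z_unbiased
print(np.allclose(ratio, np.sqrt(n / (n - 1))))  # prints True
```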
Resolving the concerns:
When the kernel width is set manually, there are two cases that may return a weird p-value even in the Matlab version. The details are listed in Weird_test_cases.pdf. After checking with the expert, these results are expected; more care should be taken in how the kernel width is set manually.
Comparison of the current Python version with the Matlab version on KCI-test results:
Since the Matlab version cannot support too flexible arguments, we only test the basic setting: applying Gaussian kernel to all variables X, Y and Z and setting the kernel width using empirical rules. The results for Python are as follows:
- approx=True: the resulting p-value is deterministic. Output:
- approx=False: the resulting p-value is not deterministic, due to the simulation of the null distribution in the function null_sample_spectral(xxx) in the classes KCI_UInd and KCI_CInd. (I've checked that the other deterministic parts give the same output as the Matlab version on the same input.) Therefore, we run the program 500 times and calculate the mean of the p-values. Output:
Using the same dataset GaussianData.csv generated by Python, the Matlab version results are provided in Matlab_results.pdf.
We can see that they give the same p-value in deterministic cases and almost the same p-value in non-deterministic cases. This conclusion also applies to the other datasets I tried.
Test plan:
TODO: