New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Throw exception in fisherz test #58
Conversation
inv = np.linalg.inv(sub_corr_matrix) | ||
try: | ||
inv = np.linalg.inv(sub_corr_matrix) | ||
except np.linalg.LinAlgError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure all the np.linalg.LinAlgError will be singularity problem?
Also please make your description more informative: e.g. "data correlation matrix is singular. Cannot run fisherz test. Please check your data"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, all the np.linalg.LinAlgError in the function numpy.linalg.inv is the error of the singular matrix. Please refer to https://github.com/numpy/numpy/blob/v1.23.0/numpy/linalg/linalg.py#L483-L553.
Thanks for your suggestion! I will update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where did you get this information?
I checked the link you gave, and it says:
Raises
------
LinAlgError
If `a` is not square or inversion fails.
Does "inversion fails" guarantee "singular matrix"? I am not sure lol --- not very familiar it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the function np.linalg.inv gets error in here. The error is singular matrix. We can guarantee that the input matrix ’a‘ is a square matrix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome, this is clear. Thanks for the clear instructions.
I am curious, without your change, what would be the output if we have data with singular correlation? I think it would directly raise LinAlgError() with "Singular Matrix" as error message?
Do you mind running an example and attach the screenshot to your test plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test plan section
Test fisherz under code version 5b30aad. The test code is as follows.
import unittest
import numpy as np
from causallearn.utils.cit import CIT
class TestCIT(unittest.TestCase):
def test_fisherz_singularity_problem(self):
X1 = X2 = np.random.normal(size=1000)
X = np.array([X1, X2]).T
cit = CIT(data=X, method='fisherz')
cit.fisherz(0, 1, tuple())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, all the np.linalg.LinAlgError in the function numpy.linalg.inv is the error of the singular matrix. Please refer to https://github.com/numpy/numpy/blob/v1.23.0/numpy/linalg/linalg.py#L483-L553.
Thanks for your suggestion! I will update it.
I'm not sure but will np.linalg.pinv
solve it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried changing the inverse to pseudo-inverse and tested fisherz tests as the code below, and the image below appeared. r in the picture is the partial correlation coefficient statistic of the variables X1 and X2 in both cases. The partial correlation coefficient of the two variables in the first case is negative, which I think is wrong. I looked at the partial correlation coefficient as described on Wikipedia, and the inverse method does require the correlation coefficient matrix to be invertible.
import unittest
import numpy as np
from causallearn.utils.cit import CIT
class TestCIT(unittest.TestCase):
def test_fisherz_singularity_problem(self):
np.random.seed(767)
X1 = X2 = np.random.normal(size=1000)
X = np.array([X1, X2]).T
cit = CIT(data=X, method='fisherz')
cit.fisherz(0, 1, tuple())
X1 = X1 + 0.01 * np.random.normal(size=1000)
X2 = X2 + 0.01 * np.random.normal(size=1000)
X = np.array([X1, X2]).T
cit = CIT(data=X, method='fisherz')
cit.fisherz(0, 1, tuple())
tests/TestCIT.py
Outdated
assert np.isclose(gsq(data, X, Y, S, cardinalities), gsq_notoptimized(data, X, Y, S)) | ||
assert np.isclose(chisq(data, X, Y, S, cardinalities), chisq_notoptimized(data, X, Y, S)) | ||
print(f'{X};{Y}|{S} passed') | ||
# def test_new_old_gsq_chisq_equivalent(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the reason to remove this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functions chisq_notoptimized and gsq_notoptimized have been commented out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, why do you comment out this test?
In the main branch, the test exists: https://github.com/cmu-phil/causal-learn/blob/main/tests/TestCIT.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functions chisq_notoptimized
and gsq_notoptimized
have been commented out in causallearn/utils/cit.py
. I am not sure if I want to remove this test from the code. But I need to comment out the code in order to test my code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhi-yi-huang Cool, now things is clear. Sorry I didn't read your previous message clearly. For some reason I thought chisq_notoptimized() is in the this test lol
I think @MarkDana should have context. Do you mind taking a look? Did you remove chisq_notoptimized() and forget to remove the test?
Normally, in our code base, we shouldn't keep lots of commented out code --- this will make our code more crowded without providing additional value.
I would suggest if chisq_notoptimized() is removed, we can just remove all the code related to it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the chisq_notoptimized
was removed by me in the recent commit of CIT class.
It was intended to ensure the code consistency of chisq test w/ or w/o optimization.
And I would suggest to remove the related test here too. Thanks @zhi-yi-huang and @tofuwen !
tests/TestCIT.py
Outdated
def test_fisherz_singularity_problem(self): | ||
X1 = X2 = np.random.normal(size=1000) | ||
X = np.array([X1, X2]) | ||
|
||
def test_new_old_gsq_chisq_equivalent(self): | ||
def powerset(iterable): | ||
return chain.from_iterable(combinations(list(iterable), r) for r in range(len(iterable) + 1)) | ||
cit = CIT(data=X, method='fisherz') | ||
|
||
def _unique(column): | ||
return np.unique(column, return_inverse=True)[1] | ||
try: | ||
cit.fisherz(0, 1, tuple()) | ||
except ValueError: | ||
print('Catch Singularity Problem') | ||
return | ||
|
||
data_path = "data_discrete_10.txt" | ||
data = np.loadtxt(data_path, skiprows=1) | ||
data = np.apply_along_axis(_unique, 0, data).astype(np.int32) | ||
cardinalities = np.max(data, axis=0) + 1 | ||
assert False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a good way to write test in this case.
I did a simple google search using keyword: "python test expect exception", and this result looks reasonable: https://stackoverflow.com/questions/129507/how-do-you-test-that-a-python-function-throws-an-exception
Maybe you can try it. (Or maybe you can even find something better! :) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestion! I will try it.
Oh, please attach a screenshot of your terminal after running the test to the test plan section, e.g. #59 |
@zhi-yi-huang Thanks for your great work! Your comments are very clear, and cleared all my doubts, well-done! :) Two follow-ups:
|
I can't seem to reproduce his case #29. The results I reproduced are as follows. sub_corr_matrix inverse of sub_corr_matrix sub_corr_matrix determinant 7.8788205405274e-09 His result for the inverse is as follows. inverse of sub_corr_matrix sub_corr_matrix determinant -1.356916280487041e-15 No math domain error occurs. |
@zhi-yi-huang Cool, please remove the commented code and after that, I think this PR is ready to be pushed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome work!
good to go! cc @kunwuz
one small nit: in title, "expection" => "exception" |
Updated files:
causallearn/utils/cit.py
: Added exception in fisherz testtests/TestCIT.py
: Added a unit tests for fisherz testTest plan section