Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw exception in fisherz test #58

Merged
merged 4 commits into from Jul 18, 2022
Merged

Throw exception in fisherz test #58

merged 4 commits into from Jul 18, 2022

Conversation

zhi-yi-huang
Copy link
Collaborator

@zhi-yi-huang zhi-yi-huang commented Jul 14, 2022

Updated files:

  1. causallearn/utils/cit.py: Added exception in fisherz test
  2. tests/TestCIT.py: Added a unit tests for fisherz test

Test plan section

python -m unittest tests.TestCIT # should pass

image

inv = np.linalg.inv(sub_corr_matrix)
try:
inv = np.linalg.inv(sub_corr_matrix)
except np.linalg.LinAlgError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure all the np.linalg.LinAlgError will be singularity problem?

Also please make your description more informative: e.g. "data correlation matrix is singular. Cannot run fisherz test. Please check your data"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, all the np.linalg.LinAlgError in the function numpy.linalg.inv is the error of the singular matrix. Please refer to https://github.com/numpy/numpy/blob/v1.23.0/numpy/linalg/linalg.py#L483-L553.

Thanks for your suggestion! I will update it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did you get this information?

I checked the link you gave, and it says:

Raises
------
LinAlgError
    If `a` is not square or inversion fails.

Does "inversion fails" guarantee "singular matrix"? I am not sure lol --- not very familiar it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the function np.linalg.inv gets error in here. The error is singular matrix. We can guarantee that the input matrix ’a‘ is a square matrix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome, this is clear. Thanks for the clear instructions.

I am curious, without your change, what would be the output if we have data with singular correlation? I think it would directly raise LinAlgError() with "Singular Matrix" as error message?

Do you mind running an example and attach the screenshot to your test plan?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test plan section

Test fisherz under code version 5b30aad. The test code is as follows.

import unittest

import numpy as np
from causallearn.utils.cit import CIT


class TestCIT(unittest.TestCase):
    def test_fisherz_singularity_problem(self):
        X1 = X2 = np.random.normal(size=1000)
        X = np.array([X1, X2]).T
        cit = CIT(data=X, method='fisherz')
        cit.fisherz(0, 1, tuple())

image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, all the np.linalg.LinAlgError in the function numpy.linalg.inv is the error of the singular matrix. Please refer to https://github.com/numpy/numpy/blob/v1.23.0/numpy/linalg/linalg.py#L483-L553.

Thanks for your suggestion! I will update it.

I'm not sure but will np.linalg.pinv solve it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried changing the inverse to pseudo-inverse and tested fisherz tests as the code below, and the image below appeared. r in the picture is the partial correlation coefficient statistic of the variables X1 and X2 in both cases. The partial correlation coefficient of the two variables in the first case is negative, which I think is wrong. I looked at the partial correlation coefficient as described on Wikipedia, and the inverse method does require the correlation coefficient matrix to be invertible.

import unittest

import numpy as np
from causallearn.utils.cit import CIT

class TestCIT(unittest.TestCase):
    def test_fisherz_singularity_problem(self):
        np.random.seed(767)
        
        X1 = X2 = np.random.normal(size=1000)
        X = np.array([X1, X2]).T
        cit = CIT(data=X, method='fisherz')
        cit.fisherz(0, 1, tuple())
        
        X1 = X1 + 0.01 * np.random.normal(size=1000)
        X2 = X2 + 0.01 * np.random.normal(size=1000)
        X = np.array([X1, X2]).T
        cit = CIT(data=X, method='fisherz')
        cit.fisherz(0, 1, tuple())

image

tests/TestCIT.py Outdated
assert np.isclose(gsq(data, X, Y, S, cardinalities), gsq_notoptimized(data, X, Y, S))
assert np.isclose(chisq(data, X, Y, S, cardinalities), chisq_notoptimized(data, X, Y, S))
print(f'{X};{Y}|{S} passed')
# def test_new_old_gsq_chisq_equivalent(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reason to remove this test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions chisq_notoptimized and gsq_notoptimized have been commented out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, why do you comment out this test?

In the main branch, the test exists: https://github.com/cmu-phil/causal-learn/blob/main/tests/TestCIT.py

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions chisq_notoptimized and gsq_notoptimized have been commented out in causallearn/utils/cit.py. I am not sure if I want to remove this test from the code. But I need to comment out the code in order to test my code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhi-yi-huang Cool, now things is clear. Sorry I didn't read your previous message clearly. For some reason I thought chisq_notoptimized() is in the this test lol

I think @MarkDana should have context. Do you mind taking a look? Did you remove chisq_notoptimized() and forget to remove the test?

Normally, in our code base, we shouldn't keep lots of commented out code --- this will make our code more crowded without providing additional value.

I would suggest if chisq_notoptimized() is removed, we can just remove all the code related to it here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the chisq_notoptimized was removed by me in the recent commit of CIT class.
It was intended to ensure the code consistency of chisq test w/ or w/o optimization.
And I would suggest to remove the related test here too. Thanks @zhi-yi-huang and @tofuwen !

tests/TestCIT.py Outdated
Comment on lines 10 to 22
def test_fisherz_singularity_problem(self):
X1 = X2 = np.random.normal(size=1000)
X = np.array([X1, X2])

def test_new_old_gsq_chisq_equivalent(self):
def powerset(iterable):
return chain.from_iterable(combinations(list(iterable), r) for r in range(len(iterable) + 1))
cit = CIT(data=X, method='fisherz')

def _unique(column):
return np.unique(column, return_inverse=True)[1]
try:
cit.fisherz(0, 1, tuple())
except ValueError:
print('Catch Singularity Problem')
return

data_path = "data_discrete_10.txt"
data = np.loadtxt(data_path, skiprows=1)
data = np.apply_along_axis(_unique, 0, data).astype(np.int32)
cardinalities = np.max(data, axis=0) + 1
assert False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a good way to write test in this case.

I did a simple google search using keyword: "python test expect exception", and this result looks reasonable: https://stackoverflow.com/questions/129507/how-do-you-test-that-a-python-function-throws-an-exception

Maybe you can try it. (Or maybe you can even find something better! :) )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion! I will try it.

@tofuwen
Copy link
Contributor

tofuwen commented Jul 15, 2022

Oh, please attach a screenshot of your terminal after running the test to the test plan section, e.g. #59

@tofuwen
Copy link
Contributor

tofuwen commented Jul 16, 2022

@zhi-yi-huang Thanks for your great work!

Your comments are very clear, and cleared all my doubts, well-done! :)

Two follow-ups:

  1. do you mind running your new code using the user reported data, i.e. "[Unsolved] 2022/6/27 Two issues on PC/fisherz: math domain error and singular matrix" section in our tracking doc? https://docs.google.com/document/d/1ENPoi3nKDTK0ba1X8Gwmj-BMcXkMEkam1gGXfXSaFTM/edit?usp=sharing

  2. could you remove the commented-out tests (and import) to make the code looks better?

@zhi-yi-huang
Copy link
Collaborator Author

I can't seem to reproduce his case #29. The results I reproduced are as follows.

sub_corr_matrix
[[ 1. 0.05900512 -0.53968299 -0.77947494]
[ 0.05900512 1. -0.04393224 -0.0369982 ]
[-0.53968299 -0.04393224 1. -0.10670507]
[-0.77947494 -0.0369982 -0.10670507 1. ]]

inverse of sub_corr_matrix
[[ 1.25014681e+08, -1.01095444e+00, 7.87630530e+07, 1.05850228e+08],
[-1.01095444e+00, 1.00370144e+00, -5.88321213e-01, -8.13655359e-01],
[ 7.87630530e+07, -5.88321213e-01, 4.96231209e+07, 6.66888646e+07],
[ 1.05850228e+08, -8.13655359e-01, 6.66888646e+07, 8.96236413e+07]]

sub_corr_matrix determinant 7.8788205405274e-09


His result for the inverse is as follows.

inverse of sub_corr_matrix
[[-7.25887259e+14 -2.94842405e-01 -4.57331063e+14 -6.14610472e+14]
[-3.00253318e-01 1.00370143e+00 -1.40868234e-01 -2.12238508e-01]
[-4.57331063e+14 -1.37148843e-01 -2.88132487e+14 -3.87223301e+14]
[-6.14610472e+14 -2.07321594e-01 -3.87223301e+14 -5.20392152e+14]]

sub_corr_matrix determinant -1.356916280487041e-15

No math domain error occurs.

@tofuwen
Copy link
Contributor

tofuwen commented Jul 17, 2022

@zhi-yi-huang Cool, please remove the commented code and after that, I think this PR is ready to be pushed.

Copy link
Contributor

@tofuwen tofuwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome work!

good to go! cc @kunwuz

@tofuwen
Copy link
Contributor

tofuwen commented Jul 18, 2022

one small nit: in title, "expection" => "exception"

@kunwuz kunwuz changed the title Throw expection in fisherz test Throw exception in fisherz test Jul 18, 2022
@kunwuz kunwuz merged commit ecbd635 into py-why:main Jul 18, 2022
@MarkDana MarkDana mentioned this pull request Jul 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants