
Pairwise test for a group of distributions #1

Open
psyguy opened this issue Mar 5, 2020 · 2 comments


psyguy commented Mar 5, 2020

Hi there,

Imagine you have N different distributions and want to run pairwise HHG tests of dependence on them. If N is not too large, you can simply run all N(N-1)/2 pairwise tests, but this becomes challenging when there are many distributions to compare. I needed to do this for N = 50 (making 1225 pairs; DOI:10.31234/osf.io/EAGZD) and had to run it on a cluster.
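For concreteness, the quadratic growth in the number of pairwise tests can be sketched as follows (a minimal, HHG-agnostic illustration; the function name `n_pairs` is made up here):

```python
from itertools import combinations

def n_pairs(n: int) -> int:
    """Number of unordered pairs among n distributions: n choose 2."""
    return n * (n - 1) // 2

# Brute-force enumeration agrees with the closed form for N = 50
pairs = list(combinations(range(50), 2))
assert len(pairs) == n_pairs(50) == 1225
```

At N = 50 this is already 1225 tests, and doubling N roughly quadruples the work, which is why a cluster was needed.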

I haven't looked under the hood of the hhg.test() function yet, but I imagine one could at least avoid repeating the partitioning step if the internal objects were returned along with the results. A second function could then take over and perform the pairwise comparisons.

Is that—or something similar—reasonable/feasible? How hard would it be to implement?

Thanks in advance!

barakbri (Owner) commented Mar 9, 2020

Hi,
The partitioning procedure of hhg.test over all n^2 partitions of the data (where n is the sample size) is computed in O(n log n) time, as described in https://academic.oup.com/biomet/article/100/2/503/202568.

As far as I can see, there is no way of "transferring information" from one pairwise partitioning scheme to another, since the partitioning scheme depends on the joint distribution of the specific pair of variables being tested.

Are the variables univariate?
If so, you could use the tests supplied by HHG::hhg.univariate.ind.combined.test and HHG::Fast.independence.test. These tests are aimed at testing dependence between two univariate variables. The distribution of the test statistic under the null hypothesis is distribution-free, meaning it depends only on the sample size (n). Our code allows you to build a look-up table object (with a single function call) and then test all hypotheses against the same look-up table.

You can find an example of how to construct a look-up table and test it in the documentation of the two functions I mentioned. The second function is a wrapper for the first one, with parameters optimized for large sample sizes.
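A minimal sketch of that workflow, assuming the API described in the HHG package documentation (function names and arguments should be checked against the installed version, and the return fields inspected with str()):

```r
library(HHG)

n <- 100  # common sample size shared by all variables

# Build the null look-up table once. It depends only on n,
# so it can be reused across all N(N-1)/2 pairwise tests.
nt <- Fast.independence.test.nulltable(n)

# Example data: one dependent pair
x <- rnorm(n)
y <- x + rnorm(n)

# Each pairwise test reuses the precomputed table instead of
# re-simulating the null distribution.
res <- Fast.independence.test(x, y, NullTable = nt)
print(res)
```

The key point is that the expensive step (simulating the null distribution) happens once, while each of the pairwise tests is a cheap call against the shared table.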

You can read more about this test here:
http://www.jmlr.org/papers/volume17/14-441/14-441.pdf
and about the adaption of the algorithm to large sample sizes:
https://journal.r-project.org/archive/2018/RJ-2018-008/RJ-2018-008.pdf

Please let me know if I can assist you with the univariate tests or anything else.

psyguy (Author) commented Mar 12, 2020

Hi @barakbri,

Thanks for your comment! I see your point. So, in principle, there is no way to optimize it for pairwise comparisons of multivariate distributions, right?

The distributions I had to deal with were multivariate (7-dimensional). I will look into the lookup-table approach later, in case I find myself in a similar situation with univariate distributions.

Cheers!
