
Pairwise test for a group of distributions #1

Open
psyguy opened this issue Mar 5, 2020 · 2 comments


psyguy commented Mar 5, 2020

Hi there,

Imagine you have N different distributions and want to run pairwise HHG tests of dependence on them. If N is not too large, you can simply run all N(N-1)/2 pairwise tests, but this becomes challenging when there are many distributions to compare. I needed to do this for N = 50 (making 1225 pairs; DOI:10.31234/osf.io/EAGZD) and had to run it on a cluster.
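For concreteness, the quadratic growth in the number of pairwise tests can be sketched as follows (a minimal, HHG-agnostic illustration; the function name `n_pairs` is made up here):

```python
from itertools import combinations

def n_pairs(n: int) -> int:
    """Number of unordered pairs among n distributions: n choose 2."""
    return n * (n - 1) // 2

# Brute-force enumeration agrees with the closed form for N = 50
pairs = list(combinations(range(50), 2))
assert len(pairs) == n_pairs(50) == 1225
```

At N = 50 this is already 1225 tests, and doubling N roughly quadruples the work, which is why a cluster was needed.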

I haven't looked under the hood of the hhg.test() function yet, but I imagine one could at least avoid repeating the partitioning step if the internal objects were returned along with the results. A second function could then take over and perform the pairwise comparisons.

Is that—or something similar—reasonable/feasible? How hard would it be to implement?

Thanks in advance!

barakbri (Owner) commented Mar 9, 2020

Hi,
The partitioning procedure of hhg.test over all n^2 partitions of the data (where n is the sample size) is computed in O(n log n) time, as described in https://academic.oup.com/biomet/article/100/2/503/202568.

As far as I can see, there is no way of "transferring information" from one pairwise partitioning scheme to another, since the partitioning scheme depends on the joint distribution of the specific pair of variables being tested.

Are the variables univariate?
If so, you could use the tests supplied by HHG::hhg.univariate.ind.combined.test and HHG::Fast.independence.test. These tests are aimed at testing dependence between two univariate variables. The distribution of the test statistic under the null hypothesis is distribution-free, meaning it depends only on the sample size (n). Our code allows you to build a look-up table object (with a single function call) and then test all hypotheses against the same look-up table.

You can find an example of how to construct a look-up table and test it in the documentation of the two functions I mentioned. The second function is a wrapper for the first one, with parameters optimized for large sample sizes.
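A minimal sketch of that workflow, assuming the API described in the HHG package documentation (function names and arguments should be checked against the installed version, and the return fields inspected with str()):

```r
library(HHG)

n <- 100  # common sample size shared by all variables

# Build the null look-up table once. It depends only on n,
# so it can be reused across all N(N-1)/2 pairwise tests.
nt <- Fast.independence.test.nulltable(n)

# Example data: one dependent pair
x <- rnorm(n)
y <- x + rnorm(n)

# Each pairwise test reuses the precomputed table instead of
# re-simulating the null distribution.
res <- Fast.independence.test(x, y, NullTable = nt)
print(res)
```

The key point is that the expensive step (simulating the null distribution) happens once, while each of the pairwise tests is a cheap call against the shared table.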

You can read more about this test here:
http://www.jmlr.org/papers/volume17/14-441/14-441.pdf
and about the adaption of the algorithm to large sample sizes:
https://journal.r-project.org/archive/2018/RJ-2018-008/RJ-2018-008.pdf

Please let me know if I can assist you with the univariate tests or anything else.

psyguy (Author) commented Mar 12, 2020

Hi @barakbri,

Thanks for your comment! I see your point. So, in principle, there is no way to optimize it for pairwise comparisons of multivariate distributions, right?

The distributions I had to deal with were multivariate (7-dimensional). I will look into the lookup-table approach later, in case I find myself in a similar situation with univariate distributions.

Cheers!
