
add multigroup permutations #198

Open · wants to merge 5 commits into master

Conversation

yakir12 (Contributor) commented Apr 27, 2020

Do not merge yet. This is for discussion and collaboration. But could be cool!

ararslan marked this pull request as draft April 27, 2020 20:14
yakir12 (Contributor, Author) commented Apr 28, 2020

So this seems problematic: the P-value is heavily affected by the order of the groups in the dataset. Here I run the test on a dataset that has 3 groups (with means -1, 0, and 1), but I vary the order of the groups. The resulting P-value is very different depending on the order.

julia> using Combinatorics, Distributions, HypothesisTests

julia> data = [rand(Normal(μ, 1), 20) for μ in (-1, 0, 1)];

julia> for x in permutations(data)
           P = pvalue(ApproximatePermutationTest(x, mean, 10^5))
           @show P
       end
P = 1.0e-5
P = 0.0
P = 0.76717
P = 0.76947
P = 0.0
P = 0.0

I suspect that in mapreduce(i -> user_function(view(data_vector, i)), -, grouping_indices), subtracting the (in this case) means sequentially across the groups will of course depend on the order of the groups. So that does make total sense.
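
To see the order dependence concretely: a left fold with - over the three group means (mirroring what mapreduce does here) gives a different statistic for each ordering, and the orderings where the fold cancels to zero are exactly the ones with the large P-values above. A minimal illustration:

julia> foldl(-, [-1, 0, 1])   # (-1 - 0) - 1
-2

julia> foldl(-, [0, -1, 1])   # (0 - (-1)) - 1
0

julia> foldl(-, [1, 0, -1])   # (1 - 0) - (-1)
2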

My question is: how do we make the test independent of the order of the groups? Right now, it treats the data similarly to an lm, where the data is at least ordinal.

Perhaps this isn't a problem. In reality, if the 3 groups came about through an ordinal predictor AND the predictor levels were aligned with the means (1 => -1, 2 => 0, and 3 => 1), then we should order the groups according to the predictor (1, 2, and 3). That does, as expected, result in a very low P-value:

julia> P = pvalue(ApproximatePermutationTest(data, mean, 10^5))
0.0

After some more reading, maybe in this specific implementation of a multi-group permutation test it is totally fine to require the predictor to be:

  1. ordinal (so it can't be nominal, e.g. countries)
  2. uniformly spaced (e.g. 1:5 and not 1, 2, 3, 50, 937)
  3. and for the groups to be sorted according to the predictor's order (see the example above and the sketch below)
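
For requirement 3, a minimal sketch (the predictor vector here is hypothetical, just for illustration):

predictor = [2, 3, 1]                    # hypothetical ordinal predictor, one value per group
sorted_data = data[sortperm(predictor)]  # reorder the groups to follow the predictor
P = pvalue(ApproximatePermutationTest(sorted_data, mean, 10^5))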

kleinschmidt (Member) commented Apr 28, 2020

I think the basic issue here is that you're using - as a reducer to combine the output of the test statistic function f over groups.

Maybe a (more radical) revised design would treat f not as a test statistic that is applied to each group, but as a function that's applied to the vector of groups. So then the equivalent of the existing two-group permutation test

ApproximatePermutationTest(x, y, f, n)

would be

ApproximatePermutationTest([x, y], xy -> mapreduce(f, -, xy), n)

Edit: the important thing here is that the reducer is part of the definition of the null hypothesis you're specifying, so people should have to specify it manually when there are more than two groups and the default H0 of "value is the same for all groups" doesn't map easily onto subtraction.
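
A sketch of what that buys us, assuming the vector-of-groups constructor above and the data from the earlier comment: with an order-invariant statistic such as the variance of the group means, the P-value should no longer depend on how the groups are ordered:

using Statistics  # mean, var

f(groups) = var(mean.(groups))  # invariant under reordering of the groups
for x in permutations(data)
    # should give the same value for every ordering, up to Monte Carlo error
    @show pvalue(ApproximatePermutationTest(x, f, 10^5))
end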

yakir12 (Contributor, Author) commented Apr 28, 2020

Sure! That's easy to amend.

But to be clear: other than that, there's nothing statistically wrong with this setup? I'd like to include an example for testing the difference in, say, means or variability between multiple groups, where the supplied reducer would for instance be xy -> mapreduce(std, -, xy) for the variability example. Will this check out? If so, I plan to make the changes you mentioned, add docs + examples, and tests. Tonight...
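
For instance, a two-group variability test under the revised API might look like this (a sketch with made-up data, not from the PR):

using Statistics  # std

x, y = rand(Normal(0, 1), 20), rand(Normal(0, 3), 20)  # same mean, different spread
P = pvalue(ApproximatePermutationTest([x, y], xy -> mapreduce(std, -, xy), 10^5))  # small P suggests the spreads differ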

yakir12 (Contributor, Author) commented Apr 28, 2020

OK, done, minus the tests... Let me know if this looks right.

yakir12 marked this pull request as ready for review April 29, 2020 06:17
yakir12 (Contributor, Author) commented Apr 29, 2020

Tests added and passing, but I didn't include tests for more than 2 groups because I'm not certain what standard tests I could compare against (other than self-generated data, which seems like cheating).
