Since |CR_24| = C(24,12) = 2704156 assignments are far too many to enumerate by hand, we can either use a **randomization test** (random sampling of assignments) or run a **full permutation test over all possible assignments, without random sampling**. We implement the permutation test from scratch in Python rather than using R packages such as perm or jmuOutlier, or comparable Python packages.

In [1]:
import numpy as np

import scipy
import scipy.stats
from scipy.stats import ks_2samp as ksstats

In [2]:
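# Total number of possible assignments: C(24, 12)

def numCombination(A, B):
    """Return the binomial coefficient C(A, B) using exact integer arithmetic."""
    A = int(A)
    B = int(B)

    if B == 0:
        return 1

    # A * (A-1) * ... * (A-B+1)
    numerator = 1
    for i in range(A, A - B, -1):
        numerator = numerator * i

    # B!
    denominator = 1
    for i in range(B, 0, -1):
        denominator = denominator * i

    # Integer division avoids float rounding for large inputs.
    return numerator // denominator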

numCombination(24,12)

2704156
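
As a cross-check on the hand-written counter, Python 3.8 and later provide `math.comb` in the standard library. The variable name `n_assignments` below is only illustrative.

```python
import math

# Number of ways to assign 12 of the 24 units to treatment.
n_assignments = math.comb(24, 12)
print(n_assignments)  # 2704156
```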

Because exact complete enumeration of all assignments is harder to code, we use Monte Carlo sampling to approximate the exact result. To reduce the Monte Carlo error, we set the number of draws to be very large: **10 times the number of possible assignments for the mean and 3 times for the median**.
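
For comparison, here is a minimal sketch of what full enumeration could look like for the mean difference, using `itertools.combinations` to visit every assignment exactly once. The function name `exact_perm_test_mean` and the small tolerance `1e-12` are assumptions made for illustration and are not part of the notebook's code.

```python
from itertools import combinations

def exact_perm_test_mean(xs, ys):
    """Two-sided exact permutation p-value for the difference in means."""
    pooled = list(xs) + list(ys)
    n_x, n_total = len(xs), len(xs) + len(ys)
    total = sum(pooled)

    def mean_diff(sum_x):
        # Mean of the "treated" group minus mean of the remaining units.
        return sum_x / n_x - (total - sum_x) / (n_total - n_x)

    observed = abs(mean_diff(sum(xs)))
    count = 0
    n_assignments = 0
    for idx in combinations(range(n_total), n_x):
        sum_x = sum(pooled[i] for i in idx)
        # Tolerance (an assumption) so float rounding does not drop exact ties.
        count += abs(mean_diff(sum_x)) >= observed - 1e-12
        n_assignments += 1
    return count / n_assignments

# Tiny example: 20 assignments, only the two extreme splits are as large as observed.
p_small = exact_perm_test_mean([1, 2, 3], [4, 5, 6])
print(p_small)
```

On the 24 observations used here the loop would visit all 2704156 assignments, which is slow in pure Python but involves no sampling error.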

Reference code:
https://stackoverflow.com/questions/24795535/pythons-implementation-of-permutation-test-with-permutation-number-as-input

For the KS statistic, the computation time is too long for that many draws, so **randomization is used**, and we choose a relatively large B (the number of random draws) of 100000.
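
Part of the cost is likely that `ks_2samp` also computes a p-value on every call, which the permutation loop discards. A minimal sketch of computing only the two-sample KS statistic with NumPy is shown below; the helper name `ks_statistic` is hypothetical, and no timing claim is made here.

```python
import numpy as np

def ks_statistic(a, b):
    """Largest absolute gap between the two empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    pooled = np.concatenate([a, b])
    # Empirical CDF of each sample evaluated at every pooled observation.
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))
```

Under these assumptions it could stand in for `ksstats(...)[0]` inside the shuffling loop.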

The code structure is the same for both methods when Monte Carlo is used, but the p-value formula differs: the randomization version adds a smoothing term "+1" to both the count and the number of draws.
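
Written out, the two p-value formulas are as follows. The symbols are notation introduced here for illustration: N is the number of Monte Carlo draws (`nmc`), B the number of random draws in the randomization test, and k the number of shuffled statistics at least as extreme as the observed one.

```latex
\hat{p}_{\text{MC}} = \frac{k}{N},
\qquad
\hat{p}_{\text{rand}} = \frac{k + 1}{B + 1},
\qquad
k = \sum_{j} \mathbf{1}\left\{ |\tau_j^{*}| \ge |\tau_{\text{obs}}| \right\}
```

The "+1" counts the observed assignment itself, so the randomization p-value can never be exactly zero.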


In [3]:
def exact_mc_perm_test(xs, ys, nmc,method):
    if method=="mean":
        n, k = len(xs), 0
        # observed difference in means
        tao = np.mean(xs) - np.mean(ys)
        zs = np.concatenate([xs, ys])
        for j in range(nmc):
            np.random.shuffle(zs)
            # difference in means under the shuffled assignment
            tao2 = np.mean(zs[:n]) - np.mean(zs[n:])
            k +=  (  np.abs(tao) <= np.abs(tao2) )
        fisher_p=k / nmc
        return fisher_p
    
    if method=="median":
        n, k = len(xs), 0
        tao = np.median(xs) - np.median(ys)
        zs = np.concatenate([xs, ys])
        for j in range(nmc):
            np.random.shuffle(zs)
            tao2 = np.median(zs[:n]) - np.median(zs[n:])
            k +=  (  np.abs(tao) <= np.abs(tao2) )
        fisher_p=k / nmc
        return fisher_p

    if method=="ks-random":
        # here nmc plays the role of B, the number of random draws
        n, k = len(xs), 0
        tao = ksstats(xs,ys)[0]
        zs = np.concatenate([xs, ys])
        for j in range(nmc):
            np.random.shuffle(zs)
            tao2 = ksstats(zs[:n],zs[n:])[0]
            k +=  (  np.abs(tao) <= np.abs(tao2) )
        k+=1
        nmc+=1
        fisher_p=k / nmc
        return fisher_p
    
    
    # The KS statistic is computed with scipy.stats.ks_2samp:
    # https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.ks_2samp.html
    raise ValueError("method must be 'mean', 'median', or 'ks-random'")


Data input

In [5]:
y_0 = np.array([21.51, 28.14, 24.04, 23.45, 23.68, 19.79, 28.4, 20.98, 22.51, 20.1, 26.91, 26.25])
y_1 = np.array([25.71, 26.37, 22.8, 25.34, 24.97, 28.14, 29.58, 30.92, 34.02, 21.9, 31.53, 20.73])
# total number of units across both groups
n = len(y_0) + len(y_1)

### Result

1) The mean difference and p-value for the sharp null

In [5]:
(sum(y_1) - sum(y_0))*2/n

3.0208333333333335

In [6]:
exact_mc_perm_test(y_1,y_0,2704156*10,"mean")

0.052411954044071424

2) The median difference and its p-value

In [7]:
np.median(y_1) - np.median(y_0)

2.4750000000000014

In [14]:
exact_mc_perm_test(y_1,y_0,2704156*3,"median")

0.24404965295394693

3) The KS statistic and its p-value

In [6]:
ksstats(y_1 ,y_0)
# ksstats(y_1 ,y_1) -> # Ks_2sampResult(statistic=0.0, pvalue=1.0)

Ks_2sampResult(statistic=0.41666666666666663, pvalue=0.186196839004176)

In [7]:
exact_mc_perm_test(y_1,y_0,10000,"ks-random")

0.25137486251374863

In [8]:
exact_mc_perm_test(y_1,y_0,100000,"ks-random")

0.25429745702542972
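
The two KS runs differ in the third decimal place, which is expected from sampling noise. A rough sketch of the Monte Carlo standard error, treating each draw as an independent Bernoulli trial and ignoring the "+1" smoothing (both are simplifying assumptions), is given below; `mc_standard_error` is a hypothetical helper name.

```python
import math

def mc_standard_error(p_hat, n_draws):
    """Binomial standard error of a Monte Carlo p-value estimate."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_draws)

# Rounded p-values from the two KS runs above.
for n_draws, p_hat in [(10_000, 0.2514), (100_000, 0.2543)]:
    print(n_draws, mc_standard_error(p_hat, n_draws))
```

Under these assumptions the standard errors are roughly 0.004 and 0.001, so a gap of about 0.003 between the two runs is consistent with Monte Carlo noise.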

Explanation:

The median and KS statistics gave similar p-values (about 0.24 and 0.25), which showed that the treatment effect was not significant. The mean difference, in contrast, gave a much smaller p-value (about 0.052), which is borderline: significant at the 10% level but just above the conventional 5% level. I would choose not to rely on the mean-difference result, as it may be less robust to outliers than the other two statistics.