# My Wilcoxon Tests

This notebook covers the basic Wilcoxon tests.

Author: Alvaro Paricio, Sept. 2016

## Scenarios


## References
* MATLAB:
    * Signtest: http://es.mathworks.com/help/stats/signtest.html?searchHighlight=signtest
* Numpy:
    * http://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-users.html
* For Wilcoxon tests:
    * http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.wilcoxon.html
    * https://gist.github.com/mblondel/1761714

The comparison below is quoted from a post on the python-scientific-user mailing list (July 2009): http://osdir.com/ml/python-scientific-user/2009-07/msg00014.html

I did a quick comparison between Matlab/stats (R14SP3), R (2.8.1), and Python/SciPy (0.7). Maybe this is somehow useful for others too.

(I’m intentionally violating the continuous distribution assumptions.)

Samples:

A1 <-> B: not paired with ties

A2 <-> B: not paired without ties

A1 <-> C: paired with zeros

A2 <-> C: paired without zeros

- Matlab

      A1 = 0:19

      A2 = A1 + (1:20)./100

      B = 0:39

      C = [0:14,16:20]

 

- R

      A1 <- 0:19

      A2 <- A1 + 1:20/100

      B <- 0:39

      C <- c(0:14,16:20)

 

- SciPy

      A1 = numpy.arange(20)

      A2 = A1 + numpy.arange(1,21)/100.0

      B = numpy.arange(40)

      C = numpy.array(list(range(15)) + list(range(16,21)))
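The tie and zero structure that the four pairings are meant to have can be checked directly. This is a minimal Python 3 sketch, assuming only NumPy; the helper name `has_ties` is illustrative and not part of the original post.

```python
import numpy as np

# Samples as defined above (Python 3 / NumPy version)
A1 = np.arange(20)
A2 = A1 + np.arange(1, 21) / 100.0
B = np.arange(40)
C = np.array(list(range(15)) + list(range(16, 21)))

def has_ties(x, y):
    """Return True if the pooled sample contains repeated values."""
    pooled = np.concatenate([x, y])
    return np.unique(pooled).size < pooled.size

print("A1/B pooled ties:", has_ties(A1, B))
print("A2/B pooled ties:", has_ties(A2, B))
print("zeros in A1 - C: ", int(np.sum(A1 - C == 0)))
print("zeros in A2 - C: ", int(np.sum(A2 - C == 0)))
```

The unpaired tests look at ties in the pooled sample, while the paired tests look at zero differences, which is why the two checks differ.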

 

 

2 Samples, Not Paired
=====================

(from scipy import stats)

 

Kruskal-Wallis Test
-------------------

 

Same p-values for all.

 

Samples contain ties:

 

- Matlab: kruskalwallis([A1,B],[A1*0,B*0+1]) = 0.00170615101265

- R: kruskal.test(list(A1,B)) = 0.00170615101265

- R: wilcox.test(A1,B, correct=FALSE) = 0.00170615101265 (+warning: ties)

- SciPy: stats.kruskal(A1,B) = 0.00170615101265

 

(R: kruskal = wilcox without correction for continuity)

 

Samples without ties:

 

- Matlab: kruskalwallis([A2,B], [A2*0,B*0+1]) = 0.00288777919292

- R: kruskal.test(list(A2,B)) = 0.00288777919292

- SciPy: stats.kruskal(A2,B) = 0.00288777919292
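The remark that Kruskal-Wallis on two groups equals the rank-sum test without continuity correction can be checked in SciPy as well. This sketch assumes a SciPy release whose `mannwhitneyu` accepts the `alternative` and `use_continuity` keywords; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

A1 = np.arange(20)
B = np.arange(40)

# Kruskal-Wallis on two groups (tie-corrected chi-square approximation)
h_stat, p_kw = stats.kruskal(A1, B)

# Mann-Whitney U: normal approximation, tie-corrected, no continuity correction
u_stat, p_mw = stats.mannwhitneyu(A1, B, use_continuity=False,
                                  alternative='two-sided')

print("kruskal      p =", p_kw)
print("mannwhitneyu p =", p_mw)
```

With two groups the Kruskal-Wallis statistic is the square of the tie-corrected rank-sum z-score, so both p-values should agree with the 0.0017061... reported above.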

     

 

Wilcoxon Rank Sum (aka Mann Whitney U) Test
-------------------------------------------

 

Matlab and R give identical results (but have different defaults for exact vs. approximate computation).

SciPy computes approximate results and does not correct for continuity (changed in version 0.7.1 for stats.mannwhitneyu?).

 

Samples contain ties:

 

- Matlab: ranksum(A1,B) = 0.00175235702866

- R: wilcox.test(A1,B) = 0.00175235702866 (+warning: ties)

 

- R: wilcox.test(A1,B,correct=FALSE) = 0.001706151012654 (+warning: ties)

 

- SciPy: stats.mannwhitneyu(A1,B)[1]*2 = 0.0017086895586986284

 

- SciPy: stats.ranksums(A1,B) = 0.0017112312247389294

 

Samples without ties:

 

- Matlab: ranksum(A2,B) = 0.00296255173431

- R: wilcox.test(A2,B, exact=FALSE) = 0.00296255173431

 

- Matlab: ranksum(A2,B,'method','exact') = 0.00246078580826

- R: wilcox.test(A2,B) = 0.00246078580826

 

- R: wilcox.test(A2,B, exact=FALSE, correct=FALSE) = 0.00288777919292

- SciPy: stats.mannwhitneyu(A2,B)[1]*2 = 0.00288777919292

- SciPy: stats.ranksums(A2,B) = 0.00288777919292

 

(SciPy: mannwhitneyu = ranksums = kruskal if no ties)
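The differences on the tied samples come down to which corrections each function applies. This sketch, again assuming a SciPy release where `mannwhitneyu` accepts `alternative` and `use_continuity`, contrasts the tie- and continuity-corrected approximation with `ranksums`, which applies neither correction.

```python
import numpy as np
from scipy import stats

A1 = np.arange(20)
B = np.arange(40)

# Normal approximation with tie correction and continuity correction;
# this should correspond to the Matlab ranksum / R wilcox.test value above
_, p_corrected = stats.mannwhitneyu(A1, B, use_continuity=True,
                                    alternative='two-sided')

# ranksums: normal approximation with neither tie nor continuity correction
_, p_ranksums = stats.ranksums(A1, B)

print("mannwhitneyu (continuity) p =", p_corrected)
print("ranksums                  p =", p_ranksums)
```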

 

 

2 Samples, Paired, Wilcoxon Signed Rank Test
============================================

(from scipy.stats import wilcoxon)

Matlab and SciPy do not correct for continuity and R does.

Matlab and R have different defaults for exact/approximate.

Matlab computes exact results also if ties/zeros exist.

With zeros:
- Matlab: signrank(A1,C,'method','approximate') = 0.02534731867747

- R: wilcox.test(A1 - C, correct=FALSE) = 0.02534731867747 (+warnings: ties + zeros)

- Matlab: signrank(A1,C) = 0.06250000000000

- R: wilcox.test(A1 - C) = 0.0368884257070 (+warnings: ties + zeros)

- SciPy: wilcoxon(A1,C) = nan (+error: sample size too small)
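The exact Matlab value for the sample with zeros can be reproduced by brute force. After dropping the zero differences only five values remain, so all sign assignments can be enumerated. This is an illustrative sketch, not how Matlab or SciPy compute it internally.

```python
from itertools import product

import numpy as np
from scipy.stats import rankdata

A1 = np.arange(20)
C = np.array(list(range(15)) + list(range(16, 21)))

d = (A1 - C).astype(float)
d = d[d != 0]                    # drop zero differences: 5 values remain
ranks = rankdata(np.abs(d))      # average ranks (all |d| are tied here)
t_plus = ranks[d > 0].sum()      # observed sum of positive ranks

# Null distribution of T+: every difference is + or - with probability 1/2
null = np.array([np.dot(signs, ranks)
                 for signs in product([0, 1], repeat=d.size)])
p_lower = np.mean(null <= t_plus)
p_upper = np.mean(null >= t_plus)
p_exact = min(1.0, 2 * min(p_lower, p_upper))

print("exact two-sided p =", p_exact)   # 2 / 2**5 = 0.0625
```

All five non-zero differences are negative, so only one of the 32 sign assignments is as extreme in each tail, which gives the 0.0625 reported by `signrank(A1,C)`.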

Without zeros:

- Matlab: signrank(A2,C,'method','exact') = 0.59581947326660

- R: wilcox.test(A2 - C) = 0.59581947326660


- Matlab: signrank(A2,C) = 0.57548622813650    

- R: wilcox.test(A2 - C, exact=FALSE, correct=FALSE) = 0.57548622813650

- SciPy: wilcoxon(A2,C) = 0.57548622813650


- R: wilcox.test(A2 - C, exact=FALSE) = 0.5882844808893

In [7]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import matplotlib.lines as mlines
from scipy.stats import ttest_1samp, wilcoxon, ttest_ind, mannwhitneyu

import sklearn as sk
import pandas as pd

In [45]:
# np.random.randint(0, 100, (3, 6, 5))
a = np.random.randint( 0, 100, 100)
b = np.random.randint( 0, 50, 100)
print(a)
print(b)

[59  0  1 74 90 95 74 18 62 95 31 44 32 77 10 30 42 95 88  3 66 23 13 30 64
  1 15  6 93 56 73 82 83 28 39 34 80 45 11 54 22 19 20  8 87 11 97 53 17 86
 30 71 69 17 22 86 25 11 35 10 88 91 68 58 31 69 35 88 88 22 99 94 63 65 77
 54 21 52 78 33 26 12  7 35 24 40  4 89 75 90 86 66 91 91 96 81 63 33  3 87]
[29 30 38 13 26 46 30 49 48  5 47  8 28 22 12 27 25 23 48 27 34 18 32 31  2
 26 30 30 26 41 43 35  7  4 18 40 25 12 43 20 33 16 29 29 40 18 31 15 38  7
 27 38 14 43 13  7 32  1 30 19 26 45  2  3 29 33 27 37 15  4 37 22  7 12 22
 38 11  5  5 24 38 13  6 47 11 18 39 15 40 45 49 16  3 15 20 40 29 22 36  8]


In [68]:
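def check_significance( s, p ):
    # True when p exceeds the significance level s,
    # i.e. the null hypothesis is NOT rejected
    return (p > s)

# NOTE: despite its name, this runs the Wilcoxon signed-rank test
# (scipy.stats.wilcoxon), not the sign test
def signtest( s, a, b ):
    # wilcoxon returns a signed-rank statistic (not a z-score) and the p-value
    z_stats, p_value = wilcoxon(a,b)
    h = check_significance(s, p_value)
    return h, p_value, z_stats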

# verbose signtest
def signtest_v( hypothesis, label_a, label_b, num_a, num_b ):
    significance = 0.05
    h,p,z = signtest( significance, num_a, num_b )
    txt = hypothesis+" --> "+label_a+" and "+label_b+" are "
    if( h ):
        txt += "EQUAL"
    else:
        txt += "DISTINCT"
    txt += "\n     significance = "+str(significance)+"\n     p = "+str(p)+"\n     z = "+str(z)
    print( txt )
    return h,p,z,txt

In [53]:
h, p, z, txt = signtest_v( "wilcoxon Signed Test", "Series A", "Series B", a, b)


wilcoxon Signed Test --> Series A and Series B are DISTINCT
     significance = 0.05
     p = 2.45905585301e-09
     z = 790.5


In [56]:
# daily intake of energy in kJ for 11 women
daily_intake = np.array([5260,5470,5640,6180,6390,6515,
                         6805,7515,7515,8230,8770])
#print(daily_intake.__repr__, "\n", daily_intake)

# one sample t-test
# null hypothesis: expected value = 7725
t_statistic, p_value = ttest_1samp(daily_intake, 7725)

# p_value < 0.05 => alternative hypothesis:
# data deviate significantly from the hypothesis that the mean
# is 7725 at the 5% level of significance
print( "one-sample t-test (ttest_1samp)     ", p_value, check_significance(0.05, p_value))

# one sample wilcoxon-test
# wilcoxon returns a signed-rank statistic (not a z-score) and the p-value
z_statistic, p_value = wilcoxon(daily_intake - 7725)
print( "one-sample wilcoxon-test (wilcoxon) ", p_value, check_significance(0.05, p_value))

energ = np.array([
# energy expenditure in MJ and stature (0=obese, 1=lean)
[9.21, 0],
[7.53, 1],
[7.48, 1],
[8.08, 1],
[8.09, 1],
[10.15, 1],
[8.40, 1],
[10.88, 1],
[6.13, 1],
[7.90, 1],
[11.51, 0],
[12.79, 0],
[7.05, 1],
[11.85, 0],
[9.97, 0],
[7.48, 1],
[8.79, 0],
[9.69, 0],
[9.68, 0],
[7.58, 1],
[9.19, 0],
[8.11, 1]])

# similar to expend ~ stature in R
group1 = energ[:, 1] == 0
group1 = energ[group1][:, 0]
group2 = energ[:, 1] == 1
group2 = energ[group2][:, 0]

# two-sample t-test
# null hypothesis: the two groups have the same mean
# this test assumes the two groups have the same variance...
# (can be checked with tests for equal variance)
# independent groups: e.g., how boys and girls fare at an exam
# dependent groups: e.g., how the same class fares at 2 different exams
t_statistic, p_value = ttest_ind(group1, group2)

# p_value < 0.05 => alternative hypothesis:
# they don't have the same mean at the 5% significance level
print( "two-sample t-test (ttest_ind)", p_value, check_significance(0.05, p_value))

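# two-sample wilcoxon test
# a.k.a Mann Whitney U
# note: depending on the SciPy version, the default p-value is one-sided
# or two-sided; pass alternative='two-sided' where supported to be explicit
u, p_value = mannwhitneyu(group1, group2)
print( "two-sample wilcoxon-test (mannwhitneyu)", p_value, check_significance(0.05, p_value))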

# pre and post-menstrual energy intake
intake = np.array([
[5260, 3910],
[5470, 4220],
[5640, 3885],
[6180, 5160],
[6390, 5645],
[6515, 4680],
[6805, 5265],
[7515, 5975],
[7515, 6790],
[8230, 6900],
[8770, 7335],
])

pre = intake[:, 0]
post = intake[:, 1]

# paired t-test: doing two measurements on the same experimental unit
# e.g., before and after a treatment
t_statistic, p_value = ttest_1samp(post - pre, 0)

# p < 0.05 => alternative hypothesis:
# the difference in mean is not equal to 0
print("paired t-test (ttest_1samp)", p_value, check_significance(0.05, p_value))

# alternative to the paired t-test when data are on an ordinal scale or
# not normally distributed
z_statistic, p_value = wilcoxon(post - pre)

print("paired wilcoxon-test (wilcoxon)", p_value, check_significance(0.05, p_value))

one-sample t-test (ttest_1samp)      0.0181372351761 False
one-sample wilcoxon-test (wilcoxon)  0.0261571823293 False
two-sample t-test (ttest_ind) 0.00079899821117 False
two-sample wilcoxon-test (mannwhitneyu) 0.00106080669294 False
paired t-test (ttest_1samp) 3.05902094293e-07 False
paired wilcoxon-test (wilcoxon) 0.00333001391175 False
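The `mannwhitneyu` p-value printed above appears to be one-sided: older SciPy releases returned a one-sided p-value, which is why the mailing-list comparison earlier multiplies it by 2. The sketch below makes the choice explicit, assuming a SciPy release whose `mannwhitneyu` accepts `alternative`. The arrays repeat the `energ` data split by stature, and the names `obese` and `lean` are illustrative.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Energy expenditure values from `energ`, split by stature
obese = np.array([9.21, 11.51, 12.79, 11.85, 9.97, 8.79, 9.69, 9.68, 9.19])
lean = np.array([7.53, 7.48, 8.08, 8.09, 10.15, 8.40, 10.88, 6.13, 7.90,
                 7.05, 7.48, 7.58, 8.11])

# Two-sided: the groups differ in either direction
_, p_two_sided = mannwhitneyu(obese, lean, alternative='two-sided')

# One-sided: values in `obese` tend to be larger than in `lean`
_, p_greater = mannwhitneyu(obese, lean, alternative='greater')

print("two-sided p =", p_two_sided)
print("one-sided p =", p_greater)
```

The one-sided value should match the 0.00106 printed above, and the two-sided value is twice that.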


Example 16.2 from the book "Statistics for Engineers and Scientists", page 659.

In [59]:
tires_radial = [4.2, 4.7, 6.6, 7.0, 6.7, 4.5, 5.7, 6.0, 7.4, 4.9, 6.1, 5.2, 5.7, 6.9, 6.8, 4.9]
tires_belted = [4.1, 4.9, 6.2, 6.9, 6.8, 4.4, 5.7, 5.8, 6.9, 4.9, 6.0, 4.9, 5.3, 6.5, 7.1, 4.8]
# paired measurements per car: (radial, belted)
cars = ( tires_radial, tires_belted )



In [69]:
h, p, z, txt = signtest_v( "FUEL CONSUMPTION",
                          "Radial Tires", "Belted Tires", tires_radial, tires_belted)

FUEL CONSUMPTION --> Radial Tires and Belted Tires are DISTINCT
     significance = 0.05
     p = 0.0376353137873
     z = 19.5
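The result above comes from `scipy.stats.wilcoxon`, that is, the Wilcoxon signed-rank test. A plain sign test, like the MATLAB `signtest` listed in the references, uses only the signs of the paired differences. This is a minimal sketch based on the binomial distribution; `sign_test` is a hypothetical helper, not a SciPy function.

```python
from scipy.stats import binom

tires_radial = [4.2, 4.7, 6.6, 7.0, 6.7, 4.5, 5.7, 6.0, 7.4, 4.9, 6.1, 5.2, 5.7, 6.9, 6.8, 4.9]
tires_belted = [4.1, 4.9, 6.2, 6.9, 6.8, 4.4, 5.7, 5.8, 6.9, 4.9, 6.0, 4.9, 5.3, 6.5, 7.1, 4.8]

def sign_test(x, y):
    """Two-sided exact sign test on paired samples; zero differences are dropped."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)        # number of positive differences
    # Under H0 the count of positive signs follows Binomial(n, 0.5)
    p = min(1.0, 2 * min(binom.cdf(k, n, 0.5), binom.sf(k - 1, n, 0.5)))
    return k, n, p

k, n, p = sign_test(tires_radial, tires_belted)
print("positive differences:", k, "of", n)
print("two-sided p =", p)
```

On this data the sign test sees 11 positive differences out of 14 non-zero ones, giving p of about 0.057. It ignores the magnitudes of the differences, so it is less decisive here than the signed-rank test.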


In [75]:
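def non_parametric_test( hypothesis, label1, label2, test, vals1, vals2 ):
    # catalog of available tests, keyed by name
    test_catalog = { 'signtest': signtest_v }
    # fail loudly on an unknown name instead of hiding other errors
    if test not in test_catalog:
        raise ValueError( "test not found: " + test )
    return test_catalog[test]( hypothesis, label1, label2, vals1, vals2 )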


In [76]:
h, p, z, txt = non_parametric_test(
    "FUEL CONSUMPTION", "Radial Tires", "Belted Tires",
    'signtest', tires_radial, tires_belted )


FUEL CONSUMPTION --> Radial Tires and Belted Tires are DISTINCT
     significance = 0.05
     p = 0.0376353137873
     z = 19.5
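The catalog in `non_parametric_test` holds a single entry. One possible way to extend it is sketched below, mapping names straight to SciPy functions. The names `TEST_CATALOG` and `run_test` and the dictionary keys are hypothetical, and the sketch assumes a SciPy release whose `mannwhitneyu` accepts `alternative`.

```python
from functools import partial

from scipy.stats import mannwhitneyu, wilcoxon

# Hypothetical catalog: test name -> callable returning (statistic, p_value, ...)
TEST_CATALOG = {
    'wilcoxon_signed_rank': wilcoxon,
    'mann_whitney_u': partial(mannwhitneyu, alternative='two-sided'),
}

def run_test(name, x, y, significance=0.05):
    """Look up `name`, run the test and report whether H0 is rejected."""
    if name not in TEST_CATALOG:
        raise ValueError("test not found: " + name)
    result = TEST_CATALOG[name](x, y)
    statistic, p_value = result[0], result[1]
    return {'statistic': statistic, 'p_value': p_value,
            'reject_h0': bool(p_value < significance)}

tires_radial = [4.2, 4.7, 6.6, 7.0, 6.7, 4.5, 5.7, 6.0, 7.4, 4.9, 6.1, 5.2, 5.7, 6.9, 6.8, 4.9]
tires_belted = [4.1, 4.9, 6.2, 6.9, 6.8, 4.4, 5.7, 5.8, 6.9, 4.9, 6.0, 4.9, 5.3, 6.5, 7.1, 4.8]

print(run_test('wilcoxon_signed_rank', tires_radial, tires_belted))
```

Returning a dictionary rather than a positional tuple keeps callers readable as more tests are added. Raising on an unknown name avoids the silent `None` that would otherwise break tuple unpacking at the call site.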
