<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Welch’s-T-Test" data-toc-modified-id="Welch’s-T-Test-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Welch’s T-Test</a></span></li><li><span><a href="#Paired-Student's-t-test" data-toc-modified-id="Paired-Student's-t-test-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Paired Student's t-test</a></span></li><li><span><a href="#Mann–Whitney-U-test" data-toc-modified-id="Mann–Whitney-U-test-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Mann–Whitney U test</a></span></li><li><span><a href="#Wilcoxon-signed-rank-test" data-toc-modified-id="Wilcoxon-signed-rank-test-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Wilcoxon signed-rank test</a></span></li><li><span><a href="#Chi-Square-Test-of-Independence" data-toc-modified-id="Chi-Square-Test-of-Independence-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Chi-Square Test of Independence</a></span></li><li><span><a href="#Fisher’s-Exact-Test" data-toc-modified-id="Fisher’s-Exact-Test-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Fisher’s Exact Test</a></span></li></ul></div>

In [1]:
%matplotlib inline
import numpy as np
import scipy as sp
import scipy.stats  # make sure the sp.stats submodule is loaded
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import IPython as ip
mpl.style.use('ggplot')
# mpl.rc('font', family='Noto Sans CJK TC')
ip.display.set_matplotlib_formats('svg')

In [2]:
np.random.seed(20180701)

# Welch’s T-Test

In [3]:
group_ctl = sp.stats.norm.rvs(loc=170, scale=5, size=100)
group_exp = sp.stats.norm.rvs(loc=170, scale=5, size=3)
# the two groups are from the same population
# just sampled with different sizes

In [4]:
sp.stats.ttest_ind(group_ctl, group_exp)
# or, equivalently:
# from scipy import stats
# stats.ttest_ind(...)

Ttest_indResult(statistic=-1.4825003632205942, pvalue=0.1413207618623851)

In [5]:
# ttest_ind(..., equal_var=False) === Welch’s t-test
sp.stats.ttest_ind(group_ctl, group_exp, equal_var=False)

Ttest_indResult(statistic=-1.474079944377262, pvalue=0.2716372989009761)

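Welch’s t-test is more robust to unequal variances and/or unequal sample sizes. Both groups here are drawn from the same population, and Welch’s test reports the larger p-value (0.27 vs. 0.14), so it sits further from falsely rejecting the null hypothesis.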
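To make the difference between the two calls concrete, here is a minimal sketch that computes Welch’s statistic by hand, using the unpooled standard error and the Welch–Satterthwaite degrees of freedom, and checks it against `ttest_ind(..., equal_var=False)`. The helper name `welch_t` and the seed are illustrative assumptions, not part of the cells above, so the numbers differ from the outputs shown.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, not the notebook's
a = rng.normal(loc=170, scale=5, size=100)
b = rng.normal(loc=170, scale=5, size=3)


def welch_t(x, y):
    """Return Welch's t statistic, its degrees of freedom and the two-sided p-value."""
    # Squared standard error of each sample mean (variances are NOT pooled).
    vx = np.var(x, ddof=1) / len(x)
    vy = np.var(y, ddof=1) / len(y)
    t = (np.mean(x) - np.mean(y)) / np.sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p


t_manual, df_manual, p_manual = welch_t(a, b)
t_scipy, p_scipy = stats.ttest_ind(a, b, equal_var=False)
print(t_manual, df_manual, p_manual)
print(t_scipy, p_scipy)
```

With a group of only 3 observations, the approximate degrees of freedom fall far below the pooled test's 101, which widens the tails of the reference distribution and raises the p-value.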

# Paired Student's t-test

In [6]:
group_ctl = sp.stats.norm.rvs(loc=170, scale=5, size=100)
# ctl + norm.rvs(...) === ctl + some treatment effect
group_exp = group_ctl + sp.stats.norm.rvs(loc=1, scale=1, size=100)
# so they are different

In [7]:
sp.stats.ttest_ind(group_ctl, group_exp)

Ttest_indResult(statistic=-1.4784029362607924, pvalue=0.14088899541937028)

In [8]:
sp.stats.ttest_ind(group_ctl, group_exp, equal_var=False)

Ttest_indResult(statistic=-1.4784029362607924, pvalue=0.14089044320062483)

In [9]:
sp.stats.ttest_rel(group_ctl, group_exp)

Ttest_relResult(statistic=-9.62612703334583, pvalue=7.157064207762357e-16)

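Only the paired t-test detects the treatment effect (p ≈ 7e-16). The two independent-samples tests ignore the pairing, so the between-subject spread (scale=5) swamps the effect (mean 1) and they fail to reject the null hypothesis (p ≈ 0.14).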
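One way to see why pairing matters: the paired t-test is the same as a one-sample t-test on the per-subject differences, so the between-subject spread cancels out. The sketch below checks that equivalence; the variable names and seed are illustrative assumptions, not the notebook's.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed, not the notebook's
before = rng.normal(loc=170, scale=5, size=100)
after = before + rng.normal(loc=1, scale=1, size=100)

# Paired test on the two samples ...
paired = stats.ttest_rel(before, after)
# ... versus a one-sample test of the differences against zero.
one_sample = stats.ttest_1samp(before - after, popmean=0)

# The differences vary far less than the raw measurements do.
sd_raw = np.std(before, ddof=1)
sd_diff = np.std(before - after, ddof=1)
print(paired.statistic, one_sample.statistic, sd_raw, sd_diff)
```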

# Mann–Whitney U test

In [10]:
group_ctl = [11, 22, 33, 44, 55, 66, 77]
group_exp = [11, 22, 33, 44, 55, 66, 7700]
# the two groups are the same
# just exp has an outlier

In [11]:
sp.stats.ttest_ind(group_ctl, group_exp)

Ttest_indResult(statistic=-0.9949204413527747, pvalue=0.3394143067507971)

In [12]:
sp.stats.ttest_ind(group_ctl, group_exp, equal_var=False)

Ttest_indResult(statistic=-0.9949204413527745, pvalue=0.3581854342693085)

In [13]:
sp.stats.mannwhitneyu(group_ctl, group_exp)

MannwhitneyuResult(statistic=24.0, pvalue=0.5)

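The Mann–Whitney U test is more robust to outliers: it compares ranks rather than raw values, so the single extreme value (7700) has almost no influence, and it reports the largest p-value of the three tests.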
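The robustness comes from the statistic depending only on ranks. As a rough sketch, U for a sample can be computed as its rank sum in the pooled data minus the smallest rank sum it could have. The helper `u_statistic` is a hypothetical name introduced here for illustration.

```python
import numpy as np
from scipy import stats

group_ctl = [11, 22, 33, 44, 55, 66, 77]
group_exp = [11, 22, 33, 44, 55, 66, 7700]


def u_statistic(x, y):
    """U for sample x: its pooled rank sum minus the minimum possible rank sum."""
    ranks = stats.rankdata(np.concatenate([x, y]))  # ties receive average ranks
    n_x = len(x)
    return ranks[:n_x].sum() - n_x * (n_x + 1) / 2


u_ctl = u_statistic(group_ctl, group_exp)
u_exp = u_statistic(group_exp, group_ctl)

# Shrinking the outlier to any value above 77 leaves every rank unchanged.
u_ctl_mild = u_statistic(group_ctl, [11, 22, 33, 44, 55, 66, 78])
print(u_ctl, u_exp, u_ctl_mild)
```

`u_ctl` equals the statistic of 24.0 reported above, and it stays the same whether the largest value is 78 or 7700.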

# Wilcoxon signed-rank test

In [14]:
group_ctl = sp.stats.zipf.rvs(loc=0, a=1.01, size=12)
group_exp = group_ctl*2
group_ctl, group_exp

(array([           11077071,  167785941635581952,    4780512611905882,
          43429257347420112,              581252,             2766317,
               358819568736,               39643,               11936,
                         48, 3161482534121832960,               99195]),
 array([           22154142,  335571883271163904,    9561025223811764,
          86858514694840224,             1162504,             5532634,
               717639137472,               79286,               23872,
                         96, 6322965068243665920,              198390]))

In [15]:
sp.stats.ttest_ind(group_ctl, group_exp)

Ttest_indResult(statistic=-0.4800692621534574, pvalue=0.6359168660580073)

In [16]:
sp.stats.ttest_ind(group_ctl, group_exp, equal_var=False)

Ttest_indResult(statistic=-0.4800692621534574, pvalue=0.6376052438968551)

In [17]:
sp.stats.ttest_rel(group_ctl, group_exp)

Ttest_relResult(statistic=-1.0734675040832977, pvalue=0.3060458815782881)

In [18]:
sp.stats.mannwhitneyu(group_ctl, group_exp)

MannwhitneyuResult(statistic=66.0, pvalue=0.37541594204455847)

In [19]:
sp.stats.wilcoxon(group_ctl, group_exp)

WilcoxonResult(statistic=0.0, pvalue=0.002217721464237049)

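Every other test fails to reject the null hypothesis here (p > 0.3), even though each value in group_exp is exactly twice its paired value in group_ctl. Only the Wilcoxon signed-rank test, which uses the pairing and works on ranks rather than the heavy-tailed raw magnitudes, detects the difference.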
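The signed-rank statistic can be reproduced by hand: rank the absolute paired differences, then sum the ranks separately for positive and negative differences. The sketch below does this for the values printed above. The normal-approximation p-value it computes appears to be what produced the 0.0022 shown (an assumption about the SciPy version used); the exact p-value is listed alongside for comparison.

```python
import numpy as np
from scipy import stats

# Values copied from the output above.
group_ctl = np.array([
    11077071, 167785941635581952, 4780512611905882, 43429257347420112,
    581252, 2766317, 358819568736, 39643, 11936, 48,
    3161482534121832960, 99195,
], dtype=np.int64)
group_exp = group_ctl * 2

d = group_ctl - group_exp
ranks = stats.rankdata(np.abs(d))
w_plus = ranks[d > 0].sum()   # rank sum of positive differences
w_minus = ranks[d < 0].sum()  # rank sum of negative differences
w = min(w_plus, w_minus)

# Normal approximation without continuity correction.
n = len(d)
mean_w = n * (n + 1) / 4
sd_w = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
p_normal = 2 * stats.norm.cdf((w - mean_w) / sd_w)

# Exact two-sided p-value for this extreme case only (w == 0, no ties):
# just 2 of the 2**n equally likely sign patterns are this extreme.
p_exact = 2 * 0.5 ** n
print(w, p_normal, p_exact)
```

Because every difference has the same sign, the statistic is 0 regardless of how large the values are, which is why the extreme magnitudes do not hide the effect.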

# Chi-Square Test of Independence

In [20]:
sp.stats.chi2_contingency([
    # men, women
    [43, 44],  # right-handed
    [ 9,  4],  # left-handed
])

(1.0724852071005921, 0.300384770390566, 1, array([[45.24, 41.76],
        [ 6.76,  6.24]]))

In [21]:
# expected frequency of right-handed men: column total * row total / grand total
(43+9)*((43+44)/(43+44+9+4))

45.24

In [22]:
sp.stats.chi2_contingency([
    # men, women
    [1, 7],  # studying
    [8, 4],  # not-studying
])

(3.712121212121211, 0.05401870019830758, 1, array([[3.6, 4.4],
        [5.4, 6.6]]))

In [23]:
# expected frequency of studying men: column total * row total / grand total
(1+8)*((1+7)/(1+7+8+4))

3.6

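When the sample is small, the p-value of the chi-squared test can differ substantially from that of the exact test. In the second table one expected frequency is only 3.6, below the common rule-of-thumb minimum of 5, and the chi-squared test gives p ≈ 0.054 while Fisher’s exact test gives p ≈ 0.028.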
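As a sketch of what `chi2_contingency` computes for a 2×2 table, the expected frequencies are the outer product of the row and column totals divided by the grand total. For 2×2 tables the function applies Yates’ continuity correction by default, so the statistic below subtracts 0.5 from each absolute deviation. Variable names are illustrative.

```python
import numpy as np
from scipy import stats

observed = np.array([
    # men, women
    [43, 44],  # right-handed
    [9, 4],    # left-handed
])

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected = np.outer(row_totals, col_totals) / observed.sum()

# Chi-squared statistic with Yates' continuity correction (2x2 table, 1 degree of freedom).
chi2_yates = (((np.abs(observed - expected) - 0.5) ** 2) / expected).sum()
p_value = stats.chi2.sf(chi2_yates, df=1)
print(expected, chi2_yates, p_value)
```

These values agree with the first `chi2_contingency` result above (statistic ≈ 1.0725, p ≈ 0.3004).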

# Fisher’s Exact Test

In [24]:
sp.stats.fisher_exact([
    # men, women
    [43, 44],  # right-handed
    [ 9,  4],  # left-handed
])

(0.43434343434343436, 0.23915695682224267)

In [25]:
# odds ratio
(43/9) / (44/4)

0.4343434343434343

In [26]:
sp.stats.fisher_exact([
    # men, women
    [1, 7],  # studying
    [8, 4],  # not-studying
])

(0.07142857142857142, 0.028101929030721624)

In [27]:
# odds ratio
(1/8) / (7/4)

0.07142857142857142
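For reference, the exact p-value can be sketched directly from the hypergeometric distribution: with the row and column totals fixed, the top-left cell follows a hypergeometric law, and the two-sided p-value sums the probabilities of all tables that are no more likely than the observed one. The small relative tolerance in the comparison is an assumption added here to guard against floating-point ties.

```python
import numpy as np
from scipy import stats

table = np.array([
    # men, women
    [1, 7],  # studying
    [8, 4],  # not-studying
])

total = table.sum()
row1 = table[0].sum()     # studying
col1 = table[:, 0].sum()  # men

# All values the top-left cell can take with the margins fixed.
k = np.arange(max(0, row1 + col1 - total), min(row1, col1) + 1)
pmf = stats.hypergeom.pmf(k, total, row1, col1)
p_observed = stats.hypergeom.pmf(table[0, 0], total, row1, col1)

# Two-sided p-value: total probability of tables at most as likely as the observed one.
p_two_sided = pmf[pmf <= p_observed * (1 + 1e-7)].sum()
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
print(odds_ratio, p_two_sided)
```

This reproduces the odds ratio and p-value returned by `fisher_exact` for the second table.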