Chapter 14
# Effect Size

Statistical hypothesis tests report on the likelihood of the observed results given an assumption (such as no association between variables, or no difference between groups).

Hypothesis tests do not comment on the size of the effect, even when the association or difference is statistically significant.  The results of an experiment could be significant, yet the effect could be so small that it has little practical consequence.

Effect size methods are a suite of statistical tools for quantifying the size of an effect in the results of experiments.  They can be used to complement the results of statistical hypothesis tests.

An effect size refers to the size or magnitude of an effect or result as it would be expected to occur in a population.  The effect size is estimated from samples of data.

An effect can be the result of a treatment revealed in a comparison between groups (e.g. treated and untreated groups), or can describe the degree of association between two related variables (e.g. treatment dosage and health).

It is common to organise effect size methods into groups, based on the type of effect to be quantified.  Two main groups of methods for calculating effect size are:
- Association - statistical methods for quantifying an association between variables (e.g. correlation)
- Difference - statistical methods for quantifying the difference between groups (e.g. difference between means)

The result of an effect size calculation must be interpreted, and the measure should be chosen based on the goals of that interpretation.  Three types of calculated result are:
- Standardised Result - the effect size has a standard scale allowing it to be interpreted generally regardless of application (e.g. Cohen's d calculation)
- Original Units Result - the effect size may use the original units of the variable, which can aid in the interpretation within the domain (e.g. difference between two sample means)
- Unit Free Result - the effect size may not have units, such as a count or proportion (e.g. a correlation coefficient)

It may be a good idea to report an effect size using multiple measures, to aid the different types of readers of your findings.
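The difference between an original-units result and a standardised result can be sketched with a small example. The data, the variable names (`heights1_cm` and so on) and the two helper functions below are hypothetical and exist only for illustration. The same measurements are expressed in centimetres and in metres: the raw mean difference changes with the unit, while the standardised difference (mean difference divided by the pooled standard deviation) does not.

```python
# sketch: original-units effect size vs standardised effect size
from math import sqrt
from numpy import mean, var
from numpy.random import randn, seed

# seed random number generator
seed(1)

# hypothetical heights for two groups, in centimetres
heights1_cm = 10 * randn(1000) + 175
heights2_cm = 10 * randn(1000) + 170

# the same measurements expressed in metres
heights1_m = heights1_cm / 100
heights2_m = heights2_cm / 100

def mean_difference(d1, d2):
    # effect size in the original units of the data
    return mean(d1) - mean(d2)

def standardised_difference(d1, d2):
    # mean difference divided by the pooled standard deviation
    n1, n2 = len(d1), len(d2)
    s = sqrt(((n1 - 1) * var(d1, ddof=1) + (n2 - 1) * var(d2, ddof=1)) / (n1 + n2 - 2))
    return (mean(d1) - mean(d2)) / s

print('Mean difference (cm): %.3f' % mean_difference(heights1_cm, heights2_cm))
print('Mean difference (m): %.3f' % mean_difference(heights1_m, heights2_m))
print('Standardised difference (cm): %.3f' % standardised_difference(heights1_cm, heights2_cm))
print('Standardised difference (m): %.3f' % standardised_difference(heights1_m, heights2_m))
```

The original-units result is easier to explain to a domain reader, while the standardised result can be compared across studies that use different scales.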

The effect size does not replace the results of the statistical hypothesis test; it complements them:
- Hypothesis Test - quantify the likelihood of observing the data given an assumption (null hypothesis)
- Effect Size - quantify the size of the effect assuming that the effect is present
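The way the two results complement each other can be illustrated with a contrived case. In the minimal sketch below, the sample size `n` and the group means are assumptions chosen so that the samples are very large and the true difference is tiny (0.03 standard deviations). The t-test from `scipy.stats.ttest_ind` is expected to flag the difference as significant, while the standardised difference shows that the effect is negligible.

```python
# sketch: a statistically significant result with a negligible effect size
from math import sqrt
from numpy import mean, var
from numpy.random import randn, seed
from scipy.stats import ttest_ind

# seed random number generator
seed(1)

# two large samples whose population means differ by 0.03 standard deviations (contrived values)
n = 200000
group1 = 10 * randn(n) + 50.3
group2 = 10 * randn(n) + 50.0

# hypothesis test: Student's t-test for independent samples
stat, p = ttest_ind(group1, group2)

# effect size: mean difference in units of the pooled standard deviation
# (with equal sample sizes the pooled variance is the average of the two variances)
pooled_std = sqrt((var(group1, ddof=1) + var(group2, ddof=1)) / 2)
effect = (mean(group1) - mean(group2)) / pooled_std

print('p-value: %.6f' % p)
print('Standardised difference: %.3f' % effect)
```

Reporting only the p-value here would hide the fact that the groups barely differ in practical terms.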

# Calculate Association Effect Size
Methods for quantifying the association between variables are often referred to as the r family of effect size methods.

Pearson's correlation coefficient (also known as Pearson's r) measures the degree of linear association between two real-valued variables.

It is a unit-free effect size measure that can be interpreted in a standard way:
- -1.0 = perfect negative relationship
- -0.7 = strong negative relationship
- -0.5 = moderate negative relationship
- -0.3 = weak negative relationship
- 0.0 = no relationship
- 0.3 = weak positive relationship
- 0.5 = moderate positive relationship
- 0.7 = strong positive relationship
- 1.0 = perfect positive relationship

Pearson's correlation coefficient can be calculated using the SciPy function pearsonr().

Another very popular method for calculating the association effect size is the r-squared measure, also called the coefficient of determination.  It summarises the proportion of variance in one variable explained by the other.

In [1]:
# calculate the pearson's correlation between two variables
from numpy.random import randn
from numpy.random import seed
from scipy.stats import pearsonr

# seed random number generator
seed(1)

# prepare data
data1 = 10 * randn(10000) + 50
data2 = data1 + (10 * randn(10000) + 50)

# calculate pearson's correlation
corr, _ = pearsonr(data1, data2)
print('Pearsons correlation: %.3f' % corr)

Pearsons correlation: 0.712
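The r-squared measure mentioned above can be obtained by squaring Pearson's r. The sketch below reuses the same contrived data and, as a cross-check, compares the result with the squared `rvalue` reported by `scipy.stats.linregress`. Using a straight-line fit as the cross-check is a choice made for this illustration: with a single predictor, the r-squared of the fit equals the square of Pearson's r.

```python
# sketch: r-squared (coefficient of determination) from Pearson's r
from numpy.random import randn, seed
from scipy.stats import linregress, pearsonr

# seed random number generator
seed(1)

# prepare data (same contrived data as the correlation example)
data1 = 10 * randn(10000) + 50
data2 = data1 + (10 * randn(10000) + 50)

# r-squared is the square of Pearson's correlation coefficient
corr, _ = pearsonr(data1, data2)
r_squared = corr ** 2

# cross-check against a straight-line fit of data2 on data1
fit = linregress(data1, data2)
r_squared_fit = fit.rvalue ** 2

print('r-squared: %.3f' % r_squared)
print('r-squared from linear fit: %.3f' % r_squared_fit)
```

By construction, data2 is data1 plus independent noise of equal variance, so about half of the variance in data2 should be explained by data1.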


# Calculate Difference Effect Size
Methods for quantifying the difference between groups are often referred to as the d family of effect size methods.

Cohen's d measures the difference between the means of two Gaussian-distributed variables.  It is a standard score that summarises the difference in terms of the number of standard deviations.  Commonly cited thresholds for interpreting it are:
- Small Effect Size: d = 0.20
- Medium Effect Size: d = 0.50
- Large Effect Size: d = 0.80

A Cohen's d function is not provided by the libraries used here (NumPy and SciPy), but it can be calculated manually:
- d = ((mean of first sample) - (mean of second sample))/(pooled standard deviation of both samples)

The cohend() function below shows the calculation of the pooled standard deviation.
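Written out as an equation, the calculation takes the following form. The notation is an assumption introduced here for illustration: $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $s_1^2$ and $s_2^2$ are the sample variances (computed with $n - 1$ in the denominator), and $n_1$ and $n_2$ are the sample sizes.

$$
d = \frac{\bar{x}_1 - \bar{x}_2}{s}, \qquad s = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
$$

When both samples have the same size, the pooled variance $s^2$ reduces to the simple average of the two sample variances.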

Two other popular methods for quantifying the difference effect size are:
- Odds Ratio - measures the odds of an outcome occurring under one treatment compared to another
- Relative Risk Ratio - measures the probability of an outcome occurring under one treatment compared to another

In [2]:
# function to calculate Cohen's d for independent samples
from numpy import mean
from numpy import var
from math import sqrt

def cohend(d1, d2):
    # calculate the size of samples
    n1 = len(d1)
    n2 = len(d2)

    # calculate the variance of the samples (the degrees of freedom used in the calculation is n - ddof)
    s1 = var(d1, ddof=1)
    s2 = var(d2, ddof=1)

    # calculate the pooled standard deviation (the subtractions are adjustments for the degrees of freedom)
    s = sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))

    # calculate the means of the samples
    u1 = mean(d1)
    u2 = mean(d2)

    # calculate the effect size
    return (u1 - u2) / s

In [3]:
# calculate the cohen's d between two samples
from numpy.random import randn
from numpy.random import seed

# seed random number generator
seed(1)

# prepare data - the data is contrived such that the means are different by one half standard deviation, and both samples have the same standard deviation
data1 = 10 * randn(10000) + 60
data2 = 10 * randn(10000) + 55

# calculate cohen's d
d = cohend(data1, data2)
print('Cohens d: %.3f' % d)

Cohens d: 0.500


# Extensions

In [18]:
# implement a function to calculate the Cohen's d for paired samples, and demonstrate it on a test dataset
# Method 1: Cohen's effect size for repeated measures
from numpy.random import randn
from numpy.random import seed
from scipy.stats import pearsonr
from numpy import mean
from numpy import var
from math import sqrt

def cohend_paired(d1, d2):
    # calculate the size of samples
    n1 = len(d1)
    n2 = len(d2)

    # calculate the variance of the samples (the degrees of freedom used in the calculation is n - ddof)
    var1 = var(d1, ddof=1)
    var2 = var(d2, ddof=1)

    # calculate the means of the samples
    u1 = mean(d1)
    u2 = mean(d2)

    # to implement Cohen's effect size for repeated measures, we first need to calculate the correlation coefficient
    corr, p = pearsonr(d1, d2)

    # now calculate the Cohen's effect size
    s_z = sqrt(var1 + var2 - (2 * corr * sqrt(var1) * sqrt(var2)))
    s_rm = s_z/sqrt(2*(1-corr))
    return (u1 - u2) / s_rm

# seed random number generator
seed(1)

# prepare data - although the samples are independent, not paired, we can pretend for the sake of the demonstration that the observations are paired 
data1 = 10 * randn(10000) + 60
data2 = 10 * randn(10000) + 55

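# calculate cohen's d for repeated measures (d_rm)
d_rm = cohend_paired(data1, data2)
print('Cohens d: %.3f' % d_rm)

# note: d_rm = d_z * sqrt(2 * (1 - r)), where d_z is the mean of the differences divided by the standard deviation of the differences, and r is the correlation between the samples; because r is close to 0 for this data, d_rm is approximately d_z * sqrt(2)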

Cohens d: 0.500


In [17]:
# implement a function to calculate the Cohen's d for paired samples, and demonstrate it on a test dataset
# Method 2: Cohen's d using an average variance
from numpy.random import randn
from numpy.random import seed
from numpy import mean
from numpy import var
from math import sqrt

def cohend_paired(d1, d2):
    # calculate the size of samples
    n1 = len(d1)
    n2 = len(d2)

    # calculate the variance of the samples (the degrees of freedom used in the calculation is n - ddof)
    var1 = var(d1, ddof=1)
    var2 = var(d2, ddof=1)

    # calculate average standard deviation
    s = sqrt((var1 + var2)/2)

    # calculate the means of the samples
    u1 = mean(d1)
    u2 = mean(d2)

    # calculate the effect size
    return (u1 - u2) / s

# seed random number generator
seed(1)

# prepare data - although the samples are independent, not paired, we can pretend for the sake of the demonstration that the observations are paired 
data1 = 10 * randn(10000) + 60
data2 = 10 * randn(10000) + 55

# calculate cohen's d
d = cohend_paired(data1, data2)
print('Cohens d: %.3f' % d)

# This method is preferred, since its values can more easily be compared to d for two independent samples i.e. we can determine whether the effect measured in an experiment with paired samples (e.g. pre-treatment vs post-treatment) is higher or lower than the effect measured in an experiment with two independent samples (e.g. separate treatment and control groups)

Cohens d: 0.500


In [21]:
# implement a function to calculate the Cohen's d for paired samples, and demonstrate it on a test dataset
# Method 3: Cohen's d_z, the mean of the differences divided by the standard deviation of the differences
from numpy.random import randn
from numpy.random import seed
from numpy import mean
from numpy import var
from math import sqrt

# this assumes both data sets have same length
def cohend_paired(d1, d2):
    # calculate the size of samples
    n1 = len(d1)
    n2 = len(d2)

    # calculate the effect size as the mean of the differences divided by the standard deviation of the differences
    mean_diff = sum([d1[i] - d2[i] for i in range(n1)])/n1
    sum_of_squares = sum([((d1[i] - d2[i]) - mean_diff)**2 for i in range(n1)])
    s = sqrt(sum_of_squares / (n1 - 1))
    return mean_diff / s

# seed random number generator
seed(1)

# prepare data - although the samples are independent, not paired, we can pretend for the sake of the demonstration that the observations are paired 
data1 = 10 * randn(10000) + 60
data2 = 10 * randn(10000) + 55

# calculate cohen's d
dz = cohend_paired(data1, data2)
print('Cohens d: %.3f' % dz)

Cohens d: 0.357
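The demonstrations above use independent samples that are only treated as paired. The sketch below uses hypothetical data that really is paired, to show how far the difference-based measure (d_z) and the average-variance measure (d_av) can drift apart when the pairs are strongly correlated. The names `before` and `after`, the shift of 5 and the noise scale of 3 are all assumptions chosen for illustration.

```python
# sketch: d_z vs d_av on genuinely paired (correlated) data
from math import sqrt
from numpy import mean, std, var
from numpy.random import randn, seed

# seed random number generator
seed(1)

# hypothetical paired data: each 'after' value is the matching 'before' value,
# shifted by 5 and perturbed by a small amount of individual noise
n = 10000
before = 10 * randn(n) + 55
after = before + 5 + 3 * randn(n)

# d_z: mean of the differences divided by the standard deviation of the differences
diff = after - before
d_z = mean(diff) / std(diff, ddof=1)

# d_av: difference in means divided by the average standard deviation
d_av = (mean(after) - mean(before)) / sqrt((var(after, ddof=1) + var(before, ddof=1)) / 2)

print('Cohens d_z: %.3f' % d_z)
print('Cohens d_av: %.3f' % d_av)
```

By construction the differences have a mean of 5 and a standard deviation of 3, so d_z should be near 5/3, while d_av stays near 0.5 because each sample still has a standard deviation of roughly 10. This is why d_z values should not be read against the interpretation thresholds for independent samples.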


In [30]:
# implement and demonstrate another difference effect measure, such as the odds ratio

# the odds ratio quantifies the association between two categorical variables in a 2x2 contingency table, and is calculated as (odds of the event in one group)/(odds of the event in the other group), where odds = (probability of event)/(probability of non-event)

# e.g. in a small sample
# - females: 8 voted labour and 4 voted conservative
# - males: 4 voted labour and 9 voted conservative
# Is there a statistically significant association between gender and political party preference?
# H0 is no association i.e. odds ratio = 1
import scipy.stats as stats
contingency_table = [[8, 4], [4, 9]]
oddsratio, p = stats.fisher_exact(contingency_table)
print('Odds ratio', oddsratio)
print('Probability', p)

# interpret the significance
alpha = 0.05
if p > alpha:
    print('No significant association between gender and political party preference (fail to reject H0)')
else:
    print('Significant association between gender and political party preference (reject H0)')

# long-hand
# odds of female voting labour = 8/4 = 2
# odds of male voting labour = 4/9 = 0.444
# odds ratio = 2 / 0.444 = 4.5

Odds ratio 4.5
Probability 0.1152385439301579
No significant association between gender and political party preference (fail to reject H0)


In [None]:
# implement and demonstrate another difference effect measure, such as the risk ratio

# long-hand for above example
# risk of female voting labour = 8 / 12 = 0.67
# risk of male voting labour = 4 / 13 = 0.31
# risk ratio = 0.67 / 0.31 = 2.17
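The long-hand calculation above can be sketched in code as follows. The helper functions `risk_ratio()` and `odds_ratio()` are hypothetical names written for this illustration, not library functions, and they assume a 2x2 table laid out as in the odds ratio example (rows are groups, the first column is the event of interest).

```python
# sketch: risk ratio (and odds ratio) from a 2x2 contingency table
# rows: females, males; columns: voted labour, voted conservative
contingency_table = [[8, 4], [4, 9]]

def risk_ratio(table):
    # risk = proportion of each group that experienced the event (first column)
    (a, b), (c, d) = table
    risk1 = a / (a + b)
    risk2 = c / (c + d)
    return risk1 / risk2

def odds_ratio(table):
    # odds = events divided by non-events within each group
    (a, b), (c, d) = table
    return (a / b) / (c / d)

rr = risk_ratio(contingency_table)
odds = odds_ratio(contingency_table)
print('Risk ratio: %.2f' % rr)
print('Odds ratio: %.2f' % odds)
```

The risk ratio (about 2.17) is noticeably smaller than the odds ratio (4.5) for the same table. The two measures only approximate each other when the event is rare in both groups.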