<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Effect-Size" data-toc-modified-id="Effect-Size-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Effect Size</a></span><ul class="toc-item"><li><span><a href="#Misclassification-rate" data-toc-modified-id="Misclassification-rate-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Misclassification rate</a></span><ul class="toc-item"><li><span><a href="#Probability-of-superiority" data-toc-modified-id="Probability-of-superiority-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Probability of superiority</a></span></li></ul></li><li><span><a href="#Cohen's-d-statistic" data-toc-modified-id="Cohen's-d-statistic-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Cohen's d statistic</a></span><ul class="toc-item"><li><span><a href="#Rule-of-Thumb-for-d-statistic" data-toc-modified-id="Rule-of-Thumb-for-d-statistic-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Rule of Thumb for d statistic</a></span></li></ul></li><li><span><a href="#Bringing-it-all-together" data-toc-modified-id="Bringing-it-all-together-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Bringing it all together</a></span></li></ul></li></ul></div>

# Effect Size

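Once an effect has been found to be statistically significant, how large is it? Effect size measures the magnitude of a difference, not how confident we are that the difference exists.

Effect sizes are especially useful in **meta-analysis**, where results from multiple studies are compared.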

## Misclassification rate

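We can calculate a threshold to compare the two populations against. The formula below gives the point that lies the same number of standard deviations from each mean. When $\sigma_1 = \sigma_2$ it is the midpoint of the means, which is exactly where the two PDFs cross: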

$$\frac{\sigma_1 \mu_2 + \sigma_2 \mu_1}{\sigma_1 + \sigma_2}$$

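Then use this threshold to calculate the overlap of the PDFs: the area under each curve that falls on the other group's side of the threshold. For two equally sized groups, the misclassification rate is half of this overlap.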

![small cross over PDFs plot](images/small-cross-over.png)
![large cross over PDFs plot](images/large-cross-over.png)
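As a rough sketch, the threshold and overlap described above can be computed analytically with `scipy.stats.norm` instead of by sampling. The function names `crossing_threshold` and `overlap_area` are illustrative, not part of any library. The overlap calculation assumes group 1 has the lower mean and that the two standard deviations are equal or close, so that the threshold is a reasonable stand-in for the crossing point.

```python
from scipy.stats import norm

def crossing_threshold(mu1, sigma1, mu2, sigma2):
    """Point lying the same number of standard deviations from both means."""
    return (sigma1 * mu2 + sigma2 * mu1) / (sigma1 + sigma2)

def overlap_area(mu1, sigma1, mu2, sigma2):
    """Approximate area shared by two normal PDFs (assumes mu1 < mu2)."""
    thresh = crossing_threshold(mu1, sigma1, mu2, sigma2)
    # Group 1 mass above the threshold plus group 2 mass below it
    upper_tail_1 = norm.sf(thresh, loc=mu1, scale=sigma1)
    lower_tail_2 = norm.cdf(thresh, loc=mu2, scale=sigma2)
    return upper_tail_1 + lower_tail_2

# Two unit-variance groups whose means are one standard deviation apart
thresh = crossing_threshold(0, 1, 1, 1)
overlap = overlap_area(0, 1, 1, 1)
print('threshold', thresh)
print('overlap', overlap)
```

With equal standard deviations the threshold is simply the midpoint of the two means, and the overlap shrinks toward zero as the means move apart.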

### Probability of superiority

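> Randomly pick one member from each of the two populations and compare them

The probability of superiority is the probability that the member drawn from one group has the higher value. It is useful for samples of unequal sizes, since it is expressed as a proportion (probability) rather than a count.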
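A minimal sketch of this idea, assuming both populations are normal and independent: the simulated proportion of random pairs in which group 2 wins can be checked against the closed form $\Phi\left((\mu_2 - \mu_1) / \sqrt{\sigma_1^2 + \sigma_2^2}\right)$. The helper name `prob_superiority_normal`, the seed, and the example parameters are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def prob_superiority_normal(mu1, sigma1, mu2, sigma2):
    """P(X2 > X1) for independent normal variables X1 and X2."""
    # X2 - X1 is normal with mean mu2 - mu1 and variance sigma1^2 + sigma2^2
    return norm.cdf((mu2 - mu1) / np.sqrt(sigma1**2 + sigma2**2))

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
x1 = rng.normal(0, 1, size=10_000)
x2 = rng.normal(1, 1, size=10_000)

# Pair the i-th draw from each group and count how often group 2 is larger
simulated = np.mean(x2 > x1)
exact = prob_superiority_normal(0, 1, 1, 1)
print('simulated', simulated)
print('exact', exact)
```

Because the result is a proportion of pairs, it does not depend on how many members each original population has.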

## Cohen's d statistic

Pooled variance (each group's variance weighted by its sample size):

$$ \sigma^2_{pooled} = \frac{\sigma^2_1 n_1 + \sigma^2_2 n_2}{n_1 + n_2}$$

Cohen's d is the difference between the group means expressed in units of the pooled standard deviation:

$$ d = \frac{\mu_1 - \mu_2}{\sigma_{pooled}} $$
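The two formulas above translate fairly directly into NumPy. This is a sketch under stated assumptions: the function name `cohens_d_from_samples` is illustrative, and each group's variance is computed with NumPy's default `ddof=0`, a choice the formulas above do not specify.

```python
import numpy as np

def cohens_d_from_samples(sample1, sample2):
    """Cohen's d using the sample-size-weighted pooled variance."""
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    n1, n2 = len(x1), len(x2)
    # Assumption: np.var with its default ddof=0 (population variance)
    pooled_var = (n1 * x1.var() + n2 * x2.var()) / (n1 + n2)
    return (x1.mean() - x2.mean()) / np.sqrt(pooled_var)

d = cohens_d_from_samples([2, 4, 6], [1, 3, 5])
print('d', d)
```

The sign of `d` follows the order of the arguments, so swapping the two samples flips it.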

### Rule of Thumb for d statistic

>**Small effect: d = 0.2**
>
>**Medium effect: d = 0.5**
>
>**Large effect: d = 0.8**
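One way to apply these benchmarks in code is a small labelling helper. The sketch below rests on two assumptions that the rule of thumb itself does not state: a value between two benchmarks takes the label of the largest benchmark it reaches, and anything below 0.2 is called "negligible". The function name `label_effect_size` is hypothetical.

```python
def label_effect_size(d):
    """Map a d statistic to a rough rule-of-thumb label."""
    magnitude = abs(d)  # the sign only indicates direction
    if magnitude >= 0.8:
        return 'large'
    if magnitude >= 0.5:
        return 'medium'
    if magnitude >= 0.2:
        return 'small'
    return 'negligible'  # assumption: label for values below the smallest benchmark

for value in (0.1, 0.2, 0.5, -1.0):
    print(value, label_effect_size(value))
```

These labels are only conventions, so the cut-offs should be read as rough guides and not as hard boundaries.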

## Bringing it all together

In [None]:
# Summary of functions found in curriculum
import scipy.stats
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
def evaluate_PDF(rv, x=4):
    '''Input: a frozen scipy.stats random variable object, and x, the number
       of standard deviations to cover on each side of the mean
       Output: x and y values of the PDF over that range
       '''

    # Identify the mean and standard deviation of the random variable
    mean = rv.mean()
    std = rv.std()

    # Generate 100 evenly spaced points spanning x standard deviations (4 by default) on each side of the mean
    xs = np.linspace(mean - x*std, mean + x*std, 100)

    # Evaluate the probability density at each of those points
    ys = rv.pdf(xs)

    return xs, ys # Return calculated values

In [None]:
def overlap_superiority(group1, group2, n=1000):
    """Estimates overlap and superiority based on a sample.
    
    group1: scipy.stats rv object (assumed to have the lower mean)
    group2: scipy.stats rv object (assumed to have the higher mean)
    n: sample size
    """

    # Get a sample of size n from both groups
    group1_sample = group1.rvs(n)
    group2_sample = group2.rvs(n)
    
    # Use the midpoint of the means as the threshold (where the PDFs cross when both stds are equal)
    thresh = (group1.mean() + group2.mean()) / 2
    print(thresh)
    
    # Count the values on the wrong side of the threshold: group 1 above it, group 2 below it
    above = sum(group1_sample > thresh)
    below = sum(group2_sample < thresh)
    
    # Calculate the overlap
    overlap = (above + below) / n
    
    # Calculate probability of superiority: how often a group 2 value exceeds its paired group 1 value
    superiority = sum(y > x for x, y in zip(group1_sample, group2_sample)) / n

    return overlap, superiority

In [None]:
def plot_pdfs(cohen_d=2):
    """Plot PDFs for distributions that differ by some number of stds.
    
    cohen_d: number of standard deviations between the means
    """
    group1 = scipy.stats.norm(0, 1)
    group2 = scipy.stats.norm(cohen_d, 1)
    xs, ys = evaluate_PDF(group1)
    plt.fill_between(xs, ys, label='Group1', color='#ff2289', alpha=0.7)

    xs, ys = evaluate_PDF(group2)
    plt.fill_between(xs, ys, label='Group2', color='#376cb0', alpha=0.7)
    plt.legend()  # Show the Group1/Group2 labels set above
    
    o, s = overlap_superiority(group1, group2)
    print('overlap', o)
    print('superiority', s)

In [None]:
plot_pdfs(cohen_d=0.2)

In [None]:
plot_pdfs(cohen_d=1)

In [None]:
plot_pdfs(cohen_d=4.0)