## Configuration

In [1]:
import numpy as np
import scipy.stats as st
from itertools import compress

## Define functions

In [2]:
def calculate_sample_error(prob_a, sample_a, prob_b, sample_b):
    """
    Given unequal true proportions in two populations for which some condition is true,
    estimate the likelihood that random samples from the two populations would lead you
    to the incorrect conclusion, i.e. that the population with the lower true proportion
    shows the higher sample proportion.

    Example:

        In village A, 52% <prob_a> of citizens wear hats. In village B, 47% <prob_b> of citizens wear hats.
        If random samples of 100 <sample_a> citizens from A and 100 <sample_b> citizens from B are taken,
        what is the likelihood that sample B would have a higher % of hat-wearers than sample A?
        (Assumes sample_a and sample_b are large enough, random samples.)

    Usage:
        Executing -> 'calculate_sample_error(0.52, 100, 0.47, 100)'
        Should return -> 0.23946 (rounded)
    """
    # Pair each probability with its own sample size, larger probability first.
    # Pairing directly (rather than matching on value) keeps the sample sizes
    # straight even if prob_a == prob_b.
    if prob_a >= prob_b:
        max_prob, max_sample = prob_a, sample_a
        min_prob, min_sample = prob_b, sample_b
    else:
        max_prob, max_sample = prob_b, sample_b
        min_prob, min_sample = prob_a, sample_a

    # Normal approximation to the difference of the two sample proportions
    mean_diff = max_prob - min_prob
    stdev_diff = np.sqrt(((max_prob * (1 - max_prob)) / max_sample) + ((min_prob * (1 - min_prob)) / min_sample))
    # Probability that the difference falls below 0, i.e. the sample ordering flips
    z_score = (0 - mean_diff) / stdev_diff
    return st.norm.cdf(z_score)
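
The estimate above can be written out explicitly. This is a sketch in notation that does not appear in the original: $p_A > p_B$ are the true proportions, $n_A, n_B$ the sample sizes, $\hat{p}_A, \hat{p}_B$ the sample proportions, and $\Phi$ the standard normal CDF. It assumes independent random samples that are large enough for the normal approximation to hold.

$$
\hat{p}_A - \hat{p}_B \;\approx\; \mathcal{N}\!\left(p_A - p_B,\; \frac{p_A(1-p_A)}{n_A} + \frac{p_B(1-p_B)}{n_B}\right)
$$

$$
P(\hat{p}_B > \hat{p}_A) \;\approx\; \Phi\!\left(\frac{0 - (p_A - p_B)}{\sqrt{\dfrac{p_A(1-p_A)}{n_A} + \dfrac{p_B(1-p_B)}{n_B}}}\right)
$$

The numerator and denominator correspond to `mean_diff` and `stdev_diff` in the function, and the argument of $\Phi$ is `z_score`.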

## Question 1

In village A, 53.5% of citizens walk to work; in village B, 48.5% do. (These are the true population proportions.)

If you polled a random sample of 250 people from village A and 130 people from village B on whether or not they walk to work, what is the likelihood that your polling group from village B would have a *higher* percentage of walkers than your polling group from village A?

In [3]:
calculate_sample_error(.535, 250, .485, 130)

0.17726064399873104
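
One way to sanity-check the normal-approximation result above (about 0.177) is to simulate the polls directly. The sketch below is not part of the original notebook: `simulate_sample_error`, its `trials` and `seed` parameters, and the choice of 200,000 trials are all assumptions made for illustration. It draws binomial counts for each village and measures how often village B's sample proportion strictly exceeds village A's.

```python
import numpy as np

def simulate_sample_error(prob_a, sample_a, prob_b, sample_b, trials=200_000, seed=0):
    """Hypothetical helper: Monte Carlo estimate of P(sample B proportion > sample A proportion)."""
    rng = np.random.default_rng(seed)
    # Number of "yes" answers in each simulated poll
    hits_a = rng.binomial(sample_a, prob_a, size=trials)
    hits_b = rng.binomial(sample_b, prob_b, size=trials)
    # hits_b / sample_b > hits_a / sample_a, compared via integer
    # cross-multiplication to avoid floating-point ties
    b_higher = hits_b * sample_a > hits_a * sample_b
    return b_higher.mean()

estimate = simulate_sample_error(0.535, 250, 0.485, 130)
print(estimate)
```

If the approximation is reasonable for these sample sizes, the printed value should land close to the figure returned by `calculate_sample_error`, with some run-to-run noise from the simulation.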

## Question 2
Answer the same question, but with different assumptions:

In village A, 60% of citizens walk to work; in village B, 49% do. You poll 100 people in each village.

In [4]:
calculate_sample_error(.60, 100, .49, 100)

0.058022955871784604
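
With equal sample sizes, the two polls can produce exactly the same percentage, and the normal approximation does not distinguish a tie from "B higher". As a rough cross-check, the sketch below computes the exact binomial probabilities instead. The helper name `exact_sample_error` and its return shape are assumptions for illustration, not part of the original notebook; it relies only on `scipy.stats.binom.pmf` and NumPy broadcasting.

```python
import numpy as np
import scipy.stats as st

def exact_sample_error(prob_a, sample_a, prob_b, sample_b):
    """Hypothetical helper: exact P(B proportion > A proportion) and P(tie)
    for independent binomial samples."""
    k_a = np.arange(sample_a + 1)
    k_b = np.arange(sample_b + 1)
    pmf_a = st.binom.pmf(k_a, sample_a, prob_a)
    pmf_b = st.binom.pmf(k_b, sample_b, prob_b)
    # joint[i, j] = P(k_a = i and k_b = j), assuming independent samples
    joint = np.outer(pmf_a, pmf_b)
    # Compare k_b / sample_b with k_a / sample_a using integer cross-multiplication
    lhs = k_b[None, :] * sample_a
    rhs = k_a[:, None] * sample_b
    p_b_higher = joint[lhs > rhs].sum()
    p_tie = joint[lhs == rhs].sum()
    return p_b_higher, p_tie

p_higher, p_tie = exact_sample_error(0.60, 100, 0.49, 100)
print(p_higher, p_tie)
```

Because ties are counted separately here, the strict "B higher" probability can come out somewhat below the normal-approximation figure of about 0.058.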