# Example 1 Data

We start by loading some useful libraries, along with a few tools that support a more functional style of programming.

```python
import pandas as pd
from scipy import stats
import altair as alt
from typing import List, Any, Tuple
from functools import reduce
import math
```

Next, we define a function that generates a data frame of the results for a single person who has been asked to flip a coin a fixed number of times.

```python
def random_flips(num_flips: int,
                 prob_heads: float,
                 person_id: int) -> pd.DataFrame:
    # Each flip is a Bernoulli trial: 1 for heads, 0 for tails
    coin_result = stats.bernoulli.rvs(p=prob_heads,
                                      size=num_flips)
    flip_number = range(1, num_flips + 1)
    # Repeat the person's id so that every row records who flipped
    flipper_id = num_flips * [person_id]
    return pd.DataFrame({"name": flipper_id,
                         "flip_number": flip_number,
                         "outcome": coin_result})
```

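If the data ever needs to be regenerated exactly, one option is to seed the draws. The sketch below assumes reproducibility matters to you; `seeded_flips` is a hypothetical variant of `random_flips` that passes an integer seed through the `random_state` argument accepted by SciPy's `rvs` methods. Everything else mirrors the function above.

```python
import pandas as pd
from scipy import stats


def seeded_flips(num_flips: int, prob_heads: float, person_id: int,
                 seed: int) -> pd.DataFrame:
    """Hypothetical variant of random_flips whose draws are reproducible."""
    # Passing an int as random_state seeds a fresh generator for this call
    coin_result = stats.bernoulli.rvs(p=prob_heads, size=num_flips,
                                      random_state=seed)
    return pd.DataFrame({"name": num_flips * [person_id],
                         "flip_number": range(1, num_flips + 1),
                         "outcome": coin_result})


a = seeded_flips(30, 0.4, person_id=7, seed=2024)
b = seeded_flips(30, 0.4, person_id=7, seed=2024)
print(a.equals(b))  # prints True: the same seed reproduces the same flips
```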
We then wrap this in a function that runs the experiment for a group of people and combines all of the results into a single data frame.

```python
def random_experiment(num_flips: int,
                      person_ids: List[int],
                      prob_heads_list: List[float]) -> pd.DataFrame:
    # Lazily generate one data frame per person
    rand_dfs = (random_flips(num_flips, prob, pid)
                for (prob, pid) in zip(prob_heads_list, person_ids))
    # Stack the per-person frames in a single call, renumbering the rows
    return pd.concat(rand_dfs, ignore_index=True)
```
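For larger groups, building one data frame per person can be avoided altogether. The following is a minimal sketch of an alternative, not the approach used in this notebook: the hypothetical `vectorized_experiment` repeats each person's id and probability once per flip with NumPy and draws every flip in one call, turning uniform draws into Bernoulli outcomes by comparing them with `p`. It assumes exactly one probability per person.

```python
from typing import Iterable

import numpy as np
import pandas as pd


def vectorized_experiment(num_flips: int,
                          person_ids: Iterable[int],
                          prob_heads_list: Iterable[float],
                          seed: int = 0) -> pd.DataFrame:
    """Hypothetical alternative to random_experiment that draws every flip at once."""
    rng = np.random.default_rng(seed)
    ids = np.asarray(list(person_ids))
    probs = np.asarray(list(prob_heads_list), dtype=float)
    # One row per (person, flip): each id and probability is repeated num_flips times
    names = np.repeat(ids, num_flips)
    flip_numbers = np.tile(np.arange(1, num_flips + 1), len(ids))
    p = np.repeat(probs, num_flips)
    # A uniform draw on [0, 1) falls below p with probability p, i.e. a Bernoulli(p) flip
    outcomes = (rng.random(p.size) < p).astype(int)
    return pd.DataFrame({"name": names,
                         "flip_number": flip_numbers,
                         "outcome": outcomes})


df = vectorized_experiment(30, range(3), [0.4, 0.4, 1.0])
print(df.groupby("name")["outcome"].sum())
```

The output has the same columns as `random_experiment`, so it could be written to CSV in the same way.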

Given the number of trials and the number of successes among those trials, we can compute the maximum likelihood estimate (MLE) of the probability of success, which is simply the observed proportion of successes, together with a Wald-style 95% confidence interval on that estimate. Note that we define a type alias, `EstimateAndCI`, to make the shape of the result clear.

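```python
# An estimate paired with the (lower, upper) bounds of its confidence interval
EstimateAndCI = Tuple[float, Tuple[float, float]]

def wald_estimate_and_ci(num_trials: int, num_success: int) -> EstimateAndCI:
    # The MLE of a binomial proportion is the observed success rate
    p_hat = num_success / num_trials
    # 1.96 is the 97.5th percentile of the standard normal, giving a 95% interval
    z = 1.96
    delta = z * math.sqrt(p_hat * (1 - p_hat) / num_trials)
    return (p_hat, (p_hat - delta, p_hat + delta))
```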
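As a check on the formula, 12 heads in 30 flips gives an estimate of 0.4 and a 95% interval of roughly (0.225, 0.575). The sketch below is a hypothetical generalisation, `wald_ci_level`, that derives the critical value from the requested confidence level with `scipy.stats.norm.ppf` rather than hard-coding 1.96.

```python
import math

from scipy import stats


def wald_ci_level(num_trials: int, num_success: int, level: float = 0.95):
    """Hypothetical Wald interval with the critical value computed from `level`."""
    p_hat = num_success / num_trials
    # Two-sided interval: take the (1 + level) / 2 quantile of the standard normal
    z = stats.norm.ppf((1 + level) / 2)
    delta = z * math.sqrt(p_hat * (1 - p_hat) / num_trials)
    return p_hat, (p_hat - delta, p_hat + delta)


est, (lo, hi) = wald_ci_level(30, 12)
print(round(est, 3), round(lo, 3), round(hi, 3))  # prints 0.4 0.225 0.575

# When every flip is heads the interval collapses to a single point
print(wald_ci_level(30, 30))  # prints (1.0, (1.0, 1.0))
```

The zero-width interval in the all-heads case is a known weakness of the Wald interval, and it is exactly the situation created by the outliers in Experiment 1.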

The data set that we want will contain a couple of outliers so that the audience has something interesting to find. We will also generate a second synthetic data set that leads to the correct point estimate but whose structure makes the binomial model inappropriate: each person's coin has a different probability of heads, so the flips are not identically distributed. We use two dictionaries, `exp1` and `exp2`, to hold the settings for each data set.

```python
num_flips = 30

exp1 = {
    "experiment": 1,
    "num_people": 15,
    "person_ids": range(15),
    "num_outliers": 2,
    "prob_heads": 0.4,
    "output_csv": "experiment1.csv"
}

exp2 = {
    "experiment": 2,
    "num_people": 50,
    "person_ids": range(50),
    "prob_lower": 0.2,
    "prob_upper": 0.6,
    "output_csv": "experiment2.csv"
}
```
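Before simulating, it can help to work out which pooled estimate each design should produce on average. The sketch below copies the relevant settings from `exp1` and `exp2` and uses two hypothetical helpers to compute the expected proportion of heads across all flips. Because everyone flips the same number of times, this is just the average of the per-person probabilities: the outliers pull Experiment 1 up from 0.4 to 0.48, while the evenly spaced coins in Experiment 2 average out to 0.4.

```python
# Copies of the settings needed from exp1 and exp2
exp1_cfg = {"num_people": 15, "num_outliers": 2, "prob_heads": 0.4}
exp2_cfg = {"num_people": 50, "prob_lower": 0.2, "prob_upper": 0.6}


def expected_rate_exp1(cfg: dict) -> float:
    """Hypothetical helper: expected share of heads when outliers always report heads."""
    honest = cfg["num_people"] - cfg["num_outliers"]
    return (honest * cfg["prob_heads"] + cfg["num_outliers"] * 1.0) / cfg["num_people"]


def expected_rate_exp2(cfg: dict) -> float:
    """Hypothetical helper: mean of the evenly spaced grid of coin probabilities."""
    step = (cfg["prob_upper"] - cfg["prob_lower"]) / (cfg["num_people"] - 1)
    probs = [cfg["prob_lower"] + step * k for k in range(cfg["num_people"])]
    return sum(probs) / len(probs)


print(round(expected_rate_exp1(exp1_cfg), 3), round(expected_rate_exp2(exp2_cfg), 3))
```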

## Experiment 1

The last two people do not actually flip the coin; they just record heads for every trial.

```python
# Honest flippers use prob_heads; the outliers always report heads (probability 1.0)
prob_heads_1 = ((exp1["num_people"] - exp1["num_outliers"]) * [exp1["prob_heads"]] +
                exp1["num_outliers"] * [1.0])

results_1 = random_experiment(
    num_flips,
    exp1["person_ids"],
    prob_heads_1
)

results_1.to_csv(exp1["output_csv"], index=False)
```
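One way the audience might find the outliers is to compare each person's head count with a Binomial(30, 0.4) distribution. The sketch below assumes the nominal probability of 0.4 is taken as the null hypothesis; `flag_outliers` is a hypothetical helper that computes a one-sided tail probability with `scipy.stats.binom.sf` and flags anyone below a small threshold. Someone who reports 30 heads out of 30 has a tail probability of 0.4 to the power 30, roughly 1.2e-12. The demo frame is made up for illustration and uses the same columns as `results_1`.

```python
import pandas as pd
from scipy import stats


def flag_outliers(results: pd.DataFrame, prob_heads: float,
                  alpha: float = 0.001) -> pd.DataFrame:
    """Hypothetical helper: per-person head counts with a one-sided binomial p-value."""
    summary = (results.groupby("name")["outcome"]
               .agg(heads="sum", flips="count")
               .reset_index())
    # sf(k - 1) = P(X >= k) under Binomial(flips, prob_heads)
    summary["p_value"] = stats.binom.sf(summary["heads"] - 1,
                                        summary["flips"], prob_heads)
    summary["flagged"] = summary["p_value"] < alpha
    return summary


# Made-up demo: person 0 gets 12 heads, person 1 reports heads every time
demo = pd.DataFrame({
    "name": [0] * 30 + [1] * 30,
    "flip_number": list(range(1, 31)) * 2,
    "outcome": [1] * 12 + [0] * 18 + [1] * 30,
})
print(flag_outliers(demo, prob_heads=0.4))
```

With 15 people tested at once, a threshold well below 0.05 keeps false alarms among the honest flippers unlikely.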

## Experiment 2

Everyone flips the coin they are given, but the coins have different probabilities of heads, evenly spaced from 0.2 to 0.6.

```python
# Spread the probabilities of heads evenly from prob_lower to prob_upper
prob_inc = (exp2["prob_upper"] - exp2["prob_lower"]) / (exp2["num_people"] - 1)
prob_heads_2 = [exp2["prob_lower"] + prob_inc * n
                for n in range(exp2["num_people"])]

results_2 = random_experiment(
    num_flips,
    exp2["person_ids"],
    prob_heads_2
)

results_2.to_csv(exp2["output_csv"], index=False)
```
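The binomial model fails here because of overdispersion: since the probability of heads differs between people, the head counts vary more from person to person than a single Binomial(30, 0.4) would allow. The sketch below rebuilds the same grid of probabilities with `np.linspace` (equal to `prob_heads_2` up to floating-point rounding) and uses the law of total variance to compare the two expected variances of a person's head count, about 7.2 under the binomial model against about 19.3 for this design.

```python
import numpy as np

num_flips = 30
# Same grid as prob_heads_2, up to floating-point rounding
probs = np.linspace(0.2, 0.6, 50)

p_bar = probs.mean()
# Variance of a single Binomial(num_flips, p_bar) head count
binomial_var = num_flips * p_bar * (1 - p_bar)
# Law of total variance over the grid: E[n p (1 - p)] + Var(n p)
# (np.var uses ddof=0, i.e. the variance of the grid itself)
mixture_var = (num_flips * probs * (1 - probs)).mean() + num_flips ** 2 * probs.var()

print(f"binomial: {binomial_var:.2f}, heterogeneous coins: {mixture_var:.2f}")
```

On simulated data, the sample variance of the per-person head counts in `results_2` should tend to land near the larger value rather than the binomial one.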