# Week 5 Application Exercise

This is the starting notebook for the Week 5 application exercise.  It is intended to demonstrate several things:

- The use of simulation as a tool for understanding statistical methods
- Performing hypothesis tests
- The difference between paired and two-sample tests

Save the notebook file and the accompanying `.py` file (`cs533_w5_world.py`) into the same folder to start work on the assignment.

Please treat the experiment as a black box and infer its behavior using the statistical techniques we have learned in class.  After class, I invite you to look at its source code and see how it works.

The core idea is to determine whether “fabulators” produce more (or less) nonsense under condition A than under condition B.  The code describes an experimental design and lets you “run” the experiment to draw samples.

## Software Requirements

This exercise requires an additional Python package that is not included in a default Anaconda install: the [seedbank](https://seedbank.lenskit.org) library.  You can install it with pip:

    pip install seedbank

It's also available in Conda-Forge:

    conda install -c conda-forge seedbank
    
Because it only has a few dependencies, and they are all included in almost all base Conda environments, the Pip installation works fine, and doesn't mix packages between Conda repositories.

## Setup


We need to do our usual imports:

In [1]:
import pandas as pd
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt

Because we are using random number generation, we want to **seed** the random number generator.  If we initialize it with a fixed seed, re-running the notebook will produce the same results every time.  This is useful for debugging and reproducibility.  We'll often want to re-run with a *different* seed before submitting, just to make sure that our results aren't an accident of one pathological choice of random seed.
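As a small illustration of what seeding buys us, the following sketch uses plain NumPy generators (not Seedbank); the variable names and the second seed are made up for the example.  Two generators built from the same seed give identical draws, while a different seed gives a different stream:

```python
import numpy as np

# Two generators built from the same seed produce identical draws.
rng_a = np.random.default_rng(20210923)
rng_b = np.random.default_rng(20210923)
draws_a = rng_a.normal(size=5)
draws_b = rng_b.normal(size=5)

# A generator built from a different (arbitrary) seed produces a different stream.
rng_c = np.random.default_rng(20210924)
draws_c = rng_c.normal(size=5)

print(np.array_equal(draws_a, draws_b))  # same seed
print(np.array_equal(draws_a, draws_c))  # different seed
```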

The Seedbank library initializes and seeds a wide range of Python random number generators.  The basic usage is to directly seed with a call to `initialize` (for teaching notebooks, I often use the current date as my random seed):

    import seedbank
    seedbank.initialize(20210923)

However, if we all ran that code, we would all get the same results, and for this exercise I would like different teams to get *different* results.  Therefore, we will take advantage of another Seedbank feature: additional string keys that are incorporated into the random seed.  Edit the following cell to use your team name:

In [3]:
TEAM_NAME = 'network'
import seedbank
seedbank.initialize(20210923, TEAM_NAME)

SeedSequence(
    entropy=20210923,
    spawn_key=(array([1652547376, 2490093471, 1609550347, 3146785967], dtype=uint32),),
)

The last piece is to import our custom module to get a 'world' from which we can sample:

In [4]:
from cs533_w5_world import Experiment

And then create our experiment:

In [5]:
exp = Experiment()

## Getting Data

We first need to know where our data is coming from.  The experiment describes itself:

In [6]:
print(exp.describe())

This experiment measures the nonsense output of fabulators under different
conditions.  It measures each fabulator twice, under two different conditions,
to see how much nonsense they produce in each condition.

Your goal is to measure whether condition A causes fabulators to produce more
(or less) nonsense than condition B.


We can run an instance of this experiment with size 50:

In [15]:
SAMPLE_SIZE = 50
data = exp.run_experiment(SAMPLE_SIZE)
data

Unnamed: 0,subject,CondA_Nonsense,CondB_Nonsense
0,1,489.969978,478.483402
1,2,489.323604,503.474736
2,3,488.960187,473.028075
3,4,486.142559,504.917211
4,5,491.851569,556.334645
5,6,487.045764,518.808118
6,7,491.533228,567.194303
7,8,489.178857,584.006927
8,9,494.346422,502.28374
9,10,496.480821,511.376209


## Comparing Conditions/Groups

Review the experiment description.  You need to compare A and B with a *t*-test, but the precise details will depend on your experiment structure.
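To see why the choice of test matters, here is a minimal sketch on synthetic data (not the course experiment; all names and numbers are invented for illustration).  Each simulated subject has a large individual baseline and a small shift between conditions.  It runs both `scipy.stats.ttest_rel` (paired) and `scipy.stats.ttest_ind` (independent, Welch variant) on the same numbers; it does not tell you which design your own experiment uses.

```python
import numpy as np
from scipy import stats

rng_demo = np.random.default_rng(42)
n = 30

# Synthetic subjects: large subject-to-subject variation, small shift between conditions.
baseline = rng_demo.normal(100, 20, size=n)
cond_a = baseline + rng_demo.normal(0, 1, size=n)
cond_b = baseline + 2 + rng_demo.normal(0, 1, size=n)

# Paired test: works on the per-subject differences, so the baseline cancels out.
paired = stats.ttest_rel(cond_a, cond_b)

# Independent two-sample test: treats the columns as unrelated groups,
# so the subject-to-subject variation stays in the noise.
indep = stats.ttest_ind(cond_a, cond_b, equal_var=False)

print("paired p-value:     ", paired.pvalue)
print("independent p-value:", indep.pvalue)
```

When the same subjects are measured twice, the paired test is typically far more sensitive; when the groups really are independent, pairing arbitrary rows would be meaningless.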

✅ Do you need to use a **paired** t-test or an **independent two-sample** t-test for this analysis?

✅ What is the **null hypothesis** for the test with this data?

✅ Compute the means of both groups or conditions:

In [13]:
data.mean()[1:]

CondA_Nonsense    489.953874
CondB_Nonsense    500.392215
dtype: float64

✅ Compute the *difference* in means. How much more nonsense is produced in A vs. B?

This is also called the *effect size* (or specifically, the *unstandardized effect size*).
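As a quick sketch with made-up numbers (four hypothetical subjects, not data from the experiment), the unstandardized effect size is just a difference of means.  For paired data it equals the mean of the per-subject differences:

```python
import numpy as np

# Hypothetical measurements for four subjects under two conditions.
cond_a = np.array([10.0, 12.0, 9.0, 11.0])
cond_b = np.array([11.0, 14.0, 9.5, 12.5])

# Difference of the two condition means (A minus B).
diff_of_means = cond_a.mean() - cond_b.mean()

# Mean of the per-subject differences; only meaningful when rows are paired.
mean_of_diffs = (cond_a - cond_b).mean()

print(diff_of_means, mean_of_diffs)
```

The two numbers agree, but their uncertainty does not: the standard error depends on whether the analysis is paired or unpaired.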

✅ Run the appropriate *t*-test to test if this difference is statistically significant and obtain a *p*-value:

✅ What does this result mean?

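# NumPy random generator used by the bootstrap code below
rng = np.random.default_rng(20200913)
# scratch check: draw 5 rows (with replacement) from a one-row slice of the data
rng.choice(data[1:2], size=5)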

## Bootstrap

✅ Bootstrap a confidence interval for the effect size (note that the bootstrap procedure will differ between paired and unpaired analyses):

In [None]:
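def boot_mean_estimate(vals, nboot=10000):
    obs = vals.dropna()  # ignore missing values
    mean = obs.mean()
    n = obs.count()

    # resample with replacement (uses the notebook-level rng) and record each mean
    boot_means = [np.mean(rng.choice(obs, size=n)) for i in range(nboot)]
    # 95% percentile interval (assumed level and return format; adjust as needed)
    ci_low, ci_high = np.quantile(boot_means, [0.025, 0.975])
    return pd.Series({'mean': mean, 'ci_low': ci_low, 'ci_high': ci_high})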
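The difference between paired and unpaired bootstrapping lies in what gets resampled.  The sketch below is one possible way to write both, using synthetic data and hypothetical helper names (`boot_ci_paired`, `boot_ci_unpaired`); it is not the course's reference solution.  The paired version resamples subjects, keeping each subject's two measurements together, while the unpaired version resamples each group separately.

```python
import numpy as np

def boot_ci_paired(a, b, rng, nboot=2000):
    """95% percentile CI for mean(a - b), resampling whole subjects."""
    diffs = np.asarray(a) - np.asarray(b)
    n = len(diffs)
    means = [rng.choice(diffs, size=n).mean() for _ in range(nboot)]
    return np.quantile(means, [0.025, 0.975])

def boot_ci_unpaired(a, b, rng, nboot=2000):
    """95% percentile CI for mean(a) - mean(b), resampling each group on its own."""
    a, b = np.asarray(a), np.asarray(b)
    means = [
        rng.choice(a, size=len(a)).mean() - rng.choice(b, size=len(b)).mean()
        for _ in range(nboot)
    ]
    return np.quantile(means, [0.025, 0.975])

# Synthetic paired data: big per-subject baseline, small shift between conditions.
demo_rng = np.random.default_rng(7)
base = demo_rng.normal(100, 20, size=40)
a = base + demo_rng.normal(0, 1, size=40)
b = base + 2 + demo_rng.normal(0, 1, size=40)

lo_p, hi_p = boot_ci_paired(a, b, demo_rng)
lo_u, hi_u = boot_ci_unpaired(a, b, demo_rng)
print("paired CI:  ", lo_p, hi_p)
print("unpaired CI:", lo_u, hi_u)
```

On data that really are paired, the paired interval is much narrower because the shared per-subject baseline cancels in the differences.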

## Sampling Distribution

✅ Compute the effect size of **100 runs** of your experiment.  Describe the distribution of these effect sizes numerically and graphically.
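The general shape of this step can be sketched as follows.  Because the real experiment is a black box, the sketch uses a hypothetical stand-in, `fake_experiment`, with invented parameters; in the notebook you would call your own experiment object instead.

```python
import numpy as np

def fake_experiment(n, rng):
    """Hypothetical stand-in for one experiment run: returns (cond_a, cond_b)."""
    base = rng.normal(500, 30, size=n)
    cond_a = base + rng.normal(0, 5, size=n)
    cond_b = base + 3 + rng.normal(0, 5, size=n)
    return cond_a, cond_b

sim_rng = np.random.default_rng(2021)

# Run the stand-in experiment 100 times and record the effect size of each run.
effects = np.array([
    a.mean() - b.mean()
    for a, b in (fake_experiment(50, sim_rng) for _ in range(100))
])

# Numerical description of the sampling distribution.
print("mean:", effects.mean())
print("std: ", effects.std(ddof=1))
print("2.5% and 97.5% percentiles:", np.quantile(effects, [0.025, 0.975]))
```

For the graphical description, a histogram such as `sns.histplot(effects)` is one reasonable choice.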

In [16]:
rng = np.random.default_rng(20200913)
rng.choice(data[1:2], size=5)

array([[  2.        , 489.32360388, 503.47473588],
       [  2.        , 489.32360388, 503.47473588],
       [  2.        , 489.32360388, 503.47473588],
       [  2.        , 489.32360388, 503.47473588],
       [  2.        , 489.32360388, 503.47473588]])

⚠ While the confidence interval above will likely be close to the percentiles of the effect size distribution, the two are not the same thing.  **Why is that?**

## The Answers

The experiment can tell you the answers (do **not** run this until you have completed the rest):

In [None]:
exp.answers()

## Other Analysis

If you have time, create a second experiment with the opposite configuration from your initial one.  The experiment class takes a `paired` option that you can use to force a paired or unpaired design by passing `True` or `False`:

    exp2 = Experiment(paired=True)

If you needed a paired analysis above, create an unpaired experiment (`paired=False`); if you used an independent analysis above, create a paired experiment (`paired=True`). Repeat as much of your analysis as you can with the new experimental design.