<h1>P-Value Hacking</h1>
<p>Suppose I am a scientist who has a theory that my 100-sided die is unfair.  I design an experiment in which I roll this die 100,000 times to prove that some number comes up more often than it should.  In addition, I require that for a face to count as loaded, its observed frequency must deviate by at least 3 sigma from the expected probability.  I don't designate ahead of time which face should be the abnormal one.</p>

<br>

<p>What is wrong with this experiment?</p>

In [1]:
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

import random
import numpy as np
from math import sqrt

def roll_fair(n_face=6):
    return random.randint(1,n_face)

<h2>Let's start with a simple multinomial probability</h2>

In [11]:
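n_rolls = 100000
res = [roll_fair(100) for x in range(n_rolls)]
# Bin edges at 0.5, 1.5, ..., 100.5 so each face 1..100 gets its own unit-width bin
occ, edges = np.histogram(res, bins=np.arange(0.5, 101.5), density=True)
# Bin centers are the face values themselves
bins = edges[:-1] + 0.5
# Binomial standard error on each observed probability
err = np.sqrt((1 - occ) * occ / n_rolls)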

data = [
    go.Scatter(
        name="100,000 Rolls",
        x=bins,
        y=occ,
        mode = 'markers',
        marker=dict(
            color='black',
        ),
        error_y=dict(
            type='data',
            array=err,
            visible=True,
            color='black'
        )
    ),
    go.Bar(
        name="Expectation Value",
        x=list(range(1,101)),
        y=[.01]*100,
        marker=dict(
            color='#990000'
        )
    )   
]

layout = go.Layout(
    title='100 Faced Dice Experiment 1',
    xaxis=dict(
        title=dict(
            text='Roll Result',
            font=dict(
                size=14,
                color='black'
            )
        )
    ),
    yaxis=dict(
        title=dict(
            text='Probability',
            font=dict(
                size=14,
                color='black'
            )
        )
    )
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='dice_pmf')
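
<p>Reading deviations off a plot with 100 error bars is hard, so it may help to turn the same comparison into numbers.  The sketch below is only an illustration: the helper name <code>face_z_scores</code>, the seed, and the use of NumPy's random generator instead of <code>roll_fair</code> are assumptions of this example.  It also computes the standard error from the expected probability of 0.01 rather than from the observed one, which is a slightly different choice than the error bars above.</p>

```python
import numpy as np

def face_z_scores(rolls, n_face=100):
    """Return (observed probability, standard error, z-score) for each face.

    Hypothetical helper for illustration; it is not part of the notebook above.
    """
    rolls = np.asarray(rolls)
    n_rolls = rolls.size
    # Count how often each face 1..n_face came up (index 0 is unused).
    counts = np.bincount(rolls, minlength=n_face + 1)[1:]
    p_obs = counts / n_rolls
    p_exp = 1.0 / n_face
    # Standard error of an observed frequency under the fair-die hypothesis.
    sigma = np.sqrt(p_exp * (1 - p_exp) / n_rolls)
    z = (p_obs - p_exp) / sigma
    return p_obs, sigma, z

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
rolls = rng.integers(1, 101, size=100_000)  # fair rolls of a 100-sided die
p_obs, sigma, z = face_z_scores(rolls)

for k in (2, 3):
    # Faces whose frequency is more than k standard errors from 0.01.
    flagged = np.flatnonzero(np.abs(z) > k) + 1
    print(f"faces beyond {k} sigma: {flagged.tolist()}")
```

<p>Which faces get flagged depends on the seed, which is exactly the point: on a fair die the flagged faces are noise.</p>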

<h2>What are we seeing?</h2>
<p>Here, several bins sit well outside their error bars (in my example, 12 had a probability of 0.0107 +/- 0.000326, about 2 sigma above the expected 0.01, which looks inconsistent with a fair die).  In fact, there are several data points for which this is true.  Would you conclude that the die is loaded to roll 12s more often than it should?  The answer, of course, is no.  For normally distributed fluctuations, about 95% of points fall within 2 standard deviations of the mean and about 99.7% within 3.  If we look at a distribution that has 100 observations, we expect about 5 of them to deviate by more than 2 sigma purely by chance, and there is still roughly a 1 in 4 chance that at least one of them crosses the 3-sigma threshold.  Essentially, this is what p-value hacking is.  When a scientist is either unscrupulous or incompetent, they will perform an experiment hoping to find some variation from a baseline, without committing in advance to where that variation should appear.  Typically, the easiest way of refuting such a result is to perform the experiment a second time.</p>
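
<p>The arithmetic behind this argument can be checked with a few lines of standard-library Python.  This is a rough sketch under two stated assumptions: each face's fluctuation is treated as approximately normal, and the 100 faces are treated as independent (they are not exactly, since the frequencies must sum to 1).  The helper name <code>two_sided_tail</code> is made up for this example.</p>

```python
from math import erf, sqrt

def two_sided_tail(k):
    """P(|Z| > k) for a standard normal variable Z (hypothetical helper)."""
    return 1.0 - erf(k / sqrt(2.0))

n_faces = 100
for k in (2, 3):
    p_tail = two_sided_tail(k)
    # Expected number of faces beyond k sigma on a fair die.
    expected = n_faces * p_tail
    # Chance that at least one face is beyond k sigma,
    # assuming the faces fluctuate independently.
    p_any = 1.0 - (1.0 - p_tail) ** n_faces
    print(f"{k} sigma: tail={p_tail:.4f}, "
          f"expected faces={expected:.2f}, P(at least one)={p_any:.3f}")
```

<p>Because no face was named before the experiment, the relevant probability is the "at least one face" figure, not the single-face tail probability.</p>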

In [12]:
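n_rolls = 100000
res = [roll_fair(100) for x in range(n_rolls)]
# Bin edges at 0.5, 1.5, ..., 100.5 so each face 1..100 gets its own unit-width bin
occ, edges = np.histogram(res, bins=np.arange(0.5, 101.5), density=True)
# Bin centers are the face values themselves
bins = edges[:-1] + 0.5
# Binomial standard error on each observed probability
err = np.sqrt((1 - occ) * occ / n_rolls)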

data = [
    go.Scatter(
        name="100,000 Rolls",
        x=bins,
        y=occ,
        mode = 'markers',
        marker=dict(
            color='black',
        ),
        error_y=dict(
            type='data',
            array=err,
            visible=True,
            color='black'
        )
    ),
    go.Bar(
        name="Expectation Value",
        x=list(range(1,101)),
        y=[.01]*100,
        marker=dict(
            color='#990000'
        )
    )   
]

layout = go.Layout(
    title='100 Faced Dice Experiment 2',
    xaxis=dict(
        title=dict(
            text='Roll Result',
            font=dict(
                size=14,
                color='black'
            )
        )
    ),
    yaxis=dict(
        title=dict(
            text='Probability',
            font=dict(
                size=14,
                color='black'
            )
        )
    )
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='dice_pmf')
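
<p>The replication check can also be done without plots by comparing which faces look anomalous in two independent runs.  The following is a minimal sketch, not the notebook's own method: <code>flagged_faces</code>, the 2-sigma default, and the seed are all assumptions made for illustration.</p>

```python
import numpy as np

def flagged_faces(rng, n_rolls=100_000, n_face=100, k=2.0):
    """Roll a fair die n_rolls times and return the set of faces whose
    frequency is more than k standard errors from 1/n_face.

    Hypothetical helper for illustration only.
    """
    rolls = rng.integers(1, n_face + 1, size=n_rolls)
    counts = np.bincount(rolls, minlength=n_face + 1)[1:]
    p_exp = 1.0 / n_face
    sigma = np.sqrt(p_exp * (1 - p_exp) / n_rolls)
    z = (counts / n_rolls - p_exp) / sigma
    return set((np.flatnonzero(np.abs(z) > k) + 1).tolist())

rng = np.random.default_rng(1)  # arbitrary seed
first = flagged_faces(rng)      # "discovery" experiment
second = flagged_faces(rng)     # independent replication

print("flagged in experiment 1:", sorted(first))
print("flagged in experiment 2:", sorted(second))
print("flagged in both:        ", sorted(first & second))
```

<p>On a fair die, a face flagged in the first run has only the same small chance as any other face of being flagged again, so the overlap between the two sets should usually be small or empty.</p>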

<h2>The results don't hold up</h2>
<p>In this trial, the value of 12 has a probability of 0.00968 +/- 0.000309 => consistent with the expected value.  If this