# p-hacking

This article aims to build a clearer understanding of what p-values are and why they are (and are not) evil. In particular, I want to address the idea that the p < 0.05 threshold was never meant to say anything about one specific experiment. According to Neyman and Pearson (see Statistics Done Wrong), the goal was to come up with a procedure that would guarantee that, in the long run, false positives are kept at or below a chosen rate. For us, this means that the normal process of science (using p < 0.05 for significance testing) accepts a 5% false positive rate, i.e., when there is no real effect, we are fine with about 5% of tests appearing to show one anyway.

In [1]:
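def factorial(n):
    """Calculates the factorial of a non-negative integer `n` (0! is 1)
    """
    if n < 0:
        raise ValueError("factorial is undefined for negative n")

    prod = 1
    for val in range(1, n + 1):
        prod *= val

    return prod


def n_choose_k(n, k):
    """Calculates the binomial coefficient
    """
    # Integer division keeps the result exact; `/` would return a float
    # and lose precision for large `n`.
    return factorial(n) // (factorial(k) * factorial(n - k))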


def binom_prob(n, k, p):
    """Returns the probability of seeing exactly `k` heads in `n` coin tosses
    
    Arguments:
    
    n - number of trials
    k - number of trials in which an event took place
    p - probability of an event happening
    
    """
    return n_choose_k(n, k) * p**k * (1 - p)**(n - k)
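
For reference, the formula that `binom_prob` implements, and the one-sided tail sum that `p_value` computes from it, can be written as follows. The symbol $X$ (the number of events observed in $n$ trials) is introduced here for illustration and does not appear in the code.

$$
P(X = k) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}, \qquad \binom{n}{k} = \frac{n!}{k!\,(n - k)!}
$$

$$
\text{p-value} = P(X \ge k) = \sum_{i=k}^{n} \binom{n}{i}\, p^{i} (1 - p)^{n - i}
$$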

In [2]:
def p_value(n, k, p):
    """Returns the one-sided p-value: the probability of seeing `k` or more
    events in `n` trials when each event has probability `p`
    """
    return sum(binom_prob(n, i, p) for i in range(k, n+1))

print("P-value: %0.1f%%" % (p_value(30, 22, 0.5) * 100))

P-value: 0.8%
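
As a sanity check on the hand-rolled helpers, the same tail probability can be computed with the standard library. This is a minimal sketch, assuming Python 3.8+ (for `math.comb`) and a fair coin; `tail_prob_fair_coin` is a hypothetical helper name, not part of the cells above.

```python
from math import comb  # standard library, Python 3.8+


def tail_prob_fair_coin(n, k):
    """Probability of `k` or more heads in `n` tosses of a fair coin.

    Hypothetical helper for illustration. Every sequence of `n` tosses has
    probability 1 / 2**n, so we count the favourable sequences exactly
    and divide once at the end.
    """
    favourable = sum(comb(n, i) for i in range(k, n + 1))
    return favourable / 2**n


p = tail_prob_fair_coin(30, 22)
print("P-value: %0.1f%%" % (p * 100))
```

Because the counting is done in integers, this version only rounds at the final division, so it should agree with `p_value(30, 22, 0.5)` to within floating-point error.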


In [3]:
# This experiment repeats a 100-toss trial of a fair coin 100 times and counts
# how often we get a false positive (i.e., a one-sided p < 0.05, which would
# lead us to state that the coin is biased toward heads)
import random
n = 100        # tosses per trial
n_tests = 100  # number of trials
s = [sum(random.choice([0, 1]) for _ in range(n)) for _ in range(n_tests)]
sum(p_value(n, k, 0.5) < 0.05 for k in s)

3
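
The simulated count changes from run to run, so it can help to compute the false positive rate of this test exactly. The sketch below assumes the same one-sided test of a fair coin at a 0.05 threshold. `upper_tail` and `exact_false_positive_rate` are hypothetical helper names, and `math.comb` requires Python 3.8+.

```python
from math import comb  # standard library, Python 3.8+


def upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def exact_false_positive_rate(n, alpha=0.05):
    """Return (critical_k, rate) for a one-sided test of a fair coin.

    critical_k is the smallest number of heads whose one-sided p-value is
    below `alpha`; rate is the probability that a fair coin produces at
    least that many heads, i.e. the true false positive rate of the test.
    Returns (None, 0.0) if no head count is significant for this `n`.
    """
    for k in range(n + 1):
        tail = upper_tail(n, k)
        if tail < alpha:
            return k, tail
    return None, 0.0


critical_k, rate = exact_false_positive_rate(100)
print("Reject at %d or more heads; false positive rate: %0.1f%%"
      % (critical_k, rate * 100))
```

Under these assumptions the test rejects at 59 or more heads, which a fair coin produces about 4.4% of the time. The rate sits below 5% because the number of heads is discrete, so the threshold cannot be hit exactly. A count of 3 false positives in 100 trials is consistent with that rate.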