<img width="300px" src="../images/learning-tree-logo.svg" alt="Learning Tree logo" />

# Statistical inference

Statistical inference is fairly intuitive---once you escape from all the different tests and their distributions.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import percentileofscore

We have a sample of 20 IQ scores. Is there evidence that this sample comes from a population with above-average intelligence (i.e. with a mean IQ above 100)?

In [None]:
iq_scores = [
    117.0,
    67.0,
    116.0,
    102.0,
    108.0,
    104.0,
    101.0,
    127.0,
    125.0,
    102.0,
    122.0,
    113.0,
    69.0,
    119.0,
    136.0,
    100.0,
    114.0,
    87.0,
    105.0,
    80.0,
]

Create a histogram of the values in `iq_scores`.

In [None]:
plt.hist(iq_scores, bins=6);

The majority of people in the sample certainly have above-average IQs, but could this just be a result of sampling error?

Calculate the mean IQ from `iq_scores`.

In [None]:
np.mean(iq_scores)

Above 100 (about 105.7)...but on its own this proves nothing.

Create 10,000 random (re)samples from `iq_scores` and calculate their means. Store the 10,000 means in `sample_means`.

In [None]:
sample_means = []

# Resample with replacement (same size as the original sample) and record each mean.
for _ in range(10_000):
    resample = np.random.choice(iq_scores, size=len(iq_scores), replace=True)
    sample_means.append(resample.mean())

`sample_means` is an estimate of the [sampling distribution](https://en.wikipedia.org/wiki/Sampling_distribution) of the mean.
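As a minimal sketch, the same resampling can also be done without an explicit loop. The seeded generator `rng`, the seed value, and the names `resamples`, `sample_means_vectorized` and `standard_error` are assumptions added here for illustration and reproducibility; they are not part of the notebook's own cells.

```python
import numpy as np

iq_scores = [117.0, 67.0, 116.0, 102.0, 108.0, 104.0, 101.0, 127.0, 125.0, 102.0,
             122.0, 113.0, 69.0, 119.0, 136.0, 100.0, 114.0, 87.0, 105.0, 80.0]

# Assumption: a fixed seed so the sketch gives the same numbers on every run.
rng = np.random.default_rng(42)

# Draw all 10,000 resamples at once: one row per resample, one column per score.
resamples = rng.choice(iq_scores, size=(10_000, len(iq_scores)), replace=True)

# The mean of each row is one resampled mean.
sample_means_vectorized = resamples.mean(axis=1)

# The spread of the resampled means estimates the standard error of the mean.
standard_error = sample_means_vectorized.std()
print(sample_means_vectorized.shape, round(standard_error, 2))
```

The spread should come out at roughly 4 IQ points, which gives a sense of how far a sample mean can drift through sampling error alone.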

Visualise the sampling distribution by creating a histogram from `sample_means`.

In [None]:
plt.hist(sample_means, bins=50)
plt.axvline(x=100, color="red")  # population average; spans the full plot height
plt.show()

How unlikely is an average IQ at or below 100, given this distribution?

Calculate the percentile of 100 using `sample_means`.

In [None]:
percentileofscore(sample_means, 100)

The percentile will vary from run to run, but it's around 8%. That's not small enough (a common cut-off is 5%) for us to be sure that an average IQ of 100 is incompatible with our data, so we can't conclude that our sample was taken from an above-average population.
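To make the percentile concrete, here is a minimal sketch of what it measures: the share of resampled means at or below 100. The seed, the vectorized resampling, the variable names and the choice of `kind="weak"` are assumptions made for this illustration; the cell above uses the default `kind="rank"`, which treats resampled means exactly equal to 100 slightly differently.

```python
import numpy as np
from scipy.stats import percentileofscore

iq_scores = [117.0, 67.0, 116.0, 102.0, 108.0, 104.0, 101.0, 127.0, 125.0, 102.0,
             122.0, 113.0, 69.0, 119.0, 136.0, 100.0, 114.0, 87.0, 105.0, 80.0]

# Assumption: a fixed seed so the numbers are reproducible.
rng = np.random.default_rng(0)
boot_means = rng.choice(iq_scores, size=(10_000, len(iq_scores)), replace=True).mean(axis=1)

# kind="weak" reports the percentage of values less than or equal to the score.
percentile_weak = percentileofscore(boot_means, 100, kind="weak")

# The same quantity computed by hand.
manual_share = 100 * np.mean(boot_means <= 100)

# 5th percentile of the resampled means, a one-sided lower bound for the mean.
fifth_percentile = np.percentile(boot_means, 5)

print(percentile_weak, manual_share, fifth_percentile)
```

If the 5th percentile of the resampled means sits below 100, that is another way of saying that a mean IQ of 100 has not been ruled out at the 5% level.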

## Takeaway

Statistical inference is conceptually quite straightforward.

The goal is to determine whether our data is compatible with our assumptions---in the presence of sampling error.
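The whole procedure can be wrapped into a small helper, sketched here under stated assumptions: `share_of_means_at_or_below` is a hypothetical name, not a function from NumPy or SciPy, and the `n_resamples` and `seed` parameters are illustrative choices.

```python
import numpy as np


def share_of_means_at_or_below(sample, value, n_resamples=10_000, seed=None):
    """Estimate the share of resampled means that are at or below `value`."""
    rng = np.random.default_rng(seed)
    data = np.asarray(sample, dtype=float)
    # One row per resample, each the same size as the original sample.
    resamples = rng.choice(data, size=(n_resamples, data.size), replace=True)
    return float(np.mean(resamples.mean(axis=1) <= value))


iq_scores = [117.0, 67.0, 116.0, 102.0, 108.0, 104.0, 101.0, 127.0, 125.0, 102.0,
             122.0, 113.0, 69.0, 119.0, 136.0, 100.0, 114.0, 87.0, 105.0, 80.0]

# Share of resampled means compatible with a mean IQ of 100 or less.
share_at_100 = share_of_means_at_or_below(iq_scores, 100, seed=0)
print(f"{share_at_100:.1%}")
```

A small share suggests the assumed value is hard to reconcile with the data; a larger one, as here, means sampling error alone could plausibly explain the gap.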