# T-tests and P-values

Notes: https://github.com/daviskregers/notes/blob/master/data-science/09-experimental-design-ml-in-real-world/03-t-tests-and-p-values.md

---

Let's say we're running an A/B test. We'll fabricate some data that randomly assigns order amounts from customers in sets A and B, with B being a little bit higher.

In [1]:
import numpy as np
from scipy import stats

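# Group A: mean order amount 25.0, standard deviation 5.0, 10,000 customers
A = np.random.normal(25.0, 5.0, 10000)
# Group B: same spread, but a mean of 26.0
B = np.random.normal(26.0, 5.0, 10000)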

stats.ttest_ind(A, B)

Ttest_indResult(statistic=-14.264919664050025, pvalue=6.084267160651872e-46)
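To see where a result like this comes from, here is a minimal sketch that recomputes the t-statistic and p-value by hand and compares them with `stats.ttest_ind`. It assumes the default `equal_var=True` behaviour (pooled variance). The seed and the names `pooled_var`, `t_manual` and `p_manual` are introduced here for illustration only, so the printed numbers will differ from the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility
A = rng.normal(25.0, 5.0, 10000)
B = rng.normal(26.0, 5.0, 10000)

n_a, n_b = len(A), len(B)

# Pooled sample variance (ttest_ind assumes equal variances by default)
pooled_var = ((n_a - 1) * A.var(ddof=1) + (n_b - 1) * B.var(ddof=1)) / (n_a + n_b - 2)

# t = difference in means divided by its standard error
t_manual = (A.mean() - B.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Two-sided p-value from Student's t distribution
p_manual = 2 * stats.t.sf(abs(t_manual), df=n_a + n_b - 2)

result = stats.ttest_ind(A, B)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```

The statistic is negative because A is passed first and its mean is lower than B's. Swapping the arguments flips the sign but leaves the p-value unchanged.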

In [2]:
# Both groups now come from the same distribution, so there is no real difference
A = np.random.normal(25.0, 5.0, 10000)
B = np.random.normal(25.0, 5.0, 10000)

stats.ttest_ind(A, B)

Ttest_indResult(statistic=-2.0194388671880903, pvalue=0.043454938496298404)

In [3]:
# Same distributions again, this time with 1,000,000 samples per group
A = np.random.normal(25.0, 5.0, 1000000)
B = np.random.normal(25.0, 5.0, 1000000)

stats.ttest_ind(A, B)

Ttest_indResult(statistic=0.8077442120487217, pvalue=0.41923794248440305)

In [4]:
# Sanity check: compare A with itself
stats.ttest_ind(A, A)

Ttest_indResult(statistic=0.0, pvalue=1.0)

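The first test shows a real difference: the t-statistic is large and negative (A's mean is lower than B's) and the p-value is extremely small, as it should be, since B was generated with a higher mean. In the other tests there is no real difference between the groups. Note that the second test still produced a p-value of about 0.043, which would pass a 0.05 threshold purely by chance. With 1,000,000 samples per group the p-value was about 0.42, and comparing A with itself gives a t-statistic of 0 and a p-value of 1.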
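To illustrate how two identical groups can still yield a low p-value, here is a rough sketch that repeats the "no real difference" experiment many times and counts how often the p-value falls below a threshold. The seed, `n_experiments`, `n_samples` and the 0.05 value of `alpha` are assumptions chosen for illustration, not values from the notebook above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

n_experiments = 2000  # hypothetical number of repeated tests
n_samples = 1000      # hypothetical sample size per group
alpha = 0.05          # assumed significance threshold

false_positives = 0
for _ in range(n_experiments):
    # Both groups are drawn from the same distribution
    a = rng.normal(25.0, 5.0, n_samples)
    b = rng.normal(25.0, 5.0, n_samples)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1

false_positive_rate = false_positives / n_experiments
print(f"Share of tests with p < {alpha}: {false_positive_rate:.3f}")
```

When there is no real difference, the share of "significant" results should land close to `alpha` itself, which is why a single p-value just under the threshold is weak evidence on its own.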

The significance threshold for the p-value is really just a judgment call. Since everything is a matter of probabilities, you can never say with certainty that an experiment's results are significant. But you can use the t-statistic and p-value as measures of significance, and watch how they trend as the experiment runs to see whether something real might be happening between the two groups.
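Watching these metrics as the experiment runs could look something like the sketch below, which re-runs the test on a growing prefix of the data. The seed, the `checkpoints` list and the `history` variable are hypothetical. A real experiment would feed in orders as they arrive rather than slicing pre-generated arrays.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility

# Simulated order amounts; B's mean is slightly higher, as in the first test
A = rng.normal(25.0, 5.0, 10000)
B = rng.normal(26.0, 5.0, 10000)

checkpoints = [100, 500, 1000, 5000, 10000]  # hypothetical points at which we look at the data
history = []
for n in checkpoints:
    # Test only the first n observations of each group
    result = stats.ttest_ind(A[:n], B[:n])
    history.append((n, result.statistic, result.pvalue))
    print(f"n={n:>6}  t={result.statistic:8.3f}  p={result.pvalue:.3g}")
```

With a real difference present, the p-value tends to shrink as more data comes in. Keep in mind that stopping at the first checkpoint where the p-value dips below the threshold raises the chance of a false positive, so this is better treated as a monitoring aid than as a stopping rule.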