# SciPy and Hypothesis Testing

We will cover the fundamental statistical concepts behind hypothesis testing with SciPy, which help us measure how confident we can be in our statistical results:

- Sample means and population means
- The Central Limit Theorem
- Why we use hypothesis tests
- What errors we can come across and how to classify them

Suppose you want to know the average height of an oak tree in your local park. On Monday, you measure 10 trees and get an average height of 32 ft. On Tuesday, you measure 12 different trees and reach an average height of 35 ft. On Wednesday, you measure the remaining 11 trees in the park, whose average height is 31 ft. Overall, the average height for all trees in your local park is 32.8 ft.
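The overall figure is a weighted average of the three daily means, with each day weighted by the number of trees measured that day. A minimal sketch of that arithmetic (the variable names are illustrative and not part of the original notebook):

```python
# Number of trees measured and average height (ft) for Monday, Tuesday, Wednesday
counts = [10, 12, 11]
daily_means = [32, 35, 31]

# Weight each daily mean by its number of trees, then divide by the total count
total_height = sum(n * m for n, m in zip(counts, daily_means))
total_trees = sum(counts)
overall_mean = total_height / total_trees

print("Overall mean height: {:.1f} ft".format(overall_mean))  # prints 32.8 ft
```

Simply averaging the three daily means would give a slightly different answer, because the days did not cover the same number of trees.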

The sets of measurements taken on Monday, Tuesday, and Wednesday are each called a sample. A sample is a subset of the entire population. The mean of each sample is the sample mean, and it is an estimate of the population mean.

For a population, the mean is a constant value no matter how many times it is recalculated. A sample mean, however, depends on exactly which members of the population we happened to select. From a sample mean, we can then estimate the mean of the population as a whole. There are many reasons we might use sampling, such as:

- We don't have data for the whole population.
- We have the whole population data, but it is so large that it is infeasible to analyze.
- We can provide meaningful answers to questions faster with sampling.

In [1]:
import numpy as np

pop = np.random.normal(loc=65, scale=3.5, size=300)
pop_mean = np.mean(pop)

print("Population Mean: {}".format(pop_mean))

Population Mean: 64.9178578874379


In [2]:
sample_1 = np.random.choice(pop, size=30, replace=False)
sample_2 = np.random.choice(pop, size=30, replace=False)
sample_3 = np.random.choice(pop, size=30, replace=False)
sample_4 = np.random.choice(pop, size=30, replace=False)
sample_5 = np.random.choice(pop, size=30, replace=False)

In [3]:
sample_1_mean = np.mean(sample_1)
print("Sample 1 Mean: {}".format(sample_1_mean))

Sample 1 Mean: 65.023517007708


In [4]:
sample_2_mean = np.mean(sample_2)
print("Sample 2 Mean: {}".format(sample_2_mean))

Sample 2 Mean: 65.17276992093296


In [5]:
sample_3_mean = np.mean(sample_3)
print("Sample 3 Mean: {}".format(sample_3_mean))

Sample 3 Mean: 65.716104789324


In [6]:
sample_4_mean = np.mean(sample_4)
print("Sample 4 Mean: {}".format(sample_4_mean))

Sample 4 Mean: 65.50496903111885


In [7]:
sample_5_mean = np.mean(sample_5)
print("Sample 5 Mean: {}".format(sample_5_mean))

Sample 5 Mean: 64.24488107974511


## Central Limit Theorem

If our sample selection is poor, we end up with a sample that is skewed relative to the entire population, and the sample mean will differ noticeably from the population mean.

To mitigate the risk of a skewed sample mean, take a larger sample. The mean of a larger sample tends to approximate the population mean more closely. The `Central Limit Theorem` makes this precise: if the sample size is large enough, the sample means are approximately normally distributed around the population mean, and their spread shrinks as the sample size grows (the standard deviation of the sample mean is the population standard deviation divided by the square root of the sample size).
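One way to see the effect of sample size is to simulate it. The sketch below is an illustration rather than part of the original notebook: it assumes a normal population with mean 65 and standard deviation 3.5, draws many samples at each of three sizes, and compares the observed spread of the sample means with the theoretical value σ/√n. The seed, the number of repeats, and all variable names are arbitrary choices.

```python
import numpy as np

# Seeded generator so the sketch is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(0)

true_mean, true_std = 65, 3.5   # assumed population parameters
n_repeats = 2000                # how many samples to draw per sample size

spread_by_size = {}
for sample_size in (10, 100, 1000):
    # Draw n_repeats independent samples and take the mean of each one
    samples = rng.normal(loc=true_mean, scale=true_std, size=(n_repeats, sample_size))
    sample_means = samples.mean(axis=1)

    # Compare the observed spread of the sample means with sigma / sqrt(n)
    observed = sample_means.std()
    predicted = true_std / np.sqrt(sample_size)
    spread_by_size[sample_size] = (observed, predicted)
    print("n={:>4}: observed spread {:.3f}, predicted {:.3f}".format(
        sample_size, observed, predicted))
```

The observed spread should track the predicted one closely and shrink as the sample size grows, which is why larger samples give more reliable estimates of the population mean.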

In [8]:
# Create population and find population mean
population = np.random.normal(loc=65, scale=100, size=3000)
population_mean = np.mean(population)

In [9]:
# Select increasingly large samples (slicing works here because the population values were generated in random order)
extra_small_sample = population[:10]
small_sample = population[:50]
medium_sample = population[:100]
large_sample = population[:500]
extra_large_sample = population[:1000]

In [10]:
# Calculate the mean of those samples
extra_small_sample_mean = np.mean(extra_small_sample)
small_sample_mean = np.mean(small_sample)
medium_sample_mean = np.mean(medium_sample)
large_sample_mean = np.mean(large_sample)
extra_large_sample_mean = np.mean(extra_large_sample)

In [11]:
# Print them all out!
print("Extra Small Sample Mean: {}".format(extra_small_sample_mean))
print("Small Sample Mean: {}".format(small_sample_mean))
print("Medium Sample Mean: {}".format(medium_sample_mean))
print("Large Sample Mean: {}".format(large_sample_mean))
print("Extra Large Sample Mean: {}".format(extra_large_sample_mean))

print("\nPopulation Mean: {}".format(population_mean))

Extra Small Sample Mean: 56.663163209773245
Small Sample Mean: 72.48033183195949
Medium Sample Mean: 61.64815116445001
Large Sample Mean: 58.20599320978539
Extra Large Sample Mean: 61.088390025501965

Population Mean: 60.30633786052552


### Hypothesis Tests

Differences observed between samples may be due to random chance rather than to a real difference between the underlying populations.

Suppose we want to know if men are more likely to sign up for a given programming class than women. We invite 100 men and 100 women to this class. After one week, 34 women sign up, and 39 men sign up. Is the difference real?

We have taken samples from two different populations, men and women, and computed a sign-up rate for each. We want to know if the difference that we observe between the samples reflects a difference between the populations. To formally answer this question, we need to re-frame it in terms of probability:

"If men and women have the same level of interest in this class, what is the probability of observing a difference at least as large as this one purely by chance?"

A more formal version is: "If the two population means are the same, what is the probability of observing a difference in the sample means at least as large as this one purely by chance?"

The assumption in each of these questions, that the populations are the same and the observed difference is the result of chance, is called the `null hypothesis`. Hypothesis testing is a mathematical way of deciding whether the observed data are unlikely enough under the `null hypothesis` that we can confidently reject it.
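As an illustration, the sign-up example can be run through a chi-square test of independence using `scipy.stats.chi2_contingency`. This is one reasonable choice of test for a 2x2 table of counts, not the only one, and the 0.05 threshold and variable names below are assumptions made for the sketch.

```python
from scipy.stats import chi2_contingency

# Rows: men, women. Columns: signed up, did not sign up (100 invited per group).
signups = [[39, 61],
           [34, 66]]

# Null hypothesis: the sign-up rate does not depend on the group
chi2, p_value, dof, expected = chi2_contingency(signups)

print("p-value: {:.3f}".format(p_value))

alpha = 0.05  # assumed significance threshold
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Not enough evidence to reject the null hypothesis")
```

For these counts the p-value comes out well above 0.05, so a gap of 39 versus 34 sign-ups is consistent with chance alone.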

#### Errors in statistical tests

**Type I**
A Type I error, or 'false positive', is finding a correlation between two items where there is none: the `null hypothesis` is rejected when in fact it is true.

For example, let's say you conduct an A/B test for an online store and conclude that interface B is significantly better than interface A at directing traffic to a checkout page. You have rejected the null hypothesis that there is no difference between the two interfaces. If in reality the two interfaces perform the same, your test has resulted in a 'false positive'.

**Type II**
A Type II error, or 'false negative', is failing to find a correlation between items that are actually related: the `null hypothesis` is not rejected even though it is false.

For example, with the A/B test situation, let's say that after the test, you concluded that there was no significant difference between interface A and interface B. If there actually is a difference in the population as a whole, your test has resulted in a false negative.
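A short simulation can make the Type I error rate concrete. In the sketch below, both groups are drawn from the same distribution, so the null hypothesis is true by construction and every rejection is a false positive. The use of `scipy.stats.ttest_ind`, the 0.05 threshold, the group size, and the seed are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)  # arbitrary seed for repeatability
alpha = 0.05                    # assumed significance threshold
n_trials = 2000

false_positives = 0
for _ in range(n_trials):
    # Both groups come from the same distribution, so the null hypothesis is true
    group_a = rng.normal(loc=65, scale=3.5, size=30)
    group_b = rng.normal(loc=65, scale=3.5, size=30)
    p_value = ttest_ind(group_a, group_b).pvalue
    if p_value < alpha:
        false_positives += 1  # rejected a true null hypothesis: Type I error

type_i_rate = false_positives / n_trials
print("Observed Type I error rate: {:.3f}".format(type_i_rate))
```

The observed rate should land near the chosen threshold: a test run at a 0.05 significance level rejects a true null hypothesis about 5% of the time.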

In [12]:
def intersect(list1, list2):
    # Return the items of list1 that also appear in list2, keeping list1's order
    return [sample for sample in list1 if sample in list2]

In [13]:
# the true positives and negatives:
actual_positive = [2, 5, 6, 7, 8, 10, 18, 21, 24, 25, 29, 30, 32, 33, 38, 39, 42, 44, 45, 47]
actual_negative = [1, 3, 4, 9, 11, 12, 13, 14, 15, 16, 17, 19, 20, 22, 23, 26, 27, 28, 31, 34, 35, 36, 37, 40, 41, 43, 46, 48, 49]

In [14]:
# the positives and negatives we determine by running the experiment:
experimental_positive = [2, 4, 5, 7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19, 20, 21, 22, 24, 26, 27, 28, 32, 35, 36, 38, 39, 40, 45, 46, 49]
experimental_negative = [1, 3, 6, 12, 14, 23, 25, 29, 30, 31, 33, 34, 37, 41, 42, 43, 44, 47, 48]

In [15]:
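# Type I errors: the experiment says positive, but the item is actually negative
type_i_errors = intersect(experimental_positive, actual_negative)
# Type II errors: the experiment says negative, but the item is actually positive
type_ii_errors = intersect(experimental_negative, actual_positive)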

In [16]:
print(type_i_errors)
print(type_ii_errors)

[4, 9, 11, 13, 15, 16, 17, 19, 20, 22, 26, 27, 28, 35, 36, 40, 46, 49]
[6, 25, 29, 30, 33, 42, 44, 47]
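The two error lists can also be summarised as rates. The sketch below repeats the lists from the cells above so that it runs on its own, and uses Python sets for the intersections. The names `false_positive_rate` and `false_negative_rate` are illustrative additions, not part of the original notebook.

```python
actual_positive = [2, 5, 6, 7, 8, 10, 18, 21, 24, 25, 29, 30, 32, 33, 38, 39, 42, 44, 45, 47]
actual_negative = [1, 3, 4, 9, 11, 12, 13, 14, 15, 16, 17, 19, 20, 22, 23, 26, 27, 28, 31,
                   34, 35, 36, 37, 40, 41, 43, 46, 48, 49]
experimental_positive = [2, 4, 5, 7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19, 20, 21, 22, 24,
                         26, 27, 28, 32, 35, 36, 38, 39, 40, 45, 46, 49]
experimental_negative = [1, 3, 6, 12, 14, 23, 25, 29, 30, 31, 33, 34, 37, 41, 42, 43, 44,
                         47, 48]

# Set intersection finds the items that appear in both groups
false_positives = set(experimental_positive) & set(actual_negative)   # Type I errors
false_negatives = set(experimental_negative) & set(actual_positive)   # Type II errors

# Each rate is measured against the group in which that error could occur
false_positive_rate = len(false_positives) / len(actual_negative)
false_negative_rate = len(false_negatives) / len(actual_positive)

print("Type I errors: {} of {} actual negatives ({:.0%})".format(
    len(false_positives), len(actual_negative), false_positive_rate))
print("Type II errors: {} of {} actual positives ({:.0%})".format(
    len(false_negatives), len(actual_positive), false_negative_rate))
```

Here 18 of the 29 actual negatives were flagged as positive, and 8 of the 20 actual positives were missed.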
