# Hypothesis Testing, Statistical Analysis and Drawing Conclusions from Data

Conducting statistical tests and testing hypotheses are fundamental to analyzing data and drawing conclusions. The NumPy library is a particularly powerful tool for data manipulation, and we will see it come into play in some of the examples below.

In [17]:
import numpy as np

## Hypothesis Testing

When we analyze a dataset or run a study to test a given hypothesis, the outcome may lead us to revise, reject or retain that hypothesis. This helps us judge the acceptability of the hypothesis and of the theory from which it may be derived.

#### Step 1: The Hypotheses

All statistical tests attempt to choose between two views of the world. Specifically, the choice is between two views about how the data were generated. These two views are called hypotheses.

The null hypothesis. This is a clearly defined model about chances. It says that the data were generated at random under clearly specified assumptions about the randomness. The word "null" reinforces the idea that if the data look different from what the null hypothesis predicts, the difference is due to nothing but chance.

From a practical perspective, the null hypothesis is a hypothesis under which you can simulate data.

On the other hand, the alternative hypothesis states that some external factor (not chance) is responsible for the given data differing from the prediction made in the null hypothesis. 

###### Example

Let us consider the example of an experiment analyzing whether a given coin is fair or biased. On tossing this coin 2000 times, we get heads 950 times. We will aim to understand whether this is down to chance or not.

#### The Null Hypothesis: The coin is fair, and any variation from the expected result (1000 heads) is due only to random chance.

#### The Alternative Hypothesis: The coin is unfair, and something other than chance is causing the results.


We can use a NumPy array to represent a fair coin, with the two possible outcomes being heads and tails. We can use Python to make choices at random, and this is where NumPy comes into play. In NumPy, there is a sub-module called random that contains many functions that involve random selection. One of these functions is called choice. By default, it picks one item at random from an array, and it is equally likely to pick any of the items. Given a second argument, it makes that many picks, each independent of the others.

In [18]:
coin = np.array(['Heads', 'Tails'])

In [19]:
two_thousand_tosses = np.random.choice(coin, 2000) #equal chance of heads and tails

In [20]:
np.count_nonzero(two_thousand_tosses == 'Heads')

994

#### Step 2: The Test Statistic

In order to decide between the two hypotheses, we must choose a statistic that we can use to make the decision. This is called the test statistic. While a number of statistics may be appropriate in various situations, in our scenario exploring the fairness of the given coin we could simply use |the number of heads - 1000|. Large values of this statistic favor the alternative hypothesis, whether the coin lands heads too often or too rarely. We would then compare the observed value of this test statistic to the values we see under the null hypothesis.

#### Step 3: The Distribution of the Test Statistic, Under the Null Hypothesis 

The main computational aspect of testing hypotheses is figuring out what the values of the test statistic might be if the null hypothesis were true. The test statistic is simulated based on the assumptions of the model in the null hypothesis. That model involves chance, so the statistic comes out differently when you simulate it multiple times.

Let's use Python to create a simulation of tossing a fair coin 2000 times. To get a reliable picture of how the statistic varies, we need to repeat the 2000 coin tosses many times. To do this 10000 times, we can use iteration, which here means a for loop.



In [21]:
heads = np.array([]) #will collect our simulated values
for i in np.arange(10000):
    outcomes = np.random.choice(coin, 2000)
    num_heads = np.count_nonzero(outcomes == 'Heads')
    heads = np.append(heads, num_heads)

In [22]:
len(heads)

10000
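The loop above can also be replaced by a single vectorized call. The sketch below is an alternative, not the method used in this notebook: it relies on the fact that the number of heads in 2000 fair tosses follows a binomial distribution with n = 2000 and p = 0.5. The variable name `heads_vectorized` and the seed value are assumptions chosen for illustration.

```python
import numpy as np

# Assumption: a seeded Generator, used here only to make the run repeatable
rng = np.random.default_rng(seed=0)

# Each draw is the number of heads in 2000 tosses of a fair coin (p = 0.5);
# size=10000 repeats that experiment 10000 times in one call
heads_vectorized = rng.binomial(n=2000, p=0.5, size=10000)

print(len(heads_vectorized))    # number of simulated experiments
print(heads_vectorized.mean())  # should land close to 1000
```

This avoids growing an array with `np.append` on every iteration, which copies the array each time.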

#### Step 4: The Conclusion of the Test 

The choice between the null and alternative hypotheses depends on the comparison between what you computed in Steps 2 and 3: the observed value of the test statistic and its distribution as predicted by the null hypothesis. If the two are consistent with each other, then the observed test statistic is in line with what the null hypothesis predicts. If the data do not support the null hypothesis, we say that the test rejects the null hypothesis.

#### Deciding whether the observed value of the test statistic and its distribution under the null hypothesis are consistent (p-value, cutoffs)

In [23]:
observed_test_stat = abs(1000 - 950) #where 950 is the observed number of heads
observed_test_stat

50

In [24]:
test_statistic_simulation = abs(heads - 1000)
test_statistic_simulation

array([37., 23., 18., ..., 14., 25., 22.])

Next, we'll estimate the chance of getting at most 950 heads or at least 1050 heads, that is, a test statistic at least as large as the observed value of 50.

In [25]:
np.count_nonzero(test_statistic_simulation >= observed_test_stat)/10000 #10000 is the number of repetitions

0.0264

When we flip a fair coin 2000 times, the chance of getting heads 950 times or fewer, or 1050 times or more, is about 2.6% according to our 10000 simulations, which is below 5%. This chance is called the observed significance level of the test, also known as the P-value.

The P-value of a test is the chance, based on the model in the null hypothesis, that the test statistic will be equal to the observed value in the sample or even further in the direction that supports the alternative.

If a P-value is small, that means the tail beyond the observed statistic is small and so the observed statistic is far away from what the null predicts. This implies that the data support the alternative hypothesis better than they support the null.
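For the coin example, the size of that tail can also be computed exactly instead of simulated. The sketch below is a cross-check that is not part of the original notebook. It assumes the fair-coin model of the null hypothesis, under which all 2**2000 toss sequences are equally likely, and the variable names are illustrative.

```python
from math import comb

n_tosses = 2000
observed_distance = 50  # |950 - 1000|, the observed test statistic

# Count the equally likely toss sequences whose head count is at least
# 50 away from 1000, then divide by the total number of sequences (2**2000)
tail_sequences = sum(
    comb(n_tosses, k)
    for k in range(n_tosses + 1)
    if abs(k - n_tosses // 2) >= observed_distance
)
exact_p_value = tail_sequences / 2**n_tosses

print(exact_p_value)
```

The result should be close to the simulated value of 0.0264, with the small gap coming from the randomness of the simulation.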

##### Convention

If the P-value is less than 5%, it is considered small and the result is called "statistically significant."
If the P-value is even smaller – less than 1% – the result is called "highly statistically significant."

By this convention, our P-value of 2.64% is considered small and the result is indeed "statistically significant". Therefore, the test rejects the null hypothesis: the gap between the observed 950 heads and the expected 1000 heads is unlikely to be due to chance alone.
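The convention can be expressed as a short helper, shown here as a sketch. The function name `describe_significance` and the label "not statistically significant" for P-values of 5% or more are assumptions added for illustration.

```python
def describe_significance(p_value):
    """Label a P-value using the 5% and 1% conventions described above."""
    if p_value < 0.01:
        return "highly statistically significant"
    if p_value < 0.05:
        return "statistically significant"
    return "not statistically significant"

# 0.0264 is the simulated P-value computed earlier in the notebook
print(describe_significance(0.0264))  # prints: statistically significant
```

The 5% and 1% cutoffs are conventions, so a P-value just above or below a cutoff should be read with care.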