## Statistical Hypothesis Testing
### Null and Alternate Hypothesis
Statistical **Hypothesis Testing** means making an assumption (a hypothesis) about a population and then using sample data to decide whether the evidence contradicts that assumption. Every hypothesis test, regardless of the population and the parameters involved, follows the three steps below.
* Making an initial assumption.
* Collecting evidence (data).
* Based on the available evidence (data), deciding whether to reject or not reject the initial assumption.

The initial assumption is called the **Null Hypothesis (H-0)**, and its alternative (opposite) is called the **Alternate Hypothesis (H-a)**.

Two widely used approaches to **hypothesis testing** are
* Critical value approach
* p-value approach

The **Critical value** approach compares the observed test statistic to a cutoff value, called the **Critical Value**. If the test statistic is more extreme than the critical value (i.e. greater than the **Upper Critical Value** or less than the **Lower Critical Value**), then the null hypothesis is rejected in favor of the alternative hypothesis. If the test statistic is not as extreme as the critical value, then the null hypothesis is not rejected.
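
The critical value comparison can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function name `critical_value_decision`, the tail labels `"left"`, `"right"` and `"two"`, and the sample z values are assumptions made for this example, and the test statistic is assumed to follow a standard normal distribution.

```python
from scipy.stats import norm

def critical_value_decision(z_stat, alpha=0.05, tail="two"):
    """Return True if the null hypothesis is rejected under the critical value approach."""
    if tail == "left":
        # Reject when the statistic is below the Lower Critical Value
        return bool(z_stat < norm.ppf(alpha))
    if tail == "right":
        # Reject when the statistic is above the Upper Critical Value
        return bool(z_stat > norm.ppf(1 - alpha))
    if tail == "two":
        # Split alpha across both tails and compare the absolute value
        return bool(abs(z_stat) > norm.ppf(1 - alpha / 2))
    raise ValueError("tail must be 'left', 'right' or 'two'")

# Illustrative z values (assumed for this sketch)
print(critical_value_decision(-2.33, alpha=0.05, tail="left"))
print(critical_value_decision(1.0, alpha=0.05, tail="right"))
```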

The **p-value** approach involves determining the probability of observing a test statistic at least as extreme as the one actually observed, in the direction of the **Alternate Hypothesis**, assuming the null hypothesis is true.

If the **p-value** is less than (or equal to) **α (the chosen Level of Significance)**, then the null hypothesis **is rejected** in favor of the alternative hypothesis. If the p-value is greater than **α**, then the null hypothesis **is not rejected**.
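
The decision rule of the p-value approach is small enough to write as a single function. The sketch below assumes a helper name, `p_value_decision`, that does not come from any library, and returns the decision as plain text.

```python
def p_value_decision(p_value, alpha=0.05):
    """Apply the p-value rule: reject H-0 when p-value <= alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must lie between 0 and 1")
    return "reject H-0" if p_value <= alpha else "fail to reject H-0"

# Illustrative p-values (assumed for this sketch)
print(p_value_decision(0.01))
print(p_value_decision(0.20))
```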

### Z-Score and p-Value
In this section we define the **Z-Score** and the **p-Value** and show how they relate. In a subsequent section we will use the Z-Score and p-value along with the **Level of Confidence** or **Level of Significance** to test a hypothesis, i.e. to either Reject the Null Hypothesis (the Alternate Hypothesis is accepted as the new norm) or Fail to Reject the Null Hypothesis (the Null Hypothesis remains valid).

A **Z-Score** expresses a data value in units of standard deviation relative to the mean. It shows how far (**how many Standard Deviations**) a specific value of data is from the sample **Mean**.
The Z-Score is calculated by the formula

**z = (X - X-bar)/Std-dev**

where 

X = a Data Value

X-bar = Sample Mean
      
Std-dev = Standard Deviation of the sample
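
As a quick check of the formula, here is a minimal sketch in Python. The function name `z_score` is an assumption made for this example; the numbers are the ones used in the worked example later in this section (value 53, Mean 60, Standard Deviation 3).

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev

# A value of 53 against a mean of 60 and a standard deviation of 3
print(round(z_score(53, 60, 3), 2))
```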

The **p-value** of a Data Value is the probability of obtaining sample data at least as extreme as the data actually observed, assuming the Null Hypothesis is true.

The p-value of a z-score can be obtained from a Statistical Z-Table or using a Python Library function. Here we will use the Python Library function.

**p-value = stats.norm.cdf(z-score)**

However, depending on whether the value we are testing (for example, a class average of 53) lies below or above the currently known data (National Average = 60, Standard Deviation = 3), we may have to use a slightly different formula. To do that we need to learn the **Left Tail** and **Right Tail** tests.

### Left-Tail, Right-Tail and Two-Tail Tests of Hypothesis
If the data we are trying to test (53) is **less than** the **Mean** (60) we use the **Left Tail Test**. If the data (say the class average was 63 as opposed to 53) is **greater than** the **Mean** (60), we use the **Right Tail Test**.

For a **Right Tail Test** the formula for p-value (again using a Python Library function) is

**p-value =  1- stats.norm.cdf(z-score)**

***p-value for a z-score can be looked up from the Statistical Z-Table***
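
The two tail formulas can be wrapped in small helpers. This is a sketch under the assumption that the z-score follows a standard normal distribution; the function names are made up for this example. `norm.sf` is SciPy's survival function, which equals `1 - norm.cdf` and is used here for the right tail.

```python
from scipy.stats import norm

def left_tail_p_value(z):
    """Area to the left of z: probability of a z-score this low or lower."""
    return float(norm.cdf(z))

def right_tail_p_value(z):
    """Area to the right of z: probability of a z-score this high or higher."""
    return float(norm.sf(z))  # sf(z) is equal to 1 - cdf(z)

# z-scores from the class-average example in this section
print(round(left_tail_p_value(-2.33), 6))
print(round(right_tail_p_value(1.0), 6))
```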

#### An Example of Z-Score and p-value
Assume that we have the scores of a test in Business Analytics in a class of 100. The Mean of the sample (100 test scores) is 53. The National Average of the same test is 60 with a Standard Deviation of 3. We want to calculate the Z-score and p-value for this class sample (Average is 53) with respect to the National data (Average = 60, Standard Deviation = 3) to test our hypothesis: "the class score is similar to the National Average".

Here we will calculate the z-score and corresponding p-value for Case-1 where the **class average is 53** and Case-2 where the **class average is 63**
      

In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import scipy.stats as stats

# Example of a Left Tail Test
print('========== Example of a Left Tail Test ============')
# Case-1 where class score mean = 53
print('Class score mean = ', 53)
# Calculating the z-score of 53 with respect to the National Score (Mean = 60, S-Dev = 3)
zscore1 = round((53 - 60)/3, 2)
print('Zscore for mean class score (53) = ', zscore1)
# Since 53 is less than the national average 60 we will do the Left Tail Test
prob1 = round(stats.norm.cdf(zscore1), 6)
print('p-value for the mean class score (53) = ',  prob1)

# Example of a Right Tail Test
print('========== Example of a Right Tail Test ============')
# Case-2 where class score mean = 63
print('Class score mean = ', 63)
# Calculating the z-score of 63 with respect to the National Score (Mean = 60, S-Dev = 3)
zscore2 = round((63 - 60)/3, 2)
print('Zscore for mean class score (63) = ', zscore2)
# Since 63 is more than the national average 60 we will do the Right Tail Test
prob2 = round(1 - stats.norm.cdf(zscore2), 6)
print('p-value for the mean class score (63) = ',  prob2)

========== Example of a Left Tail Test ============
Class score mean =  53
Zscore for mean class score (53) =  -2.33
p-value for the mean class score (53) =  0.009903
========== Example of a Right Tail Test ============
Class score mean =  63
Zscore for mean class score (63) =  1.0
p-value for the mean class score (63) =  0.158655


### Level of Confidence and Level of Significance
Since the results of a statistical test are not **definite proof** of the conclusion, the results are always associated with a **Level of Confidence** or a **Level of Significance**. Normally we would strive for a high **Level of Confidence**, i.e. a low **Level of Significance**, when we are testing whether the Null Hypothesis should be retained or the Alternate Hypothesis should replace it.

Usually the **Level of Confidence (C)** used is 95% (0.95), 99% (0.99) etc. for the conclusions of a hypothesis test to be considered **"reliable"**. The **Level of Significance** is the complement of the Level of Confidence, i.e.

**Level of Significance = 1 - Level of Confidence** or S = 1- C. For Level of Confidence of 99% (0.99) the Level of Significance is 0.01 and for the Level of Confidence of 95% (0.95), the Level of Significance is 0.05.
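
The relation S = 1 - C is easy to tabulate. In the sketch below the function name `significance_level` is an assumption made for this example, and the result is rounded only to hide floating-point noise such as 0.050000000000000044.

```python
def significance_level(confidence):
    """Level of Significance S for a given Level of Confidence C (S = 1 - C)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return round(1 - confidence, 4)

for c in (0.90, 0.95, 0.98, 0.99):
    print("C =", c, " S =", significance_level(c))
```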

In the majority of hypothesis tests a Level of Significance of 0.05 is used. This threshold is denoted **α**. This section also calls it the **Critical Value α**, and the p-value (calculated in the previous step) is compared against it.

If the p-value is **less than** the **Critical Value α**, the test results are considered "highly significant". **Critical Value α = 0.01**, by the same token, is considered "very highly significant".

### Hypothesis Testing Using Z-Score, p-Value and Level of Significance
In a hypothesis test using the Z-Score and p-value, if the p-value is less than the **Critical Value α** (0.05 in our case), the test is considered statistically highly significant: the Null Hypothesis is rejected and the Alternate Hypothesis is accepted. Otherwise, we fail to reject the Null Hypothesis.

In our test case-1, where the mean class score is 53, the p-value is 0.009903, which is less than the Critical Value α (0.05). The Null Hypothesis, that the mean marks of the class are similar to the national average, is therefore **Rejected**.

In test case-2, where the mean class score is 63, the p-value is 0.158655, which is more than the Critical Value α (0.05). The Null Hypothesis, that the mean marks of the class are similar to the national average, is therefore **not Rejected (Retained)**.
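
The two decisions can be reproduced programmatically from the class means used in the worked cell earlier (53 and 63). This is a sketch: the helper name `evaluate_class_mean` and the constant names are assumptions for this example, and the z-score is rounded to two decimals only to match that cell.

```python
from scipy.stats import norm

NATIONAL_MEAN = 60  # national average from the example
NATIONAL_STD = 3    # national standard deviation from the example
ALPHA = 0.05        # Critical Value α used in this section

def evaluate_class_mean(class_mean):
    """Return (z, p_value, decision) for one class mean."""
    z = round((class_mean - NATIONAL_MEAN) / NATIONAL_STD, 2)
    # Left Tail Test below the national mean, Right Tail Test above it
    if class_mean < NATIONAL_MEAN:
        p_value = float(norm.cdf(z))
    else:
        p_value = float(1 - norm.cdf(z))
    decision = "reject H-0" if p_value < ALPHA else "fail to reject H-0"
    return z, p_value, decision

for mean in (53, 63):
    print(mean, evaluate_class_mean(mean))
```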

A Two-Tailed test can also be used in the above case, using the same concepts of Z-Score, p-value and α, the Level of Significance. We will discuss Hypothesis Testing in more detail in the **Descriptive Analytics** section.


### Getting p-value from z-score and z-score from p-value
We have already used **stats.norm.cdf(zscore1)** to get p-value from z-score

***p-value = stats.norm.cdf(zscore1)***

Now we will use stats.norm.ppf(p-value) to get z-score from p-value

***z-score = stats.norm.ppf(c-value), remembering, p-value = 1 - c-value***
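
Because `ppf` is the inverse of `cdf`, converting a z-score to a cumulative probability and back should return the original z-score. The short sketch below checks this round trip. The value 1.96 is just an illustrative choice.

```python
from scipy.stats import norm

z = 1.96                                 # illustrative z-score
c_value = float(norm.cdf(z))             # cumulative probability to the left of z
p_value = 1 - c_value                    # right-tail p-value, since p-value = 1 - c-value
z_recovered = float(norm.ppf(c_value))   # ppf undoes cdf

print("c-value =", round(c_value, 4))
print("p-value =", round(p_value, 4))
print("recovered z-score =", round(z_recovered, 4))
```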

Let us calculate z-score for the most commonly used **Confidence Levels (C)** of 90% (0.9), 95% (0.95), 98% (0.98) and 99% (0.99), i.e. the most commonly used **Significance Levels (S)** of 0.1, 0.05, 0.02 and 0.01 respectively

In [2]:
import scipy.stats as stats
from scipy.stats import norm

z_score_1 = stats.norm.ppf(0.9) # for C= 0.9 i.e. p = 0.1
print(z_score_1)
z_score_2 = stats.norm.ppf(0.95) # for C= 0.95 i.e. p = 0.05
print(z_score_2)
z_score_3 = stats.norm.ppf(0.98) # for C= 0.98 i.e. p = 0.02
print(z_score_3)
z_score_4 = stats.norm.ppf(0.99) # for C= 0.99 i.e. p = 0.01
print(z_score_4)
# For a 2-tail test α is split across both tails, so the z-score is looked up at 1 - α/2.
# For C = 0.90, 0.95, 0.98 and 0.99 the z-scores are (+-)1.645, 1.96, 2.33 and 2.576 respectively.
print("===================================================================")
z_score_5 = stats.norm.ppf(0.95) # for C= 0.90 i.e. p = 0.05 on each tail
print(z_score_5)
z_score_6 = stats.norm.ppf(0.975) # for C= 0.95 i.e. p = 0.025 on each tail
print(z_score_6)
z_score_7 = stats.norm.ppf(0.99) # for C= 0.98 i.e. p = 0.01 on each tail
print(z_score_7)
z_score_8 = stats.norm.ppf(0.995) # for C= 0.99 i.e. p = 0.005 on each tail
print(z_score_8)
z_score_9 = stats.norm.ppf(0.900) # for C= 0.80 i.e. p = 0.10 on each tail
print(z_score_9)

1.2815515655446004
1.6448536269514722
2.0537489106318225
2.3263478740408408
===================================================================
1.6448536269514722
1.959963984540054
2.3263478740408408
2.5758293035489004
1.2815515655446004
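
The α/2 calculation for a two-tail test can be made explicit. In the sketch below the function name `two_tailed_critical_z` is an assumption for this example. It takes a Confidence Level C, splits α = 1 - C equally between the two tails, and looks up the z-score that leaves α/2 in the upper tail.

```python
from scipy.stats import norm

def two_tailed_critical_z(confidence):
    """Positive critical z-score for a two-tail test at the given Confidence Level."""
    alpha = 1 - confidence
    return float(norm.ppf(1 - alpha / 2))  # leaves alpha/2 in each tail

for c in (0.90, 0.95, 0.98, 0.99):
    print("C =", c, " critical z = +-", round(two_tailed_critical_z(c), 3))
```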


In [3]:
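# The expression below has the form of the sample-size formula for a proportion,
# n = z^2 * p * (1 - p) / E^2, with the value stored in std_dev (0.5) playing the role of p
# and margin_of_error (taken here as 1 - confidence level) playing the role of E.
std_dev = 0.5
for x in [0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]:
    z_score = stats.norm.ppf(x)  # one-tailed z-score for confidence level x
    margin_of_error = round((1-x), 2)
    sample_size = round(round((z_score**2) * (std_dev * (1- std_dev)), 4)/round((margin_of_error**2), 4), 2)
    print('Confidence Level =', x, 'Margin of Error = ', margin_of_error, 'Z-Score = ', z_score, ' Standard Deviation = ', std_dev, 'Sample Size = ', sample_size)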
    

Confidence Level = 0.9 Margin of Error =  0.1 Z-Score =  1.2815515655446004  Standard Deviation =  0.5 Sample Size =  41.06
Confidence Level = 0.91 Margin of Error =  0.09 Z-Score =  1.3407550336902165  Standard Deviation =  0.5 Sample Size =  55.48
Confidence Level = 0.92 Margin of Error =  0.08 Z-Score =  1.4050715603096329  Standard Deviation =  0.5 Sample Size =  77.12
Confidence Level = 0.93 Margin of Error =  0.07 Z-Score =  1.475791028179171  Standard Deviation =  0.5 Sample Size =  111.12
Confidence Level = 0.94 Margin of Error =  0.06 Z-Score =  1.5547735945968535  Standard Deviation =  0.5 Sample Size =  167.86
Confidence Level = 0.95 Margin of Error =  0.05 Z-Score =  1.6448536269514722  Standard Deviation =  0.5 Sample Size =  270.56
Confidence Level = 0.96 Margin of Error =  0.04 Z-Score =  1.7506860712521692  Standard Deviation =  0.5 Sample Size =  478.88
Confidence Level = 0.97 Margin of Error =  0.03 Z-Score =  1.8807936081512509  Standard Deviation =  0.5 Sample Size 
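
The loop above has the shape of the common sample-size formula for estimating a proportion, n = z² · p · (1 - p) / E². The sketch below is a standalone version of that formula under assumptions that differ from the cell above: it uses the two-tailed z-score, treats the margin of error E as an independent input, and rounds up to a whole number of respondents. The function name and the default p = 0.5 are choices made for this example.

```python
import math
from scipy.stats import norm

def sample_size_for_proportion(confidence, margin_of_error, p=0.5):
    """Smallest n such that the margin of error for a proportion is at most margin_of_error."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    if margin_of_error <= 0:
        raise ValueError("margin_of_error must be positive")
    z = float(norm.ppf(1 - (1 - confidence) / 2))  # two-tailed z-score
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

# 95% confidence with a margin of error of 5 percentage points
print(sample_size_for_proportion(0.95, 0.05))
```

Using p = 0.5 gives the largest value of p · (1 - p), so it is the most conservative choice when the true proportion is unknown.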

### Example Scenarios of Different Types of Hypothesis Tests
#### Example - 1

***A company states that its machine makes straws that are 4 mm in diameter. A worker believes that the machine no longer makes straws of this size and samples 100 straws to perform a hypothesis test at a 99% Confidence Level. Write the null and alternate hypotheses and any other related data.***

                   H-0: µ = 4 mm H-a: µ != 4 mm n = 100, C = 0.99, Critical Value α = 1 - C = 0.01 

#### Example - 2
***Doctors believe that the average teen sleeps no longer than 10 hours per day. A researcher believes that teens sleep longer. Write the H-0 and H-a.***

                   H-0: µ <= 10   H-a: µ > 10
                   
#### Example - 3
***The school board claims that at least 60% of students bring a phone to school. A teacher believes this number is too high and randomly samples 25 students to test at a Significance Level of 0.02. Write the H-0, H-a and other related information.***

                  H-0: p >= 0.60  H-a: p < 0.60  n = 25  Critical Value α = 0.02   C = 1 - α = 1- 0.02 = 0.98 (98%)
                  
With the available information, it is possible to write the **null** and **alternate** hypotheses, but in these examples we do not have enough information to test them.

Recall the steps of a hypothesis test outlined above:

* Write the hypotheses H-0 and H-a
* Given µ and the standard deviation, calculate the z-score for the value to be tested using the formula z = (X-bar - µ)/Std-dev
* Calculate the cumulative probability of the z-score using the Python function stats.norm.cdf(z-score)
* Given the Significance Level, use it as the Critical Value α; or, given the Confidence Level C, calculate Critical Value α = 1 - C
* For a **Left Tail Test**, p-value = stats.norm.cdf(z-score)
* For a **Right Tail Test**, p-value = 1 - stats.norm.cdf(z-score)
* For a **Two Tail Test**, compare the smaller of the two tail probabilities with α/2
* If the calculated p-value is **less** than the Critical Value α (α/2 for a Two Tail Test), **reject** the Null Hypothesis, else **fail to reject** the Null Hypothesis
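
One way to read the steps above is as a single function. The sketch below is an illustration, not a library routine: the name `z_test` and the tail labels `"left"`, `"right"` and `"two"` are assumptions for this example. It follows this section's convention of dividing by the standard deviation, and for a Two Tail Test it compares the smaller tail probability with α/2. The usage line takes its numbers from Example 7 further below.

```python
from scipy.stats import norm

def z_test(x_bar, mu, std_dev, alpha=0.05, tail="left"):
    """Return (z, p_value, reject) following the steps listed in this section."""
    z = (x_bar - mu) / std_dev
    lower = float(norm.cdf(z))  # area to the left of z
    if tail == "left":
        p_value, threshold = lower, alpha
    elif tail == "right":
        p_value, threshold = 1 - lower, alpha
    elif tail == "two":
        # smaller of the two tail areas, compared with alpha / 2
        p_value, threshold = min(lower, 1 - lower), alpha / 2
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, p_value, p_value < threshold

# Example 7: H-0: µ >= 162.9, H-a: µ < 162.9, x-bar = 160.1, s-dev = 1.6, α = 0.05
print(z_test(160.1, 162.9, 1.6, alpha=0.05, tail="left"))
```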

***Note: If H-a has <, it is a Left Tail Test, if H-a has >, it is a Right Tail Test, if H-a has != it is a 2-Tail Test***
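
The note above maps directly onto a small lookup. In this sketch the function name `tail_from_alternative` and the plain-text format of the hypothesis (for example `"µ > 850"`) are assumptions made for illustration.

```python
def tail_from_alternative(h_a):
    """Infer the type of test from the comparison symbol in the alternate hypothesis."""
    # Check "!=" first so it is never mistaken for a one-sided symbol
    if "!=" in h_a:
        return "Two Tail Test"
    if "<" in h_a:
        return "Left Tail Test"
    if ">" in h_a:
        return "Right Tail Test"
    raise ValueError("H-a must contain one of: <, >, !=")

# Alternate hypotheses from Examples 1 to 3
for h_a in ("µ != 4", "µ > 10", "p < 0.60"):
    print(h_a, "->", tail_from_alternative(h_a))
```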

So, to be able to test the hypothesis we need µ (the hypothesized mean), x-bar (the sample mean to be tested), std-dev (the sample standard deviation), and the required Confidence Level or Significance Level.

In the next example we will go through these steps (assuming all the necessary information is given).

#### Example - 4
Records show that students on average score less than or equal to 850 on a test. A test prep company says that students who take their course will score higher than this. To test the claim, they sample 1000 students who score an average of 856 with a standard deviation of 98 after taking the course. At the 0.05 Significance Level, test the company's claim.

            H-0: µ <= 850  H-a: µ > 850  n = 1000  x-bar = 856  std-dev = 98  α = 0.05 (C = 0.95 or 95%)
       
Let's calculate the z-score and p-value to test the hypothesis. It is a **Right Tail Test**


In [5]:
import numpy as np
from scipy.stats import norm

x_bar = 856
µ = 850
s_dev = 98
z_score = (x_bar - µ)/s_dev
print("Z-score = ", z_score)
p_value = (1 - norm.cdf(z_score)) # since it is a Right Tail test
print("p-value = ", p_value)

Z-score =  0.061224489795918366
p-value =  0.4755902131389005


***Since the calculated p-value is greater than α (0.05), we fail to reject the null hypothesis, i.e. the company's claim is NOT Statistically Significant (the data do not support it at this significance level).***
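
The cell above divides by the sample standard deviation. A widely taught form of the z-test for a sample mean divides instead by the standard error, s/√n, which makes use of the sample size n = 1000 given in the problem. The sketch below shows that variant as an alternative under that assumption, not as the method used in this section. The function name is made up for this example. With these inputs the z-score is much larger and the p-value falls below 0.05, so the conclusion depends on which form is used.

```python
import math
from scipy.stats import norm

def right_tail_z_test_with_standard_error(x_bar, mu, s, n):
    """Right Tail z-test for a sample mean, standardizing by the standard error s / sqrt(n)."""
    standard_error = s / math.sqrt(n)
    z = (x_bar - mu) / standard_error
    p_value = float(norm.sf(z))  # area to the right of z
    return z, p_value

z_se, p_se = right_tail_z_test_with_standard_error(x_bar=856, mu=850, s=98, n=1000)
print("Z-score (standard error form) =", round(z_se, 3))
print("p-value (standard error form) =", round(p_se, 4))
```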

#### Example - 5
A newspaper reports that the average age a woman gets married is 25 years or  less. A researcher thinks that the average age is higher. He samples 213 women and gets an average of 25.4 years with standard deviation of 2.3 years. With 95% Confidence Level, test the researcher's claim.


        H-0: µ <= 25  H-a: µ > 25  n = 213  x-bar = 25.4  s-dev = 2.3  C = 95% = 0.95  α = 0.05

Let's calculate the z-score and p-value to test the hypothesis. It is a **Right Tail Test**

In [6]:
import numpy as np
from scipy.stats import norm

x_bar = 25.4
µ = 25
s_dev = 2.3
z_score = (x_bar - µ)/s_dev
print("Z-score = ",z_score)
p_value = (1 - norm.cdf(z_score)) # since it is a Right Tail test
print("p-value = ", p_value)

Z-score =  0.17391304347826025
p-value =  0.43096690081487876


***Since the calculated p-value is greater than α (0.05), we fail to reject the null hypothesis, i.e. the researcher's claim is NOT Statistically Significant (the data do not support it at this significance level).***

#### Example - 6
A study showed that on an average women in a city had 1.48 kids. A researcher believes that the number is wrong. He surveys 128 women in the city and finds that on an average these women had 1.39 kids with standard deviation of 0.84 kids. At 90% Confidence Level, test the claim.

    H-0: µ = 1.48   H-a: µ != 1.48   n = 128   x-bar = 1.39   s-dev = 0.84   C = 90% = 0.9   α = 0.10
    
    
Let's calculate the z-score and p-value to test the hypothesis. It is a **Two Tail Test**, so the p-value is compared with α/2 = (1 - C)/2 = 0.05.
    


In [7]:
import numpy as np
from scipy.stats import norm

x_bar = 1.39
µ = 1.48
s_dev = 0.84
z_score = (x_bar - µ)/s_dev
print("Z-score = ", z_score)
p_value = norm.cdf(z_score) # lower-tail probability, compared with α/2 since it is a Two Tail test
print("p-value = ",p_value)

Z-score =  -0.10714285714285725
p-value =  0.4573378238740764


***Since the calculated p-value is greater than α/2 (0.05), we fail to reject the null hypothesis, i.e. the researcher's claim is NOT Statistically Significant (the data do not support it at this significance level).***
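
Comparing one tail probability with α/2 is equivalent to doubling the smaller tail probability and comparing it with α. The sketch below computes that two-sided p-value for Example 6. The function name `two_sided_p_value` is an assumption for this example.

```python
from scipy.stats import norm

def two_sided_p_value(z):
    """Two-sided p-value: twice the smaller of the two tail areas."""
    return 2 * min(float(norm.cdf(z)), float(norm.sf(z)))

alpha = 0.10                            # C = 90%
z_score_6 = (1.39 - 1.48) / 0.84        # same z-score as in the cell above
p_two_sided = two_sided_p_value(z_score_6)

print("two-sided p-value =", round(p_two_sided, 4))
print("reject H-0:", p_two_sided < alpha)
```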

#### Example - 7
The government says the average weight of males is 162.9 pounds or greater. A researcher thinks this is too high. He does a study of 39 males and gets an average weight of 160.1 pounds with a standard deviation of 1.6 pounds. At 0.05 Significance Level, test the claim.

    H-0: µ >= 162.9   H-a: µ < 162.9   n = 39    x-bar = 160.1    s-dev = 1.6   α = 0.05

Let's calculate the z-score and p-value to test the hypothesis. It is a **Left Tail Test**

In [8]:
import numpy as np
from scipy.stats import norm

x_bar = 160.1
µ = 162.9
s_dev = 1.6
z_score = (x_bar - µ)/s_dev
print("Z-score = ", z_score)
p_value = norm.cdf(z_score) # since it is a Left Tail test
print("p-value = ",p_value)

Z-score =  -1.750000000000007
p-value =  0.040059156863816475


***Since the calculated p-value is less than α (0.05), we reject the null hypothesis, i.e. the researcher's claim is Statistically Significant (supported by the data at this significance level).***
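
The same example can be checked with the Critical value approach described at the start of this section. In the sketch below, which uses variable names chosen for this example, the lower critical z-score for α = 0.05 is obtained from `norm.ppf`, and the two approaches are expected to give the same decision.

```python
from scipy.stats import norm

alpha = 0.05
z_score_7 = (160.1 - 162.9) / 1.6        # same z-score as in the cell above
critical_z = float(norm.ppf(alpha))      # Lower Critical Value for a Left Tail Test
p_value_7 = float(norm.cdf(z_score_7))   # Left Tail p-value

reject_by_critical_value = z_score_7 < critical_z
reject_by_p_value = p_value_7 < alpha

print("critical z =", round(critical_z, 3))
print("reject by critical value:", reject_by_critical_value)
print("reject by p-value:", reject_by_p_value)
```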

