# Hypothesis Testing and Statistical Significance

This tutorial is a part of the [Zero to Data Analyst Bootcamp by Jovian](https://www.jovian.ai/data-analyst-bootcamp)

![](https://i.imgur.com/St2xfG8.jpg)


Hypothesis testing is a technique used by statisticians, scientists and data analysts for measuring whether the results of an experiment are meaningful and reliable. The statistical significance of the results of an experiment is often quantified using a P-value. This tutorial aims to build intuition for hypothesis testing using some real-world examples.




### How to Run the Code

The best way to learn the material is to execute the code and experiment with it yourself. This tutorial is an executable [Jupyter notebook](https://jupyter.org). You can _run_ this tutorial and experiment with the code examples in a couple of ways: *using free online resources* (recommended) or *on your computer*.

#### Option 1: Running using free online resources (1-click, recommended)

The easiest way to start executing the code is to click the **Run** button at the top of this page and select **Run on Binder**. You can also select "Run on Colab" or "Run on Kaggle", but you'll need to create an account on [Google Colab](https://colab.research.google.com) or [Kaggle](https://kaggle.com) to use these platforms.


#### Option 2: Running on your computer locally

To run the code on your computer locally, you'll need to set up [Python](https://www.python.org), download the notebook and install the required libraries. We recommend using the [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) distribution of Python. Click the **Run** button at the top of this page, select the **Run Locally** option, and follow the instructions.





## Problem Statement


Let's work through a real-world example to understand how statistical tests are performed:


> **QUESTION**: You're an analyst at the investment firm Capital Ventures, and you're evaluating the company [Jovian](https://www.jovian.ai) for a potential investment. The founders of Jovian claim that completing a data science bootcamp offered by Jovian  helps you land a data science job faster. 
>
> <img src="https://i.imgur.com/A22XaOy.png" width="480" style="border-radius:4px">
>
> A 2020 McKinley report suggests that candidates apply for an average of `37` data science job roles before getting hired. You've surveyed `42` Jovian bootcamp graduates who are now working in data science roles, and compiled data for the number of jobs each one applied to before getting hired: `31, 23, 19, 42, 37, 18, 7, 53, 33, 17, 27, 41, 36, 29, 60, 34, 21, 18, 45, 33, 16, 10, 48, 32, 19, 29, 40, 35, 28, 57, 25, 31, 19, 40, 37, 33, 38, 28, 40, 36, 42, 39`
>
> Is there a **statistically significant** decrease in the number of jobs candidates need to apply to before getting hired if they've completed a bootcamp offered by Jovian?




In [1]:
jobs_applied = [31, 23, 19, 42, 37, 18, 7, 53, 33, 17, 
                27, 41, 36, 29, 60, 34, 21, 18, 45, 33, 
                16, 10, 48, 32, 19, 29, 40, 35, 28, 57, 
                25, 31, 19, 40, 37, 33, 38, 28, 40, 36, 42, 39]

In [2]:
len(jobs_applied)

42

Let's compute the mean of `jobs_applied` and compare it with the industry average.

In [3]:
import numpy as np

In [4]:
np.mean(jobs_applied)

32.04761904761905

The number is in fact lower than the industry average of 37. However, is this number reliable? Do you feel confident about making an investment in Jovian?

Before you make up your mind, consider the following scenarios:

- Suppose the average of `jobs_applied` was `35.2`. Would you still believe the founders' claims? What about `36.3` or `36.9`? 
- Suppose only `18` Jovian graduates were surveyed, instead of `42`. Would the average be reliable then? What if you had surveyed `6` graduates? What if you had surveyed `75`?
- If you survey two groups of Jovian graduates, you are likely to get different observations and averages. Which set of observations can you rely on more?
- Even if the average for Jovian graduates is significantly lower than the industry average, can you say confidently that it's because of the bootcamp, and not some other factors?

In which of the above cases is it likely that the observations are the result of random chance rather than an actual improvement?

Clearly, we need a way to **quantify our confidence** in the results based on the data we've collected. That's exactly what hypothesis testing is: a statistical technique for validating claims using sample observations.

The goal of **hypothesis testing** is to determine the **statistical significance** of sample observations supporting a claim by quantifying how likely they are to occur due to random chance.

![](https://www.omniconvert.com/blog/wp-content/uploads/2015/01/Is-that-test-Statistically-Significant-.png)

## Null and Alternate Hypothesis

Before we can determine the validity of a claim or a **hypothesis**, it's important to state it as precisely as possible.

1. **The Null Hypothesis** refers to the currently accepted hypothesis (industry average). It assumes that the claim we're trying to validate is invalid and new observations are the result of random chance. It is often denoted by $H_0$.

2. **The Alternate Hypothesis** refers to the new claim that we wish to validate. It is often denoted by $H_1$. 

Here's how a statistical test works:

* The null hypothesis and the alternate hypothesis must be mathematical opposites. 
* We always start out by assuming that the null hypothesis is true i.e. nothing has changed. 
* A statistical test attempts to reject the null hypothesis, thereby validating the alternate hypothesis.
* It does so by computing the probability of getting the sample observations assuming the null hypothesis is true. 
* If the probability is low, then the null hypothesis can be rejected, otherwise we fail to reject the null hypothesis.


> **QUESTION**: State the null and alternate hypothesis for the problem stated above.


The null and alternate hypotheses are commonly stated in the terms of the averages:

1. **Null Hypothesis ($H_0$)**: Jovian bootcamp graduates need to apply for an average of _at least_ 37 data science jobs before getting hired.

2. **Alternate Hypothesis ($H_1$)**: Jovian bootcamp graduates need to apply for an average of _fewer than_ 37 data science jobs before getting hired.

Let $\mu$ (pronounced "myu") denote "the average number of data science jobs Jovian bootcamp graduates need to apply for before getting hired." We can now state the null and alternate hypotheses mathematically as follows:

$H_0: \mu \ge 37$

$H_1: \mu < 37$

Note how the two statements are mathematical opposites.

Let's save our work before continuing.

In [5]:
import jovian  # needed before calling jovian.commit()
jovian.commit()

## Computing the Test Statistic

Once the hypotheses are stated, we compute a "test statistic" using the sample observations, assuming that the null hypothesis is true. A commonly used test statistic is the "Z statistic", which has the following formula:

<img src="https://i.imgur.com/AUJX4qi.png" width="120">

where:

* $\overline{X}$ is the sample mean (computed using the observed values)
* $\mu$ is the population mean (stated in the null hypothesis)
* $\sigma$ is the population standard deviation (if unavailable, use sample standard deviation as an approximation)
* $n$ is the number of samples collected
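
For reference, here is the same formula written in LaTeX. It matches the expression used to compute `z_statistic` in the code later in this tutorial:

$$Z = \frac{\overline{X} - \mu}{\sigma / \sqrt{n}}$$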


The **Z statistic** calculates how far away the sample mean $\overline{X}$ (which comes from observed values) is from the distribution mean $\mu$ (which comes from the null hypothesis), assuming that the null hypothesis is true. 

Here's an intuitive explanation of how different parameters in the formula affect the Z statistic:

* **Sample mean and population mean**: As the sample mean $\overline{X}$ gets further away from the population mean $\mu$, the magnitude of the Z statistic increases. 


* **Standard deviation**: The difference in means is divided by the population standard deviation $\sigma$ to correct for the spread in the distribution. If a distribution has a high standard deviation, i.e. a greater spread, the sample mean needs to be quite far away from the population mean to produce a Z statistic with a large magnitude.


* **Sample size**: We also divide the standard deviation by $\sqrt{n}$. This effectively amounts to multiplying the difference in means by $\sqrt{n}$. This indicates that when we pick large samples, we can be more confident in our results, even if the difference in means is small.
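
To make the effect of sample size concrete, here's a minimal sketch that keeps the difference in means and the standard deviation fixed and varies only $n$. The sample mean `32.05` and standard deviation `11.70` are the rounded values for `jobs_applied` (the standard deviation is computed later in this tutorial), and the sample sizes are the hypothetical survey sizes mentioned earlier, so treat the numbers as illustrative.

```python
import numpy as np

# Rounded values from the running example (illustrative only)
sample_mean = 32.05
population_mean = 37
std = 11.70

# Z statistic for several hypothetical sample sizes
z_by_n = {}
for n in [6, 18, 42, 75]:
    z_by_n[n] = (sample_mean - population_mean) / (std / np.sqrt(n))
    print('n = {:>2}, Z = {:.2f}'.format(n, z_by_n[n]))
```

The magnitude of the Z statistic grows in proportion to $\sqrt{n}$, so the same difference in means looks more convincing when it comes from a larger sample.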


A Z statistic with a high magnitude indicates there's a very low chance that the observations were a result of random chance. Here's a graph showing the distribution of Z statistics for various samples drawn from a population. Each dot in the graph below shows the relative likelihood of a particular Z statistic.

<img src="https://i.imgur.com/JLtIq6O.png" width="480">

The distribution of Z values for various samples drawn from a population is _assumed_ to be a standard normal distribution. 
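
We can get a feel for this assumption with a small simulation. The sketch below is not part of the original analysis: it assumes a hypothetical population that is exponentially distributed with mean 37 (so it is clearly not normal), draws many samples of 42 observations each, and computes a Z statistic for every sample.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption for this sketch): exponentially
# distributed with mean 37, so its standard deviation is also 37.
pop_mean = 37
pop_std = 37
n = 42                # observations per sample, as in the survey
num_samples = 10_000  # number of simulated samples

samples = rng.exponential(scale=pop_mean, size=(num_samples, n))
z_values = (samples.mean(axis=1) - pop_mean) / (pop_std / np.sqrt(n))

share_within = np.mean(np.abs(z_values) < 1.96)
print('Mean of Z values: {:.3f}'.format(z_values.mean()))
print('Std of Z values: {:.3f}'.format(z_values.std()))
print('Share of Z values within 1.96 of zero: {:.3f}'.format(share_within))
```

If the standard normal assumption is reasonable, the mean should be close to 0, the standard deviation close to 1, and roughly 95% of the Z values should fall within 1.96 of zero.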

> **QUESTION**: Compute the Z-statistic for the problem stated above.


We can compute the Z-statistic step by step using the formula stated above:

<img src="https://i.imgur.com/AUJX4qi.png" width="120">

where:

* $\overline{X}$ is the sample mean (computed using the observed values)
* $\mu$ is the population mean (stated in the null hypothesis)
* $\sigma$ is the population standard deviation (if unavailable, use sample standard deviation as an approximation)
* $n$ is the number of samples collected





In [8]:
print(jobs_applied)

[31, 23, 19, 42, 37, 18, 7, 53, 33, 17, 27, 41, 36, 29, 60, 34, 21, 18, 45, 33, 16, 10, 48, 32, 19, 29, 40, 35, 28, 57, 25, 31, 19, 40, 37, 33, 38, 28, 40, 36, 42, 39]


Let's compute each term used in the formula:

In [9]:
population_mean = 37

sample_mean = np.mean(jobs_applied)
sample_std = np.std(jobs_applied) # population standard deviation is unavailable
sample_size = len(jobs_applied)

In [10]:
print('Sample mean: {:.2f}'.format(sample_mean))
print('Population mean:', population_mean)
print('Sample standard deviation: {:.2f}'.format(sample_std))
print('Sample size:', sample_size)

Sample mean: 32.05
Population mean: 37
Sample standard deviation: 11.70
Sample size: 42


We can now compute the Z statistic.

In [11]:
z_statistic = (sample_mean - population_mean) / (sample_std / np.sqrt(sample_size))

In [12]:
z_statistic

-2.743522669932238

Let's create a helper function to compute the Z statistic given a set of observations, a population mean and standard deviation.

In [13]:
def compute_z(observations, population_mean, std=None):
    sample_mean = np.mean(observations)
    if std is None:
        std = np.std(observations)
    sample_size = len(observations)
    z = (sample_mean - population_mean) / (std / np.sqrt(sample_size))
    return z

In [14]:
compute_z(jobs_applied, population_mean)

-2.743522669932238

As we might expect, the Z statistic is negative (`-2.74`), since the sample mean is lower than the population mean.
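
One detail worth knowing: `np.std` divides by $n$ by default (`ddof=0`), whereas the usual sample standard deviation divides by $n - 1$ (`ddof=1`). The sketch below is an optional variation, not part of the original computation, and shows how much this choice affects the Z statistic for our data.

```python
import numpy as np

jobs_applied = [31, 23, 19, 42, 37, 18, 7, 53, 33, 17,
                27, 41, 36, 29, 60, 34, 21, 18, 45, 33,
                16, 10, 48, 32, 19, 29, 40, 35, 28, 57,
                25, 31, 19, 40, 37, 33, 38, 28, 40, 36, 42, 39]
population_mean = 37
n = len(jobs_applied)

std_ddof0 = np.std(jobs_applied)          # divides by n (NumPy's default)
std_ddof1 = np.std(jobs_applied, ddof=1)  # divides by n - 1

z_ddof0 = (np.mean(jobs_applied) - population_mean) / (std_ddof0 / np.sqrt(n))
z_ddof1 = (np.mean(jobs_applied) - population_mean) / (std_ddof1 / np.sqrt(n))

print('ddof=0: std = {:.2f}, Z = {:.2f}'.format(std_ddof0, z_ddof0))
print('ddof=1: std = {:.2f}, Z = {:.2f}'.format(std_ddof1, z_ddof1))
```

With 42 observations the two versions differ only slightly, but the gap widens as the sample gets smaller.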

<img src="https://i.imgur.com/JLtIq6O.png" width="480">



Let's save our work before continuing.

In [15]:
jovian.commit()

## Left Tail, Right Tail and Two Tail Tests

The Z statistic can be used to compute the probability of getting the sample `jobs_applied` assuming the null hypothesis is true. If this probability is low enough, then we can reject the null hypothesis (and accept the alternate hypothesis).

In [16]:
print(jobs_applied)

[31, 23, 19, 42, 37, 18, 7, 53, 33, 17, 27, 41, 36, 29, 60, 34, 21, 18, 45, 33, 16, 10, 48, 32, 19, 29, 40, 35, 28, 57, 25, 31, 19, 40, 37, 33, 38, 28, 40, 36, 42, 39]


However, recall that the probability of getting any _specific_ set of observations from a continuous distribution is zero. Rather, we should be asking the question: what is the probability of getting a sample "equally or more extreme" than the observed data?

Or, in the context of the question we're solving: what is the probability of getting a sample of 42 graduates with a mean equal to or lower than the mean of `jobs_applied`?

To compute this probability, we need to determine the range of Z values that support the alternate hypothesis.

![](https://i.imgur.com/rtLYm3c.png)

Depending on the alternate hypothesis, a test can be left-tailed, right-tailed or two-tailed. The shaded region in the above graphs shows the range of Z values for samples "equal to or more extreme" than observed values. 

**QUESTION**: Is the problem stated above a left-tailed, right-tailed or two-tailed test?

Let's recall the two hypotheses and the Z statistic:

$H_0: \mu \ge 37$

$H_1: \mu < 37$

$Z = -2.74$

A "more extreme" sample would have a lower mean than the observed data, so the Z statistic would take a larger negative value. Thus, the test is left-tailed.

**EXERCISE**: Come up with two examples each of left tailed, right tailed and two tailed tests.

Let's save our work before continuing.

In [17]:
jovian.commit()

## Computing the $p$ value

The $p$ value for a statistical test is the probability of obtaining a sample “equally or more extreme” than the observed data, assuming that the null hypothesis is true.

Once we have a Z-score for the observations, and we know whether the test is left-tailed, right-tailed or two-tailed, the $p$ value is simply the area of the shaded region in the graph.

![](https://i.imgur.com/rtLYm3c.png)

It was common practice to use a Z table to estimate the area of a region. The table shows the area under the curve on the left side of a value for several positive and negative Z scores.

![](https://i.imgur.com/PPpG2wk.png)


We can use the `norm.cdf` function from the `scipy.stats` module of the SciPy Python library to compute this value without having to look up a Z table.

In [18]:
!pip install scipy --upgrade --quiet

In [19]:
from scipy.stats import norm

In [20]:
norm.cdf(-2.2)

0.013903447513498595

In [21]:
norm.cdf(1.3)

0.9031995154143897
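
A couple of quick checks can build intuition for how `norm.cdf` relates to areas under the standard normal curve. This is just an illustrative sketch; the value `1.96` is a commonly used Z score and not something specific to our problem.

```python
from scipy.stats import norm

z = 1.3
left_area = norm.cdf(z)        # area to the left of z
right_area = 1 - norm.cdf(z)   # area to the right of z
mirror_area = norm.cdf(-z)     # area to the left of -z

# The curve is symmetric around 0, so the area to the right of z
# equals the area to the left of -z.
print('Right of z: {:.4f}, left of -z: {:.4f}'.format(right_area, mirror_area))

# Area between two Z values is the difference of their cdf values.
between = norm.cdf(1.96) - norm.cdf(-1.96)
print('Area between -1.96 and 1.96: {:.4f}'.format(between))

# norm.ppf is the inverse lookup: it returns the Z value for a given left area.
z_recovered = norm.ppf(left_area)
print('Z value recovered from the area: {:.2f}'.format(z_recovered))
```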

Here's how we can compute the p-value for left-tailed, right tailed and two tailed tests:


* **Left tailed**: In this case, the Z statistic is negative, and the p-value is the area to the left of the observed Z statistic, so it can be computed simply as `norm.cdf(z)`




* **Right tailed**: In this case, the Z statistic is positive, and the p-value is the area to the right of the observed Z statistic, so it can be computed as `1 - norm.cdf(z)` (since the total area under the curve representing the probability of all possible z values is 1).


* **Two tailed**: In this case, we need to consider both the positive and negative values of the Z statistic. The p-value is the sum of the area to the left of the negative Z statistic and the area to the right of the positive Z statistic, so it can be computed as `norm.cdf(-z) + (1 - norm.cdf(z))` (where `z` indicates the absolute value of the Z statistic).


Let's create helper functions for the above cases.

In [22]:
from scipy.stats import norm

def p_left_tailed(z):
    return norm.cdf(z)

def p_right_tailed(z):
    return 1 - norm.cdf(z)

def p_two_tailed(z):
    z_abs = abs(z)
    return norm.cdf(-z_abs) + (1 - norm.cdf(z_abs))

> **QUESTION**: Compute the $p$ value for the problem stated above.

Here's a summary of the problem:

$H_0: \mu \ge 37$

$H_1: \mu < 37$

$Z = -2.74$

Since this is a left-tailed test, it follows that the `p-value` is the area to the left of the z statistic.

<img src="https://i.imgur.com/JLtIq6O.png" width="480">

In [23]:
z_statistic

-2.743522669932238

In [24]:
p_value = norm.cdf(z_statistic)

In [25]:
print('The p value of the observed data is {:.5f}'.format(p_value))

The p value of the observed data is 0.00304


Recall that the $p$ value for a statistical test is the probability of obtaining a sample “equally or more extreme” than the observed data, assuming that the null hypothesis is true.

Here are the null and alternate hypotheses for the problem restated in simple words:

1. **Null Hypothesis ($H_0$)**: Jovian bootcamp graduates need to apply for an average of _at least_ 37 data science jobs before getting hired.

2. **Alternate Hypothesis ($H_1$)**: Jovian bootcamp graduates need to apply for an average of _fewer than_ 37 data science jobs before getting hired.


Here's what our p value tells us:

If Jovian bootcamp graduates need to apply for an average of at least 37 data science jobs to get hired, then the probability of getting a sample with a mean equal to or lower than that of `jobs_applied` is 0.00304, or about 0.3%.

Do you think we can reject the null hypothesis based on this probability? Why or why not?


## Significance Level and Confidence Level

The $p$ value is sometimes called the **observed significance level** of the observations, and $1 - p$ is informally referred to as the **confidence level** of the observations. They are often stated as percentages. The confidence level gives a rough indication of how confident we feel about the validity of the claim we're testing (the alternate hypothesis).

In the above case, the confidence level of the claim made by Jovian's founders is 99.7%.

It's common practice to decide the significance level for rejecting the null hypothesis before the test is performed. If the p-value is lower than the decided significance level, we say that the results are statistically significant.

For example, if we decide on a significance level of 0.01 (or 99% confidence level), then the null hypothesis can be rejected if the $p$ value is lower than 0.01.

> **Interpretation of p value**: A p value lower than 0.01 means that, if the null hypothesis is true, the probability of getting data at least as extreme as the observed data by random chance is less than 1%.

Common choices for significance level are 0.1 (90% confidence), 0.05 (95% confidence) and 0.01 (99% confidence). 
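
An equivalent way to apply a significance level is to convert it into a critical Z value using `norm.ppf` (the inverse of `norm.cdf`) and compare the observed Z statistic against it. Here's a minimal sketch for our left-tailed test, using the rounded Z statistic `-2.74`; the variable names are our own.

```python
from scipy.stats import norm

z_statistic = -2.74  # rounded Z statistic for jobs_applied

decisions = {}
for alpha in [0.10, 0.05, 0.01]:
    z_critical = norm.ppf(alpha)  # Z value with an area of alpha to its left
    # Left-tailed test: reject H0 if the observed Z is below the critical value
    decisions[alpha] = z_statistic < z_critical
    print('alpha = {:.2f}, critical Z = {:.3f}, reject H0: {}'.format(
        alpha, z_critical, decisions[alpha]))
```

Comparing the Z statistic with the critical value always leads to the same decision as comparing the p-value with the significance level.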

<img src="https://i.imgur.com/gBzN0lF.png" width="480">

In [26]:
population_mean

37

In [27]:
print(jobs_applied)

[31, 23, 19, 42, 37, 18, 7, 53, 33, 17, 27, 41, 36, 29, 60, 34, 21, 18, 45, 33, 16, 10, 48, 32, 19, 29, 40, 35, 28, 57, 25, 31, 19, 40, 37, 33, 38, 28, 40, 36, 42, 39]


In [29]:
p_value

0.0030391925528473413

Now that you know about hypothesis testing, try to answer the following questions:


- Suppose the average of `jobs_applied` was `35.2`. Would you still believe the founders' claims? What about `36.3` or `36.9`? 
- Suppose only `18` Jovian graduates were surveyed, instead of `42`. Would the average be reliable then? What if you had surveyed `6` graduates? What if you had surveyed `75`?
- If you survey two groups of Jovian graduates, you are likely to get different observations and averages. Which set of observations can you rely on more?
- Even if the average for Jovian graduates is significantly lower than the industry average, can you say confidently that it's because of the bootcamp, and not some other factors?



Note that hypothesis testing assumes that a sample was drawn randomly from the population. By choosing a biased sample, you can "hack" the test to get any p value of your choice. Unfortunately, p-hacking is not an uncommon practice in research and industry. Before relying on p values, always ask the question:

![](https://i.imgur.com/dpVvPBJ.png)
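
To see how a biased sample can distort a test, here's a small simulation sketch. It assumes a hypothetical population in which the null hypothesis is actually true (normally distributed with mean 37 and standard deviation 11.7, both chosen purely for illustration), surveys 200 people, and compares a random sample of 42 with a cherry-picked sample of the 42 lowest responses.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical population in which H0 is true (assumed values)
population_mean, population_std = 37, 11.7
surveyed = rng.normal(population_mean, population_std, size=200)

def left_tailed_p(sample):
    """p-value of a left-tailed Z test against the hypothetical population."""
    z = (np.mean(sample) - population_mean) / (population_std / np.sqrt(len(sample)))
    return norm.cdf(z)

random_sample = rng.choice(surveyed, size=42, replace=False)  # unbiased selection
cherry_picked = np.sort(surveyed)[:42]                        # only the 42 lowest responses

p_random = left_tailed_p(random_sample)
p_cherry = left_tailed_p(cherry_picked)
print('p-value with a random sample: {:.4f}'.format(p_random))
print('p-value with a cherry-picked sample: {:.2e}'.format(p_cherry))
```

Even though nothing has changed in the population, the cherry-picked sample produces a tiny p-value, which is why the sampling procedure matters as much as the test itself.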

Let's save our work before continuing.

In [30]:
jovian.commit()

## Step-by-Step Process for Hypothesis Testing

Here's a summary of the steps involved in hypothesis testing:

### Step 1. State the null hypothesis $H_0$ and alternate hypothesis $H_1$ as mathematical relations. 

The hypotheses are generally stated in terms of an average/mean $\mu$.

$H_0: \mathord{?}$

$H_1: \mathord{?}$

Start by assuming the null hypothesis is true.


### Step 2. Compute the test Z-statistic 

<img src="https://i.imgur.com/AUJX4qi.png" width="120">

where:

* $\overline{X}$ is the sample mean (computed using the observed values)
* $\mu$ is the population mean (stated in the null hypothesis)
* $\sigma$ is the population standard deviation (if unavailable, use sample standard deviation as an approximation)
* $n$ is the number of samples collected


### Step 3. Identify whether the test is left tailed, right tailed or two-tailed.

![](https://i.imgur.com/rtLYm3c.png)

### Step 4. Compute the p-value using the Z-table or `norm.cdf`

**Note**: If the sample size is smaller than 30, use [a T-table](https://www.sjsu.edu/faculty/gerstman/StatPrimer/t-table.pdf) instead of a Z-table for more reliable results.

* **Left tailed**: In this case, the Z statistic is negative, and the p-value is the area to the left of the observed Z statistic, so it can be computed simply as `norm.cdf(z)`


* **Right tailed**: In this case, the Z statistic is positive, and the p-value is the area to the right of the observed Z statistic, so it can be computed as `1 - norm.cdf(z)` (since the total area under the curve representing the probability of all possible z values is 1).


* **Two tailed**: In this case, we need to consider both the positive and negative values of the Z statistic. The p-value is the sum of the area to the left of the negative Z statistic and the area to the right of the positive Z statistic, so it can be computed as `norm.cdf(-z) + (1 - norm.cdf(z))` (where `z` indicates the absolute value of the Z statistic).
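
For the small-sample case mentioned in the note above, the same steps can be carried out with the t distribution via `scipy.stats.t` instead of a T-table. The sketch below is only an illustration: it treats the first 18 values of `jobs_applied` as a hypothetical small sample, uses `ddof=1` for the sample standard deviation, and uses `n - 1` degrees of freedom.

```python
import numpy as np
from scipy.stats import norm, t

# Hypothetical small sample: the first 18 values of jobs_applied
small_sample = [31, 23, 19, 42, 37, 18, 7, 53, 33, 17,
                27, 41, 36, 29, 60, 34, 21, 18]
population_mean = 37

n = len(small_sample)
sample_std = np.std(small_sample, ddof=1)  # divides by n - 1
t_statistic = (np.mean(small_sample) - population_mean) / (sample_std / np.sqrt(n))

p_value_t = t.cdf(t_statistic, df=n - 1)  # left-tailed, t distribution
p_value_z = norm.cdf(t_statistic)         # same statistic, normal distribution

print('t statistic: {:.3f}'.format(t_statistic))
print('p-value (t distribution): {:.4f}'.format(p_value_t))
print('p-value (normal distribution): {:.4f}'.format(p_value_z))
```

Because the t distribution has heavier tails than the normal distribution, the t-based p-value is larger than the one obtained by looking up the same statistic in the normal distribution.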

<img src="https://i.imgur.com/gBzN0lF.png" width="480">

The results are **statistically significant** and the null hypothesis can be rejected if the p-value is lower than a predetermined significance level.
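
The steps above can be combined into a single helper. This is a sketch rather than a definitive implementation: the function name `z_test`, the `tail` argument and the default `alpha` are our own choices, and the standard deviation of the observations is used when no population standard deviation is supplied.

```python
import numpy as np
from scipy.stats import norm

def z_test(observations, population_mean, tail, std=None, alpha=0.05):
    """Return (z, p_value, reject_null) for a one-sample Z test.

    tail must be 'left', 'right' or 'two'.
    """
    n = len(observations)
    if std is None:
        std = np.std(observations)  # fall back to the observed standard deviation
    z = (np.mean(observations) - population_mean) / (std / np.sqrt(n))

    if tail == 'left':
        p_value = norm.cdf(z)
    elif tail == 'right':
        p_value = norm.cdf(-z)            # by symmetry, equal to 1 - norm.cdf(z)
    elif tail == 'two':
        p_value = 2 * norm.cdf(-abs(z))   # both tails have the same area
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")

    return z, p_value, bool(p_value < alpha)

jobs_applied = [31, 23, 19, 42, 37, 18, 7, 53, 33, 17,
                27, 41, 36, 29, 60, 34, 21, 18, 45, 33,
                16, 10, 48, 32, 19, 29, 40, 35, 28, 57,
                25, 31, 19, 40, 37, 33, 38, 28, 40, 36, 42, 39]

z, p_value, reject = z_test(jobs_applied, 37, tail='left', alpha=0.01)
print('Z = {:.2f}, p-value = {:.5f}, reject H0: {}'.format(z, p_value, reject))
```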

Let's save our work before continuing.

In [None]:
jovian.commit()


## Standard Deviations for Common Distributions

Determining the standard deviation $\sigma$ in the formula for the $Z$ statistic can be tricky sometimes.

<img src="https://i.imgur.com/AUJX4qi.png" width="120">

If we know the underlying distribution of the population from which a sample is drawn, it's possible to compute the mean $\mu$ and the standard deviation $\sigma$  of the population.


#### Bernoulli Distribution

There are two possible outcomes, "success" or "failure", and the probability of success is $p$.

$\mu = p$ (mean is simply the probability of success) 

$\sigma = \sqrt{p(1-p)}$

#### Binomial Distribution

The number of successful outcomes when an experiment is repeated $n$ times with a probability of success $p$.

$\mu = np$ (the average no. of successes)

$\sigma = \sqrt{np(1-p)}$

#### Uniform Discrete Distribution

There are $n$ equally likely outcomes.

$\mu = \textrm{average of outcomes}$

$\sigma = \sqrt{\frac{n^2 - 1}{12}}$ (when the outcomes are $n$ consecutive integers, such as the faces of a die)

#### Uniform Continuous Distribution

All values in the range $(a, b)$ are equally likely

$\mu = (a+b)/2$

$\sigma = (b-a) /\sqrt{12}$

#### Exponential Distribution

$\mu = 1 / \lambda$

$\sigma = 1 / \lambda = \mu$

where $\lambda$ is the parameter of the probability density function $f(x) = \lambda e^{-\lambda x}$.

#### Normal Distribution

A normal distribution is already defined in terms of its mean $\mu$ and standard deviation $\sigma$.

<img src="https://i.imgur.com/Pr60Ejg.png" width="240" style="margin-left:0">


#### All Other Cases


If you don't know the distribution of the population, or are unsure,  the standard deviation of the observed values can be used as an approximation for $\sigma$ in the formula for Z score.
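
The formulas above can be checked numerically against the distributions in `scipy.stats`. The parameter values below are arbitrary and chosen only for illustration; note that `randint(low, high)` covers the consecutive integers `low` to `high - 1`, and `uniform(loc, scale)` covers the range from `loc` to `loc + scale`.

```python
import numpy as np
from scipy import stats

# Arbitrary illustrative parameters
p, n = 0.3, 10     # probability of success, number of trials / outcomes
a, b = 2.0, 5.0    # range of the continuous uniform distribution
lam = 0.5          # rate parameter of the exponential distribution

# Each entry pairs SciPy's standard deviation with the formula from this section
checks = {
    'Bernoulli': (stats.bernoulli(p).std(), np.sqrt(p * (1 - p))),
    'Binomial': (stats.binom(n, p).std(), np.sqrt(n * p * (1 - p))),
    'Uniform discrete': (stats.randint(1, n + 1).std(), np.sqrt((n**2 - 1) / 12)),
    'Uniform continuous': (stats.uniform(a, b - a).std(), (b - a) / np.sqrt(12)),
    'Exponential': (stats.expon(scale=1 / lam).std(), 1 / lam),
}

for name, (from_scipy, from_formula) in checks.items():
    print('{:<20} scipy = {:.4f}, formula = {:.4f}'.format(name, from_scipy, from_formula))
```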

> **EXERCISE**: A coin is tossed 1000 times and results in 476 heads. Is the coin biased? Use a significance level of 0.01.



> **EXERCISE**: A principal at a certain school claims that the students in his school are above average intelligence. A random sample of thirty students IQ scores have a mean score of 112.5. Is there sufficient evidence to support the principal’s claim? The mean population IQ is 100 with a standard deviation of 15.

> **EXERCISE**: Blood glucose levels for obese patients have a mean of 100 with a standard deviation of 15. A researcher thinks that a diet high in raw cornstarch will have a positive or negative effect on blood glucose levels. A sample of 30 patients who have tried the raw cornstarch diet have a mean glucose level of 140. Test the hypothesis that the raw cornstarch had an effect.

> **EXERCISE**: A researcher believes that the mean age in India, listed as 21 on Wikipedia, is incorrect. He picks a random sample of 200 people and finds that they have a mean age of 21.7. The population standard deviation is 5. Test the researcher's hypothesis at a significance level of 0.05.

> **EXERCISE**: Jeffrey, as an eight-year old, established a mean time of 16.43 seconds for swimming the 25-yard freestyle, with a standard deviation of 0.8 seconds. His dad, Frank, thought that Jeffrey could swim the 25-yard freestyle faster using goggles. Frank bought Jeffrey a new pair of expensive goggles and timed Jeffrey for 15 25-yard freestyle swims. For the 15 swims, Jeffrey’s mean time was 16 seconds. Frank thought that the goggles helped Jeffrey to swim faster than the 16.43 seconds. Conduct a hypothesis test using a significance level of 0.05.

In [7]:
jovian.commit()

## Questions for Revision
1.	Define hypothesis.
2.	How does testing of hypothesis help an analyst in making a decision?
3.	How does one construct a Null hypothesis?
4.	Define the following terms: (i.) Small sample (ii.) Large sample
5.	What are the types of errors in testing of hypothesis? Explain each with an example.
6.	What is the relationship between α and β?
7.	What is a critical region?
8.	What is degree of freedom in testing of hypothesis?
9.	How are µ (mu) and x̄ (x-bar) related in hypothesis testing?
10.	What is the probability of getting any specific set of observations from a continuous distribution?
11.	What is the assumption we make for computing the p-value?
12.	If a sample has larger negative z statistic i.e. lower mean than the observed data, which tail is the test?
13.	How do you calculate p-value when the z-statistic is positive?
14.	In two-tailed condition, how do we calculate the p-value?
15.	What test can we conduct to check the equality of variances?
16.	What is standard error?
17.	In Z-test, when do we accept alternate hypothesis?
18.	When do we say that a result is statistically significant?
19.	What is p-hacking in testing of hypothesis?
20.	What are the four important steps in testing of hypothesis?
21.	Why is it important to check for normality of data before conducting a test?
22.	What problems does biased sample cause during hypothesis testing?

## Solutions for Exercises

In [3]:
import math
from scipy.stats import norm

> **EXERCISE**: A coin is tossed 1000 times and results in 476 heads. Is the coin biased? Use a significance level of 0.01.

**Solution:**
- $\mu = \text{Probability of getting heads when the coin is tossed}$
- Let's begin by writing down our null and alternate hypotheses:
    - $H_0: \text{The coin is unbiased, i.e. }(\mu = 0.5)$
    - $H_1: \text{The coin is biased, i.e. } (\mu \neq 0.5)$ 

We start by assuming that the null hypothesis $H_0$ is true.

In [4]:
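n = 1000  # sample size
sample_mean = 476/1000  # observed proportion of heads
pop_mean = 0.5  # probability of heads under the null hypothesis
std_dev = math.sqrt(0.5*0.5)  # Bernoulli standard deviation sqrt(p(1-p)) with p = 0.5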

Tossing a coin is a textbook example of a Bernoulli distribution, which is why we can directly use its formula for the standard deviation:

$\sigma = \sqrt{p(1-p)}$

Now, let's calculate the z-statistic:

In [5]:
z_statistic = (sample_mean - pop_mean) / (std_dev/math.sqrt(n)) 
z_statistic

-1.5178932768808235

To check what tailed test this is, let's look at the graph. 

Our Alternate Hypothesis $H_1$ is $\mu \neq 0.5$ so we are looking at a two tailed test.

![Imgur](https://i.imgur.com/MKl5Q9G.jpg)

The coin would be biased in either direction: whether it produces 476 heads (and 524 tails) or 476 tails (and 524 heads), the result is equally far from the expected 500.

Hence: **Two tailed test**

In [8]:
z = abs(z_statistic)

In [9]:
# Sum of the left-tail and right-tail areas
p_value = norm.cdf(-z) + (1-norm.cdf(z))
p_value

0.12904130509946768

So, if the null hypothesis is true, there is a 12.9% chance of getting a number of heads at least this far from 500.

Given the significance level of 0.01, we can call the coin biased only if a result this extreme has less than a 1% chance of occurring with a fair coin. Since the p-value is much higher than 1%, we cannot reject the null hypothesis.

Hence, we cannot say the coin is biased.
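
As an optional cross-check, we can compare the normal approximation with the exact binomial probability. This sketch uses SciPy's `binom.cdf` and relies on the fact that a fair coin makes the distribution symmetric around 500, so the two tails are equal; the variable names are our own.

```python
import math
from scipy.stats import binom, norm

n, heads, p = 1000, 476, 0.5

# Exact probability of a result at least as far from 500 as the one observed.
# With p = 0.5, P(X <= 476) equals P(X >= 524), so we double the left tail.
p_exact = 2 * binom.cdf(heads, n, p)

# Normal approximation, as computed in the solution above
z = (heads / n - p) / (math.sqrt(p * (1 - p)) / math.sqrt(n))
p_normal = 2 * norm.cdf(-abs(z))

print('Exact binomial p-value: {:.4f}'.format(p_exact))
print('Normal approximation p-value: {:.4f}'.format(p_normal))
```

Both values are well above the significance level of 0.01, so the conclusion is the same either way.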

> **EXERCISE**: A principal at a certain school claims that the students in his school are above average intelligence. A random sample of thirty students IQ scores have a mean score of 112.5. Is there sufficient evidence to support the principal’s claim? The mean population IQ is 100 with a standard deviation of 15.

**Solution:**
- $\mu = \text{The average IQ of the students from the school}$
- Let's begin by writing down our hypotheses:
    - $H_0: \text{The students' average IQ is at most the population average, i.e. }(\mu \le 100)$
    - $H_1: \text{The students have an above-average IQ, i.e. } (\mu > 100)$

In [2]:
n = 30  # Sample Size 
sample_mean = 112.5
pop_mean = 100
std_dev = 15

In [3]:
z_statistic = (sample_mean - pop_mean) / (std_dev/math.sqrt(n)) 
z_statistic

4.564354645876384

Now, let's identify whether this is a left-tailed or a right-tailed test. 

The Z score is 4.56. Our alternate hypothesis is that the average IQ is greater than 100. Since the alternate hypothesis looks for a value greater than the population mean, this is a right-tailed test, as the graph below shows.

![Imgur](https://i.imgur.com/tMGu7sj.jpg)

We can also say this because our Z statistic is positive, and we are interested in the probability of getting an equally or more extreme observation, i.e. a Z score of at least 4.56. This is the area under the curve to the right of $z = 4.56$.

Hence: **a right-tailed test**

Now, let's calculate the p-value:

In [5]:
p_value = 1 - norm.cdf(z_statistic)
p_value

2.50516597821715e-06

In [6]:
print('{:.7f}'.format(p_value)) 

0.0000025


Since the p-value is very small, it is very unlikely that we would observe a sample like this if the null hypothesis were true. 

The p-value gives us the probability of obtaining a sample equal to or more extreme than the observed data, assuming that the null hypothesis is true. Hence, the observations of the 30 sampled students are statistically significant at any of the common significance levels (0.1, 0.05 or 0.01), and we can reject the null hypothesis.

> **EXERCISE**: Blood glucose levels for obese patients have a mean of 100 with a standard deviation of 15. A researcher thinks that a diet high in raw cornstarch will have a positive or negative effect on blood glucose levels. A sample of 30 patients who have tried the raw cornstarch diet have a mean glucose level of 140. Test the hypothesis that the raw cornstarch had an effect.

**Solution:**

Let's begin by writing the null and alternate hypotheses:
- $\mu = \text{The average glucose level of obese patients on the raw cornstarch diet}$
    - $H_0: \text{The raw cornstarch does not have an effect, i.e. } \mu = 100$
    - $H_1: \text{The raw cornstarch had an effect (positive or negative), i.e. } (\mu > 100) \text{ or } (\mu < 100)$

In [17]:
n = 30
sample_mean = 140
pop_mean = 100
std_dev = 15

In [18]:
z_statistic = (sample_mean - pop_mean) / (std_dev/math.sqrt(n)) 
z_statistic

14.60593486680443

To check which tailed test this is, let's look at the graph.

Our alternate hypothesis $H_1$ is that $\mu \neq 100$, so we are looking at a two-tailed test.

![Imgur](https://i.imgur.com/MKl5Q9G.jpg)

The diet has either a positive effect, i.e. $\mu > 100$, or a negative effect, i.e. $\mu < 100$. In either case the value is not equal to 100.

Hence: **a two-tailed test**

In [19]:
z = abs(z_statistic)
p_value = norm.cdf(-z) + (1-norm.cdf(z))
p_value

1.2871333318488276e-48

In [22]:
print('{:.49f}'.format(p_value))

0.0000000000000000000000000000000000000000000000013


At a significance level of 0.01, the p-value is far below the threshold, so we reject the null hypothesis and conclude that our observations are statistically significant.
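One numerical caveat applies to the calculation above. For a z-score this large, `norm.cdf(z)` is indistinguishable from 1 in double-precision arithmetic, so the term `1 - norm.cdf(z)` evaluates to exactly 0 and the sum effectively contains only the lower tail. The sketch below illustrates the effect with the standard library only (`upper_tail` is a hypothetical helper name), and uses the symmetry of the normal distribution to compute the two-tailed value as twice one tail. With SciPy, the analogous call would be `2 * norm.sf(z)`.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, sample_mean, pop_mean, std_dev = 30, 140, 100, 15
z = abs((sample_mean - pop_mean) / (std_dev / math.sqrt(n)))

one_tail = upper_tail(z)               # accurate even far out in the tail
naive_upper = 1.0 - (1.0 - one_tail)   # mimics 1 - cdf(z): rounds to 0.0
two_tailed = 2 * one_tail              # symmetric two-tailed p-value

print("one tail:        ", one_tail)
print("naive upper tail:", naive_upper)
print("two-tailed:      ", two_tailed)
```

The conclusion does not change here, since either value is far below 0.01, but the factor of two can matter when a p-value sits close to the significance level.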

> **EXERCISE**: A researcher believes that the mean age in India, listed as 21 on Wikipedia, is incorrect. He picks a random sample of 200 people and finds that they have a mean age of 21.7. The population standard deviation is 5. Test the researcher's hypothesis at a significance level of 0.05.


**Solution:**

Let's begin by writing the null and alternate hypotheses:
- $\mu = \text{The average age in India}$
    - $H_0: \text{The average age in India is 21, i.e. } \mu = 21$
    - $H_1: \text{The average age in India is not 21, i.e. } \mu \neq 21$

In [14]:
n = 200
sample_mean = 21.7
pop_mean = 21
std_dev = 5

Let's calculate the z-statistic now. 

In [15]:
z_statistic = (sample_mean - pop_mean) / (std_dev/math.sqrt(n)) 
z_statistic

1.9798989873223312

To check which tailed test this is, let's look at the graph.

Our alternate hypothesis $H_1$ is $\mu \neq 21$, so we are looking at a two-tailed test.

![Imgur](https://i.imgur.com/MKl5Q9G.jpg)

Whether the average age is less than 21 or greater than 21, it is not equal to 21, so both tails count as evidence against the null hypothesis.

Hence: **a two-tailed test**

In [16]:
z = abs(z_statistic)
p_value = norm.cdf(-z) + (1-norm.cdf(z))
p_value

0.04771488023735143

Since the p-value (about 0.048) is lower than the given significance level of 0.05, we can reject the null hypothesis and say that the average age in India is not equal to 21.
This means that the observations made by the researcher are statistically significant.
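This result is close to the threshold, so the decision depends on the significance level chosen. As a rough illustration (not part of the original solution), the sketch below recomputes the two-tailed p-value with the standard library's `statistics.NormalDist` and compares it against 0.05, the level given in the exercise, and 0.01, a stricter level used here only for contrast.

```python
import math
from statistics import NormalDist

n, sample_mean, pop_mean, std_dev = 200, 21.7, 21, 5
z = (sample_mean - pop_mean) / (std_dev / math.sqrt(n))

# Two-tailed p-value: by symmetry, twice the area below -|z|
p_value = 2 * NormalDist().cdf(-abs(z))

# Compare the same p-value against two significance levels
decisions = {alpha: p_value < alpha for alpha in (0.05, 0.01)}
for alpha, reject in decisions.items():
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"alpha = {alpha}: p = {p_value:.4f} -> {verdict}")
```

At 0.05 the null hypothesis is rejected, while at 0.01 it is not, which is why the significance level should be fixed before looking at the data.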

In [None]:
jovian.commit(project="hypothesis-testing-solutions")

> **EXERCISE**: Jeffrey, as an eight-year old, established a mean time of 16.43 seconds for swimming the 25-yard freestyle, with a standard deviation of 0.8 seconds. His dad, Frank, thought that Jeffrey could swim the 25-yard freestyle faster using goggles. Frank bought Jeffrey a new pair of expensive goggles and timed Jeffrey for 15 25-yard freestyle swims. For the 15 swims, Jeffrey’s mean time was 16 seconds. Frank thought that the goggles helped Jeffrey to swim faster than the 16.43 seconds. Conduct a hypothesis test using a significance level of 0.05.

**Solution:**

Let's begin by writing the null and alternate hypotheses:
- $\mu = \text{The average time taken by Jeffrey to complete the 25-yard freestyle with goggles}$
    - $H_0: \text{The average time taken is at least 16.43 seconds, i.e. } \mu \ge 16.43$
    - $H_1: \text{The average time taken is less than 16.43 seconds, i.e. } \mu < 16.43$

In [23]:
n = 15
sample_mean = 16
pop_mean = 16.43
std_dev = 0.8

In [24]:
z_statistic = (sample_mean - pop_mean) / (std_dev/math.sqrt(n)) 
z_statistic

-2.081728548586485

To check which tailed test this is, let's look at the graph.

![Imgur](https://i.imgur.com/Hdr40ke.jpg)

The z-score is -2.08, and our alternate hypothesis is that the average time is less than 16.43 seconds. Hence we are looking for the probability of a sample mean of 16 seconds or lower, i.e. the area under the curve to the left of $z = -2.08$.

Hence: **a left-tailed test**



In [25]:
p_value = norm.cdf(z_statistic)
p_value

0.018683635713606015

The p-value is less than the specified significance level of 0.05. Hence we reject the null hypothesis and conclude that the swimming goggles helped Jeffrey reduce his swimming time.
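An equivalent way to reach the same decision is to compare the z-statistic with a critical value instead of comparing the p-value with the significance level. The sketch below is not part of the original solution. It assumes the standard library's `statistics.NormalDist` (Python 3.8+) and uses its `inv_cdf` method to find the cutoff for a left-tailed test at a significance level of 0.05.

```python
import math
from statistics import NormalDist

alpha = 0.05
n, sample_mean, pop_mean, std_dev = 15, 16, 16.43, 0.8

z_statistic = (sample_mean - pop_mean) / (std_dev / math.sqrt(n))

# Left-tailed test: the rejection region is everything below the
# alpha-quantile of the standard normal distribution (about -1.645)
z_critical = NormalDist().inv_cdf(alpha)

reject_null = z_statistic < z_critical
print(f"z = {z_statistic:.3f}, critical value = {z_critical:.3f}, "
      f"reject H0: {reject_null}")
```

Because the z-statistic falls below the critical value, it lies in the rejection region, which matches the p-value comparison above.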