<img src="materials/images/introduction-to-statistics-II-cover.png"/>


# 👋 Welcome, before you start
<br>

### 📚 Module overview

We will go through eleven lessons with you:
    
- [**Lesson 1: Z-score**](Lesson_1_Z-score.ipynb)

- <font color=#E98300>**Lesson 2: P-value**</font>    `📍You are here.`

- [**Lesson 3: Welch's T-test**](Lesson_3_Welchs_T-test.ipynb)

- [**Lesson 4: Log2 Fold Change**](Lesson_4_Log2_Fold_Change.ipynb)

- [**Lesson 5: Pearson Correlation**](Lesson_5_Pearson_Correlation.ipynb)

- [**Lesson 6: Spearman Correlation**](Lesson_6_Spearman_Correlation.ipynb)

- [**Lesson 7: False Discovery Rate**](Lesson_7_False_Discovery_Rate.ipynb)

- [**Lesson 8: Benjamini Hochberg**](Lesson_8_Benjamini_Hochberg.ipynb)

- [**Lesson 9: Dimensionality Reduction Methods: Principal Component Analysis**](Lesson_9_Dimensionality_Reduction_Methods_Principal_Component_Analysis.ipynb)

- [**Lesson 10: Dimensionality Reduction Methods: t-SNE**](Lesson_10_Dimensionality_Reduction_Methods_t-SNE.ipynb)

- [**Lesson 11: UMAP**](Lesson_11_UMAP.ipynb)
</br>



<div class="alert alert-block alert-info">
<h3>⌨️ Keyboard shortcuts</h3>

These common shortcuts could save you time as you go through this notebook:
- Run the current cell: **`Shift + Enter`**.
- Add a cell above the current cell: Press **`A`**.
- Add a cell below the current cell: Press **`B`**.
- Change a code cell to a Markdown cell: Select the cell, and then press **`M`**.
- Delete a cell: Press **`D`** twice.

Need more help with keyboard shortcuts? Press **`H`** to look them up.
</div>

---

# Lesson 2: P-value 

The term <mark>**p-value**</mark> is used when you want to test a hypothesis. Let's look at an example.

`🕒 This module should take about 15 minutes to complete.`

`✍️ This notebook is written using Python.`

---

<img src="materials/images/images_p-value/delivery_service.png"/>

Let's say that you believe that your food delivery service has gotten slower recently. You determine that previous delivery times averaged about 30 minutes, but you believe that deliveries have slowed significantly over the last month or so. 

You decide to test your hypothesis. Over the next month, you decide to time each of your food deliveries. You are able to collect about 20 samples, and it turns out that the average delivery time is 37 minutes. Can you say that the previous average of 30 minutes and the average of 37 minutes that you recently collected are significantly different? 

In other words, are the delivery times actually slower, or did you just happen to get a few slower delivery persons by chance while, on the whole, the delivery times actually remain at the 30-minute average?

## Null Hypothesis versus Alternative Hypothesis

In hypothesis testing, the starting hypothesis is typically stated in terms of the status quo. For example, in this case, you would state that there is no difference in average delivery times between the current deliveries and previous deliveries (known as the **null hypothesis**). 

You will try to gather evidence against this hypothesis in order to support your belief (**alternative hypothesis**) that there is a significant difference between the time it takes to have food delivered currently versus a few months ago. 

<img src="materials/images/images_p-value/hypothesis.png"/>

You will use your gathered results to judge whether the null hypothesis or the alternative hypothesis is more likely. Technically, however, you are only testing the evidence against the null hypothesis.

## Hypothesis Testing

You would like to test the significance of your observed difference between previous and current delivery times. Is the 7-minute difference (30 minutes previously versus your recently obtained average of 37 minutes) significant, or is it just <mark>**due to randomness in your 20 samples**</mark>?

### Significance levels
The level of statistical significance is often expressed as the so-called **p-value** (probability value). You will calculate the probability of observing your sample results (or more extreme), given that the null hypothesis is true. In other words, if there really is no difference between previous and current delivery times, how likely would it be to see a difference as large as (or larger than) that which you observed in your samples? 
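The definition above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the lesson does not say how much delivery times vary, so the normal distribution and the 18-minute standard deviation (`ASSUMED_SD`) are hypothetical, and the test is one-sided because the belief is that deliveries got slower. The idea is to pretend the null hypothesis is true, draw many samples of 20 deliveries, and count how often the sample average is at least as large as the observed 37 minutes.

```python
import random
import statistics

NULL_MEAN = 30.0      # previous average delivery time (minutes), from the lesson
OBSERVED_MEAN = 37.0  # average of the 20 recent deliveries, from the lesson
SAMPLE_SIZE = 20      # number of timed deliveries, from the lesson
ASSUMED_SD = 18.0     # hypothetical spread of delivery times (an assumption)


def simulate_p_value(n_simulations=10_000, seed=0):
    """Estimate a one-sided p-value by simulating samples under the null hypothesis."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_simulations):
        # Draw 20 delivery times from a world where the true average is still 30.
        sample = [rng.gauss(NULL_MEAN, ASSUMED_SD) for _ in range(SAMPLE_SIZE)]
        if statistics.fmean(sample) >= OBSERVED_MEAN:
            at_least_as_extreme += 1
    # Fraction of simulated samples as extreme as (or more extreme than) yours.
    return at_least_as_extreme / n_simulations


p_simulated = simulate_p_value()
print(f"Simulated one-sided p-value: {p_simulated:.3f}")
```

The printed fraction is an estimate of the p-value under these assumptions. A different assumed spread would give a different value, which is why real analyses estimate the spread from the collected data.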

### p-value

Let's say that, after evaluating your results, you get a p-value of 0.04 (p = .04). This means that, if the null hypothesis is true, there is a 4% chance of finding a difference as large as (or larger than) the one that you observed.

A p-value of 0.05 is typically used as the threshold for significance. If, assuming that the null hypothesis is true, there is a 5% chance or less of observing a difference as large as (or larger than) the one that you observed, you reject the null hypothesis and accept the alternative hypothesis. If the probability is greater than 0.05, you fail to reject the null hypothesis, because a result like yours could happen too frequently by chance for you to be confident that the current delivery times are truly different from previous deliveries.

<div class="alert alert-block alert-warning">
    <b>Alert:</b> Note that you can never accept the null hypothesis. You can only <mark>reject</mark> it when you find enough evidence against it, or fail to reject it.
</div>

In our example, where p = .04, you would **reject the null hypothesis, and accept the alternative hypothesis** that current delivery times are significantly slower than previous deliveries. 
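As a rough numerical sketch of this example, the p-value can be computed with a one-sided z-test built on the z-score from Lesson 1. The lesson does not give the spread of delivery times, so the standard deviation of 18 minutes passed in below is a hypothetical value chosen for illustration, and `one_sided_z_test` is a helper name introduced here. With only 20 samples and an unknown spread, a t-test would usually be preferred in practice.

```python
import math
from statistics import NormalDist


def one_sided_z_test(sample_mean, null_mean, sd, n):
    """Return the z-score and one-sided p-value for 'the true mean is larger'."""
    standard_error = sd / math.sqrt(n)            # spread of the sample average
    z = (sample_mean - null_mean) / standard_error
    p_value = 1.0 - NormalDist().cdf(z)           # chance of a z-score this large or larger
    return z, p_value


# 37, 30, and 20 come from the lesson; sd=18.0 is an assumption.
z, p_value = one_sided_z_test(sample_mean=37.0, null_mean=30.0, sd=18.0, n=20)

alpha = 0.05  # significance threshold
print(f"z = {z:.2f}, p = {p_value:.3f}")
if p_value <= alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```

Under these assumptions the p-value comes out close to the 0.04 used in the example, so the null hypothesis is rejected at the 0.05 threshold.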

<div class="alert alert-block alert-info">
<b>Tip:</b> A p-value of 0.05 is commonly used as the threshold for significance. However, when increased confidence is desired, a more stringent threshold of 0.01 may be used.
</div>

# Understanding the p-value

- The p-value, or calculated probability, provides a universal language for interpreting test results.
- The p-value is a number between 0 and 1 that quantifies the statistical significance of a test result.
- The p-value is used to judge whether there is enough evidence to reject the null hypothesis.

## Interpreting significance

#### p-value less than 0.05
- A small p-value (< 0.05) indicates that the result is possible, but not very likely under the null hypothesis.
- Thus, for a hypothesis with a p-value less than 0.05, the null hypothesis is rejected, and the alternative hypothesis is accepted.
- This suggests that the results of the study are statistically significant.

#### p-value greater than 0.05
- A large p-value (> 0.05) indicates weak evidence against the null hypothesis.
- Thus, for a hypothesis with a p-value greater than 0.05, the null hypothesis is not rejected, and the alternative hypothesis is not accepted.
- This indicates that the results of the study are not statistically significant.
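The two cases above amount to a simple decision rule, sketched below. The function name `interpret_p_value` and its `alpha` parameter are introduced here for illustration. A p-value exactly equal to the threshold is treated as significant, following the "5% chance or less" wording used earlier in this lesson.

```python
def interpret_p_value(p_value, alpha=0.05):
    """Turn a p-value into a decision about the null hypothesis."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("A p-value must be between 0 and 1.")
    if p_value <= alpha:
        # Unlikely under the null hypothesis: statistically significant.
        return "reject the null hypothesis"
    # Could happen too often by chance: not statistically significant.
    return "fail to reject the null hypothesis"


print(interpret_p_value(0.04))              # default threshold of 0.05
print(interpret_p_value(0.20))              # large p-value
print(interpret_p_value(0.04, alpha=0.01))  # more stringent threshold
```

Note that the same p-value of 0.04 leads to a different decision when the more stringent threshold of 0.01 is used.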

---

# 🌟 Ready for the next one?
<br>

    
- [**Lesson 3: Welch's T-test**](Lesson_3_Welchs_T-test.ipynb)

- [**Lesson 4: Log2 Fold Change**](Lesson_4_Log2_Fold_Change.ipynb)

- [**Lesson 5: Pearson Correlation**](Lesson_5_Pearson_Correlation.ipynb)

- [**Lesson 6: Spearman Correlation**](Lesson_6_Spearman_Correlation.ipynb)

- [**Lesson 7: False Discovery Rate**](Lesson_7_False_Discovery_Rate.ipynb)

- [**Lesson 8: Benjamini Hochberg**](Lesson_8_Benjamini_Hochberg.ipynb)

- [**Lesson 9: Dimensionality Reduction Methods: Principal Component Analysis**](Lesson_9_Dimensionality_Reduction_Methods_Principal_Component_Analysis.ipynb)

- [**Lesson 10: Dimensionality Reduction Methods: t-SNE**](Lesson_10_Dimensionality_Reduction_Methods_t-SNE.ipynb)

- [**Lesson 11: UMAP**](Lesson_11_UMAP.ipynb)
</br>

---

# Contributions & acknowledgment

Thanks to Antony Ross for contributing the content for this notebook.

---

Copyright (c) 2022 Stanford Data Ocean (SDO)

All rights reserved.