<img src="materials/images/introduction-to-statistics-II-cover.png"/>


# 👋 Welcome, before you start
<br>

### 📚 Module overview

We will go through eleven lessons with you:
    
- [**Lesson 1: Z-score**](Lesson_1_Z-score.ipynb)

- [**Lesson 2: P-value**](Lesson_2_P-value.ipynb)

- <font color=#E98300>**Lesson 3: Welch's T-test**</font>    `📍You are here.`

- [**Lesson 4: Log2 Fold Change**](Lesson_4_Log2_Fold_Change.ipynb)

- [**Lesson 5: Pearson Correlation**](Lesson_5_Pearson_Correlation.ipynb)

- [**Lesson 6: Spearman Correlation**](Lesson_6_Spearman_Correlation.ipynb)

- [**Lesson 7: False Discovery Rate**](Lesson_7_False_Discovery_Rate.ipynb)

- [**Lesson 8: Benjamini Hochberg**](Lesson_8_Benjamini_Hochberg.ipynb)

- [**Lesson 9: Dimensionality Reduction Methods: Principal Component Analysis**](Lesson_9_Dimensionality_Reduction_Methods_Principal_Component_Analysis.ipynb)

- [**Lesson 10: Dimensionality Reduction Methods: t-SNE**](Lesson_10_Dimensionality_Reduction_Methods_t-SNE.ipynb)

- [**Lesson 11: UMAP**](Lesson_11_UMAP.ipynb)
</br>



<div class="alert alert-block alert-info">
<h3>⌨️ Keyboard shortcut</h3>

These common shortcuts can save you time as you work through this notebook:
- Run the current cell: **`Shift + Enter`**.
- Add a cell above the current cell: Press **`A`**.
- Add a cell below the current cell: Press **`B`**.
- Change a code cell to a markdown cell: Select the cell, and then press **`M`**.
- Delete a cell: Press **`D`** twice.

The single-key shortcuts above work in command mode; press **`Esc`** first if you are editing a cell.

Need more help with keyboard shortcuts? Press **`H`** to look them up.
</div>

---

# Lesson 3: Welch's T-test

<mark>**Welch’s t-test**</mark> (also called the unequal variances t-test) is an adaptation of the commonly used Student's t-test for testing whether the means of two independent groups differ significantly. Unlike Student's t-test, it does not assume that the two populations have equal variances, so it remains reliable when the groups' variances (and sample sizes) differ.
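For reference, the standard textbook form of the Welch statistic is shown below, where $\bar{x}_1, \bar{x}_2$ are the sample means, $s_1^2, s_2^2$ the sample variances (computed with an $n - 1$ denominator), and $n_1, n_2$ the sample sizes. Each group keeps its own variance instead of a pooled one, and the degrees of freedom $\nu$ come from the Welch-Satterthwaite approximation, so $\nu$ is generally not a whole number.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
$$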

`🕒 This module should take about 15 minutes to complete.`

`✍️ This notebook is written using Python.`

---

# Comparing the means of two independent groups
Let's use Welch's t-test to evaluate a hypothesis about how different study groups perform on a test.

<img src="materials/images/images_welchs_t-test/study_group.png"/>

Consider two independent study groups preparing for an upcoming exam: one group meets in person, and the other meets virtually over the Internet. After the exam, we compare their scores to determine whether the two groups' performances differ significantly.

We would like to test the following hypothesis:

<img src="materials/images/images_welchs_t-test/hypothesis.png"/>

### ✅ `Run` each of the cells below:

```python
import scipy.stats as stats
import numpy as np
```

### Exam scores for the two groups
Suppose each group earned the following scores on the exam:

```python
in_person_group = np.array([64, 75, 79, 86, 73, 88, 65, 87, 82, 74, 79, 90, 91, 84, 76])
virtual_group = np.array([66, 57, 74, 67, 54, 68, 78, 89, 79, 74, 87, 92, 64, 66, 93])
```

### The in-person group's mean and variance:

```python
in_person_group.mean()
```

```python
# ddof=1 divides by n - 1, giving the sample variance that Welch's t-test uses
in_person_group.var(ddof=1)
```

### The virtual group's mean and variance:

```python
virtual_group.mean()
```

```python
# ddof=1 divides by n - 1, giving the sample variance that Welch's t-test uses
virtual_group.var(ddof=1)
```

The two groups' variances appear to differ, with the virtual group's scores more spread out, so Welch's t-test, which does not assume equal variances, is a natural choice. We would like to test whether the in-person group's average exam score of **79.5** differs significantly from the virtual group's average of **73.9**, or whether a difference this large could plausibly arise by chance.
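To put a number on "appear to differ", the sketch below compares the two sample variances directly. Note that NumPy's `var` divides by $n$ by default (`ddof=0`); passing `ddof=1` divides by $n - 1$ and gives the sample variance that Welch's t-test works with. The names `var_in_person`, `var_virtual`, and `variance_ratio` are illustrative choices, not part of the original lesson.

```python
import numpy as np

in_person_group = np.array([64, 75, 79, 86, 73, 88, 65, 87, 82, 74, 79, 90, 91, 84, 76])
virtual_group = np.array([66, 57, 74, 67, 54, 68, 78, 89, 79, 74, 87, 92, 64, 66, 93])

# Sample variances (divide by n - 1), as used in the Welch t statistic
var_in_person = in_person_group.var(ddof=1)
var_virtual = virtual_group.var(ddof=1)

# Ratio of the larger sample variance to the smaller one
variance_ratio = max(var_in_person, var_virtual) / min(var_in_person, var_virtual)

print(f"in-person sample variance: {var_in_person:.1f}")
print(f"virtual sample variance:   {var_virtual:.1f}")
print(f"variance ratio:            {variance_ratio:.2f}")
```

Here the virtual group's sample variance is roughly twice the in-person group's, which is the kind of imbalance Welch's test is designed to tolerate.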

## Conduct Welch's t-test

```python
# equal_var=False makes ttest_ind run Welch's t-test instead of Student's t-test
stats.ttest_ind(in_person_group, virtual_group, equal_var=False)
```
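To see what `ttest_ind` computes when `equal_var=False`, here is a minimal by-hand sketch of the textbook Welch formulas: the t statistic from the two sample means and sample variances, the Welch-Satterthwaite degrees of freedom, and a two-sided p-value from SciPy's t distribution. The helper `welch_t_test` is hypothetical and written only for illustration; it is checked against SciPy's result rather than presented as SciPy's internal implementation.

```python
import numpy as np
import scipy.stats as stats

in_person_group = np.array([64, 75, 79, 86, 73, 88, 65, 87, 82, 74, 79, 90, 91, 84, 76])
virtual_group = np.array([66, 57, 74, 67, 54, 68, 78, 89, 79, 74, 87, 92, 64, 66, 93])


def welch_t_test(a, b):
    """Hypothetical helper: two-sided Welch's t-test from the textbook formulas."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = a.size, b.size
    # Squared standard error of each sample mean, using sample variances (ddof=1)
    se2_a = a.var(ddof=1) / n_a
    se2_b = b.var(ddof=1) / n_b
    # Welch's t statistic: difference in means over the unpooled standard error
    t = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    # Two-sided p-value: chance of a |t| at least this large if H0 is true
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p


t, df, p = welch_t_test(in_person_group, virtual_group)
result = stats.ttest_ind(in_person_group, virtual_group, equal_var=False)

print(f"by hand: t = {t:.3f}, df = {df:.1f}, p = {p:.3f}")
print(f"scipy:   t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```

The by-hand statistic and p-value should agree with SciPy's, and the degrees of freedom land near 25 rather than the $n_1 + n_2 - 2 = 28$ that Student's t-test would use.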

### Interpretation of the Output:

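The test statistic is about 1.47, and the corresponding two-sided p-value is about 0.15. Because this p-value is larger than the conventional significance level of 0.05, we fail to reject the null hypothesis: these samples do not provide significant evidence that the in-person and virtual study groups' mean exam scores differ. Failing to reject the null hypothesis is not the same as showing the means are equal; the observed difference may simply be too small to detect with 15 students per group. (See [Lesson 2: P-value](Lesson_2_P-value.ipynb) for more information.)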
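The decision rule can also be written out in code. The sketch below assumes a significance level of `alpha = 0.05` (chosen before testing) and, for comparison, also runs a one-sided test whose alternative is "the in-person mean is greater than the virtual mean", using the `alternative` parameter of `ttest_ind` (available in SciPy 1.6 and later). The one-sided run is an illustration of how the hypothesis direction changes the p-value, not part of the original analysis.

```python
import numpy as np
import scipy.stats as stats

in_person_group = np.array([64, 75, 79, 86, 73, 88, 65, 87, 82, 74, 79, 90, 91, 84, 76])
virtual_group = np.array([66, 57, 74, 67, 54, 68, 78, 89, 79, 74, 87, 92, 64, 66, 93])

alpha = 0.05  # assumed significance level, fixed before looking at the data

# Two-sided Welch test: H1 is "the two population means differ"
two_sided = stats.ttest_ind(in_person_group, virtual_group, equal_var=False)

# One-sided Welch test: H1 is "the in-person mean is greater than the virtual mean"
one_sided = stats.ttest_ind(in_person_group, virtual_group, equal_var=False,
                            alternative="greater")

for label, res in [("two-sided", two_sided), ("one-sided (greater)", one_sided)]:
    decision = "reject H0" if res.pvalue < alpha else "fail to reject H0"
    print(f"{label}: t = {res.statistic:.2f}, p = {res.pvalue:.3f} -> {decision}")
```

Because the t statistic is positive, the one-sided p-value is half the two-sided one, and it still exceeds 0.05 here, so the conclusion does not change. The direction of the alternative should be decided before seeing the data; switching to a one-sided test after the fact to reach significance is not valid.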

---

# 🌟 Ready for the next one?
<br>

- [**Lesson 4: Log2 Fold Change**](Lesson_4_Log2_Fold_Change.ipynb)

- [**Lesson 5: Pearson Correlation**](Lesson_5_Pearson_Correlation.ipynb)

- [**Lesson 6: Spearman Correlation**](Lesson_6_Spearman_Correlation.ipynb)

- [**Lesson 7: False Discovery Rate**](Lesson_7_False_Discovery_Rate.ipynb)

- [**Lesson 8: Benjamini Hochberg**](Lesson_8_Benjamini_Hochberg.ipynb)

- [**Lesson 9: Dimensionality Reduction Methods: Principal Component Analysis**](Lesson_9_Dimensionality_Reduction_Methods_Principal_Component_Analysis.ipynb)

- [**Lesson 10: Dimensionality Reduction Methods: t-SNE**](Lesson_10_Dimensionality_Reduction_Methods_t-SNE.ipynb)

- [**Lesson 11: UMAP**](Lesson_11_UMAP.ipynb)
</br>

---

# Contributions & acknowledgment

Thanks to Antony Ross for contributing the content of this notebook.

---

Copyright (c) 2022 Stanford Data Ocean (SDO)

All rights reserved.