<img src="materials/images/introduction-to-statistics-II-cover.png"/>


# 👋 Welcome, before you start
<br>

### 📚 Module overview

We will go through eleven lessons with you:
    
- [**Lesson 1: Z-score**](Lesson_1_Z-score.ipynb)

- [**Lesson 2: P-value**](Lesson_2_P-value.ipynb)

- [**Lesson 3: Welch's T-test**](Lesson_3_Welchs_T-test.ipynb)

- [**Lesson 4: Log2 Fold Change**](Lesson_4_Log2_Fold_Change.ipynb)

- [**Lesson 5: Pearson Correlation**](Lesson_5_Pearson_Correlation.ipynb)

- <font color=#E98300>**Lesson 6: Spearman Correlation**</font>    `📍You are here.`

- [**Lesson 7: False Discovery Rate**](Lesson_7_False_Discovery_Rate.ipynb)

- [**Lesson 8: Benjamini Hochberg**](Lesson_8_Benjamini_Hochberg.ipynb)

- [**Lesson 9: Dimensionality Reduction Methods: Principal Component Analysis**](Lesson_9_Dimensionality_Reduction_Methods_Principal_Component_Analysis.ipynb)

- [**Lesson 10: Dimensionality Reduction Methods: t-SNE**](Lesson_10_Dimensionality_Reduction_Methods_t-SNE.ipynb)

- [**Lesson 11: UMAP**](Lesson_11_UMAP.ipynb)
</br>



<div class="alert alert-block alert-info">
<h3>⌨️ Keyboard shortcut</h3>

These common shortcuts can save you time as you go through this notebook:
- Run the current cell: **`Shift + Enter`**.
- Add a cell above the current cell: Press **`A`**.
- Add a cell below the current cell: Press **`B`**.
- Change a code cell to markdown cell: Select the cell, and then press **`M`**.
- Delete a cell: Press **`D`** twice.

Need more help with keyboard shortcuts? Press **`H`** to look them up.
</div>

---

# Lesson 6: Spearman Correlation

`🕒 This module should take about 15 minutes to complete.`

`✍️ This notebook is written using Python.`

Correlation between two variables is a measure of the extent to which they tend to change together. The <mark>**Spearman's rank-order correlation coefficient**</mark> (or Spearman's correlation) measures the strength and direction of the association between two ranked variables. It assumes that the relationship is monotonic (i.e., the direction of the relationship remains unchanged). The figure below illustrates a monotonic relationship.

<img src="materials/images/images_spearman_correlation/monotonic.png"/>

Spearman's correlation measures the strength and direction of the monotonic relationship between two variables, whereas Pearson's correlation measures the strength and direction of the linear relationship, in which a change in one variable is proportional to the change in the other. Monotonicity is less restrictive than linearity. For example, the middle image above shows a relationship that is monotonic but not linear. In a monotonic relationship, the variables tend to change together, but not necessarily at a constant rate. Spearman's correlation coefficient is based on the ranked values of each variable rather than the raw data.
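To make the idea of ranks concrete, here is a minimal sketch on made-up data (the `df_demo` table and its exponential curve are assumptions for illustration, not part of the lesson's dataset). It compares Pearson's and Spearman's coefficients on a relationship that is monotonic but not linear, and checks that Spearman's coefficient matches Pearson's coefficient computed on the ranks.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y always increases with x, but not at a constant rate.
df_demo = pd.DataFrame({"x": np.arange(1, 11, dtype=float)})
df_demo["y"] = np.exp(df_demo["x"])

# Pearson works on the raw values; Spearman works on their ranks.
pearson = df_demo.corr(method="pearson").loc["x", "y"]
spearman = df_demo.corr(method="spearman").loc["x", "y"]

# Ranking both columns first and then applying Pearson gives the same
# number as asking for Spearman directly.
pearson_on_ranks = df_demo.rank().corr(method="pearson").loc["x", "y"]

print(f"Pearson:          {pearson:.3f}")
print(f"Spearman:         {spearman:.3f}")
print(f"Pearson on ranks: {pearson_on_ranks:.3f}")
```

Because the ordering of `y` follows the ordering of `x` exactly, Spearman's coefficient reaches 1 here, while Pearson's coefficient stays below 1 because the points do not fall on a straight line.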

## When to use Spearman's correlation

To use Spearman's correlation, the two variables should be measured on an ordinal, interval, or ratio scale. However, the two variables do not need to be measured on the same scale (one variable can be ratio and the other ordinal). The test may also be used for data that fail the assumptions required for Pearson's correlation (e.g., if the data are not normally distributed).

## Guidelines for interpreting Spearman's correlation coefficient:

<img src="materials/images/images_spearman_correlation/corr.png"/>

The Spearman's correlation coefficient can take values from +1 to -1. A correlation of +1 indicates a perfect positive association of ranks (as the rank of one variable increases, so does the rank of the other variable). A correlation score of zero indicates no association between ranks. And a correlation of -1 indicates a perfect negative association of ranks (as the rank of one variable increases, the rank of the other variable decreases). The further the correlation is from zero, the stronger the association between the ranks.
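The two extremes can be reproduced with a small sketch. The `demo` table below is invented for illustration: one column always rises with `x` and another always falls, so their ranks line up perfectly in the same or the opposite order.

```python
import pandas as pd

# Hypothetical data chosen so that the ordering is perfectly preserved
# ("increasing") or perfectly reversed ("decreasing").
demo = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "increasing": [2, 3, 5, 8, 13, 21],
    "decreasing": [50, 40, 25, 9, 4, 1],
})

rho = demo.corr(method="spearman")
rho_up = rho.loc["x", "increasing"]
rho_down = rho.loc["x", "decreasing"]

print(f"x vs increasing: {rho_up:+.1f}")
print(f"x vs decreasing: {rho_down:+.1f}")
```

Neither column is a straight-line function of `x`, yet the coefficients are exactly +1 and -1, because only the order of the values matters.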

---

# Correlation Example

<img src="materials/images/images_spearman_correlation/gre_univrating.png"/>

To explore Spearman's correlation, we will import data on students who applied to a university graduate program. Spearman's correlation is often used to evaluate relationships involving ordinal variables. For example, we can use it to explore the correlation between **GRE Score** (a continuous variable) and **University Rating** (an ordinal variable).

### ✅ `Run` each of the cells below:

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv("data/data_spearman_correlation/grad_admit.csv")
df = df.drop(["Serial No.", "Research"], axis=1)

# Preview the first 5 rows
df.head()

### Spearman's correlation coefficient

Let's run a Spearman's correlation between GRE Score and University Rating.

In [None]:
df[["GRE Score", "University Rating"]].corr(method="spearman")

The <mark>**correlation of 0.676**</mark> indicates a fairly strong positive monotonic relationship between the rating of the university attended and an applicant's GRE score.
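The cell above returns a 2×2 correlation matrix, and the coefficient of interest is the off-diagonal entry. The sketch below shows one way to pull out that single number and how tied ordinal ratings are ranked. The `toy` table uses invented values, not rows from `grad_admit.csv`, so its result will differ from 0.676.

```python
import pandas as pd

# Hypothetical values for illustration only; NOT taken from grad_admit.csv.
toy = pd.DataFrame({
    "GRE Score": [300, 310, 315, 320, 325, 330, 335, 340],
    "University Rating": [1, 2, 2, 3, 3, 4, 5, 5],
})

# Tied ratings share the average of the rank positions they occupy
# (this is the default behaviour of pandas' rank()).
ranks = toy.rank(method="average")
print(ranks)

# The matrix is symmetric with 1s on the diagonal; select the
# off-diagonal entry to get the coefficient as a single number.
corr_matrix = toy.corr(method="spearman")
rho_toy = corr_matrix.loc["GRE Score", "University Rating"]
print(f"Spearman's rho: {rho_toy:.3f}")
```

An ordinal variable with only a few levels, such as a 1 to 5 rating, will usually contain many ties, which is why the averaged ranks matter in practice.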

---

# Other relationships

**Pearson** correlation coefficients measure only <mark>linear relationships</mark>. **Spearman's** correlation coefficients measure only <mark>monotonic relationships</mark>. Thus, a meaningful relationship can exist even if both correlation coefficients are close to 0. This happens when the relationship is neither linear nor monotonic. See the example below:

<img src="materials/images/images_spearman_correlation/nonlinear.png"/>

The relationship illustrated in the figure above is neither linear nor monotonic, so both the Pearson and Spearman correlations will be approximately 0 even though the two variables are clearly related. It is often useful to examine a scatterplot or pairplot to determine the form of any relationships that may be present.
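As a rough sketch of this effect, the snippet below builds a U-shaped relationship from made-up data (the `curve` table and the choice of `y = x²` are assumptions for illustration and may not match the figure exactly). Here `y` is completely determined by `x`, yet neither coefficient detects it.

```python
import numpy as np
import pandas as pd

# Hypothetical U-shaped data: y falls and then rises as x increases,
# so the relationship is neither linear nor monotonic.
x = np.arange(-5, 6, dtype=float)
curve = pd.DataFrame({"x": x, "y": x ** 2})

pearson_u = curve.corr(method="pearson").loc["x", "y"]
spearman_u = curve.corr(method="spearman").loc["x", "y"]

# Both values are zero up to floating-point rounding.
print(f"Pearson:  {abs(pearson_u):.3f}")
print(f"Spearman: {abs(spearman_u):.3f}")
```

The decreasing half of the curve cancels out the increasing half, which is why plotting the data is a useful check before relying on a coefficient alone.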

### Pairplot

In [None]:
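# seaborn (sns) and matplotlib.pyplot (plt) were imported in the first cell
sns.set_palette("Set1")

# Plot every pair of variables, coloring the points by the "Admitted" column
sns.pairplot(data=df, hue="Admitted")
plt.show()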

---

# 🌟 Ready for the next one?
<br>

- [**Lesson 7: False Discovery Rate**](Lesson_7_False_Discovery_Rate.ipynb)

- [**Lesson 8: Benjamini Hochberg**](Lesson_8_Benjamini_Hochberg.ipynb)

- [**Lesson 9: Dimensionality Reduction Methods: Principal Component Analysis**](Lesson_9_Dimensionality_Reduction_Methods_Principal_Component_Analysis.ipynb)

- [**Lesson 10: Dimensionality Reduction Methods: t-SNE**](Lesson_10_Dimensionality_Reduction_Methods_t-SNE.ipynb)

- [**Lesson 11: UMAP**](Lesson_11_UMAP.ipynb)
</br>

---

# Contributions & acknowledgment

Thanks to Antony Ross for contributing the content for this notebook.

---

Copyright (c) 2022 Stanford Data Ocean (SDO)

All rights reserved.