# 🧪 Interactive Exercise: Spearman Rank Correlation & Chi-square

This notebook helps you **actively practice non-parametric correlation analysis**.

You will:
- Identify ordinal variables
- Decide when Spearman correlation is appropriate
- Compute Spearman rank coefficients
- Perform Chi-square tests of independence
- Interpret statistical results correctly

**Do not skip ahead. Try each step before opening hints or solutions.**

## Step 0 – Setup

Run the cell below to load all required libraries and configure plotting.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rcParams
from scipy.stats import spearmanr, chi2_contingency

%matplotlib inline
rcParams['figure.figsize'] = 14, 7
sns.set_style("whitegrid")

## Step 1 – Load and Inspect the Dataset

**Task:** Load the `mtcars` dataset and inspect the first few rows.

<details>
<summary>💡 Hint</summary>
Use `pd.read_csv()` and then call `.head()` on the dataframe.
</details>

In [None]:
# YOUR CODE HERE
# Load mtcars.csv and display the first 5 rows

## Step 2 – Identify Ordinal Variables

**Task:** From the dataset, identify variables that are **stored as numbers but behave as categories (ordinal)**.

Write down at least **four** such variables.

<details>
<summary>💡 Hint</summary>
Look for numeric columns that take only a few discrete values (e.g., 0/1, 4/6/8).
</details>

_Write your answer here before proceeding._

## Step 3 – Visual Inspection

**Task:** Create a pairplot for the ordinal variables you identified.

Check visually whether the relationships look **non-linear** and the distributions look **non-normal**.

<details>
<summary>💡 Hint</summary>
Use `sns.pairplot()` on a dataframe subset.
</details>

In [None]:
# YOUR CODE HERE
# Create a subset and generate a pairplot

## Step 4 – Decide on the Correlation Method

**Question:** Why is **Spearman Rank Correlation** more appropriate here than Pearson correlation?

<details>
<summary>💡 Hint</summary>
Think about data type, linearity, and distribution assumptions.
</details>

_Write your reasoning here._

## Step 5 – Compute Spearman Rank Correlation

**Task:** Compute the Spearman rank correlation between:
- `cyl` and `vs`
- `cyl` and `am`
- `cyl` and `gear`

<details>
<summary>💡 Hint</summary>
Use `spearmanr(variable1, variable2)` from `scipy.stats`.
</details>

In [None]:
# YOUR CODE HERE
# Calculate Spearman correlations and print coefficients

## Step 6 – Interpret Spearman Results

**Question:**
- Which pair shows the strongest relationship?
- Is the relationship positive or negative?

<details>
<summary>💡 Hint</summary>
Compare absolute values of the correlation coefficients.
</details>

_Write your interpretation here._

## Step 7 – Chi-square Test of Independence

**Task:** Perform a Chi-square test of independence between:
- `cyl` and `am`
- `cyl` and `vs`
- `cyl` and `gear`

<details>
<summary>💡 Hint</summary>
First create a contingency table using `pd.crosstab()`.
</details>

In [None]:
# YOUR CODE HERE
# Create crosstabs and apply chi2_contingency

## Step 8 – Interpret Chi-square Results

**Question:** Based on p-values:
- Which variable pairs are **dependent**?
- Which are **independent**?

<details>
<summary>💡 Hint</summary>
Use a significance level of 0.05.
</details>

_Write your interpretation here._

# ✅ Collapsed Solutions (Self-Check)

<details>
<summary>📌 Click to reveal full solution</summary>

### Step 1 Solution
```python
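# Path to the CSV file; adjust it to wherever mtcars.csv lives on your machine
address = 'mtcars.csv'
cars = pd.read_csv(address)
cars.columns = ['car_names','mpg','cyl','disp','hp','drat','wt','qsec','vs','am','gear','carb']
cars.head()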
```

### Step 2 Solution
Ordinal variables: `cyl`, `vs`, `am`, `gear`
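
The hint's "few discrete values" check can also be done programmatically. The following is a minimal sketch on a small made-up dataframe (the `toy` data, its column names, and the `max_levels` cutoff are all assumptions for illustration, not part of the exercise):

```python
import pandas as pd

# Toy data (hypothetical, not mtcars): two low-cardinality columns, one continuous.
toy = pd.DataFrame({
    "doors": [2, 4, 4, 2, 4, 2, 4, 4],
    "automatic": [0, 1, 1, 0, 0, 1, 1, 0],
    "weight": [1.21, 2.35, 2.88, 1.47, 3.02, 1.93, 2.64, 2.11],
})

# Count the distinct values in each numeric column.
unique_counts = toy.select_dtypes("number").nunique()

# Hypothetical cutoff: columns with at most 5 distinct values are candidates.
max_levels = 5
candidates = unique_counts[unique_counts <= max_levels].index.tolist()
print(candidates)
```

A low count of distinct values only flags a candidate; whether the levels have a meaningful order is still a judgment about what the variable measures.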

### Step 3 Solution
```python
x = cars[['cyl','vs','am','gear']]
sns.pairplot(x)
```

### Step 4 Solution
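Spearman is preferred because the variables are ordinal, the relationships are not linear, and the distributions are not normal. Pearson assumes continuous, approximately normally distributed variables with a linear relationship; Spearman works on ranks and only requires the relationship to be monotonic.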
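
To see the difference concretely, here is a small sketch on made-up data (the arrays `x` and `y` are assumptions for illustration). It compares the two coefficients on a relationship that is monotonic but not linear, and checks that Spearman's coefficient equals Pearson's coefficient computed on the ranks:

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Toy data (hypothetical): y increases with x, but exponentially rather than linearly.
x = np.arange(1, 11)
y = np.exp(x)

pearson_r, _ = pearsonr(x, y)
spearman_rho, _ = spearmanr(x, y)

# Spearman's rho is Pearson's r applied to the ranks of the data.
rho_from_ranks, _ = pearsonr(rankdata(x), rankdata(y))

print(f"Pearson r:      {pearson_r:.3f}")
print(f"Spearman rho:   {spearman_rho:.3f}")
print(f"r on the ranks: {rho_from_ranks:.3f}")
```

Because the ordering of `y` matches the ordering of `x` exactly, Spearman reports a perfect monotonic relationship, while Pearson is pulled below 1 by the curvature.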

### Step 5 Solution
```python
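# Print each result explicitly; a notebook cell only displays its last bare expression
for col in ['vs', 'am', 'gear']:
    rho, p_value = spearmanr(cars['cyl'], cars[col])
    print(f"cyl vs {col}: rho = {rho:.3f}, p = {p_value:.3f}")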
```
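
For Step 6, the Step 6 hint's comparison of absolute values can be sketched as follows. The example uses made-up data (the `toy` dataframe and its columns `a`, `b`, `c` are assumptions for illustration), so it shows the mechanics without giving away the `mtcars` answer:

```python
import pandas as pd
from scipy.stats import spearmanr

# Toy data (hypothetical, not mtcars).
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [6, 5, 4, 3, 2, 1],  # strictly decreasing as a increases
    "c": [1, 3, 2, 5, 4, 6],  # mostly increasing as a increases
})

results = {}
for other in ["b", "c"]:
    rho, p_value = spearmanr(toy["a"], toy[other])
    results[("a", other)] = rho
    print(f"a vs {other}: rho = {rho:.3f}, p = {p_value:.3f}")

# Strength is judged by |rho|; the sign only gives the direction.
strongest = max(results, key=lambda pair: abs(results[pair]))
print("Strongest pair:", strongest)
```

Here the negative coefficient for `a` and `b` is the stronger one, which is why the comparison must use absolute values rather than the raw coefficients.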

### Step 7 Solution
```python
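for col in ['am', 'vs', 'gear']:  # one test per pair listed in Step 7
    table = pd.crosstab(cars['cyl'], cars[col])
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"cyl vs {col}: chi2 = {chi2:.3f}, p = {p:.3f}")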
```
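
`chi2_contingency` returns four values: the test statistic, the p-value, the degrees of freedom, and the expected counts under independence. The sketch below unpacks them on made-up data (the `toy` dataframe, `group`, and `outcome` are assumptions for illustration):

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data (hypothetical, not mtcars): 40 observations in two groups.
toy = pd.DataFrame({
    "group": ["x"] * 20 + ["y"] * 20,
    "outcome": ["yes"] * 15 + ["no"] * 5 + ["yes"] * 5 + ["no"] * 15,
})

table = pd.crosstab(toy["group"], toy["outcome"])
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, dof = {dof}")

# Expected counts under independence, labelled like the observed table.
expected_df = pd.DataFrame(expected, index=table.index, columns=table.columns)
print(expected_df)
```

Two details are worth keeping in mind. For a 2x2 table, `chi2_contingency` applies Yates' continuity correction by default (`correction=True`). And a common rule of thumb is that the chi-square approximation becomes unreliable when expected counts fall below about 5, so it is worth inspecting the expected counts when the sample is small.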

### Step 8 Solution
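If p < 0.05, reject the null hypothesis of independence and treat the pair as dependent. If p ≥ 0.05, there is not enough evidence of dependence, so treat the pair as independent.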
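
The decision rule can be written as a small helper. This is a sketch only: the function name `interpret_p_value` and the p-values passed to it are hypothetical, chosen for illustration rather than taken from the `mtcars` results:

```python
def interpret_p_value(p_value: float, alpha: float = 0.05) -> str:
    """Return a plain-language verdict for a chi-square test of independence."""
    if p_value < alpha:
        return "reject independence (evidence of association)"
    return "fail to reject independence (no evidence of association)"


# Hypothetical p-values, for illustration only.
for p in (0.003, 0.20):
    print(f"p = {p}: {interpret_p_value(p)}")
```

Strictly speaking, a large p-value does not prove independence; it only means the data do not provide enough evidence against it at the chosen significance level.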

</details>