# Spearman Rank Correlation and Chi-square Test

## Non-parametric Correlation Analysis

Non-parametric correlation methods are used when your data **does not meet the assumptions of parametric tests** such as Pearson correlation. These methods are especially useful when:

- Variables are **ordinal** (categories with a natural order, often stored as numbers)
- Relationships are **non-linear**
- Data is **non-normally distributed**

In this notebook, we explore two important non-parametric techniques:

1. **Spearman Rank Correlation** – measures monotonic relationships between ordinal variables
2. **Chi-square Test of Independence** – tests whether two categorical variables are independent


## Import Required Libraries

We will use standard Python libraries for data analysis, visualization, and statistical testing.

```python
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

from scipy.stats import spearmanr, chi2_contingency
```

```python
%matplotlib inline
# Set the default figure size through matplotlib's rcParams (avoids the discouraged pylab import)
plt.rcParams['figure.figsize'] = 14, 7
sns.set_style("whitegrid")
```

## Understanding the Dataset

We will use the **mtcars** dataset, which contains technical specifications of automobiles. Several variables in this dataset are **ordinal**, meaning they are numeric but take on a limited number of ranked values.

Examples include:
- `cyl` – number of cylinders
- `vs` – engine shape (0 = V-shaped, 1 = straight)
- `am` – transmission type (0 = automatic, 1 = manual)
- `gear` – number of forward gears


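```python
# Location of mtcars.csv in the course workspace; adjust this path for your own environment
address = '/workspaces/python-for-data-science-and-machine-learning-essential-training-part-1-3006708/data/mtcars.csv'

cars = pd.read_csv(address)
# Give every column a readable name; the first column holds the car model names
cars.columns = ['car_names','mpg','cyl','disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']

cars.head()
```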

## Visual Exploration of Variables

Before applying statistical tests, it's important to visually inspect the data. Pair plots help us determine:

- Whether relationships appear linear or non-linear
- Whether variables take on discrete, ordinal values
- Whether distributions look normal or non-normal
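
Alongside the pair plots, a quick numeric check is to count the distinct values in each column: columns with only a few distinct values are candidates for ordinal or categorical treatment. The sketch below is a minimal illustration under stated assumptions: the small DataFrame `toy` holds invented values (not the mtcars data), and the cutoff `MAX_LEVELS = 5` is an arbitrary, hypothetical threshold.

```python
import pandas as pd

# Invented toy values for illustration only; these are NOT the mtcars data.
toy = pd.DataFrame({
    "mpg": [20.1, 23.4, 17.9, 15.2, 28.8, 19.6],
    "cyl": [6, 4, 8, 8, 4, 6],
    "am": [1, 1, 0, 0, 1, 0],
})

# Number of distinct values in each column.
n_unique = toy.nunique()

# Hypothetical cutoff: columns with at most this many distinct values are
# flagged as candidates for ordinal/categorical treatment.
MAX_LEVELS = 5
discrete_cols = n_unique[n_unique <= MAX_LEVELS].index.tolist()

print(n_unique)
print("Candidate ordinal/categorical columns:", discrete_cols)
```

On the real data the same idea is `cars.nunique()`; choose a cutoff that suits the size of your dataset.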


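```python
# pairplot draws scatter plots for every pair of numeric columns (histograms on the diagonal);
# the text column car_names is skipped automatically
sns.pairplot(cars)
```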

## Selecting Ordinal Variables

The full pair plot contains many variables, which makes it hard to interpret. We therefore focus on four ordinal variables suitable for non-parametric analysis:

- `cyl`
- `vs`
- `am`
- `gear`


```python
x = cars[['cyl', 'vs', 'am', 'gear']]
sns.pairplot(x)
```

## Why Spearman Rank Correlation?

Spearman's rank correlation is appropriate here because:

1. Variables are **ordinal**
2. Relationships appear **non-linear**
3. Distributions are **non-normal**

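The method replaces each value with its rank (tied values share the average rank) and then computes the Pearson correlation of those ranks, so the result depends only on the ordering of the data, not on the raw values.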
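
To make the ranking step concrete, the sketch below computes Spearman's coefficient by hand with SciPy's `rankdata` and `pearsonr`, then checks it against `spearmanr`. The arrays `cyl_toy` and `gear_toy` are invented toy values (not taken from mtcars), chosen to include ties.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Invented toy ordinal data (not mtcars values), with deliberate ties.
cyl_toy = np.array([4, 4, 6, 6, 8, 8, 8, 4])
gear_toy = np.array([4, 5, 4, 3, 3, 3, 5, 4])

# Step 1: replace values by ranks; tied values share the average rank,
# so the three 4s in cyl_toy (sorted positions 1-3) all get rank 2.
cyl_ranks = rankdata(cyl_toy)
gear_ranks = rankdata(gear_toy)

# Step 2: the Pearson correlation of the ranks is Spearman's rho.
rho_manual, _ = pearsonr(cyl_ranks, gear_ranks)
rho_scipy, _ = spearmanr(cyl_toy, gear_toy)

print(f"by hand: {rho_manual:.4f}   spearmanr: {rho_scipy:.4f}")
```

Because only the ranks enter the calculation, any transformation that preserves the order of the values (for example, taking logarithms of positive data) leaves the coefficient unchanged.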

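### Interpreting Spearman's ρ
- **+1** → perfect positive monotonic relationship; values near +1 indicate a strong positive relationship
- **-1** → perfect negative monotonic relationship; values near -1 indicate a strong negative relationship
- **0** → no monotonic relationship; values near 0 indicate a weak one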
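
A value of ±1 requires a perfectly monotonic relationship, not a straight line. The sketch below uses invented data in which `y_toy = exp(x_toy)` always increases but curves sharply, and compares Spearman's ρ with Pearson's r on the same points.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Invented toy data: y increases with x at every step, but not linearly.
x_toy = np.arange(1, 11)
y_toy = np.exp(x_toy)

rho, _ = spearmanr(x_toy, y_toy)  # uses ranks only, so a perfectly monotonic curve gives 1
r, _ = pearsonr(x_toy, y_toy)     # measures linear fit, so the curvature pulls it below 1

print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```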


```python
cyl = cars['cyl']
vs = cars['vs']
am = cars['am']
gear = cars['gear']
```

```python
spearmanr_coefficient, p_value = spearmanr(cyl, vs)
print(f'Spearman Rank Correlation (cyl vs vs): {spearmanr_coefficient:.3f}, p-value: {p_value:.3f}')
```

```python
spearmanr_coefficient, p_value = spearmanr(cyl, am)
print(f'Spearman Rank Correlation (cyl vs am): {spearmanr_coefficient:.3f}, p-value: {p_value:.3f}')
```

```python
spearmanr_coefficient, p_value = spearmanr(cyl, gear)
print(f'Spearman Rank Correlation (cyl vs gear): {spearmanr_coefficient:.3f}, p-value: {p_value:.3f}')
```
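
Testing pairs one call at a time gets repetitive. As a sketch, using an invented toy DataFrame with the same column names (not the real mtcars values), pandas' `DataFrame.corr(method="spearman")` returns every pairwise coefficient at once, and `spearmanr` given a 2-D array (columns as variables) returns matching coefficient and p-value matrices.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Invented toy frame with the notebook's column names; values are NOT from mtcars.
toy_cars = pd.DataFrame({
    "cyl":  [4, 4, 6, 6, 8, 8, 8, 4, 6, 8],
    "vs":   [1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    "am":   [1, 0, 1, 0, 0, 0, 1, 1, 0, 0],
    "gear": [4, 4, 4, 3, 3, 3, 5, 5, 4, 3],
})

# All pairwise Spearman coefficients in one call (pandas ranks ties by their average).
rho_matrix = toy_cars.corr(method="spearman")
print(rho_matrix.round(3))

# SciPy treats each column of a 2-D array as a variable and also returns p-values.
rho_scipy, p_matrix = spearmanr(toy_cars.to_numpy())
print(np.round(p_matrix, 3))
```

In the notebook itself, the equivalent one-liner is `x.corr(method='spearman')` on the four-column frame selected earlier.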

## Chi-square Test of Independence

While Spearman correlation measures the **strength and direction** of a monotonic relationship, the **Chi-square test** answers a different question:

**Are two categorical variables independent of each other?**

### Hypotheses
- **Null hypothesis (H₀):** Variables are independent
- **Alternative hypothesis (H₁):** Variables are related
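
Under H₀, each cell's expected count is its row total times its column total divided by the grand total; the statistic sums (observed - expected)² / expected over all cells and has (rows - 1)(columns - 1) degrees of freedom. The sketch below works this through on an invented 3x2 table of counts (not the mtcars cross-tabulation) and compares the result with `chi2_contingency`.

```python
import numpy as np
from scipy import stats

# Invented 3x2 table of observed counts (rows: three groups, columns: two categories).
observed = np.array([
    [5, 7],
    [4, 4],
    [10, 2],
])

# Expected counts under H0 (independence): row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()
expected_manual = row_totals * col_totals / grand_total

# Pearson's chi-square statistic and its degrees of freedom.
stat_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value = upper-tail area of the chi-square distribution beyond the statistic.
p_manual = stats.chi2.sf(stat_manual, dof_manual)

# SciPy's version; Yates' continuity correction applies only when dof == 1,
# so it does not change this 3x2 result.
stat_sp, p_sp, dof_sp, expected_sp = stats.chi2_contingency(observed)

print(f"manual: chi2 = {stat_manual:.3f}, dof = {dof_manual}, p = {p_manual:.4f}")
print(f"scipy:  chi2 = {stat_sp:.3f}, dof = {dof_sp}, p = {p_sp:.4f}")
```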

### Decision Rule
- **p < 0.05** → Reject H₀ → the variables are associated (not independent)
- **p ≥ 0.05** → Fail to reject H₀ → no evidence of association; this does not prove the variables are independent


```python
# Build a table of observed counts: one row per cyl value, one column per am value
table = pd.crosstab(cyl, am)
# chi2_contingency returns the test statistic, p-value, degrees of freedom, and expected counts
chi2, p, dof, expected = chi2_contingency(table.values)
print(f'Chi-square (cyl vs am): {chi2:.3f}, p-value: {p:.3f}')
```

```python
table = pd.crosstab(cyl, vs)
chi2, p, dof, expected = chi2_contingency(table.values)
print(f'Chi-square (cyl vs vs): {chi2:.3f}, p-value: {p:.3f}')
```

```python
table = pd.crosstab(cyl, gear)
chi2, p, dof, expected = chi2_contingency(table.values)
print(f'Chi-square (cyl vs gear): {chi2:.3f}, p-value: {p:.3f}')
```
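
The chi-square p-value relies on a large-sample approximation that becomes unreliable when expected counts are small, and with only 32 cars, cross-tabulations against three-level variables can have sparse cells. A commonly cited rule of thumb (treated here as an assumption, not a hard requirement) is that no more than about 20% of cells should have an expected count below 5 and none should fall below 1. The sketch below applies that check to the expected counts returned by `chi2_contingency`, using an invented 3x3 table rather than the real data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented 3x3 table of observed counts, for illustration only.
observed = np.array([
    [2, 7, 2],
    [3, 3, 1],
    [10, 2, 2],
])

stat_toy, p_toy, dof_toy, expected_toy = chi2_contingency(observed)

# Rule-of-thumb check (assumed thresholds): share of cells with expected count < 5,
# and whether any cell falls below 1.
small = expected_toy < 5
share_small = small.mean()
print(np.round(expected_toy, 2))
print(f"{small.sum()} of {small.size} cells have an expected count below 5 ({share_small:.0%})")

if share_small > 0.2 or (expected_toy < 1).any():
    print("Expected counts are small; interpret the chi-square p-value with caution.")
```

In the notebook, the same check applies directly to the `expected` array unpacked from each `chi2_contingency` call.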

## Summary

- **Spearman Rank Correlation** quantifies monotonic relationships between ordinal variables
- **Chi-square tests** check whether two categorical variables are statistically independent
- Visual inspection is a critical first step before choosing statistical methods

Together, these non-parametric tools let you analyze ordinal and categorical data when the assumptions behind parametric methods such as Pearson correlation are not met.