# **Categorical relationships:**

When variables are categorical, we don't look at the mean or variance. Instead, we analyze counts, proportions, and associations.

This helps us understand:
- class imbalance
- category dominance 
- relationship between labels
- which categories are statistically related

#### **1.Frequency tables:**

A frequency table counts the `occurrences` of each category.

In [1]:
import pandas as pd

fruits = pd.Series(["apple", "banana", "apple", "orange", "banana", "apple"])

# Count how often each category appears, most frequent first
freq = fruits.value_counts()

print(freq)

apple     3
banana    2
orange    1
Name: count, dtype: int64
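
To make the counting explicit, here is a minimal sketch that tallies the same series by hand with `collections.Counter` and compares it with `value_counts()`. The names `manual_counts`, `top_category`, and `top_count` are illustrative and not part of the original notebook.

```python
from collections import Counter

import pandas as pd

fruits = pd.Series(["apple", "banana", "apple", "orange", "banana", "apple"])

# Tally by hand: one pass over the values, one counter per category
manual_counts = Counter(fruits)

# value_counts() produces the same tallies as a Series, sorted by frequency
freq = fruits.value_counts()

# The most frequent category and how often it occurs
top_category = freq.idxmax()
top_count = int(freq.max())

print(dict(manual_counts))
print(top_category, top_count)
```

Reading off the most frequent category this way is a quick first check for category dominance.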


#### **2.Proportions:**

Instead of raw counts, proportions show each category's `share of the total`: its count divided by the total number of observations.

![image.png](attachment:image.png)

This helps detect:
- class imbalance
- dominant categories 
- fairness issues

In [2]:
# normalize=True divides each count by the total number of observations
proportions = fruits.value_counts(normalize=True)

print(proportions)

apple     0.500000
banana    0.333333
orange    0.166667
Name: proportion, dtype: float64
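
The same proportions can be derived from the raw counts, which shows what `normalize=True` does. This is a sketch under the assumption that a simple largest-to-smallest share ratio is a useful imbalance indicator; `imbalance_ratio` is an illustrative name, not a pandas feature.

```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "apple", "orange", "banana", "apple"])

counts = fruits.value_counts()

# Proportion = count of the category / total number of observations
manual = counts / counts.sum()

# Built-in equivalent
builtin = fruits.value_counts(normalize=True)

# Illustrative imbalance indicator: largest share divided by smallest share
imbalance_ratio = manual.max() / manual.min()

# Multiply by 100 to read the shares as percentages
print((manual * 100).round(1))
print("Imbalance ratio:", imbalance_ratio)
```

A ratio close to 1 suggests balanced categories, while a large ratio points to a dominant category.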


#### **3.Crosstabs:**

A crosstab (contingency table) counts how often each combination of categories occurs, which shows `how two categorical variables interact`.

In [4]:
df = pd.DataFrame({
    "gender": ["M", "F", "M", "F", "M"],
    "purchase": ["Yes", "No", "Yes", "Yes", "No"]
})

# Rows are gender, columns are purchase; each cell counts one combination
ct = pd.crosstab(df["gender"], df["purchase"])

print(ct)

purchase  No  Yes
gender           
F          1    1
M          1    2
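
Raw cell counts are hard to compare when the groups differ in size. The sketch below rebuilds the same data and uses two existing `pd.crosstab` options, `margins=True` for totals and `normalize="index"` for row-wise shares. The names `ct_totals` and `row_share` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "F", "M", "F", "M"],
    "purchase": ["Yes", "No", "Yes", "Yes", "No"],
})

# margins=True adds an "All" row and column holding the totals
ct_totals = pd.crosstab(df["gender"], df["purchase"], margins=True)

# normalize="index" divides each row by its row total,
# giving the purchase share within each gender
row_share = pd.crosstab(df["gender"], df["purchase"], normalize="index")

print(ct_totals)
print(row_share.round(3))
```

Row-wise shares answer questions such as "what fraction of each gender purchased?", which the raw counts only show indirectly.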


#### **4.Chi-square Test:**

The chi-square test asks whether two categorical variables are independent. It compares the observed counts with the counts we would expect if the variables were unrelated.

![image.png](attachment:image.png)

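If the difference is large, the variables are likely related.

- Small statistic → observed counts are close to what independence predicts
- Large statistic → evidence of an association

How large is "large" depends on the table's degrees of freedom, which is why we read the p-value rather than the statistic alone.

This shows association, not causation.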
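
The comparison of observed and expected counts can be written out by hand. This is a minimal sketch, assuming the 2×2 gender/purchase counts from the crosstab in section 3 and no continuity correction. The variable names are illustrative.

```python
import numpy as np

# Observed counts: rows are gender (F, M), columns are purchase (No, Yes)
observed = np.array([[1, 1],
                     [1, 2]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
n = observed.sum()

# Expected count under independence: row total * column total / grand total
expected = row_totals * col_totals / n

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(expected)
print("Chi-square (uncorrected):", round(chi2_stat, 4))
print("Degrees of freedom:", dof)
```

The observed counts sit very close to the expected ones here, so the statistic is small.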

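In [5]:
from scipy.stats import chi2_contingency

# Returns the test statistic, the p-value, the degrees of freedom,
# and the expected counts under independence
chi2, p, dof, expected = chi2_contingency(ct)

print("Chi-square:", chi2)
print("p-value:", p)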


Chi-square: 0.0
p-value: 1.0


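A p-value below 0.05 is conventionally taken as evidence of an association. Here p = 1.0, so this small sample gives no evidence that gender and purchase are related.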
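
The statistic of exactly 0.0 above is worth a second look. For 2×2 tables, `chi2_contingency` applies Yates' continuity correction by default (`correction=True`), which moves each observed count up to 0.5 toward its expected count. The sketch below rebuilds the same table and compares the default with `correction=False`. It also checks the expected counts, since the chi-square approximation is generally considered unreliable when they fall below about 5. The variable names are illustrative.

```python
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.DataFrame({
    "gender": ["M", "F", "M", "F", "M"],
    "purchase": ["Yes", "No", "Yes", "Yes", "No"],
})
ct = pd.crosstab(df["gender"], df["purchase"])

# Default for 2x2 tables: Yates' continuity correction is applied
chi2_corr, p_corr, dof, expected = chi2_contingency(ct)

# Same test without the correction
chi2_raw, p_raw, _, _ = chi2_contingency(ct, correction=False)

print("Corrected:  ", chi2_corr, p_corr)
print("Uncorrected:", chi2_raw, p_raw)

# Rule-of-thumb check on the expected counts
print("Smallest expected count:", expected.min())
```

Either way the p-value is far above 0.05. With only five rows, the table is too small for the test to be informative.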