## Exercise 6: Chi-Square Test

**Problem Description:**  
Use the chi-square statistic to test whether each non-negative feature is independent of a categorical target.

**Solution Overview:**  
Scale the features to [0, 1], apply `SelectKBest(chi2)`, vary _k_, and plot the scores.
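
To make the score itself less of a black box, here is a minimal sketch of how the chi-square score can be computed by hand and compared with `sklearn.feature_selection.chi2`. It assumes the Iris dataset (suggested by the `load_iris` import in this exercise); the variable names `observed`, `expected`, and `manual_scores` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2
from sklearn.preprocessing import MinMaxScaler

# Assumption: Iris is the dataset used in this exercise.
X, y = load_iris(return_X_y=True)
X_scaled = MinMaxScaler().fit_transform(X)

# One-hot encode the target: Y[i, c] = 1 if sample i belongs to class c.
classes = np.unique(y)
Y = (y[:, None] == classes[None, :]).astype(float)

# Observed "counts": per-class sum of each feature, shape (n_classes, n_features).
observed = Y.T @ X_scaled

# Expected under independence: class frequency times the feature's total.
class_prob = Y.mean(axis=0)
feature_total = X_scaled.sum(axis=0)
expected = np.outer(class_prob, feature_total)

# Chi-square statistic per feature: sum over classes of (O - E)^2 / E.
manual_scores = ((observed - expected) ** 2 / expected).sum(axis=0)

sklearn_scores, p_values = chi2(X_scaled, y)
print("Manual scores match sklearn:", np.allclose(manual_scores, sklearn_scores))
```

The feature values are treated as if they were frequencies, which is why the scaling step matters: the scores depend on the scale of the inputs, not only on their ordering.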



In [None]:
!pip install scikit-learn pandas matplotlib seaborn

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

In [None]:
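# Load the Iris data as a DataFrame (features) and a Series (target)
X, y = load_iris(return_X_y=True, as_frame=True)

# Scale features to [0, 1]; chi2 requires non-negative inputs
scaler = MinMaxScaler()
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)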

In [None]:
# Compute chi-square scores for all features and select the top k=2
selector = SelectKBest(score_func=chi2, k=2)
selector.fit(X_scaled, y)
df_chi2 = pd.DataFrame({
    'feature': X.columns,
    'chi2_score': selector.scores_
}).sort_values('chi2_score', ascending=False)
print("Chi-Square scores:\n", df_chi2)
print("Top 2 features:", list(df_chi2['feature'][:2]))

In [None]:
# Vary k=1,2,3 and plot
for k in [1, 2, 3]:
    s = SelectKBest(score_func=chi2, k=k).fit(X_scaled, y)
    print(f"Top {k} features:", list(X.columns[s.get_support()]))

plt.figure(figsize=(6,4))
sns.barplot(x='chi2_score', y='feature', data=df_chi2)
plt.title('Chi-Square Scores')
plt.show()

### 6.3 Analysis
- Why must features be non-negative for chi-square?

- Vary k from 1 to 3; how stable are selections?

- Plot chi-square scores in a bar chart.
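
On the first question: the chi-square statistic treats feature values as frequencies, so negative values have no meaningful interpretation as observed or expected counts. The sketch below, assuming the Iris data and using `StandardScaler` purely as a way to produce negative values, shows that scikit-learn's `chi2` rejects such input with a `ValueError`, while min-max scaled data is accepted.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Assumption: Iris is the dataset used in this exercise.
X, y = load_iris(return_X_y=True)

# Standardized features are centered at zero, so roughly half the values are negative.
X_std = StandardScaler().fit_transform(X)
try:
    chi2(X_std, y)
    negative_rejected = False
except ValueError as err:
    negative_rejected = True
    print("chi2 rejected standardized input:", err)

# Min-max scaling maps every feature into [0, 1], which chi2 accepts.
X_minmax = MinMaxScaler().fit_transform(X)
scores, p_values = chi2(X_minmax, y)
print("Smallest scaled value:", X_minmax.min())
```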

## Analysis to Include in Code

In [None]:
# k-Sensitivity: which features are selected as k grows from 1 to 4

for k in [1, 2, 3, 4]:
    sel = SelectKBest(chi2, k=k).fit(X_scaled, y)
    print(k, list(X.columns[sel.get_support()]))
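
One way to quantify stability is to check whether the selections are nested, that is, whether the features chosen for a smaller _k_ are always a subset of those chosen for a larger _k_. Because `SelectKBest` ranks features by a single score vector that does not depend on _k_, this is expected to hold. A minimal sketch, assuming the Iris data; the `selected` dictionary and `is_nested` flag are illustrative names.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# Assumption: Iris is the dataset used in this exercise.
X, y = load_iris(return_X_y=True, as_frame=True)
X_scaled = MinMaxScaler().fit_transform(X)

# Record the set of selected feature names for each k.
selected = {}
for k in [1, 2, 3, 4]:
    mask = SelectKBest(chi2, k=k).fit(X_scaled, y).get_support()
    selected[k] = set(X.columns[mask])

# Nested selections: top-1 is inside top-2, top-2 inside top-3, and so on.
is_nested = all(selected[k] <= selected[k + 1] for k in [1, 2, 3])
print("Selections nested:", is_nested)
```

Note that this only checks stability with respect to _k_ on a fixed dataset; stability under resampling of the data is a separate question.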


In [None]:
# Score Distribution

sns.histplot(df_chi2['chi2_score'], bins=5, kde=True)
plt.title("Chi-Square Score Distribution")
plt.show()


In [None]:
# Feature vs. class: class proportions within each bin of the top-scoring feature

feat = df_chi2['feature'].iloc[0]
pd.crosstab(pd.cut(X_scaled[feat], bins=4), y, normalize='index')
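
The binned table above can also be turned into a classical chi-square test of independence. The sketch below uses `scipy.stats.chi2_contingency` on raw counts rather than row proportions. It makes several assumptions: SciPy is not imported elsewhere in this exercise, the feature `petal length (cm)` is picked by hand for illustration, and the four-bin discretization is arbitrary. The resulting statistic is therefore not the same quantity as the `chi2_score` reported by `SelectKBest`.

```python
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.datasets import load_iris

# Assumption: Iris is the dataset used in this exercise.
X, y = load_iris(return_X_y=True, as_frame=True)
feat = "petal length (cm)"  # illustrative choice, not necessarily the top-ranked feature

# Contingency table of raw counts: feature bins (rows) by class (columns).
counts = pd.crosstab(pd.cut(X[feat], bins=4), y)

# Guard: drop empty bins, since zero expected frequencies make the test undefined.
counts = counts.loc[counts.sum(axis=1) > 0]

stat, p_value, dof, expected = chi2_contingency(counts)
print(f"chi2={stat:.2f}, dof={dof}, p={p_value:.3g}")
```

A small p-value here suggests that the binned feature and the class label are not independent, which is consistent with the feature receiving a high chi-square score.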
