# 10. Chi-Square Test

## Learning Objectives
- Understand the chi-square test and when to use it
- Distinguish the chi-square goodness-of-fit test from the test of independence
- Compute the chi-square statistic and p-value
- Interpret the results of a chi-square test

## Contents
1. What Is the Chi-Square Test?
2. Chi-Square Goodness of Fit Test
3. Chi-Square Test of Independence
4. Chi-Square Distribution
5. Chi-Square Assumptions
6. Interpretation of Results


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy
from scipy import stats
from scipy.stats import chi2_contingency, chi2
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Set up plotting defaults
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12

print("Libraries imported successfully!")
# scipy.stats has no __version__ attribute; the version lives on the top-level package
print("SciPy version:", scipy.__version__)


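## 1. What Is the Chi-Square Test?

The **chi-square test** is a statistical test for count data on categorical variables. It is used either to test whether two categorical variables are associated or to test whether observed frequencies match a specified distribution.

### Types of Chi-Square Test:
1. **Chi-Square Goodness of Fit**: tests whether the observed frequencies of one categorical variable follow a specified distribution
2. **Chi-Square Test of Independence**: tests whether two categorical variables are independent

### Chi-Square Test Assumptions:
- The data are counts of categorical outcomes (not percentages or means)
- Observations are independent, and each observation falls into exactly one category or cell
- The expected frequency is at least 5 in every cell (a common rule of thumb for the chi-square approximation)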
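
The expected-frequency rule can be checked before running a test. The sketch below is only an illustration: the table values are made up, and it uses `scipy.stats.contingency.expected_freq` to compute the expected counts under independence.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

# Hypothetical 2x3 table of observed counts (illustrative values only).
table = np.array([[12, 5, 3],
                  [8, 9, 2]])

# Expected counts under independence: (row total * column total) / N.
expected = expected_freq(table)

# Rule of thumb: every expected count should be at least 5.
small_cells = expected < 5
rule_ok = not small_cells.any()

print(np.round(expected, 2))
print("All expected counts >= 5:", rule_ok)
print("Number of cells below 5:", int(small_cells.sum()))
```

When some cells fall below 5, a common remedy is to merge sparse categories before testing.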


In [None]:
# Chi-square test demo
np.random.seed(42)

# Chi-square goodness-of-fit test
print("=== CHI-SQUARE GOODNESS OF FIT TEST ===")
# Data: observed and expected frequencies
observed = np.array([25, 30, 20, 15, 10])  # observed frequencies
expected = np.array([20, 20, 20, 20, 20])  # expected frequencies (uniform)

print(f"Observed frequencies: {observed}")
print(f"Expected frequencies: {expected}")

# Run the goodness-of-fit test
chi2_stat, p_value = stats.chisquare(observed, expected)

print(f"\nChi-square goodness-of-fit result:")
print(f"Chi-square statistic: {chi2_stat:.4f}")
print(f"p-value: {p_value:.4f}")

# Interpretation
alpha = 0.05
if p_value < alpha:
    print(f"Decision: reject H0 (p < {alpha})")
    print("Conclusion: the data do not follow the expected distribution")
else:
    print(f"Decision: fail to reject H0 (p >= {alpha})")
    print("Conclusion: no significant evidence that the data deviate from the expected distribution")

# Chi-square test of independence
print("\n=== CHI-SQUARE TEST OF INDEPENDENCE ===")
# Data: contingency table
contingency_table = np.array([
    [20, 30, 10],  # Group A
    [15, 25, 20],  # Group B
    [10, 20, 15]   # Group C
])

print("Contingency table:")
print(contingency_table)

# Run the test of independence
chi2_stat_ind, p_value_ind, dof, expected_ind = chi2_contingency(contingency_table)

print(f"\nChi-square test of independence result:")
print(f"Chi-square statistic: {chi2_stat_ind:.4f}")
print(f"p-value: {p_value_ind:.4f}")
print(f"Degrees of freedom: {dof}")
print(f"Expected frequencies:")
print(expected_ind)

# Interpretation
if p_value_ind < alpha:
    print(f"Decision: reject H0 (p < {alpha})")
    print("Conclusion: the two variables are associated")
else:
    print(f"Decision: fail to reject H0 (p >= {alpha})")
    print("Conclusion: no significant evidence of an association between the two variables")

# Visualization
plt.figure(figsize=(15, 10))

# Plot 1: Goodness of fit - Bar chart
plt.subplot(2, 3, 1)
categories = ['A', 'B', 'C', 'D', 'E']
x = np.arange(len(categories))
width = 0.35
plt.bar(x - width/2, observed, width, label='Observed', alpha=0.7, color='skyblue')
plt.bar(x + width/2, expected, width, label='Expected', alpha=0.7, color='orange')
plt.xlabel('Category')
plt.ylabel('Frequency')
plt.title('Chi-Square Goodness of Fit')
plt.xticks(x, categories)
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 2: Goodness of fit - Residuals
plt.subplot(2, 3, 2)
residuals = observed - expected
plt.bar(categories, residuals, alpha=0.7, color='red')
plt.axhline(0, color='black', linestyle='-', linewidth=1)
plt.xlabel('Category')
plt.ylabel('Residuals (Obs - Exp)')
plt.title('Residuals Goodness of Fit')
plt.grid(True, alpha=0.3)

# Plot 3: Test of independence - Heatmap
plt.subplot(2, 3, 3)
sns.heatmap(contingency_table, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['X1', 'X2', 'X3'], yticklabels=['A', 'B', 'C'])
plt.title('Contingency Table')
plt.xlabel('Variable X')
plt.ylabel('Variable Y')

# Plot 4: Test of independence - Expected vs Observed
plt.subplot(2, 3, 4)
observed_flat = contingency_table.flatten()
expected_flat = expected_ind.flatten()
plt.scatter(expected_flat, observed_flat, alpha=0.7, s=100, color='blue')
plt.plot([min(expected_flat), max(expected_flat)], [min(expected_flat), max(expected_flat)], 'r--', alpha=0.7)
plt.xlabel('Expected')
plt.ylabel('Observed')
plt.title('Expected vs Observed')
plt.grid(True, alpha=0.3)

# Plot 5: Chi-square distribution
plt.subplot(2, 3, 5)
x = np.linspace(0, 20, 100)
chi2_dist = chi2.pdf(x, dof)
plt.plot(x, chi2_dist, 'b-', linewidth=2, label=f'Chi-square (df={dof})')
plt.axvline(chi2_stat_ind, color='red', linestyle='--', linewidth=2, label=f'Chi-square stat: {chi2_stat_ind:.2f}')
plt.xlabel('Chi-square value')
plt.ylabel('Density')
plt.title('Chi-square Distribution')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 6: Residuals test of independence
plt.subplot(2, 3, 6)
residuals_ind = contingency_table - expected_ind
sns.heatmap(residuals_ind, annot=True, fmt='.1f', cmap='RdBu_r', center=0,
            xticklabels=['X1', 'X2', 'X3'], yticklabels=['A', 'B', 'C'])
plt.title('Residuals Test of Independence')
plt.xlabel('Variable X')
plt.ylabel('Variable Y')

plt.tight_layout()
plt.show()


## 2. Chi-Square Goodness of Fit Test

The **chi-square goodness-of-fit test** checks whether the observed frequencies of a categorical variable are consistent with a specified distribution.

### Steps:
1. **Hypotheses**:
   - H0: The data follow the expected distribution
   - H1: The data do not follow the expected distribution

2. **Test statistic**:
   ```
   χ² = Σ (Oi - Ei)² / Ei
   ```
   where:
   - Oi = observed frequency in category i
   - Ei = expected frequency in category i

3. **Decision rule**:
   - Reject H0 if χ² > χ²α,df or, equivalently, if p-value < α
   - df = k - 1 - m, where k is the number of categories and m is the number of distribution parameters estimated from the data (m = 0 for a fully specified distribution, giving df = k - 1)
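
The steps above can be traced by hand for the uniform example used in this notebook (observed counts 25, 30, 20, 15, 10 against 20 expected per category). This is a minimal sketch that computes the statistic directly from the formula and cross-checks it against `scipy.stats.chisquare`.

```python
import numpy as np
from scipy import stats

observed = np.array([25, 30, 20, 15, 10])
expected = np.full(5, observed.sum() / 5)   # uniform: 20 per category

# Test statistic: sum of (O - E)^2 / E over the k categories.
chi2_manual = np.sum((observed - expected) ** 2 / expected)

df = len(observed) - 1                      # k - 1, no parameters estimated
alpha = 0.05
critical = stats.chi2.ppf(1 - alpha, df)    # upper-tail critical value
p_manual = stats.chi2.sf(chi2_manual, df)   # P(chi-square with df >= statistic)

# Cross-check against SciPy's implementation.
chi2_scipy, p_scipy = stats.chisquare(observed, expected)

print(f"chi2 = {chi2_manual:.4f}, critical value = {critical:.4f}, p = {p_manual:.4f}")
print("Reject H0:", chi2_manual > critical)
```

Here the statistic is (25 + 100 + 0 + 25 + 100) / 20 = 12.5, which exceeds the critical value for df = 4 at α = 0.05, so both decision rules reject H0.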

### Example Applications:
- Testing for a normal distribution (after binning the data)
- Testing for a uniform distribution
- Testing for a binomial distribution
- Testing for a Poisson distribution
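
Two practical details matter when testing a fitted distribution: the sparse tail should be pooled so that expected counts stay large enough and the observed and expected totals match, and one degree of freedom is lost per estimated parameter. The following is a sketch under stated assumptions: the data are simulated, and `lam_hat` and the cutoff `k_max` are illustrative names and values, not part of the notebook.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.poisson(3, size=1000)          # simulated data, for illustration only

lam_hat = sample.mean()                     # one parameter estimated from the data
k_max = 8                                   # assumed cutoff: last bin means "k_max or more"

# Observed counts for 0, 1, ..., k_max - 1 plus one pooled upper-tail bin.
observed = np.bincount(np.minimum(sample, k_max), minlength=k_max + 1)

# Matching probabilities; the last bin takes the whole upper tail, so they sum to 1.
probs = stats.poisson.pmf(np.arange(k_max + 1), lam_hat)
probs[-1] = stats.poisson.sf(k_max - 1, lam_hat)
expected = probs * sample.size

# ddof=1 removes one extra degree of freedom for the estimated lambda.
stat, p_value = stats.chisquare(observed, expected, ddof=1)
df = len(observed) - 1 - 1

print(f"Smallest expected count: {expected.min():.2f}")
print(f"chi2 = {stat:.4f}, df = {df}, p = {p_value:.4f}")
```

Pooling the tail keeps the observed and expected totals equal, which `scipy.stats.chisquare` requires.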


In [None]:
# Chi-square goodness-of-fit demonstration
print("=== CHI-SQUARE GOODNESS OF FIT DEMONSTRATION ===")

# 1. Uniform distribution test
print("\n1. UNIFORM DISTRIBUTION TEST:")
# Data: observed frequencies for 5 categories
observed_uniform = np.array([25, 30, 20, 15, 10])
expected_uniform = np.array([20, 20, 20, 20, 20])  # Uniform distribution

print(f"Observed frequencies: {observed_uniform}")
print(f"Expected frequencies: {expected_uniform}")

# Run the goodness-of-fit test
chi2_stat_uniform, p_value_uniform = stats.chisquare(observed_uniform, expected_uniform)

print(f"\nChi-square goodness-of-fit result:")
print(f"Chi-square statistic: {chi2_stat_uniform:.4f}")
print(f"p-value: {p_value_uniform:.4f}")

# Interpretation
alpha = 0.05
if p_value_uniform < alpha:
    print(f"Decision: reject H0 (p < {alpha})")
    print("Conclusion: the data do not follow a uniform distribution")
else:
    print(f"Decision: fail to reject H0 (p >= {alpha})")
    print("Conclusion: no significant evidence against a uniform distribution")

# 2. Normal distribution test
print("\n2. NORMAL DISTRIBUTION TEST:")
# Simulate normal data
np.random.seed(42)
data_normal = np.random.normal(50, 10, 1000)

# Bin the data and count the observed frequency per bin
hist, bin_edges = np.histogram(data_normal, bins=10)
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2

# Expected frequencies under a normal distribution fitted to the sample
mean_data = np.mean(data_normal)
std_data = np.std(data_normal)
bin_probs = np.diff(stats.norm.cdf(bin_edges, mean_data, std_data))
# The bins only span [min, max], so the probabilities sum to slightly less than 1.
# Rescale so that expected and observed totals match, as stats.chisquare requires.
expected_normal = bin_probs / bin_probs.sum() * len(data_normal)

print(f"Sample mean: {mean_data:.2f}")
print(f"Sample std: {std_data:.2f}")
print(f"Observed frequencies: {hist}")
print(f"Expected frequencies: {expected_normal}")
# Caution: the outermost bins can have expected frequencies below 5
print(f"Bins with expected frequency < 5: {(expected_normal < 5).sum()}")

# Run the goodness-of-fit test.
# ddof=2 because the mean and standard deviation were estimated from the data,
# so the reference distribution has k - 1 - 2 degrees of freedom.
chi2_stat_normal, p_value_normal = stats.chisquare(hist, expected_normal, ddof=2)

print(f"\nChi-square goodness-of-fit result:")
print(f"Chi-square statistic: {chi2_stat_normal:.4f}")
print(f"p-value: {p_value_normal:.4f}")

# Interpretation
if p_value_normal < alpha:
    print(f"Decision: reject H0 (p < {alpha})")
    print("Conclusion: the data do not follow a normal distribution")
else:
    print(f"Decision: fail to reject H0 (p >= {alpha})")
    print("Conclusion: no significant evidence against a normal distribution")

# 3. Binomial distribution test
print("\n3. BINOMIAL DISTRIBUTION TEST:")
# Data: number of successes in 20 trials
n_trials = 20
p_success = 0.3
observed_binomial = np.arange(n_trials + 1)  # possible outcomes 0..20

# Simulate binomial data and count how often each outcome occurs
np.random.seed(42)
data_binomial = np.random.binomial(n_trials, p_success, 1000)
freq_binomial = np.bincount(data_binomial, minlength=n_trials + 1)

# Expected frequencies under Binomial(n_trials, p_success)
expected_binomial = stats.binom.pmf(observed_binomial, n_trials, p_success) * len(data_binomial)

print(f"n = {n_trials}, p = {p_success}")
print(f"Observed frequencies: {freq_binomial}")
print(f"Expected frequencies: {expected_binomial}")
# Caution: many outcomes in the tails have expected frequencies far below 5,
# so the chi-square approximation is unreliable unless those outcomes are pooled.
print(f"Categories with expected frequency < 5: {(expected_binomial < 5).sum()}")

# Run the goodness-of-fit test
chi2_stat_binomial, p_value_binomial = stats.chisquare(freq_binomial, expected_binomial)

print(f"\nChi-square goodness-of-fit result:")
print(f"Chi-square statistic: {chi2_stat_binomial:.4f}")
print(f"p-value: {p_value_binomial:.4f}")

# Interpretation
if p_value_binomial < alpha:
    print(f"Decision: reject H0 (p < {alpha})")
    print("Conclusion: the data do not follow the binomial distribution")
else:
    print(f"Decision: fail to reject H0 (p >= {alpha})")
    print("Conclusion: no significant evidence against the binomial distribution")

# 4. Poisson distribution test
print("\n4. POISSON DISTRIBUTION TEST:")
# Data: number of events per time interval
lambda_poisson = 3
observed_poisson = np.arange(11)  # outcomes 0..9, with the last bin meaning "10 or more"

# Simulate Poisson data
np.random.seed(42)
data_poisson = np.random.poisson(lambda_poisson, 1000)
# Values above 10 are counted in the last bin so that no observation is dropped
freq_poisson = np.bincount(np.minimum(data_poisson, 10), minlength=11)

# Expected frequencies; the last bin takes the whole upper tail P(X >= 10),
# so the expected total equals the observed total, as stats.chisquare requires
probs_poisson = stats.poisson.pmf(observed_poisson, lambda_poisson)
probs_poisson[-1] = stats.poisson.sf(9, lambda_poisson)
expected_poisson = probs_poisson * len(data_poisson)

print(f"Lambda = {lambda_poisson}")
print(f"Observed frequencies: {freq_poisson}")
print(f"Expected frequencies: {expected_poisson}")

# Run the goodness-of-fit test
chi2_stat_poisson, p_value_poisson = stats.chisquare(freq_poisson, expected_poisson)

print(f"\nChi-square goodness-of-fit result:")
print(f"Chi-square statistic: {chi2_stat_poisson:.4f}")
print(f"p-value: {p_value_poisson:.4f}")

# Interpretation
if p_value_poisson < alpha:
    print(f"Decision: reject H0 (p < {alpha})")
    print("Conclusion: the data do not follow the Poisson distribution")
else:
    print(f"Decision: fail to reject H0 (p >= {alpha})")
    print("Conclusion: no significant evidence against the Poisson distribution")

# 5. Goodness-of-fit visualization
plt.figure(figsize=(20, 15))

# Plot 1: Uniform Distribution
plt.subplot(3, 4, 1)
categories = ['A', 'B', 'C', 'D', 'E']
x = np.arange(len(categories))
width = 0.35
plt.bar(x - width/2, observed_uniform, width, label='Observed', alpha=0.7, color='skyblue')
plt.bar(x + width/2, expected_uniform, width, label='Expected', alpha=0.7, color='orange')
plt.xlabel('Category')
plt.ylabel('Frequency')
plt.title('Uniform Distribution Test')
plt.xticks(x, categories)
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 2: Normal Distribution
plt.subplot(3, 4, 2)
plt.hist(data_normal, bins=10, alpha=0.7, color='skyblue', density=True, label='Observed')
x_norm = np.linspace(data_normal.min(), data_normal.max(), 100)
y_norm = stats.norm.pdf(x_norm, mean_data, std_data)
plt.plot(x_norm, y_norm, 'r-', linewidth=2, label='Expected Normal')
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Normal Distribution Test')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 3: Binomial Distribution
plt.subplot(3, 4, 3)
x_binom = np.arange(21)
plt.bar(x_binom, freq_binomial, alpha=0.7, color='skyblue', label='Observed')
plt.plot(x_binom, expected_binomial, 'ro-', linewidth=2, label='Expected')
plt.xlabel('Number of Successes')
plt.ylabel('Frequency')
plt.title('Binomial Distribution Test')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 4: Poisson Distribution
plt.subplot(3, 4, 4)
x_pois = np.arange(11)
plt.bar(x_pois, freq_poisson, alpha=0.7, color='skyblue', label='Observed')
plt.plot(x_pois, expected_poisson, 'ro-', linewidth=2, label='Expected')
plt.xlabel('Number of Events')
plt.ylabel('Frequency')
plt.title('Poisson Distribution Test')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 5: Residuals Uniform
plt.subplot(3, 4, 5)
residuals_uniform = observed_uniform - expected_uniform
plt.bar(categories, residuals_uniform, alpha=0.7, color='red')
plt.axhline(0, color='black', linestyle='-', linewidth=1)
plt.xlabel('Category')
plt.ylabel('Residuals (Obs - Exp)')
plt.title('Residuals Uniform Test')
plt.grid(True, alpha=0.3)

# Plot 6: Residuals Normal
plt.subplot(3, 4, 6)
residuals_normal = hist - expected_normal
plt.bar(range(len(residuals_normal)), residuals_normal, alpha=0.7, color='red')
plt.axhline(0, color='black', linestyle='-', linewidth=1)
plt.xlabel('Bin')
plt.ylabel('Residuals (Obs - Exp)')
plt.title('Residuals Normal Test')
plt.grid(True, alpha=0.3)

# Plot 7: Residuals Binomial
plt.subplot(3, 4, 7)
residuals_binomial = freq_binomial - expected_binomial
plt.bar(x_binom, residuals_binomial, alpha=0.7, color='red')
plt.axhline(0, color='black', linestyle='-', linewidth=1)
plt.xlabel('Number of Successes')
plt.ylabel('Residuals (Obs - Exp)')
plt.title('Residuals Binomial Test')
plt.grid(True, alpha=0.3)

# Plot 8: Residuals Poisson
plt.subplot(3, 4, 8)
residuals_poisson = freq_poisson - expected_poisson
plt.bar(x_pois, residuals_poisson, alpha=0.7, color='red')
plt.axhline(0, color='black', linestyle='-', linewidth=1)
plt.xlabel('Number of Events')
plt.ylabel('Residuals (Obs - Exp)')
plt.title('Residuals Poisson Test')
plt.grid(True, alpha=0.3)

# Plot 9: Chi-square Distribution
plt.subplot(3, 4, 9)
x_chi2 = np.linspace(0, 20, 100)
df_uniform = len(observed_uniform) - 1
chi2_dist_uniform = chi2.pdf(x_chi2, df_uniform)
plt.plot(x_chi2, chi2_dist_uniform, 'b-', linewidth=2, label=f'Chi-square (df={df_uniform})')
plt.axvline(chi2_stat_uniform, color='red', linestyle='--', linewidth=2, label=f'Chi-square stat: {chi2_stat_uniform:.2f}')
plt.xlabel('Chi-square value')
plt.ylabel('Density')
plt.title('Chi-square Distribution (Uniform)')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 10: Chi-square Distribution (Normal)
plt.subplot(3, 4, 10)
df_normal = len(hist) - 1 - 2  # k - 1, minus 2 parameters (mean, std) estimated from the data
chi2_dist_normal = chi2.pdf(x_chi2, df_normal)
plt.plot(x_chi2, chi2_dist_normal, 'b-', linewidth=2, label=f'Chi-square (df={df_normal})')
plt.axvline(chi2_stat_normal, color='red', linestyle='--', linewidth=2, label=f'Chi-square stat: {chi2_stat_normal:.2f}')
plt.xlabel('Chi-square value')
plt.ylabel('Density')
plt.title('Chi-square Distribution (Normal)')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 11: Chi-square Distribution (Binomial)
plt.subplot(3, 4, 11)
df_binomial = len(freq_binomial) - 1
chi2_dist_binomial = chi2.pdf(x_chi2, df_binomial)
plt.plot(x_chi2, chi2_dist_binomial, 'b-', linewidth=2, label=f'Chi-square (df={df_binomial})')
plt.axvline(chi2_stat_binomial, color='red', linestyle='--', linewidth=2, label=f'Chi-square stat: {chi2_stat_binomial:.2f}')
plt.xlabel('Chi-square value')
plt.ylabel('Density')
plt.title('Chi-square Distribution (Binomial)')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 12: Chi-square Distribution (Poisson)
plt.subplot(3, 4, 12)
df_poisson = len(freq_poisson) - 1
chi2_dist_poisson = chi2.pdf(x_chi2, df_poisson)
plt.plot(x_chi2, chi2_dist_poisson, 'b-', linewidth=2, label=f'Chi-square (df={df_poisson})')
plt.axvline(chi2_stat_poisson, color='red', linestyle='--', linewidth=2, label=f'Chi-square stat: {chi2_stat_poisson:.2f}')
plt.xlabel('Chi-square value')
plt.ylabel('Density')
plt.title('Chi-square Distribution (Poisson)')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 6. Summary and recommendations
print("\n6. SUMMARY AND RECOMMENDATIONS:")
print("   - Chi-square goodness of fit: tests whether the data follow a specified distribution")
print("   - Assumption: expected frequency ≥ 5 in every cell")
print("   - Interpretation: p-value < α means the data deviate significantly from the expected distribution")
print("   - Applications: testing normal, uniform, binomial, and Poisson distributions")
print("   - Caution: if the assumption fails, pool sparse categories or use an exact or simulation-based test")
print("   - Always inspect the residuals to see where the deviations occur")


## 3. Chi-Square Test of Independence

The **chi-square test of independence** checks whether two categorical variables, cross-classified in a contingency table, are independent.

### Steps:
1. **Hypotheses**:
   - H0: The two variables are independent (no association)
   - H1: The two variables are dependent (associated)

2. **Test statistic**:
   ```
   χ² = Σ Σ (Oij - Eij)² / Eij
   ```
   where:
   - Oij = observed frequency in row i, column j
   - Eij = expected frequency in row i, column j
   - Eij = (Ri × Cj) / N, with Ri the total of row i, Cj the total of column j, and N the grand total

3. **Decision rule**:
   - Reject H0 if χ² > χ²α,df or, equivalently, if p-value < α
   - df = (r-1)(c-1) (r = number of rows, c = number of columns)
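
The expected-frequency formula and the statistic can be computed by hand and compared with `chi2_contingency`. The sketch below uses the 3×3 table from this notebook's first example; for tables with more than one degree of freedom SciPy applies no continuity correction, so the two results should agree.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

table = np.array([[20, 30, 10],
                  [15, 25, 20],
                  [10, 20, 15]])

row_totals = table.sum(axis=1)              # R_i
col_totals = table.sum(axis=0)              # C_j
n = table.sum()                             # N

# E_ij = (R_i * C_j) / N, built for all cells at once with an outer product.
expected = np.outer(row_totals, col_totals) / n

chi2_manual = np.sum((table - expected) ** 2 / expected)
df = (table.shape[0] - 1) * (table.shape[1] - 1)
p_manual = chi2.sf(chi2_manual, df)

# Cross-check against SciPy.
stat, p_value, dof, expected_scipy = chi2_contingency(table)

print(f"manual: chi2 = {chi2_manual:.4f}, df = {df}, p = {p_manual:.4f}")
print(f"scipy:  chi2 = {stat:.4f}, df = {dof}, p = {p_value:.4f}")
```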

### Example Applications:
- Testing the association between gender and product preference
- Testing the association between education level and employment status
- Testing the association between age group and brand choice
- Testing the association between sex and hobby
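
In practice the data usually arrive as one record per respondent rather than as a finished table. The sketch below shows one way to build the contingency table with `pandas.crosstab`. The raw records are reconstructed from the sex-versus-hobby counts used in this notebook, and the column names `sex` and `hobby` are illustrative assumptions.

```python
import pandas as pd
from scipy.stats import chi2_contingency

sexes = ["Male", "Female"]
hobbies = ["Sports", "Music", "Reading", "Gaming"]
counts = [[40, 20, 15, 25],
          [25, 35, 30, 10]]

# Expand the counts into one row per respondent (illustrative raw data).
rows = [(s, h)
        for s, row in zip(sexes, counts)
        for h, c in zip(hobbies, row)
        for _ in range(c)]
records = pd.DataFrame(rows, columns=["sex", "hobby"])

# Cross-tabulate the two categorical columns back into a contingency table.
# Note that crosstab sorts the row and column labels alphabetically.
table = pd.crosstab(records["sex"], records["hobby"])
stat, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2 = {stat:.4f}, df = {dof}, p = {p_value:.4f}")
```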


In [None]:
# Chi-square test of independence demonstration
print("=== CHI-SQUARE TEST OF INDEPENDENCE DEMONSTRATION ===")

# 1. Gender and product preference
print("\n1. GENDER AND PRODUCT PREFERENCE:")
# Data: contingency table
contingency_table1 = np.array([
    [20, 30, 10],  # Male
    [15, 25, 20],  # Female
    [10, 20, 15]   # Other
])

print("Contingency table:")
print("           Product A  Product B  Product C")
print("Male           20         30         10")
print("Female         15         25         20")
print("Other          10         20         15")

# Run the test of independence
chi2_stat1, p_value1, dof1, expected1 = chi2_contingency(contingency_table1)

print(f"\nChi-square test of independence result:")
print(f"Chi-square statistic: {chi2_stat1:.4f}")
print(f"p-value: {p_value1:.4f}")
print(f"Degrees of freedom: {dof1}")
print(f"Expected frequencies:")
print(expected1)

# Interpretation
alpha = 0.05
if p_value1 < alpha:
    print(f"Decision: reject H0 (p < {alpha})")
    print("Conclusion: gender and product preference are associated")
else:
    print(f"Decision: fail to reject H0 (p >= {alpha})")
    print("Conclusion: no significant evidence of an association between gender and product preference")

# 2. Education level and employment status
print("\n2. EDUCATION LEVEL AND EMPLOYMENT STATUS:")
# Data: contingency table
contingency_table2 = np.array([
    [25, 15, 10],  # High school
    [20, 30, 20],  # Undergraduate
    [10, 25, 35]   # Postgraduate
])

print("Contingency table:")
print("               Employed  Unemployed  Self-employed")
print("High school       25         15           10")
print("Undergraduate     20         30           20")
print("Postgraduate      10         25           35")

# Run the test of independence
chi2_stat2, p_value2, dof2, expected2 = chi2_contingency(contingency_table2)

print(f"\nChi-square test of independence result:")
print(f"Chi-square statistic: {chi2_stat2:.4f}")
print(f"p-value: {p_value2:.4f}")
print(f"Degrees of freedom: {dof2}")
print(f"Expected frequencies:")
print(expected2)

# Interpretation
if p_value2 < alpha:
    print(f"Decision: reject H0 (p < {alpha})")
    print("Conclusion: education level and employment status are associated")
else:
    print(f"Decision: fail to reject H0 (p >= {alpha})")
    print("Conclusion: no significant evidence of an association between education level and employment status")

# 3. Age group and brand choice
print("\n3. AGE GROUP AND BRAND CHOICE:")
# Data: contingency table
contingency_table3 = np.array([
    [30, 20, 10],  # 18-25
    [25, 35, 15],  # 26-35
    [15, 25, 25],  # 36-45
    [10, 20, 30]   # 46+
])

print("Contingency table:")
print("        Brand A  Brand B  Brand C")
print("18-25     30       20       10")
print("26-35     25       35       15")
print("36-45     15       25       25")
print("46+       10       20       30")

# Run the test of independence
chi2_stat3, p_value3, dof3, expected3 = chi2_contingency(contingency_table3)

print(f"\nChi-square test of independence result:")
print(f"Chi-square statistic: {chi2_stat3:.4f}")
print(f"p-value: {p_value3:.4f}")
print(f"Degrees of freedom: {dof3}")
print(f"Expected frequencies:")
print(expected3)

# Interpretation
if p_value3 < alpha:
    print(f"Decision: reject H0 (p < {alpha})")
    print("Conclusion: age group and brand choice are associated")
else:
    print(f"Decision: fail to reject H0 (p >= {alpha})")
    print("Conclusion: no significant evidence of an association between age group and brand choice")

# 4. Sex and hobby
print("\n4. SEX AND HOBBY:")
# Data: contingency table
contingency_table4 = np.array([
    [40, 20, 15, 25],  # Male
    [25, 35, 30, 10]   # Female
])

print("Contingency table:")
print("         Sports  Music  Reading  Gaming")
print("Male       40     20      15       25")
print("Female     25     35      30       10")

# Run the test of independence
chi2_stat4, p_value4, dof4, expected4 = chi2_contingency(contingency_table4)

print(f"\nChi-square test of independence result:")
print(f"Chi-square statistic: {chi2_stat4:.4f}")
print(f"p-value: {p_value4:.4f}")
print(f"Degrees of freedom: {dof4}")
print(f"Expected frequencies:")
print(expected4)

# Interpretation
if p_value4 < alpha:
    print(f"Decision: reject H0 (p < {alpha})")
    print("Conclusion: sex and hobby are associated")
else:
    print(f"Decision: fail to reject H0 (p >= {alpha})")
    print("Conclusion: no significant evidence of an association between sex and hobby")

# 5. Test of independence visualization
plt.figure(figsize=(20, 15))

# Plot 1: Gender vs Product Preference
plt.subplot(3, 4, 1)
sns.heatmap(contingency_table1, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Product A', 'Product B', 'Product C'],
            yticklabels=['Male', 'Female', 'Other'])
plt.title('Gender vs Product Preference')
plt.xlabel('Product')
plt.ylabel('Gender')

# Plot 2: Education vs Employment Status
plt.subplot(3, 4, 2)
sns.heatmap(contingency_table2, annot=True, fmt='d', cmap='Greens',
            xticklabels=['Employed', 'Unemployed', 'Self-employed'],
            yticklabels=['High school', 'Undergraduate', 'Postgraduate'])
plt.title('Education vs Employment Status')
plt.xlabel('Employment Status')
plt.ylabel('Education')

# Plot 3: Age Group vs Brand Choice
plt.subplot(3, 4, 3)
sns.heatmap(contingency_table3, annot=True, fmt='d', cmap='Reds',
            xticklabels=['Brand A', 'Brand B', 'Brand C'],
            yticklabels=['18-25', '26-35', '36-45', '46+'])
plt.title('Age Group vs Brand Choice')
plt.xlabel('Brand')
plt.ylabel('Age Group')

# Plot 4: Sex vs Hobby
plt.subplot(3, 4, 4)
sns.heatmap(contingency_table4, annot=True, fmt='d', cmap='Purples',
            xticklabels=['Sports', 'Music', 'Reading', 'Gaming'],
            yticklabels=['Male', 'Female'])
plt.title('Sex vs Hobby')
plt.xlabel('Hobby')
plt.ylabel('Sex')

# Plot 5: Residuals, Gender vs Product
plt.subplot(3, 4, 5)
residuals1 = contingency_table1 - expected1
sns.heatmap(residuals1, annot=True, fmt='.1f', cmap='RdBu_r', center=0,
            xticklabels=['Product A', 'Product B', 'Product C'],
            yticklabels=['Male', 'Female', 'Other'])
plt.title('Residuals: Gender vs Product')
plt.xlabel('Product')
plt.ylabel('Gender')

# Plot 6: Residuals, Education vs Employment Status
plt.subplot(3, 4, 6)
residuals2 = contingency_table2 - expected2
sns.heatmap(residuals2, annot=True, fmt='.1f', cmap='RdBu_r', center=0,
            xticklabels=['Employed', 'Unemployed', 'Self-employed'],
            yticklabels=['High school', 'Undergraduate', 'Postgraduate'])
plt.title('Residuals: Education vs Employment')
plt.xlabel('Employment Status')
plt.ylabel('Education')

# Plot 7: Residuals, Age Group vs Brand
plt.subplot(3, 4, 7)
residuals3 = contingency_table3 - expected3
sns.heatmap(residuals3, annot=True, fmt='.1f', cmap='RdBu_r', center=0,
            xticklabels=['Brand A', 'Brand B', 'Brand C'],
            yticklabels=['18-25', '26-35', '36-45', '46+'])
plt.title('Residuals: Age Group vs Brand')
plt.xlabel('Brand')
plt.ylabel('Age Group')

# Plot 8: Residuals, Sex vs Hobby
plt.subplot(3, 4, 8)
residuals4 = contingency_table4 - expected4
sns.heatmap(residuals4, annot=True, fmt='.1f', cmap='RdBu_r', center=0,
            xticklabels=['Sports', 'Music', 'Reading', 'Gaming'],
            yticklabels=['Male', 'Female'])
plt.title('Residuals: Sex vs Hobby')
plt.xlabel('Hobby')
plt.ylabel('Sex')

# Plot 9: Chi-square Distribution (Gender vs Produk)
plt.subplot(3, 4, 9)
x_chi2 = np.linspace(0, 20, 100)
chi2_dist1 = chi2.pdf(x_chi2, dof1)
plt.plot(x_chi2, chi2_dist1, 'b-', linewidth=2, label=f'Chi-square (df={dof1})')
plt.axvline(chi2_stat1, color='red', linestyle='--', linewidth=2, label=f'Chi-square stat: {chi2_stat1:.2f}')
plt.xlabel('Chi-square value')
plt.ylabel('Density')
plt.title('Chi-square Distribution (Gender vs Product)')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 10: Chi-square Distribution (Pendidikan vs Pekerjaan)
plt.subplot(3, 4, 10)
chi2_dist2 = chi2.pdf(x_chi2, dof2)
plt.plot(x_chi2, chi2_dist2, 'b-', linewidth=2, label=f'Chi-square (df={dof2})')
plt.axvline(chi2_stat2, color='red', linestyle='--', linewidth=2, label=f'Chi-square stat: {chi2_stat2:.2f}')
plt.xlabel('Chi-square value')
plt.ylabel('Density')
plt.title('Chi-square Distribution (Education vs Employment)')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 11: Chi-square Distribution (Umur vs Merek)
plt.subplot(3, 4, 11)
chi2_dist3 = chi2.pdf(x_chi2, dof3)
plt.plot(x_chi2, chi2_dist3, 'b-', linewidth=2, label=f'Chi-square (df={dof3})')
plt.axvline(chi2_stat3, color='red', linestyle='--', linewidth=2, label=f'Chi-square stat: {chi2_stat3:.2f}')
plt.xlabel('Chi-square value')
plt.ylabel('Density')
plt.title('Chi-square Distribution (Age Group vs Brand)')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 12: Chi-square Distribution (Jenis Kelamin vs Hobi)
plt.subplot(3, 4, 12)
chi2_dist4 = chi2.pdf(x_chi2, dof4)
plt.plot(x_chi2, chi2_dist4, 'b-', linewidth=2, label=f'Chi-square (df={dof4})')
plt.axvline(chi2_stat4, color='red', linestyle='--', linewidth=2, label=f'Chi-square stat: {chi2_stat4:.2f}')
plt.xlabel('Chi-square value')
plt.ylabel('Density')
plt.title('Chi-square Distribution (Sex vs Hobby)')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 6. Summary and recommendations
print("\n6. SUMMARY AND RECOMMENDATIONS:")
print("   - Chi-square test of independence: tests the association between two categorical variables")
print("   - Assumption: expected frequency ≥ 5 in every cell")
print("   - Interpretation: p-value < α means there is significant evidence of an association")
print("   - Applications: gender vs preference, education vs employment, and similar pairs")
print("   - Caution: if the assumption fails, merge sparse categories or use an exact test (for example Fisher's exact test for 2x2 tables)")
print("   - Always inspect the residuals to see which cells drive the association")


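## 4. Chi-Square Distribution

The **chi-square distribution** with df degrees of freedom is the distribution of the sum of squares of df independent standard normal variables. Under H0, the chi-square test statistic approximately follows this distribution.

### Characteristics:
1. **Shape**: right-skewed, defined only for non-negative values
2. **Parameter**: degrees of freedom (df)
3. **Mean**: df
4. **Variance**: 2 × df
5. **Mode**: df - 2 if df ≥ 2, otherwise 0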
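
These characteristics can be confirmed numerically. The sketch below reads the mean and variance from `scipy.stats.chi2.stats` and locates the mode on a fine grid. The grid limits and spacing are arbitrary choices, so the mode is only approximate.

```python
import numpy as np
from scipy.stats import chi2

# Grid for locating the mode numerically (spacing of about 0.001).
x = np.linspace(0.001, 60, 60_000)

summary = []
for df in [3, 5, 10, 20]:
    mean, var = chi2.stats(df, moments="mv")
    mode_numeric = x[np.argmax(chi2.pdf(x, df))]
    summary.append((df, float(mean), float(var), float(mode_numeric)))
    print(f"df={df:2d}  mean={float(mean):5.1f}  variance={float(var):5.1f}  mode≈{mode_numeric:6.3f}")
```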

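### Properties:
- As df grows, the distribution approaches a normal distribution with mean df and variance 2 × df
- Chi-square with df = 1 is the distribution of the square of a standard normal variable
- Chi-square with df = 2 is an exponential distribution with mean 2
- Every chi-square distribution is a gamma distribution with shape df/2 and scale 2 (for df = 3: shape 3/2, scale 2)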
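
The relationships in this list can be checked by comparing density functions on a grid. Two facts are assumed here as standard results: the density of Z² for a standard normal Z is φ(√x)/√x, and the gamma identity (shape df/2, scale 2) holds for every df, not only for df = 3. The grid starts above 0 because the df = 1 density is unbounded there.

```python
import numpy as np
from scipy import stats

x = np.linspace(0.05, 15, 300)

# df = 1: density of Z**2 for standard normal Z is phi(sqrt(x)) / sqrt(x).
pdf_z_squared = stats.norm.pdf(np.sqrt(x)) / np.sqrt(x)
same_df1 = np.allclose(stats.chi2.pdf(x, 1), pdf_z_squared)

# df = 2: exponential distribution with mean (scale) 2.
same_df2 = np.allclose(stats.chi2.pdf(x, 2), stats.expon.pdf(x, scale=2))

# Any df: gamma distribution with shape df / 2 and scale 2.
same_gamma = all(
    np.allclose(stats.chi2.pdf(x, df), stats.gamma.pdf(x, df / 2, scale=2))
    for df in [1, 2, 3, 5, 10]
)

print("df=1 matches squared standard normal:", same_df1)
print("df=2 matches exponential(scale=2):   ", same_df2)
print("chi-square matches gamma(df/2, 2):   ", same_gamma)
```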

### Applications:
- Goodness-of-fit tests
- Tests of independence
- Tests of homogeneity
- Tests of association


In [None]:
# Chi-square distribution demonstration
print("=== CHI-SQUARE DISTRIBUTION DEMONSTRATION ===")

# 1. Characteristics of the chi-square distribution
print("\n1. CHARACTERISTICS OF THE CHI-SQUARE DISTRIBUTION:")

# Moments for several degrees of freedom
dfs = [1, 2, 3, 5, 10, 20]
print("Degrees of Freedom | Mean | Variance | Mode")
print("-" * 45)
for df in dfs:
    mean = df
    variance = 2 * df
    mode = max(0, df - 2)  # mode is df - 2 for df >= 2, and 0 for df = 1
    print(f"{df:17d} | {mean:4d} | {variance:8d} | {mode:4d}")

# 2. Visualisasi Chi-Square Distribution
plt.figure(figsize=(20, 15))

# Plot 1: Chi-square Distribution untuk berbagai df
plt.subplot(3, 4, 1)
x = np.linspace(0, 30, 1000)
for df in [1, 2, 3, 5, 10, 20]:
    y = chi2.pdf(x, df)
    plt.plot(x, y, linewidth=2, label=f'df = {df}')
plt.xlabel('Chi-square value')
plt.ylabel('Density')
plt.title('Chi-square Distribution untuk Berbagai df')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 2: Chi-square Distribution vs Normal (df = 30)
plt.subplot(3, 4, 2)
df_large = 30
x_large = np.linspace(0, 60, 1000)
y_chi2_large = chi2.pdf(x_large, df_large)
y_normal = stats.norm.pdf(x_large, df_large, np.sqrt(2*df_large))
plt.plot(x_large, y_chi2_large, 'b-', linewidth=2, label=f'Chi-square (df={df_large})')
plt.plot(x_large, y_normal, 'r--', linewidth=2, label='Normal approximation')
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Chi-square vs Normal (df=30)')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 3: Chi-square Distribution (df=1) vs Normal Squared
plt.subplot(3, 4, 3)
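x_small = np.linspace(0.01, 10, 1000)  # start above 0: the df=1 density is unbounded at 0
y_chi2_1 = chi2.pdf(x_small, 1)
# Density of Z**2 for standard normal Z: both +sqrt(x) and -sqrt(x) contribute,
# giving 2 * phi(sqrt(x)) / (2 * sqrt(x)) = phi(sqrt(x)) / sqrt(x)
y_normal_sq = stats.norm.pdf(np.sqrt(x_small), 0, 1) / np.sqrt(x_small)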
plt.plot(x_small, y_chi2_1, 'b-', linewidth=2, label='Chi-square (df=1)')
plt.plot(x_small, y_normal_sq, 'r--', linewidth=2, label='Normal squared')
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Chi-square (df=1) vs Normal Squared')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 4: Chi-square Distribution (df=2) vs Exponential
plt.subplot(3, 4, 4)
x_exp = np.linspace(0, 10, 1000)
y_chi2_2 = chi2.pdf(x_exp, 2)
y_exp = stats.expon.pdf(x_exp, scale=2)
plt.plot(x_exp, y_chi2_2, 'b-', linewidth=2, label='Chi-square (df=2)')
plt.plot(x_exp, y_exp, 'r--', linewidth=2, label='Exponential (scale=2)')
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Chi-square (df=2) vs Exponential')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 5: Chi-square Distribution (df=3) vs Gamma
plt.subplot(3, 4, 5)
x_gamma = np.linspace(0, 15, 1000)
y_chi2_3 = chi2.pdf(x_gamma, 3)
y_gamma = stats.gamma.pdf(x_gamma, 3/2, scale=2)
plt.plot(x_gamma, y_chi2_3, 'b-', linewidth=2, label='Chi-square (df=3)')
plt.plot(x_gamma, y_gamma, 'r--', linewidth=2, label='Gamma (shape=3/2, scale=2)')
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Chi-square (df=3) vs Gamma')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 6: Cumulative Distribution Function
plt.subplot(3, 4, 6)
x_cdf = np.linspace(0, 20, 1000)
for df in [1, 2, 3, 5, 10]:
    y_cdf = chi2.cdf(x_cdf, df)
    plt.plot(x_cdf, y_cdf, linewidth=2, label=f'df = {df}')
plt.xlabel('Chi-square value')
plt.ylabel('Cumulative Probability')
plt.title('Cumulative Distribution Function')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 7: Percentiles
plt.subplot(3, 4, 7)
percentiles = [0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95]
df_percentiles = [1, 2, 3, 5, 10, 20]
percentile_values = np.zeros((len(df_percentiles), len(percentiles)))
for i, df in enumerate(df_percentiles):
    for j, p in enumerate(percentiles):
        percentile_values[i, j] = chi2.ppf(p, df)

plt.imshow(percentile_values, cmap='viridis', aspect='auto')
plt.colorbar(label='Chi-square value')
plt.xlabel('Percentile')
plt.ylabel('Degrees of Freedom')
plt.xticks(range(len(percentiles)), [f'{p*100:.0f}%' for p in percentiles])
plt.yticks(range(len(df_percentiles)), df_percentiles)
plt.title('Chi-square Percentiles')
plt.grid(True, alpha=0.3)

# Plot 8: Critical Values
plt.subplot(3, 4, 8)
alpha_values = [0.01, 0.05, 0.1]
df_critical = np.arange(1, 21)
critical_values = np.zeros((len(alpha_values), len(df_critical)))
for i, alpha in enumerate(alpha_values):
    for j, df in enumerate(df_critical):
        critical_values[i, j] = chi2.ppf(1 - alpha, df)

for i, alpha in enumerate(alpha_values):
    plt.plot(df_critical, critical_values[i], 'o-', linewidth=2, label=f'α = {alpha}')
plt.xlabel('Degrees of Freedom')
plt.ylabel('Critical Value')
plt.title('Chi-square Critical Values')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 9: Probability Density Function
plt.subplot(3, 4, 9)
x_pdf = np.linspace(0, 15, 1000)
y_pdf = chi2.pdf(x_pdf, 5)
plt.plot(x_pdf, y_pdf, 'b-', linewidth=2, label='PDF')
plt.fill_between(x_pdf, 0, y_pdf, alpha=0.3, color='blue')
plt.axvline(chi2.ppf(0.95, 5), color='red', linestyle='--', linewidth=2, label='95th percentile')
plt.xlabel('Chi-square value')
plt.ylabel('Density')
plt.title('Probability Density Function (df=5)')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 10: Survival Function
plt.subplot(3, 4, 10)
x_surv = np.linspace(0, 20, 1000)
y_surv = 1 - chi2.cdf(x_surv, 5)
plt.plot(x_surv, y_surv, 'b-', linewidth=2, label='Survival Function')
plt.axhline(0.05, color='red', linestyle='--', linewidth=2, label='α = 0.05')
plt.xlabel('Chi-square value')
plt.ylabel('Survival Probability')
plt.title('Survival Function (df=5)')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 11: Hazard Function
plt.subplot(3, 4, 11)
x_haz = np.linspace(0.1, 20, 1000)
y_haz = chi2.pdf(x_haz, 5) / (1 - chi2.cdf(x_haz, 5))
plt.plot(x_haz, y_haz, 'b-', linewidth=2, label='Hazard Function')
plt.xlabel('Chi-square value')
plt.ylabel('Hazard Rate')
plt.title('Hazard Function (df=5)')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 12: Moment Generating Function
plt.subplot(3, 4, 12)
# The MGF (1 - 2t)^(-df/2) exists only for t < 1/2, so stop short of the pole
t = np.linspace(0, 0.45, 1000)
y_mgf = (1 - 2*t)**(-5/2)  # MGF for chi-square with df=5
plt.plot(t, y_mgf, 'b-', linewidth=2, label='MGF (df=5)')
plt.xlabel('t')
plt.ylabel('MGF(t)')
plt.title('Moment Generating Function (df=5)')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 3. Sifat-sifat Chi-Square Distribution
print("\n3. SIFAT-SIFAT CHI-SQUARE DISTRIBUTION:")

# Mean dan Variance
print("Mean dan Variance untuk berbagai df:")
print("df | Mean | Variance | Std Dev")
print("-" * 30)
for df in [1, 2, 3, 5, 10, 20]:
    mean = df
    variance = 2 * df
    std_dev = np.sqrt(variance)
    print(f"{df:2d} | {mean:4d} | {variance:8d} | {std_dev:6.2f}")

# Skewness and excess kurtosis
print("\nSkewness and excess kurtosis for several df:")
print("df | Skewness | Excess kurtosis")
print("-" * 33)
for df in [1, 2, 3, 5, 10, 20]:
    skewness = 2 * np.sqrt(2/df)   # equals sqrt(8/df)
    excess_kurtosis = 12/df        # kurtosis minus 3, so a normal distribution scores 0
    print(f"{df:2d} | {skewness:8.3f} | {excess_kurtosis:15.3f}")

# 4. Aplikasi Chi-Square Distribution
print("\n4. APLIKASI CHI-SQUARE DISTRIBUTION:")

# Uji Goodness of Fit
print("Uji Goodness of Fit:")
observed_gof = np.array([25, 30, 20, 15, 10])
expected_gof = np.array([20, 20, 20, 20, 20])
chi2_stat_gof, p_value_gof = stats.chisquare(observed_gof, expected_gof)
print(f"  Chi-square statistic: {chi2_stat_gof:.4f}")
print(f"  p-value: {p_value_gof:.4f}")

# Uji Independence
print("\nUji Independence:")
contingency_table_ind = np.array([[20, 30, 10], [15, 25, 20], [10, 20, 15]])
chi2_stat_ind, p_value_ind, dof_ind, expected_ind = chi2_contingency(contingency_table_ind)
print(f"  Chi-square statistic: {chi2_stat_ind:.4f}")
print(f"  p-value: {p_value_ind:.4f}")
print(f"  Degrees of freedom: {dof_ind}")

# 5. Conclusions and Recommendations
print("\n5. CONCLUSIONS AND RECOMMENDATIONS:")
print("   - Chi-square distribution: the reference distribution for chi-square tests")
print("   - Characteristics: right-skewed, non-negative, shape depends on df")
print("   - Property: the larger the df, the closer to a normal distribution")
print("   - Applications: tests of goodness of fit, independence, homogeneity")
print("   - Caution: expected frequencies should be ≥ 5")
print("   - Interpretation: p-value < α means reject H0")

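The closed-form moments printed above can be cross-checked against SciPy. This is a minimal sketch, assuming that `chi2.stats` with `moments='mvsk'` reports the excess (Fisher) kurtosis; the list of `df` values is simply reused from the loops above.

```python
import numpy as np
from scipy.stats import chi2

for df in [1, 2, 3, 5, 10, 20]:
    mean, var, skew, ex_kurt = chi2.stats(df, moments='mvsk')
    # Compare SciPy's values with the closed-form expressions
    assert np.isclose(mean, df)
    assert np.isclose(var, 2 * df)
    assert np.isclose(skew, np.sqrt(8 / df))
    assert np.isclose(ex_kurt, 12 / df)
    print(f"df={df:2d}: mean={float(mean):.0f}, var={float(var):.0f}, "
          f"skew={float(skew):.3f}, excess kurtosis={float(ex_kurt):.3f}")
```

If any assertion fails, the hand-written formula and the library disagree, which usually points to a typo in the formula.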

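## 5. Chi-Square Assumptions

The **chi-square test assumptions** are the conditions that must hold for the test's p-value to be trustworthy. The test statistic only approximately follows a chi-square distribution, and the approximation degrades when these conditions fail.

### Main Assumptions:
1. **Categorical data**: The variables are categorical, and the table holds counts (not percentages or means)
2. **Independent observations**: Observations are independent of one another, so each subject contributes to exactly one cell
3. **Expected frequencies**: Every cell should have an expected frequency ≥ 5. A commonly cited relaxation (Cochran's rule) tolerates up to 20% of cells below 5, provided none is below 1
4. **Adequate sample size**: The sample must be large enough for condition 3 to hold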
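
The expected-frequency condition can be checked before running the test. The sketch below is one possible helper, not a standard API: `check_expected_counts` and its `min_count` parameter are hypothetical names, while `expected_freq` comes from `scipy.stats.contingency`. The table values are illustrative.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

def check_expected_counts(table, min_count=5):
    """Return the expected counts and the share of cells below min_count."""
    expected = expected_freq(np.asarray(table))
    share_small = float(np.mean(expected < min_count))
    return expected, share_small

# Illustrative 3x3 table of observed counts
table = np.array([[20, 30, 25],
                  [25, 35, 30],
                  [15, 20, 25]])

expected, share_small = check_expected_counts(table)
print(expected.round(1))
print(f"Cells with expected count < 5: {share_small:.0%}")
print(f"Smallest expected count: {expected.min():.2f}")
```

The check uses expected counts rather than observed ones: a cell with an observed count of 0 is acceptable as long as its expected count is large enough.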

### Additional Assumptions:
- **Random sampling**: The sample is drawn at random from the population
- **Mutually exclusive**: Each observation falls into exactly one category
- **Exhaustive**: The categories cover every possible outcome

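### When Assumptions Are Violated:
- **Expected frequency < 5**: Use Fisher's exact test, or merge sparse categories
- **Ordinal data**: The chi-square test still runs but ignores the ordering; a rank-based test such as Mann-Whitney U or Kruskal-Wallis is often more powerful
- **Continuous data**: Use a t-test or ANOVA, or bin the values into categories first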
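
As an illustration of the first fallback, the sketch below applies Fisher's exact test to a small 2x2 table whose expected counts are mostly below 5. The counts are made up for the example; `fisher_exact` and `expected_freq` are SciPy functions.

```python
import numpy as np
from scipy.stats import fisher_exact
from scipy.stats.contingency import expected_freq

# Hypothetical 2x2 table with small counts (illustrative data only)
small_table = np.array([[8, 2],
                        [1, 5]])

expected = expected_freq(small_table)
n_small = int((expected < 5).sum())
print("Expected counts:")
print(expected)
print(f"Cells with expected count < 5: {n_small} of {expected.size}")

# The exact test does not rely on the chi-square approximation
odds_ratio, p_exact = fisher_exact(small_table, alternative='two-sided')
print(f"Odds ratio: {odds_ratio:.2f}")
print(f"Fisher exact p-value: {p_exact:.4f}")
```

Because the p-value is computed from the hypergeometric distribution directly, it stays valid however small the counts are.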


In [None]:
# Demonstrasi Asumsi Uji Chi-Square
print("=== DEMONSTRASI ASUMSI UJI CHI-SQUARE ===")

# 1. Pengecekan Asumsi Frekuensi yang Diharapkan
print("\n1. PENGECEKAN ASUMSI FREKUENSI YANG DIHARAPKAN:")

# Contoh data yang memenuhi asumsi
print("Contoh data yang MEMENUHI asumsi:")
contingency_table_good = np.array([
    [20, 30, 25],  # Baris 1
    [25, 35, 30],  # Baris 2
    [15, 20, 25]   # Baris 3
])

print("Tabel kontingensi:")
print(contingency_table_good)

# Hitung frekuensi yang diharapkan
chi2_stat_good, p_value_good, dof_good, expected_good = chi2_contingency(contingency_table_good)
print(f"\nFrekuensi yang diharapkan:")
print(expected_good)

# Check the expected frequency ≥ 5 assumption
print("\nChecking the expected frequency ≥ 5 assumption:")
print("Cell | Observed | Expected | Assumption met")
print("-" * 45)
for i in range(contingency_table_good.shape[0]):
    for j in range(contingency_table_good.shape[1]):
        # Cell-level names avoid overwriting the earlier `observed`/`expected` arrays
        obs_ij = contingency_table_good[i, j]
        exp_ij = expected_good[i, j]
        assumption_met = "Yes" if exp_ij >= 5 else "No"
        print(f"{i+1},{j+1}  | {obs_ij:8d} | {exp_ij:8.1f} | {assumption_met:15s}")

# Contoh data yang TIDAK memenuhi asumsi
print("\nContoh data yang TIDAK MEMENUHI asumsi:")
contingency_table_bad = np.array([
    [2, 3, 1],   # Baris 1
    [1, 2, 1],   # Baris 2
    [0, 1, 2]    # Baris 3
])

print("Tabel kontingensi:")
print(contingency_table_bad)

# Hitung frekuensi yang diharapkan
chi2_stat_bad, p_value_bad, dof_bad, expected_bad = chi2_contingency(contingency_table_bad)
print(f"\nFrekuensi yang diharapkan:")
print(expected_bad)

# Check the expected frequency ≥ 5 assumption
print("\nChecking the expected frequency ≥ 5 assumption:")
print("Cell | Observed | Expected | Assumption met")
print("-" * 45)
for i in range(contingency_table_bad.shape[0]):
    for j in range(contingency_table_bad.shape[1]):
        # Cell-level names avoid overwriting the earlier `observed`/`expected` arrays
        obs_ij = contingency_table_bad[i, j]
        exp_ij = expected_bad[i, j]
        assumption_met = "Yes" if exp_ij >= 5 else "No"
        print(f"{i+1},{j+1}  | {obs_ij:8d} | {exp_ij:8.1f} | {assumption_met:15s}")

# 2. Pengecekan Asumsi Data Kategorik
print("\n2. PENGECEKAN ASUMSI DATA KATEGORIK:")

# Data kategorik yang valid
print("Data kategorik yang VALID:")
categorical_data = {
    'Gender': ['Laki-laki', 'Perempuan', 'Laki-laki', 'Perempuan', 'Laki-laki'],
    'Preferensi': ['A', 'B', 'A', 'C', 'B']
}
print(categorical_data)

# Data kontinu yang tidak valid
print("\nData kontinu yang TIDAK VALID:")
continuous_data = {
    'Tinggi': [170, 165, 180, 175, 160],
    'Berat': [65, 60, 80, 70, 55]
}
print(continuous_data)

# 3. Pengecekan Asumsi Sampel Independen
print("\n3. PENGECEKAN ASUMSI SAMPEL INDEPENDEN:")

# Sampel independen yang valid
print("Sampel independen yang VALID:")
independent_sample = np.random.choice(['A', 'B', 'C'], size=100, p=[0.3, 0.4, 0.3])
print(f"Ukuran sampel: {len(independent_sample)}")
print(f"Distribusi: {np.unique(independent_sample, return_counts=True)}")

# Dependent sample (not valid), e.g. repeated measurements of the same subjects.
# Note: the counts alone cannot reveal dependence; it follows from the study design.
print("\nDependent sample (NOT VALID): repeated measurements")
dependent_sample = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'] * 10
print(f"Sample size: {len(dependent_sample)}")
print(f"Distribution: {np.unique(dependent_sample, return_counts=True)}")

# 4. Pengecekan Asumsi Sampel yang Cukup
print("\n4. PENGECEKAN ASUMSI SAMPEL YANG CUKUP:")

# Sampel yang cukup
print("Sampel yang CUKUP:")
sufficient_sample = np.random.choice(['A', 'B', 'C'], size=1000, p=[0.3, 0.4, 0.3])
print(f"Ukuran sampel: {len(sufficient_sample)}")
print(f"Distribusi: {np.unique(sufficient_sample, return_counts=True)}")

# Sampel yang tidak cukup
print("\nSampel yang TIDAK CUKUP:")
insufficient_sample = np.random.choice(['A', 'B', 'C'], size=10, p=[0.3, 0.4, 0.3])
print(f"Ukuran sampel: {len(insufficient_sample)}")
print(f"Distribusi: {np.unique(insufficient_sample, return_counts=True)}")

# 5. Visualisasi Asumsi Uji Chi-Square
plt.figure(figsize=(20, 15))

# Plot 1: Frekuensi yang Diharapkan (Baik)
plt.subplot(3, 4, 1)
sns.heatmap(contingency_table_good, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['X1', 'X2', 'X3'], yticklabels=['Y1', 'Y2', 'Y3'])
plt.title('Data yang Memenuhi Asumsi')
plt.xlabel('Variabel X')
plt.ylabel('Variabel Y')

# Plot 2: Frekuensi yang Diharapkan (Buruk)
plt.subplot(3, 4, 2)
sns.heatmap(contingency_table_bad, annot=True, fmt='d', cmap='Reds', 
            xticklabels=['X1', 'X2', 'X3'], yticklabels=['Y1', 'Y2', 'Y3'])
plt.title('Data yang Tidak Memenuhi Asumsi')
plt.xlabel('Variabel X')
plt.ylabel('Variabel Y')

# Plot 3: Expected vs Observed (Baik)
plt.subplot(3, 4, 3)
observed_flat_good = contingency_table_good.flatten()
expected_flat_good = expected_good.flatten()
plt.scatter(expected_flat_good, observed_flat_good, alpha=0.7, s=100, color='blue')
plt.plot([min(expected_flat_good), max(expected_flat_good)], [min(expected_flat_good), max(expected_flat_good)], 'r--', alpha=0.7)
plt.xlabel('Expected')
plt.ylabel('Observed')
plt.title('Expected vs Observed (Baik)')
plt.grid(True, alpha=0.3)

# Plot 4: Expected vs Observed (Buruk)
plt.subplot(3, 4, 4)
observed_flat_bad = contingency_table_bad.flatten()
expected_flat_bad = expected_bad.flatten()
plt.scatter(expected_flat_bad, observed_flat_bad, alpha=0.7, s=100, color='red')
plt.plot([min(expected_flat_bad), max(expected_flat_bad)], [min(expected_flat_bad), max(expected_flat_bad)], 'r--', alpha=0.7)
plt.xlabel('Expected')
plt.ylabel('Observed')
plt.title('Expected vs Observed (Buruk)')
plt.grid(True, alpha=0.3)

# Plot 5: Residuals (Baik)
plt.subplot(3, 4, 5)
residuals_good = contingency_table_good - expected_good
sns.heatmap(residuals_good, annot=True, fmt='.1f', cmap='RdBu_r', center=0,
            xticklabels=['X1', 'X2', 'X3'], yticklabels=['Y1', 'Y2', 'Y3'])
plt.title('Residuals (Baik)')
plt.xlabel('Variabel X')
plt.ylabel('Variabel Y')

# Plot 6: Residuals (Buruk)
plt.subplot(3, 4, 6)
residuals_bad = contingency_table_bad - expected_bad
sns.heatmap(residuals_bad, annot=True, fmt='.1f', cmap='RdBu_r', center=0,
            xticklabels=['X1', 'X2', 'X3'], yticklabels=['Y1', 'Y2', 'Y3'])
plt.title('Residuals (Buruk)')
plt.xlabel('Variabel X')
plt.ylabel('Variabel Y')

# Plot 7: Chi-square Distribution (Baik)
plt.subplot(3, 4, 7)
x_chi2 = np.linspace(0, 20, 100)
chi2_dist_good = chi2.pdf(x_chi2, dof_good)
plt.plot(x_chi2, chi2_dist_good, 'b-', linewidth=2, label=f'Chi-square (df={dof_good})')
plt.axvline(chi2_stat_good, color='red', linestyle='--', linewidth=2, label=f'Chi-square stat: {chi2_stat_good:.2f}')
plt.xlabel('Chi-square value')
plt.ylabel('Density')
plt.title('Chi-square Distribution (Baik)')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 8: Chi-square Distribution (Buruk)
plt.subplot(3, 4, 8)
chi2_dist_bad = chi2.pdf(x_chi2, dof_bad)
plt.plot(x_chi2, chi2_dist_bad, 'b-', linewidth=2, label=f'Chi-square (df={dof_bad})')
plt.axvline(chi2_stat_bad, color='red', linestyle='--', linewidth=2, label=f'Chi-square stat: {chi2_stat_bad:.2f}')
plt.xlabel('Chi-square value')
plt.ylabel('Density')
plt.title('Chi-square Distribution (Buruk)')
plt.legend()
plt.grid(True, alpha=0.3)

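# Plot 9: Sample Size vs Power
plt.subplot(3, 4, 9)
sample_sizes = np.arange(50, 1001, 50)
w_assumed = 0.2  # assumed Cohen's w effect size (illustrative value)
crit_power = chi2.ppf(0.95, dof_good)
powers = []
for n in sample_sizes:
    # Under H1 the statistic is approximately noncentral chi-square
    # with noncentrality n * w**2; power = P(statistic > critical value)
    power = stats.ncx2.sf(crit_power, dof_good, n * w_assumed**2)
    powers.append(power)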

plt.plot(sample_sizes, powers, 'b-', linewidth=2)
plt.axhline(0.8, color='red', linestyle='--', linewidth=2, label='Power = 0.8')
plt.xlabel('Sample Size')
plt.ylabel('Power')
plt.title('Sample Size vs Power')
plt.legend()
plt.grid(True, alpha=0.3)

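# Plot 10: Effect Size vs Sample Size
from scipy.optimize import brentq  # root finder, used only for this plot
plt.subplot(3, 4, 10)
effect_sizes = np.linspace(0.1, 1, 100)
crit_n = chi2.ppf(0.95, dof_good)
# Noncentrality parameter that gives power 0.8 at alpha = 0.05
lambda_80 = brentq(lambda lam: stats.ncx2.sf(crit_n, dof_good, lam) - 0.8, 0.01, 100)
sample_sizes_effect = []
for effect in effect_sizes:
    # Required n for power 0.8 when the effect size is Cohen's w: n = lambda / w**2
    n_required = lambda_80 / effect**2
    sample_sizes_effect.append(n_required)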

plt.plot(effect_sizes, sample_sizes_effect, 'b-', linewidth=2)
plt.xlabel('Effect Size')
plt.ylabel('Required Sample Size')
plt.title('Effect Size vs Sample Size')
plt.grid(True, alpha=0.3)

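# Plot 11: Type I vs Type II Error
plt.subplot(3, 4, 11)
alpha_values = np.linspace(0.01, 0.1, 100)
nc_assumed = 200 * 0.2**2  # assumed n = 200 and Cohen's w = 0.2 (illustrative)
beta_values = []
for a in alpha_values:
    crit_a = chi2.ppf(1 - a, dof_good)
    # beta = P(fail to reject H0 | H1 true), from the noncentral chi-square
    beta = stats.ncx2.cdf(crit_a, dof_good, nc_assumed)
    beta_values.append(beta)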

plt.plot(alpha_values, beta_values, 'b-', linewidth=2, label='Type II Error')
plt.plot(alpha_values, alpha_values, 'r--', linewidth=2, label='Type I Error')
plt.xlabel('Type I Error (α)')
plt.ylabel('Type II Error (β)')
plt.title('Type I vs Type II Error')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 12: Assumption Checklist
plt.subplot(3, 4, 12)
assumptions = ['Data Kategorik', 'Sampel Independen', 'Frekuensi ≥ 5', 'Sampel Cukup']
status_good = [True, True, True, True]
status_bad = [True, True, False, False]

x = np.arange(len(assumptions))
width = 0.35
plt.bar(x - width/2, status_good, width, label='Data Baik', alpha=0.7, color='green')
plt.bar(x + width/2, status_bad, width, label='Data Buruk', alpha=0.7, color='red')
plt.xlabel('Asumsi')
plt.ylabel('Status')
plt.title('Asumsi Uji Chi-Square')
plt.xticks(x, assumptions, rotation=45)
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 6. Conclusions and Recommendations
print("\n6. CONCLUSIONS AND RECOMMENDATIONS:")
print("   - Main assumptions: categorical data, independent observations, expected frequencies ≥ 5")
print("   - Checking: always inspect the expected frequencies, not the observed counts")
print("   - Violations: use an alternative test when an assumption fails")
print("   - Sample: make sure the sample size gives the desired power")
print("   - Interpretation: test results are only valid when the assumptions hold")
print("   - Alternative: Fisher's exact test when expected frequencies are < 5")


## 6. Interpretation of Results

**Interpreting a chi-square test** means translating the test output (statistic, degrees of freedom, p-value) into a statement about the research question.

### Interpretation Steps:
1. **Check assumptions**: Confirm that the test's assumptions hold
2. **Test statistic**: Read the chi-square statistic together with its degrees of freedom
3. **P-value**: Compare it with the significance level (α)
4. **Decision**: Reject or fail to reject H0
5. **Conclusion**: State the result in the context of the study

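### Interpreting the Chi-Square Statistic:
- **Large value**: Observed frequencies are far from the expected ones; significant if it exceeds the critical value for the given degrees of freedom
- **Small value**: Observed frequencies are close to the expected ones; not significant if it is below the critical value
- **Value 0**: Observed and expected frequencies match exactly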
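
Whether a statistic counts as large depends on the degrees of freedom. A minimal sketch, using an illustrative statistic of 10 (an assumed value, not taken from any table in this notebook), compares it with the 5% critical value at two different df:

```python
from scipy.stats import chi2

stat = 10.0   # illustrative chi-square statistic (assumed value)
alpha = 0.05

results = {}
for dof in (2, 8):
    critical = chi2.ppf(1 - alpha, dof)   # upper-tail critical value
    p_value = chi2.sf(stat, dof)          # P(X >= stat) under H0
    results[dof] = (critical, p_value)
    verdict = "reject H0" if stat > critical else "fail to reject H0"
    print(f"df={dof}: critical={critical:.3f}, p={p_value:.4f} -> {verdict}")
```

The same statistic leads to opposite decisions at df = 2 and df = 8, which is why a chi-square value should always be reported together with its degrees of freedom.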

### Interpreting the P-value:
- **p < 0.001**: Highly significant (***)
- **0.001 ≤ p < 0.01**: Very significant (**)
- **0.01 ≤ p < 0.05**: Significant (*)
- **p ≥ 0.05**: Not significant
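
The p-value is the upper-tail probability of the chi-square distribution at the observed statistic. The sketch below recomputes it by hand for a 3x3 table of illustrative counts and compares the result with the value returned by `chi2_contingency`. The names `stat_manual` and `p_manual` are introduced here for the example.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Illustrative 3x3 contingency table
table = np.array([[30, 20, 10],
                  [25, 35, 20],
                  [15, 25, 30]])

stat, p_scipy, dof, expected = chi2_contingency(table)

# Manual computation: Pearson statistic, then upper-tail probability
stat_manual = ((table - expected) ** 2 / expected).sum()
p_manual = chi2.sf(stat_manual, dof)

print(f"statistic: {stat:.4f} (manual {stat_manual:.4f}), df = {dof}")
print(f"p-value:   {p_scipy:.6f} (manual {p_manual:.6f})")
```

The two p-values agree here because SciPy applies the Yates continuity correction only when df = 1, that is, for 2x2 tables.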

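### Interpreting Effect Size:
- **Cramér's V**: Strength of association for tables of any size (0 = none, 1 = perfect)
- **Phi coefficient**: Measure of association for 2x2 tables
- **Contingency coefficient**: General measure of association; its upper bound is below 1 and depends on the table size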
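
For reference, the three measures can be written in terms of the chi-square statistic $\chi^2$, the total sample size $n$, and the table dimensions $r \times c$. These are the standard textbook definitions, and they match the formulas used in this section's code:

$$
\phi = \sqrt{\frac{\chi^2}{n}}, \qquad
V = \sqrt{\frac{\chi^2}{n\,(\min(r, c) - 1)}}, \qquad
C = \sqrt{\frac{\chi^2}{\chi^2 + n}}
$$

For a 2x2 table, $\min(r, c) - 1 = 1$, so $V$ reduces to $\phi$. Since $\chi^2 + n > \chi^2$, the coefficient $C$ is always strictly below 1.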


In [None]:
# Demonstrasi Interpretasi Hasil Uji Chi-Square
print("=== DEMONSTRASI INTERPRETASI HASIL UJI CHI-SQUARE ===")

# 1. Interpretasi Chi-Square Statistic
print("\n1. INTERPRETASI CHI-SQUARE STATISTIC:")

# Contoh data untuk interpretasi
contingency_table_interpret = np.array([
    [30, 20, 10],  # Kelompok A
    [25, 35, 20],  # Kelompok B
    [15, 25, 30]   # Kelompok C
])

print("Tabel kontingensi:")
print(contingency_table_interpret)

# Uji chi-square
chi2_stat_interpret, p_value_interpret, dof_interpret, expected_interpret = chi2_contingency(contingency_table_interpret)

print(f"\nHasil uji chi-square:")
print(f"Chi-square statistic: {chi2_stat_interpret:.4f}")
print(f"p-value: {p_value_interpret:.4f}")
print(f"Degrees of freedom: {dof_interpret}")

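# Interpret the chi-square statistic
# A fixed cutoff is not meaningful: "large" depends on the degrees of freedom,
# so compare with the critical value at alpha = 0.05
critical_interpret = chi2.ppf(0.95, dof_interpret)
print(f"\nInterpretation of the chi-square statistic (critical value = {critical_interpret:.4f}):")
if chi2_stat_interpret == 0:
    print("  - Value 0: observed and expected frequencies are identical")
elif chi2_stat_interpret < critical_interpret:
    print("  - Below the critical value: the difference is not significant")
else:
    print("  - At or above the critical value: the difference is significant")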

# 2. Interpretasi P-value
print("\n2. INTERPRETASI P-VALUE:")

alpha = 0.05
print(f"Tingkat signifikansi (α): {alpha}")

if p_value_interpret < 0.001:
    significance = "Highly significant (***)"
elif p_value_interpret < 0.01:
    significance = "Very significant (**)"
elif p_value_interpret < 0.05:
    significance = "Significant (*)"
else:
    significance = "Not significant"

print(f"P-value: {p_value_interpret:.4f}")
print(f"Interpretasi: {significance}")

# Statistical decision
if p_value_interpret < alpha:
    decision = "Reject H0"
    conclusion = "There is a statistically significant association between the variables"
else:
    # Failing to reject H0 does not prove independence
    decision = "Fail to reject H0"
    conclusion = "There is insufficient evidence of an association between the variables"

print(f"Keputusan: {decision}")
print(f"Kesimpulan: {conclusion}")

# 3. Interpretasi Effect Size
print("\n3. INTERPRETASI EFFECT SIZE:")

# Hitung Cramer's V
n = np.sum(contingency_table_interpret)
cramers_v = np.sqrt(chi2_stat_interpret / (n * (min(contingency_table_interpret.shape) - 1)))

print(f"Cramer's V: {cramers_v:.4f}")

# Interpret Cramer's V (rule-of-thumb thresholds; conventions vary with table size)
if cramers_v < 0.1:
    effect_size = "Very small"
elif cramers_v < 0.3:
    effect_size = "Small"
elif cramers_v < 0.5:
    effect_size = "Medium"
else:
    effect_size = "Large"

print(f"Effect size interpretation: {effect_size}")

# Hitung Phi Coefficient (untuk tabel 2x2)
if contingency_table_interpret.shape == (2, 2):
    phi = np.sqrt(chi2_stat_interpret / n)
    print(f"Phi Coefficient: {phi:.4f}")
else:
    print("Phi Coefficient: Tidak dapat dihitung (bukan tabel 2x2)")

# Hitung Contingency Coefficient
contingency_coef = np.sqrt(chi2_stat_interpret / (chi2_stat_interpret + n))
print(f"Contingency Coefficient: {contingency_coef:.4f}")

# 4. Interpretasi Residuals
print("\n4. INTERPRETASI RESIDUALS:")

residuals_interpret = contingency_table_interpret - expected_interpret
print("Residuals (Observed - Expected):")
print(residuals_interpret)

print("\nInterpretasi residuals:")
print("  - Nilai positif: Frekuensi observasi lebih tinggi dari yang diharapkan")
print("  - Nilai negatif: Frekuensi observasi lebih rendah dari yang diharapkan")
print("  - Nilai mendekati 0: Frekuensi observasi mendekati yang diharapkan")

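# Identify the cell that deviates most, using Pearson residuals
# (Observed - Expected) / sqrt(Expected) so that large cells do not dominate
pearson_residuals = residuals_interpret / np.sqrt(expected_interpret)
flat_idx = np.argmax(np.abs(pearson_residuals))
max_residual_idx = tuple(int(k) for k in np.unravel_index(flat_idx, pearson_residuals.shape))
max_residual = pearson_residuals[max_residual_idx]
print(f"  - Largest Pearson residual: {max_residual:.2f} at cell {max_residual_idx}")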

# 5. Interpretasi dalam Konteks Penelitian
print("\n5. INTERPRETASI DALAM KONTEKS PENELITIAN:")

print("Contoh interpretasi untuk penelitian:")
print(f"  - Judul: 'Hubungan antara Kategori Umur dan Preferensi Produk'")
print(f"  - Hipotesis: H0: Tidak ada hubungan antara umur dan preferensi")
print(f"  - Hasil: χ² = {chi2_stat_interpret:.2f}, p = {p_value_interpret:.4f}")
print(f"  - Keputusan: {decision}")
print(f"  - Kesimpulan: {conclusion}")
print(f"  - Effect size: {effect_size} (Cramer's V = {cramers_v:.3f})")

# 6. Visualisasi Interpretasi Hasil
plt.figure(figsize=(20, 15))

# Plot 1: Tabel Kontingensi
plt.subplot(3, 4, 1)
sns.heatmap(contingency_table_interpret, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Produk A', 'Produk B', 'Produk C'], 
            yticklabels=['18-25', '26-35', '36+'])
plt.title('Tabel Kontingensi')
plt.xlabel('Preferensi Produk')
plt.ylabel('Kategori Umur')

# Plot 2: Frekuensi yang Diharapkan
plt.subplot(3, 4, 2)
sns.heatmap(expected_interpret, annot=True, fmt='.1f', cmap='Greens', 
            xticklabels=['Produk A', 'Produk B', 'Produk C'], 
            yticklabels=['18-25', '26-35', '36+'])
plt.title('Frekuensi yang Diharapkan')
plt.xlabel('Preferensi Produk')
plt.ylabel('Kategori Umur')

# Plot 3: Residuals
plt.subplot(3, 4, 3)
sns.heatmap(residuals_interpret, annot=True, fmt='.1f', cmap='RdBu_r', center=0,
            xticklabels=['Produk A', 'Produk B', 'Produk C'], 
            yticklabels=['18-25', '26-35', '36+'])
plt.title('Residuals')
plt.xlabel('Preferensi Produk')
plt.ylabel('Kategori Umur')

# Plot 4: Chi-square Distribution
plt.subplot(3, 4, 4)
x_chi2 = np.linspace(0, 20, 100)
chi2_dist_interpret = chi2.pdf(x_chi2, dof_interpret)
plt.plot(x_chi2, chi2_dist_interpret, 'b-', linewidth=2, label=f'Chi-square (df={dof_interpret})')
plt.axvline(chi2_stat_interpret, color='red', linestyle='--', linewidth=2, label=f'Chi-square stat: {chi2_stat_interpret:.2f}')
plt.axvline(chi2.ppf(0.95, dof_interpret), color='green', linestyle=':', linewidth=2, label=f'Critical value: {chi2.ppf(0.95, dof_interpret):.2f}')
plt.xlabel('Chi-square value')
plt.ylabel('Density')
plt.title('Chi-square Distribution')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 5: P-value Interpretation
plt.subplot(3, 4, 5)
# One representative p-value per band: *** (p < 0.001), ** (p < 0.01), * (p < 0.05), ns (otherwise)
p_values = [0.0005, 0.005, 0.03, 0.1, 0.2, 0.5, 1.0]
significance_levels = ['***', '**', '*', 'ns', 'ns', 'ns', 'ns']
colors = ['red', 'orange', 'yellow', 'lightgreen', 'lightgreen', 'lightgreen', 'lightgreen']

plt.bar(range(len(p_values)), p_values, color=colors, alpha=0.7)
plt.axhline(0.05, color='red', linestyle='--', linewidth=2, label='α = 0.05')
plt.axhline(p_value_interpret, color='blue', linestyle='-', linewidth=2, label=f'P-value: {p_value_interpret:.4f}')
plt.xlabel('Significance Level')
plt.ylabel('P-value')
plt.title('P-value Interpretation')
plt.xticks(range(len(p_values)), significance_levels)
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 6: Effect Size Interpretation
plt.subplot(3, 4, 6)
# Upper bound of each category, matching the thresholds used for Cramer's V above
effect_sizes = [0.1, 0.3, 0.5, 1.0]
effect_labels = ['Very small', 'Small', 'Medium', 'Large']
colors_effect = ['lightblue', 'lightgreen', 'yellow', 'orange']

plt.bar(range(len(effect_sizes)), effect_sizes, color=colors_effect, alpha=0.7)
plt.axhline(cramers_v, color='blue', linestyle='-', linewidth=2, label=f"Cramer's V: {cramers_v:.3f}")
plt.xlabel('Effect Size Category')
plt.ylabel('Effect Size Value')
plt.title('Effect Size Interpretation')
plt.xticks(range(len(effect_sizes)), effect_labels, rotation=45)
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 7: Residuals Distribution
plt.subplot(3, 4, 7)
residuals_flat = residuals_interpret.flatten()
plt.hist(residuals_flat, bins=10, alpha=0.7, color='skyblue', edgecolor='black')
plt.axvline(0, color='red', linestyle='--', linewidth=2, label='Zero line')
plt.xlabel('Residual Value')
plt.ylabel('Frequency')
plt.title('Residuals Distribution')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 8: Observed vs Expected
plt.subplot(3, 4, 8)
observed_flat = contingency_table_interpret.flatten()
expected_flat = expected_interpret.flatten()
plt.scatter(expected_flat, observed_flat, alpha=0.7, s=100, color='blue')
plt.plot([min(expected_flat), max(expected_flat)], [min(expected_flat), max(expected_flat)], 'r--', alpha=0.7)
plt.xlabel('Expected')
plt.ylabel('Observed')
plt.title('Observed vs Expected')
plt.grid(True, alpha=0.3)

# Plot 9: Chi-square Statistic vs Critical Value
plt.subplot(3, 4, 9)
critical_value = chi2.ppf(0.95, dof_interpret)
plt.bar(['Chi-square Stat', 'Critical Value'], [chi2_stat_interpret, critical_value], 
        color=['blue', 'red'], alpha=0.7)
plt.ylabel('Value')
plt.title('Chi-square Statistic vs Critical Value')
plt.grid(True, alpha=0.3)

# Plot 10: P-value vs Alpha
plt.subplot(3, 4, 10)
plt.bar(['P-value', 'Alpha'], [p_value_interpret, alpha], 
        color=['blue', 'red'], alpha=0.7)
plt.ylabel('Value')
plt.title('P-value vs Alpha')
plt.grid(True, alpha=0.3)

# Plot 11: Effect Size Categories
plt.subplot(3, 4, 11)
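categories = ['Very small', 'Small', 'Medium', 'Large']
range_lower = [0.0, 0.1, 0.3, 0.5]   # lower bound of each category
range_upper = [0.1, 0.3, 0.5, 1.0]   # upper bound (Cramer's V cannot exceed 1)
colors_cat = ['lightblue', 'lightgreen', 'yellow', 'orange']

# Draw each bar from its lower to its upper bound so the Cramer's V line
# crosses the bar of the category it belongs to
heights = [hi - lo for lo, hi in zip(range_lower, range_upper)]
plt.bar(categories, heights, bottom=range_lower, color=colors_cat, alpha=0.7)
plt.axhline(cramers_v, color='blue', linestyle='-', linewidth=2, label=f"Cramer's V: {cramers_v:.3f}")
plt.xlabel('Effect Size Category')
plt.ylabel("Cramer's V")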
plt.title('Effect Size Categories')
plt.xticks(rotation=45)
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 12: Summary Statistics
plt.subplot(3, 4, 12)
stats_names = ['Chi-square', 'P-value', "Cramer's V", 'Critical Value']
stats_values = [chi2_stat_interpret, p_value_interpret, cramers_v, critical_value]
colors_stats = ['blue', 'green', 'orange', 'red']

plt.bar(stats_names, stats_values, color=colors_stats, alpha=0.7)
plt.ylabel('Value')
plt.title('Summary Statistics')
plt.xticks(rotation=45)
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 7. Conclusions and Recommendations
print("\n7. CONCLUSIONS AND RECOMMENDATIONS:")
print("   - Interpretation: read the results in the context of the study")
print("   - P-value: compare it with the significance level")
print("   - Effect size: measure the strength of the association")
print("   - Residuals: identify which cells drive the deviation")
print("   - Conclusion: explain the findings in plain language")
print("   - Recommendations: suggest directions for follow-up research")
