# 📊 1.1 Measures of Central Tendency

---

## 🎯 Learning Objectives

After working through this notebook, you will understand:
- ✅ The concept of the mean (average)
- ✅ The concept of the median (middle value)
- ✅ The concept of the mode (most frequent value)
- ✅ When to use each measure
- ✅ The strengths and weaknesses of each measure

---

## 📚 Theory: What Is Central Tendency?

**Central tendency** is a single value that represents the "center" or "typical value" of a dataset.

### 1️⃣ **MEAN (Average)**

**Formula:**
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

**Definition:** The sum of all values divided by the number of observations.

**Strengths:**
- Uses every data point
- Easy to compute and understand
- Well suited to further statistical analysis

**Weaknesses:**
- Sensitive to outliers (extreme values)
- Not well suited to skewed data
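
As a minimal sketch of the formula above, the mean can be computed by hand and checked against `np.mean`. The `scores` list and the variable names are illustrative assumptions, not one of the notebook's datasets.

```python
import numpy as np

# Hypothetical exam scores, used only for illustration
scores = [70, 80, 90, 100]

# Apply the formula directly: sum of all values divided by n
manual_mean = sum(scores) / len(scores)

# Appending one extreme value shows how strongly the mean reacts
scores_with_outlier = scores + [1000]
mean_with_outlier = np.mean(scores_with_outlier)

print(f"Manual mean      : {manual_mean:.2f}")
print(f"np.mean          : {np.mean(scores):.2f}")
print(f"Mean with outlier: {mean_with_outlier:.2f}")
```

The manual result and `np.mean` should agree, while the single extreme value pulls the mean far away from the other scores.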

---

### 2️⃣ **MEDIAN (Middle Value)**

**Definition:** The middle value once the data are sorted.

**Formula:**
- If n is odd: the value at position $\frac{n+1}{2}$
- If n is even: the average of the values at positions $\frac{n}{2}$ and $\frac{n}{2}+1$

**Strengths:**
- Robust to outliers (a few extreme values barely move it)
- Well suited to skewed data
- Easy to understand

**Weaknesses:**
- Does not use all the information in the data
- Less convenient for further statistical analysis
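
The positional rule above can be sketched as a small function and compared with `np.median`. The helper name `manual_median` and the two sample lists are hypothetical, chosen only to exercise the odd and even cases.

```python
import numpy as np

def manual_median(values):
    """Median via the positional rule (positions are 1-based in the sorted data)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median is undefined for an empty dataset")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2, which is index `mid` when counting from 0
        return ordered[mid]
    # Even n: average of positions n/2 and n/2 + 1 (indices mid - 1 and mid)
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [7, 1, 5]       # sorted: 1, 5, 7
even_sample = [7, 1, 5, 3]   # sorted: 1, 3, 5, 7

print(manual_median(odd_sample), np.median(odd_sample))
print(manual_median(even_sample), np.median(even_sample))
```

Sorting first is essential: the rule refers to positions in the ordered data, not in the original list.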

---

### 3️⃣ **MODE**

**Definition:** The value that occurs most often in the dataset.

**Strengths:**
- Well suited to categorical data
- Easy to understand
- Works even for nominal data, where the mean and median are undefined

**Weaknesses:**
- There can be more than one mode
- There may be no mode at all (when no value repeats)
- Does not use all the information in the data
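
To illustrate the multi-mode and no-mode cases listed above, here is a minimal sketch built on `collections.Counter`. The helper `all_modes` and the sample lists are hypothetical, and treating "all values unique" as "no mode" is a convention assumed here, not something every library follows.

```python
from collections import Counter

def all_modes(values):
    """Return every value tied for the highest frequency.

    Returns an empty list when the input is empty or no value repeats
    (assumed convention: all-unique data has no mode).
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    # Counter preserves first-seen order, so ties come back in that order
    return [value for value, count in counts.items() if count == top]

colors = ["red", "blue", "red", "green", "blue"]  # categorical, two modes

print(all_modes(colors))                   # two values tie
print(all_modes([5, 5, 5, 10, 10, 15]))    # a single clear mode
print(all_modes([1, 2, 3]))                # no value repeats
```

Returning a list makes ties explicit, which matters because a function that reports only one value can hide the fact that the data are multimodal.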

---

In [None]:
# 📦 Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Setting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✅ Libraries imported successfully!")

## 🎮 PLAYGROUND 1: Basic Central Tendency

Let's start with a simple dataset. **Change the values below and see how the results respond!**

In [None]:
# 🎮 PLAYGROUND ZONE - Change the values here!
# ==========================================

# Students' exam scores
data = [75, 80, 85, 90, 95, 100, 75, 80, 85, 90]

# Try replacing the data above with:
# data = [10, 20, 30, 40, 50]  # Different data
# data = [1, 2, 3, 4, 100]     # Data with an outlier
# data = [5, 5, 5, 10, 10, 15] # Data with a clear mode

# ==========================================

# Compute the measures of central tendency
mean_value = np.mean(data)
median_value = np.median(data)
# Note: stats.mode reports a single value even when several values tie
mode_value = stats.mode(data, keepdims=True)[0][0]

# Display the results
print("="*50)
print("📊 MEASURES OF CENTRAL TENDENCY")
print("="*50)
print(f"Data: {data}")
print(f"\n📈 Mean (Average) : {mean_value:.2f}")
print(f"📍 Median (Middle): {median_value:.2f}")
print(f"🎯 Mode           : {mode_value:.2f}")
print("="*50)

In [None]:
# 📊 Visualization
plt.figure(figsize=(12, 6))

# Histogram
plt.subplot(1, 2, 1)
plt.hist(data, bins=10, alpha=0.7, color='skyblue', edgecolor='black')
plt.axvline(mean_value, color='red', linestyle='--', linewidth=2, label=f'Mean: {mean_value:.2f}')
plt.axvline(median_value, color='green', linestyle='--', linewidth=2, label=f'Median: {median_value:.2f}')
plt.axvline(mode_value, color='orange', linestyle='--', linewidth=2, label=f'Mode: {mode_value:.2f}')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.title('Data Distribution with Central Tendency')
plt.legend()
plt.grid(True, alpha=0.3)

# Box Plot
plt.subplot(1, 2, 2)
plt.boxplot(data, vert=True)
plt.axhline(mean_value, color='red', linestyle='--', linewidth=2, label=f'Mean: {mean_value:.2f}')
plt.axhline(median_value, color='green', linestyle='--', linewidth=2, label=f'Median: {median_value:.2f}')
plt.ylabel('Score')
plt.title('Box Plot with Mean and Median')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 🎮 PLAYGROUND 2: The Effect of Outliers

Let's see how an outlier affects the mean and the median!

In [None]:
# 🎮 PLAYGROUND ZONE - Change the outlier value!
# ==========================================

# Employee salaries (in millions of rupiah)
normal_data = [5, 6, 5.5, 6.5, 7, 5.8, 6.2, 5.9, 6.8, 6.5]
outlier_value = 50  # The CEO's salary! Try: 30, 100, 200

# ==========================================

# Data without the outlier
data_without_outlier = normal_data.copy()

# Data with the outlier
data_with_outlier = normal_data + [outlier_value]

# Compute the statistics
mean_without = np.mean(data_without_outlier)
median_without = np.median(data_without_outlier)

mean_with = np.mean(data_with_outlier)
median_with = np.median(data_with_outlier)

# Display the results
print("="*60)
print("🔍 EFFECT OF AN OUTLIER ON CENTRAL TENDENCY")
print("="*60)
print("\n📊 WITHOUT OUTLIER:")
print(f"   Mean   : Rp {mean_without:.2f} million")
print(f"   Median : Rp {median_without:.2f} million")
print(f"\n📊 WITH OUTLIER (Rp {outlier_value} million):")
print(f"   Mean   : Rp {mean_with:.2f} million (change {((mean_with/mean_without)-1)*100:+.1f}%)")
print(f"   Median : Rp {median_with:.2f} million (change {((median_with/median_without)-1)*100:+.1f}%)")
print("\n💡 INSIGHT:")
print(f"   Mean shifted by   : {abs(mean_with - mean_without):.2f} million")
print(f"   Median shifted by : {abs(median_with - median_without):.2f} million")
print("   → The median is more robust to outliers!")
print("="*60)

In [None]:
# 📊 Comparison Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot without the outlier
axes[0].hist(data_without_outlier, bins=8, alpha=0.7, color='lightblue', edgecolor='black')
axes[0].axvline(mean_without, color='red', linestyle='--', linewidth=2, label=f'Mean: {mean_without:.2f}')
axes[0].axvline(median_without, color='green', linestyle='--', linewidth=2, label=f'Median: {median_without:.2f}')
axes[0].set_xlabel('Salary (millions)')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Data WITHOUT Outlier')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot with the outlier
axes[1].hist(data_with_outlier, bins=8, alpha=0.7, color='lightcoral', edgecolor='black')
axes[1].axvline(mean_with, color='red', linestyle='--', linewidth=2, label=f'Mean: {mean_with:.2f}')
axes[1].axvline(median_with, color='green', linestyle='--', linewidth=2, label=f'Median: {median_with:.2f}')
axes[1].set_xlabel('Salary (millions)')
axes[1].set_ylabel('Frequency')
axes[1].set_title(f'Data WITH Outlier ({outlier_value} million)')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 🎮 PLAYGROUND 3: Skewed Data

Let's see how the mean and the median behave on skewed data!

In [None]:
# 🎮 PLAYGROUND ZONE - Generate different distributions!
# ==========================================

np.random.seed(42)
sample_size = 1000  # Try: 500, 2000, 5000

# Right-skewed, e.g. income
right_skewed = np.random.exponential(scale=2, size=sample_size)

# Left-skewed, e.g. retirement age
left_skewed = 100 - np.random.exponential(scale=2, size=sample_size)

# Normal distribution (symmetric)
normal = np.random.normal(loc=50, scale=10, size=sample_size)

# ==========================================

# Helper function for the analysis
def analyze_distribution(data, title):
    mean = np.mean(data)
    median = np.median(data)
    # Round before taking the mode: continuous values rarely repeat exactly
    mode = stats.mode(data.round(), keepdims=True)[0][0]

    print(f"\n{'='*50}")
    print(f"📊 {title}")
    print(f"{'='*50}")
    print(f"Mean   : {mean:.2f}")
    print(f"Median : {median:.2f}")
    print(f"Mode   : {mode:.2f}")

    # Floats are almost never exactly equal, so treat mean and median as
    # "approximately equal" when they differ by less than 10% of the
    # standard deviation (a heuristic threshold, not a formal test)
    tolerance = 0.1 * np.std(data)
    if abs(mean - median) <= tolerance:
        print("\n💡 Mean ≈ Median → Symmetrical")
    elif mean > median:
        print("\n💡 Mean > Median → Right-Skewed")
    else:
        print("\n💡 Mean < Median → Left-Skewed")

    return mean, median, mode

# Analyze each distribution
stats_right = analyze_distribution(right_skewed, "RIGHT-SKEWED DATA")
stats_left = analyze_distribution(left_skewed, "LEFT-SKEWED DATA")
stats_normal = analyze_distribution(normal, "NORMAL DATA")

In [None]:
# 📊 Visualization of the Three Distributions
fig, axes = plt.subplots(3, 1, figsize=(12, 12))

distributions = [
    (right_skewed, stats_right, 'Right-Skewed', 'lightcoral'),
    (left_skewed, stats_left, 'Left-Skewed', 'lightblue'),
    (normal, stats_normal, 'Normal', 'lightgreen')
]

# Loop variable is named `values` so it does not overwrite the global `data` from Playground 1
for idx, (values, stats_vals, title, color) in enumerate(distributions):
    mean, median, mode = stats_vals

    axes[idx].hist(values, bins=50, alpha=0.7, color=color, edgecolor='black')
    axes[idx].axvline(mean, color='red', linestyle='--', linewidth=2, label=f'Mean: {mean:.2f}')
    axes[idx].axvline(median, color='green', linestyle='--', linewidth=2, label=f'Median: {median:.2f}')
    axes[idx].set_xlabel('Value')
    axes[idx].set_ylabel('Frequency')
    axes[idx].set_title(f'{title} Distribution')
    axes[idx].legend()
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 🌍 USE CASE 1: Employee Salary Analysis

**Scenario:** You are an HR manager analyzing employee salaries in order to set a new salary standard.

In [None]:
# 🎮 PLAYGROUND ZONE - Real World Data
# ==========================================

np.random.seed(42)  # Fixed seed so the simulated salaries are reproducible

# Simulated employee salaries across levels (in millions of rupiah per month)
staff_salaries = np.random.normal(loc=8, scale=1.5, size=100)  # Staff
supervisor_salaries = np.random.normal(loc=15, scale=2, size=30)  # Supervisor
manager_salaries = np.random.normal(loc=30, scale=5, size=15)  # Manager
executive_salaries = np.random.normal(loc=80, scale=15, size=5)  # Executive

all_salaries = np.concatenate([staff_salaries, supervisor_salaries, 
                                manager_salaries, executive_salaries])

# ==========================================

# Analysis
mean_salary = np.mean(all_salaries)
median_salary = np.median(all_salaries)

print("="*60)
print("💼 COMPANY EMPLOYEE SALARY ANALYSIS")
print("="*60)
print(f"\nTotal employees: {len(all_salaries)}")
print(f"\nAverage salary (Mean)  : Rp {mean_salary:.2f} million/month")
print(f"Middle salary (Median) : Rp {median_salary:.2f} million/month")
print(f"\nMinimum salary         : Rp {np.min(all_salaries):.2f} million/month")
print(f"Maximum salary         : Rp {np.max(all_salaries):.2f} million/month")

print("\n" + "="*60)
print("💡 HR RECOMMENDATION:")
print("="*60)
if mean_salary > median_salary * 1.2:
    print("⚠️  The mean is much higher than the median!")
    print("    → There is a significant pay gap (high earners pull the mean up)")
    print("    → Use the MEDIAN as the benchmark for a standard salary")
    print(f"    → New salary standard: Rp {median_salary:.2f} million/month")
else:
    print("✅ The mean and median are relatively close")
    print("    → The salary distribution is fairly even")
    print("    → Either the MEAN or the MEDIAN can be used")
    print(f"    → Salary standard: Rp {mean_salary:.2f} million/month")
print("="*60)

In [None]:
# 📊 Salary Distribution Visualization
plt.figure(figsize=(14, 6))

# Histogram
plt.subplot(1, 2, 1)
plt.hist(all_salaries, bins=30, alpha=0.7, color='skyblue', edgecolor='black')
plt.axvline(mean_salary, color='red', linestyle='--', linewidth=2,
            label=f'Mean: Rp {mean_salary:.2f} million')
plt.axvline(median_salary, color='green', linestyle='--', linewidth=2,
            label=f'Median: Rp {median_salary:.2f} million')
plt.xlabel('Salary (millions/month)')
plt.ylabel('Number of Employees')
plt.title('Employee Salary Distribution')
plt.legend()
plt.grid(True, alpha=0.3)

# Box Plot per Level
plt.subplot(1, 2, 2)
data_by_level = [staff_salaries, supervisor_salaries, manager_salaries, executive_salaries]
labels = ['Staff', 'Supervisor', 'Manager', 'Executive']
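bp = plt.boxplot(data_by_level, patch_artist=True)
# Set tick labels via xticks: boxplot's `labels` keyword is deprecated in newer Matplotlib releases
plt.xticks(range(1, len(labels) + 1), labels)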

colors = ['lightblue', 'lightgreen', 'lightyellow', 'lightcoral']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)

plt.ylabel('Salary (millions/month)')
plt.title('Salary by Job Level')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 🌍 USE CASE 2: Student Exam Score Analysis

**Scenario:** A teacher wants to assess the class's performance and set a passing grade.

In [None]:
# 🎮 PLAYGROUND ZONE - Student Grades
# ==========================================

np.random.seed(42)

# Simulated exam scores (0-100)
excellent_students = np.random.normal(loc=90, scale=5, size=5)   # Top students
good_students = np.random.normal(loc=75, scale=8, size=15)       # Good students
average_students = np.random.normal(loc=60, scale=10, size=20)   # Average students
struggling_students = np.random.normal(loc=45, scale=8, size=10) # Struggling students

all_grades = np.concatenate([excellent_students, good_students, 
                             average_students, struggling_students])

# Clip values between 0-100
all_grades = np.clip(all_grades, 0, 100)

# ==========================================

# Analysis
mean_grade = np.mean(all_grades)
median_grade = np.median(all_grades)
mode_grade = stats.mode(all_grades.round(), keepdims=True)[0][0]

# Count students by grade
grade_A = np.sum(all_grades >= 85)
grade_B = np.sum((all_grades >= 70) & (all_grades < 85))
grade_C = np.sum((all_grades >= 55) & (all_grades < 70))
grade_D = np.sum(all_grades < 55)

print("="*60)
print("🎓 CLASS EXAM SCORE ANALYSIS")
print("="*60)
print(f"\nTotal students: {len(all_grades)}")
print(f"\nAverage score (Mean)       : {mean_grade:.2f}")
print(f"Middle score (Median)      : {median_grade:.2f}")
print(f"Most frequent score (Mode) : {mode_grade:.2f}")

print(f"\n{'='*60}")
print("📊 GRADE DISTRIBUTION:")
print(f"{'='*60}")
print(f"Grade A (85-100): {grade_A:2d} students ({grade_A/len(all_grades)*100:5.1f}%)")
print(f"Grade B (70-84) : {grade_B:2d} students ({grade_B/len(all_grades)*100:5.1f}%)")
print(f"Grade C (55-69) : {grade_C:2d} students ({grade_C/len(all_grades)*100:5.1f}%)")
print(f"Grade D (<55)   : {grade_D:2d} students ({grade_D/len(all_grades)*100:5.1f}%)")

print(f"\n{'='*60}")
print("💡 TEACHER RECOMMENDATION:")
print(f"{'='*60}")
if mean_grade >= 75:
    print("✅ Class performance is VERY GOOD!")
elif mean_grade >= 60:
    print("⚠️  Class performance is FAIR, improvement needed")
else:
    print("❌ Class performance is POOR, serious intervention needed")

print(f"\n📌 Suggested passing grade: {median_grade:.0f}")
print("   → About half of the students score at or above the median")
print(f"   → {np.sum(all_grades >= median_grade)} students PASS")
print(f"   → {np.sum(all_grades < median_grade)} students need REMEDIAL work")
print("="*60)

In [None]:
# 📊 Student Score Visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Histogram
axes[0, 0].hist(all_grades, bins=20, alpha=0.7, color='skyblue', edgecolor='black')
axes[0, 0].axvline(mean_grade, color='red', linestyle='--', linewidth=2, 
                   label=f'Mean: {mean_grade:.2f}')
axes[0, 0].axvline(median_grade, color='green', linestyle='--', linewidth=2, 
                   label=f'Median: {median_grade:.2f}')
axes[0, 0].set_xlabel('Score')
axes[0, 0].set_ylabel('Number of Students')
axes[0, 0].set_title('Exam Score Distribution')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# 2. Box Plot
axes[0, 1].boxplot(all_grades, vert=True)
axes[0, 1].axhline(mean_grade, color='red', linestyle='--', linewidth=2, 
                   label=f'Mean: {mean_grade:.2f}')
axes[0, 1].axhline(median_grade, color='green', linestyle='--', linewidth=2, 
                   label=f'Median: {median_grade:.2f}')
axes[0, 1].set_ylabel('Score')
axes[0, 1].set_title('Box Plot of Scores')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# 3. Grade Distribution (Bar Chart)
grades = ['A\n(85-100)', 'B\n(70-84)', 'C\n(55-69)', 'D\n(<55)']
counts = [grade_A, grade_B, grade_C, grade_D]
colors_bar = ['green', 'lightgreen', 'yellow', 'red']
axes[1, 0].bar(grades, counts, color=colors_bar, alpha=0.7, edgecolor='black')
axes[1, 0].set_ylabel('Number of Students')
axes[1, 0].set_title('Grade Distribution')
axes[1, 0].grid(True, alpha=0.3, axis='y')

# Add value labels on bars
for i, (grade, count) in enumerate(zip(grades, counts)):
    axes[1, 0].text(i, count + 0.5, str(count), ha='center', va='bottom', fontweight='bold')

# 4. Cumulative Distribution
sorted_grades = np.sort(all_grades)
cumulative = np.arange(1, len(sorted_grades) + 1) / len(sorted_grades) * 100
axes[1, 1].plot(sorted_grades, cumulative, linewidth=2, color='blue')
axes[1, 1].axvline(median_grade, color='green', linestyle='--', linewidth=2, 
                   label=f'Median (50%): {median_grade:.2f}')
axes[1, 1].axhline(50, color='green', linestyle='--', linewidth=1, alpha=0.5)
axes[1, 1].set_xlabel('Score')
axes[1, 1].set_ylabel('Cumulative Percentage (%)')
axes[1, 1].set_title('Cumulative Distribution')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 📝 Summary & Key Takeaways

### ✅ What We Have Learned:

1. **Mean (Average)**
   - Uses every data point
   - Sensitive to outliers
   - Well suited to symmetric data

2. **Median (Middle Value)**
   - Robust to outliers
   - Well suited to skewed data
   - Splits the sorted data into two equal halves

3. **Mode**
   - The most frequently occurring value
   - Well suited to categorical data
   - There can be more than one, or none at all

### 🎯 When to Use Which?

| Condition | Use |
|---------|----------|
| Symmetric data, no outliers | **Mean** |
| Skewed data or outliers present | **Median** |
| Categorical data | **Mode** |
| Further statistical analysis needed | **Mean** |
| A robust value needed | **Median** |

### 💡 Practical Tips:

1. **Always visualize the data** before choosing a measure of central tendency
2. **Check for outliers** with a box plot
3. **Compare the mean and the median** to detect skewness
4. **Use more than one measure** to get the full picture
5. **Consider the context** of the business or problem at hand
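
Tips 2 and 3 can be combined into one quick numeric check, sketched below. The helper name `quick_center_check` is hypothetical, and the 1.5 × IQR fences are the usual box-plot whisker convention, assumed here as the outlier rule. The sample reuses the salary data from Playground 2 with the CEO salary of 50 appended.

```python
import numpy as np

def quick_center_check(values):
    """Report mean, median, and points outside the 1.5 * IQR fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    # Fences assumed to follow the common box-plot whisker rule
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = values[(values < lower) | (values > upper)]
    return {
        "mean": float(np.mean(values)),
        "median": float(np.median(values)),
        "outliers": outliers.tolist(),
        "n_outliers": int(outliers.size),
    }

# Salary data from Playground 2 (millions of rupiah), including the outlier
salaries = [5, 6, 5.5, 6.5, 7, 5.8, 6.2, 5.9, 6.8, 6.5, 50]
report = quick_center_check(salaries)

print(f"Mean    : {report['mean']:.2f}")
print(f"Median  : {report['median']:.2f}")
print(f"Outliers: {report['outliers']}")
```

When the report flags outliers and the mean sits well above the median, the median is usually the safer summary, in line with the table above.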

---

## 🚀 Next Steps

Continue to the next notebook:
- [1.2 Measures of Dispersion](./02_dispersion.ipynb) - Learn about variance, standard deviation, range, and more.

---

## 📚 Further Reading

- [NumPy Statistical Functions](https://numpy.org/doc/stable/reference/routines.statistics.html)
- [SciPy Stats Module](https://docs.scipy.org/doc/scipy/reference/stats.html)
- [Khan Academy: Central Tendency](https://www.khanacademy.org/math/statistics-probability)

---

<div align="center">

**✅ Congratulations! You have completed the Central Tendency module!**

Made with ❤️ by Statistics Enthusiasts

</div>