# 15. Sampling Distributions and the Central Limit Theorem

## Introduction to Sampling Distributions

A sampling distribution is the probability distribution of a statistic computed from samples of the same size drawn from the same population. The concept is fundamental to inferential statistics because it describes how much a statistic varies from one sample to the next.

### Basic Concepts

A **sampling distribution** is the distribution of:
- A sample statistic (mean, proportion, variance, etc.)
- Computed from every possible sample of size n
- That can be drawn from a given population
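
The definition above can be made concrete with a population small enough to enumerate. The sketch below is a minimal illustration, assuming a hypothetical four-value population and ordered samples of size n = 2 drawn with replacement; neither choice comes from the original material.

```python
import itertools
import numpy as np

# Hypothetical toy population (an assumption for illustration only)
population = np.array([2, 4, 6, 8])
n = 2  # sample size

# Enumerate every possible ordered sample of size n, drawn with replacement
all_samples = list(itertools.product(population, repeat=n))
sample_means = np.array([np.mean(s) for s in all_samples])

# The sampling distribution of the mean: each distinct value and its probability
values, counts = np.unique(sample_means, return_counts=True)
probabilities = counts / counts.sum()
for v, p in zip(values, probabilities):
    print(f"mean = {v:.1f}  probability = {p:.4f}")

print("Mean of sample means:", sample_means.mean())
print("Population mean:", population.mean())
print("Variance of sample means:", sample_means.var())
print("Population variance / n:", population.var() / n)
```

Under these assumptions the mean of the sample means equals the population mean, and their variance equals the population variance divided by n.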

### Central Limit Theorem

The **Central Limit Theorem (CLT)** states that:
- For independent samples of sufficiently large size n (n ≥ 30 is a common rule of thumb, not a strict threshold)
- Drawn from a population with mean μ and finite variance σ²
- The sampling distribution of the sample mean approaches a normal distribution
- With mean μ and standard error σ/√n
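
As a worked example of the standard error formula, the sketch below assumes a hypothetical population with μ = 50 and σ = 15 and a sample size of n = 36 (values chosen only for illustration). It uses the CLT's normal approximation to estimate how often the sample mean lands within 5 units of μ.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical population parameters and sample size (assumptions for illustration)
mu, sigma, n = 50.0, 15.0, 36

# Standard error of the sample mean: sigma / sqrt(n)
se = sigma / np.sqrt(n)

# Normal approximation to P(mu - 5 <= sample mean <= mu + 5)
prob_within_5 = norm.cdf(mu + 5, loc=mu, scale=se) - norm.cdf(mu - 5, loc=mu, scale=se)

print(f"Standard error: {se:.2f}")
print(f"P(|sample mean - mu| <= 5) ≈ {prob_within_5:.4f}")
```

Here the interval spans two standard errors on either side of μ, so the probability is close to the familiar 95% figure for a normal distribution.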

### Law of Large Numbers

The **Law of Large Numbers** states that:
- As the sample size increases
- The sample mean converges to the population mean
- The variability of the sample mean decreases
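
A running mean makes this convergence easy to see. The sketch below is a minimal illustration that assumes a fair six-sided die as the population (mean 3.5); the die and the number of rolls are illustrative choices, not part of the original material.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate rolls of a fair six-sided die (an illustrative population, mean 3.5)
n_rolls = 100_000
rolls = rng.integers(1, 7, size=n_rolls)

# Running mean after 1, 2, ..., n_rolls rolls
running_mean = np.cumsum(rolls) / np.arange(1, n_rolls + 1)

for k in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {k:>6}: running mean = {running_mean[k - 1]:.4f}")
```

Early values can sit far from 3.5, while later values should settle close to it.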

### Bootstrap Sampling

The **bootstrap** is a resampling technique that:
- Treats the observed sample as a stand-in for the population
- Repeatedly draws resamples from it with replacement (each the same size as the original sample)
- Builds an empirical sampling distribution of the statistic
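
The three steps above can be sketched as follows. This is a minimal illustration, assuming a hypothetical observed sample of 200 exponential values and 5,000 resamples; the percentile interval shown is one common way to summarize the bootstrap distribution, not the only one.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical observed sample (an assumption; in practice this is your data)
observed = rng.exponential(scale=2.0, size=200)

# Draw bootstrap resamples with replacement, each the same size as the sample
n_boot = 5000
idx = rng.integers(0, observed.size, size=(n_boot, observed.size))
boot_means = observed[idx].mean(axis=1)

# Bootstrap standard error and a 95% percentile interval for the mean
boot_se = boot_means.std(ddof=1)
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

# Formula-based standard error for comparison: s / sqrt(n)
formula_se = observed.std(ddof=1) / np.sqrt(observed.size)

print(f"Sample mean:       {observed.mean():.3f}")
print(f"Bootstrap SE:      {boot_se:.3f}")
print(f"Formula SE:        {formula_se:.3f}")
print(f"95% percentile CI: ({ci_low:.3f}, {ci_high:.3f})")
```

For the mean, the bootstrap standard error should land close to the formula value s/√n; the bootstrap becomes more useful for statistics that lack such a simple formula.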


In [None]:
# Import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import norm, uniform, expon, gamma
import warnings
warnings.filterwarnings('ignore')

# Set the plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Libraries imported successfully!")


## 1. Demonstrating the Central Limit Theorem

We demonstrate the CLT using several population distributions and different sample sizes.


In [None]:
def demonstrate_clt(population_dist, pop_params, sample_sizes, n_simulations=10000):
    """
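    Demonstrate the Central Limit Theorem.

    Parameters:
    population_dist: function that generates the population (accepts a `size` argument)
    pop_params: dict of keyword arguments for population_dist
    sample_sizes: list of sample sizes
    n_simulations: number of simulated samples per sample size

    Returns:
    Array of sample means for the last sample size in sample_sizes.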
    """
    fig, axes = plt.subplots(2, len(sample_sizes), figsize=(4*len(sample_sizes), 8))
    if len(sample_sizes) == 1:
        axes = axes.reshape(2, 1)
    
    # Generate population data
    np.random.seed(42)
    population_data = population_dist(**pop_params, size=100000)
    pop_mean = np.mean(population_data)
    pop_std = np.std(population_data)
    
    for i, n in enumerate(sample_sizes):
        # Simulate sampling
        sample_means = []
        for _ in range(n_simulations):
            # Sample with replacement so the draws are independent, as the CLT assumes
            sample = np.random.choice(population_data, size=n, replace=True)
            sample_means.append(np.mean(sample))
        
        sample_means = np.array(sample_means)
        
        # Plot population distribution
        ax1 = axes[0, i]
        ax1.hist(population_data, bins=50, density=True, alpha=0.7, color='skyblue', edgecolor='black')
        ax1.axvline(pop_mean, color='red', linestyle='--', linewidth=2, label=f'Pop Mean: {pop_mean:.2f}')
        ax1.set_title(f'Population Distribution\n(μ={pop_mean:.2f}, σ={pop_std:.2f})')
        ax1.set_xlabel('Value')
        ax1.set_ylabel('Density')
        ax1.legend()
        
        # Plot sampling distribution
        ax2 = axes[1, i]
        ax2.hist(sample_means, bins=50, density=True, alpha=0.7, color='lightgreen', edgecolor='black')
        
        # Theoretical normal distribution
        se = pop_std / np.sqrt(n)
        x = np.linspace(sample_means.min(), sample_means.max(), 100)
        theoretical = norm.pdf(x, pop_mean, se)
        ax2.plot(x, theoretical, 'r-', linewidth=2, label='Theoretical N(μ, σ²/n)')
        
        ax2.axvline(pop_mean, color='red', linestyle='--', linewidth=2, label=f'Pop Mean: {pop_mean:.2f}')
        ax2.set_title(f'Sampling Distribution (n={n})\nSE={se:.3f}')
        ax2.set_xlabel('Sample Mean')
        ax2.set_ylabel('Density')
        ax2.legend()
    
    plt.tight_layout()
    plt.show()
    
    return sample_means

# Test with a uniform distribution
print("=== CENTRAL LIMIT THEOREM DEMONSTRATION ===")
print("1. Uniform distribution (0, 10)")
uniform_means = demonstrate_clt(
    population_dist=np.random.uniform,
    pop_params={'low': 0, 'high': 10},
    sample_sizes=[5, 15, 30, 50]
)


In [None]:
# Test with an exponential distribution
print("\n2. Exponential distribution (λ=1)")
exponential_means = demonstrate_clt(
    population_dist=np.random.exponential,
    pop_params={'scale': 1},
    sample_sizes=[5, 15, 30, 50]
)

# Test with a gamma distribution
print("\n3. Gamma distribution (shape α=2, scale β=1)")
gamma_means = demonstrate_clt(
    population_dist=np.random.gamma,
    pop_params={'shape': 2, 'scale': 1},
    sample_sizes=[5, 15, 30, 50]
)
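
The histograms show the approach to normality visually. One way to quantify it is the skewness of the sample means, which is 0 for a normal distribution. The sketch below is an illustrative addition that re-simulates exponential samples independently of `demonstrate_clt`; the helper name `skewness` and the simulation count are assumptions introduced here.

```python
import numpy as np

def skewness(x):
    """Sample skewness: the mean of the cubed standardized values."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

rng = np.random.default_rng(42)
n_simulations = 20_000  # illustrative choice

results = {}
for n in (5, 15, 30, 50):
    # Each row is one sample of size n from an Exponential(scale=1) population
    samples = rng.exponential(scale=1.0, size=(n_simulations, n))
    sample_means = samples.mean(axis=1)
    results[n] = skewness(sample_means)
    # Theoretical skewness of the mean of n exponential draws is 2 / sqrt(n)
    print(f"n = {n:>2}: skewness = {results[n]:.3f}  (theory: {2 / np.sqrt(n):.3f})")
```

The skewness shrinks toward 0 as n grows, but it is still noticeably positive at n = 30, which is why that threshold is best read as a rule of thumb for strongly skewed populations.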


## 2. Law of Large Numbers

This section demonstrates how the sample mean moves closer to the population mean as the sample size grows.


In [None]:
def demonstrate_law_of_large_numbers(population_data, max_sample_size=1000, n_trials=100):
    """
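    Demonstrate the Law of Large Numbers.

    For each sample size, draws n_trials samples and records the mean
    absolute error of the sample mean and the spread of the sample means.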
    """
    np.random.seed(42)
    pop_mean = np.mean(population_data)
    
    sample_sizes = np.arange(10, max_sample_size + 1, 10)
    mean_errors = []
    std_errors = []
    
    for n in sample_sizes:
        trial_means = []
        for _ in range(n_trials):
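            # Sample with replacement: without it, the finite population shrinks
            # the observed SE below the theoretical σ/√n as n approaches the population size
            sample = np.random.choice(population_data, size=n, replace=True)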
            trial_means.append(np.mean(sample))
        
        mean_error = np.mean(np.abs(np.array(trial_means) - pop_mean))
        std_error = np.std(trial_means)
        
        mean_errors.append(mean_error)
        std_errors.append(std_error)
    
    # Plot results
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
    
    # Mean error vs sample size
    ax1.plot(sample_sizes, mean_errors, 'b-', linewidth=2, label='Mean Absolute Error')
    ax1.axhline(y=0, color='r', linestyle='--', alpha=0.7, label='Zero error')
    ax1.set_xlabel('Sample Size (n)')
    ax1.set_ylabel('Mean Absolute Error')
    ax1.set_title('Law of Large Numbers: Mean Convergence')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Standard deviation vs sample size
    theoretical_se = np.std(population_data) / np.sqrt(sample_sizes)
    ax2.plot(sample_sizes, std_errors, 'g-', linewidth=2, label='Observed SE')
    ax2.plot(sample_sizes, theoretical_se, 'r--', linewidth=2, label='Theoretical SE (σ/√n)')
    ax2.set_xlabel('Sample Size (n)')
    ax2.set_ylabel('Standard Error')
    ax2.set_title('Standard Error Decrease')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    return sample_sizes, mean_errors, std_errors

# Generate population data
np.random.seed(42)
population_data = np.random.normal(50, 15, 10000)

print("=== LAW OF LARGE NUMBERS DEMONSTRATION ===")
sample_sizes, mean_errors, std_errors = demonstrate_law_of_large_numbers(population_data)
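
The σ/√n relationship implies that the standard error falls on a straight line with slope -0.5 on a log-log scale. The sketch below is a minimal, self-contained check of that slope, assuming a normal population with mean 50 and standard deviation 15 as in the cell above; the sample sizes and the trial count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 15.0      # population standard deviation (same as the normal population above)
n_trials = 2000   # number of samples per sample size (illustrative choice)
sizes = [25, 100, 400, 1600]

observed_se = []
for n in sizes:
    # Each row is one independent sample of size n
    samples = rng.normal(loc=50.0, scale=sigma, size=(n_trials, n))
    observed_se.append(samples.mean(axis=1).std(ddof=1))
observed_se = np.array(observed_se)

# Fit log(SE) = slope * log(n) + intercept; the theory predicts slope = -0.5
slope, intercept = np.polyfit(np.log(sizes), np.log(observed_se), 1)

for n, se in zip(sizes, observed_se):
    print(f"n = {n:>4}: observed SE = {se:.3f}, theoretical = {sigma / np.sqrt(n):.3f}")
print(f"Fitted log-log slope: {slope:.3f}")
```

Each fourfold increase in n should roughly halve the standard error, so precision gets more expensive as samples grow.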
