# Prediksi Probabilitas dan Magnitudo Gempa di Indonesia Berdasarkan Pulau

## Tujuan
Notebook ini bertujuan untuk:
- Memprediksi **probabilitas terjadinya gempa bumi** di masing-masing pulau besar di Indonesia.
- Memprediksi **magnitudo rata-rata** gempa bumi berdasarkan fitur lokasi (latitude, longitude, depth).
- Membandingkan hasil prediksi dengan data historis sebagai validasi.

## Dataset
Data yang digunakan adalah `katalog_gempa.csv`, berisi data gempa dari BMKG dengan atribut:
- `tgl`, `lat`, `lon`, `depth`, `mag`, dan informasi lainnya.
- Data difokuskan pada tahun **2015 ke atas** untuk merepresentasikan kondisi terbaru.

## Metode
- **Random Forest Regressor** digunakan untuk memprediksi magnitudo gempa berdasarkan lokasi.
- **Random Forest Classifier** digunakan untuk memprediksi kemungkinan lokasi (pulau) berdasarkan fitur gempa.
- **K-Fold Cross Validation (k=10)** diterapkan untuk mengevaluasi performa beberapa model regresi: CART, C4.5, GBM, AdaBoost, dan Random Forest.

## Output
- Tabel yang menunjukkan:
  - Probabilitas gempa berdasarkan hasil model machine learning (`probability_model`).
  - Probabilitas historis dari frekuensi kejadian (`probability_historis`).
  - Rata-rata magnitudo prediksi (`avg_predicted_mag`) dan historis (`avg_mag`) per pulau.


1. Load dan Preprocessing Data

In [1]:
import pandas as pd
import numpy as np
from geopy.distance import geodesic

# Load data gempa
data = pd.read_csv('data/katalog_gempa.csv')  
data['tgl'] = pd.to_datetime(data['tgl'], errors='coerce')
data = data[data['tgl'].dt.year >= 2015]

# Hapus duplikat dan isi kosong
cols = ['lat', 'lon', 'depth', 'mag']
data.dropna(subset=cols, inplace=True)
data = data.drop_duplicates()

# Konversi numerik
for col in cols:
    data[col] = pd.to_numeric(data[col], errors='coerce')

print(data.shape)


(74038, 13)


2. Menentukan Pulau Berdasarkan Koordinat

In [2]:
# Data centroid provinsi
data_provinsi = [
    ['Aceh', 4.69513500, 96.74939930, 'Sumatera'],
    ['Sumatera Utara', 0.62469320, 123.97500180, 'Sumatera'],
    ['Sumatera Barat', -0.73993970, 100.80000510, 'Sumatera'],
    ['Riau', 0.29334690, 101.70682940, 'Sumatera'],
    ['Jambi', -1.61012290, 103.61312030, 'Sumatera'],
    ['Sumatera Selatan', -3.31943740, 103.91439900, 'Sumatera'],
    ['Bengkulu', -3.79284510, 102.26076410, 'Sumatera'],
    ['Lampung', -4.55858490, 105.40680790, 'Sumatera'],
    ['Kepulauan Bangka Belitung', -2.74105130, 106.44058720, 'Sumatera'],
    ['DKI Jakarta', -6.20876340, 106.84559900, 'Jawa'],
    ['Jawa Barat', -7.09091100, 107.66888700, 'Jawa'],
    ['Jawa Tengah', -7.15097500, 110.14025940, 'Jawa'],
    ['DI Yogyakarta', -7.87538490, 110.42620880, 'Jawa'],
    ['Jawa Timur', -7.53606390, 112.23840170, 'Jawa'],
    ['Banten', -6.40581720, 106.06401790, 'Jawa'],
    ['Bali', -8.34053890, 115.09195090, 'Bali'],
    ['Nusa Tenggara Barat', -8.65293340, 117.36164760, 'Nusa Tenggara'],
    ['Nusa Tenggara Timur', -8.65738190, 121.07937050, 'Nusa Tenggara'],
    ['Kalimantan Barat', 0.47734750, 106.61314050, 'Kalimantan'],
    ['Kalimantan Tengah', -1.68148780, 113.38235450, 'Kalimantan'],
    ['Kalimantan Selatan', -3.09264150, 115.28375850, 'Kalimantan'],
    ['Kalimantan Timur', 0.53865860, 116.41938900, 'Kalimantan'],
    ['Kalimantan Utara', 3.07309290, 116.04138890, 'Kalimantan'],
    ['Sulawesi Utara', 0.62469320, 123.97500180, 'Sulawesi'],
    ['Gorontalo', 0.54354420, 123.05676930, 'Sulawesi'],
    ['Sulawesi Tengah', -1.43002540, 121.44561790, 'Sulawesi'],
    ['Sulawesi Barat', -2.84413710, 119.23207840, 'Sulawesi'],
    ['Sulawesi Selatan', -3.66879940, 119.97405340, 'Sulawesi'],
    ['Sulawesi Tenggara', -4.14491000, 122.17460500, 'Sulawesi'],
    ['Maluku', -3.23846160, 130.14527340, 'Maluku'],
    ['Maluku Utara', 1.57099930, 127.80876930, 'Maluku'],
    ['Papua', -5.01222020, 141.34701590, 'Papua'],
    ['Papua Barat', -1.33611540, 133.17471620, 'Papua']
]

# Assign pulau ke tiap data gempa
df_prov = pd.DataFrame(data_provinsi, columns=['province', 'lat_prov', 'lon_prov', 'island'])
def cari_island(lat, lon):
    dists = (df_prov['lat_prov'] - lat)**2 + (df_prov['lon_prov'] - lon)**2
    return df_prov.loc[dists.idxmin(), 'island']
data['island'] = data.apply(lambda row: cari_island(row['lat'], row['lon']), axis=1)


3. Normalisasi Fitur dan Target

In [3]:
from sklearn.preprocessing import StandardScaler, MinMaxScaler

fitur = ['lat', 'lon', 'depth']
scaler = StandardScaler()
normalizer = MinMaxScaler()

X_scaled = scaler.fit_transform(data[fitur])
X_norm = normalizer.fit_transform(X_scaled)
X = pd.DataFrame(X_norm, columns=fitur)
y = data['mag']

4. Evaluasi Model (K-Fold, Akurasi, NRMSE)

In [None]:
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, AdaBoostRegressor
from sklearn.metrics import mean_squared_error

kf = KFold(n_splits=10, shuffle=True, random_state=42)
models = {
    'CART': DecisionTreeRegressor(random_state=42),
    'Random Forest': RandomForestRegressor(n_estimators=100, random_state=42),
    'C4.5': DecisionTreeRegressor(random_state=42),
    'GBM': GradientBoostingRegressor(random_state=42),
    'AdaBoost': AdaBoostRegressor(random_state=42)
}

results = {}

for name, model in models.items():
    y_true_all, y_pred_all = [], []
    for train_idx, test_idx in kf.split(X):
        X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
        y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        y_true_all.extend(y_test)
        y_pred_all.extend(y_pred)
    mse = mean_squared_error(y_true_all, y_pred_all)
    rmse = np.sqrt(mse)
    nrmse = rmse / (y.max() - y.min())
    acc = (1 - nrmse) * 100
    results[name] = {'MSE': mse, 'RMSE': rmse, 'NRMSE': nrmse, 'Accuracy (%)': acc}

results_df = pd.DataFrame(results).T.sort_values(by='Accuracy (%)', ascending=False)
print("\nEvaluasi Model:")
print(results_df)

5. Gunakan Best Model (Random Forest) untuk Prediksi

In [None]:
best_model = RandomForestRegressor(n_estimators=100, random_state=42)
best_model.fit(X, y)
data['predicted_mag'] = best_model.predict(X)

6. Random Forest Classifier untuk Prediksi Pulau

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, data['island'], test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)
pulau_labels = clf.classes_
prob_df = pd.DataFrame(proba, columns=pulau_labels)
avg_prob = prob_df.mean().sort_values(ascending=False)

7. Gabungan Hasil Akhir

In [None]:
# Historis
freq_df = data.groupby('island').agg(freq=('mag', 'count'), avg_mag=('mag', 'mean')).reset_index()
freq_df['probability_historis (%)'] = (freq_df['freq'] / freq_df['freq'].sum()) * 100

# Prediksi Magnitudo
pred_mag_df = data.groupby('island').agg(avg_predicted_mag=('predicted_mag', 'mean'), freq=('mag', 'count')).reset_index()
avg_prob_df = avg_prob.reset_index()
avg_prob_df.columns = ['island', 'probability_model']
avg_prob_df['probability_model (%)'] = avg_prob_df['probability_model'] * 100

# Gabungan Final
final_df = avg_prob_df.merge(pred_mag_df, on='island', how='left')
final_df = final_df.merge(freq_df[['island', 'avg_mag', 'probability_historis (%)']], on='island', how='left')
final_df = final_df[['island', 'probability_model (%)', 'probability_historis (%)', 'freq', 'avg_predicted_mag', 'avg_mag']]
final_df = final_df.sort_values(by='probability_model (%)', ascending=False)

# Tampilkan dan Simpan
import os
os.makedirs('outputs', exist_ok=True)
final_df.to_csv('outputs/probabilitas_dan_prediksi_magnitudo_per_pulau.csv', index=False)
print("\nHasil Akhir:")
print(final_df.to_string(index=False, float_format="%.2f"))


Hasil Akhir:
       island  probability_model (%)  probability_historis (%)  freq  avg_predicted_mag  avg_mag
       Maluku                  24.99                     24.77 18338               3.86     3.85
     Sulawesi                  18.91                     18.69 13836               3.20     3.20
Nusa Tenggara                  17.88                     18.33 13568               3.32     3.31
     Sumatera                  16.35                     16.21 12005               3.64     3.63
         Jawa                  10.44                     10.33  7647               3.50     3.49
        Papua                   6.65                      6.81  5041               3.80     3.79
         Bali                   4.64                      4.65  3440               3.14     3.13
   Kalimantan                   0.15                      0.22   163               4.16     4.19
