# **BMLP Final Submission - Classification Analysis**
**Name:** Dafis Nadhif Saputra  
**Topic:** Implementing Classification Models to Predict Targets Derived from Clustering Results

This notebook implements several classification algorithms to predict a target derived from the results of an earlier clustering step.

## Classification Concepts

**Classification** is a machine learning technique that predicts the category or label of new data based on patterns learned from training data. In this analysis, the cluster labels from the earlier clustering step serve as the target for training the classification models.

**A simple analogy:** Just as a doctor diagnoses an illness from the observed symptoms, a classification model predicts which cluster best fits a new transaction based on its financial characteristics.

In [12]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import joblib


# **2. Loading the Dataset from the Clustering Results**
Load the clustering output from a CSV file into a DataFrame.

## Importing Libraries and Loading Data

In [23]:
df = pd.read_csv('../data/data_clustering.csv')

In [22]:
df.head()

| | TransactionAmount | TransactionType | Location | Channel | CustomerAge | CustomerOccupation | TransactionDuration | LoginAttempts | AccountBalance | TransactionAmount_binned | CustomerAge_binned | TransactionAmount_binned_encoded | CustomerAge_binned_encoded | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.007207 | 1 | 36 | 0 | 0.83871 | 0 | 0.244828 | 0.0 | 0.336832 | Low | Senior | 1 | 1 | 1 |
| 1 | 0.19594 | 1 | 15 | 0 | 0.806452 | 0 | 0.451724 | 0.0 | 0.918055 | Medium | Senior | 2 | 1 | 0 |
| 2 | 0.06568 | 1 | 23 | 2 | 0.016129 | 3 | 0.158621 | 0.0 | 0.068637 | Low | Young | 1 | 2 | 2 |
| 3 | 0.096016 | 1 | 33 | 2 | 0.129032 | 3 | 0.051724 | 0.0 | 0.569198 | Low | Young | 1 | 2 | 1 |
| 4 | 0.047888 | 1 | 28 | 0 | 0.0 | 3 | 0.558621 | 0.0 | 0.045738 | Low | Young | 1 | 2 | 2 |


In [24]:
# Inspect the dataset
print("Dataset Shape:", df.shape)
print("\nColumn Names:", df.columns.tolist())
print("\nData Types:")
print(df.dtypes)
print("\nMissing Values:")
print(df.isnull().sum())
print("\nUnique values in categorical columns:")
for col in df.select_dtypes(include=['object']).columns:
    print(f"{col}: {df[col].unique()[:10]}")  # Show first 10 unique values

Dataset Shape: (2182, 14)

Column Names: ['TransactionAmount', 'TransactionType', 'Location', 'Channel', 'CustomerAge', 'CustomerOccupation', 'TransactionDuration', 'LoginAttempts', 'AccountBalance', 'TransactionAmount_binned', 'CustomerAge_binned', 'TransactionAmount_binned_encoded', 'CustomerAge_binned_encoded', 'Target']

Data Types:
TransactionAmount                   float64
TransactionType                       int64
Location                              int64
Channel                               int64
CustomerAge                         float64
CustomerOccupation                    int64
TransactionDuration                 float64
LoginAttempts                       float64
AccountBalance                      float64
TransactionAmount_binned             object
CustomerAge_binned                   object
TransactionAmount_binned_encoded      int64
CustomerAge_binned_encoded            int64
Target                                int64
dtype: object

Missing Values:
TransactionAmo

# **3. Data Splitting**
The data splitting stage separates the dataset into two parts: a training set and a test set.

## Exploratory Data Analysis (EDA)

In [25]:
# Use train_test_split() to split the dataset.

X = df.drop('Target', axis=1)
y = df['Target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
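
The split above is not stratified, so the class proportions in the training and test sets can drift from those of the full dataset. A minimal sketch of a stratified split follows; the frame `toy` and its columns are hypothetical stand-ins for `df`, not the notebook's data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows with an imbalanced 4-class target (assumption)
toy = pd.DataFrame({
    "feature": range(100),
    "Target": [0] * 40 + [1] * 30 + [2] * 20 + [3] * 10,
})

X_toy = toy.drop(columns="Target")
y_toy = toy["Target"]

# stratify=y_toy keeps the class proportions the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# Compare the share of each class in the two splits
train_share = y_tr.value_counts(normalize=True).sort_index()
test_share = y_te.value_counts(normalize=True).sort_index()
print(pd.DataFrame({"train": train_share, "test": test_share}))
```

With stratification, both columns of the printed table should show the same class shares as the full toy dataset.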

In [26]:
# Inspect the columns in X_train
print("X_train columns:", X_train.columns.tolist())
print("\nX_train data types:")
print(X_train.dtypes)
print("\nColumns that still contain strings:")
for col in X_train.select_dtypes(include=['object']).columns:
    print(f"{col}: {X_train[col].unique()}")

X_train columns: ['TransactionAmount', 'TransactionType', 'Location', 'Channel', 'CustomerAge', 'CustomerOccupation', 'TransactionDuration', 'LoginAttempts', 'AccountBalance', 'TransactionAmount_binned', 'CustomerAge_binned', 'TransactionAmount_binned_encoded', 'CustomerAge_binned_encoded']

X_train data types:
TransactionAmount                   float64
TransactionType                       int64
Location                              int64
Channel                               int64
CustomerAge                         float64
CustomerOccupation                    int64
TransactionDuration                 float64
LoginAttempts                       float64
AccountBalance                      float64
TransactionAmount_binned             object
CustomerAge_binned                   object
TransactionAmount_binned_encoded      int64
CustomerAge_binned_encoded            int64
dtype: object

Columns that still contain strings:
TransactionAmount_binned: ['Very High' 'Low' 'Medium' 'High']
Custom

In [27]:
# FIX: drop the columns that still contain strings, since encoded versions already exist
print("Before preprocessing:")
print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")

# Drop the string columns and keep only the encoded versions
columns_to_drop = ['TransactionAmount_binned', 'CustomerAge_binned']
X_train_clean = X_train.drop(columns=columns_to_drop)
X_test_clean = X_test.drop(columns=columns_to_drop)

print("\nAfter preprocessing:")
print(f"X_train_clean shape: {X_train_clean.shape}")
print(f"X_test_clean shape: {X_test_clean.shape}")

# Verify that no object/string columns remain
print("\nData types after cleaning:")
print(X_train_clean.dtypes)
print("\nColumns that still contain strings:")
object_cols = X_train_clean.select_dtypes(include=['object']).columns
print(f"Number of object columns: {len(object_cols)}")
if len(object_cols) > 0:
    for col in object_cols:
        print(f"{col}: {X_train_clean[col].unique()}")
else:
    print("No string columns remain - ready for modeling!")

# Update X_train and X_test
X_train = X_train_clean
X_test = X_test_clean

Before preprocessing:
X_train shape: (1745, 13)
X_test shape: (437, 13)

After preprocessing:
X_train_clean shape: (1745, 11)
X_test_clean shape: (437, 11)

Data types after cleaning:
TransactionAmount                   float64
TransactionType                       int64
Location                              int64
Channel                               int64
CustomerAge                         float64
CustomerOccupation                    int64
TransactionDuration                 float64
LoginAttempts                       float64
AccountBalance                      float64
TransactionAmount_binned_encoded      int64
CustomerAge_binned_encoded            int64
dtype: object

Columns that still contain strings:
Number of object columns: 0
No string columns remain - ready for modeling!
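
Listing the string columns by hand works here, but the same result can be obtained by selecting on dtype. The sketch below assumes a small hypothetical frame (`frame`) that mirrors the mix of numeric and string columns above; it is an alternative illustration, not the cell that produced the output.

```python
import pandas as pd

# Hypothetical frame mirroring the numeric/string column mix above (assumption)
frame = pd.DataFrame({
    "TransactionAmount": [0.01, 0.20, 0.07],
    "TransactionAmount_binned": ["Low", "Medium", "Low"],
    "TransactionAmount_binned_encoded": [1, 2, 1],
})

# Keep every numeric column instead of naming the string columns explicitly
numeric_only = frame.select_dtypes(include="number")
print(numeric_only.columns.tolist())
```

Selecting on dtype keeps the preprocessing step valid if more binned string columns are added later.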


# **4. Building the Classification Models**
After choosing a suitable classification algorithm, the next step is to train the model on the training data.

## Data Preparation for Classification

The recommended steps are:
1. Use the Decision Tree classification algorithm.
2. Train the model on the split data.

In [28]:
# Build a classification model using a Decision Tree
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)

# Predict on the test data
y_pred = dt_model.predict(X_test)

# Evaluate the model (weighted averages account for class frequencies)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print("Decision Tree Results:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")

Decision Tree Results:
Accuracy: 1.0000
Precision: 1.0000
Recall: 1.0000
F1-Score: 1.0000
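
Aggregate scores hide which classes are confused with which. A confusion matrix and per-class report make that visible; the sketch below uses synthetic data from `make_classification` as a stand-in for the clustered dataset (an assumption), so its numbers are unrelated to the results above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class data (assumption: stands in for the clustered dataset)
X_syn, y_syn = make_classification(
    n_samples=400, n_features=8, n_informative=5,
    n_classes=4, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

clf = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, pred)
print(cm)

# Per-class precision, recall, and F1
print(classification_report(y_te, pred))
```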


In [39]:
# Save the model (joblib writes a pickle-based file; the .h5 suffix is only a name)
import joblib
joblib.dump(dt_model, '../models/decision_tree_model.h5')

['../models/decision_tree_model.h5']
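
A saved model is only useful if it can be reloaded and still predicts the same values. The round trip can be sketched as follows; the temporary directory and the file name `decision_tree_model.joblib` are hypothetical choices for illustration, and the data is synthetic.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data and a small model to persist (assumption for illustration)
X_syn, y_syn = make_classification(n_samples=100, n_features=4, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_syn, y_syn)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; joblib writes a pickle-based file whatever the extension
    path = os.path.join(tmp, "decision_tree_model.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions
same = (restored.predict(X_syn) == model.predict(X_syn)).all()
print(same)
```

Because joblib does not produce HDF5, a file saved with an `.h5` suffix still has to be opened with `joblib.load`, not with an HDF5 reader.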

## Model Exploration - Comparing Classification Algorithms

## Hyperparameter Tuning of the Best Model

In [29]:
# Train models using classification algorithms other than Decision Tree.
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Random Forest
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)

# SVM
svm_model = SVC(random_state=42)
svm_model.fit(X_train, y_train)

# KNN
knn_model = KNeighborsClassifier()
knn_model.fit(X_train, y_train)

In [30]:
# Show accuracy, precision, recall, and F1-score for every algorithm trained so far.

# Evaluate Random Forest
rf_pred = rf_model.predict(X_test)
rf_accuracy = accuracy_score(y_test, rf_pred)
rf_precision = precision_score(y_test, rf_pred, average='weighted')
rf_recall = recall_score(y_test, rf_pred, average='weighted')
rf_f1 = f1_score(y_test, rf_pred, average='weighted')

# Evaluate SVM
svm_pred = svm_model.predict(X_test)
svm_accuracy = accuracy_score(y_test, svm_pred)
svm_precision = precision_score(y_test, svm_pred, average='weighted')
svm_recall = recall_score(y_test, svm_pred, average='weighted')
svm_f1 = f1_score(y_test, svm_pred, average='weighted')

# Evaluate KNN
knn_pred = knn_model.predict(X_test)
knn_accuracy = accuracy_score(y_test, knn_pred)
knn_precision = precision_score(y_test, knn_pred, average='weighted')
knn_recall = recall_score(y_test, knn_pred, average='weighted')
knn_f1 = f1_score(y_test, knn_pred, average='weighted')

# Print the results
print("=== Model Comparison ===")
print(f"Random Forest - Accuracy: {rf_accuracy:.4f}, Precision: {rf_precision:.4f}, Recall: {rf_recall:.4f}, F1-Score: {rf_f1:.4f}")
print(f"SVM - Accuracy: {svm_accuracy:.4f}, Precision: {svm_precision:.4f}, Recall: {svm_recall:.4f}, F1-Score: {svm_f1:.4f}")
print(f"KNN - Accuracy: {knn_accuracy:.4f}, Precision: {knn_precision:.4f}, Recall: {knn_recall:.4f}, F1-Score: {knn_f1:.4f}")
print(f"Decision Tree - Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}, F1-Score: {f1:.4f}")

=== Model Comparison ===
Random Forest - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000
SVM - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000
KNN - Accuracy: 0.9977, Precision: 0.9977, Recall: 0.9977, F1-Score: 0.9977
Decision Tree - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000
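
The per-model evaluation above repeats the same four metric calls. One way to condense it is a small helper applied in a loop; `evaluate` and the synthetic data below are hypothetical, introduced only for this sketch.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def evaluate(model, X_te, y_te):
    """Return the four weighted metrics used in this notebook for a fitted model."""
    pred = model.predict(X_te)
    return {
        "Accuracy": accuracy_score(y_te, pred),
        "Precision": precision_score(y_te, pred, average="weighted", zero_division=0),
        "Recall": recall_score(y_te, pred, average="weighted", zero_division=0),
        "F1-Score": f1_score(y_te, pred, average="weighted", zero_division=0),
    }


# Synthetic 4-class data (assumption: stands in for the clustered dataset)
X_syn, y_syn = make_classification(
    n_samples=400, n_features=8, n_informative=5, n_classes=4, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "SVM": SVC(random_state=42),
    "KNN": KNeighborsClassifier(),
}

# Fit each model, evaluate it, and collect one row per model
table = pd.DataFrame(
    {name: evaluate(m.fit(X_tr, y_tr), X_te, y_te) for name, m in models.items()}
).T
print(table.round(4))
```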


In [40]:
# Save the models other than the Decision Tree
# More than one model can be saved here
import joblib
joblib.dump(rf_model, '../models/explore_RandomForest_classification.h5')
joblib.dump(svm_model, '../models/explore_SVM_classification.h5')
joblib.dump(knn_model, '../models/explore_KNN_classification.h5')

['../models/explore_KNN_classification.h5']

Hyperparameter Tuning Model

Choose one algorithm to tune; Random Forest is used here.

## Model Evaluation and Interpretation

In [32]:
# Run hyperparameter tuning and retrain.
# Do this in a single cell.
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Parameter grid for tuning Random Forest
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 10, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# GridSearchCV for Random Forest (5-fold CV over every parameter combination)
rf_grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)

# Fit the grid search
rf_grid.fit(X_train, y_train)

# Best model
rf_tuned = rf_grid.best_estimator_
print("Best parameters for Random Forest:")
print(rf_grid.best_params_)

Best parameters for Random Forest:
{'max_depth': 5, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100}
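
`best_params_` shows only the winning combination. The fitted search also exposes `best_score_` and the full `cv_results_` table, which show how close the other candidates came. A minimal sketch, assuming synthetic data and a deliberately small hypothetical grid (`small_grid`) so it runs quickly:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic 3-class data (assumption for illustration)
X_syn, y_syn = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_classes=3, random_state=42
)

# Hypothetical reduced grid: 4 candidates instead of the 108 searched above
small_grid = {"n_estimators": [10, 30], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42), small_grid, cv=3, scoring="accuracy"
)
search.fit(X_syn, y_syn)

# One row per candidate, best-ranked first
ranking = pd.DataFrame(search.cv_results_)[
    ["params", "mean_test_score", "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")

print(f"Best mean CV accuracy: {search.best_score_:.4f}")
print(ranking)
```

When several candidates have nearly identical mean scores, the chosen parameters are not strongly preferred over the alternatives.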


In [33]:
# Show accuracy, precision, recall, and F1-score for the tuned algorithm.

# Predict with the tuned model
rf_tuned_pred = rf_tuned.predict(X_test)

# Evaluate the tuned model
rf_tuned_accuracy = accuracy_score(y_test, rf_tuned_pred)
rf_tuned_precision = precision_score(y_test, rf_tuned_pred, average='weighted')
rf_tuned_recall = recall_score(y_test, rf_tuned_pred, average='weighted')
rf_tuned_f1 = f1_score(y_test, rf_tuned_pred, average='weighted')

print("=== Tuned Random Forest Results ===")
print(f"Accuracy: {rf_tuned_accuracy:.4f}")
print(f"Precision: {rf_tuned_precision:.4f}")
print(f"Recall: {rf_tuned_recall:.4f}")
print(f"F1-Score: {rf_tuned_f1:.4f}")

print("\n=== Comparison (Before vs After Tuning) ===")
print(f"Before Tuning - Accuracy: {rf_accuracy:.4f}, F1-Score: {rf_f1:.4f}")
print(f"After Tuning - Accuracy: {rf_tuned_accuracy:.4f}, F1-Score: {rf_tuned_f1:.4f}")

=== Tuned Random Forest Results ===
Accuracy: 0.9977
Precision: 0.9977
Recall: 0.9977
F1-Score: 0.9977

=== Comparison (Before vs After Tuning) ===
Before Tuning - Accuracy: 1.0000, F1-Score: 1.0000
After Tuning - Accuracy: 0.9977, F1-Score: 0.9977


In [41]:
# Save the tuned model
import joblib
joblib.dump(rf_tuned, '../models/tuning_classification.h5')

['../models/tuning_classification.h5']

## 🔍 **ANALYSIS: Why Can Classification Accuracy Reach a Perfect 100%?**

### **ROOT CAUSE ANALYSIS**
On closer inspection of the data, the perfect 100% accuracy stems from **DATA LEAKAGE** and **OVERFITTING**:

#### **1. Perfect Feature Mapping - Location**
```
Location 0-10   → always Target 3
Location 11-21  → always Target 0
Location 22-31  → always Target 2
Location 32-43  → always Target 1
```

**Analogy**: It is like an exam whose question reads "If the city is Jakarta, the category is A". The model only needs to "remember" this mapping!
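
A mapping like this can be checked directly by counting how many distinct targets occur for each Location value. The sketch below builds hypothetical data (`demo`) that follows the ranges listed above; it illustrates the check and does not read the notebook's CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical data reproducing the Location ranges listed above (assumption)
rng = np.random.default_rng(0)
location = rng.integers(0, 44, size=500)
target = np.select(
    [location <= 10, location <= 21, location <= 31],
    [3, 0, 2],
    default=1,
)
demo = pd.DataFrame({"Location": location, "Target": target})

# Number of distinct targets observed for each Location value
targets_per_location = demo.groupby("Location")["Target"].nunique()

# If every Location value maps to exactly one target, Location alone determines Target
is_deterministic = bool((targets_per_location == 1).all())
print(is_deterministic)
```

Running the same `groupby(...).nunique()` on the real DataFrame would confirm or refute the mapping for the actual data.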

#### **2. Perfect Data Separation**
- Every row is unique (2,182 rows = 2,182 unique combinations)
- The model can "memorize" each feature combination

#### **3. Feature Engineering Artifacts**
- The binned features are redundant with the original features
- The data contains no noise

### **⚠️ WHY IS THIS A PROBLEM?**
1. **Unrealistic**: In real-world settings, 100% accuracy is very rare
2. **Not Generalizable**: The model only "remembers" the mapping instead of learning general patterns
3. **Extreme Overfitting**: Performance is likely to be poor on new data where the Location-to-cluster mapping does not hold
4. **Data Leakage**: The Location feature "leaks" the answer
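
One way to locate a leaking feature is to train a small model on each feature alone: a feature that reaches near-perfect accuracy by itself is suspect. The following is a sketch on hypothetical data (`demo`), where `Location` determines the target and the other two columns are pure noise.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: Location determines Target, the other features are noise (assumption)
rng = np.random.default_rng(42)
n = 600
demo = pd.DataFrame({
    "Location": rng.integers(0, 44, size=n),
    "AccountBalance": rng.random(n),
    "CustomerAge": rng.random(n),
})
demo["Target"] = pd.cut(
    demo["Location"], bins=[-1, 10, 21, 31, 43], labels=[3, 0, 2, 1]
).astype(int)

# Cross-validated accuracy of a shallow tree trained on one feature at a time
single_feature_scores = {}
for col in ["Location", "AccountBalance", "CustomerAge"]:
    scores = cross_val_score(
        DecisionTreeClassifier(max_depth=3, random_state=42),
        demo[[col]], demo["Target"], cv=5,
    )
    single_feature_scores[col] = scores.mean()

for col, score in single_feature_scores.items():
    print(f"{col}: {score:.4f}")
```

In this toy setup only `Location` scores far above chance, which is the signature of a feature that encodes the label.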

## Classification Analysis Conclusions

Based on the classification analysis, the models predict the target cluster with high accuracy. However, **a perfect 100% accuracy indicates data leakage and overfitting** that must be addressed before real-world use.

**Recommendations for a better implementation:**
- Remove the Location feature, which causes the perfect mapping
- Apply regularization to the models
- Use cross-validation
- Select features more carefully

## 🛠️ **IMPROVED CLASSIFICATION IMPLEMENTATION**

To address the data leakage and overfitting, a more appropriate implementation follows:

In [36]:
# IMPROVED IMPLEMENTATION: remove the features that cause data leakage
print("=== CLASSIFICATION WITHOUT DATA LEAKAGE ===")

# Load the original data
df_improved = pd.read_csv('../data/data_clustering.csv')

# 1. Remove the features that cause data leakage
features_to_remove = [
    'Location',                    # Maps perfectly to the target
    'TransactionAmount_binned',    # Redundant with TransactionAmount
    'CustomerAge_binned'           # Redundant with CustomerAge
]

print(f"Removing features: {features_to_remove}")

# 2. Prepare the relevant features
X_improved = df_improved.drop(['Target'] + features_to_remove, axis=1)
y_improved = df_improved['Target']

print(f"Features used: {list(X_improved.columns)}")
print(f"Total samples: {len(X_improved)}, Total features: {len(X_improved.columns)}")

# 3. Standardize the data
# Note: the scaler is fit on the full dataset before the split, so test-set
# statistics influence the transform; fitting on the training split alone is stricter.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_improved)

# 4. Stratified train/test split
X_train_imp, X_test_imp, y_train_imp, y_test_imp = train_test_split(
    X_scaled, y_improved,
    test_size=0.3,
    random_state=42,
    stratify=y_improved
)

=== CLASSIFICATION WITHOUT DATA LEAKAGE ===
Removing features: ['Location', 'TransactionAmount_binned', 'CustomerAge_binned']
Features used: ['TransactionAmount', 'TransactionType', 'Channel', 'CustomerAge', 'CustomerOccupation', 'TransactionDuration', 'LoginAttempts', 'AccountBalance', 'TransactionAmount_binned_encoded', 'CustomerAge_binned_encoded']
Total samples: 2182, Total features: 10
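
Scaling the whole dataset before splitting lets statistics from the test rows reach the training step. Wrapping the scaler and the model in a pipeline avoids this, because the scaler is then refit on the training portion of each fold. This is a sketch of that alternative on synthetic data, not the code that produced the results in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 4-class data (assumption: stands in for X_improved / y_improved)
X_syn, y_syn = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=4, random_state=42
)

# The scaler is fit inside each training fold, never on the held-out fold
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X_syn, y_syn, cv=cv, scoring="accuracy")
print(f"CV Accuracy: {scores.mean():.4f} (+/- {scores.std() * 2:.4f})")
```

The pipeline object can also be passed to `GridSearchCV` or saved with joblib, so the scaler and model stay together.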


In [37]:
# 5. Regularized models to reduce overfitting
from sklearn.model_selection import cross_val_score, StratifiedKFold

print("\n=== TRAINING MODELS WITH REGULARIZATION ===")

models_improved = {
    'Decision Tree (Regularized)': DecisionTreeClassifier(
        random_state=42,
        max_depth=5,           # Limit tree depth
        min_samples_split=20,  # Minimum samples required to split a node
        min_samples_leaf=10    # Minimum samples required in a leaf
    ),
    'Random Forest (Regularized)': RandomForestClassifier(
        random_state=42,
        n_estimators=50,       # Fewer trees than the default
        max_depth=5,           # Limit tree depth
        min_samples_split=20,
        min_samples_leaf=10
    ),
    'SVM (RBF)': SVC(
        random_state=42,
        kernel='rbf',
        C=1.0,                 # Regularization parameter
        gamma='scale'
    ),
    'KNN': KNeighborsClassifier(
        n_neighbors=7          # Larger than the default of 5
    )
}

results_improved = {}

# Training and evaluation
for name, model in models_improved.items():
    print(f"\nTraining {name}...")

    # Train
    model.fit(X_train_imp, y_train_imp)

    # Predict
    y_pred_imp = model.predict(X_test_imp)

    # Evaluate
    accuracy_imp = accuracy_score(y_test_imp, y_pred_imp)
    precision_imp = precision_score(y_test_imp, y_pred_imp, average='weighted')
    recall_imp = recall_score(y_test_imp, y_pred_imp, average='weighted')
    f1_imp = f1_score(y_test_imp, y_pred_imp, average='weighted')

    results_improved[name] = {
        'Accuracy': accuracy_imp,
        'Precision': precision_imp,
        'Recall': recall_imp,
        'F1-Score': f1_imp
    }

    print(f"Accuracy: {accuracy_imp:.4f}, Precision: {precision_imp:.4f}, Recall: {recall_imp:.4f}, F1-Score: {f1_imp:.4f}")

# Cross-validation to check consistency across folds
print("\n=== CROSS-VALIDATION ANALYSIS ===")
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for name, model in models_improved.items():
    cv_scores = cross_val_score(model, X_scaled, y_improved, cv=cv, scoring='accuracy')
    print(f"{name} - CV Accuracy: {cv_scores.mean():.4f} (+/- {cv_scores.std() * 2:.4f})")


=== TRAINING MODELS WITH REGULARIZATION ===

Training Decision Tree (Regularized)...
Accuracy: 0.2718, Precision: 0.2834, Recall: 0.2718, F1-Score: 0.2657

Training Random Forest (Regularized)...
Accuracy: 0.2687, Precision: 0.2643, Recall: 0.2687, F1-Score: 0.2598

Training SVM (RBF)...
Accuracy: 0.2641, Precision: 0.2634, Recall: 0.2641, F1-Score: 0.2605

Training KNN...
Accuracy: 0.2733, Precision: 0.2780, Recall: 0.2733, F1-Score: 0.2705

=== CROSS-VALIDATION ANALYSIS ===
Decision Tree (Regularized) - CV Accuracy: 0.2621 (+/- 0.0329)
Random Forest (Regularized) - CV Accuracy: 0.2750 (+/- 0.0179)
SVM (RBF) - CV Accuracy: 0.2800 (+/- 0.0116)
KNN - CV Accuracy: 0.2493 (+
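
The cross-validated accuracies above sit close to 25%, so it helps to measure an explicit baseline under the same protocol. The sketch below scores a `DummyClassifier` on hypothetical noise features with four perfectly balanced classes (an assumption; the real class balance is not shown in this notebook).

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical data: random features, four perfectly balanced classes (assumption)
rng = np.random.default_rng(42)
X_noise = rng.random((800, 5))
y_balanced = np.repeat([0, 1, 2, 3], 200)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Always predicts the most frequent training class, ignoring the features
baseline = DummyClassifier(strategy="most_frequent")
baseline_scores = cross_val_score(
    baseline, X_noise, y_balanced, cv=cv, scoring="accuracy"
)
print(f"Baseline CV accuracy: {baseline_scores.mean():.4f}")
```

A model is only informative to the extent that it beats this baseline by more than the fold-to-fold spread.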

In [42]:
# 6. Comparison of results: before vs after the fix
print("\n" + "="*70)
print("RESULTS COMPARISON: BEFORE vs AFTER THE FIX")
print("="*70)

print("\n📊 RESULTS BEFORE THE FIX (with data leakage):")
print("Decision Tree: 100.0% accuracy")
print("Random Forest: 100.0% accuracy")
print("SVM: 100.0% accuracy")
print("KNN: 99.8% accuracy")

print("\n📊 RESULTS AFTER THE FIX (without data leakage):")
import pandas as pd
results_df = pd.DataFrame(results_improved).T
print(results_df.round(4))

print("\n🎯 KEY INSIGHTS:")
print("• ~27% accuracy for 4-class classification is plausible once Location is removed")
print("• Random baseline = 25% (1/4 classes)")
print("• The models are only slightly better than random guessing")
print("• 100% accuracy = leakage/overfitting, ~27% accuracy = realistic")

print("\n✅ FIXES APPLIED:")
print("• Remove the Location feature (perfect mapping)")
print("• Add regularization (max_depth, min_samples)")
print("• Standardize the data with StandardScaler")
print("• Cross-validation to validate consistency")
print("• Stratified sampling to preserve the class distribution")

# Save the best (realistic) model
best_model_improved = models_improved['KNN']  # KNN has the highest test accuracy
import joblib
joblib.dump(best_model_improved, '../models/improved_classification_model.h5')
print(f"\n💾 Best (realistic) model saved: ../models/improved_classification_model.h5")


RESULTS COMPARISON: BEFORE vs AFTER THE FIX

📊 RESULTS BEFORE THE FIX (with data leakage):
Decision Tree: 100.0% accuracy
Random Forest: 100.0% accuracy
SVM: 100.0% accuracy
KNN: 99.8% accuracy

📊 RESULTS AFTER THE FIX (without data leakage):
                             Accuracy  Precision  Recall  F1-Score
Decision Tree (Regularized)    0.2718     0.2834  0.2718    0.2657
Random Forest (Regularized)    0.2687     0.2643  0.2687    0.2598
SVM (RBF)                      0.2641     0.2634  0.2641    0.2605
KNN                            0.2733     0.2780  0.2733    0.2705

🎯 KEY INSIGHTS:
• ~27% accuracy for 4-class classification is plausible once Location is removed
• Random baseline = 25% (1/4 classes)
• The models are only slightly better than random guessing
• 100% accuracy = leakage/overfitting, ~27% accuracy = realistic

✅ FIXES APPLIED:
• Remove the Location feature (perfect mapping)
• Add regularization (max_depth, min_samples)
• Standardize the data with StandardScaler
• Cross-validation to valida
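
Whether "slightly better than random guessing" is a real effect can be checked with a one-sided binomial test against the 25% baseline. The counts below are assumptions derived from the figures reported above: 655 test rows (30% of 2,182, rounded up) and 179 correct predictions (0.2733 × 655). The 25% chance level additionally assumes four roughly balanced classes.

```python
from scipy.stats import binomtest

n_test = 655      # assumption: ceil(0.3 * 2182) rows in the test split
n_correct = 179   # assumption: 0.2733 * 655, the best reported test accuracy
chance = 0.25     # assumption: four roughly balanced classes

# One-sided test: is the observed accuracy greater than chance?
result = binomtest(n_correct, n_test, p=chance, alternative="greater")

print(f"Observed accuracy: {n_correct / n_test:.4f}")
print(f"One-sided p-value vs. chance: {result.pvalue:.3f}")
```

A p-value above the usual 0.05 threshold would mean the gap between ~27% and 25% is within what chance alone could produce on a test set of this size.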