# **BMLP Final Submission - Classification Analysis**
**Name:** Dafis Nadhif Saputra  
**Topic:** Implementing Classification Models to Predict the Target Derived from Clustering Results

This notebook implements several classification algorithms that predict a target taken from the results of an earlier clustering step.

## Classification Concept

**Classification** is a machine learning technique that predicts the category or label of new data from patterns learned on training data. In this analysis, the cluster assignments from the earlier clustering step serve as the target for training the classification models.

**A simple analogy:** Just as a doctor diagnoses an illness from the symptoms observed, a classification model predicts which cluster best fits a new transaction based on its financial characteristics.
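The idea can be illustrated with a minimal sketch on made-up data. The two features and four rows below are hypothetical and are not taken from the notebook's dataset; they only show the fit-then-predict pattern that the rest of the notebook follows.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two scaled features per transaction,
# with a cluster label (0 or 1) as the target.
X_toy = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
y_toy = [0, 0, 1, 1]

# Learn the mapping from features to cluster label.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_toy, y_toy)

# Predict the cluster of two transactions the model has not seen.
toy_pred = clf.predict([[0.15, 0.15], [0.85, 0.85]])
print(toy_pred)  # [0 1]
```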

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import joblib


# **2. Loading the Dataset from the Clustering Results**
Load the clustering results from a CSV file into a DataFrame.

## Import Libraries and Load Data

In [None]:
df = pd.read_csv('data_clustering.csv')

In [3]:
df.head()

| Unnamed: 0 | TransactionAmount | TransactionType | Location | Channel | CustomerAge | CustomerOccupation | TransactionDuration | LoginAttempts | AccountBalance | Target |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.007554 | 1 | 36 | 0 | 0.83871 | 0 | 0.244828 | 0.0 | 0.33679 | 3 |
| 1 | 0.205368 | 1 | 15 | 0 | 0.806452 | 0 | 0.451724 | 0.0 | 0.918049 | 2 |
| 2 | 0.06884 | 1 | 23 | 2 | 0.016129 | 3 | 0.158621 | 0.0 | 0.068578 | 0 |
| 3 | 0.100636 | 1 | 33 | 2 | 0.129032 | 3 | 0.051724 | 0.0 | 0.56917 | 3 |
| 4 | 0.050192 | 1 | 28 | 0 | 0.0 | 3 | 0.558621 | 0.0 | 0.045677 | 0 |
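The first column in the preview is labelled `Unnamed: 0`. From the output alone it is not possible to tell whether this is only the DataFrame index as rendered, or a real column created when the CSV was saved with its index. If it is a real column, `df.drop('Target', axis=1)` would pass it to the models as a feature. The sketch below uses a hypothetical two-row CSV (not the real `data_clustering.csv`) to show how such a column arises and how `index_col=0` avoids it.

```python
import io

import pandas as pd

# Hypothetical CSV text that mimics a file written by df.to_csv()
# with the index included: the first header cell is empty.
csv_text = ",TransactionAmount,Target\n0,0.007554,3\n1,0.205368,2\n"

# Default read: the saved index becomes a column named 'Unnamed: 0'.
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# Reading with index_col=0 treats that column as the index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(clean.columns))

# Checking the class balance of the target is a cheap sanity check.
print(clean["Target"].value_counts())
```

Running `print(list(df.columns))` on the real DataFrame would settle which case applies.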


# **3. Data Splitting**
The data splitting stage separates the dataset into two parts: a training set and a test set.

## Exploratory Data Analysis (EDA)

In [4]:
# Use train_test_split() to split the dataset (80% train, 20% test).

X = df.drop('Target', axis=1)
y = df['Target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
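The split above is purely random. When cluster sizes are uneven, passing `stratify=y` keeps the class proportions the same in both parts. The sketch below shows this on synthetic data; the array shapes and class counts are assumptions chosen for illustration, not properties of the notebook's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, deliberately imbalanced labels: 70 / 20 / 10 samples per class.
rng = np.random.default_rng(42)
X_demo = rng.random((100, 3))
y_demo = np.array([0] * 70 + [1] * 20 + [2] * 10)

# stratify=y_demo preserves the 70/20/10 ratio in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(np.bincount(y_tr))  # [56 16  8]
print(np.bincount(y_te))  # [14  4  2]
```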

# **4. Building the Classification Model**
After choosing a suitable classification algorithm, the next step is to train the model on the training data.

## Data Preparation for Classification

The recommended steps are:
1. Use the Decision Tree classification algorithm.
2. Train the model on the split data.

In [5]:
# Build a classification model using a Decision Tree
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)

# Predict on the test data
y_pred = dt_model.predict(X_test)

# Evaluate the model (weighted averages account for class sizes)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print("Decision Tree Results:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")

Decision Tree Results:
Accuracy: 1.0000
Precision: 1.0000
Recall: 1.0000
F1-Score: 1.0000
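A perfect score on a single split is worth double-checking with a confusion matrix, a per-class report, and cross-validation. The sketch below does this on synthetic data whose labels come from K-Means, as a stand-in for a target produced by clustering. The blobs, the choice of four clusters, and all variable names are assumptions for illustration; the scores it prints say nothing about the notebook's real data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: features from blobs, labels assigned by K-Means.
X_demo, _ = make_blobs(n_samples=400, centers=4, random_state=42)
y_demo = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X_demo)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
pred = tree.predict(X_te)

# Rows are true clusters, columns are predicted clusters.
cm = confusion_matrix(y_te, pred, labels=[0, 1, 2, 3])
print(cm)
print(classification_report(y_te, pred, zero_division=0))

# 5-fold cross-validation gives an estimate that depends less on one split.
cv_scores = cross_val_score(
    DecisionTreeClassifier(random_state=42), X_demo, y_demo, cv=5
)
print(cv_scores.mean())
```

Because cluster labels are a deterministic function of the features, a classifier trained on those same features can often reproduce them almost exactly, which is one plausible reading of the scores above.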


In [6]:
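# Save the model (joblib is already imported in the first cell).
# Note: joblib writes a pickle-based file; the .h5 extension is only a file name, not HDF5.
joblib.dump(dt_model, 'decision_tree_model.h5')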

['decision_tree_model.h5']
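A saved model can be restored with `joblib.load` and should return the same predictions as the original. The sketch below is a minimal round trip on a tiny hypothetical dataset, written to a temporary directory so that it does not touch the notebook's files.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Tiny hypothetical dataset, used only to obtain a fitted model.
X_demo = np.array([[0.0], [0.1], [0.9], [1.0]])
y_demo = np.array([0, 0, 1, 1])
model = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo)

# Save to a temporary directory, then load the model back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "decision_tree_model.h5")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions.
same = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print(same)  # True
```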

## Model Exploration - Comparing Classification Algorithms



## Hyperparameter Tuning of the Best Model

In [7]:
# Train models using classification algorithms other than Decision Tree.
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Random Forest
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)

# SVM
svm_model = SVC(random_state=42)
svm_model.fit(X_train, y_train)

# KNN
knn_model = KNeighborsClassifier()
knn_model.fit(X_train, y_train)
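SVM and KNN both rely on distances between samples, so features on very different scales can dominate the result. In the data preview, `Location` takes integer values in the tens while most other columns lie between 0 and 1. The sketch below shows one common way to handle this, a `StandardScaler` inside a pipeline. It uses synthetic data and is an optional variation, not what the notebook does; whether scaling `Location` is appropriate at all depends on how that column was encoded.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic three-class data; one feature is stretched to a larger scale.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)
X_demo[:, 0] *= 40

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The scaler is fitted on the training data only, then applied before the SVM.
scaled_svm = make_pipeline(StandardScaler(), SVC(random_state=42))
scaled_svm.fit(X_tr, y_tr)

score = scaled_svm.score(X_te, y_te)
print(f"Test accuracy: {score:.4f}")
```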

In [8]:
# Show accuracy, precision, recall, and F1-score for every algorithm trained so far.

# Evaluate Random Forest
rf_pred = rf_model.predict(X_test)
rf_accuracy = accuracy_score(y_test, rf_pred)
rf_precision = precision_score(y_test, rf_pred, average='weighted')
rf_recall = recall_score(y_test, rf_pred, average='weighted')
rf_f1 = f1_score(y_test, rf_pred, average='weighted')

# Evaluate SVM
svm_pred = svm_model.predict(X_test)
svm_accuracy = accuracy_score(y_test, svm_pred)
svm_precision = precision_score(y_test, svm_pred, average='weighted')
svm_recall = recall_score(y_test, svm_pred, average='weighted')
svm_f1 = f1_score(y_test, svm_pred, average='weighted')

# Evaluate KNN
knn_pred = knn_model.predict(X_test)
knn_accuracy = accuracy_score(y_test, knn_pred)
knn_precision = precision_score(y_test, knn_pred, average='weighted')
knn_recall = recall_score(y_test, knn_pred, average='weighted')
knn_f1 = f1_score(y_test, knn_pred, average='weighted')

# Show the results
print("=== Model Comparison ===")
print(f"Random Forest - Accuracy: {rf_accuracy:.4f}, Precision: {rf_precision:.4f}, Recall: {rf_recall:.4f}, F1-Score: {rf_f1:.4f}")
print(f"SVM - Accuracy: {svm_accuracy:.4f}, Precision: {svm_precision:.4f}, Recall: {svm_recall:.4f}, F1-Score: {svm_f1:.4f}")
print(f"KNN - Accuracy: {knn_accuracy:.4f}, Precision: {knn_precision:.4f}, Recall: {knn_recall:.4f}, F1-Score: {knn_f1:.4f}")
print(f"Decision Tree - Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}, F1-Score: {f1:.4f}")

=== Model Comparison ===
Random Forest - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000
SVM - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000
KNN - Accuracy: 0.9977, Precision: 0.9977, Recall: 0.9977, F1-Score: 0.9977
Decision Tree - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000
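The evaluation cell repeats the same four metric calls for each model. A small helper and a dictionary of models can express the same comparison more compactly. The sketch below runs on synthetic data; the `evaluate` helper and the `models` dictionary are hypothetical names introduced here, not part of the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def evaluate(model, X_eval, y_eval):
    """Return accuracy and weighted precision, recall, and F1 for a fitted model."""
    pred = model.predict(X_eval)
    return {
        "accuracy": accuracy_score(y_eval, pred),
        "precision": precision_score(y_eval, pred, average="weighted", zero_division=0),
        "recall": recall_score(y_eval, pred, average="weighted", zero_division=0),
        "f1": f1_score(y_eval, pred, average="weighted", zero_division=0),
    }


# Synthetic three-class data standing in for the clustered transactions.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(random_state=42),
    "KNN": KNeighborsClassifier(),
}

# Fit each model and collect its metrics in one place.
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    results[name] = evaluate(model, X_te, y_te)
    summary = ", ".join(f"{k}: {v:.4f}" for k, v in results[name].items())
    print(f"{name} - {summary}")
```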


In [9]:
# Save the models other than Decision Tree
# More than one model can be saved here
joblib.dump(rf_model, 'explore_RandomForest_classification.h5')
joblib.dump(svm_model, 'explore_SVM_classification.h5')
joblib.dump(knn_model, 'explore_KNN_classification.h5')

['explore_KNN_classification.h5']

Hyperparameter Tuning

Choose one algorithm to tune.

## Model Evaluation and Interpretation

In [10]:
# Perform hyperparameter tuning and retrain the model.
# Do everything in this single cell.
from sklearn.model_selection import GridSearchCV

# Parameter grid for tuning the Random Forest (RandomForestClassifier is already imported above)
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 10, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# GridSearchCV for Random Forest: 5-fold cross-validation, scored by accuracy
rf_grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)

# Fit the grid search
rf_grid.fit(X_train, y_train)

# Best model
rf_tuned = rf_grid.best_estimator_
print("Best parameters for Random Forest:")
print(rf_grid.best_params_)

Best parameters for Random Forest:
{'max_depth': 3, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 50}
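`best_params_` shows only the winner. The `cv_results_` attribute holds the mean cross-validated score of every candidate, which helps judge whether the winner is clearly better or merely tied. When several candidates tie for the best mean score, `GridSearchCV` selects the first one in grid order. The parameters reported above happen to be the first value of each entry in `param_grid`, which would be consistent with many candidates tying, although the notebook does not print the scores. The sketch below uses synthetic data and a deliberately small grid (both assumptions, to keep it fast) to show how to inspect the full table.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic three-class data standing in for the notebook's training set.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)

# A reduced grid: 2 x 2 = 4 candidates.
small_grid = {"n_estimators": [10, 20], "max_depth": [3, None]}
grid = GridSearchCV(
    RandomForestClassifier(random_state=42), small_grid, cv=3, scoring="accuracy"
)
grid.fit(X_demo, y_demo)

# One row per candidate, best rank first.
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
table = pd.DataFrame(grid.cv_results_)[columns].sort_values("rank_test_score")
print(table.to_string(index=False))

# Mean cross-validated accuracy of the selected candidate.
print(grid.best_score_)
```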


In [11]:
# Show accuracy, precision, recall, and F1-score for the tuned algorithm.

# Predict with the tuned model
rf_tuned_pred = rf_tuned.predict(X_test)

# Evaluate the tuned model
rf_tuned_accuracy = accuracy_score(y_test, rf_tuned_pred)
rf_tuned_precision = precision_score(y_test, rf_tuned_pred, average='weighted')
rf_tuned_recall = recall_score(y_test, rf_tuned_pred, average='weighted')
rf_tuned_f1 = f1_score(y_test, rf_tuned_pred, average='weighted')

print("=== Tuned Random Forest Results ===")
print(f"Accuracy: {rf_tuned_accuracy:.4f}")
print(f"Precision: {rf_tuned_precision:.4f}")
print(f"Recall: {rf_tuned_recall:.4f}")
print(f"F1-Score: {rf_tuned_f1:.4f}")

print("\n=== Comparison (Before vs After Tuning) ===")
print(f"Before Tuning - Accuracy: {rf_accuracy:.4f}, F1-Score: {rf_f1:.4f}")
print(f"After Tuning - Accuracy: {rf_tuned_accuracy:.4f}, F1-Score: {rf_tuned_f1:.4f}")

=== Tuned Random Forest Results ===
Accuracy: 1.0000
Precision: 1.0000
Recall: 1.0000
F1-Score: 1.0000

=== Comparison (Before vs After Tuning) ===
Before Tuning - Accuracy: 1.0000, F1-Score: 1.0000
After Tuning - Accuracy: 1.0000, F1-Score: 1.0000


In [12]:
# Save the tuned model (joblib is already imported in the first cell)
joblib.dump(rf_tuned, 'tuning_classification.h5')

['tuning_classification.h5']

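## Conclusion of the Classification Analysis

Every model classified the held-out test set (20% of the data) almost perfectly. Decision Tree, Random Forest, and SVM each reached 1.0000 on accuracy, precision, recall, and F1-score, while KNN reached 0.9977. Tuning the Random Forest with GridSearchCV left its test scores unchanged at 1.0000. Scores this high are plausible because the target labels were produced by clustering, presumably on the same features the classifiers see. The best model can be used to classify bank transactions into the appropriate category based on the spending patterns identified through clustering.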