# **Important**
- Make sure you Run All before submitting so that every cell executes correctly.
- Remove the hash symbol (#) if you implement the additional criteria.
- Keep the hash symbol (#) if you do not implement the additional criteria.

# **1. Import Libraries**
In this step, import the Python libraries needed for data analysis and for building the machine learning model.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import joblib
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler, MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score, classification_report

# **2. Loading the Dataset from the Clustering Results**
Load the clustering-result dataset from the CSV file into a DataFrame variable.

In [2]:
data = pd.read_csv('data_clustering_inverse.csv')

In [3]:
# Show the first 5 rows with head().
data.head()

Unnamed: 0,TransactionAmount,TransactionDate,TransactionType,Location,Channel,CustomerAge,CustomerOccupation,TransactionDuration,LoginAttempts,AccountBalance,PreviousTransactionDate,AgeGroup,AmountGroup,Target,PCA1,PCA2
0,14.09,2023-04-11 16:29:14,Debit,San Diego,ATM,70.0,Doctor,81.0,1.0,5112.21,2024-11-04 08:08:08,Lansia,Kecil,0,1.14814,-0.431389
1,376.24,2023-06-27 16:44:19,Debit,Houston,ATM,68.0,Doctor,141.0,1.0,13758.91,2024-11-04 08:09:35,Lansia,Besar,3,2.424262,0.168088
2,126.29,2023-07-10 18:16:08,Debit,Mesa,Online,19.0,Student,56.0,1.0,1122.35,2024-11-04 08:07:04,Remaja,Sedang,1,-1.614817,0.212256
3,184.5,2023-05-05 16:32:11,Debit,Raleigh,Online,26.0,Student,25.0,1.0,8569.06,2024-11-04 08:09:06,Dewasa Muda,Sedang,3,0.035823,0.710394
4,13.45,2023-10-16 17:51:24,Credit,Atlanta,Online,45.0,Student,198.0,1.0,7429.4,2024-11-04 08:06:39,Paruh Baya,Kecil,3,0.419299,-1.592963


# **3. Data Splitting**
The data splitting step separates the dataset into two parts: a training set and a test set.

In [6]:
# Separate the features from the target
X = data.drop(columns=['Target'])
y = data['Target']

# Label-encode the categorical (object-typed) columns
categorical_cols = X.select_dtypes(include=['object']).columns
for col in categorical_cols:
    le = LabelEncoder()
    X[col] = le.fit_transform(X[col])

# Split the data: 80% training, 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Check the resulting shapes
print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)

Train shape: (1829, 15)
Test shape: (458, 15)
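
The split above does not pass `stratify`, so the class proportions in the two partitions can drift apart. A minimal sketch of a stratified split follows. It assumes a synthetic stand-in dataset built with `make_classification` (the names `X_demo` and `y_demo` are hypothetical, not part of the notebook), sized to match the 2,287 rows and 15 features seen here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the clustered dataset (assumption: 4 classes, 15 features)
X_demo, y_demo = make_classification(
    n_samples=2287, n_features=15, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=42,
)

# stratify=y_demo keeps the class proportions nearly identical in both partitions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

train_ratio = np.bincount(y_tr) / len(y_tr)
test_ratio = np.bincount(y_te) / len(y_te)
print("Train shape:", X_tr.shape, "class ratio:", train_ratio.round(3))
print("Test shape:", X_te.shape, "class ratio:", test_ratio.round(3))
```

Whether stratification changes the reported metrics on the real data is not something this sketch can show; it only illustrates the parameter.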


# **4. Building the Classification Model**
After choosing a suitable classification algorithm, the next step is to train the model on the training data.

The recommended steps are as follows.
1. Use the Decision Tree classification algorithm.
2. Train the model on the split data.

In [7]:
dt = DecisionTreeClassifier().fit(X_train, y_train)
print("Model training complete.")

Model training complete.


In [8]:
# Initialize the Decision Tree model
model_dt = DecisionTreeClassifier(random_state=42)

# Train the model on the training data
model_dt.fit(X_train, y_train)

# Predict on the test data
y_pred = model_dt.predict(X_test)

# Evaluate the model
print("=== Decision Tree Evaluation ===")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average='weighted'))
print("Recall:", recall_score(y_test, y_pred, average='weighted'))
print("F1-Score:", f1_score(y_test, y_pred, average='weighted'))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

=== Decision Tree Evaluation ===
Accuracy: 0.9366812227074236
Precision: 0.9370187396579431
Recall: 0.9366812227074236
F1-Score: 0.9367523579854936

Classification Report:
               precision    recall  f1-score   support

           0       0.93      0.93      0.93       143
           1       0.96      0.96      0.96       127
           2       0.95      0.92      0.94        89
           3       0.90      0.93      0.92        99

    accuracy                           0.94       458
   macro avg       0.94      0.94      0.94       458
weighted avg       0.94      0.94      0.94       458
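
The notebook imports `confusion_matrix` but never calls it. A confusion matrix complements the report above by showing which classes are mistaken for which. The sketch below assumes a synthetic stand-in dataset (`X_demo`, `y_demo` and `tree` are hypothetical names), so its numbers will not match the report.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: 4 classes, 15 features)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=15, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, tree.predict(X_te))
print(cm)
```

Each row sums to that class's support, and the diagonal holds the correctly classified samples.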



In [9]:
joblib.dump(model_dt, 'decision_tree_model.h5')

['decision_tree_model.h5']
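
Note that `joblib.dump` writes a pickle-based file regardless of the extension, so the `.h5` file above is not an HDF5 file and has to be read back with `joblib.load`. A minimal round-trip sketch, assuming a small synthetic dataset and a temporary directory (both illustrative, not part of the notebook):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Small synthetic stand-in dataset (assumption)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=15, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=42,
)
tree = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo)

# Save to a temporary location and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "decision_tree_model.h5")
    joblib.dump(tree, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(tree.predict(X_demo), restored.predict(X_demo))
print("Predictions identical after reload:", same_predictions)
```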

# **5. Meeting the Skilled and Advanced Criteria for Building the Classification Model**

**Leave blank if you are not implementing the skilled or advanced criteria**

In [10]:
# Use the Random Forest algorithm
model_rf = RandomForestClassifier(n_estimators=100, random_state=42)
model_rf.fit(X_train, y_train)
y_pred_rf = model_rf.predict(X_test)

In [11]:
print("=== Random Forest ===")
print(classification_report(y_test, y_pred_rf))

=== Random Forest ===
              precision    recall  f1-score   support

           0       0.96      0.94      0.95       143
           1       0.98      0.98      0.98       127
           2       0.98      0.97      0.97        89
           3       0.93      0.95      0.94        99

    accuracy                           0.96       458
   macro avg       0.96      0.96      0.96       458
weighted avg       0.96      0.96      0.96       458



In [12]:
joblib.dump(model_rf, 'explore_RandomForest_classification.h5')

['explore_RandomForest_classification.h5']

Model Hyperparameter Tuning

Choose one algorithm that you want to tune.

In [13]:
# Parameter grid for tuning
param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2]
}

# Grid Search
grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)

# Fit the grid search on the training data
grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 36 candidates, totalling 180 fits
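
After fitting, a `GridSearchCV` object exposes the winning combination through `best_params_` and its mean cross-validated score through `best_score_`. Printing them makes it clear which settings the tuned model actually uses. The sketch below assumes a synthetic dataset and a deliberately tiny grid (both hypothetical) so that it runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=15, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=42,
)

# A tiny illustrative grid: 2 x 2 = 4 candidates
grid = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [10, 20], "max_depth": [None, 5]},
    cv=3,
    scoring="accuracy",
)
grid.fit(X_demo, y_demo)

print("Best parameters:", grid.best_params_)
print("Best mean CV accuracy:", round(grid.best_score_, 4))
```

`best_score_` is measured on the cross-validation folds of the training data, so it is not directly comparable to the test-set accuracy in a classification report.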


In [14]:
# Best model found by the grid search
best_rf = grid_search.best_estimator_

# Predict again with the tuned model
y_pred_best = best_rf.predict(X_test)

# Evaluate
print("=== Random Forest (Tuned) ===")
print(classification_report(y_test, y_pred_best))

=== Random Forest (Tuned) ===
              precision    recall  f1-score   support

           0       0.92      0.94      0.93       143
           1       0.98      0.98      0.98       127
           2       0.95      0.92      0.94        89
           3       0.93      0.92      0.92        99

    accuracy                           0.95       458
   macro avg       0.95      0.94      0.94       458
weighted avg       0.95      0.95      0.95       458



In [15]:
# Save the tuned Random Forest (best_rf), not the untuned Decision Tree
joblib.dump(best_rf, 'tuning_classification.h5')

['tuning_classification.h5']
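
In the reports above, the tuned Random Forest reaches 0.95 test accuracy while the untuned one reaches 0.96. This is possible because grid search picks the candidate with the best cross-validated score on the training data, which does not guarantee the best score on the held-out test set. Collecting the metrics in one table makes such comparisons easier to read. A minimal sketch, assuming synthetic data and a hypothetical `candidates` dictionary:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=15, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Models to compare (illustrative settings)
candidates = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

rows = []
for name, estimator in candidates.items():
    y_hat = estimator.fit(X_tr, y_tr).predict(X_te)
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_te, y_hat),
        "f1_weighted": f1_score(y_te, y_hat, average="weighted"),
    })

summary = pd.DataFrame(rows).set_index("model")
print(summary.round(3))
```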