# TA-11: ANN Implementation for Transaction Fraud Detection
**Dataset:** `creditcard.csv` (Kaggle Credit Card Fraud)  
**Target:** `Class` (0 = normal, 1 = fraud)  
**Features used (per the TP-11 design: 3 input neurons):** `Time`, `Amount`, `V14`  
**Architecture (per TP-11):** Input(3) → Dense(5, ReLU) → Dense(1, Sigmoid)

> Note: this dataset is highly *imbalanced* (in the Kaggle release, 492 of 284,807 transactions are fraud, about 0.17%), so do not evaluate the model on accuracy alone.
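
To make this warning concrete, here is a minimal sketch using made-up counts (10,000 transactions, 17 of them fraud; these numbers are illustrative assumptions, not read from `creditcard.csv`). A "model" that labels every transaction as normal reaches very high accuracy while catching no fraud at all.

```python
# Illustrative counts only (assumed, not read from creditcard.csv).
n_total = 10_000
n_fraud = 17

y_true = [1] * n_fraud + [0] * (n_total - n_fraud)
y_pred = [0] * n_total  # trivial baseline: always predict "normal"

# Accuracy: share of all transactions labelled correctly.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / n_total

# Recall for the fraud class: share of actual fraud cases that were caught.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall_fraud = true_positives / n_fraud

print(f"accuracy     = {accuracy:.4f}")
print(f"fraud recall = {recall_fraud:.4f}")
```

Any trained model should be compared against this kind of baseline before its accuracy is taken as evidence that it works.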


In [None]:
# (Optional) Run this if the libraries are not installed yet
# !pip -q install pandas numpy scikit-learn matplotlib tensorflow joblib


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, classification_report

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

import joblib


In [None]:
# 1) Load Data
df = pd.read_csv("creditcard.csv")
df.head()


In [None]:
# Quick EDA
print("Shape:", df.shape)
print("\nColumns:", df.columns.tolist())

print("\nMissing values (NaN) per column, top 10:")
print(df.isna().sum().sort_values(ascending=False).head(10))

print("\nTarget distribution (Class):")
print(df["Class"].value_counts())


In [None]:
# Visualize the number of samples per class (imbalanced)
counts = df["Class"].value_counts().sort_index()
plt.figure()
plt.bar(["0 (Normal)", "1 (Fraud)"], counts.values)
plt.title("Class Distribution (Imbalanced)")
plt.ylabel("Number of transactions")
plt.show()


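## 2) Preprocessing (Encoding & Scaling)
- The dataset is already numeric, so no categorical encoding is needed.
- We use only 3 features to stay consistent with the TP-11 design (input layer = 3 neurons).
- Scaling is **required** for the ANN (per the module): `Time` and `Amount` span much larger ranges than `V14`, which makes gradient-based training harder if left unscaled.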
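
As a rough illustration of what standardization does, the sketch below applies z = (x - mean) / std by hand. The `Amount` values are made up and the helper names `fit_standardizer` and `standardize` are hypothetical; the notebook itself uses `StandardScaler`. The key point is that the mean and standard deviation are learned from the training data only and then reused on the test data.

```python
from statistics import fmean, pstdev


def fit_standardizer(values):
    """Return (mean, std) learned from the training values only."""
    # Population standard deviation (ddof=0), the same convention StandardScaler uses.
    return fmean(values), pstdev(values)


def standardize(values, mean, std):
    """Apply z = (x - mean) / std using statistics learned elsewhere."""
    return [(v - mean) / std for v in values]


# Made-up `Amount` values, for illustration only.
train_amounts = [2.5, 10.0, 49.9, 120.0, 999.0]
test_amounts = [15.0, 300.0]

mean, std = fit_standardizer(train_amounts)
train_scaled = standardize(train_amounts, mean, std)
# Reuse the training statistics so no information leaks from the test set.
test_scaled = standardize(test_amounts, mean, std)

print([round(v, 3) for v in train_scaled])
print([round(v, 3) for v in test_scaled])
```

After scaling, the training values have mean 0 and standard deviation 1; the test values generally do not, and that is expected.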

In [None]:
# Select the 3 features specified in TP-11
FEATURES = ["Time", "Amount", "V14"]
TARGET = "Class"

X = df[FEATURES].copy()
y = df[TARGET].copy()

# Train/test split (stratify matters because the classes are imbalanced)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling (StandardScaler): fit on the training set only, then apply to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled  = scaler.transform(X_test)

print("Train shape:", X_train_scaled.shape)
print("Test shape :", X_test_scaled.shape)
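
To see why `stratify=y` matters, the following sketch performs a stratified split by hand on made-up labels (980 normal, 20 fraud). It is a simplified illustration of the idea, not scikit-learn's actual implementation, and `stratified_split` is a hypothetical helper name.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with every class split in the same proportion."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # shuffle within each class
        n_test = round(len(indices) * test_size)   # per-class test quota
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Made-up labels for illustration: 980 normal, 20 fraud.
labels = [0] * 980 + [1] * 20
train_idx, test_idx = stratified_split(labels)

fraud_in_train = sum(labels[i] for i in train_idx)
fraud_in_test = sum(labels[i] for i in test_idx)
print("fraud in train:", fraud_in_train, "| fraud in test:", fraud_in_test)
```

Without stratification, a purely random 20% sample of such data could end up with very few fraud cases in the test set, making recall estimates unreliable.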


## 3) Model Definition (Architecture)
Per the TP-11 design:
- **Input:** 3 features  
- **Hidden layer:** 1 layer, 5 neurons, ReLU activation  
- **Output:** 1 neuron, Sigmoid activation (binary classification)
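
The architecture above is small enough to work through by hand. The sketch below counts its trainable parameters and runs one forward pass in plain Python. The weights are arbitrary values chosen for illustration, not trained ones, and the helper names are hypothetical.

```python
import math


def dense_param_count(n_inputs, n_units):
    """A dense layer has n_inputs * n_units weights plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden_params = dense_param_count(3, 5)   # 3 inputs -> 5 hidden neurons
output_params = dense_param_count(5, 1)   # 5 hidden neurons -> 1 output
total_params = hidden_params + output_params
print("total trainable parameters:", total_params)


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: 3 inputs -> 5 ReLU units -> 1 sigmoid output."""
    hidden = [
        relu(sum(xi * w for xi, w in zip(x, weights)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(h * w for h, w in zip(hidden, w_out)) + b_out)


# Arbitrary weights for illustration only (not trained values).
w_hidden = [[0.1, -0.2, 0.3]] * 5
b_hidden = [0.0] * 5
w_out = [0.5, -0.5, 0.25, 0.1, -0.1]
b_out = -1.0

prob = forward([0.2, -1.0, 1.5], w_hidden, b_hidden, w_out, b_out)
print("output probability:", round(prob, 4))
```

The parameter total computed here is the number `model.summary()` should report for this architecture, which makes it a quick sanity check on the model definition.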

In [None]:
tf.random.set_seed(42)

model = Sequential([
    tf.keras.Input(shape=(3,)),      # input: 3 features
    Dense(5, activation="relu"),     # hidden layer: 5 neurons
    Dense(1, activation="sigmoid")   # output: fraud probability
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"]
)

model.summary()


## 4) Training Process
We use **EarlyStopping** so that training stops automatically once the validation loss stops improving.
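
The patience logic can be sketched in a few lines of plain Python. This is a simplified imitation of the behavior, not Keras's own code; the validation losses are made up and `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once `patience` consecutive epochs pass without the
    validation loss improving on the best value seen so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: the best value occurs at epoch 2.
val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38, 0.39, 0.30]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=5)
print("stopped at epoch", stop_epoch, "| best epoch", best_epoch)
```

In this made-up run the final loss of 0.30 is never reached because training has already stopped. With `restore_best_weights=True`, the model keeps the weights from the best epoch rather than those from the last epoch run.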

In [None]:
early_stopping = EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True
)

history = model.fit(
    X_train_scaled, y_train,
    validation_split=0.2,
    epochs=50,
    batch_size=32,
    callbacks=[early_stopping],
    verbose=1
)


## 5) Training Curves (Loss/Accuracy)

In [None]:
history_df = pd.DataFrame(history.history)

# Plot Loss
plt.figure()
plt.plot(history_df["loss"], label="train_loss")
plt.plot(history_df["val_loss"], label="val_loss")
plt.title("Training vs Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.show()

# Plot Accuracy
plt.figure()
plt.plot(history_df["accuracy"], label="train_accuracy")
plt.plot(history_df["val_accuracy"], label="val_accuracy")
plt.title("Training vs Validation Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.show()


## 6) Final Evaluation (Test Set)
Because the data is imbalanced, pay attention to **recall** for the fraud class (1), that is, the share of actual fraud cases the model catches.
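
The sketch below shows how precision, recall, and F1 for the fraud class follow from the four confusion-matrix counts. The counts are made up for illustration and are not results from this notebook; `fraud_metrics` is a hypothetical helper.

```python
def fraud_metrics(tn, fp, fn, tp):
    """Compute metrics for the positive (fraud) class from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for illustration only (not this notebook's results).
metrics = fraud_metrics(tn=56_850, fp=14, fn=38, tp=60)
for name, value in metrics.items():
    print(f"{name:9s} = {value:.4f}")
```

With these counts, accuracy is above 99.9% even though more than a third of the fraud cases are missed. For binary labels 0/1, scikit-learn's `confusion_matrix(...).ravel()` returns the counts in the order `tn, fp, fn, tp`, so the same helper could be fed from a real confusion matrix.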

In [None]:
# Predict on the test set
y_pred_prob = model.predict(X_test_scaled).ravel()
y_pred = (y_pred_prob >= 0.5).astype(int)  # 0.5 decision threshold

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

# Classification Report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, digits=4))
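
The 0.5 threshold used above is a convention, not a requirement. As a minimal sketch with made-up probabilities and labels (not model output), the code below shows how lowering the threshold trades precision for recall on the fraud class. `recall_precision_at` is a hypothetical helper.

```python
def recall_precision_at(threshold, probs, labels):
    """Return (recall, precision) for the positive class at a given threshold."""
    preds = [int(p >= threshold) for p in probs]
    tp = sum(y == 1 and p == 1 for y, p in zip(labels, preds))
    fp = sum(y == 0 and p == 1 for y, p in zip(labels, preds))
    fn = sum(y == 1 and p == 0 for y, p in zip(labels, preds))
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Made-up predicted probabilities and true labels, for illustration only.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.92, 0.61, 0.43, 0.18, 0.35, 0.07, 0.02, 0.55, 0.01, 0.11]

for threshold in (0.5, 0.3):
    recall, precision = recall_precision_at(threshold, probs, labels)
    print(f"threshold={threshold}: recall={recall:.2f}, precision={precision:.2f}")
```

Which trade-off is acceptable depends on the relative cost of a missed fraud versus a false alarm, so the threshold is worth discussing in the report rather than leaving at the default.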


In [None]:
# Simple confusion matrix visualization
plt.figure()
plt.imshow(cm)
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.xticks([0,1], ["0","1"])
plt.yticks([0,1], ["0","1"])
for (i, j), v in np.ndenumerate(cm):
    plt.text(j, i, str(v), ha="center", va="center")
plt.colorbar()
plt.show()


## (Optional) Save the Model & Scaler
So they can be reused for deployment (for example, in a Flask app).

In [None]:
# Save the scaler (sklearn) and the model (Keras)
joblib.dump(scaler, "scaler.joblib")
model.save("model.keras")

print("Saved: scaler.joblib and model.keras")
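
At deployment time the saved scaler and model must be applied in the same way as during training: same feature order, same scaling. The sketch below is one possible shape for that, assuming the two files saved above. The helper names `load_artifacts` and `fraud_probability` are hypothetical.

```python
FEATURES = ["Time", "Amount", "V14"]  # must match the order used during training


def load_artifacts(scaler_path="scaler.joblib", model_path="model.keras"):
    """Load the saved scaler and model (imports are local to this helper)."""
    import joblib
    import tensorflow as tf

    scaler = joblib.load(scaler_path)
    model = tf.keras.models.load_model(model_path)
    return scaler, model


def fraud_probability(transaction, scaler, model):
    """Return the predicted fraud probability for one transaction (a dict)."""
    # Build a single row in the training feature order.
    row = [[float(transaction[name]) for name in FEATURES]]
    scaled = scaler.transform(row)   # apply the training-time scaling
    prob = model.predict(scaled)     # Keras returns an array of shape (1, 1)
    # Unwrap nested containers until a scalar remains.
    while hasattr(prob, "__len__"):
        prob = prob[0]
    return float(prob)
```

A Flask route, for instance, could call `load_artifacts()` once at startup and `fraud_probability(...)` per request. Because the scaler was fitted on a DataFrame, scikit-learn may warn about missing feature names when it is given a plain list; the values are still scaled by column position.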


### Notes for the Report
- Explain that the fraud dataset is **highly imbalanced**.
- High accuracy can be misleading; focus on **recall** for the fraud class (1).
- Explain that scaling is applied because ANNs are sensitive to feature scale (per the module).