# Data Modification + KNN Classification (Dengue Bangladesh)

Follows the pattern of **Tugas10.ipynb**: `load -> (modify data) -> split -> StandardScaler -> KNN -> evaluate -> new input`.

- Original dataset: `dataset.csv`
- Modified dataset: `dataset_klasifikasi_modified.csv` (categorical columns are *encoded* as numeric values)
- Target: `Outcome`


In [None]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


In [None]:
# 1) Load the original dataset
df = pd.read_csv("dataset.csv")
df.head()

FileNotFoundError: [Errno 2] No such file or directory: 'dataset.csv'

In [None]:
# 2) Modify the data (in the simple style of Tugas10)
# - Encode categorical columns as numbers (LabelEncoder)
# - Drop duplicate rows (if any)
# - Save the modified data to CSV

df_mod = df.copy()

cat_cols = ["Gender", "Area", "AreaType", "HouseType", "District"]
encoders = {}

for col in cat_cols:
    le = LabelEncoder()
    df_mod[col] = le.fit_transform(df_mod[col].astype(str))
    encoders[col] = le

df_mod = df_mod.drop_duplicates().reset_index(drop=True)

# Save the modified dataset
df_mod.to_csv("dataset_klasifikasi_modified.csv", index=False)

df_mod.head()
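
The loop in the cell above relies on `LabelEncoder`, which assigns integer codes to the sorted unique values of a column. The following is a minimal pure-Python sketch of that mapping, not the library's implementation; `fit_label_codes` and the sample values are hypothetical and chosen only for illustration.

```python
def fit_label_codes(values):
    """Map each distinct label to an integer code, in sorted order."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}


# Hypothetical sample column, not taken from dataset.csv
sample_gender = ["Male", "Female", "Female", "Male"]

gender_codes = fit_label_codes(sample_gender)
encoded_gender = [gender_codes[value] for value in sample_gender]

print(gender_codes)
print(encoded_gender)
```

Because the codes are plain integers, a distance-based model such as KNN treats categories with nearby codes as more similar. This is worth keeping in mind for columns with many categories, such as `Area`.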

In [None]:
# 3) Separate features and target
X = df_mod.drop(columns=["Outcome"])
y = df_mod["Outcome"]

print("Number of rows:", len(df_mod))
print("Shape X:", X.shape)
print("Shape y:", y.shape)

In [None]:
# 4) Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Training data:", X_train.shape)
print("Test data    :", X_test.shape)

In [None]:
# 5) StandardScaler (as in Tugas10)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled  = scaler.transform(X_test)

X_train_scaled[:5]
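
The scaling step above learns a mean and standard deviation from the training data only, then reuses them for the test data. A minimal sketch of that idea for a single feature follows, assuming the population standard deviation that `StandardScaler` uses; `fit_standardizer`, `standardize`, and the age values are hypothetical.

```python
import statistics


def fit_standardizer(train_values):
    """Return (mean, std) of the training values; std is the population std."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def standardize(values, mean, std):
    """Apply z = (x - mean) / std with statistics learned from training data."""
    return [(x - mean) / std for x in values]


# Hypothetical ages, not taken from dataset.csv
train_age = [20.0, 30.0, 40.0]
test_age = [50.0]

age_mean, age_std = fit_standardizer(train_age)
train_age_scaled = standardize(train_age, age_mean, age_std)
test_age_scaled = standardize(test_age, age_mean, age_std)  # reuses train statistics

print(train_age_scaled)
print(test_age_scaled)
```

Fitting on the training split alone keeps information from the test split out of the model, which is why the cell calls `fit_transform` on `X_train` and only `transform` on `X_test`.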

In [None]:
# 6) KNN model
k = 7
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train_scaled, y_train)
print("Model trained ✅")
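
With its default settings, `KNeighborsClassifier` classifies a point by majority vote among the `k` training points nearest in Euclidean distance. The sketch below shows that rule in pure Python. It is an illustration under those assumptions rather than the library's algorithm (tie-breaking, for instance, is not reproduced), and `knn_predict` and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Return the majority label among the k training points closest to query."""
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical, already-scaled 2D points with binary labels
toy_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (2.0, 2.0), (2.1, 1.9)]
toy_labels = [0, 0, 0, 1, 1]

pred_near_origin = knn_predict(toy_points, toy_labels, query=(0.3, 0.3), k=3)
pred_near_corner = knn_predict(toy_points, toy_labels, query=(1.9, 2.0), k=3)

print(pred_near_origin, pred_near_corner)
```

Since the vote depends on distances, features with larger numeric ranges would dominate without the scaling step that precedes this cell.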

In [None]:
# 7) Evaluation
y_pred = knn.predict(X_test_scaled)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("=== KNN Evaluation ===")
print("K =", k)
print("Accuracy:", acc)
print("\nConfusion Matrix:\n", cm)
print("\nClassification Report:\n", classification_report(y_test, y_pred))
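
To make the metrics above easier to read, here is a small sketch that builds a confusion matrix by hand and derives accuracy, precision, and recall from it. It follows the scikit-learn layout (rows are true labels, columns are predicted labels); `confusion_counts` and the toy labels are hypothetical and are not results from this dataset.

```python
def confusion_counts(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels, columns predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


# Hypothetical labels for illustration only
toy_true = [0, 0, 1, 1, 1, 0]
toy_pred = [0, 1, 1, 1, 0, 0]

toy_cm = confusion_counts(toy_true, toy_pred, labels=[0, 1])

# Accuracy: correct predictions (the diagonal) over all predictions
toy_accuracy = (toy_cm[0][0] + toy_cm[1][1]) / len(toy_true)
# Precision for class 1: true positives over everything predicted as 1
toy_precision_1 = toy_cm[1][1] / (toy_cm[0][1] + toy_cm[1][1])
# Recall for class 1: true positives over everything that is truly 1
toy_recall_1 = toy_cm[1][1] / (toy_cm[1][0] + toy_cm[1][1])

print(toy_cm, toy_accuracy, toy_precision_1, toy_recall_1)
```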

In [None]:
# 8) Example of new input data (like Tugas10: manual input)
# Note: the model was trained on encoded data, so categorical inputs are entered as text
# and then encoded with the same encoders fitted earlier.

print("Enter New Patient Data")
gender = input("Gender (Male/Female): ")
age = int(input("Age: "))
ns1 = int(input("NS1 (0/1): "))
igg = int(input("IgG (0/1): "))
igm = int(input("IgM (0/1): "))
area = input("Area (e.g. Mirpur): ")
areatype = input("AreaType (Developed/Undeveloped): ")
housetype = input("HouseType (Building/Other): ")
district = input("District (Dhaka or other): ")

new_patient_raw = pd.DataFrame([{
    "Gender": gender,
    "Age": age,
    "NS1": ns1,
    "IgG": igg,
    "IgM": igm,
    "Area": area,
    "AreaType": areatype,
    "HouseType": housetype,
    "District": district
}])

# Encode the new input
new_patient = new_patient_raw.copy()
for col in cat_cols:
    # If the category was not seen when the encoder was fitted, use a simple fallback:
    if new_patient[col].iloc[0] not in encoders[col].classes_:
        print(f"WARNING: category '{new_patient[col].iloc[0]}' was not seen when fitting the encoder for column {col}.")
        print("Using the encoder's first category as a fallback.")
        new_patient[col] = encoders[col].classes_[0]
    new_patient[col] = encoders[col].transform(new_patient[col].astype(str))

# Scale (columns are reordered to match the features the scaler was fitted on)
new_patient_scaled = scaler.transform(new_patient[X.columns])

pred = knn.predict(new_patient_scaled)[0]
proba = knn.predict_proba(new_patient_scaled)[0]

print("\n=== New Data Prediction ===")
print("Input:", new_patient_raw.to_dict(orient="records")[0])
print("Predicted Outcome:", int(pred))
print("Probabilities [class 0, class 1]:", proba)
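
With the default uniform weights, the values returned by `predict_proba` are the fractions of the `k` nearest neighbors that belong to each class, with columns ordered as in `knn.classes_`. The sketch below shows that calculation for `k = 7`; `neighbor_vote_fractions` and the neighbor labels are hypothetical.

```python
from collections import Counter


def neighbor_vote_fractions(neighbor_labels, classes):
    """Return the share of neighbors in each class, in the order given by classes."""
    counts = Counter(neighbor_labels)
    return [counts[c] / len(neighbor_labels) for c in classes]


# Hypothetical labels of the 7 nearest neighbors of one new patient
toy_neighbor_labels = [1, 0, 1, 1, 0, 1, 1]

toy_proba = neighbor_vote_fractions(toy_neighbor_labels, classes=[0, 1])

print(toy_proba)
```

With `k = 7`, each probability is therefore a multiple of 1/7, so the output is a coarse vote share rather than a finely calibrated probability.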