<a href="https://colab.research.google.com/github/MHAFIDR/Data-Mining-Klasifikasi/blob/main/Python-update-titanic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Titanic Passenger Classification Project (KNN)\n",
    "\n",
    "This notebook was prepared as the final assignment for the Data Mining course.\n",
    "The goal is to predict whether a Titanic passenger survived, based on features such as age, ticket class, and fare, using the **K-Nearest Neighbors (KNN)** algorithm."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Import Libraries\n",
    "The first step is to import all the Python libraries we need:\n",
    "* **Pandas**: for reading and processing tabular data.\n",
    "* **NumPy**: for numerical operations.\n",
    "* **Matplotlib & Seaborn**: for plotting and data visualization.\n",
    "* **scikit-learn**: the main library for the machine learning steps (KNN, data splitting, evaluation)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "from sklearn.metrics import accuracy_score, classification_report"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Upload the Dataset (Google Colab Only)\n",
    "Run the cell below to upload the file `titanic_clean.csv` from your computer into this Google Colab session."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.colab import files\n",
    "\n",
    "print(\"=== Please upload the file 'titanic_clean.csv' ===\")\n",
    "uploaded = files.upload()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Load & Inspect the Data\n",
    "Read the uploaded CSV file into a DataFrame (`df`) and display the first 5 rows to confirm that the data was read correctly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "\n",
    "# Read the uploaded file (the file name is picked up automatically)\n",
    "# When running locally / in VS Code, simply use: df = pd.read_csv('titanic_clean.csv')\n",
    "try:\n",
    "    filename = next(iter(uploaded))\n",
    "    df = pd.read_csv(io.BytesIO(uploaded[filename]))\n",
    "    print(f\"\\nSuccessfully read file: {filename}\")\n",
    "except (NameError, StopIteration):\n",
    "    # Fallback when `uploaded` is undefined or empty (file already in the working directory)\n",
    "    df = pd.read_csv('titanic_clean.csv')\n",
    "    print(\"\\nReading local file: titanic_clean.csv\")\n",
    "\n",
    "print(\"=== First 5 Rows of the Titanic Data ===\")\n",
    "print(df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Data Visualization\n",
    "We use a scatter plot to look at how the data is distributed.\n",
    "The plot relates **Age** to **Fare**, with point color distinguishing passengers who **survived (1)** from those who **did not survive (0)**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10, 6))\n",
    "sns.scatterplot(x='Age', y='Fare', hue='Survived', data=df, palette='seismic')\n",
    "plt.title('Titanic Passengers (Age vs. Fare)')\n",
    "plt.xlabel('Age')\n",
    "plt.ylabel('Fare')\n",
    "plt.legend(title='Survived (1 = yes)')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Splitting the Data\n",
    "This step separates the data into two parts:\n",
    "1.  **Features (X)**: the data attributes (ticket class, sex, age, family members aboard, fare).\n",
    "2.  **Target (y)**: the label to predict (`Survived`).\n",
    "\n",
    "The data is also split into **training data (80%)**, used to fit the model, and **test data (20%)**, used to measure its performance afterwards."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# X = features (drop the target column 'Survived')\n",
    "X = df.drop('Survived', axis=1)\n",
    "# y = target (only the 'Survived' column)\n",
    "y = df['Survived']\n",
    "\n",
    "# Split the data: 80% training, 20% testing\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
    "\n",
    "print(f\"Training samples: {len(X_train)}\")\n",
    "print(f\"Test samples    : {len(X_test)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Training the Model\n",
    "We train the **K-Nearest Neighbors (KNN)** algorithm with `n_neighbors=5`, which means the model classifies each sample by a majority vote among its 5 nearest neighbors in the training data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the KNN model with k=5\n",
    "model = KNeighborsClassifier(n_neighbors=5)\n",
    "\n",
    "# Fit the model on the training data\n",
    "model.fit(X_train, y_train)\n",
    "print(\"\\nTitanic model trained successfully!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Evaluating the Model\n",
    "After training, we ask the model to predict the test data (`X_test`). The predictions are compared with the true labels (`y_test`) to compute the accuracy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Predict the test data\n",
    "y_pred = model.predict(X_test)\n",
    "\n",
    "# Compute the accuracy\n",
    "akurasi = accuracy_score(y_test, y_pred)\n",
    "print(f\"\\n=== MODEL ACCURACY: {akurasi * 100:.2f}% ===\")\n",
    "\n",
    "# Show the detailed classification report\n",
    "print(\"\\nClassification Report:\")\n",
    "print(classification_report(y_test, y_pred))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Manual Prediction Simulation\n",
    "Here we test the model on hand-made data (Jack and Rose) to see how it predicts specific cases based on the patterns it has learned."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\n=== Prediction Simulation: Jack vs Rose ===\")\n",
    "\n",
    "# Input format follows the columns: [Pclass, Sex, Age, SibSp, Parch, Fare]\n",
    "# Note: Sex 0 = male, 1 = female\n",
    "\n",
    "# Jack: 3rd class, male, age 20, travelling alone (0 SibSp, 0 Parch), cheap ticket\n",
    "data_jack = [3, 0, 20, 0, 0, 7.5]\n",
    "\n",
    "# Rose: 1st class, female, age 17, travelling with a parent (1 Parch), expensive ticket\n",
    "data_rose = [1, 1, 17, 0, 1, 100.0]\n",
    "\n",
    "# Build a new DataFrame for these inputs\n",
    "input_baru = pd.DataFrame([data_jack, data_rose], columns=X.columns)\n",
    "\n",
    "# Run the prediction\n",
    "hasil_prediksi = model.predict(input_baru)\n",
    "\n",
    "# Show the results\n",
    "for i, nama in enumerate(['Jack', 'Rose']):\n",
    "    status = \"SURVIVED\" if hasil_prediksi[i] == 1 else \"DID NOT SURVIVE\"\n",
    "    print(f\"Prediction for {nama}: {status}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
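
The `n_neighbors=5` rule described in section 6 can be sketched in plain Python. This is a simplified illustration of the idea, not scikit-learn's implementation: `knn_predict` is a hypothetical helper, the distance is plain Euclidean on unscaled values, and the ten toy rows are made up for the example (they are not records from `titanic_clean.csv`). Only the column order and the Jack and Rose inputs come from the notebook.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Euclidean distance from the query to every training row
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_X, train_y)
    ]
    # Keep the k closest rows and count how often each label appears
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy rows in the notebook's column order:
# [Pclass, Sex, Age, SibSp, Parch, Fare]. These are NOT real Titanic records.
toy_X = [
    [3, 0, 22, 0, 0, 7.3], [3, 0, 30, 0, 0, 8.1], [3, 0, 19, 1, 0, 7.9],
    [3, 0, 25, 0, 0, 7.0], [3, 1, 28, 0, 0, 8.5], [1, 1, 18, 0, 1, 98.0],
    [1, 1, 24, 1, 0, 110.0], [1, 1, 16, 0, 2, 95.0], [1, 0, 40, 1, 0, 105.0],
    [1, 1, 35, 0, 0, 90.0],
]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]  # 1 = survived, 0 = did not survive

# Same inputs as the manual simulation in section 8
jack = [3, 0, 20, 0, 0, 7.5]
rose = [1, 1, 17, 0, 1, 100.0]

predictions = [knn_predict(toy_X, toy_y, q, k=5) for q in (jack, rose)]
print(predictions)  # [0, 1] on this toy data
```

On this toy data, Jack's five nearest rows are the five third-class rows (four of them labeled 0), while Rose's are the five first-class rows (four of them labeled 1). Because the distance uses raw values, the difference in `Fare` largely decides which rows count as neighbors here.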
