### 1. Import Libraries and Build the Dataset
First, we import the required libraries:
- **pandas** → for handling tabular data.
- **scikit-learn** → for the machine learning models (Naive Bayes, Logistic Regression) plus preprocessing and evaluation.

Then we load the dataset from the given table into a `DataFrame`.


In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Training dataset, taken from the given table
data = {
    "Jenis Kelamin": ["Laki-laki","Laki-laki","Perempuan","Perempuan","Laki-laki",
                      "Laki-laki","Perempuan","Perempuan","Laki-laki","Perempuan",
                      "Perempuan","Perempuan","Laki-laki","Laki-laki","Laki-laki"],
    "Status Mahasiswa": ["Mahasiswa","Bekerja","Mahasiswa","Mahasiswa","Bekerja",
                         "Bekerja","Bekerja","Bekerja","Bekerja","Mahasiswa",
                         "Mahasiswa","Mahasiswa","Bekerja","Mahasiswa","Mahasiswa"],
    "Status Pernikahan": ["Belum","Belum","Belum","Menikah","Menikah",
                          "Menikah","Menikah","Belum","Belum","Menikah",
                          "Belum","Belum","Menikah","Menikah","Belum"],
    "IPK": [3.17,3.30,3.01,3.25,3.20,2.50,3.00,2.70,2.40,2.50,
            2.50,3.50,3.30,3.25,2.30],
    "Status Kelulusan": ["Tepat","Tepat","Tepat","Tepat","Tepat",
                         "Terlambat","Terlambat","Terlambat","Terlambat","Terlambat",
                         "Terlambat","Tepat","Tepat","Tepat","Terlambat"]
}

df = pd.DataFrame(data)
df


|   | Jenis Kelamin | Status Mahasiswa | Status Pernikahan | IPK | Status Kelulusan |
|---|---|---|---|---|---|
| 0 | Laki-laki | Mahasiswa | Belum | 3.17 | Tepat |
| 1 | Laki-laki | Bekerja | Belum | 3.3 | Tepat |
| 2 | Perempuan | Mahasiswa | Belum | 3.01 | Tepat |
| 3 | Perempuan | Mahasiswa | Menikah | 3.25 | Tepat |
| 4 | Laki-laki | Bekerja | Menikah | 3.2 | Tepat |
| 5 | Laki-laki | Bekerja | Menikah | 2.5 | Terlambat |
| 6 | Perempuan | Bekerja | Menikah | 3.0 | Terlambat |
| 7 | Perempuan | Bekerja | Belum | 2.7 | Terlambat |
| 8 | Laki-laki | Bekerja | Belum | 2.4 | Terlambat |
| 9 | Perempuan | Mahasiswa | Menikah | 2.5 | Terlambat |


### 2. Encoding Categorical Data
The models cannot read text directly, so the categorical columns must be converted to numbers.  
`LabelEncoder` assigns integer codes in alphabetical order of the labels, which gives this mapping:
- Jenis Kelamin: `Laki-laki=0`, `Perempuan=1`
- Status Mahasiswa: `Bekerja=0`, `Mahasiswa=1`
- Status Pernikahan: `Belum=0`, `Menikah=1`
- Status Kelulusan: `Tepat=0`, `Terlambat=1`


In [3]:
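# Encode each categorical column as integers.
# Note: `le` is refit on every column, so after the loop it only
# remembers the mapping of the last column ("Status Kelulusan").
le = LabelEncoder()
for col in ["Jenis Kelamin","Status Mahasiswa","Status Pernikahan","Status Kelulusan"]:
    df[col] = le.fit_transform(df[col])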

df

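|   | Jenis Kelamin | Status Mahasiswa | Status Pernikahan | IPK | Status Kelulusan |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 3.17 | 0 |
| 1 | 0 | 0 | 0 | 3.3 | 0 |
| 2 | 1 | 1 | 0 | 3.01 | 0 |
| 3 | 1 | 1 | 1 | 3.25 | 0 |
| 4 | 0 | 0 | 1 | 3.2 | 0 |
| 5 | 0 | 0 | 1 | 2.5 | 1 |
| 6 | 1 | 0 | 1 | 3.0 | 1 |
| 7 | 1 | 0 | 0 | 2.7 | 1 |
| 8 | 0 | 0 | 0 | 2.4 | 1 |
| 9 | 1 | 1 | 1 | 2.5 | 1 |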
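
To confirm which integer each category received, the fitted encoder can be inspected directly. The sketch below is a minimal illustration, not part of the original notebook: the `sample` frame and the `encoders` dictionary are hypothetical names, and only two columns are used to keep it short. It keeps one `LabelEncoder` per column so that every mapping remains available after the loop.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Small illustrative frame that reuses category values from the dataset
sample = pd.DataFrame({
    "Status Mahasiswa": ["Mahasiswa", "Bekerja", "Mahasiswa"],
    "Status Kelulusan": ["Tepat", "Terlambat", "Tepat"],
})

# One encoder per column (hypothetical helper dict), so no mapping is lost
encoders = {}
for col in sample.columns:
    encoders[col] = LabelEncoder().fit(sample[col])

# classes_ is sorted; the position of each label is its integer code
mappings = {
    col: {str(label): code for code, label in enumerate(enc.classes_)}
    for col, enc in encoders.items()
}
print(mappings)

# A stored encoder can also turn integer codes back into labels
decoded = [str(v) for v in encoders["Status Kelulusan"].inverse_transform([0, 1])]
print(decoded)
```

Keeping the encoders around also makes it possible to call `transform` on new rows later instead of refitting.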


### 3. Separate Features (X) and Label (y)
- **Features (X):** Jenis Kelamin, Status Mahasiswa, Status Pernikahan, IPK
- **Label (y):** Status Kelulusan


In [4]:
X = df.drop("Status Kelulusan", axis=1)
y = df["Status Kelulusan"]

# Split into train and test sets for model evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train


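|   | Jenis Kelamin | Status Mahasiswa | Status Pernikahan | IPK |
|---|---|---|---|---|
| 13 | 0 | 1 | 1 | 3.25 |
| 5 | 0 | 0 | 1 | 2.5 |
| 8 | 0 | 0 | 0 | 2.4 |
| 2 | 1 | 1 | 0 | 3.01 |
| 1 | 0 | 0 | 0 | 3.3 |
| 14 | 0 | 1 | 0 | 2.3 |
| 4 | 0 | 0 | 1 | 3.2 |
| 7 | 1 | 0 | 0 | 2.7 |
| 10 | 1 | 1 | 0 | 2.5 |
| 12 | 0 | 0 | 1 | 3.3 |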
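
With only 15 rows, a random split can leave one class barely represented in the test set. One option, sketched below as an assumption rather than something the original notebook does, is to pass `stratify=y` so both splits keep roughly the same class ratio. To stay short, the sketch uses only the IPK column as a placeholder feature matrix.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Encoded labels from the table (0 = Tepat, 1 = Terlambat)
y = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1],
              name="Status Kelulusan")

# Placeholder feature matrix: IPK only, to keep the sketch short
X = pd.DataFrame({"IPK": [3.17, 3.30, 3.01, 3.25, 3.20, 2.50, 3.00, 2.70,
                          2.40, 2.50, 2.50, 3.50, 3.30, 3.25, 2.30]})

# stratify=y asks for the class ratio to be preserved in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("train class counts:", y_train.value_counts().to_dict())
print("test class counts:", y_test.value_counts().to_dict())
```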


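### 4. Naive Bayes Model
Naive Bayes applies Bayes' theorem to compute class probabilities, under the assumption that the features are conditionally independent given the class. `GaussianNB` additionally models each feature with a per-class normal distribution.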
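
To make the calculation concrete, the sketch below computes the posterior by hand for a single feature (IPK) and compares it with `GaussianNB`. It is a simplified illustration under stated assumptions: the helper `gaussian_nb_posterior` is a hypothetical name, only one feature is used, and the tiny variance smoothing that scikit-learn adds is ignored. With several features, the per-feature log-likelihoods would simply be summed, which is where the independence assumption enters.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# IPK values and encoded labels from the table (0 = Tepat, 1 = Terlambat)
ipk = np.array([3.17, 3.30, 3.01, 3.25, 3.20, 2.50, 3.00, 2.70,
                2.40, 2.50, 2.50, 3.50, 3.30, 3.25, 2.30])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1])


def gaussian_nb_posterior(x, values, labels):
    """Posterior P(class | x) for one Gaussian feature (hypothetical helper)."""
    log_joint = []
    for c in np.unique(labels):
        vc = values[labels == c]
        prior = len(vc) / len(values)          # P(class)
        mean, var = vc.mean(), vc.var()        # per-class mean and variance
        log_lik = -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)
        log_joint.append(np.log(prior) + log_lik)
    log_joint = np.array(log_joint)
    # Normalise in a numerically stable way
    unnorm = np.exp(log_joint - log_joint.max())
    return unnorm / unnorm.sum()


manual = gaussian_nb_posterior(2.5, ipk, y)

# Same computation through scikit-learn, for comparison
model = GaussianNB().fit(ipk.reshape(-1, 1), y)
library = model.predict_proba([[2.5]])[0]

print("manual :", manual)
print("sklearn:", library)
```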


In [5]:
nb = GaussianNB()
nb.fit(X_train, y_train)

y_pred_nb = nb.predict(X_test)

print("=== Naive Bayes Evaluation ===")
print(classification_report(y_test, y_pred_nb))

=== Naive Bayes Evaluation ===
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         2
           1       1.00      1.00      1.00         1

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3



### 5. Logistic Regression Model
Logistic Regression passes a weighted sum of the features through the sigmoid function to obtain a probability between 0 and 1, which is then thresholded (at 0.5 by default) to assign class 0 or 1.
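
The relationship between the linear score, the sigmoid, and the predicted class can be checked directly. The following is a minimal sketch using only the IPK column as the feature (an assumption made to keep it short); the `sigmoid` function is defined here by hand for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# IPK values and encoded labels from the table (0 = Tepat, 1 = Terlambat)
ipk = np.array([3.17, 3.30, 3.01, 3.25, 3.20, 2.50, 3.00, 2.70,
                2.40, 2.50, 2.50, 3.50, 3.30, 3.25, 2.30]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1])


def sigmoid(z):
    """Map any real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


clf = LogisticRegression().fit(ipk, y)

# Linear score: weight * IPK + intercept
z = ipk @ clf.coef_.T + clf.intercept_
z = z.ravel()

# Probability of class 1, computed by hand and by the library
p_manual = sigmoid(z)
p_library = clf.predict_proba(ipk)[:, 1]

# Thresholding the probability at 0.5 gives the predicted class
pred_manual = (p_manual > 0.5).astype(int)

print(np.round(p_manual, 3))
print(pred_manual)
```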


In [6]:
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

y_pred_log = log_reg.predict(X_test)

print("=== Logistic Regression Evaluation ===")
print(classification_report(y_test, y_pred_log))

=== Logistic Regression Evaluation ===
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         2
           1       1.00      1.00      1.00         1

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3
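
Both reports above are based on a test set of only three rows, so the scores can swing a lot with a different split. A possible complement, sketched here as an assumption and not part of the original notebook, is stratified k-fold cross-validation over all 15 rows. The encoded values below are copied from the table in section 2; the choice of 5 folds is arbitrary.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Already-encoded copy of the 15-row dataset
X = pd.DataFrame({
    "Jenis Kelamin":     [0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0],
    "Status Mahasiswa":  [1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1],
    "Status Pernikahan": [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0],
    "IPK": [3.17, 3.30, 3.01, 3.25, 3.20, 2.50, 3.00, 2.70,
            2.40, 2.50, 2.50, 3.50, 3.30, 3.25, 2.30],
})
y = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1])

# Each fold holds out 3 rows while keeping the class ratio similar
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = {}
for name, model in [("Naive Bayes", GaussianNB()),
                    ("Logistic Regression", LogisticRegression())]:
    scores[name] = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(name, "mean accuracy:", scores[name].mean())
```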



### 6. Predicting the Test Data
- **A:** Laki-laki, Mahasiswa, Belum, IPK=2.50  
- **B:** Perempuan, Bekerja, Belum, IPK=2.50  

The predictions show whether each student graduates on time (**Tepat**) or late (**Terlambat**).


In [None]:
data_test = pd.DataFrame({
    "Jenis Kelamin": ["Laki-laki","Perempuan"],
    "Status Mahasiswa": ["Mahasiswa","Bekerja"],
    "Status Pernikahan": ["Belum","Belum"],
    "IPK": [2.50,2.50]
})

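# Encode the test data with the same mapping used for training.
# Refitting LabelEncoder here would derive codes from the test rows alone,
# so the training mapping (alphabetical order) is applied explicitly.
encode_maps = {
    "Jenis Kelamin": {"Laki-laki": 0, "Perempuan": 1},
    "Status Mahasiswa": {"Bekerja": 0, "Mahasiswa": 1},
    "Status Pernikahan": {"Belum": 0, "Menikah": 1},
}
for col, mapping in encode_maps.items():
    data_test[col] = data_test[col].map(mapping)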

print("Test data:\n", data_test)

print("\nNaive Bayes prediction:", nb.predict(data_test))
print("Logistic Regression prediction:", log_reg.predict(data_test))


# Decoder mapping (integer codes back to labels)
decode_status = {0: "Tepat", 1: "Terlambat"}

# Model predictions
pred_nb = nb.predict(data_test)
pred_log = log_reg.predict(data_test)

# Decode the predictions
decoded_nb = [decode_status[val] for val in pred_nb]
decoded_log = [decode_status[val] for val in pred_log]

print("Naive Bayes prediction:", decoded_nb)
print("Logistic Regression prediction:", decoded_log)


Test data:
    Jenis Kelamin  Status Mahasiswa  Status Pernikahan  IPK
0              0                 1                  0  2.5
1              1                 0                  0  2.5

Naive Bayes prediction: [1 1]
Logistic Regression prediction: [1 1]
Naive Bayes prediction: ['Terlambat', 'Terlambat']
Logistic Regression prediction: ['Terlambat', 'Terlambat']
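
An alternative that avoids manual encoding and decoding altogether is to bundle the encoder and the model in one scikit-learn pipeline. The sketch below is an assumption about how this could look, not the notebook's method: it swaps `LabelEncoder` for `OrdinalEncoder` (which is designed for feature columns), trains on six rows copied from the table for brevity, and fits on the string labels directly so that predictions come back as `Tepat` or `Terlambat`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

cat_cols = ["Jenis Kelamin", "Status Mahasiswa", "Status Pernikahan"]

# Six rows copied from the training table (raw string labels)
train = pd.DataFrame({
    "Jenis Kelamin": ["Laki-laki", "Laki-laki", "Perempuan",
                      "Laki-laki", "Perempuan", "Perempuan"],
    "Status Mahasiswa": ["Mahasiswa", "Bekerja", "Mahasiswa",
                         "Bekerja", "Bekerja", "Mahasiswa"],
    "Status Pernikahan": ["Belum", "Belum", "Menikah",
                          "Menikah", "Belum", "Menikah"],
    "IPK": [3.17, 3.30, 3.25, 2.50, 2.70, 2.50],
    "Status Kelulusan": ["Tepat", "Tepat", "Tepat",
                         "Terlambat", "Terlambat", "Terlambat"],
})

# Encode the categorical columns, pass IPK through unchanged
preprocess = ColumnTransformer(
    [("cat", OrdinalEncoder(), cat_cols)],
    remainder="passthrough",
)
pipeline = make_pipeline(preprocess, GaussianNB())
pipeline.fit(train.drop(columns="Status Kelulusan"), train["Status Kelulusan"])

# New rows are encoded with the mapping learned during fit
new_data = pd.DataFrame({
    "Jenis Kelamin": ["Laki-laki", "Perempuan"],
    "Status Mahasiswa": ["Mahasiswa", "Bekerja"],
    "Status Pernikahan": ["Belum", "Belum"],
    "IPK": [2.50, 2.50],
})
preds = pipeline.predict(new_data)
print(list(preds))
```

Because the encoder is fitted once inside the pipeline, the same category-to-integer mapping is reused for any later call to `predict`.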
