# 🎓 Student Graduation Prediction (Iris Dataset)
This notebook walks through building classification models that predict **student graduation** from a modified Iris dataset. It covers EDA, preprocessing, model training, evaluation, and conclusions.

## 1. Import Libraries

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

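## 2. Load the Dataset
The data is the *Iris dataset*, recast here as a graduation-prediction task: Iris class 0 (setosa) is relabeled 'Tidak Lulus' (did not graduate), and classes 1 and 2 (versicolor, virginica) are relabeled 'Lulus' (graduated).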

In [3]:
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['kelulusan'] = iris.target

# Relabel the target as pass/fail (simulated): class 0 -> 'Tidak Lulus', classes 1 and 2 -> 'Lulus'
df['kelulusan'] = df['kelulusan'].apply(lambda x: 'Lulus' if x != 0 else 'Tidak Lulus')

df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | kelulusan |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Tidak Lulus |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Tidak Lulus |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Tidak Lulus |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Tidak Lulus |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Tidak Lulus |
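
The `apply` call above maps Iris class 0 to 'Tidak Lulus' and classes 1 and 2 to 'Lulus'. As a minimal sketch, the same relabeling can be written in vectorized form with `np.where`, and `iris.target_names` confirms which species each simulated label stands for. The names `labels`, `via_apply`, and `species_to_label` are illustrative only and do not appear in the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Vectorized relabeling: class 0 -> 'Tidak Lulus', classes 1 and 2 -> 'Lulus'
labels = np.where(iris.target == 0, 'Tidak Lulus', 'Lulus')

# Same result as the row-by-row apply() used in the notebook
via_apply = pd.Series(iris.target).apply(lambda x: 'Lulus' if x != 0 else 'Tidak Lulus')
assert (labels == via_apply.to_numpy()).all()

# Which Iris species ends up under which simulated label (hypothetical helper dict)
species_to_label = {
    str(name): ('Tidak Lulus' if idx == 0 else 'Lulus')
    for idx, name in enumerate(iris.target_names)
}
print(species_to_label)
# {'setosa': 'Tidak Lulus', 'versicolor': 'Lulus', 'virginica': 'Lulus'}
```

`np.where` works on the whole array at once, which avoids a Python-level function call per row.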


## 3. EDA (Exploratory Data Analysis)

In [4]:
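df.info()                       # column dtypes and non-null counts (printed directly)
# Only a cell's last expression is rendered, so describe() shows no output here;
# wrap it as display(df.describe()) to see the summary statistics
df.describe()
df['kelulusan'].value_counts()  # class distribution (rendered as the cell output)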

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   kelulusan          150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


kelulusan
Lulus          100
Tidak Lulus     50
Name: count, dtype: int64
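
The class counts show a 2:1 imbalance (100 'Lulus' vs 50 'Tidak Lulus'). Because `df.describe()` summarizes all rows together, a per-class summary is more revealing. This short sketch rebuilds the same `df` so it runs on its own, then groups the petal features by label; the names `petal_cols`, `ranges`, and `separable` are illustrative. If the largest 'Tidak Lulus' value is below the smallest 'Lulus' value, one threshold on that feature separates the classes perfectly.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the same DataFrame as in the notebook so this cell is self-contained
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['kelulusan'] = ['Lulus' if t != 0 else 'Tidak Lulus' for t in iris.target]

# Min and max of each petal feature per class (columns become (feature, stat) pairs)
petal_cols = ['petal length (cm)', 'petal width (cm)']
ranges = df.groupby('kelulusan')[petal_cols].agg(['min', 'max'])
print(ranges)

# True when the 'Tidak Lulus' range lies entirely below the 'Lulus' range
separable = {
    col: bool(ranges.loc['Tidak Lulus', (col, 'max')] < ranges.loc['Lulus', (col, 'min')])
    for col in petal_cols
}
print(separable)
```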

## 4. Data Preprocessing

In [5]:
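X = df.drop('kelulusan', axis=1)
y = df['kelulusan']

# Split first so the test set plays no part in fitting the scaler (avoids data leakage)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features: learn mean/std from the training set only, then apply to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)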
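
`StandardScaler` rescales each feature to z = (x − μ) / σ, where μ and σ are the per-feature mean and population standard deviation of the data it was fit on. This minimal sketch, assuming the same split parameters as the cell above, reproduces the transform by hand and shows that only the training features end up exactly centered; `mu`, `sigma`, and `manual_test` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)

# StandardScaler uses the training mean and the population std (ddof=0)
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
manual_test = (X_test - mu) / sigma

# The manual formula matches scaler.transform on unseen data
assert np.allclose(scaler.transform(X_test), manual_test)

# Training features are centered exactly; test features only approximately
print(np.round(scaler.transform(X_train).mean(axis=0), 6))
print(np.round(scaler.transform(X_test).mean(axis=0), 2))
```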

## 5. Training Two Models (Logistic Regression & Decision Tree)

In [6]:
# Model 1: Logistic Regression
log_model = LogisticRegression()
log_model.fit(X_train, y_train)
y_pred_log = log_model.predict(X_test)

# Model 2: Decision Tree
tree_model = DecisionTreeClassifier(random_state=42)
tree_model.fit(X_train, y_train)
y_pred_tree = tree_model.predict(X_test)
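
Keeping the scaler and the classifier as separate objects works, but it is easy to fit the scaler on the wrong data by accident. A common alternative is scikit-learn's `Pipeline`, which fits every step on the training data only and reapplies the fitted steps at prediction time. The sketch below is a variant, not the notebook's original code; the step names `scaler` and `clf` are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data
y = np.where(iris.target == 0, 'Tidak Lulus', 'Lulus')

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step names ('scaler', 'clf') are arbitrary labels chosen for this sketch
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression()),
])
pipe.fit(X_train, y_train)             # the scaler is fit on X_train only
test_acc = pipe.score(X_test, y_test)  # X_test is transformed with training statistics
print('Pipeline test accuracy:', test_acc)
```

The same object can be passed directly to cross-validation or grid-search utilities, which refit the scaler inside each fold.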

## 6. Model Evaluation

In [7]:
def evaluate_model(name, y_true, y_pred):
    """Print accuracy, confusion matrix, and classification report for one model."""
    print(f"\n=== Model Evaluation: {name} ===")
    print('Accuracy:', accuracy_score(y_true, y_pred))
    # Rows are true labels, columns are predictions, both in sorted order: ['Lulus', 'Tidak Lulus']
    print('Confusion Matrix:\n', confusion_matrix(y_true, y_pred))
    print('Classification Report:\n', classification_report(y_true, y_pred))

evaluate_model('Logistic Regression', y_test, y_pred_log)
evaluate_model('Decision Tree', y_test, y_pred_tree)


=== Model Evaluation: Logistic Regression ===
Accuracy: 1.0
Confusion Matrix:
 [[20  0]
 [ 0 10]]
Classification Report:
               precision    recall  f1-score   support

       Lulus       1.00      1.00      1.00        20
 Tidak Lulus       1.00      1.00      1.00        10

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30


=== Model Evaluation: Decision Tree ===
Accuracy: 1.0
Confusion Matrix:
 [[20  0]
 [ 0 10]]
Classification Report:
               precision    recall  f1-score   support

       Lulus       1.00      1.00      1.00        20
 Tidak Lulus       1.00      1.00      1.00        10

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
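
With only 30 test samples, a single split gives a coarse accuracy estimate: each misclassification costs about 3.3 percentage points. This sketch runs 5-fold stratified cross-validation on both models, assuming the same relabeling and a scaler-plus-model pipeline, so every sample is used for testing exactly once. The fold count, `random_state`, and the names `models` and `cv_results` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data
y = np.where(iris.target == 0, 'Tidak Lulus', 'Lulus')

# Stratified folds preserve the 100:50 class ratio in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    'Logistic Regression': make_pipeline(StandardScaler(), LogisticRegression()),
    'Decision Tree': make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=42)),
}

cv_results = {}
for name, model in models.items():
    # The scaler is refit inside each training fold, so no test fold leaks into it
    scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
    cv_results[name] = scores
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```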



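## 7. Conclusion
Both models reach an accuracy of 1.0 on the 30-sample test set (20 'Lulus', 10 'Tidak Lulus'), so this evaluation cannot tell them apart. The task is easy by construction: the 'Tidak Lulus' class is Iris setosa, which is linearly separable from the other two species, so a single threshold on either petal feature already separates the classes. A meaningful comparison needs harder data or a more robust estimate than one small test split.
In general, Logistic Regression works well when the classes are (roughly) linearly separable, while a Decision Tree can model non-linear boundaries and feature interactions, at a higher risk of overfitting.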
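
The claim that this task has a simple decision boundary can be checked directly: a tree trained on the relabeled data needs only one split to reach pure leaves. This small sketch uses scikit-learn's `export_text` to print the learned rule; which petal feature the root split uses may depend on tie-breaking, so the printed rule is not reproduced here. Scaling is omitted because tree splits are unaffected by per-feature rescaling.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data
y = np.where(iris.target == 0, 'Tidak Lulus', 'Lulus')

tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Depth 1 with 2 leaves means one threshold on one feature separates the classes
print('Depth:', tree.get_depth(), 'Leaves:', tree.get_n_leaves())
print(export_text(tree, feature_names=list(iris.feature_names)))
```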

**Next Steps:**
- Try other algorithms such as KNN or SVM.
- Tune hyperparameters.
- Use a real dataset (for example, student records with exam scores, attendance, and GPA).
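
As a starting point for the first two next steps, this sketch tunes a KNN and an SVM with `GridSearchCV`, each inside a scaling pipeline. The parameter grids and the names `candidates` and `searches` are illustrative assumptions, not recommended values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X = iris.data
y = np.where(iris.target == 0, 'Tidak Lulus', 'Lulus')

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Illustrative grids; '<step>__<param>' addresses a parameter of a pipeline step
candidates = {
    'KNN': (KNeighborsClassifier(), {'clf__n_neighbors': [3, 5, 7]}),
    'SVM': (SVC(), {'clf__C': [0.1, 1, 10], 'clf__kernel': ['linear', 'rbf']}),
}

searches = {}
for name, (estimator, grid) in candidates.items():
    pipe = Pipeline([('scaler', StandardScaler()), ('clf', estimator)])
    search = GridSearchCV(pipe, grid, cv=cv, scoring='accuracy')
    search.fit(X, y)  # tries every grid combination with 5-fold CV
    searches[name] = search
    print(f"{name}: best params={search.best_params_}, CV accuracy={search.best_score_:.3f}")
```

`best_score_` is the cross-validated score of the winning configuration and tends to be optimistic, because the same folds were used to pick it; a held-out test set, as in section 4, gives a fairer final estimate.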