# Group Assignment V

Group Members:
- Alfandi Wijaya
- Alvern Brainard
- Petra Gamma Setya Agatha

**Importing the Libraries Used**

If tabulate is not installed yet, keep the `%pip install tabulate` line in the cell below enabled.

In [1]:
import pandas as pd
import numpy as np
# Make sure 'tabulate' is installed (DataFrame.to_markdown depends on it)
%pip install tabulate
from sklearn.model_selection import (
    train_test_split 
)
from sklearn.preprocessing import (
    StandardScaler,
    MinMaxScaler,
    RobustScaler,
    OneHotEncoder,
    OrdinalEncoder,
    LabelEncoder
)
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score

# --- Import the 9 models ---
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (
    RandomForestClassifier,
    AdaBoostClassifier,      
    GradientBoostingClassifier 
)
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis 




**Reading the CSV File**

In [2]:
file_path = 'TARP.csv'
df_raw = pd.read_csv(file_path)
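
One detail worth checking right after loading: the feature list used later contains `' Soil Humidity'` with a leading space, so the column names have to be matched exactly. The sketch below is only an illustration on a hypothetical two-row sample (the values and the reduced header are assumptions, not taken from `TARP.csv`). It shows how `repr()` exposes stray whitespace and how a stripped copy could be made if normalized names were preferred.

```python
import io
import pandas as pd

# Hypothetical sample; only the header style (a space after the comma) mimics the real file.
csv_text = "Soil Moisture, Soil Humidity,Status\n54,70,ON\n12,35,OFF\n"
sample = pd.read_csv(io.StringIO(csv_text))

# repr() makes leading/trailing spaces in column names visible.
print([repr(c) for c in sample.columns])

# Optional: a copy with whitespace stripped from every column name.
# The notebook keeps the original names, so ' Soil Humidity' must be
# written with its leading space wherever it is referenced.
cleaned = sample.rename(columns=str.strip)
print(list(cleaned.columns))
```

If the stripped names were adopted, every later reference to `' Soil Humidity'` would need to change as well, which is why the notebook's own names are left untouched here.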

**Defining the Columns to Use**

In [5]:
numeric_features = [
    'Soil Moisture',
    'Temperature',
    ' Soil Humidity',
    'Air temperature (C)',
    'Wind speed (Km/h)',
    'Air humidity (%)',
    'Wind gust (Km/h)',
    'Pressure (KPa)',
    'rainfall'
]

kolom_target = 'Status'

**Cleaning the Data and Separating X and y**

In [6]:
# Clean the data and separate X / y
kolom_yang_dipakai = numeric_features + [kolom_target]
df = df_raw[kolom_yang_dipakai].copy()


# Drop rows with missing values
df.dropna(inplace=True)

# Encode the target (y) as integers
le = LabelEncoder()
df[kolom_target] = le.fit_transform(df[kolom_target])

# Separate X and y
X = df.drop(columns=[kolom_target])
y = df[kolom_target]
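
To make the target encoding easier to read, here is a minimal sketch of how `LabelEncoder` assigns integers. The label values are an assumption: `OFF` appears in the prediction output of this notebook, while `ON` is only presumed to be the other class.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical label values; 'OFF' is seen in the notebook output, 'ON' is assumed.
status = ["ON", "OFF", "OFF", "ON"]

le_demo = LabelEncoder()
encoded = le_demo.fit_transform(status)

# classes_ is sorted, so the position of each label is its integer code.
mapping = {label: code for code, label in enumerate(le_demo.classes_)}
print(mapping)
print(encoded.tolist())

# inverse_transform maps integer codes back to the original labels.
print(le_demo.inverse_transform([0, 1]).tolist())
```

Because the classes are sorted alphabetically, the code 0 would correspond to `OFF` under this assumption, which agrees with the final prediction output.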

**Splitting the Data into Train and Test Sets**

In [7]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(f"Data split: {X_train.shape[0]} train, {X_test.shape[0]} test.\n")

Data split: 1760 train, 440 test.
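
The counts are consistent with `test_size=0.2`: 1760 + 440 = 2200 rows, and 20% of 2200 is 440. The effect of `stratify=y` can be sketched on hypothetical labels (the 70/30 class ratio below is an assumption for illustration, not the ratio in the dataset): both partitions keep the same class proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 70 samples of class 0, 30 of class 1.
y_strat = np.array([0] * 70 + [1] * 30)
X_strat = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_strat, y_strat, test_size=0.2, random_state=42, stratify=y_strat
)

# Partition sizes follow test_size.
print(len(y_tr), len(y_te))

# With stratify, the share of class 1 is the same in both partitions.
print(y_tr.mean(), y_te.mean())
```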



**Defining the 7 Transformation Scenarios**

In [8]:
preprocessors = {}
preprocessors['1. No Transform (Num Only)'] = ColumnTransformer(
    [('num', 'passthrough', numeric_features)], remainder='drop'
)
preprocessors['2. OHE + StandardScaler'] = ColumnTransformer(
    [('num', StandardScaler(), numeric_features)],
    remainder='drop'
)
preprocessors['3. OHE + MinMaxScaler'] = ColumnTransformer(
    [('num', MinMaxScaler(), numeric_features)],
    remainder='drop'
)
preprocessors['4. OHE + RobustScaler'] = ColumnTransformer(
    [('num', RobustScaler(), numeric_features)],
    remainder='drop'
)
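# NOTE: ord_encoder is defined but never used. All features are numeric, so no
# OneHotEncoder/OrdinalEncoder is applied and scenarios 5-7 are identical to 2-4.
# The 'OHE'/'ORD' labels are kept only to match the results table.
ord_encoder = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)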
preprocessors['5. ORD + StandardScaler'] = ColumnTransformer(
    [('num', StandardScaler(), numeric_features)],
    remainder='drop'
)
preprocessors['6. ORD + MinMaxScaler'] = ColumnTransformer(
    [('num', MinMaxScaler(), numeric_features)],
    remainder='drop'
)
preprocessors['7. ORD + RobustScaler'] = ColumnTransformer(
    [('num', RobustScaler(), numeric_features)],
    remainder='drop'
)
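
To show how the three scalers in these scenarios differ, the sketch below applies each one to a single hypothetical feature containing an outlier. The values are made up for illustration and are not from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Hypothetical single feature with one outlier (100.0).
values = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaled = {
    # Zero mean, unit variance.
    "StandardScaler": StandardScaler().fit_transform(values),
    # Rescaled to the range [0, 1].
    "MinMaxScaler": MinMaxScaler().fit_transform(values),
    # Centered on the median, divided by the interquartile range.
    "RobustScaler": RobustScaler().fit_transform(values),
}

for name, arr in scaled.items():
    print(name, np.round(arr.ravel(), 3))
```

`MinMaxScaler` squeezes the four ordinary values close to 0 because the outlier defines the maximum, whereas `RobustScaler` keeps them evenly spaced around the median.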

**Defining the 9 Models to Use**

In [9]:
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
    "KKN": KNeighborsClassifier(), # k-nearest neighbors (KNN); the key spelling "KKN" is kept to match the results table
    "SVM": SVC(kernel='linear', random_state=42),
    "Naive Bayes": GaussianNB(), # <-- Naive Bayes
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "LDA": LinearDiscriminantAnalysis(), # <-- Linear Discriminant
    "AdaBoost": AdaBoostClassifier(random_state=42), # <-- AdaBoost
    "Gradient Boosting": GradientBoostingClassifier(random_state=42) # <-- Grad. Boost
}

**Training, Testing, and Computing Accuracy**

In [10]:
results = []
print("Starting evaluation (train/test split method)...")

for model_name, model in models.items():
    for scenario_name, preprocessor in preprocessors.items():

        # Chain the preprocessing step and the classifier
        pipe = Pipeline(steps=[
            ('preprocessor', preprocessor),
            ('classifier', model)
        ])

        # Fit on the training data, then predict on the test data
        pipe.fit(X_train, y_train)
        y_pred = pipe.predict(X_test)

        # Compute accuracy with accuracy_score
        acc = accuracy_score(y_test, y_pred)

        # The dictionary keys are reused by the pivot table in the next cell
        results.append({
            "Model": model_name,
            "Skenario Transformasi": scenario_name,
            "Akurasi": acc
        })

print("Evaluation finished.")

Starting evaluation (train/test split method)...
Evaluation finished.
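
The loop above scores each combination on a single 80/20 split, so every accuracy is one estimate that depends on that split. As a possible complement, and not something this notebook does, k-fold cross-validation averages over several splits. The sketch below uses synthetic data from `make_classification` as a stand-in (an assumption; it is not the TARP data) with one scaler and one model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data with 9 numeric features (not the TARP dataset).
X_cv, y_cv = make_classification(n_samples=300, n_features=9, random_state=42)

cv_pipe = Pipeline(steps=[
    ("preprocessor", MinMaxScaler()),
    ("classifier", LogisticRegression(max_iter=1000, random_state=42)),
])

# 5-fold cross-validation; the whole pipeline, scaler included,
# is refit on the training part of each fold.
scores = cross_val_score(cv_pipe, X_cv, y_cv, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())
```

Keeping the scaler inside the pipeline matters here for the same reason as in the loop: the scaling statistics are learned from training data only.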


**Displaying the Accuracy Comparison**

In [11]:
results_df = pd.DataFrame(results)

# Build a pivot table: rows = model, columns = transformation scenario
pivot_df = results_df.pivot_table(
    index='Model',
    columns='Skenario Transformasi',
    values='Akurasi'
)

# Order the columns to match the order of the scenarios
column_order = [
    '1. No Transform (Num Only)',
    '2. OHE + StandardScaler',
    '3. OHE + MinMaxScaler',
    '4. OHE + RobustScaler',
    '5. ORD + StandardScaler',
    '6. ORD + MinMaxScaler',
    '7. ORD + RobustScaler'
]
pivot_df = pivot_df[column_order]

# Order the rows to match the order of the models
model_order = list(models.keys())
pivot_df = pivot_df.reindex(model_order)

# Format the numbers as percentages
# (column-wise Series.map avoids the deprecated DataFrame.applymap)
pivot_df_formatted = pivot_df.apply(lambda col: col.map(lambda x: f"{x*100:.2f}%"))

# --- Display the results ---
print("\n" + "="*70)
print("Accuracy Comparison Table (.fit / .predict / accuracy_score method)")
print("="*70 + "\n")
print(pivot_df_formatted.to_markdown(numalign="left", stralign="left"))



Accuracy Comparison Table (.fit / .predict / accuracy_score method)

| Model               | 1. No Transform (Num Only)   | 2. OHE + StandardScaler   | 3. OHE + MinMaxScaler   | 4. OHE + RobustScaler   | 5. ORD + StandardScaler   | 6. ORD + MinMaxScaler   | 7. ORD + RobustScaler   |
|:--------------------|:-----------------------------|:--------------------------|:------------------------|:------------------------|:--------------------------|:------------------------|:------------------------|
| Logistic Regression | 69.55%                       | 69.77%                    | 70.23%                  | 69.77%                  | 69.77%                    | 70.23%                  | 69.77%                  |
| KKN                 | 64.55%                       | 65.00%                    | 65.45%                  | 65.23%                  | 65.00%                    | 65.45%                  | 65.23%                  |
| SVM                 | 70.00%                       | 69.77%        

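
Reading the best combination off a wide table by eye is error-prone, so it can also be selected programmatically. The sketch below is a minimal illustration on a small subset: the six accuracies are copied from the first two rows of the table above (scenarios 1 to 3 only), and `results_demo` is a hypothetical stand-in for `results_df`.

```python
import pandas as pd

# Subset of the table above: two models, scenarios 1 to 3 only.
results_demo = pd.DataFrame([
    {"Model": "Logistic Regression", "Skenario Transformasi": "1. No Transform (Num Only)", "Akurasi": 0.6955},
    {"Model": "Logistic Regression", "Skenario Transformasi": "2. OHE + StandardScaler", "Akurasi": 0.6977},
    {"Model": "Logistic Regression", "Skenario Transformasi": "3. OHE + MinMaxScaler", "Akurasi": 0.7023},
    {"Model": "KKN", "Skenario Transformasi": "1. No Transform (Num Only)", "Akurasi": 0.6455},
    {"Model": "KKN", "Skenario Transformasi": "2. OHE + StandardScaler", "Akurasi": 0.6500},
    {"Model": "KKN", "Skenario Transformasi": "3. OHE + MinMaxScaler", "Akurasi": 0.6545},
])

# Overall best row: idxmax returns the index label of the highest accuracy.
best = results_demo.loc[results_demo["Akurasi"].idxmax()]
print(best["Model"], "|", best["Skenario Transformasi"], "|", best["Akurasi"])

# Best scenario for each model.
best_per_model = results_demo.loc[results_demo.groupby("Model")["Akurasi"].idxmax()]
print(best_per_model.to_string(index=False))
```

When several rows tie, `idxmax` returns the first one, so ties such as identical scenarios are resolved by row order.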


**Predicting a New Observation**

In [18]:
# Select scenario 3 (MinMaxScaler) with Logistic Regression
model_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessors['3. OHE + MinMaxScaler']),
    ('classifier', models['Logistic Regression'])
])
# Train the model on the training data
model_pipeline.fit(X_train, y_train)

data_pendaftar_baru = pd.DataFrame({
    # One new observation (a single set of sensor readings)
    'Soil Moisture': [50.35],
    'Temperature': [20.0],
    ' Soil Humidity':[90.0],
    'Air temperature (C)': [27.0],
    'Wind speed (Km/h)' : [10.0],
    'Air humidity (%)': [55.0],
    'Wind gust (Km/h)': [15.0],
    'Pressure (KPa)': [101.3],
    'rainfall' : [5.0]
})

# Use the trained pipeline to predict the new data
prediksi = model_pipeline.predict(data_pendaftar_baru)

# Map the encoded prediction back to the original Status label
hasil_prediksi_jelas = le.inverse_transform(prediksi)

print(f"New observation:\n{data_pendaftar_baru.to_markdown(index=False)}\n")
print(f"Prediction (encoded label): {prediksi[0]}")
print(f"Prediction (Status label): {hasil_prediksi_jelas[0]}")

New observation:
|   Soil Moisture |   Temperature |    Soil Humidity |   Air temperature (C) |   Wind speed (Km/h) |   Air humidity (%) |   Wind gust (Km/h) |   Pressure (KPa) |   rainfall |
|----------------:|--------------:|-----------------:|----------------------:|--------------------:|-------------------:|-------------------:|-----------------:|-----------:|
|           50.35 |            20 |               90 |                    27 |                  10 |                 55 |                 15 |            101.3 |          5 |

Prediction (encoded label): 0
Prediction (Status label): OFF
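
A single class label hides how confident the model is. Since the test accuracy is around 70%, it may help to look at class probabilities as well. The sketch below is a hedged illustration on synthetic data (two made-up features and a made-up labeling rule, not the TARP data) of how `predict_proba` could be used with the same kind of pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in: two numeric features, label 1 when their sum exceeds 100.
rng = np.random.default_rng(42)
X_syn = rng.uniform(0, 100, size=(200, 2))
y_syn = (X_syn.sum(axis=1) > 100).astype(int)

proba_pipe = Pipeline(steps=[
    ("preprocessor", MinMaxScaler()),
    ("classifier", LogisticRegression(max_iter=1000, random_state=42)),
])
proba_pipe.fit(X_syn, y_syn)

# One hypothetical new row; predict_proba returns one probability per class.
new_row = np.array([[50.35, 20.0]])
proba = proba_pipe.predict_proba(new_row)[0]

# classes_ gives the class order of the probability columns.
print(dict(zip(proba_pipe.classes_.tolist(), np.round(proba, 3).tolist())))
```

In the notebook itself, the equivalent call would be `model_pipeline.predict_proba(data_pendaftar_baru)`, with the column order given by `model_pipeline.classes_`.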
