#### **Task 3**

Using the **diabetes** dataset, build a voting ensemble from the following algorithms:

1. Logistic Regression
2. SVM with a polynomial kernel
3. Decision Tree

You may explore further by tuning the hyperparameters.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

In [4]:
diabetes_df = pd.read_csv('data/diabetes.csv')

diabetes_df.head()

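|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |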


In [16]:
diabetes_df.isnull().sum()

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64

In [5]:
# Separate the features from the target
X = diabetes_df.drop('Outcome', axis=1)  # Assumes 'Outcome' is the target column
y = diabetes_df['Outcome']

In [6]:
# Split the dataset into training and test sets (70% / 30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [7]:
# Standardize the features: fit the scaler on the training set only, then apply it to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
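The scaling step can be illustrated without scikit-learn. The following is a minimal sketch, assuming a single feature column and using the `Glucose` values from the preview as toy training data; the helper names `fit_scaler` and `transform` and the test values are hypothetical. It shows why the mean and standard deviation are computed from the training data only and then reused for the test data.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Return (mean, std) of a training column.

    Uses the population standard deviation, which is what StandardScaler uses.
    A zero std is replaced by 1.0 so constant columns do not divide by zero.
    """
    mu = mean(column)
    sigma = pstdev(column)
    return mu, (sigma if sigma > 0 else 1.0)


def transform(column, mu, sigma):
    """Scale a column with statistics that were learned elsewhere."""
    return [(x - mu) / sigma for x in column]


train_glucose = [148, 85, 183, 89, 137]  # toy training column
test_glucose = [100, 160]                # hypothetical unseen values

mu, sigma = fit_scaler(train_glucose)                 # learned from training data only
train_scaled = transform(train_glucose, mu, sigma)    # mean 0, std 1 by construction
test_scaled = transform(test_glucose, mu, sigma)      # reuses the training statistics
```

Refitting the scaler on the test set would let test-set statistics influence preprocessing, which is why only `transform` is applied there.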

1. **Logistic Regression with Hyperparameter Tuning**

In [17]:
# Logistic Regression with GridSearchCV
param_grid_lr = {'C': [0.01, 0.1, 1, 10]}
lr = LogisticRegression(random_state=42)
grid_search_lr = GridSearchCV(lr, param_grid_lr, cv=5, scoring='accuracy', n_jobs=-1)
grid_search_lr.fit(X_train, y_train)

# Best parameters and best mean cross-validation accuracy
best_params_lr = grid_search_lr.best_params_
best_accuracy_lr = grid_search_lr.best_score_

print(f"Best parameters Logistic Regression: {best_params_lr}")
print(f"Best accuracy Logistic Regression (CV): {best_accuracy_lr:.4f}")

Best parameters Logistic Regression: {'C': 0.1}
Best accuracy Logistic Regression (CV): 0.7784
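The reported CV score is the mean validation accuracy over the 5 folds. The sketch below shows the idea with plain, contiguous k-fold splitting. Note the simplification: for a classifier with an integer `cv`, `GridSearchCV` uses stratified folds, which this sketch ignores. The function name `kfold_indices` and the fold scores are hypothetical.

```python
def kfold_indices(n_samples, n_splits=5):
    """Yield (train_idx, val_idx) pairs for contiguous, unshuffled folds.

    The first (n_samples % n_splits) folds receive one extra sample.
    """
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


folds = list(kfold_indices(12, n_splits=5))
fold_sizes = [len(val_idx) for _, val_idx in folds]

# Hypothetical per-fold validation accuracies for one hyperparameter setting
fold_scores = [0.80, 0.75, 0.78, 0.76, 0.79]
cv_score = sum(fold_scores) / len(fold_scores)  # the quantity best_score_ reports
```

Every sample is used for validation exactly once, so the averaged score depends less on one particular split than a single hold-out estimate would.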


2. **SVM (Polynomial Kernel) with Hyperparameter Tuning**

In [18]:
# SVM with a polynomial kernel and GridSearchCV
param_grid_svm = {'C': [0.1, 1, 10], 'degree': [2, 3, 4]}
svm = SVC(kernel='poly', random_state=42)
grid_search_svm = GridSearchCV(svm, param_grid_svm, cv=5, scoring='accuracy', n_jobs=-1)
grid_search_svm.fit(X_train, y_train)

# Best parameters and best mean cross-validation accuracy
best_params_svm = grid_search_svm.best_params_
best_accuracy_svm = grid_search_svm.best_score_

print(f"Best parameters SVM (Polynomial Kernel): {best_params_svm}")
print(f"Best accuracy SVM (Polynomial Kernel) (CV): {best_accuracy_svm:.4f}")

Best parameters SVM (Polynomial Kernel): {'C': 10, 'degree': 3}
Best accuracy SVM (Polynomial Kernel) (CV): 0.7392
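To make the cost of this search concrete, the sketch below enumerates the same grid with `itertools.product`. It only illustrates how a grid expands into candidate settings and is not how scikit-learn implements it internally; the names `candidates`, `n_folds`, and `n_fits` are introduced here for illustration.

```python
from itertools import product

param_grid_svm = {'C': [0.1, 1, 10], 'degree': [2, 3, 4]}

# Cartesian product of all parameter values: one dict per candidate setting
names = list(param_grid_svm)
candidates = [
    dict(zip(names, values))
    for values in product(*param_grid_svm.values())
]

n_folds = 5
# Each candidate is fitted once per fold; with the default refit=True,
# GridSearchCV then performs one more fit of the best candidate on all training data.
n_fits = len(candidates) * n_folds
```

With 3 values of `C` and 3 values of `degree` there are 9 candidates, so 45 cross-validation fits. Adding another parameter multiplies this count, which is worth keeping in mind when widening the grid.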


3. **Decision Tree with Hyperparameter Tuning**

In [19]:
# Decision Tree with GridSearchCV
param_grid_dt = {'max_depth': [None, 10, 20, 30], 'min_samples_split': [2, 10, 20]}
dt = DecisionTreeClassifier(random_state=42)
grid_search_dt = GridSearchCV(dt, param_grid_dt, cv=5, scoring='accuracy', n_jobs=-1)
grid_search_dt.fit(X_train, y_train)

# Best parameters and best mean cross-validation accuracy
best_params_dt = grid_search_dt.best_params_
best_accuracy_dt = grid_search_dt.best_score_

print(f"Best parameters Decision Tree: {best_params_dt}")
print(f"Best accuracy Decision Tree (CV): {best_accuracy_dt:.4f}")

Best parameters Decision Tree: {'max_depth': None, 'min_samples_split': 20}
Best accuracy Decision Tree (CV): 0.7429


* Ensemble Voting Classifier

In [20]:
# Build a Voting Classifier from the best models found by GridSearchCV
voting_clf = VotingClassifier(estimators=[
    ('lr', grid_search_lr.best_estimator_), 
    ('svm', grid_search_svm.best_estimator_), 
    ('dt', grid_search_dt.best_estimator_)
], voting='hard')


In [21]:
# Fit the Voting Classifier
voting_clf.fit(X_train, y_train)

In [22]:
# Predict and evaluate
y_pred = voting_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f'Voting Classifier accuracy on test data: {accuracy:.4f}')

Voting Classifier accuracy on test data: 0.7403
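With `voting='hard'`, the ensemble predicts the class label chosen by the majority of its members. A minimal pure-Python sketch of that rule follows; the function `hard_vote` and the toy predictions are hypothetical. Ties are broken by picking the smallest label, matching the ascending-sort-order rule described in the scikit-learn documentation, although three voters on a binary problem can never tie.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model label lists (all the same length) by majority vote."""
    voted = []
    for votes in zip(*predictions_per_model):  # one tuple of votes per sample
        counts = Counter(votes)
        top = max(counts.values())
        # Among the labels with the most votes, pick the smallest one
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted


# Toy predictions from three hypothetical models on four samples
lr_pred = [1, 0, 1, 0]
svm_pred = [1, 0, 0, 0]
dt_pred = [0, 1, 1, 0]

ensemble_pred = hard_vote([lr_pred, svm_pred, dt_pred])
```

Here each sample gets the label that at least two of the three models agree on, so a single model's mistake is outvoted when the other two are correct.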


* The variables `best_log_reg`, `best_svm_poly`, and `best_dt` hold the best estimators found by GridSearchCV for each algorithm (Logistic Regression, SVM, and Decision Tree).

In [23]:
# Take the best model from each GridSearchCV run
best_log_reg = grid_search_lr.best_estimator_
best_svm_poly = grid_search_svm.best_estimator_
best_dt = grid_search_dt.best_estimator_

# Build the Voting Classifier from the tuned models
voting_clf = VotingClassifier(estimators=[
    ('log_reg', best_log_reg), 
    ('svm_poly', best_svm_poly),
    ('dt', best_dt)  # Third tuned model
], voting='hard')  # Majority voting

# Fit the Voting Classifier (each estimator is cloned and refit on X_train)
voting_clf.fit(X_train, y_train)

# Predict labels for the test data
y_pred_voting = voting_clf.predict(X_test)

# Compute accuracy on the training data
y_train_pred_voting = voting_clf.predict(X_train)
acc_train_voting = accuracy_score(y_train, y_train_pred_voting)

# Compute accuracy on the test data
acc_test_voting = accuracy_score(y_test, y_pred_voting)

# Print the evaluation results
print(f'Accuracy on train (Voting Classifier): {acc_train_voting:.2f}')
print(f'Accuracy on test (Voting Classifier): {acc_test_voting:.2f}')

Accuracy on train (Voting Classifier): 0.84
Accuracy on test (Voting Classifier): 0.74
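Accuracy here is the fraction of predictions that match the true labels. The sketch below recomputes it in plain Python on toy labels and then measures the gap between the two rounded scores printed above; the function `accuracy` and the toy labels are hypothetical. A train/test gap of about 0.10 may point to mild overfitting, though that reading is an interpretation and not something the notebook verifies.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels: three of four predictions are correct
toy_true = [1, 0, 1, 1]
toy_pred = [1, 0, 0, 1]
toy_accuracy = accuracy(toy_true, toy_pred)

# Rounded scores reported for the Voting Classifier
acc_train, acc_test = 0.84, 0.74
gap = acc_train - acc_test
```

Comparing the two scores side by side is a quick check on how well the ensemble generalizes beyond the data it was fitted on.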
