## **Practicum Assignments**

#### **Task 1**

The **mushroom** dataset is provided. Using this dataset, compare the performance of the Decision Tree and RandomForest algorithms. Use hyperparameter tuning to find the best parameters and the best accuracy.

#### **Task 2**

The **mushroom** dataset is provided. Using this dataset, compare the performance of the Decision Tree and AdaBoost algorithms. Use hyperparameter tuning to find the best parameters and the best accuracy.

#### **Task 3**

Using the **diabetes** dataset, build a voting ensemble from the following algorithms:
1. Logistic Regression
2. SVM with a polynomial kernel
3. Decision Tree

You may explore further by tuning the hyperparameters.

## **Answers**

1. **Task 1**

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Replace the path with the location where the mushroom dataset is stored
data = pd.read_csv("mushrooms.csv")

# Split the dataset into attributes (X) and label (y)
X = data.drop('class', axis=1)
y = data['class']

# Use one-hot encoding to convert the categorical attributes to numeric ones
X = pd.get_dummies(X)

# Hold out 20% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the Decision Tree model
decision_tree = DecisionTreeClassifier()

# Define the hyperparameter grid to search
param_grid_decision_tree = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 10, 20, 30, 40, 50],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Initialize GridSearchCV with 5-fold cross-validation
grid_search_decision_tree = GridSearchCV(estimator=decision_tree, param_grid=param_grid_decision_tree,
                                         cv=5, n_jobs=-1, verbose=2)

# Run the grid search; the best configuration is then refit on the full training set
grid_search_decision_tree.fit(X_train, y_train)

# Get the Decision Tree refit with the best parameters
best_decision_tree = grid_search_decision_tree.best_estimator_

# Print the best parameters
print("Best parameters for Decision Tree:")
print(grid_search_decision_tree.best_params_)

# Create the RandomForest model
random_forest = RandomForestClassifier()

# Define the hyperparameter grid to search
param_grid_random_forest = {
    'n_estimators': [100, 200, 300],
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 10, 20, 30, 40, 50],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Initialize GridSearchCV with 5-fold cross-validation
grid_search_random_forest = GridSearchCV(estimator=random_forest, param_grid=param_grid_random_forest,
                                         cv=5, n_jobs=-1, verbose=2)

# Run the grid search; the best configuration is then refit on the full training set
grid_search_random_forest.fit(X_train, y_train)

# Get the RandomForest refit with the best parameters
best_random_forest = grid_search_random_forest.best_estimator_

# Print the best parameters
print("\nBest parameters for RandomForest:")
print(grid_search_random_forest.best_params_)


# Predict the test labels with the Decision Tree model
y_pred_decision_tree = best_decision_tree.predict(X_test)

# Compute the accuracy of the Decision Tree model
accuracy_decision_tree = accuracy_score(y_test, y_pred_decision_tree)
print(f"Decision Tree accuracy: {accuracy_decision_tree:.2f}")

# Predict the test labels with the RandomForest model
y_pred_random_forest = best_random_forest.predict(X_test)

# Compute the accuracy of the RandomForest model
accuracy_random_forest = accuracy_score(y_test, y_pred_random_forest)
print(f"RandomForest accuracy: {accuracy_random_forest:.2f}")

Fitting 5 folds for each of 108 candidates, totalling 540 fits
Best parameters for Decision Tree:
{'criterion': 'gini', 'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 2}
Fitting 5 folds for each of 324 candidates, totalling 1620 fits
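
The recorded output stops before the RandomForest results, so the comparison the task asks for is not visible here. As a hedged sketch of how the two searches could be summarized side by side, the block below collects the best parameters, the mean cross-validated accuracy (`best_score_`), and the held-out accuracy for each model. The helper `summarize_search` is a hypothetical name, the grids are deliberately small, and the data is a synthetic stand-in from `make_classification` (an assumption so the example runs without `mushrooms.csv`); its numbers say nothing about the mushroom dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier


def summarize_search(name, search, X_test, y_test):
    """Collect the key results of a fitted GridSearchCV into one dict (hypothetical helper)."""
    return {
        "model": name,
        "best_params": search.best_params_,
        # Mean accuracy over the cross-validation folds for the best candidate
        "cv_accuracy": search.best_score_,
        # Accuracy of the refit best estimator on the held-out split
        "test_accuracy": search.score(X_test, y_test),
    }


# Synthetic stand-in data (assumption): replace with the encoded mushroom features
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Small illustrative grids; the grids from the answer above could be used instead
searches = {
    "Decision Tree": GridSearchCV(
        DecisionTreeClassifier(random_state=42), {"max_depth": [None, 5, 10]}, cv=5
    ),
    "RandomForest": GridSearchCV(
        RandomForestClassifier(random_state=42), {"n_estimators": [50, 100]}, cv=5
    ),
}

results = []
for name, search in searches.items():
    search.fit(X_train, y_train)
    results.append(summarize_search(name, search, X_test, y_test))

for row in results:
    print(
        f"{row['model']}: cv={row['cv_accuracy']:.4f}, "
        f"test={row['test_accuracy']:.4f}, params={row['best_params']}"
    )
```

Reporting the cross-validated score next to the test accuracy, and printing four decimals instead of two, makes small differences between the models easier to see.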


2. **Task 2**

In [2]:
# Step 1: Import the required libraries
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

# Step 2: Load the dataset
data = pd.read_csv("mushrooms.csv")
# Expected columns, kept for reference only (the CSV header already supplies them)
column_names = ["class", "cap-shape", "cap-surface", "cap-color", "bruises", "odor", "gill-attachment", "gill-spacing", "gill-size", "gill-color", "stalk-shape", "stalk-root", "stalk-surface-above-ring", "stalk-surface-below-ring", "stalk-color-above-ring", "stalk-color-below-ring", "veil-type", "veil-color", "ring-number", "ring-type", "spore-print-color", "population", "habitat"]

# Step 3: Preprocess the data
# Every column is categorical; the encoder is refit per column to map its categories to integer codes
label_encoder = LabelEncoder()
for col in data.columns:
    data[col] = label_encoder.fit_transform(data[col])

X = data.drop("class", axis=1)
y = data["class"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: Build the Decision Tree and AdaBoost models
decision_tree = DecisionTreeClassifier(random_state=42)
# Note: on scikit-learn < 1.2 this parameter is named base_estimator instead of estimator
adaboost = AdaBoostClassifier(estimator=decision_tree, random_state=42)

# Step 5: Tune the AdaBoost hyperparameters
param_grid = {
    "estimator__max_depth": [1, 2, 3, 4],
    "n_estimators": [50, 100, 200]
}

grid_search = GridSearchCV(adaboost, param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_adaboost = grid_search.best_estimator_

# Step 6: Train the Decision Tree (default hyperparameters)
decision_tree.fit(X_train, y_train)

# Step 7: Measure model performance
y_pred_decision_tree = decision_tree.predict(X_test)
y_pred_adaboost = best_adaboost.predict(X_test)

accuracy_decision_tree = accuracy_score(y_test, y_pred_decision_tree)
accuracy_adaboost = accuracy_score(y_test, y_pred_adaboost)

# Step 8: Compare the results
print(f"Decision Tree accuracy: {accuracy_decision_tree:.2f}")
print(f"AdaBoost accuracy: {accuracy_adaboost:.2f}")



Decision Tree accuracy: 1.00
AdaBoost accuracy: 1.00
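
In the answer above only AdaBoost goes through a grid search, while the standalone Decision Tree is trained with its default settings. A minimal sketch of tuning both models under the same cross-validation scheme follows. It also handles a naming detail: AdaBoost's wrapped-estimator parameter is called `estimator` from scikit-learn 1.2 onward and `base_estimator` in older releases, so the sketch checks which one the installed version exposes. The variable `param_name`, the small grids, and the synthetic `make_classification` data are assumptions for illustration, not part of the original answer.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): replace with the encoded mushroom features
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Pick the parameter name that the installed scikit-learn version exposes
param_name = (
    "estimator" if "estimator" in AdaBoostClassifier().get_params() else "base_estimator"
)

# Tune the standalone Decision Tree so that the baseline is tuned as well
tree_grid = {"max_depth": [None, 3, 5], "min_samples_leaf": [1, 2, 4]}
tree_search = GridSearchCV(DecisionTreeClassifier(random_state=42), tree_grid, cv=5)
tree_search.fit(X_train, y_train)

# Tune AdaBoost, including the depth of the tree it boosts
adaboost = AdaBoostClassifier(
    random_state=42, **{param_name: DecisionTreeClassifier(random_state=42)}
)
ada_grid = {f"{param_name}__max_depth": [1, 2], "n_estimators": [25, 50]}
ada_search = GridSearchCV(adaboost, ada_grid, cv=5)
ada_search.fit(X_train, y_train)

for name, search in [("Decision Tree", tree_search), ("AdaBoost", ada_search)]:
    print(
        f"{name}: cv={search.best_score_:.4f}, "
        f"test={search.score(X_test, y_test):.4f}, params={search.best_params_}"
    )
```

When both models reach the same rounded test accuracy, as in the recorded output, the mean cross-validated score is a second number to compare them on.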


3. **Task 3**

In [4]:
# Import the required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

# Load the diabetes dataset (make sure the file is available)
# Replace "diabetes.csv" with the name of your own dataset file if it differs
diabetes_data = pd.read_csv("diabetes.csv")

# Separate the features (X) and the target (y)
X = diabetes_data.drop(columns=["Outcome"])
y = diabetes_data["Outcome"]

# Split the dataset into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the models to be combined
logistic_regression = LogisticRegression(random_state=42)
svm_poly = SVC(kernel='poly', degree=3, random_state=42)
decision_tree = DecisionTreeClassifier(random_state=42)

# Build the voting ensemble classifier
ensemble_classifier = VotingClassifier(estimators=[('lr', logistic_regression),
                                                   ('svm', svm_poly),
                                                   ('dt', decision_tree)],
                                       voting='hard')  # hard voting: majority vote over the predicted class labels

# Train the ensemble classifier on the training data
ensemble_classifier.fit(X_train, y_train)

# Predict with the ensemble classifier
y_pred = ensemble_classifier.predict(X_test)

# Evaluate the performance of the ensemble classifier
accuracy = accuracy_score(y_test, y_pred)
print("Ensemble Classifier accuracy:", accuracy)


Ensemble Classifier accuracy: 0.7727272727272727


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
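
The warning above comes from the Logistic Regression solver hitting its iteration limit, and it suggests either raising `max_iter` or scaling the data. One possible way to act on that, sketched here under assumptions, is to wrap the scale-sensitive models (Logistic Regression and the polynomial SVM) in a pipeline with `StandardScaler` and then tune the ensemble with `GridSearchCV`, which the task allows as optional exploration. The grid values, `max_iter=1000`, and the synthetic eight-feature data from `make_classification` are illustrative choices, not results or settings from the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): replace with the diabetes features and Outcome column
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale inside each pipeline so the scaler is fit on the training folds only
scaled_lr = make_pipeline(
    StandardScaler(), LogisticRegression(max_iter=1000, random_state=42)
)
scaled_svm = make_pipeline(
    StandardScaler(), SVC(kernel="poly", degree=3, random_state=42)
)
tree = DecisionTreeClassifier(random_state=42)  # trees do not need scaling

voting = VotingClassifier(
    estimators=[("lr", scaled_lr), ("svm", scaled_svm), ("dt", tree)],
    voting="hard",
)

# Nested names follow <ensemble member>__<pipeline step>__<parameter>;
# make_pipeline names each step after its lowercased class name
param_grid = {
    "lr__logisticregression__C": [0.1, 1.0, 10.0],
    "svm__svc__C": [0.1, 1.0],
    "dt__max_depth": [3, 5, None],
}

search = GridSearchCV(voting, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Cross-validated accuracy: {search.best_score_:.4f}")
print(f"Test accuracy: {test_accuracy:.4f}")
```

Keeping the scaler inside the pipeline, rather than scaling `X` once up front, avoids leaking test-fold statistics into the cross-validation scores.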
