# Student ID (NIM): 2209106045
# Name: Dustin Hessel Kopalit

# Mushroom Classification with Machine Learning
Dataset: [Mushroom Classification (Kaggle)](https://www.kaggle.com/datasets/uciml/mushroom-classification)


## 1. Dataset Description
This dataset contains features of various mushroom species, each classified as edible or poisonous.
All features are categorical. The classification target is the `class` column, which takes the values:
- `e` = edible
- `p` = poisonous


In [1]:

import pandas as pd


# Local path to the Kaggle CSV file; adjust it to your own environment
data = "D:/SEMESTER 6/VISIKOM/New folder (2)/mushrooms.csv"
df = pd.read_csv(data)
df.head()


Unnamed: 0,class,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,...,stalk-surface-below-ring,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat
0,p,x,s,n,t,p,f,c,n,k,...,s,w,w,p,w,o,p,k,s,u
1,e,x,s,y,t,a,f,c,b,k,...,s,w,w,p,w,o,p,n,n,g
2,e,b,s,w,t,l,f,c,b,n,...,s,w,w,p,w,o,p,n,n,m
3,p,x,y,w,t,p,f,c,n,n,...,s,w,w,p,w,o,p,k,s,u
4,e,x,s,g,f,n,f,w,b,k,...,s,w,w,p,w,o,e,n,a,g
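Before encoding, it can help to confirm the class balance and the number of distinct categories per column. The following is a minimal sketch on a hypothetical three-column stand-in (`sample`) rather than the real file; the same calls would apply to `df`.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of mushrooms.csv (three columns only).
sample = pd.DataFrame({
    "class": ["p", "e", "e", "p", "e"],
    "cap-shape": ["x", "x", "b", "x", "x"],
    "odor": ["p", "a", "l", "p", "n"],
})

# Number of rows per target class.
class_counts = sample["class"].value_counts()

# Number of distinct categories in each column.
unique_per_column = sample.nunique()

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

print(class_counts)
print(unique_per_column)
print(missing_per_column)
```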



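## 2. Preprocessing and Feature Extraction
The only preprocessing step is encoding: every categorical column is converted to integer codes with Label Encoding.

Why encoding is needed:

The scikit-learn implementations of models such as Decision Tree, Naive Bayes, SVM, KNN, and Random Forest expect numeric input. They cannot compute distances, probabilities, or information gain from text values such as 'x' or 'p'. Label Encoding assigns the integers in alphabetical order of the categories, so the resulting numeric order carries no meaning of its own.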
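To make the encoding step concrete, here is a small sketch of what `LabelEncoder` does to a single column. The `values` list is a hypothetical example, not data read from the file.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical target values, as they appear in the `class` column.
values = ["p", "e", "e", "p"]

encoder = LabelEncoder()
codes = encoder.fit_transform(values)

# classes_ is sorted, so the position of each label is its integer code.
mapping = {label: code for code, label in enumerate(encoder.classes_)}

# inverse_transform recovers the original labels from the codes.
decoded = encoder.inverse_transform(codes)

print(list(codes))
print(mapping)
print(list(decoded))
```

Because the categories are sorted alphabetically, `e` receives code 0 and `p` receives code 1.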

In [2]:

from sklearn.preprocessing import LabelEncoder

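# fit_transform refits the encoder on every column,
# so `le` only retains the mapping of the last column it processed
le = LabelEncoder()
df_encoded = df.apply(le.fit_transform)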
df_encoded.head()


Unnamed: 0,class,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,...,stalk-surface-below-ring,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat
0,1,5,2,4,1,6,1,0,1,4,...,2,7,7,0,2,1,4,2,3,5
1,0,5,2,9,1,0,1,0,0,4,...,2,7,7,0,2,1,4,3,2,1
2,0,0,2,8,1,3,1,0,0,5,...,2,7,7,0,2,1,4,3,2,3
3,1,5,3,8,1,6,1,0,1,5,...,2,7,7,0,2,1,4,2,3,5
4,0,5,2,3,0,5,1,1,0,4,...,2,7,7,0,2,1,0,3,0,1
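If the integer codes need to be translated back into the original letters later, one option is to keep a separate encoder per column. This is a sketch under that assumption; the `toy` frame and the `encoders` dictionary are hypothetical names, not part of the notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical two-column stand-in for the mushroom data.
toy = pd.DataFrame({"class": ["p", "e", "e"], "odor": ["p", "a", "l"]})

encoders = {}                       # one fitted encoder per column
encoded = pd.DataFrame(index=toy.index)

for column in toy.columns:
    enc = LabelEncoder()
    encoded[column] = enc.fit_transform(toy[column])
    encoders[column] = enc

# Each column can now be decoded with its own encoder.
restored_odor = encoders["odor"].inverse_transform(encoded["odor"])

print(encoded)
print(list(restored_odor))
```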



### Splitting the Data
The data is split into 80% training data and 20% test data.


In [3]:

from sklearn.model_selection import train_test_split

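# Features: every column except the target
X = df_encoded.drop("class", axis=1)
# Target: 0 = edible, 1 = poisonous
y = df_encoded["class"]

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)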
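The split above is purely random. A possible variation, shown here only as a sketch on synthetic arrays (`X_toy`, `y_toy` are hypothetical), is to pass `stratify` so that the training and test sets keep the same class proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 60 of class 0 and 40 of class 1.
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([0] * 60 + [1] * 40)

# stratify=y_toy preserves the 60/40 class ratio in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print("class 1 in train:", int(y_tr.sum()), "of", len(y_tr))
print("class 1 in test:", int(y_te.sum()), "of", len(y_te))
```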



## 3. Machine Learning Model Implementation
I use several models for classification:
- Decision Tree
- Random Forest
- Naïve Bayes
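The notebook uses `GaussianNB`, which models each feature as a continuous Gaussian variable. Since every feature here is categorical, scikit-learn's `CategoricalNB` may be a closer fit. The sketch below only illustrates that alternative on synthetic integer codes (`X_toy`, `y_toy` are hypothetical) and is not the method used in this notebook.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Synthetic label-encoded data: 4 features, each with codes 0, 1, or 2.
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 3, size=(200, 4))
# Assumed rule for illustration: the class depends only on the first feature.
y_toy = (X_toy[:, 0] == 2).astype(int)

# min_categories tells the model how many codes each feature can take,
# so a code that is absent from the training rows does not break predict().
model = CategoricalNB(min_categories=3)
model.fit(X_toy[:150], y_toy[:150])

predictions = model.predict(X_toy[150:])
accuracy = model.score(X_toy[150:], y_toy[150:])
print("CategoricalNB accuracy on the toy data:", accuracy)
```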


In [4]:

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report, accuracy_score

# Decision Tree
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)
dt_pred = dt.predict(X_test)

# Random Forest
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)

# Naive Bayes
nb = GaussianNB()
nb.fit(X_train, y_train)
nb_pred = nb.predict(X_test)

print("Decision Tree Accuracy:", accuracy_score(y_test, dt_pred))
print("Random Forest Accuracy:", accuracy_score(y_test, rf_pred))
print("Naive Bayes Accuracy:", accuracy_score(y_test, nb_pred))


Decision Tree Accuracy: 1.0
Random Forest Accuracy: 1.0
Naive Bayes Accuracy: 0.9218461538461539
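An accuracy of 1.0 on a single 80/20 split is worth double-checking. One common way to do that is k-fold cross-validation, sketched here on synthetic data (`X_toy`, `y_toy` are hypothetical); on the real data the same call would take `X` and `y`.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic label-encoded data: 5 features with codes 0..3.
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 4, size=(300, 5))
# Assumed rule for illustration: the class depends only on the first feature.
y_toy = (X_toy[:, 0] >= 2).astype(int)

# Train and evaluate the model on 5 different train/test partitions.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X_toy, y_toy, cv=5)

print("Fold accuracies:", scores)
print("Mean:", scores.mean(), "Std:", scores.std())
```

A mean close to the single-split score with a small standard deviation would suggest the result does not depend on one lucky split.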



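## 4. Evaluation and Sample Classification Results
Below are the model evaluation results, using accuracy and the classification report as metrics. In the reports, class `0` is edible (`e`) and class `1` is poisonous (`p`), following the alphabetical order used by `LabelEncoder`.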


In [5]:

print("Decision Tree Classification Report:\n", classification_report(y_test, dt_pred))
print("Random Forest Classification Report:\n", classification_report(y_test, rf_pred))
print("Naive Bayes Classification Report:\n", classification_report(y_test, nb_pred))


Decision Tree Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       843
           1       1.00      1.00      1.00       782

    accuracy                           1.00      1625
   macro avg       1.00      1.00      1.00      1625
weighted avg       1.00      1.00      1.00      1625

Random Forest Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       843
           1       1.00      1.00      1.00       782

    accuracy                           1.00      1625
   macro avg       1.00      1.00      1.00      1625
weighted avg       1.00      1.00      1.00      1625

Naive Bayes Classification Report:
               precision    recall  f1-score   support

           0       0.93      0.91      0.92       843
           1       0.91      0.93      0.92       782

    accuracy                           0.92      1625
   macro avg    
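For this task a poisonous mushroom predicted as edible is the costliest error, so a confusion matrix can complement the report by showing the error types separately. This is a sketch on hypothetical labels (`y_true`, `y_pred`), with 0 = edible and 1 = poisonous as in the encoded data.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels (0 = edible, 1 = poisonous).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes, in the order [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(cm)
# fn counts poisonous mushrooms that were predicted as edible.
print("Poisonous predicted as edible:", fn)
```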

In [6]:

for i in range(5):
    print(f"Sample {i+1}: Prediction = {rf_pred[i]}, True Label = {y_test.iloc[i]}")


Sample 1: Prediction = 0, True Label = 0
Sample 2: Prediction = 1, True Label = 1
Sample 3: Prediction = 1, True Label = 1
Sample 4: Prediction = 0, True Label = 0
Sample 5: Prediction = 1, True Label = 1
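The integer predictions can be made easier to read by mapping them back to class names. A minimal sketch, assuming the encoding 0 = edible and 1 = poisonous seen in the encoded table; the `predictions` array is a hypothetical copy of the five values printed above.

```python
import numpy as np

# Index 0 -> "edible", index 1 -> "poisonous" (assumed from the label encoding).
label_names = np.array(["edible", "poisonous"])

# Hypothetical copy of the first five predictions.
predictions = np.array([0, 1, 1, 0, 1])

# Integer-array indexing looks up the name for each predicted code.
readable = label_names[predictions]

for i, name in enumerate(readable, start=1):
    print(f"Sample {i}: {name}")
```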
