
This project builds a decision tree classifier that predicts whether a mushroom is edible or poisonous from its categorical features, such as cap shape, odor, and gill color. A decision tree was chosen because it is intuitive, easy to visualize, and well suited to categorical data once the categories are one-hot encoded. Each prediction follows an explicit path of feature tests from the root to a leaf, which makes the classification interpretable and transparent.

In [None]:
!pip install ucimlrepo


Collecting ucimlrepo
  Downloading ucimlrepo-0.0.7-py3-none-any.whl.metadata (5.5 kB)
Downloading ucimlrepo-0.0.7-py3-none-any.whl (8.0 kB)
Installing collected packages: ucimlrepo
Successfully installed ucimlrepo-0.0.7


In [None]:
import pandas as pd
from ucimlrepo import fetch_ucirepo
import SU1_zapocet_functions as functions


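# Fetch the UCI Mushroom dataset (repository id 73)
mushroom = fetch_ucirepo(id=73)
mushroom_ = pd.DataFrame(data=mushroom.data.features)

# Map the single-letter target codes to readable class labels
mapping_targets = {'p': 'poisonous', 'e': 'eadible'}

# Preprocess the features with the project's helper module and cast them to strings
mushroom_ = functions.preprocess_mushroom_data(mushroom_).astype(str)
targets = pd.DataFrame(data=mushroom.data.targets)
targets.columns = ['eadibility']
targets['eadibility'] = targets['eadibility'].map(mapping_targets)

# Combine features and target into a single DataFrame
mushroom_df = pd.concat([mushroom_, targets], axis=1)
mushroom_df.head()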

Unnamed: 0,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,stalk-shape,...,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat,eadibility
0,convex,smooth,brown,bruises,pungent,free,close,narrow,black,enlarging,...,white,white,partial,white,one,pendant,black,scattered,urban,poisonous
1,convex,smooth,yellow,bruises,almond,free,close,broad,black,enlarging,...,white,white,partial,white,one,pendant,brown,numerous,grasses,eadible
2,bell,smooth,white,bruises,anise,free,close,broad,brown,enlarging,...,white,white,partial,white,one,pendant,brown,numerous,meadows,eadible
3,convex,scaly,white,bruises,pungent,free,close,narrow,brown,enlarging,...,white,white,partial,white,one,pendant,black,scattered,urban,poisonous
4,convex,smooth,gray,no bruises,none,free,crowded,broad,black,tapering,...,white,white,partial,white,one,evanescent,brown,abundant,grasses,eadible
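
The target column above is produced with `Series.map` and a dictionary. A minimal sketch of how that mapping behaves, using a hypothetical series of codes (the names `codes` and `label_map` and the stray code `'x'` are illustrative assumptions, not part of the dataset): any code missing from the dictionary becomes `NaN`, so counting missing values afterwards is a cheap sanity check.

```python
import pandas as pd

# Hypothetical target codes; 'x' is deliberately not in the mapping
codes = pd.Series(["p", "e", "x"])
label_map = {"p": "poisonous", "e": "edible"}

# Series.map looks each value up in the dictionary;
# values without a matching key are turned into NaN
labels = codes.map(label_map)

# Number of codes that could not be mapped
unmapped = int(labels.isna().sum())

print(labels.tolist())
print("Unmapped codes:", unmapped)
```

In the notebook itself, the same check would be applied to the `eadibility` column after mapping.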


In [None]:
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import OneHotEncoder


# 1) One-hot encoding: every category of every feature becomes a 0/1 column
feature_columns = [
    'cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor',
    'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color',
    'stalk-shape', 'stalk-root', 'stalk-surface-above-ring',
    'stalk-surface-below-ring', 'stalk-color-above-ring',
    'stalk-color-below-ring', 'veil-type', 'veil-color', 'ring-number',
    'ring-type', 'spore-print-color', 'population', 'habitat',
]
encoder = OneHotEncoder()
X = encoder.fit_transform(mushroom_df[feature_columns]).toarray()
y = mushroom_df['eadibility']
X_df = pd.DataFrame(X, columns=encoder.get_feature_names_out())

print(np.shape(X))
print(X[0])

(8124, 117)
[0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.
 0. 0. 0. 0. 0. 1. 0. 0. 1. 1. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1.
 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 1. 1.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0.]


In [None]:
#2) Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

print(np.shape(X_train))
print(np.shape(X_test))

(5686, 117)
(2438, 117)
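
The split above is purely random. A hedged variant, assuming one wants both subsets to keep the same class proportions, passes the labels to the `stratify` parameter of `train_test_split`. The sketch below uses hypothetical imbalanced labels (`labels`, `features`, and the 80/20 ratio are assumptions made for illustration).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 80 edible and 20 poisonous samples
labels = np.array(["edible"] * 80 + ["poisonous"] * 20)
features = np.arange(100).reshape(-1, 1)

# stratify=labels keeps the class ratio the same in both subsets
f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.3, random_state=0, stratify=labels
)

# Share of poisonous samples in each subset
share_train = np.mean(l_train == "poisonous")
share_test = np.mean(l_test == "poisonous")

print(len(l_train), len(l_test))
print(f"poisonous share: train={share_train:.2f}, test={share_test:.2f}")
```

Whether stratification matters for the mushroom data depends on how balanced the two classes are, which the notebook does not report.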


In [None]:
#3) Decision Tree Classifier creation
clf = DecisionTreeClassifier(max_depth=4)

#4) Training the model
clf.fit(X_train, y_train)

# Print the learned rules as indented text
print(export_text(clf, feature_names=list(X_df.columns)))


|--- odor_none <= 0.50
|   |--- stalk-root_club <= 0.50
|   |   |--- stalk-surface-below-ring_scaly <= 0.50
|   |   |   |--- odor_anise <= 0.50
|   |   |   |   |--- class: poisonous
|   |   |   |--- odor_anise >  0.50
|   |   |   |   |--- class: eadible
|   |   |--- stalk-surface-below-ring_scaly >  0.50
|   |   |   |--- class: eadible
|   |--- stalk-root_club >  0.50
|   |   |--- stalk-color-above-ring_cinnamon <= 0.50
|   |   |   |--- class: eadible
|   |   |--- stalk-color-above-ring_cinnamon >  0.50
|   |   |   |--- class: poisonous
|--- odor_none >  0.50
|   |--- spore-print-color_green <= 0.50
|   |   |--- stalk-surface-below-ring_scaly <= 0.50
|   |   |   |--- cap-surface_grooves <= 0.50
|   |   |   |   |--- class: eadible
|   |   |   |--- cap-surface_grooves >  0.50
|   |   |   |   |--- class: poisonous
|   |   |--- stalk-surface-below-ring_scaly >  0.50
|   |   |   |--- ring-number_two <= 0.50
|   |   |   |   |--- class: poisonous
|   |   |   |--- ring-number_two >  0.50
|   |
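
In the printed rules, a test such as `odor_none <= 0.50` on a one-hot column means "this category is absent" and `> 0.50` means "this category is present". A minimal sketch on hypothetical toy data (the `toy_*` names and values are assumptions for illustration) shows the same rule format together with `feature_importances_`, which summarizes how much each column contributes to the splits.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical one-hot data: odor_none separates the classes perfectly,
# cap-shape_flat does not
toy_X = pd.DataFrame({
    "odor_none":      [1, 1, 0, 0, 1, 0],
    "cap-shape_flat": [0, 1, 0, 1, 1, 0],
})
toy_y = ["edible", "edible", "poisonous", "poisonous", "edible", "poisonous"]

toy_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
toy_tree.fit(toy_X, toy_y)

# Text rendering of the learned rules
rules = export_text(toy_tree, feature_names=list(toy_X.columns))
print(rules)

# Importance of each column, normalized to sum to 1
importances = dict(zip(toy_X.columns, toy_tree.feature_importances_))
print(importances)
```

Because a single split on `odor_none` already yields pure leaves, the toy tree stops at depth 1 even though `max_depth=2` would allow more.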

In [None]:
#5) Evaluating predictions
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

df_pred = pd.DataFrame({'Predicted': y_pred, 'Actual': y_test})
print(df_pred)

print(f"Accuracy: {accuracy * 100:.2f}%")

      Predicted     Actual
380   poisonous  poisonous
3641    eadible    eadible
273     eadible    eadible
1029    eadible    eadible
684     eadible    eadible
...         ...        ...
520     eadible    eadible
36      eadible    eadible
7959  poisonous  poisonous
6520  poisonous  poisonous
6005  poisonous  poisonous

[2438 rows x 2 columns]
Accuracy: 99.14%


On the held-out test set (2,438 of the 8,124 samples), a decision tree limited to a maximum depth of 4 reaches 99.14% accuracy. A shallow, easily readable tree is therefore enough to classify mushroom edibility almost perfectly on this dataset.
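
Accuracy alone does not show which kind of mistake the classifier makes, and predicting a poisonous mushroom as edible is the costlier error. One possible complement is a confusion matrix together with the recall on the poisonous class. The sketch below uses hypothetical labels rather than the notebook's `y_test` and `y_pred` (the lists `actual` and `predicted` are illustrative assumptions).

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical true and predicted labels
actual    = ["poisonous", "poisonous", "poisonous", "edible", "edible", "edible"]
predicted = ["poisonous", "poisonous", "edible",    "edible", "edible", "edible"]

# Rows are actual classes, columns are predicted classes, in this order
class_order = ["edible", "poisonous"]
cm = confusion_matrix(actual, predicted, labels=class_order)
print(cm)

# Poisonous mushrooms that were predicted as edible
missed_poisonous = int(cm[1, 0])

# Fraction of poisonous mushrooms that were correctly identified
poison_recall = recall_score(actual, predicted, pos_label="poisonous")

print("Poisonous predicted as edible:", missed_poisonous)
print(f"Recall on poisonous: {poison_recall:.2f}")
```

In the notebook, the same calls would take `y_test` and `y_pred`, with the class names spelled as they appear in the `eadibility` column.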