# **SML Project: Binary Tree Predictors**

*   **Author:** Matteo Onger
*   **Date:** October 2024

**Dataset documentation**:
*   [Secondary Mushroom](https://archive.ics.uci.edu/dataset/848/secondary+mushroom+dataset)

## VM Setup

In [1]:
# install dataset package
!pip install ucimlrepo

# download repository
!git clone -b dev https://github.com/MatteoOnger/SML_Project.git

# set working directory
%cd /content/SML_Project/

fatal: destination path 'SML_Project' already exists and is not an empty directory.
/content/SML_Project


## Code

In [2]:
# ---- LIBRARIES ----
import logging

from ucimlrepo import fetch_ucirepo

from bintreepredictor import BinTreePredictor
from data import DataSet

In [3]:
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(name)s - %(message)s", force=True)

In [12]:
# fetch the Secondary Mushroom dataset (UCI id 848) and keep the full original table
mushroom = fetch_ucirepo(id=848)
mushroom_df = mushroom.data.original

In [5]:
# optional (disabled): drop every column that contains at least one missing value
#mushroom_df = mushroom_df.drop(columns=mushroom_df.columns[mushroom_df.isna().any()])

In [25]:
# 80/20 holdout split: sample 80% of the rows with a fixed seed, keep the rest for testing
train_df = mushroom_df.sample(frac=0.8, random_state=0)
test_df = mushroom_df.drop(train_df.index)

train_ds = DataSet(train_df, "class")
test_ds = DataSet(test_df, "class")

train_df.head()

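|  | class | cap-diameter | cap-shape | cap-surface | cap-color | does-bruise-or-bleed | gill-attachment | gill-spacing | gill-color | stem-height | ... | stem-root | stem-surface | stem-color | veil-type | veil-color | has-ring | ring-type | spore-print-color | habitat | season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 60661 | e | 6.02 | o |  | n | f | f | f | f | 5.0 | ... |  |  | n |  |  | f | f |  | d | s |
| 23699 | p | 5.1 | x |  | b | f | x |  | w | 6.32 | ... |  |  | w |  |  | f | f |  | d | a |
| 60152 | p | 8.75 | o |  | e | f | f | f | f | 3.15 | ... |  | g | n |  |  | f | f |  | d | s |
| 57970 | p | 3.34 | o | l | g | f | f | f | f | 0.0 | ... | f | f | f |  |  | f | f |  | d | a |
| 47739 | p | 4.85 | b | t | n | f |  |  | n | 10.7 | ... |  |  | w |  |  | t |  | k | g | a |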
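
The 80/20 holdout split in the cell above can be illustrated without pandas. The following is a minimal sketch on row indices only; `holdout_split` is a hypothetical helper written for illustration, not part of this repository, and it assumes row identifiers are unique (the `drop(train_df.index)` call relies on the same property).

```python
import random


def holdout_split(n_rows, train_frac=0.8, seed=0):
    """Split row indices 0..n_rows-1 into disjoint train and test lists.

    Hypothetical helper for illustration; it mirrors the idea of
    `sample(frac=...)` followed by `drop(train.index)`.
    """
    rng = random.Random(seed)          # fixed seed makes the split reproducible
    indices = list(range(n_rows))
    rng.shuffle(indices)               # random permutation of the row indices
    n_train = round(n_rows * train_frac)
    train_idx = sorted(indices[:n_train])
    test_idx = sorted(indices[n_train:])  # everything not sampled for training
    return train_idx, test_idx


train_idx, test_idx = holdout_split(10)
print(len(train_idx), len(test_idx))
```

Because the test rows are defined as "everything not in the training sample", the two sets never overlap and together cover every row.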


In [22]:
tree = BinTreePredictor("zero-one", "mode", "entropy", "max_nodes", 50, max_thresholds=5)
train_err = tree.fit(train_ds)

print(f"accuracy:{1 - train_err}")

2024-09-19 17:50:33,373 - INFO - bintreepredictor - BinTreePredictor_id:0 - split:(leaf:0, feat:ring-type, threshold:z) - info_gain:0.0301
2024-09-19 17:50:37,051 - INFO - bintreepredictor - BinTreePredictor_id:0 - split:(leaf:2, feat:stem-color, threshold:w) - info_gain:0.0293
2024-09-19 17:50:41,377 - INFO - bintreepredictor - BinTreePredictor_id:0 - split:(leaf:6, feat:gill-attachment, threshold:p) - info_gain:0.0246
2024-09-19 17:50:45,451 - INFO - bintreepredictor - BinTreePredictor_id:0 - split:(leaf:14, feat:gill-color, threshold:w) - info_gain:0.0347
2024-09-19 17:50:49,616 - INFO - bintreepredictor - BinTreePredictor_id:0 - split:(leaf:13, feat:stem-root, threshold:c) - info_gain:0.0213
2024-09-19 17:50:54,311 - INFO - bintreepredictor - BinTreePredictor_id:0 - split:(leaf:30, feat:gill-color, threshold:y) - info_gain:0.0148
2024-09-19 17:50:58,410 - INFO - bintreepredictor - BinTreePredictor_id:0 - split:(leaf:28, feat:cap-surface, threshold:e) - info_gain:0.0141
2024-09-19 1

accuracy:0.8092313990379695
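
Each log line above reports the `info_gain` of the chosen split under the `"entropy"` criterion. As a rough illustration, here is the textbook definition of entropy-based information gain for a binary split. The helper names and the equality test on a categorical value are assumptions made for this sketch; the way `BinTreePredictor` computes, weights, or normalizes the gain may differ.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(labels, mask):
    """Entropy reduction obtained by splitting `labels` with a boolean mask.

    Textbook formula: H(parent) minus the size-weighted entropies of the
    two children. Hypothetical helper, not the repository's implementation.
    """
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    n = len(labels)
    return (
        entropy(labels)
        - (len(left) / n) * entropy(left)
        - (len(right) / n) * entropy(right)
    )


# Toy data (hypothetical): the feature value "z" perfectly separates the classes.
labels = ["e", "e", "p", "p"]
feature = ["z", "z", "f", "f"]

perfect_gain = info_gain(labels, [v == "z" for v in feature])
useless_gain = info_gain(labels, [True, False, True, False])
print(perfect_gain, useless_gain)
```

A split that separates the classes perfectly recovers the full parent entropy, while a split that leaves both children as mixed as the parent yields no gain.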


In [23]:
pred, test_err = tree.predict(test_ds)

print(f"accuracy:{1 - test_err}")

accuracy:0.8070247257245784
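
Both cells print accuracy as one minus the returned error. Assuming the `"zero-one"` argument selects the standard zero-one loss (the fraction of misclassified examples), the relation can be sketched as follows; `zero_one_loss` is a hypothetical helper written for this example and the labels are toy data.

```python
def zero_one_loss(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute the loss of an empty sample")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels (hypothetical): one mistake out of four predictions.
y_true = ["e", "p", "p", "e"]
y_pred = ["e", "p", "e", "e"]

err = zero_one_loss(y_true, y_pred)
print(f"error:{err} accuracy:{1 - err}")
```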
