Atalov S. (TSI AUCA)

Introduction to Machine Learning

---

# Decision Trees

---

<h2>1. Problem</h2>

Can you use the data to help you identify which mushrooms can be eaten safely?

- Since not all mushrooms are edible, you'd like to be able to tell whether a given mushroom is edible or poisonous based on its physical attributes.
- You have some existing data that you can use for this task.

<b>Note: The dataset used is for illustrative purposes only. It is not meant to be a guide on identifying edible mushrooms.</b>

<div>
    <img src = 'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTCoVNJkh4Bi4k4cyl1uJrS_lN3m8_XEhyi-Q&usqp=CAU' width = '300'>
</div>

<h2>2. Dataset</h2>

You will start by loading the dataset for this task. The dataset you have collected is as follows:

| Cap Color | Stalk Shape | Solitary | Edible |
|:---------:|:-----------:|:--------:|:------:|
|   Brown   |   Tapering  |    Yes   |    1   |
|   Brown   |  Enlarging  |    Yes   |    1   |
|   Brown   |  Enlarging  |    No    |    0   |
|   Brown   |  Enlarging  |    No    |    0   |
|   Brown   |   Tapering  |    Yes   |    1   |
|    Red    |   Tapering  |    Yes   |    0   |
|    Red    |  Enlarging  |    No    |    0   |
|   Brown   |  Enlarging  |    Yes   |    1   |
|    Red    |   Tapering  |    No    |    1   |
|   Brown   |  Enlarging  |    No    |    0   |


-  You have 10 examples of mushrooms. For each example, you have
    - Three features
        - Cap Color (`Brown` or `Red`),
        - Stalk Shape (`Tapering (as in \/)` or `Enlarging (as in /\)`), and
        - Solitary (`Yes` or `No`)
    - Label
        - Edible (`1` indicating yes or `0` indicating poisonous)

For ease of implementation, the features have been one-hot encoded, that is, turned into 0/1-valued features. Because each feature takes only two values, a single binary column per feature is enough:

| Brown Cap | Tapering Stalk Shape | Solitary | Edible |
|:---------:|:--------------------:|:--------:|:------:|
|     1     |           1          |     1    |    1   |
|     1     |           0          |     1    |    1   |
|     1     |           0          |     0    |    0   |
|     1     |           0          |     0    |    0   |
|     1     |           1          |     1    |    1   |
|     0     |           1          |     1    |    0   |
|     0     |           0          |     0    |    0   |
|     1     |           0          |     1    |    1   |
|     0     |           1          |     0    |    1   |
|     1     |           0          |     0    |    0   |


Therefore,
- `X_train` contains three features for each example 
    - Brown Cap (`1` indicates a brown cap, `0` a red cap)
    - Tapering Stalk Shape (`1` indicates a tapering stalk, `0` an enlarging stalk)
    - Solitary (`1` indicates "Yes", `0` indicates "No")

- `y_train` is whether the mushroom is edible 
    - `y = 1` indicates edible
    - `y = 0` indicates poisonous
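
The encoding described above can be reproduced from the raw table with a few lines of Python. This is only a minimal sketch: the names `raw_rows`, `encode_row`, `X_encoded` and `y_encoded` are hypothetical and not part of the original notebook, which simply hard-codes the encoded arrays.

```python
import numpy as np

# Raw table rows: (cap color, stalk shape, solitary, edible)
raw_rows = [
    ("Brown", "Tapering", "Yes", 1),
    ("Brown", "Enlarging", "Yes", 1),
    ("Brown", "Enlarging", "No", 0),
    ("Brown", "Enlarging", "No", 0),
    ("Brown", "Tapering", "Yes", 1),
    ("Red", "Tapering", "Yes", 0),
    ("Red", "Enlarging", "No", 0),
    ("Brown", "Enlarging", "Yes", 1),
    ("Red", "Tapering", "No", 1),
    ("Brown", "Enlarging", "No", 0),
]


def encode_row(cap_color, stalk_shape, solitary):
    """Map one raw row to [Brown Cap, Tapering Stalk Shape, Solitary] as 0/1."""
    return [
        int(cap_color == "Brown"),
        int(stalk_shape == "Tapering"),
        int(solitary == "Yes"),
    ]


# Features: one 0/1 column per original attribute
X_encoded = np.array([encode_row(c, s, sol) for c, s, sol, _ in raw_rows])
# Labels: 1 = edible, 0 = poisonous
y_encoded = np.array([label for *_, label in raw_rows])

print(X_encoded)
print(y_encoded)
```

The resulting arrays match the encoded table row by row.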

In [None]:
import numpy as np

In [None]:
X_train = np.array(
    [
        [1, 1, 1],
        [1, 0, 1],
        [1, 0, 0],
        [1, 0, 0],
        [1, 1, 1],
        [0, 1, 1],
        [0, 0, 0],
        [1, 0, 1],
        [0, 1, 0],
        [1, 0, 0],
    ]
)
y_train = np.array([1, 1, 0, 0, 1, 0, 0, 1, 1, 0])

In [None]:
print(X_train)

<h2> 3. Decision Tree </h2>

In [None]:
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt

See the [scikit-learn `DecisionTreeClassifier` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) for the available parameters.

In [None]:
model = DecisionTreeClassifier(criterion="entropy")
model.fit(X_train, y_train)
model.score(X_train, y_train)
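
With `criterion="entropy"`, candidate splits are compared by how much they reduce the entropy of the labels (the information gain). The sketch below recomputes that quantity by hand for the root node of this dataset. It is an illustration under simple assumptions (binary features, binary labels), not scikit-learn's internal implementation, and the helper names `entropy` and `information_gain` are hypothetical.

```python
import numpy as np

X = np.array(
    [
        [1, 1, 1], [1, 0, 1], [1, 0, 0], [1, 0, 0], [1, 1, 1],
        [0, 1, 1], [0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 0, 0],
    ]
)
y = np.array([1, 1, 0, 0, 1, 0, 0, 1, 1, 0])


def entropy(labels):
    """Shannon entropy in bits of a 0/1 label array (0 for an empty or pure node)."""
    if len(labels) == 0:
        return 0.0
    p = labels.mean()  # fraction of edible examples
    if p == 0 or p == 1:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))


def information_gain(X, y, feature):
    """Entropy reduction obtained by splitting on one binary feature."""
    left = y[X[:, feature] == 1]
    right = y[X[:, feature] == 0]
    w_left = len(left) / len(y)
    w_right = len(right) / len(y)
    return entropy(y) - (w_left * entropy(left) + w_right * entropy(right))


feature_names = ["Brown Cap", "Tapering Stalk Shape", "Solitary"]
gains = {name: information_gain(X, y, i) for i, name in enumerate(feature_names)}
for name, gain in gains.items():
    print(f"{name}: {gain:.4f}")
```

On this dataset the largest gain at the root belongs to `Solitary` (about 0.28), followed by `Tapering Stalk Shape` (about 0.12) and `Brown Cap` (about 0.03), so a tree grown with this criterion would be expected to split on `Solitary` first.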

In [None]:
from sklearn.tree import plot_tree

In [None]:
plot_tree(model);

In [None]:
fig = plt.figure(figsize=(15, 10))
_ = plot_tree(
    model,
    feature_names=["Brown Cap", "Tapering Stalk Shape", "Solitary"],
    class_names=["Not edible", "Edible"],  # ascending class order: 0 (poisonous), then 1 (edible)
    filled=True,
)
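
As a complement to the plot, the fitted tree can also be printed as plain text with `sklearn.tree.export_text`, which is handy when no figure can be displayed. The sketch below refits the model so that it runs on its own; `random_state=0` is an added assumption for reproducibility and is not in the original cell.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

X_train = np.array(
    [
        [1, 1, 1], [1, 0, 1], [1, 0, 0], [1, 0, 0], [1, 1, 1],
        [0, 1, 1], [0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 0, 0],
    ]
)
y_train = np.array([1, 1, 0, 0, 1, 0, 0, 1, 1, 0])

# random_state is an assumption added here to make the run reproducible
model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(X_train, y_train)

feature_names = ["Brown Cap", "Tapering Stalk Shape", "Solitary"]
tree_text = export_text(model, feature_names=feature_names)
print(tree_text)

# Class labels in the order the model uses them: 0 = poisonous, 1 = edible
print(model.classes_)
```

Each indented line in the printed output is one test on a feature (for example `Solitary <= 0.50`), and the leaves report the predicted class.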

<h2> 4. Making predictions </h2>

Make a prediction:
- Cap Color: Brown
- Stalk: Shape 
- Solitary: Yes
- Edible: ?
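
One way to answer this is to pass the encoded example to `model.predict`. The stalk shape value is not given in the query above, so the sketch below tries both possibilities rather than guessing one. The model is refit so the block runs on its own; `random_state=0` and the variable name `candidates` are assumptions added for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X_train = np.array(
    [
        [1, 1, 1], [1, 0, 1], [1, 0, 0], [1, 0, 0], [1, 1, 1],
        [0, 1, 1], [0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 0, 0],
    ]
)
y_train = np.array([1, 1, 0, 0, 1, 0, 0, 1, 1, 0])

# random_state is an assumption added here to make the run reproducible
model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(X_train, y_train)

# Feature order: [Brown Cap, Tapering Stalk Shape, Solitary]
# Brown cap, solitary, with both possible stalk shapes
candidates = np.array(
    [
        [1, 1, 1],  # tapering stalk
        [1, 0, 1],  # enlarging stalk
    ]
)
predictions = model.predict(candidates)

for row, pred in zip(candidates, predictions):
    stalk = "Tapering" if row[1] == 1 else "Enlarging"
    label = "edible" if pred == 1 else "poisonous"
    print(f"Brown cap, {stalk} stalk, solitary -> {label}")
```

With this training set, both candidates are predicted as edible (`1`): every brown, solitary mushroom in the data is labeled edible, so the stalk shape does not change the outcome here. As noted at the start, this is an illustration of the model, not guidance on real mushrooms.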