# Mushroom Toxicity Classification Notebook

In this notebook, I'll be developing a model that, given certain structured features, can classify whether or not a mushroom is toxic.

I'll be using Scikit-Learn, Matplotlib, NumPy, and Pandas for the most part.

As with other projects, I'll be using the following approach:

1. Problem Definition
2. Exploratory Data Analysis
3. Basic Modelling
4. Evaluation
5. Experimentation
6. Final Conclusions & Model Exportation

## Problem Definition

In a single statement of need:

> Given physical features of a mushroom, can we predict if that mushroom is toxic or not?

The data for this project is the [Mushroom Classification](https://www.kaggle.com/datasets/uciml/mushroom-classification) dataset from Kaggle. It originally comes from the UCI Machine Learning Repository, to which it was donated on April 27, 1987.

Normally, I'd set a success criterion, such as "I will pursue this project if the proof of concept reaches an accuracy of about 90%", but I'm planning to pursue this project either way, so that's unnecessary. However:

> This model must reach an accuracy of at least 50%.

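This threshold comes from a uniform chance model that guesses *poisonous* or *edible* with equal probability, ignoring the features. Because the guess is independent of the true label, its expected accuracy is the probability of guessing each class times that class's share of the data (roughly 48% poisonous and 52% edible):

$$ P(\text{correct}) = P(\hat{y} = p)\,P(y = p) + P(\hat{y} = e)\,P(y = e) = 0.5 \times 0.48 + 0.5 \times 0.52 = 0.5 $$

So the math checks out: a uniform guesser scores 50% in expectation regardless of how the classes are split.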
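The formula above can be sketched as a small helper. This is a minimal illustration, assuming the approximate 48%/52% class shares quoted above; the function name `expected_accuracy` is hypothetical and not part of any library. It also shows a second feature-free baseline, always predicting the majority class (edible), for comparison.

```python
import numpy as np

def expected_accuracy(pred_probs, class_probs):
    """Expected accuracy of a guesser that ignores the features.

    pred_probs[i]  -- probability the guesser predicts class i
    class_probs[i] -- share of class i in the data
    Because the guess is independent of the true label,
    P(correct) = sum_i P(pred = i) * P(true = i).
    """
    return float(np.dot(np.asarray(pred_probs, dtype=float),
                        np.asarray(class_probs, dtype=float)))

# Class order: [poisonous, edible]; shares are the approximate ones above
class_probs = [0.48, 0.52]

uniform = expected_accuracy([0.5, 0.5], class_probs)   # fair coin flip
majority = expected_accuracy([0.0, 1.0], class_probs)  # always "edible"
print(round(uniform, 2), round(majority, 2))           # 0.5 0.52
```

Once the data is loaded, scikit-learn's `DummyClassifier` with `strategy="uniform"` or `strategy="most_frequent"` should give these same baselines empirically.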

### Features

Listed here so I don't have to jump back and forth between the website and this notebook.


**Attribute Information**

| Feature | Values (name=code) |
|---|---|
| cap-shape | bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s |
| cap-surface | fibrous=f, grooves=g, scaly=y, smooth=s |
| cap-color | brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y |
| bruises | bruises=t, no=f |
| odor | almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s |
| gill-attachment | attached=a, descending=d, free=f, notched=n |
| gill-spacing | close=c, crowded=w, distant=d |
| gill-size | broad=b, narrow=n |
| gill-color | black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y |
| stalk-shape | enlarging=e, tapering=t |
| stalk-root | bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=? |
| stalk-surface-above-ring | fibrous=f, scaly=y, silky=k, smooth=s |
| stalk-surface-below-ring | fibrous=f, scaly=y, silky=k, smooth=s |
| stalk-color-above-ring | brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y |
| stalk-color-below-ring | brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y |
| veil-type | partial=p, universal=u |
| veil-color | brown=n, orange=o, white=w, yellow=y |
| ring-number | none=n, one=o, two=t |
| ring-type | cobwebby=c, evanescent=e, flaring=f, large=l, none=n, pendant=p, sheathing=s, zone=z |
| spore-print-color | black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y |
| population | abundant=a, clustered=c, numerous=n, scattered=s, several=v, solitary=y |
| habitat | grasses=g, leaves=l, meadows=m, paths=p, urban=u, waste=w, woods=d |

**classes**

* edible=e
* poisonous=p
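Since every value in the data is a single-letter code, a codebook like the one above can be turned into a lookup for readable EDA plots. This is a minimal pandas sketch, assuming the CSV columns use the feature names from the table; only a few columns are mapped (the rest follow the same pattern), and the `CODEBOOK` and `decode` names and the sample rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Partial codebook copied from the attribute table above (hypothetical
# name; the remaining features follow the same pattern).
CODEBOOK = {
    "class": {"e": "edible", "p": "poisonous"},
    "odor": {"a": "almond", "l": "anise", "c": "creosote", "y": "fishy",
             "f": "foul", "m": "musty", "n": "none", "p": "pungent",
             "s": "spicy"},
    "stalk-root": {"b": "bulbous", "c": "club", "u": "cup", "e": "equal",
                   "z": "rhizomorphs", "r": "rooted"},
    "habitat": {"g": "grasses", "l": "leaves", "m": "meadows", "p": "paths",
                "u": "urban", "w": "waste", "d": "woods"},
}

def decode(df: pd.DataFrame, codebook: dict) -> pd.DataFrame:
    """Return a copy of df with single-letter codes replaced by names.

    '?' (stalk-root's "missing" code) becomes NaN so pandas treats it
    as a missing value rather than as a category of its own.
    """
    out = df.replace("?", np.nan)
    for col, mapping in codebook.items():
        if col in out.columns:
            # .map keeps NaN as NaN; codes absent from the mapping also become NaN
            out[col] = out[col].map(mapping)
    return out

# Made-up rows, for illustration only (not taken from the dataset)
sample = pd.DataFrame({
    "class": ["p", "e", "e"],
    "odor": ["f", "a", "n"],
    "stalk-root": ["?", "c", "b"],
    "habitat": ["u", "g", "d"],
})
print(decode(sample, CODEBOOK))
```

When loading the real file, `pd.read_csv(path, na_values="?")` would handle the missing-value code at load time instead.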


### Exploring the Problem Domain

Mushrooms are divided (by class) into either *edible* or *poisonous*. Mushrooms of unknown edibility, as well as those not recommended for eating (even if not poisonous), have been grouped into the poisonous class.
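Because the poisonous class absorbs everything that is not definitely edible, one natural choice (an assumption here, not something the dataset prescribes) is to encode *poisonous* as the positive class. A minimal sketch, with a hypothetical `encode_target` helper and made-up labels, that also checks the class balance:

```python
import pandas as pd

def encode_target(labels: pd.Series) -> pd.Series:
    """Map 'p' (poisonous, incl. unknown/not recommended) to 1 and 'e' to 0.

    Raises if any label is not 'e' or 'p', so unexpected values surface
    early instead of silently becoming NaN.
    """
    y = labels.map({"e": 0, "p": 1})
    if y.isna().any():
        bad = sorted(labels[y.isna()].unique())
        raise ValueError(f"unexpected class labels: {bad}")
    return y.astype(int)

# Made-up labels, for illustration only
labels = pd.Series(["e", "p", "e", "p", "e"])
y = encode_target(labels)
print(y.tolist())                                   # [0, 1, 0, 1, 0]
print(y.value_counts(normalize=True).sort_index())  # share of each class
```

On the real data, the normalized counts should roughly match the 48%/52% split used for the baseline.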

#### By Features

**Habitat**

Self-explanatory: the kind of environment the mushroom was found growing in.

**Population**

How many mushrooms of this type are present in the immediate area?

**Ring Number/Type**

Refers to the annulus of the mushroom: the ring- or collar-like structure that some mushrooms have on their stem (stipe). The ring is what remains of the partial veil after it ruptures to expose the gills so the spores can be released. Number refers to how many rings there are, and type to what they look like.

**Spore Print Colour**

A spore print is made by leaving a mushroom cap gills-down on a light (or dark) piece of paper; the spores that drop out leave a print that shows their colour. That colour is what this feature records.

**Veil Type/Colour**

A veil is a thin membrane that protects an immature mushroom. A universal veil encloses the whole young mushroom, while a partial veil stretches from the stalk to the edge of the cap, covering the gills; it ruptures as the fruiting body matures. Type refers to whether the veil is partial or universal. Colour is self-explanatory.

**Stalk**



