# Mushroom Classification
![](https://img.freepik.com/free-photo/mushrooms-black-stone-plate-with-brown-knitted-basket_23-2148320789.jpg)

A mushroom is the fleshy, spore-bearing fruiting body of a fungus, typically produced above ground on soil or on its food source. It is sometimes called the 'meat' of the vegetable world. Mushrooms are used extensively in many cuisines, notably Chinese, Korean, European, and Japanese.

In this notebook we classify mushrooms as poisonous or edible.

**Load the required libraries**

In [None]:
import pandas as pd
import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('dark')

import warnings
warnings.filterwarnings('ignore')

from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score, cross_validate
# plot_confusion_matrix was removed in scikit-learn 1.2; metrics.ConfusionMatrixDisplay is used further down instead
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn import metrics

**Let us load the data set**

In [None]:
mushroom = pd.read_csv('../input/mushroom-classification/mushrooms.csv')

## **Data Analysis On Mushroom Data Set**

**Checking the first 5 and last 5 records of the data set**

In [None]:
mushroom.head(5)

In [None]:
mushroom.tail(5)

**Let's check for duplicate rows in the data set**

In [None]:
mushroom.duplicated().sum()

In [None]:
mushroom.shape

In [None]:
mushroom.info()

In [None]:
mushroom.isnull().sum()

**So, there are 8124 records and 23 columns. There are no null values and no duplicate rows.**
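
One caveat worth checking: `isnull()` only counts real NaN values, while the attribute list for this data set says that `stalk-root` marks missing entries with `?`. The sketch below counts such placeholders. It uses a small made-up frame (`toy` and its values are illustrative assumptions, not the real data); in the notebook the same expression could be applied to `mushroom`.

```python
import pandas as pd

# Toy frame standing in for mushrooms.csv (values are made up for illustration).
toy = pd.DataFrame({
    "class": ["p", "e", "e", "p"],
    "stalk-root": ["e", "?", "b", "?"],
    "odor": ["p", "a", "l", "n"],
})

# isnull() finds nothing, because "?" is an ordinary string.
null_counts = toy.isnull().sum()

# Count the "?" placeholder per column instead.
placeholder_counts = (toy == "?").sum()
print(placeholder_counts[placeholder_counts > 0])
```

Whether to keep `?` as its own category or treat it as missing is a modelling choice; this notebook keeps it as a category.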

### Explanation of the relevant features
<img src="https://lh3.googleusercontent.com/proxy/xGMXAoX689b03YHDt2eYvoRsqrXln50OQ_GvOpSCLTizCJWLP2AQcSyb8XEgvt54Pzh3XBfbzCgT71vajM0H-Tg_wyvuZ81sPep3tQJrq3qnW9UzRKxXCqoN9AlFD5XC4n8">

### Attribute Information: 

1. **class:** 
edible = e, poisonous = p

2. **cap-shape:** 
bell = b, conical = c, convex = x, flat = f, knobbed = k, sunken = s

3. **cap-surface:**
fibrous = f, grooves = g, scaly = y, smooth = s

4. **cap-color:**
brown = n, buff = b, cinnamon = c, gray = g, green = r, pink = p, purple = u, red = e, white = w, yellow = y

5. **bruises:** 
yes = t, no = f

6. **odor:** 
almond = a, anise = l, creosote = c, fishy = y, foul = f, musty = m, none = n, pungent = p, spicy = s

7. **gill-attachment:** 
attached = a, descending = d, free = f, notched = n

8. **gill-spacing:** 
close = c, crowded = w, distant = d

9. **gill-size:** 
broad = b, narrow = n

10. **gill-color:** 
black = k, brown = n, buff = b, chocolate = h, gray = g, green = r, orange = o, pink = p, purple = u, red = e, white = w, yellow = y

11. **stalk-shape:** 
enlarging = e, tapering = t

12. **stalk-root:**
bulbous = b, club = c, cup = u, equal = e, rhizomorphs = z, rooted = r, missing = ?

13. **stalk-surface-above-ring:** 
fibrous = f, scaly = y, silky = k, smooth = s

14. **stalk-surface-below-ring:** 
fibrous = f, scaly = y, silky = k, smooth = s

15. **stalk-color-above-ring:** 
brown = n, buff = b, cinnamon = c, gray = g, orange = o, pink = p, red = e, white = w, yellow = y

16. **stalk-color-below-ring:** 
brown = n, buff = b, cinnamon = c, gray = g, orange = o, pink = p, red = e, white = w, yellow = y

17. **veil-type:** 
partial = p, universal = u

18. **veil-color:** 
brown = n, orange = o, white = w, yellow = y

19. **ring-number:** 
none = n, one = o, two = t

20. **ring-type:** 
cobwebby = c, evanescent = e, flaring = f, large = l, none = n, pendant = p, sheathing = s, zone = z

21. **spore-print-color:** 
black = k, brown = n, buff = b, chocolate = h, green = r, orange = o, purple = u, white = w, yellow = y

22. **population:** 
abundant = a, clustered = c, numerous = n, scattered = s, several = v, solitary = y

23. **habitat:** 
grasses = g, leaves = l, meadows = m, paths = p, urban = u, waste = w, woods = d

## Exploratory Data Analysis

### Class

In [None]:
mushroom['class'].value_counts().to_frame()

In [None]:
plt.figure(figsize=(10,5))
plt.title('Mushrooms Poisonous v/s Edible', fontsize=14)
sns.countplot(x="class", data=mushroom, palette=('#9b111e','#50c878'))
plt.xlabel("Mushroom Type", fontsize=12)
plt.ylabel("Count", fontsize=12)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()

#### Observations: 
1. The two classes are roughly balanced.
2. There are slightly more edible mushrooms than poisonous ones in the data set.
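
To put a number on the balance, the class shares can be computed with `value_counts(normalize=True)`. A minimal sketch on made-up labels (the `labels` series is an assumption for illustration; in the notebook it would be `mushroom['class']`):

```python
import pandas as pd

# Made-up labels for illustration only.
labels = pd.Series(["e", "p", "e", "e", "p", "e", "p", "e", "p", "e"], name="class")

# normalize=True returns fractions instead of raw counts.
shares = labels.value_counts(normalize=True)
print(shares.round(3))
```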

### Feature Analysis

* Here we look at how the data can guide a more precise feature selection in the modelling part.
* We explore one feature at a time to judge its importance in predicting the class of a mushroom.
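
The count tables and plots in the following sections can be complemented by class shares per category. A minimal sketch, assuming a hypothetical helper named `class_share` and a made-up `toy` frame (neither is part of the original notebook):

```python
import pandas as pd

def class_share(df, feature, target="class"):
    """Return the share of each target class within every category of `feature`."""
    # normalize="index" makes every row (category) sum to 1.
    return pd.crosstab(df[feature], df[target], normalize="index")

# Made-up data for illustration only.
toy = pd.DataFrame({
    "cap-shape": ["x", "x", "f", "f", "k", "k", "k", "s"],
    "class":     ["e", "p", "e", "e", "p", "p", "e", "e"],
})
share = class_share(toy, "cap-shape")
print(share.round(2))
```

In the notebook this could be called as `class_share(mushroom, 'cap-shape')`, before the label-encoding step.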

### Cap Shape

In [None]:
mushroom.groupby(['cap-shape'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['cap-shape'], ax=axarr[0], order=mushroom['cap-shape'].value_counts().index, palette="magma").set_title('Cap Shape Distribution')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Cap Shape')
b = sns.countplot(x="cap-shape", data=mushroom, hue="class", palette=('#9b111e','#50c878'), order=mushroom['cap-shape'].value_counts().index, ax=axarr[1]).set_ylabel('Count')

#### Observations: 
1. Convex(x) and flat(f) cap shapes are the most common in the dataset.
2. Bell(b) caps are mostly edible.
3. Knobbed(k) caps are mostly poisonous.
4. Sunken(s) caps are all edible, whereas conical(c) caps are all poisonous.

### Cap Surface

In [None]:
mushroom.groupby(['cap-surface'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['cap-surface'], ax=axarr[0], order=mushroom['cap-surface'].value_counts().index, palette="magma").set_title('Cap Surface Distribution')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Cap Surface')
b = sns.countplot(x="cap-surface", data=mushroom, hue="class", order=mushroom['cap-surface'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations: 
1. Mushrooms with a grooved(g) cap surface are all poisonous, but there are very few of them.
2. Smooth(s) and scaly(y) cap surfaces are more often poisonous, whereas fibrous(f) cap surfaces are more often edible.

### Cap Color

In [None]:
mushroom.groupby(['cap-color'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['cap-color'], ax=axarr[0], order=mushroom['cap-color'].value_counts().index, palette="magma").set_title('Cap Color Distribution')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Cap Color')
b = sns.countplot(x="cap-color", data=mushroom, hue="class", order=mushroom['cap-color'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations: 
1. Brown(n) caps are the most common, followed by gray(g) and red(e).
2. A majority of the brown(n), white(w) and gray(g) mushrooms are edible, whereas a majority of the red(e) and yellow(y) ones are poisonous.
3. All purple(u) and green(r) mushrooms are edible, but there are few of them.

### Bruises

In [None]:
mushroom.groupby(['bruises'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['bruises'], ax=axarr[0], order=mushroom['bruises'].value_counts().index, palette="magma").set_title('Bruise Distribution')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Bruises')
b = sns.countplot(x="bruises", data=mushroom, hue="class", order=mushroom['bruises'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations: 
For context: *Mushroom bruising involves nicking the top and bottom of the mushroom cap and observing any colour changes. As specimens that are not fresh don’t give reliable results, it is important to do this within the first 30 minutes of picking the mushroom. It is also known that some mushrooms that do contain psilocybin and psilocin do not bruise at all.*

1. Most mushrooms in the dataset do not bruise(f).
2. Mushrooms that bruise(t) are mostly edible, whereas mushrooms that do not bruise(f) are mostly poisonous.
3. Not all bruising(t) mushrooms are edible, and not all non-bruising(f) ones are poisonous, so other factors are involved. Still, bruising looks like a useful feature for predicting the class.

### Odor

In [None]:
mushroom.groupby(['odor'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['odor'], ax=axarr[0], order=mushroom['odor'].value_counts().index, palette="magma").set_title('Odor Distribution')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Odor')
b = sns.countplot(x="odor", data=mushroom, hue="class", order=mushroom['odor'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations: 
1. Mushrooms with no odor(n) and foul(f) odor are the most common.
2. All pungent(p), foul(f), creosote(c), fishy(y), spicy(s) and musty(m) odor mushrooms are poisonous.
3. All almond(a) and anise(l) odor mushrooms are edible.
4. Mushrooms with no odor(n) can be either edible or poisonous, but the distribution shows that most of them are edible.

Thus, odor is likely to be one of the most important features for predicting the class of a mushroom.
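
One rough way to quantify how well a single feature separates the classes is to predict, for every category, its majority class and measure the resulting accuracy. The sketch below does that on made-up data; the helper name `one_feature_accuracy` and the `toy` frame are assumptions for illustration, and the score is computed in-sample, so it is only a screening heuristic.

```python
import pandas as pd

def one_feature_accuracy(df, feature, target="class"):
    """In-sample accuracy of predicting the majority class within each category."""
    counts = pd.crosstab(df[feature], df[target])
    # Within each category, the majority class gets counts.max(axis=1) rows right.
    return counts.max(axis=1).sum() / len(df)

# Made-up data for illustration only.
toy = pd.DataFrame({
    "odor":    ["a", "a", "f", "f", "n", "n", "n", "n"],
    "bruises": ["t", "f", "t", "f", "t", "f", "t", "f"],
    "class":   ["e", "e", "p", "p", "e", "e", "e", "p"],
})
scores = {col: one_feature_accuracy(toy, col) for col in ["odor", "bruises"]}
print(scores)
```

Applied to every column of `mushroom` before encoding, this would give a simple ranking to compare against the visual impressions in these sections.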

### Gill Attachment

In [None]:
mushroom.groupby(['gill-attachment'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['gill-attachment'], ax=axarr[0], order=mushroom['gill-attachment'].value_counts().index, palette="magma").set_title('Gill Attachment')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Gill Attachment')
b = sns.countplot(x="gill-attachment", data=mushroom, hue="class", order=mushroom['gill-attachment'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations: 
1. Free(f) gill attachment is by far the most common.
2. Most mushrooms with attached(a) gills are edible.
3. Mushrooms with free(f) gill attachment are split almost evenly between edible and poisonous.

### Gill Spacing

In [None]:
mushroom.groupby(['gill-spacing'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['gill-spacing'], ax=axarr[0], order=mushroom['gill-spacing'].value_counts().index, palette="magma").set_title('Gill Spacing')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Gill Spacing')
b = sns.countplot(x="gill-spacing", data=mushroom, hue="class", order=mushroom['gill-spacing'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations: 
1. Close(c) gill spacing is the most common.
2. Most of the crowded(w) gill spacing mushrooms are edible.

### Gill Size

In [None]:
mushroom.groupby(['gill-size'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['gill-size'], ax=axarr[0], order=mushroom['gill-size'].value_counts().index, palette="flare").set_title('Gill Size')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Gill Size')
b = sns.countplot(x="gill-size", data=mushroom, hue="class", order=mushroom['gill-size'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations: 
1. Broad(b) gills are more common overall.
2. Most of the narrow(n) gill size mushrooms are poisonous, whereas most of the broad(b) gill size mushrooms are edible.

### Gill Color

In [None]:
mushroom.groupby(['gill-color'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['gill-color'], ax=axarr[0], order=mushroom['gill-color'].value_counts().index, palette="magma").set_title('Gill Color')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Gill color')
b = sns.countplot(x="gill-color", data=mushroom, hue="class", order=mushroom['gill-color'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations: 
1. Buff(b) is the most common gill color, followed by pink(p) and white(w).
2. All buff(b) and all green(r) gill color mushrooms are poisonous.
3. All red(e) and orange(o) gill color mushrooms are edible.

### Stalk Shape

In [None]:
mushroom.groupby(['stalk-shape'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['stalk-shape'], ax=axarr[0], order=mushroom['stalk-shape'].value_counts().index, palette="magma").set_title('Stalk Shape')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Stalk Shape')
b = sns.countplot(x="stalk-shape", data=mushroom, hue="class", order=mushroom['stalk-shape'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations: 
1. Tapering(t) stalks are slightly more common than enlarging(e) stalks.
2. Stalk shape does not separate the classes strongly: enlarging(e) stalks are somewhat more often poisonous, whereas tapering(t) stalks are somewhat more often edible.

### Stalk Surface Above Ring

In [None]:
mushroom.groupby(['stalk-surface-above-ring'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['stalk-surface-above-ring'], ax=axarr[0], order=mushroom['stalk-surface-above-ring'].value_counts().index, palette="magma").set_title('Stalk Surface Above Ring')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Stalk Surface Above Ring')
b = sns.countplot(x="stalk-surface-above-ring", data=mushroom, hue="class", order=mushroom['stalk-surface-above-ring'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations: 
1. Smooth(s) is the most common stalk surface above the ring, and scaly(y) is very rare.
2. Most of the smooth(s) stalk surface above ring mushrooms are edible.
3. Most of the silky(k) stalk surface above ring mushrooms are poisonous.

### Stalk Surface Below Ring

In [None]:
mushroom.groupby(['stalk-surface-below-ring'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['stalk-surface-below-ring'], order=mushroom['stalk-surface-below-ring'].value_counts().index, ax=axarr[0], palette="magma").set_title('Stalk Surface Below Ring')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Stalk Surface Below Ring')
b = sns.countplot(x="stalk-surface-below-ring", data=mushroom, hue="class", order=mushroom['stalk-surface-below-ring'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations: 
1. Smooth(s) is the most common stalk surface below the ring, followed by silky(k).
2. Most of the smooth(s) stalk surface below ring mushrooms are edible.
3. Most of the silky(k) stalk surface below ring mushrooms are poisonous.

#### From Stalk Surface Above & Below Ring: 
1. Most of the smooth(s) stalk surface above & below ring mushrooms are edible.
2. Most of the silky(k) stalk surface above & below ring mushrooms are poisonous.

### Stalk Color Above Ring

In [None]:
mushroom.groupby(['stalk-color-above-ring'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['stalk-color-above-ring'], ax=axarr[0], order=mushroom['stalk-color-above-ring'].value_counts().index, palette="magma").set_title('Stalk Color Above Ring')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Stalk Color Above Ring')
b = sns.countplot(x="stalk-color-above-ring", data=mushroom, hue="class", order=mushroom['stalk-color-above-ring'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations:
1. White(w) is the most common stalk color above the ring.
2. Most of the white(w) stalk color above ring mushrooms are edible.
3. Most of the pink(p) stalk color above ring mushrooms are poisonous.

### Stalk Color Below Ring

In [None]:
mushroom.groupby(['stalk-color-below-ring'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['stalk-color-below-ring'], order=mushroom['stalk-color-below-ring'].value_counts().index, ax=axarr[0], palette="magma").set_title('Stalk Color Below Ring')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Stalk Color Below Ring')
b = sns.countplot(x="stalk-color-below-ring", data=mushroom, hue="class", order=mushroom['stalk-color-below-ring'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations:
1. White(w) is the most common stalk color below the ring.
2. Most of the white(w) stalk color below ring mushrooms are edible.
3. Most of the pink(p) stalk color below ring mushrooms are poisonous.

#### From Stalk Color Above & Below Ring: 
1. Most of the white(w) stalk color above & below ring mushrooms are edible.
2. Most of the pink(p) stalk color above & below ring mushrooms are poisonous.

### Veil Type

In [None]:
mushroom.groupby(['veil-type'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['veil-type'], ax=axarr[0], palette="magma").set_title('Veil Type')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Veil Type')
b = sns.countplot(x="veil-type", data=mushroom, hue="class", palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations:
1. Only one veil type, partial(p), is present in the dataset.
2. A constant column carries no information for separating edible from poisonous mushrooms.

*We can drop this column before modelling.*

### Veil Color

In [None]:
mushroom.groupby(['veil-color'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['veil-color'], ax=axarr[0], order=mushroom['veil-color'].value_counts().index, palette="magma").set_title('Veil Color')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Veil Color')
b = sns.countplot(x="veil-color", data=mushroom, hue="class", order=mushroom['veil-color'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations:
1. More than 90% of the mushrooms in the dataset have a white(w) veil.
2. White(w) veil mushrooms are split almost evenly between edible and poisonous, so this value does little to separate the classes.

### Ring Number

In [None]:
mushroom.groupby(['ring-number'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['ring-number'], ax=axarr[0], order=mushroom['ring-number'].value_counts().index, palette="magma").set_title('Ring Number')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Ring Number')
b = sns.countplot(x="ring-number", data=mushroom, hue="class", order=mushroom['ring-number'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations:
1. Mushrooms with one ring(o) are by far the most common, and they are hard to classify from this feature alone.
2. Mushrooms with no ring(n) are very rare and all of them are poisonous.
3. Mushrooms with two rings(t) are mostly edible.

### Ring Type

In [None]:
mushroom.groupby(['ring-type'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['ring-type'], ax=axarr[0], order=mushroom['ring-type'].value_counts().index, palette="magma").set_title('Ring Type')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Ring Type')
b = sns.countplot(x="ring-type", data=mushroom, hue="class", order=mushroom['ring-type'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations:
1. Pendant(p) is the most common ring type, and most of these mushrooms are edible.
2. All large(l) ring type mushrooms are poisonous.

### Spore Print Color

In [None]:
mushroom.groupby(['spore-print-color'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['spore-print-color'], order=mushroom['spore-print-color'].value_counts().index, ax=axarr[0], palette="magma").set_title('Spore Print Color')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Spore Print Color')
b = sns.countplot(x="spore-print-color", data=mushroom, hue="class", order=mushroom['spore-print-color'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations:
1. White(w) is the most common spore print color, followed by brown(n), black(k) and chocolate(h).
2. More than 80% of the black(k) and brown(n) spore print color mushrooms are edible.
3. A large majority of the white(w) and chocolate(h) spore print color mushrooms are poisonous.

This is also likely to be one of the most important features for classifying mushrooms.

### Population

In [None]:
mushroom.groupby(['population'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['population'], ax=axarr[0], order=mushroom['population'].value_counts().index, palette="magma").set_title('Population')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Population')
b = sns.countplot(x="population", data=mushroom, hue="class", order=mushroom['population'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations:
1. Several(v) is the most common population type, and most of these mushrooms are poisonous.
2. All numerous(n) and abundant(a) population type mushrooms are edible.
3. Most of the scattered(s) and solitary(y) mushrooms are also edible.

### Habitat

In [None]:
mushroom.groupby(['habitat'])['class'].value_counts().to_frame()

In [None]:
fig, axarr = plt.subplots(1, 2, figsize=(12,5))
a = sns.countplot(x=mushroom['habitat'], ax=axarr[0], order=mushroom['habitat'].value_counts().index, palette="magma").set_title('Habitat')
axarr[1].set_title('Poisonous & Edible Mushroom Based On Habitat')
b = sns.countplot(x="habitat", data=mushroom, hue="class", order=mushroom['habitat'].value_counts().index, palette=('#9b111e','#50c878'), ax=axarr[1]).set_ylabel('Count')

#### Observations:
1. Mushrooms found in woods(d) are the most common, and most of them are edible.
2. Most of the mushrooms found in grasses(g) are also edible.
3. All the mushrooms found in waste(w) habitats are edible.

### Data Preparation For Modelling

#### Drop Unnecessary Columns
First, columns that take only one value can be dropped. Let's look at the number of unique values in each column.

In [None]:
mushroom.nunique()

In [None]:
# veil-type has a single unique value, so it carries no information
mushroom.drop(columns=['veil-type'], inplace=True)
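
Rather than naming the column by hand, constant columns can also be detected from `nunique()`. A minimal sketch on a made-up frame (`toy`, `constant_cols` and `reduced` are illustrative names, not part of the original notebook):

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "veil-type": ["p", "p", "p"],
    "veil-color": ["w", "n", "w"],
    "class": ["e", "p", "e"],
})

# Columns with a single distinct value cannot help a classifier.
constant_cols = [col for col in toy.columns if toy[col].nunique() == 1]
reduced = toy.drop(columns=constant_cols)
print(constant_cols, list(reduced.columns))
```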

### Convert Values To Integers
We will use label encoding to convert the categorical values to integers.

In [None]:
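def Label_enc(feature):
    """Label-encode one column and print the label order used for the integer codes."""
    LE = LabelEncoder()
    LE.fit(feature)
    # classes_ is sorted; each label's position is its integer code
    print(feature.name, LE.classes_)
    return LE.transform(feature)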

In [None]:
for col in mushroom.columns:
    mushroom[str(col)] = Label_enc(mushroom[str(col)])
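
`Label_enc` discards the fitted encoder after each column, so the integer codes cannot be mapped back to letters later. If that is needed, one option is to keep one encoder per column. The sketch below shows this on made-up data; the names `encoders`, `encoded` and `toy` are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up data for illustration only.
toy = pd.DataFrame({
    "odor": ["n", "f", "a", "n"],
    "class": ["e", "p", "e", "e"],
})

encoders = {}           # hypothetical store: column name -> fitted LabelEncoder
encoded = toy.copy()
for col in toy.columns:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(toy[col])

# Codes follow the sorted label order, so 'e' becomes 0 and 'p' becomes 1.
decoded_class = encoders["class"].inverse_transform(encoded["class"])
print(list(decoded_class))
```

Label encoding imposes an arbitrary order on the categories. Tree-based models such as the random forest used here generally tolerate that, whereas linear or distance-based models would usually call for one-hot encoding instead.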

### Modelling

Let's define the features `x` and the target `y`, then split them into training and test sets.

In [None]:
x = mushroom.drop(columns=['class'])
y = mushroom['class']

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
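
The split above is purely random. A possible variation is to pass `stratify=` to `train_test_split` so that both parts keep the same class ratio. The sketch below illustrates this on synthetic stand-in data (`X_demo`, `y_demo`, `y_tr` and `y_te` are made-up names so that nothing in the notebook is overwritten):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 60 of class 0 and 40 of class 1.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 60 + [1] * 40)

# stratify=y_demo keeps the class ratio (almost) identical in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)
print(y_tr.mean(), y_te.mean())
```

With classes as balanced as in this data set the effect is small, but it removes one source of split-to-split variation.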

### Random Forest

In [None]:
# Fix random_state so the fitted forest is reproducible
model = RandomForestClassifier(random_state=0)
model.fit(x_train, y_train)
y_predict = model.predict(x_test)

Let's build the confusion matrix.

In [None]:
# scikit-learn expects (y_true, y_pred): rows are true classes, columns are predictions
confmat1 = confusion_matrix(y_test, y_predict)
confmat1

In [None]:
cm = metrics.ConfusionMatrixDisplay(confusion_matrix=metrics.confusion_matrix(y_test, y_predict, labels=model.classes_),
                                    display_labels=model.classes_)
cm.plot(cmap="flare")

The off-diagonal entries are both 0, which means no test sample is misclassified, so we should get 100% accuracy for this model.
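
For a two-class problem the four cells of the matrix can be unpacked by name, which makes it easier to see the error that matters most here: a poisonous mushroom predicted as edible. A minimal sketch with made-up labels (`true_labels` and `pred_labels` are illustrative, using the codes 0 = edible and 1 = poisonous that label encoding assigns to `e` and `p`):

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 0 = edible, 1 = poisonous.
true_labels = [0, 0, 0, 1, 1, 1, 1, 0]
pred_labels = [0, 0, 1, 1, 1, 0, 1, 0]

# confusion_matrix takes (y_true, y_pred); rows are true classes, columns are predictions.
tn, fp, fn, tp = confusion_matrix(true_labels, pred_labels).ravel()
print(f"poisonous predicted edible: {fn}, edible predicted poisonous: {fp}")
```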

### Feature Importances

In [None]:
fi_df = pd.DataFrame({
    "feature_importances" : model.feature_importances_,
    "features" : x.columns
})

fi_df.sort_values(by="feature_importances", ascending=False, inplace=True)

plt.figure(figsize=(10,7))
sns.barplot(x="feature_importances", y="features", palette="twilight", data=fi_df)
plt.show()

As discussed above, odor ranks first among the features for classifying the mushrooms.

Let's find the accuracy now.

In [None]:
accuracy_score(y_test, y_predict)

**There it is: 100% accuracy on the test set!**
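
A single 70/30 split can be a lucky one, so a score like this is worth confirming with cross-validation; `cross_val_score` is already among the imports at the top. The sketch below shows the call on synthetic stand-in data, not on the mushroom data: `X_demo`, `y_demo` and `clf` are made-up names, and the label is constructed to depend only on the first feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: integer-coded categorical features, label determined by feature 0.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 4, size=(300, 5))
y_demo = (X_demo[:, 0] >= 2).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0)

# Five stratified folds; each fold is held out once while the rest is used for training.
cv_scores = cross_val_score(clf, X_demo, y_demo, cv=5, scoring="accuracy")
print(cv_scores.mean(), cv_scores.std())
```

In the notebook the equivalent call would be `cross_val_score(model, x, y, cv=5)`; a mean near 1.0 with a small spread would support the single-split result.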