## Introduction

Mushrooms have an important place in our kitchens and appear in many dishes. Nevertheless, hundreds of people are poisoned each year after eating the wrong mushroom, and these poisonings often end in death. This project examines several properties of mushrooms and tries to decide whether a mushroom is poisonous or edible.

![](https://images.pexels.com/photos/53494/mushroom-fungi-fungus-many-53494.jpeg?cs=srgb&dl=pexels-pixabay-53494.jpg&fm=jpg)

### Content
1. [Load and Check Data](#1)
1. [Variable Description](#2)
1. [Basic Data Analysis - Visualization](#3)
    * [Cap - Class Analysis](#8)
    * [Gill - Class Analysis](#9)
    * [Stalk - Class Analysis](#10)
    * [Veil - Class Analysis](#11)
    * [Ring - Class Analysis](#12)
    * [Other Feature Analysis](#13)
1. [Missing Value](#5)
1. [Feature Engineering](#6)
    * [Cap - Class Feature Engineering](#14)
    * [Gill - Class Feature Engineering](#15)
    * [Stalk - Class Feature Engineering](#16)
    * [Veil - Class Feature Engineering](#17)
    * [Ring - Class Feature Engineering](#18)
    * [Other Feature - Class Feature Engineering](#19)
    * [Class Feature Engineering](#20)
    * [Train Test Split](#21)
1. [Modeling](#7)
    * [Decision Tree Algorithm](#22)
    * [Random Forest Algorithm](#23)
    * [Logistic Regression Algorithm](#24)
    * [Support Vector Machine Algorithm](#25)
    * [Naive Bayes Algorithm](#26)

<a id = "1"></a><br>
## Load And Check Data

In [None]:
import numpy as np
import pandas as pd
import plotly.graph_objs as go
import plotly.express as px



import matplotlib.pyplot as plt
# Note: Matplotlib 3.6+ renamed this style to "seaborn-v0_8-whitegrid"
plt.style.use("seaborn-whitegrid")

import seaborn as sns

from collections import Counter

import warnings
warnings.filterwarnings("ignore")

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))



In [None]:
data = pd.read_csv("/kaggle/input/mushroom-classification/mushrooms.csv")

In [None]:
data.shape

In [None]:
data.info()

In [None]:
data.head()

<a id = "2"></a><br>
## Variable Description


* class: edible=e, poisonous=p 
* cap-shape: bell=b,conical=c,convex=x,flat=f, knobbed=k,sunken=s
* cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s
* cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r,pink=p,purple=u,red=e,white=w,yellow=y
* bruises: bruises=t,no=f
* odor: almond=a,anise=l,creosote=c,fishy=y,foul=f,musty=m,none=n,pungent=p,spicy=s
* gill-attachment: attached=a,descending=d,free=f,notched=n
* gill-spacing: close=c,crowded=w,distant=d
* gill-size: broad=b,narrow=n
* gill-color: black=k,brown=n,buff=b,chocolate=h,gray=g, green=r,orange=o,pink=p,purple=u,red=e,white=w,yellow=y
* stalk-shape: enlarging=e,tapering=t
* stalk-root: bulbous=b,club=c,cup=u,equal=e,rhizomorphs=z,rooted=r,missing=?
* stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s
* stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s
* stalk-color-above-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,pink=p,red=e,white=w,yellow=y
* stalk-color-below-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,pink=p,red=e,white=w,yellow=y
* veil-type: partial=p,universal=u
* veil-color: brown=n,orange=o,white=w,yellow=y
* ring-number: none=n,one=o,two=t
* ring-type: cobwebby=c,evanescent=e,flaring=f,large=l,none=n,pendant=p,sheathing=s,zone=z
* spore-print-color: black=k,brown=n,buff=b,chocolate=h,green=r,orange=o,purple=u,white=w,yellow=y
* population: abundant=a,clustered=c,numerous=n,scattered=s,several=v,solitary=y
* habitat: grasses=g,leaves=l,meadows=m,paths=p,urban=u,waste=w,woods=d

* All columns in the data set are categorical.

In [None]:
def variable(variable):
    var = data[variable]
    varcount = var.value_counts()
    
    # plt.Figure(...) builds a detached Figure object; plt.figure() opens a new pyplot figure
    plt.figure()
    sns.barplot(x = varcount.index , y = varcount.values)
    plt.xlabel(variable)
    plt.ylabel("Frequency")
    plt.show()

for i in data.columns:
    variable(i)

<a id = "3"></a><br>
## Basic Data Analysis - Visualization


<a id = "8"></a><br>
### Cap - Class Analysis

In [None]:
capshape = data["cap-shape"].unique()

poisson_state_top = []
edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in capshape:
    edibles = len(data[(data["cap-shape"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["cap-shape"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["cap-shape"] == i ) & (data["class"] == "e")]) + len(data[(data["cap-shape"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

capshape_poisson_list = pd.DataFrame({"cap-shape" : capshape , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(capshape))
bar_width = 0.35

rects1 = plt.bar(index, capshape_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, capshape_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel('Cap Shape')
plt.ylabel('Count')
# Centre each tick between its two bars and label it with the code found in the data
plt.xticks(index + bar_width / 2, capshape)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
capshape_poisson_list

In [None]:
data["cap-shape"].value_counts()

* Convex (x) caps are the most common.
* Looking at the edible-to-poisonous ratios, every mushroom with a "sunken" (s) cap is edible, and "bell" (b) caps are also largely edible. In contrast, the vast majority of "knobbed" (k) caps are poisonous. Every "conical" (c) cap is poisonous, but no firm conclusion can be drawn from that because only 4 samples have code c.
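The caution about the 4 conical samples can be made concrete with a confidence interval. The sketch below uses the Wilson score interval, which is a swapped-in illustration and not something the notebook itself computes; `wilson_interval` is a hypothetical helper name.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot
    return max(0.0, centre - half), min(1.0, centre + half)

# 4 poisonous mushrooms out of 4 conical-cap samples, as described above
low, high = wilson_interval(4, 4)
print(f"poisonous share for cap-shape 'c': between {low:.2f} and {high:.2f}")
```

With only 4 samples the lower bound stays close to one half, so "100 percent poisonous" in the table is compatible with a much lower true rate.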

In [None]:
capsurface = data["cap-surface"].unique()

poisson_state_top = []
edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in capsurface:
    edibles = len(data[(data["cap-surface"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["cap-surface"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["cap-surface"] == i ) & (data["class"] == "e")]) + len(data[(data["cap-surface"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

capsurfac_poisson_list = pd.DataFrame({"cap-surface" : capsurface , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(capsurface))
bar_width = 0.35

rects1 = plt.bar(index, capsurfac_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, capsurfac_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel('Cap Surface')
plt.ylabel('Count')
# Centre each tick between its two bars and label it with the code found in the data
plt.xticks(index + bar_width / 2, capsurface)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
capsurfac_poisson_list

In [None]:
data['cap-surface'].value_counts()

* For cap surface, the edible and poisonous counts are close to each other for the scaly (y) and smooth (s) types. Fibrous (f) caps are edible about 67 percent of the time. The remaining type, grooves (g), has too few samples to say anything. On its own, this feature is unlikely to contribute much.
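The count-and-percentage tables built by the loops above can also be produced in a few lines with `pd.crosstab`. This is a minimal sketch, not the notebook's own method: `class_breakdown` is a hypothetical helper, the toy frame is made up, and the output column names simply mirror the ones used in this notebook.

```python
import pandas as pd

def class_breakdown(df, feature, target="class"):
    """Edible/poisonous counts and row percentages for one categorical feature."""
    counts = pd.crosstab(df[feature], df[target])
    # Keep both class columns even if one class never occurs for this feature
    counts = counts.reindex(columns=["e", "p"], fill_value=0)
    percents = counts.div(counts.sum(axis=1), axis=0) * 100
    return pd.DataFrame({
        "edible_count": counts["e"],
        "edible_ort": percents["e"],
        "poisson_count": counts["p"],
        "poisson_ort": percents["p"],
    })

# Tiny made-up sample, only to show the shape of the result
toy = pd.DataFrame({
    "cap-shape": ["x", "x", "b", "b", "b", "c"],
    "class":     ["e", "p", "e", "e", "p", "p"],
})
summary = class_breakdown(toy, "cap-shape")
print(summary)
```

On the real data the call would be `class_breakdown(data, "cap-surface")`, and the same function covers every feature analysed in this section.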

In [None]:
capcolor = data["cap-color"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in capcolor:
    edibles = len(data[(data["cap-color"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["cap-color"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["cap-color"] == i ) & (data["class"] == "e")]) + len(data[(data["cap-color"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

capcolor_poisson_list = pd.DataFrame({"cap-color" : capcolor , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(capcolor))
bar_width = 0.35

rects1 = plt.bar(index, capcolor_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, capcolor_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Cap Color")
plt.ylabel('Count')
# Centre each tick between its two bars and label it with the code found in the data
plt.xticks(index + bar_width / 2, capcolor)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
capcolor_poisson_list

In [None]:
data["cap-color"].value_counts()

* For cap color, the split between edible and poisonous differs noticeably from one color to another.

<a id = "9"></a><br>
### Gill - Class Analysis

In [None]:
print("gill-attachment Variable Unique Values: {}".format(data["gill-attachment"].unique()))
print("gill-spacing Variable Unique Values: {}".format(data["gill-spacing"].unique()))
print("gill-size Variable Unique Values: {}".format(data["gill-size"].unique()))
print("gill-color Variable Unique Values: {}".format(data["gill-color"].unique()))

In [None]:
gillattachment = data["gill-attachment"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in gillattachment:
    edibles = len(data[(data["gill-attachment"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["gill-attachment"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["gill-attachment"] == i ) & (data["class"] == "e")]) + len(data[(data["gill-attachment"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

gillattachment_poisson_list = pd.DataFrame({"gill-attachment" : gillattachment , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(gillattachment))
bar_width = 0.35

rects1 = plt.bar(index, gillattachment_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, gillattachment_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Gill Attachment")
plt.ylabel('Count')
# Centre each tick between its two bars and label it with the code found in the data
plt.xticks(index + bar_width / 2, gillattachment)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
gillattachment_poisson_list

* The "gill-attachment" column has 2 categories: "free" (f) and "attached" (a). Category f is split almost evenly between edible and poisonous, and category a is rare by comparison. For this reason, this column is not very useful to us.

In [None]:
gillspacing = data["gill-spacing"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in gillspacing:
    edibles = len(data[(data["gill-spacing"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["gill-spacing"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["gill-spacing"] == i ) & (data["class"] == "e")]) + len(data[(data["gill-spacing"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

gillspacing_poisson_list = pd.DataFrame({"gill-spacing" : gillspacing , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(gillspacing))
bar_width = 0.35

rects1 = plt.bar(index, gillspacing_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, gillspacing_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Gill Spacing")
plt.ylabel('Count')
# Centre each tick between its two bars and label it with the code found in the data
plt.xticks(index + bar_width / 2, gillspacing)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
gillspacing_poisson_list

* The "gill-spacing" column has 2 categories: "close" (c) and "crowded" (w). This column may help the model.
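Judgements such as "not very useful" or "may help" can be backed by a number. One option, shown here as a sketch and not as part of the original analysis, is the information gain of a column with respect to the class. The function names and the toy lists are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy within each feature value."""
    labels = list(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Made-up toy data: one feature separates the classes perfectly, one not at all
classes = ["e", "e", "p", "p"]
perfect = ["a", "a", "b", "b"]
useless = ["a", "b", "a", "b"]
gain_perfect = information_gain(perfect, classes)
gain_useless = information_gain(useless, classes)
print(gain_perfect, gain_useless)
```

With the notebook's frame, `information_gain(data["gill-spacing"], data["class"])` and the same call for `"gill-attachment"` would give comparable scores for the two columns; a larger value means the column says more about the class.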

In [None]:
gillsize = data["gill-size"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in gillsize:
    edibles = len(data[(data["gill-size"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["gill-size"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["gill-size"] == i ) & (data["class"] == "e")]) + len(data[(data["gill-size"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

gillsize_poisson_list = pd.DataFrame({"gill-size" : gillsize , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(gillsize))
bar_width = 0.35

rects1 = plt.bar(index, gillsize_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, gillsize_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Gill Size")
plt.ylabel('Count')
# Centre each tick between its two bars and label it with the code found in the data
plt.xticks(index + bar_width / 2, gillsize)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
gillsize_poisson_list

In [None]:
data["gill-size"].value_counts()

In [None]:
gillcolor = data["gill-color"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in gillcolor:
    edibles = len(data[(data["gill-color"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["gill-color"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["gill-color"] == i ) & (data["class"] == "e")]) + len(data[(data["gill-color"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

gillcolor_poisson_list = pd.DataFrame({"gill-color" : gillcolor , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(gillcolor))
bar_width = 0.35

rects1 = plt.bar(index, gillcolor_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, gillcolor_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Gill Color")
plt.ylabel('Count')
# The original labels lacked commas, so they collapsed into one string; use the codes from the data
plt.xticks(index + bar_width / 2, gillcolor)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
gillcolor_poisson_list

In [None]:
data["gill-color"].value_counts()

<a id = "10"></a><br>
### Stalk - Class Analysis

In [None]:
stalk_list = ["stalk-shape","stalk-root","stalk-surface-above-ring","stalk-surface-below-ring","stalk-color-above-ring","stalk-color-below-ring"]
print("stalk-shape Variable Unique Values: {}".format(data["stalk-shape"].unique()))
print("stalk-root Variable Unique Values: {}".format(data["stalk-root"].unique()))
print("stalk-surface-above-ring Variable Unique Values: {}".format(data["stalk-surface-above-ring"].unique()))
print("stalk-surface-below-ring Variable Unique Values: {}".format(data["stalk-surface-below-ring"].unique()))
print("stalk-color-above-ring Variable Unique Values: {}".format(data["stalk-color-above-ring"].unique()))
print("stalk-color-below-ring Variable Unique Values: {}".format(data["stalk-color-below-ring"].unique()))

In [None]:
stalkshape = data["stalk-shape"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in stalkshape:
    edibles = len(data[(data["stalk-shape"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["stalk-shape"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["stalk-shape"] == i ) & (data["class"] == "e")]) + len(data[(data["stalk-shape"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

stalkshape_poisson_list = pd.DataFrame({"stalk-shape" : stalkshape , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(stalkshape))
bar_width = 0.35

rects1 = plt.bar(index, stalkshape_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, stalkshape_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Stalk Shape")
plt.ylabel('Count')
# The original labels lacked commas, so they collapsed into one string; use the codes from the data
plt.xticks(index + bar_width / 2, stalkshape)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
stalkshape_poisson_list

* For stalk shape, the class distribution differs clearly between the two categories.

In [None]:
stalkroot = data["stalk-root"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in stalkroot:
    edibles = len(data[(data["stalk-root"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["stalk-root"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["stalk-root"] == i ) & (data["class"] == "e")]) + len(data[(data["stalk-root"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

stalkroot_poisson_list = pd.DataFrame({"stalk-root" : stalkroot , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(stalkroot))
bar_width = 0.35

rects1 = plt.bar(index, stalkroot_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, stalkroot_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Stalk Root")
plt.ylabel('Count')
# The original labels lacked commas, so they collapsed into one string; use the codes from the data
plt.xticks(index + bar_width / 2, stalkroot)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
stalkroot_poisson_list

* The missing category ("?") will be filled in the "Missing Value" section.
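Before any filling, it helps to see how many rows carry the "?" placeholder and to turn it into a real missing value that pandas recognises. The sketch below runs on a made-up toy column and makes no assumption about the fill strategy used later in the notebook.

```python
import numpy as np
import pandas as pd

# Made-up toy column; in the notebook this would be data["stalk-root"]
toy = pd.DataFrame({"stalk-root": ["b", "?", "e", "?", "c"]})

# Count the placeholder and express it as a share of all rows
missing_count = (toy["stalk-root"] == "?").sum()
missing_share = missing_count / len(toy) * 100
print(f"{missing_count} rows ({missing_share:.1f}%) have stalk-root == '?'")

# Work on a copy so the original frame stays untouched
cleaned = toy.copy()
cleaned["stalk-root"] = cleaned["stalk-root"].replace("?", np.nan)
print(cleaned["stalk-root"].isna().sum())
```

Once the placeholder is `NaN`, calls such as `isna()`, `fillna()` and `dropna()` treat it as missing instead of as one more category.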

In [None]:
stalksurabri = data["stalk-surface-above-ring"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in stalksurabri:
    edibles = len(data[(data["stalk-surface-above-ring"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["stalk-surface-above-ring"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["stalk-surface-above-ring"] == i ) & (data["class"] == "e")]) + len(data[(data["stalk-surface-above-ring"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

stalksurabri_poisson_list = pd.DataFrame({"stalk-surface-above-ring" : stalksurabri , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(stalksurabri))
bar_width = 0.35

rects1 = plt.bar(index, stalksurabri_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, stalksurabri_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Stalk Surface Above Ring")
plt.ylabel('Count')
# The original labels lacked commas, so they collapsed into one string; use the codes from the data
plt.xticks(index + bar_width / 2, stalksurabri)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
stalksurabri_poisson_list

In [None]:
stalksurberi = data["stalk-surface-below-ring"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in stalksurberi:
    edibles = len(data[(data["stalk-surface-below-ring"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["stalk-surface-below-ring"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["stalk-surface-below-ring"] == i ) & (data["class"] == "e")]) + len(data[(data["stalk-surface-below-ring"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

stalksurberi_poisson_list = pd.DataFrame({"stalk-surface-below-ring" : stalksurberi , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(stalksurberi))
bar_width = 0.35

rects1 = plt.bar(index, stalksurberi_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, stalksurberi_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Stalk Surface Below Ring")
plt.ylabel('Count')
# The original labels lacked commas, so they collapsed into one string; use the codes from the data
plt.xticks(index + bar_width / 2, stalksurberi)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
stalksurberi_poisson_list

In [None]:
stalkcoabri = data["stalk-color-above-ring"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in stalkcoabri:
    edibles = len(data[(data["stalk-color-above-ring"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["stalk-color-above-ring"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["stalk-color-above-ring"] == i ) & (data["class"] == "e")]) + len(data[(data["stalk-color-above-ring"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

stalkcoabri_poisson_list = pd.DataFrame({"stalk-color-above-ring" : stalkcoabri , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(stalkcoabri))
bar_width = 0.35

rects1 = plt.bar(index, stalkcoabri_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, stalkcoabri_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Stalk Color Above Ring")
plt.ylabel('Count')
# The original labels lacked commas, so they collapsed into one string; use the codes from the data
plt.xticks(index + bar_width / 2, stalkcoabri)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
stalkcoabri_poisson_list

In [None]:
stalkcoberi = data["stalk-color-below-ring"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in stalkcoberi:
    edibles = len(data[(data["stalk-color-below-ring"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["stalk-color-below-ring"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["stalk-color-below-ring"] == i ) & (data["class"] == "e")]) + len(data[(data["stalk-color-below-ring"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

stalkcoberi_poisson_list = pd.DataFrame({"stalk-color-below-ring" : stalkcoberi , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(stalkcoberi))
bar_width = 0.35

rects1 = plt.bar(index, stalkcoberi_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, stalkcoberi_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Stalk Color Below Ring")
plt.ylabel('Count')
# The original labels lacked commas, so they collapsed into one string; use the codes from the data
plt.xticks(index + bar_width / 2, stalkcoberi)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
stalkcoberi_poisson_list

<a id = "11"></a><br>
### Veil - Class Analysis

In [None]:
veil_list = ["veil-type" ,"veil-color"]

print("veil-type Variable Unique Values: {}".format(data["veil-type"].unique()))
print("veil-color Variable Unique Values: {}".format(data["veil-color"].unique()))

In [None]:
veiltype = data["veil-type"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in veiltype:
    edibles = len(data[(data["veil-type"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["veil-type"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["veil-type"] == i ) & (data["class"] == "e")]) + len(data[(data["veil-type"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

veiltype_poisson_list = pd.DataFrame({"veil-type" : veiltype , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(veiltype))
bar_width = 0.35

rects1 = plt.bar(index, veiltype_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, veiltype_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Veil Type")
plt.ylabel('Count')
# ('p') is a plain string, not a tuple; use the codes from the data instead
plt.xticks(index + bar_width / 2, veiltype)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
veiltype_poisson_list

In [None]:
veilcolor = data["veil-color"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in veilcolor:
    edibles = len(data[(data["veil-color"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["veil-color"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["veil-color"] == i ) & (data["class"] == "e")]) + len(data[(data["veil-color"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

veilcolor_poisson_list = pd.DataFrame({"veil-color" : veilcolor , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(veilcolor))
bar_width = 0.35

rects1 = plt.bar(index, veilcolor_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, veilcolor_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Veil Color")
plt.ylabel('Count')
# The original labels lacked commas, so they collapsed into one string; use the codes from the data
plt.xticks(index + bar_width / 2, veilcolor)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
veilcolor_poisson_list

<a id = "12"></a><br>
### Ring - Class Analysis

In [None]:
ring_list = ["ring-number","ring-type"]

print("ring-number Variable Unique Values: {}".format(data["ring-number"].unique()))
print("ring-type Variable Unique Values: {}".format(data["ring-type"].unique()))

In [None]:
ringnumber = data["ring-number"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in ringnumber:
    edibles = len(data[(data["ring-number"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["ring-number"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["ring-number"] == i ) & (data["class"] == "e")]) + len(data[(data["ring-number"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

ringnum_poisson_list = pd.DataFrame({"ring-number" : ringnumber , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(ringnumber))
bar_width = 0.35

rects1 = plt.bar(index, ringnum_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, ringnum_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Ring Number")
plt.ylabel('Count')
# The original labels lacked commas, so they collapsed into one string; use the codes from the data
plt.xticks(index + bar_width / 2, ringnumber)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
ringnum_poisson_list

In [None]:
ringtype = data["ring-type"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in ringtype:
    edibles = len(data[(data["ring-type"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["ring-type"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["ring-type"] == i ) & (data["class"] == "e")]) + len(data[(data["ring-type"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

ringtype_poisson_list = pd.DataFrame({"ring-type" : ringtype , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(ringtype))
bar_width = 0.35

rects1 = plt.bar(index, ringtype_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, ringtype_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Ring Type")
plt.ylabel('Count')
# The original labels lacked commas, so they collapsed into one string; use the codes from the data
plt.xticks(index + bar_width / 2, ringtype)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
ringtype_poisson_list

<a id = "13"></a><br>
### Other Feature Analysis

In [None]:
other_list =  ["bruises","odor" , "spore-print-color" ,"population","habitat"]

print("bruises Variable Unique Values: {}".format(data["bruises"].unique()))
print("odor Variable Unique Values: {}".format(data["odor"].unique()))
print("spore-print-color Variable Unique Values: {}".format(data["spore-print-color"].unique()))
print("population Variable Unique Values: {}".format(data["population"].unique()))
print("habitat Variable Unique Values: {}".format(data["habitat"].unique()))


In [None]:
bruises = data["bruises"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in bruises:
    edibles = len(data[(data["bruises"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["bruises"] == i ) & (data["class"] == "p")])
    ort_pers =  len(data[(data["bruises"] == i ) & (data["class"] == "e")]) + len(data[(data["bruises"] == i ) & (data["class"] == "p")]) 
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

bruises_poisson_list = pd.DataFrame({"bruises" : bruises , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(bruises))
bar_width = 0.35

rects1 = plt.bar(index, bruises_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, bruises_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Bruises")
plt.ylabel("Count")
# Center one tick per group and label it with the codes in the order returned by unique()
plt.xticks(index + bar_width / 2, bruises)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
bruises_poisson_list

In [None]:
odor = data["odor"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in odor:
    edibles = len(data[(data["odor"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["odor"] == i ) & (data["class"] == "p")])
    ort_pers = edibles + poissons  # total samples in this category
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

odor_poisson_list = pd.DataFrame({"odor" : odor , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(odor))
bar_width = 0.35

rects1 = plt.bar(index, odor_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, odor_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Odor")
plt.ylabel("Count")
# Center one tick per group and label it with the codes in the order returned by unique()
plt.xticks(index + bar_width / 2, odor)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
odor_poisson_list

In [None]:
sporeprintcolor = data["spore-print-color"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in sporeprintcolor:
    edibles = len(data[(data["spore-print-color"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["spore-print-color"] == i ) & (data["class"] == "p")])
    ort_pers = edibles + poissons  # total samples in this category
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

sporeprintcolor_poisson_list = pd.DataFrame({"spore-print-color" : sporeprintcolor , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(sporeprintcolor))
bar_width = 0.35

rects1 = plt.bar(index, sporeprintcolor_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, sporeprintcolor_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Spore Print Color")
plt.ylabel("Count")
# Center one tick per group and label it with the codes in the order returned by unique()
plt.xticks(index + bar_width / 2, sporeprintcolor)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
sporeprintcolor_poisson_list

In [None]:
population  = data["population"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in population :
    edibles = len(data[(data["population"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["population"] == i ) & (data["class"] == "p")])
    ort_pers = edibles + poissons  # total samples in this category
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

population_poisson_list = pd.DataFrame({"population" : population  , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(population ))
bar_width = 0.35

rects1 = plt.bar(index, population_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, population_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Population")
plt.ylabel("Count")
# Center one tick per group and label it with the codes in the order returned by unique()
plt.xticks(index + bar_width / 2, population)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
population_poisson_list

In [None]:
habitat  = data["habitat"].unique()

edible = []
poisson = []
ort = []
edible_ort = []
poisson_ort = []

for i in habitat :
    edibles = len(data[(data["habitat"] == i ) & (data["class"] == "e")])
    poissons = len(data[(data["habitat"] == i ) & (data["class"] == "p")])
    ort_pers = edibles + poissons  # total samples in this category
    edible_ort_pers = (edibles / ort_pers)*100
    poisson_ort_pers = (poissons / ort_pers)*100

    edible.append(edibles)
    poisson.append(poissons)
    edible_ort.append(edible_ort_pers)
    poisson_ort.append(poisson_ort_pers)
    

habitat_poisson_list = pd.DataFrame({"habitat" : habitat  , "edible_count" : edible ,"edible_ort" : edible_ort , "poisson_count" : poisson, "poisson_ort" : poisson_ort})

fig, ax = plt.subplots()
index = np.arange(len(habitat))
bar_width = 0.35

rects1 = plt.bar(index, habitat_poisson_list["edible_count"], bar_width,
color='b',
label='Edible')

rects2 = plt.bar(index + bar_width, habitat_poisson_list["poisson_count"], bar_width,
color='g',
label='Poisonous')


plt.xlabel("Habitat")
plt.ylabel("Count")
# Center one tick per group and label it with the codes in the order returned by unique()
plt.xticks(index + bar_width / 2, habitat)
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
habitat_poisson_list
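
The per-category loops above all build the same kind of table: counts and percentages of edible and poisonous samples for each category. As a minimal sketch, the same table could be produced with `pd.crosstab`. The helper name `class_summary` and the toy frame are assumptions made for illustration; they are not part of the original notebook.

```python
import pandas as pd


def class_summary(df, feature, target="class"):
    """Return per-category class counts and row percentages for `feature`."""
    # Absolute counts: one row per category, one column per class label
    counts = pd.crosstab(df[feature], df[target])
    # Row-normalized shares, scaled to percent
    percents = pd.crosstab(df[feature], df[target], normalize="index") * 100
    # Both tables share the class labels as column names, so suffixes tell them apart
    return counts.join(percents, lsuffix="_count", rsuffix="_pct")


# Toy frame standing in for the mushroom data (hypothetical values)
toy = pd.DataFrame({
    "bruises": ["t", "t", "f", "f", "f", "t"],
    "class":   ["e", "p", "p", "p", "e", "e"],
})

summary = class_summary(toy, "bruises")
print(summary)
```

With the real data, a call such as `class_summary(data, "odor")` would be expected to give the same numbers as the loop-based table, with the categories sorted alphabetically rather than in order of appearance.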

<a id = "5"></a><br>
## Missing Value


In [None]:
data.isnull().sum()

* There are no null values in the data set.
* However, the "stalk-root" column contains the placeholder "?" for unknown values. We can recode this category.

In [None]:
data["stalk-root"].unique()

In [None]:
stalkroot_poisson_list

* This category contains a large number of rows, so we cannot simply drop it from the data set. Instead, let's replace "?" with "stalk_other".

In [None]:
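# Assign the result back; replace(..., inplace=True) on a selected column may not update `data` in newer pandas versions
data["stalk-root"] = data["stalk-root"].replace("?", "stalk_other")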

In [None]:
data["stalk-root"].unique()
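
Because `isnull()` only detects real null values, a placeholder such as "?" goes unnoticed. A short sketch of how such placeholders could be counted across all columns is shown here; the toy frame is an assumption standing in for the mushroom data.

```python
import pandas as pd

# Toy frame (hypothetical values): "?" marks an unknown stalk root
toy = pd.DataFrame({
    "stalk-root": ["b", "?", "e", "?"],
    "odor": ["n", "a", "n", "f"],
})

# isnull() reports nothing, because "?" is an ordinary string
null_counts = toy.isnull().sum()

# Comparing every cell with "?" gives a boolean frame; summing counts the hits per column
placeholder_counts = (toy == "?").sum()

print(null_counts)
print(placeholder_counts)
```

Running `(data == "?").sum()` on the full data set would show whether any column other than "stalk-root" uses the same placeholder.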

<a id = "6"></a><br>
## Feature Engineering


<a id = "14"></a><br>
### Cap - Class Feature Engineering

1. Cap-shape Feature Engineering

* Because they contain only a few samples, let's merge the "s" and "c" categories into the "b" category.

In [None]:
data["cap-shape"] = data["cap-shape"].replace(["s", "c"], "b")
data["cap-shape"].unique()

2. Cap-Surface Feature Engineering

* Because it contains only a few samples, let's merge the "g" category into the "y" category.

In [None]:
data["cap-surface"] = data["cap-surface"].replace("g", "y")
data["cap-surface"].unique()

3. Cap-Color Feature Engineering

* Because they contain only a few samples, let's merge the "p", "b", "u", "c" and "r" categories into a single "cap_other" category.

In [None]:
data["cap-color"] = data["cap-color"].replace(["p", "b", "u", "c", "r"], "cap_other")
data["cap-color"].unique()
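
The merges above pick the rare categories by hand from the count tables. As a rough sketch, the same idea could be automated with a frequency threshold. The helper `merge_rare`, the threshold value and the toy series are assumptions for illustration, not the notebook's own method.

```python
import pandas as pd


def merge_rare(series, min_count, other_label):
    """Replace categories that occur fewer than `min_count` times with `other_label`."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare; everything else becomes `other_label`
    return series.where(~series.isin(rare), other_label)


# Toy cap colors (hypothetical values): "p" and "u" each appear only once
toy_color = pd.Series(["n", "n", "n", "g", "g", "g", "p", "u"])

merged = merge_rare(toy_color, min_count=2, other_label="cap_other")
print(merged.value_counts())
```

A threshold keeps the rule consistent across features, although merging by hand, as done in this notebook, allows domain knowledge to decide which categories belong together.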

* Now let's convert all the features under the "cap" heading to 1s and 0s by applying one-hot encoding.

In [None]:
# One-hot encode every cap feature; the prefix keeps the new column names unique across features
cap_cols = ["cap-shape", "cap-surface", "cap-color"]
for col in cap_cols:
    data = pd.concat([data, pd.get_dummies(data[col], prefix=col, dtype=int)], axis=1)

# Drop the original categorical columns once they are encoded
data.drop(cap_cols, axis=1, inplace=True)

<a id = "15"></a><br>
### Gill - Class Feature Engineering


1. gill-attachment Feature Engineering

* This column is unlikely to be of much use, so we can drop it.

In [None]:
data.drop("gill-attachment" , axis = 1 , inplace = True)

2. gill-color Feature Engineering


* Let's merge the "e", "b", "r", "y" and "o" categories into a "gill_other" category because they contain only a few samples.

In [None]:
data["gill-color"] = data["gill-color"].replace(["e" ,"b" , "r" ,"y" , "o"] , "gill_other")

In [None]:
data["gill-color"].unique()

* Now let's convert all the features under the "gill" heading to 1s and 0s by applying one-hot encoding.

In [None]:
# One-hot encode the remaining gill features; the prefix keeps the new column names unique
gill_cols = ["gill-spacing", "gill-size", "gill-color"]
for col in gill_cols:
    data = pd.concat([data, pd.get_dummies(data[col], prefix=col, dtype=int)], axis=1)

# Drop the original categorical columns once they are encoded
data.drop(gill_cols, axis=1, inplace=True)

<a id = "16"></a><br>
### Stalk - Class Feature Engineering


1. stalk-surface-above-ring


* Let's merge the "y" category into the "s" category because it contains only a few samples.

In [None]:
data["stalk-surface-above-ring"] = data["stalk-surface-above-ring"].replace("y" ,"s")
data["stalk-surface-above-ring"].unique()

2. stalk-color-above-ring


* Let's merge the "b", "c", "y" and "e" categories into a "stalk-color-above-other" category because they contain only a few samples.

In [None]:
data["stalk-color-above-ring"] = data["stalk-color-above-ring"].replace(["b","c","y", "e"] ,"stalk-color-above-other")
data["stalk-color-above-ring"].unique()

3. stalk-color-below-ring


* Let's merge the "e", "c" and "y" categories into a "stalk-color-below-other" category because they contain only a few samples.

In [None]:
data["stalk-color-below-ring"] = data["stalk-color-below-ring"].replace(["c","y", "e"] ,"stalk-color-below-other")
data["stalk-color-below-ring"].unique()

* Now let's convert all the features under the "stalk" heading to 1s and 0s by applying one-hot encoding.

In [None]:
# One-hot encode every stalk feature; the prefix keeps the new column names unique
stalk_cols = ["stalk-shape", "stalk-root", "stalk-surface-below-ring",
              "stalk-surface-above-ring", "stalk-color-above-ring", "stalk-color-below-ring"]
for col in stalk_cols:
    data = pd.concat([data, pd.get_dummies(data[col], prefix=col, dtype=int)], axis=1)

# Drop the original categorical columns once they are encoded
data.drop(stalk_cols, axis=1, inplace=True)

<a id = "17"></a><br>
### Veil - Class Feature Engineering



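1. veil-color

* Let's merge the "n", "o" and "y" categories into a "veil_color_other" category because they contain only a few samples.

In [None]:
# Use a label that differs from the column name; otherwise the dummy column would also be
# called "veil-color" and would be dropped together with the original column later on
data["veil-color"] = data["veil-color"].replace(["n", "o", "y"], "veil_color_other")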

In [None]:
data["veil-color"].unique()

In [None]:
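# One-hot encode the veil features; the prefix keeps the new column names unique.
# If "veil-type" has only one distinct value, its dummy column is constant and adds no information.
veil_cols = ["veil-type", "veil-color"]
for col in veil_cols:
    data = pd.concat([data, pd.get_dummies(data[col], prefix=col, dtype=int)], axis=1)

data.drop(veil_cols, axis=1, inplace=True)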

<a id = "18"></a><br>
### Ring - Class Feature Engineering


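1. ring-number

In [None]:
# Merge the "n" category into "o"
data["ring-number"] = data["ring-number"].replace("n" , "o")
print(data["ring-number"].unique())

# One-hot encode with a prefix so the new column names stay unique
data = pd.concat([data, pd.get_dummies(data["ring-number"], prefix="ring-number", dtype=int)], axis=1)
data.drop("ring-number" , axis = 1 , inplace = True)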


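2. ring-type

In [None]:
# Merge the "f" and "n" categories into "p"
data["ring-type"] = data["ring-type"].replace(["f" , "n"] , "p")
print(data["ring-type"].unique())

# One-hot encode with a prefix so the new column names stay unique
data = pd.concat([data, pd.get_dummies(data["ring-type"], prefix="ring-type", dtype=int)], axis=1)
data.drop("ring-type" , axis = 1 , inplace = True)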


<a id = "19"></a><br>
### Other Feature - Class Feature Engineering

* bruises
* odor
* spore-print-color
* population
* habitat


1. odor

In [None]:
data["odor"] = data["odor"].replace("m" , "n")
data["odor"].unique()

2. spore-print-color

In [None]:
data["spore-print-color"] = data["spore-print-color"].replace(["u","r" ,"o", "y" ,"b"] , "spore_other")
data["spore-print-color"].unique()

In [None]:
# One-hot encode the remaining features; the prefix keeps the new column names unique
other_cols = ["odor", "bruises", "spore-print-color", "population", "habitat"]
for col in other_cols:
    data = pd.concat([data, pd.get_dummies(data[col], prefix=col, dtype=int)], axis=1)

# Drop the original categorical columns once they are encoded
data.drop(other_cols, axis=1, inplace=True)
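
Several features share the same single-letter codes (for example "n" appears in more than one column), so dummy columns named only by the code can collide. As a sketch under that assumption, `pd.get_dummies` can encode several columns in one call through its `columns` argument, which prefixes each new column with the source column name. The toy frame below is hypothetical.

```python
import pandas as pd

# Toy frame (hypothetical values): "n" is a valid code in both features
toy = pd.DataFrame({
    "class": ["e", "p", "e"],
    "odor": ["n", "f", "n"],
    "habitat": ["d", "n", "g"],
})

# Encode both columns at once; the originals are removed and the new
# columns are named "<column>_<category>"
encoded = pd.get_dummies(toy, columns=["odor", "habitat"], dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

The resulting column names are unique, which makes later steps such as inspecting feature importances easier to read.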

<a id = "20"></a><br>
### Class Feature Engineering

In [None]:
# Encode the target: poisonous -> 0, edible -> 1
data["class"] = data["class"].map({"p": 0, "e": 1})

data["class"].unique()

In [None]:
data

<a id = "21"></a><br>

### Train Test Split

In [None]:
from sklearn.model_selection import train_test_split,StratifiedKFold, GridSearchCV

In [None]:
x = data.drop("class" , axis = 1)
y = data["class"]

In [None]:
x_train , x_test , y_train , y_test = train_test_split(x,y, test_size = 0.33 , random_state = 42)

In [None]:
print("x_train: {}".format(x_train.shape))
print("x_test: {}".format(x_test.shape))
print("y_train: {}".format(y_train.shape))
print("y_test: {}".format(y_test.shape))
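
The split above is purely random. If keeping the edible/poisonous ratio identical in both parts matters, `train_test_split` accepts a `stratify` argument. The following is a small sketch on synthetic data; the names `x_toy` and `y_toy` and the 60/40 class ratio are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded features and the 0/1 target
rng = np.random.default_rng(0)
x_toy = pd.DataFrame(rng.integers(0, 2, size=(100, 3)), columns=["a", "b", "c"])
y_toy = pd.Series([1] * 60 + [0] * 40, name="class")

# stratify=y_toy keeps the class proportions the same in train and test
x_tr, x_te, y_tr, y_te = train_test_split(
    x_toy, y_toy, test_size=0.25, random_state=42, stratify=y_toy
)

print("train share of class 1:", y_tr.mean())
print("test share of class 1:", y_te.mean())
```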

<a id = "7"></a><br>
## Modeling

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

<a id = "22"></a><br>
### 1. Decision Tree Algorithm

In [None]:
# Fix the seed so the result is reproducible, as for the other models
dt = DecisionTreeClassifier(random_state = 42)
dt.fit(x_train , y_train)

print("Decision Tree Test Accuracy: {}".format(dt.score(x_test, y_test)))


<a id = "23"></a><br>
### 2. Random Forest Algorithm

In [None]:
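# Try forests with 1 to 99 trees and record the accuracy of each one.
# Note: the score is computed on the training set, so it measures fit to the
# training data rather than generalization to unseen samples.
score_list = []
for i in range(1,100):
    rf = RandomForestClassifier(n_estimators = i , random_state = 42)
    rf.fit(x_train , y_train)
    score = rf.score(x_train , y_train)
    score_list.append(score)


plt.plot(range(1,100) , score_list)
plt.xlabel("n_estimators")
plt.ylabel("Training Accuracy")
plt.show()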

In [None]:
rf = RandomForestClassifier(n_estimators = 3 , random_state = 42)
rf.fit(x_train , y_train)

print("Random Forest Test Accuracy {}".format(rf.score(x_test , y_test)))
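
Choosing the number of trees from training accuracy can be misleading, because a forest usually fits its own training data very well. One common alternative is k-fold cross-validation on the training data. The sketch below uses synthetic data from `make_classification` and a small, arbitrary set of candidate values; both are assumptions and not results from the mushroom data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for x_train / y_train
x_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)

cv_means = {}
for n in (1, 5, 25):
    model = RandomForestClassifier(n_estimators=n, random_state=42)
    # 5-fold cross-validation: each fold is held out once for scoring
    scores = cross_val_score(model, x_demo, y_demo, cv=5)
    cv_means[n] = scores.mean()

# Candidate with the highest mean validation accuracy
best_n = max(cv_means, key=cv_means.get)
print(cv_means)
print("best n_estimators:", best_n)
```

With the notebook's data, `x_train` and `y_train` would take the place of the synthetic arrays, leaving the test set untouched until the final evaluation.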

<a id = "24"></a><br>
### 3. Logistic Regression

In [None]:
lr = LogisticRegression()
lr.fit(x_train , y_train)
print("Logistic Regression Test Accuracy {}".format(lr.score(x_test , y_test)))

<a id = "25"></a><br>
### 4. Support Vector Machine

In [None]:
svm = SVC(random_state = 42)
svm.fit(x_train , y_train)
print("Support Vector Machine Test Accuracy {}".format(svm.score(x_test , y_test)))

<a id = "26"></a><br>
### 5. Naive Bayes Algorithm

In [None]:
nb = GaussianNB()
nb.fit(x_train , y_train)
print("Naive Bayes Test Accuracy {}".format(nb.score(x_test , y_test)))
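
Accuracy treats both kinds of error equally, but for this problem predicting "edible" for a poisonous mushroom is the costly mistake. A confusion matrix separates the two error types. The sketch below uses synthetic data and a decision tree purely for illustration; it assumes the same encoding as above (0 = poisonous, 1 = edible) and does not report results for the mushroom data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded mushroom features and 0/1 target
x_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=42)
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.33, random_state=42, stratify=y_demo
)

clf = DecisionTreeClassifier(random_state=42).fit(x_tr, y_tr)
y_pred = clf.predict(x_te)

# Rows are true classes, columns are predicted classes, in the order [0, 1]
cm = confusion_matrix(y_te, y_pred, labels=[0, 1])

# True class 0 (poisonous) predicted as class 1 (edible): the dangerous error
poisonous_as_edible = cm[0, 1]

# Share of truly poisonous samples that the model flags as poisonous
poison_recall = recall_score(y_te, y_pred, pos_label=0)

print(cm)
print("poisonous predicted as edible:", poisonous_as_edible)
print("recall for the poisonous class:", poison_recall)
```

The same calls could be applied to any of the fitted models above by passing `y_test` and that model's predictions on `x_test`.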