**Problem Statement:**

This dataset describes hypothetical samples corresponding to 23 species of gilled
mushrooms in the Agaricus and Lepiota family, drawn from The Audubon Society Field
Guide to North American Mushrooms (1981). Each species is labelled as definitely
edible, definitely poisonous, or of unknown edibility and not recommended. This last
category was merged with the poisonous category. The Guide states clearly that there
is no simple rule for judging a mushroom's edibility, nothing like "leaflets three, let it
be" for poison oak and poison ivy.

The goal is to predict whether a given mushroom is poisonous or edible.

In [2]:
import pandas as pd
import numpy as np
from plotly import express as px
import matplotlib.pyplot as plt
import seaborn as sns



In [3]:
df=pd.read_csv(r'/content/mushrooms.csv')

In [4]:
pd.set_option('display.max_columns', None)

In [5]:
df

Unnamed: 0,class,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,stalk-shape,stalk-root,stalk-surface-above-ring,stalk-surface-below-ring,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat
0,p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
1,e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
2,e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
3,p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
4,e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8119,e,k,s,n,f,n,a,c,b,y,e,?,s,s,o,o,p,o,o,p,b,c,l
8120,e,x,s,n,f,n,a,c,b,y,e,?,s,s,o,o,p,n,o,p,b,v,l
8121,e,f,s,n,f,n,a,c,b,n,e,?,s,s,o,o,p,o,o,p,b,c,l
8122,p,k,y,n,f,y,f,c,n,b,t,?,s,k,w,w,p,w,o,e,w,v,l


In [6]:
df.shape

(8124, 23)

In [7]:
df.info()  # call the method; without the parentheses pandas only prints the bound-method repr

In [8]:
pd.set_option('display.max_colwidth', 1000)

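def categorical_var_summary(col):
  # apply() passes one column (a Series) at a time
  counts=col.value_counts()
  mode=counts.reset_index()
  # N holds the full level counts; Mode/Frequency are the most common level and its count
  return pd.Series([counts,col.isna().sum(),mode.iloc[0,0],mode.iloc[0,1]],
                   index=['N','NMissing','Mode','Frequency'])
x=df.copy()
x.apply(categorical_var_summary).T.round(2)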

Unnamed: 0,N,NMissing,Mode,Frequency
class,"e 4208 p 3916 Name: class, dtype: int64",0,e,4208
cap-shape,"x 3656 f 3152 k 828 b 452 s 32 c 4 Name: cap-shape, dtype: int64",0,x,3656
cap-surface,"y 3244 s 2556 f 2320 g 4 Name: cap-surface, dtype: int64",0,y,3244
cap-color,"n 2284 g 1840 e 1500 y 1072 w 1040 b 168 p 144 c 44 u 16 r 16 Name: cap-color, dtype: int64",0,n,2284
bruises,"f 4748 t 3376 Name: bruises, dtype: int64",0,f,4748
odor,"n 3528 f 2160 y 576 s 576 a 400 l 400 p 256 c 192 m 36 Name: odor, dtype: int64",0,n,3528
gill-attachment,"f 7914 a 210 Name: gill-attachment, dtype: int64",0,f,7914
gill-spacing,"c 6812 w 1312 Name: gill-spacing, dtype: int64",0,c,6812
gill-size,"b 5612 n 2512 Name: gill-size, dtype: int64",0,b,5612
gill-color,"b 1728 p 1492 w 1202 n 1048 g 752 h 732 u 492 k 408 e 96 y 86 o 64 r 24 Name: gill-color, dtype: int64",0,b,1728
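
For comparison, a similar per-column summary can be obtained from pandas' built-in `describe`. The sketch below uses a small toy frame with made-up values (not the real `mushrooms.csv`), and the `n_missing` column name is only an illustrative choice.

```python
import pandas as pd

# Toy stand-in for the mushroom data (hypothetical values, not the real file).
toy = pd.DataFrame({
    "class": ["e", "p", "e", "e"],
    "odor": ["n", "f", "n", "a"],
    "stalk-root": ["?", "b", "?", "e"],
})

# For categorical columns, describe() reports:
# count = non-null values, unique = distinct levels,
# top = most frequent level (the mode), freq = how often it occurs.
summary = toy.describe(include="all").T

# Add the number of true nulls per column (the "?" placeholder is NOT counted here).
summary["n_missing"] = toy.isna().sum()
print(summary)
```

Note that `isna()` does not see the `?` placeholder, which is why `NMissing` is 0 for every column in the table above.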


In [None]:
# stalk-root uses "?" to mark missing values
# veil-type has only one distinct value
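
One way to spot single-valued columns such as `veil-type` programmatically is to count distinct levels per column. This is a minimal sketch on a toy frame with made-up values; `constant_cols` is a hypothetical name introduced for illustration.

```python
import pandas as pd

# Toy stand-in (hypothetical values): veil-type is constant, the others vary.
toy = pd.DataFrame({
    "class": ["e", "p", "e", "p"],
    "veil-type": ["p", "p", "p", "p"],
    "stalk-root": ["?", "b", "?", "e"],
})

# A column with a single distinct value cannot help separate the classes.
constant_cols = [c for c in toy.columns if toy[c].nunique() == 1]
print(constant_cols)
```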


**Attribute Information:**

classes: edible=e, poisonous=p

cap-shape: bell=b,conical=c,convex=x,flat=f, knobbed=k,sunken=s

cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s

cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r,pink=p,purple=u,red=e,white=w,yellow=y

bruises: bruises=t,no=f

odor: almond=a,anise=l,creosote=c,fishy=y,foul=f,musty=m,none=n,pungent=p,spicy=s

gill-attachment: attached=a,descending=d,free=f,notched=n

gill-spacing: close=c,crowded=w,distant=d

gill-size: broad=b,narrow=n

gill-color: black=k,brown=n,buff=b,chocolate=h,gray=g, green=r,orange=o,pink=p,purple=u,red=e,white=w,yellow=y

stalk-shape: enlarging=e,tapering=t

stalk-root: bulbous=b,club=c,cup=u,equal=e,rhizomorphs=z,rooted=r,missing=?

stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s

stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s

stalk-color-above-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,pink=p,red=e,white=w,yellow=y

stalk-color-below-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,pink=p,red=e,white=w,yellow=y

veil-type: partial=p,universal=u

veil-color: brown=n,orange=o,white=w,yellow=y

ring-number: none=n,one=o,two=t

ring-type: cobwebby=c,evanescent=e,flaring=f,large=l,none=n,pendant=p,sheathing=s,zone=z

spore-print-color: black=k,brown=n,buff=b,chocolate=h,green=r,orange=o,purple=u,white=w,yellow=y

population: abundant=a,clustered=c,numerous=n,scattered=s,several=v,solitary=y

habitat: grasses=g,leaves=l,meadows=m,paths=p,urban=u,waste=w,woods=d

We will replace these single-letter codes with their full descriptive names to make the data easier to read.


The dataset has 8124 rows and 23 columns.

In [9]:
# Share of rows where stalk-root is the "?" placeholder, expressed as a percentage
missing_pct=df[df['stalk-root']=="?"].shape[0]/df.shape[0]*100
print(f'stalk-root contains {missing_pct:.2f}% missing/null values')

stalk-root contains 30.53% missing/null values
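
An alternative is to convert the `?` placeholder to a real null first, so that `isna()` and similar pandas tools pick it up. The sketch below assumes a toy frame with made-up values; `cleaned` and `missing_pct` are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy stand-in (hypothetical values): one of four stalk-root entries is "?".
toy = pd.DataFrame({"stalk-root": ["?", "b", "e", "c"]})

# Turn the placeholder into NaN so pandas treats it as missing.
cleaned = toy.replace("?", np.nan)

# mean() of the boolean mask is the fraction missing; scale to a percentage.
missing_pct = cleaned.isna().mean() * 100
print(missing_pct.round(2))
```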


In [10]:
df = df.replace({
    "class":{'e':'edible','p':'poisonous'},
    "cap-shape":{'b':'bell','c':'conical','x':'convex','f':'flat','k':'knobbed','s':'sunken'},
    "cap-surface":{'f':'fibrous','g':'grooves','y':'scaly','s':'smooth'},
    "cap-color":{'n':'brown','b':'buff','c':'cinnamon','g':'gray','r':'green','p':'pink','u':'purple','e':'red','w':'white','y':'yellow'},
    "bruises":{'t':'yes','f':'no'},
    "odor":{'a':'almond','l':'anise','c':'creosote','y':'fishy','f':'foul','m':'musty','n':'none','p':'pungent','s':'spicy'},
    "gill-attachment":{'a':'attached','d':'descending','f':'free','n':'notched'},
    "gill-spacing":{'c':'close','w':'crowded','d':'distant'},
    "gill-size":{'b':'broad','n':'narrow'},
    "gill-color":{'k':'black','n':'brown','b':'buff','h':'chocolate','g':'gray','r':'green','o':'orange','p':'pink','u':'purple','e':'red',
                  'w':'white','y':'yellow'},
    "stalk-shape":{'e':'enlarging','t':'tapering'},
    "stalk-root":{'b':'bulbous','c':'club','u':'cup','e':'equal','z':'rhizomorphs','r':'rooted'},
    "stalk-surface-above-ring":{'f':'fibrous','y':'scaly','k':'silky','s':'smooth'},
    "stalk-surface-below-ring":{'f':'fibrous','y':'scaly','k':'silky','s':'smooth'},
    "stalk-color-above-ring":{'n':'brown','b':'buff','c':'cinnamon','g':'gray','o':'orange','p':'pink','e':'red','w':'white','y':'yellow'},
    "stalk-color-below-ring":{'n':'brown','b':'buff','c':'cinnamon','g':'gray','o':'orange','p':'pink','e':'red','w':'white','y':'yellow'},
    "veil-type":{'p':'partial','u':'universal'},
    "veil-color":{'n':'brown','o':'orange','w':'white','y':'yellow'},
    "ring-number":{'n':'none','o':'one','t':'two'},
    "ring-type":{'c':'cobwebby','e':'evanescent','f':'flaring','l':'large','n':'none','p':'pendant','s':'sheathing','z':'zone'},
    "spore-print-color":{'k':'black','n':'brown','b':'buff','h':'chocolate','r':'green','o':'orange','u':'purple','w':'white','y':'yellow'},
    "population":{'a':'abundant','c':'clustered','n':'numerous','s':'scattered','v':'several','y':'solitary'},
    "habitat":{'g':'grasses','l':'leaves','m':'meadows','p':'paths','u':'urban','w':'waste','d':'woods'}
})

In [11]:
df.head()

Unnamed: 0,class,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,stalk-shape,stalk-root,stalk-surface-above-ring,stalk-surface-below-ring,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat
0,poisonous,convex,smooth,brown,yes,pungent,free,close,narrow,black,enlarging,equal,smooth,smooth,white,white,partial,white,one,pendant,black,scattered,urban
1,edible,convex,smooth,yellow,yes,almond,free,close,broad,black,enlarging,club,smooth,smooth,white,white,partial,white,one,pendant,brown,numerous,grasses
2,edible,bell,smooth,white,yes,anise,free,close,broad,brown,enlarging,club,smooth,smooth,white,white,partial,white,one,pendant,brown,numerous,meadows
3,poisonous,convex,scaly,white,yes,pungent,free,close,narrow,brown,enlarging,equal,smooth,smooth,white,white,partial,white,one,pendant,black,scattered,urban
4,edible,convex,smooth,gray,no,none,free,crowded,broad,black,tapering,equal,smooth,smooth,white,white,partial,white,one,evanescent,brown,abundant,grasses


In [12]:
# veil-type has a single value in every row, so it carries no information
df.drop(['veil-type'],axis=1,inplace=True)


In [13]:
df.shape

(8124, 22)

In [14]:
data1=df.groupby(['class'])['class'].count()

In [15]:
fig=px.pie(df,values=data1,names=data1.index,title="EDIBLE VS POISONOUS")
fig.show()

In [16]:
data2 = df.groupby(['bruises'])['bruises'].count()
fig2 =px.pie(df,values=data2,names=data2.index,title="Mushroom Bruises")
fig2.show()

In [29]:
data3 = df.groupby(['odor'])['odor'].count()
fig3 =px.bar(data3,x=data3.index,y='odor')
fig3.show()

In [18]:
CapShape_Class = df.groupby(['class','cap-shape']).size().reset_index().pivot(
    columns='class',index='cap-shape',values=0).rename(columns={0:"counts"})
CapShape_Class

class,edible,poisonous
cap-shape,Unnamed: 1_level_1,Unnamed: 2_level_1
bell,404.0,48.0
conical,,4.0
convex,1948.0,1708.0
flat,1596.0,1556.0
knobbed,228.0,600.0
sunken,32.0,
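
The same class-by-feature table can also be built with `pd.crosstab`, which fills absent combinations with 0 instead of NaN. This is a sketch on a toy frame with made-up values, not the notebook's actual output.

```python
import pandas as pd

# Toy stand-in (hypothetical values, not the real dataset).
toy = pd.DataFrame({
    "class": ["edible", "poisonous", "edible", "edible", "poisonous"],
    "cap-shape": ["bell", "conical", "convex", "sunken", "convex"],
})

# Rows are cap shapes, columns are classes, cells are integer sample counts.
counts = pd.crosstab(toy["cap-shape"], toy["class"])
print(counts)
```

Because the result is a plain wide-form DataFrame of integers, it should be usable for grouped bar charts in the same way as the pivoted tables in this notebook.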


In [19]:
fig4=px.bar(CapShape_Class,x=CapShape_Class.index,y=[CapShape_Class['edible'],CapShape_Class['poisonous']])
fig4.show()

In [20]:
Bruises_Class = df.groupby(['class','bruises']).size().reset_index().pivot(
    columns='class',index='bruises',values=0).rename(columns={0:"counts"})
fig5=px.bar(Bruises_Class,x=Bruises_Class.index,y=[Bruises_Class['edible'],Bruises_Class['poisonous']])
fig5.show()

In [21]:
OdorClass = df.groupby(['class','odor']).size().reset_index().pivot(
    columns='class',index='odor',values=0).rename(columns={0:"counts"})
fig5=px.bar(OdorClass,x=OdorClass.index,y=[OdorClass['edible'],OdorClass['poisonous']])
fig5.show()

In [23]:
PopulationClass = df.groupby(['class','population']).size().reset_index().pivot(
    columns='class',index='population',values=0).rename(columns={0:"counts"})
PopulationClass

class,edible,poisonous
population,Unnamed: 1_level_1,Unnamed: 2_level_1
abundant,384.0,
clustered,288.0,52.0
numerous,400.0,
scattered,880.0,368.0
several,1192.0,2848.0
solitary,1064.0,648.0


In [24]:
fig6=px.bar(PopulationClass,x=PopulationClass.index,y=[PopulationClass['edible'],PopulationClass['poisonous']])
fig6.show()

In [26]:
HabitatClass = df.groupby(['class','habitat']).size().reset_index().pivot(
    columns='class',index='habitat',values=0).rename(columns={0:"counts"})
fig7=px.bar(HabitatClass,x=HabitatClass.index,y=[HabitatClass['edible'],HabitatClass['poisonous']])
fig7.show()

In [27]:
StalkColorBelowRing_Class = df.groupby(['class','stalk-color-below-ring']).size().reset_index().pivot(
    columns='class',index='stalk-color-below-ring',values=0).rename(columns={0:"counts"})
fig8=px.bar(StalkColorBelowRing_Class,x=StalkColorBelowRing_Class.index,y=[StalkColorBelowRing_Class['edible'],StalkColorBelowRing_Class['poisonous']])
fig8.show()

# Summary

* Data authentication: The data was sourced from iNeuron.ai and is suitable for analysis.
* Data bias: After analysing the data, it does not appear to be biased.
1. The target column has two classes: 'poisonous' with 3916 samples and 'edible' with 4208 samples. The counts are nearly equal, so the data is balanced.
2. There are 4 types of cap-surface. According to our data, no 'edible' mushroom has the cap-surface 'grooves' (g).
3. 51.8% of the mushrooms are edible.
4. Identifying mushrooms by bruising alone is a bad idea. The cited source notes that some people think all blue-bruising mushrooms are safe to eat or are hallucinogenic, and that a rule for boletes shows this is not true. (source: 'https://www.mushroom-appreciation.com/identifying-mushrooms.html#:~:text=The%20spores%20and%20stem%20turn,alone%20is%20a%20bad%20idea!')
5. 3528 mushrooms have no odor.
6. In this dataset, all mushrooms with a sunken cap-shape are edible and all with a conical cap-shape are poisonous. The other cap shapes are mixed.
7. Mushrooms without bruises are more likely to be poisonous, while mushrooms with bruises are less likely to be poisonous.
8. Mushrooms with an almond or anise odor are edible, and mushrooms with no odor have a high chance of being edible. Mushrooms with any other odor are not recommended for eating.
9. According to this data, mushrooms in the abundant and numerous population classes are all edible, while the other population classes are mixed.
10. According to this data, no 'poisonous' mushroom has the habitat type 'waste'.
11. For stalk color below the ring, gray, orange, and red mushrooms are all edible, and buff, cinnamon, and yellow ones are all poisonous. Brown ones have a higher chance of being poisonous.
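
Observations like those about cap-shape, odor, and population can be checked numerically by computing the share of poisonous samples per level of a feature. The sketch below is one possible approach on a toy frame with made-up values; `poisonous_share` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in (hypothetical values, not the real dataset).
toy = pd.DataFrame({
    "class": ["edible", "edible", "poisonous", "poisonous", "edible", "poisonous"],
    "odor": ["almond", "none", "foul", "foul", "none", "none"],
})

def poisonous_share(frame, feature, target="class", positive="poisonous"):
    """Per level of `feature`: sample count and fraction labelled `positive`."""
    is_positive = frame[target].eq(positive)
    grouped = is_positive.groupby(frame[feature])
    out = pd.DataFrame({"n": grouped.size(), "share": grouped.mean()})
    # Levels with share 1.0 are all poisonous in the data; 0.0 means all edible.
    return out.sort_values("share", ascending=False)

result = poisonous_share(toy, "odor")
print(result)
```

A share of exactly 0.0 or 1.0 only describes the samples at hand, so levels with a small `n` (such as the 4 conical caps in this dataset) deserve extra caution.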
