<a id="1"></a>
# <div style="text-align:center; border-radius:30px 30px; padding:7px; color:white; margin:0; font-size:150%; font-family:Arial; background-color:#0033cc; overflow:hidden"><b> Mushroom Mystery: Edible or Poisonous? </b></div>

![mushrooms](https://img.freepik.com/free-vector/different-types-mushrooms_1308-86573.jpg?t=st=1722544742~exp=1722548342~hmac=afb99520b515d7ae34bf06b713fd9b50277d5d432b6f9b455d53c475f48bf5ee&w=2000)

<div style="padding: 20px; border-color: #4CAF50; border-radius: 8px; box-shadow: 0 2px 6px 0 rgba(0, 0, 0, 0.2); border: 2px solid #4CAF50; width: 75%; margin: 20px auto; background-color: #f4fff4;">
    <p style="font-size: 20px; font-family: 'Georgia'; line-height: 1.8em; color: #333;">
        Imagine stepping into the enchanting world of mycology, where every mushroom tells a story. As an aspiring <strong>"Mushroom Inspector"</strong>, your task is to determine the safety of these fungi based on their features. Some mushrooms may appear harmless yet harbor toxic secrets, while others might be perfectly safe but are often avoided due to their unusual looks.
    </p>
    <p style="font-size: 20px; font-family: 'Georgia'; line-height: 1.8em; color: #333;">
        In this project, we dive into a carefully curated dataset that mimics real-world mushroom observations. It records characteristics such as cap color, gill size, and spore print color, traits that hold the key to determining edibility. By applying data science techniques, you’ll transform raw data into actionable insights.
    </p>
    <p style="font-size: 20px; font-family: 'Georgia'; line-height: 1.8em; color: #333;">
        From exploratory data analysis (EDA) to predictive modeling, this journey equips you with the skills to spot patterns and design an intelligent system that distinguishes between edible and poisonous mushrooms, ensuring safer adventures in the forest.
    </p>
</div>


- <a href="#libraries">1. Importing Required Libraries</a>
- <a href="#data">2. Reading and Understanding our Data</a>
- <a href="#clean">3. Data Cleaning</a>
    - <a href="#infreq">3.1. Deal with Infrequent Categories</a> 
    - <a href="#numerical">3.2. Fill Missing Values in Numerical Columns</a> 
    - <a href="#impute">3.3. Impute Categorical Missing Values</a> 
    - <a href="#dup">3.4. Drop Duplicates</a> 


In [None]:
import numpy as np
import pandas as pd 
import os 
import matplotlib.pyplot as plt

In [None]:
os.chdir('D:\\data analyst\\kaggle datasets\\mushroom classification\\uci\\')

In [None]:
# The file has no header row, so the column names are supplied explicitly.
# Without header=None, pandas would use the first record as the header
# and silently drop it from the data.
column_names=['target','cap-shape','cap-surface','cap-color','bruises','odor',
              'gill-attachment','gill-spacing','gill-size','gill-color',
              'stalk-shape','stalk-root','stalk-surface-above-ring',
              'stalk-surface-below-ring','stalk-color-above-ring',
              'stalk-color-below-ring','veil-type','veil-color','ring-number',
              'ring-type','spore-print-color','population','habitat']
mdata=pd.read_csv('agaricus-lepiota.csv',header=None,names=column_names)

In [None]:
mdata.head()
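
When a CSV file has no header row, pandas by default promotes the first record to column names and de-duplicates repeated values with suffixes such as `.1`. A minimal sketch of that behavior on a tiny in-memory sample (the two-row text and the shortened column list are made up for illustration):

```python
import io
import pandas as pd

# Two made-up records in a headerless layout (not the real file).
raw = "p,x,s,p\ne,b,y,a\n"

# Default: the first record becomes the header; the repeated 'p' is renamed 'p.1'.
inferred = pd.read_csv(io.StringIO(raw))

# header=None keeps every record as data; names= supplies the labels.
short_names = ['target', 'cap-shape', 'cap-surface', 'odor']
explicit = pd.read_csv(io.StringIO(raw), header=None, names=short_names)

print(list(inferred.columns), len(inferred))
print(list(explicit.columns), len(explicit))
```

Comparing the two row counts is a quick way to check whether a record was consumed as the header.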

In [None]:
mdata.info()

In [None]:
for x in mdata:
    print(mdata[x].value_counts())
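
Printing every `value_counts()` produces a lot of output. As a compact complement, the sketch below summarizes the number of distinct levels per column, flags columns with a single level, and counts `'?'` placeholders. The toy frame is invented for illustration and only borrows column names from the dataset.

```python
import pandas as pd

# Invented toy data; only the column names come from the notebook.
toy = pd.DataFrame({
    'veil-type': ['p', 'p', 'p', 'p'],
    'bruises': ['t', 'f', 't', 'f'],
    'stalk-root': ['e', '?', 'b', '?'],
})

# Number of distinct categories in each column.
n_levels = toy.nunique()

# Columns with one level carry no information for a classifier.
constant_cols = n_levels[n_levels == 1].index.tolist()

# How often the '?' placeholder appears in each column.
placeholder_counts = (toy == '?').sum()

print(n_levels)
print(constant_cols)
print(placeholder_counts)
```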

In [None]:
mdata['stalk-root'].value_counts()

In [None]:
# '?' marks a missing stalk-root value; keep only the rows where it is known
mdata_new=mdata[mdata['stalk-root']!='?'].copy()


In [None]:
mdata_new['stalk-root'].value_counts()
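
An alternative way to handle the `'?'` placeholder is to convert it to a real missing value first, which makes it possible to measure how much data is affected before dropping anything. A minimal sketch on invented toy data:

```python
import numpy as np
import pandas as pd

# Invented toy data using the same '?' placeholder.
toy = pd.DataFrame({
    'stalk-root': ['e', '?', 'b', '?', 'c'],
    'target': ['p', 'e', 'e', 'p', 'e'],
})

# Turn the placeholder into NaN so pandas' missing-data tools apply.
marked = toy.replace('?', np.nan)

# Fraction of rows with a missing stalk-root.
missing_share = marked['stalk-root'].isna().mean()

# Drop only the rows where stalk-root is missing.
kept = marked.dropna(subset=['stalk-root'])

print(missing_share, len(kept))
```

If the missing share is large, dropping rows may shift the class balance, which is why the notebook compares the target distribution before and after.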

In [None]:
# Class balance before and after dropping rows with a missing stalk-root
mdata['target'].value_counts().plot.pie(autopct='%1.1f%%',title='target balance')
plt.figure()
mdata_new['target'].value_counts().plot.pie(autopct='%1.1f%%',title="target balance after removing rows with missing stalk-root")
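
The same before/after comparison can be read as numbers rather than pie charts by using `value_counts(normalize=True)`. The sketch below uses a made-up label series and an arbitrary subset standing in for the filtered data:

```python
import pandas as pd

# Made-up labels; 'after' simulates the data once some rows are dropped.
before = pd.Series(['p', 'e', 'e', 'p', 'e', 'e'], name='target')
after = before.iloc[[0, 1, 2, 4]]

# Class proportions side by side, aligned on the class label.
balance = pd.DataFrame({
    'before': before.value_counts(normalize=True),
    'after': after.value_counts(normalize=True),
})

print(balance)
```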


In [None]:
from ipywidgets import interact
import seaborn as sns

In [None]:
mdata_new['target']=mdata_new['target'].map({'p':0,'e':1})
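
`Series.map` with a dictionary returns `NaN` for any value that is not a key, so it is worth checking the result. The sketch below, on made-up labels, shows both an unexpected label and what happens if the mapping cell is run a second time on already encoded values:

```python
import pandas as pd

mapping = {'p': 0, 'e': 1}

# Made-up labels, including one unexpected value 'x'.
labels = pd.Series(['p', 'e', 'e', 'x'])
encoded = labels.map(mapping)

# Labels that the mapping did not cover.
unmapped = labels[encoded.isna()].unique().tolist()

# Mapping again: 0 and 1 are not dictionary keys, so everything becomes NaN.
rerun = encoded.map(mapping)

print(unmapped)
print(rerun.isna().all())
```

In a notebook, re-running the encoding cell without reloading the data triggers the second case.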


In [None]:
def countplot_box(col):
    # Count of each category in `col`, split by class (0 = poisonous, 1 = edible)
    sns.countplot(x=mdata_new[col],hue=mdata_new['target'],palette='bright')

cols=mdata_new.columns.tolist()
interact(countplot_box,col=cols)

In [None]:
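def barplot_box(col):
    # barplot shows the mean of y per category; with target coded 0/1,
    # each bar is the share of edible mushrooms in that category
    sns.barplot(x=mdata_new[col],y=mdata_new['target'],palette='bright')

cols=mdata_new.columns[:]
interact(barplot_box,col=cols)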
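
Because `sns.barplot` plots the mean of `y` per category by default, the bar heights can be reproduced as a table with a `groupby`. A minimal sketch on invented data, with the target already coded as 0 (poisonous) and 1 (edible):

```python
import pandas as pd

# Invented toy data; target: 0 = poisonous, 1 = edible.
toy = pd.DataFrame({
    'bruises': ['t', 't', 'f', 'f', 'f'],
    'target': [1, 1, 0, 1, 0],
})

# Mean of a 0/1 target per category = share of edible mushrooms.
edible_share = toy.groupby('bruises')['target'].mean()

print(edible_share)
```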

In [None]:
def crosstab(col):
    return pd.crosstab(index=mdata_new[col],columns=mdata_new['target'])

interact(crosstab,col=cols)
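
Raw counts are hard to compare when categories differ in size. Passing `normalize='index'` to `pd.crosstab` turns each row into proportions, so every category shows its own class split. A minimal sketch on invented data:

```python
import pandas as pd

# Invented toy data; target: 0 = poisonous, 1 = edible.
toy = pd.DataFrame({
    'odor': ['n', 'n', 'n', 'f', 'f', 'a'],
    'target': [1, 1, 0, 0, 0, 1],
})

# Each row sums to 1: the class split within that category.
shares = pd.crosstab(index=toy['odor'], columns=toy['target'], normalize='index')

print(shares)
```

Categories whose row is close to 0 or 1 separate the classes well, which can help when deciding which features to keep.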

In [None]:
mdata_new.head()

In [None]:
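# Selected from the original mdata, so rows with stalk-root '?' are retained
# and target keeps its 'p'/'e' labels
mdata_final=mdata[['cap-surface','cap-color','bruises','gill-color','stalk-shape','habitat','target']]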

In [None]:
mdata_final.to_csv('cleaned_mushroom_data.csv',index=False)
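
After saving, a quick round trip confirms that the file reads back with the same columns and values. The sketch below writes an invented toy frame to a temporary directory; the file name `toy_mushrooms.csv` is hypothetical.

```python
import os
import tempfile
import pandas as pd

# Invented toy data standing in for the selected columns.
toy = pd.DataFrame({'cap-color': ['n', 'y'], 'target': ['p', 'e']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_mushrooms.csv')  # hypothetical file name
    # index=False avoids writing the row index as an extra column.
    toy.to_csv(path, index=False)
    restored = pd.read_csv(path)

# Same columns and same values after the round trip.
round_trip_ok = restored.to_dict('list') == toy.to_dict('list')

print(round_trip_ok)
```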