# 🍄 Mushroom Dataset for Binary Classification

## 📝 About the Dataset
This dataset is a **cleaned version** of the original Mushroom Dataset from the **UCI Machine Learning Repository**. It has been preprocessed with the cleaning steps listed below to make it easier to use for binary classification tasks.

---

### 🛠️ Cleaning Techniques Applied
- 🧹 **Modal Imputation**: Missing values are replaced with the most frequent value in their column.
- 🔄 **One-Hot Encoding**: Categorical variables are converted into numerical form.
- 📊 **Z-Score Normalization**: Numerical columns are rescaled to zero mean and unit standard deviation.
- ✂️ **Feature Selection**: Only the most important attributes are retained.
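
The first three steps above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used to produce the dataset: the toy rows and the category names (`convex`, `flat`) are hypothetical, and feature selection is left out because the selection criterion is not described.

```python
import pandas as pd

# Hypothetical raw rows, for illustration only (not taken from the dataset).
raw = pd.DataFrame({
    "cap-shape": ["convex", "flat", None, "convex"],
    "stem-height": [4.0, 6.0, 4.0, None],
})

cleaned = raw.copy()

# Modal imputation: fill each column's gaps with its most frequent value.
for col in cleaned.columns:
    cleaned[col] = cleaned[col].fillna(cleaned[col].mode().iloc[0])

# One-hot encoding: one 0/1 indicator column per category.
encoded = pd.get_dummies(cleaned, columns=["cap-shape"], dtype=int)

# Z-score normalization: subtract the mean, divide by the standard deviation.
col = "stem-height"
encoded[col] = (encoded[col] - encoded[col].mean()) / encoded[col].std(ddof=0)

print(encoded)
```

Imputation runs first so that the encoding and scaling steps never see missing values.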

---

## 📋 Dataset Features
The dataset includes **9 columns** (8 features and 1 target):

1. **Cap Diameter** 🌐  
   - The diameter of the mushroom cap.

2. **Cap Shape** 🍂  
   - The shape of the mushroom cap.

3. **Gill Attachment** 🪶  
   - How the gills are attached to the mushroom cap.

4. **Gill Color** 🌈  
   - The color of the gills.

5. **Stem Height** 📏  
   - The height of the mushroom stem.

6. **Stem Width** 📐  
   - The width of the mushroom stem.

7. **Stem Color** 🎨  
   - The color of the mushroom stem.

8. **Season** 🗓️  
   - The season in which the mushroom is found.

9. **Target Class** 🎯  
   - Indicates if the mushroom is edible or poisonous:
     - **0**: Edible 🍽️  
     - **1**: Poisonous ☠️  
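
As a small sketch of how the target encoding can be used, the snippet below maps the 0/1 labels back to readable names and computes the class balance. The label values here are hypothetical and only illustrate the mapping; `CLASS_NAMES` is a name introduced for this example.

```python
import pandas as pd

# Encoding stated in the feature list: 0 = edible, 1 = poisonous.
CLASS_NAMES = {0: "edible", 1: "poisonous"}

# Hypothetical labels, for illustration only (not taken from the dataset).
labels = pd.Series([1, 0, 1, 1, 0], name="class")

# Map the numeric codes to names, then compute each class's share.
readable = labels.map(CLASS_NAMES)
balance = readable.value_counts(normalize=True)

print(balance)
```

Checking the balance before training helps decide whether plain accuracy is a reasonable metric.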

---

## 🌟 Key Highlights
- The dataset is suitable for **binary classification tasks**.  
- Pre-cleaned for easy model training and analysis.  
- Aimed at distinguishing between edible and poisonous mushrooms.

---

✨ **Happy Exploring and Modeling!** 🚀


- **About the author:** Sajjad Ali Shah
- **LinkedIn:** [LinkedIn Profile](https://www.linkedin.com/in/sajjad-ali-shah47/)
- **Dataset link:** [Mushroom Dataset (Binary Classification)](https://www.kaggle.com/datasets/prishasawhney/mushroom-dataset/data)

---

In [6]:
# Import the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Render plots inline in the notebook
%matplotlib inline

In [7]:
# Load the cleaned mushroom dataset
df = pd.read_csv("./dataset/mushroom_cleaned.csv")

In [8]:
df.head()

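| | cap-diameter | cap-shape | gill-attachment | gill-color | stem-height | stem-width | stem-color | season | class |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1372 | 2 | 2 | 10 | 3.807467 | 1545 | 11 | 1.804273 | 1 |
| 1 | 1461 | 2 | 2 | 10 | 3.807467 | 1557 | 11 | 1.804273 | 1 |
| 2 | 1371 | 2 | 2 | 10 | 3.612496 | 1566 | 11 | 1.804273 | 1 |
| 3 | 1261 | 6 | 2 | 10 | 3.787572 | 1566 | 11 | 1.804273 | 1 |
| 4 | 1305 | 6 | 2 | 10 | 3.711971 | 1464 | 11 | 0.943195 | 1 |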
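
Before going further, it can help to confirm that the loaded frame matches the description given earlier. The sketch below is one way to do that. `check_schema` and `EXPECTED_COLUMNS` are hypothetical names introduced here, and the two sample rows are copied by hand from the preview above so the example runs without the CSV file.

```python
import pandas as pd

# Column names as they appear in the df.head() preview.
EXPECTED_COLUMNS = [
    "cap-diameter", "cap-shape", "gill-attachment", "gill-color",
    "stem-height", "stem-width", "stem-color", "season", "class",
]


def check_schema(frame: pd.DataFrame) -> None:
    """Hypothetical helper: raise ValueError if the frame looks wrong."""
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # The target must contain only 0 (edible) and 1 (poisonous).
    unexpected = set(frame["class"].unique()) - {0, 1}
    if unexpected:
        raise ValueError(f"unexpected class labels: {sorted(unexpected)}")


# First two rows of the preview, rebuilt by hand for illustration.
preview = pd.DataFrame(
    [
        [1372, 2, 2, 10, 3.807467, 1545, 11, 1.804273, 1],
        [1461, 2, 2, 10, 3.807467, 1557, 11, 1.804273, 1],
    ],
    columns=EXPECTED_COLUMNS,
)

check_schema(preview)  # passes silently when the schema matches
```

In the notebook itself, the same call would be `check_schema(df)` right after loading the file.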
