
# **Question 02**

You are working with a mushroom dataset that contains physical features of mushrooms such as cap shape, odor, gill color, and habitat. Each mushroom is labeled as either edible (e) or poisonous (p). The dataset can be accessed here: https://tinyurl.com/387y7hye

**(a) Preprocess the dataset:**
1. Encode categorical features into numeric form.
2. Drop any irrelevant or zero-variance columns.

In [1]:
# (a) Load and preprocess
import pandas as pd

# Load dataset
df = pd.read_csv("sample_data/mushrooms.csv")

print("First 5 rows:")
display(df.head())

print("\nShape:", df.shape)
print("\nColumns:", df.columns.tolist())

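# Target variable: 'class' (e = edible, p = poisonous)

# 1. Encode categorical features as integer codes.
# LabelEncoder assigns codes in sorted label order, so 'class' becomes e -> 0, p -> 1.
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
for col in df.columns:
    df[col] = le.fit_transform(df[col])  # the same encoder is refit for each column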

# 2. Drop zero-variance columns
nunique = df.nunique()
# A column with only one unique value has zero variance and cannot help prediction.
zero_var_cols = [col for col in df.columns if nunique[col] == 1]
print("\nZero-variance columns:", zero_var_cols)

df.drop(columns=zero_var_cols, inplace=True)


First 5 rows:


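|  | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | ... | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | x | s | n | t | p | f | c | n | k | e | ... | w | w | p | w | o | p | k | s | u | p |
| 1 | x | s | y | t | a | f | c | b | k | e | ... | w | w | p | w | o | p | n | n | g | e |
| 2 | b | s | w | t | l | f | c | b | n | e | ... | w | w | p | w | o | p | n | n | m | e |
| 3 | x | y | w | t | p | f | c | n | n | e | ... | w | w | p | w | o | p | k | s | u | p |
| 4 | x | s | g | f | n | f | w | b | k | t | ... | w | w | p | w | o | e | n | a | g | e |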



Shape: (8124, 23)

Columns: ['cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor', 'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color', 'stalk-shape', 'stalk-root', 'stalk-surface-above-ring', 'stalk-surface-below-ring', 'stalk-color-above-ring', 'stalk-color-below-ring', 'veil-type', 'veil-color', 'ring-number', 'ring-type', 'spore-print-color', 'population', 'habitat', 'class']

Zero-variance columns: ['veil-type']
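
The loop above refits a single `LabelEncoder` on every column, so once preprocessing finishes only the last column's mapping is still available. If the integer codes need to be traced back to the original letters, one option is to keep one encoder per column. The sketch below is a minimal illustration on a three-column toy frame built from the first five rows shown above; `toy`, `encoders`, `encoded`, and `odor_mapping` are names made up for this example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame for illustration: three columns copied from the first five rows above.
toy = pd.DataFrame({
    "odor":      ["p", "a", "l", "p", "n"],
    "veil-type": ["p", "p", "p", "p", "p"],
    "class":     ["p", "e", "e", "p", "e"],
})

# Keep one fitted encoder per column so every mapping can be inspected later.
encoders = {}
encoded = toy.copy()
for col in toy.columns:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(toy[col])

# Same zero-variance rule as in the cell above: a single unique value.
zero_var_cols = [c for c in encoded.columns if encoded[c].nunique() == 1]
encoded = encoded.drop(columns=zero_var_cols)

# classes_ is sorted, so the position of each label is its integer code.
odor_classes = encoders["odor"].classes_
odor_mapping = dict(zip(odor_classes, range(len(odor_classes))))

print("Dropped:", zero_var_cols)
print("odor mapping:", odor_mapping)
print("class labels in code order:", list(encoders["class"].classes_))
```

With the encoders stored, `encoders["odor"].inverse_transform(...)` converts codes back to the original letters.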


**(b) Build a Naive Bayes classifier to predict whether a mushroom is edible or poisonous.**

In [2]:
# (b) Naive Bayes classifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X = df.drop("class", axis=1)
y = df["class"]

# Split into training and test sets (80/20), stratified by class
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train model
nb = GaussianNB()
nb.fit(X_train, y_train)
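
`GaussianNB` models each feature as normally distributed within a class, so it treats the integer codes from part (a) as if they were measurements on a numeric scale. For purely categorical inputs, scikit-learn also provides `CategoricalNB`, which estimates a frequency table per feature instead. The sketch below shows that alternative on a few made-up rows (not taken from the dataset); it illustrates the API only and is not the model whose results are reported in this notebook.

```python
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Made-up rows for illustration only; not taken from the mushroom dataset.
toy = pd.DataFrame({
    "odor":      ["n", "n", "a", "f", "f", "p", "n", "f"],
    "gill-size": ["b", "b", "b", "n", "n", "n", "b", "n"],
    "class":     ["e", "e", "e", "p", "p", "p", "e", "p"],
})

# OrdinalEncoder maps each category to an integer code, column by column.
encoder = OrdinalEncoder()
X_toy = encoder.fit_transform(toy[["odor", "gill-size"]]).astype(int)
y_toy = toy["class"]

# CategoricalNB counts how often each code occurs within each class,
# so the codes act as category labels rather than magnitudes.
cnb = CategoricalNB(alpha=1.0)  # alpha=1.0 is Laplace smoothing
cnb.fit(X_toy, y_toy)

pred = cnb.predict(X_toy)
print(list(pred))
```

One caveat with this approach: a category code that was never seen during fitting can cause `predict` to fail, and the `min_categories` parameter is one way to reserve room for such values.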


**(c) Split the data into training and testing sets, then evaluate the model using:**
1. Accuracy score.
2. Confusion matrix.

In [3]:
# (c) Evaluate on the held-out test set
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Predictions
y_pred = nb.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("\nAccuracy:", accuracy)

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:\n", cm)

# Precision, Recall, F1
print("\nClassification Report:\n", classification_report(y_test, y_pred))



Accuracy: 0.9286153846153846

Confusion Matrix:
 [[778  64]
 [ 52 731]]

Classification Report:
               precision    recall  f1-score   support

           0       0.94      0.92      0.93       842
           1       0.92      0.93      0.93       783

    accuracy                           0.93      1625
   macro avg       0.93      0.93      0.93      1625
weighted avg       0.93      0.93      0.93      1625
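
Every figure in the report can be recovered from the four counts in the confusion matrix. The sketch below recomputes them with NumPy from the matrix printed above. Because `LabelEncoder` sorts labels alphabetically, class 0 is edible (`e`) and class 1 is poisonous (`p`); the variable names are chosen for this example.

```python
import numpy as np

# Confusion matrix from the output above: rows = true class, columns = predicted class.
# Class 0 = edible ('e'), class 1 = poisonous ('p').
cm = np.array([[778, 64],
               [52, 731]])

accuracy = np.trace(cm) / cm.sum()            # correct predictions / all predictions
precision = np.diag(cm) / cm.sum(axis=0)      # per class: correct / predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)         # per class: correct / actually that class
f1 = 2 * precision * recall / (precision + recall)

# Off-diagonal cells are the two kinds of error.
edible_as_poisonous = cm[0, 1]
poisonous_as_edible = cm[1, 0]

print(f"accuracy : {accuracy:.4f}")
print("precision:", np.round(precision, 2))
print("recall   :", np.round(recall, 2))
print("f1       :", np.round(f1, 2))
print("poisonous predicted as edible:", poisonous_as_edible)
```

Read this way, 52 poisonous mushrooms in the test set were predicted as edible and 64 edible ones were predicted as poisonous.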



**(d) Identify which features (e.g., odor or gill size) seem most important for prediction and briefly explain why.**

In [4]:
# (d) Feature importance via Mutual Information
from sklearn.feature_selection import mutual_info_classif

mi = mutual_info_classif(X, y, discrete_features=True)
feature_importance = pd.Series(mi, index=X.columns).sort_values(ascending=False)

print("\nFeature Importance (Mutual Information):")
print(feature_importance)



Feature Importance (Mutual Information):
odor                        0.628043
spore-print-color           0.333199
gill-color                  0.289027
ring-type                   0.220436
stalk-surface-above-ring    0.197357
stalk-surface-below-ring    0.188463
stalk-color-above-ring      0.175952
stalk-color-below-ring      0.167337
gill-size                   0.159531
population                  0.139987
bruises                     0.133347
habitat                     0.108709
stalk-root                  0.093448
gill-spacing                0.069927
cap-shape                   0.033823
ring-number                 0.026653
cap-color                   0.024987
cap-surface                 0.019817
veil-color                  0.016509
gill-attachment             0.009818
stalk-shape                 0.005210
dtype: float64
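
Odor stands out with a score of about 0.63, nearly twice that of the next feature, spore-print-color (0.33), while gill-size sits mid-table at about 0.16. Mutual information measures how much knowing a feature reduces uncertainty about the class. scikit-learn reports it in nats, so for a two-class problem it cannot exceed the class entropy (at most ln 2 ≈ 0.693). The sketch below makes that scale concrete with two made-up features, one that determines the class exactly and one that is independent of it. It then estimates the class entropy from the test-set support counts reported in part (c) (842 and 783), which assumes the stratified split mirrors the full dataset.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# Toy labels, not the mushroom data: a balanced binary class.
y_toy = np.array([0, 1] * 50)

# Hypothetical features for illustration only.
perfect = y_toy.copy()             # determines the class exactly
useless = np.repeat([0, 1], 50)    # independent of the class by construction

mi_perfect = mutual_info_score(y_toy, perfect)   # equals the class entropy, ln 2
mi_useless = mutual_info_score(y_toy, useless)   # zero: no shared information

# Class entropy estimated from the test-set support counts in part (c).
support = np.array([842, 783])
p = support / support.sum()
class_entropy = -(p * np.log(p)).sum()

# Share of the class uncertainty covered by odor, using its reported score.
odor_share = 0.628043 / class_entropy

print(f"MI of a perfect feature    : {mi_perfect:.4f}")
print(f"MI of an independent one   : {mi_useless:.4f}")
print(f"estimated class entropy    : {class_entropy:.4f}")
print(f"odor MI / class entropy    : {odor_share:.2f}")
```

Under these assumptions odor alone accounts for roughly 90% of the uncertainty about the class, which is consistent with it ranking first by a wide margin.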
