#Predictive Analysis: Which mushroom feature, odor or cap color, better predicts whether a mushroom is poisonous?

#Using scikit-learn classifiers to determine whether odor or cap color is a better predictor of mushroom edibility (edible vs poisonous).

#Step1: Import pandas

In [1]:
>>> import pandas as pd

#Step2: Load dataset from the UCI repository

In [2]:
>>> url = "https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data"
columns = [
    "class", "cap-shape", "cap-surface", "cap-color", "bruises", "odor",
    "gill-attachment", "gill-spacing", "gill-size", "gill-color", "stalk-shape",
    "stalk-root", "stalk-surface-above-ring", "stalk-surface-below-ring",
    "stalk-color-above-ring", "stalk-color-below-ring", "veil-type", "veil-color",
    "ring-number", "ring-type", "spore-print-color", "population", "habitat"
]

df = pd.read_csv(url, header=None, names=columns, sep=",")
df = df[["class", "odor", "cap-color"]].copy()
df.columns = ["edibility", "odor", "cap_color"]

#Step3: Map edibility: 'e' = 0 (edible), 'p' = 1 (poisonous)

In [3]:
>>> df["edibility"] = df["edibility"].map({"e": 0, "p": 1})

#Step4: One-hot encode the predictors (odor and cap_color) separately

In [6]:
>>> odor_encoded = pd.get_dummies(df["odor"], prefix="odor")
cap_color_encoded = pd.get_dummies(df["cap_color"], prefix="cap_color")
X_odor = odor_encoded
X_cap_color = cap_color_encoded
y = df["edibility"]
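
To make the encoding step concrete, here is a minimal sketch of what `pd.get_dummies` produces on a tiny series. The three sample values are made up for illustration (single-letter codes in the style of the dataset); they are not rows from the real file.

```python
import pandas as pd

# Hypothetical three-row sample, not taken from the real dataset.
toy = pd.Series(["a", "n", "a"], name="odor")

# One column per distinct category, named <prefix>_<category>.
encoded = pd.get_dummies(toy, prefix="odor")

print(encoded.columns.tolist())                 # ['odor_a', 'odor_n']
print(encoded.astype(int).to_numpy().tolist())  # [[1, 0], [0, 1], [1, 0]]
```

Each row has exactly one active column, so a classifier trained on `X_odor` or `X_cap_color` sees only which category a mushroom belongs to.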

#Step5: Train classifiers and compare accuracy

In [7]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Same test_size and random_state, so both feature sets are split on identical rows.
Xo_train, Xo_test, yo_train, yo_test = train_test_split(X_odor, y, test_size=0.3, random_state=42)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(X_cap_color, y, test_size=0.3, random_state=42)

clf_odor = RandomForestClassifier(random_state=42)
clf_cap = RandomForestClassifier(random_state=42)

clf_odor.fit(Xo_train, yo_train)
clf_cap.fit(Xc_train, yc_train)

y_pred_odor = clf_odor.predict(Xo_test)
y_pred_cap = clf_cap.predict(Xc_test)

acc_odor = accuracy_score(yo_test, y_pred_odor)
acc_cap = accuracy_score(yc_test, y_pred_cap)

print(f"Accuracy using odor: {acc_odor:.4f}")
print(f"Accuracy using cap color: {acc_cap:.4f}")

Accuracy using odor: 0.9840
Accuracy using cap color: 0.5870
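
An accuracy figure is easier to interpret next to a trivial baseline: the score obtained by always predicting the most common class. The sketch below shows one way to compute it. The helper name `majority_baseline` and the toy labels are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

def majority_baseline(y: pd.Series) -> float:
    """Accuracy of always predicting the most common class in y."""
    return float(y.value_counts(normalize=True).max())

# Made-up labels: three of class 0 and two of class 1.
toy_y = pd.Series([0, 0, 0, 1, 1])
print(majority_baseline(toy_y))  # 0.6
```

In this notebook, calling `majority_baseline(yo_test)` would give the number that both accuracies above should be compared against.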


#Conclusion: Odor achieved far higher test accuracy than cap_color (0.9840 vs. 0.5870) in predicting whether a mushroom is poisonous. This suggests that odor is a much stronger signal for edibility classification. Cap color alone is not a reliable predictor, likely because edible and poisonous mushrooms share many of the same cap colors.
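
One way to check the overlap explanation is to compute the share of poisonous mushrooms within each category of a feature. Shares near 0 or 1 mean the category separates the classes well; shares near the middle mean it does not. The sketch below uses a made-up six-row frame with the same column names as `df`, so the numbers are illustrative only.

```python
import pandas as pd

# Made-up rows for illustration; edibility uses 0 = edible, 1 = poisonous.
toy = pd.DataFrame({
    "odor":      ["f", "f", "f", "n", "n", "n"],
    "cap_color": ["n", "w", "n", "w", "n", "w"],
    "edibility": [1,   1,   1,   0,   0,   0],
})

# Mean of a 0/1 label per category = share of poisonous mushrooms.
rate_by_odor = toy.groupby("odor")["edibility"].mean()
rate_by_cap = toy.groupby("cap_color")["edibility"].mean()

print(rate_by_odor)  # f -> 1.0, n -> 0.0: each odor maps to a single class
print(rate_by_cap)   # n -> about 0.67, w -> about 0.33: both classes in each color
```

Running the same two `groupby` calls on the real `df` would show how cleanly each odor and each cap color splits the classes.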

##Further analysis should combine odor with other features (e.g., gill color, bruises) to improve accuracy. Feature importance scores from RandomForest could guide feature selection. In addition, logistic regression or decision trees could be used to improve interpretability.
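
As a rough sketch of the feature-importance idea, the example below trains one forest on several one-hot encoded features and sums the importances of the dummy columns back to their source feature. The data is synthetic and deliberately constructed so that only odor determines the label; it is not the mushroom dataset, and the grouping by column prefix is one possible approach, not an established recipe.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (made up): the label depends only on odor, and cap_color
# is balanced across odors so it carries no signal.
odors = ["a", "f", "n"]
caps = ["n", "w", "y"]
rows = [(odors[i % 3], caps[(i // 3) % 3]) for i in range(180)]
toy = pd.DataFrame(rows, columns=["odor", "cap_color"])
toy_y = (toy["odor"] == "f").astype(int)

# One-hot encode both features together: odor_a, ..., cap_color_y.
X = pd.get_dummies(toy, columns=["odor", "cap_color"])
clf = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, toy_y)

# Sum dummy-column importances back to the feature they came from.
importances = pd.Series(clf.feature_importances_, index=X.columns)
by_feature = importances.groupby(lambda col: col.rsplit("_", 1)[0]).sum()
print(by_feature.sort_values(ascending=False))
```

Impurity-based importances like these can favor features with many categories, so they are better treated as a guide for which features to investigate than as a final ranking.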
