# **OPEN-ARC**
---

### Project 10: Mushroom Classification Model:
**Challenge:** Create an AI model capable of classifying a mushroom as edible or poisonous based on a set of features.


### Terms and Use:
Learn more about the project's [LICENSE](https://github.com/Infinitode/OPEN-ARC/blob/main/LICENSE) and read our [CODE_OF_CONDUCT](https://github.com/Infinitode/OPEN-ARC/blob/main/CODE_OF_CONDUCT) before contributing to the project. You can contribute to this project from here: [https://github.com/Infinitode/OPEN-ARC/](https://github.com/Infinitode/OPEN-ARC/).

---

Please fill out this performance sheet to help others quickly see your model's performance **(optional)**:

### Performance Sheet:
| Contributor | Architecture Type | Platform | Base Model | Dataset | Accuracy | Link |
|-------------|-------------------|----------|------------|---------|----------|------|
| Infinitode  | RandomForestClassifier  | Kaggle   | ✔  | Mushroom Classification | 91.1% (CV)    | [Notebook](https://github.com/Infinitode/OPEN-ARC/blob/main/Project-10-MCM/project-10-mcm.ipynb) |
| Username  | Unknown  | Kaggle   | ✗/✔  | Mushroom Classification | Score    | [Notebook](https://github.com) |

---

**Disclaimer:** This model is for **educational purposes only** and should not be used for real-life mushroom classification or any decision-making processes related to the consumption of mushrooms. While the model performs well on the provided dataset, it has not been thoroughly validated for real-world scenarios and may not accurately detect poisonous mushrooms in all conditions. Always consult an expert or use trusted resources when identifying mushrooms.

### Using `RandomForestClassifier`
For this dataset, we use a `RandomForestClassifier` with default hyperparameters and automatic class balancing (`class_weight='balanced'`).
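To make the class balancing concrete: with `class_weight='balanced'`, scikit-learn weights each class by `n_samples / (n_classes * class_count)`, so a rarer class counts for more during training. The sketch below reproduces that formula in plain Python. The helper name `balanced_class_weights` and the label counts are made up for illustration and are not taken from the mushroom dataset.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {class: weight} using n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Illustrative labels only (0 = edible, 1 = poisonous), not the real dataset
example_labels = [0] * 6 + [1] * 2
weights = balanced_class_weights(example_labels)
print(weights)
```

When the two classes are close to evenly represented, both weights end up near 1 and the balancing has little effect.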

In [3]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
import joblib

# Load the dataset from the Kaggle input directory
csv_path = '/kaggle/input/mushroom-classification/mushrooms.csv'
df = pd.read_csv(csv_path)

# Display basic dataset information
print(df.info())
print(df.head())

# Define the mappings for each feature
mappings = {
    'class': {'e': 0, 'p': 1},
    'cap-shape': {'b': 0, 'c': 1, 'x': 2, 'f': 3, 'k': 4, 's': 5},
    'cap-surface': {'f': 0, 'g': 1, 'y': 2, 's': 3},
    'cap-color': {'n': 0, 'b': 1, 'c': 2, 'g': 3, 'r': 4, 'p': 5, 'u': 6, 'e': 7, 'w': 8, 'y': 9},
    'bruises': {'t': 1, 'f': 0},
    'odor': {'a': 0, 'l': 1, 'c': 2, 'y': 3, 'f': 4, 'm': 5, 'n': 6, 'p': 7, 's': 8},
    'gill-attachment': {'a': 0, 'd': 1, 'f': 2, 'n': 3},
    'gill-spacing': {'c': 0, 'w': 1, 'd': 2},
    'gill-size': {'b': 0, 'n': 1},
    'gill-color': {'k': 0, 'n': 1, 'b': 2, 'h': 3, 'g': 4, 'r': 5, 'o': 6, 'p': 7, 'u': 8, 'e': 9, 'w': 10, 'y': 11},
    'stalk-shape': {'e': 0, 't': 1},
    'stalk-root': {'b': 0, 'c': 1, 'u': 2, 'e': 3, 'z': 4, 'r': 5, '?': 6},
    'stalk-surface-above-ring': {'f': 0, 'y': 1, 'k': 2, 's': 3},
    'stalk-surface-below-ring': {'f': 0, 'y': 1, 'k': 2, 's': 3},
    'stalk-color-above-ring': {'n': 0, 'b': 1, 'c': 2, 'g': 3, 'o': 4, 'p': 5, 'e': 6, 'w': 7, 'y': 8},
    'stalk-color-below-ring': {'n': 0, 'b': 1, 'c': 2, 'g': 3, 'o': 4, 'p': 5, 'e': 6, 'w': 7, 'y': 8},
    'veil-type': {'p': 0, 'u': 1},
    'veil-color': {'n': 0, 'o': 1, 'w': 2, 'y': 3},
    'ring-number': {'n': 0, 'o': 1, 't': 2},
    'ring-type': {'c': 0, 'e': 1, 'f': 2, 'l': 3, 'n': 4, 'p': 5, 's': 6, 'z': 7},
    'spore-print-color': {'k': 0, 'n': 1, 'b': 2, 'h': 3, 'r': 4, 'o': 5, 'u': 6, 'w': 7, 'y': 8},
    'population': {'a': 0, 'c': 1, 'n': 2, 's': 3, 'v': 4, 'y': 5},
    'habitat': {'g': 0, 'l': 1, 'm': 2, 'p': 3, 'u': 4, 'w': 5, 'd': 6}
}

# Map the values to numerical values
for column, mapping in mappings.items():
    df[column] = df[column].map(mapping)

# Separate the target from features
X = df.drop('class', axis=1)
y = df['class']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the default model with class weight balancing
model = RandomForestClassifier(class_weight='balanced', random_state=42)

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
print(classification_report(y_test, y_pred))

# Save the model
joblib.dump(model, 'mushroom_classifier.pkl')

# Save the mappings
joblib.dump(mappings, 'mappings.pkl')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8124 entries, 0 to 8123
Data columns (total 23 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   class                     8124 non-null   object
 1   cap-shape                 8124 non-null   object
 2   cap-surface               8124 non-null   object
 3   cap-color                 8124 non-null   object
 4   bruises                   8124 non-null   object
 5   odor                      8124 non-null   object
 6   gill-attachment           8124 non-null   object
 7   gill-spacing              8124 non-null   object
 8   gill-size                 8124 non-null   object
 9   gill-color                8124 non-null   object
 10  stalk-shape               8124 non-null   object
 11  stalk-root                8124 non-null   object
 12  stalk-surface-above-ring  8124 non-null   object
 13  stalk-surface-below-ring  8124 non-null   object
 14  stalk-color-above-ring  

['mappings.pkl']
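
One thing worth checking around the mapping step: `Series.map` with a dictionary turns any value that has no key in the dictionary into `NaN` rather than raising an error. Below is a minimal sketch of such a check, using a tiny made-up frame and a hypothetical helper `find_unmapped` (neither is part of the original notebook).

```python
import pandas as pd

def find_unmapped(df, mappings):
    """Return {column: sorted values that have no entry in mappings[column]}."""
    unmapped = {}
    for column, mapping in mappings.items():
        extra = set(df[column].dropna().unique()) - set(mapping)
        if extra:
            unmapped[column] = sorted(extra)
    return unmapped

# Made-up example data; 'z' is deliberately missing from the odor mapping
toy_df = pd.DataFrame({'odor': ['a', 'n', 'z'], 'bruises': ['t', 'f', 't']})
toy_mappings = {'odor': {'a': 0, 'n': 1}, 'bruises': {'t': 1, 'f': 0}}

problems = find_unmapped(toy_df, toy_mappings)
print(problems)

# Mapping the toy column shows the silent NaN for the uncovered value
encoded = toy_df['odor'].map(toy_mappings['odor'])
print(encoded.isna().sum())
```

Running a check like this on `df` before the mapping loop would list any category codes that the `mappings` dictionary does not cover.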

Since our classification report showed perfect results, with `precision`, `recall`, and `F1` all equal to 1 and an accuracy of `100%`, we'll use cross-validation to check how the model performs across different parts of the dataset.

In [4]:
from sklearn.model_selection import cross_val_score

# Perform 5-fold cross-validation
cv_scores = cross_val_score(model, X, y, cv=5)

print(f"Cross-validation scores: {cv_scores}")
print(f"Average cross-validation score: {cv_scores.mean()}")

Cross-validation scores: [0.84246154 1.         1.         1.         0.71059113]
Average cross-validation score: 0.910610534293293


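An average cross-validation score of `0.91` (`91.06%`) is good, but the per-fold scores range from `0.71` to `1.0`, so the model struggles with certain splits of the dataset. These parts of the dataset might contain more difficult examples. Note also that `cross_val_score` with `cv=5` builds stratified folds without shuffling, whereas `train_test_split` shuffles by default, so the weaker folds may partly reflect how the rows are ordered in the CSV. The gap to the perfect hold-out result also suggests that our model may have overfitted to the simpler examples in our dataset.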
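
One way to probe the uneven fold scores is to repeat the cross-validation with shuffled folds. When `cv` is an integer and the estimator is a classifier, `cross_val_score` uses `StratifiedKFold` without shuffling, so contiguous blocks of rows end up in the same fold. The sketch below passes an explicitly shuffled `StratifiedKFold` instead. It runs on small synthetic data so that it is self-contained; the helper name `shuffled_cv_scores` and the synthetic data are assumptions for illustration, and no result for the mushroom dataset is implied.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def shuffled_cv_scores(model, X, y, n_splits=5, seed=42):
    """Cross-validate with stratified folds that are shuffled before splitting."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv)

# Synthetic stand-in data: 200 rows, 5 integer-coded features,
# label determined by the first feature
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 4, size=(200, 5))
y_demo = (X_demo[:, 0] >= 2).astype(int)

demo_model = RandomForestClassifier(n_estimators=20, class_weight='balanced', random_state=42)
scores = shuffled_cv_scores(demo_model, X_demo, y_demo)
print(f"Shuffled cross-validation scores: {scores}")
print(f"Average: {scores.mean()}")
```

In the notebook, calling `shuffled_cv_scores(model, X, y)` would give scores to compare against the unshuffled ones reported above. If the spread between folds shrinks, row ordering is a likely contributor.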

### Testing the model
We'll now define feature maps to make the model's input data easier to understand.

In [5]:
feature_options = {
    'cap-shape': {'b': 'bell', 'c': 'conical', 'x': 'convex', 'f': 'flat', 'k': 'knobbed', 's': 'sunken'},
    'cap-surface': {'f': 'fibrous', 'g': 'grooves', 'y': 'scaly', 's': 'smooth'},
    'cap-color': {'n': 'brown', 'b': 'buff', 'c': 'cinnamon', 'g': 'gray', 'r': 'green', 'p': 'pink', 'u': 'purple', 'e': 'red', 'w': 'white', 'y': 'yellow'},
    'bruises': {'t': 'bruises', 'f': 'no'},
    'odor': {'a': 'almond', 'l': 'anise', 'c': 'creosote', 'y': 'fishy', 'f': 'foul', 'm': 'musty', 'n': 'none', 'p': 'pungent', 's': 'spicy'},
    'gill-attachment': {'a': 'attached', 'd': 'descending', 'f': 'free', 'n': 'notched'},
    'gill-spacing': {'c': 'close', 'w': 'crowded', 'd': 'distant'},
    'gill-size': {'b': 'broad', 'n': 'narrow'},
    'gill-color': {'k': 'black', 'n': 'brown', 'b': 'buff', 'h': 'chocolate', 'g': 'gray', 'r': 'green', 'o': 'orange', 'p': 'pink', 'u': 'purple', 'e': 'red', 'w': 'white', 'y': 'yellow'},
    'stalk-shape': {'e': 'enlarging', 't': 'tapering'},
    'stalk-root': {'b': 'bulbous', 'c': 'club', 'u': 'cup', 'e': 'equal', 'z': 'rhizomorphs', 'r': 'rooted', '?': 'missing'},
    'stalk-surface-above-ring': {'f': 'fibrous', 'y': 'scaly', 'k': 'silky', 's': 'smooth'},
    'stalk-surface-below-ring': {'f': 'fibrous', 'y': 'scaly', 'k': 'silky', 's': 'smooth'},
    'stalk-color-above-ring': {'n': 'brown', 'b': 'buff', 'c': 'cinnamon', 'g': 'gray', 'o': 'orange', 'p': 'pink', 'e': 'red', 'w': 'white', 'y': 'yellow'},
    'stalk-color-below-ring': {'n': 'brown', 'b': 'buff', 'c': 'cinnamon', 'g': 'gray', 'o': 'orange', 'p': 'pink', 'e': 'red', 'w': 'white', 'y': 'yellow'},
    'veil-type': {'p': 'partial', 'u': 'universal'},
    'veil-color': {'n': 'brown', 'o': 'orange', 'w': 'white', 'y': 'yellow'},
    'ring-number': {'n': 'none', 'o': 'one', 't': 'two'},
    'ring-type': {'c': 'cobwebby', 'e': 'evanescent', 'f': 'flaring', 'l': 'large', 'n': 'none', 'p': 'pendant', 's': 'sheathing', 'z': 'zone'},
    'spore-print-color': {'k': 'black', 'n': 'brown', 'b': 'buff', 'h': 'chocolate', 'r': 'green', 'o': 'orange', 'u': 'purple', 'w': 'white', 'y': 'yellow'},
    'population': {'a': 'abundant', 'c': 'clustered', 'n': 'numerous', 's': 'scattered', 'v': 'several', 'y': 'solitary'},
    'habitat': {'g': 'grasses', 'l': 'leaves', 'm': 'meadows', 'p': 'paths', 'u': 'urban', 'w': 'waste', 'd': 'woods'}
}

def get_user_input():
    """
    Collects user input for each mushroom feature.

    Returns:
    dict: A dictionary containing the user's input for each feature.
    """
    user_input = {}
    print("Please provide the following mushroom characteristics:")
    for feature, options in feature_options.items():
        print(f"\n{feature.replace('-', ' ').capitalize()}:")
        for key, value in options.items():
            print(f"  {key}: {value}")
        while True:
            choice = input(f"Enter the corresponding letter for {feature}: ").strip().lower()
            if choice in options:
                user_input[feature] = choice
                break
            else:
                print("Invalid input. Please enter one of the listed letters.")
    return user_input

user_input = get_user_input()

def predict_mushroom(features):
    """
    Predict whether a mushroom is edible or poisonous based on its features.
    
    Parameters:
    features (dict): A dictionary of mushroom features with feature names as keys and corresponding categorical values.
    
    Returns:
    str: 'Edible' or 'Poisonous'
    """
    # Load the trained model and mappings
    model = joblib.load('mushroom_classifier.pkl')
    mappings = joblib.load('mappings.pkl')
    
    # Initialize a dictionary to hold the numerical features
    numerical_features = {}
    
    # Map each feature to its numerical value
    for feature, value in features.items():
        if feature in mappings:
            if value in mappings[feature]:
                numerical_features[feature] = mappings[feature][value]
            else:
                raise ValueError(f"Invalid value '{value}' for feature '{feature}'.")
        else:
            raise ValueError(f"Feature '{feature}' is not recognized.")
    
    # Convert the numerical features into a DataFrame
    input_df = pd.DataFrame([numerical_features])
    
    # Predict using the trained model
    prediction = model.predict(input_df)
    
    # Interpret the prediction
    if prediction[0] == 0:
        return 'Edible'
    else:
        return 'Poisonous'

# Predict edibility
try:
    result = predict_mushroom(user_input)
    print(f"\nThe mushroom is likely: {result}")
except ValueError as e:
    print(f"Error: {e}")

Please provide the following mushroom characteristics:

Cap shape:
  b: bell
  c: conical
  x: convex
  f: flat
  k: knobbed
  s: sunken


Enter the corresponding letter for cap-shape:  b



Cap surface:
  f: fibrous
  g: grooves
  y: scaly
  s: smooth


Enter the corresponding letter for cap-surface:  f



Cap color:
  n: brown
  b: buff
  c: cinnamon
  g: gray
  r: green
  p: pink
  u: purple
  e: red
  w: white
  y: yellow


Enter the corresponding letter for cap-color:  n



Bruises:
  t: bruises
  f: no


Enter the corresponding letter for bruises:  t



Odor:
  a: almond
  l: anise
  c: creosote
  y: fishy
  f: foul
  m: musty
  n: none
  p: pungent
  s: spicy


Enter the corresponding letter for odor:  a



Gill attachment:
  a: attached
  d: descending
  f: free
  n: notched


Enter the corresponding letter for gill-attachment:  a



Gill spacing:
  c: close
  w: crowded
  d: distant


Enter the corresponding letter for gill-spacing:  c



Gill size:
  b: broad
  n: narrow


Enter the corresponding letter for gill-size:  b



Gill color:
  k: black
  n: brown
  b: buff
  h: chocolate
  g: gray
  r: green
  o: orange
  p: pink
  u: purple
  e: red
  w: white
  y: yellow


Enter the corresponding letter for gill-color:  k



Stalk shape:
  e: enlarging
  t: tapering


Enter the corresponding letter for stalk-shape:  e



Stalk root:
  b: bulbous
  c: club
  u: cup
  e: equal
  z: rhizomorphs
  r: rooted
  ?: missing


Enter the corresponding letter for stalk-root:  b



Stalk surface above ring:
  f: fibrous
  y: scaly
  k: silky
  s: smooth


Enter the corresponding letter for stalk-surface-above-ring:  f



Stalk surface below ring:
  f: fibrous
  y: scaly
  k: silky
  s: smooth


Enter the corresponding letter for stalk-surface-below-ring:  f



Stalk color above ring:
  n: brown
  b: buff
  c: cinnamon
  g: gray
  o: orange
  p: pink
  e: red
  w: white
  y: yellow


Enter the corresponding letter for stalk-color-above-ring:  n



Stalk color below ring:
  n: brown
  b: buff
  c: cinnamon
  g: gray
  o: orange
  p: pink
  e: red
  w: white
  y: yellow


Enter the corresponding letter for stalk-color-below-ring:  n



Veil type:
  p: partial
  u: universal


Enter the corresponding letter for veil-type:  p



Veil color:
  n: brown
  o: orange
  w: white
  y: yellow


Enter the corresponding letter for veil-color:  n



Ring number:
  n: none
  o: one
  t: two


Enter the corresponding letter for ring-number:  n



Ring type:
  c: cobwebby
  e: evanescent
  f: flaring
  l: large
  n: none
  p: pendant
  s: sheathing
  z: zone


Enter the corresponding letter for ring-type:  c



Spore print color:
  k: black
  n: brown
  b: buff
  h: chocolate
  r: green
  o: orange
  u: purple
  w: white
  y: yellow


Enter the corresponding letter for spore-print-color:  k



Population:
  a: abundant
  c: clustered
  n: numerous
  s: scattered
  v: several
  y: solitary


Enter the corresponding letter for population:  a



Habitat:
  g: grasses
  l: leaves
  m: meadows
  p: paths
  u: urban
  w: waste
  d: woods


Enter the corresponding letter for habitat:  g



The mushroom is likely: Edible
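
`predict_mushroom` rejects unrecognized features and invalid values, but it does not check that every expected feature is present. Below is a minimal sketch of such a completeness check, assuming a hypothetical helper `check_features` and a shortened stand-in for the keys of `feature_options`.

```python
def check_features(features, expected):
    """Return (missing, unexpected) feature names, each as a sorted list."""
    missing = sorted(set(expected) - set(features))
    unexpected = sorted(set(features) - set(expected))
    return missing, unexpected

# Shortened stand-in for the notebook's feature names (illustration only)
expected_features = ['cap-shape', 'odor', 'habitat']

# 'habitat' is absent and 'colour' is not a known feature
sample = {'cap-shape': 'x', 'odor': 'a', 'colour': 'n'}
missing, unexpected = check_features(sample, expected_features)
print(f"Missing: {missing}")
print(f"Unexpected: {unexpected}")
```

In the notebook, passing `feature_options.keys()` as `expected` before calling `predict_mushroom` would surface an incomplete dictionary early.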


You can also run the code cell below to make predictions on predefined inputs. The second set of input features is from an **Amanita phalloides** mushroom, also known as the **Death Cap**, a highly toxic species.

In [7]:
edible_mushroom = {
    'cap-shape': 'x',  # convex
    'cap-surface': 's',  # smooth
    'cap-color': 'n',  # brown
    'bruises': 't',  # bruises
    'odor': 'a',  # almond
    'gill-attachment': 'a',  # attached
    'gill-spacing': 'c',  # close
    'gill-size': 'b',  # broad
    'gill-color': 'k',  # black
    'stalk-shape': 'e',  # enlarging
    'stalk-root': 'b',  # bulbous
    'stalk-surface-above-ring': 'f',  # fibrous
    'stalk-surface-below-ring': 'f',  # fibrous
    'stalk-color-above-ring': 'n',  # brown
    'stalk-color-below-ring': 'n',  # brown
    'veil-type': 'p',  # partial
    'veil-color': 'n',  # brown
    'ring-number': 'o',  # one
    'ring-type': 'c',  # cobwebby
    'spore-print-color': 'k',  # black
    'population': 'a',  # abundant
    'habitat': 'g'  # grasses
}

amanita_phalloides_features = {
    'cap-shape': 'b',  # bell
    'cap-surface': 's',  # smooth
    'cap-color': 'w',  # white
    'bruises': 'f',  # no
    'odor': 'n',  # none
    'gill-attachment': 'f',  # free
    'gill-spacing': 'w',  # crowded
    'gill-size': 'n',  # narrow
    'gill-color': 'w',  # white
    'stalk-shape': 't',  # tapering
    'stalk-root': 'e',  # equal
    'stalk-surface-above-ring': 's',  # smooth
    'stalk-surface-below-ring': 's',  # smooth
    'stalk-color-above-ring': 'w',  # white
    'stalk-color-below-ring': 'w',  # white
    'veil-type': 'p',  # partial
    'veil-color': 'w',  # white
    'ring-number': 'o',  # one
    'ring-type': 'p',  # pendant
    'spore-print-color': 'w',  # white
    'population': 's',  # scattered
    'habitat': 'd'  # woods
}


print(f"This mushroom is {predict_mushroom(edible_mushroom)}")
print(f"This mushroom is {predict_mushroom(amanita_phalloides_features)}")

This mushroom is Edible
This mushroom is Poisonous
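
The single-letter codes in the dictionaries above are compact but hard to read. Here is a small sketch of how feature maps like `feature_options` could be used to print a readable description. The helper `describe_mushroom` is hypothetical, and only a subset of the feature maps is repeated here so the block runs on its own.

```python
# Subset of the notebook's feature maps, repeated so the example is self-contained
feature_labels = {
    'cap-shape': {'b': 'bell', 'x': 'convex'},
    'odor': {'a': 'almond', 'n': 'none'},
    'habitat': {'g': 'grasses', 'd': 'woods'},
}

def describe_mushroom(features, labels):
    """Translate coded feature values into 'feature: label' strings."""
    lines = []
    for feature, code in features.items():
        # Fall back to the raw code when no label is known
        label = labels.get(feature, {}).get(code, code)
        lines.append(f"{feature}: {label}")
    return lines

description = describe_mushroom({'cap-shape': 'b', 'odor': 'n', 'habitat': 'd'}, feature_labels)
print("\n".join(description))
```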


### The End:

This is the end of this project notebook. Make sure to experiment and contribute to help improve the model and implementation. You can browse more of our free, open-source projects on our GitHub repository: https://github.com/Infinitode/OPEN-ARC. If you like this project, make sure to star the repo and contribute your implementation, or help others in the community.

~ Infinitode