<a href="https://colab.research.google.com/github/denver-edwards/mushroom-classifier/blob/main/Mushroom_Classifier_Colab_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Mushroom Classification

**Date:** 3/10/24  
**Author:** Denver Edwards

## Introduction and Objective
This project builds a convolutional neural network (CNN) that classifies mushroom images into nine classes, using the Kaggle dataset linked below.

## Notes
Too large for Colab.

Dataset from: https://www.kaggle.com/datasets/maysee/mushrooms-classification-common-genuss-images

In [None]:
from google.colab import files
files.upload()

!rm -rf ~/.kaggle
!mkdir -p ~/.kaggle
!mv ./kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

!kaggle datasets download -d maysee/mushrooms-classification-common-genuss-images

import zipfile

# Extract the downloaded archive; the context manager closes the file automatically
with zipfile.ZipFile('mushrooms-classification-common-genuss-images.zip', 'r') as zip_ref:
    zip_ref.extractall('/content')

Saving kaggle.json to kaggle.json
mushrooms-classification-common-genuss-images.zip: Skipping, found more recently modified local copy (use --force to force download)


In [None]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.callbacks import ModelCheckpoint

## Data Collection

In [None]:
data_dir = '/content/Mushrooms/'

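# Define image dimensions and batch size
# Note: Keras interprets target_size as (height, width), so passing
# (img_width, img_height) below yields arrays of 778 rows by 600 columns.
img_width, img_height = 778, 600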
batch_size = 8
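
Before building generators, it can help to confirm that the extracted folder has the layout `flow_from_directory` expects: one subfolder per class. The following is a minimal sketch, not part of the original notebook. The helper name `count_images_per_class` and the set of accepted file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Return {class_folder_name: image_count} for each subfolder of root."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for p in class_dir.iterdir()
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

Calling `count_images_per_class(data_dir)` in the notebook would give a quick view of class balance before training.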

## Data Cleaning and Preprocessing
- Check for duplicates
- Remove images of unexpected size
- Manually remove blurry or poor-quality images
- Resize images to the same size
- Convert images to the same format
- Rename files so names follow a consistent pattern within each folder
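
The duplicate check listed above could be sketched as follows. This is one possible approach under stated assumptions, not the method used in the notebook: it groups files by the SHA-256 hash of their bytes, so it finds only byte-identical copies and will miss images that were resized or re-encoded. The helper name `find_duplicate_files` is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(root):
    """Return lists of paths under root whose file contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only the groups that contain more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group holds paths with identical contents; keeping the first path and deleting the rest would remove exact duplicates.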

In [None]:
datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Load images from the dataset directory
image_generator = datagen.flow_from_directory(
    data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False  # Important: Set shuffle to False
)

# Create a DataFrame to store filenames and labels
data = pd.DataFrame({'filename': image_generator.filenames, 'class': image_generator.classes})

# Convert labels to strings
data['class'] = data['class'].astype(str)

# Split the data into training and testing sets
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)

# Define the data generator for training set
train_generator = datagen.flow_from_dataframe(
    dataframe=train_data,
    directory=data_dir,
    x_col='filename',
    y_col='class',
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical'
)

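# Define the data generator for the testing set.
# Use a rescale-only generator so evaluation images are not augmented,
# and disable shuffling so predictions line up with test_generator.classes.
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_dataframe(
    dataframe=test_data,
    directory=data_dir,
    x_col='filename',
    y_col='class',
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False
)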


Found 6714 images belonging to 9 classes.
Found 5371 validated image filenames belonging to 9 classes.
Found 1343 validated image filenames belonging to 9 classes.


## Model Building

In [None]:
# Build the CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(img_width, img_height, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(9, activation='softmax')  # 9 mushroom types
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
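
To get a sense of how large this network is, the arithmetic below traces the spatial dimensions through the three Conv2D/MaxPooling2D pairs and counts the weights in the first Dense layer. It is a dependency-free sketch that assumes the Keras defaults used above: "valid" padding with stride 1 for the 3x3 convolutions, and 2x2 pooling that rounds down. The helper names are illustrative.

```python
def conv_valid(n, kernel=3):
    """Output length of a stride-1 convolution with 'valid' padding."""
    return n - kernel + 1


def pool(n, size=2):
    """Output length of non-overlapping max pooling (rounds down)."""
    return n // size


def flattened_size(dim_a, dim_b, filters=(32, 64, 128)):
    """Number of values entering Flatten after one conv+pool pair per filter count."""
    for _ in filters:
        dim_a = pool(conv_valid(dim_a))
        dim_b = pool(conv_valid(dim_b))
    return dim_a * dim_b * filters[-1]


flat = flattened_size(778, 600)
dense_params = flat * 512 + 512  # weights plus biases of Dense(512)

print(flat)          # 887680
print(dense_params)  # 454492672
```

At 4 bytes per float32 weight, the first Dense layer alone takes roughly 1.8 GB before optimizer state is counted, which may contribute to memory pressure on smaller runtimes. Reducing the input resolution or replacing `Flatten` with a global pooling layer are common ways to shrink this number.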

In [None]:
# Define model checkpoint to save the best model
checkpoint = ModelCheckpoint('/content/Mushrooms/mushroom_classifier.h5',
                             monitor='val_accuracy',
                             save_best_only=True,
                             mode='max',
                             verbose=1)

In [None]:
# Train the model
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    epochs=20,
    validation_data=test_generator,
    validation_steps=test_generator.samples // batch_size,
    callbacks=[checkpoint]
)

Epoch 1/20
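
The `steps_per_epoch` and `validation_steps` values above use floor division, so a final partial batch is not counted. A small worked example with the sample counts reported earlier (5371 training and 1343 testing images) shows how many images fall outside the counted steps. The helper name is illustrative.

```python
def steps_and_remainder(num_samples, batch_size):
    """Return (full batches, leftover samples) for floor-division step counts."""
    return divmod(num_samples, batch_size)


train_steps, train_left = steps_and_remainder(5371, 8)
val_steps, val_left = steps_and_remainder(1343, 8)

print(train_steps, train_left)  # 671 3
print(val_steps, val_left)      # 167 7
```

If every image should be visited once per epoch, `math.ceil(num_samples / batch_size)` is a common alternative for the step count.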

## Model Evaluation

In [None]:
# Plot training history
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

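# Evaluate the model
# Note: comparing against test_generator.classes is only valid when
# test_generator does not shuffle (shuffle=False).
predictions = model.predict(test_generator)
predicted_classes = np.argmax(predictions, axis=1)  # most probable class per image
true_classes = test_generator.classes
class_labels = list(test_generator.class_indices.keys())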

# Generate classification report and confusion matrix
print(classification_report(true_classes, predicted_classes, target_names=class_labels))
conf_matrix = confusion_matrix(true_classes, predicted_classes)
print('Confusion Matrix:')
print(conf_matrix)
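
A softmax output row sums to 1, so at most one entry can exceed 0.5, and sometimes none does. Taking the index of the largest probability always yields exactly one label per image, which is the form `classification_report` and `confusion_matrix` expect for a multiclass problem. The sketch below contrasts the two rules in plain Python; the probability values are made up for illustration.

```python
def argmax(row):
    """Index of the largest value in a list of probabilities."""
    return max(range(len(row)), key=row.__getitem__)


def threshold_labels(row, cutoff=0.5):
    """Indices whose probability exceeds the cutoff (may be empty)."""
    return [i for i, p in enumerate(row) if p > cutoff]


# Illustrative softmax outputs for two images and three classes.
probs = [
    [0.70, 0.20, 0.10],  # confident prediction
    [0.40, 0.35, 0.25],  # no class exceeds 0.5
]

for row in probs:
    print(argmax(row), threshold_labels(row))
# 0 [0]
# 0 []
```

With NumPy arrays, `np.argmax(predictions, axis=1)` applies the same rule to every row at once.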

## Conclusion and Next Steps
[Summary of findings, conclusions, and recommendations for future work]


## References
https://www.kaggle.com/datasets/maysee/mushrooms-classification-common-genuss-images