# End-to-End Machine Learning Project
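This notebook walks through an end-to-end machine learning workflow for classifying mushrooms as edible or poisonous: loading and cleaning the data, encoding and selecting features, training and saving a model, and serving predictions through a GUI. Each step lives in its own cell so it can be run and modified independently.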

## Load and Inspect the Dataset
Load the dataset using pandas, inspect its structure, and display summary statistics. This step surfaces data quality problems, such as missing values, before any modeling begins.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
file_path = '/Users/alandgailan/Desktop/mushroom_classification_model/mushrooms_decoded.csv'
data = pd.read_csv(file_path)

# Display the first few rows
print(data.head())

# Summary statistics
print(data.describe())

# Check for missing values
missing_values = data.isnull().sum()
print("Missing values:", missing_values)
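
A per-column summary can make the inspection step more concrete than `describe()` alone, especially when most columns are categorical. The sketch below is one possible helper and is not part of the original notebook; `summarize_columns` and the example column names are hypothetical.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, distinct values, missing count, most frequent value."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_unique": df.nunique(dropna=True),
        "n_missing": df.isnull().sum(),
        # mode() can return several rows on ties; the first row is enough here
        "most_frequent": df.mode(dropna=True).iloc[0],
    })


# Tiny made-up frame standing in for the mushroom data
example = pd.DataFrame({
    "cap_color": ["brown", "brown", "white", None],
    "odor": ["none", "foul", "none", "none"],
})
summary = summarize_columns(example)
print(summary)
```

Calling `summarize_columns(data)` on the loaded dataset would give the same overview for every real column.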

## Handle Missing or Incorrect Data
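Identify and handle missing or incorrect data. This notebook removes every row that contains a missing value with `dropna()`; imputation is an alternative when dropping rows would discard too much data. Either way, the goal is a dataset with no gaps before encoding and training.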

In [None]:
# Handle missing values
data = data.dropna()

# Verify no missing values remain
missing_values_after = data.isnull().sum()
print("Missing values after handling:", missing_values_after)
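
Where dropping rows would lose too much data, imputation is the alternative mentioned above. The following is a minimal sketch that fills each column with its most frequent value, which suits categorical columns. The helper name `impute_with_mode`, the `"?"` placeholder, and the example columns are assumptions for illustration; whether this dataset stores missing entries as a placeholder string is not something the notebook states.

```python
import numpy as np
import pandas as pd


def impute_with_mode(df: pd.DataFrame, placeholder=None) -> pd.DataFrame:
    """Fill missing entries in every column with that column's most frequent value."""
    out = df.copy()
    if placeholder is not None:
        # Treat a placeholder string (hypothetical, e.g. "?") as missing,
        # because isnull() does not recognize it on its own
        out = out.replace(placeholder, np.nan)
    modes = out.mode(dropna=True).iloc[0]
    # fillna with a Series fills each column using the value at that column's name
    return out.fillna(modes)


example = pd.DataFrame({
    "stalk_root": ["bulbous", "?", "bulbous", "club"],
    "odor": ["none", "foul", None, "none"],
})
filled = impute_with_mode(example, placeholder="?")
print(filled)
```

Unlike `dropna()`, this keeps all four rows; the trade-off is that imputed values are guesses and can blur real patterns if many entries are missing.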

## Normalize and Transform Values
Encode categorical variables as integers and standardize numerical columns, since scikit-learn estimators require numeric input.

In [None]:
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Encode categorical variables
label_encoders = {}
for column in data.select_dtypes(include=['object']).columns:
    le = LabelEncoder()
    data[column] = le.fit_transform(data[column])
    label_encoders[column] = le

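# Standardize the feature columns only. After label encoding every column is
# numeric, and scaling 'target' (the label column used in later cells) would
# turn its class codes into continuous values that a classifier rejects.
scaler = StandardScaler()
numerical_columns = data.select_dtypes(include=['int64', 'float64']).columns.drop('target')
data[numerical_columns] = scaler.fit_transform(data[numerical_columns])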
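
Label encoding assigns each category an integer, which implies an ordering that nominal features such as colors do not have. One-hot encoding is a common alternative; it is a different technique from the one used in the cell above, shown here only as a sketch. The example frame and its column names are made up, and the target is encoded separately so its class codes stay integers.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows standing in for the mushroom data
example = pd.DataFrame({
    "target": ["edible", "poisonous", "edible", "poisonous"],
    "odor": ["none", "foul", "almond", "foul"],
    "cap_color": ["brown", "white", "brown", "red"],
})

# Encode the target on its own; LabelEncoder sorts labels before assigning codes
target_encoder = LabelEncoder()
y_example = target_encoder.fit_transform(example["target"])

# One-hot encode the nominal features: one 0/1 column per category
X_example = pd.get_dummies(example.drop(columns="target"), dtype=int)

print(list(target_encoder.classes_))
print(X_example.columns.tolist())
```

Keeping `target_encoder` around allows predictions to be mapped back to readable labels later with `target_encoder.inverse_transform`.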

## Feature Selection
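Select the most relevant features for training using techniques like correlation analysis or feature importance. This step reduces dimensionality and can improve model performance. Correlation on label-encoded categories is only a rough signal, because the integer codes carry no natural order.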

In [None]:
# Correlation analysis
correlation_matrix = data.corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.show()

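# Select features whose absolute correlation with the target exceeds 0.1.
# The target is dropped first: its self-correlation of 1.0 would otherwise
# put it in the feature list and leak the label into training.
target = 'target'
target_correlation = correlation_matrix[target].drop(target)
features = target_correlation[target_correlation.abs() > 0.1].index.tolist()
print('Selected features:', features)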
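
Feature importance, the other technique named above, can be sketched with a random forest's impurity-based importances. This is an illustrative alternative to the correlation filter, not the notebook's method; `rank_features` and the synthetic columns `signal` and `noise` are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def rank_features(X: pd.DataFrame, y, top_k: int = 5) -> pd.Series:
    """Fit a random forest and return the top_k impurity-based importances, largest first."""
    forest = RandomForestClassifier(n_estimators=100, random_state=42)
    forest.fit(X, y)
    importances = pd.Series(forest.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False).head(top_k)


# Synthetic data: "signal" determines the label, "noise" is unrelated to it
example_X = pd.DataFrame({
    "signal": [0, 0, 0, 0, 1, 1, 1, 1] * 5,
    "noise": [0, 1, 0, 1, 0, 1, 0, 1] * 5,
})
example_y = example_X["signal"]
ranking = rank_features(example_X, example_y, top_k=2)
print(ranking)
```

Unlike a linear correlation threshold, this ranking does not depend on the sign or ordering of the integer codes, though impurity-based importances can favor columns with many distinct values.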

## Train a Machine Learning Model
Train a random forest classifier with scikit-learn on an 80/20 train/test split and measure its accuracy on the held-out test set. The model uses the default hyperparameters; tuning them is an optional refinement.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Split the data
X = data[features]
y = data[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Model accuracy:', accuracy)
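
Hyperparameter optimization could be approached with a cross-validated grid search. The sketch below runs on synthetic data from `make_classification` so that it is self-contained; the grid values are illustrative assumptions rather than tuned recommendations, and the names `X_demo`, `y_demo`, and `search` are not from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the encoded mushroom features
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# Hypothetical grid: every combination is scored with 3-fold cross-validation
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)
# Per-class precision and recall say more than a single accuracy number
print(classification_report(y_te, search.predict(X_te)))
```

With the notebook's data, `X_train` and `y_train` would take the place of `X_tr` and `y_tr`. For a task where a false "edible" is costly, the per-class recall in the report is worth reading alongside accuracy.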

## Save the Trained Model
Save the trained model with pickle (joblib is a common alternative) so the GUI application can load it without retraining.

In [None]:
import pickle

# Save the model
model_path = '/Users/alandgailan/Desktop/mushroom_classification_model/mushroom_model.pkl'
with open(model_path, 'wb') as file:
    pickle.dump(model, file)
print(f'Model saved to {model_path}')
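
The joblib alternative might look like the sketch below, which also bundles the feature names with the model so a GUI can rebuild its inputs in the right order. The model here is trained on synthetic data and written to a temporary directory; the bundle layout, the feature names, and the file name are assumptions for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic model so the example runs on its own
X_demo, y_demo = make_classification(n_samples=100, n_features=4, random_state=42)
demo_model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_demo, y_demo)

# Bundle the model with the metadata the GUI needs
bundle = {
    "model": demo_model,
    "features": ["f0", "f1", "f2", "f3"],  # hypothetical feature names
}

bundle_path = os.path.join(tempfile.mkdtemp(), "demo_bundle.joblib")
joblib.dump(bundle, bundle_path)

# Round trip: the restored model should reproduce the original predictions
restored = joblib.load(bundle_path)
same = bool((restored["model"].predict(X_demo) == demo_model.predict(X_demo)).all())
print("Predictions match after reload:", same)
```

The fitted encoders and scaler could be added to the same dictionary, since the GUI has to transform user input exactly as the training data was transformed. As with pickle, only load files from sources you trust.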

## Build a GUI for Predictions
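Create a GUI that loads the trained model, accepts user input, and displays predictions. Tkinter, Streamlit, and Gradio are all options; this example uses Streamlit. A Streamlit app does not render inside a notebook cell, so save the code below as a standalone script and launch it with `streamlit run`.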

In [None]:
import streamlit as st
import pickle
import numpy as np

# Load the model
model_path = 'path_to_your_model.pkl'  # Update this with your model path
with open(model_path, 'rb') as file:
    loaded_model = pickle.load(file)

# Define the feature names based on your model
features = ['feature1', 'feature2', 'feature3']  # Update this with your feature names

# Streamlit GUI
st.title('Mushroom Classification Prediction')
st.write('Enter the features of the mushroom to predict its edibility.')

# User input (values must use the same numeric encoding and scaling as the training data)
user_input = {}
for feature in features:
    user_input[feature] = st.number_input(f'Enter {feature}:')

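# Predict
if st.button('Predict'):
    input_data = np.array([list(user_input.values())])
    prediction = loaded_model.predict(input_data)[0]
    # LabelEncoder assigns codes in sorted label order, so labels such as
    # 'edible'/'poisonous' map to 0/1. This assumes the target was encoded
    # that way; confirm the mapping for your data before trusting the output.
    st.write('Prediction:', 'Edible' if prediction == 0 else 'Poisonous')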
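
Hard-coding which class code means "Edible" is fragile. One way around it is to keep the fitted target encoder and let it translate the prediction back to a label. The sketch below keeps that logic in a plain function with no Streamlit dependency, so it can be tested separately and then called from the button handler; `predict_label` and the toy model are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder


def predict_label(model, target_encoder, feature_values):
    """Return the original class label for one row of encoded feature values."""
    row = np.asarray(feature_values, dtype=float).reshape(1, -1)
    code = model.predict(row)[0]
    # inverse_transform maps the integer code back to the label seen during fitting
    return target_encoder.inverse_transform([code])[0]


# Toy model: the single feature equals the class code
demo_encoder = LabelEncoder().fit(["edible", "poisonous"])
demo_X = np.array([[0.0], [0.0], [1.0], [1.0]] * 5)
demo_y = demo_encoder.transform(["edible", "edible", "poisonous", "poisonous"] * 5)
demo_model = RandomForestClassifier(n_estimators=10, random_state=0).fit(demo_X, demo_y)

print(predict_label(demo_model, demo_encoder, [1.0]))
```

In the app, the encoder would need to be saved alongside the model and loaded the same way, after which the result of `predict_label` can be passed straight to `st.write`.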