# 🍽️ Dish Success Prediction using Machine Learning

### Predicting whether a new restaurant dish will succeed, fail, or deliver average performance  
*A full-stack ML project with data generation, model training, and web deployment*
<hr>

# 1. Introduction

## 📌 Project Overview

Restaurants constantly experiment with new dishes, but launching a dish that fails
can waste time, money, and kitchen resources.

This project builds a **machine learning system** that predicts whether a dish added
to a restaurant menu will be:

- **Successful**
- **Average**
- **Unsuccessful**

The prediction is based on historical dish data such as:
- Taste profile (spiciness, balance, sweetness)
- Pricing and complexity
- Kitchen operations
- Expected customer response

The final solution is deployed as a **web application** where users can:
- Choose a predefined cuisine dataset
- Upload their own custom dataset
- Enter dish details and get real-time predictions
<hr>

# 2. Problem Statement

Given structured data about a dish before it is launched,
can we predict how well it will perform on a restaurant menu?

This is formulated as a **multi-class classification problem**:

- Input: Dish features (numeric + categorical)
- Output: Performance tier  
  (`successful`, `average`, `unsuccessful`)
<hr>

# 3. Dataset Description

## 📊 Dataset Description

The project uses **synthetic but realistic restaurant data** designed to mimic
real-world menu analytics.

Each row represents one dish.

### Target Column
- `performance_tier` (categorical)

### Example Features
- `price`
- `price_category`
- `spiciness_level`
- `flavor_balance_score`
- `prep_time_minutes`
- `menu_visibility_score`
- `avg_customer_rating`
- `repeat_order_rate`

The dataset includes both:
- **objective attributes** (price, time)
- **subjective estimates** (expected rating, repeat rate)
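
To make the layout concrete, here is a minimal sketch of what a few rows might look like and how the features can be separated from the target. Every value below, the scales, and the `price_category` labels (`low`, `mid`, `high`) are invented for illustration and are not taken from the project's data.

```python
import pandas as pd

# Hypothetical rows: values, scales, and category labels are assumptions.
sample = pd.DataFrame(
    {
        "price": [12.5, 8.0, 21.0],
        "price_category": ["mid", "low", "high"],
        "spiciness_level": [3, 1, 4],
        "flavor_balance_score": [7.8, 6.1, 4.2],
        "prep_time_minutes": [15, 10, 35],
        "menu_visibility_score": [0.9, 0.5, 0.3],
        "avg_customer_rating": [4.5, 3.6, 2.9],
        "repeat_order_rate": [0.42, 0.25, 0.08],
        "performance_tier": ["successful", "average", "unsuccessful"],
    }
)

# Separate the features from the target column.
X = sample.drop(columns=["performance_tier"])
y = sample["performance_tier"]

# Split column names by dtype so each group can get its own preprocessing.
numeric_cols = X.select_dtypes(include="number").columns.tolist()
categorical_cols = X.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```

Splitting columns by dtype is one simple convention; an explicit list of column names works just as well when the schema is fixed.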
<hr>

# 4. Data Preprocessing

The dataset contains:
- Numeric features
- Categorical features
- Missing values (in custom uploads)

To handle this, we use a preprocessing pipeline:

```
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
```

### Preprocessing Steps

**Numeric Features**
- Median imputation
- Standard scaling

**Categorical Features**
- Most frequent value imputation
- One-hot encoding

Using a pipeline ensures:
- No data leakage from the test set, provided the pipeline is fit on the training split only
- The same transformations are applied during training and prediction
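
The steps above could be wired together roughly as follows. This is a sketch, not the project's exact code: the column lists and the tiny data frame are assumptions chosen only to show the shape of the output.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column split; the real lists depend on the dataset in use.
numeric_cols = ["price", "prep_time_minutes"]
categorical_cols = ["price_category"]

# Numeric features: median imputation, then standard scaling.
numeric_pipeline = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical features: most-frequent imputation, then one-hot encoding.
categorical_pipeline = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="most_frequent")),
    # handle_unknown="ignore" encodes categories unseen during fit as all zeros.
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_pipeline, numeric_cols),
    ("cat", categorical_pipeline, categorical_cols),
])

# Tiny invented frame with missing values, only to demonstrate the transform.
df = pd.DataFrame({
    "price": [10.0, np.nan, 20.0, 14.0],
    "prep_time_minutes": [12, 30, np.nan, 18],
    "price_category": ["low", "high", np.nan, "low"],
})
features = preprocessor.fit_transform(df)
print(features.shape)
```

Setting `handle_unknown="ignore"` is a design choice that keeps prediction from failing when a user enters a category the model never saw during training.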
<hr>

# 5. Model Selection

We use **Logistic Regression** as the final model.

### Why Logistic Regression?
- Works well for tabular data
- Fast to train and predict
- Highly interpretable
- Outputs probabilities
- Suitable for multi-class classification

This makes it ideal for:
- Real-time web applications
- Business-facing ML tools
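
As a rough illustration of the interpretability and probability points, the sketch below fits a Logistic Regression on invented toy data and inspects its per-class coefficients. The data, feature count, and class shifts are assumptions made purely for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented toy data: two features, three classes (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 2))
y = np.repeat(["average", "successful", "unsuccessful"], 30)

# Shift two of the classes along the first feature so they are separable.
X[30:60, 0] += 3.0   # "successful" rows
X[60:, 0] -= 3.0     # "unsuccessful" rows

clf = LogisticRegression(max_iter=1000).fit(X, y)

# One row of coefficients per class, one column per feature.
for label, weights in zip(clf.classes_, clf.coef_):
    print(label, np.round(weights, 2))

# Class probabilities for one new sample; the row sums to 1.
probs = clf.predict_proba([[3.0, 0.0]])
print(dict(zip(clf.classes_, np.round(probs[0], 3))))
```

A large positive coefficient for a class means that higher values of that feature push predictions toward the class, which is the kind of explanation a business user can follow.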
<hr>

# 6. Model Training Pipeline

The full ML pipeline consists of:

1. Column-wise imputation of missing values
2. Feature scaling (numeric) and one-hot encoding (categorical)
3. Logistic Regression classifier

```
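from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# `preprocessor` is the ColumnTransformer described in the preprocessing section.
model = Pipeline(steps=[
    ("preprocess", preprocessor),
    # multi_class is omitted: it is deprecated since scikit-learn 1.5, and the
    # default already fits a multinomial model for multi-class targets.
    ("classifier", LogisticRegression(max_iter=1000)),
])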
```

The pipeline ensures consistent preprocessing during both
training and inference.
<hr>

# 7. Model Evaluation

## 📈 Model Evaluation

The dataset is split into training and test sets.

Evaluation metric:
- **Accuracy**

The model achieves strong performance while remaining interpretable.

Since this is a business-oriented prediction problem,
model explainability is prioritized over maximizing accuracy.
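
A minimal sketch of the split-and-score procedure is shown below. The data is randomly generated for the example (the tier is derived from the rating by thresholds chosen arbitrarily), and the 80/20 split, `stratify`, and `random_state` values are assumptions rather than the project's settings, so the printed accuracy says nothing about the real model.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented data: the tier is derived from the rating with arbitrary cut-offs.
rng = np.random.default_rng(42)
n = 300
rating = rng.uniform(1.0, 5.0, size=n)
tier = np.where(rating >= 3.8, "successful",
                np.where(rating >= 2.5, "average", "unsuccessful"))
df = pd.DataFrame({
    "price": rng.uniform(5.0, 30.0, size=n),
    "avg_customer_rating": rating,
    "price_category": rng.choice(["low", "mid", "high"], size=n),
    "performance_tier": tier,
})

X = df.drop(columns=["performance_tier"])
y = df["performance_tier"]

# stratify=y keeps the class proportions similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

preprocessor = ColumnTransformer(transformers=[
    ("num", Pipeline(steps=[
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["price", "avg_customer_rating"]),
    ("cat", Pipeline(steps=[
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), ["price_category"]),
])

model = Pipeline(steps=[
    ("preprocess", preprocessor),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Fit on the training split only, then score on the held-out test split.
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

Because the whole pipeline is fit on `X_train` alone, the imputation statistics and scaling parameters never see the test rows.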
<hr>

# 8. Web Application Architecture

## 🌐 Web Application Architecture

The trained model is deployed using **Flask**.

### Key Pages
- `/` → Dataset selection
- `/custom_data` → Upload and validate custom datasets
- `/predict` → Enter dish details
- `/result` → View prediction

### Backend Responsibilities
- Load trained model
- Validate custom datasets
- Transform user input into model-ready format
- Return predictions with confidence
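
The "transform user input" step might look something like the sketch below, which turns raw form strings into a one-row DataFrame that a fitted pipeline can consume. The `FEATURE_TYPES` schema and the `form_to_frame` helper are hypothetical names introduced for this example; in a Flask view, the `form` argument would typically come from `request.form`.

```python
import pandas as pd

# Assumed schema: feature name -> type used to parse the submitted string.
FEATURE_TYPES = {
    "price": float,
    "price_category": str,
    "spiciness_level": int,
    "prep_time_minutes": float,
}


def form_to_frame(form):
    """Convert a mapping of raw form strings into a one-row DataFrame."""
    row = {}
    for name, cast in FEATURE_TYPES.items():
        raw = str(form.get(name, "")).strip()
        if raw == "":
            raise ValueError(f"Missing value for {name!r}")
        try:
            row[name] = cast(raw)
        except ValueError:
            raise ValueError(f"Invalid value for {name!r}: {raw!r}") from None
    # Fix the column order so it matches what the model was trained on.
    return pd.DataFrame([row], columns=list(FEATURE_TYPES))


frame = form_to_frame({
    "price": "12.50",
    "price_category": "mid",
    "spiciness_level": "3",
    "prep_time_minutes": "15",
})
print(frame)
```

Rejecting bad input with a clear error before calling the model keeps type problems out of the prediction step, where they are harder to diagnose.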
<hr>

# 9. Custom Dataset Support

## 📂 Custom Dataset Support

Users can upload their own CSV datasets.

The system automatically:
- Validates dataset structure
- Checks target column
- Handles missing values
- Trains a new Logistic Regression model
- Redirects user to prediction page
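
The structural checks could be sketched along these lines. The `validate_dataset` helper, the `MIN_ROWS` threshold, and the exact rules are assumptions for illustration; the project's own validation module may check different things.

```python
import io

import pandas as pd

TARGET_COLUMN = "performance_tier"
ALLOWED_TIERS = {"successful", "average", "unsuccessful"}
MIN_ROWS = 10  # assumed threshold, not taken from the project


def validate_dataset(df):
    """Return a list of problems; an empty list means the dataset passed."""
    errors = []
    if TARGET_COLUMN not in df.columns:
        # Nothing else can be checked without the target column.
        return [f"Missing target column '{TARGET_COLUMN}'"]
    if df.drop(columns=[TARGET_COLUMN]).shape[1] == 0:
        errors.append("Dataset has no feature columns")
    if len(df) < MIN_ROWS:
        errors.append(f"Need at least {MIN_ROWS} rows, got {len(df)}")
    if df[TARGET_COLUMN].isna().any():
        errors.append("Target column contains missing values")
    labels = set(df[TARGET_COLUMN].dropna().astype(str).str.strip().str.lower())
    unexpected = labels - ALLOWED_TIERS
    if unexpected:
        errors.append(f"Unexpected target labels: {sorted(unexpected)}")
    if len(labels) < 2:
        errors.append("Target column needs at least two distinct tiers")
    return errors


# Example: a tiny uploaded CSV that is too short and has a bad label.
csv_text = "price,performance_tier\n10.0,successful\n12.0,great\n"
problems = validate_dataset(pd.read_csv(io.StringIO(csv_text)))
print(problems)
```

Returning a list of messages rather than raising on the first problem lets the upload page show the user everything that needs fixing at once.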

This makes the system **flexible and reusable** beyond predefined cuisines.
<hr>

# 10. Prediction Output

The model predicts one of three classes:
- Successful
- Average
- Unsuccessful

Additionally, prediction confidence is calculated using
`predict_proba`, improving trust and transparency.
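
One common way to derive a confidence value, assumed here rather than confirmed by the project, is to take the highest class probability returned by `predict_proba`. The `predict_with_confidence` helper and the training data below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented training data: a single rating-like feature and three tiers.
X = np.array([[1.0], [1.5], [2.0], [3.0], [3.2], [3.5], [4.2], [4.5], [4.8]])
y = np.array(["unsuccessful"] * 3 + ["average"] * 3 + ["successful"] * 3)
clf = LogisticRegression(max_iter=1000).fit(X, y)


def predict_with_confidence(model, features):
    """Return (label, confidence) pairs, one per input row.

    Confidence is the probability of the most likely class.
    """
    probabilities = model.predict_proba(features)
    best = probabilities.argmax(axis=1)
    labels = model.classes_[best]
    confidences = probabilities[np.arange(len(best)), best]
    return list(zip(labels, confidences))


label, confidence = predict_with_confidence(clf, [[4.6]])[0]
print(f"{label} ({confidence:.0%} confidence)")
```

The same helper works with a fitted scikit-learn `Pipeline`, since a pipeline exposes the `predict_proba` and `classes_` of its final classifier.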
<hr>

# 11. Key Challenges & Solutions

### Challenge 1: Model always predicting "successful"
**Solution:**  
- Aligned UI inputs with trained features
- Removed hidden historical-only features
- Added probability-based prediction

### Challenge 2: Mixed feature types
**Solution:**  
- ColumnTransformer with separate pipelines

### Challenge 3: Custom dataset variability
**Solution:**  
- Robust dataset validation module
<hr>

# 12. Conclusion

This project demonstrates a complete end-to-end ML system:

- Data generation
- Feature engineering
- Model training
- Validation
- Deployment
- User interaction

It combines **machine learning theory** with **real-world product design**,
making it suitable for academic evaluation, interviews, and portfolio use.
<hr>

# 13. Future Improvements

## 🚀 Future Enhancements

- Feature importance visualization
- Explainable AI (SHAP values)
- Cuisine-specific tuning
- User feedback loop
- Model retraining based on real outcomes