A machine learning project that predicts hotel booking cancellations and interprets the predictions with explainable AI and behavioral analysis.
The project predicts hotel booking cancellations and extends those predictions into:
- behavioral interpretation
- explainable AI
- risk segmentation
- business decision support
01_Data_Processing.ipynb
│
├── Data cleaning
├── Missing value handling
├── Feature engineering
├── Leakage handling
└── Export cleaned datasets
02_Literature.ipynb
│
├── Literature summary
├── Related work
└── Project framing
03_EDA_and_Hypothesis.ipynb
│
├── Hypothesis-driven EDA
├── Visualization
├── Interaction analysis
└── Business interpretation
04_Model_Analysis.ipynb
│
├── Train/test split
├── Logistic Regression
├── Decision Tree
├── Random Forest
├── XGBoost
├── Hyperparameter tuning
└── Model evaluation
05_SHAP_Analysis.ipynb
│
├── SHAP feature importance
├── SHAP summary plots
├── Dependence plots
├── Local explanations
├── Risk segmentation
└── Behavioral personas
06_Business_Decision_Analysis.ipynb
│
├── Threshold optimization
├── Cost-sensitive evaluation
├── Calibration analysis
├── Business intervention framework
└── Final decision support analysis
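
The threshold optimization and cost-sensitive evaluation steps listed for `06_Business_Decision_Analysis.ipynb` can be sketched in a few lines of plain Python. This is only an illustration of the general idea, not the notebook's code: the function names, the cost values (`cost_fp`, `cost_fn`), the threshold grid, and the toy data are all assumptions made for the example.

```python
def expected_cost(y_true, y_prob, threshold, cost_fp, cost_fn):
    """Total misclassification cost when flagging bookings at `threshold`."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp  # flagged a booking that was not canceled
        elif predicted == 0 and label == 1:
            cost += cost_fn  # missed a booking that was canceled
    return cost


def best_threshold(y_true, y_prob, cost_fp=1.0, cost_fn=5.0):
    """Scan thresholds 0.01..0.99 and return the first one with minimal cost."""
    grid = [i / 100 for i in range(1, 100)]
    return min(
        grid,
        key=lambda t: expected_cost(y_true, y_prob, t, cost_fp, cost_fn),
    )


# Toy labels (1 = canceled) and predicted cancellation probabilities.
# These values are hypothetical and exist only to exercise the functions.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.65, 0.55, 0.90]

threshold = best_threshold(y_true, y_prob)
total_cost = expected_cost(y_true, y_prob, threshold, cost_fp=1.0, cost_fn=5.0)
print(f"threshold={threshold:.2f}, cost={total_cost}")
```

Because a missed cancellation is assumed to cost five times as much as a false alarm here, the selected threshold falls well below 0.5. With real cost estimates the same scan would be run on held-out predictions from the tuned model.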
Dataset:
Hotel Booking Demand Dataset
Source:
https://www.kaggle.com/datasets/jessemostipak/hotel-booking-demand
Target:
is_canceled
- 1 = canceled
- 0 = not canceled
Install required packages:
pip install pandas numpy matplotlib seaborn scikit-learn xgboost shap joblib
Run notebooks in the following order:
01_Data_Processing
↓
02_Literature
↓
03_EDA_and_Hypothesis
↓
04_Model_Analysis
↓
05_SHAP_Analysis
↓
06_Business_Decision_Analysis
Generated outputs include:
- cleaned datasets
- model-ready datasets
- EDA figures
- model evaluation results
- SHAP visualizations
- risk segmentation results
- threshold analysis
- business strategy recommendations
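
As a rough illustration of how the risk segmentation results listed above might be derived, predicted cancellation probabilities can be bucketed into tiers. The tier names and the cut-offs (`low=0.3`, `high=0.7`) are assumptions chosen for this sketch; the notebooks may segment bookings differently.

```python
from collections import Counter


def risk_tier(prob, low=0.3, high=0.7):
    """Map a predicted cancellation probability to a coarse risk tier."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    if prob < low:
        return "low"
    if prob < high:
        return "medium"
    return "high"


# Hypothetical model outputs, used only to demonstrate the bucketing.
probs = [0.05, 0.28, 0.31, 0.69, 0.70, 0.95]
tiers = [risk_tier(p) for p in probs]
counts = Counter(tiers)
print(tiers)
print(dict(counts))
```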
Final selected model:
Tuned XGBoost
Performance:
| Metric | Value |
|---|---|
| Accuracy | 0.87 |
| Precision | 0.80 |
| Recall | 0.87 |
| F1-score | 0.83 |
| ROC-AUC | 0.949 |
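
The F1-score in the table can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The helper name `f1_from_pr` is introduced here for illustration only.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
f1 = f1_from_pr(0.80, 0.87)
print(round(f1, 2))
```

Rounded to two decimals, the result agrees with the F1-score of 0.83 reported in the table.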
This repository focuses on building a complete explainable machine learning workflow rather than only maximizing prediction accuracy.