
Wang12932/Machine-learning-project-code


Hotel Booking Cancellation Prediction

A machine learning project that predicts hotel booking cancellations using explainable AI and behavioral analysis.

Project Objective

The project predicts hotel booking cancellations and extends the predictions into:

  • behavioral interpretation
  • explainable AI
  • risk segmentation
  • business decision support

Project Structure

01_Data_Processing.ipynb
│
├── Data cleaning
├── Missing value handling
├── Feature engineering
├── Leakage handling
└── Export cleaned datasets

02_Literature.ipynb
│
├── Literature summary
├── Related work
└── Project framing

03_EDA_and_Hypothesis.ipynb
│
├── Hypothesis-driven EDA
├── Visualization
├── Interaction analysis
└── Business interpretation

04_Model_Analysis.ipynb
│
├── Train/test split
├── Logistic Regression
├── Decision Tree
├── Random Forest
├── XGBoost
├── Hyperparameter tuning
└── Model evaluation

05_SHAP_Analysis.ipynb
│
├── SHAP feature importance
├── SHAP summary plots
├── Dependence plots
├── Local explanations
├── Risk segmentation
└── Behavioral personas

06_Business_Decision_Analysis.ipynb
│
├── Threshold optimization
├── Cost-sensitive evaluation
├── Calibration analysis
├── Business intervention framework
└── Final decision support analysis
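The threshold optimization and cost-sensitive evaluation steps in 06_Business_Decision_Analysis can be illustrated as a search over candidate thresholds that minimizes total misclassification cost. This is a minimal stdlib-only sketch, not the notebook's code: the function names and the cost values (`fp_cost`, `fn_cost`) are hypothetical, and in practice the probabilities would come from the tuned model's `predict_proba` on a validation split rather than the test set.

```python
from typing import Sequence, Tuple


def total_cost(y_true: Sequence[int], probs: Sequence[float], threshold: float,
               fp_cost: float, fn_cost: float) -> float:
    """Cost of flagging bookings with probability >= threshold as cancellations."""
    cost = 0.0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 0:      # false positive: intervention on a booking that would stay
            cost += fp_cost
        elif pred == 0 and y == 1:    # false negative: a missed cancellation
            cost += fn_cost
    return cost


def best_threshold(y_true: Sequence[int], probs: Sequence[float],
                   fp_cost: float, fn_cost: float, steps: int = 99) -> Tuple[float, float]:
    """Grid-search thresholds 0.01..0.99; ties go to the lowest threshold."""
    candidates = [i / (steps + 1) for i in range(1, steps + 1)]
    return min(((t, total_cost(y_true, probs, t, fp_cost, fn_cost)) for t in candidates),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy labels and scores (illustrative only, not project data).
    y = [0, 0, 1, 1, 0, 1, 0, 1]
    p = [0.10, 0.40, 0.35, 0.80, 0.20, 0.65, 0.55, 0.90]
    # Hypothetical costs: a missed cancellation is assumed to cost 5x a wasted intervention.
    t, c = best_threshold(y, p, fp_cost=1.0, fn_cost=5.0)
    print(f"threshold={t:.2f} cost={c}")
```

When false negatives are priced higher than false positives, the cost-minimizing threshold moves below the default 0.5, trading extra interventions for fewer missed cancellations.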

Dataset

Dataset:

Hotel Booking Demand Dataset

Source:

https://www.kaggle.com/datasets/jessemostipak/hotel-booking-demand

Target:

is_canceled
  • 1 = canceled
  • 0 = not canceled
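Because the target is recorded alongside post-outcome fields, the leakage-handling step in 01_Data_Processing matters here. In the public Kaggle file, `reservation_status` takes the values Canceled, Check-Out and No-Show, and any status other than Check-Out coincides with `is_canceled = 1`, so `reservation_status` and `reservation_status_date` would leak the answer. The stdlib-only sketch below checks and drops them; the README does not list which columns the notebook actually removes, and the helper names are hypothetical.

```python
from typing import Dict, Iterable, List

# Assumed post-outcome columns in the Kaggle hotel_bookings.csv file.
LEAKY_COLUMNS = ("reservation_status", "reservation_status_date")


def status_reveals_target(rows: Iterable[Dict[str, str]]) -> bool:
    """True if 'status other than Check-Out' reproduces is_canceled on every row."""
    return all(
        (row["reservation_status"] != "Check-Out") == (int(row["is_canceled"]) == 1)
        for row in rows
    )


def drop_leaky_columns(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of the rows without the post-outcome columns."""
    return [{k: v for k, v in row.items() if k not in LEAKY_COLUMNS} for row in rows]


if __name__ == "__main__":
    # Toy rows in the shape csv.DictReader would yield for hotel_bookings.csv.
    rows = [
        {"is_canceled": "1", "lead_time": "120", "reservation_status": "Canceled",
         "reservation_status_date": "2016-03-01"},
        {"is_canceled": "0", "lead_time": "10", "reservation_status": "Check-Out",
         "reservation_status_date": "2016-03-05"},
        {"is_canceled": "1", "lead_time": "45", "reservation_status": "No-Show",
         "reservation_status_date": "2016-03-07"},
    ]
    print("status reveals target:", status_reveals_target(rows))
    print(drop_leaky_columns(rows)[0])
```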

Required Libraries

Install required packages:

pip install pandas numpy matplotlib seaborn scikit-learn xgboost shap joblib
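A quick way to confirm the install before opening the notebooks is to check that each package's import name resolves. This helper is a hypothetical convenience, not part of the repository; note that scikit-learn is installed under one name but imported as `sklearn`.

```python
import importlib.util
from typing import Dict, List

# pip package name -> import name.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "xgboost": "xgboost",
    "shap": "shap",
    "joblib": "joblib",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All packages found." if not missing else "Missing: " + " ".join(missing))
```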

Run Order

Run notebooks in the following order:

01_Data_Processing
↓
02_Literature
↓
03_EDA_and_Hypothesis
↓
04_Model_Analysis
↓
05_SHAP_Analysis
↓
06_Business_Decision_Analysis
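The run order above can also be scripted. The sketch below assumes Jupyter with `nbconvert` is installed (it is not in the package list above) and calls `jupyter nbconvert --to notebook --execute --inplace` once per notebook, stopping at the first failure so later notebooks never run on missing inputs. The script itself is a hypothetical helper; `dry_run=True` only prints the commands.

```python
import subprocess
from typing import List

# Notebook files in the run order listed above.
NOTEBOOKS = [
    "01_Data_Processing.ipynb",
    "02_Literature.ipynb",
    "03_EDA_and_Hypothesis.ipynb",
    "04_Model_Analysis.ipynb",
    "05_SHAP_Analysis.ipynb",
    "06_Business_Decision_Analysis.ipynb",
]


def build_commands(notebooks: List[str]) -> List[List[str]]:
    """One nbconvert call per notebook; --inplace overwrites each file with its executed outputs."""
    return [["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb]
            for nb in notebooks]


def run_all(notebooks: List[str] = NOTEBOOKS, dry_run: bool = True) -> List[List[str]]:
    """Run notebooks in order; check=True raises on the first failing notebook."""
    commands = build_commands(notebooks)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)  # set dry_run=False to execute the notebooks
```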

Main Outputs

Generated outputs include:

  • cleaned datasets
  • model-ready datasets
  • EDA figures
  • model evaluation results
  • SHAP visualizations
  • risk segmentation results
  • threshold analysis
  • business strategy recommendations
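Risk segmentation results of this kind can be produced by bucketing predicted cancellation probabilities into tiers and comparing observed cancellation rates per tier. The cutoffs (0.3 and 0.7) and tier names below are hypothetical; the README does not state how 05_SHAP_Analysis defines its segments.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Optional, Sequence

# Hypothetical tier boundaries: [0, 0.3) low, [0.3, 0.7) medium, [0.7, 1] high.
CUTOFFS = [0.3, 0.7]
TIERS = ["low", "medium", "high"]


def assign_tier(prob: float) -> str:
    """Map a cancellation probability to a risk tier."""
    return TIERS[bisect_right(CUTOFFS, prob)]


def segment(probs: Sequence[float]) -> Dict[str, int]:
    """Count bookings per tier, keeping tiers with zero bookings."""
    counts = Counter(assign_tier(p) for p in probs)
    return {tier: counts.get(tier, 0) for tier in TIERS}


def tier_cancel_rates(probs: Sequence[float], y_true: Sequence[int]) -> Dict[str, Optional[float]]:
    """Observed cancellation rate within each tier (None if the tier is empty)."""
    totals, cancels = Counter(), Counter()
    for p, y in zip(probs, y_true):
        tier = assign_tier(p)
        totals[tier] += 1
        cancels[tier] += y
    return {t: (cancels[t] / totals[t] if totals[t] else None) for t in TIERS}


if __name__ == "__main__":
    probs = [0.05, 0.50, 0.95, 0.20]
    print(segment(probs))
```

A well-ranked model should show cancellation rates rising from the low to the high tier, which is what makes tier-specific interventions defensible.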

Final Model Performance

Final selected model:

Tuned XGBoost

Performance:

Metric      Value
Accuracy    0.87
Precision   0.80
Recall      0.87
F1-score    0.83
ROC-AUC     0.949
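For reference, the threshold-based metrics above follow from confusion-matrix counts, and ROC-AUC equals the probability that a randomly chosen canceled booking is scored above a randomly chosen non-canceled one. The stdlib-only sketch below computes them; the notebook presumably uses scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, `f1_score` and `roc_auc_score` instead. As a consistency check, F1 = 2PR/(P+R) with the reported P = 0.80 and R = 0.87 gives about 0.834, matching the reported 0.83.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (1 = canceled)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
    O(n_pos * n_neg), so suitable for illustration rather than the full dataset."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # F1 implied by the reported precision and recall.
    p, r = 0.80, 0.87
    print(round(2 * p * r / (p + r), 3))
```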

Notes

This repository focuses on building a complete explainable machine learning workflow rather than only maximizing prediction accuracy.

About

Hotel Booking Cancellation Prediction: An explainable machine learning project using behavioral analysis, SHAP interpretation, risk segmentation, and business decision support.
