JustinAliData/Bayesian-Optimization

Flight Departure Delay Prediction with Bayesian Optimization

Predicts whether a flight will depart late using a LightGBM classifier, tuned with Bayesian optimization to converge on strong hyperparameters in far fewer iterations than exhaustive grid search.

Stack: Python · scikit-learn · LightGBM · scikit-optimize · NumPy · Pandas
Status: Completed
Author: Justin Ali · LinkedIn


The problem

Flight delays cascade across airline networks: one late departure becomes a missed connection, a misplaced crew, and a knock-on cost that compounds through the day. A reliable predictor of departure delays lets operations teams pre-position resources and notify passengers earlier. This project frames the task as binary classification (will this flight depart late?) and uses Bayesian optimization to find a strong LightGBM model without spending compute on an exhaustive search.

Approach

  1. Baseline — A simple model to establish a floor on performance and give us a reference AUC to beat.
  2. Modeling — LightGBM, chosen for its strong tabular performance and built-in handling of categorical features.
  3. Tuning — Bayesian optimization, which models the AUC surface probabilistically and balances exploration with exploitation. This focuses compute on promising regions of the hyperparameter space rather than sweeping a grid blindly.
  4. Validation — Stratified k-fold cross-validation with ROC-AUC as the primary metric, robust to the class imbalance typical of delay data.

Results

The tuned LightGBM achieved measurable AUC improvement over the baseline, and Bayesian optimization converged on competitive hyperparameters in far fewer iterations than a comparable grid search — a meaningful reduction in compute time without sacrificing model quality.
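To make the compute argument concrete, here is a back-of-the-envelope count of model fits. The grid values, fold count, and Bayesian call budget below are hypothetical and chosen only for illustration; they are not the settings or results from the notebooks.

```python
from math import prod

# Hypothetical grid: five candidate values per hyperparameter.
grid = {
    "num_leaves": [15, 31, 63, 127, 255],
    "learning_rate": [0.01, 0.03, 0.1, 0.2, 0.3],
    "min_child_samples": [5, 10, 20, 50, 100],
    "subsample": [0.6, 0.7, 0.8, 0.9, 1.0],
}
n_folds = 5       # assumed number of CV folds
bayes_calls = 50  # assumed Bayesian optimization budget

# Grid search fits every combination once per fold.
grid_fits = prod(len(values) for values in grid.values()) * n_folds
# Bayesian optimization fits one configuration per call, once per fold.
bayes_fits = bayes_calls * n_folds

print(f"grid search fits:  {grid_fits}")
print(f"Bayesian fits:     {bayes_fits}")
print(f"ratio:             {grid_fits / bayes_fits:.1f}x")
```

The grid grows multiplicatively with every added hyperparameter, while the Bayesian budget is set directly by the number of calls.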

What I would do next

  • Add airport-level and route-level features (mean historical delay, weather lag). Domain features typically dominate generic ML tweaks.
  • Compare against XGBoost and CatBoost; ensemble the top performers.
  • Calibrate the predicted probabilities (for example with Platt scaling) so the model output is usable as a confidence score, not just a class label.
  • Wrap the model in a small inference service for real-time scoring.

Repo contents

.
├── notebooks/
│   ├── 01_eda.ipynb
│   ├── 02_baseline.ipynb
│   └── 03_lgbm_bayesian.ipynb
├── requirements.txt
└── README.md

How to run

git clone https://github.com/JustinAliData/Bayesian-Optimization.git
cd Bayesian-Optimization
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
jupyter lab notebooks/

Acknowledgments

Built as part of the Springboard Data Science Career Track.
