A machine learning framework for predicting physical parameters and morphological classification of eclipsing binary stars from photometric light curves.
- Python 3.7+
- 8GB+ RAM recommended
- Optional: CUDA-compatible GPU for faster training and predictions (CuPy)
pip install -r requirements.txt

For GPU acceleration (optional):

pip install cupy-cuda11x  # Replace 11x with your CUDA version

Execute the combined script to run data preparation, feature extraction, and training sequentially. models/held_out_data.pkl is included in this repository. When present, 3_train_models.py loads it directly to ensure the exact same 845/150 train/test split used in the paper. If deleted, a fresh stratified split will be created automatically.

python 123_extract_and_train.py

This script automatically runs:

1_prepare_training_data.py - Preprocesses light curves
2_extract_training_features.py - Extracts 51 features per light curve
3_train_models.py - Trains RF and XGBoost models with 5-fold CV
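If models/held_out_data.pkl has been deleted, 3_train_models.py recreates the held-out set with a stratified split. A minimal sketch of such a split, assuming scikit-learn; the feature matrix and class labels below are hypothetical stand-ins, not the project's real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the real feature matrix and morphology labels
rng = np.random.default_rng(0)
X = rng.random((995, 51))          # 995 systems x 51 features
y = rng.integers(0, 3, size=995)   # placeholder class labels

# Hold out 150 systems, preserving class proportions (845 remain for training)
X_train, X_held, y_train, y_held = train_test_split(
    X, y, test_size=150, stratify=y, random_state=42
)
print(X_train.shape, X_held.shape)  # (845, 51) (150, 51)
```

Fixing random_state is what makes a regenerated split reproducible across runs, though it will not match the pre-defined split shipped with the repository.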
If you need more control or want to modify individual steps:
# Step 1: Prepare training data (PCHIP interpolation to 1000 points)
python 1_prepare_training_data.py
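Step 1 resamples each light curve onto a uniform 1000-point phase grid with PCHIP interpolation. A rough sketch of that operation, assuming SciPy's PchipInterpolator; the function name here is illustrative, not the script's actual API:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def resample_light_curve(phase, flux, n_points=1000):
    """Resample an unevenly sampled light curve onto a uniform phase grid."""
    order = np.argsort(phase)  # PCHIP requires increasing x values
    phase, flux = np.asarray(phase)[order], np.asarray(flux)[order]
    grid = np.linspace(phase.min(), phase.max(), n_points)
    return grid, PchipInterpolator(phase, flux)(grid)

grid, resampled = resample_light_curve([0.0, 0.4, 0.1, 0.8], [1.0, 0.6, 0.9, 0.95])
```

PCHIP (monotone cubic) interpolation avoids the overshoot that plain cubic splines can introduce near sharp eclipse minima.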
# Step 2: Extract features (51 features per light curve)
python 2_extract_training_features.py
# Step 3: Train models (RF and XGBoost, 5-fold CV on 845 systems, 150 held out)
python 3_train_models.py

After training, predict parameters for OGLE, Kepler, or custom light curves. Prediction scripts use GPU acceleration (CuPy) if available, falling back to CPU otherwise. Refer to the download.txt files in ogle_data and kepler_data to access the necessary datasets.
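The GPU/CPU fallback used by the prediction scripts follows a common import pattern; this is a minimal sketch, not the scripts' exact code:

```python
# Try CuPy for GPU arrays; fall back to NumPy on CPU if unavailable
try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is present
except Exception:
    import numpy as xp

# Downstream code uses xp transparently on either backend
flux = xp.linspace(0.6, 1.0, 5)
print(float(flux.mean()))  # 0.8
```

Because CuPy mirrors the NumPy API, the same array code runs on either backend through the `xp` alias.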
# Predict OGLE catalog
python 4a_ogle_prediction.py
# Predict Kepler catalog
python 4b_kepler_prediction.py
# Predict custom light curves
python 4c_custom_prediction.py

For custom predictions, place your CSV files in the custom_data/ folder. Each file must have two columns: phase and flux (with header row).
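A quick way to produce a correctly formatted input file; the filename and phase/flux values below are placeholders for your own data:

```python
import csv
import os

# Placeholder phase/flux pairs for a hypothetical system
rows = [(0.00, 1.00), (0.25, 0.62), (0.50, 0.97), (0.75, 0.60)]

os.makedirs('custom_data', exist_ok=True)
with open(os.path.join('custom_data', 'my_system.csv'), 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['phase', 'flux'])  # required header row
    writer.writerows(rows)
```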
After predictions, you can assess their reliability by computing the Mahalanobis distance. This measures how far each system's features lie from the training distribution; systems with a high distance are out-of-distribution, and their predictions may be less reliable.
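The distance computation can be sketched as follows; this is a simplified NumPy version, and 6_compute_mahalanobis.py may differ in detail:

```python
import numpy as np

def mahalanobis_distances(X_train, X_new, eps=1e-6):
    """Distance of each row of X_new from the training feature distribution."""
    mu = X_train.mean(axis=0)
    cov = np.cov(X_train, rowvar=False)
    cov += eps * np.eye(cov.shape[0])  # regularize for stable inversion
    inv_cov = np.linalg.inv(cov)
    diff = X_new - mu
    # sqrt(d^T C^-1 d) for every row, computed in one einsum
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, inv_cov, diff))
```

Systems whose distance greatly exceeds typical training-set values can be flagged as out-of-distribution before trusting their predicted parameters.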
# Step 1: Extract features for distance computation
python 5a_extract_ogle_features.py
python 5b_extract_kepler_features.py
# Step 2: Compute distances and merge with predictions
python 6_compute_mahalanobis.py

After training, you can evaluate model performance on the 150-system held-out test set. This script loads the trained RF and XGBoost models, makes predictions on the held-out data, and reports R² scores and classification metrics.
python 5_held_out_evaluation.py

processed_data/training_data.pkl - Preprocessed light curves
processed_data/training_features.pkl - Extracted features
models/models_rf/ - Random Forest models (5 folds x 6 tasks)
models/models_xgb/ - XGBoost models (5 folds x 6 tasks)
models/models_rf/rf_cv_summary.csv - Cross-validation results (RF)
models/models_xgb/xgb_cv_summary.csv - Cross-validation results (XGB)
models/held_out_data.pkl - Held-out test set (150 systems, pre-defined for reproducibility)
models/held_out_evaluation_results.csv - Held-out R² and classification metrics
predictions/ogle_predictions/ogle_predictions.csv
predictions/kepler_predictions/kepler_predictions.csv
predictions/custom_predictions/custom_predictions.csv
ogle_features.pkl - OGLE feature vectors
kepler_features.pkl - Kepler feature vectors
ogle_predictions_with_distance.csv - OGLE predictions with Mahalanobis distance
kepler_predictions_with_distance.csv - Kepler predictions with Mahalanobis distance
mahalanobis_summary.txt - Distance summary statistics
MIT License - See licence.txt for details.
For questions or issues, please open a GitHub issue or contact burak.ulas@comu.edu.tr.