Implementation of the Trimmed Marginal Likelihood GP for regression of count data, as proposed in "Robust Variational Gaussian Process Regression for Count Data with the Trimmed Marginal Likelihood".
- Python >= 3.12
- PyTorch >= 2.0.1
- GPyTorch >= 1.13
- Create experiment environment using e.g. conda as follows
conda create -n GPs python=3.12
conda activate GPs
pip3 install gpytorch matplotlib
- Create necessary folders for saving pre-processed data and results
mkdir all_results && mkdir all_results_hyper_params && mkdir all_summary_data && mkdir all_plots && mkdir openDatasets_prepared
- (OPTIONAL - Only necessary for experiments with real data)
mkdir openDatasets_raw
Place the real data in openDatasets_raw and run prepare_real_count_data.py
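The folder setup above can also be done from Python, which avoids the chained mkdir failing when a folder already exists. This is a minimal sketch and not part of the repository: the function name `create_folders` and its parameters are hypothetical, and only the folder names are taken from the commands above.

```python
from pathlib import Path

# Folder names copied from the mkdir commands above.
REQUIRED_DIRS = [
    "all_results",
    "all_results_hyper_params",
    "all_summary_data",
    "all_plots",
    "openDatasets_prepared",
]
# Only needed for experiments with real data.
OPTIONAL_DIRS = ["openDatasets_raw"]


def create_folders(root=".", include_optional=False):
    """Create the working folders under `root` (hypothetical helper).

    Existing folders are left untouched, so the call is safe to repeat.
    Returns the list of folder paths in the order listed above.
    """
    names = REQUIRED_DIRS + (OPTIONAL_DIRS if include_optional else [])
    paths = []
    for name in names:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `create_folders(include_optional=True)` from the repository root would create all six folders, including openDatasets_raw.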
- Create synthetic data and/or add outliers to real data
python prepare_splits_add_outliers_count_data.py
- Run Experiments
Run proposed method:
python runExperiments.py --likelihood=NB --method=trimmedLB --noise_type=lowest --true_outlier_ratio=0.05
Same as above but using 200 inducing points:
python runExperiments.py --likelihood=NB --method=trimmedLB --noise_type=lowest --true_outlier_ratio=0.05 --reduced_rank=200 --learn_inducing_points=True
Run ordinary count GP with Negative Binomial likelihood:
python runExperiments.py --likelihood=NB --method=variationalApprox --noise_type=lowest --true_outlier_ratio=0.05
Run OLRE (observation-level random effect) model with Poisson likelihood:
python runExperiments.py --likelihood=Poisson --method=OLRE --noise_type=lowest --true_outlier_ratio=0.05
Run
python runExperiments.py --likelihood=NB --method=wGP --noise_type=lowest --true_outlier_ratio=0.05
Run GP with Poisson Likelihood trained by optimizing the
python runExperiments.py --likelihood=RobustPoisson --method=variationalApprox --noise_type=lowest --true_outlier_ratio=0.05
Run GP with NB Likelihood and post-hoc trimming (Post-Hoc) where
python runExperiments.py --likelihood=NB --method=variationalApproxPostHocTrimming --pre_specified_nu=0.2 --noise_type=lowest --true_outlier_ratio=0.05
For more details see the argument descriptions in runExperiments.py.
All results for analysis are saved into the folder "all_results/".
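Since the summary scripts require results of all methods, it can be convenient to launch the runs from one small driver. The following is a sketch under stated assumptions, not part of the repository: `build_command` and `run_all` are hypothetical helpers, the flag values are copied verbatim from the commands listed for this step, and the variant with 200 inducing points is left out. By default the sketch only prints the commands (dry run).

```python
import subprocess
import sys

# Flags shared by every command listed for this step.
COMMON = ["--noise_type=lowest", "--true_outlier_ratio=0.05"]

# (likelihood, method, extra flags), copied from the listed commands.
RUNS = [
    ("NB", "trimmedLB", []),
    ("NB", "variationalApprox", []),
    ("Poisson", "OLRE", []),
    ("NB", "wGP", []),
    ("RobustPoisson", "variationalApprox", []),
    ("NB", "variationalApproxPostHocTrimming", ["--pre_specified_nu=0.2"]),
]


def build_command(likelihood, method, extra):
    """Assemble one runExperiments.py call as an argument list."""
    return [
        sys.executable,
        "runExperiments.py",
        f"--likelihood={likelihood}",
        f"--method={method}",
        *extra,
        *COMMON,
    ]


def run_all(dry_run=True):
    """Print (dry_run=True) or execute each command in sequence.

    With dry_run=False the first failing run raises CalledProcessError.
    Returns the list of commands that were printed or executed.
    """
    commands = [build_command(*run) for run in RUNS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_all(dry_run=False)` from the repository root would execute the six runs one after another, using the same Python interpreter that runs the driver.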
- Show summary of all results
Determine outlier ratio (using Algorithm 2 and 3) and create summary data by running:
python create_summary_data.py --noise_type=lowest --true_outlier_ratio=0.05
Shows summary of negative log-likelihood (nll) results (requires results of all methods on all datasets):
python show_summary.py
Shows scaled continuous ranked probability scores (SCRPS) (requires results of all methods on all datasets):
python show_summary.py --scrps
Shows outlier estimates (requires results of all methods on all datasets):
python show_outlier_estimates.py
Plots the difference of a model's predictive cumulative distribution function (CDF) and the empirical CDF (requires results of all methods on all datasets):
python create_plots.py
If you are using part of the code in your work, please cite the following paper:
Andrade, Daniel. "Robust Variational Gaussian Process Regression for Count Data with the Trimmed Marginal Likelihood." Statistics and Computing 36, 139 (2026): https://doi.org/10.1007/s11222-026-10895-9