
Victor-Alejandre/BandwidthSelection


Bandwidth Selectors on Semiparametric Bayesian Networks

This repository contains the implementation and experiments related to the bandwidth selectors introduced in our publication, Bandwidth Selectors on Semiparametric Bayesian Networks [1]. These selectors enhance the learning of both structures and parameters in semiparametric Bayesian networks (SPBNs) [2] for density estimation.

Bayesian networks [3], and SPBNs in particular, are probabilistic graphical models that efficiently factorize the joint probability distribution of a set of features by exploiting the conditional independences among them, which reduces the computational effort of joint density estimation. A Bayesian network consists of a directed acyclic graph that encodes these conditional independence relationships, together with a set of conditional distributions, one for each node (feature), conditioned on its parent nodes in the graph.
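In symbols, writing Pa(i) for the parents of node X_i in the graph, the factorization described above is the standard Bayesian network one for n features:

```math
p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p\left(x_i \mid \mathbf{x}_{\mathrm{Pa}(i)}\right),
\qquad
\log p(x_1, \ldots, x_n) = \sum_{i=1}^{n} \log p\left(x_i \mid \mathbf{x}_{\mathrm{Pa}(i)}\right)
```

Because the log-density is a sum of per-node terms, a change to one node's parent set only affects that node's term, which is what allows a structure search to rescore nodes locally.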

The implemented bandwidth selectors are Unbiased Cross-Validation (UCV) [4], Smoothed Cross-Validation (SCV) [5], and Plug-In (PI) [6]. These methods select the bandwidth of kernel density estimators (KDEs), the parameter that controls how smooth, and therefore how accurate, the estimated density is. All three are available for parameter learning in SPBNs and can also be applied at each iteration of the Hill-Climbing algorithm during structure learning. Experimental results highlight the advantages of SCV and PI in low-sample-size settings, while UCV proves to be the most effective overall and is particularly beneficial in high-sample-size scenarios. Finally, the robustness of the normal rule in structure learning and the strong performance of UCV in parameter learning suggest combining the two (the normal rule for structure learning, UCV for parameter learning) to learn SPBNs at a reduced computational cost.
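To make the idea of a data-driven selector concrete, here is a minimal univariate sketch of UCV next to the normal rule, assuming a Gaussian kernel, a scalar bandwidth and a simple grid search. The function names (`normal_rule`, `ucv_score`, `ucv_bandwidth`) are illustrative only and are not the API of the extended package, whose C++/OpenCL implementation works with multivariate bandwidth matrices. UCV picks the bandwidth h minimizing an unbiased estimate of the integrated squared error (up to a constant that does not depend on h).

```python
import numpy as np


def gaussian_pdf(d, h):
    """Density of N(0, h^2) evaluated elementwise at d."""
    return np.exp(-0.5 * (d / h) ** 2) / (h * np.sqrt(2.0 * np.pi))


def normal_rule(x):
    """Normal reference rule for a univariate Gaussian kernel: h = (4 / (3n))^(1/5) * s."""
    return (4.0 / (3.0 * x.size)) ** 0.2 * np.std(x, ddof=1)


def ucv_score(x, h):
    """UCV(h) = integral of f_h^2 - (2/n) * sum_i f_{h,-i}(x_i), Gaussian kernel."""
    n = x.size
    d = x[:, None] - x[None, :]  # all pairwise differences, O(n^2) memory
    # Closed form for the integral: two N(0, h^2) kernels convolve to N(0, 2h^2).
    integral_sq = gaussian_pdf(d, np.sqrt(2.0) * h).sum() / n**2
    # Leave-one-out term: drop the diagonal (i == j) pairs.
    off_diag = ~np.eye(n, dtype=bool)
    loo_mean = gaussian_pdf(d[off_diag], h).sum() / (n * (n - 1))
    return integral_sq - 2.0 * loo_mean


def ucv_bandwidth(x, grid):
    """Return the grid value minimizing UCV (grid search, not a continuous optimizer)."""
    scores = [ucv_score(x, h) for h in grid]
    return grid[int(np.argmin(scores))]


# Usage: bimodal data, where a single-Gaussian reference rule tends to oversmooth.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(2.0, 0.5, 200)])
grid = np.linspace(0.05, 1.5, 60)
h_nr = normal_rule(x)
h_ucv = ucv_bandwidth(x, grid)
print(f"normal rule h = {h_nr:.3f}, UCV h = {h_ucv:.3f}")
```

On clearly bimodal data like this, the normal rule assumes a single Gaussian shape and oversmooths, so the UCV bandwidth comes out smaller. Each UCV evaluation touches all n² pairs, which is the kind of cost that GPU acceleration targets.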

Key Features

  • Integration of bandwidth selectors into Bayesian network learning for both parameters and structure.
  • Core implementation in C++ for performance, following the design of the original PyBNesian package.
  • GPU acceleration with OpenCL to handle large-scale datasets.

Repository Structure

/bandwidth_selection
│── /experiments        # Scripts for running experiments on synthetic and real data
│── /mod_PyBNesian      # Extension of the PyBNesian package with bandwidth selectors UCV, SCV and PI for structure and parameter learning of SPBNs
│── vignette.md         # Guide on using the extended package
│── LICENSE             # License information
│── README.md           # Main repository documentation
│── installation.md     # Installation guide
│── requirements.txt    # List of dependencies
│── .gitignore          # Ignore unnecessary files

Bandwidth Selectors Extension

The original PyBNesian package [7] is a Python library whose core methods are implemented in C++, with GPU computation via OpenCL. It provides a complete environment for SPBNs, a type of Bayesian network, while also supporting traditional BN models such as discrete and Gaussian networks. SPBNs combine parametric and non-parametric approaches to density estimation. Since learning a BN involves both parameter and structure learning, PyBNesian supports both. In the non-parametric approach, the bandwidth of the conditional kernel density estimators (CKDEs) is selected with the normal rule. For structure learning, both the PC and Hill-Climbing (HC) algorithms are available. A GPU is required to use this library.
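For reference, this is roughly what a CKDE with a normal-rule bandwidth computes, as a minimal sketch under stated assumptions: the common ratio-of-KDEs definition f(x | pa) = f_KDE(x, pa) / f_KDE(pa), Gaussian kernels, the multivariate normal reference rule H = (4/(d+2))^(2/(d+4)) n^(-2/(d+4)) Σ, and the parents' bandwidth taken as the corresponding sub-block of the joint matrix. All function names are hypothetical; this does not call PyBNesian.

```python
import numpy as np


def normal_rule_matrix(data):
    """Normal reference rule: H = (4/(d+2))^(2/(d+4)) * n^(-2/(d+4)) * Sigma."""
    n, d = data.shape
    sigma = np.atleast_2d(np.cov(data, rowvar=False))
    return (4.0 / (d + 2)) ** (2.0 / (d + 4)) * n ** (-2.0 / (d + 4)) * sigma


def kde_logpdf(points, data, H):
    """Log-density at each row of `points` of a Gaussian KDE with bandwidth matrix H."""
    n, d = data.shape
    L = np.linalg.cholesky(H)  # H = L L^T
    diff = points[:, None, :] - data[None, :, :]  # shape (m, n, d)
    z = np.linalg.solve(L, diff.reshape(-1, d).T).T.reshape(diff.shape)
    log_k = (-0.5 * (z**2).sum(axis=-1)  # -0.5 * diff^T H^{-1} diff
             - 0.5 * d * np.log(2.0 * np.pi)
             - np.log(np.diag(L)).sum())  # -0.5 * log|H|
    top = log_k.max(axis=1, keepdims=True)  # log-sum-exp over training points
    log_sum = top + np.log(np.exp(log_k - top).sum(axis=1, keepdims=True))
    return log_sum.ravel() - np.log(n)


def ckde_logpdf(x, parents, x_train, parents_train):
    """log f(x | pa) = log f_KDE(x, pa) - log f_KDE(pa), with x as the first coordinate.
    The parents' bandwidth is the parents' sub-block of the joint normal-rule matrix."""
    joint_train = np.column_stack([x_train, parents_train])
    H = normal_rule_matrix(joint_train)
    joint = np.column_stack([x, parents])
    return kde_logpdf(joint, joint_train, H) - kde_logpdf(parents, parents_train, H[1:, 1:])


# Usage: one continuous node with a single parent.
rng = np.random.default_rng(1)
pa_train = rng.normal(size=(300, 1))
x_train = 1.5 * pa_train[:, 0] + rng.normal(scale=0.5, size=300)
query_pa = np.array([[0.5], [0.5]])
print(np.exp(ckde_logpdf(np.array([0.75, 3.0]), query_pa, x_train, pa_train)))
```

Using the sub-block of the joint bandwidth in the denominator makes the ratio integrate to one over x for any fixed parent values. In this sketch, using a different selector would only mean replacing `normal_rule_matrix`.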

In this work, we extend PyBNesian while preserving its original functionality. The bandwidth selectors UCV, SCV, and PI have been implemented in C++ with GPU acceleration, following the package’s existing structure. These bandwidth selectors serve as parameter learning methods in the non-parametric approach, as they determine the bandwidth of CKDEs used in non-parametric nodes. They can be applied within a fixed structure and are also integrated into the HC algorithm for structure learning, where parameter estimation is performed at each iteration.
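The interaction between bandwidth selection and structure learning can be pictured with a skeletal greedy Hill-Climbing loop. This is a sketch under simplifying assumptions: it only adds or removes arcs, and the local score is a hypothetical linear-Gaussian BIC standing in for the step where, in an SPBN, a node's CKDE would be refit with the chosen bandwidth selector and scored. PyBNesian's actual search uses its own operators and scores, which are not reproduced here.

```python
import itertools

import numpy as np


def creates_cycle(parents, u, v):
    """True if adding the arc u -> v would close a directed cycle (v is an ancestor of u)."""
    stack, seen = [u], set()
    while stack:
        node = stack.pop()
        if node == v:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])  # walk upwards through u's ancestors
    return False


def linear_gaussian_bic(data, node, parents):
    """Hypothetical stand-in local score: BIC of a linear-Gaussian regression on the parents.
    In an SPBN, this is where the node's CKDE would be refit with a bandwidth selector."""
    y = data[:, node]
    n = y.size
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_var = np.mean((y - X @ beta) ** 2)
    loglik = -0.5 * n * (np.log(2.0 * np.pi * resid_var) + 1.0)
    return loglik - 0.5 * np.log(n) * (X.shape[1] + 1)  # +1 for the variance parameter


def hill_climb(data, local_score, max_iter=100):
    """Greedy search over single-arc additions and removals, starting from the empty graph."""
    d = data.shape[1]
    parents = {v: set() for v in range(d)}
    cached = {v: local_score(data, v, []) for v in range(d)}
    for _ in range(max_iter):
        best_delta, best_move = 0.0, None
        for u, v in itertools.permutations(range(d), 2):
            if u in parents[v]:
                candidate = parents[v] - {u}  # remove u -> v
            elif not creates_cycle(parents, u, v):
                candidate = parents[v] | {u}  # add u -> v
            else:
                continue
            # Decomposable score: only the child v needs to be refit and rescored.
            delta = local_score(data, v, sorted(candidate)) - cached[v]
            if delta > best_delta:
                best_delta, best_move = delta, (v, candidate)
        if best_move is None:
            break  # local optimum: no single move improves the score
        v, candidate = best_move
        parents[v] = candidate
        cached[v] = local_score(data, v, sorted(candidate))
    return parents


rng = np.random.default_rng(2)
a = rng.normal(size=500)
b = 2.0 * a + rng.normal(scale=0.5, size=500)  # b depends on a
c = rng.normal(size=500)  # independent of both
print(hill_climb(np.column_stack([a, b, c]), linear_gaussian_bic))
```

Because only the child of a candidate arc is rescored, the bandwidth selector runs once per evaluated move, so a cheap rule during the search and a more accurate selector for the final parameters is a natural trade-off.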

Experiments

The experiments folder contains the scripts and datasets used in our study. We evaluated our methods on both synthetic and real-world data to assess their effectiveness. This folder includes all .py files necessary to reproduce our experiments, with random seeds incorporated to ensure reproducibility. The real-world datasets are sourced from the UCI Machine Learning Repository, while the synthetic data is generated using functions defined within the code.

Findings

  • Enhanced Density Estimation: In the parameter learning paradigm, the bandwidth selectors improve the learning of SPBNs by producing more accurate CKDEs for the nodes, which results in better joint density estimates, as demonstrated in the synthetic data experiments. These experiments also show that the current normal rule remains a robust alternative to the proposed bandwidth selectors in the structure learning paradigm.
  • Improvement in Real-World Scenarios: The introduced bandwidth selectors (UCV, SCV, and PI) prove highly effective on real-world data, yielding better log-likelihood values and significant improvements in density estimation. UCV stands out, providing superior performance particularly in high-sample-size scenarios.
  • Overall Impact: Integrating bandwidth selectors into semiparametric models such as SPBNs improves performance in both parameter and structure learning, with more accurate density estimates contributing to more reliable models.
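The findings compare bandwidth choices through log-likelihood. A common way to do this, assumed here, is to evaluate the fitted density on held-out data. The minimal sketch below uses a univariate Gaussian KDE and a hypothetical seeded two-component mixture as the synthetic generator (not the repository's generators), and compares the normal-rule bandwidth with a smaller one chosen only for illustration.

```python
import numpy as np


def kde_logpdf(points, data, h):
    """Log-density of a univariate Gaussian KDE with bandwidth h at each query point."""
    z = (points[:, None] - data[None, :]) / h
    log_k = -0.5 * z**2 - np.log(h * np.sqrt(2.0 * np.pi))
    top = log_k.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    log_sum = top + np.log(np.exp(log_k - top).sum(axis=1, keepdims=True))
    return log_sum.ravel() - np.log(data.size)


def sample_mixture(n, rng):
    """Hypothetical synthetic generator: equal-weight mixture of N(-3, 1) and N(3, 1)."""
    means = np.where(rng.integers(0, 2, size=n) == 0, -3.0, 3.0)
    return rng.normal(means, 1.0)


rng = np.random.default_rng(42)  # fixed seed: the whole comparison is reproducible
train_x, test_x = sample_mixture(1000, rng), sample_mixture(1000, rng)
h_nr = (4.0 / (3.0 * train_x.size)) ** 0.2 * np.std(train_x, ddof=1)  # normal rule
for label, h in [("normal rule", h_nr), ("0.5 x normal rule", 0.5 * h_nr)]:
    mean_ll = kde_logpdf(test_x, train_x, h).mean()
    print(f"{label}: h = {h:.3f}, mean held-out log-likelihood = {mean_ll:.4f}")
```

Fixing the seed, as the experiment scripts do, makes both the sampled data and the resulting comparison reproducible.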

Installation

For installation instructions, see installation.md.

Usage

A detailed guide on how to use the bandwidth selectors is available in vignette.md.

License

This project is licensed under the terms given in the LICENSE file in the repository.

References

  1. Alejandre, V., Bielza, C., Larrañaga, P., in submission. Bandwidth selectors on semiparametric Bayesian networks. Information Sciences.
  2. Atienza, D., Bielza, C., Larrañaga, P., 2022b. Semiparametric Bayesian networks. Information Sciences 584, 564–582.
  3. Koller, D., Friedman, N., 2009. Probabilistic Graphical Models: Principles and Techniques. The MIT Press.
  4. Duong, T., Hazelton, M.L., 2005. Cross-validation bandwidth matrices for multivariate kernel density estimation. Scandinavian Journal of Statistics 32, 485–506.
  5. Chacón, J.E., Duong, T., 2011. Unconstrained pilot selectors for smoothed cross-validation. Australian & New Zealand Journal of Statistics 53, 331–351.
  6. Chacón, J.E., Duong, T., 2010. Multivariate plug-in bandwidth selection with unconstrained pilot bandwidth matrices. Test 19, 375–398.
  7. Atienza, D., Bielza, C., Larrañaga, P., 2022a. PyBNesian: An extensible Python package for Bayesian networks. Neurocomputing 504, 204–209.
