This library implements coreset generation for k-Means and (Bayesian) Gaussian mixture models. It also offers extended versions of the corresponding algorithms that support weighted data sets.
To get started, take a look at:
examples/intro.ipynb
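For readers new to the idea, the following is a minimal sketch of a lightweight coreset construction in the spirit of Bachem, Lucic, & Krause (2017), written in plain Python. It is not this library's API: the function name `lightweight_coreset` and its parameters are hypothetical and chosen only for illustration. Each point is sampled with a probability that mixes a uniform term with a term proportional to its squared distance from the data mean, and each sampled point receives a weight equal to the inverse of `m` times that probability.

```python
import random


def lightweight_coreset(points, m, seed=None):
    """Sample m weighted points from `points` (a list of equal-length tuples).

    Hypothetical helper for illustration only; not part of this library.
    """
    rng = random.Random(seed)
    n = len(points)
    dim = len(points[0])

    # Mean of the full data set.
    mean = [sum(p[j] for p in points) / n for j in range(dim)]

    # Squared Euclidean distance of every point to the mean.
    sq_dists = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(sq_dists)

    # Sampling distribution: half uniform, half proportional to squared distance.
    if total > 0:
        q = [0.5 / n + 0.5 * d / total for d in sq_dists]
    else:
        q = [1.0 / n] * n  # all points coincide: fall back to uniform sampling

    # Draw m indices i.i.d. (with replacement) according to q.
    idx = rng.choices(range(n), weights=q, k=m)
    samples = [points[i] for i in idx]

    # Importance weights make weighted sums over the sample unbiased.
    weights = [1.0 / (m * q[i]) for i in idx]
    return samples, weights


# Usage: compress 1000 synthetic 2-D points into 50 weighted points.
data_rng = random.Random(0)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(1000)]
coreset, weights = lightweight_coreset(data, m=50, seed=1)
```

The resulting weighted points can then be passed to any clustering routine that accepts per-sample weights, which is the reason weighted variants of the algorithms are needed.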
This is a fork of https://github.com/zalanborsos/coresets, intended to fix installation issues and publish the package to PyPI.
- Install poetry.
poetry build
poetry install
In the project root, run:
poetry run pytest
The implementation of the library is based on the following works:
Bachem, O., Lucic, M., & Krause, A. (2017). Practical coreset constructions for machine learning. arXiv preprint arXiv:1703.06476.
Bachem, O., Lucic, M., & Krause, A. (2017). Scalable and distributed clustering via lightweight coresets. arXiv preprint arXiv:1702.08248.
Lucic, M., Faulkner, M., Krause, A., & Feldman, D. (2018). Training Gaussian Mixture Models at Scale via Coresets. Journal of Machine Learning Research, 18, Art-No.
Borsos, Z., Bachem, O., & Krause, A. (2017). Variational Inference for DPGMM with Coresets. Advances in Approximate Bayesian Inference.
rm -rf build dist
poetry build
rename -v 's/manylinux_2_\d+/manylinux1/' dist/*.whl # Rename the wheel to manylinux1, since we don't use advanced libc features
poetry publish