Computational Statistics, Summer 2021 by Marina Khismatullina | Jonathan Willnow | Partitioning and Hierarchical Clustering and their Validation

This is the final project for the course Computational Statistics at the University of Bonn (summer term 2021) by Jonathan Willnow.

Project overview

This notebook explores partitioning and hierarchical clustering algorithms and their validation. It contains a simulation study as well as an empirical application of K-Means clustering.
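As a rough illustration of the kind of partitioning method studied here, the following is a minimal sketch of K-Means (Lloyd's algorithm) using only the Python standard library. It is not taken from the notebook: the function names `k_means` and `inertia`, the toy data, and all parameter values are assumptions made for this example. The within-cluster sum of squares it computes is one simple internal validation quantity.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct observations (hypothetical choice).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squares, a simple internal validation measure."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Toy data (assumed for illustration): two well-separated Gaussian blobs.
data_rng = random.Random(42)
blob_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(50)]
points = blob_a + blob_b

labels, centroids = k_means(points, k=2)
wcss = inertia(points, labels, centroids)
print("centroids:", centroids)
print("within-cluster sum of squares:", wcss)
```

Running `k_means` for several values of `k` and comparing the resulting `inertia` values gives a basic elbow-style check. A library implementation such as the one documented by scikit-learn would normally be preferred over this sketch.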

Reproducibility

To make this project fully reproducible, a continuous integration workflow was set up using GitHub Actions. An environment.yml file describing my environment is also provided, so that the notebook can be run with the same package versions.

Acknowledgement

I would like to thank all the awesome people who share their knowledge by creating and maintaining the documentation and coding examples available all over the internet. Special thanks to Dr. Marina Khismatullina and Prof. Dr. Philipp Eisenhauer, who have taught me a lot over the last two semesters and without whom I would not have been able to complete this project.

Sources

Bayne et al. (1980): Monte Carlo Comparison of selected clustering procedures, Pattern Recognition, 12:2, 51-62.
Documentation of the hdbscan.io library. Online source [https://hdbscan.readthedocs.io/en/latest/index.html], last access 24.08.2021.
Documentation of the scikit-learn.org library. Online source [https://scikit-learn.org/stable/modules/clustering.html#clustering], last access 24.08.2021.
Halkidi et al. (2001): On Clustering Validation Techniques, Journal of Intelligent Information Systems, 17:2/3, 107-145.
intechopen.com (2017): Partitional Clustering, Uğurhan Kutbay, DOI: 10.5772/intechopen.75836. Online source [https://www.intechopen.com/chapters/60501], last access 23.08.2021.
Klecker (2019): Building A Classification Model Using Affinity Propagation, Electronic Theses and Dissertations, 1917, Georgia Southern University.
MacQueen (1967): Some methods for classification and analysis of multivariate observations, Berkeley Symposium on Mathematical Statistics and Probability, 281-297.
Moro et al. (2014): A Data-Driven Approach to Predict the Success of Bank Telemarketing, Decision Support Systems, Elsevier, 62:22-31, June 2014.
realpython.com (n.d.): K-Means Clustering in Python: A Practical Guide, Kevin Arvai. Online source [https://realpython.com/k-means-clustering-python/], last access 23.08.2021.
Theodoridis and Koutroumbas (2008): Pattern Recognition, Academic Press, ISBN 9781597492720.
towardsdatascience.com (2020): Understanding K-Means, K-Means++ and, K-Medoids Clustering Algorithms, Satyam Kunmar. Online source [https://towardsdatascience.com/understanding-k-means-k-means-and-k-medoids-clustering-algorithms-ad9c9fbf47ca], last access 23.08.2021.

JonathanWillnow/CompuStatsClustering
