Final project for the course Computational Statistics at the University of Bonn, summerterm 2021

JonathanWillnow/CompuStatsClustering

Computational Statistics, Summer 2021 by Marina Khismatullina | Jonathan Willnow | Partitioning and Hierarchical Clustering and their Validation

This is the final project for the course Computational Statistics at the University of Bonn (summer term 2021) by Jonathan Willnow.

Project overview

This notebook explores partitioning and hierarchical clustering algorithms and their validation. It contains a simulation study as well as an empirical application of K-Means clustering.
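To make the idea of clustering plus validation concrete, here is a minimal sketch in plain Python. It is not taken from the notebook. The function names (`farthest_first_init`, `kmeans`, `silhouette`) and the toy data are hypothetical and chosen only for illustration. It runs Lloyd's K-Means iteration and scores the result with the mean silhouette coefficient, one common internal validation index. To keep the run deterministic, it uses a farthest-first seeding instead of the random or K-Means++ initialization that is typical in practice. It also assumes at least two non-empty clusters and points that are not all identical.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from all chosen seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette coefficient; values near 1 indicate compact,
    well-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: two tight groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

labels2, centroids2 = kmeans(points, k=2)
labels3, centroids3 = kmeans(points, k=3)
score2 = silhouette(points, labels2)
score3 = silhouette(points, labels3)
print("k=2:", labels2, round(score2, 3))
print("k=3:", labels3, round(score3, 3))
```

Comparing the silhouette score across candidate values of k, as in the last lines, is one simple way to use a validation index for choosing the number of clusters.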

Reproducibility

Continuous Integration

For full reproducibility of this project, a continuous integration workflow was set up using GitHub Actions. I also provide an environment.yml file describing my environment, so that the notebook can be rerun with the same dependencies.

Acknowledgement

I would like to thank all the awesome people who share their knowledge by creating and maintaining the documentation and coding examples available all over the internet. Special thanks to Dr. Marina Khismatullina and Prof. Dr. Philipp Eisenhauer, who have taught me a lot in the last two semesters and without whom I would not have been able to complete this project.

Sources

  • Bayne et al. (1980): Monte Carlo Comparison of selected clustering procedures, Pattern Recognition, 12:2, 51-62.

  • Documentation of the hdbscan.io library. Online source [https://hdbscan.readthedocs.io/en/latest/index.html], last access 24.08.2021.

  • Documentation of the scikit-learn.org library. Online source [https://scikit-learn.org/stable/modules/clustering.html#clustering], last access 24.08.2021.

  • Halkidi et al. (2001): On Clustering Validation Techniques, Journal of Intelligent Information Systems, 17:2/3, 107-145.

  • intechopen.com (2017): Partitional Clustering, Uğurhan Kutbay, DOI: 10.5772/intechopen.75836. Online source [https://www.intechopen.com/chapters/60501], last access 23.08.2021.

  • Klecker (2019): Building A Classification Model Using Affinity Propagation, Electronic Theses and Dissertations, 1917, Georgia Southern University.

  • MacQueen (1967): Some methods for classification and analysis of multivariate observations, Berkeley Symposium on Mathematical Statistics and Probability, 281-297.

  • Moro et al. (2014): A Data-Driven Approach to Predict the Success of Bank Telemarketing, Decision Support Systems, Elsevier, 62:22-31, June 2014.

  • realpython.com (n.d.): K-Means Clustering in Python: A Practical Guide, Kevin Arvai. Online source [https://realpython.com/k-means-clustering-python/], last access 23.08.2021.

  • Theodoridis and Koutroumbas (2008): Pattern Recognition, Academic Press, ISBN 9781597492720.

  • towardsdatascience.com (2020): Understanding K-Means, K-Means++ and, K-Medoids Clustering Algorithms, Satyam Kumar. Online source [https://towardsdatascience.com/understanding-k-means-k-means-and-k-medoids-clustering-algorithms-ad9c9fbf47ca], last access 23.08.2021.
