Repository for the Laboratories of Data Mining course in KTH - ID2222
This repository contains all the deliverables developed for the KTH course ID2222 Data Mining, part of the Double Degree Master of Science in Computer Science and Engineering, Data Science & Distributed Systems track @EIT Digital.
The course deliverables are five labs that implement data mining techniques for large datasets. They are written in Python or PySpark.
Labs
- Lab1 - LSH: implementation of LSH document similarity, using a sample corpus of 10 scientific papers
- Lab1 - MinHashing & AssociationRules: implementation of association rule mining, including the Apriori algorithm
- Lab3 - HyperBall: implementation of the HyperBall technique for approximate node centrality calculation, from the HyperBall paper
- Lab4 - GraphSpectra: implementation of graph spectral clustering
- Lab5 - JABEJA: JABEJA (swap) algorithm for minimizing the ratio cut when partitioning large graphs into similarly sized clusters
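
As a rough illustration of the idea behind the document similarity lab, here is a minimal MinHash sketch in plain Python. It is not the code from this repository: the function names, the shingle length `k=5`, the 128 hash functions, and the use of salted `blake2b` as the hash family are all assumptions made for the example.

```python
import hashlib


def shingles(text, k=5):
    """Return the set of character k-shingles of a whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def seeded_hash(shingle, seed):
    """64-bit hash of a shingle; the seed (hypothetical choice) selects the hash function."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(shingle_set, num_hashes=128):
    """For each hash function, keep the minimum hash value over the set."""
    return [
        min(seeded_hash(s, seed) for s in shingle_set)
        for seed in range(num_hashes)
    ]


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; this estimates Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


doc_a = shingles("Locality sensitive hashing finds similar documents quickly.")
doc_b = shingles("Locality sensitive hashing finds similar items quickly.")
sig_a = minhash_signature(doc_a)
sig_b = minhash_signature(doc_b)
print("exact Jaccard:   ", round(jaccard(doc_a, doc_b), 3))
print("MinHash estimate:", round(estimate_similarity(sig_a, sig_b), 3))
```

The signatures are much smaller than the shingle sets, so an LSH step can band them to find candidate pairs without comparing every pair of documents.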
