Parallel K-Means clustering algorithm

K-Means algorithm

Initialize the centers using kmeans++ algorithm
Assign each point to the cluster with the nearest
Update the clusters centers
Repeats step 2 and 3 until convergence

Data set

The data set was generated from a gaussian mixture distribution.

Clusters obtained by runing the sequential Kmeans clustering algorithm.

Dependencies

C++ compiler with c++-17 support (tested on clang)
Make
[Optional] Julia (DataFrames, CSV and Plots packages)

Building

To build run make

Results

The following results are obtained for:

number of clusters : 3
number of threads: 32
number of repetitions: 30
maximum number of iterations: 1000
threshold : 0.1

To reproduce the results run ./benchmark 3 32 30 1000 0.1

Benchmark

Speedup

Conclusion

Used OPenMP for parallelization
Used speedup to evaluate the performances of the parallelized kmeans algorithm
Maximum speedup is ~7x

References

T Kanungo, DM Mount, NS Netanyahu, C.D. Piatko, R. Silverman, A.Y. Wu. An efficient k-means clustering algorithm: Analysis and implementation, in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881-892, July 2002.
Arthur, David and Sergei Vassilvitski. k-means++: The Advantages of Careful Seeding, 2007.
Barbara Chapman, Gabrielle Jost, Ruud Van Der Pas. Using OpenMP, 2007.
Joe Pitt-Francis, Jonathan Whiteley. Guide to Scientific Computing in C++, Second Edition, 2017.
OpenMP 4.5 Reference Guide – C/C++

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
data		data
images		images
inc		inc
plots		plots
src		src
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Parallel K-Means clustering algorithm

K-Means algorithm

Data set

Dependencies

Building

Results

Benchmark

Speedup

Conclusion

References

About

Languages

License

axmat/ParallelKMeans

Folders and files

Latest commit

History

Repository files navigation

Parallel K-Means clustering algorithm

K-Means algorithm

Data set

Dependencies

Building

Results

Benchmark

Speedup

Conclusion

References

About

Topics

Resources

License

Stars

Watchers

Forks

Languages