GitHub - deric/kmeans: A CUDA implementation of the k-means clustering algorithm

forked from serban/kmeans

A CUDA implementation of the k-means clustering algorithm

serban.org/software/kmeans

MIT license

Image_data
.gitignore
LICENSE
Makefile
README
benchmark.sh
cuda_io.cu
cuda_kmeans.cu
cuda_main.cu
cuda_wtime.cu
file_io.c
go
kmeans.h
mpi_io.c
mpi_kmeans.c
mpi_main.c
omp_kmeans.c
omp_main.c
sample.output
seq_kmeans.c
seq_main.c
wtime.c

This software is derived from Professor Wei-keng Liao's parallel k-means
clustering code obtained on November 21, 2010 from
 http://users.eecs.northwestern.edu/~wkliao/Kmeans/index.html
(http://users.eecs.northwestern.edu/~wkliao/Kmeans/simple_kmeans.tar.gz).

With his permission, I am publishing my CUDA implementation based on his code
under the open-source MIT license. See the LICENSE file for more details.

Please don't hesitate to contact me with any questions you may have.

For starters, run the benchmark.sh script to see how fast this code runs.
On an 8-core 2.4 GHz Intel Xeon E5620 machine with an NVIDIA Tesla C1060 card,
the CUDA implementation runs almost 50 times faster than the sequential version
on the color17695.bin data set (for k = 128)!

The original README, with some additions, is reproduced below.

Cheers!

Serban Giuroiu
http://serban.org

# ------------------------------------------------------------------------------

Parallel K-Means Data Clustering

The software package of parallel K-means data clustering contains the
following:

  * A parallel implementation using OpenMP and C
  * A parallel implementation using MPI and C
  * A parallel implementation using CUDA and C
  * A sequential version in C
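
For orientation, the clustering loop that all four versions compute can be
sketched in sequential C roughly as follows. This is a minimal sketch, not the
code in seq_kmeans.c. The names kmeans_sketch and dist2, the flat row-major
array layout, the choice of the first k points as initial centers, and the
stopping rule (stop once the fraction of points that changed cluster is at or
below the threshold, or after max_iter iterations) are all assumptions made
for illustration. The caller must ensure n >= k.

```c
#include <float.h>
#include <stdlib.h>

/* Hypothetical helper: squared Euclidean distance between two points. */
static float dist2(const float *a, const float *b, int dims)
{
    float sum = 0.0f;
    for (int d = 0; d < dims; d++) {
        float diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

/* Illustrative k-means loop (Lloyd's algorithm).
 *   points:     n * dims floats, one point after another (assumed layout)
 *   centers:    k * dims floats, filled in by this function
 *   membership: n ints, filled in by this function
 * Returns the number of iterations run, or -1 on allocation failure. */
int kmeans_sketch(const float *points, int n, int dims, int k,
                  float threshold, int max_iter,
                  float *centers, int *membership)
{
    float *sums = malloc((size_t)k * dims * sizeof *sums);
    int *counts = malloc((size_t)k * sizeof *counts);
    if (sums == NULL || counts == NULL) {
        free(sums);
        free(counts);
        return -1;
    }

    /* Assumption: seed the centers with the first k points. */
    for (int i = 0; i < k * dims; i++) centers[i] = points[i];
    for (int i = 0; i < n; i++) membership[i] = -1;

    int iter = 0;
    float changed_fraction;
    do {
        int changed = 0;
        for (int i = 0; i < k * dims; i++) sums[i] = 0.0f;
        for (int c = 0; c < k; c++) counts[c] = 0;

        /* Assignment step: attach each point to its nearest center. */
        for (int i = 0; i < n; i++) {
            const float *p = &points[(size_t)i * dims];
            int best = 0;
            float best_dist = FLT_MAX;
            for (int c = 0; c < k; c++) {
                float d2 = dist2(p, &centers[(size_t)c * dims], dims);
                if (d2 < best_dist) { best_dist = d2; best = c; }
            }
            if (membership[i] != best) { membership[i] = best; changed++; }
            counts[best]++;
            for (int d = 0; d < dims; d++)
                sums[(size_t)best * dims + d] += p[d];
        }

        /* Update step: move each non-empty center to the mean of its points. */
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) continue;
            for (int d = 0; d < dims; d++)
                centers[(size_t)c * dims + d] =
                    sums[(size_t)c * dims + d] / counts[c];
        }

        changed_fraction = (float)changed / n;
        iter++;
    } while (changed_fraction > threshold && iter < max_iter);

    free(sums);
    free(counts);
    return iter;
}
```

The assignment step is the part that the OpenMP, MPI, and CUDA versions have
the most to gain from parallelizing, since each point's nearest center can be
computed independently of every other point.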

To compile:
Although I used the Intel C compiler, icc, version 7.1 during code
development, no particular compiler features are required except for OpenMP.
Thus, the implementation should be fairly portable. Please modify the
Makefile to change the compiler if needed.

You will need the NVIDIA CUDA toolkit, which contains nvcc, to build the CUDA
version. It works fine in concert with gcc.

To run:
  * The Makefile will produce executables
     o "omp_main" for OpenMP version
     o "mpi_main" for MPI version
     o "cuda_main" for CUDA version
     o "seq_main" for sequential version

  * The list of available command-line arguments can be obtained by
    running any of the executables with the -h option
     o For example, running command "omp_main -h" will produce:
       Usage: main [switches] -i filename -n num_clusters
             -i filename    : file containing data to be clustered
             -b             : input file is in binary format (default no)
             -n num_clusters: number of clusters (K must > 1)
             -t threshold   : threshold value (default 0.0010)
             -p nproc       : number of threads (default system allocated)
             -a             : perform atomic OpenMP pragma (default no)
             -o             : output timing results (default no)
             -d             : enable debug mode

Input file format:
The executables read an input file that stores the data points to be 
clustered. A few example files are provided in the sub-directory 
./Image_data. The input files can be in two formats: ASCII text and raw 
binary.

  * ASCII text format:
    o Each line contains the coordinates of a single data point
    o The number of coordinates must be equal for all data points
  * Raw binary format:
    o The file begins with a header of two 4-byte integers.
    o The first integer must be the number of data points.
    o The second integer must be the number of coordinates per data point.
    o The rest of the file contains the coordinates of all data
      points, each coordinate stored as a 4-byte float.
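
The raw binary layout described above can be produced and consumed with a few
lines of C. The following is a minimal sketch under stated assumptions, not
the code in file_io.c: the function names write_binary_points and
read_binary_points are hypothetical, values are written in the machine's
native byte order, float is assumed to be 4 bytes, and the coordinates of each
point are assumed to be stored contiguously, one point after another.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical writer: two 4-byte integers (number of points, number of
 * coordinates), then all coordinates as floats. Returns 0 on success. */
int write_binary_points(const char *path, const float *coords,
                        int32_t num_points, int32_t num_coords)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;

    size_t total = (size_t)num_points * (size_t)num_coords;
    int ok = fwrite(&num_points, sizeof num_points, 1, f) == 1
          && fwrite(&num_coords, sizeof num_coords, 1, f) == 1
          && fwrite(coords, sizeof(float), total, f) == total;

    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical reader: returns a malloc'd array of
 * num_points * num_coords floats, or NULL on any error.
 * The caller is responsible for calling free() on the result. */
float *read_binary_points(const char *path,
                          int32_t *num_points, int32_t *num_coords)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;

    float *coords = NULL;
    if (fread(num_points, sizeof *num_points, 1, f) == 1 &&
        fread(num_coords, sizeof *num_coords, 1, f) == 1 &&
        *num_points > 0 && *num_coords > 0) {
        size_t total = (size_t)*num_points * (size_t)*num_coords;
        coords = malloc(total * sizeof *coords);
        /* Reject files that are shorter than the header promises. */
        if (coords != NULL &&
            fread(coords, sizeof(float), total, f) != total) {
            free(coords);
            coords = NULL;
        }
    }
    fclose(f);
    return coords;
}
```

A file written this way on one machine is only readable on another if both
agree on byte order, which is worth checking before moving data sets between
architectures.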

Output files: There are two output files:
  * Coordinates of cluster centers
    o The file name is the input file name appended with ".cluster_centres".
    o It is in ASCII text format.
    o Each line contains an integer indicating the cluster id and the
      coordinates of the cluster center.
  * Membership of all data points to the clusters
    o The file name is the input file name appended with ".membership".
    o It is in ASCII text format.
    o Each line contains two integers: the data point index (from 0 to
      the number of points minus 1) and the id of the cluster that the
      point belongs to.
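
As an illustration of the two output formats described above, here is a
minimal sketch of writers for both files. It is not the code shipped in this
package: the function names write_centers and write_membership are
hypothetical, and the single-space separator and the "%f" formatting of
coordinates are assumptions.

```c
#include <stdio.h>

/* Hypothetical writer for the cluster-centers file: one line per cluster,
 * holding the cluster id followed by that center's coordinates. */
int write_centers(const char *path, const float *centers, int k, int dims)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) return -1;
    for (int c = 0; c < k; c++) {
        fprintf(f, "%d", c);
        for (int d = 0; d < dims; d++)
            fprintf(f, " %f", centers[(size_t)c * dims + d]);
        fputc('\n', f);
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical writer for the membership file: one line per data point,
 * holding the point index and the id of the cluster it belongs to. */
int write_membership(const char *path, const int *membership, int num_points)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) return -1;
    for (int i = 0; i < num_points; i++)
        fprintf(f, "%d %d\n", i, membership[i]);
    return fclose(f) == 0 ? 0 : -1;
}
```

Because both files are plain ASCII with one record per line, they can be read
back with fscanf or inspected with ordinary text tools.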

Limitations:
    * Data type -- This implementation uses C float data type for all
      coordinates and other real numbers.
    * Large number of data points -- The number of data points cannot
      exceed 2G due to the 4-byte integers used in the programs. (But do
      let me know if support for more points is desired.)
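
A caller that generates its own input can guard against the size limitation
before building a data set. The check below is a minimal sketch and not part
of this package: the function name point_count_is_supported is hypothetical,
and it assumes the counts are held in signed 4-byte integers and that the
coordinates are held in one float array in memory.

```c
#include <stdint.h>

/* Hypothetical pre-flight check. Returns 1 if both counts fit in a signed
 * 4-byte integer and the total number of floats can be sized with size_t,
 * otherwise 0. */
int point_count_is_supported(long long num_points, long long num_coords)
{
    if (num_points <= 0 || num_coords <= 0) return 0;
    if (num_points > INT32_MAX || num_coords > INT32_MAX) return 0;

    /* Both factors are below 2^31, so the product fits in 64 bits. */
    unsigned long long total =
        (unsigned long long)num_points * (unsigned long long)num_coords;
    if (total > SIZE_MAX / sizeof(float)) return 0;

    return 1;
}
```

For example, 1000 points with 3 coordinates pass the check, while
3,000,000,000 points do not, because that count does not fit in a signed
4-byte integer.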


Wei-keng Liao (wkliao@ece.northwestern.edu)
EECS Department
Northwestern University

Sep. 17, 2005