The KMeans algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the inertia, or within-cluster sum-of-squares. This algorithm requires the number of clusters to be specified. It scales well to a large number of samples and has been used across a wide range of application areas in many different fields.
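To make the inertia criterion concrete, here is a minimal single-process sketch of the classic Lloyd iteration in pure Python. It is not Paracel's distributed implementation; the function names (`squared_distance`, `kmeans`) and the random-sample initialisation are assumptions chosen for illustration, and `rounds` is assumed to be at least 1.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rounds=100, seed=0):
    """Plain Lloyd iteration; returns (centers, assignments, inertia).

    Illustrative only: assumes rounds >= 1 and len(points) >= k.
    """
    rng = random.Random(seed)
    # Initialise the centers with k input points picked at random.
    centers = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(rounds):
        # Assignment step: each point joins its nearest center.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        converged = new_assignments == assignments
        assignments = new_assignments
        if converged:
            break
    # Inertia: within-cluster sum of squared distances to the centers.
    inertia = sum(
        squared_distance(p, centers[a]) for p, a in zip(points, assignments)
    )
    return centers, assignments, inertia
```

Calling `kmeans(points, k=2)` on a list of coordinate tuples returns the final centers, one cluster index per point, and the inertia that the iteration has been reducing.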
- Enter Paracel's home directory
- Generate a training dataset (datagen.py will also create a result data file named
python ./tool/datagen.py -m kmeans -o data.csv -n 1000 -k 20 --ncenters 50
- Set up link library path:
- Create a JSON file named cfg.json; see the example in the Parameters section below.
- Run (4 workers, local mode in the following example)
./prun.py -w 4 -p 1 -c cfg.json -m local your_paracel_install_path/bin/kmeans
Default parameters are set in a JSON-format file. For example, we create a cfg.json as below (modify
{
    "input" : "data.csv",
    "output" : "./kmeans_result/",
    "type" : "fvec",
    "kclusters" : 50,
    "update_file" : "your_paracel_install_path/lib/libkmeans_update.so",
    "update_functions" : ["local_update_kmeans_clusters", "local_update_kmeans_groups"],
    "rounds" : 100
}
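If you prefer to generate the configuration instead of editing it by hand, a small script along these lines could write cfg.json. This is only a sketch: the helper names (`build_kmeans_config`, `write_config`) are hypothetical, the keys and values are copied from the example above, and `your_paracel_install_path` is kept as a placeholder that you must replace with your real install path.

```python
import json

# Placeholder taken from the example configuration; replace it with the
# directory where Paracel is actually installed.
PARACEL_HOME = "your_paracel_install_path"


def build_kmeans_config(input_path="data.csv", kclusters=50, rounds=100):
    """Return the kmeans configuration as a dict (hypothetical helper)."""
    return {
        "input": input_path,
        "output": "./kmeans_result/",
        "type": "fvec",
        "kclusters": kclusters,
        "update_file": PARACEL_HOME + "/lib/libkmeans_update.so",
        "update_functions": [
            "local_update_kmeans_clusters",
            "local_update_kmeans_groups",
        ],
        "rounds": rounds,
    }


def write_config(path="cfg.json", **overrides):
    """Serialise the configuration to `path` and return the dict written."""
    cfg = build_kmeans_config(**overrides)
    with open(path, "w") as fh:
        json.dump(cfg, fh, indent=4)
    return cfg
```

For example, `write_config("cfg.json", kclusters=20)` writes the same file with a different cluster count.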
In the configuration file, kclusters is the number of clusters to partition the data into, update_functions lists the registered update functions that the kmeans algorithm needs, and rounds is the number of training iterations.
centers_0: the center coordinates of each group
kmeans_0: the points belonging to each group
You can use the label data generated by datagen.py to evaluate the clustering result.
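One simple way to make that comparison is cluster purity: for each cluster, count the most common true label, then divide the sum of those counts by the number of points. The sketch below assumes you have already loaded the true labels and the cluster assignments into two equal-length Python lists. The file formats of the datagen.py labels and of kmeans_0 are not described here, so loading is left out, and the function name `purity` is illustrative.

```python
from collections import Counter, defaultdict


def purity(true_labels, cluster_ids):
    """Fraction of points whose cluster's majority label matches their own."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")
    if not true_labels:
        raise ValueError("at least one point is required")
    # Count how often each true label occurs inside each cluster.
    by_cluster = defaultdict(Counter)
    for label, cid in zip(true_labels, cluster_ids):
        by_cluster[cid][label] += 1
    # Sum the size of the majority label in every cluster.
    majority_total = sum(
        counts.most_common(1)[0][1] for counts in by_cluster.values()
    )
    return majority_total / len(true_labels)
```

A purity of 1.0 means every cluster contains a single true label. Because the score ignores the cluster numbering, it works even though the cluster indices will not match the generated label values.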