GitHub - chanioxaris/polygonal-curves: Collection of clustering algorithms for polygonal curves.

Overview

This is an implementation of clustering algorithms for polygonal curves. User can choose between the below methods:

Initialization

K-means++
Random selection of k-points - (Simplest)

Assignment

Lloyd’s assignment - (Simplest)
Assignment by Range search (LSH)

Update

Mean Discrete Frechet curve
Partitioning Around Medoids (PAM) – (Improved)

Input data files

Input file

The format of input text file, described by the following structure:

@dimension {2,3,4} 
curve_id1 m1 (x11,y11) (x12,y12) (x13,y13) ... (x1m1,y1m1)
curve_id2 m2 (x21,y21) (x22,y22) (x23,y23) ... (x2m2,y2m2)
curve_id3 m3 (x31,y31) (x32,y32) (x33,y33) ... (x3m3,y3m3)
...
curve_idN mN (xN1,yN1) (xN2,yN2) (xN3,yN3) ... (xNmN,yNmN)

where mi the total points included in i curve, (xij,yij) the coordinates of point j in i curve, when dimesion is equal to 2

Configuration file

The format of configuration file, described by the following structure:

number_of_clusters: <int> 
number_of_grid_curves: <int> 
number_of_hash_tables: <int>

Output data files

Output file

The format of output text file, described by the following structure:

Algorithm: Ι<x>A<y>U<z>
Metric: Frechet or DTW
CLUSTER-1 {size: <int>, centroid: <curve_id>}
...
CLUSTER-k {size: <int>, centroid: <curve_id>}
clustering_time: <double> 
Silhouette: [s1,...,sk,stotal]

where x the chosen Initialization method (1 or 2), y the chosen Assignment method (1 or 2), z the chosen Update method (1 or 2)

Output file (-complete)

If the user choose -complete flag then more information become available about the clusters:

CLUSTER-1 {curve_idA, curve_idB, ..., curve_idC}
CLUSTER-2 {curve_idA, curve_idB, ..., curve_idC}
...
CLUSTER-k {curve_idR, curve_idT, ..., curve_idZ}

Evaluation (Silhouette)

Silhouette refers to a method of interpretation and validation of consistency within clusters of data. The technique provides a succinct graphical representation of how well each object lies within its cluster. The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). The silhouette ranges from −1 to +1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. If most objects have a high value, then the clustering configuration is appropriate. If many points have a low or negative value, then the clustering configuration may have too many or too few clusters.

Compile

./makefile

Usage

./clustering –i [input file] –c [configuration file] -ο [output file] –d {Frechet, DTW} -complete (optional)

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
img		img
input		input
src		src
.gitattributes		.gitattributes
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Initialization

Assignment

Update

Input data files

Input file

Configuration file

Output data files

Output file

Output file (-complete)

Evaluation (Silhouette)

Compile

Usage

About

Releases

Packages

Languages

License

chanioxaris/polygonal-curves

Folders and files

Latest commit

History

Repository files navigation

Overview

Initialization

Assignment

Update

Input data files

Input file

Configuration file

Output data files

Output file

Output file (-complete)

Evaluation (Silhouette)

Compile

Usage

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages