GitHub - SimranPPatil/ClusteringResearch

Repository contents:
Results_Blobs
Results_Circles
Results_Moons
DBSCAN.py
DBSCAN_without_choice.py
KMeans.py
Library.py
README
Sklearn_dbscn.py
cache_statistics.py
make_blobs_data.txt
make_blobs_labels.txt
make_circles_data.txt
make_circles_labels.txt
make_moons_data.txt
make_moons_labels.txt

Clustering On The Fly - A Pursuit for the Optimal Algorithm
Our pipeline selects the best clustering algorithm at runtime, based on a cost model we have developed that accounts for both computation cost and memory cost.
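
The selection step described above can be sketched roughly as follows. This is only an illustration: the `CostEstimate` class, the weighted-sum `total_cost`, the weights, and the numbers are all hypothetical, and the README does not say how the actual cost model combines computation and memory cost.

```python
from dataclasses import dataclass


@dataclass
class CostEstimate:
    """Hypothetical per-algorithm cost estimate (arbitrary units)."""
    computation: float
    memory: float


def total_cost(est, w_compute=1.0, w_memory=1.0):
    # Assumption: the two costs are combined as a weighted sum.
    # The real cost model may combine them differently.
    return w_compute * est.computation + w_memory * est.memory


def select_algorithm(estimates, w_compute=1.0, w_memory=1.0):
    """Return the name of the candidate with the lowest combined cost."""
    if not estimates:
        raise ValueError("no candidate algorithms given")
    return min(
        estimates,
        key=lambda name: total_cost(estimates[name], w_compute, w_memory),
    )


# Made-up numbers, for illustration only.
candidates = {
    "KMeans": CostEstimate(computation=3.0, memory=1.0),
    "DBSCAN": CostEstimate(computation=2.0, memory=4.0),
}
best = select_algorithm(candidates)
print(best)
```

Changing the weights shifts the choice: with `w_memory=0.0` only computation cost counts, so the candidate with the lower computation estimate wins.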

Data-sets to run:
1. make_blobs_data.txt
2. make_moons_data.txt
3. make_circles_data.txt

Corresponding labels:
1. make_blobs_labels.txt
2. make_moons_labels.txt
3. make_circles_labels.txt
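
The file names suggest that these data-sets come from scikit-learn's `make_blobs`, `make_moons`, and `make_circles` generators, although the README does not state how they were produced. A minimal sketch of how comparable files could be generated is shown below. The sample count, noise level, number of centers, seed, and the whitespace-separated text format are all assumptions, and the `generate` and `save` helpers are hypothetical.

```python
import os

import numpy as np
from sklearn.datasets import make_blobs, make_circles, make_moons

# Assumed generator settings; the parameters used for the repository files are unknown.
GENERATORS = {
    "make_blobs": lambda n, seed: make_blobs(
        n_samples=n, centers=3, random_state=seed),
    "make_moons": lambda n, seed: make_moons(
        n_samples=n, noise=0.05, random_state=seed),
    "make_circles": lambda n, seed: make_circles(
        n_samples=n, noise=0.05, factor=0.5, random_state=seed),
}


def generate(name, n_samples=500, seed=0):
    """Return (points, labels) for one of the three synthetic data-sets."""
    X, y = GENERATORS[name](n_samples, seed)
    return X, y


def save(name, X, y, directory="."):
    """Write points and labels as plain text, one sample per line (assumed format)."""
    np.savetxt(os.path.join(directory, f"{name}_data.txt"), X)
    np.savetxt(os.path.join(directory, f"{name}_labels.txt"), y, fmt="%d")


X, y = generate("make_circles", n_samples=200)
print(X.shape, y.shape)
```

Calling `save` in the repository directory would overwrite the existing data files, so point `directory` somewhere else when experimenting.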

Commands to run:
python3 <pipeline script> <fraction of data-set> <data-set> <labels>
For example:
python3 Library.py 0.14 make_circles_data.txt make_circles_labels.txt
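
As a rough illustration of this command-line shape, the sketch below parses the three arguments and draws the requested fraction of samples. It is not taken from Library.py: the `parse_args` and `subsample` helpers are hypothetical, and treating the fraction as a random subsample (rather than, say, the first part of the file) is an assumption.

```python
import random


def parse_args(argv):
    """Parse <fraction> <data-set> <labels> from a list of argument strings."""
    if len(argv) != 3:
        raise SystemExit("usage: python3 <script> <fraction> <data-set> <labels>")
    fraction = float(argv[0])
    if not 0.0 < fraction <= 1.0:
        raise SystemExit("fraction must be in (0, 1]")
    return fraction, argv[1], argv[2]


def subsample(rows, labels, fraction, seed=0):
    """Pick a random fraction of rows, keeping each row paired with its label."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if not rows:
        return [], []
    k = max(1, int(len(rows) * fraction))
    # Sort the chosen indices so the sample keeps the original ordering.
    idx = sorted(random.Random(seed).sample(range(len(rows)), k))
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Same arguments as the example command above.
fraction, data_path, labels_path = parse_args(
    ["0.14", "make_circles_data.txt", "make_circles_labels.txt"]
)

# Stand-in data: 100 dummy samples with alternating labels.
rows = list(range(100))
labels = [r % 2 for r in rows]
sample_rows, sample_labels = subsample(rows, labels, fraction)
print(len(sample_rows))
```

In a real script the argument list would come from `sys.argv[1:]`, and the rows would be read from the data and label files.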

Installations Required:
1. Valgrind
2. scikit-learn (sklearn)
3. CACTI Tool

Files Present:
KMeans.py runs only the KMeans clustering algorithm
DBSCAN.py runs only the DBSCAN clustering algorithm
Library.py runs our pipeline
The Results_Blobs, Results_Circles, and Results_Moons folders hold the results of our runs