This repo contains a simple K-means clustering of the well-known Wine dataset. Principal Component Analysis (PCA) is used for dimensionality reduction.

Shivangi0503/Wine_Clustering_KMeans

Wine_Clustering_KMeans

This repo contains a simple K-means clustering of the well-known Wine dataset. The wines are grouped on the basis of 13 attributes, so Principal Component Analysis (PCA) is used as a dimensionality reduction method to reduce the attributes to 2. This makes the data easy to visualize in a 2D plot.
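The reduction from 13 attributes to 2 can be sketched with plain NumPy. This is a minimal illustration, not the code from this repo: the function name `pca_2d` is hypothetical, the standardization step is an assumption (the README does not say whether the attributes were scaled), and the input is a synthetic stand-in for the real data.

```python
import numpy as np

def pca_2d(X, n_components=2):
    """Project the rows of X onto the top principal components."""
    X = np.asarray(X, dtype=float)
    # Standardize each attribute so that large-scale columns do not dominate.
    # (Assumption: the README does not state whether scaling was applied.)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    return Z @ Vt[:n_components].T, explained[:n_components]

# Synthetic stand-in with the same shape as the Wine data (178 rows, 13 columns).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(178, 13))

X_2d, ratio = pca_2d(X_demo)
print(X_2d.shape)  # (178, 2)
print("explained variance ratio:", ratio)
```

The explained variance ratio indicates how much of the original spread the two components retain, which is worth checking before trusting a 2D plot.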

Dataset

The Wine dataset is taken from Kaggle. The wine-type column was removed so that the data can be used for clustering. It contains 13 columns, the attributes on the basis of which each wine can be grouped. The data was collected for three different kinds of wine, and the K-means clustering recovered three groups, consistent with that. There are 178 wine entries (rows) in total.

Environment

Ubuntu 20.04
Python 3.8.5
Numpy 1.19.4
Pandas 1.1.4
Matplotlib 3.1.2

Hyperparameter tuning (choosing the optimal number of clusters) is done on the basis of silhouette scores.
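Choosing the number of clusters by silhouette score might look roughly like the sketch below. It is an illustration under stated assumptions, not this repo's implementation: all function names are hypothetical, the K-means here uses a deterministic farthest-point initialization (swapped in so the example is reproducible), and the data are three synthetic 2D blobs rather than the PCA-reduced Wine data.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly add the point
    farthest from the centers chosen so far. (Assumption, for reproducibility.)"""
    centers = [X[0]]
    for _ in range(k - 1):
        c = np.array(centers)
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2).min(axis=1)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations; returns one cluster label per row of X."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster became empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def silhouette_score(X, labels):
    """Mean of (b - a) / max(a, b) over all points, where a is the mean
    distance to the point's own cluster and b to the nearest other cluster."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # a singleton cluster gets silhouette 0
        a = D[i, own].sum() / (n_own - 1)  # D[i, i] is 0, so self is excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

def best_k(X, k_values):
    """Score every candidate k and return the one with the highest silhouette."""
    scores = {k: silhouette_score(X, kmeans(X, k)) for k in k_values}
    return max(scores, key=scores.get), scores

# Three well-separated synthetic blobs (not the Wine data).
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
X_demo = np.vstack([c + rng.normal(size=(60, 2)) for c in blob_centers])

k_best, scores = best_k(X_demo, range(2, 7))
for k, s in scores.items():
    print(k, round(float(s), 3))
print("best k:", k_best)
```

A silhouette score close to 1 means points sit much nearer to their own cluster than to the next one, so the k with the highest mean score is taken as the number of clusters.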

Final output after KMeans clustering
