Wine_Clustering_KMeans

This repo contains a simple K-means clustering of the well-known Wine dataset. The wines are grouped on the basis of 13 attributes, so Principal Component Analysis (PCA) is used as a dimensionality reduction method to reduce the attributes to 2 components. This makes the data easy to visualize in a 2D plot.
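The reduction from 13 attributes to 2 components can be sketched as follows. This is a minimal illustration, not the code in WineClustering.py: it assumes scikit-learn is installed (it is not listed under Environment), uses scikit-learn's bundled copy of the Wine data as a stand-in for the Kaggle CSV, and standardizes the attributes before PCA, which the repo may or may not do.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled Wine data stands in for the Kaggle CSV.
# Only the attribute matrix is used; the class labels are ignored.
X = load_wine().data  # 178 rows, 13 attributes

# Assumption: attributes are standardized so that large-valued
# columns do not dominate the principal components.
X_scaled = StandardScaler().fit_transform(X)

# Project the 13 attributes onto the first 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)
print(pca.explained_variance_ratio_)  # share of variance kept per component
```

The two columns of `X_2d` can then be used as the x and y coordinates of a scatter plot.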

Dataset

The Wine dataset is taken from Kaggle. The wine-type column was removed so that the data can be used for clustering. It contains a total of 13 columns, the attributes on the basis of which each wine can be grouped. The data was collected for three different kinds of wine, and the K-means clustering here arrives at the same number of groups. There are 178 wine entries (rows) in total.

Environment

Ubuntu 20.04
Python 3.8.5
Numpy 1.19.4
Pandas 1.1.4
Matplotlib 3.1.2

Hyperparameter tuning (for the optimum number of clusters) is done on the basis of silhouette scores.
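One way to pick the number of clusters from silhouette scores is sketched below. It is an illustration under assumptions rather than the repo's actual script: scikit-learn is assumed to be available, its bundled Wine data stands in for the Kaggle CSV, clustering is run on the 2 PCA components, and the candidate range of 2 to 10 clusters is an arbitrary choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Assumption: bundled Wine data, standardized and reduced to 2 components.
X = StandardScaler().fit_transform(load_wine().data)
X_2d = PCA(n_components=2).fit_transform(X)

# Fit K-means for each candidate k and record the mean silhouette score.
# Assumption: the candidate range 2..10 is chosen only for illustration.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_2d)
    scores[k] = silhouette_score(X_2d, labels)

# The k with the highest score is taken as the number of clusters.
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(k, round(score, 3))
print("best k:", best_k)
```

The silhouette score lies between -1 and 1, and higher values indicate tighter, better-separated clusters, so plotting `scores` against `k` gives a curve whose peak marks the preferred cluster count.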

Final output after KMeans clustering

Shivangi0503/Wine_Clustering_KMeans
