Clustering is an unsupervised learning process. In a classification problem, X1, X2, ..., Xn are predictor variables and Y is a response variable; in a clustering problem there is no response variable. The objective is to find patterns in the data such that the observations within each cluster (or group) are as homogeneous as possible. Clustering can also serve as a pre-processing technique.
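To make the "no response variable" point concrete, here is a minimal sketch of k-means in plain Python. K-means is only one possible clustering algorithm and is my choice for illustration; the function name `kmeans`, the toy data `X`, and the parameters `k`, `iterations` and `seed` are all assumptions made for this example. Note that only the predictors are passed in and no Y appears anywhere.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on a list of equal-length tuples.

    Returns (centroids, labels). Illustrative only: no convergence
    check, and empty clusters simply keep their previous centroid.
    """
    rng = random.Random(seed)
    # Start from k distinct observations chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Toy data: two well-separated groups, with no response variable Y.
X = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
centroids, labels = kmeans(X, k=2)
print(labels)
```

The label numbers themselves are arbitrary; what matters is which observations end up sharing a label.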
In a classification problem, we can validate predictions against the known response variable, whereas in clustering we cannot. However, some techniques, such as Principal Component Analysis (PCA) used for visualization, can give an indication of whether a given dataset is clusterable.
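As a rough sketch of the PCA idea, the snippet below projects a small dataset onto its first principal component and prints the resulting scores; a visible gap between groups of scores hints that the data may be clusterable. To stay dependency-free it uses power iteration in place of a library PCA routine, and it prints numbers instead of plotting. The function name `first_principal_component` and the toy data `rows` are assumptions made for this example, and a gap in the projection is an informal hint, not proof.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (direction, scores) for the first principal component.

    Uses power iteration on the sample covariance matrix. Assumes the
    data has non-zero variance and that the all-ones starting vector
    is not orthogonal to the leading direction.
    """
    n, d = len(rows), len(rows[0])
    # Centre each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by cov and renormalise.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Score of each observation along the leading direction.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores


# Toy 3-D data containing two groups of three observations each.
rows = [
    (1.0, 1.1, 0.9), (0.9, 1.0, 1.2), (1.2, 0.8, 1.0),
    (8.0, 8.1, 7.9), (7.9, 8.3, 8.1), (8.2, 7.9, 8.0),
]
direction, scores = first_principal_component(rows)
print(sorted(scores))
```

With real data one would typically plot the first two components; the one-dimensional scores here are just the simplest version of the same check.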
In this repository, I attempt to discuss and analyse clustering.