### K-means Clustering

K-means clustering is a popular and straightforward method used in data mining and statistical data analysis for partitioning a dataset into distinct, non-overlapping groups or clusters. 
Here's an overview of how K-means clustering works and its key features:

#### How K-means Clustering Works
1. Initialization:
    * Choose the number of clusters, 𝐾.
    * Randomly initialize 𝐾 centroids (the center points of the clusters).

2. Assignment Step:
    * Assign each data point to the nearest centroid based on a distance metric (typically Euclidean distance). This forms 𝐾 clusters.

3. Update Step:
    * Recalculate the centroids as the mean of all data points assigned to each cluster.

4. Repeat:
    * Repeat the assignment and update steps until the centroids no longer change significantly or a maximum number of iterations is reached.

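The four steps above can be sketched in plain Python with only the standard library. This is a minimal illustration of Lloyd's algorithm, not a reference implementation, and it makes a few assumptions for the sake of the example: points are tuples of floats, the starting centroids are K distinct data points chosen at random, and an empty cluster simply keeps its previous centroid. The function names and the small dataset are hypothetical.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Plain K-means (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = random.Random(seed)
    # 1. Initialization: use k distinct, randomly chosen data points as centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # 2. Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points]
        # 3. Update step: move each centroid to the mean of the points assigned to it
        #    (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centroids.append(mean(members) if members else centroids[j])
        # 4. Repeat until no centroid moves more than tol (or max_iter is reached).
        shift = max(euclidean(old, new) for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(data, k=2)
print(centroids, labels)
```

With two well-separated groups like these, the first three points and the last three points end up in different clusters; which group receives label 0 depends on the random start.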

#### Key Features and Considerations
* Number of Clusters (K): The number of clusters, 𝐾, must be specified beforehand. The appropriate K can be chosen with methods such as the elbow method or silhouette analysis.

* Distance Metric: K-means typically uses Euclidean distance to assign data points to clusters, but other distance metrics can be used depending on the context.

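* Scalability: K-means is computationally efficient: each iteration costs roughly O(n·K·d) for n points, K clusters and d dimensions, so it can handle large datasets. It may struggle with very high-dimensional data, where Euclidean distances become less informative.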

* Cluster Shape: K-means implicitly assumes clusters that are roughly spherical and of similar size, so it can perform poorly on elongated, irregularly shaped, or very unevenly sized clusters.

* Random Initialization: The algorithm's outcome can depend on the initial placement of centroids, so it is often run several times with different initializations, keeping the result with the lowest within-cluster sum of squares (WCSS, explained below).

* Speed and Efficiency: K-means is relatively fast, and the k-means++ initialization method, which spreads the initial centroids apart, usually leads to better clusters and faster convergence than purely random initialization.

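The k-means++ initialization mentioned above can be sketched as follows; its output could replace the purely random choice of starting centroids in step 1. Only the seeding step is shown, and the function names and sample data are hypothetical. The idea: pick the first centroid uniformly at random, then pick each further centroid with probability proportional to its squared distance from the nearest centroid already chosen, so that distant points are favoured.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_plus_plus_init(points, k, seed=0):
    """Choose k initial centroids with k-means++ seeding.

    The first centroid is a uniformly random data point. Each subsequent one is
    drawn with probability proportional to D(x)^2, the squared distance from x
    to its nearest already-chosen centroid.
    """
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid; fall back to a uniform pick.
            centroids.append(rng.choice(points))
        else:
            # Weighted draw: points far from the existing centroids are more likely.
            centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
print(kmeans_plus_plus_init(data, k=2))
```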
<img src="k-means-clustering.png" width="550">

<img src="k-means-clustering-1.webp" width="550">

The distance between a data point and a centroid is usually measured with Euclidean or Manhattan distance:

<img src="manhattan_euclidean-distance.png">

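Both distances take only a few lines of Python (the function names are illustrative). One caveat: the mean used in the update step is the point that minimizes squared Euclidean distance; if Manhattan distance is used instead, the coordinate-wise median is the natural centroid, which turns the algorithm into k-medians.

```python
import math


def euclidean_distance(p, q):
    """Straight-line (L2) distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan_distance(p, q):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


point, centroid = (1.0, 2.0), (4.0, 6.0)
print(euclidean_distance(point, centroid))  # 5.0, i.e. sqrt(3^2 + 4^2)
print(manhattan_distance(point, centroid))  # 7.0, i.e. 3 + 4
```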
----------------------------
### WCSS (Within-Cluster Sum of Squares)
##### How to find the optimal value for K?
----------------------------

<img src="wcss.png" width="680">

How many clusters does our dataset need? Maybe 3, 4, or 10. To decide, we need a quantifiable metric that tells us how well a given number of clusters performs.

One such metric is the Within-Cluster Sum of Squares (WCSS):

<img src="wcss-formula.webp">

where Yᵢ is the centroid of the cluster that observation Xᵢ belongs to, and n is the total number of observations.

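Written out with the notation above, the formula is:

$$\text{WCSS} = \sum_{i=1}^{n} \lVert X_i - Y_i \rVert^2$$

A direct translation into Python might look like the sketch below, assuming each point carries a cluster label that indexes into a list of centroids. The function name and the numbers are hypothetical.

```python
def wcss(points, centroids, labels):
    """Within-cluster sum of squares.

    points[i] is observation X_i and centroids[labels[i]] is its centroid Y_i;
    the squared Euclidean distance between the two is summed over all n points.
    """
    total = 0.0
    for x, label in zip(points, labels):
        y = centroids[label]
        total += sum((a - b) ** 2 for a, b in zip(x, y))
    return total


points = [(1.0, 1.0), (3.0, 1.0), (10.0, 10.0)]
centroids = [(2.0, 1.0), (10.0, 10.0)]
labels = [0, 0, 1]
print(wcss(points, centroids, labels))  # 2.0, i.e. 1 + 1 + 0
```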
***From the formula, we can see that as the number of clusters increases, points tend to lie closer to their centroids, so the WCSS decreases.***

#### So, how far does it keep decreasing?

Suppose we have as many clusters as data points. Then every point forms its own cluster, its centroid sits exactly on the point, the distance between each point and its centroid is 0, and hence the WCSS is 0.

In other words, the higher the number of clusters, the lower the WCSS, so simply minimizing WCSS cannot tell us the right K.

To find the optimal number of clusters, we use the Elbow method, which is based on the WCSS metric.

### Let’s understand the Elbow method.

The Elbow method runs K-means clustering on the dataset for a range of values of K (say, 1 to 10), computes the total WCSS for each K, and then plots WCSS against K.

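A minimal sketch of the Elbow method in plain Python, under stated assumptions: the dataset is made up for illustration, K ranges from 1 to the number of points, and for each K the algorithm is restarted 10 times from random starting centroids and the lowest WCSS is kept (the multiple-initialization idea from the key features above). The function names are hypothetical, and plotting is left out; the printed values are what would go on the y-axis. Note that K equal to the number of points gives a WCSS of 0, as argued above.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def best_wcss(points, k, n_init=10, max_iter=100, seed=0):
    """Run K-means n_init times from random starts and return the lowest WCSS."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_init):
        centroids = rng.sample(points, k)
        for _ in range(max_iter):
            # Assignment step: nearest-centroid index for every point.
            labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
            # Update step: cluster means (an empty cluster keeps its old centroid).
            new = []
            for j in range(k):
                members = [p for p, label in zip(points, labels) if label == j]
                if members:
                    new.append(tuple(sum(col) / len(members) for col in zip(*members)))
                else:
                    new.append(centroids[j])
            if new == centroids:  # converged: the centroids stopped moving
                break
            centroids = new
        # WCSS: squared distance from each point to its nearest final centroid.
        score = sum(min(sq_dist(p, c) for c in centroids) for p in points)
        best = min(best, score)
    return best


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
        (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
        (1.0, 9.0), (1.5, 8.5), (2.0, 9.5)]

# Elbow method: WCSS for K = 1 .. 9; plot these values against K to look for the elbow.
ks = list(range(1, len(data) + 1))
wcss_values = [best_wcss(data, k) for k in ks]
for k, w in zip(ks, wcss_values):
    print(f"K={k}: WCSS={w:.3f}")
```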
<img src="wcss-plot.webp">

To select the optimal K, we look for the point after which adding more clusters no longer reduces the WCSS much; this bend in the curve is the elbow point. In the graph above, the elbow is at K = 5, so the optimal number of clusters for this example is 5.

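If a more repeatable choice is wanted, one common heuristic (not part of the method described here) is to rescale both axes to [0, 1] and pick the K whose point lies farthest from the straight line joining the first and last points of the WCSS curve. The sketch below assumes the K values are in ascending order; the function name is hypothetical, the WCSS numbers in the example are made up, and the heuristic only supports the judgement call rather than replacing it.

```python
import math


def elbow_by_max_distance(ks, wcss_values):
    """Heuristic elbow finder: after rescaling both axes to [0, 1], return the K
    whose (K, WCSS) point lies farthest from the straight line joining the first
    and last points of the curve."""
    k_lo, k_hi = min(ks), max(ks)
    w_lo, w_hi = min(wcss_values), max(wcss_values)
    if k_hi == k_lo or w_hi == w_lo:
        return ks[0]  # single-point or flat curve: there is no elbow to find
    xs = [(k - k_lo) / (k_hi - k_lo) for k in ks]
    ys = [(w - w_lo) / (w_hi - w_lo) for w in wcss_values]
    x1, y1, x2, y2 = xs[0], ys[0], xs[-1], ys[-1]
    line_len = math.hypot(x2 - x1, y2 - y1)
    # Perpendicular distance from each rescaled point to the endpoint-to-endpoint line.
    distances = [
        abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / line_len
        for x, y in zip(xs, ys)
    ]
    return ks[distances.index(max(distances))]


ks = list(range(1, 9))
wcss_values = [100.0, 50.0, 25.0, 20.0, 17.0, 15.0, 14.0, 13.0]  # illustrative numbers
print(elbow_by_max_distance(ks, wcss_values))  # 3
```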
As we can see, this method is somewhat arbitrary: one person might pick K = 5, while someone else might pick 4 or 6. This is a judgement call we need to make as data scientists.