## Clustering with K-Means
Untangle complex spatial relationships with cluster labels.

### Using Cluster Labels as Features
Applied to a single real-valued feature, clustering acts like a binning (discretization) transform: each value is mapped to the label of the cluster it falls into.
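To make the binning view concrete, here is a minimal sketch using only the standard library. The centroid values are hypothetical (picked by hand, not fitted to any data), and the helper names `nearest_centroid` and `bin_index` are made up for illustration. It shows that, in one dimension, assigning a value to its nearest centroid gives the same label as binning with edges placed halfway between neighbouring centroids.

```python
from bisect import bisect_left

# Hypothetical, sorted 1-D centroids, e.g. as k-means might find on one feature.
centroids = [10.0, 50.0, 90.0]

# Bin edges sit at the midpoints between neighbouring centroids.
edges = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]  # [30.0, 70.0]

def nearest_centroid(x):
    """Cluster label: index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))

def bin_index(x):
    """Bin label: number of edges lying strictly below x."""
    return bisect_left(edges, x)

values = [3.0, 28.0, 31.0, 55.0, 69.0, 88.0, 120.0]
cluster_labels = [nearest_centroid(x) for x in values]
bin_labels = [bin_index(x) for x in values]

print(cluster_labels)  # [0, 0, 1, 1, 1, 2, 2]
print(bin_labels)      # [0, 0, 1, 1, 1, 2, 2]
```

The difference from fixed-width binning is that the edges come from where the data concentrates rather than from an evenly spaced grid.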

The motivating idea for adding cluster labels is that the clusters will break up complicated relationships across features into simpler chunks. Our model can then learn the simpler chunks one by one instead of having to learn the complicated whole all at once. It's a "divide and conquer" strategy.

![image.png](attachment:image.png)

### K-Means Clustering
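K-means measures similarity using ordinary straight-line (Euclidean) distance. It creates clusters by placing a number of points, called `centroids`, inside the feature space. Each point in the dataset is assigned to the cluster of whichever centroid it is closest to. The `K` in k-means is the number of centroids, and therefore the number of clusters, it creates.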

![image.png](attachment:image.png)
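The assign-then-update loop behind this can be sketched in plain Python. This is a toy illustration, not scikit-learn's implementation: the function name `kmeans_sketch`, the sample points, and the hand-picked starting centroids are all assumptions made for the example (real implementations usually choose the starting centroids randomly and repeat the fit several times).

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_sketch(points, initial_centroids, n_iter=10):
    """Toy k-means: returns (centroids, labels) after n_iter rounds."""
    centroids = [list(c) for c in initial_centroids]
    k = len(centroids)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels

# Two well-separated groups of made-up 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]

# Start with one centroid in each group (an assumption for this sketch).
centroids, labels = kmeans_sketch(points, initial_centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each round can only keep or reduce the total squared distance from points to their centroids, which is why the loop settles down after a few iterations on data like this.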

### Example - California Housing
As spatial features, California Housing's 'Latitude' and 'Longitude' make natural candidates for k-means clustering. In this example we'll cluster these with 'MedInc' (median income) to create economic segments in different regions of California.

```python
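from sklearn.cluster import KMeans

# Create K-Means features
# (X is assumed to hold the "MedInc", "Latitude", and "Longitude" columns of df)
kmeans = KMeans(n_clusters=6)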
X["Cluster"] = kmeans.fit_predict(X)
X["Cluster"] = X["Cluster"].astype("category")

X.head()
```

We can then plot the clusters on the map to see how they are distributed geographically:
```python
import seaborn as sns

# Scatter the districts by location, colored by cluster label
sns.relplot(
    x="Longitude", y="Latitude", hue="Cluster", data=X, height=6
);
```
![image.png](attachment:image.png)

The following boxen plots show the distribution of the target, `MedHouseVal` (median house value), within each cluster. If the clustering is informative, these distributions should, for the most part, separate across `MedHouseVal`, which is indeed what we see.

```python
X["MedHouseVal"] = df["MedHouseVal"]
sns.catplot(x="MedHouseVal", y="Cluster", data=X, kind="boxen", height=6)
```

![image.png](attachment:image.png)