# Python code for K-Means Clustering 
---

## Author : Amir Atapour-Abarghouei, amir.atapour-abarghouei@newcastle.ac.uk

This notebook provides a simple example of the k-means algorithm applied to two-dimensional data.

Copyright (c) 2021 School of Computing, Newcastle University, UK.

License : LGPL - http://www.gnu.org/licenses/lgpl.html

We will generate synthetic two-dimensional data with known clusters, group it with scikit-learn's `KMeans`, and plot the resulting clusters and their centroids.
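Before turning to scikit-learn, it may help to see what the algorithm does. Below is a minimal NumPy sketch of Lloyd's iteration, which alternates an assignment step (each point joins its nearest centroid) and an update step (each centroid moves to the mean of its points) until the centroids stop moving. It assumes random initialisation from the data points and a fixed iteration cap. `simple_kmeans` is a hypothetical helper, not part of any library, and scikit-learn's `KMeans` adds refinements such as k-means++ initialisation and multiple restarts.

```python
import numpy as np


def _assign(x, centers):
    # squared Euclidean distance from every point to every centroid, shape (n, k)
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def simple_kmeans(x, k, n_iter=100, seed=0):
    """Hypothetical minimal Lloyd's algorithm; returns (labels, centers)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # initialise the centroids as k distinct data points chosen at random
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(x, centers)  # assignment step
        # update step: mean of each cluster (keep the old centroid if a cluster is empty)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # centroids stopped moving
            break
        centers = new_centers
    # final assignment so the labels match the returned centroids
    return _assign(x, centers), centers


# usage: two well-separated groups of two points each
pts = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers = simple_kmeans(pts, k=2)
print(labels, centers)
```

Which integer each group receives depends on the random initialisation, but the two groups end up in separate clusters with centroids at (0, 0.5) and (10, 10.5).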

First, we will import all the necessary packages.

In [None]:
# sklearn for the k-means algorithm:
import sklearn
from sklearn.cluster import KMeans
print(f"sklearn version: {sklearn.__version__}")

# synthetic dataset generator from sklearn
# (the old sklearn.datasets.samples_generator module was removed in scikit-learn 0.24):
from sklearn.datasets import make_blobs

# matplotlib and seaborn for plotting:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

# suppress warnings to keep the output tidy (note: this also hides deprecation warnings):
import warnings
warnings.filterwarnings("ignore")

Next, we set the parameters used to generate the synthetic data:

In [None]:
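# parameters for generating data points:
num_data_points = 300  # total number of points
num_clusters = 3       # number of clusters (blobs) to generate, also used as k for k-means
cluster_std = 2        # standard deviation of each cluster around its centre
seed = 1234            # random seed for reproducibility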

print('Done!')

We can now generate our data for clustering:

In [None]:
# generate two-dimensional data points grouped into clusters:
# x: array of shape (num_data_points, 2) with the coordinates of each point
# gt_y: ground-truth cluster label of each point
x, gt_y = make_blobs(n_samples=num_data_points, centers=num_clusters, cluster_std=cluster_std, random_state=seed)

# plot the generated data points:
plt.scatter(x[:, 0], x[:, 1], s=25);
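For intuition about `num_clusters` and `cluster_std`: according to the scikit-learn documentation, when `centers` is an integer, `make_blobs` draws the centres uniformly from a box (default `center_box=(-10.0, 10.0)`) and samples each point from an isotropic Gaussian with standard deviation `cluster_std` around its centre. The function below is a rough NumPy imitation of that process, written only for illustration. `blobs_sketch` is hypothetical, and because it uses a different random stream it will not reproduce the exact points `make_blobs` returns for the same seed.

```python
import numpy as np


def blobs_sketch(n_samples=300, n_centers=3, cluster_std=2.0, seed=1234,
                 center_box=(-10.0, 10.0)):
    """Hypothetical imitation of make_blobs for 2-D data; returns (points, labels, centers)."""
    rng = np.random.default_rng(seed)
    # cluster centres drawn uniformly from the box
    centers = rng.uniform(center_box[0], center_box[1], size=(n_centers, 2))
    # split the samples as evenly as possible, remainder going to the first clusters
    counts = np.full(n_centers, n_samples // n_centers)
    counts[: n_samples % n_centers] += 1
    labels = np.repeat(np.arange(n_centers), counts)
    # each point = its centre + isotropic Gaussian noise with std cluster_std
    points = centers[labels] + rng.normal(scale=cluster_std, size=(n_samples, 2))
    # shuffle so the points are not ordered by cluster
    order = rng.permutation(n_samples)
    return points[order], labels[order], centers


pts, labels, centers = blobs_sketch()
print(pts.shape, np.bincount(labels))  # (300, 2) [100 100 100]
```

Increasing `cluster_std` widens each blob, so neighbouring clusters overlap more and become harder for k-means to separate.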

In [None]:
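# k-means with k = num_clusters; random_state fixes the centroid initialisation so the result is
# reproducible, and n_init=10 runs 10 initialisations and keeps the one with the lowest inertia:
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=seed)
kmeans.fit(x)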

# clusters predicted via k-means:
pred_y = kmeans.predict(x)

print('Done!')
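What `KMeans` minimises is the within-cluster sum of squared distances, $J = \sum_i \lVert x_i - \mu_{c_i} \rVert^2$, where $c_i$ is the cluster assigned to point $x_i$ and $\mu_j$ is the centroid of cluster $j$; scikit-learn reports this value as `kmeans.inertia_` after fitting. The small helper below (the name `kmeans_objective` is hypothetical) computes $J$ directly, so for the fitted model `kmeans_objective(x, pred_y, kmeans.cluster_centers_)` would be expected to agree closely with `kmeans.inertia_`.

```python
import numpy as np


def kmeans_objective(x, labels, centers):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    diffs = x - centers[np.asarray(labels)]  # vector from each point to its own centroid
    return float((diffs ** 2).sum())


# toy check: every point is 0.5 away from its centroid, so J = 4 * 0.25 = 1.0
pts = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
j = kmeans_objective(pts, [0, 0, 1, 1], [[0.0, 0.5], [10.0, 10.5]])
print(j)  # 1.0
```

Moving either centroid away from its cluster mean can only increase $J$, which is why the update step of k-means places each centroid at the mean of its points.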

The data is now clustered with k-means. Let's plot the points coloured by their predicted cluster, together with the cluster centroids.

In [None]:
# plot the points with predicted cluster labels:
plt.scatter(x[:, 0], x[:, 1], c=pred_y, s=25, cmap='viridis')

# plot the cluster centroids:
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='blue', marker="X", s=200, alpha=1);
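Because k-means numbers its clusters arbitrarily, `pred_y` can group the points exactly like `gt_y` and still use different integers (for example, predicted cluster 2 may correspond to ground-truth cluster 0). One simple way to score the clustering against the ground truth is to try every relabelling of the predicted ids and keep the best accuracy. The helper `best_match_accuracy` below is a hypothetical sketch of that idea; it enumerates all k! permutations, so it is only practical for small k such as the three clusters here.

```python
from itertools import permutations

import numpy as np


def best_match_accuracy(true_labels, pred_labels, k):
    """Fraction of points correctly clustered under the best relabelling of predicted ids."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    best = 0.0
    # perm[p] is the ground-truth id assigned to predicted id p; try all k! mappings
    for perm in permutations(range(k)):
        mapped = np.array(perm)[pred_labels]
        best = max(best, float((mapped == true_labels).mean()))
    return best


# same grouping, different cluster numbers -> perfect score
print(best_match_accuracy([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1], 3))  # 1.0
```

In this notebook it could be called as `best_match_accuracy(gt_y, pred_y, num_clusters)`. scikit-learn's `sklearn.metrics.adjusted_rand_score(gt_y, pred_y)` is a permutation-invariant alternative that avoids enumerating relabellings.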