## K-Means Clustering

![main.jpg](attachment:main.jpg)

## Lecture House Keeping

- Disrespectful language is prohibited in the questions. This is a supportive learning environment for all, so please engage accordingly.
    - Please review the Code of Conduct (in the Student Undertaking Agreement) if unsure.
- No question is daft or silly - ask them!
- There are Q&A sessions midway and at the end of the session, should you wish to ask any follow-up questions.
- Should you have any questions after the lecture, please schedule a mentor session.
- For all non-academic questions, please submit a query: [www.hyperiondev.com/support](www.hyperiondev.com/support)

## Learning Objectives

1. Understand clustering and how it differs from classification.
2. Introduce the K-Means algorithm as a way to partition data points into clusters.
3. Walk through the algorithm step by step.
4. Choose K.

## Understanding Clustering

![dhdh.png](attachment:dhdh.png)

The main difference between classification and clustering in machine learning is that classification is a supervised learning task, while clustering is an unsupervised learning task.

**Supervised learning** tasks involve training a model on a labeled dataset, where each data point has a known output value. The model then learns to predict the output value for new data points based on the patterns it has learned from the training data.

**Unsupervised learning** tasks, on the other hand, involve training a model on an unlabeled dataset, where the data points do not have any known output values. The model must then learn to identify patterns in the data on its own.
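To make the contrast concrete, here is a minimal sketch using scikit-learn. The toy data, the label vector `y`, and the choice of `LogisticRegression` as the classifier are illustrative assumptions, not part of the lecture material. The point to notice is that the supervised model receives labels in `fit(X, y)`, while the clustering model receives only `X`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy feature matrix: six points in two dimensions (illustrative values)
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
              [5.0, 8.0], [8.0, 8.0], [9.0, 11.0]])

# Supervised: known output values y are passed to fit() with the features
y = np.array([0, 0, 0, 1, 1, 1])  # assumed labels, for illustration only
classifier = LogisticRegression().fit(X, y)
predicted = classifier.predict(X)

# Unsupervised: fit() sees only X and must infer the groups itself
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
groups = clusterer.labels_

print("Classifier predictions:", predicted)
print("Cluster assignments:   ", groups)
```

The cluster ids returned by the unsupervised model are arbitrary names for groups, so they need not match the values in `y` even when the grouping itself is the same.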

## How the model works

K-means clustering is an unsupervised machine learning algorithm that groups similar data points together. It is a centroid-based algorithm: each cluster is represented by a centroid, which is the average (mean) of all the data points in that cluster.

K-means clustering works by following these steps:

1. Choose the number of clusters, K. This is a hyperparameter that must be chosen by the user.
2. Randomly initialize the centroids.
3. Assign each data point to the cluster with the closest centroid.
4. Update the centroids to be the average of all the data points in each cluster.
5. Repeat steps 3 and 4 until the centroids no longer change.
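The steps above can be sketched in plain NumPy. This is a minimal illustration, not how scikit-learn implements K-means. The function name `kmeans_sketch`, the `max_iter` and `seed` parameters, and the choice to initialize centroids from randomly picked data points are all assumptions made for the example.

```python
import numpy as np

def kmeans_sketch(points, k, max_iter=100, seed=0):
    """Hypothetical helper: basic K-means on an (n, d) array of points."""
    rng = np.random.default_rng(seed)
    # Initialize: pick k distinct data points as the starting centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign: distance from every point to every centroid, shape (n, k)
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Update: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

points = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]])
labels, centroids = kmeans_sketch(points, k=2)
print("Labels:", labels)
print("Centroids:", centroids)
```

Which cluster is called 0 and which is called 1 depends on the random initialization, so only the grouping is meaningful, not the label numbers.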

### Euclidean Distance

K-means clustering uses Euclidean distance to measure the distance between data points and centroids. Euclidean distance is one of the most common distance metrics used in machine learning. For two points x and y with n features each, it is calculated as follows:

$$
d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2}
$$
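As a small worked example, the formula above translates directly into NumPy. The function name `euclidean_distance` and the sample point and centroid are made up for illustration.

```python
import numpy as np

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Square the per-feature differences, sum them, take the square root
    return np.sqrt(np.sum((x - y) ** 2))

point = [1, 2]      # illustrative data point
centroid = [4, 6]   # illustrative centroid

# sqrt((1 - 4)^2 + (2 - 6)^2) = sqrt(9 + 16) = 5.0
distance = euclidean_distance(point, centroid)
print(distance)  # prints 5.0
```

`np.linalg.norm(np.array(point) - np.array(centroid))` computes the same value and is the more usual shorthand.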

### Implementation

In [1]:
# Import the necessary libraries
from sklearn.cluster import KMeans
import numpy as np

# Generate some sample data for clustering
data = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]])

# Specify the number of clusters (K)
k = 2

# Create a KMeans instance with the desired number of clusters
kmeans = KMeans(n_clusters=k)

# Fit the KMeans model to the data
kmeans.fit(data)

# Get the cluster labels for each data point
labels = kmeans.labels_

# Get the coordinates of the cluster centers
centers = kmeans.cluster_centers_

# Print the cluster labels and centers
print("Cluster Labels:", labels)
print("Cluster Centers:", centers)


Cluster Labels: [1 0 1 0 1 0]
Cluster Centers: [[7.33333333 9.        ]
 [1.16666667 1.46666667]]
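As a quick check on the output above, each cluster center should be the mean of the points assigned to that cluster. The sketch below recomputes the centers by hand, with the labels copied from the printed output. The label numbering can differ between runs because the initialization is random, so treat the hard-coded `labels` array as an assumption tied to this particular run.

```python
import numpy as np

data = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]])

# Labels copied from the printed output of the run shown above
labels = np.array([1, 0, 1, 0, 1, 0])

# Average the member points of each cluster, in order of cluster id
recomputed = np.array([
    data[labels == cluster_id].mean(axis=0)
    for cluster_id in np.unique(labels)
])
print(recomputed)
```

Cluster 0 averages the points (5, 8), (8, 8), and (9, 11), and cluster 1 averages (1, 2), (1.5, 1.8), and (1, 0.6), which matches the centers reported by `kmeans.cluster_centers_`.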
