# Unsupervised Machine Learning Tutorials

Run the cell below if you need to install the required libraries. In Google Colab they come pre-installed.


In [None]:
!pip install scikit-learn matplotlib

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris, load_digits
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

## k-means clustering

k-means is an iterative clustering algorithm that partitions data into a specified number of clusters, $k$. The algorithm begins by initializing $k$ cluster centers (centroids), for example by picking $k$ random data points; scikit-learn's `KMeans` defaults to the k-means++ scheme, which spreads the initial centroids apart. Each data point is then assigned to the nearest centroid by Euclidean distance, forming clusters.

After assignment, the centroids are updated by computing the mean of all points assigned to each cluster. This process of assignment and update repeats, with the objective of minimizing the within-cluster sum of squares:

$$
J = \sum_{i=1}^n \|x_i - \mu_{c_i}\|^2
$$

where $x_i$ is a data point, $c_i \in \{1, \dots, k\}$ is the index of the cluster it is assigned to, $\mu_{c_i}$ is that cluster's centroid, and $n$ is the total number of data points.
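As a quick check on the formula, the sketch below evaluates $J$ directly for a fitted scikit-learn model and compares it with `KMeans.inertia_`, which scikit-learn documents as the sum of squared distances of samples to their closest cluster center. The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X, _ = load_iris(return_X_y=True)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10).fit(X)

# mu_{c_i}: the centroid of the cluster that each point x_i is assigned to
assigned_centroids = kmeans.cluster_centers_[kmeans.labels_]

# J = sum_i ||x_i - mu_{c_i}||^2
J = np.sum((X - assigned_centroids) ** 2)

print('J computed by hand:', J)
print('KMeans.inertia_:   ', kmeans.inertia_)
```

The two printed values should agree up to floating-point rounding.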

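Iterations continue until cluster assignments no longer change (convergence) or a maximum number of iterations is reached. Neither the assignment step nor the update step can increase $J$, so the procedure always terminates, but only at a local minimum: the resulting clusters have low intra-cluster variance, yet they depend on the initial centroids and are not guaranteed to be the best possible partition.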
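The assignment-and-update loop described above is commonly called Lloyd's algorithm. The following is a minimal NumPy sketch of it, assuming plain random initialization ($k$ distinct data points) rather than the k-means++ seeding that scikit-learn uses by default. The function name `lloyd_kmeans` and its parameters are hypothetical, chosen for illustration, and are not part of any library.

```python
import numpy as np
from sklearn.datasets import load_iris


def lloyd_kmeans(X, k, max_iter=100, seed=0):
    """Hypothetical helper: plain Lloyd's k-means with random initialization."""
    rng = np.random.default_rng(seed)
    # Initialization (assumption): pick k distinct data points as starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid
        sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dists.argmin(axis=1)
        # Convergence: stop once no assignment changes
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave a centroid in place if its cluster is empty
                centroids[j] = members.mean(axis=0)
    # Within-cluster sum of squares J for the final assignment
    J = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, J


X, _ = load_iris(return_X_y=True)
labels, centroids, J = lloyd_kmeans(X, k=3)
print('Cluster counts:', np.bincount(labels, minlength=3))
print('Within-cluster sum of squares J:', J)
```

Because the outcome depends on the starting centroids, libraries usually run several initializations and keep the run with the lowest $J$; in scikit-learn's `KMeans` this is what the `n_init` parameter controls.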

In [None]:
X, y = load_iris(return_X_y=True)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
clusters = kmeans.fit_predict(X)
print('Cluster counts:', np.bincount(clusters))
print('Cluster centers:', kmeans.cluster_centers_)


In [None]:
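# Only the first two of the four iris features are plotted, so clusters may overlap in this view
plt.scatter(X[:,0], X[:,1], c=clusters, cmap='viridis', s=30)
plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1], c='red', marker='x', s=100, linewidths=2, label='Centers')
plt.xlabel('Sepal length (cm)')
plt.ylabel('Sepal width (cm)')
plt.title('k-means Clustering')
plt.legend()
plt.tight_layout()
plt.show()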


## Principal component analysis

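PCA projects centered data onto the orthogonal directions of maximal variance. These directions are the eigenvectors of the data's covariance matrix, ordered by their eigenvalues, and each eigenvalue equals the variance captured along its direction. scikit-learn's `PCA` reaches the same result through the solver selected by its `svd_solver` parameter.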
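To make the eigendecomposition view concrete, here is a minimal sketch that performs PCA on the digits data by hand: center the features, eigendecompose the covariance matrix with `np.linalg.eigh`, and project onto the top two eigenvectors. It then compares the result with scikit-learn, using `svd_solver='full'` (an assumption made here so that both sides are exact rather than randomized approximations). Eigenvectors are only defined up to sign, so each projected column may be flipped relative to scikit-learn's output.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

# Center each feature; PCA directions are defined relative to the mean
X_centered = X - X.mean(axis=0)

# Sample covariance matrix with features as columns, shape (64, 64)
cov = np.cov(X_centered, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of the total variance captured by each direction
ratio = eigvals / eigvals.sum()

# Project onto the two directions of maximal variance
projected = X_centered @ eigvecs[:, :2]

pca = PCA(n_components=2, svd_solver='full')
sk_reduced = pca.fit_transform(X)
print('Explained variance ratio (eigh):        ', ratio[:2])
print('Explained variance ratio (scikit-learn):', pca.explained_variance_ratio_)
# Columns agree up to a sign flip per component
print('Projections match up to sign:', np.allclose(np.abs(projected), np.abs(sk_reduced), atol=1e-6))
```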


In [None]:
X, y = load_digits(return_X_y=True)
pca = PCA(n_components=2)
reduced = pca.fit_transform(X)
print('Explained variance ratio:', pca.explained_variance_ratio_)
print('Transformed shape:', reduced.shape)


In [None]:
plt.scatter(reduced[:,0], reduced[:,1], c=y, cmap='tab10', s=15)
plt.colorbar(label='Digit')  # map colors back to the true digit labels
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('PCA of Digits')
plt.tight_layout()
plt.show()


This concludes the brief tour of unsupervised learning with scikit-learn: k-means clustering on the iris data and PCA on the digits data. Feel free to modify the code cells, or try other datasets and algorithms for practice.