# Dataset
## The CIFAR-10 dataset
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
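According to the dataset page, each batch file is a pickled dictionary. Its `b"data"` entry is a 10000x3072 `uint8` array with one image per row: 1024 red values, then 1024 green, then 1024 blue, each plane stored in row-major order. Its `b"labels"` entry lists class indices 0-9. The following is a minimal sketch of reading one batch and turning the rows back into 32x32x3 images. The names `load_batch` and `rows_to_images` are illustrative and do not come from any library.

```python
import pickle

import numpy as np


def load_batch(path):
    """Read one CIFAR-10 batch file and return (raw rows, labels)."""
    with open(path, "rb") as fo:
        # Under Python 3 these files are read with encoding="bytes",
        # so the dictionary keys come back as bytes (b"data", b"labels").
        batch = pickle.load(fo, encoding="bytes")
    rows = np.asarray(batch[b"data"], dtype=np.uint8)        # (N, 3072)
    labels = np.asarray(batch[b"labels"], dtype=np.int64)    # (N,)
    return rows, labels


def rows_to_images(data):
    """Turn (N, 3072) rows into (N, 32, 32, 3) height x width x RGB images."""
    data = np.asarray(data, dtype=np.uint8)
    # Each row is three 32x32 planes (R, G, B), each stored row by row,
    # so reshape to (N, 3, 32, 32) and move the channel axis to the end.
    return data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
```

For clustering, the flat 3072-long rows can be used as they are. `rows_to_images` is only needed to display the images.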

The dataset can be viewed and downloaded [here](https://www.cs.toronto.edu/~kriz/cifar.html).

## Loading Dataset Into Memory

I wrote my own `ImageDataLoader` class, designed to be extensible so that it can support more datasets in the future.
It loads a dataset from its directory path and returns four results: **train_X, train_y, test_X, test_y**.

Clustering does not use the labels, since it is an unsupervised learning algorithm.
They are returned anyway because `ImageDataLoader` is meant to be a general-purpose loader, not one built specifically for the image clustering problem.
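The `ImageDataLoader` source is not shown here, so what follows is only a hypothetical sketch of a loader with the same four return values. It reads `data_batch_1` through `data_batch_N` and `test_batch` from the extracted `cifar-10-batches-py` directory and stacks them. Keeping each image as a flat 3072-long row is an assumption, chosen because that is the shape a K-means model can consume directly. `load_cifar10_sketch` and `_read_batch` are illustrative names.

```python
import os
import pickle

import numpy as np


def _read_batch(path):
    """Return (rows, labels) from one pickled CIFAR-10 batch file."""
    with open(path, "rb") as fo:
        batch = pickle.load(fo, encoding="bytes")
    X = np.asarray(batch[b"data"], dtype=np.uint8)      # one flattened image per row
    y = np.asarray(batch[b"labels"], dtype=np.int64)
    return X, y


def load_cifar10_sketch(root, num_batches=5):
    """Hypothetical stand-in for ImageDataLoader.load_cifar10.

    Returns train_X, train_y, test_X, test_y, with images as flat rows.
    """
    parts = [_read_batch(os.path.join(root, f"data_batch_{i}"))
             for i in range(1, num_batches + 1)]
    train_X = np.concatenate([X for X, _ in parts])
    train_y = np.concatenate([y for _, y in parts])
    test_X, test_y = _read_batch(os.path.join(root, "test_batch"))
    return train_X, train_y, test_X, test_y
```

With all five batches this should give `train_X` of shape (50000, 3072) and `test_X` of shape (10000, 3072).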

In [1]:
```python
from data_loader import ImageDataLoader

data_loader = ImageDataLoader()
train_X, train_y, test_X, test_y = data_loader.load_cifar10("./cifar-10-batches-py", num_batches=5)
```

100%|██████████| 5/5 [00:00<00:00, 29.29it/s]


## Clustering Using KMeans

The purpose of K-means is to **identify groups**, or clusters, of data points in a multidimensional space. The number K in K-means is the number of clusters to create. The initial cluster means are usually chosen at random.
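Below is a minimal sketch of random initialization. It assumes the common choice of picking K distinct training points as the starting means, sometimes called Forgy initialization. `init_means` is an illustrative name, and the author's `KMeans` class may initialize differently. Casting to `float64` matters for CIFAR-10 because its pixels are `uint8`, and subtracting `uint8` arrays wraps around instead of going negative.

```python
import numpy as np


def init_means(X, k, seed=None):
    """Pick k distinct rows of X at random as the initial cluster means."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=k, replace=False)  # k different row indices
    return X[idx].astype(np.float64)                 # float copy, safe for later arithmetic
```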

K-means is usually implemented as an **iterative procedure** in which each iteration involves two successive steps. The first step is to assign each of the data points to a cluster. The second step is to modify the cluster means so that they become the mean of all the points assigned to that cluster.
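The two steps can be sketched with NumPy as follows. This is a plain Lloyd's-algorithm iteration under stated assumptions: Euclidean distance, and an empty cluster keeps its previous mean. It is not the author's implementation, and `lloyd_step` is an illustrative name. The distance expansion $\lVert x\rVert^2 - 2\,x\cdot\mu + \lVert\mu\rVert^2$ avoids building an $N \times K \times D$ array. For 50000 CIFAR-10 images of 3072 values each and K = 3, that array would take about 3.7 GB in `float64`.

```python
import numpy as np


def lloyd_step(X, means):
    """One K-means iteration: assign every point, then recompute the means."""
    X = np.asarray(X, dtype=np.float64)
    means = np.asarray(means, dtype=np.float64)
    # Step 1 (assignment): squared distance from each point to each mean, shape (N, K).
    d2 = (X ** 2).sum(axis=1)[:, None] - 2.0 * X @ means.T + (means ** 2).sum(axis=1)[None, :]
    labels = d2.argmin(axis=1)
    # Step 2 (update): each mean becomes the average of the points assigned to it.
    new_means = means.copy()
    for k in range(len(means)):
        members = X[labels == k]
        if len(members):  # leave the mean in place if its cluster is empty
            new_means[k] = members.mean(axis=0)
    return labels, new_means
```

Repeating `lloyd_step` until the labels stop changing gives the full iterative procedure.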

The **quality** of the current assignment is measured by the **distortion measure**, which is the sum of the squared distances between each data point and the centroid of the cluster it is assigned to. Lower values mean tighter clusters.
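Written out, the distortion measure is $J = \sum_{n=1}^{N} \lVert x_n - \mu_{c_n} \rVert^2$, where $c_n$ is the cluster assigned to point $x_n$ and $\mu_k$ is the mean of cluster $k$. In standard Lloyd's iterations, neither the assignment step nor the update step can increase $J$. Here is a short sketch of computing it; `distortion` is an illustrative name.

```python
import numpy as np


def distortion(X, means, labels):
    """Sum of squared distances from each point to the mean of its own cluster."""
    # Cast to float64 so uint8 inputs (CIFAR-10 pixels) cannot wrap around.
    X = np.asarray(X, dtype=np.float64)
    means = np.asarray(means, dtype=np.float64)
    diff = X - means[np.asarray(labels)]  # (N, D): each point minus its assigned mean
    return float((diff ** 2).sum())
```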

In [2]:
```python
from kmeans import KMeans

model = KMeans(num_clusters=3)
model.fit(train_X)
```

Output of the single restart (converged after 34 of a maximum of 300 iterations, about 2.3 s per iteration, 78.85 s in total):

| Iteration | Error | Distortion measure |
|---:|---:|---:|
| 1 | 6485.91979279723 | 743133155.034374 |
| 2 | 1353.932653636578 | 543489751.1226282 |
| 3 | 285.61883889524336 | 565402975.1392524 |
| 4 | 163.64691788249024 | 566871352.0781212 |
| 5 | 117.26498321330875 | 568599179.9457592 |
| 6 | 85.88485357137755 | 570213191.0001668 |
| 7 | 62.383090066489856 | 571473624.9884087 |
| 8 | 52.678181262304015 | 572365586.5438294 |
| 9 | 42.76658094284559 | 573084344.6970731 |
| 10 | 34.19711620801298 | 573638882.2082702 |
| 11 | 25.98975988726518 | 574027541.7174398 |
| 12 | 20.567516389967366 | 574273534.2542315 |
| 13 | 18.900809627070824 | 574426573.9394536 |
| 14 | 12.749728556024413 | 574558948.7636071 |
| 15 | 10.17335110211409 | 574616780.8448288 |
| 16 | 10.114918219089365 | 574651874.4983358 |
| 17 | 8.726683611076064 | 574705040.7781162 |
| 18 | 7.413136945147135 | 574722620.2267032 |
| 19 | 7.136948701246193 | 574727408.4719344 |
| 20 | 4.727522784886013 | 574726570.7751007 |
| 21 | 4.948189441056339 | 574728144.4726428 |
| 22 | 4.984692220803975 | 574724261.7815678 |
| 23 | 3.829256655142583 | 574725823.0664358 |
| 24 | 3.2012792633964424 | 574725911.0501304 |
| 25 | 3.0170074898002106 | 574710379.9383032 |
| 26 | 2.4750947547721753 | 574701554.0370733 |
| 27 | 1.7482503306756387 | 574709428.7253368 |
| 28 | 1.1611624643332477 | 574696436.7201873 |
| 29 | 1.1924462910474398 | 574688473.447318 |
| 30 | 0.6649907991478456 | 574674084.1078042 |
| 31 | 0.6868178028611114 | 574666789.4185202 |
| 32 | 0.4890574772264736 | 574656599.9081708 |
| 33 | 0.5473622335039584 | 574649527.6958103 |
| 34 | 0.0 | 574646394.9893672 |

This Restart scored better than last one. Updating Attributes...
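The closing message suggests the model runs restarts and keeps the best one. Below is a self-contained sketch of that pattern. It assumes that "best" means lowest distortion and that a restart stops once its assignments no longer change. `kmeans_restarts` and its parameters are hypothetical, and the author's `KMeans` may use a different stopping rule (its "Error" column is not defined in the notebook).

```python
import numpy as np


def _assign(X, means):
    """Index of the nearest mean for every row of X (squared Euclidean distance)."""
    d2 = (X ** 2).sum(axis=1)[:, None] - 2.0 * X @ means.T + (means ** 2).sum(axis=1)[None, :]
    return d2.argmin(axis=1)


def kmeans_restarts(X, k, n_restarts=3, max_iter=300, seed=None):
    """Run K-means n_restarts times and keep the run with the lowest distortion."""
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        # Random initialization: k distinct data points.
        means = X[rng.choice(len(X), size=k, replace=False)]
        labels = _assign(X, means)
        for n_iter in range(1, max_iter + 1):
            # Update step; an empty cluster keeps its previous mean.
            means = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
                              for j in range(k)])
            # Assignment step with the new means.
            new_labels = _assign(X, means)
            if np.array_equal(new_labels, labels):
                break  # assignments no longer change: this restart has converged
            labels = new_labels
        score = float(((X - means[labels]) ** 2).sum())  # distortion of this restart
        if best is None or score < best["distortion"]:
            best = {"distortion": score, "means": means, "labels": labels, "n_iter": n_iter}
    return best
```

Because K-means only finds a local minimum, more restarts trade running time (about 78 s per restart above) for a better chance of a lower distortion.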





### Evaluating the fitted model

In [4]:
```python
print("The Model needed ", model.iter_num_, " iterations to converge.")
print("The Model scored ", model.distoration_measure_, " for the distoration measure value.")
```

The Model needed  34  iterations to converge.
The Model scored  574646394.9893672  for the distoration measure value.
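Since `train_y` is already loaded, the clusters can be compared with the true classes after fitting without the labels ever influencing training. The sketch below assumes you can obtain one cluster index per training image from the fitted model. The notebook does not show how the author's `KMeans` exposes these, so `clusters` is a hypothetical input. Purity, the fraction of images that share their cluster's majority class, is one simple summary. With 3 clusters and 10 classes of 5000 training images each, purity cannot exceed 0.3.

```python
import numpy as np


def cluster_label_counts(clusters, labels, n_clusters, n_classes=10):
    """Count how many images of each true class fall into each cluster."""
    table = np.zeros((n_clusters, n_classes), dtype=np.int64)
    # Unbuffered add: repeated (cluster, label) pairs are all counted.
    np.add.at(table, (np.asarray(clusters), np.asarray(labels)), 1)
    return table


def purity(table):
    """Fraction of points that belong to the majority class of their cluster."""
    return float(table.max(axis=1).sum() / table.sum())
```

Usage, with the hypothetical `clusters` array: `purity(cluster_label_counts(clusters, train_y, n_clusters=3))`.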


### Visualizing the fitted model