![alt text](images/HDAT9500Banner.PNG)
<br>

# Assessment Chapter 7

## 1.1. Bisecting K-Means

K-Means often results in clusters of widely different sizes. In this assessment you are asked to implement an extension to k-means called bisecting k-means. The algorithm proceeds as follows:

1. Pick a cluster to split (for this exercise, always pick the largest one)
2. Find 2 sub-clusters using the basic k-means algorithm (bisecting step)
3. Repeat step 2 for ITER times (for this exercise, set ITER=20) and take the split that minimizes the inertia
4. Repeat steps 1, 2 and 3 until the desired number of clusters is reached

## 1.2. Tasks

1. Implement the bisecting k-means algorithm.
2. Apply bisecting k-means to the Breast Cancer Wisconsin (Diagnostic) Data Set using the first 10 numerical features in the dataset as feature vectors (as was done in the practical exercise "Exercise 7 - PCA"). Remember to scale the data to have zero mean and unit variance before clustering. Run the algorithm three different times so that the data are divided into: (a) 2 clusters, (b) 5 clusters, (c) 10 clusters.
3. Compare the number of observations in each cluster between basic k-means and bisecting k-means when the data are divided into 10 clusters.

## 1.2. Aims:

1. Gain a better understanding of k-means and clustering algorithms in general.

## 1.3. Tips:

You are allowed to use any function that was used in the practical exercises.

In [308]:
import numpy as np
import pandas as pd

In [309]:
#create function to cluster standard k means using unscaled

def standard_kmeans(X, desired_cluster_number):
    #import libraries
    import numpy as np
    import pandas as pd
    from sklearn import preprocessing
    from sklearn.cluster import KMeans

    #scale X
    X_scaled = preprocessing.scale(X)

    k = desired_cluster_number
    kmeans = KMeans(n_clusters=desired_cluster_number, init="random", n_init=20,
                     algorithm="full", random_state=1)
    y_pred=kmeans.fit_predict(X_scaled)
    
    return(y_pred)

In [310]:
#create function to cluster bissect k means using unscaled data

def bisect_kmeans(X, desired_cluster_number):
    #import libraries
    import numpy as np
    import pandas as pd
    from sklearn import preprocessing
    from sklearn.cluster import KMeans
    
    #initial k means bissect
    X_scaled = preprocessing.scale(X) #scale X
       
    k = 2
    kmeans = KMeans(n_clusters=2, init="random", n_init=20,
                     algorithm="full", random_state=1)
    y_pred=kmeans.fit_predict(X_scaled)
    
    
    #initial variables
    clusters=2
    total_size = np.shape(X_scaled)[0]
    
    
    #if desired clusters > 2 (initial bissect)
    while clusters < desired_cluster_number: #loop until desired number of clusters
        list_cluster_sizes = [] #reset list every new cluster made
                
        for e in range (0,clusters): #loop through all current clusters
            sum_cluster = int(0) #reset cluster sum every time
            for i in range (1,total_size): #count number in each cluster
                if y_pred[i]== e:
                    sum_cluster += 1
            #append size of each cluster, index = cluster number
            list_cluster_sizes.append(sum_cluster)
            
        largest_cluster=list_cluster_sizes.index(max(list_cluster_sizes)) #find index largest (index = cluster)
        
        print(list_cluster_sizes) #sanity check largest cluster is bissected
        
        #make largest cluster = 0 and remove 1's so you can recluster using the 0's and 1's bissecting
        if largest_cluster == 0:
            #kick 1's out
            y_pred[y_pred==1] = clusters

        elif largest_cluster == 1:
            y_pred[y_pred==0]= clusters #0's are new number
            y_pred[y_pred==1] = 0 #1's become 0's
        else:
            y_pred[y_pred==0]= clusters #0 = new
            y_pred[y_pred==largest_cluster]= 0 #largest becomes 0
            y_pred[y_pred==1]= largest_cluster #1's become the old largest cluster value
        
        #fit model using 0 which is largest cluster
        '''rescales biggest cluster'''
        #X_largest_Cluster = X[y_pred==0] #select only largest cluster from unscaled data
        #X_largest_Cluster_scaled = preprocessing.scale(X_largest_Cluster) #scale
        #y_pred_temp=kmeans.fit_predict(X_largest_Cluster_scaled) #bissect largest cluster into temp y_pred
        '''does not rescale biggest cluster'''
        X_largest_Cluster = X_scaled[y_pred==0]
        y_pred_temp=kmeans.fit_predict(X_largest_Cluster) #no rescaling
        
        #input split cluster(0's and 1's) in main y_pred where they were 0
        index_y_pred_temp = 0 #track index of y_temp used
        for i in range (1,total_size):
            if y_pred[i]==0:
                y_pred[i]=y_pred_temp[index_y_pred_temp]
                index_y_pred_temp +=1
        
        
        clusters += 1 #counter for number of clusters
   
    return(y_pred)  

In [311]:
#import data
bcw = pd.read_csv("C:/Users/akrus/Desktop/ML&DM/chapter07-akruskal/data/breast-cancer-wisconsin-data/data.csv", sep=',')


In [312]:
#check data
bcw.describe(include='all')

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
count,569.0,569,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,...,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0
unique,,2,,,,,,,,,...,,,,,,,,,,
top,,B,,,,,,,,,...,,,,,,,,,,
freq,,357,,,,,,,,,...,,,,,,,,,,
mean,30371830.0,,14.127292,19.289649,91.969033,654.889104,0.09636,0.104341,0.088799,0.048919,...,16.26919,25.677223,107.261213,880.583128,0.132369,0.254265,0.272188,0.114606,0.290076,0.083946
std,125020600.0,,3.524049,4.301036,24.298981,351.914129,0.014064,0.052813,0.07972,0.038803,...,4.833242,6.146258,33.602542,569.356993,0.022832,0.157336,0.208624,0.065732,0.061867,0.018061
min,8670.0,,6.981,9.71,43.79,143.5,0.05263,0.01938,0.0,0.0,...,7.93,12.02,50.41,185.2,0.07117,0.02729,0.0,0.0,0.1565,0.05504
25%,869218.0,,11.7,16.17,75.17,420.3,0.08637,0.06492,0.02956,0.02031,...,13.01,21.08,84.11,515.3,0.1166,0.1472,0.1145,0.06493,0.2504,0.07146
50%,906024.0,,13.37,18.84,86.24,551.1,0.09587,0.09263,0.06154,0.0335,...,14.97,25.41,97.66,686.5,0.1313,0.2119,0.2267,0.09993,0.2822,0.08004
75%,8813129.0,,15.78,21.8,104.1,782.7,0.1053,0.1304,0.1307,0.074,...,18.79,29.72,125.4,1084.0,0.146,0.3391,0.3829,0.1614,0.3179,0.09208


In [313]:
#Select parameters - remove ID and diagnosis
print(bcw.columns)
X = bcw[bcw.columns[2:12]]
X.describe()

Index(['id', 'diagnosis', 'radius_mean', 'texture_mean', 'perimeter_mean',
       'area_mean', 'smoothness_mean', 'compactness_mean', 'concavity_mean',
       'concave points_mean', 'symmetry_mean', 'fractal_dimension_mean',
       'radius_se', 'texture_se', 'perimeter_se', 'area_se', 'smoothness_se',
       'compactness_se', 'concavity_se', 'concave points_se', 'symmetry_se',
       'fractal_dimension_se', 'radius_worst', 'texture_worst',
       'perimeter_worst', 'area_worst', 'smoothness_worst',
       'compactness_worst', 'concavity_worst', 'concave points_worst',
       'symmetry_worst', 'fractal_dimension_worst'],
      dtype='object')


Unnamed: 0,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,fractal_dimension_mean
count,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0
mean,14.127292,19.289649,91.969033,654.889104,0.09636,0.104341,0.088799,0.048919,0.181162,0.062798
std,3.524049,4.301036,24.298981,351.914129,0.014064,0.052813,0.07972,0.038803,0.027414,0.00706
min,6.981,9.71,43.79,143.5,0.05263,0.01938,0.0,0.0,0.106,0.04996
25%,11.7,16.17,75.17,420.3,0.08637,0.06492,0.02956,0.02031,0.1619,0.0577
50%,13.37,18.84,86.24,551.1,0.09587,0.09263,0.06154,0.0335,0.1792,0.06154
75%,15.78,21.8,104.1,782.7,0.1053,0.1304,0.1307,0.074,0.1957,0.06612
max,28.11,39.28,188.5,2501.0,0.1634,0.3454,0.4268,0.2012,0.304,0.09744


In [314]:
two_cluster_bissect = bisect_kmeans(X, 2) #X = unscaled data
print(two_cluster_bissect)

[0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 1 0 0 1 1 1 1 0 1 1 0 1 1 1 1 0 1 0 1
 1 1 1 0 0 1 1 1 0 0 1 0 1 0 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1
 1 0 1 1 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1
 1 1 1 1 0 1 1 1 0 1 1 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1
 1 1 1 1 1 0 1 1 1 0 1 0 0 0 1 1 0 0 0 1 1 1 1 1 1 0 1 0 0 0 1 1 1 0 0 1 1
 1 0 1 1 1 1 1 0 0 1 1 0 1 1 0 0 1 0 1 1 1 1 0 1 1 1 1 1 0 1 0 0 0 1 0 0 0
 0 0 1 0 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1 1 0 0 0 1 1
 1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0
 0 1 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1
 1 0 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 0 1 1
 0 1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1
 1 1 1 1 1 1 0 1 1 1 1 0 

In [315]:
two_cluster_standard = standard_kmeans(X, 2) #X = unscaled data
print(two_cluster_standard)

[0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 1 0 0 1 1 1 1 0 1 1 0 1 1 1 1 0 1 0 1
 1 1 1 0 0 1 1 1 0 0 1 0 1 0 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1
 1 0 1 1 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 1 0 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1
 1 1 1 1 0 1 1 1 0 1 1 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1
 1 1 1 1 1 0 1 1 1 0 1 0 0 0 1 1 0 0 0 1 1 1 1 1 1 0 1 0 0 0 1 1 1 0 0 1 1
 1 0 1 1 1 1 1 0 0 1 1 0 1 1 0 0 1 0 1 1 1 1 0 1 1 1 1 1 0 1 0 0 0 1 0 0 0
 0 0 1 0 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1 1 0 0 0 1 1
 1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0
 0 1 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1
 1 0 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 0 1 1
 0 1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1
 1 1 1 1 1 1 0 1 1 1 1 0 

In [316]:
five_cluster_bissect = bisect_kmeans(X, 5)
print(five_cluster_bissect)

[168, 400]
[153, 247, 168]
[158, 89, 168, 153]
[0 1 0 0 1 0 1 0 1 1 2 2 1 2 1 1 2 1 1 4 3 3 0 1 0 1 1 1 0 1 0 3 0 1 0 1 3
 4 2 3 2 3 0 3 2 0 3 3 3 2 4 4 4 0 2 4 0 0 4 3 3 3 1 3 3 1 3 4 3 4 1 3 0 3
 4 2 3 1 1 4 3 3 1 0 3 0 2 0 2 0 2 2 4 4 1 1 3 3 3 3 2 3 4 3 3 0 3 4 1 4 3
 3 1 3 3 2 3 1 1 2 4 1 0 3 4 4 2 1 0 1 3 0 0 4 0 2 4 4 0 3 3 2 3 4 4 3 1 2
 4 4 3 3 1 3 4 4 1 2 4 4 3 0 0 3 0 2 4 2 0 4 3 2 0 3 4 3 3 1 4 4 0 0 2 4 2
 4 2 4 4 4 1 2 4 3 1 4 1 1 0 3 3 0 0 0 3 4 3 2 3 4 1 4 0 0 0 3 3 4 1 0 4 3
 3 0 4 4 3 4 2 0 1 2 2 1 4 2 0 0 2 0 4 4 3 2 0 3 4 3 3 3 0 4 0 0 0 3 0 0 1
 1 1 2 0 2 0 0 3 2 4 3 4 3 0 3 2 3 4 2 4 4 0 4 0 0 4 4 2 4 3 4 3 2 3 4 4 4
 4 4 4 3 1 2 0 3 4 2 4 4 4 4 4 4 4 4 3 4 4 1 3 4 3 0 3 0 4 4 4 4 1 1 1 3 3
 4 4 0 3 0 3 0 3 3 3 0 3 3 4 4 4 3 4 0 1 2 4 2 3 4 3 3 4 2 4 2 4 0 0 4 0 0
 0 4 1 0 4 3 3 2 4 0 3 4 2 3 4 2 4 4 3 1 3 3 0 1 3 4 3 4 4 4 0 4 4 4 4 3 4
 2 1 4 4 3 4 2 2 3 3 0 4 4 4 1 3 2 3 4 3 4 4 4 1 3 1 0 4 3 4 4 4 4 3 0 4 4
 0 3 0 4 2 0 2 0 2 3 4 2 2 2 2 2 0 0 2 4 4 2 2 4 0 3 

In [317]:
five_cluster_standard = standard_kmeans(X, 5)
print(five_cluster_bissect)

[0 1 0 0 1 0 1 0 1 1 2 2 1 2 1 1 2 1 1 4 3 3 0 1 0 1 1 1 0 1 0 3 0 1 0 1 3
 4 2 3 2 3 0 3 2 0 3 3 3 2 4 4 4 0 2 4 0 0 4 3 3 3 1 3 3 1 3 4 3 4 1 3 0 3
 4 2 3 1 1 4 3 3 1 0 3 0 2 0 2 0 2 2 4 4 1 1 3 3 3 3 2 3 4 3 3 0 3 4 1 4 3
 3 1 3 3 2 3 1 1 2 4 1 0 3 4 4 2 1 0 1 3 0 0 4 0 2 4 4 0 3 3 2 3 4 4 3 1 2
 4 4 3 3 1 3 4 4 1 2 4 4 3 0 0 3 0 2 4 2 0 4 3 2 0 3 4 3 3 1 4 4 0 0 2 4 2
 4 2 4 4 4 1 2 4 3 1 4 1 1 0 3 3 0 0 0 3 4 3 2 3 4 1 4 0 0 0 3 3 4 1 0 4 3
 3 0 4 4 3 4 2 0 1 2 2 1 4 2 0 0 2 0 4 4 3 2 0 3 4 3 3 3 0 4 0 0 0 3 0 0 1
 1 1 2 0 2 0 0 3 2 4 3 4 3 0 3 2 3 4 2 4 4 0 4 0 0 4 4 2 4 3 4 3 2 3 4 4 4
 4 4 4 3 1 2 0 3 4 2 4 4 4 4 4 4 4 4 3 4 4 1 3 4 3 0 3 0 4 4 4 4 1 1 1 3 3
 4 4 0 3 0 3 0 3 3 3 0 3 3 4 4 4 3 4 0 1 2 4 2 3 4 3 3 4 2 4 2 4 0 0 4 0 0
 0 4 1 0 4 3 3 2 4 0 3 4 2 3 4 2 4 4 3 1 3 3 0 1 3 4 3 4 4 4 0 4 4 4 4 3 4
 2 1 4 4 3 4 2 2 3 3 0 4 4 4 1 3 2 3 4 3 4 4 4 1 3 1 0 4 3 4 4 4 4 3 0 4 4
 0 3 0 4 2 0 2 0 2 3 4 2 2 2 2 2 0 0 2 4 4 2 2 4 0 3 3 2 4 2 3 4 2 4 3 1 4
 4 3 4 3 3 4 1 3 2 2 4 0 

In [318]:
ten_cluster_bissect = bisect_kmeans(X, 10)
print(ten_cluster_bissect)

[168, 400]
[153, 247, 168]
[158, 89, 168, 153]
[101, 67, 89, 153, 158]
[80, 78, 89, 153, 67, 101]
[84, 69, 89, 78, 67, 101, 80]
[31, 70, 89, 78, 67, 69, 80, 84]
[42, 47, 70, 78, 67, 69, 80, 84, 31]
[8 4 8 2 4 8 4 8 4 4 7 7 4 7 4 4 7 4 4 6 5 0 8 4 8 4 4 4 2 4 8 5 2 4 8 4 5
 3 9 5 9 5 8 5 7 2 1 5 0 7 3 3 6 2 7 6 2 2 3 0 1 1 4 1 5 4 1 3 5 6 4 5 8 5
 6 7 5 4 4 6 1 5 4 2 1 2 7 2 9 2 7 7 3 6 4 4 1 0 0 5 7 0 3 1 1 8 5 3 4 3 0
 5 4 5 0 9 0 4 4 7 6 4 8 5 3 6 9 4 8 4 0 8 2 6 2 9 6 6 2 0 0 7 0 6 3 5 4 7
 6 3 5 5 4 0 6 6 4 7 6 3 5 8 2 0 2 7 6 7 2 6 0 7 2 0 6 0 5 4 3 3 8 2 7 3 7
 6 7 6 6 3 4 9 3 5 4 6 4 4 8 5 1 2 2 2 5 6 1 7 5 3 4 3 8 2 2 5 1 6 4 8 6 5
 1 2 3 6 0 6 9 2 4 9 9 4 3 7 8 2 9 2 6 3 5 9 2 1 3 5 1 0 2 6 2 2 2 5 2 2 4
 4 4 7 8 7 2 2 1 9 6 5 3 0 2 1 7 5 6 7 3 6 2 6 2 2 3 3 9 3 5 6 5 7 0 6 6 6
 3 6 3 1 4 7 8 0 3 9 3 3 3 3 6 3 6 6 1 3 3 4 5 3 0 8 0 2 6 6 3 3 4 4 4 5 1
 6 3 8 0 2 1 2 5 1 0 2 0 0 3 6 6 5 3 2 4 7 3 7 5 3 1 0 3 9 6 7 3 8 2 6 2 2
 2 6 4 2 6 5 5 9 6 2 5 6 9 5 6 7 6 3 0 4 0 1 8 4 1 3

In [319]:
ten_cluster_standard = standard_kmeans(X, 10)
print(ten_cluster_bissect)

[8 4 8 2 4 8 4 8 4 4 7 7 4 7 4 4 7 4 4 6 5 0 8 4 8 4 4 4 2 4 8 5 2 4 8 4 5
 3 9 5 9 5 8 5 7 2 1 5 0 7 3 3 6 2 7 6 2 2 3 0 1 1 4 1 5 4 1 3 5 6 4 5 8 5
 6 7 5 4 4 6 1 5 4 2 1 2 7 2 9 2 7 7 3 6 4 4 1 0 0 5 7 0 3 1 1 8 5 3 4 3 0
 5 4 5 0 9 0 4 4 7 6 4 8 5 3 6 9 4 8 4 0 8 2 6 2 9 6 6 2 0 0 7 0 6 3 5 4 7
 6 3 5 5 4 0 6 6 4 7 6 3 5 8 2 0 2 7 6 7 2 6 0 7 2 0 6 0 5 4 3 3 8 2 7 3 7
 6 7 6 6 3 4 9 3 5 4 6 4 4 8 5 1 2 2 2 5 6 1 7 5 3 4 3 8 2 2 5 1 6 4 8 6 5
 1 2 3 6 0 6 9 2 4 9 9 4 3 7 8 2 9 2 6 3 5 9 2 1 3 5 1 0 2 6 2 2 2 5 2 2 4
 4 4 7 8 7 2 2 1 9 6 5 3 0 2 1 7 5 6 7 3 6 2 6 2 2 3 3 9 3 5 6 5 7 0 6 6 6
 3 6 3 1 4 7 8 0 3 9 3 3 3 3 6 3 6 6 1 3 3 4 5 3 0 8 0 2 6 6 3 3 4 4 4 5 1
 6 3 8 0 2 1 2 5 1 0 2 0 0 3 6 6 5 3 2 4 7 3 7 5 3 1 0 3 9 6 7 3 8 2 6 2 2
 2 6 4 2 6 5 5 9 6 2 5 6 9 5 6 7 6 3 0 4 0 1 8 4 1 3 5 3 3 6 8 6 6 6 3 0 6
 9 4 6 3 0 3 7 9 1 1 8 6 6 6 4 5 7 1 3 0 6 3 3 4 5 4 8 6 5 6 6 3 3 0 2 3 6
 2 5 2 6 7 2 9 2 9 5 6 9 9 9 9 9 2 2 9 3 3 7 7 3 2 5 1 9 6 9 0 6 7 3 0 4 3
 3 5 6 5 5 3 4 0 7 9 3 2 

In [320]:
from collections import Counter
print('Two cluster bissect:', Counter(two_cluster_bissect))
print('Two cluster standard:', Counter(two_cluster_standard))
print('********************************************************************************************************')
print('Five cluster bissect:',Counter(five_cluster_bissect))
print('Five cluster standard:', Counter(five_cluster_standard))
print('********************************************************************************************************')
print('Ten cluster bissect:',Counter(ten_cluster_bissect))
print('Ten cluster standard:', Counter(ten_cluster_standard))

Two cluster bissect: Counter({1: 400, 0: 169})
Two cluster standard: Counter({1: 400, 0: 169})
********************************************************************************************************
Five cluster bissect: Counter({4: 158, 3: 153, 0: 102, 2: 89, 1: 67})
Five cluster standard: Counter({4: 190, 1: 175, 3: 94, 2: 72, 0: 38})
********************************************************************************************************
Ten cluster bissect: Counter({6: 80, 3: 78, 2: 70, 5: 69, 4: 67, 7: 47, 0: 46, 9: 42, 1: 38, 8: 32})
Ten cluster standard: Counter({1: 101, 0: 94, 6: 68, 7: 66, 5: 57, 9: 51, 3: 50, 8: 45, 4: 23, 2: 14})


# Summary
Based on the observation on the counts above, it is clear that standard k means produces more varied cluster sizes than bissect k means.