# Unsupervised Learning

In [None]:
import matplotlib.pyplot as plt
from sklearn import datasets
import sklearn.metrics as sm
 
import pandas as pd
import numpy as np


In [None]:
# import some data to play with

iris = pd.read_csv('iris_data2.csv')

## Split the data into features and labels

In [None]:
# split the data into features and labels
X = iris.iloc[:,0:4] # inputs into model
y = iris.species # output of model

In [None]:
X.head()

In [None]:
y.head()

In [None]:
num_row=len(y)
print(num_row)

In [None]:
iris_species = y.tolist()
#print(iris_species)

![](https://i1221.photobucket.com/albums/dd476/kk_yin/u1.png)

## Building the Kmeans model

You'll now create a KMeans model to find 3 clusters and fit it to the iris feature matrix X. After the model has been fit, the cluster label of each training sample is stored in the <font color="blue">.labels_</font> attribute; cluster labels for new, unseen samples can be obtained with the <font color="blue">.predict()</font> method.

The features X and the species labels y were prepared in the cells above.

### Instructions

- Import KMeans from sklearn.cluster.
- Using KMeans(), create a KMeans instance called km to find 3 clusters. To specify the number of clusters, use the n_clusters keyword argument.
- Use the .fit() method of km to fit the model to the feature array X.
- Read the cluster labels of the fitted samples from the .labels_ attribute of km, assigning the result to labels. (For new samples, km.predict() returns their cluster labels.)
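For k-means, predicting a label for a new sample means picking the cluster whose centroid (stored by scikit-learn in `cluster_centers_`) is closest in Euclidean distance. The NumPy sketch below illustrates that rule; the helper name `nearest_centroid_labels` and the toy points are hypothetical and are not part of scikit-learn.

```python
import numpy as np


def nearest_centroid_labels(points, centroids):
    """Hypothetical helper: index of the nearest centroid (Euclidean) for each point."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Squared distances, shape (n_points, n_centroids), computed by broadcasting
    sq_dists = ((points[:, np.newaxis, :] - centroids[np.newaxis, :, :]) ** 2).sum(axis=2)
    # Each point gets the label of the column with the smallest distance
    return sq_dists.argmin(axis=1)


# Toy example: two centroids in 2-D and three new points
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
new_points = np.array([[0.5, -0.2], [4.8, 5.1], [3.0, 3.0]])
print(nearest_centroid_labels(new_points, centroids))  # [0 1 1]
```

With a fitted model, `nearest_centroid_labels(X.values, km.cluster_centers_)` should agree with `km.predict(X)`, barring exact distance ties.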



In [None]:
# Import KMeans
from sklearn.cluster import KMeans

# Create a KMeans instance with 3 clusters: km
km = KMeans(n_clusters=3)

# Fit the model to the iris features
km.fit(X)

# Cluster labels assigned to the training samples: labels
labels = km.labels_

# Display the cluster labels
labels

## Correspondence with iris species

### Instructions

Use the <font color="blue">pd.crosstab()</font> function on df['km_labels'] and df['species'] to count the number of times each iris species coincides with each cluster label. Assign the result to ct.
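`pd.crosstab()` counts how often each pair of values occurs across two columns: rows are the cluster labels, columns are the species, and each cell is a co-occurrence count. The small sketch below uses made-up labels (hypothetical toy data, not the iris results) so the table is easy to check by eye.

```python
import pandas as pd

# Hypothetical cluster labels and true species for six samples
df_toy = pd.DataFrame({
    'km_labels': [0, 0, 1, 1, 1, 2],
    'species': ['setosa', 'setosa', 'versicolor', 'versicolor', 'virginica', 'virginica'],
})

# Rows: cluster labels, columns: species, cells: number of samples with that pair
ct_toy = pd.crosstab(df_toy['km_labels'], df_toy['species'])
print(ct_toy)
```

Here cluster 1 mixes two versicolor samples with one virginica sample; on the iris data, the same kind of mixing in `ct` shows which species k-means struggles to separate.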


In [None]:
# Create a KMeans model with 3 clusters: km
km = KMeans(n_clusters=3)

# Use fit_predict to fit the model and obtain the cluster labels: km_labels
km_labels = km.fit_predict(X)

# Create a DataFrame with the cluster labels and species as columns: df
df = pd.DataFrame({'km_labels': km_labels, 'species': iris_species})

# Create crosstab: ct
ct = pd.crosstab(df['km_labels'], df['species'])


In [None]:
print(df.to_string())

In [None]:
print(ct)

## Measuring Quality of Clustering

- Using only samples and their cluster labels
- A good clustering has tight clusters
- ... and samples in each cluster bunched together


Inertia measures clustering quality:

- Measures how spread out the clusters are (lower is better)
- Sum of squared distances from each sample to the centroid of its cluster
- After fit(), available as attribute inertia_
- k-means attempts to minimize the inertia when choosing clusters
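The definition above can be computed directly: for every sample, take the squared Euclidean distance to the centroid of the cluster it was assigned to, and sum. A minimal NumPy sketch follows; the helper name `inertia` and the 1-D toy data are hypothetical.

```python
import numpy as np


def inertia(X, labels, centroids):
    """Hypothetical helper: sum of squared distances from each sample to its cluster centroid."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # centroids[labels] selects, for every sample, the centroid of its own cluster
    diffs = X - centroids[labels]
    return float((diffs ** 2).sum())


# Toy example: two clusters in 1-D, each sample 1 unit away from its centroid
X_toy = np.array([[1.0], [3.0], [10.0], [12.0]])
labels_toy = np.array([0, 0, 1, 1])
centroids_toy = np.array([[2.0], [11.0]])
print(inertia(X_toy, labels_toy, centroids_toy))  # 4.0
```

After fitting, `inertia(X, km.labels_, km.cluster_centers_)` should closely match `km.inertia_`, up to small floating-point differences.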


In [None]:
from sklearn.cluster import KMeans
km = KMeans(n_clusters=3)
km.fit(X)

print(km.inertia_)


### Instructions

- For each value of k in ks (1 to 6), perform the following steps:
    - Create a KMeans instance called model with k clusters.
    - Fit the model to the iris feature samples X.
    - Append the value of the inertia_ attribute of model to the list inertias.
- Plot ks against inertias. Inertia tends to fall as k grows, so look for the "elbow" where the decrease slows sharply; that k is a reasonable choice for the number of clusters.



In [None]:
ks = range(1, 7)
inertias = []

for k in ks:
    # Create a KMeans instance with k clusters: model
    model = KMeans(n_clusters=k)
    
    # Fit model to samples
    model.fit(X)
    
    # Append the inertia to the list of inertias
    inertias.append(model.inertia_)
    
# Plot ks vs inertias
plt.plot(ks, inertias, '-o')
plt.xlabel('number of clusters, k')
plt.ylabel('inertia')
plt.xticks(ks)
plt.show()

### Convert class to integer

In [None]:
# Convert class names to integers: "Iris-setosa" = 0, "Iris-versicolor" = 1, "Iris-virginica" = 2
# (Series.map returns NaN for any name not in the dictionary, so check the spelling in the CSV)
species_to_int = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}
y = y.map(species_to_int)


### Measure the performance 
**[Clustering metrics](https://scikit-learn.org/stable/modules/classes.html)**

In [None]:
from sklearn.metrics.cluster import v_measure_score
# Signature is v_measure_score(labels_true, labels_pred)
v_measure_score(y, km_labels)
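V-measure is the harmonic mean of homogeneity (each cluster contains only one class) and completeness (each class falls into only one cluster), both derived from entropies of the two labelings. The pure-Python sketch below follows that definition with natural logarithms; the helper names are hypothetical, and the degenerate-case conventions (a score of 1.0 when an entropy is zero) are assumed to mirror scikit-learn's. Because only co-occurrence counts matter, renaming labels (for example converting species names to integers) does not change the score, and with equal weighting swapping the two arguments gives the same value.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same samples."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (n_ab / n) * log(n * n_ab / (count_a[x] * count_b[y]))
        for (x, y), n_ab in joint.items()
    )


def v_measure(labels_true, labels_pred):
    """Hypothetical helper: harmonic mean of homogeneity and completeness."""
    mi = mutual_information(labels_true, labels_pred)
    h_true, h_pred = entropy(labels_true), entropy(labels_pred)
    homogeneity = mi / h_true if h_true else 1.0
    completeness = mi / h_pred if h_pred else 1.0
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


species = ['setosa', 'setosa', 'versicolor', 'versicolor', 'virginica', 'virginica']
print(round(v_measure(species, [0, 0, 1, 1, 2, 2]), 6))  # 1.0
print(round(v_measure(species, [2, 2, 0, 0, 1, 1]), 6))  # 1.0, label names do not matter
print(round(v_measure(species, [0, 0, 0, 0, 0, 0]), 6))  # 0.0
```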

Reference:
https://www.kaggle.com/sashr07/unsupervised-learning-tutorial

# <font color="red"> Exercise </font>

# 1.0 Mean Shift

### Import the library

### Find the number of clusters estimated by Mean Shift

### Fit the Mean Shift model and generate the ct

### Calculate the score using *v_measure_score()*

__Example output:__ 0.6994
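As background for this exercise (not a solution), the idea behind Mean Shift can be sketched in a few lines: start from every sample, repeatedly move to the mean of all samples within a fixed radius (the bandwidth, i.e. a flat kernel) until the point stops moving, then merge the points that converged to the same place. The number of distinct modes is the number of clusters. scikit-learn's `MeanShift` (with `estimate_bandwidth` for choosing the bandwidth) adds seeding and its own merge step, so the helper `mean_shift` below, its parameters and its merge rule (modes closer than half the bandwidth are treated as one) are assumptions for illustration only.

```python
import numpy as np


def mean_shift(X, bandwidth, max_iter=300, tol=1e-3):
    """Hypothetical flat-kernel mean shift. Returns (centers, labels)."""
    X = np.asarray(X, dtype=float)
    modes = []
    for start in X:
        point = start.copy()
        for _ in range(max_iter):
            # Flat kernel: average every sample within `bandwidth` of the current point
            in_window = np.linalg.norm(X - point, axis=1) <= bandwidth
            if not in_window.any():
                break
            new_point = X[in_window].mean(axis=0)
            shift = np.linalg.norm(new_point - point)
            point = new_point
            if shift < tol:
                break
        modes.append(point)

    # Keep one representative for modes that converged to (nearly) the same place
    centers = []
    for mode in modes:
        if all(np.linalg.norm(mode - c) > bandwidth / 2 for c in centers):
            centers.append(mode)
    centers = np.array(centers)

    # Assign every sample to its nearest center
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)


X_toy = np.array([[1.0], [1.2], [0.8], [8.0], [8.3], [7.9]])
centers, labels = mean_shift(X_toy, bandwidth=2.0)
print(len(centers), labels)  # 2 [0 0 0 1 1 1]
```

Unlike k-means, the number of clusters is not given in advance; it falls out of the bandwidth.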

# 2.0 GMM

### Import library

### Fit GMM model

### Generate the ct

### Calculate the score using *v_measure_score()*

__Example output:__ 0.8997
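As background for this exercise (not a solution), a Gaussian mixture model is usually fitted with expectation-maximization (EM): the E-step computes each component's responsibility for each sample, and the M-step re-estimates the component weights, means and variances from those soft assignments. The 1-D sketch below is a hypothetical helper, `gmm_em_1d`; its quantile-based initialisation and small variance floor are assumptions chosen to keep the toy example deterministic. scikit-learn's `GaussianMixture` (in `sklearn.mixture`) works in any dimension and initialises with k-means by default.

```python
import numpy as np


def gmm_em_1d(x, n_components, n_iter=100):
    """Hypothetical helper: fit a 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, responsibilities).
    """
    x = np.asarray(x, dtype=float)
    # Deterministic start: means at evenly spaced quantiles, shared variance, equal weights
    means = np.quantile(x, np.linspace(0, 1, n_components + 2)[1:-1])
    variances = np.full(n_components, x.var())
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample, shape (n, k)
        diff = x[:, None] - means[None, :]
        dens = np.exp(-0.5 * diff ** 2 / variances) / np.sqrt(2 * np.pi * variances)
        weighted = weights * dens
        resp = weighted / weighted.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means and variances from the soft assignments
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        # Small floor keeps a variance from collapsing to zero
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6

    return weights, means, variances, resp


x_toy = np.array([1.0, 1.1, 0.9, 1.05, 5.0, 5.1, 4.9, 4.95])
weights, means, variances, resp = gmm_em_1d(x_toy, n_components=2)
print(means, resp.argmax(axis=1))
```

Taking `resp.argmax(axis=1)` turns the soft assignments into hard cluster labels that can be compared with the species in a crosstab.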

# 3.0 Agglomerative Hierarchical Clustering

### Import library

### Fit the model

### Generate the ct

### Calculate the score using *v_measure_score()*

__Example output:__ 0.7701
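As background for this exercise (not a solution), agglomerative clustering works bottom-up: every sample starts as its own cluster and the two closest clusters are merged until the requested number remains. How "closest" is measured is the linkage. The naive sketch below uses single linkage (distance between the nearest pair of members); the helper name is hypothetical, and note that scikit-learn's `AgglomerativeClustering` uses Ward linkage by default, so its results can differ.

```python
import numpy as np


def agglomerative_single_linkage(X, n_clusters):
    """Hypothetical helper: naive bottom-up clustering with single linkage."""
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]
    # Pairwise Euclidean distances between samples
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge cluster b into cluster a (b > a, so index a stays valid)
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


X_toy = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]])
print(agglomerative_single_linkage(X_toy, n_clusters=3))  # [0 0 1 1 2]
```

This version is cubic per merge and only suitable for tiny inputs; library implementations use much more efficient update rules.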


# K-means (Image)

![](https://i163.photobucket.com/albums/t281/kyin_album/k.png)

[OpenCV Kmeans](https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_ml/py_kmeans/py_kmeans_opencv/py_kmeans_opencv.html)

In [None]:
import numpy as np
import cv2

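# Read the image as a single-channel (grayscale) array; imread returns None if the file is missing
img = cv2.imread('Cameraman.png', cv2.IMREAD_GRAYSCALE)
if img is None:
    raise FileNotFoundError('Cameraman.png not found')

# One row per pixel, one column (the intensity)
Z = img.reshape((-1,1))

# convert to np.float32 (cv2.kmeans requires float32 input)
Z = np.float32(Z)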

#cv2.imshow('img',img)
#cv2.waitKey(0)
#cv2.destroyAllWindows()

In [None]:
# Define criteria = ( type, max_iter = 10 , epsilon = 1.0 )
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

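# Choose the initial centres randomly in each attempt
flags = cv2.KMEANS_RANDOM_CENTERS

# Cluster the intensities into K = 4 groups, run 10 attempts and keep the most compact one.
# Returns: ret = compactness (sum of squared distances to the centres),
# label = cluster index of each pixel, center = the K cluster centres
K = 4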
ret,label,center=cv2.kmeans(Z,K,None,criteria,10,flags)
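To see what a single `cv2.kmeans` attempt does with these settings, here is a NumPy sketch of Lloyd's algorithm on 1-D intensities: assign each value to its nearest centre, move each centre to the mean of its members, and stop after `max_iter` iterations or once no centre moves by less than `epsilon`. The helper name `kmeans_1d`, the random initialisation from the data and the exact stopping test are assumptions that loosely mirror the criteria above; this is not OpenCV's implementation, and it does not repeat attempts.

```python
import numpy as np


def kmeans_1d(values, k, max_iter=10, epsilon=1.0, seed=0):
    """Hypothetical helper: Lloyd's k-means on 1-D values.

    Returns (compactness, labels, centers), like the outputs of cv2.kmeans.
    """
    values = np.asarray(values, dtype=np.float32).ravel()
    rng = np.random.default_rng(seed)
    # Random initial centres drawn from the data (cf. cv2.KMEANS_RANDOM_CENTERS)
    centers = rng.choice(values, size=k, replace=False)

    for _ in range(max_iter):
        # Assignment step: nearest centre for every value
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        # Update step: each centre moves to the mean of its members (kept if empty)
        new_centers = np.array(
            [values[labels == j].mean() if np.any(labels == j) else centers[j] for j in range(k)],
            dtype=np.float32,
        )
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < epsilon:
            break

    labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
    # Compactness: sum of squared distances from each value to its centre
    compactness = float(((values - centers[labels]) ** 2).sum())
    return compactness, labels, centers


toy = np.array([0, 1, 2, 100, 101, 102], dtype=np.uint8)
compactness, labels, centers = kmeans_1d(toy, k=2)
print(compactness, np.sort(centers))
```

Assuming `img` was read as grayscale, `kmeans_1d(img, 4)` returns labels that can be turned back into a quantized image with `centers[labels].reshape(img.shape).astype(np.uint8)`, the same way the next cell rebuilds `res2`.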

In [None]:
# Convert the centres back to uint8 and rebuild the image, replacing each pixel by its cluster centre
center = np.uint8(center)
res = center[label.flatten()]
res2 = res.reshape((img.shape))

cv2.imshow('res2',res2)
cv2.waitKey(0)
cv2.destroyAllWindows()