In [0]:
%matplotlib inline
from pylab import *

# Principal Component Analysis



---
## Get the data

* Load the Olivetti Face dataset
* Import the smile/no smile reference data

In [0]:
from sklearn import datasets
faces = datasets.fetch_olivetti_faces()
faces.keys()

In [0]:
# Display some images
for i in range(10):
    face = faces.images[i]
    subplot(1, 10, i + 1)
    imshow(face.reshape((64, 64)), cmap='gray')
    axis('off')
    
print(faces.target)



---

## Feature extraction

* Compute Histogram of Oriented Gradients (HOG) features on **all images**
* Understand what HOG features are
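
To build intuition for what a HOG descriptor encodes, here is a minimal sketch of its core step only: a magnitude-weighted histogram of gradient orientations for a single cell. The helper name `cell_orientation_histogram` and the toy 8x8 cell are assumptions made for illustration; `skimage.feature.hog` additionally groups cells into blocks and normalizes them, which this sketch leaves out.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    # np.gradient returns the derivative along rows (y) first, then columns (x)
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation folded into [0, 180) degrees
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(orientation, bins=n_bins,
                           range=(0.0, 180.0), weights=magnitude)
    return hist

# Toy cell with a vertical edge: intensity changes along x only
cell = np.zeros((8, 8))
cell[:, 4:] = 1.0
hist = cell_orientation_histogram(cell)
print(hist)
```

Because the intensity only changes horizontally, all the gradient energy falls into the first orientation bin. A face image produces a richer mix of orientations in each cell, and the concatenation of all cell histograms forms the feature vector.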

In [0]:
from __future__ import division, print_function
from time import time

import numpy as np
import matplotlib.pyplot as plt

from skimage import feature

# Compute HoG features
hog_vec = []
hog_vis = []
for i in range(len(faces.images)):
  image = faces.images[i]
  # `visualize` is the spelling accepted by recent scikit-image releases
  hvec, hvis = feature.hog(image, visualize=True)
  hog_vec.append(hvec)
  hog_vis.append(hvis)

print('HOG feature matrix shape (n_images, n_features):', np.array(hog_vec).shape)

In [0]:
# Understand HOG features
from random import randint
ii = randint(0, len(faces.images) - 1)  # randint includes both endpoints
print(len(faces.images), ii)

fig, ax = plt.subplots(1, 2, figsize=(12, 6),
                       subplot_kw=dict(xticks=[], yticks=[]))
ax[0].imshow( faces.images[ii], cmap='gray')
ax[0].set_title('input image')

ax[1].imshow(hog_vis[ii])
ax[1].set_title('visualization of HOG features');

In [0]:
print(hog_vec[ii])
print(hog_vec[ii].shape)
print( np.max(hog_vec) )
print( np.min(hog_vec) )



---

## Principal Component Analysis

HOG features on faces = many dimensions!

* Compute a PCA of the HOG features

Info: [sklearn PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)

* Display 2 randomly chosen dimensions of the original features
* Display the dataset along its first 2 principal components
* Repeat the 2 previous displays, coloring each point by person (class label)

Info: [matplotlib.plot](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html),  [matplotlib.scatter](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.scatter.html)
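
As a sketch of the scikit-learn PCA API, the example below projects a synthetic feature matrix onto its first two principal components. The matrix `X` is a hypothetical stand-in for the HOG features (100 samples, 50 features that mostly vary along 2 hidden directions), so the numbers it prints say nothing about the face data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-in for the HOG matrix: 2 hidden directions plus small noise
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(100, 50))

# Fit the PCA and project every sample onto the first 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)
# Fraction of the total variance captured by each retained component
print(pca.explained_variance_ratio_)
```

The columns `X_2d[:, 0]` and `X_2d[:, 1]` are the coordinates to plot, and `explained_variance_ratio_` indicates how much information two components retain.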

In [0]:
# PCA from scikit-learn


In [0]:
# Display 2 dimensions picked at random


In [0]:
# Display 2 first principal components after PCA transform


The next cell defines a colormap that is useful with [matplotlib.scatter](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.scatter.html)
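
A minimal sketch of how a colormap combines with `scatter` to give one color per class. The points and labels are synthetic (5 hypothetical classes of 20 points each); with the face data, the labels would presumably be `faces.target`.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical data: 5 classes, 20 points each, every class shifted along the diagonal
n_classes = 5
labels = np.repeat(np.arange(n_classes), 20)
points = rng.normal(size=(labels.size, 2)) + labels[:, None]

# `c` takes one value per point; `cmap` maps those values to colors
fig, ax = plt.subplots()
sc = ax.scatter(points[:, 0], points[:, 1], c=labels, cmap=plt.cm.jet, s=15)
fig.colorbar(sc, ax=ax, label='class index')
```

Passing integer labels through `c` is enough for `scatter` to spread the classes over the colormap range.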

In [0]:
# define the colormap
cmap = plt.cm.jet
# extract all colors from the .jet map
cmaplist = [cmap(i) for i in range(cmap.N)]
# create the new map
cmap = cmap.from_list('Custom cmap', cmaplist, cmap.N)


In [0]:
# Display 2 dimensions picked at random with a color per person


In [0]:
# Display 2 first principal components after PCA transform with a color per person




---

# Clustering: image segmentation with k-means

Images = many pixels!

* Load an image and convert it to a NumPy array (N x M x 3)
* Code a k-means algorithm on the pixel colors
* Find color prototypes
* Back-project color prototypes instead of original colors into the image

In [0]:
%matplotlib inline
from pylab import *
import matplotlib.pyplot as plt

In [0]:
# Load an image
import numpy as np
from sklearn.datasets import load_sample_image
from skimage.transform import resize

china = load_sample_image("china.jpg")

# Convert to floats, in [0-1]
china = np.array(china, dtype=np.float64) / 255
print(china.shape)

# Subsample
china = resize(china, (200, 300))

#china = china[140:240,50:210,:]

# Display image
plt.figure(1)
plt.clf()
plt.axis('off')
plt.title('Original image (subsampled to 200x300)')
plt.imshow(china)
plt.show()

w, h, d = original_shape = tuple(china.shape)
print(w, h, d)

Create the variables for the k-means: samples, memberships, prototypes, etc.

Initialization of prototypes: see https://en.wikipedia.org/wiki/K-means_clustering

Info: [random.randrange](https://docs.python.org/3/library/random.html)
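
One common choice, sketched here under stated assumptions, is to pick distinct samples at random as the starting prototypes (often called Forgy initialization). The array `samples` is a hypothetical stand-in for the pixel array, and the names `n_clusters`, `prototypes` and `memberships` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the (n_pixels, 3) array of pixel colors
samples = rng.random((1000, 3))
n_clusters = 4

# Draw n_clusters distinct sample indices and copy those samples as prototypes
idx = rng.choice(len(samples), size=n_clusters, replace=False)
prototypes = samples[idx].copy()

# One cluster index per sample, to be filled in by the first assignment step
memberships = np.zeros(len(samples), dtype=int)

print(prototypes.shape, memberships.shape)
```

Sampling without replacement avoids picking the same index twice, which would otherwise start two clusters at the same position.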

In [0]:
# nb clusters
N = 16

# data samples = pixels
china_pixels = np.reshape(china, (w*h,d))


# memberships
u = ...

# prototypes: initialized as actual pixels drawn at random (also try other initializations)
p = ...


Code the k-means loop, which alternates between estimating:
* the memberships of points to clusters (based on distance to the prototypes), and
* the prototypes (weighted average of the points belonging to each cluster)

Compute the loss (objective function) at each iteration.

Info: [linalg.norm](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.linalg.norm.html), [np.sum](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.sum.html), [np.argmin](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.argmin.html), [np.average](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.average.html)
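
The two alternating steps can be sketched on toy 2-D data as follows. This is one possible arrangement, not the expected solution: the helper names `assign` and `kmeans`, the two synthetic blobs and the hand-picked starting prototypes are all assumptions made for the example.

```python
import numpy as np

def assign(samples, prototypes):
    """Return each sample's nearest-prototype index and its distance to it."""
    # Pairwise distances, shape (n_samples, n_prototypes), via broadcasting
    dist = np.linalg.norm(samples[:, None, :] - prototypes[None, :, :], axis=2)
    memberships = np.argmin(dist, axis=1)
    return memberships, dist[np.arange(len(samples)), memberships]

def kmeans(samples, init_prototypes, max_iter=10):
    prototypes = np.array(init_prototypes, dtype=float)  # work on a copy
    losses = []
    for _ in range(max_iter):
        # Assignment step, followed by the loss for this assignment
        memberships, nearest = assign(samples, prototypes)
        losses.append(float(np.sum(nearest ** 2)))
        # Update step: each prototype becomes the mean of its members
        for j in range(len(prototypes)):
            members = samples[memberships == j]
            if len(members) > 0:  # an empty cluster keeps its old prototype
                prototypes[j] = members.mean(axis=0)
    # Final assignment so memberships match the returned prototypes
    memberships, _ = assign(samples, prototypes)
    return prototypes, memberships, losses

# Toy data: two well-separated blobs of 50 points each
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(50, 2))
samples = np.vstack([blob_a, blob_b])

# One starting prototype taken from each blob
prototypes, memberships, losses = kmeans(samples, samples[[0, 50]])
print(losses)
```

The loss, taken here as the sum of squared distances to the assigned prototype, should never increase from one iteration to the next, which makes it a convenient stopping criterion and a sanity check.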

In [0]:
from numpy import linalg as LA

max_iter = 10


dist = ...
previous_loss = ...


for k in range(0,max_iter):
  ...
  
    



Compute quantized version of the image, i.e. replace each pixel with the prototype of the cluster it belongs to.

Display the original and quantized images. Run with various numbers of clusters.

Info: [np.copy](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.copy.html), [np.reshape](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.reshape.html), [plt.imshow](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html)
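
Back-projection amounts to a table lookup followed by a reshape. The sketch below uses a hypothetical 2x3 RGB image, two hand-written prototypes and a hand-written membership vector, all chosen for illustration only.

```python
import numpy as np

# Hypothetical toy image: 2 rows, 3 columns, RGB
image = np.array([[[0.9, 0.1, 0.1], [0.8, 0.2, 0.1], [0.1, 0.1, 0.9]],
                  [[1.0, 0.0, 0.0], [0.0, 0.1, 0.8], [0.2, 0.0, 1.0]]])

prototypes = np.array([[0.9, 0.1, 0.1],   # a "red" prototype
                       [0.1, 0.1, 0.9]])  # a "blue" prototype

# One cluster index per pixel, in the row-major order produced by reshape
memberships = np.array([0, 0, 1, 0, 1, 1])

# Integer-array indexing looks up the prototype of every pixel;
# reshape then restores the original image layout
quantized = prototypes[memberships].reshape(image.shape)
print(quantized.shape)
```

Indexing `prototypes` with the membership vector returns a new array, so the original image is left untouched.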

In [0]:
# compute quantized version of the image
...


# Display all results, alongside original image
plt.figure(1)
plt.clf()
plt.axis('off')
plt.title('Original image (subsampled to 200x300)')
plt.imshow(china)

# Quantized image by k-means
plt.figure(2)
plt.clf()
plt.axis('off')
plt.title('Quantized image ({} colors)'.format(N))
plt.imshow(china_qtzd)

