<a href="https://colab.research.google.com/github/M-H-Amini/MachineLearning-TMU/blob/master/MLe_TMU_Lec7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# In The Name Of ALLAH
# Machine Learning *elementary* Course
## Tarbiat Modares University
### Mohammad Hossein Amini (mhamini@aut.ac.ir)
# Lecture 7

<img src="https://drive.google.com/uc?id=144SDpgv7EEy6Og1ZFNIv_nBaugKGiSCE" width="400">



In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
import os
from PIL import Image

Today we're gonna see 2 fun applications of **PCA**. The theory has already been covered in the video lectures, so let's see what PCA can do in action.



*   First, we're going to look at some **word embeddings**. In *NLP*, each word is represented as an N-d vector (e.g. a 100-d vector), both to use space efficiently and to capture the meaning of words. Since, as humans, we can't picture high-dimensional spaces, we need a way to see how good our word embeddings are. So, using PCA, we project each embedding onto a 2-d space and plot it.
*   Second, we're going to do a **face recognition** task using PCA. We'll learn eigenfaces from data and represent each face with a small vector. Then we can run a simple supervised classification algorithm on those vectors to find out whose face we are facing :)



#  Word Embeddings

In [None]:
!wget https://nlp.stanford.edu/data/glove.6B.zip

In [None]:
!unzip -q glove.6B.zip

In [None]:
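# The GloVe vocabulary contains non-ASCII tokens, so read the file as UTF-8 explicitly.
with open('glove.6B.100d.txt', encoding='utf-8') as f:
  lines = f.readlines()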

In [None]:
my_dict = dict()
for line in lines:
  splitted_line = line.split()
  word = splitted_line[0]
  vec = np.array([float(i) for i in splitted_line[1:]])
  my_dict[word] = vec

In [None]:
words = []
vecs = []
for word, vec in my_dict.items():
  words.append(word)
  vecs.append(vec)
vecs = np.array(vecs)

In [None]:
print(len(words), vecs.shape)

In [None]:
pca = PCA(2).fit(vecs)

In [None]:
X = pca.transform(vecs)
print(X.shape)
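
Projecting 100-d vectors onto 2 components throws away a lot of information, so it is worth checking how much variance the two components actually keep. The cell below is a minimal sketch on synthetic data (the names `demo_vecs`, `demo_pca` and `demo_2d` are made up for illustration and are not part of the notebook). The same attribute, `explained_variance_ratio_`, is available on the `pca` object fitted above.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for an embedding matrix (assumption: 1000 "words", 100-d),
# built from 5 hidden factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))
mixing = rng.normal(size=(5, 100))
demo_vecs = latent @ mixing + 0.1 * rng.normal(size=(1000, 100))

# Fit a 2-component PCA and project the data, as done for the GloVe vectors.
demo_pca = PCA(n_components=2).fit(demo_vecs)
demo_2d = demo_pca.transform(demo_vecs)

# Fraction of the total variance captured by each of the two components.
ratios = demo_pca.explained_variance_ratio_
print(demo_2d.shape)
print(ratios, ratios.sum())
```

If the sum is small, the 2-d plot is only a rough view of the embeddings, and points that look close in the plot may still be far apart in the original space.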

In [None]:
# A few groups of related words: drinks, food, animals, countries, study, and schools.
# ('monkey' was listed twice in the original list; the duplicate is removed.)
chosen_words = ['coffee', 'tea', 'water',
                'spaghetti', 'borscht', 'hamburger', 'pizza', 'falafel', 'sushi', 'meatballs',
                'dog', 'horse', 'cat', 'monkey', 'parrot', 'koala', 'lizard',
                'frog', 'toad', 'ape', 'kangaroo', 'wombat', 'wolf',
                'france', 'germany', 'hungary', 'luxembourg', 'australia', 'fiji', 'china',
                'homework', 'assignment', 'problem', 'exam', 'test', 'class',
                'school', 'college', 'university', 'institute']

chosen_indxs = [words.index(word) for word in chosen_words]
X_chosen = X[chosen_indxs, :]

In [None]:
plt.figure(figsize=(12, 12))
plt.plot(X_chosen[:, 0], X_chosen[:, 1], 'bo')
for i in range(len(chosen_words)):
  plt.text(X_chosen[i, 0] + 0.05, X_chosen[i, 1] + 0.05, chosen_words[i])
plt.show()

# Face Recognition

In [None]:
!wget http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz
!tar -xzf lfw-funneled.tgz

In [None]:
main_folder = 'lfw_funneled'
sub_folders = [f for f in os.listdir(main_folder) if os.path.isdir(os.path.join(main_folder, f))]
files_dict = {folder:[os.path.join(main_folder, folder, f) for f in os.listdir(os.path.join(main_folder, folder))] for folder in sub_folders}
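# Print a summary instead of dumping every file path.
print(len(files_dict), 'people,', sum(len(paths) for paths in files_dict.values()), 'images')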

In [None]:
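def processImage(path):
  # Load the image, convert it to grayscale ('L'), shrink it to 40x40,
  # and flatten it into a (1600, 1) column vector.
  return np.array(Image.open(path).convert('L').resize((40, 40))).reshape((-1, 1))

# Collect one column vector per image, across all people.
X = []
for person in files_dict:
  for path in files_dict[person]:
    X.append(processImage(path))

# Stack into a single array of shape (n_images, 1600, 1).
X = np.array(X)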

In [None]:
X = X[:, :, 0]
print(X.shape)

In [None]:
pca = PCA(16)
X_new = pca.fit_transform(X)

In [None]:
efaces = pca.components_
print(efaces.shape)
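
To make "represent each face with a small vector" concrete: each image is stored as 16 numbers, and an approximation of the original image is the mean face plus a weighted sum of the eigenfaces. The cell below is a small sketch of that on random stand-in data (the names `demo_faces`, `face_pca` and `codes` are illustrative, and the random array is only a placeholder for the real flattened images).

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for 200 flattened 40x40 grayscale images (assumption: random pixels).
rng = np.random.default_rng(0)
demo_faces = rng.random((200, 1600))

# Compress every image to a 16-d code.
face_pca = PCA(n_components=16)
codes = face_pca.fit_transform(demo_faces)

# Reconstruction by hand: mean face + weighted sum of the 16 eigenfaces.
manual = codes @ face_pca.components_ + face_pca.mean_
# scikit-learn's built-in equivalent.
builtin = face_pca.inverse_transform(codes)

print(codes.shape, manual.shape)
print(np.allclose(manual, builtin))
```

On real faces, reshaping one row of the reconstruction back to `(40, 40)` and showing it with `plt.imshow` gives a blurry version of the original image.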

In [None]:
def showEfaces(efaces, r=4, c=4):
  plt.figure(figsize=(12,12))
  for i in range(r):
    for j in range(c):
      plt.subplot(r, c, i * c + j + 1)
      plt.imshow(efaces[i * c + j].reshape((40, 40)), cmap='gray')
      plt.axis('off')
  plt.show()

showEfaces(efaces)
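
The introduction mentions running a simple supervised classifier on the small PCA vectors, but the cells above stop at the eigenfaces. The cell below is a hedged sketch of that last step, assuming a k-nearest-neighbours classifier (the notebook does not say which algorithm was intended). To stay self-contained it uses scikit-learn's bundled digits images as a stand-in for the faces. For LFW, the labels would have to be collected from the folder names in `files_dict`, which the loading loop above does not do.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in image dataset (assumption): 8x8 digit images flattened to 64-d rows.
digits = load_digits()
X_img, y = digits.data, digits.target

# Hold out a test set so accuracy is measured on unseen images.
X_train, X_test, y_train, y_test = train_test_split(
    X_img, y, test_size=0.25, random_state=0, stratify=y)

# Learn the PCA basis on the training images only, then encode both splits.
clf_pca = PCA(n_components=16).fit(X_train)
Z_train = clf_pca.transform(X_train)
Z_test = clf_pca.transform(X_test)

# Classify each 16-d code by its 3 nearest training codes.
knn = KNeighborsClassifier(n_neighbors=3).fit(Z_train, y_train)
accuracy = knn.score(Z_test, y_test)
print(accuracy)
```

Fitting PCA on the training split only keeps information from the test images out of the learned basis. With faces, many people in LFW have a single image, so it would make sense to keep only people with several images before splitting.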