<a href="https://colab.research.google.com/github/26medias/TF-Face-Angle-Translation/blob/master/Face_Position_Dataset_Builder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Face Angle Dataset Generator


## Credits

Face extraction built thanks to https://machinelearningmastery.com/how-to-perform-face-recognition-with-vggface2-convolutional-neural-network-in-keras/

## How this works

1. Download movie trailers
2. Extract the frames from the video files
3. Extract the faces from the images
4. Cluster the faces by actor
5. Build & save the facial landmarks for each face
6. Build the dataset
7. Zip & upload the dataset to Google Storage

## Downloading videos, extracting the frames

We're going to download movie trailers from https://www.davestrailerpage.co.uk/

The frames from the video files will be extracted and saved to file.

## Downloading videos, extracting the frames

In [1]:
import requests
import ntpath
import cv2
import os, sys

# The variables
DIR_VIDEOS = "videos"
DIR_IMAGES = "images"
CAPTURE_FTP  = 10 # We'll extract 10 images per second of video

if not os.path.isdir(DIR_VIDEOS):
  os.mkdir(DIR_VIDEOS, 755);
if not os.path.isdir(DIR_IMAGES):
  os.mkdir(DIR_IMAGES, 755);

# The methods
# Dowload a video from a url
def downloadFile(url):
  myfile = requests.get(url)
  filename = DIR_VIDEOS+"/"+ntpath.basename(url)
  open(filename, 'wb').write(myfile.content)
  return filename

# Export the frames out of a video at a specific fps
def videoToImages(filename, capture_fps=1):
  basename = os.path.splitext(ntpath.basename(filename))[0]
  print("basename:", basename)
  if not os.path.isdir(DIR_IMAGES+"/"+basename):
    os.mkdir(DIR_IMAGES+"/"+basename, 755)
  cap = cv2.VideoCapture(filename)
  # Get the video's FPS
  fps = cap.get(cv2.CAP_PROP_FPS)
  # How many frames between capture?
  skipFrame = round(fps/capture_fps)
  print(basename, ": fps: ",fps," / skipFrame: ", skipFrame)
  i = 0
  while(cap.isOpened()):
      ret, frame = cap.read()
      if ret == False:
          break
      i+=1
      if (i % skipFrame == 0):
        continue
      cv2.imwrite(DIR_IMAGES+"/"+basename+'/'+str(round((i-1)/fps,2))+'sec.jpg',frame)
  cap.release()
  cv2.destroyAllWindows()
  print(basename, " processed.")

# Download a video then extract the frames
def remoteVideoToImages(url):
  videoFilename = downloadFile(url)
  videoToImages(videoFilename, CAPTURE_FTP)

remoteVideoToImages("http://trailers.apple.com/movies/paramount/terminator-dark-fate/terminator-dark-fate-trailer-2_h480p.mov")

basename: terminator-dark-fate-trailer-2_h480p
terminator-dark-fate-trailer-2_h480p : fps:  23.976023976023978  / skipFrame:  2
terminator-dark-fate-trailer-2_h480p  processed.


## Find & extract the faces from the video frames

Import the dependencies

In [2]:
!pip install git+https://github.com/rcmalli/keras-vggface.git
!pip show keras-vggface
!pip install matplotlib
!pip install mtcnn

Collecting git+https://github.com/rcmalli/keras-vggface.git
  Cloning https://github.com/rcmalli/keras-vggface.git to /tmp/pip-req-build-shl7ql4n
  Running command git clone -q https://github.com/rcmalli/keras-vggface.git /tmp/pip-req-build-shl7ql4n
Building wheels for collected packages: keras-vggface
  Building wheel for keras-vggface (setup.py) ... [?25l[?25hdone
  Created wheel for keras-vggface: filename=keras_vggface-0.6-cp36-none-any.whl size=8311 sha256=f0064b53f29a0a76f6848a72b320dea13ae5e222968ff7319fa189e334042c19
  Stored in directory: /tmp/pip-ephem-wheel-cache-foewfdtc/wheels/36/07/46/06c25ce8e9cd396dabe151ea1d8a2bc28dafcb11321c1f3a6d
Successfully built keras-vggface
Installing collected packages: keras-vggface
Successfully installed keras-vggface-0.6
Name: keras-vggface
Version: 0.6
Summary: VGGFace implementation with Keras framework
Home-page: https://github.com/rcmalli/keras-vggface
Author: Refik Can MALLI
Author-email: mallir@itu.edu.tr
License: MIT
Location: /usr/

We're going to use VGGFace2 to find & extract the faces

In [0]:
import matplotlib.pyplot as pyplot
import glob
import keras_vggface
import mtcnn
from PIL import Image
from numpy import asarray
from mtcnn.mtcnn import MTCNN
from pathlib import Path

# The variables
DIR_FACES = "faces"

if not os.path.isdir(DIR_FACES):
  os.mkdir(DIR_FACES, 755);

# The methods
# Get the directory of a filename
def getDir(filename):
  p = Path(filename);
  return p.parts[len(p.parts)-2]

# Extract a single face from an image
def findFaces(filename):
	# load image from file
	pixels = pyplot.imread(filename)
	# create the detector, using default weights
	detector = MTCNN()
	# detect faces in the image
	return (pixels, detector.detect_faces(pixels))

def extractFacesFromImage(filename, required_size=(224, 224), limit=50):
  (pixels, results) = findFaces(filename)
  faces = []
  for i,faceData in enumerate(results):
    if len(faces) > limit:
      break
    x1, y1, width, height = faceData['box']
    x2, y2 = x1 + width, y1 + height
    # extract the face
    face = pixels[y1:y2, x1:x2]
    # resize pixels to the model size
    try:
      image = Image.fromarray(face)
      image = image.resize(required_size)
      face_array = asarray(image)
      faces.append(face_array)
      if limit==1:
        return face_array
    except:
      print("Face processing failed")
  if limit==1 and len(faces)==0:
    return False
  return faces;


# Extract faces from images in a directory & its subdirectories
def extractFacesFromDirectory(directory, outputDirectory):
  filenames = glob.glob(directory+'/*/*.jpg')
  for i,filename in enumerate(filenames):
    dirname  = getDir(filename)
    basename = os.path.splitext(ntpath.basename(filename))[0]
    faces = extractFacesFromImage(filename);
    print(filename, "Faces: ", len(faces))
    if len(faces) > 0:
      n = 0
      for face in faces:
        try:
          im = Image.fromarray(face)
          im.save(outputDirectory+'/'+dirname+'_'+basename+'-'+str(n)+'.jpg')
        except:
          print("Couldn't save", outputDirectory+'/'+dirname+'_'+basename+'-'+str(n)+'.jpg')
        n = n+1


#extractFacesFromDirectory(DIR_IMAGES, DIR_FACES)



Save the faces to Cloud Storage

In [16]:
!tar -zcvf faces.tar.gz faces

faces/
faces/terminator-dark-fate-trailer-2_h480p_105.61sec-1.jpg
faces/terminator-dark-fate-trailer-2_h480p_38.87sec-0.jpg
faces/terminator-dark-fate-trailer-2_h480p_94.26sec-0.jpg
faces/terminator-dark-fate-trailer-2_h480p_125.79sec-0.jpg
faces/terminator-dark-fate-trailer-2_h480p_88.76sec-0.jpg
faces/terminator-dark-fate-trailer-2_h480p_109.11sec-0.jpg
faces/terminator-dark-fate-trailer-2_h480p_107.86sec-0.jpg
faces/terminator-dark-fate-trailer-2_h480p_92.34sec-0.jpg
faces/terminator-dark-fate-trailer-2_h480p_108.19sec-1.jpg
faces/terminator-dark-fate-trailer-2_h480p_11.26sec-0.jpg
faces/terminator-dark-fate-trailer-2_h480p_82.5sec-0.jpg
faces/terminator-dark-fate-trailer-2_h480p_119.79sec-0.jpg
faces/terminator-dark-fate-trailer-2_h480p_126.21sec-0.jpg
faces/terminator-dark-fate-trailer-2_h480p_94.26sec-1.jpg
faces/terminator-dark-fate-trailer-2_h480p_92.43sec-1.jpg
faces/terminator-dark-fate-trailer-2_h480p_111.11sec-0.jpg
faces/terminator-dark-fate-trailer-2_h480p_137.55sec-0.jpg

In [0]:
from google.colab import auth
auth.authenticate_user()

In [18]:
!gcloud config set project deep-learning-files
#!gsutil cp gs://tf-face-angle-translation/foo.bar ./foo.bar
!gsutil cp  ./faces.tar.gz gs://tf-face-angle-translation/datasets/faces-terminator.tar.gz

Updated property [core/project].


To take a quick anonymous survey, run:
  $ gcloud alpha survey

Copying file://./faces.tar.gz [Content-Type=application/x-tar]...
/ [1 files][  1.3 MiB/  1.3 MiB]                                                
Operation completed over 1 objects/1.3 MiB.                                      


## Cluster the faces

We want to group all the actors per directory

In [15]:
from matplotlib import pyplot
from PIL import Image
from numpy import asarray
from scipy.spatial.distance import cosine
from mtcnn.mtcnn import MTCNN
from keras_vggface.vggface import VGGFace
from keras_vggface.utils import preprocess_input


# Extract faces and calculate face embeddings for a list of photo files
def get_embeddings(filenames):
	# extract faces
	faces = [extractFacesFromImage(f, limit=1) for f in filenames]
	# convert into an array of samples
	samples = asarray(faces, 'float32')
	# prepare the face for the model, e.g. center pixels
	samples = preprocess_input(samples, version=2)
	# create a vggface model
	model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg')
	# perform prediction
	embeddings = model.predict(samples)
	return embeddings


# Determine if a candidate face is a match for a known face
def is_match(known_embedding, candidate_embedding, threshold=0.6):
	# calculate distance between embeddings
	score = cosine(known_embedding, candidate_embedding)
	return score >= threshold


def clusterFacesFromDirectory(directory):
  filenames = glob.glob(directory+'/*.jpg')
  embeddings = get_embeddings(filenames)
  print(len(embeddings))
    
clusterFacesFromDirectory(DIR_FACES)

Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed
Face processing failed


ValueError: ignored