**PART 7 - DSIM project**

In this section, we test our models on new images captured during a live webcam session.

**Authors:** 
* Francesca De Cola, matricola 819343  CdLM: Data Science
* Valentina Moretto, matricola 853744  CdLM: Data Science
* Valentina Zangirolami, matricola 819451  CdLM: Scienze Statistiche ed Economiche (CLAMSES)

Procedure summary:
1. Start a live session and extract a frame whenever a face can be detected.
2. The model receives this frame and predicts the expression.
3. The results are shown in the live cam window as a label, the accuracy and the corresponding emoji.

**Load packages**

In [1]:
import numpy as np
import os
import math
import time
import matplotlib.pyplot as plt

import cv2 
import cv2 as cv
from cv2 import VideoCapture as cap

from keras.models import load_model, Model
from keras_vggface.utils import preprocess_input
from keras_vggface.vggface import VGGFace

Using TensorFlow backend.


The following lines define two helpers: **image_resize** and **CFEVideoConf**. These are normally imported from the *utils* package, but because of some problems we define them directly in the following chunk.

They allow us to:
* image_resize: resize images. In particular, we use it to resize the emoji images.
* CFEVideoConf: configure the video capture (resolution and output video type).
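
To make the behaviour of image_resize concrete, the sketch below reproduces only its size arithmetic, without OpenCV. The helper name `scaled_dims` is hypothetical and is not part of the notebook; it returns the `(width, height)` pair that image_resize would pass to `cv2.resize`.

```python
def scaled_dims(w, h, width=None, height=None):
    """Mirror the dimension logic of image_resize for an image of size w x h."""
    # No target size given: the image is left unchanged
    if width is None and height is None:
        return (w, h)
    # Only the height is given: scale the width by the same ratio
    if width is None:
        r = height / float(h)
        return (int(w * r), height)
    # The width is given: scale the height (a height argument is ignored here)
    r = width / float(w)
    return (width, int(h * r))


print(scaled_dims(512, 512, width=56))    # (56, 56)
print(scaled_dims(640, 480, height=120))  # (160, 120)
```

Note that when both `width` and `height` are given, the width takes precedence, so the aspect ratio is always preserved.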

In [2]:
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
    # initialize the dimensions of the image to be resized and
    # grab the image size
    dim = None
    (h, w) = image.shape[:2]
    # if both the width and height are None, then return the
    # original image
    if width is None and height is None:
        return image
    # check to see if the width is None
    if width is None:
        # calculate the ratio of the height and construct the
        # dimensions
        r = height / float(h)
        dim = (int(w * r), height)
    # otherwise, the height is None
    else:
        # calculate the ratio of the width and construct the
        # dimensions
        r = width / float(w)
        dim = (width, int(h * r))

    # resize the image
    resized = cv2.resize(image, dim, interpolation = inter)
    # return the resized image
    return resized

class CFEVideoConf(object):
    # Standard Video Dimensions Sizes
    STD_DIMENSIONS =  {
        "360p": (480, 360),
        "480p": (640, 480),
        "720p": (1280, 720),
        "1080p": (1920, 1080),
        "4k": (3840, 2160),
    }
    # Video Encoding, might require additional installs
    # Types of Codes: http://www.fourcc.org/codecs.php
    VIDEO_TYPE = {
        'avi': cv2.VideoWriter_fourcc(*'XVID'),
        #'mp4': cv2.VideoWriter_fourcc(*'H264'),
        'mp4': cv2.VideoWriter_fourcc(*'XVID'),
    }

    width           = 640
    height          = 480
    dims            = (640, 480)
    capture         = None
    video_type      = None
    def __init__(self, capture, filepath, res="480p", *args, **kwargs):
        self.capture = capture
        self.filepath = filepath
        self.width, self.height = self.get_dims(res=res)
        self.video_type = self.get_video_type()

    # Set resolution for the video capture
    # Function adapted from https://kirr.co/0l6qmh
    def change_res(self, width, height):
        self.capture.set(3, width)
        self.capture.set(4, height)

    def get_dims(self, res='480p'):
        width, height = self.STD_DIMENSIONS['480p']
        if res in self.STD_DIMENSIONS:
            width, height = self.STD_DIMENSIONS[res]
        self.change_res(width, height)
        self.dims = (width, height)
        return width, height

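    def get_video_type(self):
        filename, ext = os.path.splitext(self.filepath)
        # splitext keeps the leading dot ('.avi'), the dictionary keys do not
        ext = ext.lower().lstrip('.')
        if ext in self.VIDEO_TYPE:
            return self.VIDEO_TYPE[ext]
        return self.VIDEO_TYPE['avi']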
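
One detail of get_video_type is worth illustrating: `os.path.splitext` returns the extension with its leading dot, while the keys of VIDEO_TYPE ('avi', 'mp4') have none, so the dot has to be stripped before the lookup. The snippet below is a minimal sketch of that lookup; `extension_key` and `VIDEO_EXTENSIONS` are hypothetical names used only for illustration.

```python
import os

# Same keys as CFEVideoConf.VIDEO_TYPE (the codec values are omitted here)
VIDEO_EXTENSIONS = {"avi", "mp4"}


def extension_key(filepath):
    """Return the extension without the dot, falling back to 'avi'."""
    _, ext = os.path.splitext(filepath)
    key = ext.lower().lstrip(".")
    return key if key in VIDEO_EXTENSIONS else "avi"


print(os.path.splitext("session.mp4"))  # ('session', '.mp4')
print(extension_key("session.mp4"))     # mp4
print(extension_key("session.mkv"))     # avi
```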

**Load model**

First, we load four models:
* model_ft1 and model_ft2: two models obtained with fine-tuning techniques. They are the two best models from the notebook Fine_tuning_4.ipynb.
* model_cnn1: our CNN trained on the best cut of VGGFace (layer: add_12). It is the last model from the notebook CNN_5.
* base_model: the pretrained VGGFace network. model_cnn1 needs it, because we use it to extract the image features that are then passed to model_cnn1.

In [3]:
model_ft1=load_model('C:/Users/valen/Desktop/MAGISTRALE/DSIM/aml/weights-tmp4.best.hdf5')
model_ft2=load_model('C:/Users/valen/Desktop/MAGISTRALE/DSIM/aml/weights-tmp3.best.hdf5')
model_cnn1=load_model('C:/Users/valen/Desktop/MAGISTRALE/DSIM/aml_cnn/cnn_best.h5')
base_model = VGGFace(include_top = False, input_shape = (224, 224, 3), model='senet50')

We define **label_emoji**, a function that receives an image as input and preprocesses it with the VGGFace preprocessing function (the same preprocessing applied to these models in the previous notebooks). It then builds an ensemble of the three models:
1. We extract the image features with VGGFace and pass the result to model_cnn1.
2. The other two models receive the preprocessed image directly, as described in the previous chunk.
3. We then compute a weighted average of the predictions of all models. The weights were evaluated in Final_model_6.ipynb.
4. Finally, we extract the predicted label and the accuracy, computed as the highest ensemble probability expressed as a percentage, and return them.
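
The weighted average in steps 3 and 4 can be sketched with NumPy alone. The weights (0.4, 0.1, 0.5) are the ones used in label_emoji; the three probability vectors are invented for illustration and are not outputs of the real models, and the helper name `ensemble` is hypothetical.

```python
import numpy as np

# Ensemble weights used in label_emoji
WEIGHTS = {"ft1": 0.4, "ft2": 0.1, "cnn": 0.5}


def ensemble(pred_ft1, pred_ft2, pred_cnn):
    """Combine three (1, n_classes) probability arrays into label and score."""
    final_pred = (WEIGHTS["ft1"] * pred_ft1
                  + WEIGHTS["ft2"] * pred_ft2
                  + WEIGHTS["cnn"] * pred_cnn)
    # Index of the most probable class
    label = int(np.argmax(final_pred, axis=1)[0])
    # Highest ensemble probability as a percentage
    score = round(float(np.max(final_pred)) * 100, 3)
    return label, score


# Made-up probabilities over the 7 expressions (assumption, for illustration)
pred_ft1 = np.array([[0.10, 0.05, 0.05, 0.60, 0.10, 0.05, 0.05]])
pred_ft2 = np.array([[0.20, 0.05, 0.05, 0.30, 0.30, 0.05, 0.05]])
pred_cnn = np.array([[0.05, 0.05, 0.10, 0.70, 0.05, 0.03, 0.02]])

label, score = ensemble(pred_ft1, pred_ft2, pred_cnn)
print(label, score)
```

Because the weights sum to 1 and each input is a probability vector, the combined vector is again a probability vector, which is why its maximum can be read as a percentage.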

In [4]:
def label_emoji(img):
    #preprocessing
    img = np.expand_dims(img, axis = 0)
    x = img.astype('float32')
    x = preprocess_input(x, version = 2)
    
    #global variable
    global acc
    global pred
    
    #For the CNN, features are first extracted with VGGFace (optimal cut: add_12)
    model = Model(inputs=base_model.input, outputs=base_model.get_layer('add_12').output)
    predicted = model.predict(x)
    #prediction for each model
    pred_cnn = model_cnn1.predict(predicted)
    pred_ft1 = model_ft1.predict(x)
    pred_ft2 = model_ft2.predict(x)
    
    #ensemble model
    final_pred= 0.4*pred_ft1 + 0.1*pred_ft2 + 0.5*pred_cnn 
    #accuracy
    acc = round(np.max(final_pred)*100, 3)
    
    #predicted labels
    pred = np.argmax(final_pred, axis = 1)

    return pred, acc

We also define **detect**. This function takes as input: the image extracted from the video capture, the classifier file for face detection, counter and rate (which control how often a prediction is made during the video capture), scale_factor and minNeighbors (parameters of the face detector) and size (the size to which the face image is resized).

With this function, we detect the face in the captured frame with the Cascade Classifier (OpenCV) and draw a rectangle around it in the live cam. We also define the label and the emoji associated with the expression, based on the prediction of our model.
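
The association between the predicted class, its label and its emoji can also be written as a dictionary lookup. The sketch below uses the class indices, labels and emoji file names that appear in detect; the names `LABELS`, `EMOJI_FILES` and `describe` are hypothetical, and `pred` is assumed to be already converted to a plain integer.

```python
# Class index -> label, in the order used by detect
LABELS = {0: "Angry", 1: "Disgust", 2: "Fear", 3: "Happy",
          4: "Neutral", 5: "Sad", 6: "Surprise"}

# Class index -> emoji file name (the folder is omitted on purpose)
EMOJI_FILES = {0: "angry.png", 1: "disgust.png", 2: "fear.png",
               3: "happy.png", 4: "neutral.png", 5: "sad.png",
               6: "surprised.png"}


def describe(pred, acc):
    """Return the caption printed above the face and the emoji file name."""
    label = LABELS.get(pred, "Unknown")
    emoji_file = EMOJI_FILES.get(pred)  # None when the class is unknown
    caption = " " + label + ": " + str(acc) + "% "
    return caption, emoji_file


print(describe(3, 62.0))  # (' Happy: 62.0% ', 'happy.png')
print(describe(9, 10.0))  # (' Unknown: 10.0% ', None)
```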

In [5]:
def detect(image, classifier, counter, rate,
           scale_factor = 1.1, minNeighbors = 5,
           size = (224,224)):

    face_cascade = cv.CascadeClassifier(classifier)
    gray = cv.cvtColor(image, cv.COLOR_BGR2GRAY)  # frames read by OpenCV are in BGR order
    faces = face_cascade.detectMultiScale(gray, scale_factor, minNeighbors)
    
    #global variable
    global pred
    global acc

    if len(faces) == 1:
        for (x, y, w, h) in faces:
            roi_gray = gray[y:y+h, x:x+w] # rec
            roi_color = image[y:y+h, x:x+w]
            n_img = cv.resize(roi_color, size)
        
            #color for face, text and rectangle
            color_face = (24,191,255) #gold
            color_text = (255,255,255) #white
            rectangle_bgr = (24,191,255) #gold
            
            #font and size for text
            font = cv.FONT_HERSHEY_TRIPLEX
            font_scale = 0.88
            
            #rectangle for face detection
            cv.rectangle(image, (x,y), (x+w,y+h), color_face, 3)
            
            #get pred and accuracy from the previous function, every `rate`
            #frames or when no prediction is available yet
            if counter % rate == 0 or 'pred' not in globals():
                pred, acc = label_emoji(n_img)
                
            #we define label and emoji to print in live cam
            label=''
            emoji=''
            if pred==0:
                label='Angry'
                emoji = cv.imread("C:/Users/valen/Desktop/magistrale/DSIM/emoji/angry.png", -1)
            elif pred==1:
                label='Disgust'
                emoji = cv.imread("C:/Users/valen/Desktop/magistrale/DSIM/emoji/disgust.png", -1)
            elif pred==2:
                label='Fear'
                emoji = cv.imread("C:/Users/valen/Desktop/magistrale/DSIM/emoji/fear.png", -1)
            elif pred==3:
                label='Happy'
                emoji = cv.imread("C:/Users/valen/Desktop/magistrale/DSIM/emoji/happy.png", -1)
            elif pred==4:
                label='Neutral'
                emoji = cv.imread("C:/Users/valen/Desktop/magistrale/DSIM/emoji/neutral.png", -1)
            elif pred==5:
                label='Sad'
                emoji = cv.imread("C:/Users/valen/Desktop/magistrale/DSIM/emoji/sad.png", -1)
            elif pred==6:
                label='Surprise'
                emoji = cv.imread("C:/Users/valen/Desktop/magistrale/DSIM/emoji/surprised.png", -1)
            else:
                label='Unknown'
            #string to print
            string = " " + label + ": " + str(acc) + "% "
            
            
            #Rectangle for background of string
            (text_width, text_height) = cv.getTextSize(string, font, fontScale=font_scale, thickness=1)[0]
            box_coords = ((x-2, y), (x + text_width + 4, y - text_height - 8))
            cv.rectangle(image, box_coords[0], box_coords[1], rectangle_bgr, cv.FILLED)
            #insert of string
            cv.putText(image, string, (x,y-5), font, font_scale, color_text, 1)
            
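            #overlay the emoji on the bottom-right corner of the face box
            #(skipped when no emoji was loaded or it has no alpha channel)
            if isinstance(emoji, np.ndarray) and emoji.ndim == 3 and emoji.shape[2] == 4:
                emojis = image_resize(emoji.copy(), width = w//4)
                
                #shape of the face region and of the resized emoji (rows, cols)
                faces_h, faces_w = roi_color.shape[:2]
                gh, gw = emojis.shape[:2]
                
                offset = 10
                h_offset = faces_h - gh - offset
                w_offset = faces_w - gw - offset
                
                if h_offset >= 0 and w_offset >= 0:
                    #copy only the pixels whose alpha value is not zero
                    target = roi_color[h_offset:h_offset + gh, w_offset:w_offset + gw]
                    mask = emojis[:, :, 3] != 0
                    target[mask] = emojis[:, :, :3][mask]
            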
            
    else:
        pass
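
The emoji overlay at the end of detect copies only the non-transparent pixels of the emoji onto the face region. A minimal NumPy sketch of that idea is given below, using a boolean mask on the alpha channel instead of a pixel-by-pixel loop; `overlay_rgba` is a hypothetical helper and the arrays are synthetic stand-ins for the face region and the emoji.

```python
import numpy as np


def overlay_rgba(region, sticker, offset=10):
    """Paste a 4-channel sticker near the bottom-right corner of a 3-channel region."""
    region_h, region_w = region.shape[:2]
    sticker_h, sticker_w = sticker.shape[:2]
    top = region_h - sticker_h - offset
    left = region_w - sticker_w - offset
    if top < 0 or left < 0:
        return region  # the sticker does not fit, leave the region untouched
    # Basic slicing returns a view, so writing to it modifies `region` in place
    target = region[top:top + sticker_h, left:left + sticker_w]
    mask = sticker[:, :, 3] != 0  # True where the sticker is not transparent
    target[mask] = sticker[:, :, :3][mask]
    return region


# Synthetic data: a black 100x100 region and a 20x20 sticker whose centre is opaque
region = np.zeros((100, 100, 3), dtype=np.uint8)
sticker = np.zeros((20, 20, 4), dtype=np.uint8)
sticker[5:15, 5:15] = (10, 20, 30, 255)

overlay_rgba(region, sticker)
print(region[80, 80])  # pixel covered by the opaque part of the sticker
print(region[71, 71])  # pixel under a transparent part, left unchanged
```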

Now we define **live_cam**, the last function, which brings together all the steps needed to start the live session.
First, we use VideoCapture to read and display the video. Then we define a loop that captures a frame, passes it to the **detect** function to obtain the label, accuracy and emoji, and prints them in the session window. The loop ends when the ESC key is pressed.
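
The role of the `counter` and `rate` arguments can be shown with a small simulation: the model is queried only on frames whose counter is a multiple of `rate`, and the last result is reused for the frames in between. This is a simplified sketch that ignores face detection; `run_session` and the fake `predict` function are hypothetical.

```python
def run_session(n_frames, rate, predict):
    """Return what would be displayed on each frame of a simulated session."""
    last = None
    shown = []
    for counter in range(n_frames):
        # A new prediction is computed only every `rate` frames
        if counter % rate == 0:
            last = predict(counter)
        shown.append(last)
    return shown


# Fake predictor that just records the frame on which it was called
shown = run_session(5, 2, lambda counter: "pred@%d" % counter)
print(shown)  # ['pred@0', 'pred@0', 'pred@2', 'pred@2', 'pred@4']
```

A larger `rate` means fewer calls to the models and a smoother video, at the cost of a label that is refreshed less often.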

In [6]:
def live_cam(classifier, resolution = (1280, 720), fps = 30, time_sleep = 2, rate = 30):
  
    cam = cv.VideoCapture(0)

    cam.set(cv.CAP_PROP_FRAME_WIDTH, resolution[0])
    cam.set(cv.CAP_PROP_FRAME_HEIGHT, resolution[1])
    cam.set(cv.CAP_PROP_FPS, fps)

    cv.namedWindow("Acquisition window")

    counter = 0
    while True:
        ret, frame = cam.read()
        if not ret:
            break
        k = cv.waitKey(1)

        time.sleep(time_sleep)
        detect(frame, classifier=classifier, counter = counter, rate = rate, scale_factor = 1.1, minNeighbors = 6)
        
        text = 'Press ESC to exit from demo...'
        cv.putText(frame, text, (10,20), cv.FONT_HERSHEY_DUPLEX, 0.75, (255,255,255), 1)
        
        cv.imshow("Acquisition window", frame)

        
        if k%256 == 27:
            # ESC pressed
            print("Escape hit, closing...")
            break
            
        counter += 1
        

    cam.release()
    cv.destroyAllWindows()
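
The exit test `k%256 == 27` in the loop can be checked in isolation. `cv.waitKey` returns -1 when no key is pressed, and on some platforms the returned code may carry extra high bits, which is presumably why the value is reduced modulo 256 before being compared with 27 (the ASCII code of ESC). The helper `is_escape` below is hypothetical.

```python
ESC = 27  # ASCII code of the Escape key


def is_escape(key_code):
    """Return True when the code returned by waitKey corresponds to ESC."""
    return key_code % 256 == ESC


print(is_escape(27))       # True
print(is_escape(-1))       # False: no key pressed (-1 % 256 is 255 in Python)
print(is_escape(1048603))  # True: 27 with extra high bits set
```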

**LIVE SESSION**

We specify the file for the cascade classifier: we use *haarcascade_frontalface_alt2*, an XML file that we load from our folder.

In [10]:
live_cam(time_sleep = 0.01, rate = 10, classifier = 'C:/Users/valen/Desktop/magistrale/DSIM/haarCascade/haarcascade_frontalface_alt2.xml')

Escape hit, closing...
