OpenVINO Face Detection, Facial Landmark Extraction, Head Pose Estimation, and Gaze Estimation Sample

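Compiled by OmniXRI Jack (歐尼克斯實境互動工作室), 2021.6.10
Because Intel's official OpenVINO release does not provide Python samples for head pose or gaze estimation, this notebook tests the code from https://github.com/LCTyrell/Gaze_estimation instead.
The original code targets OpenVINO 2020.3.194 (2020.3 LTS); in testing it did not work with version 2021.3.394.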
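
Since the sample is tied to a specific release, it can help to confirm which OpenVINO version the notebook kernel actually loads before going further. The following is a minimal sketch, assuming the legacy openvino.inference_engine Python package is the one installed; the helper name installed_openvino_version is made up for illustration and is not part of the original repository.

```python
def installed_openvino_version():
    """Return the Inference Engine version string, or None if it cannot be imported."""
    try:
        # get_version() belongs to the legacy openvino.inference_engine API
        # (the same package main.py imports IECore from).
        from openvino.inference_engine import get_version
    except ImportError:
        return None
    return get_version()


version = installed_openvino_version()
print("OpenVINO Inference Engine version:", version or "not installed")
```

If the printed version is not a 2020.3 build, the remaining cells may behave differently from the output recorded here.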

1. Download the test source code

In [1]:
!git clone https://github.com/LCTyrell/Gaze_estimation
!ls

fatal: destination path 'Gaze_estimation' already exists and is not an empty directory.
darknet					      input.jpg
DevCloud_OpenVINO_Face_Detection.ipynb	      intel
DevCloud_OpenVINO_Face_Head_Gaze_2021.ipynb   output.jpg
DevCloud_OpenVINO_Face_Head_Gaze.ipynb	      public
DevCloud_OpenVINO_Image_Classification.ipynb  test_0511.ipynb
DevCloud_OpenVINO_Pose_Estimation.ipynb       Untitled.ipynb
devcloud-yolov4-tiny-test.ipynb		      Webcam_Test_01.ipynb
DevCloud_Yolov4-tiny_Training.ipynb	      Yolov4-tiny_Colab_User_Datasets
face.jpg				      yolov4-tiny.conv.29
Gaze_estimation				      yolov4-tiny.weights
imagenet_label.txt


2. Download the required pretrained models

    Face detection     face-detection-retail-0004
    Facial landmarks   landmarks-regression-retail-0009
    Head pose          head-pose-estimation-adas-0001
    Gaze estimation    gaze-estimation-adas-0002

In [2]:
%cd Gaze_estimation
!ls
# Download the required Intel pretrained models
!downloader.py --name face-detection-retail-0004
!downloader.py --name head-pose-estimation-adas-0001
!downloader.py --name landmarks-regression-retail-0009
!downloader.py --name gaze-estimation-adas-0002
# Check the contents of the download directory
!ls intel/

/home/u75102/My-Notebooks/Gaze_estimation
demo.mp4		      intel		   README.md
face_detection.py	      main.py		   requirements.txt
facial_landmark_detection.py  models		   results
gaze_estimation.py	      mouse_controller.py  utils.py
head_pose_estimation.py       __pycache__
################|| Downloading models ||################

... 100%, 99 KB, 139843 KB/s, 0 seconds passed

... 100%, 2297 KB, 11797 KB/s, 0 seconds passed

... 100%, 99 KB, 160188 KB/s, 0 seconds passed

... 100%, 1148 KB, 7671 KB/s, 0 seconds passed

... 100%, 243 KB, 144682 KB/s, 0 seconds passed

... 100%, 586 KB, 21784 KB/s, 0 seconds passed

################|| Post-processing ||################

################|| Downloading models ||################

... 100%, 49 KB, 89754 KB/s, 0 seconds passed

... 100%, 7468 KB, 12809 KB/s, 0 seconds passed

... 100%, 49 KB, 119449 KB/s, 0 seconds passed

... 100%, 3734 KB, 8740 KB/s, 0 seconds passed

... 100%, 81 KB, 114957 KB/s, 0 seconds passed

... 100%, 2026 KB, 121
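
The downloader places each model's IR files under intel/<model name>/<precision>/, which is the layout the inference command in this notebook relies on. As a rough sketch, the check below lists any .xml or .bin file that is still missing; the helper names ir_prefix and missing_ir_files are hypothetical, and FP32 is assumed as the precision.

```python
from pathlib import Path

# The four models used in this notebook.
MODEL_NAMES = [
    "face-detection-retail-0004",
    "landmarks-regression-retail-0009",
    "head-pose-estimation-adas-0001",
    "gaze-estimation-adas-0002",
]


def ir_prefix(name, precision="FP32", root="intel"):
    """Path to a model's IR files without the extension."""
    return Path(root) / name / precision / name


def missing_ir_files(names=MODEL_NAMES, precision="FP32", root="intel"):
    """Return every expected .xml/.bin path that does not exist on disk."""
    missing = []
    for name in names:
        prefix = ir_prefix(name, precision, root)
        for ext in (".xml", ".bin"):
            candidate = Path(str(prefix) + ext)
            if not candidate.is_file():
                missing.append(candidate)
    return missing


for path in missing_ir_files():
    print("missing:", path)
```

An empty result means all eight files are in place and the models can be passed to main.py.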

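3. Modify main.py

The original program displays frames with cv2.imshow. Because hosted notebook environments such as Colab do not support cv2.imshow, main.py must be edited by hand so that, instead of displaying frames directly, it writes the result to a video file (results/output_video_CPU.mp4).

1. Comment out cv2.imshow("Camera_view",cv2.resize(frame,(900,450)))
2. Uncomment out_video.write(image)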
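
The two manual changes can also be applied with a small script instead of rewriting the whole file. This is only a sketch: it assumes the upstream main.py contains the write call as a commented line of the form #out_video.write(image), and the function name patch_main_source is invented here.

```python
import re


def patch_main_source(source):
    """Comment out cv2.imshow calls and uncomment out_video.write calls."""
    patched = []
    for line in source.splitlines():
        body = line.lstrip()
        indent = line[: len(line) - len(body)]
        if body.startswith("cv2.imshow("):
            # Disable on-screen display, keeping the original indentation.
            line = indent + "#" + body
        elif re.match(r"#\s*out_video\.write\(", body):
            # Re-enable writing frames to the output video.
            line = indent + body.lstrip("#").lstrip()
        patched.append(line)
    return "\n".join(patched) + "\n"


sample = (
    '            cv2.imshow("Camera_view",cv2.resize(frame,(900,450)))\n'
    "            #out_video.write(image)\n"
)
print(patch_main_source(sample))
```

The function is idempotent, so running it on an already patched file leaves the text unchanged.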

In [3]:
%%writefile main.py

import os
import time
import cv2

import logging as log
from openvino.inference_engine import IECore
from argparse import ArgumentParser
from face_detection import Face_detection
from head_pose_estimation import Head_pose
from facial_landmark_detection import Landmark_detection
from gaze_estimation import Gaze_estimation
#from mouse_controller import MouseController

def build_argparser():
    """
    Parse command line arguments.

    :return: command line arguments
    """
    parser = ArgumentParser()
    parser.add_argument("-mfd", "--model_fd", required=True, type=str,
                        help="Path to the face detection model file without extension (e.g. path/model_name).")
    parser.add_argument("-mhp", "--model_hp", required=True, type=str,
                        help="Path to the head pose model file without extension (e.g. path/model_name).")
    parser.add_argument("-mld", "--model_ld", required=True, type=str,
                        help="Path to the landmark detection model file without extension (e.g. path/model_name).")
    parser.add_argument("-mge", "--model_ge", required=True, type=str,
                        help="Path to the gaze estimation model file without extension (e.g. path/model_name).")
    parser.add_argument("-i", "--input", required=True, type=str,
                        help="Use 'CAM' for camera or path to video file")
    parser.add_argument("-l", "--cpu_extension", required=False, type=str,
                        default=None,
                        help="Path to cpu extension if needed.")
    parser.add_argument("-df", "--draw_flags", required=False, nargs='+',
                        default=[],
                        help="Flags to draw model(s) output on the video (e.g. fd hp ld ge): "
                             "fd to draw face detection output, "
                             "hp to draw head pose output, "
                             "ld to draw landmark detection output, "
                             "ge to draw gaze estimation output")
    parser.add_argument("-d", "--device", type=str, default="CPU",
                        help="Specify the target device: "
                             "CPU, GPU, FPGA, MYRIAD or MULTI (e.g. ""MULTI:CPU(2),GPU(2)"")")
    parser.add_argument("-pt", "--threshold", type=float, default=0.5,
                        help="Minimum inference probability threshold (0.5 by default)")
    return parser


def main():
    """
    Load the network and parse the SSD output.
    """

    args = build_argparser().parse_args()

    model_fd=args.model_fd
    model_hp=args.model_hp
    model_ld=args.model_ld
    model_ge=args.model_ge

    device=args.device
    draw_flags=args.draw_flags
    if args.input=='CAM':
        video_file=0
    else: video_file=args.input

    threshold=args.threshold

    #mc=MouseController(precision='low', speed='fast')

    start_model_load_time=time.time()

    log.info("Creating fd Inference Engine...")
    ie = IECore()

    fd= Face_detection(model_fd, device, threshold)
    hp= Head_pose(model_hp, device, threshold)
    ld= Landmark_detection(model_ld, device, threshold)
    ge=Gaze_estimation(model_ge, device, threshold)

    fd.load_model(ie)
    hp.load_model(ie)
    ld.load_model(ie)
    ge.load_model(ie)

    total_model_load_time = time.time() - start_model_load_time

    cap=cv2.VideoCapture(video_file)
    if not cap.isOpened():
        # cv2.VideoCapture does not raise on a missing file; it returns an unopened capture
        print("Cannot open video source: "+ str(video_file))
        return

    initial_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    initial_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    fps=15  # override the source frame rate with a fixed output rate
    out_video = cv2.VideoWriter(os.path.join('results', 'output_video_'+args.device+'.mp4'), cv2.VideoWriter_fourcc(*'avc1'), fps, (initial_w, initial_h), True)

    counter=0
    start_inference_time=time.time()

    try:
        fd.set_initial(initial_w, initial_h)
        hp.set_initial(initial_w, initial_h)
        ge.set_initial(initial_w, initial_h)

        while cap.isOpened():
            ret, frame=cap.read()
            if not ret:
                break
            counter+=1
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

            coords, image, head_image= fd.predict(frame, draw_flags)

            if coords:
                land_image, left_eye, right_eye, nose= ld.predict(head_image, draw_flags)
                head_pose, pose_image= hp.predict(head_image, nose, draw_flags)
                eye_pose= ge.predict(left_eye, right_eye, head_pose, draw_flags)

            #if counter % 4 ==0:
                #mc.move(-eye_pose[0]/10, eye_pose[1]/10)

            #cv2.imshow("Camera_view",cv2.resize(frame,(900,450)))

            out_video.write(image)

        total_time=time.time()-start_inference_time
        total_inference_time=round(total_time, 1)
        fps=counter/total_inference_time

        with open(os.path.join('results', 'stats_ge_'+args.device+'.txt'), 'w') as f:
            f.write(str(total_inference_time)+'\n')
            f.write(str(fps)+'\n')
            f.write(str(total_model_load_time)+'\n')

        cap.release()
        out_video.release()  # finalize the output file so the MP4 can be played back
        cv2.destroyAllWindows()
    except Exception as e:
        print("Could not run Inference: ", e)

if __name__ == '__main__':
    main()
    exit(0)

Overwriting main.py


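4. Run inference

Run the inference with the following input arguments:
-mfd path to the face detection model
-mld path to the facial landmark model
-mhp path to the head pose model
-mge path to the gaze estimation model
-i input image or video file
-df what to draw on the output

    fd (green box around the face)
    hp (head pose as red, green, and blue X, Y, Z axes)
    ld (white boxes around both eyes, green dot on the nose)
    ge (magenta gaze lines from both eyes)

-d target device; defaults to CPU (Intel Xeon CPU)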
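
When the same run has to be repeated with other draw flags or devices, assembling the argument list in Python avoids retyping the long model paths. The sketch below is one possible way to do that; build_command and its parameters are illustrative names, and the intel/<name>/<precision>/<name> layout is assumed from the downloader output.

```python
import shlex

# Command-line flag -> model name, as used in this notebook.
MODELS = {
    "-mfd": "face-detection-retail-0004",
    "-mld": "landmarks-regression-retail-0009",
    "-mhp": "head-pose-estimation-adas-0001",
    "-mge": "gaze-estimation-adas-0002",
}


def build_command(video="demo.mp4", draw_flags=("fd", "hp", "ld", "ge"),
                  device="CPU", precision="FP32", model_dir="intel"):
    """Return the argument list for main.py (model paths carry no extension)."""
    cmd = ["python3", "main.py"]
    for flag, name in MODELS.items():
        cmd += [flag, f"{model_dir}/{name}/{precision}/{name}"]
    cmd += ["-i", video]
    if draw_flags:
        cmd += ["-df", *draw_flags]
    cmd += ["-d", device]
    return cmd


# Print the command so it can be pasted into a notebook cell after "!".
print(shlex.join(build_command()))
```

The returned list can also be handed to subprocess.run directly, for example with draw_flags=("fd", "ge") to draw only the face box and gaze lines.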

In [4]:
!python3 main.py \
-mfd intel/face-detection-retail-0004/FP32/face-detection-retail-0004 \
-mld intel/landmarks-regression-retail-0009/FP32/landmarks-regression-retail-0009 \
-mhp intel/head-pose-estimation-adas-0001/FP32/head-pose-estimation-adas-0001 \
-mge intel/gaze-estimation-adas-0002/FP32/gaze-estimation-adas-0002 \
-i demo.mp4 -df fd hp ld ge -d CPU

[ INFO ] Creating fd Inference Engine...
  self.model=IENetwork(self.model_structure, self.model_weights)
  self.model=IENetwork(self.model_structure, self.model_weights)
[ INFO ] Loading network files:
	intel/face-detection-retail-0004/FP32/face-detection-retail-0004.xml
	intel/face-detection-retail-0004/FP32/face-detection-retail-0004.bin
[ INFO ] Loading IR to the plugin...
[ INFO ] Loading network files:
	intel/head-pose-estimation-adas-0001/FP32/head-pose-estimation-adas-0001.xml
	intel/head-pose-estimation-adas-0001/FP32/head-pose-estimation-adas-0001.bin
[ INFO ] Loading IR to the plugin...
[ INFO ] Loading network files:
	intel/landmarks-regression-retail-0009/FP32/landmarks-regression-retail-0009.xml
	intel/landmarks-regression-retail-0009/FP32/landmarks-regression-retail-0009.bin
[ INFO ] Loading IR to the plugin...
[ INFO ] Loading network files:
	intel/gaze-estimation-adas-0002/FP32/gaze-estimation-adas-0002.xml
	intel/gaze-estimation-adas-0002/FP32/gaze-estimation-adas-000
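
Besides the video, main.py writes results/stats_ge_<device>.txt with three lines: total inference time, frames per second, and model load time. A small reader such as the following sketch can turn that file into labeled values; the function name read_stats and the dictionary keys are chosen here for illustration.

```python
from pathlib import Path


def read_stats(path):
    """Parse the three-line stats file written by main.py."""
    values = [float(v) for v in Path(path).read_text().split()[:3]]
    inference_s, fps, load_s = values
    return {
        "total_inference_time_s": inference_s,
        "fps": fps,
        "model_load_time_s": load_s,
    }


stats_path = Path("results") / "stats_ge_CPU.txt"
if stats_path.is_file():  # only present after a completed run
    print(read_stats(stats_path))
```

The times are in seconds, since main.py derives them from time.time() differences.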

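5. Display the output

Because DevCloud cannot play video by displaying consecutive single frames, the HTML function of IPython.display is used here to show the output video file results/output_video_CPU.mp4 inline.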

In [5]:
from IPython.display import HTML # import the HTML class from IPython.display
from base64 import b64encode # import b64encode from the base64 module

with open('results/output_video_CPU.mp4','rb') as f: # open and read the MP4 video file
    vs1 = f.read()
data_url = "data:video/mp4;base64," + b64encode(vs1).decode() # build a base64 data URL for the video
HTML(f'<video width=400 controls><source src="{data_url}" type="video/mp4"></video>') # embed the video in the notebook output