# **Compare models [High-Speed face Emotion recognition](https://github.com/HSE-asavchenko/face-emotion-recognition) vs [Residual Masking Network](https://github.com/phamquiluan/ResidualMaskingNetwork)**

In [1]:
!pip install hsemotion==0.3.0 opencv-python==4.8.0.74 pillow rmn==3.1.1 timm==0.6.5

Collecting hsemotion==0.3.0
  Downloading hsemotion-0.3.0.tar.gz (8.0 kB)
  Preparing metadata (setup.py) ... done
Collecting opencv-python==4.8.0.74
  Downloading opencv_python-4.8.0.74-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (61.7 MB)
Collecting rmn==3.1.1
  Downloading rmn-3.1.1-py3-none-any.whl (109 kB)
Collecting timm==0.6.5
  Downloading timm-0.6.5-py3-none-any.whl (512 kB)
Collecting pytorchcv (from rmn==3.1.1)
  Downloading pytorchcv-0.0.67-py2.py3-none-any.whl (532 kB)

In [2]:
!curl -L -o affectnet.zip "https://www.dropbox.com/s/miltxghiqvnu98d/AffectNet.zip?dl=0"
!7z x affectnet.zip -oaffectnet

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    17    0    17    0     0     54      0 --:--:-- --:--:-- --:--:--    54
100   340  100   340    0     0    503      0 --:--:-- --:--:-- --:--:--   503
100   534    0   534    0     0    556      0 --:--:-- --:--:-- --:--:--  9709
100  314M  100  314M    0     0  66.7M      0  0:00:04  0:00:04 --:--:--  109M

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs Intel(R) Xeon(R) CPU @ 2.30GHz (306F0),ASM,AES-NI)

Scanning the drive for archives:
  0M Scan         1 file, 329755345 bytes (315 MiB)

Extracting archive: affectnet.zip
 14% 4096 Open              --
Path = affectnet.zip
Type = zip
Physical Size = 329755345

  0%      0% 521 - anger/image0009907.jpg

In [3]:
import pathlib
import time

import cv2
import numpy
import rmn
import torch
from hsemotion.facial_emotions import HSEmotionRecognizer
from PIL import Image

pretrained_ckpt does not exists!


Downloading pretrained_ckpt..: 100%|██████████| 552M/552M [00:09<00:00, 61.0MiB/s]


deploy.prototxt.txt does not exists!


Downloading deploy.prototxt.txt..: 100%|██████████| 28.1k/28.1k [00:00<00:00, 16.5MiB/s]


res10_300x300_ssd_iter_140000.caffemodel does not exists!


Downloading res10_300x300_ssd_iter_140000.caffemodel..: 100%|██████████| 10.7M/10.7M [00:00<00:00, 80.2MiB/s]


In [4]:
subdir_to_class = {"anger": 0, "disgust": 1, "fear": 2, "happy": 3, "neutral": 4, "sad": 5, "surprise": 6}
hse_label_to_class = {"Anger": 0, "Disgust": 1, "Fear": 2, "Happiness": 3, "Neutral": 4, "Sadness": 5, "Surprise": 6}
rmn_label_to_class = {"angry": 0, "disgust": 1, "fear": 2, "happy": 3, "neutral": 4, "sad": 5, "surprise": 6}
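
The three dictionaries map each source's label vocabulary onto one shared set of seven class indices. A model may also return a label outside that set (an 8-class HSE checkpoint can, for example, predict `Contempt`), and a plain `mapping[label]` lookup would then raise `KeyError`. A minimal sketch of a lookup that counts such labels as misses; the helper names `to_class` and `is_match` are hypothetical and belong to neither library:

```python
from typing import Dict, Optional

hse_label_to_class = {"Anger": 0, "Disgust": 1, "Fear": 2, "Happiness": 3, "Neutral": 4, "Sadness": 5, "Surprise": 6}


def to_class(label: str, mapping: Dict[str, int]) -> Optional[int]:
    """Return the shared class index for a label, or None if the label is not shared."""
    return mapping.get(label)


def is_match(label: str, mapping: Dict[str, int], true_class: int) -> bool:
    """Return True only when the label maps to the expected class index."""
    predicted = to_class(label, mapping)
    return predicted is not None and predicted == true_class


print(is_match("Happiness", hse_label_to_class, 3))
print(is_match("Contempt", hse_label_to_class, 3))
```

With this approach an unmapped label lowers the measured accuracy instead of interrupting the run.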

In [9]:
def test(images, hse_fer, rmn_fer, logging_step=1000):
    """Run both recognizers over (true_class, image) pairs and print running statistics."""
    hse_num_matches = 0
    rmn_num_matches = 0
    hse_total_duration = 0
    rmn_total_duration = 0
    num_images = 0

    def report():
        # Accuracy is the share of correct predictions; duration is mean seconds per image.
        print(
            "Accuracy: HSE={0:.3f} RMN={1:.3f} | Duration: HSE={2:.3f} RMN={3:.3f}".format(
                hse_num_matches / num_images,
                rmn_num_matches / num_images,
                hse_total_duration / num_images,
                rmn_total_duration / num_images,
            )
        )

    for i, (true_class, img) in enumerate(images):
        num_images += 1
        # hse_fer receives the RGB array, rmn_fer a BGR copy of the same image.
        # convert("RGB") guards against grayscale or RGBA files.
        img = numpy.asarray(img.convert("RGB"))
        img_bgr = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)

        hse_duration, (hse_label, *_) = timed(hse_fer.predict_emotions, img)
        # Labels outside the seven shared classes map to None and count as misses.
        hse_num_matches += hse_label_to_class.get(hse_label) == true_class
        hse_total_duration += hse_duration

        rmn_duration, (rmn_label, *_) = timed(rmn_fer.detect_emotion_for_single_face_image, img_bgr)
        rmn_num_matches += rmn_label_to_class.get(rmn_label) == true_class
        rmn_total_duration += rmn_duration

        if i % logging_step == 0:
            report()

    print("Images:", num_images)
    report()

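def iter_images(root):
    """Yield (class, image) pairs, taking one image per class in turn until all subdirectories are exhausted."""
    class_to_images = {class_: (pathlib.Path(root) / subdir).iterdir() for subdir, class_ in subdir_to_class.items()}
    while class_to_images:
        # Iterate over a copy because exhausted classes are removed inside the loop.
        for class_, images in list(class_to_images.items()):
            try:
                yield class_, Image.open(next(images))
            except StopIteration:
                class_to_images.pop(class_)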


def timed(fn, *args):
    started = time.monotonic()
    result = fn(*args)
    finished = time.monotonic()
    return finished - started, result
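
`timed` measures a single call, so the first measurement of each model also includes one-off start-up costs; in the CPU run further down, the first HSE call takes 0.195 s against a running average of about 0.060 s. A minimal sketch of a variant that discards warm-up calls before averaging. The helper name `mean_duration` and its parameters are hypothetical, and it uses `time.perf_counter`, the standard-library clock intended for short intervals:

```python
import time


def mean_duration(fn, *args, warmup=1, repeats=5):
    """Return the average wall-clock seconds per call of fn(*args), ignoring warm-up calls."""
    for _ in range(warmup):
        fn(*args)  # executed but not timed
    started = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - started) / repeats


# Usage with a stand-in workload instead of a real model.
print(mean_duration(sum, range(100_000), warmup=2, repeats=10))
```

Because the running averages in this notebook cover many images, the warm-up call has little effect on the final figures; it mainly matters for short runs.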

**Let's compare the accuracy of the models:**

In [10]:
test(
    iter_images("affectnet"),
    HSEmotionRecognizer(device="cuda" if torch.cuda.is_available() else "cpu"),
    rmn.RMN(),
)

/root/.hsemotion/enet_b0_8_best_vgaf.pt Compose(
    Resize(size=(224, 224), interpolation=bilinear, max_size=None, antialias=warn)
    ToTensor()
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
Accuracy: HSE=1.000 RMN=1.000 | Duration: HSE=0.014 RMN=0.034
Accuracy: HSE=0.638 RMN=0.494 | Duration: HSE=0.022 RMN=0.023
Accuracy: HSE=0.656 RMN=0.496 | Duration: HSE=0.021 RMN=0.022
Accuracy: HSE=0.648 RMN=0.493 | Duration: HSE=0.021 RMN=0.022
Accuracy: HSE=0.650 RMN=0.493 | Duration: HSE=0.021 RMN=0.022
Accuracy: HSE=0.646 RMN=0.490 | Duration: HSE=0.021 RMN=0.022
Accuracy: HSE=0.646 RMN=0.486 | Duration: HSE=0.021 RMN=0.022
Accuracy: HSE=0.649 RMN=0.489 | Duration: HSE=0.021 RMN=0.022
Accuracy: HSE=0.649 RMN=0.491 | Duration: HSE=0.021 RMN=0.022
Accuracy: HSE=0.652 RMN=0.490 | Duration: HSE=0.021 RMN=0.022
Accuracy: HSE=0.651 RMN=0.489 | Duration: HSE=0.021 RMN=0.022
Accuracy: HSE=0.652 RMN=0.489 | Duration: HSE=0.021 RMN=0.022
Accuracy: HSE=0.651 RMN=0.486 | Durati

**The model will be used on CPU only. Therefore, let's estimate the performance of the models on CPU:**

In [11]:
import itertools

rmn.is_cuda = False
test(
    itertools.islice(iter_images("affectnet"), 100),
    HSEmotionRecognizer(device="cpu"),
    rmn.RMN(),
    logging_step=10,
)

/root/.hsemotion/enet_b0_8_best_vgaf.pt Compose(
    Resize(size=(224, 224), interpolation=bilinear, max_size=None, antialias=warn)
    ToTensor()
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
Accuracy: HSE=1.000 RMN=1.000 | Duration: HSE=0.195 RMN=0.926
Accuracy: HSE=0.818 RMN=0.545 | Duration: HSE=0.071 RMN=0.977
Accuracy: HSE=0.810 RMN=0.524 | Duration: HSE=0.064 RMN=0.990
Accuracy: HSE=0.871 RMN=0.516 | Duration: HSE=0.062 RMN=0.954
Accuracy: HSE=0.805 RMN=0.537 | Duration: HSE=0.061 RMN=0.964
Accuracy: HSE=0.745 RMN=0.510 | Duration: HSE=0.061 RMN=0.969
Accuracy: HSE=0.705 RMN=0.475 | Duration: HSE=0.061 RMN=0.970
Accuracy: HSE=0.676 RMN=0.479 | Duration: HSE=0.060 RMN=0.960
Accuracy: HSE=0.704 RMN=0.481 | Duration: HSE=0.060 RMN=0.959
Accuracy: HSE=0.681 RMN=0.473 | Duration: HSE=0.060 RMN=0.962
Images: 100
Accuracy: HSE=0.660 RMN=0.470 | Duration: HSE=0.060 RMN=0.965
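
To put the final CPU figures side by side, the sketch below derives the accuracy gap and the per-image speed ratio. The numbers are copied from the last output line above; the variable names are only illustrative:

```python
# Final figures from the 100-image CPU run.
hse_accuracy, rmn_accuracy = 0.660, 0.470
hse_seconds, rmn_seconds = 0.060, 0.965

accuracy_gap = hse_accuracy - rmn_accuracy
speedup = rmn_seconds / hse_seconds

print(f"Accuracy gap: {accuracy_gap:.3f}")  # prints "Accuracy gap: 0.190"
print(f"HSE is about {speedup:.0f}x faster per image on CPU")  # prints "... about 16x ..."
```

These ratios describe a 100-image sample only, so they are rough estimates.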


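**The HSE model is better in both accuracy and speed: on the 100-image CPU run it reaches 0.660 accuracy at about 0.060 s per image, versus 0.470 accuracy at 0.965 s per image for RMN.**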