
Facial expressivity v2.0

anzar edited this page Sep 26, 2023 · 2 revisions
Date completed: September 26, 2023
Release where first appeared: OpenWillis v1.4
Researcher / Developer: Vijay Yadav, Georgios Efstathiadis

1 – Use

import openwillis as ow

framewise_loc, framewise_disp, summary = ow.facial_expressivity(filepath='video.mov', baseline_filepath='baseline_video.mov')

2 – Methods

Using framewise displacement in facial landmark coordinates to quantify facial expressivity

Methods without a baseline input:

  1. For every frame of the video, the coordinates of 468 unique facial landmarks are calculated using MediaPipe's Face Mesh model. The framewise x, y, and z coordinates of all landmarks (i.e., their locations), whose values range from 0 to 1, are saved in the framewise_loc output.
  2. For each facial landmark, the framewise displacement is calculated as the Euclidean distance between the landmark's position in the current frame and its position in the previous frame. These values range from 0 to 1, and the values for the first frame are always 0 because it has no previous frame. The data are saved in the framewise_disp output.
  3. For specific groups of facial landmarks, the mean displacement across all landmarks in the group is calculated. These composite values also range from 0 to 1, with the values for the first frame always being 0, and are saved in the framewise_disp output. The groups are:
    • Overall facial expressivity, saved as: overall
    • Upper facial expressivity, saved as: upper_face
    • Lower facial expressivity, saved as: lower_face
    • Lip expressivity, saved as: lips
    • Eyebrow expressivity, saved as: eyebrows
    • Mouth openness, saved as: mouth_openness
  4. Summary statistics for framewise_disp are saved in the summary output, namely the mean and standard deviation over the course of the video of each composite variable listed above.
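Steps 2 and 3 can be sketched in plain Python as follows. This is a minimal illustration, not the OpenWillis implementation: it assumes landmark locations are given as a list of frames, each a list of (x, y, z) tuples, and the indices in `GROUPS` are hypothetical placeholders, since this page does not list which MediaPipe landmarks belong to each group. Mouth openness is a ratio rather than an average displacement, so it is left out here.

```python
import math
from statistics import mean

# Hypothetical landmark groups: placeholder indices for illustration only,
# not the index sets OpenWillis actually uses.
GROUPS = {
    "upper_face": [0, 1],
    "lower_face": [2, 3],
}


def framewise_displacement(locations):
    """Per-landmark displacement between consecutive frames.

    locations[f][k] is the (x, y, z) tuple of landmark k in frame f.
    Returns disp[f][k]; the first frame is all zeros because it has
    no previous frame to compare against.
    """
    n_landmarks = len(locations[0])
    disp = [[0.0] * n_landmarks]
    for prev, curr in zip(locations, locations[1:]):
        # Euclidean distance each landmark moved since the previous frame
        disp.append([math.dist(p, c) for p, c in zip(prev, curr)])
    return disp


def composite_displacement(disp, groups):
    """Per-frame mean displacement for all landmarks ('overall') and each group."""
    composites = {"overall": [mean(row) for row in disp]}
    for name, indices in groups.items():
        composites[name] = [mean(row[i] for i in indices) for row in disp]
    return composites
```

For example, if only landmark 0 moves by 0.5 between two frames of a four-landmark face, the second frame's overall value is 0.125 and its upper_face value is 0.25.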

Methods with a baseline input:

For more information on using a baseline input, see the Research Guidelines on the GitHub Wiki.

  1. All steps above are performed on the baseline video to obtain the mean displacement over the video for each facial landmark and for all facial landmarks combined (these values are not returned).
  2. Using the main video, framewise_loc is calculated as described above.
  3. For framewise_disp on the main video, the framewise displacement of each facial landmark is normalized against that landmark's mean displacement in the baseline video. The normalized values range from -1 to 1, with negative values signifying displacement lower than baseline and positive values signifying displacement greater than baseline. The normalization method is explained further in the appendix of this document.
  4. Summary statistics are saved in the summary output in the same manner as described above.
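A minimal sketch of the baseline steps, assuming framewise displacements have already been computed for both videos as lists of per-frame rows (one value per landmark). It uses the (A + 1) / (B + 1) - 1 formula from the appendix. Whether OpenWillis includes the always-zero first frame when averaging the baseline is not stated on this page, so including it here is an assumption.

```python
from statistics import mean


def baseline_means(baseline_disp):
    """Mean displacement of each landmark over the baseline video.

    baseline_disp[f][k] is landmark k's displacement in frame f.
    Assumption: the first (all-zero) frame is included in the mean.
    """
    n_landmarks = len(baseline_disp[0])
    return [mean(row[k] for row in baseline_disp) for k in range(n_landmarks)]


def normalize(a, b):
    """Baseline-corrected value (a + 1) / (b + 1) - 1, as in the appendix."""
    return (a + 1.0) / (b + 1.0) - 1.0


def normalize_displacement(main_disp, baseline_disp):
    """Normalize each landmark's framewise displacement against its baseline mean."""
    b = baseline_means(baseline_disp)
    return [[normalize(a, b[k]) for k, a in enumerate(row)] for row in main_disp]
```

A landmark that moves less than its baseline mean gets a negative value, one that moves more gets a positive value, and one that matches its baseline mean gets 0.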

Additional measures calculated in framewise_disp include:

  1. Framewise displacement of the lower face and upper face separately.
  2. Framewise displacement of the mouth.
  3. Framewise displacement of the eyebrows.
  4. Mouth openness, calculated as the mouth height divided by the smaller of the upper lip height and the lower lip height.

These additional measures are also summarized in the summary output.
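The mouth openness ratio can be sketched as below. This assumes the three heights have already been measured from the landmarks in the same units; which MediaPipe landmarks define each height is not specified on this page, so the function takes the heights directly.

```python
def mouth_openness(mouth_height, upper_lip_height, lower_lip_height):
    """Mouth height divided by the smaller of the two lip heights.

    All three heights are assumed to be non-negative distances in the
    same units (e.g. normalized landmark coordinates).
    """
    thinner_lip = min(upper_lip_height, lower_lip_height)
    if thinner_lip <= 0:
        raise ValueError("lip heights must be positive")
    return mouth_height / thinner_lip
```

Because it is a ratio of two lengths, the measure is largely unaffected by uniform changes in face size within the frame, such as the speaker moving closer to the camera.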

Note: Users can average the displacement of any combination of facial landmarks to derive their own custom measure of facial expressivity (e.g., one focused on a specific area of the face). See the MediaPipe documentation to find which landmarks correspond to which parts of the face.
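One way to build such a custom measure is to average the chosen landmark columns of framewise_disp frame by frame. In this sketch the column names follow the lmk001 style shown in section 4.2, and the particular landmarks chosen are hypothetical. framewise_disp is treated as any mapping from column name to a sequence of per-frame values, which covers a plain dict and, assuming it is a pandas DataFrame, the real output.

```python
from statistics import mean


def custom_expressivity(framewise_disp, columns):
    """Per-frame mean displacement over a user-chosen set of landmark columns."""
    per_column = [list(framewise_disp[c]) for c in columns]
    # zip(*...) regroups the column values frame by frame
    return [mean(frame_values) for frame_values in zip(*per_column)]
```

With a pandas DataFrame, `framewise_disp[columns].mean(axis=1)` computes the same per-frame average, and taking the mean of that result gives a single value for the whole video.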


3 – Inputs

3.1 – filepath

Type str
Description path to main video

3.2 – baseline_filepath

Type str, optional
Description path to baseline video

4 – Outputs

4.1 – framewise_loc

Type pandas.DataFrame
Description Framewise coordinates of 468 facial landmarks. Columns refer to landmarks, with each landmark having one column each for its x, y, and z coordinates (ranging between 0 and 1). Rows refer to frames in the video.

What the data frame looks like:

frame lmk000_x lmk001_x ... lmk000_y ... lmk000_z ...
0
1
...
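The column layout can be reproduced programmatically, which is handy for selecting all x, y, or z columns at once. This sketch assumes zero-based, three-digit landmark ids (lmk000 to lmk467), as in the example output in section 5, with a frame column followed by all x columns, then all y columns, then all z columns. That ordering is inferred from the tables on this page, not from the library's code.

```python
def framewise_loc_columns(n_landmarks=468):
    """Expected framewise_loc column names: 'frame', then x, y and z blocks."""
    columns = ["frame"]
    for axis in ("x", "y", "z"):
        columns.extend(f"lmk{i:03d}_{axis}" for i in range(n_landmarks))
    return columns
```

With 468 landmarks this yields 1 + 3 × 468 = 1405 names, matching the 1405 columns reported in the example output.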

4.2 – framewise_disp

Type pandas.DataFrame
Description Framewise Euclidean distance for each facial landmark. Columns refer to individual landmarks, with the last few columns representing the mean displacement across landmark composites for the entire face, the upper face, the lower face, lips, and eyebrows, plus a framewise measure of mouth openness. Values range from -1 to 1 when a baseline is used and from 0 to 1 otherwise. Rows refer to frames in the video.

What the data frame looks like:

frame lmk001 lmk002 ... overall lower_face upper_face lips eyebrows mouth_openness
0 0 0 ... 0 0 0 0 0 0
1
...

4.3 – summary

Type pandas.DataFrame
Description Summary statistics calculated from the framewise_disp output, namely the mean and standard deviation of overall, upper face, lower face, lip, and eyebrow expressivity, as well as mouth openness.

What the data frame looks like:

overall_mean lower_face_mean upper_face_mean lips_mean eyebrows_mean mouth_openness_mean overall_std lower_face_std upper_face_std lips_std eyebrows_std mouth_openness_std
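The summary row can be sketched from the composite columns of framewise_disp as follows. Two details are assumptions not confirmed on this page: the standard deviation is the sample standard deviation (the pandas default), and every frame, including the first, is used.

```python
from statistics import mean, stdev

# Composite columns of framewise_disp, in the order used by the summary output
COMPOSITES = ["overall", "lower_face", "upper_face", "lips", "eyebrows", "mouth_openness"]


def summarize(framewise_disp):
    """Return {'<name>_mean': ..., '<name>_std': ...} for each composite column."""
    summary = {f"{name}_mean": mean(list(framewise_disp[name])) for name in COMPOSITES}
    # stdev() is the sample standard deviation (n - 1 denominator)
    summary.update(
        {f"{name}_std": stdev(list(framewise_disp[name])) for name in COMPOSITES}
    )
    return summary
```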

5 – Example use

Here, we use the facial expressivity function to process sample data included in the repository.

import openwillis as ow

framewise_loc, framewise_disp, summary = ow.facial_expressivity(filepath='data/subj01.mp4', baseline_filepath='data/subj01_base.mp4')
framewise_loc.head(2)
frame lmk000_x lmk001_x lmk002_x lmk003_x lmk004_x lmk005_x ...
0 0.533611 0.534253 0.533953 0.524268 0.534301 0.534436 ...
1 0.533908 0.534034 0.534066 0.524676 0.533993 0.534129 ...

2 rows x 1405 columns


6 – Dependencies

Below are dependencies specific to calculation of this measure.

Dependency License Justification
mediapipe Apache 2.0 Continually maintained, provides 468 3D facial landmarks, plenty of validation data, good documentation. Does not perform action unit (AU) or emotion detection, but neither is needed for this method.
opencv Apache 2.0 Open-source computer vision library for basic CV operations.

Appendix – Normalization

For the justification for baseline normalization, see the Research Guidelines on the GitHub Wiki.

Let’s say that:

A is the framewise displacement for a given landmark in the main video; this value ranges from 0 to 1

B is the overall displacement for that landmark calculated from the baseline video, also ranging from 0 to 1

To normalize, we want to divide A by B to get C, the normalized or baseline-corrected value.

When we do this, three scenarios ensue:

  1. If A > B, C can range from 1 to infinity.
  2. If A < B, C will range between 0 and 1.
  3. If A = B, C will be equal to 1.

To avoid this unbounded range (0 to infinity), we add 1 to both A and B before division, so that C = (A + 1) / (B + 1). Because A and B both lie between 0 and 1:

  1. When A > B, C can range between 1 and 2.
  2. When A < B, C will range between 0.5 and 1.
  3. When A = B, C will be equal to 1.

This gives values between 0.5 and 2. Subtracting 1 yields values between -0.5 and 1, which lie within the -1 to 1 range, with positive values signifying A > B (more displacement than baseline), negative values signifying A < B (less displacement than baseline), and 0 signifying A = B, i.e., no difference from baseline.
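Writing the normalization as a single expression makes its sign and bounds easy to check, using only A, B, and C as defined above:

```latex
C = \frac{A + 1}{B + 1} - 1
  = \frac{(A + 1) - (B + 1)}{B + 1}
  = \frac{A - B}{B + 1},
\qquad A, B \in [0, 1]
```

Since B + 1 is always positive, C has the same sign as A - B. Its largest value, 1, occurs at A = 1 and B = 0, and its smallest value, -1/2, occurs at A = 0 and B = 1.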
