Facial expressivity v2.0
| | |
| --- | --- |
| Date completed | September 26, 2023 |
| Release where first appeared | OpenWillis v1.4 |
| Researcher / Developer | Vijay Yadav, Georgios Efstathiadis |
```python
import openwillis as ow

framewise_loc, framewise_disp, summary = ow.facial_expressivity(filepath='video.mov', baseline_filepath='baseline_video.mov')
```
Using framewise displacement of facial landmark coordinates to quantify facial expressivity
- For every frame of the video, the coordinates of 468 unique facial landmarks are calculated using the facemesh model within mediapipe. The framewise x, y, and z coordinates of all landmarks (i.e., their locations), whose values range from 0 to 1, are saved in the `framewise_loc` output.
- For each facial landmark, the framewise euclidean distance is calculated. These values range from 0 to 1, with values for the first frame always being 0. The data is saved in the `framewise_disp` output.
- For specific groups of facial landmarks, the average euclidean distance across all landmarks within the group is calculated. These composite values also range from 0 to 1, with values for the first frame always being 0, and are saved in the `framewise_disp` output. The groups are:
  - Overall facial expressivity, saved as `overall`
  - Upper facial expressivity, saved as `upper_face`
  - Lower facial expressivity, saved as `lower_face`
  - Lip expressivity, saved as `lips`
  - Eyebrow expressivity, saved as `eyebrows`
  - Mouth openness, saved as `mouth_openness`
- Summary statistics for `framewise_disp` are saved in the `summary` output. This includes the mean displacement over the course of the video for each of the composite variables listed above.
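The per-landmark displacement described above can be sketched in a few lines of numpy. This is a minimal illustration of the technique, not OpenWillis's actual implementation; the function names and signatures are hypothetical:

```python
import numpy as np

def framewise_displacement(coords):
    """Euclidean distance each landmark travels between consecutive frames.

    coords: array of shape (n_frames, n_landmarks, 3) holding the x, y, z
    location of every landmark per frame, as produced by the facemesh model.
    Returns shape (n_frames, n_landmarks); the first row is always 0.
    """
    disp = np.zeros(coords.shape[:2])
    # distance between each frame's landmark positions and the previous frame's
    disp[1:] = np.linalg.norm(np.diff(coords, axis=0), axis=2)
    return disp

def composite_expressivity(disp, landmark_idx):
    """Average displacement across a group of landmarks, per frame."""
    return disp[:, landmark_idx].mean(axis=1)
```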
For more information on using a baseline input, see the Research Guidelines on the GitHub Wiki.
- All of the steps above are performed on the baseline video to acquire the mean displacement over the video for each facial landmark and for all facial landmarks cumulatively (these values are not included in the function's outputs).
- Using the main video, `framewise_loc` is calculated as described above.
- For `framewise_disp` on the main video, the framewise displacement of each facial landmark is normalized against the overall value of its displacement in the baseline video. The normalized values range from -1 to 1, with negative values signifying displacement lower than baseline and positive values signifying displacement greater than baseline. The method used for normalization is further explained in the appendix of this document.
- Summary statistics are saved in the `summary` output in the same manner as described above.
Additional measures calculated in `framewise_disp` include:
- Framewise displacement of the lower face and the upper face separately.
- Framewise displacement of the mouth.
- Framewise displacement of the eyebrows.
- Mouth openness, calculated as the mouth height divided by the minimum of the lower lip height and the upper lip height.

These additional measures are also summarized in the `summary` output.
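The mouth openness ratio reduces to a one-line formula. A sketch, with a hypothetical function name and heights assumed to be already measured from the landmark coordinates:

```python
def mouth_openness(mouth_height, lower_lip_height, upper_lip_height):
    """Mouth height divided by the height of the thinner lip.

    All three heights are assumed to be distances derived from facemesh
    landmark coordinates; larger values mean a more open mouth.
    """
    return mouth_height / min(lower_lip_height, upper_lip_height)
```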
Note: The user can combine the mean displacement of any combination of facial landmarks to derive their own custom measure of facial expressivity (e.g., focusing on other specific areas of the face). See the mediapipe documentation to determine which landmarks correspond to which parts of the face.
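Deriving such a custom composite is a one-liner over the `framewise_disp` dataframe. A sketch, assuming the zero-padded `lmkNNN` column naming shown in the output tables; the landmark indices below are placeholders, not a verified face-region mapping:

```python
import pandas as pd

# Placeholder landmark indices; consult the mediapipe facemesh documentation
# for the actual landmark-to-face-region mapping.
CHEEK_LANDMARKS = [50, 205, 280, 425]

def custom_expressivity(framewise_disp: pd.DataFrame, landmarks) -> pd.Series:
    """Mean framewise displacement across a user-chosen group of landmarks."""
    cols = [f"lmk{i:03d}" for i in landmarks]
    return framewise_disp[cols].mean(axis=1)
```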
| `filepath` | |
| --- | --- |
| Type | str |
| Description | path to main video |

| `baseline_filepath` | |
| --- | --- |
| Type | str, optional |
| Description | path to baseline video |

| `framewise_loc` | |
| --- | --- |
| Type | data-type |
| Description | framewise coordinates of 468 facial landmarks; columns refer to landmarks, with every landmark having a value each for its x, y, and z coordinates (ranging between 0 and 1); rows refer to frames in the video. |
What the data frame looks like:

| frame | lmk001_x | lmk002_x | ... | lmk001_y | ... | lmk001_z | ... |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | | | | | | | |
| 1 | | | | | | | |
| ... | | | | | | | |
| `framewise_disp` | |
| --- | --- |
| Type | data-type |
| Description | framewise euclidean distance for each facial landmark; columns refer to individual landmarks, with the last few columns representing mean displacement across landmark composites for the entire face, the upper face, the lower face, the lips, and the eyebrows, plus a framewise measure of mouth openness; the range of these values is -1 to 1 when a baseline is used and 0 to 1 otherwise; rows refer to frames in the video. |
What the data frame looks like:

| frame | lmk001 | lmk002 | … | overall | lower_face | upper_face | lips | eyebrows | mouth_openness |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | | | | | | | | | |
| ... | | | | | | | | | |
| `summary` | |
| --- | --- |
| Type | data-type |
| Description | summary statistics calculated from the `framewise_disp` output, namely the mean and standard deviation of overall, upper face, lower face, lip, and eyebrow expressivity, as well as mouth openness. |

What the dataframe output looks like:

| overall_mean | lower_face_mean | upper_face_mean | lips_mean | eyebrows_mean | mouth_openness_mean | overall_std | lower_face_std | upper_face_std | lips_std | eyebrows_std | mouth_openness_std |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
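The relationship between `framewise_disp` and `summary` can be sketched with pandas. The column names come from the tables above; the function itself is a hypothetical illustration, not the library's internal code:

```python
import pandas as pd

# composite columns of framewise_disp, as listed in the output tables above
COMPOSITES = ["overall", "lower_face", "upper_face",
              "lips", "eyebrows", "mouth_openness"]

def summarize(framewise_disp: pd.DataFrame) -> pd.DataFrame:
    """One-row dataframe holding the mean then the std of each composite."""
    stats = {}
    for col in COMPOSITES:
        stats[f"{col}_mean"] = framewise_disp[col].mean()
    for col in COMPOSITES:
        stats[f"{col}_std"] = framewise_disp[col].std()
    return pd.DataFrame([stats])
```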
Here, we use the facial expressivity function to process sample data included in the repository.
```python
import openwillis as ow

framewise_loc, framewise_disp, summary = ow.facial_expressivity(filepath='data/subj01.mp4', baseline_filepath='data/subj01_base.mp4')
framewise_loc.head(2)
```

| frame | lmk000_x | lmk001_x | lmk002_x | lmk003_x | lmk004_x | lmk005_x | ... |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.533611 | 0.534253 | 0.533953 | 0.524268 | 0.534301 | 0.534436 | ... |
| 1 | 0.533908 | 0.534034 | 0.534066 | 0.524676 | 0.533993 | 0.534129 | ... |

2 rows x 1405 columns
Below are the dependencies specific to the calculation of this measure.

| Dependency | License | Justification |
| --- | --- | --- |
| mediapipe | Apache 2.0 | Continually maintained, 468 3D facial landmarks, plenty of validation data, good documentation. Doesn't do AU or emotion detection, but that isn't needed for this method. |
| opencv | Apache 2.0 | Open-source computer vision library for basic CV operations. |
To see the justification for baseline normalization, see the Research Guidelines on the GitHub Wiki.
Let’s say that:
- A is the framewise displacement for a given landmark in the main video; this value ranges from 0 to 1.
- B is the overall displacement for that landmark calculated from the baseline video, also ranging from 0 to 1.
To normalize, we want to divide A by B to get C, the normalized or baseline-corrected value.
When we do this, three scenarios ensue:
- If A > B, C can range from 1 to infinity.
- If A < B, C will range between 0 and 1.
- If A = B, C will be equal to 1.
To avoid the large possible range of values (0 to infinity), we add 1 to both A and B before division. So:
- When A > B, C can range between 1 and 2.
- When A < B, C will range between 0 and 1.
- When A = B, C will be equal to 1.
When we do this, we get a range of values between 0 and 2. If we subtract 1 from this value, we get a range between -1 and 1, with negative values signifying scenarios where A < B, positive values signifying scenarios where A > B, and 0 signifying A = B, i.e., no difference from baseline.
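The whole procedure reduces to a one-line formula, C = (A + 1) / (B + 1) - 1. A sketch, with a hypothetical function name:

```python
def baseline_normalize(a, b):
    """Baseline-corrected displacement C = (A + 1) / (B + 1) - 1.

    a: framewise displacement of a landmark in the main video (0 to 1)
    b: overall displacement of that landmark in the baseline video (0 to 1)
    Returns a value between -1 and 1; negative means below baseline,
    positive means above baseline, and 0 means no difference from baseline.
    """
    return (a + 1) / (b + 1) - 1
```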
OpenWillis was developed by a small team of clinicians, scientists, and engineers based in Brooklyn, NY.