# Automatic Stimuli Creation for Degrading Visual Information about Articulation or Mouthing
<br>
<div align="center">Wim Pouw (wim.pouw@donders.ru.nl) & Annika Schiefner (a.schiefner@uva.nl)
</div>

<img src="Images/mask_comparison.gif" alt="isolated" width="300"/>

## Info documents

<img src="Images/envision_banner.png" alt="isolated" width="100"/>


This script uses MediaPipe to automatically and dynamically blur parts of the face, currently the mouth region. This can be helpful for generating stimuli in sign language research, but also in research on spoken languages. Both applications are described below.

* location Repository:  https://github.com/WimPouw/StimuliCreationMaskingMouth.ipynb

* location Jupyter notebook: https://github.com/WimPouw/AutoVisualDegradArticulationMouthing/blob/main/StimuliCreationMaskingMouth.ipynb

Current Github: https://github.com/WimPouw/AutoVisualDegradArticulationMouthing

## Citation
* Pouw, W., & Schiefner, A. (2025). Masking the mouth region for visual degradation of articulation and mouthing (Version 1.0.0) [Software]. Retrieved from https://github.com/WimPouw/AutoVisualDegradArticulationMouthing



## Application for Sign Languages
In sign language research, psycholinguists often work with videos of individual signs, akin to working with single spoken words and gestures. These individual signs have different components, each contributing to the information conveyed by the lexical signs. This includes the hands, movements of the body and facial expressions and movements of the mouth.

Sign languages, existing in close proximity to hearing communities, often incorporate the movements associated with spoken words into the signs. When producing individual signs, signers may thus produce a mouth movement that looks as though they were saying the word alongside the hand movement. In fact, signers often find it difficult to produce signs with a neutral face, and the resulting videos are perceived as less natural. This means, in turn, that observers may be able to extract information about the lexical item from looking at the mouth, even without looking at the hands. If you want to investigate what information is conveyed by the manual movements without influence from what the mouth contributes, you therefore need to mask the mouth area. This module provides an approach to doing that, blurring the area around the mouth such that lip reading becomes impossible.

Whenever the signer's hands move into the area around the mouth, you need to decide what to do with them. They can be covered by the same mask, so that the parts of the hand close to the mouth are obscured as well, or they can be excluded from the mask so the hand stays visible even around the mouth. Some handshapes are easier to identify and fit into a mask than others: a fist is a compact, roughly circular shape and is therefore easy to track, whereas a C-shaped hand leaves an opening between the fingers and the thumb that is more difficult to mask. When the hands are excluded from the mask, some portions of the mouth may therefore still be visible. The module provides different options, so you can choose the one you prefer. Try them on your own items and see what works best for your purpose.
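
The choice between the two options can be pictured as a choice about which pixels receive the blur. The sketch below is a minimal NumPy illustration, not the notebook's exact code: `compose_frame` and `keep_hands_visible` are hypothetical names, and the masks are assumed to be float arrays with values in [0, 1].

```python
import numpy as np

def compose_frame(original, blurred, mouth_mask, hand_mask, keep_hands_visible=True):
    """Blend a blurred copy into the original inside the mouth mask.

    original, blurred: float arrays of shape (h, w, 3)
    mouth_mask, hand_mask: float arrays of shape (h, w), values in [0, 1]
    """
    blur_weight = mouth_mask.copy()
    if keep_hands_visible:
        # Exclude hand pixels from the mask so the hand stays sharp near the mouth.
        blur_weight = blur_weight * (1.0 - hand_mask)
    blur_weight = blur_weight[..., np.newaxis]  # broadcast over colour channels
    return original * (1.0 - blur_weight) + blurred * blur_weight

# Tiny 1x3 "frame": the mouth mask covers pixels 1 and 2, the hand covers pixel 2.
original = np.full((1, 3, 3), 100.0)
blurred = np.full((1, 3, 3), 50.0)
mouth_mask = np.array([[0.0, 1.0, 1.0]])
hand_mask = np.array([[0.0, 0.0, 1.0]])

hands_visible = compose_frame(original, blurred, mouth_mask, hand_mask, True)
hands_covered = compose_frame(original, blurred, mouth_mask, hand_mask, False)
```

With `keep_hands_visible=True` the third pixel keeps its original value, while with `False` it is blurred together with the mouth.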

## Application for Spoken Languages
You may know the McGurk effect, where visual information about lip or tongue movements (i.e., articulatory gestures) influences the speech sounds you perceive. With this code you can easily adjust how much of that visual information is present.

## Use
Make sure to install all the packages listed in requirements.txt. Then move the videos that you want to mask into the input folder and run the code. It loops through all the videos in the input folder and saves the results in the output folders.
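
The notebook assumes that the input and output folders already exist. As a hedged convenience sketch, the helper below creates any missing output folders and lists the video files it finds. `prepare_folders` and `VIDEO_EXTENSIONS` are hypothetical names that are not part of the notebook, and the extension list is an assumption.

```python
import os

VIDEO_EXTENSIONS = (".mp4", ".avi", ".mov")  # assumed list; extend as needed

def prepare_folders(input_folder, output_folders):
    """Create missing output folders and return the video files in input_folder."""
    for folder in output_folders:
        os.makedirs(folder, exist_ok=True)
    if not os.path.isdir(input_folder):
        return []
    return sorted(
        name for name in os.listdir(input_folder)
        if os.path.isfile(os.path.join(input_folder, name))
        and name.lower().endswith(VIDEO_EXTENSIONS)
    )
```

For the folder layout used in this notebook, a call could look like `prepare_folders("./Input_Videos/", ["./Output_Videos/", "./Output_TimeSeries/"])`.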

Please use, improve and adapt as you see fit.

This Python notebook walks you through the procedure of taking videos of a single person as input and outputting the kinematic time series and, optionally, a masked video with facial, hand, and arm kinematics overlays.

## Additional information on the backbone of the tool (MediaPipe Holistic tracking)
https://google.github.io/mediapipe/solutions/holistic.html

## Citation of mediapipe
citation: Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., ... & Grundmann, M. (2019). Mediapipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172.

## Modification that is the basis of this tool
Our modification of the MediaPipe tool uses the landmark coordinates that MediaPipe returns to define a region that we mask with a blur. The coordinates of this bounding polygon can be changed if you want to blur other regions of the face. See below for more information about the keypoints.
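
MediaPipe reports landmark positions as fractions of the image width and height, so they have to be scaled to pixel coordinates before a polygon can be drawn. The sketch below shows that step in plain Python. `to_pixel_polygon` is a hypothetical helper, the example coordinates are made up, and the clamping to the frame is an added assumption (the notebook itself only scales and truncates).

```python
def to_pixel_polygon(normalized_points, width, height):
    """Convert normalized (x, y) pairs to integer pixel coordinates inside the frame."""
    polygon = []
    for x, y in normalized_points:
        px = min(max(int(x * width), 0), width - 1)    # clamp to [0, width - 1]
        py = min(max(int(y * height), 0), height - 1)  # clamp to [0, height - 1]
        polygon.append((px, py))
    return polygon

# Made-up normalized coordinates standing in for four face-mesh landmarks.
example_points = [(0.25, 0.5), (0.75, 0.5), (0.75, 0.75), (0.25, 0.75)]
polygon = to_pixel_polygon(example_points, width=640, height=480)
```

The resulting list of pixel pairs can be converted to an `int32` NumPy array and passed to `cv2.fillPoly`, as the main procedure does.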

In [2]:
#load in required packages
import mediapipe as mp #mediapipe
import cv2 #opencv
import math #basic operations
import numpy as np #basic operations
import pandas as pd #data wrangling
import csv #csv saving
import os #some basic functions for inspecting folder structure etc.

#list all videos in input_videofolder
from os import listdir
from os.path import isfile, join
mypath = "./Input_Videos/" #this is your folder with (all) your video(s)
vfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))] #loop through the filenames and collect them in a list
#time series output folder
inputfol = "./Input_Videos/"
outputf_mask = "./Output_Videos/"
outtputf_ts = "./Output_TimeSeries/"

#check videos to be processed
print("The following folder is set as the output folder where all the pose time series are stored")
print(os.path.abspath(outtputf_ts))
print("\n The following folder is set as the output folder for saving the masked videos ")
print(os.path.abspath(outputf_mask))
print("\n The following video(s) will be processed for masking: ")
print(vfiles)

#initialize modules and functions

#load in mediapipe modules
mp_holistic = mp.solutions.holistic
# Import drawing_utils and drawing_styles.
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles

##################FUNCTIONS AND OTHER VARIABLES
#landmarks 33x that are used by Mediapipe (Blazepose)
markersbody = ['NOSE', 'LEFT_EYE_INNER', 'LEFT_EYE', 'LEFT_EYE_OUTER', 'RIGHT_EYE_INNER', 'RIGHT_EYE', 'RIGHT_EYE_OUTER',
          'LEFT_EAR', 'RIGHT_EAR', 'MOUTH_LEFT', 'MOUTH_RIGHT', 'LEFT_SHOULDER', 'RIGHT_SHOULDER', 'LEFT_ELBOW', 
          'RIGHT_ELBOW', 'LEFT_WRIST', 'RIGHT_WRIST', 'LEFT_PINKY', 'RIGHT_PINKY', 'LEFT_INDEX', 'RIGHT_INDEX',
          'LEFT_THUMB', 'RIGHT_THUMB', 'LEFT_HIP', 'RIGHT_HIP', 'LEFT_KNEE', 'RIGHT_KNEE', 'LEFT_ANKLE', 'RIGHT_ANKLE',
          'LEFT_HEEL', 'RIGHT_HEEL', 'LEFT_FOOT_INDEX', 'RIGHT_FOOT_INDEX']

markershands = ['LEFT_WRIST', 'LEFT_THUMB_CMC', 'LEFT_THUMB_MCP', 'LEFT_THUMB_IP', 'LEFT_THUMB_TIP', 'LEFT_INDEX_FINGER_MCP',
              'LEFT_INDEX_FINGER_PIP', 'LEFT_INDEX_FINGER_DIP', 'LEFT_INDEX_FINGER_TIP', 'LEFT_MIDDLE_FINGER_MCP', 
               'LEFT_MIDDLE_FINGER_PIP', 'LEFT_MIDDLE_FINGER_DIP', 'LEFT_MIDDLE_FINGER_TIP', 'LEFT_RING_FINGER_MCP', 
               'LEFT_RING_FINGER_PIP', 'LEFT_RING_FINGER_DIP', 'LEFT_RING_FINGER_TIP', 'LEFT_PINKY_FINGER_MCP', 
               'LEFT_PINKY_FINGER_PIP', 'LEFT_PINKY_FINGER_DIP', 'LEFT_PINKY_FINGER_TIP',
              'RIGHT_WRIST', 'RIGHT_THUMB_CMC', 'RIGHT_THUMB_MCP', 'RIGHT_THUMB_IP', 'RIGHT_THUMB_TIP', 'RIGHT_INDEX_FINGER_MCP',
              'RIGHT_INDEX_FINGER_PIP', 'RIGHT_INDEX_FINGER_DIP', 'RIGHT_INDEX_FINGER_TIP', 'RIGHT_MIDDLE_FINGER_MCP', 
               'RIGHT_MIDDLE_FINGER_PIP', 'RIGHT_MIDDLE_FINGER_DIP', 'RIGHT_MIDDLE_FINGER_TIP', 'RIGHT_RING_FINGER_MCP', 
               'RIGHT_RING_FINGER_PIP', 'RIGHT_RING_FINGER_DIP', 'RIGHT_RING_FINGER_TIP', 'RIGHT_PINKY_FINGER_MCP', 
               'RIGHT_PINKY_FINGER_PIP', 'RIGHT_PINKY_FINGER_DIP', 'RIGHT_PINKY_FINGER_TIP']
facemarks = [str(x) for x in range(478)] #there are 478 points for the face mesh (see google holistic face mesh info for landmarks)

print("Note that we have the following number of pose keypoints for markers body")
print(len(markersbody))

print("\n Note that we have the following number of pose keypoints for markers hands")
print(len(markershands))

print("\n Note that we have the following number of pose keypoints for markers face")
print(len(facemarks ))

#set up the column names and objects for the time series data (add time as the first variable)
markerxyzbody = ['time']
markerxyzhands = ['time']
markerxyzface = ['time']

for mark in markersbody:
    for pos in ['X', 'Y', 'Z', 'visibility']: #for markers of the body you also have a visibility reliability score
        nm = pos + "_" + mark
        markerxyzbody.append(nm)
for mark in markershands:
    for pos in ['X', 'Y', 'Z']:
        nm = pos + "_" + mark
        markerxyzhands.append(nm)
for mark in facemarks:
    for pos in ['X', 'Y', 'Z']:
        nm = pos + "_" + mark
        markerxyzface.append(nm)

#check if there are numbers in a string
def num_there(s):
    return any(i.isdigit() for i in s)

#take some google classification object and convert it into a string
def makegoginto_str(gogobj):
    gogobj = str(gogobj).strip("[]")
    gogobj = gogobj.split("\n")
    return(gogobj[:-1]) #ignore last element as this has nothing

#make the stringifyd position traces into clean numerical values
def listpostions(newsamplemarks):
    newsamplemarks = makegoginto_str(newsamplemarks)
    tracking_p = []
    for value in newsamplemarks:
        if num_there(value):
            stripped = value.split(':', 1)[1]
            stripped = stripped.strip() #remove spaces in the string if present
            tracking_p.append(stripped) #add to this list  
    return(tracking_p)

The following folder is set as the output folder where all the pose time series are stored
d:\Research_projects\AutoVisualDegradArticulationMouthing\Output_TimeSeries

 The following folder is set as the output folder for saving the masked videos 
d:\Research_projects\AutoVisualDegradArticulationMouthing\Output_Videos

 The following video(s) will be processed for masking: 
['DOLFIJN.mp4', 'ETEN.mp4', 'NULL.mp4', 'OCHTEND.mp4', 'OLIFANT.mp4', 'RIETJE.mp4']
Note that we have the following number of pose keypoints for markers body
33

 Note that we have the following number of pose keypoints for markers hands
42

 Note that we have the following number of pose keypoints for markers face
478
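
The keypoint counts printed above determine how wide each time series file is: one `time` column plus one column per marker and coordinate. The sketch below repeats the column-naming scheme of the cell above in a small function (`build_columns` is a hypothetical name) so the expected widths can be checked.

```python
def build_columns(markers, fields):
    """Return ['time', '<field>_<marker>', ...] in the order used by the notebook."""
    columns = ["time"]
    for marker in markers:
        for field in fields:
            columns.append(f"{field}_{marker}")
    return columns

body_example = build_columns(["NOSE", "LEFT_EYE_INNER"], ["X", "Y", "Z", "visibility"])

# Expected widths: 1 time column + markers * fields per marker
n_body_columns = 1 + 33 * 4    # body markers also carry a visibility score
n_hand_columns = 1 + 42 * 3
n_face_columns = len(build_columns(range(478), ["X", "Y", "Z"]))
```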


## Main procedure

Below we apply a blur mask to a polygon defined by particular face-mesh landmarks. The mask has a blurring value (the size of the kernel over which pixels are mixed) and an opacity value (how much of the original image is blocked). The Images folder contains the image below; if you zoom in, you can read the index number of each landmark:
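
To make the two parameters concrete, the sketch below works through the opacity blend on a two-pixel example, assuming NumPy. `blend_with_opacity` mirrors the weighting used in the main loop, and `nearest_odd` is a hypothetical helper reflecting that `cv2.GaussianBlur` expects an odd kernel size.

```python
import numpy as np

def blend_with_opacity(original, blurred, mask, opacity):
    """Mix blurred pixels into the original where mask is 1, scaled by opacity."""
    weight = mask[..., np.newaxis] * opacity
    return (original * (1 - weight) + blurred * weight).astype(np.uint8)

def nearest_odd(size):
    """Return size if it is odd, otherwise the next odd integer."""
    return size if size % 2 == 1 else size + 1

original = np.full((1, 2, 3), 200, dtype=np.uint8)
blurred = np.full((1, 2, 3), 100, dtype=np.uint8)
mask = np.array([[0.0, 1.0]])  # only the second pixel lies inside the polygon

half = blend_with_opacity(original, blurred, mask, opacity=0.5)
full = blend_with_opacity(original, blurred, mask, opacity=1.0)
```

At an opacity of 0.5 the masked pixel ends up halfway between the original and the blurred value; at 1.0 it is replaced by the blurred value entirely.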

<img src="Images/keypoints_holistic_face.png" alt="isolated" width="600"/>


### Mouth landmarks for masking
We have identified the following landmark indices for the mouth region.

MOUTH_LANDMARKS = [192, 206, 2, 426, 436, 434, 431, 211]

Note that nothing stops you from drawing other polygons that cover a different area of the face!
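
When you define your own polygon, it can help to check the index list before running a long batch. The sketch below is one possible sanity check; `check_polygon_indices` is a hypothetical helper, and the limit of 478 landmarks is taken from the face mesh size used in this notebook.

```python
N_FACE_LANDMARKS = 478  # matches len(facemarks) in this notebook

def check_polygon_indices(indices, n_landmarks=N_FACE_LANDMARKS):
    """Return the indices as a list, or raise ValueError if they cannot form a polygon."""
    indices = list(indices)
    if len(indices) < 3:
        raise ValueError("a polygon needs at least three landmarks")
    out_of_range = [i for i in indices if not 0 <= i < n_landmarks]
    if out_of_range:
        raise ValueError(f"landmark indices out of range: {out_of_range}")
    return indices

mouth = check_polygon_indices([192, 206, 2, 426, 436, 434, 431, 211])
```

Note that the order of the indices matters, because the polygon is drawn by connecting the landmarks in the order listed.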



In [17]:
# do you want to apply masking?
masking = True
blur_kernel_size = 111  # Adjust this value to change blur intensity
opacity = 1  # 0.0 is fully transparent, 1.0 is fully opaque

# Mouth landmarks for masking
MOUTH_LANDMARKS = [192, 206, 2, 426, 436, 434, 431, 211]

# We will now loop over all the videos that are present in the video file
for vidf in vfiles:
    print("We will now process video:")
    print(vidf)
    print("This is video number " + str(vfiles.index(vidf))+ " of " + str(len(vfiles)) + " videos in total")
    
    videoname = vidf
    videoloc = inputfol + videoname
    capture = cv2.VideoCapture(videoloc)
    frameWidth = capture.get(cv2.CAP_PROP_FRAME_WIDTH)
    frameHeight = capture.get(cv2.CAP_PROP_FRAME_HEIGHT)
    samplerate = capture.get(cv2.CAP_PROP_FPS)

    fourcc = cv2.VideoWriter_fourcc(*'MP4V')
    out = cv2.VideoWriter(outputf_mask+'handv3_'+videoname, fourcc, 
                         fps = samplerate, frameSize = (int(frameWidth), int(frameHeight)))

    time = 0
    tsbody = [markerxyzbody]
    tshands = [markerxyzhands]
    tsface = [markerxyzface]
    
    with mp_holistic.Holistic(
            static_image_mode=False, enable_segmentation=True, refine_face_landmarks=True) as holistic:
        while (True):
            ret, image = capture.read()
            if ret == True:
                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                results = holistic.process(image)
                
                h, w, c = image.shape
                if results.face_landmarks is not None:  # a face was detected in this frame
                    original_image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
                    
                    if masking:
                        # Create mask for mouth area
                        mouth_mask = np.zeros((h, w), dtype=np.uint8)
                        landmarks = results.face_landmarks.landmark
                        
                        # Get mouth area points
                        mouth_points = np.array([(int(landmarks[idx].x * w), int(landmarks[idx].y * h)) 
                                               for idx in MOUTH_LANDMARKS], dtype=np.int32)
                        
                        # Fill mouth polygon
                        cv2.fillPoly(mouth_mask, [mouth_points], 255)
                        
                        # Create hand mask
                        hand_mask = np.zeros((h, w), dtype=np.uint8)
                        
                        # Draw hands on the mask
                        if results.left_hand_landmarks:
                            hand_points = []
                            for landmark in results.left_hand_landmarks.landmark:
                                x = int(landmark.x * w)
                                y = int(landmark.y * h)
                                hand_points.append((x, y))
                            if len(hand_points) > 0:
                                hull = cv2.convexHull(np.array(hand_points))
                                cv2.fillConvexPoly(hand_mask, hull, 255)
                                
                        if results.right_hand_landmarks:
                            hand_points = []
                            for landmark in results.right_hand_landmarks.landmark:
                                x = int(landmark.x * w)
                                y = int(landmark.y * h)
                                hand_points.append((x, y))
                            if len(hand_points) > 0:
                                hull = cv2.convexHull(np.array(hand_points))
                                cv2.fillConvexPoly(hand_mask, hull, 255)
                        
                        # Create a more precise hand mask with minimal dilation
                        kernel = np.ones((5,5), np.uint8)
                        hand_mask = cv2.dilate(hand_mask, kernel, iterations=1)
                        
                        # Smooth the edges of the hand mask
                        hand_mask = cv2.GaussianBlur(hand_mask, (3,3), 0)
                        _, hand_mask = cv2.threshold(hand_mask, 127, 255, cv2.THRESH_BINARY)
                        
                        # First apply the mouth blur
                        blur_region = cv2.GaussianBlur(original_image, (blur_kernel_size, blur_kernel_size), 0)
                        mask_3channel = cv2.cvtColor(mouth_mask, cv2.COLOR_GRAY2BGR) / 255.0
                        blurred_image = (original_image * (1 - mask_3channel * opacity) + 
                                       blur_region * (mask_3channel * opacity)).astype(np.uint8)
                        
                        # Then overlay the hands
                        hand_mask_3channel = cv2.cvtColor(hand_mask, cv2.COLOR_GRAY2BGR) / 255.0
                        original_image = (blurred_image * (1 - hand_mask_3channel) + 
                                        original_image * hand_mask_3channel).astype(np.uint8)
                    
                    # Save time series data
                    samplebody = listpostions(results.pose_landmarks)
                    samplehands = listpostions([results.left_hand_landmarks, results.right_hand_landmarks])
                    sampleface = listpostions(results.face_landmarks)
                    samplebody.insert(0, time)
                    samplehands.insert(0, time)
                    sampleface.insert(0, time)
                    tsbody.append(samplebody)
                    tshands.append(samplehands)
                    tsface.append(sampleface)
                    
                if results.face_landmarks is None:  # no face detected: store NaN rows
                    original_image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
                    samplebody = [np.nan for x in range(len(markerxyzbody)-1)]
                    samplehands = [np.nan for x in range(len(markerxyzhands)-1)]
                    sampleface = [np.nan for x in range(len(markerxyzface)-1)]
                    samplebody.insert(0, time)
                    samplehands.insert(0, time)
                    sampleface.insert(0, time)
                    tsbody.append(samplebody)
                    tshands.append(samplehands)
                    tsface.append(sampleface)
                
                cv2.imshow("resizedimage", original_image)
                out.write(original_image)
                time = time+(1000/samplerate)
                
            if cv2.waitKey(1) == 27:
                break
            if ret == False:
                break

    out.release()
    capture.release()
    cv2.destroyAllWindows()
    
    # Write CSV files
    filebody = open(outtputf_ts + vidf[:-4]+'_body.csv', 'w+', newline ='')
    with filebody:    
        write = csv.writer(filebody)
        write.writerows(tsbody)
        
    filehands = open(outtputf_ts + vidf[:-4]+'_hands.csv', 'w+', newline ='')
    with filehands:
        write = csv.writer(filehands)
        write.writerows(tshands)
        
    fileface = open(outtputf_ts + vidf[:-4]+'_face.csv', 'w+', newline ='')
    with fileface:    
        write = csv.writer(fileface)
        write.writerows(tsface)

print("Done with processing all folders; go look in your output folders!")

We will now process video:
DOLFIJN.mp4
This is video number 0 of 6 videos in total
We will now process video:
ETEN.mp4
This is video number 1 of 6 videos in total
We will now process video:
NULL.mp4
This is video number 2 of 6 videos in total
We will now process video:
OCHTEND.mp4
This is video number 3 of 6 videos in total
We will now process video:
OLIFANT.mp4
This is video number 4 of 6 videos in total
We will now process video:
RIETJE.mp4
This is video number 5 of 6 videos in total
Done with processing all folders; go look in your output folders!
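
In the loop above, the `time` value of each row advances by `1000/samplerate` milliseconds per frame. As a small worked example, the sketch below computes the same timestamps directly from the frame index, which avoids accumulating rounding error when the step is not a whole number. `frame_times_ms` is a hypothetical helper, not part of the notebook.

```python
def frame_times_ms(n_frames, fps):
    """Timestamp of each frame in milliseconds, computed from the frame index."""
    return [index * 1000.0 / fps for index in range(n_frames)]

# At 25 frames per second, consecutive frames are 40 ms apart.
times = frame_times_ms(4, fps=25)
```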


## Making a stimuli set
We can also make a stimulus set in which we iterate over different levels of opacity, i.e., different degrees to which visual information is blocked. We also automatically isolate the head of the person, in case we only want to show the head (we use a fixed bounding box based on the median bounds over the first 30 frames in which a face is detected). Set `headzoom = False` if you want to keep the original framing.
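
The reason for taking a median is that a single frame with an unusual detection should not shift the crop. The sketch below illustrates this with the standard library only. `stable_bounds` is a hypothetical helper and the boxes are made-up values, with the third one acting as an outlier.

```python
from statistics import median

def stable_bounds(bounds_list):
    """Per-coordinate median of (x1, y1, x2, y2) boxes; None for an empty list."""
    if not bounds_list:
        return None
    return tuple(int(median(values)) for values in zip(*bounds_list))

# Made-up per-frame boxes; the last one is a deliberately bad detection.
frame_boxes = [(417, 181, 597, 388), (419, 180, 599, 390), (300, 100, 700, 500)]
box = stable_bounds(frame_boxes)
crop_width = box[2] - box[0]
crop_height = box[3] - box[1]
```

The outlier does not enlarge the crop, and the resulting width and height are what the video writer needs as its frame size.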

In [5]:
# Now make a stimuli creation pipeline
blur_kernel_size = 111  # Adjust this value to change blur intensity
opacitylist = [0, 0.25, 0.50, 0.75, 1]  # 0.0 is fully transparent, 1.0 is fully opaque
headzoom = True
masking = True
MOUTH_LANDMARKS = [192, 206, 2, 426, 436, 434, 431, 211]

def get_head_bounds(landmarks, w, h, padding=0.2):
    """Calculate head bounding box with padding"""
    face_coords = np.array([(landmark.x * w, landmark.y * h) for landmark in landmarks])
    min_x, min_y = np.min(face_coords, axis=0)
    max_x, max_y = np.max(face_coords, axis=0)
    
    # Add padding
    width = max_x - min_x
    height = max_y - min_y
    pad_x = width * padding
    pad_y = height * padding
    
    # Ensure bounds are within image
    x1 = max(0, int(min_x - pad_x))
    y1 = max(0, int(min_y - pad_y))
    x2 = min(w, int(max_x + pad_x))
    y2 = min(h, int(max_y + pad_y))
    
    return x1, y1, x2, y2

# Process each video with different opacities
for opacity in opacitylist:
    for vidf in vfiles:
        print(f"Processing video {vfiles.index(vidf) + 1}/{len(vfiles)}: {vidf}")
        
        videoloc = inputfol + vidf
        capture = cv2.VideoCapture(videoloc)
        frameWidth = capture.get(cv2.CAP_PROP_FRAME_WIDTH)
        frameHeight = capture.get(cv2.CAP_PROP_FRAME_HEIGHT)
        samplerate = capture.get(cv2.CAP_PROP_FPS)

        # Calculate stable head bounds if head zoom is enabled
        head_bounds = None
        if headzoom:
            print("Calculating stable head bounds...")
            temp_capture = cv2.VideoCapture(videoloc)
            bounds_list = []
            frame_count = 0
            
            with mp_holistic.Holistic(static_image_mode=False, enable_segmentation=True, refine_face_landmarks=True) as holistic:
                while frame_count < 30:  # Check first 30 frames
                    ret, image = temp_capture.read()
                    if not ret:
                        break
                    
                    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                    results = holistic.process(image)
                    
                    if results.face_landmarks:
                        h, w, _ = image.shape
                        bounds = get_head_bounds(results.face_landmarks.landmark, w, h)
                        bounds_list.append(bounds)
                        frame_count += 1
            
            temp_capture.release()
            
            if bounds_list:
                bounds_array = np.array(bounds_list)
                head_bounds = tuple(map(int, np.median(bounds_array, axis=0)))
                x1, y1, x2, y2 = head_bounds
                frameWidth = x2 - x1
                frameHeight = y2 - y1
                print(f"Head bounds calculated: {head_bounds}")
            else:
                print("Warning: Could not detect face in initial frames")
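                # head_bounds stays None, so this video is written uncropped;
                # headzoom itself is left unchanged so later videos can still be cropped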

        # Set up video writer with appropriate dimensions
        fourcc = cv2.VideoWriter_fourcc(*'MP4V')
        if headzoom and head_bounds is not None:
            out = cv2.VideoWriter(
                outputf_mask + f'opacity{opacity}_headcrop_{vidf}', 
                fourcc, fps=samplerate, 
                frameSize=(int(frameWidth), int(frameHeight))
            )
        else:
            out = cv2.VideoWriter(
                outputf_mask + f'opacity{opacity}_{vidf}', 
                fourcc, fps=samplerate, 
                frameSize=(int(frameWidth), int(frameHeight))
            )

        # Initialize time series storage
        time = 0
        tsbody = [markerxyzbody]
        tshands = [markerxyzhands]
        tsface = [markerxyzface]
        
        # Process video frames
        with mp_holistic.Holistic(static_image_mode=False, enable_segmentation=True, refine_face_landmarks=True) as holistic:
            while True:
                ret, image = capture.read()
                if not ret:
                    break
                    
                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                results = holistic.process(image)
                
                original_image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
                
                if results.face_landmarks:
                    h, w, c = original_image.shape
                    
                    if masking:
                        # Create mouth mask
                        mouth_mask = np.zeros((h, w), dtype=np.uint8)
                        landmarks = results.face_landmarks.landmark
                        mouth_points = np.array([(int(landmarks[idx].x * w), int(landmarks[idx].y * h)) 
                                            for idx in MOUTH_LANDMARKS], dtype=np.int32)
                        cv2.fillPoly(mouth_mask, [mouth_points], 255)
                        
                        # Create hand mask
                        hand_mask = np.zeros((h, w), dtype=np.uint8)
                        
                        # Add hands to mask
                        if results.left_hand_landmarks:
                            hand_points = []
                            for landmark in results.left_hand_landmarks.landmark:
                                x = int(landmark.x * w)
                                y = int(landmark.y * h)
                                hand_points.append((x, y))
                            if hand_points:
                                hull = cv2.convexHull(np.array(hand_points))
                                cv2.fillConvexPoly(hand_mask, hull, 255)
                                
                        if results.right_hand_landmarks:
                            hand_points = []
                            for landmark in results.right_hand_landmarks.landmark:
                                x = int(landmark.x * w)
                                y = int(landmark.y * h)
                                hand_points.append((x, y))
                            if hand_points:
                                hull = cv2.convexHull(np.array(hand_points))
                                cv2.fillConvexPoly(hand_mask, hull, 255)
                        
                        # Smooth hand mask edges
                        kernel = np.ones((5,5), np.uint8)
                        hand_mask = cv2.dilate(hand_mask, kernel, iterations=1)
                        hand_mask = cv2.GaussianBlur(hand_mask, (3,3), 0)
                        _, hand_mask = cv2.threshold(hand_mask, 127, 255, cv2.THRESH_BINARY)
                        
                        # Apply mouth blur
                        blur_region = cv2.GaussianBlur(original_image, (blur_kernel_size, blur_kernel_size), 0)
                        mask_3channel = cv2.cvtColor(mouth_mask, cv2.COLOR_GRAY2BGR) / 255.0
                        blurred_image = (original_image * (1 - mask_3channel * opacity) + 
                                    blur_region * (mask_3channel * opacity)).astype(np.uint8)
                        
                        # Overlay hands
                        hand_mask_3channel = cv2.cvtColor(hand_mask, cv2.COLOR_GRAY2BGR) / 255.0
                        original_image = (blurred_image * (1 - hand_mask_3channel) + 
                                        original_image * hand_mask_3channel).astype(np.uint8)
                    
                    # Save landmarks data
                    samplebody = listpostions(results.pose_landmarks)
                    samplehands = listpostions([results.left_hand_landmarks, results.right_hand_landmarks])
                    sampleface = listpostions(results.face_landmarks)
                    
                else:
                    # No face detected
                    samplebody = [np.nan for x in range(len(markerxyzbody)-1)]
                    samplehands = [np.nan for x in range(len(markerxyzhands)-1)]
                    sampleface = [np.nan for x in range(len(markerxyzface)-1)]
                
                # Apply head zooming after all processing
                if headzoom and head_bounds is not None:
                    x1, y1, x2, y2 = head_bounds
                    original_image = original_image[y1:y2, x1:x2]
                
                # Add time to samples and append to time series
                samplebody.insert(0, time)
                samplehands.insert(0, time)
                sampleface.insert(0, time)
                tsbody.append(samplebody)
                tshands.append(samplehands)
                tsface.append(sampleface)
                
                cv2.imshow("resizedimage", original_image)
                out.write(original_image)
                time = time + (1000/samplerate)
                
                if cv2.waitKey(1) == 27:
                    break

        # Clean up video resources
        out.release()
        capture.release()
        cv2.destroyAllWindows()

        # Write CSV files only if they don't exist
        body_path = outtputf_ts + vidf[:-4] + '_body.csv'
        hands_path = outtputf_ts + vidf[:-4] + '_hands.csv'
        face_path = outtputf_ts + vidf[:-4] + '_face.csv'

        if not os.path.exists(body_path):
            with open(body_path, 'w+', newline='') as filebody:
                write = csv.writer(filebody)
                write.writerows(tsbody)
                print(f"Saved body data to {body_path}")

        if not os.path.exists(hands_path):
            with open(hands_path, 'w+', newline='') as filehands:
                write = csv.writer(filehands)
                write.writerows(tshands)
                print(f"Saved hands data to {hands_path}")

        if not os.path.exists(face_path):
            with open(face_path, 'w+', newline='') as fileface:
                write = csv.writer(fileface)
                write.writerows(tsface)
                print(f"Saved face data to {face_path}")

print("Done with processing all folders; go look in your output folders!")

Processing video 1/6: DOLFIJN.mp4
Calculating stable head bounds...
Head bounds calculated: (417, 181, 597, 388)
Processing video 2/6: ETEN.mp4
Calculating stable head bounds...
Head bounds calculated: (431, 203, 609, 414)
Processing video 3/6: NULL.mp4
Calculating stable head bounds...
Head bounds calculated: (437, 187, 615, 393)
Processing video 4/6: OCHTEND.mp4
Calculating stable head bounds...
Head bounds calculated: (479, 197, 651, 408)
Processing video 5/6: OLIFANT.mp4
Calculating stable head bounds...
Head bounds calculated: (406, 195, 584, 400)
Processing video 6/6: RIETJE.mp4
Calculating stable head bounds...
Head bounds calculated: (469, 242, 636, 437)
Processing video 1/6: DOLFIJN.mp4
Calculating stable head bounds...
Head bounds calculated: (417, 181, 597, 388)
Processing video 2/6: ETEN.mp4
Calculating stable head bounds...
Head bounds calculated: (431, 203, 609, 414)
Processing video 3/6: NULL.mp4
Calculating stable head bounds...
Head bounds calculated: (437, 187, 615, 3

In [None]:
# Bonus: Create a gif with all opacities

In [12]:
import cv2
import numpy as np
from PIL import Image
import glob
import random
import os

def create_comparison_gif(output_folder, output_name="comparison.gif", grid_size=(4, 4), duration=100):
    # Get all video files for each opacity
    video_files = {}
    for opacity in [0, 0.50, 1]:
        pattern = os.path.join(output_folder, f'opacity{opacity}_*.mp4')
        video_files[opacity] = glob.glob(pattern)
    
    # Randomly select videos for each grid position
    selected_videos = []
    for _ in range(grid_size[0] * grid_size[1]):
        opacity = random.choice(list(video_files.keys()))
        if video_files[opacity]:  # If there are videos available for this opacity
            video = random.choice(video_files[opacity])
            selected_videos.append((opacity, video))
    
    # Open all videos
    video_captures = []
    for _, video_path in selected_videos:
        cap = cv2.VideoCapture(video_path)
        video_captures.append(cap)
    
    # Get video properties
    sample_cap = video_captures[0]
    frame_width = int(sample_cap.get(cv2.CAP_PROP_FRAME_WIDTH))*2
    frame_height = int(sample_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))*2
    fps = int(sample_cap.get(cv2.CAP_PROP_FPS))
    
    # Calculate dimensions for grid cells
    cell_width = frame_width // grid_size[1]
    cell_height = frame_height // grid_size[0]
    
    # Create list to store frames for GIF
    gif_frames = []
    
    while True:
        # Read frames from all videos
        frames = []
        all_read = True
        
        for cap in video_captures:
            ret, frame = cap.read()
            if not ret:
                all_read = False
                break
            # Resize frame to fit grid cell
            frame = cv2.resize(frame, (cell_width, cell_height))
            frames.append(frame)
        
        if not all_read:
            break
        
        # Create grid frame
        grid_frame = np.zeros((cell_height * grid_size[0], 
                             cell_width * grid_size[1], 
                             3), dtype=np.uint8)
        
        # Fill grid with frames
        for idx, frame in enumerate(frames):
            i = idx // grid_size[1]
            j = idx % grid_size[1]
            grid_frame[i*cell_height:(i+1)*cell_height, 
                      j*cell_width:(j+1)*cell_width] = frame
        
        # Add opacity labels
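        for idx, (opacity, _) in enumerate(selected_videos):
            i = idx // grid_size[1]
            j = idx % grid_size[1]
            # Draw the opacity value in the top-left corner of each grid cell
            cv2.putText(grid_frame, f"opacity {opacity}",
                        (j * cell_width + 5, i * cell_height + 20),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)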
                   
        # Convert BGR to RGB for PIL
        grid_frame_rgb = cv2.cvtColor(grid_frame, cv2.COLOR_BGR2RGB)
        gif_frames.append(Image.fromarray(grid_frame_rgb))
    
    # Release video captures
    for cap in video_captures:
        cap.release()
    
    # Save as GIF
    if gif_frames:
        gif_frames[0].save(
            os.path.join(output_folder, output_name),
            save_all=True,
            append_images=gif_frames[1:],
            duration=duration,  # milliseconds per frame
            loop=0
        )
        print(f"Created GIF: {output_name}")
    else:
        print("No frames were processed. Check if videos are readable.")

# Usage
output_folder = outputf_mask  # Use your output folder path
create_comparison_gif(output_folder, 
                     output_name="mask_comparison.gif",
                     grid_size=(4, 4),
                     duration=100)  # Adjust duration (ms) as needed (fps=10)

Created GIF: mask_comparison.gif
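
Two small pieces of arithmetic drive the GIF layout above: the mapping from a video's position in the list to its grid cell, and the relation between frames per second and the per-frame `duration` in milliseconds. The sketch below spells both out; the function names are hypothetical.

```python
def grid_position(index, n_columns):
    """Return (row, column) of the cell that holds the video at this list index."""
    return index // n_columns, index % n_columns

def gif_frame_duration_ms(fps):
    """Per-frame display time in milliseconds for a given playback rate."""
    return round(1000 / fps)

row, column = grid_position(5, n_columns=4)  # sixth video in a 4-column grid
duration = gif_frame_duration_ms(10)         # 10 frames per second
```

To play the GIF at the speed of the source videos, `duration` can be derived from their frame rate instead of being fixed at 100 ms.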
