# Video Processing Notebook
##### <strong>Author:</strong> <u>Walter Dych</u> <em>(walterpdych@gmail.com)</em>
##### <strong>Edits/Documentation:</strong> <u>Karee Garvin</u> <em>(kgarvin@fas.harvard.edu)</em>

This notebook extracts body pose landmarks from a video file using the MediaPipe Holistic model.

It reads the video frame by frame, extracts pose landmarks from each frame, and stores them in a DataFrame along with the frame's timestamp.

The script uses `cv2.VideoCapture` from the OpenCV library to open the video file, and the `isOpened` method to check that it was opened successfully. It then reads frames one at a time with the `read` method. The returned `ret` flag indicates whether a frame was read; when it is `False` (typically at the end of the video), the script breaks out of the loop.

`cv2.cvtColor` converts each frame from BGR, OpenCV's default channel order, to RGB, which MediaPipe expects. The `holistic.process` method then extracts pose landmarks from the frame. If pose landmarks are detected, the x and y coordinates of the shoulders, elbows, wrists, eyes, and nose are stored along with the frame's timestamp; frames with no detected pose are skipped.
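MediaPipe reports landmark `x` and `y` values normalized to the image width and height, which is why the stored coordinates fall roughly between 0 and 1. The sketch below shows one way to map them back to pixel positions. The helper name `to_pixel` and the 1920x1080 frame size are assumptions for illustration; the actual frame size of the video is not stated in this notebook.

```python
def to_pixel(x_norm: float, y_norm: float, width: int, height: int) -> tuple:
    """Convert normalized landmark coordinates to integer pixel coordinates."""
    # Clamp to the image bounds, since a landmark can fall slightly outside [0, 1].
    px = min(max(int(x_norm * width), 0), width - 1)
    py = min(max(int(y_norm * height), 0), height - 1)
    return px, py


# First right-wrist sample from this notebook's output, on an assumed 1920x1080 frame.
print(to_pixel(0.243763, 0.848833, 1920, 1080))
```

In the notebook itself, the frame size could be taken from `image.shape` inside the processing loop rather than hard-coded.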

The timestamp of the current frame, in milliseconds, is read with `cap.get(cv2.CAP_PROP_POS_MSEC)`. It is stored in the `time_ms` variable and saved with the landmark coordinates.
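For a constant-frame-rate video, the same timestamp can be approximated from the frame index and the frame rate. The sketch below is a cross-check, not the method this notebook uses. The helper `frame_time_ms` is hypothetical, and the frame rate of 30000/1001 (about 29.97 fps) is an assumption inferred from the 33.366667 ms spacing in the output further down.

```python
def frame_time_ms(frame_index: int, fps: float) -> float:
    """Timestamp in milliseconds of a frame, assuming a constant frame rate."""
    return frame_index * 1000.0 / fps


fps = 30000 / 1001  # assumed frame rate (about 29.97 fps)
for i in range(3):
    # Rounded to six decimals to match the precision shown in the DataFrame.
    print(i, round(frame_time_ms(i, fps), 6))
```

In the notebook, the frame rate could be read with `cap.get(cv2.CAP_PROP_FPS)` instead of being assumed.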

Finally, the `cap.release` method is used to release the video file and free up system resources.


## Importing Libraries
Here, we import essential libraries:
- `cv2`: OpenCV for image and video processing
- `mediapipe`: Google's MediaPipe for pose estimation
- `os`: For operating system related tasks
- `pandas`: For DataFrame support

In [1]:
import cv2
import mediapipe as mp
import os
import pandas as pd

## Setting Parameters
In this section, you can modify the following parameters:
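- `MODEL`: The MediaPipe `model_complexity` setting: `0` (lite), `1` (full), or `2` (heavy). Accuracy and processing time generally increase with complexity. MediaPipe's default is `1`; this notebook uses `2`.
- `video_path`: Path to the video file.

In [2]:
MODEL = 2  # model_complexity: 0 = lite, 1 = full (MediaPipe default), 2 = heavy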
video_path = "C:/Users/cosmo/Desktop/Random Scripts/Co-Speech Gesture Automation/Co-Speech-Gesture-Automation/VIDEO_FILES/5012_I.MOV"  # Add your video file path here

# Stop early if the path does not point to an existing file
if os.path.isfile(video_path):
    print(f"{video_path} is a valid file. Proceed with processing.")
else:
    raise FileNotFoundError(f"{video_path} does not exist. Try adding the entire file path.")

C:/Users/cosmo/Desktop/Random Scripts/Co-Speech Gesture Automation/Co-Speech-Gesture-Automation/VIDEO_FILES/5012_I.MOV is a valid file. Proceed with processing.


## Initialization
This part initializes MediaPipe components used in the notebook.

In [3]:
# Initialize MediaPipe components
mp_drawing = mp.solutions.drawing_utils
mp_holistic = mp.solutions.holistic

## Processing Loop
The core logic of video processing is performed in this loop.

In [4]:
print(f"Processing video at {video_path}")
with mp_holistic.Holistic(static_image_mode=False, model_complexity=MODEL) as holistic:
    # Collect one row per frame in a list; it is converted to a DataFrame later
    data = []
    cap = cv2.VideoCapture(video_path)
    while cap.isOpened():
        ret, image = cap.read()
        
        if not ret:
            print("Ignoring empty camera frame.")
            break

        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = holistic.process(image)

        # Timestamp of the current frame in milliseconds
        time_ms = cap.get(cv2.CAP_PROP_POS_MSEC)
        
        # Extract landmark coordinates only when a pose was detected
        if results.pose_landmarks is not None:
            right_shoulder_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_SHOULDER].x
            right_shoulder_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_SHOULDER].y
            left_shoulder_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_SHOULDER].x
            left_shoulder_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_SHOULDER].y
            right_elbow_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_ELBOW].x
            right_elbow_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_ELBOW].y
            left_elbow_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_ELBOW].x
            left_elbow_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_ELBOW].y
            right_wrist_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_WRIST].x
            right_wrist_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_WRIST].y
            left_wrist_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_WRIST].x
            left_wrist_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_WRIST].y
            right_eye_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_EYE].x
            right_eye_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.RIGHT_EYE].y
            left_eye_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_EYE].x
            left_eye_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_EYE].y
            nose_x = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.NOSE].x
            nose_y = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.NOSE].y
            
            data.append([time_ms, right_shoulder_x, right_shoulder_y, left_shoulder_x, left_shoulder_y, right_elbow_x, right_elbow_y, left_elbow_x, left_elbow_y, right_wrist_x, right_wrist_y, left_wrist_x, left_wrist_y, right_eye_x, right_eye_y, left_eye_x, left_eye_y, nose_x, nose_y])

    cap.release()

Processing video at C:/Users/cosmo/Desktop/Random Scripts/Co-Speech Gesture Automation/Co-Speech-Gesture-Automation/VIDEO_FILES/5012_I.MOV
Ignoring empty camera frame.
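
The per-landmark assignments in the loop above follow one repeated pattern, so the row could also be built from a list of landmark names. The following is a minimal sketch of that idea, not the notebook's code. `build_row`, `column_names`, and the stand-in data are hypothetical, and the stand-in exists only so the example runs without MediaPipe installed.

```python
from types import SimpleNamespace

# Landmark names in the column order used by this notebook.
LANDMARK_NAMES = [
    "RIGHT_SHOULDER", "LEFT_SHOULDER", "RIGHT_ELBOW", "LEFT_ELBOW",
    "RIGHT_WRIST", "LEFT_WRIST", "RIGHT_EYE", "LEFT_EYE", "NOSE",
]


def build_row(time_ms, landmarks, index_of):
    """Return [time_ms, x, y, x, y, ...] following LANDMARK_NAMES."""
    row = [time_ms]
    for name in LANDMARK_NAMES:
        point = landmarks[index_of[name]]
        row.extend([point.x, point.y])
    return row


def column_names():
    """Return column names matching the layout produced by build_row."""
    cols = ["time_ms"]
    for name in LANDMARK_NAMES:
        cols.extend([f"{name.lower()}_x", f"{name.lower()}_y"])
    return cols


# Stand-in landmarks and index mapping (assumption: real data comes from MediaPipe).
fake_index = {name: i for i, name in enumerate(LANDMARK_NAMES)}
fake_landmarks = [SimpleNamespace(x=i / 10, y=i / 20) for i in range(len(LANDMARK_NAMES))]
row = build_row(0.0, fake_landmarks, fake_index)
print(dict(zip(column_names(), row)))
```

Inside the notebook's loop, this would presumably be called as `build_row(time_ms, results.pose_landmarks.landmark, mp_holistic.PoseLandmark)`, assuming `PoseLandmark` supports lookup by name as a standard Python enum does.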


In [5]:
# Convert to DataFrame
df = pd.DataFrame(data, columns=[
    "time_ms", 
    "right_shoulder_x", "right_shoulder_y", 
    "left_shoulder_x", "left_shoulder_y", 
    "right_elbow_x", "right_elbow_y", 
    "left_elbow_x", "left_elbow_y", 
    "right_wrist_x", "right_wrist_y", 
    "left_wrist_x", "left_wrist_y", 
    "right_eye_x", "right_eye_y", 
    "left_eye_x", "left_eye_y",
    "nose_x", "nose_y"
    ])
df

,time_ms,right_shoulder_x,right_shoulder_y,left_shoulder_x,left_shoulder_y,right_elbow_x,right_elbow_y,left_elbow_x,left_elbow_y,right_wrist_x,right_wrist_y,left_wrist_x,left_wrist_y,right_eye_x,right_eye_y,left_eye_x,left_eye_y,nose_x,nose_y
0,0.000000,0.346543,0.413691,0.403167,0.370346,0.267567,0.594082,0.388862,0.499703,0.243763,0.848833,0.379629,0.624571,0.442876,0.292149,0.445312,0.289609,0.447097,0.314755
1,33.366667,0.349762,0.404192,0.403014,0.368685,0.267428,0.593026,0.388766,0.514284,0.243417,0.847239,0.391232,0.657073,0.443817,0.291971,0.446493,0.289600,0.448412,0.314845
2,66.733333,0.351200,0.403555,0.402127,0.367687,0.267399,0.593162,0.381889,0.549503,0.242909,0.846286,0.382327,0.708674,0.444323,0.292319,0.447014,0.290269,0.448583,0.317095
3,100.100000,0.353098,0.402300,0.398918,0.361902,0.267950,0.593586,0.376209,0.541252,0.242784,0.846112,0.378689,0.683741,0.445723,0.292993,0.448321,0.291113,0.449652,0.318940
4,133.466667,0.355514,0.397340,0.396794,0.362337,0.268713,0.593200,0.370343,0.570560,0.242482,0.846839,0.362583,0.754018,0.447435,0.292911,0.449688,0.291053,0.451239,0.319041
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19494,650449.800000,0.238439,0.366778,0.325441,0.332039,0.242718,0.621743,0.326518,0.533944,0.362085,0.709622,0.370577,0.670067,0.308809,0.195500,0.332758,0.203717,0.325474,0.227894
19495,650483.166667,0.238767,0.365260,0.326205,0.333711,0.237742,0.621175,0.326006,0.536027,0.361361,0.704469,0.371994,0.669746,0.308856,0.195518,0.333087,0.203666,0.325584,0.227865
19496,650516.533333,0.238778,0.365264,0.327986,0.334837,0.231410,0.619890,0.326002,0.535907,0.356160,0.699456,0.378909,0.654336,0.308489,0.195568,0.333328,0.203704,0.325390,0.227833
19497,650549.900000,0.238771,0.365269,0.330313,0.336026,0.226552,0.619473,0.326046,0.533799,0.352094,0.687707,0.382221,0.632677,0.309120,0.196582,0.334145,0.204837,0.325690,0.228596
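
Because the loop only appends a row when a pose is detected, skipped frames show up as gaps in `time_ms`. The sketch below is one way to look for such gaps. The helper `find_gaps` and the sample timestamps are hypothetical, and the frame interval of 33.366667 ms is taken from the spacing of the first rows above.

```python
def find_gaps(times_ms, frame_interval_ms, tolerance=0.5):
    """Return (start, end, missing_frames) for each gap longer than one frame."""
    gaps = []
    for prev, curr in zip(times_ms, times_ms[1:]):
        steps = (curr - prev) / frame_interval_ms
        if steps > 1 + tolerance:
            gaps.append((prev, curr, round(steps) - 1))
    return gaps


# Made-up timestamps with two frames missing between 66.7 ms and 166.8 ms.
times = [0.0, 33.366667, 66.733333, 166.833333, 200.2]
print(find_gaps(times, frame_interval_ms=33.366667))
```

In the notebook this could be applied to `df["time_ms"].tolist()`. For the run shown above, the last index (19497) multiplied by the frame interval equals the last timestamp (650549.9 ms), which suggests that no frames were skipped.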


## Data Output
Finally, the DataFrame is saved as both a pickle file and a CSV file.
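
The output file names are derived from the video file name with a `_keypoints` suffix. As an alternative to string concatenation, the same names could be built with `pathlib`. This is a sketch under assumptions: `output_paths` is a hypothetical helper, and the relative directories are placeholders rather than the paths used in this notebook.

```python
from pathlib import Path


def output_paths(video_path, output_dir):
    """Return (pickle_path, csv_path) named after the video file."""
    stem = Path(video_path).stem  # file name without directory or extension
    out = Path(output_dir)
    return out / f"{stem}_keypoints.pkl", out / f"{stem}_keypoints.csv"


# Placeholder relative paths for illustration only.
pkl_path, csv_path = output_paths("VIDEO_FILES/5012_I.MOV", "MOTION_TRACKING_FILES")
print(pkl_path.name, csv_path.name)
```

Both `df.to_pickle` and `df.to_csv` accept path objects, so the returned values could be passed to them directly.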

In [6]:
# Print the first rows of the DataFrame
print(f"DataFrame Head: {df.head()}")

# Output directory and base file name (derived from the video file name)
output_dir = "C:/Users/cosmo/Desktop/Random Scripts/Co-Speech Gesture Automation/Co-Speech-Gesture-Automation/MOTION_TRACKING_FILES/"
base_name = os.path.splitext(os.path.basename(video_path))[0]

# Save DataFrame as pickle file
pickle_file_name = output_dir + base_name + "_keypoints.pkl"
df.to_pickle(pickle_file_name)
print(f"DataFrame saved as {pickle_file_name}")

# Save DataFrame as CSV file (without the index column)
csv_file_name = output_dir + base_name + "_keypoints.csv"
df.to_csv(csv_file_name, index=False)
print(f"DataFrame saved as {csv_file_name}")

DataFrame Head:       time_ms  right_shoulder_x  right_shoulder_y  left_shoulder_x  \
0    0.000000          0.346543          0.413691         0.403167   
1   33.366667          0.349762          0.404192         0.403014   
2   66.733333          0.351200          0.403555         0.402127   
3  100.100000          0.353098          0.402300         0.398918   
4  133.466667          0.355514          0.397340         0.396794   

   left_shoulder_y  right_elbow_x  right_elbow_y  left_elbow_x  left_elbow_y  \
0         0.370346       0.267567       0.594082      0.388862      0.499703   
1         0.368685       0.267428       0.593026      0.388766      0.514284   
2         0.367687       0.267399       0.593162      0.381889      0.549503   
3         0.361902       0.267950       0.593586      0.376209      0.541252   
4         0.362337       0.268713       0.593200      0.370343      0.570560   

   right_wrist_x  right_wrist_y  left_wrist_x  left_wrist_y  right_eye_x  \
0     