In [2]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Data processing notebook

I want to build a basic pipeline that runs the videos through MediaPipe and saves the results together with their labels.
For now I only want to extract the hand and body information and normalize the coordinates.
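
The normalization step is not defined yet. As a rough sketch of one possible scheme (an assumption, not a decision): center every point on the midpoint between the shoulders and divide by the shoulder width, so the coordinates no longer depend on where the signer stands or how far they are from the camera. `normalize_points` is a hypothetical helper; the shoulder indices 11 and 12 are the ones MediaPipe Pose uses.

```python
import math

# MediaPipe Pose landmark indices for the left and right shoulder.
LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12


def normalize_points(points, pose_points):
    """Center (x, y) points on the shoulder midpoint and scale by shoulder width.

    Assumed scheme for illustration only. `pose_points` is a list of (x, y)
    tuples indexed like MediaPipe Pose landmarks. Returns None when the
    shoulders are too close together to give a usable scale.
    """
    lx, ly = pose_points[LEFT_SHOULDER]
    rx, ry = pose_points[RIGHT_SHOULDER]
    cx, cy = (lx + rx) / 2, (ly + ry) / 2
    width = math.hypot(lx - rx, ly - ry)
    if width < 1e-6:
        return None
    return [((x - cx) / width, (y - cy) / width) for x, y in points]


# Usage with made-up coordinates: shoulders at x=0.6 and x=0.4, same height.
pose_points = [(0.0, 0.0)] * 33
pose_points[LEFT_SHOULDER] = (0.6, 0.4)
pose_points[RIGHT_SHOULDER] = (0.4, 0.4)
normalized = normalize_points([(0.5, 0.4), (0.6, 0.4)], pose_points)
```

The same function could be applied to hand points as long as the pose points from the same frame are passed in as the reference.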

In [14]:
# Time test between mediapipe pose + mediapipe hands vs mediapipe holistic
import cv2
import mediapipe as mp
from src.utils.video import frame_reader
cap = cv2.VideoCapture("../../../data/LSA64/video/001_001_001.mp4")

In [6]:
%%time
with mp.solutions.pose.Pose() as pose, mp.solutions.hands.Hands(max_num_hands=2) as hands:
    for frame in frame_reader(cap):
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        pose_results = pose.process(rgb)
        hands_results = hands.process(rgb)

I0000 00:00:1762954242.254261  116989 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1762954242.256754  117129 gl_context.cc:369] GL version: 3.2 (OpenGL ES 3.2 Mesa 25.2.3-arch1.2), renderer: Mesa Intel(R) UHD Graphics 630 (CFL GT2)
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
I0000 00:00:1762954242.261615  116989 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1762954242.263029  117144 gl_context.cc:369] GL version: 3.2 (OpenGL ES 3.2 Mesa 25.2.3-arch1.2), renderer: Mesa Intel(R) UHD Graphics 630 (CFL GT2)
W0000 00:00:1762954242.280004  117133 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1762954242.297388  117130 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1762954242.316761  117116 in

CPU times: user 7.68 s, sys: 679 ms, total: 8.36 s
Wall time: 6.51 s


In [40]:
%%time
with mp.solutions.holistic.Holistic() as holistic:
    for frame in frame_reader(cap):
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        holistic_results = holistic.process(rgb)

I0000 00:00:1762951268.829084   89390 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1762951268.831084  108614 gl_context.cc:369] GL version: 3.2 (OpenGL ES 3.2 Mesa 25.2.3-arch1.2), renderer: Mesa Intel(R) UHD Graphics 630 (CFL GT2)
W0000 00:00:1762951268.871498  108601 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1762951268.885419  108602 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1762951268.887774  108601 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1762951268.887970  108600 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0

CPU times: user 8.91 s, sys: 651 ms, total: 9.56 s
Wall time: 5.33 s


In [8]:
# Now try multithreading with hands and pose. MediaPipe's inference runs in C++, so the GIL should not serialize the two calls
from concurrent.futures import ThreadPoolExecutor

In [9]:
%%time
with mp.solutions.pose.Pose() as pose, mp.solutions.hands.Hands(max_num_hands=2) as hands, \
     ThreadPoolExecutor(max_workers=2) as ex:

    for frame in frame_reader(cap):
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        fut_pose  = ex.submit(pose.process, rgb)
        fut_hands = ex.submit(hands.process, rgb)

        pose_results  = fut_pose.result()
        hands_results = fut_hands.result()

I0000 00:00:1762954442.191540  116989 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1762954442.197062  117755 gl_context.cc:369] GL version: 3.2 (OpenGL ES 3.2 Mesa 25.2.3-arch1.2), renderer: Mesa Intel(R) UHD Graphics 630 (CFL GT2)
I0000 00:00:1762954442.218264  116989 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1762954442.223498  117770 gl_context.cc:369] GL version: 3.2 (OpenGL ES 3.2 Mesa 25.2.3-arch1.2), renderer: Mesa Intel(R) UHD Graphics 630 (CFL GT2)
W0000 00:00:1762954442.251296  117758 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1762954442.270012  117762 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1762954442.296112  117742 inference_feedback_manager.cc:114] Feedback manager requir

CPU times: user 7.09 s, sys: 726 ms, total: 7.81 s
Wall time: 3.74 s


# Exploration

Now I want to check whether MediaPipe manages to capture both hands, or whether I will have to do something unusual to recover missing data.

In [21]:
from src.mediapipe.render import render_frame

with mp.solutions.pose.Pose() as pose, mp.solutions.hands.Hands(max_num_hands=2) as hands, \
     ThreadPoolExecutor(max_workers=2) as ex:

    for frame in frame_reader(cap, fps=12):
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        fut_pose  = ex.submit(pose.process, rgb)
        fut_hands = ex.submit(hands.process, rgb)

        pose_results  = fut_pose.result()
        hands_results = fut_hands.result()

        render_frame(frame, pose_results, hands_results)

cv2.destroyAllWindows()

I0000 00:00:1762954670.482547  116989 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1762954670.488477  118830 gl_context.cc:369] GL version: 3.2 (OpenGL ES 3.2 Mesa 25.2.3-arch1.2), renderer: Mesa Intel(R) UHD Graphics 630 (CFL GT2)
I0000 00:00:1762954670.501894  116989 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1762954670.504645  118845 gl_context.cc:369] GL version: 3.2 (OpenGL ES 3.2 Mesa 25.2.3-arch1.2), renderer: Mesa Intel(R) UHD Graphics 630 (CFL GT2)
W0000 00:00:1762954670.527625  118832 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1762954670.538709  118834 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1762954670.564975  118816 inference_feedback_manager.cc:114] Feedback manager requir
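
Watching the rendered frames gives a visual impression only. A small sketch of how the same check could be quantified: count how many hands were detected in each frame and tally the counts over the video. Both helpers are hypothetical; the only MediaPipe detail relied on is that `multi_hand_landmarks` on a Hands result is `None` when no hand is found and a list otherwise.

```python
from collections import Counter


def hands_in_frame(hands_results):
    """Number of hands in one mp.solutions.hands result (0 when none detected)."""
    detected = hands_results.multi_hand_landmarks  # None when no hand is found
    return len(detected) if detected else 0


def summarize_detections(per_frame_counts):
    """Map 'hands detected in a frame' to 'number of frames' for one video."""
    return Counter(per_frame_counts)
```

Inside the loop above this would be `counts.append(hands_in_frame(hands_results))`, followed by `summarize_detections(counts)` once the video ends. Frames with fewer than 2 hands are not necessarily failures, since some signs may use only one hand.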

# Results
I had 2 options:
1. MediaPipe Holistic, ignoring all the face landmarks
2. MediaPipe Hands + Pose, synchronizing the two

Running MediaPipe Hands and Pose in separate threads is a bit faster than Holistic (3.74 s vs 5.33 s wall time in the runs above), but it seems to capture the hand landmarks worse, and I don't want to deal with the mess of synchronizing landmarks just for a small speed boost.
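
If the pipeline goes with option 1, ignoring the face could look roughly like the sketch below. It reads only `pose_landmarks`, `left_hand_landmarks` and `right_hand_landmarks` from a Holistic result and never touches `face_landmarks`. The helper names and the choice of NaN as the placeholder for an undetected part are assumptions made for illustration.

```python
import math

N_POSE, N_HAND = 33, 21  # landmark counts for MediaPipe Pose and Hands


def _xyz(landmark_list, n_points):
    """Return (x, y, z) tuples, or n_points NaN placeholders if nothing was detected."""
    if landmark_list is None:
        return [(math.nan, math.nan, math.nan)] * n_points
    return [(lm.x, lm.y, lm.z) for lm in landmark_list.landmark]


def holistic_to_points(results):
    """Pose + left hand + right hand as 75 (x, y, z) tuples; the face is skipped."""
    return (
        _xyz(results.pose_landmarks, N_POSE)
        + _xyz(results.left_hand_landmarks, N_HAND)
        + _xyz(results.right_hand_landmarks, N_HAND)
    )
```

Keeping a fixed length of 75 points per frame, with NaN where a hand is missing, means every frame has the same shape, and any gap filling can be decided later.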