# 🎥 Gesture Segmentation Tutorial
This notebook demonstrates the workflow for gesture segmentation using a pre-trained model. 
It will guide you through the steps to:

1. **Extract** 2‑D pose keypoints from a video using [MediaPipe Pose](https://developers.google.com/mediapipe).
2. **Segment** the extracted skeletons with the gesture‑segmentation models.
3. **Export** the result to ELAN for convenient manual inspection.

Let's start👇

In [1]:
# --- Library imports ------------------------------------------------------
import sys
import pathlib

# Add the project root to sys.path before importing the local modules
# (this assumes the notebook is launched from the repository root)
PROJECT_ROOT = pathlib.Path.cwd()
if PROJECT_ROOT.as_posix() not in sys.path:
    sys.path.append(PROJECT_ROOT.as_posix())

# Local modules
from test_segmentation import (
    parse_args,
    set_random_seed,
    train_with_config,
    get_config,
)
from utils.extract_mp_pose import extract_keypoints

print("✅ Environment initialised")


✅ Environment initialised


In [2]:
# Make the notebook interactive: automatically reload modified modules and render plots inline
%load_ext autoreload
%autoreload 2
%matplotlib inline


## 1️⃣ Extract pose keypoints
Specify the path to **your** video file below.  
Set `save_video=True` if you would like an overlay video with the skeleton drawn on top.

**NOTE: if you want to use your own webcam as input**:
- Make sure you have a webcam connected to your computer.
- Set `video_path = 0` (the integer zero) to use the webcam as input.
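
The log output further down shows that the extraction goes through OpenCV, whose `cv2.VideoCapture` takes either an integer device index or a file path string. If the value comes from user input, it can help to normalise it first. The helper below is a hypothetical sketch and not part of this repository; `resolve_video_source` is an illustrative name.

```python
from pathlib import Path
from typing import Union


def resolve_video_source(value: Union[str, int]) -> Union[str, int]:
    """Return an int webcam index or a validated file path string.

    Hypothetical helper for illustration; not part of the tutorial's modules.
    """
    # Integers and digit-only strings such as "0" are treated as webcam indices.
    if isinstance(value, int):
        return value
    if value.strip().isdigit():
        return int(value.strip())
    # Anything else is treated as a path to a video file that must exist.
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Video file not found: {path}")
    return str(path)


print(resolve_video_source("0"))  # webcam index as an integer
```

Failing early on a missing file gives a clearer error than a capture object that silently yields no frames.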


In [3]:
# Path to the video you want to analyse
video_path = "input_videos/salma_hayek_short.mp4"  # or specify a path to your video file
# video_path = 0  # ← uncomment to use your webcam as input

# Extract keypoints. The function returns a dictionary with useful metadata.
pose_data = extract_keypoints(
    vidf=video_path,
    save_video=True,
)

OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
I0000 00:00:1749937176.358596 17318416 gl_context.cc:369] GL version: 2.1 (2.1 Metal - 88.1), renderer: Apple M3 Pro


Video resolution: 1920.0x1080.0, FPS: 29.97002997002997
Number of frames in the video: 1136


Processing frames:   0%|          | 0/1136 [00:00<?, ?frame/s]INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W0000 00:00:1749937176.454964 17318591 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1749937176.498702 17318590 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1749937176.500884 17318595 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1749937176.501023 17318594 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1749937176.501025 17318587 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabli

The dictionary contains:

* `output_path` – `*.npy` file with the keypoints  
* `video_output_path` – overlay video (optional)  
* `samplerate` – frames‑per‑second of the processed clip
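
As a quick sanity check, the keypoint file can be loaded with NumPy. This is a minimal sketch, assuming the first axis of the array indexes frames; the layout of the remaining axes is not documented here. `summarise_keypoints` is a hypothetical helper, and the demo writes a synthetic array rather than real pose data.

```python
import tempfile
from pathlib import Path

import numpy as np


def summarise_keypoints(npy_path, fps):
    """Report array shape, frame count, and clip duration in seconds."""
    keypoints = np.load(npy_path)
    n_frames = keypoints.shape[0]  # assumption: axis 0 is the frame axis
    return {
        "shape": keypoints.shape,
        "n_frames": n_frames,
        "duration_s": n_frames / fps,
    }


# Demo on synthetic data; the (frames, landmarks, coords) shape is made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_keypoints.npy"
    np.save(demo_path, np.zeros((90, 33, 2), dtype=np.float32))
    summary = summarise_keypoints(demo_path, fps=30.0)

print(summary)
```

In this notebook the equivalent call would be `summarise_keypoints(pose_data["output_path"], pose_data["samplerate"])`.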

## 2️⃣ Run gesture segmentation

The cell below loads the pre-trained gesture segmentation models and runs them on the extracted keypoints.

In [4]:
# --- Build CLI‑style arguments ---------------------------------------------
sys.argv = [
    "run_segmentation_test.py",
    "--config",  "config/segmentation/CABB_segment_basic_test.yaml",
    "--poses-path", pose_data["output_path"],
    "--phase",  "test",
    "--seed",   "42",
    "--devices", "0",
    "--models_type", "best"
]

# --- Parse and run ----------------------------------------------------------
opts = parse_args()
set_random_seed(opts.seed)
cfg = get_config(opts.config)

print(f"=== Running segmentation on {opts.poses_path} ===")
segmentation_results = train_with_config(cfg, opts)


Seed set to 42


=== Running segmentation on input_videos/salma_hayek_short.npy ===
Loading dataset...


Preparing segmentation sequences...: 100%|██████████| 1/1 [00:00<00:00, 18893.26it/s]
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/opt/anaconda3/envs/test2/lib/python3.10/site-packages/pytorch_lightning/trainer/setup.py:177: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.
Restoring states from the checkpoint path at segmentation_models/fold_1/checkpoints/fold_1/best.ckpt
/opt/anaconda3/envs/test2/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:282: Be aware that when using `ckpt_path`, callbacks used to create the checkpoint need to be provided during `Trainer` instantiation. Please add the following callbacks: ["ModelCheckpoint{'monitor': 'val/segmentation_loss', 'mode': 'min', 'every_n_train_steps': 0, 'every_

[0 1 3 4 5 6 8] [2 7]
Starting Fold 1
INFO: Trainable parameter count: 3345161


/opt/anaconda3/envs/test2/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:425: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.


Testing: |          | 0/? [00:00<?, ?it/s]

/opt/anaconda3/envs/test2/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:79: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 9. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.


Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Restoring states from the checkpoint path at segmentation_models/fold_2/checkpoints/fold_2/best.ckpt
Loaded model weights from the checkpoint at segmentation_models/fold_2/checkpoints/fold_2/best.ckpt


[0 2 3 5 6 7 8] [1 4]
Starting Fold 2
INFO: Trainable parameter count: 3345161


Testing: |          | 0/? [00:00<?, ?it/s]

Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Restoring states from the checkpoint path at segmentation_models/fold_3/checkpoints/fold_3/best.ckpt
Loaded model weights from the checkpoint at segmentation_models/fold_3/checkpoints/fold_3/best.ckpt


[0 1 2 3 4 5 7] [6 8]
Starting Fold 3
INFO: Trainable parameter count: 3345161


Testing: |          | 0/? [00:00<?, ?it/s]

Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Restoring states from the checkpoint path at segmentation_models/fold_4/checkpoints/fold_4/best.ckpt
Loaded model weights from the checkpoint at segmentation_models/fold_4/checkpoints/fold_4/best.ckpt


[1 2 4 5 6 7 8] [0 3]
Starting Fold 4
INFO: Trainable parameter count: 3345161


Testing: |          | 0/? [00:00<?, ?it/s]

Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Restoring states from the checkpoint path at segmentation_models/fold_5/checkpoints/fold_5/best.ckpt
Loaded model weights from the checkpoint at segmentation_models/fold_5/checkpoints/fold_5/best.ckpt


[0 1 2 3 4 6 7 8] [5]
Starting Fold 5
INFO: Trainable parameter count: 3345161


Testing: |          | 0/? [00:00<?, ?it/s]

Saving results to CABB_Segmentation/fold_5/test_results.pkl
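
Assigning to `sys.argv` works because `argparse` reads `sys.argv[1:]` when `parse_args()` is called without arguments. The sketch below illustrates the pattern with a stand-in parser. `build_parser`, its defaults, and its `choices` are assumptions for illustration; the real `parse_args` in `test_segmentation` may define more options or different types.

```python
import argparse
import sys


def build_parser():
    """Stand-in parser mirroring the flags used in this tutorial (hypothetical)."""
    parser = argparse.ArgumentParser(description="Segmentation test runner (sketch)")
    parser.add_argument("--config", required=True)
    parser.add_argument("--poses-path", dest="poses_path", required=True)
    parser.add_argument("--phase", default="test")
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--devices", default="0")
    parser.add_argument("--models_type", choices=["best", "last"], default="best")
    return parser


# Simulate a command line from inside a notebook, then restore the original.
original_argv = sys.argv
sys.argv = [
    "run_segmentation_test.py",
    "--config", "cfg.yaml",
    "--poses-path", "clip.npy",
    "--models_type", "last",
]
try:
    demo_opts = build_parser().parse_args()  # reads sys.argv[1:]
finally:
    sys.argv = original_argv

print(demo_opts.poses_path, demo_opts.seed, demo_opts.models_type)
```

Restoring `sys.argv` afterwards keeps the override from leaking into later cells that may also parse arguments.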


## 3️⃣ Export to ELAN
Convert the raw segment list into an ELAN tier.  
Afterwards you can open the generated `.eaf` file alongside the overlay video to inspect the automatic segmentation.

In [5]:
from utils.Gesture_Segmentation_to_ELAN import get_elan_files

get_elan_files(
    segmentation_results.copy(),
    fps=pose_data["samplerate"],
    model="skeleton",
    threshold=0.55,
    file_path=pose_data["output_path"],
    video_output_path=pose_data["video_output_path"],
)

print("✅ Finished – ELAN file ready!")


100%|██████████| 5/5 [00:00<00:00, 184.55it/s]


Fold 0 - Number of samples: 9, Number of sequences: 120
Fold 1 - Number of samples: 9, Number of sequences: 120
Fold 2 - Number of samples: 9, Number of sequences: 120
Fold 3 - Number of samples: 9, Number of sequences: 120
Fold 4 - Number of samples: 9, Number of sequences: 120


100%|██████████| 1/1 [00:00<00:00, 124.25it/s]

Pair: 77
Speaker: B
ELAN file saved to: input_videos/salma_hayek_short_segmentation_results_th_0.55.eaf
✅ Finished – ELAN file ready!
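
An `.eaf` file is XML, so the generated annotations can also be inspected programmatically. The following is a minimal sketch using only the standard library, assuming time-aligned annotations (`ALIGNABLE_ANNOTATION` elements that reference `TIME_SLOT` entries). The tier name and sample content are invented for the demo, and `read_eaf_segments` is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET


def read_eaf_segments(eaf_source):
    """Return (tier_id, start_ms, end_ms, label) tuples from an EAF document."""
    root = ET.parse(eaf_source).getroot()
    # Map time-slot IDs to millisecond values.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    segments = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            label = ann.findtext("ANNOTATION_VALUE", default="")
            segments.append((tier.get("TIER_ID"), start, end, label))
    return segments


# Invented sample document; a real file would be passed as a path instead.
SAMPLE_EAF = b"""<?xml version="1.0" encoding="UTF-8"?>
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="400"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1250"/>
  </TIME_ORDER>
  <TIER TIER_ID="gesture" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>gesture</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""

eaf_segments = read_eaf_segments(io.BytesIO(SAMPLE_EAF))
print(eaf_segments)
```

Listing the segments this way makes it easy to count them or compare runs without opening ELAN.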





## 📝 Exercise 1: Compare Model Checkpoints

In Step 2️⃣ you can swap the `--models_type` argument between `"best"` and `"last"` to observe how the segmentation changes.  

1. **Edit the CLI args** in the Step 2️⃣ code cell (`In [4]`, the seventh cell of the notebook) to select the checkpoint:
    ```python
    sys.argv = [
        "run_segmentation_test.py",
        "--config",       "config/segmentation/CABB_segment_basic_test.yaml",
        "--poses-path",   pose_data["output_path"],
        "--phase",        "test",
        "--seed",         "42",
        "--devices",      "0",
        "--models_type",  "last"    # ← try "best" or "last"
    ]
    ```
2. **Rerun that cell** and all following cells to regenerate the ELAN file.
3. **Open the `.eaf`** in ELAN alongside the overlay video to compare.

**Discussion Questions**
- 🔍 What differences do you notice between the `"best"` and `"last"` checkpoints?
- 📈 Are the results substantially different?
- 🤔 Does the accuracy of your pose keypoint extraction impact results more than the chosen model checkpoint?
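
To put a number on "substantially different", one option is to compare the two runs frame by frame. The sketch below computes simple agreement and intersection-over-union (IoU) between two binary gesture masks. It is an illustrative metric, not something this notebook's modules provide, and the example masks are invented.

```python
def compare_masks(mask_a, mask_b):
    """Return (agreement, iou) for two equal-length sequences of 0/1 frame labels."""
    if len(mask_a) != len(mask_b):
        raise ValueError("Masks must cover the same number of frames")
    if len(mask_a) == 0:
        raise ValueError("Masks must not be empty")
    pairs = list(zip(mask_a, mask_b))
    same = sum(1 for a, b in pairs if a == b)
    both = sum(1 for a, b in pairs if a and b)
    either = sum(1 for a, b in pairs if a or b)
    agreement = same / len(pairs)
    iou = both / either if either else 1.0  # two all-zero masks agree perfectly
    return agreement, iou


# Invented per-frame labels standing in for the "best" and "last" runs.
best_mask = [0, 1, 1, 1, 0, 0, 1, 1]
last_mask = [0, 1, 1, 0, 0, 1, 1, 1]
agreement, iou = compare_masks(best_mask, last_mask)
print(f"agreement={agreement:.2f}, IoU={iou:.2f}")
```

Agreement counts matching non-gesture frames as well, so IoU is usually the more informative figure when gestures are sparse.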

## 📝 Exercise 2: Compare Threshold Values

The threshold sets how confident the model must be before a frame is classified as a gesture. In Step 3️⃣ you can adjust the `threshold` parameter of the `get_elan_files` call to see how it affects the segmentation.

1. **Edit the threshold** in the Step 3️⃣ cell to try different values, e.g.:
   ```python
   get_elan_files(
       segmentation_results.copy(),
       fps=pose_data["samplerate"],
       model="skeleton",
       threshold=0.45,  # ← try 0.45, 0.50, 0.55, and 0.60
       file_path=pose_data["output_path"],
       video_output_path=pose_data["video_output_path"],
   )
   ```
2. **Rerun the cell** (and any following cells) to regenerate the ELAN file.
3. **Open the `.eaf`** in ELAN alongside the overlay video to compare how different thresholds change segment boundaries.

**Discussion Questions**
- 🔍 How does lowering or raising the threshold affect the number of detected segments?
- 📈 Does a more permissive (lower) threshold introduce more false positives?
- 🤔 Which threshold gives the most meaningful segmentation for your video?
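
To build intuition for the threshold, here is one simple way per-frame gesture probabilities could be turned into time segments: keep the frames at or above the threshold and merge consecutive ones. This is a sketch under that assumption, not the logic of `get_elan_files`, which may smooth or merge differently. The probabilities and the `probabilities_to_segments` name are invented.

```python
def probabilities_to_segments(probs, threshold, fps):
    """Return (start_s, end_s) tuples for runs of frames with prob >= threshold."""
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a new run of gesture frames begins
        elif p < threshold and start is not None:
            segments.append((start / fps, i / fps))  # the run ended before frame i
            start = None
    if start is not None:
        # The last run extends to the end of the clip.
        segments.append((start / fps, len(probs) / fps))
    return segments


# Invented per-frame probabilities for a 10 fps clip.
demo_probs = [0.1, 0.5, 0.6, 0.7, 0.5, 0.2, 0.58, 0.3]
for th in (0.45, 0.55):
    print(th, probabilities_to_segments(demo_probs, th, fps=10))
```

In this toy example the lower threshold widens the first segment while the second stays the same, which is the kind of boundary shift to look for in ELAN.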