# Video Processing for Eye Tracking Analysis

This notebook processes eye-tracking data from a research study that combines video recordings with gaze-tracking information. Its main steps are:
- Converting WMV video files to MP4 format
- Processing eye tracking data from CSV files
- Overlaying gaze positions on video recordings
- Chopping videos based on annotation markers

## Prerequisites

- Required libraries: pandas, OpenCV (`cv2`), NumPy
- Custom utilities from the `ivr_utils` package
- Access to the data directory containing participant recordings

## Data Organization

The code expects:

- A server data directory containing:
  - Eyetracking data in CSV format
  - Video recordings in WMV format
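- The CSV files (whose first 26 lines are skipped on import) contain the columns:
  - Timestamp: Time in milliseconds
  - Gaze X/Y: Screen coordinates of the gaze position
  - SlideEvent: Stimulus events such as "StartMedia", whose timestamp is used as time zero
  - Respondent Annotations active: Annotation markers used for chopping the video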

## Potential Improvements
Consider adding:
- Error handling for missing or corrupted files
- Progress indicators for long processing operations
- Validation of input data format and content
- Configuration options for visualization parameters
- Memory optimization for large video files
- Batch processing capabilities for multiple participants
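Input validation, one of the improvements listed above, could start with a check like the one below. This is a minimal sketch and not part of `ivr_utils`. The function name `validate_eyetracking_frame` is hypothetical. The rules it enforces are assumptions based on how the CSV is used in this notebook: all listed columns are present, there is exactly one "StartMedia" event, and timestamps increase.

```python
import pandas as pd

# Column names as described in "Data Organization"
REQUIRED_COLUMNS = (
    "Timestamp",
    "Gaze X",
    "Gaze Y",
    "SlideEvent",
    "Respondent Annotations active",
)


def validate_eyetracking_frame(df: pd.DataFrame) -> float:
    """Hypothetical check of an imported eye-tracking CSV.

    Returns the StartMedia timestamp (ms) and raises ValueError if any of the
    assumed format rules is violated.
    """
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")

    # Assumption: one stimulus per recording, hence exactly one StartMedia event
    start_times = df.loc[df["SlideEvent"] == "StartMedia", "Timestamp"]
    if len(start_times) != 1:
        raise ValueError(f"Expected one StartMedia event, found {len(start_times)}")

    # Assumption: samples are exported in chronological order
    if not df["Timestamp"].is_monotonic_increasing:
        raise ValueError("Timestamps are not in increasing order")

    return float(start_times.iloc[0])
```

To use it, call the function on `points` right after `pd.read_csv` and before dropping rows. It returns the same value the import cell stores in `timestamp_diff`.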

```python
import numpy as np
import pandas as pd
import cv2
from pathlib import Path
from ivr_utils.ivr_utils import (
    find_participant_files,
    convert_wmv_to_mp4,
    pyav_timestamps,
    chopping_video,
    overlaying_video,
)

# Constants
PARTICIPANT_ID = "P48"

local_data_dir = Path.cwd().parent / "data"
server_data_dir = Path(
    "/Volumes/ritd-ag-project-rd01wq-tober63/SSID IVR Study 1/"
)
output_dir = server_data_dir / "output"

assert server_data_dir.is_dir(), "Server data directory not found"
```

## Find Files for participant

```python
eyetracking_dir = server_data_dir / "Eyetracking"

part_csv_path, part_wmv_path = find_participant_files(PARTICIPANT_ID, eyetracking_dir)
print(f"Participant CSV: {part_csv_path}")
print(f"Participant WMV: {part_wmv_path}")
```

```text
Participant CSV: /Volumes/ritd-ag-project-rd01wq-tober63/SSID IVR Study 1/Eyetracking/SSID AV1/001_P48.csv
Participant WMV: /Volumes/ritd-ag-project-rd01wq-tober63/SSID IVR Study 1/Eyetracking/SSID AV1/Gaze Replays/Scene_P48_ScreenRecording-1_(0,OTHER,1005).wmv
```
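The implementation of `find_participant_files` is not shown here. As a hedged sketch, a lookup that matches the two paths printed above could glob for `*_<ID>.csv` and `Scene_<ID>_*.wmv` anywhere under the Eyetracking directory. The patterns are inferred only from these example filenames, and `find_participant_files_sketch` is a hypothetical name.

```python
from pathlib import Path
from typing import Tuple


def find_participant_files_sketch(participant_id: str, root: Path) -> Tuple[Path, Path]:
    """Hypothetical lookup of one participant's CSV and WMV under `root`.

    Patterns are inferred from example names such as "001_P48.csv" and
    "Scene_P48_ScreenRecording-1_(0,OTHER,1005).wmv".
    """
    root = Path(root)
    # The ID sits between "_" and ".csv", so "P4" does not match "001_P48.csv"
    csv_matches = sorted(root.rglob(f"*_{participant_id}.csv"))
    # The ID sits between two underscores, so "P4" does not match "Scene_P48_..."
    wmv_matches = sorted(root.rglob(f"Scene_{participant_id}_*.wmv"))
    if len(csv_matches) != 1 or len(wmv_matches) != 1:
        raise FileNotFoundError(
            f"Expected one CSV and one WMV for {participant_id}, "
            f"found {len(csv_matches)} CSV and {len(wmv_matches)} WMV"
        )
    return csv_matches[0], wmv_matches[0]
```

The delimiters around the ID are what keep short IDs such as `P4` from picking up another participant's files (`P48`, `P42`). Raising on anything other than a single match turns a missing or duplicated file into an explicit error.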


## Import CSV file for participant

```python
# Skip the 26 lines that precede the column header row
points = pd.read_csv(part_csv_path, skiprows=26)

# Find the "StartMedia" event; its timestamp is used as time zero
start_rows = points.loc[points["SlideEvent"] == "StartMedia", "Timestamp"]
assert not start_rows.empty, "No StartMedia event found in the CSV"
timestamp_diff = start_rows.iloc[0]

# Drop samples with missing gaze coordinates
points = points.dropna(subset=["Gaze X", "Gaze Y"])

# Subtract the StartMedia timestamp so that t = 0 ms coincides with StartMedia
# (samples recorded before StartMedia get negative timestamps)
points["Timestamp"] = points["Timestamp"] - timestamp_diff
```


## Convert to MP4 if not already found

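### Dealing with the FPS issue

The screen recordings have a variable frame rate, so frame timing was analyzed for several participants:

- Frame timestamps (in ms) are extracted from each video with `pyav_timestamps`
- The time differences between consecutive frames are summarized by their mean and standard deviation
- The equivalent FPS is computed as 1000 divided by the mean frame interval
- Examples from three participants (P42, P48, P6) show mean intervals of about 107 to 123 ms (roughly 8.1 to 9.3 FPS) with standard deviations of 38 to 47 ms, so frame timing varies both within and across recordings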
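The intervals are consecutive differences, so their mean telescopes to the total duration divided by the number of intervals. For $N$ frames with timestamps $t_0, \dots, t_{N-1}$ in milliseconds:

$$
\overline{\Delta t} = \frac{1}{N-1}\sum_{i=1}^{N-1}\left(t_i - t_{i-1}\right) = \frac{t_{N-1} - t_0}{N-1},
\qquad
\text{FPS}_{\text{eq}} = \frac{1000}{\overline{\Delta t}}
$$

As a check with the P48 figures below: $N = 19417$, $t_0 = 0$ and $\overline{\Delta t} \approx 114.732$ ms give $t_{N-1} \approx 114.732 \times 19416 \approx 2\,227\,640$ ms, or about 37 min 7.6 s. This agrees with the `time=00:37:07.64` that the conversion log reports for the same 19,417-frame recording. The equivalent FPS is therefore just the average rate over the whole recording. How regular the frames are is captured only by the standard deviation.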

**P42**:

```text
Number of timestamps from AV (i.e. n frames): 24007
[0, 170, 496, 707, 807, 908, 1012, 1118, 1225, 1318]
[170 326 211 100 101 104 106 107  93 106]
Average time difference: 107.01857868866117
Standard deviation of time difference: 37.66216773314807
Equivalent FPS 9.34417193961437
```

**P48**:

```text
Number of timestamps from AV (i.e. n frames): 19417
[0, 178, 482, 583, 677, 763, 848, 937, 1022, 1116]
[178 304 101  94  86  85  89  85  94  75]
Average time difference: 114.73217964565308
Standard deviation of time difference: 45.292123590311085
Equivalent FPS 8.715950512650158
```

**P6**:

```text
Number of timestamps from AV (i.e. n frames): 7497
[0, 185, 496, 590, 692, 802, 882, 977, 1071, 1168]
[185 311  94 102 110  80  95  94  97  84]
Average time difference: 122.87433297758805
Standard deviation of time difference: 46.83135463165881
Equivalent FPS 8.138396162707124
```

```python
# Note: output_chopping_path is defined in the "Chopping" cell below;
# run that cell before this one.
av_timestamps = pyav_timestamps(output_chopping_path)

print(f"Number of timestamps from AV (i.e. n frames): {len(av_timestamps)}")
print(av_timestamps[:10])

tdiff = np.diff(av_timestamps)
print(tdiff[:10])
print(f"Average time difference: {np.mean(tdiff)}")
print(f"Standard deviation of time difference: {np.std(tdiff)}")
print(f"Equivalent FPS {1 / (np.mean(tdiff) / 1000)}")
```

```text
Number of timestamps from AV (i.e. n frames): 139
[0, 114, 229, 344, 458, 573, 688, 803, 917, 1032]
[114 115 115 114 115 115 115 114 115 115]
Average time difference: 114.72463768115942
Standard deviation of time difference: 0.44669666688180987
Equivalent FPS 8.716523496715512
```

After chopping, the frame interval is almost constant (114 to 115 ms, standard deviation about 0.45 ms), which corresponds to about 8.72 FPS. This is close to the average rate of the full P48 recording.
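Because the frame intervals vary, gaze samples can be paired with frames by timestamp rather than by frame index times a nominal interval. The sketch below makes two assumptions: video time 0 coincides with the StartMedia event (as the timestamp shift in the CSV import implies), and gaze timestamps are sorted. `gaze_index_for_frames` and its `max_lag_ms` tolerance are hypothetical, and `overlaying_video` may work differently.

```python
import numpy as np


def gaze_index_for_frames(frame_ts_ms, gaze_ts_ms, max_lag_ms: float = 50.0) -> np.ndarray:
    """Hypothetical helper: for each frame, the index of the latest gaze sample at or
    before the frame time, or -1 if none lies within `max_lag_ms`.

    Both inputs are in ms on the same clock; `gaze_ts_ms` must be sorted ascending.
    """
    frame_ts = np.asarray(frame_ts_ms, dtype=float)
    gaze_ts = np.asarray(gaze_ts_ms, dtype=float)
    # side="right" minus 1 -> last gaze sample with time <= frame time (-1 if none)
    idx = np.searchsorted(gaze_ts, frame_ts, side="right") - 1
    lag = np.full(frame_ts.shape, np.inf)
    has_prior = idx >= 0
    lag[has_prior] = frame_ts[has_prior] - gaze_ts[idx[has_prior]]
    # Reject matches that are too old (e.g. gaps where NaN gaze rows were dropped)
    return np.where(lag <= max_lag_ms, idx, -1)
```

In the notebook this could be called as `gaze_index_for_frames(av_timestamps, points["Timestamp"].to_numpy())`. Frames mapped to `-1` would then be drawn without a gaze marker. The 50 ms default is an arbitrary placeholder. It should be set from the tracker's actual sampling rate.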


## Convert WMV to MP4

```python
import logging

# Configure logging with timestamp, level and message
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

result = convert_wmv_to_mp4(
    part_wmv_path,
    local_data_dir.joinpath("output_test_variablefps.mp4"),
    output_fps=None,
)
result.print_status()
```

```text
2025-01-15 13:35:16 - INFO - Checking existing MP4 file
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x11ab1c3f0] moov atom not found
OpenCV: Couldn't read video stream from file "/Users/yuqiliang/Documents/Github/SSID_IVR_Study/data/output_test_variablefps.mp4"
2025-01-15 13:35:16 - INFO - Converting WMV to MP4
frame=19417 fps= 14 q=-0.0 Lsize= 1025703KiB time=00:37:07.64 bitrate=3772.0kbits/s speed=1.65x
Conversion successful
```
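The progress line in this log has the format of ffmpeg's console output, which suggests that `convert_wmv_to_mp4` may wrap ffmpeg. Its source is not shown, though. If that assumption holds, the choice controlled by `output_fps` could look like the hypothetical helper below, which only builds an argument list:

- `-fps_mode passthrough` keeps each frame's original timestamp, giving a variable frame rate.
- `-r` together with `-fps_mode cfr` duplicates or drops frames to reach a constant rate.

`-fps_mode` needs ffmpeg 5.1 or later. Older releases use `-vsync` instead.

```python
from pathlib import Path
from typing import List, Optional


def build_ffmpeg_args(src: Path, dst: Path, output_fps: Optional[float]) -> List[str]:
    """Hypothetical helper: build (but do not run) an ffmpeg WMV -> MP4 command."""
    # H.264 with the widely supported yuv420p pixel format
    args = ["ffmpeg", "-y", "-i", str(src), "-c:v", "libx264", "-pix_fmt", "yuv420p"]
    if output_fps is None:
        # Keep the source frame timestamps: variable frame rate output
        args += ["-fps_mode", "passthrough"]
    else:
        # Duplicate/drop frames to produce a constant frame rate
        args += ["-r", str(output_fps), "-fps_mode", "cfr"]
    args.append(str(dst))
    return args
```

The list could be passed to `subprocess.run(args, check=True)`. Keeping the variable rate preserves the true frame times that `pyav_timestamps` reports. A constant rate makes frame index and time interchangeable, at the cost of duplicated or dropped frames.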


## Chopping

```python
# Number of frames to process: 9 fps * 60 seconds, i.e. about one minute
# at this recording's roughly 9 FPS
N_FRAMES_PROC = 9 * 60

# input_video_path = result.output_path
input_video_path = local_data_dir.joinpath("output_test_variablefps.mp4")
output_chopping_path = local_data_dir / f"{PARTICIPANT_ID}_output_chopped_test_variablefps.mp4"

chopping_video(input_video_path, output_chopping_path, N_FRAMES_PROC, points)
```
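The data description says the "Respondent Annotations active" column drives the chopping, but the logic of `chopping_video` is not shown. This sketch rests on an assumption: a non-empty value in that column marks an active annotation. Under that assumption, the time windows to keep are the contiguous runs of active rows. `annotation_segments` is a hypothetical helper, and the exact values this export writes into the column are not documented here.

```python
from typing import List, Tuple

import pandas as pd


def annotation_segments(
    df: pd.DataFrame,
    column: str = "Respondent Annotations active",
    time_col: str = "Timestamp",
) -> List[Tuple[float, float]]:
    """Hypothetical helper: (start, end) times of contiguous runs of active rows.

    Assumes a row is "active" when the annotation column holds a non-empty value.
    """
    values = df[column]
    active = values.notna() & (values.astype(str).str.strip() != "")
    # Label runs: the label increases every time `active` flips
    run_id = (active != active.shift(fill_value=False)).cumsum()
    segments = []
    for _, run in df[active].groupby(run_id[active]):
        segments.append((float(run[time_col].iloc[0]), float(run[time_col].iloc[-1])))
    return segments
```

Applied to `points`, the segments would be in ms relative to StartMedia. `points` has already had rows without gaze dropped, so runs could be split or shortened. Running the helper on the frame before `dropna` would avoid that.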



## Overlaying

```python
N_FRAMES_PROC = 30 * 60  # 30 fps * 60 seconds

# input_overlay_path = output_chopping_path
# This test run uses an earlier 30 fps chopped file instead of output_chopping_path
input_overlay_path = local_data_dir.joinpath(f"{PARTICIPANT_ID}_output_chopped_test30fps.mp4")
output_overlay_path = local_data_dir / f"{PARTICIPANT_ID}_output_overlay_test.mp4"

overlaying_video(input_overlay_path, output_overlay_path, N_FRAMES_PROC, points)
```
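Gaze X/Y are screen coordinates. Before a marker is drawn, they may need rescaling to the video's resolution if the recording was captured or encoded at a different size. The sketch below assumes the recording covers the full screen with no offset. `gaze_to_frame_px` is hypothetical, and the caller must supply the actual screen and frame sizes.

```python
from typing import Tuple

import numpy as np


def gaze_to_frame_px(
    gaze_xy, screen_wh: Tuple[int, int], frame_wh: Tuple[int, int]
) -> Tuple[np.ndarray, np.ndarray]:
    """Scale (N, 2) screen-space gaze points (no NaNs) to video-frame pixels.

    Returns integer pixel coordinates and a boolean mask of points inside the frame.
    """
    gaze = np.asarray(gaze_xy, dtype=float).reshape(-1, 2)
    # Independent x and y scale factors (frame size / screen size)
    scale = np.asarray(frame_wh, dtype=float) / np.asarray(screen_wh, dtype=float)
    px = np.rint(gaze * scale).astype(int)
    width, height = frame_wh
    inside = (px[:, 0] >= 0) & (px[:, 0] < width) & (px[:, 1] >= 0) & (px[:, 1] < height)
    return px, inside
```

For points where the mask is `True`, a marker can be drawn with OpenCV, for example `cv2.circle(frame, (int(x), int(y)), 10, (0, 0, 255), 2)`, which draws a red ring in BGR. Points that fall off-screen are simply skipped.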
