# Aligning Videos and EEG

This notebook consolidates our work on aligning videos, EEG, and trial data into a single file. Note that it does NOT cover converting `.bag` data into video streams; that must be done with a separate notebook.

## Import Packages

In [1]:
!pip install moviepy
!pip install imageio-ffmpeg

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable


In [2]:
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt

# Required for converting between local timestamps and unix timestamp
import datetime

# File operations (imported, but not used in the cells below)
import shutil

# Required for trimming videos
from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_subclip

## Helper Functions

In [3]:
""" === HELPER FUNCTION === """
# Converts a local timestamp string into unix seconds. Requires the datetime package.
# Note: the string carries no time zone, so it is interpreted in this machine's local time zone.
def timestamp_to_unix_seconds(x):
    date_format = datetime.datetime.strptime(x, "%Y-%m-%d %H:%M:%S.%f")
    unix_seconds = datetime.datetime.timestamp(date_format)
    return unix_seconds

# Converts a local timestamp string into unix milliseconds.
# Requires the datetime package, and relies on the `timestamp_to_unix_seconds()` function.
def timestamp_to_unix_milliseconds(x):
    unix_seconds = timestamp_to_unix_seconds(x)
    # Round instead of truncating: float error can leave the product just under the true millisecond.
    unix_milliseconds = round(unix_seconds * 1000)
    return unix_milliseconds
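
Because `timestamp_to_unix_seconds()` parses a timestamp with no time zone, its result depends on the time zone of the machine running the notebook. If the notebook may run in a different zone than the recording device, one option is to state the UTC offset explicitly. The following is a minimal sketch, not part of the original helpers: `local_timestamp_to_unix_ms` and its `utc_offset_hours` parameter are hypothetical names, and the `-5` offset in the usage line is an assumption chosen for illustration. It uses integer `timedelta` arithmetic, so no float rounding is involved.

```python
import datetime

def local_timestamp_to_unix_ms(x: str, utc_offset_hours: float) -> int:
    """Convert a 'YYYY-MM-DD HH:MM:SS.fff' string recorded at a known UTC offset to unix ms."""
    # Attach the (assumed) fixed offset of the recording device to the parsed timestamp.
    tz = datetime.timezone(datetime.timedelta(hours=utc_offset_hours))
    local_dt = datetime.datetime.strptime(x, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=tz)
    # Subtract the unix epoch and count whole milliseconds using exact integer arithmetic.
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    return (local_dt - epoch) // datetime.timedelta(milliseconds=1)

# Assumed offset of UTC-5 for illustration only.
print(local_timestamp_to_unix_ms("2024-11-24 12:53:58.722", utc_offset_hours=-5))
```

A fixed offset does not follow daylight-saving changes, so recordings that span such a change would need a different offset per session.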

## Instructions

To use this notebook successfully, you MUST have the following:

* The raw EEG file, a `.csv` file.
* The raw video file, usually an `.mp4` or `.mov` file.
* The timestamp (in unix milliseconds) where the EEG data should start from.
* The timestamp (in seconds) where the video should start from.

We make several assumptions: that the trial lasts from the provided start timestamp to the end of the raw EEG file (unless a duration is provided), and that in some situations the EEG and video are the same length.

The scripts below convert the timestamps in the raw EEG file into unix milliseconds, then drop any rows that come before the start unix milliseconds and any rows that come after the end unix milliseconds. The end is the start plus the duration if a duration is provided, otherwise the last row of the file; by default, a 3000 ms buffer (`end_millisecond_buffer`) is also trimmed from the end. Similarly, the video is trimmed so that only the relevant footage is kept.
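
The windowing arithmetic described above can be sketched on a plain list of timestamps. This is an illustrative sketch only: `trial_window` is a hypothetical helper and the sample values are made up. It mirrors the start, duration, and end-buffer logic that `Get_Raw_EEG` applies to the `unix_ms` column.

```python
def trial_window(unix_ms, start_unix_ms, duration_ms=None, end_buffer_ms=3000):
    """Keep timestamps inside the trial window and re-zero them to trial time."""
    # The end defaults to the last sample unless a duration is provided.
    end_unix_ms = unix_ms[-1] if duration_ms is None else start_unix_ms + duration_ms
    # The buffer is subtracted from the end in both cases.
    if end_buffer_ms is not None:
        end_unix_ms -= end_buffer_ms
    kept = [t for t in unix_ms if start_unix_ms <= t <= end_unix_ms]
    # Trial time starts at 0 on the first kept sample.
    trial_ms = [t - kept[0] for t in kept]
    return kept, trial_ms

# Made-up samples every 500 ms, from 1000 to 10500.
samples = list(range(1000, 11000, 500))
kept, trial_ms = trial_window(samples, start_unix_ms=2200)
print(kept[0], kept[-1], trial_ms[-1])
```

Note that the first kept sample (2500 here) can fall after the requested start (2200), so trial time 0 is the first sample at or after the start, not the start itself.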

### Reading Raw EEG

In [4]:
# Reads the raw EEG file from the provided file path, converts the timestamps, and filters rows.
# Note that `start_unix_milliseconds` is an absolute unix timestamp in milliseconds, not a relative offset.
# In other words, provide the unix millisecond at which the trial starts.
def Get_Raw_EEG(src:str, start_unix_milliseconds:int, duration_milliseconds:int=None, end_millisecond_buffer:int=3000, save_filename:str=None, print_debug:bool=False):
    
    # Read the EEG file
    if print_debug: print("Reading SRC csv file...")
    df = pd.read_csv(src)
    n_rows = len(df.index)
    
    # Filter out rows with a null timestamp or a null battery value, which are not useful here.
    if print_debug: print("Removing rows with null timestamps and battery...")
    df = df.dropna(subset=['TimeStamp', 'Battery']).copy()
    n_removed_rows = n_rows - len(df.index)
    
    # Convert the "TimeStamp" column into unix milliseconds
    if print_debug: print("Converting timestamps to unix milliseconds...")
    df['unix_ms'] = df['TimeStamp'].apply(lambda x: int(timestamp_to_unix_milliseconds(x)))
    
    # From the start of recording this file, get the "relative" unix milliseconds.
    if print_debug: print("Deriving relative unix milliseconds...")
    raw_start_unix_milliseconds = df['unix_ms'].iloc[0]
    df['rel_unix_ms'] = df['unix_ms'] - raw_start_unix_milliseconds
    
    # Filter further: given the start unix milliseconds, filter out any rows that come before.
    # We also calculate the end timestamp based on either a provided duration or on the last row
    if print_debug: print("Determining end unix milliseconds based on either duration or last row, filtering rows...")
    end_unix_milliseconds = df['unix_ms'].iloc[-1]
    if duration_milliseconds is not None:
        end_unix_milliseconds = start_unix_milliseconds + duration_milliseconds
    if end_millisecond_buffer is not None:
        end_unix_milliseconds -= end_millisecond_buffer
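    df = df[df['unix_ms'].between(start_unix_milliseconds, end_unix_milliseconds)].copy()
    # Fail with a clear message instead of an IndexError further down.
    if df.empty:
        raise ValueError("No EEG rows fall between the start and end unix milliseconds.")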
    
    # Given the new filtered rows, get the trial milliseconds instead
    if print_debug: print("Deriving trial milliseconds...")
    df['trial_ms'] = df['rel_unix_ms'] - df['rel_unix_ms'].iloc[0]
    
    # Final steps
    if print_debug: print(f"\nFinished processing {src}!")
    final_start_milliseconds = df['unix_ms'].iloc[0]
    final_end_milliseconds = df['unix_ms'].iloc[-1]
    final_duration_milliseconds = final_end_milliseconds - final_start_milliseconds
    
    if save_filename is not None:
        # Deriving file path from provided src url
        src_directory = os.path.dirname(src)
        save_name = os.path.join(src_directory, save_filename)
        df.to_csv(save_name)
        if print_debug: print(f"Saving resulting eeg to {save_name}")
    
    if print_debug:
        print(f"\t- # raw rows: {n_rows}")
        print(f"\t- # filtered rows based on NA timestamps: {n_removed_rows}")
        print(f"\t- Final number of rows: {len(df.index)}")
        print(f"\t- Time removed since start of recording: {final_start_milliseconds - raw_start_unix_milliseconds}")
        print(f"\t- Trial start and end: {final_start_milliseconds} - {final_end_milliseconds}")
        print(f"\t- Trial duration: {final_duration_milliseconds}")

    return df, final_start_milliseconds, final_end_milliseconds, final_duration_milliseconds

In [63]:
# In the provided sample, we consider the start timestamp to be equivalent to when the countdown ends.
df, start, end, duration = Get_Raw_EEG('./samples/raw_eeg.csv', 1732470208476, save_filename="eeg.csv", print_debug=True)
df

Reading SRC csv file...
Removing rows with null timestamps and battery...
Converting timestamps to unix milliseconds...
Deriving relative unix milliseconds...
Determining end unix milliseconds based on either duration or last row, filtering rows...
Deriving trial milliseconds...

Finished processing ./samples/raw_eeg.csv!
Saving resulting eeg to ./samples\eeg.csv
	- # raw rows: 211
	- # filtered rows based on NA timestamps: 100
	- Final number of rows: 96
	- Time removed since start of recording: 4001
	- Trial start and end: 1732470838722 - 1732470886234
	- Trial duration: 47512


Unnamed: 0,TimeStamp,Delta_TP9,Delta_AF7,Delta_AF8,Delta_TP10,Theta_TP9,Theta_AF7,Theta_AF8,Theta_TP10,Alpha_TP9,...,HeadBandOn,HSI_TP9,HSI_AF7,HSI_AF8,HSI_TP10,Battery,Elements,unix_ms,rel_unix_ms,trial_ms
11,2024-11-24 12:53:58.722,1.246195,0.415802,0.847750,1.586403,0.499165,0.049713,0.222488,1.680369,1.074834,...,1.0,1.0,1.0,1.0,1.0,60.0,,1732470838722,4001,0
12,2024-11-24 12:53:59.221,0.930701,0.469915,0.717394,1.586403,0.424654,0.105200,0.073327,1.680369,1.183912,...,1.0,1.0,1.0,1.0,1.0,60.0,,1732470839221,4500,499
14,2024-11-24 12:53:59.726,0.614851,0.492273,0.506193,1.586403,0.415773,0.223576,0.049880,1.680369,1.183912,...,1.0,1.0,1.0,1.0,1.0,60.0,,1732470839726,5005,1004
15,2024-11-24 12:54:00.222,0.486586,0.530584,0.367788,1.586403,0.147880,0.259888,0.224948,1.680369,1.105466,...,1.0,1.0,1.0,1.0,1.0,60.0,,1732470840222,5501,1500
17,2024-11-24 12:54:00.722,0.486586,0.623767,0.688515,1.586403,0.147880,0.379940,0.493388,1.680369,1.105466,...,1.0,1.0,1.0,1.0,1.0,60.0,,1732470840722,6001,2000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
186,2024-11-24 12:54:44.222,0.486586,1.041296,1.322643,1.586403,0.147880,0.878076,1.141170,1.680369,1.105466,...,1.0,4.0,1.0,1.0,4.0,60.0,,1732470884222,49501,45500
191,2024-11-24 12:54:44.723,0.486586,0.944163,1.322643,1.586403,0.147880,0.882541,1.141170,1.680369,1.105466,...,1.0,4.0,1.0,1.0,4.0,60.0,,1732470884723,50002,46001
192,2024-11-24 12:54:45.222,0.486586,0.944163,1.322643,1.586403,0.147880,0.882541,1.141170,1.680369,1.105466,...,1.0,4.0,1.0,1.0,4.0,60.0,,1732470885222,50501,46500
194,2024-11-24 12:54:45.723,0.486586,0.944163,1.322643,1.586403,0.147880,0.882541,1.141170,1.680369,1.105466,...,1.0,4.0,1.0,1.0,4.0,60.0,,1732470885723,51002,47001
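
The rows above appear to be spaced roughly 500 ms apart. A quick way to check this before relying on the alignment is to look for irregular gaps in `trial_ms`. This is a minimal sketch: `sample_gaps` is a hypothetical helper, and both the 500 ms expected interval and the 50 ms tolerance are assumptions read off the output above, not documented properties of the device.

```python
def sample_gaps(trial_ms, expected_ms=500, tolerance_ms=50):
    """Return (row_position, gap_ms) pairs where consecutive samples are not ~expected_ms apart."""
    gaps = []
    for i in range(1, len(trial_ms)):
        gap = trial_ms[i] - trial_ms[i - 1]
        if abs(gap - expected_ms) > tolerance_ms:
            gaps.append((i, gap))
    return gaps

# The first five `trial_ms` values from the output above.
head = [0, 499, 1004, 1500, 2000]
print(sample_gaps(head))
```

With the DataFrame returned by `Get_Raw_EEG`, the same check could be run as `sample_gaps(df['trial_ms'].tolist())`.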


### Splicing Video

In [58]:
# Reads the video and uses the provided start timestamp and duration to decide where to splice. Outputs the video in `mp4` format.
# Note that the start timestamp and duration MUST be in seconds, not milliseconds, because `ffmpeg_extract_subclip` expects seconds.
# Sub-second precision is still possible: pass decimals (e.g., 69.69) for `start_seconds` and `duration_seconds`.
def trim_video(src:str, save_filename:str, start_seconds:float, duration_seconds:float):
    # Deriving file path from provided src url
    src_directory = os.path.dirname(src)
    basename = os.path.splitext(os.path.basename(save_filename))[0]
    output_filepath = os.path.join(src_directory, f"{basename}.mp4")
    print(f"Outputting video to: {output_filepath}")
    
    # Use FFMPEG to extract the subclip
    ffmpeg_extract_subclip(src, start_seconds, start_seconds+duration_seconds, targetname=output_filepath)
    print("Video successfully outputted!")

In [64]:
# In the provided video, the start timestamp (aka the end of the countdown) is 1:09.690, or 69.69 seconds.
# The duration comes from the EEG splice above (`duration`, in milliseconds), so divide by 1000 to get seconds.
trim_video('./samples/vr.mp4', 'vr_subclip.mp4', 69.69, duration/1000)

Outputting video to: ./samples\vr_subclip.mp4
Moviepy - Running:
>>> "+ " ".join(cmd)
Moviepy - Command successful
Video successfully outputted!
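
Video players usually show positions as `M:SS.mmm`, while `trim_video` expects plain seconds. The conversion used in the example above (1:09.690 to 69.69) can be sketched as follows. `video_timestamp_to_seconds` is a hypothetical helper, not part of the original notebook, and it assumes colon-separated fields with the seconds field last.

```python
def video_timestamp_to_seconds(ts: str) -> float:
    """Convert 'SS.mmm', 'M:SS.mmm', or 'H:MM:SS.mmm' into seconds."""
    seconds = 0.0
    # Each field to the left is worth 60x the one to its right.
    for part in ts.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

start_seconds = video_timestamp_to_seconds("1:09.690")
print(round(start_seconds, 3))
```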


# EEG + Trial: Generating `eeg.csv` with `trial.csv`

We look through all folders and files in `./samples/participant_data_aligned/`. The folder should be structured as:
>- p1
>    - jay-vr-1
>        - raw_eeg.csv
>        - trial.csv
>    - jay-vr-2
>        - raw_eeg.csv
>        - trial.csv
>- p2
>    - jay-vr-1
>        - raw_eeg.csv
>        - trial.csv
>    - jay-vr-4
>        - raw_eeg.csv
>        - trial.csv
>- ...

We need a way to automate this, or at least to detect whether each trial folder contains the required files.

In [30]:
# uses `os` package

def list_files_recursive(fs, path:str='.'):
    for entry in os.listdir(path):
        full_path = os.path.join(path, entry)
        if os.path.isdir(full_path):
            list_files_recursive(fs, full_path)
        else:
            fs.append(full_path)

# Specify the directory path you want to start from
directory_path = './samples/participant_data_aligned/'
files_list = []
list_files_recursive(files_list, directory_path)

# Get common directories, then condense to unique directories.
dirs_list = [os.path.dirname(f) for f in files_list]
unique_dirs_list = list(set(dirs_list))
unique_dirs_list.sort()

for d in unique_dirs_list:
    print(d)

./samples/participant_data_aligned/p10\jay-vr-1
./samples/participant_data_aligned/p10\jay-vr-2
./samples/participant_data_aligned/p10\jay-vr-3
./samples/participant_data_aligned/p11\jay-vr-1
./samples/participant_data_aligned/p11\jay-vr-2
./samples/participant_data_aligned/p11\jay-vr-3
./samples/participant_data_aligned/p12\jay-vr-1
./samples/participant_data_aligned/p12\jay-vr-2
./samples/participant_data_aligned/p12\jay-vr-3
./samples/participant_data_aligned/p13\jay-vr-1
./samples/participant_data_aligned/p13\jay-vr-2
./samples/participant_data_aligned/p13\jay-vr-3
./samples/participant_data_aligned/p14\jay-vr-1
./samples/participant_data_aligned/p14\jay-vr-2
./samples/participant_data_aligned/p14\jay-vr-3
./samples/participant_data_aligned/p15\jay-vr-1
./samples/participant_data_aligned/p15\jay-vr-2
./samples/participant_data_aligned/p15\jay-vr-3
./samples/participant_data_aligned/p15\jay-vr-4
./samples/participant_data_aligned/p15\jay-vr-5
./samples/participant_data_aligned/p16\j
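
The listing above is sorted as plain strings, which is why `p10` comes first. If numeric participant order is preferred, a natural sort key is one option. This is a sketch only: `natural_key` is a hypothetical helper, and the example paths are shortened stand-ins for the real directories.

```python
import re

def natural_key(path: str):
    """Split a path into text and number chunks so that 'p2' sorts before 'p10'."""
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", path)]

example_dirs = ["p10/jay-vr-1", "p2/jay-vr-1", "p1/jay-vr-2"]
for d in sorted(example_dirs, key=natural_key):
    print(d)
```

In the cell above, this would amount to replacing `unique_dirs_list.sort()` with `unique_dirs_list.sort(key=natural_key)`.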

In [39]:
# For each directory, which corresponds to a trial, we need a `raw_eeg.csv` and a `trial.csv`. 
# If they're missing, we can't do anything.
# If they DO contain them... then it's prime time

raw_files = ["raw_eeg.csv","trial.csv"]
for d in unique_dirs_list:
    trial_files = [os.path.join(d,f) for f in os.listdir(d) if os.path.isfile(os.path.join(d, f))]
    trial_filenames = [os.path.basename(f) for f in trial_files]
    if all(elem in trial_filenames for elem in raw_files):
        # Contains raw_eeg.csv and trial.csv. Can now work
        raw_eeg = os.path.join(d, 'raw_eeg.csv')
        trial = os.path.join(d, 'trial.csv')
        trial_df = pd.read_csv(trial)
        start_unix_ms = trial_df.iloc[-1]['unix_ms']
        df, start, end, duration = Get_Raw_EEG(raw_eeg, start_unix_ms, save_filename="eeg.csv", print_debug=True)
        print("Processed", raw_eeg)

Reading SRC csv file...
Removing rows with null timestamps and battery...
Converting timestamps to unix milliseconds...
Deriving relative unix milliseconds...
Determining end unix milliseconds based on either duration or last row, filtering rows...
Deriving trial milliseconds...

Finished processing ./samples/participant_data_aligned/p1\jay-vr-1\raw_eeg.csv!
Saving resulting eeg to ./samples/participant_data_aligned/p1\jay-vr-1\eeg.csv
	- # raw rows: 218
	- # filtered rows based on NA timestamps: 152
	- Final number of rows: 59
	- Time removed since start of recording: 3997
	- Trial start and end: 1728509835099 - 1728509893099
	- Trial duration: 58000
Processed ./samples/participant_data_aligned/p1\jay-vr-1\raw_eeg.csv
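
The loop above silently skips folders that lack one of the two required files. To see which trials were skipped and why, the directory scan and the required-file check can be combined into one pass. The following is a minimal sketch using `pathlib`; `find_complete_trials` and `REQUIRED_FILES` are hypothetical names, and the sketch only inspects file names, not file contents.

```python
from pathlib import Path

REQUIRED_FILES = ("raw_eeg.csv", "trial.csv")

def find_complete_trials(root, required=REQUIRED_FILES):
    """Split trial folders under `root` into complete ones and ones missing required files."""
    complete, incomplete = [], []
    for d in sorted(p for p in Path(root).rglob("*") if p.is_dir()):
        names = {f.name for f in d.iterdir() if f.is_file()}
        if not names:
            continue  # skip folders that hold only subfolders (e.g., participant folders)
        missing = [r for r in required if r not in names]
        if missing:
            incomplete.append((d, missing))
        else:
            complete.append(d)
    return complete, incomplete
```

Each path in `complete` could then be passed to the processing step above, while `incomplete` lists every skipped folder together with the files it lacks.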


