<a href="https://colab.research.google.com/github/bvinha/UB-Video-and-Image-Protocol/blob/main/01_PostAnnotation_GeoreferenceAnnotations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Author: Beatriz Vinha**

**Contact:** beatrizmouravinha@ub.edu

**Purpose of this code:**
1. Calculate the time of each annotation made with BIIGLE's video annotation tool, based on the start time of the video.
2. Combine timestamped annotations with video metadata.
3. Create continuous timestamped files to identify sequences in the navigation for substrate type categories and for footage to be removed from the video transect.

**To run this code, you will require:**

- BIIGLE Video Annotation Report file (in .csv), exported from BIIGLE, with the following modifications:
  - an added "start_time" column containing the start time of the annotated video
  - the "frames" column must have the square brackets ("[  ]") removed from all rows

- Video metadata file (in .csv), containing lat, lon and depth, based on the USBL navigation:
  - with date and time in separate columns
  - with time displayed in “HHMMSS” format.
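
The two modifications to the BIIGLE report can also be made in pandas instead of by hand. The block below is a minimal sketch on made-up data: the `label_name` column, the example values, and the start time are hypothetical, and it assumes each "frames" cell is a bracketed list such as `[12.5]`, from which only the first value is kept.

```python
import pandas as pd

# Toy stand-in for a raw BIIGLE video annotation report (values are made up).
report = pd.DataFrame({
    "label_name": ["Coral", "Sponge"],
    "frames": ["[12.5]", "[3.0, 8.25]"],
})

# Remove the square brackets and keep the first time value of each annotation.
report["frames"] = (
    report["frames"]
    .str.strip("[]")
    .str.split(",")
    .str[0]
    .astype(float)
)

# Add the start time of the annotated video (hypothetical value).
report["start_time"] = "10:53:00"

print(report)
```

If an annotation spans several frames, keeping only the first value is a simplification. Adapt that step if the later values matter for your analysis.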


## Step 1: Calculate Annotation Time

In [None]:
##Optional
#Connect Google Colab to Google Drive
#Run if files are stored on Google Drive

from google.colab import drive
drive.mount('/content/drive')

In [None]:
#Import libraries

#display DataFrames as interactive tables in Google Colab
from google.colab import data_table
data_table.enable_dataframe_formatter()

import pandas as pd

In [None]:
#import biigle annotation report
biigle_raw = pd.read_csv('/content/biigle_annot.csv') ###CHANGE TO YOUR FILE DIRECTORY###

#import video metadata file based on the ROV navigation data
#'time' column must be in float format as HHMMSS.
rov_nav = pd.read_csv('/content/rov_navigation.csv', sep = ",", dtype={'time':float}) ###CHANGE TO YOUR FILE DIRECTORY###


In [None]:
#convert 'start_time' and 'frames' columns to timedelta
###CHANGE COLUMN NAMES, IF NEEDED###
biigle_raw['start_time'] = pd.to_timedelta(pd.to_datetime(biigle_raw['start_time']).dt.strftime('%H:%M:%S'))
biigle_raw['frames'] = pd.to_timedelta(biigle_raw['frames'], unit = 'seconds')

##run the line below to check that "start_time" and "frames" are timedelta
#biigle_raw.dtypes

In [None]:
#add 'start time' and 'frames' to calculate annotation time
biigle_raw['annotation_time'] = biigle_raw['start_time']+biigle_raw['frames']

To run the rest of the code, the 'annotation_time' column in the video annotation report must be in the same format as the 'time' column of the video metadata, in this case "HHMMSS" (float). So, we first convert 'annotation_time' to the required format.

In [None]:
#extract hours, minutes, and seconds from 'annotation_time'
biigle_raw['hours'] = biigle_raw['annotation_time'].dt.components['hours']
biigle_raw['minutes'] = biigle_raw['annotation_time'].dt.components['minutes']
biigle_raw['seconds'] = biigle_raw['annotation_time'].dt.components['seconds']

In [None]:
#create a new column 'time' in HHMMSS format as float (e.g., 105347 for 10:53:47)
biigle_raw['time'] = (biigle_raw['hours'] * 10000 + biigle_raw['minutes'] * 100 + biigle_raw['seconds']).astype(float)

#run the lines below to check that the 'time' columns in both DataFrames are float64
##rov_nav.dtypes
##biigle_raw.dtypes
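
As a quick check of Step 1, the same calculation can be run on a small made-up table. This is only a sketch: the `demo` DataFrame and its values are hypothetical, and the start times are parsed directly with `pd.to_timedelta`, which assumes they are already "HH:MM:SS" strings.

```python
import pandas as pd

# Two hypothetical annotations: video start time and offset in seconds.
demo = pd.DataFrame({
    "start_time": ["10:53:00", "23:59:30"],
    "frames": [47.0, 12.5],
})

# Convert both columns to timedelta and add them.
start = pd.to_timedelta(demo["start_time"])
offset = pd.to_timedelta(demo["frames"], unit="s")
annotation_time = start + offset

# Rebuild the HHMMSS float from the hour, minute and second components.
parts = annotation_time.dt.components
demo["time"] = (
    parts["hours"] * 10000 + parts["minutes"] * 100 + parts["seconds"]
).astype(float)

print(demo)
```

The first row gives 105347.0 (10:53:00 plus 47 s). The second gives 235942.0, because the `seconds` component holds whole seconds only, so the extra 0.5 s is dropped.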

## Step 2: Merge timestamped annotations with video metadata

In [None]:
#merge video annotations with navigation data using the 'time' column for georeferencing
###CHANGE COLUMN NAMES, IF NEEDED###
allannotations_georef = pd.merge_asof(biigle_raw.sort_values('time'), rov_nav.sort_values('time'),
                                      on="time", direction="nearest")
#view result
allannotations_georef
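
To illustrate what `direction="nearest"` does, here is a minimal sketch with made-up annotation and navigation tables. The `label_name` and `depth` columns and all values are hypothetical.

```python
import pandas as pd

# Hypothetical annotations and navigation fixes, both keyed by HHMMSS floats.
annotations = pd.DataFrame({
    "time": [105410.0, 105347.0],
    "label_name": ["Sponge", "Coral"],
})
nav = pd.DataFrame({
    "time": [105345.0, 105350.0, 105400.0, 105415.0],
    "depth": [300.0, 301.0, 302.0, 303.0],
})

# Both inputs must be sorted by the key; each annotation takes the
# navigation row whose 'time' value is numerically closest.
georef = pd.merge_asof(
    annotations.sort_values("time"),
    nav.sort_values("time"),
    on="time",
    direction="nearest",
)

print(georef)
```

Here the annotation at 105347 is matched to the fix at 105345, and the one at 105410 to the fix at 105415. Note that "nearest" is measured on the HHMMSS numbers themselves, so around a minute or hour boundary the numeric gap differs from the real one: 105959 and 110000 are one second apart but differ by 41. With densely sampled navigation data this should rarely change the match, but it is worth keeping in mind.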

## Step 3: Create sequenced annotations of Substrate Type and Footage to discard

In [None]:
#extract WholeFrame annotations with START/END markers (substrate type, parts of the video to remove, etc.)
#.copy() creates an independent DataFrame, so the drop below does not raise a SettingWithCopyWarning
wholeframe_annotations = allannotations_georef[allannotations_georef['shape_name'] == 'WholeFrame'].copy()

#delete non-useful columns to avoid duplicates
###CHANGE COLUMN NAMES, IF NEEDED###
wholeframe_annotations.drop(['lat','lng', 'gps_altitude'], axis=1, inplace=True)

In [None]:
#extract the relevant rows for START and END in the label_hierarchy
start_annotations = wholeframe_annotations[wholeframe_annotations['label_hierarchy'].str.contains('START', case=False)]
end_annotations = wholeframe_annotations[wholeframe_annotations['label_hierarchy'].str.contains('END', case=False)]

In [None]:

#create a list of sequenced intervals between START and END
intervals = []
for _, start_row in start_annotations.iterrows():
    #take the top-level category (text before the first '>') of this START label
    category = start_row['label_hierarchy'].split('>')[0].strip()
    start_time = start_row['time']
    #find the END annotations after this START for the same category
    #regex=False matches the category name literally, even if it contains characters such as '(' or '+'
    matching_end = end_annotations[(end_annotations['time'] > start_time) &
                                   (end_annotations['label_hierarchy'].str.contains(category, case=False, regex=False))]
    if not matching_end.empty:
        end_time = matching_end.iloc[0]['time']
        #append the interval (start_time, end_time, category)
        intervals.append((start_time, end_time, category))

#Function to assign "WholeFrame" (substrate type, etc.) to navigation data
def assign_wholeframe_labels(rov_nav, intervals):
    #adding a new column to store the WholeFrame labels
    rov_nav['WholeFrame'] = None

    #iterate over each interval (start_time, end_time, category) and
    #assign the corresponding category in the navigation data
    for start_time, end_time, category in intervals:
        mask = (rov_nav['time'] >= start_time) & (rov_nav['time'] <= end_time)
        rov_nav.loc[mask, 'WholeFrame'] = category

    return rov_nav

sequences_nav = assign_wholeframe_labels(rov_nav, intervals)
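
The pairing of START and END markers can be checked on a small made-up example. This is a sketch under assumptions: the label strings (such as "Substrate Type > Sand > START") and all times are hypothetical, chosen only to match the pattern the code above expects, with the category as the text before the first '>'.

```python
import pandas as pd

# Hypothetical WholeFrame markers and navigation times (HHMMSS floats).
marks = pd.DataFrame({
    "time": [100000.0, 100030.0, 100100.0, 100130.0],
    "label_hierarchy": [
        "Substrate Type > Sand > START",
        "Substrate Type > Sand > END",
        "Discard > START",
        "Discard > END",
    ],
})
nav = pd.DataFrame({"time": [95950.0, 100010.0, 100045.0, 100115.0, 100200.0]})

starts = marks[marks["label_hierarchy"].str.contains("START", regex=False)]
ends = marks[marks["label_hierarchy"].str.contains("END", regex=False)]

# Pair each START with the first later END of the same top-level category.
intervals = []
for _, row in starts.iterrows():
    category = row["label_hierarchy"].split(">")[0].strip()
    later_ends = ends[
        (ends["time"] > row["time"])
        & ends["label_hierarchy"].str.contains(category, regex=False)
    ]
    if not later_ends.empty:
        intervals.append((row["time"], later_ends.iloc[0]["time"], category))

# Label every navigation row that falls inside an interval (bounds included).
nav["WholeFrame"] = None
for start, end, category in intervals:
    nav.loc[nav["time"].between(start, end), "WholeFrame"] = category

print(intervals)
print(nav)
```

Only the navigation rows at 100010 and 100115 fall inside an interval, so they are the only ones that receive a label. The other rows keep `None` and would be removed by the `dropna` call in the next cell.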

In [None]:
#remove rows without WholeFrame labels (i.e., rows not part of the identified intervals)
sequences_nav_cleaned = sequences_nav.dropna(subset=['WholeFrame'])

#merge both datasets (merge_asof requires both inputs to be sorted by 'time')
sequenced_annotations = pd.merge_asof(sequences_nav_cleaned.sort_values('time'),
                                      wholeframe_annotations.sort_values('time'),
                                      on='time', direction='nearest')

#view result
sequenced_annotations

## Step 4: Clean and Export Final Files

In [None]:
#delete unnecessary columns
###CHANGE COLUMN NAMES, IF NEEDED###
allannotations_georef.drop(['frames','hours', 'minutes', 'seconds', 'time'], axis=1, inplace=True)
sequenced_annotations.drop(['frames', 'hours', 'minutes', 'seconds'], axis=1, inplace=True)

In [None]:
#separate annotations into different categories
species_annotations = allannotations_georef[allannotations_georef['shape_name'] != 'WholeFrame']
substrate_type_annotations = sequenced_annotations[sequenced_annotations['WholeFrame'] == 'Substrate Type']
transect_to_discard = sequenced_annotations[sequenced_annotations['WholeFrame'] != 'Substrate Type']

In [None]:
#export files
allannotations_georef.to_csv('/content/allannotations_georef.csv', index=False) ###CHANGE TO YOUR FILE DIRECTORY###
species_annotations.to_csv('/content/species_annotations.csv', index=False) ###CHANGE TO YOUR FILE DIRECTORY###
substrate_type_annotations.to_csv('/content/substrate_type_annotations.csv', index=False) ###CHANGE TO YOUR FILE DIRECTORY###
transect_to_discard.to_csv('/content/transect_to_discard.csv', index=False) ###CHANGE TO YOUR FILE DIRECTORY###