### **Code to extract ant and larva tracking data from the .mat file outputs of AnTrax for the brood care assay.**

##### This script expects the directory structure found in '.../Sample_data_brood_care_assay/antrax/antdata'.
##### Running this script writes one timedata_C*.csv file per directory in 'antdata', plus a single stats.csv file, to '.../Sample_data_brood_care_assay/antrax/analysis'.
##### These .csv files are used in downstream visualization and analysis steps.
##### *Note: if working on macOS, a ".DS_Store" file may be added to the 'antdata' directory; it must be deleted for this script to run.*
##### *To delete the file, run the following in Terminal:* `find /path/to/antrax/antdata -name ".DS_Store" -delete`
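As an alternative to deleting the file, the directory listing could skip hidden entries and plain files. A minimal sketch, assuming each ant's data lives in its own subdirectory of 'antdata' as in the sample data; `list_ant_directories` is a hypothetical helper, not part of the original script, and the names 'C1' and 'C2' are made up:

```python
import os
import tempfile

def list_ant_directories(antdata_path):
    """Return the subdirectory names of antdata_path, skipping hidden
    entries such as '.DS_Store' and any plain files (hypothetical helper)."""
    names = []
    for name in os.listdir(antdata_path):
        if name.startswith('.'):
            continue  # hidden entry, e.g. the macOS '.DS_Store' file
        if os.path.isdir(os.path.join(antdata_path, name)):
            names.append(name)
    # plain sorted(); natsort's natsorted() would order 'C10' after 'C2'
    return sorted(names)

# Usage: build a throwaway directory tree and list it.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ('C2', 'C1'):
        os.mkdir(os.path.join(tmp, sub))
    open(os.path.join(tmp, '.DS_Store'), 'w').close()
    print(list_ant_directories(tmp))  # ['C1', 'C2']
```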

import h5py
import os
import glob
import numpy as np
import pandas as pd
from natsort import natsorted


#create a global, initially empty dataframe to store the stats for every ant in the experiment. This will be used for downstream analyses.
#the measures in df_stats include: the number of tracked frames, the proportion of tracked frames, and the frame at which the ant first detects the larva.
#These measures are used to assess the quality of the tracking data.
df_stats = pd.DataFrame()
#set these to the local 'antdata' (input) and 'analysis' (output) directories; both paths must end with '/'
path = '/Users/Alex/Desktop/Broodcare_assay_Code_&_Sample_data/antrax/antdata/'
save_path = '/Users/Alex/Desktop/Broodcare_assay_Code_&_Sample_data/antrax/analysis/'

# make a list of all directories in the path
directories = os.listdir(path)
directories = natsorted(directories)

for directory in directories:
    path1 = path + directory
    all_files = glob.glob(os.path.join(path1, "xy_*.mat"))
    all_files = natsorted(all_files)

    #initialize an empty dataframe to store the concatenated data for each directory
    df_data = pd.DataFrame()
    

    #now loop through all files in the current directory
    for file in all_files:
        #load the current file
        with h5py.File(file, 'r') as f:
            #get the keys of the current file
            keys = list(f.keys()) #these will be ['Ant', 'Larva']
            
            # load the data for each of keys "Ant" and "Larva"
            data = f['Ant'][:]  # Load the data
            data2 = f['Larva'][:]  # Load the data

        # Convert to Pandas DataFrames
        df = pd.DataFrame(data)
        df2 = pd.DataFrame(data2)
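        #transpose the dataframes: h5py reads MATLAB v7.3 arrays with their dimensions reversed, so after transposing each row is one frame
        df = df.T
        df2 = df2.T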

        #concatenate the two dataframes horizontally
        df3 = pd.concat([df, df2], axis=1)

        #concatenate df3 with df_data vertically
        df_data = pd.concat([df_data, df3], axis=0, ignore_index=True)


        
    #The next several steps take the concatenated data, and process it to convert it to timedata_*.csv and extract stats used for various plotting and downstream analyses
    #rename the columns in df_data
    df_data.columns = ['Ant x', 'Ant y', 'ant or', 'ass', 'Larva x', 'Larva y', 'Larva or', 'ass type']

    #drop the columns 2,3,6,7 from df_data (these are unnecessary for downstream analyses)
    df_data = df_data.drop(df_data.columns[[2,3,6,7]], axis=1)

    #count the non-NaN values (tracked frames) in each column of df_data and convert to a one-row dataframe. This will be loaded into df_stats
    tracked_frames = df_data.count().to_frame().T
    #drop the y-column counts (positions 1 and 3), leaving one count per object
    tracked_frames = tracked_frames.drop(tracked_frames.columns[[1,3]], axis=1)
    tracked_frames.rename(columns={'Ant x': '#_tracked_frames_A', 'Larva x': '#_tracked_frames_L'}, inplace=True)
    #add a column called 'Ant id' to the first position of the dataframe
    tracked_frames.insert(0, 'Ant id', directory)
    #add columns 'proportion_tracked_A' and 'proportion_tracked_L' (90000 is the expected number of frames per recording)
    tracked_frames['proportion_tracked_A'] = tracked_frames['#_tracked_frames_A'] / 90000
    tracked_frames['proportion_tracked_L'] = tracked_frames['#_tracked_frames_L'] / 90000

    

    #ant and larva positions are typically lost when the ant carries the larva, so a missing position is filled with the other object's position.
    #plain assignment is used because chained fillna(..., inplace=True) calls on a column have no effect under pandas 3.0
    df_data['Ant x'] = df_data['Ant x'].fillna(df_data['Larva x'])
    df_data['Ant y'] = df_data['Ant y'].fillna(df_data['Larva y'])
    df_data['Larva x'] = df_data['Larva x'].fillna(df_data['Ant x'])
    df_data['Larva y'] = df_data['Larva y'].fillna(df_data['Ant y'])

    #calculate the distance between the ant and the larva (dal) and add to the df_data
    df_data['dal'] = np.sqrt((df_data['Ant x'] - df_data['Larva x'])**2 + (df_data['Ant y'] - df_data['Larva y'])**2)

    #add new column 'interacting' to df_data, where if dal == 0 then 1 else 0
    df_data['interacting'] = np.where(df_data['dal'] == 0, 1, 0)


    #find the frame at which the ant first detects the larva: idxmin() returns the first row with the smallest distance, and +1 converts the 0-based row index to a frame number counted from 1
    first_detect_frame = df_data['dal'].idxmin()+1
    #add the first_detect_frame to the tracked_frames dataframe. Note this may be wrong due to tracking errors or ants still being partially anesthetized, so check it manually.
    tracked_frames['larva_detect_frame'] = first_detect_frame
    
    #add the tracked_frames dataframe to the df_stats dataframe
    df_stats = pd.concat([df_stats, tracked_frames], axis=0, ignore_index=True)


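    #add new column 'ant_velocity' to df_data: the distance the ant moves between consecutive frames (speed per frame)
    #NaN values (the first frame, and frames next to untracked positions) are set to 0
    df_data['ant_velocity'] = np.sqrt(df_data['Ant x'].diff()**2 + df_data['Ant y'].diff()**2)
    df_data['ant_velocity'] = df_data['ant_velocity'].fillna(0)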
    

    #save df_data as a csv titled 'timedata_*.csv'
    csv_path = save_path +'timedata_' + directory + '.csv'
    df_data.to_csv(csv_path, index=False)

#save df_stats as a csv titled 'stats.csv'
df_stats.to_csv(save_path + 'stats.csv', index=False)

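The tracked-frame proportions divide by a fixed 90000, i.e. they assume every recording has exactly 90000 frames. A minimal sketch of the same statistics with the denominator as a parameter; `tracking_summary` is a hypothetical helper, the toy table is made up, and defaulting to `len(df_data)` assumes the concatenated xy_*.mat files span the whole recording (pass `n_frames=90000` to reproduce the script's values):

```python
import numpy as np
import pandas as pd

def tracking_summary(df_data, n_frames=None):
    """Count non-NaN 'Ant x' and 'Larva x' values and their proportion of
    n_frames. Hypothetical helper; n_frames defaults to len(df_data)."""
    if n_frames is None:
        n_frames = len(df_data)
    counts = df_data[['Ant x', 'Larva x']].count()  # NaN = untracked frame
    return {
        '#_tracked_frames_A': int(counts['Ant x']),
        '#_tracked_frames_L': int(counts['Larva x']),
        'proportion_tracked_A': float(counts['Ant x']) / n_frames,
        'proportion_tracked_L': float(counts['Larva x']) / n_frames,
    }

# Made-up 4-frame table: the ant is lost once, the larva twice.
toy = pd.DataFrame({'Ant x': [1.0, np.nan, 2.0, 3.0],
                    'Larva x': [np.nan, np.nan, 5.0, 5.0]})
print(tracking_summary(toy))                   # proportions 0.75 and 0.5
print(tracking_summary(toy, n_frames=90000))   # fixed denominator as in the script
```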

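The gap-filling, distance and interaction steps can be illustrated on a toy table. A minimal sketch with made-up coordinates (not assay data), using plain assignment instead of `inplace=True`; it shows that `dal` is 0 exactly where one position was copied from the other, and that frames where both objects are lost stay NaN and are counted as not interacting:

```python
import numpy as np
import pandas as pd

# Made-up 4-frame track; NaN marks a frame where that object was lost.
df = pd.DataFrame({
    'Ant x':   [0.0, np.nan, 5.0, np.nan],
    'Ant y':   [0.0, np.nan, 5.0, np.nan],
    'Larva x': [3.0, 3.0, np.nan, np.nan],
    'Larva y': [4.0, 4.0, np.nan, np.nan],
})

# Fill each lost position with the other object's position (assumed carrying).
df['Ant x'] = df['Ant x'].fillna(df['Larva x'])
df['Ant y'] = df['Ant y'].fillna(df['Larva y'])
df['Larva x'] = df['Larva x'].fillna(df['Ant x'])
df['Larva y'] = df['Larva y'].fillna(df['Ant y'])

# Euclidean ant-larva distance; NaN == 0 is False, so untracked frames get 0.
df['dal'] = np.sqrt((df['Ant x'] - df['Larva x'])**2 + (df['Ant y'] - df['Larva y'])**2)
df['interacting'] = np.where(df['dal'] == 0, 1, 0)
print(df[['dal', 'interacting']])
```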
        
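`idxmin()` returns the first row with the smallest distance, which is the first contact only if the distance ever reaches 0; otherwise it is the frame of closest approach. A minimal sketch comparing it with an explicit first-contact search; `first_contact_frame` is a hypothetical helper, not the script's method, and the distances are made up:

```python
import numpy as np
import pandas as pd

def first_contact_frame(dal):
    """Frame number (counted from 1) of the first row with dal == 0,
    or None if the ant and larva never coincide. Hypothetical helper."""
    contact = np.flatnonzero(dal.to_numpy() == 0)
    return int(contact[0]) + 1 if contact.size else None

dal = pd.Series([7.0, np.nan, 2.0, 0.0, 0.0, 1.0])
print(dal.idxmin() + 1)          # 4: first occurrence of the minimum, NaN skipped
print(first_contact_frame(dal))  # 4

no_contact = pd.Series([7.0, 2.0, 1.0])
print(no_contact.idxmin() + 1)          # 3: closest approach, not a contact
print(first_contact_frame(no_contact))  # None
```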

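The 'ant_velocity' column is a speed (distance moved per frame, in the units of the xy coordinates), not a signed velocity. A minimal sketch on made-up coordinates, using `np.hypot` as an equivalent of the square-root expression, also shows a side effect of `fillna(0)`: a frame whose own or preceding position is still missing (both objects lost) is reported as 0, not only the first frame:

```python
import numpy as np
import pandas as pd

# Made-up ant track; frame 3 (index 2) is untracked for both ant and larva.
track = pd.DataFrame({'Ant x': [0.0, 3.0, np.nan, 6.0],
                      'Ant y': [0.0, 4.0, np.nan, 8.0]})

# Distance between consecutive frames; diff() is NaN for the first frame
# and next to any missing position, and fillna(0) turns those into 0.
speed = np.hypot(track['Ant x'].diff(), track['Ant y'].diff()).fillna(0)
print(speed.tolist())  # [0.0, 5.0, 0.0, 0.0]
```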