# Analyse TrackMate Data

This is a simple notebook designed to plot and compare track data produced using [TrackMate](https://imagej.net/plugins/trackmate/) (or similar software).

We begin by importing the necessary packages:

In [None]:
import glob
import re
import os

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

Now we specify the location of the exported TrackMate data (`INPUT_DIR`) and the metric we want to plot (`METRIC_OF_INTEREST`):

In [None]:
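# Column (feature key) in the tracks CSV files to plot, e.g. 'TRACK_MEAN_SPEED'
METRIC_OF_INTEREST = 'TRACK_MEAN_SPEED'
# Parent folder that contains the Pos# subdirectories
INPUT_DIR = './TrackMate_Outputs'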
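
If you're not sure which key to use for `METRIC_OF_INTEREST`, it can help to list the column names of one of your tracks files. The sketch below assumes, as the loading step later in this notebook does, that the first row of each CSV holds the feature keys. `available_metrics` is a hypothetical helper and the header line is made up for illustration; with real data, pass a file path instead of the `StringIO` object.

```python
import io

import pandas as pd


def available_metrics(csv_source):
    """Return the column keys of a tracks CSV (hypothetical helper).

    Assumes the first row holds feature keys such as TRACK_MEAN_SPEED.
    nrows=0 makes pandas read the header only, not the data rows.
    """
    header = pd.read_csv(csv_source, nrows=0)
    return list(header.columns)


# Made-up header line standing in for a real *_Pos*_tracks.csv file
sample = io.StringIO('LABEL,TRACK_ID,TRACK_MEAN_SPEED,TRACK_DURATION\n')
print(available_metrics(sample))  # ['LABEL', 'TRACK_ID', 'TRACK_MEAN_SPEED', 'TRACK_DURATION']
```

A quick `assert METRIC_OF_INTEREST in available_metrics(path)` before loading would then catch a mistyped key early.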

The next cell checks that the `INPUT_DIR` specified above exists and is a directory:

In [None]:
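valid_input = False

# os.path.isdir is True only for an existing directory (not for a file or a missing path)
if os.path.isdir(INPUT_DIR):
    print(f'{INPUT_DIR} is a valid directory - well done you!')
    valid_input = True
else:
    print(f'{INPUT_DIR} is not an existing directory - check that the path is correct')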

Now we look for `*_Pos*_tracks.csv` files inside the `Pos*` subdirectories of the input directory. Your data must be saved using the following folder structure for this to work:

![Folder_Structure](./assets/folder_structure.PNG)

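It doesn't matter how many `Pos#` subdirectories there are within the `TrackMate_Outputs` parent directory shown above: the next cell searches all of them. Tracks files saved directly in `TrackMate_Outputs`, rather than in a `Pos#` subdirectory, will not be found.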

In [None]:
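# Initialise these outside the if-block so later cells don't fail with a NameError
file_paths = []
position_labels = []
datasets = []

# Regex to extract the position number from a file name, e.g. 'Exp_Pos3_tracks.csv' -> 3
position_pattern = re.compile(r'_Pos(\d+)_')

def position_number(path):
    """Return the position number in a file name, or -1 if there isn't one."""
    match = position_pattern.search(os.path.basename(path))
    return int(match.group(1)) if match else -1

if valid_input:
    # glob's output order is not guaranteed, so sort numerically (Pos2 before Pos10)
    file_paths = sorted(glob.glob(f'{INPUT_DIR}/Pos*/*_Pos*_tracks.csv'), key=position_number)

    print(f'{len(file_paths)} valid CSV files found in {INPUT_DIR}')

    if len(file_paths) < 1:
        print(f'Are you sure {INPUT_DIR} is the folder that contains your TrackMate data?')
else:
    print(f'{INPUT_DIR} does not exist - check that the path is correct')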
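
To check that your files are laid out the way the search above expects, here is a minimal sketch that builds a throwaway copy of the folder structure in a temporary directory and runs the same glob pattern over it. The file names and the `position_number` helper are hypothetical. Because `glob` makes no promise about the order of its results, the sketch sorts the matches by their numeric position; a plain string sort would put `Pos10` before `Pos2`.

```python
import glob
import os
import re
import tempfile

# Hypothetical file names mimicking the expected folder structure
example_files = [
    'Pos1/Experiment_Pos1_tracks.csv',
    'Pos2/Experiment_Pos2_tracks.csv',
    'Pos10/Experiment_Pos10_tracks.csv',
    'Pos2/Experiment_Pos2_spots.csv',  # not a tracks file: ignored by the pattern
    'Experiment_Pos3_tracks.csv',      # not inside a Pos* folder: also ignored
]

position_pattern = re.compile(r'_Pos(\d+)_')


def position_number(path):
    """Return the position number embedded in a file name, or -1 if absent."""
    match = position_pattern.search(os.path.basename(path))
    return int(match.group(1)) if match else -1


with tempfile.TemporaryDirectory() as input_dir:
    for rel_path in example_files:
        full_path = os.path.join(input_dir, rel_path)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        open(full_path, 'w').close()  # create an empty placeholder file

    found = glob.glob(f'{input_dir}/Pos*/*_Pos*_tracks.csv')
    # Sort numerically so that Pos10 comes after Pos2
    found_positions = sorted(position_number(p) for p in found)

print(found_positions)  # [1, 2, 10]
```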

[Pandas](https://pandas.pydata.org/) is a Python data analysis library, particularly well suited to the kind of tabular data that TrackMate produces. Here, we use pandas to load each `*_Pos*_tracks.csv` file and keep only its `METRIC_OF_INTEREST` column:

In [None]:
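# Reset the lists so that re-running this cell doesn't duplicate data
position_labels = []
datasets = []

for file_path in file_paths:
    # Extract the position number from the file name (-1 if it isn't found).
    # Always append a label so that position_labels stays aligned with datasets.
    match = position_pattern.search(os.path.basename(file_path))
    position_num = int(match.group(1)) if match else -1
    position_labels.append(position_num)

    # Row 0 holds the feature keys used as column names; rows 1-3 are further
    # header rows (names, short names, units) that would otherwise be read as data
    data = pd.read_csv(file_path, skiprows=[1, 2, 3])
    # Keep only the metric of interest, dropping any missing values
    datasets.append(data[METRIC_OF_INTEREST].dropna().values)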
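
The `skiprows=[1, 2, 3]` argument is there because the tracks files this notebook expects have four header rows: the first holds the feature keys used as column names (such as `TRACK_MEAN_SPEED`), and the next three hold further header information (in recent TrackMate exports, as far as we know, feature names, short names and units). The sketch below uses a small, made-up mock-up of that layout to show the difference: without `skiprows`, the extra header rows are read as data and the metric column ends up as text (`object` dtype) rather than numbers.

```python
import io

import pandas as pd

# Illustrative mock-up of a tracks CSV: a row of feature keys, then three
# extra header rows. The track values are made up.
mock_text = (
    'LABEL,TRACK_ID,TRACK_MEAN_SPEED\n'
    'Label,Track ID,Track mean speed\n'
    'Label,ID,Mean sp.\n'
    ',,(micron/frame)\n'
    'Track_0,0,0.0051\n'
    'Track_1,1,0.0063\n'
)

# Without skiprows, the extra header rows become data and the column is text
raw = pd.read_csv(io.StringIO(mock_text))
print(raw['TRACK_MEAN_SPEED'].dtype)  # object

# Keeping row 0 as the header and skipping rows 1-3 leaves only numeric data
data = pd.read_csv(io.StringIO(mock_text), skiprows=[1, 2, 3])
print(data['TRACK_MEAN_SPEED'].dtype)  # float64
print(data['TRACK_MEAN_SPEED'].tolist())
```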

[Matplotlib](https://matplotlib.org/) is a library for creating data visualisations in Python. Here, we use matplotlib to plot a histogram (frequency distribution) of `METRIC_OF_INTEREST` for each position:

In [None]:
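%matplotlib inline

if len(datasets) > 0:
    plt.figure(figsize=(12, 6))

    # One set of bars per position, drawn side by side within each bin.
    # NOTE: range=(0.004, 0.008) is hard-coded; values outside it are left out
    # of the histogram, so adjust (or remove) it for your own data and metric.
    plt.hist(datasets, bins=40, range=(0.004, 0.008), alpha=0.7,
             label=[f'Pos {pos}' for pos in position_labels])
    plt.xlabel(METRIC_OF_INTEREST)
    plt.ylabel('Number of tracks')
    plt.legend()
    plt.show()
else:
    print("No data was found in the cells above - there's nothing to plot!")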
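
Note that `range=(0.004, 0.008)` does more than set the axis limits: matplotlib ignores values outside the range when binning, so tracks with a metric below or above those bounds simply don't appear in the histogram. `np.histogram` treats its `range` argument in the same way, which gives a quick way to count how many values a given range keeps. The speeds below are made up for illustration.

```python
import numpy as np

# Made-up speeds for one position; two values fall outside the plotting range
speeds = np.array([0.0035, 0.0045, 0.0052, 0.0061, 0.0079, 0.0090])

counts, bin_edges = np.histogram(speeds, bins=40, range=(0.004, 0.008))

# Values outside the range are not counted in any bin
n_plotted = int(counts.sum())
n_dropped = len(speeds) - n_plotted
print(f'{n_plotted} of {len(speeds)} values plotted, {n_dropped} outside the range')
# 4 of 6 values plotted, 2 outside the range
```

If many values are being dropped, widening the range or removing it altogether is usually the simplest fix.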

[Seaborn](https://seaborn.pydata.org/) is a Python data visualisation library based on matplotlib. Here, we use seaborn to draw a boxplot for each position, overlaid with a swarmplot showing every individual track:

In [None]:
if len(datasets) > 0:
    fig, ax = plt.subplots(figsize=(12, 6))
    # The boxplot summarises each position; outliers are hidden (showfliers=False)
    # because the swarmplot draws every individual track anyway
    sns.boxplot(data=datasets, color='white', showfliers=False, ax=ax, linewidth=2)
    sns.swarmplot(data=datasets, size=4, alpha=0.9, ax=ax)

    # Label each box with its position number (one tick per dataset)
    ax.set_xticks(range(len(position_labels)))
    ax.set_xticklabels(position_labels)

    ax.set_ylabel(METRIC_OF_INTEREST)
    ax.set_xlabel('Position')
    plt.show()
else:
    print("No data was found in the cells above - there's nothing to plot!")
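
The boxplot's median line and box edges are the 50th, 25th and 75th percentiles of each position's values, so it can be useful to print the same numbers as a table. The sketch below reshapes a list of per-position arrays into a long-form DataFrame (one row per track) and summarises it with `groupby(...).describe()`. The `position_labels` and `datasets` values here are made-up stand-ins for the variables built earlier in the notebook.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the notebook's position_labels and datasets
position_labels = [1, 2]
datasets = [np.array([0.005, 0.006, 0.007]), np.array([0.004, 0.008])]
METRIC_OF_INTEREST = 'TRACK_MEAN_SPEED'

# Long-form table: one row per track, tagged with its position number
long_df = pd.concat(
    [pd.DataFrame({'Position': pos, METRIC_OF_INTEREST: values})
     for pos, values in zip(position_labels, datasets)],
    ignore_index=True,
)

# Count, quartiles and median per position
summary = long_df.groupby('Position')[METRIC_OF_INTEREST].describe()
print(summary[['count', '25%', '50%', '75%']])
```

A long-form table like `long_df` can also be passed straight to seaborn, e.g. `sns.boxplot(data=long_df, x='Position', y=METRIC_OF_INTEREST)`, which labels the x-axis from the `Position` column without setting the ticks by hand.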