# PDAC CellTracksColab - Landing Tracks
---

This notebook focuses on the analysis of cell tracks that exhibit distinct landing patterns. By examining track dynamics in detail, it helps characterise cell landing, arrest, and interaction with the endothelium. Here is an overview of its functionalities:

### Key Features

- **Track Filtering Based on Instantaneous Speed:** The initial step involves segregating tracks that demonstrate a clear landing pattern by analyzing their instantaneous speed. This process ensures that only tracks relevant to the landing behavior are included for detailed analysis.

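- **Measurement of Track Parameters:** Once the tracks are filtered, the notebook measures a range of track parameters for each segment of the landing (slowdown to arrest, slowdown to end, and arrest to end), including total distance travelled, directionality, and forward migration index (FMI).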

- **Proximity Analysis to Endothelial Features:** A unique feature of this notebook is its ability to measure the shortest distance of each track from previously segmented features, including endothelial cell nuclei and cell junctions. This analysis is pivotal in understanding the spatial relationships and interactions between circulating cells and the endothelium.

- **Visualization of Track Parameters:** To aid in the interpretation and presentation of findings, the notebook includes functionality for plotting the computed parameters of the tracks. These visualizations facilitate a clear and intuitive understanding of the data, highlighting key trends and patterns in cell behavior.
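
To make the proximity measurement concrete, here is a minimal sketch of a shortest-distance computation between one track and a set of feature coordinates. It is an illustration under stated assumptions, not the notebook's own implementation: the function name `shortest_distance_to_features` and the toy coordinates are hypothetical, and segmented features are assumed to be available as a list of (x, y) points.

```python
import numpy as np

def shortest_distance_to_features(track_xy, feature_xy):
    """Return the smallest Euclidean distance between any track point and any feature point."""
    track_xy = np.asarray(track_xy, dtype=float)
    feature_xy = np.asarray(feature_xy, dtype=float)
    # Pairwise differences, shape (n_track_points, n_feature_points, 2)
    diffs = track_xy[:, None, :] - feature_xy[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return distances.min()

# Toy data (hypothetical): a short track and two nucleus centroids
track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
nuclei = [(2.0, 3.0), (10.0, 10.0)]
print(shortest_distance_to_features(track, nuclei))
```

The brute-force pairwise approach is easy to read but scales with the product of track points and feature points; for large segmentations a spatial index would likely be preferable.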


<font size = 4>Notebook created by [Guillaume Jacquemet](https://cellmig.org/)


In [None]:
# @title #MIT License

print("""
**MIT License**

Copyright (c) 2023 Guillaume Jacquemet

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.""")

--------------------------------------------------------
# **Part 1: Prepare the session and load your data**
--------------------------------------------------------


## **1.1. Install key dependencies**
---
<font size = 4>

In [None]:
#@markdown ##Play to install
!pip -q install pandas scikit-learn
!pip -q install hdbscan
!pip -q install umap-learn
!pip -q install plotly
!pip -q install tqdm

import ipywidgets as widgets
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import numpy as np
import itertools
from matplotlib.gridspec import GridSpec
import requests


# Function to calculate Cohen's d
def cohen_d(group1, group2):
    diff = group1.mean() - group2.mean()
    n1, n2 = len(group1), len(group2)
    var1 = group1.var()
    var2 = group2.var()
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    d = diff / np.sqrt(pooled_var)
    return d

from tqdm.notebook import tqdm  # progress bar used by save_dataframe_with_progress


def save_dataframe_with_progress(df, path, desc="Saving", chunk_size=50000):
    """Save a DataFrame to CSV in chunks, showing a progress bar."""

    # Create a tqdm instance for progress tracking
    with tqdm(total=len(df), unit="rows", desc=desc) as pbar:
        # Open the file for writing (newline="" lets the CSV writer control line endings)
        with open(path, "w", newline="") as f:
            # Write the header once at the beginning
            df.head(0).to_csv(f, index=False)

            # Write the rows in consecutive slices of at most chunk_size rows
            for start in range(0, len(df), chunk_size):
                chunk = df.iloc[start:start + chunk_size]
                chunk.to_csv(f, header=False, index=False)
                pbar.update(len(chunk))


def check_for_nans(df, df_name):
    """
    Checks the given DataFrame for NaN values and prints the count for each column containing NaNs.

    Args:
    df (pd.DataFrame): DataFrame to be checked for NaN values.
    df_name (str): The name of the DataFrame as a string, used for printing.
    """
    # Check if the DataFrame has any NaN values and print a warning if it does.
    nan_columns = df.columns[df.isna().any()].tolist()

    if nan_columns:
        for col in nan_columns:
            nan_count = df[col].isna().sum()
            print(f"Column '{col}' in {df_name} contains {nan_count} NaN values.")
    else:
        print(f"No NaN values found in {df_name}.")

    print("Done")



## **1.2. Mount your Google Drive**
---
<font size = 4> To use this notebook on the data present in your Google Drive, you need to mount your Google Drive to this notebook.

<font size = 4> Play the cell below to mount your Google Drive and follow the instructions.

<font size = 4> Once this is done, your data are available in the **Files** tab at the top left of the notebook.

In [None]:
#@markdown ##Play the cell to connect your Google Drive to Colab

from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive



## **1.3. Compile your data or load existing dataframes**
---

<font size = 4> Please ensure that your data is properly organised (see above)
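
<font size = 4> The loading cell below infers the cell type, flow speed, IL-1β treatment, and experiment number from substrings of each file name. The sketch below previews that mapping for a single name. The lookup tables are copied from `populate_columns`; the helper `describe_filename` and the example file names are hypothetical, and the sketch matches against the base name only, whereas the cell below matches against the full path.

```python
import os
import re

# Substring-to-label maps, copied from populate_columns in the loading cell
CELLS = {'Mia': 'MiaPaca-2', 'P10': 'Panc10', 'p10': 'Panc10', 'As': 'AsPc1',
         'neu': 'Neutrophil', 'mono': 'Monocyte', 'mon': 'Monocyte'}
FLOW_SPEED = {'p1': 300, 'p2': 200, 'p3': 100, 'p4': 'wash'}
ILBETA = {'IL1b': 'IL1b', 'il1b': 'IL1b', 'ctrl': 'CTRL'}

def first_match(mapping, text, default):
    """Return the label of the first key found as a substring of text."""
    return next((label for key, label in mapping.items() if key in text), default)

def describe_filename(path):
    """Hypothetical helper: preview the metadata inferred from one file name."""
    name = os.path.basename(path)
    match = re.search(r'n(\d+)', name)
    return {
        'Cells': first_match(CELLS, name, 'Unknown'),
        'Flow_speed': first_match(FLOW_SPEED, name, 'Unknown'),
        'ILbeta': first_match(ILBETA, name, 'CTRL'),
        'experiment_nb': int(match.group(1)) if match else 'Unknown',
    }

# Hypothetical file names, for illustration only
print(describe_filename('Mia_p2_IL1b_n3-tracks.csv'))
print(describe_filename('p10_p3_ctrl_n1-tracks.csv'))
```

<font size = 4> In the second name, `p1` is found inside `p10`, so the flow speed comes out as 300 rather than 100. Tokens that are substrings of one another therefore need care when naming files.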


In [None]:
#@markdown ##Provide the path to your dataset:

#@markdown ###If you have multiple TrackMate files to compile, provide the path to the folder that contains them:

import os
import re
import glob
import pandas as pd
from tqdm.notebook import tqdm
import numpy as np
import requests
import zipfile

Folder_path = ''  # @param {type: "string"}

#@markdown ###If you have existing dataframes, provide the paths to your track and spot tables:

Track_table = '/gdrive/MyDrive/PDAC_adhesion/Results_Final/merged_Tracks.csv'  # @param {type: "string"}
Spot_table = '/gdrive/MyDrive/PDAC_adhesion/Results_Final/merged_Spots.csv'  # @param {type: "string"}

#@markdown ###Provide the path to your Result folder

Results_Folder = "/gdrive/MyDrive/PDAC_adhesion/Results_Filtered2"  # @param {type: "string"}

if not Results_Folder:
    Results_Folder = '/content/Results'  # Default Results_Folder path if not defined

if not os.path.exists(Results_Folder):
    os.makedirs(Results_Folder)  # Create Results_Folder if it doesn't exist

# Print the location of the result folder
print(f"Result folder is located at: {Results_Folder}")

def populate_columns(df, filename):
    cells_conditions = {
        'Mia': 'MiaPaca-2', 'P10': 'Panc10',  'p10': 'Panc10', 'As': 'AsPc1',
        'neu': 'Neutrophil', 'mono': 'Monocyte', 'mon': 'Monocyte'
    }
    flow_speed_conditions = {'p1': 300, 'p2': 200, 'p3': 100, 'p4': 'wash'}
    ilbeta_conditions = {'IL1b': 'IL1b', 'il1b': 'IL1b', 'ctrl': 'CTRL'}

    df['Cells'] = next((v for k, v in cells_conditions.items() if k in filename), 'Unknown')
    df['Flow_speed'] = next((v for k, v in flow_speed_conditions.items() if k in filename), 'Unknown')
    df['ILbeta'] = next((v for k, v in ilbeta_conditions.items() if k in filename), 'CTRL')
    filename_without_extension = os.path.splitext(os.path.basename(filename))[0]
    df['File_name'] = remove_suffix(filename_without_extension)
    df['Condition'] = df['Cells'] + '_' + df['Flow_speed'].astype(str) + '_' + df['ILbeta']
    match = re.search(r'n(\d+)', filename)
    df['experiment_nb'] = int(match.group(1)) if match else 'Unknown'

    return df

def load_and_populate(file_pattern, usecols=None, chunksize=500000):
    df_list = []
    pattern = re.compile(file_pattern)
    files_to_process = [f for f in glob.glob(Folder_path + '/*') if pattern.match(os.path.basename(f))]

    # Metadata list
    metadata_list = []

    for filepath in tqdm(files_to_process, desc="Processing Files"):

        # Get the expected number of data rows in the file (total lines minus the 4 header rows)
        with open(filepath) as fh:
            expected_rows = sum(1 for _ in fh) - 4

        # Add to the metadata list
        metadata_list.append({
            'filename': os.path.basename(filepath),
            'expected_rows': expected_rows
        })

        chunked_reader = pd.read_csv(filepath, skiprows=[1, 2, 3], usecols=usecols, chunksize=chunksize)
        for chunk in chunked_reader:
            df_list.append(populate_columns(chunk, filepath))

    if not df_list:
        print(f"No files found with pattern: {file_pattern}")
        return pd.DataFrame()

    merged_df = pd.concat(df_list, ignore_index=True)

    # Verify the total rows in the merged dataframe matches the total expected rows from metadata
    total_expected_rows = sum(item['expected_rows'] for item in metadata_list)
    if len(merged_df) != total_expected_rows:
        print(f"Warning: Mismatch in total rows. Expected {total_expected_rows}, found {len(merged_df)} in the merged dataframe.")
    else:
        print(f"Success: The processed dataframe matches the metadata. Total rows: {len(merged_df)}")

    return merged_df

def sort_and_generate_repeat(merged_df):
    merged_df.sort_values(['Condition', 'experiment_nb'], inplace=True)
    merged_df = merged_df.groupby('Condition', group_keys=False).apply(generate_repeat)
    return merged_df

def generate_repeat(group):
    unique_experiment_nbs = sorted(group['experiment_nb'].unique())
    experiment_nb_to_repeat = {experiment_nb: i+1 for i, experiment_nb in enumerate(unique_experiment_nbs)}
    group['Repeat'] = group['experiment_nb'].map(experiment_nb_to_repeat)
    return group

def remove_suffix(filename):
    suffixes_to_remove = ["-tracks", "-spots"]
    for suffix in suffixes_to_remove:
        if filename.endswith(suffix):
            filename = filename[:-len(suffix)]
            break
    return filename


def validate_tracks_df(df):
    """Validate the tracks dataframe for necessary columns and data types."""
    required_columns = ['TRACK_ID']
    for col in required_columns:
        if col not in df.columns:
            print(f"Error: Column '{col}' missing in tracks dataframe.")
            return False

    # Additional data type checks or value ranges can be added here
    return True

def validate_spots_df(df):
    """Validate the spots dataframe for necessary columns and data types."""
    required_columns = ['TRACK_ID', 'POSITION_X', 'POSITION_Y', 'POSITION_T']
    for col in required_columns:
        if col not in df.columns:
            print(f"Error: Column '{col}' missing in spots dataframe.")
            return False

    # Additional data type checks or value ranges can be added here
    return True

def check_unique_id_match(df1, df2):
    df1_ids = set(df1['Unique_ID'])
    df2_ids = set(df2['Unique_ID'])

    # Check if the IDs in the two dataframes match
    if df1_ids == df2_ids:
        print("The Unique_ID values in both dataframes match perfectly!")
    else:
        missing_in_df1 = df2_ids - df1_ids
        missing_in_df2 = df1_ids - df2_ids

        if missing_in_df1:
            print(f"There are {len(missing_in_df1)} Unique_ID values present in the second dataframe but missing in the first.")
            print("Examples of these IDs are:", list(missing_in_df1)[:5])

        if missing_in_df2:
            print(f"There are {len(missing_in_df2)} Unique_ID values present in the first dataframe but missing in the second.")
            print("Examples of these IDs are:", list(missing_in_df2)[:5])

if Folder_path:

    merged_tracks_df = load_and_populate(r'.*tracks.*\.csv')

    if not validate_tracks_df(merged_tracks_df):
        print("Error: Validation failed for merged tracks dataframe.")
    else:
        merged_tracks_df = sort_and_generate_repeat(merged_tracks_df)
        merged_tracks_df['Unique_ID'] = merged_tracks_df['File_name'] + "_" + merged_tracks_df['TRACK_ID'].astype(str)
        save_dataframe_with_progress(merged_tracks_df, Results_Folder + '/' + 'merged_Tracks.csv', desc="Saving Tracks")


    merged_spots_df = load_and_populate(r'.*spots.*\.csv', usecols=['TRACK_ID', 'POSITION_X', 'POSITION_Y', 'POSITION_T'])

    if not validate_spots_df(merged_spots_df):
        print("Error: Validation failed for merged spots dataframe.")
    else:
        merged_spots_df = sort_and_generate_repeat(merged_spots_df)
        merged_spots_df.dropna(subset=['POSITION_X', 'POSITION_Y'], inplace=True)
        merged_spots_df.reset_index(drop=True, inplace=True)
        merged_spots_df['Unique_ID'] = merged_spots_df['File_name'] + "_" + merged_spots_df['TRACK_ID'].astype(str)
        save_dataframe_with_progress(merged_spots_df, Results_Folder + '/' + 'merged_Spots.csv', desc="Saving Spots")
        # Now, call the check function
        check_unique_id_match(merged_spots_df, merged_tracks_df)
        print("...Done")

# For existing dataframes
if Track_table:
    print("Loading track table file....")
    merged_tracks_df = pd.read_csv(Track_table, low_memory=False)
    if not validate_tracks_df(merged_tracks_df):
        print("Error: Validation failed for loaded tracks dataframe.")

if Spot_table:
    print("Loading spot table file....")
    merged_spots_df = pd.read_csv(Spot_table, low_memory=False)
    if not validate_spots_df(merged_spots_df):
        print("Error: Validation failed for loaded spots dataframe.")


In [None]:
#@markdown ##Check Metadata


# Define the metadata columns that are expected to have identical values for each filename
metadata_columns = ['Cells', 'Flow_speed', 'ILbeta', 'Condition', 'experiment_nb', 'Repeat']

# Group the DataFrame by 'File_name' and then check if all entries within each group are identical
consistent_metadata = True
for name, group in merged_tracks_df.groupby('File_name'):
    for col in metadata_columns:
        if not group[col].nunique() == 1:
            consistent_metadata = False
            print(f"Inconsistency found for file: {name} in column: {col}")
            break  # Stop checking other columns for this group and move to the next file
    if not consistent_metadata:
        break  # Stop the entire process if any inconsistency is found

if consistent_metadata:
    print("All files have consistent metadata across the specified columns.")
else:
    print("There are inconsistencies in the metadata. Please check the output for details.")



# Assuming merged_tracks_df has the following columns: 'File_name', 'Cells', 'Flow_speed', 'ILbeta', 'Condition', 'experiment_nb', 'Repeat'

# Drop duplicates based on the 'File_name' to get a unique list of filenames and their metadata
unique_files_df = merged_tracks_df.drop_duplicates(subset=['File_name'])[['File_name', 'Cells', 'Flow_speed', 'ILbeta', 'Condition', 'experiment_nb', 'Repeat']]

# Reset the index to clean up the DataFrame
unique_files_df.reset_index(drop=True, inplace=True)

# Display the resulting DataFrame (display() is needed because this is not the last expression in the cell)
from IPython.display import display
display(unique_files_df)

# Group by 'Condition' and 'Repeat' and count how many files share each combination
grouped = unique_files_df.groupby(['Condition', 'Repeat']).size().reset_index(name='counts')

# Check if any combinations have a count greater than 1, which means they are not unique
non_unique_combinations = grouped[grouped['counts'] > 1]

# Print the non-unique combinations
if not non_unique_combinations.empty:
    print("There are non-unique combinations of Conditions and Repeats:")
    print(non_unique_combinations)
else:
    print("All combinations of Conditions and Repeats are unique.")

check_unique_id_match(merged_spots_df, merged_tracks_df)


All files have consistent metadata across the specified columns.
All combinations of Conditions and Repeats are unique.
The Unique_ID values in both dataframes match perfectly!


## **1.4. Filter tracks**


In [None]:
#@markdown ##Filter tracks


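# Keep only tracks with at least 30 spots, then keep the matching spots.
# .copy() makes these independent DataFrames so later column assignments do not trigger SettingWithCopyWarning.
merged_tracks_df = merged_tracks_df[merged_tracks_df['NUMBER_SPOTS'] >= 30].copy()
merged_spots_df = merged_spots_df[merged_spots_df['Unique_ID'].isin(merged_tracks_df['Unique_ID'])].copy()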


## **1.5. Visualise your tracks**
---

In [None]:
# @title ##Run the cell and choose the file you want to inspect

import ipywidgets as widgets
from ipywidgets import interact
import matplotlib.pyplot as plt

# Create the Tracks subfolder if it doesn't exist
os.makedirs(Results_Folder+"/Tracks", exist_ok=True)

# Extract unique filenames from the dataframe
filenames = merged_spots_df['File_name'].unique()

# Create a Dropdown widget with the filenames
filename_dropdown = widgets.Dropdown(
    options=filenames,
    value=filenames[0] if len(filenames) > 0 else None,  # Default selected value
    description='File Name:',
)

def plot_coordinates(filename):
    if filename:
        # Filter the DataFrame based on the selected filename
        filtered_df = merged_spots_df[merged_spots_df['File_name'] == filename]

        plt.figure(figsize=(10, 8))
        for unique_id in filtered_df['Unique_ID'].unique():
            unique_df = filtered_df[filtered_df['Unique_ID'] == unique_id].sort_values(by='POSITION_T')
            plt.plot(unique_df['POSITION_X'], unique_df['POSITION_Y'], marker='o', linestyle='-', markersize=2)

        plt.xlabel('POSITION_X')
        plt.ylabel('POSITION_Y')
        plt.title(f'Coordinates for {filename}')
        plt.savefig(f"{Results_Folder}/Tracks/Tracks_{filename}.pdf")
        plt.show()
    else:
        print("No valid filename selected")

# Link the Dropdown widget to the plotting function
interact(plot_coordinates, filename=filename_dropdown)


In [None]:
# @title ##Speed density plots


# Visualise the distributions of track speeds and total distance travelled, per condition, using sns.histplot

import seaborn as sns
import matplotlib.pyplot as plt

def plot_distribution_by_condition_updated(df):
    conditions = df['Condition'].unique()

    # Setting up the plotting environment
    sns.set_style("whitegrid")
    plt.figure(figsize=(18, 20))  # Increased height to fit the fourth plot

    # Plotting histograms for TRACK_MEAN_SPEED
    plt.subplot(4, 1, 1)
    for condition in conditions:
        sns.histplot(df[df['Condition'] == condition]['TRACK_MEAN_SPEED'], label=condition, kde=False, bins=30)
    plt.title('Histogram of TRACK_MEAN_SPEED by Condition')
    plt.legend()

    # Plotting histograms for TRACK_MAX_SPEED
    plt.subplot(4, 1, 2)
    for condition in conditions:
        sns.histplot(df[df['Condition'] == condition]['TRACK_MAX_SPEED'], label=condition, kde=False, bins=30)
    plt.title('Histogram of TRACK_MAX_SPEED by Condition')
    plt.legend()

    # Plotting histograms for TRACK_MIN_SPEED
    plt.subplot(4, 1, 3)
    for condition in conditions:
        sns.histplot(df[df['Condition'] == condition]['TRACK_MIN_SPEED'], label=condition, kde=False, bins=30)
    plt.title('Histogram of TRACK_MIN_SPEED by Condition')
    plt.legend()

    # Plotting histograms for TOTAL_DISTANCE_TRAVELED
    plt.subplot(4, 1, 4)
    for condition in conditions:
        sns.histplot(df[df['Condition'] == condition]['TOTAL_DISTANCE_TRAVELED'], label=condition, kde=False, bins=30)
    plt.title('Histogram of TOTAL_DISTANCE_TRAVELED by Condition')
    plt.legend()

    plt.tight_layout()
    plt.show()

# You can call this function with your dataframe like this:
plot_distribution_by_condition_updated(merged_tracks_df)



In [None]:
# @title ##Time points per track


import matplotlib.pyplot as plt


# Calculate the count of time points per track
time_points_per_track = merged_spots_df.groupby('Unique_ID').size()

# Plotting
plt.figure(figsize=(10, 6))
time_points_per_track.hist(bins=30, edgecolor='black')
plt.title('Distribution of Time Points per Track')
plt.xlabel('Number of Time Points')
plt.ylabel('Count of Tracks')
plt.grid(False)
plt.show()


# **Part 2: Analyse only the tracks that arrest**

## **2.1. Filter the data and save the dataframe**
---
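
The categorisation used in this section keeps a track only if it starts fast, drops to a low minimum speed, and ends slowly. A minimal sketch on a toy table is shown here, assuming the track table already contains the `AvgSpeedFirstN`, `TRACK_MIN_SPEED`, and `AvgSpeedLastN` columns; the toy values are made up for illustration, and the thresholds are the default values of the widgets in the cell below.

```python
import numpy as np
import pandas as pd

# Toy track table (hypothetical values)
toy_tracks = pd.DataFrame({
    'Unique_ID': ['a', 'b', 'c'],
    'AvgSpeedFirstN': [35.0, 35.0, 5.0],   # mean speed over the first N spots
    'TRACK_MIN_SPEED': [0.2, 4.0, 0.1],    # minimum speed along the track
    'AvgSpeedLastN': [1.0, 1.5, 0.5],      # mean speed over the last N spots
})

# Default thresholds of the widgets below
start_speed_threshold, min_speed_threshold, end_speed_threshold = 20, 1, 3

# A track is 'Flow_arrested' only if all three criteria hold
arrested = (
    (toy_tracks['AvgSpeedFirstN'] > start_speed_threshold)
    & (toy_tracks['TRACK_MIN_SPEED'] < min_speed_threshold)
    & (toy_tracks['AvgSpeedLastN'] < end_speed_threshold)
)
toy_tracks['Behaviour'] = np.where(arrested, 'Flow_arrested', 'Other')
print(toy_tracks[['Unique_ID', 'Behaviour']])
```

Track `b` never slows below the minimum-speed threshold and track `c` does not start fast enough, so only track `a` is labelled `Flow_arrested`.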

In [None]:
# @title ##Categorise the tracks based on Start speed and Min speed and End speed

from tqdm.notebook import tqdm
import ipywidgets as widgets
from IPython.display import display, clear_output
import pandas as pd
import numpy as np

def categorize_tracks(dataframe, max_speed_threshold, min_speed_threshold, end_speed_threshold):
    # Categorization based on the criteria for Flow_arrested
    condition = (dataframe['AvgSpeedFirstN'] > max_speed_threshold) & \
                (dataframe['TRACK_MIN_SPEED'] < min_speed_threshold) & \
                (dataframe['AvgSpeedLastN'] < end_speed_threshold)

    dataframe['Behaviour'] = np.where(condition, 'Flow_arrested', 'Other')

    # Keep only rows where Behaviour is 'Flow_arrested'
    return dataframe[dataframe['Behaviour'] == 'Flow_arrested']

def on_button_click(button):
    with output:
        clear_output(wait=True)

        filtered_df = categorize_tracks(
            merged_tracks_df,  # Make sure this DataFrame is correctly referenced
            max_speed_threshold=max_speed_input.value,
            min_speed_threshold=min_speed_input.value,
            end_speed_threshold=end_speed_input.value  # New end speed threshold parameter
        )

        # Calculating count for Flow_arrested
        flow_arrested_count = len(filtered_df)

        # Printing the results
        print("Count of 'Flow_arrested' tracks:")
        print(f"Flow_arrested: {flow_arrested_count}")

        print("\nSaving the results")
        save_dataframe_with_progress(filtered_df, Results_Folder + '/' + 'Flow_Arrested_Tracks.csv')
        print("Done")

# Define widgets for user input
max_speed_input = widgets.FloatText(value=20, description='Max Speed Threshold:', step=0.1)
min_speed_input = widgets.FloatText(value=1, description='Min Speed Threshold:', step=0.1)
end_speed_input = widgets.FloatText(value=3, description='End Speed Threshold:', step=0.1)  # New widget for end speed threshold

apply_button = widgets.Button(description="Categorize Tracks")
output = widgets.Output()

apply_button.on_click(on_button_click)

# Display the widgets
display(widgets.VBox([max_speed_input, min_speed_input, end_speed_input, apply_button, output]))



In [None]:
# @title ##Filter the data to keep only the Flow_arrested


# Filter merged_tracks_df for 'Flow_arrested' behaviour only
Filtered_merged_tracks_df = merged_tracks_df[merged_tracks_df['Behaviour'] == 'Flow_arrested']

# Filter merged_spots_df to include only spots from 'Flow_arrested' tracks
Filtered_merged_spots_df = merged_spots_df[merged_spots_df['Unique_ID'].isin(Filtered_merged_tracks_df['Unique_ID'])]

check_unique_id_match(Filtered_merged_spots_df, Filtered_merged_tracks_df)

# Save the updated DataFrame
save_dataframe_with_progress(Filtered_merged_tracks_df, Results_Folder + '/' + 'Filtered_Merged_Tracks.csv')

save_dataframe_with_progress(Filtered_merged_spots_df, Results_Folder + '/' + 'Filtered_Spots_Tracks.csv')



## **2.2. Compute track metrics**
---
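
The segment metrics computed in the cell below can be checked by hand on a tiny track. This sketch mirrors the definitions used by `calculate_directionality` and `calculate_fmi`: directionality is the net displacement divided by the path length, and FMI is the net displacement along X divided by the path length (treating +X as the forward direction, as the notebook's code does). The toy coordinates are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy track (hypothetical): 3 units along X, then 4 units along Y
toy_track = pd.DataFrame({
    'POSITION_T': [0, 1, 2],
    'POSITION_X': [0.0, 3.0, 3.0],
    'POSITION_Y': [0.0, 0.0, 4.0],
}).sort_values('POSITION_T')

xy = toy_track[['POSITION_X', 'POSITION_Y']].to_numpy()

# Path length: sum of the step lengths between consecutive spots
step_lengths = np.linalg.norm(np.diff(xy, axis=0), axis=1)
path_length = step_lengths.sum()

# Net (straight-line) displacement from first to last spot
net_displacement = np.linalg.norm(xy[-1] - xy[0])

# Both ratios fall back to 0 for a track that never moves
directionality = net_displacement / path_length if path_length != 0 else 0.0
fmi = (xy[-1, 0] - xy[0, 0]) / path_length if path_length != 0 else 0.0

print(f"Path length: {path_length}, directionality: {directionality:.3f}, FMI: {fmi:.3f}")
```

Here the path length is 7 and the net displacement is 5, so directionality is 5/7 and FMI is 3/7.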

In [None]:
# @title ##Compute track metrics from slow down to arrest, slow down to end, and arrest to end (to test)

import pandas as pd
import numpy as np


# Function to identify the slowing point and return its coordinates and time
def identify_and_get_slowing_point_details(track, slowdown_threshold=10):
    track = track.sort_values(by='POSITION_T')
    slowing_point_candidates = track[track['Speed'] < slowdown_threshold]
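    # Take the earliest candidate in time (the track is sorted by POSITION_T, so this is the first row,
    # which is not necessarily the row with the smallest index label)
    slowing_start_index = slowing_point_candidates.index[0] if not slowing_point_candidates.empty else None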

    if slowing_start_index is not None:
        slowing_point_details = track.loc[slowing_start_index, ['POSITION_X', 'POSITION_Y', 'POSITION_T']]
        return slowing_start_index, slowing_point_details
    else:
        return None, pd.Series({'POSITION_X': np.nan, 'POSITION_Y': np.nan, 'POSITION_T': np.nan})

def identify_and_get_stopping_point_details(track):
    track = track.sort_values(by='POSITION_T')
    # Identify the minimum speed
    min_speed = track['Speed'].min()
    # Find all points where speed equals the minimum speed
    stopping_point_candidates = track[track['Speed'] == min_speed]
    # Sort these candidates by time and take the first one
    stopping_point = stopping_point_candidates.sort_values(by='POSITION_T').head(1)

    if not stopping_point.empty:
        stopping_start_index = stopping_point.index[0]
        stopping_point_details = stopping_point.iloc[0][['POSITION_X', 'POSITION_Y', 'POSITION_T']]
        return stopping_start_index, stopping_point_details
    else:
        return None, pd.Series({'POSITION_X': np.nan, 'POSITION_Y': np.nan, 'POSITION_T': np.nan})

# Function to compute the total distance traveled from the slowing point
def compute_total_distance_from_slowing_point(track):
    distances = np.sqrt(track['POSITION_X'].diff()**2 + track['POSITION_Y'].diff()**2)
    return distances.sum()

# Function to calculate Directionality
def calculate_directionality(group):
    group = group.sort_values('POSITION_T')
    start_point = group.iloc[0][['POSITION_X', 'POSITION_Y']].to_numpy()
    end_point = group.iloc[-1][['POSITION_X', 'POSITION_Y']].to_numpy()

    euclidean_distance = np.linalg.norm(end_point - start_point)
    deltas = np.linalg.norm(np.diff(group[['POSITION_X', 'POSITION_Y']].values, axis=0), axis=1)
    total_path_length = deltas.sum()

    D = euclidean_distance / total_path_length if total_path_length != 0 else 0
    return pd.Series({'Directionality': D})

# Function to calculate FMI
def calculate_fmi(group):
    group = group.sort_values('POSITION_T')
    total_forward_displacement = group['POSITION_X'].diff().fillna(0).sum()
    total_path_length = np.linalg.norm(np.diff(group[['POSITION_X', 'POSITION_Y']].values, axis=0), axis=1).sum()

    FMI = total_forward_displacement / total_path_length if total_path_length != 0 else 0
    return pd.Series({'FMI': FMI})

def compute_track_segments_metrics(track, slowdown_threshold=10):
    # Sort the track by time
    track = track.sort_values(by='POSITION_T')

    # Identify slowing, stopping, and end points
    slowing_point_index, slowing_point_details = identify_and_get_slowing_point_details(track, slowdown_threshold)
    stopping_point_index, stopping_point_details = identify_and_get_stopping_point_details(track)
    end_point_index = track.index[-1]  # last point in time (the track is sorted by POSITION_T)
    end_point_details = track.loc[end_point_index, ['POSITION_X', 'POSITION_Y', 'POSITION_T']]

    # If any key point is missing, return NaNs under the same column names as the normal result
    if slowing_point_index is None or stopping_point_index is None or end_point_index is None:
        segment_keys = [f'{segment}_{metric}'
                        for segment in ('Slowdown_to_Arrest', 'Slowdown_to_End', 'Arrest_to_End')
                        for metric in ('Total_Distance', 'Directionality', 'FMI')]
        point_keys = [f'{point}_{axis}'
                      for point in ('Slowing_Point', 'Stopping_Point', 'End_Point')
                      for axis in ('X', 'Y', 'T')]
        return pd.Series(np.nan, index=segment_keys + ['Euclidean_Distance_Slowdown_to_End'] + point_keys)

    # Calculate metrics for each segment
    metrics_slowdown_to_arrest = calculate_metrics(track, slowing_point_index, stopping_point_index)
    metrics_slowdown_to_end = calculate_metrics(track, slowing_point_index, end_point_index)
    metrics_arrest_to_end = calculate_metrics(track, stopping_point_index, end_point_index)

    # Compute Euclidean distance from slowdown to track end
    start_point = track.loc[slowing_point_index, ['POSITION_X', 'POSITION_Y']].to_numpy()
    end_point = track.loc[end_point_index, ['POSITION_X', 'POSITION_Y']].to_numpy()
    euclidean_distance_slowdown_to_end = np.linalg.norm(end_point - start_point)

    # Construct the result
    result = {
        **{'Slowdown_to_Arrest_' + k: v for k, v in metrics_slowdown_to_arrest.items()},
        **{'Slowdown_to_End_' + k: v for k, v in metrics_slowdown_to_end.items()},
        **{'Arrest_to_End_' + k: v for k, v in metrics_arrest_to_end.items()},
        'Euclidean_Distance_Slowdown_to_End': euclidean_distance_slowdown_to_end,
        'Slowing_Point_X': slowing_point_details['POSITION_X'],
        'Slowing_Point_Y': slowing_point_details['POSITION_Y'],
        'Slowing_Point_T': slowing_point_details['POSITION_T'],
        'Stopping_Point_X': stopping_point_details['POSITION_X'],
        'Stopping_Point_Y': stopping_point_details['POSITION_Y'],
        'Stopping_Point_T': stopping_point_details['POSITION_T'],
        'End_Point_X': end_point_details['POSITION_X'],
        'End_Point_Y': end_point_details['POSITION_Y'],
        'End_Point_T': end_point_details['POSITION_T']
    }
    return pd.Series(result)


def calculate_metrics(track, start_index, end_index):
    # Subset the track for the given segment
    subset_track = track.loc[start_index:end_index]

    # Compute required metrics for the track segment
    # Add any additional metrics calculation here as needed
    total_distance = compute_total_distance_from_slowing_point(subset_track)
    directionality = calculate_directionality(subset_track)['Directionality']
    fmi = calculate_fmi(subset_track)['FMI']

    # Return a dictionary of calculated metrics
    return {'Total_Distance': total_distance, 'Directionality': directionality, 'FMI': fmi}

# Apply the function to the grouped DataFrame
grouped_df = Filtered_merged_spots_df.groupby('Unique_ID')
track_segments_metrics_df = grouped_df.apply(compute_track_segments_metrics).reset_index()

# Save the new DataFrame
save_dataframe_with_progress(track_segments_metrics_df, Results_Folder + '/' + 'Track_Segments_Metrics.csv')

# Find overlapping columns and remove them from the original DataFrame
overlapping_columns = Filtered_merged_tracks_df.columns.intersection(track_segments_metrics_df.columns).drop('Unique_ID')
Filtered_merged_tracks_df = Filtered_merged_tracks_df.drop(columns=overlapping_columns)

# Merge the new data into the original DataFrame
Filtered_merged_tracks_df = pd.merge(Filtered_merged_tracks_df, track_segments_metrics_df, on='Unique_ID', how='left')

# Save the updated DataFrame
save_dataframe_with_progress(Filtered_merged_tracks_df, Results_Folder + '/' + 'Filtered_Merged_Tracks.csv')

# Check for NaNs in the updated DataFrame
check_for_nans(Filtered_merged_tracks_df, "Filtered_merged_tracks_df")


In [None]:
# @title ##Plot examples


import matplotlib.pyplot as plt
import pandas as pd
import os

# Create the Track_speed subfolder if it doesn't exist
os.makedirs(Results_Folder+"/Track_speed", exist_ok=True)

def plot_flow_arrested_tracks(tracks_df, spots_df, num_tracks=15, save_path=None):
    # Default to the Track_speed subfolder created above
    if save_path is None:
        save_path = Results_Folder+"/Track_speed"

    arrested_track_ids = tracks_df['Unique_ID'].unique()

    plotted_tracks = 0
    for track_id in arrested_track_ids:
        if plotted_tracks >= num_tracks:
            break

        track = spots_df[spots_df['Unique_ID'] == track_id]
        if track.empty:
            continue
        track = track.sort_values(by='POSITION_T')

        # Get the recorded slowdown, stopping, and end time from the tracks dataframe
        recorded_slowdown_time = tracks_df[tracks_df['Unique_ID'] == track_id]['Slowing_Point_T'].iloc[0]
        recorded_stopping_time = tracks_df[tracks_df['Unique_ID'] == track_id]['Stopping_Point_T'].iloc[0]
        recorded_end_time = tracks_df[tracks_df['Unique_ID'] == track_id]['End_Point_T'].iloc[0]

        if pd.isna(recorded_slowdown_time) or pd.isna(recorded_stopping_time) or pd.isna(recorded_end_time):
            # Skip plotting if no slowdown, stopping, or end time is recorded
            continue

        # Find the points in the track data corresponding to the slowdown, stopping, and end time
        slowdown_point = track[track['POSITION_T'] == recorded_slowdown_time].iloc[0]
        stopping_point = track[track['POSITION_T'] == recorded_stopping_time].iloc[0]
        end_point = track[track['POSITION_T'] == recorded_end_time].iloc[0]

        # Plotting
        plt.figure(figsize=(12, 6))
        plt.plot(track['POSITION_T'], track['Speed'], label=f'Track {track_id}', linestyle='-', marker=None)

        # Highlight the recorded slowdown point
        plt.scatter(slowdown_point['POSITION_T'], slowdown_point['Speed'], color='red', zorder=5)
        plt.text(slowdown_point['POSITION_T'], slowdown_point['Speed'], ' Slowdown', color='red')

        # Highlight the recorded stopping point
        plt.scatter(stopping_point['POSITION_T'], stopping_point['Speed'], color='blue', zorder=5)
        plt.text(stopping_point['POSITION_T'], stopping_point['Speed'], ' Stopping', color='blue')

        # Highlight the recorded end point
        plt.scatter(end_point['POSITION_T'], end_point['Speed'], color='green', zorder=5)
        plt.text(end_point['POSITION_T'], end_point['Speed'], ' End', color='green')

        plt.xlabel('Time')
        plt.ylabel('Instantaneous Speed')
        plt.title(f'Instantaneous Speed Over Time for Track {track_id} (Flow Arrested)')
        plt.legend()

        # Save the plot as a PDF file
        plt.savefig(f'{save_path}/Track_{track_id}.pdf')
        plt.show()
        plt.close()  # Close the plot to free up memory

        plotted_tracks += 1

# Example usage
plot_flow_arrested_tracks(Filtered_merged_tracks_df, Filtered_merged_spots_df)


## **2.3. Check the tracks**
---

In [None]:
# @title ##Plot track examples

import ipywidgets as widgets
from ipywidgets import interact
import matplotlib.pyplot as plt
import os

# Ensure the Results_Folder exists
if not os.path.exists(Results_Folder+"/Tracks"):
    os.makedirs(Results_Folder+"/Tracks")

# Extract unique filenames from the dataframe
filenames = Filtered_merged_tracks_df['File_name'].unique()

# Create a Dropdown widget with the filenames
filename_dropdown = widgets.Dropdown(
    options=filenames,
    value=filenames[0] if len(filenames) > 0 else None,  # Default selected value
    description='File Name:',
)

def plot_coordinates(filename):
    if filename:
        # Filter the DataFrames based on the selected filename
        filtered_df = Filtered_merged_spots_df[Filtered_merged_spots_df['File_name'] == filename]
        points_df = Filtered_merged_tracks_df[Filtered_merged_tracks_df['File_name'] == filename]

        plt.figure(figsize=(10, 8))
        for unique_id in filtered_df['Unique_ID'].unique():
            unique_df = filtered_df[filtered_df['Unique_ID'] == unique_id].sort_values(by='POSITION_T')
            plt.plot(unique_df['POSITION_X'], unique_df['POSITION_Y'], marker='o', linestyle='-', markersize=1)

            # Plot the slowdown point if it exists
            if unique_id in points_df['Unique_ID'].values:
                point = points_df[points_df['Unique_ID'] == unique_id]
                if not pd.isna(point['Slowing_Point_X'].values[0]):
                    plt.scatter(point['Slowing_Point_X'].values[0], point['Slowing_Point_Y'].values[0], color='red', s=50, label='Slowdown Point' if unique_id == filtered_df['Unique_ID'].unique()[0] else "")

                # Plot the stopping point if it exists
                if not pd.isna(point['Stopping_Point_X'].values[0]):
                    plt.scatter(point['Stopping_Point_X'].values[0], point['Stopping_Point_Y'].values[0], color='blue', s=50, label='Stopping Point' if unique_id == filtered_df['Unique_ID'].unique()[0] else "")

        plt.xlabel('POSITION_X')
        plt.ylabel('POSITION_Y')
        plt.title(f'Coordinates for {filename}')
        plt.legend()
        plt.savefig(f"{Results_Folder}/Tracks/Tracks_{filename}.pdf")
        plt.show()
    else:
        print("No valid filename selected")

# Link the Dropdown widget to the plotting function
interact(plot_coordinates, filename=filename_dropdown)


In [None]:
# @title ##Run to plot the tracks for all FOV

import matplotlib.pyplot as plt
import os

# Ensure the output directory exists
if not os.path.exists(Results_Folder+"/Tracks"):
    os.makedirs(Results_Folder+"/Tracks")

# Extract unique filenames from the DataFrame
filenames = Filtered_merged_tracks_df['File_name'].unique()

def plot_coordinates(filename):
    # Filter the DataFrames based on the filename
    filtered_df = Filtered_merged_spots_df[Filtered_merged_spots_df['File_name'] == filename]
    points_df = Filtered_merged_tracks_df[Filtered_merged_tracks_df['File_name'] == filename]

    plt.figure(figsize=(10, 8))
    for unique_id in filtered_df['Unique_ID'].unique():
        unique_df = filtered_df[filtered_df['Unique_ID'] == unique_id].sort_values(by='POSITION_T')
        plt.plot(unique_df['POSITION_X'], unique_df['POSITION_Y'], marker='o', linestyle='-', markersize=1)

        # Plot the slowdown point if it exists
        if unique_id in points_df['Unique_ID'].values:
            point = points_df[points_df['Unique_ID'] == unique_id]
            if not pd.isna(point['Slowing_Point_X'].values[0]):
                plt.scatter(point['Slowing_Point_X'].values[0], point['Slowing_Point_Y'].values[0], color='red', s=50)

            # Plot the stopping point if it exists
            if not pd.isna(point['Stopping_Point_X'].values[0]):
                plt.scatter(point['Stopping_Point_X'].values[0], point['Stopping_Point_Y'].values[0], color='blue', s=50)

    # Set the plot axes limits
    plt.xlim(0, 650)
    plt.ylim(0, 650)

    plt.xlabel('POSITION_X')
    plt.ylabel('POSITION_Y')
    plt.title(f'Coordinates for {filename}')
    plt.savefig(f"{Results_Folder}/Tracks/Tracks_{filename}.pdf")
    plt.close()

# Generate and save plots for all filenames
for filename in filenames:
    plot_coordinates(filename)


## **2.4. Plot tracks only from slowdown to (first) arrest**
---
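
The cells in this section keep, for each track, only the spots recorded between the slowdown time and the (first) stopping time. The following is a minimal sketch of that selection on a toy table. The data values and the helper name `segment_between` are made up for illustration; only the column names mirror the notebook.

```python
import pandas as pd

# Toy spots table: one track sampled at six time points (illustrative values).
spots = pd.DataFrame({
    "Unique_ID": ["a"] * 6,
    "POSITION_T": [0, 1, 2, 3, 4, 5],
    "POSITION_X": [0.0, 10.0, 18.0, 22.0, 23.0, 23.0],
    "POSITION_Y": [5.0, 5.0, 5.5, 6.0, 6.0, 6.0],
})

def segment_between(track_spots, start_t, stop_t):
    """Return the spots with start_t <= POSITION_T <= stop_t, ordered in time."""
    if pd.isna(start_t) or pd.isna(stop_t):
        # No segment can be defined when either event time is missing.
        return track_spots.iloc[0:0]
    mask = track_spots["POSITION_T"].between(start_t, stop_t)  # inclusive on both ends
    return track_spots[mask].sort_values("POSITION_T")

segment = segment_between(spots, start_t=2, stop_t=4)
print(segment["POSITION_T"].tolist())
```

Returning an empty frame for tracks without a recorded slowdown or stop mirrors the `pd.isna` checks in the plotting cells, so such tracks are simply not drawn.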

In [None]:
# @title ##Plot track example

import ipywidgets as widgets
from ipywidgets import interact
import matplotlib.pyplot as plt
import os

# Ensure the Results_Folder exists
if not os.path.exists(Results_Folder+"/Tracks_slowdown"):
    os.makedirs(Results_Folder+"/Tracks_slowdown")  # Create Results_Folder if it doesn't exist

# Extract unique filenames from the dataframe
filenames = Filtered_merged_tracks_df['File_name'].unique()

# Create a Dropdown widget with the filenames
filename_dropdown = widgets.Dropdown(
    options=filenames,
    value=filenames[0] if len(filenames) > 0 else None,  # Default selected value
    description='File Name:',
)

def plot_coordinates(filename):
    if filename:
        # Filter the DataFrames based on the selected filename
        filtered_df = Filtered_merged_spots_df[Filtered_merged_spots_df['File_name'] == filename]
        points_df = Filtered_merged_tracks_df[Filtered_merged_tracks_df['File_name'] == filename]

        plt.figure(figsize=(10, 8))
        for unique_id in filtered_df['Unique_ID'].unique():
            if unique_id in points_df['Unique_ID'].values:
                point = points_df[points_df['Unique_ID'] == unique_id]
                slowdown_time = point['Slowing_Point_T'].values[0]
                stopping_time = point['Stopping_Point_T'].values[0]

                # Plot only the track segment between the slowdown and stopping points
                if not pd.isna(slowdown_time) and not pd.isna(stopping_time):
                    unique_df = filtered_df[(filtered_df['Unique_ID'] == unique_id) & (filtered_df['POSITION_T'] >= slowdown_time) & (filtered_df['POSITION_T'] <= stopping_time)].sort_values(by='POSITION_T')
                    plt.plot(unique_df['POSITION_X'], unique_df['POSITION_Y'], marker='o', linestyle='-', markersize=1)

                    # Mark the slowdown and stopping points
                    plt.scatter(point['Slowing_Point_X'].values[0], point['Slowing_Point_Y'].values[0], color='red', s=50, label='Slowdown Point')
                    plt.scatter(point['Stopping_Point_X'].values[0], point['Stopping_Point_Y'].values[0], color='blue', s=50, label='Stopping Point')

        plt.xlim(0, 650)
        plt.ylim(0, 650)
        plt.xlabel('POSITION_X')
        plt.ylabel('POSITION_Y')
        plt.title(f'Coordinates for {filename}')
        plt.savefig(f"{Results_Folder}/Tracks_slowdown/Tracks_{filename}.pdf")
        plt.show()
    else:
        print("No valid filename selected")

# Link the Dropdown widget to the plotting function
interact(plot_coordinates, filename=filename_dropdown)


In [None]:
# @title ##Run to generate tracks for all FOV

import os
import pandas as pd
import matplotlib.pyplot as plt

if not os.path.exists(Results_Folder+"/Tracks_slowdown"):
    os.makedirs(Results_Folder+"/Tracks_slowdown")  # Create Results_Folder if it doesn't exist

# Extract unique filenames from the dataframe
filenames = Filtered_merged_tracks_df['File_name'].unique()

def plot_coordinates(filename):
    # Filter the DataFrames based on the filename
    filtered_df = Filtered_merged_spots_df[Filtered_merged_spots_df['File_name'] == filename]
    points_df = Filtered_merged_tracks_df[Filtered_merged_tracks_df['File_name'] == filename]

    plt.figure(figsize=(10, 8))
    for unique_id in filtered_df['Unique_ID'].unique():
        if unique_id in points_df['Unique_ID'].values:
            point = points_df[points_df['Unique_ID'] == unique_id]
            slowdown_time = point['Slowing_Point_T'].values[0]
            stopping_time = point['Stopping_Point_T'].values[0]

            # Plot only the track segment between the slowdown and stopping points
            if not pd.isna(slowdown_time) and not pd.isna(stopping_time):
                unique_df = filtered_df[(filtered_df['Unique_ID'] == unique_id) & (filtered_df['POSITION_T'] >= slowdown_time) & (filtered_df['POSITION_T'] <= stopping_time)].sort_values(by='POSITION_T')
                plt.plot(unique_df['POSITION_X'], unique_df['POSITION_Y'], marker='o', linestyle='-', markersize=1)

                # Optionally, mark the slowdown and stopping points
                plt.scatter(point['Slowing_Point_X'].values[0], point['Slowing_Point_Y'].values[0], color='red', s=50)
                plt.scatter(point['Stopping_Point_X'].values[0], point['Stopping_Point_Y'].values[0], color='blue', s=50)

    # Set the plot axes limits
    plt.xlim(0, 650)
    plt.ylim(0, 650)

    plt.xlabel('POSITION_X')
    plt.ylabel('POSITION_Y')
    plt.title(f'Coordinates for {filename}')
    plt.savefig(f"{Results_Folder}/Tracks_slowdown/Tracks_{filename}.pdf")
    plt.close()  # Close the figure to free memory

# Process each file
for filename in filenames:
    plot_coordinates(filename)



##  **2.5. Deprecated functions**


In [None]:
# @title ##Plots behaviour percentage


import matplotlib.pyplot as plt

def plot_behavior_percentage(dataframe):
    # Group by Condition and Behaviour, then calculate the count
    condition_behaviour_count = dataframe.groupby(['Condition', 'Behaviour']).size().unstack(fill_value=0)
    # Calculate the percentage of each Behaviour within each Condition
    condition_behaviour_percentage = condition_behaviour_count.div(condition_behaviour_count.sum(axis=1), axis=0) * 100

    # Plotting
    ax = condition_behaviour_percentage.plot(kind='bar', stacked=True, figsize=(10, 6))
    plt.title('Percentage of Each Behaviour by Condition')
    plt.xlabel('Condition')
    plt.ylabel('Percentage (%)')

    # Move legend to the side
    ax.legend(title='Behaviour', bbox_to_anchor=(1.05, 1), loc='upper left')

    plt.tight_layout()  # Adjust layout to fit everything including the legend
    plt.show()

# Example usage
plot_behavior_percentage(merged_tracks_df)

def plot_behavior_counts(dataframe):
    # Group by Condition and Behaviour, then calculate the count
    condition_behaviour_count = dataframe.groupby(['Condition', 'Behaviour']).size().unstack(fill_value=0)

    # Plotting
    ax = condition_behaviour_count.plot(kind='bar', figsize=(10, 6), width=0.8)
    plt.title('Count of Each Behaviour by Condition')
    plt.xlabel('Condition')
    plt.ylabel('Count')

    # Move legend to the side
    ax.legend(title='Behaviour', bbox_to_anchor=(1.05, 1), loc='upper left')

    plt.tight_layout()  # Adjust layout to fit everything including the legend
    plt.show()

# Example usage
plot_behavior_counts(merged_tracks_df)


In [None]:
# @title ##Plot numbers

import matplotlib.pyplot as plt

def plot_flow_arrested_counts(dataframe):
    try:
        # Keep only the rows where Behaviour is 'Flow_arrested'
        flow_arrested_df = dataframe[dataframe['Behaviour'] == 'Flow_arrested']

        # Checking if the filtered dataframe is empty
        if flow_arrested_df.empty:
            print("Filtered DataFrame is empty. No 'Flow Arrested' tracks found.")
            return

        # Group by Condition and count the occurrences of 'Flow Arrested'
        flow_arrested_count = flow_arrested_df.groupby('Condition').size()

        # Checking if the count series is empty
        if flow_arrested_count.empty:
            print("No counts for 'Flow Arrested' tracks after grouping.")
            return

        # Plotting
        ax = flow_arrested_count.plot(kind='bar', figsize=(10, 6))
        plt.title('Count of Flow Arrested Tracks by Condition')
        plt.xlabel('Condition')
        plt.ylabel('Count of Flow Arrested Tracks')

        for p in ax.patches:
            ax.annotate(str(p.get_height()), (p.get_x() * 1.005, p.get_height() * 1.005))

        plt.tight_layout()
        plt.show()

    except Exception as e:
        print(f"An error occurred: {e}")

# Example usage
plot_flow_arrested_counts(merged_tracks_df)


In [None]:
# @title ##Plot examples

import ipywidgets as widgets
from IPython.display import display, clear_output
import matplotlib.pyplot as plt

def plot_behavior_tracks(tracks_df, spots_df, behavior):
    # Filter for tracks with the selected behavior and Flow_speed = 100
    track_ids = tracks_df[(tracks_df['Behaviour'] == behavior) & (tracks_df['Flow_speed'] == 100)]['Unique_ID'].unique()

    for track_id in track_ids[:25]:
        track = spots_df[spots_df['Unique_ID'] == track_id]
        if track.empty:
            print(f"No data found for track ID {track_id}")
            continue
        track = track.sort_values(by='POSITION_T')

        fig, ax1 = plt.subplots(figsize=(12, 6))

        # Plot RollingAvgSpeed with blue color on the primary y-axis
        ax1.plot(track['POSITION_T'], track['RollingAvgSpeed'], label=f'Rolling Average Speed', linestyle='-', marker=None, color='blue')
        ax1.set_xlabel('Time')
        ax1.set_ylabel('Rolling Average Speed', color='blue')
        ax1.tick_params(axis='y', labelcolor='blue')
        ax1.set_title(f'Rolling Average Speed and Speed Over Time for Track {track_id} ({behavior}, Flow_speed=100)')

        # Create the secondary y-axis and plot the instantaneous Speed with red color
        ax2 = ax1.twinx()
        ax2.plot(track['POSITION_T'], track['Speed'], label=f'Speed', linestyle='-', marker=None, color='red')
        ax2.set_ylabel('Speed', color='red')
        ax2.tick_params(axis='y', labelcolor='red')

        fig.tight_layout()
        plt.show()

def on_change(change):
    if change['name'] == 'value' and (change['new'] != change['old']):
        clear_output(wait=True)
        display(dropdown)
        plot_behavior_tracks(merged_tracks_df, merged_spots_df, change['new'])

# Create a dropdown with unique behaviors
behaviours = merged_tracks_df['Behaviour'].unique()
dropdown = widgets.Dropdown(
    options=behaviours,
    value=behaviours[0],
    description='Behaviour:',
    disabled=False,
)

# Bind the dropdown event to the plotting function
dropdown.observe(on_change)

# Display the dropdown widget
display(dropdown)


In [None]:
# @title ##Plot individual tracks


import ipywidgets as widgets
from IPython.display import display, clear_output
import matplotlib.pyplot as plt

def plot_track_data(tracks_df, spots_df, file_name, track_id):
    unique_id = file_name + "_" + track_id

    # Stop early if the track is missing from either table
    track_row = tracks_df[tracks_df['Unique_ID'] == unique_id]
    track = spots_df[spots_df['Unique_ID'] == unique_id]
    if track_row.empty or track.empty:
        print(f"No data found for track ID {track_id}")
        return

    # Fetch the behavior for the given TRACK_ID
    behavior = track_row['Behaviour'].iloc[0]
    print(f"Behaviour for TRACK_ID {track_id}: {behavior}")

    track = track.sort_values(by='POSITION_T')

    fig, ax1 = plt.subplots(figsize=(12, 6))

    # Plot RollingAvgSpeed with blue color on the primary y-axis
    ax1.plot(track['POSITION_T'], track['RollingAvgSpeed'], label=f'Rolling Average Speed', linestyle='-', marker=None, color='blue')
    ax1.set_xlabel('Time')
    ax1.set_ylabel('Rolling Average Speed', color='blue')
    ax1.tick_params(axis='y', labelcolor='blue')
    ax1.set_title(f'Rolling Average Speed and Rolling Distance Over Time for Track {track_id} ({behavior})')

    # Create the secondary y-axis and plot RollingDistance with red color
    ax2 = ax1.twinx()
    ax2.plot(track['POSITION_T'], track['RollingDistance'], label=f'Rolling Distance', linestyle='-', marker=None, color='red')
    ax2.set_ylabel('Rolling Distance', color='red')
    ax2.tick_params(axis='y', labelcolor='red')

    fig.tight_layout()
    plt.show()

def on_button_click(button):
    clear_output(wait=True)
    file_name = file_dropdown.value
    track_id = track_input.value
    plot_track_data(merged_tracks_df, merged_spots_df, file_name, track_id)
    display(file_dropdown, track_input, plot_button)

# Create a dropdown for File_name
file_names = merged_spots_df['File_name'].unique()
file_dropdown = widgets.Dropdown(
    options=file_names,
    value=file_names[0],
    description='File_name:',
    disabled=False,
)

# Create a text input for TRACK_ID
track_input = widgets.Text(
    value='',
    placeholder='Enter TRACK_ID',
    description='TRACK_ID:',
    disabled=False
)

# Create a button to trigger the plotting
plot_button = widgets.Button(description="Plot Data")
plot_button.on_click(on_button_click)

# Display the widgets
display(file_dropdown, track_input, plot_button)


--------------------------------------------------------
# **Part 3. Compute the distance to the nearest junction and nuclei**
--------------------------------------------------------


## **3.1. Load your Nuclei and Junction segmentation maps**
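
The cell in this section measures, for every spot, the distance to the nearest labeled pixel by computing a Euclidean distance transform of each segmentation map and reading it at the spot's pixel position. As a minimal sketch of the idea, the toy example below uses a made-up 5x5 label image and a hypothetical calibration value (`pixel_size`). The array-based lookup and the `np.clip` bounds guard are illustrative additions, not part of the notebook's row-by-row loop.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Toy 5x5 label image with a single labeled pixel in the centre (illustrative only).
labels = np.zeros((5, 5), dtype=np.uint8)
labels[2, 2] = 1

pixel_size = 0.5  # hypothetical calibration (calibrated units per pixel)

# distance_transform_edt gives, for every non-zero element, the distance to the
# nearest zero element. Passing `labels == 0` turns the labeled pixels into zeros,
# so the result is the distance from each background pixel to the nearest label.
distance_map = distance_transform_edt(labels == 0) * pixel_size

# Spot coordinates in calibrated units, converted to integer pixel indices.
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([1.0, 1.0, 0.0])
cols = np.clip((xs / pixel_size).astype(int), 0, labels.shape[1] - 1)
rows = np.clip((ys / pixel_size).astype(int), 0, labels.shape[0] - 1)

# Images are indexed [row, column], i.e. [y, x].
distances = distance_map[rows, cols]
print(distances)
```

Indexing the distance map with whole arrays of rows and columns avoids a Python loop over spots, which can matter when a field of view contains many thousands of spots.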


In [None]:
from tqdm.notebook import tqdm
import pandas as pd
from skimage import io
import matplotlib.pyplot as plt
from tifffile import imread
from skimage.measure import label, regionprops, find_contours
from scipy.ndimage import distance_transform_edt

Video_path = '/gdrive/MyDrive/PDAC_adhesion/Landmark'  # @param {type: "string"}

Pixel_calibration = 0.6496139

# @title #Process the dataset

def compute_distances_using_distance_transform(df, image_dir):
    """
    Compute distances to the nearest labeled pixel for each spot using the distance transform method.
    """
    for file_name in tqdm(df['File_name'].unique(), desc="Processing files"):
        # Paths to the label images
        junctions_img_path = f"{image_dir}/{file_name}_HUVEC_junctions.tif"
        nuclei_img_path = f"{image_dir}/{file_name}_HUVEC_nuclei.tif"

        try:
            junctions_img = io.imread(junctions_img_path)
            nuclei_img = io.imread(nuclei_img_path)
        except FileNotFoundError:
            print(f"Error: Images for {file_name} not found. Skipping...")
            continue

        # Compute distance transform
        distance_transform_junctions = distance_transform_edt(junctions_img == 0) * Pixel_calibration
        distance_transform_nuclei = distance_transform_edt(nuclei_img == 0) * Pixel_calibration

        # Process coordinates and update the distances
        for idx, row in tqdm(df[df['File_name'] == file_name].iterrows(), total=df[df['File_name'] == file_name].shape[0], desc=f"Processing coordinates for {file_name}", leave=False):
            y, x = int(row['POSITION_Y'] / Pixel_calibration), int(row['POSITION_X'] / Pixel_calibration)
            df.loc[idx, 'DistanceToJunctions'] = distance_transform_junctions[y, x]
            df.loc[idx, 'DistanceToNuclei'] = distance_transform_nuclei[y, x]

    return df

compute_distances_using_distance_transform(Filtered_merged_spots_df, Video_path)



## **3.2. Visual validation**


In [None]:
# @title #Visual validation
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import Button, interactive, IntSlider, widgets
from ipywidgets import Output
from IPython.display import clear_output, display
from tifffile import imread


Pixel_calibration = 0.6496139
error_output = Output()

def display_error_message(message):
    print(f"Error: {message}")

filename_dropdown = widgets.Dropdown(
    options=Filtered_merged_spots_df['File_name'].unique(),
    description='Filename:',
    disabled=False
)

# Function to visualize distances for a given filename using the distances pre-computed in Filtered_merged_spots_df
def visualize_precomputed_distances_for_filename(filename):
    # Construct paths for the label images
    junctions_img_path = f"{Video_path}/{filename}_HUVEC_junctions.tif"
    nuclei_img_path = f"{Video_path}/{filename}_HUVEC_nuclei.tif"

    try:
        junctions_img = imread(junctions_img_path)
        nuclei_img = imread(nuclei_img_path)

        # Convert images to binary
        junctions_img[junctions_img > 0] = 255
        nuclei_img[nuclei_img > 0] = 255

    except FileNotFoundError:
        display_error_message(f"Images for {filename} not found.")
        return

    # Combine images into an RGB image
    combined_img = np.zeros((nuclei_img.shape[0], nuclei_img.shape[1], 3), dtype=np.uint8)
    combined_img[:, :, 0] = junctions_img  # Red channel
    combined_img[:, :, 1] = nuclei_img  # Green channel
    combined_img[:, :, 2] = junctions_img  # Blue channel

    # Fetch the coordinates and precomputed distances for the selected filename
    data_for_frame = Filtered_merged_spots_df[Filtered_merged_spots_df['File_name'] == filename]

    # Define a function to update the display based on the frame slider
    def update_display(frame_number):
        plt.figure(figsize=(10, 10))

        # Use combined RGB image for visualization
        frame = combined_img.copy()
        coords_for_frame = data_for_frame[data_for_frame['POSITION_T'] == frame_number]
        for idx, row in coords_for_frame.iterrows():
            x, y = int(row['POSITION_X']/Pixel_calibration), int(row['POSITION_Y']/Pixel_calibration)

            distance_to_junction = row['DistanceToJunctions']/Pixel_calibration
            distance_to_nuclei = row['DistanceToNuclei']/Pixel_calibration

            circle_junction = plt.Circle((x, y), distance_to_junction, color='red', fill=False)
            circle_nuclei = plt.Circle((x, y), distance_to_nuclei, color='blue', fill=False)

            plt.gca().add_patch(circle_junction)
            plt.gca().add_patch(circle_nuclei)
            plt.scatter(x, y, c='yellow')  # Coordinate point

        plt.imshow(frame)
        plt.title(f"Frame {frame_number} for {filename}")
        plt.show()

    # Create a slider for frame navigation
    max_position_t = data_for_frame['POSITION_T'].max()
    frame_slider = widgets.IntSlider(min=0, max=max_position_t, description='Frame')

    # Display the visualization with interactive for more reactive updates
    w = interactive(update_display, frame_number=frame_slider)
    display(w)


# Button to trigger visualization
plot_button_filename = Button(description="Visualize Distances")

# Function to handle button click for filename visualization
def on_plot_button_filename_click(b):
    filename = filename_dropdown.value
    # Clear the previous output
    clear_output()

    display(filename_dropdown)
    display(plot_button_filename)
    display(error_output)
    visualize_precomputed_distances_for_filename(filename)

# Bind the function to the button click event
plot_button_filename.on_click(on_plot_button_filename_click)

# Display the widgets for filename visualization
display(filename_dropdown)
display(plot_button_filename)
display(error_output)


## **3.3. Extract distance and speed at landing, (first) arrest and end**
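
The cell in this section looks up, for each track, the per-spot values (distances and speed) recorded at the slowdown, stopping, and end times. As a hedged illustration of the same lookup, the sketch below uses a left merge on the track identifier and the event time instead of a per-track loop. The toy values are made up, only the speed at the stopping time is shown, and the approach assumes at most one spot per track and time point.

```python
import numpy as np
import pandas as pd

# Toy per-spot table (illustrative values).
spots = pd.DataFrame({
    "Unique_ID": ["a", "a", "a", "b", "b"],
    "POSITION_T": [0.0, 1.0, 2.0, 0.0, 1.0],
    "Speed": [9.0, 4.0, 0.5, 8.0, 7.0],
})

# Toy per-track table: track "b" has no recorded stopping time.
tracks = pd.DataFrame({
    "Unique_ID": ["a", "b"],
    "Stopping_Point_T": [2.0, np.nan],
})

# Rename the spot columns so the time column lines up with the event column,
# then left-merge: tracks without a matching spot keep NaN in Speed_Stopping.
spots_at_time = spots.rename(
    columns={"POSITION_T": "Stopping_Point_T", "Speed": "Speed_Stopping"}
)
at_stop = tracks.merge(spots_at_time, on=["Unique_ID", "Stopping_Point_T"], how="left")
print(at_stop)
```

The left merge keeps one row per track, which matches the NaN fallback returned by `get_spot_values_at_time` when no spot exists at the requested time.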


In [None]:
# @title #Extract distance and speed at landing, arrest and end


from tqdm.notebook import tqdm
import numpy as np
import pandas as pd

def get_distances_and_speeds(track_df, spots_df):
    results = []

    for _, track in tqdm(track_df.iterrows(), total=track_df.shape[0], desc="Processing Tracks"):
        unique_id = track['Unique_ID']
        slowing_down_time = track['Slowing_Point_T']
        stopping_time = track['Stopping_Point_T']

        # Filter spots for this track
        track_spots = spots_df[spots_df['Unique_ID'] == unique_id]

        def get_spot_values_at_time(spot_df, time):
            spot = spot_df[spot_df['POSITION_T'] == time]
            if not spot.empty:
                return {
                    'distance_to_nuclei': spot['DistanceToNuclei'].iloc[0],
                    'distance_to_junctions': spot['DistanceToJunctions'].iloc[0],
                    'speed': spot['Speed'].iloc[0]
                }
            else:
                return {
                    'distance_to_nuclei': np.nan,
                    'distance_to_junctions': np.nan,
                    'speed': np.nan
                }

        # Get distances and speed at the slowing down time
        slowing_values = get_spot_values_at_time(track_spots, slowing_down_time)

        # Get distances and speed at the stopping time
        stopping_values = get_spot_values_at_time(track_spots, stopping_time)

        # Get distances and speed at the end of the track
        end_time = track_spots['POSITION_T'].max()
        end_values = get_spot_values_at_time(track_spots, end_time)

        # Append results
        results.append({
            'Unique_ID': unique_id,
            'DistanceToNuclei_Slowing': slowing_values['distance_to_nuclei'],
            'DistanceToJunctions_Slowing': slowing_values['distance_to_junctions'],
            'Speed_Slowing': slowing_values['speed'],
            'DistanceToNuclei_Stopping': stopping_values['distance_to_nuclei'],
            'DistanceToJunctions_Stopping': stopping_values['distance_to_junctions'],
            'Speed_Stopping': stopping_values['speed'],
            'DistanceToNuclei_End': end_values['distance_to_nuclei'],
            'DistanceToJunctions_End': end_values['distance_to_junctions'],
            'Speed_End': end_values['speed']
        })

    return pd.DataFrame(results)

# Usage example
distances_and_speeds_df = get_distances_and_speeds(Filtered_merged_tracks_df, Filtered_merged_spots_df)

# Merging process
overlapping_columns = Filtered_merged_tracks_df.columns.intersection(distances_and_speeds_df.columns).drop('Unique_ID')
Filtered_merged_tracks_df.drop(columns=overlapping_columns, inplace=True)
Filtered_merged_tracks_df = pd.merge(Filtered_merged_tracks_df, distances_and_speeds_df, on='Unique_ID', how='left')

# Save the updated DataFrame
save_dataframe_with_progress(Filtered_merged_tracks_df, Results_Folder + '/' + 'Filtered_Merged_Tracks.csv')



## **3.4. Plot your results**


### **3.4.1 Combine the flow speeds**
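
The cell in this section pools several flow-speed labels into a single `Combined` category. The sketch below shows an equivalent vectorized form on a made-up Series. The labels are compared as strings, so a numeric value such as the integer `100` is not matched. Whether `Flow_speed` holds strings or numbers in your data is an assumption to check before relying on the result.

```python
import pandas as pd

# Toy flow-speed values, including one integer to show the type sensitivity.
flow_speed = pd.Series(["100", "300", "wash", "static", 100])

# Same string labels as those pooled in this section.
to_combine = ["100", "200", "300", "wash"]

# Keep values that are NOT in the list; replace the others with "Combined".
category = flow_speed.where(~flow_speed.isin(to_combine), "Combined")
print(category.tolist())
```

If the column turns out to be numeric, converting it first with `.astype(str)` is one way to make the membership test behave as intended.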


In [None]:
# @title #Combine the flow speeds (optional)


def categorize_flow_speed(speed):
    if speed in ['100','200', '300', 'wash']:
        return 'Combined'
    else:
        return speed

# Apply the function to create the Flow_speed_category column
Filtered_merged_tracks_df['Flow_speed_category'] = Filtered_merged_tracks_df['Flow_speed'].apply(categorize_flow_speed)

# Print unique Flow_speed_category values
unique_flow_speed_categories = Filtered_merged_tracks_df['Flow_speed_category'].unique()
print(unique_flow_speed_categories)

Filtered_merged_tracks_df['Condition_category'] = Filtered_merged_tracks_df['Cells'] + '_' + Filtered_merged_tracks_df['Flow_speed_category'].astype(str) + '_' + Filtered_merged_tracks_df['ILbeta']


['Combined']


### **3.4.2 Plot the various distances per condition**


In [None]:
# @title #Plot the various distances per condition


import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import os

# Variables to compare (replace these with your actual column names)
variables_to_compare = ['DistanceToNuclei_Slowing', 'DistanceToJunctions_Slowing', 'DistanceToNuclei_Stopping', 'DistanceToJunctions_Stopping', 'DistanceToNuclei_End', 'DistanceToJunctions_End' ]

# Unique condition categories
unique_conditions = Filtered_merged_tracks_df['Condition_category'].unique()

# Directory to save plots and data
save_dir = Results_Folder+"/Distance_nucleus_junctions"
if not os.path.exists(save_dir):
    os.makedirs(save_dir)  # Create Results_Folder if it doesn't exist

# Loop through each unique condition
for condition in unique_conditions:
    # Filter DataFrame for the current condition
    df_filtered = Filtered_merged_tracks_df[Filtered_merged_tracks_df['Condition_category'] == condition]

    # Prepare data for plotting
    plot_data = pd.melt(df_filtered, id_vars=['Condition_category'], value_vars=variables_to_compare, var_name='Variable', value_name='Value')

    # Create a figure for the plot
    plt.figure(figsize=(10, 6))

    # Create a plot comparing the variables
    sns.boxplot(x='Variable', y='Value', data=plot_data)
    plt.title(f'Comparison of Variables for {condition}')
    plt.xlabel('Variable')
    plt.ylabel('Value')

    # Save the figure as a PDF
    plot_filename = f"{save_dir}/{condition}_comparison_plot.pdf"
    plt.savefig(plot_filename)
    plt.show()
    plt.close()

    # Save the data used for the plot as a CSV
    data_filename = f"{save_dir}/{condition}_data.csv"
    plot_data.to_csv(data_filename, index=False)


-------------------------------------------

# **Part 4. Plot track parameters**
-------------------------------------------

## **Statistical analyses**
### Cohen's d (Effect Size):
<font size = 4>Cohen's d measures the size of the difference between two group means, normalized by their pooled standard deviation. By Cohen's conventional benchmarks, an absolute value of about 0.2 indicates a small effect, about 0.5 a medium effect, and 0.8 or more a large effect. It quantifies how large the observed difference is, independently of whether it is statistically significant.

### Randomization Test:
<font size = 4>This non-parametric test evaluates whether the observed difference between conditions could have arisen by chance. It shuffles the condition labels many times and recalculates Cohen's d after each shuffle. The resulting p-value is the proportion of shuffles that yield an effect at least as large as the one observed; a smaller p-value implies stronger evidence against the null hypothesis of no difference between conditions.

### Bonferroni Correction:
<font size = 4>Given multiple comparisons, the Bonferroni Correction adjusts significance thresholds to mitigate the risk of false positives. By dividing the standard significance level (alpha) by the number of tests, it ensures that only robust findings are considered significant. However, it's worth noting that this method can be conservative, sometimes overlooking genuine effects.
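
The three steps described above can be sketched as follows on synthetic data. This is an illustration under stated assumptions, not necessarily the exact implementation used later in the notebook: the function name `randomization_p_value`, the number of shuffles, the two-sided comparison, the add-one correction, and the number of tests (`n_tests`) are all choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def cohen_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def randomization_p_value(a, b, n_iterations=1000, rng=rng):
    """Two-sided p-value: share of label shuffles with |d| >= the observed |d|."""
    observed = abs(cohen_d(a, b))
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_iterations):
        shuffled = rng.permutation(pooled)  # reassign values to the two groups at random
        if abs(cohen_d(shuffled[:len(a)], shuffled[len(a):])) >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (count + 1) / (n_iterations + 1)

# Synthetic groups (illustrative only).
group_a = rng.normal(loc=10.0, scale=2.0, size=40)
group_b = rng.normal(loc=12.0, scale=2.0, size=40)

d = cohen_d(group_a, group_b)
p = randomization_p_value(group_a, group_b)

# Bonferroni: divide the significance level by the number of comparisons.
n_tests = 3  # hypothetical number of pairwise comparisons
alpha = 0.05
bonferroni_alpha = alpha / n_tests
print(f"d = {d:.2f}, p = {p:.4f}, significant: {p < bonferroni_alpha}")
```

With a finite number of shuffles, the smallest p-value this test can report is `1 / (n_iterations + 1)`, so the number of iterations should be large enough for the Bonferroni-adjusted threshold to be reachable.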


In [None]:
# @title #Combine the flow speeds (optional)


def categorize_flow_speed(speed):
    if speed in ['100','200', '300', 'wash']:
        return 'Combined'
    else:
        return speed

# Apply the function to create the Flow_speed_category column
Filtered_merged_tracks_df['Flow_speed_category'] = Filtered_merged_tracks_df['Flow_speed'].apply(categorize_flow_speed)

# Print unique Flow_speed_category values
unique_flow_speed_categories = Filtered_merged_tracks_df['Flow_speed_category'].unique()
print(unique_flow_speed_categories)

Filtered_merged_tracks_df['Condition_category'] = Filtered_merged_tracks_df['Cells'] + '_' + Filtered_merged_tracks_df['Flow_speed_category'].astype(str) + '_' + Filtered_merged_tracks_df['ILbeta']


In [None]:
# @title ##Plot track parameters

# Import necessary libraries
import os
import itertools
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
from matplotlib.backends.backend_pdf import PdfPages
import ipywidgets as widgets
from matplotlib.ticker import FixedLocator



# Check and create necessary directories (parent folders are created automatically)
os.makedirs(f"{Results_Folder}/track_parameters_plots/pdf", exist_ok=True)
os.makedirs(f"{Results_Folder}/track_parameters_plots/csv", exist_ok=True)

# Helper functions
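def cohen_d(group1, group2):
    """Compute Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    mean_diff = group1.mean() - group2.mean()
    # Weight each sample variance (pandas uses ddof=1) by its degrees of freedom
    pooled_var = ((n1 - 1) * group1.var() + (n2 - 1) * group2.var()) / (n1 + n2 - 2)
    d = mean_diff / pooled_var**0.5
    return d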

def get_selectable_columns(df):
    """Get columns that can be plotted."""
    exclude_cols = ['Condition', 'File_name', 'Flow_speed', 'Cells', 'ILbeta', 'Repeat', 'Unique_ID',
                    'experiment_nb', 'LABEL', 'TRACK_INDEX', 'TRACK_ID', 'TRACK_X_LOCATION',
                    'TRACK_Y_LOCATION', 'TRACK_Z_LOCATION']
    return [col for col in df.columns if col not in exclude_cols]

def display_variable_checkboxes(selectable_columns):
    """Display checkboxes for selecting variables."""
    variable_checkboxes = [widgets.Checkbox(value=False, description=col) for col in selectable_columns]
    display(widgets.VBox([
        widgets.Label('Variables to Plot:'),
        widgets.GridBox(variable_checkboxes, layout=widgets.Layout(grid_template_columns="repeat(%d, 300px)" % 3))
    ]))
    return variable_checkboxes


def create_filename(base, selected_cells, selected_speeds, selected_ilbetas, selected_behaviours, var):
    """Create a unique filename based on selected options."""
    def summarize_options(options):
        if len(options) > 3:
            return f"{len(options)}options"
        return "_".join(options)

    selected_options = "_".join([
        summarize_options(selected_cells),
        summarize_options(selected_speeds),
        summarize_options(selected_ilbetas),
        summarize_options(selected_behaviours)
    ])

    filename = f"{base}_{selected_options}_{var}.pdf"
    return filename.replace(" ", "_")  # Replace spaces with underscores for file compatibility


# Create checkboxes for various attributes
cells_checkboxes = [widgets.Checkbox(value=False, description=str(cell)) for cell in Filtered_merged_tracks_df['Cells'].unique()]
flow_speed_checkboxes = [widgets.Checkbox(value=False, description=str(speed)) for speed in Filtered_merged_tracks_df['Flow_speed_category'].unique()]
ilbeta_checkboxes = [widgets.Checkbox(value=False, description=str(ilbeta)) for ilbeta in Filtered_merged_tracks_df['ILbeta'].unique()]
# Create checkboxes for Behaviour
behaviour_checkboxes = [widgets.Checkbox(value=False, description=str(behaviour)) for behaviour in Filtered_merged_tracks_df['Behaviour'].unique()]

# Display checkboxes
display(widgets.VBox([
    widgets.Label('Cells:'),
    widgets.GridBox(cells_checkboxes, layout=widgets.Layout(grid_template_columns="repeat(%d, 100px)" % 4)),
    widgets.Label('Flow Speed:'),
    widgets.GridBox(flow_speed_checkboxes, layout=widgets.Layout(grid_template_columns="repeat(%d, 100px)" % 4)),
    widgets.Label('ILbeta:'),
    widgets.GridBox(ilbeta_checkboxes, layout=widgets.Layout(grid_template_columns="repeat(%d, 100px)" % 4)),
    widgets.Label('Behaviour:'),
    widgets.GridBox(behaviour_checkboxes, layout=widgets.Layout(grid_template_columns="repeat(%d, 100px)" % 4))
]))

# Convert Flow_speed_category to string for checkbox matching
Filtered_merged_tracks_df['Flow_speed_category'] = Filtered_merged_tracks_df['Flow_speed_category'].astype(str)

# Define the plotting function
def plot_selected_vars(button, variable_checkboxes):
    print("Plotting in progress...")

    # Fetch selected values
    selected_cells = [box.description for box in cells_checkboxes if box.value]
    selected_speeds = [box.description for box in flow_speed_checkboxes if box.value]
    selected_ilbetas = [box.description for box in ilbeta_checkboxes if box.value]
    variables_to_plot = [box.description for box in variable_checkboxes if box.value]
    selected_behaviours = [box.description for box in behaviour_checkboxes if box.value]


    # Filter dataframe
    filtered_df = Filtered_merged_tracks_df.copy()
    filtered_df = filtered_df[filtered_df['Cells'].isin(selected_cells)]
    filtered_df = filtered_df[filtered_df['Flow_speed_category'].isin(selected_speeds)]
    filtered_df = filtered_df[filtered_df['ILbeta'].isin(selected_ilbetas)]
    filtered_df = filtered_df[filtered_df['Behaviour'].isin(selected_behaviours)]


    # Initialize matrices for statistics
    effect_size_matrices = {}
    p_value_matrices = {}
    bonferroni_matrices = {}

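    unique_conditions = filtered_df['Condition_category'].unique().tolist()
    num_comparisons = len(unique_conditions) * (len(unique_conditions) - 1) // 2
    if num_comparisons == 0:
        # Pairwise statistics need at least two conditions; avoid dividing by zero below
        print("Select at least two conditions to compare.")
        return
    alpha = 0.05
    corrected_alpha = alpha / num_comparisons
    n_iterations = 1000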

# Loop through each variable to plot
    for var in variables_to_plot:

      filename = create_filename("track_parameters_plots", selected_cells, selected_speeds, selected_ilbetas, selected_behaviours, var)
      pdf_path = os.path.join(Results_Folder, "track_parameters_plots", "pdf", filename)
      csv_path = os.path.join(Results_Folder, "track_parameters_plots", "csv", f"{filename[:-4]}.csv")  # Remove '.pdf' and add '.csv'

      pdf_pages = PdfPages(pdf_path)

      effect_size_matrix = pd.DataFrame(index=unique_conditions, columns=unique_conditions)
      p_value_matrix = pd.DataFrame(index=unique_conditions, columns=unique_conditions)
      bonferroni_matrix = pd.DataFrame(index=unique_conditions, columns=unique_conditions)

      for cond1, cond2 in itertools.combinations(unique_conditions, 2):
        group1 = filtered_df[filtered_df['Condition_category'] == cond1][var]
        group2 = filtered_df[filtered_df['Condition_category'] == cond2][var]

        original_d = abs(cohen_d(group1, group2))
        effect_size_matrix.loc[cond1, cond2] = original_d
        effect_size_matrix.loc[cond2, cond1] = original_d  # Mirroring

        count_extreme = 0
        for i in range(n_iterations):
            combined = pd.concat([group1, group2])
            shuffled = combined.sample(frac=1, replace=False).reset_index(drop=True)
            new_group1 = shuffled[:len(group1)]
            new_group2 = shuffled[len(group1):]

            new_d = cohen_d(new_group1, new_group2)
            if np.abs(new_d) >= np.abs(original_d):
                count_extreme += 1

        p_value = count_extreme / n_iterations
        p_value_matrix.loc[cond1, cond2] = p_value
        p_value_matrix.loc[cond2, cond1] = p_value  # Mirroring

        # Apply Bonferroni correction
        bonferroni_corrected_p_value = min(p_value * num_comparisons, 1.0)
        bonferroni_matrix.loc[cond1, cond2] = bonferroni_corrected_p_value
        bonferroni_matrix.loc[cond2, cond1] = bonferroni_corrected_p_value  # Mirroring

      effect_size_matrices[var] = effect_size_matrix
      p_value_matrices[var] = p_value_matrix
      bonferroni_matrices[var] = bonferroni_matrix

    # Concatenate the three matrices side-by-side
      combined_df = pd.concat(
        [
            effect_size_matrices[var].rename(columns={col: f"{col} (Effect Size)" for col in effect_size_matrices[var].columns}),
            p_value_matrices[var].rename(columns={col: f"{col} (P-Value)" for col in p_value_matrices[var].columns}),
            bonferroni_matrices[var].rename(columns={col: f"{col} (Bonferroni-corrected P-Value)" for col in bonferroni_matrices[var].columns})
        ], axis=1
    )

    # Save the combined DataFrame to a CSV file
      combined_df.to_csv(csv_path)

    # Create a new figure
      fig = plt.figure(figsize=(16, 10))

    # Create a gridspec for 2 rows and 3 columns
      gs = GridSpec(2, 3, height_ratios=[1.5, 1])

    # Create the ax for boxplot using the gridspec
      ax_box = fig.add_subplot(gs[0, :])

    # Extract the data for this variable
      data_for_var = filtered_df[['Condition_category', var, 'Repeat', 'File_name' ]]

    # Save the data_for_var to a CSV for replotting
      data_for_var.to_csv(f"{Results_Folder}/track_parameters_plots/csv/{filename[:-4]}_boxplot_data.csv", index=False)  # Remove '.pdf' before adding the suffix

    # Calculate the Interquartile Range (IQR) using the 25th and 75th percentiles
      Q1 = filtered_df[var].quantile(0.25)
      Q3 = filtered_df[var].quantile(0.75)
      IQR = Q3 - Q1

    # Define bounds for the outliers
      multiplier = 10
      lower_bound = Q1 - multiplier * IQR
      upper_bound = Q3 + multiplier * IQR

    # Plotting
      sns.boxplot(x='Condition_category', y=var, data=filtered_df, ax=ax_box, color='lightgray')  # Boxplot
      sns.stripplot(x='Condition_category', y=var, data=filtered_df, ax=ax_box, hue='Repeat', dodge=True, jitter=True, alpha=0.2)  # Individual data points
      ax_box.set_ylim([max(min(filtered_df[var]), lower_bound), min(max(filtered_df[var]), upper_bound)])
      ax_box.set_title(f"{var}")
      ax_box.set_xlabel('Condition')
      ax_box.set_ylabel(var)
      tick_labels = ax_box.get_xticklabels()
      tick_locations = ax_box.get_xticks()
      ax_box.xaxis.set_major_locator(FixedLocator(tick_locations))
      ax_box.set_xticklabels(tick_labels, rotation=90)
      ax_box.legend(loc='center left', bbox_to_anchor=(1, 0.5), title='Repeat')

    # Statistical Analyses and Heatmaps

    # Effect Size heatmap ax
      ax_d = fig.add_subplot(gs[1, 0])
      sns.heatmap(effect_size_matrices[var].fillna(0), annot=True, cmap="viridis", cbar=True, square=True, ax=ax_d, vmax=1)
      ax_d.set_title(f"Effect Size (Cohen's d) for {var}")

    # p-value heatmap ax
      ax_p = fig.add_subplot(gs[1, 1])
      sns.heatmap(p_value_matrices[var].fillna(1), annot=True, cmap="viridis_r", cbar=True, square=True, ax=ax_p, vmax=0.1)
      ax_p.set_title(f"Randomization Test p-value for {var}")

    # Bonferroni corrected p-value heatmap ax
      ax_bonf = fig.add_subplot(gs[1, 2])
      sns.heatmap(bonferroni_matrices[var].fillna(1), annot=True, cmap="viridis_r", cbar=True, square=True, ax=ax_bonf, vmax=0.1)
      ax_bonf.set_title(f"Bonferroni-corrected p-value for {var}")

      plt.tight_layout()
      pdf_pages.savefig(fig)
# Close the PDF
      pdf_pages.close()

# Display variable checkboxes and button
selectable_columns = get_selectable_columns(Filtered_merged_tracks_df)
variable_checkboxes = display_variable_checkboxes(selectable_columns)
button = widgets.Button(description="Plot Selected Variables", layout=widgets.Layout(width='400px'))
button.on_click(lambda b: plot_selected_vars(b, variable_checkboxes))
display(button)


# **Part 5. Count the number of arrests using peak detection (In progress)**
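
As a rough illustration of how the `height` and `distance` arguments of `scipy.signal.find_peaks` shape the result, the sketch below runs peak detection on a short synthetic speed trace. The values are made up and are not taken from the tracking data.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic speed trace (made-up values) with local maxima at indices 2, 5 and 7
speed = np.array([0.0, 0.2, 1.0, 0.2, 0.0, 0.3, 0.0, 0.8, 0.1, 0.0])

# height: keep only peaks whose value is at least 0.5
peaks_height, _ = find_peaks(speed, height=0.5)

# distance: additionally require at least 10 samples between peaks;
# when two peaks are too close, the smaller one is discarded
peaks_spaced, _ = find_peaks(speed, height=0.5, distance=10)

print(peaks_height, peaks_spaced)
```

The small bump at index 5 is rejected by the height criterion, and the peak at index 7 is dropped once a minimum spacing of 10 samples is enforced, leaving only the tallest peak.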

In [None]:
import pandas as pd
from scipy.signal import find_peaks

def detect_peaks(spots_df, tracks_df, track_id, height=None, threshold=None, distance=None):
    # Extract the track data from spots_df
    track_data = spots_df[spots_df['Unique_ID'] == track_id]

    # Retrieve the first arrest time from tracks_df
    first_arrest_time = tracks_df[tracks_df['Unique_ID'] == track_id]['Stopping_Point_T'].iloc[0]

    # Segment the data after the first arrest
    post_arrest_data = track_data[track_data['POSITION_T'] > first_arrest_time]

    # Apply peak detection with additional parameters
    peaks, _ = find_peaks(post_arrest_data['Speed'], height=height, threshold=threshold, distance=distance)

    return post_arrest_data, peaks


def batch_process_peak_detection(spots_df, tracks_df, height=None, threshold=None, distance=None):
    peak_info = []

    for track_id in tracks_df['Unique_ID'].unique():
        post_arrest_data, peaks = detect_peaks(spots_df, tracks_df, track_id, height, threshold, distance)
        peak_times = post_arrest_data.iloc[peaks]['POSITION_T'].values
        peak_count = len(peaks)

        peak_info.append({'Unique_ID': track_id, 'Number_of_Peaks': peak_count, 'Peak_Times': peak_times})

    return pd.DataFrame(peak_info)

# Example usage parameters
height = 0.5    # Minimum height of peaks
threshold = 0.1 # Required threshold (vertical distance) to neighboring samples
distance = 10   # Required minimal horizontal distance (>= 1) in number of samples between neighboring peaks

peak_results_df = batch_process_peak_detection(Filtered_merged_spots_df, Filtered_merged_tracks_df, height, threshold, distance)


# The tracks with the highest number of peaks are plotted in the next cell,
# where plot_track_with_refined_peaks is defined.

In [None]:
import matplotlib.pyplot as plt

def plot_track_with_refined_peaks(spots_df, peak_results_df, track_id):
    # Speed and time are per-spot measurements, so the spots dataframe is required here
    # Check if the track_id exists in both dataframes
    if track_id in spots_df['Unique_ID'].values and track_id in peak_results_df['Unique_ID'].values:
        # Retrieve track data (sorted by time) and peak information
        track_data = spots_df[spots_df['Unique_ID'] == track_id].sort_values(by='POSITION_T')
        peak_times = peak_results_df[peak_results_df['Unique_ID'] == track_id]['Peak_Times'].iloc[0]

        plt.figure(figsize=(12, 6))
        plt.plot(track_data['POSITION_T'], track_data['Speed'], label=f'Track {track_id}', linestyle='-', marker='o')

        # Highlight the peaks
        for peak_time in peak_times:
            if peak_time in track_data['POSITION_T'].values:
                peak_speed = track_data[track_data['POSITION_T'] == peak_time]['Speed'].iloc[0]
                plt.scatter(peak_time, peak_speed, color='red', zorder=5)
                plt.text(peak_time, peak_speed, ' Peak', color='red', verticalalignment='bottom')

        plt.xlabel('Time')
        plt.ylabel('Speed')
        plt.title(f'Speed Profile with Detected Peaks for Track {track_id}')
        plt.legend()
        plt.show()
    else:
        print(f"Track ID {track_id} not found in the datasets.")


# Example: Plotting the 5 tracks with the highest number of peaks
# 'POSITION_T' and 'Speed' are stored per spot, so the spots dataframe is passed here
top_peak_tracks = peak_results_df.sort_values(by='Number_of_Peaks', ascending=False).head(5)['Unique_ID']
for track_id in top_peak_tracks:
    plot_track_with_refined_peaks(Filtered_merged_spots_df, peak_results_df, track_id)



# **Part 6. DEPRECATED**
----------------------------------------------
DEPRECATED

----------------------------------------------

In [None]:
import pandas as pd
import numpy as np

# @title ##Compute track metrics from slow down to arrest

# Function to identify the slowing point and return its coordinates and time
def identify_and_get_slowing_point_details(track, slowdown_threshold=10):
    track = track.sort_values(by='POSITION_T')
    slowing_point_candidates = track[track['Speed'] < slowdown_threshold]
    slowing_start_index = slowing_point_candidates.index.min() if not slowing_point_candidates.empty else None

    if slowing_start_index is not None:
        slowing_point_details = track.loc[slowing_start_index, ['POSITION_X', 'POSITION_Y', 'POSITION_T']]
        return slowing_start_index, slowing_point_details
    else:
        return None, pd.Series({'POSITION_X': np.nan, 'POSITION_Y': np.nan, 'POSITION_T': np.nan})

def identify_and_get_stopping_point_details(track):
    track = track.sort_values(by='POSITION_T')
    # Identify the minimum speed
    min_speed = track['Speed'].min()
    # Find all points where speed equals the minimum speed
    stopping_point_candidates = track[track['Speed'] == min_speed]
    # Sort these candidates by time and take the first one
    stopping_point = stopping_point_candidates.sort_values(by='POSITION_T').head(1)

    if not stopping_point.empty:
        stopping_start_index = stopping_point.index[0]
        stopping_point_details = stopping_point.iloc[0][['POSITION_X', 'POSITION_Y', 'POSITION_T']]
        return stopping_start_index, stopping_point_details
    else:
        return None, pd.Series({'POSITION_X': np.nan, 'POSITION_Y': np.nan, 'POSITION_T': np.nan})

# Function to compute the total distance traveled from the slowing point
def compute_total_distance_from_slowing_point(track):
    distances = np.sqrt(track['POSITION_X'].diff()**2 + track['POSITION_Y'].diff()**2)
    return distances.sum()

# Function to calculate Directionality
def calculate_directionality(group):
    group = group.sort_values('POSITION_T')
    start_point = group.iloc[0][['POSITION_X', 'POSITION_Y']].to_numpy()
    end_point = group.iloc[-1][['POSITION_X', 'POSITION_Y']].to_numpy()

    euclidean_distance = np.linalg.norm(end_point - start_point)
    deltas = np.linalg.norm(np.diff(group[['POSITION_X', 'POSITION_Y']].values, axis=0), axis=1)
    total_path_length = deltas.sum()

    D = euclidean_distance / total_path_length if total_path_length != 0 else 0
    return pd.Series({'Directionality': D})

# Function to calculate FMI
def calculate_fmi(group):
    group = group.sort_values('POSITION_T')
    total_forward_displacement = group['POSITION_X'].diff().fillna(0).sum()
    total_path_length = np.linalg.norm(np.diff(group[['POSITION_X', 'POSITION_Y']].values, axis=0), axis=1).sum()

    FMI = total_forward_displacement / total_path_length if total_path_length != 0 else 0
    return pd.Series({'FMI': FMI})

# Function to compute track parameters, total distance, and record the slowing point details
def compute_parameters_from_slowing_point(track, slowdown_threshold=10):
    slowing_point_index, slowing_point_details = identify_and_get_slowing_point_details(track, slowdown_threshold)
    stopping_point_index, stopping_point_details = identify_and_get_stopping_point_details(track)


    # Check if both slowing and stopping points are identified
    if slowing_point_index is None or stopping_point_index is None:
        return pd.Series({'Slowdown_Max_Speed': np.nan, 'Slowdown_Min_Speed': np.nan, 'Slowdown_Average_Speed': np.nan,
                          'Slowdown_Directionality': np.nan, 'Slowdown_FMI': np.nan, 'Slowdown_Total_Distance': np.nan,
                          'Slowdown_Euclidean_Distance': np.nan,
                          'Slowing_Point_X': np.nan, 'Slowing_Point_Y': np.nan, 'Slowing_Point_T': np.nan,
                          'Stopping_Point_X': np.nan, 'Stopping_Point_Y': np.nan, 'Stopping_Point_T': np.nan})

    # Subset the track from the slowing point to the stopping point
    subset_track = track.loc[slowing_point_index:stopping_point_index]

    # Compute speed statistics and other parameters
    max_speed = subset_track['Speed'].max()
    min_speed = subset_track['Speed'].min()
    avg_speed = subset_track['Speed'].mean()
    directionality = calculate_directionality(subset_track)['Directionality']
    fmi = calculate_fmi(subset_track)['FMI']
    total_distance = compute_total_distance_from_slowing_point(subset_track)

    # Compute Euclidean distance from the slowing point to the stopping point
    start_point = subset_track.iloc[0][['POSITION_X', 'POSITION_Y']].to_numpy()
    end_point = subset_track.iloc[-1][['POSITION_X', 'POSITION_Y']].to_numpy()
    euclidean_distance = np.linalg.norm(end_point - start_point)

    # Add the stopping point details to the return series
    return pd.Series({'Slowdown_Max_Speed': max_speed, 'Slowdown_Min_Speed': min_speed, 'Slowdown_Average_Speed': avg_speed,
                      'Slowdown_Directionality': directionality, 'Slowdown_FMI': fmi, 'Slowdown_Total_Distance': total_distance,
                      'Slowdown_Euclidean_Distance': euclidean_distance,
                      'Slowing_Point_X': slowing_point_details['POSITION_X'],
                      'Slowing_Point_Y': slowing_point_details['POSITION_Y'],
                      'Slowing_Point_T': slowing_point_details['POSITION_T'],
                      'Stopping_Point_X': stopping_point_details['POSITION_X'],
                      'Stopping_Point_Y': stopping_point_details['POSITION_Y'],
                      'Stopping_Point_T': stopping_point_details['POSITION_T']})

# Apply the function to the grouped DataFrame
grouped_df = Filtered_merged_spots_df.groupby('Unique_ID')
filtered_arrested_tracks = grouped_df.apply(compute_parameters_from_slowing_point).reset_index()

# Save the new DataFrame
save_dataframe_with_progress(filtered_arrested_tracks, Results_Folder + '/' + 'Filtered_Arrested_Tracks.csv')

# Find overlapping columns and remove them from the original DataFrame
overlapping_columns = Filtered_merged_tracks_df.columns.intersection(filtered_arrested_tracks.columns).drop('Unique_ID')
Filtered_merged_tracks_df.drop(columns=overlapping_columns, inplace=True)

# Merge the new data (filtered_arrested_tracks) into the original DataFrame
Filtered_merged_tracks_df = pd.merge(Filtered_merged_tracks_df, filtered_arrested_tracks, on='Unique_ID', how='left')

# Save the updated DataFrame
save_dataframe_with_progress(Filtered_merged_tracks_df, Results_Folder + '/' + 'Filtered_Merged_Tracks.csv')

check_for_nans(Filtered_merged_tracks_df, "filtered_arrested_tracks")


In [None]:
import os
import matplotlib.pyplot as plt
import pandas as pd

# @title ##Plot examples
if not os.path.exists(Results_Folder+"/Track_speed"):
    os.makedirs(Results_Folder+"/Track_speed")  # Create the Track_speed folder if it doesn't exist

def plot_flow_arrested_tracks(tracks_df, spots_df, num_tracks=15, save_path=None):
    # Default to the Track_speed folder unless the caller provides another location
    if save_path is None:
        save_path = Results_Folder+"/Track_speed"

    arrested_track_ids = tracks_df['Unique_ID'].unique()

    plotted_tracks = 0
    for track_id in arrested_track_ids:
        if plotted_tracks >= num_tracks:
            break

        track = spots_df[spots_df['Unique_ID'] == track_id]
        if track.empty:
            continue
        track = track.sort_values(by='POSITION_T')

        # Get the recorded slowdown and stopping time from the tracks dataframe
        recorded_slowdown_time = tracks_df[tracks_df['Unique_ID'] == track_id]['Slowing_Point_T'].iloc[0]
        recorded_stopping_time = tracks_df[tracks_df['Unique_ID'] == track_id]['Stopping_Point_T'].iloc[0]

        if pd.isna(recorded_slowdown_time) or pd.isna(recorded_stopping_time):
            # Skip plotting if no slowdown or stopping time is recorded
            continue

        # Find the points in the track data corresponding to the slowdown and stopping time
        slowdown_point = track[track['POSITION_T'] == recorded_slowdown_time].iloc[0]
        stopping_point = track[track['POSITION_T'] == recorded_stopping_time].iloc[0]

        # Plotting
        plt.figure(figsize=(12, 6))
        plt.plot(track['POSITION_T'], track['Speed'], label=f'Track {track_id}', linestyle='-', marker=None)

        # Highlight the recorded slowdown point
        plt.scatter(slowdown_point['POSITION_T'], slowdown_point['Speed'], color='red', zorder=5)
        plt.text(slowdown_point['POSITION_T'], slowdown_point['Speed'], ' Slowdown', color='red')

        # Highlight the recorded stopping point
        plt.scatter(stopping_point['POSITION_T'], stopping_point['Speed'], color='blue', zorder=5)
        plt.text(stopping_point['POSITION_T'], stopping_point['Speed'], ' Stopping', color='blue')

        plt.xlabel('Time')
        plt.ylabel('Instantaneous Speed')
        plt.title(f'Instantaneous Speed Over Time for Track {track_id} (Flow Arrested)')
        plt.legend()

        # Save the plot as a PDF file
        plt.savefig(f'{save_path}/Track_{track_id}.pdf')
        plt.show()
        plt.close()  # Close the plot to free up memory

        plotted_tracks += 1

# Example usage
plot_flow_arrested_tracks(Filtered_merged_tracks_df, Filtered_merged_spots_df)


In [None]:
# @title ##Compute the distance between slowing down and arrest (old)


def compute_distance(track):
    diffs = np.sqrt(track['POSITION_X'].diff()**2 + track['POSITION_Y'].diff()**2)
    return diffs.sum()

def distance_when_slowing(track, slowdown_threshold=5):
    track = track.sort_values(by='POSITION_T')

    # Find the first point where speed decreases significantly
    slowing_point_candidates = track[track['Speed'] < slowdown_threshold]
    slowing_start = slowing_point_candidates.index.min() if not slowing_point_candidates.empty else None

    # Find the point where the object stops
    stopping_point = track['Speed'].idxmin() if not track['Speed'].empty else None

    if slowing_start is not None and stopping_point is not None and slowing_start in track.index and stopping_point in track.index:
        segment = track.loc[slowing_start:stopping_point]
        slowing_start_time = track.at[slowing_start, 'POSITION_T']
        return slowing_start_time, compute_distance(segment)
    else:
        return np.nan, np.nan  # Return NaN if either point is not found or invalid

# Calculate distances and times
distances_times = Filtered_merged_spots_df.groupby('Unique_ID').apply(lambda x: distance_when_slowing(x))

# Prepare DataFrame for merge
distances_times_df = distances_times.apply(pd.Series).reset_index()
distances_times_df.columns = ['Unique_ID', 'Slowing_Down_Time', 'Travelled_Distance']

# Merging Process
overlapping_columns = Filtered_merged_tracks_df.columns.intersection(distances_times_df.columns).drop('Unique_ID')
Filtered_merged_tracks_df.drop(columns=overlapping_columns, inplace=True)
Filtered_merged_tracks_df = pd.merge(Filtered_merged_tracks_df, distances_times_df, on='Unique_ID', how='left')

# Save the updated DataFrame
save_dataframe_with_progress(Filtered_merged_tracks_df, Results_Folder + '/' + 'Filtered_Merged_Tracks.csv')



In [None]:
# @title #Extract distance and speed at landing, arrest and end

from tqdm.notebook import tqdm

def get_distances_and_speeds(track_df, spots_df):
    results = []

    # Wrapping the main loop with tqdm for progress tracking
    for _, track in tqdm(track_df.iterrows(), total=track_df.shape[0], desc="Processing Tracks"):
        unique_id = track['Unique_ID']
        slowing_down_time = track['Slowing_Point_T']
        stopping_time = track['Stopping_Point_T']

        # Filter spots for this track
        track_spots = spots_df[spots_df['Unique_ID'] == unique_id]

        # Function to get values
        def get_spot_values_at_time(spot_df, time):
            spot = spot_df[spot_df['POSITION_T'] == time]
            if not spot.empty:
                return {
                    'distance_to_nuclei': spot['DistanceToNuclei'].iloc[0],
                    'distance_to_junctions': spot['DistanceToJunctions'].iloc[0],
                    'speed': spot['Speed'].iloc[0]
                }
            else:
                return {
                    'distance_to_nuclei': np.nan,
                    'distance_to_junctions': np.nan,
                    'speed': np.nan
                }

        # Get distances and speed at the slowing down time
        slowing_values = get_spot_values_at_time(track_spots, slowing_down_time)

        # Get distances and speed at the stopping time
        stopping_values = get_spot_values_at_time(track_spots, stopping_time)

        # Append results
        results.append({
            'Unique_ID': unique_id,
            'DistanceToNuclei_Slowing': slowing_values['distance_to_nuclei'],
            'DistanceToJunctions_Slowing': slowing_values['distance_to_junctions'],
            'Speed_Slowing': slowing_values['speed'],
            'DistanceToNuclei_Stopping': stopping_values['distance_to_nuclei'],
            'DistanceToJunctions_Stopping': stopping_values['distance_to_junctions'],
            'Speed_Stopping': stopping_values['speed']
        })

    return pd.DataFrame(results)

# Usage
distances_and_speeds_df = get_distances_and_speeds(Filtered_merged_tracks_df, Filtered_merged_spots_df)

# Merging Process
overlapping_columns = Filtered_merged_tracks_df.columns.intersection(distances_and_speeds_df.columns).drop('Unique_ID')
Filtered_merged_tracks_df.drop(columns=overlapping_columns, inplace=True)
Filtered_merged_tracks_df = pd.merge(Filtered_merged_tracks_df, distances_and_speeds_df, on='Unique_ID', how='left')

# Save the updated DataFrame
save_dataframe_with_progress(Filtered_merged_tracks_df, Results_Folder + '/' + 'Filtered_Merged_Tracks.csv')

