# **PDAC CellTracksColab - Track Clustering**
---

<font size = 4>This notebook is part of the **CellTracksColab** suite, specifically adapted to analyze the tracking data presented in the manuscript titled "Quantitative analysis of pancreatic cancer cell attachment to endothelial cells." The CellTracksColab project aims to provide comprehensive tools for cell tracking data analysis, facilitating the exploration and quantification of cellular behaviors.


<font size = 4>Access the CellTracksColab project resources through the GitHub repository: [CellMigrationLab/CellTracksColab](https://github.com/CellMigrationLab/CellTracksColab).

<font size = 4>This notebook focuses on the assessment of spatial clustering of arrested circulating cells using a modified version of Ripley's L Function.
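<font size = 4>For reference, the estimator computed in section 2.4 for each field of view (FOV) can be written as follows, where $n$ is the number of slowdown points in the FOV, $d_{ij}$ is the Euclidean distance between points $i$ and $j$, $\mathbf{1}(\cdot)$ is the indicator function, and $A$ is the area of the bounding rectangle of all spot positions in the dataset:

$$\hat{K}(r) = \frac{A}{n^2} \sum_{i=1}^{n} \sum_{j \neq i} \mathbf{1}\left(d_{ij} < r\right), \qquad \hat{L}(r) = \sqrt{\frac{\hat{K}(r)}{\pi}} - r$$

<font size = 4>Under complete spatial randomness $K(r) = \pi r^2$, so $\hat{L}(r)$ fluctuates around 0: values above the Monte Carlo envelope of section 2.5 suggest clustering at distance $r$, and values below it suggest dispersion. No edge correction is applied, so points near the border have fewer counted neighbours, which biases $\hat{L}(r)$ downward as $r$ grows; because the simulations use the same estimator and rectangle, the envelope shares this bias.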

- **Spatial Clustering Analysis:** For each field of view (FOV), locate where arrested cells first slow down, compute Ripley's L function for these positions, compare it with Monte Carlo simulations of randomly placed points, and compare the results across cells, treatments, and conditions.

- **Further Reading:** For a comprehensive guide on performing spatial clustering analysis with CellTracksColab, visit the project's [Spatial Clustering analyses wiki page](https://github.com/CellMigrationLab/CellTracksColab/wiki/Spatial-Clustering-analyses).


**Notebook Creation:** This notebook was created by [Guillaume Jacquemet](https://cellmig.org/)


In [None]:
# @title #MIT License

print("""
**MIT License**

Copyright (c) 2023 Guillaume Jacquemet

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.""")

--------------------------------------------------------
# **Part 1: Prepare the session and load your data**
--------------------------------------------------------


## **1.1. Install key dependencies**
---
<font size = 4>

In [None]:
#@markdown ##Play to install
!pip -q install pandas scikit-learn
!pip -q install hdbscan
!pip -q install umap-learn
!pip -q install plotly
!pip -q install tqdm

!git clone https://github.com/CellMigrationLab/CellTracksColab.git


import gzip
import itertools
import os
import sys
import warnings

import ipywidgets as widgets
import matplotlib.cm as cm
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
import scipy.stats as stats
import seaborn as sns

from IPython.display import display, clear_output
from ipywidgets import Dropdown, interact, Layout, VBox, Button, Accordion, SelectMultiple, IntText
from matplotlib.backends.backend_pdf import PdfPages
from matplotlib.colors import LogNorm
from matplotlib.gridspec import GridSpec
from matplotlib.ticker import FixedLocator, FuncFormatter
from multiprocessing import Pool
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial import ConvexHull
from scipy.spatial.distance import cosine, pdist
from scipy.stats import zscore, ks_2samp
from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import MinMaxScaler
from tqdm.notebook import tqdm
sys.path.append("../")
sys.path.append("CellTracksColab/")

import celltracks
from celltracks import *
from celltracks.Track_Plots import *
from celltracks.BoxPlots_Statistics import *
from celltracks.Track_Metrics import *


def save_dataframe_with_progress(df, path, desc="Saving", chunk_size=500000):
    """Save a DataFrame as a gzip-compressed CSV, showing a progress bar."""

    # Create a tqdm instance for progress tracking
    with tqdm(total=len(df), unit="rows", desc=desc) as pbar:
        # Open the file for writing with gzip compression (text mode)
        with gzip.open(path, "wt") as f:
            # Write the header once at the beginning
            df.head(0).to_csv(f, index=False)

            # Write the rows in slices of at most chunk_size rows
            for start in range(0, len(df), chunk_size):
                chunk = df.iloc[start:start + chunk_size]
                chunk.to_csv(f, header=False, index=False)
                pbar.update(len(chunk))

## **1.2. Mount your Google Drive**
---
<font size = 4> To use this notebook on the data present in your Google Drive, you need to mount your Google Drive to this notebook.

<font size = 4> Play the cell below to mount your Google Drive and follow the instructions.

<font size = 4> Once this is done, your data are available in the **Files** tab on the top left of the notebook.

In [None]:
#@markdown ##Play the cell to connect your Google Drive to Colab

from google.colab import drive
drive.mount('/content/gdrive/')



## **1.3. Compile your data or load existing dataframes**
---
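<font size = 4>The loading cell below checks the required columns one at a time and compares the `Unique_ID` sets of the spots and tracks tables. A compact sketch of the same two checks, which reports all missing columns at once, could look like this; `missing_columns` and `unique_id_mismatch` are hypothetical helper names, and the `Unique_ID` values in the toy tables are made up.

```python
import pandas as pd

# Required columns, as checked by validate_tracks_df / validate_spots_df in the cell below
REQUIRED_TRACK_COLUMNS = {"TRACK_ID"}
REQUIRED_SPOT_COLUMNS = {"TRACK_ID", "POSITION_X", "POSITION_Y", "POSITION_T"}


def missing_columns(df, required):
    """Return the required column names that are absent from df, sorted alphabetically."""
    return sorted(set(required) - set(df.columns))


def unique_id_mismatch(spots, tracks, key="Unique_ID"):
    """Return (IDs only in the spots table, IDs only in the tracks table)."""
    spot_ids, track_ids = set(spots[key]), set(tracks[key])
    return spot_ids - track_ids, track_ids - spot_ids


# Toy example: one spot column is missing and the two tables disagree on one ID each
tracks = pd.DataFrame({"TRACK_ID": [0, 1], "Unique_ID": ["f1_0", "f1_1"]})
spots = pd.DataFrame({
    "TRACK_ID": [0, 0, 2],
    "POSITION_X": [1.0, 2.0, 3.0],
    "POSITION_Y": [1.0, 1.5, 2.0],
    "Unique_ID": ["f1_0", "f1_0", "f1_2"],
})
print(missing_columns(spots, REQUIRED_SPOT_COLUMNS))  # ['POSITION_T']
print(unique_id_mismatch(spots, tracks))              # ({'f1_2'}, {'f1_1'})
```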



In [None]:
#@markdown ##Provide the path to the dataset:


import os
import re
import glob
import pandas as pd
from tqdm.notebook import tqdm
import numpy as np
import requests
import zipfile

#@markdown ###If you have existing dataframes, provide the paths to your:

Track_table = ''  # @param {type: "string"}
Spot_table = ''  # @param {type: "string"}

#@markdown ###Provide the path to your Results folder

Results_Folder = ""  # @param {type: "string"}

if not Results_Folder:
    Results_Folder = '/content/Results'  # Default Results_Folder path if not defined

if not os.path.exists(Results_Folder):
    os.makedirs(Results_Folder)  # Create Results_Folder if it doesn't exist

# Print the location of the result folder
print(f"Result folder is located at: {Results_Folder}")

def validate_tracks_df(df):
    """Validate the tracks dataframe for necessary columns and data types."""
    required_columns = ['TRACK_ID']
    for col in required_columns:
        if col not in df.columns:
            print(f"Error: Column '{col}' missing in tracks dataframe.")
            return False

    # Additional data type checks or value ranges can be added here
    return True

def validate_spots_df(df):
    """Validate the spots dataframe for necessary columns and data types."""
    required_columns = ['TRACK_ID', 'POSITION_X', 'POSITION_Y', 'POSITION_T']
    for col in required_columns:
        if col not in df.columns:
            print(f"Error: Column '{col}' missing in spots dataframe.")
            return False

    # Additional data type checks or value ranges can be added here
    return True

def check_unique_id_match(df1, df2):
    df1_ids = set(df1['Unique_ID'])
    df2_ids = set(df2['Unique_ID'])

    # Check if the IDs in the two dataframes match
    if df1_ids == df2_ids:
        print("The Unique_ID values in both dataframes match perfectly!")
    else:
        missing_in_df1 = df2_ids - df1_ids
        missing_in_df2 = df1_ids - df2_ids

        if missing_in_df1:
            print(f"There are {len(missing_in_df1)} Unique_ID values present in the second dataframe but missing in the first.")
            print("Examples of these IDs are:", list(missing_in_df1)[:5])

        if missing_in_df2:
            print(f"There are {len(missing_in_df2)} Unique_ID values present in the first dataframe but missing in the second.")
            print("Examples of these IDs are:", list(missing_in_df2)[:5])

# For existing dataframes
if Track_table:
    print("Loading track table file....")
    merged_tracks_df = pd.read_csv(Track_table, low_memory=False)
    if not validate_tracks_df(merged_tracks_df):
        print("Error: Validation failed for loaded tracks dataframe.")

if Spot_table:
    print("Loading spot table file....")
    merged_spots_df = pd.read_csv(Spot_table, low_memory=False)
    if not validate_spots_df(merged_spots_df):
        print("Error: Validation failed for loaded spots dataframe.")

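# Check for missing values once both tables are loaded
if Track_table and Spot_table:
    check_for_nans(merged_spots_df, "merged_spots_df")
    check_for_nans(merged_tracks_df, "merged_tracks_df")
else:
    print("Please provide both Track_table and Spot_table to continue.")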


In [None]:
#@markdown ##Check Metadata


# Define the metadata columns that are expected to have identical values for each filename
metadata_columns = ['Cells', 'Flow_speed', 'Treatment', 'Condition', 'experiment_nb', 'Repeat']

# Group the DataFrame by 'File_name' and then check if all entries within each group are identical
consistent_metadata = True
for name, group in merged_tracks_df.groupby('File_name'):
    for col in metadata_columns:
        if not group[col].nunique() == 1:
            consistent_metadata = False
            print(f"Inconsistency found for file: {name} in column: {col}")
            break  # Stop checking the other columns for this file
    if not consistent_metadata:
        break  # Stop the entire process if any inconsistency is found

if consistent_metadata:
    print("All files have consistent metadata across the specified columns.")
else:
    print("There are inconsistencies in the metadata. Please check the output for details.")

# Drop duplicates based on the 'File_name' to get a unique list of filenames and their metadata
unique_files_df = merged_tracks_df.drop_duplicates(subset=['File_name'])[['File_name', 'Cells', 'Flow_speed', 'Treatment', 'Condition', 'experiment_nb', 'Repeat']]

# Reset the index to clean up the DataFrame
unique_files_df.reset_index(drop=True, inplace=True)

# Display the resulting DataFrame (a bare expression is only rendered when it is the last line of a cell)
display(unique_files_df)

# Check that each combination of 'Condition' and 'Repeat' corresponds to a single file:
# group unique_files_df by 'Condition' and 'Repeat' and count the files in each group
grouped = unique_files_df.groupby(['Condition', 'Repeat']).size().reset_index(name='counts')

# Check if any combinations have a count greater than 1, which means they are not unique
non_unique_combinations = grouped[grouped['counts'] > 1]

# Print the non-unique combinations
if not non_unique_combinations.empty:
    print("There are non-unique combinations of Conditions and Repeats:")
    print(non_unique_combinations)
else:
    print("All combinations of Conditions and Repeats are unique.")

check_unique_id_match(merged_spots_df, merged_tracks_df)


# Group the DataFrame by 'Cells', 'Treatment', 'Repeat' and check that each group has 4 unique 'Flow_speed' values
consistent_flow_speeds = True
for (cells, treatment, repeat), group in merged_tracks_df.groupby(['Cells', 'Treatment', 'Repeat']):
    if group['Flow_speed'].nunique() != 4:
        consistent_flow_speeds = False
        print(f"Inconsistency found for Cells: {cells}, Treatment: {treatment}, Repeat: {repeat} - Expected 4 Flow_speeds, found {group['Flow_speed'].nunique()}")
        break  # Stop the entire process if any inconsistency is found

if consistent_flow_speeds:
    print("Each combination of 'Cells', 'Treatment', 'Repeat' has exactly 4 different 'Flow_speed' values.")
else:
    print("There are inconsistencies in 'Flow_speed' values. Please check the output for details.")


unique_cells = unique_files_df['Cells'].unique()
unique_flow_speeds = unique_files_df['Flow_speed'].unique()
unique_Treatment = unique_files_df['Treatment'].unique()
unique_conditions = unique_files_df['Condition'].unique()

print("Unique Cells:", unique_cells)
print("Unique Flow Speeds:", unique_flow_speeds)
print("Unique Treatment:", unique_Treatment)
print("Unique Conditions:", unique_conditions)


## **1.4. Filter tracks shorter than 50 spots**
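<font size = 4>The cell below removes tracks with fewer than 50 spots (using the `NUMBER_SPOTS` column of the tracks table) and then drops the spots that belong to the removed tracks, so that both tables stay in sync. The same two steps can be wrapped in a small helper with the threshold as a parameter; `filter_by_track_length` is a hypothetical name and the toy tables are made up for illustration.

```python
import pandas as pd


def filter_by_track_length(tracks, spots, min_spots=50):
    """Keep tracks with at least `min_spots` spots, then keep only the spots of those tracks."""
    kept_tracks = tracks[tracks["NUMBER_SPOTS"] >= min_spots]
    kept_spots = spots[spots["Unique_ID"].isin(kept_tracks["Unique_ID"])]
    return kept_tracks, kept_spots


# Toy tables: track "b" has 49 spots and is dropped, "c" has exactly 50 and is kept
tracks = pd.DataFrame({"Unique_ID": ["a", "b", "c"], "NUMBER_SPOTS": [120, 49, 50]})
spots = pd.DataFrame({"Unique_ID": ["a", "a", "b", "c"], "POSITION_T": [0, 1, 0, 0]})

kept_tracks, kept_spots = filter_by_track_length(tracks, spots)
print(kept_tracks["Unique_ID"].tolist())  # ['a', 'c']
print(len(kept_spots))                    # 3
```

Because the comparison is `>=`, a track with exactly 50 spots is kept.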


In [None]:
# @title ##Filter tracks shorter than 50 spots


merged_tracks_df = merged_tracks_df[merged_tracks_df['NUMBER_SPOTS'] >= 50]
merged_spots_df = merged_spots_df[merged_spots_df['Unique_ID'].isin(merged_tracks_df['Unique_ID'])]


## **1.5. Visualise your tracks**
---

In [None]:
# @title ##Run the cell and choose the file you want to inspect

import ipywidgets as widgets
from ipywidgets import interact
import matplotlib.pyplot as plt

if not os.path.exists(Results_Folder+"/Tracks"):
    os.makedirs(Results_Folder+"/Tracks")  # Create the Tracks subfolder if it doesn't exist

# Extract unique filenames from the dataframe
filenames = merged_spots_df['File_name'].unique()

# Create a Dropdown widget with the filenames
filename_dropdown = widgets.Dropdown(
    options=filenames,
    value=filenames[0] if len(filenames) > 0 else None,  # Default selected value
    description='File Name:',
)

def plot_coordinates(filename):
    if filename:
        # Filter the DataFrame based on the selected filename
        filtered_df = merged_spots_df[merged_spots_df['File_name'] == filename]

        plt.figure(figsize=(10, 8))
        for unique_id in filtered_df['Unique_ID'].unique():
            unique_df = filtered_df[filtered_df['Unique_ID'] == unique_id].sort_values(by='POSITION_T')
            plt.plot(unique_df['POSITION_X'], unique_df['POSITION_Y'], marker='o', linestyle='-', markersize=2)

        plt.xlabel('POSITION_X')
        plt.ylabel('POSITION_Y')
        plt.title(f'Coordinates for {filename}')
        plt.savefig(f"{Results_Folder}/Tracks/Tracks_{filename}.pdf")
        plt.show()
    else:
        print("No valid filename selected")

# Link the Dropdown widget to the plotting function
interact(plot_coordinates, filename=filename_dropdown)


--------------------------------------------------------
# **Part 2: Assess spatial clustering using Ripley's L function**
--------------------------------------------------------

<font size = 4>**Spatial Clustering Analyses:** For a comprehensive guide on performing spatial clustering analysis with CellTracksColab, visit the project's [Spatial Clustering analyses wiki page](https://github.com/CellMigrationLab/CellTracksColab/wiki/Spatial-Clustering-analyses).


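## **2.1. Filter tracks using Min Speed**

<font size = 4>This section keeps only arresting tracks, that is, tracks whose `Min Speed` is at most 5 (in the speed units of your tracking data). To use a different cut-off, edit the value in the cell below.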
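<font size = 4>Ripley's L is computed from at most one slowdown point per retained track, so FOVs that keep very few tracks after this filter will give noisy L values. Before moving on, it can help to count the retained tracks per file. A minimal sketch, using the same `Min Speed <= 5` criterion as the cell below; `count_arrested_tracks` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd


def count_arrested_tracks(tracks, max_min_speed=5):
    """Count, per file, the tracks whose 'Min Speed' is at or below max_min_speed."""
    arrested = tracks[tracks["Min Speed"] <= max_min_speed]
    # Reindex so that files with no arrested track still appear, with a count of 0
    all_files = tracks["File_name"].unique()
    return arrested.groupby("File_name").size().reindex(all_files, fill_value=0)


# Toy table: fov3 has no track slow enough to count as arrested
tracks = pd.DataFrame({
    "File_name": ["fov1", "fov1", "fov2", "fov2", "fov3"],
    "Min Speed": [0.5, 7.0, 4.9, 5.0, 12.0],
})
# Expected counts: fov1 -> 1, fov2 -> 2 (5.0 is included by <=), fov3 -> 0
print(count_arrested_tracks(tracks))
```

In the notebook, the same call on `merged_tracks_df` before filtering would show which FOVs end up with only a handful of points.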





In [None]:
# @title ##Filter tracks using Min Speed


merged_tracks_df = merged_tracks_df[merged_tracks_df['Min Speed'] <= 5]
merged_spots_df = merged_spots_df[merged_spots_df['Unique_ID'].isin(merged_tracks_df['Unique_ID'])]

## **2.2. Visualise where cells slow down in each track**


In [None]:
# @title ##Run the cell and choose the file you want to inspect to visualise its tracks and slowdown points


import ipywidgets as widgets
from ipywidgets import interact
import matplotlib.pyplot as plt
import os

if not os.path.exists(Results_Folder+"/Tracks"):
    os.makedirs(Results_Folder+"/Tracks")  # Create the Tracks subfolder if it doesn't exist

# Extract unique filenames from the dataframe
filenames = merged_spots_df['File_name'].unique()

# Create a Dropdown widget with the filenames
filename_dropdown = widgets.Dropdown(
    options=filenames,
    value=filenames[0] if len(filenames) > 0 else None,  # Default selected value
    description='File Name:',
)

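# User-defined speed threshold (kept equal to the value used in sections 2.3 and 2.4,
# so that the red points match the coordinates used for the clustering analysis)
speed_threshold = 5  # Replace with the desired threshold value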

def find_point_below_threshold(track):
    below_threshold = track[track['Speed'] < speed_threshold]
    return below_threshold.iloc[0] if not below_threshold.empty else None

def plot_coordinates(filename):
    if filename:
        # Filter the DataFrame based on the selected filename
        filtered_df = merged_spots_df[merged_spots_df['File_name'] == filename]

        plt.figure(figsize=(10, 8))
        for unique_id in filtered_df['Unique_ID'].unique():
            unique_df = filtered_df[filtered_df['Unique_ID'] == unique_id].sort_values(by='POSITION_T')
            plt.plot(unique_df['POSITION_X'], unique_df['POSITION_Y'], marker='o', linestyle='-', markersize=2)

            # Find and mark the slowdown point
            slowdown_point = find_point_below_threshold(unique_df)
            if slowdown_point is not None:
                plt.scatter(slowdown_point['POSITION_X'], slowdown_point['POSITION_Y'], color='red', s=50)
            #else:
                #print(f"No slowdown point found for track {unique_id}")
        plt.xlabel('POSITION_X')
        plt.ylabel('POSITION_Y')
        plt.title(f'Coordinates for {filename}')
        plt.savefig(f"{Results_Folder}/Tracks/Tracks_{filename}.pdf")
        plt.show()
    else:
        print("No valid filename selected")


# Link the Dropdown widget to the plotting function
interact(plot_coordinates, filename=filename_dropdown)


## **2.3 Identify the coordinates where cells slow down**

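<font size = 4>Here we identify where circulating cells first slow down on the endothelial monolayer, to reveal possible hotspots. For each track, the slowdown coordinate is the first spot (in time) whose `Speed` is below `speed_threshold`; the cell saves one plot of these points per file in the `Slow_down_coordinates` folder.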
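<font size = 4>The cells in sections 2.2 to 2.4 apply `find_point_below_threshold` to each track after sorting by `POSITION_T`, keeping the first spot whose `Speed` is strictly below `speed_threshold`. The same selection can be written without a per-track `apply`, which is usually faster on large tables. This is a sketch: `first_slowdown_points` is a hypothetical name and the toy FOV is made up.

```python
import pandas as pd


def first_slowdown_points(spots, speed_threshold=5, track_col="TRACK_ID"):
    """X/Y position of the first spot (in time) whose Speed is below speed_threshold, per track.

    Expects the spots of a single FOV. Tracks that never drop below the threshold are absent.
    """
    slow = spots[spots["Speed"] < speed_threshold].sort_values([track_col, "POSITION_T"])
    # After the time sort, the first remaining row of each track is its first slowdown spot
    first = slow.drop_duplicates(subset=track_col, keep="first")
    return first.set_index(track_col)[["POSITION_X", "POSITION_Y"]]


# Toy FOV: track 1 first drops below 5 at T=1, track 2 never does, track 3 does at T=0
spots = pd.DataFrame({
    "TRACK_ID":   [1, 1, 1, 2, 2, 3],
    "POSITION_T": [2, 0, 1, 0, 1, 0],
    "POSITION_X": [30.0, 10.0, 20.0, 5.0, 6.0, 9.0],
    "POSITION_Y": [3.0, 1.0, 2.0, 5.0, 6.0, 9.0],
    "Speed":      [1.0, 8.0, 4.0, 9.0, 7.0, 2.0],
})
print(first_slowdown_points(spots))
```

`drop_duplicates(keep="first")` is used instead of `groupby(...).first()` because the latter picks the first non-missing value column by column, which could mix rows if a position is missing.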


In [None]:
# @title ##Run the cell to identify the coordinates to use for the clustering analysis


import matplotlib.pyplot as plt
import os

if not os.path.exists(Results_Folder + "/Slow_down_coordinates"):
    os.makedirs(Results_Folder + "/Slow_down_coordinates")  # Create the Slow_down_coordinates subfolder if it doesn't exist

# Extract unique filenames from the dataframe
filenames = merged_spots_df['File_name'].unique()

# User-defined speed threshold
speed_threshold = 5  # Replace with the desired threshold value

def find_point_below_threshold(track):
    below_threshold = track[track['Speed'] < speed_threshold]
    return below_threshold.iloc[0] if not below_threshold.empty else None

def plot_slowdown_points(filename):
    if filename:
        # Filter the DataFrame based on the filename
        filtered_df = merged_spots_df[merged_spots_df['File_name'] == filename]

        plt.figure(figsize=(10, 8))
        for unique_id in filtered_df['Unique_ID'].unique():
            unique_df = filtered_df[filtered_df['Unique_ID'] == unique_id].sort_values(by='POSITION_T')

            # Find and mark the slowdown point
            slowdown_point = find_point_below_threshold(unique_df)
            if slowdown_point is not None:
                plt.scatter(slowdown_point['POSITION_X'], slowdown_point['POSITION_Y'], color='red', s=50, label=f'Track {unique_id}')

        plt.xlabel('POSITION_X')
        plt.ylabel('POSITION_Y')
        plt.title(f'Slowdown Points for {filename}')
        plt.savefig(f"{Results_Folder}/Slow_down_coordinates/Slowdown_Points_{filename}.pdf")
        plt.close()
    else:
        print("No valid filename selected")

# Loop through each file and generate the plot for slowdown points
for filename in filenames:
    plot_slowdown_points(filename)



## **2.4 Compute the Ripley's L function for each FOV**
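<font size = 4>The cell below calls `ripley_l` once per radius, and each call rebuilds the full n-by-n distance matrix, so the matrix is recomputed 250 times per FOV. Since this naive estimator only needs the number of pairs closer than r, the pairwise distances can be computed and sorted once and then counted for all radii with `np.searchsorted`. This is an optional optimisation sketch, not part of the original notebook; `ripley_l_all_radii` is a hypothetical name and the random points stand in for real slowdown coordinates.

```python
import numpy as np
from scipy.spatial.distance import pdist


def ripley_l_all_radii(points, r_values, area):
    """L(r) = sqrt(K(r) / pi) - r for every radius at once, naive estimator without edge correction.

    Intended to return the same values as calling ripley_l(points, r, area) for each r > 0.
    """
    points = np.asarray(points, dtype=float)
    r_values = np.asarray(r_values, dtype=float)
    n = len(points)
    # Each unordered pair appears once in pdist; sorting lets us count pairs with a binary search
    d = np.sort(pdist(points))
    # side="left" counts distances strictly smaller than r, matching `d_matrix < r`
    pairs_below = np.searchsorted(d, r_values, side="left")
    # The notebook counts ordered pairs (i, j) and (j, i), hence the factor 2
    K = area / n**2 * 2 * pairs_below
    return np.sqrt(K / np.pi) - r_values


rng = np.random.default_rng(0)
points = rng.uniform(0, 500, size=(60, 2))  # made-up slowdown coordinates
r_values = np.linspace(1, 250, 250)
l_values = ripley_l_all_radii(points, r_values, area=500 * 500)
```

Swapping it in would mean replacing the list comprehension over `r_values` with a single call, for example `l_values = ripley_l_all_radii(representative_points.values, r_values, area)`.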


In [None]:
# @title ##Compute Ripley's L function for each FOV

# User-defined speed threshold
speed_threshold = 5

# Check and create necessary directories
if not os.path.exists(f"{Results_Folder}/Track_Clustering"):
    os.makedirs(f"{Results_Folder}/Track_Clustering")

import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix
import matplotlib.pyplot as plt

# Define Ripley's K function
def ripley_k(points, r, area):
    n = len(points)
    d_matrix = distance_matrix(points, points)
    sum_indicator = np.sum(d_matrix < r) - n  # Subtract n to exclude self-pairs

    K_r = (area / (n ** 2)) * sum_indicator

    # Check if K_r is negative and print relevant information
    if K_r < 0:
        print("Negative K_r encountered!")
        print("Distance matrix:", d_matrix)
        print("Sum indicator:", sum_indicator)
        print("Area:", area, "Number of points:", n, "Distance threshold r:", r)

    return K_r


# Define Ripley's L function

def ripley_l(points, r, area):
    K_r = ripley_k(points, r, area)
    # Check if K_r has negative values
    if np.any(K_r < 0):
        print("Warning: Negative value encountered in K_r")

    L_r = np.sqrt(K_r / np.pi) - r
    return L_r

def find_point_below_threshold(track):
  below_threshold = track[track['Speed'] < speed_threshold]
  if not below_threshold.empty:
    return below_threshold.iloc[0][['POSITION_X', 'POSITION_Y']]
  return pd.Series([np.nan, np.nan], index=['POSITION_X', 'POSITION_Y'])

# Area of the bounding rectangle of all spots in the dataset (the same area is used for every FOV)
area = (merged_spots_df['POSITION_X'].max() - merged_spots_df['POSITION_X'].min()) * \
       (merged_spots_df['POSITION_Y'].max() - merged_spots_df['POSITION_Y'].min())

# Radii at which L(r) is evaluated, in the same spatial units as POSITION_X/POSITION_Y
r_values = np.linspace(1, 250, 250)  # 1 to 250 in steps of 1; adjust as needed

# Compute Ripley's L function for each FOV
l_values_per_fov_slow = {}
for file_name, group in tqdm(merged_spots_df.groupby('File_name'), desc="Processing FOVs"):
    # Sort each track by POSITION_T
    group = group.sort_values(by=['TRACK_ID', 'POSITION_T'])

    representative_points = group.groupby('TRACK_ID').apply(find_point_below_threshold).dropna()
    if not representative_points.empty:
        l_values = [ripley_l(representative_points.values, r, area) for r in tqdm(r_values, desc=f"Calculating L for {file_name}")]
        l_values_per_fov_slow[file_name] = l_values


## **2.5 Compute Monte Carlo simulations for each FOV**
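<font size = 4>The cell below draws 10 sets of uniformly random points per FOV (one random point per observed slowdown point) in the bounding rectangle of the whole dataset, and takes the 2.5th and 97.5th percentiles of their L(r) curves as the envelope. With only 10 simulations, these percentiles are interpolated between the two lowest and the two highest simulated values at each radius, so the envelope can shift noticeably between runs. A sketch that makes the number of simulations, the significance level and the random seed explicit is shown below; `csr_envelope` and `naive_l` are hypothetical names, and `n_sim=99` is an illustrative default rather than a value from the notebook.

```python
import numpy as np
from scipy.spatial.distance import pdist


def naive_l(points, r_values, area):
    """Naive Ripley's L(r) - r (no edge correction), same estimator as section 2.4."""
    n = len(points)
    d = np.sort(pdist(points))
    K = area / n**2 * 2 * np.searchsorted(d, r_values, side="left")
    return np.sqrt(K / np.pi) - r_values


def csr_envelope(n_points, x_range, y_range, r_values, n_sim=99, alpha=0.05, seed=None):
    """Pointwise (1 - alpha) envelope of L(r) under complete spatial randomness in a rectangle."""
    rng = np.random.default_rng(seed)
    r_values = np.asarray(r_values, dtype=float)
    area = (x_range[1] - x_range[0]) * (y_range[1] - y_range[0])
    sims = np.empty((n_sim, len(r_values)))
    for k in range(n_sim):
        # Uniform random points in the same rectangle as the observed data
        pts = np.column_stack((rng.uniform(*x_range, n_points), rng.uniform(*y_range, n_points)))
        sims[k] = naive_l(pts, r_values, area)
    lower = np.percentile(sims, 100 * alpha / 2, axis=0)
    upper = np.percentile(sims, 100 * (1 - alpha / 2), axis=0)
    return lower, upper, np.median(sims, axis=0)


r_values = np.linspace(1, 250, 250)
lower, upper, median = csr_envelope(40, (0, 500), (0, 500), r_values, seed=1)
```

Returning the median alongside the bounds mirrors the `Median_slow` column built in section 2.7.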

In [None]:
# @title ##Compute Monte Carlo simulations for each FOV

from tqdm.notebook import tqdm


# Simulate random points for Monte Carlo simulations
def simulate_random_points(num_points, x_range, y_range):
    x_coords = np.random.uniform(x_range[0], x_range[1], num_points)
    y_coords = np.random.uniform(y_range[0], y_range[1], num_points)
    return np.column_stack((x_coords, y_coords))

# Initialize simulated_l_values as an empty dictionary
simulated_l_values_dict_slow = {}

# Perform Monte Carlo simulations for significance testing
confidence_envelopes_slow = {}
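for file_name, group in tqdm(merged_spots_df.groupby('File_name'), desc='Processing FOVs'):

    group = group.sort_values(by=['TRACK_ID', 'POSITION_T'])
    representative_points = group.groupby('TRACK_ID').apply(find_point_below_threshold).dropna()

    # Skip FOVs without any slowdown point, as section 2.4 does; otherwise this FOV
    # would get an envelope but no observed L(r), which breaks section 2.7
    if representative_points.empty:
        continue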

    simulations = [simulate_random_points(len(representative_points),
                                          (merged_spots_df['POSITION_X'].min(), merged_spots_df['POSITION_X'].max()),
                                          (merged_spots_df['POSITION_Y'].min(), merged_spots_df['POSITION_Y'].max()))
                   for _ in tqdm(range(10), desc=f'Simulating for {file_name}', leave=False)]

    simulated_l_values = [[ripley_l(points, r, area) for r in r_values] for points in simulations]
    simulated_l_values_dict_slow[file_name] = simulated_l_values  # Store the simulated values in the dictionary

    lower_bound = np.percentile(simulated_l_values, 2.5, axis=0)
    upper_bound = np.percentile(simulated_l_values, 97.5, axis=0)
    confidence_envelopes_slow[file_name] = (lower_bound, upper_bound)



## **2.6 Plot the results for each FOV**

In [None]:
# @title ##Plots for each FOV - Slow down

import os
import matplotlib.pyplot as plt

# Visualization of Ripley's L function with confidence envelopes
for file_name, l_values in l_values_per_fov_slow.items():
    # Retrieve the confidence envelope for the current file
    lower_bound, upper_bound = confidence_envelopes_slow.get(file_name, (None, None))

    # Only proceed if the confidence envelope exists
    if lower_bound is not None and upper_bound is not None:
        plt.figure(figsize=(10, 6))
        plt.plot(r_values, l_values, label=f'L(r) for {file_name}')
        plt.fill_between(r_values, lower_bound, upper_bound, color='gray', alpha=0.5)
        plt.xlabel('Radius (r)')
        plt.ylabel("Ripley's L Function")
        plt.title(f"Ripley's L Function - {file_name}")
        plt.legend()
        plt.grid(True)

        # Save the plot as a PDF in the specified folder
        pdf_path = os.path.join(f"{Results_Folder}/Track_Clustering/{file_name}.pdf")
        plt.savefig(pdf_path,bbox_inches='tight')
        plt.show()
        plt.close()  # Close the plot to free memory
    else:
        print(f"No confidence envelope data available for {file_name}")


## **2.7 Define a specific radius and save as dataframe**

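This gives one L value per FOV, which makes it possible to compare FOVs and conditions. The cell below also stores the Monte Carlo envelope bounds and the median of the simulated L values at the chosen radius, adds the metadata of each file, and saves the result to `Track_Clustering/ripleys_l_values.csv`.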
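Because the envelope bounds are stored next to L, each FOV can also be labelled by where its L value falls relative to the envelope at the chosen radius. The sketch below does this for one FOV; `l_at_radius` is a hypothetical helper, the labels are an interpretation of the pointwise envelope rather than something the notebook outputs, and the arrays are made up. A pointwise envelope is only a valid test at a radius chosen before looking at the curves.

```python
import numpy as np


def l_at_radius(r_values, l_values, lower, upper, radius):
    """Read L and its envelope at the sampled radius closest to `radius`, and label the FOV."""
    idx = int(np.argmin(np.abs(np.asarray(r_values) - radius)))
    l, lo, hi = l_values[idx], lower[idx], upper[idx]
    if l > hi:
        label = "clustered"
    elif l < lo:
        label = "dispersed"
    else:
        label = "within envelope"
    return r_values[idx], l, lo, hi, label


r_values = np.linspace(1, 250, 250)  # same grid as section 2.4
l_values = np.zeros(250)
l_values[49] = 12.0                  # made-up observed L at r = 50
lower, upper = -np.ones(250), np.ones(250)
print(l_at_radius(r_values, l_values, lower, upper, radius=50))
```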

In [None]:
# @title ##Define a specific radius and save as dataframe - Slow down


# Define the specific radius for comparison
specific_radius = 50  # Replace with your chosen radius

# Extract L values at the specific radius
specific_radius_index = np.argmin(np.abs(r_values - specific_radius))  # Find the index of the closest radius value
l_values_at_specific_radius_slow = {fov: l_values[specific_radius_index] for fov, l_values in l_values_per_fov_slow.items()}

# Plotting
plt.figure(figsize=(12, 6))
plt.bar(l_values_at_specific_radius_slow.keys(), l_values_at_specific_radius_slow.values())
plt.xlabel('Field of View')
plt.ylabel(f"Ripley's L at radius {specific_radius}")
plt.title(f"Comparison of Ripley's L Function at Radius {specific_radius} Across Different FOVs")
plt.xticks(rotation=45)
# Save the plot as a PDF in the specified folder
pdf_path = os.path.join(f"{Results_Folder}/Track_Clustering/l_values_at_specific_radius_slow.pdf")
plt.savefig(pdf_path, bbox_inches='tight')

plt.show()


# Create DataFrame with confidence envelopes, median, and L values at the specific radius
rows = []
for fov, (lower_bound, upper_bound) in confidence_envelopes_slow.items():
    l_value = l_values_per_fov_slow[fov][specific_radius_index]
    lower = lower_bound[specific_radius_index]
    upper = upper_bound[specific_radius_index]

    # Retrieve simulated L values for the FOV
    simulated_l_values_for_fov_slow = simulated_l_values_dict_slow.get(fov, [])

    # Calculate median if simulated L values are available for the FOV
    if simulated_l_values_for_fov_slow:
        median_vals = [l_vals[specific_radius_index] for l_vals in simulated_l_values_for_fov_slow]
        median = np.median(median_vals) if median_vals else np.nan
    else:
        median = np.nan

    rows.append([fov, l_value, lower, upper, median])

confidence_df = pd.DataFrame(rows, columns=['File_name', 'Ripley_L_at_Specific_Radius_slow', 'Lower_Bound_slow', 'Upper_Bound_slow', 'Median_slow'])

# Merge with additional information
additional_info_df = merged_tracks_df[['File_name', 'Cells', 'Flow_speed', 'Treatment', 'Condition', 'experiment_nb', 'Repeat']].drop_duplicates('File_name')
merged_df = pd.merge(confidence_df, additional_info_df, left_on='File_name', right_on='File_name')

# Save the merged DataFrame to a CSV file
merged_df.to_csv(f"{Results_Folder}/Track_Clustering/ripleys_l_values.csv", index=False)


## **2.8 Ripley's L Values Across conditions and cells**
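<font size = 4>The plots below show the per-FOV L values as box plots, with the simulated medians (red points) and the average envelope (black error bars) overlaid. A table with the number of FOVs, the median L value, and the fraction of FOVs whose L value exceeds their own upper envelope bound can complement these plots. A minimal sketch; `summarize_by` is a hypothetical helper and the toy values are made up. In the notebook it could be called as `summarize_by(merged_df, 'Condition')` or `summarize_by(merged_df, 'Cells')`.

```python
import pandas as pd


def summarize_by(df, group_col):
    """Per-group summary of the per-FOV Ripley's L table built in section 2.7."""
    above = df["Ripley_L_at_Specific_Radius_slow"] > df["Upper_Bound_slow"]
    return (
        df.assign(Above_envelope=above)
          .groupby(group_col)
          .agg(n_FOV=("File_name", "count"),
               median_L=("Ripley_L_at_Specific_Radius_slow", "median"),
               frac_above_envelope=("Above_envelope", "mean"))
    )


# Made-up per-FOV values with the column names of ripleys_l_values.csv
df = pd.DataFrame({
    "File_name": ["a", "b", "c", "d"],
    "Condition": ["ctrl", "ctrl", "treat", "treat"],
    "Ripley_L_at_Specific_Radius_slow": [5.0, 1.0, 8.0, 9.0],
    "Upper_Bound_slow": [2.0, 2.0, 3.0, 3.0],
})
print(summarize_by(df, "Condition"))
```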


In [None]:
# @title ##Comparison of Ripley's L Values Across Conditions

import matplotlib.pyplot as plt
import seaborn as sns

# Convert 'Condition' to string if it's not already
merged_df['Condition'] = merged_df['Condition'].astype(str)

# Create the box plot
plt.figure(figsize=(12, 8))
sns.boxplot(data=merged_df, x='Condition', y='Ripley_L_at_Specific_Radius_slow')

# Overlay the Monte Carlo simulation results
for condition in merged_df['Condition'].unique():
    condition_data = merged_df[merged_df['Condition'] == condition]

    # Plot median values
    medians = condition_data['Median_slow']
    plt.scatter([condition] * len(medians), medians, color='red', alpha=0.5)  # Median

    # Handle NaN values and calculate mean and error only for non-NaN values
    valid_data = condition_data.dropna(subset=['Median_slow', 'Lower_Bound_slow', 'Upper_Bound_slow'])
    if not valid_data.empty:
        median_mean = valid_data['Median_slow'].mean()
        lower_mean = valid_data['Lower_Bound_slow'].mean()
        upper_mean = valid_data['Upper_Bound_slow'].mean()
        yerr = [[median_mean - lower_mean], [upper_mean - median_mean]]

        # Check if yerr contains valid data before plotting
        if not any(np.isnan(yerr)):
            plt.errorbar(condition, median_mean, yerr=yerr, fmt='o', color='black', alpha=0.5)  # Confidence interval

# Add labels and title
plt.xlabel('Condition')
plt.ylabel('Ripley\'s L at Specific Radius')
plt.title('Comparison of Ripley\'s L Values Across Conditions with Monte Carlo Simulation Results')
plt.xticks(rotation=45)
plt.grid(True)

# Save the figure before showing it
pdf_path = os.path.join(f"{Results_Folder}/Track_Clustering/l_values_Conditions_slow.pdf")
plt.savefig(pdf_path, bbox_inches='tight')

# Show the plot
plt.show()


In [None]:
# @title ##Comparison of Ripley's L Values Across Cells

import matplotlib.pyplot as plt
import seaborn as sns

# Convert 'Cells' to string if it's not already
merged_df['Cells'] = merged_df['Cells'].astype(str)

# Create the box plot
plt.figure(figsize=(12, 8))
sns.boxplot(data=merged_df, x='Cells', y='Ripley_L_at_Specific_Radius_slow')

# Overlay the Monte Carlo simulation results
for condition in merged_df['Cells'].unique():
    condition_data = merged_df[merged_df['Cells'] == condition]

    # Plot median values
    medians = condition_data['Median_slow']
    plt.scatter([condition] * len(medians), medians, color='red', alpha=0.5)  # Median

    # Handle NaN values and calculate mean and error only for non-NaN values
    valid_data = condition_data.dropna(subset=['Median_slow', 'Lower_Bound_slow', 'Upper_Bound_slow'])
    if not valid_data.empty:
        median_mean = valid_data['Median_slow'].mean()
        lower_mean = valid_data['Lower_Bound_slow'].mean()
        upper_mean = valid_data['Upper_Bound_slow'].mean()
        yerr = [[median_mean - lower_mean], [upper_mean - median_mean]]

        # Check if yerr contains valid data before plotting
        if not any(np.isnan(yerr)):
            plt.errorbar(condition, median_mean, yerr=yerr, fmt='o', color='black', alpha=0.5)  # Confidence interval

# Add labels and title
plt.xlabel('Cells')
plt.ylabel('Ripley\'s L at Specific Radius')
plt.title('Comparison of Ripley\'s L Values Across Cells with Monte Carlo Simulation Results')
plt.xticks(rotation=45)
plt.grid(True)

# Save the figure before showing it
pdf_path = os.path.join(f"{Results_Folder}/Track_Clustering/l_values_Cells_slow.pdf")
plt.savefig(pdf_path, bbox_inches='tight')

# Show the plot
plt.show()


In [None]:
# @title ##Comparison of Ripley's L Values Across Cells and Treatment


import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import os

# Convert 'Cells' and 'Treatment' to string if they are not already
merged_df['Cells'] = merged_df['Cells'].astype(str)
merged_df['Treatment'] = merged_df['Treatment'].astype(str)

# Create a combined factor for Cells and Treatment
merged_df['Cells_Treatment'] = merged_df['Cells'] + "_" + merged_df['Treatment']

# Create the box plot
plt.figure(figsize=(14, 8))
sns.boxplot(data=merged_df, x='Cells_Treatment', y='Ripley_L_at_Specific_Radius_slow')

# Overlay the Monte Carlo simulation results
for condition in merged_df['Cells_Treatment'].unique():
    condition_data = merged_df[merged_df['Cells_Treatment'] == condition]

    # Plot median values
    medians = condition_data['Median_slow']
    plt.scatter([condition] * len(medians), medians, color='red', alpha=0.5)  # Median

    # Handle NaN values and calculate mean and error only for non-NaN values
    valid_data = condition_data.dropna(subset=['Median_slow', 'Lower_Bound_slow', 'Upper_Bound_slow'])
    if not valid_data.empty:
        median_mean = valid_data['Median_slow'].mean()
        lower_mean = valid_data['Lower_Bound_slow'].mean()
        upper_mean = valid_data['Upper_Bound_slow'].mean()
        yerr = [[median_mean - lower_mean], [upper_mean - median_mean]]

        # Check if yerr contains valid data before plotting
        if not any(np.isnan(yerr)):
            plt.errorbar(condition, median_mean, yerr=yerr, fmt='o', color='black', alpha=0.5)  # Confidence interval

# Add labels and title
plt.xlabel('Cells and Treatment')
plt.ylabel('Ripley\'s L at Specific Radius')
plt.title('Comparison of Ripley\'s L Values Across Cells and Treatment with Monte Carlo Simulation Results')
plt.xticks(rotation=45)
plt.grid(True)

# Save the figure before showing it
pdf_path = os.path.join(f"{Results_Folder}/Track_Clustering/l_values_Cells_Treatment_slow.pdf")
plt.savefig(pdf_path, bbox_inches='tight')

# Show the plot
plt.show()
