# **PDAC CellTracksColab - Count arrested tracks**
---

<font size = 4>The CellTracksColab notebooks have been adapted to analyze tracking data featured in the manuscript "Quantitative analysis of pancreatic cancer cell attachment to endothelial cells." This suite of notebooks is part of the CellTracksColab project, designed to facilitate comprehensive analyses of cell tracking data. The project's resources are accessible via the GitHub repository provided below.

<font size = 4>The CellTracksColab project repository: [CellMigrationLab/CellTracksColab](https://github.com/CellMigrationLab/CellTracksColab).

<font size = 4>This notebook processes the dataframes generated by the General Notebook within the CellTracksColab framework. It identifies the tracks that exhibit arrest behavior and counts them at each time point throughout the observation period. Key functionalities of this notebook include:

- **Loading CellTracksColab Dataframes:** It starts by importing the comprehensive dataframes prepared by the General Notebook, ensuring a seamless transition from data compilation to detailed analysis.

- **Counting Arrested Tracks:** The notebook identifies tracks where cells have ceased to move, categorizing them as 'arrested'. This allows for a quantitative analysis of cell arrest dynamics over time.

- **Computing Attachment Rates:** Beyond counting arrested tracks, it calculates the rate of cell attachment over the observation period. This metric is crucial for understanding how PDAC cells interact with endothelial cells over time.

- **Generating Attachment Plots:** Visual representations of attachment rates and arrested-cell counts are produced, offering intuitive insights into the temporal dynamics of cell behavior.
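
<font size = 4>To make the counting step concrete, here is a minimal sketch on a made-up spots table. The column names (`Unique_ID`, `POSITION_T`, `Speed`) are the ones this notebook reads; the values and `toy_speed_threshold` are illustrative assumptions. The notebook's own cells additionally group by condition and repeat and smooth the counts with a rolling average.

```python
import pandas as pd

# Toy spots table: one row per spot. Values are made up for illustration.
toy_spots = pd.DataFrame({
    "Unique_ID":  ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "POSITION_T": [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "Speed":      [12.0, 3.0, 1.0, 2.0, 1.5, 0.5, 20.0, 18.0, 15.0],
})

toy_speed_threshold = 5  # hypothetical threshold

# A track counts as arrested at a time point when its speed is below the threshold.
arrested = toy_spots[toy_spots["Speed"] < toy_speed_threshold]

# Count distinct arrested tracks per time point; time points without any get 0.
all_times = sorted(toy_spots["POSITION_T"].unique())
arrested_per_time = (
    arrested.groupby("POSITION_T")["Unique_ID"].nunique()
    .reindex(all_times, fill_value=0)
)

# Express the counts as a percentage of all tracks present at each time point.
total_per_time = toy_spots.groupby("POSITION_T")["Unique_ID"].nunique()
percent_arrested = 100 * arrested_per_time / total_per_time

print(arrested_per_time.tolist())
```

<font size = 4>In this toy table, track `c` never drops below the threshold, so it contributes to the total count but never to the arrested count.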

<font size = 4>Notebook created by [Guillaume Jacquemet](https://cellmig.org/)




In [None]:
# @title #MIT License

print("""
**MIT License**

Copyright (c) 2023 Guillaume Jacquemet

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.""")

--------------------------------------------------------
# **Part 1: Prepare the session and load your data**
--------------------------------------------------------


## **1.1. Install key dependencies**
---
<font size = 4>

In [None]:
#@markdown ##Play to install
!pip -q install pandas scikit-learn
!pip -q install plotly
!pip -q install tqdm

import ipywidgets as widgets
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import numpy as np
import itertools
from matplotlib.gridspec import GridSpec
import requests


#----------------------- Key functions -----------------------------#

# Function to calculate Cohen's d
def cohen_d(group1, group2):
    diff = group1.mean() - group2.mean()
    n1, n2 = len(group1), len(group2)
    var1 = group1.var()
    var2 = group2.var()
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    d = diff / np.sqrt(pooled_var)
    return d

from tqdm.notebook import tqdm  # progress bar used by save_dataframe_with_progress


def save_dataframe_with_progress(df, path, desc="Saving", chunk_size=50000):
    """Save a DataFrame with a progress bar."""

    # Create a tqdm instance for progress tracking
    with tqdm(total=len(df), unit="rows", desc=desc) as pbar:
        # Open the file for writing
        with open(path, "w", newline="") as f:
            # Write the header once at the beginning
            df.head(0).to_csv(f, index=False)

            # Write the rows in slices of at most chunk_size rows
            # (avoids np.array_split on a DataFrame, which is deprecated in recent pandas)
            for start in range(0, len(df), chunk_size):
                chunk = df.iloc[start:start + chunk_size]
                chunk.to_csv(f, header=False, index=False)
                pbar.update(len(chunk))


def check_for_nans(df, df_name):
    """
    Checks the given DataFrame for NaN values and prints the count for each column containing NaNs.

    Args:
    df (pd.DataFrame): DataFrame to be checked for NaN values.
    df_name (str): The name of the DataFrame as a string, used for printing.
    """
    # Check if the DataFrame has any NaN values and print a warning if it does.
    nan_columns = df.columns[df.isna().any()].tolist()

    if nan_columns:
        for col in nan_columns:
            nan_count = df[col].isna().sum()
            print(f"Column '{col}' in {df_name} contains {nan_count} NaN values.")
    else:
        print(f"No NaN values found in {df_name}.")
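
<font size = 4>As a quick sanity check of the `cohen_d` helper defined in the cell above, the sketch below reproduces the same formula on two made-up groups. The input values are assumptions chosen so that the result is easy to verify by hand.

```python
import numpy as np
import pandas as pd

def cohen_d(group1, group2):
    """Cohen's d using the pooled sample variance, as in the cell above."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * group1.var() + (n2 - 1) * group2.var()) / (n1 + n2 - 2)
    return (group1.mean() - group2.mean()) / np.sqrt(pooled_var)

# Hypothetical measurements for two conditions
group_a = pd.Series([2.0, 4.0, 6.0])
group_b = pd.Series([1.0, 3.0, 5.0])

# Both groups have sample variance 4, so the pooled standard deviation is 2
# and the difference in means (4 - 3 = 1) gives d = 0.5.
d = cohen_d(group_a, group_b)
print(d)
```

<font size = 4>Note that pandas `Series.var()` uses the sample variance (`ddof=1`) by default, whereas the `var()` method of a NumPy array uses `ddof=0`. Passing NumPy arrays to this function would therefore give a slightly different value.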




## **1.2. Mount your Google Drive**
---
<font size = 4> To use this notebook on the data present in your Google Drive, you need to mount your Google Drive to this notebook.

<font size = 4> Play the cell below to mount your Google Drive and follow the instructions.

<font size = 4> Once this is done, your data are available in the **Files** tab on the top left of the notebook.

In [None]:
#@markdown ##Play the cell to connect your Google Drive to Colab

from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive



## **1.3. Compile your data or load existing dataframes**
---

<font size = 4> Please ensure that your data is properly organised (see above)
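
<font size = 4>As a rough guide, the sketch below lists the columns that the cells of this notebook read from the track and spot tables, and checks a table against that list. The column names are taken from the code in this notebook; `missing_columns` and the toy table are hypothetical helpers for illustration only.

```python
import pandas as pd

# Columns read from the track table by the cells of this notebook.
TRACK_COLUMNS = ["TRACK_ID", "Unique_ID", "File_name", "NUMBER_SPOTS",
                 "Cells", "Flow_speed", "ILbeta", "Condition",
                 "experiment_nb", "Repeat"]

# Columns read from the spot table by the cells of this notebook.
SPOT_COLUMNS = ["TRACK_ID", "Unique_ID", "File_name",
                "POSITION_X", "POSITION_Y", "POSITION_T", "Speed",
                "Cells", "Flow_speed", "ILbeta", "Condition", "Repeat"]

def missing_columns(df, required):
    """Hypothetical helper: return the required columns that are absent from df."""
    return [col for col in required if col not in df.columns]

# Toy spot table that only contains the raw tracking columns.
toy_spots = pd.DataFrame(columns=["TRACK_ID", "POSITION_X", "POSITION_Y", "POSITION_T"])
print(missing_columns(toy_spots, SPOT_COLUMNS))
```

<font size = 4>An empty list means that all the columns used later on are present.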


In [None]:
#@markdown ##Provide the path to your dataset (chunk):


import os
import re
import glob
import pandas as pd
from tqdm.notebook import tqdm
import numpy as np
import requests
import zipfile

#@markdown ###You have existing dataframes, provide the path to your:

Track_table = ''  # @param {type: "string"}
Spot_table = ''  # @param {type: "string"}

#@markdown ###Provide the path to your Result folder

Results_Folder = ""  # @param {type: "string"}

if not Results_Folder:
    Results_Folder = '/content/Results'  # Default Results_Folder path if not defined

if not os.path.exists(Results_Folder):
    os.makedirs(Results_Folder)  # Create Results_Folder if it doesn't exist

# Print the location of the result folder
print(f"Result folder is located at: {Results_Folder}")

def validate_tracks_df(df):
    """Validate the tracks dataframe for necessary columns and data types."""
    required_columns = ['TRACK_ID']
    for col in required_columns:
        if col not in df.columns:
            print(f"Error: Column '{col}' missing in tracks dataframe.")
            return False

    # Additional data type checks or value ranges can be added here
    return True

def validate_spots_df(df):
    """Validate the spots dataframe for necessary columns and data types."""
    required_columns = ['TRACK_ID', 'POSITION_X', 'POSITION_Y', 'POSITION_T']
    for col in required_columns:
        if col not in df.columns:
            print(f"Error: Column '{col}' missing in spots dataframe.")
            return False

    # Additional data type checks or value ranges can be added here
    return True

def check_unique_id_match(df1, df2):
    df1_ids = set(df1['Unique_ID'])
    df2_ids = set(df2['Unique_ID'])

    # Check if the IDs in the two dataframes match
    if df1_ids == df2_ids:
        print("The Unique_ID values in both dataframes match perfectly!")
    else:
        missing_in_df1 = df2_ids - df1_ids
        missing_in_df2 = df1_ids - df2_ids

        if missing_in_df1:
            print(f"There are {len(missing_in_df1)} Unique_ID values present in the second dataframe but missing in the first.")
            print("Examples of these IDs are:", list(missing_in_df1)[:5])

        if missing_in_df2:
            print(f"There are {len(missing_in_df2)} Unique_ID values present in the first dataframe but missing in the second.")
            print("Examples of these IDs are:", list(missing_in_df2)[:5])

# For existing dataframes
if Track_table:
    print("Loading track table file....")
    merged_tracks_df = pd.read_csv(Track_table, low_memory=False)
    if not validate_tracks_df(merged_tracks_df):
        print("Error: Validation failed for loaded tracks dataframe.")

if Spot_table:
    print("Loading spot table file....")
    merged_spots_df = pd.read_csv(Spot_table, low_memory=False)
    if not validate_spots_df(merged_spots_df):
        print("Error: Validation failed for loaded spots dataframe.")

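# Report NaNs only for the tables that were actually loaded
# (an empty path would otherwise raise a NameError here)
if Spot_table:
    check_for_nans(merged_spots_df, "merged_spots_df")
if Track_table:
    check_for_nans(merged_tracks_df, "merged_tracks_df")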


In [None]:
#@markdown ##Check Metadata


# Define the metadata columns that are expected to have identical values for each filename
metadata_columns = ['Cells', 'Flow_speed', 'ILbeta', 'Condition', 'experiment_nb', 'Repeat']

# Group the DataFrame by 'File_name' and then check if all entries within each group are identical
consistent_metadata = True
for name, group in merged_tracks_df.groupby('File_name'):
    for col in metadata_columns:
        if group[col].nunique() != 1:
            consistent_metadata = False
            print(f"Inconsistency found for file: {name} in column: {col}")
            break  # Stop checking the remaining columns for this file
    if not consistent_metadata:
        break  # Stop the entire process if any inconsistency is found

if consistent_metadata:
    print("All files have consistent metadata across the specified columns.")
else:
    print("There are inconsistencies in the metadata. Please check the output for details.")

# Drop duplicates based on the 'File_name' to get a unique list of filenames and their metadata
unique_files_df = merged_tracks_df.drop_duplicates(subset=['File_name'])[['File_name', 'Cells', 'Flow_speed', 'ILbeta', 'Condition', 'experiment_nb', 'Repeat']]

# Reset the index to clean up the DataFrame
unique_files_df.reset_index(drop=True, inplace=True)

# Display the resulting DataFrame as a formatted HTML table
# (a bare expression is only rendered when it is the last line of a cell)
from IPython.display import display
display(unique_files_df)

# Group by 'Condition' and 'Repeat' and count how many files share each combination
grouped = unique_files_df.groupby(['Condition', 'Repeat']).size().reset_index(name='counts')

# Check if any combinations have a count greater than 1, which means they are not unique
non_unique_combinations = grouped[grouped['counts'] > 1]

# Print the non-unique combinations
if not non_unique_combinations.empty:
    print("There are non-unique combinations of Conditions and Repeats:")
    print(non_unique_combinations)
else:
    print("All combinations of Conditions and Repeats are unique.")

check_unique_id_match(merged_spots_df, merged_tracks_df)


# Group the DataFrame by 'Cells', 'ILbeta', 'Repeat' and then check if there are 4 unique 'Flow_speed' values for each group
consistent_flow_speeds = True
for (cells, ilbeta, repeat), group in merged_tracks_df.groupby(['Cells', 'ILbeta', 'Repeat']):
    if group['Flow_speed'].nunique() != 4:
        consistent_flow_speeds = False
        print(f"Inconsistency found for Cells: {cells}, ILbeta: {ilbeta}, Repeat: {repeat} - Expected 4 Flow_speeds, found {group['Flow_speed'].nunique()}")
        break  # Stop the entire process if any inconsistency is found

if consistent_flow_speeds:
    print("Each combination of 'Cells', 'ILbeta', 'Repeat' has exactly 4 different 'Flow_speed' values.")
else:
    print("There are inconsistencies in 'Flow_speed' values. Please check the output for details.")


## **1.4. Filter tracks shorter than 50 spots**
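
<font size = 4>The cell below keeps only the tracks with at least 50 spots and then drops the spots that belong to the removed tracks. A minimal sketch of the same two-step filter on made-up tables (the values are illustrative; the column names are those used by this notebook):

```python
import pandas as pd

# Toy track table: one row per track with its number of spots.
toy_tracks = pd.DataFrame({
    "Unique_ID": ["a", "b", "c"],
    "NUMBER_SPOTS": [120, 49, 50],
})

# Toy spot table: one row per spot, linked to a track through Unique_ID.
toy_spots = pd.DataFrame({
    "Unique_ID": ["a", "a", "b", "c", "c"],
    "POSITION_T": [0.0, 0.04, 0.0, 0.0, 0.04],
})

min_spots = 50

# Step 1: keep the tracks that are long enough (the threshold itself is included).
kept_tracks = toy_tracks[toy_tracks["NUMBER_SPOTS"] >= min_spots]

# Step 2: keep only the spots whose track survived step 1.
kept_spots = toy_spots[toy_spots["Unique_ID"].isin(kept_tracks["Unique_ID"])]

print(kept_tracks["Unique_ID"].tolist())
```

<font size = 4>Track `b` (49 spots) is removed, while track `c` (exactly 50 spots) is kept.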


In [None]:
# @title ##Filter tracks shorter than 50 spots


merged_tracks_df = merged_tracks_df[merged_tracks_df['NUMBER_SPOTS'] >= 50]
merged_spots_df = merged_spots_df[merged_spots_df['Unique_ID'].isin(merged_tracks_df['Unique_ID'])]


## **1.5. Visualise your tracks**
---

In [None]:
# @title ##Run the cell and choose the file you want to inspect

import ipywidgets as widgets
from ipywidgets import interact
import matplotlib.pyplot as plt

if not os.path.exists(Results_Folder+"/Tracks"):
    os.makedirs(Results_Folder+"/Tracks")  # Create the Tracks subfolder if it doesn't exist

# Extract unique filenames from the dataframe
filenames = merged_spots_df['File_name'].unique()

# Create a Dropdown widget with the filenames
filename_dropdown = widgets.Dropdown(
    options=filenames,
    value=filenames[0] if len(filenames) > 0 else None,  # Default selected value
    description='File Name:',
)

def plot_coordinates(filename):
    if filename:
        # Filter the DataFrame based on the selected filename
        filtered_df = merged_spots_df[merged_spots_df['File_name'] == filename]

        plt.figure(figsize=(10, 8))
        for unique_id in filtered_df['Unique_ID'].unique():
            unique_df = filtered_df[filtered_df['Unique_ID'] == unique_id].sort_values(by='POSITION_T')
            plt.plot(unique_df['POSITION_X'], unique_df['POSITION_Y'], marker='o', linestyle='-', markersize=2)

        plt.xlabel('POSITION_X')
        plt.ylabel('POSITION_Y')
        plt.title(f'Coordinates for {filename}')
        plt.savefig(f"{Results_Folder}/Tracks/Tracks_{filename}.pdf")
        plt.show()
    else:
        print("No valid filename selected")

# Link the Dropdown widget to the plotting function
interact(plot_coordinates, filename=filename_dropdown)


--------------------------------------------------------
# **Part 2: Count the number of arrested tracks per time point**
--------------------------------------------------------
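
<font size = 4>The cell below smooths the per-time-point counts with a rolling average, fits a straight line to the smoothed counts, and reports the slope `m`, the intercept `b`, and R². The sketch below illustrates these three steps on made-up counts. The data and `toy_window` are assumptions, and R² is computed by hand here rather than with `sklearn.metrics.r2_score`, which uses the same definition.

```python
import numpy as np
import pandas as pd

# Hypothetical arrested-track counts at evenly spaced time points.
time = np.arange(10, dtype=float)
counts = pd.Series([0, 1, 1, 2, 2, 3, 3, 4, 4, 5], dtype=float)

# Trailing rolling mean; min_periods=1 keeps the first points instead of NaN.
toy_window = 3
smoothed = counts.rolling(window=toy_window, min_periods=1).mean()

# First-degree polynomial fit: m is the slope (change in count per time unit),
# b is the intercept.
m, b = np.polyfit(time, smoothed, 1)
fitted = m * time + b

# Coefficient of determination: 1 - SS_res / SS_tot.
ss_res = float(((smoothed - fitted) ** 2).sum())
ss_tot = float(((smoothed - smoothed.mean()) ** 2).sum())
r2 = 1 - ss_res / ss_tot

print(f"y = {m:.2f}x + {b:.2f}, R^2 = {r2:.2f}")
```

<font size = 4>A positive slope means that the number of arrested tracks increases over the fitted time window.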

In [None]:
# @title #Count the number of tracks per time point
from sklearn.metrics import r2_score

# Define the window size for rolling average
window_size = 20  # Adjust this value as needed

speed_threshold = 5

# Check and create necessary directories
if not os.path.exists(f"{Results_Folder}/Track_Counts"):
    os.makedirs(f"{Results_Folder}/Track_Counts")

# Count the slow (arrested) tracks per time point and smooth the counts with a rolling average.
# Note: the rolling window uses the global `window_size` defined above.
def count_slow_tracks(spots_df, speed_threshold):
    slow_tracks = spots_df[spots_df['Speed'] < speed_threshold]
    count_df = slow_tracks.groupby(['Condition','Cells','Flow_speed','ILbeta', 'Repeat', 'POSITION_T'])['Unique_ID'].nunique().reset_index()
    # Sort the DataFrame by 'POSITION_T' within each group
    count_df.sort_values(by=['Condition','Cells','Flow_speed','ILbeta', 'Repeat', 'POSITION_T'], inplace=True)
    count_df['Slow_Track_Count_Rolling'] = count_df.groupby(['Condition','Cells','Flow_speed','ILbeta', 'Repeat'])['Unique_ID'].transform(lambda x: x.rolling(window=window_size, min_periods=1).mean())
    return count_df

# Modified function to count total tracks with rolling average
def count_total_tracks(spots_df):
    total_count_df = spots_df.groupby(['Condition','Cells','Flow_speed','ILbeta', 'Repeat', 'POSITION_T'])['Unique_ID'].nunique().reset_index()
    total_count_df.sort_values(by=['Condition','Cells','Flow_speed','ILbeta', 'Repeat', 'POSITION_T'], inplace=True)
    total_count_df['Total_Track_Count_Rolling'] = total_count_df.groupby(['Condition','Cells','Flow_speed','ILbeta', 'Repeat'])['Unique_ID'].transform(lambda x: x.rolling(window=window_size, min_periods=1).mean())
    return total_count_df

# Plotting function with percentages and save figures as PDF
def plot_tracks_with_percentage(count_df, total_count_df, window_size):
    conditions = count_df['Condition'].unique()
    repeats = count_df['Repeat'].unique()
    fit_results = []

    for condition in conditions:
        for repeat in repeats:
            condition_repeat_df = count_df[(count_df['Condition'] == condition) & (count_df['Repeat'] == repeat)]

            if not condition_repeat_df.empty:
                merged_df = pd.merge(
                    total_count_df[(total_count_df['Condition'] == condition) & (total_count_df['Repeat'] == repeat)],
                    condition_repeat_df,
                    on=['Condition', 'Repeat', 'POSITION_T'],
                    how='left',
                    suffixes=('_total', '_slow')
                ).fillna(0)

                # Determine the start and end times based on window size
                start_time = merged_df['POSITION_T'].min() + window_size -1
                end_time = merged_df['POSITION_T'].max() - window_size -1

                # Trim data to the specified time window
                trimmed_df = merged_df[(merged_df['POSITION_T'] >= start_time) & (merged_df['POSITION_T'] <= end_time)]

                percentage_slow = (trimmed_df['Slow_Track_Count_Rolling'] / trimmed_df['Total_Track_Count_Rolling']) * 100
                m, b = np.polyfit(trimmed_df['POSITION_T'], trimmed_df['Slow_Track_Count_Rolling'], 1)
                y_fit = m * trimmed_df['POSITION_T'] + b
                r2 = r2_score(trimmed_df['Slow_Track_Count_Rolling'], y_fit)

                fit_results.append({'Condition': condition, 'Repeat': repeat, 'm': m, 'b': b, 'R2': r2})

                # Plotting
                fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
                sns.lineplot(data=trimmed_df, x='POSITION_T', y='Slow_Track_Count_Rolling', ax=ax1, label='Slow Tracks')
                sns.lineplot(data=trimmed_df, x='POSITION_T', y='Total_Track_Count_Rolling', ax=ax1, label='Total Tracks')
                ax1.plot(trimmed_df['POSITION_T'], y_fit, '--', label=f'Linear fit for Slow Tracks: y = {m:.2f}x + {b:.2f}')
                ax1.set_title(f'Number of Tracks for {condition} - Repeat {repeat}\nR^2 for Slow Tracks = {r2:.2f}')

                sns.lineplot(data=trimmed_df, x='POSITION_T', y=percentage_slow, ax=ax2, label='% Slow Tracks')
                ax2.set_title(f'Percentage of Slow Tracks for {condition} - Repeat {repeat}')

                plt.tight_layout()
                plt.savefig(Results_Folder + f'/Track_Counts/{condition}_Repeat{repeat}_plot.pdf', format='pdf')
                plt.show()

    return pd.DataFrame(fit_results)


# Use the functions
count_df = count_slow_tracks(merged_spots_df, speed_threshold)
total_count_df = count_total_tracks(merged_spots_df)

# Save dataframes
save_dataframe_with_progress(count_df, Results_Folder + '/Track_Counts/' +"slow_tracks_count.csv", desc="Saving slow tracks count")
save_dataframe_with_progress(total_count_df, Results_Folder + '/Track_Counts/' +"total_tracks_count.csv", desc="Saving total tracks count")

# Plot and save fit results
fit_df = plot_tracks_with_percentage(count_df, total_count_df, window_size)
save_dataframe_with_progress(fit_df, Results_Folder + '/Track_Counts/' +"fit_results.csv", desc="Saving fit results")

In [None]:
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
import numpy as np

# @title #Plot the fits

def plot_combined_fits(fit_df, save_path):
    conditions = fit_df['Condition'].unique()

    # Create a PDF file for the plots
    pdf_filename = "combined_fits.pdf"
    pdf_filepath = os.path.join(save_path, pdf_filename)
    pdf = PdfPages(pdf_filepath)

    for condition in conditions:
        condition_df = fit_df[fit_df['Condition'] == condition]

        fig, ax = plt.subplots(figsize=(8, 6))

        for index, row in condition_df.iterrows():
            x_vals = np.linspace(0, 100, 400)  # Modify as needed
            y_vals = row['m'] * x_vals + row['b']

            ax.plot(x_vals, y_vals, label=f'Fit for {row["Repeat"]}: y = {row["m"]:.2f}x + {row["b"]:.2f}')

        ax.set_title(f'Combined Fits for {condition}')
        ax.legend(loc='upper left')

        plt.tight_layout()

        # Save the current plot to the PDF
        pdf.savefig(fig)
        plt.show()
        plt.close(fig)

    # Close the PDF object
    pdf.close()
    print(f"All combined fit plots saved to {pdf_filepath}")

# Example usage
save_path = Results_Folder + '/Track_Counts/'
plot_combined_fits(fit_df, save_path)


In [None]:
# @title #Plot the fits as bars

plt.figure(figsize=(10, 6))

# Barplot to display the mean value for each condition
sns.barplot(data=fit_df, x='Condition', y='m', color='lightgray', estimator=np.mean, errorbar=None, alpha=0.7, label='Mean')

# Strip plot to show individual repeats for each condition (repeats with a negative slope are not shown)
filtered_df = fit_df[fit_df['m'] >= 0]
sns.stripplot(data=filtered_df, x='Condition', y='m', hue='Repeat', palette="Set3", size=7)

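# Annotating the bars with the mean values
# (sort=False keeps the conditions in order of appearance, matching the bar order for string labels)
for index, value in enumerate(fit_df.groupby('Condition', sort=False)['m'].mean()):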
    if value >= 0:  # Only annotate bars with non-negative means
        plt.text(index, value + 0.01, f'{value:.2f}', ha='center')

plt.title("Variation of 'm' value for each condition")
plt.ylabel("Slope (m)")
plt.xlabel("Condition")
plt.legend(title='Legend')

# Set y-axis to start from 0
plt.ylim(bottom=0)

# Rotate x-axis labels
plt.xticks(rotation=90)

# Adjust layout to make room for the x-tick labels
plt.gcf().subplots_adjust(bottom=0.15)

# Save the figure

plt.savefig(Results_Folder + '/Track_Counts/m_value_variation_per_condition_combined.pdf', bbox_inches='tight')
plt.show()

In [None]:
import pandas as pd

# @title #Plot the profiles together

window_size = 10  # Adjust this value as needed

# Define the order for 'Flow_speed'
flow_speed_order = ["300", "200", "100", "wash"]

# Convert 'Flow_speed' to string
count_df['Flow_speed'] = count_df['Flow_speed'].astype(str)

count_df['unique_combinations'] = count_df['Cells'].astype(str) + '_' + count_df['ILbeta'].astype(str) + '_' + count_df['Repeat'].astype(str)

# Function to create POSITION_T_REPEAT values
def create_position_t_repeat(group):
    # Apply custom sorting within the group
    group = group.copy()
    group['Flow_speed'] = pd.Categorical(group['Flow_speed'], categories=flow_speed_order, ordered=True)
    group = group.sort_values(by=['Flow_speed', 'POSITION_T'])

    # Calculate POSITION_T_REPEAT: consecutive 0.04 steps (0.04, 0.08, ...) along the sorted rows
    group['POSITION_T_REPEAT'] = np.arange(1, len(group) + 1) * 0.04

    return group

# Add a placeholder column for POSITION_T_REPEAT
count_df['POSITION_T_REPEAT'] = 0.0  # float placeholder, since the values written later are floats

# Apply the function to each unique combination of 'Cells', 'ILbeta', 'Repeat'
updated_dfs = []
unique_combinations = count_df[['Cells', 'ILbeta', 'Repeat']].drop_duplicates()
for _, row in unique_combinations.iterrows():
    combo_df = count_df[(count_df['Cells'] == row['Cells']) &
                        (count_df['ILbeta'] == row['ILbeta']) &
                        (count_df['Repeat'] == row['Repeat'])]
    updated_combo_df = create_position_t_repeat(combo_df)
    updated_dfs.append(updated_combo_df)

# Concatenate the updated DataFrames
count_df = pd.concat(updated_dfs, ignore_index=True)

# Function to apply rolling average on 'Unique_ID' column
def apply_rolling_average(group):
    group['Unique_ID_Rolling'] = group['Unique_ID'].rolling(window=window_size, min_periods=1).mean()
    return group

# Apply the rolling average function to each group
count_df = count_df.groupby('unique_combinations', group_keys=False).apply(apply_rolling_average)

save_dataframe_with_progress(count_df, Results_Folder + '/Track_Counts/' +"slow_tracks_count.csv", desc="Saving slow tracks count")

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.backends.backend_pdf import PdfPages
import os

# Get unique combinations
unique_combinations = count_df['unique_combinations'].unique()

# Create a PDF file for the plots
pdf_filename = "Profiles_together.pdf"
pdf_filepath = os.path.join(f"{Results_Folder}/Track_Counts/", pdf_filename)
pdf = PdfPages(pdf_filepath)

# Loop through each unique combination and plot
for combination in unique_combinations:
    # Filter data for the specific combination
    combo_df = count_df[count_df['unique_combinations'] == combination]

    # Define the filename based on the combination
    filename = f"{combination}_data.csv"
    filepath = os.path.join(f"{Results_Folder}/Track_Counts/", filename)

    # Save the dataframe
    save_dataframe_with_progress(combo_df, filepath)
    print(f"Dataframe for {combination} saved to {filepath}")

    # Create a new figure for each plot
    plt.figure(figsize=(10, 6))
    sns.lineplot(data=combo_df, x='POSITION_T_REPEAT', y='Unique_ID_Rolling', label=combination)
    plt.title(f'Track Count over Time - {combination}')
    plt.xlabel('Time (Continuous)')
    plt.ylabel('Number of Tracks')
    plt.grid(True)
    plt.legend()

    # Save the current plot to the PDF
    pdf.savefig()
    plt.show()
    plt.close()

# Close the PDF object
pdf.close()
print(f"All plots saved to {pdf_filepath}")
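
<font size = 4>The cell above stitches the four flow-speed segments of each `Cells`/`ILbeta`/`Repeat` combination into one continuous time axis (`POSITION_T_REPEAT`). The sketch below illustrates the idea on a made-up table: an ordered categorical forces the experimental order of the flow speeds, and the sorted rows are then numbered in steps of 0.04, the step used by the notebook. The toy values are assumptions.

```python
import numpy as np
import pandas as pd

flow_speed_order = ["300", "200", "100", "wash"]
time_step = 0.04  # step used by the notebook for POSITION_T_REPEAT

# Hypothetical counts for one combination, deliberately out of order.
toy = pd.DataFrame({
    "Flow_speed": ["wash", "300", "100", "300", "200", "100"],
    "POSITION_T": [0.0, 0.04, 0.0, 0.0, 0.0, 0.04],
    "Unique_ID":  [9, 2, 6, 1, 4, 7],
})

# An ordered categorical makes sort_values follow the experimental order
# (300, 200, 100, wash) instead of alphabetical order.
toy["Flow_speed"] = pd.Categorical(toy["Flow_speed"], categories=flow_speed_order, ordered=True)
toy = toy.sort_values(by=["Flow_speed", "POSITION_T"]).reset_index(drop=True)

# Number the sorted rows consecutively to obtain one continuous time axis.
toy["POSITION_T_REPEAT"] = np.arange(1, len(toy) + 1) * time_step

print(toy)
```

<font size = 4>Because the rows are simply numbered, the continuous axis assumes that each time point contributes exactly one row per flow speed. Time points with no slow track are absent from the table and therefore shorten the axis.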




In [None]:
import pandas as pd
import numpy as np

# @title #Make the attachment profiles the same length

# Function to adjust series length
def adjust_series_length(group, target_length=8750, interval=0.04, window_size=10):
    # Sort by POSITION_T_REPEAT
    group = group.sort_values(by='POSITION_T_REPEAT')

    # Metadata columns to be copied
    metadata_columns = ['Cells', 'ILbeta', 'Repeat', 'Flow_speed', 'Condition','unique_combinations']  # Add other relevant columns here

    # Check the length of the current series
    current_length = len(group)

    if current_length > target_length:
        # If the series is too long, trim it
        return group.head(target_length)
    elif current_length < target_length:
        # If the series is too short, evenly distribute and fill
        distributed_indices = np.linspace(0, target_length - 1, len(group)).astype(int)
        new_df = pd.DataFrame({'POSITION_T_REPEAT': np.arange(0, target_length) * interval})
        new_df['Slow_Track_Count_Rolling'] = np.nan  # Initialize with NaN
        new_df.loc[distributed_indices, 'Slow_Track_Count_Rolling'] = group['Slow_Track_Count_Rolling'].values
        new_df['Slow_Track_Count_Rolling'] = new_df['Slow_Track_Count_Rolling'].rolling(window=window_size, min_periods=1, center=True).mean()

        new_df['Unique_ID'] = np.nan  # Initialize with NaN
        new_df.loc[distributed_indices, 'Unique_ID'] = group['Unique_ID'].values
        new_df['Unique_ID'] = new_df['Unique_ID'].rolling(window=window_size, min_periods=1, center=True).mean()

        new_df['Unique_ID_Rolling'] = np.nan  # Initialize with NaN
        new_df.loc[distributed_indices, 'Unique_ID_Rolling'] = group['Unique_ID_Rolling'].values
        new_df['Unique_ID_Rolling'] = new_df['Unique_ID_Rolling'].rolling(window=window_size, min_periods=1, center=True).mean()

        # Copy metadata columns
        for col in metadata_columns:
            new_df[col] = group[col].iloc[0]

        return new_df
    else:
        # If the series is already the right length, return as is
        return group

# Apply the function to each unique combination
adjusted_dfs = []
for unique_combination in count_df['unique_combinations'].unique():
    group = count_df[count_df['unique_combinations'] == unique_combination]
    adjusted_df = adjust_series_length(group)
    adjusted_dfs.append(adjusted_df)

# Combine all adjusted dataframes
adjusted_df = pd.concat(adjusted_dfs, ignore_index=True)

# Check for NaNs in the adjusted dataframe
def check_for_nans(df, df_name):
    nan_columns = df.columns[df.isna().any()].tolist()
    if nan_columns:
        print(f"NaNs found in {df_name} in columns: {nan_columns}")
    else:
        print(f"No NaNs found in {df_name}.")


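# Drop the 'POSITION_T' column from adjusted_df
# (errors='ignore' avoids a KeyError when every series was rebuilt without it)
adjusted_df.drop(columns='POSITION_T', inplace=True, errors='ignore')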

# Group by unique_combinations, sort by POSITION_T_REPEAT, and renumber
adjusted_df = adjusted_df.groupby('unique_combinations', group_keys=False).apply(lambda x: x.sort_values(by='POSITION_T_REPEAT'))
adjusted_df['POSITION_T_REPEAT'] = adjusted_df.groupby('unique_combinations').cumcount() * 0.04

check_for_nans(adjusted_df, "adjusted_df")

save_dataframe_with_progress(adjusted_df, Results_Folder + '/Track_Counts/' +"slow_tracks_count_adjusted.csv", desc="Saving slow tracks count adjusted")


import matplotlib.pyplot as plt
import seaborn as sns

# Plot all original and interpolated time series in a single plot
plt.figure(figsize=(12, 8))

# Loop through each unique combination and add both original and interpolated series to the plot
for unique_id in count_df['unique_combinations'].unique():
    original_series = count_df[count_df['unique_combinations'] == unique_id]
    processed_series = adjusted_df[adjusted_df['unique_combinations'] == unique_id]

    # Plot original series
    sns.lineplot(data=original_series, x='POSITION_T_REPEAT', y='Unique_ID_Rolling', label=f'{unique_id} Original', linestyle='--')

    # Plot interpolated series
    sns.lineplot(data=processed_series, x='POSITION_T_REPEAT', y='Unique_ID_Rolling', label=f'{unique_id} Interpolated')

plt.title('Original vs Interpolated Time Series for All Combinations')
plt.xlabel('POSITION_T_REPEAT')
plt.ylabel('Unique_ID_Rolling')  # matches the column that is actually plotted
plt.legend(title='Series Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True)
plt.tight_layout()
plt.show()
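
<font size = 4>When a series is shorter than the target length, `adjust_series_length` spreads its values over evenly spaced slots and fills the gaps with a centred rolling mean. The sketch below isolates that step on a tiny made-up series. `stretch_series` and its default window are hypothetical and only illustrate the technique.

```python
import numpy as np
import pandas as pd

def stretch_series(values, target_length, window_size=3):
    """Hypothetical helper: spread `values` over `target_length` slots and smooth the gaps."""
    values = np.asarray(values, dtype=float)
    # Evenly spaced integer slots: first sample -> 0, last sample -> target_length - 1.
    slots = np.linspace(0, target_length - 1, len(values)).astype(int)
    stretched = pd.Series(np.nan, index=range(target_length))
    stretched.iloc[slots] = values
    # The rolling mean skips NaN, so each empty slot takes the mean of the
    # known values that fall inside its centred window.
    return stretched.rolling(window=window_size, min_periods=1, center=True).mean()

short = [0.0, 2.0, 4.0]
stretched = stretch_series(short, target_length=5)
print(stretched.tolist())
```

<font size = 4>An empty slot stays NaN when its window contains no known value, which can happen if the series is much shorter than the target length relative to the window size. The NaN check in the cell above is there to flag this case.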


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import os

# @title #Plot altogether


# Get unique combinations of 'Cells' and 'ILbeta'
unique_cells_ilbeta = adjusted_df[['Cells', 'ILbeta']].drop_duplicates()

# Adjust figure size and layout
fig, ax = plt.subplots(figsize=(12, 8))  # Adjusted figure size

for _, row in unique_cells_ilbeta.iterrows():
    cells, ilbeta = row['Cells'], row['ILbeta']
    combo_df = adjusted_df[(adjusted_df['Cells'] == cells) & (adjusted_df['ILbeta'] == ilbeta)]

    filename = f"{cells}_{ilbeta}_data.csv"
    filepath = os.path.join(Results_Folder + '/Track_Counts/', filename)
    combo_df.to_csv(filepath, index=False)
    print(f"Dataframe for {cells}, {ilbeta} saved to {filepath}")

    sns.lineplot(data=combo_df, x='POSITION_T_REPEAT', y='Unique_ID_Rolling', label=f"{cells}, {ilbeta}", errorbar="se")

# Manually adjust y-axis limits
current_ylim = ax.get_ylim()
ax.set_ylim(current_ylim[0], current_ylim[1] * 1.1)

# Add horizontal lines for different Flow_speed segments
ax.hlines(y=current_ylim[1]*1.00, xmin=0, xmax=87, colors='gray', linestyles='solid', lw=5)
ax.hlines(y=current_ylim[1]*1.00, xmin=88, xmax=175, colors='gray', linestyles='solid', lw=5)
ax.hlines(y=current_ylim[1]*1.00, xmin=176, xmax=263, colors='gray', linestyles='solid', lw=5)
ax.hlines(y=current_ylim[1]*1.00, xmin=264, xmax=350, colors='gray', linestyles='solid', lw=5)

ax.text(40, current_ylim[1]*1.03, '300', horizontalalignment='center')
ax.text(130, current_ylim[1]*1.03, '200', horizontalalignment='center')
ax.text(220, current_ylim[1]*1.03, '100', horizontalalignment='center')
ax.text(310, current_ylim[1]*1.03, 'Wash', horizontalalignment='center')

ax.set_title('Track Count over Time')
ax.set_xlabel('Time (s)')
ax.set_ylabel('Number of Tracks')

# Place the legend outside the plot on the right
ax.legend(title='Conditions', loc='center left', bbox_to_anchor=(1, 0.5))

plt.tight_layout()

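# Save the plot as a PDF under its own name so that it does not overwrite Profiles_together.pdf
# (the file name below is chosen here; adjust it as needed)
altogether_pdf_filepath = os.path.join(Results_Folder, 'Track_Counts', 'Profiles_altogether.pdf')
plt.savefig(altogether_pdf_filepath)
plt.show()
plt.close()
print(f"Plot saved to {altogether_pdf_filepath}")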


--------------------------------------------------------
# **Part 3: Filter and plot your data**
--------------------------------------------------------

In [None]:
import pandas as pd
import ipywidgets as widgets
from IPython.display import display

# @title #Filter the data

count_df = adjusted_df


# Module-level variables that store the filtered data and the selected options
# (a `global` statement is only needed inside functions that reassign them)
filtered_df = pd.DataFrame()

selected_cells, selected_speeds, selected_ilbetas = [], [], []

# Function to summarize selected options into a string
def summarize_options(options):
    return "_".join([str(option) for option in options if option])  # Filters out any 'falsy' values like empty strings or None

# Function to create a filename based on selected options
def create_filename(selected_cells, selected_speeds, selected_ilbetas):
    # Join the summarized options for each parameter with an underscore
    selected_options = "_".join([
        summarize_options(selected_cells),
        summarize_options(selected_speeds),
        summarize_options(selected_ilbetas)
    ])

    # Replace spaces with underscores and return the filename
    filename = f"{selected_options}"
    return filename.replace(" ", "_")

# Create checkboxes for each category
cells_checkboxes = [widgets.Checkbox(value=False, description=str(cell)) for cell in count_df['Cells'].unique()]
flow_speed_checkboxes = [widgets.Checkbox(value=False, description=str(speed)) for speed in count_df['Flow_speed'].unique()]
ilbeta_checkboxes = [widgets.Checkbox(value=False, description=str(ilbeta)) for ilbeta in count_df['ILbeta'].unique()]

# Function to filter dataframe and update global variables based on selected checkbox values
def filter_dataframe(button):
    global filtered_df, selected_cells, selected_speeds, selected_ilbetas

    # Normalize the filter columns to stripped strings so they compare cleanly
    # with the checkbox descriptions, which are always strings
    for column in ['Cells', 'Flow_speed', 'ILbeta']:
        count_df[column] = count_df[column].astype(str).str.strip()

    selected_cells = [box.description.strip() for box in cells_checkboxes if box.value]
    selected_speeds = [box.description.strip() for box in flow_speed_checkboxes if box.value]
    selected_ilbetas = [box.description.strip() for box in ilbeta_checkboxes if box.value]

    # Debugging output
    print("Selected Cells:", selected_cells)
    print("Selected Speeds:", selected_speeds)
    print("Selected ILbetas:", selected_ilbetas)
    print("Original DF length:", len(count_df))

    filtered_df = count_df[
        (count_df['Cells'].isin(selected_cells)) &
        (count_df['Flow_speed'].isin(selected_speeds)) &
        (count_df['ILbeta'].isin(selected_ilbetas))
    ]

    # More debugging output
    print("Filtered DF length:", len(filtered_df))
    if len(filtered_df) == 0:
        print("No data matched the selected filters. Check filters and data for consistency.")
        print("Unique 'Cells' in DataFrame:", count_df['Cells'].unique())
        print("Unique 'Flow_speed' in DataFrame:", count_df['Flow_speed'].unique())
        print("Unique 'ILbeta' in DataFrame:", count_df['ILbeta'].unique())

    print("Done")

# Button to trigger dataframe filtering; clicking it stores the result in filtered_df
filter_button = widgets.Button(description="Filter Dataframe")
filter_button.on_click(filter_dataframe)

# Display checkboxes and button
display(widgets.VBox([
    widgets.Label('Select Cells:'),
    widgets.HBox(cells_checkboxes),
    widgets.Label('Select Flow Speed:'),
    widgets.HBox(flow_speed_checkboxes),
    widgets.Label('Select ILbeta:'),
    widgets.HBox(ilbeta_checkboxes),
    filter_button
]))


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import os

# @title #Plot selected conditions

# Create the output directory if it does not already exist
output_dir = os.path.join(Results_Folder, 'Track_Counts')
os.makedirs(output_dir, exist_ok=True)

# Build the output filename from the options selected in the filter cell
filename = create_filename(selected_cells, selected_speeds, selected_ilbetas)

pdf_filepath = os.path.join(output_dir, filename + '_plot.pdf')

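# Stop early if the filter cell has not been run or matched no rows
if filtered_df.empty:
    raise ValueError("No data to plot. Select conditions and click 'Filter Dataframe' first.")

# Get unique combinations of 'Cells' and 'ILbeta'
unique_cells_ilbeta = filtered_df[['Cells', 'ILbeta']].drop_duplicates()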

# Adjust figure size and layout
fig, ax = plt.subplots(figsize=(12, 8))  # Adjusted figure size

for _, row in unique_cells_ilbeta.iterrows():
    cells, ilbeta = row['Cells'], row['ILbeta']
    combo_df = filtered_df[(filtered_df['Cells'] == cells) & (filtered_df['ILbeta'] == ilbeta)]

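    # Include the condition in the CSV name so that each combination gets its own file
    # instead of overwriting the one saved in the previous iteration
    filepath = os.path.join(Results_Folder, 'Track_Counts', f"{filename}_{cells}_{ilbeta}_data.csv")
    combo_df.to_csv(filepath, index=False)
    print(f"Dataframe for {cells}, {ilbeta} saved to {filepath}")

    sns.lineplot(data=combo_df, x='POSITION_T_REPEAT', y='Unique_ID_Rolling', label=f"{cells}, {ilbeta}", errorbar="se", ax=ax)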

# Extend the y-axis by 10% to leave room for the flow-speed bars and labels
current_ylim = ax.get_ylim()
ax.set_ylim(current_ylim[0], current_ylim[1] * 1.1)

# Add horizontal lines for different Flow_speed segments
ax.hlines(y=current_ylim[1]*1.00, xmin=0, xmax=87, colors='gray', linestyles='solid', lw=5)
ax.hlines(y=current_ylim[1]*1.00, xmin=88, xmax=175, colors='gray', linestyles='solid', lw=5)
ax.hlines(y=current_ylim[1]*1.00, xmin=176, xmax=263, colors='gray', linestyles='solid', lw=5)
ax.hlines(y=current_ylim[1]*1.00, xmin=264, xmax=350, colors='gray', linestyles='solid', lw=5)

ax.text(40, current_ylim[1]*1.03, '300', horizontalalignment='center')
ax.text(130, current_ylim[1]*1.03, '200', horizontalalignment='center')
ax.text(220, current_ylim[1]*1.03, '100', horizontalalignment='center')
ax.text(310, current_ylim[1]*1.03, 'Wash', horizontalalignment='center')

ax.set_title('Track Count over Time')
ax.set_xlabel('Time (s)')
ax.set_ylabel('Number of Tracks')

# Place the legend outside the plot on the right
ax.legend(title='Conditions', loc='center left', bbox_to_anchor=(1, 0.5))

plt.tight_layout()

# Save the plot as a PDF
plt.savefig(pdf_filepath)
plt.show()
plt.close()
print(f"Plot saved to {pdf_filepath}")
