<img src="../assets/teracyte-logo.png" alt="Teracyte Logo" width="300"/>

<a href="https://colab.research.google.com/github/TeraCyte-ai/notebooks-demo/blob/main/notebooks/teracyte_data_overview.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# TeraCyte Data Overview Notebook

Welcome to TeraCyte's Data Overview notebook.

This notebook provides an interactive, step-by-step analysis of TeraCyte's time-resolved single-cell imaging dataset. It guides you through setting up the environment, exploring metadata, visualizing sample data, and performing basic data analysis.

Use the table of contents to navigate through the different sections:

1.  **🛠️ Setup**: Install necessary packages and define user-specific experiment details.
2.  **📋 Metadata Overview**: Review experiment and sample metadata.
3.  **🔬 Sample Viewer**: Visualize raw image data at the FOV and well levels.
4.  **📊 Data Analysis**: Query and analyze data, including creating heatmaps and scatter plots.

Follow the instructions in each section to explore and analyze your data.

# 🛠️ Setup

## 📥 Packages and libraries

This section ensures that all necessary Python packages and libraries are installed and imported for the notebook to function correctly.

In [None]:
%%capture
# Install TeraCyte Notebooks Utils package directly from GitHub
!pip install git+https://github.com/TeraCyte-ai/notebooks-demo.git

In [None]:
import teracyte_notebooks_utils as tnu
from teracyte_notebooks_utils import Sample
from teracyte_notebooks_utils.metadata_display import display_hardware_metadata, display_sample_metadata, display_serial_number_records
from teracyte_notebooks_utils.vizarr_viewer import create_fovs_vizarr_viewer
from teracyte_notebooks_utils import create_wells_groups_manager
from teracyte_notebooks_utils.data_query import create_interactive_data_query, download_query_data_csv
from teracyte_notebooks_utils.analysis_plots import *
tnu.set_service_ip('51.124.52.51')

## 🧑‍💻 User setup

Set `USER_ID` to your user ID and `SERIAL_NUMBER` to your sample's serial number to retrieve all experiments and assays related to that sample.

Run the cell below after setting both values.

In [None]:
# --- USER INPUT REQUIRED ---
USER_ID = "<USER_ID>"
SERIAL_NUMBER = "<SERIAL_NUMBER>"

In [None]:
display_serial_number_records(SERIAL_NUMBER, USER_ID)

Set `EXP_ID` and `ASSAY_ID` in the cell below to select the experiment and assay you wish to analyze.

Run the cell below after setting these values to continue.

In [None]:
# --- USER INPUT REQUIRED ---
EXP_ID = "<EXP_ID>"
ASSAY_ID = "<ASSAY_ID>"
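
Before initializing the sample, it can help to confirm that no placeholder was left unedited. The sketch below is not part of `teracyte_notebooks_utils`: `find_unset` is a hypothetical helper, and it only assumes that unedited placeholders keep the `<NAME>` form used above. The values in `example` are made up for illustration.

```python
def find_unset(settings):
    """Return the names whose values are empty or still look like '<PLACEHOLDER>'."""
    unset = []
    for name, value in settings.items():
        text = str(value).strip()
        if not text or (text.startswith("<") and text.endswith(">")):
            unset.append(name)
    return unset


# Made-up example values; in the notebook, pass the variables defined above instead.
example = {
    "USER_ID": "<USER_ID>",
    "SERIAL_NUMBER": "SN-0001",
    "EXP_ID": "",
    "ASSAY_ID": "a1",
}
print(find_unset(example))
```

In the notebook, the same check could be applied to `{"USER_ID": USER_ID, "SERIAL_NUMBER": SERIAL_NUMBER, "EXP_ID": EXP_ID, "ASSAY_ID": ASSAY_ID}` before creating the `Sample` object.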

In [None]:
# Initialize Sample object
sample = Sample(serial_number=SERIAL_NUMBER, exp_id=EXP_ID, assay_id=ASSAY_ID)

# 📋 Metadata Overview

Metadata provides information about the experiment, sample, and hardware used for data acquisition. Reviewing the metadata helps you understand the context of your data and any relevant experimental parameters.

In [None]:
# Display metadata using the sample object
display_hardware_metadata(sample.exp_metadata, sample.sample_metadata)
display_sample_metadata(sample.exp_metadata, sample.sample_metadata)

# 🔬 Sample Viewer

This section allows you to visually inspect your data at different levels, from Fields of View (FOVs) to individual wells.

## 🖼️ FOV Viewer

The FOV image dataset is stored in Zarr format, optimized for large-scale imaging data.
Each FOV dataset has shape (T, C, Z, Y, X), where:
  - **T**: Timepoint (sequence)
  - **C**: Channel (based on the filter set)
  - **Z**: Z-slice (based on the z-axis position in micrometers)
  - **Y**: Height of the image (pixels)
  - **X**: Width of the image (pixels)
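
To make the axis order concrete, here is a minimal sketch in plain Python. The shape values are hypothetical, and `describe_fov` and `plane_selector` are illustrative helpers, not functions from `teracyte_notebooks_utils`.

```python
AXES = ("T", "C", "Z", "Y", "X")


def describe_fov(shape):
    """Map axis names to sizes and count the 2D (Y, X) planes in the dataset."""
    if len(shape) != len(AXES):
        raise ValueError(f"expected {len(AXES)} axes, got {len(shape)}")
    sizes = dict(zip(AXES, shape))
    n_planes = sizes["T"] * sizes["C"] * sizes["Z"]
    return sizes, n_planes


def plane_selector(t, c, z):
    """Index tuple that picks one full (Y, X) image at timepoint t, channel c, z-slice z."""
    return (t, c, z, slice(None), slice(None))


# Hypothetical shape: 10 timepoints, 3 channels, 5 z-slices, 2048 x 2048 pixels.
sizes, n_planes = describe_fov((10, 3, 5, 2048, 2048))
print(sizes["C"], n_planes)
```

For a NumPy-style array `arr` with this layout, `arr[plane_selector(0, 1, 2)]` is equivalent to `arr[0, 1, 2, :, :]`, which is a single 2D image.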

In [None]:
# Create and display the Vizarr viewer UI
ui = create_fovs_vizarr_viewer(sample=sample)
display(ui)

## 🔄 Workflows Progress

In [None]:
workflow_selector = create_workflow_selector(sample=sample)
display(workflow_selector)

# 📊 Data Analysis

## 🔍 Interactive Data Query

Select partition values to filter the experimental data. You can choose specific values or select "All" to include all available values for each partition.
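
The filtering logic described above can be sketched in a few lines. This is only an illustration of the "specific value or All" rule, not the implementation behind `create_interactive_data_query`; the partition names `channel` and `sequence` and the row layout are assumptions.

```python
def filter_by_partitions(rows, selection):
    """Keep rows that match every partition choice; 'All' places no restriction."""
    return [
        row
        for row in rows
        if all(
            choice == "All" or row[key] == choice
            for key, choice in selection.items()
        )
    ]


# Hypothetical rows with two partitions and one measured value.
rows = [
    {"channel": "GFP", "sequence": 0, "value": 1.2},
    {"channel": "GFP", "sequence": 1, "value": 1.9},
    {"channel": "CY5", "sequence": 0, "value": 0.4},
]
selected = filter_by_partitions(rows, {"channel": "GFP", "sequence": "All"})
print(len(selected))
```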

In [None]:
# Create and display the interactive data query
interactive_query = create_interactive_data_query(sample)
display(interactive_query)

## ⬇️ Download Data as CSV

In [None]:
download_query_data_csv(sample)

## ☑️ Select Data Type

In [None]:
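# A plain list has no `.value`; use an ipywidgets dropdown so that
# `df_selector.value` in the following cells holds the selected data type.
import ipywidgets as widgets
df_selector = widgets.Dropdown(options=["wells_data", "cells_data"], description="Data type:")
display(df_selector)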

In [None]:
# Get the selected dataframe
data = sample.get_dataframe(df_selector.value)

## 🧹 Preprocessing & Outlier Removal
Before analysis, we clean the dataset by removing invalid values and extreme outliers. Outliers are detected with the interquartile range (IQR) method, which derives its thresholds from percentiles of the data. This step makes the data more robust and reliable for downstream analysis.

**Filtering options:**

- Default usage

- Custom feature list

- Different grouping

- No grouping (global outlier removal)
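
As a rough illustration of IQR-based filtering with and without grouping, the sketch below uses only the standard library. It is not the code behind `create_outlier_filtering_controls`; the multiplier `k=1.5` (Tukey's convention), the helper names, and the `(group, value)` row layout are assumptions.

```python
from collections import defaultdict
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) limits: Q1 - k*IQR and Q3 + k*IQR. Needs at least 2 values."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Global outlier removal: one pair of limits for all values."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


def remove_outliers_grouped(rows, k=1.5):
    """Grouped outlier removal: rows are (group, value) pairs, limits are per group."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    bounds = {group: iqr_bounds(vals, k) for group, vals in groups.items()}
    return [(g, v) for g, v in rows if bounds[g][0] <= v <= bounds[g][1]]


# Hypothetical data: 100 is extreme for GFP but typical for CY5.
rows = [("GFP", v) for v in [1, 2, 3, 4, 5, 6, 7, 8, 100]]
rows += [("CY5", v) for v in [100, 101, 102, 103, 104]]
kept = remove_outliers_grouped(rows)
print(len(rows), len(kept))
```

Grouping matters when groups have different baselines: here the value 100 is dropped from the GFP group but kept in the CY5 group, which a single global threshold could not do.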


In [None]:
# Create and display the outlier filtering controls
filtered_df = create_outlier_filtering_controls(data, df_selector.value)

## ⏱️ Single Time Point Analysis (For a specific sequence you choose)

### 🏁 Sample Heatmap

This section allows you to visualize the values of a selected feature across fields of view (FOVs) on the chip layout. It helps in understanding the spatial distribution of that feature across the chip. There are two cells in this section: the first provides an interactive interface for choosing parameters, and the second generates the corresponding heatmap.

**Key Features:**
- **Interactive Parameter Selection**: Choose feature, channel, sequence, and visualization options through user-friendly widgets
- **Spatial Distribution Visualization**: See how your selected feature varies across different positions on the chip
- **Channel-Specific Analysis**: Focus on specific channels (e.g., Brightfield, GFP, CY5) for targeted analysis
- **Time Point Selection**: Analyze data from specific sequence time points
- **Quality Control**: Identify outlier FOVs or systematic spatial variations in your data
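
Conceptually, a chip heatmap aggregates one value per FOV and places it at that FOV's position on the chip. The sketch below shows this idea with per-FOV means; the `(fov_row, fov_col, value)` record layout and the grid size are assumptions, and the aggregation actually used by `chip_heatmap_controls` may differ.

```python
from collections import defaultdict
from statistics import mean


def fov_mean_grid(records, n_rows, n_cols):
    """Build a row-major grid of per-FOV means; cells without data stay None."""
    buckets = defaultdict(list)
    for fov_row, fov_col, value in records:
        buckets[(fov_row, fov_col)].append(value)
    return [
        [mean(buckets[(r, c)]) if (r, c) in buckets else None for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical measurements on a 2 x 2 chip layout.
records = [(0, 0, 1.0), (0, 0, 3.0), (1, 1, 5.0)]
grid = fov_mean_grid(records, n_rows=2, n_cols=2)
for row in grid:
    print(row)
```

Empty cells are kept as `None` so that missing FOVs remain visibly distinct from FOVs whose value is zero.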

In [None]:
# Choose the heatmap parameters
chip_heatmap_controls(filtered_df, sample, df_selector.value)

### ✳️ Scatter Plot

This section allows you to create interactive scatter plots to explore relationships in your data. The scatter plot tool provides powerful visualization and data selection capabilities.

🎯 **Comparison Modes:**

- **Feature Comparison**: Compare different features (e.g., intensity vs area) within a single channel
- **Channel Comparison**: Compare the same feature across different channels

🔗 **Clustering**:

Enable clustering to automatically group similar data points using the DBSCAN algorithm. This helps identify distinct populations, outliers, or patterns within your scatter plot, making it easier to interpret complex relationships in your data. Clusters are color-coded for visual clarity.
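
For readers unfamiliar with DBSCAN, here is a compact pure-Python sketch of the algorithm. It is a teaching version, not the implementation used by the scatter plot tool. In DBSCAN, `eps` is the neighborhood radius in the units of the plotted data, and `min_samples` is the number of points (including the point itself) needed within that radius for a point to count as a core point; the points and parameter values below are made up.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = noise  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


# Two tight groups and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_samples=3))
```

A larger `eps` merges nearby groups into fewer clusters, while a smaller one splits them and marks more points as noise, so the value should be chosen relative to the scale of the plotted features.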


💾 **Data Selection and Group Management:**
- **Select Data Points**: Use the interactive selection tools (box select, lasso select) to highlight specific regions
- **Save Selected Groups**: Save your selected data points as named groups (in the *saved_queries* variable) for future analysis
- **Global Index Extraction**: Selected points provide global indexes that can be used for:
  - Creating custom analysis groups
  - Filtering data in other sections
  - Exporting specific cell populations
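
Once a group of global indexes has been saved, reusing it elsewhere amounts to a membership filter. The sketch below assumes a mapping from group name to a list of global indexes and rows carrying a `global_index` field. Both are illustrative assumptions, since the actual structure of `saved_queries` is not documented here.

```python
def rows_for_group(rows, saved_groups, name, index_key="global_index"):
    """Return the rows whose global index belongs to the named saved group."""
    wanted = set(saved_groups[name])  # set lookup keeps the filter fast
    return [row for row in rows if row[index_key] in wanted]


# Hypothetical saved selection and data rows.
saved_groups = {"bright_cells": [2, 5]}
rows = [
    {"global_index": 1, "intensity": 10.0},
    {"global_index": 2, "intensity": 95.0},
    {"global_index": 5, "intensity": 88.0},
]
subset = rows_for_group(rows, saved_groups, "bright_cells")
print([row["global_index"] for row in subset])
```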


In [None]:
# Initialize comparison mode selector
comparison_mode_widget = create_comparison_mode_selector()

In [None]:
# Create the parameters for the current mode
controls = create_scatter_controls(filtered_df, comparison_mode_widget.value, sample, df_selector.value)

In [None]:
# Create the scatter plot
saved_queries = plot_interactive_scatter(filtered_df, controls, eps=500)

## ⏳ Multi Time Point Analysis

### 📊 Histograms

This section provides interactive controls for generating multi-timepoint histograms to analyze feature distributions across different time points. Compare how cellular measurements evolve over time and identify population shifts or changes in distribution patterns.

**Key Features:**
- **Multi-timepoint Analysis**: Compare feature distributions across selected time points
- **Flexible Scaling**: Linear, Log, or SymLog scaling options with customizable thresholds  
- **Display Modes**: Choose between histogram bars, smooth outlines, or cumulative distributions
- **Advanced Sampling**: Constant N option ensures equal sample sizes across time points
- **Data Filtering**: Built-in negative CTCF filtering and customizable transparency controls
- **Interactive Controls**: Full parameter customization through organized widget interface

**Analysis Capabilities:**
- Track population distribution changes over time
- Identify shifts in cellular feature patterns
- Compare treatment effects across time points
- Visualize data quality and outlier patterns
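
Two of the options above, negative-value filtering and constant-N sampling, can be sketched with the standard library. This is an illustration under stated assumptions, not the notebook's implementation: it assumes that "Constant N" means subsampling every time point to the size of the smallest one, and the measurements are made up.

```python
import random


def drop_negative(values):
    """Discard negative measurements (for example, negative CTCF values)."""
    return [v for v in values if v >= 0]


def constant_n_sample(groups, seed=0):
    """Randomly subsample every time point down to the size of the smallest one."""
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    n = min(len(values) for values in groups.values())
    return {t: rng.sample(values, n) for t, values in groups.items()}


# Hypothetical measurements keyed by sequence number.
by_sequence = {
    0: drop_negative([-3.0, 12.0, 15.5, 9.1, 20.2]),
    1: drop_negative([30.4, 28.9, -1.2]),
}
balanced = constant_n_sample(by_sequence)
print({t: len(values) for t, values in balanced.items()})
```

Equal sample sizes make histogram heights comparable across time points, at the cost of discarding data from the larger groups.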

In [None]:
# Create and display the controls
hist_controls_layout = create_multi_timepoint_hist_controls(filtered_df, sample, df_selector.value)

In [None]:
plot_histogram_from_controls(filtered_df, sample, hist_controls_layout)

### 📈 Mean Population Over Time

This section provides tools for visualizing how feature values change over time across different channels. The interactive controls allow you to create time series plots showing mean population values with error bars, helping you track temporal trends and compare responses between channels.

**Key Features:**
- Plot mean ± error bars (SEM or STD) over sequence time points
- Compare multiple channels simultaneously with color-coded lines
- Interactive controls for feature selection, sequence range, and display options
- Support for both linear and logarithmic Y-axis scaling
- Hover tooltips showing detailed measurements for each data point
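
The two error-bar options differ only by a factor of √n: STD is the sample standard deviation, and SEM is that value divided by the square root of the number of observations. A minimal sketch with hypothetical `(sequence, value)` records, which is not the plotting code itself:

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def summarize_by_sequence(records):
    """Compute mean, sample STD, and SEM of the values at each sequence."""
    groups = defaultdict(list)
    for sequence, value in records:
        groups[sequence].append(value)
    summary = {}
    for sequence in sorted(groups):
        values = groups[sequence]
        # stdev needs at least two values; report 0.0 for a single observation.
        std = stdev(values) if len(values) > 1 else 0.0
        summary[sequence] = {
            "mean": mean(values),
            "std": std,
            "sem": std / sqrt(len(values)),
        }
    return summary


# Hypothetical measurements at two sequences.
records = [(0, 1.0), (0, 3.0), (1, 2.0), (1, 2.0), (1, 2.0)]
for sequence, stats in summarize_by_sequence(records).items():
    print(sequence, stats)
```

STD describes the spread of the population at each time point, while SEM describes the uncertainty of the mean and shrinks as more cells or wells are measured.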

In [None]:
# Create and display the controls
controls_layout = create_time_series_controls(filtered_df, sample, df_selector.value)

In [None]:
plot_time_series_from_controls(filtered_df, sample, controls_layout)