# Getting Started: how to prepare data for Timelapse Feature Explorer

The [Timelapse Feature Explorer (TFE)](https://timelapse.allencell.org) is a web-based application designed for the interactive visualization and analysis of segmented time-series microscopy data. Data must be converted into a specific format before it can be loaded into the viewer.

In this tutorial, you'll learn how to prepare your data for the Timelapse Feature Explorer.

*This notebook can be run with Google Colab, but we recommend running it on a local machine if you plan to convert and view your own datasets later.*

<a target="_blank" href="https://colab.research.google.com/github/allen-cell-animated/colorizer-data/blob/main/documentation/getting_started_guide/GETTING_STARTED.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a>

## 1. Prerequisites

Setup instructions differ based on whether you are running a local Jupyter notebook instance or Google Colab.
> **_NOTE_**: You must be on Python version 3.9 or above. The installation may fail unexpectedly on older versions of Python.

#### 1.1A Running on a local machine with **Jupyter Lab**

From a command terminal, clone this repository if you haven't already and run Jupyter Lab from the repository root directory. 
You will likely want to do this from a virtual Python environment. We've included steps below on how to activate a [venv](https://docs.python.org/3/library/venv.html) virtual Python environment.

```bash
# Skip this step if you've already cloned the repository.
git clone https://github.com/allen-cell-animated/colorizer-data.git
cd colorizer-data

# Set up a virtual environment
python -m venv .venv
source .venv/bin/activate
# On Windows, you may need to run this instead:
# source .venv/Scripts/activate

# Install and start Jupyter lab
python -m pip install jupyterlab
jupyter lab
```

Follow the link printed in the terminal to open Jupyter in your browser, then navigate to this notebook in Jupyter Lab. Run the following cell to install dependencies.

In [None]:
# Install dependencies
%pip install -r ./requirements.txt

#### 1.1B Running in **Google Colab**

If you are opening this notebook in Google Colab, run the following cells to clone the repository and install dependencies.

In [None]:
!git clone https://github.com/allen-cell-animated/colorizer-data.git

%cd colorizer-data/documentation/getting_started_guide

In [None]:
%pip install -r ./requirements.txt

## 2. Terminology

- **Segmentation ID**: An ID associated with a single segmentation shape at a single timepoint. This is also commonly called a "label" in segmentation and tracking workflows.
- **Track ID**: An identifier for a unique set of segmentations, linking them across timepoints. Generally this describes the track of one object along the time sequence. 
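
To make the distinction concrete, here is a minimal pandas sketch. The tiny table and the `find_duplicate_segmentations` helper are illustrative assumptions, not part of `colorizer-data`. It checks that each segmentation ID appears at most once per timepoint, and shows how a track ID groups segmentations across timepoints.

```python
import pandas as pd

# Illustrative table: two objects tracked over two timepoints.
example = pd.DataFrame(
    {
        "segmentation_id": [1, 2, 1, 2],
        "track_id": [0, 1, 0, 1],
        "time": [0, 0, 1, 1],
    }
)


def find_duplicate_segmentations(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose (time, segmentation_id) pair occurs more than once."""
    mask = df.duplicated(subset=["time", "segmentation_id"], keep=False)
    return df[mask]


# A segmentation ID identifies one shape at one timepoint, so no pair repeats here.
duplicates = find_duplicate_segmentations(example)

# A track ID links segmentations across timepoints: list the timepoints per track.
times_per_track = example.groupby("track_id")["time"].apply(list).to_dict()
```

If `duplicates` is not empty for your own table, two rows claim the same shape in the same frame, which usually points to a labeling or export problem worth fixing before conversion.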

## 3. Expected formats for raw data

For this tutorial, we'll be working with sample data included in the [`getting_started_guide/raw_datasets`](./raw_datasets/) directory.

This dataset is a simplified example of raw, pre-processed segmentation data. The data was generated using the [`generate_raw_data.py` script](./scripts/generate_data.py), which generates a **CSV file** with columns for segmentation IDs, track IDs, times, centroids, features (area, radius, and location), and paths to the segmentation images.

The **segmentation images** are 2D images in the OME-TIFF format, encoding the locations of segmented objects. Pixel values correspond to **segmentation IDs**, where 0 is the background and positive integers correspond to different segmented objects. If your segmentation images are 3-dimensional, you may choose to flatten them to 2D yourself or use our provided utilities to do so. We also offer experimental 3D support in Timelapse Feature Explorer for OME-Zarr array data.
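
If you flatten 3D segmentations yourself, one common approach is a maximum projection along the Z axis. The sketch below uses NumPy on a synthetic volume; `flatten_labels_max` is a hypothetical helper written for this guide and is not necessarily how the provided utilities work. Note that where objects overlap along Z, the object with the higher ID wins.

```python
import numpy as np


def flatten_labels_max(volume: np.ndarray) -> np.ndarray:
    """Collapse a (Z, Y, X) label volume to (Y, X) with a max projection along Z."""
    if volume.ndim != 3:
        raise ValueError(f"Expected a 3D (Z, Y, X) array, got {volume.ndim} dimensions")
    return volume.max(axis=0)


# Tiny synthetic volume: object 1 sits in slice 0, object 2 in slice 1.
volume = np.zeros((2, 3, 3), dtype=np.uint16)
volume[0, 0, 0] = 1
volume[1, 2, 2] = 2

flat = flatten_labels_max(volume)
```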

Your data may be in a different format, in which case it will need to be transformed to work well with our utilities. Generally, we recommend the following steps:

1. Save your data as a CSV or other format that can be read into a pandas `DataFrame`.
2. Make every segmentation at each time point into its own row in the table.
3. Save track ID, time, centroids, and other information as columns.
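
As a rough illustration of these steps, the sketch below reshapes per-track records into a table with one row per segmentation per timepoint. The input layout (`tracks`, with `times`, `labels`, and `areas` lists) is a made-up example; your own data will likely need different handling.

```python
import pandas as pd

# Hypothetical input: one record per track, with per-timepoint lists.
tracks = [
    {"track_id": 0, "times": [0, 1], "labels": [1, 1], "areas": [706.9, 530.9]},
    {"track_id": 1, "times": [0, 1], "labels": [2, 2], "areas": [804.2, 804.2]},
]

rows = []
for track in tracks:
    for time, label, area in zip(track["times"], track["labels"], track["areas"]):
        # One row per segmentation per timepoint.
        rows.append(
            {
                "segmentation_id": label,
                "track_id": track["track_id"],
                "time": time,
                "area": area,
                "segmentation_path": f"frame_{time}.tiff",
            }
        )

table = pd.DataFrame(rows).sort_values(["time", "segmentation_id"], ignore_index=True)
# table.to_csv("data.csv", index=False) would save it for the conversion step.
```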

### What does the example dataset look like?

Here's a preview of the raw dataset, `data.csv`:

| segmentation_id | track_id | time | centroid_x | centroid_y | area | radius | location | segmentation_path |
| ----------- | ---------- | ------ | ------------ | ------------ | -------- | -------- | ------------------- | --- |
| 1 | 0 | 0 | 33 | 110 | 706.9 | 15 | middle | frame_0.tiff |
| 2 | 1 | 0 | 67 | 100 | 804.2 | 16 | middle | frame_0.tiff |
| 3 | 2 | 0 | 100 | 108 | 804.2 | 16 | middle | frame_0.tiff |
| 4 | 3 | 0 | 133 | 88 | 706.9 | 15 | middle | frame_0.tiff |
| 5 | 4 | 0 | 167 | 101 | 804.2 | 16 | middle | frame_0.tiff |
| 1 | 0 | 1 | 33 | 121 | 530.9 | 13 | bottom | frame_1.tiff |
| 2 | 1 | 1 | 67 | 113 | 804.2 | 16 | middle | frame_1.tiff |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |

Each of the segmentation images in the `segmentation_path` column is an OME-TIFF image with segmentation IDs encoded as pixel values:

_Frame 0 of the example dataset, as viewed in FIJI. Contrast has been increased for easier viewing._

![Frame 0 of the example dataset, as viewed in FIJI. Contrast has been increased for easier viewing. The black background has been labeled as ID=0. Five red bubbles in the center of the image are labeled in ascending order from 1-5, left to right.](./assets/sample-segmentation.png)
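
Before converting your own data, it can help to confirm that the table and the images agree. The sketch below paints a synthetic label image (the 200×200 size and the radius of 15 are assumptions chosen for illustration) and checks that the pixel under each centroid holds the matching segmentation ID. With real data, you would read each OME-TIFF with an image library instead of painting circles.

```python
import numpy as np
import pandas as pd

# Synthetic label image standing in for frame_0.tiff.
frame = np.zeros((200, 200), dtype=np.uint8)
rows = pd.DataFrame(
    {
        "segmentation_id": [1, 2],
        "centroid_x": [33, 67],
        "centroid_y": [110, 100],
    }
)
yy, xx = np.ogrid[:200, :200]
for row in rows.itertuples():
    # Paint a filled circle of radius 15 with the object's segmentation ID.
    inside = (xx - row.centroid_x) ** 2 + (yy - row.centroid_y) ** 2 <= 15**2
    frame[inside] = row.segmentation_id

# Image arrays are indexed [y, x], so look up each centroid as frame[y, x].
ids_at_centroids = frame[rows["centroid_y"].to_numpy(), rows["centroid_x"].to_numpy()]
labels_present = set(np.unique(frame).tolist()) - {0}
```

A mismatch between `ids_at_centroids` and the `segmentation_id` column often means the X and Y centroid columns are swapped. Non-convex shapes can also have centroids that fall outside the object, so treat this as a sanity check rather than a strict rule.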

## 4. Processing data

Timelapse Feature Explorer reads data in the format specified by the [`DATA_FORMAT` document](../DATA_FORMAT.md). We'll use the utilities provided by `colorizer-data` to convert to this format.

### 4.1. Basic usage


In [None]:
from pathlib import Path

from colorizer_data import convert_colorizer_data
import pandas as pd

source_directory = Path("raw_datasets/dataset_1")
output_directory = Path("processed_dataset")

# Load the dataset
data: pd.DataFrame = pd.read_csv(source_directory / "data.csv")

# `convert_colorizer_data` is a helper function for dataset conversion! You can pass in a pandas DataFrame
# and the output directory, as well as the names of the data columns for the id, time, track, and image data.
# Any other columns are treated as features. We'll show off more advanced behavior in later steps, but this will
# output a complete readable dataset.

convert_colorizer_data(
    data,
    output_directory,
    source_dir=source_directory,
    object_id_column="segmentation_id",
    times_column="time",
    track_column="track_id",
    image_column="segmentation_path",
    centroid_x_column="centroid_x",
    centroid_y_column="centroid_y",
)

Congratulations! You now have a complete dataset ready to view in the TFE viewer. The new dataset will be found in the `processed_dataset` directory. You can skip to [section 5, 'Viewing the dataset'](#5-Viewing-the-dataset) to view your dataset in the TFE viewer, or run a few additional steps for a more polished dataset.
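
To get a quick overview of what was written, you can count the output files by extension. This is a small standard-library sketch; `count_files_by_suffix` is a helper defined here for illustration, and the exact set of output files may vary between versions of `colorizer-data`.

```python
from collections import Counter
from pathlib import Path


def count_files_by_suffix(directory: Path) -> Counter:
    """Count files in `directory` (recursively) by file extension."""
    if not directory.is_dir():
        return Counter()
    return Counter(p.suffix for p in directory.rglob("*") if p.is_file())


# Prints an empty Counter if the conversion has not been run yet.
print(count_files_by_suffix(Path("processed_dataset")))
```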

### 4.2. Advanced usage

#### Feature metadata

By default, any columns that aren't mapped to data columns (e.g. object ID, times, tracks, etc.) are automatically handled as feature data. To choose explicitly which columns are features, pass a list of column names to the `feature_column_names` parameter.

While `convert_colorizer_data` will infer feature types for you, you can also provide metadata about each feature, such as its label, key, type, units, and description using the `feature_info` argument.

Features can be one of three types:

1. **Continuous** features are used for floating-point numbers.
2. **Discrete** features are used for integers.
3. **Categorical** features are used for string-based labels. (Note that there is a hard limit of 12 categories for a categorical feature.)
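
As a rough guide to how these types map onto pandas columns, the sketch below guesses a type from each column's dtype and enforces the 12-category limit. `guess_feature_type` is a hypothetical helper written for this guide; it is not the inference logic used by `convert_colorizer_data`, which may behave differently.

```python
import pandas as pd
from pandas.api import types as pdt

MAX_CATEGORIES = 12  # Limit stated above for categorical features.


def guess_feature_type(column: pd.Series) -> str:
    """Guess 'continuous', 'discrete', or 'categorical' from a column's dtype."""
    if pdt.is_integer_dtype(column):
        return "discrete"
    if pdt.is_float_dtype(column):
        return "continuous"
    n_categories = column.nunique()
    if n_categories > MAX_CATEGORIES:
        raise ValueError(
            f"{n_categories} categories exceeds the limit of {MAX_CATEGORIES}"
        )
    return "categorical"


sample = pd.DataFrame(
    {"area": [706.9, 804.2], "radius": [15, 16], "location": ["middle", "bottom"]}
)
guessed = {name: guess_feature_type(sample[name]) for name in sample.columns}
```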


In [None]:
from colorizer_data import (
    FeatureInfo,
    FeatureType,
)

area_info = FeatureInfo(
    label="Area",
    key="area",
    type=FeatureType.CONTINUOUS,
    unit="px²",
    description="Area of object in square pixels, calculated from radius."
)
radius_info = FeatureInfo(
    label="Radius",
    key="radius",
    # Discrete features are used for integers.
    type=FeatureType.DISCRETE,
    unit="px",
    description="Radius of object in pixels."
)
location_info = FeatureInfo(
    label="Location",
    key="location",
    # Categorical features are used for string-based labels.
    type=FeatureType.CATEGORICAL,
    # Categories can be auto-detected from the data, or provided manually
    # if you want to preserve a specific order for the labels.
    categories=["top", "middle", "bottom"],
    description="Y position of object's centroid in the frame, as either 'top' (y < 40%), 'middle' (40% ≤ y ≤ 60%), or 'bottom' (y > 60%) of the frame."
)

# Map from column names to FeatureInfo objects.
feature_info = {
    "area": area_info,
    "radius": radius_info,
    "location": location_info
}
# Note that providing `feature_column_names` will turn off automatic feature detection.
# Providing it is optional, and `feature_info` will still update metadata without it.
feature_column_names = ["area", "radius", "location"]

convert_colorizer_data(
    data,
    output_directory,
    source_dir=source_directory,
    object_id_column="segmentation_id",
    times_column="time",
    track_column="track_id",
    image_column="segmentation_path",
    centroid_x_column="centroid_x",
    centroid_y_column="centroid_y",
    feature_column_names=feature_column_names,
    feature_info=feature_info
)

#### Dataset metadata

We recommend including additional metadata about your dataset, which can be provided using the `metadata` argument. This includes the dataset name, author, description, and any additional information you want to include.

In [None]:
from colorizer_data import (
    ColorizerMetadata,
)

# Define the metadata for this dataset.
metadata = ColorizerMetadata(
    name="Example dataset 1",
    description="An example dataset for the Timelapse Feature Explorer!",
    author="Jane Doe et al.",
    dataset_version="v1.0",
    # The width and height of the original segmentations, in units defined
    # by `frame_units`. This configures the scale bar in the viewer.
    frame_width=100,
    frame_height=100,
    frame_units="nm",
    # Time elapsed between each frame capture, in seconds.
    frame_duration_sec=1,
)

convert_colorizer_data(
    data,
    output_directory,
    source_dir=source_directory,
    object_id_column="segmentation_id",
    times_column="time",
    track_column="track_id",
    image_column="segmentation_path",
    centroid_x_column="centroid_x",
    centroid_y_column="centroid_y",
    feature_column_names=feature_column_names,
    feature_info=feature_info,
    metadata=metadata
)

### 4.3 Collections (optional)

If you have multiple datasets, you can group them into a **collection** to make it easier to view and compare them in Timelapse Feature Explorer. These steps are optional but will let you quickly switch between two different datasets.

In [None]:
# Move the existing dataset to a subdirectory.
# This uses `shutil` instead of shell commands so it also works on Windows.
import shutil

dataset_1_directory = Path("processed_dataset/dataset_1")
dataset_1_directory.mkdir(parents=True, exist_ok=True)
for pattern in ("*.parquet", "*.png", "*.json"):
    for file in list(Path("processed_dataset").glob(pattern)):
        shutil.move(str(file), str(dataset_1_directory / file.name))

In [None]:
# Convert an additional dataset for the example.
source_directory = Path("raw_datasets/dataset_2")
output_directory = Path("processed_dataset/dataset_2")
data: pd.DataFrame = pd.read_csv(source_directory / "data.csv")

convert_colorizer_data(
    data,
    output_directory,
    source_dir=source_directory,
    object_id_column="segmentation_id",
    times_column="time",
    track_column="track_id",
    image_column="segmentation_path",
    centroid_x_column="centroid_x",
    centroid_y_column="centroid_y",
    feature_column_names=feature_column_names,
    feature_info=feature_info,
    metadata=metadata
)

Collections are represented with a JSON file containing a list of paths to datasets and their display names. The `update_collection` utility can be used to create or update a collection file.

Right now, our datasets are in subdirectories within the `processed_dataset` directory. We'll place our collection file at the top level of the `processed_dataset` directory, alongside our datasets. The collection file will store relative paths to our two datasets.

The final directory structure should look like this:
```txt
📁 processed_dataset
├── 📄 collection.json
├── 📁 dataset_1
|   └── ...
└── 📁 dataset_2
    └── ...
```

In [None]:
from colorizer_data import update_collection, CollectionMetadata

# Like with datasets, collections can also include optional metadata.
collection_metadata = CollectionMetadata(
    name="Example collection",
    description="An example collection of datasets for the Timelapse Feature Explorer!",
    author="Jane Doe et al.",
    collection_version="v1.0",
)

# Create the collection file and add our two datasets.
# update_collection(collection_path, display name, relative_dataset_path, optional metadata)
update_collection(
    "processed_dataset/collection.json", "Dataset 1", "dataset_1/manifest.json", metadata=collection_metadata
)
update_collection(
    "processed_dataset/collection.json", "Dataset 2", "dataset_2/manifest.json", metadata=collection_metadata
)
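
Because the collection stores relative paths, a typo in a dataset path only shows up when the viewer tries to load it. The sketch below checks that each relative path resolves to an existing file next to the collection file. `missing_dataset_paths` is a helper written for this guide, not part of `colorizer-data`.

```python
from pathlib import Path


def missing_dataset_paths(collection_file: Path, relative_paths: list[str]) -> list[str]:
    """Return the relative paths that do not exist next to the collection file."""
    base = collection_file.parent
    return [rel for rel in relative_paths if not (base / rel).exists()]


missing = missing_dataset_paths(
    Path("processed_dataset/collection.json"),
    ["dataset_1/manifest.json", "dataset_2/manifest.json"],
)
print("Missing:", missing)
```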

## 5. Viewing the dataset

![The loaded dataset in Timelapse Feature Explorer. Five bubbles appear in the main viewport in various shades of purple and blue. The selected feature is Area, in units of pixels squared.](./assets/loaded-dataset.png)

Now that the dataset is processed, we can view it in the Timelapse Feature Explorer!

There are several ways to load datasets into the viewer:

1. Load via CLI
2. Load from a zip file
3. Load via the web (HTTPS)

We recommend hosting your datasets on an HTTPS-accessible server (option 3) whenever possible, because it makes sharing and collaboration simpler. We'll include instructions for all three methods below.

### 5.1 Open locally via CLI

We've provided a simple command-line script to open a local dataset in TFE, `tfe-open`. It launches a local instance of the viewer and serves the dataset over HTTP.

> ⚠ The `tfe-open` script will not work in **Google Colab**, so this step can only be run on a local machine.

In [None]:
# If the project has been installed (see README instructions, `pip install -e '.[dev]'`),
# you can run `tfe-open` directly from the command line:
# !tfe-open ./processed_dataset

# For this example, we'll run the tool directly from the Python file.
# Make sure Jupyter Lab is running from the root of the repository directory.
!python ../../colorizer_data/bin/tfe_open.py ./processed_dataset

The viewer should open in a new browser tab with the dataset loaded. 

To stop the server, return to the terminal and press `Ctrl+C` or close the
terminal window. (On Jupyter, press the stop button on the code block.)
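
For intuition about what "serving the dataset over HTTP" involves, here is a minimal standard-library sketch of a static file server that allows cross-origin requests. This is not how `tfe-open` is implemented; `CORSRequestHandler` and `make_dataset_server` are names invented for this example, and `tfe-open` remains the supported way to view local datasets.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that allows cross-origin requests."""

    def end_headers(self):
        # Let pages served from other origins fetch these files.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_dataset_server(directory: str, port: int = 0) -> ThreadingHTTPServer:
    """Create (but do not start) a server for `directory`; port 0 picks a free port."""
    handler = partial(CORSRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

To try it, call `server = make_dataset_server("processed_dataset", 8080)` followed by `server.serve_forever()`, and stop it with `Ctrl+C`. Whether a hosted viewer can load from a plain `http://` address depends on your browser's security settings.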

### 5.2 Load zip file (for Google Colab)

If running on Google Colab, you can load a dataset from a `.zip` file containing the processed dataset files.

1. Create and download the zip file of the processed dataset:

In [None]:
# GOOGLE COLAB:
!zip -r processed_dataset.zip processed_dataset

from google.colab import files
files.download("processed_dataset.zip")

# If the download doesn't start (can be browser-dependent), you can manually
# navigate to and download the zip file from the file browser on the left side
# of the screen. It will be in
# `colorizer-data/documentation/getting_started_guide/`.

In [None]:
# IF LOCAL: LINUX + MACOS
!zip -r processed_dataset.zip processed_dataset

In [None]:
# IF LOCAL: WINDOWS
# On Windows, the most reliable way to get a valid zip is to manually locate
# and right click to compress the dataset folder to a zip file. `tar.exe` may
# not be present on older versions of Windows.
!C:/Windows/System32/tar.exe -caf processed_dataset.zip processed_dataset


2. Open Timelapse Feature Explorer: [https://timelapse.allencell.org](https://timelapse.allencell.org).
3. Click **Load** in the top-right corner, then select **Load .zip file**.
4. Select the zip file.

You should now be able to view and explore your dataset!
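
As a cross-platform alternative to the zip commands in step 1, Python's standard library can build the archive on Linux, macOS, and Windows alike. The sketch below uses `shutil.make_archive`; `zip_dataset` is a small wrapper written for this guide, and it keeps the dataset's top-level folder inside the zip, as `zip -r` does.

```python
import shutil
from pathlib import Path


def zip_dataset(dataset_dir: str, archive_stem: str) -> Path:
    """Zip `dataset_dir` (keeping its top-level folder) into `<archive_stem>.zip`."""
    source = Path(dataset_dir).resolve()
    archive = shutil.make_archive(
        archive_stem, "zip", root_dir=str(source.parent), base_dir=source.name
    )
    return Path(archive)


# Example: zip_dataset("processed_dataset", "processed_dataset")
```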

### 5.3 Viewing datasets via the web (HTTPS)

Timelapse Feature Explorer can load datasets over the web using any URL starting with `https://`. This is the recommended way to interact with datasets, because it makes sharing and collaboration simpler.

There are several options for hosting files online. A few include:
- [Amazon S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/GetStartedWithS3.html)
- [Google Cloud Storage bucket](https://cloud.google.com/storage?hl=en)
- GitHub repository (not recommended as GitHub may impose rate limits)

For this example, you can load a dataset from our GitHub Repository:

1. Open Timelapse Feature Explorer at [https://timelapse.allencell.org](https://timelapse.allencell.org).
2. Click the **Load** button in the header.
3. Paste in the following URL: `https://raw.githubusercontent.com/allen-cell-animated/colorizer-data/main/documentation/getting_started_guide/example/processed_dataset/`
4. Click **Load**.

The viewer should appear with the dataset loaded!

For HTTPS-hosted datasets, you can share your current view in TFE by clicking the **Share** button in the top right corner to get a shareable link. This will allow anyone else with access to the dataset URL to open the same view.

## 6. What's next?

### Advanced conversion

[`convert_colorizer_data`](https://github.com/allen-cell-animated/colorizer-data/blob/e3fb0520a823f0c15396937c8b385f6de7798401/colorizer_data/converter.py#L390) includes detailed documentation for additional configuration options and parameters.

This includes support for:
- **Outliers**, objects that should have unique visualization (e.g. dead cells)
- **Backdrop images**, comparison images (like fluorescence channels) to display with segmentations
- **3D segmentation data**, using OME-Zarr for fast volumetric visualization.

For example, for 3D data, you can specify a 3D frame source like this:

```python
from colorizer_data import convert_colorizer_data
from colorizer_data.types import Frames3dMetadata

frames_3d = Frames3dMetadata(
    # Can be a relative path or a URL
    source="https://some-bucket.com/image.ome.zarr",
    segmentation_channel=0,
    total_frames=50,
)

convert_colorizer_data(
    data,
    output_directory,
    # ...other arguments, as in section 4...
    frames_3d=frames_3d,
)
```

If you need more control over the dataset conversion process, we include several examples of advanced usage in the [`bin/example_scripts` directory](https://github.com/allen-cell-animated/colorizer-data/tree/main/documentation/bin/example_scripts).

### Installing Timelapse Feature Explorer locally
The pre-built version of TFE may not be up to date with the latest features and bugfixes. If you want to run your own local instance of TFE, you can clone the repository yourself and run it locally.

The repository for TFE and instructions on installation can be found at [https://github.com/allen-cell-animated/timelapse-colorizer](https://github.com/allen-cell-animated/timelapse-colorizer).
