# **Part A: Preprocessing of imaging data**  
<i>**Latest update - Jan 2025**</i>  

#### **Authors:**  
[Thomas O'Neil](https://github.com/DrThomasOneil) (thomas.oneil@sydney.edu.au) | [Oscar Dong](https://github.com/Awesomenous) (oscardong4@gmail.com) | Heeva Baharlou (heeva.baharlou@sydney.edu.com)  

##### The purpose of this notebook is to provide a consolidated approach to IMC analysis, which forms the prerequisite steps for the IMComplete R package workflow. We focused 

The *Nature Methods* Method of the Year for 2024 was [**spatial proteomics**](https://www.nature.com/articles/s41592-024-02565-3). 

> Computational tools for spatial proteomics are the focus of the second Comment, from Yuval Bussi and Leeat Keren. These authors note that current image processing and analysis workflow are **well defined but fragmented**, with various steps happening back to back **rather than in an integrated fashion**. They envision a future for the field where **image processing and analysis steps work in concert** for improved biological discovery.

In line with these comments, we are committed to providing a comprehensive and dynamic workflow. In part, we aimed to achieve this by compiling as much as possible into this preprocessing workflow. 

In particular, we have emphasised tools that can be run within <strong>*one*</strong> workflow. For example, we introduce `PyProfiler`, a tool that performs the same functions as CellProfiler, so users do not need to leave this linear workflow or install additional applications.

<hr>

Some scripts were adapted from [BodenmillerGroup/ImcSegmentationPipeline](https://github.com/BodenmillerGroup/ImcSegmentationPipeline) & [PENGLU-WashU/IMC_Denoise](https://github.com/PENGLU-WashU/IMC_Denoise) 

<i>**Therefore, make sure to also reference these studies:**</i>  
- Windhager, J., Zanotelli, V.R.T., Schulz, D. et al. An end-to-end workflow for multiplexed image processing and analysis. [Nat Protoc](https://doi.org/10.1038/s41596-023-00881-0) (2023).  
- Lu P, Oetjen K, Bender D, et al. IMC-Denoise: a content aware pipeline to enhance Imaging Mass Cytometry. [Nature Communications](https://www.nature.com/articles/s41467-023-37123-6), 14(1), 1601, 2023.  

<br>
<hr>

##### Planned future additions:  
- Simple compartmentalisation in a Python widget

<br>
<hr>

## Folder structure

```text
ImageAnalysis/ (root directory)
├── IMComplete-Workflow
├── ImcSegmentationPipeline
├── Experiment_name_1
│   ├── raw
│   │   ├── Sample1.zip
│   │   ├── Sample2.zip
│   │   └── ...
│   ├── analysis
│   │   ├── 1_image_out
│   │   ├── 2_cleaned
│   │   ├── 3_segmentation
│   │   │   ├── 3a_cellpose_crop
│   │   │   ├── 3b_cellpose_full
│   │   │   ├── 3c_cellpose_mask
│   │   │   └── 3d_compartments
│   │   └── 4_pyprofiler_output
│   └── panel.csv
├── ...
└── Experiment_name_
```
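As a quick sanity check, the layout above can be compared against a project folder on disk. The sketch below is a hypothetical helper (`missing_project_dirs` is not part of PyMComplete); the subfolder names are copied from the tree above. Note that the `NewProject()` code in this notebook creates `analysis/4_cellprofiler_output`, so edit the last entry if your folders were created that way.

```python
from pathlib import Path

# Subfolders expected inside each project folder, copied from the tree above
EXPECTED_SUBDIRS = [
    "raw",
    "analysis/1_image_out",
    "analysis/2_cleaned",
    "analysis/3_segmentation/3a_cellpose_crop",
    "analysis/3_segmentation/3b_cellpose_full",
    "analysis/3_segmentation/3c_cellpose_mask",
    "analysis/3_segmentation/3d_compartments",
    "analysis/4_pyprofiler_output",
]


def missing_project_dirs(rootdir: str, projdir: str) -> list[str]:
    """Return the expected subfolders (relative paths) that do not exist yet."""
    project = Path(rootdir) / projdir
    return [sub for sub in EXPECTED_SUBDIRS if not (project / sub).is_dir()]
```

An empty list means every expected folder is present.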
<br>
<hr> 

## Workflow

1. Set up (`CheckSetup()`) ✅ 

2. Create a new project (`NewProject()`) ✅ 

3. Prepare the `raw` folder and `panel.csv` ✅ 

4. Extract images from the raw folder (`ExtractImages()`) ✅ 

- *Optional 1:* Check the filter parameters of IF data (`CheckExtract()`) ✅ 

- *Optional 2:* Filter images (`FilterImages()`) ✅ 

- *Optional 3:* Select crop regions for segmentation training (`CropSelector()`) ✅ 

5. Prepare the images for segmentation model training (`PrepCellpose()`) ✅ 

- *Optional 4:* Register low-resolution images with high-resolution images to improve cell segmentation  ✅ 

6. Train a segmentation model (`cellpose`) ✅ 

- *Optional 5:* Instead of training a segmentation model, you can use a generic model. ✅ 

7. Batch segment the images and generate cell masks (`BatchSegment()`)

- *Optional 6 <strong>in development</strong>:* Generate masks for compartments or distance metrics. (*Currently done in ImageJ/QuPath*; we plan to add simple `add_compartment_mask()` or `add_threshold_mask()` functions.) 

8. Extract data from your images using the cell segment masks (`PyProfiler()`)

<hr><hr>


# 1. Set up

Anaconda is required for several steps of this workflow, primarily during setup. Follow the steps below to set up Anaconda and a `conda` environment:

Install [**Anaconda**](https://www.anaconda.com/download) and open the relevant command-line interface:
<br>
<div align="left">

| Windows                                                                                            | macOS                                                                                                      |
|----------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
| 1. Search for **'Anaconda Prompt'** in the taskbar search <br> 2. Select **Anaconda Prompt**  <br> | 1. Use `cmd + space` to open Spotlight Search  <br> 2. Type **'Terminal'** and press `return` to open <br> |

</div>
<br>

<hr><hr>

### *Using Anaconda...*

#### **Step 1:** Navigate to your image analysis folder (the `root directory`)

```bash
cd ~/Desktop/ImageAnalysis
```
<hr>

#### **Step 2:** Clone the IMComplete repository.

<strong>*From GitHub*</strong>  
Go to the [GitHub page](https://github.com/CVR-MucosalImmunology/IMComplete-Workflow), click the `Code` button near the top and select **Download ZIP**. Unzip the folder into the `root` directory. This gives you the IMComplete-Workflow documents and ready access to the necessary files.

<strong>*Using Git*</strong> in the command line

<details><summary>Install Git</summary>

Git needs to be installed on your system. Installation instructions can be found [here](https://git-scm.com/downloads).

<hr></details>

```bash
git clone --recursive https://github.com/CVR-MucosalImmunology/IMComplete-Workflow.git
``` 
<hr>

#### **Step 3:** Clone the extra repositories: 

- [BodenmillerGroup/ImcSegmentationPipeline](https://github.com/BodenmillerGroup/ImcSegmentationPipeline): Windhager, J., Zanotelli, V.R.T., Schulz, D. et al. An end-to-end workflow for multiplexed image processing and analysis. [Nat Protoc](https://doi.org/10.1038/s41596-023-00881-0) (2023).  

```bash
git clone --recursive https://github.com/BodenmillerGroup/ImcSegmentationPipeline.git
```
<!---  
- [deMirandaLab/PENGUIN](https://github.com/deMirandaLab/PENGUIN): Sequeira, A. M., Ijsselsteijn, M. E., Rocha, M., & de Miranda, N. F. (2024). PENGUIN: A rapid and efficient image preprocessing tool for multiplexed spatial proteomics. [Computational and Structural Biotechnology Journal](https://doi.org/10.1101/2024.07.01.601513)
```bash
git clone --recursive https://github.com/deMirandaLab/PENGUIN.git
```
<--->

<hr>

#### **Step 4:** Create a conda environment and install the required packages (with a single command)

```bash
conda env create -f IMComplete-Workflow/environment.yml
```

*This can take some time so be patient!*

<hr>

#### **Step 5:** Activate the newly created conda environment

```bash
conda activate IMComplete
```

<hr>

#### **Step 6:** Enable and verify GPU acceleration

Parts of this workflow require GPU acceleration: cell segmentation and denoising. PyProfiler will also run faster with a GPU, but does not need one.

You will need to install the PyTorch and `pytorch-cuda` versions that suit your PC (its GPU and CUDA driver). Instructions can be found [here](https://pytorch.org/get-started/previous-versions/). The command will look similar to this:

```bash
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.4 -c pytorch -c nvidia
```

<hr>

#### **Step 7:** Select the IMComplete kernel in your IDE

If you are using VS Code, you'll see the kernel selector in the top right of the notebook window. 

If you are using a Jupyter notebook, you will see this...<span style="color:white; background:red">[TO ADD]</span>

<hr><hr>



`Function: CheckSetup()`

You can check the installation requirements with the following function:

<details><summary>Information</summary>

```python

CheckSetup(
    torch=1
    )

```

================================================================

**Arguments:**  
- `torch`: Default is `1`, which checks that PyTorch can use a CUDA GPU. Set it to `0` to skip this check, e.g. if you are using a Mac or already know that GPU acceleration is not set up.

================================================================

**Expected Outputs:**

```text
Checking required packages in the current Conda environment...

  All required packages are installed and meet the required versions.

-----------------

Checking that CUDA has been installed properly...

  GPU acceleration has not been prepared. Consult https://pytorch.org/get-started/previous-versions/
  and try again
```

================================================================

**Packages:**   
- pkg_resources
- torch (only when `torch=1`)

================================================================


</details>

In [None]:
import pkg_resources

def CheckSetup(torch=1):
    """
    Checks for required Python packages and verifies CUDA installation.

    Args:
        torch (int): If truthy, also check that PyTorch can access a CUDA GPU.

    Returns:
        dict: A dictionary of missing or insufficient packages with their required versions.
    """
    print("Checking required packages in the current Conda environment...\n")
    
    required_packages = {

        "imcsegpipe": "1.0.0",
        "pymcomplete":"",
        "readimc": "0.8.0",
        "pip": "",
        "numpy": "",
        "jupyter" : "",
        "jupyterlab": "",
        "jupytext": "",
        "cellpose" : "",
        "pyqtgraph": "",
        "numba": "",
        "scipy": "",
        "natsort": "",
        "tifffile": "2024.8.10",
        "brotlipy": "",
        "matplotlib": "",
        "pandas": "", 
        "panel": "", 
        "opencv-python": "", 
        "scikit-image": "",
        "ipywidgets" : "",
        "ipykernel" : "",
        "ipympl": "",
        "plotly": "",
        "ttkbootstrap":"", 
        "PyQt5":"",
    }

    missing_packages = {}

    for package, version in required_packages.items():
        try:
            installed_version = pkg_resources.get_distribution(package).version
            if version and pkg_resources.parse_version(installed_version) < pkg_resources.parse_version(version):
                missing_packages[package] = version
        except pkg_resources.DistributionNotFound:
            missing_packages[package] = version

    if missing_packages:
        print("The following packages are missing or have insufficient versions:")
        for package, version in missing_packages.items():
            if version:
                print(f" - {package} (required version: {version})")
            else:
                print(f" - {package} (no version specified)")
    else:
        print("  All required packages are installed and meet the required versions.")

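    if torch:
        print("\n-----------------\n\nChecking that CUDA has been installed properly...\n")
        try:
            import torch as torch_lib  # alias avoids shadowing the `torch` argument
        except ImportError:
            print("  PyTorch is not installed. Consult https://pytorch.org/get-started/previous-versions/")
            return missing_packages
        if torch_lib.cuda.is_available():
            print("  GPU acceleration via CUDA is available")
        else:
            print("  GPU acceleration has not been prepared. Consult https://pytorch.org/get-started/previous-versions/\n  and try again")

    return missing_packages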



In [None]:
from PyMComplete import CheckSetup

CheckSetup()

<hr><hr>

# 2. Set up a new Project for Imaging Analysis

The following function creates the folder structure for this workflow and a template `panel.csv`. The `image.csv` file is created later by `ExtractImages()`.

Set `rootdir` to the path of your **ImageAnalysis** folder and `projdir` to the name of your **project** folder.

**Important**: These need to be set each time you open this workflow, as all subsequent functions rely on these values.

In [1]:
rootdir = "/Users/thomasoneil/Desktop/test_IF"
projdir = "LizIMCIFReg"

`Function: NewProject()`

The workflow is designed to utilize both a `rootdir` and a `projdir` for better organization and efficiency.

- `rootdir`: This directory is intended to store commonly used GitHub Repositories and other relevant resources.  

- `projdir`: This directory is specific to individual projects. 

By structuring the directories this way, users with multiple projects can benefit from a consistent workflow. They only need to install the repositories once in the `rootdir`, and all projects can access these resources without duplication. This approach eliminates the need to repeatedly refer to or duplicate distant folders, streamlining the workflow and ensuring all relevant files are easily accessible.

<details><summary>Information</summary>

```python
NewProject(
    rootdir=rootdir, 
    projdir=projdir
    )
```

================================================================

**Arguments:**  
- `rootdir`: This directory is intended to store commonly used GitHub Repositories and other relevant resources.  

- `projdir`: This directory is specific to individual projects. 

================================================================

**Expected Outputs:**

```text
Project '2025_ProjectName' created successfully.
```

================================================================

**Packages:**   
- os
- csv

================================================================


</details>

In [None]:
import os
import csv

def NewProject(rootdir, projdir):
    """
    Creates a structured project folder with the given root directory and project name.

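    Args:
        rootdir (str): The root directory where the project will be created.
        projdir (str): The name of the new project folder.
    """
    if os.path.isdir(os.path.join(rootdir, projdir)):
        # Refuse to overwrite an existing project (and its panel.csv)
        print(f"Directory '{os.path.join(rootdir, projdir)}' already exists. No changes were made.")
        return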

    # Define all required subdirectories
    acquisitions_dir = os.path.join(rootdir, projdir, "analysis/1_image_out")
    cleaned_dir = os.path.join(rootdir, projdir, "analysis/2_cleaned")
    segment_fold_dir = os.path.join(rootdir, projdir, "analysis/3_segmentation")
    crop_output = os.path.join(segment_fold_dir, "3a_cellpose_crop")
    im_output = os.path.join(segment_fold_dir, "3b_cellpose_full")
    mask_dir = os.path.join(segment_fold_dir, "3c_cellpose_mask")
    compart = os.path.join(segment_fold_dir, "3d_compartments")
    pyprof_out = os.path.join(rootdir, projdir, "analysis/4_cellprofiler_output")
    R_out = os.path.join(rootdir, projdir, "analysis/5_R_analysis")
    #meta_out = os.path.join(rootdir, projdir, ".meta")

    # Create directories
    os.makedirs(os.path.join(rootdir, projdir), exist_ok=True)
    os.makedirs(os.path.join(rootdir, projdir, "raw"), exist_ok=True)
    os.makedirs(os.path.join(rootdir, projdir, "analysis"), exist_ok=True)
    os.makedirs(acquisitions_dir, exist_ok=True)
    os.makedirs(cleaned_dir, exist_ok=True)
    os.makedirs(segment_fold_dir, exist_ok=True)
    os.makedirs(crop_output, exist_ok=True)
    os.makedirs(im_output, exist_ok=True)
    os.makedirs(mask_dir, exist_ok=True)
    os.makedirs(compart, exist_ok=True)
    os.makedirs(pyprof_out, exist_ok=True)
    os.makedirs(R_out, exist_ok=True)

    print(f"Project '{projdir}' created successfully.")

    # Data to be written to the CSV file
    panel = [
        ["Conjugate", "Target", "Full", "Segment"]
    ]

    # Specify the file name
    filename = os.path.join(rootdir, projdir, "panel.csv")

    # Writing to the CSV file
    with open(filename, mode='w', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(panel)

In [None]:
from PyMComplete import NewProject

NewProject(
    rootdir=rootdir, 
    projdir=projdir)

<hr><hr>

# 3. Set up your `raw` folder 

### **For IMC data**

Your IMC data should be zipped in a specific format. Our IMC image extraction utilises the imcsegpipe package developed by **Windhager et al. 2023**. Therefore, we opted to match the file format suggested for that workflow. The following was taken from the **Bodenmiller** [**Preprocessing instructions**.](https://bodenmillergroup.github.io/ImcSegmentationPipeline/prepro.html)

<span style="color:grey">*The Hyperion Imaging System produces vendor controlled .mcd and .txt files in the following folder structure:*</span>

```text
Sample1.zip
├── {XYZ}_ROI_001_1.txt
├── {XYZ}_ROI_002_2.txt
├── {XYZ}_ROI_003_3.txt
├── {XYZ}.mcd
...
```
<span style="color:grey">*where `XYZ` defines the filename, `ROI_001`, `ROI_002`, `ROI_003` are names (description) for the selected regions of interest (ROI) and `1`, `2`, `3` indicate the acquistion identifiers. The ROI description entry can be specified in the Fluidigm software when selecting ROIs. The `.mcd` file contains the raw imaging data of all acquired ROIs while each `.txt` file contains data of a single ROI. To enforce a consistent naming scheme and to bundle all metadata, we recommend to zip the folder and specify the location of all `.zip` files for preprocessing. Each `.zip` file should only contain data from a single `.mcd` file and the name of the `.zip` file should match the name of the `.mcd` file.*</span>

<sup>**Citation:**  
*Windhager, J., Zanotelli, V.R.T., Schulz, D. et al. An end-to-end workflow for multiplexed image processing and analysis. Nat Protoc (2023). https://doi.org/10.1038/s41596-023-00881-0*</sup>
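The zipping recommendation above (one `.zip` per `.mcd`, named after the `.mcd` file) can be sketched with Python's standard-library `zipfile` module. `zip_acquisition_folder` is a hypothetical helper, not part of imcsegpipe or PyMComplete; it assumes each acquisition sits in its own folder containing exactly one `.mcd` file plus its `.txt` files. The resulting archives belong in the project's `raw` folder.

```python
import zipfile
from pathlib import Path


def zip_acquisition_folder(folder: str, out_dir: str) -> Path:
    """Bundle the .mcd and .txt files of one acquisition into <mcd stem>.zip."""
    src = Path(folder)
    mcd_files = sorted(src.glob("*.mcd"))
    if len(mcd_files) != 1:
        raise ValueError(f"Expected exactly 1 .mcd file in '{src}', found {len(mcd_files)}.")
    mcd = mcd_files[0]

    # The archive name matches the .mcd file name, as recommended above
    zip_path = Path(out_dir) / f"{mcd.stem}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in [mcd] + sorted(src.glob("*.txt")):
            # Store files at the top level of the archive
            zf.write(f, arcname=f.name)
    return zip_path
```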

### **For IF data**

Images *currently* need to be a multi-channel `.tiff` stack inside a folder of the same name:

```text
Image1/
├── Image1.tiff
Image2/
├── Image2.tiff
...
```
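If your IF stacks are currently all in one folder, a short sketch like the following could copy each one into a folder of the same name, as required above. `organise_if_stacks` is a hypothetical helper; it copies rather than moves the files, and writes `.tif` inputs with a `.tiff` extension to match the layout shown (`ExtractImages()` matches both extensions via `*.tif*`).

```python
import shutil
from pathlib import Path


def organise_if_stacks(src_dir: str, raw_dir: str) -> list[Path]:
    """Copy each <name>.tif/.tiff in src_dir to raw_dir/<name>/<name>.tiff."""
    created = []
    for tif in sorted(Path(src_dir).iterdir()):
        # Skip folders, hidden files and anything that is not a TIFF
        if not tif.is_file() or tif.name.startswith("."):
            continue
        if tif.suffix.lower() not in (".tif", ".tiff"):
            continue
        target_dir = Path(raw_dir) / tif.stem
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / f"{tif.stem}.tiff"
        shutil.copy2(tif, target)
        created.append(target)
    return created
```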

In the future, we will try to adjust these requirements to match more common output formats.
<hr><hr>

# Edit the panel

The `panel.csv` file in `projdir` needs one row per channel, in the same order as the channels in your image stacks, including empty channels.

There are *currently* four columns in the default panel.csv:

- **Conjugate**: The metal tag or fluorophore.

- **Target**: The name of the target antibody. Avoid starting the name with a number. 

- **Full**: Should be filled with a `1` if this channel is an image you want to extract. 

- **Segment**: Should be filled with a `1` if this channel is a marker to be used for segmentation. 

You can add as many additional columns as you like to aid your analysis further downstream.
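To make the column conventions concrete, here is a small hypothetical panel and a sketch of how the `Full` and `Segment` flags could be read with Python's `csv` module. The conjugates and targets are illustrative only, and `flagged_targets` is a hypothetical helper, not part of PyMComplete.

```python
import csv
import io

# Illustrative panel.csv contents: one row per channel, in stack order,
# including an empty channel with no flags set
PANEL_CSV = """Conjugate,Target,Full,Segment
Ir191,DNA1,1,1
Ir193,DNA2,1,1
Nd143,CD3,1,
Yb174,Empty,,
"""


def flagged_targets(panel_text: str, column: str) -> list[str]:
    """Return the Target names whose `column` ("Full" or "Segment") is set to 1."""
    reader = csv.DictReader(io.StringIO(panel_text))
    return [row["Target"] for row in reader if (row[column] or "").strip() == "1"]


print(flagged_targets(PANEL_CSV, "Full"))     # ['DNA1', 'DNA2', 'CD3']
print(flagged_targets(PANEL_CSV, "Segment"))  # ['DNA1', 'DNA2']
```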


<hr><hr>

# 4. Extract images from the raw folder

This is a simple step that extracts the image data from the raw folders and saves them as stacked `.tiff` files in `analysis/1_image_out`.

For **IMC**, the Bodenmiller `extract_mcd_file()` function will convert the `.mcd` and `.txt` files to separate `.tiff` stacks. 

For **IF**, the images are simply copied. Currently, this function requires each IF stack to be in its own folder. 
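For IF data, `ExtractImages()` compares `image.shape[0]` with the number of rows in `panel.csv`, which assumes each stack is stored channels-first as `(C, H, W)`. Some software writes `(H, W, C)` instead. The NumPy sketch below (`to_channels_first` is a hypothetical helper) moves the channel axis to the front when only the last axis matches the panel length; if the height happens to equal the channel count, the first check wins, so inspect such stacks manually.

```python
import numpy as np


def to_channels_first(stack: np.ndarray, n_channels: int) -> np.ndarray:
    """Return the stack as (C, H, W), given the number of channels in panel.csv."""
    if stack.ndim != 3:
        raise ValueError(f"Expected a 3D stack, got shape {stack.shape}.")
    if stack.shape[0] == n_channels:
        return stack  # already channels-first
    if stack.shape[-1] == n_channels:
        return np.moveaxis(stack, -1, 0)  # (H, W, C) -> (C, H, W)
    raise ValueError(f"No axis of shape {stack.shape} matches {n_channels} panel channels.")
```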

`Function: ExtractImages()`

This function extracts IMC images, or copies immunofluorescence (`.tiff`) images, into `extract_dir`. It also creates a template `image.csv` in `extract_dir`. 

<details><summary>Information</summary>

```python
ExtractImages(
    rootdir = rootdir,
    projdir = projdir,
    format = "if",
    rawimage_dir = "raw",
    extract_dir = "analysis/1_image_out")
```

================================================================

**Arguments:**  
- `rootdir`: This directory is intended to store commonly used GitHub Repositories and other relevant resources.  

- `projdir`: This directory is specific to individual projects. 

- `format`: This argument strictly takes `if` or `imc` and processes the images accordingly. 

- `rawimage_dir`: This argument specifies where the raw images are stored. The default is **"raw"**.

- `extract_dir`: This argument specifies where the extracted images are saved. The default is **"analysis/1_image_out"**.

================================================================

**Expected Outputs:**

```text
Extracting Immunofluorescent Images...
Done!
```
```text
Extracting IMC images using Bodenmiller's extract_zip_file function ...
Done!
```

================================================================

**Packages:**   
- pathlib
- tempfile
- imcsegpipe
- pandas
- skimage

================================================================

</details>

In [None]:
from pathlib import Path
import pandas as pd
from skimage import io

def ExtractImages(rootdir:str,
                   projdir:str,
                   format:str,
                   rawimage_dir = "raw",
                   extract_dir = "analysis/1_image_out"):
    
    project_path = Path(rootdir) / projdir
    images_dir = project_path / extract_dir
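    raw = project_path / rawimage_dir

    if format not in ("if", "imc"):
        raise ValueError(f"format must be 'if' or 'imc', got '{format}'.")

    if format == "if":
        # panel.csv defines how many channels each IF stack should have
        panel = pd.read_csv(project_path / "panel.csv")
        print("Extracting Immunofluorescent Images...\n")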
        for sample_dir in raw.iterdir():
            if not sample_dir.is_dir() or sample_dir.name.startswith("."):
                continue

            # Create subfolder in analysis/1_image_out
            acquisition_subdir = images_dir / sample_dir.name
            acquisition_subdir.mkdir(parents=True, exist_ok=True)

            # Final stacked TIFF path
            out_tiff_path = acquisition_subdir / f"{sample_dir.name}.tiff"

            # We expect exactly one TIF in the folder
            tif_files = list(sample_dir.glob("*.tif*"))
            if len(tif_files) != 1:
                raise ValueError(
                    f"Expected exactly 1 TIF in '{sample_dir.name}', found {len(tif_files)}."
                )
            single_tif = tif_files[0]

            # Read the stack
            image = io.imread(str(single_tif))

            # Validate that the stack depth == number of rows in panel.csv
            if image.shape[0] != len(panel):
                raise ValueError(
                    f"Panel length is {len(panel)} but found `{image.shape[0]}` channels"
                    f" in '{sample_dir.name}'."
                )

            # Save the original stack (unprocessed or raw)
            io.imsave(str(out_tiff_path), image)
        print("Done!\n")

    if format == "imc":
        print("Extracting IMC images using Bodenmiller's extract_zip_file function ...\n")

        from tempfile import TemporaryDirectory
        import imcsegpipe
                
        temp_dirs = []
        try:
            for raw_dir in [raw]:
                zip_files = list(raw_dir.rglob("**/*.zip"))
                if len(zip_files) > 0:
                    temp_dir = TemporaryDirectory()
                    temp_dirs.append(temp_dir)
                    for zip_file in sorted(zip_files):
                        imcsegpipe.extract_zip_file(zip_file, temp_dir.name)
            for raw_dir in [raw] + [Path(temp_dir.name) for temp_dir in temp_dirs]:
                mcd_files = list(raw_dir.rglob("*.mcd"))
                mcd_files = [i for i in mcd_files if not i.stem.startswith('.')]
                if len(mcd_files) > 0:
                    txt_files = list(raw_dir.rglob("*.txt"))
                    txt_files = [i for i in txt_files if not i.stem.startswith('.')]
                    matched_txt_files = imcsegpipe.match_txt_files(mcd_files, txt_files)
                    for mcd_file in mcd_files:
                        imcsegpipe.extract_mcd_file(
                            mcd_file,
                            images_dir / mcd_file.stem,
                            txt_files=matched_txt_files[mcd_file]
                        )
        finally:
            for temp_dir in temp_dirs:
                temp_dir.cleanup()
            del temp_dirs

        print("Done!")

    # Create image.csv
    image_data = []
    for subdir in images_dir.iterdir():
        if subdir.is_dir():
            for tiff_file in subdir.glob("*.tiff"):
                image_data.append({
                    "Image": tiff_file.stem,
                    "ImShort": "",  # Example short name, adjust as needed
                    "ROI": "",  # Placeholder, adjust as needed
                    "ImageID": "",  # Example ID, adjust as needed
                    "DonorID": "",  # Placeholder, adjust as needed
                    "Condition": "",  # Placeholder, adjust as needed
                    "Crop": ""  # Placeholder, adjust as needed
                })

    image_df = pd.DataFrame(image_data)
    image_csv_path = images_dir / "image.csv"
    image_df.to_csv(image_csv_path, index=False)
    print(f"image.csv created at {image_csv_path}")

In [None]:
from PyMComplete import ExtractImages

ExtractImages(
    rootdir = rootdir,
    projdir = projdir,
    format = "if") # or "imc"

<hr><hr>

## Optional steps

### Optional 1: Check the filter parameters for IF images

`Function: CheckExtract()`

This function opens an interactive widget to preview how hot-pixel removal and Gaussian blur parameters affect individual channels of your IF images. 

<details><summary>Information</summary>

```python
CheckExtract(
    rootdir = rootdir,
    projdir = projdir,
    extract_dir ="analysis/1_image_out",
    panel_path = "panel.csv",
    crop=None
)
```

================================================================

**Arguments:**  
- `rootdir`: This directory is intended to store commonly used GitHub Repositories and other relevant resources.  

- `projdir`: This directory is specific to individual projects. 

- `extract_dir`: This argument specifies where the images are. The default is **"analysis/1_image_out"**.

- `panel_path`: This argument points to the panel.csv. The function will use the names in the `Target` column for the drop down options. 

- `crop`: The side length, in pixels, of a random square crop to display. Each time **Update** is clicked, a new random crop is shown. Default is `None`, which displays the full image. 

================================================================

**Expected Outputs:**

A Jupyter widget that lets you visualise how hot-pixel removal and Gaussian blur settings affect your images. 

================================================================

**Packages:**   
- os
- pathlib
- numpy
- pandas
- tifffile
- matplotlib
- ipywidgets
- IPython.display
- skimage
- skimage.filters
- scipy.ndimage

================================================================

</details>
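The hot-pixel rule used by `CheckExtract()` replaces any pixel brighter than `threshold * local mean` with that local mean. The NumPy-only sketch below re-implements the same rule on a toy array to show its effect. It is an approximation, not the notebook's code: `remove_hotpixels_numpy` is a hypothetical helper that replaces `scipy.ndimage.uniform_filter` (default boundary mode `reflect`) with a padded sliding-window mean, and it assumes an odd window size.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def remove_hotpixels_numpy(img: np.ndarray, threshold: float = 5.0, size: int = 3) -> np.ndarray:
    """Replace pixels above threshold * local mean with the local mean (odd `size` only)."""
    img = img.astype(np.float64)
    pad = size // 2
    # 'symmetric' padding mirrors scipy.ndimage's default 'reflect' boundary mode
    padded = np.pad(img, pad, mode="symmetric")
    local_mean = sliding_window_view(padded, (size, size)).mean(axis=(-2, -1))
    hot = img > threshold * local_mean
    cleaned = img.copy()
    cleaned[hot] = local_mean[hot]
    return cleaned


toy = np.ones((5, 5))
toy[2, 2] = 100.0  # a single hot pixel
cleaned = remove_hotpixels_numpy(toy)
print(cleaned[2, 2])  # 12.0, the 3x3 window mean (8 * 1 + 100) / 9
```

Because the rule is relative to the local mean, it behaves the same whether the image is in raw counts or rescaled to `[0, 1]` as `img_as_float` does in `CheckExtract()`.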

In [None]:
import os
from pathlib import Path
import numpy as np
import pandas as pd
import tifffile as tiff
import matplotlib.pyplot as plt

import ipywidgets as widgets
from IPython.display import display, clear_output

from skimage import img_as_float
from skimage.filters import gaussian
from scipy.ndimage import uniform_filter


def CheckExtract(
    rootdir:str,
    projdir:str,
    extract_dir ="analysis/1_image_out",
    panel_path = "panel.csv",
    crop=None
):
    def remove_hotpixels_threshold(img, threshold=5.0, neighborhood_size=3):
        """
        Replace 'hot' pixels that are above (threshold * local_mean) with that local mean.
        """
        img_float = img_as_float(img)
        local_mean = uniform_filter(img_float, size=neighborhood_size)
        hot_mask = img_float > (threshold * local_mean)
        cleaned_img = img_float.copy()
        cleaned_img[hot_mask] = local_mean[hot_mask]
        return cleaned_img

    def apply_gaussian_blur(img, sigma=1.0):
        """
        Applies a Gaussian blur with a given sigma.
        """
        # preserve_range=True ensures we keep original intensity scale
        blurred = gaussian(img, sigma=sigma, preserve_range=True)
        return blurred

    def process_channel(img, hp_threshold=None, hp_neighborhood=3, gauss_sigma=None):
        """
        Given a single 2D channel image:
        - If hp_threshold is not None, remove hotpixels.
        - If gauss_sigma is not None, apply Gaussian blur.
        Returns the processed image.
        """
        processed = img.copy()

        if hp_threshold is not None:
            processed = remove_hotpixels_threshold(
                processed,
                threshold=hp_threshold,
                neighborhood_size=hp_neighborhood
            )
        if gauss_sigma is not None:
            processed = apply_gaussian_blur(processed, sigma=gauss_sigma)

        return processed

    def random_crop_2D(img, crop_size=200):
        """
        Returns a random crop of shape (crop_size, crop_size) from a 2D image.
        """
        h, w = img.shape
        if crop_size > h or crop_size > w:
            raise ValueError(f"Crop size {crop_size}×{crop_size} is larger than image {h}×{w}.")
        # Random top-left corner
        y = np.random.randint(0, h - crop_size + 1)
        x = np.random.randint(0, w - crop_size + 1)
        return img[y:y+crop_size, x:x+crop_size]

    # --------------------------------------------------------------------------
    # 1. Locate directories and panel
    # --------------------------------------------------------------------------
    project_path = Path(rootdir) / projdir 
    image_dir = project_path / extract_dir
    
    panel_path = project_path / panel_path

    if not os.path.exists(panel_path):
        raise FileNotFoundError(f"Panel file not found: {panel_path}")

    panel = pd.read_csv(panel_path)
    if "Target" not in panel.columns:
        raise ValueError("panel.csv must contain a 'Target' column for channel names.")

    # We'll use the "Target" column as channel names
    channel_names = panel["Target"].tolist()
    num_channels = len(channel_names)

    # Gather subfolders in image_dir (these are our 'images')
    subfolders = [
        d for d in os.listdir(image_dir)
        if os.path.isdir(os.path.join(image_dir, d)) and not d.startswith('.')
    ]

    if not subfolders:
        raise ValueError(f"No subfolders found in {image_dir}.")

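    # --------------------------------------------------------------------------
    # 2. Build the widgets
    # --------------------------------------------------------------------------
    image_dropdown = widgets.Dropdown(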
        options=subfolders,
        description='Image:',
        layout=widgets.Layout(width='200px')
    )

    # Dropdown to pick which channel (by name in panel)
    channel_dropdown = widgets.Dropdown(
        options=channel_names,
        description='Channel:',
        layout=widgets.Layout(width='200px')
    )

    hp_threshold_text = widgets.FloatText(
        value=5.0,
        description='HP Thresh:',
        disabled=False,
        layout=widgets.Layout(width='160px')
    )

    hp_neighborhood_text = widgets.IntText(
        value=3,
        description='HP Neigh:',
        disabled=False,
        layout=widgets.Layout(width='160px')
    )
    
    gauss_sigma_text = widgets.FloatText(
        value=1.0,
        description='Gauss σ:',
        disabled=False,
        layout=widgets.Layout(width='160px')
    )

    # Contrast slider: we pick a default range [0, 1]
    # The user can drag the handles to set vmin/vmax for display.
    contrast_slider = widgets.FloatRangeSlider(
        value=[0.0, 1.0],
        min=0.0,
        max=1.0,
        step=0.01,
        description='Contrast',
        layout=widgets.Layout(width='300px')
    )

    update_button = widgets.Button(
        description="Update",
        button_style='success'
    )

    output_display = widgets.Output()

    # --------------------------------------------------------------------------
    # 3. Define the callback for "Update" button
    # --------------------------------------------------------------------------
    def on_update_clicked(b):
        with output_display:
            clear_output(wait=True)
            
            selected_image_name = image_dropdown.value
            selected_channel_name = channel_dropdown.value
            try:
                channel_idx = channel_names.index(selected_channel_name)
            except ValueError:
                print(f"Channel '{selected_channel_name}' not found in panel.")
                return

            hp_thresh_val = hp_threshold_text.value
            hp_neigh_val = hp_neighborhood_text.value
            gauss_val = gauss_sigma_text.value
            vmin, vmax = contrast_slider.value

            # Build the path to the TIF file (expected one TIF in the subfolder)
            subfolder_path = os.path.join(image_dir, selected_image_name)
            tif_files = [
                f for f in os.listdir(subfolder_path)
                if f.lower().endswith('.tif') or f.lower().endswith('.tiff')
            ]
            if len(tif_files) != 1:
                print(f"Warning: expected 1 TIF in {subfolder_path}, found {len(tif_files)}.")
                return

            tif_path = os.path.join(subfolder_path, tif_files[0])

            # Read the stack
            stack = tiff.imread(tif_path)
            
            # Check shape
            if stack.shape[0] != num_channels:
                print(
                    f"Warning: panel has {num_channels} channels, "
                    f"but TIF has {stack.shape[0]} channels."
                )
            
            # Extract the channel of interest
            raw_channel = stack[channel_idx, :, :]

            # Optionally crop
            channel_for_processing = raw_channel
            if crop is not None:
                try:
                    channel_for_processing = random_crop_2D(raw_channel, crop_size=crop)
                except ValueError as e:
                    print(str(e))
                    return
            
            # Process the channel
            processed_channel = process_channel(
                channel_for_processing,
                hp_threshold=hp_thresh_val,
                hp_neighborhood=hp_neigh_val,
                gauss_sigma=gauss_val
            )

            # ------------------------------------------------------------------
            # Normalize each image to [0,1] for display
            # This lets vmin/vmax in [0,1] do what we expect on the slider
            # ------------------------------------------------------------------
            def safe_normalize(img):
                m = img.max()
                if m < 1e-12:
                    # Avoid divide-by-zero
                    return np.zeros_like(img, dtype=np.float32)
                return img.astype(np.float32) / m

            raw_norm = safe_normalize(channel_for_processing)
            proc_norm = safe_normalize(processed_channel)

            fig, axes = plt.subplots(1, 2, figsize=(10, 5))
            # Raw
            im0 = axes[0].imshow(raw_norm, cmap='gray', vmin=vmin, vmax=vmax)
            axes[0].set_title(f'Raw [{selected_image_name}] - {selected_channel_name}')
            axes[0].axis('off')

            # Processed
            im1 = axes[1].imshow(proc_norm, cmap='gray', vmin=vmin, vmax=vmax)
            axes[1].set_title('Processed')
            axes[1].axis('off')

            plt.tight_layout()
            plt.show()

    update_button.on_click(on_update_clicked)

    # --------------------------------------------------------------------------
    # 4. Layout the UI
    # --------------------------------------------------------------------------
    ui = widgets.VBox([
        widgets.HBox([image_dropdown, channel_dropdown]),
        widgets.HBox([hp_threshold_text, hp_neighborhood_text, gauss_sigma_text]),
        contrast_slider,
        update_button,
        output_display
    ])
    
    display(ui)

    # Trigger an initial update to see something from the start
    on_update_clicked(None)


In [None]:
from PyMComplete import CheckExtract

CheckExtract(
    rootdir = rootdir,
    projdir = projdir
    )


<hr>

### Optional 2: Filter images

This optional step allows you to filter your images by:

- **Hot pixel filtering** (IF): Choose a threshold factor and a neighbourhood size in pixels. Each pixel is compared with the mean of its surrounding neighbourhood. If the pixel exceeds that local mean multiplied by the threshold factor, it is replaced by the local mean. These values can be tested using `CheckExtract()`.

- **Gaussian blur** (IF): Choose a sigma value to also apply a Gaussian blur. If chosen well, this can sometimes improve the contrast between the border of a cell and the background. This value can also be tested using `CheckExtract()`.

- **hpf** (IMC): The hot pixel filter applied to IMC data via the Bodenmiller function `create_analysis_stacks()`. The default value is 50.
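The IF hot pixel rule above can be checked on toy data before running it on real images. The sketch below reproduces the `remove_hotpixels_threshold()` helper defined inside `FilterImages()`, with one hypothetical change: it also returns the boolean mask of replaced pixels so you can count them. The 7 × 7 test image is made up for illustration.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def remove_hotpixels_threshold(img, threshold=3.0, neighborhood_size=5):
    """Replace pixels brighter than threshold x local mean with that local mean.

    Returns (cleaned image, mask of replaced pixels); the mask is an addition for inspection.
    """
    img_float = img.astype(float)
    # Mean over a neighborhood_size x neighborhood_size window (centre pixel included)
    local_mean = uniform_filter(img_float, size=neighborhood_size)
    hot_mask = img_float > threshold * local_mean
    cleaned = img_float.copy()
    cleaned[hot_mask] = local_mean[hot_mask]
    # Casting back to the input dtype truncates the replacement value
    return cleaned.astype(img.dtype), hot_mask

# Hypothetical image: flat background of 10 with a single hot pixel of 1000
img = np.full((7, 7), 10, dtype=np.uint16)
img[3, 3] = 1000
cleaned, mask = remove_hotpixels_threshold(img, threshold=3, neighborhood_size=5)
print(mask.sum())     # 1
print(cleaned[3, 3])  # 49
```

Because the local mean includes the pixel itself, an isolated spike on an empty (zero) background is only flagged when `threshold` is smaller than `neighborhood_size ** 2` (25 for a 5-pixel window), so very high threshold factors can silently disable the filter.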

`Function: FilterImages()`

<details><summary>Information</summary>

```python
FilterImages(
    rootdir = rootdir,
    projdir = projdir,
    panel_filename="panel.csv",
    fullstack = True, 
    format = "if", 
    hotpixel={"threshold":3,"neighbourhood":5}, 
    gauss_blur=1, 
    hpf=50,
    extract_dir = "analysis/1_image_out", 
    clean_dir = "analysis/2_cleaned")
```

================================================================

**Arguments:**  

- `rootdir`: This directory is intended to store commonly used GitHub Repositories and other relevant resources.    

- `projdir`: This directory is specific to individual projects.     

- `panel_filename`: The path to `panel.csv` relative to `rootdir/projdir`. Its `Full` column selects the channels kept when `fullstack = True` (and, for IMC data, the `Conjugate` names of those channels are passed to `create_analysis_stacks()`). For IF data the number of rows must equal the number of channels in each image.

- `fullstack`: A Boolean choosing whether to deposit only the channels with `Full = 1` in `panel.csv` (`True`) or all channels (`False`). This applies to IF data; for IMC data the `Full = 1` channels are always used.

- `format`: This argument strictly takes `if` or `imc` and processes the images accordingly.

- `hotpixel`: *If format == "if"*. A dictionary holding the threshold factor and the neighbourhood size in pixels, e.g. `{"threshold": 3, "neighbourhood": 5}`. A pixel that exceeds the mean of its neighbourhood multiplied by the threshold factor is replaced by that mean. If `None` (the default), no hot pixel filtering is applied.

- `gauss_blur`: *If format == "if"*. The sigma (in pixels) of the Gaussian blur, e.g. 1. If `None` (the default), no blur is applied.

- `hpf`: *If format == "imc"*. The hot pixel filter threshold passed to `create_analysis_stacks()`. Default is 50.

- `extract_dir`: The directory, relative to `rootdir/projdir`, containing the extracted images. The default is **"analysis/1_image_out"**.

- `clean_dir`: Output directory. Default is **analysis/2_cleaned**
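With `fullstack = True`, the kept channels are the rows of `panel.csv` flagged `Full = 1`, taken in panel order. A minimal sketch of that selection, assuming (as the channel-count check implies) that panel rows are in the same order as the image channels; `subset_full_channels` and the three-row panel are hypothetical:

```python
import numpy as np
import pandas as pd

def subset_full_channels(stack, panel):
    """Keep only the channels flagged Full == 1 (panel rows assumed to match stack order)."""
    if stack.shape[0] != len(panel):
        raise ValueError(f"Panel has {len(panel)} rows but stack has {stack.shape[0]} channels")
    keep = (panel["Full"] == 1).to_numpy()
    return stack[keep], panel.loc[keep, "Target"].tolist()

# Hypothetical 3-channel panel and a matching (C, H, W) stack
panel = pd.DataFrame({"Target": ["DNA", "CD3", "Background"], "Full": [1, 1, 0]})
stack = np.arange(3 * 4 * 4, dtype=np.uint16).reshape(3, 4, 4)
sub, names = subset_full_channels(stack, panel)
print(sub.shape, names)  # (2, 4, 4) ['DNA', 'CD3']
```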

================================================================

**Expected Outputs:**

Cleaned images are deposited into `clean_dir` with the suffix *_cleaned* (for IF data, one `<sample folder>_cleaned.tiff` per sample folder).

================================================================

**Packages:**   
- pathlib
- numpy
- pandas
- skimage
- scipy.ndimage
- skimage.filters
- imcsegpipe
- imcsegpipe.utils

================================================================

</details>

In [None]:
from pathlib import Path
import numpy as np
import pandas as pd
from skimage import io
from scipy.ndimage import uniform_filter
from skimage.filters import gaussian

# `format` has no default, so it is listed before the keyword arguments
def FilterImages(rootdir: str,
                 projdir: str,
                 format: str,
                 panel_filename: str = "panel.csv",
                 hotpixel=None,
                 gauss_blur=None,
                 fullstack: bool = True,
                 hpf=50,
                 extract_dir: str = "analysis/1_image_out",
                 clean_dir: str = "analysis/2_cleaned"):

    project_path = Path(rootdir) / projdir

    images_dir = project_path / extract_dir
    cleaned_dir = project_path / clean_dir
    panel_path = project_path / panel_filename
    cleaned_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if needed

    panel = pd.read_csv(panel_path)

    if format == "if":
        def remove_hotpixels_threshold(img, threshold=5.0, neighborhood_size=3):
            """
            Replace 'hot' pixels that are above (threshold * local_mean) with that local mean.
            """
            img_float = img.astype(float)
            local_mean = uniform_filter(img_float, size=neighborhood_size)
            
            # create mask of hot pixels
            hot_mask = img_float > (threshold * local_mean)

            cleaned_img = img_float.copy()
            cleaned_img[hot_mask] = local_mean[hot_mask]

            # Convert back to original dtype (e.g., uint16) if desired
            return cleaned_img.astype(img.dtype)

        def apply_gaussian_blur(img, sigma=1.0):
            """
            Applies a Gaussian blur with a given sigma.
            Returns the blurred image (preserving the original range and dtype).
            """
            blurred = gaussian(img, sigma=sigma, preserve_range=True)
            return blurred.astype(img.dtype)

        for sample_dir in images_dir.iterdir():

            # Skip files and hidden folders (e.g. .ipynb_checkpoints)
            if not sample_dir.is_dir() or sample_dir.name.startswith("."):
                continue

            # Collect this sample's Full == 1 channels (reset for every sample)
            full_stack = []

            # We expect exactly one TIFF in the folder; check, then read it.
            tif_files = list(sample_dir.glob("*.tif*"))
            if len(tif_files) != 1:
                raise ValueError(
                    f"Expected exactly 1 TIF in '{sample_dir.name}', found {len(tif_files)}."
                )
            single_tif = tif_files[0]
            image = io.imread(str(single_tif))

            # Check that the stack depth == number of rows in panel.csv
            if image.shape[0] != len(panel):
                raise ValueError(
                    f"Panel length is {len(panel)} but found `{image.shape[0]}` channels"
                    f" in '{sample_dir.name}'."
                )

            # Process each channel in the stack
            for idx in range(len(panel)):
                channel = image[idx, :, :]

                # 1) Hot pixel removal if a threshold is given.
                #    Accept both "neighbourhood" (as documented) and "neighborhood".
                if hotpixel and hotpixel.get("threshold") is not None:
                    size = hotpixel.get("neighbourhood", hotpixel.get("neighborhood", 3))
                    channel = remove_hotpixels_threshold(
                        channel,
                        threshold=hotpixel["threshold"],
                        neighborhood_size=size
                    )

                # 2) Gaussian blur if gauss_blur is not None
                if gauss_blur is not None:
                    channel = apply_gaussian_blur(channel, sigma=gauss_blur)

                # Replace the channel in the stack
                image[idx, :, :] = channel

                # Keep channels flagged Full == 1 for the reduced stack
                # (same dtype as `image`, so both output modes match)
                if panel.loc[idx, "Full"] == 1:
                    full_stack.append(channel)

            cleaned_image_name = f"{sample_dir.name}_cleaned.tiff"
            cleaned_image_path = cleaned_dir / cleaned_image_name

            if fullstack:
                # Save only the Full == 1 channels
                if len(full_stack) > 0:
                    io.imsave(str(cleaned_image_path), np.stack(full_stack))
                else:
                    print(f"Warning: No 'Full' channels found for sample '{sample_dir.name}'.")
            else:
                # Save every channel
                io.imsave(str(cleaned_image_path), image)



    elif format == "imc":    

        print("Generating Cleaned Images...\n")
        print("Using hot pixel filter of ",hpf,".\n")
        import imcsegpipe
        from imcsegpipe.utils import sort_channels_by_mass

        for image_dir in images_dir.glob("[!.]*"):
            if image_dir.is_dir():
                imcsegpipe.create_analysis_stacks(
                    acquisition_dir=image_dir,
                    analysis_dir=cleaned_dir,
                    analysis_channels=sort_channels_by_mass(
                        panel.loc[panel["Full"] == 1, "Conjugate"].tolist()
                    ),
                    suffix="_cleaned",
                    hpf=hpf
                )

    else:
        print("Format not specified. Choose 'imc' or 'if' specifically and run again.\n")

In [None]:
from PyMComplete import FilterImages

FilterImages(
    rootdir = rootdir,
    projdir = projdir,
    panel_filename="panel.csv",
    format = "if", 
    hotpixel={"threshold":3,"neighbourhood":5}, 
    gauss_blur=1, 
    fullstack = True, 
    hpf=50
)

<hr>

### Optional 3: Select Cropped regions for model training

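This step is also optional. It provides a quick way to manually select the cropped regions used to train the cell segmentation model. Crops saved here are used by `PrepCellpose()`; images without a saved crop are cropped randomly instead.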

`Function: CropSelector()`

<details><summary>Information</summary>

```python
CropSelector(
        rootdir = "rootdir", 
        projdir = "projdir",
        panel_path = "panel.csv",
        image_path = "image.csv",
        images_dir = "analysis/2_cleaned",
        suffix  = "_cleaned")
```

================================================================

**Arguments:**  

- `rootdir`: This directory is intended to store commonly used GitHub Repositories and other relevant resources.    

- `projdir`: This directory is specific to individual projects.     

- `panel_path`: The path to `panel.csv` relative to `rootdir/projdir`. The names in the `Target` column (for rows with `Full = 1`) are used for the channel drop-down options.

- `image_path`: The path to `image.csv` relative to `rootdir/projdir`. Crop coordinates are stored in its `Crop` column.

- `images_dir`: The directory, relative to `rootdir/projdir`, containing the images to crop. The default is **"analysis/2_cleaned"**.

- `suffix`: The suffix attached to the image file names. The default is **"_cleaned"**; it is removed from the file name to find the matching row in the `Image` column of `image.csv`.
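When saving, the selected file name has to be turned back into the value in the `Image` column. A minimal sketch of that mapping, using a hypothetical helper `image_key`: it drops any subfolder and the extension, then strips the suffix only when it ends the name. (The notebook code uses `str.replace`, which would also remove the suffix if it appeared in the middle of a name.)

```python
import os

def image_key(tiff_relpath, suffix="_cleaned"):
    """Map e.g. 'sample1/sample1_cleaned.tiff' to 'sample1' (the Image column value)."""
    stem = os.path.splitext(os.path.basename(tiff_relpath))[0]
    # Strip the suffix only when it terminates the file stem
    if suffix and stem.endswith(suffix):
        return stem[: -len(suffix)]
    return stem

print(image_key("sample1/sample1_cleaned.tiff"))  # sample1
```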

================================================================

**Expected Outputs:**

A widget opens in which you can choose an image and channel and draw a box to define a cropped region suitable for segmentation training. Click **Crop** to preview the region and **Save** to write its coordinates to the `Crop` column of `image.csv`. If no crop coordinates are saved for an image, `PrepCellpose()` generates a random crop instead.
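Crops are stored as strings of the form `x_y_w_h_manual`, where `x`, `y` are the top-left column and row in pixels and `w`, `h` the width and height; `PrepCellpose()` reads the first four fields. The sketch below, with hypothetical helpers `format_crop` and `parse_crop`, shows the round trip and a bounds check similar to the one in `PrepCellpose()` (the `x, y >= 0` test is an extra assumption). The 400 × 400 composite is made up.

```python
import numpy as np

def format_crop(x, y, w, h, tag="manual"):
    """Encode a crop box in the x_y_w_h_tag form written to image.csv."""
    return f"{x}_{y}_{w}_{h}_{tag}"

def parse_crop(crop_str, height, width):
    """Return (x, y, w, h) if crop_str describes a box inside a height x width image, else None."""
    if not isinstance(crop_str, str):
        return None  # empty cells are read by pandas as NaN (a float)
    parts = crop_str.split("_")
    if len(parts) < 4:
        return None
    try:
        x, y, w, h = map(int, parts[:4])
    except ValueError:
        return None
    if x < 0 or y < 0 or w <= 0 or h <= 0 or x + w > width or y + h > height:
        return None
    return x, y, w, h

image = np.zeros((3, 400, 400))  # hypothetical (C, H, W) composite
x, y, w, h = parse_crop(format_crop(65, 101, 189, 210), *image.shape[1:])
cropped = image[:, y:y + h, x:x + w]  # rows are y, columns are x
print(cropped.shape)  # (3, 210, 189)
```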

================================================================

**Packages:**   
- os
- numpy
- pandas
- tifffile 
- matplotlib.pyplot 
- ipywidgets 
- IPython.display 
- matplotlib.widgets 

================================================================

</details>

In [None]:
import os
import numpy as np
import pandas as pd
import tifffile as tiff
import matplotlib.pyplot as plt
from pathlib import Path

import ipywidgets as widgets
from IPython.display import display, clear_output
from matplotlib.widgets import RectangleSelector 

def CropSelector(
        rootdir:str, 
        projdir:str,
        panel_path = "panel.csv",
        image_path = "image.csv",
        images_dir = "analysis/2_cleaned",
        suffix  = "_cleaned"):
    
    project_path = Path(rootdir) / projdir
    im_dir =  project_path / images_dir
    panel_path = project_path / panel_path
    sample_csv_path = project_path / image_path

    if not os.path.exists(panel_path):
        raise FileNotFoundError(f"Panel file not found: {panel_path}")
    panel = pd.read_csv(panel_path)

    if not os.path.exists(sample_csv_path):
        raise FileNotFoundError(f"Image file not found: {sample_csv_path}")
    samples = pd.read_csv(sample_csv_path)

    if "Image" not in samples.columns:
        raise ValueError("image.csv must contain an 'Image' column.")
    if "Target" not in panel.columns:
        raise ValueError("panel.csv must contain a 'Target' column for channel names.")

    # We'll use the "Target" column as channel names
    channel_names = panel.loc[panel['Full'] == 1, 'Target'].tolist()
    num_channels  = len(channel_names)

    # Check whether im_dir contains per-image subdirectories or TIFF files directly
    subdirs = [d for d in os.listdir(im_dir) if os.path.isdir(os.path.join(im_dir, d))]
    tiff_files = []

    if subdirs:
        # If there are subdirectories, list TIFF files within each subdirectory
        for subdir in subdirs:
            subdir_path = os.path.join(im_dir, subdir)
            tiff_files.extend([
                os.path.join(subdir, f) for f in os.listdir(subdir_path)
                if f.endswith(".tif") or f.endswith(".tiff")
            ])
    else:
        # If no subdirectories, list TIFF files directly in im_dir
        tiff_files = [
            f for f in os.listdir(im_dir)
            if f.endswith(".tif") or f.endswith(".tiff")
        ]

    tiff_files.sort()
    
    # 2. Create Interactive Widgets
    
    image_dropdown = widgets.Dropdown(
        options=tiff_files,
        description='Image:',
        layout=widgets.Layout(width='200px')
    )

    channel_dropdown = widgets.Dropdown(
        options=channel_names,
        description='Channel:',
        layout=widgets.Layout(width='200px')
    )

    crop_button = widgets.Button(
        description="Crop",
        button_style='success'
    )
    save_button = widgets.Button(
        description="Save",
        button_style='info'
    )

    output_display = widgets.Output()

    # We'll keep track of the figure, axes, selected ROI, and loaded data
    # in these variables. We'll define them in the function's closure so we can
    # access/update them in callbacks.
    fig       = None
    ax_left   = None
    ax_right  = None
    rect_sel  = None
    roi       = {"x": 0, "y": 0, "w": 0, "h": 0}  # will store rectangle coords
    current_channel_data = None      # the 2D channel image
    cropped_data         = None      # the cropped region

    #--------------------------------------------------------------------------
    # 3. RectangleSelector callback
    #--------------------------------------------------------------------------
    def on_select(eclick, erelease):
        """
        Called whenever the user finishes drawing or moving the rectangle.
        eclick/erelease: mouse events with xdata, ydata in axes coords
        """
        x1, y1 = eclick.xdata, eclick.ydata
        x2, y2 = erelease.xdata, erelease.ydata

        # Ensure we have integer coords
        x_min, x_max = sorted([int(round(x1)), int(round(x2))])
        y_min, y_max = sorted([int(round(y1)), int(round(y2))])
        w = x_max - x_min
        h = y_max - y_min

        roi["x"], roi["y"], roi["w"], roi["h"] = x_min, y_min, w, h

    #--------------------------------------------------------------------------
    # 4. Function to load and display the selected channel
    #--------------------------------------------------------------------------
    def display_image(*args):
        """
        Loads the selected image & channel, sets up the RectangleSelector on the left axis.
        """
        nonlocal fig, ax_left, ax_right, rect_sel, current_channel_data, cropped_data
        cropped_data = None  # reset
        with output_display:
            clear_output(wait=True)

            selected_image_name = image_dropdown.value
            selected_channel_name = channel_dropdown.value

            # Convert that to a channel index
            try:
                channel_idx = channel_names.index(selected_channel_name)
            except ValueError:
                print(f"Channel '{selected_channel_name}' not found in panel.")
                return

            # Build the path to the TIF file (expected one TIF in the subfolder)
            tif_path = os.path.join(im_dir, selected_image_name)


            # Read the stack
            stack = tiff.imread(tif_path)
            if stack.shape[0] != num_channels:
                print(
                    f"Warning: panel has {num_channels} channels, "
                    f"but TIF has {stack.shape[0]} channels."
                )

            current_channel_data = stack[channel_idx, :, :]

            # Create a new figure
            fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 5))
            fig.canvas.toolbar_visible = False  # optional: hide toolbar
            fig.canvas.header_visible = False
            fig.canvas.footer_visible = False

            # Display the channel on the left
            ax_left.imshow(current_channel_data, cmap='gray')
            ax_left.set_title(f"{selected_image_name} - {selected_channel_name}")
            ax_left.axis('off')

            # The right side is blank initially
            ax_right.imshow(np.zeros((10,10)), cmap='gray')
            ax_right.set_title("Cropped Region")
            ax_right.axis('off')

            # Create RectangleSelector for the left axis
            rect_sel = RectangleSelector(
                ax_left,
                onselect=on_select,        # your callback function
                useblit=False,
                interactive=True,          # let the user move/resize the rectangle
                button=[1],                # left mouse button
                props=dict(
                    facecolor='none',
                    edgecolor='red',
                    fill=False, alpha=1
                )
            )

            plt.tight_layout()
            plt.show()

    #--------------------------------------------------------------------------
    # 5. Crop button callback
    #--------------------------------------------------------------------------
    def on_crop_clicked(b):
        """
        Uses the ROI (roi dict) to crop the currently displayed channel, then
        shows the result in ax_right.
        """
        nonlocal cropped_data

        if current_channel_data is None:
            print("No image loaded yet.")
            return

        x, y, w, h = roi["x"], roi["y"], roi["w"], roi["h"]
        if w <= 0 or h <= 0:
            print("Please draw a rectangle first.")
            return

        # Perform the crop
        cropped_data = current_channel_data[y:y+h, x:x+w]

        # Redraw only the right-hand axis of the existing figure. Clearing the
        # output widget would also remove the interactive figure, so the canvas
        # is updated in place instead (requires %matplotlib widget).
        ax_right.clear()
        ax_right.imshow(cropped_data, cmap='gray')
        ax_right.set_title(f"Cropped: x={x}, y={y}, w={w}, h={h}")
        ax_right.axis('off')
        fig.canvas.draw_idle()

    #--------------------------------------------------------------------------
    # 6. Save button callback
    #--------------------------------------------------------------------------
    def on_save_clicked(b):
        """
        Saves the crop coords as "x_y_w_h" in image.csv for the row
        where samples['Image'] equals the selected image.
        """
        selected_image_name = image_dropdown.value
        # Strip any subfolder, the file extension and the suffix to recover the
        # name used in the 'Image' column of image.csv
        imagename = os.path.splitext(os.path.basename(selected_image_name))[0].replace(suffix, "")

        x, y, w, h = roi["x"], roi["y"], roi["w"], roi["h"]

        if w <= 0 or h <= 0:
            print("No valid crop selected. Did you draw a rectangle?")
            return
        coords_str = f"{x}_{y}_{w}_{h}_manual"

        # Find the row(s) in samples where 'Image' == imagename.
        # If multiple rows match, all of them are updated.
        mask = (samples["Image"] == imagename)
        if not mask.any():
            print(f"No row in image.csv with Image == '{imagename}'.")
            return
        
        #Check if Crop exists as a column
        if "Crop" not in samples.columns:
            samples["Crop"] = np.nan
        samples["Crop"] = samples["Crop"].astype("string")

        samples.loc[mask, "Crop"] = coords_str

        # Save back to CSV
        samples.to_csv(sample_csv_path, index=False)
        print(f"Saved coords '{coords_str}' for image '{imagename}' into image.csv.")

    #--------------------------------------------------------------------------
    # 7. Wire up callbacks
    #--------------------------------------------------------------------------
    # Show or re-show the image whenever the user changes either dropdown
    image_dropdown.observe(display_image, names='value')
    channel_dropdown.observe(display_image, names='value')

    # Or whenever the function first runs
    # we can do an initial display after the UI is built.

    crop_button.on_click(on_crop_clicked)
    save_button.on_click(on_save_clicked)

    #--------------------------------------------------------------------------
    # 8. Layout the UI
    #--------------------------------------------------------------------------
    ui = widgets.VBox([
        widgets.HBox([image_dropdown, channel_dropdown]),
        widgets.HBox([crop_button, save_button]),
        output_display
    ])
    
    display(ui)

    # Trigger an initial display
    display_image()

In [None]:
%matplotlib widget

CropSelector(
    rootdir = rootdir, 
    projdir = projdir,
    panel_path = "panel.csv",
    image_path = "image.csv",
    images_dir = "analysis/2_cleaned")

<hr><hr>

# 5. Prepare the images for cell segmentation model training

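This function takes images from the `2_cleaned/` folder, builds a three-channel composite for each image (an empty channel, the mean of the non-nuclear channels flagged `Segment = 1`, and the nuclear channel) and deposits the composites into two folders: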
- 3a_cellpose_crop: used to train a segmentation model.
- 3b_cellpose_full: used to generate cell masks.
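The composite can be sketched as follows, mirroring the steps in `PrepCellpose()`: each `Segment = 1` channel is rescaled to [0, 1] from its own minimum and maximum and converted to 16-bit, the nuclear channel is set aside, and the remaining channels are averaged into one surface channel. `make_cellpose_composite` is a hypothetical name and the random input is made up; the explicit error for "nucleus only" inputs is an added assumption.

```python
import numpy as np
from skimage import exposure, img_as_uint

def make_cellpose_composite(seg_stack, seg_targets, nucleus="DNA"):
    """Build an [empty, surface, nucleus] uint16 stack from the Segment == 1 channels."""
    # Rescale each channel to [0, 1] using its own min/max, then convert to uint16
    norm = np.stack([
        img_as_uint(exposure.rescale_intensity(ch.astype(np.float64),
                                               in_range="image", out_range=(0.0, 1.0)))
        for ch in seg_stack
    ])
    nuc_idx = seg_targets.index(nucleus)
    dna = norm[nuc_idx]
    others = np.delete(norm, nuc_idx, axis=0)
    if others.shape[0] == 0:
        raise ValueError("Need at least one non-nuclear channel with Segment == 1")
    surface = others.mean(axis=0).astype(np.uint16)  # average of the surface markers
    empty = np.zeros_like(dna)
    return np.stack([empty, surface, dna])

rng = np.random.default_rng(0)
seg = rng.integers(0, 1000, size=(3, 8, 8)).astype(np.uint16)  # hypothetical CD45, DNA, CD3
comp = make_cellpose_composite(seg, ["CD45", "DNA", "CD3"], nucleus="DNA")
print(comp.shape, comp.dtype)  # (3, 8, 8) uint16
```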

`Function: PrepCellpose()`

<details><summary>Information</summary>

```python
PrepCellpose(
    rootdir, 
    projdir, 
    nucleus="DNA", 
    resolution=1, 
    crop_size=200,
    panel_dir = "panel.csv",
    images_dir = "image.csv",
    im_from = "analysis/2_cleaned", 
    suffix = "_cleaned",
    fullstack = True,
    crop_to = "analysis/3_segmentation/3a_cellpose_crop",
    full_to = "analysis/3_segmentation/3b_cellpose_full",
    out_suffix = "_CpSeg"
)
```

================================================================

**Arguments:**  
- `rootdir`: This directory is intended to store commonly used GitHub Repositories and other relevant resources.  

- `projdir`: This directory is specific to individual projects. 

- `nucleus`: The nuclear stain, as named in the *Target* column of `panel.csv`. This channel must also be flagged `Segment = 1`.

- `resolution`: The image resolution in pixels per µm, used to convert `crop_size` to pixels (crop size in pixels = `crop_size` × `resolution`). For standard IMC data (1 µm per pixel) this is 1.

- `crop_size`: The side length, in **µm**, of the square regions to generate randomly when no manual crop is stored in `image.csv`.

- `panel_dir`: The path to the `panel.csv` file relative to `rootdir/projdir`.

- `images_dir`: The path to the `image.csv` file relative to `rootdir/projdir`.

- `im_from`: The directory, relative to `rootdir/projdir`, containing the images to prepare for cell segmentation.

- `suffix`: The suffix appended to the image names. For example, images in `2_cleaned` are appended with "_cleaned".

- `fullstack`: Specifies whether the images have already been subset to the essential channels. `True` means the images in `im_from` contain only the channels where `Full == 1` in `panel.csv`. `False` means they contain every channel listed in `panel.csv` and will be subset to `Full == 1`.

- `crop_to`: The directory, relative to `rootdir/projdir`, for the cropped output images.

- `full_to`: The directory, relative to `rootdir/projdir`, for the full-sized output images.

- `out_suffix`: The suffix attached to the output images. Default is *"_CpSeg"*.
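The crop size arithmetic can be checked with a short sketch. `PrepCellpose()` computes `int(crop_size * resolution)` pixels, so 200 µm at 2 pixels per µm gives a 400-pixel square. The code that draws the random origin is not shown in this section, so the uniform draw below is an assumption; `random_crop_origin` is a hypothetical helper.

```python
import random

def random_crop_origin(height, width, crop_size_um=200, resolution=1.0, seed=None):
    """Return (x, y, size_px) for a random square crop, or None if the image is too small."""
    # Same conversion as PrepCellpose: resolution is in pixels per µm
    size_px = int(crop_size_um * resolution)
    if width < size_px or height < size_px:
        return None  # PrepCellpose saves such images without cropping
    rng = random.Random(seed)
    # randint is inclusive at both ends, so x + size_px <= width and y + size_px <= height
    x = rng.randint(0, width - size_px)
    y = rng.randint(0, height - size_px)
    return x, y, size_px

# 200 µm at 2 pixels per µm gives a 400 x 400 pixel crop
print(random_crop_origin(1000, 1200, crop_size_um=200, resolution=2, seed=0)[2])  # 400
```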

================================================================

**Expected Outputs:**

```text
Segmentation Targets: [....]
Image1: random-cropped at (x=31, y=67, size=200).
Image2: used manual crop 65_101_189_210
```

================================================================

**Packages:**   
- os
- random
- tifffile
- numpy
- pandas
- skimage

================================================================

</details>

In [None]:
import os
import random
import tifffile
import numpy as np
import pandas as pd
from skimage import exposure, img_as_uint

def PrepCellpose(
    rootdir, 
    projdir, 
    nucleus="DNA", 
    resolution=1, 
    crop_size=200,
    panel_dir="panel.csv",
    images_dir="image.csv",
    im_from="analysis/2_cleaned", 
    suffix="_cleaned",
    fullstack=True,
    crop_to="analysis/3_segmentation/3a_cellpose_crop",
    full_to="analysis/3_segmentation/3b_cellpose_full",
    out_suffix = "_CpSeg"
): 
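    # Define paths (all relative to rootdir/projdir)
    panel_file = os.path.join(rootdir, projdir, panel_dir)
    image_csv = os.path.join(rootdir, projdir, images_dir) 
    dir_images = os.path.join(rootdir, projdir, im_from)
    crop_output = os.path.join(rootdir, projdir, crop_to)
    im_output = os.path.join(rootdir, projdir, full_to)

    # Make sure the output folders exist before writing to them
    os.makedirs(crop_output, exist_ok=True)
    os.makedirs(im_output, exist_ok=True)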

    # Read panel.csv
    panel = pd.read_csv(panel_file)

    # Read image.csv and check for required column
    df_image = pd.read_csv(image_csv)
    if "Crop" not in df_image.columns:
        raise ValueError("image.csv must contain a 'Crop' column.\n")

    # Get list of images to process
    image_list = df_image["Image"].tolist()

    # Convert crop_size (µm) to pixels; resolution is in pixels per µm
    crop_size_px = int(crop_size * resolution)

    # Process each image
    for image_file in image_list:
        # Build the expected filename using the suffix
        candidate1 = os.path.join(dir_images, f"{image_file}{suffix}.tiff")
        candidate2 = os.path.join(dir_images, f"{image_file}{suffix}.tif")
        if os.path.exists(candidate1):
            full_image_path = candidate1
        elif os.path.exists(candidate2):
            full_image_path = candidate2
        else:
            raise FileNotFoundError(
                f"Error: File '{image_file}{suffix}.tiff' (or .tif) not found in {dir_images}. "
                "Please check your suffix and naming.\n"
            )

        # Read the image stack from file
        image_stack = tifffile.imread(full_image_path)

        # Process based on the fullstack flag
        if fullstack:
            # Expect the stack to have only the channels where Full == 1
            expected_channels = int(panel["Full"].sum())
            if image_stack.shape[0] != expected_channels:
                raise ValueError(
                    f"Error: For fullstack==True, expected {expected_channels} channels (per panel.csv 'Full' column) "
                    f"but found {image_stack.shape[0]} in image '{image_file}{suffix}'.\n"
                )
            # Create a sub-panel for the full channels
            full_panel = panel[panel["Full"] == 1].reset_index(drop=True)
        else:
            # Expect the stack to have all channels (length == panel length)
            if image_stack.shape[0] != len(panel):
                raise ValueError(
                    f"Error: For fullstack==False, expected image stack length equal to panel length ({len(panel)}) "
                    f"but found {image_stack.shape[0]} channels in image '{image_file}{suffix}'.\n"
                )
            # Subset the stack to only the channels where Full == 1
            full_indices = panel.index[panel["Full"] == 1].tolist()
            image_stack = image_stack[full_indices, :, :]
            full_panel = panel.loc[panel["Full"] == 1].reset_index(drop=True)

        # Determine segmentation targets from the full_panel (only for channels flagged for segmentation)
        segmentation_targets = full_panel.loc[full_panel["Segment"] == 1, "Target"].tolist()
        print("Segmentation Targets for image", image_file, ":", segmentation_targets,".\n")

        # Find the index of the nucleus channel within the segmentation targets
        dna_index = [i for i, target in enumerate(segmentation_targets) if target == nucleus]
        if not dna_index:
            raise ValueError(
                f"Error: DNA channel '{nucleus}' not found in segmentation targets for image '{image_file}{suffix}'.\n"
            )
        # Normalise only the channels flagged for segmentation (Segment == 1)
        normalized_stack = []
        for i in range(image_stack.shape[0]):
            # Check if the current channel should be segmented according to panel
            if full_panel.iloc[i]["Segment"] == 1:
                channel = image_stack[i, :, :]
                normalized = exposure.rescale_intensity(channel, in_range='image', out_range=(0, 1))
                normalized_stack.append(img_as_uint(normalized))
        if normalized_stack:
            normalized_stack = np.stack(normalized_stack)  # shape: (C_segment, H, W)
        else:
            raise ValueError("No channels with Segment==1 found in panel.")
        
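        # Identify the DNA channel from the normalised stack
        dna_chan = normalized_stack[dna_index[0]]
        # Remove the DNA channel(s) to compute the surface mask from the remaining channels
        for idx in sorted(dna_index, reverse=True):
            normalized_stack = np.delete(normalized_stack, idx, axis=0)
        if normalized_stack.shape[0] == 0:
            raise ValueError(
                f"No non-nuclear channels with Segment==1 for image '{image_file}{suffix}'; "
                "at least one surface marker is needed.\n"
            )
        surface_mask = np.mean(normalized_stack, axis=0).astype(np.uint16)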

        # Build the composite stack with three channels: [empty, surface, DNA]
        empty_channel = np.zeros_like(dna_chan, dtype=np.uint16)
        composite_stack = np.stack([empty_channel, surface_mask, dna_chan])

        # Save the full composite image
        im_output_path = os.path.join(im_output, f"{image_file}{out_suffix}.tiff")
        tifffile.imwrite(im_output_path, composite_stack)

        # Determine cropping parameters
        user_crop_str = df_image.loc[df_image["Image"] == image_file, "Crop"].values[0]  # e.g., "50_100_400_300" or <NA>
        _, height, width = composite_stack.shape

        if isinstance(user_crop_str, str) and user_crop_str.lower() != "nan":
            try:
                parts = user_crop_str.split("_")
                if len(parts) < 4:
                    raise ValueError("Not enough crop parameters provided.\n")
                x, y, w, h = map(int, parts[:4])
                # Validate that the provided crop coordinates are within image bounds
                if (x + w <= width) and (y + h <= height) and (w > 0) and (h > 0):
                    cropped = composite_stack[:, y:y+h, x:x+w]
                    crop_output_path = os.path.join(crop_output, f"{image_file}{out_suffix}.tiff")
                    tifffile.imwrite(crop_output_path, cropped)
                    print(f"{image_file}: used manual crop {x}_{y}_{w}_{h}.\n")
                    continue
                else:
                    print(f"{image_file}: user crop coordinates out of bounds => performing random crop.\n")
            except Exception as e:
                print(f"{image_file}: error parsing Crop='{user_crop_str}' => performing random crop. {e}\n")

        # If no valid user crop is provided, perform a random crop
        if width < crop_size_px or height < crop_size_px:
            # If the image is smaller than the desired crop size, save it without cropping.
            crop_output_path = os.path.join(crop_output, f"{image_file}{out_suffix}.tiff")
            tifffile.imwrite(crop_output_path, composite_stack)
            print(f"Image {image_file} is smaller than {crop_size_px} px => saved without cropping.\n")
            continue

        workable_x = width - crop_size_px
        workable_y = height - crop_size_px
        rand_x = random.randint(0, workable_x)
        rand_y = random.randint(0, workable_y)
        cropped = composite_stack[:, rand_y:rand_y + crop_size_px, rand_x:rand_x + crop_size_px]
        
        coords_str = f"{rand_x}_{rand_y}_{crop_size_px}_{crop_size_px}_random"
        df_image.loc[df_image["Image"] == image_file, "Crop"] = coords_str

        crop_output_path = os.path.join(crop_output, f"{image_file}{out_suffix}.tiff")
        tifffile.imwrite(crop_output_path, cropped)
        print(f"{image_file}: random-cropped at (x={rand_x}, y={rand_y}, size={crop_size_px}).\n")

    # Write the updated image.csv back to disk
    df_image.to_csv(os.path.join(rootdir, projdir, images_dir), index=False)
    print("\nDone!\n")



In [None]:
from PyMComplete import PrepCellpose

PrepCellpose(
    rootdir = rootdir, 
    projdir=projdir, 
    nucleus="191Ir_191Ir_DNA1", 
    resolution=1, 
    crop_size=200,
    panel_dir="panel.csv",
    images_dir="image.csv",
    im_from="analysis/2_cleaned", 
    suffix="_full",
    fullstack=True,
    crop_to="analysis/3_segmentation/3a_cellpose_crop",
    full_to="analysis/3_segmentation/3b_cellpose_full",
    out_suffix = "_full"
)
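
For orientation, the three-channel composite that `PrepCellpose()` writes can be sketched as below. This is a simplified re-implementation under stated assumptions, not the package's code: `build_composite` is a hypothetical helper, the Segment==1 flags are passed as a list of booleans, and plain min-max scaling to the uint16 range stands in for `exposure.rescale_intensity` followed by `img_as_uint`. It shows why the channel order matters: with `channels=[2, 3]`, Cellpose reads green (index 1) as the surface channel and blue (index 2) as the nuclear channel.

```python
import numpy as np

def build_composite(stack, segment_flags, dna_index):
    """Sketch: build an [empty, surface, DNA] uint16 composite from a (C, H, W) stack.

    segment_flags: one bool per channel (True where Segment == 1).
    dna_index: position of the nuclear channel among the *flagged* channels.
    """
    scaled = []
    for channel, flagged in zip(stack, segment_flags):
        if not flagged:
            continue
        channel = channel.astype(np.float64)
        lo, hi = channel.min(), channel.max()
        # Min-max scale to [0, 1] (constant channels become all zeros), then to uint16
        norm = (channel - lo) / (hi - lo) if hi > lo else np.zeros_like(channel)
        scaled.append(np.round(norm * 65535).astype(np.uint16))

    dna = scaled[dna_index]
    surface_channels = [c for i, c in enumerate(scaled) if i != dna_index]
    if not surface_channels:
        raise ValueError("Flag at least one non-DNA channel for segmentation.")
    # The surface channel is the per-pixel mean of all flagged non-DNA channels
    surface = np.mean(surface_channels, axis=0).astype(np.uint16)
    empty = np.zeros_like(dna)
    # Order matters: Cellpose channels=[2, 3] -> green (index 1) = surface, blue (index 2) = nucleus
    return np.stack([empty, surface, dna])
```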


<hr><hr>

### Optional 3: Registration with CV2

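Registering a second image of the same tissue (e.g. an immunofluorescence (IF) image with a DAPI stain) onto the IMC images can improve cell segmentation, for example by letting Cellpose use the IF nuclear stain in place of the IMC DNA channel.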

`Function: RegisterImages()`

You'll need two directories: 
- static images (the reference images that the others are registered to), and      
- moving images (the images to be registered).

Your directory may look like this:

```text
rootdir/projdir
├── analysis
│   ├── 1_image_out
│   ├── 2_cleaned
│   │   ├── imagename1_cleaned.tiff
│   │   ├── imagename2_cleaned.tiff
│   │   └── ...
│   ├── 3_segmentation
│   └── ...
├── IF
│   ├── imagename1_IF.tiff
│   ├── imagename2_IF.tiff
│   └── ...
├── panel.csv
└── panel_IF.csv
```

- `imagename1_cleaned.tiff` are the cleaned IMC images: *n*-channel stacks whose channels match the rows with Full==1 in `panel.csv`.  
- `imagename1_IF.tiff` are the images to register: *n*-channel stacks whose channels match the rows with Full==1 in `panel_IF.csv`. **Note**: these *can* be single-layer images, as long as this is reflected in `panel_IF.csv`.    
- `panel_IF.csv` should contain channel names that do not appear in `panel.csv`, so that the combined panel has no duplicates. For example, if CD3 is in both your IMC stack and your IF stack, append a suffix to the IF entry, such as *CD3_IF*.
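
If your IF panel reuses marker names from `panel.csv`, a quick check before registering can rename the clashes. The sketch below is a hypothetical helper (`suffix_duplicate_targets` is not part of PyMComplete), assuming both panels are loaded with pandas and name their channels in a *Target* column, as `RegisterImages()` expects:

```python
import pandas as pd

def suffix_duplicate_targets(panel_static, panel_moving, suffix="_IF"):
    """Return a copy of panel_moving whose Target names do not clash with panel_static."""
    existing = set(panel_static["Target"])
    fixed = panel_moving.copy()
    # Append the suffix only to names that already exist in the static panel
    fixed["Target"] = [t + suffix if t in existing else t for t in fixed["Target"]]
    clashes = set(fixed["Target"]) & existing
    if clashes or fixed["Target"].duplicated().any():
        raise ValueError(f"Channel names are still not unique: {sorted(clashes)}")
    return fixed
```

The result could then be written back with `fixed.to_csv(...)` (using `index=False`) as your `panel_IF.csv`.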

<details><summary>Information</summary>

```bash
RegisterImages(
    rootdir = rootdir, 
    projdir = projdir, 
    static_dir = "analysis/2_cleaned", 
    static_channel = "DNA", 
    static_suffix = "_cleaned", 
    static_panel_dir = "panel.csv", 
    moving_dir = "IF", 
    moving_channel = "DAPI", 
    moving_suffix = "_IF",
    moving_panel_dir = "panel_IF.csv",
    out_dir = "analysis/2b_registered", 
    out_dir_suffix = "_registered",
    combine = True
)
```

================================================================

**Arguments:**  
- `rootdir`: This directory is intended to store commonly used GitHub Repositories and other relevant resources.  

- `projdir`: This directory is specific to individual projects. 

- `static_dir`: Directory, relative to rootdir/projdir, containing the static images (the reference images for registration).

- `static_channel`: The channel, as named in the *Target* column of the static panel (`panel.csv`), that is used for registration; likely a DNA channel.

- `static_suffix`: The suffix attached to the *imagename* of the static images. For example, in **2_cleaned**, the default suffix is *_cleaned*. 

- `static_panel_dir`: Points to the panel that corresponds to the static images. The default is `panel.csv`.

- `moving_dir`: Directory, relative to rootdir/projdir, containing the images that you want to register to the static images. 

- `moving_channel`: The channel, as named in the *Target* column of the moving panel (`panel_IF.csv`), that is used for registration; likely a DAPI channel. The default is *"DAPI"*.

- `moving_suffix`: The suffix attached to the *imagename* of the moving images, for example *_IF*. 

- `moving_panel_dir`: Points to the panel that corresponds to the moving images. The default is `panel_IF.csv`.

- `out_dir`: Directory to output images relative to rootdir/projdir. Default is *"analysis/2b_registered"*.

- `out_dir_suffix`: The suffix to add to output images. Default is *"_reg"*.

- `combine`: If *False* (the default), only the registered moving registration channel (8-bit, as displayed) is saved, along with an `_options.csv` of the registration settings. If *True*, the registration transform is also applied to every channel of the moving image stack, and the transformed stack is appended to the full static image stack. For example, after registering DAPI to DNA, a 4-channel IF stack is appended to a 30-channel IMC stack, giving a 34-channel `imagename1_cleaned_reg_combined.tiff`. A `panel_combined.csv` (the static and moving panels concatenated) is also written next to `panel.csv`.


================================================================

**Packages:**   
- os    
- cv2   
- numpy 
- pandas    
- tifffile  
- matplotlib.pyplot 
- matplotlib.widgets    
- IPython.display   
- ipywidgets    

================================================================

</details>

In [None]:
import os
import cv2
import numpy as np
import pandas as pd
import tifffile
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display, clear_output
from matplotlib.widgets import RectangleSelector

def RegisterImages(rootdir, 
                   projdir, 
                   static_dir, 
                   static_channel, 
                   static_suffix, 
                   moving_dir, 
                   moving_suffix,
                   moving_channel = "DAPI", 
                   out_dir = "analysis/2b_registered", 
                   out_dir_suffix = "_reg",
                   static_panel_dir = "panel.csv", 
                   moving_panel_dir = "panel_IF.csv",
                   combine=False):
    """
    Creates an interactive UI to register a static image (e.g. IMC DNA) with a moving image (e.g. IF DAPI).
    If combine==True, then when saving, the transform is applied to all slices of the moving image stack,
    and the resulting transformed moving stack is appended to the full static image stack.
    A combined panel CSV is also generated by concatenating the raw static and moving panel CSVs.
    """
    # --- Helper functions for contrast adjustment and tinting ---
    def adjust_contrast_range(image, contrast_range):
        image_norm = image.astype(np.float32) / 255.0
        low, high = contrast_range
        if high - low < 1e-6:
            return image.copy()
        stretched = np.clip((image_norm - low) / (high - low), 0, 1)
        return (stretched * 255).astype(np.uint8)

    def tint_blue(image):
        return np.stack([np.zeros_like(image), np.zeros_like(image), image], axis=-1)

    def tint_yellow(image):
        return np.stack([image, image, np.zeros_like(image)], axis=-1)

    # --- Load panel CSV and get channel index ---
    def load_panel(panel_csv_path, channel_name):
        panel = pd.read_csv(panel_csv_path)
        panel_full = panel[panel["Full"]==1].reset_index(drop=True)
        matches = panel_full.index[panel_full["Target"]==channel_name].tolist()
        if not matches:
            raise ValueError(f"Channel '{channel_name}' not found in panel {panel_csv_path}.")
        return panel_full, matches[0]
    
    # --- List image files and extract base names ---
    def list_images(image_dir, suffix):
        valid_ext = (".tif", ".tiff")
        files = [f for f in os.listdir(image_dir) if f.endswith(valid_ext) and f.find(suffix) != -1]
        base_names = {}
        for f in files:
            for ext in valid_ext:
                if f.endswith(suffix + ext):
                    name = f[:-len(suffix + ext)]
                    base_names[name] = f
                    break
        return base_names

    # --- Load an image stack and select the channel slice ---
    def load_channel_image(image_dir, base_name, suffix, panel_csv, channel_name):
        for ext in [".tif", ".tiff"]:
            candidate = os.path.join(image_dir, f"{base_name}{suffix}{ext}")
            if os.path.exists(candidate):
                file_path = candidate
                break
        else:
            raise FileNotFoundError(f"Image file for {base_name}{suffix} not found in {image_dir}.")
        img_stack = tifffile.imread(file_path)
        panel_full, chan_idx = load_panel(os.path.join(rootdir, projdir, panel_csv), channel_name)
        if img_stack.ndim == 3:
            if chan_idx >= img_stack.shape[0]:
                raise ValueError(f"Channel index {chan_idx} out of bounds for image stack with {img_stack.shape[0]} slices.")
            img = img_stack[chan_idx, :, :]
        else:
            img = img_stack.copy()
        if img.dtype != np.uint8:
            img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
        return img

    # --- Helper function to load full image stack ---
    def load_full_stack(image_dir, base_name, suffix):
        for ext in [".tif", ".tiff"]:
            candidate = os.path.join(image_dir, f"{base_name}{suffix}{ext}")
            if os.path.exists(candidate):
                return tifffile.imread(candidate)
        raise FileNotFoundError(f"Full image stack for {base_name}{suffix} not found in {image_dir}.")

    # --- Prepare image file lists and dropdown menus ---
    static_images = list_images(os.path.join(rootdir, projdir, static_dir), static_suffix)
    moving_images = list_images(os.path.join(rootdir, projdir, moving_dir), moving_suffix)
    
    static_dropdown = widgets.Dropdown(options=sorted(static_images.keys()), description="Static:")
    moving_dropdown = widgets.Dropdown(options=sorted(moving_images.keys()), description="Moving:")

    # --- Global variables ---
    static_img_orig = None
    moving_img_orig = None
    global_overlay = None
    global_static  = None
    global_aligned = None
    global_fig = None
    global_ax_left = None
    global_ax_right = None
    ROI = {"x": 0, "y": 0, "w": 0, "h": 0}
    rect_sel = None
    saved_path = None
    # Store the computed transformation matrix and dimensions
    transformation_matrix = None
    global_dims = None
    registration_metrics = {}

    # --- Widgets for adjustable parameters ---
    static_contrast_slider = widgets.FloatRangeSlider(
        value=[0.0, 1.0], min=0.0, max=1.0, step=0.01,
        description='Static Contrast:',
        layout=widgets.Layout(width='300px')
    )
    moving_contrast_slider = widgets.FloatRangeSlider(
        value=[0.0, 1.0], min=0.0, max=1.0, step=0.01,
        description='Moving Contrast:',
        layout=widgets.Layout(width='300px')
    )
    ratio_threshold_slider = widgets.FloatSlider(value=0.92, min=0.80, max=0.98, step=0.01, description='Ratio Thresh:')
    ransac_threshold_slider  = widgets.IntSlider(value=25, min=5, max=50, step=1, description='RANSAC Thresh:')
    sift_sigma_slider        = widgets.FloatSlider(value=1.6, min=1.0, max=3.0, step=0.1, description='SIFT Sigma:')
    hist_eq_checkbox         = widgets.Checkbox(value=True, description='Histogram Equalisation')

    update_button = widgets.Button(description="Update Registration", button_style='success')
    crop_button   = widgets.Button(description="Crop", button_style='warning')
    save_button   = widgets.Button(description="Save Registered IF", button_style='info')
    output_folder_text = widgets.Text(
        value=os.path.join(rootdir, projdir, out_dir),
        description="Output Folder:",
        layout=widgets.Layout(width='400px')
    )
    
    fig_out = widgets.Output()

    # --- Load images based on dropdown selection ---
    def load_images_from_selection(change=None):
        nonlocal static_img_orig, moving_img_orig
        base_static = static_dropdown.value
        base_moving = moving_dropdown.value
        static_img_orig = load_channel_image(os.path.join(rootdir, projdir, static_dir),
                                             base_static, static_suffix,
                                             static_panel_dir, static_channel)
        moving_img_orig = load_channel_image(os.path.join(rootdir, projdir, moving_dir),
                                             base_moving, moving_suffix,
                                             moving_panel_dir, moving_channel)
    
    load_images_from_selection()
    static_dropdown.observe(load_images_from_selection, names='value')
    moving_dropdown.observe(load_images_from_selection, names='value')

    # --- Registration and display function ---
    def update_registration(b):
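        nonlocal global_overlay, global_static, global_aligned, global_fig, global_ax_left, global_ax_right, rect_sel, ROI, transformation_matrix, global_dims, registration_metrics
        # Reset results from any previous run so a failed registration cannot reuse a stale transform
        transformation_matrix = None
        registration_metrics.clear()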
        with fig_out:
            clear_output(wait=True)
            static_range = static_contrast_slider.value
            moving_range = moving_contrast_slider.value
            ratio_thresh    = ratio_threshold_slider.value
            ransac_thresh   = ransac_threshold_slider.value
            sift_sigma      = sift_sigma_slider.value
            use_hist_eq     = hist_eq_checkbox.value

            if use_hist_eq:
                static_img = cv2.equalizeHist(static_img_orig)
                moving_img = cv2.equalizeHist(moving_img_orig)
            else:
                static_img = static_img_orig.copy()
                moving_img = moving_img_orig.copy()
            static_img = adjust_contrast_range(static_img, static_range)
            moving_img = adjust_contrast_range(moving_img, moving_range)

            h_static, w_static = static_img.shape
            h_moving, w_moving = moving_img.shape
            global_dims = (w_static, h_static)
            if (h_moving, w_moving) != (h_static, w_static):
                scale_x = w_moving / w_static
                scale_y = h_moving / h_static
                moving_img_small = cv2.resize(moving_img, (w_static, h_static), interpolation=cv2.INTER_AREA)
            else:
                scale_x = scale_y = 1.0
                moving_img_small = moving_img.copy()

            sift = cv2.SIFT_create(nOctaveLayers=3, sigma=sift_sigma)
            kp_static, des_static = sift.detectAndCompute(static_img, None)
            kp_moving_small, des_moving_small = sift.detectAndCompute(moving_img_small, None)

            bf = cv2.BFMatcher(cv2.NORM_L2)
            matches = bf.knnMatch(des_static, des_moving_small, k=2) if (des_static is not None and des_moving_small is not None) else []
            # Lowe's ratio test; knnMatch can return fewer than two neighbours for some keypoints
            good_matches = [p[0] for p in matches if len(p) == 2 and p[0].distance < ratio_thresh * p[1].distance]

            if len(good_matches) >= 3:
                pts_static = np.float32([kp_static[m.queryIdx].pt for m in good_matches]).reshape(-1,2)
                pts_moving_small = np.float32([kp_moving_small[m.trainIdx].pt for m in good_matches]).reshape(-1,2)
                M_small, inliers = cv2.estimateAffine2D(pts_moving_small, pts_static, None, cv2.RANSAC, ransac_thresh, 2000, 0.99, 10)
                if M_small is not None:
                    M = M_small.copy()
                    M[0,0] /= scale_x
                    M[0,1] /= scale_y
                    M[1,0] /= scale_x
                    M[1,1] /= scale_y
                    transformation_matrix = M.copy()
                    rotation_rad = np.arctan2(M[1,0], M[0,0])
                    rotation_deg = np.degrees(rotation_rad)
                    scale_x_val = np.sqrt(M[0,0]**2 + M[1,0]**2)
                    scale_y_val = np.sqrt(M[0,1]**2 + M[1,1]**2)
                    translation_x = M[0,2]
                    translation_y = M[1,2]
                    registration_metrics['Rotation (deg)'] = rotation_deg
                    registration_metrics['Scale X'] = scale_x_val
                    registration_metrics['Scale Y'] = scale_y_val
                    registration_metrics['Translation X'] = translation_x
                    registration_metrics['Translation Y'] = translation_y

                    aligned_moving = cv2.warpAffine(moving_img, M, (w_static, h_static), flags=cv2.INTER_LINEAR)
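                else:
                    print("Affine estimation failed; showing the unregistered (resized) moving image.")
                    aligned_moving = cv2.resize(moving_img, (w_static, h_static))
            else:
                print(f"Only {len(good_matches)} good matches (at least 3 needed); showing the unregistered (resized) moving image.")
                aligned_moving = cv2.resize(moving_img, (w_static, h_static))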

            global_aligned = aligned_moving.copy()
            blue_static   = tint_blue(static_img)
            yellow_aligned = tint_yellow(aligned_moving)
            overlay = cv2.addWeighted(blue_static, 0.5, yellow_aligned, 0.5, 0)

            global_overlay = overlay
            global_static  = static_img.copy()
            ROI["x"] = ROI["y"] = ROI["w"] = ROI["h"] = 0

            global_fig, (global_ax_left, global_ax_right) = plt.subplots(1, 2, figsize=(12,6))
            global_ax_left.imshow(static_img, cmap='gray')
            global_ax_left.set_title("Static Image — Draw ROI here")
            global_ax_left.axis('off')
            global_ax_right.imshow(overlay)
            global_ax_right.set_title("Registration Output (Overlay)")
            global_ax_right.axis('off')
            rect_sel = RectangleSelector(
                global_ax_left,
                on_select,
                useblit=False,
                interactive=True,
                button=[1],
                props=dict(facecolor='none', edgecolor='red', fill=False, alpha=1)
            )
            plt.tight_layout()
            plt.show()

    def on_select(eclick, erelease):
        x1, y1 = eclick.xdata, eclick.ydata
        x2, y2 = erelease.xdata, erelease.ydata
        x_min, x_max = sorted([int(round(x1)), int(round(x2))])
        y_min, y_max = sorted([int(round(y1)), int(round(y2))])
        ROI["x"] = x_min
        ROI["y"] = y_min
        ROI["w"] = x_max - x_min
        ROI["h"] = y_max - y_min

    def crop_callback(b):
        nonlocal global_ax_right
        if ROI["w"] <= 0 or ROI["h"] <= 0:
            with fig_out:
                print("Please draw a rectangle on the left image first.")
            return
        cropped = global_overlay[ROI["y"]:ROI["y"]+ROI["h"], ROI["x"]:ROI["x"]+ROI["w"]]
        global_ax_right.clear()
        global_ax_right.imshow(cropped)
        global_ax_right.set_title(f"Cropped Region (x={ROI['x']}, y={ROI['y']}, w={ROI['w']}, h={ROI['h']})")
        global_ax_right.axis('off')
        global_fig.canvas.draw_idle()

    def save_callback(b):
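        nonlocal saved_path
        if global_aligned is None:
            with fig_out:
                print("Please click 'Update Registration' before saving.")
            return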
        out_folder = output_folder_text.value.strip()
        if not out_folder:
            with fig_out:
                print("Please specify a valid output folder.")
            return
        if not os.path.exists(out_folder):
            os.makedirs(out_folder)
        base_static = static_dropdown.value
        filename = f"{base_static}{static_suffix}{out_dir_suffix}.tiff"
        saved_path = os.path.join(out_folder, filename)
        cv2.imwrite(saved_path, global_aligned)
        settings = {
            "Static Contrast": static_contrast_slider.value,
            "Moving Contrast": moving_contrast_slider.value,
            "Ratio Threshold": ratio_threshold_slider.value,
            "RANSAC Threshold": ransac_threshold_slider.value,
            "SIFT Sigma": sift_sigma_slider.value,
            "Histogram Equalisation": hist_eq_checkbox.value,
            "ROI": ROI
        }
        settings.update(registration_metrics)
        options_path = os.path.join(out_folder, f"{base_static}{static_suffix}{out_dir_suffix}_options.csv")
        pd.DataFrame([settings]).to_csv(options_path, index=False)
        with fig_out:
            print("Registered image saved to:", saved_path)
            print("Options saved to:", options_path)
        
        # If combine==True, perform full-stack registration and panel concatenation.
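        if combine and transformation_matrix is None:
            with fig_out:
                print("No valid registration transform, so the combined stack was not saved.")
        if combine and transformation_matrix is not None:
            # Load full stacks
            static_stack = load_full_stack(os.path.join(rootdir, projdir, static_dir), base_static, static_suffix)
            moving_stack = load_full_stack(os.path.join(rootdir, projdir, moving_dir), moving_dropdown.value, moving_suffix)
            # Single-layer moving images are treated as a one-channel stack
            if moving_stack.ndim == 2:
                moving_stack = moving_stack[np.newaxis, ...]
            w_static, h_static = global_dims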
            transformed_moving_stack = []
            for i in range(moving_stack.shape[0]):
                ch_img = moving_stack[i, :, :]
                transformed = cv2.warpAffine(ch_img, transformation_matrix, (w_static, h_static), flags=cv2.INTER_LINEAR)
                transformed_moving_stack.append(transformed)
            transformed_moving_stack = np.stack(transformed_moving_stack)
            combined_stack = np.concatenate([static_stack, transformed_moving_stack], axis=0)
            combined_filename = f"{base_static}{static_suffix}{out_dir_suffix}_combined.tiff"
            combined_path = os.path.join(out_folder, combined_filename)
            tifffile.imwrite(combined_path, combined_stack)
            # Create panel_combined by concatenating the raw CSVs
            panel_static = pd.read_csv(os.path.join(rootdir, projdir, static_panel_dir))
            panel_moving = pd.read_csv(os.path.join(rootdir, projdir, moving_panel_dir))
            panel_combined = pd.concat([panel_static, panel_moving], ignore_index=True)
            static_panel_dir_full = os.path.join(rootdir, projdir, os.path.dirname(static_panel_dir))
            panel_combined_path = os.path.join(static_panel_dir_full, "panel_combined.csv")
            panel_combined.to_csv(panel_combined_path, index=False)
            with fig_out:
                print("Combined image stack saved to:", combined_path)
                print("Combined panel saved to:", panel_combined_path)

    update_button.on_click(update_registration)
    crop_button.on_click(crop_callback)
    save_button.on_click(save_callback)

    ui = widgets.VBox([
        widgets.HBox([static_dropdown, moving_dropdown]),
        widgets.HBox([static_contrast_slider, moving_contrast_slider]),
        widgets.HBox([ratio_threshold_slider, ransac_threshold_slider, sift_sigma_slider]),
        hist_eq_checkbox,
        widgets.HBox([update_button, crop_button, save_button]),
        output_folder_text,
        fig_out
    ])
    display(ui)
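
`RegisterImages()` detects features on a copy of the moving image resized to the static image's shape, so the matrix returned by `cv2.estimateAffine2D` maps *resized* moving coordinates onto the static image. Dividing its first column by `scale_x` and its second by `scale_y`, as the code above does, makes it act on the original moving image instead. The sketch below checks that identity numerically and recomputes the rotation and scale metrics the same way; the helper names are hypothetical and the example matrix is made up for illustration:

```python
import numpy as np

def rescale_affine(M_small, scale_x, scale_y):
    """Convert an affine fitted on the resized moving image so it acts on the original.

    scale_x = w_moving / w_static and scale_y = h_moving / h_static, so a point (x, y)
    in the original moving image lands at (x / scale_x, y / scale_y) after resizing.
    The translation column is unchanged.
    """
    M = np.asarray(M_small, dtype=np.float64).copy()
    M[:, 0] /= scale_x
    M[:, 1] /= scale_y
    return M

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    return pts @ M[:, :2].T + M[:, 2]

def decompose(M):
    """Rotation (degrees) and per-axis scale, computed like the registration metrics."""
    rotation = np.degrees(np.arctan2(M[1, 0], M[0, 0]))
    scale_x = np.hypot(M[0, 0], M[1, 0])
    scale_y = np.hypot(M[0, 1], M[1, 1])
    return rotation, scale_x, scale_y

# Example: a 10-degree rotation fitted on a moving image that was 2x wider and 2.5x taller
theta = np.radians(10)
M_small = np.array([[np.cos(theta), -np.sin(theta), 5.0],
                    [np.sin(theta),  np.cos(theta), -3.0]])
M = rescale_affine(M_small, 2.0, 2.5)
pts_original = np.array([[100.0, 40.0], [7.0, 300.0]])
pts_resized = pts_original / np.array([2.0, 2.5])
print(np.allclose(apply_affine(M, pts_original), apply_affine(M_small, pts_resized)))  # True
print(decompose(M))
```

A reported *Scale X* below 1 therefore indicates that moving-image pixels are smaller than static-image pixels.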


In [None]:
%matplotlib widget

from PyMComplete import RegisterImages

RegisterImages(rootdir = rootdir, 
               projdir = projdir, 
               static_dir = "analysis/2_cleaned", 
               static_channel = "DNA", 
               static_suffix="_cleaned", 
               static_panel_dir = "panel.csv", 
               moving_dir = "raw/IFImages", 
               moving_channel = "DAPI", 
               moving_suffix="_IF",
               moving_panel_dir = "panel_IF.csv",
               out_dir="analysis/2b_registered", 
               out_dir_suffix="_reg",
               combine=True
)

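From here, run `PrepCellpose()` on the registered images: point `im_from` to the registration output folder, set `suffix` to match the combined file names (e.g. *_cleaned_reg_combined*), use `panel_combined.csv` as the panel, and set `nucleus` to the IF DAPI channel instead of the IMC DNA channel. 

The Cellpose model is then trained on composites of the IMC surface markers and the IF nuclear stain.

**If you do not have an IF image for every IMC image**, you'll need two `2_cleaned` folders and two `image.csv` files: one for the images that have a matching IF image, and one for all images. First, train the model on the registered subset. Next, prepare all of the (unregistered) images using the IMC DNA channel and continue training the same model on these DNA-based images. Finally, batch segment the DNA-based images.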
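
`panel_combined.csv` is a plain concatenation of `panel.csv` and `panel_IF.csv` (all rows, including Full==0), while the combined stack holds the static Full==1 channels followed by every moving slice. Before running `PrepCellpose()` with DAPI as `nucleus`, it may be worth confirming that the two line up. A minimal sketch, assuming the moving stack has exactly one slice per Full==1 row of `panel_IF.csv`; `check_combined` is a hypothetical helper:

```python
import numpy as np
import pandas as pd

def check_combined(panel_combined, combined_stack, nucleus):
    """Check the combined panel against the combined stack; return the nucleus channel index.

    Assumes that, after keeping only Full == 1 rows, the panel rows are in the same
    order as the channels of the (C, H, W) combined stack.
    """
    full = panel_combined.loc[panel_combined["Full"] == 1].reset_index(drop=True)
    n_channels = combined_stack.shape[0]
    if len(full) != n_channels:
        raise ValueError(f"{len(full)} Full==1 panel rows but {n_channels} channels in the stack")
    matches = full.index[full["Target"] == nucleus].tolist()
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one '{nucleus}' channel, found {len(matches)}")
    return matches[0]
```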

<hr><hr>

# 6. Train the segmentation model

In this workflow, we use Cellpose for segmentation. *Training a model is technically optional*, but we've found that no generalised model works universally. Once you open Cellpose, the Models > Training Instructions menu option describes the training process, which we outline below: 

a. **Run *cellpose* using**    
```bash
python -m cellpose
```

b. **Drag an image from 3a_cellpose_crop into the Cellpose window.**

c. **Run a generalised segmentation model and adjust the available arguments until you're satisfied with an initial segmentation, such as:**

> *Cell probability threshold (cellprob threshold):* Sets the minimum confidence for a pixel to be assigned to a cell.
> **Increasing**: Leads to more conservative segmentation—only high-confidence pixels are included, which can reduce false positives but may miss faint or borderline cells. **Decreasing**: Results in a more inclusive segmentation, capturing more potential cell pixels at the risk of incorporating noise or false positives.
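
According to the Cellpose documentation, the network's cell-probability output is the input to a sigmoid centred at zero, so it ranges from roughly -6 to +6 and the threshold is on that logit scale. Assuming that, the small sketch below converts a threshold into the equivalent pixel probability; a threshold of 0 corresponds to 0.5. The function name is hypothetical:

```python
import math

def logit_to_probability(threshold):
    """Probability equivalent of a logit-scale cellprob threshold (logistic sigmoid)."""
    return 1.0 / (1.0 + math.exp(-threshold))

for t in (-6, -2, 0, 2, 6):
    print(f"cellprob_threshold={t:+d} -> pixel probability > {logit_to_probability(t):.4f}")
```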

> *Flow threshold:* The maximum allowed error, for each mask, between the flows predicted by the network and the flows recomputed from the mask itself; masks with a larger error are discarded as ill-shaped. The Cellpose default is 0.4.
> **Increasing**: More lenient, so more masks are returned, including some that may be ill-shaped. **Decreasing**: Stricter, removing more ill-shaped masks but possibly discarding genuine cells with unusual shapes. Setting it to 0 turns this check off.

> *Norm percentiles:* The lower and upper intensity percentiles that are mapped to 0 and 1 during image normalisation, which sets the contrast the model sees.
> **Lowering the upper percentile**: Saturates the brightest pixels and boosts the relative intensity of dimmer cells, at the cost of detail in bright regions. **Raising the upper percentile**: Preserves detail in bright regions but leaves dim cells fainter. **Raising the lower percentile**: Subtracts more background, suppressing low-level noise but possibly removing faint signal. **Lowering the lower percentile**: Retains faint signal along with more background.
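
The effect of the percentile bounds can be seen with a simple linear percentile normalisation. This is a sketch under an assumption: it clips values outside the range to [0, 1] for clarity, which Cellpose's own normalisation may not do. With a few very bright pixels, lowering the upper percentile from 99 to 90 lifts a dim cell from about 0.01 to 1.0 while saturating the bright pixels:

```python
import numpy as np

def percentile_normalise(img, lower=1.0, upper=99.0):
    """Map the lower/upper intensity percentiles of img to 0 and 1, clipping values outside."""
    lo, hi = np.percentile(img, [lower, upper])
    if hi - lo < 1e-12:
        return np.zeros_like(img, dtype=np.float64)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)

# 50 background pixels, 45 dim "cell" pixels and 5 very bright pixels
img = np.concatenate([np.zeros(50), np.full(45, 10.0), np.full(5, 1000.0)])
wide = percentile_normalise(img, 1, 99)    # dim cells stay faint
narrow = percentile_normalise(img, 1, 90)  # dim cells stretched, bright pixels saturated
print(wide[60], narrow[60])
```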

d. **Adjust the cell masks and press cmd/ctrl-S to save:**   
>  hold cmd/ctrl and click to remove a mask     
> right click and drag to create a new outline

e. **Press cmd/ctrl-T to train the model:** Choose your pretrained model and adjust the options available. Cellpose will open the next image with the segmentation mask generated by the model.

f. **Repeat steps d and e until you're satisfied with the model's output.** 

In [None]:
!python -m cellpose

<hr><hr>

# 7. Batch segment the images and generate cell masks

Once you've generated a segmentation model, or you intend to use a generalised model, you can use `BatchSegment()` to create mask tiff files. 



`Function: BatchSegment()`

There are two options:
- use a segmentation model trained on the cropped images in cellpose    
- use a segmentation model built-in to cellpose, such as *'cyto3', 'cyto2', 'cyto', 'nuclei', 'tissuenet_cp3', 'livecell_cp3', 'yeast_PhC_cp3','yeast_BF_cp3', 'bact_phase_cp3', 'bact_fluor_cp3', 'deepbacs_cp3', 'cyto2_cp3'*.    

If you want to segment the images with another tool and continue using this pipeline, save the resulting masks to the `mask_to` folder. The default is *"analysis/3_segmentation/3c_cellpose_mask"*.
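
Masks produced elsewhere should follow the same conventions as `BatchSegment()` output, which, judging from its code, is an integer label image named `<image name><suffix>.tif` with the same height and width as the image it came from. A hypothetical checker under those assumptions (`check_external_masks` is not part of PyMComplete):

```python
from pathlib import Path

import numpy as np
import tifffile

def check_external_masks(image_dir, mask_dir, suffix="_mask"):
    """Return a list of problems with masks placed in mask_dir by another tool."""
    problems = []
    for img_path in sorted(Path(image_dir).glob("*.tif*")):
        # Expected name mirrors BatchSegment(): <image stem><suffix>.tif
        mask_path = Path(mask_dir) / f"{img_path.stem}{suffix}.tif"
        if not mask_path.exists():
            problems.append(f"missing mask: {mask_path.name}")
            continue
        img = tifffile.imread(img_path)
        mask = tifffile.imread(mask_path)
        if mask.shape != img.shape[-2:]:
            problems.append(f"{mask_path.name}: shape {mask.shape} != image {img.shape[-2:]}")
        elif not np.issubdtype(mask.dtype, np.integer):
            problems.append(f"{mask_path.name}: expected integer labels, got {mask.dtype}")
    return problems
```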

<details><summary>Information</summary>

```bash
BatchSegment(
    rootdir,
    projdir,
    model = None, 
    builtin_model = None, 
    channels = [2, 3],
    cell_diameter,
    flow_threshold,
    cellprob_threshold,
    model_dir ="analysis/3_segmentation/3a_cellpose_crop/models",
    full_from = "analysis/3_segmentation/3b_cellpose_full",
    mask_to = "analysis/3_segmentation/3c_cellpose_mask",
    suffix = "_mask"
)
```

================================================================

**Arguments:**  
- `rootdir`: This directory is intended to store commonly used GitHub Repositories and other relevant resources.  

- `projdir`: This directory is specific to individual projects. 

- `model`: The file name of your trained segmentation model in `model_dir`. If you choose to use a built-in model, leave this as *None*.

- `builtin_model`: If you choose to use a built-in model, specify the name of the model. Current options are: *'cyto3', 'cyto2', 'cyto', 'nuclei', 'tissuenet_cp3', 'livecell_cp3', 'yeast_PhC_cp3','yeast_BF_cp3', 'bact_phase_cp3', 'bact_fluor_cp3', 'deepbacs_cp3', 'cyto2_cp3'*. 

- `channels`: The channels to use for segmentation. The default is [2,3], which specifies [green (surface), blue (nucleus)] and matches the [empty, surface, DNA] composite written by `PrepCellpose()`.

- `cell_diameter`: The expected cell diameter in pixels, e.g. as determined in the Cellpose GUI. 

- `flow_threshold`: The Cellpose flow_threshold, e.g. as tuned in the Cellpose GUI.  

- `cellprob_threshold`: The Cellpose cellprob_threshold, e.g. as tuned in the Cellpose GUI. 

- `model_dir`: Directory to the model folder. Default is *"analysis/3_segmentation/3a_cellpose_crop/models"*

- `full_from`: Directory to the images folder. Default is *"analysis/3_segmentation/3b_cellpose_full"*

- `mask_to`: Where the masks are deposited. Default is *"analysis/3_segmentation/3c_cellpose_mask"*

- `suffix`: The appended suffix to each image. The default is *"_mask"*


================================================================

**Packages:**   
- os    
- skimage.io   
- cellpose
- cellpose.io
- pathlib
- tifffile  

================================================================

</details>

In [None]:
import os
import skimage.io
from cellpose import models
from cellpose.io import logger_setup
from pathlib import Path
import tifffile

def BatchSegment(
        rootdir,
        projdir,
        cell_diameter: float,
        flow_threshold: float,
        cellprob_threshold: float,
        model = None, 
        builtin_model = None, 
        channels = [2, 3],
        model_dir ="analysis/3_segmentation/3a_cellpose_crop/models",
        full_from = "analysis/3_segmentation/3b_cellpose_full",
        mask_to = "analysis/3_segmentation/3c_cellpose_mask",
        suffix = "_mask"
        ):
    
    size_models = ['cyto3', 'cyto2', 'cyto', 'nuclei']
    cp3_models = ['tissuenet_cp3', 'livecell_cp3', 'yeast_PhC_cp3', 'yeast_BF_cp3',
                  'bact_phase_cp3', 'bact_fluor_cp3', 'deepbacs_cp3', 'cyto2_cp3']

    # Define Cellpose model
    if model is not None: 
        model_path = os.path.join(rootdir, projdir, model_dir, model)
        if os.path.exists(model_path):
            print("Choosing ", model_path)
            cp_model = models.CellposeModel(pretrained_model=model_path)
        else:
            print("Model path does not exist. Exiting...")
            print(model_path)
            return
        
    elif builtin_model is not None: 
        if builtin_model in size_models:
            print("Choosing ", builtin_model)
            cp_model = models.Cellpose(model_type=builtin_model)
        elif builtin_model in cp3_models:
            print("Choosing ", builtin_model)
            # Load the requested cp3 model
            cp_model = models.CellposeModel(model_type=builtin_model)
        else: 
            print(f"'{builtin_model}' is not available as a built-in model.")
            print("Choose: " + ", ".join(size_models + cp3_models) + ".")
            return
    else:
        print("Specify either `model` (a trained model in model_dir) or `builtin_model`. Exiting...")
        return

    # Set and create directories
    analysis = Path(os.path.join(rootdir, projdir))
    image_dir = analysis / full_from
    mask_dir = analysis / mask_to
    mask_dir.mkdir(parents=True, exist_ok=True)

    # Call logger_setup to have output of cellpose written
    logger_setup()

    # Get list of image files
    files = sorted(os.path.join(image_dir, f) for f in os.listdir(image_dir) if f.endswith((".tif", ".tiff")))
    imgs = [tifffile.imread(f) for f in files]

    # Run segmentation. models.Cellpose.eval returns (masks, flows, styles, diams) while
    # models.CellposeModel.eval returns (masks, flows, styles); the masks come first in both.
    results = cp_model.eval(imgs, diameter=cell_diameter, flow_threshold=flow_threshold,
                            cellprob_threshold=cellprob_threshold, channels=channels)
    masks = results[0]

    # Save each mask as <image name><suffix>.tif
    for file_path, mask in zip(files, masks):
        new_path = mask_dir / (Path(file_path).stem + suffix + ".tif")
        skimage.io.imsave(new_path, mask, check_contrast=False)

    print("Done!")

In [None]:
from PyMComplete import BatchSegment

BatchSegment(
        rootdir = rootdir,
        projdir = projdir,
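        model = None,           # filename of a custom model inside model_dir, or...
        builtin_model = None,   # ...a built-in Cellpose model such as "cyto3"; set one of the two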
        channels = [2, 3],
        cell_diameter = 14.3,
        flow_threshold = 0,
        cellprob_threshold = 0,
        model_dir ="analysis/3_segmentation/3a_cellpose_crop/models",
        full_from = "analysis/3_segmentation/3b_cellpose_full",
        mask_to = "analysis/3_segmentation/3c_cellpose_mask",
        suffix = "_mask"
        )
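`cell_diameter` is given in pixels (14.3 in the call above). If you already have masks you trust, for example from a model trained on the cropped images, one way to choose a value is the median equivalent-circle diameter, d = 2·sqrt(A/π), of the labelled cells. The sketch below assumes a 2D integer label image with 0 as background; `median_cell_diameter` is a hypothetical helper, not part of PyMComplete or Cellpose.

```python
import numpy as np


def median_cell_diameter(mask):
    """Median equivalent-circle diameter (pixels) of the labelled cells in a 2D mask.

    Hypothetical helper: assumes integer labels with 0 = background.
    """
    labels, areas = np.unique(np.asarray(mask), return_counts=True)
    areas = areas[labels != 0]                 # drop the background label
    if areas.size == 0:
        return float("nan")
    diameters = 2.0 * np.sqrt(areas / np.pi)   # d = 2 * sqrt(A / pi)
    return float(np.median(diameters))
```

Applied to a mask loaded with `tifffile.imread`, the result gives a starting point for `cell_diameter` that can then be refined by inspecting the segmentation.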

<hr><hr>

# 8. PyProfiler

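PyProfiler extracts single-cell measurements (mean marker intensities, shape, cell outlines, compartment values, nearest neighbours and boundary contacts) from the segmentation masks and fullstack images. It covers the measurement step usually performed in CellProfiler, without the need for an additional application.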
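At its core, the mean-intensity measurement averages each channel of the fullstack image over the pixels of each cell label. The PyProfiler code below does this one cell at a time with PyTorch; the sketch here swaps in `scipy.ndimage.mean`, which computes the means for all labels of one channel in a single call. It assumes a `(channels, height, width)` stack whose channel order matches `marker_names`, and an integer mask with 0 as background; `mean_intensities` is a hypothetical name.

```python
import numpy as np
from scipy import ndimage


def mean_intensities(stack, mask, marker_names):
    """Mean intensity of every channel within every labelled cell.

    stack: (C, H, W) array; mask: (H, W) integer labels, 0 = background;
    marker_names: one name per channel, in channel order (assumption).
    Returns (cell_ids, {marker: array of per-cell means}).
    """
    stack = np.asarray(stack, dtype=np.float64)
    mask = np.asarray(mask)
    if stack.shape[0] != len(marker_names):
        raise ValueError("number of channels does not match number of marker names")
    if stack.shape[1:] != mask.shape:
        raise ValueError("stack and mask have different spatial shapes")
    cell_ids = np.unique(mask)
    cell_ids = cell_ids[cell_ids != 0]
    if cell_ids.size == 0:
        return cell_ids, {name: np.empty(0) for name in marker_names}
    means = {}
    for c, name in enumerate(marker_names):
        # One call averages this channel over every label listed in cell_ids
        means[name] = np.asarray(ndimage.mean(stack[c], labels=mask, index=cell_ids), dtype=float)
    return cell_ids, means
```

One difference worth noting: the PyProfiler code also reports a row for label 0 (background), whereas this sketch skips it.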

`Function: PyProfiler()`

PyProfiler works on the masks produced in the segmentation step, which can come from either:
- a segmentation model trained in Cellpose on the cropped images, or
- a built-in Cellpose model, such as *'cyto3', 'cyto2', 'cyto', 'nuclei', 'tissuenet_cp3', 'livecell_cp3', 'yeast_PhC_cp3', 'yeast_BF_cp3', 'bact_phase_cp3', 'bact_fluor_cp3', 'deepbacs_cp3', 'cyto2_cp3'*.

If you segment outside this pipeline and want to continue using it, place the mask images in the folder passed as `mask_dir`. Note that `BatchSegment()` writes masks to *"analysis/3_segmentation/3c_cellpose_mask"* by default, whereas the default `mask_dir` of `PyProfiler()` is *"analysis/3_segmentation/3e_cellpose_mask"*, so check that both point to the same folder.
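PyProfiler expects each mask to be a label image: an integer array in which background is 0 and every cell has its own ID. Some external tools export a binary foreground mask instead. The sketch below, assuming a 2D binary input, labels connected components with `scipy.ndimage.label` and builds the filename PyProfiler pairs with `<name><image_suffix>.tif(f)`. `binary_to_labels` and `mask_filename` are hypothetical helpers. Touching cells merge into a single component this way, so a label image exported directly by the segmentation tool is preferable when available.

```python
import os

import numpy as np
from scipy import ndimage


def binary_to_labels(binary_mask, connectivity=1):
    """Label connected foreground regions: 0 = background, 1..n = cells."""
    structure = ndimage.generate_binary_structure(2, connectivity)
    labels, n_cells = ndimage.label(np.asarray(binary_mask) > 0, structure=structure)
    return labels.astype(np.uint32), n_cells


def mask_filename(image_file, image_suffix="_full", mask_suffix="_mask"):
    """Mask name that PyProfiler matches to '<name><image_suffix>.tif(f)'."""
    stem = os.path.splitext(os.path.basename(image_file))[0]
    if image_suffix and stem.endswith(image_suffix):
        stem = stem[: -len(image_suffix)]   # recover the shared image name
    return f"{stem}{mask_suffix}.tif"
```

The resulting array can then be saved with `tifffile.imwrite(os.path.join(mask_folder, mask_filename(image_file)), labels)`, where `mask_folder` is the folder you pass as `mask_dir`.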

<details><summary>Information</summary>

```python
PyProfiler(
    rootdir, 
    projdir, 
    mean = 1, 
    shape = 1, 
    geometry = 1, 
    compartment = 1,
    compartment_measure = "mean",
    neighbours = 0,
    boundary_contacts = False,
    panel_path = "panel.csv",
    mask_dir = "analysis/3_segmentation/3e_cellpose_mask",
    image_dir = "analysis/3_segmentation/3a_fullstack", 
    compartment_dir = "analysis/3_segmentation/3f_compartments", 
    out_dir = "analysis/4_pyprofiler_output/cell.csv",
    geom_out_dir = "analysis/4_pyprofiler_output/geom.csv", 
    mask_suffix = "_mask",
    image_suffix = "_full",
    comp_suffix = "_compartment"
)
```

================================================================

**Arguments:**  

- `rootdir`: This directory is intended to store commonly used GitHub repositories and other relevant resources. All other paths are joined onto `rootdir/projdir`.

- `projdir`: This directory is specific to individual projects. 

- `mean`: A logical (0/1) value; if set, collects the mean intensity of each selected marker within each cell mask of the fullstack images.  

- `shape`: A logical value; if set, collects the area, centroid (`X`, `Y`) and `Eccentricity` of each cell. `Eccentricity` is computed as the bounding-box height/width ratio. The centroids are also required by `neighbours`.

- `geometry`: A logical value that specifies whether to collect cell outlines (contours), which can be used to reconstruct the shape of the cell masks. **Note**: This can add about 50% extra processing time. 

- `compartment`: A logical value that specifies whether to collect information from compartment masks.

- `compartment_measure`: The method for collecting information from a compartment mask. The options include:  
    
    - `"centroid"` will take the exact value at the centroid location of the cell

    - `"mean"` will take the mean pixel value of the cell mask pixels on the compartment image.

    - `"mode"` will take the most frequent value of the cell mask pixels on the compartment image.

- `neighbours`: The number *n* of nearest neighbouring cells (by centroid distance) to record for each cell. The output is two additional columns in cell.csv, `nearest_neighbours` and `nearest_neighbours_dist`, with values such as nearestCell1_nearestCell2...nearestCell*n* and distToCell1_distToCell2...distToCell*n*. Requires `shape = 1`. Default is 0. **Note**: This does **not** noticeably increase the processing time. 

- `boundary_contacts`: A logical value; if set, counts for each cell how many pixels of the one-pixel ring surrounding it touch each neighbouring cell ID (ID 0 is background). The output is an additional column in cell.csv with values such as "0x20_5x12", i.e. "cellIDxpixels_cellIDxpixels". Default is False. **Note**: This makes generating the cell.csv file about 3x slower. 

- `panel_path`: Path to panel.csv. The panel needs `Full` and `Target` columns; markers with `Full == 1` are assumed to match the fullstack channels in order.

- `mask_dir`: Directory of the mask images.

- `image_dir`: Directory of the fullstack images.

- `compartment_dir`: Parent directory of the compartment folders. Each subfolder is treated as one compartment and becomes a column named after the folder.

- `out_dir`: Output CSV file for the cell features.

- `geom_out_dir`: Output CSV file for the geometry information. 

- `mask_suffix`: The suffix for masks. The default is "_mask".

- `image_suffix`: The suffix for fullstack images. The default is "_full".

- `comp_suffix`: The suffix for compartment masks. The default is "_compartment".
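The three `compartment_measure` options can give different answers for the same cell. The NumPy sketch below mirrors the options described above on a toy image; it assumes a boolean cell mask and a compartment image whose pixel values encode compartment membership. `compartment_value` is a hypothetical helper, not PyProfiler itself. As in PyProfiler, the centroid is the mean row/column position, truncated to an integer index.

```python
import numpy as np


def compartment_value(cell_region, comp_image, measure="mean"):
    """Summarise comp_image over one cell (boolean cell_region) by centroid, mean or mode."""
    rows, cols = np.nonzero(cell_region)
    if rows.size == 0:
        return float("nan")
    if measure == "centroid":
        # Value at the (truncated) mean row/column position of the cell
        return float(comp_image[int(rows.mean()), int(cols.mean())])
    values = comp_image[cell_region]
    if measure == "mean":
        return float(values.mean())
    if measure == "mode":
        uniq, counts = np.unique(values, return_counts=True)
        return float(uniq[np.argmax(counts)])
    raise ValueError(f"Invalid compartment_measure: {measure}")


cell = np.zeros((5, 5), dtype=bool)
cell[0:2, 2:5] = True          # a 2 x 3 cell
comp = np.zeros((5, 5))
comp[:, 3] = 1                 # compartment 1 is a single column through the cell
for m in ("centroid", "mean", "mode"):
    print(m, compartment_value(cell, comp, m))
```

Here the centroid falls on the compartment (1.0), while the mean (about 0.33) and mode (0.0) reflect that most of the cell lies outside it.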

================================================================

**Packages:**   
- os    
- time
- numpy
- pandas
- tifffile
- skimage.measure
- torch
- scipy.ndimage
- scipy.spatial.distance
- collections

================================================================

</details>
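In the PyProfiler code below, the `Eccentricity` column is the bounding-box height divided by the width, which is an aspect ratio rather than the ellipse-based eccentricity (0 for a circle, approaching 1 for a line). If the latter is wanted, it can be computed from the second-order central moments of the pixel coordinates, following the definition `skimage.measure.regionprops` documents for `eccentricity`. A minimal NumPy sketch; `ellipse_eccentricity` is a hypothetical helper.

```python
import numpy as np


def ellipse_eccentricity(cell_region):
    """Eccentricity of the ellipse with the same second moments as a 2D region.

    0 for a disc-like region, approaching 1 for a line-like one.
    """
    rows, cols = np.nonzero(cell_region)
    if rows.size < 2:
        return float("nan")
    # Population covariance of the pixel coordinates (second central moments / area)
    cov = np.cov(np.vstack([rows, cols]).astype(float), bias=True)
    minor, major = np.linalg.eigvalsh(cov)   # eigenvalues in ascending order
    if major <= 0:
        return float("nan")
    # sqrt(1 - b^2 / a^2), clamped against tiny negative rounding errors
    return float(np.sqrt(max(0.0, 1.0 - minor / major)))
```

For a 2 × 6 rectangle the bounding-box ratio is about 0.33, whereas this measure is about 0.96, so the two columns are not interchangeable.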

In [None]:
import os
import time
import numpy as np
import pandas as pd

from tifffile import imread
from skimage.measure import find_contours
import torch
from scipy.ndimage import binary_dilation, generate_binary_structure
from scipy.spatial.distance import cdist

from collections import defaultdict

def PyProfiler(rootdir = "", 
               projdir = "", 
               mean = 1, 
               shape = 1, 
               geometry = 1, 
               compartment = 1,
               compartment_measure = "mean",
               panel_path = "panel.csv",
               mask_dir =  "analysis/3_segmentation/3e_cellpose_mask",
               image_dir = "analysis/3_segmentation/3a_fullstack", 
               compartment_dir =  "analysis/3_segmentation/3f_compartments", 
               out_dir = "analysis/4_pyprofiler_output/cell.csv",
               geom_out_dir = "analysis/4_pyprofiler_output/geom.csv", 
               mask_suffix = "_mask",
               image_suffix = "_full",
               comp_suffix = "_compartment",
               neighbours=0,           # Number of nearest neighbours to find
               boundary_contacts=False # If True, store boundary contact breakdown in a single column
               ):

    # Check for CUDA availability
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    # Directories
    masks_dir = os.path.join(rootdir, projdir, mask_dir)
    stacks_dir = os.path.join(rootdir, projdir, image_dir)
    compartments_dir = os.path.join(rootdir, projdir, compartment_dir)

    # Get list of mask files (.tif or .tiff)
    mask_files = [f for f in os.listdir(masks_dir) if f.endswith(('.tif', '.tiff'))]
    image_names = [os.path.splitext(f)[0].replace(mask_suffix, "") for f in mask_files]

    # Identify compartments if applicable
    if compartment:
        compartment_folders = [
            f for f in os.listdir(compartments_dir) 
            if os.path.isdir(os.path.join(compartments_dir, f))
        ]
        if not compartment_folders:
            print("No folders found in the compartments directory. Disabling compartment processing.")
            compartment = 0
        else:
            compartment_masks = {}
            for folder in compartment_folders:
                folder_path = os.path.join(compartments_dir, folder)
                compartment_masks[folder] = {
                    f: os.path.join(folder_path, f)
                    for f in os.listdir(folder_path)
                    if f.endswith(('.tif', '.tiff'))
                }

    # Prepare for overall results
    all_results = []
    all_geom_results = []
    start_time = time.time()

    # ----------------------------------------------------------
    # Process each image
    # ----------------------------------------------------------
    for name in image_names:
        print(f"Processing {name}...")

        # Resolve actual mask path
        mask_path_tif = os.path.join(masks_dir, f"{name}{mask_suffix}.tif")
        mask_path_tiff = os.path.join(masks_dir, f"{name}{mask_suffix}.tiff")
        if os.path.exists(mask_path_tif):
            mask_path = mask_path_tif
        elif os.path.exists(mask_path_tiff):
            mask_path = mask_path_tiff
        else:
            print(f"No mask file found with .tif or .tiff extension for {name}")
            continue

        # Resolve actual stack path
        stack_path_tif = os.path.join(stacks_dir, f"{name}{image_suffix}.tif")
        stack_path_tiff = os.path.join(stacks_dir, f"{name}{image_suffix}.tiff")
        if os.path.exists(stack_path_tif):
            stack_path = stack_path_tif
        elif os.path.exists(stack_path_tiff):
            stack_path = stack_path_tiff
        else:
            print(f"No stack file found with .tif or .tiff extension for {name}")
            continue

        # Read images
        cell_mask = imread(mask_path)           
        fluorescence_stack = imread(stack_path) 

        # Convert to PyTorch
        cell_mask_tensor = torch.tensor(cell_mask, device=device, dtype=torch.int32)
        fluorescence_stack_tensor = torch.tensor(fluorescence_stack, device=device, dtype=torch.float32)

        # If boundary contacts are needed, keep a CPU copy
        cell_mask_cpu = cell_mask_tensor.cpu().numpy() if boundary_contacts else None

        # Load panel
        panel_path_full = os.path.join(rootdir, projdir, panel_path)
        panel = pd.read_csv(panel_path_full)
        selected_markers = panel[panel['Full'] == 1].reset_index(drop=True)
        selected_indices = range(len(selected_markers)) 
        selected_names = selected_markers['Target'].values

        # Extract unique CellIDs
        cell_ids = torch.unique(cell_mask_tensor).cpu().numpy()

        # Prepare results
        results = []
        geom_results = []

        # Compartment data
        compartment_data = {}
        if compartment:
            for comp_name, comp_files in compartment_masks.items():
                comp_file_tif = comp_files.get(f"{name}{comp_suffix}.tif")
                comp_file_tiff = comp_files.get(f"{name}{comp_suffix}.tiff")
                comp_file = comp_file_tif if comp_file_tif else comp_file_tiff
                if comp_file:
                    compartment_data[comp_name] = torch.tensor(
                        imread(comp_file),
                        device=device, 
                        dtype=torch.float32
                    )

        # ----------------------------------------
        # Process each cell
        # ----------------------------------------
        for cell_id in cell_ids:
            cell_region = (cell_mask_tensor == cell_id)
            cell_data = {
                "Image": name,
                "CellID": int(cell_id)
            }

            # Area and centroid are always computed: the compartment measurements
            # below rely on them even when shape is disabled
            area = int(torch.sum(cell_region).item())
            if area > 0:
                indices = torch.nonzero(cell_region, as_tuple=True)
                centroid_y = torch.mean(indices[0].float()).item()
                centroid_x = torch.mean(indices[1].float()).item()
            else:
                centroid_y, centroid_x = np.nan, np.nan

            if shape:
                cell_data["Area"] = area
                cell_data["X"] = centroid_x
                cell_data["Y"] = centroid_y

                # "Eccentricity" is the bounding-box height/width ratio
                # (an aspect ratio, not the ellipse-based eccentricity)
                bbox_indices = torch.nonzero(cell_region)
                if bbox_indices.shape[0] > 0:
                    height = bbox_indices[:, 0].max().item() - bbox_indices[:, 0].min().item() + 1
                    width  = bbox_indices[:, 1].max().item() - bbox_indices[:, 1].min().item() + 1
                    cell_data["Eccentricity"] = height / width if width != 0 else np.nan
                else:
                    cell_data["Eccentricity"] = np.nan

            # --------------------------------------------------
            # BOUNDARY CONTACT CALCULATIONS
            # --------------------------------------------------
            if boundary_contacts and cell_id != 0:
                # Identify boundary pixels: the one-pixel ring just outside the cell
                cell_region_cpu = (cell_mask_cpu == cell_id)
                structure = generate_binary_structure(2, 1)
                dilated   = binary_dilation(cell_region_cpu, structure=structure)
                boundary  = np.logical_xor(cell_region_cpu, dilated)
                boundary_coords = np.argwhere(boundary)

                # Keep a breakdown of how many boundary pixels touch each neighbour ID
                neighbour_counts = defaultdict(int)

                # Loop over each boundary pixel
                for (yy, xx) in boundary_coords:
                    # Gather all distinct neighbour IDs (besides the cell itself)
                    neighbours_found = set()
                    # Check the 8-connected neighbourhood
                    for ny in [yy-1, yy, yy+1]:
                        for nx in [xx-1, xx, xx+1]:
                            if (ny, nx) != (yy, xx):
                                if (0 <= ny < cell_mask_cpu.shape[0]) and (0 <= nx < cell_mask_cpu.shape[1]):
                                    neighbour_id = cell_mask_cpu[ny, nx]
                                    if neighbour_id != cell_id:
                                        neighbours_found.add(neighbour_id)

                    # Increment the count for every other ID found (0 = background)
                    if len(neighbours_found) > 0:
                        for n in neighbours_found:
                            neighbour_counts[n] += 1
                    else:
                        # No other ID around this pixel: record it as background (0)
                        neighbour_counts[0] += 1

                # Build an underscore-separated string like "0x40_2x20_3x13",
                # sorted by neighbour ID for consistency
                sorted_neighbours = sorted(neighbour_counts.items(), key=lambda x: x[0])
                detail_str = "_".join(f"{int(nid)}x{count}" for (nid, count) in sorted_neighbours)
                cell_data["boundary_contacts"] = detail_str

            # Mean intensity of each selected marker within the cell
            if mean:
                for idx, marker in zip(selected_indices, selected_names):
                    fluorescence_slice = fluorescence_stack_tensor[idx]
                    mean_intensity = torch.mean(fluorescence_slice[cell_region]).item()
                    cell_data[marker] = mean_intensity

            # Compartment measurement
            if compartment:
                for comp_name, comp_mask in compartment_data.items():
                    if compartment_measure == "centroid":
                        if not np.isnan(centroid_y) and not np.isnan(centroid_x):
                            comp_value = comp_mask[int(centroid_y), int(centroid_x)].item()
                        else:
                            comp_value = np.nan
                    elif compartment_measure == "mean":
                        comp_value = (torch.mean(comp_mask[cell_region]).item()
                                      if area > 0 else np.nan)
                    elif compartment_measure == "mode":
                        values, counts = torch.unique(comp_mask[cell_region], return_counts=True)
                        comp_value = (
                            values[torch.argmax(counts)].item()
                            if counts.numel() > 0 else np.nan
                        )
                    else:
                        raise ValueError(f"Invalid compartment_measure: {compartment_measure}")
                    cell_data[comp_name] = comp_value

            # Geometry: contour coordinates tracing the cell outline
            if geometry:
                # find_contours works on numeric images; cast the boolean mask to float
                cell_region_cpu = cell_region.cpu().numpy().astype(float)
                contours = find_contours(cell_region_cpu, level=0.5)
                geom = [contour.tolist() for contour in contours]
                geom_results.append({"Image": name, "CellID": int(cell_id), "Geometry": geom})

            # Store
            results.append(cell_data)

        # ---------------------------------------------------------
        # NEAREST NEIGHBOURS
        # ---------------------------------------------------------
        if neighbours > 0 and shape:
            # Build arrays for X, Y, CellID
            coords = []
            cellid_array = []
            for r in results:
                coords.append((r["Y"], r["X"]))   # (row, col)
                cellid_array.append(r["CellID"]) 
            
            coords = np.array(coords)
            cellid_array = np.array(cellid_array)

            # If we have at least 2 cells with valid coords
            if len(coords) > 1 and not np.isnan(coords).any():
                dist_matrix = cdist(coords, coords, metric="euclidean")

                for i, row_dist in enumerate(dist_matrix):
                    # i = index of the current cell
                    # sort by ascending distance
                    sorted_ix = np.argsort(row_dist)

                    # Remove the index to itself
                    sorted_ix = sorted_ix[sorted_ix != i]

                    # Filter out neighbours that are cellID 0
                    valid_ix = [ix for ix in sorted_ix if cellid_array[ix] != 0]

                    # Now pick up to K nearest from valid_ix
                    k = min(neighbours, len(valid_ix))
                    nearest_ids = cellid_array[valid_ix[:k]]
                    nearest_dists = row_dist[valid_ix[:k]]

                    # Convert to underscore-separated
                    nn_str   = "_".join(str(int(nid)) for nid in nearest_ids)
                    dist_str = "_".join(f"{d:.2f}" for d in nearest_dists)

                    results[i]["nearest_neighbours"] = nn_str
                    results[i]["nearest_neighbours_dist"] = dist_str
            else:
                # If there's only one cell or coords are invalid:
                for r in results:
                    r["nearest_neighbours"] = ""
                    r["nearest_neighbours_dist"] = ""

        # Collect results
        all_results.extend(results)
        all_geom_results.extend(geom_results)

    # ----------------------------------------------------------
    # Build DataFrames and output
    # ----------------------------------------------------------
    final_df = pd.DataFrame(all_results)
    geom_df  = pd.DataFrame(all_geom_results)

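    output_path = os.path.join(rootdir, projdir, out_dir)
    geom_output_path = os.path.join(rootdir, projdir, geom_out_dir)

    # Create the output folder(s) if they do not exist yet
    for path in (output_path, geom_output_path):
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)

    final_df.to_csv(output_path, index=False)
    geom_df.to_csv(geom_output_path, index=False)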

    print("Processing complete.")
    print("Total time taken:", time.time() - start_time)
    print(f"Results saved to {output_path}")
    print(f"Geometry results saved to {geom_output_path}")
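The nearest-neighbour step above builds a full N×N distance matrix with `cdist`, whose memory use grows quadratically with the number of cells in an image. The sketch below swaps in `scipy.spatial.cKDTree` and produces the same kind of underscore-separated strings. It assumes centroids in `(row, col)` order and, like the code above, never reports cell ID 0 (background) as a neighbour. `nearest_neighbour_strings` is a hypothetical helper; ties in distance may be ordered differently from the `argsort` used above.

```python
import numpy as np
from scipy.spatial import cKDTree


def nearest_neighbour_strings(cell_ids, coords, k):
    """Map each cell ID to ("id1_id2...", "d1_d2...") for its k nearest non-background cells.

    cell_ids: 1D array of labels; coords: (N, 2) centroids in (row, col) order.
    """
    cell_ids = np.asarray(cell_ids)
    coords = np.asarray(coords, dtype=float)
    keep = cell_ids != 0                       # label 0 (background) is never a neighbour
    tree_ids = cell_ids[keep]
    # Query k + 1 points so the query cell itself can be discarded
    n_query = min(k + 1, tree_ids.size)
    if k <= 0 or n_query == 0:
        return {int(c): ("", "") for c in cell_ids}
    tree = cKDTree(coords[keep])
    out = {}
    for cid, point in zip(cell_ids, coords):
        dists, idx = tree.query(point, k=n_query)
        dists, idx = np.atleast_1d(dists), np.atleast_1d(idx)
        pairs = [(tree_ids[i], d) for i, d in zip(idx, dists) if tree_ids[i] != cid][:k]
        out[int(cid)] = ("_".join(str(int(n)) for n, _ in pairs),
                         "_".join(f"{d:.2f}" for _, d in pairs))
    return out
```

For one image's rows of the results table, this could be called as `nearest_neighbour_strings(df["CellID"], df[["Y", "X"]].to_numpy(), 10)`.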

In [None]:
from PyMComplete import PyProfiler

PyProfiler(
        rootdir = rootdir, 
        projdir = projdir, 
        mean = 1, 
        shape = 1, 
        geometry = 1, 
        compartment = 1,
        compartment_measure = "mean",
        panel_path = "panel.csv",
        mask_dir =  "analysis/3_segmentation/3e_cellpose_mask",
        image_dir = "analysis/3_segmentation/3a_fullstack", 
        compartment_dir =  "analysis/3_segmentation/3f_compartments", 
        out_dir = "analysis/4_pyprofiler_output/cell_test.csv",
        geom_out_dir = "analysis/4_pyprofiler_output/geom_test.csv", 
        mask_suffix = "_CpSeg_mask",
        image_suffix = "_full",
        comp_suffix = "_compartment",
        neighbours=10,         
        boundary_contacts=1    
        )
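The `boundary_contacts` and `nearest_neighbours` columns pack several values into one underscore-separated string (for example `"0x20_5x12"`). When loading cell.csv for downstream analysis, a small parser turns them back into Python structures. A sketch, assuming the formats written by the code above; `parse_boundary_contacts` and `parse_neighbours` are hypothetical helpers. With `neighbours = 1`, pandas may read single IDs as numbers, which the parser tolerates; reading with `pd.read_csv(path, dtype={"nearest_neighbours": str, "nearest_neighbours_dist": str})` avoids that case.

```python
import math


def _is_missing(value):
    """True for None, '' or NaN (pandas reads empty CSV cells as NaN)."""
    return value is None or (isinstance(value, str) and value == "") or (
        isinstance(value, float) and math.isnan(value))


def parse_boundary_contacts(value):
    """'0x20_5x12' -> {0: 20, 5: 12} (neighbour ID -> number of contact pixels)."""
    if _is_missing(value):
        return {}
    contacts = {}
    for item in str(value).split("_"):
        cell_id, n_pixels = item.split("x")
        contacts[int(cell_id)] = int(n_pixels)
    return contacts


def parse_neighbours(ids_value, dists_value):
    """('3_7', '1.50_2.25') -> [(3, 1.5), (7, 2.25)]."""
    if _is_missing(ids_value):
        return []
    # int(float(...)) also copes with a single ID that pandas parsed as 3 or 3.0
    ids = [int(float(i)) for i in str(ids_value).split("_")]
    dists = [float(d) for d in str(dists_value).split("_")]
    return list(zip(ids, dists))
```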