# InSituPy demonstration - Register images

This notebook demonstrates the registration of images from H&E, IHC, or IF stainings that were performed on the same slide as the Xenium In Situ measurements. The images to be registered are assumed to contain the same tissue as the spatial transcriptomics data.


In [1]:
## The following code ensures that all functions and init files are reloaded before each execution.
%load_ext autoreload
%autoreload 2

## Load Xenium data into `InSituData` object

The Xenium data can now be loaded into an `InSituData` object, either by passing the data path to the `read_xenium` function or directly via the download function.

In [2]:
from pathlib import Path
from insitupy import read_xenium, register_images, CACHE
from insitupy.datasets import human_pancreatic_cancer

### Load the dataset directly from the downloading function...

In [3]:
xd = human_pancreatic_cancer()

This dataset exists already. Download is skipped. To force download set `overwrite=True`.
Image exists. Checking md5sum...
The md5sum matches. Download is skipped. To force download set `overwrite=True`.
Image exists. Checking md5sum...
The md5sum matches. Download is skipped. To force download set `overwrite=True`.
Corresponding image data can be found in C:\Users\ge37voy\.cache\InSituPy\demo_datasets\hpancreas\unregistered_images
For this dataset following images are available:
slide_id__hPancreas__HE__histo.ome.tiff
slide_id__hPancreas__CD20_TROP2_PPY_DAPI__IFIF_image_name.ome.tiff
Loading cells...
Loading images...
Loading transcripts...


### ... or use the `read_xenium` function and the path to the Xenium data directory if the dataset has already been downloaded

In [4]:
xd = read_xenium(CACHE / "demo_datasets/hpancreas/output-XETG00000__slide_id__hpancreas")

Loading cells...
Loading images...
Loading transcripts...


In [5]:
xd

InSituData
Method:		Xenium
Slide ID:	0009465
Sample ID:	hPancreas_cancer
Path:		C:\Users\ge37voy\.cache\InSituPy\demo_datasets\hpancreas\output-XETG00000__slide_id__hpancreas
Metadata file:	experiment.xenium
    ➤ images
       nuclei:	(13752, 48274)
    ➤ cells
       matrix
           AnnData object with n_obs × n_vars = 190965 × 474
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'unassigned_codeword_counts', 'deprecated_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
       boundaries
           BoundariesData object with 2 entries:
               cells
               nuclei
    ➤ transcripts
       DataFrame with shape 28455659 x 9

### Prepare the paths to the unregistered images

Here, the unregistered images were downloaded by the `human_pancreatic_cancer` download function and saved in the folder `unregistered_images`.

In [6]:
# prepare paths
if_to_be_registered = CACHE / "demo_datasets/hpancreas" / "unregistered_images/slide_id__hPancreas__CD20_TROP2_PPY_DAPI__IF.ome.tif"
he_to_be_registered = CACHE / "demo_datasets/hpancreas" / "unregistered_images/slide_id__hPancreas__HE__histo.ome.tif"

### Automated Registration of Images

**Overview:**
_Xenium In Situ_ is a non-destructive method that allows for staining and imaging of tissue after in situ sequencing analysis. This process is performed outside the _Xenium_ machine and requires subsequent registration. `InSituPy` provides an automatic image registration pipeline based on the [Scale-Invariant Feature Transform (SIFT) algorithm](https://link.springer.com/article/10.1023/B:VISI.0000029664.99615.94).

**Process:**
1. **Feature Detection:**
   - The SIFT algorithm detects common features between the template (_Xenium_ DAPI image) and the acquired images.
   - These features are used to calculate a transformation matrix.
   - The transformation matrix registers the images to the template.

<left><img src="../demo_screenshots/common_features.png" width="800"/></left>

*Common features extracted by SIFT algorithm*

2. **Preprocessing Steps:**
   - **Histological Images (H&E or IHC):**
     - These techniques produce RGB images.
     - Color deconvolution extracts the hematoxylin channel containing the nuclei for registration with the _Xenium_ DAPI image.
   - **Immunofluorescence (IF) Images:**
     - This method results in multiple grayscale images.
     - One channel must contain a nuclei stain (e.g., DAPI).
     - This channel is selected for SIFT feature detection and transformation matrix calculation.
     - Other channels are registered using the same transformation matrix.
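
To make the role of the transformation matrix concrete, the following minimal sketch applies a 3×3 matrix in homogeneous coordinates to a few pixel coordinates. The matrix values and the helper `apply_transform` are illustrative assumptions and not part of `InSituPy`; in the pipeline, the matrix is estimated from the matched SIFT features.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Point = Tuple[float, float]


def apply_transform(matrix: Matrix, points: Sequence[Point]) -> List[Point]:
    """Map (x, y) points with a 3x3 matrix in homogeneous coordinates."""
    transformed = []
    for x, y in points:
        # Multiply the matrix with the homogeneous vector (x, y, 1).
        xh = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
        yh = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
        w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
        # Divide by w to return to Cartesian coordinates.
        transformed.append((xh / w, yh / w))
    return transformed


# Hypothetical matrix: scale by 0.5 and shift by (100, 20) pixels.
T = [
    [0.5, 0.0, 100.0],
    [0.0, 0.5, 20.0],
    [0.0, 0.0, 1.0],
]

corners = [(0.0, 0.0), (2000.0, 0.0), (2000.0, 1000.0)]
print(apply_transform(T, corners))
```

Because all IF channels are registered with one matrix, the same mapping would apply to every channel of the image.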

### Cropping of Images from Whole Slide Images

**Workflow:**
In a Xenium In Situ workflow, a slide often contains multiple tissue sections. While spatial transcriptomics data is separated during the run, histological stainings contain all sections in one whole slide image. To extract individual images of histologically stained tissue sections, two workflows are recommended:

1. **QuPath Annotation:**
   - Annotate and name individual tissue sections in QuPath.
   - Use the `.groovy` script in `InSituPy/scripts/export_annotations_OME-TIFF.groovy`.

2. **Napari-Based Approach:**
   - Demonstrated in `XX_InSituPy_extract_individual_images.ipynb`.

### Input Files

**Formats:**
- **.tif** or **.ome.tif** formats are accepted.
- **IF Images:**
  - Multi-channel images are expected.
  - Specify channel names using the `channel_names` argument.
  - Specify the channel containing nuclei staining with the `channel_name_for_registration` argument (e.g., DAPI channel).
- **HE Images:**
  - Expected to be RGB images.
  - Cropping methods should result in the correct image format.
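
The relationship between the two IF arguments can be sketched as follows. The helper `registration_channel_index` is hypothetical and only illustrates the kind of lookup implied by `channel_names` and `channel_name_for_registration`; it is not taken from the `InSituPy` source.

```python
from typing import Sequence


def registration_channel_index(
    channel_names: Sequence[str], channel_name_for_registration: str
) -> int:
    """Return the position of the nuclei channel within a multi-channel image."""
    if channel_name_for_registration not in channel_names:
        raise ValueError(
            f"'{channel_name_for_registration}' is not one of {list(channel_names)}"
        )
    return list(channel_names).index(channel_name_for_registration)


# Channel order must match the order of the channels in the image file.
channels = ["CD20", "TROP2", "PPY", "DAPI"]
print(registration_channel_index(channels, "DAPI"))
```

Checking the name before starting the registration catches typos early, before any large image is loaded into memory.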

### Output Generated by the Registration Pipeline

1. **Registered Images:**
   - If `save_registered_images==True`, registered images are saved as `.ome.tif` in the `registered_images` folder in the parent directory of the _Xenium_ data.
   - File naming convention: `slide_id__sample_id__name__registered.ome.tif`.

2. **Transformation Matrix:**
   - Saved as `.csv` in the `registration_qc` folder within the `registered_images` folder.
   - File name ends with `__T.csv`.

3. **Common Features:**
   - Representation of common features between the registered image and the template.
   - Saved as `.pdf` in the `registration_qc` folder.
   - File name ends with `__common_features.pdf`.
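
The output locations can be sketched with `pathlib`. The helper `expected_outputs` is hypothetical and only mirrors the naming convention and the directory listing given in this section; it does not query `InSituPy` itself.

```python
from pathlib import Path
from typing import Dict


def expected_outputs(
    xenium_dir: Path, slide_id: str, sample_id: str, name: str
) -> Dict[str, Path]:
    """Build the output paths implied by the naming convention."""
    # Outputs live next to the Xenium data directory, not inside it.
    registered_dir = xenium_dir.parent / "registered_images"
    qc_dir = registered_dir / "registration_qc"
    stem = f"{slide_id}__{sample_id}__{name}"
    return {
        "registered_image": registered_dir / f"{stem}__registered.ome.tif",
        "transformation_matrix": qc_dir / f"{stem}__T.csv",
        "common_features": qc_dir / f"{stem}__common_features.pdf",
    }


# Placeholder identifiers, as used in the naming convention.
paths = expected_outputs(
    Path("demo_dataset/output-XETG00000__slide_id__sample_id"),
    slide_id="slide_id",
    sample_id="sample_id",
    name="name",
)
for key, path in paths.items():
    print(f"{key}: {path.as_posix()}")
```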

**Directory Structure:**
```
./demo_dataset
├───output-XETG00000__slide_id__sample_id
├───registered_images
│   │   slide_id__sample_id__name__registered.ome.tif
│   ├───registration_qc
│   │       slide_id__sample_id__name__T.csv
│   │       slide_id__sample_id__name__common_features.pdf
└───unregistered_images
```

## Registration of IF images

In [7]:
register_images(
    data=xd,
    image_to_be_registered=if_to_be_registered,
    image_type="IF",
    channel_names=['CD20', 'TROP2', 'PPY', 'DAPI'],
    channel_name_for_registration="DAPI",
    template_image_name="nuclei",
    save_registered_images=True
    )

	Processing following IF images: CD20, TROP2, PPY, DAPI
		Loading images to be registered...
		Select image with nuclei from IF image (channel index: 3)
Load and scale image data containing all channels.
		Load image into memory...
		Load template into memory...
		Rescale image and template to save memory.
			Rescaled from (4, 17091, 58644) to following dimensions: (4, 2159, 7409)
			Rescaled from (13752, 48274) to following dimensions: (2134, 7494)
		Convert scaled images to 8 bit
Image dimensions after resizing: (4, 9549, 32766). Resize factor: 0.5587272355228157
Load and scale image data containing only the channels required for registration.
		Rescale image and template to save memory.
			Rescaled from (17091, 58644) to following dimensions: (2159, 7409)
			Rescaled from (13752, 48274) to following dimensions: (2134, 7494)
		Convert scaled images to 8 bit
Image dimensions after resizing: (9549, 32766). Resize factor: 0.5587272355228157
		Extract common features from image a

## Registration of H&E images

In [8]:
register_images(
    data=xd,
    image_to_be_registered=he_to_be_registered,
    image_type="histo",
    channel_names='HE',
    template_image_name="nuclei",
    save_registered_images=True,
    )

	Processing following histo images: HE
		Loading images to be registered...
		Run color deconvolution
Load and scale image data containing all channels.
		Load image into memory...
		Load template into memory...
		Rescale image and template to save memory.
			Rescaled from (71883, 20562, 3) to following dimensions: (7478, 2139, 3)
			Rescaled from (13752, 48274) to following dimensions: (2134, 7494)
		Convert scaled images to 8 bit
Image dimensions after resizing: (32766, 9372, 3). Resize factor: 0.4558240474103752
Load and scale image data containing only the channels required for registration.
		Rescale image and template to save memory.
			Rescaled from (71880, 20560) to following dimensions: (7479, 2139)
			Rescaled from (13752, 48274) to following dimensions: (2134, 7494)
		Convert scaled images to 8 bit
Image dimensions after resizing: (32766, 9372). Resize factor: 0.4558430717863105
		Extract common features from image and template
		2025-02-15 21:50:47: Get features...


In [9]:
xd

InSituData
Method:		Xenium
Slide ID:	0009465
Sample ID:	hPancreas_cancer
Path:		C:\Users\ge37voy\.cache\InSituPy\demo_datasets\hpancreas\output-XETG00000__slide_id__hpancreas
Metadata file:	experiment.xenium
    ➤ images
       nuclei:	(13752, 48274)
       CD20:	(13752, 48274)
       TROP2:	(13752, 48274)
       PPY:	(13752, 48274)
       HE:	(13752, 48274, 3)
    ➤ cells
       matrix
           AnnData object with n_obs × n_vars = 190965 × 474
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'unassigned_codeword_counts', 'deprecated_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
       boundaries
           BoundariesData object with 2 entries:
               cells
               nuclei
    ➤

In [21]:
xd.show()

## Working with an `InSituPy` project

To allow a simple and structured saving workflow, `InSituPy` provides two saving functions:
- `saveas()`
- `save()`


### Save as `InSituPy` project

In [10]:
insitupy_project = Path(CACHE / "out/demo_panc_project")

In [11]:
xd.saveas(insitupy_project, overwrite=True)

Saving data to C:\Users\ge37voy\.cache\InSituPy\out\demo_panc_project
Saved.

### Save `InSituPy` project with downscaled image data

Since the image data is very large and not required for most of the transcriptomic analysis, we can downscale it to save disk space.
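
As a rough estimate of the savings, the sketch below computes the image dimensions after downscaling to a target resolution. It assumes a source pixel size of 0.2125 µm/pixel, a value commonly reported for Xenium morphology images; the actual value should be taken from the dataset metadata. The helper `downscaled_shape` is hypothetical and does not reproduce how `InSituPy` resamples images.

```python
from typing import Tuple


def downscaled_shape(
    shape: Tuple[int, int], pixel_size_um: float, max_resolution_um: float
) -> Tuple[int, ...]:
    """Estimate the image shape after downscaling to `max_resolution_um` µm/pixel."""
    # Never upscale: images that are already coarser keep their shape.
    factor = min(1.0, pixel_size_um / max_resolution_um)
    return tuple(round(dim * factor) for dim in shape)


# Shape of the `nuclei` image reported for this dataset; pixel size is an assumption.
full = (13752, 48274)
small = downscaled_shape(full, pixel_size_um=0.2125, max_resolution_um=1.0)
reduction = (full[0] * full[1]) / (small[0] * small[1])
print(small, f"about {reduction:.0f}x fewer pixels")
```

Under these assumptions, a target of 1 µm/pixel reduces the pixel count of each channel by roughly a factor of 22.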

In [12]:
insitupy_project_downscaled = Path(CACHE / "out/demo_panc_project_downscaled")
xd.saveas(
    insitupy_project_downscaled, overwrite=True,
    images_max_resolution=1 # in µm/pixel
    )

Saving data to C:\Users\ge37voy\.cache\InSituPy\out\demo_panc_project_downscaled
Saved.

### Reload from `InSituPy` project

From the `InSituPy` project, we can now load only the modalities needed for later analyses. Thanks to an optimized file structure based on `zarr` and `dask`, loading and visualizing the data is more efficient than working directly from the Xenium data bundle.

In [13]:
from insitupy import InSituData

In [14]:
xd = InSituData.read(insitupy_project)
xd_ds = InSituData.read(insitupy_project_downscaled)

In [15]:
xd

InSituData
Method:		Xenium
Slide ID:	0009465
Sample ID:	hPancreas_cancer
Path:		C:\Users\ge37voy\.cache\InSituPy\out\demo_panc_project
Metadata file:	.ispy

No modalities loaded.

In [16]:
xd_ds

InSituData
Method:		Xenium
Slide ID:	0009465
Sample ID:	hPancreas_cancer
Path:		C:\Users\ge37voy\.cache\InSituPy\out\demo_panc_project_downscaled
Metadata file:	.ispy

No modalities loaded.

### Load all required modalities

Next, we make sure that all data modalities required for the subsequent analyses are loaded, in our case the cellular data and the image data. A missing modality can be loaded with `.load_{modality}()`.
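
When several modalities are needed, the `.load_{modality}()` pattern can be wrapped in a small loop. The sketch below is an illustration under assumptions: `load_modalities` is a hypothetical helper, and `DummyData` is a stand-in so that the example runs without a saved project. With real data, the `InSituData` object would be passed instead.

```python
from typing import Iterable, List


def load_modalities(data, modalities: Iterable[str]) -> List[str]:
    """Call `data.load_<modality>()` for each requested modality."""
    loaded = []
    for modality in modalities:
        loader = getattr(data, f"load_{modality}", None)
        if loader is None:
            raise AttributeError(f"No loader found for modality '{modality}'")
        loader()
        loaded.append(modality)
    return loaded


class DummyData:
    """Stand-in for an `InSituData` object (illustration only)."""

    def __init__(self):
        self.loaded = set()

    def load_cells(self):
        self.loaded.add("cells")

    def load_images(self):
        self.loaded.add("images")


dummy = DummyData()
print(load_modalities(dummy, ["cells", "images"]))
```

Failing loudly on an unknown modality name avoids silently continuing an analysis with missing data.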

In [17]:
xd_ds.load_cells()
xd_ds.load_images()

In [18]:
xd_ds.show()