# InSituPy demonstration - Register images

This notebook demonstrates how to register images from H&E, IHC or IF stainings that were performed on the same slide as the Xenium In Situ measurement. It is assumed that the images to be registered contain the same tissue as the spatial transcriptomics data.


In [1]:
# Reload all modules (including `__init__` files) automatically before each execution
%load_ext autoreload
%autoreload 2

In [2]:
from pathlib import Path
from insitupy import read_xenium
from insitupy import register_images

## Load Xenium data into `InSituData` object

The Xenium data can be loaded into an `InSituData` object either by passing the path of the Xenium output folder to `read_xenium` or directly via the dataset downloading function.

In [3]:
from insitupy.datasets import human_breast_cancer
from insitupy import CACHE

### Load the dataset from a local Xenium output folder with `read_xenium`

In [4]:
xd = read_xenium(r"C:\Users\ge37voy\OneDrive - TUM\Dokumente - SpatialPathology\Projects\2309_PDAC_SATURN3\data\2309-02\20231117__151509__2309-02-PDAC_TMAs\output-XETG00050__0005405__TMA__20231117__151519")

Loading cells...
Loading images...
Loading transcripts...


In [5]:
xd

InSituData
Method:		Xenium
Slide ID:	0005405
Sample ID:	TMA
Path:		C:\Users\ge37voy\OneDrive - TUM\Dokumente - SpatialPathology\Projects\2309_PDAC_SATURN3\data\2309-02\20231117__151509__2309-02-PDAC_TMAs\output-XETG00050__0005405__TMA__20231117__151519
Metadata file:	experiment.xenium
    ➤ images
       nuclei:	(112068, 54048)
    ➤ cells
       matrix
           AnnData object with n_obs × n_vars = 345158 × 477
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'unassigned_codeword_counts', 'deprecated_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
           varm: 'binned_expression'
       boundaries
           BoundariesData object with 2 entries:
               nuclear
               cellular
    ➤ transcripts
       Da

### Prepare the paths to the unregistered images

For the demo dataset, the unregistered images are downloaded by the `human_breast_cancer` function and saved in the folder `unregistered_images` (see the commented-out path in the next cell).

In [7]:
# prepare paths
#if_to_be_registered = CACHE / "demo_datasets/hbreastcancer" / "unregistered_images/slide_id__hbreastcancer__CD20_HER2_DAPI__IF.ome.tif"
he_to_be_registered = r"C:\Users\ge37voy\OneDrive - TUM\Dokumente - SpatialPathology\Projects\2309_PDAC_SATURN3\data\2309-02\TMA4-0005404-downscale1.ome.tif"

### Automated Registration of Images

**Overview:**
_Xenium In Situ_ is a non-destructive method, so the tissue can be stained and imaged after the in situ sequencing analysis. Because this staining and imaging is performed outside the _Xenium_ instrument, the resulting images must subsequently be registered to the _Xenium_ data. `InSituPy` provides an automatic image registration pipeline based on the [Scale-Invariant Feature Transform (SIFT) algorithm](https://link.springer.com/article/10.1023/B:VISI.0000029664.99615.94).

**Process:**
1. **Feature Detection:**
   - The SIFT algorithm detects common features between the template (_Xenium_ DAPI image) and the acquired images.
   - These features are used to calculate a transformation matrix.
   - The transformation matrix registers the images to the template.

<left><img src="./demo_screenshots/common_features.png" width="800"/></left>
   *Common features extracted by SIFT algorithm*

2. **Preprocessing Steps:**
   - **Histological Images (H&E or IHC):**
     - These techniques produce RGB images.
     - Color deconvolution extracts the hematoxylin channel containing the nuclei for registration with the _Xenium_ DAPI image.
   - **Immunofluorescence (IF) Images:**
     - This method results in multiple grayscale images.
     - One channel must contain a nuclei stain (e.g., DAPI).
     - This channel is selected for SIFT feature detection and transformation matrix calculation.
     - Other channels are registered using the same transformation matrix.
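The hematoxylin extraction for histological images can be illustrated with the classic color deconvolution of Ruifrok and Johnston: convert RGB intensities to optical density, then unmix them with the inverse of a stain matrix. The following is a minimal NumPy sketch, assuming the standard published H/E/DAB stain vectors and a white background of 255. `hematoxylin_channel` is a hypothetical helper; InSituPy's own preprocessing may use different stain vectors or a library routine such as `skimage.color.rgb2hed`.

```python
import numpy as np

# Optical-density vectors (rows) for hematoxylin, eosin and DAB as published by
# Ruifrok & Johnston (2001); real scans may need slide-specific vectors.
STAIN_MATRIX = np.array([
    [0.650, 0.704, 0.286],  # hematoxylin
    [0.072, 0.990, 0.105],  # eosin
    [0.268, 0.570, 0.776],  # DAB (acts as a residual channel for H&E)
])


def hematoxylin_channel(rgb, background=255.0):
    """Return the per-pixel hematoxylin concentration of a (Y, X, 3) RGB image."""
    rgb = np.asarray(rgb, dtype=float)
    # Beer-Lambert: optical density = -log10(transmitted / incident light);
    # clipping to >= 1 avoids log10(0) for fully black pixels
    od = -np.log10(np.clip(rgb, 1.0, background) / background)
    # Unmix: od = concentrations @ STAIN_MATRIX, so concentrations = od @ inv(STAIN_MATRIX)
    conc = od.reshape(-1, 3) @ np.linalg.inv(STAIN_MATRIX)
    # Column 0 holds the hematoxylin (nuclei) signal
    return conc[:, 0].reshape(rgb.shape[:-1])
```

The resulting nuclei-rich grayscale image plays the same role as the DAPI channel of an IF image: it is the input for SIFT feature detection against the _Xenium_ DAPI template.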

### Cropping of Images from Whole Slide Images

**Workflow:**
In a Xenium In Situ workflow, a slide often contains multiple tissue sections. While the spatial transcriptomics data are separated by section during the run, a histological staining captures all sections in a single whole slide image. To extract images of individual stained tissue sections, two workflows are recommended:

1. **QuPath Annotation:**
   - Annotate and name individual tissue sections in QuPath.
   - Use the `.groovy` script in `InSituPy/scripts/export_annotations_OME-TIFF.groovy`.

2. **Napari-Based Approach:**
   - Demonstrated in `XX_InSituPy_extract_individual_images.ipynb`.

### Input Files

**Formats:**
- **.tif** or **.ome.tif** formats are accepted.
- **IF Images:**
  - Multi-channel images are expected.
  - Specify channel names using the `channel_names` argument.
  - Specify the channel containing nuclei staining with the `channel_name_for_registration` argument (e.g., DAPI channel).
- **H&E Images:**
  - Expected to be RGB images.
  - The cropping workflows described above should produce images in this format.
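To make the input requirements concrete, here is a minimal sketch of how `channel_names` and `channel_name_for_registration` relate to a multi-channel IF stack, plus a simple RGB check for H&E images. It assumes the IF channels are stored along the first axis, shape (C, Y, X), and borrows the CD20/HER2/DAPI channels from the demo IF file name. `select_registration_channel` and `is_rgb` are hypothetical helpers, and the axis order InSituPy expects is an assumption.

```python
import numpy as np


def is_rgb(image):
    """True if the array looks like an RGB image with shape (Y, X, 3)."""
    image = np.asarray(image)
    return image.ndim == 3 and image.shape[-1] == 3


def select_registration_channel(image, channel_names, channel_name_for_registration):
    """Pick the nuclei channel from a multi-channel IF stack of shape (C, Y, X)."""
    image = np.asarray(image)
    if image.ndim != 3 or image.shape[0] != len(channel_names):
        raise ValueError(
            f"expected {len(channel_names)} channels along axis 0, got shape {image.shape}"
        )
    # list.index raises ValueError if the requested channel name is missing
    idx = list(channel_names).index(channel_name_for_registration)
    return image[idx]


# Example: a tiny 3-channel stack named like the demo IF image (CD20, HER2, DAPI)
stack = np.stack([np.full((4, 4), i, dtype=np.uint16) for i in range(3)])
dapi = select_registration_channel(stack, ["CD20", "HER2", "DAPI"], "DAPI")
```

Only the selected channel is used to estimate the transformation; the remaining channels would then be warped with the same matrix.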

### Output Generated by the Registration Pipeline

1. **Registered Images:**
   - If `save_registered_images==True`, registered images are saved as `.ome.tif` in the `registered_images` folder in the parent directory of the _Xenium_ data.
   - File naming convention: `slide_id__sample_id__name__registered.ome.tif`.

2. **Transformation Matrix:**
   - Saved as `.csv` in the `registration_qc` folder within the `registered_images` folder.
   - File name ends with `__T.csv`.

3. **Common Features:**
   - Representation of common features between the registered image and the template.
   - Saved as `.pdf` in the `registration_qc` folder.
   - File name ends with `__common_features.pdf`.
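The naming convention above can be expressed as a small path helper, which is handy for locating the QC files after a run. This sketch assumes the layout described here (the `registered_images` folder sits in the parent directory of the _Xenium_ output folder); `registration_output_paths` is hypothetical and only mirrors the documented names, it is not part of InSituPy.

```python
from pathlib import Path


def registration_output_paths(xenium_dir, slide_id, sample_id, name):
    """Build the documented output paths for one registered image."""
    # registered_images/ sits in the parent directory of the Xenium output folder
    registered_dir = Path(xenium_dir).parent / "registered_images"
    qc_dir = registered_dir / "registration_qc"
    prefix = f"{slide_id}__{sample_id}__{name}"
    return {
        "image": registered_dir / f"{prefix}__registered.ome.tif",
        "transform": qc_dir / f"{prefix}__T.csv",
        "common_features": qc_dir / f"{prefix}__common_features.pdf",
    }


paths = registration_output_paths(
    "demo_dataset/output-XETG00000__slide_id__sample_id", "slide_id", "sample_id", "name"
)
```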

**Directory Structure:**
```
./demo_dataset
├───output-XETG00000__slide_id__sample_id
├───registered_images
│   │   slide_id__sample_id__name__registered.ome.tif
│   ├───registration_qc
│   │       slide_id__sample_id__name__T.csv
│   │       slide_id__sample_id__name__common_features.pdf
└───unregistered_images
```

## Registration of H&E images

In [8]:
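register_images(
    data=xd,                                     # InSituData object providing the template image
    image_to_be_registered=he_to_be_registered,  # path to the unregistered H&E image
    image_type="histo",                          # H&E/IHC: color deconvolution is applied first
    channel_names='HE',                          # name under which the image appears in xd.images
    template_image_name="nuclei",                # Xenium nuclei (DAPI) image used as template
    save_registered_images=True,                 # write the result as OME-TIFF to registered_images
    min_good_matches=200                         # minimum number of good SIFT matches required
    )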

	Processing following histo images: HE
		Loading images to be registered...
		Run color deconvolution
		Rescale image and template to save memory.
			Rescaled to following dimensions: (2828, 5656)
			Rescaled to following dimensions: (5759, 2777)
		Convert scaled images to 8 bit
		Extract common features from image and template
		2025-02-05 21:11:22: Get features...
			Method: SIFT...
		2025-02-05 21:11:34: Compute matches...
		2025-02-05 21:11:36: Filter matches...
			Sufficient number of good matches found (303).
		2025-02-05 21:11:37: Display matches...
		2025-02-05 21:11:37: Fetch keypoints...
		2025-02-05 21:11:37: Estimate 2D affine transformation matrix...
		Estimate affine transformation matrix for resized image
		2025-02-05 21:11:37: Register image by affine transformation...
		Save OME-TIFF to C:\Users\ge37voy\OneDrive - TUM\Dokumente - SpatialPathology\Projects\2309_PDAC_SATURN3\data\2309-02\20231117__151509__2309-02-PDAC_TMAs\registered_images\__0005405__TMA__HE__re
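The log above walks through the core steps: filter the SIFT matches, check that at least `min_good_matches` remain (303 of the required 200 here), estimate a 2D affine matrix on the rescaled images, and convert it back to the original image size. The following NumPy sketch illustrates those steps, assuming descriptors and keypoint coordinates are already available. It uses Lowe's ratio test for filtering (an assumption about how "good" matches are defined) and plain least squares without outlier rejection; practical pipelines usually add RANSAC, e.g. OpenCV's `cv2.estimateAffine2D`. All function names are hypothetical, not InSituPy's API.

```python
import numpy as np


def ratio_test_matches(desc_img, desc_tmpl, ratio=0.75):
    """Lowe's ratio test: keep a match only if the nearest template descriptor is
    clearly closer than the second-nearest one. Returns (img_idx, tmpl_idx) pairs."""
    desc_img = np.asarray(desc_img, dtype=float)
    desc_tmpl = np.asarray(desc_tmpl, dtype=float)
    # Brute-force pairwise Euclidean distances, shape (n_img, n_tmpl)
    dist = np.linalg.norm(desc_img[:, None, :] - desc_tmpl[None, :, :], axis=-1)
    order = np.argsort(dist, axis=1)
    rows = np.arange(len(desc_img))
    best = dist[rows, order[:, 0]]
    second = dist[rows, order[:, 1]]
    keep = best < ratio * second
    return np.column_stack([rows[keep], order[keep, 0]])


def estimate_affine(src, dst):
    """Least-squares 2D affine transform (3x3, homogeneous) mapping src onto dst."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Each row is [x, y, 1]; solve A @ M ~= dst for the 3x2 parameter matrix M
    A = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    T = np.eye(3)
    T[:2, :] = M.T
    return T


def rescale_affine(T_small, image_scale, template_scale):
    """Convert a transform estimated on downscaled images to full resolution.

    image_scale / template_scale are the factors used to shrink each image
    (e.g. 0.1 means the small image is 10x smaller than the original)."""
    S_img = np.diag([image_scale, image_scale, 1.0])
    S_tmpl_inv = np.diag([1.0 / template_scale, 1.0 / template_scale, 1.0])
    # full-res image -> small image -> small template -> full-res template
    return S_tmpl_inv @ T_small @ S_img
```

The number of rows returned by `ratio_test_matches` is what would be compared against `min_good_matches`. The brute-force distance matrix only suits small feature sets; for whole-slide images, nearest-neighbour search structures such as k-d trees are the usual choice.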

In [9]:
xd

InSituData
Method:		Xenium
Slide ID:	0005405
Sample ID:	TMA
Path:		C:\Users\ge37voy\OneDrive - TUM\Dokumente - SpatialPathology\Projects\2309_PDAC_SATURN3\data\2309-02\20231117__151509__2309-02-PDAC_TMAs\output-XETG00050__0005405__TMA__20231117__151519
Metadata file:	experiment.xenium
    ➤ images
       nuclei:	(112068, 54048)
       HE:	(112068, 54048, 3)
    ➤ cells
       matrix
           AnnData object with n_obs × n_vars = 345158 × 477
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'unassigned_codeword_counts', 'deprecated_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
           varm: 'binned_expression'
       boundaries
           BoundariesData object with 2 entries:
               nuclear
               cellular

In [22]:
xd.show()

## Working with an `InSituPy` project

To allow a simple and structured saving workflow, `InSituPy` provides two saving functions:
- `saveas()`
- `save()`


### Save as `InSituPy` project

In [10]:
insitupy_project = Path("demo_dataset/test")

In [11]:
xd.saveas(insitupy_project, overwrite=True)

Saving data to demo_dataset\test




Saved.


In [12]:
from insitupy import InSituData

In [13]:
xd = InSituData.read(insitupy_project)

In [15]:
xd.load_images()
xd.load_cells()

Loading images...


In [17]:
xd.show()

In [19]:
ar = xd.images["HE"][0]

In [22]:
ar

|            | Array                        | Chunk                        |
|------------|------------------------------|------------------------------|
| Bytes      | 16.92 GiB                    | 127.97 MiB                   |
| Shape      | (112068, 54048, 3)           | (6688, 6688, 3)              |
| Dask graph | 153 chunks in 2 graph layers | 153 chunks in 2 graph layers |
| Data type  | uint8 numpy.ndarray          | uint8 numpy.ndarray          |


In [27]:
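import dask.array as da
from scipy.ndimage import zoom

def procedure(target):
    # Print the shape of each block that dask passes to the function
    print("processing block", target.shape)
    # Halve the y and x dimensions of the block, keep the color channels
    return zoom(target, [0.5, 0.5, 1])

# Note: without explicit output chunks, dask assumes every block keeps its
# input shape, so `tile_map` still reports the full-resolution shape
tile_map = da.map_overlap(procedure, ar)

processing block (0, 0, 0)
processing block (1, 1, 1)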


In [29]:
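# scipy.ndimage.zoom converts the dask array to a NumPy array, i.e. it tries to
# load the full ~17 GiB image into memory; the cell was interrupted
zoom(ar, [0.5, 0.5, 1])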

KeyboardInterrupt: 

In [28]:
tile_map

|            | Array                        | Chunk                        |
|------------|------------------------------|------------------------------|
| Bytes      | 16.92 GiB                    | 127.97 MiB                   |
| Shape      | (112068, 54048, 3)           | (6688, 6688, 3)              |
| Dask graph | 153 chunks in 3 graph layers | 153 chunks in 3 graph layers |
| Data type  | uint8 numpy.ndarray          | uint8 numpy.ndarray          |
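The `tile_map` output keeps the full-resolution shape because `procedure` shrinks each block without telling dask. A minimal sketch of a chunk-wise 2x downscale that declares the output chunks explicitly, using `dask.array.Array.map_blocks` with `scipy.ndimage.zoom`, could look as follows. `downscale_dask_image` is a hypothetical helper, not part of InSituPy, and a small random array stands in for the H&E image.

```python
import numpy as np
import dask.array as da
from scipy.ndimage import zoom


def downscale_dask_image(arr, factor=0.5, order=1):
    """Lazily zoom the y and x axes of a (Y, X, C) dask array chunk by chunk."""
    factors = (factor, factor, 1)
    # scipy.ndimage.zoom sizes each output axis as round(n * factor), so the
    # output chunks are computed the same way to keep dask's metadata correct
    new_chunks = tuple(
        tuple(int(round(c * f)) for c in axis_chunks)
        for axis_chunks, f in zip(arr.chunks, factors)
    )
    return arr.map_blocks(
        zoom,
        zoom=factors,
        order=order,
        chunks=new_chunks,
        dtype=arr.dtype,
        meta=np.empty((0, 0, 0), dtype=arr.dtype),
    )


# Small stand-in for the H&E array: 101 x 64 RGB pixels in 32 x 32 chunks
demo = da.from_array(
    np.random.default_rng(0).integers(0, 256, size=(101, 64, 3), dtype=np.uint8),
    chunks=(32, 32, 3),
)
small = downscale_dask_image(demo, factor=0.5)
```

Applied to the H&E array, `downscale_dask_image(ar, factor=0.5)` stays lazy until `.compute()` or a write such as `.to_zarr()` is called. Because each chunk is zoomed independently, small seams can appear at chunk borders, and odd chunk lengths round slightly differently than zooming the whole image at once.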



### Save `InSituPy` project with downscaled image data

Since the image data is very large and not required for most transcriptomic analyses, we can downscale it to save disk space.

In [25]:
insitupy_project_downscaled = Path("demo_dataset/demo_insitupy_project_downscaled")
xd.saveas(
    insitupy_project_downscaled, overwrite=True,
    images_max_resolution=1 # in µm/pixel
    )

Saving data to demo_dataset\demo_insitupy_project_downscaled
Downscale image to 1 µm per pixel by factor 4.705882352941177
Downscale image to 1 µm per pixel by factor 4.705882352941177
Downscale image to 1 µm per pixel by factor 4.705882352941177
Downscale image to 1 µm per pixel by factor 4.705882352941177
Downscale image to 1 µm per pixel by factor 4.705882352941177




Saved.
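The factor in the log is the ratio between the requested resolution and the native pixel size: _Xenium_ morphology images have a pixel size of 0.2125 µm, and 1 / 0.2125 ≈ 4.706. A minimal sketch of this arithmetic and the resulting image shape follows. `downscale_plan` is a hypothetical helper, and leaving images that are already at least as coarse as the target unchanged is our assumption, not documented InSituPy behavior.

```python
XENIUM_PIXEL_SIZE_UM = 0.2125  # native pixel size of Xenium morphology images (µm)


def downscale_plan(shape_yx, pixel_size_um, target_um_per_px):
    """Return (factor, new_shape) for downscaling an image to a target resolution."""
    factor = target_um_per_px / pixel_size_um
    if factor <= 1:
        # Assumption: images whose pixels are already as large as the target
        # are not resampled
        return 1.0, tuple(shape_yx)
    new_shape = tuple(int(round(n / factor)) for n in shape_yx)
    return factor, new_shape


# Nuclei image shape from the InSituData summary above
factor, new_shape = downscale_plan((112068, 54048), XENIUM_PIXEL_SIZE_UM, 1.0)
```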


### Reload from `InSituPy` project

From the `InSituPy` project we can now load only the modalities needed for later analyses. Because the project uses an optimized file structure based on `zarr` and `dask`, loading and visualizing the data is more efficient than reading it directly from the Xenium output bundle.

In [26]:
from insitupy import InSituData

In [27]:
xd = InSituData.read(insitupy_project)
xd_ds = InSituData.read(insitupy_project_downscaled)

In [28]:
xd

InSituData
Method:		Xenium
Slide ID:	0001879
Sample ID:	Replicate 1
Path:		C:\Users\ge37voy\Github\InSituPy\notebooks\demo_dataset\demo_insitupy_project
Metadata file:	.ispy

No modalities loaded.

In [29]:
xd_ds

InSituData
Method:		Xenium
Slide ID:	0001879
Sample ID:	Replicate 1
Path:		C:\Users\ge37voy\Github\InSituPy\notebooks\demo_dataset\demo_insitupy_project_downscaled
Metadata file:	.ispy

No modalities loaded.

### Load all required modalities

Next, we make sure that all data modalities required for the subsequent analyses are loaded; in our case, the cellular data and the image data. A missing modality can be loaded with the corresponding `.load_<modality>()` method, e.g. `.load_cells()` or `.load_images()`.

In [30]:
xd_ds.load_cells()
xd_ds.load_images()

Loading images...


In [31]:
xd_ds.show()