# InSituPy demonstration - Add annotations

In [1]:
## The following code ensures that all functions and init files are reloaded before execution.
%load_ext autoreload
%autoreload 2

In [2]:
from pathlib import Path
from insitupy import XeniumData

## Previous steps

1. Download the example data for demonstration: [01_InSituPy_demo_download_data.ipynb](./01_InSituPy_demo_download_data.ipynb)
2. Register images from external stainings: [02_InSituPy_demo_register_images.ipynb](./02_InSituPy_demo_register_images.ipynb)
3. Visualize data with napari and do preprocessing steps: [03_InSituPy_demo_analyze.ipynb](./03_InSituPy_demo_analyze.ipynb)

At this point, the structure of the data should look like this:

```
./demo_dataset
├───cropped_processed
├───output-XETG00000__slide_id__sample_id
│   ├───analysis
│   │   ├───clustering
│   │   ├───diffexp
│   │   ├───pca
│   │   ├───tsne
│   │   └───umap
│   └───cell_feature_matrix
├───registered_images
├───registration_qc
└───unregistered_images
```
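Before loading anything, it can help to confirm that the previous notebooks actually produced this layout. The following is a minimal sketch using only `pathlib`; `missing_subdirs` and `EXPECTED_SUBDIRS` are hypothetical helpers (not part of InSituPy), and the list only covers the main folders from the tree above.

```python
from pathlib import Path

# Sub-folders expected in the demo dataset after the previous notebooks
# (taken from the directory tree above; not an exhaustive list).
EXPECTED_SUBDIRS = [
    "cropped_processed",
    "output-XETG00000__slide_id__sample_id",
    "output-XETG00000__slide_id__sample_id/analysis",
    "output-XETG00000__slide_id__sample_id/cell_feature_matrix",
    "registered_images",
    "registration_qc",
    "unregistered_images",
]


def missing_subdirs(data_dir, expected=EXPECTED_SUBDIRS):
    """Return the expected sub-folders that do not exist below `data_dir`."""
    data_dir = Path(data_dir)
    return [sub for sub in expected if not (data_dir / sub).is_dir()]


missing = missing_subdirs("demo_dataset")
if missing:
    print("Missing folders:", ", ".join(missing))
else:
    print("Folder structure looks complete.")
```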


## Load Xenium data into `XeniumData` object

Now the Xenium data can be parsed by passing the path of the Xenium output folder to `XeniumData`:

In [3]:
# prepare paths
data_dir = Path("demo_dataset")  # root directory of the demo dataset
xenium_dir = data_dir / "output-XETG00000__slide_id__sample_id"  # directory of the Xenium output
image_dir = data_dir / "unregistered_images"  # directory of the unregistered images

In [4]:
xd = XeniumData(xenium_dir)

In [5]:
xd

XeniumData
Slide ID:	slide_id
Sample ID:	sample_id
Data path:	demo_dataset
Data folder:	output-XETG00000__slide_id__sample_id
Metadata file:	experiment_modified.xenium

In [6]:
# read all data modalities at once
xd.read_all()

# alternatively, it is also possible to read each modality separately
# xd.read_matrix()
# xd.read_images()
# xd.read_boundaries()
# xd.read_transcripts()
# xd.read_annotations()

No `annotations` modality found.
Reading boundaries...
Reading images...
Reading matrix...
Reading transcripts...


Note: It is expected that no `annotations` modality is found at this point. Annotations are added in a later step.

In [7]:
xd

XeniumData
Slide ID:	slide_id
Sample ID:	sample_id
Data path:	demo_dataset
Data folder:	output-XETG00000__slide_id__sample_id
Metadata file:	experiment_modified.xenium
    ➤ images
       nuclei:	(25778, 35416)
       CD20:	(25778, 35416)
       HER2:	(25778, 35416)
       HE:	(25778, 35416, 3)
    ➤ matrix
       AnnData object with n_obs × n_vars = 167780 × 313
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
    ➤ transcripts
       DataFrame with shape 42638083 x 8
    ➤ boundaries
       cells
       nuclei

## Load annotations

For the analysis of spatial transcriptomics datasets, the inclusion of annotations from experts in disease pathology is key. Here, we demonstrate how to annotate data in [QuPath](https://qupath.github.io/), export the annotations as a `.geojson` file and import them into the `XeniumData` object.

### Create annotations in QuPath

To create annotations in QuPath, follow these steps:

1. Select an annotation tool from the toolbar at the top left:

<center><img src="./demo_annotations/qupath_annotation_buttons.png"/></center>

2. Add as many annotations as you want and label them by setting classes in the annotation list. Do not forget to press the "Set class" button:

<center><img src="./demo_annotations/qupath_annotation_list.png"/></center>

3. Export annotations using `File > Export objects as GeoJSON`. Tick `Pretty JSON` to get an easily readable JSON file. The file name needs to have the following structure: `annotation-{slide_id}__{sample_id}__{annotation_label}`.
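The naming convention from step 3 can be captured in two small helpers, which makes it easy to build or check file names before exporting. This is a minimal sketch: `annotation_filename` and `parse_annotation_filename` are hypothetical helpers (not InSituPy functions), and they assume the exported file carries a `.geojson` extension after the pattern and that no ID or label itself contains a double underscore.

```python
from pathlib import Path


def annotation_filename(slide_id, sample_id, label, suffix=".geojson"):
    """Build a name following annotation-{slide_id}__{sample_id}__{annotation_label}."""
    # Assumption: the extension is appended after the documented pattern.
    return f"annotation-{slide_id}__{sample_id}__{label}{suffix}"


def parse_annotation_filename(filename):
    """Split an annotation file name into (slide_id, sample_id, label).

    Raises ValueError if the name does not follow the expected pattern.
    """
    stem = Path(filename).stem  # drops a single extension such as ".geojson"
    prefix = "annotation-"
    if not stem.startswith(prefix):
        raise ValueError(f"{filename!r} does not start with {prefix!r}")
    parts = stem[len(prefix):].split("__")
    if len(parts) != 3:
        raise ValueError(f"{filename!r} must contain exactly three '__'-separated fields")
    slide_id, sample_id, label = parts
    return slide_id, sample_id, label


name = annotation_filename("slide_id", "sample_id", "demo")
print(name)
print(parse_annotation_filename(name))
```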

### Import annotations into `XeniumData`

For demonstration purposes, we created a dummy annotation file in `./demo_annotations/`. To add the annotations to `XeniumData`, follow the steps below.



In [8]:
xd.read_annotations(annotation_dir="./demo_annotations/")

Reading annotations...
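To double-check what `read_annotations` picked up, a QuPath GeoJSON export can also be inspected with the Python standard library alone. This is a minimal sketch: `summarize_qupath_geojson` is a hypothetical helper, and it assumes the file holds either a GeoJSON `FeatureCollection`, a plain list of features or a single feature, with each annotation's class stored under `properties["classification"]["name"]`. That key layout is an assumption about QuPath's export and may differ between QuPath versions.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_qupath_geojson(path):
    """Return (number of features, Counter of class names) for a GeoJSON file.

    Features without a classification are counted under the key None.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        # FeatureCollection -> its "features"; a single Feature -> [data]
        features = data.get("features", [data])
    else:
        features = data  # plain list of features
    classes = Counter(
        ((feat.get("properties") or {}).get("classification") or {}).get("name")
        for feat in features
    )
    return len(features), classes


# Summarize every GeoJSON file in the demo annotation folder (if present).
for geojson_file in sorted(Path("demo_annotations").glob("*.geojson")):
    n, classes = summarize_qupath_geojson(geojson_file)
    print(f"{geojson_file.name}: {n} annotations, {len(classes)} classes {dict(classes)}")
```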


In [9]:
xd

XeniumData
Slide ID:	slide_id
Sample ID:	sample_id
Data path:	demo_dataset
Data folder:	output-XETG00000__slide_id__sample_id
Metadata file:	experiment_modified.xenium
    ➤ images
       nuclei:	(25778, 35416)
       CD20:	(25778, 35416)
       HER2:	(25778, 35416)
       HE:	(25778, 35416, 3)
    ➤ matrix
       AnnData object with n_obs × n_vars = 167780 × 313
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
    ➤ transcripts
       DataFrame with shape 42638083 x 8
    ➤ boundaries
       cells
       nuclei
    ➤ annotations
       demo:	4 annotations, 2 classes ('Positive', 'Negative')
       demo2:	5 annotations, 3 classes ('Negative', 'Positive', 'O

### Visualize annotations using napari

In [11]:
xd.show(annotation_labels="all")



In [13]:
xd.show(annotation_labels="demo2")



## Save results

The cropped and/or processed data can be saved into a folder using the `.save()` method of `XeniumData`.

The resulting folder has the following structure:
```
with_annotations
│   xenium.json
│   xeniumdata.json
│
├───annotations
│       demo.geojson
│
├───boundaries
│       cells.parquet
│       nuclei.parquet
│
├───images
│       morphology_focus.ome.tif
│       slide_id__sample_id__CD20__registered.ome.tif
│       slide_id__sample_id__HER2__registered.ome.tif
│       slide_id__sample_id__HE__registered.ome.tif
│
├───matrix
│       matrix.h5ad
│
└───transcripts
        transcripts.parquet
```
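After saving, the files actually written can be listed and compared with the tree above. This is a minimal sketch using only `pathlib`; `list_files` is a hypothetical helper (not part of InSituPy), and the folder path mirrors `out_dir` from the next cell.

```python
from pathlib import Path


def list_files(root):
    """Return all files below `root` as sorted, POSIX-style relative paths."""
    root = Path(root)
    if not root.is_dir():
        return []  # nothing saved yet
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


# e.g. "matrix/matrix.h5ad", "transcripts/transcripts.parquet", ...
for rel in list_files(Path("demo_dataset") / "with_annotations"):
    print(rel)
```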

In [15]:
out_dir = data_dir / "with_annotations"
xd.save(out_dir, overwrite=True)

In [16]:
xd_reloaded = XeniumData(out_dir)

In [17]:
xd_reloaded

XeniumData
Slide ID:	slide_id
Sample ID:	sample_id
Data path:	demo_dataset
Data folder:	with_annotations
Metadata file:	xeniumdata.json

In [18]:
xd_reloaded.read_all()

Reading annotations...
Reading boundaries...
Reading images...
Reading matrix...
Reading transcripts...




In [19]:
xd_reloaded

XeniumData
Slide ID:	slide_id
Sample ID:	sample_id
Data path:	demo_dataset
Data folder:	with_annotations
Metadata file:	xeniumdata.json
    ➤ images
       nuclei:	(25778, 35416)
       CD20:	(25778, 35416)
       HER2:	(25778, 35416)
       HE:	(25778, 35416, 3)
    ➤ matrix
       AnnData object with n_obs × n_vars = 167780 × 313
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
    ➤ transcripts
       DataFrame with shape 42638083 x 8
    ➤ boundaries
       cells
       nuclei
    ➤ annotations
       demo:	4 annotations, 2 classes ('Positive', 'Negative')
       demo2:	5 annotations, 3 classes ('Negative', 'Positive', 'Other')