# InSituPy demonstration - Add annotations

In [1]:
## The following code ensures that all functions and init files are reloaded before each execution.
%load_ext autoreload
%autoreload 2

In [2]:
from pathlib import Path
from insitupy import XeniumData

## Previous steps

1. Download the example data for demonstration: [01_InSituPy_demo_download_data.ipynb](./01_InSituPy_demo_download_data.ipynb)
2. Register images from external stainings: [02_InSituPy_demo_register_images.ipynb](./02_InSituPy_demo_register_images.ipynb)
3. Visualize data with napari and do preprocessing steps: [03_InSituPy_demo_analyze.ipynb](./03_InSituPy_demo_analyze.ipynb)

At this point, the structure of the data should look like this:

```
./demo_dataset
├───cropped_processed
├───output-XETG00000__slide_id__sample_id
│   ├───analysis
│   │   ├───clustering
│   │   ├───diffexp
│   │   ├───pca
│   │   ├───tsne
│   │   └───umap
│   └───cell_feature_matrix
├───registered_images
├───registration_qc
└───unregistered_images
```
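As an optional sanity check before loading, the folder layout can be compared against the tree above. This is a minimal sketch using only `pathlib`. The helper `missing_subdirs` and the `EXPECTED_SUBDIRS` list are hypothetical illustrations, not part of InSituPy.

```python
from pathlib import Path

# Hypothetical list of subfolders, taken from the tree above.
EXPECTED_SUBDIRS = [
    "cropped_processed",
    "output-XETG00000__slide_id__sample_id",
    "output-XETG00000__slide_id__sample_id/analysis",
    "output-XETG00000__slide_id__sample_id/cell_feature_matrix",
    "registered_images",
    "registration_qc",
    "unregistered_images",
]


def missing_subdirs(data_dir, expected=EXPECTED_SUBDIRS):
    """Return the expected subfolders that do not exist below data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_dir()]


missing = missing_subdirs("demo_dataset")
if missing:
    print("Missing folders:", ", ".join(missing))
else:
    print("All expected folders are present.")
```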


## Load Xenium data into `XeniumData` object

Now the Xenium data can be parsed by passing the data path to `XeniumData`:

In [3]:
# prepare paths
data_dir = Path("demo_dataset") # output directory
xenium_dir = data_dir / "output-XETG00000__slide_id__sample_id" # directory of xenium data
image_dir = data_dir / "unregistered_images" # directory of images

In [4]:
xd = XeniumData(xenium_dir)

In [5]:
xd

[1m[31mXeniumData[0m
[1mSlide ID:[0m	slide_id
[1mSample ID:[0m	sample_id
[1mData path:[0m	demo_dataset
[1mData folder:[0m	output-XETG00000__slide_id__sample_id
[1mMetadata file:[0m	experiment_modified.xenium

In [7]:
# read all data modalities at once
# xd.read_all()

# alternatively, it is also possible to read each modality separately
xd.read_cells()
xd.read_images(names=["HE"])
# xd.read_annotations()
# xd.read_boundaries()
# xd.read_transcripts()


Reading cells...
Reading images...


Note: It is expected that the `annotations` modality is not listed here. Annotations are added in a later step.

In [8]:
xd

[1m[31mXeniumData[0m
[1mSlide ID:[0m	slide_id
[1mSample ID:[0m	sample_id
[1mData path:[0m	demo_dataset
[1mData folder:[0m	output-XETG00000__slide_id__sample_id
[1mMetadata file:[0m	experiment_modified.xenium
    ➤ [34m[1mimages[0m
       [1mHE:[0m	(25778, 35416, 3)
    ➤[32m[1m cells[0m
       [1mmatrix[0m
           AnnData object with n_obs × n_vars = 167780 × 313
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
       [1mboundaries[0m
           BoundariesData object with 2 entries:
               [1mcellular[0m
               [1mnuclear[0m

## Load annotations

For the analysis of spatial transcriptomics datasets, the inclusion of annotations from experts in disease pathology is key. Here, we demonstrate how to annotate data in [QuPath](https://qupath.github.io/), export the annotations as a `.geojson` file and import them into the `XeniumData` object.

### Create annotations in QuPath

To create annotations in QuPath, follow these steps:

1. Select an annotation tool from the toolbar at the top left:

<center><img src="./demo_annotations/qupath_annotation_buttons.jpg" width="300"/></center>

2. Add as many annotations as you want and label them by setting classes in the annotation list. Do not forget to press the "Set class" button:

<center><img src="./demo_annotations/qupath_annotation_list.jpg" width="350"/></center>

3. Export the annotations using `File > Export objects as GeoJSON`. Tick `Pretty JSON` to get an easily readable JSON file. The file name needs to have the following structure: `annotation-{slide_id}__{sample_id}__{annotation_label}`.
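Because the label is encoded in the file name, it can help to build and check names programmatically. A minimal sketch follows. The helpers `annotation_filename` and `parse_annotation_filename` are hypothetical, the `.geojson` extension is an assumption, and IDs are assumed not to contain a double underscore (`__`).

```python
from pathlib import Path


def annotation_filename(slide_id, sample_id, label, suffix=".geojson"):
    """Build a name following annotation-{slide_id}__{sample_id}__{label}."""
    return f"annotation-{slide_id}__{sample_id}__{label}{suffix}"


def parse_annotation_filename(filename):
    """Split such a file name back into (slide_id, sample_id, label)."""
    stem = Path(filename).stem  # drops the extension, e.g. ".geojson"
    prefix = "annotation-"
    if not stem.startswith(prefix):
        raise ValueError(f"{filename!r} does not start with {prefix!r}")
    # Assumption: slide and sample IDs contain no "__"; the label takes the rest.
    parts = stem[len(prefix):].split("__", 2)
    if len(parts) != 3:
        raise ValueError(f"{filename!r} does not match the expected pattern")
    slide_id, sample_id, label = parts
    return slide_id, sample_id, label


name = annotation_filename("slide_id", "sample_id", "demo")
print(name)                             # annotation-slide_id__sample_id__demo.geojson
print(parse_annotation_filename(name))  # ('slide_id', 'sample_id', 'demo')
```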

### Import annotations into `XeniumData`

For demonstration purposes, we provide dummy annotations in `./demo_annotations/`. To add them to `XeniumData`, run the cell below.
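Before importing, a GeoJSON export can be inspected with the standard library to see how many annotations per class it holds. This is a sketch under assumptions: QuPath is assumed to store the class name under `properties["classification"]["name"]`, and the file is assumed to be either a `FeatureCollection` or a plain list of features. `count_classes` is a hypothetical helper, not an InSituPy function.

```python
import json
from collections import Counter
from pathlib import Path


def count_classes(geojson_path):
    """Count annotations per class name in a GeoJSON file exported from QuPath."""
    with open(geojson_path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: either a FeatureCollection or a plain list of features.
    features = data["features"] if isinstance(data, dict) else data
    counts = Counter()
    for feature in features:
        properties = feature.get("properties") or {}
        # Assumption: the class name lives in properties["classification"]["name"].
        classification = properties.get("classification") or {}
        counts[classification.get("name", "Unclassified")] += 1
    return counts


annotation_dir = Path("./demo_annotations")
if annotation_dir.is_dir():
    for path in sorted(annotation_dir.glob("*.geojson")):
        print(path.name, dict(count_classes(path)))
```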



In [9]:
xd.read_annotations(annotation_dir="./demo_annotations/")

Reading annotations...


In [10]:
xd

[1m[31mXeniumData[0m
[1mSlide ID:[0m	slide_id
[1mSample ID:[0m	sample_id
[1mData path:[0m	demo_dataset
[1mData folder:[0m	output-XETG00000__slide_id__sample_id
[1mMetadata file:[0m	experiment_modified.xenium
    ➤ [34m[1mimages[0m
       [1mHE:[0m	(25778, 35416, 3)
    ➤[32m[1m cells[0m
       [1mmatrix[0m
           AnnData object with n_obs × n_vars = 167780 × 313
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
       [1mboundaries[0m
           BoundariesData object with 2 entries:
               [1mcellular[0m
               [1mnuclear[0m
    ➤ [36m[1mannotations[0m
       [1mdemo:[0m	4 annotations, 2 classes ('Positive', 'Negative') 
       [1mdemo2:[0m	5 annotations, 3 classes ('Negative', 'Positive', 'Other') 

### Visualize and edit annotations using napari

To show all annotation labels, set `annotation_labels="all"`. Alternatively, a single label or a list of labels can be shown, e.g. `xd.show(annotation_labels="demo2")`.


In [28]:
xd.show(annotation_labels="all")



#### Annotation layers

The annotations are added as shapes layers to the layer list. Each layer name starts with a "*" and has the following syntax: `"* Class (Label)"`:

<left><img src="./demo_annotations/napari_layerlist_annotations.jpg" width="300"/></left>

- **Label**: Names one collection of annotations. It could, for example, indicate who created the annotations or what the collection focuses on.
- **Class**: Specifies the class of a single annotation, e.g. a cell type, a morphological structure or the annotated disease state.
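The `"* Class (Label)"` convention can be parsed with a regular expression, for example to group napari layers by label in a script. A minimal sketch; `parse_layer_name` and `format_layer_name` are hypothetical helpers written only against the syntax described above, not InSituPy functions.

```python
import re

# Layer names follow the pattern "* Class (Label)".
# The label is taken from the last parenthesised group, so classes may contain parentheses.
LAYER_NAME_PATTERN = re.compile(r"^\* (?P<cls>.+) \((?P<label>[^()]+)\)$")


def parse_layer_name(name):
    """Return (class, label) for an annotation layer name, or None for other layers."""
    match = LAYER_NAME_PATTERN.match(name)
    if match is None:
        return None
    return match.group("cls"), match.group("label")


def format_layer_name(cls, label):
    """Build a layer name in the "* Class (Label)" syntax."""
    return f"* {cls} ({label})"
```

Layers that do not follow the convention, such as napari's default `"Shapes"` layer, return `None` and can be skipped.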

#### Add custom annotations using the Annotation Widget

<left><img src="./demo_annotations/napari_annotation_widget.jpg" width="200"/></left>

Clicking the `"Add annotation layer"` button adds a new layer named according to the syntax described above. The layer controls at the top left can then be used to draw new shapes as annotations:

<left><img src="./demo_annotations/napari_layerconrols_annotations.jpg" width="300"/></left>

An example annotation is shown here:

<left><img src="./demo_annotations/napari_annotation_example.jpg" width="200"/></left>

The annotations can then be stored in the `XeniumData` object using the `store_annotations` function.


In [12]:
xd.store_annotations()

Added 7 new annotations to label 'newlabel'


In [13]:
xd

[1m[31mXeniumData[0m
[1mSlide ID:[0m	slide_id
[1mSample ID:[0m	sample_id
[1mData path:[0m	demo_dataset
[1mData folder:[0m	output-XETG00000__slide_id__sample_id
[1mMetadata file:[0m	experiment_modified.xenium
    ➤ [34m[1mimages[0m
       [1mHE:[0m	(25778, 35416, 3)
    ➤[32m[1m cells[0m
       [1mmatrix[0m
           AnnData object with n_obs × n_vars = 167780 × 313
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
       [1mboundaries[0m
           BoundariesData object with 2 entries:
               [1mcellular[0m
               [1mnuclear[0m
    ➤ [36m[1mannotations[0m
       [1mdemo:[0m	4 annotations, 2 classes ('Positive', 'Negative') 
       [1mdemo2:[0m	5 annotations, 3 classes ('Negative', 'Positive', 'Other') 
       [1mnewlabel:[0m	7 annotations, 1 classes ('newclass',) 

### Assign annotations to observations

To use the annotations in analyses (e.g. to select only observations within a certain annotation or to compare gene expression between annotations), use the `assign_annotations` method. For each label, it adds a column containing the annotation class to `xd.cells.matrix.obs`. The column is named `annotation-{Label}`; if an observation is not part of any annotation with this label, the column contains `NaN`.
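Since these are ordinary pandas columns, standard pandas operations apply. The sketch below uses a small toy `DataFrame` as a stand-in for `xd.cells.matrix.obs` (the values are invented for illustration) to show how the `NaN` entries behave when selecting and comparing classes.

```python
import numpy as np
import pandas as pd

# Toy stand-in for xd.cells.matrix.obs after assign_annotations();
# cells outside every annotation of label "demo" hold NaN.
obs = pd.DataFrame(
    {
        "total_counts": [100, 200, 50, 80, 120],
        "annotation-demo": ["Positive", "Positive", np.nan, "Negative", np.nan],
    },
    index=["c1", "c2", "c3", "c4", "c5"],
)

# Keep only cells that lie inside any "demo" annotation.
inside = obs[obs["annotation-demo"].notna()]

# Compare a per-cell metric between classes
# (groupby would also drop NaN keys by default).
mean_counts = inside.groupby("annotation-demo")["total_counts"].mean()
print(mean_counts)
```

On the real object, replace `obs` with `xd.cells.matrix.obs`.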

In [15]:
xd.assign_annotations()

Assigning label 'demo'...
Assigning label 'demo2'...
Assigning label 'newlabel'...


After assigning the annotations, the labels analyzed here are marked with a tick (✔):

In [16]:
xd

[1m[31mXeniumData[0m
[1mSlide ID:[0m	slide_id
[1mSample ID:[0m	sample_id
[1mData path:[0m	demo_dataset
[1mData folder:[0m	output-XETG00000__slide_id__sample_id
[1mMetadata file:[0m	experiment_modified.xenium
    ➤ [34m[1mimages[0m
       [1mHE:[0m	(25778, 35416, 3)
    ➤[32m[1m cells[0m
       [1mmatrix[0m
           AnnData object with n_obs × n_vars = 167780 × 313
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'annotation-demo', 'annotation-demo2', 'annotation-newlabel'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
       [1mboundaries[0m
           BoundariesData object with 2 entries:
               [1mcellular[0m
               [1mnuclear[0m
    ➤ [36m[1mannotations[0m
       [1mdemo:[0m	4 annotations, 2 classes ('Positive', 'Negative') ✔
       [1mdemo2:[0m	5 annotations, 3 classes ('Negative', 'Positive', 'Other') ✔
       

The following cells show examples of how to explore the assigned annotations:

In [19]:
# count the cells that lie within any annotation of the label 'demo2'
xd.cells.matrix.obs["annotation-demo2"].notna().sum()

9431

In [21]:
# show only observations that were part of this annotation label
xd.cells.matrix.obs[xd.cells.matrix.obs["annotation-demo2"].notna()]

| | transcript_counts | control_probe_counts | control_codeword_counts | total_counts | cell_area | nucleus_area | annotation-demo | annotation-demo2 | annotation-newlabel |
|---|---|---|---|---|---|---|---|---|---|
| 4921 | 281 | 0 | 0 | 281 | 733.247187 | 26.010000 | NaN | Other | NaN |
| 4922 | 273 | 1 | 0 | 274 | 380.576875 | 30.074063 | NaN | Other | NaN |
| 4923 | 189 | 2 | 0 | 191 | 285.658437 | 8.263594 | NaN | Other | NaN |
| 4924 | 212 | 0 | 0 | 212 | 282.226562 | 24.068281 | NaN | Other | NaN |
| 4925 | 58 | 0 | 0 | 58 | 81.823125 | 4.470469 | NaN | Other | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 165374 | 96 | 1 | 0 | 97 | 150.234844 | 11.063281 | Negative | Negative | NaN |
| 165375 | 379 | 0 | 0 | 379 | 153.666719 | 75.681875 | Negative | Negative | NaN |
| 165376 | 101 | 0 | 0 | 101 | 27.996875 | 17.836719 | Negative | Negative | NaN |
| 165377 | 472 | 0 | 0 | 472 | 200.177656 | 52.652188 | Negative | Negative | NaN |
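To break the total down by class instead of counting all annotated cells at once, pandas' `value_counts` can be applied to the same column. The sketch uses a toy `Series` shaped like `xd.cells.matrix.obs["annotation-demo2"]`; the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy column shaped like xd.cells.matrix.obs["annotation-demo2"].
col = pd.Series(["Other", "Other", "Negative", np.nan, "Positive", np.nan])

# Cells per class; dropna=False also reports the cells outside all annotations.
per_class = col.value_counts(dropna=False)
print(per_class)

# Same quantity as .notna().sum(): cells inside any annotation of the label.
n_annotated = int(col.notna().sum())
print(n_annotated)  # 4
```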


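## Crop data

The `crop()` method returns a new `XeniumData` object restricted to the area of a shapes layer in the napari viewer, here the layer named `"Shapes"`: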

In [30]:
xd_cropped = xd.crop(shape_layer="Shapes")

In [31]:
xd_cropped

[1m[31mXeniumData[0m
[1mSlide ID:[0m	slide_id
[1mSample ID:[0m	sample_id
[1mData path:[0m	demo_dataset
[1mData folder:[0m	output-XETG00000__slide_id__sample_id
[1mMetadata file:[0m	experiment_modified.xenium
    ➤ [34m[1mimages[0m
       [1mHE:[0m	(13324, 13129, 3)
    ➤[32m[1m cells[0m
       [1mmatrix[0m
           AnnData object with n_obs × n_vars = 24134 × 313
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'annotation-demo', 'annotation-demo2', 'annotation-newlabel'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
       [1mboundaries[0m
           BoundariesData object with 2 entries:
               [1mcellular[0m
               [1mnuclear[0m
    ➤ [36m[1mannotations[0m
       [1mdemo:[0m	2 annotations, 1 classes ('Positive',) ✔
       [1mdemo2:[0m	1 annotations, 1 classes ('Other',) ✔
       [1mnewlabel:[0m	7 annotations, 1 c

## Save results

The cropped and/or processed data can be saved into a folder using the `.save()` method of `XeniumData`.

The resulting folder has a structure like the following:
```
with_annotations
│   xenium.json
│   xeniumdata.json
│
├───annotations
│       demo.geojson
│
├───boundaries
│       cells.parquet
│       nuclei.parquet
│
├───images
│       morphology_focus.ome.tif
│       slide_id__sample_id__CD20__registered.ome.tif
│       slide_id__sample_id__HER2__registered.ome.tif
│       slide_id__sample_id__HE__registered.ome.tif
│
├───matrix
│       matrix.h5ad
│
└───transcripts
        transcripts.parquet
```
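To compare an actual output folder against this layout, a small directory listing helps. A minimal sketch with `pathlib` only; `tree_lines` is a hypothetical helper, and the path `demo_dataset/cropped_with_annotations` matches the `out_dir` used further below in this notebook.

```python
from pathlib import Path


def tree_lines(root, prefix=""):
    """Return an indented listing of root, files before folders, both sorted by name."""
    root = Path(root)
    entries = sorted(root.iterdir(), key=lambda p: (p.is_dir(), p.name))
    lines = []
    for i, entry in enumerate(entries):
        is_last = i == len(entries) - 1
        lines.append(prefix + ("└── " if is_last else "├── ") + entry.name)
        if entry.is_dir():
            # Continue the vertical bar only if more siblings follow.
            lines.extend(tree_lines(entry, prefix + ("    " if is_last else "│   ")))
    return lines


out_dir = Path("demo_dataset") / "cropped_with_annotations"
if out_dir.is_dir():
    print(out_dir.name)
    print("\n".join(tree_lines(out_dir)))
```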

In [36]:
xd_cropped.show(annotation_labels="all")



In [42]:
xd_cropped.annotations.metadata['demo']['classes'].tolist()

['Positive']

In [52]:
meta = xd_cropped.annotations.metadata['demo']

In [53]:
def myprint(d):
    """Recursively print all non-dict key/value pairs of a nested dict."""
    for k, v in d.items():
        if isinstance(v, dict):
            myprint(v)
        else:
            print("{0} : {1}".format(k, v))

In [56]:
nested_dict

{'key1': [1, 2, 3],
 'key2': {'nested_key1': [4, 5, 6], 'nested_key2': 'some_value'},
 'key3': 'another_value'}

In [101]:
import numpy as np

def nested_dict_numpy_to_list(dictionary):
    """Recursively replace NumPy arrays in a (nested) dict with lists, in place."""
    for key, value in dictionary.items():
        if isinstance(value, np.ndarray):
            dictionary[key] = value.tolist()
        elif isinstance(value, dict):
            nested_dict_numpy_to_list(value)

# Example usage:
nested_dict = {
    'key1': np.array([1, 2, 3]),
    'key2': {
        'nested_key1': np.array([4, 5, 6]),
        'nested_key2': 'some_value'
    },
    'key3': 'another_value'
}
nested_dict_numpy_to_list(nested_dict)  # converts the arrays to lists in place


In [47]:
isinstance(xd_cropped.annotations.metadata['demo']['classes'], np.ndarray)

True

In [44]:
xd.annotations.metadata

{'demo': {'n_annotations': 4,
  'classes': ['Positive', 'Negative'],
  'analyzed': '✔'},
 'demo2': {'n_annotations': 5,
  'classes': ['Negative', 'Positive', 'Other'],
  'analyzed': '✔'},
 'newlabel': {'n_annotations': 7, 'classes': ['newclass'], 'analyzed': '✔'}}

In [113]:
out_dir = data_dir / "cropped_with_annotations"
xd_cropped.save(out_dir, overwrite=True)

In [114]:
xd_reloaded = XeniumData(out_dir)

In [115]:
xd_reloaded

[1m[31mXeniumData[0m
[1mSlide ID:[0m	slide_id
[1mSample ID:[0m	sample_id
[1mData path:[0m	demo_dataset
[1mData folder:[0m	cropped_with_annotations
[1mMetadata file:[0m	.xeniumdata

In [116]:
xd_reloaded.read_all()

Reading annotations...
Reading cells...
Reading images...
No `transcripts` modality found.




In [117]:
xd_reloaded

[1m[31mXeniumData[0m
[1mSlide ID:[0m	slide_id
[1mSample ID:[0m	sample_id
[1mData path:[0m	demo_dataset
[1mData folder:[0m	cropped_with_annotations
[1mMetadata file:[0m	.xeniumdata
    ➤ [34m[1mimages[0m
       [1mHE:[0m	(13324, 13129, 3)
    ➤[32m[1m cells[0m
       [1mmatrix[0m
           AnnData object with n_obs × n_vars = 24134 × 313
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'annotation-demo', 'annotation-demo2', 'annotation-newlabel'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
       [1mboundaries[0m
           BoundariesData object with 2 entries:
               [1mcellular[0m
               [1mnuclear[0m
    ➤ [36m[1mannotations[0m
       [1mdemo:[0m	2 annotations, 1 classes ('Positive',) ✔
       [1mdemo2:[0m	1 annotations, 1 classes ('Other',) ✔
       [1mnewlabel:[0m	7 annotations, 1 classes ('newclass',) ✔

In [118]:
xd_reloaded.show(annotation_labels="all")

