# InSituPy demonstration - Add annotations

In [1]:
## The following code ensures that all functions and init files are reloaded before each execution.
%load_ext autoreload
%autoreload 2

In [2]:
from pathlib import Path
from insitupy import XeniumData

## Previous steps

1. Download the example data for demonstration: [01_InSituPy_demo_download_data.ipynb](./01_InSituPy_demo_download_data.ipynb)
2. Register images from external stainings: [02_InSituPy_demo_register_images.ipynb](./02_InSituPy_demo_register_images.ipynb)
3. Visualize data with napari and do preprocessing steps: [03_InSituPy_demo_analyze.ipynb](./03_InSituPy_demo_analyze.ipynb)

At this point, the structure of the data should look like this:

```
./demo_dataset
├───cropped_processed
├───output-XETG00000__slide_id__sample_id
│   ├───analysis
│   │   ├───clustering
│   │   ├───diffexp
│   │   ├───pca
│   │   ├───tsne
│   │   └───umap
│   └───cell_feature_matrix
├───registered_images
├───registration_qc
└───unregistered_images
```
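
Before loading, it can help to confirm that the folders created by the previous notebooks are in place. The following is a minimal sketch using only `pathlib`. The helper name `missing_subfolders` and the list of expected folders are assumptions taken from the tree above; they are not part of InSituPy.

```python
from pathlib import Path

# Folder names copied from the tree above (assumption: all of them are required).
EXPECTED_SUBFOLDERS = [
    "cropped_processed",
    "output-XETG00000__slide_id__sample_id",
    "registered_images",
    "registration_qc",
    "unregistered_images",
]


def missing_subfolders(data_dir, expected=EXPECTED_SUBFOLDERS):
    """Return the expected subfolder names that do not exist inside `data_dir`."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subfolders("demo_dataset")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected folders found.")
```

An empty list means every folder from the tree was found.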


## Load Xenium data into `XeniumData` object

The Xenium data can now be parsed by passing the data path to `XeniumData`.

In [3]:
# prepare paths
data_dir = Path("demo_dataset") # output directory
xenium_dir = data_dir / "output-XETG00000__slide_id__sample_id" # directory of xenium data
image_dir = data_dir / "unregistered_images" # directory of images

In [4]:
xd = XeniumData(xenium_dir)

In [5]:
xd

XeniumData
Slide ID:	slide_id
Sample ID:	sample_id
Data path:	demo_dataset
Data folder:	output-XETG00000__slide_id__sample_id
Metadata file:	experiment_modified.xenium

In [6]:
# read all data modalities at once
xd.read_all()

# alternatively, it is also possible to read each modality separately
# xd.read_matrix()
# xd.read_images()
# xd.read_boundaries()
# xd.read_transcripts()
# xd.read_annotations()

No `annotations` modality found.
Reading boundaries...
Reading images...
Reading matrix...
Reading transcripts...


Note: The missing `annotations` modality is expected here. Annotations are added in a later step.

In [7]:
xd

XeniumData
Slide ID:	slide_id
Sample ID:	sample_id
Data path:	demo_dataset
Data folder:	output-XETG00000__slide_id__sample_id
Metadata file:	experiment_modified.xenium
    ➤ images
       nuclei:	(25778, 35416)
       CD20:	(25778, 35416)
       HER2:	(25778, 35416)
       HE:	(25778, 35416, 3)
    ➤ matrix
       AnnData object with n_obs × n_vars = 167780 × 313
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
    ➤ transcripts
       DataFrame with shape 42638083 x 8
    ➤ boundaries
       cells
       nuclei

## Load annotations

Annotations from pathology experts are key to the analysis of spatial transcriptomic datasets. Here, we demonstrate how to annotate data in [QuPath](https://qupath.github.io/), export the annotations as a `.geojson` file, and import them into the `XeniumData` object.

### Create annotations in QuPath

To create annotations in QuPath, follow these steps:

1. Select an annotation tool from the toolbar at the top left:

<center><img src="./demo_annotations/qupath_annotation_buttons.png"/></center>

2. Add as many annotations as you want and label them by setting classes in the annotation list. Do not forget to press the "Set class" button:

<center><img src="./demo_annotations/qupath_annotation_list.png"/></center>

3. Export annotations using `File > Export objects as GeoJSON`. Tick `Pretty JSON` to get an easily readable JSON file. The file name needs to have the following structure: `annotation-{slide_id}__{sample_id}__{annotation_label}`.
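
The naming convention from step 3 can be built and checked with a few lines of Python. This is only a sketch: the helper names `build_annotation_stem` and `parse_annotation_stem` are hypothetical, and the pattern assumes that neither ID contains a double underscore. It does not reproduce how InSituPy parses file names internally.

```python
import re

# Pattern for `annotation-{slide_id}__{sample_id}__{annotation_label}`.
# Assumption: the slide and sample IDs contain no double underscore.
ANNOTATION_PATTERN = re.compile(
    r"^annotation-(?P<slide_id>.+?)__(?P<sample_id>.+?)__(?P<annotation_label>.+)$"
)


def build_annotation_stem(slide_id, sample_id, annotation_label):
    """Assemble the file name (without extension) expected for an annotation file."""
    return f"annotation-{slide_id}__{sample_id}__{annotation_label}"


def parse_annotation_stem(stem):
    """Split a file name (without extension) into its three components."""
    match = ANNOTATION_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"File name does not follow the convention: {stem!r}")
    return match.groupdict()


stem = build_annotation_stem("slide_id", "sample_id", "demo")
print(stem)  # annotation-slide_id__sample_id__demo
print(parse_annotation_stem(stem))
```

For a real file, `Path(filename).stem` gives the string to pass to `parse_annotation_stem`.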

### Import annotations into `XeniumData`

For demonstration purposes, we created a dummy annotation file in `./demo_annotations/`. To add the annotations to `XeniumData`, follow the steps below.



In [8]:
xd.read_annotations(annotation_dir="./demo_annotations/")

Reading annotations...


In [9]:
xd.show(annotation_labels="demo")



## Save results

The cropped and/or processed data can be saved into a folder using the `.save()` method of `XeniumData`.

The resulting folder has the following structure:
```
cropped_processed
│   xenium.json
│   xeniumdata.json
│
├───boundaries
│       cells.parquet
│       nuclei.parquet
│
├───images
│       morphology_focus.ome.tif
│       slide_id__sample_id__CD20__registered.ome.tif
│       slide_id__sample_id__HER2__registered.ome.tif
│       slide_id__sample_id__HE__registered.ome.tif
│
├───matrix
│       matrix.h5ad
│
└───transcripts
        transcripts.parquet
```

In [12]:
import geopandas

file = Path(r"C:\Users\ge37voy\Github\InSituPy\notebooks\demo_annotations\annotation-slide_id__sample_id__demo.geojson")
dataframe = geopandas.read_file(file)

In [13]:
dataframe

| | id | objectType | classification | geometry |
|---|---|---|---|---|
| 0 | bd3aacca-1716-4df8-91dd-bf8f6413a7bd | annotation | {'name': 'Positive', 'color': [250, 62, 62]} | POLYGON ((8863.00000 10814.00000, 8863.00000 1... |
| 1 | 69814505-4059-42cd-8df2-752f7eb0810d | annotation | {'name': 'Positive', 'color': [250, 62, 62]} | POLYGON ((13096.00000 12492.00000, 13072.40000... |
| 2 | 1957cd32-0a21-4b45-9dae-ecf236217140 | annotation | {'name': 'Negative', 'color': [112, 112, 225]} | POLYGON ((30975.26000 22938.00000, 30982.00000... |
| 3 | 19d2197a-1b8e-456f-8223-fba74641ac1c | annotation | {'name': 'Negative', 'color': [112, 112, 225]} | POLYGON ((31165.00000 16408.00000, 31149.00000... |


In [21]:
from insitupy import read_qupath_geojson

In [54]:
df = read_qupath_geojson(file)

In [55]:
df

| | id | objectType | geometry | name | color |
|---|---|---|---|---|---|
| 0 | bd3aacca-1716-4df8-91dd-bf8f6413a7bd | annotation | POLYGON ((8863.00000 10814.00000, 8863.00000 1... | Positive | [250, 62, 62] |
| 1 | 69814505-4059-42cd-8df2-752f7eb0810d | annotation | POLYGON ((13096.00000 12492.00000, 13072.40000... | Positive | [250, 62, 62] |
| 2 | 1957cd32-0a21-4b45-9dae-ecf236217140 | annotation | POLYGON ((30975.26000 22938.00000, 30982.00000... | Negative | [112, 112, 225] |
| 3 | 19d2197a-1b8e-456f-8223-fba74641ac1c | annotation | POLYGON ((31165.00000 16408.00000, 31149.00000... | Negative | [112, 112, 225] |
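
Compared with the output of `geopandas.read_file`, the nested `classification` entry has been split into separate `name` and `color` columns. The sketch below illustrates that reshaping on the raw GeoJSON using only the standard library. It is not the implementation of `read_qupath_geojson`: the function `flatten_classification` is hypothetical, and the example feature (ID and coordinates) is made up.

```python
import json

# Tiny made-up export in the layout suggested by the tables above
# (assumption: the colour is stored under "color" inside "classification").
EXAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "example-id",
      "geometry": {"type": "Polygon",
                   "coordinates": [[[0, 0], [0, 10], [10, 10], [0, 0]]]},
      "properties": {
        "objectType": "annotation",
        "classification": {"name": "Positive", "color": [250, 62, 62]}
      }
    }
  ]
}
"""


def flatten_classification(feature_collection):
    """Return one flat record per feature with separate name and color entries."""
    rows = []
    for feature in feature_collection["features"]:
        properties = dict(feature.get("properties") or {})
        # Unclassified annotations carry no classification dictionary.
        classification = properties.pop("classification", None) or {}
        rows.append({
            "id": feature.get("id"),
            "objectType": properties.get("objectType"),
            "geometry": feature["geometry"],
            "name": classification.get("name"),
            "color": classification.get("color"),
        })
    return rows


rows = flatten_classification(json.loads(EXAMPLE_GEOJSON))
print(rows[0]["name"], rows[0]["color"])  # prints: Positive [250, 62, 62]
```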


In [60]:
from geopandas.geodataframe import GeoDataFrame

def write_qupath_geojson(dataframe: GeoDataFrame) -> GeoDataFrame:
    """
    Converts a GeoDataFrame with "name" and "color" columns into a QuPath-compatible GeoJSON-like format,
    adding a new "classification" column containing dictionaries with "name" and "color" entries.
    The result is also written to "test.geojson" next to the globally defined `file`.

    Parameters:
    - dataframe (geopandas.GeoDataFrame): The input GeoDataFrame containing "name" and "color" columns.

    Returns:
    geopandas.GeoDataFrame: A modified copy with a new "classification" column, and the original "name" and "color"
    columns removed.
    """
    # collect one {"name": ..., "color": ...} dictionary per row
    classifications = []
    for _, row in dataframe.iterrows():
        classifications.append({c: row[c] for c in ["name", "color"]})

    # drop() returns a new object, so the input dataframe is left unchanged
    dataframe = dataframe.drop(["name", "color"], axis=1)
    dataframe["classification"] = classifications

    # write file as geojson (relies on the global `file` defined above)
    dataframe.to_file(file.parent / "test.geojson", driver="GeoJSON")
    return dataframe


In [61]:
dd = write_qupath_geojson(df)

In [63]:
dd.to_file(file.parent / "test.geojson", driver="GeoJSON")

In [59]:
type(dd)

geopandas.geodataframe.GeoDataFrame

| | id | objectType | geometry | classification |
|---|---|---|---|---|
| 0 | bd3aacca-1716-4df8-91dd-bf8f6413a7bd | annotation | POLYGON ((8863.00000 10814.00000, 8863.00000 1... | {'name': 'Positive', 'color': [250, 62, 62]} |
| 1 | 69814505-4059-42cd-8df2-752f7eb0810d | annotation | POLYGON ((13096.00000 12492.00000, 13072.40000... | {'name': 'Positive', 'color': [250, 62, 62]} |
| 2 | 1957cd32-0a21-4b45-9dae-ecf236217140 | annotation | POLYGON ((30975.26000 22938.00000, 30982.00000... | {'name': 'Negative', 'color': [112, 112, 225]} |
| 3 | 19d2197a-1b8e-456f-8223-fba74641ac1c | annotation | POLYGON ((31165.00000 16408.00000, 31149.00000... | {'name': 'Negative', 'color': [112, 112, 225]} |


In [36]:
dataframe

| | id | objectType | geometry | name | color |
|---|---|---|---|---|---|
| 0 | bd3aacca-1716-4df8-91dd-bf8f6413a7bd | annotation | POLYGON ((8863.00000 10814.00000, 8863.00000 1... | Positive | [250, 62, 62] |
| 1 | 69814505-4059-42cd-8df2-752f7eb0810d | annotation | POLYGON ((13096.00000 12492.00000, 13072.40000... | Positive | [250, 62, 62] |
| 2 | 1957cd32-0a21-4b45-9dae-ecf236217140 | annotation | POLYGON ((30975.26000 22938.00000, 30982.00000... | Negative | [112, 112, 225] |
| 3 | 19d2197a-1b8e-456f-8223-fba74641ac1c | annotation | POLYGON ((31165.00000 16408.00000, 31149.00000... | Negative | [112, 112, 225] |


In [17]:
file.suffix

'.geojson'

In [14]:
xd.annotations.demo

| | id | objectType | geometry | name | color |
|---|---|---|---|---|---|
| 0 | bd3aacca-1716-4df8-91dd-bf8f6413a7bd | annotation | POLYGON ((8863.00000 10814.00000, 8863.00000 1... | Positive | [250, 62, 62] |
| 1 | 69814505-4059-42cd-8df2-752f7eb0810d | annotation | POLYGON ((13096.00000 12492.00000, 13072.40000... | Positive | [250, 62, 62] |
| 2 | 1957cd32-0a21-4b45-9dae-ecf236217140 | annotation | POLYGON ((30975.26000 22938.00000, 30982.00000... | Negative | [112, 112, 225] |
| 3 | 19d2197a-1b8e-456f-8223-fba74641ac1c | annotation | POLYGON ((31165.00000 16408.00000, 31149.00000... | Negative | [112, 112, 225] |


# Problem:

## Do we want to save annotations as .geojson or .parquet?

### GeoJSON

Advantages:
- Compatible with QuPath: the file can later be loaded into QuPath again.

Disadvantages:
- Does not support list-valued columns, so the dataframe has to keep the structure returned by `geopandas.read_file()`.
- Currently, the dataframe is reshaped after reading so that it includes the columns "name" and "color". This is useful when plotting the annotations in napari, but this layout cannot be written back to GeoJSON directly. We would have to reshape it back to the dictionary structure returned by `geopandas.read_file()`.
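
To illustrate the reshaping step mentioned in the last point, the sketch below nests flat `name` and `color` values back into a `classification` dictionary and serializes the result with the standard `json` module. This is one possible approach, not the method used by InSituPy: the function `nest_classification` is hypothetical, the example record is made up, and whether QuPath accepts the resulting file has not been verified here.

```python
import json


def nest_classification(rows):
    """Build a GeoJSON FeatureCollection with a nested classification per feature."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            "id": row["id"],
            "geometry": row["geometry"],
            "properties": {
                "objectType": row["objectType"],
                # Re-create the dictionary layout found in the exported file.
                "classification": {"name": row["name"], "color": row["color"]},
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up flat record with separate "name" and "color" entries.
example_rows = [{
    "id": "example-id",
    "objectType": "annotation",
    "geometry": {"type": "Polygon",
                 "coordinates": [[[0, 0], [0, 10], [10, 10], [0, 0]]]},
    "name": "Negative",
    "color": [112, 112, 225],
}]

geojson_text = json.dumps(nest_classification(example_rows), indent=2)
print(geojson_text)
```

Writing `geojson_text` to a `.geojson` file would give a file with the nested layout, while the in-memory table keeps the flat columns that are convenient for plotting.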

### Parquet

Advantages:
- More flexible when saving: additional columns can be added easily.

Disadvantages:
- Cannot be read by QuPath.

In [None]:
file = Path(r"C:\Users\ge37voy\Github\InSituPy\notebooks\demo_annotations\annotation-slide_id__sample_id__demo.geojson")


In [19]:
out_dir = data_dir / "with_annotations"
xd.save(out_dir, overwrite=True)

ImportError: Missing optional dependency 'pyarrow.parquet'. pyarrow is required for Parquet support.  "
        "Use pip or conda to install pyarrow.parquet.

In [37]:
xd_reloaded = XeniumData(out_dir)

In [38]:
xd_reloaded

XeniumData
Slide ID:	slide_id
Sample ID:	sample_id
Data path:	demo_dataset
Data folder:	with_annotations
Metadata file:	xeniumdata.json

In [39]:
xd_reloaded.read_all()

Reading annotations...
demo_dataset\with_annotations\annotations\demo.parquet


DriverError: 'demo_dataset\with_annotations\annotations\demo.parquet' not recognized as a supported file format.
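
The `DriverError` suggests that the `.parquet` file was handed to a reader meant for GeoJSON-like formats. One possible way around this, sketched here under the assumption that the annotations were written with `GeoDataFrame.to_parquet`, is to pick the reader from the file suffix. `geopandas.read_file` and `geopandas.read_parquet` are existing geopandas functions; the helpers `pick_reader` and `read_annotation_file` are hypothetical and not part of InSituPy.

```python
from pathlib import Path


def pick_reader(path):
    """Return the name of the geopandas function suited to the file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix == ".parquet":
        return "read_parquet"
    if suffix in {".geojson", ".json"}:
        return "read_file"
    raise ValueError(f"Unsupported annotation file format: {suffix!r}")


def read_annotation_file(path):
    """Read an annotation file with the reader matching its suffix."""
    import geopandas  # imported lazily so that pick_reader works without geopandas

    reader = getattr(geopandas, pick_reader(path))
    return reader(path)


print(pick_reader("annotations/demo.parquet"))
print(pick_reader("annotation-slide_id__sample_id__demo.geojson"))
```

Reading Parquet through geopandas requires `pyarrow` to be installed.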

In [40]:
xd_reloaded.xd_metadata

{'slide_id': 'slide_id',
 'sample_id': 'sample_id',
 'path': 'demo_dataset\\output-XETG00000__slide_id__sample_id',
 'images': {'nuclei': 'images/morphology_focus.ome.tif',
  'CD20': 'images/slide_id__sample_id__CD20__registered.ome.tif',
  'HER2': 'images/slide_id__sample_id__HER2__registered.ome.tif',
  'HE': 'images/slide_id__sample_id__HE__registered.ome.tif'},
 'matrix': 'matrix/matrix.h5ad',
 'transcripts': 'transcripts/transcripts.parquet',
 'boundaries': {'cells': 'boundaries/cells.parquet',
  'nuclei': 'boundaries/nuclei.parquet'},
 'annotations': {'demo': 'annotations/demo.parquet'}}

In [21]:
import pandas as pd
pd.read_parquet(r"C:\Users\ge37voy\Github\InSituPy\notebooks\demo_dataset\with_annotations\annotations\demo.parquet")

| | id | objectType | geometry | name | color |
|---|---|---|---|---|---|
| 0 | bd3aacca-1716-4df8-91dd-bf8f6413a7bd | annotation | `b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x91\x02...` | Positive | [250, 62, 62] |
| 1 | 69814505-4059-42cd-8df2-752f7eb0810d | annotation | `b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x1c\x02...` | Positive | [250, 62, 62] |
| 2 | 1957cd32-0a21-4b45-9dae-ecf236217140 | annotation | `b'\x01\x03\x00\x00\x00\x02\x00\x00\x00\xb6\x02...` | Negative | [112, 112, 225] |
| 3 | 19d2197a-1b8e-456f-8223-fba74641ac1c | annotation | `b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\xcb\x01...` | Negative | [112, 112, 225] |
