# InSituPy demonstration - Datasets

In [1]:
## The following code ensures that all modules and init files are reloaded before each execution.
%load_ext autoreload
%autoreload 2

InSituPy offers six functions, each of which downloads a different dataset into `~/.cache/InSituPy/demo_datasets`. The data is provided by 10x Genomics and is publicly available.
Each function returns an `InSituData` object and takes an optional argument `overwrite`, which can be set to `True` to force a repeated download.
1. `human_breast_cancer()`: In situ gene expression dataset generated with Xenium Onboard Analysis 1.0.1  
https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast
2. `human_brain_cancer()`: In situ gene expression dataset generated with Xenium Onboard Analysis 2.0.0  
https://www.10xgenomics.com/datasets/ffpe-human-brain-cancer-data-with-human-immuno-oncology-profiling-panel-and-custom-add-on-1-standard
3. `nondiseased_kidney()`: In situ gene expression dataset generated with Xenium Onboard Analysis 1.5.0  
https://www.10xgenomics.com/resources/datasets/human-kidney-preview-data-xenium-human-multi-tissue-and-cancer-panel-1-standard
4. `hskin_melanoma()`: In situ gene expression dataset generated with Xenium Onboard Analysis 1.7.0  
https://www.10xgenomics.com/resources/datasets/human-skin-preview-data-xenium-human-skin-gene-expression-panel-add-on-1-standard
5. `pancreatic_cancer()`: In situ gene expression dataset generated with Xenium Onboard Analysis 1.6.0  
https://www.10xgenomics.com/datasets/pancreatic-cancer-with-xenium-human-multi-tissue-and-cancer-panel-1-standard
6. `human_lung_cancer()`: In situ gene expression dataset generated with Xenium Onboard Analysis 2.0.0 (Xenium multimodal cell segmentation)  
https://www.10xgenomics.com/datasets/preview-data-ffpe-human-lung-cancer-with-xenium-multimodal-cell-segmentation-1-standard
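
The `overwrite` behaviour described above follows a common caching pattern: the download is skipped when the target already exists, unless the caller forces it. The following is a minimal sketch of that pattern and not InSituPy's actual implementation; `fetch_cached` and the `download` callable are hypothetical names introduced only for illustration.

```python
from pathlib import Path
from typing import Callable


def fetch_cached(
    target: Path,
    download: Callable[[Path], None],
    overwrite: bool = False,
) -> Path:
    """Run `download(target)` only if `target` is missing or `overwrite` is True.

    `download` is a hypothetical callable that writes the file to `target`.
    """
    if target.exists() and not overwrite:
        # Cached copy is present and no re-download was requested.
        print(f"{target.name} exists already. Download is skipped.")
        return target
    # Make sure the cache folder exists before writing into it.
    target.parent.mkdir(parents=True, exist_ok=True)
    download(target)
    return target
```

Calling `fetch_cached` twice with the same target triggers one download; passing `overwrite=True` triggers another one regardless of the cached copy.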

In [2]:
from pathlib import Path
from os.path import expanduser

import insitupy
from insitupy.datasets import hskin_melanoma, human_brain_cancer, human_breast_cancer, nondiseased_kidney, pancreatic_cancer, human_lung_cancer
from insitupy import read_xenium

In [3]:
hbreast = human_breast_cancer()

C:\Users\ge37voy\.cache\InSituPy\demo_datasets\hbreastcancer\Xenium_FFPE_Human_Breast_Cancer_Rep1_outs.zip: 100%|██████████| 9.18G/9.18G [14:19<00:00, 11.5MiB/s]
C:\Users\ge37voy\.cache\InSituPy\demo_datasets\hbreastcancer\unregistered_images\slide_id__hbreastcancer__HE__histo.tif: 100%|██████████| 1.58G/1.58G [02:28<00:00, 11.4MiB/s]  
C:\Users\ge37voy\.cache\InSituPy\demo_datasets\hbreastcancer\unregistered_images\slide_id__hbreastcancer__CD20_HER2_DAPI__IF.tif: 100%|██████████| 398M/398M [00:36<00:00, 11.4MiB/s] 

Corresponding image data can be found in C:\Users\ge37voy\.cache\InSituPy\demo_datasets\hbreastcancer\unregistered_images.
For this dataset following images are available:
slide_id__hbreastcancer__HE__histo.ome.tiff
slide_id__hbreastcancer__CD20_HER2_DAPI__IF.ome.tiff





In [4]:
hbreast

[1m[31mInSituData[0m
[1mMethod:[0m		Xenium
[1mSlide ID:[0m	0001879
[1mSample ID:[0m	Replicate 1
[1mData path:[0m	C:\Users\ge37voy\.cache\InSituPy\demo_datasets\hbreastcancer
[1mData folder:[0m	output-XETG00000__slide_id__hbreastcancer
[1mMetadata file:[0m	experiment.xenium

In [5]:
hbreast.load_all()

Loading annotations...
No `annotations` modality found.
Loading cells...
Loading images...
Loading regions...
No `regions` modality found.
Loading transcripts...


In [6]:
hbreast.show()



In [10]:
hlung = human_lung_cancer()

This dataset exists already. Download is skipped. To force download set `overwrite=True`.
Image exists. Checking md5sum...
The md5sum matches. Download is skipped. To force download set `overwrite=True`.
Corresponding image data can be found in C:\Users\ge37voy\.cache\InSituPy\demo_datasets\hlungcancer\unregistered_images.
For this dataset following image is available:
slide_id__hlungcancer__HE__histo.ome.tiff
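
The output above shows that an existing image is checked against an md5sum before the download is skipped. A check of this kind can be sketched with the standard library as follows. This is an illustration under assumptions, not the code InSituPy runs; `md5sum`, `needs_download`, and `expected_md5` are hypothetical names.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path: Path, expected_md5: str) -> bool:
    """True if the file is missing or its checksum differs from `expected_md5`."""
    return not path.exists() or md5sum(path) != expected_md5.lower()
```

Reading in chunks matters here because the demo images are large (the files listed earlier range from several hundred megabytes to more than a gigabyte).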


In [11]:
hlung

[1m[31mInSituData[0m
[1mMethod:[0m		Xenium
[1mSlide ID:[0m	0017509
[1mSample ID:[0m	Human_Lung_Cancer
[1mData path:[0m	C:\Users\ge37voy\.cache\InSituPy\demo_datasets\hlungcancer
[1mData folder:[0m	output-XETG00000__slide_id__hlungcancer
[1mMetadata file:[0m	experiment.xenium

In [12]:
hlung.load_all()

Loading annotations...
No `annotations` modality found.
Loading cells...
Loading images...


  func()
<tifffile.TiffFile 'morphology_focus_0000.ome.tif'> OME series cannot read multi-file pyramids
<tifffile.TiffFile 'morphology_focus_0001.ome.tif'> OME series cannot read multi-file pyramids
<tifffile.TiffFile 'morphology_focus_0002.ome.tif'> OME series cannot read multi-file pyramids
<tifffile.TiffFile 'morphology_focus_0003.ome.tif'> OME series cannot read multi-file pyramids


Loading regions...
No `regions` modality found.
Loading transcripts...


In [13]:
hlung.show()

The data structure after running the above code should look like this:

```
./InSituPy/demo_datasets
├───hbreastcancer
│    ├───output-XETG00000__slide_id__hbreastcancer
│    ├───unregistered_images
├───hlungcancer
│    ├───output-XETG00000__slide_id__hlungcancer
│    ├───unregistered_images
```
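
To confirm that the downloads produced this layout, the cache folder can be inspected with `pathlib`. The sketch below assumes that every dataset folder should contain one subfolder whose name starts with `output-` and one named `unregistered_images`, as in the tree above; `check_layout` is a hypothetical helper and not part of InSituPy.

```python
from pathlib import Path


def check_layout(root: Path) -> dict[str, bool]:
    """Map each dataset folder under `root` to whether it has the expected subfolders."""
    report = {}
    for dataset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Xenium output folder, e.g. "output-XETG00000__slide_id__hbreastcancer".
        has_output = any(
            child.is_dir() and child.name.startswith("output-")
            for child in dataset_dir.iterdir()
        )
        # Folder holding the images that have not been registered yet.
        has_images = (dataset_dir / "unregistered_images").is_dir()
        report[dataset_dir.name] = has_output and has_images
    return report


# Example call, assuming the default cache location mentioned at the top:
# check_layout(Path.home() / ".cache" / "InSituPy" / "demo_datasets")
```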


To explore the datasets, continue with registering the images as demonstrated in [02_InSituPy_demo_register_images.ipynb](./02_InSituPy_demo_register_images.ipynb), followed by preprocessing and visualizing the data as shown in [03_InSituPy_demo_analyze.ipynb](./03_InSituPy_demo_analyze.ipynb).