# Purpose of the notebook

This notebook converts the output of Xenium runs into AnnData objects for all the samples used across the study.
Because 10x changed the format of Xenium output over time, we have three types of input: the original pre-release datasets (2022), which we converted with `format_xenium_adata`; the first released samples (early 2023), which already have the final Xenium output format and are converted with `format_xenium_adata_2023`; and the newest ones (mid 2023 onwards), formatted with `format_xenium_adata_mid_2023`.

Please note that the loader currently included in the end-to-end pipeline is `format_to_adata`, which works on the current Xenium format (as of mid 2024). Use that function for recently processed Xenium files.
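The choice of loader depends only on which generation of Xenium output a folder comes from. As a minimal sketch, the hypothetical mapping below pairs illustrative generation labels (`"2022"`, `"early_2023"`, `"mid_2023"`, our own naming, not part of `xb`) with the `xb.formatting` function names listed above. It returns a function name only and does not inspect the files themselves.

```python
# Hypothetical helper: the generation labels are illustrative, while the
# values are the xb.formatting loader names used in this notebook.
LOADERS = {
    "2022": "format_xenium_adata",               # original pre-release datasets
    "early_2023": "format_xenium_adata_2023",    # first released, final format
    "mid_2023": "format_xenium_adata_mid_2023",  # newest format used here
}


def loader_name(generation):
    """Return the xb.formatting loader name for a Xenium output generation."""
    try:
        return LOADERS[generation]
    except KeyError:
        raise ValueError(
            f"unknown Xenium output generation {generation!r}; "
            f"expected one of {sorted(LOADERS)}"
        ) from None
```

In this notebook the resolved function would be called as `getattr(xf, loader_name("mid_2023"))(path, tag, output_path)`, matching how the three older loaders are invoked in the cells below. `format_to_adata` is left out of the mapping because its signature is not shown here.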

# Loading the packages

In [1]:
import os.path
from scipy.io import mmread
import xb.formatting as xf

# Example of formatting (2022)

We pass the path where the original Xenium data are stored, a sample `tag` and an output folder to `format_xenium_adata`, which transforms the dataset into an AnnData object.

In [68]:
path=r'../../data/original_data/xenium_prerelease_jun20_mBrain_replicates_updated/mBrain_ff_rep1'
tag=r'ms_brain_rep1'
output_path=r'../../data/unprocessed_adata/'
xf.format_xenium_adata(path,tag,output_path)

AnnData object with n_obs × n_vars = 26372 × 541
    obs: 'cell_id', 'x_centroid', 'y_centroid', 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'graph_clusters', 'kmeans2_clusters', 'kmeans3_clusters', 'kmeans4_clusters', 'kmeans5_clusters', 'kmeans6_clusters', 'kmeans7_clusters', 'kmeans8_clusters', 'kmeans9_clusters', 'kmeans10_clusters'
    var: 'gene_id', 'reason_of_inclusion', 'Annotation', 'Ensembl ID', 'in_panel'
    uns: 'spatial', 'spots'
    obsm: 'spatial', 'X_umap', 'X_tsne', 'X_pca'
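The object above stores cells as observations (`n_obs`) and genes as variables (`n_vars`). Count matrices in the 10x Matrix Market layout (assumed here, as in a `cell_feature_matrix` folder) store genes as rows and cells as columns, so they have to be transposed before becoming `AnnData.X`. The sketch below shows only that step with `scipy.io.mmread`; `read_cells_by_genes` is a hypothetical helper, not the `xb.formatting` implementation.

```python
from scipy.io import mmread


def read_cells_by_genes(mtx_path, feature_ids, barcodes):
    """Load a genes x cells Matrix Market file as a cells x genes CSR matrix.

    Assumes the 10x-style layout where rows are features (genes) and columns
    are barcodes (cells); AnnData expects one row per observation (cell).
    """
    genes_by_cells = mmread(mtx_path)
    n_genes, n_cells = genes_by_cells.shape
    # The annotation tables must match the matrix axes before becoming var/obs
    if n_genes != len(feature_ids) or n_cells != len(barcodes):
        raise ValueError(
            f"matrix is {n_genes} x {n_cells}, but got "
            f"{len(feature_ids)} features and {len(barcodes)} barcodes"
        )
    # Transpose to cells x genes; CSR makes per-cell row slicing cheap
    return genes_by_cells.T.tocsr()
```

The result could then be wrapped as `anndata.AnnData(X, obs=cell_table, var=feature_table)`, with `obs` indexed by barcode and `var` by feature ID, which is the cells × genes orientation reported in the output above.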

For every dataset, we also convert the original OME-TIFF image into a plain TIFF with `format_background`.

In [3]:
xf.format_background(path)
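Converting an OME-TIFF to a plain TIFF can be sketched with the `tifffile` package: read one resolution level of the first image series and write it back without OME-XML metadata. This is an assumption about what such a conversion involves, not a description of what `xf.format_background` does internally (which level it keeps or how it names the file); `ome_to_plain_tiff` is a hypothetical name.

```python
import tifffile


def ome_to_plain_tiff(src, dst, level=0, bigtiff=True):
    """Copy one resolution level of the first series of an OME-TIFF to a plain TIFF.

    level=0 is the full-resolution base of a pyramidal OME-TIFF; higher levels
    are progressively downsampled. bigtiff=True lifts the 4 GB classic-TIFF
    limit, which matters for whole-section morphology images.
    """
    image = tifffile.imread(src, series=0, level=level)
    # metadata=None writes no JSON/OME description tag, i.e. a plain TIFF
    tifffile.imwrite(dst, image, bigtiff=bigtiff, metadata=None)
    return image.shape
```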

# We repeat this processing for each dataset (2022)
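Each cell in this section repeats the same calls with a different `path` and `tag`. A hypothetical `format_all` helper expresses that pattern as a loop. The formatter and background functions are passed in, so the sketch does not depend on `xb`, and the duplicate-tag check is an added safeguard (assuming tags name the outputs), not part of the original workflow.

```python
def format_all(datasets, output_path, format_fn, background_fn):
    """Format every (path, tag) pair and convert its background image.

    format_fn and background_fn are passed in (e.g. xf.format_xenium_adata and
    xf.format_background) so the loop itself is independent of xb.
    Returns a dict mapping each tag to whatever format_fn returned.
    """
    results = {}
    for path, tag in datasets:
        # Tags are assumed to name the outputs, so a repeat is likely a mistake
        if tag in results:
            raise ValueError(f"duplicate tag {tag!r}")
        results[tag] = format_fn(path, tag, output_path)
        background_fn(path)
    return results
```

For the 2022 datasets it would be called as `format_all(datasets, r'../../data/unprocessed_adata/', xf.format_xenium_adata, xf.format_background)`, with `datasets` listing the `(path, tag)` pairs used in the cells below.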

In [69]:
path=r'../../data/original_data/xenium_prerelease_jun20_mBrain_replicates_updated/mBrain_ff_rep2'
tag=r'ms_brain_rep2'
output_path=r'../../data/unprocessed_adata/'
xf.format_xenium_adata(path,tag,output_path)
xf.format_background(path)


In [70]:
path=r'../../data/original_data/xenium_prerelease_jun20_mBrain_replicates_updated/mBrain_ff_rep3'
tag=r'ms_brain_rep3'
output_path=r'../../data/unprocessed_adata/'
xf.format_xenium_adata(path,tag,output_path)
xf.format_background(path)


In [71]:
path=r'../../data/original_data/xenium_prerelease_sept15_hBreast/hBreast_ffpe_large'
tag=r'h_breast_1'
output_path=r'../../data/unprocessed_adata/'
xf.format_xenium_adata(path,tag,output_path)
xf.format_background(path)

UMAP and clusters_could not be recovered
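The message `UMAP and clusters_could not be recovered` suggests that the precomputed UMAP and clustering results are treated as optional: when they cannot be found, the conversion still goes ahead. A minimal standard-library sketch of that pattern follows; the relative paths under `analysis/` are assumptions about the Xenium folder layout, and `read_optional_tables` is a hypothetical helper.

```python
import csv
import os

# Assumed locations of precomputed analysis tables inside a Xenium output
# folder; these paths are illustrative and may differ between releases.
OPTIONAL_TABLES = {
    "umap": os.path.join("analysis", "umap", "gene_expression_2_components", "projection.csv"),
    "graph_clusters": os.path.join("analysis", "clustering", "gene_expression_graphclust", "clusters.csv"),
}


def read_optional_tables(folder):
    """Read each optional CSV as a list of row dicts, or None if it is missing."""
    tables = {}
    for name, relpath in OPTIONAL_TABLES.items():
        try:
            with open(os.path.join(folder, relpath), newline="") as fh:
                tables[name] = list(csv.DictReader(fh))
        except FileNotFoundError:
            # Missing analysis output: keep going without this annotation
            tables[name] = None
    return tables
```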


In [72]:
path=r'../../data/original_data/xenium_prerelease_sept15_hBreast/hBreast_ffpe_small'
tag=r'h_breast_2'
output_path=r'../../data/unprocessed_adata/'
xf.format_xenium_adata(path,tag,output_path)
xf.format_background(path)

UMAP and clusters_could not be recovered


In [73]:
path=r'../../data/original_data/xenium_prerelease_sept15_mBrain/mBrain_ff_full_coronal_section'
tag=r'ms_brain_fullcoronal'
output_path=r'../../data/unprocessed_adata/'
xf.format_xenium_adata(path,tag,output_path)
xf.format_background(path)


In [74]:
path=r'../../data/original_data/xenium_prerelease_sept15_mBrain/mBrain_ff_partial_coronal_section'
tag=r'ms_brain_partialcoronal'
output_path=r'../../data/unprocessed_adata/'
xf.format_xenium_adata(path,tag,output_path)
xf.format_background(path)


# We now format the newer datasets (early 2023)

In [None]:
path=r'/mnt/e/Xenium_V1_FF_Mouse_Brain_MultiSection_2_outs'
tag=r'ms_brain_multisection2'
output_path=r'/mnt/e/'
adata=xf.format_xenium_adata_2023(path,tag,output_path)
xf.format_background(path)

In [27]:
path=r'/mnt/e/Xenium_V1_FF_Mouse_Brain_MultiSection_3_outs'
tag=r'ms_brain_multisection3'
output_path=r'/mnt/e/'
adata=xf.format_xenium_adata_2023(path,tag,output_path)
xf.format_background(path)

UMAP and clusters_could not be recovered


In [29]:
path=r'/mnt/e/Xenium_V1_FF_Mouse_Brain_MultiSection_1_outs'
tag=r'ms_brain_multisection1'
output_path=r'/mnt/e/'
adata=xf.format_xenium_adata_2023(path,tag,output_path)
xf.format_background(path)

UMAP and clusters_could not be recovered


# We now format the newest datasets (mid 2023 onwards)

In [17]:
path=r'/mnt/d/Xenium_benchmarking-main/data/original_data/xenium_lung_healthy/Xenium_Preview_Human_Non_diseased_Lung_With_Add_on_FFPE_outs'
tag=r'healthy_lung'
output_path=r'../../data/unprocessed_adata/'
adata=xf.format_xenium_adata_mid_2023(path,tag,output_path)
xf.format_background(path)

UMAP and clusters_could not be recovered


In [20]:
path=r'/mnt/d/Xenium_benchmarking-main/data/original_data/xenium_human_gbm/Xenium_V1_FFPE_Human_Brain_Glioblastoma_With_Addon_outs'
tag=r'human_gbm'
output_path=r'../../data/unprocessed_adata/'
adata=xf.format_xenium_adata_mid_2023(path,tag,output_path)
xf.format_background(path)

First decompressing done
Analysis files decompressed
UMAP and clusters_could not be recovered
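The `First decompressing done` and `Analysis files decompressed` messages show that the mid-2023 loader unpacks compressed parts of the output folder before reading them. A minimal sketch of a re-runnable version of that step with the standard `tarfile` module is below; the archive name `analysis.tar.gz` is an assumption, and `extract_if_needed` is a hypothetical helper rather than the `xb` code.

```python
import os
import tarfile


def extract_if_needed(folder, archive_name="analysis.tar.gz"):
    """Unpack folder/archive_name in place unless all its members already exist.

    Returns True if something was extracted, False otherwise.
    """
    archive = os.path.join(folder, archive_name)
    if not os.path.isfile(archive):
        return False
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        if all(os.path.exists(os.path.join(folder, m.name)) for m in members):
            return False  # already decompressed on an earlier run
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject unsafe member paths
            tar.extractall(folder, filter="data")
        else:
            tar.extractall(folder)
    return True
```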


In [21]:
path=r'/mnt/d/Xenium_benchmarking-main/data/original_data/xenium_human_brain_healthy/Xenium_V1_FFPE_Human_Brain_Healthy_With_Addon_outs'
tag=r'human_brain'
output_path=r'../../data/unprocessed_adata/'
adata=xf.format_xenium_adata_mid_2023(path,tag,output_path)
xf.format_background(path)

First decompressing done
Analysis files decompressed
UMAP and clusters_could not be recovered


In [22]:
path=r'/mnt/d/Xenium_benchmarking-main/data/original_data/xenium_human_alzehimers/Xenium_V1_FFPE_Human_Brain_Alzheimers_With_Addon_outs'
tag=r'human_alzheimers'
output_path=r'../../data/unprocessed_adata/'
adata=xf.format_xenium_adata_mid_2023(path,tag,output_path)
xf.format_background(path)

First decompressing done
Analysis files decompressed
UMAP and clusters_could not be recovered


In [21]:
path=r'/media/sergio/xenium_b_and_heart/Xenium_benchmarking-main/data/original_data/xenium_breast_idc_with_addon/Xenium_V1_FFPE_Human_Breast_IDC_With_Addon_outs'
tag=r'hbreast_idc_addon_set1'
output_path=r'../../data/unprocessed_adata/'
adata=xf.format_xenium_adata_mid_2023(path,tag,output_path)
xf.format_background(path)



UMAP and clusters_could not be recovered


In [25]:
path=r'/mnt/d/Xenium_benchmarking-main/data/original_data/xenium_breast_idc_ild_with_addon/Xenium_V1_FFPE_Human_Breast_IDC_With_Addon_outs'
tag=r'hbreast_idc_addon_set2'
output_path=r'../../data/unprocessed_adata/'
adata=xf.format_xenium_adata_mid_2023(path,tag,output_path)
xf.format_background(path)

UMAP and clusters_could not be recovered


In [26]:
path=r'/mnt/d/Xenium_benchmarking-main/data/original_data/xenium_breast_idc_ild_with_addon/Xenium_V1_FFPE_Human_Breast_ILC_With_Addon_outs'
tag=r'hbreast_ilc_addon_set2'
output_path=r'../../data/unprocessed_adata/'
adata=xf.format_xenium_adata_mid_2023(path,tag,output_path)
xf.format_background(path)

UMAP and clusters_could not be recovered


In [9]:
path=r'/mnt/d/Xenium_benchmarking-main/data/original_data/xenium_breast_idc_ilc_entiresample_area/Xenium_V1_FFPE_Human_Breast_IDC_outs'
tag=r'hbreast_idc_entiresample_set3'
output_path=r'../../data/unprocessed_adata/'
adata=xf.format_xenium_adata_mid_2023(path,tag,output_path)
xf.format_background(path)

UMAP and clusters_could not be recovered


In [10]:
path=r'/mnt/d/Xenium_benchmarking-main/data/original_data/xenium_breast_idc_ilc_entiresample_area/Xenium_V1_FFPE_Human_Breast_ILC_outs'
tag=r'hbreast_ilc_entiresample_set3'
output_path=r'../../data/unprocessed_adata/'
adata=xf.format_xenium_adata_mid_2023(path,tag,output_path)
xf.format_background(path)

UMAP and clusters_could not be recovered


In [6]:
path=r'/mnt/d/Xenium_benchmarking-main/data/original_data/xenium_breast_idc_ilc/Xenium_V1_FFPE_Human_Breast_IDC_outs'
tag=r'hbreast_idc_addon_set4'
output_path=r'../../data/unprocessed_adata/'
adata=xf.format_xenium_adata_mid_2023(path,tag,output_path)
xf.format_background(path)

UMAP and clusters_could not be recovered


In [7]:
path=r'/mnt/d/Xenium_benchmarking-main/data/original_data/xenium_breast_idc_ilc/Xenium_V1_FFPE_Human_Breast_ILC_outs'
tag=r'hbreast_ilc_addon_set4'
output_path=r'../../data/unprocessed_adata/'
adata=xf.format_xenium_adata_mid_2023(path,tag,output_path)
xf.format_background(path)

UMAP and clusters_could not be recovered


In [8]:
path=r'/mnt/d/Xenium_benchmarking-main/data/original_data/xenium_lung_healthy/Xenium_Preview_Human_Non_diseased_Lung_With_Add_on_FFPE_outs'
tag=r'lung_cancer'
output_path=r'../../data/unprocessed_adata/'
adata=xf.format_xenium_adata_mid_2023(path,tag,output_path)
xf.format_background(path)

UMAP and clusters_could not be recovered


In [4]:
path=r'/media/sergio/xenium_b_and_heart/xenium_datasets/output-XETG00045__0003524__active__20230510__111824-20230725T092912Z-001/output-XETG00045__0003524__active__20230510__111824'
tag=r'human_spinal_chord_active'
output_path=r'../../data/unprocessed_adata/'
adata=xf.format_xenium_adata_mid_2023(path,tag,output_path)
xf.format_background(path)



UMAP and clusters_could not be recovered


In [5]:
path=r'/media/sergio/xenium_b_and_heart/xenium_datasets/output-XETG00045__0003385__inactive__20230510__111824-20230725T094731Z-001/output-XETG00045__0003385__inactive__20230510__111824'
tag=r'human_spinal_chord_inactive'
output_path=r'../../data/unprocessed_adata/'
adata=xf.format_xenium_adata_mid_2023(path,tag,output_path)
xf.format_background(path)



UMAP and clusters_could not be recovered
