
#### Load Required Extensions and Libraries
- `%autoreload` is used to automatically reload any modules that are updated.
- Import necessary functions from `insitupy` and `scanpy`.


In [1]:

%load_ext autoreload
%autoreload 2


In [2]:
from pathlib import Path
from insitupy.datasets.download import download_url
import shutil
import os
from insitupy import read_xenium
import scanpy as sc

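- We open the Xenium dataset with the `read_xenium` function from `insitupy`. At this point only the metadata is read, as the first summary printed below shows.
- The path points to the dataset location on your system.
- The `load_cells()` method then loads the cell data (expression matrix and boundaries) from that dataset.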
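Because the path depends on your system, it can help to check the folder before reading it. The sketch below is only an illustration: `looks_like_xenium_dir` is a hypothetical helper (not part of `insitupy`), and it assumes that a Xenium output folder contains the `experiment.xenium` metadata file reported in the dataset summary.

```python
from pathlib import Path


def looks_like_xenium_dir(path: Path) -> bool:
    """Return True if `path` is a directory containing `experiment.xenium`.

    Hypothetical helper for illustration; not part of insitupy.
    """
    return path.is_dir() and (path / "experiment.xenium").is_file()


# Same folder layout as used in this notebook.
data_dir = Path("demo_dataset") / "output-XETG00000__0001879__Replicate 1"
if not looks_like_xenium_dir(data_dir):
    print(f"No Xenium output found at {data_dir.resolve()}")
```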

In [3]:
# Load the Xenium data from the folder
out_dir = Path("demo_dataset") # output directory
data_dir = out_dir / "output-XETG00000__0001879__Replicate 1" # directory of xenium data
image_dir = out_dir / "unregistered_images" # directory of images

In [4]:
xd = read_xenium(data_dir)

In [5]:
xd

InSituData
Method:		Xenium
Slide ID:	0001879
Sample ID:	Replicate 1
Path:		C:\Users\Aitana\OneDrive\Documentos\Github\InSituPy\notebooks\demo_dataset\output-XETG00000__0001879__Replicate 1
Metadata file:	experiment.xenium

In [6]:
# load the cell data (expression matrix and nuclear/cellular boundaries)
xd.load_cells()

# other modalities can be loaded separately in the same way
# xd.load_images()
# xd.load_transcripts()
# xd.read_annotations()

Loading cells...


In [7]:
xd

InSituData
Method:		Xenium
Slide ID:	0001879
Sample ID:	Replicate 1
Path:		C:\Users\Aitana\OneDrive\Documentos\Github\InSituPy\notebooks\demo_dataset\output-XETG00000__0001879__Replicate 1
Metadata file:	experiment.xenium
    ➤ cells
       matrix
           AnnData object with n_obs × n_vars = 167780 × 313
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
           varm: 'binned_expression'
       boundaries
           BoundariesData object with 2 entries:
               nuclear
               cellular

In [8]:
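# Drop cells with fewer than 10 detected genes
sc.pp.filter_cells(xd.cells.matrix, min_genes=10)
# Then drop genes detected in fewer than 3 of the remaining cells
sc.pp.filter_genes(xd.cells.matrix, min_cells=3)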
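To make the two thresholds concrete, here is a minimal pure-Python sketch of the same logic on a made-up count matrix. It does not call `scanpy`; the matrix and the toy `min_genes` value are assumptions chosen so the effect is visible on four cells.

```python
# Toy count matrix (made-up values): rows are cells, columns are genes.
counts = [
    [5, 0, 2, 1],  # cell A: 3 genes detected
    [0, 0, 0, 7],  # cell B: 1 gene detected
    [3, 1, 4, 2],  # cell C: 4 genes detected
    [1, 0, 6, 3],  # cell D: 3 genes detected
]
min_genes = 3  # toy threshold; the notebook uses 10
min_cells = 3  # same value as in the notebook

# Step 1: keep cells with at least `min_genes` genes that have a nonzero count.
kept_cells = [row for row in counts if sum(v > 0 for v in row) >= min_genes]

# Step 2: among the remaining cells, keep genes detected in at least `min_cells` cells.
n_genes = len(counts[0])
kept_gene_idx = [
    j for j in range(n_genes)
    if sum(row[j] > 0 for row in kept_cells) >= min_cells
]
filtered = [[row[j] for j in kept_gene_idx] for row in kept_cells]

print(len(kept_cells), kept_gene_idx)
```

The order matters: the gene filter only sees the cells that survived the cell filter. In this notebook the matrix starts with 167780 cells, and the SCTransform log further down reports 163565, which suggests that roughly 4,200 cells fell below the `min_genes=10` threshold.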


#### Compare Transformations and Generate Normalization Report
- Here we apply several transformations to the dataset (`log1p`, `sqrt_1`, `sqrt_2`, `pearson_residuals`, and `sctransform`).
- The results are saved as an HTML report at the location given by `output_path`, here `"C:/Users/Aitana/normalization_results.html"`. Adjust this path for your own system.
- The HTML report contains graphical and statistical comparisons of the transformation methods, including:
  - A **summary table** that reports key metrics for each transformation: skewness, kurtosis, mean absolute deviation (MAD), coefficient of variation (CV), and normality tests (Shapiro-Wilk, Anderson-Darling, Kolmogorov-Smirnov). The best-performing metrics are highlighted in green.
  - **Histograms** showing the distribution of transformed counts for each method overlaid with a normal distribution curve.
  - **Q-Q plots** that compare the quantiles of the transformed data against a theoretical normal distribution to assess the normality of the transformed data.
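The kind of comparison summarized in the report can be sketched on a small scale with the standard library. This is an illustration under stated assumptions, not InSituPy's implementation: the input values are made up, a plain square root stands in for `sqrt_1` and `sqrt_2` (their exact definitions are not given here), and the formulas are the usual population versions of skewness, excess kurtosis, mean absolute deviation, and coefficient of variation.

```python
import math
import statistics

# Toy right-skewed per-cell totals (made-up values, not taken from the dataset).
raw = [12, 15, 18, 20, 22, 25, 30, 40, 65, 150]

transformed = {
    "raw": [float(x) for x in raw],
    "log1p": [math.log1p(x) for x in raw],
    "sqrt": [math.sqrt(x) for x in raw],  # stand-in; not necessarily sqrt_1 or sqrt_2
}


def summarize(xs):
    """Return shape and spread metrics for a list of numbers."""
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)  # population standard deviation
    return {
        "skewness": sum((x - mean) ** 3 for x in xs) / (n * sd ** 3),
        "kurtosis": sum((x - mean) ** 4 for x in xs) / (n * sd ** 4) - 3.0,
        "mad": sum(abs(x - mean) for x in xs) / n,  # mean absolute deviation
        "cv": sd / mean,
    }


summary = {name: summarize(xs) for name, xs in transformed.items()}
for name, metrics in summary.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

On this toy vector the skewness drops from the raw values to the square root and drops further with `log1p`, which is the pattern the histograms and Q-Q plots in the report are meant to reveal on real data.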

In [9]:
xd.compare_transformations(
    transformation_methods=["log1p", "sqrt_1", "sqrt_2", "pearson_residuals", "sctransform"],
    output_path="C:/Users/Aitana/normalization_results.html"
)


Comparing transformations for the main modality (cells.matrix)...
Store raw counts in anndata.layers['counts']...
Applying transformation: log1p
Applying transformation: sqrt_1
Applying transformation: sqrt_2
Applying transformation: pearson_residuals
Applying transformation: sctransform

    an issue that caused a segfault when used with rpy2:
    https://github.com/rstudio/reticulate/pull/1188
    Make sure that you use a version of that package that includes
    the fix.
    

R[write to console]: Running SCTransform on assay: RNA

R[write to console]: Running SCTransform on layer: counts

R[write to console]: vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.

R[write to console]: Variance stabilizing transformation of count matrix of size 313 by 163565

R[write to console]: Model formula is y ~ log_umi

R[write to console]: Get Negative Binomial regression parameters per gene

R[write to console]: Using 313 genes, 5000 cells

R[write to console]: Found 2 outliers - those will be ignored in fitting/regularization step


R[write to console]: Second step: Get residuals using fitted parameters for 313 genes

R[write to console]: Computing corrected count matrix for 313 genes

R[write to console]: Calculating gene attributes

R[write to console]: Wall clock passed: Time difference of 26.29784 secs

R[write to console]: Determine variable features

R[write to console]: Centering data matrix


Processing log1p...




Processing sqrt_1...
Processing sqrt_2...
Processing pearson_residuals...
Processing sctransform...
HTML report created and saved as 'C:/Users/Aitana/normalization_results.html'


{'main':                    skewness  kurtosis        mad        cv  shapiro_stat  \
 log1p             -0.776917  1.356973  10.381478  0.160502      0.966180   
 sqrt_1            -1.032788  2.290947   4.716037  0.010105      0.947566   
 sqrt_2            -0.412818  0.632775  14.523841  0.165863      0.987594   
 pearson_residuals  0.304709 -0.310506  40.606468  7.745154      0.974914   
 sctransform        0.006686 -0.180280  11.170378  0.201272      0.999263   
 
                        shapiro_p  anderson_stat   ks_stat           ks_p  
 log1p               1.523941e-93    1239.048590  0.061551   0.000000e+00  
 sqrt_1             6.800745e-105    1695.562107  0.071159   0.000000e+00  
 sqrt_2              4.520086e-70     515.625304  0.039719  1.200044e-224  
 pearson_residuals   3.439438e-86    1268.814848  0.075447   0.000000e+00  
 sctransform         1.482991e-22      12.842406  0.006764   6.294840e-07  }
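One way to read this table is to rank the methods by how close each metric is to its value under a normal distribution. The sketch below copies three columns from the output above into a plain dictionary and sorts by absolute value. The ranking criterion (closer to zero is better for skewness, excess kurtosis, and the KS statistic) is an assumption made for illustration and may differ from the rule the report uses for its green highlighting.

```python
# Values copied from the table above (the 'main' entry of the returned dict).
results = {
    "log1p": {"skewness": -0.776917, "kurtosis": 1.356973, "ks_stat": 0.061551},
    "sqrt_1": {"skewness": -1.032788, "kurtosis": 2.290947, "ks_stat": 0.071159},
    "sqrt_2": {"skewness": -0.412818, "kurtosis": 0.632775, "ks_stat": 0.039719},
    "pearson_residuals": {"skewness": 0.304709, "kurtosis": -0.310506, "ks_stat": 0.075447},
    "sctransform": {"skewness": 0.006686, "kurtosis": -0.180280, "ks_stat": 0.006764},
}


def rank_by_abs(metric):
    """Order methods from closest to zero to farthest for the given metric."""
    return sorted(results, key=lambda method: abs(results[method][metric]))


for metric in ("skewness", "kurtosis", "ks_stat"):
    print(metric, rank_by_abs(metric))
```

Under this criterion `sctransform` ranks first on all three metrics. The p-values are less useful for ranking: with this many observations every test rejects exact normality, so the test statistics are the more informative columns.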