In [1]:
## The following code ensures that all functions and init files are reloaded before each execution.
%load_ext autoreload
%autoreload 2

## Previous steps

Download the example data for demonstration: [01_InSituPy_demo_download_data.ipynb](./01_InSituPy_demo_download_data.ipynb).


## Getting started: Setting Up R Integration in Jupyter Notebook

`sctransform` calls R in the backend. To run it from within a Jupyter Notebook, follow these steps:

---

### 1. Install R

Download and install R from the [CRAN website](https://cran.r-project.org/):

- **Windows**: Choose the Windows installer and follow the installation prompts.
- **macOS**: Download the macOS package and install it.
- **Linux**: Use your package manager (e.g., `sudo apt install r-base` for Ubuntu/Debian).

---

### 2. Install the `rpy2` Python Package

Open your terminal or command prompt and run:

```bash
pip install rpy2
```

Ensure that you're installing it in the same Python environment that your Jupyter Notebook will use.
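
To confirm that `rpy2` landed in the environment the notebook actually uses, a quick check can be run from a notebook cell. This is a minimal sketch using only the Python standard library; the helper name `describe_rpy2_environment` is hypothetical and not part of InSituPy or `rpy2`.

```python
import importlib.util
import sys


def describe_rpy2_environment():
    """Report the active interpreter and whether rpy2 can be imported from it."""
    # find_spec returns None when the top-level package is not installed
    spec = importlib.util.find_spec("rpy2")
    return {"python": sys.executable, "rpy2_found": spec is not None}


info = describe_rpy2_environment()
print(f"Interpreter: {info['python']}")
if info["rpy2_found"]:
    print("rpy2 is importable from this environment.")
else:
    print("rpy2 is NOT installed in this environment.")
```

If the interpreter path printed here differs from the one in which `pip install rpy2` was run, the package was most likely installed into a different environment.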

---


### 3. Add R to the PATH (Windows)

   - Locate R's `bin` directory. The default is usually `C:\Program Files\R\R-x.y.z\bin`, where `x.y.z` is the R version number.
   - Press `Win + X` and select **System**.
   - Click on **Advanced system settings**.
   - In the **System Properties** window, click **Environment Variables**.
   - Under **System variables**, scroll down and select **Path**, then click **Edit**.
   - Click **New** and add the path to R's `bin` directory (e.g., `C:\Program Files\R\R-4.1.0\bin`).
   - Click **OK** to close all windows.
   - Restart Visual Studio (or whichever application runs the notebook) so that the updated PATH is picked up.
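
After restarting, it can be worth verifying that the R executable is actually visible on the PATH. The snippet below is a small sketch based on `shutil.which` from the standard library; the helper name `find_r_executable` is an illustrative assumption.

```python
import shutil


def find_r_executable(names=("R", "Rscript")):
    """Return the full path of the first R executable found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


r_path = find_r_executable()
if r_path:
    print(f"R found at: {r_path}")
else:
    print("R is not on PATH; re-check the PATH entry and restart the application.")
```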

### 4. Install `Seurat` in R

1. **Open R or RStudio**: Launch your R or RStudio application.

2. **Install `Seurat` from CRAN**:
   In the R console, run the following command:
   ```R
   install.packages("Seurat")
   ```

### Setup R environment

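To prevent the kernel from crashing when `anndata2ri` is imported, set the `R_HOME` environment variable to your R installation directory before importing any R-dependent package. Adjust the path below to match your installed R version.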

In [None]:
import os
# raw string so that the backslashes are not interpreted as escape sequences
os.environ['R_HOME'] = r'C:\Program Files\R\R-4.4.1'
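
Instead of hard-coding the path, the R home directory can often be detected automatically. The following is a sketch, assuming the `R` launcher is on the PATH: it relies on the `R RHOME` command, which prints R's home directory. The helper name `detect_r_home` is hypothetical; if detection fails, the manual assignment above remains the fallback.

```python
import os
import shutil
import subprocess


def detect_r_home():
    """Ask the R launcher for its home directory; return None if unavailable."""
    r_exe = shutil.which("R")
    if r_exe is None:
        return None
    try:
        result = subprocess.run(
            [r_exe, "RHOME"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    home = result.stdout.strip()
    # only accept the answer if it points to an existing directory
    return home if home and os.path.isdir(home) else None


r_home = detect_r_home()
if r_home is not None:
    # setdefault keeps any R_HOME value that was already set explicitly
    os.environ.setdefault("R_HOME", r_home)
print("R_HOME:", os.environ.get("R_HOME", "<not set>"))
```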

### Import packages

In [None]:
import os
import shutil
from pathlib import Path

import scanpy as sc
from insitupy import read_xenium
from insitupy.datasets.download import download_url

## Apply `sctransform`

In [4]:
# prepare paths
out_dir = Path("demo_dataset") # output directory
data_dir = out_dir / "output-XETG00000__0001879__Replicate 1" # directory of xenium data
image_dir = out_dir / "unregistered_images" # directory of images

In [5]:
xd = read_xenium(data_dir)

In [6]:
xd

[1m[31mInSituData[0m
[1mMethod:[0m		Xenium
[1mSlide ID:[0m	0001879
[1mSample ID:[0m	Replicate 1
[1mPath:[0m		C:\Users\ge37voy\Github\InSituPy\notebooks\demo_dataset\output-XETG00000__0001879__Replicate 1
[1mMetadata file:[0m	experiment_modified.xenium

In [7]:
# read all data modalities at once
xd.load_all()

# alternatively, it is also possible to read each modality separately
# xd.load_cells()
# xd.load_images()
# xd.load_transcripts()
# xd.read_annotations()

Loading annotations...
No `annotations` modality found.
Loading cells...
Loading images...
Loading regions...
No `regions` modality found.
Loading transcripts...


In [8]:
xd

[1m[31mInSituData[0m
[1mMethod:[0m		Xenium
[1mSlide ID:[0m	0001879
[1mSample ID:[0m	Replicate 1
[1mPath:[0m		C:\Users\ge37voy\Github\InSituPy\notebooks\demo_dataset\output-XETG00000__0001879__Replicate 1
[1mMetadata file:[0m	experiment_modified.xenium
    ➤ [34m[1mimages[0m
       [1mnuclei:[0m	(25778, 35416)
       [1mCD20:[0m	(25778, 35416)
       [1mHER2:[0m	(25778, 35416)
       [1mDAPI:[0m	(25778, 35416)
       [1mHE:[0m	(25778, 35416, 3)
    ➤[32m[1m cells[0m
       [1mmatrix[0m
           AnnData object with n_obs × n_vars = 167780 × 313
           obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area'
           var: 'gene_ids', 'feature_types', 'genome'
           obsm: 'spatial'
           varm: 'binned_expression'
       [1mboundaries[0m
           BoundariesData object with 2 entries:
               [1mnuclear[0m
               [1mcellular[0m
    ➤[95m[1m transcripts[0m

### Filtering 

In [9]:
sc.pp.filter_cells(xd.cells.matrix, min_genes=10)
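
To make the effect of `min_genes` concrete, the toy example below mimics the filter in plain Python: a cell is kept only if at least `min_genes` genes have a count greater than zero. This is an illustrative sketch of the criterion, not scanpy's implementation; the function name and the toy matrix are made up for demonstration.

```python
def filter_cells_min_genes(counts, min_genes):
    """Keep rows (cells) where at least `min_genes` columns (genes) have a count > 0."""
    # number of detected genes per cell
    n_genes = [sum(1 for value in row if value > 0) for row in counts]
    kept_rows = [row for row, n in zip(counts, n_genes) if n >= min_genes]
    return kept_rows, n_genes


toy_counts = [
    [0, 3, 1, 0],  # 2 genes detected
    [5, 0, 0, 0],  # 1 gene detected
    [1, 1, 2, 4],  # 4 genes detected
]
kept, n_genes = filter_cells_min_genes(toy_counts, min_genes=2)
print(n_genes)    # [2, 1, 4]
print(len(kept))  # 2
```

In the notebook run shown here, the same criterion with `min_genes=10` reduces the matrix from 167780 to 163565 cells and adds an `n_genes` column to `.obs`.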

#### Applying the `sctransform` method to the InSituData object

In [11]:
xd.sctransform()

Applying SCTransform to the main modality (cells.matrix)...


R[write to console]: In addition: 
R[write to console]: In (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE,  :
R[write to console]:  library 'C:/Users/ge37voy/AppData/Local/R/win-library/4.4' contains no packages




    an issue that caused a segfault when used with rpy2:
    https://github.com/rstudio/reticulate/pull/1188
    Make sure that you use a version of that package that includes
    the fix.
    

R[write to console]: Running SCTransform on assay: RNA

R[write to console]: Running SCTransform on layer: counts

R[write to console]: vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.

R[write to console]: `vst.flavor` is set to 'v2' but could not find glmGamPoi installed.
Please install the glmGamPoi package for much faster estimation.
--------------------------------------------
install.packages('BiocManager')
BiocManager::install('glmGamPoi')
--------------------------------------------
Falling back to native (slower) implementation.


R[write to console]: Variance stabilizing transformation of count matrix of size 313 by 163565

R[write to console]: Model formula is y ~ log_umi

R[write to console]: Get Negative Binomial regression parameters per gene

R[write to console]: Using 313 genes, 5000 cells

R[write to console]: Second step: Get residuals using fitted parameters for 313 genes

R[write to console]: Computing corrected count matrix for 313 gen

No alternative modalities found.
SCTransform completed for all modalities.


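If we now access the AnnData object in `.cells.matrix`, the layer `"norm_counts"` in `.layers` contains the sctransform-normalized counts, while the layer `"counts"` contains the raw data. The matrix now holds 163565 cells instead of the original 167780, reflecting the filtering step above, which also added the `n_genes` column to `.obs`.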

In [12]:
xd.cells.matrix

AnnData object with n_obs × n_vars = 163565 × 313
    obs: 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'n_genes'
    var: 'gene_ids', 'feature_types', 'genome'
    obsm: 'spatial'
    varm: 'binned_expression'
    layers: 'norm_counts', 'counts'