## Notebook for transferring labels from a healthy reference to cancer cells using `scVI` and `scANVI`

- **Developed by**: Anna Maguza
- **Institute of Computational Biology - Computational Health Centre - Helmholtz Munich**
- 24th May 2023

### Load required modules

In [29]:
import sys
import scvi
import torch
import anndata
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import seaborn as sns

import numpy as np
import scipy as sp
import pandas as pd
import scanpy as sc
import numpy.random as random

from umap import UMAP
import warnings

# Silence library warnings to keep the notebook output readable
warnings.simplefilter('ignore')

In [30]:
%matplotlib inline
matplotlib.rcParams["pdf.fonttype"] = 42
matplotlib.rcParams["ps.fonttype"] = 42

In [31]:
torch.cuda.is_available()

False

In [32]:
# Allow faster, lower-precision float32 matrix multiplications where the hardware supports it
torch.set_float32_matmul_precision('medium')

In [33]:
sc.settings.verbosity = 3
sc.logging.print_versions()
sc.settings.set_figure_params(dpi = 180, color_map = 'magma_r', dpi_save = 300, vector_friendly = True, format = 'svg')

-----
anndata     0.8.0
scanpy      1.9.3
-----
PIL                         9.4.0
absl                        NA
appnope                     0.1.2
asttokens                   NA
attr                        22.2.0
backcall                    0.2.0
beta_ufunc                  NA
binom_ufunc                 NA
brotli                      NA
certifi                     2022.12.07
cffi                        1.15.1
charset_normalizer          2.1.1
chex                        0.1.6
colorama                    0.4.6
comm                        0.1.2
contextlib2                 NA
cycler                      0.10.0
cython_runtime              NA
dateutil                    2.8.2
debugpy                     1.5.1
decorator                   5.1.1
defusedxml                  0.7.1
docrep                      0.3.2
entrypoints                 0.4
executing                   0.8.3
flax                        0.6.1
fsspec                      2023.3.0
h5py                        3.8.0
hypergeom_uf

In [34]:
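# Architecture settings recommended for scArches-style reference mapping
# (defined here but not passed to the models trained below)
arches_params = dict(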
    use_layer_norm = "both",
    use_batch_norm = "none",
    encode_covariates = True,
    dropout_rate = 0.2,
    n_layers = 2,
)

In [35]:
def X_is_raw(adata):
    """Heuristic check that adata.X holds raw counts: every gene's column sum is a whole number."""
    return np.array_equal(adata.X.sum(axis=0).astype(int), adata.X.sum(axis=0))
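`X_is_raw` only checks that each gene's column sum is a whole number, so it is a heuristic rather than a proof that `adata.X` holds counts. The sketch below restates the same column-sum test on a bare dense matrix and contrasts it with a stricter per-entry check; both helper names and the toy matrices are hypothetical and not part of the notebook. For a SciPy sparse matrix, the per-entry test would be applied to its `.data` array instead.

```python
import numpy as np


def column_sums_are_integer(X):
    """Same column-sum test as X_is_raw, applied to a bare matrix (hypothetical helper)."""
    col_sums = np.asarray(X.sum(axis=0)).ravel()
    return bool(np.array_equal(col_sums.astype(int), col_sums))


def is_integer_matrix(X):
    """Hypothetical stricter check: every entry is a whole number."""
    values = np.asarray(X, dtype=float)
    return bool(np.all(values == np.round(values)))


counts = np.array([[0.0, 2.0], [3.0, 1.0]])   # raw-count-like toy matrix
log_norm = np.log1p(counts)                   # typical log-normalised layer
tricky = np.array([[0.5, 1.0], [0.5, 2.0]])   # fractions whose columns sum to integers

for name, X in [("counts", counts), ("log_norm", log_norm), ("tricky", tricky)]:
    print(name, column_sums_are_integer(X), is_integer_matrix(X))
```

The `tricky` matrix passes the column-sum test but fails the per-entry test, which is the case the heuristic cannot detect.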

### Read in datasets

In [36]:
input_healthy = '/Users/anna.maguza/Desktop/Data/Processed_datasets/Cancer_dataset_integration/input_files/Epithelial_cells/Geosketch_subset/Joanito/Healthy_epithelial_cells_Geosketch_subset_2000_HVGs.h5ad'
Healthy_adata = sc.read(input_healthy)

In [37]:
input_cancer = '/Users/anna.maguza/Desktop/Data/Processed_datasets/Cancer_dataset_integration/input_files/Epithelial_cells/Geosketch_subset/Joanito/Joanito_cancer_epithelial_cells_2000_HVGs.h5ad'
Cancer_adata = sc.read(input_cancer)

In [38]:
# Reference cells keep their annotation; query (cancer) cells are marked as unlabelled
Healthy_adata.obs['seed_labels'] = Healthy_adata.obs['Unified Cell States']
Cancer_adata.obs['seed_labels'] = 'Unknown'

In [39]:
# Concatenate reference and query
adata = Healthy_adata.concatenate(Cancer_adata, batch_key = 'dataset', batch_categories = ['Healthy', 'Cancer'])
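The seed-label scheme is what makes the transfer semi-supervised: reference cells carry their `Unified Cell States` annotation, query cells carry the placeholder `'Unknown'`, and scANVI later treats every `'Unknown'` cell as unlabelled. The pandas sketch below mimics the `obs` table that `concatenate` produces, using invented barcodes and labels; the `-Healthy`/`-Cancer` suffixes reflect the default `index_unique='-'` of `AnnData.concatenate`. Newer anndata releases deprecate `AnnData.concatenate` in favour of `anndata.concat`.

```python
import pandas as pd

# Toy obs tables; barcodes and labels are invented for illustration.
healthy_obs = pd.DataFrame(
    {"Unified Cell States": ["Colonocyte", "TA", "Enterocyte"]},
    index=["AAAC", "AAAG", "AAAT"],
)
cancer_obs = pd.DataFrame(index=["CCCA", "CCCG"])

# Reference cells keep their annotation; query cells get the placeholder.
healthy_obs["seed_labels"] = healthy_obs["Unified Cell States"]
cancer_obs["seed_labels"] = "Unknown"

# Emulate concatenate(batch_key="dataset", batch_categories=[...]) with the
# default index_unique="-": each obs name gets a "-<category>" suffix.
parts = []
for category, table in [("Healthy", healthy_obs), ("Cancer", cancer_obs)]:
    part = table[["seed_labels"]].copy()
    part["dataset"] = category
    part.index = part.index + "-" + category
    parts.append(part)
obs = pd.concat(parts)

# Cells labelled "Unknown" are the ones scANVI will treat as unlabelled.
is_labelled = obs["seed_labels"] != "Unknown"
print(obs)
print(is_labelled.value_counts())
```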

In [40]:
del Healthy_adata, Cancer_adata

### Train the `scVI` base model

In [41]:
# Register the batch (sample) and seed-label columns with scvi-tools
scvi.model.SCVI.setup_anndata(adata, batch_key = 'Sample_ID', labels_key = "seed_labels")

In [42]:
scvi_model = scvi.model.SCVI(adata, n_latent = 50, n_layers = 3, dispersion = 'gene-batch', gene_likelihood = 'nb')
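With `gene_likelihood = 'nb'`, scVI models each count with a negative binomial in mean/inverse-dispersion form, and `dispersion = 'gene-batch'` gives every gene a separate inverse dispersion θ for each batch (here, each `Sample_ID`). The NumPy sketch below writes out that log-likelihood and looks up θ for one cell's batch in a toy matrix; the shapes, values, and the helper name `nb_log_likelihood` are illustrative assumptions, not scvi-tools internals.

```python
import math

import numpy as np

lgamma = np.vectorize(math.lgamma)


def nb_log_likelihood(x, mu, theta):
    """Log-pmf of a negative binomial with mean mu and inverse dispersion theta (hypothetical helper)."""
    x, mu, theta = (np.asarray(a, dtype=float) for a in (x, mu, theta))
    log_theta_mu = np.log(theta + mu)
    return (
        lgamma(x + theta) - lgamma(theta) - lgamma(x + 1.0)
        + theta * (np.log(theta) - log_theta_mu)
        + x * (np.log(mu) - log_theta_mu)
    )


# Toy 'gene-batch' dispersion: 3 genes x 2 batches, one theta per pair.
theta = np.array([[2.0, 5.0],
                  [0.5, 0.8],
                  [10.0, 12.0]])
mu = np.array([1.5, 4.0, 0.3])   # decoder mean for one cell
x = np.array([2, 0, 1])          # observed counts for that cell
batch_index = 1                  # the cell's batch selects a column of theta

ll = nb_log_likelihood(x, mu, theta[:, batch_index]).sum()
print(ll)

# Sanity check: the pmf sums to ~1 over a long range of counts.
support = np.arange(0, 500)
print(np.exp(nb_log_likelihood(support, 4.0, 0.8)).sum())
```

Under this parametrisation the variance is μ + μ²/θ, so a small θ means a more overdispersed gene in that batch.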

In [43]:
scvi_model.train()

GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Epoch 128/128: 100%|██████████| 128/128 [18:39<00:00,  8.49s/it, loss=444, v_num=1]

`Trainer.fit` stopped: `max_epochs=128` reached.


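`train()` was called without `max_epochs`, so the 128 epochs in the log come from a default heuristic. Assuming the heuristic used by scvi-tools releases of this period, min(round(20000 / n_cells × 400), 400), and taking n_cells as the total of the prediction counts reported further down (62,565 cells), the arithmetic reproduces the logged value. The function name below is hypothetical.

```python
def default_max_epochs(n_cells: int) -> int:
    """Epoch heuristic assumed to match the scvi-tools default used here (hypothetical name)."""
    return int(min(round((20000 / n_cells) * 400), 400))


# Cell total = sum of the C_scANVI value_counts in this notebook.
n_cells = 18895 + 17603 + 9744 + 7474 + 6599 + 2250
print(n_cells, default_max_epochs(n_cells))  # 62565 128
```

Passing `max_epochs` explicitly to `train()` bypasses the heuristic.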
In [44]:
adata.obsm["X_scVI"] = scvi_model.get_latent_representation()
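Because `X_scVI` embeds reference and query cells in one shared space, it also supports a simple baseline that is not part of this notebook: a k-nearest-neighbour majority vote that gives each query cell the most common label among its closest reference cells. The NumPy sketch below uses a random toy embedding and invented labels; `knn_label_transfer` is a hypothetical helper, and in practice its inputs would be rows of `adata.obsm['X_scVI']` split by `adata.obs['dataset']`.

```python
from collections import Counter

import numpy as np


def knn_label_transfer(ref_latent, ref_labels, query_latent, k=5):
    """Hypothetical helper: majority label among the k nearest reference cells."""
    # Squared Euclidean distances, shape (n_query, n_ref).
    d2 = (
        (query_latent ** 2).sum(axis=1)[:, None]
        - 2.0 * query_latent @ ref_latent.T
        + (ref_latent ** 2).sum(axis=1)[None, :]
    )
    nearest = np.argsort(d2, axis=1)[:, :k]
    return [Counter(ref_labels[row]).most_common(1)[0][0] for row in nearest]


rng = np.random.default_rng(0)
# Two well-separated toy clusters in a 10-dimensional latent space.
ref_latent = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 10)),
    rng.normal(3.0, 0.1, size=(20, 10)),
])
ref_labels = np.array(["Colonocyte"] * 20 + ["TA"] * 20)
query_latent = np.vstack([
    rng.normal(0.0, 0.1, size=(3, 10)),
    rng.normal(3.0, 0.1, size=(2, 10)),
])

print(knn_label_transfer(ref_latent, ref_labels, query_latent, k=5))
```

Comparing such a baseline with the scANVI predictions can help flag cells where the two methods disagree.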

### Label transfer with `scANVI` 

In [45]:
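# Initialise scANVI from the trained scVI model; cells labelled 'Unknown' are treated as unlabelled
scanvi_model = scvi.model.SCANVI.from_scvi_model(scvi_model, unlabeled_category = 'Unknown')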

In [46]:
scanvi_model.train()

[34mINFO    [0m Training for [1;36m10[0m epochs.                                                                                   


GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Epoch 10/10: 100%|██████████| 10/10 [03:18<00:00, 19.86s/it, loss=574, v_num=1]

`Trainer.fit` stopped: `max_epochs=10` reached.

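The 10 epochs logged above also appear to follow from a default heuristic. Assuming the scvi-tools behaviour of this period, a scANVI model built with `from_scvi_model` and trained without `max_epochs` fine-tunes for min(10, max(2, round(E / 3))) epochs, where E is the scVI epoch count from the cell-number formula. Both function names below are hypothetical.

```python
def default_scvi_epochs(n_cells: int) -> int:
    """scVI epoch heuristic assumed from the scvi-tools default (hypothetical name)."""
    return int(min(round((20000 / n_cells) * 400), 400))


def default_scanvi_finetune_epochs(n_cells: int) -> int:
    """Assumed scANVI default after from_scvi_model (hypothetical name)."""
    base = default_scvi_epochs(n_cells)
    return int(min(10, max(2, round(base / 3.0))))


n_cells = 62565  # total cells in the concatenated object
print(default_scanvi_finetune_epochs(n_cells))  # 10
```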

In [47]:
adata.obs["C_scANVI"] = scanvi_model.predict(adata)

In [48]:
adata.obs["C_scANVI"].value_counts()    

Paneth cells             18895
Colonocyte               17603
Enterocyte                9744
TA                        7474
Stem cells OLFM4          6599
Stem cells OLFM4 LGR5     2250
Name: C_scANVI, dtype: int64
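These counts pool reference and query cells, so they mix predictions for the healthy cells, whose seed labels were seen during training, with the cancer cells of interest. The pandas sketch below shows two follow-ups on an invented toy `obs` table: restricting the tally to `dataset == 'Cancer'`, and, assuming per-class probabilities such as those returned by `scanvi_model.predict(adata, soft=True)`, flagging low-confidence assignments with an arbitrary 0.6 threshold.

```python
import pandas as pd

# Toy stand-ins for adata.obs and soft predictions; all values are invented.
obs = pd.DataFrame(
    {
        "dataset": ["Healthy", "Healthy", "Cancer", "Cancer", "Cancer"],
        "C_scANVI": ["Colonocyte", "TA", "TA", "Colonocyte", "TA"],
    },
    index=["c1", "c2", "c3", "c4", "c5"],
)
probs = pd.DataFrame(
    {"Colonocyte": [0.9, 0.2, 0.3, 0.55, 0.1], "TA": [0.1, 0.8, 0.7, 0.45, 0.9]},
    index=obs.index,
)

# Tally predictions for the query (cancer) cells only.
print(obs.loc[obs["dataset"] == "Cancer", "C_scANVI"].value_counts())

# Confidence = probability of the predicted class; flag uncertain calls.
obs["C_scANVI_prob"] = probs.max(axis=1)
obs["low_confidence"] = obs["C_scANVI_prob"] < 0.6  # threshold is arbitrary
print(pd.crosstab(obs["dataset"], obs["C_scANVI"]))
```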

In [49]:
adata.obsm["X_scANVI"] = scanvi_model.get_latent_representation(adata)

In [50]:
# Save the output
adata.write('/Users/anna.maguza/Desktop/Data/Processed_datasets/Cancer_dataset_integration/output/Epithelial/Joanito_predicted_labels_with_scVI_scANVI_2000HVGs.h5ad')