<a href="https://colab.research.google.com/github/Ann-Holmes/CellAssign_bugs/blob/main/cellassign_bug_reproduce2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Annotation with CellAssign

## Prepare environment and data

In [1]:
!pip install --quiet scvi-colab
from scvi_colab import install
install()

INFO     scvi-colab: Installing scvi-tools.
INFO     scvi-colab: Install successful. Testing import.


Global seed set to 0


In [2]:
import gdown
url = 'https://drive.google.com/uc?id=10l6m2KKKioCZnQlRHomheappHh-jTFmx'
output = 'sce_follicular_annotated_final.h5ad'
gdown.download(url, output, quiet=False)

url = 'https://drive.google.com/uc?id=1tJSOI9ve0i78WmszMLx2ul8F8tGycBTd'
output = 'FL_celltype.csv'
gdown.download(url, output, quiet=False)

Downloading...
From: https://drive.google.com/uc?id=10l6m2KKKioCZnQlRHomheappHh-jTFmx
To: /content/sce_follicular_annotated_final.h5ad
100%|██████████| 83.0M/83.0M [00:01<00:00, 65.4MB/s]
Downloading...
From: https://drive.google.com/uc?id=1tJSOI9ve0i78WmszMLx2ul8F8tGycBTd
To: /content/FL_celltype.csv
100%|██████████| 446/446 [00:00<00:00, 990kB/s]


'FL_celltype.csv'

In [3]:
import scvi
import scanpy as sc
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from scvi.external import CellAssign

sc.set_figure_params(figsize=(4, 4))

%config InlineBackend.print_figure_kwargs={'facecolor' : "w"}
%config InlineBackend.figure_format='retina'

Load the follicular lymphoma data and the marker gene matrix.

In [4]:
adata = sc.read("sce_follicular_annotated_final.h5ad")
adata.var_names_make_unique()
adata.obs_names_make_unique()

  utils.warn_names_duplicates("obs")
  utils.warn_names_duplicates("var")


In [5]:
marker_gene_mat = pd.read_csv('FL_celltype.csv', index_col=0)
marker_gene_mat

Gene,B cells,Cytotoxic T cells,CD4 T cells,Tfh,other
CCL5,0,1,0,0,0
CD19,1,0,0,0,0
CD2,0,1,1,1,0
CD3D,0,1,1,1,0
CD3E,0,1,1,1,0
CD3G,0,1,1,1,0
CD4,0,0,1,1,0
CD74,1,0,0,0,0
CD79A,1,0,0,0,0
CD79B,1,0,0,0,0


The AnnData object and the cell type marker matrix should contain the same genes, so we subset `adata` to only the genes listed in `marker_gene_mat`.

In [6]:
bdata = adata[:, marker_gene_mat.index].copy()
bdata

AnnData object with n_obs × n_vars = 9156 × 24
    obs: 'Sample', 'dataset', 'patient', 'timepoint', 'progression_status', 'patient_progression', 'sample_barcode', 'is_cell_control', 'total_features_by_counts', 'log10_total_features_by_counts', 'total_counts', 'log10_total_counts', 'pct_counts_in_top_50_features', 'pct_counts_in_top_100_features', 'pct_counts_in_top_200_features', 'pct_counts_in_top_500_features', 'total_features_by_counts_endogenous', 'log10_total_features_by_counts_endogenous', 'total_counts_endogenous', 'log10_total_counts_endogenous', 'pct_counts_endogenous', 'pct_counts_in_top_50_features_endogenous', 'pct_counts_in_top_100_features_endogenous', 'pct_counts_in_top_200_features_endogenous', 'pct_counts_in_top_500_features_endogenous', 'total_features_by_counts_feature_control', 'log10_total_features_by_counts_feature_control', 'total_counts_feature_control', 'log10_total_counts_feature_control', 'pct_counts_feature_control', 'pct_counts_in_top_50_features_feature_c
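The subsetting step above relies on every gene in `marker_gene_mat.index` being present in `adata.var_names`; a missing gene is expected to make the column lookup fail. A minimal sketch of a pre-check, using small hypothetical gene lists in place of the real `adata.var_names` and `marker_gene_mat.index`:

```python
import pandas as pd

# Hypothetical stand-ins for adata.var_names and marker_gene_mat.index.
var_names = pd.Index(["CD19", "CD2", "CD3D", "CD4", "GAPDH"])
marker_genes = pd.Index(["CD19", "CD2", "CD4", "NOT_A_GENE"])

# Marker genes that are absent from the expression data.
missing = marker_genes.difference(var_names)

# Marker genes that are present, kept in marker-matrix order.
shared = marker_genes[marker_genes.isin(var_names)]

print("missing:", list(missing))
print("shared:", list(shared))
```

If `missing` is non-empty, one option is to subset both the AnnData object and the marker matrix to `shared` so the two stay aligned.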

## Run CellAssign with `batch_key="patient"` and `categorical_covariate_keys=None`

**No Errors**

In [None]:
scvi.external.CellAssign.setup_anndata(bdata, size_factor_key="size_factor", batch_key="patient", categorical_covariate_keys=None)
model = CellAssign(bdata, marker_gene_mat)
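The `setup_anndata` call above assumes that `bdata.obs` already contains a `size_factor` column. If it did not, one common convention is to divide each cell's library size by the mean library size. The sketch below shows that arithmetic on a small hypothetical count matrix; it is an illustration, not necessarily how the column in this dataset was produced:

```python
import numpy as np

# Hypothetical raw counts: 4 cells (rows) x 3 genes (columns).
counts = np.array(
    [
        [1, 0, 3],
        [2, 2, 0],
        [0, 5, 3],
        [4, 4, 4],
    ],
    dtype=float,
)

# Library size is the total count per cell.
lib_size = counts.sum(axis=1)

# Scale so that the size factors average to 1 across cells.
size_factor = lib_size / lib_size.mean()

print(size_factor)
```

With a real AnnData object the result would be stored as `adata.obs["size_factor"]`, computed on the full gene set before subsetting to marker genes.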

In [None]:
model.train()

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
  f"The `Callback.{hook}` hook was deprecated in v1.6 and"
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 400/400: 100%|██████████| 400/400 [00:57<00:00,  6.92it/s, loss=19.4, v_num=1]


## Run CellAssign with `batch_key=None` and `categorical_covariate_keys=["patient"]`

**RuntimeError**

In [None]:
scvi.external.CellAssign.setup_anndata(bdata, size_factor_key="size_factor", batch_key=None, categorical_covariate_keys=["patient"])
model = CellAssign(bdata, marker_gene_mat)

In [None]:
model.train()

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
  f"The `Callback.{hook}` hook was deprecated in v1.6 and"
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 1/400:   0%|          | 0/400 [00:00<?, ?it/s]

RuntimeError: ignored
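The output above shows only the exception type, not its message or stack. When reproducing a bug like this, it can help to capture the full traceback as text so it can be pasted into an issue report. A minimal sketch using the standard library; `train_stub` and its message are hypothetical placeholders for `model.train()` and the real error, which is not shown here:

```python
import traceback


def train_stub():
    # Hypothetical stand-in for model.train(); the real message is unknown.
    raise RuntimeError("placeholder message")


try:
    train_stub()
except RuntimeError:
    # format_exc() returns the traceback of the exception being handled.
    tb_text = traceback.format_exc()
    print(tb_text)
```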

## Run CellAssign with `batch_key="patient"` and `categorical_covariate_keys=["timepoint"]`

**RuntimeError**

In [15]:
bdata.obs.head()

Barcode,Sample,dataset,patient,timepoint,progression_status,patient_progression,sample_barcode,is_cell_control,total_features_by_counts,log10_total_features_by_counts,...,all_seurat_0.8_cluster,all_seurat_1.2_cluster,all_sc3_cluster,all_SC3_cluster,all_cluster,all_subset_seurat_cluster,all_subset_seurat_0.8_cluster,all_subset_seurat_1.2_cluster,all_subset_cluster,_scvi_batch
AAACCTGAGCCACGTC-1,/datadrive/data/follicular/FL1018T1/filtered_g...,FL1018T1,FL1018,T1,primary,transformed,FL1018T1_AAACCTGAGCCACGTC-1,0,1300,3.114277,...,5,4,11,11,5,9,7,9,7,0
AAACCTGAGGGCTCTC-1,/datadrive/data/follicular/FL1018T1/filtered_g...,FL1018T1,FL1018,T1,primary,transformed,FL1018T1_AAACCTGAGGGCTCTC-1,0,846,2.927883,...,5,4,11,11,5,13,11,13,11,0
AAACCTGCACGAAGCA-1,/datadrive/data/follicular/FL1018T1/filtered_g...,FL1018T1,FL1018,T1,primary,transformed,FL1018T1_AAACCTGCACGAAGCA-1,0,1184,3.073718,...,5,4,19,19,5,0,1,0,1,0
AAACCTGCAGTTAACC-1,/datadrive/data/follicular/FL1018T1/filtered_g...,FL1018T1,FL1018,T1,primary,transformed,FL1018T1_AAACCTGCAGTTAACC-1,0,2019,3.305351,...,5,4,11,11,5,3,4,3,4,0
AAACCTGCATTCCTGC-1,/datadrive/data/follicular/FL1018T1/filtered_g...,FL1018T1,FL1018,T1,primary,transformed,FL1018T1_AAACCTGCATTCCTGC-1,0,1407,3.148603,...,7,8,27,27,7,6,2,6,2,0


In [12]:
bdata.obs.pivot_table(values="dataset", index="patient", columns="timepoint", aggfunc=np.count_nonzero)

timepoint,T1,T2
patient,Unnamed: 1_level_1,Unnamed: 2_level_1
FL1018,1426,2715
FL2001,2432,2583


In [13]:
scvi.external.CellAssign.setup_anndata(bdata, size_factor_key="size_factor", batch_key="patient", categorical_covariate_keys=["timepoint"])
model = CellAssign(bdata, marker_gene_mat)

In [14]:
model.train()

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
  f"The `Callback.{hook}` hook was deprecated in v1.6 and"
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 1/400:   0%|          | 0/400 [00:00<?, ?it/s]

RuntimeError: ignored