# Run UCE for the IFNB-stimulation data

This Python notebook shows an example of generating Universal Cell Embeddings (UCE) for scRNA-seq count data.

In [1]:
import scanpy as sc
import anndata as ad
import numpy as np
import pandas as pd
import os

## Step 1: create an AnnData object

In this example, we create an AnnData object from a UMI count matrix and its cell metadata.

To create an AnnData object from other data formats, please refer to:

* https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.html
* https://scanpy.readthedocs.io/en/stable/api/reading.html

Make sure that the UMI counts are **NOT** normalized, as UCE requires raw count data as input.
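Because UCE expects raw counts, a quick heuristic check on the expression matrix can catch data that were already normalized or log-transformed. The sketch below is not part of UCE or anndata; `looks_like_raw_counts` is a hypothetical helper, and passing it (all entries non-negative and integer-valued) is a necessary condition for raw counts, not proof.

```python
import numpy as np


def looks_like_raw_counts(X) -> bool:
    """Heuristic: True if every value in X is a non-negative integer.

    X may be a dense NumPy array or a SciPy sparse matrix. For sparse input
    only the stored values (X.data) are inspected, since implicit zeros are
    valid counts anyway.
    """
    # SciPy sparse matrices expose `nnz`; dense arrays do not
    values = np.asarray(X.data if hasattr(X, "nnz") else X).ravel()
    if values.size == 0:
        return True
    non_negative = np.all(values >= 0)
    integer_valued = np.all(np.mod(values, 1) == 0)
    return bool(non_negative and integer_valued)
```

Once `adata` has been created in the next cell, `looks_like_raw_counts(adata.X)` should return `True`; a `False` result suggests the matrix is not raw UMI counts.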

In [2]:
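# Cell metadata: one row per cell, indexed by cell name
ifnb_meta = pd.read_csv('data/ifnb_metadata.csv', index_col=0)
# Raw UMI counts stored as genes (rows) x cells (columns)
ifnb_count = pd.read_csv('data/ifnb_count.csv.gz', index_col=0)
# AnnData expects cells x genes, so transpose the count matrix
adata = ad.AnnData(X=ifnb_count.T, obs=ifnb_meta)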
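The `.T` in the cell above implies that `ifnb_count.csv.gz` stores genes as rows and cells as columns, and it relies on the cell names in the count matrix lining up with the metadata index. A slightly more defensive variant, sketched below with pandas only, makes the orientation explicit and reorders cells to match the metadata. `orient_counts` is a hypothetical helper (not part of anndata or UCE); note that it raises if a metadata cell is missing from the counts but silently drops count columns that have no metadata row.

```python
import pandas as pd


def orient_counts(count_genes_by_cells: pd.DataFrame, meta: pd.DataFrame) -> pd.DataFrame:
    """Return a cells x genes count table whose rows follow meta.index."""
    counts = count_genes_by_cells.T  # rows become cells, columns become genes
    missing = meta.index.difference(counts.index)
    if len(missing) > 0:
        raise ValueError(f"{len(missing)} metadata cells are absent from the count matrix")
    # Reorder (and subset) rows so they match the metadata row by row
    return counts.loc[meta.index]
```

With this helper the AnnData construction would read `adata = ad.AnnData(X=orient_counts(ifnb_count, ifnb_meta), obs=ifnb_meta)`.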

Then we write the AnnData object to disk as an .h5ad file.

In [3]:
adata.write_h5ad('data/ifnb_count.h5ad')
adata

AnnData object with n_obs × n_vars = 13999 × 14053
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'stim', 'seurat_annotations'

## Step 2: run UCE

UCE must be installed for this step. Please refer to the UCE GitHub repository for installation and usage instructions:

* https://github.com/snap-stanford/UCE

After installing UCE, we change the working directory to the UCE directory.

In [4]:
os.chdir('UCE')
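`os.chdir` changes the working directory for everything that runs afterwards, which is why the notebook later switches back with `os.chdir('..')`. In a script, a context manager can restore the previous directory automatically, even if a command fails. On Python 3.11+ the standard library offers `contextlib.chdir` for this; `working_directory` below is a hypothetical equivalent for older versions, shown as a minimal sketch.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Union


@contextmanager
def working_directory(path: Union[str, os.PathLike]) -> Iterator[Path]:
    """Temporarily change the working directory, restoring it on exit."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        # Runs on normal exit and when an exception propagates
        os.chdir(previous)
```

Usage: `with working_directory('UCE'): ...`. Because the change is scoped to a single `with` block, this suits scripts better than a notebook where the work spans several cells.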

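Then we run UCE on **ifnb_count.h5ad**. The arguments set the input file (`--adata_path`), the output directory (`--dir`), the species (`--species`), the model weights (`--model_loc`) and the batch size (`--batch_size`); relative paths are resolved from the UCE directory: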

In [5]:
!python eval_single_anndata.py --adata_path ../data/ifnb_count.h5ad --dir ../data/ --species human --model_loc model_files/4layer_model.torch --batch_size 50

Proccessing ifnb_count
3828.0
ifnb_count (13999, 10744)
Wrote Shapes Dict
10744
Max Code: 612
Loaded model:
model_files/4layer_model.torch
100%|█████████████████████████████████████████| 280/280 [04:12<00:00,  1.11it/s]
*****Wrote Anndata to:*****
../data/ifnb_count_uce_adata.h5ad
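Outside Jupyter the `!python ...` shell escape is not available. The sketch below assembles the same command as an argument list and runs it with `subprocess.run`, passing `cwd=` instead of calling `os.chdir`, so `../data/...` is still resolved relative to the UCE directory. It uses only the flags shown above. The helper names are hypothetical, `sys.executable` assumes UCE's dependencies are installed in the current Python environment, and `expected_output_path` only mirrors the `<input name>_uce_adata.h5ad` naming seen in this run's log, not a documented UCE guarantee.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_uce_command(adata_path: str, out_dir: str, species: str = "human",
                      model_loc: str = "model_files/4layer_model.torch",
                      batch_size: int = 50) -> List[str]:
    """Assemble the eval_single_anndata.py call from the cell above."""
    return [
        sys.executable, "eval_single_anndata.py",
        "--adata_path", adata_path,
        "--dir", out_dir,
        "--species", species,
        "--model_loc", model_loc,
        "--batch_size", str(batch_size),
    ]


def expected_output_path(adata_path: str, out_dir: str) -> Path:
    """Output file name as observed in this run's log (an assumption)."""
    return Path(out_dir) / f"{Path(adata_path).stem}_uce_adata.h5ad"


def run_uce(cmd: List[str], uce_dir: str = "UCE") -> None:
    # cwd= runs UCE from its own directory without changing ours;
    # check=True raises CalledProcessError if UCE exits with an error
    subprocess.run(cmd, cwd=uce_dir, check=True)
```

For this dataset: `run_uce(build_uce_command('../data/ifnb_count.h5ad', '../data/'))`.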


The output is saved as **ifnb_count_uce_adata.h5ad** in the directory passed to `--dir` (here `../data/`).

The UCE embeddings are stored in **adata.obsm['X_uce']**. After changing back to the original working directory, we load the output, inspect it, and save the embeddings to a CSV file.

In [6]:
os.chdir('..')

In [7]:
adata = sc.read_h5ad('data/ifnb_count_uce_adata.h5ad')
adata

AnnData object with n_obs × n_vars = 13999 × 10744
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'stim', 'seurat_annotations', 'n_genes'
    var: 'n_cells'
    obsm: 'X_uce'
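The printed shapes show that the number of genes dropped from 14,053 to 10,744 during the UCE run while the number of cells stayed at 13,999, so the embedding should still have one row per cell. A small sanity check before exporting is sketched below; `check_embedding` is a hypothetical helper that only verifies shape and finiteness, not the quality of the embedding.

```python
import numpy as np


def check_embedding(emb, n_cells: int) -> int:
    """Validate an embedding matrix and return its dimensionality.

    Requires a 2-D matrix with one row per cell and only finite values.
    """
    emb = np.asarray(emb)
    if emb.ndim != 2:
        raise ValueError(f"expected a 2-D matrix, got shape {emb.shape}")
    if emb.shape[0] != n_cells:
        raise ValueError(f"{emb.shape[0]} embedding rows for {n_cells} cells")
    if not np.isfinite(emb).all():
        raise ValueError("embedding contains NaN or infinite values")
    return emb.shape[1]
```

In this notebook the call would be `check_embedding(adata.obsm['X_uce'], adata.n_obs)`.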

In [8]:
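uce_emb = adata.obsm['X_uce']
# One row per cell, one column per embedding dimension (UCE_1, UCE_2, ...)
df_emb = pd.DataFrame(uce_emb, index=adata.obs_names,
                      columns=[f'UCE_{i}' for i in range(1, uce_emb.shape[1] + 1)])
# The .gz suffix makes pandas write a gzip-compressed CSV
df_emb.to_csv('data/ifnb_count_uce_emb.csv.gz')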
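To reuse the embeddings later, for example in another tool, the CSV can be read back with pandas, which infers gzip compression from the `.gz` suffix. The sketch below is a hypothetical loader that also checks the `UCE_1 ... UCE_d` column naming used when the file was written.

```python
import numpy as np
import pandas as pd


def load_uce_embeddings(path: str, prefix: str = "UCE_") -> pd.DataFrame:
    """Read a cells x dimensions embedding CSV indexed by cell name."""
    # compression='infer' (the default) decompresses .gz files automatically
    df = pd.read_csv(path, index_col=0)
    expected = [f"{prefix}{i}" for i in range(1, df.shape[1] + 1)]
    if list(df.columns) != expected:
        raise ValueError(f"unexpected column names; expected {prefix}1 ... {prefix}{df.shape[1]}")
    return df
```

For example, `emb = load_uce_embeddings('data/ifnb_count_uce_emb.csv.gz')` followed by `emb.to_numpy(dtype=np.float32)` gives a plain matrix whose rows follow `emb.index`.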