
dchary/ucdeconvolve


UniCell Deconvolve: Cell Type Deconvolution For Transcriptomic Data


UniCell Deconvolve applied to 10X Genomics Visium Gene Expression Slide of Breast Adenocarcinoma Sample

Background

The amount of publicly available high-dimensional transcriptomic data, whether bulk RNA-seq, single-cell, or spatial, has increased exponentially in recent years. Although these data are available for reanalysis, published datasets are typically reused one at a time to augment new analyses. In particular, cell type deconvolution of bulk or spatial transcriptomic data has been addressed by numerous methods that use publicly available datasets as cell-type-specific references. However, the appropriate reference profile is not always obvious or available, and a mismatch between the reference and the cell types actually present may confound study results.
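For context, the reference-based setting described above is often posed, in simplified form, as non-negative least squares: a bulk (or spot-level) expression vector b over G genes is modeled as a non-negative combination of the K columns of a reference signature matrix S (G × K), and the fitted weights are normalized into fractions. This is a generic illustration of reference-based deconvolution, not a description of how UCD works, and the notation is introduced here only for this example.

```latex
\hat{\mathbf{w}} = \arg\min_{\mathbf{w} \ge 0} \lVert \mathbf{b} - S\mathbf{w} \rVert_2^2,
\qquad
\hat{f}_k = \frac{\hat{w}_k}{\sum_{j=1}^{K} \hat{w}_j}, \quad k = 1, \dots, K
```

Because the estimate depends directly on S, cell types absent from the reference cannot be represented at all, which is one way a reference mismatch can bias the resulting fractions.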

UniCell Deconvolve (UCD) is a pre-trained deep learning model that provides context-free estimates of cell type fractions from whole-transcriptome expression data, whether bulk, single-cell, or spatial. The model is trained on the world's largest fully integrated scRNA-seq training database, comprising 28M+ single cells spanning 840+ cell types from 899 studies to date. Extensive benchmarking shows that UCD compares favorably with reference-based deconvolution tools without requiring any additional training. UCD also demonstrates strong multi-task performance across a range of deconvolution challenges spanning several transcriptomic data modalities, disease types, and tissues.

Nested rectangles visualizing the cell type hierarchy of the 28 million single cells comprising the UCD database to date

API Access

The UCD package lets you integrate UCD predictions directly into any transcriptomics data analysis pipeline through a web-based API. It provides a secure and scalable connection to the latest pre-trained UCD model, which is hosted on Google Cloud Platform and serves deconvolution requests. To access the current alpha build of UCD, please sign up for an early-access API key here. Please allow up to 24 hours to receive a response.

The package also includes preprocessing and visualization utilities and is designed to interface with AnnData (annotated data) objects and scanpy workflows.

Installation

Conda (Recommended)

We recommend installing ucdeconvolve in a virtual environment using tools such as conda or miniconda. We suggest the following installation:

conda create -n ucdenv python=3.8 pytables jupyter jupyterlab
conda activate ucdenv
pip install ucdeconvolve

PIP

UniCell Deconvolve can be installed from PyPI into an existing Python environment. The pytables package is required and may need to be installed separately with a package manager such as conda before installing ucdeconvolve. For detailed installation instructions, see the documentation.

pip install ucdeconvolve
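After either installation route, a quick sanity check is to confirm that both required packages can be found by the interpreter in your active environment. The sketch below uses only the standard library; the helper name missing_modules is hypothetical, and it only checks that modules are discoverable, not that they import without error. Note that the pytables package is imported under the name tables.

```python
import importlib.util

# Import names to check. The "pytables" package is imported as "tables".
REQUIRED_MODULES = ("tables", "ucdeconvolve")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("pytables and ucdeconvolve are both discoverable.")
```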

Documentation

Full documentation with supporting tutorials is available here.

Quick Start Guide

To demonstrate the functionality of UCD, we will perform a cell type deconvolution of a spatial gene expression section of the human lymph node, made available by 10X Genomics. We will utilize scanpy to quickly load the dataset, and then pass it into ucdeconvolve to obtain cell type predictions.

1. Create a New Account

Register

Load the ucdeconvolve package and run the "ucd.api.register()" command as shown below. Follow the instructions by inputting the required information at each step.

import ucdeconvolve as ucd
ucd.api.register()

Activate

After you complete the initial registration form, you will receive an email with an activation code at the address you specified. Either paste the code back into the waiting input prompt to activate your account, or pass it to the function "ucd.api.activate(code)".

ucd.api.activate(code)

Authenticate

Once your account is activated, you will receive an email with your user access token. If you are running ucd.api.register, the token is automatically applied to your current Python session; otherwise, you can authenticate any new Python session with a valid API token using the function "ucd.api.authenticate".

ucd.api.authenticate(token)
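To avoid pasting the access token into notebooks or scripts, one option is to keep it in an environment variable and read it at runtime before calling ucd.api.authenticate. This is a minimal sketch: the variable name UCD_API_TOKEN and the helper load_ucd_token are hypothetical conventions, not part of ucdeconvolve.

```python
import os

# Hypothetical variable name; use whatever name matches your shell configuration.
TOKEN_ENV_VAR = "UCD_API_TOKEN"


def load_ucd_token(env_var: str = TOKEN_ENV_VAR) -> str:
    """Read a UCD API token from an environment variable instead of hard-coding it."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your UCD API token.")
    return token


# Usage in a new Python session (requires ucdeconvolve and a valid token):
#   import ucdeconvolve as ucd
#   ucd.api.authenticate(load_ucd_token())
```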

2. Load Required Packages

import ucdeconvolve as ucd
import scanpy as sc

3. Load the human lymph node dataset

adata = sc.datasets.visium_sge("V1_Human_Lymph_Node")
AnnData object with n_obs × n_vars = 4035 × 36601
    obs: 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'

4. Run UCDBase to predict cell type fractions

ucd.tl.base(adata)

Example Console Output:

2023-04-25 16:27:40,012|[UCD]|INFO: Starting UCDeconvolveBASE Run. | Timer Started.
Preprocessing Dataset | 100% (16 of 16) || Elapsed Time: 0:00:02 Time:  0:00:02
2023-04-25 16:27:43,509|[UCD]|INFO: Uploading Data | Timer Started.
2023-04-25 16:27:49,367|[UCD]|INFO: Upload Complete | Elapsed Time: 5.857 (s)
Waiting For Submission : UNKNOWN | Queue Size : 0 | \ |#| 2 Elapsed Time: 0:00:03
Waiting For Completion | 100% (4035 of 4035) || Elapsed Time: 0:00:45 Time:  0:00:45
2023-04-25 16:28:42,073|[UCD]|INFO: Download Results | Timer Started.
2023-04-25 16:28:42,817|[UCD]|INFO: Download Complete | Elapsed Time: 0.743 (s)
2023-04-25 16:28:43,466|[UCD]|INFO: Run Complete | Elapsed Time: 63.453 (s)

5. Reading and Visualizing Results

Printing the adata object shows what UCD has added. UCD stores the results of each deconvolution run in 'adata.obsm', and stores the column names (i.e., cell types) and run information in 'adata.uns', under the default results stem 'ucdbase'. Depending on whether the split parameter is set to True or False, you will see either a single new entry in 'adata.obsm' or several. By default, split = True, so predictions are split into primary (non-malignant), cell line, and primary cancer (malignant) categories; as shown below, these appear as 'ucdbase_primary', 'ucdbase_lines' and 'ucdbase_cancer' alongside 'ucdbase_raw'.

AnnData object with n_obs × n_vars = 4035 × 36601
    obs: 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial', 'ucdbase'
    obsm: 'spatial', 'ucdbase_cancer', 'ucdbase_lines', 'ucdbase_primary', 'ucdbase_raw'
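To summarize predictions per spot, one might rank cell types by their predicted value. The following dependency-free sketch assumes you have pulled a spots × cell types matrix out of one of the 'adata.obsm' entries (for example 'ucdbase_primary') and the matching column names out of 'adata.uns["ucdbase"]'; the internal layout of 'adata.uns["ucdbase"]' is not shown in the output above, so check the documentation for the actual key. The helper top_cell_types is hypothetical and not part of ucdeconvolve.

```python
from typing import List, Sequence, Tuple


def top_cell_types(
    fractions: Sequence[Sequence[float]],
    cell_types: Sequence[str],
    k: int = 3,
) -> List[List[Tuple[str, float]]]:
    """For each row (spot), return the k highest-scoring (cell type, value) pairs.

    `fractions` is assumed to be row-per-spot (e.g. a 2D NumPy array or a list of lists);
    if you have a pandas DataFrame instead, pass df.to_numpy() and list(df.columns).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    results = []
    for row in fractions:
        if len(row) != len(cell_types):
            raise ValueError("each row must have one value per cell type")
        # Sort (name, value) pairs by value, highest first.
        ranked = sorted(zip(cell_types, row), key=lambda pair: pair[1], reverse=True)
        results.append([(name, float(value)) for name, value in ranked[:k]])
    return results


if __name__ == "__main__":
    # Toy values for illustration only, not real UCD output.
    demo = [[0.1, 0.7, 0.2], [0.5, 0.2, 0.3]]
    names = ["t cell", "germinal center b cell", "macrophage"]
    print(top_cell_types(demo, names, k=2))
```

With NumPy arrays, numpy.argsort along axis 1 would do the same ranking more efficiently; the plain-Python version keeps the example self-contained.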

We can visualize our results by using one of the built-in plotting functions in UCD, which wrap scanpy's plotting API.

ucd.pl.spatial(adata, color = "germinal center b cell")

Predicted germinal center B cell distribution across the lymph node section

About

UniCell Deconvolve - Cloud Cell Type Deconvolution For Transcriptomic Data
