scBridge

This is the official implementation of "scBridge embraces cell heterogeneity in single-cell RNA-seq and ATAC-seq data integration".

Installation

You may create an anaconda environment for scBridge with the following commands:

git clone https://github.com/Yunfan-Li/scBridge.git
cd scBridge
conda env create -f environment.yml
conda activate scBridge

Note: scBridge runs on a single GPU.

Quick Start

Data Preparation

scBridge accepts AnnData (https://anndata.readthedocs.io/en/latest/index.html) objects as input: one h5ad file for the source (scRNA-seq) data and one for the target (scATAC-seq) data. The source data should include the cell type annotation in obs["CellType"]. The integration process neither requires nor uses the annotation of the target data. If the target annotation is provided, the model evaluates the integration performance by computing metrics such as label transfer accuracy and silhouette score. Put the two h5ad files in the same folder (see the PBMC folder for an example).

Note: Please unzip the example data with the gunzip command.
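
Before starting a run, it can help to confirm that the source file carries the required annotation. The following is a minimal sketch, not part of scBridge: the helper names has_cell_type and check_source_file are hypothetical, and the demo object is a stand-in used only so the example runs without any data on disk.

```python
from types import SimpleNamespace


def has_cell_type(adata) -> bool:
    """Return True if the AnnData-like object has an obs["CellType"] column."""
    return "CellType" in adata.obs.columns


def check_source_file(h5ad_path: str) -> bool:
    """Load an h5ad file and check it for the CellType annotation."""
    import anndata  # imported lazily so has_cell_type works without it

    return has_cell_type(anndata.read_h5ad(h5ad_path))


# Stand-in object for illustration only. A real AnnData exposes obs as a
# pandas DataFrame, whose .columns supports the same membership test.
demo = SimpleNamespace(obs=SimpleNamespace(columns=["CellType", "n_genes"]))
print(has_cell_type(demo))
```

With real data, check_source_file would be called with the path to the source h5ad file, for example the CITE-seq file in the PBMC folder once it has been unzipped.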

Data Integration

To perform data integration, run the following command:

python main.py --data_path="PBMC/" --source_data="CITE-seq.h5ad" --target_data="ASAP-seq.h5ad" --umap_plot

Here --data_path is the data folder, and --source_data and --target_data are the file names of the source and target h5ad files inside that folder.
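
When several dataset pairs need to be integrated, the command can be assembled from Python. This is a minimal sketch that uses only the flags shown above; build_command is a hypothetical helper and is not provided by the repository.

```python
import shlex


def build_command(data_path, source_data, target_data, umap_plot=False):
    """Assemble the argument list for main.py from the documented flags."""
    cmd = [
        "python",
        "main.py",
        f"--data_path={data_path}",
        f"--source_data={source_data}",
        f"--target_data={target_data}",
    ]
    if umap_plot:
        cmd.append("--umap_plot")
    return cmd


cmd = build_command("PBMC/", "CITE-seq.h5ad", "ASAP-seq.h5ad", umap_plot=True)
print(shlex.join(cmd))
# To launch the run from the repository root, one could call:
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a folder or file name contains spaces.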

After training, the integrated results are saved as two AnnData files in the same data folder as the inputs.

The integrated source data additionally includes

.obsm["Embedding"] (integrated cell embedding)
.obsm["X_umap"] (umap embedding).

The integrated target data additionally includes

.obsm["Embedding"] (integrated cell embedding)
.obsm["X_umap"] (umap embedding)
.obs["Prediction"] (transferred label)
.obs["Reliability"] (cell reliability).
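
The transferred labels and reliability scores can be combined to inspect only the confidently labeled target cells. The sketch below is an illustration under assumptions: summarize_reliable is a hypothetical helper, the 0.95 cutoff is borrowed from the default of --reliability_threshold, the strict ">" comparison is a guess about how the threshold is applied, and the labels and scores are toy values rather than scBridge output.

```python
from collections import Counter


def summarize_reliable(predictions, reliabilities, threshold=0.95):
    """Count predicted labels among cells whose reliability exceeds threshold."""
    kept = [
        label
        for label, score in zip(predictions, reliabilities)
        if score > threshold
    ]
    return Counter(kept), len(kept)


# Toy values for illustration only.
toy_predictions = ["B cell", "T cell", "T cell", "B cell"]
toy_reliabilities = [0.99, 0.50, 0.97, 0.96]

counts, n_kept = summarize_reliable(toy_predictions, toy_reliabilities)
print(n_kept, dict(counts))
```

With a real result, the two arguments would be adata.obs["Prediction"] and adata.obs["Reliability"] from the integrated target file, loaded with anndata.read_h5ad.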

The UMAP plot colored by cell type, domain, and reliability is saved to the figures folder.

Configs

In addition to --data_path, --source_data, and --target_data which must be provided to perform integration, there are some optional configs, including:

--umap_plot enables umap visualization
--novel_type enables novel type discovery with the structure loss, where most unreliable cells would be predicted as novel type

The following configs influence the integration performance; we recommend leaving them at their defaults:

--source_preprocess=["Standard", "TFIDF"] (Default="Standard") the preprocessing strategy for source data
--target_preprocess=["Standard", "TFIDF"] (Default="TFIDF") the preprocessing strategy for target data
--reliability_threshold=[0.0-1.0] (Default=0.95) threshold for selecting reliable cells
--align_loss_epoch=[int] (Default=1) when to enable alignment loss
--prototype_momentum=[0.0-1.0] (Default=0.9) momentum for prototype update
--early_stop_acc=[0.0-1.0] (Default=0.99) early stop training if source data classification accuracy meets the requirement
--max_iteration=[int] (Default=20) maximum number of integration iterations
--batch_size=[int] (Default=512) size of mini-batch
--train_epoch=[int] (Default=20) training epochs
--learning_rate=[float] (Default=5e-4) learning rate
--random_seed=[int] (Default=2023) random seed
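
For reference, the options listed above can be mirrored in an argparse parser. This is a sketch reconstructed from this README only and is not the parser used in main.py: the argument types, the use of store_true for the two switches, and the helper name build_parser are assumptions.

```python
import argparse


def build_parser():
    """Mirror the documented options and defaults (assumed types)."""
    p = argparse.ArgumentParser(description="Sketch of the documented options")
    # Required inputs
    p.add_argument("--data_path", required=True)
    p.add_argument("--source_data", required=True)
    p.add_argument("--target_data", required=True)
    # Optional switches
    p.add_argument("--umap_plot", action="store_true")
    p.add_argument("--novel_type", action="store_true")
    # Options that are recommended to be left at their defaults
    p.add_argument("--source_preprocess", choices=["Standard", "TFIDF"], default="Standard")
    p.add_argument("--target_preprocess", choices=["Standard", "TFIDF"], default="TFIDF")
    p.add_argument("--reliability_threshold", type=float, default=0.95)
    p.add_argument("--align_loss_epoch", type=int, default=1)
    p.add_argument("--prototype_momentum", type=float, default=0.9)
    p.add_argument("--early_stop_acc", type=float, default=0.99)
    p.add_argument("--max_iteration", type=int, default=20)
    p.add_argument("--batch_size", type=int, default=512)
    p.add_argument("--train_epoch", type=int, default=20)
    p.add_argument("--learning_rate", type=float, default=5e-4)
    p.add_argument("--random_seed", type=int, default=2023)
    return p


args = build_parser().parse_args(
    ["--data_path=PBMC/", "--source_data=CITE-seq.h5ad", "--target_data=ASAP-seq.h5ad"]
)
print(args.target_preprocess, args.reliability_threshold, args.max_iteration)
```

Parsing only the three required arguments, as in the last lines, shows the defaults that apply when nothing else is set.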
