cRegulon

Introduction

This is cRegulon software: an optimization model an optimization model for modeling combinatorial regulation from single-cell multi-omics provides units underpinning cell type landscape.

Requirements:

Homer in your environment PATH
Python >=3.0 and packages below:
numpy-groupies==0.9.15
numpy==1.26.3
scipy==1.12.0
pandas==1.4.3
ismember==1.0.2
mpmath==1.2.1
pybedtools==0.9.0
scikit-learn==1.4.2

Installing cRegulon with the following command:

wget https://github.com/SUwonglab/cRegulon/archive/master.zip
unzip master.zip
cd cRegulon-master
wget -O cRegulonData.tar.gz https://figshare.com/ndownloader/files/47503895
tar -xzvf cRegulonData.tar.gz

We also provide an example for cRegulon analysis, which can be downloaded from the following command:

wget -O cRegulonExample.tar.gz https://figshare.com/ndownloader/files/47503904
tar -xzvf cRegulonExample.tar.gz

After downloading, there will be 5 folders that store raw data, intermediate results, and final results of example:
./example_data/
./CSI/
./PseudoBulk/
./Networks/
./Results/

Tutorial of cRegulon analysis

Input single cell data of cRegulon

The typic input file of scRNA-seq data is a gene by cell count matrix:

scRNA	RNACellID1	RNACellID2	RNACellID3
Gene1	5	0	3
Gene2	0	2	0
Gene3	1	0	0

The typic input file of scATAC-seq data is a peak by cell count matrix:

scATAC	ATACellID1	ATACellID2	ATACellID3	ATACellID4
Peak1	1	0	1	0
Peak2	0	1	0	1
Peak3	1	0	0	0

The peaks are in the format of "chr_start_end".

In practice, the file of single cell dataset can be the 10x folder format (example dataset: ./example_data/RA/) or a matrix txt format (example dataset: ./example_data/CL/). In this tutorial, we use RA dataset for demonstration.

Input cell type meta data

The typic cell type meta file (./example_data/RA/RA_scRNA_Cluster.txt) of scRNA-seq data is as follows:

RNACellID1	RAC1
RNACellID2	RAC2
RNACellID3	RAC3

The typic cell type meta file (./example_data/RA/RA_scATAC_Cluster.txt) of scATAC-seq data is as follows:

ATACellID1	RAC2
ATACellID2	RAC3
ATACellID3	RAC1
ATACellID4	RAC2

cRegulon has three main modes, which are three steps of cRegulon model

python3 cRegulon.py {prep,grn,model} ...

prep: Preprocessing mode
grn: GRN mode
model: Model mode

Step 1: preprocessing and pseudo bulk (prep mode)

We run the following script to create pseudo bulk RNA-seq and ATAC-seq data for each cell cluster:

#cRegulon.py prep [-h] [--name NAME] --rna RNA --rna_meta RNA_META --atac ATAC --atac_meta ATAC_META --species SPECIES (human or mouse)
python3 cRegulon.py prep --name RA --rna ./example_data/RA/scRNA/ --rna_meta ./example_data/RA/RA_scRNA_Cluster.txt --atac ./example_data/RA/scATAC/ --atac_meta ./example_data/RA/RA_scATAC_Cluster.txt -g mouse

This process will produce pseudo bulk files (*PS_RNA.txt, *PS_ATAC.txt, *CellType.txt) for each cell cluster in the PseudoBulk folder.

Step 2: GRN construction (grn mode)

We run the following script to construct regulatory network for each cell cluster (current we support hg38 and mm10):

#python3 cRegulon.py grn [-h] --name NAME --celltype CELLTYPE --genome GENOME --cores CORES
for c in `cat ./PseudoBulk/RA_CellType.txt`
do
    python3 cRegulon.py grn -n RA -ct ${c} -g mm10 -p 20
done

This process will produce GRN files (*network.txt, TFTG_regulationScore.txt, TFName.txt, TGName.txt) for each cell cluster in the Networks folder (The GRN construction is independent for each cell cluster, we can do it parallelly).

Step 3: Running cRegulon model (model mode)

We run the following script of cRegulon model: If we already know or have some expection of the cRegulon number, we can provide this number to cRegulon. For example, we have 9 cRegulons for RA, then we run this script:

#python3 cRegulon.py model [-h] --name NAME --module_number MODULE_NUMBER
python3 cRegulon.py model -n RA -mn 9

If we don't know the cRegulon number, we can provide a range of numbers and cRegulon will use elbow rule to select an optimal number. For example, we guess there may be 4-20 cRegulons for RA, then we run this script:

#python3 cRegulon.py model [-h] --name NAME --module_max MODULE_MAX --module_min MODULE_MIN
python3 cRegulon.py model -n RA -mmin 4 -mmax 20

This will output a folder in "Results" with name you specify: "./RA/"

TF combinatorial effects in each cRegulon: ./Results/RA/X.txt
Association matrix between cell clusters and cRegulons: ./Results/RA/A.txt
TF module of each cRegulon: ./Results/RA/*TFModule.txt
Annotation of each cell cluster with cRegulons: ./Results/RA/Annotation/*subnetwork.txt

Annotation mode of cRegulon (annot mode)

If you only have scRNA-seq data, We run the following script to annotate cells with our pre-computed cRegulons from atlas-level dataset:

#python3 cRegulon.py annot [-h] --name NAME --path_rna PATH_RNA --module_number MODULE_NUMBER
python3 cRegulon.py annot --name PBMC --path_rna ./example_data/PBMC/PBMC_scRNA.txt --module_number 12

Citation:

If you use cRegulon software or cRegulon associated concepts, please cite:

Zhanying Feng, et al. Modeling combinatorial regulation from single-cell multi-omics provides units underpinning cell type landscape. 2024.

Name		Name	Last commit message	Last commit date
Latest commit History 149 Commits
LICENSE		LICENSE
README.md		README.md
cRegulon.py		cRegulon.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cRegulon

Introduction

Requirements:

Installing cRegulon with the following command:

Tutorial of cRegulon analysis

Input single cell data of cRegulon

Input cell type meta data

cRegulon has three main modes, which are three steps of cRegulon model

Step 1: preprocessing and pseudo bulk (prep mode)

Step 2: GRN construction (grn mode)

Step 3: Running cRegulon model (model mode)

Annotation mode of cRegulon (annot mode)

Citation:

About

Releases

Packages

Languages

License

AMSSwanglab/cRegulon

Folders and files

Latest commit

History

Repository files navigation

cRegulon

Introduction

Requirements:

Installing cRegulon with the following command:

Tutorial of cRegulon analysis

Input single cell data of cRegulon

Input cell type meta data

cRegulon has three main modes, which are three steps of cRegulon model

Step 1: preprocessing and pseudo bulk (prep mode)

Step 2: GRN construction (grn mode)

Step 3: Running cRegulon model (model mode)

Annotation mode of cRegulon (annot mode)

Citation:

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages