Prediction of enhancers
dbscATAC has identified 13,470,526 enhancers and 10,402,346 enhancer-gene interactions derived from 1,668,076 single cells spanning 1,028 tissue/cell types across 13 species.
Taking GSE149683 as an example, its raw data can be downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE149683
Run the following to obtain the Seurat RDS files for all tissue samples:
perl Enh_download_rawdata.pl
Based on the tissue/cell type annotations assigned to all cells, the large matrix extracted from the RDS file was divided into tissue/cell type-specific sub-populations.
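The split itself is done in R on the Seurat object. As a language-neutral illustration of the idea, here is a minimal Python sketch that groups the columns of a peak-by-cell count matrix by each cell's annotation. It assumes the matrix has already been exported as a dense table with a list of cell barcodes and a barcode-to-annotation mapping; `split_by_cell_type` and its arguments are hypothetical names, not part of the dbscATAC scripts.

```python
from collections import defaultdict


def split_by_cell_type(matrix, cell_ids, cell_types):
    """Split a peak-by-cell count matrix into per-cell-type sub-matrices.

    matrix:     list of rows, one per peak; each row holds one count per cell.
    cell_ids:   cell barcodes, in the same order as the entries of each row.
    cell_types: dict mapping each barcode to its tissue/cell type annotation.
    Returns {cell_type: (barcodes, sub_matrix)}.
    """
    # Collect the column indices belonging to each annotated cell type.
    columns = defaultdict(list)
    for j, cid in enumerate(cell_ids):
        columns[cell_types[cid]].append(j)

    result = {}
    for ct, idx in columns.items():
        barcodes = [cell_ids[j] for j in idx]
        # Keep every peak (row) but only the columns of this cell type.
        sub = [[row[j] for j in idx] for row in matrix]
        result[ct] = (barcodes, sub)
    return result


if __name__ == "__main__":
    m = [[1, 0, 2, 0],
         [0, 3, 0, 1]]
    ids = ["c1", "c2", "c3", "c4"]
    ann = {"c1": "hepatocyte", "c2": "kupffer", "c3": "hepatocyte", "c4": "kupffer"}
    print(split_by_cell_type(m, ids, ann)["hepatocyte"])  # (['c1', 'c3'], [[1, 2], [0, 0]])
```

Each sub-matrix keeps all peaks, so per-cell-type profiles stay aligned to the same peak coordinates.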
Then run:
Rscript Enh_RDS_split_into_celltype_matrix.R
To identify typical single-cell enhancers from hundreds of tissue/cell type-specific single cells, we improved a previously designed unsupervised method by introducing a weighting system that assigns a quality score to each single cell and then combines the peak profiles of all cells to identify typical enhancers.
In this approach, the ATAC peak profile of each single cell was treated as an independent dataset. Our method operates under the assumption that higher-quality datasets are more strongly associated with predicted enhancers, while lower-quality datasets have weaker associations. By comparing the similarities among all single-cell datasets, a relative quality score was assigned to each dataset. Traditionally, the scATAC-seq matrix is binarized to reflect the ‘open’ or ‘closed’ state of chromatin, based on the sparsity of the data and the conceptual framework of chromatin accessibility. However, a recent study demonstrated that modeling fragment counts, rather than binarizing the matrix, preserves quantitative regulatory information and improves the analysis of scATAC-seq data.
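To illustrate how a quality-weighted combination of peak profiles can operate on raw counts rather than on a binarized matrix, the following Python sketch computes a weighted mean signal for each peak and keeps the peaks that reach a cutoff. The per-cell weights, the weighted-mean formula, and the `threshold` parameter are assumptions made for illustration; the actual weighting scheme lives in the dbscATAC scripts and is not reproduced here.

```python
def weighted_peak_scores(matrix, weights):
    """Quality-weighted mean signal of each peak across cells.

    matrix:  peak-by-cell counts (list of rows), used as-is, not binarized.
    weights: one non-negative quality score per cell (column).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Higher-quality cells contribute more to each peak's combined score.
    return [sum(w * c for w, c in zip(weights, row)) / total for row in matrix]


def typical_peaks(matrix, weights, threshold):
    """Indices of peaks whose weighted score reaches a (hypothetical) threshold."""
    scores = weighted_peak_scores(matrix, weights)
    return [i for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    counts = [[2, 0, 1],
              [0, 1, 0],
              [3, 2, 2]]
    print(typical_peaks(counts, [5, 2, 3], threshold=1.0))  # [0, 2]
```

Because counts are not binarized, a peak with strong signal in a few high-weight cells can outrank one that is weakly open in many low-weight cells.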
To better evaluate the similarity between the datasets of any two single cells, we used the Tanimoto coefficient. The improved unsupervised learning approach is integrated with the Cicero tool to accurately identify single-cell enhancers and their target genes.
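The Tanimoto coefficient extends the Jaccard index to non-negative real-valued vectors: T(a, b) = a·b / (|a|^2 + |b|^2 - a·b). The Python sketch below computes it on per-cell count profiles and derives a relative quality score for each cell. Scoring a cell by its mean similarity to all other cells, normalized so the scores sum to 1, is a hypothetical choice made here for illustration; the exact scoring formula used by dbscATAC is not given in this description.

```python
def tanimoto(a, b):
    """Tanimoto coefficient for non-negative count vectors.

    T(a, b) = a.b / (|a|^2 + |b|^2 - a.b); equals the Jaccard index on 0/1 vectors.
    """
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # denom is 0 only when both vectors are all zeros.
    return dot / denom if denom > 0 else 0.0


def relative_quality_scores(profiles):
    """Hypothetical relative quality score: each cell's mean Tanimoto
    similarity to all other cells, normalized so the scores sum to 1."""
    n = len(profiles)
    if n < 2:
        return [1.0] * n
    mean_sim = []
    for i in range(n):
        sims = [tanimoto(profiles[i], profiles[j]) for j in range(n) if j != i]
        mean_sim.append(sum(sims) / (n - 1))
    total = sum(mean_sim)
    if total == 0:
        return [1.0 / n] * n  # no shared signal at all: weight cells equally
    return [s / total for s in mean_sim]


if __name__ == "__main__":
    cells = [[2, 3, 0], [2, 3, 0], [0, 0, 5]]
    print(relative_quality_scores(cells))  # [0.5, 0.5, 0.0]
```

Under this assumption, a cell whose accessibility profile disagrees with the rest of its population (the third cell above) receives a low score, and the scores can then serve as per-cell weights when the peak profiles are combined.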
Run:
Rscript Enh_RDS_split_into_celltype_matrix.R
perl Enh_Calling_for_putative_enhancers.pl
To obtain the final enhancers, the putative single-cell enhancers are filtered against promoters, silencers, and exon regions. Only the single-cell enhancers that do not overlap any promoter, silencer, or exon region are retained.
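The overlap filter can be sketched in Python as follows. It assumes BED-style half-open intervals `(chrom, start, end)` and a simple per-chromosome linear scan; `filter_enhancers` is a hypothetical helper that illustrates the rule, not the logic of Enh_filterring_by_pro_exon_silencer.pl.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_enhancers(enhancers, *exclusion_sets):
    """Keep enhancers that overlap no region in any exclusion set
    (for example promoters, silencers, and exons)."""
    # Index all exclusion regions by chromosome so each enhancer is only
    # compared against regions on its own chromosome.
    by_chrom = {}
    for regions in exclusion_sets:
        for r in regions:
            by_chrom.setdefault(r[0], []).append(r)

    kept = []
    for e in enhancers:
        if not any(overlaps(e, r) for r in by_chrom.get(e[0], [])):
            kept.append(e)
    return kept


if __name__ == "__main__":
    enhancers = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 80)]
    promoters = [("chr1", 150, 160)]
    silencers = []
    exons = [("chr2", 80, 90)]  # touches but does not overlap chr2:50-80
    print(filter_enhancers(enhancers, promoters, silencers, exons))
```

For genome-scale inputs, an interval tree or `bedtools intersect -v` (which reports entries in A that have no overlap in B) is the usual way to make this step efficient.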
perl Enh_filterring_by_pro_exon_silencer.pl

# Visualization of cell type-specific single-cell enhancers through the "Search single-cell enhancers" module on the home page of our website:
