Source code for reproducing the results in "scReadSim: a single-cell RNA-seq and ATAC-seq read simulator".
We provide a Python-based implementation for the convenience of the community. For tutorials and other details, check our website.
Four folders correspond to four tasks:
- `10x_scRNA-seq`: Using 10x scRNA-seq for synthetic read generation, verification, and benchmarking of deduplication tools
- `10x_scATAC-seq`: Using 10x scATAC-seq for synthetic read generation and verification
- `sci-ATAC-seq`: Using sci-ATAC-seq for synthetic read generation and verification
- `sci-ATAC-seq_designedpeaks`: Using sci-ATAC-seq for synthetic read generation with designed ground-truth peaks, verification, and benchmarking of peak calling tools
Under each folder, `Synthetic data generation` contains the code for generating synthetic reads, and `Verification` contains the code for reproducing the figures in the manuscript. A detailed list follows.
Folder Synthetic data generation
Code | Function |
---|---|
`NOINPUT_ATAC_main_20221130_GreyArea.sh` | main script |
`NOINPUT_ATAC_BAM2CountMatrix.py` | convert BAM file to count matrix |
`NOINPUT_ATAC_ComplePeakFunction.py` | prepare scATAC-seq non-peaks |
`NOINPUT_ATAC_GenerateBAMCoord_20221130_GreyArea.py` | generate synthetic read coordinates |
`NOINPUT_ATAC_ReadError_20220421.py` | introduce sequencing errors to synthetic reads |
`NOINPUT_ATAC_SyntheticMat.R` | generate synthetic count matrix |
`NOINPUT_ATAC_VerifyRead_20221130.py` | calculate pseudo-bulk read coverage for each feature |
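
To illustrate what the "convert BAM file to count matrix" step produces, here is a minimal sketch in plain Python. It is not the code in `NOINPUT_ATAC_BAM2CountMatrix.py`: the function name `count_matrix`, the tuple-based read format, and the rule of assigning a read by its start position are all assumptions made for illustration, and a real pipeline would read alignments from a BAM file instead.

```python
def count_matrix(reads, features):
    """Build a feature-by-cell count matrix (illustrative sketch only).

    reads:    iterable of (cell_barcode, chrom, start) tuples
              (hypothetical stand-in for BAM alignments).
    features: list of (chrom, start, end) half-open intervals,
              e.g. peaks and non-peaks.
    Returns (matrix, barcodes), where matrix[i][j] is the number of reads
    from barcodes[j] whose start position falls inside features[i].
    """
    counts = {}
    barcodes = []
    seen = set()
    for barcode, chrom, pos in reads:
        # Keep barcodes in order of first appearance.
        if barcode not in seen:
            seen.add(barcode)
            barcodes.append(barcode)
        # Linear scan over features; fine for a sketch, slow for real data.
        for i, (f_chrom, f_start, f_end) in enumerate(features):
            if chrom == f_chrom and f_start <= pos < f_end:
                key = (i, barcode)
                counts[key] = counts.get(key, 0) + 1
    matrix = [
        [counts.get((i, b), 0) for b in barcodes]
        for i in range(len(features))
    ]
    return matrix, barcodes


if __name__ == "__main__":
    demo_reads = [
        ("AAAC", "chr1", 100),
        ("AAAC", "chr1", 150),
        ("TTTG", "chr1", 120),
        ("TTTG", "chr2", 10),  # falls in no feature, so it is not counted
    ]
    demo_features = [("chr1", 90, 130), ("chr1", 140, 200)]
    print(count_matrix(demo_reads, demo_features))
```

Rows are features and columns are cells here; the actual scripts may use a different orientation or file format.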
Folder Verification
Code | Function |
---|---|
`Fig2bBtmS7c_ATAC_VerifyRead_Peak_MakePlot_Nversion_20230316.R` | Figs. 2b Bottom and S7c |
`Fig2bUpMidS7b_ATAC_VerifyRead_MakePlot_Nversion.20230316.R` | Figs. 2b Upper, Middle and S7b |
`FigS4_ATAC_VerifyCount_20230128.R` | Fig. S4 |
Folder Synthetic data generation
Code | Function |
---|---|
`NOINPUT_RNA_main_20221011_UMItranscriptlevel2.sh` | main script |
`NOINPUT_RNA_main_VaryRDandCN_20221013.sh` | main script for varying cell number and sequencing depth |
`NOINPUT_RNA_BAM2CountMatrix_20220928.py` | convert BAM file to count matrix |
`NOINPUT_RNA_ComplePeakFunction_20220603.py` | prepare scRNA-seq features |
`NOINPUT_RNA_GenerateBAMCoord_20221011_UMITranscriptLevel2.py` | generate synthetic read coordinates |
`NOINPUT_RNA_ReadError_20220421.py` | introduce sequencing errors to synthetic reads |
`NOINPUT_RNA_SyntheticMat.R` | generate synthetic count matrix |
`NOINPUT_RNA_SyntheticUMIMat_VaryRDandCN_20221013.R` | generate synthetic count matrix for varying cell number and sequencing depth |
`NOINPUT_RNA_VaryRDandCN_DownsampleBAM_20221013.R` | downsample cells for varying cell number |
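
The "introduce sequencing errors to synthetic reads" step can be pictured with the minimal sketch below. It is not the method used in `NOINPUT_RNA_ReadError_20220421.py`: the function name `add_substitution_errors` and the uniform per-base substitution rate are assumptions chosen for illustration, and a real error model would typically be position- and quality-dependent.

```python
import random


def add_substitution_errors(seq, error_rate, rng):
    """Return a copy of seq with random base substitutions (sketch only).

    seq:        read sequence over A/C/G/T (other symbols, e.g. N, are kept).
    error_rate: assumed uniform per-base substitution probability in [0, 1].
    rng:        a random.Random instance, so results are reproducible.
    """
    bases = "ACGT"
    out = []
    for base in seq:
        if base in bases and rng.random() < error_rate:
            # Substitute with one of the three other bases, chosen uniformly.
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a reproducible demo
    read = "ACGTACGTNNACGT"
    print(add_substitution_errors(read, 0.1, rng))
```

Passing the random generator in explicitly keeps the function deterministic under a fixed seed, which is convenient when comparing synthetic reads across runs.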
Folder Verification
Code | Function |
---|---|
`Fig1c_RNA_Minnow_20230306.sh` | Fig. 1c: run minnow |
`Fig1c_RNA_Minnow_PrepareCountMatrix.R` | Fig. 1c: prepare minnow input |
`Fig1c_RNA_Minnow_PlotCoverage_20230316.py` | Fig. 1c |
`Fig2defg_RNA_VerifyPipeline_20230315_withoutAlevin_Nversion.R` | Fig. 2defg |
`Fig2defg_RNA_VerifyPipeline_CellRanger_20221011_UMItranscriptlevel2.sh` | Fig. 2defg: run cellranger |
`Fig2defg_RNA_VerifyPipeline_TimeComp_CellRanger_20221015.sh` | Fig. 2defg: time usage benchmark of cellranger |
`Fig2defg_RNA_VerifyPipeline_TimeComp_UMItools_20221015.sh` | Fig. 2defg: time usage benchmark of UMI-tools |
`Fig2defg_RNA_VerifyPipeline_UMITools_20221011_UMItranscriptlevel2.sh` | Fig. 2defg: run UMI-tools |
`FigS2_RNA_VerifyCount_20230128.R` | Fig. S2 |
`FigS3_RNA_VerifyRead_MakePlot_20221017.R` | Fig. S3 |
`FigS12S13_RNA_VerifyPipeline_20230315_withAlevin_Nversion.R` | Figs. S12 and S13 |
`FigS12S13_RNA_VerifyPipeline_Alvein_20221011_UMItranscriptlevel2.sh` | Figs. S12 and S13: run alevin |
`FigS12S13_RNA_VerifyPipeline_TimeComp_Alevin_20221015.sh` | Figs. S12 and S13: time usage benchmark of alevin |
Folder Synthetic data generation
Code | Function |
---|---|
`NOINPUT_ATAC_main_20221130_GreyArea.sh` | main script |
`NOINPUT_ATAC_BAM2CountMatrix.py` | convert BAM file to count matrix |
`NOINPUT_ATAC_ComplePeakFunction.py` | prepare scATAC-seq non-peaks |
`NOINPUT_ATAC_GenerateBAMCoord_20221130.py` | generate synthetic read coordinates |
`NOINPUT_ATAC_ReadError_20220421.py` | introduce sequencing errors to synthetic reads |
`NOINPUT_ATAC_SyntheticMat.R` | generate synthetic count matrix |
`NOINPUT_ATAC_VerifyRead_20221130.py` | calculate pseudo-bulk read coverage for each feature |
Folder Verification
Code | Function |
---|---|
`Fig1deS6_ATAC_SCAN-ATAC-Sim_20230303.sh` | Figs. 1de and S6: run SCAN-ATAC-Sim |
`Fig1deS6_ATAC_SCAN-ATAC-Sim_CompareCountMat_20230310.R` | Figs. 1de and S6 |
`Fig1deS6_ATAC_SCAN-ATAC-Sim_CountMat_20230310.py` | Figs. 1de and S6: generate count matrices for comparison |
`Fig2a_ATAC_SpitCellType_20230310.sh` | Fig. 2a |
`Fig2b_ATAC_SpitCellType_20230310.py` | Fig. 2b |
`FigS5_ATAC_VerifyCount_20230128.R` | Fig. S5 |
`FigS8_ATAC_VerifyRead_MakePlot_20221130.R` | Fig. S8 |
`FigS8_ATAC_VerifyRead_Peak_MakePlot_20221130.R` | Fig. S8: peak calling comparison |
Folder Synthetic data generation
Code | Function |
---|---|
`INPUT_ATAC_main_mm9TSS_20221130_GreyArea.sh` | main script |
`INPUT_ATAC_BAM2CountMatrix.py` | convert BAM file to count matrix |
`INPUT_ATAC_ComplePeakFunction.py` | prepare scATAC-seq non-peaks |
`INPUT_ATAC_demoInputPeak.py` | design ground-truth peaks |
`INPUT_ATAC_GenerateBAMCoord_20221130.py` | generate synthetic read coordinates |
`INPUT_ATAC_MatchPeakFunction_20220302.py` | match ground-truth (non-)peaks and trustworthy (non-)peaks |
`INPUT_ATAC_SyntheticMat_withCluster_SelectCellType.R` | generate synthetic count matrix |
`NOINPUT_ATAC_ReadError_20220421.py` | introduce sequencing errors to synthetic reads |
`NOINPUT_ATAC_VerifyRead_20221130.py` | calculate pseudo-bulk read coverage for each feature |
Folder Verification
Code | Function |
---|---|
`Fig2hiS10S14S15_ATAC_VerifyRead_TSS_ProcessPeakBed_20221130.sh` | Figs. 2hi, S10, S14 and S15: run peak calling tools |
`Fig2hiS10_ATAC_VerifyRead_TSS_MakePlot_Nversion_20230316.R` | Figs. 2hi and S10 |
`FigS14S15_TSS_Intervene_upset_logTransformation_20221009.R` | Figs. S14 and S15 |
This package is licensed under the terms of the MIT License.