Skip to content

Reproduction code for "Robust differential expression testing for single-cell CRISPR screens at low multiplicity-of-infection" by Barry et al. 2023

License

Notifications You must be signed in to change notification settings

Katsevich-Lab/sceptre2-manuscript

Repository files navigation

Reproducing the analyses reported in ‘Robust differential expression testing for single-cell CRISPR screens at low multiplicty of infection’

Timothy Barry, Kaishu Mason, Kathryn Roeder, Eugene Katsevich

This repository contains code to reproduce the analyses reported in the paper “Robust differential expression testing for single-cell CRISPR screens at low multiplicty of infection” (Genome Biology, 2024).

Dependencies

Ensure that your system has the required software dependencies.

  • GCC version 11.1.0 or higher
  • R version 4.1.2 or higher
  • Python version 3.10.4 or higher
  • Java version 8 or higher
  • Nextflow version 21.10.6 or higher

Get started

First, clone the sceptre2-manuscript repository onto your machine.

git clone git@github.com:Katsevich-Lab/sceptre2-manuscript.git

Next, manually download the processed datasets from Dropbox. The data are stored in ondisc (version 1.1.0) format. Download the following three directories from the Dropbox data repository: frangieh-2021, papalexi-2021, schraivogel-2020. The code originally used to download and process these datasets (alongside documentation) is available in the following repositories: Frangieh, Papalexi, Schraivogel.

We used a config file to increase the portability of our code across machines. Create a config file called .research_config in your home directory.

cd
touch ~/.research_config

Define the following variables within this file:

  • LOCAL_PAPALEXI_2021_DATA_DIR: the location of the papalexi-2021 data directory

  • LOCAL_SCHRAIVOGEL_2020_DATA_DIR: the location of the schraivogel-2020 data directory

  • LOCAL_FRANGIEH_2021_DATA_DIR: the location of the frangieh-2021 data directory

  • LOCAL_CODE_DIR: the location of the directory in which the cloned sceptre2-manuscript repository is located

  • LOCAL_SCEPTRE2_DATA_DIR: the location of the directory in which to store files (e.g., auxiliary data, results) associated with the paper.

  • KATS_SOFT_DIR: the location of the directory in which to create the Python virtual environment

The contents of the .research_config file should look like something along the following lines.

LOCAL_PAPALEXI_2021_DATA_DIR=/Users/timbarry/research_offsite/external/papalexi-2021/
LOCAL_SCHRAIVOGEL_2020_DATA_DIR=/Users/timbarry/research_offsite/external/schraivogel-2020/
LOCAL_FRANGIEH_2021_DATA_DIR=/Users/timbarry/research_offsite/external/frangieh-2021/
LOCAL_CODE_DIR=/Users/timbarry/research_code/
LOCAL_SCEPTRE2_DATA_DIR=/Users/timbarry/research_offsite/projects/sceptre2/
KATS_SOFT_DIR=/Users/timbarry/

Next, create an .Rprofile file in your home directory (if you have not yet done so).

cd
touch .Rprofile

Add the following commands to your .Rprofile.

suppressMessages(library(conflicted))
.get_config_path <- function(dir_name) {
  cmd <- paste0("source ~/.research_config; echo $", dir_name)
  system(command = cmd, intern = TRUE)
}
Sys.setenv(RETICULATE_PYTHON = paste0(.get_config_path("KATS_SOFT_DIR"), "py/lowmoi-venv/bin/python"))
utils::setRepositories(ind = 1:4)

Set up the analyses

Navigate to the bash_scripts subdirectory of the sceptre2-manuscript directory. Execute the following command to create a Python virtual environment called py/lowmoi-venv and download the necessary Python packages. This script assumes that the command python3 can be called from the command line.

cd bash_scripts
bash install_python_packages.sh

If you are having trouble initializing the Python virtual environment, consult the following documentation. Next, install the required R packages. Note that this script installs version 1.1.0 of ondisc and version 0.3.0 of sceptre.

Rscript ../R_scripts/install_R_packages.R

Finally, complete the rest of the setup, which consists of initializing the directory structure of the SCEPTRE2 data directory, initializing the symbolic links, creating the synthetic data (both negative and positive control), performing quality control on the data, and computing sample sizes. The following code takes about 10 minutes to execute.

# 4. set up the offsite directory structure
bash setup.sh

# 5. create the symbolic links within the offsite directory structure
bash create_sym_links.sh

# 6. create the synthetic data
Rscript ../R_scripts/create_synthetic_data.R

# 7. run the low MOI uniform processing step (which includes cell and feature qc)
Rscript ../R_scripts/uniform_processing_lowmoi.R

# 8. compute dataset sample sizes
Rscript ../R_scripts/compute_dataset_sample_sizes.R

We recommend downloading the results from the Dropbox results repository, so that you can reproduce the figures without having to rerun the analyses. This can be executed via the following commands.

source ~/.research_config
cd $LOCAL_SCEPTRE2_DATA_DIR"/results"
 wget --max-redirect=20 -O download.zip https://www.dropbox.com/sh/yuubaoro3k75c61/AACPi9LDjY0B1pxwMxUMSoTca?dl=1
 unzip -o download.zip

All results will be placed into the correct locations.

Run the Nextflow pipelines

The next step is to run a sequence of Nextflow pipelines to carry out the main analyses of the paper. These pipelines should be run one-by-one. Each of these pipelines takes as an argument a parameter file contained within the directory /param_files. The parameter file passed as an argument to each pipeline is indicated as a comment.

# 9. Undercover analysis
# Figure 1: c-f
# Figure 2: c, e, f
# Figure 4: a-c
# Figure S1-S5
# parameter file: undercover/params_undercover_0523.groovy
qsub ../pipeline_launch_scripts/undercover/grp_size_1_0523/grp_size_1.sh

# 10. Positive control analysis
# Figure 4: d
qsub ../pipeline_launch_scripts/positive_control/pc_0523/pc_analysis.sh

# 11. MW resampling statistics analysis
# Figure 2: b
qsub ../pipeline_launch_scripts/resampling_distributions/seurat_at_scale/seurat_at_scale.sh

# 12. Unfiltered SCEPTRE positive control analysis
# Figure S9
qsub ../pipeline_launch_scripts/positive_control/sceptre_unfiltered_pc_0523/pc_analysis.sh

# 13. Discovery analysis
# Figure S7
qsub ../pipeline_launch_scripts/discovery_analysis/

We carried out our analyses on an SGE cluster; thus, we submitted the Nextflow jobs to the scheduler via qsub. If you instead are running the analyses on a SLURM cluster, for example, then you should submit the jobs via sbatch. If you are running the analyses locally, then you should submit the jobs via bash, etc.

Once the jobs have completed, process the results.

Rscript ../R_scripts/process_results.R

Run the auxiliary analyses

The next step is to run several auxiliary analyses. These analyses are smaller and thus do not necessitate Nextflow pipelines.

# 14. run the confounding analyses for figure 2 and figure s6
Rscript ../R_scripts/fig_s6_analyses.R

# 15. compute the dataset statistical details for table s2
Rscript ../R_scripts/get_dataset_statistical_details.R

# 16. run the analysis for supplementary figure s7 (CAMP)
Rscript ../R_scripts/camp_simulation.R

# 17. run the analysis for supplementary figure s11 (score vs. resid simulation)
# note that this step is carried out on a SGE cluster
bash run_simulation_study.sh

# 18. run the analysis for supplementary table 4 (spectral vs qr score computation)
Rscript ../R_scripts/score_test_benchmark.R

# 19. run the analyses for figure 5
Rscript ../R_scripts/save_datasets_as_r_objects.R
bash run_discovery_analyses_fig_5.sh

# 20. run the analyses for figure s12
# first, install 'resid_statistic' branch of the sceptre package
Rscript -e "devtools::install_github('katsevich-lab/sceptre@resid_statistic')"
# next, run the analyses
bash run_discovery_analyses_fig_s12.sh

# redownload sceptre 0.3.0 for good measure
Rscript -e "devtools::install_github('katsevich-lab/sceptre', ref = 'v0.3.0')"

# 21. ChIP-seq
Rscript download_additional_data.R
Rscript get_TF_targets_papalexi_chipseq.R

Create the figures

The final step is to create the figures. To do so issue the following commands.

Rscript ../R_scripts/figure_creation/fig_1/fig_1.R
Rscript ../R_scripts/figure_creation/fig_2/fig_2.R
Rscript ../R_scripts/figure_creation/fig_3/fig_3.R
Rscript ../R_scripts/figure_creation/fig_4/fig_4.R
Rscript ../R_scripts/figure_creation/fig_5/fig_5.R
Rscript ../R_scripts/figure_creation/fig_s1_s3/fig_s1_s3.R
Rscript ../R_scripts/figure_creation/fig_s4/fig_s4_s5.R
Rscript ../R_scripts/figure_creation/fig_s6/fig_s6.R
Rscript ../R_scripts/figure_creation/fig_s7/fig_s7.R
Rscript ../R_scripts/figure_creation/fig_s8/fig_s8.R
Rscript ../R_scripts/figure_creation/fig_s9/fig_s9.R
Rscript ../R_scripts/figure_creation/fig_s10/fig_s10.R
Rscript ../R_scripts/figure_creation/fig_s11/fig_s11.R
Rscript ../R_scripts/figure_creation/fig_s12/fig_s12.R
Rscript ../R_scripts/figure_creation/fig_s13/fig_s13.R

This codebase is large and complex. Contact the authors if you seek help with reproducing an aspect of the analysis.

About

Reproduction code for "Robust differential expression testing for single-cell CRISPR screens at low multiplicity-of-infection" by Barry et al. 2023

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •