This repository has been archived by the owner on Aug 23, 2024. It is now read-only.
Keiran Raine edited this page Jan 7, 2022 · 4 revisions

Overview

C-SAR (CRISPR single guide analysis reporting) is a Nextflow pipeline for the analysis of single guide CRISPR screens. It forms part of a suite of complementary tools:

  • pyCROQUET - single and dual guide quantification
  • RCRISPR - R package for transforming, analysing and visualising CRISPR data sets (includes scripts run by C-SAR)
  • C-SAR - Nextflow pipeline for single guide CRISPR analysis
  • c-sar-denarius - a web interface for displaying and interacting with C-SAR results

C-SAR is designed to run downstream of read mapping and guide quantification, taking the counts generated by tools such as pyCROQUET. A single configuration file is required; together with the pipeline parameters, it points to the core input data:

  • individual count files per sample (in a single directory) or a sample count matrix
  • sample mapping file (linking sample metadata to the count files)
  • sgRNA library
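The Python sketch below shows one way that per-sample count files and a sample mapping could be combined into a guide-by-sample count matrix. It is not C-SAR's code. The file layouts (tab-separated files with `label`/`filename` and `sgRNA`/`count` columns) and all function names are hypothetical. See the configuration section of this wiki for the formats C-SAR actually expects.

```python
import csv
import io
from typing import Dict, TextIO

# Hypothetical layouts, for illustration only (not C-SAR's real formats):
#   sample mapping: TSV with columns "label" and "filename"
#   count file:     TSV with columns "sgRNA" and "count"


def read_sample_mapping(handle: TextIO) -> Dict[str, str]:
    """Return {sample label: count file name} from a sample mapping table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["label"]: row["filename"] for row in reader}


def read_counts(handle: TextIO) -> Dict[str, int]:
    """Return {sgRNA ID: read count} from one per-sample count file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sgRNA"]: int(row["count"]) for row in reader}


def build_count_matrix(
    mapping: Dict[str, str], files: Dict[str, TextIO]
) -> Dict[str, Dict[str, int]]:
    """Combine per-sample counts into {sgRNA: {sample label: count}}.

    `files` maps each file name in the sample mapping to an open handle.
    Guides absent from a sample's file are given a count of 0.
    """
    per_sample = {label: read_counts(files[name]) for label, name in mapping.items()}
    guides = sorted(set().union(*per_sample.values()))
    return {
        guide: {label: counts.get(guide, 0) for label, counts in per_sample.items()}
        for guide in guides
    }


if __name__ == "__main__":
    mapping = read_sample_mapping(
        io.StringIO("label\tfilename\nplasmid\tp.tsv\nctrl_1\tc1.tsv\n")
    )
    files = {
        "p.tsv": io.StringIO("sgRNA\tcount\nsg1\t120\nsg2\t80\n"),
        "c1.tsv": io.StringIO("sgRNA\tcount\nsg1\t95\n"),
    }
    print(build_count_matrix(mapping, files))
    # {'sg1': {'plasmid': 120, 'ctrl_1': 95}, 'sg2': {'plasmid': 80, 'ctrl_1': 0}}
```

Filling missing guides with zero keeps every sample aligned to the same guide set, which is the shape a sample count matrix supplied directly would already have.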

Common analysis steps, such as normalisation, copy number (CN) correction and the identification of enriched and depleted genes, are run using containerised versions of widely used software, including CRISPRcleanR, MAGeCK and BAGEL2.

Key pipeline stages, each of which can be toggled on or off by the user:

  • Sequencing quality controls
  • Remove user-defined guides by sgRNA ID
  • Remove duplicate guides
  • Filter by read count (all samples, plasmid only, controls only)
  • Normalise counts (BAGEL2, MAGeCK or CRISPRcleanR methods)
  • CN correct counts (CRISPRcleanR)
  • Depleted/enriched gene identification with MAGeCK and/or BAGEL2
  • Quality control checks at intermediate stages
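To illustrate what the filtering and normalisation stages do, here is a minimal Python sketch of a plasmid read-count filter followed by median-of-ratios normalisation (the DESeq-style size-factor approach, which MAGeCK's median normalisation resembles). It is not C-SAR's, RCRISPR's or MAGeCK's implementation. The function names, the threshold of 30 reads and the choice to skip guides containing a zero count when estimating size factors are all assumptions.

```python
import math
from statistics import median
from typing import Dict, List

Matrix = Dict[str, Dict[str, float]]  # {sgRNA: {sample: count}}


def filter_by_plasmid(counts: Matrix, plasmid: str, min_count: int = 30) -> Matrix:
    """Drop guides whose plasmid read count is below `min_count`.

    The default of 30 reads is an arbitrary, illustrative threshold.
    """
    return {g: row for g, row in counts.items() if row[plasmid] >= min_count}


def median_ratio_size_factors(counts: Matrix) -> Dict[str, float]:
    """Median-of-ratios size factor per sample.

    Assumes a non-empty matrix in which every guide has the same samples.
    Guides with a zero count in any sample are skipped (their geometric mean
    would be zero); at least one guide must remain, or median() raises.
    """
    samples = list(next(iter(counts.values())))
    ratios: Dict[str, List[float]] = {s: [] for s in samples}
    for row in counts.values():
        if any(row[s] <= 0 for s in samples):
            continue
        # Geometric mean of this guide's counts across all samples.
        geo_mean = math.exp(sum(math.log(row[s]) for s in samples) / len(samples))
        for s in samples:
            ratios[s].append(row[s] / geo_mean)
    return {s: median(r) for s, r in ratios.items()}


def normalise(counts: Matrix) -> Matrix:
    """Divide each sample's counts by that sample's size factor."""
    factors = median_ratio_size_factors(counts)
    return {g: {s: c / factors[s] for s, c in row.items()} for g, row in counts.items()}
```

Using the median ratio rather than total read count makes the size factors robust to a handful of guides that are strongly enriched or depleted in one sample.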

Multiple contrasts can be performed in a single analysis, e.g. treatment vs control, treatment vs plasmid and control vs plasmid. At each stage, C-SAR provides quality control metrics and visualisations alongside the results of the underlying software. All stages are configurable and can be switched on or off in the configuration file.
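A contrast compares one group of samples against another. As a hypothetical sketch, assuming normalised counts stored as `{sgRNA: {sample: count}}`, made-up sample names and a pseudocount of 0.5, the three example contrasts can be expressed as guide-level log2 fold changes. MAGeCK and BAGEL2 go further and compute gene-level statistics, which this sketch does not attempt.

```python
import math
from statistics import mean
from typing import Dict, List

Matrix = Dict[str, Dict[str, float]]  # {sgRNA: {sample: normalised count}}


def log2_fold_changes(
    counts: Matrix, treatment: List[str], control: List[str], pseudocount: float = 0.5
) -> Dict[str, float]:
    """Per-guide log2 of (mean treatment + pseudocount) / (mean control + pseudocount)."""
    return {
        guide: math.log2(
            (mean(row[s] for s in treatment) + pseudocount)
            / (mean(row[s] for s in control) + pseudocount)
        )
        for guide, row in counts.items()
    }


# Hypothetical contrasts: name -> (treatment samples, control samples).
CONTRASTS = {
    "treatment_vs_control": (["treat_1", "treat_2"], ["ctrl_1", "ctrl_2"]),
    "treatment_vs_plasmid": (["treat_1", "treat_2"], ["plasmid"]),
    "control_vs_plasmid": (["ctrl_1", "ctrl_2"], ["plasmid"]),
}


def run_contrasts(counts: Matrix) -> Dict[str, Dict[str, float]]:
    """Compute guide-level log2 fold changes for every contrast in one pass."""
    return {name: log2_fold_changes(counts, t, c) for name, (t, c) in CONTRASTS.items()}
```

Defining contrasts as data means adding a comparison only requires a new dictionary entry rather than new code.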

The majority of C-SAR's core functionality is provided by a novel R package, RCRISPR, which also allows users to visualise and transform their data outside the pipeline.

C-SAR has been built using Nextflow, enabling it to be portable (runs on multiple platforms) and reproducible (supports Docker and Singularity).

Usage

To see the most commonly used command line options:

nextflow run c-sar --help

Quick start

The typical command for running the pipeline is as follows:

nextflow run c-sar -c <config file>

For configuration parameters, please see the configuration section of this wiki.

Software references

Hart, T., Moffat, J. BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics 17, 164 (2016). https://doi.org/10.1186/s12859-016-1015-8

Li, W., Xu, H., Xiao, T. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol 15, 554 (2014). https://doi.org/10.1186/s13059-014-0554-4

Iorio, F., Behan, F. M., Gonçalves, E. et al. Unsupervised correction of gene-independent cell responses to CRISPR-Cas9 targeting. BMC Genomics 19, 604 (2018). https://doi.org/10.1186/s12864-018-4989-y
