This repository provides the scripts used to produce the figures for the manuscript by Froehlich & Uyar et al., 2020.
See the bioRxiv preprint here.
The raw read data (both Illumina- and PacBio-sequenced samples) can be downloaded from here, and the sample sheet describing the experimental setup can be downloaded from here.
The targeted sequencing data of CRISPR-Cas9 treated and control samples were processed using the crispr-DART pipeline.
The reports produced by the pipeline for this analysis can be browsed here.
The input files to run the pipeline, such as settings.yaml, sample_sheet.csv, cutsites.bed, and comparisons.tsv, along with the output files and folders, can be downloaded from here.
The necessary pipeline outputs need to be downloaded from here.
The downloaded compressed folder needs to be uncompressed, and the settings.yaml file needs to be modified so that its paths point to the locations of the various input files. To reproduce the figures, the only fields that need to be modified in the settings.yaml file are:
- sample_sheet: /path/to/sample_sheet.csv
- cutsites: /path/to/cutsites.bed
- reference_fasta: /path/to/ce11.fa
- output-dir: /path/to/output
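
Before running any script, it can help to confirm that the four edited paths actually exist. The following is a minimal sketch, not part of this repository: the helper names `get_setting` and `check_settings` are hypothetical, and the parsing assumes each field sits on a single `key: value` line without quotes or trailing comments, which may not hold for every settings.yaml.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with this repository).

# Print the value of the first "key: value" line found in a YAML file.
# Assumption: the key and its value are on one line; nesting is ignored.
get_setting() {
  local key="$1" file="$2"
  sed -n "s/^[[:space:]-]*${key}:[[:space:]]*//p" "$file" \
    | head -n 1 \
    | sed 's/[[:space:]]*$//'
}

# Report, for each field listed above, whether its path exists.
# Returns non-zero if any key is absent or any path is missing.
check_settings() {
  local file="$1" key path status=0
  for key in sample_sheet cutsites reference_fasta output-dir; do
    path="$(get_setting "$key" "$file")"
    if [ -z "$path" ]; then
      echo "MISSING KEY  $key"
      status=1
    elif [ ! -e "$path" ]; then
      echo "NOT FOUND    $key: $path"
      status=1
    else
      echo "OK           $key: $path"
    fi
  done
  return "$status"
}

# Example: check_settings /path/to/settings.yaml
```

The check only tests that each path exists; it does not validate the contents of the files.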
The scripts in this repository take as input the settings.yaml file, which contains the paths to the other required files, such as the sample sheet, the alignment outputs, and the genome sequence file. The pipeline output files are parsed and processed to make the figures, which were then further processed aesthetically for publication. The raw versions of the figures, as printed by these scripts, can be found in this repository.
- To get various summary plots from the processed pipeline output (Figure 2, Supplementary Fig. 5)
> cd summary_plots
> /usr/bin/Rscript ../scripts/summary_plots.R */path/to/settings.yaml*
- To get the correlation between sgRNA efficiencies and external scores calculated for the designed guides (Supplementary Fig. 3)
> cd summary_plots
> /usr/bin/Rscript ../scripts/sgRNA_scores.R */path/to/settings.yaml* ../data/sgRNAscores.txt
- To cluster and compare lin-41 PacBio RNA reads:
> cd rna_analysis
> /usr/bin/Rscript ../scripts/cluster_pacbio_reads.R */path/to/settings.yaml* lin41_RNApacbio_L1_all lin41_RNApacbio_L4_all cluster_pacbio_all
- To analyse the impact of deletions on RNA abundance in L1 vs L4
> cd rna_analysis
> /usr/bin/Rscript ../scripts/analyse_impact_on_rna_expression.R */path/to/settings.yaml*
- To plot how deletions (or reads with deletions) are selected against over generations from F2 to F5
> cd generations_analysis
> /usr/bin/Rscript ../scripts/deletions_impact_on_fitness.R */path/to/settings.yaml* ../data/analysis_table.tsv
- To plot the profile of kmers from the inserted sequence present around the cut site (Supplementary Fig. 4)
> cd kmer_analysis
> /usr/bin/Rscript /path/to/scripts/insertion_kmers_matching_surrounding_sequence.R */path/to/settings.yaml* *sample* *[sample2 ... sampleN]*
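
The commands above can also be chained in a single wrapper. The sketch below is an illustration, not a script in this repository: `run_step`, `run_all`, `RSCRIPT`, and `DRY_RUN` are hypothetical names. It assumes it is run from the repository root and is given an absolute path to settings.yaml, since each step changes directory first. The k-mer step is left out because it needs sample names chosen by the user.

```shell
#!/usr/bin/env bash
# Sketch only: chains the figure commands from this README.
# Assumptions (not part of the repository): run from the repository root,
# pass an absolute path to settings.yaml, DRY_RUN=1 prints instead of running.
RSCRIPT="${RSCRIPT:-/usr/bin/Rscript}"
DRY_RUN="${DRY_RUN:-1}"

# Run one R script from inside its analysis folder.
run_step() {
  local dir="$1"
  shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "(cd $dir && $RSCRIPT $*)"
  else
    (cd "$dir" && "$RSCRIPT" "$@")
  fi
}

# Run the steps in README order, stopping at the first failure.
run_all() {
  local settings="$1"
  run_step summary_plots ../scripts/summary_plots.R "$settings" &&
  run_step summary_plots ../scripts/sgRNA_scores.R "$settings" ../data/sgRNAscores.txt &&
  run_step rna_analysis ../scripts/cluster_pacbio_reads.R "$settings" \
    lin41_RNApacbio_L1_all lin41_RNApacbio_L4_all cluster_pacbio_all &&
  run_step rna_analysis ../scripts/analyse_impact_on_rna_expression.R "$settings" &&
  run_step generations_analysis ../scripts/deletions_impact_on_fitness.R "$settings" \
    ../data/analysis_table.tsv
}

# Example: DRY_RUN=1 run_all /path/to/settings.yaml
```

With `DRY_RUN=1` (the default in this sketch) the commands are only printed, so the list can be reviewed before setting `DRY_RUN=0` to run them.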