Scripts to process CRISPR-based TF screening data
zhjilin/RSLC
Scripts to perform the UMI counting and SSMD score calculation

1. Installation

1.1 Download the file:

1.2 unzip

1.3 Required dependencies:
(i) perl packages: IO::Uncompress::Gunzip
(ii) R: reshape2, plyr

2. Counting Random Sequence Labels (RSLs)/UMIs

2.1 GuideUMI-count-p0.1.pl

This perl script counts RSLs/UMIs from one fastq/fastq.gz file at a time.

2.1.1 Usage:

    perl GuideUMI-count-p0.1.pl --library lib.csv Sample1_reads.fastq.gz

This will give 2 tables named:

    Sample1.count
    Sample1.UMI

2.1.2 Help:

    perl GuideUMI-count-p0.1.pl --help

2.1.3 Instructions for file formats

Library file format:

    GuideID,GuideSequence,TargetGene
    AATF_03,GGACCCTGAAGCGGACCCCG,AATF
    AATF_04,GATGAAGGGGAAGATGGGGA,AATF
    AATF_05,CTTCAGATGAGCATTAGCAG,AATF

Note: Any other form of annotation is not allowed.

Input file format: Both fastq and fastq.gz files are allowed.

Note: The file name should contain exactly one underscore. Whatever precedes the underscore in the input file name is used as the output file name.

Data deposited under PRJEB18436.

2.2 Batch-GuideUMI-count-p0.1.pl

This is a wrapper of the above script that runs UMI counting on multiple input files and merges the individual count files.

ALERT: Do not count too many files at the same time (keep the number of files below the number of CPUs), because the script runs one process per input fastq file and requires a lot of memory.

2.2.1 Usage:

    perl Batch-GuideUMI-count-p0.1.pl --library lib.csv --step 12 --fastq Sample1_reads.fq,Sample2_reads.fastq.gz,Sample3_read.fastq.gz

This script will yield a SampleX.count file and a SampleX.UMI file for each of the input files [X stands for 1, 2 or 3]. Additionally, it will merge all the count files into one file named summary_count.output.raw. Subsequent analysis relies on this merged table.

The --step argument can also be used to only merge count files if UMIs were counted individually on several files using the GuideUMI-count-p0.1.pl script described above (section 2.1).
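
The naming rule from section 2.1.3 and the ALERT in section 2.2 can be checked before launching a batch run. The following is a minimal Python sketch, not part of this repository: the function names `sample_prefix` and `preflight` are hypothetical, and the check for colliding prefixes is an assumption about what would go wrong if two inputs shared a prefix.

```python
import os
from typing import List, Optional


def sample_prefix(fastq_name: str) -> str:
    """Return the part of the file name that precedes its single underscore."""
    base = os.path.basename(fastq_name)
    if base.count("_") != 1:
        raise ValueError(f"{base!r} must contain exactly one underscore")
    return base.split("_", 1)[0]


def preflight(fastq_arg: str, n_cpus: Optional[int] = None) -> List[str]:
    """Validate a comma-separated --fastq value and return the output prefixes."""
    files = [name for name in fastq_arg.split(",") if name]
    n_cpus = n_cpus or os.cpu_count() or 1
    # The README asks for fewer input files than CPUs (one process per file).
    if len(files) >= n_cpus:
        raise ValueError(f"{len(files)} files for {n_cpus} CPUs; count fewer files at once")
    prefixes = [sample_prefix(name) for name in files]
    # Assumption: inputs sharing a prefix would map to the same output names.
    if len(set(prefixes)) != len(prefixes):
        raise ValueError("two input files share the same output prefix")
    return prefixes


if __name__ == "__main__":
    arg = "Sample1_reads.fq,Sample2_reads.fastq.gz,Sample3_read.fastq.gz"
    print(preflight(arg, n_cpus=8))  # ['Sample1', 'Sample2', 'Sample3']
```

Passing `n_cpus` explicitly keeps the check reproducible; when it is omitted the sketch falls back to `os.cpu_count()`.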
2.2.2 Help:

    perl Batch-GuideUMI-count-p0.1.pl --help

3. Data normalization and SSMD Calculation

3.1 IRA-SSMD.R

This is the core R script to calculate the SSMD score for one treatment vs control.

3.1.1 Usage:

    Rscript --vanilla IRA-SSMD.R inputfile outputprefix count_threshold

Use count_threshold if one wants to filter out reads below a certain count.

3.1.2 Example:

    Rscript --vanilla IRA-SSMD.R summary_count.output.raw output_prefix 1

3.2 IRA-SSMD.sh

This can be used for pairwise comparison of several treatments (e.g. time points) with one control (Day4 vs Day10, Day15, Day20, Day30). This shell script is a wrapper that parses the big table into pairwise tables (treatment against control) to calculate the SSMD score. Five arguments must be provided. To see the help information:

    ./IRA-SSMD.sh

3.2.1 Usage:

    ./IRA-SSMD.sh Rscript Inputfile output_prefix count_threshold

Use count_threshold if one wants to filter out reads below a certain count.

3.2.2 Example:

    ./IRA-SSMD.sh IRA-SSMD.R summary_count.output.raw ABC 1

4. UMI Binning

4.1 Bin-count-TruncatedUMIs.pl

This is a perl script for binning CRISPR-Cas9 RSL guides based on the common RSL prefix.

4.1.1 Usage:

    perl Bin-count-TruncatedUMIs.pl <trunclen> <mincount> <input_countfile.csv>

The script reads the RSL guide counts from the input file <input_countfile.csv>, bins together all RSL guides that have the same first <trunclen> bases, and writes the summed counts of the truncated RSLs to standard output. Only RSL guides that have been observed at least <mincount> times in at least one of the samples are considered in the binning.

4.1.2 Input/Output format:

The input file should be a comma-separated text file containing the following columns:

    RSL.guide,guide.set,Control,Treatment

The first column contains the guide name and its RSL separated by an underscore '_'. The second column contains the guide set name. The last two columns contain the RSL guide counts in two samples (control and treatment).
Example:

    RSL.guide,guide.set,Control,Treatment
    AATF_01_AAAAGC,AATF_01,37,2

The output format is the same as the input format.
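
To make the binning rule in section 4.1 concrete, here is a minimal Python sketch of the described behavior. It is not the Perl script itself and may differ from it in details. The function names `bin_truncated_rsls` and `bin_csv` are hypothetical, the RSL is assumed to be the part after the last underscore of the first column, and only the first data row of the sample input comes from this README (the other two rows are made up for illustration).

```python
import csv
import io


def bin_truncated_rsls(rows, trunclen, mincount):
    """Sum counts of RSL guides that share the first `trunclen` RSL bases."""
    bins = {}  # (truncated RSL guide, guide set) -> [control sum, treatment sum]
    for row in rows:
        if not row:
            continue  # skip blank lines
        rsl_guide, guide_set, control, treatment = row
        control, treatment = int(control), int(treatment)
        # Keep only RSL guides seen at least `mincount` times in some sample.
        if max(control, treatment) < mincount:
            continue
        # Assumption: the RSL follows the last underscore of the first column.
        guide, rsl = rsl_guide.rsplit("_", 1)
        key = (f"{guide}_{rsl[:trunclen]}", guide_set)
        sums = bins.setdefault(key, [0, 0])
        sums[0] += control
        sums[1] += treatment
    return [(name, gset, c, t) for (name, gset), (c, t) in bins.items()]


def bin_csv(text, trunclen, mincount):
    """Read count-table CSV text and return the binned table in the same format."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(bin_truncated_rsls(reader, trunclen, mincount))
    return out.getvalue()


SAMPLE = """RSL.guide,guide.set,Control,Treatment
AATF_01_AAAAGC,AATF_01,37,2
AATF_01_AAAATT,AATF_01,5,9
AATF_01_CCGTAA,AATF_01,1,0
"""

if __name__ == "__main__":
    print(bin_csv(SAMPLE, trunclen=4, mincount=2), end="")
```

With `trunclen=4` and `mincount=2`, the first two rows share the prefix AAAA and are summed into `AATF_01_AAAA,AATF_01,42,11`, while the third row is dropped because neither count reaches 2.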