A Python script for fast and precise analysis of CRISPR-induced mutations via prefixed index counting (CRISPRpic)
If this is your first time to run Python, see the manual, Manual_Beginner_Mac.pdf or For Window10.
CRISPRpic runs on Python 2.7 or later. You can simply download CRISPRpic.py and run it without further installation. Make sure to put CRISPRpic.py in your path, so it is executable anywhere. Otherwise, place the script in your working directory.
The script has 2 python package dependencies:
- pandas
- matplotlib
For dependency installation instructions, see Dependencies section (below)
You can test CRISPRpic.py by running it on the example input files (AAVS1.out.extendedFrags.fastq and AAVS1_input.txt), located in the TEST/ directory.
Command for test:
cd TEST
python DOWNLOAD_DIR/CRISPRpic.py -i AAVS1_input.txt -f AAVS1.out.extendedFrags.fastq -w 3
DOWNLOAD_DIR is where CRISPRpic.py was downloaded.
python CRISPRpic.py -i INPUT -f SEQFILE -w WINDOW
-i INPUT file contains the following information seperated by tab (\t):
- Locus name such as TP53
- Expected amplicon sequence
- guide RNA seq with PAM site (for enzyme_type 1 and 2) or break point from the 5' end amplicon (for 3)
- the type of enzyme - 1:SpCas9, 2:AsCpf1, 3:Custom
You can find an example input file in TEST/AAVS1_input.txt
-f SEQFILE is a fastq file of single-end sequencing data. If you have paired-end sequencing data, you can merge to single-end using a program called Fast Length Adjustment of SHort reads (FLASH: https://ccb.jhu.edu/software/FLASH/)
You can find an example file at TEST/AAVS1.out.extendedFrags.fastq
-w WINDOW is the size of the mutagenic window from the double strand break (DSB). -w 3 means that we only consider a mutation within 3 bp of both directions from the DSB while other mutations outside of this window will be considered as unmodified.
-d INDEX_SIZE (optional) is the starting size of index. Default is 8, but larger size such as 12 should be used for when the amplicon contains lots of homologous or low complexity sequences
Outpue files will be generated in a folder named after the locus of your interest such as TP53
- TP53_freq_table.txt: main table that shows the frequency of mutation types
- TP53_mut_freq.txt: a table contains the read sequences and their classification
- *.pdf: bar charts show the frequency of insertions / deletions
- all raw files are under intermediate_files
On Ubuntu or Debian Linux:
sudo apt-get install python-matplotlib python-pandas
On Mac OS X: Install conda (https://conda.io/docs/user-guide/install/macos.html)
conda install pandas matplotlib
On Windows 10 using Anaconda Prompt:
conda install pandas matplotlib
Anaconda Prompt can be installed on Window10 by the following instruction: https://docs.anaconda.com/anaconda/install/windows/
FLASH installation: https://ccb.jhu.edu/software/FLASH/