CRISPRpic

A Python script for fast and precise analysis of CRISPR-induced mutations via prefixed index counting (CRISPRpic)

If this is your first time running Python, see the beginner manual for Mac (Manual_Beginner_Mac.pdf) or the one for Windows 10.

CRISPRpic runs on Python 2.7 or later. Download CRISPRpic.py and run it directly; the script itself needs no installation step. Put CRISPRpic.py on your PATH so it can be run from anywhere, or place the script in your working directory.

The script depends on two Python packages:

pandas
matplotlib

For dependency installation instructions, see the Dependencies section below.

Test

You can test CRISPRpic.py by running it on the example input files (AAVS1.out.extendedFrags.fastq and AAVS1_input.txt), located in the TEST/ directory.

Command for test:

cd TEST
python DOWNLOAD_DIR/CRISPRpic.py -i AAVS1_input.txt -f AAVS1.out.extendedFrags.fastq -w 3
DOWNLOAD_DIR is where CRISPRpic.py was downloaded.
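
If you prefer to launch the same run from a Python script (for example, to loop over several loci), the command can be assembled as an argument list and passed to `subprocess`. The sketch below is only an illustration: `build_command` and `run_crisprpic` are hypothetical helper names that are not part of CRISPRpic, and only the `-i`, `-f`, `-w`, and `-d` flags described in this README are used.

```python
import subprocess
import sys


def build_command(script, input_file, seq_file, window, index_size=None):
    """Assemble the CRISPRpic command line as a list of arguments.

    Hypothetical helper: only flags documented in this README are used.
    """
    cmd = [sys.executable, script,
           "-i", input_file, "-f", seq_file, "-w", str(window)]
    if index_size is not None:
        cmd += ["-d", str(index_size)]  # optional starting index size
    return cmd


def run_crisprpic(*args, **kwargs):
    """Run CRISPRpic; raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(build_command(*args, **kwargs))


if __name__ == "__main__":
    # Same arguments as the test command; adjust the script path to DOWNLOAD_DIR.
    print(" ".join(build_command("CRISPRpic.py", "AAVS1_input.txt",
                                 "AAVS1.out.extendedFrags.fastq", 3)))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file names contain spaces.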

Command line usage

python CRISPRpic.py -i INPUT -f SEQFILE -w WINDOW

-i INPUT is a file containing the following fields, separated by tabs (\t):

Locus name such as TP53
Expected amplicon sequence
Guide RNA sequence with PAM site (for enzyme types 1 and 2), or the break point position counted from the 5' end of the amplicon (for enzyme type 3)
Enzyme type: 1 = SpCas9, 2 = AsCpf1, 3 = Custom

You can find an example input file in TEST/AAVS1_input.txt
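
As a rough sketch of the four-field layout described above, an input line could be generated like this. `format_input_line` is a hypothetical helper, the sequences in the usage example are short placeholders rather than real loci, and the absence of a header row is an assumption; compare the result against TEST/AAVS1_input.txt before relying on it.

```python
# Enzyme type codes as listed in this README.
ENZYME_TYPES = {"SpCas9": 1, "AsCpf1": 2, "Custom": 3}


def format_input_line(locus, amplicon, target, enzyme):
    """Return one tab-separated input line (hypothetical helper).

    target is the guide sequence with PAM for SpCas9/AsCpf1, or the
    break point position from the 5' end of the amplicon for Custom.
    """
    fields = [locus, amplicon, str(target), str(ENZYME_TYPES[enzyme])]
    for field in fields:
        # A stray tab or newline would silently shift the columns.
        if "\t" in field or "\n" in field:
            raise ValueError("field contains a tab or newline: %r" % field)
    return "\t".join(fields)


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    line = format_input_line("MYLOCUS", "ACGTACGTACGGTTTT", "ACGTACGTACGG", "SpCas9")
    with open("MYLOCUS_input.txt", "w") as handle:
        handle.write(line + "\n")
```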

-f SEQFILE is a FASTQ file of single-end sequencing data. If you have paired-end sequencing data, you can merge the read pairs into single reads with Fast Length Adjustment of SHort reads (FLASH: https://ccb.jhu.edu/software/FLASH/).

You can find an example file at TEST/AAVS1.out.extendedFrags.fastq
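
Before a long run it can help to confirm that the FASTQ file is well formed and to count its reads. The following is a minimal sketch that assumes the common four-lines-per-record FASTQ layout; `read_fastq` is a hypothetical helper and is unrelated to how CRISPRpic itself parses the file.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record near %r" % header)
        yield header[1:].rstrip("\n"), seq, qual


if __name__ == "__main__":
    # In-memory demo; replace with open("AAVS1.out.extendedFrags.fastq").
    demo = io.StringIO(u"@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n")
    print(sum(1 for _ in read_fastq(demo)))
```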

-w WINDOW is the size of the mutagenic window around the double-strand break (DSB). With -w 3, only mutations within 3 bp of the DSB in either direction are counted; reads whose mutations all fall outside this window are classified as unmodified.
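
The window rule can be pictured as an interval overlap test. The sketch below is one possible reading of it, not CRISPRpic's actual code: the 0-based half-open coordinates, the meaning of `dsb` (index of the first base to the right of the cut), and the function name are all assumptions made for illustration.

```python
def in_mutagenic_window(mut_start, mut_end, dsb, window):
    """Return True if a mutation touches the window around the DSB.

    The mutation covers amplicon positions [mut_start, mut_end); an
    insertion has mut_start == mut_end and sits between two bases.
    The window covers [dsb - window, dsb + window). Assumed conventions.
    """
    window_start = dsb - window
    window_end = dsb + window
    if mut_start == mut_end:
        # Zero-length insertion: inside if its junction lies in the window.
        return window_start <= mut_start <= window_end
    # Deletion or substitution: standard half-open interval overlap.
    return mut_start < window_end and mut_end > window_start


if __name__ == "__main__":
    # With the DSB at position 50 and -w 3, the window is [47, 53).
    print(in_mutagenic_window(45, 48, 50, 3))  # deletion reaching into the window
    print(in_mutagenic_window(40, 47, 50, 3))  # deletion ending just before it
```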

-d INDEX_SIZE (optional) is the starting size of the index. The default is 8; use a larger size, such as 12, when the amplicon contains many homologous or low-complexity sequences.
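
One way to judge whether an amplicon is repetitive enough to warrant a larger index is to look for substrings of the chosen length that occur more than once. The sketch below only illustrates why a longer index is less ambiguous in repetitive sequence; how CRISPRpic actually builds and uses its index is not described here, and both function names are hypothetical.

```python
from collections import Counter


def repeated_kmers(sequence, k):
    """Return {kmer: count} for every k-mer seen more than once."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


def smallest_unique_k(sequence, start=8):
    """Return the smallest k >= start at which all k-mers are unique.

    Returns None if start is longer than the sequence.
    """
    for k in range(start, len(sequence) + 1):
        if not repeated_kmers(sequence, k):
            return k
    return None


if __name__ == "__main__":
    # Toy sequence: "ACG" occurs twice, so 3-mers are ambiguous but 4-mers are not.
    print(repeated_kmers("ACGACGT", 3))
    print(smallest_unique_k("ACGACGT", start=3))
```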

How to interpret outputs

Output files are written to a folder named after the locus of interest, such as TP53:

TP53_freq_table.txt: the main table, showing the frequency of each mutation type
TP53_mut_freq.txt: a table containing the read sequences and their classification
*.pdf: bar charts showing the frequency of insertions and deletions
All raw files are stored under intermediate_files
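
To pull the main table into a script, something like the following may work. It assumes the table is tab-delimited, which this README does not state, so inspect the file first; `freq_table_path` and `read_table` are hypothetical helpers that only follow the folder and file naming described above.

```python
import csv
import os


def freq_table_path(locus, base_dir="."):
    """Build the path <base_dir>/<locus>/<locus>_freq_table.txt."""
    return os.path.join(base_dir, locus, locus + "_freq_table.txt")


def read_table(path):
    """Read a tab-delimited text table into a list of rows (assumed format)."""
    with open(path) as handle:
        return [row for row in csv.reader(handle, delimiter="\t")]


if __name__ == "__main__":
    path = freq_table_path("TP53")
    if os.path.exists(path):
        for row in read_table(path):
            print(row)
```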

Dependencies

On Ubuntu or Debian Linux:

sudo apt-get install python-matplotlib python-pandas

On Mac OS X: Install conda (https://conda.io/docs/user-guide/install/macos.html)

conda install pandas matplotlib

On Windows 10 using Anaconda Prompt:

conda install pandas matplotlib

Anaconda Prompt can be installed on Windows 10 by following these instructions: https://docs.anaconda.com/anaconda/install/windows/

FLASH installation: https://ccb.jhu.edu/software/FLASH/
