These Python parser scripts summarize and genotype clones analysed with CRISPResso2, including those edited with multiple HDR templates. The scripts are based on earlier repositories of MATLAB code. They extract and join the relevant data from the results folders that CRISPResso2 generates for each sample.
(K. Clement et al. Nat. Biotechnology 2019)(https://github.com/pinellolab/CRISPResso2/)(http://crispresso2.pinellolab.org/)
An input folder containing the CRISPResso_on_··· subfolders generated by a CRISPResso2 analysis in basic mode. Python 2.7 with the openpyxl module.
Copy the CRISPYparser....py file into the folder containing the CRISPResso_on_··· subfolders generated by CRISPResso2.
Open a Terminal window or a Python development environment such as Spyder.
Run the script by typing "python CRISPResso2parser...py" in the Terminal window or by clicking the Run button.
Enter the input data requested by the script in the Terminal or in the environment's command window.
The script will generate an Excel (.xlsx) file summarizing your CRISPResso2 results.
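As an illustration of the folder layout the steps above rely on, the following minimal sketch lists the sample subfolders in an input folder. The function name `find_crispresso_folders` is hypothetical and is not taken from the scripts; the only detail drawn from this README is the `CRISPResso_on_` prefix.

```python
import os


def find_crispresso_folders(input_dir):
    """Return the sorted names of subfolders starting with 'CRISPResso_on_'.

    Hypothetical helper for illustration; the parser scripts may locate
    the sample folders differently.
    """
    prefix = "CRISPResso_on_"
    return sorted(
        name
        for name in os.listdir(input_dir)
        # Keep directories only, so files sharing the prefix are skipped.
        if name.startswith(prefix)
        and os.path.isdir(os.path.join(input_dir, name))
    )
```

Calling `find_crispresso_folders(".")` from the folder where the script was copied would return one entry per analysed sample.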
This script is useful when the results in the CRISPResso_on_··· subfolders correspond to samples of pooled cells from an editing experiment. The code generates an Excel file (POOL.xls) containing, for each sample (CRISPResso2 subfolder), the total number and percentage of reads that are aligned, modified, unmodified, carrying frameshift mutations (fs) or carrying in-frame mutations (if). The script reports NaN values for samples whose CRISPResso2 analysis was unsuccessful, and for the fs- and if-related values when the coding sequence was NOT provided to CRISPResso2. See the code comments and the Excel example for further details.
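The NaN behaviour described above can be sketched as follows. This is an assumption-laden illustration, not the script's code: the function names, the dictionary keys, and the choice of aligned reads as the denominator for every percentage are all hypothetical.

```python
NAN = float("nan")


def percent(count, total):
    """Return count as a percentage of total, or NaN if either is missing."""
    if count is None or not total:
        return NAN
    return 100.0 * count / total


def summarize_pool_sample(aligned, modified, unmodified,
                          frameshift=None, inframe=None):
    """Build one summary row for a pooled sample.

    Pass None for every count of a sample whose analysis failed, and None
    for frameshift/inframe when no coding sequence was given to CRISPResso2.
    Assumption: percentages are relative to the number of aligned reads.
    """
    return {
        "aligned": NAN if aligned is None else aligned,
        "pct_modified": percent(modified, aligned),
        "pct_unmodified": percent(unmodified, aligned),
        "pct_fs": percent(frameshift, aligned),
        "pct_if": percent(inframe, aligned),
    }
```

With this sketch, a sample analysed without a coding sequence keeps its modified and unmodified percentages while the fs and if columns come out as NaN.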
This script is useful when the results in the CRISPResso_on_··· subfolders correspond to samples of clonal lines derived from an editing experiment. The code generates an Excel file containing, for each sample (CRISPResso2 subfolder), the same values as CRISPResso2parser_pool.py plus the number, percentage and alignment to the wild-type (wt) reference of the top 3 alleles identified by CRISPResso2.
Based on these data and the "clone purity value", the script screens for pure clonal lines and determines their zygosity and corresponding genotype (x/x).
The "clone purity value" is defined as the minimum percentage required to assume a pure clone. The Allele1 percentage must be higher than this value for a clone to be classified as homozygous. The Allele1 and Allele2 percentages must both be higher than half of this value for a clone to be classified as heterozygous. Otherwise, the clone is classified as a non-pure genotyped clone.
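The purity rule can be written as a short decision function. This is a minimal sketch under one stated assumption: each of the two allele percentages is compared individually against half of the purity value. The function name and the returned labels are hypothetical.

```python
def classify_zygosity(allele1_pct, allele2_pct, purity):
    """Classify a clone from the percentages of its two most frequent alleles.

    purity is the "clone purity value" (a percentage, e.g. 90).
    Assumption: for a heterozygous call, Allele1 and Allele2 must each
    exceed purity / 2.
    """
    if allele1_pct > purity:
        return "homozygous"
    half = purity / 2.0
    if allele1_pct > half and allele2_pct > half:
        return "heterozygous"
    return "non-pure"
```

For example, with a purity value of 90, a clone with alleles at 48% and 47% would be called heterozygous, while one with alleles at 60% and 20% would be reported as non-pure.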
The script will ask you to enter this value and to state whether you provided a coding sequence (CDS) to CRISPResso2 (YES or NO). If the CDS was NOT provided, the possible genotypes in the CLONAL-woCDS.xlsx file are wt/wt, mut/wt and mut/mut, where mut means mutated. If the CDS was provided, the possible genotypes in the CLONAL-wCDS.xlsx file are wt/wt, if/if, fs/fs, fs/wt, if/wt or fs/if. See the code comments and the Excel example for further details.
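One way to arrive at the genotype labels listed above is to combine the zygosity call with the type of each allele. The sketch below is an assumption about how such labels could be assembled, not the script's implementation: the function name, the zygosity strings and the ordering table (chosen only so that the output matches the spellings fs/wt, if/wt, fs/if and mut/wt used in this README) are hypothetical.

```python
# Display order inferred from the genotype spellings listed in this README.
_RANK = {"fs": 0, "if": 1, "mut": 2, "wt": 3}


def genotype_label(zygosity, allele1_type, allele2_type=None):
    """Return a genotype string such as 'fs/wt', or None for non-pure clones.

    zygosity is 'homozygous', 'heterozygous' or 'non-pure' (hypothetical
    labels); allele types are 'wt', 'mut', 'fs' or 'if'.
    """
    if zygosity == "homozygous":
        return "{0}/{0}".format(allele1_type)
    if zygosity == "heterozygous":
        first, second = sorted([allele1_type, allele2_type],
                               key=lambda t: _RANK[t])
        return "{0}/{1}".format(first, second)
    return None  # non-pure clones are not assigned a genotype here
```

Note that in this sketch a heterozygous clone carrying two different frameshift alleles is labelled fs/fs, which is one plausible reading of the genotype list.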
This script is useful when the results in the CRISPResso_on_··· subfolders correspond to samples of clonal lines derived from an editing experiment for knock-in generation by homology-directed repair (HDR). The code generates an Excel file containing, for each sample (CRISPResso2 subfolder), the same data as CRISPResso2parser_CLONAL.py, but with both alleles genotyped in order to identify those generated by HDR. This script is compatible with experiments in which different HDR templates are used concurrently.
IMPORTANT NOTE: Even though the samples come from an experiment in which HDR templates were used, CRISPResso2 must be run in basic mode, as stated in the requirements. Ensure that the --plot_window_size parameter is large enough for your expected HDR modifications to be included in the Alleles_frequency_table_around_sgRNA_*.txt file.
In addition to the clone purity value and whether a CDS was provided, this script will ask you to enter:
- Your HDR template names, separated by a single space:
HDRa HDRb ...
- Your HDR template sequences, separated by a single space:
TAGATGGGTCTAGCTAGTCGACTAGGATACAGTCGATC TAGATGGGTCTAGCTAGTCGACAAGGATACAGTCGATC...
If the CDS was NOT provided, the possible allele genotypes in the HDR-woCDS.xlsx file are wt, mut, HDRa or HDRb. If the CDS was provided, the possible allele genotypes in the HDR-wCDS.xlsx file are wt, fs, if, HDRa or HDRb. See the code comments and the Excel example for further details.
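The two space-separated inputs above pair up by position, which the following sketch makes explicit. It also shows one possible way to label an allele as HDR: an exact substring match of the template sequence within the allele sequence. That matching rule, the gap handling and both function names are assumptions made for illustration; see the code comments in the script for how it actually identifies HDR alleles.

```python
def parse_hdr_templates(names_line, sequences_line):
    """Pair HDR template names with sequences, both given space-separated."""
    names = names_line.split()
    sequences = [seq.upper() for seq in sequences_line.split()]
    if len(names) != len(sequences):
        raise ValueError("Each HDR template name needs exactly one sequence.")
    # A list of pairs keeps the order in which the templates were entered.
    return list(zip(names, sequences))


def call_allele(allele_sequence, templates, fallback):
    """Return the name of the first template found in the allele.

    fallback is the label to use when no template matches
    (for example 'wt', 'mut', 'fs' or 'if').
    Assumption: an allele is HDR if it contains the template verbatim
    once alignment gap characters ('-') are removed.
    """
    sequence = allele_sequence.replace("-", "").upper()
    for name, template in templates:
        if template in sequence:
            return name
    return fallback
```

Under this matching rule the template sequence has to fit entirely inside the allele sequence reported by CRISPResso2, which is why the window size mentioned in the note above matters.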
- The CRISPResso2parser.py scripts summarize the results of a CRISPResso2 analysis and make them easier to visualize.
- The CRISPResso2parser_clonal.py script automatically identifies pure clones and determines their genotype.
- The CRISPResso2parser_clonal-HDR.py script automatically identifies pure clones edited with multiple HDR templates.
Overall, the CRISPResso2parser.py scripts further expand the applications of CRISPResso2 to high-throughput screening of genome-edited clonal lines.