CaVar (pronounced as ka-vee-a) is a CLI tool to check status of CRISPR-Cas off-target sites (OTs) from genomic variants. It utilizes output from an upstream OT finding tool, for example, Cas-Offinder and a VCF file to infer status of the OT sites. The following scoring scheme (Table 1) is used to report the status of the PAM (pm). To calculate the protospacer score (ps), a hamming-distance-based metric is used. If the number of mismatches between the supplied gRNA and the variant protospacer sequence of an off-target site is less than the maximum allowed value, the protospacer score (ps) is set to unity; otherwise, it is set to zero. Finally, the off-target score is calculated as: os := (ps × 10) + pm.
| old PAM | new PAM | Score (pm) |
|---|---|---|
| valid | valid | 0 |
| valid | valid (same) | 1 |
| invalid | valid | 2 |
| valid | invalid | 3 |
| invalid | invalid | 4 |
Table 1. Off-target PAM scoring scheme
Cavar requires Python >= 3.9 and depends on the following external libraries
- cyvcf2 == 0.31.4
- pysam == 0.23.3
To install with pip use
pip install cavarFor installation from the source use the following
git clone https://github.com/dasprosad/cavar.git
cd cavar
pip install .- Usage of reference genome
The reference genome must be the same which was used for variant calling. VCF specification states that CHROM field must not contain any white space so, the FASTA headers must be reformated to contain only the FASTA ID and not any other description.
- Finding off-target sites with Cas-Offinder
Cas-Offinder usage is listed on its GitHub project page. Use the header-reformatted reference genome for quering with Cas-Offinder. Set the maximum number tolerated mismatches according to your Cas system of choice leaving the PAM as "NNN". An example input file for Cas-Offinder with a maximum of 4 mismatches looks like the following.
/home/user/path/to/ref/hg38.fa
NNNNNNNNNNNNNNNNNNNNNN
ATGTTGATGATAGGATGATNNN 4
Run Cas-Offinder with
cas-offinder input.txt C casoffinder_out.tab- Converting Cas-Offinder result to BED
There are many ways of converting it to a BED file, but I have used the following awk oneliner
First add a BED header (optional)
echo -e "#BED3\n#CHROM\tSTART\tEND\tNAME\tSCORE\tSTRAND" >casoffinder_out.bedThen use awk to format to BED
awk '$1 !~ /^#/ {print $1, $4, $5}' OFS='\t' casoffinder_out.tab >>casoffinder_out.bed- For help use
cavar --help
usage: cavar [OPTIONS] <grna> <bed file> <vcf file>
cavar --help
Check status of CRISPR-Cas9 off-targets from genomic variants
positional arguments:
grna (STR) grna without the pam sequence
bedfile (PATH) path of bedfile
vcffile (PATH) path of vcffile
options:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-p, --pam-regex STR pam as regex (default: [ATCG]GG)
-d, --distance INT maximum number of mismatches in crRNA (default: 4)
-o, --outfile PATH name of outfile (default: outfile.bed)- Find off-target status of a gRNA with a given Cas9
- Find OTs of the gRNA with Cas9 across the reference genome
cas-offinder input.txt C casoffinder_ots.tab-
Convert the
casoffinder_ots.tabto BED -
Check status of those OTs with genomic variants
cavar --outfile p1_ots.bed ATGTTGATGATAGGATGAT casoffinder_ots.bed p1.vcf.gz- 1.0b1 First beta release.
If you have found any bugs or would like to request any new features please report it on CAVAR.
CaVar is distributed under the GNU Public License v3 (GPL3). You should have received a copy of the license with CaVar.
