Here we describe CRISPR library designer (CLD), an integrated bioinformatics application for the design of custom single guide RNA (sgRNA) libraries for all organisms with annotated genomes. CLD is suitable for the design of libraries using modified CRISPR enzymes and targeting non-coding regions. To demonstrate its utility, we perform a pooled screen for modulators of the TNF-related apoptosis inducing ligand (TRAIL) pathway using a custom library of 12,471 sgRNAs.
A pre-built Docker image is also available on Docker Hub.
Install Docker to the point that docker run hello-world runs successfully, then use cld as described below, e.g.:
- To use the graphical interface:
  - Mac:
    - Install XQuartz: https://www.xquartz.org/
    - Start XQuartz and allow connections from your local IP address:
      open -a XQuartz
      IP=$(ifconfig en0 | grep inet | awk '$1=="inet" {print $2}')
      xhost + $IP
    - Adapt docker-compose.yaml, change to its folder and enter
      docker-compose up
      or enter
      docker run -e DISPLAY=$IP:0 -v ${PWD}:/data boutroslab/cld_docker cld_gui
  - Windows:
    - Install a command-line package manager for Windows: https://chocolatey.org/
    - Follow this guide to install the graphical interface manager for Windows: https://dev.to/darksmile92/run-gui-app-in-linux-docker-container-on-windows-host-4kde
    - Adapt docker-compose.yaml, change to its folder and enter
      docker-compose up
      or enter
      docker run -e DISPLAY=<your IP>:0.0 -v ${PWD}:/data boutroslab/cld_docker cld_gui
  - Linux with a graphical desktop:
    - Adapt docker-compose.yaml, change to its folder and enter
      docker-compose up
      or enter
      docker run -e DISPLAY=<your IP>:0.0 -v ${PWD}:/data boutroslab/cld_docker cld_gui
    - When working remotely, log into your remote server with X11 forwarding:
      ssh -X
    - adapt
- Download the database for your organism of interest:
docker run -v ${PWD}:/data boutroslab/cld_docker cld --task=make_database --output-dir=/data --organism homo_sapiens
- Enter its name in the reference organism field on the start page.
- Enter a list of gene identifiers in the "Gene List" tab and go to the "Design Parameter" tab to set your parameters.
- Go to the "Start Analysis" tab to start the sgRNA search.
- The results are written to the selected output directory ("Input/Output" tab).
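In the Mac instructions above, the IP address handed to xhost and DISPLAY is pulled out of the ifconfig output with awk. The sketch below isolates that filter so it can be tried on sample text first. The function name inet_addresses and the sample address are illustrative assumptions, not part of cld.

```shell
# Print the second field of every line whose first field is exactly "inet".
# "inet6" lines do not match, so only IPv4 addresses are printed.
inet_addresses() {
    awk '$1 == "inet" { print $2 }'
}

# Made-up ifconfig-style text, used only to show the filter at work.
sample='	inet6 fe80::1%en0 prefixlen 64 scopeid 0x4
	inet 192.168.1.23 netmask 0xffffff00 broadcast 192.168.1.255'

printf '%s\n' "$sample" | inet_addresses   # prints 192.168.1.23
```

Because awk already tests the first field, the extra grep inet in the original command is not strictly needed. On a real Mac the equivalent call would be IP=$(ifconfig en0 | inet_addresses), assuming en0 is the active interface.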
Command-Line-Start:
Install Docker to the point that docker run hello-world runs successfully, then use cld as described below, e.g.:
docker run -v ${PWD}:/data boutroslab/cld_docker cld --help
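The prerequisite stated above (a working docker run hello-world) can also be checked from a script before cld is called. This is a minimal sketch; the function name docker_ready is a hypothetical helper, not something shipped with cld.

```shell
# Hypothetical helper: succeed only if the docker CLI is available and
# the hello-world container can actually be run.
docker_ready() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found on PATH" >&2
        return 1
    fi
    # --rm removes the hello-world container again after it exits
    if ! docker run --rm hello-world >/dev/null 2>&1; then
        echo "docker run hello-world failed" >&2
        return 1
    fi
    echo "docker is ready"
}
```

A typical use would be docker_ready && docker run -v ${PWD}:/data boutroslab/cld_docker cld --help.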
cld can be called with "--version", which prints its version number and copyright information, with "--help", which prints more extensive help documentation, or with "--task".
NOTE
The output directory defaults to /data within the Docker container. You can override this by setting --output-dir to any other directory within the container. Make sure that you mount your local directory to the output directory when you run the Docker container, e.g. -v ${PWD}:/data
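To illustrate the note above, the following sketch prints (but does not run) a docker command in which the mounted container directory and --output-dir always agree. The helper name cld_command and the /results directory are assumptions made for this example; only the flags already shown in this README are used.

```shell
# Hypothetical helper: print a docker command whose -v mount target and
# --output-dir point at the same directory inside the container.
cld_command() {
    host_dir=$1        # local directory that should receive the results
    container_dir=$2   # directory inside the container, e.g. /data
    shift 2
    printf 'docker run -v %s:%s boutroslab/cld_docker cld --output-dir=%s' \
        "$host_dir" "$container_dir" "$container_dir"
    # Append any remaining cld options unchanged
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}
```

For example, cld_command "$PWD" /results --task=end_to_end prints a command that mounts the current directory at /results and passes --output-dir=/results. The helper does not quote its arguments, so it is only suitable for paths without spaces.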
EXAMPLE to execute from the path containing all needed files:
docker run -v ${PWD}:/data boutroslab/cld_docker cld --task=end_to_end --output-dir=/data --parameter-file=/data/params.txt --gene-list=/data/gene_list.txt
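Since the example above expects params.txt and gene_list.txt in the mounted directory, a small pre-flight check can catch a missing file before the container starts. This is a sketch only; check_cld_inputs is a hypothetical name, and the two file names are simply those used in the example command.

```shell
# Hypothetical pre-flight check: return non-zero if any file expected by
# the example command is missing from the given directory (default: ".").
check_cld_inputs() {
    dir=${1:-.}
    missing=0
    for f in params.txt gene_list.txt; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be chained in front of the run, e.g. check_cld_inputs "$PWD" && docker run -v ${PWD}:/data boutroslab/cld_docker cld --task=end_to_end --output-dir=/data --parameter-file=/data/params.txt --gene-list=/data/gene_list.txt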
cld can run two distinct tasks: database creation and library design.
Database creation is called using the --task=make_database command, giving the name of the organism of interest as it is denoted in Ensembl's FTP folder structure (e.g. homo_sapiens) and the rsync URL of the current Ensembl FTP server; examples can be found by calling cld --help. CLD will then automatically download the latest toplevel FASTA, GFF and GTF files for the organism of interest and compile a database containing bowtie indexes, mygff files and reformatted sequence files. If not enough computing power is available to the user, these databases can also be downloaded from here.
Library design can either be done in two steps:
cld --task=target_ident
and then
cld --task=library_assembly
if the user wants to separate the two steps, for example to identify target sites without compiling a clonable library, or in a single step:
cld --task=end_to_end
which automatically performs both steps and writes the end result to a user-defined output folder. To keep high-throughput design manageable, output files are kept as simple and standardised as possible. However, a genome-wide library targeting the human genome quickly spans several GB, depending on how strictly the parameters are chosen. Since the end_to_end task takes the most time, we benchmarked it at approximately 1 h wall time on an 8-core CPU node.
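To compare a run on your own hardware with the wall-time figure above, any command can be wrapped in a simple timer. The sketch below is not part of cld; timed is a hypothetical helper, and it assumes a date implementation that supports +%s (GNU and BSD/macOS date both do).

```shell
# Hypothetical helper: run a command, report its wall time in seconds on
# stderr, and pass the command's exit status through unchanged.
timed() {
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "wall time: $((end - start)) s" >&2
    return "$rc"
}
```

It would be used as a prefix, e.g. timed docker run -v ${PWD}:/data boutroslab/cld_docker cld --task=end_to_end --output-dir=/data --parameter-file=/data/params.txt --gene-list=/data/gene_list.txt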
When running cld from the command line, the syntax outlined in the MANUAL must be used.