asqcan is a workflow pipeline for the automated assembly, quality control and annotation of bacterial genome sequences. Modern bacterial sequencing projects can involve a large number of isolates, and assembling each genome, running the necessary QC and annotating the result can be time-consuming. The asqcan pipeline seeks to automate this as much as possible. The steps asqcan currently takes are:
- Quality analysis of raw reads with FastQC
- Genome assembly with spades
- Quality analysis of assemblies with quast
- Contamination and quality analysis of assemblies with blobtools
- Annotation of assemblies using prokka
The asqcan pipeline runs these five steps on each .fastq or .fastq.gz reads file in the directory provided by the -q option. When asqcan completes, it generates a report on the success or failure of each step of the pipeline (asqcan_report.tsv). Successful steps are not rerun on subsequent executions: asqcan detects them and skips them.
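
To see which files in a reads directory would be picked up, the selection rule described above (files ending in .fastq or .fastq.gz) can be sketched in a few lines of Python. This is only an illustration of that rule, not asqcan's own code; the function name `find_read_files` is hypothetical.

```python
import os

# Extensions named in the text above.
READ_SUFFIXES = (".fastq", ".fastq.gz")


def find_read_files(reads_dir):
    """Return sorted paths of regular files in reads_dir ending in .fastq or .fastq.gz."""
    paths = []
    for name in os.listdir(reads_dir):
        path = os.path.join(reads_dir, name)
        # Skip subdirectories and anything without a matching extension.
        if os.path.isfile(path) and name.endswith(READ_SUFFIXES):
            paths.append(path)
    return sorted(paths)
```

Calling `find_read_files("reads/")` on your own directory (the path here is a placeholder) gives a quick check of the inputs before starting a long run.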
asqcan requires a Linux-based system and the following:
- python (2.7)
- GNU parallel (>=20170422)
- FastQC (>=0.11.7)
- spades (>=3.11.1)
- quast (>=4.6.3)
- blobtools (>=1.0)
- blast (>=2.7.1)
- prokka (>=1.13)
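
If you install the dependencies manually, a small check like the following may help confirm that each tool is reachable on your PATH. It is a sketch under several assumptions: the executable names (`spades.py`, `quast.py`, `blastn` and so on) are the usual ones for these tools but may differ on your system, the helper `missing_tools` is hypothetical, and the script does not check version numbers. It uses `shutil.which`, so it needs Python 3.3 or later and should be run separately from the Python 2.7 environment asqcan itself uses.

```python
import shutil

# Executable names are assumptions based on each tool's usual install;
# adjust them if your installation differs.
REQUIRED_TOOLS = {
    "GNU parallel": "parallel",
    "FastQC": "fastqc",
    "spades": "spades.py",
    "quast": "quast.py",
    "blobtools": "blobtools",
    "blast": "blastn",
    "prokka": "prokka",
}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the sorted names of tools whose executable is not found on PATH."""
    return sorted(name for name, exe in tools.items() if which(exe) is None)


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```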
To download and install asqcan with all dependencies, use conda:
conda install -c conda-forge -c bioconda asqcan
or pip (requires manual dependency installation):
pip install git+https://github.com/bogemad/asqcan.git
or a manual install (again, this requires you to install the dependencies manually):
git clone https://github.com/bogemad/asqcan.git
cd asqcan
python setup.py install
usage: asqcan [-h] -q READS_DIR -o OUTDIR [-b DB] [-t THREADS] [-f]
              [--version] [-v]

asqcan - A combined pipeline for bacterial genome ASsembly, Quality Control,
and ANnotation.

required arguments:
  -q READS_DIR, --fastq-dir READS_DIR
                        Path to a directory with your interleaved fastq files.
  -o OUTDIR, --output-directory OUTDIR
                        Path to the output directory. A directory will be
                        created if one does not exist.

optional arguments:
  -h, --help            show this help message and exit
  -b DB, --blast_database DB
                        Path to the local nt blast database. This pipeline
                        does not require you to have a local copy of the nt
                        database but without it you will not be able to use
                        similarity data for blobtools. Similarity data adds
                        significantly to the blobplot and blobtools table
                        outputs of this pipeline. See https://blast.ncbi.nlm.n
                        ih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=
                        Download to install a local nt database.
  -t THREADS, --threads THREADS
                        Number of threads to use for multiprocessing.
  -f, --force           Overwrite files in the output directories.
  --version             show program's version number and exit
  -v, --verbose         Increase verbosity on command line output (n.b.
                        verbose output is always saved to asqcan.log in the
                        output directory).
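
When driving asqcan from a script, the options listed above can be assembled into an argument list and passed to `subprocess`. The sketch below uses only the flags shown in the usage text; the function names `build_asqcan_command` and `run_asqcan` are hypothetical helpers, not part of asqcan, and any paths you pass in are your own.

```python
import subprocess


def build_asqcan_command(reads_dir, out_dir, blast_db=None, threads=None,
                         force=False, verbose=False):
    """Assemble an asqcan argument list from the options in the usage text."""
    cmd = ["asqcan", "-q", reads_dir, "-o", out_dir]
    if blast_db is not None:
        cmd += ["-b", blast_db]       # optional local nt blast database
    if threads is not None:
        cmd += ["-t", str(threads)]   # number of threads
    if force:
        cmd.append("-f")              # overwrite files in the output directories
    if verbose:
        cmd.append("-v")              # more output on the command line
    return cmd


def run_asqcan(*args, **kwargs):
    """Run asqcan (must be on PATH); raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(build_asqcan_command(*args, **kwargs))
```

For example, `build_asqcan_command("reads/", "asqcan_out/", threads=8, verbose=True)` produces the equivalent of `asqcan -q reads/ -o asqcan_out/ -t 8 -v`, where both directory names are placeholders.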