Skip to content
SeqSero2
Python Shell
Branch: master
Clone or download
Latest commit 9de7cc0 Oct 3, 2019
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
__pycache__ shaoting bioconda branch Sep 30, 2019
bin Update SeqSero2_package.py Sep 30, 2019
seqsero2_db shaoting bioconda branch Sep 30, 2019
LICENSE updated by Shaokang Zhang on June 29th 2017 Jun 29, 2017
MANIFEST.in shaoting bioconda branch Sep 30, 2019
README.md Update README.md Oct 4, 2019
setup.py shaoting bioconda branch Sep 30, 2019
version.py shaoting bioconda branch Sep 30, 2019

README.md

SeqSero2 v1.0.2

Salmonella serotype prediction from genome sequencing data.

Online version: http://www.denglab.info/SeqSero2

Introduction

SeqSero2 is a pipeline for Salmonella serotype prediction from raw sequencing reads or genome assemblies

Dependencies

SeqSero2 has three workflows:

(A) Allele micro-assembly (default). This workflow takes raw reads as input and performs targeted assembly of serotype determinant alleles. Assembled alleles are used to predict serotype and flag potential inter-serotype contamination in sequencing data (i.e., presence of reads from multiple serotypes due to, for example, cross or carryover contamination during sequencing).

Allele micro-assembly workflow depends on:

  1. Python 3;

  2. Biopython 1.73;

  3. Burrows-Wheeler Aligner v0.7.12;

  4. Samtools v1.8;

  5. NCBI BLAST v2.2.28+;

  6. SRA Toolkit v2.8.0;

  7. SPAdes v3.9.0;

  8. Bedtools v2.17.0;

  9. SalmID v0.11.

(B) Raw reads k-mer. This workflow takes raw reads as input and performs rapid serotype prediction based on unique k-mers of serotype determinants.

Raw reads k-mer workflow (originally SeqSeroK) depends on:

  1. Python 3;
  2. SRA Toolkit (optional, just used to fastq-dump sra files);

(C) Genome assembly k-mer. This workflow takes genome assemblies as input and the rest of the workflow largely overlaps with the raw reads k-mer workflow

Installation

Conda

To install the latest SeqSero2 Conda package (recommended):

conda install -c bioconda seqsero2=1.0.2

Git

To install the SeqSero2 git repository locally:

git clone https://github.com/denglab/SeqSero2.git
cd SeqSero2
python3 -m pip install --user .

Other options

Third party SeqSero2 installations (may not be the latest version of SeqSero2):
https://github.com/B-UMMI/docker-images/tree/master/seqsero2
https://github.com/denglab/SeqSero2/issues/13

Executing the code

Make sure all SeqSero2 and its dependency executables are added to your path (e.g. to ~/.bashrc). Then type SeqSero2_package.py to get detailed instructions.

Usage: SeqSero2_package.py 

-m <string> (which workflow to apply, 'a'(raw reads allele micro-assembly), 'k'(raw reads and genome assembly k-mer), default=a)

-t <string> (input data type, '1' for interleaved paired-end reads, '2' for separated paired-end reads, '3' for single reads, '4' for genome assembly, '5' for nanopore fasta, '6'for nanopore fastq)

-i <file> (/path/to/input/file)

-p <int> (number of threads for allele mode, if p >4, only 4 threads will be used for assembly since the amount of extracted reads is small, default=1) 

-b <string> (algorithms for bwa mapping for allele mode; 'mem' for mem, 'sam' for samse/sampe; default=mem; optional; for now we only optimized for default "mem" mode)

-d <string> (output directory name, if not set, the output directory would be 'SeqSero_result_'+time stamp+one random number)

-c <flag> (if '-c' was flagged, SeqSero2 will only output serotype prediction without the directory containing log files)

--check <flag> (use '--check' flag to check the required dependencies)

-v, --version (show program's version number and exit)

Examples

Allele mode:

# Allele workflow ("-m a", default), for separated paired-end raw reads ("-t 2"), use 10 threads in mapping and assembly ("-p 10")
SeqSero2_package.py -p 10 -t 2 -i R1.fastq.gz R2.fastq.gz

K-mer mode:

# Raw reads k-mer ("-m k"), for separated paired-end raw reads ("-t 2")
SeqSero2_package.py -m k -t 2 -i R1.fastq.gz R2.fastq.gz

# Genome assembly k-mer ("-t 4", genome assemblies only predicted by the k-mer workflow, "-m k")
SeqSero2_package.py -m k -t 4 -i assembly.fasta

Output

Upon executing the command, a directory named 'SeqSero_result_Time_your_run' will be created. Your result will be stored in 'SeqSero_result.txt' in that directory. And the assembled alleles can also be found in the directory if using "-m a" (allele mode).

Citation

Zhang S, Den-Bakker HC, Li S, Dinsmore BA, Lane C, Lauer AC, Fields PI, Deng X. SeqSero2: rapid and improved Salmonella serotype determination using whole genome sequencing data. **Appl Environ Microbiology. 2019 Sep 20. pii: AEM.01746-19. doi: 10.1128/AEM.01746-19. Epub ahead of print

Zhang S, Yin Y, Jones MB, Zhang Z, Deatherage Kaiser BL, Dinsmore BA, Fitzgerald C, Fields PI, Deng X.
Salmonella serotype determination utilizing high-throughput genome sequencing data.
J Clin Microbiol. 2015 May;53(5):1685-92.PMID:25762776

You can’t perform that action at this time.