This tool performs Shiga toxin sub-typing of Escherichia coli

This tool, implemented for use in a Galaxy instance, performs Shiga toxin sub-typing of Escherichia coli.

Requirements for the tool:
python 3.7
perl 5.26.2
perl-bioperl 1.7
blast 2.9
trimmomatic 0.39
spades 3.14
skesa 2.3
fastqc 0.11.9
muscle 3.8
duk
fastq-pair

The input can be of two types: raw reads (FASTQ, single-end or paired-end) or contigs (FASTA). For contigs, the tool simply performs a blastn search against the Shiga toxin subtype database (STSTDB) from the Statens Serum Institut (SSI) and the Technical University of Denmark (DTU). For raw reads, the tool first performs a quality assessment with FastQC and trimming with Trimmomatic, followed by several operations aimed at constructing stx consensus sequences, on which the final blastn search against the aforementioned STSTDB is performed (see flow chart below).
The tool performs four different assemblies:
1. SPAdes on all reads;
2. SKESA on all reads;
3. SPAdes on reads filtered using duk/fastq_pair against STSTDB;
4. SKESA on reads filtered using duk/fastq_pair against STSTDB.
On each assembly, a blastn search is performed against STSTDB, extracting the best matching contig with an e-value < 0.001 and an identity > 95%. The sequences from all four results are split into stx1 and stx2, and the four files of each type are concatenated, yielding two multi-FASTA files: stx1.fasta and stx2.fasta. The corresponding reference sequences from STSTDB are added to each file, and the two resulting files are aligned with MUSCLE. The reference sequences are then filtered out of the alignments before all possible consensus sequences are reconstructed. The stx1 and stx2 consensus sequences are combined into a single multi-FASTA file, which is queried with blastn against STSTDB, extracting the best matching sequence with an e-value < 0.001 and an identity > 95%. Furthermore, all sequences shorter than 1200 base pairs are filtered out.
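The hit selection described above can be illustrated with a minimal Python sketch. It is not the tool's actual code: the column order (qseqid, sseqid, pident, length, evalue), the `Hit` type and the function names are assumptions made for illustration, the 1200 bp cutoff is applied here to the alignment length, and "best matching" is interpreted as highest identity with ties broken by length.

```python
from typing import Iterable, List, NamedTuple, Optional


class Hit(NamedTuple):
    query: str     # contig or consensus sequence name
    subject: str   # STSTDB reference name
    pident: float  # percent identity
    length: int    # alignment length in base pairs
    evalue: float


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated rows in the assumed column order:
    qseqid, sseqid, pident, length, evalue."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, pident, length, evalue = line.split("\t")[:5]
        hits.append(Hit(query, subject, float(pident), int(length), float(evalue)))
    return hits


def best_hit(hits: Iterable[Hit],
             max_evalue: float = 0.001,
             min_pident: float = 95.0,
             min_length: int = 1200) -> Optional[Hit]:
    """Keep hits with e-value < 0.001, identity > 95% and length >= 1200 bp,
    then return the one with the highest identity (assumed ranking)."""
    kept = [h for h in hits
            if h.evalue < max_evalue
            and h.pident > min_pident
            and h.length >= min_length]
    if not kept:
        return None
    return max(kept, key=lambda h: (h.pident, h.length))
```

With rows such as `c1<TAB>stx2a<TAB>99.5<TAB>1241<TAB>0.0`, `best_hit(parse_hits(rows))` returns the surviving hit with the highest identity, or `None` when nothing passes the thresholds.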
The tool outputs a report with a table of all matching references and the corresponding values of the pident, length and positive parameters.
A summary lists the Shiga toxin subtypes that matched with an identity above 95% and up to 100%; for partial matches (identity below 100%), the identity value is given in parentheses.
The report also includes a link to the FastQC web page (two links for paired-end reads).
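As an illustration of the summary line, the following Python sketch formats a list of matched subtypes, adding the identity in parentheses only for partial matches. The function name `summarize`, the separator and the two-decimal formatting are assumptions and may differ from the tool's actual report.

```python
from typing import Dict, Iterable, Tuple


def summarize(matches: Iterable[Tuple[str, float]]) -> str:
    """Build a summary string from (subtype, pident) pairs.

    The pairs are assumed to have already passed the identity > 95% filter.
    For each subtype only the highest identity is reported.
    """
    best: Dict[str, float] = {}
    for subtype, pident in matches:
        if pident > best.get(subtype, 0.0):
            best[subtype] = pident

    parts = []
    for subtype in sorted(best):
        pident = best[subtype]
        if pident >= 100.0:
            parts.append(subtype)                      # full match: name only
        else:
            parts.append(f"{subtype}({pident:.2f})")   # partial match: add identity
    return "; ".join(parts)
```

For example, `summarize([("stx2a", 100.0), ("stx1a", 98.46)])` lists stx1a with its identity in parentheses and stx2a without one.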

flow chart of the tool


In the [GALAXY ROOT DIR]/tools directory:

git clone
cd shigatoxin-galaxy/scripts
chmod u+x duk
chmod u+x fastq_pair
chmod u+x stx_*

In the [GALAXY ROOT DIR]/tools/shigatoxin-galaxy/ file change the value of BASE_URL to that of your Galaxy instance


Add the following line to the file [GALAXY ROOT DIR]/config/tool_conf.xml:

<tool file="shigatoxin-galaxy/ecolishigatoxintyper.xml" />

Launch the tool from the Galaxy interface to install the dependencies (Galaxy should use conda to install them). The dependencies should be installed as an environment in the directory mulled-v1-5f29e6703a37b12366e0e56e69bb93dd1ad2ce2968cf54296eec5f4e127d9763

The first execution will fail with an error caused by Trimmomatic. To fix it, run:

cd [GALAXY ROOT DIR]/tool_dependency_dir/_conda/envs/mulled-v1-5f29e6703a37b12366e0e56e69bb93dd1ad2ce2968cf54296eec5f4e127d9763/bin
chmod 755 ../share/trimmomatic-0.39-1/trimmomatic.jar
ln -s ../share/trimmomatic-0.39-1/trimmomatic.jar trimmomatic.jar


