Detection of strain-specific sequences for primer design in microbial genomes.
ERIMIN is a command-line tool for microbial genome analysis. It detects strain-specific genomic regions that are suitable as primer targets by:
- Aligning paired-end FASTQ reads to a reference genome with Bowtie2
- Extracting the unmapped reads and assembling them with SPAdes
- Keeping only contigs longer than 1000 bp
- Comparing the contigs with BLASTn against a user-defined genome database
- Detecting regions not covered by BLAST hits with Bedtools, which become candidate primer targets
- Annotating the candidate regions with Prokka
- Optionally detecting prophage/viral regions with PHASTEST
- Designing primers with NCBI Primer-BLAST
flowchart TD
A[FASTQ files + reference genome] -->|paired-end reads| B[Alignment - Bowtie2]
B -->|extract unmapped reads| C[Assembly - SPAdes]
C -->|filter contigs over 1000 bp| D[BLASTn comparison]
D -->|coverage analysis| E[Non-covered regions - candidate targets]
E -->|annotation| F[Prokka annotation]
F --> Q{Run PHASTEST}
Q -->|Yes| G[PHASTEST - prophage or viral detection]
Q -->|No| H[Primer design - NCBI Primer-BLAST]
G -->|optional filtering or flagging| H
H --> I[Candidate primer targets and suggested primer sets]
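To make the "non-covered regions" idea concrete, here is a minimal sketch of what that step computes. It is not the code in Erimin.sh, which uses Bedtools for this. It assumes BLAST hit coordinates on a single contig, given as 1-based, inclusive "start end" pairs (for example the qstart/qend columns of tabular BLAST output), plus the contig length, and prints the stretches that no hit covers. The function name uncovered_regions is hypothetical.

```bash
# Hypothetical helper (not part of Erimin.sh).
# Usage: uncovered_regions CONTIG_LENGTH < hits.txt
#   hits.txt: one "start end" pair per line, 1-based and inclusive, any order.
# Output: uncovered intervals as "start end", 1-based and inclusive.
uncovered_regions() {
  local len="$1"
  # Swap start/end if an interval is reversed, then sort by start position.
  awk '{ s = $1 + 0; e = $2 + 0; if (s > e) { t = s; s = e; e = t }; print s, e }' |
    sort -k1,1n -k2,2n |
    awk -v len="$len" '
      BEGIN { next_free = 1 }                         # first base not yet covered
      {
        if ($1 > next_free) print next_free, $1 - 1   # gap before this hit
        if ($2 + 1 > next_free) next_free = $2 + 1    # extend the covered prefix
      }
      END { if (next_free <= len) print next_free, len }  # trailing gap, if any
    '
}
```

In Bedtools terms this is roughly a merge of the hit intervals followed by a complement against the contig length; note that BED coordinates are 0-based and half-open, unlike the 1-based coordinates used in this sketch.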
Clone the repository:
git clone https://github.com/Mtsif/Erimin.git
Make sure you have the following dependencies installed:
- Bowtie2 (https://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
- SPAdes (https://github.com/ablab/spades)
- Samtools (http://www.htslib.org/download/)
- Bedtools (https://bedtools.readthedocs.io/en/latest/content/installation.html)
- Prokka (https://github.com/tseemann/prokka)
ERIMIN also relies on the following online services, so an internet connection is required:
- NCBI BLAST
- PHASTEST
- NCBI Primer-BLAST
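Before a first run, a quick check that the local command-line tools listed above are on your PATH can save a failed job. The helper below is a hypothetical sketch, not part of ERIMIN; it only tests that each named executable can be found, not which version is installed.

```bash
# Hypothetical pre-flight check (not part of Erimin.sh).
# Prints "missing: <tool>" to stderr for every executable not found on PATH.
# Returns 0 only if all tools were found.
check_deps() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (the usual executable names for the tools above; curl is included on
# the assumption that the online steps use it; adjust names if yours differ):
#   check_deps bowtie2 bowtie2-build spades.py samtools bedtools prokka curl || exit 1
```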
The basic inputs are:
- A reference genome in FASTA format (-ref)
- Paired-end sequencing reads in FASTQ format (-i):
  - Forward reads (R1.fastq)
  - Reverse reads (R2.fastq)
- Organism metadata:
  - Genus name (-g)
  - Species name (-s)
  - Strain name (-str)
- Number of primer pairs to return per sequence (-n)
- Number of CPU threads (-t)
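Mismatched or truncated read files are a common cause of alignment failures. The hypothetical helper below (not part of Erimin.sh) sketches a quick sanity check of the R1/R2 pair, assuming uncompressed FASTQ with four lines per record. It prints the number of read pairs, or returns non-zero with a message on stderr.

```bash
# Hypothetical input check (not part of Erimin.sh).
# Usage: check_fastq_pair R1.fastq R2.fastq
check_fastq_pair() {
  local r1="$1" r2="$2" f n1 n2
  for f in "$r1" "$r2"; do
    # Each file must exist, be non-empty and start with a FASTQ "@" header.
    if [ ! -s "$f" ] || [ "$(head -c 1 "$f")" != "@" ]; then
      echo "not a FASTQ file (or empty): $f" >&2
      return 1
    fi
  done
  # Wrapping in $(( )) strips the padding some wc implementations add.
  n1=$(( $(wc -l < "$r1") ))
  n2=$(( $(wc -l < "$r2") ))
  if (( n1 % 4 != 0 || n2 % 4 != 0 )); then
    echo "line count is not a multiple of 4 (multi-line FASTQ?)" >&2
    return 1
  fi
  if (( n1 != n2 )); then
    echo "R1 has $(( n1 / 4 )) reads but R2 has $(( n2 / 4 ))" >&2
    return 1
  fi
  echo "$(( n1 / 4 ))"   # number of read pairs
}
```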
Run the pipeline with:
bash Erimin.sh -i [R1.fastq R2.fastq] -ref [reference_file] -g [genus] -s [species] -str [strain] -n [num_primers] -t [threads]
This creates the following output directories:
reference/ - Indexed reference genome files
assembly/ - Contigs assembled from the unmapped reads
BlastResults/ - BLAST output files and coverage-analysis results
prokka/ - Prokka annotation output
phastest/ - PHASTEST output (prophage/viral sequence detection)
primers/ - Primer-BLAST results
├── htmls/ - Primer-BLAST HTML reports
└── pairs_csv/ - Primer-pair CSV files
ERIMIN submits candidate target sequences to NCBI Primer-BLAST with the following default settings (as implemented in run_primerblast()):
Primer design (Primer3) constraints
- Primer length (nt): PRIMER_MIN_SIZE=20, PRIMER_OPT_SIZE=22, PRIMER_MAX_SIZE=25
- Melting temperature, Tm (°C): PRIMER_MIN_TM=57, PRIMER_OPT_TM=60, PRIMER_MAX_TM=63
- Maximum Tm difference between primers (°C): PRIMER_MAX_DIFF_TM=1
- GC content (%): PRIMER_MIN_GC=45, PRIMER_MAX_GC=55
- Product size range (bp): PRIMER_PRODUCT_MIN=70, PRIMER_PRODUCT_MAX=500
- Number of primer pairs returned per target: PRIMER_NUM_RETURN=<user value via -n>
- Low-complexity filtering: LOW_COMPLEXITY_FILTER=1
- Thermodynamic alignment: TH_OLIGO_ALIGNMENT=1, TH_TEMPLATE_ALIGNMENT=1
Specificity checking
- Specificity database: PRIMER_SPECIFICITY_DATABASE=refseq_representative_genomes
- Target organism restriction: ORGANISM=Bacteria (taxid:2)
- Total specificity mismatch: TOTAL_PRIMER_SPECIFICITY_MISMATCH=5
- 3′-end specificity mismatch: PRIMER_3END_SPECIFICITY_MISMATCH=4
- Mismatch region length: MISMATCH_REGION_LENGTH=5
- Total mismatch ignore: TOTAL_MISMATCH_IGNORE=9
- Mispriming library: PRIMER_MISPRIMING_LIBRARY=AUTO
To change any of these values, edit the corresponding -d / --data-urlencode fields inside the run_primerblast() function in Erimin.sh.
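If you adjust these settings often, it can help to keep them in one array. The sketch below is hypothetical and does not reproduce run_primerblast(); it only shows the pattern of turning the default fields listed above into curl --data-urlencode arguments, printed one per line, with PRIMER_NUM_RETURN taken from the -n value. It does not contact NCBI, and the sequence field and endpoint are deliberately left out because their exact names are defined in Erimin.sh.

```bash
# Hypothetical sketch (not the actual run_primerblast()).
# Usage: build_primerblast_args NUM_RETURN   -> prints curl arguments, one per line
build_primerblast_args() {
  local num_return="$1" field
  # Accept only a positive integer, like the -n option expects.
  [[ $num_return =~ ^[1-9][0-9]*$ ]] || { echo "invalid -n: $num_return" >&2; return 1; }
  # Field names and values copied from the defaults listed above.
  local -a fields=(
    "PRIMER_MIN_SIZE=20" "PRIMER_OPT_SIZE=22" "PRIMER_MAX_SIZE=25"
    "PRIMER_MIN_TM=57" "PRIMER_OPT_TM=60" "PRIMER_MAX_TM=63"
    "PRIMER_MAX_DIFF_TM=1"
    "PRIMER_MIN_GC=45" "PRIMER_MAX_GC=55"
    "PRIMER_PRODUCT_MIN=70" "PRIMER_PRODUCT_MAX=500"
    "PRIMER_NUM_RETURN=${num_return}"
    "LOW_COMPLEXITY_FILTER=1"
    "TH_OLIGO_ALIGNMENT=1" "TH_TEMPLATE_ALIGNMENT=1"
    "PRIMER_SPECIFICITY_DATABASE=refseq_representative_genomes"
    "ORGANISM=Bacteria (taxid:2)"
    "TOTAL_PRIMER_SPECIFICITY_MISMATCH=5"
    "PRIMER_3END_SPECIFICITY_MISMATCH=4"
    "MISMATCH_REGION_LENGTH=5"
    "TOTAL_MISMATCH_IGNORE=9"
    "PRIMER_MISPRIMING_LIBRARY=AUTO"
  )
  for field in "${fields[@]}"; do
    printf '%s\n' "--data-urlencode" "$field"
  done
}

# Example (PRIMERBLAST_URL is a placeholder for the endpoint run_primerblast()
# posts to; the sequence field(s) it sends must be added as well):
#   mapfile -t args < <(build_primerblast_args 5)
#   curl -s "${args[@]}" "$PRIMERBLAST_URL"
```

Using --data-urlencode for every field, rather than plain -d, keeps values containing spaces or parentheses, such as the ORGANISM entry, correctly encoded.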
ERIMIN is implemented as a plain bash script (Erimin.sh) whose steps can be read and edited directly. Users can:
- Modify the script directly to adjust thresholds (e.g., contig-length cutoff, BLAST settings, Primer-BLAST settings, multiplex selection rules).
- Run only specific stages by executing the relevant commands from the script (recommended for debugging or advanced customization).
Common “partial run” entry points:
- Stop after assembly to inspect contigs (assembly/contigs.fasta, assembly/contigs_over1000.fasta)
- Run only the BLAST/coverage steps to produce candidate regions (BlastResults/noCov.txt, BlastResults/seq.txt)
- Run only annotation: run Prokka on an existing FASTA file
- Run only primer design: call run_primerblast() on a chosen FASTA file
Tip: For iterative development, comment out sections you do not need, or add exit 0 after a stage to stop early.
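For the "stop after assembly" case, the contig length filter can be re-run on its own, for example to try a different cutoff. A minimal awk sketch, assuming the cutoff is strict (longer than 1000 bp, as stated in the overview); Erimin.sh may implement the filter differently, and the function name filter_contigs is hypothetical. Length is computed from the sequence lines rather than parsed from the SPAdes header, so the sketch also works on FASTA files from other assemblers.

```bash
# Hypothetical length filter (not the code in Erimin.sh).
# Reads multi-line FASTA on stdin; writes records whose sequence length is
# strictly greater than MIN_LEN (default 1000), keeping the original line wrapping.
filter_contigs() {
  local min_len="${1:-1000}"
  awk -v min="$min_len" '
    # Emit the buffered record if its total sequence length exceeds min.
    function flush() {
      if (hdr != "" && len > min) printf "%s\n%s", hdr, body
    }
    /^>/ { flush(); hdr = $0; body = ""; len = 0; next }
    { body = body $0 "\n"; len += length($0) }
    END { flush() }
  '
}

# Example:
#   filter_contigs 1000 < assembly/contigs.fasta > assembly/contigs_over1000.fasta
```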
ERIMIN generates Primer-BLAST outputs per target contig/sequence. File names are derived from the contig (or Prokka-split sequence) basename.
HTML reports
- Location: primers/htmls/
- Naming: primerblast_results_<base>.html

Primer tables
- Location: primers/ (or primers/pairs_csv/, if configured)
- Naming: primer_pairs_<base>.csv
Here, <base> is the FASTA header used when splitting sequences into individual files (derived from prokka/<strain>.fna).
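The per-sequence naming depends on splitting a multi-FASTA such as prokka/<strain>.fna into one file per record. Below is a hypothetical sketch of that split, not the code in Erimin.sh. Each record is written to OUTDIR/<base>.fasta, where <base> is the first word of the header without the leading ">"; as an added safety measure (an assumption, not documented ERIMIN behaviour), characters other than letters, digits, '.', '_' and '-' are replaced by '_'.

```bash
# Hypothetical FASTA splitter (not part of Erimin.sh).
# Usage: split_fasta INPUT.fna OUTDIR   -> prints each <base> it writes
split_fasta() {
  local in="$1" outdir="$2"
  mkdir -p "$outdir"
  awk -v dir="$outdir" '
    /^>/ {
      if (out != "") close(out)              # finish the previous record
      base = substr($1, 2)                   # first word of the header, minus ">"
      gsub(/[^A-Za-z0-9._-]/, "_", base)     # keep file names shell-safe
      out = dir "/" base ".fasta"
      print base                             # report the <base> on stdout
      print $0 > out                         # full header line goes to the file
      next
    }
    out != "" { print $0 > out }             # sequence lines of the current record
  ' "$in"
}
```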
When primers are exported in FASTA format, ERIMIN uses the following header convention:
- ><contig>_F for the forward primer sequence
- ><contig>_R for the reverse primer sequence
Example:
>NODE_4_length_3640_cov_46.758911_F
TCTGTACTCAACCACTTCGCTC
>NODE_4_length_3640_cov_46.758911_R
CCATTATTAGCGCCACACCAAG
The script also exports the selected primers as a FASTA file, which can be used for additional QC checks such as primer–dimer screening.
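The CSV-to-FASTA conversion can be sketched in a few lines of awk. The column layout of primer_pairs_<base>.csv is not documented here, so this hypothetical helper (not part of Erimin.sh) takes the forward and reverse primer column numbers as arguments. It assumes a header row and simple comma-separated fields with no quoted commas, and exports only the first data row as a single <contig>_F / <contig>_R pair.

```bash
# Hypothetical exporter (not part of Erimin.sh).
# Usage: pairs_csv_to_fasta PAIRS.csv CONTIG_NAME FWD_COLUMN REV_COLUMN
pairs_csv_to_fasta() {
  local csv="$1" contig="$2" fcol="$3" rcol="$4"
  awk -F',' -v c="$contig" -v f="$fcol" -v r="$rcol" '
    NR == 2 {                                # first data row after the header
      gsub(/[" \r]/, "", $f)                 # strip quotes, spaces, CR
      gsub(/[" \r]/, "", $r)
      printf ">%s_F\n%s\n>%s_R\n%s\n", c, $f, c, $r
      exit
    }
  ' "$csv"
}
```

The resulting file can be passed to a primer-dimer checker or any other FASTA-based QC tool.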
If you use Erimin in your research, please cite:
Tsifintaris M, Koutra P, Tsiartas P, Repanas P, Touliopoulos S, Nelios G, Anastasiadou A, Tamouridou G, Nikolaou A, Tsochantaridis I.
Erimin: A Pipeline to Identify Bacterial Strain Specific Primers.
DNA. 2026; 6(1):11. https://doi.org/10.3390/dna6010011
