ERIMIN

Detection of strain-specific sequences for primer design in microbial genomes.

ERIMIN is a command-line tool for microbial genomic analysis. It detects strain-specific genomic regions suitable for primer design by:

Aligning paired-end FASTQ reads to a reference genome using Bowtie2
Extracting and assembling unmapped reads with SPAdes
Keeping only contigs longer than 1000 bp
Performing BLASTn comparisons against a user-defined genome database
Detecting non-covered regions with Bedtools for candidate primer targets
Annotating candidate regions with Prokka
Detecting prophage/viral regions with PHASTEST
Designing primers with NCBI Primer-BLAST

flowchart TD
    A[FASTQ files + reference genome] -->|paired-end reads| B[Alignment - Bowtie2]
    B -->|extract unmapped reads| C[Assembly - SPAdes]
    C -->|filter contigs over 1000 bp| D[BLASTn comparison]
    D -->|coverage analysis| E[Non-covered regions - candidate targets]
    E -->|annotation| F[Prokka annotation]

    F --> Q{Run PHASTEST}
    Q -->|Yes| G[PHASTEST - prophage or viral detection]
    Q -->|No| H[Primer design - NCBI Primer-BLAST]
    G -->|optional filtering or flagging| H[Primer design - NCBI Primer-BLAST]

    H --> I[Candidate primer targets and suggested primer sets]
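
The first three stages of the flowchart can be sketched as the commands below. This is a minimal illustration, not the code in Erimin.sh: the function names, the file names (ref_index, aligned.sam, unmapped_R1.fastq, unmapped_R2.fastq), and the DRY_RUN switch are assumptions, and the script itself may extract unmapped reads differently (for example with Samtools). With DRY_RUN=1, the default here, the commands are only printed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the alignment -> unmapped reads -> assembly stages.
# File and function names are illustrative; they are not taken from Erimin.sh.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = also execute them

run() {
    # Print the command; execute it only when DRY_RUN=0.
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

align_and_assemble() {
    local ref="$1" r1="$2" r2="$3" threads="$4"

    # Build the Bowtie2 index for the reference genome.
    run mkdir -p reference
    run bowtie2-build "$ref" reference/ref_index

    # Align paired-end reads. --un-conc writes the pairs that did not align
    # concordantly; Bowtie2 replaces "%" with 1 and 2 for the two mates.
    run bowtie2 -p "$threads" -x reference/ref_index -1 "$r1" -2 "$r2" \
        --un-conc unmapped_R%.fastq -S aligned.sam

    # Assemble the unmapped pairs with SPAdes.
    run spades.py -1 unmapped_R1.fastq -2 unmapped_R2.fastq -t "$threads" -o assembly
}

align_and_assemble reference.fasta R1.fastq R2.fastq 4
```

Printing the commands first makes it easy to check paths and thread counts before committing to a long alignment and assembly run.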

Installation:

Clone the repository:

git clone https://github.com/Mtsif/Erimin.git

Requirements:

Make sure you have the following dependencies installed:

Bowtie2 (https://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
SPAdes (https://github.com/ablab/spades)
Samtools (http://www.htslib.org/download/)
Bedtools (https://bedtools.readthedocs.io/en/latest/content/installation.html)
Prokka (https://github.com/tseemann/prokka)
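
Before running the pipeline, it can help to confirm that each dependency is on your PATH. The check below is a small sketch; the executable names (bowtie2, spades.py, samtools, bedtools, prokka) are the usual ones for these tools and are an assumption about your installation.

```shell
#!/usr/bin/env bash
# Report which required executables are missing from PATH.
# Executable names are the usual ones for each tool (an assumption;
# adjust them if your installation differs).

missing_tools() {
    local tool
    for tool in "$@"; do
        # command -v succeeds only if the shell can find the tool.
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

missing="$(missing_tools bowtie2 spades.py samtools bedtools prokka)"
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names are joined on one line.
    echo "Missing dependencies:" $missing >&2
else
    echo "All dependencies found."
fi
```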

Internet access required

ERIMIN uses online services:

NCBI BLAST
PHASTEST
NCBI Primer-BLAST

Inputs description:

The basic inputs are:

A reference genome in FASTA format
Paired-end sequencing reads in FASTQ format:
- Forward reads (R1.fastq)
- Reverse reads (R2.fastq)
Organism metadata:
- Genus Name
- Species Name
- Strain Name
Number of primer pairs to return per sequence
Number of CPU threads
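
A pre-flight check on these inputs can catch typos in file paths before a long run starts. The function below is a hypothetical helper and is not part of Erimin.sh; it only checks that the three files exist and are non-empty and that the two numeric inputs are positive integers.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of Erimin.sh.

check_inputs() {
    local r1="$1" r2="$2" ref="$3" num_primers="$4" threads="$5"
    local status=0 item

    # Each input file must exist and be non-empty.
    for item in "$r1" "$r2" "$ref"; do
        if [ ! -s "$item" ]; then
            echo "ERROR: missing or empty file: $item" >&2
            status=1
        fi
    done

    # The primer-pair count and thread count must be positive integers.
    for item in "$num_primers" "$threads"; do
        case "$item" in
            ''|*[!0-9]*|0)
                echo "ERROR: not a positive integer: $item" >&2
                status=1
                ;;
        esac
    done

    return "$status"
}
```

For example, `check_inputs R1.fastq R2.fastq reference.fasta 5 8` returns a non-zero status and prints one message per problem if anything is wrong, so it can guard the pipeline call with `&&`.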

Example of usage:

bash Erimin.sh -i [R1.fastq R2.fastq] -ref [reference_file] -g [genus] -s [species] -str [strain] -n [num_primers] -t [threads]

This will create the following directories:

reference/          - Contains the indexed reference genome files.
assembly/           - Contains assembled contigs from unmapped reads.
BlastResults/       - Contains BLAST output files and results from coverage analysis.
prokka/             - Contains Prokka output files (annotation).
phastest/           - Contains PHASTEST output files (prophage/virus sequence detection).
primers/            - Contains the primer files in CSV format.
  ├── htmls/        - Primer-BLAST HTML reports.
  └── pairs_csv/    - Primer pair CSV files.

Primer-BLAST default parameters used by ERIMIN

ERIMIN submits candidate target sequences to NCBI Primer-BLAST with the following default settings (as implemented in run_primerblast()):

Primer design (Primer3) constraints

Primer length: PRIMER_MIN_SIZE=20, PRIMER_OPT_SIZE=22, PRIMER_MAX_SIZE=25
Melting temperature (Tm): PRIMER_MIN_TM=57, PRIMER_OPT_TM=60, PRIMER_MAX_TM=63
Max Tm difference between primers: PRIMER_MAX_DIFF_TM=1
GC content: PRIMER_MIN_GC=45, PRIMER_MAX_GC=55
Product size range: PRIMER_PRODUCT_MIN=70, PRIMER_PRODUCT_MAX=500
Number of primer pairs returned per target: PRIMER_NUM_RETURN=<user value via -n>
Low complexity filtering: LOW_COMPLEXITY_FILTER=1
Thermodynamic alignment: TH_OLIGO_ALIGNMENT=1, TH_TEMPLATE_ALIGNMENT=1

Specificity checking

Specificity database: PRIMER_SPECIFICITY_DATABASE=refseq_representative_genomes
Target organism restriction: ORGANISM=Bacteria (taxid:2)
Total specificity mismatch: TOTAL_PRIMER_SPECIFICITY_MISMATCH=5
3′-end specificity mismatch: PRIMER_3END_SPECIFICITY_MISMATCH=4
Mismatch region length: MISMATCH_REGION_LENGTH=5
Total mismatch ignore: TOTAL_MISMATCH_IGNORE=9
Mispriming library: PRIMER_MISPRIMING_LIBRARY=AUTO

To change any of these values, edit the corresponding -d / --data-urlencode fields inside the run_primerblast() function in Erimin.sh.
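
To illustrate how such KEY=VALUE fields map onto the defaults listed above, here is a small sketch that prints one --data-urlencode argument per field and lets the caller override individual values. The helper primerblast_fields is hypothetical and covers only a subset of the fields; it does not reproduce run_primerblast() and does not contact NCBI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one "--data-urlencode KEY=VALUE" line for each
# of a subset of the defaults listed above. Arguments of the form KEY=VALUE
# override the defaults. This is not the code of run_primerblast().

primerblast_fields() {
    local defaults field key value override
    defaults="PRIMER_MIN_SIZE=20 PRIMER_OPT_SIZE=22 PRIMER_MAX_SIZE=25
              PRIMER_MIN_TM=57 PRIMER_OPT_TM=60 PRIMER_MAX_TM=63
              PRIMER_MAX_DIFF_TM=1 PRIMER_MIN_GC=45 PRIMER_MAX_GC=55
              PRIMER_PRODUCT_MIN=70 PRIMER_PRODUCT_MAX=500"

    for field in $defaults; do
        key="${field%%=*}"
        value="${field#*=}"
        # Replace the default if the caller passed KEY=VALUE for this key.
        for override in "$@"; do
            if [ "${override%%=*}" = "$key" ]; then
                value="${override#*=}"
            fi
        done
        printf '%s %s=%s\n' --data-urlencode "$key" "$value"
    done
}

# Example: raise the optimal Tm and shorten the maximum product size.
primerblast_fields PRIMER_OPT_TM=62 PRIMER_PRODUCT_MAX=300
```

Keeping the defaults in one place like this is one way to make parameter changes visible in a single diff; editing the fields directly in Erimin.sh, as described above, achieves the same result.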

Running ERIMIN partially / customizing the workflow

ERIMIN is implemented as a transparent bash workflow. Users can:

Modify the script directly to adjust thresholds (e.g., contig-length cutoff, BLAST settings, Primer-BLAST settings, multiplex selection rules).
Run only specific stages by executing the relevant commands from the script (recommended for debugging or advanced customization).

Common “partial run” entry points:

Stop after assembly to inspect contigs: assembly/contigs.fasta, assembly/contigs_over1000.fasta
Run only the BLAST/coverage steps to produce candidate regions: BlastResults/noCov.txt, BlastResults/seq.txt
Run only annotation: Prokka on an existing FASTA
Run only primer design: call run_primerblast() on a chosen FASTA file
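
When inspecting the assembly on its own, the length filter can be reproduced with a few lines of awk. This is a sketch under assumptions, not the filter used in Erimin.sh: the function name filter_contigs is hypothetical, and the output writes each sequence on a single line.

```shell
#!/usr/bin/env bash
# Keep only FASTA records whose sequence is longer than min_len bases.
# Handles multi-line sequences; a sketch, not the filter used in Erimin.sh.

filter_contigs() {
    local fasta="$1" min_len="${2:-1000}"
    awk -v min="$min_len" '
        function emit() {
            # Print the buffered record if its sequence is long enough.
            if (header != "" && length(seq) > min) print header "\n" seq
        }
        /^>/ { emit(); header = $0; seq = ""; next }
             { seq = seq $0 }
        END  { emit() }
    ' "$fasta"
}
```

For example, `filter_contigs assembly/contigs.fasta 1000 > assembly/contigs_over1000.fasta` would regenerate the filtered file from an existing assembly.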

Tip: For iterative development, comment out sections you do not need, or add exit 0 after a stage to stop early.
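
An alternative to commenting out sections is to gate stages behind a variable. The sketch below shows the idea only; the stage names, the STOP_AFTER variable, and the stage_<name> functions are hypothetical and do not exist in Erimin.sh.

```shell
#!/usr/bin/env bash
# Hypothetical stage gating: run stages in order and stop after the one named
# in STOP_AFTER. Stage names and bodies are placeholders, not Erimin.sh code.

STAGES="align assemble blast annotate primers"

run_until() {
    local stop_after="$1" stage
    for stage in $STAGES; do
        echo "running stage: $stage"
        # Call stage_<name> if such a function has been defined.
        if command -v "stage_$stage" >/dev/null 2>&1; then
            "stage_$stage"
        fi
        if [ "$stage" = "$stop_after" ]; then
            echo "stopping after: $stage"
            return 0
        fi
    done
}

# Example stage body; the real commands for the stage would go here.
stage_assemble() { echo "  (assembly commands)"; }

run_until "${STOP_AFTER:-assemble}"
```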

Primer output files and naming conventions

ERIMIN generates Primer-BLAST outputs per target contig/sequence. File names are derived from the contig (or Prokka-split sequence) basename.

HTML reports

Location: primers/htmls/
Naming: primerblast_results_<base>.html

Primer tables

Location: primers/ (or primers/pairs_csv/ if configured)
Naming: primer_pairs_<base>.csv

Where <base> corresponds to the FASTA header used when splitting sequences into individual files (derived from prokka/<strain>.fna).
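
The relationship between FASTA headers and output names can be illustrated as follows. This is a sketch, not the splitting code in Erimin.sh: the function names and the split directory are hypothetical, and it assumes that the first word of each header is unique and safe to use as a file name.

```shell
#!/usr/bin/env bash
# Split a multi-FASTA into one file per record, named after the first word of
# each header, then derive the output names described above.
# Function names and directory layout are hypothetical.

split_fasta() {
    local fasta="$1" outdir="$2"
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ {
            if (out != "") close(out)
            name = substr($1, 2)          # first header word without ">"
            out = dir "/" name ".fasta"
        }
        out != "" { print > out }
    ' "$fasta"
}

output_names() {
    local base
    base="$(basename "$1" .fasta)"
    echo "primers/htmls/primerblast_results_${base}.html"
    echo "primers/primer_pairs_${base}.csv"
}
```

For a record with header `>NODE_4_length_3640_cov_46.758911`, `output_names` applied to the split file yields `primers/htmls/primerblast_results_NODE_4_length_3640_cov_46.758911.html` and `primers/primer_pairs_NODE_4_length_3640_cov_46.758911.csv`.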

If you export primers in FASTA format (optional), ERIMIN uses the following header convention:

><contig>_F for the forward primer sequence
><contig>_R for the reverse primer sequence

Example:

>NODE_4_length_3640_cov_46.758911_F
TCTGTACTCAACCACTTCGCTC
>NODE_4_length_3640_cov_46.758911_R
CCATTATTAGCGCCACACCAAG

Optional additional QC

The script also exports the selected primers as a FASTA file, which can be used for additional QC checks such as primer–dimer screening.
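
As one simple example of such a check, the sketch below reports the length and GC percentage of each primer in a FASTA file and flags primers outside the default length (20-25 nt) and GC (45-55%) ranges listed earlier. It is not a primer-dimer screen, the function name primer_stats is hypothetical, and it assumes each primer sequence occupies a single line.

```shell
#!/usr/bin/env bash
# Print name, length, GC percentage, and a flag for each primer in a FASTA
# file. Primers outside 20-25 nt or 45-55 % GC are flagged "CHECK".
# Assumes one sequence line per primer.

primer_stats() {
    awk '
        /^>/ { name = substr($0, 2); next }
        NF {
            seq = toupper($0)
            len = length(seq)
            gc = gsub(/[GC]/, "", seq)    # gsub returns the number of G/C bases
            pct = 100 * gc / len
            flag = (len < 20 || len > 25 || pct < 45 || pct > 55) ? "CHECK" : "OK"
            printf "%s\t%d\t%.1f\t%s\n", name, len, pct, flag
        }
    ' "$1"
}
```

Applied to the two example primers above, both are 22 nt long with 50.0% GC and are reported as OK.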

Citation

If you use Erimin in your research, please cite:

Tsifintaris M, Koutra P, Tsiartas P, Repanas P, Touliopoulos S, Nelios G, Anastasiadou A, Tamouridou G, Nikolaou A, Tsochantaridis I.
Erimin: A Pipeline to Identify Bacterial Strain Specific Primers.
DNA. 2026; 6(1):11. https://doi.org/10.3390/dna6010011
