ERIMIN

Detection of strain-specific sequences for primer design in microbial genomes.


ERIMIN is a command-line tool for microbial genomic analysis. It detects strain-specific regions in microbial genomes that are suitable for primer design by:

  • Aligning paired-end FASTQ reads to a reference genome using Bowtie2
  • Extracting and assembling unmapped reads with SPAdes
  • Keeping only contigs longer than 1000 bp
  • Performing BLASTn comparisons against a user-defined genome database
  • Detecting regions not covered by BLAST hits with Bedtools, as candidate primer targets
  • Annotating candidate regions using Prokka
  • Detecting prophage/viral regions using PHASTEST
  • Designing primers using NCBI Primer-BLAST
flowchart TD
    A[FASTQ files + reference genome] -->|paired-end reads| B[Alignment - Bowtie2]
    B -->|extract unmapped reads| C[Assembly - SPAdes]
    C -->|filter contigs over 1000 bp| D[BLASTn comparison]
    D -->|coverage analysis| E[Non-covered regions - candidate targets]
    E -->|annotation| F[Prokka annotation]

    F --> Q{Run PHASTEST}
    Q -->|Yes| G[PHASTEST - prophage or viral detection]
    Q -->|No| H[Primer design - NCBI Primer-BLAST]
    G -->|optional filtering or flagging| H[Primer design - NCBI Primer-BLAST]

    H --> I[Candidate primer targets and suggested primer sets]
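
The contig-length filter in the workflow above can be sketched with awk. This is a minimal illustration, not the code in Erimin.sh: the `emit` helper and the generated test contigs are assumptions, and only the file names `contigs.fasta` and `contigs_over1000.fasta` come from this README.

```shell
# Sketch only: keep contigs longer than 1000 bp, handling multi-line FASTA records.
workdir=$(mktemp -d)
contigs="$workdir/contigs.fasta"
filtered="$workdir/contigs_over1000.fasta"

# Build a tiny test input: one 1200 bp contig and one 300 bp contig (60 bp per line).
{
  echo ">NODE_1_long"
  awk 'BEGIN { for (i = 0; i < 20; i++) { s = ""; for (j = 0; j < 60; j++) s = s "A"; print s } }'
  echo ">NODE_2_short"
  awk 'BEGIN { for (i = 0; i < 5; i++) { s = ""; for (j = 0; j < 60; j++) s = s "C"; print s } }'
} > "$contigs"

min_len=1000
awk -v min="$min_len" '
  # emit() prints the buffered record only if its sequence is longer than min.
  function emit() { if (header != "" && length(seq) > min) print header "\n" seq }
  /^>/ { emit(); header = $0; seq = ""; next }
  { seq = seq $0 }
  END { emit() }
' "$contigs" > "$filtered"

kept=$(grep -c "^>" "$filtered")
kept_names=$(grep "^>" "$filtered")
echo "contigs kept: $kept"
rm -rf "$workdir"
```

Buffering each record until the next header means the length test sees the whole contig even when the assembler wraps sequences over several lines.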

Installation:

Clone the repository:

git clone https://github.com/Mtsif/Erimin.git

Requirements:

Make sure the command-line tools the pipeline calls are installed and available on your PATH, including Bowtie2, SPAdes, Bedtools and Prokka.

Internet access required

ERIMIN uses online services:

  • NCBI BLAST
  • PHASTEST
  • NCBI Primer-BLAST

Inputs description:

The basic inputs are:

  • A reference genome in FASTA format

  • Paired-end sequencing reads in FASTQ format:

    • Forward reads (R1.fastq)
    • Reverse reads (R2.fastq)
  • Organism metadata:

    • Genus Name
    • Species Name
    • Strain Name
  • Number of primer pairs to return per sequence

  • Number of CPU threads
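
Before starting a long run, it can help to sanity-check the inputs listed above. The sketch below is one possible pre-flight check. `check_inputs` is a hypothetical helper that is not part of Erimin.sh, and it assumes uncompressed FASTQ/FASTA files.

```shell
# Sketch only: hypothetical pre-flight check for the three input files.
check_inputs() {
  # $1 = forward reads (FASTQ), $2 = reverse reads (FASTQ), $3 = reference (FASTA)
  for f in "$1" "$2" "$3"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty input: $f" >&2
      return 1
    fi
  done
  # FASTQ records start with '@'.
  for f in "$1" "$2"; do
    if [ "$(head -n 1 "$f" | cut -c1)" != "@" ]; then
      echo "$f does not look like FASTQ" >&2
      return 1
    fi
  done
  # FASTA records start with '>'.
  if [ "$(head -n 1 "$3" | cut -c1)" != ">" ]; then
    echo "$3 does not look like FASTA" >&2
    return 1
  fi
  # Paired files should contain the same number of lines.
  n1=$(( $(wc -l < "$1") ))
  n2=$(( $(wc -l < "$2") ))
  if [ "$n1" -ne "$n2" ]; then
    echo "R1 and R2 have different line counts" >&2
    return 1
  fi
  return 0
}

# Demo on throwaway files.
workdir=$(mktemp -d)
printf '@read1/1\nACGT\n+\nIIII\n' > "$workdir/R1.fastq"
printf '@read1/2\nTGCA\n+\nIIII\n' > "$workdir/R2.fastq"
printf '>ref\nACGTACGT\n' > "$workdir/ref.fasta"

if check_inputs "$workdir/R1.fastq" "$workdir/R2.fastq" "$workdir/ref.fasta"; then
  ok_status=0
else
  ok_status=1
fi
if check_inputs "$workdir/R1.fastq" "$workdir/missing.fastq" "$workdir/ref.fasta" 2>/dev/null; then
  bad_status=0
else
  bad_status=1
fi
echo "valid inputs: $ok_status, missing R2: $bad_status"
rm -rf "$workdir"
```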

Example of usage:

bash Erimin.sh -i [R1.fastq R2.fastq] -ref [reference_file] -g [genus] -s [species] -str [strain] -n [num_primers] -t [threads] 

This creates the following output directories:

reference/          - Contains the indexed reference genome files.
assembly/           - Contains contigs assembled from unmapped reads.
BlastResults/       - Contains BLAST output files and coverage analysis results.
prokka/             - Contains Prokka output files (annotation).
phastest/           - Contains PHASTEST output files (prophage/viral sequence detection).
primers/            - Contains the primer files in CSV format.
  ├── htmls/        - Primer-BLAST HTML reports.
  └── pairs_csv/    - Primer pair CSV files.

Primer-BLAST default parameters used by ERIMIN

ERIMIN submits candidate target sequences to NCBI Primer-BLAST with the following default settings (as implemented in run_primerblast()):

Primer design (Primer3) constraints

  • Primer length: PRIMER_MIN_SIZE=20, PRIMER_OPT_SIZE=22, PRIMER_MAX_SIZE=25
  • Melting temperature (Tm): PRIMER_MIN_TM=57, PRIMER_OPT_TM=60, PRIMER_MAX_TM=63
  • Max Tm difference between primers: PRIMER_MAX_DIFF_TM=1
  • GC content: PRIMER_MIN_GC=45, PRIMER_MAX_GC=55
  • Product size range: PRIMER_PRODUCT_MIN=70, PRIMER_PRODUCT_MAX=500
  • Number of primer pairs returned per target: PRIMER_NUM_RETURN=<user value via -n>
  • Low complexity filtering: LOW_COMPLEXITY_FILTER=1
  • Thermodynamic alignment: TH_OLIGO_ALIGNMENT=1, TH_TEMPLATE_ALIGNMENT=1

Specificity checking

  • Specificity database: PRIMER_SPECIFICITY_DATABASE=refseq_representative_genomes
  • Target organism restriction: ORGANISM=Bacteria (taxid:2)
  • Total specificity mismatch: TOTAL_PRIMER_SPECIFICITY_MISMATCH=5
  • 3′-end specificity mismatch: PRIMER_3END_SPECIFICITY_MISMATCH=4
  • Mismatch region length: MISMATCH_REGION_LENGTH=5
  • Total mismatch ignore: TOTAL_MISMATCH_IGNORE=9
  • Mispriming library: PRIMER_MISPRIMING_LIBRARY=AUTO

To change any of these values, edit the corresponding -d / --data-urlencode fields inside the run_primerblast() function in Erimin.sh.
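
As an illustration of what such fields look like, the sketch below collects a few of the parameters listed above as curl `--data-urlencode` arguments and prints them without sending anything. The variables `opt_tm` and `num_return` and the way the argument list is built are assumptions. The actual layout inside run_primerblast() may differ.

```shell
# Sketch only: build curl form fields for a handful of Primer-BLAST parameters.
opt_tm=60       # edit to override PRIMER_OPT_TM
num_return=5    # hypothetical stand-in for the value passed via -n

# Positional parameters serve as a portable argument list.
set -- \
  --data-urlencode "PRIMER_MIN_TM=57" \
  --data-urlencode "PRIMER_OPT_TM=$opt_tm" \
  --data-urlencode "PRIMER_MAX_TM=63" \
  --data-urlencode "PRIMER_NUM_RETURN=$num_return" \
  --data-urlencode "ORGANISM=Bacteria (taxid:2)"

# A real request would pass "$@" to curl together with the Primer-BLAST
# endpoint and the target sequence; that call is deliberately left out here.
fields=$(printf '%s\n' "$@")
n_fields=$(printf '%s\n' "$@" | grep -c '=')
echo "$fields"
echo "fields: $n_fields"
```

Values such as `Bacteria (taxid:2)` contain spaces and parentheses, which is why URL-encoding the fields matters.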

Running ERIMIN partially / customizing the workflow

ERIMIN is implemented as a transparent bash workflow. Users can:

  1. Modify the script directly to adjust thresholds (e.g., contig-length cutoff, BLAST settings, Primer-BLAST settings, multiplex selection rules).
  2. Run only specific stages by executing the relevant commands from the script (recommended for debugging or advanced customization).

Common “partial run” entry points:

  • Stop after assembly to inspect contigs: assembly/contigs.fasta, assembly/contigs_over1000.fasta
  • Run only the BLAST/coverage steps to produce candidate regions: BlastResults/noCov.txt, BlastResults/seq.txt
  • Run only annotation: Prokka on an existing FASTA
  • Run only primer design: call run_primerblast() on a chosen FASTA file

Tip: For iterative development, comment out sections you do not need, or add exit 0 after a stage to stop early.
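
The early-exit idea can be sketched as follows. `STOP_AFTER`, `run_stage` and the stage names are hypothetical and only illustrate the pattern; Erimin.sh does not necessarily define them.

```shell
# Sketch only: stop a staged workflow after a chosen stage.
run_stage() {
  echo "running stage: $1"
  # ... the real commands for this stage would go here ...
  if [ "$1" = "$STOP_AFTER" ]; then
    echo "stopping after $1"
    exit 0
  fi
}

STOP_AFTER=assembly

# The command substitution runs in a subshell, so 'exit 0' ends only the
# demo loop and not the calling shell.
log=$(
  for stage in alignment assembly blast annotation primers; do
    run_stage "$stage"
  done
)
echo "$log"
```

In the real script the same effect comes from placing `exit 0` directly after the last stage you want to run.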

Primer output files and naming conventions

ERIMIN generates Primer-BLAST outputs per target contig/sequence. File names are derived from the contig (or Prokka-split sequence) basename.

HTML reports

  • Location: primers/htmls/
  • Naming: primerblast_results_<base>.html

Primer tables

  • Location: primers/ (or primers/pairs_csv/ if configured)
  • Naming: primer_pairs_<base>.csv

Where <base> corresponds to the FASTA header used when splitting sequences into individual files (derived from prokka/<strain>.fna).
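
A minimal sketch of how per-sequence files and the resulting output names could be derived. It assumes that the base name is the first whitespace-delimited token of each FASTA header and that split files use a `.fasta` extension. Both are assumptions, not confirmed details of Erimin.sh.

```shell
# Sketch only: split a multi-FASTA into one file per record, then derive output names.
workdir=$(mktemp -d)
mkdir -p "$workdir/split"
cat > "$workdir/strain.fna" <<'EOF'
>NODE_4_length_3640_cov_46.758911 example description
ACGTACGT
>NODE_7_length_1502_cov_12.5
GGCCGGCC
EOF

awk -v outdir="$workdir/split" '
  /^>/ {
    if (out != "") close(out)        # avoid running out of file handles
    base = substr($1, 2)             # header token without the leading ">"
    out = outdir "/" base ".fasta"
  }
  { print > out }
' "$workdir/strain.fna"

names=$(
  for f in "$workdir"/split/*.fasta; do
    base=$(basename "$f" .fasta)
    echo "primers/htmls/primerblast_results_${base}.html"
    echo "primers/primer_pairs_${base}.csv"
  done
)
echo "$names"
rm -rf "$workdir"
```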

If you export primers in FASTA format (optional), ERIMIN uses the following header convention:

  • > <contig>_F for the forward primer sequence
  • > <contig>_R for the reverse primer sequence

Example:

>NODE_4_length_3640_cov_46.758911_F
TCTGTACTCAACCACTTCGCTC
>NODE_4_length_3640_cov_46.758911_R
CCATTATTAGCGCCACACCAAG

Optional additional QC

The script also exports the selected primers as a FASTA file, which can be used for additional QC checks such as primer–dimer screening.
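
One simple check that can be run on such a FASTA file is sketched below: it recomputes primer length and GC content and compares them with the default constraints listed earlier (20 to 25 nt, 45 to 55 % GC). This is not a primer-dimer screen, and it assumes one sequence line per record. The primers are the example pair shown above.

```shell
# Sketch only: length and GC% check against the default Primer-BLAST constraints.
workdir=$(mktemp -d)
cat > "$workdir/primers.fasta" <<'EOF'
>NODE_4_length_3640_cov_46.758911_F
TCTGTACTCAACCACTTCGCTC
>NODE_4_length_3640_cov_46.758911_R
CCATTATTAGCGCCACACCAAG
EOF

qc=$(awk '
  /^>/ { name = substr($0, 2); next }
  {
    seq = toupper($0)
    len = length(seq)
    gc = gsub(/[GC]/, "", seq)       # gsub returns the number of G/C bases removed
    pct = 100 * gc / len
    status = (len >= 20 && len <= 25 && pct >= 45 && pct <= 55) ? "PASS" : "FAIL"
    printf "%s\t%d\t%.1f\t%s\n", name, len, pct, status
  }
' "$workdir/primers.fasta")
echo "$qc"
n_pass=$(echo "$qc" | grep -c "PASS")
rm -rf "$workdir"
```

Each output line holds the primer name, its length, its GC percentage and a PASS/FAIL flag, separated by tabs.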

Citation

If you use Erimin in your research, please cite:

Tsifintaris M, Koutra P, Tsiartas P, Repanas P, Touliopoulos S, Nelios G, Anastasiadou A, Tamouridou G, Nikolaou A, Tsochantaridis I.
Erimin: A Pipeline to Identify Bacterial Strain Specific Primers.
DNA. 2026; 6(1):11. https://doi.org/10.3390/dna6010011

