GetPrimers command line manual

codeatcg edited this page May 5, 2022 · 6 revisions

Installation

A 64-bit Linux operating system, GCC (gcc and g++), gzip and Perl (>= 5.8) are required. Run the following commands to install GetPrimers and the third-party software:

git clone https://github.com/codeatcg/GetPrimers.git
cd GetPrimers
sh install.sh
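
Before running the installer, it can help to confirm that the prerequisites listed above are available. The following is a minimal sketch, not part of GetPrimers; the helper names `missing_tools` and `perl_is_recent` are hypothetical and only illustrate one way to check.

```shell
# Hypothetical helpers (not shipped with GetPrimers).

# Print every required tool that is not found on PATH.
# Tool names are passed as arguments; nothing is printed when all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeed only if the installed perl is at least version 5.8.
perl_is_recent() {
    perl -e 'exit($] >= 5.008 ? 0 : 1)'
}
```

For example, `missing_tools gcc g++ gzip perl git` prints one line per tool that still needs to be installed, and `perl_is_recent || echo "perl >= 5.8 is required"` reports a Perl that is too old.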

Third-party software

WindowMasker
BLAST+
primer3

Workflow of GetPrimers

For the short primer strategy, GetPrimers uses WindowMasker to mask repeat regions in the genome and then extracts gene sequences based on the GFF file. Primer3 is used to generate raw primers. After preliminary quality control, the primers are mapped to the genome with blastn, and product sizes are predicted from the alignment information. The primers are then filtered further. Next, all combinations of upstream and downstream primers are evaluated and graded. Finally, the specified number of gene targeting primer sets is output.

For the long primer strategy, GetPrimers likewise uses WindowMasker to mask repeat regions in the genome and then extracts gene sequences based on the GFF file. The default primer size is 65 bp, with 45 bp of homology to the yeast genome and 20 bp of homology to the plasmid. Users can change the primer size by modifying the config file. The primer section that is homologous to the yeast genome is then aligned to the source genome with blastn, and the hit counts are saved.
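
The 45 bp + 20 bp layout of a long primer can be illustrated with a small sketch. This is not how GetPrimers builds primers internally. The function name `build_long_primer` is hypothetical, and placing the genome-homologous arm before the plasmid-homologous part is an assumption made for illustration.

```shell
# Hypothetical helper: join a genome-homologous arm and a plasmid-homologous
# part into one long primer (arm first; this order is an assumption).
# Arguments: arm sequence, plasmid sequence, expected arm length (default 45),
# expected plasmid part length (default 20).
# Prints the primer, or fails when a length does not match.
build_long_primer() {
    arm=$1
    plasmid=$2
    arm_len=${3:-45}
    plasmid_len=${4:-20}
    [ "${#arm}" -eq "$arm_len" ] || { echo "arm must be $arm_len bp" >&2; return 1; }
    [ "${#plasmid}" -eq "$plasmid_len" ] || { echo "plasmid part must be $plasmid_len bp" >&2; return 1; }
    printf '%s%s\n' "$arm" "$plasmid"
}
```

With the defaults, the printed primer is 65 bp long; passing other lengths as the third and fourth arguments mirrors changing the primer size in the config file.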

  • pipeline of short primer strategy

Schematic of short and long primer strategy

Short primer strategy

  • Knockout
  • C' tagging
  • N' tagging

Long primer strategy

  • Knockout
  • C' tagging
  • N' tagging

Verification primers

  • Knockout
  • C' tagging
  • N' tagging

Rating criteria

For the short primer strategy, the candidate gene targeting primers are graded based on a series of criteria. When the ‘--blast’ option is set, in silico PCR amplification is run and the number of probable products is used as one of the criteria for grading the primer sets. Here, a targeting primer set is composed of primers G1, G3, G4 and G2. A verification primer set is composed of V1, V3 or V4, V2 for knockout and V1, V2 for C’ tagging and N’ tagging.

  • in silico PCR
  • rating criteria for gene targeting primers
  • rating criteria for verification primers

Options

--ctag                  design primers for adding tags at the C terminus
--ntag                  design primers for adding tags at the N terminus
--knockout              design primers for gene knockout
--force                 force output of G3 and G4 primers even if they don't meet the design criteria
--refine                refine primer design
--sscodon               primers for gene knockout contain start codon and stop codon
--mode        [string]  method for primer design, by default: short (long | short)
--coordinate  [string]  coordinate of target gene, any location in the gene. format: chr:coordinate
--coordList   [file]    list of coordinates of target genes. one line per gene
--geneName    [string]  name of target gene, name can be ID, GeneName or DbxrefID
--geneList    [file]    list of names of target genes. one line per gene
--all                   design primers for all genes
--gff         [file]    GFF file
--genome      [file]    reference genome file
--mask                  mask repetitive elements of genome
--marker      [file]    selection marker or insertion sequence
--rec                   redesign common sequence of primers, which is homologous to the insertion cassette (section of P1 and P2, V5, V6)
--outDir      [dir]     output directory
--config      [file]    config file (in most cases there's no need to modify the file)
--nump        [int]     maximum number of primers to output, by default 5
--bundle      [int]     when designing primers for many genes, process them in bundles of this many genes, by default 1000
--blast                 run blast
--thread      [int]     number of threads
--pcon        [float]   primer concentration, by default 50 nM (unit nM)
--salt        [float]   concentration of monovalent cation, by default 50 mM (unit mM)
--dsalt       [float]   concentration of divalent cation, by default 1.5 mM (unit mM)
--dntp        [float]   concentration of dNTP, by default 0.6 mM (unit mM)
--thermo                use thermodynamic models for oligo-oligo interactions and hairpins

Input

Genome file and GFF file

Both plain-text and gzip-compressed files are supported. When using files downloaded from Ensembl or NCBI, make sure that the genome file and the GFF file come from the same database.
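
One symptom of mixing databases is that the sequence names differ (for example `I` in Ensembl versus `NC_001133.9` in NCBI). The sketch below lists GFF sequence names that do not occur in the genome file. It is an illustration only; all function names are hypothetical, and it assumes FASTA headers start with `>` followed by the sequence name and that GFF feature lines have nine tab-separated columns.

```shell
# Hypothetical helpers (not part of GetPrimers).

# Print a file's contents, decompressing it first when the name ends in .gz.
read_maybe_gz() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac
}

# List sequence names from FASTA headers (text after '>' up to the first blank).
fasta_ids() {
    read_maybe_gz "$1" | awk '/^>/ { sub(/^>/, "", $1); print $1 }' | sort -u
}

# List sequence names from column 1 of a GFF file, skipping comment lines.
gff_ids() {
    read_maybe_gz "$1" | awk -F '\t' '!/^#/ && NF >= 9 { print $1 }' | sort -u
}

# Print GFF sequence names that have no matching FASTA header.
# $1 = genome file, $2 = GFF file
gff_ids_missing_from_genome() {
    ids_file=$(mktemp) || return 1
    fasta_ids "$1" > "$ids_file"
    gff_ids "$2" | awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } !($0 in seen)' "$ids_file" -
    rm -f "$ids_file"
}
```

If `gff_ids_missing_from_genome genome.fa.gz annotation.gff3.gz` prints nothing, every sequence name used in the GFF file was found in the genome file.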

Marker file

The marker file contains a DNA sequence from a plasmid, which serves as the insertion cassette. Several insertion cassettes are available in the ‘plasmid’ directory.

Config File

In most situations, there is no need to modify the config file ‘config.txt’ in the script directory; the program looks for it there automatically. You can also specify a different file path with ‘--config’. If you want to change the default values, please read the comment lines in ‘config.txt’ carefully.

Gene Name

When the genome file and GFF file are downloaded from NCBI, gene names, annotation IDs and gene IDs are all supported. For example:

  • Annotation ID List:
    YAL067C
    YAL064C-A

  • Gene Name List:
    SEO1
    TDA8

  • Gene ID List:
    851230
    851234

When the genome file and GFF file are downloaded from Ensembl, gene names and annotation IDs are supported, but gene IDs are not.

Gene Location

The format is (Chromosome name):(Coordinate). The coordinate can be any position within the gene. For example:

  • NCBI
    NC_001133.9:7236
    NC_001133.9:13364

  • Ensembl
    I:7236
    I:13364
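
A coordinate list can be sanity-checked before it is passed to ‘--coordList’. The sketch below assumes a coordinate is a chromosome name without colons or blanks, a colon, and a positive integer; GetPrimers itself may accept or reject other forms. The function names are hypothetical.

```shell
# Hypothetical helpers (not part of GetPrimers).

# Succeed when the argument looks like <chromosome>:<positive integer>.
is_valid_coord() {
    printf '%s\n' "$1" | grep -E '^[^:[:space:]]+:[1-9][0-9]*$' >/dev/null
}

# Print the line numbers of malformed entries in a coordinate list
# (one coordinate per line). Prints nothing when every line is well formed.
bad_coord_lines() {
    grep -E -v -n '^[^:[:space:]]+:[1-9][0-9]*$' "$1" | cut -d: -f1
}
```

Here `is_valid_coord I:7236` and `is_valid_coord NC_001133.9:13364` succeed, while `is_valid_coord I-7236` fails.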

Output

Four sub-directories are created automatically in the output directory. Intermediate files are saved in the directories 'primer_blast', 'primer_para' and 'primer_raw'. The final results are saved in the directory 'primer_result'.
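
After a run, a quick check that all four sub-directories exist can catch a failed or interrupted job. This is a minimal sketch; the function name `missing_output_dirs` is hypothetical, and the directory names are the four named above.

```shell
# Hypothetical helper (not part of GetPrimers).
# Print the expected sub-directories that are missing from an output directory.
# $1 = path given to --outDir
missing_output_dirs() {
    for sub in primer_blast primer_para primer_raw primer_result; do
        [ -d "$1/$sub" ] || printf '%s\n' "$sub"
    done
}
```

An empty result from `missing_output_dirs all_knockout` means all four sub-directories are present; it says nothing about whether the files inside them are complete.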

Example

Download test data

wget -c http://ftp.ensemblgenomes.org/pub/fungi/release-53/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz
wget -c http://ftp.ensemblgenomes.org/pub/fungi/release-53/gff3/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz

Run the program

GetPrimers can automatically design gene targeting primers for all genes in the genome, for a list of genes, or for a single gene. Genes can be specified by name or by location. Please use absolute file paths when running the tests. The default value of the option '--mode' is 'short'. To use the long primer strategy, add '--mode long' to the following command lines.
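
Since absolute paths are recommended, a small helper can turn a relative file name into an absolute one before it is placed on the command line. This is a sketch only; `abs_path` is a hypothetical name, and it simply prefixes the current working directory without resolving symbolic links or checking that the file exists.

```shell
# Hypothetical helper (not part of GetPrimers).
# Print an absolute version of a path without requiring the file to exist.
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;            # already absolute
        *)  printf '%s/%s\n' "$PWD" "$1" ;;  # relative to the current directory
    esac
}
```

For example, `--genome "$(abs_path Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz)"` passes the genome file by its absolute path.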

Design primers for all genes in the genome

knockout

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_knockout.fa --outDir all_knockout --mask --all --thread 10 --knockout --nump 10 --blast --force --refine --thermo

C’ tagging

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_ctag.fa --outDir all_ctag --mask --all --thread 10 --ctag --nump 10 --blast --force --refine --thermo

N’ tagging

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_GFP_ntag.fa --outDir all_ntag --mask --all --thread 10 --ntag --nump 10 --blast --force --refine --thermo

Design primers for a list of genes

  • Gene coordinate list

knockout

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_knockout.fa --outDir coord_list_knockout --mask --coordList coord1.list --thread 10 --knockout --nump 10 --blast --force --refine --thermo

C’ tagging

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_ctag.fa --outDir coord_list_ctag --mask --coordList coord1.list --thread 10 --ctag --nump 10 --blast --force --refine --thermo

N’ tagging

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_GFP_ntag.fa --outDir coord_list_ntag --mask --coordList coord1.list --thread 10 --ntag --nump 10 --blast --force --refine --thermo

  • Gene Name list

knockout

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_knockout.fa --outDir name_list_knockout --mask --geneList geneName1.list --thread 10 --knockout --nump 10 --blast --force --refine --thermo

C’ tagging

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_ctag.fa --outDir name_list_ctag --mask --geneList geneName1.list --thread 10 --ctag --nump 10 --blast --force --refine --thermo

N’ tagging

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_GFP_ntag.fa --outDir name_list_ntag --mask --geneList geneName1.list --thread 10 --ntag --nump 10 --blast --force --refine --thermo

Design primers for one gene

  • Gene coordinate

knockout

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_knockout.fa --outDir coord_one_knockout --mask --coordinate I:7236 --thread 10 --knockout --nump 10 --blast --force --refine --thermo

C’ tagging

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_ctag.fa --outDir coord_one_ctag --mask --coordinate I:7236 --thread 10 --ctag --nump 10 --blast --force --refine --thermo

N’ tagging

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_GFP_ntag.fa --outDir coord_one_ntag --mask --coordinate I:7236 --thread 10 --ntag --nump 10 --blast --force --refine --thermo

  • Gene name

knockout

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_knockout.fa --outDir name_one_knockout --mask --geneName SEO1 --thread 10 --knockout --nump 10 --blast --force --refine --thermo

C’ tagging

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_ctag.fa --outDir name_one_ctag --mask --geneName SEO1 --thread 10 --ctag --nump 10 --blast --force --refine --thermo

N’ tagging

perl mainFun.pl --genome Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz --gff Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz --marker pFA6_GFP_ntag.fa --outDir name_one_ntag --mask --geneName SEO1 --thread 10 --ntag --nump 10 --blast --force --refine --thermo
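
The single-gene commands above differ only in the design type, the marker file and the output directory, so they can be generated in a loop. The sketch below only prints the command lines and does not run them. The function names `marker_for` and `print_commands` are hypothetical, the marker file names are the ones used in the examples, and passing '--mode' explicitly is a choice made here for clarity.

```shell
# Hypothetical wrapper (not part of GetPrimers); prints commands, runs nothing.
GENOME=Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz
GFF=Saccharomyces_cerevisiae.R64-1-1.53.gff3.gz

# Map a design type to the marker file used in the examples.
marker_for() {
    case "$1" in
        knockout) echo pFA6_knockout.fa ;;
        ctag)     echo pFA6_ctag.fa ;;
        ntag)     echo pFA6_GFP_ntag.fa ;;
        *)        return 1 ;;
    esac
}

# Print one command line per design type for a single gene.
# $1 = gene name, $2 = mode (short or long; defaults to short)
print_commands() {
    gene=$1
    mode=${2:-short}
    for design in knockout ctag ntag; do
        printf 'perl mainFun.pl --genome %s --gff %s --marker %s --outDir %s --mask --geneName %s --mode %s --thread 10 --%s --nump 10 --blast --force --refine --thermo\n' \
            "$GENOME" "$GFF" "$(marker_for "$design")" "name_one_$design" "$gene" "$mode" "$design"
    done
}
```

Calling `print_commands SEO1` prints three command lines (knockout, C’ tagging, N’ tagging) for the short primer strategy; `print_commands SEO1 long` prints the same three with '--mode long'. The printed lines can be reviewed and then pasted into a terminal.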