extract_promoters

Rim Gubaev, 2018

extract_promoters.sh extracts promoter regions of the required length. It works well with NCBI data (i.e. FASTA and GFF files) and is particularly suited to organisms with poorly annotated genomes, which are represented mainly by large sets of contigs.

Note: the script works correctly with GFF files from NCBI because, in NCBI annotations, the start position of a gene corresponds to its transcription start site.
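The promoter is the region immediately upstream of that start position, so its coordinates depend on the gene's strand. The Python sketch below is not the script's own code (the script delegates this to bedtools) and the function name `promoter_interval` is hypothetical. It assumes that "start" means the gene's 5' end, which is the GFF start column on the plus strand and the GFF end column on the minus strand. Its output interval is in BED coordinates (0-based, half-open) and is clipped to the contig, which matters for short contigs where a gene sits near an edge.

```python
def promoter_interval(start, end, strand, length, chrom_size):
    """Return a BED-style (0-based, half-open) promoter interval, or None.

    start/end are the gene's 1-based, inclusive GFF coordinates.
    Assumption: the TSS is the GFF start on '+' and the GFF end on '-'.
    The result is clipped to [0, chrom_size), so genes near a contig
    edge get a shorter promoter; None means nothing is left.
    """
    if strand == "+":
        # Upstream lies to the left; the gene's BED start is start - 1.
        bed_start, bed_end = start - 1 - length, start - 1
    elif strand == "-":
        # Upstream lies to the right; the gene's BED end equals the GFF end.
        bed_start, bed_end = end, end + length
    else:
        raise ValueError(f"unsupported strand: {strand!r}")
    # Clip to the boundaries of the contig or chromosome.
    bed_start = max(bed_start, 0)
    bed_end = min(bed_end, chrom_size)
    if bed_start >= bed_end:
        return None
    return bed_start, bed_end
```

For a gene at GFF 101..200 and a 50 bp promoter, this gives (50, 100) on the plus strand and (200, 250) on the minus strand; on a 220 bp contig the minus-strand promoter is clipped to (200, 220).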

Input files

  1. multi-FASTA file with genomic sequences (e.g. contigs or chromosomes)
  2. GFF file for the above multi-FASTA file
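A GFF file is tab-separated with nine columns (seqid, source, type, start, end, score, strand, phase, attributes) and uses 1-based, inclusive coordinates. The following sketch shows how gene records could be selected from it and converted to 0-based, half-open BED rows, roughly what genes.gff and genes.bed contain. It is an illustration, not the script's implementation: `gff_genes_to_bed` and `format_bed` are hypothetical names, and using the `ID` attribute as the BED name is an assumption.

```python
def gff_genes_to_bed(gff_lines):
    """Yield BED6 tuples (chrom, start, end, name, score, strand) for 'gene' rows."""
    for line in gff_lines:
        # Skip blank lines, comments and directives such as '##gff-version 3'.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        chrom, _src, _type, start, end, _score, strand, _phase, attrs = fields
        # Assumption: use the ID attribute as the BED name, else '.'.
        name = "."
        for attr in attrs.split(";"):
            key, sep, value = attr.partition("=")
            if sep and key == "ID":
                name = value
                break
        # GFF is 1-based inclusive; BED is 0-based half-open.
        yield chrom, int(start) - 1, int(end), name, 0, strand


def format_bed(rows):
    """Render BED6 tuples as tab-separated lines."""
    return "".join("\t".join(map(str, row)) + "\n" for row in rows)
```

A gene annotated at 101..200 on contig1 becomes the BED row `contig1  100  200  gene-ABC1  0  -`; non-gene features such as `region` lines are skipped.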

Output files

  1. genes.gff - reduced GFF file containing only the gene location records
  2. genes.bed - BED-formatted version of genes.gff. BED format details are here
  3. FAI index file generated by samtools. FAI format details are here
  4. sizes.chr - table of chromosome sizes
  5. promoters.bed - BED file with the promoter location of each gene
  6. promoters.fa - multi-FASTA file with the promoter sequences
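The .fai index written by `samtools faidx` lists one sequence per line, starting with the sequence name and its length, so a chromosome-size table such as sizes.chr can be taken from its first two columns. The sketch below illustrates that step and the final one, turning promoter intervals into FASTA records. It is not the script's code: the function names and the FASTA header layout are hypothetical, and reverse-complementing minus-strand promoters (so every record reads 5' to 3' towards its gene) is an assumption about the desired output.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def chrom_sizes_from_fai(fai_lines):
    """Return {name: length} from the first two columns of a .fai index."""
    sizes = {}
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def promoter_fasta(genome, bed_rows, width=60):
    """Build multi-FASTA text from BED6 promoter rows.

    genome maps sequence names to sequences; BED coordinates are
    0-based, half-open, so they slice the Python string directly.
    """
    out = []
    for chrom, start, end, name, _score, strand in bed_rows:
        seq = genome[chrom][start:end]
        if strand == "-":
            # Reverse complement so the sequence reads 5' -> 3'.
            seq = seq.translate(COMPLEMENT)[::-1]
        out.append(f">{name} {chrom}:{start}-{end}({strand})")
        # Wrap the sequence at a fixed line width.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

With `genome = {"contig1": "AAAACCCCGGGGTTTTACGT"}`, the plus-strand row `("contig1", 0, 4, "p1", 0, "+")` yields `AAAA`, while the minus-strand row `("contig1", 4, 8, "p2", 0, "-")` yields `GGGG`, the reverse complement of `CCCC`.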

Required packages:

  1. samtools
  2. bedtools
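Because the script calls both tools, they must be installed and reachable on PATH. A quick pre-flight check can confirm this before a long run. The helper below is a hypothetical sketch, not part of extract_promoters.sh; it only uses Python's standard `shutil.which`.

```python
import shutil


def missing_tools(tools=("samtools", "bedtools")):
    """Return the names of required command-line tools not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from `missing_tools()` means both samtools and bedtools were found; otherwise the returned names show what still needs to be installed.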

Email: rimgubaev@gmail.com