RimGubaev/extract_promoters

extract_promoters

Rim Gubaev, 2018

extract_promoters.sh extracts promoter regions of a required length. The script works well with NCBI data (genome FASTA and GFF annotation files). It is particularly suited to organisms with poorly annotated genomes that are represented mainly by large sets of contigs.

Note: the script works correctly with GFF files from NCBI because the start position of a gene in NCBI annotation corresponds to its transcription start site.
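GFF3 always lists the smaller coordinate in column 4 and the larger in column 5, so the gene "start" that marks the transcription start site is column 4 for plus-strand genes but column 5 for minus-strand genes. A minimal sketch of that rule, assuming an NCBI-style GFF3 with `gene` features and an `ID=` attribute; the function name `gene_tss` is hypothetical and not part of extract_promoters.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from extract_promoters.sh): print the TSS implied
# by each gene feature in a GFF3 file.
# GFF3 stores start <= end, so the 5' end of a minus-strand gene is column 5.
gene_tss() {
  local gff="$1"
  awk -F'\t' 'BEGIN { OFS = "\t" }
    $0 !~ /^#/ && $3 == "gene" {
      # Take the value of the ID= attribute from column 9 ("" if absent).
      id = ""
      n = split($9, kv, ";")
      for (i = 1; i <= n; i++)
        if (substr(kv[i], 1, 3) == "ID=") { id = substr(kv[i], 4); break }
      tss = ($7 == "-") ? $5 : $4
      # Output: sequence ID, 1-based TSS, strand, gene ID
      print $1, tss, $7, id
    }' "$gff"
}
```

For example, `gene_tss annotation.gff` (hypothetical file name) prints one tab-separated line per gene with the sequence ID, the 1-based TSS position, the strand and the gene ID.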

Input files

  1. multi-FASTA file with genomic sequences (e.g. contigs or chromosomes)
  2. GFF annotation file for the above multi-FASTA file
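The two inputs must describe the same sequences: every sequence ID in column 1 of the GFF file should appear as the first word of a FASTA header. A quick check, as a sketch; `check_seqids` is a hypothetical helper, not part of the repository:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list GFF sequence IDs that have no matching FASTA record.
check_seqids() {
  local fasta="$1" gff="$2"
  awk -F'\t' '
    # First file (FASTA): remember the first word of every header line.
    FILENAME == ARGV[1] {
      if (substr($0, 1, 1) == ">") {
        split(substr($0, 2), w, " ")
        seen[w[1]] = 1
      }
      next
    }
    # Second file (GFF): report each missing sequence ID once.
    $0 !~ /^#/ && NF >= 9 && !($1 in seen) && !($1 in reported) {
      print $1
      reported[$1] = 1
    }' "$fasta" "$gff"
}
```

`check_seqids genome.fna annotation.gff` (hypothetical file names) prints each GFF sequence ID without a FASTA record; no output means the files agree.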

Output files

  1. genes.gff - shortened GFF file that keeps only the gene locations
  2. genes.bed - BED-formatted version of genes.gff. BED format details are here
  3. FASTA index (.fai) file generated by samtools faidx. FAI format details are here
  4. sizes.chr - table of chromosome (contig) sizes
  5. promoters.bed - bed file with promoter locations for each gene
  6. promoters.fa - multi-fasta file with promoter regions
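Several of these outputs are simple transformations of the inputs. The sketch below shows one way to produce genes.gff, genes.bed and sizes.chr with awk and cut. It assumes an NCBI-style GFF3 with `ID=` attributes, it is an illustration rather than the author's implementation, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the intermediate tables; not the author's code.

# genes.gff: keep only "gene" features (column 3) of a GFF3 file.
# genes.bed: the same genes as BED6. GFF3 is 1-based and end-inclusive,
# BED is 0-based and end-exclusive, so only the start shifts by one.
gff_genes_to_bed() {
  local gff="$1" out_gff="$2" out_bed="$3"
  awk -F'\t' '$0 !~ /^#/ && $3 == "gene"' "$gff" > "$out_gff"
  awk -F'\t' 'BEGIN { OFS = "\t" }
    {
      id = "."
      n = split($9, kv, ";")
      for (i = 1; i <= n; i++)
        if (substr(kv[i], 1, 3) == "ID=") { id = substr(kv[i], 4); break }
      # BED6 columns: chrom, start, end, name, score, strand
      print $1, $4 - 1, $5, id, 0, $7
    }' "$out_gff" > "$out_bed"
}

# sizes.chr: sequence name and length, i.e. the first two columns of the
# .fai index that `samtools faidx genome.fa` writes to genome.fa.fai.
sizes_from_fai() {
  cut -f1,2 "$1"
}
```

A possible sequence of calls: `gff_genes_to_bed annotation.gff genes.gff genes.bed`, then `samtools faidx genome.fna` and `sizes_from_fai genome.fna.fai > sizes.chr` (input file names are hypothetical).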

Required packages:

  1. samtools
  2. bedtools

Email: rimgubaev@gmail.com

About

Bash script for promoter sequences extraction
