Genome-wide identification of heat shock elements

This script identifies heat shock elements (HSEs) in the promoter sequences of genes.

Getting Started

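This script runs under Linux. Before running it, install bedtools (https://github.com/arq5x/bedtools2/releases) and Biopython (https://biopython.org/wiki/Download), and make sure both are available in your environment: the bedtools executable should be on your PATH, and Biopython should be importable by the Python interpreter that runs the scripts.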

You need to prepare these input files:

Reference genome sequence (FASTA)
Coding sequences of genes (FASTA)
Genome annotation (must be in GFF3 format)
Chromosome lengths (tab-delimited)

An example of a chromosome length file
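
If you do not already have a chromosome length file, one can be derived from the reference genome. The sketch below is not part of the repository. It assumes the expected layout is two tab-separated columns, sequence name and length in base pairs (the layout bedtools uses for its genome files); check this against the example file shipped with the repository before relying on it. The function names are hypothetical.

```python
import io


def chromosome_lengths(fasta_handle):
    """Yield (sequence_name, length) for each record in a FASTA stream."""
    name, length = None, 0
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Keep only the identifier: the text before the first whitespace.
            header = line[1:].split()
            name, length = (header[0] if header else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def write_length_table(fasta_handle, out_handle):
    """Write one 'name<TAB>length' row per sequence (assumed layout)."""
    for name, length in chromosome_lengths(fasta_handle):
        out_handle.write(f"{name}\t{length}\n")


# Tiny in-memory demonstration with made-up sequences.
demo_fasta = io.StringIO(">chr1A some description\nACGT\nAC\n>chr1B\nGGG\n")
demo_out = io.StringIO()
write_length_table(demo_fasta, demo_out)
print(demo_out.getvalue(), end="")
```

For a real genome, open the FASTA file and the output file with `open(...)` and pass the two handles to `write_length_table`.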

Once you have all the files ready, you can run this script!

Usage

 sh ./run_pipeline.sh cds.fa gene.gff3 genome_chr_length.txt genome.fa output_file

NOTE: The arguments are positional, so the input files must be given in exactly this order.
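
Because the arguments are positional, it is easy to swap two files by accident. One way to guard against that is a small wrapper that takes the paths as named parameters and assembles the command in the order shown in the usage line. This is a sketch, not part of the repository; `build_command` and `run_pipeline` are hypothetical names, and the default script path is an assumption.

```python
import subprocess


def build_command(*, cds, gff3, chr_lengths, genome, output,
                  script="./run_pipeline.sh"):
    """Return the argument list in the order the usage line requires.

    All paths are keyword-only, so callers cannot mix them up by position.
    """
    return ["sh", script, cds, gff3, chr_lengths, genome, output]


def run_pipeline(**paths):
    """Run the pipeline and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(**paths), check=True)


# Show the command that would be executed, without running it.
print(" ".join(build_command(
    cds="cds.fa",
    gff3="gene.gff3",
    chr_lengths="genome_chr_length.txt",
    genome="genome.fa",
    output="output_file",
)))
```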

Output

The script writes a single output file with the following columns:

The first column is the ID of the gene whose promoter contains the HSE.
The second and third columns are the start and end positions of the HSE on the promoter.
The fourth column is the sequence of the HSE.
The fifth column gives the number of subunits in the HSE and the type of its first subunit: “G” means the first subunit matches 5’-NGAAN-3’, and “C” means it matches 5’-NTTCN-3’.
The sixth column is the mismatched nucleotide. “NA” means the HSE contains no mismatched nucleotide.
The seventh column is the position of the mismatched nucleotide. For example, “1:2” represents the 2nd position in the first subunit.
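
For downstream analysis, each row can be loaded into a small record type. The sketch below makes several assumptions that the description above does not confirm: that the columns are tab-separated, that the two position columns are integers, and that the seventh column is either “NA” or a single `subunit:position` pair. The fifth column is kept as raw text because its exact encoding is not specified here. `HseRecord`, `parse_position` and `parse_line` are hypothetical names, and the sample row is made up.

```python
from typing import NamedTuple, Optional, Tuple


class HseRecord(NamedTuple):
    gene_id: str
    start: int
    end: int
    sequence: str
    subunits: str                                  # raw text of column five
    mismatch: Optional[str]                        # None when the file says "NA"
    mismatch_position: Optional[Tuple[int, int]]   # (subunit, position)


def parse_position(text):
    """Turn '1:2' into (1, 2); 'NA' becomes None."""
    if text == "NA":
        return None
    subunit, position = text.split(":")
    return int(subunit), int(position)


def parse_line(line):
    """Parse one output row, assuming seven tab-separated fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, found {len(fields)}")
    gene_id, start, end, sequence, subunits, mismatch, position = fields
    return HseRecord(
        gene_id=gene_id,
        start=int(start),
        end=int(end),
        sequence=sequence,
        subunits=subunits,
        mismatch=None if mismatch == "NA" else mismatch,
        mismatch_position=parse_position(position),
    )


# Made-up row, for illustration only.
record = parse_line("gene0001\t10\t24\tAGAACTTTCTAGAAC\t3G\tNA\tNA\n")
print(record.gene_id, record.start, record.end, record.mismatch)
```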

Script Details

The pipeline identifies heat shock elements (HSEs) in the following steps:

The script “max_cds_length.py” collects the transcription start site of each gene from the genome annotation (GFF3) file.
The script “seq_bed.py” creates a BED (Browser Extensible Data) file with one entry per promoter, which includes the chromosome, start and end positions, gene ID and strand.
The software “bedtools” extracts the promoter sequences.
The script “hse_call.py” identifies HSEs in the promoter sequences using regular expressions.
The scripts “hse_result.py”, “hse_mismatch.py” and “hse_pos.py” characterize and classify the identified HSEs.
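
To illustrate the regular-expression step, the sketch below searches a sequence for perfect HSEs made of alternating 5’-NGAAN-3’ and 5’-NTTCN-3’ subunits. It is not the code in “hse_call.py”: the minimum of three subunits, the 0-based half-open coordinates, and the names `hse_pattern` and `find_hses` are all assumptions made for the example, and mismatched subunits are not handled.

```python
import re

N = "[ACGT]"
GAA_UNIT = f"{N}GAA{N}"   # 5'-NGAAN-3'
TTC_UNIT = f"{N}TTC{N}"   # 5'-NTTCN-3'


def hse_pattern(first, units=3):
    """Compile a pattern for `units` alternating subunits.

    `first` is "G" to start with NGAAN or "C" to start with NTTCN.
    The lookahead lets overlapping candidates be reported.
    """
    pair = (GAA_UNIT, TTC_UNIT) if first == "G" else (TTC_UNIT, GAA_UNIT)
    body = "".join(pair[i % 2] for i in range(units))
    return re.compile(f"(?=({body}))")


def find_hses(promoter, units=3):
    """Return sorted (start, end, sequence, first_subunit_type) tuples."""
    sequence = promoter.upper()
    hits = []
    for first in ("G", "C"):
        for match in hse_pattern(first, units).finditer(sequence):
            hits.append((match.start(1), match.end(1), match.group(1), first))
    return sorted(hits)


# Made-up promoter fragment containing one G-type HSE with three subunits.
example = "TTTT" + "AGAAC" + "TTTCT" + "AGAAC" + "TTTT"
for hit in find_hses(example):
    print(hit)
```

Compiling one pattern per starting subunit keeps the “G” and “C” types separate, which mirrors the distinction reported in the output file.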

Citation

Zhao P, Javed S, Shi X, Wu B, Zhang D, Xu S and Wang X (2020) Varying Architecture of Heat Shock Elements Contributes to Distinct Magnitudes of Target Gene Expression and Diverged Biological Pathways in Heat Stress Response of Bread Wheat. Front. Genet. 11:30. doi: 10.3389/fgene.2020.00030

Contact

For bugs, issues or suggestions, please email Peng Zhao (pengzhao@nwafu.edu.cn).
