This Jupyter notebook explores the use of the DeepBind model on enhancer 13.
I present the data used, the input and output of the model, and the graphical results it produces.
Enhancer 13 (eSR-A) lies within a 5.2 kb genomic segment and plays a key role in regulating sexual development. It is active in human embryonic testes but not in ovaries. Its 1514 bp core, which is rich in SOX9 and SF1 binding sites, is strongly activated in the presence of SF1 and SOX9.
The enhancer is conserved beyond humans: its mouse counterpart, Enh13, has also been identified, and deleting Enh13 causes complete sex reversal in XY mice.
eSR-A is equally critical in humans: its deletion causes 46,XY sex reversal, and its duplication causes 46,XX (ovo)testicular disorders of sexual development. These findings underscore the intricate regulation of genes such as SOX9 during sexual development, and highlight the conservation and critical role of this enhancer in both human and mouse biology.
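DeepBind is a deep convolutional neural network model for predicting the sequence specificities of DNA- and RNA-binding proteins.
It was trained on large collections of DNA and RNA sequences with experimentally measured protein binding, from assays such as protein binding microarrays, RNAcompete, ChIP-seq and HT-SELEX.
Each trained model scores how strongly a specific protein is predicted to bind a given sequence, which makes it possible to identify the regions of a DNA or RNA sequence that interact with that protein.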
Read more about the model in: Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning (Alipanahi et al., Nature Biotechnology, 2015).
The DeepBind model requires DNA sequences in FASTA format, together with a BED file whose regions refer to the sequences in the FASTA file.
NOTE: in the FASTA file, the sequence name (the header after '>') must start with 'chr' for the model to work.
Sample FASTA file:
>chren13_WT
CAAAACATCCAGGTGGGCTTCAAAACAGGAAGAGAAAAAAAAGAGAGAGAAACGAAAGGAAAGAAAGAAAAGCCCAGAGTGAAGTTT
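Because the header requirement is easy to miss, a small check can be run before calling DeepBind. The sketch below is a minimal, hypothetical helper (the names `read_fasta` and `check_chr_prefix` are illustrative and are not part of DeepBind or this repository): it parses FASTA text with plain Python and lists any record names that do not start with 'chr'.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping record name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the record name is the first whitespace-delimited token
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped; collect them and join later
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def check_chr_prefix(records):
    """Return the names of records whose header does not start with 'chr'."""
    return [n for n in records if not n.startswith("chr")]


sample = """>chren13_WT
CAAAACATCCAGGTGGGCTTCAAAACAGGAAGAGAAAAAAAAGAGAGAGAAACGAAAGGAAAGAAAGAAAAGCCCAGAGTGAAGTTT
"""

records = read_fasta(sample)
print(check_chr_prefix(records))  # an empty list means every header is valid
```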
Sample BED file:
chren13_WT 48 64
chren13_WT 56 72
chren13_WT 64 80
chren13_WT 72 88
chren13_WT 80 96
chren13_WT 88 104
The model outputs a predicted binding score for each region in the BED file with respect to the selected transcription factor.
The score is a numerical value in which higher values indicate stronger predicted binding; it is not a probability (note the values above 1 below).
Sample output (the BED columns followed by the score):
chren13_WT 48 64 0.8379223
chren13_WT 56 72 1.0102623
chren13_WT 64 80 1.0469577
chren13_WT 72 88 1.0469577
chren13_WT 80 96 1.0469577
chren13_WT 88 104 1.0469577
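To work with these results in Python, the output can be read into typed records. This is a minimal sketch assuming the output is whitespace-separated with four columns (name, start, end, score), as in the sample above; `parse_deepbind_output` and `WindowScore` are hypothetical names chosen for illustration.

```python
from typing import List, NamedTuple


class WindowScore(NamedTuple):
    chrom: str
    start: int   # 0-based start (BED convention)
    end: int     # exclusive end (BED convention)
    score: float


def parse_deepbind_output(text: str) -> List[WindowScore]:
    """Parse 'name start end score' lines into WindowScore records."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, start, end, score = fields[:4]
        rows.append(WindowScore(chrom, int(start), int(end), float(score)))
    return rows


sample_output = """chren13_WT 48 64 0.8379223
chren13_WT 56 72 1.0102623
chren13_WT 64 80 1.0469577
chren13_WT 72 88 1.0469577
chren13_WT 80 96 1.0469577
chren13_WT 88 104 1.0469577
"""

rows = parse_deepbind_output(sample_output)
best = max(rows, key=lambda r: r.score)
print(best.start, best.end, best.score)
```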
Objective: to assess the DeepBind scores of the binding sites in enhancer 13.
In our experiments, we used the following sequences:
- Enhancer 13 segment: the segment of enhancer 13 described above.
- Modified enhancer 13: an altered version of the enhancer 13 segment carrying a 3-bp deletion in the SOX9 binding site.
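A 3-bp deletion shifts every downstream coordinate by 3, which matters when comparing per-position scores between the two sequences. The sketch below shows how such a variant could be built and how wild-type positions map onto it. The exact position of the deletion within the SOX9 site is not given here, so the `start` values and the helper names `delete_bases` and `wt_to_mut_coord` are hypothetical.

```python
from typing import Optional


def delete_bases(seq: str, start: int, length: int) -> str:
    """Return seq with `length` bases removed starting at 0-based `start`."""
    if start < 0 or start + length > len(seq):
        raise ValueError("deletion falls outside the sequence")
    return seq[:start] + seq[start + length:]


def wt_to_mut_coord(pos: int, start: int, length: int) -> Optional[int]:
    """Map a 0-based wild-type position onto the mutant (None if deleted)."""
    if pos < start:
        return pos            # upstream of the deletion: unchanged
    if pos < start + length:
        return None           # this base was removed
    return pos - length       # downstream: shifted left by the deletion size


# Toy example with a hypothetical deletion of 3 bases at position 2
mutant = delete_bases("ACGTACGT", 2, 3)
print(mutant)
```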
We scanned the sequence with a 16-bp window shifted by 8 bp at each step, so consecutive windows overlap by 8 bp.
To generate a BED file with these windows, the following command was run:
bin/create_segment_file.py 'data/en13_seq.fa' 16 8
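The windowing step can be sketched as follows. This is not the code of bin/create_segment_file.py, whose exact behavior is only assumed here: the hypothetical `make_windows` emits 0-based, half-open BED intervals of width 16 stepping by 8 and drops any trailing partial window, and `to_bed` writes them tab-separated, as the BED format expects.

```python
def make_windows(name: str, seq_len: int, win: int = 16, shift: int = 8):
    """Return (name, start, end) intervals of width `win`, stepping by `shift`.

    Only full-length windows are emitted; a trailing partial window is
    dropped (an assumption, not necessarily what create_segment_file.py does).
    """
    if win <= 0 or shift <= 0:
        raise ValueError("win and shift must be positive")
    return [(name, start, start + win)
            for start in range(0, seq_len - win + 1, shift)]


def to_bed(windows) -> str:
    """Format (name, start, end) tuples as tab-separated BED lines."""
    return "".join(f"{n}\t{s}\t{e}\n" for n, s, e in windows)


windows = make_windows("chren13_WT", 40)
print(to_bed(windows), end="")
```

With a 40-bp toy sequence this yields four windows (0-16, 8-24, 16-32, 24-40); every interior base is covered by exactly two windows.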
DeepBind analysis was run for three transcription factors (TFs) to evaluate their binding to the enhancer:
- SOX9
- GATA4
- AR (Androgen Receptor)
The AR transcription factor was included as a control in our experiments.
Example command to run the SOX9 model:
bin/deepBind.sh 'data/en13_seq_win_16_shift_8.bed' 'data/en13_seq.fa' 'DeepBind/Homo_sapiens/TF/D00649.002_SELEX_SOX9'
A list of all available DeepBind models can be found here.
The DeepBind results were analyzed as follows:
- Binding score assignment: each window in the sequence was assigned a binding score for a given transcription factor (TF).
- Average score calculation: for every position in the enhancer, we averaged the scores of all windows covering that position for the respective TF.
This assigns a binding score to each nucleotide of the enhancer sequence.
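The averaging step can be sketched in plain Python. This is a minimal illustration rather than the code in deepBind_analysis.ipynb: the hypothetical `per_position_scores` takes (start, end, score) windows in BED coordinates, adds each window's score to every base it covers, and divides by the number of covering windows.

```python
from typing import Iterable, List, Optional, Tuple


def per_position_scores(windows: Iterable[Tuple[int, int, float]],
                        seq_len: int) -> List[Optional[float]]:
    """Average window scores over each position; None where no window covers it."""
    totals = [0.0] * seq_len
    counts = [0] * seq_len
    for start, end, score in windows:
        # BED intervals are half-open: positions start .. end-1 are covered
        for pos in range(max(start, 0), min(end, seq_len)):
            totals[pos] += score
            counts[pos] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]


# Two overlapping 16-bp windows shifted by 8 bp (toy scores)
scores = per_position_scores([(0, 16, 1.0), (8, 24, 3.0)], 24)
print(scores[0], scores[8], scores[16])
```

Positions in the 8-bp overlap receive the mean of both windows, while bases not covered by any window (for example, trailing bases beyond the last full window) are left as None instead of being assigned an arbitrary score.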
The graphical representation of the result is provided below:
The code for generating the plot can be found in deepBind_analysis.ipynb.