PTFSpot

Description

PTFSpot: Deep co-learning on transcription factors and their binding regions attains impeccable universality in plants.

Unlike in animals, the variability of transcription factors (TFs) and their binding regions (TFBRs) across plant species is a major problem that most existing TFBR-finding software fails to tackle, rendering it of hardly any use. This limitation has resulted in the underdevelopment of plant regulatory research and the rampant use of Arabidopsis-like model species, generating misleading results. Here we report a revolutionary transformer-based deep-learning approach, PTFSpot, which learns from the co-variability of TF structures and their binding regions to build a universal TF-DNA interaction model that detects TFBRs with complete freedom from the limitations of TF- and species-specific models. In a series of extensive benchmarking studies over multiple experimentally validated datasets, it not only outperformed the existing software by a lead of >30%, but also consistently delivered >90% accuracy even for species and TF families that were never encountered during model building. PTFSpot now makes it possible to accurately annotate TFBRs across any plant genome, even in the total lack of any TF information, completely free from the bottlenecks of species- and TF-specific models.

Architecture

Figure: Implementation of the PTFSpot deep co-learning system, which uses Transformers and DenseNet to identify TF binding regions across plant genomes. The first part is a transformer with 14 attention heads that learns from the dimeric, pentameric and heptameric word representations of any given sequence arising from the context of the anchoring prime motif. In parallel, the structure of the bound TF is learned by the DenseNet. The learning from both parts is finally joined and passed on to the final fully connected layers, which generate the probability score for the existence of a binding region in the center.
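
The word representations mentioned in the figure caption can be illustrated with a small sketch. This is not the PTFSpot implementation: the function names `kmer_words` and `word_representations` are hypothetical, and the use of overlapping windows with a stride of 1 is an assumption, since the caption does not state how the words are extracted.

```python
from typing import Dict, List, Sequence


def kmer_words(sequence: str, k: int) -> List[str]:
    """Return all overlapping k-mers of `sequence` (stride 1 is an assumption)."""
    seq = sequence.upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def word_representations(
    sequence: str, sizes: Sequence[int] = (2, 5, 7)
) -> Dict[int, List[str]]:
    """Build dimeric (k=2), pentameric (k=5) and heptameric (k=7) word lists."""
    return {k: kmer_words(sequence, k) for k in sizes}


if __name__ == "__main__":
    reps = word_representations("ACGTACGTAC")
    for k, words in reps.items():
        # Print the word size, the number of words, and the first three words.
        print(k, len(words), words[:3])
```

In a real pipeline each word would then be mapped to an integer token before being fed to the transformer; that mapping is not described here and is left out of the sketch.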

Web server

PTFSpot can be used directly from this web-server. The server implements the trained models and can process either individual sequences or FASTA files. A tutorial is also hosted on the PTFSpot web-server.

Package installation

The latest version of the package can be downloaded from the GitHub repository.

Requirements

Python 3.6 or higher
numpy==1.23.5
keras==2.13.1
tensorflow==2.13.1
plotly==5.11.0
pandas==1.5.0
bayesian-optimization==1.2.0
bedops (sudo apt-get install bedops -y)
bedtools (sudo apt-get install bedtools -y)
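
Before running anything, it can help to confirm that the environment matches the pinned versions. The following is a minimal sketch, not part of the PTFSpot package: `check_packages` and `check_tools` are hypothetical helper names, and the script relies on `importlib.metadata`, which is only available in Python 3.8 or newer.

```python
import shutil
from importlib import metadata

# Pins copied from the requirements list above.
PINNED = {
    "numpy": "1.23.5",
    "keras": "2.13.1",
    "tensorflow": "2.13.1",
    "plotly": "5.11.0",
    "pandas": "1.5.0",
    "bayesian-optimization": "1.2.0",
}
TOOLS = ("bedops", "bedtools")


def check_packages(pins):
    """Map each package name to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if found == wanted else f"mismatch (found {found})"
    return report


def check_tools(tools):
    """Map each command-line tool name to True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for name, status in check_packages(PINNED).items():
        print(f"{name}: {status}")
    for tool, present in check_tools(TOOLS).items():
        print(f"{tool}: {'found' if present else 'not found'}")
```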

File description

1. hyper_param.py = Python script that builds the model with hyperparameter tuning.
2. file_for_tuning = File containing the label (0/1) and sequence of positive and negative instances, one instance per line, with the fields separated by a tab.
3. genomic_sequence = Example FASTA sequence.
4. PTFSpot.sh = Shell execution script.
5. bimodal.h5 = Universal Transformer-DenseNet system.
6. pdbset.py = PDB file parser.
7. ptfspot.py = Universal model script.

Note

Always place your AlphaFold2-generated TF PDB file in the "pdb" folder (an example is provided).
This project was developed and run on Ubuntu Linux.

To build model implementing hyperparameter tuning

python3 hyper_param.py file_for_tuning

file_for_tuning: tab-separated input file with one instance per line. Each line holds the label (0/1) followed by the sequence, covering both positive and negative instances.
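
The input format described above can be validated before starting a tuning run. The sketch below is an illustration only: `parse_tuning_lines` is a hypothetical helper, not a function from `hyper_param.py`, and it assumes exactly two tab-separated fields per line with the label first.

```python
from typing import Iterable, List, Tuple


def parse_tuning_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Parse 'label<TAB>sequence' lines into (label, sequence) pairs."""
    records = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {number}: expected 2 tab-separated fields")
        label, sequence = parts[0].strip(), parts[1].strip()
        if label not in ("0", "1"):
            raise ValueError(f"line {number}: label must be 0 or 1, got {label!r}")
        records.append((int(label), sequence))
    return records


if __name__ == "__main__":
    # Two made-up instances: one positive (1) and one negative (0).
    example = ["1\tACGTGGCA\n", "0\tTTGACCAT\n"]
    for label, sequence in parse_tuning_lines(example):
        print(label, sequence)
```

To check a real file, pass an open file handle, for example `parse_tuning_lines(open("file_for_tuning"))`.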

Running script

Module: Transformer-DenseNet system (identifies TF binding regions within a sequence using the corresponding AlphaFold2-generated TF protein structure). To detect TF binding regions, execute the following commands in the parent directory:

unzip deeptfactor.zip
chmod a+x PTFSpot.sh
sh PTFSpot.sh <fasta file> <folder path> <Alphafold generated PDB file> <folder path of PTFSpot>
e.g.: sh PTFSpot.sh genomic_sequence folderpath ABF2 /home/user/PTFSpot
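
When the script is run for several TFs or sequences, the command can be assembled programmatically. The following sketch only builds the argument list in the documented order; `build_ptfspot_command` is a hypothetical helper, and `shlex.join` requires Python 3.8 or newer.

```python
import shlex
from typing import List


def build_ptfspot_command(
    fasta: str, folder: str, tf_name: str, ptfspot_dir: str
) -> List[str]:
    """Assemble the argument list for PTFSpot.sh in the documented order."""
    return ["sh", "PTFSpot.sh", fasta, folder, tf_name, ptfspot_dir]


if __name__ == "__main__":
    cmd = build_ptfspot_command(
        "genomic_sequence", "folderpath", "ABF2", "/home/user/PTFSpot"
    )
    # Print the command as it would be typed in a shell.
    print(shlex.join(cmd))
    # To execute it, pass `cmd` to subprocess.run(cmd, check=True) from the
    # directory that contains PTFSpot.sh.
```

Passing the arguments as a list rather than a single string avoids quoting problems if a path contains spaces.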

To generate a line plot, execute the following command, where seq1.csv is the input filename (the generated plots are interactive):
python3 make-plot.py seq1.csv

Output description

The TF binding region detection module gives its output in the following format:

ABF1_genomic_sequence.txt = TF binding regions result

Citation

Citation: Gupta S, Kesarwani V, Bhati U, Jyoti, Shankar R (2024) Deep co-learning on transcription factors and their binding regions attains impeccable universality in plants. bioRxiv 2024. Read research article here.
