The official repository of the paper MocFormer: A Two-Stage Pre-training-Driven Transformer for Drug-Target Interactions Prediction
MocFormer is a two-stage, pre-training-driven framework for drug-target interaction (DTI) prediction. In the first stage, pre-trained molecule and protein models produce feature representations for drugs and targets. This helps the framework handle the diversity of drugs and proteins and reduces bias, which improves prediction accuracy. In the second stage, a transformer with bilinear pooling and a fully connected layer predicts interactions from these feature vectors.
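To make the second stage more concrete, here is a minimal NumPy sketch of fusing a drug embedding and a protein embedding with bilinear pooling, followed by a single fully connected layer. It is not the authors' implementation. It omits the transformer encoder, it uses low-rank (factorized) bilinear pooling as a stand-in for whichever variant the notebook uses, and the function names, parameter names, and toy dimensions are all hypothetical.

```python
import numpy as np


def init_params(drug_dim, prot_dim, hidden_dim, seed=0):
    """Create random toy weights (hypothetical shapes, not trained values)."""
    rng = np.random.default_rng(seed)
    return {
        "U": rng.normal(scale=0.1, size=(drug_dim, hidden_dim)),  # drug projection
        "V": rng.normal(scale=0.1, size=(prot_dim, hidden_dim)),  # protein projection
        "w": rng.normal(scale=0.1, size=hidden_dim),              # fully connected layer
        "b": 0.0,
    }


def bilinear_pool(drug_vec, prot_vec, U, V):
    """Low-rank bilinear pooling: project both inputs, then multiply element-wise."""
    return (drug_vec @ U) * (prot_vec @ V)


def predict_interaction(drug_vec, prot_vec, params):
    """Return an interaction probability in (0, 1) for each drug/protein pair."""
    fused = bilinear_pool(drug_vec, prot_vec, params["U"], params["V"])
    logits = fused @ params["w"] + params["b"]
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid


# Toy batch of 3 pairs; real embeddings would come from the pre-trained models.
params = init_params(drug_dim=8, prot_dim=12, hidden_dim=4)
rng = np.random.default_rng(1)
drug_batch = rng.normal(size=(3, 8))
prot_batch = rng.normal(size=(3, 12))
probs = predict_interaction(drug_batch, prot_batch, params)
print(probs.shape)
```

The element-wise product of the two projections keeps the fused vector at `hidden_dim` entries, whereas a full outer product would have `drug_dim * prot_dim` entries. That matters when the protein embedding is large.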
git clone https://github.com/rickwang28574/MocFormer.git
cd MocFormer
conda create -n MocFormer python==3.8.1
conda activate MocFormer
pip install -r requirements.txt
The two datasets below were obtained by processing and fine-tuning with Uni-Mol (for small molecules) and ESM-2 (for proteins). Each dataset contains SMILES representations of the small molecules, amino acid sequences of the proteins, small-molecule embeddings, protein embeddings, and labels.
For the DrugBank dataset:
https://drive.google.com/file/d/1PsFQusALcyp2NkjFs5xw-CHhSqZpzSVr/view?usp=sharing
For the Epigenetic-regulators dataset:
Train: https://drive.google.com/file/d/1_aJX3UBZMDsi32EZz25BAW3KRvdQHsy9/view?usp=sharing
Test: https://drive.google.com/file/d/1k-Y6fBAY8U8IukxaO9Dhh4ESzYRLtIGz/view?usp=sharing
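As an illustration of the dataset composition described above, one record could be modelled as follows. This is only a sketch: the on-disk format of the downloaded files is not described here, so the class name, field names, binary 0/1 labels, and toy values below are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DTIRecord:
    """One drug-target pair (hypothetical layout, not the actual file schema)."""
    smiles: str                     # SMILES string of the small molecule
    protein_sequence: str           # amino acid sequence of the protein
    drug_embedding: List[float]     # small-molecule embedding
    protein_embedding: List[float]  # protein embedding
    label: int                      # assumed binary: 1 = interaction, 0 = none


def validate(record: DTIRecord, drug_dim: int, prot_dim: int) -> DTIRecord:
    """Raise ValueError if the record is inconsistent; otherwise return it."""
    if not record.smiles or not record.protein_sequence:
        raise ValueError("SMILES and protein sequence must be non-empty")
    if len(record.drug_embedding) != drug_dim:
        raise ValueError("unexpected drug embedding size")
    if len(record.protein_embedding) != prot_dim:
        raise ValueError("unexpected protein embedding size")
    if record.label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    return record


# Toy values for illustration only; they are not taken from either dataset.
example = DTIRecord(
    smiles="CC(=O)OC1=CC=CC=C1C(=O)O",
    protein_sequence="MKTAYIAKQR",
    drug_embedding=[0.1, -0.2, 0.3],
    protein_embedding=[0.5, 0.4],
    label=1,
)
validate(example, drug_dim=3, prot_dim=2)
```

Checking embedding sizes once after loading makes shape mismatches surface early, before they turn into harder-to-read errors during training.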
Run the cells of the billnear_DrugBank_uni_esm2_3B_trans copy.ipynb notebook in order; the results appear in the notebook's "Test" section. The notebook provides model training and testing code, using the DrugBank dataset as an example. The model for the Epigenetic-regulators dataset is very similar.
@article{Zhang2023.09.13.557595,
title={MocFormer: A Two-Stage Pre-training-Driven Transformer for Drug-Target Interactions Prediction},
author={Yi-Lun Zhang and Wen-Tao Wang and Jia-Hui Guan and Deepak Kumar Jain and Tian-Yang Wang and Swalpa Kumar Roy},
journal={International Journal of Computational Intelligence Systems},
year={2024}
}