This repository contains the Python code and dataset for the paper titled “DrugTar improves druggability prediction by integrating large language models and gene ontologies” by Niloofar Borhani, Iman Izadi, Ali Motahharynia, Mahsa Sheikholeslami, and Yousof Gheisari. The paper is available at https://doi.org/10.1093/bioinformatics/btaf360.
The online tool is available at www.drugtar.com.
esm2_embedder.py: Generates ESM-2 embeddings for each protein sequence.
feats_extraction.py: Extracts features (ESM-2 embeddings and Gene Ontology (GO) terms) and their corresponding labels, creating feature matrices as DataFrames.
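The feature-extraction step can be illustrated with a minimal, dependency-free sketch. It is not the repository's implementation: it assumes GO terms are encoded as multi-hot indicators and concatenated with the embedding vector, and it uses plain lists rather than DataFrames. The function names, protein IDs, and the 3-dimensional toy embeddings are all hypothetical.

```python
# Hypothetical sketch: one feature row per protein, built by concatenating
# a dense embedding with multi-hot GO-term indicators (an assumed encoding).

def build_go_vocabulary(go_annotations):
    """Return a sorted list of every GO term seen across all proteins."""
    return sorted({term for terms in go_annotations.values() for term in terms})


def build_feature_rows(embeddings, go_annotations, vocabulary):
    """Concatenate each protein's embedding with its multi-hot GO vector."""
    rows = {}
    for protein_id, embedding in embeddings.items():
        terms = go_annotations.get(protein_id, set())
        go_vector = [1.0 if term in terms else 0.0 for term in vocabulary]
        rows[protein_id] = list(embedding) + go_vector
    return rows


# Toy inputs: 3-dimensional stand-ins for real ESM-2 embeddings.
embeddings = {"P1": [0.1, -0.2, 0.3], "P2": [0.0, 0.5, -0.1]}
go_annotations = {"P1": {"GO:0005515", "GO:0005886"}, "P2": {"GO:0005515"}}

vocabulary = build_go_vocabulary(go_annotations)
features = build_feature_rows(embeddings, go_annotations, vocabulary)
```

Each row has length (embedding size + number of distinct GO terms), so every protein shares the same column layout regardless of how many terms it is annotated with.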
To train the DrugTar model and validate it using k-fold cross-validation, run the following command:
python scripts/main.py --phase='train' \
--batch_size=32 --num_epochs=110 --k_fold=10 --init_lr=0.0002 \
--save_model='True' --DNN_dims='128_64_32' --dropout=0.5 \
--feat_selected=4000 \
--train_feature='feature/train_feature.pkl' \
--train_label='feature/train_label.pkl' \
--test_feature='feature/test_feature.pkl' \
--test_label='feature/test_label.pkl' \
--model_file='models/' \
--save_pred_file='prediction/druggability_scores.csv'
To test the performance of the trained DrugTar model on independent test data:
python scripts/main.py --phase='test' \
--batch_size=32 --num_epochs=110 --init_lr=0.0002 \
--save_model='True' --DNN_dims='128_64_32' --dropout=0.5 \
--feat_selected=4000 \
--test_feature='feature/test_feature.pkl' \
--test_label='feature/test_label.pkl' \
--model_file='models/' \
--save_pred_file='prediction/druggability_scores.csv'
To predict druggability scores for unlabeled proteins using the trained DrugTar model:
python scripts/main.py --phase='prediction' \
--batch_size=32 --num_epochs=110 --init_lr=0.0002 \
--save_model='True' --DNN_dims='128_64_32' --dropout=0.5 \
--feat_selected=4000 \
--test_feature='feature/test_feature.pkl' \
--model_file='models/' \
--save_pred_file='prediction/druggability_scores.csv'
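The three commands above share most of their arguments and differ mainly in the phase and in which feature and label files they take. As a convenience, a small wrapper could assemble the argument list per phase. The sketch below is hypothetical: build_command is not part of the repository, it reuses only the flags and values shown above, and it assumes main.py does not depend on argument order.

```python
# Hypothetical helper: assembles the argument list for scripts/main.py
# using only the flags and values that appear in the commands above.

SHARED_ARGS = [
    "--batch_size=32", "--num_epochs=110", "--init_lr=0.0002",
    "--save_model=True", "--DNN_dims=128_64_32", "--dropout=0.5",
    "--feat_selected=4000",
    "--model_file=models/",
    "--save_pred_file=prediction/druggability_scores.csv",
]

# Arguments that differ between phases.
PHASE_ARGS = {
    "train": [
        "--k_fold=10",
        "--train_feature=feature/train_feature.pkl",
        "--train_label=feature/train_label.pkl",
        "--test_feature=feature/test_feature.pkl",
        "--test_label=feature/test_label.pkl",
    ],
    "test": [
        "--test_feature=feature/test_feature.pkl",
        "--test_label=feature/test_label.pkl",
    ],
    "prediction": [
        "--test_feature=feature/test_feature.pkl",
    ],
}


def build_command(phase):
    """Return the command for one phase as a list of strings."""
    if phase not in PHASE_ARGS:
        raise ValueError(f"unknown phase: {phase!r}")
    return (["python", "scripts/main.py", f"--phase={phase}"]
            + SHARED_ARGS + PHASE_ARGS[phase])
```

The returned list can be passed to subprocess.run(build_command("test"), check=True) from the repository root. Passing a list rather than a shell string means the quotes used in the shell commands above are not needed.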
For further inquiries, please contact:
Niloofar Borhani
Ph.D. Student, Control Engineering
Isfahan University of Technology
Email: n.borhani@ec.iut.ac.ir
CV: Google Scholar
