DTINet is a computational pipeline for predicting novel drug-target interactions (DTIs) from a heterogeneous network. DTINet learns a low-dimensional feature vector for each node in the heterogeneous network, and then predicts the likelihood of a new DTI from these representations via a vector space projection scheme. See our paper in Nature Communications and the preprint on bioRxiv (100305).
We provide an example script to run experiments on our dataset:
- Run `run_DTINet.m`: predict drug-target interactions and evaluate the results with cross-validation.
Note: See the "Tutorial" section below for detailed instructions on how to set the parameters of DTINet or run DTINet on your own dataset.

The supplementary data files are:

- `Supplementary_Data_1.xlsx`: The list of the top 150 novel drug-target interactions predicted by DTINet, which was trained on all drugs and targets that have at least one known interacting pair. Known drug-target pairs (the non-zero entries in the drug-target interaction matrix) and novel predicted DTIs that share homologous proteins (sequence identity scores >40%) with known DTIs were excluded from the list.
- `Supplementary_Data_2.xlsx`: The entire list of novel drug-target interactions predicted by DTINet, which was trained on all drugs and targets that have at least one known interacting pair.
- `Supplementary_Data_3.xlsx`: Examples of novel predictions that are supported by previously known evidence in the literature.

The code consists of the following files:

- `DTINet.m`: predict drug-target interactions (DTIs)
- `DCA.m`: compact feature learning by integrating heterogeneous networks
- `diffusionRWR.m`: network diffusion algorithm (random walk with restart)
- `compute_similarity.m`: compute Jaccard similarity based on an interaction/association network
- `auc.m`: evaluation script
- `run_DCA.m`: example code for running `DCA.m` for feature learning
- `run_DTINet.m`: example code for running `DTINet.m` for drug-target prediction
- `train_mf.mexa64`: pre-built binary file of the inductive matrix completion algorithm (downloaded from the IMC website)
- `download_imc.sh`: download the inductive matrix completion source and build the executable library from source
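
To give a feel for the random walk with restart (RWR) diffusion that `diffusionRWR.m` is described as performing, here is a minimal Python sketch. It is not a port of the MATLAB code: the function name, the default parameter values, the convergence tolerance, and the self-loop handling of isolated nodes are all assumptions made for illustration.

```python
import numpy as np

def diffusion_rwr(A, restart_prob=0.5, max_iter=20, tol=1e-6):
    """Random walk with restart started from every node of a weighted network.

    A is a square, non-negative adjacency/similarity matrix. Column j of the
    result is the diffusion state (a probability distribution over nodes) of
    a walker that starts at node j. All defaults are illustrative assumptions.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Assumption: isolated nodes get a self-loop so every column can be normalized.
    A = A + np.diag((A.sum(axis=0) == 0).astype(float))
    # Column-normalize: P[i, j] is the probability of stepping from node j to node i.
    P = A / A.sum(axis=0, keepdims=True)
    restart = np.eye(n)  # each walker restarts at its own start node
    Q = np.eye(n)
    for _ in range(max_iter):
        Q_new = (1 - restart_prob) * (P @ Q) + restart_prob * restart
        delta = np.linalg.norm(Q - Q_new)  # Frobenius norm of the update
        Q = Q_new
        if delta < tol:
            break
    return Q
```

Each column of the returned matrix sums to 1, so it can be read as the stationary visiting distribution of a walker that keeps jumping back to its start node with probability `restart_prob`.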

The dataset consists of the following files:

- `drug.txt`: list of drug names
- `protein.txt`: list of protein names
- `disease.txt`: list of disease names
- `se.txt`: list of side effect names
- `drug_dict_map`: a complete ID mapping between drug names and DrugBank IDs
- `protein_dict_map`: a complete ID mapping between protein names and UniProt IDs
- `mat_drug_se.txt`: Drug-SideEffect association matrix
- `mat_protein_protein.txt`: Protein-Protein interaction matrix
- `mat_protein_drug.txt`: Protein-Drug interaction matrix
- `mat_drug_protein.txt`: Drug-Protein interaction matrix (transpose of the above matrix)
- `mat_drug_protein_remove_homo.txt`: Drug-Protein interaction matrix, in which homologous proteins with identity score >40% were excluded (see the paper)
- `mat_drug_drug.txt`: Drug-Drug interaction matrix
- `mat_protein_disease.txt`: Protein-Disease association matrix
- `mat_drug_disease.txt`: Drug-Disease association matrix
- `Similarity_Matrix_Drugs.txt`: Drug similarity scores based on chemical structures of drugs
- `Similarity_Matrix_Proteins.txt`: Protein similarity scores based on primary sequences of proteins

Note: drugs, proteins, diseases and side effects are organized in the same order across all files, including name lists, ID mappings and interaction/association matrices.
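
Because every file shares the same ordering, row `i` of a drug matrix corresponds to line `i` of `drug.txt`, and likewise for the other entity types. The following Python sketch shows one way to load a matrix together with its name lists and check that the dimensions agree. It assumes the matrices are whitespace-delimited plain text and the name lists hold one name per line; the helper names are hypothetical and not part of the repository.

```python
import numpy as np

def load_names(path):
    """Read a name list, assuming one name per line (blank lines skipped)."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]

def load_aligned_matrix(matrix_path, row_names_path, col_names_path):
    """Load a matrix and verify its shape against the row/column name lists."""
    rows = load_names(row_names_path)
    cols = load_names(col_names_path)
    # Assumption: whitespace-delimited numeric text, one matrix row per line.
    mat = np.loadtxt(matrix_path, ndmin=2)
    if mat.shape != (len(rows), len(cols)):
        raise ValueError(
            f"matrix shape {mat.shape} does not match "
            f"{len(rows)} row names x {len(cols)} column names"
        )
    return mat, rows, cols
```

For example, `load_aligned_matrix("data/mat_drug_protein.txt", "data/drug.txt", "data/protein.txt")` would return the interaction matrix with its drug and protein names, provided the files live under `data/` and follow the assumed format.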
We provide the pre-trained vector representations for drugs and proteins, which were used to produce the results in our paper:

- `drug_vector_d100.txt`
- `protein_vector_d400.txt`
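
As a rough illustration of how such feature vectors can enter a projection-based prediction step, the Python sketch below scores every drug-protein pair with a bilinear form and ranks the pairs that are not already known. It assumes the projection can be written as a single matrix `Z` (which DTINet would learn during training; here it is simply an input), and the function name and return format are hypothetical.

```python
import numpy as np

def rank_novel_pairs(X, Y, Z, known, top_k=5):
    """Rank unknown drug-protein pairs by the projection score x_i Z y_j^T.

    X: (n_drugs, d_drug) drug feature vectors
    Y: (n_proteins, d_protein) protein feature vectors
    Z: (d_drug, d_protein) projection matrix (assumed to be already learned)
    known: (n_drugs, n_proteins) matrix whose non-zero entries are known DTIs
    """
    scores = X @ Z @ Y.T
    # Exclude known interactions so that only novel pairs are ranked.
    scores = np.where(np.asarray(known) != 0, -np.inf, scores)
    # Indices of the top_k highest scores over the flattened matrix.
    flat = np.argsort(scores, axis=None)[::-1][:top_k]
    rows, cols = np.unravel_index(flat, scores.shape)
    return [(int(i), int(j), float(scores[i, j])) for i, j in zip(rows, cols)]
```

With 100-dimensional drug vectors and 400-dimensional protein vectors, as the file names suggest, `Z` would have shape `(100, 400)` under this assumption.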
Our implementation requires the Inductive Matrix Completion (IMC) library. For convenience, we provide an executable binary file in the `src/` folder. The binary was built on a typical Ubuntu 14.04 (64-bit) system. If you are using another Linux platform, please consider building the library from source by running `bash download_imc.sh`.
Tips: We recommend downloading and installing the IMC library with the `download_imc.sh` script. If you download the library yourself from the IMC website, please be aware that DTINet requires the C/C++ version (with Python and MATLAB interfaces). Please do not use the other version, i.e., the pure MATLAB implementation. The pure MATLAB version treats unknown/missing entries in the interaction matrix as zeros, which is not what DTINet requires.
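
The difference between "missing" and "zero" matters because it changes what the model is penalized for. The Python sketch below is a generic illustration of that distinction, not the IMC library's objective or API: one loss counts every entry of the interaction matrix, the other counts only entries marked as observed. The function names and the mask convention are assumptions.

```python
import numpy as np

def squared_loss_all_entries(P, P_hat):
    """Squared error over every entry: unknown pairs act as true zeros."""
    P = np.asarray(P, dtype=float)
    P_hat = np.asarray(P_hat, dtype=float)
    return float(np.sum((P - P_hat) ** 2))

def squared_loss_observed_only(P, P_hat, observed_mask):
    """Squared error over observed entries only: unknown pairs are ignored.

    observed_mask has 1 where the entry of P is observed and 0 where it is
    unknown/missing (an assumed convention for this sketch).
    """
    P = np.asarray(P, dtype=float)
    P_hat = np.asarray(P_hat, dtype=float)
    mask = np.asarray(observed_mask, dtype=float)
    return float(np.sum(((P - P_hat) * mask) ** 2))
```

Under the first loss, a model that assigns a high score to an unobserved pair is penalized as if that pair were a confirmed non-interaction; under the second, it is not penalized at all for that pair.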
Tutorial: to run DTINet on your own dataset, follow these steps:

- Put the interaction/association matrices in the `data/` folder.
- Create a `network/` folder under `DTINet/` and run `compute_similarity.m`, which computes the Jaccard similarity of drugs and proteins based on the interaction/association matrices.
- Specify the parameters (number of dimensions of the feature vectors, restart probability, and maximum number of iterations) and run `run_DCA.m`, which learns the feature vectors of drugs and proteins and saves them in the `feature/` folder.
- Set the path of the feature vectors and the corresponding parameters in `run_DTINet.m` and execute it. This script predicts the drug-target interactions and evaluates the results using ten-fold cross-validation.
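
For readers who want to see what the Jaccard similarity step computes, here is a minimal Python sketch. It is not the code in `compute_similarity.m`: the function name is hypothetical, and the handling of all-zero rows and of the diagonal are assumptions chosen for this illustration.

```python
import numpy as np

def jaccard_similarity(M):
    """Pairwise Jaccard similarity between the rows of an association matrix.

    Each row is treated as a set (its non-zero columns); the similarity of two
    rows is |intersection| / |union| of their sets.
    """
    B = (np.asarray(M) != 0).astype(float)
    inter = B @ B.T                      # shared non-zero columns per row pair
    sizes = B.sum(axis=1)                # set size of each row
    union = sizes[:, None] + sizes[None, :] - inter
    # Assumption: pairs with an empty union (two all-zero rows) get similarity 0.
    sim = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    # Assumption: every row is maximally similar to itself.
    np.fill_diagonal(sim, 1.0)
    return sim
```

For instance, applying it to a drug-disease matrix would give a drug-drug similarity network in which two drugs are similar when they are associated with largely the same diseases.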
Luo, Y., Zhao, X., Zhou, J., Yang, J., Zhang, Y., Kuang, W., Peng, J., Chen, L. & Zeng, J. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nature Communications 8, (2017).
@article{Luo2017,
author = {Yunan Luo and Xinbin Zhao and Jingtian Zhou and Jinglin Yang and Yanqing Zhang and Wenhua Kuang and Jian Peng and Ligong Chen and Jianyang Zeng},
title = {A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information},
doi = {10.1038/s41467-017-00680-8},
url = {https://doi.org/10.1038/s41467-017-00680-8},
year = {2017},
month = {sep},
publisher = {Springer Nature},
volume = {8},
number = {1},
journal = {Nature Communications}
}
If you have any questions or comments, please feel free to email Yunan Luo (luoyunan[at]gmail[dot]com) and/or Jianyang Zeng (zengjy321[at]tsinghua[dot]edu[dot]cn).