Scripts for "Efficient Hyperparameter Optimization by Using Bayesian Optimization for Drug-Target Interaction Prediction"
A Bayesian optimization technique enables a short search time for a complex prediction model that includes many hyperparameters while maintaining the accuracy of the prediction model. Here, we apply a Bayesian optimization technique to the drug-target interaction (DTI) prediction problem as a method for computational drug discovery. We target neighborhood regularized logistic matrix factorization (NRLMF) (Liu et al., 2016), which is a state-of-the-art DTI prediction method, and accelerate its parameter search with the Gaussian process mutual information (GP-MI) algorithm. Experimental results with four general benchmark datasets show that our GP-MI-based method obtained an 8.94-fold decrease in the computational time on average and almost the same predicted area under the curve (AUC) for all datasets compared to those of a grid parameter search, which is commonly used in DTI prediction. Moreover, if a slight accuracy reduction (approximately 0.002 for AUC) is allowed, an increase in the calculation speed of 18 times or more can be obtained. Our results show for the first time that Bayesian optimization works effectively for the DTI prediction problem. By accelerating the time-consuming parameter search, the most advanced model can be used even if the number of drug candidates and target proteins to be predicted increases.
You need Python 3.x to run these scripts. We recommend using Anaconda 2.4.0 to set up the Python environment. The scripts were developed with Python 3.5.2. For Python 3.5.2, please refer to the following URL.
In addition, the scripts use the following Python packages: Numpy, scikit-learn (ver. 0.18.1 or later), scipy, and pymatbridge (required only when using KBMF 2K). For each package, please refer to the following URLs.
− Numpy: http://www.numpy.org/
− scikit-learn: http://scikit-learn.org/stable/
− scipy: http://www.scipy.org/
− pymatbridge: http://arokem.github.io/python-matlab-bridge/
To run the scripts, the drug-target interaction data set created by Yamanishi et al. is required. The data set can be downloaded from the following URL.
− nr_admat_dgc.txt, nr_simmat_dc.txt, nr_simmat_dg.txt
− gpcr_admat_dgc.txt, gpcr_simmat_dc.txt, gpcr_simmat_dg.txt
− ic_admat_dgc.txt, ic_simmat_dc.txt, ic_simmat_dg.txt
− e_admat_dgc.txt, e_simmat_dc.txt, e_simmat_dg.txt
- Download the archive of BO-DTI-master from this repository.
- Extract the archive and cd into the extracted directory.
- Run the make command.
$ cd BO-DTI-master
$ mkdir dataset
$ cp ~/Downloads/*_admat_dgc.txt dataset
$ cp ~/Downloads/*_simmat_dc.txt dataset
$ cp ~/Downloads/*_simmat_dg.txt dataset
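After copying, it can be useful to confirm that all twelve files listed above are in place before starting a long run. The following is a minimal sketch, not part of this repository: the helper name `missing_dataset_files` is hypothetical, and it assumes the files sit directly in the `dataset` directory under the names given in this README.

```python
from pathlib import Path

# Dataset prefixes and file suffixes as listed in this README.
PREFIXES = ("nr", "gpcr", "ic", "e")
SUFFIXES = ("admat_dgc.txt", "simmat_dc.txt", "simmat_dg.txt")


def missing_dataset_files(dataset_dir="dataset"):
    """Return the sorted names of expected files that are absent from dataset_dir."""
    root = Path(dataset_dir)
    expected = ["{}_{}".format(p, s) for p in PREFIXES for s in SUFFIXES]
    return sorted(name for name in expected if not (root / name).is_file())


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All 12 dataset files found.")
```

Run it from the BO-DTI-master directory; an empty result means every expected file was found.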
You can specify the following options:
- gpmi ... Use the GP-MI algorithm instead of grid search
- delta ... Adjust the balance between exploration and exploitation: delta > 0
- max_iter ... Specify the maximum number of iterations (the number of parameter combinations evaluated): max_iter > 0
- n_init ... Specify the number of initial samples: n_init > 0
- seed ... Fix the random seed used to split the data for cross-validation
- job-id ... Specify the job id
- workdir ... Specify the directory to output log files
For other options, please refer to PyDTI.
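To illustrate what delta controls, the sketch below follows the GP-MI selection rule as published by Contal et al. (2014): each candidate is scored by its posterior mean plus a bonus that grows with its posterior variance, scaled by sqrt(log(2/delta)). It is not necessarily how this repository implements the search. The function names are hypothetical, the means and variances are assumed to come from an already fitted Gaussian process, and delta is additionally assumed to lie in (0, 1).

```python
import math


def gpmi_scores(means, variances, gamma_hat, delta):
    """GP-MI acquisition score for each candidate (higher is better)."""
    if not 0.0 < delta < 1.0:
        # Assumption of this sketch: delta is a confidence level in (0, 1).
        raise ValueError("delta must lie in (0, 1)")
    alpha = math.log(2.0 / delta)
    return [
        mu + math.sqrt(alpha) * (math.sqrt(var + gamma_hat) - math.sqrt(gamma_hat))
        for mu, var in zip(means, variances)
    ]


def select_next(means, variances, gamma_hat, delta):
    """Pick the best-scoring candidate and return (index, updated gamma_hat)."""
    scores = gpmi_scores(means, variances, gamma_hat, delta)
    best = max(range(len(scores)), key=scores.__getitem__)
    # gamma_hat accumulates the variance of every point selected so far.
    return best, gamma_hat + variances[best]
```

A smaller delta enlarges the variance bonus and so favors exploration; because gamma_hat grows with each selection, the bonus shrinks over time and the search shifts toward exploiting the posterior mean.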
- Command to execute grid search
$ python PyDTI.py --method="nrlmf" --dataset="nr" --cvs=1 --specify-arg=0 --predict-num=0 --seed="1" --job-id="1" --workdir="."
- Command to execute the GP-MI algorithm
$ python PyDTI.py --method="nrlmf" --dataset="nr" --cvs=1 --specify-arg=0 --predict-num=0 --gpmi="delta=1e-100 max_iter=2688 n_init=1" --seed="1" --job-id="1" --workdir="."
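The `--gpmi` value in the second command packs delta, max_iter and n_init into one space-separated string of `key=value` pairs. The sketch below shows one way such a string could be split and checked against the constraints listed above (each value > 0). The helper `parse_gpmi_option` is hypothetical and is not claimed to be how PyDTI.py actually parses the option.

```python
def parse_gpmi_option(value):
    """Parse 'delta=... max_iter=... n_init=...' into a validated dict."""
    # Expected type of each setting (delta is real-valued, the others are counts).
    casts = {"delta": float, "max_iter": int, "n_init": int}
    params = {}
    for token in value.split():
        key, sep, raw = token.partition("=")
        if not sep or key not in casts:
            raise ValueError("unrecognized gpmi setting: {!r}".format(token))
        params[key] = casts[key](raw)
        if params[key] <= 0:
            raise ValueError("{} must be > 0".format(key))
    return params


if __name__ == "__main__":
    print(parse_gpmi_option("delta=1e-100 max_iter=2688 n_init=1"))
```

Validating the string up front means a mistyped key or a non-positive value fails immediately rather than partway through a long parameter search.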
These scripts are based on PyDTI, developed by Liu et al. PyDTI can be accessed from the following URL.
These scripts were implemented by Tomohiro Ban.
Department of Computer Science, School of Computing, Tokyo Institute of Technology, Japan
If you have any questions, please feel free to contact the author.
Tomohiro Ban, Masahito Ohue, Yutaka Akiyama: Efficient Hyperparameter Optimization by Using Bayesian Optimization for Drug-Target Interaction Prediction, In Proceedings of the 7th IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS 2017), 6 pages, Orlando, FL, USA, October 19-21, doi:10.1109/ICCABS.2017.8114299, 2017. https://doi.org/10.1109/ICCABS.2017.8114299
(Conference Website) http://www.iccabs.org/
Copyright © 2017 Akiyama Laboratory, Tokyo Institute of Technology, All Rights Reserved.