This is the source code and data for the task of drug discovery as described in our paper: "SemaTyP: a knowledge graph based literature mining method for drug discovery"
- scikit-learn
- numpy
- tqdm
To use the code, you need the following data:
- Therapeutic Target Database (TTD): you do not need to download it yourself; the complete TTD 2016 release is already included in <./data/TTD>.
- SemMedDB: you need to download the whole knowledge graph from here (password: 1234). After downloading the "predications.txt" file, replace the file <./data/SemedDB/predications.txt> with the newly downloaded one.
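Before running anything, it can help to confirm that the data files are where the code expects them and that the full "predications.txt" has replaced the 100-relation sample. The sketch below is not part of the repository. The helper names `count_relations` and `check_data_layout` are hypothetical, and it assumes the file stores one relation per non-empty line.

```python
from pathlib import Path


def count_relations(path):
    """Count non-empty lines in a predications file.

    Assumption: the file holds one relation per non-empty line.
    """
    n = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip():
                n += 1
    return n


def check_data_layout(root="."):
    """Return a dict mapping each expected data path to whether it exists."""
    expected = [
        Path(root) / "data" / "TTD",
        # Path as written in the setup instructions above.
        Path(root) / "data" / "SemedDB" / "predications.txt",
    ]
    return {str(p): p.exists() for p in expected}
```

Calling `check_data_layout()` from the repository root reports which paths are missing. Calling `count_relations` on the predications file should return a number far above 100 once the full download is in place.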
Install the environment.
pip install -r requirements.txt
Construct training and test data.
python experimental_data.py
Train and test the model.
python main.py
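The three steps above can also be chained from one Python script that stops at the first failure. This is only a convenience sketch, not a script shipped with the repository. `build_commands` and `run_pipeline` are hypothetical names, and `python -m pip` is used in place of a bare `pip` so the install targets the same interpreter that runs the scripts.

```python
import subprocess
import sys


def build_commands(python=sys.executable):
    """Return the three documented steps as argument lists, in order."""
    return [
        [python, "-m", "pip", "install", "-r", "requirements.txt"],
        [python, "experimental_data.py"],
        [python, "main.py"],
    ]


def run_pipeline(commands=None):
    """Run each step in turn; a failing step raises CalledProcessError."""
    for cmd in commands or build_commands():
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` from the repository root runs the install, data construction, and training steps in sequence.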
An illustration of the features constructed in our work.
data/SemmedDB: contains all relations extracted from SemMedDB, which are used to construct the Knowledge Graph in our experiment. The full "predications.txt" contains 39,133,975 relations; only a small sample "predications.txt" file containing 100 relations is included here. The full "predications.txt" file can be downloaded from
data/TTD: contains the drug, target and disease relations retrieved from the Therapeutic Target Database.
experimental_data.py: constructs the drug-target-disease associations from TTD and the Knowledge Graph.
knowledge_graph.py: constructs the Knowledge Graph used in our experiment.
data_loader.py: used to load training and test data.
main.py: used to train and test the models.
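To give a rough idea of how subject-predicate-object relations can be turned into a graph and queried for drug-target-disease style connections, here is a minimal sketch. It is not the code in knowledge_graph.py and not the method from the paper. It assumes each relation is a tab-separated triple (subject, predicate, object), which may differ from the real layout of "predications.txt". The names `build_graph` and `two_hop_paths` are hypothetical.

```python
from collections import defaultdict


def build_graph(lines, sep="\t"):
    """Build an adjacency map: subject -> set of (predicate, object).

    Assumption: each line is 'subject<sep>predicate<sep>object'.
    Lines that do not have exactly three fields are skipped.
    """
    graph = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        if len(parts) != 3:
            continue
        subject, predicate, obj = (p.strip() for p in parts)
        graph[subject].add((predicate, obj))
    return graph


def two_hop_paths(graph, source, target):
    """List paths source -> middle -> target with their predicates."""
    paths = []
    for pred1, middle in sorted(graph.get(source, ())):
        for pred2, end in sorted(graph.get(middle, ())):
            if end == target:
                paths.append((source, pred1, middle, pred2, target))
    return paths
```

With a file handle passed as `lines`, `build_graph` reads the relations one at a time, and `two_hop_paths(graph, drug, disease)` then lists the intermediate nodes that link the two entities.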
Please cite our paper if you use this code in your own work:
@article{sang2018sematyp,
title={SemaTyP: a knowledge graph based literature mining method for drug discovery},
author={Sang, Shengtian and Yang, Zhihao and Wang, Lei and Liu, Xiaoxia and Lin, Hongfei and Wang, Jian},
journal={BMC bioinformatics},
volume={19},
number={1},
pages={1--11},
year={2018},
publisher={Springer}
}