
kGCN: a graph-based deep learning framework for life science

Installation

A setup script is under construction. For now, you have to run the Python scripts directly.

Requirements

  • python3 (> 3.6)
  • tensorflow (> 1.12)
  • joblib
  • numpy
  • scipy
  • scikit-learn (> 0.21)
  • matplotlib

For Ubuntu 18.04

For CentOS 7

To install additional modules:

Run the demo

This is a TensorFlow implementation of Graph Convolutional Networks for graph classification.

Our implementation of graph convolutional layers draws on the following paper:

Thomas N. Kipf, Max Welling, Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)
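For reference, the layer-wise propagation rule from that paper, H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W) with D the degree matrix of A + I, can be sketched in NumPy as below. This is an illustrative sketch of the paper's formulation, not the code in kgcn/layers.py; the function name gcn_layer, the toy graph, and the sum readout used to obtain one vector per graph are hypothetical choices.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                      # add self-loops
    deg = a_hat.sum(axis=1)                      # degrees of A + I
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))     # D^{-1/2}
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt     # symmetric normalization
    return np.maximum(a_norm @ features @ weight, 0.0)  # ReLU activation

# A 3-node path graph 0-1-2 with 2 input features and 4 hidden units.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.ones((3, 2))
weight = np.full((2, 4), 0.5)
hidden = gcn_layer(adj, features, weight)
print(hidden.shape)  # (3, 4)

# For graph classification, node embeddings are pooled into one vector per graph
# (sum pooling is an assumption here; other readouts are possible).
graph_embedding = hidden.sum(axis=0)
```

Stacking several such layers before the readout lets each node aggregate information from increasingly distant neighbors.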

To train a model on the dataset example_jbl/synthetic.jbl with the neural network defined in example_model/model.py, run:

kgcn train --config example_config/sample.json

where sample.json is a config file.

For testing and inference, run:

kgcn infer --config example_config/sample.json --model model/model.sample.last.ckpt

where model/model.sample.last.ckpt is a trained model file.

Sample dataset

Our sample dataset file (example.jbl) is created by the following command:

python example_script/make_example.py

When you create your own dataset, you can refer to make_sample.py. This script converts adjacency matrices (example_data/adj.txt), features (example_data/feature.txt), and labels (example_data/label.txt) into the dataset file (example_jbl/sample.jbl).

For example, you can specify a dataset for training as follows:

kgcn train --config example_config/sample.json --dataset example_jbl/sample.jbl

Configuration

You can specify a configuration file (example_config/sample.json) as follows:

kgcn train --config example_config/sample.json

The commands of kgcn

kgcn has three commands: train, infer, and train_cv. You can specify a command as follows:

kgcn <command> --config example_config/sample.json
  • train command: trains a model and saves it.

  • infer command: estimates the labels of test data using a loaded model.

  • train_cv command: simplifies cross-validation, including the training and estimation (evaluation) stages. Once you execute this command, cross-validation is performed by running a series of training and estimation programs.
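The loop that train_cv automates can be pictured with the generic k-fold sketch below, which uses scikit-learn's KFold (scikit-learn is already a requirement). This is not the code behind train_cv; the names cross_validate, train_fn, and eval_fn and the toy majority-label "model" are hypothetical stand-ins for the real training and estimation stages.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(num_graphs, n_splits, train_fn, eval_fn, seed=0):
    """Run a training stage and an estimation stage per fold; return per-fold scores."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kf.split(np.zeros((num_graphs, 1))):
        model = train_fn(train_idx)               # training stage on this fold
        scores.append(eval_fn(model, test_idx))   # estimation (evaluation) stage
    return scores

# Toy usage: the "model" just predicts the majority label of its training fold.
labels = np.array([0, 1] * 10)

def train_fn(idx):
    return int(round(labels[idx].mean()))

def eval_fn(model, idx):
    return float((labels[idx] == model).mean())

scores = cross_validate(len(labels), 5, train_fn, eval_fn)
print(len(scores), np.mean(scores))
```

Each graph lands in exactly one test fold, so the mean of the fold scores estimates performance on unseen graphs.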

Configuration file

Dataset file: To use your own data, you have to create a dictionary with the following data format and compress it as a joblib dump file.
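A minimal sketch of building such a dictionary and saving it with joblib follows. The key names ("adj", "feature", "label", "max_node_num") and array layouts are assumptions for illustration only; check them against the data format kGCN actually expects before using a file like this for training.

```python
import os
import tempfile

import joblib
import numpy as np
import scipy.sparse as sp

# Two toy graphs (a triangle and a single edge), padded to max_node_num nodes.
max_node_num = 3
adjs = [
    sp.coo_matrix(np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=np.float32)),
    sp.coo_matrix(np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=np.float32)),
]
features = np.zeros((2, max_node_num, 4), dtype=np.float32)  # (graphs, nodes, feature dim)
features[:, :, 0] = 1.0
labels = np.array([[1, 0], [0, 1]], dtype=np.float32)        # one-hot graph labels

# Hypothetical key names; verify them against kGCN's documented format.
dataset = {
    "adj": adjs,
    "feature": features,
    "label": labels,
    "max_node_num": max_node_num,
}
path = os.path.join(tempfile.gettempdir(), "my_dataset.jbl")
joblib.dump(dataset, path, compress=3)   # compressed joblib dump
loaded = joblib.load(path)
print(sorted(loaded.keys()))  # ['adj', 'feature', 'label', 'max_node_num']
```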

Visualization file

Reference

@article{Kojima2020,
  author = "Ryosuke Kojima and Shoichi Ishida and Masateru Ohta and Hiroaki Iwata and Teruki Honma and Yasushi Okuno",
  title = "{kGCN: A Graph-Based Deep Learning Framework for Chemical Structures}",
  year = "2020",
  month = "2",
  url = "https://chemrxiv.org/articles/kGCN_A_Graph-Based_Deep_Learning_Framework_for_Chemical_Structures/11859684",
  doi = "10.26434/chemrxiv.11859684.v1"
}

Directory structure

.
├── active_learning/                     :
├── data_generator/                      :
│    ├── synth_generator.py              : random graph
│    └── synth_generator_ring.py         : random graph with ring
├── docs/                                : a set of documents
├── example_config/                      : examples of config files
├── example_data/                        : examples of adj. files, label files, etc.
├── example_jbl/                         : examples of .jbl files
├── example_model/                       : examples of model files
├── example_param/                       : examples of parameter domain files
├── example_script/                      : scripts for the examples
├── gcn_modules/                         :
├── gcnvisualizer/                       : kgcn visualization modules
├── graph_kernel/                        : graph kernel SVM
├── hooks/                               : 
├── kgcn                                 :
│   ├── legacy                           : duplicated scripts
│   ├── preprocessing/                   : scripts for dataset preparation for kgcn
│   ├── core.py                          : the main program file for the GCN model
│   ├── data_util.py                     : utilities for data handling
│   ├── default_model.py                 : 
│   ├── error_checker.py                 : error checker
│   ├── feed.py                          : functions to build feed dictionaries
│   ├── feed_index.py                    : functions to build feed dictionaries (index base)
│   ├── layers.py                        : GCN-related layers
│   ├── make_plots.py                    : functions to plot graphs
│   └── visualization.py                 : functions to visualize trained models
├── kgcn_tf2                             : 
├── kgcn_torch                           :
├── KNIME/                               : 
├── logs/                                : output directory for examples
├── model/                               : output directory for examples
├── Notebook/                            : examples of Jupyter notebooks
├── result/                              : output directory for examples
├── sample_chem/                         : 
├── sample_nx/                           :
├── script                               : utility scripts
│   ├── make_dataset.py                  :
│   ├── plot_graph.py                    :
│   ├── show_graph.py                    :
│   └── show_label_balance.py            :
├── script_cv                            : scripts for parallel cross validation
│   ├── 01make_dataset.sh                :
│   ├── 02run_fold.sh                    :
│   └── make_cross_validation_dataset.py : 
├── visualization/                       : output directory for examples
├── Dockerfile                           :
├── gcn.py                               : the main engine of this project
├── gcn_gen.py                           : an engine for generative models
├── gcn_pair.py                          : an engine for ranking models
├── LICENSE                              : LICENSE file
├── model_functions.py                   :
├── opt_hyperparam.py                    : an engine for hyperparameter optimization
├── README.md                            : this file
├── setup.py                             :
└── task_sparse_gcn.py                   : 

Additional sample1

We provide an additional example that uses synthetic data to discriminate between 5-node rings and 6-node rings. The following command generates the synthetic data in text format:

python data_generator/synth_generator_ring.py
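To illustrate what this task looks like, the sketch below builds a ring of 5 or 6 nodes and attaches extra nodes in a tree-like way, so the ring stays the only cycle and its size alone determines the label. This is not the logic of synth_generator_ring.py; the function ring_graph, the number of extra nodes, and the 0/1 label encoding are hypothetical.

```python
import numpy as np

def ring_graph(ring_size, num_extra, rng):
    """Adjacency matrix of a ring plus extra nodes attached without creating new cycles."""
    n = ring_size + num_extra
    adj = np.zeros((n, n), dtype=int)
    for i in range(ring_size):            # close the cycle 0-1-...-(k-1)-0
        j = (i + 1) % ring_size
        adj[i, j] = adj[j, i] = 1
    for v in range(ring_size, n):         # attach each extra node to one earlier node
        u = rng.integers(0, v)
        adj[u, v] = adj[v, u] = 1
    return adj

rng = np.random.default_rng(0)
graphs, labels = [], []
for _ in range(10):
    ring_size = int(rng.choice([5, 6]))
    graphs.append(ring_graph(ring_size, num_extra=3, rng=rng))
    labels.append(int(ring_size == 6))    # 1 for 6-node rings, 0 for 5-node rings
```

Because every graph contains exactly one cycle, a model can only separate the classes by learning the ring size, which makes this a useful sanity check for structural features.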

The following command converts the text files into a .jbl file:

python example_script/make_synth.py

The following command carries out cross-validation:

kgcn train_cv --config example_config/synth.json

Accuracy and the other scores are stored in:

result/synth_cv_result.json

More information is stored in:

result/synth_info.json

Additional sample2

We prepared additional samples for multimodal and multitask learning. You can specify a configuration file (sample_multimodal_config.json/sample_multitask_config.json) as follows:

kgcn --config example_config/multimodal.json train

For multimodal learning, symbolic sequences and graph data are used as the inputs of a neural network. This configuration file specifies the model program as "model_multimodal.py", which defines neural networks for graphs, for sequences, and for combining them. Please refer to sample/seq.txt and a converting program (make_example.py) to prepare sequence data.
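One common way to combine the two modalities is to embed each input separately and concatenate the resulting vectors before a final classifier, as in the NumPy sketch below. model_multimodal.py may combine them differently; the amino-acid alphabet, the mean-of-embeddings sequence encoder, the random weights, and the stand-in graph vector are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_sequence(seq, alphabet, table):
    """Mean of per-symbol embedding vectors for a symbolic sequence."""
    idx = [alphabet.index(ch) for ch in seq]
    return table[idx].mean(axis=0)

alphabet = "ACDEFGHIKLMNPQRSTVWY"                      # assumed: amino-acid symbols
emb_dim, graph_dim, num_classes = 8, 16, 2
table = rng.normal(size=(len(alphabet), emb_dim))      # embedding table (random here)

graph_vec = rng.normal(size=graph_dim)                 # stand-in for a pooled GCN embedding
seq_vec = embed_sequence("MKVLA", alphabet, table)

fused = np.concatenate([graph_vec, seq_vec])           # combine the two modalities
w = rng.normal(size=(fused.size, num_classes))         # final linear classifier
logits = fused @ w
z = logits - logits.max()                              # shift for a numerically stable softmax
probs = np.exp(z) / np.exp(z).sum()
```

Concatenation keeps the two encoders independent, so either branch can be swapped out without changing the other.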

kgcn --config example_config/multitask.json train

In this sample, "multitask" means that multiple labels are allowed for one graph. This configuration file specifies the model program as "model_multitask.py", which defines a loss function for multiple labels. Please refer to sample_data/multi_label.txt and a converting program (make_sample.py) to prepare multi-labeled data.
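A typical loss for multiple labels per graph is an independent sigmoid cross-entropy for each label, averaged over the labels that are actually observed. The NumPy sketch below shows that idea; it is not necessarily the loss in model_multitask.py, and the mask argument (1 for an observed label, 0 for a missing one) is an assumption modeled on the mask_label field that appears in the sparse-task format.

```python
import numpy as np

def masked_multilabel_loss(logits, labels, mask):
    """Mean sigmoid cross-entropy over entries where mask == 1.

    All arguments have shape (num_graphs, num_labels).
    """
    # Numerically stable form of -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    per_label = np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    return float((per_label * mask).sum() / max(mask.sum(), 1.0))

logits = np.array([[2.0, -1.0, 0.0], [0.5, 3.0, -2.0]])
labels = np.array([[1, 0, 1], [0, 1, 0]], dtype=float)
mask = np.array([[1, 1, 0], [1, 1, 1]], dtype=float)   # third label of graph 0 is missing
loss = masked_multilabel_loss(logits, labels, mask)
```

Masking lets graphs with partially known labels still contribute to training without penalizing predictions for the unknown entries.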

Application example1: compound-protein interaction

Application example2: Reaction prediction and visualization

Generative model

python gcn_gen.py --config example_config/vae.json train

gcn_gen.py is an alternative to gcn.py for generative models. example_config/vae.json is a setting for a VAE (variational autoencoder) implemented in example_model/model_vae.py.
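Most VAE implementations share two pieces: the reparameterization trick for sampling the latent code and the KL divergence between the approximate posterior and a standard normal prior. They are sketched in NumPy below; how model_vae.py computes them is not shown in this README, and the function names are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Written this way so that, in an autodiff framework, gradients flow through mu and log_var.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)
mu = np.zeros((4, 2))          # batch of 4 graphs, 2 latent dimensions
log_var = np.zeros((4, 2))
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)  # 0 for every row: the posterior equals the prior
```

The training objective adds this KL term to a reconstruction loss computed by the decoder.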

Sparse task

First, prepare .tfrecords files in a dataset folder. Files named '[train, eval, test].tfrecords' are used for training, evaluation, and testing, respectively.
You can have multiple files for training, etc. Alternatively, you can have a single file that contains multiple examples for training.
The serialized data in the .tfrecords files has the following format:

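import tensorflow as tf

# Number of labels per example; 1 is only a placeholder, set it to match your data.
label_length = 1

features = {
    'label': tf.io.FixedLenFeature([label_length], tf.float32),
    'mask_label': tf.io.FixedLenFeature([label_length], tf.float32),
    'adj_row': tf.io.VarLenFeature(tf.int64),
    'adj_column': tf.io.VarLenFeature(tf.int64),
    'adj_values': tf.io.VarLenFeature(tf.float32),
    'adj_elem_len': tf.io.FixedLenFeature([1], tf.int64),
    'adj_degrees': tf.io.VarLenFeature(tf.int64),
    'feature_row': tf.io.VarLenFeature(tf.int64),
    'feature_column': tf.io.VarLenFeature(tf.int64),
    'feature_values': tf.io.VarLenFeature(tf.float32),
    'feature_elem_len': tf.io.FixedLenFeature([1], tf.int64),
    'size': tf.io.FixedLenFeature([2], tf.int64),
}


def parse_example(serialized):
    # Hypothetical helper: decode one serialized tf.train.Example.
    # VarLenFeature entries are returned as tf.SparseTensor.
    return tf.io.parse_single_example(serialized, features)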
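The listing above only declares how each field is parsed. One way to fill those fields from dense NumPy arrays is sketched below. Several field meanings are not spelled out in this README, so they are assumptions: the *_row/*_column/*_values fields hold COO entries, *_elem_len is the number of non-zero entries, adj_degrees holds per-node degrees, and size is (number of nodes, number of features). The function name to_sparse_fields is hypothetical.

```python
import numpy as np

def to_sparse_fields(adj, features, label, mask_label):
    """Build per-example arrays for the fields above from dense matrices (assumed semantics)."""
    a_row, a_col = np.nonzero(adj)        # COO coordinates of the adjacency matrix
    f_row, f_col = np.nonzero(features)   # COO coordinates of the feature matrix
    return {
        "label": np.asarray(label, dtype=np.float32),
        "mask_label": np.asarray(mask_label, dtype=np.float32),
        "adj_row": a_row.astype(np.int64),
        "adj_column": a_col.astype(np.int64),
        "adj_values": adj[a_row, a_col].astype(np.float32),
        "adj_elem_len": np.array([a_row.size], dtype=np.int64),
        "adj_degrees": np.count_nonzero(adj, axis=1).astype(np.int64),
        "feature_row": f_row.astype(np.int64),
        "feature_column": f_col.astype(np.int64),
        "feature_values": features[f_row, f_col].astype(np.float32),
        "feature_elem_len": np.array([f_row.size], dtype=np.int64),
        "size": np.array(features.shape, dtype=np.int64),
    }

# A 3-node path graph with a 2-dimensional feature per node.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=np.float32)
features = np.eye(3, 2, dtype=np.float32)
fields = to_sparse_fields(adj, features, label=[1.0], mask_label=[1.0])
```

Each array can then be wrapped in a tf.train.Feature (int64_list=tf.train.Int64List(value=...) or float_list=tf.train.FloatList(value=...)), collected into a tf.train.Example, and written with tf.io.TFRecordWriter.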

Then, run the following command:

python task_sparse_gcn.py --dataset your_dataset --other_flags

Hyperparameter optimization

kgcn-opt --config ./example_config/opt_param.json  --domain ./example_param/domain.json

kgcn-opt (opt_hyperparam.py) is a command for hyperparameter optimization using the GPyOpt library (https://github.com/SheffieldML/GPyOpt), a Bayesian optimization library. ./example_config/opt_param.json is a config file for gcn.py, and ./example_param/domain.json is a domain file that defines the hyperparameters and their search spaces. The format of this file follows the "domain" format of GPyOpt. For more information about this JSON file, see the GPyOpt documentation (http://nbviewer.jupyter.org/github/SheffieldML/GPyOpt/blob/devel/manual/index.ipynb).

Depending on your environment, you may need to change line 9 (opt_cmd) of opt_hyperparam.py.

To change or add hyperparameters, edit domain.json and the model file. An example model file is example_model/opt_param.py, in which num_gcn_layer is a hyperparameter.
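GPyOpt describes each search dimension as a dictionary with "name", "type" (for example "continuous" or "discrete"), and "domain" (bounds for continuous variables, the allowed values for discrete ones). The sketch below writes such a list as JSON. The num_gcn_layer entry mirrors the hyperparameter named above, but its value range is an assumption, and the learning_rate entry is purely hypothetical; whether kgcn-opt uses a given name depends on your model file.

```python
import json

domain = [
    # Discrete hyperparameter: the allowed values are listed explicitly (range assumed).
    {"name": "num_gcn_layer", "type": "discrete", "domain": [1, 2, 3, 4]},
    # Continuous hyperparameter: domain gives (lower, upper) bounds. Hypothetical entry.
    {"name": "learning_rate", "type": "continuous", "domain": [0.0001, 0.01]},
]

text = json.dumps(domain, indent=2)   # contents for a file such as domain.json
print(text)
```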

License

This edition of kGCN is for evaluation, learning, and non-profit academic research purposes only, and a license is needed for any other uses. Please send requests on license or questions to kojima.ryosuke.8e@kyoto-u.ac.jp.
