Implementation of Graph Machines in Python, and their application to datasets of molecules.
├── LICENSE
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── docs <- A default Sphinx project; see sphinx-doc.org for details
│
├── models <- Trained and serialized models, model predictions, or model
│ summaries
│
├── notebooks <- Jupyter notebooks. Naming c
│
├── references <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports <- Generated analysis as HTML, PDF, LaTeX, generated graphics and
│ figures to be used in reporting
│
├── Pipfile, Pipfile.lock <- Files used by pipenv
│
│
├── src <- Source code for use in this project.
│ ├── __init__.py <- Makes src a Python module
│ │
│ ├── data <- Scripts to download or generate data
│ │ ├── load_dataset.py
│ │ └── make_dataset.py
│ │
│ ├── scott <- Scripts to turn raw data into Newick format
│ │
│ │
│ ├── models <- Scripts to train models and then use trained models to make predictions
│ │ ├── predict_model.py
│ │ └── train_model.py
│ │
│ ├── visualization <- Scripts to create exploratory and results oriented visualizations
│ │ └── visualize.py
│ │
│ └── Net <- Neural Network
│ └── FNN_GM_Net.py
│
├── GM-Classification.py <- Script for classification task
│
└── GM-Regression.py <- Script for regression task
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
The datasets used can be downloaded from https://brunl01.users.greyc.fr/CHEMISTRY/
First, clone this repository to your machine:
git clone https://github.com/elbarto91/GraphMachines-.git
The downloaded repository is ready to go, so at most you need to extract it. To run the scripts (train/predict) in a separate environment, Python 3.7 (the project uses 3.7.5) and pipenv must be installed.
Get python3:
sudo apt-get update
sudo apt-get install python3.7
Get pipenv (with pip or pip3, depending on your system):
sudo pip3 install pipenv
From the main folder of the project, install the environment:
sudo pipenv install
To display the plots, you may need to install the tkinter package:
sudo apt-get install python3-tk
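The prerequisites listed above can also be checked from Python before running anything. The following is a minimal sketch and not part of the repository: the helper name `check_environment` is hypothetical, and the 3.7 minimum is taken from the requirement stated above.

```python
import importlib.util
import shutil
import sys


def check_environment(min_version=(3, 7)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Python version required by this README (3.7.x).
    if sys.version_info < min_version:
        problems.append("Python %d.%d or newer is required" % min_version)
    # pipenv must be on PATH to create and enter the environment.
    if shutil.which("pipenv") is None:
        problems.append("pipenv not found on PATH")
    # tkinter is only needed to display the plots.
    if importlib.util.find_spec("tkinter") is None:
        problems.append("tkinter not installed (plots may not display)")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

If the script prints nothing, the three prerequisites were found.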
usage: GM-Regression.py [-h] [-d DEVICE] [-e NUM_EPOCHS]
[-hln HIDDEN_LAYER_SIZE] [-lr LEARNING_RATE]
[-r REPORT] [-rdd ROOTDIRDATASET] [-trf TRAINFILE]
[-tef TESTFILE] [-s SAVE] [-l LOAD] [-b BIAS]
[-rn REPORTNAME] [-mp MODELPATH]
optional arguments:
-h, --help show this help message and exit
-d DEVICE, --device DEVICE
device to use (GPU or CPU (default))
-e NUM_EPOCHS, --num_epochs NUM_EPOCHS
number of epochs,default=10000
-hln HIDDEN_LAYER_SIZE, --hidden_layer_size HIDDEN_LAYER_SIZE
number of nodes for the hidden layer, default = 4
-lr LEARNING_RATE, --learning_rate LEARNING_RATE
learning rate for the optimizer, default = 0.001
-r REPORT, --report REPORT
save result in a report file
-rdd ROOTDIRDATASET, --rootDirDataset ROOTDIRDATASET
directory of dataset files
-trf TRAINFILE, --trainFile TRAINFILE
dataset containing the names of the trainset files
-tef TESTFILE, --testFile TESTFILE
dataset containing the names of the testset files
-s SAVE, --save SAVE True if you want to save the model, default = False
-l LOAD, --load LOAD True if you want to load the model, default = False
-b BIAS, --bias BIAS bias value, default = 1
-rn REPORTNAME, --reportName REPORTNAME
base name for the report's folder
-mp MODELPATH, --modelPath MODELPATH
model's path
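For reference, a parser that accepts the options listed above could look like the sketch below. It is reconstructed from the help text only and is not the actual code of GM-Regression.py: the argument types, the `str2bool` helper, and the defaults for `--device` and `--report` are assumptions.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False' style command-line values into booleans (hypothetical helper)."""
    return str(value).strip().lower() in ("true", "1", "yes")


def build_parser():
    """Build a parser mirroring the documented options; types are assumed, not confirmed."""
    parser = argparse.ArgumentParser(prog="GM-Regression.py")
    parser.add_argument("-d", "--device", default="CPU",
                        help="device to use (GPU or CPU (default))")
    parser.add_argument("-e", "--num_epochs", type=int, default=10000,
                        help="number of epochs, default=10000")
    parser.add_argument("-hln", "--hidden_layer_size", type=int, default=4,
                        help="number of nodes for the hidden layer, default = 4")
    parser.add_argument("-lr", "--learning_rate", type=float, default=0.001,
                        help="learning rate for the optimizer, default = 0.001")
    # The default for --report is not documented; False is an assumption.
    parser.add_argument("-r", "--report", type=str2bool, default=False,
                        help="save result in a report file")
    parser.add_argument("-rdd", "--rootDirDataset", help="directory of dataset files")
    parser.add_argument("-trf", "--trainFile", help="file listing the trainset files")
    parser.add_argument("-tef", "--testFile", help="file listing the testset files")
    parser.add_argument("-s", "--save", type=str2bool, default=False,
                        help="True if you want to save the model, default = False")
    parser.add_argument("-l", "--load", type=str2bool, default=False,
                        help="True if you want to load the model, default = False")
    parser.add_argument("-b", "--bias", type=float, default=1,
                        help="bias value, default = 1")
    parser.add_argument("-rn", "--reportName", help="base name for the report's folder")
    parser.add_argument("-mp", "--modelPath", help="model's path")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

The sketch uses an explicit `str2bool` converter because `type=bool` in argparse treats any non-empty string, including "False", as true.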
To use the environment created with pipenv, launch it from the root folder of the project:
pipenv shell
python GM-Regression.py -e 1000 -rdd data/processed/Acyclic/ -trf trainset_0.ds -tef testset_0.ds --reportName ACYCLIC --save True
(Check that you have a saved model.)
python GM-Regression.py -rdd data/processed/Acyclic/ --report True -tef testset_0.ds --reportName ACYCLIC --load True --modelPath models/ACYCLIC/model_testset_0.ds-Dvalue12-maxMValue4-Saved.pth
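The check for a saved model can be automated before launching the prediction run. The sketch below is an illustration only: `build_predict_command` is a hypothetical helper, not part of the repository, and it simply assembles the same command line as above after verifying that the model file exists.

```python
import sys
from pathlib import Path


def build_predict_command(model_path, root_dir, test_file, report_name):
    """Return the prediction command as an argument list; raise if the model file is missing."""
    model_path = Path(model_path)
    if not model_path.is_file():
        raise FileNotFoundError(
            "no saved model at %s; run training with --save True first" % model_path
        )
    return [
        sys.executable, "GM-Regression.py",
        "-rdd", str(root_dir),
        "--report", "True",
        "-tef", test_file,
        "--reportName", report_name,
        "--load", "True",
        "--modelPath", str(model_path),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the root folder of the project, inside the pipenv environment.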
- Pycharm - Integrated development environment (IDE)
- Git - distributed version-control system for tracking changes in source code during software development.
- Python 3.7.5 - Interpreted, high-level, general-purpose programming language
- Pytorch - An open source machine learning framework
- Jupyter Notebook - Open-source web application
- Pipenv - Packaging tool for Python

#### Based on:
- Scott - software able to compute, for any fully-labelled (edge and node) graph, a canonical tree representative of its isomorphism class, that can be derived into a canonical trace (string) or adjacency matrix
- Graph Machines and Their Applications to Computer-Aided Drug Design: A New Approach to Learning from Structured Data - Graph machines learn real numbers from graphs.
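For readers unfamiliar with the Newick format that the scripts in src/scott produce, the sketch below shows how a labelled tree is written as a Newick string: children go inside parentheses, followed by the parent's label, and the whole tree ends with a semicolon. This is only a toy serializer on a hypothetical tree; it does not use Scott's API and performs no canonicalization.

```python
def to_newick(node):
    """Serialize a (label, children) tuple into a Newick string without the trailing ';'."""
    label, children = node
    if not children:
        return label
    return "(" + ",".join(to_newick(child) for child in children) + ")" + label


# Hypothetical toy tree: a root "C" with two children, one of which has its own child.
tree = ("C", [("O", []), ("C", [("N", [])])])
print(to_newick(tree) + ";")  # prints (O,(N)C)C;
```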
This project is licensed under the MIT License - see the LICENSE file for details
- If you want to use a different dataset, make sure it follows the same layout as the datasets in data/processed.
- Use
watch -n 0.5 nvidia-smi
to check the GPU status.
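The same information can be polled from Python, for example to log it during training. This is a minimal sketch under the assumption that `nvidia-smi` is on the PATH; `gpu_status` is a hypothetical helper and returns an empty list when no NVIDIA driver is available.

```python
import shutil
import subprocess


def gpu_status():
    """Return one CSV line per GPU from nvidia-smi, or an empty list if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for line in gpu_status():
        print(line)
```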