Zhangyang Gao*, Daize Dong*, Cheng Tan, Jun Xia, Bozhen Hu, Stan Z. Li
Published at the 41st International Conference on Machine Learning (ICML 2024).
Can we model Non-Euclidean graphs as pure language, or even as Euclidean vectors, while retaining their inherent information? The Non-Euclidean property has posed a long-standing challenge in graph modeling. Despite recent efforts by graph neural networks and graph transformers to encode graphs as Euclidean vectors, recovering the original graph from those vectors remains a challenge. In this paper, we introduce GraphsGPT, featuring a Graph2Seq encoder that transforms Non-Euclidean graphs into learnable GraphWords in Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from GraphWords to ensure information equivalence. We pretrain GraphsGPT on 100M molecules and report several interesting findings:
- The pretrained Graph2Seq excels in graph representation learning, achieving state-of-the-art results on graph classification and regression tasks.
- The pretrained GraphGPT serves as a strong graph generator, demonstrated by its ability to perform both few-shot and conditional graph generation.
- Graph2Seq+GraphGPT enables effective graph mixup in the Euclidean space, overcoming previously known Non-Euclidean challenges.
- The edge-centric pretraining framework GraphsGPT demonstrates its efficacy in graph domain tasks, excelling in both representation and generation.
To get started with GraphsGPT, run the following commands to set up the environment.
```bash
git clone git@github.com:A4Bio/GraphsGPT.git --depth=1
cd GraphsGPT
conda create --name graphsgpt python=3.12
conda activate graphsgpt
pip install -e .[dev]
pip install -r requirements.txt
```
We provide some Jupyter Notebooks in `./jupyter_notebooks`, along with their corresponding online Google Colaboratory notebooks. You can run them for a quick start.
| | Jupyter Notebook | Google Colaboratory |
|---|---|---|
| GraphsGPT Pipeline | example_pipeline.ipynb | |
| Clustering Analysis | clustering.ipynb | |
| Hybridization Analysis | hybridization.ipynb | |
| Interpolation Analysis | interpolation.ipynb | |
The model checkpoints can be downloaded from 🤗 Hugging Face. We provide both the foundational pretrained models with different numbers of Graph Words and a finetuned model for conditional generation.
| Model Name | Model Type | Model Checkpoint |
|---|---|---|
| GraphsGPT-1W | Foundation Model | |
| GraphsGPT-2W | Foundation Model | |
| GraphsGPT-4W | Foundation Model | |
| GraphsGPT-8W | Foundation Model | |
| GraphsGPT-1W-C | Finetuned Model | |
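If you want to load a checkpoint programmatically rather than through the notebooks, the snippet below shows one plausible way to do it with the `transformers` library. The repository ID (`DaizeDong/GraphsGPT-1W`) and the use of `AutoModel` with `trust_remote_code` are assumptions for illustration; check the checkpoint links in the table above and the example notebooks for the exact import path and class name.

```python
# Hedged loading sketch: the Hugging Face repo ID below is a guess, and the
# actual model class may need to be imported from this repository instead of
# being resolved through AutoModel. Adjust to match the example notebooks.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "DaizeDong/GraphsGPT-1W",  # hypothetical repo ID; use the link from the table above
    trust_remote_code=True,    # lets the checkpoint ship its own model code, if configured
)
model.eval()
print(model.config)            # inspect, e.g., the number of Graph Words and hidden size
```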
You should first download the configurations and data for finetuning and put them in `./data_finetune`. (We also include the finetuned checkpoints in the `model_zoom.zip` file for a quick test.)
To evaluate the representation performance of the Graph2Seq Encoder, please run:
```bash
bash ./scripts/representation/finetune.sh
```
You can also toggle the `--mixup_strategy` option for graph mixup using Graph2Seq.
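Because GraphWords live in Euclidean space, graph mixup reduces to interpolating two fixed-size tensors rather than matching or editing graph structures. The sketch below illustrates this idea with random stand-ins for encoder outputs; it is not the implementation behind `--mixup_strategy`, and the shapes are assumptions (one Graph Word of dimension 512).

```python
import torch

def mixup_graph_words(words_a: torch.Tensor, words_b: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
    """Convex combination of two GraphWords tensors of shape (K, d).

    Since GraphWords are plain Euclidean vectors, no graph matching or
    edit-distance machinery is needed to mix two molecules.
    """
    assert words_a.shape == words_b.shape, "both molecules must use the same number of Graph Words"
    return lam * words_a + (1.0 - lam) * words_b

# Toy example with random tensors standing in for Graph2Seq outputs (K=1, d=512).
words_a = torch.randn(1, 512)
words_b = torch.randn(1, 512)
mixed = mixup_graph_words(words_a, words_b, lam=0.3)
# In the real pipeline, `mixed` would be decoded back into a molecular graph
# by the GraphGPT decoder (see the generation instructions below).
```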
For the unconditional generation with GraphGPT Decoder, please refer to README-Generation-Uncond.md.
For the conditional generation with GraphGPT-C Decoder, please refer to README-Generation-Cond.md.
To evaluate the few-shot generation performance of the GraphGPT Decoder, please run:
```bash
bash ./scripts/generation/evaluation/moses.sh
bash ./scripts/generation/evaluation/zinc250k.sh
```
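For intuition, few-shot generation can be pictured as fitting a simple distribution over the GraphWords of a handful of reference molecules and decoding new samples from it. The sketch below uses random tensors in place of real encoder outputs and assumed shapes; it only illustrates sampling in the Euclidean GraphWords space and is not what the evaluation scripts above implement.

```python
import torch

# Pretend these are GraphWords (K=1, d=512) encoded from 8 reference molecules.
reference_words = torch.randn(8, 1, 512)

# Fit a diagonal Gaussian over the reference GraphWords ...
mean = reference_words.mean(dim=0)
std = reference_words.std(dim=0)

# ... and draw new points in the same Euclidean space.
num_samples = 32
samples = mean + std * torch.randn(num_samples, *mean.shape)
print(samples.shape)  # torch.Size([32, 1, 512])

# Each sample would then be decoded by GraphGPT into a candidate molecule and
# scored with the MOSES / ZINC250k metrics used by the scripts above.
```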
```bibtex
@article{gao2024graph,
  title={A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer},
  author={Gao, Zhangyang and Dong, Daize and Tan, Cheng and Xia, Jun and Hu, Bozhen and Li, Stan Z},
  journal={arXiv preprint arXiv:2402.02464},
  year={2024}
}
```
If you have any questions, please contact:
- Zhangyang Gao: gaozhangyang@westlake.edu.cn
- Daize Dong: dzdong2019@gmail.com