A Simple Framework for Learning MLPs on Graph without Supervision

This repo presents implementation of SimMLP: A simple framework for learning MLPs on graphs without supervision.

Link: https://arxiv.org/abs/2402.08918

Introduction

We present a simplified framework (SimMLP) to enhance model generalization, especially on unseen nodes. In particular, we employ self-supervised alignment between GNN and MLP embeddings to model the fine-grained and generalizable correlations between node features and graph structures.

Getting Started

Setup Environment

We use conda for environment setup. Please run the bash as

conda create -y -n SimMLP python=3.8
conda activate SimMLP

pip install -r requirements.txt
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cu121.html

Here a conda environment named SimMLP and install relevant requirements from requirements.txt. Be sure to activate the environment via conda activate SimMLP before running experiments as described.

Dataset Preparation

There is no need to prepare the datasets manually. It would be automatically downloaded via Pytorch Geometric library. Please refer to our paper for more specific dataset details.

Usage

Basic Usage

To quickly run the model, please run main.py by specifying the experiment setting. Here is an example.

python main.py --setting trans --dataset cora --use_params

To ensure the reproducibility of the paper, we provide the detailed hyper-parameters under param folder. You can use it by specifying --use_params in the script.

Dataset

We support ten public datasets, including cora, citeseer, pubmed, amazon-cs, amazon-photo, co-cs, co-phys, wiki-cs, flickr, ogb-arxiv. Please refer to the function get_node_clf_dataset in dataset.py for details. This enables the quick adaptation of new datasets. We encourage researchers to use Nvidia A100 (80GB) for the Arxiv dataset, and Nvidia GeForce RTX 3090 (24GB) for the remaining datasets.

Evaluation Settings

SimMLP can perform on three settings, including transductive, inductive, and cold-start. We describe these settings in the paper as:

You can run these settings by specifying the --setting trans or --setting ind. Note for SimMLP, the inference phase is naturally structure-free, thus the cold-start result is equivalent to the inductive result in inductive setting.

Extension to Graph Classification

The basic usage is the same as node classification task, but it is unnecessary to specify the experimental settings. You only need to denote the dataset. We support collab, dd, imdb-b, imdb-m, mutag, proteins, ptc-mr. You can do extension in the function get_graph_clf_dataset under dataset.py.

python main_graph.py --dataset collab --use_params

The hyper-parameters are also provided under param/graph/.

Extension to Supervised Version

To run the experiments on supervised SimMLP, please run the following script. We only implement the model on the transductive setting for simplicity. We encourage researchers to extend this in the other two settings.

python main_sup.py --dataset cora --use_params

As analyzed in the paper, the supervised version cannot achieve desirable performance. We consider it might because of the inconsistency (or even conflict) between the supervised cross-entropy loss and the self-supervised alignment loss. We leave this in the future work.

Experimental Results

We present the experimental results on node classification with three settings. Please refer to the paper for more experimental results.

Contact Us

Please open an issue or contact zwang43@nd.edu if you have questions.

Citation

Please cite the following paper corresponding to the repository:

@article{wang2024graph,
  title={Graph Inference Acceleration by Learning MLPs on Graphs without Supervision},
  author={Wang, Zehong and Zhang, Zheyuan and Zhang, Chuxu and Ye, Yanfang},
  journal={arXiv preprint arXiv:2402.08918},
  year={2024}
}

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
param		param
SimMLP.png		SimMLP.png
args.py		args.py
augment.py		augment.py
dataset.py		dataset.py
eval.py		eval.py
loss.py		loss.py
main.py		main.py
main_graph.py		main_graph.py
main_sup.py		main_sup.py
model.py		model.py
readme.md		readme.md
requirement.txt		requirement.txt
result_ind_cold.png		result_ind_cold.png
result_trans.png		result_trans.png
settings.png		settings.png
utils.py		utils.py

Zehong-Wang/SimMLP

Folders and files

Latest commit

History

Repository files navigation

A Simple Framework for Learning MLPs on Graph without Supervision

Introduction

Getting Started

Setup Environment

Dataset Preparation

Usage

Basic Usage

Dataset

Evaluation Settings

Extension to Graph Classification

Extension to Supervised Version

Experimental Results

Contact Us

Citation

About

Topics

Resources

Stars

Watchers

Forks

Languages