Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning

Citation

@misc{he2023harnessing,
      title={Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning}, 
      author={Xiaoxin He and Xavier Bresson and Thomas Laurent and Adam Perold and Yann LeCun and Bryan Hooi},
      year={2023},
      eprint={2305.19523},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

0. Python environment setup with Conda

conda create --name TAPE python=3.8
conda activate TAPE

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
conda install -c pyg pytorch-sparse
conda install -c pyg pytorch-scatter
conda install -c pyg pytorch-cluster
conda install -c pyg pyg
pip install ogb
conda install -c dglteam/label/cu113 dgl
pip install yacs
pip install transformers
pip install --upgrade accelerate
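
Once the installation finishes, a quick import check can catch a missing package before a long training run. The sketch below is not part of the repository. `missing_modules` is a hypothetical helper, and the module names are assumed to be the usual import names for the packages installed above (for example `torch_geometric` for `pyg` and `torch_sparse` for `pytorch-sparse`).

```python
import importlib.util

# Assumed import names for the packages installed in the steps above.
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio",
    "torch_sparse", "torch_scatter", "torch_cluster", "torch_geometric",
    "ogb", "dgl", "yacs", "transformers", "accelerate",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it inside the activated `TAPE` environment; any name it reports points to an install step that needs repeating.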

1. Download TAG datasets

A. Original text attributes

| Dataset | Description |
| --- | --- |
| ogbn-arxiv | OGB provides a mapping from MAG paper IDs to the raw text of titles and abstracts. Download the dataset here, unzip it, and move it to `dataset/ogbn_arxiv_orig`. |
| ogbn-products (subset) | The dataset is located under `dataset/ogbn_products_orig`. |
| arxiv_2023 | Download the dataset here, unzip it, and move it to `dataset/arxiv_2023_orig`. |
| Cora | Download the dataset here, unzip it, and move it to `dataset/cora_orig`. |
| PubMed | Download the dataset here, unzip it, and move it to `dataset/PubMed_orig`. |

B. LLM responses

| Dataset | Description |
| --- | --- |
| ogbn-arxiv | Download the dataset here, unzip and move it to `gpt_responses/ogbn-arxiv`. |
| ogbn-products (subset) | Download the dataset here, unzip and move it to `gpt_responses/ogbn-products`. |
| arxiv_2023 | Download the dataset here, unzip and move it to `gpt_responses/arxiv_2023`. |
| Cora | Download the dataset here, unzip and move it to `gpt_responses/cora`. |
| PubMed | Download the dataset here, unzip and move it to `gpt_responses/PubMed`. |
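
Because each archive has to be unzipped and moved by hand, it can help to confirm that every directory ended up where the tables above say it should. The following is a minimal sketch, not part of the repository. `missing_dirs` is a hypothetical helper, and it assumes the paths are relative to the repository root.

```python
from pathlib import Path

# Directories named in sections A and B, assumed relative to the repository root.
EXPECTED_DIRS = [
    "dataset/ogbn_arxiv_orig",
    "dataset/ogbn_products_orig",
    "dataset/arxiv_2023_orig",
    "dataset/cora_orig",
    "dataset/PubMed_orig",
    "gpt_responses/ogbn-arxiv",
    "gpt_responses/ogbn-products",
    "gpt_responses/arxiv_2023",
    "gpt_responses/cora",
    "gpt_responses/PubMed",
]


def missing_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]


if __name__ == "__main__":
    for d in missing_dirs():
        print("Missing:", d)
```

The check only looks at directory names; it does not verify the contents of the downloaded archives.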

2. Fine-tuning the LMs

To use the original text attributes

WANDB_DISABLED=True TOKENIZERS_PARALLELISM=False CUDA_VISIBLE_DEVICES=0,1,2,3 python -m core.trainLM dataset ogbn-arxiv

To use the GPT responses

WANDB_DISABLED=True TOKENIZERS_PARALLELISM=False CUDA_VISIBLE_DEVICES=0,1,2,3 python -m core.trainLM dataset ogbn-arxiv lm.train.use_gpt True
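
The two commands above differ only in the trailing `lm.train.use_gpt True` override. When scripting several runs, the environment variables and arguments can be assembled in one place. The sketch below is an illustration only: `build_lm_command` is a hypothetical helper that is not in the repository, and it reuses exactly the variables and arguments shown above.

```python
import os
import shlex


def build_lm_command(dataset="ogbn-arxiv", use_gpt=False, gpus=(0, 1, 2, 3)):
    """Return (env, argv) mirroring the fine-tuning commands shown above."""
    env = dict(os.environ)
    env.update({
        "WANDB_DISABLED": "True",
        "TOKENIZERS_PARALLELISM": "False",
        "CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpus),
    })
    argv = ["python", "-m", "core.trainLM", "dataset", dataset]
    if use_gpt:
        # Switch from the original text attributes to the GPT responses.
        argv += ["lm.train.use_gpt", "True"]
    return env, argv


if __name__ == "__main__":
    # Print the commands rather than launching them.
    for use_gpt in (False, True):
        _, argv = build_lm_command(use_gpt=use_gpt)
        print(shlex.join(argv))
```

To launch a run, pass the result to `subprocess.run(argv, env=env, check=True)` from the repository root.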

3. Training the GNNs

To use different GNN models

python -m core.trainEnsemble gnn.model.name MLP
python -m core.trainEnsemble gnn.model.name GCN
python -m core.trainEnsemble gnn.model.name SAGE
python -m core.trainEnsemble gnn.model.name RevGAT gnn.train.lr 0.002 gnn.train.dropout 0.75
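
To sweep over all four models without retyping the overrides, the commands above can be generated from a small table. This is a sketch under the assumption that only RevGAT needs the extra learning-rate and dropout settings, as in the commands shown. `MODEL_OVERRIDES` and `ensemble_command` are hypothetical names, not part of the repository.

```python
# Per-model overrides copied from the commands above; only RevGAT has extras.
MODEL_OVERRIDES = {
    "MLP": {},
    "GCN": {},
    "SAGE": {},
    "RevGAT": {"gnn.train.lr": "0.002", "gnn.train.dropout": "0.75"},
}


def ensemble_command(model):
    """Build the argument list for one trainEnsemble run."""
    argv = ["python", "-m", "core.trainEnsemble", "gnn.model.name", model]
    for key, value in MODEL_OVERRIDES[model].items():
        argv += [key, value]
    return argv


if __name__ == "__main__":
    for model in MODEL_OVERRIDES:
        print(" ".join(ensemble_command(model)))
```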

To use different types of features

# Our enriched features
python -m core.trainEnsemble gnn.train.feature_type TA_P_E

# Our individual features
python -m core.trainGNN gnn.train.feature_type TA
python -m core.trainGNN gnn.train.feature_type E
python -m core.trainGNN gnn.train.feature_type P

# OGB features
python -m core.trainGNN gnn.train.feature_type ogb
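
The README does not spell out how the combined `TA_P_E` setting relates to the individual `TA`, `P`, and `E` features. One plausible reading, stated here as an assumption rather than a description of the repository's code, is that the name is split on underscores, one model is trained per feature type, and the per-class scores are averaged. The sketch below illustrates that idea in plain Python; both function names are hypothetical.

```python
def split_feature_types(feature_type):
    """Split a combined name such as 'TA_P_E' into its parts (assumed convention)."""
    return feature_type.split("_")


def average_predictions(per_feature_scores):
    """Average per-class scores over feature types.

    `per_feature_scores` maps a feature type to a list of rows,
    one row of class scores per node.
    """
    score_sets = list(per_feature_scores.values())
    n = len(score_sets)
    return [
        [sum(values) / n for values in zip(*rows)]  # average each class score
        for rows in zip(*score_sets)                # iterate node by node
    ]


if __name__ == "__main__":
    # Toy scores for two nodes and two classes, one entry per feature type.
    scores = {
        "TA": [[0.9, 0.1], [0.2, 0.8]],
        "P": [[0.6, 0.4], [0.4, 0.6]],
        "E": [[0.3, 0.7], [0.3, 0.7]],
    }
    print(split_feature_types("TA_P_E"))
    print(average_predictions(scores))
```

Averaging is only one way to combine the three models; check `core/trainEnsemble` for what the repository actually does.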

4. Reproducibility

Use `run.sh` to run the code and reproduce the published results.

This repository also provides checkpoints for all trained models (`*.ckpt`) and the TAPE features (`*.emb`) used in the project. Please download them here.

arxiv-2023 dataset

The code for constructing and processing the arxiv-2023 dataset is provided here.
