UniGLM

Official code for "UniGLM: Training One Unified Language Model for Text-Attributed Graphs".

Pipeline of the UniGLM

Representation learning on text-attributed graphs (TAGs), where nodes are represented by textual descriptions, is crucial for textual and relational knowledge systems and recommendation systems. State-of-the-art embedding methods for TAGs primarily fine-tune language models (e.g., BERT) using structure-aware training signals. While effective, these methods are tailored to individual TAGs and cannot generalize across graph scenarios. Because TAGs share a textual space, jointly fine-tuning on multiple TAGs, which aligns text and graph structure from different aspects, would be more beneficial. Motivated by this, we introduce a novel Unified Graph Language Model (UniGLM) framework, the first graph embedding model that generalizes well to both in-domain and cross-domain TAGs. Specifically, UniGLM is trained over multiple TAGs with different domains and scales using self-supervised contrastive learning. UniGLM includes an adaptive positive sample selection technique for identifying structurally similar nodes and a lazy contrastive module that accelerates training by minimizing repetitive encoding calculations. Extensive empirical results across 9 benchmark TAGs demonstrate UniGLM's efficacy against leading embedding baselines in terms of generalization (various downstream tasks and backbones) and transfer learning (in-domain and out-of-domain scenarios).

🚀Quick Start

Training and evaluating UniGLM on your own datasets takes three steps: model training, embedding generation, and GNN evaluation.

Model Training

Run run_Mix.sh to train your own model. The dataset and folder names it uses can be changed to match your setup.

Embedding Generation

Use run_emb.sh and run_emb_transfer.sh to generate embeddings.

GNN Evaluation

Use run_GNN.sh to run evaluations on different GNNs.
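The three steps above can be chained in a small wrapper. The following is a minimal sketch, not part of the repository: `REPO_DIR`, `run_step`, and `run_pipeline` are hypothetical names introduced here, and it assumes the scripts take no arguments, are meant to be launched from the repository root, and should run in the order this README lists them. Running run_emb_transfer.sh on every pass is also an assumption; it may only be needed for transfer experiments.

```shell
#!/usr/bin/env bash
# Sketch: run the UniGLM scripts in README order.
# REPO_DIR, run_step and run_pipeline are illustrative names (assumptions),
# not something the repository defines.

REPO_DIR="${REPO_DIR:-.}"   # path to the repository checkout

# Run one script from inside REPO_DIR; fail if it is missing or exits non-zero.
run_step() {
    if [ ! -f "$REPO_DIR/$1" ]; then
        echo "missing: $1" >&2
        return 1
    fi
    echo "running: $1"
    (cd "$REPO_DIR" && bash "./$1")
}

# Training -> embedding generation -> GNN evaluation, stopping at the first failure.
run_pipeline() {
    for step in run_Mix.sh run_emb.sh run_emb_transfer.sh run_GNN.sh; do
        run_step "$step" || return 1
    done
}
```

After sourcing the snippet, set `REPO_DIR` to the checkout path and call `run_pipeline`. Stopping at the first failure keeps a failed training run from feeding stale checkpoints into the embedding and evaluation steps.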

Reproducibility

Model weights are available at deltayf/UniGLM.

Citation

If you find this repo useful, please star the repo and cite:

@article{fang2024uniglm,
  title={UniGLM: Training One Unified Language Model for Text-Attributed Graphs},
  author={Yi Fang and Dongzhe Fan and Sirui Ding and Ninghao Liu and Qiaoyu Tan},
  journal={arXiv preprint arXiv:2406.12052},
  year={2024},
  url={https://arxiv.org/abs/2406.12052}
}

