This repository contains the source code and datasets for MLMP: Metapath-enhanced Language Model Pretraining on Text-Attributed Heterogeneous Graphs.
Download processed data. To reproduce the results in our paper, first download the processed datasets. Also download bert-base-cased, and put both into ./data.
Execute ./data/data_process.ipynb for the OAG-Venue dataset and ./data/data_process_googreads.ipynb for the GoodReads dataset.
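The data preparation steps above could be scripted roughly as follows. This is only a sketch under several assumptions: it uses `huggingface-cli` (from the `huggingface_hub` package) and `jupyter nbconvert`, neither of which this README requires, and the target directory `./data/bert-base-cased` is a guess at the expected layout. The function names are hypothetical. Downloading the processed datasets is not covered here.

```sh
#!/bin/sh
# Sketch of the data-preparation steps; call the functions from the repository root.

# Fetch bert-base-cased from the Hugging Face Hub.
# ASSUMPTION: the subdirectory ./data/bert-base-cased is a guess at the layout;
# the README only says the model goes into ./data.
fetch_bert() {
    huggingface-cli download bert-base-cased --local-dir ./data/bert-base-cased
}

# Execute one preprocessing notebook in place, without opening Jupyter.
# $1: path to the notebook, e.g. ./data/data_process.ipynb
run_notebook() {
    jupyter nbconvert --to notebook --execute --inplace "$1"
}

# Run the model download and both notebooks, stopping at the first failure.
prepare_data() {
    fetch_bert || return 1
    run_notebook ./data/data_process.ipynb || return 1        # OAG-Venue
    run_notebook ./data/data_process_googreads.ipynb          # GoodReads
}
```

After sourcing the file, calling `prepare_data` from the repository root runs the three steps in sequence. Opening the notebooks in Jupyter and running all cells by hand should be equivalent.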
Run pretraining in ./pretrain.
sh run.sh
Run node classification in ./downstream/node-classification.
sh run.sh
Run link prediction in ./downstream/link-predict.
sh run.sh
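To run all three stages in one go, a small wrapper along these lines may help. It is a minimal sketch, not part of the repository: `run_stage` and `run_all` are hypothetical names, and it assumes each `run.sh` is meant to be started from inside its own directory and that the stages should run in the order listed above.

```sh
#!/bin/sh
# Run each stage's run.sh from inside its own directory, in README order.
# ASSUMPTION: the functions are called from the repository root.

run_stage() {
    # $1: stage directory that should contain run.sh
    if [ ! -f "$1/run.sh" ]; then
        echo "skipping $1: no run.sh found" >&2
        return 1
    fi
    # Subshell so the cd does not leak into the next stage.
    (cd "$1" && sh run.sh)
}

run_all() {
    for stage in pretrain downstream/node-classification downstream/link-predict; do
        # Stop at the first stage that is missing or fails.
        run_stage "$stage" || return 1
    done
}
```

After sourcing the file, calling `run_all` from the repository root runs pretraining first and then the two downstream tasks, and stops as soon as one stage fails.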