
Pretraining Language Models with Text-Attributed Heterogeneous Graphs

Data Preparation

Please download the datasets from DatasetsForTHLM and put them into ./Data.
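
A minimal sketch of the expected layout, assuming the download is a single archive that unpacks into one folder per dataset (the archive name below is a placeholder, not the actual file name):

mkdir -p ./Data
unzip DatasetsForTHLM.zip -d ./Data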

Model Pretraining

Example of training THLM on the Patents dataset:

python main.py --dataset_name Patents
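
The other two datasets should be trainable by swapping the flag value (assuming main.py accepts the dataset names used elsewhere in this README):

python main.py --dataset_name GoodReads
python main.py --dataset_name OAG_Venue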

Get node embeddings

Scripts for obtaining node embeddings for Patents, GoodReads, and OAG_Venue are in ./Downstream/preprocess_data.

Example of obtaining node embeddings for Patents:

python Patent_features.py
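
The scripts for GoodReads and OAG_Venue presumably follow the same naming pattern; the file names below are assumptions, so check ./Downstream/preprocess_data for the actual ones:

python GoodReads_features.py
python OAG_Venue_features.py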

Model Evaluation

  • Link Prediction for OAG_Venue: ./Downstream/Link-Train-OAG

  • Link Prediction for Patents/GoodReads: ./Downstream/Link-Train-Patent

  • Node Classification for OAG_Venue: ./Downstream/train-OAG

  • Node Classification for Patents/GoodReads: ./Downstream/train-Patent
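
For example, to run node classification on Patents (the entry-point name below is a placeholder; use the training script actually present in that folder):

cd ./Downstream/train-Patent
python train.py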

Pre-trained Language Models

We also provide the language models pre-trained on these three datasets on HuggingFace.
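
A minimal sketch of loading one of the released checkpoints with transformers (the repository id below is a placeholder; substitute the actual HuggingFace id):

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Hope-Rita/THLM-Patents")  # placeholder repo id
model = AutoModel.from_pretrained("Hope-Rita/THLM-Patents")  # placeholder repo id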

Environment:

  • PyTorch 2.0.0
  • transformers 4.23.1
  • dgl 0.9.1
  • tqdm
  • numpy
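
These can be installed with pip, for example (GPU builds of PyTorch and DGL may require the platform-specific commands from their official install pages):

pip install torch==2.0.0 transformers==4.23.1 dgl==0.9.1 tqdm numpy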
