
Pretraining Language Models with Text-Attributed Heterogeneous Graphs

Data Preparation

Please download the datasets from DatasetsForTHLM and put them into ./Data.
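
A minimal sketch of the expected layout, assuming the download is a single archive that unpacks into one folder per dataset (the archive name below is a placeholder, not the actual file name):

mkdir -p ./Data
unzip DatasetsForTHLM.zip -d ./Data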

Model Pretraining

Example of training THLM on the Patents dataset:

python main.py --dataset_name Patents
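
The other two datasets should be trainable by swapping the flag value (assuming main.py accepts the dataset names used elsewhere in this README):

python main.py --dataset_name GoodReads
python main.py --dataset_name OAG_Venue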

Get node embeddings

Scripts for obtaining node embeddings for Patents, GoodReads, and OAG_Venue are in ./Downstream/preprocess_data.

Example of obtaining node embeddings for Patents:

python Patent_features.py
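
The scripts for GoodReads and OAG_Venue presumably follow the same naming pattern; the file names below are assumptions, so check ./Downstream/preprocess_data for the actual ones:

python GoodReads_features.py
python OAG_Venue_features.py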

Model Evaluation

  • Link Prediction for OAG_Venue: ./Downstream/Link-Train-OAG

  • Link Prediction for Patents/GoodReads: ./Downstream/Link-Train-Patent

  • Node Classification for OAG_Venue: ./Downstream/train-OAG

  • Node Classification for Patents/GoodReads: ./Downstream/train-Patent
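
For example, to run node classification on Patents (the entry-point name below is a placeholder; use the training script actually present in that folder):

cd ./Downstream/train-Patent
python train.py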

Pre-trained Language Models

We also provide the language models pre-trained on these three datasets on HuggingFace.
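
A minimal sketch of loading one of the released checkpoints with transformers (the repository id below is a placeholder; substitute the actual HuggingFace id):

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Hope-Rita/THLM-Patents")  # placeholder repo id
model = AutoModel.from_pretrained("Hope-Rita/THLM-Patents")  # placeholder repo id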

Environment:

  • PyTorch 2.0.0
  • transformers 4.23.1
  • dgl 0.9.1
  • tqdm
  • numpy
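
These can be installed with pip, for example (GPU builds of PyTorch and DGL may require the platform-specific commands from their official install pages):

pip install torch==2.0.0 transformers==4.23.1 dgl==0.9.1 tqdm numpy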
