MG-BERT

This is the source code for our paper: Parishad BehnamGhader, Hossein Zakerinia, and Mahdieh Soleymani Baghshah. "MG-BERT: Multi-Graph Augmented BERT in Masked Language Modeling." In Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15), pp. 125-131, 2021.


Datasets

You may use the CoLA and SST datasets from the GLUE repository and the Brown dataset from the Brown Corpus Manual to train and evaluate your models. The WN18 knowledge graph can also be accessed through this repository.
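For example, the CoLA training split can be loaded with a few lines of pandas. This is only a sketch assuming the standard GLUE CoLA TSV layout (four tab-separated columns, no header row); the local path is hypothetical:

import pandas as pd

# CoLA's training TSV ships without a header; the four columns are the
# sentence source, the acceptability label (0/1), the original annotation,
# and the sentence text.
cols = ["source", "label", "annotation", "sentence"]
df = pd.read_csv("data/cola/in_domain_train.tsv", sep="\t", header=None, names=cols)
sentences = df["sentence"].tolist()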


Running the code

To run the code, first use prepare_data.py to create the multi-graphs from the corpus.

python prepare_data.py --dataset cola --kg WN11  
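For intuition only, the sketch below shows one common way to build two of the corpus-level graphs used here: a TF-IDF graph of word-document edges and a PMI graph of word-word edges, computed on a toy corpus. It is not the repository's implementation, and all names are hypothetical:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["the cat sat on the mat", "the dog sat on the log"]

# TF-IDF graph: weighted word-document edges.
tfidf_graph = TfidfVectorizer().fit_transform(corpus)  # shape: (docs, vocab)

# PMI graph: word-word edges weighted by pointwise mutual information,
# estimated here from document-level co-occurrence; negative PMI is dropped.
occurs = (CountVectorizer().fit_transform(corpus).toarray() > 0).astype(float)
p_word = occurs.mean(axis=0)                     # p(word)
p_joint = (occurs.T @ occurs) / occurs.shape[0]  # p(word_i, word_j)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(p_joint / np.outer(p_word, p_word))
pmi_graph = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
np.fill_diagonal(pmi_graph, 0.0)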

Then, you can train your MG-BERT model using train.py. Here, we train a model on a multi-graph consisting of the TF-IDF, PMI, and knowledge-graph (KG) graphs.

python train.py --dataset cola --kg WN11 --dyn 0.8 --graph-mode 123 --epoch 100  

Finally, evaluate the trained model (using Hits@k metrics) with evaluate.py.

python evaluate.py --dataset cola --kg WN11 --dyn 0.8 --graph-mode 123 --epoch 100
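Hits@k measures whether the ground-truth token appears among the model's top-k predictions for a masked position. Below is a minimal, self-contained sketch of the metric using the Hugging Face transformers API, with an off-the-shelf bert-base-uncased model as a hypothetical stand-in for a trained MG-BERT checkpoint:

import torch
from transformers import BertForMaskedLM, BertTokenizer

# Stand-in model; substitute your trained MG-BERT checkpoint here.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def hits_at_k(sentence, target, k=10):
    # Assumes exactly one [MASK] in the sentence and a single-token target.
    inputs = tokenizer(sentence, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]  # shape: (vocab_size,)
    top_k = logits.topk(k).indices.tolist()
    return tokenizer.convert_tokens_to_ids(target) in top_k

print(hits_at_k("The cat sat on the [MASK].", "mat", k=10))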

Some parts of this project were adapted from the VGCN-BERT repository and Hugging Face Transformers.
