
DocBERT

Fine-tuning pre-trained BERT models for document classification tasks.

Quick start

To fine-tune the pre-trained BERT-base model on the Reuters dataset, run the following command from the project working directory:

python -m models.bert --dataset Reuters --model bert-base-uncased --max-seq-length 256 --batch-size 16 --lr 2e-5 --epochs 30

The best model weights will be saved in

models/bert/saves/Reuters/best_model.pt
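
The checkpoint can be loaded with plain PyTorch for inspection or downstream use. A minimal sketch, assuming the file was written with torch.save (whether it holds the full module or only a state dict depends on the training script, so adjust accordingly):

import torch

# Load the saved checkpoint; map_location='cpu' avoids requiring a GPU.
checkpoint = torch.load('models/bert/saves/Reuters/best_model.pt',
                        map_location='cpu')

# The stored object may be a full nn.Module or an OrderedDict of weights,
# depending on how the training script called torch.save.
print(type(checkpoint))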

To evaluate the trained model, pass the saved checkpoint via --trained-model:

python -m models.bert --dataset Reuters --model bert-base-uncased --max-seq-length 256 --batch-size 16 --lr 2e-5 --epochs 30 --trained-model models/bert/saves/Reuters/best_model.pt

Model Types

We support the same model types as Hugging Face's implementation (see the sketch after the list):

  • bert-base-uncased
  • bert-large-uncased
  • bert-base-cased
  • bert-large-cased
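
Any of these identifiers can be passed to the --model flag. As an illustration only, the same names resolve as pre-trained checkpoints through Hugging Face's current transformers package; this repository may pin the older pytorch-pretrained-bert API, so the snippet below is a sketch rather than Hedwig's own code path:

from transformers import BertModel, BertTokenizer

# Any of the identifiers above works as a pre-trained checkpoint name.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Tokenize a toy document and run it through the encoder.
inputs = tokenizer("Crude oil futures rose sharply.", return_tensors='pt')
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768) for bert-base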

Datasets

We evaluate the model on the following datasets:

  • Reuters (ModApte)
  • AAPD
  • IMDB
  • Yelp 2014

Settings

The fine-tuning procedure can be found in:

Acknowledgement
