
Update READMEs (#13)

Ashutosh-Adhikari committed Apr 19, 2019
1 parent 7d24958 commit 284e2ddf9a1cd82090583edc7aa69c683c7463e2
Showing with 54 additions and 4 deletions.
  1. +4 −3 README.md
  2. +49 −0 models/bert/README.md
  3. +1 −1 models/reg_lstm/README.md
@@ -6,11 +6,12 @@ This repo contains PyTorch deep learning models for document classification, imp

## Models

-+ [Kim CNN](models/kim_cnn/): CNNs for sentence classification [(Kim, EMNLP 2014)](http://www.aclweb.org/anthology/D14-1181)
-+ [HAN](models/han/): Hierarchical Attention Networks [(Zichao, et al, NAACL 2016)](https://www.cs.cmu.edu/~hovy/papers/16HLT-hierarchical-attention-networks.pdf)
-+ [Reg-LSTM](models/reg_lstm/): Regularized LSTM for document classification [(Merity et al.)](https://arxiv.org/abs/1708.02182)
++ [DocBERT](models/bert/): DocBERT: BERT for Document Classification [(Adhikari et al., 2019)](https://arxiv.org/abs/1904.08398v1)
++ [Reg-LSTM](models/reg_lstm/): Regularized LSTM for document classification [(Merity et al., 2017)](https://arxiv.org/abs/1708.02182)
 + [XML-CNN](models/xml_cnn/): CNNs for extreme multi-label text classification [(Liu et al., SIGIR 2017)](http://nyc.lti.cs.cmu.edu/yiming/Publications/jliu-sigir17.pdf)
++ [HAN](models/han/): Hierarchical Attention Networks [(Zichao et al., NAACL 2016)](https://www.cs.cmu.edu/~hovy/papers/16HLT-hierarchical-attention-networks.pdf)
 + [Char-CNN](models/char_cnn/): Character-level Convolutional Network [(Zhang et al., NIPS 2015)](http://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf)
++ [Kim CNN](models/kim_cnn/): CNNs for sentence classification [(Kim, EMNLP 2014)](http://www.aclweb.org/anthology/D14-1181)

Each model directory has a `README.md` with further details.

@@ -0,0 +1,49 @@
# DocBERT

Fine-tuning the pre-trained [BERT](https://arxiv.org/abs/1810.04805) models for document classification tasks.

## Quick start

To fine-tune the pre-trained BERT-base model on the Reuters dataset, run the following from the project working directory:

```
python -m models.bert --dataset Reuters --model bert-base-uncased --max-seq-length 256 --batch-size 16 --lr 2e-5 --epochs 30
```

The best model weights will be saved in

```
models/bert/saves/Reuters/best_model.pt
```
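
A minimal sketch of reloading that checkpoint for later use, assuming it was written with `torch.save()`; whether the file holds the full model object or only a state dict depends on the training script, so treat this as illustrative rather than the repo's own loading path:

```
import torch

# Reload the saved checkpoint on CPU; assumes torch.save() stored the
# full model object. If it stored a state_dict instead, rebuild the
# model first and call model.load_state_dict() on the loaded object.
model = torch.load('models/bert/saves/Reuters/best_model.pt',
                   map_location='cpu')
model.eval()  # disable dropout before running inference
```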

To test the model, you can use the following command.

```
python -m models.bert --dataset Reuters --model bert-base-uncased --max-seq-length 256 --batch-size 16 --lr 2e-5 --epochs 30 --trained-model models/bert/saves/Reuters/best_model.pt
```

## Model Types

We follow the same model types as [huggingface's implementation](https://github.com/huggingface/pytorch-pretrained-BERT.git); a loading sketch follows the list:
- bert-base-uncased
- bert-large-uncased
- bert-base-cased
- bert-large-cased
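
As a minimal sketch (not the repo's own training code), one of these model types can be loaded with the pytorch-pretrained-BERT package linked above; `num_labels=90` and the toy input are assumptions, the label count matching the 90-class Reuters (ModApte) split:

```
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification

# Load one of the supported model types; weights download on first use.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased',
                                                      num_labels=90)  # assumed label count

# Tokenize a toy document the way BERT expects: [CLS] ... [SEP].
tokens = ['[CLS]'] + tokenizer.tokenize('grain prices rose sharply') + ['[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

model.eval()
with torch.no_grad():
    logits = model(input_ids)  # shape: (1, num_labels)
```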

## Dataset

We evaluate the model on the following datasets:

- Reuters (ModApte)
- AAPD
- IMDB
- Yelp 2014

## Settings

The fine-tuning procedure is described in the following papers (a sketch of the optimizer setup they imply follows the list):
- [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
- [DocBERT: BERT for Document Classification](https://arxiv.org/abs/1904.08398v1)
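
As a hedged sketch of that procedure, the snippet below sets up the `BertAdam` optimizer from pytorch-pretrained-BERT with the linear learning-rate warmup those papers use; `num_train_steps`, the 10% warmup fraction, and the weight-decay split are assumptions taken from common BERT fine-tuning recipes, not values confirmed by this repo:

```
from pytorch_pretrained_bert import BertForSequenceClassification
from pytorch_pretrained_bert.optimization import BertAdam

model = BertForSequenceClassification.from_pretrained('bert-base-uncased',
                                                      num_labels=90)  # assumed label count

# Assumed value: total optimizer steps = (num examples / batch size) * epochs.
num_train_steps = 1000

# Common recipe: no weight decay on biases or LayerNorm parameters.
no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
grouped_params = [
    {'params': [p for n, p in model.named_parameters()
                if not any(nd in n for nd in no_decay)],
     'weight_decay': 0.01},
    {'params': [p for n, p in model.named_parameters()
                if any(nd in n for nd in no_decay)],
     'weight_decay': 0.0},
]
optimizer = BertAdam(grouped_params,
                     lr=2e-5,     # matches the --lr flag in the quick start
                     warmup=0.1,  # linear warmup over the first 10% of steps
                     t_total=num_train_steps)
```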

## Acknowledgement
- Our implementation is inspired by [huggingface's implementation](https://github.com/huggingface/pytorch-pretrained-BERT.git)
@@ -13,7 +13,7 @@ python -m models.reg_lstm --dataset Reuters --mode static --batch-size 32 --lr 0
The best model weights will be saved in

```
-model/reg_lstm/saves/Reuters/best_model.pt
+models/reg_lstm/saves/Reuters/best_model.pt
```

To test the model, you can use the following command.
