# Reuters Text Classification

We use the Reuters-21578 Text Categorization Collection to perform multi-label text classification with Transformer-based models. Given a Reuters news article, the task is to assign it one or more topics.

The dataset is imbalanced and contains documents of widely varying length (as illustrated by the tables below). As such, we use techniques such as inverse class frequency weighting to address the imbalance.
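A minimal sketch of inverse class frequency weighting, assuming a PyTorch setup with multi-hot labels (the normalization and variable names here are illustrative, not necessarily those used in `train_bert.py`):

```python
import torch

def inverse_class_frequency_weights(label_matrix: torch.Tensor) -> torch.Tensor:
    """One weight per class: num_samples / class_count, mean-normalized."""
    counts = label_matrix.sum(dim=0).clamp(min=1)  # per-class positive counts
    weights = label_matrix.shape[0] / counts       # rarer class -> larger weight
    return weights / weights.mean()                # keep the overall loss scale stable

# Hypothetical usage with a multi-hot label matrix (num_docs x num_topics).
labels = torch.randint(0, 2, (1000, 90)).float()
pos_weight = inverse_class_frequency_weights(labels)
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```

Rare topics then contribute more to the loss, counteracting the dominance of frequent labels such as `earn` and `acq`.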

Distribution of tokens per article (using `bert-base-cased`):

| Metric | Value |
| --- | --- |
| Average | 121.95 |
| Standard deviation | 109.25 |
| Maximum | 2976 |
| Minimum | 4 |
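These statistics can be reproduced roughly as follows (the corpus-loading step is omitted; `encode` includes special tokens by default, so counts may differ slightly from the table):

```python
from statistics import mean, stdev
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def token_length_stats(texts):
    """Per-document token counts under the bert-base-cased tokenizer."""
    lengths = [len(tokenizer.encode(text)) for text in texts]
    return {
        "average": mean(lengths),
        "standard_deviation": stdev(lengths),
        "maximum": max(lengths),
        "minimum": min(lengths),
    }
```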

Distribution of topics:

| Topic | Count |
| --- | --- |
| [earn] | 3687 |
| [acq] | 1994 |
| [crude] | 326 |
| [trade] | 307 |
| [money-fx] | 243 |
| ... | ... |
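Topic labels come as lists per article, so they are typically converted to a multi-hot matrix before training; a sketch with scikit-learn (assumed tooling, the repository's preprocessing may differ):

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical per-article topic lists; the real ones come from the dataset.
doc_topics = [["earn"], ["acq", "crude"], ["trade", "money-fx"]]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(doc_topics)  # (num_docs, num_topics) multi-hot matrix
print(mlb.classes_)                # column order of the label matrix
```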

## Results

### BERT

After finetuning the last two layers of a cased BERT model plus an added dense layer for 22 epochs (refer to `train_bert.py`), we obtain the following results on the validation data:

| Metric | Value |
| --- | --- |
| Exact-Match Accuracy | 0.9065 |
| Hamming Loss | 0.0026 |
| Micro-F1 | 0.9426 |
| Macro-F1 | 0.833 |
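A sketch of that setup, assuming the Hugging Face `transformers` `BertModel` (the head and pooling choice are illustrative; see `train_bert.py` for the actual code):

```python
import torch.nn as nn
from transformers import BertModel

class BertMultiLabel(nn.Module):
    def __init__(self, num_labels: int):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-cased")
        # Freeze everything, then unfreeze only the last two encoder layers.
        for param in self.bert.parameters():
            param.requires_grad = False
        for layer in self.bert.encoder.layer[-2:]:
            for param in layer.parameters():
                param.requires_grad = True
        # Added dense layer producing one logit per topic.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.classifier(out.pooler_output)  # (batch, num_labels) logits
```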

### XLNet

After finetuning the last two layers of an XLNet model plus an added dense layer for 12 epochs (refer to `train_xlnet.py`), we obtain the following results on the validation data:

| Metric | Value |
| --- | --- |
| Exact-Match Accuracy | 0.923 |
| Hamming Loss | 0.0019 |
| Micro-F1 | 0.956 |
| Macro-F1 | 0.905 |
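The four metrics reported above can be computed with scikit-learn; this sketch assumes sigmoid outputs thresholded at 0.5, which is a common but assumed choice:

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss, f1_score

def evaluate(logits: np.ndarray, y_true: np.ndarray) -> dict:
    """Multi-label metrics on thresholded sigmoid outputs."""
    y_pred = (1 / (1 + np.exp(-logits)) >= 0.5).astype(int)
    return {
        "exact_match_accuracy": accuracy_score(y_true, y_pred),  # subset accuracy
        "hamming_loss": hamming_loss(y_true, y_pred),
        "micro_f1": f1_score(y_true, y_pred, average="micro"),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
    }
```

Note that on multi-label inputs, `accuracy_score` measures exact-match (subset) accuracy, matching the first row of the tables.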
