This demo applies a Transformer decoder to a text classification task. In practice, it is hard to tune the model's hyperparameters to good values, so here is a brief summary of what helped during fine-tuning.
- Adam optimization with a learning rate of 1.5e-4 (1e-4 magnitude) and an exponential learning-rate decay applied each epoch. If the learning rate is set larger (1e-3 magnitude), results are noticeably worse.
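
A minimal sketch of this setup in PyTorch; the model here is a placeholder, and the decay factor `gamma=0.9` is an assumption rather than the demo's actual value:

```python
import torch

# Hypothetical stand-in for the transformer-decoder classifier.
model = torch.nn.Linear(300, 2)

# Adam at the 1e-4 magnitude that worked well here.
optimizer = torch.optim.Adam(model.parameters(), lr=1.5e-4)

# Exponential decay applied once per epoch; gamma=0.9 is an assumed factor.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(10):
    # ... run one epoch of training here ...
    scheduler.step()  # decay the learning rate after each epoch
```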
- I tried adding a BiLSTM layer on top of the transformer decoder, but got a worse result. I suspect this is due to a mismatch between the learning rates the decoder layers and the LSTM layer need, which makes it hard to fine-tune both well simultaneously (see the parameter-group sketch after this item). It also reinforces how effective the "unsupervised pre-training + supervised fine-tuning" paradigm has proven to be.
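
If you do want to stack an LSTM on top, one mitigation worth trying is giving each module its own learning rate via optimizer parameter groups. This is a hedged sketch, not something the demo itself does; the two rates and module shapes are assumptions:

```python
import torch
import torch.nn as nn

decoder = nn.TransformerDecoderLayer(d_model=300, nhead=6)  # stand-in decoder block
bilstm = nn.LSTM(300, 150, bidirectional=True, batch_first=True)

# Separate learning rates per module, so neither starves nor destabilizes the other.
optimizer = torch.optim.Adam([
    {"params": decoder.parameters(), "lr": 1.5e-4},  # rate that worked for the decoder
    {"params": bilstm.parameters(), "lr": 1e-3},     # assumed higher rate for the fresh LSTM
])
```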
- For the activation function, GELU seems to work worse than ReLU here (maybe due to the limited corpus); the feed-forward sketch further below takes the activation as a parameter, so the two are easy to compare.
- For regularization, use dropout with a rate of 0.1 (rates in the 0.2 to 0.5 range hurt) together with label smoothing for better generalization.
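
In PyTorch both are one-liners; the smoothing factor 0.1 below is an assumption (a common default), not necessarily what this demo uses:

```python
import torch.nn as nn

dropout = nn.Dropout(p=0.1)  # the rate that generalized best here

# label_smoothing spreads a little probability mass over non-target classes.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```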
- The pre-norm residual form `x + dropout(sublayer(norm(x)))` gives a better result than the post-norm form `norm(x + dropout(sublayer(x)))`.
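
A sketch of the two residual orderings, assuming a generic `sublayer` callable:

```python
import torch.nn as nn

class PreNormResidual(nn.Module):
    """x + dropout(sublayer(norm(x))) -- the variant that worked better here."""
    def __init__(self, d_model, sublayer, p=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.sublayer = sublayer
        self.dropout = nn.Dropout(p)

    def forward(self, x):
        return x + self.dropout(self.sublayer(self.norm(x)))

class PostNormResidual(nn.Module):
    """norm(x + dropout(sublayer(x))) -- the original Transformer ordering."""
    def __init__(self, d_model, sublayer, p=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.sublayer = sublayer
        self.dropout = nn.Dropout(p)

    def forward(self, x):
        return self.norm(x + self.dropout(self.sublayer(x)))

# Example: wrap a toy sublayer.
block = PreNormResidual(d_model=300, sublayer=nn.Linear(300, 300))
```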
- For the position-wise feed-forward network, you can set the inner dimensionality larger (here, d_ff = 1024 for d_model = 300).
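
A sketch of the position-wise feed-forward network with the dimensions above; the activation is a parameter so ReLU and GELU (compared earlier) can be swapped:

```python
import torch.nn as nn

class PositionwiseFeedForward(nn.Module):
    def __init__(self, d_model=300, d_ff=1024, activation=nn.ReLU, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),   # expand to the larger inner dimension
            activation(),               # ReLU worked better than GELU here
            nn.Dropout(p),
            nn.Linear(d_ff, d_model),   # project back to the model dimension
        )

    def forward(self, x):
        return self.net(x)

ffn = PositionwiseFeedForward(activation=nn.GELU)  # swap in GELU to compare
```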
- For the multi-head self-attention network, more heads gave better performance.
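
Note that the head count must divide d_model evenly; with d_model = 300, valid choices include 4, 5, 6, 10, or 12 heads. A quick sketch using PyTorch's built-in attention (an assumption; the demo may implement its own):

```python
import torch
import torch.nn as nn

d_model = 300
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=6, batch_first=True)

x = torch.randn(8, 32, d_model)  # (batch, seq_len, d_model)
out, weights = attn(x, x, x)     # self-attention: query = key = value
```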
- The more data, the better the results!