TransEmoC: A Transformer-Based Unified Toolkit for Emotion Classification in a Low Resource Language

This repository is part of the project Emotion Classification in a Resource-Constrained Language. It packages the model proposed in the corresponding paper for reuse by developers and researchers.

Find the reproducible capsule here.

Usage

  1. Clone the GitHub repository
git clone https://github.com/avishek-018/TransEmoC.git
  2. Change directory
cd TransEmoC

If you are using Colab, skip the steps above and run the following snippet instead:

!git clone https://github.com/avishek-018/TransEmoC.git
!mv  -v TransEmoC/* /content
!rm -rf TransEmoC
  3. Install the requirements
pip install -r requirements.txt
  4. Run train.py with your data and, optionally, specify the hyperparameters
python train.py  --train <"path to train data(xlsx/csv)"> --val <"path to val data(xlsx/csv)"> --text_col <"text data column name"> --model <"model name"> --label_col  <"class/label column name"> --epochs <num of epochs> --batch_size <batch size> --lr <learning rate> --maxlen <max text length>

Required arguments:

  • --train
  • --val
  • --text_col
  • --label_col

Optional arguments (otherwise the proposed hyperparameters are used):

  • --model
  • --epochs
  • --batch_size
  • --lr
  • --maxlen
Example for this repository:
python train.py  --train "data/train.xlsx" --val "data/val.xlsx" --text_col "cleaned" --label_col "classes" --model "xlm-roberta-base" --epochs 20 --batch_size 12 --lr 2e-5 --maxlen 70
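The --train/--val files can be either xlsx or csv, with the text and label columns named by --text_col and --label_col. That loading convention can be sketched as follows (load_split is a hypothetical helper for illustration, not a function from this repository):

```python
from io import StringIO

import pandas as pd

def load_split(path, text_col, label_col):
    """Load a train/val split from .xlsx or .csv and return (texts, labels)."""
    # pandas needs read_excel for .xlsx and read_csv otherwise
    df = pd.read_excel(path) if str(path).endswith(".xlsx") else pd.read_csv(path)
    return df[text_col].tolist(), df[label_col].tolist()
```

For example, with a csv whose columns are named "cleaned" and "classes" (as in this repository's data), `load_split(path, "cleaned", "classes")` returns the text list and the matching label list.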
  5. Run evaluate.py to evaluate your model on the test data
python evaluate.py --test <"path to test data(xlsx/csv)"> --text_col <"text data column name"> --label_col <"class/label column name"> --model <"/path/to/model">

Required arguments:

  • --test
  • --text_col
  • --label_col

Optional argument (otherwise the default model location is used):

  • --model
Example for this repository:
python evaluate.py --test "data/test.xlsx" --text_col "cleaned" --label_col "classes" --model "model" 

Output

A result.txt file is generated.
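The paper's headline metric is the weighted f1-score (69.73% for XLM-R on the test data). As a reference for interpreting the evaluation report, here is a minimal pure-Python sketch of how a weighted f1-score is computed; this is an illustration, not the repository's own evaluation code:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 scores averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for lab in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += support[lab] * f1  # weight each class by its support
    return total / len(y_true)
```

This matches what scikit-learn's `f1_score(..., average="weighted")` computes.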

  6. Run predict.py to get the prediction for a text
python predict.py --sentence "your text goes here" --model <"/path/to/model"> 

Example for this repository:

python predict.py --sentence "ছেলেটা তিন দিন ধরে না খেয়ে আছে" --model "model" 

Output: the predicted emotion label for the input text.
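Under the hood, prediction reduces to mapping the model's output logits onto one of the six basic emotions from the paper. A minimal sketch of that final step (the label order shown is an illustrative assumption, not necessarily the trained model's actual class order):

```python
import numpy as np

# Six basic emotions from the paper; the trained model's class index
# order may differ (illustrative assumption).
EMOTIONS = ["anger", "fear", "disgust", "sadness", "joy", "surprise"]

def predict_label(logits):
    """Softmax the logits and return the highest-probability emotion."""
    exp = np.exp(logits - np.max(logits))  # shift for numerical stability
    probs = exp / exp.sum()
    return EMOTIONS[int(np.argmax(probs))]
```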

Cite this work

If you find this repository helpful in your work, please cite the following:

@inproceedings{das-etal-2021-emotion,
    title = "Emotion Classification in a Resource Constrained Language Using Transformer-based Approach",
    author = "Das, Avishek  and
      Sharif, Omar  and
      Hoque, Mohammed Moshiul  and
      Sarker, Iqbal H.",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.naacl-srw.19",
    pages = "150--158",
    abstract = "Although research on emotion classification has significantly progressed in high-resource languages, it is still infancy for resource-constrained languages like Bengali. However, unavailability of necessary language processing tools and deficiency of benchmark corpora makes the emotion classification task in Bengali more challenging and complicated. This work proposes a transformer-based technique to classify the Bengali text into one of the six basic emotions: anger, fear, disgust, sadness, joy, and surprise. A Bengali emotion corpus consists of 6243 texts is developed for the classification task. Experimentation carried out using various machine learning (LR, RF, MNB, SVM), deep neural networks (CNN, BiLSTM, CNN+BiLSTM) and transformer (Bangla-BERT, m-BERT, XLM-R) based approaches. Experimental outcomes indicate that XLM-R outdoes all other techniques by achieving the highest weighted f{\_}1-score of 69.73{\%} on the test data.",
}

Note

If you find any anomaly or have any query or suggestion, feel free to ping. Pull requests are welcome.