Persian Emotion Detection using ParsBERT and Imbalanced Data Handling Approaches

Abstract

Emotion recognition is a machine learning application that can use text, speech, or image data gathered from social media. Detecting emotion is useful in several fields, including opinion mining. As social media has spread, platforms such as Twitter have become data sources, and the informal language used on these platforms makes emotion detection difficult. EmoPars and ArmanEmo are two new human-labeled emotion datasets for the Persian language. These datasets, especially EmoPars, suffer from a large imbalance in the number of samples per class. In this paper, we evaluate EmoPars and compare it with ArmanEmo. Throughout this analysis, we use data augmentation, data re-sampling, and class weights with Transformer-based Pretrained Language Models (PLMs) to handle the imbalance in these datasets. Moreover, we use feature selection to improve the models' performance by emphasizing specific features of the text. In addition, we propose a new policy for selecting data from EmoPars that keeps only high-confidence samples, so the model does not see samples without a specific emotion during training. Our model reaches a Macro-averaged F1-score of 0.81 on ArmanEmo and 0.76 on EmoPars, which are new state-of-the-art results on these benchmarks.
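The abstract mentions class weights without giving a formula. As a hedged illustration, the sketch below uses the inverse-frequency ("balanced") heuristic, the same rule scikit-learn's `compute_class_weight(class_weight="balanced", ...)` applies: each class gets `n_samples / (n_classes * count)`. The function names and toy labels are hypothetical and are not taken from this repository.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Map each class to n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(batch_probs, batch_labels, weights):
    """Class-weighted cross-entropy, averaged by the total weight of the batch.

    batch_probs: list of dicts mapping class -> predicted probability.
    batch_labels: list of true classes, one per sample.
    """
    weighted_losses = [
        -weights[label] * math.log(probs[label])
        for probs, label in zip(batch_probs, batch_labels)
    ]
    total_weight = sum(weights[label] for label in batch_labels)
    return sum(weighted_losses) / total_weight


# Toy imbalanced binary task: 8 negatives, 2 positives.
labels = ["no_anger"] * 8 + ["anger"] * 2
weights = balanced_class_weights(labels)
print(weights)  # {'no_anger': 0.625, 'anger': 2.5}
```

The minority class ends up four times heavier than the majority class, so its errors contribute more to the loss. Dividing by the total weight of the batch mirrors the "mean" reduction of PyTorch's `CrossEntropyLoss` when a `weight` tensor is given.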

Files

|
|__ augmentation: notebook used for data augmentation
|__ augmented datasets: datasets with augmented samples
|__ data analysis: data analysis notebook
|__ dataset modifier: notebook used to create datasets using thresholds or removing uncertain samples
|__ main dataset: includes EmoPars and ArmanEmo datasets
|__ modified datasets: result of dataset modifier notebook
|__ models: files to create binary classifiers
   |
   |__ data: dictionary used to detect misspelled words
   |__ multilabel: files to train multilabel classifier
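The dataset modifier notebook is described only as using thresholds or removing uncertain samples. The sketch below shows one way such a policy could work, assuming each sample carries, per emotion, the number of annotators who chose it; that record format, the `low`/`high` thresholds, and the function name are all hypothetical. A sample is kept only when every emotion is clearly absent (votes <= `low`) or clearly present (votes >= `high`) and, optionally, at least one emotion is clearly present.

```python
def select_high_confidence(samples, emotions, low, high, require_emotion=True):
    """Keep samples whose votes are unambiguous for every emotion.

    samples: list of (text, votes) pairs, where votes maps emotion -> annotator count
             (assumed format, not necessarily the EmoPars file layout).
    Returns a list of (text, labels) pairs with labels mapping emotion -> 0 or 1.
    """
    selected = []
    for text, votes in samples:
        # Drop the sample if any emotion falls in the uncertain band between low and high.
        if not all(votes[e] <= low or votes[e] >= high for e in emotions):
            continue
        labels = {e: int(votes[e] >= high) for e in emotions}
        # Optionally drop samples with no clearly present emotion.
        if require_emotion and not any(labels.values()):
            continue
        selected.append((text, labels))
    return selected


emotions = ["anger", "sadness"]
samples = [
    ("tweet 1", {"anger": 4, "sadness": 0}),  # clear anger -> kept
    ("tweet 2", {"anger": 2, "sadness": 0}),  # anger is uncertain -> dropped
    ("tweet 3", {"anger": 0, "sadness": 0}),  # no emotion -> dropped when require_emotion=True
]
print(select_high_confidence(samples, emotions, low=1, high=3))
# [('tweet 1', {'anger': 1, 'sadness': 0})]
```

Setting `require_emotion=False` keeps confidently neutral samples such as "tweet 3", which may be preferable when negatives are needed for a binary classifier.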

Binary Classifier

To change how the model is trained, for example to use the F1CE loss function or different hyperparameters, edit the related files, in this case hyperparameteres.py and f1ce_loss.py.
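The README does not define the F1CE loss. Assuming it combines cross-entropy with a differentiable ("soft") F1 loss, which the name suggests but the source does not confirm, a minimal binary version could look like this. Plain Python lists stand in for framework tensors, and `alpha` and the function names are hypothetical rather than taken from f1ce_loss.py.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


def soft_f1_loss(probs, targets, eps=1e-8):
    """1 - soft F1, where TP/FP/FN are computed from probabilities instead of hard labels."""
    tp = sum(p * y for p, y in zip(probs, targets))
    fp = sum(p * (1 - y) for p, y in zip(probs, targets))
    fn = sum((1 - p) * y for p, y in zip(probs, targets))
    soft_f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1 - soft_f1


def f1ce_loss(probs, targets, alpha=0.5):
    """Weighted sum of cross-entropy and soft-F1 loss (assumed form of F1CE)."""
    return alpha * binary_cross_entropy(probs, targets) + (1 - alpha) * soft_f1_loss(probs, targets)


print(f1ce_loss([0.9, 0.1, 0.3], [1, 0, 1]))
```

The idea behind such a combination is that the cross-entropy term gives stable per-sample gradients, while the soft-F1 term directly penalizes missed positives of a rare class.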

Feature extraction is not built into the main model; use the methods in feature_extraction.py to append the features to the end of each sample. Preprocessing is already embedded in the file.
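The README does not list which features feature_extraction.py computes or how it writes them. As an illustration only, the sketch below computes a few hypothetical surface features (exclamation marks, Latin and Persian question marks, hashtags) and appends them as text after a separator, so an unchanged PLM tokenizer can read them. The feature set, the separator, and the function names are assumptions.

```python
def extract_features(text):
    """Compute a few illustrative surface features of a tweet (hypothetical feature set)."""
    return {
        "exclamations": text.count("!"),
        # Count both the Latin "?" and the Arabic-script question mark used in Persian (U+061F).
        "questions": text.count("?") + text.count("\u061f"),
        "hashtags": sum(1 for token in text.split() if token.startswith("#")),
    }


def append_features(text, sep=" | "):
    """Return the text with its features appended at the end as key=value pairs."""
    features = extract_features(text)
    suffix = " ".join(f"{name}={value}" for name, value in features.items())
    return f"{text}{sep}{suffix}"


print(append_features("great!! #news?"))
# great!! #news? | exclamations=2 questions=1 hashtags=1
```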

Finally, to train the model, use the following command:

python3 binary_classifier.py \
   [target_emotion_label(Anger/Happiness/Hatred/Wonder/Sadness/Fear)] [data_address] [model_name]
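The command takes three positional arguments. The argparse sketch below mirrors that interface to show how the arguments could be parsed and validated, including the six allowed emotion labels. It is an illustrative reconstruction, not the actual parser in binary_classifier.py; the help strings and the example values `data/train.csv` and `my-plm` are placeholders.

```python
import argparse

# The six target labels listed in the training command.
EMOTIONS = ["Anger", "Happiness", "Hatred", "Wonder", "Sadness", "Fear"]


def build_parser():
    """Build a parser with the three positional arguments of the training command."""
    parser = argparse.ArgumentParser(
        description="Train a binary classifier for one emotion (illustrative sketch)."
    )
    parser.add_argument("target_emotion_label", choices=EMOTIONS,
                        help="emotion the classifier should detect")
    parser.add_argument("data_address", help="path to the training data")
    parser.add_argument("model_name", help="name of the pretrained language model")
    return parser


args = build_parser().parse_args(["Sadness", "data/train.csv", "my-plm"])
print(args.target_emotion_label, args.data_address, args.model_name)
# Sadness data/train.csv my-plm
```

Because of `choices`, argparse rejects an unknown label such as `Joy` with an error message and exits, and `-h` prints the three arguments with their help text.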

For help, you can use:

python3 main.py -h

Multilabel Classifier

As before, you can edit files such as hyperparameteres.py for minor changes. Preprocessing is embedded in the file.

Finally, to train the model, use the following command:

python3 binary_classifier.py \
   [data_address] [model_name] [threshold]
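In a multilabel setting, each emotion typically gets its own sigmoid output, and an emotion is predicted when its probability reaches a threshold. Assuming the `[threshold]` argument plays that role (the README does not say; it may instead control dataset selection), the decision rule could look like the sketch below. The function names and example logits are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_emotions(logits, label_names, threshold=0.5):
    """Return every label whose sigmoid probability is at least the threshold."""
    return [name for name, logit in zip(label_names, logits) if sigmoid(logit) >= threshold]


labels = ["Anger", "Fear", "Sadness"]
print(predict_emotions([2.0, -1.0, 0.0], labels, threshold=0.5))  # ['Anger', 'Sadness']
print(predict_emotions([2.0, -1.0, 0.0], labels, threshold=0.6))  # ['Anger']
```

Unlike a softmax over classes, this rule can assign zero, one, or several emotions to a sample; raising the threshold trades recall for precision.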

For help, you can use:

python3 main.py -h

Link to the PLM Used in This Study

Results

Citation

@misc{https://doi.org/10.48550/arxiv.2211.08029,
  doi = {10.48550/ARXIV.2211.08029},
  url = {https://arxiv.org/abs/2211.08029},
  author = {Abaskohi, Amirhossein and Sabri, Nazanin and Bahrak, Behnam},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Persian Emotion Detection using ParsBERT and Imbalanced Data Handling Approaches},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}


Folders and files

Latest commit

History

Repository files navigation

Persian Emotion Detection using ParsBERT and Imbalanced Data Handling Approaches

Abstract

Files

Binary Classifer

Multilabel Classifer

Address to PLM Used in This Study

Results

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages