SemEval-2022 Task 4: Effective Data Augmentation Methods for Patronizing Language Detection and Multi-label Classification

What is Patronizing and Condescending Language (PCL)?

We are all patronizing and condescending sometimes, and we are all susceptible to being patronized and condescended to by others. Unfortunately, some groups are subjected to this undervaluing treatment more often than others. So-called vulnerable communities seem to be a perfect target for charity- and pity-driven texts, and for condescension and patronization in news stories.

PCL is often involuntary and unconscious, and the authors using such language are usually trying to help communities in need by raising awareness, moving the audience to action, or standing up for the rights of the under-represented. But PCL can be very harmful, as it feeds stereotypes, routinizes discrimination, and leads to greater exclusion.

For more details about the task, see the SemEval-2022 Task 4 page.

Abstract

This paper presents a combination of data augmentation methods to boost the performance of state-of-the-art transformer-based language models for Patronizing and Condescending Language (PCL) detection and multi-label PCL classification tasks. These tasks are inherently different from sentiment analysis because positive/negative hidden attitudes in the context will not necessarily be considered positive/negative for PCL tasks. Our approach relies on fine-tuning pretrained RoBERTa and GPT-3 models, such as the Davinci and Curie engines, with an enriched PCL dataset. We augmented the underrepresented class of annotated data to achieve competitive results among the top 16 SemEval-2022 participants. Furthermore, we discuss the few-shot learning technique as a way to overcome the limitations of low-resource NLP problems.
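To make the idea of augmenting an underrepresented class concrete, here is a minimal sketch. It is not the method used in the paper: it assumes a binary dataset held as parallel lists of texts and labels, and it swaps in a deliberately simple technique (random oversampling with random token dropout). The function name `augment_minority` and its parameters are hypothetical and do not come from this repository.

```python
import random
from collections import Counter


def augment_minority(texts, labels, minority_label=1, drop_prob=0.1, seed=0):
    """Oversample the minority class until it matches the largest other class.

    Each synthetic example is a copy of a randomly chosen minority text with
    some tokens randomly dropped. This is an illustrative stand-in, not the
    augmentation pipeline from the paper.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = [t for t, y in zip(texts, labels) if y == minority_label]
    # Size of the largest class other than the minority class.
    target = max((n for y, n in counts.items() if y != minority_label), default=0)
    needed = max(target - counts[minority_label], 0)

    new_texts, new_labels = list(texts), list(labels)
    if not minority:
        return new_texts, new_labels  # nothing to sample from

    for _ in range(needed):
        tokens = rng.choice(minority).split()
        # Keep each token with probability 1 - drop_prob; fall back to the
        # unmodified text if every token happened to be dropped.
        kept = [tok for tok in tokens if rng.random() > drop_prob] or tokens
        new_texts.append(" ".join(kept))
        new_labels.append(minority_label)
    return new_texts, new_labels


# Toy usage: three non-PCL examples (label 0) and one PCL example (label 1).
toy_texts = ["the budget was approved", "rain is expected", "the match ended",
             "these poor souls need our help"]
toy_labels = [0, 0, 0, 1]
aug_texts, aug_labels = augment_minority(toy_texts, toy_labels)
print(Counter(aug_labels))
```

The original examples are kept unchanged at the front of the returned lists, so the synthetic examples can be inspected or filtered separately. In practice, a stronger augmentation technique would typically replace the token-dropout step.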

Software implementation

All source code used to generate the results and figures in the paper is in the code folder. The calculations and figure generation are all run inside Jupyter notebooks. The data used in this study is available upon request.

Getting the code

You can download a copy of all the files in this repository by cloning the git repository:

git clone https://github.com/daniel-saeedi/PCL_Detection_SemEval2022.git

or download a zip archive.

Dependencies

Run this command to install dependencies:

pip3 install -r requirements.txt
