
daniel-saeedi/PCL_Detection_SemEval2022


SemEval-2022 Task 4: Effective Data Augmentation Methods for Patronizing Language Detection and Multi-label Classification

License: MIT

What is Patronizing and Condescending Language (PCL)?

We are all patronizing and condescending sometimes, and we are all susceptible to being condescended to and patronized by others. But some groups are, unfortunately, more accustomed to this undervaluing treatment. So-called vulnerable communities seem to be the perfect target for charity- and pity-driven texts, condescension, and patronization in news stories.

PCL is often involuntary and unconscious, and authors who use such language are usually trying to help communities in need by raising awareness, moving the audience to action, or standing up for the rights of the under-represented. But PCL can be very harmful: it feeds stereotypes, routinizes discrimination, and drives greater exclusion.

For more details about the task, see the official SemEval-2022 Task 4 page.

Abstract

This paper presents a combination of data augmentation methods that boost the performance of state-of-the-art transformer-based language models on Patronizing and Condescending Language (PCL) detection and multi-label PCL classification. These tasks are inherently different from sentiment analysis, because positive or negative hidden attitudes in the context are not necessarily positive or negative for the purposes of PCL. Our approach relies on fine-tuning pretrained RoBERTa and GPT-3 models (the Davinci and Curie engines) on an extra-enriched PCL dataset. We augmented the under-represented class of the annotated data and achieved competitive results, placing among the top 16 SemEval-2022 participants. Furthermore, we discuss the few-shot learning technique as a way to overcome the limitations of low-resource NLP problems.
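The abstract describes augmenting the under-represented (PCL) class so that the classifier sees a more balanced training set. The specific augmentation methods are described in the paper and notebooks, not here; the sketch below substitutes two simple EDA-style operations (random word swap and random word deletion) purely to illustrate the idea of minority-class balancing. All function names are hypothetical and do not come from this repository.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_swap(words: List[str], rng: random.Random) -> List[str]:
    """Swap two randomly chosen word positions (EDA-style; illustrative only)."""
    words = words[:]
    if len(words) >= 2:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words: List[str], rng: random.Random, p: float = 0.1) -> List[str]:
    """Drop each word with probability p, always keeping at least one word."""
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def augment_text(text: str, rng: random.Random) -> str:
    """Apply one randomly chosen operation to a whitespace-tokenized text."""
    words = text.split()
    if not words:
        return text
    op = rng.choice((random_swap, random_deletion))
    return " ".join(op(words, rng))


def balance_by_augmentation(
    texts: Sequence[str], labels: Sequence[int], minority_label: int = 1, seed: int = 0
) -> Tuple[List[str], List[int]]:
    """Append augmented minority-class examples until it matches the majority count.

    Hypothetical helper: the original examples are kept unchanged and the new,
    augmented examples are appended after them.
    """
    rng = random.Random(seed)  # fixed seed keeps the augmentation reproducible
    counts = Counter(labels)
    minority = [t for t, y in zip(texts, labels) if y == minority_label]
    new_texts, new_labels = list(texts), list(labels)
    if not minority:
        return new_texts, new_labels
    deficit = max(counts.values()) - counts[minority_label]
    for k in range(deficit):
        source = minority[k % len(minority)]  # cycle through minority examples
        new_texts.append(augment_text(source, rng))
        new_labels.append(minority_label)
    return new_texts, new_labels
```

For example, with three non-PCL sentences and one PCL sentence (label 1), two augmented PCL variants are appended so both classes end up with three examples:

```python
texts = ["the weather is fine today", "they need our help to survive",
         "stocks rose slightly", "markets closed early"]
labels = [0, 1, 0, 0]
aug_texts, aug_labels = balance_by_augmentation(texts, labels)
print(Counter(aug_labels))  # Counter({0: 3, 1: 3})
```

Oversampling with perturbed copies rather than exact duplicates is the usual motivation for augmentation: the model sees more varied minority-class inputs instead of memorizing repeated sentences.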

Software implementation

All source code used to generate the results and figures in the paper is in the code folder. All calculations and figure generation are run inside Jupyter notebooks. The data used in this study is available upon request.

Getting the code

You can download a copy of all the files in this repository by cloning the git repository:

git clone https://github.com/daniel-saeedi/PCL_Detection_SemEval2022.git

or download a zip archive.

Dependencies

Run this command to install dependencies:

pip3 install -r requirements.txt

About

Patronizing and Condescending Language Detection (SemEval 2022)
