caskcsg/TextSmoothing
Source code for ACL 2022 paper: Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks

Our work is mainly based on the paper "Data Augmentation using Pre-trained Transformer Models".

The code contains implementations of the following data augmentation methods:

  • TextSmoothing
  • EDA + TextSmoothing
  • Backtranslation + TextSmoothing
  • CBERT + TextSmoothing
  • BERT Prepend + TextSmoothing
  • GPT-2 Prepend + TextSmoothing
  • BART Prepend + TextSmoothing
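The core idea of text smoothing is to replace each token's one-hot representation with an interpolation between that one-hot vector and a masked language model's predicted distribution over the vocabulary. A minimal NumPy sketch of this mixing step is shown below; the function name, the mixing weight `lam`, and the temperature `tau` are illustrative, not identifiers from this repository:

```python
import numpy as np

def smooth_tokens(token_ids, mlm_logits, vocab_size, lam=0.1, tau=1.0):
    """Interpolate one-hot token vectors with an MLM's softmax distribution.

    token_ids:  (seq_len,) int array of vocabulary indices
    mlm_logits: (seq_len, vocab_size) float array of MLM scores per position
    lam:        weight kept on the original one-hot token
    tau:        softmax temperature applied to the MLM logits
    """
    onehot = np.eye(vocab_size)[token_ids]            # (seq_len, vocab_size)
    scaled = mlm_logits / tau
    scaled -= scaled.max(axis=-1, keepdims=True)      # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum(axis=-1, keepdims=True)        # per-position softmax
    return lam * onehot + (1.0 - lam) * probs         # rows still sum to 1
```

The smoothed rows remain valid probability distributions, so they can be fed to an embedding layer as a weighted average of token embeddings.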

DataSets

In the paper, we use three datasets from the following resources.

Low-data regime experiment setup

Run the src/utils/download_and_prepare_datasets.sh file to prepare all datasets. download_and_prepare_datasets.sh performs the following steps:

  1. Downloads the data from GitHub.
  2. Replaces numeric labels with text labels for the STSA-2 and TREC datasets.
  3. For a given dataset, creates 15 random splits of the train and dev data.
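Step 3 above can be sketched as follows; the split sizes, seed handling, and function name are illustrative, not the actual parameters of the shell script:

```python
import random

def make_splits(examples, n_splits=15, dev_fraction=0.1, seed=0):
    """Create `n_splits` random (train, dev) partitions of a dataset."""
    splits = []
    for i in range(n_splits):
        rng = random.Random(seed + i)     # a distinct, reproducible seed per split
        shuffled = examples[:]            # copy so the input order is untouched
        rng.shuffle(shuffled)
        n_dev = max(1, int(len(shuffled) * dev_fraction))
        splits.append((shuffled[n_dev:], shuffled[:n_dev]))
    return splits
```

Each split partitions the full dataset, so every example appears in exactly one of train or dev within a given split.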

Dependencies

To run this code, you need the following dependencies:

  • PyTorch 1.5
  • fairseq 0.9
  • transformers 2.9
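Assuming a pip-based environment, the pinned versions above can be installed along these lines (the PyPI package names are torch, fairseq, and transformers; the exact patch versions shown are an assumption):

```shell
pip install torch==1.5.0 fairseq==0.9.0 transformers==2.9.0
```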

How to run

To run a data augmentation experiment for a given dataset, run the corresponding bash script in the scripts folder. For example, to run data augmentation on the SNIPS dataset:

  • run scripts/bart_snips_lower.sh for the BART experiment
  • run scripts/bert_snips_lower.sh for the rest of the data augmentation methods
