This repository contains the code for our paper “Generative Data Augmentation for Aspect Sentiment Quad Prediction”, published at *SEM 2023.
Aspect Sentiment Quad Prediction (ASQP)
- Given a sentence, the task aims to predict all sentiment quads (aspect category, aspect term, opinion term, sentiment polarity)
Aspect Sentiment Triplet Extraction (ASTE)
- Given a sentence, the task aims to predict all sentiment triplets (aspect term, opinion term, sentiment polarity)
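
As a rough illustration of the two output formats, the sketch below represents a quad as a small dataclass and derives the corresponding triplet by dropping the aspect category. The class name `Quad`, its field names, the example sentence, and the category labels are assumptions made for illustration; they are not taken from this repository's data files.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Quad:
    """Hypothetical container for one ASQP label."""
    category: str   # aspect category
    aspect: str     # aspect term
    opinion: str    # opinion term
    polarity: str   # sentiment polarity

    def to_triplet(self) -> Tuple[str, str, str]:
        # An ASTE triplet is the quad without the aspect category.
        return (self.aspect, self.opinion, self.polarity)


# Illustrative sentence and labels (not from the repository's datasets).
sentence = "The pasta was delicious but the service was slow."
quads: List[Quad] = [
    Quad("food quality", "pasta", "delicious", "positive"),
    Quad("service general", "service", "slow", "negative"),
]
triplets = [q.to_triplet() for q in quads]
```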
We propose synthesizing diverse parallel data for ASQP with a quads-to-text (Q2T) model.
- Build and augment sentiment quad sets from the original collection of quad labels.
- Randomly sample quads as input to the Q2T model.
- Generate the review text.
- Pair the generated review text with the sampled input quads to form augmented parallel data.
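
The sampling and pairing steps above can be sketched as follows. This is a minimal outline under stated assumptions, not the repository's implementation: `linearize`, `synthesize`, and `fake_generate` are hypothetical names, the `" | "` / `" ; "` input format is invented for the example, and `fake_generate` is only a stand-in for a trained Q2T model.

```python
import random
from typing import Callable, List, Tuple

# (aspect category, aspect term, opinion term, sentiment polarity)
Quad = Tuple[str, str, str, str]


def linearize(quads: List[Quad]) -> str:
    """Turn quads into one input string (hypothetical format)."""
    return " ; ".join(" | ".join(q) for q in quads)


def synthesize(
    quad_pool: List[Quad],
    generate: Callable[[str], str],
    n_samples: int,
    max_quads: int = 2,
    seed: int = 0,
) -> List[Tuple[str, List[Quad]]]:
    """Sample quads, generate text, and pair text with the sampled quads."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        k = rng.randint(1, min(max_quads, len(quad_pool)))
        sampled = rng.sample(quad_pool, k)
        text = generate(linearize(sampled))
        pairs.append((text, sampled))
    return pairs


def fake_generate(prompt: str) -> str:
    # Stand-in for the Q2T model: simply echoes the linearized quads.
    return "Review mentioning: " + prompt


pool: List[Quad] = [
    ("food quality", "pasta", "delicious", "positive"),
    ("service general", "service", "slow", "negative"),
    ("ambience general", "patio", "cozy", "positive"),
]
augmented = synthesize(pool, fake_generate, n_samples=3)
```

In practice `generate` would wrap the trained Q2T model's decoding call; keeping it as a plain callable makes the sampling logic testable without loading a model.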
We propose a data filtering strategy to remove low-quality augmented data.
- Check the consistency between the input quads and the generated review text.
- Check the word usage in the context part of the review text.
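
One possible reading of the two checks is sketched below. The exact criteria are not spelled out here, so the rules in this example are assumptions: consistency is taken to mean that every aspect and opinion term of the input quads appears in the generated text (terms written as `NULL` are skipped), and the context check rejects text whose remaining words include a known aspect or opinion term that was not part of the input. All function names and the `known_terms` lexicon are hypothetical.

```python
import re
from typing import List, Set, Tuple

Quad = Tuple[str, str, str, str]  # (category, aspect, opinion, polarity)


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def contains_term(text: str, term: str) -> bool:
    # Whole-word match, case-insensitive.
    return re.search(r"\b" + re.escape(term.lower()) + r"\b", text.lower()) is not None


def is_consistent(text: str, quads: List[Quad]) -> bool:
    """Assumed rule: all explicit aspect/opinion terms must occur in the text."""
    for _, aspect, opinion, _ in quads:
        for term in (aspect, opinion):
            if term.lower() != "null" and not contains_term(text, term):
                return False
    return True


def context_tokens(text: str, quads: List[Quad]) -> List[str]:
    """Tokens of the text that are not part of any input aspect/opinion term."""
    term_tokens: Set[str] = set()
    for _, aspect, opinion, _ in quads:
        term_tokens.update(tokenize(aspect))
        term_tokens.update(tokenize(opinion))
    return [t for t in tokenize(text) if t not in term_tokens]


def has_clean_context(text: str, quads: List[Quad], known_terms: Set[str]) -> bool:
    """Assumed rule: the context must not introduce unlabeled known terms."""
    return not any(t in known_terms for t in context_tokens(text, quads))


def keep(text: str, quads: List[Quad], known_terms: Set[str]) -> bool:
    return is_consistent(text, quads) and has_clean_context(text, quads, known_terms)


quads = [("food quality", "pasta", "delicious", "positive")]
lexicon = {"pasta", "delicious", "service", "slow"}  # illustrative lexicon
```

With these definitions, `keep("The pasta was delicious.", quads, lexicon)` passes both checks, while a text that swaps in a different aspect or adds an unlabeled opinion about the service is dropped.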
We propose a measurement of the difficulty of augmented samples, which is used to balance the augmented dataset.
- The measurement is Average Context Inverse Document Frequency (AC-IDF).
- We make the difficulty of the augmented dataset follow a uniform distribution under this measurement.
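
The sketch below shows one way such a score and balancing step could look. It assumes AC-IDF is the mean IDF of a sample's context tokens (the words outside the aspect and opinion terms), uses a smoothed IDF, and evens out the difficulty distribution by drawing the same number of samples from equal-width score bins. The smoothing, the binning scheme, and the names `make_idf`, `ac_idf`, and `balance` are assumptions for illustration, not the repository's code.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence


def make_idf(corpus: Sequence[Sequence[str]]) -> Callable[[str], float]:
    """Build a smoothed IDF function from tokenized documents."""
    n_docs = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc))

    def idf(tok: str) -> float:
        # Add-one smoothing so unseen tokens get a finite, maximal score.
        return math.log((n_docs + 1) / (df[tok] + 1))

    return idf


def ac_idf(context: Sequence[str], idf: Callable[[str], float]) -> float:
    """Average IDF over context tokens; 0.0 for an empty context."""
    if not context:
        return 0.0
    return sum(idf(t) for t in context) / len(context)


def balance(samples: Sequence, scores: Sequence[float],
            n_bins: int = 5, per_bin: int = 2, seed: int = 0) -> List:
    """Draw up to `per_bin` samples from each equal-width score bin."""
    rng = random.Random(seed)
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all scores match
    bins: Dict[int, List] = defaultdict(list)
    for sample, score in zip(samples, scores):
        idx = min(int((score - lo) / width), n_bins - 1)
        bins[idx].append(sample)
    kept: List = []
    for idx in sorted(bins):
        members = bins[idx]
        kept.extend(rng.sample(members, min(per_bin, len(members))))
    return kept


corpus = [
    ["the", "food", "was", "good"],
    ["the", "service", "was", "slow"],
    ["the", "waiter", "seemed", "distracted"],
]
idf = make_idf(corpus)
score = ac_idf(["the", "distracted"], idf)
```

Samples whose context uses rarer words receive a higher score, so capping each bin keeps easy, frequent-word samples from dominating the augmented set.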
Requirements: pytorch, transformers, pytorch_lightning, torchmetrics
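
A quick way to confirm that these packages are importable is sketched below. The helper name `missing_modules` is hypothetical; the only fact relied on is that PyTorch is imported as `torch`, while the other three packages import under the names listed above.

```python
from importlib.util import find_spec
from typing import List, Sequence

# Import names of the required packages (PyTorch imports as "torch").
REQUIRED_MODULES = ["torch", "transformers", "pytorch_lightning", "torchmetrics"]


def missing_modules(modules: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the modules that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```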
- Set up the environment
- Data augmentation
  - AugABSA/scripts/data2text/main.py
    - Train the Q2T model on the ASQP/ASTE training dataset
    - Synthesize parallel data
  - Filtering and balancing
    - AugABSA/scripts/data2text/main.py
- Train the T2Q model on both the training dataset and the augmented dataset
  - AugABSA/scripts/text2data/main.py