Skip to content
code for our 2019 paper: "Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification"
Python
Branch: master
Clone or download
xelda1988 Merge pull request #1 from Paethon/patch-1
Added state of the art badge
Latest commit 65446a8 Sep 10, 2019

README.md

PWC

Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification

code for our 2019 paper: "Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification"

Installation

python -m venv venv
pip install -r requirements.txt
python -m spacy download en_core_web_sm

Preparing data for BERT Language Model Finetuning

We make use of two publicly available research datasets for the domains laptops and restaurants:

Download these datasets and put them into the data/raw folder.

To prepare the data for language model finetuning run the following python scripts:

python prepare_laptop_reviews.py
python prepare_restaurant_reviews.py
python prepare_restaurant_reviews.py --large  # takes some time to finish

Measure the number of non-zero lines to get the exact amount of sentences

cat data/transformed/restaurant_corpus_1000000.txt | sed '/^\s*$/d' | wc -l
# Rename the corpora files postfix to the actual number of sentences
# e.g  restaurant_corpus_1000000.txt -> restaurant_corpus_1000004.txt

Concatenate laptop corpus and the small restaurant corpus to create the mixed corpus (restaurants + laptops)

cd data/transformed
cat laptop_corpus_1011255.txt restaurant_corpus_1000004.txt > mixed_corpus.txt

Preparing SemEval 2014 Task 4 Dataset for Experiments

Download all the SemEval 2014 Task 4 datasets from: http://metashare.ilsp.gr:8080/repository/search/?q=semeval+2014 into

data/raw/semeval2014/

and unpack the archives. Create the preprocessed datasets using the following commands

Laptops

# laptops

# laptops without conflict label
python prepare_semeval_datasets.py \
--files "data/raw/semeval2014/SemEval-2014 ABSA Train Data v2.0 & Annotation Guidelines/Laptop_Train_v2.xml" \
--output_dir data/transformed/laptops_noconfl \
--istrain \
--noconfl

python prepare_semeval_datasets.py \
--files "data/raw/semeval2014/SemEval-2014 ABSA Test Data - Gold Annotations/ABSA_Gold_TestData/Laptops_Test_Gold.xml" \
--output_dir data/transformed/laptops_noconfl \
--noconfl

Restaurants

# restaurants without conflict label
python prepare_semeval_datasets.py \
--files "data/raw/semeval2014/SemEval-2014 ABSA Train Data v2.0 & Annotation Guidelines/Restaurants_Train_v2.xml" \
--output_dir data/transformed/restaurants_noconfl \
--istrain \
--noconfl

python prepare_semeval_datasets.py \
--files "data/raw/semeval2014/SemEval-2014 ABSA Test Data - Gold Annotations/ABSA_Gold_TestData/Restaurants_Test_Gold.xml" \
--output_dir data/transformed/restaurants_noconfl \
--noconfl

Mixed

# mixed without conflict label
python prepare_semeval_datasets.py \
--files "data/raw/semeval2014/SemEval-2014 ABSA Train Data v2.0 & Annotation Guidelines/Restaurants_Train_v2.xml" \
"data/raw/semeval2014/SemEval-2014 ABSA Train Data v2.0 & Annotation Guidelines/Laptop_Train_v2.xml" \
--output_dir data/transformed/mixed_noconfl \
--istrain --noconfl

python prepare_semeval_datasets.py \
--files "data/raw/semeval2014/SemEval-2014 ABSA Test Data - Gold Annotations/ABSA_Gold_TestData/Restaurants_Test_Gold.xml" \
"data/raw/semeval2014/SemEval-2014 ABSA Test Data - Gold Annotations/ABSA_Gold_TestData/Laptops_Test_Gold.xml" \
--output_dir data/transformed/mixed_noconfl --noconfl

BERT Language Model Finetuning and Down-Stream Classification

The LM finetuning is performed with a script from the huggingface/pytorch-transformers repository:

Down-stream task-specific finetuning was performed with an adaption to this script:

If you use this work, please cite our paper using the following Bibtex tag:

@article{rietzler2019adapt,
   title={Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification},
   author={Rietzler, Alexander and Stabinger, Sebastian and Opitz, Paul and Engl, Stefan},
   journal={arXiv preprint arXiv:1908.11860},
   year={2019}
}
You can’t perform that action at this time.