
Official datasets of Building Large-Scale English and Korean Datasets for Aspect-Level Sentiment Analysis in Automotive Domain [COLING'20]

dmhyun/ALSAdata


Building Large-Scale English and Korean Datasets for Aspect-Level Sentiment Analysis in Automotive Domain

Official data of the COLING'20 paper

Overview

We release large-scale datasets of user comments in two languages, English and Korean, for aspect-level sentiment analysis in the automotive domain. The datasets consist of 58,000+ comment-aspect pairs, which makes them the largest compared with existing datasets. In addition, this work covers a new language (i.e., Korean) along with English for aspect-level sentiment analysis. We build the datasets from the automotive domain to enable users (e.g., marketers in automotive companies) to analyze the voice of customers on automobiles.

Data comparison

We also provide baseline performances for future work by evaluating recent models on the released datasets.

Baseline performance

Data

We provide the data for research purposes only, and redistribution of the data is prohibited. To request the data, please contact us once you agree to these terms of use.

 Contact information: dm.hyun@postech.ac.kr

Pretrained Word Vectors

We also provide the word vectors trained with Word2Vec for each language.

English: Google Drive link

Korean: Google Drive link
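
The file format of the released vectors is not stated here. As a minimal sketch, assuming they are stored in the plain-text word2vec format (a header line with the vocabulary size and dimension, then one `word v1 v2 ...` line per word), they could be loaded and compared as follows. The function names and the tiny in-memory sample are hypothetical and only illustrate the idea; they are not part of the released data.

```python
import io
import math
from typing import Dict, List, TextIO


def load_word2vec_text(handle: TextIO) -> Dict[str, List[float]]:
    """Parse vectors in the plain-text word2vec format (assumed format)."""
    header = handle.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip("\n").split(" ")
        if parts == [""]:
            continue  # skip blank lines
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values for word {parts[0]!r}")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != vocab_size:
        raise ValueError("vocabulary size does not match the header")
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical three-word sample standing in for a downloaded file.
sample = io.StringIO("3 2\nengine 1.0 0.0\nmotor 0.8 0.6\nseat 0.0 1.0\n")
vectors = load_word2vec_text(sample)
similarity = cosine(vectors["engine"], vectors["motor"])
```

For a real file, replace the in-memory sample with `open(path, encoding="utf-8")`. If the files turn out to be in the word2vec text or binary format, gensim's `KeyedVectors.load_word2vec_format` is another way to read them.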

Aspect-level sentiment classifiers

Refer to the repository linked here, which is based on PyTorch. To check the performance, simply replace the data in that repository with ours.

Citation

If you use this repository for your work, please consider citing our paper:

 @inproceedings{hyun2020building,
  title={Building Large-Scale English and Korean Datasets for Aspect-Level Sentiment Analysis in Automotive Domain},
  author={Hyun, Dongmin and Cho, Junsu and Yu, Hwanjo},
  booktitle={Proceedings of the 28th International Conference on Computational Linguistics},
  pages={961--966},
  year={2020}
}
