Sentence Embeddings used in the GermEval-2017 Submission
UKP TU-DA at GermEval 2017: Deep Learning for Aspect Based Sentiment Detection

GermEval-2017 : Shared Task on Aspect-based Sentiment in Social Media Customer Feedback

This is the repository for our experiments for the GermEval 2017 shared task, reported in Lee et al., UKP TU-DA at GermEval 2017: Deep Learning for Aspect Based Sentiment Detection.

We provide the German sentence embeddings trained with sent2vec on Wikipedia, Twitter, and the shared task data, as well as information about how to use them.

The base code for the ensemble classifier we used in subtasks A and B can be found here.

For access to the multi-task learning framework we used for subtasks C and D, please contact us. Our implementation was based on this TensorFlow framework.

Please use the following citation:

@inproceedings{Lee:2017,
	title = {UKP TU-DA at GermEval 2017: Deep Learning for Aspect Based Sentiment Detection},
	author = {Lee, Ji-Ung and Eger, Steffen and Daxenberger, Johannes and Gurevych, Iryna},
	organization = {German Society for Computational Linguistics},
	booktitle = {Proceedings of the  GSCL GermEval Shared Task on Aspect-based Sentiment in Social Media Customer Feedback},
	pages = {22--29},
	month = sep,
	year = {2017},
	location = {Berlin, Germany},
}

Abstract: This paper describes our submissions to the GermEval 2017 Shared Task, which focused on the analysis of customer feedback about the Deutsche Bahn AG. We used sentence embeddings and an ensemble of classifiers for two sub-tasks as well as state-of-the-art sequence taggers for two other sub-tasks.

Contact persons:

https://www.ukp.tu-darmstadt.de/

https://www.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Project structure

Due to their large file size, the embeddings are not stored in this repository. You can find them here:

The embeddings were trained on the shared task data, Wikipedia data, and Tweets from the German Sentiment Corpus.

The embedding dimensions are 500, 700, and 1000, as specified in the file names. The models were trained with the following parameters: -minCount 10 -epoch 5 -lr 0.2 -wordNgrams 2 -loss ns -neg 10 -thread 5 -t 0.0001 -dropoutK 2 -bucket 2000000
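
If you want to train comparable models yourself, a command of the following form should work with the sent2vec fork of fastText (which provides the sent2vec training mode). Note that corpus.txt and model_700 are placeholder names for your own preprocessed corpus (one sentence per line) and output model, not files from this repository:

$ ./fasttext sent2vec -input corpus.txt -output model_700 -dim 700 -minCount 10 -epoch 5 -lr 0.2 -wordNgrams 2 -loss ns -neg 10 -thread 5 -t 0.0001 -dropoutK 2 -bucket 2000000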

Requirements

To work with the embeddings you need the sent2vec fork of fastText (which provides the fasttext binary used below) and a tar that supports --lzma for unpacking the archives.

Using the embeddings

On Linux you can unpack the embeddings with:

$ tar --lzma -xvf ../path-to-model/model.bin.tar.lzma

To obtain sentence embeddings from the sent2vec models, run:

$ ./fasttext print-sentence-vectors ../path-to-model/model.bin < input-sentences.txt > embedding-vectors.txt
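
As a quick sanity check (not part of the original instructions), you can count the fields in the first output line; assuming each line contains only the whitespace-separated vector components, the count should equal the embedding dimension of the model (e.g. 700):

$ awk '{print NF; exit}' embedding-vectors.txt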

References

A Twitter Corpus and Benchmark Resources for German Sentiment Analysis.

Mark Cieliebak, Jan Deriu, Fatih Uzdilli, and Dominic Egger. In Proceedings of the 4th International Workshop on Natural Language Processing for Social Media (SocialNLP 2017), Valencia, Spain, 2017.

Polyglot: Distributed Word Representations for Multilingual NLP.

Rami Al-Rfou, Bryan Perozzi, and Steven Skiena. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning (CoNLL 2013), Sofia, Bulgaria, 2013.
