
Sarcasm SIGN: Interpreting Sarcasm with Sentiment Based Monolingual Machine Translation

Lotem Peled, Roi Reichart (pdf)

Overview

This repository contains the Sarcasm SIGN dataset, a parallel corpus of sarcastic tweets and their non-sarcastic interpretations, written by human experts. The corpus was created as part of our paper, Sarcasm SIGN: Interpreting Sarcasm with Sentiment Based Monolingual Machine Translation, which will be presented at ACL 2017. The repository contains two folders: "corpus", which holds the data files and the instructions given to our human experts; and "preprocess", which holds code for preprocessing the data and preparing it for an MT system (see the ReadMe in the preprocess folder).

Characteristics

The Sarcasm SIGN dataset comprises 3000 sarcastic tweets (tweets marked with #sarcasm) that are written in English, are not retweets, and do not contain URLs or images. Each sarcastic tweet has five different non-sarcastic interpretations. The average sarcastic tweet length is 13.87 words, the average interpretation length is 12.10 words, and the vocabulary size is 8788 unique words. Following are two examples from our dataset:

Screenshot

Further information regarding the dataset and the instructions given to the human experts can be found in the "corpus" folder.
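
As a rough illustration of how statistics like those above (average length, vocabulary size) can be computed for a parallel corpus, here is a minimal Python sketch. It assumes the data has already been loaded into memory as (tweet, list of interpretations) pairs, uses plain whitespace tokenization, and counts a lowercased vocabulary over both sides. These are assumptions made for the example and may differ from the procedure used in the paper. The function name corpus_stats and the toy sentences are hypothetical and are not taken from the dataset.

```python
from statistics import mean


def corpus_stats(pairs):
    """Compute simple statistics for a parallel corpus.

    pairs: list of (sarcastic_tweet, [interpretation, ...]) tuples.
    Tokenization is plain whitespace splitting (an assumption).
    """
    tweet_lengths = []
    interp_lengths = []
    vocab = set()

    for tweet, interpretations in pairs:
        tokens = tweet.split()
        tweet_lengths.append(len(tokens))
        vocab.update(tok.lower() for tok in tokens)

        for interp in interpretations:
            interp_tokens = interp.split()
            interp_lengths.append(len(interp_tokens))
            vocab.update(tok.lower() for tok in interp_tokens)

    return {
        "num_tweets": len(pairs),
        "avg_tweet_len": mean(tweet_lengths),
        "avg_interp_len": mean(interp_lengths),
        "vocab_size": len(vocab),
    }


# Made-up toy pairs for illustration only (not from the Sarcasm SIGN data).
toy_pairs = [
    ("Great , another Monday . #sarcasm",
     ["I am not happy that it is Monday .", "I dislike Mondays ."]),
    ("Love waiting in line #sarcasm",
     ["I hate waiting in line"]),
]

stats = corpus_stats(toy_pairs)
print(stats)
```

To run this on the real corpus, the pairs would first have to be read from the files in the "corpus" folder, whose format is described there.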

Future Research

We encourage researchers to send us their algorithms and results, and we will present them here.

Citation

If you use the Sarcasm SIGN dataset and/or algorithm, please cite the following:

Peled, Lotem, and Roi Reichart. "Sarcasm SIGN: Interpreting Sarcasm with Sentiment Based Monolingual Machine Translation." (ACL 2017).

Contact

For any questions, inquiries or interesting ideas, feel free to contact us.

Lotem: lotemi.peled@gmail.com || https://sites.google.com/view/lotempeled/

Roi: roiri@ie.technion.ac.il || https://ie.technion.ac.il/~roiri/