Sarcasm SIGN: Interpreting Sarcasm with Sentiment Based Monolingual Machine Translation

Lotem Peled, Roi Reichart (pdf)

Overview

This repository contains the Sarcasm SIGN dataset, a parallel corpus of sarcastic tweets and their non-sarcastic interpretations, created by human experts. The corpus was created as part of our paper "Sarcasm SIGN: Interpreting Sarcasm with Sentiment Based Monolingual Machine Translation", which will be presented at ACL 2017. The repository contains two folders: "corpus", which contains the data files as well as the instructions given to our human experts; and "preprocess", which contains code for preprocessing the data and preparing it for an MT system (see the README in the preprocess folder).
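Because each sarcastic tweet is paired with several human interpretations, one common way to feed such a corpus to a monolingual MT system is to repeat the sarcastic tweet once per interpretation, producing aligned source and target lines. The sketch below illustrates that idea only. It is not the code in the "preprocess" folder, and the in-memory layout (a list of `(sarcastic_tweet, [interpretations])` tuples) and the function name `build_parallel_pairs` are hypothetical assumptions, not the corpus file format.

```python
from typing import List, Tuple

# Hypothetical in-memory layout (not the actual corpus file format):
# each entry is (sarcastic_tweet, [interpretation_1, ..., interpretation_n]).
Entry = Tuple[str, List[str]]


def _one_line(text: str) -> str:
    """Collapse all whitespace (including newlines) so each sentence fits on one line."""
    return " ".join(text.split())


def build_parallel_pairs(entries: List[Entry]) -> Tuple[List[str], List[str]]:
    """Return aligned source (sarcastic) and target (non-sarcastic) lines.

    The sarcastic tweet is repeated once for every human interpretation,
    so source[i] and target[i] always form one translation pair.
    """
    sources: List[str] = []
    targets: List[str] = []
    for sarcastic, interpretations in entries:
        for interpretation in interpretations:
            sources.append(_one_line(sarcastic))
            targets.append(_one_line(interpretation))
    return sources, targets
```

With five interpretations per tweet, this layout would turn 3000 tweets into 15000 source/target pairs; the two lists can then be written to separate one-sentence-per-line files, the input format many MT toolkits expect.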

Characteristics

The Sarcasm SIGN dataset comprises 3000 sarcastic tweets (tweets marked with #sarcasm) that are written in English, are not retweets, and do not contain URLs or images. Each sarcastic tweet has five different non-sarcastic interpretations. The average sarcastic tweet length is 13.87 words, the average interpretation length is 12.10 words, and the vocabulary size is 8788 unique words. Two examples from our dataset are shown in datasetExample.jpg.
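The selection criteria and corpus statistics above can be approximated in a few lines. The sketch below is illustrative only. It assumes whitespace tokenization, lowercased vocabulary, and a vocabulary counted over tweets and interpretations together; the README does not state how words were counted, so numbers computed this way may not reproduce the reported 13.87 / 12.10 / 8788 figures. The eligibility check is a hypothetical text-only heuristic: it cannot verify the language or the absence of images, which require tweet metadata.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: (sarcastic_tweet, [interpretations]).
Entry = Tuple[str, List[str]]


def looks_eligible(tweet: str) -> bool:
    """Hypothetical text-only approximation of the selection criteria.

    Checks that the tweet carries #sarcasm (substring match, so e.g.
    #sarcasmfail also passes), is not a retweet, and contains no URL.
    Language and image checks are NOT covered here.
    """
    text = tweet.lower()
    has_tag = "#sarcasm" in text
    is_retweet = text.startswith("rt @")
    has_url = "http://" in text or "https://" in text
    return has_tag and not is_retweet and not has_url


def average_length(texts: List[str]) -> float:
    """Mean number of whitespace-separated tokens per text (0.0 if empty)."""
    if not texts:
        return 0.0
    return sum(len(t.split()) for t in texts) / len(texts)


def vocabulary_size(texts: Iterable[str]) -> int:
    """Number of distinct lowercased whitespace tokens across all texts."""
    vocab = set()
    for t in texts:
        vocab.update(tok.lower() for tok in t.split())
    return len(vocab)


def corpus_statistics(entries: List[Entry]) -> Dict[str, float]:
    """Average tweet length, average interpretation length, and joint vocabulary size."""
    tweets = [tweet for tweet, _ in entries]
    interpretations = [i for _, interps in entries for i in interps]
    return {
        "avg_tweet_len": average_length(tweets),
        "avg_interpretation_len": average_length(interpretations),
        "vocab_size": vocabulary_size(tweets + interpretations),
    }
```

A real reproduction would use the tokenizer from the paper's pipeline; swapping it in only requires replacing the `split()` calls.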

Further information regarding the dataset and the instructions given to the human experts can be found in the "corpus" folder.

Future Research

We encourage researchers to send us their algorithms and results, and we will present them here.

Citation

If you use the Sarcasm SIGN dataset and/or algorithm, please cite the following:

Peled, Lotem, and Roi Reichart. "Sarcasm SIGN: Interpreting Sarcasm with Sentiment Based Monolingual Machine Translation." (ACL 2017).

Contact

For any questions, inquiries or interesting ideas, feel free to contact us.

Lotem: lotemi.peled@gmail.com || https://sites.google.com/view/lotempeled/

Roi: roiri@ie.technion.ac.il || https://ie.technion.ac.il/~roiri/
