amazon-science/context-situated-pun-generation: This repository provides the dataset used in "Context-situated pun generation" by Jiao Sun, Anjali Narayan-Chen, Shereen Oraby, Shuyang Gao, Tagyoung Chung, Jing Huang, Yang Liu, and Nanyun Peng.

Context-Situated Pun Generation

Overview

This repository includes the collected dataset from "Context-Situated Pun Generation" appearing at EMNLP 2022 (paper available on amazon.science or arXiv).

The original SemEval 2017 Task 7 dataset (Miller et al., 2017) contains puns that are either homographic (exploiting polysemy) or heterographic (exploiting phonological similarity to another word). We sample puns from SemEval 2017 Task 7 that carry both sense annotations and pun word annotations. From this set, we sample from the 500 most frequent pun word/alter word pairs (p_w, a_w) and randomly sample 100 unique context words C. Combining the sampled pun pairs and context words, we collect 4,552 (C, p_w, a_w) instances for annotation. Full details on the data collection can be found in the paper (see Citation section).
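The two sampling steps above (ranking pun word/alter word pairs by frequency, then drawing unique context words at random) can be sketched roughly as follows. This is an illustration only, not the authors' code: the input lists, the seeded uniform sampling, and the helper names `most_frequent_pairs` and `sample_context_words` are assumptions, and how the pairs and context words were combined and filtered into the 4,552 released instances is described in the paper, not reproduced here.

```python
import random
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# Hypothetical type for a (pun word p_w, alter word a_w) pair.
Pair = Tuple[str, str]


def most_frequent_pairs(pairs: Iterable[Pair], k: int = 500) -> List[Pair]:
    """Return the k most frequent (p_w, a_w) pairs, most frequent first."""
    return [pair for pair, _count in Counter(pairs).most_common(k)]


def sample_context_words(vocab: Sequence[str], n: int = 100, seed: int = 0) -> List[str]:
    """Draw n unique context words uniformly at random (assumed procedure).

    Duplicates in vocab are removed first (keeping first-seen order) so the
    sample contains n distinct words; a fixed seed makes the draw repeatable.
    """
    unique_vocab = list(dict.fromkeys(vocab))
    rng = random.Random(seed)
    return rng.sample(unique_vocab, n)
```

For instance, `most_frequent_pairs(annotated_pairs)` would return up to 500 pairs, where `annotated_pairs` is a hypothetical list of (p_w, a_w) tuples read from the SemEval annotations.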

Sample Instance

The excerpt below shows a sample data instance:

context:          25 cent,profit
pun_word:         charge
alter_word:       charge
pun_word_sense:   pay with a credit card; pay with plastic money; postpone payment by recording a purchase as a debt
alter_word_sense: energize a battery by passing a current through it in the direction opposite to discharge
new_pun:          yes
user_pun:         The cashier said there was no charge for my battery.

Description of Fields

context: Context words C, represented as a comma-separated list of keyword phrases (our dataset).
pun_word: Pun word p_w (from SemEval 2017 Task 7).
alter_word: Alter word a_w (from SemEval 2017 Task 7).
pun_word_sense: Word sense information for the pun word S_{p_w} (retrieved from WordNet using SemEval annotated senses).
alter_word_sense: Word sense information for the alter word S_{a_w} (retrieved from WordNet using SemEval annotated senses).
new_pun: Whether the annotator could come up with a new pun using the given context keywords and pun/alter words (our dataset).
user_pun: If new_pun is yes, the text of the human-written pun that incorporates both the context keywords and the pun word (our dataset).
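To make the field descriptions above concrete, here is a minimal sketch that turns one raw row into a typed record. It assumes the column names match the field names listed above, that the context keyword phrases are separated by commas and contain no commas themselves, and that new_pun holds "yes" or "no" (only "yes" appears in the sample). The `CupInstance` class and `parse_row` function are hypothetical helpers, not part of this repository.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CupInstance:
    # Hypothetical record; field names mirror the columns described above.
    context: List[str]        # context keywords C
    pun_word: str             # pun word p_w
    alter_word: str           # alter word a_w
    pun_word_sense: str       # WordNet sense text S_{p_w}
    alter_word_sense: str     # WordNet sense text S_{a_w}
    new_pun: bool             # True if the annotator wrote a new pun
    user_pun: Optional[str]   # the human-written pun, or None


def parse_row(row: Dict[str, Optional[str]]) -> CupInstance:
    """Turn one raw row (e.g. a dict from csv.DictReader) into a CupInstance."""

    def field(name: str) -> str:
        # Missing or None values are treated as empty strings.
        return (row.get(name) or "").strip()

    # Assumption: keyword phrases are comma-separated, e.g. "25 cent,profit".
    context = [kw.strip() for kw in field("context").split(",") if kw.strip()]
    # Assumption: new_pun is "yes" or "no"; anything other than "yes" maps to False.
    new_pun = field("new_pun").lower() == "yes"
    # user_pun is only meaningful when new_pun is yes; empty text becomes None.
    user_pun = field("user_pun") if new_pun and field("user_pun") else None
    return CupInstance(
        context=context,
        pun_word=field("pun_word"),
        alter_word=field("alter_word"),
        pun_word_sense=field("pun_word_sense"),
        alter_word_sense=field("alter_word_sense"),
        new_pun=new_pun,
        user_pun=user_pun,
    )
```

Applied to the sample instance, this sketch would yield the context keywords ["25 cent", "profit"] and new_pun set to True, with the cashier sentence as user_pun.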

Data File

In this repository, we release the full dataset of 4,552 annotated instances in the Context-SitUated Pun (CUP) dataset.

├── data
│   └── context_situated_pun.csv (full dataset)
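A quick way to load the file and check how many instances received a human-written pun might look like the sketch below. It assumes the CSV has a header row using the field names described above and standard CSV quoting (the context column itself contains commas); the `load_cup` and `summarize` helpers are hypothetical and only use Python's standard library.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Dict, List


def load_cup(path: Path) -> List[Dict[str, str]]:
    """Read every row of the CSV into a list of dicts keyed by column name.

    Assumption: the first line is a header with the column names listed in
    Description of Fields, and fields containing commas are quoted.
    """
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def summarize(rows: List[Dict[str, str]]) -> Dict[str, int]:
    """Count all instances and those whose new_pun answer is "yes"."""
    answers = Counter((r.get("new_pun") or "").strip().lower() for r in rows)
    return {"instances": len(rows), "with_new_pun": answers["yes"]}
```

From the repository root, `summarize(load_cup(Path("data/context_situated_pun.csv")))` would report the number of rows, which should match the 4,552 annotated instances if the assumptions about the file layout hold.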

Security

See CONTRIBUTING for more information.

License

This dataset is licensed under the CC-BY-NC-4.0 License (see LICENSE).

Citation

If using this dataset in any relevant work, please cite the following papers:

The Context-SitUated Pun (CUP) dataset, Context-Situated Pun Generation, EMNLP 2022

@inproceedings{sun2022context,
  title = {Context-Situated Pun Generation},
  author = {Sun, Jiao and Narayan-Chen, Anjali and Oraby, Shereen and Gao, Shuyang and Chung, Tagyoung and Huang, Jing and Liu, Yang and Peng, Nanyun},
  booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2022}
}

The original SemEval-2017 Task 7 Dataset, SemEval-2017 Task 7: Detection and Interpretation of English Puns, SemEval 2017 (CC-BY-NC License)

@inproceedings{miller-etal-2017-semeval,
    title = "{S}em{E}val-2017 Task 7: Detection and Interpretation of {E}nglish Puns",
    author = "Miller, Tristan  and
      Hempelmann, Christian  and
      Gurevych, Iryna",
    booktitle = "Proceedings of the 11th International Workshop on Semantic Evaluation ({S}em{E}val-2017)",
    month = aug,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/S17-2005",
    doi = "10.18653/v1/S17-2005",
    pages = "58--68",
    abstract = "A pun is a form of wordplay in which a word suggests two or more meanings by exploiting polysemy, homonymy, or phonological similarity to another word, for an intended humorous or rhetorical effect. Though a recurrent and expected feature in many discourse types, puns stymie traditional approaches to computational lexical semantics because they violate their one-sense-per-context assumption. This paper describes the first competitive evaluation for the automatic detection, location, and interpretation of puns. We describe the motivation for these tasks, the evaluation methods, and the manually annotated data set. Finally, we present an overview and discussion of the participating systems{'} methodologies, resources, and results.",
}
