colinzhaoust/WinoWhy
WinoWhy

This is the GitHub repository for the ACL 2020 paper "WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge".

Dependency

Python 3.6, PyTorch 1.1

Introduction of WinoWhy

This repo includes the original Winograd Schema Challenge (WSC) dataset and 4095 WinoWhy reasons (15 for each of the 273 WSC questions) that could justify the pronoun coreference choices in WSC.

WinoWhy contains 3 sources of reasons: (1) Human; (2) Human Reverse; (3) Generation Model. Each WSC question has 5 reasons from each source.

Here are the descriptions and examples of reasons from these sources. The examples are based on the WSC question: "The city councilmen refused the demonstrators a permit because they feared violence. Does the 'they' refer to 'the city councilmen' or 'the demonstrators'?". Each reason completes the prompt "The 'they' refers to the city councilmen because...". The paired question of this WSC question changes "feared" to "advocated".

| Source | Description | Example |
| --- | --- | --- |
| Human | Reasons provided by human beings. | city councilmen are administrative so they are more likely to fear. |
| Human Reverse | Human reasons for the paired WSC question. | the demonstrators were the ones who needed a permit. |
| Generation Model | The reasons generated by GPT-2 with the same question. | they are under the command of Mayor James B. Gray. |

After collecting reasons from humans and running a second round of annotation on their plausibility, we use the valid reasons (those that at least 80% of the annotators agree justify the answer to the WSC question) to categorize the types of commonsense knowledge needed to solve each WSC question. The selected knowledge types are as follows (note that a question may require knowledge from multiple categories):

| Name (# of questions) | Definition | Example |
| --- | --- | --- |
| Property (32) | Knowledge about property of objects. | ice is cold. |
| Object (82) | Knowledge about objects. | cats have ears. |
| Eventuality (88) | Knowledge about eventualities. | 'wake up' happens before 'open eyes'. |
| Spatial (64) | Knowledge about spatial position. | object at the back can be blocked. |
| Quantity (20) | Knowledge about numbers. | 2 is smaller than 10. |
| Others (48) | All other knowledge. | NA |

In general, WinoWhy provides interesting and broad-coverage reasons for the WSC questions. Human reasons for solving these pronoun coreference questions are creative: human annotators may answer a question by giving a specific definition of the concepts involved, by offering a general and abstract explanation, or by using indirect tricks. GPT-2 reasons are usually valid English sentences but invalid justifications. The number of reasons is limited, however, because WSC itself is a small dataset of carefully crafted questions. How best to use these reasons also remains to be studied, since that is itself another challenge for commonsense understanding. We will keep working on improving the dataset quality.

Data Format of WinoWhy

There are two data files in the repo:

winowhy.json: the WSC dataset and corresponding WinoWhy questions.

cat_ref.json: the knowledge categories and indexes of corresponding WSC questions.

WinoWhy Dataset

Dataset: a list of 273 WSC questions.

    WSC Question: a dictionary. The keys and values are:

        "text": a dictionary of the original WSC text. The keys and values are:

            "txt1": a string of text before the pronoun;

            "pron": a string of the target pronoun;

            "txt2": a string of text after the pronoun;

        "answers": a list of strings of candidate answer spans;

        "correctAnswer": A or B;

        "source": original WSC source;

        "reasons": a list of the WinoWhy reasons:

            WinoWhy Reason: a list holding the information for one reason:

                reason[0]: reason text;

                reason[1]: reason source (human, gpt, reverse);

                reason[2]: reason plausibility;

                reason[3]: reason label (Valid, Invalid, Undecided)
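
The format above can be read with the standard json module. The following is a minimal sketch, assuming winowhy.json follows the layout described in this section; the helper names (load_winowhy, count_reasons) are hypothetical, and the plausibility scores and labels in the sample record are made up for illustration.

```python
import json
from collections import Counter


def load_winowhy(path="winowhy.json"):
    """Load the dataset: a list of WSC question dictionaries."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_reasons(dataset):
    """Count reasons per (source, label) pair."""
    counts = Counter()
    for question in dataset:
        for reason in question["reasons"]:
            # reason = [text, source, plausibility, label]
            _text, source, _plausibility, label = reason[:4]
            counts[(source, label)] += 1
    return counts


# Illustrative record in the documented layout. The plausibility values,
# labels, and "source" string are invented, not taken from the real file.
sample = [{
    "text": {
        "txt1": "The city councilmen refused the demonstrators a permit because",
        "pron": "they",
        "txt2": "feared violence.",
    },
    "answers": ["the city councilmen", "the demonstrators"],
    "correctAnswer": "A",
    "source": "(illustrative)",
    "reasons": [
        ["city councilmen are administrative so they are more likely to fear.",
         "human", 1.0, "Valid"],
        ["they are under the command of Mayor James B. Gray.",
         "gpt", 0.0, "Invalid"],
    ],
}]

reason_counts = count_reasons(sample)
```

With the real file, count_reasons(load_winowhy()) would give the distribution of labels per reason source.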

Category Reference

Dataset: a dictionary. The keys and values are:

    "Property", "Object", "Eventuality", "Spatial", "Quantity", "Others": a list of the indexes of the WSC questions.
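
Because a question may fall into several categories, it can be convenient to invert this mapping. A minimal sketch, assuming cat_ref.json has the layout described above; the function names are hypothetical, the sample indexes are invented, and whether the indexes are 0-based positions in the winowhy.json list is an assumption that should be checked against the data.

```python
import json
from collections import defaultdict


def load_categories(path="cat_ref.json"):
    """Load the category reference: {category name: [question indexes]}."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def categories_by_question(cat_ref):
    """Invert the mapping to {question index: [category names]}."""
    by_question = defaultdict(list)
    for category, indexes in cat_ref.items():
        for idx in indexes:
            by_question[idx].append(category)
    return dict(by_question)


# Invented indexes, only to show the shape of the result.
sample_cat_ref = {"Property": [0, 2], "Spatial": [2], "Quantity": [5]}
question_categories = categories_by_question(sample_cat_ref)
```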

Application of WinoWhy

Unsupervised

We can first connect the question and the reason as a single sentence by adding a few words between them (e.g., WSC Question+" The 'they' refers to the city councilmen because "+ Reason). Then we can put the sentence into the models and take the returned probability as the prediction.
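
The concatenation step can be sketched as follows. This assumes the winowhy.json fields described earlier and that "correctAnswer" begins with "A" or "B"; build_input and score_reasons are hypothetical names, and score_fn stands for whatever language-model scoring function you supply (it is not part of this repo).

```python
def build_input(question, reason_text):
    """Join a WSC question and one reason into a single sentence."""
    text = question["text"]
    pronoun = text["pron"].strip()
    sentence = " ".join(
        part.strip() for part in (text["txt1"], pronoun, text["txt2"])
    )
    # Assumption: "correctAnswer" starts with "A" or "B".
    is_a = question["correctAnswer"].strip().upper().startswith("A")
    answer = question["answers"][0 if is_a else 1].strip()
    bridge = "The '{}' refers to {} because".format(pronoun, answer)
    return "{} {} {}".format(sentence, bridge, reason_text.strip())


def score_reasons(question, score_fn):
    """Apply a caller-supplied scoring function to every reason."""
    return [
        (reason[0], score_fn(build_input(question, reason[0])))
        for reason in question["reasons"]
    ]


# Illustrative record; the plausibility value and label are invented.
question = {
    "text": {
        "txt1": "The city councilmen refused the demonstrators a permit because",
        "pron": "they",
        "txt2": "feared violence.",
    },
    "answers": ["the city councilmen", "the demonstrators"],
    "correctAnswer": "A",
    "reasons": [
        ["city councilmen are administrative so they are more likely to fear.",
         "human", 1.0, "Valid"],
    ],
}

model_input = build_input(question, question["reasons"][0][0])
```

Passing a function that returns a sentence probability as score_fn gives one prediction per reason, which can then be compared against reason[3].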

Supervised

Similarly, we can treat WinoWhy as a binary classification problem in which the model must learn to distinguish valid from invalid reasons through supervised learning. To run the supervised learning code, use python supervised_winowhy.py. A processed dataset for classification, with the reasons labeled Undecided removed, is available in ./dataset/.
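
For readers who want to derive classification examples from winowhy.json directly, the filtering could look like the sketch below. This is an illustration under the data format described earlier, not necessarily how supervised_winowhy.py or the files in ./dataset/ were produced; to_binary_examples is a hypothetical name and the sample reasons carry invented plausibility values and labels.

```python
def to_binary_examples(dataset):
    """Return (reason text, label) pairs with 1 = Valid and 0 = Invalid.

    Reasons labeled Undecided are dropped.
    """
    examples = []
    for question in dataset:
        for reason in question["reasons"]:
            text, label = reason[0], reason[3].strip().lower()
            if label == "undecided":
                continue
            examples.append((text, 1 if label == "valid" else 0))
    return examples


# Invented plausibility values and labels, for illustration only.
sample_dataset = [{
    "reasons": [
        ["city councilmen are administrative so they are more likely to fear.",
         "human", 1.0, "Valid"],
        ["the demonstrators were the ones who needed a permit.",
         "reverse", 0.6, "Undecided"],
        ["they are under the command of Mayor James B. Gray.",
         "gpt", 0.0, "Invalid"],
    ],
}]

binary_examples = to_binary_examples(sample_dataset)
```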

Todo

Include the categorical annotation into the WinoWhy main dataset.

Citation

@inproceedings{zhang2020WinoWhy,
  author    = {Hongming Zhang* and Xinran Zhao* and Yangqiu Song},
  title     = {WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge},
  booktitle = {Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL) 2020},
  year      = {2020}
}

Others

If you have any other questions about this repo, you are welcome to open an issue or send me an email. I will respond as soon as possible.