
Tasks 1285-1307: kpa dataset #626

Open
yeganehkordi opened this issue Nov 14, 2021 · 14 comments
@yeganehkordi
Contributor

These tasks share the same definition but differ only in domain. @ashok-arjun, can you change the definitions? For example, we could have summary generation, argument generation, etc.

@mishra-sid
Contributor

Hi, since it requires just updating the task definitions, I can work on this.

@yeganehkordi
Contributor Author

Hi, it's not just the definitions; the whole tasks need to be changed. These tasks are essentially identical, and we need to replace some of them with new, different tasks.
It would be great if you could work on this.

@danyaljj
Contributor

@yeganehkordi Could you elaborate on what the issue is?

@yeganehkordi
Contributor Author

Yeah, the kpa dataset consists of arguments and their summaries on different topics (gathered from many sources). Each of these tasks is to verify the summary for a given topic, and the only difference between the tasks is the topic. Actually, I think the contributor didn't need to separate the topics into different tasks, because knowing the topic doesn't make any difference for summary verification.
My suggestion is to create new tasks based on this dataset or to drop some of the existing tasks. The new tasks could be summary generation, argument generation, topic verification, etc.

@mishra-sid
Contributor

Makes sense, thanks for the clarification. Maybe I can drop all the existing kpa tasks, aggregate the instances from those tasks, and create the new suggested tasks?
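For reference, the aggregation step could be sketched roughly as follows. This is a hypothetical helper, not code from the repo; the field names (`"Definition"`, `"Instances"`) follow the task-file schema used in this repository, and the inline task dicts are illustrative stand-ins for the per-topic kpa task files:

```python
def merge_kpa_tasks(tasks, definition):
    """Combine the instances of several per-topic tasks under one definition."""
    merged = {"Definition": [definition], "Instances": []}
    for task in tasks:
        # Pool every instance; the per-topic split is discarded.
        merged["Instances"].extend(task["Instances"])
    return merged

# Illustrative stand-ins for two per-topic kpa task files:
task_a = {"Definition": ["Topic A summary verification"],
          "Instances": [{"input": "argument 1 | keypoint 1", "output": ["True"]}]}
task_b = {"Definition": ["Topic B summary verification"],
          "Instances": [{"input": "argument 2 | keypoint 2", "output": ["False"]}]}

merged = merge_kpa_tasks(
    [task_a, task_b],
    "Does the given keypoint correctly summarize the argument?",
)
print(len(merged["Instances"]))  # 2
```

A real merge would load the task JSON files from disk and deduplicate instances, but the shape of the operation is the same.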

@yeganehkordi
Contributor Author

I agree! I guess it would be preferable to provide as many different tasks as you can.

@danyaljj
Contributor

Yeah, if it's too difficult to fix a task, it's better to drop it completely.

@mishra-sid
Contributor

@danyaljj Do you think it's worth spending time creating new tasks using this data (maybe just the simple ones?), or should I just create a PR to drop the existing tasks?

@danyaljj
Contributor

danyaljj commented Dec 7, 2021

Let's drop it.

Palipoor added a commit to Palipoor/natural-instructions-expansion that referenced this issue Feb 3, 2022
@danyaljj
Contributor

danyaljj commented Feb 3, 2022

Actually, looking through the conversation, I'm not sure why I suggested dropping the tasks. How about we go ahead with @yeganehkordi's suggestion of merging the tasks (if I understood it correctly)? @Palipoor, do you have any suggestions here?

@ashok-arjun
Contributor

ashok-arjun commented Feb 4, 2022

Sorry, I have been busy over the past few months and couldn't be active here.

Actually, I think that having the topic (e.g., "Assisted suicide should be a criminal offence") makes a lot of difference in a specific task, as it covers part of the question for all the instances. I feel it is similar to having the topic be "translate from en-->fr" and then having instances under that task.

On the other hand, merging everything would be similar to putting all translation pairs (en-->fr, fr-->en, and practically all pairs) together in one task, just because the main task (translation) is the same for all of them.

I see that the same notion has also been used in some other tasks, for example:

  1. task1488_sarcasmdetection_headline_classification, task1489_sarcasmdetection_tweet_classification
  2. task1490_bengali_personal_hate_speech_binary_classification, task1491_bengali_political_hate_speech_binary_classification, task1492_bengali_religious_hate_speech_binary_classification, task1493_bengali_geopolitical_hate_speech_binary_classification
  3. task1448_disease_entity_extraction_ncbi_dataset, task1449_disease_entity_extraction_bc5cdr_dataset

In each of the above, the main objective (e.g., classification or entity extraction) remains the same for all tasks, but they are separated based on the characteristics of the instances. I feel that the KPA dataset tasks are similar: they are separated based on the characteristics of the instances, and each task as a whole is self-contained about its characteristic, as the question specifies.

Kindly correct me if I am wrong about this. @danyaljj @yeganehkordi

@yeganehkordi
Contributor Author

yeganehkordi commented Feb 4, 2022

I think having different topics in the summary verification tasks won't make much difference in solving them; they don't require different information or skills, only slightly different domains. Understanding a language and writing in that language, by contrast, require different skills; someone might be able to translate a sentence from English to French but not from French to English.

task1490_bengali_personal_hate_speech_binary_classification, task1491_bengali_political_hate_speech_binary_classification, task1492_bengali_religious_hate_speech_binary_classification, task1493_bengali_geopolitical_hate_speech_binary_classification

In the same manner, in the hate speech detection tasks a sentence might contain different types of toxicity, and in each task we need to detect a specific kind of inappropriate language usage. So the same instance may have different outputs across these tasks.

task1448_disease_entity_extraction_ncbi_dataset and task1449_disease_entity_extraction_bc5cdr_dataset
task1488_sarcasmdetection_headline_classification, task1489_sarcasmdetection_tweet_classification

I agree with you on these tasks. They are the same; however, they come from different datasets, and I think there is value in having a large variety of datasets. It's somewhat like having the same question answering task built from different datasets.

I still think it's better to merge the instances and create different tasks based on this dataset. If that's not possible, I guess we can use this version of the tasks to evaluate generalization across domains, and keep them.

@Palipoor
Contributor

Palipoor commented Feb 4, 2022

I think different translation datasets extracted from different sources should be separate tasks. Translating a tweet from English to French is very different from translating a Wikipedia page, for instance.
However, in the kpa case, I don't think the tasks are that different from each other. Moreover, I don't think it's even necessary to include the topics in the task definition. Simply "Does the given keypoint correctly summarize the argument?" suffices.
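Concretely, a single merged task along those lines might look like this (a minimal illustrative sketch; the field names follow the repository's task-file schema, and the instance content is invented for the example):

```json
{
  "Definition": ["Does the given keypoint correctly summarize the argument?"],
  "Instances": [
    {
      "input": "Argument: Terminally ill patients deserve the right to end their suffering. Keypoint: Assisted suicide reduces suffering.",
      "output": ["True"]
    }
  ]
}
```

The topic no longer appears anywhere in the definition or the input, which is exactly the point: verification depends only on the argument-keypoint pair.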

@Palipoor
Contributor

Palipoor commented Feb 4, 2022

I think we can easily merge them into one task. I only dropped them because I thought that was agreed upon.
