Duplicate tuples and tuples with none tail node #11

Closed
phosseini opened this issue May 25, 2021 · 5 comments

@phosseini

phosseini commented May 25, 2021

There are duplicate tuples in all three splits of the data: ~68,626 in train, ~7,410 in dev, and ~8,473 in test (please correct me if I'm wrong). Why is that, and should we just ignore the duplicates when using the data? One example:

['PersonX answers the question', 'xAttr', 'knowledgeable']
['PersonX answers the question', 'xAttr', 'knowledgeable']
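
A minimal sketch of how such duplicates could be counted, assuming the triples are already loaded as (head, relation, tail) tuples. The sample list and the helper name count_duplicates are illustrative only, and the convention that each extra copy counts once is an assumption, not necessarily how the figures above were obtained.

```python
from collections import Counter

# Small in-memory sample built from the tuples quoted in this issue;
# real data would be read from the train/dev/test split files.
triples = [
    ("PersonX answers the question", "xAttr", "knowledgeable"),
    ("PersonX answers the question", "xAttr", "knowledgeable"),
    ("PersonX accidentally threw ___", "xIntent", "none"),
    ("PersonX accidentally threw ___", "xIntent", "none"),
    ("PersonX accidently left", "oReact", "sad"),
]


def count_duplicates(rows):
    """Return (number of redundant rows, Counter of rows seen more than once)."""
    counts = Counter(rows)
    repeated = Counter({row: n for row, n in counts.items() if n > 1})
    # Assumption: every copy beyond the first counts as one duplicate.
    redundant = sum(n - 1 for n in repeated.values())
    return redundant, repeated


redundant, repeated = count_duplicates(triples)
print(redundant)  # 2
```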

Also, there are tuples whose tail node is none (these none-valued tuples are also among the duplicates). For example:

['PersonX accidentally threw ___', 'xIntent', 'none']
['PersonX accidentally threw ___', 'xIntent', 'none']

How should these none values be interpreted? Should we just ignore them? Or does it mean that the head (subject) has no relation of the given edge type? For instance, for PersonX accidentally threw ___, does it mean PersonX has no xIntent? If that's the case, then how should we treat the following cases:

['PersonX accidently left', 'oReact', 'none']
['PersonX accidently left', 'oReact', 'sad']

Here the same head and relation appear once with a none tail node and once with a non-empty tail node.
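
A rough sketch of how such mixed cases could be located, assuming triples are (head, relation, tail) tuples and that the literal string "none" marks an empty tail. The helper name mixed_none_pairs is hypothetical.

```python
from collections import defaultdict


def mixed_none_pairs(rows):
    """Map each (head, relation) pair that has both a 'none' tail and
    at least one other tail to the set of all its tails."""
    tails = defaultdict(set)
    for head, relation, tail in rows:
        tails[(head, relation)].add(tail)
    return {
        pair: found
        for pair, found in tails.items()
        if "none" in found and len(found) > 1
    }


# Sample rows taken from the examples quoted in this issue.
sample = [
    ("PersonX accidently left", "oReact", "none"),
    ("PersonX accidently left", "oReact", "sad"),
    ("PersonX accidentally threw ___", "xIntent", "none"),
    ("PersonX accidentally threw ___", "xIntent", "none"),
]

mixed = mixed_none_pairs(sample)
for (head, relation), found in mixed.items():
    print(head, relation, sorted(found))
```

Whether to drop the none rows for such pairs, or keep them as one annotator's judgment, is a modeling decision this sketch does not settle.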

Thanks.

@phosseini
Author

@keisks Any thoughts on this?

@puraminy

puraminy commented Jun 21, 2021

I am not a member of this project, but I am also working on ATOMIC. I hadn't noticed that. In my opinion, duplicate entries are redundant and serve no purpose; they may even have negative effects.

As for none, it means that there was no intention, or no involuntary effect in the case of xEffect or oEffect. In your example, since the person threw it accidentally, none means that the person had no intention.

Also, each head and relation pair can have multiple targets, which may differ in confidence or plausibility.

I am a Ph.D. student at the University of Tehran. I would like to get to know you and learn what you are doing with ATOMIC. We may be able to share our thoughts; my email is: pouramini -------- gmail

@csbhagav
Contributor

Sorry for the delay in addressing the issues. We are looking at this and will respond soon.

@rlebras
Collaborator

rlebras commented Sep 15, 2021

Thank you @phosseini and @puraminy for your interest in our work!

As @puraminy mentioned, the tail node none indicates that the relation does not apply to the given head (e.g., if an event does not affect people other than PersonX, the tails for oEffect, oReact, and oWant would be annotated as none; see Sap et al., 2019).

Duplicate tuples indicate that multiple annotators provided the same tail for a given head/relation pair. While these tuples are redundant, keeping them in the data accurately reflects the data collection process, and the repetition can be used as a signal of the degree of confidence in these tuples.
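
One possible way to use the repetition as a confidence signal, sketched under the assumption that each row corresponds to one annotator response (the thread does not state the number of annotators per pair). The helper name tail_agreement and the "share of rows" score are illustrative, not a method confirmed by the maintainers.

```python
from collections import Counter


def tail_agreement(rows):
    """For each distinct (head, relation, tail), return the share of rows
    for that (head, relation) pair that carry this tail."""
    pair_totals = Counter((head, relation) for head, relation, _ in rows)
    triple_counts = Counter(rows)
    return {
        triple: count / pair_totals[triple[:2]]
        for triple, count in triple_counts.items()
    }


# Sample rows taken from the examples quoted in this issue.
rows = [
    ("PersonX answers the question", "xAttr", "knowledgeable"),
    ("PersonX answers the question", "xAttr", "knowledgeable"),
    ("PersonX accidently left", "oReact", "none"),
    ("PersonX accidently left", "oReact", "sad"),
]

scores = tail_agreement(rows)
for triple, share in scores.items():
    print(triple, share)
```

The resulting dictionary holds one entry per distinct triple, so it also serves as a deduplicated view of the data with the duplicate information preserved as a weight.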

rlebras closed this as completed Sep 15, 2021

@cingtiye

cingtiye commented Nov 22, 2022

Hi @rlebras

As @puraminy mentioned, there are tuples with different tail node values, while their head and relation are the same. For example,

["personX 'd better go", 'xAttr', 'avoidant']
["PersonX 'd better go", 'xAttr', 'weak']
["PersonX 'd better go", 'xAttr', 'hurried']
["PersonX 'd better go", 'xAttr', 'late']
["PersonX 'd better go", 'xAttr', 'Tardy']
["PersonX 'd better go", 'xAttr', 'busy']

Is it really necessary for LMs, such as GPT-XL or BART, to generate multiple tail node values (avoidant [EOS], weak [EOS], hurried [EOS], etc.) for the same input (PersonX 'd better go xAttr [GEN])?

In my opinion, an LM should not be given different target outputs for the same input during training (please correct me if I'm wrong).
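
To make the one-to-many situation concrete, here is a small sketch that groups tails by the input string, using the "head relation [GEN]" and "tail [EOS]" formats quoted above. The helper name group_targets is hypothetical, and this only illustrates the data layout; it does not show how any particular model was actually trained.

```python
from collections import defaultdict


def group_targets(rows):
    """Group target strings by input prompt, so one prompt maps to every
    tail observed for its (head, relation) pair."""
    targets = defaultdict(list)
    for head, relation, tail in rows:
        targets[f"{head} {relation} [GEN]"].append(f"{tail} [EOS]")
    return dict(targets)


# Rows taken from the examples quoted in this comment.
rows = [
    ("PersonX 'd better go", "xAttr", "weak"),
    ("PersonX 'd better go", "xAttr", "hurried"),
    ("PersonX 'd better go", "xAttr", "late"),
    ("PersonX 'd better go", "xAttr", "Tardy"),
    ("PersonX 'd better go", "xAttr", "busy"),
]

grouped = group_targets(rows)
prompt = "PersonX 'd better go xAttr [GEN]"
print(prompt, "->", grouped[prompt])
```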

Thanks.
