Duplicate tuples and tuples with none tail node #11
Comments
@keisks Any thoughts on this?
I am not a member of this project, but I am also working on Atomic. I hadn't noticed that. In my opinion, duplicate entries certainly have no use; they are redundant and can even have negative effects. As for Also, each head and relation can have multiple targets, which may have different degrees of confidence or plausibility. I am a Ph.D. student at Tehran University. I would like to get to know you better and learn what you are doing with Atomic. We may be able to share our thoughts. My email is: pouramini -------- gmail
Sorry for the delay in addressing the issues. We are looking at this and will respond soon.
Thank you @phosseini and @puraminy for your interest in our work! As @puraminy mentioned, the tail node Duplicate tuples indicate that multiple annotators provided the same tail for a given head/relation pair. While these tuples are redundant, keeping them in the data accurately reflects the data collection process, and they can be used to gauge the degree of confidence in these tuples.
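One way to read the point about confidence is to turn repeated annotations into a relative frequency per tail. The sketch below is only an illustration under assumptions, not the maintainers' method: it assumes the tuples are already loaded as `(head, relation, tail)` triples, and `tail_confidence` is a hypothetical helper name. The toy annotations are made up for the example.

```python
from collections import Counter


def tail_confidence(tuples):
    """Map each distinct (head, relation, tail) to its share of the
    annotations collected for that (head, relation) pair."""
    tuple_counts = Counter(tuples)
    pair_totals = Counter((head, rel) for head, rel, _ in tuples)
    return {
        (head, rel, tail): count / pair_totals[(head, rel)]
        for (head, rel, tail), count in tuple_counts.items()
    }


# Hypothetical toy annotations: three annotators, two of whom agree.
toy = [
    ("PersonX 'd better go", "xAttr", "late"),
    ("PersonX 'd better go", "xAttr", "late"),
    ("PersonX 'd better go", "xAttr", "busy"),
]
scores = tail_confidence(toy)
```

With this reading, a tail given by more annotators gets a higher score, which could serve as a sample weight after deduplication.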
Hi @rlebras
As @puraminy mentioned, there are tuples with different tail node values while their head and relation are the same. For example:
["personX 'd better go", 'xAttr', 'avoidant']
["PersonX 'd better go", 'xAttr', 'weak']
["PersonX 'd better go", 'xAttr', 'hurried']
["PersonX 'd better go", 'xAttr', 'late']
["PersonX 'd better go", 'xAttr', 'Tardy']
["PersonX 'd better go", 'xAttr', 'busy']
Is it really necessary for LMs such as GPT-XL or BART to generate multiple tail node values? In my opinion, an LM should not generate different outputs for the same input during training (please correct me if I'm wrong). Thanks.
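To make the one-to-many structure in these examples explicit, the tails can be grouped under their shared head/relation pair. This is a minimal sketch, assuming the tuples are held in memory as `(head, relation, tail)` triples; `group_tails` is a hypothetical helper, and lower-casing the head is an assumption made here only so that `personX` and `PersonX` land in the same group.

```python
from collections import defaultdict


def group_tails(tuples):
    """Collect the distinct tails for each (head, relation) pair,
    preserving the order in which they first appear."""
    grouped = defaultdict(list)
    for head, rel, tail in tuples:
        key = (head.lower(), rel)  # assumption: heads differ only by case
        if tail not in grouped[key]:
            grouped[key].append(tail)
    return dict(grouped)


examples = [
    ("personX 'd better go", "xAttr", "avoidant"),
    ("PersonX 'd better go", "xAttr", "weak"),
    ("PersonX 'd better go", "xAttr", "hurried"),
    ("PersonX 'd better go", "xAttr", "late"),
    ("PersonX 'd better go", "xAttr", "Tardy"),
    ("PersonX 'd better go", "xAttr", "busy"),
]
grouped = group_tails(examples)
```

Each key in `grouped` then corresponds to one model input with several acceptable targets, which is the situation the question above is about.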
There are duplicate tuples in all three splits of data: ~68,626 in train, ~7,410 in dev, and ~8,473 in test (please correct me if I'm wrong). I wonder why? And should we just ignore the duplicates when using the data? One example:
Also, there are tuples with a `none` tail node value (these `none`-valued tuples are also part of the duplicate tuples). For example,
I wonder how these `none` values should be interpreted? Should we just ignore them? Or does it mean that the subject or head has no relation of type `edge relation` in the tuple? For instance, in the case of `PersonX accidentally threw ___`, PersonX has no xIntent? If that's the case, then how should we treat the following cases:
Where we have the same relation one time with a `none` tail node and another time with a non-empty tail node.
Thanks.