This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Open Information Extraction model: add an output option for triples of the form (subject, verb, object) #4857

Open
mlpotter opened this issue Dec 9, 2020 · 4 comments

Comments

@mlpotter

mlpotter commented Dec 9, 2020

Is your feature request related to a problem? Please describe.
I find it difficult to extract a list of triples of the form (subject, predicate, object) or (arg0, verb, arg1) from the JSON output of the AllenNLP Open Information Extraction model, which provides BIO tags (which may mark argument modifiers, the arguments themselves, etc.). It is frustrating to compare the AllenNLP Open Information Extraction output to other models such as Stanford CoreNLP or the University of Washington's OpenIE 5.1 when the output formats are so different.

Describe the solution you'd like
Currently, the output of the AllenNLP Open Information Extraction model is JSON of the form shown below:
[screenshot: example JSON output from the Open Information Extraction predictor]

This output provides the BIO tags for the arguments, verbs, verb modifiers, argument modifiers, etc., but the desired output, consistent with other Open Information Extraction models in the research literature, is a tuple of the form (subject, predicate, object) or (arg0, verb, arg1). For example, in the paper (Supervised Open Information Extraction) on which AllenNLP bases its Open Information Extraction model, the annotations are in triplet form (even though there are multiple objects in some cases), so this format is more desirable:
[screenshot: annotated triple examples from the paper]
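
As a rough illustration of the kind of postprocessing I have in mind (just a sketch, assuming the predictor output dict has the usual `words` list and a per-extraction `tags` list as in the screenshot above; the helper names here are made up, not AllenNLP API):

```python
from collections import OrderedDict

def bio_to_spans(words, tags):
    """Group tokens by label from BIO tags, e.g. {"ARG0": "John", "V": "decided", ...}."""
    spans = OrderedDict()
    for word, tag in zip(words, tags):
        if tag == "O":
            continue
        label = tag.split("-", 1)[1]  # "B-ARG0" / "I-ARG0" -> "ARG0"
        spans.setdefault(label, []).append(word)
    return {label: " ".join(tokens) for label, tokens in spans.items()}

def extract_triples(prediction):
    """Collapse each extraction's tags into an (arg0, verb, arg1) triple, skipping incomplete ones."""
    triples = []
    for extraction in prediction["verbs"]:
        spans = bio_to_spans(prediction["words"], extraction["tags"])
        if all(key in spans for key in ("ARG0", "V", "ARG1")):
            triples.append((spans["ARG0"], spans["V"], spans["ARG1"]))
    return triples
```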

Describe alternatives you've considered
An alternative I have found is Gabriel Stanovsky's Python wrapper https://github.com/gabrielStanovsky/supervised_oie_wrapper. A quick glance suggests the wrapper is outdated (over 1-1.5 years since the last update), and it is not a core feature of AllenNLP (which it should be). It also does not quite produce the desired triple format; see, for example, the output below from the wrapper:
[screenshot: example output from the wrapper]
This triple actually has two arguments, ARG1 and ARG2, in the object. Depending on the scenario, the object could be split so that two triples are created, (John, decided to join, the party) and (John, decided to join, in December); merged into a single object, (John, decided to join, the party in December); or the modifying argument "in December" could be ignored entirely, (John, decided to join, the party).
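
Either policy is a small step once the spans are grouped. A sketch (the span dictionary and function name are only illustrative):

```python
def triples_from_spans(spans, policy="split"):
    """spans: e.g. {"ARG0": "John", "V": "decided to join", "ARG1": "the party", "ARGM-TMP": "in December"}."""
    arg0, verb, obj = spans.get("ARG0"), spans.get("V"), spans.get("ARG1")
    extras = [text for label, text in spans.items()
              if label not in ("ARG0", "V", "ARG1")]  # e.g. ARG2, ARGM-TMP
    if None in (arg0, verb, obj):
        return []
    if policy == "merge":   # (John, decided to join, the party in December)
        return [(arg0, verb, " ".join([obj] + extras))]
    if policy == "ignore":  # drop modifiers: (John, decided to join, the party)
        return [(arg0, verb, obj)]
    # "split": one triple per span, e.g. (John, decided to join, the party) and (John, decided to join, in December)
    return [(arg0, verb, obj)] + [(arg0, verb, extra) for extra in extras]
```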

Additional context
Even just porting Gabriel Stanovsky's Python wrapper to an up-to-date AllenNLP version would be an extremely useful start, as it formats the data much more closely to other Open Information Extraction models in the literature.

@wangrat

wangrat commented Dec 10, 2020

I'm doing something like that for the project I'm currently working on. I will keep this in mind, and if I end up with a nice piece of code that solves this problem well I can open a PR. But no promises ;)

EDIT
I guess it would be an additional method on the SRLPredictor class doing some postprocessing, so the allennlp_models package (and repo) it is.
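
Something roughly like this, perhaps (only a sketch; the wrapper class and `predict_triples` method are made-up names, not existing AllenNLP/allennlp_models API, and the model archive path is a placeholder):

```python
from allennlp.predictors.predictor import Predictor

class TripleOpenIePredictor:
    """Thin wrapper adding a triple-producing call on top of an already-loaded Open IE predictor."""

    def __init__(self, predictor: Predictor):
        self.predictor = predictor

    def predict_triples(self, sentence: str):
        output = self.predictor.predict_json({"sentence": sentence})
        triples = []
        for extraction in output["verbs"]:
            spans = {}
            for word, tag in zip(output["words"], extraction["tags"]):
                if tag != "O":
                    spans.setdefault(tag.split("-", 1)[1], []).append(word)
            joined = {label: " ".join(tokens) for label, tokens in spans.items()}
            if {"ARG0", "V", "ARG1"} <= joined.keys():
                triples.append((joined["ARG0"], joined["V"], joined["ARG1"]))
        return triples

# Usage (archive path is a placeholder):
# predictor = Predictor.from_path("path/to/openie-model.tar.gz")
# TripleOpenIePredictor(predictor).predict_triples("John decided to join the party in December.")
```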

@mlpotter
Author

mlpotter commented Dec 10, 2020

Yes, it would be ideal to postprocess this output from the `make_srl_string` method
[screenshot: example output of the `make_srl_string` method]

into clean triples (for example, I do not want BIO tags such as argument modifiers in the triple output).
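
For example, something along these lines (only a sketch; the regex assumes the bracketed `[LABEL: tokens]` description format, and the function name is made up):

```python
import re

SPAN_RE = re.compile(r"\[(?P<label>[^:\]]+):\s*(?P<text>[^\]]+)\]")

def description_to_triple(description):
    """E.g. "[ARG0: John] [V: decided] [ARG1: to join the party] [ARGM-TMP: in December]" -> ("John", "decided", "to join the party")."""
    spans = {m.group("label"): m.group("text").strip() for m in SPAN_RE.finditer(description)}
    if {"ARG0", "V", "ARG1"} <= spans.keys():
        return (spans["ARG0"], spans["V"], spans["ARG1"])  # ARGM-* modifier spans are simply dropped
    return None
```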

Good luck with your project; perhaps you will be the author of a solution.

@schmmd
Member

schmmd commented Dec 10, 2020

@mlpotter you could write a custom predictor which formats Open IE triples to your liking. I think @gabrielStanovsky chose this format so he could add additional information, such as temporal information.

[ARG0: Albert Einstein , a German theoretical physicist] , [V: published] [ARG1: the theory of relativity] [ARGM-TMP: in 1915] .

In any case, I'm glad you find the Open IE 5.1 output intuitive, as I'm the original author of that codebase 😄

@gabrielStanovsky
Contributor

Hi all, yes, this indeed was a flexible format to work with, and I agree that adding my wrapper to the codebase makes sense, even more so now that the models are separate from the main repo. Thanks!

@schmmd schmmd removed their assignment Dec 14, 2020