This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Open Information Extraction model: add an output option for triples of the form (subject, verb, object) #4857

Open
mlpotter opened this issue Dec 9, 2020 · 4 comments

Comments

@mlpotter

mlpotter commented Dec 9, 2020

Is your feature request related to a problem? Please describe.
I find it difficult to extract a list of triples of the form (subject, predicate, object) or (arg0, verb, arg1) from the JSON output of the AllenNLP Open Information Extraction model, which provides BIO tags (which may mark argument modifiers, the arguments themselves, etc.). It is frustrating to compare the AllenNLP Open Information Extraction output to other models such as Stanford CoreNLP or the University of Washington's OpenIE 5.1 when the output formats are so different.

Describe the solution you'd like
Currently, the output of the AllenNLP Open Information Extraction model is JSON of the form shown below:
[screenshot: example JSON output from the Open Information Extraction predictor]

This output provides the BIO tags for the arguments, verbs, verb modifiers, argument modifiers, etc., but the desired output, consistent with other Open Information Extraction models in the research literature, is a tuple of the form (subject, predicate, object) or (arg0, verb, arg1). For example, in the paper (Supervised Open Information Extraction) on which AllenNLP bases its Open Information Extraction model, the annotations are in triplet form (even though there are multiple objects in some cases), so this format is more desirable:
[screenshot: annotated triple examples from the paper]
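
As a rough illustration of the kind of postprocessing I have in mind (just a sketch, assuming the predictor output dict has the usual `words` list and a per-extraction `tags` list as in the screenshot above; the helper names here are made up, not AllenNLP API):

```python
from collections import OrderedDict

def bio_to_spans(words, tags):
    """Group tokens by label from BIO tags, e.g. {"ARG0": "John", "V": "decided", ...}."""
    spans = OrderedDict()
    for word, tag in zip(words, tags):
        if tag == "O":
            continue
        label = tag.split("-", 1)[1]  # "B-ARG0" / "I-ARG0" -> "ARG0"
        spans.setdefault(label, []).append(word)
    return {label: " ".join(tokens) for label, tokens in spans.items()}

def extract_triples(prediction):
    """Collapse each extraction's tags into an (arg0, verb, arg1) triple, skipping incomplete ones."""
    triples = []
    for extraction in prediction["verbs"]:
        spans = bio_to_spans(prediction["words"], extraction["tags"])
        if all(key in spans for key in ("ARG0", "V", "ARG1")):
            triples.append((spans["ARG0"], spans["V"], spans["ARG1"]))
    return triples
```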

Describe alternatives you've considered
An alternative I have found is Gabriel Stanovsky's Python wrapper https://github.com/gabrielStanovsky/supervised_oie_wrapper. A quick glance suggests the wrapper is outdated (over 1-1.5 years since the last update), and it is not a core feature of AllenNLP (which it should be). It also does not quite produce the desired triple format; see, for example, the output below from the wrapper:
[screenshot: example output from the wrapper]
This triple actually has two arguments, ARG1 and ARG2, in the object. Depending on the scenario, the object could be split so that two triples are created, (John, decided to join, the party) and (John, decided to join, in December); merged into a single object, (John, decided to join, the party in December); or the modifying argument "in December" could be ignored entirely, (John, decided to join, the party).
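
Either policy is a small step once the spans are grouped. A sketch (the span dictionary and function name are only illustrative):

```python
def triples_from_spans(spans, policy="split"):
    """spans: e.g. {"ARG0": "John", "V": "decided to join", "ARG1": "the party", "ARGM-TMP": "in December"}."""
    arg0, verb, obj = spans.get("ARG0"), spans.get("V"), spans.get("ARG1")
    extras = [text for label, text in spans.items()
              if label not in ("ARG0", "V", "ARG1")]  # e.g. ARG2, ARGM-TMP
    if None in (arg0, verb, obj):
        return []
    if policy == "merge":   # (John, decided to join, the party in December)
        return [(arg0, verb, " ".join([obj] + extras))]
    if policy == "ignore":  # drop modifiers: (John, decided to join, the party)
        return [(arg0, verb, obj)]
    # "split": one triple per span, e.g. (John, decided to join, the party) and (John, decided to join, in December)
    return [(arg0, verb, obj)] + [(arg0, verb, extra) for extra in extras]
```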

Additional context
Even just porting Gabriel Stanovsky's Python wrapper to an up-to-date AllenNLP version would be an extremely useful start, as it formats the data much more closely to other Open Information Extraction models in the literature.

@wangrat

wangrat commented Dec 10, 2020

I'm doing something like that for the project I'm currently working on. I will keep this in mind, and if I end up with a nice piece of code that solves this problem well I can open a PR. But no promises ;)

EDIT
I guess it would be an additional method on the SRLPredictor class doing some postprocessing, so the allennlp_models package (and repo) it is.
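
Something roughly like this, perhaps (only a sketch; the wrapper class and `predict_triples` method are made-up names, not existing AllenNLP/allennlp_models API, and the model archive path is a placeholder):

```python
from allennlp.predictors.predictor import Predictor

class TripleOpenIePredictor:
    """Thin wrapper adding a triple-producing call on top of an already-loaded Open IE predictor."""

    def __init__(self, predictor: Predictor):
        self.predictor = predictor

    def predict_triples(self, sentence: str):
        output = self.predictor.predict_json({"sentence": sentence})
        triples = []
        for extraction in output["verbs"]:
            spans = {}
            for word, tag in zip(output["words"], extraction["tags"]):
                if tag != "O":
                    spans.setdefault(tag.split("-", 1)[1], []).append(word)
            joined = {label: " ".join(tokens) for label, tokens in spans.items()}
            if {"ARG0", "V", "ARG1"} <= joined.keys():
                triples.append((joined["ARG0"], joined["V"], joined["ARG1"]))
        return triples

# Usage (archive path is a placeholder):
# predictor = Predictor.from_path("path/to/openie-model.tar.gz")
# TripleOpenIePredictor(predictor).predict_triples("John decided to join the party in December.")
```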

@mlpotter
Author

mlpotter commented Dec 10, 2020

Yes, it would be ideal to postprocess this output from the `make_srl_string` method
[screenshot: example output of the `make_srl_string` method]

into clean triples (for example, I do not want BIO tags such as argument modifiers in the triple output).
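
For example, something along these lines (only a sketch; the regex assumes the bracketed `[LABEL: tokens]` description format, and the function name is made up):

```python
import re

SPAN_RE = re.compile(r"\[(?P<label>[^:\]]+):\s*(?P<text>[^\]]+)\]")

def description_to_triple(description):
    """E.g. "[ARG0: John] [V: decided] [ARG1: to join the party] [ARGM-TMP: in December]" -> ("John", "decided", "to join the party")."""
    spans = {m.group("label"): m.group("text").strip() for m in SPAN_RE.finditer(description)}
    if {"ARG0", "V", "ARG1"} <= spans.keys():
        return (spans["ARG0"], spans["V"], spans["ARG1"])  # ARGM-* modifier spans are simply dropped
    return None
```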

Good luck with your project; perhaps you will be the author of a solution.

@schmmd
Member

schmmd commented Dec 10, 2020

@mlpotter you could write a custom predictor which formats Open IE triples to your liking. I think @gabrielStanovsky chose this format so he could add additional information, such as temporal information.

[ARG0: Albert Einstein , a German theoretical physicist] , [V: published] [ARG1: the theory of relativity] [ARGM-TMP: in 1915] .

In any case, I'm glad you find the Open IE 5.1 output intuitive, as I'm the original author of that codebase 😄

@gabrielStanovsky
Contributor

Hi all, yes, this indeed was a flexible format to work with, and I agree that adding my wrapper to the codebase makes sense, even more so now that the models are separate from the main repo. Thanks!

@schmmd schmmd removed their assignment Dec 14, 2020