
The Plumber

(Plumber framework icon)

Plumber is a framework for KG completion: it extracts triples from natural-language text and maps them to their KG representations.

Plumber creates Information Extraction (IE) pipelines out of community-created components.

The Science behind Plumber

A framework for creating Information Extraction pipelines. Plumber integrates 33 reusable components released by the research community for the subtasks of entity linking (EL), relation linking (RL), text triple extraction (TE) (subject, predicate, object), and coreference resolution (CR). Overall, 264 different composable KG information extraction pipelines can be generated from the possible combinations of the 33 available components. Plumber implements a transformer-based classification algorithm that intelligently chooses a suitable pipeline based on the user's unstructured input text.

We also performed an exhaustive evaluation of Plumber to investigate the efficacy of the framework in creating KG triples from unstructured text. We demonstrated that, independent of the underlying KG, Plumber can find and assemble different extraction components to produce better-suited KG triple extraction pipelines, significantly outperforming existing baselines such as Frankenstein.

A teaser of the results:

| Approach     | Dataset | Classification P | Classification R | Classification F1 |
|--------------|---------|------------------|------------------|-------------------|
| Frankenstein | WebNLG  | 0.732            | 0.751            | 0.741             |
| Plumber      | WebNLG  | 0.877            | 0.900            | 0.888             |

Please check the publication for more details on the evaluation and the approach in general.

Plumber in-action

Click here to see the Plumber demo

Interactive testing

Plumber will be deployed soon under the ORKG infrastructure and made publicly available to try out.

Running the code

Here is some basic documentation of the requests and responses of the Plumber API created for the ORKG.

First things first: you need to run everything through docker-compose:

docker-compose up -d

This should take care of everything, set up all needed containers, and orchestrate their communication.
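
Once the containers are up, you can sanity-check that the API is reachable by calling the index page, which lists the available routes (a minimal sketch; the host and port are taken from the curl examples below, so adjust them to your deployment):

import requests

# The index page returns the list of available routes (see the next section).
print(requests.get("http://awesome-plumber-host:5000/").json())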

Some background first

Since Plumber is designed with "some" modularity in mind, the name of a component's class is used as a key, so components can be called easily. What is this key? Glad you asked; let's take a look at a small example: ORKGLinker is a possible name for a component (class) implemented within Plumber. To make it easier to call, the "Linker" suffix is removed, and you can add any number of underscores (_) for readability. This means that you can pass Plumber the name "orkg", "ORkg", or "O_R_K_G", and they are all viable candidates for the same component. P.S. Sometimes, due to technical details, you might end up with something like "NeuralSpacyCoreferenceResolver"; similarly to what was done before, "Resolver" is removed, but you still end up with a nasty, long name (e.g. neural_spacy_coreference).
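
As an illustration only (a hypothetical sketch of the naming scheme described above, not Plumber's actual code), key matching could look roughly like this:

# Hypothetical sketch of the key-matching scheme; the real Plumber
# implementation may differ.

def class_to_key(class_name: str) -> str:
    # Strip the task suffix from the class name, as described above:
    # "ORKGLinker" -> "ORKG",
    # "NeuralSpacyCoreferenceResolver" -> "NeuralSpacyCoreference".
    for suffix in ("Linker", "Resolver"):
        if class_name.endswith(suffix):
            return class_name[: -len(suffix)]
    return class_name

def matches(user_key: str, class_name: str) -> bool:
    # Underscores are ignored and matching is case-insensitive, so
    # "orkg", "ORkg", and "O_R_K_G" all resolve to ORKGLinker.
    canonical = class_to_key(class_name).replace("_", "").lower()
    return user_key.replace("_", "").lower() == canonical

assert matches("O_R_K_G", "ORKGLinker")
assert matches("neural_spacy_coreference", "NeuralSpacyCoreferenceResolver")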

Now back to business

So, what endpoints are available now? Calling the index page of the API will get you a response that looks like this:

[
   "get_components                           GET,HEAD,OPTIONS     /components",
   "get_pipelines                            GET,HEAD,OPTIONS     /pipelines",
   "run_pipeline                             PUT,OPTIONS,POST     /run"
]

From the above response, you should be able to guess what each request does and what to expect.

Hmm, what about more details?

OK, you got it; let's dive into the details of the calls:

curl --location --request GET 'awesome-plumber-host:5000/components'

The response to this simple call is a JSON array containing descriptions of the components. An example of such a response:

[{
   "desc":"The dependency extractor makes use of stanford parser to extract meaningful entity triplets from sentences based on the dependency tree of said sentence",
   "icon":null,
   "key":"dependency",
   "kg":null,
   "name":"Dependency-based Extractor",
   "task":"TE",
   "url":"https://github.com/anutammewar/extract_triplets"
}]

As you can see, a number of attributes are returned here; most importantly:

  • task: indicates the component's task in the pipeline (TE for extractors, CR for resolvers, EL, RL, or EL/RL for linkers)
  • kg: indicates the knowledge graph the component works with
  • key: the identifier that Plumber understands; name is the human-friendly one ;)
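
As a quick illustration, a small client sketch (host and port are assumptions taken from the curl examples; adjust to your deployment) can group the catalogue by task:

import requests
from collections import defaultdict

# Fetch the component catalogue and group the component keys by task.
components = requests.get("http://awesome-plumber-host:5000/components").json()

by_task = defaultdict(list)
for component in components:
    by_task[component["task"]].append(component["key"])

for task, keys in sorted(by_task.items()):
    print(f"{task}: {', '.join(keys)}")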

Another call is the pipelines one; from the naming, you can guess what it provides. Some pipelines are saved to help novice users, or to make it easy to find the right one (assuming more and more components get added).

curl --location --request GET 'awesome-plumber-host:5000/pipelines'

This call returns a list of available pipelines, describing which components each pipeline is made of and its name. (Following is an example of one pipeline.)

[{
   "extractors":[
      "MinIE"
   ],
   "linkers":[
      "FalconWikidataJoint"
   ],
   "name":"Wikidata Test Pipeline",
   "resolvers":[
      "hmtl"
   ]
}]

You can notice that the pipeline structure only makes use of the "key" values noted in the component descriptions. Linkers and resolvers can be null as well, because they are not required fields.
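
A matching client sketch (same host/port assumptions as above):

import requests

# List the saved pipelines; "linkers" and "resolvers" may be null.
pipelines = requests.get("http://awesome-plumber-host:5000/pipelines").json()
for pipeline in pipelines:
    print(pipeline["name"])
    print("  extractors:", pipeline["extractors"])
    print("  linkers:   ", pipeline.get("linkers") or "none")
    print("  resolvers: ", pipeline.get("resolvers") or "none")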

Start your engines

Now that we know what we need to know, you can start running pipelines.

In order to run a pipeline, you need to let Plumber know what the configuration of your pipeline is. To do so, make a call that looks like this:

curl --location --request PUT 'awesome-plumber-host:5000/run' \
--header 'Content-Type: application/json' \
--data-raw '{
   "extractor": ["open_ie"],
   "linker": ["FalconDBpediaJoint"],
   "resolver": ["dummy"],
   "input_text": "Some fancy text here to extract from"
}'

Wow! This is different from before, you might say. Well, it is.

You need to make a POST or a PUT HTTP call to the /run endpoint and pass the payload as shown above.

The payload has four attributes: "extractor", "linker", "resolver", and "input_text".

The mandatory fields are "extractor" and "input_text"; the others can be passed as null or omitted entirely.
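
For example, the same call in Python (a sketch with the same host/port assumptions; the input sentence matches the sample response below, and the resolver is left null to show it is optional):

import requests

# Run a pipeline on some input text. Only "extractor" and "input_text"
# are mandatory; "linker" and "resolver" may be null or omitted.
payload = {
    "extractor": ["open_ie"],
    "linker": ["FalconDBpediaJoint"],
    "resolver": None,
    "input_text": "Rembrandt painted The Storm on the Sea of Galilee.",
}

response = requests.post("http://awesome-plumber-host:5000/run", json=payload)
print(response.json())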

P.S. Running pipelines is not black magic, so expect to wait a while for the results!

P.P.S. In case you are wondering, here is the response of the last call:

[
   {
      "object":{
         "label":"the Sea of Galilee",
         "uri":null
      },
      "predicate":{
         "label":"painted The Storm on",
         "uri":null
      },
      "subject":{
         "label":"Rembrandt",
         "uri":"http://dbpedia.org/resource/Rembrandt"
      }
   }
]

You can see that each object is a triple, and each span has a "label" and a "uri". If the "uri" is null, the span is not linked, or it could be a literal.
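
To make the shape concrete, here is a small, self-contained sketch that walks such a response (the triple is the sample shown above):

# Walk a /run response and flag spans that were not linked to the KG.
triples = [
    {
        "subject": {"label": "Rembrandt",
                    "uri": "http://dbpedia.org/resource/Rembrandt"},
        "predicate": {"label": "painted The Storm on", "uri": None},
        "object": {"label": "the Sea of Galilee", "uri": None},
    }
]

for triple in triples:
    for role in ("subject", "predicate", "object"):
        span = triple[role]
        print(f"{role}: {span['label']!r} ->",
              span["uri"] or "unlinked (or a literal)")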

And, long story short, that is it. Happy plumbing!

User Interface

The code for the user interface is part of the ORKG frontend code in the GitLab repo.

(Screenshot: Plumber's integration with the user interface.)

Check the demonstration video for more details.

Citation

If you find this work helpful, please cite the main publication of Plumber:

@inproceedings{plumber,
    author="Jaradeh, Mohamad Yaser
    and Singh, Kuldeep
    and Stocker, Markus
    and Both, Andreas
    and Auer, S{\"o}ren",
    editor="Brambilla, Marco
    and Chbeir, Richard
    and Frasincar, Flavius
    and Manolescu, Ioana",
    title="Better Call the Plumber: Orchestrating Dynamic Information Extraction Pipelines",
    booktitle="Web Engineering",
    year="2021",
    publisher="Springer International Publishing",
    address="Cham",
    pages="240--254",
    isbn="978-3-030-74296-6",
    doi="10.1007/978-3-030-74296-6\_19"
}
