Plumber is a framework for Knowledge Graph (KG) completion: it extracts triples from natural language text and maps them to their KG representations. To do so, Plumber creates Information Extraction (IE) pipelines out of community-created components. It integrates 33 reusable components released by the research community for the subtasks of entity linking (EL), relation linking (RL), text triple extraction (TE) (subject, predicate, object), and coreference resolution (CR). Overall, 264 different composable KG information extraction pipelines can be generated from the possible combinations of the 33 available components. Plumber implements a transformer-based classification algorithm that intelligently chooses a suitable pipeline based on the user's unstructured input text.
We also performed an exhaustive evaluation of Plumber to investigate the efficacy of the framework in creating KG triples from unstructured text. We demonstrated that, independent of the underlying KG, Plumber can find and assemble different extraction components to produce better-suited KG triple extraction pipelines, significantly outperforming existing baselines such as Frankenstein.
A teaser of the results:
| Approach | Dataset | Classification P | Classification R | Classification F1 |
|---|---|---|---|---|
| Frankenstein | WebNLG | 0.732 | 0.751 | 0.741 |
| Plumber | WebNLG | 0.877 | 0.900 | 0.888 |
Please check the publication for more details on the evaluation and the approach in general.
Plumber will soon be deployed under the ORKG infrastructure and made available for the public to try out.
Here is a basic documentation of the requests and responses of the Plumber API created for the ORKG.
First things first, you need to run everything through docker-compose:

```bash
docker-compose up -d
```

This takes care of everything: it sets up all the needed containers and orchestrates their communication.
Since Plumber is designed with "some" modularity in mind, the name of a component's class is used as its key, so that components are easy to call. What is this key? Glad you asked; let's look at a small example. `ORKGLinker` is a possible name for a component (class) implemented within Plumber. To make it easier to call, the "Linker" suffix is removed, and you can add any number of underscores (`_`) for readability. This means you can pass Plumber the name "orkg", "ORkg", or "O_R_K_G", and all of them are viable candidates for the same component. P.S. Sometimes, due to technical details, you might end up with a class name like `NeuralSpacyCoreferenceResolver`; similarly to the above, "Resolver" is removed, but you still end up with a long key (e.g. `neural_spacy_coreference`).
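The matching described above can be sketched as a small normalization function. This is a hypothetical helper written for illustration, not Plumber's actual implementation:

```python
def normalize_key(name: str) -> str:
    # Strip the class-name suffix ("Linker"/"Resolver"), drop underscores,
    # and lowercase, so that "ORKGLinker", "ORkg", and "O_R_K_G" all map
    # to the same key. Hypothetical helper, not Plumber's actual code.
    for suffix in ("Linker", "Resolver"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return name.replace("_", "").lower()

print(normalize_key("ORKGLinker"))  # orkg
print(normalize_key("O_R_K_G"))    # orkg
```

Under this scheme, `NeuralSpacyCoreferenceResolver` and `neural_spacy_coreference` also normalize to the same key.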
So, what endpoints are available? Calling the index page of the API returns a response that looks like this:

```json
[
    "get_components GET,HEAD,OPTIONS /components",
    "get_pipelines GET,HEAD,OPTIONS /pipelines",
    "run_pipeline PUT,OPTIONS,POST /run"
]
```
From the above response you should be able to guess what each request does and what to expect. Now let's dive into the details of the calls.
```bash
curl --location --request GET 'awesome-plumber-host:5000/components'
```
The response to this simple call is a JSON array describing the available components. An example of such a response would be:
```json
[{
    "desc": "The dependency extractor makes use of stanford parser to extract meaningful entity triplets from sentences based on the dependency tree of said sentence",
    "icon": null,
    "key": "dependency",
    "kg": null,
    "name": "Dependency-based Extractor",
    "task": "TE",
    "url": "https://github.com/anutammewar/extract_triplets"
}]
```
As you can see, a number of attributes are returned, most importantly:
- `task`: indicates the component's task in the pipeline (TE: extractors, CR: resolvers, EL, RL, or EL/RL: linkers)
- `kg`: indicates the knowledge graph the component targets
- `key`: the key that Plumber understands, while `name` is the human-friendly one ;)
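As an illustration, a client could group the `/components` response by task using these attributes. The sample data below is the example entry from above; the grouping helper is our own, not part of Plumber:

```python
import json
from collections import defaultdict

# A sample entry in the shape returned by GET /components.
components_json = """[{"desc": "The dependency extractor ...", "icon": null,
  "key": "dependency", "kg": null, "name": "Dependency-based Extractor",
  "task": "TE", "url": "https://github.com/anutammewar/extract_triplets"}]"""

# Group component keys by their task (TE, CR, EL, RL, ...).
by_task = defaultdict(list)
for component in json.loads(components_json):
    by_task[component["task"]].append(component["key"])

print(dict(by_task))  # {'TE': ['dependency']}
```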
Another call is the pipelines one; from the name, you can expect what it provides. Some pipelines are saved to help novice users, or to make it easier to find the right one (assuming more and more components get added).
```bash
curl --location --request GET 'awesome-plumber-host:5000/pipelines'
```
This call returns a list of available pipelines, describing which components each pipeline is made of along with its name. The following is an example of one pipeline:
```json
[{
    "extractors": ["MinIE"],
    "linkers": ["FalconWikidataJoint"],
    "name": "Wikidata Test Pipeline",
    "resolvers": ["hmtl"]
}]
```
Notice that the pipeline structure only uses the "key" value noted in the component description. Linkers and resolvers can also be null, because they are not required fields.
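For example, a client could look up a saved pipeline by name in the `/pipelines` response. The data below is the example pipeline from above; `find_pipeline` is a hypothetical client-side helper:

```python
import json

# Example response of GET /pipelines (shape taken from the example above).
pipelines = json.loads("""[{"extractors": ["MinIE"],
  "linkers": ["FalconWikidataJoint"],
  "name": "Wikidata Test Pipeline",
  "resolvers": ["hmtl"]}]""")

def find_pipeline(pipelines, name):
    # Hypothetical client-side lookup of a saved pipeline by its name.
    return next((p for p in pipelines if p["name"] == name), None)

pipeline = find_pipeline(pipelines, "Wikidata Test Pipeline")
print(pipeline["extractors"])  # ['MinIE']
```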
Now that we know what we need to know, we can start running pipelines. To run a pipeline, you need to let Plumber know your pipeline's configuration by making a call that looks like this:
```bash
curl --location --request PUT 'awesome-plumber-host:5000/run' \
--header 'Content-Type: application/json' \
--data-raw '{
    "extractor": ["open_ie"],
    "linker": ["FalconDBpediaJoint"],
    "resolver": ["dummy"],
    "input_text": "Some fancy text here to extract from"
}'
```
As you might notice, this call is different from the previous ones. You need to make a POST or a PUT HTTP call to the `/run` endpoint and pass the payload shown above. The payload has four attributes ("extractor", "linker", "resolver", and "input_text"). The mandatory fields are "extractor" and "input_text"; the others can be passed as null or omitted entirely.
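Building that payload can be sketched in Python as follows. The helper enforces the two mandatory fields; it is a hypothetical client-side function (the resulting JSON string would then be sent to `/run` with any HTTP client):

```python
import json

def build_run_payload(extractor, input_text, linker=None, resolver=None):
    # "extractor" and "input_text" are mandatory; "linker" and "resolver"
    # may be null or omitted, per the API description above.
    # Hypothetical helper for illustration, not part of Plumber itself.
    if not extractor or not input_text:
        raise ValueError('"extractor" and "input_text" are mandatory')
    return json.dumps({
        "extractor": extractor,
        "linker": linker,
        "resolver": resolver,
        "input_text": input_text,
    })

payload = build_run_payload(
    ["open_ie"],
    "Some fancy text here to extract from",
    linker=["FalconDBpediaJoint"],
    resolver=["dummy"],
)
```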
P.S. Running pipelines is not black magic, so expect to wait a while for the results!
P.P.S. In case you are wondering, the response of the last call looks like this:
```json
[{
    "object": {
        "label": "the Sea of Galilee",
        "uri": null
    },
    "predicate": {
        "label": "painted The Storm on",
        "uri": null
    },
    "subject": {
        "label": "Rembrandt",
        "uri": "http://dbpedia.org/resource/Rembrandt"
    }
}]
```
You can see that each object is a triple, and each span has a "label" and a "uri". If the "uri" is null, the span is either not linked or a literal.
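A client could use this to separate linked spans from unlinked ones. The data below is the example response from above; the splitting logic is our own sketch, not part of Plumber:

```python
import json

# Example /run response (the triple shown above).
response = """[{"object": {"label": "the Sea of Galilee", "uri": null},
  "predicate": {"label": "painted The Storm on", "uri": null},
  "subject": {"label": "Rembrandt",
              "uri": "http://dbpedia.org/resource/Rembrandt"}}]"""

# Split each triple's spans into linked (has a URI) and unlinked ones.
for triple in json.loads(response):
    linked = {role: span["label"] for role, span in triple.items()
              if span["uri"] is not None}
    unlinked = {role: span["label"] for role, span in triple.items()
                if span["uri"] is None}

print(linked)    # {'subject': 'Rembrandt'}
print(unlinked)  # {'object': 'the Sea of Galilee', 'predicate': 'painted The Storm on'}
```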
Long story short, that's it. Happy plumbing!
The code for the user interface is part of the ORKG frontend code GitLab repo.
Plumber's integration with a user interface looks like this:
Check the demonstration video for more details.
If you find this work helpful, please cite the main publication of Plumber:
```bibtex
@inproceedings{plumber,
    author="Jaradeh, Mohamad Yaser
    and Singh, Kuldeep
    and Stocker, Markus
    and Both, Andreas
    and Auer, S{\"o}ren",
    editor="Brambilla, Marco
    and Chbeir, Richard
    and Frasincar, Flavius
    and Manolescu, Ioana",
    title="Better Call the Plumber: Orchestrating Dynamic Information Extraction Pipelines",
    booktitle="Web Engineering",
    year="2021",
    publisher="Springer International Publishing",
    address="Cham",
    pages="240--254",
    isbn="978-3-030-74296-6",
    doi="10.1007/978-3-030-74296-6\_19"
}
```