D2KB

Ricardo Usbeck edited this page Jun 17, 2016 · 4 revisions

Here, we briefly explain the "Disambiguate to Knowledge Base" (D2KB, a.k.a. Entity Linking) task, focusing on the input an annotator receives and the output that is expected from it.

Input

A document in which the entities have already been marked. The example NIF document below shows the sentence "Today, Barack Obama visited Berlin and met John Doe." together with the markings of its three named entities: Barack Obama, Berlin and John Doe.

<http://example.org/document-1#char=0,52>
        a                     nif:RFC5147String , nif:String , nif:Context ;
        nif:beginIndex        "0"^^xsd:nonNegativeInteger ;
        nif:endIndex          "52"^^xsd:nonNegativeInteger ;
        nif:isString          "Today, Barack Obama visited Berlin and met John Doe."@en .
		
<http://example.org/document-1#char=7,19>
        a                     nif:RFC5147String , nif:String ;
        nif:anchorOf          "Barack Obama"@en ;
        nif:beginIndex        "7"^^xsd:nonNegativeInteger ;
        nif:endIndex          "19"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://example.org/document-1#char=0,52> .
		
<http://example.org/document-1#char=28,34>
        a                     nif:RFC5147String , nif:String ;
        nif:anchorOf          "Berlin"@en ;
        nif:beginIndex        "28"^^xsd:nonNegativeInteger ;
        nif:endIndex          "34"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://example.org/document-1#char=0,52> .
		
<http://example.org/document-1#char=43,51>
        a                     nif:RFC5147String , nif:String ;
        nif:anchorOf          "John Doe"@en ;
        nif:beginIndex        "43"^^xsd:nonNegativeInteger ;
        nif:endIndex          "51"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://example.org/document-1#char=0,52> .
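The offsets in this input are plain character positions in the context string, with the end index exclusive, so "Barack Obama" spans 7 to 19. The following is a minimal sketch, not GERBIL code, of how such an input document could be generated. The helper names (`char_uri`, `find_spans`, `context_block`, `mention_block`) are hypothetical, prefix declarations are omitted as in the example, and the sketch assumes the text contains no `"` or `\` characters, because it performs no Turtle string escaping.

```python
def char_uri(base: str, begin: int, end: int) -> str:
    """Fragment URI in the style of the example, e.g. <...#char=7,19>."""
    return f"<{base}#char={begin},{end}>"


def find_spans(text: str, anchors: list[str]) -> list[tuple[int, int]]:
    """(begin, end) of the first occurrence of each anchor; end is exclusive."""
    result = []
    for anchor in anchors:
        begin = text.index(anchor)  # raises ValueError if the anchor is missing
        result.append((begin, begin + len(anchor)))
    return result


def context_block(base: str, text: str) -> str:
    """The nif:Context resource holding the whole document text."""
    return "\n".join([
        char_uri(base, 0, len(text)),
        "        a                     nif:RFC5147String , nif:String , nif:Context ;",
        '        nif:beginIndex        "0"^^xsd:nonNegativeInteger ;',
        f'        nif:endIndex          "{len(text)}"^^xsd:nonNegativeInteger ;',
        f'        nif:isString          "{text}"@en .',
    ])


def mention_block(base: str, text: str, begin: int, end: int) -> str:
    """One marked entity; its anchor is exactly text[begin:end]."""
    return "\n".join([
        char_uri(base, begin, end),
        "        a                     nif:RFC5147String , nif:String ;",
        f'        nif:anchorOf          "{text[begin:end]}"@en ;',
        f'        nif:beginIndex        "{begin}"^^xsd:nonNegativeInteger ;',
        f'        nif:endIndex          "{end}"^^xsd:nonNegativeInteger ;',
        f"        nif:referenceContext  {char_uri(base, 0, len(text))} .",
    ])


text = "Today, Barack Obama visited Berlin and met John Doe."
base = "http://example.org/document-1"
spans = find_spans(text, ["Barack Obama", "Berlin", "John Doe"])
blocks = [context_block(base, text)] + [mention_block(base, text, b, e) for b, e in spans]
print("\n\n".join(blocks))
```

Because `find_spans` only takes the first occurrence of each anchor, a text that mentions the same entity twice would need explicit offsets instead.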

Output

A document whose marked entities each have a URI assigned to them. Below, the NIF document from above is shown again with the URIs that have been added to the three named entities.

<http://example.org/document-1#char=0,52>
        a                     nif:RFC5147String , nif:String , nif:Context ;
        nif:beginIndex        "0"^^xsd:nonNegativeInteger ;
        nif:endIndex          "52"^^xsd:nonNegativeInteger ;
        nif:isString          "Today, Barack Obama visited Berlin and met John Doe."@en .
		
<http://example.org/document-1#char=7,19>
        a                     nif:RFC5147String , nif:String ;
        nif:anchorOf          "Barack Obama"@en ;
        nif:beginIndex        "7"^^xsd:nonNegativeInteger ;
        nif:endIndex          "19"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://example.org/document-1#char=0,52> ;
        itsrdf:taIdentRef     <http://dbpedia.org/resource/Barack_Obama> .
		
<http://example.org/document-1#char=28,34>
        a                     nif:RFC5147String , nif:String ;
        nif:anchorOf          "Berlin"@en ;
        nif:beginIndex        "28"^^xsd:nonNegativeInteger ;
        nif:endIndex          "34"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://example.org/document-1#char=0,52> ;
        itsrdf:taIdentRef     <http://dbpedia.org/resource/Berlin> .
		
<http://example.org/document-1#char=43,51>
        a                     nif:RFC5147String , nif:String ;
        nif:anchorOf          "John Doe"@en ;
        nif:beginIndex        "43"^^xsd:nonNegativeInteger ;
        nif:endIndex          "51"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://example.org/document-1#char=0,52> ;
        itsrdf:taIdentRef     <http://my-annotator.org/unknown/John_Doe> .
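The only difference between input and output is the additional `itsrdf:taIdentRef` triple on each marked entity, which means the preceding statement has to end with `;` instead of `.`. A small sketch of that serialization step follows. The `Mention` class and `to_turtle` function are hypothetical illustrations, not part of any library, and prefix declarations are again omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    begin: int                       # inclusive character offset
    end: int                         # exclusive character offset
    anchor: str                      # the marked surface form
    ident_ref: Optional[str] = None  # URI chosen by the annotator (None = not linked yet)


def to_turtle(m: Mention, base: str, context_len: int) -> str:
    """Serialize a mention; itsrdf:taIdentRef is written only once a URI is set."""
    lines = [
        f"<{base}#char={m.begin},{m.end}>",
        "        a                     nif:RFC5147String , nif:String ;",
        f'        nif:anchorOf          "{m.anchor}"@en ;',
        f'        nif:beginIndex        "{m.begin}"^^xsd:nonNegativeInteger ;',
        f'        nif:endIndex          "{m.end}"^^xsd:nonNegativeInteger ;',
        f"        nif:referenceContext  <{base}#char=0,{context_len}>",
    ]
    if m.ident_ref is not None:
        lines[-1] += " ;"  # another predicate follows
        lines.append(f"        itsrdf:taIdentRef     <{m.ident_ref}>")
    lines[-1] += " ."  # terminate the Turtle statement
    return "\n".join(lines)


m = Mention(28, 34, "Berlin")
m.ident_ref = "http://dbpedia.org/resource/Berlin"  # what a D2KB annotator adds
print(to_turtle(m, "http://example.org/document-1", 52))
```

With `ident_ref` left as `None`, the same function reproduces the corresponding input statement, so one serializer covers both directions.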

Task

Link the entities marked in the text to a knowledge base (KB). Entities that are not present in the KB should get a URI that does not use the URI prefix of the KB (or of any other well-known KB). In the example, the unknown entity John Doe has been assigned the URI http://my-annotator.org/unknown/John_Doe. Note that an entity sent back without any URI might be counted as an error.
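The rule for unknown entities can be sketched as a lookup with a fallback. This is only an illustration under loud assumptions: `KB_INDEX` is a hypothetical exact-string dictionary standing in for a real disambiguation step (real annotators use the context, not just the anchor), and `UNKNOWN_PREFIX` is the annotator-owned namespace from the example above.

```python
from urllib.parse import quote

# Hypothetical in-memory stand-in for a real KB lookup (assumption).
KB_INDEX = {
    "Barack Obama": "http://dbpedia.org/resource/Barack_Obama",
    "Berlin": "http://dbpedia.org/resource/Berlin",
}

# Annotator-owned namespace for entities the KB does not contain.
# It must differ from the KB prefix (here http://dbpedia.org/resource/).
UNKNOWN_PREFIX = "http://my-annotator.org/unknown/"


def link(anchor: str) -> str:
    """Return the KB URI for an anchor, or mint one outside the KB namespace."""
    uri = KB_INDEX.get(anchor)
    if uri is not None:
        return uri
    # Never answer with "no URI": an entity sent back without one might count as an error.
    return UNKNOWN_PREFIX + quote(anchor.replace(" ", "_"))


for anchor in ["Barack Obama", "Berlin", "John Doe"]:
    print(f"{anchor} -> {link(anchor)}")
```

The key design point is the fallback branch: minting a URI in a namespace the annotator controls signals "not in the KB" without accidentally colliding with a real KB resource.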

Evaluation

The linked entities returned by the annotator are matched against the gold standard using our URI matching implementation.

Handling of higher order annotators

Annotators of "higher order" are annotators that can handle tasks more complex than D2KB, e.g., A2KB. Such annotators can still take part in a D2KB experiment even if they do not explicitly use the already marked entities. Their response is filtered using a strong annotation match filter: every entity whose position does not exactly match that of one of the marked entities in the gold standard is removed from the annotator's response before the response is evaluated.
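A minimal sketch of such a filter, as described in the text: it compares only the (begin, end) positions and leaves the URIs to the later evaluation step. The `Annotation` type and the function name are hypothetical, and GERBIL's own implementation may differ in detail. The sample data reuses the offsets of the D2KB input example, with a made-up A2KB response.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    begin: int  # inclusive character offset
    end: int    # exclusive character offset
    uri: str


def strong_annotation_match_filter(
    system: List[Annotation], gold: List[Annotation]
) -> List[Annotation]:
    """Keep only system annotations whose span exactly equals a gold span."""
    gold_spans = {(g.begin, g.end) for g in gold}
    return [a for a in system if (a.begin, a.end) in gold_spans]


gold = [
    Annotation(7, 19, "http://dbpedia.org/resource/Barack_Obama"),
    Annotation(28, 34, "http://dbpedia.org/resource/Berlin"),
]
# Hypothetical A2KB response: "Obama" (14-19) only overlaps a gold span,
# and "Today" (0-5) was never marked, so both are dropped.
system = [
    Annotation(14, 19, "http://dbpedia.org/resource/Barack_Obama"),
    Annotation(28, 34, "http://dbpedia.org/resource/Berlin"),
    Annotation(0, 5, "http://example.org/hypothetical/Today"),
]
print(strong_annotation_match_filter(system, gold))
```

Note that the partially overlapping "Obama" annotation is removed even though its URI is correct: the filter is deliberately strict about positions.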

An example makes this clearer.

"President Barack Obama met Angela Merkel in Berlin, yesterday."

Let's assume that the spans President Barack Obama, Angela Merkel and Berlin have been marked as entities in the gold standard. Our example A2KB system returns Barack Obama, Angela Merkel, Berlin and yesterday, and the URIs of the first three annotations exactly match the URIs given in the gold standard. Before the system's annotations are compared with those of the gold standard, the two annotations Barack Obama and yesterday are removed, since their positions do not exactly match the position of any gold standard annotation. The remaining two annotations are counted as true positives, while the missing annotation President Barack Obama is counted as a false negative, leading to a precision of 1.0 and a recall of 2/3.
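Spelled out with the standard definitions of precision P and recall R (plus their harmonic mean F1, which the text above does not mention and is added here only for completeness), the counts of this example give the following. The two filtered-out annotations count neither as true nor as false positives.

```latex
\begin{aligned}
\mathrm{TP} &= 2 \;(\text{Angela Merkel, Berlin}), \qquad \mathrm{FP} = 0, \qquad \mathrm{FN} = 1 \;(\text{President Barack Obama})\\
P &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} = \frac{2}{2 + 0} = 1.0\\
R &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} = \frac{2}{2 + 1} = \frac{2}{3} \approx 0.67\\
F_1 &= \frac{2PR}{P + R} = \frac{2 \cdot 1 \cdot \frac{2}{3}}{1 + \frac{2}{3}} = \frac{4/3}{5/3} = 0.8
\end{aligned}
```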