The Natural Language Interchange Format (NIF) is an RDF vocabulary for describing natural language resources so that they can be interchanged between different systems. GERBIL uses it to send documents and annotations to, and receive them from, the benchmarked annotators. For more general information about NIF, visit its website.
This section presents some core concepts of NIF that are important for understanding how GERBIL sends and receives documents.
In NIF, additional information about a part of a text, e.g., a named entity mentioned in the text, is added via an RDF node that points to the text's RDF node with the nif:referenceContext property. Note that the text's RDF node must have the type nif:Context.
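The relation between an annotation node and its text node can be pictured with a minimal in-memory model. The sketch below is an illustration under stated assumptions: the Triple record and the ReferenceContextCheck class are hypothetical, resources are plain prefixed-name strings rather than parsed RDF, and it assumes an annotation should point to exactly one context. It checks that the nif:referenceContext target is typed nif:Context.

```java
import java.util.List;

// Hypothetical minimal triple model (prefixed names as plain strings);
// not an API of GERBIL or of any RDF library.
record Triple(String subject, String predicate, String object) {}

final class ReferenceContextCheck {
    static final String RDF_TYPE = "rdf:type";
    static final String NIF_CONTEXT = "nif:Context";
    static final String NIF_REFERENCE_CONTEXT = "nif:referenceContext";

    // True if the given node is typed nif:Context.
    static boolean isContext(List<Triple> triples, String node) {
        return triples.stream().anyMatch(t -> t.subject().equals(node)
                && t.predicate().equals(RDF_TYPE)
                && t.object().equals(NIF_CONTEXT));
    }

    // True if the annotation points to exactly one node via nif:referenceContext
    // and that node is typed nif:Context.
    static boolean hasValidContext(List<Triple> triples, String annotation) {
        List<String> targets = triples.stream()
                .filter(t -> t.subject().equals(annotation)
                        && t.predicate().equals(NIF_REFERENCE_CONTEXT))
                .map(Triple::object)
                .toList();
        return targets.size() == 1 && isContext(triples, targets.get(0));
    }
}
```

A real client would obtain the triples from an RDF parser; the string model only keeps the example self-contained.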
NIF resources typically encode their character offsets at the end of their URI. Assume there is a text ex:Text with 100 characters. The URI of the RDF node representing this text in NIF would be ex:Text#char=0,100. An annotation inside the text starting at character 42 with a length of 10 would have the URI ex:Text#char=42,52. However, although the URIs typically already contain the positions, NIF defines the two properties nif:beginIndex and nif:endIndex, which attach the begin and end positions to the RDF nodes explicitly.
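A small helper can build and read such URIs. This is a sketch assuming the plain RFC 5147-style #char=begin,end fragment described above; the class NifCharUri and its methods are hypothetical names. Because the fragment is only a naming convention, it seems safer to always write nif:beginIndex and nif:endIndex as well rather than relying on the URI alone.

```java
// Hypothetical helper for "#char=begin,end" URIs; names are illustrative only.
final class NifCharUri {
    private static final String FRAGMENT = "#char=";

    // Appends "#char=begin,end" to a base URI (end is exclusive).
    static String build(String baseUri, int begin, int end) {
        if (begin < 0 || end < begin) {
            throw new IllegalArgumentException("invalid span: " + begin + "," + end);
        }
        return baseUri + FRAGMENT + begin + "," + end;
    }

    // Extracts {begin, end} from a URI ending in "#char=begin,end".
    static int[] parse(String uri) {
        int idx = uri.lastIndexOf(FRAGMENT);
        if (idx < 0) {
            throw new IllegalArgumentException("no #char= fragment: " + uri);
        }
        String[] parts = uri.substring(idx + FRAGMENT.length()).split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("malformed fragment: " + uri);
        }
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }
}
```

For example, NifCharUri.build("ex:Text", 42, 52) yields ex:Text#char=42,52, and parsing that URI gives back the offsets 42 and 52.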
In NIF, positions are determined by counting Unicode code points. For simple texts, counting characters and counting code points gives the same result, but the two can differ: Java's String.length(), for example, counts UTF-16 code units, so a character outside the Basic Multilingual Plane (such as many emoji) counts as two. In Java, the length of a String in code points can be determined as follows:
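```java
String text = ...; // the text whose length is needed
// Counts Unicode code points; text.length() would count UTF-16 code units instead.
int length = text.codePointCount(0, text.length());
```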
Note that, as in Java, the end position of a String in NIF is the first position after the String.
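Java's String.indexOf returns UTF-16 indices, so offsets found that way have to be converted before they are written as nif:beginIndex and nif:endIndex. A minimal sketch (the class CodePointOffsets and its method are hypothetical) that locates a surface form and returns its code-point span with an exclusive end:

```java
// Hypothetical helper; converts Java's UTF-16 indices into NIF code-point offsets.
final class CodePointOffsets {
    // Returns {begin, end} of the first occurrence of surface in text,
    // counted in Unicode code points, or null if surface does not occur.
    static int[] findSpan(String text, String surface) {
        int utf16Begin = text.indexOf(surface); // UTF-16 index, as Java reports it
        if (utf16Begin < 0) {
            return null;
        }
        int begin = text.codePointCount(0, utf16Begin);
        // End is exclusive: the first code point after the surface form.
        int end = begin + surface.codePointCount(0, surface.length());
        return new int[] {begin, end};
    }

    public static void main(String[] args) {
        // U+1F600 (an emoji outside the BMP, two UTF-16 units) followed by " Sydney".
        String text = "\uD83D\uDE00 Sydney";
        System.out.println(text.length());                         // 9
        System.out.println(text.codePointCount(0, text.length())); // 8
        int[] span = findSpan(text, "Sydney");
        System.out.println(span[0] + "," + span[1]);               // 2,8
    }
}
```

Using text.indexOf directly would report "Sydney" at position 3, one position too far, because the emoji occupies two UTF-16 code units but only one code point.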
The following table lists helpful properties for describing an annotation.
Property | Meaning | Comment |
---|---|---|
nif:anchorOf | Contains the String that the annotation references inside the referenced text. | optional |
nif:beginIndex | Defines the start position of the String. | mandatory |
nif:endIndex | Defines the first position after the String. | mandatory |
nif:referenceContext | References the text to which this annotation belongs. | mandatory |
itsrdf:taClassRef | References URIs defining the type of the String. | should be present in the results of entity typing tasks |
itsrdf:taConfidence | Defines a confidence value for this annotation. | optional |
itsrdf:taIdentRef | References URIs defining the meaning of the String. | should be present in the results of linking tasks |
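The constraints in this table can be checked mechanically before an annotation is sent or compared. The following sketch is an assumption, not GERBIL's own validation: Annotation, TaskKind, and AnnotationCheck are hypothetical names, the confidence value is carried but not checked (the table gives no range for it), and the "should be present" rows are reported only for the matching task type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical task kinds deciding which "should be present" rows apply.
enum TaskKind { LINKING, TYPING, OTHER }

// Hypothetical annotation model; each field mirrors one property of the table.
// Absent optional values are null (or empty lists).
record Annotation(Integer beginIndex, Integer endIndex, String referenceContext,
                  String anchorOf, List<String> taIdentRef, List<String> taClassRef,
                  Double taConfidence) {}

final class AnnotationCheck {
    // Returns a list of problems; an empty list means all checks passed.
    // contextText is the nif:isString value of the referenced context.
    static List<String> check(Annotation a, String contextText, TaskKind task) {
        List<String> problems = new ArrayList<>();
        // Mandatory properties.
        if (a.beginIndex() == null) problems.add("missing nif:beginIndex");
        if (a.endIndex() == null) problems.add("missing nif:endIndex");
        if (a.referenceContext() == null) problems.add("missing nif:referenceContext");
        if (!problems.isEmpty()) return problems;

        // Offsets are code points; the end index is exclusive.
        int length = contextText.codePointCount(0, contextText.length());
        int begin = a.beginIndex();
        int end = a.endIndex();
        if (begin < 0 || end < begin || end > length) {
            problems.add("span " + begin + "," + end + " lies outside a text of length " + length);
            return problems;
        }

        // Optional nif:anchorOf must match the covered part of the text.
        if (a.anchorOf() != null) {
            String covered = contextText.substring(
                    contextText.offsetByCodePoints(0, begin),
                    contextText.offsetByCodePoints(0, end));
            if (!covered.equals(a.anchorOf())) {
                problems.add("nif:anchorOf \"" + a.anchorOf() + "\" does not match \"" + covered + "\"");
            }
        }

        // Task-dependent recommendations.
        if (task == TaskKind.LINKING && (a.taIdentRef() == null || a.taIdentRef().isEmpty())) {
            problems.add("linking result without itsrdf:taIdentRef");
        }
        if (task == TaskKind.TYPING && (a.taClassRef() == null || a.taClassRef().isEmpty())) {
            problems.add("typing result without itsrdf:taClassRef");
        }
        return problems;
    }
}
```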
The NIF version supported by GERBIL does not define a Document type. Instead, GERBIL treats an RDF node as a document if the node has the type nif:Context and the property nif:isString.
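This document rule can be expressed as a simple set intersection. The sketch assumes a hypothetical RdfStatement record holding prefixed names as plain strings; a real implementation would work on the output of an RDF parser instead. It returns every subject that is typed nif:Context and also carries nif:isString.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical statement model; not GERBIL's or any RDF library's API.
record RdfStatement(String subject, String predicate, String object) {}

final class DocumentDetector {
    // Subjects typed nif:Context that also have a nif:isString value.
    static Set<String> findDocuments(List<RdfStatement> statements) {
        Set<String> contexts = statements.stream()
                .filter(s -> s.predicate().equals("rdf:type") && s.object().equals("nif:Context"))
                .map(RdfStatement::subject)
                .collect(Collectors.toCollection(HashSet::new));
        Set<String> withString = statements.stream()
                .filter(s -> s.predicate().equals("nif:isString"))
                .map(RdfStatement::subject)
                .collect(Collectors.toSet());
        contexts.retainAll(withString); // keep only nodes satisfying both conditions
        return contexts;
    }
}
```

A nif:Context node without nif:isString is therefore not picked up as a document under this rule.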
When communicating with NIF-based web services, GERBIL sends and expects to receive single NIF documents. In the Turtle serialization, such a document can look like this:
```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .

<http://www.ontologydesignpatterns.org/data/oke-challenge/task-1/sentence-1#char=0,146>
    a nif:RFC5147String , nif:String , nif:Context ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "146"^^xsd:nonNegativeInteger ;
    nif:isString "Florence May Harding studied at a school in Sydney, and with Douglas Robert Dundas , but in effect had no formal training in either botany or art."@en .

<http://www.ontologydesignpatterns.org/data/oke-challenge/task-1/sentence-1#char=44,50>
    a nif:RFC5147String , nif:String ;
    nif:anchorOf "Sydney"@en ;
    nif:beginIndex "44"^^xsd:nonNegativeInteger ;
    nif:endIndex "50"^^xsd:nonNegativeInteger ;
    nif:referenceContext <http://www.ontologydesignpatterns.org/data/oke-challenge/task-1/sentence-1#char=0,146> ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Sydney> .
```
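A document of this shape can be produced with plain string building, which may help when writing a small test annotator. This is a sketch under assumptions, not GERBIL code: Mention and NifTurtleWriter are hypothetical, only the properties used in the example are emitted, and mention offsets are assumed to be valid code-point offsets into the text. For production use, an RDF library that handles serialization and escaping is preferable.

```java
import java.util.List;

// Hypothetical input type: a code-point span and the entity it links to.
record Mention(int begin, int end, String entityUri) {}

final class NifTurtleWriter {
    static final String PREFIXES = """
            @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
            @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
            @prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
            @prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
            """;

    // Escapes characters that are not allowed verbatim in a Turtle "..." literal.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r");
    }

    // Writes one nif:Context for the text plus one node per mention.
    static String write(String docUri, String text, String lang, List<Mention> mentions) {
        int length = text.codePointCount(0, text.length()); // NIF counts code points
        String context = docUri + "#char=0," + length;
        StringBuilder sb = new StringBuilder(PREFIXES);
        sb.append('<').append(context).append(">\n")
          .append("    a nif:RFC5147String , nif:String , nif:Context ;\n")
          .append("    nif:beginIndex \"0\"^^xsd:nonNegativeInteger ;\n")
          .append("    nif:endIndex \"").append(length).append("\"^^xsd:nonNegativeInteger ;\n")
          .append("    nif:isString \"").append(escape(text)).append("\"@").append(lang).append(" .\n");
        for (Mention m : mentions) {
            // Convert code-point offsets to UTF-16 indices to cut out the anchor.
            String anchor = text.substring(text.offsetByCodePoints(0, m.begin()),
                                           text.offsetByCodePoints(0, m.end()));
            sb.append('<').append(docUri).append("#char=").append(m.begin()).append(',').append(m.end()).append(">\n")
              .append("    a nif:RFC5147String , nif:String ;\n")
              .append("    nif:anchorOf \"").append(escape(anchor)).append("\"@").append(lang).append(" ;\n")
              .append("    nif:beginIndex \"").append(m.begin()).append("\"^^xsd:nonNegativeInteger ;\n")
              .append("    nif:endIndex \"").append(m.end()).append("\"^^xsd:nonNegativeInteger ;\n")
              .append("    nif:referenceContext <").append(context).append("> ;\n")
              .append("    itsrdf:taIdentRef <").append(m.entityUri()).append("> .\n");
        }
        return sb.toString();
    }
}
```

Called with the sentence and a Mention(44, 50, "http://dbpedia.org/resource/Sydney"), this produces nodes with the same structure as the example above.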