Web service

Jaime Orellana edited this page Jan 20, 2018 · 8 revisions
This page introduces the DBpedia Spotlight Web Service. The available service endpoints are listed below and described in more detail in the User's Manual.

Spotting

Spotting: takes text as input and recognizes the entities/concepts to annotate. Several spotting techniques are available, such as dictionary lookup and Named Entity Recognition (NER).

Supported types (POST/GET): XML, JSON, NIF

Disambiguate

Disambiguation: takes spotted text as input, in which entities/concepts have already been recognized and marked up as wiki markup or XML. Chooses an identifier for each recognized entity/concept given the context.

Supported types (POST/GET): XML, JSON, HTML, RDFa, NIF

Annotate

Annotation: runs spotting and then disambiguation. Takes text as input, recognizes the entities/concepts to annotate, and chooses an identifier for each recognized entity/concept given the context.

Supported types (POST/GET): XML, JSON, HTML, RDFa, NIF

Candidates

Similar to annotate, but returns a ranked list of candidates instead of deciding on one. Each candidate in the list carries the following properties:

  • support: how prominent this entity is, i.e. the number of inlinks in Wikipedia;
  • priorScore: normalized support;
  • contextualScore: score from comparing the context representation of an entity with the text (e.g. cosine similarity with tf-icf weights);
  • percentageOfSecondRank: how decisively the winning entity won, computed as contextualScore_2ndRank / contextualScore_1stRank; the lower this value, the further the first-ranked entity was "in the lead";
  • finalScore: a combination of all the scores above.

Supported types (POST/GET): XML, JSON
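
To make the percentageOfSecondRank ratio concrete, here is a minimal sketch of the formula given above. It is not the service's implementation: the function name is hypothetical, the scores are made up for illustration, and returning 0.0 when there is no runner-up is an assumption.

```python
def percentage_of_second_rank(contextual_scores):
    """Ratio of the runner-up's contextual score to the winner's score."""
    ranked = sorted(contextual_scores, reverse=True)
    if len(ranked) < 2 or ranked[0] == 0:
        # Assumption: with no runner-up (or a zero top score) report 0.0.
        return 0.0
    return ranked[1] / ranked[0]


# Made-up contextual scores, for illustration only.
clear_win = percentage_of_second_rank([0.8, 0.2, 0.1])
close_call = percentage_of_second_rank([0.5, 0.45])
print(clear_win, close_call)
```

The first call yields a much lower ratio than the second, reflecting that the top candidate in the first list was further "in the lead".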

Feedback

Supported types (POST/GET): XML

Examples

Example 1: Simple request

  • text = "President Obama called Wednesday on Congress to extend a tax break for students included in last year's economic stimulus package, arguing that the policy provides more generous assistance."
  • confidence = 0.2; support = 20
  • whitelist all types.

curl http://model.dbpedia-spotlight.org/en/annotate \
  --data-urlencode "text=President Obama called Wednesday on Congress to extend a tax break
  for students included in last year's economic stimulus package, arguing
  that the policy provides more generous assistance." \
  --data "confidence=0.2" \
  --data "support=20"
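
For readers calling the service from code rather than the shell, the curl call above can be approximated in Python with the standard library. This is a sketch under assumptions: the function name and constant are hypothetical, the Accept header is an optional addition taken from the Content Negotiation section, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ANNOTATE_URL = "http://model.dbpedia-spotlight.org/en/annotate"

TEXT = (
    "President Obama called Wednesday on Congress to extend a tax break "
    "for students included in last year's economic stimulus package, arguing "
    "that the policy provides more generous assistance."
)


def build_annotate_request(text, confidence=0.2, support=20):
    """Build (but do not send) a form-encoded POST like the curl call."""
    # urlencode percent-encodes the values, as --data-urlencode does for text.
    body = urlencode({"text": text, "confidence": confidence, "support": support})
    return Request(
        ANNOTATE_URL,
        data=body.encode("utf-8"),
        headers={"Accept": "application/json"},  # optional: ask for JSON output
        method="POST",
    )


request = build_annotate_request(TEXT)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; that step is left out so the sketch runs without network access.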

Example 2: Using SPARQL for filtering

This example demonstrates how to restrict the annotations to politicians related to Chicago.

curl http://model.dbpedia-spotlight.org/en/annotate \
  --data-urlencode "text=President Obama called Wednesday on Congress to extend a tax break
  for students included in last year's economic stimulus package, arguing
  that the policy provides more generous assistance." \
  --data "confidence=0.2" \
  --data "support=20" \
  --data-urlencode "sparql=SELECT DISTINCT ?x WHERE { ?x a <http://dbpedia.org/ontology/OfficeHolder> . ?x ?related <http://dbpedia.org/resource/Chicago> . }"

Notice: Due to limited system resources, this demo uses only the first 2000 results returned for each query (the default limit of the public DBpedia SPARQL endpoint). For real-world use cases, you are welcome to download the software and data and install them on your own server.

Attention: Make sure to URL-encode your SPARQL query before adding it as the value of the sparql parameter (see java.net.URLEncoder.encode()).
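
As an illustration of the encoding step, the following sketch uses Python's urllib.parse.quote_plus, which produces form-style encoding (spaces become +) comparable to java.net.URLEncoder.encode(). The variable names are hypothetical; the query is the one from Example 2.

```python
from urllib.parse import quote_plus, unquote_plus

SPARQL = (
    "SELECT DISTINCT ?x WHERE { "
    "?x a <http://dbpedia.org/ontology/OfficeHolder> . "
    "?x ?related <http://dbpedia.org/resource/Chicago> . }"
)

# Percent-encode reserved characters such as ?, {, <, : and /.
encoded = quote_plus(SPARQL)

# The encoded value can now be appended to a query string safely.
query_fragment = "sparql=" + encoded
print(query_fragment)

# Decoding recovers the original query, which confirms nothing was lost.
decoded = unquote_plus(encoded)
```

When the query is passed through curl's --data-urlencode, or through urllib.parse.urlencode, this encoding is applied automatically; encoding by hand is only needed when the URL is assembled as a plain string.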

Content Negotiation

You can request different types of output by setting the Accept request header. For example, to request JSON output, add Accept: application/json to the request headers.

One example using cURL:

     curl "http://model.dbpedia-spotlight.org/en/annotate?text=President%20Michelle%20Obama%20called%20Thursday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20" -H "Accept:application/json"

The content types we currently support are:

  • text/html
  • application/xhtml+xml
  • text/xml
  • application/json

The application/xhtml+xml output comes with embedded RDFa, which you can pass to the RDFa Distiller to get RDF triples in Turtle, RDF/XML, etc. as output.
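
The header-based selection described in this section can be sketched in Python as follows. The short format names, the function name, and the sample sentence are hypothetical conveniences; only the four content types come from the list above. The request is built but not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Content types listed on this page, keyed by hypothetical short names.
CONTENT_TYPES = {
    "html": "text/html",
    "xhtml": "application/xhtml+xml",
    "xml": "text/xml",
    "json": "application/json",
}


def build_get_request(text, output="json", confidence=0.2, support=20):
    """Build a GET request whose Accept header selects the output format."""
    # quote_via=quote encodes spaces as %20, matching the cURL example.
    query = urlencode(
        {"text": text, "confidence": confidence, "support": support},
        quote_via=quote,
    )
    url = "http://model.dbpedia-spotlight.org/en/annotate?" + query
    return Request(url, headers={"Accept": CONTENT_TYPES[output]})


request = build_get_request("President Obama called Wednesday on Congress.", output="xhtml")
print(request.get_header("Accept"))
```

Keeping the mapping in one place means an unsupported format name fails early with a KeyError instead of sending a header the service may not understand.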

If your input text is long, you may prefer using POST instead of GET.

    curl -i -X POST \
       -H "Accept:application/json" \
       -H "content-type:application/x-www-form-urlencoded" \
       -d "disambiguator=Document&confidence=-1&support=-1&text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package" \
       http://spotlight.dbpedia.org/dev/rest/annotate/

Please note that you must use the content type application/x-www-form-urlencoded for POST requests.
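
One way to apply the GET-versus-POST advice in client code is to switch on the length of the resulting URL. The sketch below is an illustration only: the 2000-character cutoff is a hypothetical value, not a documented limit of the service, the function name is made up, and the endpoint is the one used in the earlier examples on this page. Requests are built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://model.dbpedia-spotlight.org/en/annotate"
MAX_GET_URL_LENGTH = 2000  # hypothetical cutoff, not a documented service limit


def build_request(text, confidence=0.2, support=20):
    """Use GET for short input and a form-encoded POST for long input."""
    params = urlencode({"text": text, "confidence": confidence, "support": support})
    headers = {"Accept": "application/json"}
    url = ENDPOINT + "?" + params
    if len(url) <= MAX_GET_URL_LENGTH:
        return Request(url, headers=headers)
    # POST requests must declare the form content type, as noted above.
    headers["Content-Type"] = "application/x-www-form-urlencoded"
    return Request(ENDPOINT, data=params.encode("utf-8"), headers=headers, method="POST")


sentence = "President Obama called Wednesday on Congress to extend a tax break. "
short_request = build_request(sentence)
long_request = build_request(sentence * 200)
print(short_request.get_method(), long_request.get_method())
```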