This repository has been archived by the owner on Oct 20, 2018. It is now read-only.

User's manual

Sandro edited this page Feb 18, 2018 · 14 revisions

DBpedia Spotlight is a tool for annotating mentions of DBpedia concepts in plain text.

We offer three basic functions: Annotate, Disambiguate and Candidates (Best K). They can be accessed from a Scala/Java API, from a REST Web Service, and from a user interface on the Web (HTML/Javascript). For the Scala/Java API, a number of configuration parameters control the behavior of the annotation and disambiguation functions.

Architecture

The DBpedia Spotlight architecture is composed of the following modules:

  • Web application, a demonstration client (HTML/Javascript interface) that allows users to enter/paste text into a Web browser and visualize the resulting annotated text.
  • Web Service, a RESTful/SOAP Web API that exposes the functionality of annotating and/or disambiguating entities in text.
  • Annotation Java/Scala API, exposing the underlying logic that performs the annotation/disambiguation.
  • Indexing Java/Scala API, executing the data processing necessary to enable the annotation/disambiguation algorithms used.
  • Evaluation module, where we test disambiguators, log results and use those to train our system to perform better.

External dependencies:

  • DBpedia Extraction Framework, (only for the index module) extracting the necessary data from the Wikipedia dumps.

System Requirements

  • Java 1.7+
  • Scala 2.10+
  • Spotlight JAR
  • Enough RAM to set a heap size large enough for the Spotter (approx. 8 GB)
  • Maven 3 for the automagic installation of dependencies.

Programmatic usage

If you want to use DBpedia Spotlight in your Java/Scala code, take a look at core/SpotlightFactory to see how you can create your objects, and then look at rest/Candidates.java to see how you can wire them together.

Online Usage

Web Application

The Web Application is located at http://demo.dbpedia-spotlight.org.

Web Service

The Web Service is explained in detail on the Web Service page.

Content Negotiation

You can request different types of output by setting the Accept [request header](http://en.wikipedia.org/wiki/List_of_HTTP_header_fields "request header"). For example, to request JSON output, add Accept: application/json to the request headers.

One example using cURL:

curl "http://api.dbpedia-spotlight.org/en/annotate?text=President%20Michelle%20Obama%20called%20Thursday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20"\
 -H "Accept:application/json" 
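The same content negotiation can be sketched in Python with only the standard library. The snippet below merely prepares the request object and does not send it. The helper name `build_annotate_request` and its defaults are illustrative assumptions, not part of Spotlight; the endpoint and parameters are those of the cURL example above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the cURL example above.
ANNOTATE_URL = "http://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_request(text, confidence=0.2, support=20,
                           accept="application/json"):
    """Prepare (but do not send) a GET request for the annotate endpoint.

    Hypothetical helper for illustration; not part of DBpedia Spotlight.
    """
    query = urlencode({"text": text, "confidence": confidence, "support": support})
    # The Accept header selects the output format (content negotiation).
    return Request(ANNOTATE_URL + "?" + query, headers={"Accept": accept})


request = build_annotate_request("President Obama called Wednesday on Congress")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the prepared object to `urllib.request.urlopen` would perform the call, provided the service is reachable.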

The content types we currently support are:

  • text/html
  • application/xhtml+xml
  • text/xml
  • application/json

The application/xhtml+xml output comes with embedded RDFa that you can give to the RDFa Distiller to get RDF triples in Turtle, RDF+XML, etc. as output.

If your input text is long, you may prefer using POST instead of GET.

curl -i -X POST \
    -H "Accept:application/json" \
    -H "content-type:application/x-www-form-urlencoded" \
    -d "disambiguator=Document&confidence=-1&support=-1&text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package" \
       http://api.dbpedia-spotlight.org/en/annotate

Please note that you must use content-type application/x-www-form-urlencoded for POST requests.
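As a rough Python counterpart to the cURL POST above, the following sketch builds a form-encoded request body and sets the required content type. It only constructs the request and does not send it. The helper `build_annotate_post` is a hypothetical name, and the default parameter values simply mirror the cURL example.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Endpoint taken from the cURL example above.
ANNOTATE_URL = "http://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_post(text, confidence=-1, support=-1, disambiguator="Document"):
    """Prepare (but do not send) a form-encoded POST request.

    Hypothetical helper; parameter values mirror the cURL example above.
    """
    form = {
        "disambiguator": disambiguator,
        "confidence": confidence,
        "support": support,
        "text": text,
    }
    # urlencode produces an application/x-www-form-urlencoded body.
    body = urlencode(form).encode("ascii")
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(ANNOTATE_URL, data=body, headers=headers, method="POST")


post = build_annotate_post("President Obama called Wednesday on Congress")
print(post.get_method())
print(parse_qs(post.data.decode("ascii"))["text"][0])
```

Because the text travels in the request body rather than in the URL, this form is not subject to URL length limits that long inputs may hit with GET.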

The following four examples each consist of a query URL and its result.

Example 1: without type restriction

http://api.dbpedia-spotlight.org/en/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20

returns the XML

  <Annotation text="President Obama called Wednesday on Congress to extend a tax break
  for students included in last year's economic stimulus package, arguing that the policy
  provides more generous assistance."
  confidence="0.2" support="20">
    <Resources>
      <Resource URI="http://dbpedia.org/resource/Barack_Obama"
        support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
        similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
      <Resource URI="http://dbpedia.org/resource/United_States_Congress"
        support="8569" types="Organisation,Legislature" surfaceForm="Congress" offset="36" 
        similarityScore="0.2348192036151886" percentageOfSecondRank="0.8635579006818564"/>
      <Resource URI="http://dbpedia.org/resource/Tax_break"
        support="32" types="" surfaceForm="tax break" offset="57"
        similarityScore="0.35041093826293945" percentageOfSecondRank="-1.0"/>
      <Resource URI="http://dbpedia.org/resource/Student"
        support="1701" types="" surfaceForm="students" offset="71"
        similarityScore="0.32534149289131165" percentageOfSecondRank="-1.0"/>
      <Resource URI="http://dbpedia.org/resource/Policy"
        support="557" types="" surfaceForm="policy" offset="148"
        similarityScore="0.3228176236152649" percentageOfSecondRank="-1.0"/>
    </Resources>
  </Annotation>
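A response like the one above can be read with Python's standard XML parser. The sketch below works on an abridged copy of the Example 1 output (first two resources, with the text attribute on one line) and assumes, based on the sample values, that `offset` is a character position in the input text. The function name `parse_annotations` and the dictionary keys are illustrative choices.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the Example 1 response (first two resources only).
# The text attribute is kept on a single line so that offsets stay valid.
SAMPLE = """<Annotation text="President Obama called Wednesday on Congress to extend a tax break for students included in last year's economic stimulus package, arguing that the policy provides more generous assistance." confidence="0.2" support="20">
  <Resources>
    <Resource URI="http://dbpedia.org/resource/Barack_Obama"
      support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
      similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
    <Resource URI="http://dbpedia.org/resource/United_States_Congress"
      support="8569" types="Organisation,Legislature" surfaceForm="Congress" offset="36"
      similarityScore="0.2348192036151886" percentageOfSecondRank="0.8635579006818564"/>
  </Resources>
</Annotation>"""


def parse_annotations(xml_text):
    """Return one dict per <Resource> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    text = root.get("text")
    rows = []
    for res in root.iter("Resource"):
        offset = int(res.get("offset"))
        surface = res.get("surfaceForm")
        rows.append({
            "uri": res.get("URI"),
            "surface_form": surface,
            "offset": offset,
            # An empty types attribute becomes an empty list.
            "types": [t for t in res.get("types", "").split(",") if t],
            # Check the assumption that offset indexes into the text.
            "matches_text": text[offset:offset + len(surface)] == surface,
        })
    return rows


rows = parse_annotations(SAMPLE)
for row in rows:
    print(row["offset"], row["surface_form"], row["uri"])
```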

Example 2: with type restriction

http://api.dbpedia-spotlight.org/en/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20&types=Person,Organisation

returns the XML

  <Annotation text="President Obama called Wednesday on Congress to extend a tax break
  for students included in last year's economic stimulus package, arguing that the policy
  provides more generous assistance."
  confidence="0.2" support="20" types="Person,Organisation">
    <Resources>
      <Resource URI="http://dbpedia.org/resource/Barack_Obama"
        support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0" 
        similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
      <Resource URI="http://dbpedia.org/resource/United_States_Congress"
        support="8569" types="Organisation,Legislature" surfaceForm="Congress" offset="36" 
        similarityScore="0.2348192036151886" percentageOfSecondRank="0.8635579006818564"/>
    </Resources>
  </Annotation>
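Comparing Examples 1 and 2 suggests that the `types` parameter keeps only resources carrying at least one of the requested types. The sketch below reproduces that observable effect client-side on the Example 1 resources. It is an illustration of the filtering semantics only, not the server's implementation, and `filter_by_types` is a hypothetical helper.

```python
# URIs and types copied from the Example 1 response.
EXAMPLE_1 = [
    {"uri": "http://dbpedia.org/resource/Barack_Obama",
     "types": ["Person", "Politician", "President"]},
    {"uri": "http://dbpedia.org/resource/United_States_Congress",
     "types": ["Organisation", "Legislature"]},
    {"uri": "http://dbpedia.org/resource/Tax_break", "types": []},
    {"uri": "http://dbpedia.org/resource/Student", "types": []},
    {"uri": "http://dbpedia.org/resource/Policy", "types": []},
]


def filter_by_types(resources, wanted):
    """Keep resources that have at least one of the wanted types."""
    wanted = set(wanted)
    return [r for r in resources if wanted & set(r["types"])]


kept = filter_by_types(EXAMPLE_1, ["Person", "Organisation"])
for resource in kept:
    print(resource["uri"])
```

Applied to the Example 1 data, the filter leaves the same two resources that Example 2 returns.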

Example 3: with SPARQL restriction

http://api.dbpedia-spotlight.org/en/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20&sparql=SELECT+DISTINCT+%3Fx%0D%0AWHERE+%7B%0D%0A%3Fx+a+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FOfficeHolder%3E+.%0D%0A%3Fx+%3Frelated+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FChicago%3E+.%0D%0A%7D

returns the XML

  <Annotation text="President Obama called Wednesday on Congress to extend a tax break
  for students included in last year's economic stimulus package, arguing that the policy
  provides more generous assistance."
  confidence="0.2" support="20" 
  sparql="SELECT DISTINCT ?x WHERE { ?x a &lt;http://dbpedia.org/ontology/OfficeHolder&gt; . 
  ?x ?related &lt;http://dbpedia.org/resource/Chicago&gt;  }" 
  policy="whitelist"> 
    <Resources> 
      <Resource URI="http://dbpedia.org/resource/Barack_Obama" 
        support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0" 
        similarityScore="0.2730408310890198" percentageOfSecondRank="-1.0"/> 
    </Resources> 
  </Annotation> 
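The `sparql` parameter in the query URL is the percent-encoded form of a multi-line SPARQL query. As a small sketch, the encoding can be reproduced with Python's `urllib.parse.quote_plus`; the CRLF line breaks are inferred from the `%0D%0A` sequences in the URL above.

```python
from urllib.parse import quote_plus, unquote_plus

# The query from Example 3, with CRLF line breaks as implied by %0D%0A.
SPARQL = "\r\n".join([
    "SELECT DISTINCT ?x",
    "WHERE {",
    "?x a <http://dbpedia.org/ontology/OfficeHolder> .",
    "?x ?related <http://dbpedia.org/resource/Chicago> .",
    "}",
])

# quote_plus encodes spaces as '+' and reserved characters as %XX.
encoded = quote_plus(SPARQL)
print("sparql=" + encoded)

# Decoding gives back the original query.
print(unquote_plus(encoded))
```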

Example 4: Candidates Interface

The parameters are the same as in Example 1, but you will send your request to http://api.dbpedia-spotlight.org/en/candidates

http://api.dbpedia-spotlight.org/en/candidates?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20

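returns XML in the following format (note that the sample output below was generated for a different input text than the query above)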

  <annotation text="President Obama on Monday will call for a new minimum tax rate for individuals making more than $1 million a year to ensure that they pay at least the same percentage of their earnings as other taxpayers, according to administration officials. ">
  <surfaceForm name="individuals" offset="67">
    <resource label="Individual" uri="Individual" contextualScore="0.26683980226516724" percentageOfSecondRank="-1.0" support="312" priorScore="0.0" finalScore="0.26683980226516724"/>
    <resource label="The Individuals (New Jersey band)" uri="The_Individuals_%28New_Jersey_band%29" contextualScore="0.011762913316488266" percentageOfSecondRank="-1.0" support="17" priorScore="0.0" finalScore="0.011762913316488266"/>
    <resource label="The Individuals (Chicago band)" uri="The_Individuals_%28Chicago_band%29" contextualScore="0.0" percentageOfSecondRank="-1.0" support="0" priorScore="0.0" finalScore="0.0"/>
  </surfaceForm>
  <surfaceForm name="officials" offset="233">
    <resource label="Official" uri="Official" contextualScore="0.1324356347322464" percentageOfSecondRank="-1.0" support="196" priorScore="0.0" finalScore="0.1324356347322464"/>
    <resource label="Rugby league match officials" uri="Rugby_league_match_officials" contextualScore="0.04376954212784767" percentageOfSecondRank="-1.0" support="9" priorScore="0.0" finalScore="0.04376954212784767"/>
  </surfaceForm>
  <surfaceForm name="President Obama" offset="0">
    <resource label="Presidency of Barack Obama" uri="Presidency_of_Barack_Obama" contextualScore="0.5634340643882751" percentageOfSecondRank="-1.0" support="134" priorScore="0.0" finalScore="0.5634340643882751"/>
  </surfaceForm>
  <surfaceForm name="1 million" offset="97">
    <resource label="Million" uri="Million" contextualScore="0.527919590473175" percentageOfSecondRank="-1.0" support="492" priorScore="0.0" finalScore="0.527919590473175"/>
  </surfaceForm>
  <surfaceForm name="percentage" offset="156">
    <resource label="Percentage" uri="Percentage" contextualScore="0.6362485885620117" percentageOfSecondRank="-1.0" support="165" priorScore="0.0" finalScore="0.6362485885620117"/>
  </surfaceForm>
  <surfaceForm name="earnings" offset="176">
    <resource label="Income" uri="Income" contextualScore="0.5776156187057495" percentageOfSecondRank="-1.0" support="648" priorScore="0.0" finalScore="0.5776156187057495"/>
  </surfaceForm>
  <surfaceForm name="taxpayers" offset="194">
    <resource label="Tax" uri="Tax" contextualScore="0.7484055757522583" percentageOfSecondRank="-1.0" support="1540" priorScore="0.0" finalScore="0.7484055757522583"/>
    <resource label="TaxPayers&apos; Alliance" uri="TaxPayers%27_Alliance" contextualScore="0.12765906751155853" percentageOfSecondRank="-1.0" support="15" priorScore="0.0" finalScore="0.12765906751155853"/>
    <resource label="The Taxpayer (Luxembourg)" uri="The_Taxpayer_%28Luxembourg%29" contextualScore="0.024930020794272423" percentageOfSecondRank="-1.0" support="3" priorScore="0.0" finalScore="0.024930020794272423"/>
    <resource label="The Taxpayers" uri="The_Taxpayers" contextualScore="0.0" percentageOfSecondRank="-1.0" support="0" priorScore="0.0" finalScore="0.0"/>
  </surfaceForm>
  </annotation>
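The candidates output groups several candidate resources under each surface form. The sketch below parses an abridged copy of the output above (two surface forms, some candidates omitted) and picks the top candidate per surface form. Ranking by `finalScore` is an assumption made for illustration, and `best_candidates` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the candidates output above.
SAMPLE = """<annotation text="President Obama on Monday will call for a new minimum tax rate for individuals making more than $1 million a year to ensure that they pay at least the same percentage of their earnings as other taxpayers, according to administration officials. ">
  <surfaceForm name="individuals" offset="67">
    <resource label="Individual" uri="Individual" contextualScore="0.26683980226516724" percentageOfSecondRank="-1.0" support="312" priorScore="0.0" finalScore="0.26683980226516724"/>
    <resource label="The Individuals (New Jersey band)" uri="The_Individuals_%28New_Jersey_band%29" contextualScore="0.011762913316488266" percentageOfSecondRank="-1.0" support="17" priorScore="0.0" finalScore="0.011762913316488266"/>
  </surfaceForm>
  <surfaceForm name="taxpayers" offset="194">
    <resource label="Tax" uri="Tax" contextualScore="0.7484055757522583" percentageOfSecondRank="-1.0" support="1540" priorScore="0.0" finalScore="0.7484055757522583"/>
    <resource label="The Taxpayers" uri="The_Taxpayers" contextualScore="0.0" percentageOfSecondRank="-1.0" support="0" priorScore="0.0" finalScore="0.0"/>
  </surfaceForm>
</annotation>"""


def best_candidates(xml_text):
    """Map each surface form name to its (label, finalScore) top candidate."""
    root = ET.fromstring(xml_text)
    best = {}
    for form in root.iter("surfaceForm"):
        candidates = form.findall("resource")
        if not candidates:
            continue  # surface form without any candidate
        # Assumption: a higher finalScore means a better candidate.
        top = max(candidates, key=lambda r: float(r.get("finalScore")))
        best[form.get("name")] = (top.get("label"), float(top.get("finalScore")))
    return best


best = best_candidates(SAMPLE)
for name, (label, score) in best.items():
    print(name, "->", label, score)
```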