User's manual
DBpedia Spotlight is a tool for annotating mentions of DBpedia concepts in plain text.
We offer three basic functions: Annotate, Disambiguate and Candidates (Best K). They can be accessed through a Scala/Java API, a REST Web Service, and a user interface on the Web (HTML/JavaScript). The Scala/Java API exposes a number of configuration parameters that control the annotation and disambiguation functions.
The DBpedia Spotlight architecture is composed of the following modules:
- Web application, a demonstration client (HTML/Javascript interface) that allows users to enter/paste text into a Web browser and visualize the resulting annotated text.
- Web Service, a RESTful Web API that exposes the functionality of annotating and/or disambiguating entities in text.
- Annotation Java/Scala API, exposing the underlying logic that performs the annotation/disambiguation.
- Indexing Java/Scala API, executing the data processing necessary to enable the annotation/disambiguation algorithms used.
- Evaluation module, where we test disambiguators, log results and use those to train our system to perform better.
External dependencies:
- DBpedia Extraction Framework (only for the indexing module), which extracts the necessary data from the Wikipedia dumps.
- Java 1.7+
- Scala 2.10+
- Spotlight JAR
- Enough RAM to set a JVM heap size large enough for the Spotter (approx. 8 GB)
- Maven 3 for the automagic installation of dependencies.
If you want to use DBpedia Spotlight in your Java/Scala code, take a look at core/SpotlightFactory to see how you can create your objects, and then look at rest/Candidates.java to see how you can wire them together.
The Web Application is located at http://demo.dbpedia-spotlight.org.
The Web Service is explained in detail on the Web Service page.
You can request different types of output by setting the Accept [request header](http://en.wikipedia.org/wiki/List_of_HTTP_header_fields "request header").
For example, to request JSON output, add Accept:application/json to the request headers.
One example using cURL:
curl "http://api.dbpedia-spotlight.org/en/annotate?text=President%20Michelle%20Obama%20called%20Thursday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20" \
  -H "Accept:application/json"
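The same GET request can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `build_annotate_request` and its default values are illustrative assumptions, not part of DBpedia Spotlight. It builds the request without sending it, so it can be inspected offline.

```python
from urllib.parse import urlencode, quote
from urllib.request import Request

# Endpoint taken from the cURL example above.
ANNOTATE_URL = "http://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_request(text, confidence=0.2, support=20,
                           accept="application/json"):
    """Build (but do not send) a GET request for the annotate endpoint.

    Hypothetical helper: the name and defaults are assumptions for
    illustration; only the URL, parameters and header come from the manual.
    """
    params = {"text": text, "confidence": confidence, "support": support}
    # quote_via=quote encodes spaces as %20, as in the cURL example.
    query = urlencode(params, quote_via=quote)
    request = Request(ANNOTATE_URL + "?" + query)
    # The Accept header selects the output format.
    request.add_header("Accept", accept)
    return request


request = build_annotate_request("President Obama called Wednesday on Congress")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the returned object to `urllib.request.urlopen` would send it, provided the hosted endpoint is reachable from your network.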
The content types we currently support are:
text/html
application/xhtml+xml
text/xml
application/json
The application/xhtml+xml output comes with embedded RDFa that you can give to the RDFa Distiller to get RDF triples in Turtle, RDF+XML, etc. as output.
If your input text is long, you may prefer using POST instead of GET.
curl -i -X POST \
  -H "Accept:application/json" \
  -H "content-type:application/x-www-form-urlencoded" \
  -d "disambiguator=Document&confidence=-1&support=-1&text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package" \
  http://api.dbpedia-spotlight.org/en/annotate
Please note that POST requests must use the content type application/x-www-form-urlencoded.
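As a rough illustration of the POST variant, the sketch below builds a form-encoded request with the Python standard library. The helper name `build_annotate_post` and its defaults are assumptions made for this example; the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the cURL example above.
ANNOTATE_URL = "http://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_post(text, confidence=0.2, support=20):
    """Build (but do not send) a form-encoded POST request.

    Hypothetical helper for illustration only.
    """
    form = {"text": text, "confidence": confidence, "support": support}
    # urlencode produces an application/x-www-form-urlencoded body.
    body = urlencode(form).encode("utf-8")
    request = Request(ANNOTATE_URL, data=body, method="POST")
    request.add_header("Accept", "application/json")
    # The manual requires this content type for POST requests.
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


post_request = build_annotate_post("President Obama called Wednesday on Congress")
print(post_request.get_method())
print(post_request.data.decode("utf-8"))
```

Because the text travels in the request body rather than in the URL, this form avoids URL length limits for long inputs.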
The following four examples each consist of a query URL and its result.
http://api.dbpedia-spotlight.org/en/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20
returns the XML
<Annotation text="President Obama called Wednesday on Congress to extend a tax break
for students included in last year's economic stimulus package, arguing that the policy
provides more generous assistance."
confidence="0.2" support="20">
<Resources>
<Resource URI="http://dbpedia.org/resource/Barack_Obama"
support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
<Resource URI="http://dbpedia.org/resource/United_States_Congress"
support="8569" types="Organisation,Legislature" surfaceForm="Congress" offset="36"
similarityScore="0.2348192036151886" percentageOfSecondRank="0.8635579006818564"/>
<Resource URI="http://dbpedia.org/resource/Tax_break"
support="32" types="" surfaceForm="tax break" offset="57"
similarityScore="0.35041093826293945" percentageOfSecondRank="-1.0"/>
<Resource URI="http://dbpedia.org/resource/Student"
support="1701" types="" surfaceForm="students" offset="71"
similarityScore="0.32534149289131165" percentageOfSecondRank="-1.0"/>
<Resource URI="http://dbpedia.org/resource/Policy"
support="557" types="" surfaceForm="policy" offset="148"
similarityScore="0.3228176236152649" percentageOfSecondRank="-1.0"/>
</Resources>
</Annotation>
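A response of this shape can be read with any XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a shortened copy of the response above (two resources, truncated text); the function name `parse_annotations` and the dictionary keys are illustrative assumptions. It also shows that `offset` is a character position in the `text` attribute.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response above: truncated text, two resources.
SAMPLE = """<Annotation text="President Obama called Wednesday on Congress"
    confidence="0.2" support="20">
  <Resources>
    <Resource URI="http://dbpedia.org/resource/Barack_Obama"
      support="5761" types="Person,Politician,President"
      surfaceForm="President Obama" offset="0"
      similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
    <Resource URI="http://dbpedia.org/resource/United_States_Congress"
      support="8569" types="Organisation,Legislature"
      surfaceForm="Congress" offset="36"
      similarityScore="0.2348192036151886"
      percentageOfSecondRank="0.8635579006818564"/>
  </Resources>
</Annotation>"""


def parse_annotations(xml_text):
    """Return one dict per <Resource> element (hypothetical helper)."""
    mentions = []
    for res in ET.fromstring(xml_text).iter("Resource"):
        mentions.append({
            "uri": res.get("URI"),
            "surface_form": res.get("surfaceForm"),
            "offset": int(res.get("offset")),
            # An empty types attribute becomes an empty list.
            "types": [t for t in res.get("types", "").split(",") if t],
            "similarity": float(res.get("similarityScore")),
        })
    return mentions


text = ET.fromstring(SAMPLE).get("text")
mentions = parse_annotations(SAMPLE)
for m in mentions:
    start = m["offset"]
    end = start + len(m["surface_form"])
    # The slice of the input text matches the reported surface form.
    print(text[start:end], "->", m["uri"])
```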
http://api.dbpedia-spotlight.org/en/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20&types=Person,Organisation
returns the XML
<Annotation text="President Obama called Wednesday on Congress to extend a tax break
for students included in last year's economic stimulus package, arguing that the policy
provides more generous assistance."
confidence="0.2" support="20" types="Person,Organisation">
<Resources>
<Resource URI="http://dbpedia.org/resource/Barack_Obama"
support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
<Resource URI="http://dbpedia.org/resource/United_States_Congress"
support="8569" types="Organisation,Legislature" surfaceForm="Congress" offset="36"
similarityScore="0.2348192036151886" percentageOfSecondRank="0.8635579006818564"/>
</Resources>
</Annotation>
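The effect of the `types` parameter can be mimicked on the client side. The sketch below is not the service's implementation; it assumes that a resource is kept when at least one of its types appears in the requested list, which is consistent with the two responses shown above but is not confirmed by this manual. The function name `filter_by_types` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the unfiltered response: three of the five resources.
SAMPLE = """<Annotation text="..." confidence="0.2" support="20">
  <Resources>
    <Resource URI="http://dbpedia.org/resource/Barack_Obama"
      types="Person,Politician,President" surfaceForm="President Obama"/>
    <Resource URI="http://dbpedia.org/resource/United_States_Congress"
      types="Organisation,Legislature" surfaceForm="Congress"/>
    <Resource URI="http://dbpedia.org/resource/Tax_break"
      types="" surfaceForm="tax break"/>
  </Resources>
</Annotation>"""


def filter_by_types(xml_text, wanted):
    """Keep URIs of resources having at least one wanted type.

    Assumption: 'any type matches' semantics, inferred from the examples.
    """
    wanted = set(wanted)
    kept = []
    for res in ET.fromstring(xml_text).iter("Resource"):
        types = {t for t in res.get("types", "").split(",") if t}
        if types & wanted:
            kept.append(res.get("URI"))
    return kept


kept = filter_by_types(SAMPLE, ["Person", "Organisation"])
print(kept)
```

Resources with an empty `types` attribute, such as Tax_break, are dropped under this assumption, as in the filtered response above.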
http://api.dbpedia-spotlight.org/en/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20&sparql=SELECT+DISTINCT+%3Fx%0D%0AWHERE+%7B%0D%0A%3Fx+a+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FOfficeHolder%3E+.%0D%0A%3Fx+%3Frelated+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FChicago%3E+.%0D%0A%7D
returns the XML
<Annotation text="President Obama called Wednesday on Congress to extend a tax break
for students included in last year's economic stimulus package, arguing that the policy
provides more generous assistance."
confidence="0.2" support="20"
sparql="SELECT DISTINCT ?x WHERE { ?x a &lt;http://dbpedia.org/ontology/OfficeHolder&gt; .
?x ?related &lt;http://dbpedia.org/resource/Chicago&gt; }"
policy="whitelist">
<Resources>
<Resource URI="http://dbpedia.org/resource/Barack_Obama"
support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
similarityScore="0.2730408310890198" percentageOfSecondRank="-1.0"/>
</Resources>
</Annotation>
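The `sparql` value in the query URL above is an ordinary form-encoded string. As a small sketch, Python's `urllib.parse.quote_plus` reproduces that encoding from the readable query; the variable names are illustrative.

```python
from urllib.parse import quote_plus, unquote_plus

# The SPARQL query from the example, with the CRLF line breaks
# that appear as %0D%0A in the URL.
query = (
    "SELECT DISTINCT ?x\r\n"
    "WHERE {\r\n"
    "?x a <http://dbpedia.org/ontology/OfficeHolder> .\r\n"
    "?x ?related <http://dbpedia.org/resource/Chicago> .\r\n"
    "}"
)

# quote_plus turns spaces into '+' and percent-encodes reserved characters.
encoded = quote_plus(query)
print("sparql=" + encoded)

# Decoding gives back the original query text.
assert unquote_plus(encoded) == query
```

The printed value can be appended to an annotate URL as an extra `&sparql=...` parameter, as in the example above.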
The Candidates function takes the same parameters as the first example, but the request is sent to http://api.dbpedia-spotlight.org/en/candidates
http://api.dbpedia-spotlight.org/en/candidates?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20
returns XML of the following form (the sample below was generated for a different input text):
<annotation text="President Obama on Monday will call for a new minimum tax rate for individuals making more than $1 million a year to ensure that they pay at least the same percentage of their earnings as other taxpayers, according to administration officials. ">
<surfaceForm name="individuals" offset="67">
<resource label="Individual" uri="Individual" contextualScore="0.26683980226516724" percentageOfSecondRank="-1.0" support="312" priorScore="0.0" finalScore="0.26683980226516724"/>
<resource label="The Individuals (New Jersey band)" uri="The_Individuals_%28New_Jersey_band%29" contextualScore="0.011762913316488266" percentageOfSecondRank="-1.0" support="17" priorScore="0.0" finalScore="0.011762913316488266"/>
<resource label="The Individuals (Chicago band)" uri="The_Individuals_%28Chicago_band%29" contextualScore="0.0" percentageOfSecondRank="-1.0" support="0" priorScore="0.0" finalScore="0.0"/>
</surfaceForm>
<surfaceForm name="officials" offset="233">
<resource label="Official" uri="Official" contextualScore="0.1324356347322464" percentageOfSecondRank="-1.0" support="196" priorScore="0.0" finalScore="0.1324356347322464"/>
<resource label="Rugby league match officials" uri="Rugby_league_match_officials" contextualScore="0.04376954212784767" percentageOfSecondRank="-1.0" support="9" priorScore="0.0" finalScore="0.04376954212784767"/>
</surfaceForm>
<surfaceForm name="President Obama" offset="0">
<resource label="Presidency of Barack Obama" uri="Presidency_of_Barack_Obama" contextualScore="0.5634340643882751" percentageOfSecondRank="-1.0" support="134" priorScore="0.0" finalScore="0.5634340643882751"/>
</surfaceForm>
<surfaceForm name="1 million" offset="97">
<resource label="Million" uri="Million" contextualScore="0.527919590473175" percentageOfSecondRank="-1.0" support="492" priorScore="0.0" finalScore="0.527919590473175"/>
</surfaceForm>
<surfaceForm name="percentage" offset="156">
<resource label="Percentage" uri="Percentage" contextualScore="0.6362485885620117" percentageOfSecondRank="-1.0" support="165" priorScore="0.0" finalScore="0.6362485885620117"/>
</surfaceForm>
<surfaceForm name="earnings" offset="176">
<resource label="Income" uri="Income" contextualScore="0.5776156187057495" percentageOfSecondRank="-1.0" support="648" priorScore="0.0" finalScore="0.5776156187057495"/>
</surfaceForm>
<surfaceForm name="taxpayers" offset="194">
<resource label="Tax" uri="Tax" contextualScore="0.7484055757522583" percentageOfSecondRank="-1.0" support="1540" priorScore="0.0" finalScore="0.7484055757522583"/>
<resource label="TaxPayers' Alliance" uri="TaxPayers%27_Alliance" contextualScore="0.12765906751155853" percentageOfSecondRank="-1.0" support="15" priorScore="0.0" finalScore="0.12765906751155853"/>
<resource label="The Taxpayer (Luxembourg)" uri="The_Taxpayer_%28Luxembourg%29" contextualScore="0.024930020794272423" percentageOfSecondRank="-1.0" support="3" priorScore="0.0" finalScore="0.024930020794272423"/>
<resource label="The Taxpayers" uri="The_Taxpayers" contextualScore="0.0" percentageOfSecondRank="-1.0" support="0" priorScore="0.0" finalScore="0.0"/>
</surfaceForm>
</annotation>
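In the candidates output, each `surfaceForm` element groups the candidate resources for one mention. The sketch below picks the candidate with the highest `finalScore` per surface form, using a shortened copy of the response above. The function name `best_candidates` is hypothetical, and choosing by `finalScore` is an assumption about how a client might use these scores.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the candidates response above: two surface forms.
SAMPLE = """<annotation text="...">
  <surfaceForm name="officials" offset="233">
    <resource label="Official" uri="Official"
      contextualScore="0.1324356347322464" support="196"
      priorScore="0.0" finalScore="0.1324356347322464"/>
    <resource label="Rugby league match officials"
      uri="Rugby_league_match_officials"
      contextualScore="0.04376954212784767" support="9"
      priorScore="0.0" finalScore="0.04376954212784767"/>
  </surfaceForm>
  <surfaceForm name="taxpayers" offset="194">
    <resource label="TaxPayers' Alliance" uri="TaxPayers%27_Alliance"
      contextualScore="0.12765906751155853" support="15"
      priorScore="0.0" finalScore="0.12765906751155853"/>
    <resource label="Tax" uri="Tax"
      contextualScore="0.7484055757522583" support="1540"
      priorScore="0.0" finalScore="0.7484055757522583"/>
  </surfaceForm>
</annotation>"""


def best_candidates(xml_text):
    """Map each surface form name to its (uri, finalScore) top candidate."""
    best = {}
    for sf in ET.fromstring(xml_text).iter("surfaceForm"):
        candidates = sf.findall("resource")
        if not candidates:
            continue  # a surface form without candidates is skipped
        top = max(candidates, key=lambda r: float(r.get("finalScore")))
        best[sf.get("name")] = (top.get("uri"), float(top.get("finalScore")))
    return best


best = best_candidates(SAMPLE)
for name, (uri, score) in best.items():
    print(name, "->", uri, score)
```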