FAQ

Jo Daiber edited this page Jan 19, 2016 · 5 revisions

What is DBpedia?

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data. Furthermore, it might inspire new mechanisms for navigating, linking, and improving the encyclopedia itself.
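As an illustration of such queries, DBpedia exposes a public SPARQL endpoint at http://dbpedia.org/sparql that can be queried over HTTP. A minimal sketch (the query and resource are illustrative; executing the request needs network access):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Illustrative sketch: build a request against DBpedia's public SPARQL
# endpoint. The query asks for the English abstract of dbpedia:Berlin.
query = """
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Berlin>
      <http://dbpedia.org/ontology/abstract> ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""
params = urlencode({"query": query,
                    "format": "application/sparql-results+json"})
req = Request("http://dbpedia.org/sparql?" + params)
# urllib.request.urlopen(req) would return the results as JSON.
print(req.full_url.split("?")[0])
```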

How large is DBpedia's knowledge base?

The English version of the DBpedia knowledge base describes 4.58 million things, out of which 4.22 million are classified in a consistent ontology, including 1,445,000 persons, 735,000 places (including 478,000 populated places), 411,000 creative works (including 123,000 music albums, 87,000 films and 19,000 video games), 241,000 organizations (including 58,000 companies and 49,000 educational institutions), 251,000 species and 6,000 diseases. In addition, there are localized versions of DBpedia in 125 languages. All these versions together describe 38.3 million things, out of which 23.8 million are localized descriptions of things that also exist in the English version of DBpedia.

What is DBpedia Spotlight?

DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia.
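As a quick taste of what annotation looks like, the REST service can be called over HTTP. A minimal sketch, assuming a Spotlight server running locally on the default REST port 2222 (adjust host and port for your installation):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Sketch only: assumes a local Spotlight server on port 2222.
base = "http://localhost:2222/rest/annotate"
params = {"text": "Berlin is the capital of Germany.",
          "confidence": 0.5}
req = Request(base + "?" + urlencode(params),
              headers={"Accept": "application/json"})
# urllib.request.urlopen(req) would return the annotations as JSON.
```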

How do I install DBpedia Spotlight?

Installation Steps

Why move to the statistical version?

The statistical version:

  • is faster and simpler
  • performs well on many test sets (Lucene still seems to do better on some specific datasets)
  • has models for other languages (Dutch, German, Spanish, ...)
  • uses much less RAM

What is the current version of DBpedia Spotlight?

The current version of DBpedia Spotlight is v0.7, released in July 2014 as the successor to v0.6.
Please check the Release Notes for more info.

Spotlight 0.6 vs Spotlight 0.7

There are two statistical versions (0.6 & 0.7).

0.6

  • It was the first version to introduce the statistical models.
  • It uses less memory than the Lucene version.

As an example, a production server using the big English model, while also handling a few hundred transactions, requires a max heap of at least 15360M. The memory footprint depends on the language model being used.

If you are using this version, you will need the following files:

0.7

  • Changed the way the counts are stored, saving a significant amount of memory
  • Uses an FSA to match surface forms
  • Handles lowercase and other variations of surface forms better by introducing a new store and allowing non-exact matches
  • Requires Java 8
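The FSA-based matching can be pictured as a prefix-tree lookup over token sequences, which finds multi-word surface forms in one left-to-right pass. The following is a purely illustrative sketch of that idea, not Spotlight's actual implementation:

```python
# Illustrative sketch of FSA-style surface-form matching: a trie over
# lowercased token sequences. NOT Spotlight's actual code.
def build_trie(surface_forms):
    root = {}
    for sf in surface_forms:
        node = root
        for tok in sf.lower().split():
            node = node.setdefault(tok, {})
        node["$"] = sf  # terminal marker stores the canonical form
    return root

def match(tokens, trie):
    found = []
    for i in range(len(tokens)):
        node, j = trie, i
        while j < len(tokens) and tokens[j].lower() in node:
            node = node[tokens[j].lower()]
            j += 1
            if "$" in node:
                found.append((i, j, node["$"]))
    return found

trie = build_trie(["Berlin", "New York City"])
print(match("I moved from New York City to Berlin".split(), trie))
# → [(3, 6, 'New York City'), (7, 8, 'Berlin')]
```

Lowercasing the tokens before the lookup is a crude stand-in for the non-exact matching that 0.7's new store enables.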

As an example, a production server using the big English model, while also handling a few hundred transactions, requires a heap between 7G and 9G. The memory footprint depends on the language model being used.

If you are using this version, you will need the following files:

How can I get JSON/XML/HTML from the annotate endpoint?

Four content types are supported for the output:

  • text/html
  • application/xhtml+xml
  • text/xml
  • application/json

Please check this link for more examples
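With application/json, the response carries the annotations in a Resources array. The field names below ("Resources", "@URI", "@surfaceForm", "@offset") are an assumption based on typical Spotlight responses, so verify them against your own server's output:

```python
import json

# Assumed shape of an application/json response from /rest/annotate;
# check the field names against your server before relying on them.
sample = """
{"@text": "Berlin is the capital of Germany.",
 "Resources": [
   {"@URI": "http://dbpedia.org/resource/Berlin",
    "@surfaceForm": "Berlin", "@offset": "0"},
   {"@URI": "http://dbpedia.org/resource/Germany",
    "@surfaceForm": "Germany", "@offset": "25"}]}
"""
annotations = [(r["@surfaceForm"], r["@URI"])
               for r in json.loads(sample).get("Resources", [])]
print(annotations)
```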

What are the differences between the available models? What does "en_2+2" stand for?

The numbers in the model names stand for cut-offs in terms of counts: "2+3" means a minimum count of 2 for a surface form and a minimum count of 3 for a context word. Higher minimum counts mean that more surface-form/entity and surface-form/context-word combinations are left out of the model.
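The effect of these cut-offs can be illustrated with a toy count table (the entries and numbers are purely illustrative):

```python
# Toy illustration of model cut-offs: a "2+3" model keeps only surface
# forms seen at least 2 times and context words seen at least 3 times.
surface_form_counts = {"Berlin": 120, "the Wall": 2, "go": 1}
context_word_counts = {"capital": 40, "city": 3, "zeitgeist": 2}
min_sf, min_ctx = 2, 3  # the "2+3" in a model name such as en_2+3

kept_sf = {sf: c for sf, c in surface_form_counts.items() if c >= min_sf}
kept_ctx = {w: c for w, c in context_word_counts.items() if c >= min_ctx}
print(sorted(kept_sf), sorted(kept_ctx))
# → ['Berlin', 'the Wall'] ['capital', 'city']
```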

Which projects use Spotlight?

Projects, Citations, Videos and Ideas

Which citations should I use when referencing Spotlight in research work?

Citations