This repository has been archived by the owner on Oct 20, 2018. It is now read-only.
Lucene Architecture
Sandro edited this page Mar 11, 2017
The DBpedia Spotlight architecture is composed of the following modules:
- Web application, a demonstration client (HTML/Javascript interface) that allows users to enter/paste text into a Web browser and visualize the resulting annotated text.
- Web Service, a RESTful (and possibly SOAP) Web API that exposes the functionality of annotating and/or disambiguating entities in text.
- Annotation Java/Scala API, exposing the underlying logic that performs the annotation/disambiguation.
- Indexing Java/Scala API, executing the data processing necessary to enable the annotation/disambiguation algorithms used.
- Evaluation module, where we test disambiguators, log results and use those to train our system to perform better.
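Judging by the components named on this page (a Spotter dictionary and a Lucene disambiguation index), the Annotation API works in two stages: spot candidate mentions, then disambiguate each one to a DBpedia resource. The Java sketch below illustrates only that split, under heavy simplifying assumptions. All class and method names (`Spotter`, `Disambiguator`, `DictionarySpotter`, `PriorDisambiguator`, `annotate`, `demo`) are hypothetical and do not come from the Spotlight code base. The spotter is a greedy longest-match dictionary lookup rather than LingPipe, and the disambiguator simply picks the most frequently linked resource instead of querying a Lucene context index. The dictionary and link counts are made-up toy data. It requires Java 9+ (for `Map.of`/`Set.of`).

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical toy model of the Annotation API; no names here come from DBpedia Spotlight.
public class AnnotationPipelineSketch {
    static final String DBR = "http://dbpedia.org/resource/";

    // A mention found in the text: its surface form and character offset.
    static final class Spot {
        final String surfaceForm;
        final int offset;
        Spot(String surfaceForm, int offset) { this.surfaceForm = surfaceForm; this.offset = offset; }
    }
    // A spot linked to a DBpedia resource URI.
    static final class Annotation {
        final Spot spot;
        final String uri;
        Annotation(Spot spot, String uri) { this.spot = spot; this.uri = uri; }
    }
    interface Spotter { List<Spot> spot(String text); }
    interface Disambiguator { String disambiguate(Spot spot); } // returns null if no candidate

    // Greedy longest-match dictionary lookup, a stand-in for the LingPipe-based Spotter.
    static final class DictionarySpotter implements Spotter {
        private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");
        private final Set<String> dictionary;
        private final int maxTokens;
        DictionarySpotter(Set<String> dictionary, int maxTokens) {
            this.dictionary = dictionary;
            this.maxTokens = maxTokens;
        }
        @Override
        public List<Spot> spot(String text) {
            List<int[]> tokens = new ArrayList<>(); // {start, end} offsets of each token
            Matcher m = TOKEN.matcher(text);
            while (m.find()) tokens.add(new int[] {m.start(), m.end()});
            List<Spot> spots = new ArrayList<>();
            int i = 0;
            while (i < tokens.size()) {
                int matched = 1; // advance one token if no dictionary entry starts here
                for (int k = Math.min(maxTokens, tokens.size() - i); k >= 1; k--) { // longest first
                    int start = tokens.get(i)[0];
                    String candidate = text.substring(start, tokens.get(i + k - 1)[1]);
                    if (dictionary.contains(candidate)) {
                        spots.add(new Spot(candidate, start));
                        matched = k;
                        break;
                    }
                }
                i += matched;
            }
            return spots;
        }
    }

    // "Most common sense" prior, a stand-in for the context-based Lucene disambiguation index.
    static final class PriorDisambiguator implements Disambiguator {
        private final Map<String, Map<String, Integer>> linkCounts; // surface form -> URI -> count
        PriorDisambiguator(Map<String, Map<String, Integer>> linkCounts) { this.linkCounts = linkCounts; }
        @Override
        public String disambiguate(Spot spot) {
            String best = null;
            int bestCount = 0;
            for (Map.Entry<String, Integer> e : linkCounts.getOrDefault(spot.surfaceForm, Map.of()).entrySet()) {
                int c = e.getValue(); // highest count wins; ties go to the smaller URI
                if (best == null || c > bestCount || (c == bestCount && e.getKey().compareTo(best) < 0)) {
                    best = e.getKey();
                    bestCount = c;
                }
            }
            return best;
        }
    }

    // Annotation = spot the text, then disambiguate every spot that has a candidate.
    static List<Annotation> annotate(String text, Spotter spotter, Disambiguator disambiguator) {
        List<Annotation> result = new ArrayList<>();
        for (Spot s : spotter.spot(text)) {
            String uri = disambiguator.disambiguate(s);
            if (uri != null) result.add(new Annotation(s, uri));
        }
        return result;
    }

    // Toy dictionary and made-up link counts, for illustration only.
    static List<Annotation> demo() {
        Set<String> dictionary = Set.of("Berlin", "New York", "York");
        Map<String, Map<String, Integer>> counts = Map.of(
                "Berlin", Map.of(DBR + "Berlin", 900, DBR + "Berlin,_New_Hampshire", 5),
                "New York", Map.of(DBR + "New_York_City", 500, DBR + "New_York", 400));
        return annotate("Berlin and New York are large cities.",
                new DictionarySpotter(dictionary, 3), new PriorDisambiguator(counts));
    }

    public static void main(String[] args) {
        for (Annotation a : demo()) System.out.println(a.spot.offset + " " + a.spot.surfaceForm + " -> " + a.uri);
    }
}
```

Running `main` prints `0 Berlin -> http://dbpedia.org/resource/Berlin` and `11 New York -> http://dbpedia.org/resource/New_York_City`. The longest-match rule is why "New York" is spotted as a whole rather than "York" alone. Keeping `Spotter` and `Disambiguator` behind separate interfaces mirrors the page's separation of string matching (LingPipe) from disambiguation (Lucene), so either stage could be replaced independently.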
External dependencies:
- DBpedia Extraction Framework (used only by the indexing module), extracting the necessary data from the Wikipedia dumps.
- Lucene 3.6, providing the low level indexing framework used by DBpedia Spotlight.
- LingPipe 4.0.0, providing the string matching implementation used for the Spotter module.
- Java 1.7+
- Scala 2.9+
- Spotlight JAR
- Spotlight Library JARs
- Lucene disambiguation index
- Spotter dictionary
- Enough RAM to set the JVM heap size large enough for the Spotter (approx. 8 GB)
- Maven 3 for the automatic installation of dependencies.
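The requirements above ask for a heap of approx. 8 GB for the Spotter. A small, hypothetical pre-flight check (not part of Spotlight) can read the JVM's heap limit through the standard `Runtime.getRuntime().maxMemory()` call and suggest `-Xmx8g` when the limit looks too small. The class and method names are illustrative, and the 85% tolerance is an assumption, chosen because the reported maximum can be somewhat below the `-Xmx` value with some garbage collectors.

```java
// Hypothetical pre-flight check; not part of DBpedia Spotlight.
public class HeapCheckSketch {
    // Rough figure from this page: the Spotter needs a heap of approx. 8 GB.
    static final long RECOMMENDED_HEAP_BYTES = 8L * 1024 * 1024 * 1024;
    // Runtime.maxMemory() may report somewhat less than the -Xmx value, depending on the
    // garbage collector, so accept anything within ~15% of the recommendation (assumption).
    static final long MIN_ACCEPTABLE_BYTES = RECOMMENDED_HEAP_BYTES / 100 * 85;

    static boolean heapIsSufficient(long maxHeapBytes) {
        return maxHeapBytes >= MIN_ACCEPTABLE_BYTES;
    }

    static String describe(long maxHeapBytes) {
        long mib = maxHeapBytes / (1024 * 1024);
        return heapIsSufficient(maxHeapBytes)
                ? "Max heap " + mib + " MiB: should be enough for the Spotter"
                : "Max heap " + mib + " MiB: below ~8 GB; restart the JVM with e.g. -Xmx8g";
    }

    public static void main(String[] args) {
        // Upper bound the JVM will try to use for the heap (set by -Xmx, else a JVM default).
        System.out.println(describe(Runtime.getRuntime().maxMemory()));
    }
}
```

For example, `java -Xmx8g HeapCheckSketch` should report the heap as sufficient, while `java -Xmx2g HeapCheckSketch` prints the restart hint.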