This repository has been archived by the owner on Oct 20, 2018. It is now read-only.

Lucene Web Service Parameters

sandroacoelho edited this page Jul 26, 2013 · 3 revisions

Web Service Parameters

All parameters accepted by DBpedia Spotlight are described in the WADL (Web Application Description Language) file.

The following is a description of each parameter used by the web service. Many of them are used for filtering and are described in the Filters section.

text or url (Required): text or url to be annotated.
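
As a rough illustration of how the required text parameter and the optional parameters described on this page end up in a request, the sketch below builds a GET URL for the /annotate endpoint. The base URL and the helper name build_annotate_url are assumptions made for this example and are not taken from the WADL file; adjust them to your own deployment.

```python
from urllib.parse import urlencode

# Assumed base URL of a locally running service; replace with your own.
BASE_URL = "http://localhost:2222/rest"


def build_annotate_url(text, **params):
    """Return a GET URL for /annotate carrying `text` plus optional parameters."""
    query = {"text": text}
    query.update(params)
    # urlencode percent-encodes values and joins the pairs with '&'.
    return f"{BASE_URL}/annotate?{urlencode(query)}"


url = build_annotate_url("Berlin is a city", confidence=0.1, support=10)
print(url)
```

Any other parameter described on this page can be passed the same way as an extra keyword argument.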

Filters

Unwanted annotations can be removed by passing filter parameters to the endpoints. These filters are available in the /candidates and /annotate endpoints, where they are applied after the disambiguation step.

Coreference resolution (CoreferenceResolution)

This heuristic looks for coreferences throughout the text and infers the surface form. When it is set to true, no other filter is applied.

Available in: /candidates, /annotate

Parameter name: coreferenceResolution

Parameter type: boolean

Default value: true

Confidence (ConfidenceFilter)

Selects all entities whose percentageOfSecondRank is greater than the square of the given value.

Available in: /candidates, /annotate

Parameter name: confidence

Parameter type: number(double)

Default value: 0.1
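
The rule above can be sketched in a few lines. This is only an illustration of the comparison as stated on this page, not the actual ConfidenceFilter code; the candidate dictionaries and the function name confidence_filter are hypothetical, and only the percentageOfSecondRank field name comes from the text.

```python
def confidence_filter(candidates, confidence=0.1):
    """Keep candidates whose percentageOfSecondRank exceeds confidence squared."""
    threshold = confidence ** 2
    return [c for c in candidates if c["percentageOfSecondRank"] > threshold]


# Hypothetical candidates, invented for this example.
candidates = [
    {"uri": "Berlin", "percentageOfSecondRank": 0.30},
    {"uri": "Berlin_(band)", "percentageOfSecondRank": 0.20},
]

kept = confidence_filter(candidates, confidence=0.5)  # threshold is 0.25
print([c["uri"] for c in kept])
```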

Support (SupportFilter)

Selects all entities whose support is greater than the given value.

Available in: /candidates, /annotate

Parameter name: support

Parameter type: number(integer)

Default value: 10
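
A minimal sketch of this comparison, assuming each candidate is a dictionary with a support field (a hypothetical representation chosen for this example, not the service's own data model):

```python
def support_filter(candidates, support=10):
    """Keep candidates whose support is strictly greater than `support`."""
    return [c for c in candidates if c["support"] > support]


# Hypothetical candidates, invented for this example.
candidates = [
    {"uri": "Berlin", "support": 5000},
    {"uri": "Berlin_(band)", "support": 10},
]

kept = support_filter(candidates)  # default threshold of 10
print([c["uri"] for c in kept])
```

Because the comparison is strict, a candidate whose support equals the threshold is dropped.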

Types (TypeFilter)

Combined with the policy parameter, selects entities by type. If policy is whitelist, only entities that have one of the listed types are selected; if policy is blacklist, only entities that have none of the listed types are selected.

Usage:

types=DBpedia:PopulatedPlaces,DBpedia:Thing

Available in: /candidates, /annotate

Parameter name: types

Parameter type: string
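
The interaction between types and policy can be sketched as follows. This assumes that "having the same type" means sharing at least one of the comma-separated types, and it represents entities as dictionaries with a types list; both are assumptions made for this example rather than the TypeFilter implementation.

```python
def filter_by_types(entities, types, policy="whitelist"):
    """Apply a whitelist or blacklist over a comma-separated `types` string."""
    wanted = set(types.split(","))
    kept = []
    for entity in entities:
        matches = bool(wanted & set(entity["types"]))
        # whitelist keeps matching entities, blacklist keeps non-matching ones.
        if matches == (policy == "whitelist"):
            kept.append(entity)
    return kept


# Hypothetical entities, invented for this example.
entities = [
    {"uri": "Berlin", "types": ["DBpedia:Place"]},
    {"uri": "Angela_Merkel", "types": ["DBpedia:Person"]},
]

print([e["uri"] for e in filter_by_types(entities, "DBpedia:Place")])
print([e["uri"] for e in filter_by_types(entities, "DBpedia:Place", "blacklist")])
```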

Sparql (SparqlFilter)

Combined with the policy parameter, selects entities according to the result of a SPARQL query. If policy is whitelist, only entities that appear in the query result are selected; if policy is blacklist, only entities that do not appear in the query result are selected.

Available in: /candidates, /annotate

Parameter name: sparql

Parameter type: string

Spotter

Spotters are algorithms that select all candidates for possible annotations. There are two kinds of implementation. In the language-independent implementation, the candidates are generated by traversing a finite state automaton that encodes all possible sequences of tokens that form known spot candidates.

In the language-dependent implementation, candidates are generated using three methods: 1. identifying all sequences of capitalized tokens, 2. identifying all noun phrases, prepositional phrases and multi-word units, 3. identifying all named entities. Methods 2 and 3 are performed using Apache OpenNLP models for phrase chunking and Named Entity Recognition.
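
To make the first method more concrete, here is a simplified sketch of collecting sequences of capitalized tokens. It is not the spotter's own code: the regular-expression tokenizer and the function name capitalized_sequences are assumptions for this example, and punctuation is ignored, so a sequence may run across a sentence boundary.

```python
import re


def capitalized_sequences(text):
    """Return maximal runs of consecutive capitalized tokens as spot candidates."""
    tokens = re.findall(r"\w+", text)  # naive tokenizer, drops punctuation
    spots, current = [], []
    for token in tokens:
        if token[0].isupper():
            current.append(token)
        elif current:
            # A lowercase token ends the current run.
            spots.append(" ".join(current))
            current = []
    if current:  # flush a run that reaches the end of the text
        spots.append(" ".join(current))
    return spots


spots = capitalized_sequences("yesterday Barack Obama visited New York City")
print(spots)
```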

Available in: /candidates, /annotate, /spot

Parameter name: spotter

Parameter type: string

Default value: Default

Possible values: LingPipeSpotter, WikiMarkupSpotter, AtLeastOneNounSelector, CoOccurrenceBasedSelector, NESpotter, OpenNLPNGramSpotter, OpenNLPChunkerSpotter, KeaSpotter

You can change the strategy used for spotting by changing the value of the &spotter= parameter passed to our web service. Supported spotters:

  • Default, picks the first spotter listed in the configuration file
  • LingPipeSpotter, uses a dictionary of known names to spot
  • AtLeastOneNounSelector, uses the dictionary and removes spots that do not contain a noun
  • CoOccurrenceBasedSelector, uses the dictionary and removes spots that "look like" a non-entity (as trained by feature co-occurrence statistics)
  • NESpotter, uses the OpenNLP default models for Named Entity Recognition (NER)
  • KeyphraseSpotter, uses the Kea default models for Keyphrase Extraction
  • WikiMarkupSpotter, assumes that another tool has performed spotting and encoded the spots as WikiMarkup
  • SpotXmlParser, assumes that another tool has performed spotting and encoded the spots as SpotXml.
See: org.dbpedia.spotlight.model.SpotterConfiguration.SpotterPolicy

Disambiguator

Available in: /candidates, /annotate, /spot

Parameter name: disambiguator

Parameter type: string

Default value: Default

Possible values: Document, Occurrences, CuttingEdge, Default
