Available APIs and approaches
The AIDA approach relies on coherence graph building and dense subgraph algorithms and is based on the YAGO2 knowledge base. Although the authors provide their source code, a webservice and their dataset (a manually annotated subset of the 2003 CoNLL shared task), GERBIL does not use the webservice since it is not stable enough for regular replication purposes.
This approach is a pure entity disambiguation approach (D2KB) based on string similarity measures, an expansion heuristic for labels to cope with co-referencing, and the graph-based HITS algorithm. The authors published datasets along with their source code and an API. AGDISTIS can only be used for the D2KB task.
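The HITS step can be sketched as follows. This is a minimal illustration of hub/authority power iteration on a tiny directed graph; the node names, edges and iteration count are invented for illustration and do not reflect AGDISTIS's actual candidate graph construction:

```python
# Minimal sketch of the HITS algorithm on a small directed graph.
# The graph below is a toy example, not AGDISTIS's real candidate graph.

def hits(nodes, edges, iterations=50):
    """Compute hub and authority scores via power iteration."""
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of nodes linking in.
        auth = {n: sum(hub[u] for u, v in edges if v == n) for n in nodes}
        norm = sum(auth.values()) or 1.0
        auth = {n: s / norm for n, s in auth.items()}   # normalise to sum 1
        # Hub score: sum of authority scores of nodes linked to.
        hub = {n: sum(auth[v] for u, v in edges if u == n) for n in nodes}
        norm = sum(hub.values()) or 1.0
        hub = {n: s / norm for n, s in hub.items()}
    return hub, auth

nodes = ["Paris", "Paris_Hilton", "France"]
edges = [("Paris", "France"), ("France", "Paris")]
hub, auth = hits(nodes, edges)
# "Paris" and "France" reinforce each other; the isolated candidate
# "Paris_Hilton" receives no authority.
assert auth["Paris_Hilton"] == 0.0
```

In the actual system the graph connects candidate entities via knowledge-base relations, and the highest-authority candidate per mention is selected.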
The core of this approach lies in the use of random walks and a densest subgraph algorithm to tackle the word sense disambiguation and entity linking tasks in a multilingual setting thanks to the BabelNet semantic network. Babelfy has been evaluated using six datasets: three from earlier SemEval tasks, one from a Senseval task and two already used for evaluating AIDA. All of them are available online but distributed throughout the web. Additionally, the authors offer a webservice limited to 100 requests per day which are extensible for research purposes.
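The densest-subgraph component can be illustrated with the standard greedy "peeling" heuristic (repeatedly remove the minimum-degree node and keep the intermediate subgraph with the best average degree). The toy graph below is an invented example and the sketch omits Babelfy's random-walk-based graph construction:

```python
# Greedy peeling heuristic for the densest-subgraph problem:
# drop the minimum-degree node at each step, keep the subgraph with the
# highest density (edges per node) seen along the way. Toy example only.

def densest_subgraph(adj):
    """adj: dict mapping node -> set of neighbours (undirected graph)."""
    adj = {n: set(ns) for n, ns in adj.items()}         # work on a copy
    nodes = set(adj)
    best = set(nodes)
    best_density = sum(len(ns) for ns in adj.values()) / (2 * len(nodes))
    while len(nodes) > 1:
        u = min(nodes, key=lambda n: len(adj[n]))       # peel min-degree node
        nodes.remove(u)
        for v in adj.pop(u):
            adj[v].discard(u)
        density = sum(len(ns) for ns in adj.values()) / (2 * len(nodes))
        if density > best_density:
            best, best_density = set(nodes), density
    return best

# A 4-clique {a, b, c, d} with a pendant node e: the clique is densest.
graph = {"a": {"b", "c", "d", "e"}, "b": {"a", "c", "d"},
         "c": {"a", "b", "d"}, "d": {"a", "b", "c"}, "e": {"a"}}
assert densest_subgraph(graph) == {"a", "b", "c", "d"}
```

In Babelfy's setting the nodes are candidate senses and the edges come from the BabelNet semantic network, so the densest subgraph corresponds to the most mutually coherent interpretation.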
One of the first semantic approaches was published in 2011. The framework combines NER and NED based upon DBpedia. Using a vector-space representation of entities and the cosine similarity, this approach offers a public (NIF-based) webservice as well as its evaluation dataset online.
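The core idea of ranking candidates by cosine similarity between a mention's context and each entity's vector representation can be sketched as follows; the context, candidate names and term counts are invented for illustration:

```python
# Sketch of disambiguation via cosine similarity between a mention's
# context vector and bag-of-words vectors of candidate entities.
# Vocabulary and candidates below are illustrative assumptions.

import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

context = Counter("the capital of france on the seine".split())
candidates = {
    "Paris":        Counter("capital france city seine".split()),
    "Paris_Hilton": Counter("celebrity hotel heiress".split()),
}
best = max(candidates, key=lambda e: cosine(context, candidates[e]))
assert best == "Paris"   # context overlaps with the city, not the person
```

Real systems weight terms (e.g. with TF-ICF or TF-IDF variants) rather than using raw counts, but the ranking principle is the same.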
This approach is an open-source implementation of an entity disambiguation framework. The system was designed to simplify the implementation of entity linking approaches and allows individual parts of the process to be replaced. The authors implemented several state-of-the-art disambiguation methods. The results in this paper are obtained using an implementation of the original TagMe disambiguation function. Moreover, Ceccarelli et al. provide the source code as well as a webservice.
In 2011, an NED approach for entities from Wikipedia was presented. In this article, the authors compare local approaches, e.g., using string similarity, with global approaches, which use context information and ultimately lead to better results. The authors provide their datasets as well as their software "Illinois Wikifier" online. Since "Illinois Wikifier" is currently only available as a local binary and GERBIL is solely based on webservices, we excluded it from GERBIL for the sake of comparability and server load.
This approach is the successor of the approach introduced in 2013 by Steinmetz et al., which is based on a fine-granular context model taking into account heterogeneous text sources as well as texts created by automated multimedia analysis. The source texts can have different levels of accuracy, completeness, granularity and reliability, which influence the determination of the current context. Ambiguity is resolved by selecting the entity candidates with the highest probability according to the predetermined context. The new implementation begins with the detection of groups of consecutive words (n-gram analysis) and a lookup of all potential DBpedia candidate entities for each n-gram. The disambiguation of candidate entities is based on a scoring cascade. KEA is available as a NIF-based webservice.
In 2013, Rizzo et al. proposed an approach for entity recognition tailored to extracting entities from tweets. The approach relies on a machine learning classification of the entity type given a rich feature vector composed of a set of linguistic features, the output of a properly trained Conditional Random Fields classifier, and the output of a set of off-the-shelf NER extractors supported by the NERD Framework. The follow-up, NERD-ML, improved the classification task by redesigning the feature selection, and the authors reported experiments on both micropost and newswire domains. NERD-ML has a public webservice which is part of GERBIL.
PBOH is a pure entity disambiguation approach (D2KB) based on light statistics from the English Wikipedia corpus. The authors develop a probabilistic graphical model using pairwise Markov Random Fields to address the problem of Entity Disambiguation. They show that pairwise co-occurrence statistics of words and entities are enough to obtain comparable or better performance than heavy feature engineered systems. They employ loopy belief propagation to perform inference at test time. PBOH can only be used for the D2KB task.
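The pairwise model can be illustrated with a toy MAP inference over two mentions. PBOH itself performs inference with loopy belief propagation over a pairwise Markov Random Field; for tiny candidate sets exhaustive search over joint assignments suffices to show the idea. All scores below are invented:

```python
# Toy MAP inference in a pairwise model: each mention has candidate
# entities with unary (mention-entity) log-scores; entity pairs carry
# pairwise co-occurrence log-scores. PBOH uses loopy belief propagation;
# here exhaustive search over two mentions illustrates the objective.
# All numbers are made up for illustration.

from itertools import product

unary = {
    "Paris":  {"Paris": -0.5, "Paris_Hilton": -1.5},
    "France": {"France": -0.2},
}
pairwise = {  # symmetric entity-entity co-occurrence log-scores
    frozenset(["Paris", "France"]): 0.8,
    frozenset(["Paris_Hilton", "France"]): -1.0,
}

def map_assignment(unary, pairwise):
    mentions = list(unary)
    best, best_score = None, float("-inf")
    for combo in product(*(unary[m] for m in mentions)):
        score = sum(unary[m][e] for m, e in zip(mentions, combo))
        score += sum(pairwise.get(frozenset([a, b]), 0.0)
                     for i, a in enumerate(combo) for b in combo[i + 1:])
        if score > best_score:
            best, best_score = dict(zip(mentions, combo)), score
    return best

# The co-occurrence term pulls "Paris" toward the city reading.
assert map_assignment(unary, pairwise) == {"Paris": "Paris", "France": "France"}
```

Exhaustive search is exponential in the number of mentions, which is why the authors resort to loopy belief propagation for real documents.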
"Probabilistic Bag-of-Hyperlinks Model for Entity Linking", proc. WWW'16, http://dl.acm.org/citation.cfm?id=2882988
TagMe 2 was published in 2012 and is based on a directory of links, pages and an inlink graph from Wikipedia. The approach recognizes named entities by matching terms with Wikipedia link texts and disambiguates the match using the in-link graph and the page dataset. Afterwards, TagMe 2 prunes identified named entities which are considered non-coherent with the rest of the named entities in the input text. The authors publish a key-protected webservice as well as their datasets online. The source code, licensed under the Apache 2 license, can be obtained directly from the authors. The datasets comprise only fragments of full documents of 30 words or fewer and are not part of the current version of GERBIL.
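The spotting step (matching terms with Wikipedia link texts) can be sketched as a lookup of text n-grams in an anchor dictionary, with candidates ranked by "commonness", i.e. how often the anchor text links to each page. The tiny dictionary below is an invented assumption; TagMe builds it from a full Wikipedia dump:

```python
# Sketch of anchor-based spotting: match n-grams against a dictionary of
# Wikipedia link texts and rank candidates by commonness (link frequency).
# The dictionary here is a toy assumption, not real Wikipedia statistics.

anchors = {  # anchor text -> {candidate entity: link count}
    "paris": {"Paris": 900, "Paris_Hilton": 100},
    "france": {"France": 1000},
}

def spot(text, max_ngram=3):
    tokens = text.lower().split()
    spots = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + 1 + max_ngram, len(tokens) + 1)):
            ngram = " ".join(tokens[i:j])
            if ngram in anchors:
                counts = anchors[ngram]
                total = sum(counts.values())
                top = max(counts, key=counts.get)
                # Record (anchor, most common entity, commonness score).
                spots.append((ngram, top, counts[top] / total))
    return spots

spots = spot("Paris is the capital of France")
assert ("paris", "Paris", 0.9) in spots
assert ("france", "France", 1.0) in spots
```

TagMe 2 then refines these commonness-ranked candidates collectively using relatedness over the in-link graph before pruning non-coherent annotations.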
WAT is the successor of TagMe. The new annotator includes a re-design of all TagMe components, namely the spotter, the disambiguator, and the pruner. Two disambiguation families were newly introduced: graph-based algorithms for collective entity linking and vote-based algorithms for local entity disambiguation (based on the work of Ferragina et al.). The spotter and the pruner can be tuned using SVM linear models. Additionally, the library can be used as a D2KB-only system by feeding appropriate mention spans to the system.
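The vote-based family can be sketched as follows: each candidate entity of a mention collects votes from the candidates of every other mention, weighted by a semantic relatedness score, and the best-voted candidate wins locally. The relatedness table below is invented for illustration:

```python
# Sketch of vote-based local disambiguation in the spirit of TagMe/WAT:
# a candidate's score is the sum of relatedness votes it receives from
# the candidates of all other mentions. Relatedness values are made up.

relatedness = {
    frozenset(["Paris", "France"]): 0.9,
    frozenset(["Paris_Hilton", "France"]): 0.1,
}

def rel(a, b):
    return relatedness.get(frozenset([a, b]), 0.0)

def vote(candidates):
    """candidates: mention -> list of candidate entities."""
    chosen = {}
    for mention, cands in candidates.items():
        scores = {}
        for c in cands:
            # Collect relatedness votes from all other mentions' candidates.
            scores[c] = sum(rel(c, other)
                            for m, cs in candidates.items() if m != mention
                            for other in cs)
        chosen[mention] = max(scores, key=scores.get)
    return chosen

result = vote({"Paris": ["Paris", "Paris_Hilton"], "France": ["France"]})
assert result == {"Paris": "Paris", "France": "France"}
```

In the full system, relatedness is computed from the Wikipedia link graph (e.g. a Milne-Witten-style measure) rather than a hand-written table.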
This approach was introduced in 2008 and is based on different features such as prior probabilities, context relatedness and quality, which are then combined and tuned using a classifier. The authors evaluated their approach on a subset of the AQUAINT dataset. They provide the source code for their approach as well as a webservice, which is available in GERBIL.
Note that the Wikipedia Miner webservice is not available anymore!