This release introduces the following:
- Upgrades Scala to version 2.10 and cleans up some dependencies.
- Fixes a bug related to duplicated TokenType IDs, thanks to @Lugrin.
This release lays the groundwork for the new Vector Models developed at GSoC 2015.
Main improvements of this version:
- smaller and much faster models through quantization of counts, optimization of search, and some pruning (see memory usage here)
- better handling of case
- various fixes in Spotlight and PigNLProc
- models can now be created without requiring a Hadoop and Pig installation:

      git clone https://github.com/dbpedia-spotlight/model-quickstarter
      cd model-quickstarter
      ./index_db.sh -l wdir nl_NL nl/stopwords.list Dutch models/nl

- support for confidence value
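To illustrate what "quantization of counts" can mean in general, here is a minimal Python sketch. It is not Spotlight's actual implementation: the function names `quantize` and `dequantize` and the logarithmic base of 1.25 are assumptions chosen for illustration. The idea is that each raw count is replaced by a small bucket index that fits in one byte, at the cost of a bounded relative error when the count is read back.

```python
import math
from array import array

BASE = 1.25  # hypothetical bucket growth factor; larger means fewer buckets, more error


def quantize(count: int, base: float = BASE) -> int:
    """Map a non-negative count to a small bucket index on a log scale."""
    if count <= 0:
        return 0
    return 1 + int(math.log(count, base))


def dequantize(bucket: int, base: float = BASE) -> int:
    """Return the representative (lower-bound) count for a bucket index."""
    if bucket == 0:
        return 0
    return round(base ** (bucket - 1))


# Store bucket indices as unsigned bytes instead of full-width integers.
raw_counts = [0, 1, 7, 100, 12345, 1_000_000]
stored = array("B", (quantize(c) for c in raw_counts))
restored = [dequantize(b) for b in stored]
```

With this scheme each stored entry takes one byte, and every restored value is at most the original count and within roughly a factor of `BASE` of it. Whether a given model tolerates that error depends on how the counts are used downstream.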
This version breaks model compatibility with the previous version, so new models are available here.
Raw model data
In addition to those, we also re-ran the count collection for most languages with DBpedia 3.9 and are making those raw counts available here.
- Spotlight Model Editor by Idio