This repository has been archived by the owner on Oct 20, 2018. It is now read-only.

Build from Source with Maven

Jagadeesha edited this page Sep 19, 2017 · 31 revisions

Requirements

  • Java 1.7+ - see this if you have compile/build issues
  • Scala 2.10+
  • Maven (see note on versions below)
  • Git
  • RAM of appropriate size for the spotter lexicon you need

Running DBpedia Spotlight Server with Maven

Install pre-requisites:

NOTE: The latest version (0.6.5 and newer) builds only with Maven 3.

  sudo apt-get install git maven3

If you also want to run the demo on your server, install Apache:

  sudo apt-get install apache2

Check out all the code using the command:

  git clone https://github.com/dbpedia-spotlight/dbpedia-spotlight.git

Run install through Maven

  cd dbpedia-spotlight-*
  mvn install

This mvn install from the parent pom.xml is important because it runs install-file for some jars distributed alongside the source code.

To run the web service on your machine after installing the software, you also need the disambiguation index and the spotter lexicon. Get the necessary files from http://spotlight.dbpedia.org/download/ and change the conf/server.properties file to point to them.

The RAM you need depends on the files you choose (small, medium, large). With the largest dictionary, you will need close to 16GB of RAM. The heap size can be configured within pom.xml inside the rest directory.

Then start the server from the rest directory:

  mvn scala:run '-DaddArgs=../conf/server.properties'

Generating DBpedia Spotlight Data with Maven

Get DBpedia Extraction

  hg clone http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework
  cd extraction_framework
  mvn install

Edit the file dbpedia-spotlight-latest/pom.xml and leave only the modules core, index, rest and demo.

Run install through Maven

  cd dbpedia-spotlight-*
  mvn install

Follow the instructions in dbpedia-spotlight-*/bin/index.sh. See also the Data Generation Manual and the Internationalization Manual to learn more about the steps to create your own datasets for DBpedia Spotlight.

FAQ

Some frequently observed errors are collected below.

Whatever build error you get, check maven version

If you experience problems with missing dependencies while running mvn install on the project, check your installed version of Maven (mvn -version). Versions 0.6.5 and newer build only with Maven 3.
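The version check can also be scripted. The following is a minimal sketch in POSIX shell, assuming the first line of the mvn -version output has the form "Apache Maven X.Y.Z ...". The function names maven_major_version and is_maven3_or_newer are illustrative and are not part of DBpedia Spotlight or Maven.

```shell
#!/bin/sh
# Read `mvn -version` output on stdin and print the major version number.
# Assumes a first line such as "Apache Maven 3.0.4 (...)".
maven_major_version() {
  sed -n 's/^Apache Maven \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Succeed only when the version read from stdin is Maven 3 or newer.
is_maven3_or_newer() {
  major=$(maven_major_version)
  [ -n "$major" ] && [ "$major" -ge 3 ]
}
```

A possible use is mvn -version | is_maven3_or_newer || echo "Maven 3 is required" >&2 before starting a build.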

Cannot find (maven) model file

Error:

  org.apache.maven.reactor.MavenExecutionException: Could not find the model file '/usr/local/spotlight/trunk/jung'. for project unknown

Solution: Remove the modules you do not need from the parent pom.xml. The only modules required for running the web service are core, rest and demo (demo only if you want the HTML interface as well). The only modules required for indexing are core and index.
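To see which module entries trigger this error, it can help to list the modules named in the parent pom.xml that have no matching directory. The sketch below is one way to do that; missing_modules is a hypothetical helper, and it assumes one <module>...</module> element per line. It does not recognise XML comments, so a commented-out module is still reported.

```shell
#!/bin/sh
# Print every <module> entry of the given pom.xml whose directory is missing.
# Module directories are looked up next to the pom.xml file.
missing_modules() {
  pom=$1
  base=$(dirname "$pom")
  sed -n 's:.*<module>\([^<]*\)</module>.*:\1:p' "$pom" |
  while IFS= read -r module; do
    [ -d "$base/$module" ] || printf '%s\n' "$module"
  done
}
```

Running missing_modules pom.xml from the project root prints one module name per line. Each name printed is a candidate for removal from the modules section.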

Memory error

Error:

  Memory error, heap space

You may need to update your pom.xml with adequate heap space for the dictionary file you are using.

  <properties>
    <heapspace.Xmx.server>-Xmx16g</heapspace.Xmx.server>
  </properties>
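
Before raising the heap size, it is worth checking that the machine actually has that much physical memory. The following sketch is Linux-only because it reads /proc/meminfo. The helper names xmx_to_mb, total_ram_mb and heap_fits_in_ram are assumptions made for illustration.

```shell
#!/bin/sh
# Convert a JVM heap flag such as -Xmx16g or -Xmx2048m to megabytes.
xmx_to_mb() {
  value=${1#-Xmx}
  number=${value%[gGmM]}
  case $value in
    *[gG]) echo $((number * 1024)) ;;
    *[mM]) echo "$number" ;;
    *) echo "unsupported heap value: $1" >&2; return 1 ;;
  esac
}

# Total physical memory in megabytes (Linux only: reads /proc/meminfo).
total_ram_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Succeed only when the requested heap is no larger than physical memory.
heap_fits_in_ram() {
  [ "$(xmx_to_mb "$1")" -le "$(total_ram_mb)" ]
}
```

For example, heap_fits_in_ram -Xmx16g || echo "not enough RAM for this heap size" >&2 warns when a smaller dictionary or a smaller heap setting is needed.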

How much memory?

The memory requirements are directly tied to your target lexicon, as our most rudimentary implementation loads the entire lexicon into memory in order to speed up spotting.

You can build a dictionary of People, Locations and Organizations with about 200M of RAM. See the one that I included in the distribution, for example. http://dbp-spotlight.svn.sourceforge.net/viewvc/dbp-spotlight/tags/release-0.5/dist/src/deb/control/data/usr/share/dbpedia-spotlight/spotter.dict?view=log

You can also download the dictionary built from URIs that occurred more than 75 times in Wikipedia: http://spotlight.dbpedia.org/download/release-0.4/surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary.gz

This should load with a lot less RAM (maybe 5x less) than the one we use in production, and it will still spot the most important things.

See: http://sourceforge.net/mailarchive/message.php?msg_id=28255247

Could not resolve dependencies

For some dependencies that either did not have a Maven repo or that we had to patch, we distribute the jars alongside our code, and install them via install-file in the parent pom.xml. Make sure you run mvn install from the parent directory (e.g. /home/user/workspace/dbpedia-spotlight-*/).

Error:

  (Failed to execute goal on project core: Could not resolve dependencies for project org.dbpedia.spotlight:core:jar:0.5)  dependencies are missing for:
  org.semanticweb.yars:nx-parser:jar:1.1
  com.aliasi:lingpipe:jar:4.0.0
  edu.umd:cloud9:jar:SNAPSHOT
  weka:weka:jar:3.7.3

Solution:

  cd /home/user/workspace/dbpedia-spotlight-*/
  mvn install

In case you are using Maven3 and still could not solve the problem, refer to http://sourceforge.net/mailarchive/forum.php?thread_name=CA%2B3KvkOfTzMsdwUutx625WZK6VOJApADyKatmwQo2Gv49AbmqQ%40mail.gmail.com&forum_name=dbp-spotlight-users

Problems with installation of rest/ jersey dependencies

Problems with jersey dependencies: "This is due to the glassfish repository, which is hardcoded in the jersey-server-1.1.5.pom, returning a junk artifact (some HTML with an nginx message instead of a real pom).

You can work around this by adding this to the "mirrors" section of your $HOME/.m2/settings.xml:

  <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                      http://maven.apache.org/xsd/settings-1.0.0.xsd">
    ...
    <mirrors>
      <mirror>
        <id>glassfish-mirror</id>
        <name>glassfish mirror</name>
        <url>http://maven.nuxeo.org/nexus/content/repositories/public-releases</url>
        <mirrorOf>glassfish-repository</mirrorOf>
      </mirror>
    </mirrors>
    ...
  </settings>

and removing all "com.sun.jersey" artifacts from your local repository (rm -rf ~/.m2/repository/com/sun/jersey)." (Source: http://answers.nuxeo.com/questions/2195/cant-build-nuxeo-source-nuxeo-webengine-jax-rs-jersey-server-error)

Cannot find parent

If this problem occurs when installing DBpedia Spotlight, try running the following in the root folder of the project:

 1) mvn --non-recursive clean install
 2) mvn clean install

Cannot find spotter file

The server needs the dictionary and other data files. Download all of these files and edit the paths in server.properties to point to them. If you do not know which files are needed, you can simply download the quick start jar version, look into its data folder and compare it with its server.properties. You may also need a stopwords file.
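
A quick way to find a wrong path is to test every path-like value in server.properties against the file system. The sketch below does not depend on specific property names. It treats any value containing a slash (and no "://") as a path, which is an assumption and may not match every setting. The helper name missing_property_paths is hypothetical.

```shell
#!/bin/sh
# List "key = value" entries of a Java .properties file whose value looks
# like a file path but does not exist on disk.
# $1: properties file; $2: base directory for relative paths (default: .)
missing_property_paths() {
  props=$1
  base=${2:-.}
  # Drop comment lines, keep only lines with '=', split at the first '='.
  grep -v '^[[:space:]]*[#!]' "$props" | grep '=' |
  while IFS='=' read -r key value; do
    # Trim surrounding whitespace from key and value.
    key=$(printf '%s' "$key" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    value=$(printf '%s' "$value" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    case $value in
      *://*) continue ;;          # URLs are not local files
      /*) path=$value ;;          # absolute path
      */*) path=$base/$value ;;   # relative path
      *) continue ;;              # plain values such as numbers or names
    esac
    [ -e "$path" ] || printf '%s = %s\n' "$key" "$value"
  done
}
```

Since the server is started from the rest directory with ../conf/server.properties, relative paths would presumably be resolved from there, for example missing_property_paths ../conf/server.properties . when run inside rest.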
