
Create Mapping Statistics

Daniel Fleischhacker edited this page May 23, 2014 · 5 revisions


This guide describes how to generate the mapping statistics shown at http://mappings.dbpedia.org/server/statistics/

Process

  1. Update the extraction framework to the newest version from GitHub
  2. Make sure the newest versions of all modules are compiled and installed locally (install-run)
  3. Download the most recent ontology from the mappings wiki
    1. cd core
    2. ../run download-ontology
    3. Commit the new ontology version
  4. Download the most recent mappings from the mappings wiki
    1. cd ../core
    2. ../run download-mappings
    3. Commit the new mappings
  5. Download the most recent Wikipedia dumps
    1. cd ../dump
    2. Choose one of the download.*.properties files, based on the set of relevant Wikipedia language versions
    3. Adapt the download path in that properties file
    4. ../run download config=download.*.properties
  6. Start an extraction limited to the data required for the mapping statistics
    1. cd ../dump
    2. In extraction.stats.properties, set "base-dir" to the download directory
    3. Adapt the "source" parameter: contrary to what is stated, its default is .xml, not .xml.bz2, so set it explicitly if your dumps are .xml.bz2 files
    4. ../run stats-extraction extraction.stats.properties
  7. Start the statistics extraction
    1. cd ../server
    2. In pom.xml, set the base directory of the launcher "stats" to the download directory used in the previous step
    3. ../run stats
  8. If you want to run the statistics server on a different system from the one that generated the statistics, copy the mappingstats_* files from the folder server/main/src/statistics/ on the generating system to the same folder on the hosting system
  9. Start the statistics server
    1. cd ../server
    2. In pom.xml, adapt the server URI of the launcher "server"
    3. To prefer IPv4 on a machine that also supports IPv6, first define the environment variable _JAVA_OPTIONS: export _JAVA_OPTIONS='-Djava.net.preferIPv4Stack=true'
    4. ../run server
    5. The server is now available at the URI defined in pom.xml
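The command-line parts of steps 3 to 7 can be chained in a small script. The following is a minimal sketch that assumes it is sourced from the root of an extraction-framework checkout (where the run script lives) and that the properties files have already been adapted as described in steps 5 and 6. The function names stats_pipeline and run_cmd and the variables DOWNLOAD_CONFIG and DRY_RUN are hypothetical and not part of the framework. Committing the ontology and mappings is left as a manual step, and the server (step 9) is not started because it keeps running.

```shell
# Hypothetical helper, not part of the extraction framework.
# Assumes the current directory is the extraction-framework root and that
# download.*.properties and extraction.stats.properties are already adapted.

# Run a command, or only print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Steps 3 to 7; DOWNLOAD_CONFIG names the chosen download.*.properties file.
# Each step runs in a subshell so the working directory of the caller is kept.
stats_pipeline() {
  config=${DOWNLOAD_CONFIG:?set DOWNLOAD_CONFIG first}
  # Steps 3 and 4: ontology and mappings from the mappings wiki
  (cd core && run_cmd ../run download-ontology &&
      run_cmd ../run download-mappings) || return 1
  # Steps 5 and 6: Wikipedia dumps, then the reduced extraction
  (cd dump && run_cmd ../run download "config=$config" &&
      run_cmd ../run stats-extraction extraction.stats.properties) || return 1
  # Step 7: compute the mapping statistics
  (cd server && run_cmd ../run stats) || return 1
}
```

Calling DRY_RUN=1 DOWNLOAD_CONFIG=<chosen download.*.properties file> stats_pipeline prints the five commands without running them, which is a cheap way to check the sequence before starting a long download and extraction.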
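Step 6 adapts "base-dir" and "source" in extraction.stats.properties by hand. If you prefer to script these edits, here is a minimal sketch that assumes the file uses plain key=value lines. The helper set_property is hypothetical, and the example values below are placeholders to replace with your own.

```shell
# set_property FILE KEY VALUE (hypothetical helper)
# Rewrites the line "KEY=..." (spaces around "=" allowed) in a properties file
# to "KEY=VALUE", or appends "KEY=VALUE" if no such line exists. Commented-out
# lines starting with "#" are left alone because the match is anchored at the
# start of the line. Limitations: KEY is used as a regular expression, and
# VALUE must not contain "|", "&" or backslashes because it goes verbatim
# into a sed replacement.
set_property() {
  if grep -q "^$2[[:space:]]*=" "$1"; then
    # -i.bak works with both GNU and BSD sed; the backup is removed afterwards.
    sed -i.bak "s|^$2[[:space:]]*=.*|$2=$3|" "$1" && rm -f "$1.bak"
  else
    printf '%s=%s\n' "$2" "$3" >> "$1"
  fi
}
```

For example, run from the framework root: set_property dump/extraction.stats.properties base-dir /data/wikipedia and set_property dump/extraction.stats.properties source pages-articles.xml.bz2. The directory and the source file name here are assumed values.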