Home

tep edited this page Jun 19, 2013 · 20 revisions

##Welcome to Glimmer

Glimmer is a Hadoop-based distributed indexing system for building MG4J indexes from RDF tuples in NQuads format. It also includes a simple web application for querying the resulting indexes. Both the indexing code and the web application are written in Java. There are also a few shell scripts that execute the steps needed to build the indexes for a given NQuads file and to query the resulting index from the command line.

Glimmer is an academic project derived from the implementation of distributed indexing detailed in the paper 'Distributed Indexing for Semantic Search' by Peter Mika (Yahoo Research).

##Prerequisites

  • Java JDK 6. Other versions may work; the code was written against version 1.6.0_31. Note that MG4J 5.1 uses memory-mapped file access on the local machine and, to a lesser extent, in Hadoop. To build indexes successfully, use a version of Java that has a 64-bit data model. Otherwise you may get "Map failed" errors, especially when merging lots of indexes.
  • A Maven installation. Version 3.0.3 was used during development.
  • A Hadoop cluster, probably version 0.23.x. The version of Hadoop the code is currently developed against is defined in the Maven pom.xml file.
  • If you want to try out the web app, a Java servlet container such as Tomcat or Jetty.
  • If you want a more usable interface to the MG4J query command line, you can install rlwrap to get command-line history and editing.

All other dependencies are jars that are automatically downloaded by Maven.
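
To check the 64-bit requirement before building indexes, a small standalone program along the following lines may help. It is only a sketch and not part of Glimmer: the class name `DataModelCheck` and the helper `is64Bit` are made up for illustration. It relies on the `sun.arch.data.model` system property, which HotSpot-based JVMs set to `32`, `64` or `unknown` but which other JVMs may not define, and falls back to a rough guess from `os.arch` in that case.

```java
// Hypothetical helper, not part of Glimmer: reports whether the running JVM
// appears to use a 64-bit data model.
class DataModelCheck {

    /**
     * Decides whether the JVM looks 64-bit.
     *
     * @param dataModel value of the "sun.arch.data.model" property (may be null)
     * @param osArch    value of the "os.arch" property (may be null)
     */
    static boolean is64Bit(String dataModel, String osArch) {
        // Prefer the explicit data model when the JVM reports one.
        if (dataModel != null && !dataModel.trim().equals("unknown")) {
            return dataModel.trim().equals("64");
        }
        // Fallback heuristic: architecture names such as "amd64", "x86_64"
        // or "aarch64" contain "64"; names such as "x86" or "i386" do not.
        return osArch != null && osArch.contains("64");
    }

    public static void main(String[] args) {
        String dataModel = System.getProperty("sun.arch.data.model");
        String osArch = System.getProperty("os.arch");

        System.out.println("java.version        = " + System.getProperty("java.version"));
        System.out.println("sun.arch.data.model = " + dataModel);
        System.out.println("os.arch             = " + osArch);

        if (is64Bit(dataModel, osArch)) {
            System.out.println("This JVM appears to use a 64-bit data model.");
        } else {
            System.out.println("This JVM does not appear to be 64-bit; "
                    + "memory-mapped index files may fail with 'Map failed'.");
        }
    }
}
```

Compile it with `javac DataModelCheck.java` and run it with `java DataModelCheck`, using the same Java installation that will run the indexing jobs. The `os.arch` fallback is a heuristic, so treat a positive result from it as a hint rather than a guarantee.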


Please see the following sections for more information.