This repository has been archived by the owner on Apr 18, 2019. It is now read-only.
tep edited this page Jan 24, 2013 · 20 revisions

##Welcome to Glimmer

Glimmer is a Hadoop-based distributed indexing system for building MG4J indexes from RDF tuples in NQuads format. It also includes a simple web application for querying the resulting indexes (TODO: add something about the web application and its scorer). Both the indexer and the web application are written in Java. There are also a few shell scripts that execute the steps needed to build the indexes for a given NQuads file and to query the resulting index from the command line.
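
For readers unfamiliar with the input format: an NQuads file holds one statement per line, made up of a subject, predicate, object and context (graph) term followed by a full stop. The sketch below is only an illustration of that shape and is not taken from Glimmer's own parser. It assumes the simplest case, where all four terms are IRIs in angle brackets; literals, blank nodes and comments are deliberately not handled. The class name `NQuadSketch` and the example IRIs are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NQuadSketch {
    // Matches a line of the form: <subject> <predicate> <object> <context> .
    // Assumption: every term is an IRI; literals and blank nodes are ignored here.
    private static final Pattern IRI_ONLY_QUAD = Pattern.compile(
            "^\\s*<([^>]*)>\\s+<([^>]*)>\\s+<([^>]*)>\\s+<([^>]*)>\\s*\\.\\s*$");

    /**
     * Returns {subject, predicate, object, context} for an IRI-only quad,
     * or null when the line does not have that shape.
     */
    static String[] parse(String line) {
        Matcher m = IRI_ONLY_QUAD.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String line = "<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> "
                + "<http://example.org/bob> <http://example.org/graph1> .";
        String[] quad = parse(line);
        if (quad == null) {
            System.out.println("Not an IRI-only quad: " + line);
        } else {
            System.out.println("subject   = " + quad[0]);
            System.out.println("predicate = " + quad[1]);
            System.out.println("object    = " + quad[2]);
            System.out.println("context   = " + quad[3]);
        }
    }
}
```

A real NQuads file will also contain literal objects and possibly blank nodes, so a proper RDF parser is needed for anything beyond inspecting a few lines by hand.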

Glimmer is an academic project. It implements the distributed indexing described in the paper 'Distributed Indexing for Semantic Search' by Peter Mika (Yahoo Research).

##Prerequisites

  • Java JDK 6. Other versions may work; the code was written against version 1.6.0_31. Note that MG4J 5.1 uses memory-mapped file access on the local machine and, to a lesser extent, in Hadoop. For index building to work well, use a version of Java that has a 64-bit data model. Otherwise you may get "Map failed" errors, especially when merging lots of indexes.
  • A Maven installation. Version 3.0.3 was used during development.
  • A Hadoop cluster, probably version 0.23.x. The version of Hadoop the code was developed with is defined in the pom.xml file.
  • If you want to try out the web app, an installation of a Java servlet container such as Tomcat or Jetty.
  • If you want a more usable interface to the MG4J query command line, you can install rlwrap to get command-line history and editing.

All other dependencies are jars that are automatically downloaded by Maven.
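
To check whether a given Java installation has the 64-bit data model recommended above, something like the following sketch can be run with that JVM. It is not part of Glimmer; the class name `DataModelCheck` is hypothetical. It relies on the `sun.arch.data.model` system property, which Sun/Oracle and OpenJDK JVMs set to "32" or "64" but which other JVMs may not provide. In that case the sketch falls back to a rough guess based on the standard `os.arch` property.

```java
public class DataModelCheck {
    /**
     * Decides whether the JVM looks 64-bit.
     * dataModel: value of sun.arch.data.model (may be null or "unknown").
     * osArch: value of os.arch, used only as a heuristic fallback.
     */
    static boolean is64Bit(String dataModel, String osArch) {
        if (dataModel != null) {
            String trimmed = dataModel.trim();
            if ("64".equals(trimmed)) {
                return true;
            }
            if ("32".equals(trimmed)) {
                return false;
            }
        }
        // Heuristic assumption: 64-bit architectures usually have "64" in
        // their name (amd64, x86_64, aarch64, ppc64).
        return osArch != null && osArch.contains("64");
    }

    public static void main(String[] args) {
        String dataModel = System.getProperty("sun.arch.data.model");
        String osArch = System.getProperty("os.arch");
        System.out.println("java.version        = " + System.getProperty("java.version"));
        System.out.println("sun.arch.data.model = " + dataModel);
        System.out.println("os.arch             = " + osArch);
        if (is64Bit(dataModel, osArch)) {
            System.out.println("This JVM appears to use a 64-bit data model.");
        } else {
            System.out.println("This JVM appears to be 32-bit; memory-mapped index files may fail to map.");
        }
    }
}
```

Compile and run it with the same `java` binary that will be used for index building, since a machine can have both 32-bit and 64-bit JVMs installed.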

###Building Indices

###Querying Indices

###The Web App

###Future Improvements
