This repository has been archived by the owner on Apr 18, 2019. It is now read-only.
Tim P. edited this page Jun 19, 2013 · 5 revisions

# To Do...

### A short list of issues and improvement/feature suggestions in no particular order...

  • Simplify horizontal & vertical indexing code. The current use of RDFDocument objects to "present" the subjects for building horizontal and vertical indexes is a vestige of the way indexes are built using MG4J, and it is unnecessarily complex. Reading subjects into the mapper is now independent of MG4J and of the index writing that goes on in the reducer, so having distinct mappers for horizontal and vertical index generation would be simpler to maintain.

  • Add some kind of simple 'join' functionality. Currently, finding, say, 'all articles by a given author' requires the user to enter two queries: the first to find the author's URL/BNode, the second to query all subjects with that author, i.e. author:<URL|BNode>. There is some experimental code for MG4J that may permit this, although it is not in the current MG4J release. Alternatively, adding the logic to the web app may be an option, although it would be quite complex.

  • Handling of 'range' types. Everything other than URLs and BNodes is currently tokenised as regular text. This makes some things difficult or impossible to query. Ideally a user would be able to query by date range, or for values greater than some number.

  • Handling of dates. Dates are tokenised as regular text. This makes MG4J parallel queries complex, e.g. (predicate:http://schema.org/startdate ^ object:2011) & (predicate:http://schema.org/startdate ^ object:07) to find all subjects with a startDate containing '2011' and '07'.
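
The two workarounds described in the list above (matching a date component by component, and the two-query author lookup) could be sketched roughly as follows. This is only an illustration in Python, not code from this project: `component_query`, `two_step_join` and the `search` callable are hypothetical names, and `search` stands in for whatever actually runs a query string against the index and returns matching subject identifiers. The query syntax is copied from the examples in the list.

```python
from typing import Callable, Iterable, List


def component_query(predicate: str, components: Iterable[str]) -> str:
    """Build one '(predicate:P ^ object:C)' clause per component, joined with '&'.

    Mirrors the startdate example above; the helper itself is hypothetical.
    """
    clauses = [f"(predicate:{predicate} ^ object:{c})" for c in components]
    return " & ".join(clauses)


def two_step_join(search: Callable[[str], List[str]], author_query: str) -> List[str]:
    """Emulate a join with two rounds of queries.

    `search` is an assumed callable that takes a query string and returns a
    list of matching subject identifiers (URLs or BNodes).
    """
    subjects: List[str] = []
    # Step 1: resolve the author(s) to their URL/BNode identifiers.
    for author_id in search(author_query):
        # Step 2: query all subjects that reference that author.
        for subject in search(f"author:{author_id}"):
            if subject not in subjects:  # keep first-seen order, drop duplicates
                subjects.append(subject)
    return subjects
```

For example, `component_query("http://schema.org/startdate", ["2011", "07"])` produces the date query shown in the last item. Doing the second step inside the web app, as suggested above, would amount to something like `two_step_join`, at the cost of one extra query per matching author.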
