RDFgrid is a simple framework for map/reduce-based batch-processing of RDF data with Hadoop and Amazon Elastic MapReduce.
- Processes RDF data in the line-oriented, whitespace-separated N-Triples format.
- Provides RDF statement manipulation using RDF.rb's object model; no manual parsing or serialization involved.
- Provides built-in aggregate combiners/reducers for the common `sum`, `min`, `max`, and `avg` operations.
- Compatible with Hadoop Streaming and Amazon's Elastic MapReduce service.
- Available as a prepackaged archive with all dependencies included, simplifying deployments to Hadoop's distributed cache.
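To make the aggregate operations listed above concrete, here is a rough sketch in plain Ruby of what such combiners/reducers compute. It does not use RDFgrid's built-in classes, whose names and interfaces are not shown in this README; `partial_avg` and `merge_avg` are hypothetical helpers. The point it illustrates is that `sum`, `min`, and `max` can be merged directly across partial results, whereas `avg` has to carry a sum and a count so that partial results combine correctly.

```ruby
# Hypothetical helpers, not part of RDFgrid: each partial result is a
# [sum, count] pair so that averages can be merged across combiners.
def partial_avg(values)
  [values.sum, values.size]
end

# Merge any number of [sum, count] pairs into a single average.
def merge_avg(partials)
  total = partials.sum { |s, _| s }
  count = partials.sum { |_, n| n }
  count.zero? ? nil : total.to_f / count
end

values_a = [1, 2, 3] # values seen by one combiner
values_b = [10]      # values seen by another

all_values = values_a + values_b
aggregates = {
  sum: all_values.sum,
  min: all_values.min,
  max: all_values.max,
  avg: merge_avg([partial_avg(values_a), partial_avg(values_b)])
}
puts aggregates.inspect
```

Averaging the two per-combiner averages directly (2.0 and 10.0) would give 6.0 rather than the true mean of 4.0, which is why the sketch merges sums and counts instead.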
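A mapper that emits each statement's predicate with a count of 1 (`mapper.rb`):

    #!/usr/bin/ruby -Ilib
    require 'rdfgrid'

    class PredicateCounter < RDFgrid::Mapper::StatementMapper
      # Emit the predicate as the key and 1 as the value.
      def process(statement)
        yield statement.predicate, 1
      end
    end

    PredicateCounter.process!

A reducer that sums the counts emitted for each predicate (`reducer.rb`):

    #!/usr/bin/ruby -Ilib
    require 'rdfgrid'

    class PredicateSummer < RDFgrid::Reducer
      # Add up all values received for one key.
      def process(values)
        yield values.inject(0) { |sum, value| sum + value.to_i }
      end
    end

    PredicateSummer.process!
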
$ cat data.nt | ruby mapper.rb | sort | ruby reducer.rb
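The pipeline above imitates a Hadoop Streaming job locally: the mapper writes key/value lines, `sort` brings equal keys together, and the reducer folds each group. As a minimal sketch of that data flow, the following plain Ruby script reproduces the predicate count without RDFgrid or RDF.rb. It assumes tab-separated key/value lines (the Hadoop Streaming default) and splits N-Triples lines naively on whitespace; `map_predicates` and `reduce_counts` are illustrative names, not RDFgrid APIs.

```ruby
# Illustrative only: plain Ruby stand-ins for the mapper and reducer stages.
# None of these helper names come from RDFgrid.

# Map stage: emit "predicate<TAB>1" for each non-empty, non-comment line.
# The whitespace split is a simplification of real N-Triples parsing.
def map_predicates(lines)
  lines.map(&:strip)
       .reject { |line| line.empty? || line.start_with?('#') }
       .map { |line| "#{line.split(/\s+/, 3)[1]}\t1" }
end

# Reduce stage: sum the counts of consecutive lines that share a key,
# which is the shape of the input after the sort step.
def reduce_counts(sorted_lines)
  sorted_lines.map { |line| line.split("\t", 2) }
              .chunk_while { |a, b| a[0] == b[0] }
              .map { |group| [group.first[0], group.sum { |_, v| v.to_i }] }
end

triples = [
  '<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" .',
  '<http://example.org/b> <http://xmlns.com/foaf/0.1/name> "Bob" .',
  '<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .'
]

counts = reduce_counts(map_predicates(triples).sort)
counts.each { |predicate, count| puts "#{predicate}\t#{count}" }
```

Because the reduce step only compares neighbouring lines, it relies on the input being sorted by key, just as the reducer in the shell pipeline relies on `sort`.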
- {RDFgrid::Mapper}
- {RDFgrid::Reducer}
- RDF.rb (>= 0.1.2)
The recommended installation method is via RubyGems. To install the latest official release, do:
% [sudo] gem install rdfgrid
To get a local working copy of the development repository, do:
% git clone git://github.com/datagraph/rdfgrid.git
Alternatively, you can download the latest development version as a tarball as follows:
% wget http://github.com/datagraph/rdfgrid/tarball/master
- http://rdfgrid.rubyforge.org/
- http://github.com/datagraph/rdfgrid
- http://rubygems.org/gems/rdfgrid
- http://rubyforge.org/projects/rdfgrid/
- http://raa.ruby-lang.org/project/rdfgrid/
- http://www.ohloh.net/p/rdfgrid
RDFgrid is a Datagraph technology.
RDFgrid is free and unencumbered public domain software. For more information, see http://unlicense.org/ or the accompanying UNLICENSE file.