[UNMAINTAINED] RDFgrid is a framework for batch-processing RDF data with Hadoop and Amazon Elastic MapReduce.
Ruby
Tag: 0.1.0

RDFgrid: Map/Reduce-based Linked Data Processing with Hadoop

RDFgrid is a simple framework for map/reduce-based batch-processing of RDF data with Hadoop and Amazon Elastic MapReduce.

Features

  • Processes RDF data in the line-oriented, whitespace-separated N-Triples format.
  • Provides RDF statement manipulation using RDF.rb's object model; no manual parsing or serialization involved.
  • Provides built-in aggregate combiners/reducers for the common sum, min, max, and avg operations.
  • Compatible with Hadoop Streaming and Amazon's Elastic MapReduce service.
  • Available as a prepackaged archive with all dependencies included, simplifying deployments using Hadoop's distributed cache.
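The README does not show how the built-in aggregates are implemented, so the following is only a sketch of the general idea in plain Ruby. The helper names (`partial`, `merge`, `average`) are hypothetical and are not part of RDFgrid. The sketch shows why sum, min, and max can be combined directly, while an average has to be carried as a sum and a count until the final step.

```ruby
# Hypothetical helpers for illustration; not RDFgrid's built-in combiners/reducers.

# Build a partial aggregate from a batch of values (as a combiner might).
# Values are converted with to_f because streaming input arrives as text.
def partial(values)
  nums = values.map(&:to_f)
  { sum: nums.sum, min: nums.min, max: nums.max, count: nums.size }
end

# Merge two partial aggregates. Every field is associative,
# so partials can be merged in any order.
def merge(a, b)
  { sum:   a[:sum] + b[:sum],
    min:   [a[:min], b[:min]].min,
    max:   [a[:max], b[:max]].max,
    count: a[:count] + b[:count] }
end

# The average is derived only at the end, from the merged sum and count.
def average(agg)
  agg[:sum] / agg[:count]
end

merged = merge(partial(%w[1 2 3]), partial(%w[4 10]))
puts average(merged)
```

Averaging the two batch averages instead (2.0 and 7.0) would give a different result from the average over all five values, which is why the count travels with the sum.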

Examples

A mapper for counting RDF predicate usage (doc/examples/mapper.rb)

#!/usr/bin/ruby -Ilib
require 'rdfgrid'

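# Emits a (predicate, 1) pair for every RDF statement in the input.
class PredicateCounter < RDFgrid::Mapper::StatementMapper
  def process(statement)
    yield statement.predicate, 1
  end
end

# Run the mapper over standard input.
PredicateCounter.process!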

A reducer for summing up RDF predicate usage (doc/examples/reducer.rb)

#!/usr/bin/ruby -Ilib
require 'rdfgrid'

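# Sums the counts collected for each predicate.
class PredicateSummer < RDFgrid::Reducer
  def process(values)
    # to_i converts each value, which arrives as text from the mapper output.
    yield values.inject(0) { |sum, value| sum + value.to_i }
  end
end

# Run the reducer over standard input.
PredicateSummer.process!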

Running the mapper and reducer pipeline with a local N-Triples dataset

$ cat data.nt | ruby mapper.rb | sort | ruby reducer.rb
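To make the data flow of this pipeline concrete, here is a minimal plain-Ruby sketch of the same map, sort, and reduce steps. It does not use RDFgrid. The function names and the sample triples are hypothetical, the tab-separated key/value lines follow the Hadoop Streaming default, and the whitespace split is a crude stand-in for a real N-Triples parser.

```ruby
# Plain-Ruby stand-in for: cat data.nt | ruby mapper.rb | sort | ruby reducer.rb
# Hypothetical helpers for illustration; RDFgrid itself is not used here.

# Map step: emit one "predicate<TAB>1" line per N-Triples statement.
def map_predicates(lines)
  lines.map do |line|
    line = line.strip
    next if line.empty? || line.start_with?('#') # skip blank and comment lines
    # Subject and predicate contain no whitespace, so the predicate
    # is the second whitespace-separated token.
    _subject, predicate, _rest = line.split(/\s+/, 3)
    "#{predicate}\t1"
  end.compact
end

# Reduce step: the input must be sorted so that equal keys are adjacent.
def reduce_counts(sorted_lines)
  sorted_lines
    .map { |line| line.split("\t", 2) }
    .chunk_while { |a, b| a[0] == b[0] } # group runs of the same key
    .map { |group| "#{group.first[0]}\t#{group.sum { |_, v| v.to_i }}" }
end

sample = <<~NT.lines
  <http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" .
  <http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .
  <http://example.org/b> <http://xmlns.com/foaf/0.1/name> "Bob" .
NT

puts reduce_counts(map_predicates(sample).sort)
```

The `sort` between the two steps plays the role of Hadoop's shuffle phase: the reducer only compares neighbouring keys, so unsorted input would produce several partial counts for the same predicate.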

Documentation

  • {RDFgrid::Mapper}
  • {RDFgrid::Reducer}

Dependencies

Installation

The recommended installation method is via RubyGems. To install the latest official release, run:

% [sudo] gem install rdfgrid

Download

To get a local working copy of the development repository, run:

% git clone https://github.com/datagraph/rdfgrid.git

Alternatively, you can download the latest development version as a tarball:

% wget http://github.com/datagraph/rdfgrid/tarball/master

Author

License

RDFgrid is free and unencumbered public domain software. For more information, see http://unlicense.org/ or the accompanying UNLICENSE file.
