This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

datagraph/rdfgrid

RDFgrid: Map/Reduce-based Linked Data Processing with Hadoop

RDFgrid is a simple framework for map/reduce-based batch-processing of RDF data with Hadoop and Amazon Elastic MapReduce.

Features

  • Processes RDF data in the line-oriented, whitespace-separated N-Triples format.
  • Provides RDF statement manipulation using RDF.rb's object model; no manual parsing or serialization involved.
  • Provides built-in aggregate combiners/reducers for the common sum, min, max, and avg operations.
  • Compatible with Hadoop Streaming and Amazon's Elastic MapReduce service.
  • Available as a prepackaged archive with all dependencies included, simplifying deployments to Hadoop's distributed cache.
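To make the aggregate operations concrete, here is a minimal plain-Ruby sketch of what sum, min, max, and avg compute for each key. It does not use RDFgrid's built-in combiners/reducers, whose class names are not shown in this README; the `aggregate` helper and the sample data are hypothetical and serve only as illustration.

```ruby
# Hypothetical helper, not part of RDFgrid: computes the four aggregates
# for each key of a Hash that maps keys to arrays of numeric values.
def aggregate(grouped)
  grouped.each_with_object({}) do |(key, values), result|
    result[key] = {
      sum: values.sum,
      min: values.min,
      max: values.max,
      avg: values.sum.to_f / values.size
    }
  end
end

# Made-up sample input: values already grouped by predicate.
grouped = {
  'http://xmlns.com/foaf/0.1/name'  => [1, 1, 1],
  'http://xmlns.com/foaf/0.1/knows' => [2, 4]
}

stats = aggregate(grouped)
```

In a real job the grouping by key is done by Hadoop's sort/shuffle phase, so a reducer only ever sees the values for one key at a time.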

Examples

A mapper for counting RDF predicate usage (doc/examples/mapper.rb)

#!/usr/bin/ruby -Ilib
require 'rdfgrid'

class PredicateCounter < RDFgrid::Mapper::StatementMapper
  # Emits a (predicate, 1) pair for the given RDF statement.
  def process(statement)
    yield statement.predicate, 1
  end
end

# Run the mapper.
PredicateCounter.process!

A reducer for summing up RDF predicate usage (doc/examples/reducer.rb)

#!/usr/bin/ruby -Ilib
require 'rdfgrid'

class PredicateSummer < RDFgrid::Reducer
  # Sums the given values, converting each to an integer first.
  def process(values)
    yield values.inject(0) { |sum, value| sum + value.to_i }
  end
end

# Run the reducer.
PredicateSummer.process!

Running the mapper and reducer pipeline with a local N-Triples dataset

$ cat data.nt | ruby mapper.rb | sort | ruby reducer.rb
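The shell pipeline above mimics what Hadoop Streaming does: map each input line to key/value lines, sort by key, then reduce adjacent lines that share a key. The following plain-Ruby sketch simulates those three steps without RDFgrid or Hadoop. It assumes tab-separated key/value lines (Hadoop Streaming's default), and `map_lines`, `reduce_lines`, and the sample triples are hypothetical. The exact line format that RDFgrid's mapper emits may differ.

```ruby
# Plain-Ruby simulation of `mapper | sort | reducer`; no RDFgrid, no Hadoop.
# Made-up sample data in N-Triples format.
NTRIPLES = <<~NT
  <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
  <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
  <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
NT

# Map step (hypothetical helper): emit "predicate<TAB>1" for each statement.
# Splitting on whitespace is a shortcut that works because subjects and
# predicates never contain spaces; it is not a full N-Triples parser.
def map_lines(input)
  input.each_line.map(&:strip).reject(&:empty?).map do |line|
    _subject, predicate, _rest = line.split(/\s+/, 3)
    "#{predicate}\t1"
  end
end

# Reduce step (hypothetical helper): group adjacent lines that share a key
# and sum their values. This relies on the input being sorted by key.
def reduce_lines(sorted_lines)
  sorted_lines
    .map { |line| line.split("\t", 2) }
    .chunk_while { |a, b| a[0] == b[0] }
    .map { |group| [group.first[0], group.sum { |_, value| value.to_i }] }
end

counts = reduce_lines(map_lines(NTRIPLES).sort)
counts.each { |predicate, count| puts "#{predicate}\t#{count}" }
```

The `sort` between the two steps is what guarantees that all lines for one predicate reach the reducer consecutively, which is why it cannot be dropped from the shell pipeline either.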

Documentation

  • {RDFgrid::Mapper}
  • {RDFgrid::Reducer}

Dependencies

Installation

The recommended installation method is via RubyGems. To install the latest official release, do:

% [sudo] gem install rdfgrid

Download

To get a local working copy of the development repository, do:

% git clone https://github.com/datagraph/rdfgrid.git

Alternatively, you can download the latest development version as a tarball as follows:

% wget https://github.com/datagraph/rdfgrid/tarball/master

Mailing List

Resources

Authors

RDFgrid is a Datagraph technology.

License

RDFgrid is free and unencumbered public domain software. For more information, see http://unlicense.org/ or the accompanying UNLICENSE file.

About

[Unmaintained] RDFgrid is a framework for batch-processing RDF data with Hadoop and Amazon Elastic MapReduce.
