hadoop-papyrus¶ ↑

Run Ruby DSL scripts on your Hadoop cluster.

Description¶ ↑

You can write a DSL script in Ruby that Hadoop runs as a Mapper / Reducer. This gem depends on the ‘jruby-on-hadoop’ project.

Install¶ ↑

All required gems are on GemCutter.

1. Upgrade RubyGems to 1.3.5
2. Install the gems

$ gem install hadoop-papyrus

Usage¶ ↑

1. Run a Hadoop cluster on your machines, and either put the ‘hadoop’ executable on your PATH or set the HADOOP_HOME environment variable.
2. Put your input files into HDFS, e.g. wc/inputs/file1
3. Now you can run ‘papyrus’ as shown below:

$ papyrus examples/word_count_test.rb

You can find the Hadoop job results in HDFS under wc/outputs/part-*
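
If ‘papyrus’ cannot find Hadoop, a quick check of the environment may help. The following is a minimal sketch in plain Ruby, not the gem's own lookup logic: the helper name find_hadoop is hypothetical, and checking HADOOP_HOME before PATH is an assumption made for illustration. It relies only on the standard Hadoop layout, where the executable lives at bin/hadoop under HADOOP_HOME.

```ruby
# Hypothetical helper (not part of hadoop-papyrus): look for the 'hadoop'
# executable under HADOOP_HOME/bin first, then in each PATH directory.
# The lookup order is an assumption made for this sketch.
def find_hadoop(env = ENV)
  candidates = []

  home = env['HADOOP_HOME']
  candidates << File.join(home, 'bin', 'hadoop') if home && !home.empty?

  env.fetch('PATH', '').split(File::PATH_SEPARATOR).each do |dir|
    candidates << File.join(dir, 'hadoop') unless dir.empty?
  end

  # Return the first candidate that is an executable regular file, or nil.
  candidates.find { |path| File.file?(path) && File.executable?(path) }
end

if __FILE__ == $PROGRAM_NAME
  path = find_hadoop
  puts(path ? "hadoop found at #{path}" : 'hadoop not found; set HADOOP_HOME or update PATH')
end
```

Run it with the same shell environment you use for ‘papyrus’, so that it sees the same PATH and HADOOP_HOME.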

Examples¶ ↑

Word Count DSL script

dsl 'WordCount'

from 'wc/inputs'
to 'wc/outputs'

count_uniq
total :bytes, :words, :lines
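
To make the intent of the script above concrete, here is a plain-Ruby sketch of the kind of result a word count job produces. It does not use hadoop-papyrus or Hadoop, and it rests on an assumption: that count_uniq counts the occurrences of each distinct word and that total sums bytes, words, and lines over the input. The method name word_count is hypothetical.

```ruby
# Plain-Ruby sketch, independent of hadoop-papyrus.
# Assumption: count_uniq ~ occurrences per distinct word,
#             total      ~ sums of bytes, words, and lines.
def word_count(text)
  counts = Hash.new(0)
  totals = { bytes: 0, words: 0, lines: 0 }

  text.each_line do |line|
    words = line.split            # split on whitespace
    words.each { |word| counts[word] += 1 }

    totals[:bytes] += line.bytesize
    totals[:words] += words.size
    totals[:lines] += 1
  end

  [counts, totals]
end

counts, totals = word_count("hello hadoop\nhello ruby\n")
puts counts.inspect
puts totals.inspect
```

On Hadoop the same work is split across Mapper and Reducer tasks; the sketch only shows the shape of the final result on a small in-memory string.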

Log Analysis DSL script

dsl 'LogAnalysis'

data 'apache log on test2' do
  from 'apachelog/inputs'
  to 'apachelog/outputs'

  each_line do
    pattern /(.*) (.*) (.*) \[(.*)\] (".*") (\d*) (\d*) (.*) "(.*)"/
    column_name 'remote_host', 'pass', 'user', 'access_date', 'request', 'status', 'bytes', 'pass', 'ua'

    topic 'ua counts', :label => 'ua' do
      count_uniq column[:ua]
    end
  end
end
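
The pattern and column names above can be tried out in plain Ruby before running a job. The sketch below is not how the gem processes lines internally; it only applies the same regular expression to a few made-up log lines, pairs the captures with the column names, and counts lines per ‘ua’ value. The names parse_line, count_by_ua, and SAMPLE are hypothetical.

```ruby
# Plain-Ruby sketch, independent of hadoop-papyrus internals.
PATTERN = /(.*) (.*) (.*) \[(.*)\] (".*") (\d*) (\d*) (.*) "(.*)"/
COLUMNS = %w[remote_host pass user access_date request status bytes pass ua]

# Pair each capture group with its column name.
# 'pass' appears twice in COLUMNS, so the later capture overwrites the earlier one.
def parse_line(line)
  match = PATTERN.match(line)
  return nil unless match

  COLUMNS.zip(match.captures).to_h
end

# Count how many parsable lines carry each user agent.
def count_by_ua(lines)
  counts = Hash.new(0)
  lines.each do |line|
    row = parse_line(line)
    counts[row['ua']] += 1 if row
  end
  counts
end

# Made-up sample lines in Apache combined log format.
SAMPLE = [
  '192.168.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/5.0"',
  '192.168.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /about.html HTTP/1.0" 200 512 "-" "curl/7.0"',
  '192.168.0.1 - frank [10/Oct/2000:13:57:12 -0700] "GET /logo.png HTTP/1.0" 304 0 "-" "Mozilla/5.0"'
].freeze

puts count_by_ua(SAMPLE).inspect
```

Because every group in the pattern is greedy, lines whose fields contain extra quotes or brackets may split differently than expected, so it is worth testing the pattern against real log samples first.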

Run spec¶ ↑

Set HADOOP_HOME in your environment and run ‘jruby -S rake spec’

Author¶ ↑

Koichi Fujikawa <fujibee@gmail.com>

Copyright¶ ↑

License: Apache License
