= Rhino - a Ruby ORM for HBase
 
Rhino is a Ruby object-relational mapping (ORM) for HBase[http://www.hbase.org].
 
== Support & contact
 
Author: Quinn Slack qslack@qslack.com[mailto:qslack@qslack.com]
 
Contributors: Dru Jensen
 
== Getting started
 
=== Download Rhino
 
  git clone git://github.com/sqs/rhino.git
 
=== Installing HBase and Thrift
 
Since Rhino uses the HBase Thrift API, you must first install both HBase and Thrift. Downloading the latest trunk revisions of each is recommended, but if you encounter problems, try using the latest stable release instead. Here are the basic steps for installing both.
 
==== Installing HBase
  
  svn co http://svn.apache.org/repos/asf/hadoop/hbase/trunk hbase-core-trunk
  cd hbase-core-trunk
  ant
  
{More installation instructions}[http://wiki.apache.org/hadoop/Hbase/10Minutes] are available on the HBase Wiki.
 
==== Installing Thrift
 
Thrift[http://developers.facebook.com/thrift/] requires the Boost C++ libraries; if your system does not already have them, install them before building Thrift.
 
  svn co http://svn.facebook.com/svnroot/thrift/trunk/ thrift-trunk
  cd thrift-trunk
  ./bootstrap.sh
  ./configure && make && sudo make install
  cd lib/rb
  sudo ruby setup.rb
 
==== Starting the HBase and Thrift servers
 
Once you have installed HBase and Thrift, start HBase, then start the Thrift server. From the root HBase directory, run these commands:
 
  bin/start-hbase.sh
  bin/hbase thrift start
  
Both servers must be running to use Rhino. Occasionally, the Thrift server will be unable to connect to HBase. In that case, stop the Thrift server (Ctrl-C), stop the HBase server (<tt>bin/stop-hbase.sh</tt>), and then rerun the commands above to restart both. To verify that HBase is running, try starting the HBase shell (<tt>bin/hbase shell</tt>).
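
If you want a quick scripted check before loading Rhino, the sketch below tests whether anything accepts TCP connections on the Thrift port (localhost:9090 by default). It uses only Ruby's standard library; <tt>port_open?</tt> is a hypothetical helper name, not part of Rhino, and a successful connection only shows that something is listening, not that it is the HBase Thrift server.

```ruby
require 'socket'

# Hypothetical helper (not part of Rhino): returns true if a TCP connection
# to host:port can be opened within `timeout` seconds.
def port_open?(host, port, timeout = 2)
  # Socket.tcp yields the connected socket and closes it after the block.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, timed out, or host not resolvable.
  false
end

if port_open?('localhost', 9090)
  puts 'Something is listening on localhost:9090'
else
  puts 'Nothing is listening on localhost:9090; start the Thrift server first'
end
```

If the check fails, restart both servers as described above.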
 
==== Loading Rhino
 
Since Rhino is not yet packaged as a gem, you will have to <tt>require 'PATH_TO_RHINO/lib/rhino.rb'</tt> in your scripts.
  
== Usage
 
=== Connect to HBase
 
The following code points Rhino to the Thrift server you just started (which by default listens on localhost:9090).
  
  Rhino::Model.connect('localhost', 9090)
  
=== Describe your table
 
A class definition like:
 
  class Page < Rhino::Model
    include Rhino::Constraints
 
    column_family :title
    column_family :contents
    column_family :links
    column_family :meta
    column_family :images
    
    alias_attribute :author, 'meta:author'
 
    has_many :links, Link
    has_many :images, Image
 
    constraint(:title_required) { |page| page.title and !page.title.empty? }
  end
 
...is mapped to the following HBase table (described in {HBase Query Language}[http://wiki.apache.org/lucene-hadoop/HBase/HBaseShell]):
 
  CREATE TABLE pages(title:, contents:, links:, meta:, images:);
  
Or, in the JRuby-based shell of HBase 0.2:
  
  create 'pages', 'title', 'contents', 'links', 'meta', 'images'
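
To see how the <tt>column_family</tt> declarations line up with the shell command, here is a minimal sketch that builds the <tt>create</tt> statement from a table name and a list of column families. <tt>hbase_create_statement</tt> is a hypothetical helper written for illustration; Rhino itself does not necessarily provide anything like it.

```ruby
# Hypothetical helper (not part of Rhino): builds the HBase shell command
# that creates a table with the given column families.
def hbase_create_statement(table, families)
  quoted = [table, *families].map { |name| "'#{name}'" }
  "create #{quoted.join(', ')}"
end

puts hbase_create_statement('pages', %w[title contents links meta images])
# prints: create 'pages', 'title', 'contents', 'links', 'meta', 'images'
```

You would paste the printed line into <tt>bin/hbase shell</tt> to create the table.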
 
=== Basic operations
 
==== Getting records
 
  page = Page.get('some-page')
  all_pages = Page.get_all()
  
==== Creating new records
 
  # data can be specified in the second argument of Page.new...
  page = Page.new('the-row-key', {:title=>"my title"})
  # ...or as attributes on the model
  page.contents = "<p>welcome</p>"
  page.save
 
==== Reading and updating attributes
 
  page = Page.get('some-key')
  puts "the old title is: #{page.title}"
  page.title = "another title"
  page.save
  puts "the new title is: #{page.title}"
 
You can also read from and write to specific columns in a column family.
Since we already defined the <tt>meta:</tt> column family, Rhino knows we want to set the <tt>meta:author</tt> column:
 
  page = Page.get('some-key')
  page.meta_author = "John Doe"
  page.save
  puts "the author is: #{page.meta_author}"
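
The mapping from an attribute name such as <tt>meta_author</tt> to the column <tt>meta:author</tt> can be pictured with the toy class below. This is a self-contained sketch and not Rhino's actual implementation: <tt>ToyRow</tt>, <tt>column_for</tt> and the in-memory <tt>cells</tt> hash are all assumptions made for illustration, and the sketch does not handle column family names that themselves contain an underscore.

```ruby
# Toy stand-in for a model row; NOT Rhino's implementation.
class ToyRow
  attr_reader :cells

  def initialize(families)
    @families = families.map(&:to_s)
    @cells = {} # column name ("family:qualifier") => value
  end

  # Translate "meta_author" into "meta:author" if "meta" is a known family.
  # A bare family name such as "title" maps to the column "title:".
  def column_for(attr_name)
    family, qualifier = attr_name.to_s.split('_', 2)
    return nil unless @families.include?(family)

    qualifier ? "#{family}:#{qualifier}" : "#{family}:"
  end

  # Route unknown readers and writers to the matching column, if any.
  def method_missing(name, *args)
    writer = name.to_s.end_with?('=')
    column = column_for(name.to_s.chomp('='))
    return super unless column

    writer ? (@cells[column] = args.first) : @cells[column]
  end

  def respond_to_missing?(name, include_private = false)
    !column_for(name.to_s.chomp('=')).nil? || super
  end
end

row = ToyRow.new([:title, :meta])
row.meta_author = 'John Doe'
puts row.meta_author # prints: John Doe
```

Because the family is declared up front, an unknown prefix (for example <tt>row.bogus_thing</tt>) still raises <tt>NoMethodError</tt> instead of silently creating a column.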
  
=== has_many and belongs_to
 
In the model definition above, we stated that a Page <tt>has_many :links</tt> and <tt>has_many :images</tt>. We can
now define <tt>Link</tt> and <tt>Image</tt> in more detail.
 
  class Link < Rhino::Cell
    belongs_to :page
  
    def url
      url_parts = key.split('/')
      backwards_host = url_parts.shift
      path = url_parts.join('/')
      host = backwards_host.split('.').reverse.join('.')
      "http://#{host}/#{path}"
    end
  end
 
  class Image < Rhino::Cell
    belongs_to :page
  end
  
Now that we've defined <tt>Link</tt> and <tt>Image</tt>, we can work with them easily. The following code adds a link to the page <tt>com.example</tt>, which when written to HBase becomes a cell in the <tt>links:</tt> column family named <tt>links:com.google</tt> with the contents <tt>search engine</tt>.
 
  page = Page.get('com.example')
  page.links.create('com.google', 'search engine')
  
You can also iterate over the collection of links. This example uses <tt>Link#url</tt>, the method defined on the Link class above,
to convert a key stored in the common reversed-host style (<tt>com.example/path</tt>) back into a URL (<tt>http://example.com/path</tt>).
 
  page.links.each do |link|
    puts "Link to #{link.url} with text: '#{link.contents}'"
  end
  
You can also get a specific link.
 
  google_link_text = page.links.get('com.google').contents
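
The reversed-host key style is easy to try outside of Rhino. The sketch below converts in both directions using plain Ruby; <tt>url_to_key</tt> and <tt>key_to_url</tt> are hypothetical helper names, and <tt>key_to_url</tt> simply mirrors the logic of <tt>Link#url</tt> above. It assumes plain <tt>http</tt> URLs and ignores ports and query strings.

```ruby
require 'uri'

# Convert a URL into a reversed-host storage key, e.g.
# "http://example.com/a/b" -> "com.example/a/b".
def url_to_key(url)
  uri = URI.parse(url)
  reversed_host = uri.host.split('.').reverse.join('.')
  path = uri.path.sub(%r{\A/}, '') # drop the leading slash, if any
  path.empty? ? reversed_host : "#{reversed_host}/#{path}"
end

# The inverse conversion, mirroring Link#url.
def key_to_url(key)
  reversed_host, *path_parts = key.split('/')
  host = reversed_host.split('.').reverse.join('.')
  "http://#{host}/#{path_parts.join('/')}"
end

puts url_to_key('http://www.example.com/docs/index.html')
# prints: com.example.www/docs/index.html
puts key_to_url('com.google')
# prints: http://google.com/
```

One reason this style is common is that keys for the same domain sort next to each other, which fits HBase's sorted storage.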
 
=== Setting timestamps and retrieving by timestamp
  
First, let's create some Pages with different timestamps.
 
  a_week_ago = Time.now - 7 * 24 * 3600
  a_month_ago = Time.now - 30 * 24 * 3600
  
  newer_page = Page.create('google.com', {:title=>'newer google'}, {:timestamp=>a_week_ago})
  older_page = Page.create('google.com', {:title=>'older google'}, {:timestamp=>a_month_ago})
  
Now you can <tt>get</tt> by the timestamps you just set.
 
  Page.get('google.com', :timestamp=>a_week_ago).title # => "newer google"
  Page.get('google.com', :timestamp=>a_month_ago).title # => "older google"
  
If you call <tt>get</tt> without a timestamp, you will get the most recent version of the Page.
 
  Page.get('google.com').title # => "newer google"
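
The behaviour above can be modelled with a small in-memory sketch. It is not how Rhino or HBase is implemented: <tt>VersionedCell</tt> and <tt>hbase_timestamp</tt> are hypothetical names, the lookup rule (newest version at or before the requested time) is an assumption chosen to match the examples above, and the conversion to milliseconds reflects HBase's default timestamp unit rather than anything confirmed about Rhino's internals.

```ruby
# By default HBase timestamps are milliseconds since the Unix epoch.
def hbase_timestamp(time)
  (time.to_f * 1000).to_i
end

# Toy in-memory versioned cell; NOT how Rhino or HBase is implemented.
class VersionedCell
  def initialize
    @versions = {} # timestamp in ms => value
  end

  def put(value, time = Time.now)
    @versions[hbase_timestamp(time)] = value
  end

  # Assumption: return the newest version written at or before `time`,
  # or the newest version overall when no time is given.
  def get(time = nil)
    candidates = @versions
    candidates = candidates.select { |ts, _| ts <= hbase_timestamp(time) } if time
    newest = candidates.keys.max
    newest && candidates[newest]
  end
end

now = Time.now
a_week_ago = now - 7 * 24 * 3600
a_month_ago = now - 30 * 24 * 3600

title = VersionedCell.new
title.put('newer google', a_week_ago)
title.put('older google', a_month_ago)

puts title.get(a_week_ago)  # prints: newer google
puts title.get(a_month_ago) # prints: older google
puts title.get              # prints: newer google
```

Note that the order of the writes does not matter; only the timestamps decide which version is considered most recent.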
  
== More information
 
Read the specs in the spec/ directory to see more usage examples. Also look at the spec models in spec/spec_helper.rb.