# HBase / HappyBase Tutorial

In [2]:
import happybase

## Connecting via Thrift
The containerized HBase stack includes a Thrift server, which makes connecting to HBase simple.  Right now the external port is assigned dynamically, so run `docker-compose ps` to find it.  The listing of containers will include a line similar to: `thrift-1         entrypoint.sh thrift             Up      0.0.0.0:32776->9090/tcp`.  This indicates that port 9090 (the Thrift port) inside the container is mapped to port 32776 on localhost.  Below, the connection is opened.
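Instead of reading the port off by hand, it can be parsed out of that line. The sketch below is a hypothetical helper (`thrift_host_port` is not part of HappyBase or Docker Compose) and assumes the `HOST_IP:HOST_PORT->CONTAINER_PORT/tcp` format shown above; other Compose versions may print the mapping differently.

```python
import re

# Matches mappings such as "0.0.0.0:32776->9090/tcp", capturing host and container ports.
_PORT_MAPPING = re.compile(r"[\d.]+:(\d+)->(\d+)/tcp")


def thrift_host_port(ps_line, container_port=9090):
    """Return the localhost port mapped to `container_port`, or None if absent.

    Hypothetical helper: assumes the `docker-compose ps` column format shown above.
    """
    for host_port, inner_port in _PORT_MAPPING.findall(ps_line):
        if int(inner_port) == container_port:
            return int(host_port)
    return None


line = "thrift-1         entrypoint.sh thrift             Up      0.0.0.0:32776->9090/tcp"
print(thrift_host_port(line))  # 32776
```

The returned value can then be passed as the `port` argument of `happybase.Connection`.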

Note that the connection timeout is pretty low, so it might be necessary to reconnect.
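One way to handle a dropped connection is to retry a call once after reopening it. The sketch below is a hypothetical helper, not part of HappyBase; it relies only on the `close()` and `open()` methods of a HappyBase `Connection`. Which exception a timed-out Thrift connection raises depends on the Thrift library HappyBase is using, so the exception types are left as a parameter rather than guessed.

```python
def call_with_reconnect(connection, func, retry_on=(Exception,), retries=1):
    """Call func(); on one of the `retry_on` exceptions, reopen and try again.

    Hypothetical helper: `connection` is assumed to expose HappyBase-style
    close() and open() methods. The default retry_on is deliberately broad;
    narrow it to the transport error you actually see.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the original error
            # Discard the stale transport and open a fresh one, then loop again.
            connection.close()
            connection.open()
```

For example, `call_with_reconnect(connection, connection.tables)` lists the tables, reconnecting once if the first attempt fails.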

In [3]:
# Use the host port reported by `docker-compose ps` (32776 here)
connection = happybase.Connection('localhost', port=32776)

## Listing Tables
The tables in the database can be listed via the connection object; note that the table names are returned as bytes.  Once you know a table's name, `connection.table()` returns a `Table` object for working with it.

In [4]:
connection.tables()

[b'mytable', b'test']

In [5]:
# Grab the table object
table = connection.table('test')

## Creating Tables
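If a table does not already exist, it can be created with `create_table()`, which takes the table name and a dictionary mapping each column family to its options.  Below, a table with three column families is created (if it is not already present), and then the tables in the database are relisted.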

In [6]:
# Table names come back as bytes, so compare against b'mytable'
if b'mytable' not in connection.tables():
    connection.create_table(
        'mytable',
        {'cf1': dict(max_versions=10),
         'cf2': dict(max_versions=1, block_cache_enabled=False),
         'cf3': dict(),  # use defaults
        }
    )
else:
    print('Table already exists.')

Table already exists.


In [7]:
connection.tables()

[b'mytable', b'test']
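The check-then-create pattern can be wrapped in a small helper. `ensure_table` below is a hypothetical name, not a HappyBase function; it relies only on `tables()`, `create_table()` and `table()` from the HappyBase `Connection` API. It is not safe against another client creating the same table between the check and the create.

```python
def ensure_table(connection, name, families):
    """Create table `name` unless it already exists, then return a Table for it.

    Hypothetical helper. connection.tables() returns names as bytes, so the
    str name is encoded before the membership check.
    """
    if name.encode() not in connection.tables():
        connection.create_table(name, families)
    return connection.table(name)
```

Usage would look like `table = ensure_table(connection, 'mytable', {'cf1': dict(max_versions=10), 'cf2': dict(), 'cf3': dict()})`.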

## Inserting Data
HBase stores everything (row keys, column names, and values) as raw bytes, so on the Python side values have to be `encode`d before they are written and `decode`d after they are read.
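A pair of small helpers keeps these conversions in one place. This sketch assumes all column names and values are UTF-8 text; numeric values would need an explicit binary encoding (for example via `struct`) instead. `encode_cells` and `decode_cells` are hypothetical names.

```python
def encode_cells(cells, encoding="utf-8"):
    """Turn {'cf:col': 'value'} into the {b'cf:col': b'value'} form HBase stores."""
    return {col.encode(encoding): val.encode(encoding) for col, val in cells.items()}


def decode_cells(cells, encoding="utf-8"):
    """Turn a row dict returned by table.row() or table.scan() back into str -> str."""
    return {col.decode(encoding): val.decode(encoding) for col, val in cells.items()}


row = encode_cells({'cf1:a': 'x', 'cf3:g': 'y'})
print(row)                # {b'cf1:a': b'x', b'cf3:g': b'y'}
print(decode_cells(row))  # {'cf1:a': 'x', 'cf3:g': 'y'}
```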

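Here, inserts happen one row at a time, with a separate `table.put()` call for each.  Since the row keys are drawn from only 100 values, the 1000 puts land on at most 100 rows, and later puts add new cells or newer versions of existing ones.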

In [120]:
table = connection.table('mytable')

In [104]:
import string
import random

table = connection.table('mytable')
rows = range(100)  # row keys are drawn from row0 .. row99
for _ in range(1000):
    rk = 'row{}'.format(random.choice(rows))
    # Pick one random column qualifier in each column family
    cf1 = 'cf1:{}'.format(random.choice(['a', 'b', 'c']))
    cf2 = 'cf2:{}'.format(random.choice(['foo', 'bar', 'baz', 'zab', 'rab', 'oof']))
    cf3 = 'cf3:{}'.format(random.choice(['a', 'b', 'c', 'd', 'e', 'f', 'g']))
    # The same random letter is stored in all three cells, encoded to bytes
    v = random.choice(string.ascii_letters).encode()
    table.put(rk, {cf1: v, cf2: v, cf3: v})  # one request per put

In [110]:
table = connection.table('mytable')

for k, d in table.scan():
    print(k,d)
    #table.delete(k)

b'row0' {b'cf2:baz': b'u', b'cf3:d': b'p', b'cf2:bar': b'h', b'cf3:g': b'L', b'cf3:e': b'e', b'cf3:c': b'h', b'cf1:c': b'L', b'cf2:foo': b'i', b'cf1:a': b'e', b'cf3:b': b'u', b'cf2:rab': b'L', b'cf2:zab': b'e'}
b'row1' {b'cf1:b': b'c', b'cf1:c': b'q', b'cf3:d': b'q', b'cf2:zab': b'e', b'cf2:foo': b'V', b'cf1:a': b'V', b'cf3:b': b'V', b'cf2:bar': b'q', b'cf2:rab': b'c', b'cf3:e': b'c', b'cf3:g': b'e'}
b'row10' {b'cf2:baz': b'R', b'cf3:d': b'I', b'cf2:bar': b'I', b'cf3:g': b'R', b'cf1:b': b'R', b'cf3:c': b'U', b'cf2:oof': b'B', b'cf3:a': b'q', b'cf1:a': b'I', b'cf3:b': b'd', b'cf2:rab': b'd', b'cf1:c': b'U', b'cf3:f': b'a'}
b'row11' {b'cf3:d': b'M', b'cf2:baz': b'f', b'cf2:zab': b'S', b'cf3:g': b'S', b'cf3:c': b'f', b'cf1:b': b'x', b'cf3:e': b'V', b'cf2:oof': b'O', b'cf3:b': b'Y', b'cf1:a': b'M', b'cf2:foo': b'M', b'cf2:rab': b'D', b'cf1:c': b'f', b'cf3:f': b'D'}
b'row12' {b'cf3:c': b'M', b'cf1:b': b'v', b'cf1:c': b'M', b'cf2:baz': b'b', b'cf2:zab': b'h', b'cf1:a': b'h', b'cf3:d': b'j', 
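The scan returns rows ordered by row key, compared byte by byte, which is why `row10` comes right after `row1` above. If numeric ordering matters, a common approach is to zero-pad the number in the key. `row_key` below is a hypothetical helper, and the width of 6 digits is an arbitrary choice.

```python
def row_key(i, width=6):
    """Build a zero-padded row key such as b'row000042' (hypothetical helper)."""
    return 'row{:0{width}d}'.format(i, width=width).encode()


unpadded = sorted('row{}'.format(i).encode() for i in (1, 2, 10))
padded = sorted(row_key(i) for i in (1, 2, 10))
print(unpadded)  # [b'row1', b'row10', b'row2']
print(padded)    # [b'row000001', b'row000002', b'row000010']
```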

In [111]:
# Cleaning up without deleting the table
for k, d in table.scan():
    table.delete(k)

## Batch Inserting Data
The inserts above are slow because each `put()` is sent to HBase as a separate request.  To speed things up, puts can be grouped into batches, either with a context manager (`with table.batch() as b:`), which sends the batch when the block exits, or manually by calling `send()` on the batch.
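The manual route looks roughly like the sketch below: create a batch with `table.batch()`, queue puts with `put()`, and call `send()` whenever enough have accumulated. `put_rows_manually` and its `flush_every` parameter are hypothetical, and the sketch assumes that `send()` clears the queued mutations so the same batch object can be reused.

```python
def put_rows_manually(table, rows, flush_every=1000):
    """Queue puts on one batch and call send() every `flush_every` puts.

    Hypothetical helper: `table` is assumed to be a HappyBase Table and
    `rows` an iterable of (row_key, {column: value}) pairs.
    """
    batch = table.batch()
    pending = 0
    for row_key, data in rows:
        batch.put(row_key, data)
        pending += 1
        if pending >= flush_every:
            batch.send()  # push the queued puts to HBase in one request
            pending = 0
    if pending:
        batch.send()  # flush whatever is left over
```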

In [9]:
table = connection.table('mytable')
rows = range(100000)
num_puts = 1000000  # total number of puts queued in the batch, not a batch size
try:
    # With transaction=True, the queued mutations are only sent if the block
    # finishes without an exception; they are all sent together on exit.
    with table.batch(transaction=True) as b:
        for _ in range(num_puts):
            rk = 'row{}'.format(random.choice(rows))
            cf1 = 'cf1:{}'.format(random.choice(['a', 'b', 'c']))
            cf2 = 'cf2:{}'.format(random.choice(['foo', 'bar', 'baz', 'zab', 'rab', 'oof']))
            cf3 = 'cf3:{}'.format(random.choice(['a', 'b', 'c', 'd', 'e', 'f', 'g']))
            v = random.choice(string.ascii_letters).encode()
            b.put(rk, {cf1: v, cf2: v, cf3: v})
except Exception:
    print('An error occurred sending the batch')

An error occurred sending the batch
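The cell above queues every put in one transactional batch, so they all go to the Thrift server in a single request when the `with` block exits, and the error itself is not printed. A plausible explanation, not confirmed here, is that this single request is too large or too slow for the server. One way around it is to let HappyBase flush automatically with the `batch_size` argument of `table.batch()`. HappyBase does not allow `batch_size` to be combined with `transaction=True`, so this trades all-or-nothing behaviour for bounded request sizes. `random_cells` and `bulk_load` are hypothetical names; the data generator mirrors the one used above.

```python
import random
import string


def random_cells():
    """One random row's worth of cells, mirroring the generator used above."""
    value = random.choice(string.ascii_letters).encode()
    cols = [
        'cf1:' + random.choice(['a', 'b', 'c']),
        'cf2:' + random.choice(['foo', 'bar', 'baz', 'zab', 'rab', 'oof']),
        'cf3:' + random.choice(['a', 'b', 'c', 'd', 'e', 'f', 'g']),
    ]
    return {col.encode(): value for col in cols}


def bulk_load(table, num_puts, num_rows=100000, batch_size=1000):
    """Write `num_puts` random puts; HappyBase sends every `batch_size` mutations."""
    with table.batch(batch_size=batch_size) as b:
        for _ in range(num_puts):
            row_key = 'row{}'.format(random.randrange(num_rows)).encode()
            b.put(row_key, random_cells())
```

With the table from above, this would be called as `bulk_load(connection.table('mytable'), 1000000)`.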


In [124]:
for k, d in table.scan():
    print(k,d)

b'row0' {b'cf1:b': b'V', b'cf1:c': b'l', b'cf2:baz': b'l', b'cf2:zab': b'V', b'cf2:foo': b'G', b'cf3:b': b'l', b'cf3:g': b'J', b'cf3:e': b'u', b'cf2:rab': b'J', b'cf3:f': b'V'}
b'row1' {b'cf3:d': b'm', b'cf2:baz': b'g', b'cf2:zab': b'P', b'cf2:bar': b'm', b'cf3:g': b'g', b'cf3:c': b's', b'cf1:b': b's', b'cf3:e': b'P', b'cf2:oof': b's', b'cf3:a': b'm', b'cf1:a': b'k', b'cf2:foo': b'z', b'cf2:rab': b'k', b'cf1:c': b'm', b'cf3:f': b'b'}
b'row10' {b'cf3:d': b'U', b'cf2:baz': b'N', b'cf2:zab': b'I', b'cf2:bar': b's', b'cf3:g': b't', b'cf3:e': b's', b'cf1:b': b'I', b'cf3:c': b's', b'cf2:oof': b's', b'cf1:a': b'N', b'cf3:b': b'u', b'cf1:c': b'Q', b'cf3:f': b'I'}
b'row1003' {b'cf1:b': b'A', b'cf2:rab': b'A', b'cf3:c': b'A'}
b'row1004' {b'cf1:c': b'o', b'cf2:baz': b'o', b'cf3:d': b'o'}
b'row1011' {b'cf1:a': b'K', b'cf2:oof': b'K', b'cf3:a': b'K'}
b'row1024' {b'cf2:foo': b'K', b'cf3:e': b'K', b'cf1:c': b'K'}
b'row1048' {b'cf1:a': b'V', b'cf3:f': b'V', b'cf2:oof': b'V'}
b'row1052' {b'cf3:b': b'R'

In [116]:
# Cleaning up without deleting the table
for k, d in table.scan():
    table.delete(k)
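Deleting rows one request at a time costs as much as inserting them that way, so the cleanup can also go through a batch. `delete_all_rows` is a hypothetical helper; it keeps the table and its column families, unlike `connection.delete_table('mytable', disable=True)`, which drops the table entirely.

```python
def delete_all_rows(table, batch_size=1000):
    """Delete every row in `table`, keeping the table itself; return the count."""
    deleted = 0
    with table.batch(batch_size=batch_size) as b:
        # scan() yields (row_key, cells) pairs; only the key is needed here
        for row_key, _ in table.scan():
            b.delete(row_key)  # queued; sent automatically every `batch_size` mutations
            deleted += 1
    return deleted
```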