# Interacting with HBase in Python (using Happybase)
[Happybase](https://happybase.readthedocs.io/en/latest/) is a Python library for interacting with HBase; it uses HBase's Thrift API under the hood. [Apache Thrift](https://thrift.apache.org/) is a software framework for developing cross-language services.

## An Overview of HBase Structure
It is important to first understand how data is stored in HBase. <br>HBase is a column-oriented database, i.e. the table schema defines only column families, and the records are sorted by row key. <br>
A table can have many column families, and each column family can have many columns (unless a restriction is defined). <br>
The data is then stored as key:value pairs in these columns. <br> <br>
The following points summarize the above paragraph:
1. A Table in HBase is a collection of Rows.
2. A Row is a collection of Column Families.
3. A Column Family is a collection of Columns.
4. A Column stores data as key:value pairs.

The cell below shows such a structure:

In [1]:
# A typical HBase structure:

# |Row key| Column Family 1 | Column Family 2 |
# |-------|-----------------|-----------------|
# |       | col 1  | col 2  | col A  | col B  |
# |-------|--------|--------|--------|--------|
# | row1  | alpha  | beta   | gamma  | delta  |
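
As a rough mental model only (this is not how HBase lays data out on disk), the structure above can be pictured as nested Python dictionaries: row key, then column family, then column, then value. The names `table_model`, `flatten_row`, `cf1` and `cf2` are made up for this sketch. The flattened `family:column` form it produces is the shape of the dictionaries that Happybase returns later in this notebook.

```python
# Toy model of the logical layout: row key -> column family -> column -> value
table_model = {
    b'row1': {
        b'cf1': {b'col1': b'alpha', b'col2': b'beta'},
        b'cf2': {b'colA': b'gamma', b'colB': b'delta'},
    },
}

def flatten_row(families):
    """Flatten {family: {column: value}} into {b'family:column': value}."""
    return {
        family + b':' + column: value
        for family, columns in families.items()
        for column, value in columns.items()
    }

flat = flatten_row(table_model[b'row1'])
print(flat[b'cf1:col1'])
```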

## Interacting in Python

In [2]:
# check if the library is installed
# otherwise install it (pip install happybase)

import happybase

In [3]:
# get a connection
# start HBase by running the script ${HBASE_HOME}/bin/start-hbase.sh
# ensure that the Thrift server is running
# (it can be started with: hbase-daemon.sh start thrift)

# Note that by default the Thrift server listens on localhost:9090
# Hence the connection can be established as follows:
conn = happybase.Connection(host='localhost', port=9090)

# or simply as
#conn = happybase.Connection()

In [4]:
# print all the tables present
print(conn.tables())

[]


### Creating a Table

In [5]:
# let us create a simple table named 'books' with two column families named Author and Info
# create the table only if it does not exist

tables_list = conn.tables() # get the list of tables

if b'books' not in tables_list:
    conn.create_table(
        'books',
        {
            'Author': dict(max_versions=2),
            'Info': dict(),
        }
    )
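
The check-then-create pattern above can be wrapped in a small helper so that the cell is safe to re-run. This is a minimal sketch: `ensure_table` is a hypothetical name, and `conn` is assumed to be a `happybase.Connection` (the helper only relies on its `tables()` and `create_table()` methods).

```python
def ensure_table(conn, name, families):
    """Create table `name` with the given column families unless it exists.

    Returns True if the table was created, False if it was already there.
    """
    # conn.tables() returns table names as bytes, so compare with the encoded name
    if name.encode('utf-8') in conn.tables():
        return False
    conn.create_table(name, families)
    return True

# Usage with the connection opened earlier (needs a running Thrift server):
# ensure_table(conn, 'books', {'Author': dict(max_versions=2), 'Info': dict()})
```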

### Storing data

In [6]:
# First get an instance of the Table object
table = conn.table('books')

In [7]:
# Enter data using the put() method of the Table object
# (row keys, column names and values are all passed as bytes)

table.put(b'101', {b'Author:FirstName' : b'George',
                   b'Author:LastName'  : b'Orwell'})
table.put(b'101', {b'Info:Title' : b'Animal Farm',
                   b'Info:Price' : b'100'})

table.put(b'102', {b'Author:FirstName' : b'George',
                   b'Author:LastName'  : b'Orwell'})
table.put(b'102', {b'Info:Title' : b'1984',
                   b'Info:Price' : b'150'})

table.put(b'103', {b'Author:FirstName' : b'Albert',
                   b'Author:LastName'  : b'Camus'})
table.put(b'103', {b'Info:Title' : b'The Fall',
                   b'Info:Price' : b'200'})

table.put(b'104', {b'Author:FirstName' : b'Franz',
                   b'Author:LastName'  : b'Kafka'})
table.put(b'104', {b'Info:Title' : b'The Trial',
                   b'Info:Price' : b'250'})
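
When many rows are written at once, the individual `put()` calls above can be grouped with `Table.batch()`, which Happybase exposes as a context manager that sends the collected mutations when the block exits. The sketch below is one way to do this; `to_cells`, `put_books` and `sample_books` are illustrative names, and `table` is assumed to be the `happybase.Table` obtained earlier.

```python
def to_cells(book):
    """Encode a {'Family:Column': 'value'} dict of strings into bytes for put()."""
    return {col.encode('utf-8'): val.encode('utf-8') for col, val in book.items()}

def put_books(table, books):
    """Write several rows through a single batch.

    `books` maps a row key (str) to a {'Family:Column': 'value'} dict.
    """
    with table.batch() as batch:
        for row_key, book in books.items():
            batch.put(row_key.encode('utf-8'), to_cells(book))

sample_books = {
    '101': {'Author:FirstName': 'George', 'Author:LastName': 'Orwell',
            'Info:Title': 'Animal Farm', 'Info:Price': '100'},
    '103': {'Author:FirstName': 'Albert', 'Author:LastName': 'Camus',
            'Info:Title': 'The Fall', 'Info:Price': '200'},
}

# put_books(table, sample_books)  # needs the `table` object from the cell above
```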

### Reading data

In [8]:
# We can read the whole table using Table.scan() as follows

for key, data in table.scan():
    print(key, data)

b'101' {b'Author:FirstName': b'George', b'Author:LastName': b'Orwell', b'Info:Price': b'100', b'Info:Title': b'Animal Farm'}
b'102' {b'Author:FirstName': b'George', b'Author:LastName': b'Orwell', b'Info:Price': b'150', b'Info:Title': b'1984'}
b'103' {b'Author:FirstName': b'Albert', b'Author:LastName': b'Camus', b'Info:Price': b'200', b'Info:Title': b'The Fall'}
b'104' {b'Author:FirstName': b'Franz', b'Author:LastName': b'Kafka', b'Info:Price': b'250', b'Info:Title': b'The Trial'}
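
The scan returns rows in row-key order. HBase compares row keys as raw bytes, so numeric-looking keys only sort numerically when they all have the same width (as `101` to `104` do here). The sketch below uses plain Python `sorted()` on bytes, which also compares byte by byte, to illustrate the effect; `make_key` and the width of 6 are illustrative choices and not part of Happybase.

```python
# Byte-wise ordering: b'10' sorts before b'2' because b'1' < b'2'
keys = [b'9', b'10', b'101', b'2']
unpadded = sorted(keys)
print(unpadded)

def make_key(n, width=6):
    """Zero-pad an integer id so that byte order matches numeric order."""
    return str(n).zfill(width).encode('ascii')

padded = sorted(make_key(n) for n in (9, 10, 101, 2))
print(padded)
```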


In [9]:
# a row can be retrieved using Table.row()
# for a given row key
# note that the bytes objects can be decoded into strings using decode("utf-8")
row = table.row(b'102')
print("Author : {} {}".format(row[b'Author:FirstName'].decode("utf-8"), row[b'Author:LastName'].decode("utf-8")))

Author : George Orwell
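
Decoding each field by hand gets repetitive. A small helper can convert a whole row at once; this is a minimal sketch in which `decode_row` is a hypothetical name and `sample_row` is a hard-coded stand-in shaped like the result of `table.row(b'102')`, so the example runs without an HBase connection.

```python
def decode_row(row, encoding='utf-8'):
    """Return a copy of a row dict with str keys and str values."""
    return {col.decode(encoding): val.decode(encoding) for col, val in row.items()}

# Stand-in for the dictionary returned by table.row(b'102')
sample_row = {b'Author:FirstName': b'George', b'Author:LastName': b'Orwell',
              b'Info:Price': b'150', b'Info:Title': b'1984'}

book = decode_row(sample_row)
print("Author : {} {}".format(book['Author:FirstName'], book['Author:LastName']))
```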


In [10]:
# Similarly, several rows can be retrieved in one call
# using Table.rows() with a list of row keys
rows = table.rows([b'101', b'103', b'104'])
for key, data in rows:
    print(key, data)

b'101' {b'Author:FirstName': b'George', b'Author:LastName': b'Orwell', b'Info:Price': b'100', b'Info:Title': b'Animal Farm'}
b'103' {b'Author:FirstName': b'Albert', b'Author:LastName': b'Camus', b'Info:Price': b'200', b'Info:Title': b'The Fall'}
b'104' {b'Author:FirstName': b'Franz', b'Author:LastName': b'Kafka', b'Info:Price': b'250', b'Info:Title': b'The Trial'}


In [11]:
# get the result as a dictionary
rows_as_dict = dict(table.rows([b'101', b'103', b'104']))

In [12]:
type(rows_as_dict)

dict

In [13]:
print(rows_as_dict)

{b'101': {b'Author:FirstName': b'George', b'Author:LastName': b'Orwell', b'Info:Price': b'100', b'Info:Title': b'Animal Farm'}, b'103': {b'Author:FirstName': b'Albert', b'Author:LastName': b'Camus', b'Info:Price': b'200', b'Info:Title': b'The Fall'}, b'104': {b'Author:FirstName': b'Franz', b'Author:LastName': b'Kafka', b'Info:Price': b'250', b'Info:Title': b'The Trial'}}


In [14]:
# access individual columns
rows_as_dict[b'101'][b'Author:FirstName']

b'George'

In [15]:
# we can also retrieve only the required columns rather than
# retrieving whole rows and filtering the output afterwards.
# This reduces the amount of data transferred and so improves performance.
# This is done with the columns argument. For example:
rows = table.rows([b'101', b'104'], columns=[b'Author'])
for key, data in rows:
    print(key, data)

# In the above example, note that passing a column family name retrieves all of its columns

b'101' {b'Author:FirstName': b'George', b'Author:LastName': b'Orwell'}
b'104' {b'Author:FirstName': b'Franz', b'Author:LastName': b'Kafka'}
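
The `columns` argument also accepts fully qualified `family:column` names, which narrows the result to a single column. The helper below is a sketch under that assumption; `titles_for` is a hypothetical name, and `table` is assumed to be the `happybase.Table` used throughout this notebook.

```python
def titles_for(table, row_keys):
    """Map each row key to its decoded Info:Title, fetching only that column."""
    rows = table.rows(row_keys, columns=[b'Info:Title'])
    return {
        key: data[b'Info:Title'].decode('utf-8')
        for key, data in rows
        if b'Info:Title' in data  # skip rows that have no title stored
    }

# titles_for(table, [b'101', b'104'])  # needs the `table` object from earlier
```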
