This guide shows how to get started using Intake to read an Accumulo table.
For conda users, the Intake Accumulo plugin is installed with the following command:
conda install -c intake intake-accumulo
The simplest use case for this plugin is to read an existing Accumulo table. Assuming the Accumulo instance is located at localhost:42424 and the table name is held in the variable table, the following reads the entire table into a dataframe:
>>> import intake
>>> ds = intake.open_accumulo(table)
>>> df = ds.read()
>>> df
     row  column_family  column_qualifier  column_visibility                     time  value
0  row_0            cf1               cq1                     2018-05-15 22:53:37.990      0
1  row_0            cf2               cq2                     2018-05-15 22:53:38.009      0
2  row_1            cf1               cq1                     2018-05-15 22:53:38.018      1
3  row_1            cf2               cq2                     2018-05-15 22:53:38.026      1
4  row_2            cf1               cq1                     2018-05-15 22:53:38.034      2
5  row_2            cf2               cq2                     2018-05-15 22:53:38.042      2
6  row_3            cf1               cq1                     2018-05-15 22:53:38.049      3
7  row_3            cf2               cq2                     2018-05-15 22:53:38.057      3
8  row_4            cf1               cq1                     2018-05-15 22:53:38.065      4
9  row_4            cf2               cq2                     2018-05-15 22:53:38.072      4
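The dataframe holds one line per Accumulo entry, so a single row key appears once for each column family. As a minimal sketch of working with that layout, the snippet below rebuilds the first four entries of the output by hand (a stand-in for the result of ds.read(), so it runs without an Accumulo instance) and pivots them so that each row key becomes one line. It assumes the values are integers and that each row key has at most one entry per column family; neither is guaranteed for a real table.

```python
import pandas as pd

# Hand-built stand-in for the frame returned by ds.read():
# the first four entries of the output shown above.
df = pd.DataFrame(
    {
        "row": ["row_0", "row_0", "row_1", "row_1"],
        "column_family": ["cf1", "cf2", "cf1", "cf2"],
        "column_qualifier": ["cq1", "cq2", "cq1", "cq2"],
        "column_visibility": ["", "", "", ""],
        "time": pd.to_datetime(
            [
                "2018-05-15 22:53:37.990",
                "2018-05-15 22:53:38.009",
                "2018-05-15 22:53:38.018",
                "2018-05-15 22:53:38.026",
            ]
        ),
        "value": [0, 0, 1, 1],  # assumption: values are integers
    }
)

# Reshape from one line per entry to one line per row key,
# with one column per column family.
wide = df.pivot(index="row", columns="column_family", values="value")
print(wide)
```

If a row key can carry several qualifiers within one family, pivot raises an error on the duplicate pairs; in that case the qualifier would need to be part of the column key as well.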
This example is equivalent to the one above, except that the table is now accessed through an existing catalog, catalog.yml:
>>> import intake
>>> c = intake.open_catalog("catalog.yml")
>>> df = c.basic.read()
>>> df
     row  column_family  column_qualifier  column_visibility                     time  value
0  row_0            cf1               cq1                     2018-05-15 22:53:37.990      0
1  row_0            cf2               cq2                     2018-05-15 22:53:38.009      0
2  row_1            cf1               cq1                     2018-05-15 22:53:38.018      1
3  row_1            cf2               cq2                     2018-05-15 22:53:38.026      1
4  row_2            cf1               cq1                     2018-05-15 22:53:38.034      2
5  row_2            cf2               cq2                     2018-05-15 22:53:38.042      2
6  row_3            cf1               cq1                     2018-05-15 22:53:38.049      3
7  row_3            cf2               cq2                     2018-05-15 22:53:38.057      3
8  row_4            cf1               cq1                     2018-05-15 22:53:38.065      4
9  row_4            cf2               cq2                     2018-05-15 22:53:38.072      4
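The contents of catalog.yml are not shown in this guide. As a rough sketch of what such a file might contain, the snippet below writes a one-entry catalog to disk. The entry name basic matches the c.basic access used in the catalog example, and the driver name accumulo is inferred from intake.open_accumulo. The argument key table and the table name my_table are assumptions made for illustration and should be checked against the plugin's own reference.

```python
from pathlib import Path

# Hypothetical catalog with a single entry named "basic".
# "table" as the argument key and "my_table" as its value are assumptions.
CATALOG_TEXT = """\
sources:
  basic:
    description: Example Accumulo table (hypothetical entry)
    driver: accumulo
    args:
      table: my_table
"""

# Write the catalog next to the script so intake.open_catalog can find it.
catalog_path = Path("catalog.yml")
catalog_path.write_text(CATALOG_TEXT)
print(catalog_path.read_text())
```

Reading through this catalog still requires a running Accumulo instance that holds the named table.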