First, let's import the EmptyHeaded Python module.

In [1]:
from emptyheaded import *

Our query compiler is written in Scala, so we first start the JVM with the following command.

In [2]:
start()

You can change settings such as the number of threads through the configuration object that the database instance uses. Printing a default Config shows the available fields.

In [3]:
c = Config()
print(c)

(system: emptyheaded, num_threads: 1,
		num_sockets: 4, layout: hybrid, memory: RAM)


We accept data in the form of a pandas DataFrame. Let's read some data from a CSV file into a DataFrame.

In [4]:
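import numpy as np
import pandas as pd

# Read a header-less, two-column CSV file; both columns are parsed as uint32.
ratings = pd.read_csv('test.csv',
  sep=',',
  names=["0", "1"],
  dtype={"0": np.uint32, "1": np.uint32})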
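Before handing a file to pandas, it can help to confirm that it really has two columns of non-negative integers that fit in an unsigned 32-bit type, since the dtype argument above asks for np.uint32 columns. The sketch below uses only the standard library. check_edge_csv is a hypothetical helper, not part of EmptyHeaded or pandas, and the sample text is simply the three rows the relation holds later in this notebook.

```python
import csv
import io

UINT32_MAX = 2**32 - 1


def check_edge_csv(text):
    """Hypothetical pre-flight check: parse two-column CSV text and confirm
    every value is an integer that fits in an unsigned 32-bit column."""
    rows = []
    for lineno, fields in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not fields:
            continue  # csv.reader yields [] for blank lines; skip them
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        values = tuple(int(f) for f in fields)  # raises ValueError on non-integers
        for v in values:
            if not 0 <= v <= UINT32_MAX:
                raise ValueError(f"line {lineno}: {v} does not fit in uint32")
        rows.append(values)
    return rows


print(check_edge_csv("2,3\n4,5\n6,7\n"))  # [(2, 3), (4, 5), (6, 7)]
```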

Next, let's create a relation. A relation has a name and takes a DataFrame containing its data; the schema is inferred automatically.

In [5]:
graph = Relation(
  name="graph",
  dataframe=ratings)

We are finally ready to create a database. Database.create takes a config, the path of the folder where the database will be stored on disk (this folder must not already exist), and a list of the relations the database will contain. This step is slow.

In [7]:
db = Database.create(
  Config(),
  "/Users/caberger/Documents/Research/code/EmptyHeaded/examples/db",
  [graph])
db.build()

Because that step is slow, we would like to skip it when the database has already been created. We can open an existing database with the following command.

In [8]:
db = Database.from_existing("/Users/caberger/Documents/Research/code/EmptyHeaded/examples/db")
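The choice between building a new database and reopening an existing one can be wrapped in a small helper. This is a minimal sketch, assuming that an existing folder at the path means the database was already built. open_or_create and its create_db and open_db parameters are hypothetical names; the caller supplies the actual EmptyHeaded calls.

```python
import os


def open_or_create(path, create_db, open_db):
    """Hypothetical helper: reopen the database stored at `path` if its
    folder already exists, otherwise create (and build) it.

    create_db(path) and open_db(path) are callables supplied by the caller.
    Note: a folder left behind by a failed build would still be treated
    as an existing database by this simple check.
    """
    if os.path.isdir(path):
        return open_db(path)
    # Database creation requires that the target folder does not exist yet.
    return create_db(path)
```

In this notebook, open_db could be Database.from_existing, and create_db a small function that calls Database.create(Config(), path, [graph]), then build() on the result, and returns the database.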

We can load the relation completely into memory with the following command.

In [9]:
db.load("graph")

We can load a relation from disk and return it to the front-end as follows. Note that the returned object is just a wrapper around the back-end class. We can then inspect the following fields of the relation.

In [10]:
g = db.get("graph")
print(g.annotated)
print(g.num_rows)
print(g.num_columns)

False
3
2


We can retrieve the relation's contents as a pandas DataFrame with the following command.

In [12]:
print(g.getDF())

   0  1
0  2  3
1  4  5
2  6  7


We can parse a Datalog query and obtain its intermediate representation (IR).

In [13]:
def triangle():
  return datalog("""
    Triangle(a,b,c) :- Edge(a,b),Edge(b,c),Edge(a,c).
  """).ir
ir = triangle()
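To give a rough idea of what a parser pulls out of this rule text, the toy sketch below splits a single conjunctive rule into its head atom and its list of body atoms, which correspond loosely to the RESULT and JOIN fields of the printed IR. parse_rule is a hypothetical illustration using Python's re module; it is not EmptyHeaded's Datalog parser and handles none of its aggregates, annotations, or filters.

```python
import re

# An atom looks like Name(var1,var2,...)
ATOM = re.compile(r"(\w+)\(([^)]*)\)")


def parse_rule(text):
    """Toy parser for one rule of the form Head(vars) :- Atom(vars),... .
    Returns (head, body), where each atom is (relation_name, [variables])."""
    head_text, body_text = text.strip().rstrip(".").split(":-")

    def atoms(s):
        return [(name, [v.strip() for v in args.split(",")])
                for name, args in ATOM.findall(s)]

    head = atoms(head_text)[0]
    body = atoms(body_text)
    return head, body


head, body = parse_rule("Triangle(a,b,c) :- Edge(a,b),Edge(b,c),Edge(a,c).")
print(head)  # ('Triangle', ['a', 'b', 'c'])
print(body)  # [('Edge', ['a', 'b']), ('Edge', ['b', 'c']), ('Edge', ['a', 'c'])]
```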

We can look at the intermediate representation.

In [14]:
for rule in ir.rules:
  print(rule)

RULE :-	 RESULT: [Triangle ['a', 'b', 'c'],[]]  
	 RECURSION:     
	 OPERATION: * 
	 ORDER: ['a', 'b', 'c'] 
	 PROJECT: [] 
	 JOIN: [[Edge ['a', 'b'],[]], [Edge ['b', 'c'],[]], [Edge ['a', 'c'],[]]] 
	 AGGREGATES: [] 
	 FILTERS: []>
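The IR lists three Edge atoms under JOIN and the attribute ORDER ['a', 'b', 'c']. As a rough illustration of evaluating the rule one attribute at a time in that order (the generic-join idea), the plain-Python sketch below binds a, then b, then c, intersecting the candidate sets contributed by every Edge atom that mentions the attribute being bound. It is only a sketch of the technique; triangles is a hypothetical name, and this is not the code EmptyHeaded generates.

```python
from collections import defaultdict


def triangles(edges):
    """Evaluate Triangle(a,b,c) :- Edge(a,b),Edge(b,c),Edge(a,c)
    attribute by attribute in the order a, b, c (generic-join style)."""
    out_nbrs = defaultdict(set)  # Edge(x, y) indexed by its first column x
    for x, y in edges:
        out_nbrs[x].add(y)
    sources = set(out_nbrs)  # values that appear in Edge's first column

    result = []
    # a: first column of both Edge(a,b) and Edge(a,c)
    for a in sorted(sources):
        # b: second column of Edge(a,b) AND first column of Edge(b,c)
        for b in sorted(out_nbrs[a] & sources):
            # c: second column of Edge(b,c) AND second column of Edge(a,c)
            for c in sorted(out_nbrs[b] & out_nbrs[a]):
                result.append((a, b, c))
    return result


print(triangles([(2, 3), (4, 5), (6, 7)]))  # [] (the sample graph has no triangles)
print(triangles([(0, 1), (1, 2), (0, 2)]))  # [(0, 1, 2)]
```

Intersecting at every step, rather than joining two Edge atoms first and filtering with the third, is what keeps intermediate results small on graphs with many two-hop paths but few triangles.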


We stop the JVM with the following command.

In [15]:
stop()