# How to build a database
An important element of the COSIMA cookbook is that model output metadata is held in a database. This database allows for easy querying of the data, so that variables can be loaded with a single command.

The current version of the cookbook (as of May 2019) has been upgraded to allow for multiple databases. We will continue to maintain databases of available experiments, but users can also create their own (smaller) databases covering just the simulations that they are interested in. This example shows how to build your own database.


In [26]:
%matplotlib inline
import cosima_cookbook as cc
from dask.distributed import Client

**First, create a database using this function:**

In [8]:
help(cc.database.create_database)

Help on function create_database in module cosima_cookbook.database:

create_database(db, debug=False)
    Create new database file with the target schema.
    
    We create a foreign key constraint on the ncfile column in
    the ncvars table, but it won't be enforced without `PRAGMA
    foreign_keys = 1' in sqlite.
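
The note about foreign keys in the docstring above can be illustrated with Python's built-in `sqlite3` module. The following is a minimal sketch, not the cookbook's own code: the two tables are cut-down stand-ins for `ncfiles` and `ncvars`, the function name `orphan_insert_allowed` is hypothetical, and it assumes a default SQLite build in which foreign key enforcement is off until the pragma is issued.

```python
import sqlite3


def orphan_insert_allowed(enforce):
    """Try to insert an ncvars row that points at a non-existent ncfiles row.

    Returns True if SQLite accepts the row, False if it is rejected.
    (Hypothetical helper; the tables are simplified stand-ins.)
    """
    conn = sqlite3.connect(":memory:")
    if enforce:
        # Must be issued per connection, outside of a transaction.
        conn.execute("PRAGMA foreign_keys = 1")
    conn.execute("CREATE TABLE ncfiles (id INTEGER PRIMARY KEY, ncfile TEXT)")
    conn.execute(
        "CREATE TABLE ncvars ("
        "id INTEGER PRIMARY KEY, "
        "ncfile INTEGER NOT NULL REFERENCES ncfiles(id), "
        "variable TEXT)"
    )
    try:
        # There is no ncfiles row with id 999, so this row is an orphan.
        conn.execute("INSERT INTO ncvars (ncfile, variable) VALUES (999, 'temp')")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


print("accepted without pragma:", orphan_insert_allowed(enforce=False))
print("accepted with pragma:   ", orphan_insert_allowed(enforce=True))
```

In practice this means the constraint documents the relationship between the two tables, but anything writing to the database directly has to switch enforcement on itself if it wants orphaned rows to be rejected.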



We will use our default ACCESS-OM2-01 database by way of example:

In [28]:
db = '/g/data3/hh5/tmp/cosima/database/access-om2-01.db'
cc.database.create_database(db)

(<sqlalchemy.engine.base.Connection at 0x7efca9f02588>,
 {'ncfiles': Table('ncfiles', MetaData(bind=None), Column('id', Integer(), table=<ncfiles>, primary_key=True, nullable=False), Column('index_time', DateTime(), table=<ncfiles>), Column('ncfile', Text(), table=<ncfiles>), Column('present', Boolean(), table=<ncfiles>), Column('experiment', Text(), table=<ncfiles>), Column('run', Integer(), table=<ncfiles>), Column('timeunits', Text(), table=<ncfiles>), Column('calendar', Text(), table=<ncfiles>), Column('time_start', Text(), table=<ncfiles>), Column('time_end', Text(), table=<ncfiles>), Column('frequency', Text(), table=<ncfiles>), schema=None),
  'ncvars': Table('ncvars', MetaData(bind=None), Column('id', Integer(), table=<ncvars>, primary_key=True, nullable=False), Column('ncfile', Integer(), ForeignKey('ncfiles.id'), table=<ncvars>, nullable=False), Column('variable', Text(), table=<ncvars>), Column('dimensions', Text(), table=<ncvars>), Column('chunking', Text(), table=<ncvars>)

**Second, start up a client** (required to build the index):

In [16]:
client = Client(n_workers=4)
client

Client: Scheduler: tcp://127.0.0.1:39367, Dashboard: http://127.0.0.1:8787/status
Cluster: Workers: 4, Cores: 8, Memory: 33.67 GB


**Now, you're ready to build the database** using the following function:

In [7]:
help(cc.database.build_index)

Help on function build_index in module cosima_cookbook.database:

build_index(directories, client, db, update=False, debug=False)
    Index all runs contained within a directory. Requires a distributed client for processing,
    and the filename of a database that's been created with the create_database() function.
    
    May scan for only new entries to add to database with the update flag.



In [32]:
directory = '/g/data3/hh5/tmp/cosima/access-om2-01/'
cc.database.build_index(directory, client, db, update=True)

2/2

You can supply either a single directory or a list of directories to be included in your database. Note that this operation takes about 10 minutes for the first build of the database, or whenever `update=False`. With `update=True`, only new entries are scanned and added.
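
Because the index is an SQLite file, it can also be inspected directly with Python's built-in `sqlite3` module. The sketch below is illustrative only: it builds a tiny in-memory stand-in that uses a subset of the `ncfiles` and `ncvars` columns shown in the `create_database` output, and the rows, experiment names and the helper `variables_by_experiment` are all hypothetical.

```python
import sqlite3


def variables_by_experiment(conn):
    """Return {experiment: sorted list of distinct variable names}.

    Joins ncvars to ncfiles through the ncvars.ncfile -> ncfiles.id link.
    (Hypothetical helper, not part of the cookbook API.)
    """
    rows = conn.execute(
        "SELECT DISTINCT f.experiment, v.variable "
        "FROM ncvars AS v JOIN ncfiles AS f ON v.ncfile = f.id "
        "ORDER BY f.experiment, v.variable"
    )
    result = {}
    for experiment, variable in rows:
        result.setdefault(experiment, []).append(variable)
    return result


# Toy stand-in for a real index; all rows below are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ncfiles (id INTEGER PRIMARY KEY, ncfile TEXT, experiment TEXT)"
)
conn.execute(
    "CREATE TABLE ncvars (id INTEGER PRIMARY KEY, ncfile INTEGER NOT NULL, variable TEXT)"
)
conn.executemany(
    "INSERT INTO ncfiles (id, ncfile, experiment) VALUES (?, ?, ?)",
    [
        (1, "output000/ocean.nc", "exp_a"),
        (2, "output001/ocean.nc", "exp_a"),
        (3, "output000/ice.nc", "exp_b"),
    ],
)
conn.executemany(
    "INSERT INTO ncvars (ncfile, variable) VALUES (?, ?)",
    [(1, "temp"), (1, "salt"), (2, "temp"), (3, "aice")],
)

summary = variables_by_experiment(conn)
print(summary)
```

To try the same query against a real index, replace the in-memory connection with `sqlite3.connect(db)`, where `db` is the database path used above. This assumes the file has the `ncfiles` and `ncvars` tables created by `create_database`.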