Raw Records and Tables
======================

While the query API is the primary way to query Sina efficiently, Sina also stores and returns each Record in a raw JSON form that is easy to manipulate directly. This notebook demonstrates using that raw form to display data in tables: one table for each Record type, plus a final one for the relationships between experiments and observations. The example is coupled to the noaa database in its data organization and in its column and scalar names, but the underlying principles apply to any Record assembled by Sina.
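
To make the raw form concrete, here is a minimal sketch of a Record's raw dictionary as the cells in this notebook read it. The keys "id", "data" (a list of name/value entries), and "files" (a list of entries with a "uri") are inferred from how the cells index the raw form; the "type" key is assumed to hold the type that get_all_of_type filters on. All ids, names, values, and the URI are hypothetical placeholders, not data from the noaa database.

```python
import json

# Hypothetical raw Record, shaped the way this notebook's cells index it.
# Keys are inferred from the notebook; every value is a made-up placeholder.
raw_json = """
{
  "id": "obs_0001",
  "type": "obs",
  "data": [
    {"name": "depth", "value": 10.0},
    {"name": "temp", "value": 11.5}
  ],
  "files": [{"uri": "observations/obs_0001.csv"}]
}
"""

record = json.loads(raw_json)

# Ordinary dictionary and list indexing is all the tables below rely on.
scalars = {entry["name"]: entry["value"] for entry in record["data"]}
first_file = record["files"][0]["uri"]

print(record["id"], scalars["temp"], first_file)
```

Because the raw form is plain dictionaries and lists, the same indexing works whether the data came from Sina or from a JSON file loaded by hand.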

First, we create a DAO (data access object) factory for our SQLite data store; it supplies the Record and Relationship DAOs used in the cells below.

In [None]:
import IPython.display as ipyd
import tabulate

import sina.datastores.sql as sina_sql

DATABASE = '/collab/usr/gapps/wf/examples/data/noaa/noaa.sqlite'

# Create the database access factory.
factory = sina_sql.DAOFactory(DATABASE)


We then use a RecordDAO to retrieve every Record of type "exp" (experiment). Each returned Record exposes its raw form through its "raw" attribute, a JSON-style dictionary that we can index directly for further analysis. Here we display each experiment's id alongside the URI of its first associated file, and save the ids for the relationship queries at the end.

In [None]:
# Extract the experiment records from the database
all_experiments = factory.createRecordDAO().get_all_of_type("exp")

# Build a list of table entries: a header row, then one experiment per row
table_data = [('Experiment Id', 'Data Source')]

exp_ids = []  # Save off the experiment ids for the relationship queries
for exp in all_experiments:
    rec = exp.raw
    exp_id = rec['id']
    exp_ids.append(exp_id)

    # Use the first file associated with the experiment as its data source
    exp_first_file = rec['files'][0]['uri']
    table_data.append((exp_id, exp_first_file))

# Display the data in an HTML table, using the first row as column headers
table = tabulate.tabulate(table_data, headers='firstrow', tablefmt='html')
ipyd.display(ipyd.HTML(table))
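
The experiment cell assumes that every experiment has at least one associated file; indexing rec['files'][0] raises a KeyError or IndexError otherwise. If that assumption might not hold, a small defensive helper keeps the table building. This is a sketch only: first_file_uri is a hypothetical helper, not part of Sina, and it runs here on made-up raw dictionaries shaped like the ones read above.

```python
def first_file_uri(raw, default=""):
    """Return the URI of the first file in a raw Record, or `default`.

    Hypothetical helper: tolerates a missing "files" key, an empty
    file list, and a file entry without a "uri".
    """
    files = raw.get("files") or []
    if not files:
        return default
    return files[0].get("uri", default)


# Hypothetical raw experiments: one with two files, one with none
experiments = [
    {"id": "exp_a", "files": [{"uri": "exp_a/source.nc"}, {"uri": "exp_a/extra.nc"}]},
    {"id": "exp_b"},
]

table_data = [("Experiment Id", "Data Source")]
for rec in experiments:
    table_data.append((rec["id"], first_file_uri(rec, default="(no file)")))

for row in table_data:
    print(row)
```

Using a visible placeholder such as "(no file)" rather than an empty string makes missing data stand out in the rendered table.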

Again, we retrieve every Record of a specific type, this time "obs" (observation). For each observation we pick out a selected set of scalars and place each one in its own column, building one table row per observation. As before, the last column shows the first file associated with the Record.

In [None]:
# Extract the observation records from the database
all_observations = factory.createRecordDAO().get_all_of_type("obs")

# Build a list of table entries: a header row, then one observation per row.
# Units are hard-coded in the headings rather than read from the data.
table_data = [('Observation Id', 'Depth (m)', 'Pressure (decibars)', 'Temp (C)',
               'Oxygen (micromol/kg)', 'O2 (micromol/kg)', 'O2 QC', 'pH', 'pH QC',
               'Observation Data')]

# Scalar names, in the same order as their columns (columns 1 through 8)
scalars_of_interest = ['depth', 'press', 'temp', 'ctd_oxy', 'o2', 'o2_qc', 'ph', 'ph_qc']

# Each observation gets a row in the table
for obs in all_observations:
    record = obs.raw
    table_row = [''] * len(table_data[0])
    table_row[0] = record['id']

    # For each scalar in the observation, if we're interested in it, load it in the right position
    for scalar in record['data']:
        name = scalar['name']
        if name in scalars_of_interest:
            column_offset = scalars_of_interest.index(name) + 1
            table_row[column_offset] = scalar['value']

    # The last column ('Observation Data') holds the first associated file
    table_row[-1] = record['files'][0]['uri']

    # Row is complete, add to the table
    table_data.append(table_row)

# Display the data in an HTML table, using the first row as column headers
table = tabulate.tabulate(table_data, headers='firstrow', tablefmt='html')
ipyd.display(ipyd.HTML(table))
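
The observation cell finds each scalar's column with scalars_of_interest.index(name), a linear search per scalar. An alternative is to compute a name-to-column dictionary once. The sketch below packages the row-building logic as a hypothetical build_row helper (not part of Sina) and exercises it on a made-up raw observation that lacks some scalars and carries one the table does not display ("salinity" is a placeholder name, not necessarily present in the noaa data).

```python
def build_row(raw, scalar_names):
    """Build one table row from a raw observation Record (hypothetical helper).

    Layout mirrors the observation table: column 0 is the id, columns
    1..N hold the scalars in `scalar_names` order, and the final column
    holds the first file's URI.
    """
    # Compute each scalar's column once instead of calling list.index per scalar
    column_of = {name: i + 1 for i, name in enumerate(scalar_names)}
    row = [""] * (len(scalar_names) + 2)
    row[0] = raw["id"]
    for entry in raw["data"]:
        column = column_of.get(entry["name"])
        if column is not None:  # skip scalars the table does not display
            row[column] = entry["value"]
    row[-1] = raw["files"][0]["uri"]
    return row


scalars_of_interest = ["depth", "press", "temp", "ctd_oxy", "o2", "o2_qc", "ph", "ph_qc"]

# Hypothetical raw observation: several scalars missing, one extra
obs = {
    "id": "obs_0001",
    "data": [
        {"name": "temp", "value": 11.5},
        {"name": "salinity", "value": 33.9},
        {"name": "depth", "value": 10.0},
    ],
    "files": [{"uri": "observations/obs_0001.csv"}],
}

row = build_row(obs, scalars_of_interest)
print(row)
```

The row always has one slot per header column (id, eight scalars, file), so missing scalars show up as blank cells rather than shifting later values.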

The third and final Record type, "qc" (quality control), is handled much like the second. For every qc Record, we display its id, which is the quality control value, alongside a description read from the value of its first data entry.

In [None]:
# Extract the quality control records from the database
all_qc = factory.createRecordDAO().get_all_of_type("qc")

# Build a list of table entries: a header row, then one quality control entry per row
table_data = [('Quality Control Value', 'Description')]

for qc in all_qc:
    record = qc.raw
    # Read the description from the value of the record's first data entry
    desc = record['data'][0]['value']
    table_data.append((record['id'], desc))

# Display the data in an HTML table, using the first row as column headers
table = tabulate.tabulate(table_data, headers='firstrow', tablefmt='html')
ipyd.display(ipyd.HTML(table))
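
The qc table pairs each quality control value with its description, and the observation table includes QC columns (o2_qc, ph_qc). One possible use of the qc Records is to translate those flags into readable text. The sketch below assumes, without confirmation from the data, that an observation's QC scalar holds a value matching some qc Record's id. The helpers qc_lookup and describe_flag and the sample records are hypothetical.

```python
def qc_lookup(qc_records):
    """Map each quality control value (the qc Record's id) to its description.

    Hypothetical helper mirroring the qc table: the description is the
    value of the Record's first data entry.
    """
    return {raw["id"]: raw["data"][0]["value"] for raw in qc_records}


def describe_flag(flag, lookup, default="unknown flag"):
    """Translate an observation QC flag into its description.

    Assumption (unverified): QC scalars such as o2_qc hold values that
    match qc Record ids. Ids are strings, so the flag is converted to a
    string; a whole-number float such as 2.0 is normalized to "2" first.
    """
    if isinstance(flag, float) and flag.is_integer():
        flag = int(flag)
    return lookup.get(str(flag), default)


# Hypothetical raw qc Records; ids, data names, and descriptions are placeholders
qc_records = [
    {"id": "1", "data": [{"name": "description", "value": "placeholder description A"}]},
    {"id": "2", "data": [{"name": "description", "value": "placeholder description B"}]},
]

lookup = qc_lookup(qc_records)
print(describe_flag(2.0, lookup))
print(describe_flag("9", lookup))
```

Building the lookup once and reusing it avoids re-scanning the qc Records for every observation row.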

Finally, we build a table illustrating the hierarchy of this data: one experiment that encompasses many observations.

While Relationships don't have a "raw" form like Records do, they're still very simple Python objects, each having a subject, predicate, and object (see Sina's documentation for more detail). For every experiment we discovered previously, we'll have Sina return all relationships in which that experiment was the subject.

In [None]:
# Extract the relationships for each experiment from the database
dao = factory.createRelationshipDAO()

# Build a list of table entries: a header row, then one relationship per row
table_data = [('Experiment Id', 'Relationship', 'Observation Id')]

for exp_id in exp_ids:
    # Every relationship in which this experiment is the subject
    all_relationships = dao.get(subject_id=exp_id)
    for rel in all_relationships:
        table_data.append((exp_id, rel.predicate, rel.object_id))

# Display the data in an HTML table, using the first row as column headers
table = tabulate.tabulate(table_data, headers='firstrow', tablefmt='html')
ipyd.display(ipyd.HTML(table))
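
The relationship table lists one row per relationship. To work with the experiment/observation hierarchy programmatically, the relationships can be grouped into a dictionary keyed by experiment. The sketch below uses a hypothetical namedtuple in place of Sina's Relationship objects, keeping only the subject_id, predicate, and object_id fields; the ids and the "contains" predicate are placeholders, not values from the noaa database.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for Sina's Relationship objects: only the
# subject/predicate/object fields are modeled here.
Relationship = namedtuple("Relationship", ["subject_id", "predicate", "object_id"])


def group_by_subject(rels, predicate=None):
    """Group object ids under their subject id, optionally keeping one predicate."""
    grouped = defaultdict(list)
    for rel in rels:
        if predicate is None or rel.predicate == predicate:
            grouped[rel.subject_id].append(rel.object_id)
    return dict(grouped)


# Placeholder relationships: two experiments, three observations
relationships = [
    Relationship("exp_a", "contains", "obs_0001"),
    Relationship("exp_a", "contains", "obs_0002"),
    Relationship("exp_b", "contains", "obs_0003"),
]

hierarchy = group_by_subject(relationships)
for exp_id, obs_ids in sorted(hierarchy.items()):
    print(exp_id, "->", ", ".join(obs_ids))
```

Filtering on the predicate matters if a database stores more than one kind of relationship for the same subject.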