Data Visualization with Sina: Jupyter
=====================

What follows is an introductory example of using Matplotlib in Jupyter to visualize data accessed through Sina. All examples use the NOAA example data set by default, but they can use any database included with sina-examples. For now, this demo uses only the SQLite component of Sina. For better performance on large data sets, check out the Cassandra portion of Sina.



Accessing the Data
-----------------

We'll first create a Sina DAOFactory that's aware of our database. Sina doesn't ship 

In [None]:
import matplotlib

import numpy as np
import sina.datastores.sql as sina_sql

DATABASE = "/collab/usr/gapps/wf/examples/data/noaa/noaa.sqlite"
factory = sina_sql.DAOFactory(DATABASE)

Performing our First "Query"
--------------------------

For our first query, we'll ask for something simple--the record with the ID `WCOA2011-13-95-1-7`. We can access its "raw" data (that is, the JSON used to create it) and use that to create a Python object. From here, we'll list the first 5 scalars that are stored in this record, to give us an idea of this data set's contents.
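
To make the shape of that raw JSON concrete, here is a minimal sketch that uses a hand-written stand-in instead of a real record. The ID, type, and values are invented for illustration. The only structure assumed is what the cells in this notebook rely on: a `data` list whose entries carry a `name`, a `value`, and optionally `units`.

```python
import json

# Hand-written stand-in for a record's raw JSON; all values are invented.
raw = json.dumps({
    "id": "example-record",
    "type": "obs",
    "data": [
        {"name": "depth", "value": 10.0, "units": "meters"},
        {"name": "temp", "value": 12.5, "units": "C"},
        {"name": "o2", "value": 250.0},  # units are optional
    ],
})

# Parse the raw string back into Python objects and peek at the first entries.
record_json = json.loads(raw)
first_entries = record_json["data"][:5]
names = [entry["name"] for entry in first_entries]
print(names)
```

Slicing with `[:5]` is safe even when a record holds fewer than five entries; it simply returns what is there.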

In [None]:
import json

record_dao = factory.createRecordDAO()
sample_record = record_dao.get("WCOA2011-13-95-1-7")
sample_json = json.loads(sample_record.raw)
print(json.dumps(sample_json['data'][:5]))

A Very Simple Graph
------------------

We can use the object we've built and matplotlib to prepare an example bar graph.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib notebook

scalars_of_interest = ['ctd_oxy', 'o2']
scalars_from_json = []
y_label_fallback = "unknown units"
title = "Comparison of ctd_oxy and o2"

for entry in sample_json['data']:
    if entry['name'] in scalars_of_interest:
        scalars_from_json.append((entry['name'],
                                  entry['value'],
                                  entry.get('units') or y_label_fallback))

# Use the shared units as the y-axis label; fall back if the scalars disagree.
units_list = set(x[2] for x in scalars_from_json)
units = units_list.pop() if (len(units_list) == 1) else y_label_fallback

# Sort once so bar heights and tick labels come from the same ordering.
sorted_scalars = sorted(scalars_from_json)
y_pos = np.arange(len(sorted_scalars))
plt.figure(figsize=(9, 4))
plt.bar(y_pos, [x[1] for x in sorted_scalars], align='center', alpha=0.5)
plt.xticks(y_pos, [x[0] for x in sorted_scalars])
plt.ylabel(units)
plt.title(title)
plt.show()
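
The y-axis label logic in the cell above can be isolated into a small helper, which makes it easier to check on its own. This is only a sketch: `pick_axis_label` is a hypothetical name, not part of Sina or Matplotlib, and the entries are invented.

```python
def pick_axis_label(entries, fallback="unknown units"):
    """Return the units shared by all entries, or the fallback otherwise."""
    # Entries without units count as the fallback, so they force a mismatch
    # whenever any other entry does declare units.
    units = {entry.get("units") or fallback for entry in entries}
    return units.pop() if len(units) == 1 else fallback


same = pick_axis_label([
    {"name": "ctd_oxy", "value": 240.0, "units": "umol/kg"},
    {"name": "o2", "value": 250.0, "units": "umol/kg"},
])
mixed = pick_axis_label([
    {"name": "temp", "value": 12.5, "units": "C"},
    {"name": "o2", "value": 250.0, "units": "umol/kg"},
])
missing = pick_axis_label([
    {"name": "ctd_oxy", "value": 240.0, "units": "umol/kg"},
    {"name": "o2", "value": 250.0},
])
print(same, "|", mixed, "|", missing)
```

Falling back whenever the units disagree avoids labeling the axis with units that apply to only some of the bars.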

Exploring Many Runs at Once
---------------------------

While we may sometimes be interested in one specific record, it's often useful to compare several. In this case, we'll compare all the different values of a few scalars between all runs in our sample database.
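
Comparing scalars across runs comes down to lining up each record's values into parallel lists. The sketch below shows that idea with the standard library's `sqlite3` module and an in-memory table. The table layout and rows are hypothetical and deliberately simplified; they are not Sina's actual schema.

```python
import sqlite3

# Hypothetical, simplified table for illustration only; not Sina's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scalar (record_id TEXT, name TEXT, value REAL)")
conn.executemany(
    "INSERT INTO scalar VALUES (?, ?, ?)",
    [
        ("rec-1", "temp", 12.5), ("rec-1", "depth", 10.0),
        ("rec-2", "temp", 11.0), ("rec-2", "depth", 25.0),
        ("rec-3", "temp", 9.5),  # no depth recorded for this record
    ],
)

wanted = ["temp", "depth"]
placeholders = ", ".join("?" for _ in wanted)
rows = conn.execute(
    "SELECT record_id, name, value FROM scalar "
    "WHERE name IN ({})".format(placeholders),
    wanted,
).fetchall()

# Group values by record so each scalar can be looked up by name.
by_record = {}
for record_id, name, value in rows:
    by_record.setdefault(record_id, {})[name] = value

# Keep only records that have every requested scalar, in a stable order.
complete = sorted(rid for rid, vals in by_record.items()
                  if all(n in vals for n in wanted))

# First list holds record IDs; the rest follow the order of `wanted`.
columns = [complete] + [[by_record[rid][n] for rid in complete] for n in wanted]
print(columns)
```

Looking values up by name keeps each output list tied to the scalar it was requested for, regardless of the order in which the database returns rows.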

In [None]:
import sina.datastores.sql_schema as schema
import matplotlib.pyplot as plt
from sqlalchemy import func
%matplotlib notebook

colors=["#646881", "#62bec1"]
first_scalar_set=['press', 'temp', 'depth']
second_scalar_set=['temp', 'ctd_oxy', 'depth']
mag=10
alpha=0.5

# Can be replaced once the Sina version is updated
# Returns len(scalar_list)+1 lists, in the form [[record_ids],[scal_1_vals],[scal_2_vals],...]
# where the value lists follow the order of scalar_list.
def sina_extract(scalar_list):
    rows = (factory.session.query(schema.Scalar.record_id,
                                  schema.Scalar.name,
                                  schema.Scalar.value)
            .filter(schema.Scalar.name.in_(scalar_list))
            .all())
    # Group values by record so each scalar is looked up by name, not by
    # its alphabetical position in the query results.
    by_record = {}
    for record_id, name, value in rows:
        by_record.setdefault(record_id, {})[name] = value
    out = [[] for _ in range(len(scalar_list) + 1)]
    for record_id in sorted(by_record):
        values = by_record[record_id]
        # Skip records that are missing any of the requested scalars.
        if all(name in values for name in scalar_list):
            out[0].append(record_id)
            for index, name in enumerate(scalar_list):
                out[index + 1].append(values[name])
    return out
            
# Matplotlib
fig = plt.figure(figsize=(9, 6))
ax = fig.add_subplot(111)
text=ax.text(0,0, "", va="bottom", ha="left")
    
# For each set: first scalar on x, second on y, third scaled into marker size.
scatter_lists_1 = sina_extract(first_scalar_set)
sc1 = plt.scatter(x=scatter_lists_1[1], y=scatter_lists_1[2],
                  s=[x*mag for x in scatter_lists_1[3]],
                  c=colors[0], edgecolor=colors[0], alpha=alpha)

scatter_lists_2 = sina_extract(second_scalar_set)
sc2 = plt.scatter(x=scatter_lists_2[1], y=scatter_lists_2[2],
                  s=[x*mag for x in scatter_lists_2[3]],
                  c=colors[1], edgecolor=colors[1], alpha=alpha)

plt.legend((sc1,
            sc2),
           (str(first_scalar_set),
            str(second_scalar_set)),
           scatterpoints=1,
           fontsize=8)

plt.show()