Simple Bar and Scatter Graph
=====================

What follows is an introductory example of using Matplotlib in Jupyter to visualize data accessed through Sina. All examples here use the NOAA example data set by default but can, in principle, use any database assembled by Sina. For now, this demo uses only the SQLite component of Sina. For better performance on large data sets, check out the Cassandra portion of Sina!



Accessing the Data
-----------------

We'll first create a Sina DAOFactory that's aware of our database.

In [None]:
import matplotlib

import numpy as np
import sina.datastores.sql as sina_sql

DATABASE = "/collab/usr/gapps/wf/examples/data/noaa/noaa.sqlite"
factory = sina_sql.DAOFactory(DATABASE)

Performing our First Query
--------------------------

For our first query, we'll ask for something simple--the record with the ID `WCOA2011-13-95-1-7`. We can access its "raw" data (that is, the JSON used to create it) and use that to create a Python object. From here, we'll list the first 5 scalars that are stored in this record, to give us an idea of this data set's contents.

In [None]:
import json

record_of_interest = "WCOA2011-13-95-1-7"
record_dao = factory.createRecordDAO()
# .raw holds the "raw" data for the record (the JSON used to create it)
sample_record = record_dao.get(record_of_interest).raw
print(sample_record['data'][:5])
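
To make the shape of that `data` list concrete, here is a minimal sketch that uses a hand-written stand-in rather than a real Sina record. It assumes each scalar entry is a dict with `name`, `value`, and `units` keys, which is how the cells in this notebook read them. `mock_record`, `index_scalars`, and every value below are invented for illustration and are not part of Sina or the NOAA data.

```python
# Hypothetical stand-in for sample_record; names, values, and units are invented.
mock_record = {
    "id": "example-record",
    "type": "obs",
    "data": [
        {"name": "depth", "value": 10.5, "units": "meters"},
        {"name": "ctd_oxy", "value": 250.1, "units": "micromol/kg"},
        {"name": "o2", "value": 249.8, "units": "micromol/kg"},
    ],
}


def index_scalars(record):
    """Map each scalar's name to its full entry so it can be looked up directly."""
    return {entry["name"]: entry for entry in record["data"]}


scalars_by_name = index_scalars(mock_record)
print(sorted(scalars_by_name))
print(scalars_by_name["o2"]["value"], scalars_by_name["o2"]["units"])
```

Indexing by name once avoids scanning the whole list each time we want a particular scalar.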

Exploring a Single Run
------------------

We can use the object we've built, together with Matplotlib, to prepare an example bar graph comparing two scalars. By default, these are `ctd_oxy` and `o2`: the oxygen content and its check.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib notebook

# Keep these in alphabetical order (or wrap the list in sorted()): the bars
# below are sorted by name, and the tick labels must be in the same order
scalars_of_interest = ['ctd_oxy', 'o2']
title = "Comparison of {} for {}".format(" and ".join(scalars_of_interest), record_of_interest)

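# Collect the entries for the scalars we care about
scalars = []
for entry in sample_record['data']:
    if entry['name'] in scalars_of_interest:
        scalars.append(entry)

# Sort by name so the bars line up with the tick labels
sorted_scalars = sorted(scalars, key=lambda k: k['name'])
# Both bars share one y-axis, so we label it with the first scalar's units
units = sorted_scalars[0].get('units')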

y_pos = np.arange(len(scalars_of_interest))
plt.figure(figsize=(9, 4))
plt.bar(y_pos, [x['value'] for x in sorted_scalars], align='center', alpha=0.5)
plt.xticks(y_pos, scalars_of_interest)
plt.ylabel(units)
plt.title(title)
plt.show()
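
The bar graph above labels its y-axis with the units of the first scalar only, which is meaningful only when both scalars share units. A small check along these lines could catch a mismatch before plotting. This is a sketch rather than anything Sina provides: `common_units` is a hypothetical helper, and the entries below are made-up values.

```python
def common_units(entries):
    """Return the single unit shared by all entries, or None if they differ."""
    units = {entry.get("units") for entry in entries}
    return units.pop() if len(units) == 1 else None


# Invented entries, shaped like the scalar dicts used in this notebook
matching = [
    {"name": "ctd_oxy", "value": 250.1, "units": "micromol/kg"},
    {"name": "o2", "value": 249.8, "units": "micromol/kg"},
]
mismatched = [
    {"name": "depth", "value": 10.5, "units": "meters"},
    {"name": "o2", "value": 249.8, "units": "micromol/kg"},
]

print(common_units(matching))
print(common_units(mismatched))
```

If the helper returns `None`, a shared y-axis label would be misleading, and separate plots or axes may be a better fit.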

Exploring Many Runs at Once
-----------------------------------

While we may sometimes be interested in one specific record, it's often useful to compare several. In this case, we'll compare the two scalars we selected above across all observations in our data set, skipping any bad reads (recorded as -999).

In [None]:
import matplotlib.pyplot as plt
%matplotlib notebook

# Customize the graph
alpha = 0.15
title = "Comparison of {} for all observations".format(" and ".join(scalars_of_interest))

# Get all the ids of records we're interested in
all_obs_ids = [x.id for x in record_dao.get_all_of_type("obs")]

# Get the scalars we're interested in for each of those records
all_obs_scalars = [record_dao.get_scalars(x, scalars_of_interest) for x in all_obs_ids]

# Keep only the values we'll plot, discarding scalar names and any bad reads (-999 as value)
all_obs_values = [(x[0]['value'], x[1]['value'])
                  for x in all_obs_scalars
                  if all(y['value'] != -999.0 for y in x)]

# Change list from [(x_val_1, y_val_1), (x_val_2, y_val_2)] to [(x_val_1, x_val_2), (y_val_1, y_val_2)].
# In Python 3, zip() returns an iterator, so we wrap it in list() to allow indexing below.
point_coords = list(zip(*all_obs_values))

# Read the units from the first scalar set available (all_obs_scalars[0]), one entry per axis
x_units = all_obs_scalars[0][0]['units']
y_units = all_obs_scalars[0][1]['units']

# Configure and display the graph
plt.scatter(x=point_coords[0], y=point_coords[1], alpha=alpha)
plt.xlabel(x_units)
plt.ylabel(y_units)
plt.title(title)
plt.show()
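
The cleanup steps in the cell above (dropping bad reads, then turning a list of (x, y) pairs into separate x and y sequences) can be tried without a database. The sketch below uses invented scalar sets. `split_clean_pairs` and `BAD_READ` are hypothetical names, and the only thing taken from the notebook is the convention that -999 marks a bad read.

```python
BAD_READ = -999.0  # sentinel the notebook treats as a bad read


def split_clean_pairs(scalar_sets, bad_value=BAD_READ):
    """Drop any set containing a bad read, then return (x_values, y_values)."""
    pairs = [(s[0]["value"], s[1]["value"])
             for s in scalar_sets
             if all(entry["value"] != bad_value for entry in s)]
    if not pairs:
        return [], []
    # zip(*pairs) transposes [(x1, y1), (x2, y2)] into (x1, x2), (y1, y2)
    xs, ys = zip(*pairs)
    return list(xs), list(ys)


# Invented data: the second observation has a bad read and should be skipped
mock_scalar_sets = [
    [{"name": "ctd_oxy", "value": 250.0}, {"name": "o2", "value": 248.5}],
    [{"name": "ctd_oxy", "value": -999.0}, {"name": "o2", "value": 251.0}],
    [{"name": "ctd_oxy", "value": 180.2}, {"name": "o2", "value": 181.0}],
]

xs, ys = split_clean_pairs(mock_scalar_sets)
print(xs)
print(ys)
```

Returning two plain lists means the result can be passed directly as the `x` and `y` arguments of `plt.scatter`, and the empty case avoids an unpacking error when every observation is a bad read.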