Simple Bar and Scatter Graph
=====================

What follows is an introductory example of using Matplotlib in Jupyter to visualize data accessed through Sina. All examples here use the NOAA example data set by default but, with a few changes (e.g., to the scalars of interest), they should work with any database assembled by Sina.

This demo reads its data from an SQLite database to keep setup simple. For better performance on large data sets, you'll want to use a Cassandra back end instead.


Accessing the Data
-----------------

We'll first create a Sina DAOFactory that's aware of our database.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import sina.datastores.sql as sina_sql
import sina.utils

# Load the database
database = sina.utils.get_example_path('noaa/data.sqlite')
print("Using database {}".format(database))
factory = sina_sql.DAOFactory(database)

print("The data access object factory has been created.  Proceed to the next cell.")

Performing our First Query
--------------------------

For our first query, we'll ask for something simple--the record with the ID `WCOA2011-13-95-1-7`. We can access its "raw" data (that is, the JSON used to create it) and use that to create a Python object. From here, we'll list the scalars that are stored in this record, to give us an idea of this data set's contents.

In [None]:
record_of_interest = "WCOA2011-13-95-1-7"
record_dao = factory.create_record_dao()
sample_record = record_dao.get(record_of_interest)
print("Available scalars for record {}: {}".format(record_of_interest,
                                                   ", ".join(sample_record['data'].keys())))

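The cell above, like the later cells, treats each entry in `sample_record['data']` as a small dictionary holding a `value` and, optionally, `units`. As a minimal sketch that runs without a database, the snippet below tabulates names, values, and units from a hand-written stand-in for a record. The stand-in's numbers and units are made up for illustration rather than taken from the NOAA data, and `describe_scalars` is a hypothetical helper, not part of Sina.

```python
# Hypothetical stand-in for a record; values and units are illustrative only.
mock_record = {
    "id": "example-record",
    "type": "obs",
    "data": {
        "ctd_oxy": {"value": 200.0, "units": "umol/kg"},
        "o2": {"value": 198.5, "units": "umol/kg"},
        "depth": {"value": 12.0},  # units are optional
    },
}


def describe_scalars(record):
    """Return sorted (name, value, units) tuples for numeric entries in record['data']."""
    rows = []
    for name, entry in record["data"].items():
        value = entry.get("value")
        # Keep only numeric values; strings, lists, and booleans are skipped
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            rows.append((name, value, entry.get("units")))
    return sorted(rows)


for name, value, units in describe_scalars(mock_record):
    print("{}: {} {}".format(name, value, units or "(no units)"))
```
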
Exploring a Single Run
----------------------

We can use the object we've built and Matplotlib to prepare an example bar graph comparing two scalars (by default, these are the oxygen content and its check).

You can pan, zoom, and download this plot. Click the home icon in the lower left corner below the plot to restore the original layout.

To disable interaction, click the power icon to the right of the Figure heading at the top. Re-run the cell if the plot is not rendered.

In [None]:
%matplotlib notebook

# Customize the graph. Only the first two scalars are used in the scatter plot (next cell)
scalars_of_interest = ['ctd_oxy', 'o2']
title = "Comparison of {} for {}".format(" and ".join(scalars_of_interest), record_of_interest)

# Get data from record
scalars = []
for scalar in scalars_of_interest:
    scalars.append(sample_record['data'][scalar])
units = scalars[0].get('units')

# Create the graph; the bars are positioned along the x-axis
x_pos = np.arange(len(scalars_of_interest))
plt.figure(figsize=(9, 4))
plt.bar(x_pos, [x['value'] for x in scalars], align='center', alpha=0.5)
plt.xticks(x_pos, scalars_of_interest)
plt.ylabel(units)
plt.title(title)
plt.show()
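
The bar chart labels its y-axis with the units of the first scalar only, which is meaningful only if every plotted scalar shares those units. A minimal sketch of a guard follows, assuming each entry is a dictionary with an optional `units` key as in the cell above. `common_units` is a hypothetical helper, not part of Sina or Matplotlib, and the sample entries are made up for illustration.

```python
def common_units(entries):
    """Return the single units value shared by all entries, or raise ValueError."""
    found = {entry.get("units") for entry in entries}
    if len(found) != 1:
        raise ValueError(
            "Scalars do not share units: {}".format(sorted(map(str, found))))
    return found.pop()


# Hypothetical entries; values and units are illustrative only.
matching = [
    {"value": 200.0, "units": "umol/kg"},
    {"value": 198.5, "units": "umol/kg"},
]
print(common_units(matching))
```

In the cell above, `units = common_units(scalars)` could stand in for `units = scalars[0].get('units')` to fail early when the selected scalars mix units.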

Exploring Many Runs at Once
-----------------------------------

While we may sometimes be interested in one specific record, it's often useful to compare several. In this case, we'll compare the two scalars we selected above for all observations in our data set.

Once again, you can pan, zoom, and download this plot. Simply click the home icon in the lower left corner below the plot to restore the original layout.  Re-run the cell if the plot is not rendered.

In [None]:
%matplotlib notebook

# Customize the graph
alpha = 0.15
title = "Comparison of {} for all observations".format(" and ".join(scalars_of_interest))

# Create a generator to produce the records we're interested in
all_obs = record_dao.get_all_of_type("obs")

# Extract the information we need from each observation
x_coords = []
y_coords = []
x_units = None
y_units = None

for obs in all_obs:
    x_scalar = obs['data'][scalars_of_interest[0]]
    y_scalar = obs['data'][scalars_of_interest[1]]
    # A value of -999.0 indicates bad data; discard all such observations
    if all(scalar['value'] != -999.0 for scalar in (x_scalar, y_scalar)):
        x_coords.append(x_scalar['value'])
        y_coords.append(y_scalar['value'])
        # Take the axis units from the first valid observation
        if x_units is None:
            x_units = x_scalar.get('units')
            y_units = y_scalar.get('units')

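# Configure and display the graph. Start a new figure so the scatter plot
# is not drawn onto the earlier bar chart.
plt.figure(figsize=(9, 4))
plt.scatter(x_coords, y_coords, alpha=alpha)
plt.xlabel("{} ({})".format(scalars_of_interest[0], x_units))
plt.ylabel("{} ({})".format(scalars_of_interest[1], y_units))
plt.title(title)
plt.show()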
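
The filtering step in the cell above can also be written as a small function that is easy to test without a database. This is a sketch under stated assumptions: `paired_valid_values` is a hypothetical helper, the mock observations are made up for illustration, and it additionally skips observations that lack either scalar, a case the cell above does not handle.

```python
BAD_VALUE = -999.0  # sentinel for bad data, as in the cell above


def paired_valid_values(observations, x_name, y_name, bad_value=BAD_VALUE):
    """Return (x_values, y_values) for observations where both scalars are usable."""
    x_values, y_values = [], []
    for obs in observations:
        data = obs["data"]
        # Skip observations that lack either scalar entirely
        if x_name not in data or y_name not in data:
            continue
        x_value = data[x_name]["value"]
        y_value = data[y_name]["value"]
        # Skip observations where either reading is flagged as bad
        if x_value == bad_value or y_value == bad_value:
            continue
        x_values.append(x_value)
        y_values.append(y_value)
    return x_values, y_values


# Hypothetical observations; the numbers are illustrative only.
mock_observations = [
    {"data": {"ctd_oxy": {"value": 200.0}, "o2": {"value": 198.5}}},
    {"data": {"ctd_oxy": {"value": -999.0}, "o2": {"value": 150.0}}},  # bad x
    {"data": {"ctd_oxy": {"value": 180.0}}},  # missing y
    {"data": {"ctd_oxy": {"value": 175.0}, "o2": {"value": 176.0}}},
]

xs, ys = paired_valid_values(mock_observations, "ctd_oxy", "o2")
print(xs, ys)
```

Because both lists are appended to together, they always have the same length, which is what `plt.scatter` needs for its x and y arguments.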