Sina's Visualization Module
======================

Sina includes matplotlib integrations for quick and easy visualizations. These extend Sina's idea of "querying across runs", explained in the [basic usage](basic_usage.ipynb) tutorial, except that the results are shown visually. In general, visualizations are more useful for exploring data, while the parts of the API covered in basic usage are more useful for scripting and post-processing.

We'll start with basic setup. But first...


> **Enable interactive mode if you can!**
Simply uncomment `%matplotlib notebook` in the cell below. This only works if you're running the notebook yourself; having it enabled breaks the static docs, which is why it's disabled by default. Note that this part of Matplotlib can be somewhat fragile. If you're running Jupyter through something else and it doesn't seem to be working, try running the notebook directly in the browser. LC users, use [the CZ (or RZ) Jupyter instance](https://lc.llnl.gov/jupyter) to save setup time and access data straight from the machines!

In [None]:
import random
import warnings

import sina
from sina.visualization import Visualizer
from sina.model import CurveSet

import matplotlib.pyplot as plt
plt.style.use('dark_background')

#########################
# %matplotlib notebook
#########################

# Using interactive visualizations without setting the notebook mode would usually print warnings.
# We silence them here to keep things friendly to web readers.
warnings.filterwarnings("ignore")

ds = sina.connect()
record_handler = ds.records

print("Connection is ready!")

Loading in data
----------------

We'll insert a hundred randomly-generated records with some simple data to visualize.

In [None]:
possible_mode = ["quick", "standard", "test", "high-def"]
possible_machine = ["Quartz", "Catalyst", "local", "Sierra", "Lassen", "Ruby"]

num_data_records = 100

for val in range(0, num_data_records):
    # Our sample "code runs" are mostly random data
    record = sina.model.Record(id="rec_{}".format(val), type="foo_type")
    record.add_data('total_energy', random.randint(0, 1000) / 10.0)
    record.add_data('start_time', 0)
    record.add_data('elapsed_time', random.randint(1, 200))
    record.add_data('initial_volume', val)
    record.add_data('final_volume', val * random.randint(1, int(num_data_records / 5)))
    record.add_data('num_procs', random.randint(1, 4))
    record.add_data('mode', random.choice(possible_mode))
    record.add_data('machine', random.choice(possible_machine))
    record.add_data('fibonacci_scramble', random.sample([1, 1, 2, 3, 5, 8, 13], 7))
    cs1 = CurveSet("quick_sample")
    cs1.add_independent("time", [1, 2, 3, 4])
    cs1.add_dependent("local_density", random.sample(range(1, 10), 4))
    cs1.add_dependent("est_overall_density", random.sample(range(1, 10), 4))
    record.add_curve_set(cs1)
    cs2 = CurveSet("slow_sample")
    cs2.add_independent("longer_timestep", [2, 4])
    cs2.add_dependent("overall_density", random.sample(range(1, 10), 2))
    record.add_curve_set(cs2)
    if random.randint(1, 6) == 6:
        record.add_file("{}_log.txt".format(val))
    record_handler.insert(record)

print("{} Records have been inserted into the database.".format(num_data_records))
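
Because the records above are random, every run of the notebook produces different plots. If you'd like repeatable figures, one option is to seed Python's random number generator. The sketch below uses only the standard library; the seed value `42` and the helper name `sample_values` are arbitrary choices for illustration and are not part of Sina.

```python
import random


def sample_values(seed, count=5):
    """Return `count` pseudo-random energies, reproducible for a given seed."""
    rng = random.Random(seed)  # a private generator, so global state is untouched
    # Same expression as the 'total_energy' line in the loop above
    return [rng.randint(0, 1000) / 10.0 for _ in range(count)]


first = sample_values(42)
second = sample_values(42)
print(first == second)  # identical seeds give identical sequences
```

In the notebook itself, calling `random.seed(42)` once before the loop should have the same effect on the module-level `random` functions used above.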

Setting Up a Visualization (Histogram)
--------------

Create a Visualizer object, then use it to create your plot of choice. The only required setting is what Sina data to use.

Unlike standard matplotlib, Sina's histogram implementation supports both scalar and string data.

In [None]:
vis = Visualizer(ds)

# A histogram of string data
# The final .display() forces a redraw, and is included only for the sake of the online documentation, to ensure
# it displays. The visualizer automatically shows graphs you make; you usually won't need display()!
vis.create_histogram(x="machine").display()

# A 2d histogram with both scalar and string data
vis.create_histogram(x="machine", y="final_volume").display()
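
Conceptually, a histogram over string data is a count of how often each distinct value occurs. The following is a minimal standard-library sketch of that idea, not Sina's actual implementation; the `machines` list is made-up sample data.

```python
from collections import Counter

# Made-up sample data standing in for the "machine" value of several records
machines = ["Quartz", "Ruby", "Quartz", "local", "Ruby", "Quartz"]

# Each distinct string becomes a bin; its count becomes the bar height
counts = Counter(machines)

# Print a crude text histogram, one row per distinct string
for name, count in sorted(counts.items()):
    print("{:<8} {}".format(name, "#" * count))
```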

Interactive Mode
-----------------

"Interactive mode" includes Jupyter widgets that allow you to configure your graph on the fly. The histogram will now include a dropdown selection for which data to plot on the x axis. 

**IMPORTANT: Matplotlib's interactive mode can be finicky!** If you run an interactive cell and the dropdown isn't doing anything, try adding `%matplotlib notebook` at the top of the cell and rerunning it.

In [None]:
# Enabling interactive mode
interactive_hist = vis.create_histogram(x="machine", y="final_volume", interactive=True)
# The additional "show" calls are also for the sake of the online documentation. You can leave them off if you're
# doing things locally; it's essentially half of a display() call.
interactive_hist.fig.show()

Scatter Plots
---------------

Sina supports a number of other visualizations. Scatter plots may be particularly useful. All plots support both interactive and non-interactive modes, and scatter plots support an optional z axis and color bar.

Note: only the axes are interactive at this time; `color_val` must be set manually.

In [None]:
interactive_scatter = vis.create_scatter_plot(x="initial_volume", y="final_volume", z="elapsed_time", color_val="total_energy", interactive=True)
interactive_scatter.fig.show()

Configuring Visualizations
-------------------------------

Because Sina is using matplotlib under the hood, it can both receive and pass a number of configurations. For example, you can pass it an existing figure and axis, give it a title, or hand it configuration keyword arguments to pass directly to matplotlib, such as plot color.

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(8, 4))
settings = {"cmap": "magma", "alpha": 0.25}
vis.create_scatter_plot(fig=fig, ax=ax,
                        title="My Cool Graph of Initial vs. Final Volume",
                        x="initial_volume", y="final_volume",
                        color_val="total_energy",
                        matplotlib_options=settings).display()
ax.set_xlabel("Initial Volume (m^3)")
_ = ax.set_ylabel("Final Volume (m^3)")  # The _ = silences some Jupyter text output

You can also configure your interactive visualizations to have only certain data available for selection. This is useful, for example, for data sets where many parameters are constant. If all of your runs contain `do_print=1` and `ndims=2`, you may not want to see `do_print` and `ndims` in the drop-down menu.

In [None]:
config_hist = vis.create_histogram("machine", selectable_data=["final_volume", "machine"], interactive=True,
                                   matplotlib_options={"color": "darkgreen"})
config_hist.fig.show()

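The module also provides a utility method for filtering out scalar and string constants. Visualizer's `get_contained_data_names` returns a dictionary keyed by data type (for example, `"scalar"`), where each value lists the names of data of that type. Passing `filter_constants=True` excludes any data whose value is the same across all records.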

In [None]:
non_constant_data = vis.get_contained_data_names(filter_constants=True)

var_hist = vis.create_histogram("final_volume", "total_energy", selectable_data=non_constant_data["scalar"], interactive=True)
var_hist.fig.show()
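
To make the idea of "constant" data concrete, here is a rough sketch of that kind of filtering in plain Python. It does not call Sina and is not how `get_contained_data_names` is implemented; the `runs` dictionaries and the `non_constant_names` helper are hypothetical, chosen to mirror the `do_print`/`ndims` example above.

```python
def non_constant_names(runs):
    """Return the sorted names of data whose value differs between runs."""
    seen = {}  # data name -> set of distinct values observed
    for run in runs:
        for name, value in run.items():
            seen.setdefault(name, set()).add(value)
    # A name with a single distinct value is constant across the runs
    return sorted(name for name, values in seen.items() if len(values) > 1)


# Hypothetical runs: do_print and ndims never change, the other two do
runs = [
    {"do_print": 1, "ndims": 2, "final_volume": 12, "machine": "Ruby"},
    {"do_print": 1, "ndims": 2, "final_volume": 40, "machine": "Quartz"},
    {"do_print": 1, "ndims": 2, "final_volume": 7, "machine": "Ruby"},
]

print(non_constant_names(runs))
```

Only `final_volume` and `machine` vary between these made-up runs, so they are the only names that would be worth offering in a drop-down menu.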

Line plots
-----------

Sina's line plots come with a special option: curve set selection. If you don't specify a curve set, you can select from **all** scalar list data found in a record, regardless of size or association.

Note that, due to the nature of line plots, you'll likely want to restrict them to a subset of records. You can use the `id_pool` argument to do so. All visualizations accept an `id_pool`.

The curve set dropdown includes a special option that will allow you to choose interactively from ANY scalar curve data found in a Record's data section. Use with caution! Scalar lists of different sizes can't be plotted against one another.

In [None]:
# You can pass id_pool at the Visualizer level, or per visualization.
# curve_vis = Visualizer(ds, id_pool=["rec_1", "rec_2", "rec_3"])

curve_plot = vis.create_line_plot(x="time", y="local_density", curve_set="quick_sample", interactive=True,
                                  id_pool=["rec_1", "rec_2", "rec_3"])
curve_plot.fig.show()
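
The size restriction is easy to check ahead of time. The sketch below is a hypothetical pre-flight check in plain Python (the `plottable_pairs` helper is not part of Sina): given named scalar lists, it reports which pairs have matching lengths and so could be plotted against one another. The sample lists mirror the shape of the curve data inserted earlier.

```python
from itertools import combinations


def plottable_pairs(curves):
    """Return the name pairs whose lists have equal length."""
    return [
        (a, b)
        for a, b in combinations(sorted(curves), 2)
        if len(curves[a]) == len(curves[b])
    ]


# Made-up values; the lengths match the "quick_sample" and "slow_sample" curve sets
curves = {
    "time": [1, 2, 3, 4],
    "local_density": [5, 2, 9, 1],
    "longer_timestep": [2, 4],
    "overall_density": [3, 8],
}

print(plottable_pairs(curves))
```

Here the four-element lists pair only with each other, and likewise for the two-element lists, which is one reason curve sets are a convenient way to group compatible data.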

Further plot types
---------------------

This tutorial only covers the most basic types of plot. See the [Visualizer documentation](../generated_docs/sina.visualization.rst) for the full list, and reach out to siboka@llnl.gov if there's more you'd like to see!

In [None]:
surface_plot = vis.create_surface_plot(x="initial_volume", y="elapsed_time", z="final_volume", interactive=True)
surface_plot.fig.show()

Violin and Box Plots
--------------------

In [None]:
%matplotlib notebook
violin_box_plot = vis.create_violin_box_plot(x="final_volume", interactive=True)
violin_box_plot.fig.show()

PDF and CDF Plots
-----------------

In [None]:
%matplotlib notebook
pdf_cdf_plot = vis.create_pdf_cdf_plot(x="final_volume", interactive=True)
pdf_cdf_plot.fig.show()