Sina's visualization module
===========================

Sina includes matplotlib integrations that allow for quick creation of common visualizations. If there are more types of visualization you'd like to see, let us know at siboka@llnl.gov! For now, we support a basic set.

We'll start with basic setup. Note that **matplotlib's interactive mode is required** for interactive visualizations. Sina cannot set this itself! Use `%matplotlib notebook` as shown in the cell below to set matplotlib to interactive mode.

In [None]:
import random

import sina
from sina.visualization import Visualizer
from sina.model import CurveSet

import matplotlib.pyplot as plt

# Enable interactive mode for all graphs. Interactive mode is REQUIRED for interactive=True visualization!
%matplotlib notebook

ds = sina.connect()
record_handler = ds.records

print("Connection is ready!")

Loading in data
----------------

We'll insert a hundred randomly-generated records with some simple data to visualize.

In [None]:
possible_mode = ["quick", "standard", "test", "high-def"]
possible_machine = ["Quartz", "Catalyst", "local", "Sierra", "Lassen", "Ruby"]

num_data_records = 100

for val in range(0, num_data_records):
    # Our sample "code runs" are mostly random data
    record = sina.model.Record(id="rec_{}".format(val), type="foo_type")
    record.add_data('total_energy', random.randint(0, 1000) / 10.0)
    record.add_data('start_time', 0)
    record.add_data('elapsed_time', random.randint(1, 200))
    record.add_data('initial_volume', val)
    record.add_data('final_volume', val * random.randint(1, int(num_data_records / 5)))
    record.add_data('num_procs', random.randint(1, 4))
    record.add_data('mode', random.choice(possible_mode))
    record.add_data('machine', random.choice(possible_machine))
    record.add_data('fibonacci_scramble', random.sample([1, 1, 2, 3, 5, 8, 13], 7))
    cs1 = CurveSet("quick_sample")
    cs1.add_independent("time", [1, 2, 3, 4])
    cs1.add_dependent("local_density", random.sample(range(1, 10), 4))
    cs1.add_dependent("est_overall_density", random.sample(range(1, 10), 4))
    record.add_curve_set(cs1)
    cs2 = CurveSet("slow_sample")
    cs2.add_independent("longer_timestep", [2, 4])
    cs2.add_dependent("overall_density", random.sample(range(1, 10), 2))
    record.add_curve_set(cs2)
    if random.randint(1, 6) == 6:
        record.add_file("{}_log.txt".format(val))
    record_handler.insert(record)

print("{} Records have been inserted into the database.".format(num_data_records))

Basic usage
--------------

Create a Visualizer object, then use it to create your plot of choice. The only required setting is what Sina data to use.

Unlike standard matplotlib, Sina's histogram implementation supports both scalar and string data.

You may notice the output of an internal Sina object. That's a Jupyter feature: it automatically prints the result of the last call in a cell. The output vanishes if the `create_histogram()` call isn't the last one (e.g., you end with a print call), or you can silence it by assignment (`_ = vis.create_histogram(...)`).

In [None]:
vis = Visualizer(ds)

# A histogram of scalar data
vis.create_histogram(x="final_volume")

# A histogram of string data
vis.create_histogram(x="machine")
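
Conceptually, a histogram over string data treats each distinct value as its own bin and counts how often it occurs. The following is a minimal sketch of that idea in plain Python, not Sina's implementation; the `machines` list is made-up sample data standing in for the `machine` datum of a few records.

```python
from collections import Counter

# Hypothetical sample values standing in for the "machine" datum of six records.
machines = ["Quartz", "Sierra", "Quartz", "local", "Sierra", "Quartz"]

# For string data, each distinct value acts as its own bin.
counts = Counter(machines)

# Print a crude text histogram, most frequent value first.
for name, count in counts.most_common():
    print("{:<8}{}".format(name, "#" * count))
```

Scalar data differs only in that values are first grouped into numeric ranges before counting.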

Interactive usage
-----------------

"Interactive mode" includes Jupyter widgets that allow you to configure your graph on the fly. The histogram will now include a dropdown selection for which data to plot on the x axis. 

**IMPORTANT: Matplotlib's interactive mode can be finicky! If you run an interactive cell and the dropdown isn't doing anything, try inserting another `%matplotlib notebook`, as below.**

In [None]:
# Matplotlib's interactive mode is a bit finicky and often needs this magic called twice.
# Hopefully that gets fixed. For now, repeating it here provides some safety.
%matplotlib notebook

# Enabling interactive mode
_ = vis.create_histogram("machine", interactive=True)

Scatter Plots
---------------

Sina supports a number of other visualizations. Scatter plots may be particularly useful. All plots support both interactive and non-interactive modes.

In [None]:
vis.create_scatter_plot(x="start_time", y="elapsed_time", interactive=True)

Configuring your visualizations
------------------------------------

Because Sina is using matplotlib under the hood, it can both receive and pass a number of configurations. For example, you can pass it an existing figure and axis, give it a title, or hand it configuration keyword arguments to pass directly to matplotlib, such as plot color.

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(8, 4))
settings = {"color": "orange", "alpha": 0.25}
vis.create_scatter_plot(fig=fig, ax=ax,
                        title="My cool graph of {x_name}",
                        x="initial_volume", y="final_volume",
                        matplotlib_options=settings)
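
The `{x_name}` in the title above looks like a standard Python format placeholder. As a rough sketch, and assuming Sina fills it with the name of the datum on the x axis (an assumption based on the placeholder's name, not something stated here), the substitution would behave like `str.format`:

```python
title_template = "My cool graph of {x_name}"

# Assumption: the placeholder is filled with the name of the datum on the x axis.
title = title_template.format(x_name="initial_volume")
print(title)
```

If that assumption holds, a templated title can stay accurate in interactive mode when a different datum is picked from the dropdown.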

You can also configure your interactive visualizations to have only certain data available for selection. This is useful, for example, for data sets where many parameters are constant. If all of your runs contain `do_print=1` and `ndims=2`, you may not want to see `do_print` and `ndims` in the drop-down menu.

In [None]:
vis.create_histogram("machine", selectable_data=["final_volume", "machine"], interactive=True)

The module also provides a utility method for filtering out scalar and string constants. Visualizer's `get_contained_data_names` returns a dictionary of types of data, and will exclude constants with `filter_constants=True`.

In [None]:
non_constant_data = vis.get_contained_data_names(filter_constants=True)

vis.create_histogram("final_volume", selectable_data=non_constant_data["scalar"], interactive=True)
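
To illustrate what "filtering constants" means, here is a minimal sketch in plain Python. It is not how `get_contained_data_names` is implemented; `find_non_constant_names` and the `runs` list are hypothetical stand-ins that use plain dictionaries instead of Sina records.

```python
def find_non_constant_names(records):
    """Return the sorted data names whose value differs between records.

    `records` is a list of plain dicts mapping data name to a hashable value;
    this is a stand-in for Sina records, not Sina's own data model.
    """
    seen = {}  # data name -> set of distinct values observed so far
    for record in records:
        for name, value in record.items():
            seen.setdefault(name, set()).add(value)
    # A name is "constant" if only one distinct value was ever observed.
    return sorted(name for name, values in seen.items() if len(values) > 1)


# Hypothetical runs where do_print and ndims never change.
runs = [
    {"do_print": 1, "ndims": 2, "final_volume": 12.5, "machine": "Quartz"},
    {"do_print": 1, "ndims": 2, "final_volume": 40.0, "machine": "Ruby"},
    {"do_print": 1, "ndims": 2, "final_volume": 7.25, "machine": "Quartz"},
]

non_constant = find_non_constant_names(runs)
print(non_constant)
```

Here `do_print` and `ndims` are dropped because every run shares the same value, leaving only `final_volume` and `machine` as useful dropdown choices.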

Line plots
-----------

Sina's line plots come with a special option: curve set selection. If you don't specify a curve set, you can select from **all** scalar list data found in a record, regardless of size or association.

Note that, due to the nature of line plots, you'll likely want to restrict them to a subset of records. You can use the `id_pool` argument to do so. You can also pass an `id_pool` at the Visualizer level, in which case all visualizations created from it will inherit its id_pool. All visualizations accept an `id_pool`.

The curve set dropdown includes a special option that will allow you to choose interactively from ANY scalar curve data found in a Record's data section. Use with caution! Scalar lists of different sizes can't be plotted against one another.

In [None]:
# You can pass id_pool at the Visualizer level, or per visualization.
# curve_vis = Visualizer(ds, id_pool=["rec_1", "rec_2", "rec_3"])

_ = vis.create_line_plot(x="time", y="local_density", curve_set="quick_sample", interactive=True,
                         id_pool=["rec_1", "rec_2", "rec_3"])
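
The size caveat can be checked before plotting. The sketch below is plain Python and independent of Sina: `plottable_pairs` is a hypothetical helper, and `scalar_lists` is made-up data shaped like the sample records in this notebook. It reports which lists have the same length as a chosen x list and could therefore be plotted against it.

```python
def plottable_pairs(lists_by_name, x_name):
    """Return the names of lists that have the same length as the x list."""
    x_len = len(lists_by_name[x_name])
    return sorted(name for name, values in lists_by_name.items()
                  if name != x_name and len(values) == x_len)


# Hypothetical scalar lists mirroring the sample records in this notebook.
scalar_lists = {
    "time": [1, 2, 3, 4],
    "local_density": [3, 7, 1, 9],
    "est_overall_density": [2, 5, 8, 4],
    "longer_timestep": [2, 4],
    "overall_density": [6, 3],
    "fibonacci_scramble": [5, 1, 13, 2, 8, 1, 3],
}

print(plottable_pairs(scalar_lists, "time"))
print(plottable_pairs(scalar_lists, "longer_timestep"))
```

Matching length is necessary but not sufficient: two lists of the same size may still be unrelated, which is why selecting a named curve set is the safer choice.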

Further plot types
---------------------

See the API documentation for all plot types supported, and remember to reach out to siboka@llnl.gov if there's more you'd like to see!

In [None]:
vis.create_surface_plot(x="initial_volume", y="elapsed_time", z="final_volume", interactive=True)