# Working with Records

A lot of what makes Sina a powerful tool is the ability to look for records of interest.
For example, you can easily query millions of runs for the run where the temperature was
the highest, or find all runs where it stayed between certain thresholds. Once you find
your runs, though, you will want to get at the values in them. This notebook walks
you through accessing information in the different sections of a record: the data section,
the curve sets, and the library data.

## Data

The main section containing user-defined information about records is the `data` section. The full
schema documentation can be found at
https://lc.llnl.gov/workflow/docs/sina/sina_schema.html#records. The gist of it is that each
data item is a dictionary with a required `value` field and optional `units` and `tags` fields.
The example below shows how to construct a record with ID `'some_id'` and type `'run'`.
It also contains two data items, `'energy'` and `'temperature'`, which use all the possible fields.
In the example below, both `value` fields are numbers. Values can also be strings, lists of numbers,
or lists of strings.

In [None]:
from __future__ import print_function
from sina.model import Record
import json

# This record could have come back from a data store as well.
record_1 = Record(
    'some_id', 'run',
    data={
        'energy': {
            'value': 123.456,
            'units': 'J',
            'tags': ['output', 'main'],
        },
        'temperature': {
            'value': 987.6,
            'units': 'K',
            'tags': ['output', 'main'],
        }
    },
)
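To make the data-item rules described above concrete, here is a minimal sketch of a checker for a single data item. It works on plain Python dictionaries and is not part of Sina. The name `is_valid_data_item` is hypothetical, and the checks on `units` (a string) and `tags` (a list of strings) are assumptions based on the examples in this notebook rather than on the full schema.

```python
from numbers import Number


def _is_number(x):
    # bool is a subclass of int in Python; exclude it so True/False are not numbers
    return isinstance(x, Number) and not isinstance(x, bool)


def is_valid_data_item(item):
    """Hypothetical helper: check one data item against the rules described above."""
    if not isinstance(item, dict) or 'value' not in item:
        return False  # `value` is required
    value = item['value']
    if isinstance(value, list):
        # A list value must hold only numbers or only strings.
        value_ok = (all(_is_number(v) for v in value)
                    or all(isinstance(v, str) for v in value))
    else:
        value_ok = _is_number(value) or isinstance(value, str)
    if not value_ok:
        return False
    # Assumed shapes for the optional fields, matching this notebook's examples.
    if 'units' in item and not isinstance(item['units'], str):
        return False
    if 'tags' in item and not (isinstance(item['tags'], list)
                               and all(isinstance(t, str) for t in item['tags'])):
        return False
    return True


print(is_valid_data_item({'value': 123.456, 'units': 'J', 'tags': ['output', 'main']}))  # True
print(is_valid_data_item({'units': 'K'}))       # False: no `value`
print(is_valid_data_item({'value': [1, 'a']}))  # False: mixed list
```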

Once you have a record, you can access the
[Record.data](https://lc.llnl.gov/workflow/docs/sina/generated_docs/sina.model.html#sina.model.Record.data)
property to get an editable dictionary corresponding to the `data` section
of the record.

In [None]:
print(json.dumps(record_1.data, indent=4))

You can get the individual items in this dictionary, such as `'energy'`, and then
use that to get (or set) the value, units, or tags.

In [None]:
energy_data = record_1.data['energy']
print('energy_data is just a Python dictionary:', type(energy_data))
print('Energy is ', energy_data['value'], energy_data['units'])
print('Energy is tagged with', energy_data['tags'])

energy_data['value'] = 15
print('The energy has been updated in the original record:', record_1.data['energy']['value'])

Oftentimes, you only care about the actual values, rather than the units or
the tags. To simplify this, the `Record` class provides a
[`data_values`](https://lc.llnl.gov/workflow/docs/sina/generated_docs/sina.model.html#sina.model.Record.data_values)
property. This object lets you access the values of the data items directly,
through either attribute access or subscript access.

In [None]:
print('Energy is', record_1.data_values.energy)
print('Can also access with subscript operator:', record_1.data_values['energy'])
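One way to picture `data_values` is as a thin view over the `data` dictionary that hands back each item's `value` field. The class below is a minimal sketch of that idea, assuming the `data` section is a plain dictionary. `DataValuesSketch` is a hypothetical name, and this is not Sina's actual implementation.

```python
class DataValuesSketch(object):
    """Hypothetical read-only view returning the `value` field of each data item.

    Illustrative only; this is not how Sina implements `data_values`.
    """

    def __init__(self, data):
        self._data = data  # the underlying `data` dictionary, not copied

    def __getitem__(self, name):
        # Subscript access: view['energy'] -> data['energy']['value']
        return self._data[name]['value']

    def __getattr__(self, name):
        # Python calls this only when normal lookup fails, e.g. for `view.energy`.
        if name.startswith('_'):
            raise AttributeError(name)  # avoid recursion on private names
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


data = {'energy': {'value': 123.456, 'units': 'J', 'tags': ['output', 'main']}}
values = DataValuesSketch(data)
print(values.energy)     # 123.456
print(values['energy'])  # 123.456

data['energy']['value'] = 15  # edits to the dictionary show through the view
print(values.energy)     # 15
```

Because the view keeps a reference to the dictionary rather than a copy, it always reflects the current contents of `data`.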

In addition to reading values, you can also set them. This does not
change the units or tags if they were already set.

In [None]:
record_1.data_values.energy = 20
print('Energy has been updated, leaving tags and units alone ', record_1.data['energy'])

Finally, you can use this feature to add completely new data items.
These will not have tags or units. If you want tags or units, use
[`Record.add_data()`](https://lc.llnl.gov/workflow/docs/sina/generated_docs/sina.model.html#sina.model.Record.add_data)
instead.

In [None]:
record_1.data_values.my_new_value = 100
print('Units and tags for new items are not set:', record_1.data['my_new_value'])
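The update behavior shown in the last two cells can be summarized in a few lines of plain Python. This is a sketch assuming the `data` section is an ordinary dictionary; `set_data_value` is a hypothetical helper, not a Sina function. Existing items have only their `value` field replaced, and new names get an entry with no `units` or `tags`.

```python
def set_data_value(data, name, value):
    """Hypothetical helper mirroring the update behavior described above."""
    if name in data:
        # Existing item: replace only the value; units and tags stay as they were.
        data[name]['value'] = value
    else:
        # New item: a bare entry with no units or tags.
        data[name] = {'value': value}


data = {'energy': {'value': 123.456, 'units': 'J', 'tags': ['output', 'main']}}
set_data_value(data, 'energy', 20)
set_data_value(data, 'my_new_value', 100)
print(data['energy'])        # value is 20; units and tags are unchanged
print(data['my_new_value'])  # only a value, no units or tags
```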

## Curve Sets

Curve sets are used to describe related curves: an independent variable and a set of
dependent variables. Like the `data` section, the `curve_sets` section is a top-level
element of a record. The example below creates a record with a curve set named `'cs1'`,
whose independent variable is `'time'` and which has two dependent variables:
`'energy'` and `'temperature'`.

In [None]:
record_2 = Record(
    'some_id', 'run',
    curve_sets={
        'cs1': {
            'independent': {
                'time': {
                    'value': [0.1, 0.2, 0.3, 0.4, 0.5]
                }
            },
            'dependent': {
                'energy': {
                    'value': [12.34, 56.78, 90.12, 34.56, 78.90],
                    'units': 'J'
                },
                'temperature': {
                    'value': [50, 60, 70, 65, 30],
                    'units': 'K'
                }
            }
        }
    },
)
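Since each dependent curve describes a quantity sampled at the independent variable's points, it is reasonable to expect every dependent curve to have as many values as the independent one. This notebook does not say whether Sina enforces that, so the sketch below is an assumption: a hypothetical helper, `mismatched_curves`, that works on the plain dictionary for one curve set with exactly one independent curve and reports dependent curves whose lengths differ.

```python
def mismatched_curves(curve_set):
    """Hypothetical helper: names of dependent curves whose length differs
    from the (single) independent curve."""
    independent = curve_set['independent']
    if len(independent) != 1:
        raise ValueError('expected exactly one independent curve')
    (indep_values,) = [curve['value'] for curve in independent.values()]
    n = len(indep_values)
    return sorted(name for name, curve in curve_set['dependent'].items()
                  if len(curve['value']) != n)


cs1 = {
    'independent': {'time': {'value': [0.1, 0.2, 0.3, 0.4, 0.5]}},
    'dependent': {
        'energy': {'value': [12.34, 56.78, 90.12, 34.56], 'units': 'J'},  # one value short
        'temperature': {'value': [50, 60, 70, 65, 30], 'units': 'K'},
    },
}
print(mismatched_curves(cs1))  # ['energy']
```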

To access the curve sets in a record, you can use the
[`Record.curve_sets`](https://lc.llnl.gov/workflow/docs/sina/generated_docs/sina.model.html#sina.model.Record.curve_sets)
property. This gives you direct access to the Python dictionary containing the data
for the curve sets.

In [None]:
print('The full curve set is a python dictionary:', record_2.curve_sets['cs1'])
print('Time values are', record_2.curve_sets['cs1']['independent']['time']['value'])

Just as `Record.data` has a companion `Record.data_values` property for accessing only
the values, `Record.curve_sets` has a companion `curve_set_values` property for succinctly
accessing the values of curve sets. You can use both attribute and subscript access. You also
don't have to specify whether a particular curve is dependent or independent, though
you can be explicit about this if you wish.

In [None]:
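# Four equivalent ways to get the values of the independent 'time' curve:
print(record_2.curve_set_values.cs1.time)
print(record_2.curve_set_values.cs1.independent.time)
print(record_2.curve_set_values['cs1']['time'])
print(record_2.curve_set_values['cs1'].independent['time'])

# Dependent curves work the same way:
print(record_2.curve_set_values.cs1.energy)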
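Not having to name the section suggests a lookup that searches both halves of a curve set. Below is a minimal sketch of such a lookup on the plain curve-set dictionary. `lookup_curve_values` is hypothetical, and searching `'independent'` before `'dependent'` is an assumption, since this notebook does not say how Sina handles a name that appears in both.

```python
def lookup_curve_values(curve_set, name):
    """Hypothetical helper: return a curve's values without naming its section."""
    # Assumed search order: independent first, then dependent.
    for section in ('independent', 'dependent'):
        curves = curve_set.get(section, {})
        if name in curves:
            return curves[name]['value']
    raise KeyError(name)


cs1 = {
    'independent': {'time': {'value': [0.1, 0.2, 0.3]}},
    'dependent': {'energy': {'value': [1.0, 2.0, 3.0], 'units': 'J'}},
}
print(lookup_curve_values(cs1, 'time'))    # [0.1, 0.2, 0.3]
print(lookup_curve_values(cs1, 'energy'))  # [1.0, 2.0, 3.0]
```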

Just like `data_values`, `curve_set_values` also allows you to add new
curves to an existing curve set. Here, though, you do have to be explicit about where
(dependent or independent) you are adding them.

In [None]:
record_2.curve_set_values.cs1.dependent.new_entry = [-1, -2, -3, -4, -5]
print(record_2.curve_sets['cs1']['dependent']['new_entry'])
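One plausible reason for requiring the section is that a name that does not exist yet gives no hint about where it belongs. The sketch below mirrors that rule on the plain curve-set dictionary; `add_curve` is a hypothetical helper, not part of Sina's API.

```python
def add_curve(curve_set, section, name, values):
    """Hypothetical helper: add a curve to an explicitly named section."""
    if section not in ('independent', 'dependent'):
        raise ValueError("section must be 'independent' or 'dependent'")
    # Create the section if it is missing, then store the curve's values.
    curve_set.setdefault(section, {})[name] = {'value': list(values)}


cs1 = {'independent': {'time': {'value': [0.1, 0.2]}}, 'dependent': {}}
add_curve(cs1, 'dependent', 'new_entry', [-1, -2])
print(cs1['dependent']['new_entry'])  # {'value': [-1, -2]}
```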

## Library Data

In addition to `data` and `curve_sets`, Sina provides a hierarchical section called
`library_data`. It is intended to let software libraries running inside simulation codes
add their own data sections, but it can be used for any nested data.
Each library in the `library_data` section can contain its own `data`, `curve_sets`, and
`library_data` sections.

In [None]:
record_3 = Record(
    'some_id', 'run',
    library_data={
        'my_library': {
            'data': {
                'helium_volume': {
                    'value': 12.34
                },
                'hydrogen_volume': {
                    'value': 56.78
                }
            },
            'curve_sets': {
                'cs1': {
                    'independent': {
                        'time': {
                            'value': [0.1, 0.2, 0.3, 0.4, 0.5]
                        }
                    },
                    'dependent': {
                        'energy': {
                            'value': [12.34, 56.78, 90.12, 34.56, 78.90],
                            'units': 'J'
                        },
                        'temperature': {
                            'value': [50, 60, 70, 65, 30],
                            'units': 'K'
                        }
                    }
                }
            },
            'library_data': {
                'my_nested_library': {
                    'data': {
                        'max_iterations': {
                            'value': 200
                        }
                    }
                }
            }
        }
    }
)
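Because every library can contain its own `library_data`, the section forms a tree, and a recursive walk is a natural way to visit all of it. The sketch below collects every data value in a nested `library_data` dictionary. `iter_library_values` is a hypothetical helper, and joining names with `'/'` is an arbitrary display choice, not a Sina convention.

```python
def iter_library_values(library_data, prefix=''):
    """Hypothetical helper: yield (path, value) for every data item in a
    nested library_data dictionary."""
    for lib_name, lib in sorted(library_data.items()):
        lib_path = prefix + lib_name + '/'
        for item_name, item in sorted(lib.get('data', {}).items()):
            yield lib_path + item_name, item['value']
        # Each library may hold its own library_data; recurse into it.
        for pair in iter_library_values(lib.get('library_data', {}), lib_path):
            yield pair


library_data = {
    'my_library': {
        'data': {'helium_volume': {'value': 12.34}},
        'library_data': {
            'my_nested_library': {'data': {'max_iterations': {'value': 200}}},
        },
    },
}
for path, value in iter_library_values(library_data):
    print(path, value)
# my_library/helium_volume 12.34
# my_library/my_nested_library/max_iterations 200
```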

Just like with `data` and `curve_sets`, we can use the `library_data` property
to get the Python dictionary for a given library. However, if all you want is
the values (and not tags or units), you can use the `library_data_values`
property instead.

In [None]:
print('Helium occupies a volume of',
      record_3.library_data['my_library']['data']['helium_volume']['value'])
print('It is easier to access values through "library_data_values"',
      record_3.library_data_values.my_library.data.hydrogen_volume)
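The attribute chain on `library_data_values` stands for a longer chain of subscripts on the raw dictionaries: `library_data_values.my_library.data.hydrogen_volume` corresponds to `library_data['my_library']['data']['hydrogen_volume']['value']`. The sketch below spells out that correspondence for libraries at any depth. `library_value` is a hypothetical helper operating on plain dictionaries, not a Sina function.

```python
def library_value(library_data, library_path, name):
    """Hypothetical helper: value of data item `name` in a possibly nested library.

    `library_path` lists library names from outermost to innermost,
    e.g. ['my_library', 'my_nested_library'].
    """
    # Wrap the top level so every loop step looks the same:
    # library -> its 'library_data' -> the next library.
    lib = {'library_data': library_data}
    for lib_name in library_path:
        lib = lib['library_data'][lib_name]
    return lib['data'][name]['value']


library_data = {
    'my_library': {
        'data': {'hydrogen_volume': {'value': 56.78}},
        'library_data': {
            'my_nested_library': {'data': {'max_iterations': {'value': 200}}},
        },
    },
}
# Same as library_data['my_library']['data']['hydrogen_volume']['value']
print(library_value(library_data, ['my_library'], 'hydrogen_volume'))  # 56.78
# Descends through my_library's own 'library_data' to the nested library
print(library_value(library_data, ['my_library', 'my_nested_library'], 'max_iterations'))  # 200
```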

You can also set values, access curve sets, and reach values in nested libraries.

In [None]:
record_3.library_data_values.my_library.data.new_entry = 10

In [None]:
record_3.library_data_values.my_library.curve_sets.cs1.temperature

In [None]:
record_3.library_data_values.my_library.library_data.my_nested_library.data.max_iterations