Basic Usage
===========

This notebook walks through some of Sina's core functionality. For more examples, including advanced topics like handling large datasets or generating tables, see the example dataset folders (noaa/, fukushima/, etc.).

Initial Setup
-------------
We first import one of Sina's backends; we'll use the sql backend for portability. We set up a connection to our database, then use that connection to create a "RecordDAO", the core object for inserting, querying, and generally handling Records.

In [None]:
import json
import random

import sina.datastores.sql as sina
from sina.model import Record, generate_record_from_json
from sina.utils import DataRange, has_all, has_any, has_only

# By default (without an argument), we open a connection to an in-memory database.
# If you'd like to create a file, just provide the filename as an arg.
factory = sina.DAOFactory()

record_handler = factory.createRecordDAO()

print("Connection is ready!")

Inserting Our First Records
---------------------------
Now that we've got a connection open and our handler ready, we can start inserting Records! The first one we create is as simple as possible, while the rest have data attached. We'll insert all of them into our database.

In [None]:
simple_record = Record(id="simplest", type="simple_sample")
record_handler.insert(simple_record)

possible_maintainers = ["John Doe", "Jane Doe", "Gary Stu", "Ann Bob"]
num_data_records = 100
for val in range(0, num_data_records):
    record = Record(id="rec_{}".format(val), type="foo_type")
    record['data']['initial_density'] = {'value': val, 'units': 'g/cm^3'}
    record['data']['final_volume'] = {'value': random.randint(0, int(num_data_records / 5))}
    record['data']['maintainer'] = {'value': random.choice(possible_maintainers), 'tags': ["personnel"]}
    record_handler.insert(record)

print("{} Records have been inserted into the database.".format(num_data_records + 1))
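
For reference, here is a rough sketch of the JSON-style layout behind one of the Records built above, written as a plain Python dictionary so it runs without Sina. The `final_volume` and `maintainer` values are illustrative stand-ins for the randomly chosen ones, and `record_as_dict` is a name made up for this example.

```python
import json

# Plain-dict sketch of the layout behind a Record like "rec_0" above.
# This only illustrates the shape; it does not use Sina itself.
record_as_dict = {
    "id": "rec_0",
    "type": "foo_type",
    "data": {
        "initial_density": {"value": 0, "units": "g/cm^3"},
        "final_volume": {"value": 12},  # illustrative; the real value is random
        "maintainer": {"value": "Jane Doe", "tags": ["personnel"]},
    },
}

# Each entry in "data" is keyed by name and holds a "value",
# optionally accompanied by "units" and "tags".
for name, entry in sorted(record_as_dict["data"].items()):
    print("{}: value={!r}, units={!r}, tags={!r}".format(
        name, entry["value"], entry.get("units"), entry.get("tags")))

# The dictionary survives a round trip through JSON unchanged.
round_tripped = json.loads(json.dumps(record_as_dict))
```

Seeing the layout spelled out may make the dictionary-style access used in the loop above (`record['data'][...]`) easier to follow.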

Type-Based Queries and Deleting Records
---------------------------------------

On second thought, the "simple_sample" Record isn't useful. Pretending we've forgotten the id we used to create it above, we'll go ahead and find every simple_sample-type Record in our database and delete it.

In [None]:
simple_record_ids = list(record_handler.get_all_of_type("simple_sample", ids_only=True))
print("Simple_sample Records found: {}".format(simple_record_ids))

print("Deleting them all...")
record_handler.delete_many(simple_record_ids)

simple_records_post_delete = list(record_handler.get_all_of_type("simple_sample", ids_only=True))
print("Simple_sample Records found now: {}".format(simple_records_post_delete))

Finding Records Based on Data
=============================
The remaining Records in our database represent randomized runs of some imaginary code. John Doe just completed a run of the version he maintains where the final_volume was 6, which seemed a little low. After inserting that Record, he finds all Records in the database that list him as maintainer and have a final_volume of 6 or lower.

In [None]:
# Because Record data is represented by a JSON object/Python dictionary, we can also set it up like so:
data = {"final_volume": {"value": 6},
        "initial_density": {"value": 6, "units": "g/cm^3"},
        "maintainer": {"value": "John Doe"}}
record_handler.insert(Record(id="john_latest", type="foo_type", data=data))

# Now we'll find matching Records.
john_low_volume = record_handler.data_query(maintainer="John Doe",
                                            final_volume=DataRange(max=6, max_inclusive=True))

print("John Doe's low-volume runs: {}".format(', '.join(john_low_volume)))
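
To make the effect of `max_inclusive=True` concrete, here is a small pure-Python sketch of an inclusive versus exclusive upper bound. The helper `in_range` and its default flags are assumptions made for this illustration; this is not how Sina implements `DataRange`.

```python
def in_range(value, lower=None, upper=None,
             lower_inclusive=True, upper_inclusive=False):
    """Hypothetical helper: True if value lies within the given bounds.

    A bound of None means that side of the range is open-ended.
    """
    if lower is not None:
        if value < lower or (value == lower and not lower_inclusive):
            return False
    if upper is not None:
        if value > upper or (value == upper and not upper_inclusive):
            return False
    return True


volumes = [4, 6, 7]

# Similar in spirit to DataRange(max=6, max_inclusive=True): 6 itself matches.
inclusive_matches = [v for v in volumes
                     if in_range(v, upper=6, upper_inclusive=True)]

# With an exclusive upper bound, a value of exactly 6 is left out.
exclusive_matches = [v for v in volumes
                     if in_range(v, upper=6, upper_inclusive=False)]

print(inclusive_matches)  # [4, 6]
print(exclusive_matches)  # [4]
```

This is why the query above sets the flag explicitly: John's latest run has a final_volume of exactly 6, so an exclusive bound would miss it.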

List Data and Querying Them
===========================

Some data take the form of a list of entries, either numbers or strings: timeseries, options activated, and nodes in use are a few examples. Sina allows for storing and querying these lists. Note that, to keep queries efficient, a queryable list can't mix strings and scalars; only all-scalar or all-string lists can be part of a Record's data. Mixed-type lists (as well as any other JSON-legal structure) can be stored in a Record's user_defined section instead.
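
As a rough illustration of that rule, the sketch below sorts lists into all-scalar, all-string, or "other" (anything that would belong in user_defined). The function `classify_list` is hypothetical and not part of Sina; its handling of booleans and empty lists is a choice made for this example only.

```python
from numbers import Number


def classify_list(values):
    """Hypothetical helper: label a list 'scalar', 'string', or 'other'."""
    def is_scalar(entry):
        # bool is a subclass of int in Python; this sketch does not
        # count booleans as scalars.
        return isinstance(entry, Number) and not isinstance(entry, bool)

    if values and all(is_scalar(entry) for entry in values):
        return "scalar"
    if values and all(isinstance(entry, str) for entry in values):
        return "string"
    # Mixed, nested, or empty lists fall through to here.
    return "other"


print(classify_list([0.5, 1, 2]))                 # scalar
print(classify_list(["quickrun", "code_test"]))   # string
print(classify_list([1, 2, "upper"]))             # other
print(classify_list(["spam", ["egg"]]))           # other
```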

In [None]:
# Records expressed as JSON. We expect records 1 and 3 to match our query.
record_1 = """{"id": "list_rec_1",
               "type": "list_rec",
               "data": {"options_active": {"value": ["quickrun", "verification", "code_test"]}},
               "user_defined": {"mixed": [1, 2, "upper"]}}"""
record_2 = """{"id": "list_rec_2",
               "type": "list_rec",
               "data": {"options_active": {"value": ["quickrun", "distributed"]}},
               "user_defined": {"mixed": [1, 2, "upper"],
                                "nested": ["spam", ["egg"]]}}"""
record_3 = """{"id": "list_rec_3",
               "type": "list_rec",
               "data": {"options_active": {"value": ["code_test", "quickrun"]}},
               "user_defined": {"nested": ["spam", ["egg"]],
                                "bool_dict": {"my_key": [true, false]}}}"""

for record in (record_1, record_2, record_3):
    record_handler.insert(generate_record_from_json(json.loads(record)))
print("3 list-containing Records have been inserted into the database.\n")

# Find all the Records that have both "quickrun" and "code_test"
# in their options_active
quicktest = record_handler.data_query(options_active=has_all("quickrun",
                                                             "code_test"))

# Get those Records and print their id, value for options_active, and the contents of their user_defined.
print("Records whose traits include 'quickrun' and 'code_test':\n")
for record_id in quicktest:
    record = record_handler.get(record_id)
    print("{}\ntraits: {} | user_defined: {}\n".format(record_id,
                                                       ', '.join(record['data']['options_active']['value']),
                                                       str(record['user_defined'])))

Further List Queries
====================
There are a few additional ways to retrieve Records based on their list data. A `has_any()` query will retrieve any Record that contains *at least* one of its arguments. A `has_only()` query retrieves Records that have all arguments and nothing additional.

It's important to note that, for these three types of list query, order and count don't matter. If `["quickrun", "code_test"]` would match, so would `["code_test", "quickrun", "quickrun"]`.
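
One way to picture these semantics is as set comparisons. The sketch below mimics the three query types in plain Python, using the `options_active` values of the three list Records inserted earlier. The `*_match` helpers are hypothetical stand-ins written for this illustration, not Sina functions.

```python
def has_all_match(entries, *wanted):
    """True if every wanted value appears in entries."""
    return set(wanted) <= set(entries)


def has_any_match(entries, *wanted):
    """True if at least one wanted value appears in entries."""
    return bool(set(wanted) & set(entries))


def has_only_match(entries, *wanted):
    """True if entries contains all wanted values and nothing else."""
    return set(wanted) == set(entries)


# options_active values of the three list Records inserted earlier.
options = {
    "list_rec_1": ["quickrun", "verification", "code_test"],
    "list_rec_2": ["quickrun", "distributed"],
    "list_rec_3": ["code_test", "quickrun"],
}
wanted = ("quickrun", "code_test")

all_ids = sorted(k for k, v in options.items() if has_all_match(v, *wanted))
any_ids = sorted(k for k, v in options.items() if has_any_match(v, *wanted))
only_ids = sorted(k for k, v in options.items() if has_only_match(v, *wanted))

print(all_ids)   # ['list_rec_1', 'list_rec_3']
print(any_ids)   # ['list_rec_1', 'list_rec_2', 'list_rec_3']
print(only_ids)  # ['list_rec_3']

# Converting to sets discards order and duplicates, so this also matches.
print(has_only_match(["code_test", "quickrun", "quickrun"], *wanted))  # True
```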

In [None]:
match_any = record_handler.data_query(options_active=has_any("quickrun",
                                                             "code_test"))
print("Records whose traits include 'quickrun' and/or 'code_test': {}".format(list(match_any)))

match_only = record_handler.data_query(options_active=has_only("quickrun",
                                                               "code_test"))
print("Records whose traits are 'quickrun' and 'code_test', with nothing additional: {}".format(list(match_only)))