# Kosh and Sina Interoperability

Table of Contents

1. [Introduction](#Introduction)
2. [Opening/Creating a New Store](#Opening/Creating-a-new-store.)
3. [Adding Entries](#Adding-Entries)
4. [Accessing a Record/Dataset with known id](#Accessing-a-Record/Dataset-with-known-id)
5. [Getting Everything In the Store](#Getting-Everything-In-the-Store)
6. [Deleting entries](#Deleting-Entries)
7. [Updating Entries](#Updating-Entries)
8. [Searching the Store](#Searching-the-Store)
9. [Data](#Data)

## Introduction

In this notebook we show how Kosh and Sina are related, how to perform the tasks that both can do, and which tasks are better suited to each.

We also show how to make them work together.

Kosh uses Sina under the hood. For the purposes of this notebook, both Sina and Kosh will work off the **same** store.


## Opening/Creating a new store.

### SQLite

Both Sina and Kosh will create a store for you if it does not exist.


In [1]:
# Cleanup first in case we ran this before
import os
import sys
if os.path.exists("my_sina_store.sql"):
    os.remove("my_sina_store.sql")
if os.path.exists("my_kosh_store.sql"):
    os.remove("my_kosh_store.sql")

# Sina
import sina 
# New or existing store
store_sina = sina.connect("my_sina_store.sql")
# If you want to clear the data in the store
store_sina.delete_all_contents(force="SKIP PROMPT")

# Kosh
import kosh
# New or existing store
store_kosh = kosh.connect("my_kosh_store.sql")
# You can also delete its content
store_kosh.delete_all_contents(force="SKIP PROMPT")
# Kosh lets you wipe the data on loading
store_kosh = kosh.connect("my_kosh_store.sql", delete_all_contents=True)

# Kosh can open a Sina store, we will use it for the rest of this notebook
# so that both Sina and Kosh operate on the same store
# You will get a warning because this store does not have some of Kosh's reserved features
store_kosh = kosh.connect("my_sina_store.sql")

### MySQL

In [2]:
# Sina
# mysql_store_sina = sina.connect("mysql://<your_username>@:/>read_default_file=<path_to_cnf>")

In [3]:
# Kosh
# mysql_store_kosh = kosh.connect("mysql://<your_username>@:/>read_default_file=<path_to_cnf>")

### Cassandra

???

**NOTE**

Kosh and Sina stores are mostly interchangeable: you can access the Sina store and its records directly from a Kosh store.

In [4]:
# You can access the Sina store from a Kosh store
the_sina_store = store_kosh.get_sina_store()
# Or the records
records = store_kosh.get_sina_records()
# or from the store
records = the_sina_store.records

## Adding Entries

In Sina, entries to the database are called records. Records can be of many types.

Unless specified otherwise, Kosh will create records of type `dataset` by default.

### From Python

In [5]:
# Sina
from sina.model import Record
sina_record = Record(id="my_id", type="my_chosen_type")
store_sina.records.insert(sina_record)

In [6]:
# Kosh
# type will be 'dataset', random unique id will be generated
kosh_dataset_record = store_kosh.create()

# Picking id and type
kosh_dataset_record_2 = store_kosh.create(id="some_id", sina_type="some_type")

### From json files


#### Sina
Sina can also load records from JSON files. You can read more about the format [here](https://lc.llnl.gov/workflow/docs/sina/sina_schema.html#sina-schema).
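To give a rough idea of what such a file contains, here is a minimal sketch that builds a Sina-style document with the standard `json` module. The `data` and `curve_sets` layout mirrors the record contents printed elsewhere in this notebook. The top-level `records` and `relationships` keys and the `files` layout reflect our understanding of the schema and should be checked against the schema documentation linked above. The id and values are made up for illustration.

```python
import json

# Minimal Sina-style document: one record and an empty relationships list.
# The id, type, and values are illustrative only.
document = {
    "records": [
        {
            "id": "example_run",
            "type": "some_type",
            # Each data entry wraps its value in a {"value": ...} mapping
            "data": {"param1": {"value": 1}, "param3": {"value": 3.3}},
            # Assumed layout: files are keyed by uri
            "files": {"foo.png": {"mimetype": "image/png"}},
            "curve_sets": {
                "timeplot_1": {
                    "independent": {"time": {"value": [0, 1, 2]}},
                    "dependent": {"feature_a": {"value": [15, 25, 35]}},
                }
            },
        }
    ],
    "relationships": [],
}

# Serialize and parse back to confirm the document is valid JSON
text = json.dumps(document, indent=2)
roundtrip = json.loads(text)
record_ids = [record["id"] for record in roundtrip["records"]]
print(record_ids)
```

Writing `text` to a file would give something that could then be passed to the loading functions shown in the following cells, assuming the layout above matches the schema version in use.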


In [7]:
import sina

In [8]:
sina_records = sina.utils.convert_json_to_records_and_relationships("sina_curve_rec.json")
for sina_record in sina_records:
    store_sina.records.insert(sina_record)

You can also *ingest* data from outside of Python, using the command line:

In [9]:
!sina ingest --database my_sina_store.sql sina_curve_rec_2.json

/usr/bin/sh: sina: command not found


In [10]:
rec = sina_records[0][0]

#### Kosh

Similarly, Kosh has its own `export`/`import` functions, which use Sina's JSON format under the hood.

Kosh can also import Sina JSON files directly.

The `match_attributes` argument helps resolve conflicts with datasets already in the store.
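The idea behind attribute matching can be sketched with plain dictionaries. This is a simplified model and not Kosh's implementation: `find_match` and `import_one` are hypothetical helpers, and the rule used here (merge when every listed attribute is equal, raise on conflicting values) is an assumption made for illustration.

```python
def find_match(existing, incoming, match_attributes):
    """Return the first stored dataset whose listed attributes all equal the incoming ones."""
    for candidate in existing:
        if all(candidate.get(name) == incoming.get(name) for name in match_attributes):
            return candidate
    return None


def import_one(existing, incoming, match_attributes):
    """Add `incoming` to `existing`, merging it into a matching dataset when one is found."""
    match = find_match(existing, incoming, match_attributes)
    if match is None:
        # No dataset matches: store a copy as a new entry
        existing.append(dict(incoming))
        return existing[-1]
    # A dataset matches: refuse to silently overwrite differing values
    for key, value in incoming.items():
        if key in match and match[key] != value:
            raise ValueError("conflicting values for attribute {!r}".format(key))
    match.update(incoming)
    return match


# Hypothetical store content
store = [{"id": "a1", "name": "run_1", "pi": 3.14159}]

# Same name and id: merged into the existing entry
merged = import_one(store, {"id": "a1", "name": "run_1", "pi_over_2": 1.57}, ["name", "id"])
# Different name and id: added as a new entry
added = import_one(store, {"id": "b2", "name": "run_2"}, ["name", "id"])
print(len(store), merged)
```

In this model, choosing which attributes to match on decides whether an incoming dataset is treated as new or as an update of one already stored.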


In [11]:
store_kosh.import_dataset?
datasets = store_kosh.import_dataset("sina_curve_rec.json", match_attributes=["name", "id"])
datasets = store_kosh.import_dataset("kosh_dataset.json", match_attributes=["name", "id"])
datasets = store_kosh.import_dataset(kosh_dataset_record.export(), match_attributes=["name", "id"])
list(datasets)

[<kosh.core_sina.KoshSinaObject at 0x2aab4adb5810>]

Signature: store_kosh.import_dataset(datasets, match_attributes=['name'], merge_handler=None, merge_handler_kargs={}, skip_sina_record_sections=[])
Docstring:
import datasets and ensembles that were exported from another store, or load them from a json file
:param datasets: Dataset/Ensemble object exported by another store, a dataset/ensemble
                 or a json file containing these.
:type datasets: json file, json loaded object, KoshDataset or KoshEnsemble
:param match_attributes: parameters on a dataset to use if this it is already in the store
                         in general we can't use 'id' since it is randomly generated at creation
                         If t

## Accessing a Record/Dataset with known id

In [12]:
# Sina
my_rec = store_sina.records.get("obj1")
print(my_rec)

# Kosh
dataset = store_kosh.open("an_id")
print(dataset)

Model Record <id=obj1, type=some_type>
KOSH DATASET
	id: an_id
	name: Unnamed Dataset
	creator: anonymous

--- Attributes ---
	creator: anonymous
	name: Unnamed Dataset
--- Associated Data (0)---
--- Ensembles (0)---
	[]


## Getting Everything In the Store

In [13]:
# Sina
sina_all = store_sina.records.get_all()
# sina_all = store_sina.records.find()

# Kosh
# Will only return "datasets" (not associated sources, see below)
kosh_all = store_kosh.find()

## Deleting Entries

In [14]:
# Sina

store_sina.records.delete(sina_record)
# or id
store_sina.records.delete("obj2")

In [15]:
# Kosh

# Using the dataset itself
store_kosh.delete(kosh_dataset_record)

# Or the id
store_kosh.delete("an_id")

## Updating Entries

In [16]:
# Sina
rec = store_sina.records.get("my_id")
rec.add_data("pi", 3.14159)
# or
rec["data"]["pi_over_2"] = {"value": 1.57}
print(rec["data"])

# Note that the record is NOT updated in the database yet
print(store_sina.records.get("my_id")["data"])
kosh_rec = store_kosh.open("my_id")
print(kosh_rec) # not updated
# Let's update
store_sina.records.delete("my_id")
store_sina.records.insert(rec)
print(kosh_rec) # Updated live no need to fetch again


{'pi': {'value': 3.14159}, 'pi_over_2': {'value': 1.57}}
{}
KOSH DATASET
	id: my_id
	name: ???
	creator: ???
--- Associated Data (0)---
--- Ensembles (0)---
	[]
KOSH DATASET
	id: my_id
	name: ???
	creator: ???

--- Attributes ---
	pi: 3.14159
	pi_over_2: 1.57
--- Associated Data (0)---
--- Ensembles (0)---
	[]
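The behavior above (local edits are invisible to the store until the record is written back) is common to any store that hands out copies of its content. The sketch below reproduces it with the standard `sqlite3` and `json` modules. It is an analogy under our own assumptions, not Sina's implementation: the table layout and the `get` helper are hypothetical.

```python
import json
import sqlite3

# Hypothetical single-table store holding each record's data as JSON text
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, data TEXT)")
conn.execute("INSERT INTO records VALUES (?, ?)", ("my_id", json.dumps({})))


def get(rec_id):
    """Return a fresh in-memory copy of the stored data for rec_id."""
    row = conn.execute("SELECT data FROM records WHERE id = ?", (rec_id,)).fetchone()
    return json.loads(row[0])


# Modify a local copy only
local = get("my_id")
local["pi"] = {"value": 3.14159}
before = get("my_id")  # the store still holds the old content

# Write back by deleting and re-inserting, as in the cell above
conn.execute("DELETE FROM records WHERE id = ?", ("my_id",))
conn.execute("INSERT INTO records VALUES (?, ?)", ("my_id", json.dumps(local)))
after = get("my_id")

print(before, after)
```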


In [17]:
# Kosh
ds = store_kosh.open("some_id")
ds.pi = 3.14159
ds.pi_over_2 = 1.57

# Store is updated
# Kosh way
ds2 = store_kosh.open("some_id")
print(ds2)
# Sina way
print(store_sina.records.get("some_id")["data"])


KOSH DATASET
	id: some_id
	name: Unnamed Dataset
	creator: cdoutrix

--- Attributes ---
	creator: cdoutrix
	name: Unnamed Dataset
	pi: 3.14159
	pi_over_2: 1.57
--- Associated Data (0)---
--- Ensembles (0)---
	[]
{'creator': {'value': '29227d615664b750489776379f5cd287'}, 'name': {'value': 'Unnamed Dataset'}, '_associated_data_': {'value': None}, 'pi': {'value': 3.14159}, 'pi_over_2': {'value': 1.57}}


## Searching the Store

Sina is designed to help you query your store in many different ways.
Kosh is designed to help you get to your external data quickly and easily.

You can use Sina's query capabilities to pinpoint your Kosh datasets.

*Reminder:* You can access the Sina store and Sina records directly from an opened Kosh store.

At its most basic, think of Kosh's `find` function as an analog of Sina's `find` function.

Sina lets you query the store in many ways, and has much more advanced and efficient queries than Kosh.

Kosh can do similar things, usually less efficiently, but within a single function call.

### Search records by type

In [18]:
# Sina
list(store_sina.records.find(types=["some_type",]))
list(store_sina.records.find_with_type("some_type"))

[Model Record <id=706a74b6d43548d19796d79453c833e4, type=some_type>,
 Model Record <id=obj1, type=some_type>,
 Model Record <id=some_id, type=some_type>]

In [19]:
# Kosh
list(store_kosh.find(types=["some_type",]))

[KOSH DATASET
	id: some_id
	name: Unnamed Dataset
	creator: cdoutrix

--- Attributes ---
	creator: cdoutrix
	name: Unnamed Dataset
	pi: 3.14159
	pi_over_2: 1.57
--- Associated Data (0)---
--- Ensembles (0)---
	[],
 KOSH DATASET
	id: 706a74b6d43548d19796d79453c833e4
	name: ???
	creator: ???

--- Attributes ---
	param1: 1
	param2: 2
	param3: 3.3
--- Associated Data (2)---
	Mime_type: image/png
		foo.png ( 706a74b6d43548d19796d79453c833e4 )
	Mime_type: sina/curve
		internal ( timeplot_1 )
--- Ensembles (0)---
	[],
 KOSH DATASET
	id: obj1
	name: ???
	creator: ???

--- Attributes ---
	param1: 1
	param2: 2
	param3: 3.3
--- Associated Data (2)---
	Mime_type: image/png
		foo.png ( obj1 )
	Mime_type: sina/curve
		internal ( timeplot_1 )
--- Ensembles (0)---
	[]]

### Search records based on data

More details can be found in Sina's documentation [here](https://lc.llnl.gov/workflow/docs/sina/api_basics.html?highlight=datarange#filtering-records-based-on-their-data).

Kosh's `find` differs slightly here: the keys of the 'data' argument of Sina's `find` function *can* be passed directly as keyword arguments, and a key that must be present with any value can be passed as a plain string.
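The calling convention described above can be modeled in a few lines of plain Python. This is a sketch of the semantics only, not Kosh's or Sina's code: the `Range` class and this `find` function are hypothetical stand-ins, and the boundary rule (lower bound included, upper bound excluded) is an assumption.

```python
class Range:
    """Hypothetical numeric range criterion: minimum included, maximum excluded."""

    def __init__(self, minimum=None, maximum=None):
        self.minimum = minimum
        self.maximum = maximum

    def matches(self, value):
        if not isinstance(value, (int, float)):
            return False
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value >= self.maximum:
            return False
        return True


def find(records, *required_keys, **criteria):
    """Yield ids of records that have every required key and satisfy every criterion."""
    for rec_id, data in records.items():
        # Positional strings: the key must exist, whatever its value
        if any(key not in data for key in required_keys):
            continue
        # Keyword arguments: the value must equal the criterion or fall in the range
        satisfied = True
        for key, wanted in criteria.items():
            if key not in data:
                satisfied = False
            elif isinstance(wanted, Range):
                satisfied = wanted.matches(data[key])
            else:
                satisfied = data[key] == wanted
            if not satisfied:
                break
        if satisfied:
            yield rec_id


# Hypothetical store content, loosely following the records used in this notebook
records = {
    "some_id": {"creator": "someone", "pi": 3.14159, "pi_over_2": 1.57},
    "my_id": {"pi": 3.14159, "pi_over_2": 1.57},
    "obj1": {"param1": 1},
}

found = list(find(records, "creator", pi_over_2=Range(1.3, 1.6), pi=3.14159))
print(found)
```

Only `some_id` is selected in this model, because `my_id` has matching values but no `creator` key.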


In [20]:
list(store_sina.records.find(data= {"pi_over_2":sina.utils.DataRange(1.3, 1.6), "pi":3.14159, "creator":sina.utils.exists()}))
# or via the data dedicated function:
list(store_sina.records.find_with_data(pi_over_2=sina.utils.DataRange(1.3, 1.6), pi=3.14159, creator=sina.utils.exists()))

['some_id']

In [21]:
list(store_kosh.find('creator', pi_over_2=sina.utils.DataRange(1.3, 1.6), pi=3.14159))

[KOSH DATASET
	id: some_id
	name: Unnamed Dataset
	creator: cdoutrix

--- Attributes ---
	creator: cdoutrix
	name: Unnamed Dataset
	pi: 3.14159
	pi_over_2: 1.57
--- Associated Data (0)---
--- Ensembles (0)---
	[]]

### Search records with file uri

Sina records can contain a special field to store files related to this record. You can search Sina for all records *linked* to a specific file.

In [22]:
list(store_sina.records.find(file_uri="foo.png"))
# or via its dedicated function
list(store_sina.records.find_with_file_uri("foo.png"))

[Model Record <id=706a74b6d43548d19796d79453c833e4, type=some_type>,
 Model Record <id=obj1, type=some_type>]

Kosh can accomplish the same search via its dedicated `file_uri` keyword:

In [23]:
list(store_kosh.find(file_uri='foo.png'))
type(store_sina.records)

sina.datastore.DataStore.RecordOperations

At this point it is worth noting that, in Kosh, it is recommended to `associate` files with a dataset rather than using the `files` section.

*Associating* a file (source) with a Kosh dataset creates a new record in the database with a Kosh reserved record type. There are several reasons why Kosh does this:

* If a file is `associated` with many Kosh datasets this saves on the number of entries in the database.
* Since files are now represented by their own records, we can add many queryable metadata to them.
* As your problem complexity grows, many files/sources can be associated with a dataset. Having these files represented as records in Sina allows Kosh to use Sina's query capabilities to quickly pinpoint the desired file(s)/source(s).

Let's demonstrate this:

In [24]:
my_kosh_dataset = store_kosh.open("my_id")
for i in range(100):
    my_kosh_dataset.associate("some_file_{:04d}.png".format(i), mime_type="png", metadata= {"some_param":i})
# Now let's search all sources of this dataset for a `some_param` value between 73 and 90
list(my_kosh_dataset.search(some_param=sina.utils.DataRange(73, 90)))

[<kosh.core_sina.KoshSinaFile at 0x2aab4b02ed10>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4affe9d0>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4b06bf10>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4b0a3e10>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4b0a3fd0>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4ad3b590>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4b0baa90>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4ad3bf50>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4ad3b390>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4ad3bd10>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4ad53410>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4ad531d0>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4b0aeed0>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4b0aee10>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4b0baf50>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4b0baf90>,
 <kosh.core_sina.KoshSinaFile at 0x2aab4b0babd0>]
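The benefit of representing sources as their own records can be illustrated with plain dictionaries. This is a toy model and not how Kosh stores its data: the field names and the `search_sources` helper are hypothetical. The 17 objects returned above are consistent with a range that includes 73 and excludes 90, which is the convention assumed here.

```python
# Each associated source is modeled as its own record (plain dicts, not Kosh objects)
sources = [
    {
        "uri": "some_file_{:04d}.png".format(i),
        "mime_type": "png",
        "some_param": i,
        "associated": ["my_id"],
    }
    for i in range(100)
]

# One source record can serve several datasets without being duplicated
sources[0]["associated"].append("some_id")


def search_sources(dataset_id, low, high):
    """Return uris of sources tied to dataset_id with low <= some_param < high."""
    return [
        source["uri"]
        for source in sources
        if dataset_id in source["associated"] and low <= source["some_param"] < high
    ]


selected = search_sources("my_id", 73, 90)
print(len(selected), selected[0], selected[-1])
```

Because the metadata lives on the source record itself, the same filter works no matter how many datasets share that source.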

## Data


### Curves 

#### Sina

Sina allows you to query the "data" section of its records, but you can also access **and** search `curve sets`, which are essentially time series associated with a record.

A curve set consists of an `independent` variable and one or more `dependent` variables.

You can ask Sina to give you all records with a `volume` curve set having values greater than 15.
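To make the "any value in range" idea concrete, here is a small sketch in plain Python. It uses a curve set with the same shape as the `rec["curve_sets"]` content shown in this notebook. The helper names are hypothetical, and treating the lower bound as included (15 itself matches) is an assumption.

```python
# Curve set with the same shape as the `curve_sets` section of a record
curve_sets = {
    "timeplot_1": {
        "independent": {"time": {"value": [0, 1, 2]}},
        "dependent": {
            "feature_a": {"value": [15, 25, 35], "tags": ["tag1"]},
            "feature_b": {"value": [10.1, 25.2, 40.3], "units": "m"},
        },
    }
}


def any_at_least(values, minimum):
    """True if at least one value is greater than or equal to minimum."""
    return any(value >= minimum for value in values)


def curves_with_any_at_least(sets, minimum):
    """List 'set/curve' names for which at least one value reaches minimum."""
    hits = []
    for set_name, curve_set in sets.items():
        for role in ("independent", "dependent"):
            for curve_name, curve in curve_set.get(role, {}).items():
                if any_at_least(curve["value"], minimum):
                    hits.append(set_name + "/" + curve_name)
    return sorted(hits)


matching = curves_with_any_at_least(curve_sets, 15.0)
print(matching)
```

The `time` curve is excluded because none of its values reaches 15, while both dependent curves have at least one value that does.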


In [25]:
list(store_sina.records.find(data={"volume":sina.utils.any_in(sina.utils.DataRange(min=15.))}))

[]

You can then get the curves from the record.

In [26]:
rec = store_sina.records.get("obj1")
rec["curve_sets"]

{'timeplot_1': {'independent': {'time': {'value': [0, 1, 2]}},
  'dependent': {'feature_a': {'value': [15, 25, 35], 'tags': ['tag1']},
   'feature_b': {'value': [10.1, 25.2, 40.3], 'units': 'm'}}}}

#### Kosh

Kosh uses Sina's search capabilities under the hood, so similarly you would do:


In [27]:
vol_ids = list(store_kosh.find(volume=sina.utils.any_in(sina.utils.DataRange(min=15.))))

# And to get the curves list:
dataset = store_kosh.open("obj1")
print(dataset.list_features())

['timeplot_1', 'timeplot_1/feature_a', 'timeplot_1/feature_b', 'timeplot_1/time']


Let's access the `time` curve:

In [28]:
print(dataset.get("timeplot_1/time"))

[0 1 2]


### External Data (large files)

Sina provides a mechanism to link files to records, via the `add_file` function.

If you also provide a `mimetype` argument for the added file, Kosh will treat it as an associated file and will be able to extract its data via a loader (although it will not be able to find it via an attribute search).

In [29]:
rec.add_file("sample_files/run_000.hdf5", mimetype="hdf5")
store_sina.records.delete(rec.id)
store_sina.records.insert(rec)

In [30]:
dataset.list_features(use_cache=False)  # Bypass the cache: Kosh cannot know that something changed on the Sina side

['timeplot_1',
 'timeplot_1/feature_a',
 'timeplot_1/feature_b',
 'timeplot_1/time',
 'cycles',
 'direction',
 'elements',
 'node',
 'node/metrics_0',
 'node/metrics_1',
 'node/metrics_10',
 'node/metrics_11',
 'node/metrics_12',
 'node/metrics_2',
 'node/metrics_3',
 'node/metrics_4',
 'node/metrics_5',
 'node/metrics_6',
 'node/metrics_7',
 'node/metrics_8',
 'node/metrics_9',
 'zone',
 'zone/metrics_0',
 'zone/metrics_1',
 'zone/metrics_2',
 'zone/metrics_3',
 'zone/metrics_4']

In [31]:
dataset.get("zone/metrics_0"), dataset.get("timeplot_1/feature_a")

(<HDF5 dataset "metrics_0": shape (2, 4), type "<f4">, array([15, 25, 35]))