# Connecting to a Store and Adding Datasets

In this notebook, we create a new store and add a few datasets to it.

## Connect to a store (using a local Sina file)

First, let's create an empty database (with you as the single user).

In a real application, only an admin user should have write permission to the file.

In [1]:
import os
import sys
import shlex
from subprocess import Popen, PIPE
import kosh

kosh_example_sql_file = "kosh_example.sql"

# Create a new store (erase if exists)
kosh.create_new_db(kosh_example_sql_file)

<kosh.sina.core.KoshSinaStore at 0x2aaaba1e6860>

In [2]:
from kosh import KoshStore
import os

# connect to store
store = KoshStore(db_uri=kosh_example_sql_file)

## Adding datasets to the store

Let's add the first 10 runs as datasets.

In [3]:
import glob
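try:
    from tqdm.autonotebook import tqdm
except ImportError:
    # tqdm is optional: fall back to a plain list (no progress bar)
    tqdm = list

# Sort so that "the first 10 runs" is deterministic (glob order is arbitrary)
runs = sorted(glob.glob("sample_files/run*hdf5"))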
print("we found: {} runs".format(len(runs)))

for run in tqdm(runs[:10]):
    name = os.path.basename(run).split(".")[0]
    print("DS NAME:", name)
    # let's make sure it is unique, in case we run this cell multiple times
    datasets = store.search(name=name)
    if len(datasets) == 0:
        store.create(name)
    else:
        print("we found {} datasets already matching this name".format(len(datasets)))
        print(datasets[0])


we found: 125 runs





DS NAME: run_000
DS NAME: run_001
DS NAME: run_002
DS NAME: run_003
DS NAME: run_004
DS NAME: run_005
DS NAME: run_006
DS NAME: run_007
DS NAME: run_008
DS NAME: run_009
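The dataset names above are derived from the file paths by keeping everything before the first dot of the file name. The sketch below isolates that step and compares it with `os.path.splitext`, which strips only the last extension. The file name `run_1.5.hdf5` is a hypothetical example, not one of the sample files; it only illustrates where the two approaches differ.

```python
import os


def dataset_name(path):
    """Return the file name up to its first dot, as in the loop above."""
    return os.path.basename(path).split(".")[0]


def dataset_name_splitext(path):
    """Alternative: strip only the last extension."""
    return os.path.splitext(os.path.basename(path))[0]


print(dataset_name("sample_files/run_000.hdf5"))           # run_000
# Hypothetical file name containing an extra dot:
print(dataset_name("sample_files/run_1.5.hdf5"))           # run_1
print(dataset_name_splitext("sample_files/run_1.5.hdf5"))  # run_1.5
```

For the `run_NNN.hdf5` files used here both approaches give the same name, so the choice only matters if file names may contain additional dots.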



## Adding attributes to a dataset

For each of these runs, let's add some metadata.

In [4]:
import random

def create_metadata():
    metadata = {"param1": random.random() * 2.,
                "param2": random.random() * 1.5,
                "param3": random.random() * 5,
                "param4": random.random() * 3,
                "param5": random.random() * 2.5,
                # random uppercase letter; randint includes both endpoints
                "param6": chr(random.randint(65, 90)),
               }
    metadata["project"] = "Kosh Tutorial"
    return metadata

pbar = tqdm(runs[:10])
for run in pbar:
    name = os.path.basename(run).split(".")[0]
    # Retrieve dataset via name
    dataset = store.search(name=name)[0]
    # Let's create a few random attributes
    metadata = create_metadata()
    for attribute in metadata:
        setattr(dataset, attribute, metadata[attribute])
print(dataset)



KOSH DATASET
	id: f21638d2201441f386d5b0e7ec6768da
	name:run_009
	creator: cdoutrix

--- Attributes ---
	creator: cdoutrix
	name: run_009
	param1: 1.96155866130105
	param2: 0.598311636826357
	param3: 1.6207983434392808
	param4: 0.4540713849908521
	param5: 1.0719082463618532
	param6: E
	project: Kosh Tutorial
--- Associated Data (0)---
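When generating a random uppercase letter for an attribute such as `param6`, keep in mind that `random.randint` includes both endpoints, so the code points for "A" to "Z" are 65 to 90. The following is a minimal sketch (the seed and sample size are arbitrary choices for illustration) showing that range next to an alternative based on `string.ascii_uppercase`.

```python
import random
import string

random.seed(0)  # arbitrary seed, only to make the sketch repeatable

# randint(a, b) includes both a and b, so 65..90 maps exactly onto "A".."Z"
letters_randint = [chr(random.randint(65, 90)) for _ in range(1000)]

# Same intent, without code-point arithmetic: pick from the alphabet directly
letters_choice = [random.choice(string.ascii_uppercase) for _ in range(1000)]

assert all(c in string.ascii_uppercase for c in letters_randint)
assert all(c in string.ascii_uppercase for c in letters_choice)

# Code point 91 is already past the alphabet
print(chr(90), chr(91))  # Z [
```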



## Creating datasets with all the metadata at once

Passing the metadata when the dataset is created is faster than setting the attributes one by one afterwards.

We will also turn asynchronous mode on to speed things up further.

In [5]:
store.synchronous(False)
pbar = tqdm(runs[10:])
for i, run in enumerate(pbar):
    name = os.path.basename(run).split(".")[0]
    # pbar.set_description("run: {:45}".format(name))
    # The uniqueness check is disabled here (the search is commented out)
    # datasets = store.search(name=name)
    datasets = []
    if len(datasets) == 0:
        metadata = create_metadata()
        dataset = store.create(name, metadata=metadata)
    else:
        print("we found {} datasets already matching this name".format(len(datasets)))
        print(datasets[0])
print(dataset)
# We need to sync the store to ensure it's written to the database
store.sync()



KOSH DATASET
	id: e8d619a0a6b9412794ad0f1a8bec35fd
	name:run_124
	creator: cdoutrix

--- Attributes ---
	creator: cdoutrix
	name: run_124
	param1: 0.9251686219432851
	param2: 0.6843459510042346
	param3: 2.6061415205754077
	param4: 2.0618942190739373
	param5: 1.6820464799539523
	param6: Q
	project: Kosh Tutorial
--- Associated Data (0)---
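The general idea behind deferring writes and flushing them with one final sync can be illustrated with the standard-library `sqlite3` module. This is not Kosh's implementation, which this notebook does not show; it is a generic sketch under the assumption that each commit has a fixed cost, so fewer commits mean less overhead. The table name `datasets` and the helper `insert_rows` are made up for the example.

```python
import sqlite3


def insert_rows(conn, names, commit_each):
    """Insert one row per name; return how many commits were issued."""
    commits = 0
    for name in names:
        conn.execute("INSERT INTO datasets (name) VALUES (?)", (name,))
        if commit_each:
            conn.commit()  # write through after every single insert
            commits += 1
    if not commit_each:
        conn.commit()  # one flush at the end, comparable to a final sync
        commits += 1
    return commits


names = ["run_{:03d}".format(i) for i in range(10, 125)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (name TEXT)")
eager_commits = insert_rows(conn, names, commit_each=True)
deferred_commits = insert_rows(conn, names, commit_each=False)
row_count = conn.execute("SELECT COUNT(*) FROM datasets").fetchone()[0]
conn.close()

print(eager_commits, deferred_commits, row_count)  # 115 1 230
```

Both passes store the same rows; the deferred pass simply groups them into a single commit, which is why forgetting the final flush would leave nothing written.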



## Adding/Modifying/Deleting Dataset attributes

In [6]:
# List existing attributes
print(dataset.listattributes())

['creator', 'name', 'param1', 'param2', 'param3', 'param4', 'param5', 'param6', 'project']


In [7]:
# Create a new attribute
dataset.new_attribute = "new"
print(dataset.listattributes())
print(dataset.new_attribute)

['creator', 'name', 'new_attribute', 'param1', 'param2', 'param3', 'param4', 'param5', 'param6', 'project']
new


In [8]:
# modify an attribute
dataset.new_attribute = "changed"
print(dataset.new_attribute)

changed


In [9]:
# Modify/add many attributes at once (less db access, faster)
dataset.update({"new_attribute": "changed_again", "yet_another_new_attribute":"yana"})
print(dataset.listattributes())
print(dataset.new_attribute)
print(dataset.yet_another_new_attribute)

['creator', 'name', 'new_attribute', 'param1', 'param2', 'param3', 'param4', 'param5', 'param6', 'project', 'yet_another_new_attribute']
changed_again
yana


In [10]:
# Deleting attributes
del dataset.new_attribute
del dataset.yet_another_new_attribute
print(dataset.listattributes())

['creator', 'name', 'param1', 'param2', 'param3', 'param4', 'param5', 'param6', 'project']
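To see why updating many attributes at once can mean fewer database accesses than assigning them individually, consider the toy class below. `ToyDataset` is a hypothetical stand-in written for this illustration, not Kosh's dataset class: it keeps attributes in a dictionary and counts one simulated backend write per assignment, deletion, or `update` call.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset; counts writes to a fake backend."""

    def __init__(self):
        # Bypass our own __setattr__ for internal bookkeeping
        object.__setattr__(self, "_attrs", {})
        object.__setattr__(self, "writes", 0)

    def __setattr__(self, name, value):
        self._attrs[name] = value
        object.__setattr__(self, "writes", self.writes + 1)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        try:
            return self._attrs[name]
        except KeyError:
            raise AttributeError(name)

    def __delattr__(self, name):
        try:
            del self._attrs[name]
        except KeyError:
            raise AttributeError(name)
        object.__setattr__(self, "writes", self.writes + 1)

    def update(self, attributes):
        # All attributes are stored with a single simulated write
        self._attrs.update(attributes)
        object.__setattr__(self, "writes", self.writes + 1)

    def listattributes(self):
        return sorted(self._attrs)


one_by_one = ToyDataset()
one_by_one.new_attribute = "changed_again"
one_by_one.yet_another_new_attribute = "yana"

batched = ToyDataset()
batched.update({"new_attribute": "changed_again",
                "yet_another_new_attribute": "yana"})

print(one_by_one.writes, batched.writes)  # 2 1
del batched.new_attribute
print(batched.listattributes())  # ['yet_another_new_attribute']
```

The write counter is the only thing this sketch measures; how much time a real store saves depends on its backend.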


## Querying the store

In [11]:
# Use Sina's DataRange to search for param1 values from 1.7 upward
from sina.utils import DataRange
datasets = store.search(param1=DataRange(min=1.7))
print(len(datasets))

18
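As a sanity check on the count above, recall that `param1` is drawn as `random.random() * 2.`, i.e. uniformly from [0, 2). The sketch below works out how many of the 125 datasets should be expected at or above 1.7 and runs a small simulation in plain Python. It does not query the store; the seed is arbitrary, and the actual count varies from run to run because the metadata is random.

```python
import random

n_datasets = 125   # runs found earlier in this notebook
upper = 2.0        # param1 = random.random() * 2. lies in [0, 2)
threshold = 1.7    # lower bound used in the query

# Probability that a uniform value in [0, upper) is at least threshold
p_match = (upper - threshold) / upper
expected_matches = n_datasets * p_match
print(round(expected_matches, 2))  # 18.75

# Simulate one set of param1 values and count the matches
random.seed(1)  # arbitrary seed, only to make the sketch repeatable
param1_values = [random.random() * upper for _ in range(n_datasets)]
simulated_matches = sum(1 for v in param1_values if v >= threshold)
print(simulated_matches)
```

A result of 18 is therefore in line with what the random metadata should produce on average.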
