# Adding Data To Datasets

This tutorial is a sequel to [Tutorial 00](Example_00_Open_Store_And_Add_Datasets.ipynb#Connect-to-store-(using-sina-local-file)), which should have been run successfully before this tutorial.


## Connect to store (using sina local file and asynchronous mode)


In [1]:
from kosh import KoshStore
import os

# local tutorial sql file
kosh_example_sql_file = "kosh_example.sql"

# connect to store in asynchronous mode
store = KoshStore(db_uri=kosh_example_sql_file)

## Adding Files to Datasets

Let's search for all datasets that have a `param1` attribute.

In [2]:
from sina.utils import DataRange
# We set a minimum lower than the known minimum to ensure all datasets come back
datasets = store.search(param1=DataRange(-1.e20))
print(len(datasets))

125
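`DataRange(-1.e20)` only sets a lower bound, so every dataset whose `param1` is at least that value matches. The plain-Python sketch below illustrates that filtering idea; `LowerBound` and `matches` are hypothetical names, not part of sina or Kosh, and the sketch assumes the lower bound is inclusive.

```python
from dataclasses import dataclass


@dataclass
class LowerBound:
    """Hypothetical stand-in for an open-ended range: only a minimum is set."""
    min: float
    min_inclusive: bool = True  # assumption: the lower bound is inclusive

    def contains(self, value: float) -> bool:
        return value >= self.min if self.min_inclusive else value > self.min


def matches(records, key, bound):
    """Return records whose `key` value lies inside `bound`; records lacking `key` are skipped."""
    return [r for r in records if key in r and bound.contains(r[key])]


# Hypothetical records: two have param1, one does not.
records = [{"name": "run_001", "param1": 0.5},
           {"name": "run_002", "param1": -3.2},
           {"name": "other"}]
hits = matches(records, "param1", LowerBound(-1.e20))
print([r["name"] for r in hits])  # ['run_001', 'run_002']
```

A very small minimum therefore acts as "has `param1` at all", which is why all 125 datasets come back.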


Let's look in the `sample_files` directory and associate each of the first 10 datasets with its matching HDF5 file (`<dataset name>.hdf5`).
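The next cell builds each file name as `<dataset name>.hdf5` inside `sample_files` and simply ignores any exception raised by `associate`. If you prefer to check that a file exists before associating it, here is a minimal sketch of that name-to-path mapping; `hdf5_candidates` is a hypothetical helper, and the paths it returns would then be passed to `ds.associate(path, mime_type="hdf5")` as in the cell.

```python
import os
import tempfile


def hdf5_candidates(names, directory):
    """Map each dataset name to '<directory>/<name>.hdf5', keeping only files that exist."""
    found = {}
    for name in names:
        path = os.path.join(directory, name + ".hdf5")
        if os.path.isfile(path):
            found[name] = path
    return found


# Usage with a throwaway directory standing in for "sample_files".
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "run_120.hdf5"), "wb").close()  # empty placeholder file
    found = hdf5_candidates(["run_120", "run_121"], tmp)
    print(sorted(found))  # ['run_120']
```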

In [3]:
import os
try:
    from tqdm.autonotebook import tqdm  # progress bar, if tqdm is installed
except ImportError:
    tqdm = list  # fallback: iterate without a progress bar

pth = "sample_files"
pbar = tqdm(datasets[:10])
for ds in pbar:
    # Each dataset's file is named after the dataset, e.g. run_120.hdf5
    hdf5 = ds.name + ".hdf5"
    try:
        ds.associate(os.path.join(pth, hdf5), mime_type="hdf5")
    except Exception:  # file already associated
        pass


List the ids of the data associated with this dataset (`ds` is the last dataset processed by the loop above).

In [4]:
ds._associated_data_

['b0cf3b20169940cba5a3bd327a644a57']

Let's search this dataset for all associated data with mime type `hdf5`.

In [5]:
ds.search(mime_type="hdf5")

[<kosh.sina.core.KoshSinaFile at 0x2aaaba1eb198>]
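`ds.search(mime_type="hdf5")` returns the associated file objects whose mime type matches. Conceptually this is a filter over the dataset's associated records; the plain-Python sketch below illustrates only that idea. `Source` and `search_by_mime_type` are hypothetical and are not Kosh's classes.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical record for one associated source."""
    id: str
    uri: str
    mime_type: str
    metadata: dict = field(default_factory=dict)


def search_by_mime_type(sources, mime_type):
    """Return the sources whose mime_type equals the requested one."""
    return [s for s in sources if s.mime_type == mime_type]


sources = [Source("a1", "sample_files/run_120.hdf5", "hdf5"),
           Source("b2", "some_other_file", "netcdf")]
print([s.uri for s in search_by_mime_type(sources, "hdf5")])  # ['sample_files/run_120.hdf5']
```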

In [6]:
file = store._load(ds._associated_data_[0])
file.uri

'/g/g19/cdoutrix/git/kosh/examples/sample_files/run_120.hdf5'

In [7]:
h5 = ds.open(ds._associated_data_[0])
h5

<HDF5 file "run_120.hdf5" (mode r)>

In [8]:
h5 = store.open(ds._associated_data_[0])
h5

<HDF5 file "run_120.hdf5" (mode r)>
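Both `ds.open(...)` and `store.open(...)` return the same read-only h5py file for the given id. One plausible way such an `open` can work is to look up the record for the id and then pick a loader based on its mime type. The sketch below shows that dispatch pattern in plain Python; the `LOADERS` registry, `open_by_id`, and the `"text"` mime type are hypothetical, and this is not necessarily how Kosh implements it.

```python
import os
import tempfile

# Hypothetical registry: mime type -> function that opens a uri.
LOADERS = {}


def loader(mime_type):
    """Decorator registering a loader function for one mime type."""
    def register(func):
        LOADERS[mime_type] = func
        return func
    return register


@loader("text")
def load_text(uri):
    with open(uri) as f:
        return f.read()


def open_by_id(records, source_id):
    """Look up a record {'uri', 'mime_type'} by id and open it with the matching loader."""
    record = records[source_id]
    try:
        load = LOADERS[record["mime_type"]]
    except KeyError:
        raise ValueError(f"no loader for mime type {record['mime_type']!r}") from None
    return load(record["uri"])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")
    with open(path, "w") as f:
        f.write("hello")
    records = {"abc123": {"uri": path, "mime_type": "text"}}
    content = open_by_id(records, "abc123")
    print(content)  # hello
```

Keeping the id-to-record lookup separate from the loader choice is what lets two different entry points (a dataset or the store) return the same object for one id.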

In [9]:
# You can associate several sources with a dataset
ds.associate("some_other_file", mime_type="netcdf")
ds._associated_data_

['b0cf3b20169940cba5a3bd327a644a57', '13670fe54a1d411bb31d145caf9e9cb9']

In [10]:
# Or several sources at once
ds.associate(["file2", "file3"], mime_type="png")
ds._associated_data_

['b0cf3b20169940cba5a3bd327a644a57',
 '13670fe54a1d411bb31d145caf9e9cb9',
 'b5a8a0b72ae24b9bb851a6711c82b9d5',
 '76f4291ac0a54a69b155466e83dc7107']

In [11]:
# They do not need to share the same mime type and/or metadata: pass one value per source
ds.associate(["file5", "file6"], mime_type=["tiff", "jpg"], metadata=[{"name":"some"}, {"age":21}])
ds._associated_data_

['b0cf3b20169940cba5a3bd327a644a57',
 '13670fe54a1d411bb31d145caf9e9cb9',
 'b5a8a0b72ae24b9bb851a6711c82b9d5',
 '76f4291ac0a54a69b155466e83dc7107',
 '64ab258277c7438c8cb53c8f2c9a73b1',
 '560ddce9881540a28c8730f9a9781407']
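The last three cells suggest that `associate` accepts either one source or a list, with `mime_type` and `metadata` given once for all sources or as lists matched element-wise. The sketch below illustrates that pairing rule in plain Python; `pair_sources` is a hypothetical helper inferred from these examples, not Kosh's API, and it assumes that lists of mismatched length should be rejected.

```python
def _as_list(value, n, name):
    """Repeat a scalar n times, or check that a list already has n items."""
    if isinstance(value, (list, tuple)):
        if len(value) != n:
            raise ValueError(f"{name} has {len(value)} items, expected {n}")
        return list(value)
    return [value] * n


def pair_sources(uris, mime_type, metadata=None):
    """Return (uri, mime_type, metadata) triples, broadcasting scalar arguments."""
    if isinstance(uris, str):
        uris = [uris]
    n = len(uris)
    mime_types = _as_list(mime_type, n, "mime_type")
    if metadata is None:
        metadata = {}
    if isinstance(metadata, dict):
        metas = [dict(metadata) for _ in range(n)]  # independent copy per source
    else:
        metas = _as_list(metadata, n, "metadata")
    return list(zip(uris, mime_types, metas))


print(pair_sources(["file2", "file3"], "png"))
# [('file2', 'png', {}), ('file3', 'png', {})]
print(pair_sources(["file5", "file6"], ["tiff", "jpg"], [{"name": "some"}, {"age": 21}]))
# [('file5', 'tiff', {'name': 'some'}), ('file6', 'jpg', {'age': 21})]
```

Copying a shared metadata dict per source avoids one source's metadata edits leaking into another's.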