# Adding Data To Datasets

This tutorial is a sequel to [Tutorial 00](Example_00_Open_Store_And_Add_Datasets.ipynb#Connect-to-store-(using-sina-local-file)), which should have been run successfully before this tutorial.


## Connect to store (using sina local file and asynchronous mode)


In [1]:
from kosh import KoshStore
import os

# local tutorial sql file
kosh_example_sql_file = "kosh_example.sql"

# connect to store in asynchronous mode
store = KoshStore(db_uri=kosh_example_sql_file)

## Adding Files to Datasets

Let's search for datasets containing `param1`.

In [2]:
from sina.utils import DataRange
# We're setting a min value less than the known min, to ensure all datasets come back
datasets = store.search(param1=DataRange(-1.e20))
print(len(datasets))

In the next version the search function will return a generator.
You might need to wrap the result in a list.


125
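The warning above says that a future version of `search` will return a generator. The sketch below is not kosh code: `fake_search` is a hypothetical stand-in that yields its results lazily, used only to illustrate why a call such as `len(datasets)` would then need the result wrapped in `list(...)` first.

```python
def fake_search(n):
    """Hypothetical stand-in for a search that yields its results lazily."""
    for i in range(n):
        yield "dataset_{}".format(i)


results = fake_search(3)
try:
    len(results)  # generators do not support len()
    has_len = True
except TypeError:
    has_len = False

# Wrapping the generator in a list restores len() and indexing
datasets_list = list(fake_search(3))
count = len(datasets_list)
print(has_len, count)
```

Under that assumption, the search in this tutorial would become `datasets = list(store.search(param1=DataRange(-1.e20)))`, which also keeps slices such as `datasets[:10]` working.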


Let's scan the directories and add relevant files to the datasets

In [3]:
import os
try:
    from tqdm.autonotebook import tqdm
except ImportError:  # fall back to a plain list when tqdm is not installed
    tqdm = list

pth = "sample_files"
pbar = tqdm(datasets[:10])
for dataset in pbar:
    # Each sample file is named after its dataset
    hdf5 = dataset.name + ".hdf5"
    try:
        dataset.associate(os.path.join(pth, hdf5), mime_type="hdf5")
    except Exception:  # file is already associated
        pass
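The loop above silences every exception, so a missing file and an already-associated file look the same. As a minimal sketch of one alternative, the helper below checks that the file exists before associating it. `associate_if_present` is a hypothetical name introduced here for illustration; it relies only on `dataset.name` and `dataset.associate(path, mime_type=...)` as used in this tutorial.

```python
import os


def associate_if_present(dataset, directory, extension=".hdf5", mime_type="hdf5"):
    """Associate <directory>/<dataset.name><extension> only if that file exists.

    Returns the associated path, or None when the file is missing.
    (Hypothetical helper, not part of kosh.)
    """
    path = os.path.join(directory, dataset.name + extension)
    if not os.path.isfile(path):
        return None
    dataset.associate(path, mime_type=mime_type)
    return path
```

Inside the loop this could be called as `associate_if_present(dataset, pth)`, with a `None` return value signalling that no matching file was found.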

List the ids of the data URIs associated with this dataset:

In [4]:
dataset._associated_data_

['24a71302fcc74659a23ee37aeb12ac45']

Let's search this dataset for all data with mime type `hdf5`:

In [5]:
dataset.search(mime_type="hdf5")


[<kosh.sina.core.KoshSinaFile at 0x2aaade427390>]

In [6]:
file = store._load(dataset._associated_data_[0])
file.uri

'/g/g19/cdoutrix/git/kosh/examples/sample_files/run_101.hdf5'

In [7]:
h5 = dataset.open(dataset._associated_data_[0])
h5

<HDF5 file "run_101.hdf5" (mode r)>

In [8]:
h5 = store.open(dataset._associated_data_[0])
h5

<HDF5 file "run_101.hdf5" (mode r)>

In [9]:
# You can associate many sources to a dataset
dataset.associate("some_other_file", mime_type="netcdf")
dataset._associated_data_

['24a71302fcc74659a23ee37aeb12ac45', '1da40718ce664c858b6c8847d0b7b4ef']

In [10]:
# Or associate many sources at once
dataset.associate(["file2", "file3"], mime_type="png")
dataset._associated_data_

['24a71302fcc74659a23ee37aeb12ac45',
 '1da40718ce664c858b6c8847d0b7b4ef',
 '33551cbbb4484e4ea387aa38e0226f82',
 '839056b962c64e77aade83026403e032']

In [11]:
# They do NOT have to share the same mime type and/or metadata
dataset.associate(["file5", "file6"], mime_type=["tiff", "jpg"], metadata=[{"name":"some"}, {"age":21}])
dataset._associated_data_

['24a71302fcc74659a23ee37aeb12ac45',
 '1da40718ce664c858b6c8847d0b7b4ef',
 '33551cbbb4484e4ea387aa38e0226f82',
 '839056b962c64e77aade83026403e032',
 'ef8eb70322354e29917447829b5cd02a',
 '0f3b28c5c8ca44efbc453490c466a72f']
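The call above passes one `mime_type` and one `metadata` entry per file. Assuming the lists are matched by position (an assumption of this sketch, not something verified against kosh itself), the pairing can be pictured with plain Python `zip`, without touching the store:

```python
uris = ["file5", "file6"]
mime_types = ["tiff", "jpg"]
metadata = [{"name": "some"}, {"age": 21}]

# Assumed positional pairing: the i-th uri gets the i-th mime type and metadata
pairs = [
    {"uri": uri, "mime_type": mime, "metadata": meta}
    for uri, mime, meta in zip(uris, mime_types, metadata)
]
for pair in pairs:
    print(pair)
```

Note that `zip` stops at the shortest list, so in this picture the three lists are expected to have the same length.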

## Removing associated data

Sometimes you might need to remove an association. This can be done with the dataset's `dissociate` method.

In [12]:
dataset.dissociate("file5")
dataset._associated_data_

['24a71302fcc74659a23ee37aeb12ac45',
 '1da40718ce664c858b6c8847d0b7b4ef',
 '33551cbbb4484e4ea387aa38e0226f82',
 '839056b962c64e77aade83026403e032',
 '0f3b28c5c8ca44efbc453490c466a72f']