# Adding Data To Datasets

This tutorial is a sequel to [Tutorial 00](Example_00_Open_Store_And_Add_Datasets.ipynb#Connect-to-store-(using-sina-local-file)), which should have been run successfully before this tutorial.


## Connect to store (using sina local file and asynchronous mode)


In [1]:
from kosh import connect
import os

# local tutorial sql file
kosh_example_sql_file = "kosh_example.sql"

# connect to store in asynchronous mode
store = connect(kosh_example_sql_file)

## Adding Files to Datasets

Let's find datasets containing `param1`

In [2]:
from sina.utils import DataRange
# We're setting a min value less than the known min, to ensure all datasets come back
datasets = list(store.find(param1=DataRange(-1.e20)))
print(len(datasets))

125


Let's scan the directories and add relevant files to the datasets

In [3]:
import os
try:
    from tqdm.autonotebook import tqdm
except ImportError:
    tqdm = list  # fall back to a plain list when tqdm is not installed

pth = "sample_files"
pbar = tqdm(datasets[:10])
for dataset in pbar:
    # each dataset's file is named after the dataset
    hdf5 = dataset.name + ".hdf5"
    try:
        dataset.associate(os.path.join(pth, hdf5), mime_type="hdf5")
    except Exception:  # file already associated
        pass


  0%|          | 0/10 [00:00<?, ?it/s]
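
The loop above silences every exception, so a missing file and an already associated file look the same. As a minimal sketch, assuming the same `<name>.hdf5` naming convention, one could check which files exist before associating them. `existing_hdf5_paths` is a hypothetical helper written for this illustration and is not part of Kosh.

```python
import os
import tempfile


def existing_hdf5_paths(names, directory):
    """Return {name: path} for every name whose <name>.hdf5 file exists in directory."""
    found = {}
    for name in names:
        path = os.path.join(directory, name + ".hdf5")
        if os.path.isfile(path):
            found[name] = path
    return found


# Demonstration with a throwaway directory standing in for "sample_files"
with tempfile.TemporaryDirectory() as tmp:
    # create a single empty file so only one of the two names matches
    open(os.path.join(tmp, "run_000.hdf5"), "w").close()
    demo = existing_hdf5_paths(["run_000", "run_001"], tmp)
    demo_names = sorted(demo)
    print(demo_names)
```

In the tutorial loop, `dataset.associate` could then be called only for the dataset names present in the returned dictionary.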

List ids of data URIs associated with this dataset

In [4]:
dataset._associated_data_

['cdbb55b450c24ab3b89acdebe129f1dc']

Let's find the data associated with this dataset that has mime type `hdf5`

In [5]:
dataset.find(mime_type="hdf5")

<generator object KoshDataset.find at 0x2aaab42aa5d0>
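
The output above is a generator, so no results are listed until it is consumed. The sketch below uses a plain Python generator as a stand-in to show that behavior; `find_by_mime` and the `sources` records are hypothetical and only mimic the shape of the search, not Kosh's implementation.

```python
def find_by_mime(sources, mime_type):
    """Lazily yield the sources whose mime type matches (hypothetical stand-in)."""
    for source in sources:
        if source["mime_type"] == mime_type:
            yield source


sources = [
    {"uri": "run_013.hdf5", "mime_type": "hdf5"},
    {"uri": "some_other_file", "mime_type": "netcdf"},
]

matches = find_by_mime(sources, "hdf5")  # nothing has been evaluated yet
uris = [source["uri"] for source in matches]  # iterating consumes the generator
leftover = list(matches)  # a second pass over the same generator yields nothing
print(uris)
print(leftover)
```

With the tutorial's objects, the equivalent would presumably be `list(dataset.find(mime_type="hdf5"))`, mirroring the `list(store.find(...))` call used earlier.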

In [6]:
file = store._load(dataset._associated_data_[0])
file.uri

'/g/g19/cdoutrix/git/kosh/examples/sample_files/run_013.hdf5'

In [7]:
h5 = dataset.open(dataset._associated_data_[0])
h5

<HDF5 file "run_013.hdf5" (mode r)>

In [8]:
h5 = store.open(dataset._associated_data_[0])
h5

<HDF5 file "run_013.hdf5" (mode r)>

In [9]:
# You can associate many sources to a dataset
dataset.associate("some_other_file", mime_type="netcdf")
dataset._associated_data_

['cdbb55b450c24ab3b89acdebe129f1dc', 'ef211aa27d1b47069e542270fca2cf3b']

In [10]:
# Or many sources at once
dataset.associate(["file2", "file3"], mime_type="png")
dataset._associated_data_

['cdbb55b450c24ab3b89acdebe129f1dc',
 'ef211aa27d1b47069e542270fca2cf3b',
 '71c82980a00844cbbd283bd0b92d3ec3',
 '835b420fe92146e4bac0668d2c65c61f']

In [11]:
# They do NOT have to be of the same type and/or share the same metadata
dataset.associate(["file5", "file6"], mime_type=["tiff", "jpg"], metadata=[{"name":"some"}, {"age":21}])
dataset._associated_data_

['cdbb55b450c24ab3b89acdebe129f1dc',
 'ef211aa27d1b47069e542270fca2cf3b',
 '71c82980a00844cbbd283bd0b92d3ec3',
 '835b420fe92146e4bac0668d2c65c61f',
 '5f1be0e9230f4e16990036f092e7b026',
 '854b7cf026fb4480a6981523bfabc995']
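
The two calls above pass either one mime type for several files or one mime type and one metadata dictionary per file. The sketch below shows one plausible way such arguments line up positionally. It is an assumption made for illustration: `pair_sources` is a hypothetical helper, not Kosh's actual logic.

```python
def pair_sources(uris, mime_type, metadata=None):
    """Pair each uri with its mime type and metadata (hypothetical helper)."""
    if isinstance(uris, str):
        uris = [uris]
    if isinstance(mime_type, str):
        # a single mime type is shared by every uri
        mime_type = [mime_type] * len(uris)
    if metadata is None:
        metadata = [{} for _ in uris]
    if not (len(uris) == len(mime_type) == len(metadata)):
        raise ValueError("uris, mime_type and metadata must have the same length")
    return list(zip(uris, mime_type, metadata))


shared = pair_sources(["file2", "file3"], "png")
mixed = pair_sources(["file5", "file6"], ["tiff", "jpg"], [{"name": "some"}, {"age": 21}])
print(shared)
print(mixed)
```

Under this reading, `file5` is a `tiff` carrying `{"name": "some"}` and `file6` is a `jpg` carrying `{"age": 21}`.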

## Removing associated data

Sometimes you might need to remove an association. This can be done via the `dissociate` command.

In [12]:
dataset.dissociate("file5")
dataset._associated_data_

['cdbb55b450c24ab3b89acdebe129f1dc',
 'ef211aa27d1b47069e542270fca2cf3b',
 '71c82980a00844cbbd283bd0b92d3ec3',
 '835b420fe92146e4bac0668d2c65c61f',
 '854b7cf026fb4480a6981523bfabc995']
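
In the output above, only the id that pointed at `file5` disappears and the other ids keep their order. A toy model of that bookkeeping, offered as a sketch only: `ToyDataset` is a hypothetical class that stores an id-to-uri mapping and makes no claim about how Kosh stores associations.

```python
import uuid


class ToyDataset:
    """Toy stand-in for a dataset that tracks associated sources by id."""

    def __init__(self):
        self._sources = {}  # id -> uri, kept in insertion order

    def associate(self, uri):
        source_id = uuid.uuid4().hex  # 32-character hex id
        self._sources[source_id] = uri
        return source_id

    def dissociate(self, uri):
        # drop every id that points at this uri; other ids are left untouched
        self._sources = {i: u for i, u in self._sources.items() if u != uri}

    def ids(self):
        return list(self._sources)


toy = ToyDataset()
kept = toy.associate("file3")
removed = toy.associate("file5")
last = toy.associate("file6")
toy.dissociate("file5")
remaining = toy.ids()
print(remaining == [kept, last])
```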