# Adding Data To Datasets

This tutorial is a sequel to [Tutorial 00](Example_00_Open_Store_And_Add_Datasets.ipynb#Connect-to-store-(using-sina-local-file)), which should have been run successfully before this tutorial.


## Connect to store (using sina local file)


In [1]:
from kosh import connect
import os

# local tutorial sql file
kosh_example_sql_file = "kosh_example.sql"

# connect to the store (no sync option is passed, so Kosh's default mode is used)
store = connect(kosh_example_sql_file)



## Adding Files to Datasets

Let's find all datasets that have a `param1` attribute:

In [2]:
from sina.utils import DataRange
# We set a min value lower than the known min, to ensure all datasets come back
datasets = list(store.find(param1=DataRange(-1.e20)))
print(len(datasets))

125
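To see why a lower bound alone returns every dataset, here is a minimal pure-Python sketch of a range test. It assumes that `DataRange(min)` leaves the maximum open and treats the minimum as inclusive and the maximum as exclusive (sina's defaults, as far as we know); `in_data_range` is a hypothetical helper, not part of sina or Kosh, and the values are made up.

```python
def in_data_range(value, min_value=None, max_value=None,
                  min_inclusive=True, max_inclusive=False):
    """Hypothetical stand-in for a DataRange membership test.

    A missing bound (None) is treated as open-ended on that side.
    """
    if min_value is not None:
        if min_inclusive and value < min_value:
            return False
        if not min_inclusive and value <= min_value:
            return False
    if max_value is not None:
        if max_inclusive and value > max_value:
            return False
        if not max_inclusive and value >= max_value:
            return False
    return True


# Made-up param1 values, for illustration only
param1_values = [0.33, -5.2, 1.0e6, 0.0]
matching = [v for v in param1_values if in_data_range(v, min_value=-1.e20)]
print(len(matching))  # 4: every value is above -1e20
```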


Let's scan the `sample_files` directory and associate each of the first 10 datasets with its matching HDF5 file (the file is named after the dataset):

In [3]:
import os
try:
    from tqdm.autonotebook import tqdm
except ImportError:
    # tqdm is not installed: iterate without a progress bar
    tqdm = list

pth = "sample_files"
for dataset in tqdm(datasets[:10]):
    # each dataset's data lives in <dataset name>.hdf5
    hdf5 = dataset.name + ".hdf5"
    try:
        dataset.associate(os.path.join(pth, hdf5), mime_type="hdf5")
    except Exception:  # e.g. file already associated
        pass

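The loop above builds each path from the dataset name and lets the `try`/`except` swallow any failure. If you would rather scan the directory first and only associate files that actually exist, a minimal stdlib sketch could look like the following. `match_hdf5_files` is a hypothetical helper and the file names are made up; in the tutorial you would then call `dataset.associate(path, mime_type="hdf5")` for each match.

```python
import glob
import os
import tempfile


def match_hdf5_files(names, directory):
    """Map each name to <directory>/<name>.hdf5 if that file exists.

    Hypothetical helper: names without a matching file are skipped.
    """
    available = {
        os.path.splitext(os.path.basename(path))[0]: path
        for path in glob.glob(os.path.join(directory, "*.hdf5"))
    }
    return {name: available[name] for name in names if name in available}


# Demo on a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for stem in ("run_001", "run_002"):
        open(os.path.join(tmp, stem + ".hdf5"), "w").close()
    found = match_hdf5_files(["run_001", "run_002", "run_999"], tmp)
    print(sorted(found))  # ['run_001', 'run_002']
```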

List the ids of the data URIs associated with this dataset (`dataset` is now the last one processed by the loop above):

In [4]:
dataset._associated_data_

['71e0d881b0b744dcaf31915e2c71d968']

Let's find the data associated with this dataset that has mime type `hdf5`. Note that `find` returns a generator, not a list:

In [5]:
dataset.find(mime_type="hdf5")

<generator object KoshDataset.find at 0x2aaade1d34a0>
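Because the result is a generator, matches are produced lazily as you iterate. The toy sketch below (plain Python with a hypothetical `find_by_mime_type` and made-up records, not Kosh's implementation) shows the usual ways to consume such a result: wrap it in `list()` to get everything, or use `next()` with a default to take only the first match.

```python
def find_by_mime_type(records, mime_type):
    """Lazily yield the ids of records whose mime type matches."""
    for record_id, record_mime in records:
        if record_mime == mime_type:
            yield record_id


# Made-up (id, mime_type) pairs standing in for associated data
records = [("a1", "hdf5"), ("b2", "netcdf"), ("c3", "hdf5")]

hits = find_by_mime_type(records, "hdf5")
print(hits)        # a generator object: nothing has been searched yet
print(list(hits))  # ['a1', 'c3']
print(list(hits))  # []: a generator can only be consumed once

# Take just the first match, with a fallback if there is none
first = next(find_by_mime_type(records, "png"), None)
print(first)  # None
```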

In [6]:
file = store._load(dataset._associated_data_[0])
file.uri

'/g/g19/cdoutrix/git/kosh/examples/sample_files/run_062.hdf5'

In [7]:
h5 = dataset.open(dataset._associated_data_[0])
h5

<HDF5 file "run_062.hdf5" (mode r)>

In [8]:
h5 = store.open(dataset._associated_data_[0])
h5

<HDF5 file "run_062.hdf5" (mode r)>

In [9]:
# You can associate multiple sources with a dataset
dataset.associate("some_other_file", mime_type="netcdf")
dataset._associated_data_

['71e0d881b0b744dcaf31915e2c71d968', '2008fdac8cdb4a37976f65c3d6f34b15']

In [10]:
# Or several files at once
dataset.associate(["file2", "file3"], mime_type="png")
dataset._associated_data_

['71e0d881b0b744dcaf31915e2c71d968',
 '2008fdac8cdb4a37976f65c3d6f34b15',
 'e466dd75d4e949c6b088c1f0f0e04449',
 '82225d89d283448183215aa8d742dd20']

In [11]:
# They do NOT need to share the same mime type and/or metadata
dataset.associate(["file5", "file6"], mime_type=["tiff", "jpg"], metadata=[{"name":"some"}, {"age":21}])
dataset._associated_data_

['71e0d881b0b744dcaf31915e2c71d968',
 '2008fdac8cdb4a37976f65c3d6f34b15',
 'e466dd75d4e949c6b088c1f0f0e04449',
 '82225d89d283448183215aa8d742dd20',
 'faa1d71f61644cc9835daf3f7927209f',
 '31fa0c4a5da04f6ba096f34ec86a93ab']
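The three calls above differ only in how their arguments line up: a single URI, a list sharing one mime type, or lists paired element by element. A toy model of that pairing logic is sketched below. It is not Kosh's code: `associate` here is a hypothetical free function, and it assumes ids are random UUID4 hex strings, which matches how the ids above look but is not confirmed by this tutorial.

```python
import uuid


def associate(associated, uris, mime_type, metadata=None):
    """Toy model of associating one or several URIs with a dataset.

    `associated` maps generated ids to records. A single string is treated
    as a one-element list; a scalar mime_type or metadata is reused for
    every URI, while lists are paired element by element.
    """
    if isinstance(uris, str):
        uris = [uris]
    mime_types = mime_type if isinstance(mime_type, list) else [mime_type] * len(uris)
    if metadata is None:
        metadata = {}
    metas = metadata if isinstance(metadata, list) else [metadata] * len(uris)
    if not len(uris) == len(mime_types) == len(metas):
        raise ValueError("uris, mime_type and metadata lengths differ")
    new_ids = []
    for uri, mime, meta in zip(uris, mime_types, metas):
        new_id = uuid.uuid4().hex  # 32 hex characters, like the ids above
        associated[new_id] = {"uri": uri, "mime_type": mime, "metadata": dict(meta)}
        new_ids.append(new_id)
    return new_ids


associated = {}
associate(associated, "some_other_file", mime_type="netcdf")
associate(associated, ["file2", "file3"], mime_type="png")
associate(associated, ["file5", "file6"], mime_type=["tiff", "jpg"],
          metadata=[{"name": "some"}, {"age": 21}])
print([rec["mime_type"] for rec in associated.values()])
# ['netcdf', 'png', 'png', 'tiff', 'jpg']
```

Copying each metadata dict keeps a shared scalar `metadata` from being mutated through one record and showing up in the others.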

## Removing associated files

Sometimes you might need to remove an association; this can be done with the `dissociate` method.

In [12]:
dataset.dissociate("file5")
dataset._associated_data_

['71e0d881b0b744dcaf31915e2c71d968',
 '2008fdac8cdb4a37976f65c3d6f34b15',
 'e466dd75d4e949c6b088c1f0f0e04449',
 '82225d89d283448183215aa8d742dd20',
 '31fa0c4a5da04f6ba096f34ec86a93ab']
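Note that `dissociate` is called with the URI (`"file5"`) rather than its id, and the output keeps the remaining ids in their original order. A toy sketch of that bookkeeping in plain Python, using a hypothetical `dissociate` function and made-up short ids (not Kosh's implementation):

```python
def dissociate(associated_ids, uris_by_id, uri):
    """Toy model: drop every id whose URI equals `uri`, keeping the order
    of the remaining ids. Returns the list of removed ids."""
    removed = [i for i in associated_ids if uris_by_id[i] == uri]
    associated_ids[:] = [i for i in associated_ids if uris_by_id[i] != uri]
    return removed


# Short made-up ids standing in for the 32-character ones above
uris_by_id = {"id1": "run_062.hdf5", "id2": "some_other_file",
              "id5": "file5", "id6": "file6"}
associated_ids = ["id1", "id2", "id5", "id6"]

print(dissociate(associated_ids, uris_by_id, "file5"))  # ['id5']
print(associated_ids)  # ['id1', 'id2', 'id6']
```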

## Adding curves to a dataset

Sometimes you don't need or want a separate file hanging around; you just want to save a curve (think 1D data).

You can easily do so.

You can organize your curves into different `curve_sets` and give each one a name. If you don't, Kosh will name them automatically for you.
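One way to picture this grouping is a nested mapping from curve_set name to curve name to values. The toy sketch below uses the same positional order (values, curve name, curve_set) as the `add_curve` calls in the next cell; `add_curve` here is a hypothetical free function, the `"curve_set_<n>"` auto-name is an assumption made for illustration (Kosh's real naming scheme may differ), and `"pressure"` is a made-up curve.

```python
def add_curve(curve_sets, values, curve_name, curve_set=None):
    """Toy model: store `values` as curve_sets[curve_set][curve_name].

    If no curve_set is given, a hypothetical "curve_set_<n>" name is used;
    Kosh's real naming scheme may differ.
    """
    if curve_set is None:
        curve_set = "curve_set_{}".format(len(curve_sets))
    curve_sets.setdefault(curve_set, {})[curve_name] = list(values)
    return curve_set


curve_sets = {}
add_curve(curve_sets, [1, 2, 3, 4], "time", "my_curves")
add_curve(curve_sets, [2.3, 3.4, 5.6, 7.8], "some_variable", "my_curves")
add_curve(curve_sets, [3, 4, 5], "time", "my_other_curves")
auto = add_curve(curve_sets, [0.1, 0.2], "pressure")
print(sorted(curve_sets))  # ['curve_set_2', 'my_curves', 'my_other_curves']
```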

In [13]:
dataset.add_curve([1,2,3,4], "time", "my_curves")
dataset.add_curve([2.3, 3.4, 5.6, 7.8], "some_variable", "my_curves")
dataset.add_curve([3, 4, 5], "time", "my_other_curves")
dataset

KOSH DATASET
	id: 209c7382c1334ef4afe8fa95ef0cb58b
	name: run_062
	creator: cdoutrix

--- Attributes ---
	creator: cdoutrix
	name: run_062
	param1: 0.3299019516056123
	param2: 0.24940142061599885
	param3: 4.635686431066943
	param4: 2.4118405159503844
	param5: 2.21532924044391
	param6: J
	project: Kosh Tutorial
--- Associated Data (6)---
	Mime_type: hdf5
		/g/g19/cdoutrix/git/kosh/examples/sample_files/run_062.hdf5 ( 71e0d881b0b744dcaf31915e2c71d968 )
	Mime_type: jpg
		file6 ( 31fa0c4a5da04f6ba096f34ec86a93ab )
	Mime_type: netcdf
		some_other_file ( 2008fdac8cdb4a37976f65c3d6f34b15 )
	Mime_type: png
		file2 ( e466dd75d4e949c6b088c1f0f0e04449 )
		file3 ( 82225d89d283448183215aa8d742dd20 )
	Mime_type: sina/curve
		internal ( my_curves, my_other_curves )
--- Ensembles (0)---
	[]
--- Ensemble Attributes ---


## Removing curves and curve_sets

Similarly, you can remove individual curves or whole curve_sets. A curve_set that becomes empty is removed automatically.
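The removal rules used in the next two cells can be sketched as a toy model on a nested dict: a curve plus its curve_set, the `"curve_set/curve"` path form, or a bare curve_set name, with empty curve_sets dropped. `remove_curve` here is a hypothetical free function mirroring those calls, not Kosh's implementation; the real method may resolve bare names differently.

```python
def remove_curve(curve_sets, curve, curve_set=None):
    """Toy model of removing a curve or a whole curve_set.

    remove_curve(sets, "name", "set")  removes one curve
    remove_curve(sets, "set/name")     same thing, path syntax
    remove_curve(sets, "set")          removes the whole curve_set
    A curve_set left empty is deleted automatically.
    """
    if curve_set is None:
        if "/" in curve:
            curve_set, curve = curve.split("/", 1)
        else:
            # a bare name refers to a whole curve_set
            del curve_sets[curve]
            return
    del curve_sets[curve_set][curve]
    if not curve_sets[curve_set]:
        del curve_sets[curve_set]


curve_sets = {
    "my_curves": {"time": [1, 2, 3, 4], "some_variable": [2.3, 3.4, 5.6, 7.8]},
    "my_other_curves": {"time": [3, 4, 5]},
}
remove_curve(curve_sets, "some_variable", "my_curves")
remove_curve(curve_sets, "my_curves/time")
print(sorted(curve_sets))  # ['my_other_curves']
remove_curve(curve_sets, "my_other_curves")
print(curve_sets)  # {}
```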

In [14]:
dataset.remove_curve("some_variable", "my_curves")
# or, equivalently, with the "curve_set/curve" path syntax
dataset.remove_curve("my_curves/time")

# notice that "my_curves" is gone
dataset

KOSH DATASET
	id: 209c7382c1334ef4afe8fa95ef0cb58b
	name: run_062
	creator: cdoutrix

--- Attributes ---
	creator: cdoutrix
	name: run_062
	param1: 0.3299019516056123
	param2: 0.24940142061599885
	param3: 4.635686431066943
	param4: 2.4118405159503844
	param5: 2.21532924044391
	param6: J
	project: Kosh Tutorial
--- Associated Data (6)---
	Mime_type: hdf5
		/g/g19/cdoutrix/git/kosh/examples/sample_files/run_062.hdf5 ( 71e0d881b0b744dcaf31915e2c71d968 )
	Mime_type: jpg
		file6 ( 31fa0c4a5da04f6ba096f34ec86a93ab )
	Mime_type: netcdf
		some_other_file ( 2008fdac8cdb4a37976f65c3d6f34b15 )
	Mime_type: png
		file2 ( e466dd75d4e949c6b088c1f0f0e04449 )
		file3 ( 82225d89d283448183215aa8d742dd20 )
	Mime_type: sina/curve
		internal ( my_other_curves )
--- Ensembles (0)---
	[]
--- Ensemble Attributes ---


In [15]:
dataset.remove_curve("my_other_curves")
# all curves are gone, so there is no more sina/curve entry
dataset

KOSH DATASET
	id: 209c7382c1334ef4afe8fa95ef0cb58b
	name: run_062
	creator: cdoutrix

--- Attributes ---
	creator: cdoutrix
	name: run_062
	param1: 0.3299019516056123
	param2: 0.24940142061599885
	param3: 4.635686431066943
	param4: 2.4118405159503844
	param5: 2.21532924044391
	param6: J
	project: Kosh Tutorial
--- Associated Data (5)---
	Mime_type: hdf5
		/g/g19/cdoutrix/git/kosh/examples/sample_files/run_062.hdf5 ( 71e0d881b0b744dcaf31915e2c71d968 )
	Mime_type: jpg
		file6 ( 31fa0c4a5da04f6ba096f34ec86a93ab )
	Mime_type: netcdf
		some_other_file ( 2008fdac8cdb4a37976f65c3d6f34b15 )
	Mime_type: png
		file2 ( e466dd75d4e949c6b088c1f0f0e04449 )
		file3 ( 82225d89d283448183215aa8d742dd20 )
--- Ensembles (0)---
	[]
--- Ensemble Attributes ---
