# Getting Started
swmr_tools can be installed using conda (from the conda-forge channel) or pip.

# DataSource

The DataSource class is designed to facilitate live processing of
datasets contained within an hdf5 file. It does this by following
a set of *key* datasets and reading the corresponding data once all the keys are non-zero.
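
To make the idea of "following keys" concrete, here is a minimal pure-Python sketch. It is not the swmr_tools implementation: the helper name `count_ready_frames` is hypothetical, and it assumes that frames are written in order, so processing can proceed up to the first position where any key is still zero.

```python
def count_ready_frames(key_arrays):
    """Count the leading frames whose entry is non-zero in every key array.

    key_arrays is a list of equal-length flat sequences of integers,
    where 0 means "this frame has not been written yet".
    """
    ready = 0
    for values in zip(*key_arrays):
        # Stop at the first frame that any key marks as unwritten.
        if any(v == 0 for v in values):
            break
        ready += 1
    return ready


# Two hypothetical key datasets for a 6-frame scan that is still running.
key_a = [1, 2, 3, 4, 0, 0]
key_b = [1, 2, 3, 0, 0, 0]

# Only the first three frames are complete in both key datasets.
print(count_ready_frames([key_a, key_b]))
```

In the example that follows, every key is already non-zero, so all six frames are available immediately.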

This tutorial assumes a basic level of skill using the h5py library.
Specifically, you should be comfortable with using h5py to:

* Open and create hdf5 files
* Navigate files using python dictionary methods
* Create groups and datasets

If you are unfamiliar with how to do any of this we recommend reading the
h5py quick start guide: https://docs.h5py.org/en/stable/quick.html
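
As a quick refresher, the three skills listed above can be sketched as follows. The file name `refresher.h5` and the group and dataset names are placeholders chosen for illustration; the file is created in memory (h5py's `core` driver with `backing_store=False`), so nothing is written to disk.

```python
import h5py
import numpy as np

# Open (create) an in-memory hdf5 file; nothing is written to disk.
with h5py.File("refresher.h5", "w", driver="core", backing_store=False) as f:
    # Create a group and a dataset inside it.
    grp = f.create_group("entry")
    grp.create_dataset("values", data=np.arange(6).reshape(2, 3))

    # Files and groups behave like dictionaries keyed by name.
    names = list(f.keys())

    # A dataset can be reached step by step or with a full path.
    by_steps = f["entry"]["values"]
    by_path = f["/entry/values"]

    # Datasets support numpy-style slicing.
    shape = by_path.shape
    first_row = by_path[0, :].tolist()

print(names, shape, first_row)
```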


### Example - Iteration through a 4D dataset as images, writing image sum to a new file
We will create a dataset of non-zero integers, representing a complete scan, with all sets of
frames flushed to disk.

In [None]:
import h5py
from swmr_tools import DataSource,utils
import numpy as np

#create some constants

test_file = "test_file.h5"
result_file = "sum.h5"
data_path = "/data/data_1"
key_path = "/keys/key_1"

#create a sequential array of the numbers 1-6 and reshape them into an array
# of shape (2,3,1,1)
complete_key_array = np.arange(6).reshape(2,3,1,1) + 1
#make grid of [5,10] images
complete_data_dataset = np.random.randint(low = 0, high = 1000, size = (2,3,5,10))

In [None]:
complete_key_array

We will create an empty hdf5 file containing two groups. The first, "keys",
holds a dataset called "key_1" containing our array of non-zero keys. The second,
"data", holds a dataset called "data_1", which is the data we want to process (a 2x3 grid of \[5,10\] shaped images).

In [None]:
with h5py.File(test_file, "w", libver = "latest") as f:
    f.create_group("keys")
    f["keys"].create_dataset("key_1", data = complete_key_array)
    f.create_group("data")
    f["data"].create_dataset("data_1", data = complete_data_dataset)

Next, we shall create an instance of the DataSource class and demonstrate a
simple example of its use. At a minimum we must pass a list of the h5py
datasets containing our keys and a dictionary, keyed by path, of the h5py
datasets we want to process.

Shown below is an example of using an instance of DataSource within a for loop,
as you would with any standard iterable object. For this basic example of a
dataset containing only non-zero values, the loop runs 6 times and stops as
expected.

In [None]:
#first check file and datasets are readable
utils.check_file_readable(test_file,[key_path,data_path])

# using an instance of DataSource in a for loop
with h5py.File(test_file, "r", libver = "latest", swmr = True) as f, h5py.File(result_file, "w",libver = "latest") as oh:
    # the key and data datasets must be the ones created above
    keys = [f[key_path]]
    datasets = {data_path : f[data_path]}
    ds = DataSource(keys,datasets, timeout = 1)
    sum_dataset = None
    for dm in ds:
        s = dm[data_path].sum()
        
        if sum_dataset is None:
            sum_dataset = ds.create_dataset(s,oh,"result")
        else:
            ds.append_data(s,dm.slice_metadata,sum_dataset)

        sum_dataset.flush()
        print("Current result :" + str(sum_dataset[...]))
    
    print("Result dataset has shape: " + str(oh["/result"].shape))


This shows the DataSource iterating over the six \[1,1,5,10\] frames, tracking which slice of the 2x3 block each image is taken from, and writing the sum of each image into a new file.
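
To illustrate how a frame's position in the 2x3 block relates to a slice of the full (2,3,5,10) dataset, the sketch below maps a flat frame index to a tuple of slices. This is an assumption-laden illustration, not swmr_tools code: the helper `frame_slices` is hypothetical, and it assumes frames are ordered row-major over the scan dimensions.

```python
def frame_slices(index, scan_shape, frame_rank):
    """Map a flat frame index to a tuple of slices selecting one frame.

    scan_shape is the shape of the scan grid, e.g. (2, 3), and
    frame_rank is the number of trailing image dimensions, e.g. 2.
    """
    positions = []
    remainder = index
    # Peel off the fastest-varying (last) scan dimension first.
    for dim in reversed(scan_shape):
        remainder, pos = divmod(remainder, dim)
        positions.append(pos)
    positions.reverse()

    # Length-1 slices keep the scan dimensions; full slices take the image.
    scan_part = tuple(slice(p, p + 1) for p in positions)
    frame_part = (slice(None),) * frame_rank
    return scan_part + frame_part


for i in range(6):
    print(i, frame_slices(i, (2, 3), 2))
```

Indexing a (2,3,5,10) array with any of these tuples yields a block of shape (1,1,5,10), matching the frames iterated over above.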