# Highly Scalable Data Service (HSDS)

## Installation

For this simple local setup, installation takes three steps:

1. Create a directory for the HSDS data files (`~/hsds_data`).
2. Use the user name `vscode` and password `vscode` to authenticate to HSDS. (The commands below pass `$USER`, which is `vscode` in this environment.)
3. Launch the service.

For more sophisticated setups (e.g., Kubernetes), please refer to the [HSDS documentation](https://github.com/HDFGroup/hsds/tree/master/docs).

In [None]:
%%bash
export HS_ENDPOINT=http://localhost:5101
export HS_USERNAME=$USER
export HS_PASSWORD=$USER
mkdir -p ~/hsds_data
hsds --root_dir ~/hsds_data --hs_username $USER --hs_password $USER >~/hs.log 2>&1 &
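
Note that `export` in a `%%bash` cell only affects that cell's shell and the `hsds` process started from it; later cells do not see `HS_ENDPOINT`, `HS_USERNAME`, or `HS_PASSWORD`. As a minimal sketch, assuming you want to pass the same settings explicitly from Python, the helper below maps those variables to the `endpoint`, `username`, and `password` keyword arguments of `h5pyd.File`. The name `hsds_connection_kwargs` is hypothetical and not part of `h5pyd`.

```python
import os

# Hypothetical helper: maps HS_* environment variables to h5pyd.File keyword arguments.
_ENV_TO_KWARG = {
    "HS_ENDPOINT": "endpoint",
    "HS_USERNAME": "username",
    "HS_PASSWORD": "password",
}


def hsds_connection_kwargs(env=None):
    """Return the connection settings found in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    # Skip variables that are unset or empty.
    return {kwarg: env[var] for var, kwarg in _ENV_TO_KWARG.items() if env.get(var)}


# Example with the values used in this notebook, passed as a plain dict.
settings = hsds_connection_kwargs({
    "HS_ENDPOINT": "http://localhost:5101",
    "HS_USERNAME": "vscode",
    "HS_PASSWORD": "vscode",
})
print(settings)

# With the server running, the settings could be used like this:
#   import h5pyd
#   f = h5pyd.File("/home/vscode/foo.h5", "r", **settings)
```

The configuration file created in the next step makes this unnecessary for the rest of the notebook.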

Next, we create the HSDS client configuration file, `~/.hscfg`:

In [None]:
%%bash
hsconfigure <<< $'http://localhost:5101\nvscode\nvscode\n\nY\n'

Let's check that the server is running and the configuration is correct:

In [None]:
%%bash
hsinfo
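
The same check can be sketched from Python. This example assumes that the service answers `GET /about` with a JSON document containing a `state` field, which is the information `hsinfo` reports; if your HSDS version behaves differently, treat it as illustrative only. The function name `fetch_about` is hypothetical.

```python
import json
import urllib.request


def fetch_about(endpoint, timeout=5.0):
    """Return the decoded JSON from GET <endpoint>/about, or None if the request fails."""
    url = endpoint.rstrip("/") + "/about"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (OSError, ValueError):
        # OSError covers connection errors, HTTP errors, and timeouts;
        # ValueError covers a response body that is not valid JSON.
        return None


info = fetch_about("http://localhost:5101")
if info is None:
    print("HSDS is not reachable at http://localhost:5101")
else:
    # The "state" key is an assumption about the response format.
    print("server state:", info.get("state"))
```

A `None` result right after launching usually just means the service is still starting, so retrying after a few seconds is reasonable.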

Create the top-level domain and a user "directory" for the user `vscode`:

In [None]:
%%bash
hstouch /home/ && hstouch /home/$USER/
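
The order of the two `hstouch` calls suggests that a folder can only be created once its parent exists. Under that assumption, the sketch below lists the folders that have to be in place before a given domain can be created. It is plain string handling and does not contact the server; `parent_folders` is a hypothetical name.

```python
def parent_folders(domain):
    """List the folders above `domain`, outermost first, each with a trailing slash."""
    if not domain.startswith("/"):
        raise ValueError("expected an absolute path such as /home/vscode/foo.h5")
    parts = [part for part in domain.split("/") if part]
    # Every proper prefix of the path is a folder; the last component is the domain itself.
    return ["/" + "/".join(parts[:depth]) + "/" for depth in range(1, len(parts))]


folders = parent_folders("/home/vscode/foo.h5")
print(" && ".join(f"hstouch {folder}" for folder in folders))
```

For `/home/vscode/foo.h5` this reproduces the command used above, with `vscode` in place of `$USER`.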

In [None]:
%%bash
hsinfo

## Kicking the tires

Let's create a simple HDF5 domain with a single dataset:

In [None]:
import h5pyd

f = h5pyd.File("/home/vscode/foo.h5", "w")
dset = f.create_dataset("dset", data=[1,2,3,4])
print(f.id.id)  # identifier of the domain's root group, assigned by HSDS
f.close()

Simple command line tools are available to interact with the service:

In [None]:
%%bash
hsls /home/vscode/foo.h5

In [None]:
%%bash
hsstat /home/vscode/foo.h5

In [None]:
%%bash
find ~/hsds_data
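
The `find` output shows that HSDS keeps a domain as a set of objects under the root directory rather than as a single `.h5` file. When the listing gets long, a per-directory summary can be easier to read. The sketch below only walks the file system and makes no assumption about the HSDS storage layout; `summarize_tree` is a hypothetical helper.

```python
import os


def summarize_tree(root):
    """Return {top-level entry: (file_count, total_bytes)} for everything under `root`."""
    root = os.path.expanduser(root)
    summary = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Group everything by the first path component below the root.
        top = "." if rel == "." else rel.split(os.sep)[0]
        count, size = summary.get(top, (0, 0))
        for name in filenames:
            count += 1
            size += os.path.getsize(os.path.join(dirpath, name))
        summary[top] = (count, size)
    return summary


# Prints nothing if the directory does not exist.
for entry, (count, size) in sorted(summarize_tree("~/hsds_data").items()):
    print(f"{entry}: {count} file(s), {size} bytes")
```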

## Is HSDS really HDF5?

We can reuse the Python version of our HDF5 example with only two trivial changes:

1. We use the `h5pyd` package instead of `h5py`. (See line 2.)
2. The file name is now a domain name (i.e., `/home/vscode/ou_h5pyd.h5`). (See line 30.)

In [None]:
import numpy as np
import h5pyd as h5py

def ou_sampler(path_count, step_count, dt, theta, mu, sigma):
    '''
    Generates sample paths for an Ornstein-Uhlenbeck process.
    '''
    ou_process = np.zeros((path_count, step_count))
    for i in range(path_count):
        for j in range(1, step_count):
            dW = np.random.normal(0, np.sqrt(dt))
            ou_process[i, j] = ou_process[i, j-1] + theta * (mu - ou_process[i, j-1]) * dt + sigma * dW
    return ou_process

def main():
    # Parameters
    path_count = 100
    step_count = 1000
    dt = 0.01
    theta = 1.0
    mu = 0.0
    sigma = 0.1

    print("Running with parameters:", "paths=", path_count, "steps=", step_count, "dt=", dt, "theta=", theta, "mu=", mu, "sigma=", sigma)

    # Generate OU process sample paths
    ou_process = ou_sampler(path_count, step_count, dt, theta, mu, sigma)

    # Write sample paths to an HDF5 file
    with h5py.File('/home/vscode/ou_h5pyd.h5', 'w') as file:
        file.attrs['source'] = 'https://github.com/HDFGroup/hdf5-tutorial'

        # Create & write the dataset
        dataset = file.create_dataset('dataset', data=ou_process)

        # Add documentation to the dataset
        file['dataset'].attrs['comment'] = 'This dataset contains sample paths of an Ornstein-Uhlenbeck process.'
        file['dataset'].attrs['Wikipedia'] = 'https://en.wikipedia.org/wiki/Ornstein–Uhlenbeck_process'
        file['dataset'].attrs['rows'] = 'path'
        file['dataset'].attrs['columns'] = 'time'
        
        # Set attributes
        file['dataset'].attrs['dt'] = dt
        file['dataset'].attrs['θ'] = theta
        file['dataset'].attrs['μ'] = mu
        file['dataset'].attrs['σ'] = sigma

if __name__ == "__main__":
    main()
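
For reference, the inner loop of `ou_sampler` is an Euler-Maruyama discretization of the Ornstein-Uhlenbeck stochastic differential equation. Writing $X_j$ for `ou_process[i, j]` and $\Delta t$ for `dt` (notation introduced here, not part of the code):

$$
dX_t = \theta\,(\mu - X_t)\,dt + \sigma\,dW_t
$$

$$
X_j = X_{j-1} + \theta\,(\mu - X_{j-1})\,\Delta t + \sigma\,\Delta W_j, \qquad \Delta W_j \sim \mathcal{N}(0, \Delta t), \qquad X_0 = 0
$$

The call `np.random.normal(0, np.sqrt(dt))` takes a standard deviation, so each increment $\Delta W_j$ has variance $\Delta t$.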

In [None]:
%%bash
hsls --showattrs -r /home/vscode/ou_h5pyd.h5

Visualizing the data works exactly as before, with the same two trivial changes:

In [None]:
%matplotlib inline
import h5pyd as h5py
import numpy as np

f = h5py.File("/home/vscode/ou_h5pyd.h5")
dset = f["dataset"]

In [None]:
arr = dset[42,:]
print(f"min: {arr.min():.2f}, max: {arr.max():.2f}, mean: {arr.mean():.2f}")

In [None]:
import matplotlib.pyplot as plt

plt.style.use('_mpl-gallery')
fig, ax = plt.subplots()
ax.plot(np.arange(0,len(arr)), arr, linewidth=2.0)
plt.show()

In [None]:
f.close()

## HSDS and HDF5 files

HDF5 files can be imported into HSDS with `hsload` and exported again with `hsget`. We start with the import:

In [None]:
%%bash
hsload ou_process.h5 /home/vscode/ou_process.h5

In [None]:
%%bash
hsstat /home/vscode/ou_process.h5

In [None]:
%%bash
hsls --showattrs -r /home/vscode/ou_process.h5

Export an HSDS domain to an HDF5 file and test for equality:

In [None]:
%%bash
hsget /home/vscode/ou_process.h5 ou_process_copy.h5
h5diff ou_process.h5 ou_process_copy.h5

Voilà! If `h5diff` prints nothing, the two files are identical.

## Summary

HSDS is a highly scalable data service for HDF5 data. As the local setup above shows, it is easy to install and use, and it is well suited to sharing HDF5 data with others.

HSDS implements the HDF5 data model and, as we have seen, is compatible with the HDF5 API.

HSDS complements the HDF5 library and file format rather than replacing them. It is a great addition to the HDF5 ecosystem and the better option for many cloud use cases and, more generally, wherever a service-oriented interface is preferred.