# Research Multi-dimensional Object Storage

This notebook investigates the structure and storage options for multi-dimensional objects in HDF5 before the Chapel code on the Arkouda server is configured to handle them.

In [None]:
# import required packages
import os
import arkouda as ak
import h5py
import pandas as pd
import numpy as np

In [None]:
# connect to the arkouda server; the server must already be running
ak.connect()

## Review of Current Format

To make clear how Arkouda currently stores HDF5 files, we parse an object that Arkouda saved in its present format.

In [None]:
# load file saved by arkouda using h5py
f = h5py.File('/Users/ethandebandi/Documents/save_load_testing/ak_test_LOCALE0000.hdf5')

# List the keys
k = list(f.keys())
print(f"Keys: {k}")

# What is in _arkouda_metadata
meta = f['_arkouda_metadata']
mk = list(meta.keys())
print(f"METADATA: {meta}, keys: {mk}\n\tCurrently nothing in file, need to research what would be added.")

# access fields
for key in k:
    if key == "_arkouda_metadata":
        continue
    d = f[key][:]
    print(f"{key}: {d}\t Dataset: {f[key]}")

# The dataset information printed above (shape and type) can probably be leveraged to determine shape
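
The idea in the last comment can be sketched with h5py alone: a dataset exposes its shape, rank, and dtype without any element data being read. This is a minimal sketch on a throwaway file; the file name and the dataset name `values` are hypothetical and are not taken from the Arkouda file above.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small stand-in file; the dataset name "values" is hypothetical.
path = os.path.join(tempfile.mkdtemp(), "shape_demo.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("values", data=np.arange(27, dtype=np.int64).reshape(3, 3, 3))

# shape, ndim, and dtype are dataset properties; no slicing (dset[:]) is needed.
with h5py.File(path, "r") as f:
    dset = f["values"]
    info = {"shape": dset.shape, "rank": dset.ndim, "dtype": str(dset.dtype)}

print(info)
```

If the server stores multi-dimensional data as a native N-dimensional dataset, these properties may be enough on their own; a flat layout would need the shape recorded separately, for example as an attribute.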

## Object Configuration

The Arkouda `ArrayView` module provides a multi-dimensional representation of `pdarray` objects. We will use it as the basis for storing multi-dimensional objects.

In [None]:
a = ak.arange(27)
r = a.reshape((3, 3, 3))
r
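
To make the relationship between the flat array and its reshaped view concrete, the sketch below computes the flat offset of a multi-dimensional index. It uses NumPy as a stand-in for the Arkouda objects and assumes row-major (C) ordering; `flat_index` is a hypothetical helper, not part of Arkouda.

```python
import numpy as np

shape = (3, 3, 3)
flat = np.arange(27)  # stand-in for the flat pdarray behind the view


def flat_index(idx, shape):
    """Return the row-major (C-order) offset of a multi-dimensional index."""
    offset = 0
    for i, n in zip(idx, shape):
        offset = offset * n + i
    return offset


view = flat.reshape(shape)  # NumPy's default reshape is also row-major
offset = flat_index((1, 2, 0), shape)
print(offset, view[1, 2, 0] == flat[offset])
```

Under this assumption, the shape is the only extra piece of information needed to recover the multi-dimensional view from flat data.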

## Write Multi-Dim Object to HDF5

Take the `ArrayView` object and write it to an HDF5 file.

In [None]:
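def write_data(dset, shape, dim, data, index=()):
    """Recursively copy `data` into `dset`, one innermost row at a time."""
    if dim == len(shape) - 1:
        # Innermost dimension: `index` selects a full row, so copy it in one assignment.
        dset[index] = data[index]
    else:
        # Fix the current dimension to each value in turn and recurse into the next one.
        for i in range(shape[dim]):
            write_data(dset, shape, dim + 1, data, index + (i,))
    return dset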


In [None]:
fpath = '/Users/ethandebandi/Documents/save_load_testing/multi_dim.hdf5'
#os.remove(fpath)

s = r.shape.to_ndarray()
f = h5py.File(fpath, 'w')
dset = f.create_dataset('obj', shape=tuple(s), dtype='i', data=r)  # Note we will need to set the dtype based on
dset.attrs.create('Shape', s)
dset.attrs.create('Rank', s.size)
f.close()

# reopen the file read-only and check that the dataset and its attributes were written
with h5py.File(fpath, 'r') as f:
    dset = f['obj']
    print(dset[:])
    print(dict(dset.attrs.items()))
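
The cell above hard-codes `dtype='i'`, which NumPy and h5py interpret as a C `int` (typically 32 bits). One possible alternative, sketched here with NumPy data standing in for the Arkouda object, is to omit `dtype` and let h5py infer it from the data. The file and dataset names are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

data = np.arange(27, dtype=np.int64).reshape(3, 3, 3)
path = os.path.join(tempfile.mkdtemp(), "dtype_demo.hdf5")

with h5py.File(path, "w") as f:
    # Explicit dtype: values are converted to the requested on-disk type.
    fixed = f.create_dataset("fixed", shape=data.shape, dtype="i", data=data)
    # No dtype: h5py takes the on-disk type from the array itself.
    inferred = f.create_dataset("inferred", data=data)
    fixed_dtype = str(fixed.dtype)
    inferred_dtype = str(inferred.dtype)

print(f"fixed: {fixed_dtype}, inferred: {inferred_dtype}")
```

Inferring the type avoids silently narrowing 64-bit values, at the cost of leaving the on-disk type implicit.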

## Reading Using Chapel
Ensure that the Chapel code on the Arkouda server can read the data from the file created above.

In [None]:
import arkouda as ak
ak.connect()
fpath = '/Users/ethandebandi/Documents/save_load_testing/multi_dim.hdf5'
ak.read_hdf5_multi_dim(fpath, 'obj')

## Writing Using Chapel
Ensure that the Chapel code on the Arkouda server can write multi-dimensional data to a file.

In [5]:
import arkouda as ak
ak.connect()
fpath = '/Users/ethandebandi/Documents/save_load_testing/multi_dim_save.hdf5'
arr = ak.ArrayView(ak.arange(27), ak.array([3, 3, 3]))
ak.write_hdf5_multi_dim(arr, fpath, 'WithType', mode="append", storage="flat")



connected to arkouda server tcp://*:5555


### Read the Resulting File
Read the file back with h5py to verify that the write succeeded.

In [6]:
import h5py
fpath = '/Users/ethandebandi/Documents/save_load_testing/multi_dim_save.hdf5'
with h5py.File(fpath, 'r') as f:
    k = list(f.keys())
    print(f"File Keys:\n\t{k}")

    # make sure the dataset is readable
    dset = f['WithType']
    # print(dset[:])
    for i, v in dset.attrs.items():
        print(i,v)

File Keys:
	['WithType']
Format 0
ObjType b'ArrayView'
Rank 3
Shape [3 3 3]
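
The `Shape` and `Rank` attributes shown above suggest how a reader could rebuild the multi-dimensional object from flat data. The sketch below writes a stand-in file by hand and restores the array from it. It assumes that `storage="flat"` means a one-dimensional dataset in row-major order, which this notebook does not confirm; only the attribute names are taken from the output above.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "flat_demo.hdf5")
shape = np.array([3, 3, 3])

# Write a 1-D dataset plus shape metadata (assumed layout, written by hand here).
with h5py.File(path, "w") as f:
    dset = f.create_dataset("WithType", data=np.arange(27))
    dset.attrs["Shape"] = shape
    dset.attrs["Rank"] = shape.size

# Read the flat values and reshape them using the stored Shape attribute.
with h5py.File(path, "r") as f:
    dset = f["WithType"]
    rank = int(dset.attrs["Rank"])
    restored = dset[:].reshape(tuple(dset.attrs["Shape"]))

print(rank, restored.shape)
```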


In [7]:
ak.shutdown()