# Research Multi-dimensional Object Storage

This notebook researches the structure and storage options for multi-dimensional objects in HDF5, as groundwork for configuring the Chapel code on the Arkouda server to handle them.

In [1]:
# import required packages
import os
import arkouda as ak
import h5py
import pandas as pd
import numpy as np

    _         _                   _       
   / \   _ __| | _____  _   _  __| | __ _ 
  / _ \ | '__| |/ / _ \| | | |/ _` |/ _` |
 / ___ \| |  |   < (_) | |_| | (_| | (_| |
/_/   \_\_|  |_|\_\___/ \__,_|\__,_|\__,_|
                                          

Client Version: v2022.06.06+4.g04c634a6


In [2]:
# connect to the arkouda server. Requires the server to be running.
ak.connect()

connected to arkouda server tcp://*:5555


## Review of Current Format

To make clear how Arkouda currently stores HDF5 files, we parse an object saved by Arkouda as it stands today.

In [3]:
# load file saved by arkouda using h5py
f = h5py.File('/Users/ethandebandi/Documents/save_load_testing/ak_test_LOCALE0000.hdf5', 'r')

# List the keys
k = list(f.keys())
print(f"Keys: {k}")

# What is in _arkouda_metadata
meta = f['_arkouda_metadata']
mk = list(meta.keys())
print(f"METADATA: {meta}\n\tCurrently nothing in file, need to research what would be added.")

# access fields
for key in k:
    if key == "_arkouda_metadata":
        continue
    d = f[key][:]
    print(f"{key}: {d}\t DataType: {f[key]}")

# Each dataset already reports its shape and dtype, so we can probably leverage that to determine the shape on read

Keys: ['Random_A', 'Random_B', 'Random_C', '_arkouda_metadata']
METADATA: <HDF5 group "/_arkouda_metadata" (0 members)>
	Currently nothing in file, need to research what would be added.
Random_A: [2 1 3 0 2]	 DataType: <HDF5 dataset "Random_A": shape (5,), type "<i8">
Random_B: [2 1 4 4 3]	 DataType: <HDF5 dataset "Random_B": shape (5,), type "<i8">
Random_C: [1 4 3 4 3]	 DataType: <HDF5 dataset "Random_C": shape (5,), type "<i8">


## Object Configuration

Arkouda's `ArrayView` allows for multi-dimensional representation of a `pdarray`. We will use it as the basis for storing multi-dimensional objects.

In [4]:
a = ak.arange(27)
r = a.reshape((3, 3, 3))
r

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
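
To make the relationship between the flat `pdarray` and its multi-dimensional view concrete, here is a minimal pure-Python sketch of how a multi-dimensional index maps to a flat offset. It assumes row-major (C-style) ordering, which matches the output above; `flat_index` is a hypothetical helper for illustration, not part of Arkouda.

```python
def flat_index(index, shape):
    """Return the row-major (C-order) offset of `index` in an array of `shape`."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same number of dimensions")
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension of size {n}")
        # Each step moves past `offset` complete slices of size n, then i more elements.
        offset = offset * n + i
    return offset


# For arange(27) viewed as (3, 3, 3), the value at a position equals its flat offset.
print(flat_index((1, 2, 0), (3, 3, 3)))  # 15
```

Because the values come from `ak.arange(27)`, the printed offset is the same number shown at position `[1][2][0]` in the array above.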

## Write Multi-Dim Object to HDF5

Take the `ArrayView` object and write it to an HDF5 file.

In [5]:
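def write_data(dset, shape, dim, data, index=()):
    # Recursively copy `data` into `dset`, one innermost row at a time.
    # `index` holds the indices already fixed for dimensions 0..dim-1.
    if dim == len(shape) - 1:
        # Last dimension: write the whole row at the fixed leading indices.
        dset[index + (Ellipsis,)] = data
    else:
        for i in range(shape[dim]):
            write_data(dset, shape, dim + 1, data[i], index + (i,))
    return dset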


In [17]:
fpath = '/Users/ethandebandi/Documents/save_load_testing/multi_dim.hdf5'
# remove any previous copy; skip if the file does not exist yet
if os.path.exists(fpath):
    os.remove(fpath)

s = r.shape.to_ndarray()
f = h5py.File(fpath, 'w')
# Note: dtype is hard-coded to 'i' here; it will need to be set based on the dtype of the data being written
dset = f.create_dataset('obj', shape=tuple(s), dtype='i', data=r)
dset.attrs.create('Shape', (3, 3, 3))
f.close()

# open file and see if it worked
f = h5py.File(fpath, 'r')
dset = f['obj']
print(dset[:])
print(dict(dset.attrs.items()))

[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]
{'Shape': array([3, 3, 3])}
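
The `Shape` attribute matters most if the values are ever stored as a flat, one-dimensional dataset, as the datasets in the current Arkouda file reviewed earlier are: in that case it is the only record of the original dimensions. Below is a minimal pure-Python sketch of rebuilding the nested structure from a flat sequence and a shape, assuming row-major order. `unflatten` is a hypothetical helper for illustration and is not part of Arkouda or h5py.

```python
from math import prod


def unflatten(flat, shape):
    """Rebuild nested lists of the given shape from a flat row-major sequence."""
    flat = list(flat)
    if len(flat) != prod(shape):
        raise ValueError("number of values does not match shape")
    if len(shape) <= 1:
        return flat
    # Number of values in each slice along the first dimension.
    step = prod(shape[1:])
    return [
        unflatten(flat[i * step:(i + 1) * step], shape[1:])
        for i in range(shape[0])
    ]


nested = unflatten(range(27), (3, 3, 3))
print(nested[1][2][0])  # 15
```

This is only meant to show that a flat buffer plus a stored shape is enough to recover the layout; writing the dataset with its full shape, as done above, lets HDF5 carry that information directly.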


In [18]:
ak.shutdown()