# Using Python functions and community software to visualize `HDF5` <br> and `NeXus` files in the notebook

This notebook is based on the following material:

H5Web project [https://github.com/silx-kit/h5web](https://github.com/silx-kit/h5web);

[www.christopherlovell.co.uk](https://www.christopherlovell.co.uk);<br>



[https://docs.h5py.org](https://docs.h5py.org) <br>
[pythonforthelab.com/blog/how-to-use-hdf5-files-in-python](https://www.pythonforthelab.com/blog/how-to-use-hdf5-files-in-python)
    
    

# Part 1: Creating and editing HDF5 files

<a name="Index"></a>

# Content
* [Create and display a simple HDF5 file](#Create)
* [Add content in hierarchical structure](#Hierarchical)
* [Create links: hard, soft, external](#link)
* [Insert compression](#compression)
* [Save as ASCII](#ascii)
* [Plotting routines](#plot)

<a name="Create"></a>

## 1. Create and display a simple HDF5 file

In [1]:
import numpy as np
import h5py, os, json

# Remove local copies left over from previous runs
filename = "simple.h5"
filename1 = "simple1.h5"
if os.path.exists(filename):
    os.remove(filename)
if os.path.exists(filename1):
    os.remove(filename1)

# Create the file (it stays empty: nothing is written inside the block)
with h5py.File(filename, "w") as h5file:
    X = np.arange(-5, 5, 0.25)
    Y = np.arange(-5, 5, 0.25)
    Xg, Yg = np.meshgrid(X, Y)   # 40 x 40 coordinate grids

In [2]:
# New elements with respect to the original notebook: some visualizations
# and alternative ways of creating HDF5 files

In [3]:
print(Xg.shape,Yg.shape)

(40, 40) (40, 40)


*Definition of new arrays of data*

In [4]:
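# d1: ten 2-D radial sine patterns, frequencies 0.1, 0.2, ..., 1.0 -> shape (10, 40, 40)
d1 = [np.sin(2*np.pi*f*np.sqrt(Xg**2 + Yg**2)) for f in np.arange(0.1, 1.1, 0.1)]
# d2: a single 2-D pattern -> shape (40, 40)
d2 = np.sin(np.sqrt(Xg**2 + Yg**2))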

In [5]:
print(h5file)

<Closed HDF5 file>


The `with` block closed the file when it ended. There are several ways to add content to an HDF5 file; one is the `create_dataset` method:

In [6]:
# Opened without `with`: the file stays open until hf2.close() is called
hf2 = h5py.File(filename1, 'w')

In [7]:
hf2.create_dataset('dataset_1',data=d1)
hf2.create_dataset('dataset_2',data=d2)

<HDF5 dataset "dataset_2": shape (40, 40), type "<f8">

Let's reopen the file read-only and select array elements with a boolean mask:

In [8]:
with h5py.File(filename1,'r') as f_t_u:
    t1=f_t_u['dataset_1']
    t2=f_t_u['dataset_2']
    print(t1[t1[:]>0.001])
    data = t1[t1[()]>0.001]
    print(data)

[0.01180287 0.0554391  0.09152959 ... 0.32010572 0.96838155 0.71253099]
[0.01180287 0.0554391  0.09152959 ... 0.32010572 0.96838155 0.71253099]


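Slicing a dataset (`t1[:]`, `t1[()]` or a boolean mask such as `t1[t1[()] > 0.001]`) reads the selected elements from disk into memory as a NumPy array, which can then be manipulated further; the dataset stored in the file is not modified.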
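A minimal sketch of this behaviour, assuming a scratch file `copy_demo.h5` (a hypothetical name used only here): changing the in-memory copy leaves the values stored on disk untouched.

```python
import numpy as np
import h5py

# Hypothetical scratch file used only for this illustration
with h5py.File("copy_demo.h5", "w") as f:
    f["values"] = np.arange(5.0)

with h5py.File("copy_demo.h5", "r") as f:
    in_memory = f["values"][()]   # reads the whole dataset into a NumPy array
    in_memory[0] = 100.0          # modifies only the in-memory copy
    on_disk = f["values"][0]      # reads element 0 again from the file

print(in_memory[0], on_disk)      # prints: 100.0 0.0
```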

In [9]:
print(hf2.keys())

<KeysViewHDF5 ['dataset_1', 'dataset_2']>


## Alternative method: direct assignment with dictionary-style keys

In [10]:
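# Start again from an empty file and assign datasets like dictionary entries
os.remove(filename1)
with h5py.File(filename1,'w') as h5filen:
    h5filen['one'] = d1
    h5filen['two'] = d2
    # 1-D arrays and scalars can be assigned the same way:
    #h5filen['oneD'] = X
    #h5filen['scalar'] = 42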

In [11]:
print(h5filen.items())

ItemsViewHDF5(<Closed HDF5 file>)


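The file `h5filen` was closed when its `with` block ended, so its items can no longer be listed. The file `hf2`, opened without `with`, is still open: its datasets can be updated, and its elements can be retrieved in several ways, e.g. with the `get` method: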

In [12]:
n1=hf2.get('dataset_1')

The returned object can be converted to a NumPy array:

In [13]:
n1=np.array(n1)
n1.shape

(10, 40, 40)

*Alternatively*, with dictionary-style indexing:

In [14]:
n2=hf2['dataset_1']
np.array(n2).shape

(10, 40, 40)

Finally, `keys()` lists the names of the objects in a file. The file `simple.h5` created in the first cell is still empty:

In [15]:
hf1 = h5py.File(filename, 'r')   # open explicitly in read-only mode

In [16]:
hf1.keys()

<KeysViewHDF5 []>
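Updating a dataset, as mentioned above, can be sketched in two ways (the scratch file `update_demo.h5` is hypothetical): values can be overwritten in place through a slice, while a name that is already taken must be removed with `del` before a new object is assigned to it.

```python
import numpy as np
import h5py

# Hypothetical scratch file used only for this illustration
with h5py.File("update_demo.h5", "w") as f:
    dset = f.create_dataset("one", data=np.zeros(3))

    # 1) In-place update: same shape, the stored values are overwritten
    dset[...] = [1.0, 2.0, 3.0]
    after_update = dset[()]

    # 2) New shape: assigning to an existing name is an error,
    #    so remove the old link first, then assign the new array
    del f["one"]
    f["one"] = np.ones((2, 2))
    shape_after = f["one"].shape

print(after_update, shape_after)
```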

<a name="Hierarchical"></a> 

[Go back to index](#Index)

## 2. Introducing the hierarchical structure

*Using `create_group`*

Groups are containers that organize the data in an HDF5 file. Like Python dictionaries, they are characterized by keys (the member names) and values (the members: datasets or other groups).

In [17]:
# Note: h5file['oneD']['data'] = X cannot work: h5file is already closed,
# and a dataset has no named members (only groups do)
if os.path.exists(filename1):
    os.remove(filename1)
h5file1=h5py.File(filename1,'w')
g1=h5file1.create_group('group1')
dg1=g1.create_dataset('data1',data=d1)

In [18]:
print(g1.items())    

ItemsViewHDF5(<HDF5 group "/group1" (1 members)>)


In [19]:
print(g1.name)

/group1


Note that the group name is a path beginning with a slash "/", which denotes the root group of the file.

Every group is created with a hard link, and its content can be explored with `keys()`, `values()` and `items()`, which behave like the dictionary methods of the same name. The names of the objects are always strings.

In [20]:
subgrp = g1.create_group("/group1/subgroup1")   # absolute path; g1.create_group("subgroup1") is equivalent

In [21]:
print(subgrp.name)

/group1/subgroup1


In [22]:
out=h5file1['group1']['subgroup1']
print(out)

<HDF5 group "/group1/subgroup1" (0 members)>


In [23]:
print(g1.values())

ValuesViewHDF5(<HDF5 group "/group1" (2 members)>)
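`keys()`, `values()` and `items()` only look at the direct members of a group. To walk a whole hierarchy, h5py provides `visititems`, which calls a function with the path (relative to the starting group) and the object for every member below it. A minimal sketch, with the hypothetical scratch file `walk_demo.h5`:

```python
import numpy as np
import h5py

with h5py.File("walk_demo.h5", "w") as f:   # hypothetical scratch file
    g = f.create_group("group1")
    g.create_dataset("data1", data=np.zeros((2, 2)))
    g.create_group("subgroup1")

    # Direct members of group1 only
    direct = list(g.keys())

    # Every object below the root group, at any depth
    found = []

    def visitor(name, obj):
        kind = "group" if isinstance(obj, h5py.Group) else "dataset"
        found.append((name, kind))
        # returning None lets the walk continue

    f.visititems(visitor)

print(direct)
print(found)
```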


Any number of groups can be defined, each holding any number of datasets.<br>
The `del` statement removes an object (more precisely, its link) from a group.
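A short sketch of `del`, using the hypothetical scratch file `del_demo.h5`. Deleting removes the link; HDF5 does not shrink the file automatically, and the `h5repack` command-line tool can be used to rewrite a file without the unused space.

```python
import numpy as np
import h5py

with h5py.File("del_demo.h5", "w") as f:   # hypothetical scratch file
    g = f.create_group("group1")
    g["a"] = np.arange(3)
    g["b"] = np.arange(3)

    del g["a"]                  # removes the link "a" from group1
    remaining = list(g.keys())
    a_present = "a" in g

print(remaining, a_present)
```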

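#### Metadata

Metadata can be attached to any group or dataset as attributes, through the dictionary-like `attrs` property: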

In [24]:
import time
# Close the handles that are still open before deleting their files
hf1.close()
h5file1.close()
if os.path.exists(filename):
    os.remove(filename)
os.remove(filename1)
with h5py.File(filename1,'w') as h5file1:
    g1 = h5file1.create_group('group1')
    dg1 = g1.create_dataset('data1', data=d1)
    print(g1.items())
    g1.attrs['Date'] = time.time()
    g1.attrs['User'] = 'Me'
    for k in g1.attrs.keys():
        print(k, g1.attrs[k])

    # dg1 has no attributes, so this loop prints nothing
    for j in dg1.attrs.keys():
        print(j, dg1.attrs[j])

ItemsViewHDF5(<HDF5 group "/group1" (1 members)>)
Date 1649851060.5856996
User Me


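Alternatively, a whole dictionary of metadata can be serialized to a JSON string and stored as a scalar string dataset:

In [25]:
import time
metadata = {'Date': time.time(),
            'User': 'Me',
            'OS': os.name}
if os.path.exists(filename1):
    os.remove(filename1)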
with h5py.File(filename1,'w') as h5file1:
    g1 = h5file1.create_group('group1')
    d = g1.create_dataset('data_1', data=d1)
    m = g1.create_dataset('metadata', data=json.dumps(metadata))
    print(h5file1['group1']['metadata'])
    print(g1.name)


<HDF5 dataset "metadata": shape (), type "|O">
/group1
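The JSON string can be turned back into a dictionary with `json.loads`. A minimal sketch, assuming the hypothetical scratch file `json_meta_demo.h5`; with h5py 3 a scalar string dataset is read back as `bytes`, which `json.loads` accepts directly.

```python
import json
import os
import time
import h5py

metadata = {"Date": time.time(), "User": "Me", "OS": os.name}

with h5py.File("json_meta_demo.h5", "w") as f:   # hypothetical scratch file
    g = f.create_group("group1")
    g.create_dataset("metadata", data=json.dumps(metadata))

    raw = g["metadata"][()]      # scalar string dataset (bytes in h5py 3)
    restored = json.loads(raw)   # json.loads accepts str or bytes

print(restored["User"], restored["OS"])
```

Compared with attributes, this keeps all metadata in one dataset, at the cost of having to parse it before use.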


In [2]:
if os.path.exists(filename1):
    os.remove(filename1)
with h5py.File(filename1,'w') as hf2:
    g1=hf2.create_group('group1')
    dg1=g1.create_dataset('data1',data=d1)
    print(hf2['group1'].keys())
    subgroup = g1.create_group("/group1/subgroup1")
    subgroup.attrs['Date'] = time.time()
    subgroup.attrs['User'] = 'Me'
    print(hf2.keys())



[Go back to index](#Index)

<a name="link"></a>

## 3. Create links

**Links to datasets** and **links to groups** work the same way. There are three kinds of links: **hard links**, **soft links** and **external links**.

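a. Hard links

Normally, each name in an HDF5 file is a hard link to an object, and assigning an existing object to a new name creates a second hard link to the same data. `hf2` was closed at the end of its `with` block above, so the file is reopened here in append mode (`'a'`):

In [None]:
with h5py.File(filename1, 'a') as hf2:
    hf2["data2"] = 42          # scalar dataset
    out = hf2['data2']
    print(out)

    hf2['data3'] = out         # second hard link to the same dataset
    print(hf2['data3'])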

b. Soft link
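A soft link stores only a path inside the file, which is resolved each time the link is accessed. A minimal sketch with `h5py.SoftLink`, assuming the hypothetical scratch file `softlink_demo.h5`:

```python
import numpy as np
import h5py

with h5py.File("softlink_demo.h5", "w") as f:   # hypothetical scratch file
    g = f.create_group("group1")
    g["data1"] = np.arange(4)

    # The soft link stores only the path "/group1/data1"
    f["soft"] = h5py.SoftLink("/group1/data1")

    via_soft = f["soft"][()]               # resolved on access
    link = f.get("soft", getlink=True)     # the link object itself
    link_path = link.path
    is_soft = isinstance(link, h5py.SoftLink)

print(via_soft, link_path)
```

Unlike a hard link, a soft link does not keep its target alive: if `/group1/data1` is deleted, `soft` becomes a dangling link.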

c. External link
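An external link points to an object in another HDF5 file; it stores a file name and a path inside that file. A sketch with `h5py.ExternalLink`, where both file names are hypothetical:

```python
import numpy as np
import h5py

# Hypothetical file names used only for this illustration
with h5py.File("ext_target.h5", "w") as target:
    target["data"] = np.arange(3)

with h5py.File("ext_source.h5", "w") as source:
    # The link stores the target file name and the path inside it
    source["remote"] = h5py.ExternalLink("ext_target.h5", "/data")

    link = source.get("remote", getlink=True)
    link_file, link_path = link.filename, link.path
    values = source["remote"][()]   # the target file is opened on access

print(link_file, link_path, values)
```

If the target file is moved or renamed, the link can no longer be resolved.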

[Go back to index](#Index)

<a name="compression"></a>

## 4. Specify the data type and insert compression

In [None]:
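import h5py, os
filenamet = "testdataset_sizes"
if os.path.exists(filenamet):
    os.remove(filenamet)
with h5py.File(filenamet,'w') as file_t:
    dset_int1 = file_t.create_dataset('integers', (10,), dtype='i1')      # 1-byte signed integers
    dset_int8 = file_t.create_dataset('integers8', (10,), dtype='i8')     # 8-byte signed integers
    dset_complex = file_t.create_dataset('complex', (10,), dtype='c16')   # 16-byte complex numbers
    # 'i1' only holds -128..127: a value such as 1200 does not fit and,
    # depending on the NumPy/h5py versions, wraps around or raises an error
    dset_int1[0] = 120
    dset_int8[0] = 1200.1       # stored as the integer 1200 (fraction truncated)
    dset_complex[0] = 3+4j
    print(file_t.keys())
    print(file_t['integers'][:])
    print(file_t['integers8'][:])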

In [None]:
# 100000 random floats; written into integer datasets they are converted to integers
arr = np.random.randn(100000)

# Same data and gzip level 9, three different storage types
with h5py.File('integer_1_compr.hdf5', 'w') as f:
    d = f.create_dataset('dataset', (100000,), dtype='i1', compression="gzip", compression_opts=9)
    d[:] = arr

with h5py.File('integer_8_compr.hdf5', 'w') as f:
    d = f.create_dataset('dataset', (100000,), dtype='i8', compression="gzip", compression_opts=9)
    d[:] = arr

with h5py.File('float_compr.hdf5', 'w') as f:
    # 'f8' = 8-byte float ('f16', a 16-byte long double, is not available on every platform)
    d = f.create_dataset('dataset', (100000,), dtype='f8', compression="gzip", compression_opts=9)
    d[:] = arr
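To see what the choice of `dtype` and of gzip compression means in practice, a sketch like the following writes the same random data with and without compression and compares the file sizes. The file names are hypothetical and the exact sizes vary from run to run; `astype` truncates the random floats to small integers, which compress very well.

```python
import os
import numpy as np
import h5py

arr = np.random.randn(100_000)
sizes = {}

for dtype in ("i1", "i8", "f8"):
    for compression in (None, "gzip"):
        name = f"size_demo_{dtype}_{compression}.h5"   # hypothetical file names
        with h5py.File(name, "w") as f:
            f.create_dataset("dataset", data=arr.astype(dtype), compression=compression)
        sizes[(dtype, compression)] = os.path.getsize(name)

# Smallest file first
for key, size in sorted(sizes.items(), key=lambda kv: kv[1]):
    print(key, size, "bytes")
```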


<a name="ascii"></a>

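*Resizing*: a dataset created with a `maxshape` can later be enlarged with `resize()`, up to that maximum shape (`None` in `maxshape` marks an unlimited axis).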

In [None]:
with h5py.File('resize_dataset.hdf5', 'w') as f:
    d = f.create_dataset('dataset', (100, ),  maxshape=(500, ))
    d[:100] = np.random.randn(100)
    d.resize((200,))
    d[100:200] = np.random.randn(100)

with h5py.File('resize_dataset.hdf5', 'r') as f:
    dset = f['dataset']
    print(dset[99])
    print(dset[199])
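With `maxshape=(None,)` the first axis has no upper limit, which suits data that is appended block by block. A minimal sketch, assuming the hypothetical file name `append_demo.h5`:

```python
import numpy as np
import h5py

with h5py.File("append_demo.h5", "w") as f:   # hypothetical file name
    # Start empty; maxshape=(None,) makes the axis unlimited
    dset = f.create_dataset("log", shape=(0,), maxshape=(None,), dtype="f8")

    for block in range(3):
        new = np.full(10, float(block))
        start = dset.shape[0]
        dset.resize((start + new.size,))   # grow by the size of the new block
        dset[start:] = new                 # write the block at the end

    final_shape = dset.shape
    last_value = dset[-1]

print(final_shape, last_value)
```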

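*Chunks*: a chunked dataset is stored in blocks of fixed shape (chunks), each read and written as a unit. With `chunks=True`, h5py chooses the chunk shape itself.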

In [None]:
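with h5py.File('chunked_dataset.hdf5', 'w') as f:   # datasets must be created while the file is open
    dset = f.create_dataset("chunked", (1000, 1000), chunks=(100, 100))
    dset = f.create_dataset("autochunk", (1000, 1000), chunks=True)
    print(f["chunked"].chunks, f["autochunk"].chunks)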

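*Compression*: compression filters such as gzip work chunk by chunk, so a compressed dataset is always chunked.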
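A minimal sketch combining chunks and compression, assuming the hypothetical file name `chunk_compress_demo.h5`. gzip is available in every HDF5 installation, while `lzf` is a fast filter bundled with h5py (other tools may need a plugin to read it); the `shuffle` filter often improves the compression ratio of numeric data.

```python
import numpy as np
import h5py

data = np.zeros((400, 400))

with h5py.File("chunk_compress_demo.h5", "w") as f:   # hypothetical file name
    # Explicit chunk shape, gzip level 4 and the byte-shuffle filter
    d1 = f.create_dataset("explicit", data=data, chunks=(100, 100),
                          compression="gzip", compression_opts=4, shuffle=True)
    # Compression requested without a chunk shape: h5py chooses one itself
    d2 = f.create_dataset("auto", data=data, compression="lzf")

    explicit = (d1.chunks, d1.compression, d1.compression_opts, d1.shuffle)
    auto = (d2.chunks, d2.compression)

print(explicit)
print(auto)
```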

[Go back to index](#Index)

## 5. Save in ASCII format
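One possible approach, sketched here under the assumption that NumPy's text routines are acceptable: read the dataset into memory and write it with `np.savetxt`. The file names are hypothetical. `np.savetxt` only handles 1-D and 2-D arrays, so a 3-D stack such as `d1` would have to be reshaped or written one slice at a time.

```python
import numpy as np
import h5py

with h5py.File("ascii_demo.h5", "w") as f:   # hypothetical file name
    X = np.arange(-5, 5, 0.25)
    Xg, Yg = np.meshgrid(X, X)
    f["twoD"] = np.sin(np.sqrt(Xg**2 + Yg**2))
    data = f["twoD"][()]                     # read the dataset into memory

# One text row per array row, values separated by spaces
np.savetxt("twoD.txt", data, fmt="%.6e", header="dataset /twoD from ascii_demo.h5")

# The header line starts with '#', which np.loadtxt skips as a comment
reloaded = np.loadtxt("twoD.txt")
print(reloaded.shape)
```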

[Go back to index](#Index)

<a name="plot"></a>

## 6. Plotting and Visualization 

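**PaNOSC software**

The `H5Web` widget from the `jupyterlab_h5web` package displays the content of an HDF5 file interactively inside JupyterLab: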

In [1]:
from jupyterlab_h5web import H5Web
import numpy as np
import h5py, os
# Remove any local copy left over from a previous run
filenamet= "simplet.h5"
if os.path.exists(filenamet):
    os.remove(filenamet)

with h5py.File(filenamet, "w") as h5file:
    X = np.arange(-5, 5, 0.25)
    Y = np.arange(-5, 5, 0.25)
    Xg, Yg = np.meshgrid(X, Y)
    h5file['threeD'] = [np.sin(2*np.pi*f*np.sqrt(Xg**2 + Yg**2)) for f in np.arange(0.1, 1.1, 0.1)]
    h5file['twoD'] = np.sin(np.sqrt(Xg**2 + Yg**2))
    h5file['oneD'] = X
    h5file['scalar'] = 42
H5Web(filenamet)

<jupyterlab_h5web.widget.H5Web object>
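Outside JupyterLab, or for static figures, the datasets can also be read back and plotted with matplotlib. A minimal sketch, assuming matplotlib is installed and using the hypothetical file names `plot_demo.h5` and `plot_demo.png`; it draws one row of the 2-D dataset as a curve and the whole dataset as an image.

```python
import numpy as np
import h5py
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt

with h5py.File("plot_demo.h5", "w") as f:   # hypothetical file name
    X = np.arange(-5, 5, 0.25)
    Xg, Yg = np.meshgrid(X, X)
    f["oneD"] = X
    f["twoD"] = np.sin(np.sqrt(Xg**2 + Yg**2))

with h5py.File("plot_demo.h5", "r") as f:
    x = f["oneD"][()]
    image = f["twoD"][()]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.plot(x, image[len(x) // 2])             # profile along the middle row
ax1.set_xlabel("X")
ax1.set_title("twoD, middle row")
im = ax2.imshow(image, extent=(x[0], x[-1], x[0], x[-1]), origin="lower")
fig.colorbar(im, ax=ax2)                     # adds a third axes for the colorbar
ax2.set_title("twoD")
fig.savefig("plot_demo.png")
```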

[Go back to index](#Index)