# h5netcdf data and xarray
An example of accessing and cleaning data stored in HDF5-based netCDF (h5netcdf) format with xarray.

- The data here is a geocoded unwrapped interferogram from a pair of Sentinel-1 images, accessed from ASF Vertex:
  - Granule: S1-GUNW-D-R-007-tops-20220602_20220521-043043-00022E_00042N-PP-9f71-v2_0_5
  - Search link: https://search.asf.alaska.edu/#/?zoom=3.000&center=31.783,13.898&dataset=SENTINEL-1%20INTERFEROGRAM%20(BETA)&resultsLoaded=true&granule=S1-GUNW-D-R-007-tops-20220602_20220521-043043-00022E_00042N-PP-9f71-v2_0_5-amplitude


In [23]:
import os, sys
#import h5netcdf
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
import h5py

## Importing data

Let's use `xarray` to read in the S1 data as an `xarray.Dataset`

In [20]:
fpath = 'S1-GUNW-D-R-007-tops-20220602_20210625-043224-00021E_00036N-PP-dd9a-v2_0_5.nc'

In [22]:
ds1 = xr.open_dataset(fpath)
ds1

This shows us some information, but it doesn't contain all of the data that we're expecting from this file. Let's look at how we can inspect the structure and contents of the HDF5 file.

In [31]:
# Adapted from: https://www.youtube.com/watch?v=oWR7--o4no8
with h5py.File(fpath, 'r') as hdf:
    # Collect the (name, object) pairs at the root of the file
    base_items = list(hdf.items())
    for item in base_items:
        print('items in base directory: ', item)
        print('')

items in base directory:  ('crs_polygon', <HDF5 dataset "crs_polygon": shape (), type "<i4">)

items in base directory:  ('matchup', <HDF5 dataset "matchup": shape (0,), type ">f4">)

items in base directory:  ('productBoundingBox', <HDF5 dataset "productBoundingBox": shape (1, 451), type "|S1">)

items in base directory:  ('science', <HDF5 group "/science" (2 members)>)

items in base directory:  ('wkt_count', <HDF5 dataset "wkt_count": shape (1,), type ">f4">)

items in base directory:  ('wkt_length', <HDF5 dataset "wkt_length": shape (451,), type ">f4">)



In [38]:
print(base_items[0])
print(type(base_items[0]))
print(base_items[0][1])

('crs_polygon', <Closed HDF5 dataset>)
<class 'tuple'>
<Closed HDF5 dataset>


In [43]:
# How can we see the 2 members of the science group?
type(base_items[3])  # a (name, object) tuple; not displayed, only the last expression is shown
base_items[3][1]

<Closed HDF5 group>
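
The `<Closed HDF5 ...>` results above appear because `base_items` was built inside a `with` block, and the file was closed when that block ended. The handles remain but no longer point at readable data. Below is a minimal sketch of the usual workaround, which is to copy the values out while the file is still open. It writes a small stand-in file first; `demo_path` and the dataset contents are hypothetical and are not taken from the S1 product.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical stand-in file so the sketch runs without the S1 product.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_closed.h5")
with h5py.File(demo_path, "w") as hdf:
    hdf.create_dataset("crs_polygon", data=np.array([1, 2, 3], dtype="int32"))

with h5py.File(demo_path, "r") as hdf:
    handle = hdf["crs_polygon"]  # only valid while the file is open
    values = handle[()]          # copies the data into a NumPy array

print(repr(handle))  # the handle reports itself as closed
print(bool(handle))  # False once the file is closed
print(values)        # the copied array is still usable
```

Anything that needs to outlive the `with` block should be read into memory (or listed by name) before the block exits.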

In [62]:
with h5py.File(fpath, 'r') as hdf:
    # Descend into the 'science' group while the file is still open
    G1 = hdf.get('science')
    G1_items = list(G1.items())
    print('items in group1: ', G1_items)
    print(len(G1_items))
    print(G1_items[0])
    print(G1_items[0][0])
    print(G1_items[0][1])

items in group1:  [('grids', <HDF5 group "/science/grids" (3 members)>), ('radarMetaData', <HDF5 group "/science/radarMetaData" (15 members)>)]
2
('grids', <HDF5 group "/science/grids" (3 members)>)
grids
<HDF5 group "/science/grids" (3 members)>
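
Descending one group at a time works, but the full hierarchy can also be listed in a single pass with h5py's `Group.visititems`, which calls a function for every group and dataset below the starting point. The sketch below is one possible way to do that; `list_hdf5_tree` is a hypothetical helper, and the stand-in file only imitates part of the layout seen above.

```python
import os
import tempfile

import h5py
import numpy as np


def list_hdf5_tree(path):
    """Return (name, kind, shape) for every group and dataset in an HDF5 file."""
    entries = []

    def visitor(name, obj):
        # Returning None tells visititems to keep walking the tree.
        if isinstance(obj, h5py.Dataset):
            entries.append((name, "dataset", obj.shape))
        else:
            entries.append((name, "group", None))

    with h5py.File(path, "r") as hdf:
        hdf.visititems(visitor)
    return entries


# Hypothetical stand-in file; intermediate groups are created automatically.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_tree.h5")
with h5py.File(demo_path, "w") as hdf:
    hdf.create_dataset(
        "science/grids/data/amplitude", data=np.zeros((2, 3), dtype="float32")
    )
    hdf.create_group("science/radarMetaData")

tree = list_hdf5_tree(demo_path)
for name, kind, shape in tree:
    print(name, kind, shape)
```

Calling `list_hdf5_tree(fpath)` on the real product should reveal the full path of each nested group, which is the string that `xr.open_dataset` expects in its `group` argument.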


In [2]:
# Open the nested group directly; chunks='auto' returns lazy dask-backed arrays
ds1 = xr.open_dataset(fpath,
                      group='/science/grids/data', engine='h5netcdf',
                      chunks='auto')

In [3]:
ds1

The output repeats the same dask array summary for each of the eight arrays shown:

| | Array | Chunk |
|---|---|---|
| Bytes | 36.07 MiB | 36.07 MiB |
| Shape | (2378, 3976) | (2378, 3976) |
| Count | 2 Tasks | 1 Chunks |
| Type | float32 | numpy.ndarray |
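
The difference between the two `xr.open_dataset` calls comes down to the `group` argument: without it, xarray reads only the root group, so variables stored in nested groups do not appear. The sketch below reproduces that behaviour on a small stand-in file written with the `h5netcdf` engine. The file name and array values are hypothetical; only the group path mirrors the one used above.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical stand-in file with one variable stored in a nested group.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_groups.nc")
demo = xr.Dataset(
    {"amplitude": (("y", "x"), np.ones((2, 3), dtype="float32"))}
)
demo.to_netcdf(demo_path, group="/science/grids/data", engine="h5netcdf")

# Root group only: the nested variable is not visible here.
with xr.open_dataset(demo_path, engine="h5netcdf") as root:
    root_vars = list(root.data_vars)

# Pointing at the nested group exposes the variable.
with xr.open_dataset(
    demo_path, group="/science/grids/data", engine="h5netcdf"
) as grids:
    group_vars = list(grids.data_vars)
    shape = grids["amplitude"].shape

print(root_vars)
print(group_vars, shape)
```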


In [16]:
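# Count the non-NaN (valid) pixels in the amplitude array.
# .compute() loads the lazy dask array into memory as a NumPy array first.
type(ds1.amplitude.data)  # not displayed; only the last expression is shown
np.count_nonzero(~np.isnan(ds1.amplitude.data.compute()))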

6278448
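
The same count can be obtained with xarray's own `DataArray.count()`, which counts non-NaN values. The sketch below compares it with the NumPy expression on a small made-up array (the values are illustrative, not taken from the file) and also derives the valid fraction.

```python
import numpy as np
import xarray as xr

# Small made-up array standing in for ds1.amplitude.
amplitude = xr.DataArray(
    np.array([[1.0, np.nan, 2.0], [np.nan, np.nan, 3.0]], dtype="float32"),
    dims=("y", "x"),
    name="amplitude",
)

n_valid = int(amplitude.count())  # xarray: number of non-NaN values
n_valid_numpy = int(np.count_nonzero(~np.isnan(amplitude.values)))
valid_fraction = n_valid / amplitude.size

print(n_valid, n_valid_numpy, valid_fraction)
```

For the array above, 6278448 valid pixels out of 2378 × 3976 = 9454928 is roughly 66% coverage.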

## Listing groups within the h5netcdf file

This follows the example in this YouTube video: https://www.youtube.com/watch?v=oWR7--o4no8

In [49]:
with h5py.File(fpath, 'r') as hdf:
    # Root-level (name, object) pairs
    base_items = list(hdf.items())
    for item in base_items:
        print('items in base directory: ', item)
        print('')
    # Members of the 'science' group
    G1 = hdf.get('science')
    G1_items = list(G1.items())
    print('items in group1: ', G1_items)

items in base directory:  ('crs_polygon', <HDF5 dataset "crs_polygon": shape (), type "<i4">)

items in base directory:  ('matchup', <HDF5 dataset "matchup": shape (0,), type ">f4">)

items in base directory:  ('productBoundingBox', <HDF5 dataset "productBoundingBox": shape (1, 451), type "|S1">)

items in base directory:  ('science', <HDF5 group "/science" (2 members)>)

items in base directory:  ('wkt_count', <HDF5 dataset "wkt_count": shape (1,), type ">f4">)

items in base directory:  ('wkt_length', <HDF5 dataset "wkt_length": shape (451,), type ">f4">)

items in group1:  [('grids', <HDF5 group "/science/grids" (3 members)>), ('radarMetaData', <HDF5 group "/science/radarMetaData" (15 members)>)]
