# Exploring `.bag` Bathymetry Data Files

An exploration of data and metadata in Bathymetric Attributed Grid (BAG) files.

References:

* BAG website: https://marinemetadata.org/references/bag
* Format Specification Document: http://www.opennavsurf.org/papers/ons_fsd.pdf
* A slightly dated, Python 2 based video lesson on accessing BAG files: https://www.youtube.com/watch?v=dEtC6bRcjvc

Working environment for this notebook:

* Python 3.5
* `conda` packages:

    * `h5py` - Python interface to HDF5 format used by BAG
    * `lxml` - XML parser and manipulation library to access BAG metadata
    * `numpy` - for n-dimensional arrays
    * `matplotlib` - for plotting
    * `notebook` - Jupyter notebook
    
"Keep Calm and Conda Install"

If you are reading this in the Salish Sea Tools docs at
http://salishsea-meopar-tools.readthedocs.io/en/latest/bathymetry/exploring_bag_files.html,
the source notebook that generated the page is in the Salish Sea project
[tools repo](https://bitbucket.org/salishsea/tools)
at `tools/bathymetry/ExploringBagFiles.ipynb`.
If you would rather not clone the repo,
you can download the notebook by itself from
http://nbviewer.jupyter.org/urls/bitbucket.org/salishsea/tools/raw/tip/bathymetry/ExploringBagFiles.ipynb.

In [1]:
import h5py
import matplotlib.pyplot as plt
import numpy as np

In [2]:
%matplotlib inline

Load the BAG dataset and explore some of its basic attributes:

In [3]:
# Open read-only; this notebook only inspects the file
bag = h5py.File('/ocean/sallen/allen/research/MEOPAR/chs_bathy/092B.bag', 'r')

In [5]:
print(type(bag))
print(bag.name)
print(bag.filename)

<class 'h5py._hl.files.File'>
/
/ocean/sallen/allen/research/MEOPAR/chs_bathy/092B.bag


In [7]:
for item in bag.items():
    print(item)
    
for value in bag.values():
    print(value)

('BAG_root', <HDF5 group "/BAG_root" (4 members)>)
<HDF5 group "/BAG_root" (4 members)>


In [8]:
list(bag['BAG_root'].items())

[('elevation', <HDF5 dataset "elevation": shape (337, 448), type "<f4">),
 ('metadata', <HDF5 dataset "metadata": shape (9730,), type "|S1">),
 ('tracking_list', <HDF5 dataset "tracking_list": shape (0,), type "|V20">),
 ('uncertainty', <HDF5 dataset "uncertainty": shape (337, 448), type "<f4">)]

The list above contains the 4 datasets
(`elevation`, `metadata`, `tracking_list`, and `uncertainty`)
that the BAG specification tells us should be in the file.

Note that under Python 3 the `h5py` library makes heavy use of view objects,
like those returned by `dict.items()`,
which are iterables rather than lists.
The transformation to a `list` object above,
or the use of a `for` loop,
is what collects the items from the view.
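
As a minimal sketch of that behaviour, the cell below builds a small in-memory HDF5 file instead of using a real BAG file. The file name `demo.bag`, the 3 x 4 array shapes, and the variable names are made up for illustration; only the `BAG_root` group name and the two dataset names mirror the listing above.

```python
import h5py
import numpy as np

# Throwaway HDF5 file held in memory (hypothetical name, nothing written to disk)
with h5py.File('demo.bag', 'w', driver='core', backing_store=False) as demo:
    root = demo.create_group('BAG_root')
    root.create_dataset('elevation', data=np.zeros((3, 4), dtype='<f4'))
    root.create_dataset('uncertainty', data=np.zeros((3, 4), dtype='<f4'))

    # items() returns a view object, not a list
    items_view = root.items()
    is_list = isinstance(items_view, list)
    print(type(items_view).__name__, is_list)

    # Iterating the view (or passing it to list()) yields (name, dataset) pairs
    names = [name for name, _ in items_view]
    shapes = {name: dataset.shape for name, dataset in root.items()}

print(names)
print(shapes)
```

The `driver='core', backing_store=False` arguments keep the demo file in memory, so nothing is left behind on disk.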

One odd thing to note is that the metadata is stored as an array of 1-character strings
(type `|S1` in the listing above),
each of which turns out to be a single byte in Python 3.
We're going to have to do something about that...
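
One possible way to deal with single-byte elements, sketched here on a stand-in array instead of the real `metadata` dataset, is to join the bytes back together and decode them. The XML snippet, the UTF-8 encoding, and the variable names are assumptions made for illustration; the actual BAG metadata may need different handling.

```python
import numpy as np

# Stand-in for the XML document held in the metadata dataset (made-up content)
xml_text = '<?xml version="1.0"?><metadata><title>demo</title></metadata>'

# Mimic the on-disk layout: one byte per array element, dtype "|S1"
raw = np.frombuffer(xml_text.encode('utf-8'), dtype='S1')
print(raw.shape, raw.dtype)

# Join the single bytes into one bytes object, then decode to a str
xml_bytes = raw.tobytes()
decoded = xml_bytes.decode('utf-8')
print(decoded)
```

Using `tobytes()` copies the array's underlying buffer in one step, which avoids looping over the elements in Python.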

Peeling away the HDF5 group layer:

In [9]:
root = bag['BAG_root']
print(root.name)
print(root.parent)
list(root.items())

/BAG_root
<HDF5 group "/" (1 members)>


[('elevation', <HDF5 dataset "elevation": shape (337, 448), type "<f4">),
 ('metadata', <HDF5 dataset "metadata": shape (9730,), type "|S1">),
 ('tracking_list', <HDF5 dataset "tracking_list": shape (0,), type "|V20">),
 ('uncertainty', <HDF5 dataset "uncertainty": shape (337, 448), type "<f4">)]