# 2 Using Command Line Tools

Objectives:
 * Learn about the command line tools that are included with the HDF5 library (and with the h5pyd package for HSDS)
 * You can run these in a notebook using the shell escape `!`, or open a terminal window and run them there
 

Using HDF5 Library Tools
------------------------

There are several command line tools that are included with the HDF5 library.
The most commonly used ones are: 

* `h5ls` - list contents of an HDF5 file
* `h5dump` - dump out the contents of an HDF5 file
* `h5diff` - compare two HDF5 files
* `h5stat` - get detailed statistics on an HDF5 file

We'll explore each of these below...

In [None]:
# To start with, let's grab an HDF5 file to work with...
# The exclamation mark tells Jupyter to run the rest of the line as a shell command.
# Alternatively, you can open the codespace terminal and run wget there.
! wget https://s3.amazonaws.com/hdfgroup/data/hdf5test/tall.h5

In [None]:
# Display the objects in a file. Use -r for recursive mode
! h5ls -r tall.h5

In [None]:
# h5dump displays not just the objects in the file but also (by default)
# the dataset data. It walks the whole file, so no recursive flag is needed.
! h5dump tall.h5

In [None]:
# h5stat will show many detailed statistics about the file
! h5stat tall.h5
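The `!` escape only displays a tool's output. If you want to work with that output in Python (for example, to search the h5stat report), you can capture it with the standard `subprocess` module instead. This is a minimal sketch; `capture` is a hypothetical helper name, not part of HDF5.

```python
import subprocess


def capture(args):
    """Run a command-line tool and return its standard output as text.

    Returns None if the executable cannot be found; raises
    subprocess.CalledProcessError if the tool exits with a non-zero status.
    """
    try:
        result = subprocess.run(args, capture_output=True, text=True, check=True)
    except FileNotFoundError:
        # The tool is not installed or not on PATH.
        return None
    return result.stdout
```

For example, `report = capture(["h5stat", "tall.h5"])` returns the same report as the cell above as a string, assuming h5stat is installed and tall.h5 is in the current directory.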

Using h5pyd Tools
------------------------

The h5pyd Python package provides a Python interface for accessing HSDS.
It's based on the h5py API, so most programs should be easy to convert from
h5py to h5pyd.
The h5pyd package also includes a set of command line tools for working with HSDS content.
There are analogs to the library tools (`hsls` rather than `h5ls`) plus some additional tools
that serve as stand-ins for common Linux command line tools (e.g. `hsrm` rather than `rm`).
There are also tools for uploading an HDF5 file to an HSDS domain (`hsload`) and
downloading an HSDS domain to an HDF5 file (`hsget`), hopefully none the worse for wear.
The tools include:

* `hsconfigure` - set up a connection to an HSDS server
* `hsinfo` - show the status of the HSDS server
* `hsload` - copy an HDF5 file to an HSDS domain
* `hsget` - copy an HSDS domain to an HDF5 file
* `hsls` - list the contents of an HSDS domain (or HSDS folders)
* `hstouch` - create a new domain or folder
* `hsrm` - remove a domain or folder
* `hsacl` - view or edit HSDS folder or domain ACLs (permission settings)
* `hsdiff` - compare an HDF5 file with an HSDS domain
* `hsstat` - get detailed statistics on an HSDS domain

Running any of these with `--help` will provide usage info.
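To collect usage information for several tools at once from Python, a helper like the one below can call each tool with `--help`. It is a sketch using only the standard library; `help_text` is our own name, and it returns `None` when a tool is not on your `PATH`.

```python
import shutil
import subprocess


def help_text(tool):
    """Return the --help output of a tool, or None if it is not on PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run([path, "--help"], capture_output=True, text=True)
    # Some tools print usage to stderr, so fall back to it if stdout is empty.
    return result.stdout or result.stderr
```

For example, `for tool in ["hsload", "hsget", "hsls"]: print(help_text(tool))` prints the usage of each tool that is installed.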

We'll try out some of these below...

In [None]:
# A dedicated instance of HSDS should be running as part of this codespace.
# You can verify this with the hsinfo command,
# which shows the current server status.
! hsinfo

In [None]:
# When you first create the codespace, there are no domains loaded in HSDS,
# but you can use hsload to load any HDF5 file into HSDS.
# Let's try it with the file we downloaded earlier.
! hsload tall.h5 hdf5://home/vscode/

In [None]:
# hsls works like h5ls but with content managed by the server
! hsls -r hdf5://home/vscode/tall.h5

In [None]:
# hsls can also be used to display contents of an HSDS folder.
# HSDS folders are similar in concept to directories.  They allow you
# to organize collections of domains and sub-folders
# Note: trailing slash is required

! hsls hdf5://home/vscode/
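The trailing-slash rule can be captured in two small helpers, for example to normalize folder paths before passing them to `hsls`. These are our own illustrative functions, not part of h5pyd, and they encode only the rule stated in the note in the cell above.

```python
def hsls_path_kind(path):
    """Classify an HSDS path: a trailing slash means a folder,
    anything else is treated as a domain."""
    return "folder" if path.endswith("/") else "domain"


def as_folder(path):
    """Append the trailing slash hsls needs to treat a path as a folder."""
    return path if path.endswith("/") else path + "/"
```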

In [None]:
# hsstat can be used to see statistics of the domain
! hsstat hdf5://home/vscode/tall.h5

In [None]:
# and hsget allows you to create an HDF5 file from an HSDS domain
! hsget hdf5://home/vscode/tall.h5  tall2.h5

In [None]:
# compare this to the original.  No output indicates that the two are equivalent
! h5diff tall.h5 tall2.h5
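In a script you may prefer to check h5diff's exit status rather than look for empty output. The h5diff documentation describes exit status 0 when no differences are found, 1 when differences are found, and 2 on error; the wrapper below is a minimal sketch built on that. The `run` parameter is our own addition so the logic can be exercised without h5diff installed.

```python
import subprocess


def h5diff_equivalent(file_a, file_b, run=subprocess.run):
    """Return True if h5diff reports no differences, False if it finds some.

    h5diff exits with 0 (no differences), 1 (differences found) or
    2 (error, e.g. a missing file); an error is raised as RuntimeError.
    """
    result = run(["h5diff", file_a, file_b], capture_output=True, text=True)
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(f"h5diff failed: {result.stderr.strip()}")
```

With h5diff installed, `h5diff_equivalent("tall.h5", "tall2.h5")` gives the same answer as the cell above as a boolean.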

HDF5 File Linking
-----------------

If you would like to load an HDF5 file that is already stored in the cloud (in S3 or Azure Blob Storage), you can *link* to it rather
than copying all the data into the limited storage included with your codespace.
When linking, only the HDF5 file metadata (typically a small fraction of the overall file size) is copied to your
local HSDS store. The HDF5 "chunks" (where dataset data is stored) are accessed on demand from the cloud provider.
Since your codespace is also in the cloud, this should be quite fast compared with accessing the file directly from your
laptop.

In [None]:
# Use the --link option to link to an existing file.
! hsload --link s3://hdf5.sample/data/hdf5test/snp500.h5 hdf5://home/vscode/snp500.h5
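If you have several files in S3 to link, you can script the step above. The sketch below only builds the `hsload --link` command lines, using the `--link` option and `hdf5://` target form from this notebook; the helper name `link_commands` is hypothetical, and naming each domain after its S3 file name is an assumption you may want to change.

```python
import posixpath


def link_commands(s3_uris, folder):
    """Build one `hsload --link` command per S3 object, targeting `folder`.

    Each domain is named after the object's file name, e.g.
    s3://bucket/path/file.h5 -> <folder>file.h5.
    """
    if not folder.endswith("/"):
        folder += "/"  # HSDS folder paths end with a slash
    return [["hsload", "--link", uri, folder + posixpath.basename(uri)]
            for uri in s3_uris]
```

Each resulting command can then be run with `subprocess.run(cmd, check=True)`.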

Problem: If the path name doesn't end in a slash, hsls assumes you are looking for a domain, not a folder.  What does "hsls -H -v /shared" return?

Problem: Try the above command with the -H -v flags

Problem: Try the above command with different options:
* -v
* --showattrs
* --showacls
* --loglevel debug

Problem: List the contents of the home folder for your account.
Add the --showacls flag to show the permissions for this folder

Problem: Run `$ hsacl /home/test_user1/` to show the permissions of the folder

### Upload the file to your home folder
Run: `$ hsload tall.h5 /home/test_user1/`  (replace `/home/test_user1/` with your own home folder)


### List the contents of the uploaded file
The file is now stored in HSDS, use hsls to display it:
`$ hsls -r /home/test_user1/tall.h5`



### Download the file 
Run: `$ hsget /home/test_user1/tall.h5 tall2.h5`
This will download the file as tall2.h5


### Compare the two files
Run: `$ h5dump tall.h5`
and `$ h5dump tall2.h5`

Problem: Are these files the same?

### ACLs (Access Control Lists)
Each server domain or folder can contain one or more ACLs that control
who may perform operations on it (e.g. read/update/delete).
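As a toy model of the idea (not the HSDS implementation or the `hsacl` interface), an ACL can be pictured as a table mapping user names to permissions. The sketch assumes a special `default` entry that applies to any user without an entry of their own; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclEntry:
    """Permissions granted to one user on a domain or folder (toy model)."""
    read: bool = False
    update: bool = False
    delete: bool = False


def is_allowed(acls, user, action):
    """Return True if `user` may perform `action` ("read", "update", "delete").

    A user-specific entry takes precedence; otherwise the assumed
    "default" entry (meaning any user) applies; otherwise access is denied.
    """
    entry = acls.get(user, acls.get("default"))
    return entry is not None and getattr(entry, action)
```

For example, `{"test_user1": AclEntry(read=True, update=True, delete=True), "default": AclEntry(read=True)}` lets test_user1 do everything while everyone else may only read.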

Run: `$ hsacl /home/test_user1/tall.h5`

Problem: What happens when you run `$ hsacl /home/test_user1/tall.h5`?

Problem: Update *your* tall.h5 so that anyone can read the ACLs
(see `hsacl --help`)