<center>
<img src='./img/nsidc_logo.png'/>

# **Valkyrie**
### Point Cloud Data Access
</center>

---

<img align="right" width="50%" src='./img/vaex_ib.png'/>

### What is this notebook about?
This Jupyter notebook is an **interactive document** aimed at teaching students, researchers and anyone interested in cryospheric science how to access and work with various **data sets derived primarily from NASA's [IceBridge](https://www.nasa.gov/mission_pages/icebridge/index.html) mission and related missions such as [ICESat](https://icesat-2.gsfc.nasa.gov/) (1 & 2)** using the NSIDC Valkyrie API.

> If you want to know what an API is, take a look at this video: ["What is an API?"](https://www.youtube.com/watch?v=s7wmiS2mSXY) 

After working through this notebook you should:
* Know the basics of the datasets served by Valkyrie.
* Be able to access Valkyrie data programmatically.
* Together with the companion [visualization and analysis notebook](./2_dataviz_valkyrie.ipynb), have an idea of how to read and analyze data from Valkyrie.

### Icebridge Overview

**IceBridge** is the largest airborne survey of Earth's polar ice ever flown. It yields an unprecedented three-dimensional view of Arctic and Antarctic ice sheets, ice shelves and sea ice. These flights provide a yearly, multi-instrument look at the behavior of the rapidly changing features of the Greenland and Antarctic ice.

Data collected during IceBridge helps scientists bridge the gap in polar observations between NASA's Ice, Cloud and Land Elevation Satellite (ICESat) -- launched in 2003 and de-orbited in 2010 -- and ICESat-2, launched in 2018. ICESat stopped collecting science data in 2009, making IceBridge critical for ensuring a continuous series of observations.

IceBridge uses airborne instruments to map Arctic and Antarctic areas. IceBridge flights are generally conducted in **March-May over Greenland and in October-November over Antarctica**. Other smaller airborne surveys around the world are also part of the IceBridge campaign.

### ICESat-2 Overview

The ICESat-2 mission is designed to provide elevation data needed to determine ice sheet mass balance as well as vegetation canopy information. It will provide topography measurements of cities, lakes and reservoirs, oceans and land surfaces around the globe. The sole instrument on ICESat-2 is the Advanced Topographic Laser Altimeter System (ATLAS), a space-based lidar. It was designed and built at Goddard Space Flight Center, with the laser generation and detection systems provided by Fibertek. ATLAS measures the travel time of laser photons from the satellite to Earth and back; computer programs use the travel time from multiple pulses to determine elevation.
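
The ranging principle described above reduces to a simple formula: the one-way distance to the surface is the speed of light times half the round-trip travel time. The sketch below is a deliberately simplified illustration and not ATLAS processing: it assumes a known satellite altitude above the reference surface and a nadir-pointing beam, and it ignores atmospheric delay, pointing angle, orbit determination and geophysical corrections, all of which real altimetry processing must account for. The helper names and the numbers in the demo are hypothetical.

```python
# Speed of light in vacuum, in metres per second
SPEED_OF_LIGHT = 299_792_458.0


def range_from_travel_time(round_trip_seconds: float) -> float:
    """One-way distance (m) for a photon that travelled to the surface and back."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def surface_elevation(satellite_altitude_m: float, round_trip_seconds: float) -> float:
    """Simplified elevation: satellite altitude minus the measured range.

    Assumes a nadir-pointing beam and ignores atmospheric delay, pointing
    and geophysical corrections (illustration only).
    """
    return satellite_altitude_m - range_from_travel_time(round_trip_seconds)


if __name__ == "__main__":
    # Hypothetical numbers: a round trip of about 3.3 ms from roughly 500 km up
    t = 2 * 496_000.0 / SPEED_OF_LIGHT
    print(f"range: {range_from_travel_time(t):.3f} m")
    print(f"elevation: {surface_elevation(500_000.0, t):.3f} m")
```

In practice ICESat-2 combines many photon returns from multiple pulses, so a single travel time like the one above is only a conceptual starting point.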

You can go to NSIDC's landing page for a complete list of ICESat-2 datasets and their documentation: [ICESat-2 Data Sets at NSIDC](https://nsidc.org/data/icesat-2/data-sets)

<p align="center">
<img style="align: center;" width="80%" src='./img/valkyrie-coverage.jpg'/>
</p>

# Valkyrie

As you may have noticed by now, IceBridge is a multi-year NASA campaign that used multiple instruments operated by more than one research institution. This resulted in rich but heterogeneous data that present non-trivial problems when researchers want to use them holistically. Valkyrie is more than an access tool: it abstracts away many of the processing steps that users would otherwise have to do before they could even start working with the data. Its core feature is that it **translates** the data from their original formats into a coherent HDF5-based table without losing the original metadata. On top of that, Valkyrie can apply common corrections that affect geolocated data collected over long periods of time, such as corrections for plate tectonic motion (continental drift).

Take a look at this document to learn more about GPS data and the corrections that Valkyrie can perform for users:
* ### [Applying Coordinate Transformations to Facilitate Data Comparison](https://gist.github.com/kbeamnsidc/b263eb992ce6c50a1ceafb24ac70cd0a)
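
The linked document covers these transformations in detail. As a minimal illustration of the idea behind an epoch correction only, the sketch below moves an Earth-centred, Earth-fixed (ECEF) coordinate from one epoch to another assuming a constant velocity, X(t) = X(t0) + V·(t − t0). The point, the velocity and the `propagate_epoch` helper are hypothetical; this is not Valkyrie's implementation, and it does not perform transformations between ITRF realizations, which involve additional transformation parameters.

```python
from typing import NamedTuple, Tuple


class EcefPoint(NamedTuple):
    """Earth-centred, Earth-fixed (ECEF) coordinates in metres."""
    x: float
    y: float
    z: float


def propagate_epoch(point: EcefPoint,
                    velocity_m_per_yr: Tuple[float, float, float],
                    from_epoch: float, to_epoch: float) -> EcefPoint:
    """Move a point between epochs (decimal years), assuming constant velocity."""
    dt = to_epoch - from_epoch  # elapsed time in years
    vx, vy, vz = velocity_m_per_yr
    return EcefPoint(point.x + vx * dt, point.y + vy * dt, point.z + vz * dt)


if __name__ == "__main__":
    # Made-up point and velocity (a few cm/yr, the usual order of plate motion)
    p2009 = EcefPoint(1_500_000.0, -2_000_000.0, 5_800_000.0)
    p2016 = propagate_epoch(p2009, (-0.02, -0.005, 0.004), 2009.0, 2016.5)
    print(p2016)
```

Even at a few centimetres per year, the shift accumulates over the multi-decade span of these datasets, which is why comparing measurements from different campaigns benefits from such corrections.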


### References
1. [Airborne Topographic Mapper Calibration Procedures and Accuracy Assessment](https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20120008479.pdf)

2. [Open Source Tools for Point Cloud Processing, Storage, Subsetting, and Visualization](https://sea.ucar.edu/sites/default/files/kbeam_seaconf18.pdf)

### Related Tools

* [OpenAltimetry](https://openaltimetry.org/): Advanced discovery, processing, and visualization services for ICESat and ICESat-2 altimeter data
* [ITS_LIVE](https://its-live.jpl.nasa.gov/): A NASA MEaSUREs project to provide automated, low latency, global glacier flow and elevation change datasets. 


### Knowledge requirements
To get the most out of this notebook you should be familiar with Python and its geoscience stack. If you only know some Python, that's also OK: most of the "heavy lifting" is done by our Valkyrie client code, so you don't necessarily need to know a lot about these libraries. If you'd like to learn more about geoscience and Python, there are great tutorials by CU Boulder's Earth Lab, [Data Exploration and Analysis Lessons](https://www.earthdatascience.org/tags/data-exploration-and-analysis/), and from the Data Carpentry project, [Introduction to Geospatial Concepts](https://datacarpentry.org/organization-geospatial/).


The main packages that we are going to use are:

 * [requests](https://requests.readthedocs.io/en/master/):
 Simple HTTP library for Python, used to make requests as its name states.
 * [geopandas](https://geopandas.org/):
 library to make working with geospatial data in python easier (using pandas). 
 * [geojson](https://github.com/jazzband/geojson):
 Functions for encoding and decoding GeoJSON formatted data in Python
 * [h5py](https://github.com/h5py/h5py):
 h5py is a thin, pythonic wrapper around the [HDF5 library](https://en.wikipedia.org/wiki/Hierarchical_Data_Format). 
 * [matplotlib](https://matplotlib.org/):
 Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
 * [vaex](https://github.com/vaexio/vaex):
 Vaex is a high performance Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets.
 * [iPyLeaflet](https://github.com/jupyter-widgets/ipyleaflet):
 A Jupyter / Leaflet bridge enabling interactive maps in the Jupyter notebook.
 * [icepyx](https://github.com/icesat2py/icepyx):
 a software library for ICESat-2 data users
  
As mentioned above, for the users' convenience we created a Python library that encapsulates the most repetitive tasks involved in accessing Valkyrie and ICESat-2 data. If you are comfortable working purely in code, you don't need to use the user interface. 


## Data sets and their coverage


The Valkyrie project provides web services for ordering spatially and temporally subsetted lidar point cloud data from the [BLATM L1B](https://nsidc.org/data/BLATM1B), [ILATM L1B V1](https://nsidc.org/data/ilatm1b/versions/1), [ILATM L1B V2](https://nsidc.org/data/ILATM1B), [ILVIS2](https://nsidc.org/data/ILVIS2) and [ICESat GLAH06](https://nsidc.org/data/GLAH06/) data products.

The following table describes the temporal and spatial coverage of each of these datasets, as well as the sensor and platform used to acquire the data.


---

| Dataset      | Spatial Coverage (degrees)                                                    | Temporal Coverage                              | Platform                                              | Sensor                                          |
|--------------|-------------------------------------------------------------------------------|------------------------------------------------|-------------------------------------------------------|-------------------------------------------------|
| BLATM L1B    | South: N: -53, S: -90, E: 180, W: -180; North: N: 90, S: 60, E: 180, W: -180  | 23 June 1993 - 30 October 2008                 | DC-8, DHC-6, P-3A ORION, P-3B                         | ATM                                             |
| ILATM L1B V1 | South: N: -53, S: -90, E: 180, W: -180; North: N: 90, S: 60, E: 180, W: -180  | 31 March 2009 - 8 November 2012 (updated 2013) | AIRCRAFT, DC-8, P-3B                                  | ATM                                             |
| ILATM L1B V2 | South: N: -53, S: -90, E: 180, W: -180; North: N: 90, S: 60, E: 180, W: -180  | 20 March 2013 - 16 May 2019 (updated 2020)     | C-130, DC-8, HU-25A, HU-25C, P-3B, WP-3D ORION        | ATM                                             |
| ILVIS2       | North: N: 90, S: 60, E: 180, W: -180                                          | 25 August 2017 - 20 September 2017             | AIRCRAFT, B-200, C-130, DC-8, G-V, HU-25C, P-3B, RQ-4 | ALTIMETERS, LASERS, LVIS                        |
| GLAH06       | Global: N: 86, S: -86, E: 180, W: -180                                        | 20 February 2003 - 11 October 2009             | ICESat                                                | ALTIMETERS, CD, GLAS, GPS, GPS Receiver, LA, PC |


--- 
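
When choosing a date range, it can help to check which products actually cover it. The sketch below simply encodes the temporal coverage from the table above in a dictionary and returns the datasets whose coverage overlaps a requested interval. The `TEMPORAL_COVERAGE` dictionary and the `datasets_for_range` helper are illustrative only; they are not part of the Valkyrie client.

```python
from datetime import date

# Temporal coverage copied from the table above as (start, end), both inclusive
TEMPORAL_COVERAGE = {
    "BLATM L1B": (date(1993, 6, 23), date(2008, 10, 30)),
    "ILATM L1B V1": (date(2009, 3, 31), date(2012, 11, 8)),
    "ILATM L1B V2": (date(2013, 3, 20), date(2019, 5, 16)),
    "ILVIS2": (date(2017, 8, 25), date(2017, 9, 20)),
    "GLAH06": (date(2003, 2, 20), date(2009, 10, 11)),
}


def datasets_for_range(start: date, end: date) -> list:
    """Return the dataset labels whose temporal coverage overlaps [start, end]."""
    if start > end:
        raise ValueError("start must not be after end")
    # Two intervals overlap when each one starts before the other ends
    return [name for name, (c_start, c_end) in TEMPORAL_COVERAGE.items()
            if c_start <= end and start <= c_end]


print(datasets_for_range(date(2016, 1, 1), date(2016, 12, 31)))  # ['ILATM L1B V2']
```

Spatial coverage could be checked the same way, but remember that an overlapping coverage window does not guarantee that flights actually passed over your area of interest; the CMR granule count is the real test.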

> Note: If you have any questions about the data, please contact NSIDC User Services at users@nsidc.org

In this tutorial we are going to use iPyLeaflet and other Jupyter widgets to select our constraints and place a data order using the Valkyrie API.

First, to get familiar with the combined coverage of these products, run the next cell and play the YouTube video.




In [None]:
from IPython.lib.display import YouTubeVideo
YouTubeVideo('jRB1OEDXXwY')

### NASA's EarthData Credentials

The first step to start working with Valkyrie data is to log in to the [NASA Earthdata system](https://earthdata.nasa.gov/).
We are going to use a user interface to build the parameters we need to post data orders to Valkyrie, but you can also do this programmatically; please refer to the Valkyrie documentation or the NSIDC Swagger [OpenAPI documentation](https://staging.nsidc.org/apps/orders/api/).
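
If you prefer not to type your credentials into a widget, one common approach (an assumption here, not something the Valkyrie client requires) is to keep them in a `.netrc` file and read them with Python's standard `netrc` module. The `earthdata_credentials` helper below is hypothetical; it only looks up the Earthdata Login host and returns the username and password.

```python
import netrc
import os
import tempfile

# Earthdata Login host used for NASA Earthdata authentication
EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def earthdata_credentials(netrc_path: str):
    """Return (username, password) for Earthdata Login from a netrc file."""
    entry = netrc.netrc(netrc_path).authenticators(EARTHDATA_HOST)
    if entry is None:
        raise KeyError(f"no entry for {EARTHDATA_HOST} in {netrc_path}")
    login, _account, password = entry
    return login, password


if __name__ == "__main__":
    # Demo with a throw-away file and fake credentials; in practice point this
    # at your own ~/.netrc and keep it readable only by you (chmod 600)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "netrc")
        with open(path, "w") as fh:
            fh.write(f"machine {EARTHDATA_HOST} login my_user password my_secret\n")
        print(earthdata_credentials(path))
```

The returned tuple can be assigned to the `auth` attribute of a `requests.Session`; how the Valkyrie client itself stores and uses your credentials is not shown here.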



In [None]:
# We import our Valkyrie client library
from valkyrie.client import ValkyrieClient
# We instantiate our client
v = ValkyrieClient()
# Now we enter our NASA Earth Data Credentials and verify that they work.
# If you are using Jupyter Lab you can see the output in the logs window.
v.display(['credentials'])

In [None]:
# Now we are going to verify that our credentials are valid.
session = v.create_earthdata_authenticated_session()
if session is None:
    print('we are not logged into NASA EarthData')
else:
    print('we are logged into NASA EarthData!')

Now we are going to render Valkyrie's user interface. If you want to change the hemisphere, select a different value in the hemisphere drop-down control.
This user interface uses [ipyleaflet](https://blog.jupyter.org/interactive-gis-in-jupyter-with-ipyleaflet-52f9657fa7a), which allows us to draw
polygons or bounding boxes to delimit our area of interest. We can also edit and delete these geometries using the widget controls on the map.

The **"Get Granule Count"** button queries [NASA's CMR](https://earthdata.nasa.gov/eosdis/science-system-description/eosdis-components/cmr) for the number of granules matching the current parameters; this requires a geometry and at least one selected dataset.

**Notes**: 
> * If you use the bounding box geometry on a polar projection you'll notice a distortion due to the nature of polar projections; if you prefer, you can use the global Mercator map to draw a bounding box without apparent distortion. Polygons are probably a better idea, or you can even input your own coordinates as we'll see later.
> * The calculated download size of these granules is an upper bound since Valkyrie allows us to subset the data. 
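
If you want to reproduce a granule search outside the widget, CMR exposes a public granule search endpoint. The sketch below only builds the request URL and sends nothing; the parameter names `short_name`, `bounding_box` (west, south, east, north) and `temporal` come from the CMR search API, while the `cmr_granule_url` helper itself is hypothetical and is not how the Valkyrie client is implemented.

```python
from urllib.parse import urlencode

CMR_GRANULE_SEARCH = "https://cmr.earthdata.nasa.gov/search/granules.json"


def cmr_granule_url(short_name: str, bbox: tuple, start: str, end: str,
                    page_size: int = 100) -> str:
    """Build a CMR granule search URL.

    bbox is (west, south, east, north) in decimal degrees; start and end are
    ISO 8601 timestamps such as '2016-01-01T00:00:00Z'.
    """
    west, south, east, north = bbox
    params = {
        "short_name": short_name,
        "bounding_box": f"{west},{south},{east},{north}",
        "temporal": f"{start},{end}",
        "page_size": page_size,
    }
    # urlencode percent-encodes commas and colons, which CMR accepts
    return f"{CMR_GRANULE_SEARCH}?{urlencode(params)}"


print(cmr_granule_url("ILATM1B", (-50.221637, 69.095798, -49.164471, 69.29418),
                      "2016-01-01T00:00:00Z", "2016-12-31T23:59:59Z"))
```

Fetching the URL (for example with `requests.get(url).json()`) returns granule metadata; results are paginated, so large searches need more than one request.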

## UI Elements

* **Hemisphere**: Which map projection we are going to use; we can pick global, north or south.
* **Datasets**: A selection of the datasets served by Valkyrie; we can pick one or more (CTRL+Space or CTRL+Click).
* **IceSat2**: If we also want to place a data order for ICESat-2 data with the current parameters, we need to enter the ICESat-2 product short name, e.g. ATL06.
* **ITRF**: The International Terrestrial Reference Frame; see [ITRF](https://gist.github.com/kbeamnsidc/b263eb992ce6c50a1ceafb24ac70cd0a).
* **Epoch**: The epoch on which the coordinate reference system is based; valid when using ITRF.
* **Date Range**: A slider control to filter the selection between a start and end date.
* **Map**: The main widget; you can draw polygons or bounding boxes and edit them. You can also turn the IceBridge flight layers and the other layers on and off.

To display the user interface we use the `display` method, which takes 3 parameters:

* **what**: what we want to display; the valid values are **credentials**, **controls** and **map**
* **where**: where we want the widgets to be; **horizontal** renders them in the next cell, while **vertical** uses a separate column.
* **extra_layers**: whether to add more layers than the IceBridge ones to the map, True or False

> **note:** The Valkyrie client can also work directly with the data ordering system. If you are logged into NASA Earthdata, you can build your own parameters and send them directly without using the user interface.

In [None]:
v.display(['map'], 'vertical', extra_layers=True)

## CMR and related datasets

In [None]:
# Print the granule counts from CMR for our current search params
params = v.build_params()
params

In [None]:
# We can also query CMR to get an idea of coverages for the area we just selected
params = v.build_params()
granules = v.query_cmr(None)
for dataset in granules:
    size = round(sum(float(g['granule_size']) for g in granules[dataset]), 2)
    print(f'{dataset}: {len(granules[dataset])} granules found. Approx download size: {size} MB')

In [None]:
# Print the first record for a given dataset (only if granules were found for it)
dataset = 'ILATM1B'
if granules.get(dataset):
    print(granules[dataset][0])

## The Valkyrie API spec

The native Valkyrie API offers an OpenAPI specification that we can use to explore how the service works. 


In [None]:
from IPython.display import IFrame
IFrame('http://staging.valkyrie-vm.apps.nsidc.org/1.0/ui/#/', width=800, height=400)

## Placing a data order for one of the data sets served by Valkyrie

Now that we have our constraints, we just need to post our order and wait for Valkyrie to fulfill it. 
We can also place an order directly with our own parameters; in this case we are going to use a bounding box that overlaps [Jakobshavn](https://en.wikipedia.org/wiki/Jakobshavn_Glacier) Glacier in Greenland.


In [None]:
# note that we are explicitly using the name of one of the ATM datasets
dataset = 'ILATM1B'
my_params = {
    'dataset': dataset,
    'start': '2016-01-01',
    'end': '2016-12-31',
    # bounding box as west,south,east,north in decimal degrees
    'bbox': '-50.221637,69.095798,-49.164471,69.29418'
}
order = v.post_valkyrie_order(my_params)
order
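
The `bbox` value above is a comma-separated west,south,east,north string (longitudes first, as the Jakobshavn numbers show). A small helper can build and sanity-check it before posting an order. `make_bbox` is a hypothetical helper that only performs basic range checks; it does not handle boxes that cross the antimeridian.

```python
def make_bbox(west: float, south: float, east: float, north: float) -> str:
    """Return a 'west,south,east,north' string after basic range checks."""
    if not (-180.0 <= west <= 180.0 and -180.0 <= east <= 180.0):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90.0 <= south <= 90.0 and -90.0 <= north <= 90.0):
        raise ValueError("latitudes must be within [-90, 90]")
    if south >= north:
        raise ValueError("south must be less than north")
    if west >= east:
        # Boxes crossing the antimeridian are not handled in this sketch
        raise ValueError("west must be less than east")
    return f"{west},{south},{east},{north}"


bbox = make_bbox(-50.221637, 69.095798, -49.164471, 69.29418)
print(bbox)  # -50.221637,69.095798,-49.164471,69.29418
```

Catching a swapped latitude/longitude pair locally is cheaper than discovering it after an order comes back empty.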

## Downloading the data
Let's get some coffee: some Valkyrie orders are in the gigabyte range and may take a while to be processed. 
Once the status URL reports that the order is complete, we can grab the HDF5 data file using the URL in the same response!

In [None]:
order['response'].json()

In [None]:
import requests
my_order_status = order['response'].json()['status_url']
order_status = requests.get(my_order_status).json()
order_status
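
Instead of re-running the status cell by hand, you can poll the status URL until the order finishes. The sketch below is a generic polling loop: you supply the function that fetches the status and the function that decides whether it is done. The `status` key and `complete` value in the commented example are assumptions, so inspect the actual status response above before relying on them.

```python
import time
from typing import Any, Callable


def wait_until(fetch: Callable[[], Any], is_done: Callable[[Any], bool],
               interval_s: float = 30.0, timeout_s: float = 3600.0,
               sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call fetch() until is_done(result) is true or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError("order did not finish before the timeout")
        sleep(interval_s)


# Hypothetical usage against the status URL from the previous cell:
# final = wait_until(lambda: requests.get(my_order_status).json(),
#                    lambda s: s.get('status') == 'complete')

if __name__ == "__main__":
    # Offline demo with a fake status sequence instead of real HTTP calls
    states = iter(["queued", "processing", "complete"])
    print(wait_until(lambda: next(states), lambda s: s == "complete",
                     interval_s=0.0, sleep=lambda _: None))
```

Since large orders can take a long time, a generous polling interval avoids sending the service needless requests.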

## Placing multiple data orders to Valkyrie

In [None]:
# Alternatively, we can use the interface to place orders based on the current map selection and control values.
orders = v.post_orders()
orders

## Downloading Related Datasets


## Reading the Data

Remote sensing data can be overwhelmingly big. Reading a big file is not trivial, and when we have many of them this task can become an intractable barrier.
The main constraint, if you don't have a supercomputer, is memory. The average granule size is in the tens of MB for ICESat-2 and can reach gigabytes in Valkyrie depending on the selected area. This is where libraries like Dask, Vaex and others come into play. 

These libraries read our files using a battery of optimizations such as lazy loading, memory mapping and parallelism. Let's now explore 4 different ways of reading these HDF5 files using the libraries included in this notebook:

* h5py + Pandas
* Dask arrays
* Vaex
* xarray
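
One of the ideas these libraries build on can be illustrated without any of them: read a large array one slice at a time and combine partial results, so that only one slice is in memory at once. Slicing an h5py dataset reads just that slice from disk. The sketch below assumes a 1-D numeric variable such as `elevation`; `chunked_mean` is a hypothetical helper, not part of h5py, Dask or Vaex.

```python
import numpy as np


def chunked_mean(dataset, chunk_size: int = 1_000_000) -> float:
    """Mean of a 1-D array-like, reading it one slice at a time.

    Works with an h5py Dataset (each slice is read from disk on demand)
    or any NumPy array; NaN values are skipped.
    """
    total = 0.0
    count = 0
    for start in range(0, len(dataset), chunk_size):
        block = np.asarray(dataset[start:start + chunk_size], dtype=np.float64)
        valid = block[~np.isnan(block)]
        total += float(valid.sum())
        count += valid.size
    if count == 0:
        raise ValueError("no valid values")
    return total / count


# Hypothetical usage with one of the downloaded files:
# with h5py.File('data/atm1b_data_2020-07-11T20-39.hdf5', 'r') as f:
#     print(chunked_mean(f['elevation']))
print(chunked_mean(np.arange(10.0), chunk_size=3))  # 4.5
```

Dask and Vaex generalize this pattern to many operations, add parallelism, and decide the chunking for you.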



## h5py + pandas

With h5py we get almost native access to the HDF5 files, and we can use pandas or geopandas to compute operations on them.



In [None]:
# importing all of our dependencies
import warnings
warnings.filterwarnings("ignore")
import glob
import geopandas
import pandas as pd
import h5py
import vaex
import dask.dataframe as dd
import dask.array as da
import numpy as np

In [None]:
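%%time

# Open the HDF5 file read-only; the context manager closes it when the block ends
with h5py.File('data/atm1b_data_2020-07-11T20-39.hdf5', 'r') as f:
    print(list(f.keys()))
    # [()] reads each full dataset into memory as a NumPy array
    df_data = {
        'latitude': f['latitude'][()],
        'longitude': f['longitude'][()],
        'elevation': f['elevation'][()],
        'time': pd.to_datetime(f['utc_datetime'][()])
    }
df = pd.DataFrame(data=df_data)
display(df.describe())
df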

## Dask arrays
Dask allows us to work with larger-than-memory datasets, either in a distributed fashion or locally. We are going to run it locally.


In [None]:
%%time

def read_h5(fname, vnames):
    """Read the listed 1-D variables from one HDF5 file into a pandas DataFrame."""
    with h5py.File(fname, 'r') as f:
        # A DataFrame (rather than np.column_stack) keeps each column's dtype, so a
        # string utc_datetime column does not force every column to strings
        return pd.DataFrame({v: f[v][()] for v in vnames})

files = sorted(glob.glob('data/*.hdf5'))
print(files)
# we only care about these columns
vnames = ['longitude', 'latitude', 'elevation', 'utc_datetime']

# One single-partition Dask dataframe per file (each file is read eagerly here)
dfs = [dd.from_pandas(read_h5(f_name, vnames), npartitions=1) for f_name in files]

# Single partitioned dataframe; operations on it are evaluated lazily and in parallel
ddf = dd.concat(dfs)

print('Number of files:', len(files))
print('Number of points:', len(ddf))
ddf['elevation']

## Vaex

In [None]:
%%time
df = vaex.open('data/atm1b_data_2020-07-11T20-39.hdf5')
# We're parsing the utc_datetime from Valkyrie into a data type that Vaex understands.
df['date'] = df.utc_datetime.values.astype('datetime64[ns]')
my_df = df['longitude', 'latitude', 'elevation', 'date']
# vaex.vrange() is like numpy.arange but uses 0-memory no matter the length.
# df.add_column('index', vaex.vrange(0, len(df)))
# We are going to create a "decimated" dataframe with only 1/100 of the size of the original to plot the big picture faster.
# df_decimated = df[(df.index % 100 == 0)]
my_df.describe()