## Summary

In this tutorial we will use NASA's Earthdata Harmony to subset and access ICESat-2 data with the `harmony-py` Python library. We use the ATL03 Geolocated Photon Dataset as an example, but Harmony can also be used to subset other ICESat-2 datasets.

**What is Harmony?** Harmony provides access to a set of services that can be used to subset, reproject, and reformat NASA datasets. Data can be subset by geographic region, temporal range, and variable. Data can be reprojected from its native coordinate reference system (CRS) to the CRS relevant to your analysis, and reformatted from its native file format to a format better suited to your application. These services are collectively called _transformation services_. Not all services are available for all datasets, so you will learn how to discover which services are available for your dataset.

Data transformed by Harmony services are staged in NASA Amazon Web Services (AWS) S3 buckets or in user-owned AWS S3 buckets. Data in NASA S3 buckets are accessed using signed URLs or temporary access credentials. You can download these data to your local machine, or access them directly if you are working on an AWS cloud instance, such as a JupyterHub, in the AWS `us-west-2` region.

## Learning Objectives

In this tutorial you will learn how to:

1. discover Harmony service options for ICESat-2 datasets;
2. use the `harmony-py` library to perform a spatial and temporal subset of ATL03;
3. download ATL03 subsetted data to a local machine;
4. access and load ATL03 subsetted data directly into xarray.

## Prerequisites


First we’ll import Python packages and set up the authentication needed for requesting ICESat-2 subsets.

Data in NASA’s Earthdata Cloud, including the subsetted data processed by Harmony, reside in Amazon Web Services (AWS) Simple Storage Service (S3) buckets. Access is provided via temporary credentials; this free access is limited to requests made within the US West (Oregon) (code: us-west-2) AWS region. While this compute location is required for direct S3 access, all data in Earthdata Cloud are still freely available via download.

This tutorial will demonstrate both download and direct S3 access methods. For the latter method to work, you will need to be running this notebook in the AWS `us-west-2` region.

### Import Required Packages

In [19]:
# Earthdata Login Authentication
import earthaccess 

# Harmony services
from harmony import BBox, Client, Collection, Request, LinkType, CapabilitiesRequest, Environment
# Environment is imported only for UAT notebook testing. Remove it when we are ready to switch to production.
import json
import datetime as dt
from pprint import pprint
import s3fs
import xarray as xr

## Login to NASA Earthdata

An [Earthdata Login](https://urs.earthdata.nasa.gov) account is required to access data from the NASA Earthdata system. Before requesting a subset of ICESat-2 data, we first need to set up our Earthdata Login authentication.

The `earthaccess.login()` function automatically searches for Earthdata Login credentials in environment variables or in a `.netrc` file. If neither is available, it prompts us to enter our username and password. We use the `.netrc` strategy here. A `.netrc` file is a text file in our home directory that contains login information for remote machines. If we don't have a `.netrc` file, `earthaccess.login()` can create one for us when called with `persist=True`:

`earthaccess.login(strategy='interactive', persist=True)`
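
To make the `.netrc` format concrete, here is a minimal sketch that writes a sample entry to a temporary file and reads it back with Python's standard `netrc` module. The username and password are hypothetical placeholders, and the file is created in a temporary directory so that a real `~/.netrc` is not touched.

```python
import netrc
import os
import tempfile

# Hypothetical placeholder credentials, for illustration only.
# A real entry would hold your own Earthdata Login username and password.
sample_entry = "machine urs.earthdata.nasa.gov login my_username password my_password\n"

with tempfile.TemporaryDirectory() as tmpdir:
    netrc_path = os.path.join(tmpdir, ".netrc")
    with open(netrc_path, "w") as fh:
        fh.write(sample_entry)

    # authenticators() returns a (login, account, password) tuple for the host
    login, _account, password = netrc.netrc(netrc_path).authenticators(
        "urs.earthdata.nasa.gov"
    )

print(f"Found credentials for user: {login}")
```

Each entry is a single `machine ... login ... password ...` record keyed by host name. Because the file stores a password in plain text, it is usually kept readable only by its owner.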

In [27]:
auth = earthaccess.login()

## Discover service options for a given data set

Harmony's capabilities endpoint reports which transformation services are available for a given collection. For more detail, see the [harmony-py collection capabilities example](https://github.com/nasa/harmony-py/blob/main/examples/collection_capabilities.ipynb) and the [Harmony API documentation](https://harmony.earthdata.nasa.gov/docs#get-harmony-capabilities-for-the-provided-collection).

First, we need to create a Harmony Client, which is what we will interact with to submit and inspect a data request to Harmony, as well as to retrieve results.

In [36]:
# Use the UAT environment for testing. When ready to move to production, use:
# harmony_client = Client()
harmony_client = Client(env=Environment.UAT)

# Use a specific collection ID for testing. When ready to move to production, use the short name:
# capabilities_request = CapabilitiesRequest(short_name='ATL03')
capabilities_request = CapabilitiesRequest(collection_id='C1256407609-NSIDC_CUAT')

capabilities = harmony_client.submit(capabilities_request)
print(json.dumps(capabilities, indent=2))

{
  "conceptId": "C1256407609-NSIDC_CUAT",
  "shortName": "ATL03",
  "variableSubset": false,
  "bboxSubset": true,
  "shapeSubset": true,
  "concatenate": false,
  "reproject": false,
  "outputFormats": [
    "application/x-hdf"
  ],
  "services": [
    {
      "name": "sds/trajectory-subsetter",
      "href": "https://cmr.uat.earthdata.nasa.gov/search/concepts/S1242315633-EEDTEST",
      "capabilities": {
        "subsetting": {
          "temporal": true,
          "bbox": true,
          "shape": true,
          "variable": true
        },
        "output_formats": [
          "application/x-hdf"
        ]
      }
    }
  ],
  "variables": [],
  "capabilitiesVersion": "2"
}
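
The capabilities response is easier to scan if we reduce it to the options that are switched on. The following is a minimal sketch that works on an abbreviated copy of the response printed above. `summarize_capabilities` is a hypothetical helper written for this tutorial, not part of `harmony-py`, and it assumes the response keeps the structure shown here.

```python
# Abbreviated copy of the capabilities response printed above
capabilities = {
    "shortName": "ATL03",
    "variableSubset": False,
    "bboxSubset": True,
    "shapeSubset": True,
    "concatenate": False,
    "reproject": False,
    "outputFormats": ["application/x-hdf"],
    "services": [
        {
            "name": "sds/trajectory-subsetter",
            "capabilities": {
                "subsetting": {
                    "temporal": True,
                    "bbox": True,
                    "shape": True,
                    "variable": True,
                },
                "output_formats": ["application/x-hdf"],
            },
        }
    ],
}


def summarize_capabilities(caps):
    """Hypothetical helper: list the enabled collection-level and per-service options."""
    flags = ["variableSubset", "bboxSubset", "shapeSubset", "concatenate", "reproject"]
    supported = sorted(flag for flag in flags if caps.get(flag))
    per_service = {
        service["name"]: sorted(
            option
            for option, enabled in service["capabilities"]["subsetting"].items()
            if enabled
        )
        for service in caps.get("services", [])
    }
    return supported, per_service


supported, per_service = summarize_capabilities(capabilities)
print("Collection-level options:", supported)
for name, options in per_service.items():
    print(f"{name}: {options}")
```

For this response, the collection-level flags report bounding box and shape subsetting, while the service entry also lists temporal and variable subsetting, so it is worth reading both levels of the response.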


## Using `harmony-py` to subset data

[`harmony-py`](https://github.com/nasa/harmony-py) is a pip-installable Python library that offers an alternative to calling Harmony’s RESTful API directly. It makes it easier to request data and service options, especially when working in a Jupyter Notebook environment.

### Create A Subset Request

Here we’ll create a request for a spatial and temporal subset of data.

See the [harmony-py](https://harmony-py.readthedocs.io/en/latest/) documentation for details on how to construct your request.

In [40]:
# Use a specific collection ID for testing. When ready to move to production, use the short name:
# short_name = 'ATL03'
collection_id = 'C1256407609-NSIDC_CUAT'

request = Request(
    collection=Collection(id=collection_id),
    # Bounding box order is west, south, east, north (decimal degrees)
    spatial=BBox(-105.5, 40, -105, 40.005),
    temporal={
        'start': dt.datetime(2020, 4, 27),
        'stop': dt.datetime(2020, 4, 28)
    }
)
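
Swapped bounding box corners or a reversed time range are easy mistakes to make when building a request. The sketch below shows one way to sanity-check the inputs before constructing the `Request`. `check_subset_parameters` is a hypothetical helper, not part of `harmony-py`, and it assumes the west, south, east, north ordering used by `BBox`. A box with west greater than east is left unflagged, on the assumption that such a box may be intended to cross the antimeridian.

```python
import datetime as dt


def check_subset_parameters(west, south, east, north, start, stop):
    """Hypothetical helper: return a list of problems (empty if none are found)."""
    problems = []
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        problems.append("longitudes must be between -180 and 180")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        problems.append("latitudes must be between -90 and 90")
    if south >= north:
        problems.append("south must be less than north")
    if start >= stop:
        problems.append("start must be earlier than stop")
    return problems


start = dt.datetime(2020, 4, 27)
stop = dt.datetime(2020, 4, 28)

# The values used in this tutorial
ok = check_subset_parameters(-105.5, 40, -105, 40.005, start, stop)

# The same box with south and north swapped by mistake
swapped = check_subset_parameters(-105.5, 40.005, -105, 40, start, stop)

print("Tutorial values:", ok)
print("Swapped latitudes:", swapped)
```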

### Submit a subset request

In [41]:
job_id = harmony_client.submit(request)
job_id

'451ae86b-c911-4910-a3c3-d2557a910bdb'

### Check request status

In [42]:
harmony_client.wait_for_processing(job_id, show_progress=True)

 [ Processing: 100% ] |###################################################| [|]


In [43]:
data = harmony_client.result_json(job_id)
pprint(data)

{'createdAt': '2024-09-17T23:08:38.357Z',
 'dataExpiration': '2024-10-17T23:08:38.357Z',
 'jobID': '451ae86b-c911-4910-a3c3-d2557a910bdb',
 'links': [{'href': 'https://harmony.uat.earthdata.nasa.gov/stac/451ae86b-c911-4910-a3c3-d2557a910bdb/',
            'rel': 'stac-catalog-json',
            'title': 'STAC catalog',
            'type': 'application/json'},
           {'bbox': [-108.28738, 26.94838, -103.60569, 59.54235],
            'href': 'https://harmony.uat.earthdata.nasa.gov/service-results/harmony-uat-staging/public/451ae86b-c911-4910-a3c3-d2557a910bdb/4834058/ATL03_20200427193622_04930702_006_02_subsetted.h5',
            'rel': 'data',
            'temporal': {'end': '2020-04-27T19:44:52.680Z',
                         'start': '2020-04-27T19:36:22.028Z'},
            'title': 'ATL03_20200427193622_04930702_006_02_subsetted.h5',
            'type': 'application/x-hdf5'},
           {'href': 'https://harmony.uat.earthdata.nasa.gov/jobs/451ae86b-c911-4910-a3c3-d2557a910bdb?lin
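
The job result mixes several kinds of links, and only those with `rel` set to `data` point at subsetted files. The following minimal sketch filters them out of an abbreviated copy of the result shown above. `data_links` is a hypothetical helper, and the sketch assumes the result keeps the `links`, `rel`, `title`, and `href` keys seen in that output.

```python
# Abbreviated copy of the job result printed above
result = {
    "jobID": "451ae86b-c911-4910-a3c3-d2557a910bdb",
    "links": [
        {
            "href": "https://harmony.uat.earthdata.nasa.gov/stac/451ae86b-c911-4910-a3c3-d2557a910bdb/",
            "rel": "stac-catalog-json",
            "title": "STAC catalog",
            "type": "application/json",
        },
        {
            "href": "https://harmony.uat.earthdata.nasa.gov/service-results/harmony-uat-staging/public/451ae86b-c911-4910-a3c3-d2557a910bdb/4834058/ATL03_20200427193622_04930702_006_02_subsetted.h5",
            "rel": "data",
            "title": "ATL03_20200427193622_04930702_006_02_subsetted.h5",
            "type": "application/x-hdf5",
        },
    ],
}


def data_links(job_result):
    """Hypothetical helper: return (title, href) pairs for links that point at data files."""
    return [
        (link["title"], link["href"])
        for link in job_result.get("links", [])
        if link.get("rel") == "data"
    ]


files = data_links(result)
for title, href in files:
    print(title)
    print("  ", href)
```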

### Access data

We will demonstrate both download and direct S3 access options below.

#### Download Harmony Results

The following code block is based on the [harmony-py basic example](https://github.com/nasa/harmony-py/blob/main/examples/basic.ipynb).

In [44]:
print('\nDownloading results:')
futures = harmony_client.download_all(job_id)

for f in futures:
    print(f.result())  # f.result() is a filename, in this case

print('\nDone downloading.')


Downloading results:
ATL03_20200427193622_04930702_006_02_subsetted.h5
ATL03_20200427193622_04930702_006_02_subsetted.h5

Done downloading.


#### Direct S3 Access of Harmony Results

You must be running this notebook in the AWS us-west-2 region in order for the following code to run.

In [47]:
# Get S3 URLs for the job results
results = harmony_client.result_urls(job_id, link_type=LinkType.s3)
urls = list(results)
url = urls[0]

# Request temporary AWS credentials for accessing the staged results
creds = harmony_client.aws_credentials()

s3_fs = s3fs.S3FileSystem(
    key=creds['aws_access_key_id'],
    secret=creds['aws_secret_access_key'],
    token=creds['aws_session_token'],
    client_kwargs={'region_name': 'us-west-2'},
)

# Open the file in S3 and load the gt1l/heights group into xarray
f = s3_fs.open(url, mode='rb')
ds = xr.open_dataset(f, group='gt1l/heights')
ds