# ATLAS/ICESat-2 ATL08 Access and Visualize

Authors: Sumant Jha (MSFC/USRA), Alex Mandel (DevSeed), Jamison French (DevSeed), Sheyenne Kirkland (UAH)

Date: November 10, 2023

Description: In this example, we'll walk through how to access and explore ICESat-2 ATL08 data, as well as how to download it locally. Then, we will visualize the data.

## Run This Notebook

To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the ["Getting started with the MAAP"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.

Disclaimer: This tutorial uses an experimental feature that allows access to the DAAC without Earthdata Login. The feature only works inside MAAP's ADE, so the tutorial must be run there. Running it outside of the MAAP ADE will result in errors.

## About the Data

This data set (ATL08) contains along-track heights above the WGS84 ellipsoid (ITRF2014 reference frame) for the ground and canopy surfaces. The canopy and ground surfaces are processed in fixed 100 m data segments, which typically contain more than 100 signal photons. The data were acquired by the Advanced Topographic Laser Altimeter System (ATLAS) instrument on board the Ice, Cloud and land Elevation Satellite-2 (ICESat-2) observatory.

```
Parameter(s): CANOPY HEIGHT, TERRAIN ELEVATION
Platform(s): ICESat-2
Sensor(s): ATLAS
Data Format(s): HDF5
Temporal Coverage: 14 October 2018 to present
Temporal Resolution: 91 day
Spatial Resolution: Varies
Spatial Reference System(s): WGS 84 EPSG:4326
Spatial Coverage: N: 90 S: -90 E: 180 W: -180

(source: ATL08 v5 Dataset Landing Page, https://nsidc.org/data/atl08/versions/5)
```


## Additional Resources
- [Earthdata Search](https://search.earthdata.nasa.gov/search?q=ATL08&ac=true&lat=-35.84910426015104&long=-180.84375&zoom=1)
- [ATL08 v5 User Guide](https://nsidc.org/sites/default/files/atl08-v005-userguide_1_0.pdf)

## Importing and Installing Packages

The following example uses `maap-py`, `h5py`, and `h5glance`, along with a few supporting packages.
If any of these packages are missing, run the cell below to install them:

In [1]:
!pip install -q h5py h5glance requests rioxarray rasterio fsspec s3fs h5netcdf

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
boto3 1.28.73 requires botocore<1.32.0,>=1.31.73, but you have botocore 1.31.64 which is incompatible.

In [2]:
import os
import h5py
from maap.maap import MAAP
from h5glance import H5Glance
import requests
import xarray
import rasterio
from rasterio.session import AWSSession
import boto3
import fsspec

## Accessing the Data

In this section we'll walk through two different ways to access the data. The first method is to access it by using `maap-py`, then download the data locally. From there, we will explore some of the data using `h5glance`.

The second method is to access the data through S3.

### Example 1: Download Data Locally

Now that we have imported the relevant packages, let's put them to use. We will search NASA's Common Metadata Repository (CMR), hosted at `cmr.earthdata.nasa.gov`, for ICESat-2 data and then download it. 
The concept ID for ICESat-2 ATL08 can be found by searching for 'ATL08' at https://search.earthdata.nasa.gov/search and checking the metadata of the matching result. In this case, the concept ID is `C2153574670-NSIDC_CPRD`.  

For this example, we will use the granule ID `ATL08_20211114213015_08161305_005_01`. The file is in HDF5 format. 

With all this information in hand, we are ready to make a query to cmr.earthdata.nasa.gov using maap-py. 
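For orientation, the search arguments used in the next cell resemble the query parameters of CMR's public granule search endpoint. The sketch below only builds such a URL and does not send a request. The helper name `build_cmr_granule_url` is hypothetical, and the exact request that maap-py sends is not shown in this tutorial, so treat the mapping as an assumption.

```python
from urllib.parse import urlencode


def build_cmr_granule_url(cmr_host, collection_concept_id, granule_name, page_size=100):
    """Build a CMR granule search URL (hypothetical helper, for illustration only)."""
    query = urlencode({
        "collection_concept_id": collection_concept_id,
        "readable_granule_name": granule_name,
        "page_size": page_size,
    })
    return f"https://{cmr_host}/search/granules.json?{query}"


url = build_cmr_granule_url(
    "cmr.earthdata.nasa.gov",
    "C2153574670-NSIDC_CPRD",
    "ATL08_20211114213015_08161305_005_01.h5",
)
print(url)
```

The tutorial itself relies on `maap.searchGranule`, shown next, rather than on a hand-built URL.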


In [3]:
maap = MAAP(maap_host='api.maap-project.org')

nasa_host = "cmr.earthdata.nasa.gov"
results = maap.searchGranule(cmr_host=nasa_host,
                             concept_id="C2153574670-NSIDC_CPRD", 
                             readable_granule_name="ATL08_20211114213015_08161305_005_01.h5", 
                             limit=100)

Let's see how this turned out. Did we get a result?

In [4]:
results[0]

{'concept-id': 'G2208041838-NSIDC_CPRD',
 'collection-concept-id': 'C2153574670-NSIDC_CPRD',
 'revision-id': '1',
 'format': 'application/echo10+xml',
 'Granule': {'GranuleUR': 'ATL08_20211114213015_08161305_005_01.h5',
  'InsertTime': '2022-01-26T23:08:01.838Z',
  'LastUpdate': '2022-01-26T23:08:01.838Z',
  'Collection': {'DataSetId': 'ATLAS/ICESat-2 L3A Land and Vegetation Height V005'},
  'DataGranule': {'SizeMBDataGranule': '15.952223777770996',
   'ProducerGranuleId': 'ATL08_20211114213015_08161305_005_01.h5',
   'DayNightFlag': 'UNSPECIFIED',
   'ProductionDateTime': '2021-12-21T01:54:36.000Z'},
  'Temporal': {'RangeDateTime': {'BeginningDateTime': '2021-11-14T21:30:15.724Z',
    'EndingDateTime': '2021-11-14T21:35:25.694Z'}},
  'Spatial': {'HorizontalSpatialDomain': {'Orbit': {'AscendingCrossing': '-169.8753286285306',
     'StartLat': '80',
     'StartDirection': 'D',
     'EndLat': '59.5',
     'EndDirection': 'D'}}},
  'OrbitCalculatedSpatialDomains': {'OrbitCalculatedSpatial

Looks like we did get a result, and the available metadata tells us a lot about it. Let's download the HDF5 file locally.

In [5]:
s3_url = results[0]['Granule']['OnlineAccessURLs']['OnlineAccessURL'][1]['URL']
s3_url

's3://nsidc-cumulus-prod-protected/ATLAS/ATL08/005/2021/11/14/ATL08_20211114213015_08161305_005_01.h5'
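The cell above picks the S3 link by its position (`[1]`) in the list of access URLs. If that order ever differs between granules, selecting by URL scheme may be more robust. The following is a minimal sketch under that assumption. The helper name `pick_url_by_scheme` is hypothetical, and the `https` entry in the sample record is a made-up placeholder rather than a real link.

```python
def pick_url_by_scheme(granule, scheme="s3"):
    """Return the first access URL that uses the given scheme (hypothetical helper)."""
    urls = granule["Granule"]["OnlineAccessURLs"]["OnlineAccessURL"]
    # Assumption: a single entry may arrive as a dict instead of a list
    if isinstance(urls, dict):
        urls = [urls]
    for entry in urls:
        if entry["URL"].startswith(f"{scheme}://"):
            return entry["URL"]
    raise LookupError(f"no {scheme}:// URL found in granule metadata")


# Sample record shaped like results[0]; the https URL is a placeholder
sample_granule = {
    "Granule": {
        "OnlineAccessURLs": {
            "OnlineAccessURL": [
                {"URL": "https://example.com/ATL08_20211114213015_08161305_005_01.h5"},
                {"URL": "s3://nsidc-cumulus-prod-protected/ATLAS/ATL08/005/2021/11/14/ATL08_20211114213015_08161305_005_01.h5"},
            ]
        }
    }
}
print(pick_url_by_scheme(sample_granule))
```

In the notebook, the same helper could be called as `pick_url_by_scheme(results[0])`.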

In [6]:
data_file = results[0]

Create a local directory to store the data file, then download the file and display its path. 

In [7]:
dataDir = './data'
os.makedirs(dataDir, exist_ok=True)  # create the directory if it does not exist yet
data = data_file.getData(dataDir)  # download the granule and return its local path
data

'./data/ATL08_20211114213015_08161305_005_01.h5'
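The file name itself carries useful information. The sketch below splits it into fields, assuming the ICESat-2 naming pattern `ATL08_[yyyymmddhhmmss]_[ttttccss]_[vvv]_[rr].h5`, where `tttt` is the reference ground track, `cc` the cycle, `ss` the granule region, `vvv` the version, and `rr` the revision. The helper name `parse_granule_name` is hypothetical, and the field meanings should be checked against the ATL08 User Guide.

```python
import os
import re
from datetime import datetime

# Assumed naming pattern: ATL08_[yyyymmddhhmmss]_[ttttccss]_[vvv]_[rr].h5
GRANULE_PATTERN = re.compile(
    r"^(?P<product>ATL\d{2})_(?P<start>\d{14})_"
    r"(?P<rgt>\d{4})(?P<cycle>\d{2})(?P<region>\d{2})_"
    r"(?P<version>\d{3})_(?P<revision>\d{2})(?:\.h5)?$"
)


def parse_granule_name(path):
    """Split a granule file name into its assumed components (hypothetical helper)."""
    name = os.path.basename(path)
    match = GRANULE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected granule name: {name}")
    fields = match.groupdict()
    fields["start"] = datetime.strptime(fields["start"], "%Y%m%d%H%M%S")
    for key in ("rgt", "cycle", "region"):
        fields[key] = int(fields[key])
    return fields


info = parse_granule_name("./data/ATL08_20211114213015_08161305_005_01.h5")
print(info)
```

The parsed start time can be compared with `BeginningDateTime` in the granule metadata shown earlier as a quick consistency check.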

### Example 2: Accessing the Data with S3

Since the NSIDC DAAC does not have federated token access, we will use role assumption to gain access to the data. This experimental feature reads a role ARN from a stored parameter, assumes that role, and obtains temporary credentials for it. We'll also pass those credentials to fsspec so we can later use xarray for data exploration.

In [8]:
def assume_role_credentials(ssm_parameter_name):
    # Create a session using your current credentials
    session = boto3.Session()

    # Retrieve the SSM parameter
    ssm = session.client('ssm', "us-west-2")
    parameter = ssm.get_parameter(
        Name=ssm_parameter_name, 
        WithDecryption=True
    )
    parameter_value = parameter['Parameter']['Value']

    # Assume the DAAC access role
    sts = session.client('sts')
    assumed_role_object = sts.assume_role(
        RoleArn=parameter_value,
        RoleSessionName='TutorialSession'
    )

    # From the response that contains the assumed role, get the temporary 
    # credentials that can be used to make subsequent API calls
    credentials = assumed_role_object['Credentials']

    return credentials

def fsspec_access(credentials):
    # Pass assumed role credentials into fsspec
    return fsspec.filesystem(
        "s3",
        key=credentials['AccessKeyId'],
        secret=credentials['SecretAccessKey'],
        token=credentials['SessionToken']
    )

In [9]:
s3_fsspec = fsspec_access(assume_role_credentials("/iam/maap-data-reader"))
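The credentials returned by `assume_role` are temporary, and the `Credentials` dictionary includes an `Expiration` timestamp alongside the keys used above. In a long session it can help to check that timestamp before reading from S3 again. This is a minimal sketch; the helper name `credentials_expiring` and the five-minute margin are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def credentials_expiring(credentials, margin=timedelta(minutes=5), now=None):
    """Return True if the credentials expire within `margin` (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    # 'Expiration' is expected to be a timezone-aware datetime
    return credentials["Expiration"] - now <= margin


# Stand-in for the dictionary returned by assume_role_credentials()
example_credentials = {"Expiration": datetime.now(timezone.utc) + timedelta(hours=1)}
print(credentials_expiring(example_credentials))
```

If the check returns `True`, calling `assume_role_credentials` and `fsspec_access` again should produce a fresh filesystem object.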

## Exploring the Data

There are two different ways we'll open and look at the data:
1. Using h5py
2. Using xarray

### 1. H5py and H5glance

Let's check the available keys and structure of the HDF5 file using h5py and h5glance. 

Open the file and list the keys:

In [10]:
atl08_file = h5py.File(data,'r')
list(atl08_file.keys())

['METADATA',
 'ancillary_data',
 'ds_geosegments',
 'ds_metrics',
 'ds_surf_type',
 'gt1l',
 'gt1r',
 'gt2l',
 'gt2r',
 'gt3l',
 'gt3r',
 'orbit_info',
 'quality_assessment']
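The six `gt*` keys are the ground-track groups, one per ATLAS beam; the remaining keys hold metadata and ancillary information. As a small sketch, the group paths for every beam can be derived from the key list, which is handy when repeating an analysis across beams. The helper name `beam_groups` is hypothetical, and the `land_segments` subgroup is the one used later in this tutorial.

```python
import re

# Ground-track groups are named gt1l, gt1r, gt2l, gt2r, gt3l, gt3r
BEAM_PATTERN = re.compile(r"^gt[1-3][lr]$")


def beam_groups(keys, subgroup="land_segments"):
    """Return '<beam>/<subgroup>' paths for every ground-track key (hypothetical helper)."""
    return [f"{key}/{subgroup}" for key in keys if BEAM_PATTERN.match(key)]


keys = ['METADATA', 'ancillary_data', 'ds_geosegments', 'ds_metrics',
        'ds_surf_type', 'gt1l', 'gt1r', 'gt2l', 'gt2r', 'gt3l', 'gt3r',
        'orbit_info', 'quality_assessment']
print(beam_groups(keys))
```

With the open file from the cell above, the same call would be `beam_groups(atl08_file.keys())`.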

Use the H5Glance module to interactively browse all available variables and fields that can be used for further analysis and visualization.

In [11]:
H5Glance(atl08_file)

### 2. xarray

By using xarray, we can open a specific group within the HDF5 file. Note that we found the group using H5glance above.

In [12]:
nsidc_object = "s3://nsidc-cumulus-prod-protected/ATLAS/ATL08/005/2021/11/14/ATL08_20211114213015_08161305_005_01.h5"
with s3_fsspec.open(nsidc_object) as f:
    # Load the group into memory so it remains usable after the file is closed
    atl08_track = xarray.open_dataset(f, group='gt1l/land_segments', engine="h5netcdf", phony_dims='sort').load()
atl08_track

In [18]:
# nsidc_object = "s3://nsidc-cumulus-prod-protected/ATLAS/ATL08/005/2021/11/14/ATL08_20211114213015_08161305_005_01.h5"
# with s3_fsspec.open(nsidc_object) as f:
#     atl08_track = xarray.open_dataset(f, engine="h5netcdf", phony_dims='sort')
# atl08_track

## Visualizing the Data

By looking at the data variables, we can also create a visualization.

In [19]:
with s3_fsspec.open(nsidc_object) as f:
    # Load the group into memory so it remains usable after the file is closed
    atl08_track = xarray.open_dataset(f, group='gt1l/land_segments', engine="h5netcdf", phony_dims='sort').load()

df = atl08_track.to_dataframe()
df

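| ds_geosegments | ds_metrics | ds_surf_type |
|---|---|---|
| 1 | 1 | 1 |
| 1 | 1 | 2 |
| 1 | 1 | 3 |
| 1 | 1 | 4 |
| 1 | 1 | 5 |
| ... | ... | ... |
| 5 | 18 | 1 |
| 5 | 18 | 2 |
| 5 | 18 | 3 |
| 5 | 18 | 4 |
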

In [13]:
# Read canopy height and location for the gt1l beam from the local file
with h5py.File(data, 'r') as f:
    gt1l_can_h_m = f['/gt1l/land_segments/canopy/h_canopy'][:]
    gt1l_lat = f['/gt1l/land_segments/latitude'][:]
    gt1l_lon = f['/gt1l/land_segments/longitude'][:]
    gt1l_seg = f['/gt1l/land_segments/segment_id_beg'][:]

In [14]:
!pip install -q geopandas
import geopandas as gpd

geometry = gpd.points_from_xy(gt1l_lon, gt1l_lat)
columns = {'Latitude': gt1l_lat, 'Longitude': gt1l_lon, 'Canopy_Heights': gt1l_can_h_m, 'Segment_ID': gt1l_seg}
gdf = gpd.GeoDataFrame(columns, geometry=geometry, crs='EPSG:4326')
# Assumption: segments without a canopy height carry a very large fill value; drop them
gdf = gdf[gdf.Canopy_Heights < 1e30]
gdf

In [15]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(14, 4))
# Plot canopy height against segment ID, one point per segment
ax.plot(gdf.Segment_ID, gdf.Canopy_Heights, ls='', marker='.', ms=2)
ax.set_xlabel('Segment ID', fontsize=12)
ax.set_ylabel('Canopy Height (m)', fontsize=12)
ax.set_title('ICESat-2 ATL08', fontsize=14)
ax.tick_params(axis='both', which='major', labelsize=12)