## Accessing ICESat-2 Data
### Data Query and Download Example Notebook
This notebook illustrates the use of icepyx for ICESat-2 data access and download from the NASA NSIDC DAAC (NASA National Snow and Ice Data Center Distributed Active Archive Center).

#### Credits
* notebook by: Jessica Scheick and Amy Steiker


### Import packages, including icepyx

In [1]:
from icepyx import is2class as ipd
import os
import shutil

In [2]:
!pwd

/Users/jessica/Scripts/github/icesat2py/icepyx/doc/examples


### Create an ICESat-2 data object with the desired search parameters

There are three required inputs:
- `short_name` = the dataset of interest, known as its "short name".
See https://nsidc.org/data/icesat-2/data-sets for a list of the available datasets.
- `spatial_extent` = a bounding box to search within. Given in decimal degrees as the lower left longitude, lower left latitude, upper right longitude, and upper right latitude. Future development will enable alternative spatial extent inputs, such as a polygon.
- `date_range` = the date range for which you would like to search for results. Must be formatted as a list of two 'YYYY-MM-DD' strings (start date, end date).
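
The input formats described above can be checked before the data object is created. The following is a minimal sketch only: `check_search_inputs` is a hypothetical helper that is not part of icepyx, and the specific range checks are assumptions about what makes a sensible bounding box and date range.

```python
from datetime import datetime


def check_search_inputs(spatial_extent, date_range):
    """Hypothetical helper (not part of icepyx): sanity-check the search inputs."""
    # Expected order: lower left lon, lower left lat, upper right lon, upper right lat
    if len(spatial_extent) != 4:
        raise ValueError("spatial_extent needs four values")
    ll_lon, ll_lat, ur_lon, ur_lat = spatial_extent
    if not (-180 <= ll_lon <= 180 and -180 <= ur_lon <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= ll_lat <= 90 and -90 <= ur_lat <= 90):
        raise ValueError("latitudes must be within [-90, 90]")
    if ll_lat > ur_lat:
        raise ValueError("lower left latitude exceeds upper right latitude")

    # Expected form: [start date, end date] as 'YYYY-MM-DD' strings
    if len(date_range) != 2:
        raise ValueError("date_range needs a start date and an end date")
    start, end = (datetime.strptime(d, "%Y-%m-%d") for d in date_range)
    if start > end:
        raise ValueError("start date is after end date")
    return True


check_search_inputs([-64, 66, -55, 72], ["2019-02-22", "2019-02-28"])
```

Longitude ordering is deliberately not checked here, since a box that crosses the antimeridian could legitimately have a lower left longitude larger than the upper right one.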

In [3]:
short_name = 'ATL06'
spatial_extent = [-64, 66, -55, 72]
date_range = ['2019-02-22','2019-02-28']

In [4]:
region_a = ipd.Icesat2Data(short_name, spatial_extent, date_range)

print(region_a.dataset)
print(region_a.dates)
print(region_a.start_time)
print(region_a.end_time)
print(region_a.dataset_version)
print(region_a.spatial_extent)

ATL06
['2019-02-22', '2019-02-28']
00:00:00
23:59:59
002
['bounding box', [-64, 66, -55, 72]]


There are also several optional inputs to allow the user finer control over their search.
- `start_time` = start time to search for data on the start date. If no input is given, this defaults to 00:00:00.
- `end_time` = end time for the end date of the temporal search parameter. If no input is given, this defaults to 23:59:59.
- `version` = the version of the dataset to use, input as a numerical string. If no input is given, this value defaults to the most recent version of the dataset specified in `short_name`.

Both times must be input as 'HH:mm:ss' strings.

In [6]:
region_a = ipd.Icesat2Data(short_name, spatial_extent, date_range,
                           start_time='03:30:00', end_time='21:30:00', version='001')

print(region_a.dataset)
print(region_a.dates)
print(region_a.start_time)
print(region_a.end_time)
print(region_a.version)
print(region_a.spatial_extent)

ATL06
['2019-02-22', '2019-02-28']
03:30:00
21:30:00
001
['bounding box', [-64, 66, -55, 72]]




Alternatively, you can create the data object directly, without first defining named variables.

In [17]:
region_a = ipd.Icesat2Data('ATL06', [-64, 66, -55, 72], ['2019-02-22', '2019-02-28'],
                           start_time='00:00:00', end_time='23:59:59', version='002')

### Built-in methods allow us to get more information about our dataset
In addition to viewing the stored object information shown above (e.g. dataset, start and end date and time, version, etc.), we can also request summary information about the dataset itself or confirm that we have manually specified the latest version.

In [8]:
print(region_a.about_dataset())
print(region_a.latest_version())

{'feed': {'updated': '2019-11-15T17:05:25.845Z', 'id': 'https://cmr.earthdata.nasa.gov:443/search/collections.json?short_name=ATL06', 'title': 'ECHO dataset metadata', 'entry': [{'processing_level_id': 'Level 3', 'boxes': ['-90 -180 90 180'], 'time_start': '2018-10-14T00:00:00.000Z', 'version_id': '001', 'dataset_id': 'ATLAS/ICESat-2 L3A Land Ice Height V001', 'has_spatial_subsetting': True, 'has_transforms': False, 'associations': {'services': ['S1568899363-NSIDC_ECS', 'S1613689509-NSIDC_ECS', 'S1613669681-NSIDC_ECS']}, 'has_variables': True, 'data_center': 'NSIDC_ECS', 'short_name': 'ATL06', 'organizations': ['NASA NSIDC DAAC', 'NASA/GSFC/EOS/ESDIS'], 'title': 'ATLAS/ICESat-2 L3A Land Ice Height V001', 'coordinate_system': 'CARTESIAN', 'summary': 'This data set (ATL06) provides geolocated, land-ice surface heights (above the WGS 84 ellipsoid, ITRF2014 reference frame), plus ancillary parameters that can be used to interpret and assess the quality of the height estimates. The data wer

### Querying a dataset
In order to search the dataset collection for available data granules, we need to build our search parameters. These are formatted as a dictionary of key:value pairs according to the CMR documentation.

In [18]:
region_a.build_CMR_params()

In [19]:
#view the parameters that will be submitted in our query
region_a.CMRparams

{'short_name': 'ATL06',
 'version': '002',
 'temporal': '2019-02-22T00:00:00Z,2019-02-28T23:59:59Z',
 'bounding_box': '-64,66,-55,72'}
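
To make the mapping from the search inputs to this dictionary concrete, here is a minimal sketch that reproduces the printed values. `sketch_cmr_params` is a hypothetical function written for illustration; it is not the code behind `build_CMR_params`, and the string formats are simply inferred from the output shown above.

```python
def sketch_cmr_params(short_name, version, spatial_extent, date_range,
                      start_time="00:00:00", end_time="23:59:59"):
    """Illustrative only: mimic the CMR parameter dictionary printed above."""
    # 'temporal' joins start and end timestamps with a comma
    temporal = "{}T{}Z,{}T{}Z".format(
        date_range[0], start_time, date_range[1], end_time
    )
    # 'bounding_box' is the four extent values as one comma-separated string
    bounding_box = ",".join(str(value) for value in spatial_extent)
    return {
        "short_name": short_name,
        "version": version,
        "temporal": temporal,
        "bounding_box": bounding_box,
    }


params = sketch_cmr_params(
    "ATL06", "002", [-64, 66, -55, 72], ["2019-02-22", "2019-02-28"]
)
print(params)
```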

Now that our parameter dictionary is constructed, we can search the CMR database for the available granules. Granules returned by the search are automatically stored within the data object.

In [20]:
#search for available granules
region_a.avail_granules()

the pnum is about to be calculated
0


{'Number of available granules': 7,
 'Average size of granules (MB)': 35.445385387971434,
 'Total size of all granules (MB)': 248.11769771580003}

In [21]:
#print the information about the returned search results
region_a.granules

[{'producer_granule_id': 'ATL06_20190223130150_08720203_002_01.h5',
  'time_start': '2019-02-23T13:03:39.000Z',
  'orbit': {'ascending_crossing': '-52.55767101081557',
   'start_lat': '59.5',
   'start_direction': 'A',
   'end_lat': '80',
   'end_direction': 'A'},
  'updated': '2019-10-24T13:10:31.738Z',
  'orbit_calculated_spatial_domains': [{'equator_crossing_date_time': '2019-02-23T12:46:18.829Z',
    'equator_crossing_longitude': '-52.55767101081557',
    'orbit_number': '2460'}],
  'dataset_id': 'ATLAS/ICESat-2 L3A Land Ice Height V002',
  'data_center': 'NSIDC_ECS',
  'title': 'SC:ATL06.002:166252035',
  'coordinate_system': 'ORBIT',
  'time_end': '2019-02-23T13:07:02.000Z',
  'id': 'G1647395161-NSIDC_ECS',
  'original_format': 'ISO-SMAP',
  'granule_size': '20.3570375443',
  'browse_flag': True,
  'polygons': [['66.1487718813469 -61.827543324880494 78.94854746717458 -68.90106547515634 79.00768017806882 -67.23509872799909 66.17674542277184 -61.02898525548528 66.1487718813469 -61.
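
Each granule record carries a `granule_size` string, which is presumably what the summary returned by `avail_granules()` is computed from. A minimal sketch of such a summary follows. `summarize_granules` is a hypothetical helper rather than the icepyx implementation, and the two sample records use made-up sizes.

```python
def summarize_granules(granules):
    """Hypothetical helper: count, average, and total size from granule records."""
    # 'granule_size' is stored as a string, so convert before doing arithmetic
    sizes = [float(granule["granule_size"]) for granule in granules]
    total = sum(sizes)
    average = total / len(sizes) if sizes else 0.0
    return {
        "Number of available granules": len(sizes),
        "Average size of granules (MB)": average,
        "Total size of all granules (MB)": total,
    }


# Made-up records that only contain the field used here
sample_granules = [{"granule_size": "20.0"}, {"granule_size": "30.0"}]
summary = summarize_granules(sample_granules)
print(summary)
```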

### Downloading the found granules
In order to download any data from NSIDC, we must first authenticate ourselves using a valid Earthdata login. This creates a valid token to interface with the DAAC and starts an active logged-in session to enable data download. The token is attached to the data object and stored, but the session must be passed to the download function. The password is typed at a prompt that does not display it; to the authors' knowledge, it is not stored in plain text.

In [22]:
earthdata_uid = ''
email = ''
session=region_a.earthdata_login(earthdata_uid, email)

Earthdata Login password:  ········


Once we have generated our session, we must build the required configuration parameters needed to actually download data. These will tell the system how we want to download the data.
- `page_size` = 10. This is the number of granules we will request per order.
- `page_num` = 1. This is the number of pages, which is also the number of individual orders we will request. If no `page_num` is specified, it is calculated automatically from `page_size` and the number of granules available.
- `request_mode` = 'async'
- `agent` = 'NO'
- `include_meta` = 'Y'
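
The automatic `page_num` calculation described above amounts to a ceiling division. A minimal sketch, assuming that is all the calculation does (the function name `sketch_page_num` is hypothetical and not part of icepyx):

```python
import math


def sketch_page_num(n_granules, page_size=10):
    """Hypothetical helper: number of pages (orders) needed for n_granules."""
    # Every page holds up to page_size granules; a partial page still counts
    return math.ceil(n_granules / page_size)


# The search in this notebook returned 7 granules
print(sketch_page_num(7, page_size=9))  # 1
```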

#### More details about the configuration parameters
`request_mode` is "synchronous" by default, meaning that the request relies on a direct, continuous connection between you and the API endpoint. Outputs are directly downloaded, or "streamed", to your working directory. For this tutorial, we will set the request mode to asynchronous, which allows concurrent requests to be queued and processed without the need for a continuous connection.

**Use the streaming `request_mode` with caution: While it can be beneficial to stream outputs directly to your local directory, note that timeout errors can result depending on the size of the request, and your request will not be queued in the system if NSIDC is experiencing high request volume. For best performance, I recommend setting `page_size=1` to download individual outputs, which will eliminate extra time needed to zip outputs and will ensure faster processing times per request. An example streaming request loop is available at the bottom of the tutorial below.**

Recall that we queried the total number and volume of granules prior to applying customization services. `page_size` and `page_num` can be used to adjust the number of granules per request up to a limit of 2000 granules for asynchronous, and 100 granules for synchronous (streaming). For now, let's select 9 granules to be processed in each zipped request. For ATL06, the granule size can exceed 100 MB so we want to choose a granule count that provides us with a reasonable zipped download size.
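
One way to pick a granule count is to divide a target download size by the average granule size and cap the result at the per-request limits quoted above. The sketch below does this. `suggest_page_size`, the `"sync"` label, and the 400 MB target are all assumptions made for illustration, and the zipped size will differ from the raw total.

```python
# Per-request granule limits quoted in the text; the "sync" key is our own label
MAX_GRANULES_PER_REQUEST = {"async": 2000, "sync": 100}


def suggest_page_size(avg_granule_mb, target_mb, request_mode="async"):
    """Hypothetical helper: granules per request for a target download size."""
    limit = MAX_GRANULES_PER_REQUEST[request_mode]
    # Number of average-sized granules that fit within the target size
    fits = int(target_mb // avg_granule_mb)
    # Always request at least one granule, never more than the limit
    return max(1, min(limit, fits))


# Average granule size of about 35.4 MB, as reported by the search above
print(suggest_page_size(35.4, 400))
```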

If no keyword inputs are entered into the build function for these parameters, default values will be used. We must also specify that we would like to build the required configuration parameters for downloading (versus those needed for a 'search').

In [23]:
region_a.build_reqconfig_params('download', page_size=9)

the pnum is about to be calculated
1


In [24]:
region_a.reqparams

{'page_size': 9,
 'page_num': 1,
 'email': 'jessica.scheick@maine.edu',
 'token': 'E175B383-E4B9-57E2-7F53-C901D287A528',
 'request_mode': 'async',
 'agent': 'NO',
 'include_meta': 'Y'}

#### Place the order
Then, we can send the order to NSIDC by providing our active session to the order_granules function. Information about the granules ordered and their status will be printed automatically as well as emailed to the address provided. Additional information on the order, including request URLs, can be viewed by setting the optional keyword input 'verbose' to True.

In [25]:
region_a.order_granules(session)
#region_a.order_granules(session, verbose=True)

Order:  1
Request HTTP response:  201
Order request URL:  https://n5eil02u.ecs.nsidc.org/egi/request?short_name=ATL06&version=002&temporal=2019-02-22T00%3A00%3A00Z%2C2019-02-28T23%3A59%3A59Z&bounding_box=-64%2C66%2C-55%2C72&page_size=9&page_num=1&email=jessica.scheick%40maine.edu&token=E175B383-E4B9-57E2-7F53-C901D287A528&request_mode=async&agent=NO&include_meta=Y
Order request response XML content:  b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n<eesi:agentResponse xsi:schemaLocation="http://eosdis.nasa.gov/esi/rsp/e https://newsroom.gsfc.nasa.gov/esi/8.1/schemas/ESIAgentResponseExternal.xsd" xmlns="" xmlns:iesi="http://eosdis.nasa.gov/esi/rsp/i" xmlns:ssw="http://newsroom.gsfc.nasa.gov/esi/rsp/ssw" xmlns:eesi="http://eosdis.nasa.gov/esi/rsp/e" xmlns:esi="http://eosdis.nasa.gov/esi/rsp" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n    <order>\n        <orderId>5000000416602</orderId>\n        <Instructions>You may receive an email about your order if you specifi
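
The order request URL printed above is the base endpoint followed by the search and configuration parameters as a percent-encoded query string. The sketch below rebuilds a URL of the same shape with the standard library. It is not necessarily how icepyx assembles the request, and the email and token values are placeholders.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the printed order request URL
base_url = "https://n5eil02u.ecs.nsidc.org/egi/request"

# Search parameters followed by configuration parameters (placeholder credentials)
params = {
    "short_name": "ATL06",
    "version": "002",
    "temporal": "2019-02-22T00:00:00Z,2019-02-28T23:59:59Z",
    "bounding_box": "-64,66,-55,72",
    "page_size": 9,
    "page_num": 1,
    "email": "user@example.com",   # placeholder
    "token": "PLACEHOLDER-TOKEN",  # placeholder
    "request_mode": "async",
    "agent": "NO",
    "include_meta": "Y",
}

# urlencode percent-encodes ':' as %3A, ',' as %2C and '@' as %40
request_url = base_url + "?" + urlencode(params)
print(request_url)
```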

In [26]:
#view a short list of order IDs
region_a.orderIDs

['5000000416602']

#### Download the order
Finally, we can download our order to a specified directory. The directory needs to be given as a full path but does not have to exist yet. The download status is printed as the program runs, and additional information is again available through the optional boolean keyword `verbose`.
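
If you prefer to resolve the full path and create the directory yourself before downloading, the standard library is enough. This is a minimal sketch: `prepare_download_dir` is a hypothetical helper, not an icepyx function.

```python
import os


def prepare_download_dir(path):
    """Hypothetical helper: return the full path, creating the folder if needed."""
    # Expand '~' and turn a relative path into an absolute one
    full_path = os.path.abspath(os.path.expanduser(path))
    # exist_ok=True makes this safe to call when the folder already exists
    os.makedirs(full_path, exist_ok=True)
    return full_path
```

The returned string can then be used as the `path` argument in the download cell.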

In [35]:
path = './downloads'
region_a.download_granules(session, path)

Order:  1
Request HTTP response:  201
Order request URL:  https://n5eil02u.ecs.nsidc.org/egi/request?short_name=ATL06&version=002&temporal=2019-02-22T00%3A00%3A00Z%2C2019-02-28T23%3A59%3A59Z&bounding_box=-64%2C66%2C-55%2C72&email=jessica.scheick%40maine.edu&token=464C6E30-135C-0274-5E50-8DA341E5D482&page_size=10&page_num=1&request_mode=async&agent=NO&include_meta=Y
Order request response XML content:  b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n<eesi:agentResponse xsi:schemaLocation="http://eosdis.nasa.gov/esi/rsp/e https://newsroom.gsfc.nasa.gov/esi/8.1/schemas/ESIAgentResponseExternal.xsd" xmlns="" xmlns:iesi="http://eosdis.nasa.gov/esi/rsp/i" xmlns:ssw="http://newsroom.gsfc.nasa.gov/esi/rsp/ssw" xmlns:eesi="http://eosdis.nasa.gov/esi/rsp/e" xmlns:esi="http://eosdis.nasa.gov/esi/rsp" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n    <order>\n        <orderId>5000000415433</orderId>\n        <Instructions>You may receive an email about your order if you specif

#### Clean up the download folder by removing individual order folders:

In [31]:
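#Clean up the download folder by moving all files to the top level
#and then removing the emptied order folders

for root, dirs, files in os.walk(path, topdown=False):
    for file in files:
        try:
            shutil.move(os.path.join(root, file), path)
        except OSError:
            #the file is already in the top-level folder (or could not be moved)
            pass

#walk bottom-up so that nested folders are removed before their parents
for root, dirs, files in os.walk(path, topdown=False):
    for name in dirs:
        os.rmdir(os.path.join(root, name))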

Elements to develop further/implement (and then include in the example, as in Amy's tutorial):
- customization: subsetting and reformatting (check for options and include with order)
- polygon visualization
- input of polygon (including simplification steps) instead of bounding box
- more information/details on the above steps for a novice user