# Customize and Access NSIDC DAAC Data

This notebook will walk you through how to programmatically access data from the NASA National Snow and Ice Data Center Distributed Active Archive Center (NSIDC DAAC) using spatial and temporal filters, as well as how to request customization services including subsetting, reformatting, and reprojection. No Python experience is necessary; each code cell will prompt you with the information needed to configure your data request. The notebook will print the resulting API command that can be used in a command line, browser, or in Python as executed below.

### Import packages


In [1]:
import requests
import getpass
import socket 
import json
import zipfile
import io
import math
import os
import shutil
import pprint
import re
import time
import geopandas as gpd
import fiona
import matplotlib.pyplot as plt
# To read KML files with geopandas, we will need to enable KML support in fiona (disabled by default)
fiona.drvsupport.supported_drivers['LIBKML'] = 'rw'
from shapely.geometry import Polygon, mapping
from shapely.geometry.polygon import orient
from statistics import mean
from requests.auth import HTTPBasicAuth
%matplotlib inline

### Input Earthdata Login credentials

An Earthdata Login account is required to access data from the NSIDC DAAC. If you do not already have an Earthdata Login account, visit https://urs.earthdata.nasa.gov to register.

In [2]:
my_credential_path = "../auth.json"
with open(my_credential_path, 'r') as infile:
    my_credentials = json.load(infile)
    
uid = my_credentials['username'] # Enter Earthdata Login user name
pswd = my_credentials['password'] # Enter Earthdata Login password
email = my_credentials['email'] # Enter Earthdata login email 
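
The cell above assumes that `auth.json` contains `username`, `password`, and `email` keys. A small check can fail early with a clearer message if one of them is missing. The following is a minimal sketch: the helper name `load_credentials` and the `REQUIRED_KEYS` tuple are hypothetical, and the demo writes a throwaway file with placeholder values instead of reading real credentials.

```python
import json
import os
import tempfile

# Hypothetical constant: the key names mirror those read in the cell above.
REQUIRED_KEYS = ("username", "password", "email")


def load_credentials(path):
    """Load a JSON credentials file and check that the expected keys exist."""
    with open(path, "r") as infile:
        credentials = json.load(infile)
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError(f"Missing keys in {path}: {', '.join(missing)}")
    return credentials


# Demo with a temporary file holding placeholder values (not real credentials).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "auth.json")
    with open(demo_path, "w") as outfile:
        json.dump(
            {"username": "demo", "password": "secret", "email": "demo@example.com"},
            outfile,
        )
    demo_credentials = load_credentials(demo_path)

print(sorted(demo_credentials))
```

In the notebook itself, the call would presumably be `load_credentials(my_credential_path)`.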


## Input data

In [31]:
# Get json response from CMR collection metadata
short_name = 'MOD15A2H'
params = {
    'short_name': short_name
}

cmr_collections_url = 'https://cmr.earthdata.nasa.gov/search/collections.json'
response = requests.get(cmr_collections_url, params=params)
results = json.loads(response.content)

# Find all instances of 'version_id' in metadata and print most recent version number

versions = [el['version_id'] for el in results['feed']['entry']]
latest_version = max(versions)
print('The most recent version of ', short_name, ' is ', latest_version)

The most recent version of  MOD15A2H  is  061
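
Note that `max(versions)` compares the version strings character by character. That works while every version ID is zero-padded to the same length, but it can misorder mixed values such as `'6'` and `'061'`. The sketch below picks the version by numeric value instead. The helper name `latest_numeric_version` and the example list are hypothetical, and skipping non-numeric entries is an assumption of this sketch, not documented CMR behavior.

```python
def latest_numeric_version(versions):
    """Return the version string with the largest numeric value.

    Entries that are not plain digit strings are skipped; this filtering
    rule is an assumption of the sketch.
    """
    numeric = [v for v in versions if v.isdigit()]
    if not numeric:
        raise ValueError("No numeric version identifiers found")
    return max(numeric, key=int)


# Hypothetical mix of version identifiers, for illustration only.
example_versions = ['6', '061', '006', 'NA']

# Plain string comparison versus numeric comparison.
string_max = max(v for v in example_versions if v.isdigit())
numeric_max = latest_numeric_version(example_versions)
print(string_max, numeric_max)
```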


### Select time period of interest

In [32]:
#Input temporal range 

start_date = '2015-03-31' # input('Input start date in yyyy-MM-dd format: ')
start_time = '00:00:00' # input('Input start time in HH:mm:ss format: ')
end_date = '2022-03-30' # input('Input end date in yyyy-MM-dd format: ')
end_time = '00:00:00' # input('Input end time in HH:mm:ss format: ')

temporal = start_date + 'T' + start_time + 'Z' + ',' + end_date + 'T' + end_time + 'Z'
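
Since the `temporal` string is assembled by plain concatenation, a typo in a date or time only surfaces when the search returns nothing. As a minimal sketch, the hypothetical helper `build_temporal` below parses the four inputs with `datetime` before building the same `start,end` string, and rejects a range whose end precedes its start.

```python
from datetime import datetime


def build_temporal(start_date, start_time, end_date, end_time):
    """Validate the inputs and return 'startZ,endZ' in the format used above."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    # strptime raises ValueError if a value does not match the expected format.
    start = datetime.strptime(f"{start_date}T{start_time}", fmt)
    end = datetime.strptime(f"{end_date}T{end_time}", fmt)
    if end < start:
        raise ValueError("End of the temporal range precedes its start")
    return f"{start.strftime(fmt)}Z,{end.strftime(fmt)}Z"


temporal_checked = build_temporal('2015-03-31', '00:00:00', '2022-03-30', '00:00:00')
print(temporal_checked)  # 2015-03-31T00:00:00Z,2022-03-30T00:00:00Z
```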

### Select area of interest

#### Select bounding box or shapefile entry

For all data sets, you can enter a bounding box to be applied to your file search. If you are interested in ICESat-2 data, you may also apply a spatial boundary based on a vector-based spatial data file.

In [33]:
# Enter spatial coordinates in decimal degrees, with west longitude and south latitude reported as negative degrees. Do not include spaces between coordinates.
# Example over the state of Colorado: -109,37,-102,41
bounding_box = '147.534,-35.324,147.535,-35.323' #input('Input spatial coordinates in the following order: lower left longitude,lower left latitude,upper right longitude,upper right latitude. Leave blank if you wish to provide a vector-based spatial file for ICESat-2 search and subsetting:')
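
The bounding box is passed as a single comma-separated string, so the coordinate order is easy to get wrong. The following sketch parses the string and checks that the values are plausible. The helper name `parse_bounding_box` is hypothetical. The sketch deliberately does not require west to be smaller than east, so that a box crossing the antimeridian is not rejected.

```python
def parse_bounding_box(bounding_box):
    """Parse 'west,south,east,north' and check the values are plausible."""
    parts = bounding_box.split(',')
    if len(parts) != 4:
        raise ValueError("Expected four comma-separated values")
    west, south, east, north = (float(part) for part in parts)
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("Longitudes must lie between -180 and 180")
    if not (-90 <= south <= north <= 90):
        raise ValueError("Latitudes must lie between -90 and 90, south first")
    return west, south, east, north


box = parse_bounding_box('147.534,-35.324,147.535,-35.323')
print(box)
```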

### Determine how many granules exist over this time and area of interest.

In [35]:
# Create CMR parameters used for granule search. Modify params depending on bounding_box or polygon input.

granule_search_url = 'https://cmr.earthdata.nasa.gov/search/granules'
aoi = '1'  # '1' selects the bounding box search, the only option implemented in this notebook
if aoi == '1':
    # Bounding box input:
    search_params = {
        'short_name': short_name,
        'version': latest_version,
        'temporal': temporal,
        'page_size': 100,
        'page_num': 1,
        'bounding_box': bounding_box
    }

granules = []
headers={'Accept': 'application/json'}
while True:
    response = requests.get(granule_search_url, params=search_params, headers=headers)
    results = json.loads(response.content)

    if len(results['feed']['entry']) == 0:
        # Out of results, so break out of loop
        break

    # Collect results and increment page_num
    granules.extend(results['feed']['entry'])
    search_params['page_num'] += 1

print('There are', len(granules), 'granules of', short_name, 'version', latest_version, 'over my area and time of interest.')



There are 322 granules of MOD15A2H version 061 over my area and time of interest.
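
The loop above keeps requesting pages until one comes back empty. The same pattern can be sketched without a network connection by replacing the HTTP request with a callable. In the sketch below, `collect_all_pages`, `fake_fetch_page`, and `fake_catalog` are hypothetical names, and the `max_pages` guard is an addition that stops the loop if a service never returns an empty page.

```python
def collect_all_pages(fetch_page, page_size=100, max_pages=1000):
    """Call fetch_page(page_num, page_size) until it returns an empty list."""
    entries = []
    for page_num in range(1, max_pages + 1):
        page = fetch_page(page_num, page_size)
        if not page:
            # Out of results, so stop requesting further pages
            break
        entries.extend(page)
    return entries


# Stand-in for the search service: 322 fake entries served 100 at a time.
fake_catalog = [f"granule-{i}" for i in range(322)]


def fake_fetch_page(page_num, page_size):
    start = (page_num - 1) * page_size
    return fake_catalog[start:start + page_size]


all_entries = collect_all_pages(fake_fetch_page)
print(len(all_entries))  # 322
```

In the notebook, `fetch_page` would wrap the `requests.get` call and return `results['feed']['entry']`.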


### Determine the average size of those granules as well as the total volume

In [37]:
granule_sizes = [float(granule['granule_size']) for granule in granules]
granule_urls = [granule['links'][0]['href'] for granule in granules]
print(f'The average size of each granule is {mean(granule_sizes):.2f} MB and the total size of all {len(granules)} granules is {sum(granule_sizes):.2f} MB')


The average size of each granule is 4.47 MB and the total size of all 322 granules is 1438.54 MB
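
The cell above assumes that every granule entry carries a `granule_size` field and that the first link is the data file. A more defensive version could skip entries without a size and select the download link by its `rel` value. The sketch below uses hypothetical helpers (`first_data_url`, `summarize_sizes`) and made-up example entries. The `rel` strings are modeled on those seen in CMR JSON responses, but treat them as assumptions and check them against a real response.

```python
from statistics import mean

# Assumption: data links are marked with a rel value ending in '/data#'.
DATA_REL_SUFFIX = '/data#'


def first_data_url(granule):
    """Return the first non-inherited data link of a granule entry, or None."""
    for link in granule.get('links', []):
        is_data = link.get('rel', '').endswith(DATA_REL_SUFFIX)
        if is_data and not link.get('inherited', False):
            return link['href']
    return None


def summarize_sizes(granules):
    """Return (count, mean, total) over granules that report a size in MB."""
    sizes = [float(g['granule_size']) for g in granules if 'granule_size' in g]
    if not sizes:
        return 0, 0.0, 0.0
    return len(sizes), mean(sizes), sum(sizes)


# Made-up entries for illustration; example.org URLs are placeholders.
example_granules = [
    {'granule_size': '4.0', 'links': [
        {'rel': 'http://esipfed.org/ns/fedsearch/1.1/browse#', 'href': 'https://example.org/a.jpg'},
        {'rel': 'http://esipfed.org/ns/fedsearch/1.1/data#', 'href': 'https://example.org/a.hdf'},
    ]},
    {'links': [
        {'rel': 'http://esipfed.org/ns/fedsearch/1.1/data#', 'href': 'https://example.org/b.hdf'},
    ]},
]

example_urls = [first_data_url(g) for g in example_granules]
example_summary = summarize_sizes(example_granules)
print(example_urls, example_summary)
```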


In [39]:
from pathlib import Path

def save_data(data, filename, destination_folder, overwrite=False):
    '''
    Save binary data to destination_folder/filename.

    Arguments:
    ---------

    data               : bytes to write
    filename           : string, name of output file
    destination_folder : name of output directory (created if it does not exist)

    Keyword Arguments:
    -----------------

    overwrite : overwrite the file if it already exists

    Returns the number of bytes written (0 if an existing file was left untouched).
    '''
    dest_path = Path(destination_folder)
    # Create the directory, including any missing parent directories
    dest_path.mkdir(parents=True, exist_ok=True)

    output_fname = dest_path.joinpath(filename)

    d = 0
    if overwrite or (not output_fname.exists()):
        with open(output_fname, 'wb') as fp:
            d = fp.write(data)

    return d

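saveDir = r'G:\Shared drives\Ryoko and Hilary\SMSigxSMAP\analysis\1_data\SMAP\OZNET\MODIS-LAI'

# Assumption: `session` is not defined earlier in this notebook, so a basic-auth
# session is created here from the Earthdata Login credentials loaded above.
# requests drops the Authorization header on cross-host redirects, so a ~/.netrc
# entry for urs.earthdata.nasa.gov may also be needed for the downloads to succeed.
session = requests.Session()
session.auth = HTTPBasicAuth(uid, pswd)

# Loop over the granule URLs and save each file
for url in granule_urls:
    r = session.get(url)

    # Check the response
    if r.ok:
        # Get the filename from the url
        filename = url.split('/')[-1]
        # Save the binary data; d is the number of bytes written (0 if the file already existed)
        d = save_data(r.content, filename, saveDir)
        print(filename, d)
    else:
        print(f'response from {url} not good')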
        

MOD15A2H.A2015089.h30v12.061.2021322004847.hdf 4694515
MOD15A2H.A2015097.h30v12.061.2021323034548.hdf 4595580
MOD15A2H.A2015105.h30v12.061.2021323062129.hdf 4743115
MOD15A2H.A2015113.h30v12.061.2021323182029.hdf 4927470
MOD15A2H.A2015121.h30v12.061.2021324003130.hdf 4537196
MOD15A2H.A2015129.h30v12.061.2021326041248.hdf 4706099
MOD15A2H.A2015137.h30v12.061.2021326062224.hdf 4863406
MOD15A2H.A2015145.h30v12.061.2021326082512.hdf 4484858
MOD15A2H.A2015153.h30v12.061.2021326112629.hdf 4684239
MOD15A2H.A2015161.h30v12.061.2021326135531.hdf 4905224
MOD15A2H.A2015169.h30v12.061.2021326171123.hdf 5167728
MOD15A2H.A2015177.h30v12.061.2021326233525.hdf 4908146
MOD15A2H.A2015185.h30v12.061.2021328001303.hdf 4983539
MOD15A2H.A2015193.h30v12.061.2021328054910.hdf 5191019
MOD15A2H.A2015201.h30v12.061.2021329051619.hdf 5092892
MOD15A2H.A2015209.h30v12.061.2021329150918.hdf 5172800
MOD15A2H.A2015217.h30v12.061.2021330000513.hdf 5306172
MOD15A2H.A2015225.h30v12.061.2021331171425.hdf 5308971
MOD15A2H.A
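
The file names listed above encode the acquisition date as `A` followed by the year and day of year, along with the tile (`h30v12`), the collection version, and a production timestamp. The sketch below pulls those fields out of a name. The helper `parse_modis_filename` and the regular expression are hypothetical and tailored to the names shown in this output; other products may need a different pattern.

```python
import re
from datetime import date, datetime

# Pattern tailored to names such as
# MOD15A2H.A2015089.h30v12.061.2021322004847.hdf
MODIS_NAME = re.compile(
    r'^(?P<product>[^.]+)\.A(?P<year>\d{4})(?P<doy>\d{3})'
    r'\.(?P<tile>h\d{2}v\d{2})\.(?P<version>\d{3})\.(?P<produced>\d{13})\.hdf$'
)


def parse_modis_filename(filename):
    """Split a MODIS tile file name into product, date, tile, and version."""
    match = MODIS_NAME.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized file name: {filename}")
    # %j parses the three-digit day of year
    acquired = datetime.strptime(match['year'] + match['doy'], '%Y%j').date()
    return {
        'product': match['product'],
        'acquired': acquired,
        'tile': match['tile'],
        'version': match['version'],
    }


info = parse_modis_filename('MOD15A2H.A2015089.h30v12.061.2021322004847.hdf')
print(info['acquired'], info['tile'], info['version'])
```

For the first file in the listing, day 089 of 2015 corresponds to 2015-03-30, one day before the requested start date. This is presumably because the 8-day composite period overlaps the requested range.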

### Finally, we will clean up the Output folder by removing individual order folders:

In [None]:
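# Clean up Outputs folder by removing individual granule folders

# Assumption: `path` is not defined earlier in this notebook; the download
# folder used above is taken to be the folder to clean up.
path = saveDir

# Walk bottom-up so that files are moved out before their folders are removed
for root, dirs, files in os.walk(path, topdown=False):
    for file in files:
        try:
            shutil.move(os.path.join(root, file), path)
        except OSError:
            pass
    for name in dirs:
        os.rmdir(os.path.join(root, name))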
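
The clean-up step can be tried safely on a temporary directory before pointing it at real data. The following is a minimal sketch using `pathlib`; the helper name `flatten_directory` and the demo folder names are hypothetical. Unlike the loop above, it leaves a file in place when a file of the same name already exists at the top level, and it only removes folders that end up empty.

```python
import shutil
import tempfile
from pathlib import Path


def flatten_directory(root):
    """Move files from subfolders into root, then remove empty subfolders."""
    root = Path(root)
    nested_files = [p for p in root.rglob('*') if p.is_file() and p.parent != root]
    for file_path in nested_files:
        target = root / file_path.name
        if not target.exists():
            shutil.move(str(file_path), str(target))
    # Remove the deepest folders first so that parents become empty in turn
    folders = sorted(
        (p for p in root.rglob('*') if p.is_dir()),
        key=lambda p: len(p.parts),
        reverse=True,
    )
    for folder in folders:
        try:
            folder.rmdir()
        except OSError:
            # Folder still holds a file that was not moved; leave it alone
            pass


# Demo on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_root = Path(tmp_dir)
    (demo_root / 'order1').mkdir()
    (demo_root / 'order2' / 'nested').mkdir(parents=True)
    (demo_root / 'order1' / 'a.hdf').write_bytes(b'a')
    (demo_root / 'order2' / 'nested' / 'b.hdf').write_bytes(b'b')
    flatten_directory(demo_root)
    remaining = sorted(p.name for p in demo_root.iterdir())

print(remaining)  # ['a.hdf', 'b.hdf']
```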

### To review, we have explored data availability and volume over a region and time of interest and downloaded data directly to our local machine. You are welcome to request different data sets, areas of interest, and/or time periods by re-running the notebook or starting again at the 'Input data' step above.