# Customize and Access NSIDC DAAC Data

This notebook will walk you through how to programmatically access data from the NASA National Snow and Ice Data Center Distributed Active Archive Center (NSIDC DAAC) using spatial and temporal filters, as well as how to request customization services including subsetting, reformatting, and reprojection. No Python experience is necessary; each code cell will prompt you with the information needed to configure your data request. The notebook will print the resulting API command that can be used in a command line, browser, or in Python as executed below.

### Import packages


In [1]:
import requests
import json
import zipfile
import io
import math
import os
import glob
import shutil
import pprint
import re
import time
import xml.etree.ElementTree as ET  # used below to parse the order status XML
from statistics import mean
%matplotlib inline



In [2]:
short_name = 'SMAP_L1_L3_ANC_STATIC'
out_path = r'..\1_data'

# Create the output folder for this data set (parent folders are created as needed)
os.makedirs(os.path.join(out_path, short_name), exist_ok=True)

### Input Earthdata Login credentials

An Earthdata Login account is required to access data from the NSIDC DAAC. If you do not already have an Earthdata Login account, visit http://urs.earthdata.nasa.gov to register.

In [3]:
my_credential_path = "./auth.json"
with open(my_credential_path, 'r') as infile:
    my_credentials = json.load(infile)
    
uid = my_credentials['username'] # Enter Earthdata Login user name
pswd = my_credentials['password'] # Enter Earthdata Login password
email = my_credentials['email'] # Enter Earthdata login email 
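
The cell above expects `auth.json` to hold the keys `username`, `password`, and `email`. As a minimal sketch, the loader below fails early when one of them is missing. The helper name `load_credentials` and the placeholder values are illustrative assumptions, not part of the original notebook.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ('username', 'password', 'email')

def load_credentials(credential_path):
    """Read Earthdata Login credentials from a JSON file and check the keys."""
    with open(credential_path, 'r') as infile:
        credentials = json.load(infile)
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError(f'Missing credential keys: {missing}')
    return credentials

# Demonstration with a throwaway file that holds placeholder values only
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'auth.json')
    with open(demo_path, 'w') as outfile:
        json.dump({'username': 'demo_user',
                   'password': 'demo_pass',
                   'email': 'demo@example.com'}, outfile)
    demo_credentials = load_credentials(demo_path)

print(sorted(demo_credentials))
```

Keeping the credentials in a separate file avoids typing them into the notebook itself; that file should not be shared or committed alongside the notebook.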

### Select data set and determine version number

Data sets are selected by data set IDs (e.g. MOD10A1), which are also referred to as "short names". These short names are located at the top of each NSIDC data set landing page in gray above the full title.

In [4]:
# Get json response from CMR collection metadata

params = {
    'short_name': short_name
}

cmr_collections_url = 'https://cmr.earthdata.nasa.gov/search/collections.json'
response = requests.get(cmr_collections_url, params=params)
results = json.loads(response.content)

# Find all instances of 'version_id' in metadata and print most recent version number
versions = [el['version_id'] for el in results['feed']['entry']]
latest_version = max(versions)
print('The most recent version of ', short_name, ' is ', latest_version)

The most recent version of  SMAP_L1_L3_ANC_STATIC  is  1
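
One caveat about the cell above: `version_id` values are strings, so `max(versions)` compares them character by character. The sketch below shows how that can differ from a numeric comparison. The helper name `latest_version_id` is hypothetical, and the assumption that version IDs are usually digit strings is mine rather than something stated in the notebook.

```python
def latest_version_id(version_ids):
    """Return the highest version, comparing numerically when every ID is a number."""
    if all(v.isdecimal() for v in version_ids):
        return max(version_ids, key=int)
    # Fall back to plain string comparison for non-numeric IDs
    return max(version_ids)

string_max = max(['9', '10'])                 # character-by-character comparison
numeric_max = latest_version_id(['9', '10'])  # numeric comparison
print(string_max, numeric_max)
```

For a collection with a single version, as in the output above, both approaches return the same value.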



### Select time period of interest

In [7]:
#Input temporal range 

# Somehow only up to 2020-10-26 was downloaded in the first try

# start_date = '2015-03-31'# input('Input start date in yyyy-MM-dd format: ')
# start_date = '2020-10-27'# input('Input start date in yyyy-MM-dd format: ')
# start_date = '2019-06-20'# input('Input start date in yyyy-MM-dd format: ')
# start_time = '00:00:00' # input('Input start time in HH:mm:ss format: ')
# end_date = '2019-07-22' # input('Input end date in yyyy-MM-dd format: ')
# end_time = '00:00:00' # input('Input end time in HH:mm:ss format: ')

# temporal = start_date + 'T' + start_time + 'Z' + ',' + end_date + 'T' + end_time + 'Z'
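
The commented-out lines above join a start and an end timestamp into a single comma-separated range. As a minimal sketch, the same string can be built with `datetime`, which also rejects malformed dates before a request is sent. The function name `build_temporal` is an illustrative assumption.

```python
from datetime import datetime

TIMESTAMP_FORMAT = '%Y-%m-%dT%H:%M:%SZ'

def build_temporal(start_date, start_time, end_date, end_time):
    """Return 'start,end' with both timestamps in yyyy-MM-ddTHH:mm:ssZ form."""
    start = datetime.strptime(f'{start_date} {start_time}', '%Y-%m-%d %H:%M:%S')
    end = datetime.strptime(f'{end_date} {end_time}', '%Y-%m-%d %H:%M:%S')
    if end < start:
        raise ValueError('The end of the range precedes the start.')
    return f'{start.strftime(TIMESTAMP_FORMAT)},{end.strftime(TIMESTAMP_FORMAT)}'

# Dates taken from the commented-out example above
temporal_example = build_temporal('2019-06-20', '00:00:00', '2019-07-22', '00:00:00')
print(temporal_example)
```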

### Select area of interest

#### Select bounding box or shapefile entry

For all data sets, you can enter a bounding box to be applied to your file search. If you are interested in ICESat-2 data, you may also apply a spatial boundary based on a vector-based spatial data file.

In [66]:
# Enter spatial coordinates in decimal degrees, with west longitude and south latitude reported as negative degrees. Do not include spaces between coordinates.
# Example over the state of Colorado: -109,37,-102,41

# bounding_box = '147.534,-35.324,147.535,-35.323' #input('Input spatial coordinates in the following order: lower left longitude,lower left latitude,upper right longitude,upper right latitude. Leave blank if you wish to provide a vector-based spatial file for ICESat-2 search and subsetting:')
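
A bounding box string in the order described above (lower left longitude, lower left latitude, upper right longitude, upper right latitude) can be checked before it is used. The sketch below is one way to do that; the function name `parse_bounding_box` and the specific range checks are assumptions for illustration.

```python
def parse_bounding_box(bounding_box):
    """Split 'west,south,east,north' into floats and check the value ranges."""
    parts = bounding_box.split(',')
    if len(parts) != 4:
        raise ValueError('Expected four comma-separated coordinates.')
    west, south, east, north = (float(part) for part in parts)
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError('Longitudes must lie between -180 and 180 degrees.')
    if not (-90 <= south <= north <= 90):
        raise ValueError('Latitudes must lie between -90 and 90, south before north.')
    return west, south, east, north

# Colorado example from the comment above
colorado = parse_bounding_box('-109,37,-102,41')
print(colorado)
```

The check deliberately does not require west to be smaller than east, so that a box crossing the antimeridian is not rejected.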

### Determine how many granules exist over this time and area of interest.

In [6]:
# Create CMR parameters used for granule search. Modify params depending on bounding_box or polygon input.

granule_search_url = 'https://cmr.earthdata.nasa.gov/search/granules'
aoi='1'
if aoi == '1':
# bounding box input:
    search_params = {
    'short_name': short_name,
    'version': latest_version,
    'page_size': 100,
    'page_num': 1
    # 'producer_granule_id': 'sand_M01_004.float32'
    }

granules = []
headers={'Accept': 'application/json'}
while True:
    response = requests.get(granule_search_url, params=search_params, headers=headers)
    response.raise_for_status()  # stop on an HTTP error instead of failing later on a missing 'feed' key
    results = response.json()

    if len(results['feed']['entry']) == 0:
        # Out of results, so break out of loop
        break

    # Collect results and increment page_num
    granules.extend(results['feed']['entry'])
    search_params['page_num'] += 1

print('There are', len(granules), 'granules of', short_name, 'version', latest_version, 'over my area and time of interest.')


There are 3535 granules of SMAP_L1_L3_ANC_STATIC version 1 over my area and time of interest.
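
The search cell above pages through results until an empty page comes back. The sketch below isolates that pattern so it can be tried without a network connection: `fake_fetch_page` is a hypothetical stand-in for the CMR request, and `collect_all_pages` is an illustrative name.

```python
def collect_all_pages(fetch_page, page_size=100):
    """Call fetch_page(page_num, page_size) until it returns an empty list."""
    items = []
    page_num = 1
    while True:
        entries = fetch_page(page_num, page_size)
        if not entries:
            # Out of results, so stop paging
            break
        items.extend(entries)
        page_num += 1
    return items

# Stand-in for the CMR request: serves 250 fake granule IDs page by page
fake_catalog = [f'granule_{n}' for n in range(250)]

def fake_fetch_page(page_num, page_size):
    start = (page_num - 1) * page_size
    return fake_catalog[start:start + page_size]

all_granules = collect_all_pages(fake_fetch_page)
print(len(all_granules))
```

With 250 items and a page size of 100, the loop makes four calls: three pages with data and a final empty page that ends it.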


### Determine the average size of those granules as well as the total volume

In [10]:
granule_sizes = [float(granule['granule_size']) for granule in granules]
print(f'The average size of each granule is {mean(granule_sizes):.2f} MB and the total size of all {len(granules)} granules is {sum(granule_sizes):.2f} MB')

The average size of each granule is 158.44 MB and the total size of all 3535 granules is 560091.66 MB


In [16]:
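# Keep only the granules whose file name contains a given substring.
# The substring was left empty in the original cell, so every granule matches until it is filled in.
name_filter = ''
soil_texture_granules = [granule for granule in granules
                         if name_filter in granule['producer_granule_id']]
soil_texture_granules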

[{'producer_granule_id': 'NDVI_M01_287_002.int16',
  'updated': '2019-01-28T12:52:38.117Z',
  'dataset_id': 'Soil Moisture Active Passive (SMAP) L1-L3 Ancillary Static Data V001',
  'data_center': 'NSIDC_ECS',
  'title': 'SC:SMAP_L1_L3_ANC_STATIC.001:116619294',
  'coordinate_system': 'NO_SPATIAL',
  'day_night_flag': 'UNSPECIFIED',
  'id': 'G1577889859-NSIDC_ECS',
  'original_format': 'ECHO10',
  'granule_size': '967.471',
  'browse_flag': False,
  'collection_concept_id': 'C1539051655-NSIDC_ECS',
  'online_access_flag': True,
  'links': [{'rel': 'http://esipfed.org/ns/fedsearch/1.1/data#',
    'type': 'application/octet-stream',
    'hreflang': 'en-US',
    'href': 'https://n5eil01u.ecs.nsidc.org/DP4/SMAP_ANC/SMAP_L1_L3_ANC_STATIC.001/2015.01.14/NDVI_M01_287_002.int16'},
   {'rel': 'http://esipfed.org/ns/fedsearch/1.1/metadata#',
    'type': 'text/xml',
    'title': '(METADATA)',
    'hreflang': 'en-US',
    'href': 'https://n5eil01u.ecs.nsidc.org/DP4/SMAP_ANC/SMAP_L1_L3_ANC_STATIC.0

Note that subsetting, reformatting, or reprojecting can alter the size of the granules if those services are applied to your request.

Because variable subsetting can include a long list of variables to choose from, we will decide on variable subsetting separately from the service options above.

### Select data access configurations

The data request can be accessed asynchronously or synchronously. The asynchronous option will allow concurrent requests to be queued and processed without the need for a continuous connection. Those requested orders will be delivered to the specified email address, or they can be accessed programmatically as shown below. Synchronous requests will automatically download the data as soon as processing is complete. The granule limits differ between these two options:

Maximum granules per synchronous request = 100 

Maximum granules per asynchronous request = 2000 

We will set the access configuration depending on the number of granules requested. For requests over 2000 granules, we will produce one API endpoint per 2000-granule order. Please note that synchronous requests may take a long time to complete depending on request parameters, so the number of granules may need to be adjusted if you are experiencing performance issues. The `page_size` parameter can be used to adjust this number.

In [50]:
#Set NSIDC data access base URL
base_url = 'https://n5eil02u.ecs.nsidc.org/egi/request'

#Set the request mode to asynchronous if the number of granules is over 100, otherwise synchronous is enabled by default
if len(granules) > 100:
    request_mode = 'async'
    page_size = 2000
else: 
    page_size = 100
    request_mode = 'stream'

#Determine number of orders needed for requests over 2000 granules. 
page_num = math.ceil(len(granules)/page_size)

print('There will be', page_num, 'total order(s) processed for our', short_name, 'request.')

There will be 1 total order(s) processed for our SPL3SMP_E request.
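
As a worked check of the rule in the cell above, the sketch below wraps it in a function and applies it to two granule counts. The limits of 100 and 2000 come from the text above; the function name `plan_orders` is an illustrative assumption.

```python
import math

SYNC_LIMIT = 100    # maximum granules per synchronous request
ASYNC_LIMIT = 2000  # maximum granules per asynchronous request

def plan_orders(granule_count):
    """Return (request_mode, page_size, order_count) for a granule count."""
    if granule_count > SYNC_LIMIT:
        request_mode, page_size = 'async', ASYNC_LIMIT
    else:
        request_mode, page_size = 'stream', SYNC_LIMIT
    order_count = math.ceil(granule_count / page_size)
    return request_mode, page_size, order_count

large_plan = plan_orders(3535)
small_plan = plan_orders(100)
print(large_plan)
print(small_plan)
```

For the 3535 granules counted earlier, the rule selects asynchronous mode and splits the request into two orders, since 3535 / 2000 rounds up to 2.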


In [None]:
# Example request URL (base URL plus short_name and version):
# https://n5eil02u.ecs.nsidc.org/egi/request?short_name=SPL3SMP_E&version=005

### Create the API endpoint 

Programmatic API requests are formatted as HTTPS URLs that contain key-value pairs specifying the service operations selected above. The following command can be executed via command line, a web browser, or in Python below.

In [51]:
if aoi == '1':
# bounding box search and subset:
    param_dict = {'short_name': short_name, 
                  'version': latest_version, 
                  'temporal': temporal, 
                  'time': time_var, 
                  'format': reformat, 
                  'projection': projection, 
                  'projection_parameters': projection_parameters, 
                  'Coverage': coverage, 
                  'page_size': page_size, 
                  'request_mode': request_mode, 
                  'agent': agent, 
                  'email': email, }

#Remove blank key-value pairs
param_dict = {k: v for k, v in param_dict.items() if v != ''}

#Convert to string
param_string = '&'.join(f'{k}={v}' for k, v in param_dict.items())

#Print API base URL + request parameters
endpoint_list = []
for i in range(page_num):
    page_val = i + 1
    api_request = f'{base_url}?{param_string}&page_num={page_val}'
    endpoint_list.append(api_request)

print(*endpoint_list, sep="\n")

https://n5eil02u.ecs.nsidc.org/egi/request?short_name=SPL3SMP_E&version=005&temporal=2017-12-26T00:00:00Z,2020-10-26T00:00:00Z&format=NetCDF4-CF&projection=GEOGRAPHIC&Coverage=/Soil_Moisture_Retrieval_Data_AM/soil_moisture,                /Soil_Moisture_Retrieval_Data_AM/retrieval_qual_flag,                    /Soil_Moisture_Retrieval_Data_AM/longitude,                        /Soil_Moisture_Retrieval_Data_AM/latitude,                            /Soil_Moisture_Retrieval_Data_AM/EASE_column_index,                                /Soil_Moisture_Retrieval_Data_AM/EASE_row_index,            /Soil_Moisture_Retrieval_Data_PM/soil_moisture_pm,                /Soil_Moisture_Retrieval_Data_PM/retrieval_qual_flag_pm,                    /Soil_Moisture_Retrieval_Data_PM/latitude_pm,                        /Soil_Moisture_Retrieval_Data_PM/longitude_pm,                            /Soil_Moisture_Retrieval_Data_PM/EASE_column_index_pm,                                /Soil_Moisture_Retrieval_Data_PM/EASE
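
The endpoint printed above is assembled by plain string joining, so characters such as `:` and `,` are left unescaped. As a minimal sketch, the standard library's `urlencode` can build the same query with percent-encoding applied. The function name `build_endpoint` is hypothetical, and the parameter values are a shortened subset of those in the output above.

```python
from urllib.parse import urlencode

def build_endpoint(base_url, params, page_num):
    """Drop blank parameters, add the page number, and percent-encode the query."""
    filled = {key: value for key, value in params.items() if value != ''}
    filled['page_num'] = page_num
    return f'{base_url}?{urlencode(filled)}'

demo_params = {
    'short_name': 'SPL3SMP_E',
    'version': '005',
    'temporal': '2017-12-26T00:00:00Z,2020-10-26T00:00:00Z',
    'format': 'NetCDF4-CF',
    'projection_parameters': '',  # blank values are removed, as in the cell above
}

endpoint = build_endpoint('https://n5eil02u.ecs.nsidc.org/egi/request', demo_params, 1)
print(endpoint)
```

Passing a dictionary through `params=` in `requests`, as the download cell does, applies comparable encoding automatically; the explicit version is mainly useful when the URL is pasted into a browser or a command line.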

### Request data

We will now download data using the Python requests library. The data will be downloaded to a folder named after the data set short name inside `out_path`. The progress of each order will be reported.

In [52]:
# Create an output folder if the folder does not already exist.

path = os.path.join(out_path, short_name)
if not os.path.exists(path):
    os.mkdir(path)

# Different access methods depending on request mode:

# Create the session used for all requests below (it is not defined in any earlier executed cell).
# Assumption: Earthdata Login authentication for this session is configured separately, e.g. through a ~/.netrc entry.
session = requests.session()

if request_mode == 'async':
    # Request data service for each page number, and unzip outputs
    for i in range(page_num):
        page_val = i + 1
        print('Order: ', page_val)

        # Ask for this page of granules; without page_num every order would repeat the first page
        param_dict['page_num'] = page_val

        # For all requests other than spatial file upload, use get function
        request = session.get(base_url, params=param_dict)
        print('Request HTTP response: ', request.status_code)

        # Raise bad request: Loop will stop for bad response code.
        request.raise_for_status()
        print('Order request URL: ', request.url)
        esir_root = ET.fromstring(request.content)
        print('Order request response XML content: ', request.content)

        # Look up order ID
        orderlist = []
        for order in esir_root.findall("./order/"):
            orderlist.append(order.text)
        orderID = orderlist[0]
        print('order ID: ', orderID)

        # Create status URL
        statusURL = base_url + '/' + orderID
        print('status URL: ', statusURL)

        # Find order status
        request_response = session.get(statusURL)
        print('HTTP response from order response URL: ', request_response.status_code)

        # Raise bad request: Loop will stop for bad response code.
        request_response.raise_for_status()
        request_root = ET.fromstring(request_response.content)
        statuslist = []
        for status in request_root.findall("./requestStatus/"):
            statuslist.append(status.text)
        status = statuslist[0]
        print('Data request ', page_val, ' is submitting...')
        print('Initial request status is ', status)

        # Keep the latest status document so error messages can be read even if no polling is needed
        loop_root = request_root

        # Continue loop while request is still processing
        while status == 'pending' or status == 'processing':
            print('Status is not complete. Trying again.')
            time.sleep(10)
            loop_response = session.get(statusURL)

            # Raise bad request: Loop will stop for bad response code.
            loop_response.raise_for_status()
            loop_root = ET.fromstring(loop_response.content)

            # Find status
            statuslist = []
            for status in loop_root.findall("./requestStatus/"):
                statuslist.append(status.text)
            status = statuslist[0]
            print('Retry request status is: ', status)

        # Order can either complete, complete_with_errors, or fail:
        # Provide complete_with_errors error message:
        if status == 'complete_with_errors' or status == 'failed':
            messagelist = []
            for message in loop_root.findall("./processInfo/"):
                messagelist.append(message.text)
            print('error messages:')
            pprint.pprint(messagelist)

        # Download zipped order if status is complete or complete_with_errors
        if status == 'complete' or status == 'complete_with_errors':
            try:
                downloadURL = 'https://n5eil02u.ecs.nsidc.org/esir/' + orderID
                print('Zip download URL: ', downloadURL)
                print('Beginning download of zipped output...')
                zip_response = session.get(downloadURL)
                # Raise bad request: Loop will stop for bad response code.
                zip_response.raise_for_status()
                with zipfile.ZipFile(io.BytesIO(zip_response.content)) as z:
                    z.extractall(path)
            except (requests.HTTPError, zipfile.BadZipFile):
                # Fall back to the numbered zip parts (orderID.zip?1 to orderID.zip?4)
                for nzip in range(1, 5):
                    downloadURL = 'https://n5eil02u.ecs.nsidc.org/esir/' + orderID + '.zip?' + str(nzip)
                    print('Zip download URL: ', downloadURL)
                    print('Beginning download of zipped output...')
                    zip_response = session.get(downloadURL)
                    # Raise bad request: Loop will stop for bad response code.
                    zip_response.raise_for_status()
                    with zipfile.ZipFile(io.BytesIO(zip_response.content)) as z:
                        z.extractall(path)
            print('Data request', page_val, 'is complete.')
        else:
            print('Request failed.')

else:
    for i in range(page_num):
        page_val = i + 1
        print('Order: ', page_val)
        print('Requesting...')
        param_dict['page_num'] = page_val
        request = session.get(base_url, params=param_dict)
        print('HTTP response from order response URL: ', request.status_code)
        request.raise_for_status()
        d = request.headers['content-disposition']
        fname = re.findall('filename=(.+)', d)
        dirname = os.path.join(path, fname[0].strip('\"'))
        print('Downloading...')
        with open(dirname, 'wb') as f:
            f.write(request.content)
        print('Data request', page_val, 'is complete.')

    # Unzip outputs
    for z in os.listdir(path):
        if z.endswith('.zip'):
            zip_name = os.path.join(path, z)
            with zipfile.ZipFile(zip_name) as zip_ref:
                zip_ref.extractall(path)
            os.remove(zip_name)



Order:  1
Request HTTP response:  201
Order request URL:  https://n5eil02u.ecs.nsidc.org/egi/request?short_name=SPL3SMP_E&version=005&temporal=2017-12-26T00%3A00%3A00Z%2C2020-10-26T00%3A00%3A00Z&format=NetCDF4-CF&projection=GEOGRAPHIC&Coverage=%2FSoil_Moisture_Retrieval_Data_AM%2Fsoil_moisture%2C++++++++++++++++%2FSoil_Moisture_Retrieval_Data_AM%2Fretrieval_qual_flag%2C++++++++++++++++++++%2FSoil_Moisture_Retrieval_Data_AM%2Flongitude%2C++++++++++++++++++++++++%2FSoil_Moisture_Retrieval_Data_AM%2Flatitude%2C++++++++++++++++++++++++++++%2FSoil_Moisture_Retrieval_Data_AM%2FEASE_column_index%2C++++++++++++++++++++++++++++++++%2FSoil_Moisture_Retrieval_Data_AM%2FEASE_row_index%2C++++++++++++%2FSoil_Moisture_Retrieval_Data_PM%2Fsoil_moisture_pm%2C++++++++++++++++%2FSoil_Moisture_Retrieval_Data_PM%2Fretrieval_qual_flag_pm%2C++++++++++++++++++++%2FSoil_Moisture_Retrieval_Data_PM%2Flatitude_pm%2C++++++++++++++++++++++++%2FSoil_Moisture_Retrieval_Data_PM%2Flongitude_pm%2C+++++++++++++++++++++++

KeyboardInterrupt: 
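
The request cell above reads the order ID, the status, and any messages from the XML returned by the service, and it indexes the first list element without checking that one exists. The sketch below shows a more defensive way to read the same three paths. The sample XML is a hypothetical stand-in: only the paths `./order/`, `./requestStatus/`, and `./processInfo/` come from the notebook, while the root and child element names and all values are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body; the real service response may contain more elements
SAMPLE_RESPONSE = b"""<agentResponse>
  <order><orderId>0000000000</orderId></order>
  <requestStatus><status>complete_with_errors</status></requestStatus>
  <processInfo><message>example message</message></processInfo>
</agentResponse>"""

def first_child_text(root, parent_tag):
    """Return the text of the first child under parent_tag, or None if absent."""
    child = root.find(f'./{parent_tag}/*')
    return None if child is None else child.text

def summarize_order(xml_bytes):
    """Collect the order ID, status, and messages from a status document."""
    root = ET.fromstring(xml_bytes)
    return {
        'order_id': first_child_text(root, 'order'),
        'status': first_child_text(root, 'requestStatus'),
        'messages': [message.text for message in root.findall('./processInfo/*')],
    }

summary = summarize_order(SAMPLE_RESPONSE)
print(summary)
```

Returning `None` for a missing element lets the caller decide how to react, instead of stopping with an `IndexError` on an empty list.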

### Finally, we will clean up the Output folder by removing individual order folders:

In [53]:
# Clean up Outputs folder by removing individual granule folders 

for root, dirs, files in os.walk(path, topdown=False):
    for file in files:
        try:
            shutil.move(os.path.join(root, file), path)
        except OSError:
            pass
    for name in dirs:
        try:
            os.rmdir(os.path.join(root, name))    
        except OSError:
            pass
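
The cleanup cell above relies on `OSError` being raised to skip files that are already in the top folder. As a minimal sketch of the same idea with the skip made explicit, the example below flattens a temporary folder tree. The function name `flatten_directory` and the demo folder and file names are illustrative assumptions.

```python
import os
import shutil
import tempfile

def flatten_directory(top):
    """Move every file in nested folders up into top, then remove empty folders."""
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files:
            source = os.path.join(root, name)
            target = os.path.join(top, name)
            # Skip files already at the top level or clashing with an existing name
            if root != top and not os.path.exists(target):
                shutil.move(source, target)
        for name in dirs:
            try:
                os.rmdir(os.path.join(root, name))
            except OSError:
                pass  # folder still holds a file that was skipped

# Demonstration on a throwaway folder with one nested placeholder file
with tempfile.TemporaryDirectory() as demo_dir:
    nested = os.path.join(demo_dir, 'order_1', 'granule_a')
    os.makedirs(nested)
    with open(os.path.join(nested, 'data.nc'), 'w') as outfile:
        outfile.write('placeholder')
    flatten_directory(demo_dir)
    remaining = sorted(os.listdir(demo_dir))

print(remaining)
```

Walking bottom-up (`topdown=False`) matters here: the deepest folders are emptied and removed before their parents are visited.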

In [55]:
for root, dirs, files in os.walk(path, topdown=False):
    for file in files:
        print(file)

README
SMAP_L3_SM_P_E_20150331_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150401_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150402_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150403_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150404_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150405_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150406_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150407_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150408_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150409_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150410_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150411_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150412_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150413_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150414_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150415_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150416_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150417_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150418_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150419_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150420_R18290_001_HEGOUT.nc
SMAP_L3_SM_P_E_20150421_R18290_001_HEGOUT.nc
SMA

In [63]:
import os
import datetime

def is_complete(folder):
    # Create a set to store the dates for which we've seen a file
    seen_dates = set()
    
    # Loop through the files in the folder
    for file in os.listdir(folder):
        # Check if the file name matches the pattern
        if not file.startswith("SMAP_L3_SM_P_E_") or not file.endswith("_HEGOUT.nc"):
            continue
        
        # Extract the date from the file name
        date_str = file[15:23]
        # print(date_str)
        try:
            date = datetime.datetime.strptime(date_str, "%Y%m%d").date()
        except ValueError:
            continue
        
        # Add the date to the set of seen dates
        seen_dates.add(date)
        
    # Create a set of all the dates from 2015 to 2022
    all_dates = set()
    start_date = datetime.date(2015, 3, 31)
    end_date = datetime.date(2022, 3, 30)
    current_date = start_date
    while current_date <= end_date:
        all_dates.add(current_date)
        current_date += datetime.timedelta(days=1)
    
    # Find the missing dates
    missing_dates = all_dates - seen_dates

    # Report any missing dates; the data are complete only if none are missing
    if missing_dates:
        print("Missing dates:")
        for date in sorted(missing_dates):
            print(date)
        return False
    return True

if is_complete(path):
    print("Daily data is complete from March 31, 2015 to March 30, 2022.")
else:
    print("Daily data is not complete from March 31, 2015 to March 30, 2022.")


Missing dates:
2015-01-01
2015-01-02
2015-01-03
2015-01-04
2015-01-05
2015-01-06
2015-01-07
2015-01-08
2015-01-09
2015-01-10
2015-01-11
2015-01-12
2015-01-13
2015-01-14
2015-01-15
2015-01-16
2015-01-17
2015-01-18
2015-01-19
2015-01-20
2015-01-21
2015-01-22
2015-01-23
2015-01-24
2015-01-25
2015-01-26
2015-01-27
2015-01-28
2015-01-29
2015-01-30
2015-01-31
2015-02-01
2015-02-02
2015-02-03
2015-02-04
2015-02-05
2015-02-06
2015-02-07
2015-02-08
2015-02-09
2015-02-10
2015-02-11
2015-02-12
2015-02-13
2015-02-14
2015-02-15
2015-02-16
2015-02-17
2015-02-18
2015-02-19
2015-02-20
2015-02-21
2015-02-22
2015-02-23
2015-02-24
2015-02-25
2015-02-26
2015-02-27
2015-02-28
2015-03-01
2015-03-02
2015-03-03
2015-03-04
2015-03-05
2015-03-06
2015-03-07
2015-03-08
2015-03-09
2015-03-10
2015-03-11
2015-03-12
2015-03-13
2015-03-14
2015-03-15
2015-03-16
2015-03-17
2015-03-18
2015-03-19
2015-03-20
2015-03-21
2015-03-22
2015-03-23
2015-03-24
2015-03-25
2015-03-26
2015-03-27
2015-03-28
2015-03-29
2015-03-30
2015-0
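
The completeness check above mixes folder access, printing, and a fixed date range in one function. As a sketch of an alternative, the version below takes a list of file names and an explicit date range and returns the missing dates, which makes it easy to try on a small example. The function name `find_missing_dates` and the three-day demo range are assumptions; the file name pattern follows the listing shown earlier.

```python
import datetime

PREFIX = 'SMAP_L3_SM_P_E_'
SUFFIX = '_HEGOUT.nc'

def find_missing_dates(file_names, start_date, end_date):
    """Return the sorted dates in [start_date, end_date] that have no matching file."""
    seen = set()
    for name in file_names:
        if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
            continue
        # The eight characters after the prefix hold the date as yyyymmdd
        date_str = name[len(PREFIX):len(PREFIX) + 8]
        try:
            seen.add(datetime.datetime.strptime(date_str, '%Y%m%d').date())
        except ValueError:
            continue
    day_count = (end_date - start_date).days + 1
    expected = {start_date + datetime.timedelta(days=n) for n in range(day_count)}
    return sorted(expected - seen)

demo_files = [
    'README',
    'SMAP_L3_SM_P_E_20150331_R18290_001_HEGOUT.nc',
    'SMAP_L3_SM_P_E_20150402_R18290_001_HEGOUT.nc',
]
missing = find_missing_dates(demo_files,
                             datetime.date(2015, 3, 31),
                             datetime.date(2015, 4, 2))
print(missing)
```

In the notebook, the first argument would be `os.listdir(path)`, and an empty result would mean the daily series is complete for the chosen range.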

In [59]:
print(all_dates)

NameError: name 'all_dates' is not defined

### To review, we have explored data availability and volume over a region and time of interest, discovered and selected data customization options, constructed an API endpoint for our request, and downloaded data directly to our local machine. You are welcome to request different data sets, areas of interest, and/or customization services by re-running the notebook or starting again at the 'Select data set and determine version number' step above.