# NEON API exploration

In this file, I'll be exploring the NEON API using [this NEON API tutorial](https://www.neonscience.org/resources/learning-hub/tutorials/neon-api-01-introduction-requests), and NEON AOP HDF5 reflectance files using [this NEON AOP HDF5 tutorial](https://www.neonscience.org/resources/learning-hub/tutorials/neon-aop-hdf5-tile-py).

In [3]:
import requests # necessary for NEON API
import json

import numpy as np # necessary for exploring h5 files
import h5py
import osgeo.gdal, osgeo.osr, os
import matplotlib.pyplot as plt

from collections import namedtuple # for nicely organizing the many URLs and metadata
from itertools import chain # flattening a list when iterating very nested json info


In [4]:
# Every request starts from NEON's base API URL
SERVER = 'http://data.neonscience.org/api/v0/'

Some of the important endpoints for NEON are:
* _sites/_
* _products/_
* _data/_

In [5]:
# Define sites of interest
SITECODE_LIST = ['TEAK', 'SOAP', 'SJER']

# Make the requests, then parse each JSON response into a Python dict so it's easier to interact with
site_json_list = [requests.get(SERVER+'sites/'+SITECODE).json() 
                          for SITECODE in SITECODE_LIST]
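
The list comprehension above assumes every request succeeds. Here is a minimal sketch of a more defensive version, assuming I want a timeout and an exception on HTTP errors. `fetch_json` and its `get` parameter are hypothetical names introduced here so the function can be tried without a network connection; the fake response and its payload are made up.

```python
def fetch_json(url, get=None, timeout=30):
    """Request `url` and return the parsed JSON body as a Python dict.

    `get` defaults to requests.get; passing a stand-in allows offline testing.
    """
    if get is None:
        import requests  # imported lazily so a stand-in `get` works without it
        get = requests.get
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()


# Offline demonstration with a fake response object (made-up payload).
class FakeResponse:
    def raise_for_status(self):
        pass

    def json(self):
        return {'data': {'siteCode': 'TEAK'}}


site_json = fetch_json('http://data.neonscience.org/api/v0/sites/TEAK',
                       get=lambda url, timeout: FakeResponse())
print(site_json['data']['siteCode'])
```

With a real connection, the same call without the `get` argument would go through `requests.get`.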

In [6]:
# Display JSON info for the TEAK site
TEAK_json = site_json_list[0]
## TEAK_json ## super long output -- uncomment if you want to see

That's a huge dict... let's explore smaller parts and try to avoid listing the many dates.
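
One way to peek at a big nested dict without dumping every value is to list only keys and types down to a chosen depth. This is a rough sketch, not something from the NEON tutorial: `describe` and `summarize` are hypothetical helpers, the sample dict is made up, and the code assumes that the items in a list share one structure, so only the first item is inspected.

```python
def describe(value):
    """Short type label, with a length for containers."""
    if isinstance(value, (list, dict)):
        return f'{type(value).__name__} of {len(value)}'
    return type(value).__name__


def summarize(obj, max_depth=1, depth=0):
    """Return indented 'key: type' lines for nested dicts and lists."""
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append('    ' * depth + f'{key}: {describe(value)}')
            if depth < max_depth:
                lines.extend(summarize(value, max_depth, depth + 1))
    elif isinstance(obj, list) and obj:
        # Assumption: list items share a structure, so the first is representative.
        lines.extend(summarize(obj[0], max_depth, depth))
    return lines


# Made-up miniature of a site response.
sample = {'data': {'siteCode': 'TEAK',
                   'dataProducts': [{'dataProductCode': 'DP1.10098.001',
                                     'availableMonths': ['2021-12']}]}}
print('\n'.join(summarize(sample, max_depth=2)))
```

Because long lists show up as a single `list of N` line, the many dates never get printed.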

In [7]:
print(TEAK_json.keys(),'\n') # original dict only has 1 key -- 'data'

# look at info inside of 'data' key
NEON_dataDict_info = [str(key)+':\t'+str(type(TEAK_json['data'][key])) 
                          for key in TEAK_json['data'].keys()] 
print('\n'.join(NEON_dataDict_info), '\n')


# data products looked most interesting because I want to use that list to query products of interest
print('Exploring dataProducts')
dataProduct_Info = [str(type(product)) 
                        for product in TEAK_json['data']['dataProducts']] 
print('dataProducts contains a bunch of dicts...')
print(', '.join(dataProduct_Info), '\n')


print('What is in each of those dicts? They all have the same structure.')
dataProducts_ListOfDicts = TEAK_json['data']['dataProducts']
NEON_dataProducts_info = [str(subDict.keys())
                              for subDict in dataProducts_ListOfDicts] 
print('\n'.join(NEON_dataProducts_info), '\n')


print('What are the data product codes and titles?')
NEON_dataProductCodes_info = [str(productDict['dataProductCode'])+'\t'+str(productDict['dataProductTitle'])
                                  for productDict in dataProducts_ListOfDicts] 
print('\n'.join(NEON_dataProductCodes_info), '\n')

dict_keys(['data']) 

siteCode:	<class 'str'>
siteName:	<class 'str'>
siteDescription:	<class 'str'>
siteType:	<class 'str'>
siteLatitude:	<class 'float'>
siteLongitude:	<class 'float'>
stateCode:	<class 'str'>
stateName:	<class 'str'>
domainCode:	<class 'str'>
domainName:	<class 'str'>
deimsId:	<class 'str'>
releases:	<class 'list'>
dataProducts:	<class 'list'> 

Exploring dataProducts
dataProducts contains a bunch of dicts...
<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class '

# Filtering URLs (for eventual download)

Using this information about the data available for my 3 sites of interest, I want to download data that meet the following specifications for each site:
* Data products:
    * Vegetation structure (DP1.10098.001)
    * Elevation - LiDAR (DP3.30024.001)
    * Spectrometer orthorectified surface directional reflectance - mosaic (DP3.30006.001)
* Most recent dataset
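
For the "most recent dataset" criterion, a small sketch of one way to pick it: pair each month with its URL and take the maximum month. This assumes `availableMonths` and `availableDataUrls` are parallel lists (same position, same month) and that months are `YYYY-MM` strings. `latest_available` is a hypothetical helper, and the example entry is made up with its months deliberately out of order.

```python
def latest_available(product):
    """Return (month, url) for the newest month in a dataProducts entry."""
    # Assumption: the two lists are parallel, so zip pairs each month with its URL.
    pairs = zip(product['availableMonths'], product['availableDataUrls'])
    # 'YYYY-MM' strings sort chronologically when compared as text.
    return max(pairs, key=lambda pair: pair[0])


# Made-up entry; only the key names come from the real responses.
example_product = {
    'dataProductCode': 'DP3.30006.001',
    'availableMonths': ['2021-07', '2017-06', '2019-06'],
    'availableDataUrls': [
        'https://data.neonscience.org/api/v0/data/DP3.30006.001/TEAK/2021-07',
        'https://data.neonscience.org/api/v0/data/DP3.30006.001/TEAK/2017-06',
        'https://data.neonscience.org/api/v0/data/DP3.30006.001/TEAK/2019-06',
    ],
}

month, url = latest_available(example_product)
print(month, url)
```

Taking the last element of each list gives the same answer whenever the API returns months in ascending order; `max` just removes that dependence on ordering.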

In [8]:
# Specifications
SERVER = 'http://data.neonscience.org/api/v0/'
SITES_LIST = ['TEAK', 'SOAP', 'SJER']
DATAPRODUCTS_DICT = {'Vegetation structure': 'DP1.10098.001', 
                     'Elevation - LiDAR': 'DP3.30024.001', 
                     'Spectrometer orthorectified surface directional reflectance - mosaic': 'DP3.30006.001'}

## Get a list of URLs for data to download ##

# get json info for my sites of interest
site_json_list = [requests.get(SERVER+'sites/'+site).json() 
                          for site in SITES_LIST]

# for each site, extract only the relevant dataproducts
dataprod_codes_list = list(DATAPRODUCTS_DICT.values())

def getProductsAtSite(site_json, dataprod_codes_list):
    return [site_dataProd_dict
            for site_dataProd_dict in site_json['data']['dataProducts']
            if site_dataProd_dict['dataProductCode'] in dataprod_codes_list
           ]

site_dataprod_dict = {site_json['data']['siteCode']:
                      getProductsAtSite(site_json, dataprod_codes_list)
                      for site_json in site_json_list}

## Get the URL and relevant metadata for the most recent available dataset only
# each product at each site becomes one named tuple with the fields below
productTupleNames = namedtuple('PRODUCT_METADATA', ['SITE','PRODUCT_TITLE', 'PRODUCT_CODE', 'DATE', 'URL'])

# for all products at all sites, get the metadata needed for our named productTuple
productTupleMetadata = list(chain(*[
    [(site,
      productD['dataProductTitle'],
      productD['dataProductCode'],
      productD['availableMonths'][-1],
      productD['availableDataUrls'][-1]
     )
     for productD in site_dataprod_dict[site]
    ]
    for site in site_dataprod_dict.keys()
]))

# the final list of named tuples, with only our sites, products, and dates of interest
productTupleL = [productTupleNames(*productInfo)
                 for productInfo in productTupleMetadata]

print(productTupleL)

[PRODUCT_METADATA(SITE='TEAK', PRODUCT_TITLE='Vegetation structure', PRODUCT_CODE='DP1.10098.001', DATE='2021-12', URL='https://data.neonscience.org/api/v0/data/DP1.10098.001/TEAK/2021-12'), PRODUCT_METADATA(SITE='TEAK', PRODUCT_TITLE='Spectrometer orthorectified surface directional reflectance - mosaic', PRODUCT_CODE='DP3.30006.001', DATE='2021-07', URL='https://data.neonscience.org/api/v0/data/DP3.30006.001/TEAK/2021-07'), PRODUCT_METADATA(SITE='TEAK', PRODUCT_TITLE='Elevation - LiDAR', PRODUCT_CODE='DP3.30024.001', DATE='2021-07', URL='https://data.neonscience.org/api/v0/data/DP3.30024.001/TEAK/2021-07'), PRODUCT_METADATA(SITE='SOAP', PRODUCT_TITLE='Vegetation structure', PRODUCT_CODE='DP1.10098.001', DATE='2022-01', URL='https://data.neonscience.org/api/v0/data/DP1.10098.001/SOAP/2022-01'), PRODUCT_METADATA(SITE='SOAP', PRODUCT_TITLE='Spectrometer orthorectified surface directional reflectance - mosaic', PRODUCT_CODE='DP3.30006.001', DATE='2021-07', URL='https://data.neonscience.or
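
Printing the whole list on one line is hard to scan. As a small sketch, each named tuple can be printed on its own line using its field names. `format_record` is a hypothetical helper, and the two records are copied from the printout above.

```python
from collections import namedtuple

ProductMetadata = namedtuple('PRODUCT_METADATA',
                             ['SITE', 'PRODUCT_TITLE', 'PRODUCT_CODE', 'DATE', 'URL'])


def format_record(record):
    """One tab-separated line per product, leaving out the long URL."""
    return '\t'.join([record.SITE, record.PRODUCT_CODE, record.DATE, record.PRODUCT_TITLE])


records = [
    ProductMetadata('TEAK', 'Vegetation structure', 'DP1.10098.001', '2021-12',
                    'https://data.neonscience.org/api/v0/data/DP1.10098.001/TEAK/2021-12'),
    ProductMetadata('TEAK', 'Elevation - LiDAR', 'DP3.30024.001', '2021-07',
                    'https://data.neonscience.org/api/v0/data/DP3.30024.001/TEAK/2021-07'),
]

for record in records:
    print(format_record(record))
```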

In [10]:
# exploring the files to download from each URL
for product_metadata in productTupleL:
    # make request with saved url
    data_request = requests.get(product_metadata.URL)
    data_json = data_request.json()
    
    # print info on the files we can access for this particular site and product
    print(product_metadata.SITE, '\t|', product_metadata.PRODUCT_TITLE)
    # the first 6 info slots of each file name are redundant, so drop them;
    # for clarity, I used set to remove duplicate printouts -- but there are LOTS of h5 files
    file_descriptions = set(' '.join(fileD['name'].replace('.', '_').split('_')[6:])
                            for fileD in data_json['data']['files'])
    print('\n'.join(file_descriptions), '\n\n')

TEAK 	| Vegetation structure
vst perplotperyear 2021-12 basic 20220711T220416Z csv
variables 20220711T220416Z csv
vst apparentindividual 2021-12 basic 20220711T220416Z csv
readme 20220711T220416Z txt
validation 20220711T220416Z csv
EML 20211207-20211208 20220711T220416Z xml
vst mappingandtagging basic 20220711T220416Z csv
categoricalCodes 20220711T220416Z csv 


TEAK 	| Spectrometer orthorectified surface directional reflectance - mosaic
pdf
readme 20211007T212640Z txt
reflectance h5 


TEAK 	| Elevation - LiDAR

classified point cloud prj
readme 20211007T212640Z txt
DSM tif
classified point cloud dbf
classified point cloud shx
DTM tif
classified point cloud kml
processing pdf
classified point cloud shp 


SOAP 	| Vegetation structure
variables 20220801T222504Z csv
vst mappingandtagging basic 20220801T222504Z csv
categoricalCodes 20220801T222504Z csv
vst perplotperyear 2022-01 basic 20220801T222504Z csv
readme 20220801T222504Z txt
EML 20220117-20220117 20220801T222504Z xml
validation 2

# Detour: Downloading only 1 h5 file for initial exploration

In my original workflow planning, I didn't realize there would be SO MANY files associated with each plot's DSM, DTM, and reflectance data, because the data are tiled. So I'm taking a quick detour to explore reflectance data from a single tile. After that, I'll learn to put the tiles back into the original mosaic so I can proceed with my workflow.
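
A minimal sketch of how to see how many tiles there are and pick just one `.h5` file from a file listing. It assumes each entry in `data_json['data']['files']` has a `name` key (used earlier) and a `url` key (my assumption about the response). `count_extensions` and `first_h5` are hypothetical helpers, and the file names and URLs below are invented for illustration.

```python
from collections import Counter
import os


def count_extensions(files):
    """Count files per extension, e.g. to see how many .h5 tiles there are."""
    return Counter(os.path.splitext(fileD['name'])[1] for fileD in files)


def first_h5(files):
    """Return the first file entry whose name ends in .h5, or None if there is none."""
    return next((fileD for fileD in files if fileD['name'].endswith('.h5')), None)


# Invented listing; real names and URLs come from data_json['data']['files'].
files = [
    {'name': 'NEON.D17.TEAK.DP3.30006.001.readme.20211007T212640Z.txt',
     'url': 'https://example.org/readme.txt'},
    {'name': 'NEON_D17_TEAK_DP3_316000_4091000_reflectance.h5',
     'url': 'https://example.org/tile_a.h5'},
    {'name': 'NEON_D17_TEAK_DP3_317000_4091000_reflectance.h5',
     'url': 'https://example.org/tile_b.h5'},
]

print(count_extensions(files))
tile = first_h5(files)
print(tile['name'], tile['url'])
```

The chosen entry's URL could then be passed to `requests.get` and the response content written to disk before opening the tile with `h5py`.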