## Download AOP data from the API in Python
### Subset by an area of interest, specified by a Polygon

This script walks through a basic workflow for exploring the NEON API with helper functions, then downloads a subset of NEON discrete lidar data within an area of interest at the NEON site SERC (Smithsonian Environmental Research Center).

### Requirements
1. Python packages (all need to be installed except `urllib`, which is part of the Python standard library):
    - h5py
    - requests
    - urllib
    - shapely
    - numpy
    - pandas
    
2. This script calls on a module located in the directory '../python_modules', which will be in the correct location if you clone this full repository. The module is called `neon_aop_download_functions.py`.

The first code chunks below import the required Python packages and modules, create an instance of a class that handles the input requirements for the API (requests and token), and define the NEON discrete lidar data product ID and site of interest.

In [1]:
# import the required packages
import os, sys
import time
import numpy as np
import pandas as pd
from shapely.geometry import Polygon, box

In [2]:
# add the python_modules directory to the path and import the NEON AOP download functions
sys.path.insert(0, '../python_modules')
import neon_aop_download_functions as neon_dl

TIP: When downloading data using the NEON API, we recommend using a token. To set this up, refer to the first part of the [NEON API Tokens Tutorial](https://www.neonscience.org/resources/learning-hub/tutorials/neon-api-tokens-tutorial). While this tutorial is written for working in R, the instructions for creating your personal token are the same.

You would enter your token in the Python download function as follows, where 'token_string' will be your token string obtained from your NEON User Account. By default, the function will not use a token (as shown in the code block below).

For AOP, only the current release (2023) and Provisional data are available, so you can leave `release_tag` as the default (None). Please be aware of the nature of Provisional data. For more information, please read: [data-revisions-releases](https://www.neonscience.org/data-samples/data-management/data-revisions-releases)

```python
my_token = 'token_string' 
neon_api = neon_dl.AopApiHandler(token=my_token, release_tag=None)
```
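
To avoid hard-coding a token in a notebook that may be shared, one option is to read it from an environment variable. The sketch below is only an illustration: the variable name `NEON_TOKEN` and the helper `load_neon_token` are hypothetical and are not part of `neon_aop_download_functions`.

```python
import os

def load_neon_token(env_var="NEON_TOKEN"):
    """Return the token stored in env_var, or None if it is unset or empty."""
    # env_var is a hypothetical name; use whatever variable you set on your system
    token = os.environ.get(env_var, "").strip()
    return token or None

my_token = load_neon_token()  # None when the environment variable is not set
```

The result can then be passed as `token=my_token` when creating the `AopApiHandler`; a value of `None` should correspond to not using a token.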

In [3]:
# create an instance of the class with optional inputs (token and release_tag), or leave blank for default parameters
neon_api = neon_dl.AopApiHandler()

# replace with the code below including your token
# my_token = 'token_string'
# neon_api = neon_dl.AopApiHandler(token = my_token)

In [4]:
# use the "list_urls_by_product_site" method of neon_api to show the urls of data available at SERC
serc_urls = neon_api.list_urls_by_product_site('DP1.30003.001','SERC')
serc_urls

['https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2016-07',
 'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2017-07',
 'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2017-08',
 'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2019-05',
 'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2021-08',
 'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2022-05']

In [5]:
# set the NEON data product ID and site, see links in comments below for more details on these
dpID='DP1.30003.001' # https://data.neonscience.org/data-products/DP1.30003.001
site = 'SERC' # https://www.neonscience.org/field-sites/serc

Now you can use the `neon_api` object to determine which data and files are available, and to subset them by an area (polygon) of interest.

In [6]:
# use the "list_urls_by_product_site" method of neon_api to show the urls of data available at SERC
neon_api.list_urls_by_product_site(dpID,site)

['https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2016-07',
 'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2017-07',
 'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2017-08',
 'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2019-05',
 'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2021-08',
 'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2022-05']

In [7]:
# let's look at only the last year of data (the last url in the list)
urls = ['https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2022-05']
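
Rather than copying the last URL by hand, the most recent year can be selected from the list programmatically. This is a minimal sketch that assumes every URL ends in a `YYYY-MM` segment, as in the output above; the helper `latest_year_urls` is hypothetical.

```python
available_urls = [
    'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2016-07',
    'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2017-07',
    'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2017-08',
    'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2019-05',
    'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2021-08',
    'https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2022-05',
]

def latest_year_urls(url_list):
    """Return every URL whose trailing YYYY-MM segment falls in the most recent year."""
    years = [u.rsplit('/', 1)[-1][:4] for u in url_list]
    latest = max(years)  # four-digit year strings compare correctly as text
    return [u for u, y in zip(url_list, years) if y == latest]

urls = latest_year_urls(available_urls)
print(urls)
# ['https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2022-05']
```

Grouping by year rather than taking the last element keeps all months together when a site was flown across two months in the same year, as happened at SERC in 2017.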

In [8]:
# the discrete lidar data product (DP1.30003.001) includes the unclassified point clouds by flight line, 
# classified (and colorized) point cloud by 1km x 1km tile, and metadata shape files
all_laz_files = neon_api.list_all_files(urls)
all_laz_files[:10]

['2022053017_P1C1_SBET_QAQC.pdf',
 'NEON.D02.SERC.DP1.30003.001.readme.20220725T151332Z.txt',
 'NEON_D02_SERC_DP1_361000_4301000_classified_point_cloud_colorized.laz',
 'NEON_D02_SERC_DPQA_359000_4301000_boundary.prj',
 'NEON_D02_SERC_DPQA_359000_4304000_boundary.prj',
 'NEON_D02_SERC_DP1_358000_4304000_classified_point_cloud_colorized.laz',
 'NEON_D02_SERC_DPQA_L014-1_2022053017_boundary.kml',
 'NEON_D02_SERC_DPQA_363000_4300000_boundary.dbf',
 'NEON_D02_SERC_DPQA_363000_4303000_boundary.dbf',
 'NEON_D02_SERC_DP1_363000_4309000_classified_point_cloud_colorized.laz']

In [9]:
# extensions for classified point clouds are either 'colorized.laz' or '_classified_point_cloud.laz'
# to see all the files available for only the classified point cloud data, specify those extensions
classified_laz_exts = ('colorized.laz','_classified_point_cloud.laz')
classified_laz_files = neon_api.list_all_files(urls,classified_laz_exts)
classified_laz_files[:10]

['NEON_D02_SERC_DP1_361000_4301000_classified_point_cloud_colorized.laz',
 'NEON_D02_SERC_DP1_358000_4304000_classified_point_cloud_colorized.laz',
 'NEON_D02_SERC_DP1_363000_4309000_classified_point_cloud_colorized.laz',
 'NEON_D02_SERC_DP1_361000_4306000_classified_point_cloud_colorized.laz',
 'NEON_D02_SERC_DP1_365000_4302000_classified_point_cloud_colorized.laz',
 'NEON_D02_SERC_DP1_365000_4307000_classified_point_cloud_colorized.laz',
 'NEON_D02_SERC_DP1_363000_4304000_classified_point_cloud_colorized.laz',
 'NEON_D02_SERC_DP1_358000_4309000_classified_point_cloud_colorized.laz',
 'NEON_D02_SERC_DP1_368000_4307000_classified_point_cloud_colorized.laz',
 'NEON_D02_SERC_DP1_366000_4304000_classified_point_cloud_colorized.laz']
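
The extension filter appears to behave like a suffix match on the file names. How `list_all_files` applies it internally is not shown here, but the same effect can be sketched with `str.endswith`, which accepts a tuple of suffixes. The helper `filter_by_extension` below is hypothetical and uses a few file names from the earlier output.

```python
sample_files = [
    '2022053017_P1C1_SBET_QAQC.pdf',
    'NEON_D02_SERC_DP1_361000_4301000_classified_point_cloud_colorized.laz',
    'NEON_D02_SERC_DPQA_359000_4301000_boundary.prj',
    'NEON_D02_SERC_DPQA_L014-1_2022053017_boundary.kml',
]

def filter_by_extension(file_names, extensions):
    """Keep the names that end with any of the given suffixes."""
    if isinstance(extensions, str):
        extensions = (extensions,)  # allow a single suffix as well as a tuple
    return [f for f in file_names if f.endswith(tuple(extensions))]

print(filter_by_extension(sample_files, ('colorized.laz', '_classified_point_cloud.laz')))
# ['NEON_D02_SERC_DP1_361000_4301000_classified_point_cloud_colorized.laz']
print(filter_by_extension(sample_files, 'boundary.kml'))
# ['NEON_D02_SERC_DPQA_L014-1_2022053017_boundary.kml']
```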

In [10]:
# to see the corresponding boundary files, you can use the boundary.kml extension
# these include KML files of the flightline boundaries as well as the tile boundaries
boundary_kmls = neon_api.list_all_files(urls,'boundary.kml')
boundary_kmls[:10]

['NEON_D02_SERC_DPQA_L014-1_2022053017_boundary.kml',
 'NEON_D02_SERC_DPQA_365000_4309000_boundary.kml',
 'NEON_D02_SERC_DPQA_365000_4300000_boundary.kml',
 'NEON_D02_SERC_DPQA_365000_4306000_boundary.kml',
 'NEON_D02_SERC_DPQA_365000_4303000_boundary.kml',
 'NEON_D02_SERC_DPQA_L022-1_2022053017_boundary.kml',
 'NEON_D02_SERC_DPQA_L021-1_2022053017_boundary.kml',
 'NEON_D02_SERC_DPQA_L026-1_2022052911_boundary.kml',
 'NEON_D02_SERC_DPQA_362000_4305000_boundary.kml',
 'NEON_D02_SERC_DPQA_362000_4308000_boundary.kml']

In [11]:
# Define the UTM polygon vertices
polygon_vertices = [(362050, 4308050), (364600, 4307460), (365200, 4306600), (363525, 4305230)]
polygon = Polygon(polygon_vertices)

# Function to extract UTM coordinates from file names
def extract_coordinates(file_name):
    parts = file_name.split('_')
    utm_x = int(parts[4])
    utm_y = int(parts[5])
    return utm_x, utm_y

# Function to create a tile polygon from UTM coordinates
def create_tile_polygon(utm_x, utm_y):
    # Each tile is 1km by 1km
    return box(utm_x, utm_y, utm_x + 1000, utm_y + 1000)

# Find tiles that intersect with the polygon
tile_subset = []

for file_name in classified_laz_files:
    utm_x, utm_y = extract_coordinates(file_name)
    tile_polygon = create_tile_polygon(utm_x, utm_y)
    # intersects() is also True for tiles that lie entirely within the polygon
    if tile_polygon.intersects(polygon):
        tile_subset.append(file_name)

# Print the list of tiles that intersect with the polygon
print('Classified point cloud tiles found within the polygon:')
print("%s files found" % len(tile_subset))
for tile_name in tile_subset:
    print(tile_name)

Classified point cloud tiles found within the polygon:
10 files found
NEON_D02_SERC_DP1_364000_4306000_classified_point_cloud_colorized.laz
NEON_D02_SERC_DP1_363000_4305000_classified_point_cloud_colorized.laz
NEON_D02_SERC_DP1_365000_4306000_classified_point_cloud_colorized.laz
NEON_D02_SERC_DP1_364000_4307000_classified_point_cloud_colorized.laz
NEON_D02_SERC_DP1_364000_4305000_classified_point_cloud_colorized.laz
NEON_D02_SERC_DP1_363000_4306000_classified_point_cloud_colorized.laz
NEON_D02_SERC_DP1_363000_4307000_classified_point_cloud_colorized.laz
NEON_D02_SERC_DP1_362000_4308000_classified_point_cloud_colorized.laz
NEON_D02_SERC_DP1_362000_4306000_classified_point_cloud_colorized.laz
NEON_D02_SERC_DP1_362000_4307000_classified_point_cloud_colorized.laz
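
`extract_coordinates` relies on the tile coordinates sitting at fixed positions in the underscore-separated name, so it would raise a `ValueError` on names that do not follow the tile pattern, such as the flightline boundary files listed earlier. A more defensive variant, sketched here with a regular expression, returns `None` for those names. The function name `parse_tile_origin` and the assumption of a 6-digit easting followed by a 7-digit northing are illustrative, based only on the file names shown in this notebook.

```python
import re

# Assumed pattern: _<6-digit easting>_<7-digit northing>_ as seen in the tile names above
TILE_PATTERN = re.compile(r'_(\d{6})_(\d{7})_')

def parse_tile_origin(file_name):
    """Return (utm_x, utm_y) parsed from a tile file name, or None if no tile coordinates are found."""
    match = TILE_PATTERN.search(file_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print(parse_tile_origin('NEON_D02_SERC_DP1_364000_4306000_classified_point_cloud_colorized.laz'))
# (364000, 4306000)
print(parse_tile_origin('NEON_D02_SERC_DPQA_L014-1_2022053017_boundary.kml'))
# None
```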


In [12]:
# trick to find the associated boundary kml files, they follow a similar naming convention
laz_boundary_subset = [f.replace('DP1','DPQA').replace('classified_point_cloud_colorized.laz','boundary.kml') for f in tile_subset]
laz_boundary_subset

['NEON_D02_SERC_DPQA_364000_4306000_boundary.kml',
 'NEON_D02_SERC_DPQA_363000_4305000_boundary.kml',
 'NEON_D02_SERC_DPQA_365000_4306000_boundary.kml',
 'NEON_D02_SERC_DPQA_364000_4307000_boundary.kml',
 'NEON_D02_SERC_DPQA_364000_4305000_boundary.kml',
 'NEON_D02_SERC_DPQA_363000_4306000_boundary.kml',
 'NEON_D02_SERC_DPQA_363000_4307000_boundary.kml',
 'NEON_D02_SERC_DPQA_362000_4308000_boundary.kml',
 'NEON_D02_SERC_DPQA_362000_4306000_boundary.kml',
 'NEON_D02_SERC_DPQA_362000_4307000_boundary.kml']
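
Because these names are derived by string replacement rather than read from the API, it can be worth confirming that each one actually appears in the list of available boundary files before downloading. A minimal sketch of that check follows; the helper `split_by_availability` is hypothetical, and the second derived name is made up to show a miss.

```python
def split_by_availability(expected_names, available_names):
    """Split expected_names into (found, missing) relative to available_names."""
    available = set(available_names)  # set lookup keeps the check fast for long lists
    found = [n for n in expected_names if n in available]
    missing = [n for n in expected_names if n not in available]
    return found, missing

derived = [
    'NEON_D02_SERC_DPQA_365000_4306000_boundary.kml',
    'NEON_D02_SERC_DPQA_999000_9999000_boundary.kml',  # made-up name, not a real tile
]
available = [
    'NEON_D02_SERC_DPQA_365000_4306000_boundary.kml',
    'NEON_D02_SERC_DPQA_362000_4305000_boundary.kml',
]

found, missing = split_by_availability(derived, available)
print(found)    # ['NEON_D02_SERC_DPQA_365000_4306000_boundary.kml']
print(missing)  # ['NEON_D02_SERC_DPQA_999000_9999000_boundary.kml']
```

In the notebook, the same check could be run with `laz_boundary_subset` as the expected names and the full list returned by `neon_api.list_all_files(urls, 'boundary.kml')` as the available names.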

In [13]:
file_urls = neon_api.get_aop_file_urls(product=dpID,
                                       site='SERC',
                                       file_list=tile_subset,
                                       year='2022')

https://data.neonscience.org/api/v0/data/DP1.30003.001/SERC/2022-05


Download these file lists (for the boundary kml files and associated classified point cloud data tiles) using the `download_aop_file_list` function:

In [14]:
# download the laz_boundary_files (kmls) within the polygon
start_time = time.time()
neon_api.download_aop_file_list(product=dpID,
                               site='SERC',
                               file_list=laz_boundary_subset,
                               year='2022',
                               download_folder=r'.\data\laz_boundary',  # raw string so the backslashes are not read as escape sequences
                               check_size = False)
print("--- %s seconds ---" % round((time.time() - start_time),0))

Downloading NEON_D02_SERC_DPQA_365000_4306000_boundary.kml to .\data\laz_boundary
Downloading NEON_D02_SERC_DPQA_362000_4308000_boundary.kml to .\data\laz_boundary
Downloading NEON_D02_SERC_DPQA_363000_4305000_boundary.kml to .\data\laz_boundary
Downloading NEON_D02_SERC_DPQA_364000_4306000_boundary.kml to .\data\laz_boundary
Downloading NEON_D02_SERC_DPQA_364000_4305000_boundary.kml to .\data\laz_boundary
Downloading NEON_D02_SERC_DPQA_364000_4307000_boundary.kml to .\data\laz_boundary
Downloading NEON_D02_SERC_DPQA_362000_4307000_boundary.kml to .\data\laz_boundary
Downloading NEON_D02_SERC_DPQA_363000_4306000_boundary.kml to .\data\laz_boundary
Downloading NEON_D02_SERC_DPQA_362000_4306000_boundary.kml to .\data\laz_boundary
Downloading NEON_D02_SERC_DPQA_363000_4307000_boundary.kml to .\data\laz_boundary
--- 1.0 seconds ---
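
The download folder above is written with Windows-style backslashes. For a notebook that should also run on macOS or Linux, one option is to build the path with `pathlib`. This sketch assumes that `download_folder` accepts any path string, which is not confirmed here; the helper `build_download_folder` is hypothetical, and the demonstration writes to a temporary directory rather than to the working directory.

```python
import tempfile
from pathlib import Path

def build_download_folder(base, *parts):
    """Join parts onto base with the platform's separator and create the folder if needed."""
    folder = Path(base).joinpath(*parts)
    folder.mkdir(parents=True, exist_ok=True)
    return folder

# Demonstrate in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    folder = build_download_folder(tmp_dir, 'data', 'laz_boundary')
    print(folder.is_dir())  # True
    print(folder.name)      # laz_boundary
```

In the notebook, `str(build_download_folder('.', 'data', 'laz_boundary'))` could then be passed as `download_folder`.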


In [15]:
start_time = time.time()
neon_api.download_aop_file_list(product=dpID,
                               site='SERC',
                               file_list=tile_subset,
                               year='2022',
                               download_folder=r'.\data\laz',  # raw string so the backslashes are not read as escape sequences
                               check_size = True)
print("--- %s minutes ---" % round((time.time() - start_time)/60,1))

file count: 10
Download size: 1.54 GB
Do you want to continue with the download? (y/n) y
Downloading NEON_D02_SERC_DP1_364000_4306000_classified_point_cloud_colorized.laz to .\data\laz
Downloading NEON_D02_SERC_DP1_363000_4305000_classified_point_cloud_colorized.laz to .\data\laz
Downloading NEON_D02_SERC_DP1_365000_4306000_classified_point_cloud_colorized.laz to .\data\laz
Downloading NEON_D02_SERC_DP1_364000_4307000_classified_point_cloud_colorized.laz to .\data\laz
Downloading NEON_D02_SERC_DP1_364000_4305000_classified_point_cloud_colorized.laz to .\data\laz
Downloading NEON_D02_SERC_DP1_363000_4306000_classified_point_cloud_colorized.laz to .\data\laz
Downloading NEON_D02_SERC_DP1_363000_4307000_classified_point_cloud_colorized.laz to .\data\laz
Downloading NEON_D02_SERC_DP1_362000_4308000_classified_point_cloud_colorized.laz to .\data\laz
Downloading NEON_D02_SERC_DP1_362000_4306000_classified_point_cloud_colorized.laz to .\data\laz
Downloading NEON_D02_SERC_DP1_362000_4307000_cl