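Let's download some lidar data using the [NEON Data Portal API](http://data.neonscience.org/data-api).

First we check the Python version; then we import the `requests` and `json` libraries (plus `os` and `time`) and set the site code, product code, and server address.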

In [3]:
import sys
print(sys.version)

3.5.5 |Anaconda, Inc.| (default, May 13 2018, 21:12:35) 
[GCC 7.2.0]


In [5]:
import requests
import json
import os
import time

SITECODE = "SRER" #the site code for Santa Rita Experimental Range
PRODUCTCODE = "DP1.30003.001" #the product code for Discrete Return lidar
SERVER = "http://data.neonscience.org/api/v0/" #the current server address

We want to track how big the files are, in case the transfer breaks down partway.

Here is a simple way to report a file's size using two helper functions:

In [6]:
def convert_bytes(num):
    """
    Convert a size in bytes to a readable string (bytes, KB, MB, GB, or TB).
    """
    for x in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if num < 1024.0:
            return "%3.1f %s" % (num, x)
        num /= 1024.0

def file_size(file_path):
    """
    Return the size of the file at file_path as a readable string.
    Returns None if the path is not an existing file.
    """
    if os.path.isfile(file_path):
        file_info = os.stat(file_path)
        return convert_bytes(file_info.st_size)
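
As a quick sanity check of the conversion helper, here is a small self-contained sketch (it repeats `convert_bytes` so it runs on its own). The value 512 is purely illustrative; 42397860 is the size in bytes that the API reports for the first lidar file later in this notebook.

```python
def convert_bytes(num):
    """Convert a size in bytes to a readable string (bytes, KB, MB, GB, or TB)."""
    for unit in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if num < 1024.0:
            return "%3.1f %s" % (num, unit)
        num /= 1024.0

small = convert_bytes(512)        # illustrative value, stays in bytes
lidar = convert_bytes(42397860)   # size of the first .laz file, in bytes
print(small)   # 512.0 bytes
print(lidar)   # 40.4 MB
```

Note that the units step by 1024, so the 42397860-byte file shows up as 40.4 MB rather than 42.4 MB.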

We query the server for the site location:

In [7]:
site_response = requests.get(SERVER + 'sites/' + SITECODE)
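
The same string concatenation is repeated for every endpoint in this notebook, so a small helper can keep the slashes straight. This is only a sketch: `neon_url` is a hypothetical helper of my own, not part of the NEON API or of `requests`.

```python
SERVER = "http://data.neonscience.org/api/v0/"

def neon_url(*parts):
    """Join path parts onto SERVER with single slashes (hypothetical helper)."""
    return SERVER.rstrip('/') + '/' + '/'.join(parts)

# the site endpoint queried above
site_url = neon_url('sites', 'SRER')
# the data endpoint used later in this notebook
data_url = neon_url('data', 'DP1.30003.001', 'SRER', '2017-08')
print(site_url)
print(data_url)
```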

Next, we parse the response as JSON and pretty-print it to view the data available at this site:

In [8]:
site_response_json = site_response.json()
print(json.dumps(site_response_json, indent=2)) #using json.dumps for formatting

{
  "data": {
    "stateName": "Arizona",
    "domainName": "Desert Southwest",
    "siteType": "CORE",
    "stateCode": "AZ",
    "siteLongitude": -110.83549,
    "dataProducts": [
      {
        "dataProductCode": "DP1.10066.001",
        "dataProductTitle": "Root sampling (Megapit)",
        "availableDataUrls": [
          "http://data.neonscience.org:80/api/v0/data/DP1.10066.001/SRER/2013-11"
        ],
        "availableMonths": [
          "2013-11"
        ]
      },
      {
        "dataProductCode": "DP1.10023.001",
        "dataProductTitle": "Herbaceous clip harvest",
        "availableDataUrls": [
          "http://data.neonscience.org:80/api/v0/data/DP1.10023.001/SRER/2016-04",
          "http://data.neonscience.org:80/api/v0/data/DP1.10023.001/SRER/2016-08",
          "http://data.neonscience.org:80/api/v0/data/DP1.10023.001/SRER/2017-04",
          "http://data.neonscience.org:80/api/v0/data/DP1.10023.001/SRER/2017-05",
          "http://data.neonscience.org:80/api/v0/

We subset the data products by product code to find the months for which discrete return lidar data are available:

In [9]:
data_products = site_response_json['data']['dataProducts']

#use a list comprehension here if you're feeling fancy
for data_product in data_products:
    if (data_product['dataProductCode'] == PRODUCTCODE):
        months = data_product['availableMonths']

print(months)

['2017-08']
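
For the "feeling fancy" route, here is a sketch of the same lookup as a list comprehension. The `data_products` list is a trimmed-down stand-in for `site_response_json['data']['dataProducts']`, using two entries that appear in this notebook. Unlike the loop, this version leaves `months` defined (as an empty list) when the product code is not found.

```python
# Trimmed-down stand-in for site_response_json['data']['dataProducts']
data_products = [
    {"dataProductCode": "DP1.10066.001", "availableMonths": ["2013-11"]},
    {"dataProductCode": "DP1.30003.001", "availableMonths": ["2017-08"]},
]
PRODUCTCODE = "DP1.30003.001"

# Collect the month lists of every matching product (normally zero or one match)
matches = [p["availableMonths"] for p in data_products
           if p["dataProductCode"] == PRODUCTCODE]

# Fall back to an empty list so `months` is always defined
months = matches[0] if matches else []
print(months)
```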


Using that month, we request the list of files for just these lidar data:

In [10]:
data_response = requests.get(SERVER + 'data/' + PRODUCTCODE + '/' + SITECODE + '/' + '2017-08')
data_response_json = data_response.json()
print(json.dumps(data_response_json, indent=2))

{
  "data": {
    "month": "2017-08",
    "files": [
      {
        "url": "https://neon-aop-product.s3.data.neonscience.org:443/2017/FullSite/D14/2017_SRER_1/L1/DiscreteLidar/Laz/NEON_D14_SRER_DP1_L084-4_2017082519_unclassified_point_cloud.laz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20180710T211034Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3599&X-Amz-Credential=pub-internal-read%2F20180710%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=63b769bbd8a7f3fc153e40e689c18d2b825fcb22cfa2d88272c365786956fb41",
        "size": "42397860",
        "crc32": "e633e720",
        "name": "NEON_D14_SRER_DP1_L084-4_2017082519_unclassified_point_cloud.laz"
      },
      {
        "url": "https://neon-aop-product.s3.data.neonscience.org:443/2017/FullSite/D14/2017_SRER_1/L1/DiscreteLidar/ClassifiedPointCloud/NEON_D14_SRER_DP1_514000_3524000_classified_point_cloud.laz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20180710T211034Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Credential=pub-internal

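We can see how many files are in the dataset by taking the length of the `files` list. Calling `len()` on `files[0]["url"]` instead returns the number of characters in the first URL (432 in the run recorded here), not a file count.

In [11]:
number_files = len(data_response_json["data"]["files"])  # count list entries, not URL characters
print("number of files in dataset: " + str(number_files))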


The JSON lists each tile's file name along with its URL for downloading.

We can go a level deeper into the JSON and pull out the URL, name, and size of an individual file:

In [14]:
data_url = data_response_json["data"]["files"][0]["url"]
data_name = data_response_json["data"]["files"][0]["name"]
data_size = data_response_json["data"]["files"][0]["size"]
print(json.dumps(data_url, indent=0))
print(json.dumps(data_name, indent=0))
print(json.dumps(data_size, indent=0))

"https://neon-aop-product.s3.data.neonscience.org:443/2017/FullSite/D14/2017_SRER_1/L1/DiscreteLidar/Laz/NEON_D14_SRER_DP1_L084-4_2017082519_unclassified_point_cloud.laz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20180710T211034Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3599&X-Amz-Credential=pub-internal-read%2F20180710%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=63b769bbd8a7f3fc153e40e689c18d2b825fcb22cfa2d88272c365786956fb41"
"NEON_D14_SRER_DP1_L084-4_2017082519_unclassified_point_cloud.laz"
"42397860"


In [15]:
print(data_url)

https://neon-aop-product.s3.data.neonscience.org:443/2017/FullSite/D14/2017_SRER_1/L1/DiscreteLidar/Laz/NEON_D14_SRER_DP1_L084-4_2017082519_unclassified_point_cloud.laz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20180710T211034Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3599&X-Amz-Credential=pub-internal-read%2F20180710%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=63b769bbd8a7f3fc153e40e689c18d2b825fcb22cfa2d88272c365786956fb41
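
Because `size` comes back as a string of bytes, it has to be converted with `int()` before doing any arithmetic, for example to estimate how much disk space a set of files needs. A minimal sketch, where the `files` list is a two-entry stand-in for `data_response_json["data"]["files"]` built from names and sizes that appear in this notebook:

```python
# Two-entry stand-in for data_response_json["data"]["files"]
files = [
    {"name": "NEON_D14_SRER_DP1_L084-4_2017082519_unclassified_point_cloud.laz",
     "size": "42397860"},
    {"name": "NEON_D14_SRER_DP1_518000_3516000_classified_point_cloud.laz",
     "size": "15645597"},
]

# Sizes are strings, so convert each one before summing
total_bytes = sum(int(f["size"]) for f in files)
print("total bytes: " + str(total_bytes))
```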


In [16]:
path = '/scratch/' + data_name
print(path)

/scratch/NEON_D14_SRER_DP1_L084-4_2017082519_unclassified_point_cloud.laz


In [20]:
data_response = requests.get(SERVER + 'data/' + PRODUCTCODE + '/' + SITECODE + '/' + '2017-08')
data_response_json = data_response.json()
data_url = data_response_json["data"]["files"][0]["url"] 
print("Data URL: " + data_url)
data_name = data_response_json["data"]["files"][0]["name"]
data_size = data_response_json["data"]["files"][0]["size"]
path = '/scratch/' + data_name

Data URL: https://neon-aop-product.s3.data.neonscience.org:443/2017/FullSite/D14/2017_SRER_1/L1/DiscreteLidar/Laz/NEON_D14_SRER_DP1_L084-4_2017082519_unclassified_point_cloud.laz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20180710T211210Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Credential=pub-internal-read%2F20180710%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=e9f87ccfb20b1a864a721fbee928b59fd41284cac34997a698c84c150aaa3c50


In [21]:
extension_filename = data_url.split('/')[-1]  # last path segment, still carrying the query string
local_filename = extension_filename.split('?')[0]  # keep everything before the '?'
print(local_filename)

NEON_D14_SRER_DP1_L084-4_2017082519_unclassified_point_cloud.laz
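
An alternative to splitting the string by hand is to let the standard library separate the path from the query string. This is a sketch using `urllib.parse.urlsplit` and `posixpath.basename`; the URL is the one from above with its query string shortened for readability.

```python
import posixpath
from urllib.parse import urlsplit

# Signed URL from above, with the query string shortened for readability
data_url = ("https://neon-aop-product.s3.data.neonscience.org:443/2017/FullSite/"
            "D14/2017_SRER_1/L1/DiscreteLidar/Laz/"
            "NEON_D14_SRER_DP1_L084-4_2017082519_unclassified_point_cloud.laz"
            "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=3599")

# urlsplit drops the query string; basename keeps only the last path segment
local_filename = posixpath.basename(urlsplit(data_url).path)
print(local_filename)
```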


We are ready to download an individual file with an HTTP GET request:

In [22]:
print("Downloading file of size " + data_size + " to " + path)
response = requests.get(data_url, stream=True)
start_time = time.time()
with open(path, "wb") as handle:  # the context manager closes the file for us
    for chunk in response.iter_content(chunk_size=67108864):  # 64 MiB chunks
        if chunk:  # skip empty keep-alive chunks
            handle.write(chunk)
            handle.flush()  # make sure the size on disk is current
            print("Downloaded size: " + file_size(path))
print("Expected file size: " + data_size + " bytes")
print("Downloaded file size: " + file_size(path))
print("--- %s seconds ---" % (time.time() - start_time))

Downloading file of size 42397860 to /scratch/NEON_D14_SRER_DP1_L084-4_2017082519_unclassified_point_cloud.laz
Downloaded size: 40.4 MB
Expected file size: 42397860 bytes
Downloaded file size: 40.4 MB
--- 65.40738487243652 seconds ---
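
Comparing the readable sizes by eye works for one file, but the API listing also gives an exact `size` and a `crc32` for each file. Below is a sketch of an automated check. It assumes the `crc32` field is the standard CRC-32 of the file contents written as hex, which I have not confirmed against the NEON documentation. `file_crc32` and `looks_complete` are hypothetical helper names, and the demo runs on a throwaway file rather than a real `.laz`.

```python
import os
import tempfile
import zlib

def file_crc32(path, chunk_size=1024 * 1024):
    """Return the CRC-32 of a file as 8 lowercase hex digits."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return "%08x" % (crc & 0xffffffff)

def looks_complete(path, expected_size, expected_crc32):
    """Check the size first (cheap), then the checksum.

    Assumes expected_crc32 is a standard CRC-32 in hex (unconfirmed for NEON).
    """
    if os.path.getsize(path) != int(expected_size):
        return False
    return file_crc32(path) == expected_crc32.lower()

# Demo on a throwaway file, since the real .laz is not available here
payload = b"not a real lidar file"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_crc = "%08x" % (zlib.crc32(payload) & 0xffffffff)

ok = looks_complete(tmp.name, str(len(payload)), demo_crc)
truncated = looks_complete(tmp.name, str(len(payload) + 1), demo_crc)
os.remove(tmp.name)
print(ok, truncated)
```

In the notebook this would be called as `looks_complete(path, data_size, ...)` with the `crc32` value from the same `files` entry.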


I've written a for loop that queries the server for fresh JSON every time a new file is requested. Initially, I had a problem with the loop timing out when it reused the URLs from the initial requests.get: the NEON API creates a time stamp when you hit the service, and after about 5 minutes the files were breaking and only the headers were being downloaded.
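
The signed URLs carry their own signing time and lifetime in the `X-Amz-Date` and `X-Amz-Expires` query parameters (standard AWS Signature Version 4 fields, with the lifetime given in seconds). Here is a minimal sketch for reading them, which may help judge how stale a URL is before starting a long download. The failures I saw may not line up exactly with this value. `signed_url_expiry` is a hypothetical helper, and the URL below uses a made-up path with only the two relevant parameters copied from the notebook output.

```python
from datetime import datetime, timedelta
from urllib.parse import urlsplit, parse_qs

def signed_url_expiry(url):
    """Return the (naive, UTC) datetime at which a pre-signed URL expires."""
    query = parse_qs(urlsplit(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at + lifetime

# Hypothetical path; the two query parameters are copied from the output above
url = ("https://neon-aop-product.s3.data.neonscience.org:443/example.laz"
       "?X-Amz-Date=20180710T211034Z&X-Amz-Expires=3599")
expiry = signed_url_expiry(url)
print(expiry)   # 2018-07-10 22:10:33
```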


In [14]:
month_url = SERVER + 'data/' + PRODUCTCODE + '/' + SITECODE + '/' + '2017-08'
number_files = len(requests.get(month_url).json()["data"]["files"])  # count the files instead of hard-coding it
for x in range(number_files):
    # re-query on every pass so each file gets a freshly signed URL
    data_response = requests.get(month_url)
    data_response_json = data_response.json()
    data_url = data_response_json["data"]["files"][x]["url"]
    print("Data URL: " + data_url)
    data_name = data_response_json["data"]["files"][x]["name"]
    data_size = data_response_json["data"]["files"][x]["size"]
    path = '/vol_c/srer/' + data_name
    print("Downloading file of size " + data_size + " to " + path)
    response = requests.get(data_url, stream=True)  # stream so the whole file is not held in memory
    with open(path, "wb") as handle:  # closes the file even if the download fails
        for chunk in response.iter_content(chunk_size=67108864):
            if chunk:  # skip empty keep-alive chunks
                handle.write(chunk)
    print(data_name + " downloaded!")
    print("Expected file size: " + data_size)
    print("Downloaded file size: " + file_size(path))


Data URL: https://neon-aop-product.s3.data.neonscience.org:443/2017/FullSite/D14/2017_SRER_1/L1/DiscreteLidar/ClassifiedPointCloud/NEON_D14_SRER_DP1_518000_3516000_classified_point_cloud.laz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20180409T221350Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Credential=pub-internal-read%2F20180409%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=597911c1bba45e26adfd048c67688a3f59716a3a5838c5c235820cc312cd988c
Downloading file of size 15645597 to /vol_c/srer/NEON_D14_SRER_DP1_518000_3516000_classified_point_cloud.laz


KeyboardInterrupt: 
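
Since the loop above was interrupted, restarting it would download every file again from the beginning. One possible way around that is to skip files that are already on disk at the expected size. The sketch below uses a hypothetical helper, `needs_download`, and a temporary directory standing in for `/vol_c/srer/`.

```python
import os
import shutil
import tempfile

def needs_download(path, expected_size):
    """True if the file is missing or its size differs from the API's `size`."""
    if not os.path.isfile(path):
        return True
    return os.path.getsize(path) != int(expected_size)

# Demo with a temporary directory standing in for /vol_c/srer/
workdir = tempfile.mkdtemp()
done = os.path.join(workdir, "complete.laz")
with open(done, "wb") as handle:
    handle.write(b"x" * 10)

complete_check = needs_download(done, "10")   # already there at full size
partial_check = needs_download(done, "25")    # on disk but too small
missing_check = needs_download(os.path.join(workdir, "missing.laz"), "10")
shutil.rmtree(workdir)
print(complete_check, partial_check, missing_check)
```

Inside the download loop, this would become a guard such as `if not needs_download(path, data_size): continue` placed before the second `requests.get`.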

Once the data have finished downloading to my VM, I'm ready to move on to the next step: filtering and processing the data with PDAL.