Let's download some lidar data using the [NEON Data Portal API](http://data.neonscience.org/data-api)

First, we import the `requests` and `json` libraries, along with `os` and `time`, and set a few constants:

In [None]:
import requests
import json
import os
import time

SITECODE = "SRER" #the site code for Santa Rita Experimental Range
PRODUCTCODE = "DP1.30003.001" #the product code for Discrete Return lidar
SERVER = "http://data.neonscience.org/api/v0/" #the current server address

We want to track how big the files are, in case the transfer breaks down partway.

Here is a simple way to report file sizes using two helper functions:

In [None]:
def convert_bytes(num):
    """
    this function will convert bytes to MB.... GB... etc
    """
    for x in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if num < 1024.0:
            return "%3.1f %s" % (num, x)
        num /= 1024.0

def file_size(file_path):
    """
    this function will return the file size
    """
    if os.path.isfile(file_path):
        file_info = os.stat(file_path)
        return convert_bytes(file_info.st_size)
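
As a quick check of the helper above, here is a minimal sketch that repeats `convert_bytes` so it runs on its own and formats a few byte counts. The sample values are arbitrary and chosen only for illustration.

```python
def convert_bytes(num):
    """Convert a byte count to a human-readable string (bytes, KB, MB, ...)."""
    for unit in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if num < 1024.0:
            return "%3.1f %s" % (num, unit)
        num /= 1024.0

# arbitrary sample values, chosen only for illustration
samples = [512, 1536, 67108864]
formatted = [convert_bytes(n) for n in samples]
print(formatted)
```

Note that the function falls through and returns `None` for anything of 1024 TB or more, which is unlikely to matter for individual lidar tiles.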

We query the server for the site location:

In [None]:
site_response = requests.get(SERVER + 'sites/' + SITECODE)

Next, we create a JSON blob for viewing the data at this site:

In [None]:
site_response_json = site_response.json()
print(json.dumps(site_response_json, indent=2)) #using json.dumps for formatting

We subset the data to look for the discrete lidar data by date using the product code:

In [None]:
data_products = site_response_json['data']['dataProducts']

#use a list comprehension here if you're feeling fancy
for data_product in data_products:
    if (data_product['dataProductCode'] == PRODUCTCODE):
        months = data_product['availableMonths']

print(months)
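
For the list-comprehension version mentioned in the comment, a minimal sketch might look like the following. The `data_products` list here is a made-up stand-in for `site_response_json['data']['dataProducts']`, and the months in it are invented. Unlike the loop, this version falls back to an empty list when the product code is not found, rather than leaving `months` undefined.

```python
PRODUCTCODE = "DP1.30003.001"

# hypothetical stand-in for site_response_json['data']['dataProducts'];
# the entries and months below are invented for illustration
data_products = [
    {"dataProductCode": "DP1.00001.001", "availableMonths": ["2017-01"]},
    {"dataProductCode": "DP1.30003.001", "availableMonths": ["2017-08", "2018-08"]},
]

# keep the month list of every product whose code matches
matches = [p["availableMonths"] for p in data_products
           if p["dataProductCode"] == PRODUCTCODE]

# take the first match, or an empty list if nothing matched
months = matches[0] if matches else []
print(months)
```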

Using the result we select the date range for only these lidar data:

In [None]:
data_response = requests.get(SERVER + 'data/' + PRODUCTCODE + '/' + SITECODE + '/' + '2017-08')
data_response_json = data_response.json()
print(json.dumps(data_response_json, indent=2))
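
Since the same query URL is assembled by string concatenation in several cells, it may help to wrap it in a small function. This is only a sketch: `build_data_url` is a hypothetical helper name, not part of the NEON API or of `requests`, and it simply reproduces the concatenation used above.

```python
SERVER = "http://data.neonscience.org/api/v0/"

def build_data_url(product_code, site_code, month, server=SERVER):
    """Hypothetical helper: assemble the data-query URL for one product, site, and month."""
    return server + "/".join(["data", product_code, site_code, month])

url = build_data_url("DP1.30003.001", "SRER", "2017-08")
print(url)
```

The resulting string can be passed straight to `requests.get`, so changing the month or site only means changing one argument.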

We can see how many files are in the dataset:

In [None]:
number_files = len(data_response_json["data"]["files"]) #count the entries in the file list
print("number of files in dataset: " + str(number_files))

The JSON blob lists the name of each tile and its URL for downloading.

We can go a level deeper into the JSON by looking up an individual file's URL, name, and size:

In [None]:
data_url = data_response_json["data"]["files"][0]["url"]
data_name = data_response_json["data"]["files"][0]["name"]
data_size = data_response_json["data"]["files"][0]["size"]
print(json.dumps(data_url, indent=0))
print(json.dumps(data_name, indent=0))
print(json.dumps(data_size, indent=0))

In [None]:
print(data_url)

In [None]:
path = '/vol_c/srer/' + data_name
print(path)

In [None]:
data_response = requests.get(SERVER + 'data/' + PRODUCTCODE + '/' + SITECODE + '/' + '2017-08')
data_response_json = data_response.json()
data_url = data_response_json["data"]["files"][0]["url"] 
print("Data URL: " + data_url)
data_name = data_response_json["data"]["files"][0]["name"]
data_size = data_response_json["data"]["files"][0]["size"]
path = '/vol_c/srer/' + data_name

In [None]:
extension_filename = data_url.split('/')[-1]
local_filename = extension_filename.split('?')[0] #drop any query string; also works when the URL has no '?'
print(local_filename)
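
An alternative to splitting the string by hand is to let the standard library parse the URL. The sketch below uses `urllib.parse.urlsplit` to separate the path from the query string. `filename_from_url` is a hypothetical helper name, and the example URL is made up; real NEON download URLs will look different.

```python
import posixpath
from urllib.parse import urlsplit

def filename_from_url(url):
    """Hypothetical helper: return the last path segment of a URL, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)

# made-up URL for illustration only
example_url = "https://example.org/some/dir/tile_001.laz?token=abc123"
print(filename_from_url(example_url))
```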

We are ready to download an individual file using a GET request:

In [None]:
print("Downloading file of size " + str(data_size) + " to " + path)
response = requests.get(data_url, stream=True)
start_time = time.time()
with open(path, "wb") as handle: #the file is closed even if the transfer fails
    for chunk in response.iter_content(chunk_size=67108864): #64 MB chunks
        if chunk: # filter out keep-alive chunks
            handle.write(chunk)
            handle.flush() #so the size on disk reflects this chunk
            print("Downloaded size: " + file_size(path))
print("Expected file size: " + str(data_size) + " bytes")
print("Downloaded file size: " + file_size(path))
print("--- %s seconds ---" % (time.time() - start_time))
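
Printing the two sizes leaves the comparison to the reader. A minimal sketch of an automatic check could compare the byte count on disk with the expected size. `size_matches` is a hypothetical helper name, and the sketch assumes the expected size is a byte count that may arrive as either a string or an integer. A temporary file stands in for a real download here.

```python
import os
import tempfile

def size_matches(path, expected_size):
    """Hypothetical helper: True if the file on disk has exactly expected_size bytes."""
    return os.path.isfile(path) and os.path.getsize(path) == int(expected_size)

# demonstrate with a throwaway temporary file instead of a real download
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 2048)
    tmp_path = tmp.name

ok = size_matches(tmp_path, "2048")       # expected size given as a string
truncated = size_matches(tmp_path, 4096)  # pretend more bytes were expected
os.remove(tmp_path)
print(ok, truncated)
```

A `False` result would be a reasonable trigger for re-downloading the file.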

I've written a for loop that queries the server for the JSON every time a new file is requested.

Initially, I had a problem with downloads failing when every URL came from the initial `requests.get`: the NEON API creates a time stamp for when you hit the service, and after about 5 minutes the files were breaking and only the headers were being downloaded.


In [None]:
for x in range(0, 443): #443 is hard-coded; adjust to the number of files in your dataset
    #re-query the server so the download URL is fresh for every file
    data_response = requests.get(SERVER + 'data/' + PRODUCTCODE + '/' + SITECODE + '/' + '2017-08')
    data_response_json = data_response.json()
    data_url = data_response_json["data"]["files"][x]["url"]
    print("Data URL: " + data_url)
    data_name = data_response_json["data"]["files"][x]["name"]
    data_size = data_response_json["data"]["files"][x]["size"]
    path = '/vol_c/srer/' + data_name
    print("Downloading file of size " + str(data_size) + " to " + path)
    response = requests.get(data_url, stream=True) #stream so large files are not held in memory
    with open(path, "wb") as handle: #the file is closed before the size is checked
        for chunk in response.iter_content(chunk_size=67108864):
            if chunk: # filter out keep-alive chunks
                handle.write(chunk)
    print(data_name + " downloaded!")
    print("Expected file size: " + str(data_size))
    print("Downloaded file size: " + file_size(path))
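
The idea of asking for a fresh URL before each download can be sketched without any network access. In the sketch below, `download_all`, `list_files`, and `download_one` are hypothetical names: `list_files()` stands for "query the server and return the current file list" and `download_one(url, name)` for "save one file". Files are matched by name rather than by index, which is a small departure from the loop above and assumes names are unique within the listing.

```python
def download_all(list_files, download_one):
    """
    Hypothetical sketch: list_files() returns a fresh list of file dicts
    (with 'url' and 'name' keys); download_one(url, name) saves one file.
    """
    names = [entry["name"] for entry in list_files()]
    for name in names:
        # re-query so the URL for this file is newly issued
        fresh = {entry["name"]: entry["url"] for entry in list_files()}
        if name in fresh:
            download_one(fresh[name], name)

# stand-ins so the sketch runs offline; each listing gets a new fake time stamp
calls = {"listings": 0}

def fake_list_files():
    calls["listings"] += 1
    stamp = calls["listings"]
    return [{"name": n, "url": "u/%s?t=%d" % (n, stamp)} for n in ("a.laz", "b.laz")]

downloaded = []
download_all(fake_list_files, lambda url, name: downloaded.append((name, url)))
print(downloaded)
```

Each file is fetched with a URL from a listing requested just before the download, which is the behavior the loop above relies on to avoid stale links.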


Once the data have finished downloading to my VM, I'm ready to move on to the next step: filtering and processing the data with PDAL.