---
syncID: 
title: "Introduction to NEON Discrete Lidar Point Clouds in Python"
description: "Programmatically download lidar data and metadata and explore discrete lidar point clouds in Python"
dateCreated: 2022-09-24
authors: Bridget Hass
contributors: 
estimatedTime: 30 minutes
packagesLibraries: requests, json, gdal, geopandas, laspy, lazrs
topics:
languagesTool: python
dataProduct: DP1.30003.001
code1: https://raw.githubusercontent.com/NEONScience/NEON-Data-Skills/main/tutorials/Python/Lidar/intro-lidar/intro_point_clouds_py/intro_discrete_point_clouds.py
tutorialSeries: 
urlTitle: neon-discrete-point-clouds
---

<div id="ds-objectives" markdown="1">

### Objectives
After completing this tutorial, you will be able to:

* Use Python functions to programmatically download NEON AOP data from the API
* Download and plot shapefiles and kmls included in lidar metadata to visualize coverage for a given year
* Explore the NEON discrete lidar point cloud contents in Python

### Requirements

To follow along with this code, you will need to install Python. We recommend starting in Jupyter Notebooks so you can run each cell "chunk" individually. You can install both Python and Jupyter Notebooks by downloading <a href="https://www.anaconda.com/products/distribution" target="_blank"> Anaconda</a>.

### Additional Resources

If you are interested in learning more about the NEON API, or want a deeper dive into how this works with the Python `requests` package, please refer to the tutorial and webpages linked below.
 * <a href="https://www.neonscience.org/resources/learning-hub/tutorials/neon-api-01-introduction-requests" target="_blank"> Introduction to NEON API in Python  </a>
 * <a href="https://data.neonscience.org/data-api/" target="_blank"> NEON Data API </a>

For a handy resource on Jupyter Notebook tips, tricks and shortcuts, check out the DataQuest blog linked below.
 * <a href="https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-sh" target="_blank"> 28 Jupyter Notebook Tips, Tricks, and Shortcuts  </a>
 

### Install Python Packages

* **requests**
* **json** (part of the Python standard library; no installation needed)
* **gdal**
* **fiona**
* **geopandas**
* **laspy**
* **lazrs**

#### Installation Tips: 
Most of these packages can be installed using `pip install`. For example, to install `gdal`, run the following on the command line:

```python
pip install gdal
```

Within a Jupyter notebook, you can also install packages by prefixing the statement with `!`, which runs it as a shell command (as you would from a command prompt):

```python
!pip install gdal
```

However, for many of the geospatial packages (e.g. gdal, fiona, geopandas), the installation may fail if `pip` cannot find a wheel file that matches your version of Python and your computer. In that case, you can download the matching wheel file yourself. A comprehensive archive of these geospatial (and other) wheel files can be found here:

https://www.lfd.uci.edu/~gohlke/pythonlibs/

For example, to install `gdal` on a 64-bit Windows machine using Python 3.9, download the file GDAL-3.4.3-cp39-cp39-win_amd64.whl, found here (you can also find this by navigating through the link above):

https://www.lfd.uci.edu/~gohlke/pythonlibs/#gdal

```python
pip install C:\Users\username\Downloads\GDAL-3.4.3-cp39-cp39-win_amd64.whl
```

Similarly, to install `fiona`, download the wheel file Fiona-1.8.21-cp39-cp39-win_amd64.whl

```python
pip install C:\Users\username\Downloads\Fiona-1.8.21-cp39-cp39-win_amd64.whl
```

</div>

### Package Installation Requirements (for Python 3.9.12 on Windows 10)
Download the package wheel files from https://www.lfd.uci.edu/~gohlke/pythonlibs/, then install the packages:
- https://www.lfd.uci.edu/~gohlke/pythonlibs/#gdal
- https://www.lfd.uci.edu/~gohlke/pythonlibs/#fiona
- !pip install C:\Users\bhass\Downloads\GDAL-3.4.3-cp39-cp39-win_amd64.whl
- !pip install C:\Users\bhass\Downloads\Fiona-1.8.21-cp39-cp39-win_amd64.whl
- !pip install geopandas
- !pip install laspy

You also need to install a LAZ backend so that laspy can read compressed laz files (see https://laspy.readthedocs.io/en/latest/installation.html):
- !pip install lazrs

Optional dependencies / features

laspy does not support LAZ (.laz) files by itself, but it can use one of several optional dependencies to read and write compressed LAZ files.

The 2 supported options are:
1. lazrs - a Rust port of the laszip compression and decompression. Its main advantage is that it can compress and decompress using multiple threads, which can greatly speed things up. However, it does not support points with waveforms.
2. laszip-python (bindings to laszip) - laszip is the official and original LAZ implementation by Martin Isenburg. The advantages of the laszip backend are that it is the official implementation and that it supports points with waveforms, but it does not offer multi-threaded compression or decompression.

When encountering LAZ data, laspy will try the backends in the order described above. For example, if lazrs is not installed, or if it fails during the process, laspy will try laszip.

Both the laszip bindings and lazrs are available on pip.

- !pip install open3d #for 3D visualization (optional)

The download steps below follow a workflow similar to:
https://github.com/NEONScience/NEON-Data-Skills/blob/main/tutorials-in-development/CyverseNEON/aop_data_download/Download_NEON_AOP_Data_Python_API.ipynb

This tutorial borrows heavily from the following tutorials, applying them to NEON data:
- https://towardsdatascience.com/guide-to-real-time-visualisation-of-massive-3d-point-clouds-in-python-ea6f00241ee0
- https://medium.com/spatial-data-science/an-easy-way-to-work-and-visualize-lidar-data-in-python-eed0e028996c
- https://geemap.org/notebooks/101_lidar/

First, import the required packages. 

Reminder: If you haven't installed these (see the more detailed installation instructions above), you can install them in the notebook as shown below, substituting the path to the wheel file if the install doesn't work on its own. Run the installation for each package separately (e.g. each in its own code cell) so you can make sure each package installs successfully.

```python
!pip install requests
# json is part of the Python standard library, so it does not need to be installed
!pip install gdal
!pip install fiona
!pip install geopandas
!pip install laspy
!pip install lazrs
```
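
Before moving on to the imports, it can help to confirm which packages the notebook's Python environment can actually find. The following is a minimal sketch using only the standard library; `find_missing_packages` and `REQUIRED_MODULES` are hypothetical names introduced here for illustration, and the list assumes that the gdal package is imported through the `osgeo` namespace.

```python
import importlib.util

# Import names for the packages used in this tutorial (an assumed list).
# Note that the gdal package is imported through the `osgeo` namespace.
REQUIRED_MODULES = ["requests", "osgeo", "fiona", "geopandas", "laspy", "lazrs"]

def find_missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

missing = find_missing_packages(REQUIRED_MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are available")
```

Any name that is reported as not installed can then be installed in its own cell, as described above.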

Once all packages are successfully installed, import them as follows. Note that the `requests` package is imported by the separate module that we load next, so you don't need to import it here.

In [1]:
#import required packages
import os
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import laspy

Now we'll pull in all the functions in the module **neon_aop_download_functions.py**, linked at the top of this tutorial. 

First make sure this script is saved in your working directory, which we'll check below, otherwise you will need to provide the relative path to this script. 

In [2]:
# check that script is saved in same folder:
os.listdir()

['.ipynb_checkpoints',
 'intro_discrete_point_clouds.ipynb',
 'neon_aop_download_functions.py']

We can see that the download_functions script is there, so to import the contents, use the command:

In [4]:
from neon_aop_download_functions import *

Alternatively, if you'd like to see the contents of that file, you can use the "magic" command `%load` as follows:

```python
%load neon_aop_download_functions.py
```

If you go this route, you will need to run the cell twice for the functions to be read into the notebook. The first run will load the functions and the second will run the cell. This option of loading in the functions may be useful if you wish to modify the functions in the notebook cell for your specific workflow.

In [None]:
# %load neon_aop_download_functions.py
"""
Created on Mon Apr  8 08:18:00 2019
@author: bhass
"""

import requests, urllib.request, os

def list_available_urls(product,site):
    """
    list_available urls lists the api url for a given product and site
    --------
     Inputs:
         product: the data product code (eg. 'DP3.30015.001' - CHM)
         site: the 4-digit NEON site code (eg. 'SRER', 'JORN')
    --------
    Usage:
    --------
    jorn_chm_urls = list_available_urls('DP3.30015.001','JORN')
    """
    r = requests.get("http://data.neonscience.org/api/v0/products/" + product)
    data_urls = []  # stays empty if the site is not found for this product
    for site_info in r.json()['data']['siteCodes']:
        if site in site_info['siteCode']:
            data_urls = site_info['availableDataUrls']
    if len(data_urls)==0:
        print('WARNING: no urls found for product ' + product + ' at site ' + site)
    else:
        return data_urls

def list_available_urls_by_year(product,site,year):
    """
    list_available urls_by_year lists the api url for a given product, site, and year
    --------
     Inputs:
         product: the data product code (eg. 'DP3.30015.001' - CHM)
         site: the 4-digit NEON site code (eg. 'SRER', 'JORN')
         year: the year data was collected (eg. '2017','2018','2019')
    --------
    Usage:
    --------
    jorn_chm_2018_url = list_available_urls_by_year('DP3.30015.001','JORN','2018')
    """
    r = requests.get("http://data.neonscience.org/api/v0/products/" + product)
    all_data_urls = []  # stays empty if the site is not found for this product
    for site_info in r.json()['data']['siteCodes']:
        if site in site_info['siteCode']:
            all_data_urls = site_info['availableDataUrls']
    # keep only the urls that contain the requested year
    data_urls = [url for url in all_data_urls if year in url]
    if len(data_urls)==0:
        print('WARNING: no urls found for product ' + product + ' at site ' + site + ' in year ' + year)
    else:
        return data_urls
    
def download_urls(url_list,download_folder_root,zip=False):
    # downloads data from urls to folder, maintaining month-year folder structure
    for url in url_list:
        month = url.split('/')[-1]
        download_folder = download_folder_root + month + '/'
        if not os.path.exists(download_folder):
            os.makedirs(download_folder)
        r=requests.get(url)
        files=r.json()['data']['files']
        for i in range(len(files)):
            if zip==False:
                if '.zip' not in files[i]['name']:
                    print('downloading ' + files[i]['name'] + ' to ' + download_folder)
                    urllib.request.urlretrieve(files[i]['url'],download_folder + files[i]['name'])
            elif zip==True:
                if '.zip' in files[i]['name']:
                    print('downloading ' + files[i]['name'] + ' to ' + download_folder)
                    urllib.request.urlretrieve(files[i]['url'],download_folder + files[i]['name'])

def download_file(url,filename):
    r = requests.get(url)
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024): 
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
    return

def get_file_size(urls,match_string):
    size=0
    for url in urls:
        r = requests.get(url)
        files = r.json()['data']['files']
        for i in range(len(files)):
            if match_string is not None:
                if match_string in files[i]['name']:
                    size += int(files[i]['size'])
            else:
                size += int(files[i]['size'])
    # report the total size in bytes, MB, GB, or TB depending on its magnitude
    if size < 10**6:
        print('Download size:',size,'bytes')
    elif size < 10**9:
        print('Download size:',round(size/(10**6),2),'MB')
    elif size < 10**12:
        print('Download size:',round(size/(10**9),2),'GB')
    else:
        print('Download size:',round(size/(10**12),2),'TB')
    return size

def download_aop_files(data_product_id,site,year=None,download_folder='./data',match_string=None,check_size=True):
    """
    download_aop_files downloads NEON AOP files from the AOP for a given data product, site, and 
    optional year, download folder, and 
    --------
     Inputs:
         required:
             data_product_id: the data product code (eg. 'DP3.30015.001' - CHM)
             site: the 4-digit NEON site code (eg. 'SRER', 'JORN')
         
         optional:
             year: year (eg. '2020'); default (None) is all years
             download_folder: folder to store downloaded files; default (./data) in current directory
             match_string: subset of data to match, need to use exact pattern for file name
             check_size: prompt to continue download (y/n) after displaying size; default = True
    --------
    Usage:
    --------
    download_aop_files('DP3.30015.001','JORN','2019','./data/JORN_2019/CHM','314000_3610000_CHM.tif')
    """
    
    #get a list of the urls for a given data product, site, and year (if included)
    if year is not None:
        urls = list_available_urls_by_year(data_product_id,site,year)
    else:
        urls = list_available_urls(data_product_id,site)
    
    #make the download folder if it doesn't already exist
    if not os.path.exists(download_folder):
        os.makedirs(download_folder)
    
    #get the size of all the files you are planning to download
    size = get_file_size(urls,match_string)
    
    #prompt to continue with download after displaying the file size
    if check_size:
        if input("Do you want to continue with the download? (y/n) ") != "y":
            print('Exiting download_aop_files')
            return
    
    #download files in the urls
    for url in urls:
        r = requests.get(url)
        files = r.json()['data']['files']
        for i in range(len(files)):
            if match_string is not None:
                if match_string in files[i]['name']:
                    print('downloading ' + files[i]['name'] + ' to ' + download_folder)
                    try:
                        download_file(files[i]['url'],os.path.join(download_folder,files[i]['name']))
                    except requests.exceptions.RequestException as e:
                        print(e)
            else:
                try:
                    download_file(files[i]['url'],os.path.join(download_folder,files[i]['name']))
                except requests.exceptions.RequestException as e:
                    print(e)

Now that we've imported all the required packages and functions, we can get started! First, let's take a look at what exactly we've imported by using the magic command `%whos`. Since there's no variable explorer in Jupyter Notebooks, this is a quick way to see everything we have in our working environment.

In [8]:
%whos

Variable                      Type        Data/Info
---------------------------------------------------
download_aop_files            function    <function download_aop_fi<...>es at 0x0000016A68B848B0>
download_file                 function    <function download_file at 0x0000016A68B84B80>
download_urls                 function    <function download_urls at 0x0000016A68B84CA0>
dpID                          str         DP1.30003.001
get_file_size                 function    <function get_file_size at 0x0000016A68B84C10>
gpd                           module      <module 'geopandas' from <...>\geopandas\\__init__.py'>
laspy                         module      <module 'laspy' from 'C:\<...>ges\\laspy\\__init__.py'>
list_available_urls           function    <function list_available_<...>ls at 0x0000016A637D4F70>
list_available_urls_by_year   function    <function list_available_<...>ar at 0x0000016A68B84040>
np                            module      <module 'numpy' from 'C:\<...>ges\\numpy\

### Data Tip
If you are unsure what a function or module does, you can find more information about the function in two ways: 
1. type `help(function_name)`, which will print out documentation in the cell below, or
2. type the function name followed by `?`, which will pop up the documentation at the bottom of your Jupyter Notebook window, and you can then exit out of it at your convenience

```python
help(requests)
requests?
```

The functions loaded from the `neon_aop_download_functions.py` file also include similar documentation, so you can use this trick for these user-defined functions.

```python
help(list_available_urls)
list_available_urls?
```

This tutorial works with the Discrete Return LiDAR Point Cloud data product (https://data.neonscience.org/data-products/DP1.30003.001).

First we'll start by defining variables that specify the NEON data product ID, site, and year. You can change these to look at a different site of your interest, but this tutorial will explore the NEON site `GUAN` in Domain 04, Puerto Rico.

In [5]:
dpID='DP1.30003.001' 
site = 'GUAN'

We can use the function `list_available_urls` to see what data is available for this data product and site. This function requires two inputs: the data product ID `dpID` and the site ID, `site`.

In [12]:
help(list_available_urls)

Help on function list_available_urls in module neon_aop_download_functions:

list_available_urls(product, site)
    list_available urls lists the api url for a given product and site
    --------
     Inputs:
         product: the data product code (eg. 'DP3.30015.001' - CHM)
         site: the 4-digit NEON site code (eg. 'SRER', 'JORN')
    --------
    Usage:
    --------
    jorn_chm_urls = list_available_urls('DP3.30015.001','JORN')



In [6]:
list_available_urls(dpID,site)

['https://data.neonscience.org/api/v0/data/DP1.30003.001/GUAN/2018-05']
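
Each URL returned by `list_available_urls` ends in a year-month segment, so a list covering several campaigns can be grouped by year before deciding what to download. The sketch below assumes that URL pattern holds for every entry; `group_urls_by_year` is a hypothetical helper and is not part of `neon_aop_download_functions.py`.

```python
from collections import defaultdict

def group_urls_by_year(data_urls):
    """Group NEON API data URLs by collection year.

    Assumes each URL ends in a 'YYYY-MM' segment, as in the output above.
    """
    urls_by_year = defaultdict(list)
    for url in data_urls:
        year_month = url.rstrip('/').split('/')[-1]  # e.g. '2018-05'
        year = year_month.split('-')[0]              # e.g. '2018'
        urls_by_year[year].append(url)
    return dict(urls_by_year)

example_urls = ['https://data.neonscience.org/api/v0/data/DP1.30003.001/GUAN/2018-05']
print(group_urls_by_year(example_urls))
```

For a site with a single campaign, such as the example above, this returns one key with one URL.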

The AOP has only flown Puerto Rico (D04) once so far. D04 is only scheduled every 4-5 years, and the next campaign is scheduled for the fall of 2022, so new data is expected relatively soon!

Next, let's set up some paths to download data. We'll start with downloading geospatial metadata in order to get a sense of the coverage and geographic extents of the flight area. We'll make a folder for the shapefiles (shp), a folder for the kml files (which may be useful for interactively looking at the site boundaries in Google Earth), and a folder for the actual lidar (laz) data.

In [9]:
year='2018'
data_root_path = './data/'+site+'/'+year+'/'
shp_path = data_root_path+'shp/'
kml_path = data_root_path+'kml/'
laz_path = data_root_path+'laz/'
print(data_root_path)

./data/GUAN/2018/
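
String concatenation works here because every piece ends in a `/`. As an alternative sketch, the same folder layout can be built with `pathlib`, which inserts the separators for you; `build_data_paths` is a hypothetical helper name used only for this illustration.

```python
from pathlib import Path

def build_data_paths(site, year, root='./data'):
    """Return the shp, kml, and laz folders for a site and year as Path objects."""
    data_root = Path(root) / site / year
    return {name: data_root / name for name in ('shp', 'kml', 'laz')}

paths = build_data_paths('GUAN', '2018')
for name, folder in paths.items():
    # as_posix() prints forward slashes on every operating system
    print(name, folder.as_posix())
```

If you go this route, convert each `Path` with `str()` before passing it to functions that concatenate strings, such as `download_aop_files`, which builds its progress message with `+`.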


Next, let's take a look at the `download_aop_files` function, which we'll use to download the metadata and data that we want to explore.

In [11]:
help(download_aop_files)

Help on function download_aop_files in module neon_aop_download_functions:

download_aop_files(data_product_id, site, year=None, download_folder='./data', match_string=None, check_size=True)
    download_aop_files downloads NEON AOP files from the AOP for a given data product, site, and 
    optional year, download folder, and 
    --------
     Inputs:
         required:
             data_product_id: the data product code (eg. 'DP3.30015.001' - CHM)
             site: the 4-digit NEON site code (eg. 'SRER', 'JORN')
         
         optional:
             year: year (eg. '2020'); default (None) is all years
             download_folder: folder to store downloaded files; default (./data) in current directory
             match_string: subset of data to match, need to use exact pattern for file name
             check_size: prompt to continue download (y/n) after displaying size; default = True
    --------
    Usage:
    --------
    download_aop_files('DP3.30015.001','JORN','2019','.

We'll start by downloading the shape files, which are included as part of the metadata with the lidar data products. There are summary shape files for each site which end in `merged_tiles`, so we can key off that string to download only the full boundary. You could also download all `.shp` files, which would include shape files for each tile. The `match_string` input works the same way for other file types; for example, the following would download all of the kml files:

```python
download_aop_files(dpID,site,year,kml_path,'.kml')
```

For the shape files, we download the `merged_tiles` `.shp` file together with its `.shx` index file:

In [13]:
download_aop_files(dpID,site,year,shp_path,'merged_tiles.shp',check_size=False)
download_aop_files(dpID,site,year,shp_path,'merged_tiles.shx',check_size=False)

Download size: 0.78 kB
downloading 2018_GUAN_1_merged_tiles.shp to ./data/GUAN/2018/shp/
Download size: 0.0 kB
downloading 2018_GUAN_1_merged_tiles.shx to ./data/GUAN/2018/shp/


In [None]:
os.listdir(shp_path)
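
A shapefile is usually made up of several files that share a base name, so it can be useful to check which parts are present in the download folder before reading it. The sketch below is an illustration only: `missing_shapefile_parts` is a hypothetical helper, and whether the NEON metadata provides every one of these extensions is an assumption that is not confirmed here.

```python
import os

# File extensions that commonly make up one shapefile: .shp and .shx hold
# the geometry and its index, .dbf the attributes, and .prj the projection.
SHAPEFILE_EXTENSIONS = ('.shp', '.shx', '.dbf', '.prj')

def missing_shapefile_parts(folder, basename):
    """Return the extensions in SHAPEFILE_EXTENSIONS with no matching file in folder."""
    present = set(os.listdir(folder))
    return [ext for ext in SHAPEFILE_EXTENSIONS if basename + ext not in present]

# Example usage with the folder and file name from this tutorial:
# missing_shapefile_parts(shp_path, '2018_GUAN_1_merged_tiles')
```

Any missing part can be requested with another call to `download_aop_files`, using the matching file name as the `match_string`.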

In [None]:
download_aop_files(dpID,site,year,laz_path)

download_aop_files(dpID,site,year,kml_path,'.kml')

In [None]:
gdf = gpd.read_file(os.path.join(shp_path,'2018_GUAN_1_merged_tiles.shp'))
gdf.plot();
ax = plt.gca(); ax.ticklabel_format(style='plain') 
ax.set_title('AOP Coverage of ' + site + ' in ' + year);

Now that we can see the extent of the tiles, we'll pick a single tile in this area to download. For this example, I'll choose the tile "544000_4976000" towards the Eastern part of the site.

In [None]:
download_aop_files('DP1.30003.001',site,year,laz_path,match_string='367000_4306000_classified_point_cloud_colorized.laz')

In [None]:
os.listdir(laz_path)
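
The tile file names encode the tile's coordinates, which makes it possible to work out a tile's extent from its name alone. This is a rough sketch under two assumptions that are not stated in this tutorial: that the two numbers are the easting and northing of the tile's lower-left corner in meters, and that tiles are 1 km on a side. The helper names `parse_tile_corner` and `tile_bounds` are hypothetical.

```python
import re

def parse_tile_corner(filename):
    """Extract (easting, northing) from a tile file name.

    Assumes the name contains '<easting>_<northing>' as a 6-digit and a
    7-digit number, as in the file names used in this tutorial.
    """
    match = re.search(r'(?<!\d)(\d{6})_(\d{7})(?!\d)', filename)
    if match is None:
        raise ValueError('no tile coordinates found in ' + filename)
    return int(match.group(1)), int(match.group(2))

def tile_bounds(filename, tile_size=1000):
    """Return (xmin, ymin, xmax, ymax), assuming the name gives the lower-left corner."""
    easting, northing = parse_tile_corner(filename)
    return easting, northing, easting + tile_size, northing + tile_size

name = 'NEON_D02_SERC_DP1_367000_4306000_classified_point_cloud_colorized.laz'
print(tile_bounds(name))
```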

In [None]:
laz_data_file=os.path.join(laz_path,'NEON_D02_SERC_DP1_367000_4306000_classified_point_cloud_colorized.laz')
point_cloud=laspy.read(laz_data_file)

Reading in the file with `laspy.read()` reads in both the metadata and the raw point cloud data. We can print out the `point_cloud` variable to show some basic information about what we've read in:

In [None]:
point_cloud

`point_format.dimension_names` shows us the available information stored in this LasData object format:

In [None]:
list(point_cloud.point_format.dimension_names)

In the next few cells, we can explore some of these variables:

In [None]:
point_cloud.classification

set(list(point_cloud.classification))

The classification values follow the predefined scheme of the American Society for Photogrammetry and Remote Sensing (ASPRS) for LAS versions 1.1 - 1.4. The following table, taken from the ArcGIS documentation (https://desktop.arcgis.com/en/arcmap/10.3/manage-data/las-dataset/lidar-point-classification.htm), lists some of the LAS classification codes defined by ASPRS for these LAS versions:

| Classification value | Meaning           |
|---------------------|-------------------|
| 0                   | Never classified  |
| 1                   | Unassigned        |
| 2                   | Ground            |
| 3                   | Low Vegetation    |
| 4                   | Medium Vegetation |
| 5                   | High Vegetation   |
| 6                   | Building          |
| 7                   | Low Point         |

In [None]:
point_cloud.xyz

In [None]:
xyz = point_cloud.xyz

In [None]:
len(xyz)

There are 6.4 million lidar points in this single 1km x 1km tile! For the rest of this exercise, we'll just look at a subset of these points, taking every 100th point. You can change this subset factor; when we visualize the data in a few steps, subsetting by a larger factor will speed up the time it takes to make the plot.

In [None]:
factor=100
points_dec = xyz[::factor]

In [None]:
len(points_dec)
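
The slice `xyz[::factor]` keeps every `factor`-th point in file order rather than drawing a random sample. If a random subset is preferred, one possible sketch is shown below; `random_subset` is a hypothetical helper, and the demo array stands in for the real `xyz` coordinates.

```python
import numpy as np

def random_subset(points, factor, seed=0):
    """Return about len(points)/factor rows chosen at random without replacement.

    Also returns the selected row indices so that other per-point arrays
    (for example the colors) can be subset in exactly the same way.
    """
    rng = np.random.default_rng(seed)
    n_keep = max(1, len(points) // factor)
    idx = np.sort(rng.choice(len(points), size=n_keep, replace=False))
    return points[idx], idx

# Made-up coordinates standing in for point_cloud.xyz
demo_points = np.arange(3000, dtype=float).reshape(1000, 3)
subset, idx = random_subset(demo_points, factor=100)
print(subset.shape)
```

Reusing `idx` on the color array keeps each point paired with its own color, which matters once the two arrays are plotted together.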

In [None]:
# points = np.vstack((point_cloud.x, point_cloud.y, point_cloud.z)).transpose()
colors = np.vstack((point_cloud.red, point_cloud.green, point_cloud.blue)).transpose()

In [None]:
colors_norm = (colors - np.min(colors))/np.ptp(colors)
colors_dec = colors_norm[::factor]
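
The normalization above rescales the color values to the 0 to 1 range that matplotlib expects for RGB floats. A slightly more defensive sketch of the same min-max scaling is shown below; `normalize_colors` is a hypothetical helper, and the small array of 16-bit values is made up for illustration.

```python
import numpy as np

def normalize_colors(colors):
    """Min-max scale an (N, 3) color array to floats in [0, 1]."""
    colors = np.asarray(colors, dtype=float)
    value_range = np.ptp(colors)  # max minus min over the whole array
    if value_range == 0:
        # all values are identical; avoid dividing by zero
        return np.zeros_like(colors)
    return (colors - colors.min()) / value_range

# Made-up color values, for illustration only
demo = np.array([[0, 32768, 65535], [16384, 16384, 16384]], dtype=np.uint16)
print(normalize_colors(demo))
```

Scaling by the global minimum and range, rather than per band, keeps the relative brightness of the red, green, and blue channels unchanged.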

### 3D Point Cloud Visualization 
While Python arguably isn't the best way to visualize this 3D data, we can use matplotlib to see what this looks like. Other open-source tools such as https://plas.io/ are more interactive!

https://www.neonscience.org/resources/learning-hub/tutorials/introduction-light-detection-and-ranging-lidar-explore-point#toggle-13

In [None]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
# from mpl_toolkits.mplot3d.axes3d import Axes3D

In [None]:
# Plot X,Y,Z
fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(points_dec[:,0],points_dec[:,1],points_dec[:,2],color=colors_dec,s=4)
ax.set_zlim3d(-10,90)
# ax.set_zlim3d(1400,2200)
plt.show()

In [None]:
print(point_cloud.header)
print(point_cloud.header.point_format)
print(point_cloud.header.point_count)
print(point_cloud.vlrs)
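
The header summarizes the file, and some of the same quantities can be cross-checked directly from the coordinates. The sketch below computes the coordinate extents and an approximate point density from an array like `point_cloud.xyz`; `summarize_extent` is a hypothetical helper, the demo coordinates are made up, and the density is only a rough figure based on the bounding box.

```python
import numpy as np

def summarize_extent(xyz):
    """Return per-axis minima and maxima and points per square unit of bounding box."""
    xyz = np.asarray(xyz, dtype=float)
    mins = xyz.min(axis=0)
    maxs = xyz.max(axis=0)
    # horizontal area of the bounding box (x extent times y extent)
    area = (maxs[0] - mins[0]) * (maxs[1] - mins[1])
    density = len(xyz) / area if area > 0 else float('nan')
    return {'mins': mins, 'maxs': maxs, 'density': density}

# Made-up coordinates standing in for point_cloud.xyz
demo_xyz = np.array([[0, 0, 1], [10, 0, 2], [0, 10, 3], [10, 10, 4]], dtype=float)
summary = summarize_extent(demo_xyz)
print(summary['mins'], summary['maxs'], summary['density'])
```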

## What's next?

The laspy website has some nice examples that you can follow using NEON data:
https://laspy.readthedocs.io/en/latest/complete_tutorial.html

### Other open-source tools for working with las (point cloud) data:
 * LAStools (a free version is available)
 * PDAL (similar to GDAL, but for point clouds)
 * lidR (R)
 * GitHub examples, e.g. https://github.com/topics/point-cloud (see also the topics lidar and lidar-point-cloud)