# Leveraging Tile Indexes for Programmatic Access to OpenTopography's Lidar Datasets

# Table of Contents
- [Authors](#Authors)
- [Purpose](#Purpose)
- [Funding](#Funding)
- [Library Imports](#Library-Imports)
- [Define Area of Interest](#Define-Area-of-Interest)
- [OpenTopography Data Catalog API](#OpenTopography-Data-Catalog-API)
- [Tile Index Access](#Tile-Index-Access)
- [Download and Extract the Tile Index](#Download-and-Extract-the-Tile-Index)
- [Find Intersecting Lidar Tiles](#Find-Intersecting-Lidar-Tiles)
- [Download Lidar Files](#Download-Lidar-Files)
- [Streaming Access](#Streaming-Access)
- [Create a Digital Terrain Model (DTM)](#Create-a-Digital-Terrain-Model-(DTM))
- [Conclusions](#Conclusions)

# Authors
Author1 = {"name": "Matthew Beckley", "affiliation": "EarthScope Consortium", "email": "matt.beckley@earthscope.org", "orcid":""}

# Purpose
[OpenTopography](https://opentopography.org/) is designed to support a broad spectrum of applications, from scientific research and commercial projects to education and personal exploration, by providing a variety of flexible data access methods. While many users are familiar with our web-based map interface, this tutorial focuses on programmatic workflows preferred for scalable and automated research.

We will explore two distinct strategies for working with tiled point cloud datasets. First, we will demonstrate how to leverage an OpenTopography-provided tile index to programmatically identify and download only the specific files covering a defined area of interest. Second, we will introduce a more advanced, cloud-native workflow that uses PDAL to stream data directly from remote storage, allowing users to clip and process the data on-the-fly without requiring local downloads of the lidar tiles.

# Funding
OpenTopography is supported by the National Science Foundation under Award Numbers 2410799, 2410800, and 2410801.

# Library Imports
First, we'll import the necessary Python libraries. If you don't have them installed, we recommend setting up an isolated conda environment and installing them with [conda](https://anaconda.org/anaconda/conda) or [miniconda](https://www.anaconda.com/docs/getting-started/miniconda/main). The [Streaming Access](#Streaming-Access) section also requires the PDAL Python extension.

```python
# Python imports
import json
import os
import zipfile

import geopandas as gpd
import requests
from shapely.geometry import box
```

# Define Area of Interest
Next, define the Area of Interest (AOI) for the region where you would like to download data, as a longitude/latitude bounding box in geographic coordinates (EPSG:4326). For this notebook we will explore a section of New Zealand.

```python
# Bounding Box for Area of Interest (AOI) over a section of New Zealand.
minlon, minlat = 175.48116715, -37.34580303
maxlon, maxlat = 175.48772079, -37.33819874
```

# OpenTopography Data Catalog API
Using the OpenTopography Data Catalog API, we'll search for metadata for point cloud datasets that intersect our AOI. Refer to the OpenTopography ["Find Data"](https://portal.opentopography.org/datasets) map for spatial coverage of point cloud datasets.  Note that tile index files are only created for OpenTopography-hosted point cloud data (red polygons on the "Find Data" map):

![OpenTopo Point cloud Data Coverage](./docs/FindDataMap.png)
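Before sending the query, it can help to see the exact URL the Data Catalog API receives. You could paste that URL into a browser to inspect the response by hand. The sketch below is a minimal, offline illustration. It assumes the same endpoint and parameters used in the next cell. `catalog_query_url` is a hypothetical helper name. It only uses `requests` to encode the query string and never contacts the server.

```python
import requests

# Endpoint used by the catalog query in this notebook
CATALOG_URL = "https://portal.opentopography.org/API/otCatalog"

def catalog_query_url(minlon, minlat, maxlon, maxlat):
    """Return the full GET URL for a point cloud catalog search (no request is sent)."""
    params = {
        "productFormat": "PointCloud",
        "minx": minlon,
        "miny": minlat,
        "maxx": maxlon,
        "maxy": maxlat,
        "detail": "true",
        "outputFormat": "json",
        "include_federated": "false",
    }
    # Preparing (not sending) the request URL-encodes the parameters locally
    return requests.Request("GET", CATALOG_URL, params=params).prepare().url

print(catalog_query_url(175.48116715, -37.34580303, 175.48772079, -37.33819874))
```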


```python
# Find metadata for datasets that intersect a given AOI

# OpenTopography Data Catalog API endpoint
url = "https://portal.opentopography.org/API/otCatalog"

# Parameters for the query
params = {
    "productFormat": "PointCloud",
    "minx": minlon,
    "miny": minlat,
    "maxx": maxlon,
    "maxy": maxlat,
    "detail": "true",
    "outputFormat": "json",
    "include_federated": "false"
}

# Send the request (with a timeout so the cell cannot hang indefinitely)
response = requests.get(url, params=params, timeout=60)

# Check the status
if response.status_code == 200:
    data = response.json()
    # Pretty-print the JSON
    print(json.dumps(data, indent=2))
else:
    print(f"Error: {response.status_code} - {response.text}")
```


```
{
  "Datasets": [
    {
      "Dataset": {
        "name": "Waikato, New Zealand 2024",
        "identifier": {
          "@type": "PropertyValue",
          "propertyID": "opentopoID",
          "value": "OTLAS.022025.2193.1"
        },
        "alternateName": "NZ24_Waikato",
        "url": "https://doi.org/10.5069/G91J980G",
        "fileFormat": "Point Cloud Data",
        "dateCreated": "2025-02-25",
        "description": "LiDAR was captured for Regional Software Holdings Limited (RSHL) by Landpro Ltd between 21 January 2024 and 17 May 2024. These datasets were generated by Landpro and their subcontractors. Data management and distribution is by Toit&#363; Te Whenua Land Information New Zealand.",
        "citation": "Ministry for the Environment, Regional Software Holdings Limited, Toit&#363; Te Whenua Land Information New Zealand (LINZ) (2025). Waikato, New Zealand 2024. Collected by Landpro, distributed by OpenTopography and LINZ. https://doi.org/10.5069/G91J980G",
... (output truncated)
```

Next we'll extract the names of the datasets that intersect our AOI.

```python
datasets = data['Datasets']
print("Number of Datasets = " + str(len(datasets)))

# Collect the full and short ("alternate") name of each dataset
d_name = [d['Dataset']['name'] for d in datasets]
d_sname = [d['Dataset']['alternateName'] for d in datasets]

print("------------------------------------\n")
print("AOI contains the following datasets:\n" + "\n".join(d_name))
print("\n")
print("AOI contains the following \"short\" or \"alternate\" names:\n" + "\n".join(d_sname))
```


```
Number of Datasets = 3
------------------------------------

AOI contains the following datasets:
Waikato, New Zealand 2024
Huntly, Waikato, New Zealand 2015-2019
West Coast and Hauraki Plains, Waikato, New Zealand 2015


AOI contains the following "short" or "alternate" names:
NZ24_Waikato
NZ15_Huntly
Waikato_2015
```


# Tile Index Access
We are going to work with the dataset "Huntly, Waikato, New Zealand 2015-2019".

The OpenTopography landing page for this dataset is located here: [Huntly, Waikato, New Zealand 2015-2019](https://doi.org/10.5069/G9FT8J6K)

For each lidar dataset it hosts, OpenTopography creates a tile index shapefile that contains the URL of each LAZ tile. You can access these tile indexes in several ways:

1. Download the tile index file directly from a dataset's landing page:

![Example link to a dataset's tile index file](./docs/Huntly_OTTileURL.png)

2. Download the file from the "Bulk Download" section. We provide bulk access to each dataset hosted by OpenTopography; bulk data can be downloaded through the browser, AWS command line tools, or third-party apps. The image below highlights how to access the bulk data section for the dataset "Huntly, Waikato, New Zealand 2015-2019":
![OpenTopography Bulk Access](./docs/Huntly_OTLandingPage.png)

3. Build the URL programmatically. Because all tile indexes follow a consistent filename convention, you can build the URL from a dataset's "shortname" or "alternateName". For almost all datasets, the tile index URL has the form:

*"https://opentopography.s3.sdsc.edu/pc-bulk/"+alternateName+"/"+alternateName+"_TileIndex.zip"*

(e.g. https://opentopography.s3.sdsc.edu/pc-bulk/NZ15_Huntly/NZ15_Huntly_TileIndex.zip)
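The convention in item 3 fits into a small function. This is a minimal sketch that assumes the dataset follows the standard `pc-bulk/<alternateName>/<alternateName>_TileIndex.zip` layout. The text above says this holds for "almost all" datasets, so a few may differ. `tile_index_url` and `PC_BULK_BASE` are hypothetical names used only for illustration.

```python
# Bulk-download prefix for OpenTopography-hosted point clouds (from the convention above)
PC_BULK_BASE = "https://opentopography.s3.sdsc.edu/pc-bulk"

def tile_index_url(alternate_name):
    """Build the tile index URL for a dataset from its alternateName (short name)."""
    # Strip stray slashes so the result never contains "//" after the bucket prefix
    name = alternate_name.strip("/")
    return f"{PC_BULK_BASE}/{name}/{name}_TileIndex.zip"

print(tile_index_url("NZ15_Huntly"))
# → https://opentopography.s3.sdsc.edu/pc-bulk/NZ15_Huntly/NZ15_Huntly_TileIndex.zip
```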

Since this notebook focuses on programmatic access, we will use this third method to download the tile index file based on the dataset's "alternateName" that we extracted from the metadata.

# Download and Extract the Tile Index

This code snippet will use the "alternateName" variable that we obtained from the metadata, and build the URL to download the tile index file.  This file is a zipped shapefile, so the code will then unzip the file to your local disk.
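When the tile index is loaded further down, the notebook assumes the `.shp` inside the archive has the same basename as the zip file. If you would rather not depend on that, you can ask the archive itself. The sketch below lists the zip's members and returns the one ending in `.shp`. It assumes the archive holds exactly one shapefile. `shapefile_member` is a hypothetical helper, not part of any OpenTopography tool.

```python
import zipfile

def shapefile_member(zip_path):
    """Return the name of the single .shp member inside a zipped shapefile."""
    with zipfile.ZipFile(zip_path) as zf:
        shp = [n for n in zf.namelist() if n.lower().endswith(".shp")]
    # Fail loudly rather than guessing if the archive is not a single shapefile
    if len(shp) != 1:
        raise ValueError(f"Expected one .shp in {zip_path}, found {len(shp)}: {shp}")
    return shp[0]
```

After extraction, `os.path.join(extract_dir, shapefile_member(zipName))` would then give the path to pass to `gpd.read_file`.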

```python
# Extract only the info for "Huntly, Waikato, New Zealand 2015-2019" from the metadata
for d in datasets:
    if d['Dataset']['alternateName'] == "NZ15_Huntly":
        Huntly = d['Dataset']
        break

# Extract the "alternateName" for this dataset
alternateName = Huntly['alternateName']

# Build the URL to the tile index
tile_url = "https://opentopography.s3.sdsc.edu/pc-bulk/" + alternateName + "/" + alternateName + "_TileIndex.zip"
print("Tile Index URL is: \n" + tile_url)

# Download the tile index to your computer, streaming it to disk in chunks
zipName = alternateName + "_TileIndex.zip"
response = requests.get(tile_url, stream=True, timeout=60)
response.raise_for_status()  # Stop here if the download failed
with open(zipName, "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)

# Next we need to unzip the file to get the shapefile.
# Use the basename of the zipped tile index as the extraction directory.
extract_dir, ext = os.path.splitext(zipName)

# Make sure the output directory exists
os.makedirs(extract_dir, exist_ok=True)

# Extract all zip contents
with zipfile.ZipFile(zipName, 'r') as zip_ref:
    zip_ref.extractall(extract_dir)

print(f"Extracted {zipName} to {extract_dir}/")
```

```
Tile Index URL is: 
https://opentopography.s3.sdsc.edu/pc-bulk/NZ15_Huntly/NZ15_Huntly_TileIndex.zip
Extracted NZ15_Huntly_TileIndex.zip to NZ15_Huntly_TileIndex/
```


# Find Intersecting Lidar Tiles
Now that we have downloaded the tile index shapefile programmatically and unzipped it to local disk, we are going to use its attribute table to find lidar tiles that intersect our AOI.  The attribute table of each tile index file contains a row for each lidar tile with the columns: 
1. Filename
2. MinX
3. MinY
4. MaxX
5. MaxY
6. URL

![Example tile index attribute table](./docs/Huntly_TileIndex_ATTTable.png)

Ultimately, we want the URL of every tile that intersects our area of interest.
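The `MinX`, `MinY`, `MaxX` and `MaxY` columns by themselves are enough for a rough tile search, without geopandas. The sketch below tests each row's extent against an AOI box. It assumes two things: the rows are plain dictionaries with the column names listed above (for example from `gdf.to_dict("records")`), and the AOI is expressed in the same CRS as those columns. That CRS is probably the dataset's native projected CRS, but check it before relying on this. The rows and `example.com` URLs are made up for illustration. As with shapely's `intersects`, tiles that only touch the AOI along an edge are counted.

```python
def bbox_intersects(a, b):
    """True if boxes a and b, each (minx, miny, maxx, maxy) in the same CRS, overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def urls_for_aoi(rows, aoi_bounds):
    """Return the URL of every tile-index row whose extent overlaps aoi_bounds."""
    return [
        r["URL"]
        for r in rows
        if r.get("URL")
        and bbox_intersects((r["MinX"], r["MinY"], r["MaxX"], r["MaxY"]), aoi_bounds)
    ]

# Hypothetical tile-index rows in a projected CRS (metres); values are illustrative only
rows = [
    {"MinX": 0, "MinY": 0, "MaxX": 1000, "MaxY": 1000, "URL": "https://example.com/a.laz"},
    {"MinX": 1000, "MinY": 0, "MaxX": 2000, "MaxY": 1000, "URL": "https://example.com/b.laz"},
    {"MinX": 5000, "MinY": 5000, "MaxX": 6000, "MaxY": 6000, "URL": "https://example.com/c.laz"},
]
print(urls_for_aoi(rows, (900, 100, 1100, 200)))
# → ['https://example.com/a.laz', 'https://example.com/b.laz']
```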


```python
# Load the tile index shapefile downloaded from OpenTopography using geopandas.
# This assumes the .shp shares the zip's basename (e.g. NZ15_Huntly_TileIndex.shp).
shapefile      = extract_dir + ".shp"
shapefile_path = os.path.join(os.getcwd(), extract_dir, shapefile)
gdf            = gpd.read_file(shapefile_path)

# Reproject the tile footprints to geographic coordinates (EPSG:4326) to match the AOI
if gdf.crs != "EPSG:4326":
    gdf = gdf.to_crs("EPSG:4326")

# Build the AOI bounding box as a shapely polygon
aoi = box(minlon, minlat, maxlon, maxlat)

# Wrap the AOI in a GeoDataFrame (not needed for the filter below, but handy for plotting)
aoi_gdf = gpd.GeoDataFrame([{"geometry": aoi}], crs="EPSG:4326")

# Spatial filter: keep only tiles whose footprint intersects the AOI
intersecting = gdf[gdf.intersects(aoi)]

# Extract the "URL" column
urls = intersecting["URL"].dropna().tolist()

# Output the result
print("LAZ Tile URLs within bounding box:")
for url in urls:
    print(url)
```


```
LAZ Tile URLs within bounding box:
https://opentopography.s3.sdsc.edu/pc-bulk/NZ15_Huntly/CL2_BC34_2015_1000_1333.laz
https://opentopography.s3.sdsc.edu/pc-bulk/NZ15_Huntly/CL2_BC34_2015_1000_1334.laz
https://opentopography.s3.sdsc.edu/pc-bulk/NZ15_Huntly/CL2_BC34_2015_1000_1335.laz
https://opentopography.s3.sdsc.edu/pc-bulk/NZ15_Huntly/CL2_BC34_2015_1000_1433.laz
https://opentopography.s3.sdsc.edu/pc-bulk/NZ15_Huntly/CL2_BC34_2015_1000_1434.laz
https://opentopography.s3.sdsc.edu/pc-bulk/NZ15_Huntly/CL2_BC34_2015_1000_1435.laz
https://opentopography.s3.sdsc.edu/pc-bulk/NZ15_Huntly/CL2_BC34_2015_1000_1533.laz
https://opentopography.s3.sdsc.edu/pc-bulk/NZ15_Huntly/CL2_BC34_2015_1000_1534.laz
https://opentopography.s3.sdsc.edu/pc-bulk/NZ15_Huntly/CL2_BC34_2015_1000_1535.laz
```


# Download Lidar Files
Now that we have a list of URLs for lidar files that intersect our area of interest, we can simply download those intersecting tiles to a local directory:
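If a batch download is interrupted and then re-run, you may not want to fetch tiles you already have again. The sketch below is a minimal example of planning a resumable download. It assumes any existing, non-empty file with the tile's name is already complete. `plan_downloads` is a hypothetical helper name. That assumption has a gap: a loop that writes straight to the final filename leaves a partial file behind when it is cut off, and this check would skip it. Writing to a temporary name and renaming with `os.replace` once the download finishes avoids that.

```python
import os
from urllib.parse import urlparse

def plan_downloads(urls, download_dir):
    """Return (url, local_path) pairs for tiles not already present in download_dir."""
    todo = []
    for url in urls:
        # Use the last path component of the URL as the local filename
        filename = os.path.basename(urlparse(url).path)
        path = os.path.join(download_dir, filename)
        # Skip files that already exist and are non-empty (assumed complete)
        if os.path.exists(path) and os.path.getsize(path) > 0:
            continue
        todo.append((url, path))
    return todo
```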

```python
# Download directory
download_dir = "downloads"
os.makedirs(download_dir, exist_ok=True)

for url in urls:
    filename = os.path.basename(url)  # Extract filename from URL
    output_path = os.path.join(download_dir, filename)

    try:
        print(f"Downloading {filename}...")
        response = requests.get(url, stream=True)
        response.raise_for_status()  # Raise error if download failed

        with open(output_path, "wb") as f:
            # Download in chunks to be memory efficient
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)

        print(f"Saved to {output_path}")

    except Exception as e:
        print(f"Failed to download {url}: {e}")
```


```
Downloading CL2_BC34_2015_1000_1333.laz...
Saved to downloads/CL2_BC34_2015_1000_1333.laz
Downloading CL2_BC34_2015_1000_1334.laz...
Saved to downloads/CL2_BC34_2015_1000_1334.laz
Downloading CL2_BC34_2015_1000_1335.laz...
Saved to downloads/CL2_BC34_2015_1000_1335.laz
Downloading CL2_BC34_2015_1000_1433.laz...
Saved to downloads/CL2_BC34_2015_1000_1433.laz
Downloading CL2_BC34_2015_1000_1434.laz...
Saved to downloads/CL2_BC34_2015_1000_1434.laz
Downloading CL2_BC34_2015_1000_1435.laz...
Saved to downloads/CL2_BC34_2015_1000_1435.laz
Downloading CL2_BC34_2015_1000_1533.laz...
Saved to downloads/CL2_BC34_2015_1000_1533.laz
Downloading CL2_BC34_2015_1000_1534.laz...
Saved to downloads/CL2_BC34_2015_1000_1534.laz
Downloading CL2_BC34_2015_1000_1535.laz...
Saved to downloads/CL2_BC34_2015_1000_1535.laz
```


Here is a map showing the tile index bounds (blue) and our AOI (red), with the downloaded point cloud tiles colored by classification: ground points in brown and unclassified points in grey.

![Tile index bounds with AOI](./docs/PCTiles_wAOI.png)


# Streaming Access
Downloading multiple large LAZ files can be slow and consume a lot of disk space, especially when you only need data from a small area. A more efficient approach is to read the remote tiles directly and keep only the points you need, so the full tiles never have to be saved to local disk.

We will now use the [Point Data Abstraction Library (PDAL)](https://pdal.io/en/stable/index.html) to do exactly that. PDAL is a powerful open-source tool for processing point cloud data. While it can be run from the command line, we'll use its Python extension in this notebook.

Our plan is to feed the list of LAZ file URLs (which we found using the tile index) directly into a PDAL pipeline. For each URL, PDAL will:

1. Stream the data from the remote server.

2. Extract only the points that fall within our Area of Interest.

3. Merge the results from all files into a single output.
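Those three steps map onto a PDAL pipeline whose stages form a small graph: one reader-plus-crop branch per URL, all feeding into one merge. The sketch below builds that pipeline as a plain Python dictionary, so it can be inspected without PDAL installed. Each crop's upstream stage is named explicitly through PDAL's `tag`/`inputs` pipeline keys, which makes the merge point explicit. It assumes bounds ordered `(minlon, minlat, maxlon, maxlat)` in EPSG:4326 and uses PDAL's text bounds format `([minx, maxx], [miny, maxy])`. `build_crop_merge_pipeline` is a hypothetical helper, and the stage tags are arbitrary labels.

```python
import json

def build_crop_merge_pipeline(urls, bounds_wgs84, output_filename="cropped_merged.copc.laz"):
    """Build a PDAL pipeline dict: one reader+crop branch per URL, merged, written to COPC.

    bounds_wgs84 is (minlon, minlat, maxlon, maxlat) in EPSG:4326.
    """
    minlon, minlat, maxlon, maxlat = bounds_wgs84
    # PDAL's text bounds format is "([minx, maxx], [miny, maxy])"
    crop_bounds = f"([{minlon}, {maxlon}], [{minlat}, {maxlat}])"
    stages, crop_tags = [], []
    for i, url in enumerate(urls):
        stages.append({"type": "readers.las", "filename": url, "tag": f"read{i}"})
        stages.append({
            "type": "filters.crop",
            "bounds": crop_bounds,
            "a_srs": "EPSG:4326",  # SRS of the bounds; PDAL reprojects them to the points' SRS
            "inputs": [f"read{i}"],
            "tag": f"crop{i}",
        })
        crop_tags.append(f"crop{i}")
    # Every cropped branch feeds a single merge, which feeds the COPC writer
    stages.append({"type": "filters.merge", "inputs": crop_tags, "tag": "merged"})
    stages.append({"type": "writers.copc", "filename": output_filename, "inputs": ["merged"]})
    return {"pipeline": stages}

# Inspect the JSON that would be handed to pdal.Pipeline(...)
example = build_crop_merge_pipeline(["https://example.com/a.laz"], (175.481, -37.346, 175.488, -37.338))
print(json.dumps(example, indent=2))
```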

The final, combined output will be saved as a [Cloud Optimized Point Cloud (COPC)](https://copc.io/) file. A COPC file is a LAZ 1.4 file whose points are organized in a clustered octree, so software can read just the portion it needs (for example, through HTTP range requests). Because it is still valid LAZ, it is compatible with most current GIS and lidar software.

(NOTE: If you need to install the PDAL Python extension, please visit the [PDAL website](https://pdal.io/en/stable/python.html) for instructions.)

```python
import pdal

def crop_remote_laz(urls, bounds, output_filename="cropped_merged.copc.laz"):
    """
    Read and crop remote LAZ files to specified bounds, then write to COPC file.
    
    Args:
        urls: List of LAZ file URLs
        bounds: Tuple of (minlon, maxlon, minlat, maxlat)
        output_filename: Name for the output COPC file
    
    Returns:
        Combined point data or None if error
    """
    minlon, maxlon, minlat, maxlat = bounds
    
    # Build a single pipeline that processes all files
    pipeline_stages = []
    
    # Add a reader (which streams the remote LAZ file) and a crop for each file
    for url in urls:
        pipeline_stages.extend([
            {
                "type": "readers.las",
                "filename": url
            },
            {
                "type": "filters.crop",
                "bounds": {
                    "minx": minlon,
                    "miny": minlat, 
                    "maxx": maxlon,
                    "maxy": maxlat
                },
                # The bounds are lon/lat; a_srs lets PDAL reproject them to the tiles' CRS
                "a_srs": "EPSG:4326"
            }
        ])
    
    # Add merge filter to combine all cropped data
    pipeline_stages.append({
        "type": "filters.merge"
    })
    
    # Add COPC writer
    pipeline_stages.append({
        "type": "writers.copc",
        "filename": output_filename
    })
    
    pipeline_json = {"pipeline": pipeline_stages}
    
    try:
        pipeline = pdal.Pipeline(json.dumps(pipeline_json))
        count = pipeline.execute()
        
        # Get the data for return
        data = pipeline.arrays[0] if pipeline.arrays else None
        
        if data is not None and len(data) > 0:
            print(f"Retrieved {len(data)} points from {len(urls)} files")
            print(f"Successfully wrote merged data to {output_filename}")
            return data
        else:
            print("No data retrieved from any files")
            return None
            
    except Exception as e:
        print(f"Pipeline failed: {e}")
        return None
    
# Put the bounds into a tuple
bounds = (minlon, maxlon, minlat, maxlat)

# Feed in the URLs to the LAZ files and the bounds of our AOI
data = crop_remote_laz(urls, bounds)

if data is not None:
    print(f"Got {len(data)} points")
```

```
Retrieved 3480839 points from 9 files
Successfully wrote merged data to cropped_merged.copc.laz
Got 3480839 points
```


---
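Here is a plot of the tile index bounds with the merged point cloud file colored by elevation. In this example, we did not have to save the intersecting LAZ files to local disk first. PDAL read each remote tile, kept only the points inside our AOI, and merged them into a single lidar point cloud for output. Each LAZ tile is still transferred over the network in full, because plain LAZ files are not spatially indexed, but only the cropped points are written to disk.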


![Tile index bounds with merged Point Cloud](./docs/PCTiles_wMerged.jpeg)

---

# Create a Digital Terrain Model (DTM)
As a final step, let's use PDAL to grid the ground-classified points in our merged COPC file and create a Digital Terrain Model (DTM).
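To make the gridding idea concrete, here is a simplified NumPy sketch of what a "min" raster means. Each point drops into the cell containing it, and each cell keeps the lowest elevation it receives. Cells that get no points stay empty (NaN), which is how gaps appear in a DTM. Treat this as an illustration, not PDAL's algorithm. By default, PDAL's `writers.gdal` also counts points within a search radius of each cell centre, so its output can differ. `grid_min` is a hypothetical helper. In this sketch, row 0 is the southern edge, whereas GeoTIFFs usually store the northern row first.

```python
import numpy as np

def grid_min(x, y, z, resolution, xmin, ymin, ncols, nrows):
    """Bin points into an nrows x ncols grid, keeping the minimum z per cell (NaN if empty)."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    dem = np.full((nrows, ncols), np.nan)
    col = np.floor((x - xmin) / resolution).astype(int)
    row = np.floor((y - ymin) / resolution).astype(int)
    # Ignore points that fall outside the grid
    inside = (col >= 0) & (col < ncols) & (row >= 0) & (row < nrows)
    # np.fmin ignores NaN, so an empty cell takes the first value it receives
    np.fmin.at(dem, (row[inside], col[inside]), z[inside])
    return dem

# Four points on a 2 x 2 grid of 1 m cells; cell (row 1, col 1) receives no points and stays NaN
print(grid_min([0.2, 0.7, 1.5, 0.4], [0.3, 0.1, 0.5, 1.6], [10.0, 9.5, 12.0, 11.0], 1.0, 0.0, 0.0, 2, 2))
```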

```python
def create_dtm_from_copc(input_filename="cropped_merged.copc.laz", 
                         output_filename="Merged_dtm.tif",
                         resolution=1.0):
    """
    Create a DTM (Digital Terrain Model) from ground-classified points in a COPC LAZ file.
    
    Args:
        input_filename: Path to input COPC LAZ file
        output_filename: Path for output GeoTIFF (tiled, DEFLATE-compressed)
        resolution: Resolution of output raster in meters (default 1.0)
    
    Returns:
        True if successful, False otherwise
    """
    
    pipeline_json = {
        "pipeline": [
            # Input file (PDAL infers the reader from the file extension)
            input_filename,
            
            # Flag statistical outliers as noise (Classification 7)
            {
                "type": "filters.outlier",
                "method": "statistical",
                "multiplier": 3,
                "mean_k": 8
            },
            
            # Keep only ground points (Classification 2); this also drops flagged noise
            {
                "type": "filters.range",
                "limits": "Classification[2:2]"
            },
            
            # Write the minimum ground elevation per cell to a GeoTIFF
            {
                "type": "writers.gdal",
                "filename": output_filename,
                "gdaldriver": "GTiff",
                "resolution": resolution,
                "gdalopts": "TILED=YES,COMPRESS=DEFLATE",
                "output_type": "min"
            }
        ]
    }
    
    try:
        print(f"\nCreating DTM from {input_filename}...")
        print(f"  Resolution: {resolution}m")
        print(f"  Output: {output_filename}")
        
        pipeline = pdal.Pipeline(json.dumps(pipeline_json))
        count = pipeline.execute()  # Number of points left after filtering (ground points)
        
        # Check if file was created
        if os.path.exists(output_filename):
            file_size = os.path.getsize(output_filename)
            print(f"  DTM created successfully!")
            print(f"  File size: {file_size:,} bytes ({file_size/(1024*1024):.2f} MB)")
            print(f"  Ground points processed: {count:,}")
            return True
        else:
            print("  DTM file was not created")
            return False
            
    except Exception as e:
        print(f"  DTM creation failed: {e}")
        return False


#--- Inputs ---
copc_filename = "cropped_merged.copc.laz"
dtm_filename = "Merged_dtm.tif"
resolution = 1.0
#--------------

print("\nStep 2: Creating DTM from ground points...")
success = create_dtm_from_copc(copc_filename, dtm_filename, resolution)

if success:
    print("\n" + "="*60)
    print("PIPELINE COMPLETE!")
    print("="*60)
    print(f"  COPC file: {copc_filename}")
    print(f"  DTM file: {dtm_filename}")
else:
    print("PIPELINE FAILED!")
```



```
Step 2: Creating DTM from ground points...

Creating DTM from cropped_merged.copc.laz...
  Resolution: 1.0m
  Output: Merged_dtm.tif
  DTM created successfully!
  File size: 742,436 bytes (0.71 MB)
  Ground points processed: 1,502,879

PIPELINE COMPLETE!
  COPC file: cropped_merged.copc.laz
  DTM file: Merged_dtm.tif
```


---
Here is the final gridded DTM from our merged COPC file. Because we only used points classified as "ground", the DTM has some gaps where the points were classified as "unclassified" and were therefore not included in the gridding process.

![Gridded DTM file](./docs/MergedLAZ2DTM.png)

# Conclusions
This notebook demonstrated two key methods for programmatic data access. We first used a tile index to selectively download specific LAZ files for an area of interest. We then showed a more advanced workflow, using PDAL to stream and clip data directly from the cloud, which avoids the need for large local downloads.

Mastering these techniques unlocks the ability to build powerful, automated, and scalable workflows. However, they are just one piece of the data access puzzle. OpenTopography remains committed to serving all users, from those who prefer our intuitive map interface for exploration to those who require programmatic access for complex research. 