# Working with LiDAR Cloud Optimized Point Cloud (COPC) in MAAP

**Authors**: Harshini Girish (UAH), Rajat Shinde (UAH), Alex Mandel (Development Seed), Jamison French (Development Seed), Brian Freitag (NASA MSFC), Sheyenne Kirkland (UAH), Henry Rodman (Development Seed), Zac Deziel (Development Seed), Chuck Daniels (Development Seed)

**Date**: March 25, 2024

**Description**: The LAS (LASer) file format is designed to store 3-dimensional (x, y, z) point cloud data, typically collected from LiDAR. A LAZ file is a compressed LAS file, and a Cloud-Optimized Point Cloud (COPC) file is a valid LAZ file. COPC files are to LAZ what COGs are to GeoTIFFs: both are valid versions of the original file format, with additional requirements that support cloud-optimized data access. In the case of COGs, the additional requirements cover tiling and overviews; in the case of COPC, the points are organized in a clustered octree.

**Setup**

This tutorial explores how to:

1. Read a LiDAR LAS file using PDAL in Python
2. Convert the LiDAR LAS file to Cloud-Optimized Point Cloud (COPC) format
3. Validate the generated COPC file

## Run This Notebook

To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the [Getting started with the MAAP](#) section of our documentation.

**Disclaimer**: It is highly recommended to run this tutorial within MAAP’s ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors.


## About the Dataset

The following Python code downloads a specific LiDAR `.las` file from the MAAP STAC (SpatioTemporal Asset Catalog) API. It first connects to the MAAP STAC catalog using the `pystac_client` library, then selects the `GEDI_CalVal_Lidar_Data` collection, which contains LiDAR data acquired for calibration and validation of the GEDI mission. Finally, it retrieves one item from the collection, resolves the URL of its `data` asset, and saves the file under `/projects`.

In [None]:
from pystac_client import Client
import requests

# Open the MAAP STAC catalog and look up the LiDAR item
cat = Client.open("https://stac.maap-project.org/")
collection = cat.get_collection("GEDI_CalVal_Lidar_Data")
item = collection.get_item("usa_neonsrer_2019_NEON_D14_SRER_DP1_L088-1_2019091314_unclassified_point_cloud_0000001")

# Resolve the URL of the "data" asset and derive a local file path from it
asset = item.assets["data"]
url = asset.href
filename = url.split("/")[-1]
save_path = f"/projects/{filename}"

# Download the file, raising an exception if the request fails
response = requests.get(url)
response.raise_for_status()
with open(save_path, "wb") as f:
    f.write(response.content)

## Introduction to PDAL

PDAL (Point Data Abstraction Library) is a C/C++-based open-source library for processing point cloud data. It also has Python bindings (the `python-pdal` package on conda-forge) for working in a Pythonic environment.
Installing from conda-forge helps ensure compatibility across platforms and avoids the version conflicts that can occur with pip.

`!conda install -c conda-forge pdal python-pdal -y`

## Accessing and Getting Metadata Information

The PDAL CLI provides multiple applications for processing point clouds, and these applications can be chained together. Similar to `gdalinfo` for TIFFs, we can run `pdal info <filename>` on the command line to get metadata from a point cloud file without reading it into memory.

In [5]:
!pdal info /projects/GLLIDARPC_FL_20200311_FIA8_l0s12.las

{
  "file_size": 25708263,
  "filename": "/projects/GLLIDARPC_FL_20200311_FIA8_l0s12.las",
  "now": "2025-03-25T09:08:47-0700",
  "pdal_version": "2.5.5 (git-version: 9d28a2)",
  "reader": "readers.las",
  "stats":
  {
    "bbox":
    {
      "EPSG:4326":
      {
        "bbox":
        {
          "maxx": -80.9173406,
          "maxy": 25.19483337,
          "maxz": 20.43,
          "minx": -80.92108766,
          "miny": 25.18579162,
          "minz": -0.02
        },
        "boundary": { "type": "Polygon", "coordinates": [ [ [ -80.921087663044844, 25.18579366676925, -0.02 ], [ -80.921081838916265, 25.194833369042652, -0.02 ], [ -80.917340601969812, 25.194831325803804, 20.43 ], [ -80.917346702199453, 25.185791624363894, 20.43 ], [ -80.921087663044844, 25.18579366676925, -0.02 ] ] ] }
      },
      "native":
      {
        "bbox":
        {
          "maxx": 508327.9363,
          "maxy": 2786523.983,
          "maxz": 20.43,
          "minx": 507951.0063,
          "miny": 2785523

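The JSON printed by `pdal info` can also be consumed from Python. The following is a minimal sketch, assuming the output has the `stats` → `bbox` → `native` → `bbox` layout shown above; `native_extent` is a hypothetical helper name introduced here for illustration, and the sample values are copied from the output above. The call to the real `pdal` executable only runs if the tool and the file are present.

```python
import json
import os
import shutil
import subprocess


def native_extent(info):
    """Return the width and height of the native bounding box (in CRS units)."""
    bbox = info["stats"]["bbox"]["native"]["bbox"]
    return {
        "width": bbox["maxx"] - bbox["minx"],
        "height": bbox["maxy"] - bbox["miny"],
    }


# Sample trimmed from the `pdal info` output shown above.
sample_info = json.loads("""
{"stats": {"bbox": {"native": {"bbox": {
    "maxx": 508327.9363, "maxy": 2786523.983,
    "minx": 507951.0063, "miny": 2785523
}}}}}
""")
extent = native_extent(sample_info)
print(extent)

# Optionally run the real command when PDAL and the file are available.
las_path = "/projects/GLLIDARPC_FL_20200311_FIA8_l0s12.las"
if shutil.which("pdal") and os.path.exists(las_path):
    result = subprocess.run(
        ["pdal", "info", las_path], capture_output=True, text=True, check=True
    )
    print(native_extent(json.loads(result.stdout)))
```

Because the native coordinate system here is UTM, the width and height are in metres.
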
In [15]:
!pdal info /projects/GLLIDARPC_FL_20200311_FIA8_l0s12.copc.laz --metadata


{
  "file_size": 7492843,
  "filename": "/projects/GLLIDARPC_FL_20200311_FIA8_l0s12.copc.laz",
  "metadata":
  {
    "comp_spatialreference": "PROJCS[\"WGS 84 / UTM zone 17N\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",-81],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"32617\"]]",
    "compressed": true,
    "copc": true,
    "copc_info":
    {
      "center_x": 508451.4963,
      "center_y": 2786023.493,
      "center_z": 500.47,
      "gpstime_maximum": 0,
      "gpstime_mi

## LAS to COPC Conversion

To convert the LiDAR LAS file to COPC format, we define a PDAL pipeline. A pipeline describes the data processing to perform: reading (using PDAL readers), processing (using PDAL filters), and writing (using PDAL writers). Each of these operations is a stage, and the stages of a pipeline are executed sequentially.

A PDAL pipeline is defined in JSON, either as a JSON object or as a JSON array. Below is an example of a PDAL pipeline that takes a .las file as input, computes statistics, and writes the result in COPC format.



In [None]:
# In a new notebook, create the pipeline definition and save it as a JSON file

import json

# Stages run in order: read the LAS file, compute statistics, write COPC
pipeline_dict = {
    "pipeline": [
        {
            "type": "readers.las",
            "filename": "/projects/GLLIDARPC_FL_20200311_FIA8_l0s12.las"
        },
        {
            "type": "filters.stats"
        },
        {
            "type": "writers.copc",
            "filename": "/projects/GLLIDARPC_FL_20200311_FIA8_l0s12.copc.laz"
        }
    ]
}

with open("las_to_copc.json", "w") as f:
    json.dump(pipeline_dict, f, indent=2)

print("JSON pipeline file created successfully!")

# The saved pipeline can then be executed from the command line with:
#   pdal pipeline las_to_copc.json
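Writing the JSON file only describes the conversion; the pipeline still has to be executed to produce the `.copc.laz` file. The following is a minimal sketch, assuming the `pdal` command-line tool is on the `PATH`. `build_copc_pipeline` and `default_copc_path` are hypothetical helper names introduced here for illustration, and the conversion step is skipped when PDAL or the input file is missing.

```python
import json
import os
import shutil
import subprocess


def build_copc_pipeline(las_path, copc_path):
    """Return a PDAL pipeline dict: read LAS, compute stats, write COPC."""
    return {
        "pipeline": [
            {"type": "readers.las", "filename": las_path},
            {"type": "filters.stats"},
            {"type": "writers.copc", "filename": copc_path},
        ]
    }


def default_copc_path(las_path):
    """Swap the input extension for .copc.laz (hypothetical naming helper)."""
    root, _ext = os.path.splitext(las_path)
    return root + ".copc.laz"


las_path = "/projects/GLLIDARPC_FL_20200311_FIA8_l0s12.las"
copc_path = default_copc_path(las_path)
pipeline_dict = build_copc_pipeline(las_path, copc_path)

# Only attempt the conversion when PDAL and the input file are available.
if shutil.which("pdal") and os.path.exists(las_path):
    with open("las_to_copc.json", "w") as f:
        json.dump(pipeline_dict, f, indent=2)
    # `pdal pipeline <file>` executes the stages in order.
    subprocess.run(["pdal", "pipeline", "las_to_copc.json"], check=True)
    print("Wrote", copc_path)
else:
    print("PDAL or the input file was not found; skipping the conversion.")
```

Building the dictionary in a function makes it easy to reuse the same three stages for other input files.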


## Validation of File
As the output of the cell below shows, the .copc.laz file has been created in the destination directory.


In [11]:
!ls -lh /projects/GLLIDARPC_FL_20200311_FIA8_l0s12.copc.laz


-rw-r--r-- 1 root root 7.2M Mar 25 09:17 /projects/GLLIDARPC_FL_20200311_FIA8_l0s12.copc.laz


Let's read the generated COPC file back in and check the value of the `copc` flag in its metadata. If the generated file is a valid COPC file, this flag should be `True`.

In [24]:
import pdal
import json

copc_filename = "/projects/GLLIDARPC_FL_20200311_FIA8_l0s12.copc.laz"

pipeline_dict = {
    "pipeline": [
        {
            "type": "readers.copc",
            "filename": copc_filename
        },
        {
            "type": "filters.stats"
        }
    ]
}

pipeline = pdal.Pipeline(json.dumps(pipeline_dict))
pipeline.execute()


metadata = pipeline.metadata
copc_flag = metadata["metadata"]["readers.copc"].get("copc")

print("COPC valid:", copc_flag)


COPC valid: True
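As a lightweight complement to the metadata flag, the file header can be inspected directly. The following is a minimal sketch based on one reading of the LAS 1.4 and COPC 1.0 specifications, which should be treated as an assumption to verify: the file starts with the `LASF` signature, the version bytes are 1 and 4, and the first VLR begins at byte 375 with user ID `copc` and record ID 1. `looks_like_copc` is a hypothetical helper name, and this check does not replace a full validation.

```python
import os
import struct

LAS14_HEADER_SIZE = 375  # assumed offset of the first VLR in a COPC file
VLR_PREFIX_SIZE = 20     # reserved (2) + user ID (16) + record ID (2)


def looks_like_copc(path):
    """Check the LAS signature, version 1.4, and the leading 'copc' VLR."""
    with open(path, "rb") as f:
        head = f.read(LAS14_HEADER_SIZE + VLR_PREFIX_SIZE)
    if len(head) < LAS14_HEADER_SIZE + VLR_PREFIX_SIZE:
        return False
    signature = head[0:4]
    version = (head[24], head[25])
    # The user ID is a 16-byte, null-padded field after 2 reserved bytes.
    user_id = head[377:393].rstrip(b"\x00")
    (record_id,) = struct.unpack_from("<H", head, 393)
    return (
        signature == b"LASF"
        and version == (1, 4)
        and user_id == b"copc"
        and record_id == 1
    )


copc_filename = "/projects/GLLIDARPC_FL_20200311_FIA8_l0s12.copc.laz"
if os.path.exists(copc_filename):
    print("COPC header check:", looks_like_copc(copc_filename))
```

Reading only the first few hundred bytes keeps the check cheap, which is useful before downloading or processing a large file.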


## Accessing the Data

The point data can be accessed from the executed pipeline using `pipeline.arrays`. Each array holds the LiDAR point cloud attributes, such as `X`, `Y`, `Z`, and `Intensity`.



In [5]:
arr_values = pipeline.arrays
print(arr_values)


[array([(508144.69, 2786399.23, 0.1 , 23898, 1, 1, 1, 0, 2,  13.002, 1, 0, 310327.57128751, 0, 0),
       (508141.26, 2786399.08, 0.16, 16492, 1, 1, 1, 0, 2,  13.002, 1, 0, 310327.57791575, 0, 0),
       (508139.79, 2786400.09, 0.12, 18240, 1, 1, 1, 0, 2,  12.   , 1, 0, 310327.5978811 , 0, 0),
       ...,
       (507962.93, 2786522.54, 0.11, 23963, 1, 1, 0, 0, 2, -12.   , 2, 0, 310329.90509551, 0, 0),
       (507956.73, 2786522.96, 0.08, 24553, 1, 1, 0, 0, 2, -13.002, 2, 0, 310329.92512884, 0, 0),
       (507955.45, 2786523.25, 0.09, 43646, 1, 1, 0, 0, 2, -13.998, 2, 0, 310329.93179795, 0, 0)],
      dtype=[('X', '<f8'), ('Y', '<f8'), ('Z', '<f8'), ('Intensity', '<u2'), ('ReturnNumber', 'u1'), ('NumberOfReturns', 'u1'), ('ScanDirectionFlag', 'u1'), ('EdgeOfFlightLine', 'u1'), ('Classification', 'u1'), ('ScanAngleRank', '<f4'), ('UserData', 'u1'), ('PointSourceId', '<u2'), ('GpsTime', '<f8'), ('ScanChannel', 'u1'), ('ClassFlags', 'u1')])]
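Each entry of `pipeline.arrays` is a NumPy structured array, so individual dimensions can be selected by name and filtered with boolean masks. The following is a minimal sketch on a small synthetic array: the first three rows reuse values from the output above (with only a subset of the fields), and the fourth row is invented for illustration. With real data, `points` would be `pipeline.arrays[0]`.

```python
import numpy as np

# A subset of the fields shown in the output above.
dtype = [
    ("X", "<f8"), ("Y", "<f8"), ("Z", "<f8"),
    ("Intensity", "<u2"), ("Classification", "u1"),
]
points = np.array(
    [
        (508144.69, 2786399.23, 0.10, 23898, 2),
        (508141.26, 2786399.08, 0.16, 16492, 2),
        (508139.79, 2786400.09, 0.12, 18240, 2),
        (508140.00, 2786401.00, 12.50, 30000, 5),  # invented non-ground point
    ],
    dtype=dtype,
)

# Select ground returns (class 2 in the ASPRS LAS classification scheme).
ground = points[points["Classification"] == 2]
mean_ground_z = float(ground["Z"].mean())
print("Ground points:", len(ground), "mean Z:", round(mean_ground_z, 3))

# Stack the coordinates into a plain (N, 3) array for downstream tools.
xyz = np.column_stack([points["X"], points["Y"], points["Z"]])
print(xyz.shape)
```

The same pattern applies to any other dimension listed in the array's `dtype`, such as `Intensity` or `ReturnNumber`.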




Similarly, we can get the COPC file statistics and the log from the executed pipeline using `pipeline.metadata["metadata"]["filters.stats"]["statistic"]` and `pipeline.log`. Readers are encouraged to explore the results of these operations on their own.

In [6]:
import pprint

metadata = pipeline.metadata
stats = metadata["metadata"]["filters.stats"]["statistic"]
pprint.pprint(stats)


[{'average': 508099.5139,
  'count': 918138,
  'maximum': 508327.93,
  'minimum': 507951,
  'name': 'X',
  'position': 0,
  'stddev': 59.41990754,
  'variance': 3530.725412},
 {'average': 2785815.21,
  'count': 918138,
  'maximum': 2786523.98,
  'minimum': 2785523,
  'name': 'Y',
  'position': 1,
  'stddev': 242.6482821,
  'variance': 58878.18882},
 {'average': 5.265347497,
  'count': 918138,
  'maximum': 20.43,
  'minimum': -0.02,
  'name': 'Z',
  'position': 2,
  'stddev': 4.852162068,
  'variance': 23.54347674},
 {'average': 31652.79552,
  'count': 918138,
  'maximum': 65535,
  'minimum': 13631,
  'name': 'Intensity',
  'position': 3,
  'stddev': 8490.675942,
  'variance': 72091577.95},
 {'average': 1.30460018,
  'count': 918138,
  'maximum': 6,
  'minimum': 1,
  'name': 'ReturnNumber',
  'position': 4,
  'stddev': 0.5609339823,
  'variance': 0.3146469325},
 {'average': 1.609117584,
  'count': 918138,
  'maximum': 6,
  'minimum': 1,
  'name': 'NumberOfReturns',
  'position': 5,
  's