# Working with LiDAR Cloud Optimized Point Cloud (COPC) in MAAP

**Authors**: Harshini Girish (UAH), Rajat Shinde (UAH), Alex Mandel (DevSeed), Jamison French (DevSeed), Brian Freitag (NASA MSFC), Sheyenne Kirkland (UAH), Henry Rodman (DevSeed), Zac Deziel (DevSeed), Chuck Daniels (DevSeed)

**Date**: November 14, 2024

**Description**: The LAS (LASer) file format is designed to store 3-dimensional (x, y, z) point cloud data, typically collected with LiDAR. A LAZ file is a compressed LAS file, and a Cloud-Optimized Point Cloud (COPC) file is a valid LAZ file. COPC files are to LAZ what Cloud-Optimized GeoTIFFs (COGs) are to GeoTIFFs: both are valid instances of the original file format with additional requirements that support cloud-optimized data access. For COGs, the additional requirements concern tiling and overviews. For COPC, the data must be organized into a clustered octree, with a variable-length record (VLR) describing the octree structure.

**Setup**
This tutorial will explore how to:

1. Read a LiDAR LAS file using PDAL in Python
2. Convert the LiDAR LAS file to Cloud-Optimized Point Cloud (COPC) format
3. Validate the generated COPC file

## Run This Notebook

To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the [Getting started with the MAAP](#) section of our documentation.

**Disclaimer**: It is highly recommended to run this tutorial within MAAP’s ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors.


## Importing Packages
In this example, we demonstrate how to read a LiDAR LAS file using PDAL in Python and convert the LiDAR LAS file to Cloud-Optimized Point Cloud (COPC) format in the MAAP ADE.

Within your Jupyter Notebook, start by importing the maap package. Then invoke the MAAP constructor, setting the `maap_host` argument to `'api.maap-project.org'`.

In [70]:
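# import the os module
import os
# import pprint to print nested metadata in a readable layout
from pprint import pprint
# import the PDAL Python bindings used to build and run pipelines
import pdal
# import the maap package to handle queries
from maap.maap import MAAP
# invoke the MAAP constructor using the maap_host argument
maap = MAAP(maap_host='api.maap-project.org')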


## Downloading The Data
We use the `searchCollection` method of the `maap` object to find the collection, then the `searchGranule` method to list granules in that collection.


In [66]:
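# Search for the collection by short name and version
collections = maap.searchCollection(
    short_name="G-LiHT",
    version="001",
    cmr_host="cmr.earthdata.nasa.gov",
    cloud_hosted="true",
)

# Concept ID of the collection to query; this is the collection ID shown in the output below
collection_id = "C2142771958-LPCLOUD"

# Search for granules in the collection
granules = maap.searchGranule(
    concept_id=collection_id,
    cmr_host="cmr.earthdata.nasa.gov",
    limit=10  # Number of granules to retrieve
)
print("First granule metadata:", granules[2])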


First granule metadata: {'concept-id': 'G2164306102-LPCLOUD', 'collection-concept-id': 'C2142771958-LPCLOUD', 'revision-id': '3', 'format': 'application/echo10+xml', 'Granule': {'GranuleUR': 'GEDI02_A_2019108002012_O01959_04_T03909_02_003_01_V002', 'InsertTime': '2021-02-21T13:39:49.553Z', 'LastUpdate': '2021-03-29T12:23:24.958Z', 'Collection': {'ShortName': 'GEDI02_A', 'VersionId': '002'}, 'DataGranule': {'SizeMBDataGranule': '685.071', 'ProducerGranuleId': 'GEDI02_A_2019108002012_O01959_04_T03909_02_003_01_V002', 'DayNightFlag': 'UNSPECIFIED', 'ProductionDateTime': '2021-02-21T14:37:35Z', 'AdditionalFile': {'Name': 'GEDI02_A_2019108002012_O01959_04_T03909_02_003_01_V002.png', 'SizeInBytes': '305718', 'Checksum': {'Value': '4e2ce5ada8384de2def51cf3e1fe6de072d8654ac62872cc21944a45502f0cdf', 'Algorithm': 'SHA256'}}}, 'PGEVersionClass': {'PGEVersion': '003'}, 'Temporal': {'RangeDateTime': {'BeginningDateTime': '2019-04-18T00:20:12.000000Z', 'EndingDateTime': '2019-04-18T01:52:53.000000Z'

Here we call `searchGranule` again, this time with the `temporal` argument, which restricts the results to a time range given as a single `start,end` string.

In [68]:
# Define the collection concept ID 
COLLECTION_ID = "C2142771958-LPCLOUD"  

# Define temporal filter
temporal_filter = "2019-04-01T00:00:00Z,2019-04-30T23:59:59Z"  # Combined as a single string

# Search for granules in the collection with temporal filter
granules = maap.searchGranule(
    concept_id=COLLECTION_ID,             
    cmr_host="cmr.earthdata.nasa.gov",
    temporal=temporal_filter,
    limit=10  # Number of granules to retrieve
)
# Print the number of results and metadata for the first granule
pprint(f"Got {len(granules)} results")
if granules:
    pprint(f"First granule metadata: {granules[0]}")


'Got 10 results'
("First granule metadata: {'concept-id': 'G2164305167-LPCLOUD', "
 "'collection-concept-id': 'C2142771958-LPCLOUD', 'revision-id': '3', "
 "'format': 'application/echo10+xml', 'Granule': {'GranuleUR': "
 "'GEDI02_A_2019108002012_O01959_01_T03909_02_003_01_V002', 'InsertTime': "
 "'2021-02-21T13:34:19.936Z', 'LastUpdate': '2021-09-16T13:43:36.441Z', "
 "'Collection': {'ShortName': 'GEDI02_A', 'VersionId': '002'}, 'DataGranule': "
 "{'SizeMBDataGranule': '84.0355', 'ProducerGranuleId': "
 "'GEDI02_A_2019108002012_O01959_01_T03909_02_003_01_V002', 'DayNightFlag': "
 "'UNSPECIFIED', 'ProductionDateTime': '2021-02-21T14:33:03Z', "
 "'AdditionalFile': {'Name': "
 "'GEDI02_A_2019108002012_O01959_01_T03909_02_003_01_V002.png', 'SizeInBytes': "
 "'270695', 'Checksum': {'Value': "
 "'53dff154a2e73288da9aaf6d1fd54ed2326f0a1176f696b9a1b2f6ad327312c4', "
 "'Algorithm': 'SHA256'}}}, 'PGEVersionClass': {'PGEVersion': '003'}, "
 "'Temporal': {'RangeDateTime': {'BeginningDateTime': "
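
The `temporal` value used above is a single string with the start and end timestamps separated by a comma. As a minimal sketch, the same string can be built from `datetime` objects instead of being typed by hand. The helper name `cmr_temporal_range` is hypothetical and is not part of maap-py; it assumes timezone-aware datetimes are passed in.

```python
from datetime import datetime, timezone


def cmr_temporal_range(start: datetime, end: datetime) -> str:
    """Format two timezone-aware datetimes as a 'start,end' string in UTC.

    Hypothetical helper for illustration; not part of maap-py.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    # Convert to UTC first so that the trailing 'Z' is accurate.
    parts = [dt.astimezone(timezone.utc).strftime(fmt) for dt in (start, end)]
    return ",".join(parts)


temporal_filter = cmr_temporal_range(
    datetime(2019, 4, 1, 0, 0, 0, tzinfo=timezone.utc),
    datetime(2019, 4, 30, 23, 59, 59, tzinfo=timezone.utc),
)
print(temporal_filter)  # 2019-04-01T00:00:00Z,2019-04-30T23:59:59Z
```

The resulting string can be passed as `temporal=temporal_filter` in the `searchGranule` call shown earlier.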
 

In [69]:
# Filter for LAS files; DataFormat and GranuleUR are nested under the 'Granule' key
las_files = [
    granule for granule in granules
    if granule.get('Granule', {}).get('DataFormat') == 'LAS'
    or granule.get('Granule', {}).get('GranuleUR', '').endswith('.las')
]
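
The printed metadata shows that each result is a nested dictionary, with fields such as `GranuleUR` stored under the `Granule` key. The sketch below walks that structure on a small sample record trimmed from the output above, so it runs without a MAAP connection. The helper names `granule_summary` and `has_extension` are hypothetical, and real records carry many more fields than this sample.

```python
# Sample record trimmed from the printed output above (illustration only).
sample_granule = {
    "concept-id": "G2164305167-LPCLOUD",
    "collection-concept-id": "C2142771958-LPCLOUD",
    "Granule": {
        "GranuleUR": "GEDI02_A_2019108002012_O01959_01_T03909_02_003_01_V002",
        "DataGranule": {"SizeMBDataGranule": "84.0355"},
    },
}


def granule_summary(granule):
    """Pull a few fields out of the nested metadata (hypothetical helper)."""
    inner = granule.get("Granule", {})
    size = inner.get("DataGranule", {}).get("SizeMBDataGranule")
    return {
        "concept_id": granule.get("concept-id"),
        "name": inner.get("GranuleUR", ""),
        # Sizes are stored as strings in the metadata, so convert when present.
        "size_mb": float(size) if size is not None else None,
    }


def has_extension(granule, extension):
    """Case-insensitive check of the granule name against a file extension."""
    name = granule.get("Granule", {}).get("GranuleUR", "")
    return name.lower().endswith(extension.lower())


summary = granule_summary(sample_granule)
print(summary["name"], summary["size_mb"])
print(has_extension(sample_granule, ".las"))
```

Using `.get` with a default at each level keeps the lookup from raising `KeyError` when a record lacks one of the nested keys.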


In [None]:
# -g and -o omit the owner and group columns; -h prints human-readable file sizes
!ls -goh ./data

## PDAL Pipelines for converting the LiDAR LAS file to COPC format

The PDAL command-line interface provides several applications for processing point clouds and allows them to be chained together. Similar to `gdalinfo` for GeoTIFFs, we can run `pdal info <filename>` on the command line to get metadata from a point cloud file without reading it into memory.

To convert the LiDAR LAS file to COPC format, we define a PDAL pipeline. A pipeline describes data processing in PDAL as a sequence of stages: reading (PDAL readers), processing (PDAL filters), and writing (PDAL writers). The stages are executed in the order in which they are listed.


In [None]:
{
  "pipeline": [
    {
      "type": "readers.las",
      "filename": las_filename
    },
    {
      "type": "filters.stats"
    },
    {
      "type": "writers.copc",
      "filename": copc_filename
    }
  ]
}

A pipeline saved as a local JSON file can be executed from the command line with `pdal pipeline <path_to_json_file>`.
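
As a sketch of how such a JSON file could be produced, the block below builds the same three-stage pipeline as a Python dictionary and writes it to disk with the standard `json` module. The function name `build_copc_pipeline`, the file name `las_to_copc.json`, and the input and output file names are placeholders chosen for illustration. The block only writes the pipeline definition and does not run PDAL.

```python
import json
import tempfile
from pathlib import Path


def build_copc_pipeline(las_path, copc_path):
    """Return a LAS-to-COPC pipeline definition as a plain dictionary."""
    return {
        "pipeline": [
            {"type": "readers.las", "filename": str(las_path)},
            {"type": "filters.stats"},
            {"type": "writers.copc", "filename": str(copc_path)},
        ]
    }


# Placeholder file names; substitute the paths of your own data.
spec = build_copc_pipeline("input.las", "input.copc.laz")

# Write the definition to a temporary directory for this example.
pipeline_path = Path(tempfile.mkdtemp()) / "las_to_copc.json"
pipeline_path.write_text(json.dumps(spec, indent=2))
print(pipeline_path.read_text())
```

Serializing with `json.dumps` guarantees that the file is valid JSON, which avoids problems such as trailing commas that a hand-written file can contain.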

## LAS to COPC Conversion
Now we convert the LAS file to COPC format by constructing the pipeline programmatically in Python.


In [None]:
# las_filename must hold the local path to the downloaded LAS file
# Define the output filename; COPC files conventionally use the .copc.laz extension
copc_filename = las_filename.replace('.las', '.copc.laz')
copc_filename
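
Note that `str.replace` substitutes every occurrence of `.las` in the path and does not match an uppercase `.LAS` extension. A slightly more defensive sketch using `pathlib` changes only the final extension. The helper name `copc_path_for` and the example paths are hypothetical.

```python
from pathlib import Path


def copc_path_for(las_path):
    """Derive the .copc.laz output path next to a .las input file."""
    path = Path(las_path)
    if path.suffix.lower() != ".las":
        raise ValueError(f"expected a .las file, got: {path.name}")
    # Replace only the final extension and keep the directory untouched.
    return path.with_name(path.stem + ".copc.laz")


print(copc_path_for("data/survey.las"))
print(copc_path_for("data/archive.las.old/scan.LAS"))
```

The second call shows the difference from `str.replace`: the directory name containing `.las` is left unchanged, and the uppercase extension is still recognized.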


In [None]:
# Chain a LAS reader and a COPC writer into a pipeline.
# execute() runs the pipeline and returns the number of points processed.
pipe = pdal.Reader.las(filename=las_filename) | pdal.Writer.copc(filename=copc_filename)
pipe.execute()

## Validating The Product


As the output of the cell below shows, the `.copc.laz` file has been created in the destination directory.

In [None]:
# -g and -o omit the owner and group columns; -h prints human-readable file sizes
!ls -goh ./data

Let’s read the generated COPC file back in and check the value of the `copc` flag in its metadata. If the generated file is a valid COPC file, this flag is set to `True`.


In [None]:
# Creating a pipeline to validate COPC file and check metadata
valid_pipe = pdal.Reader.copc(filename=copc_filename) | pdal.Filter.stats()
valid_pipe.execute()

# Getting value for the "copc" key under the metadata
# Output is True for a valid COPC
value = valid_pipe.metadata["metadata"]["readers.copc"].get("copc")
print(value)
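
As a complementary check that does not depend on PDAL, the first bytes of the file can be inspected directly. The sketch below rests on my reading of the LAS 1.4 and COPC 1.0 specifications and should be treated as an assumption: the file starts with the signature `LASF`, the version bytes at offsets 24 and 25 are 1 and 4, and the first VLR begins at offset 375 with the user ID `copc` (offset 377) and record ID 1 (offset 393). The function names are hypothetical, and the demonstration uses a synthetic buffer rather than a real file.

```python
import struct

LAS14_HEADER_SIZE = 375  # size of the LAS 1.4 public header block
VLR_HEADER_SIZE = 54     # reserved(2) + user ID(16) + record ID(2) + length(2) + description(32)


def looks_like_copc(head: bytes) -> bool:
    """Heuristic check of the leading bytes of a file for COPC markers."""
    if len(head) < 395:
        return False
    if head[0:4] != b"LASF":
        return False
    if (head[24], head[25]) != (1, 4):
        return False
    # The user ID is a 16-byte, null-padded field that follows 2 reserved bytes.
    user_id = head[377:393].split(b"\0", 1)[0]
    (record_id,) = struct.unpack_from("<H", head, 393)
    return user_id == b"copc" and record_id == 1


def fake_header(user_id=b"copc", record_id=1) -> bytes:
    """Build a synthetic header for demonstration; not a valid point cloud."""
    buf = bytearray(LAS14_HEADER_SIZE + VLR_HEADER_SIZE)
    buf[0:4] = b"LASF"
    buf[24], buf[25] = 1, 4
    buf[377:377 + len(user_id)] = user_id
    struct.pack_into("<H", buf, 393, record_id)
    return bytes(buf)


print(looks_like_copc(fake_header()))                           # True
print(looks_like_copc(fake_header(b"LASF_Projection", 34735)))  # False
```

For a real file, the same function could be applied to the first 429 bytes read with `open(copc_filename, "rb")`. This only confirms the header markers; it does not validate the octree itself, so the PDAL metadata check above remains the primary test.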


## Accessing The Data
The data values can be accessed from the executed pipeline using `valid_pipe.arrays`. The arrays hold the LiDAR point cloud attributes, such as X, Y, Z, and intensity.

In [None]:
# Extract the point data from the executed pipeline
# (a list of NumPy structured arrays, one field per point attribute)
arr_values = valid_pipe.arrays

# Print the array values
print(arr_values)

The output of the cell above shows that the point values can be read back from the generated COPC file, which confirms that the file is readable.

