# Converting LiDAR LAS Files to Cloud Optimized Point Clouds (COPCs)

## Environment

The packages needed for this notebook can be installed with `conda` or `mamba`. Using the [`environment.yml` from this folder](./environment.yml), run:

```bash
conda env create -f environment.yml
```

or

```bash
mamba env create -f environment.yml
```

Finally, activate the environment and select its kernel in the notebook (when running in Jupyter):

```bash
conda activate coguide-copc
```

The notebook has been tested to work with the listed Conda environment.
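A quick check like the one below can confirm that the notebook kernel is actually using this environment. It is a minimal sketch: the module list is assumed from the imports used later in this notebook, and `importlib.util.find_spec` only tests whether a module can be found, not which version is installed.

```python
import importlib.util

# Modules imported later in this notebook (assumed to be provided by environment.yml)
REQUIRED_MODULES = ["earthaccess", "pdal"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found by the current kernel."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages (is the coguide-copc kernel selected?):", missing)
else:
    print("All required packages are available.")
```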

## Setup

This tutorial will explore how to:

1. Read a LiDAR LAS file using PDAL in Python
2. Convert the LiDAR LAS file to Cloud Optimized Point Cloud (COPC) format
3. Validate the generated COPC file

## About the Dataset

We will be using the [G-LiHT Lidar Point Cloud V001](http://doi.org/10.5067/Community/GLIHT/GLLIDARPC.001) dataset from NASA Earthdata. Accessing NASA Earthdata from a Jupyter Notebook requires an Earthdata account, which you can create on [NASA's Earthdata Login page](https://urs.earthdata.nasa.gov/users/new). Once registered, you can retrieve the datasets used in this notebook.

We will use the [earthaccess](https://github.com/nsidc/earthaccess) library to set up credentials and fetch data from NASA's Earthdata catalog.

In [86]:
import earthaccess
import os
import pdal

In [3]:
earthaccess.login()

<earthaccess.auth.Auth at 0x110a94b90>

## Creating a Data Directory for this Tutorial

We create a local data directory to hold all the downloaded files.

In [4]:
# set data directory path
data_dir = './data'

# check if directory exists -> if directory doesn't exist, directory is created
if not os.path.exists(data_dir):
    os.mkdir(data_dir)
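The existence check and the `mkdir` call can also be collapsed into a single call. A minimal sketch using `os.makedirs` with `exist_ok=True`, which creates the directory (and any missing parent directories) and does nothing if it already exists, so there is no window between the check and the creation:

```python
import os

data_dir = "./data"

# Create data_dir and any missing parents; no error is raised if it already exists
os.makedirs(data_dir, exist_ok=True)
```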

## Downloading the Dataset from EarthData

We use the `search_data` function from the `earthaccess` module to search for granules in the selected collection. The `temporal` argument restricts the search to a time range (here, `"2020"`), and `count` limits the number of granules returned.

In [5]:
# Search Granules
short_name = 'GLLIDARPC'
version = '001'

las_item_results = earthaccess.search_data(
    short_name=short_name,
    version=version,
    temporal = ("2020"), 
    count=3
)

Granules found: 72


In [6]:
las_item_results

[Collection: {'EntryTitle': 'G-LiHT Lidar Point Cloud V001'}
 Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -81.03452828650298, 'Latitude': 25.50220025425373}, {'Longitude': -81.01391715300757, 'Latitude': 25.50220365895999}, {'Longitude': -81.01391819492625, 'Latitude': 25.5112430715201}, {'Longitude': -81.03453087148995, 'Latitude': 25.511239665437053}, {'Longitude': -81.03452828650298, 'Latitude': 25.50220025425373}]}}]}}}
 Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2020-03-11T04:00:00.000Z', 'EndingDateTime': '2020-03-12T03:59:59.000Z'}}
 Size(MB): 238.623
 Data: ['https://e4ftl01.cr.usgs.gov//GWELD1/COMMUNITY/GLLIDARPC.001/2020.03.11/GLLIDARPC_FL_20200311_FIA8_l0s47.las'],
 Collection: {'EntryTitle': 'G-LiHT Lidar Point Cloud V001'}
 Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -81.02242648723991, 'Latitude': 25.493163090615468}, {

Let's download the third granule, a file of about 91.04 MB, and convert it to COPC format.
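Selecting a granule by list index depends on the order in which the search results are returned. A minimal sketch of choosing the smallest granule instead, assuming each result exposes a `size()` method that reports megabytes, as earthaccess `DataGranule` objects do; `smallest_granule` is a hypothetical helper, not part of earthaccess:

```python
def smallest_granule(granules):
    """Return the granule with the smallest size() (assumed to be in MB)."""
    if not granules:
        raise ValueError("no granules to choose from")
    # Assumes DataGranule-like objects with a size() method
    return min(granules, key=lambda granule: granule.size())
```

For example, `earthaccess.download(smallest_granule(las_item_results), data_dir)` would fetch the smallest of the returned granules, which is not necessarily the 91.04 MB file used in the rest of this tutorial.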

In [7]:
# Download data: select the 3rd granule from the `las_item_results` list
gliht_las_file = earthaccess.download(las_item_results[2], data_dir)
# earthaccess.download returns a list of local file paths
las_filename = gliht_las_file[0]
print(las_filename)

 Getting 1 granules, approx download size: 0.09 GB


QUEUEING TASKS | : 100%|██████████| 1/1 [00:00<00:00, 1736.05it/s]


File GLLIDARPC_FL_20200311_FIA8_l0s22.las already downloaded


PROCESSING TASKS | : 100%|██████████| 1/1 [00:00<00:00, 3788.89it/s]
COLLECTING RESULTS | : 100%|██████████| 1/1 [00:00<00:00, 7096.96it/s]

data/GLLIDARPC_FL_20200311_FIA8_l0s22.las





## A Brief Introduction to PDAL 

To convert the LiDAR LAS file to COPC format, we will define a [PDAL pipeline](https://pdal.io/en/2.6.0/pipeline.html). A pipeline describes how PDAL reads data (using [PDAL readers](https://pdal.io/en/2.6.0/stages/readers.html)), processes it (using [PDAL filters](https://pdal.io/en/2.6.0/stages/filters.html)) and writes it (using [PDAL writers](https://pdal.io/en/2.6.0/stages/writers.html)). Each reader, filter, or writer is a [_stage_](https://pdal.io/en/2.6.0/pipeline.html#stage-object), and a pipeline chains stages together so that they are executed sequentially.

#### PDAL Pipelines
A PDAL pipeline is defined in JSON, either as a JSON object or as a JSON array. Below is an example of a PDAL pipeline that takes a `.las` file as input, computes statistics with `filters.stats`, and writes the points in COPC format.

```json
{
  "pipeline": [
    {
      "type": "readers.las",
      "filename": "data/GLLIDARPC_FL_20200311_FIA8_l0s22.las"
    },
    {
      "type": "filters.stats"
    },
    {
      "type": "writers.copc",
      "filename": "data/GLLIDARPC_FL_20200311_FIA8_l0s22.copc.laz"
    }
  ]
}
```

If the pipeline is saved as a local JSON file, it can be executed from the command line with `pdal pipeline <path_to_json_file>`.
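Generating the pipeline file with Python's `json` module avoids hand-written JSON mistakes such as unquoted values or trailing commas. A minimal sketch that saves the pipeline above to disk and runs it with the PDAL command-line tool; the file name `las_to_copc.json` is hypothetical, the paths are the ones used in this notebook, and the command only runs when `pdal` is on the `PATH` and the input file exists:

```python
import json
import os
import shutil
import subprocess

# Paths used in this notebook; adjust them if your download location differs
las_filename = "data/GLLIDARPC_FL_20200311_FIA8_l0s22.las"
copc_filename = "data/GLLIDARPC_FL_20200311_FIA8_l0s22.copc.laz"
pipeline_path = "las_to_copc.json"  # hypothetical name for the saved pipeline

pipeline_spec = {
    "pipeline": [
        {"type": "readers.las", "filename": las_filename},
        {"type": "filters.stats"},
        {"type": "writers.copc", "filename": copc_filename},
    ]
}

# json.dump always emits valid JSON (quoted strings, no trailing commas)
with open(pipeline_path, "w") as f:
    json.dump(pipeline_spec, f, indent=2)

# Run the pipeline only if the PDAL CLI is available and the input has been downloaded
if shutil.which("pdal") and os.path.exists(las_filename):
    subprocess.run(["pdal", "pipeline", pipeline_path], check=True)
```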

#### Programmatic Pipeline Construction

Here, however, we will use an easier, more Pythonic approach to defining and executing a pipeline. It is based on the [PDAL Python extension](https://pypi.org/project/pdal/), which supports programmatic pipeline construction in addition to the JSON-based approach discussed above.

This approach uses the `|` operator to pipe stages together into a pipeline. For example, the pipeline above can be written as:

```python
pipeline = pdal.Reader.las(filename=las_filename) | pdal.Filter.stats() | pdal.Writer.copc(filename=copc_filename)
```
This pipeline is run with `pipeline.execute()`, which returns the number of points processed.

## LAS to COPC Conversion

Now, let's convert the LAS file to COPC format using programmatic pipeline construction.

In [45]:
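# Define the output filename. COPC files are usually saved with the .copc.laz extension.
# os.path.splitext strips only the final extension, so dots elsewhere in the path are kept.
copc_filename = os.path.splitext(las_filename)[0] + '.copc.laz'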

In [48]:
# pipe = stage 1 | stage 2 | stage 3
# Or, pipeline = pipeline 1 | stage 2

pipe = pdal.Reader.las(filename=las_filename) | pdal.Writer.copc(filename=copc_filename)
print(pipe.execute())

3409439


## Validation

As the output of the cell below shows, the `.copc.laz` file has been created in the destination directory. It is also much smaller than the original LAS file (26 MB versus 91 MB).

In [73]:
# -g and -o drop the owner and group columns from the long listing; -h prints human-readable file sizes
!ls -goh {os.path.dirname(copc_filename)}

total 253960
-rw-r--r--  1     26M Feb 29 14:05 GLLIDARPC_FL_20200311_FIA8_l0s22.copc.laz
-rw-r--r--  1     91M Feb 29 11:27 GLLIDARPC_FL_20200311_FIA8_l0s22.las
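The size reduction can also be computed directly. A minimal sketch with a hypothetical helper `size_ratio`; a reduction is expected here because COPC files are LAZ-compressed, whereas the input is an uncompressed `.las` file:

```python
import os


def size_ratio(original_path: str, converted_path: str) -> float:
    """Size of the converted file as a fraction of the original file's size."""
    return os.path.getsize(converted_path) / os.path.getsize(original_path)
```

Based on the 26 MB and 91 MB sizes listed above, `size_ratio(las_filename, copc_filename)` should come out at roughly 0.29 for this file.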


Let's also compute statistics for the original LAS file with `filters.stats`, so that they can be compared with those of the generated COPC file.

In [83]:
valid_pipe = pdal.Reader.las(filename=las_filename) | pdal.Filter.stats()
valid_pipe.execute()
print(valid_pipe.metadata)

{'metadata': {'filters.stats': {'bbox': {'EPSG:4326': {'bbox': {'maxx': -80.93555795, 'maxy': 25.28524102, 'maxz': 69.99, 'minx': -80.94099075, 'miny': 25.27619906, 'minz': -12.54}, 'boundary': {'type': 'Polygon', 'coordinates': [[[-80.94099075054905, 25.276201329530473, -12.54], [-80.94098637748567, 25.285241015299494, -12.54], [-80.9355579494582, 25.285238744206318, 69.99], [-80.9355627247816, 25.276199059361314, 69.99], [-80.94099075054905, 25.276201329530473, -12.54]]]}}, 'native': {'bbox': {'maxx': 506487.7363, 'maxy': 2796533.993, 'maxz': 69.99, 'minx': 505941.2263, 'miny': 2795533.003, 'minz': -12.54}, 'boundary': {'type': 'Polygon', 'coordinates': [[[505941.22630256764, 2795533.0032408433, -12.54], [505941.22630256764, 2796533.993240843, -12.54], [506487.73630256765, 2796533.993240843, 69.99], [506487.73630256765, 2795533.0032408433, 69.99], [505941.22630256764, 2795533.0032408433, -12.54]]]}}}, 'statistic': [{'average': 506237.8598, 'count': 3409439, 'maximum': 506487.7363, 'm

Besides `valid_pipe.metadata`, the executed pipeline exposes its log and the point data itself through `valid_pipe.log` and `valid_pipe.arrays`. Readers are encouraged to explore these on their own.
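The statistics printed above can also be read programmatically. A minimal sketch, assuming `valid_pipe.metadata` is a dict with the layout shown in the output (`metadata` → `filters.stats` → `bbox` / `statistic`); `point_count` and `native_bbox` are hypothetical helpers:

```python
def point_count(stats: dict) -> int:
    """Point count reported by filters.stats (each per-dimension entry carries the same count)."""
    return stats["statistic"][0]["count"]


def native_bbox(stats: dict) -> dict:
    """Bounding box in the file's native coordinate system."""
    return stats["bbox"]["native"]["bbox"]
```

With `las_stats = valid_pipe.metadata["metadata"]["filters.stats"]`, `point_count(las_stats)` gives 3409439 for this file, the same number that `pipe.execute()` returned during the conversion.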

Additionally, we can run `pdal info <filename>` on the command line to get information about the generated COPC file.
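The same command can be scripted, for example to compare the COPC file against the LAS statistics computed above. A minimal sketch with hypothetical helpers `pdal_info` and `bboxes_match`; it assumes the `pdal` CLI is on the `PATH` and that its JSON output has the `stats` → `bbox` → `native` → `bbox` layout shown below:

```python
import json
import subprocess


def pdal_info(path: str) -> dict:
    """Run `pdal info` on `path` and parse its JSON output."""
    completed = subprocess.run(
        ["pdal", "info", path], capture_output=True, text=True, check=True
    )
    return json.loads(completed.stdout)


def bboxes_match(a: dict, b: dict, tol: float = 0.01) -> bool:
    """True if two bounding boxes agree to within `tol` on every bound."""
    keys = ("minx", "miny", "minz", "maxx", "maxy", "maxz")
    return all(abs(a[k] - b[k]) <= tol for k in keys)
```

For example, `bboxes_match(valid_pipe.metadata["metadata"]["filters.stats"]["bbox"]["native"]["bbox"], pdal_info(copc_filename)["stats"]["bbox"]["native"]["bbox"])` compares the two files. The 0.01 default tolerance is an assumption based on the output here, where the COPC bounds are reported with two decimal places (e.g. `506487.74` versus `506487.7363` for the LAS file).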

In [89]:
!pdal info {copc_filename}

{
  "file_size": 27367236,
  "filename": "data/GLLIDARPC_FL_20200311_FIA8_l0s22.copc.laz",
  "now": "2024-02-29T15:43:42-0600",
  "pdal_version": "2.6.3 (git-version: Release)",
  "reader": "readers.copc",
  "stats":
  {
    "bbox":
    {
      "EPSG:4326":
      {
        "bbox":
        {
          "maxx": -80.93555791,
          "maxy": 25.28524099,
          "maxz": 69.99,
          "minx": -80.94099071,
          "miny": 25.27619903,
          "minz": -12.54
        },
        "boundary": { "type": "Polygon", "coordinates": [ [ [ -80.94099071383971, 25.276201300248541, -12.54 ], [ -80.940986340773605, 25.285240986017602, -12.54 ], [ -80.935557912747441, 25.285238714923079, 69.99 ], [ -80.93556268807356, 25.276199030078029, 69.99 ], [ -80.94099071383971, 25.276201300248541, -12.54 ] ] ] }
      },
      "native":
      {
        "bbox":
        {
          "maxx": 506487.74,
          "maxy": 2796533.99,
          "maxz": 69.99,
          "minx": 505941.23,
          "miny": 279553

In the metadata above, note the line `"reader": "readers.copc"`: PDAL opened the generated file with its COPC reader, which confirms that the file was read as a valid COPC file.
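The file header can also be inspected directly, without PDAL. A minimal sketch based on the COPC specification, which requires a LAS 1.4 file whose first VLR starts immediately after the 375-byte header and is the COPC info VLR (user ID `copc`, record ID 1). It checks only this signature, not the octree hierarchy; `is_copc_header` and `is_copc_file` are hypothetical helpers:

```python
import struct

LAS_14_HEADER_SIZE = 375  # size in bytes of the LAS 1.4 public header block
VLR_HEADER_SIZE = 54      # reserved(2) + user_id(16) + record_id(2) + length(2) + description(32)


def is_copc_header(data: bytes) -> bool:
    """Check the leading bytes of a file for the COPC info VLR signature."""
    if len(data) < LAS_14_HEADER_SIZE + VLR_HEADER_SIZE or data[:4] != b"LASF":
        return False
    version = (data[24], data[25])                      # version major, minor
    header_size = struct.unpack_from("<H", data, 94)[0]
    if version != (1, 4) or header_size != LAS_14_HEADER_SIZE:
        return False
    vlr = LAS_14_HEADER_SIZE                            # first VLR follows the header
    user_id = data[vlr + 2:vlr + 18].rstrip(b"\0")      # skip the 2 reserved bytes
    record_id = struct.unpack_from("<H", data, vlr + 18)[0]
    return user_id == b"copc" and record_id == 1


def is_copc_file(path) -> bool:
    """Read just enough of `path` to run is_copc_header on it."""
    with open(path, "rb") as f:
        return is_copc_header(f.read(LAS_14_HEADER_SIZE + VLR_HEADER_SIZE))
```

For this notebook, `is_copc_file(copc_filename)` should return `True`, and `is_copc_file(las_filename)` should return `False` for the original LAS input.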