**This Notebook requires a Bash Kernel**

# Point Data Abstraction Library (PDAL) 

Least importantly, [PDAL](https://pdal.io) is pronounced 'Poo-dal' by its developers, [Hobu Inc.](https://hobu.co/), in homage to [GDAL](http://www.gdal.org/) (which its creator pronounces 'Goo-dal').

Most importantly, PDAL is a fast, scalable, and free software library for handling point cloud data!

## Step 1: Get PDAL

We are going to use Docker because we are working on potentially dozens of remote machines with various software stacks. Docker levels the playing field and allows us to install software without worrying about it breaking. If we need to run on HPC, we can use Singularity to convert the Docker containers.
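As a rough sketch of the HPC route, assuming Singularity 3.x is installed on the cluster, the Docker image could be converted and used as shown here. The output file name `pdal.sif` and the `run` helper are illustrative and not part of this notebook. The helper only prints each command unless `DRY_RUN=0` is set, so the cell is safe to run on a machine without Singularity:

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Convert the Docker Hub image into a local Singularity image file
# (pdal.sif is a hypothetical output name).
run singularity pull pdal.sif docker://pdal/pdal:latest

# Run PDAL from the converted image.
run singularity exec pdal.sif pdal --version
```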

### Docker

If you haven't yet, install Docker on the VM using the `ezd` command. Then follow the subsequent instructions for removing the need for `sudo`, and restart the VM.

Pull the [PDAL Docker](https://hub.docker.com/r/pdal/pdal/) image from its [Docker Hub](https://hub.docker.com/) location:

In [None]:
docker pull pdal/pdal:latest
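To confirm that the image works, and to avoid retyping the `docker run` prefix, a small wrapper function can help. This is only a sketch: the function name `pdal_docker` is made up for illustration, and it mounts the current directory at `/data`, matching the mount point used in the cells that follow.

```shell
# Hypothetical convenience wrapper: run `pdal` inside the pdal/pdal container
# with the current directory mounted at /data.
# --rm removes the container once the command exits.
pdal_docker() {
    docker run --rm -v "$PWD":/data pdal/pdal pdal "$@"
}

# Only call Docker if it is installed on this machine.
if command -v docker >/dev/null 2>&1; then
    pdal_docker --version || echo "docker is installed but the container did not run"
else
    echo "docker not found; install it before continuing"
fi
```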

## Step 2: Get Some Data

At this point you're ready to do some analyses with PDAL.

Next, we need to pull some data onto this virtual instance.

From a console, initiate the iRODS connection using the `iinit` command and respond to the prompts as follows:

|field|response|
|-----|--------|
|host name (DNS)|data.cyverse.org|
|port number|1247|
|user name|$USER|
|irods zone|iplant|
|irods password|type password|

Now you're ready to pull data from the CyVerse data store onto the virtual instance.

Create a new scratch directory for the lidar data: 

In [None]:
mkdir -p /scratch/NEON_data_institute_2018/2017_Campaign/TEAK/L1/DiscreteLidar/

Pull the 2017 TEAK classified lidar data from the Community Folder on the CyVerse data store using the `iget` command:

In [None]:
iget -Pbrvf /iplant/home/shared/NEON_data_institute_2018/2017_Campaign/TEAK/L1/DiscreteLidar/ClassifiedPointCloud /scratch/NEON_data_institute_2018/2017_Campaign/TEAK/L1/DiscreteLidar/
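Before moving on, it can be worth checking that the transfer produced what you expect. The helper below is a sketch (the function name `count_tiles` is illustrative); it counts the `.laz` files directly inside a directory and reports the directory's size on disk.

```shell
# Count the .laz tiles directly inside a directory (hypothetical helper).
count_tiles() {
    find "$1" -maxdepth 1 -type f -name '*.laz' | wc -l | tr -d ' '
}

tile_dir=/scratch/NEON_data_institute_2018/2017_Campaign/TEAK/L1/DiscreteLidar/ClassifiedPointCloud
if [ -d "$tile_dir" ]; then
    echo "$(count_tiles "$tile_dir") tiles in $tile_dir"
    du -sh "$tile_dir"   # total size on disk
else
    echo "$tile_dir does not exist yet; run the iget cell first"
fi
```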

Print the point at index 1 of a single tile with `pdal info` to confirm that the file is readable:

In [None]:
docker run \
    -v /scratch/NEON_data_institute_2018/2017_Campaign/TEAK/L1/DiscreteLidar/ClassifiedPointCloud:/data \
    pdal/pdal pdal info \
    data/NEON_D17_TEAK_DP1_313000_4097000_classified_point_cloud.laz \
    -p 1 

Find the boundary of the tile:

In [None]:
docker run \
    -v /scratch/NEON_data_institute_2018/2017_Campaign/TEAK/L1/DiscreteLidar/ClassifiedPointCloud:/data \
    pdal/pdal pdal info \
    data/NEON_D17_TEAK_DP1_313000_4097000_classified_point_cloud.laz \
    --boundary

# Entwine and Greyhound 

[Entwine](https://entwine.io/) and [Greyhound](https://greyhound.io/) are created and owned by Hobu Inc. 

Entwine builds an indexed directory of point cloud tiles for viewing lidar data in your browser. Greyhound serves these tiles.

Here I'm going to launch the Greyhound server on my localhost, which will allow me to view Entwine output in my browser. First, pull both images:

In [None]:
docker pull connormanning/greyhound:latest

In [None]:
docker pull connormanning/entwine:latest

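Next, I'm going to start the Greyhound server, mounting `~/entwine` at `/opt/data` so that it can serve the Entwine output. With `-it` the server stays in the foreground until it is interrupted with Ctrl-C; use `-d` instead to run it in the background.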

In [4]:
docker run -it -v $HOME/entwine:/opt/data -p 8080:8080 connormanning/greyhound

Using default config
Settings:
	Cache: 209715200 bytes
	Threads: 4
	Resource timeout: 2 minutes
	Tmp dir: /tmp
Paths:
	/greyhound
	~/greyhound
	/entwine
	~/entwine
	/opt/data
Headers:
	Access-Control-Allow-Methods: GET,PUT,POST,DELETE
	Access-Control-Allow-Origin: *
	Cache-Control: public, max-age=300
Writes NOT allowed
Static serve:
	/usr/include/greyhound/public
Listening:
	HTTP: 8080
^C

Process a single TEAK tile and write it to a directory that can be read by Greyhound. Your home directory should be readable, so the output goes to `~/entwine`.

I'm also going to pass the `web-mercator.json` file from the `/pdal` directory of the Git repo, which is meant to reproject the data to Web Mercator.

In [6]:
docker run -it \
    -v /scratch/neon_data_science/pdal:/pdal_files \
    -v /scratch/NEON_data_institute_2018/2017_Campaign/TEAK/L1/DiscreteLidar/ClassifiedPointCloud:/input \
    -v ~/entwine:/output \
    connormanning/entwine build \
    pdal_files/web-mercator.json \
    -i input/NEON_D17_TEAK_DP1_313000_4097000_classified_point_cloud.laz \
    -o output/teak-test

Scanning input
1 / 2: pdal_files/web-mercator.json
2 / 2: input/NEON_D17_TEAK_DP1_313000_4097000_classified_point_cloud.laz

Version: 2.0.0
Input:
	Files: 2
	Total points: 4,608,681
	Density estimate (per square unit): 5.15719
	Threads: [3, 5]
Output:
	Path: output/teak-test/
	Data type: laszip
	Hierarchy type: json
	Hierarchy step: auto
	Sleep count: 2097152
	Scale: 0.01
	Offset: (313553, 4097500, 1887)
Metadata:
	Native bounds: [(313106, 4096999, 938), (314000, 4098000, 2836)]
	Cubic bounds: [(312603, 4096550, 937), (314503, 4098450, 2837)]
	Scaled cube: [(-95000, -95000, -95000), (95000, 95000, 95000)]
	Reprojection: (none)
	Storing dimensions: [
                X:int32, Y:int32, Z:int32, Intensity:uint16, ReturnNumber:uint8,
                NumberOfReturns:uint8, ScanDirectionFlag:uint8,
                EdgeOfFlightLine:uint8, Classification:uint8, ScanAngleRank:float,
                UserData:uint8, PointSourceId:uint16, GpsTime:double, OriginId:uint32
	]
Build parameters:
	Ticks:

View the outputs here: http://potree.entwine.io/data/custom.html?s=localhost:8080&r=teak-test
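If the viewer shows nothing, a quick check from the console can tell whether Greyhound is actually serving the resource. This sketch assumes Greyhound's `/resource/<name>/info` route and the `teak-test` output name used above; it reports the result rather than failing when the server is not running.

```shell
# Resource name matches the `-o output/teak-test` directory built by Entwine.
resource=teak-test
info_url="http://localhost:8080/resource/${resource}/info"

# -f: treat HTTP errors as failures, -s: silent,
# --max-time: give up after 5 seconds.
if curl -fs --max-time 5 "$info_url" >/dev/null 2>&1; then
    echo "Greyhound is serving ${resource}"
else
    echo "No response from ${info_url}; is the Greyhound container running?"
fi
```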

We can see that there are a lot of outliers in these data. We want to filter them out using PDAL's outlier tools, so first create a directory for the filtered tiles:

In [7]:
mkdir -p /scratch/NEON_data_institute_2018/2017_Campaign/TEAK/FilteredClassifiedPointCloud

Now, I'm going to run the filter on each tile in its own Docker container, using the `-P` flag for `xargs` to run several jobs in parallel (here `-P4` runs four at a time; match this to the number of cores on the machine).

I use PDAL's `pipeline` command with a JSON file called `outlier.json` to filter the outliers from the individual tiles.
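The contents of `outlier.json` are not shown in this notebook. As an illustration only, a pipeline of this kind could look like the sketch below, which flags statistical outliers with `filters.outlier` (PDAL assigns them class 7 by default) and then drops them with `filters.range`. The file name `outlier-example.json`, the placeholder input and output names, and the `mean_k` and `multiplier` values are assumptions, not the settings used for these results.

```shell
# Write an illustrative pipeline to a scratch location (hypothetical file name).
pipeline_file="${TMPDIR:-/tmp}/outlier-example.json"
cat > "$pipeline_file" <<'EOF'
{
  "pipeline": [
    { "type": "readers.las", "filename": "input.laz" },
    { "type": "filters.outlier", "method": "statistical", "mean_k": 8, "multiplier": 3.0 },
    { "type": "filters.range", "limits": "Classification![7:7]" },
    { "type": "writers.las", "compression": "laszip", "filename": "output.laz" }
  ]
}
EOF

# Check that the file is well-formed JSON before handing it to PDAL.
python3 -m json.tool "$pipeline_file" >/dev/null && echo "valid JSON: $pipeline_file"
```

The placeholder `filename` values are meant to be overridden on the command line with `--readers.las.filename` and `--writers.las.filename`, which is why the pipeline needs a `readers.las` and a `writers.las` stage.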

In [None]:
cd /scratch/NEON_data_institute_2018/2017_Campaign/TEAK/L1/DiscreteLidar/ClassifiedPointCloud
ls *.laz | cut -d. -f1 | xargs -P4 -I{} \
docker run \
    -v /scratch/neon_data_science/pdal:/pdal_files \
    -v /scratch/NEON_data_institute_2018/2017_Campaign/TEAK/L1/DiscreteLidar/ClassifiedPointCloud:/data \
    -v /scratch/NEON_data_institute_2018/2017_Campaign/TEAK/FilteredClassifiedPointCloud:/filtered \
    pdal/pdal pdal \
    pipeline pdal_files/outlier.json \
    --readers.las.filename=/data/{}.laz \
    --writers.las.filename=/filtered/{}.laz
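Because each tile launches its own container, it can help to rehearse the fan-out before committing real compute time. The sketch below reproduces the same `ls | cut | xargs` pattern on empty placeholder files in a temporary directory and prints the command each job would run instead of calling Docker; the tile names are made up. It also reads the core count with `nproc` (falling back to 4) rather than hard-coding `-P4`.

```shell
# Create a throwaway directory with placeholder tile names (not real data).
demo_dir=$(mktemp -d)
touch "$demo_dir/tile_a.laz" "$demo_dir/tile_b.laz" "$demo_dir/tile_c.laz"

# Number of parallel jobs: one per core, or 4 if nproc is unavailable.
jobs=$(nproc 2>/dev/null || echo 4)

# Same pattern as the real cell, but `echo` prints each command instead of
# running it. Parallel jobs finish in any order, so sort the result.
planned=$(cd "$demo_dir" && ls *.laz | cut -d. -f1 | xargs -P"$jobs" -I{} \
    echo pdal pipeline pdal_files/outlier.json \
        --readers.las.filename=/data/{}.laz \
        --writers.las.filename=/filtered/{}.laz | sort)
echo "$planned"

rm -rf "$demo_dir"
```

Note that `cut -d. -f1` keeps only the text before the first dot, so this pattern relies on tile names containing no dots other than the one before the extension.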



After the collection has been filtered for outliers, we will run Entwine on the entire directory:

In [None]:
docker run -it -v $HOME:$HOME \
    -v /scratch/neon_data_science/pdal:/pdal_files \
    -v /scratch/NEON_data_institute_2018/2017_Campaign/TEAK/FilteredClassifiedPointCloud:/data \
    connormanning/entwine build \
    pdal_files/web-mercator.json \
    -i data/ \
    -o ~/entwine/teak

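To build a single unfiltered tile without the reprojection file, mount `~/entwine` into the container so that the output persists on the host:

In [None]:
docker run -it \
    -v /scratch/NEON_data_institute_2018/2017_Campaign/TEAK/L1/DiscreteLidar/ClassifiedPointCloud:/data \
    -v ~/entwine:/output \
    connormanning/entwine build \
    -i /data/NEON_D17_TEAK_DP1_313000_4097000_classified_point_cloud.laz \
    -o /output/teak-test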

The filtered SRER tiles can also be converted for Potree with PotreeConverter. This cell assumes that a `potreeconverter` image is available locally and that the SRER paths shown exist on your machine:

In [None]:
docker run \
-v /vol_c/SRER/L1/DiscreteLidar/FilteredClassifiedPointCloud/:/input \
-v /home/tswetnam/potree/pointclouds/:/output \
potreeconverter PotreeConverter /input \
--overwrite \
-p SRER \
-o /output/SRER \
--title "Santa Rita Experimental Range, Arizona" \
--description "2017 NEON Airborne Observation Platform Discrete Lidar" \
--scale 0.001 \
--output-format LAZ \
--output-attributes INTENSITY \
--projection "+proj=longlat +datum=WGS84 +no_defs"