This workflow follows two tutorials by Martin Isenburg on lidar [pre-processing](https://rapidlasso.com/2013/10/13/tutorial-lidar-preparation/) and [information extraction](https://rapidlasso.com/2013/10/20/tutorial-derivative-production/). It assumes that your lidar data are delivered as tiles and that ground returns are already classified.

In [28]:
import os, shutil, glob
import geopandas as gpd, pandas as pd
import rasterio
from matplotlib import pyplot as plt
from pyFIRS.utils import lastools, fusion
import ipyparallel as ipp

In [23]:
las = lastools.useLAStools('/storage/lidar/LAStools/bin/')
fus = fusion.useFUSION('/storage/lidar/FUSION/')

## Specify some key parameters for the processing pipeline

In [3]:
# where the raw lidar data is currently stored
raw_tiles = '/storage/lidar/Swinomish_Lidar_2016/source/*.laz'
workdir = os.path.abspath('/storage/lidar/Swinomish_Lidar_2016')

num_cores=32 # how many cores to use for parallel processing

Take a look at the format of the lidar data provided by the vendor.

In [4]:
vendor_tiles = glob.glob(raw_tiles)
las.lasinfo(i=vendor_tiles[0], echo=True);


lasinfo (180812) report for '/storage/lidar/Swinomish_Lidar_2016/source/q48122D5415.laz'
reporting all LAS header entries:
  file signature:             'LASF'
  file source ID:             0
  global_encoding:            1
  project ID GUID data 1-4:   00000000-0000-0000-6557-747300004157
  version major.minor:        1.2
  system identifier:          'Quantum Spatial'
  generating software:        'LasMonkey 2.2.6'
  file creation day/year:     88/2017
  header size:                227
  offset to point data:       2392
  number var. length records: 3
  point data format:          1
  point data record length:   33
  number of point records:    36190794
  number of points by return: 17005960 12142658 5370083 1418786 228637
  scale factor x y z:         0.01 0.01 0.01
  offset x y z:               0 0 0
  min x y z:                  1151361.60 1124700.02 -253.61
  max x y z:                  1154513.99 1129337.94 380.54
variable length header record 1 of 3:
  re
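The header report above includes the tile's bounding box, which is a quick way to gauge tile dimensions in the data's native units before choosing a tile size and buffer. The following is a minimal sketch that parses the `min x y z` and `max x y z` lines from the report text; `parse_xyz` is a hypothetical helper, not part of `pyFIRS` or LAStools, and it assumes the report has been captured as a string.

```python
import re

# the two bounding-box lines copied from the lasinfo report above
report = """\
  min x y z:                  1151361.60 1124700.02 -253.61
  max x y z:                  1154513.99 1129337.94 380.54
"""

def parse_xyz(report_text, label):
    """Return the three numbers that follow `label` in a lasinfo report (hypothetical helper)."""
    pattern = r'^\s*' + re.escape(label) + r'\s+(\S+)\s+(\S+)\s+(\S+)'
    match = re.search(pattern, report_text, re.MULTILINE)
    if match is None:
        raise ValueError('label not found: {}'.format(label))
    return tuple(float(value) for value in match.groups())

min_x, min_y, min_z = parse_xyz(report, 'min x y z:')
max_x, max_y, max_z = parse_xyz(report, 'max x y z:')

# tile extent in the native horizontal units of the data
width, height = max_x - min_x, max_y - min_y
print('tile extent: {:.2f} x {:.2f}'.format(width, height))
```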

## Set up the workspace 

In [5]:
# define data handling directories
raw, interim, processed = os.path.join(workdir,'raw'), os.path.join(workdir,'interim'), os.path.join(workdir,'processed')
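The cell above only builds path strings. Whether the LAStools wrappers create missing output directories is not stated here, so it may be safer to create them up front. A minimal sketch, assuming the same three sub-directory names; `make_workspace` is a hypothetical helper and the demo runs in a temporary directory rather than the real `workdir`.

```python
import os
import tempfile

def make_workspace(workdir, subdirs=('raw', 'interim', 'processed')):
    """Create each data-handling directory under workdir and return their paths."""
    paths = [os.path.join(workdir, name) for name in subdirs]
    for path in paths:
        os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return paths

# demonstrate on a throwaway directory instead of the real working directory
demo_root = tempfile.mkdtemp()
raw_demo, interim_demo, processed_demo = make_workspace(demo_root)
print(raw_demo, interim_demo, processed_demo)
```

Calling `make_workspace(workdir)` a second time is harmless because of `exist_ok=True`.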

## 1. Get the raw data into our working directory
First, move the tiles over to our working directory.

In [6]:
%%time
las.las2las(i=raw_tiles,
            odir=raw,
            drop_withheld=True, # drop any points flagged as withheld by vendor
            drop_class=(7,18), # drop any points classified as noise by vendor
            olaz=True,
            cores=num_cores)
print('Done moving tiles into working directory.')

Done moving tiles into working directory.
CPU times: user 12 ms, sys: 8 ms, total: 20 ms
Wall time: 2min 7s


Next, create spatial indexes for the input files to allow fast spatial queries (which we'll use for adding buffers).

In [7]:
%%time
infiles = os.path.join(raw,'*.laz')

las.lasindex(i=infiles, 
             cores=num_cores)

print("Done adding spatial indexes.")

Done adding spatial indexes.
CPU times: user 12 ms, sys: 4 ms, total: 16 ms
Wall time: 1min 7s


Then, we'll retile the data to add buffers for avoiding edge effects during processing.

**THERE ARE ARGUMENTS IN THE FOLLOWING COMMAND THAT DEPEND UPON THE UNITS OF THE DATA.**

If your data are in meters, change these parameters, or consider reprojecting the data to a foot-based projection when you copy the source data into the working directory with the `las2las` command at the top of this notebook.

In [8]:
%%time
infiles = os.path.join(raw, '*.laz')
odir = os.path.join(interim, 'retiled')

las.lastile(i=infiles,
            tile_size=3500, # assumes units are in feet... if using meters, change to 1000 or comment out
            buffer=100, # assumes units are in feet... if using meters, change to 25
            flag_as_withheld=True,
            olaz=True,
            odir=odir,
            cores=num_cores)

new_tiles = glob.glob(os.path.join(odir,'*.laz'))
print('Done retiling and adding buffers. Created {} tiles.'.format(len(new_tiles)))

Done retiling and adding buffers. Created 76 tiles.
CPU times: user 28 ms, sys: 16 ms, total: 44 ms
Wall time: 3min 26s
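The unit-dependent arguments in the retiling command above can be converted with simple arithmetic. This is a sketch under the assumption that the data use the international foot (0.3048 m); the helper names are hypothetical. Note that the metric values suggested in the code comments (1000 and 25) are round numbers rather than exact conversions of 3500 ft and 100 ft.

```python
METERS_PER_FOOT = 0.3048  # international foot; the US survey foot is 1200/3937 m

def feet_to_meters(value_ft):
    """Convert a length in feet to meters."""
    return value_ft * METERS_PER_FOOT

def meters_to_feet(value_m):
    """Convert a length in meters to feet."""
    return value_m / METERS_PER_FOOT

# the foot-based arguments passed to lastile above
params_ft = {'tile_size': 3500, 'buffer': 100}
params_m = {name: round(feet_to_meters(value), 1) for name, value in params_ft.items()}
print(params_m)

# the reverse direction: a half-meter cell size expressed in feet
print(round(meters_to_feet(0.5), 5))
```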


In [None]:
# in case you need to save on storage space
# clean out the raw directory now that you have retiled data to work with
# shutil.rmtree(raw)

## 2. Classify points in the lidar point cloud
Remove noise and identify high vegetation and buildings.

In [9]:
%%time
infiles=os.path.join(interim, 'retiled', '*.laz')
odir = os.path.join(interim, 'denoised')

las.lasnoise(i=infiles, 
             remove_noise=True,
             odir=odir, 
             olaz=True, 
             cores=num_cores) # use parallel processing

print('Done denoising tiles.')

Done denoising tiles.
CPU times: user 20 ms, sys: 16 ms, total: 36 ms
Wall time: 3min 56s


Next, calculate the height above ground for each point, which we'll use when classifying the points.

In [10]:
%%time
infiles=os.path.join(interim, 'denoised', '*.laz')
odir = os.path.join(interim, 'lasheight')

las.lasheight(i=infiles,
              odir=odir, 
              olaz=True, 
              cores=num_cores) # use parallel processing

print('Done calculating height above ground.')

Done calculating height above ground.
CPU times: user 120 ms, sys: 116 ms, total: 236 ms
Wall time: 6min 57s


Now we'll classify points that haven't already been assigned a meaningful class as building or high vegetation, based on criteria for 'planarity' and 'ruggedness'.

**THERE ARE ARGUMENTS IN THE FOLLOWING COMMAND THAT DEPEND UPON THE UNITS OF THE DATA.**

If your data are in meters, change these parameters, or consider reprojecting the data to a foot-based projection when you copy the source data into the working directory with the `las2las` command at the top of this notebook.

In [11]:
%%time
infiles = os.path.join(interim, 'lasheight', '*.laz')
odir = os.path.join(interim, 'classified')

las.lasclassify(i=infiles,
                odir=odir,
                olaz=True,
                step=5, # if your data are in meters, the LAStools default is 2.0
                planar=0.5, # if your data are in meters, the LAStools default is 0.1
                rugged=1, # if your data are in meters, the LAStools default is 0.4
                ignore_class=(2,9,10,11,13,14,15,16,17), # ignore points already classified meaningfully
                cores=num_cores) # use parallel processing

print('Done classifying lidar tiles.')

Done classifying lidar tiles.
CPU times: user 268 ms, sys: 240 ms, total: 508 ms
Wall time: 23min 3s


Finally, remove buffer points from the classified tiles and put the clean tiles in the processed folder.

In [12]:
%%time
infiles = os.path.join(interim, 'classified', '*.laz')
odir = os.path.join(processed, 'points')

las.las2las(i=infiles,
            odir=odir,
            olaz=True,
            drop_withheld=True, # remove points in tile buffers that were flagged as withheld by lastile
            set_user_data=0, # remove height above ground calculated using lasheight
            cores=num_cores)

print('Done trimming classified lidar tiles.')

Done trimming classified lidar tiles.
CPU times: user 12 ms, sys: 8 ms, total: 20 ms
Wall time: 2min 16s


Produce a shapefile showing the layout of the tiles of points.

In [13]:
%%time
infiles = os.path.join(processed, 'points', '*.laz')
odir = os.path.join(processed, 'vectors')

las.lasboundary(i=infiles,
                odir=odir,
                o='tiles.shp',
                oshp=True,
                use_bb=True, # use bounding box of tiles
                overview=True,
                labels=True,
                cores=num_cores) # use parallel processing

print('Produced a shapefile overview of clean tile boundaries.')

Produced a shapefile overview of clean tile boundaries.
CPU times: user 8 ms, sys: 8 ms, total: 16 ms
Wall time: 4.8 s


In [24]:
# %%time
# remove intermediate lidar files if you want to reclaim storage space
# shutil.rmtree(os.path.join(interim, 'retiled'))
# shutil.rmtree(os.path.join(interim, 'denoised'))
# shutil.rmtree(os.path.join(interim, 'lasheight'))

## 3. Generate a bare earth Digital Elevation Model
Generate tiles of the bare earth model. This assumes that the point cloud already contains ground-classified points.

In [14]:
%%time
infiles = os.path.join(interim, 'classified', '*.laz')
odir = os.path.join(processed, 'rasters', 'DEM_tiles')

las.las2dem(i=infiles,
            odir=odir,
            obil=True, # create tiles as .bil rasters
            keep_class=2, # keep ground-classified returns only
            thin_with_grid=1, # use a 1 x 1 resolution for creating the TIN for the DEM
            extra_pass=True, # uses two passes over data to execute DEM creation more efficiently
            use_tile_bb=True, # remove buffers from tiles
            cores=num_cores) 

print('Done producing bare earth tiles.')

Done producing bare earth tiles.
CPU times: user 28 ms, sys: 8 ms, total: 36 ms
Wall time: 3min 48s


Merge the bare earth tiles into a single DEM formatted as a GeoTiff.

In [15]:
%%time
infiles = os.path.join(processed, 'rasters', 'DEM_tiles', '*.bil')
outfile = os.path.join(processed, 'rasters', 'dem.tif')

las.lasgrid(i=infiles,
            merged=True,
            o=outfile)

print('Done producing merged DEM GeoTiff.')

Done producing merged DEM GeoTiff.
CPU times: user 4 ms, sys: 8 ms, total: 12 ms
Wall time: 3min 31s


To create a hillshade layer, we'll first generate hillshade tiles from the bare earth model.

In [16]:
%%time
infiles = os.path.join(interim, 'classified', '*.laz')
odir = os.path.join(processed, 'rasters', 'hillshade_tiles')

las.las2dem(i=infiles,
            odir=odir,
            obil=True, # create tiles as .bil rasters
            cores=num_cores,
            hillshade=True,
            keep_class=2, # keep ground-classified returns only
            thin_with_grid=1, # use a 1 x 1 resolution for creating the TIN for the DEM
            extra_pass=True, # uses two passes over data to execute DEM creation more efficiently
            use_tile_bb=True) # remove buffers from tiles

print('Done producing hillshade bare earth tiles.')

Done producing hillshade bare earth tiles.
CPU times: user 24 ms, sys: 16 ms, total: 40 ms
Wall time: 3min 43s


Now merge the hillshade tiles into a single raster formatted as GeoTiff.

In [17]:
%%time
infiles = os.path.join(processed, 'rasters', 'hillshade_tiles', '*.bil')
outfile = os.path.join(processed, 'rasters', 'hillshade.tif')

las.lasgrid(i=infiles,
            merged=True,
            o=outfile)

print('Done producing merged hillshade GeoTiff.')

Done producing merged hillshade GeoTiff.
CPU times: user 4 ms, sys: 12 ms, total: 16 ms
Wall time: 2min 18s


## 4. Identify building footprints
First, generate shapefiles of the building boundaries in each buffered tile.

In [18]:
%%time
infiles = os.path.join(interim, 'classified', '*.laz')
odir = os.path.join(interim, 'building_tiles')

las.lasboundary(i=infiles,
                odir=odir,
                keep_class=6, # use only building-classified points
                disjoint=True, # compute separate polygons for each building
                concavity=3, # map concave boundary if edge length >= 3ft
                cores=num_cores)

print('Done producing building footprints in buffered tiles.')

Done producing building footprints in buffered tiles.
CPU times: user 12 ms, sys: 16 ms, total: 28 ms
Wall time: 1min 15s


Generate shapefiles showing the bounding box of each (unbuffered) tile that we'll use to remove buildings that fall in the buffered area.

In [19]:
%%time
infiles = os.path.join(processed, 'points', '*.laz')
odir = os.path.join(interim, 'tile_boundaries')

las.lasboundary(i=infiles,
                odir=odir,
                oshp=True,
                use_tile_bb=True,
                cores=num_cores)

print('Done producing boundaries of unbuffered tiles.')

Done producing boundaries of unbuffered tiles.
CPU times: user 0 ns, sys: 8 ms, total: 8 ms
Wall time: 5.74 s


For each shapefile containing polygons of the building boundaries, we'll use the `clean_tile` function to remove polygons from a tile if their centroid falls in the buffered area of the tile.

In [20]:
%%time
building_tiles = glob.glob(os.path.join(interim, 'building_tiles', '*.shp'))
odir = os.path.join(processed, 'vectors', 'building_tiles')

for poly_shp in building_tiles:
    fname = os.path.basename(poly_shp)
    tile_shp = os.path.join(interim, 'tile_boundaries', fname)
    lastools.clean_tile(poly_shp, tile_shp, odir, simp_tol=3, simp_topol=True)

print('Done producing building footprints in cleaned (unbuffered) tiles.')

Done producing building footprints in cleaned (unbuffered) tiles.
CPU times: user 11.4 s, sys: 4 ms, total: 11.4 s
Wall time: 12.1 s
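The centroid rule described above can be illustrated without any GIS libraries. This is only a sketch of the idea, not the implementation of `lastools.clean_tile`: it assumes each polygon is a simple ring of `(x, y)` tuples with non-zero area and that the unbuffered tile is an axis-aligned bounding box. All function names are hypothetical.

```python
def polygon_centroid(coords):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...] (shoelace formula)."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(coords)
    for i in range(n):
        x0, y0 = coords[i]
        x1, y1 = coords[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)

def keep_polygons_in_tile(polygons, tile_bbox):
    """Keep only polygons whose centroid lies inside tile_bbox = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = tile_bbox
    kept = []
    for coords in polygons:
        cx, cy = polygon_centroid(coords)
        if xmin <= cx <= xmax and ymin <= cy <= ymax:
            kept.append(coords)
    return kept

tile_bbox = (0, 0, 100, 100)                              # unbuffered tile extent
inside = [(10, 10), (20, 10), (20, 20), (10, 20)]         # centroid well inside the tile
straddling = [(95, 40), (115, 40), (115, 60), (95, 60)]   # centroid falls in the buffer
kept = keep_polygons_in_tile([inside, straddling], tile_bbox)
print(len(kept))
```

Because adjacent buffered tiles both see a building that straddles their shared edge, assigning it to the tile that contains its centroid is one way to avoid writing the same footprint twice.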


Merge the cleaned tiles of building footprints together into a single shapefile. We'll use geopandas to concatenate all the polygons together into a single geodataframe and then write out to a new shapefile.

In [21]:
%%time
building_tiles = glob.glob(os.path.join(processed, 'vectors', 'building_tiles', '*.shp'))
# create a list of geodataframes containing the tiles of building footprints
gdflist = [gpd.read_file(tile) for tile in building_tiles]
# merge them all together
merged = gpd.GeoDataFrame(pd.concat(gdflist, ignore_index=True))
# using pandas concat caused us to lose projection information, so let's add that back in
merged.crs = gdflist[0].crs
# and write the merged data to a new shapefile
merged.to_file(os.path.join(processed,'vectors','buildings.shp'))

print('Done merging tiles of building footprints into a single shapefile.')

Done merging tiles of building footprints into a single shapefile.
CPU times: user 4.29 s, sys: 12 ms, total: 4.3 s
Wall time: 4.32 s


## 5. Create a Canopy Height Model
`pyFIRS` includes a method for generating pit-free Canopy Height Models, `las.pitfree(...)`, which follows the approach laid out by Martin Isenburg in this [blog post](https://rapidlasso.com/2014/11/04/rasterizing-perfect-canopy-height-models-from-lidar/). However, that approach is very time-consuming and doesn't (yet) scale well to a whole lidar acquisition, so we take a simpler route here.

We'll first normalize the point clouds to have z values represent height above ground.

In [24]:
%%time
infiles = os.path.join(interim, 'classified', '*.laz')
odir = os.path.join(interim, 'normalized_chm')

las.lasheight(i=infiles,
              odir=odir,
              olaz=True,
              replace_z=True,
              keep_class=(1,2,3,4,5), # keep only ground, unclassified, and vegetation points
              cores=num_cores)

print('Done normalizing tiles.')

Done normalizing tiles.
CPU times: user 52 ms, sys: 24 ms, total: 76 ms
Wall time: 6min 58s


We'll borrow from the pit free canopy modeling workflow logic to 'splat' the lidar points before processing them into a surface model.

**THERE ARE ARGUMENTS IN THE FOLLOWING COMMAND THAT DEPEND UPON THE UNITS OF THE DATA.**

If your data are in meters, change these parameters, or consider reprojecting the data to a foot-based projection when you copy the source data into the working directory with the `las2las` command at the top of this notebook.

In [25]:
%%time
infiles = os.path.join(interim, 'normalized_chm', '*.laz')
odir = os.path.join(interim, 'splatted_chm')

las.lasthin(i=infiles,
            odir=odir,
            olaz=True,
            highest=True,
            subcircle=0.3, # if your data are in meters, the LAStools default is 0.1
            cores=num_cores)

print('Done splatting.')

Done splatting.
CPU times: user 44 ms, sys: 12 ms, total: 56 ms
Wall time: 3min 41s


We'll use a median filter over the highest hits to generate a Canopy Height Model using the `canopymodel` command line tool from the FUSION software package. Because FUSION doesn't offer multi-core options for its command line tools, we'll use `ipyparallel` to process the jobs asynchronously.

In [33]:
rc = ipp.Client() # a client for controlling the workers
print(rc.ids) # how many workers?

dv = rc[:] # create a direct view of the workers
v = rc.load_balanced_view() # create a load-balanced view of the workers

# import the relevant packages to all the workers
with dv.sync_imports():
    import subprocess
    import os
    import pyFIRS
    from pyFIRS.utils import fusion
    
#%px fus = fusion.useFUSION('/storage/lidar/FUSION') # let the workers know how to call FUSION tools

            Controller appears to be listening on localhost, but not on this machine.
            If this is true, you should specify Client(...,sshserver='you@Ford')
            or instruct your controller to listen on an external IP.


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
importing subprocess on engine(s)
importing os on engine(s)
importing pyFIRS on engine(s)
importing fusion from pyFIRS.utils on engine(s)


CompositeError: one or more exceptions from call to method: remote_import
[0:apply]: ModuleNotFoundError: No module named 'pyFIRS.utils'
[1:apply]: ModuleNotFoundError: No module named 'pyFIRS.utils'
[2:apply]: ModuleNotFoundError: No module named 'pyFIRS.utils'
[3:apply]: ModuleNotFoundError: No module named 'pyFIRS.utils'
.... 28 more exceptions ...

In [26]:
def make_chm(infile):
    "A wrapper function to execute canopymodel on an input file using canned parameters."
    basename = os.path.basename(infile).split('.')[0]
    odir = '/storage/lidar/Swinomish_Lidar_2016/interim/chm'
    outfile = os.path.join(odir,basename + '.dtm')
    
    return fus.canopymodel(surfacefile=outfile,
                           cellsize=1.64042, # half-meter resolution
                           xyunits='F',
                           zunits='F',
                           coordsys=2,
                           zone=0,
                           horizdatum=2,
                           vertdatum=2,
                           datafiles=infile,
                           median=3,
                           peaks=True)
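A wrapper like the one above is meant to be mapped over the tiles, one job per file. The sketch below shows that dispatch pattern with `concurrent.futures` from the standard library as a stand-in for `ipyparallel`, so that it runs without a cluster. `fake_chm` is a hypothetical placeholder that only derives the output file name instead of calling FUSION, and the tile paths are made up for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def fake_chm(infile):
    """Hypothetical stand-in for make_chm: returns the output name instead of running FUSION."""
    basename = os.path.basename(infile).split('.')[0]
    return basename + '.dtm'

# hypothetical input tiles; in the notebook these would come from glob
tiles = ['/tmp/splatted_chm/tile_a.laz', '/tmp/splatted_chm/tile_b.laz']

# submit one job per tile; map returns results in the same order as the inputs
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fake_chm, tiles))

print(results)
```

With a working `ipyparallel` cluster, the analogous call on the load-balanced view created earlier would presumably be `v.map_async(make_chm, infiles)`, provided the engines can import `pyFIRS` and have `fus` defined.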

In [27]:
fus.canopymodel?