This workflow follows the framework of two tutorials on lidar [pre-processing](https://rapidlasso.com/2013/10/13/tutorial-lidar-preparation/) and [information extraction](https://rapidlasso.com/2013/10/20/tutorial-derivative-production/) published by Martin Isenburg. It assumes that your lidar data is in tiles and has ground returns already classified. 

In [1]:
import os, shutil, glob, platform, subprocess
import geopandas as gpd, pandas as pd
import rasterio
from matplotlib import pyplot as plt
import ipyparallel as ipp
from pyFIRS.wrappers import lastools, fusion



In [2]:
from pyFIRS.utils import clean_dir, clean_buffer_polys, clip_tile_from_shp, pbar, convert_project, setup_cluster

### Setting up parallel computing using `ipyparallel`
`LAStools` offers native multi-core processing through an optional argument (`cores`) supplied to its command-line tools. `FUSION` command-line tools do not. To enable parallel processing of `FUSION` commands, we'll use `ipyparallel` to process tiles in asynchronous parallel batches, using the `map_async` method on a load-balanced view of the workers in our cluster. This maps a function we define (e.g., one that executes a FUSION or LAStools command-line tool with parameters we specify) to a list of lidar tiles. This approach also lets us track progress with a progress bar.
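The map-a-function-over-tiles pattern is easy to try without a cluster. The sketch below uses the standard library's `concurrent.futures` as a stand-in for `ipyparallel` (an assumption made so the example runs anywhere); `process_tile`, `map_with_progress`, and the tile names are hypothetical placeholders, not part of `pyFIRS`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_tile(tile):
    """Hypothetical stand-in for a wrapped FUSION or LAStools call on one tile."""
    return tile.replace('.laz', '_done.laz')

def map_with_progress(func, items, max_workers=4):
    """Apply func to every item in parallel, reporting progress as tasks finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(func, item): item for item in items}
        for n_done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            print('{}/{} tiles finished'.format(n_done, len(items)))
    # return results in the same order as the inputs
    return [results[item] for item in items]

tiles = ['tile_a.laz', 'tile_b.laz', 'tile_c.laz']  # hypothetical tile names
outputs = map_with_progress(process_tile, tiles)
```

In the notebook itself, `map_async` on the load-balanced view takes the place of the `submit` loop, and the object it returns is what the progress bar polls.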

You'll need to launch a parallel computing cluster. If you have `ipyparallel` installed in the computing environment that was used to launch this notebook (which may be different from the kernel you're using to execute it), you should be able to start a parallel computing cluster by switching to the "IPython Clusters" tab of the "Home" tab that was created when you called `jupyter notebook` from the console. 

![IPython Clusters Tab](https://berkeley-stat159-f17.github.io/stat159-f17/_images/dashboard_clusters_tab_4_0.png)

Once the cluster is up and running, you can execute the `setup_cluster()` helper function from `pyFIRS.utils` to set up a Client, Direct View, and Load-Balanced View of the workers in the cluster.

In [3]:
rc, dv, v = setup_cluster()

            Controller appears to be listening on localhost, but not on this machine.
            If this is true, you should specify Client(...,sshserver='you@Ford')
            or instruct your controller to listen on an external IP.


importing subprocess on engine(s)
importing os on engine(s)
importing lastools from pyFIRS.wrappers on engine(s)
importing fusion from pyFIRS.wrappers on engine(s)
importing rasterio on engine(s)
importing clip_tile_from_shp,convert_project from pyFIRS.utils on engine(s)
Viewing 32 workers in the cluster.


Define where we can find the binary executables for LAStools and FUSION command line tools.

In [4]:
las_bin = '/storage/lidar/LAStools/bin/' # wherever they live
fus_bin = '/storage/lidar/FUSION/' # wherever they live

# instantiate the LAStools and FUSION wrappers provided by pyFIRS
las = lastools.useLAStools(las_bin)
fus = fusion.useFUSION(fus_bin)

In [5]:
# push these paths to the workers as well
dv.push(dict(las_bin=las_bin, fus_bin=fus_bin))

# instantiate the useLAStools and useFUSION wrappers on each of the workers
# using some ipyparallel jupyter magic (%)
%px las = lastools.useLAStools(las_bin)
%px fus = fusion.useFUSION(fus_bin)

### Setting up WINE for parallel executions
If the machine you're processing these data on isn't running Windows, you may want to set up a separate WINE server for each worker in the cluster. The WINE server to use can be specified as a "WINE prefix". We'll push a unique ID to each worker in the cluster identifying its WINE prefix. Methods executed by the `useLAStools` and `useFUSION` classes will check to see if a `wine_prefix` object exists and use it when executing any command line tools.
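For illustration, one common way to hand a WINE prefix to a single child process is through the `env` argument of `subprocess.run`. This is a sketch and not necessarily how `pyFIRS` does it internally; `env_with_wine_prefix` is a hypothetical helper, and a tiny Python child process stands in for a WINE call so the example runs without WINE installed.

```python
import os
import subprocess
import sys

def env_with_wine_prefix(prefix):
    """Copy the current environment and set WINEPREFIX for one child process."""
    env = os.environ.copy()
    env['WINEPREFIX'] = prefix
    return env

# the child process here just echoes the variable it received
child = subprocess.run(
    [sys.executable, '-c', "import os; print(os.environ['WINEPREFIX'])"],
    env=env_with_wine_prefix('/storage/wine/.wine-0'),
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
seen_by_child = child.stdout.decode().strip()
```

Because the variable is set only for that child process, each worker can use its own prefix without affecting the others.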

In [6]:
# push a unique identifier to each worker that we'll use to run separate instances of WINE
prefixes = ['/storage/wine/.wine-{}'.format(x) for x in range(len(rc.ids))]
dv.scatter('wine_prefix', prefixes)
print(dv['wine_prefix'])

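# using some jupyter notebook ipyparallel magic (%), we'll execute the following command on each
# of the workers. Note that `export` runs in a short-lived child shell here, so it does not change
# the worker's own environment; the prefix is passed explicitly as `wine_prefix` in later calls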
%px subprocess.run(['export WINEPREFIX={}'.format(wine_prefix[0])], shell=True, stderr=subprocess.PIPE, stdout=subprocess.PIPE)

[['/storage/wine/.wine-0'], ['/storage/wine/.wine-1'], ['/storage/wine/.wine-2'], ['/storage/wine/.wine-3'], ['/storage/wine/.wine-4'], ['/storage/wine/.wine-5'], ['/storage/wine/.wine-6'], ['/storage/wine/.wine-7'], ['/storage/wine/.wine-8'], ['/storage/wine/.wine-9'], ['/storage/wine/.wine-10'], ['/storage/wine/.wine-11'], ['/storage/wine/.wine-12'], ['/storage/wine/.wine-13'], ['/storage/wine/.wine-14'], ['/storage/wine/.wine-15'], ['/storage/wine/.wine-16'], ['/storage/wine/.wine-17'], ['/storage/wine/.wine-18'], ['/storage/wine/.wine-19'], ['/storage/wine/.wine-20'], ['/storage/wine/.wine-21'], ['/storage/wine/.wine-22'], ['/storage/wine/.wine-23'], ['/storage/wine/.wine-24'], ['/storage/wine/.wine-25'], ['/storage/wine/.wine-26'], ['/storage/wine/.wine-27'], ['/storage/wine/.wine-28'], ['/storage/wine/.wine-29'], ['/storage/wine/.wine-30'], ['/storage/wine/.wine-31']]


Out[0:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-0'], returncode=0, stdout=b'', stderr=b'')
Out[1:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-1'], returncode=0, stdout=b'', stderr=b'')
Out[2:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-2'], returncode=0, stdout=b'', stderr=b'')
Out[3:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-3'], returncode=0, stdout=b'', stderr=b'')
Out[4:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-4'], returncode=0, stdout=b'', stderr=b'')
Out[5:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-5'], returncode=0, stdout=b'', stderr=b'')
Out[6:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-6'], returncode=0, stdout=b'', stderr=b'')
Out[7:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-7'], returncode=0, stdout=b'', stderr=b'')
Out[8:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-8'], returncode=0, stdout=b'', stderr=b'')
Out[9:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-9'], returncode=0, stdout=b'', stderr=b'')
Out[10:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-10'], returncode=0, stdout=b'', stderr=b'')
Out[11:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-11'], returncode=0, stdout=b'', stderr=b'')
Out[12:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-12'], returncode=0, stdout=b'', stderr=b'')
Out[13:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-13'], returncode=0, stdout=b'', stderr=b'')
Out[14:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-14'], returncode=0, stdout=b'', stderr=b'')
Out[15:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-15'], returncode=0, stdout=b'', stderr=b'')
Out[16:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-16'], returncode=0, stdout=b'', stderr=b'')
Out[17:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-17'], returncode=0, stdout=b'', stderr=b'')
Out[18:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-18'], returncode=0, stdout=b'', stderr=b'')
Out[19:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-19'], returncode=0, stdout=b'', stderr=b'')
Out[20:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-20'], returncode=0, stdout=b'', stderr=b'')
Out[21:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-21'], returncode=0, stdout=b'', stderr=b'')
Out[22:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-22'], returncode=0, stdout=b'', stderr=b'')
Out[23:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-23'], returncode=0, stdout=b'', stderr=b'')
Out[24:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-24'], returncode=0, stdout=b'', stderr=b'')
Out[25:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-25'], returncode=0, stdout=b'', stderr=b'')
Out[26:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-26'], returncode=0, stdout=b'', stderr=b'')
Out[27:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-27'], returncode=0, stdout=b'', stderr=b'')
Out[28:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-28'], returncode=0, stdout=b'', stderr=b'')
Out[29:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-29'], returncode=0, stdout=b'', stderr=b'')
Out[30:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-30'], returncode=0, stdout=b'', stderr=b'')
Out[31:3]: CompletedProcess(args=['export WINEPREFIX=/storage/wine/.wine-31'], returncode=0, stdout=b'', stderr=b'')

# Enough already, let's get to work with some lidar data

### Specify some key parameters for the processing pipeline

In [7]:
# where the raw lidar data is currently stored
raw_tiles = '/storage/lidar/Swinomish_Lidar_2016/source/*.laz'
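workdir = os.path.abspath('/storage/lidar/Swinomish_Lidar_2016')

# number of cores to hand to LAStools' native multi-core option (`cores`) in later cells
num_cores = len(rc.ids)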

### Set up the workspace 

In [8]:
# define data handling directories
raw, interim, processed = os.path.join(workdir,'raw'), os.path.join(workdir,'interim'), os.path.join(workdir,'processed')

# push these file locations to the workers as well
dv.push(dict(raw=raw, interim=interim, processed=processed));

## Preview basic info about the lidar data provided by the vendor.

In [9]:
vendor_tiles = glob.glob(raw_tiles)
las.lasinfo(i=vendor_tiles[0], # get info for the first raw tile
            echo=True);


lasinfo (180812) report for '/storage/lidar/Swinomish_Lidar_2016/source/q48122D5415.laz'
reporting all LAS header entries:
  file signature:             'LASF'
  file source ID:             0
  global_encoding:            1
  project ID GUID data 1-4:   00000000-0000-0000-6557-747300004157
  version major.minor:        1.2
  system identifier:          'Quantum Spatial'
  generating software:        'LasMonkey 2.2.6'
  file creation day/year:     88/2017
  header size:                227
  offset to point data:       2392
  number var. length records: 3
  point data format:          1
  point data record length:   33
  number of point records:    36190794
  number of points by return: 17005960 12142658 5370083 1418786 228637
  scale factor x y z:         0.01 0.01 0.01
  offset x y z:               0 0 0
  min x y z:                  1151361.60 1124700.02 -253.61
  max x y z:                  1154513.99 1129337.94 380.54
variable length header record 1 of 3:
  re

## 1. Get the raw data into our working directory
First, move the tiles over to our working directory.

In [10]:
def run_las2las(infile): # the function we'll map to a list of inputs
    return las.las2las(i=infile,
                       odir=raw,
                       drop_withheld=True, # drop any points flagged as withheld by vendor
                       drop_class=(7,18), # drop any points classified as noise by vendor
                       olaz=True,
                       wine_prefix=wine_prefix[0])

In [11]:
res = v.map_async(run_las2las, vendor_tiles)
pbar(res)
print('Done moving tiles into working directory.')

HBox(children=(IntProgress(value=0, description='Progress', max=58), HTML(value='')))


Done moving tiles into working directory.


Next, create spatial indexes for the input files to allow fast spatial queries (which we'll use for adding buffers).

In [12]:
def run_lasindex(infile): # the function we'll map to a list of inputs
    return las.lasindex(i=infile, 
                        wine_prefix=wine_prefix[0])

In [14]:
infiles = glob.glob(os.path.join(raw,'*.laz'))
res = v.map_async(run_lasindex, infiles)
pbar(res)
print("Done adding spatial indexes.")

HBox(children=(IntProgress(value=0, description='Progress', max=58), HTML(value='')))


Done adding spatial indexes.
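`lasindex` writes its index to a `.lax` file alongside each input tile, so a quick check for missing sidecar files can confirm that every tile was indexed. The helper below is a hypothetical sketch (`tiles_missing_index` is not part of `pyFIRS`), demonstrated on empty placeholder files in a temporary folder.

```python
import glob
import os
import tempfile

def tiles_missing_index(tile_dir):
    """Return the .laz tiles in tile_dir that have no matching .lax index file."""
    missing = []
    for tile in sorted(glob.glob(os.path.join(tile_dir, '*.laz'))):
        lax = os.path.splitext(tile)[0] + '.lax'
        if not os.path.exists(lax):
            missing.append(os.path.basename(tile))
    return missing

# demonstrate with empty placeholder files
demo_dir = tempfile.mkdtemp()
for name in ('a.laz', 'a.lax', 'b.laz'):
    open(os.path.join(demo_dir, name), 'w').close()

missing = tiles_missing_index(demo_dir)
```

In this notebook the equivalent call would be `tiles_missing_index(raw)`; an empty list means every tile has an index.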


## 2. Retile the data to add buffers for avoiding edge effects during processing.

**THERE ARE ARGUMENTS IN THE FOLLOWING COMMAND THAT DEPEND UPON THE UNITS OF THE DATA.**

The workflow demonstrated here is working in units of US feet on a dataset in Washington State Plane (South). 
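Since these arguments are in the units of the data, it can help to convert candidate values explicitly. This is a minimal sketch, assuming US survey feet (1 ft = 1200/3937 m); the helper names are hypothetical.

```python
US_SURVEY_FT_PER_M = 3937 / 1200  # exact definition of the US survey foot

def meters_to_feet(meters):
    """Convert a length in meters to US survey feet."""
    return meters * US_SURVEY_FT_PER_M

def feet_to_meters(feet):
    """Convert a length in US survey feet to meters."""
    return feet / US_SURVEY_FT_PER_M

buffer_m = feet_to_meters(100)     # the 100 ft buffer used in this notebook
tile_m = feet_to_meters(4000)      # the 4000 ft tile size used in this notebook
one_meter_ft = meters_to_feet(1)   # a 1 m cell expressed in feet
```

A 100 ft buffer is therefore about 30.5 m, so the 25 m suggested for metric data is a slightly smaller, rounded alternative rather than an exact conversion.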

In [15]:
%%time
las.lastile(i=os.path.join(raw, '*.laz'),
            merged=True,
            tile_size=4000, # in units of lidar data
            buffer=100, # assumes units are in feet... if using meters, change to 25
            flag_as_withheld=True,
            olaz=True,
            odir=os.path.join(interim, 'retiled'),
            cores=len(rc.ids))

CPU times: user 16 ms, sys: 4 ms, total: 20 ms
Wall time: 2min 36s




In [10]:
def run_lastile(infile): # the function we'll map to a list of inputs
    return las.lastile(i=infile,
            tile_size=4000, # in units of lidar data
            buffer=100, # assumes units are in feet... if using meters, change to 25
            flag_as_withheld=True,
            olaz=True,
            odir=os.path.join(interim, 'retiled_v3'),
            wine_prefix=wine_prefix[0])

In [11]:
infiles = glob.glob(os.path.join(interim, 'denoised_raw','*.laz'))
res = v.map_async(run_lastile, infiles)
pbar(res)
print("Done retiling.")

HBox(children=(IntProgress(value=0, description='Progress', max=83), HTML(value='')))


Done retiling.


In [16]:
# in case you need to save on storage space
# clean out the raw directory now that you have retiled data to work with
# shutil.rmtree(raw)

If you ran the `FUSION` `catalog` tool on the retiled data, you should have a new HTML report you can view in your `interim/retiled` directory. Here's what [this report](/storage/lidar/Swinomish_Lidar_2016/interim/retiled/FUSION_catalog.html) looks like.


## 2. Classify points in the lidar point cloud

In [17]:
# Remove noise... except this command seems to hang for some reason on certain tiles...
def run_lasnoise(infile): # the function we'll map to a list of inputs
    return las.lasnoise(i=infile,
                        remove_noise=True,
                        odir=os.path.join(interim, 'denoised'),
                        olaz=True,
                        wine_prefix=wine_prefix[0]) 

In [18]:
infiles = glob.glob(os.path.join(interim, 'retiled', '*.laz'))
res = v.map_async(run_lasnoise, infiles)
pbar(res)
print('Done denoising tiles.')

HBox(children=(IntProgress(value=0, description='Progress', max=62), HTML(value='')))


Done denoising tiles.


Next, calculate the height above ground for each point for use in classifying them.

In [21]:
def run_lasheight(infile):
    return las.lasheight(i=infile,
                         odir=os.path.join(interim, 'lasheight'),
                         olaz=True,
                         wine_prefix=wine_prefix[0]) # run under this worker's WINE prefix

In [22]:
infiles=glob.glob(os.path.join(interim, 'retiled', '*.laz'))
res = v.map_async(run_lasheight, infiles)
pbar(res)
print('Done calculating height above ground.')

HBox(children=(IntProgress(value=0, description='Progress', max=62), HTML(value='')))


Done calculating height above ground.


Now we'll classify points that haven't already been assigned a meaningful class as building or high vegetation, based on whether they meet certain criteria for 'planarity' or 'ruggedness'.

**THERE ARE ARGUMENTS IN THE FOLLOWING COMMAND THAT DEPEND UPON THE UNITS OF THE DATA.**

If your data are in meters, you should change these parameters, or consider reprojecting the data to a projection that is in feet when you copy the source data into the working directory using the `las2las` command near the top of this notebook.

In [None]:
def run_lasclassify(infile):
    return las.lasclassify(i=infile,
                           odir=os.path.join(interim, 'classified'),
                           olaz=True,
                           step=5, # if your data are in meters, the LAStools default is 2.0
                           planar=0.5, # if your data are in meters, the LAStools default is 0.1
                           rugged=1, # if your data are in meters, the LAStools default is 0.4
                           ignore_class=(2,9,10,11,13,14,15,16,17)) # ignore points already classified meaningfully

In [None]:
infiles = glob.glob(os.path.join(interim, 'lasheight', '*.laz'))
res = v.map_async(run_lasclassify, infiles)
pbar(res)
print('Done classifying lidar tiles.')

Finally, remove buffer points from the classified tiles and put the clean tiles in the processed folder.

In [None]:
def dropwithheld(infile):
    return las.las2las(i=infile,
                       odir=os.path.join(processed, 'points'),
                       olaz=True,
                       drop_withheld=True, # remove points in tile buffers that were flagged as withheld with lastile
                       set_user_data=0) # remove height above ground calculated using lasheight

In [None]:
infiles = glob.glob(os.path.join(interim, 'classified', '*.laz'))
res = v.map_async(dropwithheld, infiles)
pbar(res)
print('Done trimming classified lidar tiles.')

Optional -- Use `FUSION`'s `catalog` tool to generate some descriptive reports and an html file you can review.

In [None]:
%%time
fus.catalog(datafile=os.path.join(processed, 'points','*.laz'),
            catalogfile=os.path.join(processed, 'points', 'FUSION_catalog'),
            image=True,
            drawtiles=True,
            coverage=True,
            countreturns=True,
            uselascounts=True,
            # generate an intensity image with resolution of 2.5x2.5 m
            intensity=(67.27444,0,90), # 67 ft2 ~ 2.5x2.5m pixel area, maps intensity to 0-90 range
            # color areas that have return densities below 2 per m2 or above 8 per m2
            density=(269.098,0.1858,0.7432), # 269 ft2 ~ 5x5m pixel area; equivalent to (25, 2, 8) if units were in meters
            # color areas that have first return densities below 1 per m2 or above 6 per m2
            firstdensity=(269.098,0.0929,0.5574)) # equivalent to (25, 1, 6) if units were in meters
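The numbers passed to `catalog` come from converting metric targets into feet, and the arithmetic can be reproduced as follows. This is a sketch assuming the conversion 1 m = 3.28084 ft that appears elsewhere in this notebook; `catalog_args` is a hypothetical helper. Note that 269.098 ft2 works out to 25 m2, i.e. a 5 x 5 m pixel.

```python
FT_PER_M = 3.28084

def catalog_args(pixel_m, min_per_m2, max_per_m2):
    """Convert (pixel side in m, min and max density per m2)
    to (pixel area in ft2, min and max density per ft2)."""
    ft2_per_m2 = FT_PER_M ** 2
    area_ft2 = (pixel_m ** 2) * ft2_per_m2
    return (round(area_ft2, 3),
            round(min_per_m2 / ft2_per_m2, 4),
            round(max_per_m2 / ft2_per_m2, 4))

density_args = catalog_args(5, 2, 8)       # 5 x 5 m pixels, 2 to 8 returns per m2
firstdensity_args = catalog_args(5, 1, 6)  # 5 x 5 m pixels, 1 to 6 first returns per m2
intensity_area = round(2.5 ** 2 * FT_PER_M ** 2, 3)  # 2.5 x 2.5 m pixel area in ft2
```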

Produce a shapefile showing the layout of the tiles of points.

In [None]:
%%time
infiles = os.path.join(processed, 'points', '*.laz')
odir = os.path.join(processed, 'vectors')

las.lasboundary(i=infiles,
                odir=odir,
                o='tiles.shp',
                oshp=True,
                use_bb=True, # use bounding box of tiles
                overview=True,
                labels=True,
                cores=num_cores) # use parallel processing

print('Produced a shapefile overview of clean tile boundaries.')

In [None]:
# %%time
# remove intermediate lidar files if you want to reclaim storage space
# shutil.rmtree(os.path.join(interim, 'retiled'))
# shutil.rmtree(os.path.join(interim, 'denoised'))
# shutil.rmtree(os.path.join(interim, 'lasheight'))

## 3. Generate a bare earth Digital Elevation Model
Generate tiles of the bare earth model. This assumes that the point cloud already contains ground-classified points.

In [None]:
%%time
infiles = os.path.join(interim, 'classified', '*.laz')
odir = os.path.join(processed, 'rasters', 'DEM_tiles')

proc_dem = las.las2dem(i=infiles,
                       odir=odir,
                       otif=True, # create tiles as GeoTiff rasters
                       keep_class=2, # keep ground-classified returns only
                       step=1, # resolution of output raster, in units of lidar data
                       thin_with_grid=1, # use a 1 x 1 resolution for creating the TIN for the DEM
                       extra_pass=True, # uses two passes over data to execute DEM creation more efficiently
                       use_tile_bb=True, # remove buffers from tiles
                       cores=num_cores) 

for file in glob.glob(os.path.join(odir, '*.tif')):
    subprocess.run(['rio', 'edit-info', '--crs', 'EPSG:2286', file],
                   stderr=subprocess.PIPE, stdout=subprocess.PIPE)

print('Done producing bare earth tiles.')

In [None]:
%%time
# get rid of the .tfw and .kml files that LAStools generates
clean_dir(odir, ['.tfw', '.kml'])

Merge the bare earth tiles into a single GeoTiff.

In [None]:
%%time
infiles = os.path.join(processed, 'rasters', 'DEM_tiles', '*.tif')
outfile = os.path.join(processed, 'rasters', 'dem.tif')

proc_merge = subprocess.run(['rio', 'merge', *glob.glob(infiles), outfile, '--co', 'compress=LZW'],
                            stderr=subprocess.PIPE, stdout=subprocess.PIPE)

print('Done producing merged DEM GeoTiff.')
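Because `stdout` and `stderr` are captured in the merge call, a failed merge would pass silently and the "Done" message would still print. One possible safeguard is to check the return code before moving on. The helper below is a hypothetical sketch, demonstrated with small Python child processes rather than `rio` so that it runs without rasterio installed.

```python
import subprocess
import sys

def run_checked(cmd):
    """Run a command, capture its output, and raise with stderr if it fails."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode != 0:
        raise RuntimeError('{} failed:\n{}'.format(cmd[0], proc.stderr.decode()))
    return proc

# a command that succeeds
ok = run_checked([sys.executable, '-c', 'print("merged")'])

# a command that exits with an error message on stderr
try:
    run_checked([sys.executable, '-c', 'import sys; sys.exit("no input files")'])
    failed = False
except RuntimeError as err:
    failed = True
    message = str(err)
```

The same wrapper could be used around the `rio merge` and `rio edit-info` calls in this notebook.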

To create a hillshade layer, we'll first generate hillshade tiles from the bare earth model.

In [None]:
%%time
infiles = os.path.join(interim, 'classified', '*.laz')
odir = os.path.join(processed, 'rasters', 'hillshade_tiles')

las.las2dem(i=infiles,
            odir=odir,
            otif=True, # create tiles as GeoTiffs
            hillshade=True,
            keep_class=2, # keep ground-classified returns only
            step=1, # resolution of output raster, in units of lidar data
            thin_with_grid=1, # use a 1 x 1 resolution for creating the TIN for the DEM
            extra_pass=True, # uses two passes over data to execute DEM creation more efficiently
            use_tile_bb=True, # remove buffers from tiles
            cores=num_cores) 

for file in glob.glob(os.path.join(odir, '*.tif')):
    subprocess.run(['rio', 'edit-info', '--crs', 'EPSG:2286', file],
                   stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    
print('Done producing hillshade bare earth tiles.')

In [None]:
%%time
# get rid of the .tfw and .kml files that LAStools generates
clean_dir(odir, ['.tfw', '.kml'])

Now merge the hillshade tiles into a single raster formatted as GeoTiff.

In [None]:
%%time
infiles = os.path.join(processed, 'rasters', 'hillshade_tiles', '*.tif')
outfile = os.path.join(processed, 'rasters', 'hillshade.tif')

proc_merge = subprocess.run(['rio', 'merge', *glob.glob(infiles), outfile, '--co', 'compress=LZW'],
                            stderr=subprocess.PIPE, stdout=subprocess.PIPE)

print('Done producing merged hillshade GeoTiff.')

## 4. Identify building footprints
First, build shapefiles showing building boundaries in each buffered tile.

In [None]:
%%time
infiles = os.path.join(interim, 'classified', '*.laz')
odir = os.path.join(interim, 'building_tiles')

las.lasboundary(i=infiles,
                odir=odir,
                keep_class=6, # use only building-classified points
                disjoint=True, # compute separate polygons for each building
                concavity=3, # map concave boundary if edge length >= 3ft
                cores=num_cores)

print('Done producing building footprints in buffered tiles.')

Generate shapefiles showing the bounding box of each (unbuffered) tile that we'll use to remove buildings that fall in the buffered area.

In [None]:
%%time
infiles = os.path.join(processed, 'points', '*.laz')
odir = os.path.join(interim, 'tile_boundaries')

las.lasboundary(i=infiles,
                odir=odir,
                oshp=True,
                use_tile_bb=True,
                cores=num_cores)

print('Done producing boundaries of unbuffered tiles.')

For each shapefile containing polygons of the building boundaries, we'll use the `clean_tile` function to remove polygons from a tile if their centroid falls in the buffered area of the tile.

In [None]:
%%time
building_tiles = glob.glob(os.path.join(interim, 'building_tiles', '*.shp'))
odir = os.path.join(processed, 'vectors', 'building_tiles')

for poly_shp in building_tiles:
    fname = os.path.basename(poly_shp)
    tile_shp = os.path.join(interim, 'tile_boundaries', fname)
    lastools.clean_tile(poly_shp, tile_shp, odir, simp_tol=3, simp_topol=True)

print('Done producing building footprints in cleaned (unbuffered) tiles.')
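The idea behind this cleaning step can be illustrated without any GIS libraries. The sketch below is not the `pyFIRS` implementation; it assumes simple (non-self-intersecting) polygons given as lists of `(x, y)` vertices and a rectangular tile, and all names are hypothetical.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0
    # pair each vertex with the next one, wrapping around to the first
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)

def keep_polys_in_tile(polygons, tile_bbox):
    """Keep polygons whose centroid lies inside the unbuffered tile bounding box."""
    xmin, ymin, xmax, ymax = tile_bbox
    kept = []
    for poly in polygons:
        x, y = polygon_centroid(poly)
        if xmin <= x < xmax and ymin <= y < ymax:
            kept.append(poly)
    return kept

tile = (0, 0, 100, 100)  # unbuffered tile extent (xmin, ymin, xmax, ymax)
inside = [(10, 10), (20, 10), (20, 20), (10, 20)]       # centroid at (15, 15)
in_buffer = [(95, 40), (115, 40), (115, 60), (95, 60)]  # centroid at (105, 50)
kept = keep_polys_in_tile([inside, in_buffer], tile)
```

The half-open comparison (`<=` on the lower edge, `<` on the upper edge) is a design choice in this sketch so that a centroid falling exactly on a shared tile edge is assigned to only one tile.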

Merge the cleaned tiles of building footprints together into a single shapefile. We'll use geopandas to concatenate all the polygons together into a single geodataframe and then write out to a new shapefile.

In [None]:
%%time
building_tiles = glob.glob(os.path.join(processed, 'vectors', 'building_tiles', '*.shp'))
# create a list of geodataframes containing the tiles of building footprints
gdflist = [gpd.read_file(tile) for tile in building_tiles]
# merge them all together
merged = gpd.GeoDataFrame(pd.concat(gdflist, ignore_index=True))
# using pandas concat caused us to lose projection information, so let's add that back in
merged.crs = gdflist[0].crs
# and write the merged data to a new shapefile
merged.to_file(os.path.join(processed,'vectors','buildings.shp'))

print('Done merging tiles of building footprints into a single shapefile.')

## 5. Create a Canopy Height Model
We're going to switch to a FUSION command line tool to generate Canopy Height Models (CHMs).

### Using FUSION's `canopymodel` to generate CHMs
`FUSION` wants to have ground models formatted as .dtm files, for CHM development and for estimating other canopy metrics. Let's generate these ground models first using a 1-meter x-y resolution.

In [None]:
def run_gridsurface(infile):
    odir = os.path.join(interim, 'dtm_ground_tiles')
    outname = os.path.basename(infile).split('.')[0] + '.dtm'
    outfile = os.path.join(odir, outname)
    return fus.gridsurfacecreate(surfacefile=outfile,
                                 cellsize=3.28084, # 1 meter, expressed in feet
                                 xyunits='F',
                                 zunits='F',
                                 coordsys=2, # in State Plane
                                 zone=0, # not in UTM
                                 horizdatum=2, # NAD83
                                 vertdatum=2, # NAVD88
                                 datafiles=infile,
                                 las_class=2, # keep only ground-classified points
                                 odir=odir) # will make sure output directory is created if doesn't already exist

In [None]:
%%time
infiles=os.path.join(interim, 'classified', '*.laz')
odir = os.path.join(interim, 'normalized')

las.lasheight(i=infiles,
              odir=odir, 
              olaz=True, 
              replace_z=True,
              drop_below=0.1,
              cores=num_cores) # use parallel processing

print('Done normalizing tiles with ground and vegetation.')

Now we'll create a function that we'll map to a list of input files and distribute to our workers in parallel.

In [None]:
def run_canopymodel(infile):
    odir = os.path.join(interim, 'chm_tiles')
    outname = os.path.basename(infile).split('.')[0] + '.dtm'
    outfile = os.path.join(odir, outname)
    return fus.canopymodel(surfacefile=outfile,
                           cellsize=1,
                           xyunits='F',
                           zunits='F',
                           coordsys=2, # in State Plane
                           zone=0, # not in UTM
                           horizdatum=2, # NAD83
                           vertdatum=2, # NAVD88
                           datafiles=infile,
                           median=3, # median smoothing in 3x3 kernel
                           las_class=(1,2,5), # keep only unclassified, ground, and high veg points
                           asc=True, # also output in ascii format
                           odir=odir) # will make sure output directory is created if doesn't already exist
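For intuition about the `median=3` option, here is a plain-Python sketch of a 3x3 median filter on a small grid. It illustrates the general technique only; how FUSION treats edges and no-data cells is not covered here, and this sketch simply leaves border cells unchanged (an assumption).

```python
from statistics import median

def median_filter_3x3(grid):
    """Replace each interior cell with the median of its 3x3 neighborhood."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # border cells are copied unchanged
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            window = [grid[rr][cc]
                      for rr in (r - 1, r, r + 1)
                      for cc in (c - 1, c, c + 1)]
            out[r][c] = median(window)
    return out

chm = [[10, 11, 10],
       [12, 90, 11],   # 90 is an isolated spike
       [10, 12, 11]]
smoothed = median_filter_3x3(chm)
```

The isolated spike in the center is replaced by the median of its neighborhood, which is why median smoothing is useful for suppressing single-cell outliers in a canopy surface.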

Execute the canopy model command in parallel.

In [None]:
tiles_to_run = glob.glob(os.path.join(interim, 'normalized', '*.laz'))
res = v.map_async(run_canopymodel, tiles_to_run)
pbar(res)

Convert the ascii files that `canopymodel` generated into GeoTiffs, specifying their projection. Then clean up the files `canopymodel` generated that we don't need.

In [None]:
def batch_convert_project(infile):
    return convert_project(infile, '.tif', 'EPSG:2286')

In [None]:
to_convert = glob.glob(os.path.join(interim, 'chm_tiles', '*.asc'))
res = v.map_async(batch_convert_project, to_convert)
pbar(res)

In [None]:
%%time
# remove the ascii and dtm files FUSION created with canopymodel
clean_dir(os.path.join(interim, 'chm_tiles'), ['.asc', '.dtm'])

Clip the canopy height model tiles to remove the overlapping areas created by the tile buffers that were added to avoid edge effects.

In [None]:
def batch_clip(infile):
    fname = os.path.basename(infile).split('.')[0]
    in_shp = os.path.join(interim, 'tile_boundaries', fname + '.shp')
    odir = os.path.join(processed, 'rasters', 'chm_tiles')
    return clip_tile_from_shp(infile, in_shp, odir)

In [None]:
to_clip = glob.glob(os.path.join(interim, 'chm_tiles', '*.tif'))
res = v.map_async(batch_clip, to_clip)
pbar(res)

Merge the trimmed canopy height model tiles into a single raster.

In [None]:
%%time
infiles = os.path.join(processed, 'rasters', 'chm_tiles', '*.tif')
outfile = os.path.join(processed, 'rasters', 'chm.tif')

proc_merge = subprocess.run(['rio', 'merge', *glob.glob(infiles), outfile, '--co', 'compress=LZW'],
                            stderr=subprocess.PIPE, stdout=subprocess.PIPE)

print('Done producing merged Canopy Height Model GeoTiff.')