# Project TreeBeard 
## An Open Source Solution to Quantifying Tree Patterns in Forest Land

### Objectives

The objective of Treebeard is to automate the classification and quantification of land cover from raster mosaic datasets in forested areas. The final product is intended to be a free, open-source GIS plugin built on Python. It is aimed at users who want a categorical map of features and open spaces in areas with high tree density. Such a tool would be valuable to forest managers because it would let them process information for sizeable tracts of land and identify areas that may need treatment or brush clearing. It could also be used to report spatial statistics for stands already undergoing treatment. This should save time and money by removing the need to manually delineate certain types of spaces by spectral signature, while letting the user run selected computations to report spatial heterogeneity characteristics.

For this study, we apply our current methods to evaluate spatial heterogeneity at a few sites currently under study in the Lefthand Creek Watershed near Boulder, Colorado.
We aim to use unsupervised clustering methods such as k-means on high-resolution aerial photography of the study site to identify specific tree cluster patterns in the landscape. After that, we plan to bolster the results with canopy height models derived from available LiDAR data at the study site. This would allow us to create and designate polygons for the immediate areas of certain stands, which in turn would allow quick calculations of spatial composition, much faster than processing the mosaic data or manually specifying the areas in a shapefile.
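
To make the clustering step concrete, here is a minimal sketch of k-means applied to a tiny synthetic four-band image. It is an illustration under stated assumptions, not the project's implementation: the helper `kmeans_pixels` is hypothetical, the pixel values are invented, and the initialization is a simple deterministic farthest-point rule. The Methods section uses scikit-learn's `KMeans` on the real imagery.

```python
import numpy as np


def kmeans_pixels(pixels, k, n_iter=10):
    """Cluster rows of `pixels` (n_pixels, n_bands) with Lloyd's algorithm.

    Hypothetical helper for illustration only.
    """
    pixels = np.asarray(pixels, dtype=float)
    # Farthest-point initialization keeps this sketch deterministic
    centers = [pixels[0]]
    while len(centers) < k:
        nearest = np.min(
            [np.linalg.norm(pixels - c, axis=1) for c in centers], axis=0
        )
        centers.append(pixels[nearest.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign each pixel to its nearest center
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned pixels
        for j in range(k):
            members = pixels[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers


# Synthetic 4-band image (red, green, blue, NIR), invented values:
# left half mimics vegetation (high NIR), right half mimics bare ground
image = np.zeros((4, 6, 4))
image[:, :3] = [40, 80, 30, 200]
image[:, 3:] = [150, 130, 110, 90]

pixels = image.reshape(-1, 4)                 # One row per pixel
labels, centers = kmeans_pixels(pixels, k=2)
segmented = labels.reshape(image.shape[:2])   # Back to image layout
```

Each pixel is treated as a point in band space, so the result depends only on spectral values and not on where a pixel sits in the image. Cluster numbers are arbitrary labels; which one corresponds to trees has to be decided by inspection.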



Background
----------

### Importance of Spatial Heterogeneity

Spatial heterogeneity is a critical factor in assessing the health of an ecosystem. It is commonly utilized to monitor the distribution of tree species, respond to pollutants and diseases among plant populations, and support hydrological studies that assess potential flood paths. Additionally, it plays a crucial role in evaluating and mitigating the risk of wildfires by forest managers.

### Application in Landscape Analysis

In this project, our focus is on measuring spatial heterogeneity in terms of the distribution of tree groups within the landscape. Specifically, we aim to identify various areas where trees are clustered together in formations ranging from small copses to larger stands. The arrangement and density of these stands are significant as they influence several ecological dynamics, including the patterns of wind flow through the area. When combined with variables such as brush density and canopy overlap, these factors become critical indicators of an area's susceptibility to wildfires.

### Impact of Human Activity

It is also vital to acknowledge that aerial surveys may capture spatial heterogeneity resulting from human activities, such as land development or resource extraction. These anthropogenic influences often introduce different patterns of spatial composition and configuration compared to those arising from natural processes. A primary goal in forestry management is to restore the land to a condition similar to its pre-European settlement state in North America, believed to represent the most unmodified and stable ecological environment.

Study Site Overview
-------------------

### Left-hand Creek Watershed

The study site is located in the Left-hand Creek Watershed. This section of the report will describe the specific area and its forest biome characteristics. The watershed serves as a significant ecological area that supports diverse plant and animal species and plays a crucial role in the local hydrological cycle.

The area lies within the Rocky Mountains, north of Boulder, CO. It features mountainous terrain at high elevation, with a continental climate marked by large differences between day and night temperatures due to the elevation. It is a heavily wooded area, with some small rural developments throughout. It is home to a diverse set of plant and animal life. Restoration projects are currently under way on some of its local streams.

![Lefthand Creek](../images/left_hand_creek.png)

## Data Sources
### Shape files
Shape files for the project area of interest were provided by Eric Frederik of the Watershed Center. They indicate sections of land currently under observation by the Watershed Center. These are available as vector polygons in a variety of machine-readable file formats.

### Aerial Data
Aerial data was collected by the Denver Regional Council of Governments (DRCOG) as part of the Denver Regional Aerial Photography Project (DRAPP), which is flown every two years. The imagery is stored in the regional data catalog on the [DRCOG website](https://data.drcog.org/data?page=1&program%5b0%5d=Denver%20Regional%20Aerial%20Photography%20Project&q=&sort=dataVintage). The resolution for most of the project area is 12 in per pixel. The data contains four spectral bands: red, green, blue, and near infrared. These are important for calculating spectral signatures of the areas within the photo. We are using the 2020 version of the dataset, as that is the most recent set freely available to the public.
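
One spectral signature used later in this notebook is the Normalized Difference Vegetation Index, NDVI = (NIR - red) / (NIR + red). The sketch below works through it on three invented pixel values. The function name `compute_ndvi` and the choice to return 0 where both bands are 0 are assumptions made for illustration.

```python
import numpy as np


def compute_ndvi(red, nir):
    """Return NDVI = (NIR - red) / (NIR + red), with 0 where both bands are 0."""
    # Cast to float first so unsigned integer bands cannot wrap around
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    total = nir + red
    ndvi = np.zeros_like(total)
    # Only divide where the denominator is nonzero
    np.divide(nir - red, total, out=ndvi, where=total != 0)
    return ndvi


# Invented digital numbers for three pixels: vegetation, bare soil, no data
red = np.array([50, 100, 0], dtype=np.uint8)
nir = np.array([150, 100, 0], dtype=np.uint8)
ndvi = compute_ndvi(red, nir)
```

Healthy vegetation reflects strongly in the near infrared and absorbs red light, so it scores well above zero, while bare ground sits near zero.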


### LiDAR Data
LiDAR data was also obtained through DRCOG. We took our data from the DRCOG LIDAR QL2 INDEX IN CO SP NORTH 2020 set. This contains mostly point-cloud data that must be processed into DEMs. Because our area of interest is rural, we must use the QL2 set, a quality level that yields a 1 meter cell size in any resulting DEMs.
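
The idea behind a canopy height model can be sketched with two small arrays: subtract a bare-earth (ground) surface from a first-return surface on the same grid, then threshold the difference. The elevations and the threshold of 1 unit below are invented for illustration. The real rasters come from the point-cloud processing in the Methods section.

```python
import numpy as np

# Invented elevations (same units and grid for both rasters)
first_return = np.array([[1605.0, 1612.0],
                         [1600.5, 1601.0]])
ground = np.array([[1600.0, 1600.0],
                   [1600.0, 1601.0]])

# Canopy height is the surface elevation minus the bare-earth elevation
canopy_height = first_return - ground

# Hypothetical threshold: treat anything taller than 1 unit as canopy
canopy_mask = canopy_height > 1.0
```

Both rasters must share the same grid and vertical units before subtracting, which is why the processing code aligns the ground raster to the first-return raster first.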



## Methods
### Importing Packages And Data

In [None]:
import os
import warnings
import zipfile

import contextily as ctx
import cv2
import earthpy as et
import geopandas as gpd
import hvplot.pandas
import hvplot.xarray
import hvplot as hv
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import rasterio
import rasterio.features
import rioxarray as rxr
import rioxarray.merge as rxrm
from ipyleaflet import Map
from leafmap import leafmap
from localtileserver import get_leaflet_tile_layer, TileClient
from rasterio.features import shapes
from rasterio.mask import mask
from rasterio.merge import merge
from rasterio.plot import show
import requests
from samgeo import overlay_images, SamGeo, tms_to_geotiff, get_basemaps
from scipy.ndimage import binary_opening, binary_closing
import shapely
from shapely.geometry import box, Point, shape
from shapely.ops import unary_union
from sklearn.cluster import KMeans
from IPython.display import Image
import whitebox

warnings.filterwarnings("ignore")

In [None]:
# Code from treebeard.ipynb, commented. 

# Define the path to the shapefile of the Area of Interest (AOI)
shape_file_path = 'assets/areas/immediate_project/Zumwinkel_property.shp'
aoi_gdf = gpd.read_file(shape_file_path)

# Define directories and filepaths
data_dir = os.path.join(et.io.HOME, et.io.DATA_NAME, 'treebeard')

# Data Set #1: Denver Regional Aerial Imagery Boundary
drapp_url = (
    'https://gisdata.drcog.org:8443'
    '/geoserver/DRCOGPUB/ows?'
    'service=WFS&version=1.0.0&'
    'request=GetFeature&'
    'typeName=DRCOGPUB:drapp_tile_scheme_2020'
    '&outputFormat=SHAPE-ZIP'
)
drapp_shp_file = os.path.join(data_dir, 'drapp_tile_scheme_2020', 'drapp_tile_scheme_2020.shp')
drapp_dir = os.path.join(data_dir, 'drapp_tile_scheme_2020')

# Data Set #2: Denver Regional Aerial Imagery Tiles
drapp_tiles_dir = os.path.join(data_dir, 'drapp_tiles')
os.makedirs(drapp_tiles_dir, exist_ok=True)

# Download the DRAPP shapefile if it does not exist
if not os.path.exists(drapp_shp_file):
    drapp_resp = requests.get(drapp_url)
    drapp_resp.raise_for_status()  # Fail early on a bad HTTP response
else:
    print(f'Data {os.path.basename(drapp_shp_file)} already downloaded.')

# Function to extract a zip file to a target directory
def extract_zip(target_dir, zip_filename, resp):
    zip_path = os.path.join(data_dir, zip_filename)
    with open(zip_path, 'wb') as f:
        f.write(resp.content)
    with zipfile.ZipFile(zip_path, 'r') as z:
        z.extractall(target_dir)

# Extract the DRAPP shapefile if it is not already extracted
drapp_dir = os.path.join(data_dir, 'drapp_tile_scheme_2020')
if not os.path.exists(drapp_shp_file):
    # Extract into drapp_dir so the files land where drapp_shp_file expects them
    extract_zip(drapp_dir, 'drapp_tile_scheme_2020.zip', drapp_resp)
else:
    print(f'Data {os.path.basename(drapp_shp_file)} already extracted.')

# Check the Coordinate Reference System (CRS) of the AOI GeoDataFrame
aoi_gdf.crs

# Create a bounding box around the AOI
bbox_aoi_gdf = gpd.GeoDataFrame(geometry=aoi_gdf.bounds.apply(lambda row: box(row['minx'], row['miny'], row['maxx'], row['maxy']), axis=1))
bbox_aoi_gdf.crs = aoi_gdf.crs

# Read the DRAPP shapefile and reproject to match the AOI CRS
drapp_gdf = gpd.read_file(drapp_shp_file)
drapp_gdf = drapp_gdf.to_crs(aoi_gdf.crs) # EPSG:6428 -> EPSG:4326

# Spatial join between DRAPP tiles and the AOI
drapp_aoi_gdf = gpd.sjoin(drapp_gdf, aoi_gdf, how='inner', predicate='intersects')

# Plot the AOI, bounding box, and DRAPP tiles
aoi_plot = aoi_gdf.hvplot(
        geo=True, tiles='OSM', alpha=1, 
        height=800, width=800, color='red',
        label='Area of Interest'
    )
aoi_bound_plot = bbox_aoi_gdf.hvplot(
        geo=True, tiles='OSM', alpha=0.7, 
        color='blue',
        label='AOI Bounding Box'
    )
drapp_plot = drapp_aoi_gdf.hvplot(
        geo=True, tiles='OSM', 
        alpha=0.3, color='blue',
        hover_cols=['tile', 'photo_date'],
        title='Area of Interest in DRAPP Tile N4W351'
    )
composite_plot = aoi_bound_plot * drapp_plot * aoi_plot

# Save and display the plots
Image("images/aoi_plot.png")
Image("images/composite_plot.png")

# Generate URLs for downloading DRAPP tiles
tile_base_url = 'https://drapparchive.s3.amazonaws.com/2020/'
tile_urls = [f'{tile_base_url}{tile}.tif' for tile in drapp_aoi_gdf['tile'].unique()]

# Create directory for DRAPP tiles
drapp_tiles_dir = os.path.join(data_dir, 'drapp_tiles')
os.makedirs(drapp_tiles_dir, exist_ok=True)

# Function to download files from URLs
def download_files(urls):
    paths = []
    for url in urls:
        filename = os.path.basename(url)
        path = os.path.join(drapp_tiles_dir, filename)
        paths.append(path)
        if not os.path.exists(path):
            print(f"Downloading {url}...")
            resp = requests.get(url)
            with open(path, 'wb') as f:
                f.write(resp.content)
        else:
            print(f"File {filename} already downloaded.")
    return paths

# Download DRAPP tile files
tile_paths = download_files(tile_urls)

# Extract photo date and tile name for the first tile
photo_date = pd.to_datetime(
        drapp_aoi_gdf['photo_date']
    ).dt.date.unique()[0].strftime('%Y-%m-%d')
tile_name = drapp_aoi_gdf['tile'].unique()[0]

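# Open and merge the DRAPP tiles to create a mosaic
src_files_to_mosaic = [rasterio.open(path) for path in tile_paths]
mosaic, out_trans = merge(src_files_to_mosaic)
# Close the source datasets once the mosaic is in memory
for src_file in src_files_to_mosaic:
    src_file.close()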
red = mosaic[0]
green = mosaic[1]
blue = mosaic[2]
nir = mosaic[3]
rgb = np.dstack((red, green, blue))
plt.imshow(rgb)
plt.title(f'RGB Image: DRAPP Tile {tile_name} ({photo_date})')
plt.savefig('images/N4W351_rgb_image.png')
plt.show()

# Crop the mosaic using the AOI bounding box
with rasterio.open(tile_paths[0]) as src:
    # Ensure CRS match between raster and GeoDataFrame
    if bbox_aoi_gdf.crs != src.crs:
        bbox_aoi_gdf = bbox_aoi_gdf.to_crs(src.crs)

    # Create a mask for cropping
    geom = [bbox_aoi_gdf.geometry.unary_union]  # Combine all geometries in the GeoDataFrame
    out_image, out_transform = mask(src, geom, crop=True)

    # Update the metadata for the cropped raster
    out_meta = src.meta.copy()
    out_meta.update({
        "driver": "GTiff",
        "height": out_image.shape[1],
        "width": out_image.shape[2],
        "transform": out_transform
    })

# Save the cropped raster to a file
cropped_file = "scratch/cropped_N4W351.tif"
with rasterio.open(cropped_file, "w", **out_meta) as dest:
    dest.write(out_image)

# Plot the cropped RGB image
with rasterio.open(cropped_file) as src:
    red = src.read(1)
    green = src.read(2)
    blue = src.read(3)
    rgb = np.dstack((red, green, blue))
    plt.imshow(rgb)
    plt.title(f'Cropped RGB Image: DRAPP Tile {tile_name} ({photo_date})')
    plt.savefig('images/cropped_N4W351_rgb_image.png')
    plt.show()

# Load the cropped GeoTIFF file for further processing
geotiff = "scratch/cropped_N4W351.tif"
with rasterio.open(geotiff) as src:
    bands = [src.read(i) for i in range(1, 5)]
    image_data = np.dstack(bands)

# Prepare the data for clustering
pixels = image_data.reshape((-1, 4))  # Flatten the image data for clustering

# Perform K-means clustering for 2 to 5 clusters
predictions = {}
for k in range(2, 6): 
    kmeans = KMeans(n_clusters=k, random_state=0, n_init=10)
    kmeans.fit(pixels)
    labels = kmeans.labels_
    segmented_image = labels.reshape(image_data.shape[0], image_data.shape[1])
    predictions[k] = segmented_image

# Plot the clustering results
f, axes = plt.subplots(2, 2, figsize=(15, 15))  # One panel per k in 2..5
for ax_i, (k, prediction) in enumerate(predictions.items()):
    if ax_i >= axes.size:  # Skip any results beyond the available panels
        continue
    ax = axes.flatten()[ax_i]
    im = ax.imshow(prediction, cmap="terrain", interpolation='none')
    ax.set_title("Number of clusters: " + str(k))
    ax.axis('off')  # Hide axis ticks
plt.show()



# Open the GeoTIFF file
image = rasterio.open(geotiff)

# Read the red, green, blue, and NIR bands from the image
red = image.read(1).astype(float)
green = image.read(2).astype(float)
blue = image.read(3).astype(float)
nir = image.read(4).astype(float)

# Stack the bands into a single array
bands = np.dstack((red, green, blue, nir))

# Get the number of bands in the image
nbands = image.read().shape[0]

# Close the image file
image.close()

# Calculate the NDVI (Normalized Difference Vegetation Index)
ndvi = (nir - red) / (nir + red)

# Replace NaN values in the NDVI with 0
ndvi[np.isnan(ndvi)] = 0

# Create a mask that is True where NDVI is greater than or equal to 0.1 (vegetation);
# pixels below the threshold (mask value 0) are treated as open space downstream
open_spaces = ndvi >= 0.1

# Display the open spaces mask
show(open_spaces, cmap='Greys')

# Function to save a NumPy array as a GeoTIFF
def array_to_image(
    array, output, source=None, dtype=None, compress="deflate", **kwargs
):
    """Save a NumPy array as a GeoTIFF using the projection information from an existing GeoTIFF file.

    Args:
        array (np.ndarray): The NumPy array to be saved as a GeoTIFF.
        output (str): The path to the output image.
        source (str, optional): The path to an existing GeoTIFF file with map projection information. Defaults to None.
        dtype (np.dtype, optional): The data type of the output array. Defaults to None.
        compress (str, optional): The compression method. Can be one of the following: "deflate", "lzw", "packbits", "jpeg". Defaults to "deflate".
    """
    
    from PIL import Image

    if isinstance(array, str) and os.path.exists(array):
        array = cv2.imread(array)
        array = cv2.cvtColor(array, cv2.COLOR_BGR2RGB)

    if output.endswith(".tif") and source is not None:
        with rasterio.open(source) as src:
            crs = src.crs
            transform = src.transform
            if compress is None:
                compress = src.compression

        # Determine the minimum and maximum values in the array
        min_value = np.min(array)
        max_value = np.max(array)

        if dtype is None:
            # Determine the best dtype for the array
            if min_value >= 0 and max_value <= 1:
                dtype = np.float32
            elif min_value >= 0 and max_value <= 255:
                dtype = np.uint8
            elif min_value >= -128 and max_value <= 127:
                dtype = np.int8
            elif min_value >= 0 and max_value <= 65535:
                dtype = np.uint16
            elif min_value >= -32768 and max_value <= 32767:
                dtype = np.int16
            else:
                dtype = np.float64

        # Convert the array to the best dtype
        array = array.astype(dtype)

        # Define the GeoTIFF metadata
        if array.ndim == 2:
            metadata = {
                "driver": "GTiff",
                "height": array.shape[0],
                "width": array.shape[1],
                "count": 1,
                "dtype": array.dtype,
                "crs": crs,
                "transform": transform,
            }
        elif array.ndim == 3:
            metadata = {
                "driver": "GTiff",
                "height": array.shape[0],
                "width": array.shape[1],
                "count": array.shape[2],
                "dtype": array.dtype,
                "crs": crs,
                "transform": transform,
            }
        else:
            raise ValueError("Array must be 2D or 3D.")

        if compress is not None:
            metadata["compress"] = compress

        # Create a new GeoTIFF file and write the array to it
        with rasterio.open(output, "w", **metadata) as dst:
            if array.ndim == 2:
                dst.write(array, 1)
            elif array.ndim == 3:
                for i in range(array.shape[2]):
                    dst.write(array[:, :, i], i + 1)

    else:
        img = Image.fromarray(array)
        img.save(output, **kwargs)

# Save the open spaces mask as a GeoTIFF file
mask_output = "ndvi-open-spaces.tif"
array_to_image(open_spaces, mask_output, source=geotiff)

# Perform K-means clustering with 2 clusters on the image pixels
kmeans = KMeans(n_clusters=2, random_state=0)
kmeans.fit(pixels)

# Get the cluster labels for each pixel
cluster_labels = kmeans.labels_

# Reshape the cluster labels to the original image shape
clustered_image = cluster_labels.reshape(image_data.shape[:2])

# Create a binary mask that is True for every pixel outside cluster 0
binary_mask = clustered_image != 0

# Display the binary mask
plt.figure(figsize=(10, 6))
plt.imshow(binary_mask, cmap='Greys')
plt.title('Binary Mask from K-Means Clustering')
plt.show()

# Save the binary mask as a GeoTIFF file
array_to_image(binary_mask, 'kmeans-open-spaces.tif', source=geotiff)

# Define file paths for the NDVI and K-means masks
ndvi_mask = 'ndvi-open-spaces.tif'
kmeans_mask = 'kmeans-open-spaces.tif'

# Function to convert a GeoTIFF to a GeoDataFrame
def geotiff_to_gdf(filepath):
    with rasterio.open(filepath) as src:
        mask = src.read(1).astype('uint8')
        mask_transform = src.transform
        geometries = list(shapes(mask, transform=mask_transform))
        shapes_list = [{'properties': {'raster_val': v}, 'geometry': shape(geom)}
                        for geom, v in geometries if v == 0]
        gdf = gpd.GeoDataFrame.from_features(shapes_list)
        gdf.crs = src.crs
    return gdf

# Function to save a GeoDataFrame as a shapefile
def gdf_to_shp(gdf, out_path):
    gdf.to_file(out_path)

# Function to calculate the area of each polygon in a GeoDataFrame
def calculate_area(gdf):
    gdf['area_feet'] = gdf.area
    gdf['area_acres'] = gdf['area_feet'] / 43560
    return gdf['area_acres']

# List of mask file paths to process
files = [ndvi_mask, kmeans_mask]
results = []

# Process each mask file
for file in files:
    filename = file
    gdf = geotiff_to_gdf(file)    
    gdf['area_acres'] = calculate_area(gdf)
    results.append(gdf)

# Function to plot the size categories of polygons in a GeoDataFrame
def bin_plot(gdf, bins, labels, title):
    gdf['size_category'] = pd.cut(gdf['area_acres'], bins=bins, labels=labels, right=False)
    fig, ax = plt.subplots(1, 1, figsize=(10, 10))
    plot = gdf.plot(column='size_category', ax=ax, legend=True, categorical=True, legend_kwds={'title': 'Size Category'})
    ax.set_title(title)
    plt.tight_layout()
    filename = title.split('.')[0]
    plt.savefig(f'{filename}.png')
    plt.show()

# Plot the size categories for each processed mask
for i, result in enumerate(results):
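    # The open-ended top bin keeps gaps of 1 acre or more from being dropped
    bins = [0, 1/8, 1/4, 1/2, 1, np.inf]
    labels = ['< 1/8 acre', '1/8 - 1/4 acre', '1/4 - 1/2 acre', '1/2 - 1 acre', '1+ acres']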
    title = files[i]
    bin_plot(result, bins, labels, title)
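
The polygon areas computed in the cell above are only in square feet if the raster's CRS uses feet. A rough cross-check is to count mask pixels and multiply by the pixel area. The sketch below assumes square pixels of 1 ft, in line with the 12 in resolution noted under Data Sources. The function name `mask_area_acres` and the mask itself are invented for illustration.

```python
import numpy as np

SQ_FT_PER_ACRE = 43560


def mask_area_acres(mask, pixel_size_ft=1.0):
    """Area covered by True pixels, assuming square pixels measured in feet."""
    pixel_area = pixel_size_ft ** 2
    return np.count_nonzero(mask) * pixel_area / SQ_FT_PER_ACRE


# Invented mask: a 300 x 363 pixel block of True values inside a larger raster
mask = np.zeros((500, 500), dtype=bool)
mask[:300, :363] = True
acres = mask_area_acres(mask)
```

If the pixel-count total and the summed polygon areas for the same mask differ by a large factor, the CRS units are the first thing to check.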

In [None]:
# process_lidar.ipynb code, commented

import os
import earthpy as et
import geopandas as gpd
import hvplot as hv
import hvplot.pandas
import hvplot.xarray
import numpy as np
import rasterio
import rasterio.features
import rioxarray as rxr
import rioxarray.merge as rxrm
from scipy.ndimage import binary_opening, binary_closing
import shapely
from shapely.geometry import shape
import whitebox
import zipfile
from shapely.ops import unary_union

# Import the LIDAR index grid and set up directories
data_dir = os.path.join(et.io.HOME, et.io.DATA_NAME)
project_dir = os.path.join(data_dir, "treebeard")
# Create the data and project directories if they don't exist
os.makedirs(project_dir, exist_ok=True)

las_index_path = os.path.join(
    data_dir,
    'earthpy-downloads',
    'lidar_index_cspn_q2',
    'lidar_index_cspn_q2.shp'
)

# Download LIDAR index tiles if not already present
if not os.path.exists(las_index_path):
    las_index_url = ('https://gisdata.drcog.org:8443/geoserver/DRCOGPUB/'
             'ows?service=WFS&version=1.0.0&request=GetFeature&'
             'typeName=DRCOGPUB:lidar_index_cspn_q2&outputFormat=SHAPE-ZIP')

    las_index_shp = et.data.get_data(url=las_index_url)

# Read the LIDAR index shapefile and set its CRS
las_index_gdf = (
    gpd.read_file(las_index_path).set_index('tile')
#    .loc[['N3W345']]  # Uncomment and specify tiles if needed
)

las_index_gdf = las_index_gdf.to_crs('EPSG:4269')
crs = las_index_gdf.crs

# Plot the LIDAR index grid using hvplot
las_index_plot = las_index_gdf.hvplot(
    tiles='OSM',
    crs=las_index_gdf.crs,
    geo=True,
    line_color='black',
    line_width=2,
    fill_alpha=0
)
las_index_plot

# Open project areas shapefile and plot
proj_zip_path = '../assets/project_areas_merged.zip'

with zipfile.ZipFile(proj_zip_path, 'r') as zip_ref:
    temp_dir = '/tmp/extracted_shapefile'  # You can specify any temporary directory
    zip_ref.extractall(temp_dir)

extracted_shapefile_path = temp_dir + '/'

proj_area_gdf = gpd.read_file(extracted_shapefile_path)

# Convert project area CRS to EPSG:4326
proj_area_gdf = proj_area_gdf.to_crs("EPSG:4326")

# Plot the project areas using hvplot
proj_area_plot = proj_area_gdf.hvplot(
    x='x',
    y='y',
    aspect='equal',
    tiles='EsriImagery',
    geo=True,
    line_color='blue',
    line_width=2,
    fill_alpha=0
)

proj_area_plot

# Identify the tiles that intersect each project area using spatial join
select_tiles_gdf = gpd.sjoin(las_index_gdf, proj_area_gdf, how='inner', predicate='intersects')

select_tiles_gdf.reset_index(drop=False)
select_tiles_gdf.hvplot(
    x='x',
    y='y',
    aspect='equal',
    tiles='EsriImagery',
    geo=True,
    line_color='blue',
    line_width=2,
    fill_alpha=0
)

# Reset index for the selected tiles
select_tiles_gdf = select_tiles_gdf.reset_index(drop=False)

# Generate list of all tiles per project area
tiles_by_area = select_tiles_gdf.groupby('Proj_ID')['tile'].apply(list).reset_index()
tiles_by_area

# Function to process LAS files to canopy using Whitebox
def convert_las_to_tif(input_las, output_tif, return_type):
    """
    Converts a LAS file to a GeoTIFF using WhiteboxTools, based on the specified return type.

    Parameters
    ----------
    input_las : str
        Path to the input LAS file.
    output_tif : str
        Path to save the output GeoTIFF file.
    return_type : str
        Type of returns to process. Must be either 'first' for first returns or 'ground' for ground returns.

    Raises
    ------
    ValueError
        If `return_type` is not 'first' or 'ground'.

    Notes
    -----
    This function uses WhiteboxTools' `lidar_idw_interpolation` method to perform the conversion.
    The interpolation method used is Inverse Distance Weighting (IDW) with a resolution of 1.

    Examples
    --------
    >>> convert_las_to_tif('input.las', 'output_first_returns.tif', 'first')
    >>> convert_las_to_tif('input.las', 'output_ground_returns.tif', 'ground')
    """
    wbt = whitebox.WhiteboxTools()
    
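    if return_type == "first":
        # Interpolate elevation from first returns only (top of canopy)
        wbt.lidar_idw_interpolation(
            i=input_las,
            output=output_tif,
            parameter="elevation",
            returns="first",
            resolution=1  # Adjust as needed
        )
    elif return_type == "ground":
        # Interpolate elevation from ground points only by excluding
        # every LAS class except 2 (ground)
        non_ground = ",".join(str(c) for c in range(19) if c != 2)
        wbt.lidar_idw_interpolation(
            i=input_las,
            output=output_tif,
            parameter="elevation",
            returns="all",
            exclude_cls=non_ground,
            resolution=1  # Adjust as needed
        )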
    else:
        raise ValueError("Invalid return_type. Use 'first' or 'ground'.")

# Process tiles for each project area
# Generate a dictionary of canopy TIFs for each project area
las_root_url = 'https://lidararchive.s3.amazonaws.com/2020_CSPN_Q2/'
canopy_dict = {}
for index, row in tiles_by_area.iterrows():
    tiles = row['tile']
    proj_area_name = row['Proj_ID']
    sel_proj_area_gdf = proj_area_gdf[proj_area_gdf['Proj_ID'] == proj_area_name]
    # Download all tiles for project area, process, and clip/merge
    tile_agg = []
    print("Processing LIDAR for " + proj_area_name)
    for tile in tiles:
        file_name = tile + ".las"
        print("Processing LIDAR tile " + tile)
        tile_path = os.path.join(
            data_dir,
            'earthpy-downloads',
            file_name
        )
        download_url = las_root_url + tile + ".las"
        if not os.path.exists(tile_path):
            et.data.get_data(url=download_url)
        # PDAL is required for this step, see readme for install instructions

        # Output path for first returns DEM
        output_fr_tif = os.path.join(
            project_dir,
            tile +'_fr.tif'
        )
        if not os.path.exists(output_fr_tif):
            convert_las_to_tif(tile_path, output_fr_tif, "first")
        
        # Output path for ground DEM
        output_gr_tif = os.path.join(
            project_dir,
            tile +'_gr.tif'
        )
        if not os.path.exists(output_gr_tif):
            convert_las_to_tif(tile_path, output_gr_tif, "ground")
        
        # Process ground and first return data to canopy height
        fr_dem = rxr.open_rasterio(output_fr_tif)
        fr_dem = fr_dem.rio.reproject("EPSG:4326")

        gr_dem = rxr.open_rasterio(output_gr_tif)
        gr_dem = gr_dem.rio.reproject("EPSG:4326")
        gr_dem = gr_dem.rio.reproject_match(fr_dem)

        # Calculate canopy height by subtracting ground DEM from first returns DEM
        canopy_dem = fr_dem - gr_dem

        # Set all values greater than 1 (canopy) to 1 and all values less than 1 (no canopy) to 0
        canopy_dem.values[canopy_dem < 1] = 0
        canopy_dem.values[canopy_dem > 1] = 1
        canopy_dem.name = tile + "_Canopy"
        canopy_dem = canopy_dem.round()
        tile_agg.append(canopy_dem)
    print("Merging LIDAR tiles for " + proj_area_name)
    # Merge all tiles that intersect with the project area and clip to project area
    canopy_merged = rxrm.merge_arrays(tile_agg).rio.clip(sel_proj_area_gdf.geometry)
    canopy_dict[proj_area_name] = canopy_merged



# Export Zumwinkel canopy tif to repo
test = canopy_dict['Zumwinkel']

zumwinkel_path = "../notebooks/zumwinkel_canopy.tif"

# Check if the output path exists before exporting
if not os.path.exists(zumwinkel_path):
    test.rio.to_raster(zumwinkel_path, overwrite=True) 

# Plot Zumwinkel canopy tif
zumwinkel_path = "../notebooks/zumwinkel_canopy.tif"

# Load and reproject the Zumwinkel canopy raster
zumwinkel_lidar = rxr.open_rasterio(zumwinkel_path).rio.reproject("EPSG:4326")
zumwinkel_lidar.hvplot(
    height=600,
    width=600,
    geo=True,
    aspect='equal',
    kind='image',
    tiles='EsriImagery',
    alpha=0.5,
    title="LIDAR Canopy Example",
    clabel='Canopy (1) or no canopy (0)',
    crs=zumwinkel_lidar.rio.crs  # CRS of the raster being plotted
)

# Clean up "noise" in raster

# Function to apply morphological operations on a rioxarray DataArray
def clean_raster_rioxarray(raster_xarray, operation='opening', structure_size=3):
    # Extract the numpy array from the xarray DataArray
    raster_data = raster_xarray.values

    # Ensure the raster_data is 2D (in case it's a single-band raster with an extra dimension)
    if raster_data.ndim == 3 and raster_data.shape[0] == 1:
        raster_data = raster_data[0, :, :]
    
    # Convert to binary (tree canopy is represented by 1, no canopy by 0)
    binary_raster = raster_data == 1

    # Define the structure for the morphological operation
    structure = np.ones((structure_size, structure_size), dtype=int)

    # Apply the chosen morphological operation
    if operation == 'opening':
        cleaned_raster = binary_opening(binary_raster, structure=structure)
    elif operation == 'closing':
        cleaned_raster = binary_closing(binary_raster, structure=structure)
    else:
        raise ValueError("Operation must be 'opening' or 'closing'")

    # Convert back to the original values (1 for canopy, 0 for no canopy)
    raster_data_cleaned = np.where(cleaned_raster, 1, 0)

    # Add back the extra dimension if the original data had it
    if raster_xarray.values.ndim == 3:
        raster_data_cleaned = np.expand_dims(raster_data_cleaned, axis=0)

    # Create a new xarray DataArray with the cleaned data, copying metadata from the original
    cleaned_raster_xarray = raster_xarray.copy(data=raster_data_cleaned)

    return cleaned_raster_xarray

# Choose morphological operation and structure size
operation = 'opening'
structure_size = 3

# Apply the cleaning function to the raster
zumwinkel_lidar_cleaned = clean_raster_rioxarray(zumwinkel_lidar, operation, structure_size)

# Plot the cleaned raster
lidar_plot = zumwinkel_lidar_cleaned.hvplot(
    height=600,
    width=600,
    geo=True,
    aspect='equal',
    kind='image',
    tiles='EsriImagery',
    alpha=0.5,
    title="LIDAR Canopy Example",
    clabel='Canopy (1) or no canopy (0)',
    crs=zumwinkel_lidar_cleaned.rio.crs
)

# Create a vector binary mask for canopy

# Export cleaned Zumwinkel canopy tif to repo
zumwinkel_path = "../notebooks/zumwinkel_clean_canopy.tif"
zumwinkel_lidar_cleaned = zumwinkel_lidar_cleaned.where(zumwinkel_lidar_cleaned != 1.7976931348623157e+308, np.nan)
zumwinkel_lidar_cleaned.rio.to_raster(zumwinkel_path, overwrite=True)

# Drop the band dimension so the mask is a 2D array
binary_mask = zumwinkel_lidar_cleaned.squeeze()  # Assuming the data is in the first band

# Create a mask where cell values are 1
mask = binary_mask == 1

# Get the affine transform from the raster data
transform = binary_mask.rio.transform()

# Extract shapes (polygons) from the binary mask
shapes = rasterio.features.shapes(mask.astype(np.int16).values, transform=transform)
polygons = [shape(geom) for geom, value in shapes if value == 1]

# Create a GeoDataFrame from the polygons
canopy_gdf = gpd.GeoDataFrame({'geometry': polygons})

# Prep data for vector processing. Make sure CRS is set to coordinate system with correct units.

# Get CRS from raster
crs = zumwinkel_lidar_cleaned.rio.crs

# Set CRS for GeoDataFrame if not already set
if canopy_gdf.crs is None:
    canopy_gdf = canopy_gdf.set_crs(zumwinkel_lidar_cleaned.rio.crs)

# Define EPSG:6430 CRS
epsg_6430 = '6430'

# Reproject to EPSG:6430
canopy_gdf = canopy_gdf.to_crs(epsg=epsg_6430)

# Filter project area GeoDataFrame for Zumwinkel and reproject
zumwinkel_boundary = proj_area_gdf[proj_area_gdf['Proj_ID'] == "Zumwinkel"]
zumwinkel_boundary = zumwinkel_boundary.to_crs("EPSG:6430")

# Method to process canopy gaps

def process_canopy_areas(canopy_gdf, study_area, buffer_distance=5):
    """
    Processes canopy areas by buffering, dissolving, clipping, and exploding the geometries.
    Adds acreage and size category columns.

    Parameters
    ----------
    canopy_gdf : gpd.GeoDataFrame
        GeoDataFrame representing canopy areas.
    study_area : gpd.GeoDataFrame
        GeoDataFrame representing the boundary within which to clip the canopy areas.
    buffer_distance : float, optional
        The distance to buffer the canopy geometries. Default is 5 units.

    Returns
    -------
    exploded_gap_gdf : gpd.GeoDataFrame
        GeoDataFrame with exploded geometries representing non-tree canopy areas, including acreage and size category.
    """
    
    # Ensure input GeoDataFrames have CRS
    if canopy_gdf.crs is None or study_area.crs is None:
        raise ValueError("Input GeoDataFrames must have a CRS defined.")

    # Buffer the canopy geometries
    buffered_canopy = canopy_gdf.geometry.buffer(buffer_distance)

    # Create a new GeoDataFrame with the buffered geometries
    buffer_gdf = gpd.GeoDataFrame(geometry=buffered_canopy, crs=canopy_gdf.crs)

    # Dissolve the buffered geometries into a single MultiPolygon
    dissolved_canopy = unary_union(buffer_gdf.geometry)

    # Convert the dissolved canopy back to a GeoDataFrame
    dissolved_canopy_gdf = gpd.GeoDataFrame(geometry=[dissolved_canopy], crs=canopy_gdf.crs)

    # Clip the dissolved canopy with the study area
    clipped_buffer = gpd.overlay(dissolved_canopy_gdf, study_area, how='intersection')

    # Calculate the difference between the study area and the clipped buffer
    non_tree_canopy_gdf = gpd.overlay(study_area, clipped_buffer, how='difference')

    # Explode multipart polygon to prepare for area calculations
    exploded_gap_gdf = non_tree_canopy_gdf.explode(index_parts=True)

    # Reset the index to have a clean DataFrame
    exploded_gap_gdf.reset_index(drop=True, inplace=True)

    # Calculate the area in acres (1 acre = 43,560 square feet)
    exploded_gap_gdf['Acreage'] = exploded_gap_gdf.geometry.area / 43560

    # Define a function to categorize the gap size
    def categorize_gap_size(acres):
        if acres < 1/8:
            return '< 1/8 acre'
        elif 1/8 <= acres < 1/4:
            return '1/8 - 1/4 acre'
        elif 1/4 <= acres < 1/2:
            return '1/4 - 1/2 acre'
        elif 1/2 <= acres < 1:
            return '1/2 - 1 acre'
        else:
            return '> 1 acre'

    # Apply the categorization function to the Acreage column
    exploded_gap_gdf['Gap_Size_Category'] = exploded_gap_gdf['Acreage'].apply(categorize_gap_size)

    return exploded_gap_gdf

# Process the canopy gaps for the Zumwinkel area
canopy_gaps_calced = process_canopy_areas(canopy_gdf, zumwinkel_boundary, buffer_distance=5)

# Save the processed canopy gaps to a shapefile
canopy_gaps_calced.to_file('canopy_gaps_calced.shp')

# Reproject the canopy gaps back to EPSG:4326 for plotting
canopy_gaps_calced = canopy_gaps_calced.to_crs("EPSG:4326")
canopy_gaps_calced.hvplot(
    x='x',
    y='y',
    aspect='equal',
    geo=True,
    line_color='blue',
    line_width=2,
    width=600,
    height=600,
    tiles='EsriImagery',
    title="Processed Canopy Gaps from LIDAR"
)
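
Once gaps carry an acreage and a size category, a per-category summary is a natural next step for reporting spatial statistics. The sketch below uses the `Acreage` and `Gap_Size_Category` column names produced by `process_canopy_areas`, but the records are invented and the summary column names (`gap_count`, `total_acres`, `share_of_area`) are hypothetical.

```python
import pandas as pd

# Invented gap records using the column names produced by process_canopy_areas
gaps = pd.DataFrame({
    "Acreage": [0.05, 0.10, 0.30, 0.75, 2.00],
    "Gap_Size_Category": [
        "< 1/8 acre", "< 1/8 acre", "1/4 - 1/2 acre",
        "1/2 - 1 acre", "> 1 acre",
    ],
})

# Count gaps and total their acreage within each size class
summary = (
    gaps.groupby("Gap_Size_Category")["Acreage"]
    .agg(gap_count="count", total_acres="sum")
)
# Share of total gap area contributed by each size class
summary["share_of_area"] = summary["total_acres"] / summary["total_acres"].sum()
```

The same grouping should work directly on the GeoDataFrame returned by `process_canopy_areas`, since a GeoDataFrame supports the usual pandas `groupby` interface.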
