[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/UM-RMRS/raster_tools/blob/main/notebooks/EstimatingBaa.ipynb)

# Estimating BAA 
## This notebook demonstrates how to build a machine learning model with scikit-learn and apply that model to predictor surfaces to estimate basal area per acre (BAA).
### Author: John Hogland 3/21/2023

![image.png](attachment:b2a27742-ad20-4a2b-a855-2076c697238c.png)

## Install packages

In [None]:
!pip install --upgrade gdown
!pip install --upgrade numba
!pip install --upgrade geopandas
!pip install mapclassify
!pip install --upgrade datascience
!pip install --upgrade gym
!pip install --upgrade folium
!pip install plotly
!pip install laspy[lazrs,laszip]
!pip install raster_tools
!pip install stackstac
!pip install planetary_computer
!pip install pystac_client
!pip install leafmap xarray_leaflet
!pip install localtileserver

## Get supporting python file

In [None]:
import gdown

url = "https://drive.google.com/file/d/1dy7bnPKc4BPvHlH-PkrObXwb-SW9nv7n/view?usp=sharing"
outfl = r"./rs_las.py"
gdown.download(url=url, output=outfl, quiet=False, fuzzy=True)

## The Process
### In this notebook we use derivatives of Lidar data acquired from Microsoft's Planetary Computer, together with the dataframe developed in the [SampleDesign](https://github.com/UM-RMRS/raster_tools/blob/main/notebooks/SampleDesign.ipynb) notebook, to build a random forest model of basal area per acre (BAA) and predict mean BAA for each predictor raster cell. This notebook builds upon the [Sample Design](https://github.com/UM-RMRS/raster_tools/blob/main/notebooks/SampleDesign.ipynb) and [Processing](https://github.com/UM-RMRS/raster_tools/blob/main/notebooks/LidarProcessing.ipynb) notebooks at [Spatial Modeling Tutorials](https://github.com/UM-RMRS/raster_tools/tree/main/notebooks).


### Steps
1. Import Python libraries
2. Open the train.shp file and visualize it
3. Build a random forest model to predict BAA from Lidar predictor values
4. Apply BAA model to raster surfaces
5. Visualize BAA surface

## Step 1: Import libraries and get data
### Import libraries

In [None]:
from raster_tools import Raster, open_vectors, Vector, zonal

import os
import matplotlib.pyplot as plt
import numpy as np
import geopandas as gpd
import pandas as pd

from dask.diagnostics import ProgressBar
from shapely.geometry import shape
from shapely.ops import transform

import leafmap.leafmap as leafmap
from sklearn.ensemble import RandomForestRegressor

## Step 2: Open the training shapefile & visualize
### If you don't have the training shapefile, work through the [Sample Design](https://github.com/UM-RMRS/raster_tools/blob/main/notebooks/SampleDesign.ipynb) notebook first

In [None]:
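b_sample = open_vectors("./train.shp")
print(b_sample.bounds)  # extent of the training points
# keep the map as the last expression so the notebook displays it
b_sample.data.compute().explore(color="orange")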

## Step 3: Build Random Forest Model

In [None]:
import pickle
import matplotlib.pyplot as plt
import pandas

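gdf = b_sample.data.compute()  # materialize the lazy vector data as a GeoDataFrame
X = gdf[
    ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"]
]  # predictor columns only
y = gdf["BAA"]  # response as a 1-D Series, the shape scikit-learn expects

# build random forest model (each tree is trained on a bootstrap sample sized
# at 66% of the rows, 4 candidate features are considered at each split,
# 50 trees, store the out-of-bag (OOB) R squared)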
regr = RandomForestRegressor(
    max_features=4,
    n_estimators=50,
    max_samples=0.66,
    random_state=0,
    oob_score=True,
)
# fit the model and print the out-of-bag R squared
mdl = regr.fit(X, y)
print("Out of bag R squared = " + str(mdl.oob_score_))

# store the model for later use; the with block closes the file after writing
with open("baa_rf_mdl.mdl", "wb") as f:
    pickle.dump(mdl, f)

# visualize importance graphs
imp = mdl.feature_importances_
v_imp = pandas.Series(imp, index=X.columns)
fig, ax = plt.subplots(figsize=(14, 8))
std = np.std([tree.feature_importances_ for tree in mdl.estimators_], axis=0)
v_imp.plot.bar(yerr=std, ax=ax)
ax.set_title("Feature importances using MDI")
ax.set_ylabel("Mean decrease in impurity")
fig.tight_layout()
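
The out-of-bag score printed above is an R squared value. If an error in the units of the response is easier to interpret, an OOB RMSE can be derived from the fitted model's `oob_prediction_` attribute. The sketch below uses synthetic data in place of `train.shp`, so the sample size, coefficients, `demo_` names, and resulting numbers are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Synthetic stand-in for the training table: 200 plots, 11 predictors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 11))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

demo_rf = RandomForestRegressor(
    max_features=4,
    n_estimators=50,
    max_samples=0.66,
    random_state=0,
    oob_score=True,
).fit(X_demo, y_demo)

# oob_prediction_ holds, for each row, the mean prediction of the trees
# that did not see that row during training.
oob_pred = demo_rf.oob_prediction_
oob_rmse = float(np.sqrt(np.mean((y_demo - oob_pred) ** 2)))

# oob_score_ is the R squared between the observed values and oob_prediction_.
oob_r2 = r2_score(y_demo, oob_pred)
print(f"OOB R squared = {demo_rf.oob_score_:.3f}, OOB RMSE = {oob_rmse:.3f}")
```
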

### Exercise 1: Build Random Forest Model with 3 features, 100 estimators, and 75% of the data

## Step 4: Apply model to predictor surfaces and save the BAA raster

In [None]:
pred_rs = Raster("Lidar_30_metrics.tif")
est = pred_rs.predict_model(mdl)
est.save("BAA.tif")
est.xdata
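
Conceptually, applying a fitted model to a raster stack means treating every cell as one row of predictor values. The sketch below illustrates that idea with plain NumPy and scikit-learn on a synthetic stack. It is not the raster_tools implementation of `predict_model`, which operates on lazy, Dask-backed rasters; the stack size and the `demo_` names are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic training table: 100 plots, 11 predictors.
X_demo = rng.normal(size=(100, 11))
y_demo = X_demo[:, 0] + 0.5 * X_demo[:, 1]
demo_rf = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_demo, y_demo)

# Synthetic predictor stack laid out as (bands, rows, cols).
stack = rng.normal(size=(11, 4, 5))
n_bands, n_rows, n_cols = stack.shape

# Flatten to one row per cell and one column per band, predict, then
# reshape the estimates back to the raster grid.
cells = stack.reshape(n_bands, -1).T
est_demo = demo_rf.predict(cells).reshape(n_rows, n_cols)
print(est_demo.shape)
```

Band order matters here: the columns of `cells` must follow the same order as the predictor columns used to fit the model.
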

### Exercise 2:
- What is the difference between the OOB scores of random forest model 1 and model 2 (Exercise 1)?
- Which one is better?
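
One way to line up OOB scores for several configurations is to fit each one on the same data and collect `oob_score_` in a table. The sketch below does this on synthetic data; the two settings shown and the `demo_` names are placeholders, not the values asked for in the exercises.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the training table: 150 plots, 11 predictors.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(150, 11))
y_demo = 2.0 * X_demo[:, 0] + X_demo[:, 2] + rng.normal(scale=0.5, size=150)

# Placeholder settings to compare.
settings = [
    {"max_features": 2, "n_estimators": 25, "max_samples": 0.5},
    {"max_features": 6, "n_estimators": 75, "max_samples": 0.8},
]

rows = []
for params in settings:
    rf = RandomForestRegressor(oob_score=True, random_state=0, **params)
    rf.fit(X_demo, y_demo)
    rows.append({**params, "oob_r2": rf.oob_score_})

# A higher OOB R squared indicates a better out-of-bag fit.
comparison = pd.DataFrame(rows).sort_values("oob_r2", ascending=False)
print(comparison)
```
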

## Step 5: Visualize BAA surface

In [None]:
from localtileserver import TileClient, get_leaflet_tile_layer
from ipyleaflet import Map, LayersControl, basemaps, basemap_to_tiles

# Specify the name of the BAA raster saved in Step 4
outname = "BAA.tif"

# Create a TileClient from a raster file
client = TileClient(outname)

# Create ipyleaflet TileLayer from that server
t = get_leaflet_tile_layer(client, band=[1], name="BAA (ft squared/acre)")

# Create ipyleaflet map, add tile layer, and display
m = Map(center=client.center(), zoom=client.default_zoom)
wi = basemap_to_tiles(basemaps.Esri.WorldImagery)
wi.name = "ESRI World Imagery"
m.add(wi)
m.add(t)

# add the layer control
control = LayersControl(position="topright")
m.add_control(control)
m

# This ends the Estimate BAA notebook
## Check out the other notebooks:
- https://github.com/UM-RMRS/raster_tools/tree/main/notebooks
## References
- Raster-Tools GitHub: https://github.com/UM-RMRS/raster_tools
- Hogland's Spatial Solutions: https://sites.google.com/view/hoglandsspatialsolutions/home
- Dask: https://dask.org/
- Geopandas: https://geopandas.org/en/stable/
- Xarray: https://docs.xarray.dev/en/stable/
- Jupyter: https://jupyter.org/
- Anaconda: https://www.anaconda.com/
- VS Code: https://code.visualstudio.com/
- ipywidgets: https://ipywidgets.readthedocs.io/en/latest/
- numpy: https://numpy.org/
- matplotlib: https://matplotlib.org/
- folium: https://python-visualization.github.io/folium/
- pandas: https://pandas.pydata.org/
- sklearn: https://scikit-learn.org/stable/index.html