# Wildfire Severity Prediction Using Raster Tools and scikit-learn

# --- TODO ---
- NDVI download and processing scripts
- step through assembling the data
- step through the model

#### REQUIRED SOFTWARE
- Python
    - raster-tools
    - scikit-learn
    - tqdm
    - numpy
    - dask
    - geopandas
    - dask_geopandas

- Command-line tools
    - gdal
    - cdo
    - wget
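Before starting, it can help to confirm that everything listed above is installed. The sketch below is one way to check from Python. It assumes the import names `raster_tools`, `sklearn`, and `dask_geopandas` for the raster-tools, scikit-learn, and dask-geopandas packages, and it treats the `gdal_translate` executable as the indicator that GDAL is on the `PATH`. `check_environment` is a hypothetical helper, not part of the project.

```python
import importlib.util
import shutil

# Assumed import names for the Python packages listed above
PYTHON_PACKAGES = ["raster_tools", "sklearn", "tqdm", "numpy", "dask", "geopandas", "dask_geopandas"]
# Executables called by the download and processing scripts
CLI_TOOLS = ["gdal_translate", "cdo", "wget"]


def check_environment(packages=PYTHON_PACKAGES, tools=CLI_TOOLS):
    """Return {name: True/False}: importable for packages, found on PATH for tools."""
    status = {}
    for name in packages:
        # find_spec returns None when the module cannot be located (nothing is imported)
        status[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:16s} {'ok' if ok else 'MISSING'}")
```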

## 1. Data Collection

#### About 200 GB of free disk space is recommended to complete the Wildfire Severity Prediction process.

#### The data used by the scripts comes from multiple sources. The table below lists each source, where to download it, and which files to select.

<table><tr><th>Source</th><th>Link</th><th>Description</th></tr>
<tr><td>MTBS Fire Data</td><td>https://www.mtbs.gov/direct-download</td><td>Fire Bundles -> Burned Areas Boundaries & Burn Severity Mosaics -> 1986-2020 of desired state</td></tr>
<tr><td>DEM Data</td><td>https://earthexplorer.usgs.gov/</td><td>Data sets -> Digital elevation -> CONUS aspect, flow_acc, orig_dem, slope</td></tr>
<tr><td>gridMET Climate Data</td><td>https://www.climatologylab.org/gridmet.html</td><td>Use the provided scripts</td></tr>
<tr><td>AdaptWest Climate Data</td><td>https://adaptwest.databasin.org/pages/adaptwest-climatena/</td><td>Climate Normals -> 1991-2020 period -> 33 Bioclimatic variables zip</td></tr>
<tr><td>DayMet Climate Data</td><td>https://daac.ornl.gov/cgi-bin/dataset_lister.pl?p=32</td><td>Use the provided scripts</td></tr>
<tr><td>Landfire Data</td><td>https://www.landfire.gov/version_download.php</td><td>LF 2016 Remap -> Fuel Veg Type 2020 & 40 Scott/Burgan Fuel Models 2020</td></tr>
<tr><td>Biomass Data</td><td>https://rangelands.app/products/</td><td>Use the provided scripts</td></tr>
<tr><td>NDVI Data</td><td>https://www.ncei.noaa.gov/data/land-normalized-difference-vegetation-index/access/</td><td>Need 1986-2020, either manually or using the provided scripts</td></tr>
<tr><td>State Borders</td><td>https://www2.census.gov/geo/tiger/GENZ2018/shp/</td><td>Need cb_2018_us_state_5m.zip</td></tr>
</table>

##### Several of the datasets can be downloaded with scripts. For the biomass data, the shell script below downloads each year of the Rangeland Analysis Platform (RAP) biomass product, clips it to the area of interest, and converts it to netCDF. Copy the script into a file called biomass_dl.sh and run it from the command line (for example, `bash biomass_dl.sh`).

```bash
#!/usr/bin/env bash
# Description: Download biomass data from the Rangeland Analysis Platform (RAP), clip it to the
# area of interest, and convert it to netCDF
# Change the values below to match the desired area of interest (bounding box in degrees)
STATE=Oregon
LONG_MIN=-124.85
LONG_MAX=-116.33
LAT_MIN=41.86
LAT_MAX=46.23

# Loop through the years 2020 down to 1986
for year in $(seq 2020 -1 1986); do
  # -projwin takes the upper-left x/y followed by the lower-right x/y
  gdal_translate -co compress=lzw -co tiled=yes -co bigtiff=yes \
    -projwin ${LONG_MIN} ${LAT_MAX} ${LONG_MAX} ${LAT_MIN} \
    /vsicurl/http://rangeland.ntsg.umt.edu/data/rap/rap-vegetation-biomass/v3/vegetation-biomass-v3-${year}.tif \
    out${year}_${STATE}.tif
  # Convert the clipped GeoTIFF to netCDF-4
  gdal_translate -of netCDF -co "FORMAT=NC4" out${year}_${STATE}.tif ${year}_biomass_${STATE}.nc
  # Remove only this year's intermediate GeoTIFF
  rm out${year}_${STATE}.tif
done
```
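If you would rather drive GDAL from the notebook than from a shell script, the same clipping step can be assembled as an argument list and passed to `subprocess.run`. This is a sketch, not part of the original workflow: `clip_command` is a hypothetical helper, and the URL pattern and bounding box are copied from the script above. It also makes the `-projwin` corner order explicit: upper-left longitude, upper-left latitude (the maximum), lower-right longitude, lower-right latitude (the minimum).

```python
import shlex

# URL pattern copied from the biomass script above
RAP_URL = ("/vsicurl/http://rangeland.ntsg.umt.edu/data/rap/"
           "rap-vegetation-biomass/v3/vegetation-biomass-v3-{year}.tif")


def clip_command(year, state, lon_min, lon_max, lat_min, lat_max):
    """Build the gdal_translate argument list that clips one year of RAP biomass."""
    return [
        "gdal_translate",
        "-co", "compress=lzw", "-co", "tiled=yes", "-co", "bigtiff=yes",
        # -projwin expects: upper-left x, upper-left y, lower-right x, lower-right y
        "-projwin", str(lon_min), str(lat_max), str(lon_max), str(lat_min),
        RAP_URL.format(year=year),
        f"out{year}_{state}.tif",
    ]


cmd = clip_command(2020, "Oregon", -124.85, -116.33, 41.86, 46.23)
print(shlex.join(cmd))
# subprocess.run(cmd, check=True) would execute it (requires GDAL on PATH)
```

Passing a list to `subprocess.run` avoids shell quoting entirely, which is the main reason to prefer it over building one command string.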


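After the biomass netCDF files are ready, their dates need to be fixed: files converted from GeoTIFF carry no time axis. Save and run the following script, which stamps each file with 1 January of its year and splits it into one file per band.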

```bash
#!/usr/bin/env bash
# Description: Fix the dates of the biomass data and split each file into individual bands
# (Band1 = AFG, Band2 = PFG)

STATE=Oregon

for year in $(seq 2020 -1 1986); do
    # Divisible-by-4 is exact for 1986-2020 because 2000 is a leap year
    if [ $((year % 4)) -eq 0 ]; then
        year_days=366
    else
        year_days=365
    fi
    cdo settaxis,${year}-01-01,00:00:00,${year_days}days ${year}_biomass_${STATE}.nc ${year}_biomass_fixed.nc
    # splitvar writes one file per variable, e.g. ${STATE}_${year}_biomass_Band1.nc
    cdo splitvar ${year}_biomass_fixed.nc ${STATE}_${year}_biomass_
    rm ${year}_biomass_${STATE}.nc ${year}_biomass_fixed.nc
done
```
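The leap-year test in the script uses the simple divisible-by-4 rule. The check below, a small illustration rather than part of the pipeline, compares it with Python's `calendar.isleap` (the full Gregorian rule) over the study period.

```python
import calendar


def days_in_year(year):
    """Days in a Gregorian calendar year (full leap-year rule)."""
    return 366 if calendar.isleap(year) else 365


def days_divisible_by_4(year):
    """The shortcut used in the shell script: every fourth year is a leap year."""
    return 366 if year % 4 == 0 else 365


# The shortcut is only wrong for century years not divisible by 400 (e.g. 1900, 2100).
# 2000 is divisible by 400, so the two rules agree on every year from 1986 to 2020.
mismatches = [y for y in range(1986, 2021) if days_in_year(y) != days_divisible_by_4(y)]
print(mismatches)  # []
```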

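After the previous script has run, combine the per-year files. Run the commands below from the biomass directory; they move the Band1 (AFG) and Band2 (PFG) files into separate folders and concatenate each folder into a single 1986-2020 file.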

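```bash
#!/usr/bin/env bash
# Combine the per-year files into one 1986-2020 file per band (b1 = AFG, b2 = PFG)
STATE=Oregon
mkdir -p b1 b2
mv *Band1.nc b1/
mv *Band2.nc b2/
# The globs expand alphabetically, which for these file names is chronological order
cdo -f nc4 -z zip cat b1/*.nc 1986_2020_biomass_afg_${STATE}.nc
cdo -f nc4 -z zip cat b2/*.nc 1986_2020_biomass_pfg_${STATE}.nc
```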
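Because `cdo cat` concatenates files in the order the shell glob lists them, a missing year would silently produce a shorter time series. A minimal Python check is sketched below; it assumes the `<STATE>_<year>_biomass_Band<N>.nc` names produced by `cdo splitvar`, and `check_band_folder` is a hypothetical helper.

```python
import re
from pathlib import Path

# Matches names such as Oregon_1986_biomass_Band1.nc and captures the year
YEAR_RE = re.compile(r"_(\d{4})_biomass_Band\d+\.nc$")


def check_band_folder(folder, first=1986, last=2020):
    """Return (years in concatenation order, missing years) for one band folder."""
    # Sorting by name mirrors the alphabetical order the shell glob uses for these names
    files = sorted(Path(folder).glob("*.nc"))
    years = []
    for f in files:
        match = YEAR_RE.search(f.name)
        if match:
            years.append(int(match.group(1)))
    missing = sorted(set(range(first, last + 1)) - set(years))
    return years, missing


if __name__ == "__main__":
    for band in ("b1", "b2"):
        years, missing = check_band_folder(band)
        if missing:
            print(f"{band}: missing years {missing}")
        if years != sorted(years):
            print(f"{band}: files are not in chronological order")
```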

#### gridMET download script:

```bash
#!/usr/bin/env bash
# Download the yearly gridMET files for vapor pressure deficit (vpd), solar radiation (srad),
# and the Palmer Drought Severity Index (pdsi)
for year in $(seq 2020 -1 1986); do
    wget -nc -c -nd http://www.northwestknowledge.net/metdata/data/vpd_${year}.nc
    wget -nc -c -nd http://www.northwestknowledge.net/metdata/data/srad_${year}.nc
    wget -nc -c -nd http://www.northwestknowledge.net/metdata/data/pdsi_${year}.nc
done
```

#### DayMet download script:

```bash
#!/usr/bin/env bash
# Download the yearly DayMet v4 monthly averages of daily maximum and minimum temperature
for year in $(seq 2020 -1 1986); do
    wget -nc -c -nd https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/2131/daymet_v4_tmax_monavg_na_${year}.nc
    wget -nc -c -nd https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/2131/daymet_v4_tmin_monavg_na_${year}.nc
done
```
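The gridMET and DayMet `wget` loops can also be reproduced in Python with only the standard library. The sketch below builds the same URLs and skips files that already exist, similar to `wget -nc`; unlike `wget -c`, it does not resume interrupted downloads. The helper names (`build_urls`, `download`) and the destination folders in the usage comments are illustrative.

```python
import urllib.request
from pathlib import Path

# URL patterns copied from the wget loops above
GRIDMET_URL = "http://www.northwestknowledge.net/metdata/data/{var}_{year}.nc"
DAYMET_URL = ("https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/2131/"
              "daymet_v4_{var}_monavg_na_{year}.nc")


def build_urls(template, variables, first=1986, last=2020):
    """Return one URL per (year, variable), newest year first as in the shell loops."""
    return [template.format(var=var, year=year)
            for year in range(last, first - 1, -1)
            for var in variables]


def download(urls, dest="."):
    """Download each URL into dest, skipping files that already exist (like wget -nc)."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = dest / url.rsplit("/", 1)[-1]
        if target.exists():
            continue
        # Write to a temporary name first so an interrupted download is not
        # mistaken for a finished file on the next run
        partial = target.with_name(target.name + ".part")
        urllib.request.urlretrieve(url, partial)
        partial.replace(target)


# Example usage (downloads many gigabytes):
# download(build_urls(GRIDMET_URL, ["vpd", "srad", "pdsi"]), "gridmet")
# download(build_urls(DAYMET_URL, ["tmax", "tmin"]), "daymet")
```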

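Move each variable's yearly files into its own directory (for example vpd/, srad/, pdsi/, tmax/, and tmin/). Then, from inside each directory, use the commands below to concatenate the yearly files into a single 1986-2020 file. The daily gridMET variables are also averaged into weekly values; the DayMet files are already monthly averages and are only concatenated.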

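```bash
# gridMET: run in each variable's directory with VAR set to vpd, srad, or pdsi
VAR=vpd
cdo -f nc4 -z zip cat ${VAR}_[0-9][0-9][0-9][0-9].nc ${VAR}_1986-2020.nc
# Mean of each block of 7 consecutive daily time steps
cdo -f nc4 -z zip timselmean,7 ${VAR}_1986-2020.nc ${VAR}_1986-2020_weekly.nc
# DayMet: run in each variable's directory with VAR set to tmax or tmin
VAR=tmax
cdo -f nc4 -z zip cat daymet_v4_${VAR}_monavg_na_*.nc ${VAR}_1986-2020.nc
```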
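To make the weekly step concrete: `timselmean,7` averages each run of 7 consecutive time steps, counted from the first step of the concatenated file, so the "weeks" are not aligned to calendar weeks and do not restart at each new year. The NumPy sketch below reproduces that grouping on an array whose first axis is time. How it treats a final incomplete group (averaging whatever steps remain) is an assumption of this sketch and may differ from CDO's behavior; `block_mean` is a hypothetical helper.

```python
import numpy as np


def block_mean(values, n=7):
    """Mean over consecutive, non-overlapping groups of n steps along axis 0 (time)."""
    values = np.asarray(values, dtype=float)
    starts = np.arange(0, values.shape[0], n)
    # np.add.reduceat sums each slice [starts[i], starts[i + 1]) along axis 0
    sums = np.add.reduceat(values, starts, axis=0)
    # Number of steps in each group (the last group may be shorter than n)
    counts = np.diff(np.append(starts, values.shape[0]))
    return sums / counts.reshape((-1,) + (1,) * (values.ndim - 1))


daily = np.arange(1, 16)  # 15 daily values: 1, 2, ..., 15
print(block_mean(daily).tolist())  # [4.0, 11.0, 15.0]
```

The same function works on a `(time, y, x)` grid, averaging each pixel's time series independently.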

## 2. Building the Dataset

#### The script that builds the dataset is located at `Fire_Prediction/build_mtbs_dataframe.py`.

##### Once the data has been collected, the dataset can be built. The script expects each data source in a specific folder, as follows:

<table><tr><th>Data Source</th><th>Folder</th></tr>
<tr><td>MTBS Data</td><td>data/MTBS_Data</td></tr>
<tr><td>DEM Data</td><td>data/terrain</td></tr>
<tr><td>GridMet Climate Data</td><td>data/FeatureData/gridmet</td></tr>
<tr><td>DayMet Climate Data</td><td>data/FeatureData/daymet</td></tr>
<tr><td>AdaptWest Climate Data</td><td>data/FeatureData/adaptwest</td></tr>
<tr><td>Landfire Fuel Data</td><td>data/FeatureData/landfire</td></tr>
<tr><td>Biomass Data</td><td>data/FeatureData/biomass</td></tr>
</table>
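A quick way to confirm the layout before running `build_mtbs_dataframe.py` is to check that each folder exists and is not empty. This is a sketch: the folder names come from the table above, and `missing_dirs` is a hypothetical helper.

```python
from pathlib import Path

# Folder layout from the table above, relative to the project root
REQUIRED_DIRS = {
    "MTBS Data": "data/MTBS_Data",
    "DEM Data": "data/terrain",
    "GridMet Climate Data": "data/FeatureData/gridmet",
    "DayMet Climate Data": "data/FeatureData/daymet",
    "AdaptWest Climate Data": "data/FeatureData/adaptwest",
    "Landfire Fuel Data": "data/FeatureData/landfire",
    "Biomass Data": "data/FeatureData/biomass",
}


def missing_dirs(root="."):
    """Return the names of data sources whose folder is missing or empty under root."""
    missing = []
    for name, rel in REQUIRED_DIRS.items():
        folder = Path(root) / rel
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_dirs() or "All data folders are in place")
```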

##### The script writes a Dask DataFrame to Parquet. The dataframe has 21 columns: the target variable, `mtbs` (the MTBS severity rating), and 20 features. The columns are as follows:

<table><tr><th>Feature</th><th>Column Name</th><th>Temporal Range</th></tr>
<tr><td>MTBS Severity Rating (target)</td><td>mtbs</td><td>Constant</td></tr>
<tr><td>MTBS Fire Year</td><td>year</td><td>Constant</td></tr>
<tr><td>DEM elevation</td><td>dem</td><td>Constant</td></tr>
<tr><td>DEM slope</td><td>dem_slope</td><td>Constant</td></tr>
<tr><td>DEM aspect</td><td>dem_aspect</td><td>Constant</td></tr>
<tr><td>DEM flow accumulation</td><td>dem_flow_acc</td><td>Constant</td></tr>
<tr><td>DEM hillshade</td><td>hillshade</td><td>Constant</td></tr>
<tr><td>GridMet Palmer Drought Severity Index</td><td>gm_pdsi</td><td>Weekly avg</td></tr>
<tr><td>GridMet Solar Radiation</td><td>gm_srad</td><td>Weekly avg</td></tr>
<tr><td>GridMet Vapor Pressure Deficit</td><td>gm_vpd</td><td>Weekly avg</td></tr>
<tr><td>DayMet Temp Max</td><td>dm_tmax</td><td>Monthly avg</td></tr>
<tr><td>DayMet Temp Min</td><td>dm_tmin</td><td>Monthly avg</td></tr>
<tr><td>AdaptWest Mean Annual Temp</td><td>aw_mat</td><td>1991-2020 normal</td></tr>
<tr><td>AdaptWest Mean Temp Warmest Month</td><td>aw_mwmt</td><td>1991-2020 normal</td></tr>
<tr><td>AdaptWest Mean Temp Coldest Month</td><td>aw_mcmt</td><td>1991-2020 normal</td></tr>
<tr><td>AdaptWest Temp Difference (aw_mwmt minus aw_mcmt)</td><td>aw_td</td><td>1991-2020 normal</td></tr>
<tr><td>Landfire Vegetation Type</td><td>landfire_fvt</td><td>2020 update</td></tr>
<tr><td>Landfire Fuel Model</td><td>landfire_fbfm40</td><td>2020 update</td></tr>
<tr><td>Biomass Annuals</td><td>biomass_afg</td><td>Yearly avg</td></tr>
<tr><td>Biomass Perennials</td><td>biomass_pfg</td><td>Yearly avg</td></tr>
<tr><td>Normalized Difference Vegetation Index</td><td>ndvi</td><td>Weekly avg</td></tr>
</table>
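When loading the Parquet output for modeling, the column list above can be used to validate the dataframe and separate the features from the target. The sketch below assumes the column names exactly as listed; `check_columns` is a hypothetical helper, and the Parquet path in the commented Dask example is a placeholder.

```python
# Column names from the table above; "mtbs" (the severity rating) is the target
COLUMNS = ["mtbs", "year", "dem", "dem_slope", "dem_aspect", "dem_flow_acc", "hillshade",
           "gm_pdsi", "gm_srad", "gm_vpd", "dm_tmax", "dm_tmin",
           "aw_mat", "aw_mwmt", "aw_mcmt", "aw_td",
           "landfire_fvt", "landfire_fbfm40", "biomass_afg", "biomass_pfg", "ndvi"]
TARGET = "mtbs"
FEATURES = [c for c in COLUMNS if c != TARGET]


def check_columns(columns):
    """Compare a dataframe's columns with the expected layout; return (missing, unexpected)."""
    columns = list(columns)
    missing = [c for c in COLUMNS if c not in columns]
    unexpected = [c for c in columns if c not in COLUMNS]
    return missing, unexpected


# Example with Dask (the Parquet path is a placeholder):
# import dask.dataframe as dd
# df = dd.read_parquet("path/to/mtbs_dataset.parquet")
# print(check_columns(df.columns))
# X, y = df[FEATURES], df[TARGET]
```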

## 3. Model Training

## 4. Model Evaluation