# Data and styles

This notebook loads all data required for the exploration, analysis, and modeling, then stores it for use by other notebooks using [storemagic](https://ipython.readthedocs.io/en/stable/config/extensions/storemagic.html). Make sure to run this notebook before using any of the other notebooks in this repository.

To clear all data from the storemagic cache, use `%store -z`. Note that this clears *all* data from the cache, not just the data created by this notebook.

In [None]:
import os

import earthpy as et
import geopandas as gpd
from matplotlib.colors import ListedColormap
import numpy as np
from rioxarray.merge import merge_arrays
import xarray as xr

from ea_drought_burn.config import DATA_DIR
from ea_drought_burn.config import CRS
from ea_drought_burn.utils import (
    concat_arrays,
    open_raster,
    reproject_match
)




# Set working directory to the earthpy data directory
os.chdir(os.path.join(DATA_DIR, "woolsey-fire"))

In [None]:
# Delete ready variable. This will be recreated when the script finishes.
%store -d woolsey_data_ready

## Load data

### Fire perimeters

The Woolsey and Hill Fire perimeters are provided courtesy of the National Interagency Fire Center (2019). There are no known restrictions on the perimeter shapefiles.

In [None]:
# Load the Woolsey Fire perimeter
woolsey_fire = gpd.read_file(os.path.join("inputs",
                                          "shapefiles",
                                          "nifc_woolsey_perimeter",
                                          "2018-CAVNC-091023.shp")).to_crs(CRS)

# Load the Hill Fire perimeter
hill_fire = gpd.read_file(os.path.join("inputs",
                                       "shapefiles",
                                       "nifc_hill_perimeter",
                                       "2018-CAVNC-090993.shp")).to_crs(CRS)

# Combine perimeters into a single dataframe. DataFrame.append was removed in
# pandas 2.0, so use pd.concat instead.
import pandas as pd
hill_and_woolsey_fires = pd.concat([woolsey_fire, hill_fire])

# Store data for use in other notebooks
%store hill_fire
%store woolsey_fire
%store hill_and_woolsey_fires

### Reprojection raster

All raster data is reprojected to a common projection, defined by the `reproj_to` variable in the next cell. Currently, all rasters are reprojected to match the MTBS classified dNBR raster. See **MTBS burn severity data** below for more information about MTBS data.

**Note:** The entirety of the Hill Fire falls within the Woolsey Fire envelope, so a raster cropped to the Woolsey envelope covers both fires.

In [None]:
# Load raster to use for reprojections
path = os.path.join("inputs",
                    "mtbs-burn-severity",
                    "ca3424011870020181108",
                    "ca3424011870020181108_20171215_20181215_dnbr6.tif")
reproj_to = open_raster(path, crs=CRS, crop_bound=woolsey_fire.envelope)
reproj_to = reproj_to.where(reproj_to != -9999, np.nan)
reproj_to = reproj_to.rio.write_nodata(np.nan)

# Reproject to a different resolution
#reproj_to = reproj_to.rio.reproject(reproj_to.rio.crs, 120)

# Set resolution based on reproj_to
res = int(reproj_to.rio.resolution()[0])

# Store data for use in other notebooks
%store reproj_to
%store res

### PRISM grid

The PRISM grid raster is used to aggregate other data to the exact grid used by PRISM. This allows high-resolution data to be aggregated to the resolution of the climate data.

In [None]:
# Load the PRISM grid
prism_grid = open_raster(
    os.path.join("inputs", "masks", "prism_grid.tif"),
    crs=CRS,
    crop_bound=woolsey_fire.envelope,
    masked=False
)
prism_grid = reproject_match(prism_grid, reproj_to)

# Store data for use in other notebooks
%store prism_grid
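
As a rough illustration of the aggregation idea, the sketch below averages a high-resolution array within coarse grid cells using only NumPy. It assumes that every pixel carries an integer ID for the coarse cell it falls in. Whether `prism_grid` encodes cells this way is an assumption, and `aggregate_to_grid` is a hypothetical helper that is not part of `ea_drought_burn`.

```python
import numpy as np


def aggregate_to_grid(values, grid_ids):
    """Average values within each grid cell and broadcast back to pixels.

    Hypothetical helper: assumes grid_ids holds a non-negative integer
    cell ID for every pixel and has the same shape as values.
    """
    flat_vals = np.asarray(values, dtype=float).ravel()
    flat_ids = np.asarray(grid_ids).ravel()
    n_cells = flat_ids.max() + 1

    # Ignore NaN pixels when summing and counting
    valid = np.isfinite(flat_vals)
    sums = np.bincount(flat_ids[valid], weights=flat_vals[valid],
                       minlength=n_cells)
    counts = np.bincount(flat_ids[valid], minlength=n_cells)

    # Cells with no valid pixels end up as NaN
    with np.errstate(divide="ignore", invalid="ignore"):
        means = sums / counts
    return means[flat_ids].reshape(np.shape(values))


# Toy 4 x 4 raster split into four 2 x 2 grid cells
toy_values = np.arange(16, dtype=float).reshape(4, 4)
toy_grid_ids = np.repeat(np.repeat(np.array([[0, 1], [2, 3]]), 2, axis=0),
                         2, axis=1)
aggregated = aggregate_to_grid(toy_values, toy_grid_ids)
```

Every pixel in `aggregated` holds the mean of the coarse cell that contains it, so the output keeps the high-resolution shape while varying only at the coarse resolution.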

### Sentinel-2 imagery (ESA)

Sentinel-2 (ESA) imagery from before and after the Woolsey Fire was originally collected by the European Space Agency and was provided courtesy of the U.S. Geological Survey. The data below was collected on 31 Oct 2018 (roughly one week before the Woolsey Fire started) and 15 Dec 2018 (three to four weeks after the Woolsey Fire was contained) and has 30 m resolution.

In [None]:
# Load pre-fire data from Sentinel-2 (ESA)
path = os.path.join("inputs",
                    "sentinel-2-imagery",
                    "3_aligned",
                    "L1C_T11SLT_A008633_20181031T184032.tif")
s2_prefire = reproject_match(open_raster(path), reproj_to)

# Load post-fire data from Sentinel-2 (ESA)
path = os.path.join("inputs",
                    "sentinel-2-imagery",
                    "3_aligned",
                    "L1C_T11SLT_A018185_20181215T184316.tif")
s2_postfire = reproject_match(open_raster(path), reproj_to)

# Name each band in the Sentinel-2 imagery. Comments give the closest
# Landsat 8 (L8) band where one exists.
ultrablue = s2_prefire[0]   # L8 band 1
blue = s2_prefire[1]        # L8 band 2
green = s2_prefire[2]       # L8 band 3
red = s2_prefire[3]         # L8 band 4
nir_705 = s2_prefire[4]
nir_740 = s2_prefire[5]
nir_783 = s2_prefire[6]
nir_842 = s2_prefire[7]
nir_865 = s2_prefire[8]     # L8 band 5
swir_940 = s2_prefire[9]
swir_1375 = s2_prefire[10]  # L8 band 9
swir_1610 = s2_prefire[11]  # L8 band 6
swir_2190 = s2_prefire[12]  # L8 band 7

# Calculate pre-fire spectral indices
s2_ndvi = (nir_842 - red) / (nir_842 + red)
s2_ndwi = (nir_865 - swir_1610) / (nir_865 + swir_1610)
s2_ndmi = (nir_842 - swir_1610) / (nir_842 + swir_1610)
s2_savi = 1.5 * (nir_842 - red) / (nir_842 + red + 0.5)

# Calculate dNBR from Sentinel-2 data
nir = s2_prefire[8]
swir2 = s2_prefire[12]
pre_nbr = (nir - swir2) / (nir + swir2)

nir = s2_postfire[8]
swir2 = s2_postfire[12]
post_nbr = (nir - swir2) / (nir + swir2)

s2_dnbr = pre_nbr - post_nbr

# Calculate classified dNBR from Sentinel-2 data
bins = [-np.inf, -.1, .1, .27, .66, np.inf]
s2_cl_dnbr = xr.apply_ufunc(np.digitize, s2_dnbr, bins)

s2_cl_dnbr = s2_cl_dnbr.astype(np.float64)
s2_cl_dnbr = s2_cl_dnbr.rio.write_nodata(np.nan)
s2_cl_dnbr = s2_cl_dnbr.rio.clip(hill_and_woolsey_fires.geometry, drop=False)

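# Reorder categories to match MTBS data (1-4 = unburned to high severity,
# 5 = increased greenness). After subtracting 1, increased greenness is 0,
# so move it to 5.
s2_cl_dnbr -= 1
s2_cl_dnbr = s2_cl_dnbr.where((s2_cl_dnbr > 0) | ~np.isfinite(s2_cl_dnbr), 5)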

# Store data for use in other notebooks
%store s2_prefire
%store s2_postfire
%store s2_ndvi
%store s2_ndwi
%store s2_ndmi
%store s2_savi
%store s2_dnbr
%store s2_cl_dnbr
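
The dNBR calculation and classification above can be sketched on a few toy pixels with plain NumPy. The thresholds are copied from the cell above. The reflectance values are made up for illustration, the helper names are hypothetical, and the final ordering assumes the MTBS convention of classes 1-4 for unburned to high severity and 5 for increased greenness.

```python
import numpy as np

# Thresholds copied from the cell above
BINS = [-np.inf, -0.1, 0.1, 0.27, 0.66, np.inf]


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    total = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total != 0, (a - b) / total, np.nan)


def classify_dnbr(dnbr):
    """Bin dNBR values and reorder so increased greenness is class 5."""
    dnbr = np.asarray(dnbr, dtype=float)
    # After subtracting 1: 0 = increased greenness, 1-4 = severity classes
    classes = np.digitize(dnbr, BINS) - 1.0
    classes[classes == 0] = 5
    # np.digitize puts NaN in the last bin, so restore NaN explicitly
    classes[~np.isfinite(dnbr)] = np.nan
    return classes


# Made-up reflectances for three pixels: burned, unchanged, greener
pre_nir, pre_swir2 = [0.40, 0.30, 0.20], [0.10, 0.20, 0.20]
post_nir, post_swir2 = [0.15, 0.30, 0.40], [0.25, 0.20, 0.10]

toy_dnbr = (normalized_difference(pre_nir, pre_swir2)
            - normalized_difference(post_nir, post_swir2))
toy_classes = classify_dnbr(toy_dnbr)
```

With these inputs the three pixels fall into classes 4 (high), 1 (unburned to low), and 5 (increased greenness).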

### SRTM topographic data

Topographic data is from NASA's Shuttle Radar Topography Mission (SRTM) and was downloaded from USGS Earth Explorer. Aspect and slope were extracted from the DEM using the [richdem](https://richdem.readthedocs.io/en/latest/python_api.html) library. There are no known restrictions on this data.

In [None]:
# Load USGS elevation data
path = os.path.join("inputs", "srtm-topography", "smm-30m-dem.tif")
elevation = open_raster(path)
elevation = reproject_match(elevation, reproj_to)

# Load USGS slope data
path = os.path.join("inputs", "srtm-topography", "smm-30m-slope.tif")
slope = open_raster(path)
slope = reproject_match(slope, reproj_to)

# Load USGS aspect data
path = os.path.join("inputs", "srtm-topography", "smm-30m-aspect.tif")
aspect = open_raster(path)
aspect = reproject_match(aspect, reproj_to)

# Calculate folded aspect (McCune and Keon, 2002)
folded_aspect = np.absolute(180 - np.absolute(aspect - 225))

# Store data for use in other notebooks
%store elevation
%store slope
%store aspect
%store folded_aspect
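
The folded aspect formula used above maps compass aspect onto a 0-180 scale that is symmetric about the northeast-southwest axis. A minimal sketch in plain Python, with `fold_aspect` as a hypothetical helper name:

```python
def fold_aspect(aspect_deg):
    """Fold aspect (degrees clockwise from north) about the NE-SW axis."""
    return abs(180 - abs(aspect_deg - 225))


# Northeast-facing slopes fold to 0, southwest-facing slopes to 180
folded = {
    name: fold_aspect(aspect_deg)
    for name, aspect_deg in [("NE", 45), ("SE", 135), ("SW", 225), ("NW", 315)]
}
```

Southeast and northwest aspects both fold to 90, which is the point of the transformation: slopes that are equally far from the southwest direction get the same value.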

### MTBS burn severity data

This section loads a set of normalized, field-validated burn severity maps for the Woolsey Fire created by the Monitoring Trends in Burn Severity Program (MTBS), an interagency program managed by the USGS and USDA (Eidenshink et al., 2007). There are no known restrictions on MTBS data. The data below has 30 m resolution and was calculated using Sentinel-2 imagery from 15 Dec 2017 and 15 Dec 2018.

In [None]:
# Find MTBS rasters for all fires (currently just Woolsey and Hill)
dnbr = []
cl_dnbr = []
for root, dirs, files in os.walk(os.path.join("inputs", "mtbs-burn-severity")):
    for fn in files:
        if fn.endswith("_dnbr.tif"):
            dnbr.append(os.path.join(root, fn))
        elif fn.endswith("_dnbr6.tif"):
            cl_dnbr.append(os.path.join(root, fn))
    
# Create combined dNBR array
xdas = []
for path in dnbr:
    xda = open_raster(path, crs=CRS)
    xda = xda.rio.write_nodata(np.nan)
    
    # The rasters have different nodata values! Weird!
    xda = xda.where(xda > -9999, np.nan)
    xda /= 1000  # MTBS pixels are scaled by 1000
    xda = reproject_match(xda, reproj_to)

    xdas.append(xda)

# merge_arrays keeps the first valid value it finds, so sort ascending by
# number of valid pixels to put the smaller raster on top
xdas.sort(key=lambda xda: xda.count())
mtbs_dnbr = reproject_match(merge_arrays(xdas), reproj_to)

# Create combined classified dNBR array
xdas = []
for path in cl_dnbr:
    xda = open_raster(path, crs=CRS)
    xda = xda.rio.write_nodata(np.nan)
    
    # Mask anything that isn't a valid burn severity (1-5)
    xda = xda.where((xda > 0) & (xda < 6), np.nan)
    xda = reproject_match(xda, reproj_to)
    
    xdas.append(xda)

# merge_arrays keeps the first valid value it finds, so sort ascending by
# number of valid pixels to put the smaller raster on top
xdas.sort(key=lambda xda: xda.count())
mtbs_cl_dnbr = reproject_match(merge_arrays(xdas), reproj_to)
mtbs_cl_dnbr = mtbs_cl_dnbr.rio.write_nodata(np.nan)
mtbs_cl_dnbr = mtbs_cl_dnbr.rio.clip(hill_and_woolsey_fires.geometry,
                                     drop=False)

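# Create a burned/unburned variant of the classified dNBR array. Classes 1, 2,
# and 5 (unburned to low, low, increased greenness) become 2; classes 3 and 4
# (moderate, high) become 3. Each condition is tested against the original
# classes so that one replacement cannot override another.
mtbs_cl_burned = mtbs_cl_dnbr.copy()
mtbs_cl_burned = mtbs_cl_burned.where(~mtbs_cl_dnbr.isin([1, 2, 5]), 2)
mtbs_cl_burned = mtbs_cl_burned.where(~mtbs_cl_dnbr.isin([3, 4]), 3)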
mtbs_cl_burned = mtbs_cl_burned.rio.clip(hill_and_woolsey_fires.geometry,
                                         drop=False)
mtbs_cl_burned = mtbs_cl_burned.where(np.isfinite(mtbs_cl_dnbr))

# Store data for use in other notebooks
%store mtbs_dnbr
%store mtbs_cl_dnbr
%store mtbs_cl_burned
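
The masking, rescaling, and merging steps above can be sketched with NumPy alone. This is an illustration under simplifying assumptions: both toy rasters already share the same grid, and `mask_and_scale` and `merge_first_wins` are hypothetical helpers that stand in for the `xarray` and `rioxarray` calls in the cell above rather than reproducing them.

```python
import numpy as np


def mask_and_scale(raw, nodata=-9999, scale=1000):
    """Set pixels at or below the nodata value to NaN, then rescale."""
    arr = np.asarray(raw, dtype=float)
    return np.where(arr > nodata, arr / scale, np.nan)


def merge_first_wins(layers):
    """Merge aligned layers, keeping the first valid value per pixel.

    Layers are sorted ascending by number of valid pixels so that the
    smallest layer ends up on top.
    """
    ordered = sorted(layers, key=lambda a: np.count_nonzero(np.isfinite(a)))
    merged = np.full(ordered[0].shape, np.nan)
    for layer in ordered:
        merged = np.where(np.isnan(merged), layer, merged)
    return merged


# A large raster and a small raster that overlap in one pixel
large = mask_and_scale([[100, 200], [300, -9999]])
small = mask_and_scale([[-9999, 900], [-9999, -9999]])
merged = merge_first_wins([large, small])
```

In the overlapping pixel the value from the smaller raster is kept, and pixels that are nodata in both inputs stay NaN.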

### Santa Monica Mountains vegetation and climate

This stack contains vegetation and climate data for the Santa Monica Mountains, California, for 2013-2016. It combines data from the NASA AVIRIS sensor (15.6 m resolution) and the PRISM Gridded Climate dataset (4 km resolution). All PRISM data in the SMM stack was aggregated to the water year (Oct 1-Sep 30), not the calendar year. The compiled data was provided by Natasha Stavros and was originally created for a project about drought in the Santa Monica Mountains (Dagit et al., 2017 and Foster et al., 2017).

In [None]:
# Load SMM stack
smm_stack = open_raster(
    os.path.join("inputs", "aviris-climate-vegetation", "SMMDroughtstack.dat"),
    crs=CRS
)
smm_stack = smm_stack.rio.write_nodata(np.nan)
smm_stack = smm_stack.where((smm_stack >= -1e38) & (smm_stack != -9999))
smm_stack = reproject_match(smm_stack, reproj_to)

# Set pixel=0 to NaN in community plot
smm_stack[0] = smm_stack[0].where(smm_stack[0] > 0, np.nan)

# Store data for use in other notebooks
%store smm_stack

In [None]:
# Make calculations based on FAL
fal = smm_stack[1:5]

# Calculate fraction dead
fdd = 1 - fal

# Calculate year-on-year difference in FAL (dFAL)
dfal = concat_arrays([fal[i+1] - fal[i] for i in range(3)])

# Store data for use in other notebooks
%store fal
%store fdd
%store dfal
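
For a concrete picture of the three arrays above, the sketch below repeats the calculations on a made-up FAL stack of four years by two pixels. The values are illustrative only, and `np.diff` is used here as a NumPy equivalent of the `fal[i+1] - fal[i]` differences.

```python
import numpy as np

# Toy FAL stack: rows are years (2013-2016), columns are pixels
toy_fal = np.array([
    [0.9, 0.6],
    [0.7, 0.6],
    [0.4, 0.7],
    [0.3, 0.8],
])

# Complement of FAL, as in the fdd calculation above
toy_fdd = 1 - toy_fal

# Year-on-year change in FAL: one fewer layer than the input
toy_dfal = np.diff(toy_fal, axis=0)
```

The first pixel declines every year, so all of its dFAL values are negative. The second pixel holds steady and then recovers, so its dFAL values are zero or positive.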

In [None]:
# Threshold is the point below which a pixel is considered dead. The field
# constraint on this value is poor and was calculated only for oaks (0.5431).
threshold = 0.5

# Calculate dead pixels
dead = xr.where(fal < threshold, True, False)

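# Calculate years dead. This is the point at which a pixel permanently dropped
# to or below the live/dead threshold. Values run from 2 (at or below the
# threshold in 2016 only) to 5 (at or below the threshold in all four years).
# Pixels above the threshold in 2016 are 0.
years_dead = xr.where(fal[3] <= threshold, 2, 0)
for i, xda in enumerate([fal[2], fal[1], fal[0]]):
    # Only extend the count for pixels that were dead in every later year
    mask = (
        xr.where(years_dead == i + 2, True, False)
        * xr.where(xda <= threshold, True, False)
    )
    years_dead = xr.where(mask.values, i + 3, years_dead)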
years_dead = years_dead.where(np.isfinite(smm_stack[0]))

# Calculate FAL on a 0-1 scale only for pixels that are alive. This treats
# dead pixels as dead but puts live pixels on a gradient of aliveness. I
# kept this metric in but it wasn't much use.
live_fal = xr.where(fal[3] > threshold,
                    (fal[3] - threshold) * (1 / (1 - threshold)),
                    0)
live_fal = live_fal.where(np.isfinite(smm_stack[0]))
live_fal = live_fal.rio.write_crs(CRS)

# Store data for use in other notebooks
%store dead
%store years_dead
%store live_fal
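
The years-dead metric is the least obvious calculation in this notebook, so here is a NumPy sketch of one reading of it: count the unbroken run of years, ending with the most recent, in which a pixel sits at or below the threshold. The starting value of 2 mirrors the cell above. `count_years_dead` is a hypothetical helper and the FAL values are made up.

```python
import numpy as np


def count_years_dead(fal, threshold=0.5, start=2):
    """Count consecutive dead years, working back from the latest year.

    fal is ordered oldest to newest along the first axis. Pixels that are
    alive in the latest year get 0; otherwise the count starts at `start`
    and grows by one for each earlier year that is also dead.
    """
    fal = np.asarray(fal, dtype=float)
    below = fal <= threshold
    years = np.zeros(fal.shape[1:], dtype=int)
    still_dead = np.ones(fal.shape[1:], dtype=bool)
    for offset, year_below in enumerate(below[::-1]):
        # A single live year ends the run for good
        still_dead &= year_below
        years[still_dead] = start + offset
    return years


# Rows are years, columns are four example pixels
toy_fal = np.array([
    [0.9, 0.4, 0.4, 0.3],  # 2013
    [0.8, 0.6, 0.4, 0.3],  # 2014
    [0.7, 0.4, 0.6, 0.3],  # 2015
    [0.6, 0.3, 0.4, 0.3],  # 2016
])
toy_years_dead = count_years_dead(toy_fal)
```

The second pixel shows why the run has to be unbroken: it was below the threshold in 2013 but recovered in 2014, so only 2015 and 2016 count toward its value.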

### PRISM data from 2018

Grass and shrub fires are fuel-limited and may be driven by precipitation, so I downloaded PRISM climate data from 2018 to see how precipitation and other climate data worked as inputs for the random-forest model. I didn't recalculate the climate data to the water year, so this array cannot be compared directly to the climate data from the SMM stack.

In [None]:
# Load PRISM climate data from 2018
prism_2018 = open_raster(
    os.path.join("inputs", "prism-climate", "prism_all_stable_4km_m3_2018.tif"),
    crs=CRS
)
prism_2018 = reproject_match(prism_2018, reproj_to)

# Store data for use in other notebooks
%store prism_2018

### Live fuel moisture content

Live fuel moisture content (LFMC) estimates how likely vegetation is to ignite based on how much moisture it contains. The data below was calculated using a neural-network model trained using both optical and microwave radiation (Rao et al., 2020) and was downloaded using the instructions on https://github.com/kkraoj/lfmc_from_sar. The model output is for 1 Nov 2018, a week before the start of the Woolsey Fire, and was calculated to 30 m resolution (although when plotted it looks coarser than that).

In [None]:
# Load LFMC data from 2018-11-01
lfmc = open_raster(
    os.path.join("inputs", "lfmc-fuel-moisture", "lfmc.tif"),
    crs=CRS
)
lfmc = lfmc.where(lfmc > 0, np.nan)
lfmc = reproject_match(lfmc, reproj_to)

# Store data for use in other notebooks
%store lfmc

### Recent fires (1927-2017)

The last-burned map was generated from perimeters for fires from 1927-2017 that intersect the Woolsey Fire envelope (National Interagency Fire Center, 2021). There are no known restrictions on the perimeter shapefiles.

In [None]:
# Load map of recent fires in the study area
path = os.path.join("inputs",
                    "nifc-last-burned",
                    "nifc-last-burned.tif")
last_burned = open_raster(path, crs=CRS)
last_burned = reproject_match(last_burned, reproj_to)

# Store data for use in other notebooks
%store last_burned

### Feature and label data lookup

The lookup dictionary is used to provide a consistent way to access data about the Woolsey Fire across multiple notebooks. Notebooks generally use this dictionary instead of calling the stored variables directly. The lookup is limited to data intended to be used in a model (features and labels), so some data, like fire perimeters, is excluded.

In [None]:
# Create lookup for all data available for use by the random-forest model
all_data = {
    # Vegetation
    "Community": smm_stack[0],
    "FAL (2013-2016)": fal,
    "FDD (2013-2016)": fdd,
    "dFAL (2013-2016)": dfal,
    "Dead (2013-2016)": dead,
    "Years Dead": years_dead,
    "Live FAL": live_fal,
    "Last Burned": last_burned,
    
    # Pre- and post-fire Sentinel-2 imagery
    "Sentinel-2 Prefire": s2_prefire,
    "Sentinel-2 Postfire": s2_postfire,
    
    # Pre-fire imagery and spectral indices
    "LFMC (2018)": lfmc,
    "NDVI (2018)": s2_ndvi,
    "NDMI (2018)": s2_ndmi,
    "NDWI (2018)": s2_ndwi,
    "SAVI (2018)": s2_savi,
    
    # Topography
    "Elevation": elevation,
    "Aspect": aspect,
    "Folded Aspect": folded_aspect,
    "Slope": slope,
    
    # Climate data aggregated to water year (Oct 1-Sep 30) for 2013-2016
    "Days Precipitation (2013-2016)": smm_stack[5:9],
    "Max VPD (2013-2016)": smm_stack[9:13],
    "Minimum Temperature (2013-2016)": smm_stack[13:17],
    "Heat Days Over 95 (2013-2016)": smm_stack[17:21],
    "Cumulative Precipitation (2013-2016)": smm_stack[21:25],
    
    # Climate data aggregated to calendar year for 2018
    "Precipitation (2018)": prism_2018[0],
    "Mean Dew Point Temperature (2018)": prism_2018[1],
    "Maximum Temperature (2018)": prism_2018[2],
    "Mean Temperature (2018)": prism_2018[3],
    "Minimum Temperature (2018)": prism_2018[4],
    "Maximum VPD (2018)": prism_2018[5],
    "Minimum VPD (2018)": prism_2018[6],
    
    # Burn severity
    "MTBS dNBR": mtbs_dnbr,
    "MTBS Classified dNBR": mtbs_cl_dnbr,
    "MTBS Classified Burned/Unburned": mtbs_cl_burned,
    "Sentinel-2 dNBR": s2_dnbr,
    "Sentinel-2 Classified dNBR": s2_cl_dnbr
}


# Verify that all data has the same shape
shapes = {
    "prism_grid": prism_grid.shape[-2:],
}
shapes.update({k: v.shape[-2:] for k, v in all_data.items()})
if len({tuple(s) for s in shapes.values()}) != 1:
    display(shapes)
    raise ValueError("Shapes do not match")

# Verify that all data has the same bounds
bounds = {
    "prism_grid": prism_grid.rio.bounds(),
}
bounds.update({k: v.rio.bounds() for k, v in all_data.items()})
if len(set(bounds.values())) != 1:
    display(bounds)
    raise ValueError("Bounds do not match")
    
# Standardize nodata value to np.nan if dtype is float and nodata is None
for key, xda in all_data.items():
    if xda.rio.nodata is None and xda.dtype in (float, np.float32, np.float64):    
        all_data[key] = xda.rio.write_nodata(np.nan)

# Store data for use in other notebooks
%store all_data
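
The shape check above compares only the last two dimensions, which lets single-band arrays and multi-band stacks sit in the same lookup. A small sketch of that idea using NumPy arrays in place of the real rasters, with `common_shape` as a hypothetical helper and toy keys borrowed from the lookup:

```python
import numpy as np


def common_shape(data):
    """Return the shared (rows, cols) shape or raise if arrays disagree."""
    shapes = {key: tuple(arr.shape[-2:]) for key, arr in data.items()}
    if len(set(shapes.values())) != 1:
        raise ValueError(f"Shapes do not match: {shapes}")
    return next(iter(shapes.values()))


# A single-band array and a four-band stack on the same 5 x 4 grid
toy_data = {
    "Elevation": np.zeros((5, 4)),
    "FAL (2013-2016)": np.zeros((4, 5, 4)),
}
toy_shape = common_shape(toy_data)
```

Adding an array with a different grid, for example `np.zeros((6, 4))`, would make `common_shape` raise a `ValueError` that names every key and its shape.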

## Set styles

This section defines colors and labels for complex plots that appear in multiple notebooks.

In [None]:
# Define color map and labels for vegetation community data
cmap_vegetation = ListedColormap([
    "tab:green",
    "tab:olive",
    "tab:cyan",
    "tab:red",
    "tab:blue",
    "lightgray"
])

labels_vegetation = [
    "Annual grass",
    "Chaparral",
    "Coastal sage scrub",
    "Oak woodland",
    "Riparian",
    "Substrate"
]

# Store data for use in other notebooks
%store cmap_vegetation
%store labels_vegetation

In [None]:
# Define color map and labels for MTBS dNBR data
cmap_dnbr = ListedColormap([
    "#006400",
    "#7FFFD4",
    "#FFFF00",
    "#FF0000",
    "#7FFF00"
])

labels_dnbr = [
    "Unburned to Low",
    "Low",
    "Moderate",
    "High",
    "Increased Greenness"
]

# Store data for use in other notebooks
%store cmap_dnbr
%store labels_dnbr

## Set ready variable

The ready variable can be used to check if this notebook has been run and all data is available.

In [None]:
woolsey_data_ready = True
%store woolsey_data_ready
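
One way another notebook might use the ready variable is sketched below. In a notebook, `%store -r woolsey_data_ready` restores the variable from the storemagic cache. The helper `require_ready` is hypothetical and simply checks a namespace dictionary such as `globals()` for the flag.

```python
def require_ready(namespace, flag="woolsey_data_ready"):
    """Raise if the ready flag is missing or false in the given namespace."""
    if not namespace.get(flag, False):
        raise RuntimeError(
            f"{flag} is not set. Run the data notebook first, then restore "
            "its variables with %store -r."
        )
    return True


# Stand-in for a notebook namespace after the variable has been restored
ready = require_ready({"woolsey_data_ready": True})
```

In a real notebook the call would be `require_ready(globals())`, placed right after the `%store -r` line so that a missing cache fails early with a clear message.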