<img width="50" src="https://carbonplan-assets.s3.amazonaws.com/monogram/dark-small.png" style="margin-left:0px;margin-top:20px"/>

# MTBS Fire Model

_by Jeremy Freeman (CarbonPlan), September 19, 2020_

This notebook loads downsampled MTBS fire data and TerraClimate climate data and
fits a logistic regression model.


In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
from carbonplan_forests import load, setup, plot, fit, utils, prepare, collect
from showit import image
import scipy as sp
import numpy as np
import pandas as pd
import xarray as xr
import geopandas as gpd
import regionmask as rm
from scipy.stats import binom
import xesmf as xe
import fsspec
from palettable.colorbrewer.sequential import YlOrRd_9
import warnings
warnings.filterwarnings('ignore')

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
store = "az"

In [None]:
import altair as alt

alt.themes.enable("carbonplan_light")

### wind analysis code


In [None]:
# from carbonplan.data import cat
# ds_dict = {}
# for year in np.arange(1984,2019):
#     print(year)
#     ds_dict[year] = cat.gridmet.raw_gridmet(variable='vs',
#                             year=year).to_dask().wind_speed.groupby('day.month').max().load()

# ds_concat = xr.concat(ds_dict.values(), dim='year')#, coords=ds_dict.keys())

# ds_concat = ds_concat.assign_coords({'year': list(ds_dict.keys())})

# full_ds_wind = ds_concat.stack(time=('year', 'month'))

# full_ds_wind = full_ds_wind.transpose('time', 'lat', 'lon')

# wind_climatology = ds_concat.mean(dim='year')

# monthly_max = ds_concat.max(dim='year')
# wind_climatology.to_netcdf('wind_climatology.nc')
# monthly_max.to_netcdf('period_max.nc')
# target_grid = "gs://carbonplan-data/processed/grids/conus/4000m/domain.zarr"
# mapper = fsspec.get_mapper(target_grid)
# target_ds = xr.open_zarr(
#     mapper, consolidated=True)
# regridder = xe.Regridder(wind_climatology, target_ds, method='bilinear', reuse_weights=True)
# regridded_wind = regridder(wind_climatology)
# regridded_full_wind = regridder(full_ds_wind)
# regridded_full_wind = regridded_full_wind.assign_coords({'x': target_ds.x,
#                              'y': target_ds.y})
# regridded_wind = regridded_wind.assign_coords({'x': target_ds.x,
#                              'y': target_ds.y})

First we set some top-level parameters: the level of spatial coarsening (which
can speed up fitting substantially, especially useful when testing), the time
range (MTBS spans 1984 to 2018), and the climatic variables we'll use in the
model.


In [None]:
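coarsen = 16
tlim = (1984, 2018)
# temperature variable referenced later as `temp_var`
# (assumed here to be "tmean", the temperature entry in the variable list)
temp_var = "tmean"
variables = [temp_var, "cwd", "pdsi", "ppt"]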
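
To make the effect of `coarsen` concrete, here is a minimal NumPy sketch of block-averaging a 2D grid by an integer factor. It is only an illustration: the helper name `coarsen_mean` is hypothetical, and the `load` functions may aggregate differently (for example in how they treat edges or missing values).

```python
import numpy as np


def coarsen_mean(grid, factor):
    """Block-average a 2D array by `factor`, trimming edges that do not fit."""
    ny, nx = grid.shape
    ny_trim, nx_trim = ny - ny % factor, nx - nx % factor
    trimmed = grid[:ny_trim, :nx_trim]
    # reshape so that each (factor x factor) block gets its own pair of axes
    blocks = trimmed.reshape(ny_trim // factor, factor, nx_trim // factor, factor)
    # nanmean ignores masked (NaN) cells inside a block
    return np.nanmean(blocks, axis=(1, 3))


grid = np.arange(36, dtype=float).reshape(6, 6)
coarse = coarsen_mean(grid, 2)
print(coarse.shape)  # (3, 3)
```

With a factor of 16, each coarse cell summarizes 256 fine cells, which is why fitting gets so much faster.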

Next we load data from `nlcd` (National Land Cover Database), which we will use
to mask out non-land areas, and data from `nftd` (National Forest Type Database)
to use as regressors for forest type groups.


In [None]:
mask = load.mask(store=store, year=2001)

In [None]:
nlcd = load.nlcd(store=store, year=2001, coarsen=coarsen, mask=mask)
nftd = load.nftd(store=store, area_threshold=1500, coarsen=coarsen, mask=mask)

Now we load the `terraclim` climate data from the same time range.


In [None]:
climate = load.terraclim(
    store=store,
    tlim=tlim,
    coarsen=coarsen,
    variables=variables,
    mask=mask,
    sampling="monthly",
).assign_coords({"x": nftd.x, "y": nftd.y})

And finally we load the `mtbs` data (Monitoring Trends in Burn Severity), which
will give us a record of all the fires we'll want to model.


In [None]:
mtbs = load.mtbs(store=store, coarsen=coarsen, tlim=tlim, mask=mask)
# give it the same x/y coords as nftd (and nlcd and climate)
# this re-assignment will help us with some masking lower down
mtbs = mtbs.assign_coords({"x": nftd.x, "y": nftd.y})  # .where(one_type)

### Inspecting the data


We can plot the `mtbs` data for a specific year to get a sense of the raw data.
Fire activity is stronger in the western US, with a bit in the Southeast as well.


In [None]:
plot.fire.summary(mtbs, clim=(0.00005, 0.005))

Here is the same thing but averaged over all years. The spatial patterns show
the Southeast flaring up in the spring, after which the fire season grows in the
Intermountain West in June-August. California then lights up in the fall. This
suggests that if we have biases at a certain time of year we might be able to
tease out a regional bias. <--- Hypothesis!


In [None]:
plot.fire.monthly(mtbs, clim=(0.00005, 0.003))

We can plot spatially averaged data over time to see both seasonal trends and
the fact that fire frequency and magnitude have increased over time.


In [None]:
x = mtbs["time"]
y = mtbs["monthly"].mean(["x", "y"])

plot.line(
    x=x, y=y, width=900, height=200, color="rgb(175,91,92)", strokeWidth=2
)

We can summarize the data by plotting annual, seasonal, and spatial trends by
averaging across different groupings and dimensions.


In [None]:
plot.fire.summary(mtbs.sel(time="2000"), clim=(0, 0.005))

As a motivation for fitting fires as functions of climatic variables, we can
look at spatially averaged climatic variables alongside fires, e.g. average
temperature, and see that there is a clear relationship (in this case a positive
correlation).


In [None]:
x = mtbs.groupby("time.year").mean()["year"]
y1 = mtbs["monthly"].groupby("time.year").sum().mean(["x", "y"])
y2 = climate[temp_var].groupby("time.year").max().mean(["x", "y"])

(
    plot.line(x=x, y=sp.stats.zscore(y1), color="rgb(175,91,92)")
    + plot.line(x=x, y=sp.stats.zscore(y2), color="rgb(175,91,92)", opacity=0.5)
)
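
As a small aside on why z-scoring makes the two curves comparable: once both series have zero mean and unit variance, their Pearson correlation is just the mean of their product. The sketch below uses made-up numbers (not MTBS or TerraClimate values), and the helper `zscore` is a hypothetical stand-in that mirrors the default population-standard-deviation behavior of `scipy.stats.zscore`.

```python
import numpy as np


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


# synthetic annual series, for illustration only
burned_fraction = np.array([0.001, 0.002, 0.0015, 0.004, 0.003])
max_temperature = np.array([24.0, 25.5, 24.8, 27.0, 26.1])

z_fire = zscore(burned_fraction)
z_temp = zscore(max_temperature)

# for z-scored series the Pearson correlation is the mean of the product
r = np.mean(z_fire * z_temp)
print(np.isclose(r, np.corrcoef(burned_fraction, max_temperature)[0, 1]))  # True
```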

We can also look at how fires at the individual pixel level are related to
annual temperature, since we will likely include larger-scale climate averages
as variables in the model. To do that, we take the correlation along the time
dimension between the CONUS-average temperature (`y2` from above) and the `mtbs`
monthly data.


This is simply a measure of how correlated a given location is with the broader
climate.


We can then compare that to how correlated the fires are with their own local
climate maxima.


In [None]:
# Hacky way to make every month look like the annual average

In [None]:
max_temp = (
    climate[temp_var]
    .groupby("time.year")
    .max()
    .assign_coords({"year": mtbs["monthly"].time[np.arange(0, 420, 12)].values})
    .rename({"year": "time"})
    .resample(time="MS")
    .ffill()
)
max_temp = max_temp.reindex({"time": mtbs["monthly"].time}, method="ffill")
conus_average_max_temp = xr.ones_like(max_temp) * max_temp.mean(dim=["x", "y"])
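
The cell above takes each year's maximum, repeats it for every month of that year, and then builds a second array in which every pixel holds the domain-wide average. A minimal NumPy sketch of the same idea on synthetic data (array names and sizes here are illustrative assumptions, not the notebook's objects):

```python
import numpy as np

n_years, n_months, ny, nx = 3, 12, 2, 2
rng = np.random.default_rng(0)
monthly = rng.normal(size=(n_years * n_months, ny, nx))  # (time, y, x)

# annual maximum per pixel: (year, y, x)
annual_max = monthly.reshape(n_years, n_months, ny, nx).max(axis=1)

# repeat each year's value for all 12 months so it lines up with monthly fire data
annual_max_monthly = np.repeat(annual_max, n_months, axis=0)

# domain-wide average of that field, broadcast back to every pixel
domain_mean = annual_max_monthly.mean(axis=(1, 2), keepdims=True)
domain_mean_everywhere = np.broadcast_to(domain_mean, annual_max_monthly.shape)
```

Because the annual value is constant within a year, any correlation with the monthly fire series reflects year-to-year variation only, not the seasonal cycle.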

In [None]:
# then reassign the coords to match the average_climate_ds coords to play nicely
mtbs = mtbs.assign_coords(
    {"x": max_temp.x, "y": max_temp.y, "lat": max_temp.lat}
)

### FINDING #1: changes in fire _are_ more strongly correlated with local changes in climate than CONUS-wide changes.

#### Discussion point: But what is the resulting implication for our model? Do we change something?


First we'll look at the level of correlation to CONUS-wide temperature changes
because that's what is being used in the model currently.


In [None]:
conus_average_temp_corr = xr.corr(
    mtbs["monthly"], conus_average_max_temp, dim="time"
)
local_temp_corr = xr.corr(mtbs["monthly"], max_temp, dim="time")
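
For reference, the per-pixel correlation along time can be written out in a few lines of NumPy. This is a sketch on synthetic data and assumes no missing values (`xr.corr` handles NaNs, this helper does not); `corr_along_time` is a hypothetical name.

```python
import numpy as np


def corr_along_time(a, b):
    """Pearson correlation over axis 0 for arrays shaped (time, y, x)."""
    a_anom = a - a.mean(axis=0)
    b_anom = b - b.mean(axis=0)
    covariance = (a_anom * b_anom).mean(axis=0)
    return covariance / (a.std(axis=0) * b.std(axis=0))


rng = np.random.default_rng(1)
temp = rng.normal(size=(120, 3, 4))
fire = 0.5 * temp + rng.normal(size=(120, 3, 4))  # fire depends on local temp

# correlation with each pixel's own temperature
local_corr = corr_along_time(fire, temp)

# correlation with the domain-average temperature, copied to every pixel
domain_temp = np.broadcast_to(temp.mean(axis=(1, 2), keepdims=True), temp.shape)
domain_corr = corr_along_time(fire, domain_temp)
```

In this toy setup the fire signal is driven by local temperature by construction, so the local map should generally show stronger correlations than the domain-average one.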

In [None]:
plot.fire.simple_map(
    conus_average_temp_corr,
    clabel="correlation",
    clim=(-0.2, 0.2),
    cmap="purplegreen",
)

Then we can run the same analysis using local temperature changes instead.
This will likely have a stronger climate change signal.


In [None]:
plot.fire.simple_map(
    local_temp_corr, clabel="correlation", clim=(-0.2, 0.2), cmap="purplegreen"
)

We can subtract the two to see where the two methods differ.


So, we deduce that local changes in fire _are_ more strongly correlated with
local changes in temperature, but mainly in Texas. The difference map shows the
regions that would likely be affected if we were to change something in the
model.


In [None]:
plot.fire.simple_map(
    local_temp_corr - conus_average_temp_corr,
    clabel="correlation",
    clim=(-0.2, 0.2),
    cmap="purplegreen",
)

Hmmm, now I'm a little confused. Let's try the same analysis with precipitation.


In [None]:
max_var = (
    climate["ppt"]
    .groupby("time.year")
    .max()
    .assign_coords({"year": mtbs["monthly"].time[np.arange(0, 420, 12)].values})
    .rename({"year": "time"})
    .resample(time="MS")
    .ffill()
)
max_var = max_var.reindex({"time": mtbs["monthly"].time}, method="ffill")
conus_average_max_var = xr.ones_like(max_var) * max_var.mean(dim=["x", "y"])

conus_average_var_corr = xr.corr(
    mtbs["monthly"], conus_average_max_var, dim="time"
)
local_var_corr = xr.corr(mtbs["monthly"], max_var, dim="time")

So first we'll look at the correlation of local fire with CONUS-average precip change.


In [None]:
plot.fire.simple_map(
    conus_average_var_corr,
    clabel="correlation",
    clim=(-0.2, 0.2),
    cmap="purplegreen",
)

Then we'll look at the correlation of local fire with local precip change.


In [None]:
plot.fire.simple_map(
    local_var_corr, clabel="correlation", clim=(-0.2, 0.2), cmap="purplegreen"
)

### Let's try annual sum precip (instead of annual max) to get total precip trends


In [None]:
sum_var = (
    climate["ppt"]
    .groupby("time.year")
    .sum()
    .assign_coords({"year": mtbs["monthly"].time[np.arange(0, 420, 12)].values})
    .rename({"year": "time"})
    .resample(time="MS")
    .ffill()
)
sum_var = sum_var.reindex({"time": mtbs["monthly"].time}, method="ffill")
conus_average_sum_var = xr.ones_like(sum_var) * sum_var.mean(dim=["x", "y"])

conus_average_var_corr = xr.corr(
    mtbs["monthly"], conus_average_sum_var, dim="time"
)
local_var_corr = xr.corr(mtbs["monthly"], sum_var, dim="time")

In [None]:
plot.fire.simple_map(
    conus_average_var_corr,
    clabel="correlation",
    clim=(-0.2, 0.2),
    cmap="purplegreen",
)

In [None]:
plot.fire.simple_map(
    local_var_corr, clabel="correlation", clim=(-0.2, 0.2), cmap="purplegreen"
)

## Finding #2: annual sum precip is a better indicator than annual max precip.


How much of this is just because the trends in max temperature are stronger in
certain places? Let's make maps of the pixel-level trends in temperature and
precipitation.


In [None]:
max_temp_for_regression = max_temp.assign_coords(
    {"time": np.arange(1984, 2019.0, 1 / 12)}
)
precip_for_regression = sum_var.assign_coords(
    {"time": np.arange(1984, 2019.0, 1 / 12)}
)

In [None]:
plot.fire.simple_map(
    max_temp_for_regression.polyfit(dim="time", deg=1)
    .sel(degree=1)
    .polyfit_coefficients.assign_coords({"lat": max_temp_for_regression.lat}),
    clim=(-0.1, 0.1),
    clabel="trend in temp [deg C/year]",
    cmap="blueorange",
)

In [None]:
plot.fire.simple_map(
    precip_for_regression.polyfit(dim="time", deg=1)
    .sel(degree=1)
    .polyfit_coefficients.assign_coords({"lat": precip_for_regression.lat})
    / sum_var.mean(dim="time"),
    clim=(-0.02, 0.02),
    clabel="trend in precip [%/year]",
    cmap="brownbluegreen",
)
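
The trend maps above are degree-1 polynomial fits along time, one per pixel. A NumPy sketch of the same calculation on synthetic data with known slopes (all values here are made up for illustration):

```python
import numpy as np

n_time, ny, nx = 420, 2, 3
years = 1984 + np.arange(n_time) / 12  # fractional years, monthly steps

# synthetic series with a known per-pixel slope (units per year)
true_slope = np.array([[0.01, 0.02, 0.03], [0.0, -0.01, 0.05]])
series = 15.0 + true_slope * (years[:, None, None] - 1984)

# np.polyfit fits each column of a 2D array independently
flat = series.reshape(n_time, ny * nx)
slope, intercept = np.polyfit(years, flat, deg=1)
slope_map = slope.reshape(ny, nx)

# dividing by the time mean gives a fractional change per year
relative_trend = slope_map / series.mean(axis=0)
```

Note that dividing the slope by the mean yields a fraction per year; multiplying by 100 would express it in percent per year.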

FINDING #3: It looks like spatial trends in precip haven't been observed in the
historical record, so we're not giving the model any knowledge of changing
precip. Does this mean that the model will be entirely temperature driven?


Now, let's look at four different regions to get a sense of the variance in the
variables compared to the variance in the mtbs fire.


In [None]:
geographic_boxes = {
    "Southeast": {"x": slice(45, 70), "y": slice(30, 50)},
    "California": {"x": slice(0, 10), "y": slice(15, 35)},
    "Intermountain West": {"x": slice(10, 20), "y": slice(15, 35)},
    "Northwest": {"x": slice(0, 20), "y": slice(0, 15)},
}

In [None]:
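# NOTE: `prediction` comes from the model fit in the "Fitting the model" section
# below; run those cells before this one.
data = plot.fire.package_for_altair(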
    mtbs.isel(geographic_boxes["California"]).sel(time=slice("1985", "2018")),
    climate.isel(geographic_boxes["California"]).sel(
        time=slice("1985", "2018")
    ),
    prediction=prediction.isel(geographic_boxes["California"]).sel(
        time=slice("1985", "2018")
    ),
    label="California",
)
for region in ["Intermountain West", "Northwest", "Southeast"]:
    data[region] = plot.fire.package_for_altair(
        mtbs.isel(geographic_boxes[region]).sel(time=slice("1985", "2018")),
        climate.isel(geographic_boxes[region]).sel(time=slice("1985", "2018")),
        prediction=prediction.isel(geographic_boxes[region]).sel(
            time=slice("1985", "2018")
        ),
        label=region,
    )[region]

In [None]:
plot.fire.multipanel_slider(
    data,
    [1985, 2018],
    ["California", "Intermountain West", "Northwest", "Southeast"],
)

### Let's try out some correlations between wind and fire: how closely related are they actually?


In [None]:
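# NOTE: `regridded_full_wind` and `target_ds` are created in the wind analysis
# cell near the top of the notebook, which is currently commented out.
wind_input = (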
    regridded_full_wind.where(target_ds.mask)
    .coarsen(x=coarsen, y=coarsen, boundary="trim")
    .max()
    .compute()
)


# wind_input = wind_input.assign_coords({'time': mtbs.time,
#                                      'lat': mtbs.lat,
#                                      'lon': mtbs.lon,
#                                      'x': mtbs.x,
#                                      'y': mtbs.y
#                                                 })

In [None]:
wind_input = wind_input.assign_coords(
    {
        "time": mtbs.time,
        "lat": mtbs.lat,
        "lon": mtbs.lon,
        #                                      'x': mtbs.x.round(decimals=3),
        #                                      'y': mtbs.y.round(decimals=3),
    }
)

In [None]:
wind_fire_corr = xr.corr(
    mtbs["monthly"].assign_coords(
        {
            "x": wind_input.x,
            "y": wind_input.y,
        }
    ),
    wind_input,  # .drop(['lat', 'lon']),#.isel(time=slice(0,5)), # (1-climate['rh'])*
    dim="time",
).assign_coords({"lat": wind_input.lat, "lon": wind_input.lon})

In [None]:
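# NOTE: despite the variable name, this correlates wind with mean temperature
# (`tmean`), not precipitation
wind_precip_corr = xr.corr(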
    wind_input.assign_coords(climate.coords), climate["tmean"], dim="time"
)

In [None]:
# map the wind/fire correlation computed above
plot.fire.simple_map(
    wind_fire_corr, clabel="correlation", clim=(-1, 1), cmap="purplegreen"
)

In [None]:
plot.fire.simple_map(
    wind_precip_corr, clabel="correlation", clim=(-1, 1), cmap="purplegreen"
)

In [None]:
# Let's find correlation

### Fitting the model


In [None]:
wind_input = (
    regridded_wind.where(target_ds.mask)
    .coarsen(x=coarsen, y=coarsen, boundary="trim")
    .mean()
    .compute()
)

In [None]:
wind_input = (
    regridded_full_wind.where(target_ds.mask)
    .coarsen(x=coarsen, y=coarsen, boundary="trim")
    .max()
    .compute()
)

In [None]:
climate["wind"] = xr.DataArray(
    wind_input.values, coords=climate.coords, dims=("time", "y", "x")
)

In [None]:
x, y = prepare.fire(
    climate.drop(["rh"]),
    nftd,
    mtbs,  # , 'wind'
    #                     wind=(1-climate['rh'])*wind_input,
    add_local_climate_trends=True,
)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)
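
The internals of `fit.hurdle` are not shown in this notebook, so the following is only a generic sketch of what a hurdle model does: one model for whether any fire occurs, and a second model for how much burns given that it does. Everything here is an assumption for illustration, including the function names, the Newton solver for the logistic part, and the log-linear model for the positive part. The synthetic data is made up.

```python
import numpy as np


def fit_logistic(X, occurred, n_iter=25, damping=1e-6):
    """Logistic regression via Newton's method; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        weights = p * (1.0 - p)
        gradient = X.T @ (occurred - p)
        # small damping term keeps the linear solve numerically stable
        hessian = (X * weights[:, None]).T @ X + damping * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


def fit_hurdle(X, y):
    """Part 1: P(y > 0) by logistic regression. Part 2: log(y) for y > 0 by least squares."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta_occ = fit_logistic(Xb, (y > 0).astype(float))
    positive = y > 0
    beta_size, *_ = np.linalg.lstsq(Xb[positive], np.log(y[positive]), rcond=None)
    return beta_occ, beta_size


def predict_hurdle(X, beta_occ, beta_size):
    """Predicted burned fraction: occurrence probability times size if it occurs."""
    Xb = np.column_stack([np.ones(len(X)), X])
    p = 1.0 / (1.0 + np.exp(-Xb @ beta_occ))
    return p * np.exp(Xb @ beta_size)


# synthetic example: occurrence depends on feature 0, size on feature 1
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
p_true = 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * X[:, 0])))
occurs = rng.random(2000) < p_true
size = np.exp(-5.0 + 0.5 * X[:, 1] + 0.1 * rng.normal(size=2000))
y = np.where(occurs, size, 0.0)

beta_occ, beta_size = fit_hurdle(X, y)
yhat = predict_hurdle(X, beta_occ, beta_size)
```

One consequence of this structure is that predictions are a smooth product of a probability and an expected size, so they are never exactly zero.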

Let's try masking mtbs and climate with a single forest type and fitting the model on only that. We'll drop the nftd for that.


In [None]:
# conducted using just global climate

In [None]:
plot.fire.monthly(
    prediction.sel(time=slice("1984", "2010")),
    "prediction",
    clim=(0.00005, 0.002),
    cmap="reds",
)

In [None]:
plot.fire.monthly(
    prediction - (mtbs.rename({"monthly": "prediction"})),
    "prediction",
    clim=(-0.003, 0.003),
    cmap="purplegreen",
)

In [None]:
alt.data_transformers.disable_max_rows()

In [None]:
# this was done on forest group type index 16 trained only on that type

In [None]:
# train for each separate forest type and sum the predictions together

In [None]:
# grab a mask of one forest type and cast it all to 1s

In [None]:
nftd.sel(band=nftd.band[-1])

In [None]:
# Select a single forest type group (the last band, index 16) to use as a mask.
# `one_type` is referenced later in the notebook; we assume it is this mask.
one_type = nftd.isel(band=16)
type_coords = {"x": one_type.x, "y": one_type.y}
masked_climate = climate.assign_coords(type_coords) * one_type
masked_mtbs = mtbs.assign_coords(type_coords) * one_type

x, y = prepare.fire(
    masked_climate,
    #                     nftd,
    mtbs=masked_mtbs,
    add_local_climate_trends=True,
)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, masked_mtbs)

In [None]:
stats, charts = plot.fire.full_eval(
    mtbs,  # .assign_coords({'x': nftd.x,
    #                 'y': nftd.y})*nftd.isel(band=16),
    prediction,  # *nftd.isel(band=16),
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)
charts

And then let's compare it to training on the whole thing and just evaluating
performance on one forest type.


In [None]:
x, y = prepare.fire(climate, nftd, mtbs=mtbs, add_local_climate_trends=True)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)

In [None]:
stats, charts = plot.fire.full_eval(
    mtbs.assign_coords({"x": one_type.x, "y": one_type.y})
    * one_type.drop("band"),
    prediction.assign_coords({"x": one_type.x, "y": one_type.y})
    * one_type.drop("band"),
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)
charts

Let's see what our modeled risk looks like at the supersection level to see if
anything jumps out.


In [None]:
plot.fire.supersection(mtbs, "monthly")

In [None]:
plot.fire.supersection(prediction, "prediction")

In [None]:
plot.fire.monthly(
    prediction.sel(time="2018"), "prediction", clim=(0.00005, 0.001)
)

In [None]:
plot.fire.evaluation(
    mtbs,
    prediction,
    "monthly",
    "prediction",
    clim=(0.003, 0.0006),
    #     clim=(0.0006, 0.003),
    percentage=False,
    cmap="redblue",
)

In [None]:
plot.fire.evaluation(
    mtbs, prediction, "monthly", "prediction", clim=(0.0006, 0.003)
)

FINDING #4: The predictions for an individual year are far too smeared out: too
smooth and not episodic. The consequence for future projections is that fire
will look like a trickle everywhere rather than occasional big events.

Discussion point: let's brainstorm reasons why.


In [None]:
year = "1991"
plot.fire.evaluation(
    mtbs.sel(time=year),
    prediction.sel(time=year),
    "monthly",
    "prediction",
    clim=(0.0006, 0.003),
)

In [None]:
plot.fire.evaluation(
    mtbs.sel(time="2002"),
    prediction.sel(time="2002"),
    "monthly",
    "prediction",
    clim=(-0.0008, 0.0008),
)

In [None]:
plot.fire.simple_map(
    mtbs.sel({"x": slice(0, 20), "y": slice(0, 15)}).mean(dim="time").monthly,
    clim=(0.00000001, 0.001),
)

In [None]:
slider = alt.binding_range(min=1984, max=2018, step=1)
select_year = alt.selection_single(
    name="year", fields=["year"], bind=slider, init={"year": 1984}
)
base = (
    alt.Chart(data)
    .mark_line()
    .encode(x="month:O", color="variable:N")
    .add_selection(select_year)
    .properties(width=200, height=100)
    .transform_filter(select_year)
)
chart = base.encode(y="California:Q")
chart22 = base.encode(y="Southeast:Q")
chart | chart22
# base.encode(y='California:Q') | base.encode(y='Southeast:Q') | base.encode(y='Intermountain West:Q') | base.encode(y='Northwest:Q')

Let's zoom into a few episodes to see whether we're capturing actual fires
(specifically the ones that Barbero et al. look at in Figure S2 of the
supplement to the 2012 paper).


For 2007 we look at the Zaca fire in California in July and the Bugaboo fire in
the Southeast in April.


In [None]:
plot.fire.monthly(
    prediction.sel(time="2007"), "prediction", clim=(0.00005, 0.001)
)

For 2006 we look at September in California for the Day Fire.


In [None]:
plot.fire.monthly(
    prediction.sel(time="2006"), "prediction", clim=(0.00005, 0.001)
)

For 2002 we look at the Rodeo fire in Arizona in June.


In [None]:
plot.fire.monthly(
    prediction.sel(time="2002"), "prediction", clim=(0.00005, 0.001)
)

We can also compare the predicted and real probabilities over time.


In [None]:
x = mtbs["time"]
y1 = mtbs["monthly"].mean(["x", "y"])
y2 = prediction["prediction"].mean(["x", "y"])

(
    plot.line(
        x=x,
        y=y1,
        width=900,
        height=200,
        opacity=0.5,
        color="rgb(175,91,92)",
        strokeWidth=2,
    )
    + plot.line(
        x=x, y=y2, width=900, height=200, color="rgb(175,91,92)", strokeWidth=2
    )
)

Given that we fit a logistic regression, the natural model performance metric is
area under the ROC curve, which we computed on the training data (you can use
the variable `crossval` to repeat the model fit and compute a score on 25% held
out data).
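
For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen positive (fire) sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; `roc_auc` is a hypothetical helper, not a function from `carbonplan_forests`, and the labels and scores are toy values.

```python
import numpy as np


def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative), counting ties as 0.5."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    # compare every positive score with every negative score
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)


labels = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
print(roc_auc(labels, scores))  # 0.75
```

The pairwise comparison uses memory proportional to the number of positives times the number of negatives, so for a full grid of monthly samples a rank-based implementation would be a better fit.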


Finally, we can compute correlations between data and prediction for annual,
seasonal, and spatial trends. Note that these metrics were not explicitly used
to fit the model, but in general a better-fitting model ought to reproduce at
least some of these trends.


In [None]:
print("squared correlation (r^2) of annual trends")
np.corrcoef(
    mtbs["monthly"].groupby("time.year").mean().mean(["x", "y"]),
    prediction["prediction"].groupby("time.year").mean().mean(["x", "y"]),
)[0, 1] ** 2

In [None]:
print("squared correlation (r^2) of monthly trends")
np.corrcoef(
    mtbs["monthly"].groupby("time.month").mean().mean(["x", "y"]),
    prediction["prediction"].groupby("time.month").mean().mean(["x", "y"]),
)[0, 1] ** 2

In [None]:
print("squared correlation (r^2) of spatial trends")
a = mtbs["monthly"].mean("time").values.flatten()
b = prediction["prediction"].mean("time").values.flatten()
inds = ~np.isnan(a) & ~np.isnan(b)
np.corrcoef(a[inds], b[inds])[0, 1] ** 2

### Try out four different improvements to the model


### Masking out fires in non-forested areas before passing the mtbs data into the model

We've changed this analysis to pass only the fires in forested areas into the
model.


In [None]:
# before doing the modeling (as part of the prepare step)
# take the sum of our forest type masks and call this single forest/no forest mask
# threshold at a low value (0.1) and zero out the mask so that there are zero fires for non-forest
# (aka we will ignore fires in any gridcells less than 10% forest)
# then train model as usual

In [None]:
non_forests = nlcd.sel(band=71) + nlcd.sel(band=52)

can't believe i'm taking it to numpy but c'est la vie... eat your shoe, jhamman!


In [None]:
forest_mask_vals = non_forests.values
# so anything that is more than 90% non-forest (less than 10% forest) will be masked out
forest_mask_vals[forest_mask_vals > 0.9] = np.NaN
forest_mask = forest_mask_vals <= 0.9
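
The mask relies on the fact that any comparison with NaN evaluates to False, so cells that are mostly non-forest and cells with missing data both end up excluded. A toy NumPy example with made-up fractions:

```python
import numpy as np

# fraction of each cell that is non-forest (NaN = no data)
non_forest_fraction = np.array([[0.05, 0.95], [0.50, np.nan]])

# comparisons with NaN are False, so missing cells are masked out as well
forest_mask = non_forest_fraction <= 0.9

fires = np.array([[0.002, 0.004], [0.001, 0.003]])
masked_fires = fires * forest_mask
print(masked_fires)
```

Multiplying by a boolean mask sets excluded cells to zero rather than NaN, which means the model sees them as cells with no fire instead of cells with no data.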

# Base set-up


In [None]:
x, y = prepare.fire(
    climate.drop(["rh", "wind", "fire_index"]),
    nftd,
    mtbs * forest_mask,
    add_local_climate_trends=True,
)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)

In [None]:
plot.fire.full_eval(
    mtbs * forest_mask,
    prediction,
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)

# Here we'll do climate including wind and rh


In [None]:
x, y = prepare.fire(
    climate, nftd, mtbs * forest_mask, add_local_climate_trends=True
)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)

In [None]:
plot.fire.full_eval(
    mtbs * forest_mask,
    prediction,
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)

# Then we'll try wind w no rh


In [None]:
x, y = prepare.fire(
    climate.drop("rh"), nftd, mtbs * forest_mask, add_local_climate_trends=True
)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)

In [None]:
plot.fire.full_eval(
    mtbs * forest_mask,
    prediction,
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)

# Then we'll do interaction between wind and 1-rh <---- ask jeremy if this is right in terms of magnitudes


In [None]:
climate["fire_index"] = climate["wind"] * (1 - climate["rh"])

In [None]:
x, y = prepare.fire(
    climate.drop(["rh", "wind"]),
    nftd,
    mtbs * forest_mask,
    add_local_climate_trends=True,
)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)

In [None]:
plot.fire.full_eval(
    mtbs * forest_mask,
    prediction,
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)

# Either way, let's try this last bit, in which we use a wind climatology instead


In [None]:
x, y = prepare.fire(
    climate.drop(["rh", "wind", "fire_index"]),
    nftd,
    mtbs * forest_mask,
    add_local_climate_trends=True,
    wind=regridded_wind.coarsen(x=coarsen, y=coarsen, boundary="trim")
    .max()
    .compute(),
)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)

In [None]:
plot.fire.full_eval(
    mtbs * forest_mask,
    prediction,
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)

In [None]:
plot.fire.supersection(mtbs * forest_mask, "monthly")

In [None]:
plot.fire.supersection(prediction * forest_mask, "prediction")

So, removing non-forested areas helps us a lot. It kind of messes up the
seasonal cycle (strange, and the local climate trends still help us with that)
but our spatial biases drop _a lot_.


In [None]:
needle_leaf_da = xr.DataArray(
    data=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    dims=["band"],
    coords={"band": nftd.band.values},
)
needle_leaf_fraction = nftd.dot(needle_leaf_da).expand_dims(dim="band")
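
The dot product with a 0/1 indicator vector simply sums the fractional cover of the selected forest type groups in each grid cell. A small NumPy sketch with three hypothetical groups on a 1 x 2 grid (values are made up):

```python
import numpy as np

# fractional cover per forest type group, shaped (band, y, x)
type_fraction = np.array([[[0.2, 0.0]], [[0.3, 0.1]], [[0.1, 0.6]]])

# 1 where the group counts as needleleaf, 0 where it counts as broadleaf
is_needleleaf = np.array([1, 1, 0])

# contract over the band axis: sum of cover across the selected groups
needleleaf_fraction = np.tensordot(is_needleleaf, type_fraction, axes=(0, 0))
broadleaf_fraction = np.tensordot(1 - is_needleleaf, type_fraction, axes=(0, 0))
```

The two fractions add up to the total forested fraction of each cell, so they carry no information beyond what the individual bands already provide; they only regroup it.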

In [None]:
# Mapping from https://www.fia.fs.fed.us/library/database-documentation/historic/ver3/FIADB_user%20manual_v3-0_p2_7_10_08.pdf
tree_type_mapping = pd.DataFrame(
    [
        [100, "White/red/jack pine group", "Needle"],
        [120, "Spruce/fir group", "Needle"],
        [140, "Longleaf/slash pine group", "Needle"],
        [160, "Loblolly/shortleaf pine group", "Needle"],
        [180, "Pinyon/juniper group", "Needle"],
        [200, "Douglas-fir group", "Needle"],
        [220, "Ponderosa pine group", "Needle"],
        [260, "Fir/spruce/mountain hemlock group", "Needle"],
        [280, "Lodgepole pine group", "Needle"],
        [370, "California mixed conifer group", "Needle"],
        [400, "Oak/pine group", "Needle"],  # CHECK
        [500, "Oak/hickory group", "Broadleaf"],
        [600, "Oak/gum/cypress group", "Broadleaf"],
        [700, "Elm/ash/cottonwood group", "Broadleaf"],
        [800, "Maple/beech/birch group", "Broadleaf"],
        [900, "Aspen/birch group", "Broadleaf"],
        [920, "Western oak group", "Broadleaf"],
    ],
    columns=["group_ID", "Forest type", "Leaf"],
)
tree_type_mapping["Needleleaf"] = (
    tree_type_mapping["Leaf"] == "Needle"
).astype(int)

### Evaluate performance by forest type


### Create two additional variables (fraction of each gridcell that is either (1) broadleaf or (2) needleleaf) and then feed it to the model as an interaction with the four climate variables (to produce 8 additional variables) to make essentially two different models specific to different kinds of forest responses

Well, actually we'll just do the fraction that is needle leaf and we'll multiply
it by the climate variables. We'll only need four.


First let's just see what happens when we pass in the needle leaf fraction as
its own variable. This will just give the model a little more understanding of
how the forest types relate to each other.


In [None]:
needle_leaf_da = xr.DataArray(
    data=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    dims=["band"],
    coords={"band": nftd.band.values},
)
broad_leaf_da = xr.DataArray(
    data=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    dims=["band"],
    coords={"band": nftd.band.values},
)
broad_leaf_fraction = nftd.dot(broad_leaf_da).expand_dims(dim="band")
needle_leaf_fraction = nftd.dot(needle_leaf_da).expand_dims(dim="band")
nftd = xr.concat(
    [
        nftd,
        needle_leaf_fraction.assign_coords({"band": [0]}),
        broad_leaf_fraction.assign_coords({"band": [1]}),
    ],
    dim="band",
)

Each time we run this model we'll want to look at the performance, so let's pull
all of these analyses together to have a nice evaluation module.


Let's just see if adding a needleleaf/broadleaf category helps at all (it
shouldn't).


In [None]:
climate

In [None]:
x, y = prepare.fire(climate, nftd, mtbs, add_local_climate_trends=True)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)

In [None]:
plot.fire.full_eval(
    mtbs,
    prediction,
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)

Now let's try adding an interaction! We'll multiply the needle leaf fraction by
the climate variables. We can just add them as new variables to the climate
dataset.


In [None]:
nftd.band.values

In [None]:
climate_interactions = climate.drop(["pdsi"]).copy(deep=True)
for var in climate_interactions.data_vars:
    for band in nftd.band.values:
        climate_interactions["{}_interaction_{}".format(var, band)] = (
            climate_interactions[var] * nftd.sel(band=band).values
        )
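
To spell out what an interaction term is here: each climate variable, shaped (time, y, x), is multiplied by a static per-pixel fraction, shaped (y, x), so the model can learn a climate response that scales with forest composition. A NumPy sketch with synthetic data and a single hypothetical needleleaf fraction:

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, ny, nx = 4, 2, 2
climate_vars = {
    "tmean": rng.normal(size=(n_time, ny, nx)),
    "ppt": rng.normal(size=(n_time, ny, nx)),
}
needleleaf_fraction = np.array([[0.0, 0.25], [0.5, 1.0]])  # (y, x)

features = dict(climate_vars)
# iterate over the original dict so the loop does not see the new entries
for name, values in climate_vars.items():
    # broadcasting applies the same (y, x) fraction at every time step
    features[f"{name}_interaction_needleleaf"] = values * needleleaf_fraction
```

With one interaction per climate variable and band, the feature count grows as the number of variables times the number of bands, which may be part of why the fits in this section need many more iterations.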

In [None]:
x, y = prepare.fire(
    climate_interactions, mtbs=mtbs, nftd=nftd, add_local_climate_trends=True
)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True, max_iter=10000000)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)

In [None]:
stats, chart = plot.fire.full_eval(
    mtbs,
    prediction,
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)
chart

In [None]:
nftd.band.values

In [None]:
climate_interactions = climate.drop(["pdsi"]).copy(deep=True)
for var in climate_interactions.data_vars:
    for band in nftd.band.values[4:10]:
        climate_interactions["{}_interaction_{}".format(var, band)] = (
            climate_interactions[var] * nftd.sel(band=band).values
        )

In [None]:
x, y = prepare.fire(
    climate_interactions, mtbs=mtbs, nftd=nftd, add_local_climate_trends=True
)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True, max_iter=10000000)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)

In [None]:
stats, chart = plot.fire.full_eval(
    mtbs,
    prediction,
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)
chart

In [None]:
climate_interactions = climate.drop(["pdsi"]).copy(deep=True)
for var in climate_interactions.data_vars:
    for band in nftd.band.values[7:8]:
        climate_interactions["{}_interaction_{}".format(var, band)] = (
            climate_interactions[var] * nftd.sel(band=band).values
        )

In [None]:
x, y = prepare.fire(
    climate_interactions, mtbs=mtbs, nftd=nftd, add_local_climate_trends=True
)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True, max_iter=10000000)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)

In [None]:
stats, chart = plot.fire.full_eval(
    mtbs,
    prediction,
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)
chart

Question: why does it lose the seasonal signal when I add the interaction terms?
What's wrong here?


In [None]:
plot.fire.full_eval(
    mtbs,
    prediction,
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)

In [None]:
plot.fire.supersection(prediction, "prediction")

Let's try adding masking of mtbs with the interactions. Hmmmm gets weird.


In [None]:
x, y = prepare.fire(
    climate_interactions,
    nftd.drop_sel(band=0).drop_sel(band=1),
    mtbs * forest_mask,
    add_local_climate_trends=True,
)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)

In [None]:
plot.fire.full_eval(
    mtbs * forest_mask,
    prediction,
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)

In [None]:
plot.fire.supersection(prediction, "prediction")

### Let's try back at the beginning just dropping pdsi


In [None]:
x, y = prepare.fire(
    climate.drop(["rh", "pdsi"]),
    nftd.drop_sel(band=0).drop_sel(band=1),
    mtbs,
    add_local_climate_trends=True,
)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)

In [None]:
plot.fire.full_eval(
    mtbs,
    prediction,
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)

### Incorporate an element of ignition (e.g. population, lightning): could help southern California biases


### Consider wind a bit more carefully: are there any potential wind products we could use to capture the high-end tail of severity?


Let's reload climate, this time pulling in wind.


In [None]:
coarsen = 4
tlim = (1984, 2018)
variables = [temp_var, "cwd", "pdsi", "ppt", "ws"]
climate = load.terraclim(
    store=store,
    tlim=tlim,
    coarsen=coarsen,
    variables=variables,
    mask=mask,
    sampling="monthly",
)

In [None]:
# here we'll use a climate that includes windspeed
x, y = prepare.fire(climate, nftd, mtbs, add_local_climate_trends=True)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)
plot.fire.full_eval(
    mtbs,
    prediction,
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)

# let's try adding in relative humidity


## let's try dropping pdsi


In [None]:
# here we'll use the same climate (including windspeed) but without pdsi
x, y = prepare.fire(
    climate.drop(["pdsi"]), nftd, mtbs, add_local_climate_trends=True
)
x_z, x_mean, x_std = utils.zscore_2d(x)
model = fit.hurdle(x_z, y, log=True)
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)
plot.fire.full_eval(
    mtbs,
    prediction,
    "monthly",
    "prediction",
    clim=(-0.008, 0.008),
    cmap="purplegreen",
    percentage=False,
)

### Prediction on future climate data


In [None]:
cmip_model = "CanESM5"
scenario = "ssp245"
target = 2030

In [None]:
future_climate = load.cmip(
    store="az",
    model=cmip_model,
    coarsen=16,
    scenario=scenario,
    tlim=(2020, 2100),
    variables=variables,
    mask=mask,
    sampling="monthly",
)

In [None]:
x = prepare.fire(
    future_climate, nftd, eval_only=True, add_local_climate_trends=True
)
x_z = utils.zscore_2d(x, mean=x_mean, std=x_std)
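
The key point in the cell above is that future inputs are standardized with the mean and standard deviation of the training data, not their own, so that a warmer future actually shows up as larger z-scores. A minimal sketch of that pattern follows; `zscore_columns` is a hypothetical helper that always returns a tuple, which may differ from the return convention of `utils.zscore_2d`, and the numbers are made up.

```python
import numpy as np


def zscore_columns(x, mean=None, std=None):
    """Standardize columns; reuse the supplied mean/std if given."""
    x = np.asarray(x, dtype=float)
    if mean is None:
        mean = np.nanmean(x, axis=0)
    if std is None:
        std = np.nanstd(x, axis=0)
    return (x - mean) / std, mean, std


# rows are samples, columns are features (e.g. temperature, precipitation)
train = np.array([[10.0, 100.0], [12.0, 300.0], [14.0, 200.0]])
future = np.array([[16.0, 200.0]])

train_z, train_mean, train_std = zscore_columns(train)
# reuse the training statistics so both sets share one scale
future_z, _, _ = zscore_columns(future, mean=train_mean, std=train_std)
```

If the future data were standardized with their own statistics instead, any shift in the mean climate would be removed before the model ever saw it.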

In [None]:
yhat = model.predict(x_z)
prediction = collect.fire(yhat, future_climate)

In [None]:
plot.fire.summary(prediction, "prediction", clim=(0.0006, 0.007))