<img width="50" src="https://carbonplan-assets.s3.amazonaws.com/monogram/dark-small.png" style="margin-left:0px;margin-top:20px"/>

# MTBS Fire Model

_by Jeremy Freeman (CarbonPlan), September 19, 2020_

This notebook loads downsampled MTBS fire data and TerraClimate climate data and
fits a logistic regression model.


In [66]:
import numpy as np
import pandas as pd
import scipy as sp
from carbonplan_forests import load, setup, plot, fit, utils, prepare, collect

In [67]:
%matplotlib inline
import matplotlib.pyplot as plt
from showit import image, tile

In [68]:
setup.plotting(remote=False)

In [69]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


First we set some top-level parameters: the level of spatial coarsening (which
can speed up fitting substantially, especially useful when testing), the time
range (MTBS spans 1984 to 2018), and the climatic variables we'll use in the
model.


In [70]:
coarsen = 16
tlim = (1984, 2018)
variables = ["ppt", "tmean", "cwd", "pdsi", "tmax", "tmin"]
store = "local"
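To make the effect of `coarsen` concrete, here is a minimal NumPy sketch of spatial coarsening by block averaging. It assumes that coarsening means taking the mean over non-overlapping `factor x factor` blocks; the `load` functions may use a different reduction, and `coarsen_mean` is a hypothetical helper, not part of `carbonplan_forests`.

```python
import numpy as np


def coarsen_mean(grid, factor):
    """Block-average a 2D grid by an integer factor, ignoring NaNs."""
    ny, nx = grid.shape
    # Trim the edges so both dimensions divide evenly by the factor.
    ny_trim, nx_trim = ny - ny % factor, nx - nx % factor
    trimmed = grid[:ny_trim, :nx_trim]
    # Split each axis into (block index, position within block).
    blocks = trimmed.reshape(
        ny_trim // factor, factor, nx_trim // factor, factor
    )
    # Average over the two within-block axes.
    return np.nanmean(blocks, axis=(1, 3))


# Toy 4x4 grid, invented for illustration only
fine = np.arange(16, dtype=float).reshape(4, 4)
coarse = coarsen_mean(fine, 2)
```

Under this assumption, a factor of 16 replaces every 16 x 16 = 256 grid cells with one, which is why fitting gets so much faster.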

Next we load data from `nlcd`, which we will use to mask out non-land areas, and
data from `nftd` to use as regressors for forest type groups.


In [71]:
mask = load.mask(store=store, year=2001)

In [72]:
nlcd = load.nlcd(store=store, year=2001, coarsen=coarsen, mask=mask)
nftd = load.nftd(store=store, area_threshold=1500, coarsen=coarsen, mask=mask)

Now we load the `terraclim` data from the same time range.


In [73]:
climate = load.terraclim(
    store=store,
    tlim=tlim,
    coarsen=coarsen,
    variables=variables,
    mask=mask,
    sampling="monthly",
)

And finally we load the `mtbs` data.


In [74]:
mtbs = load.mtbs(store=store, coarsen=coarsen, tlim=tlim, mask=mask)

### Inspecting the data


We can plot the `mtbs` data for a specific year to get a sense of the raw data.


In [79]:
plot.fire.monthly(mtbs.sel(time="2018"), clim=(0, 0.2))

Here is the same plot, but averaged over all years.


In [93]:
plot.fire.monthly(mtbs, clim=(0.00005, 0.0015))

We can plot spatially averaged data over time to see both seasonal trends and
the fact that fire frequency and magnitude have increased over time.


In [81]:
x = mtbs["time"]
y = mtbs["monthly"].mean(["x", "y"])

plot.line(
    x=x, y=y, width=900, height=200, color="rgb(175,91,92)", strokeWidth=2
)

We can summarize the data by plotting annual, seasonal, and spatial trends by
averaging across different groupings and dimensions.


In [82]:
plot.fire.summary(mtbs, clim=(0, 0.005))

As motivation for fitting fires as functions of climatic variables, we can
look at spatially averaged climatic variables alongside fires, e.g. average
temperature, and see that there is a clear relationship (in this case a positive
correlation).


In [83]:
x = mtbs.groupby("time.year").mean()["year"]
y1 = mtbs["monthly"].groupby("time.year").sum().mean(["x", "y"])
y2 = climate["tmean"].groupby("time.year").max().mean(["x", "y"])

(
    plot.line(x=x, y=sp.stats.zscore(y1), color="rgb(175,91,92)")
    + plot.line(x=x, y=sp.stats.zscore(y2), color="rgb(175,91,92)", opacity=0.5)
)

# np.corrcoef(y1, y2)[0, 1]
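As a rough illustration of why the two series are z-scored before plotting, the sketch below standardizes two annual series and computes their Pearson correlation. The numbers are invented for illustration and are not MTBS or TerraClimate values; `zscore` is a small stand-in for `sp.stats.zscore` with its default settings.

```python
import numpy as np


def zscore(values):
    """Subtract the mean and divide by the population standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


# Hypothetical annual series, invented for illustration only
fire = np.array([0.2, 0.5, 0.3, 0.9, 0.7])
tmean = np.array([14.1, 15.0, 14.4, 15.8, 15.2])

fire_z, tmean_z = zscore(fire), zscore(tmean)

# Pearson correlation is unchanged by z-scoring...
r = np.corrcoef(fire, tmean)[0, 1]
r_z = np.corrcoef(fire_z, tmean_z)[0, 1]
# ...and equals the mean product of the two z-scored series.
r_from_z = np.mean(fire_z * tmean_z)
```

Z-scoring puts both series on a common, unitless scale so they can share an axis, without changing their correlation.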

### Fitting the model


In [84]:
x, y = prepare.fire(climate, nftd, mtbs)
x_z, x_mean, x_std = utils.zscore_2d(x)
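The standardization step can be sketched roughly as follows. This is a hypothetical stand-in for `utils.zscore_2d`, assuming it z-scores each column (feature) and returns the statistics it used; the actual implementation may differ. Keeping the mean and standard deviation matters because any later inputs have to be standardized with the same statistics as the training data.

```python
import numpy as np


def zscore_columns(x, mean=None, std=None):
    """Standardize each column; reuse mean/std when they are supplied."""
    x = np.asarray(x, dtype=float)
    if mean is None or std is None:
        # Fit mode: compute per-column statistics and return them too.
        mean = np.nanmean(x, axis=0)
        std = np.nanstd(x, axis=0)
        return (x - mean) / std, mean, std
    # Apply mode: standardize with previously computed statistics.
    return (x - mean) / std


# Toy data, invented for illustration only
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
train_z, mu, sigma = zscore_columns(train)

new = np.array([[4.0, 40.0]])
new_z = zscore_columns(new, mean=mu, std=sigma)
```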

In [85]:
model = fit.hurdle(x_z, y, log=True)

In [86]:
model

HurdleModel(link='True', train_r2='0.057', train_roc='0.890')
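For readers unfamiliar with hurdle models, the general idea is to combine two parts: a classifier for whether any fire occurs, and a regression for the magnitude when it does. The sketch below is a minimal NumPy version of that idea on synthetic data. It is not the `fit.hurdle` implementation; the function names, the gradient-descent settings, and the reading of `log=True` as "regress on the log of the positive values" are all assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_intercept(x):
    return np.column_stack([np.ones(len(x)), x])


def fit_hurdle(x, y, lr=0.5, n_iter=2000):
    """Fit occurrence (logistic) and log-magnitude (least squares) parts."""
    xb = add_intercept(x)
    occurred = y > 0
    target = occurred.astype(float)
    # Part 1: logistic regression on occurrence, by plain gradient descent.
    w_clf = np.zeros(xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(xb @ w_clf)
        w_clf -= lr * xb.T @ (p - target) / len(y)
    # Part 2: linear regression on log(y), using positive samples only.
    w_reg, *_ = np.linalg.lstsq(
        xb[occurred], np.log(y[occurred]), rcond=None
    )
    return w_clf, w_reg


def predict_hurdle(x, w_clf, w_reg):
    """Combine the parts: P(y > 0) times exp(predicted log magnitude)."""
    xb = add_intercept(x)
    return sigmoid(xb @ w_clf) * np.exp(xb @ w_reg)


# Synthetic data, invented for illustration only
rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 1))
occurs = rng.random(2000) < sigmoid(2.0 * x[:, 0] - 1.0)
magnitude = np.exp(0.5 * x[:, 0] + 0.1 * rng.normal(size=2000))
y = np.where(occurs, magnitude, 0.0)

w_clf, w_reg = fit_hurdle(x, y)
yhat = predict_hurdle(x, w_clf, w_reg)
```

Splitting the problem this way suits data that is mostly zeros with occasional positive values, since a single regression would be dominated by the zeros.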

In [87]:
yhat = model.predict(x_z)
prediction = collect.fire(yhat, mtbs)

In [92]:
plot.fire.monthly(prediction, "prediction", clim=(0.00005, 0.0015))

In [94]:
plot.fire.evaluation(
    mtbs, prediction, "monthly", "prediction", clim=(0, 0.005)
)

We can also compare the predicted and observed probabilities over time.


In [22]:
x = mtbs["time"]
y1 = mtbs["monthly"].mean(["x", "y"])
y2 = prediction["prediction"].mean(["x", "y"])

(
    plot.line(
        x=x,
        y=y1,
        width=900,
        height=200,
        opacity=0.5,
        color="rgb(175,91,92)",
        strokeWidth=2,
    )
    + plot.line(
        x=x, y=y2, width=900, height=200, color="rgb(175,91,92)", strokeWidth=2
    )
)

Given that we fit a logistic regression, the natural model performance metric is
the area under the ROC curve, which we computed on the training data (reported
as `train_roc` in the model summary above). You can use the variable `crossval`
to repeat the model fit and compute a score on 25% held-out data.
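As a sketch of what these two ideas involve, the code below computes the area under the ROC curve as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, and builds a random 25% hold-out split. Both helpers are hypothetical illustrations, not the functions the library uses internally.

```python
import numpy as np


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive score against every negative score.
    diff = pos[:, None] - neg[None, :]
    # Ties count as half a win.
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def holdout_split(n, frac=0.25, seed=0):
    """Return (train_idx, test_idx) for a random split of n samples."""
    order = np.random.default_rng(seed).permutation(n)
    n_test = int(round(n * frac))
    return order[n_test:], order[:n_test]


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ordered correctly
train_idx, test_idx = holdout_split(100)
```

The pairwise comparison uses memory proportional to the number of positives times the number of negatives, so for data the size of this notebook's a rank-based implementation would be the better choice.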


Finally, we can compute correlations between the observed data and the model
prediction for annual, seasonal, and spatial trends. Note that these metrics
were not used to explicitly fit the model, but in general a better-fitting model
ought to reproduce at least some of these trends.


In [23]:
print("correlation of annual trends")
np.corrcoef(
    mtbs["monthly"].groupby("time.year").mean().mean(["x", "y"]),
    prediction["prediction"].groupby("time.year").mean().mean(["x", "y"]),
)[0, 1]

correlation of annual trends


0.8316705458275839

In [24]:
print("correlation of monthly trends")
np.corrcoef(
    mtbs["monthly"].groupby("time.month").mean().mean(["x", "y"]),
    prediction["prediction"].groupby("time.month").mean().mean(["x", "y"]),
)[0, 1]

correlation of monthly trends


0.9589361751309387

In [25]:
print("correlation of spatial trends")
a = mtbs["monthly"].mean("time").values.flatten()
b = prediction["prediction"].mean("time").values.flatten()
inds = ~np.isnan(a) & ~np.isnan(b)
np.corrcoef(a[inds], b[inds])[0, 1]

correlation of spatial trends


0.4522937952697327

### Prediction on future climate data

In [97]:
cmip_model = 'CanESM5'
scenario = 'ssp245'
target = 2025

In [98]:
future_climate = load.cmip(
    store=store,
    model=cmip_model,
    coarsen=coarsen,
    scenario=scenario,
    tlim=(2020, 2100),
    variables=variables,
    mask=mask,
    sampling="monthly",
)

In [28]:
x = prepare.fire(future_climate, nftd, eval_only=True)
x_z = utils.zscore_2d(x, mean=x_mean, std=x_std)

In [29]:
yhat = model.predict(x_z)
prediction = collect.fire(yhat, future_climate)

In [96]:
plot.fire.summary(prediction, "prediction", clim=(0.0006, 0.008))

In [31]:
x = prediction["time"]
y2 = prediction["prediction"].mean(["x", "y"])

(
    plot.line(
        x=x, y=y2, width=900, height=200, color="rgb(175,91,92)", strokeWidth=2
    )
)