# Alaska Common Practice

_by Grayson Badgley_

This notebook represents our best effort to recreate the common practice (CP) values for the three
Alaska assessment areas.


In [None]:
import holoviews as hv
import geoviews as gv
import geoviews.feature as gf
import cartopy
import cartopy.feature as cf
from shapely.geometry import Point

import seaborn as sns

from geoviews import opts, dim
from cartopy import crs as ccrs

gv.extension("matplotlib", "bokeh")

gv.output(dpi=120, fig="svg")

hv.output(backend="bokeh")

In [None]:
import geopandas
import numpy as np
import pandas as pd
from shapely.geometry import Point

from carbonplan_retro.load.fia import load_fia_common_practice
from carbonplan_retro.data import cat
from carbonplan_retro.load import geometry

In [None]:
from collections import defaultdict

## Load the data


In [None]:
df = load_fia_common_practice("ak", min_year=2000, max_year=2020, private_only=False)

supersections = geometry.load_supersections("https://storage.googleapis.com/carbonplan-data")

# `predicate` replaces the `op` keyword, which newer geopandas releases no longer accept
df = geopandas.sjoin(
    df, supersections[["SSection", "geometry", "ss_id"]], how="inner", predicate="within"
)
# df = df.rename(columns={"index_right": "supersection_id"})

## Target

We're aiming to reproduce the following numbers:


In [None]:
arb_cp = pd.Series(
    {
        "Alaska Range Transition": 37.14,
        "Alexander Archipelago - Kodiak": 120.22,
        "Gulf-NorthCoast-Chugach": 84.9,
    },
    name="ARB_calc",
)
display(arb_cp)

From our CONUS work we've learned that resampling the data can be helpful in narrowing in on how our
CP estimates relate to ARB estimates.


In [None]:
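def ci(data, summarize=True, n_samples=500, weights=True):
    # Bootstrap the mean SLAG: resample plots with replacement n_samples times.
    # With weights=True, draws are weighted by condition proportion (CONDPROP_UNADJ);
    # with weights=False, every row is equally likely.
    sample_weights = "CONDPROP_UNADJ" if weights else None
    resample = [
        data.sample(len(data), replace=True, weights=sample_weights).slag_co2e_acre.mean()
        for _ in range(n_samples)
    ]
    if summarize:
        # 95% interval plus the median of the bootstrapped means
        return pd.Series(resample).quantile(q=[0.025, 0.5, 0.975])
    return resample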
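
The bootstrap above can be sanity-checked on a toy frame. The sketch below is a minimal, seeded variant of the same idea. The names `bootstrap_mean_interval` and `covers`, and the `toy` values, are hypothetical and made up for illustration; they are not part of `carbonplan_retro` and are not FIA data.

```python
import pandas as pd


def bootstrap_mean_interval(data, n_samples=500, seed=0):
    """Return the 2.5%, 50%, and 97.5% quantiles of the bootstrapped mean SLAG."""
    means = [
        # An integer random_state per draw keeps the resampling reproducible.
        data.sample(
            len(data), replace=True, weights="CONDPROP_UNADJ", random_state=seed + i
        ).slag_co2e_acre.mean()
        for i in range(n_samples)
    ]
    return pd.Series(means).quantile(q=[0.025, 0.5, 0.975])


def covers(interval, target):
    """True if `target` lies between the lower and upper bootstrap quantiles."""
    return bool(interval.iloc[0] <= target <= interval.iloc[2])


# Made-up values, only to exercise the functions.
toy = pd.DataFrame(
    {
        "slag_co2e_acre": [10.0, 20.0, 30.0, 40.0, 50.0],
        "CONDPROP_UNADJ": [1.0, 1.0, 0.5, 0.5, 0.25],
    }
)
interval = bootstrap_mean_interval(toy, n_samples=200)
```

With real data, `covers(interval, arb_cp[name])` would give a quick yes/no on whether an ARB target falls inside the resampled range for a given assessment area.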

## CONUS Method

In CONUS, we use the following criteria:

- `min_year` of 2002, 2005, or 2008
- `max_year` of 2012
- `OWNCD == 46` (private land)

If we make no assumptions about FORTYPCDs (i.e., include all CONDs), we don't get the ARB number
for two of the three assessment areas.


In [None]:
df[(df.MEASYEAR >= 2008) & (df.OWNCD == 46) & (df.MEASYEAR <= 2012)].groupby("SSection").apply(ci)

We're also fairly sure that limiting the number of FORTYPCDs considered doesn't make a significant
difference. I hand-transcribed the "associated species" from the ARB lookup table into the lists in
the following cell -- these represent the FORTYPCDs that map to the listed species in the ARB lookup
table.


In [None]:
# mapping of all FORTYPCDs that appear in TREE table for state of Alaska -- used as reference to populate ss_fortyps
# ak_fortyps = {703: 'cottonwood', 902: 'paper birch', 901: 'aspen', 122: 'white spruce', 125: 'black spruce', 709: 'cottonwood/willow', 270: 'mountain hemlock', 305: 'sitka spruce', 999: 'unstocked', 904: 'balsam poplar', 301: 'western hemlock', 264: 'pacific silver fir', 271: 'ak yellow-cedar', 962: 'other hardwoods', 911: 'red alder', 281: 'lodgepole pine', 304: 'western redcedar' }
ss_fortyps = {
    285: [305, 122, 125, 703, 709, 901, 902, 999],
    286: [271, 703, 709, 281, 270, 902, 911, 305, 301, 304, 999],
    287: [703, 270, 902, 305, 301, 122, 999],
}

In [None]:
res = {}
for ss_id, fortyps in ss_fortyps.items():
    mean_slag = df[
        (df.INVYR >= 2002)
        & (df.INVYR <= 2012)
        & (df.OWNCD == 46)
        & (df.ss_id == ss_id)
        & (df.FORTYPCD.isin(fortyps))
    ].slag_co2e_acre.mean()
    res[ss_id] = mean_slag
display(res)

Changing the years around doesn't help all that much either, though changing to 2008 really changes
287 -- North Coast/Chugach. But things are still way too low.


In [None]:
res = defaultdict(dict)


for min_year in [2002, 2005, 2008]:
    for ss_id, fortyps in ss_fortyps.items():
        mean_slag = df[
            (df.INVYR >= min_year)
            & (df.INVYR <= 2012)
            & (df.OWNCD == 46)
            & (df.ss_id == ss_id)
            & (df.FORTYPCD.isin(fortyps))
        ].slag_co2e_acre.mean()
        res[min_year][ss_id] = mean_slag
pd.DataFrame(res)

If we relax the ownership requirement (the `OWNCD` criterion is commented out below), things look a
little better. Now 286 is about where ARB calculates, 287 is closer to the ARB number (but high),
and 285 is significantly below the reported CP of 37.14 tCO2e/acre.


In [None]:
res = defaultdict(dict)


for min_year in [2002, 2005, 2008]:
    for ss_id, fortyps in ss_fortyps.items():
        mean_slag = df[
            (df.INVYR >= min_year)
            & (df.INVYR <= 2012)
            &
            # (df.OWNCD == 46) &
            (df.ss_id == ss_id)
            & (df.FORTYPCD.isin(fortyps))
        ].slag_co2e_acre.mean()
        res[min_year][ss_id] = mean_slag
pd.DataFrame(res)

Mostly through trial and error, I learned that removing three OWNCDs helps a bunch.

- 21: NPS -- national parks can't be logged (I don't think), so it made sense to exclude them
- 31: state land
- 32: local authorities

When we exclude those, we get about our best guess so far -- 286/287 are more or less spot on. Sure,
ARB's actual calculation will be slightly different, but at least we can demonstrate there is a
reasonable approach for going from the data we have available to the estimates they provide. For
285, however, we're still not able to get the right answer.


In [None]:
res = defaultdict(dict)
for min_year in [2002, 2005, 2008]:

    for ss_id, fortyps in ss_fortyps.items():
        mean_slag = df[
            (df.INVYR >= min_year)
            & (df.INVYR <= 2012)
            & (~df.OWNCD.isin([21, 31, 32]))
            & (df.ss_id == ss_id)
            & (df.FORTYPCD.isin(fortyps))
        ].slag_co2e_acre.mean()
        res[min_year][ss_id] = mean_slag
pd.DataFrame(res)
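
The loops above differ only in how they treat `OWNCD`, so the filtering could be folded into one helper. This is a sketch under the assumption that the column names match those used above. `mean_slag_table` and its `include_owncd` / `exclude_owncd` parameters are hypothetical names introduced here, and the `toy` frame is made up.

```python
import pandas as pd


def mean_slag_table(
    df, ss_fortyps, min_years, max_year=2012, include_owncd=None, exclude_owncd=None
):
    """Unweighted mean SLAG per supersection (rows) and min_year (columns)."""
    res = {}
    for min_year in min_years:
        mask = (df.INVYR >= min_year) & (df.INVYR <= max_year)
        if include_owncd is not None:
            mask &= df.OWNCD.isin(include_owncd)
        if exclude_owncd is not None:
            mask &= ~df.OWNCD.isin(exclude_owncd)
        res[min_year] = {
            ss_id: df[
                mask & (df.ss_id == ss_id) & df.FORTYPCD.isin(fortyps)
            ].slag_co2e_acre.mean()
            for ss_id, fortyps in ss_fortyps.items()
        }
    return pd.DataFrame(res)


# Made-up rows, only to show the two ownership modes side by side.
toy = pd.DataFrame(
    {
        "INVYR": [2003, 2009, 2009, 2010],
        "OWNCD": [46, 31, 46, 11],
        "ss_id": [285, 285, 285, 285],
        "FORTYPCD": [122, 122, 125, 125],
        "slag_co2e_acre": [10.0, 2.0, 30.0, 40.0],
    }
)
toy_fortyps = {285: [122, 125]}

private_only = mean_slag_table(toy, toy_fortyps, [2002, 2008], include_owncd=[46])
no_state_local = mean_slag_table(toy, toy_fortyps, [2002, 2008], exclude_owncd=[21, 31, 32])
```

Passing neither ownership argument reproduces the "relaxed ownership" case.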

For good measure, let's just relax the forest type constraint for now and instead do the resampling
estimate of mean SLAG for each assessment area. This gives us another glimpse -- looks like we can
get decently close for Alex/Kodiak and NorthCoast/Chugach -- but Alaska Range Transition just isn't
there.


In [None]:
df[(df.MEASYEAR >= 2002) & (df.MEASYEAR <= 2012) & (~df.OWNCD.isin([21, 31, 32]))].groupby(
    "SSection"
).apply(ci)
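
To put a number on "decently close", estimates can be lined up against the ARB targets as a percent difference. The sketch below assumes the estimates arrive as a Series indexed by assessment area name. `compare_to_arb` is a hypothetical helper, and the `placeholder` estimates are synthetic multiples of the targets, not results from this notebook.

```python
import pandas as pd

arb_cp = pd.Series(
    {
        "Alaska Range Transition": 37.14,
        "Alexander Archipelago - Kodiak": 120.22,
        "Gulf-NorthCoast-Chugach": 84.9,
    },
    name="ARB_calc",
)


def compare_to_arb(estimates, targets):
    """Tabulate estimates next to targets with a signed percent difference."""
    out = pd.concat([estimates.rename("estimate"), targets.rename("ARB_calc")], axis=1)
    out["pct_diff"] = 100 * (out["estimate"] - out["ARB_calc"]) / out["ARB_calc"]
    return out


# Synthetic stand-ins: 50%, 100%, and 110% of each target (NOT real estimates).
placeholder = arb_cp * pd.Series([0.5, 1.0, 1.1], index=arb_cp.index)
comparison = compare_to_arb(placeholder, arb_cp)
```

In practice, `estimates` might be the median column of the `ci` output above.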

## Looking for obvious explanations

Our CP estimate for ART is just too low. That means we need some sizable pool of plots that have a
higher SLAG to get the right answer.

To explore this, we aggregate the data across various dimensions to see if there is some dimension
that has much higher SLAG.

### Measurement Year

Looks like there are some ups and downs year to year, but nothing obvious pops out -- so it doesn't
look like just fiddling with temporal filtering is going to solve our problem.


In [None]:
res = defaultdict(dict)
df[
    (df.INVYR >= 2000)
    & (df.INVYR <= 2015)
    &
    # (~df.OWNCD.isin([21,31,32])) &
    (df.ss_id == 285)
    & (df.FORTYPCD.isin(ss_fortyps[285]))
].groupby("MEASYEAR").slag_co2e_acre.agg(["mean", "count"])

### OWNCD

Ownership paints a similar picture -- we see that `OWNCD == 31` plots are definitely lower than
other ownership classes -- which helps explain why their exclusion helps us get closer to the
ARB-estimated CPs. But apart from `31`, there isn't some ownership class with significantly higher
CP that we're somehow undersampling (apart from 25, but with just two plots I'm not going to read
anything into that).


In [None]:
res = defaultdict(dict)
df[
    (df.INVYR >= 2000)
    & (df.INVYR <= 2018)
    &
    # (~df.OWNCD.isin([21,31,32])) &
    (df.ss_id == 285)
    & (df.FORTYPCD.isin(ss_fortyps[285]))
].groupby("OWNCD").slag_co2e_acre.agg(["mean", "count"])
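
One thing that may be worth checking: the grouped means here are unweighted, while `ci` weights its draws by `CONDPROP_UNADJ`. The sketch below shows both side by side per group, assuming the same column names. `weighted_group_means` is a hypothetical helper and the `toy` values are made up. Whether ARB weights by condition proportion is not something this notebook establishes.

```python
import numpy as np
import pandas as pd


def weighted_group_means(data, by, value="slag_co2e_acre", weight="CONDPROP_UNADJ"):
    """Per-group unweighted mean, weight-averaged mean, and row count."""

    def _summarize(group):
        return pd.Series(
            {
                "mean": group[value].mean(),
                "weighted_mean": np.average(group[value], weights=group[weight]),
                "count": len(group),
            }
        )

    # Select only the needed columns so the grouping key is not passed to apply.
    return data.groupby(by)[[value, weight]].apply(_summarize)


# Made-up rows, only to show where the two means diverge.
toy = pd.DataFrame(
    {
        "OWNCD": [31, 31, 46, 46],
        "slag_co2e_acre": [10.0, 20.0, 30.0, 50.0],
        "CONDPROP_UNADJ": [1.0, 1.0, 0.25, 0.75],
    }
)
summary = weighted_group_means(toy, "OWNCD")
```

A large gap between the two columns for a given class would suggest that partial conditions are driving its mean.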

### FORTYPCD

Last but not least, is there anything going on by FORTYPCD and OWNCD? I'm just not seeing it. I
guess if we excluded 122/125 (white/black spruce) we'd probably get much closer -- but we know that
black spruce is going to thrive in this transition zone. These FORTYPCDs are also specifically
mentioned in the ARB lookup table. So yeah there just isn't a clear explanation for why our CP
estimate in ART is so much lower.


In [None]:
res = defaultdict(dict)
df[
    (df.INVYR >= 2002)
    & (df.INVYR <= 2018)
    &
    # (~df.OWNCD.isin([21,31,32])) &
    (df.ss_id == 285)
    # (df.FORTYPCD.isin(ss_fortyps[285]))
].groupby(["FORTYPCD", "OWNCD"]).slag_co2e_acre.agg(["mean", "count"])

## FIA spatial coverage in ART

The fact we can figure out two of the AK assessment areas, but not the third (ART) might not be
_that_ troubling if it weren't for another fact. There do not seem to be FIA plots in large portions
of the ART assessment area (shown in blue below). The lightly shaded black points in the figure
below are FIA plot locations, taken from the
[AK.accdb posted by ARB](https://ww2.arb.ca.gov/our-work/programs/compliance-offset-program/compliance-offset-protocols/us-forest-projects/2015/common-practice-data).
We are under the assumption that these raw data are the basis for ARB's reported CP estimates. When
we plot those data, however, we see that large sections of the Alaska Range Transition assessment
area have no FIA data. In my conversations with some folks at USFS and reading of online materials,
it doesn't seem that FIA has actually made it to these parts of Alaska yet. So how did ARB include
them? Do these regions not have FIA data? Or is there some other data source, thus explaining our
inability to recreate the CP number from FIA data alone? Without more guidance, I do not believe
that we can definitively calculate CP for the Alaska Range Transition assessment area.


In [None]:
arb_ak_plot = pd.read_csv("/home/jovyan/lost+found/arb_ak_plot.csv")
arb_ak_plot = arb_ak_plot[arb_ak_plot["PLOT_STATUS_CD"] == 1]  # at least one forest COND sampled

geo = [Point(r.LON, r.LAT) for r in arb_ak_plot.itertuples()]
arb_ak_plot = geopandas.GeoDataFrame(data=arb_ak_plot, geometry=geo, crs="epsg:4326")

supersections.geometry = supersections.simplify(0.025)

gv.Polygons(supersections[supersections["ss_id"] > 100], vdims=["SSection"]).opts(
    cmap="tab10", width=600, height=300, alpha=0.25
) * gv.Points(arb_ak_plot).opts(color="k", size=2, alpha=0.10)
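
The visual impression of missing plots could be backed by a rough coverage number. The sketch below counts the share of grid cells in a bounding box that contain at least one plot. It is a crude proxy under stated assumptions: it uses a rectangular box rather than the actual ART polygon, and `grid_coverage`, the grid size, and the toy coordinates are all hypothetical.

```python
import numpy as np


def grid_coverage(lons, lats, bounds, n_lon_cells, n_lat_cells):
    """Fraction of grid cells within `bounds` holding at least one plot.

    `bounds` is (min_lon, min_lat, max_lon, max_lat).
    """
    min_lon, min_lat, max_lon, max_lat = bounds
    counts, _, _ = np.histogram2d(
        lons,
        lats,
        bins=[n_lon_cells, n_lat_cells],
        range=[[min_lon, max_lon], [min_lat, max_lat]],
    )
    return float((counts > 0).mean())


# Toy coordinates: two plots share one cell, a third sits in another cell.
toy_lons = np.array([0.5, 0.6, 3.5])
toy_lats = np.array([0.5, 0.4, 1.5])
coverage = grid_coverage(toy_lons, toy_lats, bounds=(0, 0, 4, 2), n_lon_cells=4, n_lat_cells=2)
```

With the real data, `arb_ak_plot.LON` and `arb_ak_plot.LAT` would be the inputs, and cells outside the supersection polygon would need to be masked before reading much into the fraction.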