This notebook compares the final retro-db, containing all values extracted from
Offset Project Data Reports (OPDRs), against the official ARB issuance table.

This comparison is meant to accomplish two tasks. First, it should identify
instances where OPDR data does not agree with the official ARB issuance outcome.
Such discrepancies likely arise when the offset registries host out-of-date
OPDRs (e.g., ARB issued ARBOCs on the basis of a separate/updated OPDR that we
cannot access). Second, we want to demonstrate the ability to reconstruct ARBOC
calculations from the raw "IFM components" that are reported in each OPDR. IFM
projects are required to report numerous individual "components" when
quantifying net changes in stored carbon. For IFM projects, the primary
components include:

- IFM-1: Standing live tree carbon (above and below ground)
- IFM-3: Standing dead ("all portions")
- IFM-7: Carbon contained in in-use forest products
- IFM-8: Carbon contained in landfilled forest products

IFM projects are also required to report "secondary effects," which account for
several other "secondary components." Each OPDR reports calculated secondary
effects in section TK.

This means we end up with three "allocation" values:

- Issuance: the official, issued allocation of ARBOCs as recorded by ARB.
- OPDR-Reported: the ARBOC allocation reported by the OPO/APD in the OPDR.
- OPDR-Calculated: issuance derived from IFM-1, IFM-3, IFM-7, IFM-8, and
  secondary effects (SE).

In an ideal world, we would have agreement between all three values. This
notebook allows us to explore cases where these values diverge.
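
The three-way comparison described above can be sketched on toy data. This is
only an illustration: the project IDs and numbers below are made up, and the
one-ARBOC tolerance is an assumption chosen to mirror the checks later in this
notebook.

```python
import pandas as pd

# Hypothetical allocations (in ARBOCs) for three made-up projects.
toy = pd.DataFrame(
    {
        "issuance": [1000, 2500, 400],
        "opdr_reported": [1000, 2480, 400],
        "opdr_calculated": [1000.4, 2480.2, 415.0],
    },
    index=pd.Index(["PROJ_A", "PROJ_B", "PROJ_C"], name="opr_id"),
)

toy["reported_vs_issuance"] = toy["opdr_reported"] - toy["issuance"]
toy["reported_vs_calculated"] = toy["opdr_reported"] - toy["opdr_calculated"]

# Treat a project as "in agreement" when both differences are under one ARBOC.
toy["all_agree"] = (toy["reported_vs_issuance"].abs() < 1) & (
    toy["reported_vs_calculated"].abs() < 1
)
print(toy)
```

Here only the first toy project agrees on all three values; the second differs
from issuance and the third differs from the calculated value.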

We've also produced a
[narrative description](https://www.eenews.net/climatewire/2020/12/22/stories/1063721299)
of these discrepancies.


In [None]:
%load_ext nb_black
import pathlib
import sys
from itertools import permutations

import numpy as np
import pandas as pd

sys.path.append("/Users/darryl/proj/carbonplan/retro/")

from retrospective.load.issuance import load_issuance_table
from retrospective.load.project_db import load_project_db
from retrospective.analysis import allocation

# Load retro-db and issuance table


In [None]:
project_db = load_project_db("Forest-Offset-Projects-v0.3", use_cache=False)

project_db = project_db[
    ~project_db["project"]["early_action"].str.startswith("CAR")
]

In [None]:
issuance_table = load_issuance_table(
    "/Users/darryl/forest-retro/documents-of-interest/arb/issuance/arboc_issuance_2020-09-09.xlsx"
)
issuance_table = issuance_table[
    issuance_table["is_ea"] == False
]  # we didn't look at any of the EA projects in their EA form; exclude them

# One project has multiple issuance events in its first reporting period;
# aggregate them so each (project, reporting period) pair has a single row.
agg_by_rp = issuance_table.groupby(["opr_id", "arb_rp_id"])[
    ["allocation", "buffer_pool"]
].sum()
issuance_first_rp = agg_by_rp.xs("A", level=1)
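
To make the aggregate-then-select step above concrete, here is a minimal sketch
on made-up issuance events. The column names match the ones used above, but the
rows are hypothetical and the real issuance table may carry additional columns.

```python
import pandas as pd

# Hypothetical issuance events: PROJ_A has two events in reporting period "A".
events = pd.DataFrame(
    {
        "opr_id": ["PROJ_A", "PROJ_A", "PROJ_A", "PROJ_B"],
        "arb_rp_id": ["A", "A", "B", "A"],
        "allocation": [100, 50, 75, 200],
        "buffer_pool": [10, 5, 8, 20],
    }
)

# Sum all events that share a (project, reporting period) pair.
by_rp = events.groupby(["opr_id", "arb_rp_id"])[["allocation", "buffer_pool"]].sum()

# Keep only reporting period "A"; xs drops that level, leaving opr_id as the index.
first_rp = by_rp.xs("A", level=1)
print(first_rp)
```

Because `xs` drops the selected level by default, the result can be joined
directly onto any frame indexed by `opr_id`.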

## Run the calculations


In [None]:
opdr_calculated = allocation.calculate_allocation(
    project_db, round_intermediates=False
)
compare_allocations = pd.concat(
    [opdr_calculated, project_db["rp_1"]["allocation"].rename("opdr_reported")],
    axis=1,
)

compare_allocations = compare_allocations.join(
    issuance_first_rp["allocation"].rename("issuance")
)

delta_opdr = (
    compare_allocations["opdr_reported"]
    - compare_allocations["opdr_calculated"]
)
delta_issuance = (
    compare_allocations["opdr_reported"] - compare_allocations["issuance"]
)

# Issuance and Reported do not agree


In [None]:
issuance_reported_differ = delta_issuance[delta_issuance.abs() != 0]

# hand classified
reported_issuance_errors = {
    "unexplained_possible_overcredit": ["CAR1175"],
    "outdated_opdr_likely": [
        "CAR1257",
        "CAR1215",
        "CAR1264",
        "VCSOPR10",
        "CAR1213",
    ],
    "flagged_correctable": ["CAR1103", "CAR1208"],
    "rounding_de_minimis": ["ACR284", "CAR1095"],
}

assert sum(len(v) for v in reported_issuance_errors.values()) == len(
    issuance_reported_differ
)  # 12 December 2020: if this count changes, make sure you understand why
# As of 22 December 2020, the 10 projects classified above continue to have issues.
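
One subtlety of the `!= 0` filter above: in pandas, a missing value compares as
"not equal" to zero, so a project with no matching issuance record would also be
counted as a discrepancy. Whether any such project exists here is not shown in
this notebook; the sketch below uses made-up values only to illustrate how the
two cases could be separated.

```python
import numpy as np
import pandas as pd

# Hypothetical differences; PROJ_C has no issuance record, so its delta is NaN.
delta = pd.Series([0.0, -20.0, np.nan], index=["PROJ_A", "PROJ_B", "PROJ_C"])

# NaN != 0 evaluates to True, so PROJ_C is kept alongside the real discrepancy.
differs_naive = delta[delta.abs() != 0]

# Only genuine, nonzero differences.
differs_strict = delta[delta.notna() & (delta != 0)]

# Projects with nothing to compare against.
missing = delta[delta.isna()]
```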

The worst possible situation would be one where all three values disagree
(OPDRreported != OPDRcalculated != Issuance). Thankfully, that doesn't happen.


In [None]:
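# Wherever reported and issued allocations differ, reported and calculated
# should still agree to within one ARBOC (in either direction).
assert np.all(delta_opdr[delta_issuance.abs() != 0].abs() < 1)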

# OPDRreported & OPDRcalculated

Here is a summary of where things stand. While some significant disagreements
still exist, overall we've done a fantastic job of recreating the issuance
numbers. I've explored every discrepancy greater than one ARBOC and, at this
point, have growing confidence that the differences reflected here are "true"
disagreements and not caused by data entry problems on my side.


In [None]:
threshes = [1, 2, 5, 25]

for thresh in threshes:
    display(
        f"{len(compare_allocations[delta_opdr.abs() < thresh])} of the {len(project_db)} projects are within {thresh} ARBOC(s)"
    )

I've gone through all cases where the difference is greater than one ARBOC and
tried to figure out what is going on. Those findings are reproduced below and
are also kept here as a "comment" dict, so that we can output the comments on a
per-project basis to a CSV.

In the end, we identify 11 projects with ARBOC errors >= 2.
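
As a rough sketch of the "comment dict to per-project CSV column" idea, the
snippet below flattens a category-to-projects mapping into one tag per project
and checks that no project was classified twice. The categories, project IDs,
and values are hypothetical; this is one possible approach, not necessarily the
one used in the cells that follow.

```python
import pandas as pd

# Hypothetical hand-classified categories (tag -> list of project IDs).
categories = {
    "rounding": ["PROJ_A", "PROJ_B"],
    "harvest_error": ["PROJ_C"],
}

# Flatten to (project, tag) pairs, then build a project -> tag Series.
pairs = [(proj, tag) for tag, projs in categories.items() for proj in projs]
tags = pd.Series(dict(pairs), name="tag")

# A project listed under two tags would silently keep only the last one.
assert len(tags) == len(pairs), "a project appears under more than one tag"

comparison = pd.DataFrame(
    {"delta": [0.4, -1.2, 15.0, 0.0]},
    index=pd.Index(["PROJ_A", "PROJ_B", "PROJ_C", "PROJ_D"], name="opr_id"),
)

# Assignment aligns on the index, so untagged projects (PROJ_D) are left as NaN.
comparison["tag"] = tags
```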


In [None]:
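# The 12 largest absolute differences between reported and calculated
# allocations (full_comparison is not defined until a later cell).
delta_opdr.abs().sort_values().tail(12)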

In [None]:
# hand classified
delta_reported_calculated = {
    "small_rounding_errors": ["CAR1094", "CAR1204", "ACR256", "ACR257"],
    "harvest_error": ["ACR247", "CAR1217", "ACR276"],
    "big_rounding_errors_uncorrected": ["ACR360", "ACR427"],
    "big_rounding_errors_corrected": ["ACR282"],
    "small_error_not_rounding": ["CAR1032"],
}

In [None]:
full_comparison = compare_allocations.join(
    delta_issuance.rename("delta_reported_less_issuance")
).join(delta_opdr.rename("delta_reported_less_calculated"))

# full_comparison.to_csv("../data/odpr_issuance_math.csv", float_format="%.3f")

In [None]:
full_comparison["tag_reported_not_equal_calculated"] = None
for k, lst in delta_reported_calculated.items():
    full_comparison.loc[lst, "tag_reported_not_equal_calculated"] = k

full_comparison["tag_reported_not_equal_issuance"] = None
for k, lst in reported_issuance_errors.items():
    full_comparison.loc[lst, "tag_reported_not_equal_issuance"] = k


full_comparison.index = full_comparison.index.rename("opr_id")
full_comparison.to_csv("/tmp/opdr_discrepancies.csv", float_format="%.3f")