This notebook compares the final retro-db, containing all values extracted from Offset Project Data Reports (OPDRs), against the official ARB issuance table. 

This comparison is meant to accomplish two tasks. 
First, it should identify instances where OPDR data does not agree with the official ARB issuance outcome. 
Such discrepancies likely arise when the offset registries host out-of-date OPDRs (e.g., ARB issued ARBOCs on the basis of a separate/updated OPDR that we cannot access).
Second, we want to demonstrate the ability to reconstruct ARBOC calculations from the raw "IFM components" that are reported in each OPDR. 
IFM projects are required to report numerous individual "components" when quantifying net changes in stored carbon.
For IFM projects, the primary components include: 

- IFM-1: Standing live tree carbon (above \& below ground)
- IFM-3: Standing dead ("all portions")
- IFM-7: Carbon contained in in-use forest products
- IFM-8: Carbon contained in landfilled forest products

IFM projects are also required to report "secondary effects", which account for several other "secondary components."
Each OPDR reports calculated secondary effects in section TK.

This means we end up with three "allocation" values:
- Issuance: the official, issued allocation of ARBOCs as recorded by ARB.
- OPDR-Reported: the ARBOC allocation reported in the OPDR by the OPO/APD.
- OPDR-Calculated: the allocation derived from IFM-1, IFM-3, IFM-7, IFM-8, and secondary effects (SE).

In an ideal world, we would have agreement between all three values. 
This notebook allows us to explore cases where these values diverge.

In [2]:
%load_ext nb_black
import pathlib
import sys

import numpy as np
from itertools import permutations
import pandas as pd

sys.path.append("/Users/darryl/proj/carbonplan/retro/")

from retrospective.load.issuance import issuance
from retrospective.load.retro import retro
from retrospective.analysis import allocation


# Load retro-db and issuance table

loading load Forest-Offset-Projects-v0.3 from /Users/darryl/proj/carbonplan/retro/data



['CAR1063',
 'CAR1161',
 'CAR1162',
 'CAR1159',
 'CAR1134',
 'CAR1140',
 'CAR1099',
 'CAR1067',
 'CAR1086',
 'CAR1147',
 'CAR1100',
 'CAR1070',
 'CAR1130',
 'CAR1088',
 'CAR1141',
 'CAR1139',
 'CAR1062',
 'CAR1098',
 'CAR1160']


In [None]:
retro_db = retro("Forest-Offset-Projects-v0.3", use_cache=True)

retro_db = retro_db[~retro_db["project"]["early_action"].str.startswith("CAR")]
retro_db = retro_db[
    retro_db["baseline"]["initial_carbon_stock"]
    > retro_db["baseline"]["common_practice"]
]

In [None]:
issuance_table = issuance(
    "/Users/darryl/forest-retro/documents-of-interest/arb/issuance/arboc_issuance_2020-09-09.xlsx"
)
issuance_table = issuance_table[
    issuance_table["is_ea"] == False
]  # we didn't look at any of the EA projects in their EA form; exclude them

agg_by_rp = issuance_table.groupby(["opr_id", "arb_rp_id"])[
    ["allocation", "buffer_pool"]
].sum()  # One project has multiple issuance events in its first reporting period, aggregate them
issuance_first_rp = agg_by_rp.xs("A", level=1)
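
The aggregation above can be hard to picture without data. The following is a minimal sketch on an invented table (the project IDs `P1`/`P2` and all numbers are hypothetical, not taken from the issuance table) showing how summing by `(opr_id, arb_rp_id)` and then taking the `"A"` cross-section leaves one row per project.

```python
import pandas as pd

# Hypothetical issuance events; IDs and values are invented for illustration.
toy_issuance = pd.DataFrame(
    {
        "opr_id": ["P1", "P1", "P1", "P2"],
        "arb_rp_id": ["A", "A", "B", "A"],
        "allocation": [100, 50, 70, 200],
        "buffer_pool": [10, 5, 7, 40],
    }
)

# Sum all issuance events within each (project, reporting period) pair.
toy_by_rp = toy_issuance.groupby(["opr_id", "arb_rp_id"])[
    ["allocation", "buffer_pool"]
].sum()

# Keep only reporting period "A"; the result is indexed by opr_id alone.
toy_first_rp = toy_by_rp.xs("A", level=1)
print(toy_first_rp)
```

Here `P1` has two events in period `"A"`, so its allocation becomes 150, while its period `"B"` row is dropped by the cross-section.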

In [None]:
opdr_calculated = allocation.calculate_allocation(retro_db, round_intermediates=False)
compare_allocations = pd.concat(
    [opdr_calculated, retro_db["rp_1"]["allocation"].rename("opdr_reported")], axis=1
)

In [None]:
display(
    f"There are {len(retro_db)} COP IFM projects where ICS > CP [we might relax this later]"
)

In [None]:
compare_allocations = compare_allocations.join(
    issuance_first_rp["allocation"].rename("issuance")
)
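
One thing worth keeping in mind about the join above: `DataFrame.join` is a left join by default, so a project with no matching issuance row would carry `NaN` in the `issuance` column. The sketch below uses invented IDs and values to show how such a row behaves under a `!= 0` filter; whether any real project is affected is not something this example can tell us.

```python
import pandas as pd

# Hypothetical reported allocations for three projects.
toy = pd.DataFrame({"opdr_reported": [100.0, 200.0, 300.0]}, index=["P1", "P2", "P3"])
# Hypothetical issuance values; P3 is deliberately missing.
toy_issuance = pd.Series({"P1": 100.0, "P2": 170.0}, name="issuance")

toy = toy.join(toy_issuance)  # left join: P3 gets NaN for issuance
toy_delta = toy["opdr_reported"] - toy["issuance"]

# NaN != 0 evaluates to True, so the missing project looks like a discrepancy.
differ_naive = toy_delta[toy_delta.abs() != 0]

# Dropping NaN first keeps only projects where both values are known.
known = toy_delta.dropna()
differ_known = known[known.abs() != 0]

print(differ_naive.index.tolist())
print(differ_known.index.tolist())
```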

In [None]:
delta_opdr = (
    compare_allocations["opdr_reported"] - compare_allocations["opdr_calculated"]
)
delta_issuance = compare_allocations["opdr_reported"] - compare_allocations["issuance"]

# Issuance and Reported do not agree

In [None]:
issuance_reported_differ = delta_issuance[delta_issuance.abs() != 0]

# hand classified
reported_issuance_errors = {
    "unexplained": ["CAR1175", "CAR1257", "CAR1215", "CAR1264", "VCSOPR10", "CAR1213"],
    "flagged_correctable": ["CAR1103", "CAR1208"],
    "de_minimus": ["ACR284", "CAR1095"],
}

assert sum(len(v) for v in reported_issuance_errors.values()) == len(
    issuance_reported_differ
)  # count as of 12 December 2020; if it changes, make sure you understand why
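
The count check above would still pass if one project dropped out of the discrepancy list while another entered it. A stricter variant, sketched here on invented IDs and a hypothetical classification dict, compares the sets of IDs directly.

```python
import pandas as pd

# Hypothetical deltas (reported minus issued); IDs and values are invented.
toy_delta = pd.Series({"P1": 0.0, "P2": 30.0, "P3": -0.4, "P4": 0.0})
toy_differ = toy_delta[toy_delta.abs() != 0]

# Hypothetical hand classification of the projects that differ.
toy_classes = {"unexplained": ["P2"], "de_minimis": ["P3"]}

classified = [pid for ids in toy_classes.values() for pid in ids]

# No project should be classified twice ...
assert len(classified) == len(set(classified))

# ... and the classified IDs should be exactly the IDs that differ.
missing = set(toy_differ.index) - set(classified)  # differ but not classified
stale = set(classified) - set(toy_differ.index)  # classified but no longer differ
assert not missing and not stale, (missing, stale)
```

If either set is non-empty, the assertion message names the offending project IDs, which is usually faster to act on than a bare count mismatch.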

When issuance and reported values differ, it's always the case that OPDR-Calculated and OPDR-Reported are within 1 ARBOC of each other (a rounding issue). 
This gives us confidence that the Issuance != Reported cases fall into three primary categories:

1. Outdated OPDRs (5) [All at CAR]
2. De minimis rounding considerations (2)
3. Flagged Correctable (2)

There is one possible exception: CAR1175. It seems that the OPDR is up to date, but 30 too many ARBOCs have been issued.

In [None]:
assert np.all(delta_opdr[delta_issuance.abs() != 0].abs() < 1)

In [None]:
threshes = [1, 2, 10, 100]

for thresh in threshes:
    display(
        f"{len(compare_allocations[delta_opdr.abs() < thresh])} of the {len(retro_db)} projects are within {thresh} ARBOC(s)"
    )
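
For a sense of what the threshold loop reports, here is a small sketch on invented differences (the IDs and values are hypothetical). It also shows why the absolute value matters: a signed comparison counts large negative differences as "within" the threshold.

```python
import pandas as pd

# Hypothetical reported-minus-calculated differences, in ARBOCs.
toy_delta_opdr = pd.Series({"P1": 0.2, "P2": -1.5, "P3": 8.0, "P4": -250.0})

# Number of projects whose absolute difference is below each threshold.
counts = {
    thresh: int((toy_delta_opdr.abs() < thresh).sum())
    for thresh in [1, 2, 10, 100]
}
print(counts)  # {1: 1, 2: 2, 10: 3, 100: 3}

# Without .abs(), P2 and P4 also satisfy "< 1" because they are negative.
signed_count = int((toy_delta_opdr < 1).sum())
print(signed_count)  # 3
```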

I've gone through every case where the difference is > 1 and tried to work out what is going on. Those findings are reproduced below and are also kept here as a `comments` dict, so the comments can be written out per project to a CSV.

In [1]:
comments = {
    # Reported != Issuance
    "ACR248": "OPDRreported includes fractional ARBOC. We never round OPDRreported",
    "CAR1095": "Allocation not reported in OPDR, only buffer pool contribution and our efforts to impute allocation yield a discrepancy",
    "CAR1103": "Correctable Error note issued that matches Issuance",
    "CAR1208": "Correctable Error note issued that matches Issuance",
    "CAR1175": "Unexplained. BH confirms 30 ARBOC difference",
    "CAR1257": "Unexplained. Out of date OPDR?",
    "CAR1215": "Unexplained. Out of date OPDR?",
    "CAR1264": "Unexplained. OPDR reversal does not match Issuance-derived reversal. Seems likely an out of date OPDR.",
    "VCSOPR10": "Unexplained.",
    "CAR1213": "Unexplained. Initial OPDR has completion date that is more recent than Annual OPDR for RP1 and the two documents have different baselines. Seems Initial OPDR is out of date.",
    # Errors > 100
    # Unexplained
    "CAR1183": "OPDRreported is 1000 greater than OPDRcalculated. Unexplainable.",
    # CD rounding that is definitely overcrediting
    "ACR282": "OPDR reports CD of 0.3%. However, OPDRreported seems to assume CD == 0%. Results in overcrediting.",
    # CD rounding that is perhaps overcrediting
    "ACR427": "OPDR reports CD of 2.445%, but OPDRreported seems to assume CD == 2.4%. Depending on how rounding is treated, could be overcrediting.",
    "ACR360": "OPDR reports CD of 0.67% but OPDRreported seems to assume CD ~= 0.66531%. Likely not overcrediting but need clarification on rounding",
    # Harvest
    "ACR247": "Large harvest component -- still exploring. FC+TFG. BH agrees -- gets off by +12947. Has something to do with how they pro-rated harvest in baseline and potentially how they calculated secondary effects!",
    "CAR1217": "Large harvest component -- still exploring. BH off by +1047 as well.",
    "ACR276": "Large harvest component -- still exploring. BH off by +3298. Blue Source + TFG",
    # <= 100 & > 2; All explainable by CDreported != CDused
    "CAR1205": "TK -- recently entered, could have mistake",
    "CAR1032": "Whole value so likely not rounding, BH also off by 2",
    "CAR1094": "Could be caused by unrounded CD",
    "ACR257": "Could be caused by unrounded CD",
    "CAR1204": "Could be caused by unrounded CD",
    "ACR256": "Could be caused by unrounded CD",
    "ACR361": "Could be caused by unrounded CD [TK Double Check]",
    # Errors < 2 -- Explained by Leakage/CD rounding
    "ACR260": "Likely Rounding (CD and/or Leakage)",
    "ACR288": "Likely Rounding (CD and/or Leakage)",
    "CAR1314": "Likely Rounding (CD and/or Leakage)",
    "ACR423": "Likely Rounding (CD and/or Leakage)",
    "ACR182": "Likely Rounding (CD and/or Leakage)",
    "CAR1104": "TK",
    # Errors < 2 -- Have CD == 0 but still could be leakage rounding error? Have confirmed data entered correctly
    "CAR1066": "CD == 0. Intermediate rounding?",
    "ACR393": "CD == 0. Intermediate rounding?",
    
    # old problems but now resolved
    "CAR1197": "could be something funny with harvest, IFM-7/IFM-8 but if just take all values at face, it works",


}

In [None]:
full_comparison = compare_allocations.join(
    delta_issuance.rename("delta_reported_less_issuance")
).join(delta_opdr.rename("delta_reported_less_calculated"))
full_comparison["comment"] = full_comparison.index.map(comments)

full_comparison.to_csv("../data/odpr_issuance_math.csv", float_format="%.3f")
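
Because `Index.map` with a dict silently yields `NaN` for IDs that have no entry, and ignores dict keys that match no project, a quick audit of both directions can catch a mistyped project ID. A minimal sketch, assuming invented IDs and a hypothetical `toy_comments` dict:

```python
import pandas as pd

# Hypothetical comparison table indexed by project ID.
toy_comparison = pd.DataFrame(
    {"opdr_reported": [100.0, 200.0], "issuance": [100.0, 170.0]},
    index=["P1", "P2"],
)
# Hypothetical comments; "P9" stands in for a key that matches no project.
toy_comments = {"P2": "Unexplained.", "P9": "Key that matches nothing"}

# Index.map with a dict leaves NaN for any ID that has no entry.
toy_comparison["comment"] = toy_comparison.index.map(toy_comments)

# Projects that received no comment.
uncommented = toy_comparison.index[toy_comparison["comment"].isna()].tolist()
# Comment keys that were never used.
orphaned = sorted(set(toy_comments) - set(toy_comparison.index))

print(uncommented)  # ['P1']
print(orphaned)  # ['P9']
```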

- ACR288: Using the Initial OPDR reported value of 12.37% would yield a 1.1 ARBOC difference.
- CAR1046: See allocation analysis; we think the documentation is out of date.
- ACR425: Using 10.6% yields a 3.2 ARBOC difference.
- ACR458: Using 10.6% yields a 78.478 ARBOC difference.
- CAR1130: The issuance table yields 19.21, which feeds into the previous discussion of rounding. The RP1 OPDR and the Initial OPDR use 19.2, but section 7.3 of the Initial OPDR reports 19.24 percent. What is going on here?

## semi-unrelated
- CAR1180: Undocumented change in reversal risk from Listing to RP1 because the Initial OPDR is absent. What does FOP say about changes between Listing and Initial?
- CAR1264: Similar undocumented change. It looks like we only have an outdated Initial OPDR.

Questions of rounding become even more important when we get to confidence deductions.


In [None]:
issuance_reversal = (
    (issuance_first_rp["buffer_pool"] / issuance_first_rp["allocation"])
    .round(4)
    .rename("issuance_reversal")
)

In [None]:
reported_reversal = retro_db["project"]["reversal_risk"].rename("reported_reversal")


In [None]:
reversal_risks = (
    pd.concat([reported_reversal.round(4), issuance_reversal], axis=1)
    .dropna()
    .astype(float)
)

In [None]:
reversal_risks.loc["CAR1205", "issuance_reversal"]

In [None]:
reversal_risks[
    reversal_risks["reported_reversal"] != reversal_risks["issuance_reversal"]
]
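
The comparison above relies on exact float equality after rounding both sides to four decimals. As a hedged alternative, the sketch below mirrors the same steps on invented numbers (IDs, allocations, and buffer contributions are all hypothetical) and flags mismatches with an explicit tolerance of half a unit in the fourth decimal place.

```python
import numpy as np
import pandas as pd

# Hypothetical first-reporting-period issuance values.
toy_first_rp = pd.DataFrame(
    {"allocation": [1000.0, 5000.0], "buffer_pool": [192.0, 530.0]},
    index=["P1", "P2"],
)
# Hypothetical reversal risks as reported in project documents.
toy_reported = pd.Series({"P1": 0.192, "P2": 0.1237}, name="reported_reversal")

# Reversal risk implied by issuance, computed the same way as in the cells above.
toy_implied = (
    (toy_first_rp["buffer_pool"] / toy_first_rp["allocation"])
    .round(4)
    .rename("issuance_reversal")
)
toy_risks = pd.concat([toy_reported.round(4), toy_implied], axis=1).dropna()

# Flag rows where the two values differ by more than the tolerance.
mismatch = ~np.isclose(
    toy_risks["reported_reversal"],
    toy_risks["issuance_reversal"],
    rtol=0,
    atol=5e-5,
)
print(toy_risks[mismatch])
```

In this toy data only `P2` is flagged: its implied value is 0.106 against a reported 0.1237.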

In [None]:
delta_opdr[delta_opdr.abs() >= 100].sort_values()

# Reported != Issuance
## Early Action Errors
- CAR1062: The first reporting period of this project was a reversal, so the allocation is listed as n/a in the issuance table.
- CAR1070: Likely just not the most up-to-date Initial OPDR? The verification report has the wrong number.
- CAR1161: No note, but its partner project has a note, so likely just out of date.
- CAR1162: Has a note. It had a correctable error in 2020, which meant that it was only issued 777 credits.

## De minimis
- ACR284: we never round

## Documentation Error

- CAR1095: No allocation is listed in the document, and imputing it from rounded values does not reproduce the issuance table.

## Correctable Errors

- CAR1103 (BS): Had a correctable error, which meant it was issued 3136 fewer credits. A project note was provided, but no other details.
- CAR1208: Another correctable error example.

## Unexplained
- CAR1175: This might be a transposition error? Not sure. Why was its allocation issued over three dates? BH gets the same thing: 30 too many.
- CAR1257: Maybe CAR forgot to upload a correctable error?
- CAR1215: ??
- CAR1264: Has to be out of date; it has an incorrect reversal calculation too.
- VCSOPR10: The numbers just disagree, with no documentation explaining why. Both the Initial and Annual OPDRs report the values as they appear in retro_db.
- CAR1213: This is a meaningful example. The Initial OPDR is newer than the RP1 OPDR. The baseline in the Initial OPDR is lower than the baseline in the RP1 OPDR, which seems to be outdated.



# Reported != Calculated
## Bigger Errors

### Just off?
- CAR1183: Was originally off by 1000, but the only error I could find just made things worse. Transposition? [TK from BH]

### Fairly Sure Rounding Error on Confidence 

#### Overcrediting 
- ACR282: The OPDR reported number and the final issuance for the first reporting period assume that the confidence deduction is equal to zero. 
OPDR reports a 0.3\% CD. 
Rounding, here, yields an overcrediting of 9171 ARBOCs.
See `Extras`: if the confidence deduction is set to 0, the calculation is off by < 1. 
- ACR427: OPDR reports CD of 2.445\%. 
If you round CD to 2.4\%, OPDRcalculated = OPDRreported = Issuance. 
If rounding is not allowed, this yields overcrediting of 4096 ARBOCs. 

#### Likely reports a rounded confidence deduction but uses a more precise (lower) CD
- ACR360: Reports a confidence deduction of 6.7\%.
If we change CD to 6.6531\%, the error is less than one ARBOC. 
It seems likely that the correct number of ARBOCs was issued if CD can be reported to arbitrary precision.
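
To see why small changes in the confidence deduction move the result by many credits, here is a deliberately simplified sketch. It assumes the deduction scales a single gross quantity by `(1 - CD)`, which is an illustration only and not the protocol's actual formula; `credits_after_deduction` and `hypothetical_gross` are invented names and values, not figures from any OPDR.

```python
def credits_after_deduction(gross: float, confidence_deduction: float) -> float:
    """Scale a gross quantity by (1 - CD). A simplification, not the protocol formula."""
    return gross * (1.0 - confidence_deduction)


hypothetical_gross = 1_000_000.0  # invented, not taken from any OPDR

# CD as reported to three decimals of a percent versus rounded to one decimal.
as_reported = credits_after_deduction(hypothetical_gross, 0.02445)
rounded_down = credits_after_deduction(hypothetical_gross, 0.024)

# Extra credits that result from rounding the deduction down.
extra = rounded_down - as_reported
print(round(extra, 3))  # 450.0
```

Under this simplification, rounding 2.445% down to 2.4% is worth 0.045% of the gross quantity, so the effect grows in proportion to project size.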


### Big Harvest/TC (mainly...)
- ACR247: Turns on how we interpret secondary effects? If we scale IFM-7/IFM-8 by 1.5, it mostly works... [FC with harvest]
- CAR1217: Reports positive secondary effects, which will be cast to zero by the calculations. More broadly, is this entry correct? [FC, with harvest]. BH gets the same answer.
- CAR1197: We think they give themselves credit for negative leakage? [FC with harvest! Error == SE!]
- ACR276: Similarly has SE == 0, but to get their number right, we need (IFM-7 + IFM-8)actual - (IFM-7 + IFM-8)baseline > 0 to be cast to 0. Still wrong after lots of effort to work through. IFM-7 and IFM-8 are probably too low because later RPs have the same values but are roughly half as long [first RP == 551 days]. Need BH back-up on this one. Definitely could be incorrect [Blue Source with heavy harvest; does IFM-7/8 scaling solve anything?]

## Middling Errors

In [None]:
delta_opdr[
    (delta_opdr.abs() < 100) & (delta_opdr.abs() >= 2)
].sort_values()

- CAR1032: Cannot be explained by the confidence deduction. BH is off by exactly two as well!



## More Rounding


- CAR1094: Double checked -- seems explainable due to rounding of confidence deduction.
- ACR257: Double checked RP[0] -- seems explainable by small rounding of confidence deduction.

## Over 10 ARBOCs
They're all explainable by making reasonable inferences about rounding of the confidence deduction.
- CAR1204: An error of 0.016035 would get you within 0.5 of an ARBOC [ROUNDED REPORTING >> USED RIGHT VALUE]
- ACR256: Confidence deduction reported at 1.00%, but it is still the likely culprit [double check w BH]
- ACR361: Similar situation. Confidence deduction reported as 3.4 percent [But it's a BS project; need to double check]
- CAR1205: TK


# Small Errors

In [None]:
delta_opdr[
    (delta_opdr.abs() >= 1) & (delta_opdr.abs() < 2)
].sort_values()

## Not rounding
These projects do not have confidence deductions. We double checked both but cannot explain the difference; it is likely due to some sort of rounding of intermediate steps in the ARBOC calculation.


- CAR1066 
- ACR393 

## Likely Rounding
Remaining errors are small. We double checked them all on 8 Dec 2020; that is the last time we plan to revisit them.
- [x] ACR260
- [x] ACR288
- [x] CAR1314
- [x] ACR423
- [x] ACR182 [from the worksheet we can see exactly how rounding of the confidence stat affects things]
- [ ] CAR1104 -- if we use the workbook number and not the description in the text, the error disappears. BH TK.
