In [None]:
import facilities
import pandas as pd
import projects
import users_and_facilities
import users_and_projects

In [None]:
# Define years over which compliance data will be considered and where to find it
# FOR UPDATES: add reporting and mrr data years

reporting_periods = ["2013-2014", "2015-2017", "2018-2020", "2021-2023"]
# reporting_periods = ["2013-2014", "2015-2017", "2018-2020", "2021", "2022"]
mrr_data_years = [
    "2013",
    "2014",
    "2015",
    "2016",
    "2017",
    "2018",
    "2019",
    "2020",
    "2021",
    "2022",
    "2023",
]

# FOR UPDATES: change to latest issuance table file name
issuance_table_path = "../data/nc-arboc_issuance_2024-12-10.xlsx"
compliance_report_path = "../data/compliance-reports/"
mrr_data_path = "../data/mrr-data/"
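
Instead of editing `issuance_table_path` by hand on each update, you could pick the most recent issuance export by the date in its filename. This is a minimal sketch. It assumes exports keep the `nc-arboc_issuance_YYYY-MM-DD.xlsx` naming seen above, and `latest_issuance_file` is a hypothetical helper that is not part of this repo.

```python
import re
from datetime import date
from pathlib import Path

# Assumed naming pattern, inferred from the single example path above.
ISSUANCE_PATTERN = re.compile(r"nc-arboc_issuance_(\d{4})-(\d{2})-(\d{2})\.xlsx$")


def latest_issuance_file(paths):
    """Return the path whose embedded date is most recent (hypothetical helper).

    Paths that do not match the assumed naming pattern are ignored.
    """
    dated = []
    for p in paths:
        match = ISSUANCE_PATTERN.search(Path(p).name)
        if match:
            year, month, day = (int(g) for g in match.groups())
            dated.append((date(year, month, day), str(p)))
    if not dated:
        raise FileNotFoundError("no file matches the assumed issuance naming pattern")
    # Tuples compare by date first, so max() picks the newest export.
    return max(dated)[1]
```

For example, `latest_issuance_file(Path("../data").glob("nc-arboc_issuance_*.xlsx"))` would return the newest matching file as a string.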

## Check project_df against issuance data

Check that the first and last rows of the new issuance table are included in `project_df`. 

In [None]:
project_df = projects.read_project_data(issuance_table_path)
project_df
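
The head/tail comparison can also be done in code rather than by eye. This is a minimal sketch. It assumes the raw issuance table and `project_df` share an identifier column. The column name `"project_id"` and the helper `head_tail_included` are hypothetical placeholders, so substitute whatever identifier the issuance table actually uses.

```python
import pandas as pd


def head_tail_included(raw_df: pd.DataFrame, processed_df: pd.DataFrame, key: str) -> dict:
    """Report whether the first and last keys of raw_df appear in processed_df[key]."""
    known = set(processed_df[key])
    first_key = raw_df[key].iloc[0]
    last_key = raw_df[key].iloc[-1]
    # Each entry is (key value, whether it was found in the processed dataframe).
    return {
        "first": (first_key, first_key in known),
        "last": (last_key, last_key in known),
    }
```

A possible call, assuming the sheet's first row holds the headers, is `head_tail_included(pd.read_excel(issuance_table_path), project_df, "project_id")`.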

## Check facility_df against mrr data

In [None]:
facility_df = facilities.read_facility_data(mrr_data_path, mrr_data_years)

The new reporting period should only include `facility_id`s that appear in the corresponding MRR data sheets. 
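To make that rule explicit, you can collect the `facility_id`s for the new period and list any that are absent from the MRR sheets. This is a minimal sketch. It assumes the MRR facility IDs have already been gathered into a collection, and `ids_missing_from_source` is a hypothetical helper.

```python
import pandas as pd


def ids_missing_from_source(
    df: pd.DataFrame, period: str, source_ids, id_col: str = "facility_id"
) -> list:
    """Return IDs from `period` rows of df that do not appear in source_ids."""
    period_ids = set(df.loc[df["reporting_period"] == period, id_col])
    # Compare as strings so "101162" and 101162 count as the same ID.
    source = {str(i) for i in source_ids}
    return sorted(str(i) for i in period_ids if str(i) not in source)
```

An empty list means every facility in the new period is backed by the MRR data.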

If the new reporting period is an annual update (e.g. `2022`), check that the first and last rows of the newest MRR data correspond to the first and last rows associated with the new reporting period in `facility_df`. 

If the new reporting period is a full reporting period update (e.g. `2021-2023`), check that the head of the new period's rows in `facility_df` corresponds to facilities in any of the corresponding MRR data files, and that the tail corresponds to the latest MRR data. (We only keep the latest instance of each `facility_id` in the dataframe.) 
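The "keep only the latest instance" rule behaves like `drop_duplicates` with `keep="last"` when rows are ordered from oldest to newest MRR year. The sketch below shows that behavior on toy data and may not match how `facilities.read_facility_data` is implemented. The `mrr_year` column is a hypothetical stand-in for whatever records the source year.

```python
import pandas as pd

# Toy rows ordered oldest to newest; facility "A" appears in two MRR years.
rows = pd.DataFrame(
    {
        "facility_id": ["A", "B", "A"],
        "mrr_year": ["2021", "2021", "2023"],  # hypothetical column
        "reporting_period": ["2021-2023"] * 3,
    }
)

# keep="last" retains the final (most recent) row for each facility_id,
# and kept rows stay in their original positions.
latest = rows.drop_duplicates(subset="facility_id", keep="last")
```

Kept rows keep their original order. That would explain why the head of a full period can hold facilities from any of its MRR years, while the tail reflects the newest data.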

In [None]:
# Rows for the newest reporting period (the last entry in reporting_periods)
facility_df[facility_df["reporting_period"] == reporting_periods[-1]]

## Check user_project_df against compliance report

In [None]:
# Assign to user_project_df so the users_and_projects module is not shadowed
user_project_df = users_and_projects.read_user_project_data(
    compliance_report_path, reporting_periods
)

When spot checking `user_project_df`, it's helpful to reference both the latest compliance report and the published version of the compliance users tool. First, subset the dataframe to the newly added reporting period and check that the head and tail match the top and bottom of the `Offset Detail` sheet of the newest compliance report.

In [None]:
user_project_df[user_project_df["reporting_period"] == reporting_periods[-1]]

Second, do some spot checks using the compliance users tool: https://carbonplan.org/research/compliance-users

For a number of random users, check that the tool results plus the latest compliance report match the `user_project_df` dataframe.

In [None]:
user_project_df[user_project_df["user_id"] == "CA1204"]
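
Choosing "random" users by hand tends to favour familiar IDs. The small sketch below samples distinct `user_id`s reproducibly with pandas. The sample size and seed are arbitrary choices, and `sample_ids` is a hypothetical helper.

```python
import pandas as pd


def sample_ids(df: pd.DataFrame, id_col: str = "user_id", n: int = 5, seed: int = 0) -> list:
    """Return up to n distinct IDs from df[id_col], chosen reproducibly."""
    unique_ids = df[id_col].drop_duplicates()
    # Cap n so small dataframes don't raise when n exceeds the number of IDs.
    return unique_ids.sample(n=min(n, len(unique_ids)), random_state=seed).tolist()
```

Each returned ID can then be filtered on exactly as in the cell above and compared against the tool.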

## Check user_facility_df against compliance report

In [None]:
user_facility_df = users_and_facilities.read_user_facility_data(
    compliance_report_path, reporting_periods
)

Rows associated with the new reporting period in `user_facility_df` should correspond to the unique ARB GHG IDs in the `Compliance Summary` tab of the new compliance report. 

In [None]:
# Rows for the newest reporting period (the last entry in reporting_periods)
user_facility_df[user_facility_df["reporting_period"] == reporting_periods[-1]]
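
To check that correspondence programmatically, you can diff the two ID sets in both directions. This is a minimal sketch. It assumes the unique ARB GHG IDs have already been read from the `Compliance Summary` tab into a list, for example via `pd.read_excel(..., sheet_name="Compliance Summary")`. The exact column to pull them from is not specified here, and `compare_id_sets` is a hypothetical helper.

```python
def compare_id_sets(expected_ids, actual_ids) -> dict:
    """Diff two ID collections, normalizing every ID to a stripped string."""
    expected = {str(i).strip() for i in expected_ids}
    actual = {str(i).strip() for i in actual_ids}
    return {
        "missing": sorted(expected - actual),  # in the report, not in the dataframe
        "unexpected": sorted(actual - expected),  # in the dataframe, not in the report
    }
```

Both lists should be empty when the new period lines up with the report.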

It's also helpful to spot check a couple of random users (i.e. entities) from the new compliance report against the `user_facility_df` and the existing compliance users tool: https://carbonplan.org/research/compliance-users

For a handful of `user_id`s, check that the tool results plus the new compliance report data match the dataframe. 

In [None]:
user_facility_df[user_facility_df["user_id"] == "CA1170"]

For a handful of `facility_id`s that appear in the new compliance report, check that the tool results plus new compliance report data match the dataframe.

In [None]:
user_facility_df[user_facility_df["facility_id"] == "101162"]
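
When comparing a facility against the tool, a per-period summary can be easier to read than the raw rows. This sketch groups one facility's linked `user_id`s by `reporting_period`. `users_by_period` is a hypothetical helper, and it relies only on the column names used in the cells above.

```python
import pandas as pd


def users_by_period(df: pd.DataFrame, facility_id: str) -> dict:
    """Map each reporting period to the sorted user_ids linked to one facility."""
    subset = df[df["facility_id"] == facility_id]
    return {
        period: sorted(group["user_id"].unique())
        for period, group in subset.groupby("reporting_period")
    }
```

For example, `users_by_period(user_facility_df, "101162")` would list the users tied to that facility in each period.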

That should do it! 

If all of those checks pass, run `build_users_data.py`. 

## Sandbox

Keep any other testing down here!

In [None]:
name, info = facilities.make_facility_info(facility_df, user_facility_df)