# FERC Electric Quarterly Report (EQR) Access Examples

### Background
The Electric Quarterly Report (EQR) is submitted by sellers participating in bilateral electricity market transactions. The reports summarize the contractual terms and conditions in agreements for all jurisdictional services, including cost-based sales, market-based rate sales, and transmission service, as well as transaction information for short-term and long-term market-based power sales and cost-based power sales.

We've made an initial version of the EQR dataset available in PUDL. We also apply some [basic cleaning](https://catalystcoop-pudl.readthedocs.io/en/latest/data_sources/ferceqr.html#pudl-data-transformations) to the raw data, though we know there is a lot more to be done. Please [reach out](https://github.com/catalyst-cooperative/pudl/issues/new?template=data_bug_report.yml) if you discover any irregularities in the data!

EQR data comprises the following four tables:

- `core_ferceqr__contracts`: contracts between companies participating in the power market
- `core_ferceqr__quarterly_identity`: information about report filers
- `core_ferceqr__quarterly_index_pub`: information about price indices
- `core_ferceqr__transactions`: individual transactions

For table- and column-level metadata, see our [docs](https://catalystcoop-pudl.readthedocs.io/en/latest/data_sources/ferceqr.html#ferc-form-920-electric-quarterly-report-eqr).

### Working with EQR

The main challenge of working with EQR is the size of the `core_ferceqr__transactions` table, which is more than 83 GB in total. The other tables are all small enough to load with a single call to `pandas.read_parquet()`, which we demonstrate below.

This notebook will go over how to access EQR data, and how to use `duckdb` and `pandas` together to work with the large transactions table.

## Basic access

We can access the EQR data much like other Parquet files published via PUDL - we just need to take care to:

* use the `ferceqr` directory (instead of `nightly` or `stable`)
* pass in the whole directory (`.../core_ferceqr__contracts`) instead of a single file within it (`.../core_ferceqr__contracts/2024q1.parquet`) - this lets pandas read *all* the data instead of just one part.

In [None]:
import pandas as pd

contracts = pd.read_parquet("s3://pudl.catalyst.coop/ferceqr/core_ferceqr__contracts/")
contracts.head()
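
As a rough sketch, the path convention described above can be wrapped in a small helper. `eqr_path` is a hypothetical name, not part of PUDL, and the sketch assumes that every EQR table follows the same `<table>/<quarter>.parquet` layout shown for `core_ferceqr__contracts`.

```python
from typing import Optional

# Base location of the EQR Parquet outputs, taken from the example above.
EQR_BASE = "s3://pudl.catalyst.coop/ferceqr"


def eqr_path(table: str, quarter: Optional[str] = None) -> str:
    """Return the URL of a whole table directory, or of one quarterly file.

    Hypothetical helper: assumes files are named like ``2024q1.parquet``.
    """
    if quarter is None:
        # Whole directory: lets pandas read every quarter at once.
        return f"{EQR_BASE}/{table}/"
    # Single file: only that quarter's data.
    return f"{EQR_BASE}/{table}/{quarter}.parquet"


print(eqr_path("core_ferceqr__quarterly_identity"))
print(eqr_path("core_ferceqr__contracts", "2024q1"))
```

If the layout assumption holds, passing the first result to `pd.read_parquet()` should load one of the smaller tables in full.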

The `contracts` table contains information about contracts between companies selling and buying electricity market products. 

For example, we can use this table to examine the distribution of types of products sold in a single Balancing Authority (BA) over a given time period. Let's look at PJM over the last full year of data, 2024:

In [None]:
import datetime

ba_name = "PJM"
date_range_start = datetime.date(2024, 1, 1)
date_range_end = datetime.date(2025, 1, 1)

contracts_filtered = contracts.loc[
    (contracts["point_of_delivery_balancing_authority"] == ba_name)
    & (contracts["contract_execution_date"] >= date_range_start)
    & (contracts["contract_execution_date"] < date_range_end)  # exclusive end, so Jan 1, 2025 is left out
]

contracts_filtered["product_type_name"].value_counts().plot.bar(rot=0)

It's clear that the vast majority of these contracts are classified as `MB`, and we can check the [data dictionary](https://catalystcoop-pudl.readthedocs.io/en/latest/data_dictionaries/pudl_db.html#core-ferceqr-contracts) to see the description of this code (and others):

```
MB: Energy, capacity or ancillary services sold under the seller's FERC-approved market-based rate tariff.
```
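
The filter-and-count steps above can be folded into a reusable function. The sketch below is one possible way to do it: `product_type_counts` is a hypothetical helper, the end date is treated as exclusive, and `pd.to_datetime` is used on the assumption that `contract_execution_date` holds dates or timestamps. The small DataFrame is made-up toy data for illustration only, not real EQR records.

```python
import datetime

import pandas as pd


def product_type_counts(contracts, ba_name, start, end):
    """Count product types for contracts in one BA executed in [start, end)."""
    # Normalize to timestamps so the comparison works for date or datetime input.
    executed = pd.to_datetime(contracts["contract_execution_date"])
    mask = (
        (contracts["point_of_delivery_balancing_authority"] == ba_name)
        & (executed >= pd.Timestamp(start))
        & (executed < pd.Timestamp(end))  # exclusive end date
    )
    return contracts.loc[mask, "product_type_name"].value_counts()


# Toy rows (not real EQR data) to show the half-open date range.
toy = pd.DataFrame(
    {
        "point_of_delivery_balancing_authority": ["PJM", "PJM", "PJM", "MISO"],
        "contract_execution_date": [
            datetime.date(2024, 3, 1),
            datetime.date(2024, 7, 15),
            datetime.date(2025, 1, 1),  # outside [2024-01-01, 2025-01-01)
            datetime.date(2024, 5, 5),
        ],
        "product_type_name": ["MB", "MB", "T", "MB"],
    }
)
counts = product_type_counts(
    toy, "PJM", datetime.date(2024, 1, 1), datetime.date(2025, 1, 1)
)
print(counts)
```

With the real `contracts` DataFrame in place of `toy`, the returned Series can be plotted with `.plot.bar(rot=0)` as before.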

There's a lot more information in this table. You can look at the rates written into the contracts, find which companies are the most common sellers or buyers, and so on. We'll leave that in your capable hands, and discuss options for working with the `transactions` table next.

## Accessing the Transactions Table

This table is quite large, and you won't be able to read it all into memory.

We'll go over two options:

* only process a small part of the data at a time, using `pandas`
* use `duckdb` to operate on the full data

Let's talk about `pandas` first. To shrink the amount of data we try to read in, we'll filter the data first. Let's look at just one month of data in PJM:

In [None]:
date_range_start = datetime.date(2024, 1, 1)
date_range_end = datetime.date(2024, 2, 1)

transactions = pd.read_parquet(
    "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__transactions/",
    filters=[
        ("trade_date", ">=", date_range_start),
        ("trade_date", "<", date_range_end),  # exclusive end keeps Feb 1 out
        ("point_of_delivery_balancing_authority", "=", ba_name)
    ],
)
transactions.head()

This works, though slowly, and only as long as you need a small portion of the table at a time. You will quickly run into memory limits if you need to access a larger chunk of the table. Tools like `duckdb` specialize in managing and optimizing the fiddly "load a small bit of data in, process it, then load a different small bit of data in" work.
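
To make that pattern concrete, here is a minimal hand-rolled sketch of chunked processing in `pandas`: read one file at a time, keep only one column, and accumulate running totals. It assumes the transactions directory holds one file per quarter named like `2024q1.parquet` (the naming shown earlier for the contracts table); `quarter_paths` and `count_by_chunk` are hypothetical helpers, not PUDL functions.

```python
from collections import Counter

import pandas as pd

TRANSACTIONS = "s3://pudl.catalyst.coop/ferceqr/core_ferceqr__transactions"


def quarter_paths(base, year):
    """Build one path per quarter, assuming files are named like 2024q1.parquet."""
    return [f"{base}/{year}q{q}.parquet" for q in range(1, 5)]


def count_by_chunk(paths, column, read=pd.read_parquet):
    """Accumulate value counts for one column, one file at a time."""
    totals = Counter()
    for path in paths:
        # Load a single column of a single file, so only a small slice
        # of the table is in memory at any moment.
        chunk = read(path, columns=[column])
        totals.update(chunk[column].value_counts().to_dict())
    return pd.Series(totals, dtype="int64").sort_values(ascending=False)


# Example (reads from S3, so it needs network access and may take a while):
# counts = count_by_chunk(quarter_paths(TRANSACTIONS, 2024), "class_name")
```

The `read` parameter exists only so that the loop can be tried against small local DataFrames before pointing it at the full table.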

First, let's use `duckdb` to recreate the one-month PJM query from the `pandas` example:

In [None]:
import duckdb

(
    duckdb.read_parquet("s3://pudl.catalyst.coop/ferceqr/core_ferceqr__transactions/*.parquet")
    .filter(f"trade_date >= DATE '{date_range_start}' AND trade_date < DATE '{date_range_end}'")
    .filter(f"point_of_delivery_balancing_authority = '{ba_name}'")
    .limit(5)
    .fetchdf()
)

`duckdb` is also capable of operating on the entire table at once.

As an example, we'll examine the distribution of values in the column `class_name`, which identifies transactions as firm (F), non-firm (NF), or unit power sale (UP).

In [None]:
(
    duckdb.read_parquet("s3://pudl.catalyst.coop/ferceqr/core_ferceqr__transactions/*.parquet")
    .value_counts(column="class_name", groups="class_name")
    .fetchdf()
    .set_index("class_name")
    .plot.bar(rot=0)
)

As with the `contracts` table, there is plenty more to explore on your own: individual transaction prices and product quantities come to mind. As a reminder, this data is only minimally cleaned, so if you run into any issues please [let us know](https://github.com/catalyst-cooperative/pudl/issues/new?template=data_bug_report.yml) and we'll try to fix them!