CONTENTS:
- [Description](#description)
  - [EIA Metadata API Tutorial](#eia-metadata-api-tutorial)
    - [Overview](#overview)
    - [Why Use This Notebook](#why-use-this-notebook)
    - [Requirements](#requirements)
  - [Setup](#setup)
    - [Imports](#imports)
    - [Set Up API Key](#set-up-api-key)
    - [Define Config](#define-config)
  - [Load Metadata](#load-metadata)
  - [Preview Metadata](#preview-metadata)
  - [Construct Full URL from One Value per Facet](#construct-full-url-from-one-value-per-facet)

<a name='description'></a>
# Description

This notebook demonstrates how to use the `EiaMetadataDownloader` to extract and understand
the metadata available via the EIA v2 API. It shows how to instantiate the downloader, run
extraction, and preview the resulting metadata and facet structure.

<a name='eia-metadata-api-tutorial'></a>
## EIA Metadata API Tutorial

<a name='overview'></a>
### Overview

In this notebook, you'll learn how to:

- Connect to the [EIA v2 API](https://www.eia.gov/opendata/) using a Python client.
- Traverse API categories to find available datasets.
- Retrieve and flatten metadata including frequency, available metrics, and facet dimensions.
- Access parameter values for facets such as state, sector, or energy type.

<a name='why-use-this-notebook'></a>
### Why Use This Notebook

- Automate the discovery of available EIA datasets without browsing the web interface.
- Generate all valid combinations of time series from EIA metadata.
- Understand how to construct API requests for specific metrics and filters.

<a name='requirements'></a>
### Requirements

To authenticate and interact with the EIA API, you'll need an API key. Follow these steps:

1. Visit the [EIA registration page](https://www.eia.gov/opendata/register.php).
2. Enter your email address and submit the form.
3. You'll receive a key via email. This key is passed as a query parameter in every API request.
4. Set the key as an environment variable (see [Set Up API Key](#set-up-api-key)).

<a name='setup'></a>
## Setup

<a name='imports'></a>
### Imports

In [1]:
%load_ext autoreload
%autoreload 2
import logging
import os

import helpers.hdbg as hdbg

import causal_automl.TutorTask401_EIA_metadata_downloader_pipeline.eia_utils as catemdpeu

# Enable logging.
hdbg.init_logger(verbosity=logging.INFO)
_LOG = logging.getLogger(__name__)

INFO  > cmd='/venv/lib/python3.12/site-packages/ipykernel_launcher.py -f /home/.local/share/jupyter/runtime/kernel-fa4cffb0-28ce-4810-a969-dfd68c850758.json'


<a name='set-up-api-key'></a>
### Set Up API Key

Store your **EIA API key** in an environment variable rather than hardcoding it in your code. You can set it in your terminal:

```sh
export EIA_API_KEY="your_personal_api_key"
```

Alternatively, you can set it within the notebook:

In [11]:
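# Set your EIA API key here. Skip this cell if the key is already exported in
# your terminal: running it with an empty string overwrites the exported value.
os.environ["EIA_API_KEY"] = ""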

In [3]:
# Ensure the API key is present in the environment.
hdbg.dassert_in(
    "EIA_API_KEY", os.environ, msg="Missing environment variable EIA_API_KEY."
)

# Retrieve it when needed.
api_key = os.getenv("EIA_API_KEY")
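
The assertion above only confirms that the variable exists, so an empty string (the default in the cell that sets the key inside the notebook) still passes. A minimal sketch of a stricter check follows. `require_api_key` is a hypothetical helper written for illustration; it is not part of `hdbg` or `eia_utils`.

```python
import os


def require_api_key(env_var: str = "EIA_API_KEY") -> str:
    """
    Return the API key stored in `env_var`, failing if it is unset or blank.

    Hypothetical helper for illustration; not part of `hdbg` or `eia_utils`.
    """
    # Treat a missing variable and a whitespace-only value the same way.
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise ValueError(f"Environment variable {env_var} is missing or empty.")
    return value
```

Calling `api_key = require_api_key()` in place of `os.getenv` would surface a blank key before any request is sent.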

<a name='define-config'></a>
### Define Config

In this section, we define the configuration used by the downloader:

- `category`: The root category path under the EIA v2 API. Examples include `electricity`, `petroleum`, `natural-gas`, etc.
- `version_num`: A version string to tag outputs. This is used in filenames and S3 paths.

These inputs help parameterize the metadata extraction process and keep output files versioned.

In [4]:
# Define category and output version.
category = "electricity"
version_num = "1.0"
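
To make the role of `version_num` concrete, the sketch below rebuilds the relative path that appears in the `parameter_values_file` column of the metadata preview in this notebook. The pattern is inferred from that output only; `sketch_parameters_path` is a hypothetical name, and the downloader's actual path logic may differ.

```python
def sketch_parameters_path(dataset_id: str, version_num: str) -> str:
    """
    Build a versioned path for a dataset's facet-parameter file.

    Assumption: the layout `eia_parameters_v<version>/<dataset>_parameters.csv`
    is inferred from the preview output, not taken from the downloader's code.
    """
    return f"eia_parameters_v{version_num}/{dataset_id}_parameters.csv"


print(sketch_parameters_path("retail_sales", "1.0"))
# eia_parameters_v1.0/retail_sales_parameters.csv
```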

<a name='load-metadata'></a>
## Load Metadata

We instantiate the `EiaMetadataDownloader` with a specified category, API key, and version number.

Then, we extract:
- A metadata table containing dataset routes, metrics, and frequencies
- A list of facet values required to construct valid API queries

In [5]:
# Initialize metadata downloader.
downloader = catemdpeu.EiaMetadataDownloader(
    category=category,
    api_key=api_key,
    version_num=version_num,
)

In [6]:
# Extract metadata.
df_metadata, param_entries = downloader.run_metadata_extraction()
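
The extraction step traverses the category tree until it reaches routes that expose data. The sketch below illustrates that idea offline; it is not the implementation of `EiaMetadataDownloader`. It assumes that a parent route's response lists its children under `routes` and that a leaf route carries `frequency`, `facets`, and `data`. The mock responses are trimmed placeholders, and `iter_leaf_routes` is a hypothetical name.

```python
from typing import Any, Callable, Dict, Iterator, Tuple

# Mock responses keyed by route path. Field names are assumed from the
# EIA v2 API; the values are trimmed placeholders, not real API output.
MOCK_RESPONSES: Dict[str, Dict[str, Any]] = {
    "electricity": {"routes": [{"id": "retail-sales"}, {"id": "rto"}]},
    "electricity/retail-sales": {
        "frequency": [{"id": "monthly"}, {"id": "quarterly"}],
        "facets": [{"id": "stateid"}, {"id": "sectorid"}],
        "data": {"revenue": {}, "sales": {}},
    },
    "electricity/rto": {"routes": [{"id": "region-data"}]},
    "electricity/rto/region-data": {
        "frequency": [{"id": "hourly"}],
        "facets": [{"id": "respondent"}],
        "data": {"value": {}},
    },
}


def iter_leaf_routes(
    path: str, fetch: Callable[[str], Dict[str, Any]]
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """
    Yield `(path, metadata)` for every route without child routes.
    """
    response = fetch(path)
    children = response.get("routes", [])
    if not children:
        # A route with no children is treated as a dataset.
        yield path, response
        return
    for child in children:
        # Recurse depth-first into each child route.
        yield from iter_leaf_routes(f"{path}/{child['id']}", fetch)


leaves = list(iter_leaf_routes("electricity", MOCK_RESPONSES.__getitem__))
print([path for path, _ in leaves])
# ['electricity/retail-sales', 'electricity/rto/region-data']
```

Passing `fetch` as a parameter keeps the traversal independent of the HTTP layer, so the same function could be pointed at a real client or at a mock like the one above.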

<a name='preview-metadata'></a>
## Preview Metadata

Each dataset defines one or more facets, which are categorical dimensions used to filter time series data. A valid query must specify one value per required facet (e.g., `stateid=CA`, `sectorid=COM`).

In [7]:
# Preview metadata index.
df_metadata.head()

| | url | id | dataset_id | name | description | frequency_id | frequency_alias | frequency_description | frequency_query | frequency_format | facets | data | data_alias | data_units | start_period | end_period | parameter_values_file |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://api.eia.gov/v2/electricity/retail-sale... | retail_sales_monthly_revenue | retail_sales | Electricity Sales to Ultimate Customers | Electricity sales to ultimate customer by stat... | monthly | | One data point for each month. | M | YYYY-MM | [{'id': 'stateid', 'description': 'State / Cen... | revenue | Revenue from Sales to Ultimate Customers | million dollars | 2001-01 | 2025-02 | eia_parameters_v1.0/retail_sales_parameters.csv |
| 1 | https://api.eia.gov/v2/electricity/retail-sale... | retail_sales_monthly_sales | retail_sales | Electricity Sales to Ultimate Customers | Electricity sales to ultimate customer by stat... | monthly | | One data point for each month. | M | YYYY-MM | [{'id': 'stateid', 'description': 'State / Cen... | sales | Megawatt-hours Sold to Ultimate Customers | million kilowatt hours | 2001-01 | 2025-02 | eia_parameters_v1.0/retail_sales_parameters.csv |
| 2 | https://api.eia.gov/v2/electricity/retail-sale... | retail_sales_monthly_price | retail_sales | Electricity Sales to Ultimate Customers | Electricity sales to ultimate customer by stat... | monthly | | One data point for each month. | M | YYYY-MM | [{'id': 'stateid', 'description': 'State / Cen... | price | Average Price of Electricity to Ultimate Custo... | cents per kilowatt-hour | 2001-01 | 2025-02 | eia_parameters_v1.0/retail_sales_parameters.csv |
| 3 | https://api.eia.gov/v2/electricity/retail-sale... | retail_sales_monthly_customers | retail_sales | Electricity Sales to Ultimate Customers | Electricity sales to ultimate customer by stat... | monthly | | One data point for each month. | M | YYYY-MM | [{'id': 'stateid', 'description': 'State / Cen... | customers | Number of Ultimate Customers | number of customers | 2001-01 | 2025-02 | eia_parameters_v1.0/retail_sales_parameters.csv |
| 4 | https://api.eia.gov/v2/electricity/retail-sale... | retail_sales_quarterly_revenue | retail_sales | Electricity Sales to Ultimate Customers | Electricity sales to ultimate customer by stat... | quarterly | | One data point every 3 months. | Q | YYYY-"Q"Q | [{'id': 'stateid', 'description': 'State / Cen... | revenue | Revenue from Sales to Ultimate Customers | million dollars | 2001-01 | 2025-02 | eia_parameters_v1.0/retail_sales_parameters.csv |
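
In the preview, each row corresponds to one (frequency, metric) pair of a dataset, and the `id` column reads as `<dataset_id>_<frequency_id>_<metric>`. The sketch below shows one way such a flattening could work. The naming rule and the input shape are inferred from the preview output, and `flatten_route` is a hypothetical function, not the downloader's code.

```python
from itertools import product
from typing import Any, Dict, List


def flatten_route(route: str, route_meta: Dict[str, Any]) -> List[Dict[str, str]]:
    """
    Expand one route's metadata into one row per (frequency, metric) pair.
    """
    # Assumption: the dataset id is the last route segment with `-` -> `_`.
    dataset_id = route.rsplit("/", 1)[-1].replace("-", "_")
    rows = []
    # Iterating a dict yields its keys, i.e. the metric names.
    for freq, metric in product(route_meta["frequency"], route_meta["data"]):
        rows.append(
            {
                "id": f"{dataset_id}_{freq['id']}_{metric}",
                "dataset_id": dataset_id,
                "frequency_id": freq["id"],
                "data": metric,
            }
        )
    return rows


# Trimmed placeholder metadata for illustration.
sample_meta = {
    "frequency": [{"id": "monthly"}, {"id": "quarterly"}],
    "data": {"revenue": {}, "sales": {}},
}
rows = flatten_route("electricity/retail-sales", sample_meta)
print([row["id"] for row in rows])
# ['retail_sales_monthly_revenue', 'retail_sales_monthly_sales',
#  'retail_sales_quarterly_revenue', 'retail_sales_quarterly_sales']
```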


In [8]:
# Preview facet values.
df_facet = param_entries[0][0]
df_facet.head()

| | dataset_id | facet_id | id | name | alias |
|---|---|---|---|---|---|
| 0 | retail_sales | stateid | IN | Indiana | (IN) Indiana |
| 1 | retail_sales | stateid | KS | Kansas | (KS) Kansas |
| 2 | retail_sales | stateid | MAT | Middle Atlantic | Region: (MAT) Middle Atlantic |
| 3 | retail_sales | stateid | CT | Connecticut | (CT) Connecticut |
| 4 | retail_sales | stateid | VA | Virginia | (VA) Virginia |


In [9]:
# Show each facet type with one sample value.
df_facet.groupby("facet_id").head(1)

| | dataset_id | facet_id | id | name | alias |
|---|---|---|---|---|---|
| 0 | retail_sales | stateid | IN | Indiana | (IN) Indiana |
| 62 | retail_sales | sectorid | OTH | other | (OTH) other |
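
Because a query takes one value per facet, the full set of candidate time series for a dataset is the Cartesian product of its facet values. A minimal sketch using a small hand-written subset of the values shown in this notebook (the `facet_values` dictionary is illustrative, not the complete list):

```python
from itertools import product

# Illustrative subset of facet values; the real tables hold many more.
facet_values = {
    "stateid": ["IN", "KS", "CT"],
    "sectorid": ["OTH", "COM"],
}

facet_ids = list(facet_values)
# One dict per combination, mapping each facet id to a single value.
combos = [
    dict(zip(facet_ids, values)) for values in product(*facet_values.values())
]
print(len(combos))
# 6
print(combos[0])
# {'stateid': 'IN', 'sectorid': 'OTH'}
```

With the facet table loaded as a DataFrame, the input dictionary could plausibly be built with `df_facet.groupby("facet_id")["id"].apply(list).to_dict()`. The number of combinations grows multiplicatively with each facet, so it may be worth filtering values before expanding.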


<a name='construct-full-url-from-one-value-per-facet'></a>
## Construct Full URL from One Value per Facet

In [10]:
# Since the URL would expose the actual API key, we overwrite it with a placeholder for safe display.
api_key = "API_KEY"

# Select sample route.
meta = df_metadata.iloc[0]

# Select facet values.
facet_input = {"stateid": "IN", "sectorid": "OTH"}

# Build a query URL to retrieve actual time series values from the EIA API.
full_url = catemdpeu.build_full_url(
    base_url=meta["url"],
    api_key=api_key,
    facet_input=facet_input,
)
print(full_url)

https://api.eia.gov/v2/electricity/retail-sales/data?api_key=API_KEY&frequency=monthly&data[0]=revenue&facets[stateid][]=IN&facets[sectorid][]=OTH
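
To show how the pieces of the printed URL fit together, the sketch below assembles the same query string by hand. It is not the implementation of `build_full_url`, which takes only `base_url`, `api_key`, and `facet_input` and so presumably reads the frequency and metric from the stored URL. `sketch_build_url` and its extra parameters are hypothetical.

```python
from typing import Dict
from urllib.parse import quote


def sketch_build_url(
    route_url: str,
    api_key: str,
    frequency: str,
    metric: str,
    facet_input: Dict[str, str],
) -> str:
    """
    Assemble a data-query URL in the same shape as the output above.
    """
    parts = [
        f"api_key={quote(api_key, safe='')}",
        f"frequency={quote(frequency, safe='')}",
        f"data[0]={quote(metric, safe='')}",
    ]
    # One `facets[<facet_id>][]=<value>` pair per facet; brackets are left
    # unencoded to match the printed URL.
    parts += [
        f"facets[{facet_id}][]={quote(str(value), safe='')}"
        for facet_id, value in facet_input.items()
    ]
    return f"{route_url}/data?" + "&".join(parts)


url = sketch_build_url(
    route_url="https://api.eia.gov/v2/electricity/retail-sales",
    api_key="API_KEY",
    frequency="monthly",
    metric="revenue",
    facet_input={"stateid": "IN", "sectorid": "OTH"},
)
print(url)
# https://api.eia.gov/v2/electricity/retail-sales/data?api_key=API_KEY&frequency=monthly&data[0]=revenue&facets[stateid][]=IN&facets[sectorid][]=OTH
```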
