# Chapter 1 - EIA API - Python Client

In this section, we will see how to query the EIA API with Python. We will use the functions in `eia_api.py` to send GET requests to the API.

We will continue with the same example we used before - the hourly demand of electricity for balancing authority subregion PGAE. As before, we will use the API dashboard to extract the GET request:

<figure>
<img src="./images/query-detail.png" width="100%" align="center"/>
<figcaption> Figure 1 - The GET request details for balancing authority subregion PGAE</figcaption>
</figure>

The `eia_api.py` file provides a set of functions to query data from the EIA API V2. This includes the following functions:

- `eia_get` - to send a GET request for data
- `eia_metadata` - to send a GET request for metadata
- `eia_backfill` - to send a GET request for a large data set (more than 5000 observations)

In [7]:
import eia_api as api

In addition, we will import the following libraries:

In [8]:
import os
import datetime
import plotly.express as px

## Pulling Metadata

Set the API key and the API path for the metadata request:

In [9]:
api_key = os.getenv('EIA_API_KEY')

api_meta_path = "electricity/rto/region-sub-ba-data/"

In [10]:
api_key

'dxkgn3KIeYc4tlAsjaUtCeQw30lCNxLHJg9t3JGZ'
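`os.getenv` returns `None` when the variable is not set, and echoing the key prints it in full in the notebook output. The following is a minimal sketch of a guard and a masked display; `get_api_key` and `mask_key` are hypothetical helpers introduced here for illustration and are not part of `eia_api.py`.

```python
import os


def get_api_key(var_name="EIA_API_KEY"):
    """Return the API key from the environment, or raise if it is missing."""
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key


def mask_key(key, visible=4):
    """Replace all but the last `visible` characters of the key with '*'."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

With these helpers, a cell can call `mask_key(get_api_key())` to confirm that a key is loaded without displaying it.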

Send a GET request for the route metadata:

In [11]:
meta = api.eia_metadata(
    api_key = api_key,
    api_path = api_meta_path  
)

In [12]:
meta.meta

{'id': 'region-sub-ba-data',
 'name': 'Hourly Demand by Subregion',
 'description': 'Hourly demand by balancing authority subregion.  \n    Source: Form EIA-930\n    Product: Hourly Electric Grid Monitor',
 'frequency': [{'id': 'hourly',
   'alias': 'hourly (UTC)',
   'description': 'One data point for each hour in UTC time.',
   'query': 'H',
   'format': 'YYYY-MM-DD"T"HH24'},
  {'id': 'local-hourly',
   'alias': 'hourly (Local Time Zone)',
   'description': 'One data point for each hour in local time.',
   'query': 'LH',
   'format': 'YYYY-MM-DD"T"HH24TZH'}],
 'facets': [{'id': 'subba', 'description': 'Subregion'},
  {'id': 'parent', 'description': 'Balancing Authority'}],
 'data': {'value': {'aggregation-method': 'SUM',
   'alias': 'Demand',
   'units': 'megawatthours'}},
 'startPeriod': '2018-06-19T05',
 'endPeriod': '2024-08-05T07',
 'defaultDateFormat': 'YYYY-MM-DD"T"HH24',
 'defaultFrequency': 'hourly'}
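The metadata tells us which facets and frequencies the route accepts and the time range the series covers. As a small sketch, the snippet below reads those fields from a hand-copied subset of the dictionary printed above (in the notebook, `meta.meta` would take its place). Mapping the `YYYY-MM-DD"T"HH24` format to the `strptime` pattern `%Y-%m-%dT%H` is an assumption based on the period strings shown.

```python
from datetime import datetime

# Subset of the metadata dictionary printed above, copied by hand
meta_dict = {
    "frequency": [
        {"id": "hourly", "format": 'YYYY-MM-DD"T"HH24'},
        {"id": "local-hourly", "format": 'YYYY-MM-DD"T"HH24TZH'},
    ],
    "facets": [
        {"id": "subba", "description": "Subregion"},
        {"id": "parent", "description": "Balancing Authority"},
    ],
    "startPeriod": "2018-06-19T05",
    "endPeriod": "2024-08-05T07",
    "defaultFrequency": "hourly",
}

# Names that can be used as keys of the `facets` argument
facet_ids = [facet["id"] for facet in meta_dict["facets"]]

# Values that can be passed as the `frequency` argument
frequency_ids = [freq["id"] for freq in meta_dict["frequency"]]

# First and last available periods as datetime objects
series_start = datetime.strptime(meta_dict["startPeriod"], "%Y-%m-%dT%H")
series_end = datetime.strptime(meta_dict["endPeriod"], "%Y-%m-%dT%H")

print(facet_ids, frequency_ids, series_start, series_end)
```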

## Sending A Simple GET Request

Set the parameters of the GET request:

In [13]:
api_key = os.getenv('EIA_API_KEY')

api_path = "electricity/rto/region-sub-ba-data/data/"

frequency = "hourly"

facets = {
    "parent": "CISO",
    "subba": "PGAE"
}

In [14]:
df1 = api.eia_get(
    api_key = api_key,
    api_path = api_path,
    frequency = frequency,
    facets = facets
)

In [15]:
df1.url

'https://api.eia.gov/v2/electricity/rto/region-sub-ba-data/data/?data[]=value&facets[parent][]=CISO&facets[subba][]=PGAE&frequency=hourly&api_key='
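The URL above shows how the request arguments end up in the query string. The following is a minimal sketch that reproduces that encoding with plain string formatting. `build_eia_url` is a hypothetical helper, not a function of `eia_api.py`, and the position of `start` and `end` in the query string is an assumption, since the output above does not include them.

```python
from datetime import datetime

BASE_URL = "https://api.eia.gov/v2/"


def build_eia_url(api_path, facets=None, frequency=None, data="value",
                  start=None, end=None, api_key=""):
    """Assemble a query URL in the same layout as the `df1.url` output."""
    parts = [f"data[]={data}"]
    # Each facet value becomes its own facets[<name>][]=<value> pair
    for name, value in (facets or {}).items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            parts.append(f"facets[{name}][]={item}")
    # Hourly periods are written as YYYY-MM-DDTHH (assumed placement)
    if start is not None:
        parts.append(f"start={start.strftime('%Y-%m-%dT%H')}")
    if end is not None:
        parts.append(f"end={end.strftime('%Y-%m-%dT%H')}")
    if frequency is not None:
        parts.append(f"frequency={frequency}")
    parts.append(f"api_key={api_key}")
    return BASE_URL + api_path + "?" + "&".join(parts)


url = build_eia_url(
    api_path="electricity/rto/region-sub-ba-data/data/",
    facets={"parent": "CISO", "subba": "PGAE"},
    frequency="hourly",
)
print(url)
```

The key is left empty here, as in the output above, so the printed URL can be shared without exposing credentials.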

In [16]:
df1.parameters

{'api_path': 'electricity/rto/region-sub-ba-data/data/',
 'data': 'value',
 'facets': {'parent': 'CISO', 'subba': 'PGAE'},
 'start': None,
 'end': None,
 'length': None,
 'offset': None,
 'frequency': 'hourly'}

In [17]:
df1.data

| | period | subba | subba-name | parent | parent-name | value | value-units |
|---|---|---|---|---|---|---|---|
| 4031 | 2018-07-02 22:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 14310 | megawatthours |
| 4032 | 2018-07-02 23:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 14895 | megawatthours |
| 4033 | 2018-07-03 01:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 15871 | megawatthours |
| 4034 | 2018-07-03 02:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 16086 | megawatthours |
| 4035 | 2018-07-03 04:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 15651 | megawatthours |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 425 | 2023-11-01 02:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 11411 | megawatthours |
| 399 | 2023-11-01 03:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 11802 | megawatthours |
| 468 | 2023-11-01 05:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 11305 | megawatthours |
| 388 | 2023-11-01 06:00:00 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 11150 | megawatthours |


In [18]:
df1.data.dtypes

period         datetime64[ns]
subba                  object
subba-name             object
parent                 object
parent-name            object
value                   int64
value-units            object
dtype: object
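Because `period` is parsed as a datetime, the series can be checked for missing hours. The first rows printed above jump from 23:00 to 01:00 and from 02:00 to 04:00. Whether those hours are absent from the returned data or only from this printout cannot be told from the output alone, so the sketch below simply shows one way to list such gaps. It uses the standard library on hand-copied timestamps, and `find_gaps` is a hypothetical helper.

```python
from datetime import datetime, timedelta

# First five `period` values from the output above, copied by hand
periods = [
    datetime(2018, 7, 2, 22),
    datetime(2018, 7, 2, 23),
    datetime(2018, 7, 3, 1),
    datetime(2018, 7, 3, 2),
    datetime(2018, 7, 3, 4),
]


def find_gaps(timestamps, step=timedelta(hours=1)):
    """Return the timestamps missing from an otherwise regular series."""
    ordered = sorted(timestamps)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        expected = previous + step
        # Walk forward until we reach the next observed timestamp
        while expected < current:
            gaps.append(expected)
            expected += step
    return gaps


missing_hours = find_gaps(periods)
print(missing_hours)
```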

## API Limitation

Let's plot the series:

In [None]:
px.line(df1.data, x="period", y="value")

The `start` and `end` arguments let us set a time range for the GET request. For example, let's pull data between January 1st, 2024 and February 24th, 2024:

In [None]:
start = datetime.datetime(2024, 1, 1, 1)
end = datetime.datetime(2024, 2, 24, 23)

df2 = api.eia_get(
    api_key = api_key,
    api_path = api_path,
    frequency = frequency,
    facets = facets,
    start = start,
    end = end
)
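Before sending a request with a time range, it can help to estimate how many hourly observations the range covers and compare that with the limit of 5000 observations per call. The sketch below does this with the standard library. `format_period` and `hourly_count` are hypothetical helpers, and treating both `start` and `end` as inclusive is an assumption about how the API interprets the range.

```python
from datetime import datetime, timedelta

API_LIMIT = 5000  # observations per call, as stated in this chapter


def format_period(timestamp):
    """Format a datetime in the hourly layout used by the API, e.g. 2024-01-01T01."""
    return timestamp.strftime("%Y-%m-%dT%H")


def hourly_count(start, end):
    """Number of hourly points from start to end, counting both endpoints."""
    return (end - start) // timedelta(hours=1) + 1


start = datetime(2024, 1, 1, 1)
end = datetime(2024, 2, 24, 23)

n_obs = hourly_count(start, end)
fits_in_one_call = n_obs <= API_LIMIT

print(format_period(start), format_period(end), n_obs, fits_in_one_call)
```

For this range the count stays well below the limit, so a single `eia_get` call is enough.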

In [None]:
px.line(df2.data, x="period", y="value")

## Handling A Large Data Request

When a series has more observations than the API limit of 5000 observations per call, use the `eia_backfill` function. The function splits the request into multiple smaller requests, where the `offset` argument defines the size of each request. It is recommended not to use an offset larger than 2500 observations. For example, let's pull data since July 1st, 2018:

In [None]:
start = datetime.datetime(2018, 7, 1, 8)
end = datetime.datetime(2024, 2, 24, 23)
offset = 2250

df3 = api.eia_backfill(
    start = start,
    end = end,
    offset = offset,
    api_path = api_path,
    api_key = api_key,
    facets = facets
)
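To illustrate the idea of splitting a long range into smaller requests, here is a minimal sketch that cuts an hourly range into consecutive windows of at most `offset` observations. It is not the implementation of `eia_backfill`, whose internals are not shown in this chapter. `split_range` is a hypothetical helper, and inclusive window boundaries are an assumption.

```python
from datetime import datetime, timedelta


def split_range(start, end, offset):
    """Split [start, end] into consecutive windows of at most `offset` hourly points."""
    step = timedelta(hours=1)
    windows = []
    window_start = start
    while window_start <= end:
        # A window of `offset` points spans (offset - 1) hourly steps
        window_end = min(window_start + (offset - 1) * step, end)
        windows.append((window_start, window_end))
        # The next window begins one hour after this one ends
        window_start = window_end + step
    return windows


windows = split_range(
    start=datetime(2018, 7, 1, 8),
    end=datetime(2024, 2, 24, 23),
    offset=2250,
)
print(len(windows), windows[0], windows[-1])
```

Each window could then be passed as the `start` and `end` of a separate request and the results appended, which keeps every call below the 5000-observation limit.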

In [None]:
df3.data

In [None]:
p = px.line(df3.data, x="period", y="value")
p.show()