In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import requests
import json


## EIA API Details

The one-stop shop for all our API needs is at the [documentation page](https://www.eia.gov/opendata/documentation.php#Submittingrequesttoour) and the [API explorer](https://www.eia.gov/opendata/browser/electricity/rto/region-sub-ba-data?frequency=hourly&data=value;&facets=parent;subba;&parent=CISO;&subba=PGAE;&start=2018-06-19T00&end=2023-10-05T00&sortColumn=period;&sortDirection=asc;).

### Rate limits and important usage details:
Exceeding these limits will get our access timed out!
- Limits:
    - Maximum rows per request (`length`): 5000
    - Maximum sustained requests: 9000/hour
    - Maximum burst requests: 5/second
- Security:
    - To request data, we need to pass an API key with every call
    - You can request one through the EIA website
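The two request limits interact: 9000 requests/hour averages out to 2.5 requests/second, so running at the 5/second burst cap for a full hour would exceed the hourly limit. Below is a minimal sketch of a throttle that enforces a minimum gap between calls (0.4 s at the default of 2.5/second), which satisfies both limits. `RateLimiter` is a hypothetical helper, not part of `requests` or any EIA client.

```python
import time


class RateLimiter:
    """Enforce a minimum gap of 1/max_per_second seconds between calls.

    The default of 2.5/second matches 9000 requests/hour on average and
    stays under the 5/second burst limit.
    """

    def __init__(self, max_per_second: float = 2.5):
        self.min_interval = 1.0 / max_per_second
        self._last_call = None  # monotonic timestamp of the previous call

    def wait(self) -> None:
        """Sleep just long enough that calls are at least min_interval apart."""
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


# Usage: call limiter.wait() immediately before each requests.get(...)
limiter = RateLimiter()
```

Spacing calls evenly is more conservative than a token bucket (it never uses the burst allowance), but it is simple and cannot exceed either limit.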


**EIA Developer Key**: JUMaqGFwj6nligR9EDS8O9mYX0qFSD38L7jkES7K

In [11]:
# Example API call
response_api = requests.get("https://api.eia.gov/v2/electricity/rto/region-sub-ba-data/data/?api_key=JUMaqGFwj6nligR9EDS8O9mYX0qFSD38L7jkES7K&frequency=hourly&data[0]=value&facets[parent][]=CISO&facets[subba][]=PGAE&start=2018-06-19T00&end=2023-10-05T00&sort[0][column]=period&sort[0][direction]=asc&offset=0&length=5000")
response_api

<Response [200]>

In [26]:
trial_df = pd.DataFrame(response_api.json()['response']['data'])
trial_df

| | period | subba | subba-name | parent | parent-name | value | value-units |
|---|---|---|---|---|---|---|---|
| 0 | 2018-07-01T08 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 12522 | megawatthours |
| 1 | 2018-07-01T09 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 11745 | megawatthours |
| 2 | 2018-07-01T10 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 11200 | megawatthours |
| 3 | 2018-07-01T11 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 10822 | megawatthours |
| 4 | 2018-07-01T12 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 10644 | megawatthours |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4995 | 2019-01-25T15 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 10936 | megawatthours |
| 4996 | 2019-01-25T16 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 10870 | megawatthours |
| 4997 | 2019-01-25T17 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 10400 | megawatthours |
| 4998 | 2019-01-25T18 | PGAE | Pacific Gas and Electric | CISO | California Independent System Operator | 9575 | megawatthours |
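A single call tops out at 5000 rows, which only reaches January 2019 here. Below is a minimal sketch of paging through the full range with `offset`, assuming the API returns full 5000-row pages until the last one, so a short (or empty) page signals the end. At roughly 46,000 hourly rows between July 2018 and October 2023 (assuming no gaps), that is on the order of ten requests. `page_params` and `fetch_all` are hypothetical helpers; building the query with the `params` argument of `requests` percent-encodes the bracketed keys (e.g. `facets[subba][]`), which is assumed to be equivalent to the hand-written URL above.

```python
import pandas as pd
import requests

EIA_URL = "https://api.eia.gov/v2/electricity/rto/region-sub-ba-data/data/"
PAGE_LENGTH = 5000  # EIA's maximum rows per request


def page_params(api_key: str, offset: int, length: int = PAGE_LENGTH) -> dict:
    """Query parameters for one page of hourly PG&E (within CISO) data, oldest first."""
    return {
        "api_key": api_key,
        "frequency": "hourly",
        "data[0]": "value",
        "facets[parent][]": "CISO",
        "facets[subba][]": "PGAE",
        "start": "2018-06-19T00",
        "end": "2023-10-05T00",
        "sort[0][column]": "period",
        "sort[0][direction]": "asc",
        "offset": offset,
        "length": length,
    }


def fetch_all(api_key: str, wait=lambda: None) -> pd.DataFrame:
    """Request successive pages until one comes back shorter than PAGE_LENGTH."""
    frames = []
    offset = 0
    while True:
        wait()  # hook for throttling between requests
        resp = requests.get(EIA_URL, params=page_params(api_key, offset), timeout=60)
        resp.raise_for_status()  # surface HTTP errors (e.g. rate limiting) early
        rows = resp.json()["response"]["data"]
        frames.append(pd.DataFrame(rows))
        if len(rows) < PAGE_LENGTH:
            break
        offset += PAGE_LENGTH
    return pd.concat(frames, ignore_index=True)
```

The `wait` hook is a place to plug in throttling so the loop stays within the rate limits listed above.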


The data looks very clean: a preliminary check finds no null values, at least in this first 5000-row page.

In [31]:
trial_df.isna().sum(axis=0)

period         0
subba          0
subba-name     0
parent         0
parent-name    0
value          0
value-units    0
dtype: int64
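`isna()` only catches explicit nulls. For an hourly series, missing or duplicated timestamps are a more likely problem, and it is worth confirming that `value` is actually numeric in case it arrives as strings. Below is a minimal sketch of a few extra checks, assuming `period` strings look like `2018-07-01T08` (as above) and the frame holds a single sub-BA; `hourly_gaps` and `basic_checks` are hypothetical helpers. If `period` is local time rather than UTC, daylight-saving transitions would show up as spurious gaps or duplicates.

```python
import pandas as pd


def hourly_gaps(df: pd.DataFrame) -> pd.DatetimeIndex:
    """Hourly timestamps missing between the first and last `period`."""
    periods = pd.to_datetime(df["period"], format="%Y-%m-%dT%H")
    full_range = pd.date_range(periods.min(), periods.max(), freq=pd.Timedelta(hours=1))
    return full_range.difference(pd.DatetimeIndex(periods))


def basic_checks(df: pd.DataFrame) -> dict:
    """Counts of common problems that isna() alone would not reveal."""
    values = pd.to_numeric(df["value"], errors="coerce")  # non-numeric -> NaN
    return {
        "duplicate_periods": int(df["period"].duplicated().sum()),
        "missing_hours": len(hourly_gaps(df)),
        # NaNs introduced by coercion, i.e. entries that were not numbers
        "non_numeric_values": int(values.isna().sum() - df["value"].isna().sum()),
        "negative_values": int((values < 0).sum()),
    }
```

Running `basic_checks(trial_df)` on the page above (or on the concatenated full range) gives a quick summary of whether "extremely clean" holds beyond nulls.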

# I am so... confused

We can get data not only at the level of the balancing authority (CAISO), but at the finer granularity of PG&E. This raises several puzzling questions:
1. Why can we get demand levels for PG&E directly? Isn't CAISO, not PG&E, the grid operator/BA here?
    - This is what the "independent" in ISO means. PG&E owns many transmission lines (i.e., it is a major transmission owner, or TO), but it cedes operational control of them to CAISO for balancing purposes.
    - PG&E also owns some generators!
    - CAISO operates something like a pool of energy. We buy energy from PG&E because PG&E generates or buys the amount of energy we consume...?
2. Given that we buy power from PG&E directly, this raises the question: who is actually responsible for powering any particular person's house? Could I have bought from a different participating transmission owner (TO) instead?
    - Yes! Within CAISO, PG&E can deliver your power while you buy the energy itself from another company!
3. 