# **Importing the VA Electricity Capability Dataset**
This dataset contains information about **electricity capabilities** in Virginia.
[Dataset can be found by clicking here!](https://www.eia.gov/opendata/browser/electricity/state-electricity-profiles/capability?frequency=annual&data=capability;&facets=stateId;&stateId=VA;&start=1990&end=2024&sortColumn=period;&sortDirection=desc;)

- But what is electricity capability? In general, it is the maximum amount of electric power that a system can generate, transmit, or distribute under specific conditions. In this dataset, it is the maximum amount of power that generating units can reliably produce during the summer when all resources are available. It's an important measure because it shows how much power Virginia can reliably supply during peak stress.
- So let's take a deeper look into this dataset.

 ## **Who produced the data and how?**

The dataset is produced by the U.S. Energy Information Administration (EIA), an independent statistical agency within the U.S. Department of Energy. EIA is responsible for collecting, validating, and publishing national energy data to support policy analysis, market transparency, and public research.

The “capability” metric used in this dataset is derived primarily from **Form EIA-860 (Annual Electric Generator Report)**. This federal survey requires power plant operators to report detailed information on each generating unit, including nameplate capacity, prime mover, fuel type, ownership, and location. Only plants with a combined nameplate capacity of 1 MW or more are covered.

EIA aggregates these generator-level responses into state-level annual statistics, performs quality checks, and then publishes the results through:

- the *Electric Power Annual*,
- the *State Electricity Profiles*, and
- the EIA Open Data API (used here).
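
To make the aggregation step concrete, here is a minimal sketch of rolling generator-level rows up to state-level totals with pandas. It is not EIA's actual pipeline: the column names and every value in `generators` are hypothetical placeholders chosen only to illustrate the idea.

```python
import pandas as pd

# Hypothetical generator-level records (placeholder values, not EIA data).
generators = pd.DataFrame(
    {
        "state": ["VA", "VA", "VA", "NC"],
        "year": [2024, 2024, 2024, 2024],
        "energy_source": ["NG", "NG", "COL", "NG"],
        "net_summer_capacity_mw": [500.0, 250.0, 400.0, 300.0],
    }
)

# Sum unit-level capacity within each state, year, and energy source.
state_totals = generators.groupby(
    ["state", "year", "energy_source"], as_index=False
)["net_summer_capacity_mw"].sum()

print(state_totals)
```

Each row of `state_totals` plays the role of one row in the dataset used here: one state, one year, one energy source, one capacity figure.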

I chose the API endpoint because it provides a clean, machine-readable version of those state-level capacity totals, ensuring consistency with EIA’s published tables while making it easy to integrate the data into analytical workflows.

## **How I Obtained the Data**

I accessed the dataset through the U.S. Energy Information Administration (EIA) open-data API (v2). Specifically, I queried the *State Electricity Profiles – Capability* endpoint, which provides annual electricity-generating capability (capacity) for each U.S. state. I limited the request to Virginia (`stateId = VA`) and selected the years 1990–2024.

The final API call used in this project is: 

```python
api_Url = "https://api.eia.gov/v2/electricity/state-electricity-profiles/capability/data/?frequency=annual&data[0]=capability&facets[stateId][]=VA&start=1990&end=2024&sort[0][column]=period&sort[0][direction]=desc&offset=0&length=5000"
```

To download the data, I first generated a free EIA API key and added it to the URL. The API returns a JSON object, which I retrieved in Python using a standard `requests.get()` call and converted into a pandas DataFrame. This workflow is fully reproducible: rerunning the same code pulls the latest published values, and raising the `end` parameter picks up new years as EIA releases State Electricity Profile data each year.
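
As an alternative to pasting the key into the URL by hand, the query string can be assembled from named parameters. This is a minimal sketch using only the standard library; `build_capability_url` is a hypothetical helper, and `EIA_API_KEY` is an assumed environment variable name, not something EIA prescribes.

```python
import os
from urllib.parse import urlencode

BASE_URL = (
    "https://api.eia.gov/v2/electricity/state-electricity-profiles/capability/data/"
)


def build_capability_url(api_key, state="VA", start=1990, end=2024,
                         offset=0, length=5000):
    """Return the request URL for one state's annual capability data."""
    params = [
        ("frequency", "annual"),
        ("data[0]", "capability"),
        ("facets[stateId][]", state),
        ("start", start),
        ("end", end),
        ("sort[0][column]", "period"),
        ("sort[0][direction]", "desc"),
        ("offset", offset),
        ("length", length),
        ("api_key", api_key),
    ]
    # safe="[]" keeps the square brackets readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe="[]")


# Read the key from the environment (assumed variable name) so it stays out of the notebook.
api_key = os.environ.get("EIA_API_KEY", "your_api_key_here")
url = build_capability_url(api_key)
print(url)
```

The resulting `url` can be passed to `requests.get()` exactly like the hand-written string, and changing `state`, `start`, or `end` no longer requires editing the URL itself.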

### **Python Script Used to Obtain the Data**

The following Python snippet demonstrates how I retrieved the Virginia electricity capability data directly from the EIA Open Data API. The script sends a request to the *State Electricity Profiles Capability* endpoint, parses the JSON response, and saves the data as a CSV file for further analysis.

```python
import requests
import pandas as pd

api_key = "your_api_key_here"

url = (
    "https://api.eia.gov/v2/electricity/state-electricity-profiles/capability/data/"
    "?frequency=annual"
    "&data[0]=capability"
    "&facets[stateId][]=VA"
    "&start=1990"
    "&end=2024"
    "&sort[0][column]=period"
    "&sort[0][direction]=desc"
    "&offset=0"
    "&length=5000"
    f"&api_key={api_key}"
)

# Fetch from API
print("Fetching data from API...")
response = requests.get(url, timeout=30)  # give up after 30 s instead of hanging
response.raise_for_status()

# Parse JSON
data = response.json()
print(f"API Response Keys: {data.keys()}")

# Convert to DataFrame
df = pd.json_normalize(data["response"]["data"])
print(f"DataFrame shape: {df.shape}")

# Save to CSV
df.to_csv("va_electricity_capability.csv", index=False)
print("CSV file saved successfully!")

print("\nFirst few rows:")
print(df.head())
```
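
The script requests a single page (`offset=0`, `length=5000`). If a query ever matched more rows than one page holds, the remaining rows would need additional requests. Below is a minimal sketch of that loop, assuming the API returns a short (or empty) page only once the data is exhausted. `fetch_all_rows` and `fetch_page` are hypothetical names, and the page fetcher is passed in as a function so the loop can be tried without network access.

```python
from typing import Callable, Dict, List


def fetch_all_rows(
    fetch_page: Callable[[int, int], List[Dict]], page_size: int = 5000
) -> List[Dict]:
    """Collect rows page by page until a page comes back short."""
    rows: List[Dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        rows.extend(page)
        # A page smaller than page_size means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return rows


# Offline demonstration with a stand-in data source of 12 fake rows.
fake_rows = [{"row": i} for i in range(12)]
all_rows = fetch_all_rows(lambda offset, n: fake_rows[offset:offset + n], page_size=5)
print(len(all_rows))
```

In practice, `fetch_page` could wrap `requests.get()` with the given `offset` and `length` and return `response.json()["response"]["data"]`.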



## **Now let's read in the Data Sources**

```python
# Import the libraries that we will use.
import pandas as pd
import numpy as np
```

```python
data = pd.read_csv("va_electricity_capability.csv")
data.head(10)
```

|   | period | stateId | stateDescription | producertypeid | producerTypeDescription | energysourceid | energySourceDescription | capability | capability-units |
|---|--------|---------|------------------|----------------|-------------------------|----------------|-------------------------|------------|------------------|
| 0 | 2024 | VA | Virginia | EU | Electric Utilities | ALL | All | 21256.2 | megawatts |
| 1 | 2024 | VA | Virginia | EU | Electric Utilities | BAT | Battery | 60.0 | megawatts |
| 2 | 2024 | VA | Virginia | EU | Electric Utilities | COL | Coal | 1487.1 | megawatts |
| 3 | 2024 | VA | Virginia | EU | Electric Utilities | HPS | Pumped Storage | 3253.1 | megawatts |
| 4 | 2024 | VA | Virginia | EU | Electric Utilities | HYC | Hydroelectric | 821.6 | megawatts |
| 5 | 2024 | VA | Virginia | EU | Electric Utilities | NG | Natural Gas | 9963.4 | megawatts |
| 6 | 2024 | VA | Virginia | EU | Electric Utilities | NGCC | Natural Gas - CC | 6132.4 | megawatts |
| 7 | 2024 | VA | Virginia | EU | Electric Utilities | NGGT | Natural Gas - GT | 3370.0 | megawatts |
| 8 | 2024 | VA | Virginia | EU | Electric Utilities | NGIC | Natural Gas - IC | 1.0 | megawatts |
| 9 | 2024 | VA | Virginia | EU | Electric Utilities | NGST | Natural Gas - ST | 460.0 | megawatts |


```python
# Standardize the column names: lowercase them and strip hyphens
new_columns = [col_name.lower().replace('-', '') for col_name in data.columns.to_list()]
data.columns = new_columns
data.head(10)
```

|   | period | stateid | statedescription | producertypeid | producertypedescription | energysourceid | energysourcedescription | capability | capabilityunits |
|---|--------|---------|------------------|----------------|-------------------------|----------------|-------------------------|------------|-----------------|
| 0 | 2024 | VA | Virginia | EU | Electric Utilities | ALL | All | 21256.2 | megawatts |
| 1 | 2024 | VA | Virginia | EU | Electric Utilities | BAT | Battery | 60.0 | megawatts |
| 2 | 2024 | VA | Virginia | EU | Electric Utilities | COL | Coal | 1487.1 | megawatts |
| 3 | 2024 | VA | Virginia | EU | Electric Utilities | HPS | Pumped Storage | 3253.1 | megawatts |
| 4 | 2024 | VA | Virginia | EU | Electric Utilities | HYC | Hydroelectric | 821.6 | megawatts |
| 5 | 2024 | VA | Virginia | EU | Electric Utilities | NG | Natural Gas | 9963.4 | megawatts |
| 6 | 2024 | VA | Virginia | EU | Electric Utilities | NGCC | Natural Gas - CC | 6132.4 | megawatts |
| 7 | 2024 | VA | Virginia | EU | Electric Utilities | NGGT | Natural Gas - GT | 3370.0 | megawatts |
| 8 | 2024 | VA | Virginia | EU | Electric Utilities | NGIC | Natural Gas - IC | 1.0 | megawatts |
| 9 | 2024 | VA | Virginia | EU | Electric Utilities | NGST | Natural Gas - ST | 460.0 | megawatts |


```python
# Summarize the data type of each column
COLS = pd.DataFrame(index=data.columns)
COLS["data_type"] = data.dtypes
COLS
```

|   | data_type |
|---|-----------|
| period | int64 |
| stateid | object |
| statedescription | object |
| producertypeid | object |
| producertypedescription | object |
| energysourceid | object |
| energysourcedescription | object |
| capability | float64 |
| capabilityunits | object |


This shows us that the **period** feature is of type **integer** and the **capability** feature is of type **float**. The rest have the pandas `object` dtype, which here means they hold text values that act as categories.

 - I would go a step further and classify them as follows:
   - period = *interval*
   - stateid = *nominal*
   - statedescription = *nominal*
   - producertypeid = *nominal*
   - producertypedescription = *nominal*
   - energysourceid = *nominal*
   - energysourcedescription = *nominal*
   - capability = *ratio*
   - capabilityunits = *nominal*
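
If you want pandas to reflect this classification, one optional step is to convert the nominal columns to the `category` dtype. The sketch below uses three rows copied from the preview above; `nominal_cols` and `typed` are names introduced here for illustration, and this conversion is not part of the original workflow.

```python
import pandas as pd

# Three rows copied from the preview above (a subset of the columns).
sample = pd.DataFrame(
    {
        "period": [2024, 2024, 2024],
        "stateid": ["VA", "VA", "VA"],
        "producertypeid": ["EU", "EU", "EU"],
        "energysourceid": ["ALL", "BAT", "COL"],
        "capability": [21256.2, 60.0, 1487.1],
        "capabilityunits": ["megawatts", "megawatts", "megawatts"],
    }
)

# Columns classified as nominal in the list above.
nominal_cols = ["stateid", "producertypeid", "energysourceid", "capabilityunits"]

# Convert only the nominal columns; period and capability keep their numeric types.
typed = sample.astype({col: "category" for col in nominal_cols})
print(typed.dtypes)
```

Categorical columns make the intended measurement level explicit and usually use less memory than plain strings when values repeat often.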

Let's take it a step further and get a **list of the unique energy source IDs** and the **description of each** so that we know what each one means.

```python
# Keep the rows where producertypeid is 'TOT', then list each energy source ID once
data_by_producer_id = data[data['producertypeid'] == 'TOT']
data_by_producer_id[["energysourceid", "energysourcedescription"]].drop_duplicates()
```

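|    | energysourceid | energysourcedescription |
|----|----------------|-------------------------|
| 34 | ALL | All |
| 35 | BAT | Battery |
| 36 | COL | Coal |
| 37 | HPS | Pumped Storage |
| 38 | HYC | Hydroelectric |
| 39 | NG | Natural Gas |
| 40 | NGCC | Natural Gas - CC |
| 41 | NGGT | Natural Gas - GT |
| 42 | NGIC | Natural Gas - IC |
| 43 | NGST | Natural Gas - ST |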


Above, we can see all the energy source IDs and their descriptions. 

Some key classifications: 

 - You may be asking yourself why there are multiple natural gas and petroleum IDs. The tables below help answer this question.

| Code | Meaning                              | Description |
|------|---------------------------------------|-------------|
| NGCC | Natural Gas – Combined Cycle          | Gas + steam turbine system; most efficient NG generation |
| NGGT | Natural Gas – Gas Turbine             | Simple-cycle gas turbine; fast-ramping peaker units |
| NGIC | Natural Gas – Internal Combustion     | Large reciprocating engines; used in small/backup power plants |
| NGST | Natural Gas – Steam Turbine           | Traditional steam boiler + turbine using natural gas |


| Code   | Meaning                     | Description |
|--------|-----------------------------|-------------|
| PET    | Petroleum                   | General category for petroleum-based generation |
| PETGT  | Petroleum – Gas Turbine     | Simple-cycle gas turbine that burns petroleum liquids |
| PETIC  | Petroleum – Internal Comb.  | Reciprocating internal-combustion engine using petroleum fuels |
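
One practical consequence of these sub-codes is that the parent row already contains them, so summing every row for a fuel would double count. The quick check below uses the 2024 Electric Utilities natural gas values from the preview earlier in this notebook; `sub_codes` and the other variable names are introduced here for illustration.

```python
import math

import pandas as pd

# 2024 Electric Utilities natural gas rows, copied from the preview above.
ng_rows = pd.DataFrame(
    {
        "energysourceid": ["NG", "NGCC", "NGGT", "NGIC", "NGST"],
        "capability": [9963.4, 6132.4, 3370.0, 1.0, 460.0],
    }
)

sub_codes = ["NGCC", "NGGT", "NGIC", "NGST"]

# Total of the four technology-specific rows.
sub_total = float(
    ng_rows.loc[ng_rows["energysourceid"].isin(sub_codes), "capability"].sum()
)

# The parent "NG" row reported by EIA.
ng_total = float(
    ng_rows.loc[ng_rows["energysourceid"] == "NG", "capability"].iloc[0]
)

print(f"Sub-codes: {sub_total:.1f} MW, NG row: {ng_total:.1f} MW")
print("Match:", math.isclose(sub_total, ng_total, abs_tol=0.05))
```

Because the four sub-codes add up to the `NG` row for these values, an analysis should use either the parent code or its sub-codes, not both. The same caution applies to the `ALL` rows.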


Let's also take a look at the producer type IDs. **Producer type IDs** are important because they tell you who owns or operates the generating units. This matters when you’re analyzing capability or production because different producer types sell their power in different ways: electric utilities serve retail customers, while independent power producers sell into the wholesale market.

| Producer Type ID | Meaning                    | Description |
|------------------|----------------------------|-------------|
| EU               | Electric Utility           | Regulated utilities that serve Virginia retail customers. This includes Dominion Energy Virginia, Appalachian Power (APCo, an AEP subsidiary), Old Dominion Power (Kentucky Utilities), and Virginia’s electric cooperatives. These entities own generation that supports their retail load and operate under the oversight of the Virginia State Corporation Commission (SCC). |
| IPP              | Independent Power Producer | Merchant power companies operating in Virginia that sell electricity into the PJM wholesale market rather than to retail customers. Many of Virginia’s large natural gas combined-cycle plants and utility-scale solar farms are owned by IPPs, not Dominion or APCo. These facilities respond to wholesale market prices and to data-center-driven demand growth in Northern Virginia. |

## Now that you have a solid understanding of the Virginia electricity capability dataset, let's explore the data. Move on to the EDA notebook in the EDA folder!
