# How Green is My Grid?

## The Problem of Dirty Electricity

Climate scientists say we need to reach net zero carbon emissions by [YEAR] to avoid the global warming "tipping point" of [BAD THING]. The shift away from fossil fuels to power cars, stoves, and so on is a critical and highly visible part of this effort. However, switching from fossil fuels to electricity can't get us to net zero unless electricity production itself has net zero emissions, and many of our power plants still use CO2-emitting sources such as coal and natural gas to produce electricity.

The problem of CO2-heavy electricity production will only grow. The US is estimated to [increase its electricity consumption](https://www.nationalgrid.com/stories/energy-explained/how-will-our-electricity-supply-change-future) by 50% by 2036 and to double it by 2050. If we keep using coal and other fossil fuels to produce electricity, our CO2 emissions will *increase* at a time when the planet's future depends on our ability to *reduce* them.

[[why focus on co2]]

### Questions:

- How much electricity did these plants produce in 2021? How much electricity would we expect to need from them by 2036 and 2050?
- Which power plants are operating at the lowest capacity? That is, which plants could produce more power as our demand for energy increases?
- What percentage of these plants use "clean" (or nearly clean) energy sources? Which ones use sources that contribute to global warming?
- What will the overall impact of increased electricity demand be on pollution?
- Are there particular power companies or states producing power with lower emissions rates that could serve as models for cleaner power production?

## Data Source

The US Environmental Protection Agency (EPA) releases the [eGrid report](https://www.epa.gov/egrid) each year. This report contains data on each of the 11K power plants in the US and Puerto Rico, including power sources, pollution, and efficiency. It also contains a summary of demographic information for the area surrounding each power plant. The most recent data is from 2021.

A full description of all terms and data in the dataset can be found in [this guide](https://www.epa.gov/system/files/documents/2023-01/eGRID2021_technical_guide.pdf).

Federal regulations require power plants to report their emissions and energy use, and the eGrid report presents this data. The EPA describes the dataset as containing information on "almost all electric power generated in the United States". It's not clear which power plants are exempt from this reporting rule or how the missing data might affect an analysis of the dataset. However, eGrid is used throughout the US government and industry to calculate the environmental impact of power production, so we will follow the consensus that the dataset is representative of the entire country's power production.

## Clean Data

I cleaned the EPA's eGrid data by:
- extracting and renaming the relevant columns
- creating a schema of expected data types to catch irregularities, which led me to:
    - casting columns to the correct data types
    - removing rows with 'NaN' in critical columns
- normalizing plant owner and utility company names

You can find a notebook documenting the full data cleaning process [here](data/clean_egrid_data.ipynb).
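The cleaning notebook is the authoritative record of these steps. As a rough illustration only, the schema check, type casting, NaN filtering, and name normalization might look something like the sketch below. The column lists, the `clean`, `cast_numeric`, and `normalize_name` helpers, and the normalization rule (uppercase, drop commas and periods, collapse whitespace) are hypothetical assumptions, not the notebook's actual code.

```python
import pandas as pd

# Hypothetical schema: a small subset of the cleaned columns, chosen for illustration.
NUMERIC_COLUMNS = ["latitude", "annual_power_production_mwh"]
TEXT_COLUMNS = ["state", "plant_owner"]
# Assumed: rows missing these values are useless for the analysis.
CRITICAL_COLUMNS = ["state", "annual_power_production_mwh"]


def normalize_name(name: str) -> str:
    """Collapse case, comma/period, and whitespace differences in company names."""
    cleaned = name.upper().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def cast_numeric(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Cast columns to numbers, reporting values that don't fit the schema."""
    df = df.copy()
    for column in columns:
        parsed = pd.to_numeric(df[column], errors="coerce")  # bad values become NaN
        bad = df[column].notna() & parsed.isna()
        if bad.any():
            print(f"{column}: {bad.sum()} unparseable value(s), e.g. {df.loc[bad, column].iloc[0]!r}")
        df[column] = parsed
    return df


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw[TEXT_COLUMNS + NUMERIC_COLUMNS]           # extract relevant columns
    df = cast_numeric(df, NUMERIC_COLUMNS)            # enforce numeric types
    df = df.dropna(subset=CRITICAL_COLUMNS).copy()    # drop rows missing critical data
    df["plant_owner"] = df["plant_owner"].map(normalize_name, na_action="ignore")
    return df.reset_index(drop=True)


# Toy rows, not real eGRID data
raw = pd.DataFrame({
    "state": ["AK", "AK", "TX"],
    "plant_owner": ["Aniak Light & Power Co Inc", "Aniak Light & Power Co., Inc.", None],
    "latitude": ["61.58", "61.58", "31.0"],
    "annual_power_production_mwh": ["1200", "N/A", "5000.5"],
})
print(clean(raw))
```

Coercing with `errors="coerce"` and then comparing against the original non-null values is one way to surface irregular entries instead of letting a cast fail halfway through.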

## Load Libraries

In [1]:
```python
import pandas as pd
import numpy as np
import bokeh
```

## Load Data

In [2]:
```python
egrid_df = pd.read_csv('cleaned_egrid_data.csv')
```

In [3]:
```python
egrid_df.head()
```

| Unnamed: 0.1 | Unnamed: 0 | plant_sequence_num | state | plant_owner | utility_name | balancing_auth_code | nerc_region | egrid_subregion | county | latitude | ... | oil_generation_percent | gas_generation_percent | nuclear_generation_percent | hydro_generation_percent | biomass_generation_percent | wind_generation_percent | solar_generation_percent | geothermal_generation_percent | other_fossil_fuel_generation_percent | other_purchased_generation_percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 4 | AK | Alaska Village Elec Coop, Inc | Alaska Village Elec Coop, Inc | UNKNOWN | AK | AKMS | Northwest Arctic | 67.08798 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 5 | 5 | AK | Inside Passage Elec Coop, Inc | Inside Passage Elec Coop, Inc | UNKNOWN | AK | AKMS | Hoonah-Angoon | 57.499166 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 6 | 6 | AK | Aniak Light & Power Co Inc | Aniak Light & Power Co Inc | UNKNOWN | AK | AKMS | Bethel | 61.580678 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 9 | 9 | AK | Golden Valley Elec Assn Inc | Aurora Energy LLC | UNKNOWN | AK | AKGD | Fairbanks North Star | 64.847743 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 10 | 10 | AK | Barrow Utils & Elec Coop, Inc | Barrow Utils & Elec Coop, Inc | UNKNOWN | AK | AKMS | North Slope | 71.292 | ... | 2.4e-05 | 0.999976 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
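The two `Unnamed` columns look like index columns that were written to the CSV along with the data and read back as ordinary columns: `Unnamed: 0.1` counts rows from 0, and `Unnamed: 0` mirrors `plant_sequence_num`. Assuming they carry no information of their own, a minimal sketch for dropping them (the helper name is hypothetical):

```python
import pandas as pd


def drop_leftover_index_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop 'Unnamed: ...' columns, which pandas creates for unnamed CSV header fields."""
    leftover = df.columns.str.startswith("Unnamed")
    return df.loc[:, ~leftover]


# Toy frame mirroring the first columns of the preview
sample = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Unnamed: 0": [4, 5],
    "plant_sequence_num": [4, 5],
    "state": ["AK", "AK"],
})
print(drop_leftover_index_columns(sample).columns.tolist())
```

Passing `index=False` to `DataFrame.to_csv` in the cleaning notebook would keep these columns from appearing in the first place.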


In [None]:
```python
egrid_df.columns
```
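The `*_generation_percent` columns in the preview appear to hold fractions rather than percentages: Barrow's oil (2.4e-05) and gas (0.999976) shares add up to 1. Assuming that holds for every plant, a quick sketch can flag plants whose shares don't sum to about 1 and compute a "clean" share per plant, which bears on the question of what percentage of plants use clean energy. Which sources count as clean is an assumption here (nuclear, hydro, wind, solar, and geothermal, with biomass left out as a judgment call), and the helper names and toy rows are hypothetical.

```python
import pandas as pd

# Assumed definition of "clean" sources; adjust as needed (e.g. add biomass).
CLEAN_SOURCES = ["nuclear", "hydro", "wind", "solar", "geothermal"]


def fuel_mix_mismatches(df: pd.DataFrame, tolerance: float = 1e-3) -> pd.DataFrame:
    """Return the share columns of plants whose shares don't sum to about 1."""
    share_columns = df.filter(like="_generation_percent").columns
    totals = df[share_columns].sum(axis=1)  # NaN shares are skipped, so all-NaN rows sum to 0
    return df.loc[(totals - 1).abs() > tolerance, share_columns]


def clean_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of each plant's generation that comes from CLEAN_SOURCES."""
    columns = [f"{source}_generation_percent" for source in CLEAN_SOURCES]
    return df[columns].sum(axis=1)


# Toy rows: Barrow's mix from the preview, a hypothetical wind/solar plant,
# and a hypothetical plant whose shares are incomplete.
sample = pd.DataFrame({
    "oil_generation_percent": [2.4e-05, 0.0, 0.0],
    "gas_generation_percent": [0.999976, 0.0, 0.5],
    "nuclear_generation_percent": [0.0, 0.0, 0.0],
    "hydro_generation_percent": [0.0, 0.0, 0.0],
    "wind_generation_percent": [0.0, 0.6, 0.0],
    "solar_generation_percent": [0.0, 0.4, 0.0],
    "geothermal_generation_percent": [0.0, 0.0, 0.0],
})
print(fuel_mix_mismatches(sample))
print(clean_share(sample).tolist())
```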

## Current Electricity Production and Future Demand Estimate

We will use the total electricity production of the dataset's power plants as an estimate of the total electricity currently produced in the US and Puerto Rico.

First, look at the column representing each plant's total annual production in megawatt hours (MWh). A megawatt hour is a unit of energy equal to 1,000 kilowatt hours: the amount of electricity produced by generating 1 megawatt of power for one hour. For reference, a megawatt hour of electricity can keep [two refrigerators running for a year](https://www.freeingenergy.com/what-is-a-megawatt-hour-of-electricity-and-what-can-you-do-with-it/).
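The difference between power (MW) and energy (MWh) also matters for the question of which plants are working at the lowest capacity: a plant's maximum possible annual output is its nameplate capacity multiplied by the hours in a year, and its capacity factor is the share of that maximum it actually produced. A small sketch of the arithmetic follows. The helper names are hypothetical, and the dataset may already include a capacity-factor column computed by the EPA.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours (2021 was not a leap year)
KWH_PER_MWH = 1000


def energy_mwh(power_mw: float, hours: float) -> float:
    """Energy (MWh) produced by a constant output of `power_mw` for `hours` hours."""
    return power_mw * hours


def capacity_factor(annual_mwh: float, nameplate_mw: float, hours: float = HOURS_PER_YEAR) -> float:
    """Actual output as a fraction of the maximum possible output over `hours`."""
    return annual_mwh / energy_mwh(nameplate_mw, hours)


print(energy_mwh(1, 1) * KWH_PER_MWH)  # 1 MW for one hour, in kWh: 1000
print(energy_mwh(1, HOURS_PER_YEAR))   # 1 MW all year: 8760 MWh
print(capacity_factor(4380, 1))        # half of that maximum: 0.5
```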

In [None]:
```python
egrid_df['annual_power_production_mwh']
```

In [None]:
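```python
# Sum every plant's annual production (pandas skips NaN values by default)
annual_power_production = egrid_df['annual_power_production_mwh'].sum()

# Format with thousands separators and no decimals for readability
print(f'The total annual production is {annual_power_production:,.0f} MWh.')
```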
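Before trusting the total, it may be worth checking what `.sum()` handles silently. pandas skips NaN values by default, and if `annual_power_production_mwh` holds eGRID's net generation (an assumption based on the column name), some plants can report negative values because they consume more electricity than they generate, as pumped-storage hydro plants can. A minimal sketch, with a hypothetical helper and toy values:

```python
import pandas as pd


def summarize_production(production: pd.Series) -> dict:
    """Total production plus counts of values that deserve a second look."""
    negative = production < 0  # comparisons with NaN evaluate to False
    return {
        "total_mwh": production.sum(),  # NaN values are skipped by default
        "missing_values": int(production.isna().sum()),
        "negative_values": int(negative.sum()),
        "negative_total_mwh": production[negative].sum(),
    }


# Toy values, not real eGRID data
sample = pd.Series([1200.0, None, -35.0, 5000.0])
print(summarize_production(sample))
```

Applied to `egrid_df['annual_power_production_mwh']`, this would show whether missing or negative values are large enough to affect the estimate.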

If electricity consumption rises by 50% by 2036 and doubles by 2050, as estimated above, then based on current production the US will need to produce:

In [None]:
```python
# Scale current annual production by the estimated growth in demand
est_2036_need = annual_power_production * 1.5
est_2050_need = annual_power_production * 2

print(f'{est_2036_need:,.0f} MWh per year by 2036 and {est_2050_need:,.0f} MWh per year by 2050.')
```
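The 1.5x and 2x multipliers can also be read as a steady yearly growth rate in demand. The sketch below assumes both estimates are measured from the 2021 data year, which the cited source may not do, so the resulting rates are only indicative. The `implied_annual_growth` helper is hypothetical.

```python
BASE_YEAR = 2021  # assumption: growth is measured from the eGRID data year


def implied_annual_growth(multiplier: float, start_year: int, end_year: int) -> float:
    """Constant compound yearly growth rate that turns 1 unit into `multiplier` units."""
    return multiplier ** (1 / (end_year - start_year)) - 1


for target_year, multiplier in [(2036, 1.5), (2050, 2.0)]:
    rate = implied_annual_growth(multiplier, BASE_YEAR, target_year)
    print(f"{target_year}: x{multiplier} implies about {rate:.1%} growth per year")
```

Under that assumption, both targets work out to roughly 2.4% to 2.7% growth per year.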