Fossil Fuel Power Plants vs Renewable Power Plants

Comparing environmental(AQI), health (Chronic Resp) and economic data (CAINC1) by county.



- Compare Towns with Nuclear vs Towns without Nuclear
- Compare Towns with renewables within themselves
- Compare Towns with a power plant vs the rest of the state

Compare years 2000 and 2010


## Documentation

Power Plant Data: https://www.kaggle.com/datasets/behroozsohrabi/us-electric-power-plants/data

Resp Data: https://ghdx.healthdata.org/sites/default/files/record-attached-files/IHME_USA_COUNTY_RESP_DISEASE_MORTALITY_1980_2014_NATIONAL_XLSX.zip

AQI Data: https://aqs.epa.gov/aqsweb/airdata/annual_aqi_by_county_2010.zip, https://aqs.epa.gov/aqsweb/airdata/annual_aqi_by_county_2000.zip

Cainc1 Data: https://apps.bea.gov/regional/zip/CAINC1.zip

## Plan of Attack:
- Bring in all relevant data
- Filter for study population
    - Counties with Power Plants
- Join/Merge the different datasets into one master dataset for analysis
- Compare towns with power plants to towns without power plants
- Compare towns with different types of power plants
- Compare different types of renewables
- Discuss any confounding variables
- 

In [1]:

## Imports and constants
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

us_state_to_abbrev = {
    "ALABAMA": "AL",
    "ALASKA": "AK",
    "ARIZONA": "AZ",
    "ARKANSAS": "AR",
    "CALIFORNIA": "CA",
    "COLORADO": "CO",
    "CONNECTICUT": "CT",
    "DELAWARE": "DE",
    "FLORIDA": "FL",
    "GEORGIA": "GA",
    "HAWAII": "HI",
    "IDAHO": "ID",
    "ILLINOIS": "IL",
    "INDIANA": "IN",
    "IOWA": "IA",
    "KANSAS": "KS",
    "KENTUCKY": "KY",
    "LOUISANA": "LA",
    "MAINE": "ME",
    "MARYLAND": "MD",
    "MASSACHUSETTS": "MA",
    "MICHIGAN": "MI",
    "MINNESOTA": "MN",
    "MISSISSIPPI": "MS",
    "MISSOURI": "MO",
    "MONTANA": "MT",
    "NEBRASKA": "NE",
    "NEVADA": "NV",
    "NEW HAMPSHIRE": "NH",
    "NEW JERSEY": "NJ",
    "NEW MEXICO": "NM",
    "NEW YORK": "NY",
    "NORTH CAROLINA": "NC",
    "NORTH DAKOTA": "ND",
    "OHIO": "OH",
    "OKLAHOMA": "OK",
    "OREGON": "OR",
    "PENNSYLVANIA": "PA",
    "RHODE ISLAND": "RI",
    "SOUTH CAROLINA": "SC",
    "SOUTH DAKOTA": "SD",
    "TENNESSEE": "TN",
    "TEXAS": "TX",
    "UTAH": "UT",
    "VERMONT": "VT",
    "VIRGINIA": "VA",
    "WASHINGTON": "WA",
    "WEST VIRGINIA": "WV",
    "WISCONSIN": "WI",
    "WYOMING": "WY",
    "District of Columbia": "DC",
    "American Samoa": "AS",
    "Guam": "GU",
    "Northern Mariana Islands": "MP",
    "Puerto Rico": "PR",
    "United States Minor Outlying Islands": "UM",
    "U.S. Virgin Islands": "VI",
}

In [2]:
## Init Power plant df (power_df)
power_df = pd.read_csv("data/Power_Plants.csv")
power_df = power_df[['CITY', 'STATE', 'COUNTY', 'NAICS_DESC', 'OPER_CAP', 'SUMMER_CAP', 'WINTER_CAP', 'PLAN_CAP',
       'RETIRE_CAP', 'GEN_UNITS', 'PLAN_UNIT', 'RETIR_UNIT', 'PRIM_FUEL',
       'SEC_FUEL', 'COAL_USED', 'NGAS_USED', 'OIL_USED', 'NET_GEN']] 
power_df.head()

Unnamed: 0,CITY,STATE,COUNTY,NAICS_DESC,OPER_CAP,SUMMER_CAP,WINTER_CAP,PLAN_CAP,RETIRE_CAP,GEN_UNITS,PLAN_UNIT,RETIR_UNIT,PRIM_FUEL,SEC_FUEL,COAL_USED,NGAS_USED,OIL_USED,NET_GEN
0,SAND POINT,AK,ALEUTIANS EAST,FOSSIL FUEL ELECTRIC POWER GENERATION,4.0,1.8,1.7,0.0,0.0,6,0,0,DFO,WND,0,0,0,347.0
1,NORTHPORT,AL,TUSCALOOSA,HYDROELECTRIC POWER GENERATION,53.9,53.0,53.0,0.0,0.0,1,0,0,WAT,NOT AVAILABLE,0,0,0,139170.0
2,BUCKS,AL,MOBILE,FOSSIL FUEL ELECTRIC POWER GENERATION,2569.5,2399.7,2430.7,774.0,272.0,10,2,1,NG,BIT,1292499,54793583,0,10499145.97
3,WETUMPKA,AL,ELMORE,HYDROELECTRIC POWER GENERATION,225.0,224.1,224.1,0.0,0.0,3,0,0,WAT,NOT AVAILABLE,0,0,0,554613.0
4,GADSDEN,AL,ETOWAH,FOSSIL FUEL ELECTRIC POWER GENERATION,138.0,130.0,130.0,0.0,0.0,2,0,0,NG,NOT AVAILABLE,0,697629,0,50435.0


In [5]:
# Init Respiratory Mortality Rate df (resp_df)
resp_df = pd.read_excel("data/RESP_DISEASE_MORTALITY_1980_2014.XLSX",header=1)
display(resp_df.shape)
resp_df.dropna(inplace=True)
resp_df['Location'] = resp_df['Location'].str.upper()
resp_df = resp_df[['Location','Mortality Rate, 2000*','Mortality Rate, 2010*']]
resp_df.columns = ['Location','Deaths per 100,000 in 2000','Deaths per 100,000 in 2010']
resp_df["County"] = resp_df["Location"].str.split(", ",expand=True)[0]
resp_df["State"]= resp_df["Location"].str.split(", ",expand=True)[1]
resp_df = resp_df.drop('Location',axis = 1)
resp_df = resp_df[['County','State','Deaths per 100,000 in 2000','Deaths per 100,000 in 2010']].dropna().reset_index(drop=True)
resp_df['State'] = resp_df['State'].map(us_state_to_abbrev)
loc_list = resp_df.head(0).copy()
for idx,row in power_df.iterrows():
    loc_list = pd.concat([loc_list,resp_df.loc[(resp_df['County'].str.contains(row['COUNTY'],case=False,regex=True)) & (resp_df['State'].str.contains(row['STATE'],case=False,regex=True))].reset_index(drop=True)])
resp_df = loc_list.copy()
display(resp_df.shape)
resp_df.head()

(3196, 11)

(13072, 4)

Unnamed: 0,County,State,"Deaths per 100,000 in 2000","Deaths per 100,000 in 2010"
0,ALEUTIANS EAST BOROUGH,AK,"48.21 (41.35, 56.36)","35.45 (29.74, 42.07)"
0,TUSCALOOSA COUNTY,AL,"64.67 (62.02, 67.51)","67.32 (64.66, 70.10)"
0,MOBILE COUNTY,AL,"61.71 (59.69, 63.82)","63.00 (60.76, 65.35)"
0,ELMORE COUNTY,AL,"73.04 (69.48, 76.58)","84.14 (80.39, 88.57)"
0,ETOWAH COUNTY,AL,"77.87 (74.77, 81.02)","90.12 (86.34, 93.99)"


In [118]:
loc_list = resp_df.head(0).copy()
for idx,row in power_df.iterrows():
    loc_list = pd.concat([loc_list,resp_df.loc[(resp_df['County'].str.contains(row['COUNTY'],case=False,regex=True)) & (resp_df['State'].str.contains(row['STATE'],case=False,regex=True))].reset_index(drop=True)])
resp_df = loc_list.copy()
