# Identifying the 100 cleanest and dirtiest counties

Notes:  
- All of the ISO projects are listed as `active` in `queue_status`.
- In the active ISO projects data, the `resource_type_lbnl` field has much more detail about combinations of resources (e.g. 'Solar+Battery'). By filtering only for Wind, Solar, and Offshore Wind in `resource`, I might be excluding some of these combination entries.
- Included Offshore Wind when looking at the clean counties.
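
One way to check how much the `resource` filter might miss is to match on the individual components of `resource_type_lbnl` instead. The sketch below runs on a small made-up frame, not on `active_iso_projects.csv`. The helper name `has_any_component` is hypothetical, and the sketch assumes components are always separated by `+`, as in the values this notebook lists.

```python
import pandas as pd

# Toy rows for illustration only; not taken from active_iso_projects.csv.
toy = pd.DataFrame({
    "resource_type_lbnl": ["Solar", "Solar+Battery", "Gas+Solar", "Gas", "Wind+Battery"],
    "capacity_mw": [100.0, 50.0, 30.0, 200.0, 80.0],
})

def has_any_component(labels, wanted):
    """Hypothetical helper: True where any '+'-separated part of the label is in `wanted`."""
    wanted = set(wanted)
    # Missing labels become "" so they split to [""] and never match.
    parts = labels.fillna("").str.split("+")
    return parts.apply(lambda p: bool(wanted & set(p)))

clean_mask = has_any_component(toy["resource_type_lbnl"], ["Wind", "Solar", "Offshore Wind"])
clean_toy = toy[clean_mask]
```

Note that this also matches hybrids such as 'Gas+Solar'. Whether those should count toward clean capacity is a judgment call that this sketch does not settle.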

In [112]:
import pandas as pd
import pudl
import sqlalchemy as sa

In [143]:
iso_df = pd.read_csv('active_iso_projects.csv', dtype={'state_id_fips': str, 'county_id_fips': str})
eip_df = pd.read_csv('emissions_increase.csv')

In [18]:
iso_df['resource'].unique()

array(['Battery', 'Wind', 'Solar', 'Gas', 'Other', 'Hydro', 'Geothermal',
       'Offshore Wind', 'Nuclear', 'Coal', 'Other Storage',
       'Pumped Storage', 'Batteries', 'Storage', 'Waste Heat',
       'Pump Storage', 'Natural Gas'], dtype=object)
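
The `resource` values above include what look like spelling variants of the same category ('Gas' and 'Natural Gas', 'Battery' and 'Batteries', 'Pumped Storage' and 'Pump Storage'). A minimal sketch of collapsing them before filtering, assuming these really are duplicates. The mapping `RESOURCE_ALIASES` is an assumption and should be confirmed against the source data.

```python
import pandas as pd

# Assumed equivalences; confirm against the source data before relying on them.
RESOURCE_ALIASES = {
    "Natural Gas": "Gas",
    "Batteries": "Battery",
    "Pump Storage": "Pumped Storage",
}

# Toy Series standing in for iso_df['resource'].
resource = pd.Series(["Gas", "Natural Gas", "Batteries", "Pump Storage", "Wind"])

# Series.replace with a dict swaps exact values and leaves everything else untouched.
normalized = resource.replace(RESOURCE_ALIASES)
```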

In [19]:
iso_df['resource_type_lbnl'].unique()

array(['Battery', 'Wind', 'Solar', 'Solar+Battery', 'Gas', 'Other',
       'Hydro', 'Geothermal', 'Solar+Wind', 'Offshore Wind', 'Nuclear',
       'Gas+Solar', 'Wind+Battery', 'Gas+Battery', 'Coal',
       'Other Storage', 'Battery+Other', 'Gas+Solar+Battery',
       'Pumped Storage+Wind+Solar', 'Solar+Wind+Battery', 'Solar+Hydro',
       'Storage+Other'], dtype=object)

In [35]:
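def iso_county_hotspots(resource_col, resource_vals, n=100):
    """Return the top n counties by total capacity_mw among iso_df rows matching resource_vals."""
    df = iso_df[iso_df[resource_col].isin(resource_vals)]
    county_mw = df.groupby(['county_id_fips', 'county', 'state'])['capacity_mw'].sum()
    return county_mw.sort_values(ascending=False).head(n)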

### Identify 100 cleanest counties  
- top 100 counties in terms of clean energy potential (megawatts of wind and/or solar listed as active in the ISO queue)

In [38]:
resource_col = 'resource'
resource_vals = ['Wind', 'Solar', 'Offshore Wind']
clean_df = iso_county_hotspots(resource_col, resource_vals)
clean_df.to_csv('top_100_clean_counties.csv')

### Identify 100 dirtiest counties
- top 50 from EIP fossil infrastructure in terms of CO2 and top 50 from EIA or ISO in terms of fossil megawatts

In [41]:
# Get top 50 from ISO active projects
resource_col = 'resource'
resource_vals = ['Gas', 'Natural Gas', 'Coal']
dirty_iso_df = iso_county_hotspots(resource_col, resource_vals, n=50)

In [140]:
dirty_iso_df.to_csv('top_50_dirty_counties_iso.csv')

### Now EIP

In [89]:
# get median for the co2_tpy value when operating_status is Pre-construction
pre_con = eip_df[eip_df['operating_status'] == 'Pre-construction']
pre_con_clean = pre_con[pre_con['co2_tpy'] != 'TBD'].astype({"co2_tpy": float})
med = pre_con_clean['co2_tpy'].dropna().median()

In [142]:
op_stats = ['Pre-construction', 'Under construction']

In [144]:
# .copy() avoids pandas' SettingWithCopyWarning when assigning to the filtered frame
df = eip_df[eip_df['operating_status'].isin(op_stats)].copy()
# replace 'TBD' emissions with the Pre-construction median computed above
df['co2_tpy'] = df['co2_tpy'].where(df['co2_tpy'] != 'TBD', med)
df = df.astype({"co2_tpy": float})
df = df.groupby(['county', 'state'])['co2_tpy'].sum().sort_values(ascending=False).head(50)


In [113]:
# add in FIPS code
PUDL_DB_PATH = "/Users/katielamb/Documents/Catalyst_Coop/workspace/sqlite/pudl.sqlite"
pudl_engine = sa.create_engine(f"sqlite:///{PUDL_DB_PATH}")
serv_terr = pudl.output.pudltabl.PudlTabl(pudl_engine).service_territory_eia861()

The data has not yet been validated, and the structure may change.


The following reported NERC regions are not currently recognized and become         UNK values: []
The following reported NERC regions are not currently recognized and become         UNK values: ['MISE', 'CLECO', 'MPS']
The following reported NERC regions are not currently recognized and become         UNK values: ['MISE', 'SASKATCHWA', 'MPS']


In [121]:
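# de-duplicate on (county, state, county_id_fips) before indexing; after set_index, drop_duplicates would compare only county_id_fips
serv_terr_fips = serv_terr[['county', 'state', 'county_id_fips']].drop_duplicates().set_index(['county', 'state'])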

In [146]:
# manually fix county names that look like typos (rows are identified by their position in the sorted top 50)
df = df.reset_index()
df.at[45, 'county'] = "Hutchinson"
df.at[11, 'county'] = "St. John the Baptist"
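
Corrections keyed on row position (45 and 11 here) stop pointing at the right rows if the ranking changes. A sketch of an alternative that keys on the (county, state) pair instead. The misspelled name and the state below are placeholders, since the original spellings are not shown in this notebook.

```python
import pandas as pd

# Toy frame for illustration; the misspelling and state are placeholders.
toy = pd.DataFrame({
    "county": ["Hutchison", "Calcasieu"],
    "state": ["TX", "LA"],
    "co2_tpy": [1.0, 2.0],
})

# Hypothetical (wrong county, state) -> corrected county pairs.
fixes = {("Hutchison", "TX"): "Hutchinson"}

for (bad_name, state), good_name in fixes.items():
    # Select rows by value rather than by position, then overwrite the county name.
    mask = (toy["county"] == bad_name) & (toy["state"] == state)
    toy.loc[mask, "county"] = good_name
```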

In [147]:
df = df.join(serv_terr_fips, on=['county', 'state'])
fips = df.pop('county_id_fips')
df.insert(0, 'county_id_fips', fips)
df

Unnamed: 0,county_id_fips,county,state,co2_tpy
0,22019.0,Calcasieu,LA,24919480.0
1,48245.0,Jefferson,TX,24557500.0
2,22075.0,Plaquemines,LA,20057070.0
3,22093.0,St. James,LA,17272940.0
4,2122.0,Kenai Peninsula,AK,10877930.0
5,48061.0,Cameron,TX,10038680.0
6,5069.0,Jefferson,AR,8378365.0
7,22023.0,Cameron,LA,7801591.0
8,2185.0,North Slope,AK,7278238.0
9,48135.0,Ector,TX,5800180.0
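
In the table above, `county_id_fips` appears as a float, which drops leading zeros (for example 2122.0 for Kenai Peninsula, AK). A minimal sketch of restoring five-digit string codes, assuming the column holds whole-number floats with possible missing values. The Series here is a toy stand-in for the real column.

```python
import pandas as pd

# Toy values mirroring the float display above, plus one missing entry.
fips = pd.Series([22019.0, 2122.0, 5069.0, None])

# Zero-pad to five digits; leave missing values as None.
fips_str = fips.map(lambda x: f"{int(x):05d}" if pd.notna(x) else None)
```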


In [141]:
df.to_csv('top_50_dirty_counties_eip.csv', index=False)
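
The two top-50 lists are written to separate files, and the same county may appear in both. A sketch of combining them and flagging the overlap, assuming both frames carry a string `county_id_fips` column. The frames and values below are toy data, not results from this notebook.

```python
import pandas as pd

# Toy stand-ins for the ISO and EIP top-50 lists; values are illustrative only.
iso_top = pd.DataFrame({"county_id_fips": ["48245", "22019"], "capacity_mw": [10.0, 5.0]})
eip_top = pd.DataFrame({"county_id_fips": ["22019", "22075"], "co2_tpy": [3.0, 2.0]})

# An outer merge keeps every county from either list;
# indicator=True adds a _merge column recording which list(s) each row came from.
combined = iso_top.merge(eip_top, on="county_id_fips", how="outer", indicator=True)
overlap = combined[combined["_merge"] == "both"]
```

If there is any overlap, the combined list will have fewer than 100 distinct counties.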