# Calculate how many federally identified DACs see both new renewable investment and retirement of fossil fuel resources

### Overview

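This notebook determines how many disadvantaged communities (DACs) in the western US see both fossil fuel generation retirement by 2050 and new renewable infrastructure, based on projected power plant siting and retirement results for a net zero scenario and a business-as-usual scenario. Each DAC is a census tract identified as disadvantaged by CEJST, and a DAC "sees" a plant when the plant's location falls inside the tract. The notebook also reports the fraction of DACs with fossil retirements that also receive new renewables.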

### Data Requirements

This notebook relies on the data outputs from the `prepare_cerf_siting_output.ipynb` notebook. That notebook has been pre-run for convenience, and the data files referenced in this notebook are in the `data/input_data` folder.

The US Council on Environmental Quality (CEQ) disadvantaged community shapefile must also be downloaded before running this notebook, as we do not provide the source data directly. The geospatial shapefile of DAC areas can be downloaded here: https://static-data-screeningtool.geoplatform.gov/data-versions/1.0/data/score/downloadable/1.0-shapefile-codebook.zip. Extract the downloaded data inside the `data/input_data` directory of this repository; the paths in this notebook expect the shapefile at `data/input_data/1.0-shapefile-codebook/usa/usa.shp`.

* **Source Data Title:** Climate and Economic Justice Screening Tool (CEJST)
* **Description from Source:** The tool highlights disadvantaged census tracts across all 50 states, the District of Columbia, and the U.S. territories. Communities are considered disadvantaged: If they are in census tracts that meet the thresholds for at least one of the tool’s categories of burden, or If they are on land within the boundaries of Federally Recognized Tribes
* **Source URL:** https://static-data-screeningtool.geoplatform.gov/data-versions/1.0/data/score/downloadable/1.0-shapefile-codebook.zip
* **Date Accessed:** 07/25/24
* **Citation:** White House Council on Environmental Quality, 2022. Climate and Economic Justice Screening Tool (CEJST). https://static-data-screeningtool.geoplatform.gov/data-versions/1.0/data/score/downloadable/1.0-shapefile-codebook.zip.
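If you prefer to script the extraction step, the sketch below uses Python's standard-library `zipfile` module. The function name `extract_cejst_shapefile` is hypothetical, and the archive's internal layout (a `usa/usa.shp` entry) is an assumption; if the downloaded zip is organized differently (for example, with a nested archive), adjust the paths so the shapefile ends up where `dac_shp_path` expects it.

```python
import zipfile
from pathlib import Path


def extract_cejst_shapefile(zip_path, data_dir):
    """Extract the CEJST archive and return the expected usa.shp path.

    Hypothetical helper. Assumes (unverified) that the archive contains a
    'usa/usa.shp' entry, so extracting into '<data_dir>/1.0-shapefile-codebook'
    reproduces the layout this notebook's dac_shp_path expects.
    """
    target_dir = Path(data_dir) / "1.0-shapefile-codebook"
    target_dir.mkdir(parents=True, exist_ok=True)

    # unpack every member of the archive into the target directory
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)

    # fail loudly if the assumed layout does not match the real archive
    shp_path = target_dir / "usa" / "usa.shp"
    if not shp_path.exists():
        raise FileNotFoundError(
            f"Expected {shp_path} after extraction; check the archive layout."
        )
    return shp_path
```

For example, `extract_cejst_shapefile("1.0-shapefile-codebook.zip", "data/input_data")` would place the shapefile where the notebook looks for it, provided the layout assumption holds.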

## Imports

In [11]:
import pandas as pd
import geopandas as gpd
import numpy as np
import shapely
from shapely import Point

import matplotlib.pyplot as plt

import os
from pathlib import Path

### Collect Data Paths

In [12]:
# input data directory: <parent of the working directory>/data/input_data
data_dir = os.path.join(os.path.dirname(os.getcwd()), 'data', 'input_data')

# output data directory: <parent of the working directory>/data/output_data
output_dir = os.path.join(os.path.dirname(os.getcwd()), 'data', 'output_data')

# infrastructure siting output csv
infrastucture_path = os.path.join(data_dir, 'infrastructure_data_csv', 'infrastructure_data.csv')

# US CEQ DAC shapefile
dac_shp_path = os.path.join(data_dir, '1.0-shapefile-codebook', 'usa', 'usa.shp')

# DAC analysis output file path
output_file_path = os.path.join(output_dir, 'dac_fossil_retire_analysis_2050.csv')
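The same layout can be expressed with `pathlib`, which also makes it easy to confirm the inputs exist before the heavier cells run. This is a minimal sketch that mirrors the `os.path.join` calls above; `build_paths` and `missing_inputs` are hypothetical helper names, and, like the original cell, it assumes the repository root is the parent of the notebook's working directory.

```python
from pathlib import Path


def build_paths(notebook_dir):
    """Return this notebook's input/output paths, built with pathlib.

    Hypothetical helper mirroring the os.path.join cell: the repository root
    is assumed to be the parent of the notebook's working directory.
    """
    repo_root = Path(notebook_dir).parent
    data_dir = repo_root / "data" / "input_data"
    output_dir = repo_root / "data" / "output_data"
    return {
        "infrastructure": data_dir / "infrastructure_data_csv" / "infrastructure_data.csv",
        "dac_shp": data_dir / "1.0-shapefile-codebook" / "usa" / "usa.shp",
        "output_file": output_dir / "dac_fossil_retire_analysis_2050.csv",
    }


def missing_inputs(paths):
    """List input files that do not exist yet (the output file is skipped)."""
    return [str(p) for key, p in paths.items() if key != "output_file" and not p.exists()]
```

In the notebook, `missing_inputs(build_paths(Path.cwd()))` should return an empty list once both the CERF CSV and the CEJST shapefile are in place.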

### Functions

In [13]:
def results_to_geodataframe(df, crs="ESRI:102003"):
    """
    Converts a pandas DataFrame with x and y coordinates into a GeoPandas
    GeoDataFrame of points. Coordinates in the DataFrame are expected to follow
    the ESRI:102003 Albers equal-area conic coordinate reference system; they
    are labeled with `crs` but not reprojected. The x-coordinate column should
    be called 'xcoord' and the y-coordinate column 'ycoord'.

    :param df:        input Pandas DataFrame with x/y coordinates
    :type df:         Pandas DataFrame

    :param crs:       coordinate reference system to assign to the GeoDataFrame
    :type crs:        str

    :return:          GeoDataFrame with a point geometry column
    :rtype:           GeoPandas GeoDataFrame
    """
    
    # create geometry column from coordinate fields
    geometry = [Point(xy) for xy in zip(df['xcoord'], df['ycoord'])]
    
    gdf = gpd.GeoDataFrame(df, crs=crs, geometry=geometry)
    
    return gdf

def calculate_fossil_replacements(dac_shp, renewable_shp, fossil_retire_shp):
    """
    Identifies DACs that see both new renewable investment and fossil fuel
    retirements. Returns a GeoPandas GeoDataFrame of the DACs that meet both
    criteria, and the fraction of DACs with fossil retirements that also see
    new renewable sitings. All GeoDataFrames must share the same CRS.

    :param dac_shp:              input GeoPandas GeoDataFrame of DAC polygons
    :type dac_shp:               GeoPandas GeoDataFrame

    :param renewable_shp:        input GeoPandas GeoDataFrame of point locations of new renewable power plants
    :type renewable_shp:         GeoPandas GeoDataFrame

    :param fossil_retire_shp:    input GeoPandas GeoDataFrame of point locations of retired fossil power plants
    :type fossil_retire_shp:     GeoPandas GeoDataFrame

    :return:                     (GeoDataFrame of qualifying DACs, fraction as a float)
    :rtype:                      tuple
    """
    # join retired fossil plants to the DAC polygons they intersect
    dac_fossil_retire = gpd.sjoin(left_df=dac_shp, right_df=fossil_retire_shp, how="left", predicate="intersects")

    # drop rows that don't intersect
    dac_fossil_retire = dac_fossil_retire[~dac_fossil_retire.xcoord.isna()].copy()

    # columns to keep
    column_keep = ['GEOID10','SF', 'CF', 'geometry']

    # remove unnecessary columns
    dac_fossil_retire = dac_fossil_retire[column_keep]

    # remove duplicate rows when more than one plant intersected with a DAC
    dac_fossil_retire = dac_fossil_retire.drop_duplicates()

    # join renewable sitings with dac fossil retire polygons
    dac_renewables = gpd.sjoin(left_df=dac_fossil_retire , right_df=renewable_shp, how="left", predicate="intersects") 

    # drop rows that don't intersect
    dac_renewables = dac_renewables[~dac_renewables.xcoord.isna()].copy()
    
    # remove unnecessary columns
    dac_renewables = dac_renewables[column_keep]
    
    # remove duplicate rows when more than one plant intersected with a DAC
    dac_renewables = dac_renewables.drop_duplicates()

    # fraction of DACs with fossil retirements that also see renewable sitings
    # (raises ZeroDivisionError if no DAC sees a fossil retirement)
    fraction = len(dac_renewables) / len(dac_fossil_retire)

    return dac_renewables, fraction
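Stripped of geometry, the two spatial joins reduce to a set intersection over tract IDs: a tract qualifies if at least one retired fossil plant and at least one new renewable plant fall inside it, and the fraction divides by the number of tracts with retirements, not by the number with renewables. The sketch below illustrates that logic only; `replacement_fraction` is a hypothetical name, and returning NaN for an empty denominator is an illustrative choice (the notebook's function would raise instead).

```python
import math


def replacement_fraction(retire_tracts, renewable_tracts):
    """Set-based sketch of the criterion in calculate_fossil_replacements.

    retire_tracts: tract IDs containing at least one retired fossil plant
    renewable_tracts: tract IDs containing at least one new renewable plant
    Duplicate IDs collapse, mirroring drop_duplicates() in the spatial version.
    Returns (tracts with both, fraction of retirement tracts that also get renewables).
    """
    retire = set(retire_tracts)
    both = retire & set(renewable_tracts)
    if not retire:
        # hypothetical choice: report NaN instead of raising ZeroDivisionError
        return both, math.nan
    return both, len(both) / len(retire)


both, fraction = replacement_fraction(["A", "B", "B", "C", "D"], ["B", "D", "E"])
print(sorted(both), fraction)  # ['B', 'D'] 0.5
```

Tract `E` has renewables but no retirement, so it appears in neither the numerator nor the denominator.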

## Analysis

### Collect Infrastructure Data

In [14]:
# collect prepared CERF infrastructure siting data
df = pd.read_csv(infrastucture_path)

#### Prepare Power Plant Sitings & Retirements Under Each Scenario

In [6]:
# year of analysis
year = 2050

# fossil generation resources
fossil_plants = ['Coal', 'Natural Gas', 'Oil']

# renewable generation resources
renewable_plants = ['Solar CSP', 'Solar PV', 'Wind']

# reduce to only include sitings up to year of interest
df = df[df.sited_year <= year]

# create a dataframe of retired fossil plants for the net zero scenario
df_nz_retire = df[(df.scenario == 'net_zero_ira_ccs_climate') & (df.cerf_sited == 0) & (df.retirement_year <= year) & (df.technology_simple.isin(fossil_plants))].copy()

# create a dataframe of renewable cerf power plant sitings for the net zero scenario
df_nz_new = df[(df.scenario == 'net_zero_ira_ccs_climate') & (df.cerf_sited == 1) & (df.technology_simple.isin(renewable_plants))].copy()

# create a dataframe of retired fossil plants for the business-as-usual scenario
df_bau_retire = df[(df.scenario == 'business_as_usual_ira_ccs_climate') & (df.cerf_sited == 0) & (df.retirement_year <= year) & (df.technology_simple.isin(fossil_plants))].copy()

# create a dataframe of renewable cerf power plant sitings for the business-as-usual scenario
df_bau_new = df[(df.scenario == 'business_as_usual_ira_ccs_climate') & (df.cerf_sited == 1) & (df.technology_simple.isin(renewable_plants))].copy()
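The four filters above differ only in the scenario name, so they can be folded into one function. This is a minimal pandas sketch under the same column assumptions as the cell above (`scenario`, `cerf_sited`, `sited_year`, `retirement_year`, `technology_simple`); `scenario_subsets` is a hypothetical helper, not part of the original workflow.

```python
import pandas as pd

FOSSIL = ["Coal", "Natural Gas", "Oil"]
RENEWABLE = ["Solar CSP", "Solar PV", "Wind"]


def scenario_subsets(df, scenario, year=2050):
    """Return (retired fossil plants, new CERF-sited renewables) for one scenario.

    Hypothetical helper that applies the same filters as the notebook cell.
    """
    # keep plants sited by the analysis year in the requested scenario
    df = df[(df["sited_year"] <= year) & (df["scenario"] == scenario)]
    # existing (not CERF-sited) fossil plants retired by the analysis year
    retire = df[(df["cerf_sited"] == 0)
                & (df["retirement_year"] <= year)
                & df["technology_simple"].isin(FOSSIL)].copy()
    # renewable plants newly sited by CERF
    new = df[(df["cerf_sited"] == 1)
             & df["technology_simple"].isin(RENEWABLE)].copy()
    return retire, new
```

For example, `df_nz_retire, df_nz_new = scenario_subsets(df, 'net_zero_ira_ccs_climate')` would reproduce the net zero pair, and adding a scenario becomes a one-line change.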

#### Convert to geodataframe

In [7]:
# net zero
gdf_nz_retire = results_to_geodataframe(df_nz_retire, crs = "ESRI:102003")
gdf_nz_renewable = results_to_geodataframe(df_nz_new, crs = "ESRI:102003")

# business-as-usual
gdf_bau_retire = results_to_geodataframe(df_bau_retire, crs = "ESRI:102003")
gdf_bau_renewable = results_to_geodataframe(df_bau_new, crs = "ESRI:102003")
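`results_to_geodataframe` assigns `ESRI:102003` to the points without reprojecting them, so if the CSV ever held longitude/latitude degrees instead of projected meters, every point would collapse into a tiny cluster near the projection origin and silently miss the western DAC polygons. The heuristic below is a hypothetical sanity check, not part of the original notebook: it flags coordinates that all fit inside degree ranges.

```python
def looks_like_degrees(xs, ys):
    """Heuristic: True if every coordinate falls inside lon/lat degree ranges.

    Hypothetical check. Projected ESRI:102003 coordinates are in meters, so
    western US plants have |x| in the hundreds of thousands or more; values
    all within [-180, 180] x [-90, 90] suggest unprojected degrees instead.
    """
    xs, ys = list(xs), list(ys)
    if not xs:
        return False
    return all(-180 <= x <= 180 for x in xs) and all(-90 <= y <= 90 for y in ys)
```

For instance, `looks_like_degrees(df_nz_retire['xcoord'], df_nz_retire['ycoord'])` should be `False` for correctly projected input; a `True` result would be worth investigating before the spatial joins.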

### Prepare DAC Geodata

In [8]:
# read in DAC shapefile
dac_shp = gpd.read_file(dac_shp_path)

# select communities that are identified as disadvantaged
dac_shp = dac_shp[dac_shp['SN_C'] == 1]

# select Western US states only
dac_shp = dac_shp[dac_shp.SF.isin(['California', 'Oregon', 'Washington', 'Idaho', 'Montana', 
                                   'Nevada', 'Utah', 'Arizona', 'New Mexico', 'Colorado', 'Wyoming'])].copy()

# reproject to ESRI:102003 so the DAC polygons match the plant point coordinates
dac_shp.to_crs("ESRI:102003", inplace=True)
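The `isin` filter silently drops any state whose spelling in the `SF` column does not match the list, so a quick check for states that matched no disadvantaged tracts can catch a typo early. This is a small sketch; `missing_states` and `WESTERN_STATES` are hypothetical names, and an empty state is a prompt to investigate rather than proof of an error.

```python
WESTERN_STATES = ['California', 'Oregon', 'Washington', 'Idaho', 'Montana',
                  'Nevada', 'Utah', 'Arizona', 'New Mexico', 'Colorado', 'Wyoming']


def missing_states(dac_df, states=WESTERN_STATES, state_col="SF"):
    """Return the requested states that have no rows in dac_df, sorted.

    Works on any pandas DataFrame, including a GeoDataFrame.
    """
    present = set(dac_df[state_col].dropna().unique())
    return sorted(set(states) - present)
```

After the cell above, `missing_states(dac_shp)` would ideally return an empty list.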

### Identify DACs that meet criteria

In [9]:
# net zero scenario
nz_output, nz_fraction = calculate_fossil_replacements(dac_shp, gdf_nz_renewable, gdf_nz_retire)
print(f'In the net zero scenario there are {len(nz_output)} DACs that meet the criteria. {round(nz_fraction, 2)*100}% of DACs that see fossil retirements also see renewables')

# business-as-usual scenario
bau_output, bau_fraction = calculate_fossil_replacements(dac_shp, gdf_bau_renewable, gdf_bau_retire)
print(f'In the bau scenario there are {len(bau_output)} DACs that meet the criteria. {round(bau_fraction, 2)*100}% of DACs that see fossil retirements also see renewables')

In the net zero scenario there are 49 DACs that meet the criteria. 45.0% of DACs that see fossil retirements also see renewables
In the bau scenario there are 42 DACs that meet the criteria. 39.0% of DACs that see fossil retirements also see renewables
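The percentages above are printed with `round(fraction, 2)*100`, which renders cleanly for these two values but can expose floating-point noise for others. The snippet below uses an illustrative value (not a result from this analysis) to show the issue and the `%` presentation type of Python's format-spec mini-language as an alternative; it changes only the display, not the underlying fraction.

```python
fraction = 0.29  # illustrative value, not a result from this analysis

# pattern used above: round to 2 decimals, then scale by 100;
# binary floating point can leak through the multiplication
print(f"{round(fraction, 2)*100}%")  # 28.999999999999996%

# format-spec alternative: the % type multiplies by 100 and appends the sign
print(f"{fraction:.0%}")  # 29%
print(f"{fraction:.1%}")  # 29.0%
```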


### Combine DataFrames and Save Output

In [10]:
column_rename = {'GEOID10': 'census_tract_id', 'SF':'state_name', 'CF':'county_name'}

nz_output['scenario'] = 'net_zero_ira_ccs_climate'
bau_output['scenario'] = 'business_as_usual_ira_ccs_climate'

output_df = pd.concat([nz_output, bau_output]).rename(columns=column_rename).drop(['geometry'], axis=1)

# create the output directory if it does not exist yet
os.makedirs(output_dir, exist_ok=True)
output_df.to_csv(output_file_path, index=False)
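A per-scenario, per-state tally is a quick way to inspect the saved file. One caution when reading it back: with default dtypes, `pd.read_csv` can parse `census_tract_id` as an integer, which drops the leading zero from GEOIDs in states whose FIPS code starts with 0 (Arizona, California, Colorado). The sketch below assumes the renamed columns written above; `summarize_output` is a hypothetical helper.

```python
import pandas as pd


def summarize_output(output_df):
    """Count qualifying census tracts per scenario and state.

    Hypothetical helper; expects the columns written by this notebook:
    census_tract_id, state_name, scenario.
    """
    return (output_df
            .groupby(["scenario", "state_name"])["census_tract_id"]
            .nunique()
            .rename("n_dac_tracts")
            .reset_index())
```

To apply it to the saved file while keeping tract IDs as text: `summarize_output(pd.read_csv(output_file_path, dtype={'census_tract_id': str}))`.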