# Prepare GCAM-USA Capacity Expansion Data

This notebook collects and organizes total generation (MWh) by technology in 2050 under the Net Zero and Business-as-Usual scenarios.

## Data Requirements

This notebook relies on GCAM-USA capacity expansion modeling results under the Net Zero and Business-as-Usual scenarios. Two files (specified below) need to be downloaded. Extract each downloaded file into the `data/input_data/gcam_data` directory of this repository; the paths in this notebook assume that location.

**Dataset Title:** GCAM-USA Scenarios for GODEEEP

**Files Required:**
* `bau_ira_ccs_climate.zip` 
* `nz_ira_ccs_climate.zip`

**Description from source:** This dataset contains a set of twelve future (2020-2050) scenarios modeled by GCAM-USA for the GODEEEP project for the purpose of studying the effects of climate, socioeconomic change, technology change, current decarbonization incentives, and longer-term decarbonization policies on the U.S. energy-economy, the electricity grid, human well-being, and the environment.

**Download Link:** https://doi.org/10.5281/zenodo.10642507

**Reference:**
> Ou, Y., Zhang, Y., Waldhoff, S., & Iyer, G. (2024). GCAM-USA Scenarios for GODEEEP (v3.0.2) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10642507
______________

Note: Extracting data from the GCAM output files requires a query file. This query file has been pre-built and is provided in the `data/input_data/gcam_data/gcam_query_xml` folder of this repository.

### Imports

In [None]:
import gcamreader
import numpy as np
import pandas as pd
import os

### Data Paths

In [None]:
# data dir
data_dir = os.path.join(os.path.dirname(os.getcwd()), 'data', 'input_data')

# gcam data dir
gcam_data_dir = os.path.join(data_dir, 'gcam_data')

# gcam database directory (not used below; the connections are opened from gcam_data_dir)
gcam_db_path = os.path.join(gcam_data_dir, 'GODEEEP_GCAM-USA_Pathways')

# gcam query file path
gcam_query_path = os.path.join(gcam_data_dir, 'gcam_query_xml', 'subSetQueries.xml')

# output data dir
output_dir = os.path.join(data_dir, 'processed_generation_data')

# create output directory
os.makedirs(output_dir, exist_ok=True)

# output file path
output_path = os.path.join(output_dir, 'gcam_generation_state_tech_2050.csv')
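
Because the paths above assume the two archives were extracted into `gcam_data`, it can help to confirm the inputs exist before opening any database connection. The sketch below is only an illustration: `find_missing_paths` is a hypothetical helper that is not part of this notebook, and a temporary directory stands in for `gcam_data_dir`.

```python
import os
import tempfile


def find_missing_paths(paths):
    """Return the subset of `paths` that does not exist on disk (hypothetical helper)."""
    return [p for p in paths if not os.path.exists(p)]


# A temporary directory stands in for `gcam_data_dir` in this demonstration
with tempfile.TemporaryDirectory() as demo_dir:
    present = os.path.join(demo_dir, 'bau_ira_ccs_climate')
    absent = os.path.join(demo_dir, 'nz_ira_ccs_climate')

    # only the BAU folder is created, so the Net Zero folder should be reported
    os.makedirs(present)

    missing = find_missing_paths([present, absent])
    missing_names = [os.path.basename(p) for p in missing]

print(missing_names)
```

In the notebook itself, the same helper could be called with the two database folders and `gcam_query_path` once the settings are defined.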

### Settings

In [None]:
# bau file
bau_gcam_db_file = 'bau_ira_ccs_climate'

# net zero file
nz_gcam_db_file = 'nz_ira_ccs_climate'

# query name for generation data
generation_query_name = "elec gen by gen tech USA"

# western interconnection states
region_list = ['WA', 'OR', 'CA', 'ID', 'MT', 'WY', 'NV', 'UT', 'AZ', 'NM', 'CO']

# dictionary of abbreviations to state names
WECC = {'AZ': 'arizona', 'CA': 'california', 'CO': 'colorado', 'ID': 'idaho',  'MT': 'montana', 
        'NM': 'new_mexico', 'NV': 'nevada', 'UT':'utah', 'OR': 'oregon', 'WA': 'washington','WY': 'wyoming'}

# dictionary of technology types to collect from database
renewable_dict = {'wind': ['wind_base', 'wind_base_storage','wind_int', 'wind_subpeak'],
                  'solar': ['PV_base_storage', 'PV_int','PV_peak','PV_subpeak']}
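
The settings above are used together later on: `region_list` limits the query, `WECC` supplies the state names, and any technology not in the wind list is labeled as solar. A quick consistency check, sketched below under the assumption that the three objects are defined exactly as in the cell above, can catch a mismatch early. The variable names `unnamed_regions`, `unqueried_regions`, and `shared_techs` are introduced here for illustration only.

```python
# Settings copied from the cell above so that this check runs on its own
region_list = ['WA', 'OR', 'CA', 'ID', 'MT', 'WY', 'NV', 'UT', 'AZ', 'NM', 'CO']

WECC = {'AZ': 'arizona', 'CA': 'california', 'CO': 'colorado', 'ID': 'idaho', 'MT': 'montana',
        'NM': 'new_mexico', 'NV': 'nevada', 'UT': 'utah', 'OR': 'oregon', 'WA': 'washington',
        'WY': 'wyoming'}

renewable_dict = {'wind': ['wind_base', 'wind_base_storage', 'wind_int', 'wind_subpeak'],
                  'solar': ['PV_base_storage', 'PV_int', 'PV_peak', 'PV_subpeak']}

# regions that would be queried but have no full state name
unnamed_regions = sorted(set(region_list) - set(WECC))

# regions that have a state name but would never be queried
unqueried_regions = sorted(set(WECC) - set(region_list))

# technologies listed under both wind and solar would be labeled ambiguously
shared_techs = sorted(set(renewable_dict['wind']) & set(renewable_dict['solar']))

print(unnamed_regions, unqueried_regions, shared_techs)
```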

### Functions

In [None]:
def convert_ej_to_mwh(ej):
    """Convert native GCAM exajoule (EJ) output to megawatt-hours (MWh)."""
    # 1 EJ = 1e18 J and 1 MWh = 3.6e9 J, so 1 EJ = 277,777,777.78 MWh
    return ej * 277777777.7777778


def get_query_by_name(queries, name):
    """Return query for given name"""
    return next((x for x in queries if x.title == name), None)
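
As a worked check of the two helpers above: the conversion factor follows from 1 EJ = 1e18 J and 1 MWh = 3.6e9 J, and the lookup returns the first query whose `title` matches, or `None` if there is no match. In the sketch below, `StubQuery` is a hypothetical stand-in for the query objects returned by `gcamreader.parse_batch_query`; only the `title` attribute used in this notebook is modeled.

```python
import math
from collections import namedtuple

# joules per exajoule divided by joules per megawatt-hour
EJ_TO_MWH = 1e18 / 3.6e9


def convert_ej_to_mwh(ej):
    """Convert exajoules to megawatt-hours."""
    return ej * EJ_TO_MWH


def get_query_by_name(queries, name):
    """Return the first query whose title equals `name`, or None."""
    return next((x for x in queries if x.title == name), None)


# Hypothetical stand-in for a parsed query object; only `title` is modeled
StubQuery = namedtuple('StubQuery', ['title'])
queries = [StubQuery('some other query'), StubQuery('elec gen by gen tech USA')]

one_ej_in_mwh = convert_ej_to_mwh(1)
found = get_query_by_name(queries, 'elec gen by gen tech USA')
not_found = get_query_by_name(queries, 'no such query')

print(one_ej_in_mwh, found, not_found)
```

Because a missing title yields `None` rather than an error, a misspelled `generation_query_name` would only surface later, when the query is run.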

### Connect and Process GCAM Data

In [None]:
# net zero database connection
nz_conn = gcamreader.LocalDBConn(gcam_data_dir, nz_gcam_db_file)

# business-as-usual database connection
bau_conn = gcamreader.LocalDBConn(gcam_data_dir, bau_gcam_db_file)

# list of queries
queries = gcamreader.parse_batch_query(gcam_query_path)

#### Net Zero

In [None]:
# net zero generation data
nz_generation = nz_conn.runQuery(get_query_by_name(queries, generation_query_name), regions=region_list)

# reduce to western interconnection states
nz_generation = nz_generation[nz_generation.region.isin(WECC.keys())]

# collect technologies of interest
nz_generation = nz_generation[nz_generation.subsector.isin(['solar', 'wind'])]
# .copy() avoids a pandas SettingWithCopyWarning when the tech_type column is added below
nz_generation = nz_generation[nz_generation.technology.isin(renewable_dict['wind'] + renewable_dict['solar'])].copy()

# simplify naming and columns to include
nz_generation['tech_type'] = np.where((nz_generation.technology.isin(renewable_dict['wind'])), 'Wind', 'Solar PV') 
nz_generation = nz_generation[['region', 'Year', 'tech_type', 'value']]

# select year of interest
nz_generation = nz_generation[nz_generation.Year == 2050]

# group data by technology type
nz_generation = nz_generation.groupby(['region','tech_type'], as_index=False).sum()

# convert generation from EJ to MWh
nz_generation['value'] = round(convert_ej_to_mwh(nz_generation['value']), 2)

# set scenario name
nz_generation['scenario'] = 'net_zero_ira_ccs_climate'

# assign units
nz_generation['units'] = 'gen_mwh'

# reassign year (the groupby sum above also summed the Year column)
nz_generation['Year'] = 2050

#### Business-as-usual

In [None]:
# bau generation data
bau_generation = bau_conn.runQuery(get_query_by_name(queries, generation_query_name), regions=region_list)

# reduce to western interconnection states
bau_generation = bau_generation[bau_generation.region.isin(WECC.keys())]

# collect technologies of interest
bau_generation = bau_generation[bau_generation.subsector.isin(['solar', 'wind'])]
# .copy() avoids a pandas SettingWithCopyWarning when the tech_type column is added below
bau_generation = bau_generation[bau_generation.technology.isin(renewable_dict['wind'] + renewable_dict['solar'])].copy()

# simplify naming and columns to include
bau_generation['tech_type'] = np.where((bau_generation.technology.isin(renewable_dict['wind'])), 'Wind', 'Solar PV') 
bau_generation = bau_generation[['region', 'Year', 'tech_type', 'value']]

# select year of interest
bau_generation = bau_generation[bau_generation.Year == 2050]

# group data by technology type
bau_generation = bau_generation.groupby(['region','tech_type'], as_index=False).sum()

# convert generation from EJ to MWh
bau_generation['value'] = round(convert_ej_to_mwh(bau_generation['value']), 2)

# set scenario name
bau_generation['scenario'] = 'business_as_usual_ira_ccs_climate'

# assign units
bau_generation['units'] = 'gen_mwh'

# reassign year (the groupby sum above also summed the Year column)
bau_generation['Year'] = 2050
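
The Net Zero and Business-as-usual cells apply the same steps to different query results. One possible way to express those steps once is sketched below. `summarize_generation` is a hypothetical function that is not part of this notebook, and the `demo` values are made up purely to exercise it; the column names (`region`, `subsector`, `technology`, `Year`, `value`) are the ones the cells above rely on.

```python
import numpy as np
import pandas as pd

EJ_TO_MWH = 277777777.7777778

WIND_TECHS = ['wind_base', 'wind_base_storage', 'wind_int', 'wind_subpeak']
SOLAR_TECHS = ['PV_base_storage', 'PV_int', 'PV_peak', 'PV_subpeak']


def summarize_generation(df, scenario, year=2050):
    """Filter, label, aggregate, and convert one scenario's generation (hypothetical helper)."""
    keep = (
        df['subsector'].isin(['solar', 'wind'])
        & df['technology'].isin(WIND_TECHS + SOLAR_TECHS)
        & (df['Year'] == year)
    )
    out = df[keep].copy()

    # label each row as wind or solar
    out['tech_type'] = np.where(out['technology'].isin(WIND_TECHS), 'Wind', 'Solar PV')

    # sum only the value column so the year is never summed
    out = out.groupby(['region', 'tech_type'], as_index=False)['value'].sum()

    # convert EJ to MWh and attach metadata
    out['value'] = (out['value'] * EJ_TO_MWH).round(2)
    out['scenario'] = scenario
    out['units'] = 'gen_mwh'
    out['Year'] = year

    return out[['region', 'tech_type', 'Year', 'value', 'scenario', 'units']]


# Made-up rows (values in EJ) used only to exercise the function
demo = pd.DataFrame({
    'region': ['CA', 'CA', 'CA', 'WA', 'CA'],
    'subsector': ['wind', 'wind', 'solar', 'wind', 'solar'],
    'technology': ['wind_base', 'wind_int', 'PV_int', 'wind_base', 'PV_int'],
    'Year': [2050, 2050, 2050, 2050, 2045],
    'value': [0.001, 0.002, 0.004, 0.003, 0.009],
})

summary = summarize_generation(demo, 'demo_scenario')
print(summary)
```

Summing only `value` means the year does not need to be reset after grouping, which is the one place this sketch deliberately differs from the cells above.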

### Combine Scenarios

In [None]:
# combine bau and net zero files
gcam_data = pd.concat([bau_generation, nz_generation])

# collect full state names
gcam_data['region_name'] = gcam_data['region'].map(WECC)

#### Save to file

In [None]:
gcam_data.to_csv(output_path, index=False)
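
Once the combined table exists, one way to inspect it is to place the two scenarios side by side. The sketch below assumes the long format produced above (`region`, `tech_type`, `value`, `scenario`); the `demo` numbers are invented for illustration and are not GCAM-USA results, and the `nz_minus_bau` column is a name introduced here.

```python
import pandas as pd

# Invented values in MWh; these are NOT GCAM-USA results
demo = pd.DataFrame({
    'region': ['CA', 'CA', 'CA', 'CA'],
    'tech_type': ['Wind', 'Solar PV', 'Wind', 'Solar PV'],
    'value': [100.0, 200.0, 150.0, 400.0],
    'scenario': ['business_as_usual_ira_ccs_climate'] * 2 + ['net_zero_ira_ccs_climate'] * 2,
})

# one row per region and technology, one column per scenario
wide = demo.pivot_table(index=['region', 'tech_type'], columns='scenario',
                        values='value', aggfunc='sum')

# difference between the Net Zero and Business-as-usual scenarios
wide['nz_minus_bau'] = (wide['net_zero_ira_ccs_climate']
                        - wide['business_as_usual_ira_ccs_climate'])

print(wide)
```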