# Prepare GCAM Data

This notebook collects and organizes total wind and solar PV generation (MWh) by state and technology in 2050 under the Net Zero and Business-as-Usual scenarios for the Western Interconnection states.

### Download Required Dataset

#### GCAM-USA Capacity Expansion Plan under Net Zero and Business-as-Usual Scenarios

**Dataset Title:** GCAM-USA Scenarios for GODEEEP

**Description from source:** This dataset contains a set of twelve future (2020-2050) scenarios modeled by GCAM-USA for the GODEEEP project for the purpose of studying the effects of climate, socioeconomic change, technology change, current decarbonization incentives, and longer-term decarbonization policies on the U.S. energy-economy, the electricity grid, human well-being, and the environment.

Download the GCAM-USA dataset from here: https://doi.org/10.5281/zenodo.10642507

**Reference:**
> Ou, Y., Zhang, Y., Waldhoff, S., & Iyer, G. (2024). GCAM-USA Scenarios for GODEEEP (v3.0.2) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10642507

______________

## Steps:

1. Download and extract the GCAM-USA datasets into the `/data/input_data/gcam_data` directory of this repository. The paths in this notebook assume that location.
2. Run the cells below in order.
3. The processed output file is saved to `/data/input_data/processed_generation_data`, the directory assigned to `output_dir` in the Data Paths cell.
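Before running the notebook, it can help to confirm that the inputs from step 1 are in place. The following is a minimal sketch, not part of the original workflow: `missing_inputs` is a hypothetical helper, and the paths mirror the ones assigned in the Data Paths cell.

```python
import os


def missing_inputs(paths):
    """Return the subset of `paths` that do not exist on disk (hypothetical helper)."""
    return [p for p in paths if not os.path.exists(p)]


# Same base directory the notebook uses: <parent of cwd>/data/input_data
data_dir = os.path.join(os.path.dirname(os.getcwd()), 'data', 'input_data')

expected_inputs = [
    os.path.join(data_dir, 'gcam_data'),
    os.path.join(data_dir, 'gcam_query_xlm', 'subSetQueries.xml'),
]

# Report anything that still needs to be downloaded or extracted
for path in missing_inputs(expected_inputs):
    print(f"Missing input: {path}")
```

If nothing is printed, both the GCAM data directory and the query file were found.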

### Imports

In [1]:
import gcamreader
import numpy as np
import pandas as pd
import os

### Data Paths

In [2]:
# set year of analysis
year = 2050

# data dir
data_dir = os.path.join(os.path.dirname(os.getcwd()), 'data', 'input_data')

# gcam data dir
gcam_data_dir = os.path.join(data_dir, 'gcam_data')

# gcam pathways directory (defined for reference; the connections below use gcam_data_dir)
gcam_db_path = os.path.join(gcam_data_dir, 'GODEEEP_GCAM-USA_Pathways')

# bau file
bau_gcam_db_file = 'bau_ira_ccs_climate'

# net zero file
nz_gcam_db_file = 'nz_ira_ccs_climate'

# gcam query path
gcam_query_path = os.path.join(data_dir, 'gcam_query_xlm', 'subSetQueries.xml')

# output data dir
output_dir = os.path.join(os.path.dirname(os.getcwd()), 'data', 'input_data', 'processed_generation_data')

# output file path
output_path = os.path.join(output_dir, f'gcam_generation_state_tech_{year}.csv')

# query name for the generation data
generation_query_name = "elec gen by gen tech cogen USA"

### Settings

In [3]:
# western interconnection states
region_list = ['WA', 'OR', 'CA', 'ID', 'MT', 'WY', 'NV', 'UT', 'AZ', 'NM', 'CO']

WECC = {'AZ': 'arizona', 'CA': 'california', 'CO': 'colorado', 'ID': 'idaho',  'MT': 'montana', 
        'NM': 'new_mexico', 'NV': 'nevada', 'UT':'utah', 'OR': 'oregon', 'WA': 'washington','WY': 'wyoming'}

# dictionary of technology types to collect
renewable_dict = {'wind': ['wind_base', 'wind_base_storage','wind_int', 'wind_subpeak'],
                  'solar': ['PV_base_storage', 'PV_int','PV_peak','PV_subpeak']}

In [4]:
# conversion factor from GCAM's native exajoules (EJ) to megawatt-hours (MWh)
EXAJOULES_TO_MWH = 277777777.7777778
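The factor follows from 1 EJ = 10^18 J and 1 MWh = 3.6 × 10^9 J (1 MW sustained for 3600 s). A short sketch that derives it from those definitions; `exajoules_to_mwh` is a hypothetical helper name used only for illustration.

```python
# Derive the EJ -> MWh factor from the unit definitions
JOULES_PER_EXAJOULE = 1e18
JOULES_PER_MWH = 3.6e9  # 1e6 W * 3600 s

EXAJOULES_TO_MWH = JOULES_PER_EXAJOULE / JOULES_PER_MWH  # roughly 2.78e8


def exajoules_to_mwh(value_ej):
    """Convert an energy value from exajoules to megawatt-hours (hypothetical helper)."""
    return value_ej * EXAJOULES_TO_MWH


print(EXAJOULES_TO_MWH)
print(exajoules_to_mwh(1.0) / 1e6)  # the same quantity expressed in TWh
```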

### Functions

In [5]:
def get_query_by_name(queries, name):
    """Return query for given name"""
    return next((x for x in queries if x.title == name), None)
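Note that `get_query_by_name` returns `None` when no title matches, so a misspelled query name only surfaces later when the query is run. The sketch below illustrates that behavior with stand-in objects; `FakeQuery` and `get_query_or_raise` are hypothetical names, and the only assumption about the real query objects is that they expose a `title` attribute, as the function above already requires.

```python
from collections import namedtuple

# Stand-in for the parsed query objects; only `title` is needed here
FakeQuery = namedtuple("FakeQuery", ["title"])


def get_query_by_name(queries, name):
    """Return the first query whose title matches `name`, or None."""
    return next((x for x in queries if x.title == name), None)


def get_query_or_raise(queries, name):
    """Hypothetical stricter variant: fail early with the available titles."""
    query = get_query_by_name(queries, name)
    if query is None:
        available = ", ".join(q.title for q in queries)
        raise KeyError(f"No query titled '{name}'. Available: {available}")
    return query


demo_queries = [FakeQuery("elec gen by gen tech USA"), FakeQuery("another query")]

found = get_query_by_name(demo_queries, "elec gen by gen tech USA")
missing = get_query_by_name(demo_queries, "not a real title")
print(found.title, missing)
```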

### Connect and Process GCAM Data

##### Step 1. Connect to the GCAM database

In [6]:
# net zero
nz_conn = gcamreader.LocalDBConn(gcam_data_dir, nz_gcam_db_file)

# business-as-usual
bau_conn = gcamreader.LocalDBConn(gcam_data_dir, bau_gcam_db_file)

Database scenarios: R_02b_NZ_climate
Database scenarios: R_01b_BAU_climate


##### Step 2. Create a list of queries

In [7]:
# list of queries
queries = gcamreader.parse_batch_query(gcam_query_path)

##### Step 3. Collect the generation data

In [8]:
# overrides the query name assigned in the Data Paths cell; this is the query that is run below
generation_query_name = "elec gen by gen tech USA"

##### Net Zero

In [9]:
# net zero generation data
nz_generation = nz_conn.runQuery(get_query_by_name(queries, generation_query_name), regions=region_list)

# reduce to western interconnection states
nz_generation = nz_generation[nz_generation.region.isin(WECC.keys())]

# collect technologies of interest
nz_generation = nz_generation[nz_generation.subsector.isin(['solar', 'wind'])]
nz_generation = nz_generation[(nz_generation.technology.isin(renewable_dict['wind'])) | (nz_generation.technology.isin(renewable_dict['solar']))]

# simplify naming and columns to include
nz_generation['tech_type'] = np.where((nz_generation.technology.isin(renewable_dict['wind'])), 'Wind', 'Solar PV') 
nz_generation = nz_generation[['region', 'Year', 'tech_type', 'value']]

# select year of interest (set in the Data Paths cell)
nz_generation = nz_generation[nz_generation.Year == year]

# group data by technology type
nz_generation = nz_generation.groupby(['region','tech_type'], as_index=False).sum()

# convert generation from EJ to MWh
nz_generation['value'] = round(nz_generation['value'] * EXAJOULES_TO_MWH, 2)

# set scenario name
nz_generation['scenario'] = 'net_zero_ira_ccs_climate'

# assign units
nz_generation['units'] = 'gen_mwh'

# assign year (the groupby above sums the Year column, so reset it)
nz_generation['Year'] = year

# sanity check: total net zero wind generation across all states, in TWh
nz_generation.loc[nz_generation.tech_type == 'Wind', 'value'].sum() / 1e6

607.9907973400001

##### Business-as-usual

In [10]:
# bau generation data
bau_generation = bau_conn.runQuery(get_query_by_name(queries, generation_query_name), regions=region_list)

# reduce to western interconnection states
bau_generation = bau_generation[bau_generation.region.isin(WECC.keys())]

# collect technologies of interest
bau_generation = bau_generation[bau_generation.subsector.isin(['solar', 'wind'])]
bau_generation = bau_generation[(bau_generation.technology.isin(renewable_dict['wind'])) | (bau_generation.technology.isin(renewable_dict['solar']))]

# simplify naming and columns to include
bau_generation['tech_type'] = np.where((bau_generation.technology.isin(renewable_dict['wind'])), 'Wind', 'Solar PV') 
bau_generation = bau_generation[['region', 'Year', 'tech_type', 'value']]

# select year of interest (set in the Data Paths cell)
bau_generation = bau_generation[bau_generation.Year == year]

# group data by technology type
bau_generation = bau_generation.groupby(['region','tech_type'], as_index=False).sum()

# convert generation from EJ to MWh
bau_generation['value'] = round(bau_generation['value'] * EXAJOULES_TO_MWH, 2)

# set scenario name
bau_generation['scenario'] = 'business_as_usual_ira_ccs_climate'

# assign units
bau_generation['units'] = 'gen_mwh'

# assign year (the groupby above sums the Year column, so reset it)
bau_generation['Year'] = year
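The Net Zero and Business-as-usual cells apply the same steps to different query results. One way to avoid the duplication is a shared helper, sketched below under stated assumptions: `summarize_generation` is a hypothetical name, and the toy input uses made-up values and an illustrative non-renewable row purely to exercise the filter. It is not GCAM output.

```python
import numpy as np
import pandas as pd

EXAJOULES_TO_MWH = 277777777.7777778

renewable_dict = {'wind': ['wind_base', 'wind_base_storage', 'wind_int', 'wind_subpeak'],
                  'solar': ['PV_base_storage', 'PV_int', 'PV_peak', 'PV_subpeak']}


def summarize_generation(df, year, scenario):
    """Filter to wind/solar for `year`, sum by state and type, convert EJ to MWh."""
    techs = renewable_dict['wind'] + renewable_dict['solar']
    mask = df.subsector.isin(['solar', 'wind']) & df.technology.isin(techs) & (df.Year == year)
    out = df[mask].copy()

    # label each row as Wind or Solar PV
    out['tech_type'] = np.where(out.technology.isin(renewable_dict['wind']), 'Wind', 'Solar PV')

    # sum only the value column so Year is not aggregated
    out = out.groupby(['region', 'tech_type'], as_index=False)['value'].sum()

    out['value'] = (out['value'] * EXAJOULES_TO_MWH).round(2)
    out['Year'] = year
    out['scenario'] = scenario
    out['units'] = 'gen_mwh'
    return out[['region', 'tech_type', 'Year', 'value', 'scenario', 'units']]


# toy input with made-up values in EJ (not GCAM output)
toy = pd.DataFrame({
    'region': ['CA', 'CA', 'CA', 'CA', 'CA'],
    'subsector': ['wind', 'wind', 'solar', 'gas', 'wind'],
    'technology': ['wind_base', 'wind_int', 'PV_peak', 'gas_example', 'wind_base'],
    'Year': [2050, 2050, 2050, 2050, 2045],
    'value': [0.1, 0.2, 0.3, 0.4, 0.5],
})

toy_summary = summarize_generation(toy, 2050, 'toy_scenario')
print(toy_summary)
```

With a helper like this, each scenario would reduce to one call on its query result, passing the scenario label used in the cells above.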

### Combine Scenarios

In [11]:
# combine bau and net zero files
gcam_data = pd.concat([bau_generation, nz_generation])

# collect full state names
gcam_data['region_name'] = gcam_data['region'].map(WECC)

gcam_data

| | region | tech_type | Year | value | scenario | units | region_name |
|---|---|---|---|---|---|---|---|
| 0 | AZ | Solar PV | 2050 | 38726220.0 | business_as_usual_ira_ccs_climate | gen_mwh | arizona |
| 1 | AZ | Wind | 2050 | 23656360.0 | business_as_usual_ira_ccs_climate | gen_mwh | arizona |
| 2 | CA | Solar PV | 2050 | 71452410.0 | business_as_usual_ira_ccs_climate | gen_mwh | california |
| 3 | CA | Wind | 2050 | 76810420.0 | business_as_usual_ira_ccs_climate | gen_mwh | california |
| 4 | CO | Solar PV | 2050 | 35213750.0 | business_as_usual_ira_ccs_climate | gen_mwh | colorado |
| 5 | CO | Wind | 2050 | 46506100.0 | business_as_usual_ira_ccs_climate | gen_mwh | colorado |
| 6 | ID | Solar PV | 2050 | 1450510.0 | business_as_usual_ira_ccs_climate | gen_mwh | idaho |
| 7 | ID | Wind | 2050 | 11330300.0 | business_as_usual_ira_ccs_climate | gen_mwh | idaho |
| 8 | MT | Solar PV | 2050 | 9565910.0 | business_as_usual_ira_ccs_climate | gen_mwh | montana |
| 9 | MT | Wind | 2050 | 90729610.0 | business_as_usual_ira_ccs_climate | gen_mwh | montana |


#### Save to file

In [12]:
gcam_data.to_csv(output_path, index=False)
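`DataFrame.to_csv` does not create missing directories, so the save fails if `output_dir` does not exist yet. A minimal sketch of creating the directory first and reading the file back as a check; it writes to a temporary directory so it can run anywhere, and the `demo_` names are hypothetical. The two rows are copied from the combined table above.

```python
import os
import tempfile

import pandas as pd

# two rows copied from the combined output shown earlier
demo_data = pd.DataFrame({
    'region': ['AZ', 'AZ'],
    'tech_type': ['Solar PV', 'Wind'],
    'Year': [2050, 2050],
    'value': [38726220.0, 23656360.0],
    'scenario': ['business_as_usual_ira_ccs_climate'] * 2,
    'units': ['gen_mwh'] * 2,
    'region_name': ['arizona', 'arizona'],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_output_dir = os.path.join(tmp_dir, 'processed_generation_data')
    demo_output_path = os.path.join(demo_output_dir, 'gcam_generation_state_tech_2050.csv')

    # create the output directory if it is missing, then save
    os.makedirs(demo_output_dir, exist_ok=True)
    demo_data.to_csv(demo_output_path, index=False)

    # read the file back to confirm the columns and row count survived
    round_trip = pd.read_csv(demo_output_path)

print(round_trip.shape)
```

In the notebook itself, the equivalent would be calling `os.makedirs(output_dir, exist_ok=True)` before the `to_csv` call.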