# Gridded EPA Methane Inventory
## Category: 1A Stationary Combustion of Fuels

***
#### Authors: 
Joannes D. Maasakkers, Candice F. Z. Chen, Erin E. McDuffie
#### Date Last Updated: 
see Step 0
#### Notebook Purpose: 
This Notebook calculates and reports annual and monthly gridded (0.1°x0.1°) methane emission fluxes (molec./cm2/s) from stationary combustion sources in the CONUS region for the years 2012 through 2018.
#### Summary & Notes:
EPA GHGI stationary combustion emissions from Electric Energy Generation, Commercial, Residential, and Industrial processes within the Energy Combustion sector are read in at the national level. Emissions for the residential, commercial, and industrial sectors are first allocated to the state level as a function of proxy group. Residential emissions are also allocated to the county level using NEI wood combustion as a proxy. The activity/proxy data used to allocate emissions from each group include facility-level methane fluxes derived from EPA’s Acid Rain Program data (for the electric energy sector), EIA State Energy Data System consumption data as a function of sector and fuel type (for the commercial, residential, and industrial sectors), and facility-level emissions from the GHGRP (Subparts C and D, for the industrial sector). State-level emissions are spatially distributed onto a 0.1°x0.1° grid based on population density (residential/commercial) and GHGRP facility locations/emissions (industrial). Emissions are then converted to emission fluxes. Annual and monthly emission fluxes (molec./cm2/s) are written to final netCDFs in the ‘/code/Final_Gridded_Data/’ folder.
***
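
The unit conversion mentioned in the summary (emission mass to molec./cm2/s) can be sketched as follows. This is a minimal illustration, not the notebook's actual conversion step: the function name `kt_per_year_to_flux` and the example numbers are hypothetical, and only the Avogadro and molar mass values are taken from the constants defined in Step 0.

```python
import calendar

AVOGADRO = 6.02214129e23   # molecules/mol (same value as the Step 0 constant)
MOLAR_CH4 = 16.04          # g/mol (same value as the Step 0 constant)

def kt_per_year_to_flux(emis_kt, area_cm2, year):
    """Convert an annual emission (kt CH4) over an area (cm2) to molec./cm2/s."""
    days = 366 if calendar.isleap(year) else 365
    seconds = days * 24 * 60 * 60
    grams = emis_kt * 1.0e9                      # 1 kt = 1e9 g
    molecules = grams / MOLAR_CH4 * AVOGADRO     # g -> mol -> molecules
    return molecules / area_cm2 / seconds

# Hypothetical example: 1 kt emitted over 100 km2 (1e12 cm2) during 2012
flux_2012 = kt_per_year_to_flux(1.0, 1.0e12, 2012)
print(f"{flux_2012:.3e} molec./cm2/s")
```

Because the number of seconds differs between leap and non-leap years, the same annual mass gives a slightly lower flux in a leap year.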

-------
## Step 0. Set-Up Notebook Modules, Functions, and Local Parameters and Constants
-------

In [None]:
#Confirm working directory & print last update time
import os
import time
modtime = os.path.getmtime('./1A_Combustion_Stationary.ipynb')
modificationTime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(modtime))
print("This file was last modified on: ", modificationTime)
print('')
print("The directory we are working in is {}" .format(os.getcwd()))

In [None]:
## Include plots within notebook
%matplotlib inline

In [None]:
# Import base modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import re
import datetime
from copy import copy

# Import additional modules
# Load plotting package Basemap 
from mpl_toolkits.basemap import Basemap

# Load netCDF (for manipulating netCDF file types)
from netCDF4 import Dataset

# Set up ticker
import matplotlib.ticker as ticker

#add path for the global function module (file)
import sys
module_path = os.path.abspath(os.path.join('../Global_Functions/'))
if module_path not in sys.path:
    sys.path.append(module_path)

# Load Tabula (for reading tables from PDFs)
import tabula as tb 
import PyPDF2 as pypdf
    
# Load user-defined global functions (modules)
import data_load_functions as data_load_fn
import data_functions as data_fn
import data_IO_functions as data_IO_fn
import data_plot_functions as data_plot_fn

In [None]:
#INPUT Files
# Assign global file names
global_filenames = data_load_fn.load_global_file_names()
State_ANSI_inputfile = global_filenames[0]
County_ANSI_inputfile = global_filenames[1]
pop_map_inputfile = global_filenames[2]
Grid_area01_inputfile = global_filenames[3]
Grid_area001_inputfile = global_filenames[4]
Grid_state001_ansi_inputfile = global_filenames[5]
Grid_county001_ansi_inputfile = global_filenames[6]
globalinputlocation = global_filenames[0][0:20]

# Specify names of inputs files used in this notebook
EPA_stat_inputfile = globalinputlocation+ 'GHGI/Ch3_Energy/EPA_Table_3-10_kt.csv'

StatComb_Mapping_inputfile = './InputData/StationaryCombustion_ProxyMapping.xlsx'

#Activity Data
EPA_ARP_inputfile = './InputData/ARP_Data/EPA_ARP_FacilityEmissions.csv'
EIA_SEDS_commconsump_inputfile = "./InputData/EIA_SEDS/Commercial/sum_btu_com_"
EIA_SEDS_resconsump_inputfile = "./InputData/EIA_SEDS/Residential/sum_btu_res_"
EIA_SEDS_indconsump_inputfile = "./InputData/EIA_SEDS/Industrial/sum_btu_ind_"
NEI_resi_wood_inputfile = "./InputData/NEI 2020 RWC Throughputs.xlsx"

#GHGRP Data (reporting format changed in 2015)
GHGRP_subCfacility_inputfile = "./InputData/GHGRP/GHGRP_SubpartCEmissions.csv" #subpart C facility IDs and emissions (locations not available)
GHGRP_subDfacility_inputfile = "./InputData/GHGRP/GHGRP_SubpartDEmissions.csv" #subpart D facility IDs and emissions 
GHGRP_subDfacility_loc_inputfile = "./InputData/GHGRP/GHGRP_FacilityInfo.csv" #subpart D facility info (for all years, with ID & lat and lons)

#OUTPUT FILES
gridded_int_outputfile = 'EPA_v2_1A_Combustion_Stationary_int.nc'

gridded_outputfile = '../Final_Gridded_Data/EPA_v2_1A_Combustion_Stationary.nc'
gridded_month_outputfile = '../Final_Gridded_Data/EPA_v2_1A_Combustion_Stationary_Monthly.nc'
netCDF_description = 'Gridded EPA Inventory - Stationary Combustion Emissions - IPCC Source Category 1A'
netCDF_description_m = 'Gridded EPA Inventory - Monthly Stationary Combustion Emissions - IPCC Source Category 1A'
title_str = "EPA methane emissions from stationary combustion"
title_diff_str = "Emissions from stationary combustion difference: 2018-2012"

#output gridded proxy data
grid_emi_outputfile = '../Final_Gridded_Data/Extension/v2_input_data/Comb_Stationary_Grid_Emi.nc'

In [None]:
# Define local variables
start_year = 2012  #First year in emission timeseries
end_year = 2018    #Last year in emission timeseries
year_range = [*range(start_year, end_year+1,1)] #List of emission years
year_range_str=[str(i) for i in year_range]
num_years = len(year_range)

# Define constants
Avogadro   = 6.02214129 * 10**(23)  #molecules/mol
Molarch4   = 16.04                  #g/mol
Res01      = 0.1                    # degrees
Res_01     = 0.01                   # degrees
tg_scale   = 0.001                  #Tg scale number [New file allows for the exclusion of the territories] 

# Continental US Lat/Lon Limits (for netCDF files)
Lon_left = -130       #deg
Lon_right = -60       #deg
Lat_low  = 20         #deg
Lat_up  = 55          #deg
loc_dimensions = [Lat_low, Lat_up, Lon_left, Lon_right]

ilat_start = int((90+Lat_low)/Res01) #1100:1450 (continental US range)
ilat_end = int((90+Lat_up)/Res01)
ilon_start = abs(int((-180-Lon_left)/Res01)) #500:1200 (continental US range)
ilon_end = abs(int((-180-Lon_right)/Res01))

# Number of days in each month
month_day_leap  = [  31,  29,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]
month_day_nonleap = [  31,  28,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]

# Month arrays
month_range_str = ['January','February','March','April','May','June','July','August','September','October','November','December']
num_months = len(month_range_str)
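
The index bounds computed above select the CONUS window out of a global 0.1° grid. The sketch below restates that arithmetic as a function, assuming (as the `90+Lat_low` and `-180-Lon_left` terms suggest) that the global arrays start at -90° latitude and -180° longitude. The function name is hypothetical, and `round` is used instead of `int` so that small floating-point errors in the division cannot shift an index.

```python
def grid_index_bounds(lat_low, lat_up, lon_left, lon_right, res):
    """Return (ilat_start, ilat_end, ilon_start, ilon_end) on a global grid
    whose first cell edges sit at -90 degrees latitude and -180 degrees longitude."""
    ilat_start = round((lat_low + 90) / res)
    ilat_end = round((lat_up + 90) / res)
    ilon_start = round((lon_left + 180) / res)
    ilon_end = round((lon_right + 180) / res)
    return ilat_start, ilat_end, ilon_start, ilon_end

bounds = grid_index_bounds(20, 55, -130, -60, 0.1)
n_lat = bounds[1] - bounds[0]   # number of 0.1 degree rows in the window
n_lon = bounds[3] - bounds[2]   # number of 0.1 degree columns in the window
print(bounds, n_lat, n_lon)
```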

In [None]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999;
//prevent auto-scrolling

In [None]:
# Track run time
ct = datetime.datetime.now() 
it = ct.timestamp() 
print("current time:", ct) 

____
## Step 1. Load in State ANSI data and Area
_____

In [None]:
# State-level ANSI Data
#Read the state ANSI file array
State_ANSI, name_dict = data_load_fn.load_state_ansi(State_ANSI_inputfile)[0:2]
#QA: number of states
print('Read input file: '+ f"{State_ANSI_inputfile}")
print('Total "States" found: ' + '%.0f' % len(State_ANSI))
print(' ')

#County ANSI Data
#Includes State ANSI number, county ANSI number, county name, and county area (square miles)
County_ANSI = pd.read_csv(County_ANSI_inputfile,encoding='latin-1')

#QA: number of counties
print ('Read input file: ' + f"{County_ANSI_inputfile}")
print('Total "Counties" found (include PR): ' + '%.0f' % len(County_ANSI))
print(' ')

#Create a placeholder array for county data
county_array = np.zeros([len(County_ANSI),3])

# 0.01 x0.01 degree Data
# State ANSI IDs and grid cell area (m2) maps
state_ANSI_map = data_load_fn.load_state_ansi_map(Grid_state001_ansi_inputfile)
state_ANSI_map = state_ANSI_map.astype('int32')
county_ANSI_map = data_load_fn.load_county_ansi_map(Grid_county001_ansi_inputfile)
county_ANSI_map = county_ANSI_map.astype('int32')
area_map, lat001, lon001 = data_load_fn.load_area_map_001(Grid_area001_inputfile)

# 0.1 x0.1 degree data
# grid cell area and state ANSI maps
area_map01, Lat01, Lon01 = data_load_fn.load_area_map_01(Grid_area01_inputfile)[0:3]
#Select relevant Continental 0.1 x0.1 domain
Lat_01 = Lat01[ilat_start:ilat_end]
Lon_01 = Lon01[ilon_start:ilon_end]
area_matrix_01 = data_fn.regrid001_to_01(area_map, Lat_01, Lon_01)
area_matrix_01 *= 10000  #convert from m2 to cm2

state_ANSI_map_01 = data_fn.regrid001_to_01(state_ANSI_map, Lat_01, Lon_01)


# Print time
ct = datetime.datetime.now() 
print("current time:", ct) 
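
The implementation of `data_fn.regrid001_to_01` is not shown in this notebook. As a rough illustration only, one common way to aggregate an extensive quantity such as grid-cell area from 0.01° to 0.1° is to sum each 10x10 block of fine cells. The helper name `block_sum` is hypothetical, and this approach would not be appropriate for categorical maps such as state ANSI IDs.

```python
import numpy as np

def block_sum(fine, factor=10):
    """Sum non-overlapping factor x factor blocks of a 2-D array."""
    nlat, nlon = fine.shape
    if nlat % factor or nlon % factor:
        raise ValueError("array dimensions must be multiples of the block size")
    # Split each axis into (coarse index, position within block), then sum
    # over the two within-block axes.
    blocks = fine.reshape(nlat // factor, factor, nlon // factor, factor)
    return blocks.sum(axis=(1, 3))

# Hypothetical fine grid with unit area per cell
fine_area = np.ones((20, 30))
coarse_area = block_sum(fine_area)
print(coarse_area.shape)
```

Summing preserves the total, so the coarse grid carries the same overall area (or emissions) as the fine grid.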

-------------
## Step 2: Read-in and Format Proxy Data
-------------

#### Step 2.1 Read In Proxy Mapping File & Make Proxy Arrays

In [None]:
#load GHGI Mapping Groups
names = pd.read_excel(StatComb_Mapping_inputfile, sheet_name = "GHGI Map - Stat", usecols = "A:B",skiprows = 1, header = 0)
colnames = names.columns.values
ghgi_stat_map = pd.read_excel(StatComb_Mapping_inputfile, sheet_name = "GHGI Map - Stat", usecols = "A:B", skiprows = 1, names = colnames)
#drop rows with no data, remove the parentheses and ""
ghgi_stat_map = ghgi_stat_map[ghgi_stat_map['GHGI_Emi_Group'] != 'na']
ghgi_stat_map = ghgi_stat_map[ghgi_stat_map['GHGI_Emi_Group'].notna()]
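# regex=False treats the parentheses as literal characters (the default for `regex` changed in pandas 2.0)
ghgi_stat_map['GHGI_Source']= ghgi_stat_map['GHGI_Source'].str.replace("(","", regex=False)
ghgi_stat_map['GHGI_Source']= ghgi_stat_map['GHGI_Source'].str.replace(")","", regex=False)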
ghgi_stat_map.reset_index(inplace=True, drop=True)
display(ghgi_stat_map)

#load emission group - proxy map
names = pd.read_excel(StatComb_Mapping_inputfile, sheet_name = "Proxy Map - Stat", usecols = "A:G",skiprows = 1, header = 0)
colnames = names.columns.values
proxy_stat_map = pd.read_excel(StatComb_Mapping_inputfile, sheet_name = "Proxy Map - Stat", usecols = "A:G", skiprows = 1, names = colnames)
display((proxy_stat_map))

#create empty proxy and emission group arrays (add months for proxy variables that have monthly data)
for igroup in np.arange(0,len(proxy_stat_map)):
    if proxy_stat_map.loc[igroup, 'Grid_Month_Flag'] ==0:
        vars()[proxy_stat_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        vars()[proxy_stat_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years])
    else:
        vars()[proxy_stat_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
        vars()[proxy_stat_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years,num_months])
        
    vars()[proxy_stat_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([num_years])
    
    if proxy_stat_map.loc[igroup,'State_Proxy_Group'] != '-':
        if proxy_stat_map.loc[igroup,'State_Month_Flag'] == 0:
            vars()[proxy_stat_map.loc[igroup,'State_Proxy_Group']] = np.zeros([len(State_ANSI),num_years])
        else:
            vars()[proxy_stat_map.loc[igroup,'State_Proxy_Group']] = np.zeros([len(State_ANSI),num_years,num_months])
    else:
        pass # do not make state proxy variable if no variable assigned in mapping file (county check below still runs)
    
    if proxy_stat_map.loc[igroup,'County_Proxy_Group'] != '-':
        if proxy_stat_map.loc[igroup,'County_Month_Flag'] == 0:
            vars()[proxy_stat_map.loc[igroup,'County_Proxy_Group']] = np.zeros([len(State_ANSI),len(County_ANSI),num_years])
        else:
            vars()[proxy_stat_map.loc[igroup,'County_Proxy_Group']] = np.zeros([len(State_ANSI),len(County_ANSI),num_years,num_months])
    else:
        continue # do not make county proxy variable if no variable assigned in mapping file

emi_group_names = np.unique(proxy_stat_map['GHGI_Emi_Group'])

print('QA/QC: Is the number of emission groups the same for the proxy and emissions tabs?')
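# Compare the proxy tab against the GHGI emissions tab (comparing the proxy tab to itself always passes)
if (len(emi_group_names) == len(np.unique(ghgi_stat_map['GHGI_Emi_Group']))):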
    print('PASS')
else:
    print('FAIL')
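
The cell above creates one global variable per proxy group through `vars()`. A dictionary keyed by group name is a possible alternative that keeps the arrays in one place and makes them easier to inspect. The sketch below assumes the same column names as the mapping file (`GHGI_Emi_Group`, `Proxy_Group`, `Grid_Month_Flag`); the function name and the example group names are hypothetical placeholders, not entries from the mapping file.

```python
import numpy as np

def make_proxy_arrays(proxy_rows, n_lat, n_lon, n_years, n_months=12):
    """Build zero-filled proxy and emission arrays keyed by group name."""
    arrays = {}
    for row in proxy_rows:
        if row['Grid_Month_Flag'] == 0:
            grid_shape = (n_lat, n_lon, n_years)
            nongrid_shape = (n_years,)
        else:
            # Proxies with monthly data carry an extra month dimension
            grid_shape = (n_lat, n_lon, n_years, n_months)
            nongrid_shape = (n_years, n_months)
        arrays[row['Proxy_Group']] = np.zeros(grid_shape)
        arrays[row['Proxy_Group'] + '_nongrid'] = np.zeros(nongrid_shape)
        arrays[row['GHGI_Emi_Group']] = np.zeros(n_years)
    return arrays

# Hypothetical mapping rows and a small grid, for illustration only
rows = [
    {'GHGI_Emi_Group': 'Emi_Example_A', 'Proxy_Group': 'Map_Example_A', 'Grid_Month_Flag': 1},
    {'GHGI_Emi_Group': 'Emi_Example_B', 'Proxy_Group': 'Map_Example_B', 'Grid_Month_Flag': 0},
]
arrays = make_proxy_arrays(rows, n_lat=4, n_lon=5, n_years=7)
print({name: arr.shape for name, arr in arrays.items()})
```

With a DataFrame such as `proxy_stat_map`, the rows could be obtained with `proxy_stat_map.to_dict('records')`.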

#### Step 2.2 Read In and Format EPA (Acid Rain Program) Electric Power Emissions (Electric Energy Proxy)

##### Step 2.2.1 Read in EPA power plant facility information and calculate facility-level emissions

In [None]:
# Read EPA ARP data for individual power plants. 
# The unit type and fuel type determine the CH4 emission factor, which is then applied to the Heat Input.
#https://ampd.epa.gov/ampd/

fields = ['State', ' Year',' Month', ' Facility Name',' Facility Latitude',' Facility Longitude',' Unit Type', \
          ' Fuel Type (Primary)', ' Heat Input (MMBtu)']
ARP_Raw = pd.read_csv(EPA_ARP_inputfile, usecols = fields, index_col=False, na_filter = False)

# make a multidimensional dictionary that contains the data for each year
# for calculations later, replace empty heat input values with NaNs and convert
# to numeric (otherwise data in scientific notation are read in as strings)
ARP_facilities = dict()
for iyear in np.arange(num_years):
    # .copy() makes an independent dataframe for each year (avoids SettingWithCopyWarning)
    ARP_facilities[iyear] = ARP_Raw[ARP_Raw[' Year'] == year_range[iyear]].copy()
    # fillna returns a new dataframe, so the result must be assigned back
    ARP_facilities[iyear] = ARP_facilities[iyear].fillna(0)
    ARP_facilities[iyear].reset_index(inplace=True)
    temp = pd.to_numeric(ARP_facilities[iyear][' Heat Input (MMBtu)'], errors='coerce')
    temp = temp.fillna(0)
    ARP_facilities[iyear][' Heat Input (MMBtu)'] = temp


# Clean up and standardize the Unit Type labels
# Assign values to a temporary dataframe to avoid settingwithcopy warning

# For each year, check the unit types and report a clean string version in a new column
for iyear in np.arange(num_years):
    temp = pd.DataFrame.from_dict(ARP_facilities[iyear])

    for ifacility in np.arange(len(ARP_facilities[iyear])):
        if re.search('combustion turbine',ARP_facilities[iyear].loc[ifacility,' Unit Type'].lower()) != None:
            temp.loc[ifacility,'Unit_clean'] = 'combustion turbine'
        elif re.search('combined cycle',ARP_facilities[iyear].loc[ifacility,' Unit Type'].lower()) != None:
            temp.loc[ifacility,'Unit_clean'] = 'combined cycle'
        elif re.search('wet bottom',ARP_facilities[iyear].loc[ifacility,' Unit Type'].lower()) != None:
            temp.loc[ifacility,'Unit_clean'] = 'wet bottom'
        elif re.search('dry bottom',ARP_facilities[iyear].loc[ifacility,' Unit Type'].lower()) != None:
            temp.loc[ifacility,'Unit_clean'] = 'dry bottom'
        elif re.search('bubbling',ARP_facilities[iyear].loc[ifacility,' Unit Type'].lower()) != None:
            temp.loc[ifacility,'Unit_clean'] = 'bubbling'
        else:
            temp.loc[ifacility,'Unit_clean'] = ARP_facilities[iyear].loc[ifacility,' Unit Type'].lower()  

    # Copy back once per year, after all facilities have been processed
    ARP_facilities[iyear] = temp.copy()

#Clean up and standardize the fuel type labels
# Assign values to a temporary dataframe to avoid settingwithcopy warning

# For each year, check the primary fuel types and consolidate into Gas, Coal, Oil, and Wood fuel categories
for iyear in np.arange(num_years):
    
    temp = pd.DataFrame.from_dict(ARP_facilities[iyear])
    for ifacility in np.arange(len(ARP_facilities[iyear])):
        if ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Pipeline Natural Gas' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Natural Gas' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Other Gas' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Other Gas, Pipeline Natural Gas' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Natural Gas, Pipeline Natural Gas' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Process Gas':
            temp.loc[ifacility,'Fuel_clean'] = 'Gas'
        elif ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Petroleum Coke' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Coal' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Coal, Pipeline Natural Gas' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Coal, Natural Gas' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Coal, Wood' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Coal, Coal Refuse' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Coal Refuse' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Other Solid Fuel':
            temp.loc[ifacility,'Fuel_clean'] = 'Coal'
        elif ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Other Oil' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Diesel Oil' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Diesel Oil, Residual Oil' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Diesel Oil, Pipeline Natural Gas' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Residual Oil' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Residual Oil, Pipeline Natural Gas':
            temp.loc[ifacility,'Fuel_clean'] = 'Oil'
        elif ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Wood' \
         or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Other Solid Fuel, Wood':
            temp.loc[ifacility,'Fuel_clean'] = 'Wood'
        else:
            if ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] != '':
                print(ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'])  
                
    ARP_facilities[iyear] = temp.copy()  

del ARP_Raw
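
The row-by-row loops above can be slow for large facility tables. As a sketch of an alternative, the same labels could be assigned with vectorized pandas string methods. The keyword list follows the `if/elif` chain above; the fuel dictionary below is only a subset of the fuel types listed in the cell, and the function name and example rows are hypothetical.

```python
import pandas as pd

# Keywords in the same priority order as the if/elif chain
UNIT_KEYWORDS = ['combustion turbine', 'combined cycle', 'wet bottom',
                 'dry bottom', 'bubbling']

# Subset of the fuel groupings, for illustration only
FUEL_GROUPS = {
    'Pipeline Natural Gas': 'Gas', 'Natural Gas': 'Gas', 'Process Gas': 'Gas',
    'Coal': 'Coal', 'Coal Refuse': 'Coal', 'Petroleum Coke': 'Coal',
    'Diesel Oil': 'Oil', 'Residual Oil': 'Oil', 'Other Oil': 'Oil',
    'Wood': 'Wood',
}

def clean_labels(df):
    """Return a copy of df with 'Unit_clean' and 'Fuel_clean' columns added."""
    out = df.copy()
    unit_lower = out[' Unit Type'].str.lower()
    unit_clean = unit_lower.copy()
    # Apply keywords in reverse so that earlier keywords overwrite later ones,
    # which reproduces the priority of the if/elif chain.
    for keyword in reversed(UNIT_KEYWORDS):
        mask = unit_lower.str.contains(keyword, regex=False, na=False)
        unit_clean[mask] = keyword
    out['Unit_clean'] = unit_clean
    # Fuel types missing from the dictionary are left as NaN
    out['Fuel_clean'] = out[' Fuel Type (Primary)'].map(FUEL_GROUPS)
    return out

# Hypothetical example rows
example = pd.DataFrame({
    ' Unit Type': ['Combustion turbine (Started Jan 01, 2012)',
                   'Tangentially-fired', 'Dry bottom wall-fired boiler'],
    ' Fuel Type (Primary)': ['Pipeline Natural Gas', 'Coal', ''],
})
cleaned = clean_labels(example)
print(cleaned[['Unit_clean', 'Fuel_clean']])
```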

##### Step 2.2.2 Add the CH4 factor (kg gas/ TJ energy input) to the data dictionary, then calculate methane flux

In [None]:
# CH4 factor is based on the unit type and fuel used
# From 'Acid Rain Prog - Unit-level Fuel+Technology' file in InputData Folder

for iyear in np.arange(num_years):
    ARP_facilities[iyear].loc[:,'CH4_f'] = 0.0

    for ifacility in np.arange(len(ARP_facilities[iyear])):
        # Gas: combined cycle or turbine= 3.7, 
        # Gas: others (Assume stoker, tangentially-fired, dry bottom, & wet bottom are boilers) = 1
        if ARP_facilities[iyear].loc[ifacility,'Fuel_clean'] == 'Gas':
            if ARP_facilities[iyear].loc[ifacility,'Unit_clean'] == 'combined cycle' \
             or re.search('turbine',ARP_facilities[iyear].loc[ifacility,'Unit_clean'].lower()) != None:
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 3.7     
            else:
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 1
        # Coal: tangentially-fired, dry bottom = 0.7
        # Coal: wet bottom = 0.9
        # Coal: Cyclone boiler = 0.2
        # Coal: others (boilers, combined cycle) = 1
        elif ARP_facilities[iyear].loc[ifacility,'Fuel_clean'] == 'Coal':
            if ARP_facilities[iyear].loc[ifacility,'Unit_clean'] == 'tangentially-fired' \
             or ARP_facilities[iyear].loc[ifacility,'Unit_clean'] == 'dry bottom':
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 0.7
            elif ARP_facilities[iyear].loc[ifacility,'Unit_clean'] == 'wet bottom':   
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 0.9
            elif ARP_facilities[iyear].loc[ifacility,'Unit_clean'] == 'cyclone boiler':   
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 0.2
            else:
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 1   
        # Wood: assume all are recovery boilers = 1
        elif ARP_facilities[iyear].loc[ifacility,'Fuel_clean'] == 'Wood':
            ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 1   
        # Oil: 'Residual Oil' or 'Residual Oil, Pipeline Natural Gas' = 0.8
        # Oil: Others = 0.9
        elif ARP_facilities[iyear].loc[ifacility,'Fuel_clean'] == 'Oil':
            if ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Residual Oil' \
             or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Residual Oil, Pipeline Natural Gas':
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 0.8
            else:
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 0.9
        else:
            if ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] != '':
                print(ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'])
    

#Calculate the methane flux at each facility as the CH4 factor multiplied by the heat input.
#Units are not converted here (factor in kg/TJ, heat input in MMBtu).
for iyear in np.arange(num_years):
    # Calculate fluxes
    ARP_facilities[iyear]['CH4_flux'] = ARP_facilities[iyear]['CH4_f'] * ARP_facilities[iyear][' Heat Input (MMBtu)']
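
The factor assignment above can also be written as a small lookup function, which makes the rules easier to test in isolation. The factor values below are copied from the cell above. The function names and the `MMBTU_TO_TJ` conversion are additions for illustration: the notebook multiplies the factor by heat input in MMBtu directly, so the conversion would only matter if an absolute mass in kg were needed.

```python
MMBTU_TO_TJ = 1.055056e-3   # 1 MMBtu = 1.055056e9 J (not applied in the notebook)

def ch4_factor(fuel_clean, unit_clean, fuel_primary=''):
    """Return the CH4 factor (kg/TJ) for a cleaned fuel and unit type."""
    if fuel_clean == 'Gas':
        if unit_clean == 'combined cycle' or 'turbine' in unit_clean:
            return 3.7
        return 1.0
    if fuel_clean == 'Coal':
        if unit_clean in ('tangentially-fired', 'dry bottom'):
            return 0.7
        if unit_clean == 'wet bottom':
            return 0.9
        if unit_clean == 'cyclone boiler':
            return 0.2
        return 1.0
    if fuel_clean == 'Wood':
        return 1.0
    if fuel_clean == 'Oil':
        if fuel_primary in ('Residual Oil', 'Residual Oil, Pipeline Natural Gas'):
            return 0.8
        return 0.9
    return 0.0   # unrecognized fuel: no factor assigned

def ch4_kg(fuel_clean, unit_clean, heat_input_mmbtu, fuel_primary=''):
    """CH4 mass in kg, converting heat input from MMBtu to TJ first."""
    factor = ch4_factor(fuel_clean, unit_clean, fuel_primary)
    return factor * heat_input_mmbtu * MMBTU_TO_TJ

# Hypothetical facility: gas-fired combined cycle unit with 1000 MMBtu heat input
print(ch4_factor('Gas', 'combined cycle'), ch4_kg('Gas', 'combined cycle', 1000.0))
```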

##### Step 2.2.3 Allocate flux data (as a function of fuel type) to grid arrays and to the state level

In [None]:
arp_wood_array = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
arp_wood_array_nongrid = np.zeros([num_years,num_months])
arp_coal_array = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
arp_coal_array_nongrid = np.zeros([num_years,num_months])
arp_oil_array = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
arp_oil_array_nongrid = np.zeros([num_years,num_months])
arp_gas_array = np.zeros([len(Lat_01),len(Lon_01),num_years, num_months])
arp_gas_array_nongrid = np.zeros([num_years,num_months])


for iyear in np.arange(num_years):
    #count=0
    var = 'CH4_flux'
    for ifacility in np.arange(len(ARP_facilities[iyear])):
        imon = int(ARP_facilities[iyear].loc[ifacility,' Month'] - 1)
        istate = np.where(ARP_facilities[iyear].loc[ifacility,'State'] == State_ANSI['abbr'])[0][0]
        #Filter inside Continental US domain
        if ARP_facilities[iyear].loc[ifacility,' Facility Longitude'] > Lon_left \
         and ARP_facilities[iyear].loc[ifacility,' Facility Longitude'] < Lon_right \
         and ARP_facilities[iyear].loc[ifacility,' Facility Latitude'] > Lat_low \
         and ARP_facilities[iyear].loc[ifacility,' Facility Latitude'] < Lat_up:
            #Find the index values of each facility lat and lon within the Continental US grid 
            ilat = int((ARP_facilities[iyear].loc[ifacility,' Facility Latitude'] - Lat_low)/Res01)
            ilon = int((ARP_facilities[iyear].loc[ifacility,' Facility Longitude'] - Lon_left)/Res01)
            if ARP_facilities[iyear].loc[ifacility, 'Fuel_clean'] == 'Gas':
                arp_gas_array[ilat,ilon,iyear,imon] += ARP_facilities[iyear].loc[ifacility,var]
            elif ARP_facilities[iyear].loc[ifacility, 'Fuel_clean'] == 'Coal':
                arp_coal_array[ilat,ilon,iyear,imon] += ARP_facilities[iyear].loc[ifacility,var] 
            elif ARP_facilities[iyear].loc[ifacility, 'Fuel_clean'] == 'Oil':
                arp_oil_array[ilat,ilon,iyear,imon] += ARP_facilities[iyear].loc[ifacility,var] 
            elif ARP_facilities[iyear].loc[ifacility, 'Fuel_clean'] == 'Wood':
                arp_wood_array[ilat,ilon,iyear,imon] += ARP_facilities[iyear].loc[ifacility,var] 
        else:    
            if ARP_facilities[iyear].loc[ifacility, 'Fuel_clean'] == 'Gas':
                arp_gas_array_nongrid[iyear,imon] += ARP_facilities[iyear].loc[ifacility,var] 
            elif ARP_facilities[iyear].loc[ifacility, 'Fuel_clean'] == 'Coal':
                arp_coal_array_nongrid[iyear,imon] += ARP_facilities[iyear].loc[ifacility,var]
            elif ARP_facilities[iyear].loc[ifacility, 'Fuel_clean'] == 'Oil':
                arp_oil_array_nongrid[iyear,imon] += ARP_facilities[iyear].loc[ifacility,var] 
            elif ARP_facilities[iyear].loc[ifacility, 'Fuel_clean'] == 'Wood':
                arp_wood_array_nongrid[iyear,imon] += ARP_facilities[iyear].loc[ifacility,var] 
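
The facility loop above accumulates point values into grid cells one row at a time. A vectorized sketch of the same idea, for a single fuel, year, and month, is shown below using `np.add.at`, which correctly accumulates values when several facilities fall in the same cell. The function name and the small example domain are hypothetical.

```python
import numpy as np

def grid_point_values(lats, lons, values, lat_low, lat_up, lon_left, lon_right, res):
    """Sum point values into a regular grid; return (grid, total outside the domain)."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    values = np.asarray(values, dtype=float)
    n_lat = int(round((lat_up - lat_low) / res))
    n_lon = int(round((lon_right - lon_left) / res))
    grid = np.zeros((n_lat, n_lon))
    # Same strict inequalities as the domain filter in the loop above
    inside = (lons > lon_left) & (lons < lon_right) & (lats > lat_low) & (lats < lat_up)
    ilat = ((lats[inside] - lat_low) / res).astype(int)
    ilon = ((lons[inside] - lon_left) / res).astype(int)
    # Guard against floating-point rounding at the upper domain edge
    ilat = np.minimum(ilat, n_lat - 1)
    ilon = np.minimum(ilon, n_lon - 1)
    # np.add.at accumulates repeated indices (plain fancy-index += would not)
    np.add.at(grid, (ilat, ilon), values[inside])
    nongrid = values[~inside].sum()
    return grid, nongrid

# Hypothetical facilities on a 2 x 2 degree domain; the last one lies outside it
grid, nongrid = grid_point_values(
    lats=[20.05, 20.05, 21.95, 60.0],
    lons=[-129.95, -129.95, -128.05, -100.0],
    values=[1.0, 2.0, 3.0, 4.0],
    lat_low=20, lat_up=22, lon_left=-130, lon_right=-128, res=0.1,
)
print(grid.shape, grid.sum(), nongrid)
```

The gridded and non-gridded totals together should equal the sum of all input values, which is a useful check after gridding.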
            

#### Step 2.3 Read In and Format EIA SEDS (State Energy Data System) Energy Consumption Data (Commercial, Residential, Industrial Proxies)

##### Step 2.3.1 Read In EIA SEDS Data

In [None]:
# 1) Read state-level energy consumption data (Commercial)
#All SEDS data: https://www.eia.gov/state/seds/archive/

SEDS_com = dict()
for iyear in np.arange(0,num_years):
    #tb.convert_into("./InputData_Stationary/Data_stat/EIA_SEDS_Data/seds2012.pdf",'./test.csv', output_format = 'csv', stream=True, guess = False, pages = 11)
    SEDS_com[iyear] = pd.read_csv(EIA_SEDS_commconsump_inputfile+year_range_str[iyear]+'.csv',nrows=51)
    SEDS_com[iyear]['ASCI'] = 0
    for istate in np.arange(len(SEDS_com[iyear])):
        SEDS_com[iyear].loc[istate,'ASCI'] = name_dict[SEDS_com[iyear].loc[istate,'State'].strip()]
#SEDS_com[1].head(1)

# 2) Read state-level energy consumption data (Residential)
SEDS_res = dict()
for iyear in np.arange(num_years):
    SEDS_res[iyear] = pd.read_csv(EIA_SEDS_resconsump_inputfile+year_range_str[iyear]+'.csv',nrows=51)
    #SEDS_res[i] = pd.read_csv(f"./Data_stat/Residential/sum_btu_res_201{i+2}.csv",nrows=51)
    SEDS_res[iyear]['ASCI'] = 0
    for istate in np.arange(len(SEDS_res[iyear])):
        SEDS_res[iyear].loc[istate,'ASCI'] = name_dict[SEDS_res[iyear].loc[istate,'State'].strip()]
#SEDS_res[0].head(1)

# 3) Read state-level energy consumption data (Industrial)
SEDS_ind = dict()
for iyear in np.arange(0, num_years):
    SEDS_ind[iyear] = pd.read_csv(EIA_SEDS_indconsump_inputfile+year_range_str[iyear]+'.csv',nrows=51)
    SEDS_ind[iyear]['ASCI'] = 0
    for istate in np.arange(len(SEDS_ind[iyear])):
        SEDS_ind[iyear].loc[istate,'ASCI'] = name_dict[SEDS_ind[iyear].loc[istate,'State'].strip()]
        #SEDS_ind[iyear].loc[istate,'ASCI'] = int(SEDS_ind[iyear].loc[istate,'ASCI'])
#SEDS_ind[0].head(5)
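
The state-code lookup in the loops above can also be done in one step with `Series.map`, which additionally makes unmatched state names easy to report. This is a sketch under the assumption that `name_dict` maps state names to ANSI codes, as its use above suggests. The helper name, the in-memory CSV, and its values are hypothetical; the column name `ASCI` is kept as in the notebook.

```python
import io
import pandas as pd

def add_state_codes(df, name_dict):
    """Add an 'ASCI' column from stripped state names; also return unmatched names."""
    out = df.copy()
    out['ASCI'] = out['State'].str.strip().map(name_dict)
    unmatched = out.loc[out['ASCI'].isna(), 'State'].tolist()
    return out, unmatched

# Hypothetical stand-ins for name_dict and one SEDS file
example_name_dict = {'Alabama': 1, 'Alaska': 2, 'Arizona': 4}
csv_text = "State,Coal,Natural Gas\n Alabama ,1.0,2.0\nAlaska,3.0,4.0\n"
seds_example = pd.read_csv(io.StringIO(csv_text))

seds_coded, unmatched = add_state_codes(seds_example, example_name_dict)
print(seds_coded)
print("Unmatched state names:", unmatched)
```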

##### Step 2.3.2 Allocate BTUs to the state level (commercial)

In [None]:
#Calculate state level BTU levels for commercial SEDS data, by state and fuel type

sedscom_wood_state = np.zeros([len(State_ANSI), num_years])
sedscom_coal_state = np.zeros([len(State_ANSI), num_years])
sedscom_oil_state = np.zeros([len(State_ANSI), num_years])
sedscom_gas_state = np.zeros([len(State_ANSI), num_years])

for iyear in np.arange(num_years):
    #Sum energy consumption (BTU) by state and fuel type
    
    for istate in np.arange(len(SEDS_com[iyear])):
        state_str = SEDS_com[iyear].loc[istate,'State']
        state_str = state_str.strip()
        matchstate = np.where(state_str == State_ANSI['name'])[0][0]
        #if SEDS_com[iyear].loc[istate,'State'] not in {'Alaska', 'Hawaii'}:
            #SEDS_com[iyear].loc[istate,'ASCI'] = name_dict[SEDS_com[iyear].loc[istate,'State'].strip()]
        sedscom_coal_state[matchstate,iyear] += SEDS_com[iyear].loc[istate,'Coal'] #/Coal_sum[iyear] 
        sedscom_oil_state[matchstate,iyear] += SEDS_com[iyear].loc[istate,'Total Petroleum'] #/Fuel_sum[iyear]
        sedscom_gas_state[matchstate,iyear] += SEDS_com[iyear].loc[istate,'Natural Gas'] #/NGas_sum[iyear]
        sedscom_wood_state[matchstate,iyear] += SEDS_com[iyear].loc[istate,'Wood and Waste'] #/Wood_sum[iyear]

        

##### Step 2.3.3 Allocate BTUs to the state level (residential)

In [None]:
#Calculate state level BTU levels for residential SEDS data, by state and fuel type

sedsres_wood_state = np.zeros([len(State_ANSI), num_years])
sedsres_coal_state = np.zeros([len(State_ANSI), num_years])
sedsres_oil_state = np.zeros([len(State_ANSI), num_years])
sedsres_gas_state = np.zeros([len(State_ANSI), num_years])

for iyear in np.arange(num_years):
    #Sum energy consumption (BTU) by state and fuel type
    
    for istate in np.arange(len(SEDS_res[iyear])):
        state_str = SEDS_res[iyear].loc[istate,'State']
        state_str = state_str.strip()
        matchstate = np.where(state_str == State_ANSI['name'])[0][0]
        #if SEDS_com[iyear].loc[istate,'State'] not in {'Alaska', 'Hawaii'}:
            #SEDS_com[iyear].loc[istate,'ASCI'] = name_dict[SEDS_com[iyear].loc[istate,'State'].strip()]
        sedsres_coal_state[matchstate,iyear] += SEDS_res[iyear].loc[istate,'Coal']#,Coal_sum[iyear]) 
        sedsres_oil_state[matchstate,iyear] += SEDS_res[iyear].loc[istate,'Total Petroleum']# /Fuel_sum[iyear]
        sedsres_gas_state[matchstate,iyear] += SEDS_res[iyear].loc[istate,'Natural Gas']# /NGas_sum[iyear]
        sedsres_wood_state[matchstate,iyear] += SEDS_res[iyear].loc[istate,'Wood']# /Wood_sum[iyear]

        

##### Step 2.3.4 Allocate BTUs to the state level (industrial)

In [None]:
#Calculate state level BTU levels for industrial SEDS data, by state and fuel type

sedsind_wood_state = np.zeros([len(State_ANSI), num_years])
sedsind_coal_state = np.zeros([len(State_ANSI), num_years])
sedsind_oil_state = np.zeros([len(State_ANSI), num_years])
sedsind_gas_state = np.zeros([len(State_ANSI), num_years])


#print('**QA/QC: Check allocated state-level commercial emissions against GHGI')
#print('')
for iyear in np.arange(num_years):
    #Sum energy consumption (BTU) by state and fuel type
    
    for istate in np.arange(len(SEDS_ind[iyear])):
        state_str = SEDS_ind[iyear].loc[istate,'State']
        state_str = state_str.strip()
        matchstate = np.where(state_str == State_ANSI['name'])[0][0]
        #if SEDS_com[iyear].loc[istate,'State'] not in {'Alaska', 'Hawaii'}:
            #SEDS_com[iyear].loc[istate,'ASCI'] = name_dict[SEDS_com[iyear].loc[istate,'State'].strip()]
        sedsind_coal_state[matchstate,iyear] += SEDS_ind[iyear].loc[istate,'Coal'] #/Coal_sum[iyear] 
        sedsind_oil_state[matchstate,iyear] += SEDS_ind[iyear].loc[istate,'Total Petroleum'] #/Fuel_sum[iyear]
        sedsind_gas_state[matchstate,iyear] += SEDS_ind[iyear].loc[istate,'Natural Gas'] #/NGas_sum[iyear]
        sedsind_wood_state[matchstate,iyear] += SEDS_ind[iyear].loc[istate,'Wood and Waste'] #/Wood_sum[iyear]
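
Steps 2.3.2 through 2.3.4 repeat the same pattern for each sector and fuel. One way to reduce the duplication is a single helper that fills a state-by-year array for any SEDS table and fuel column, sketched below. The helper name and the example data are hypothetical; the `State` column and the per-year dictionary layout follow the cells above.

```python
import numpy as np
import pandas as pd

def seds_state_totals(seds_by_year, state_names, fuel_column):
    """Return an array [state, year] of summed consumption for one fuel column."""
    n_years = len(seds_by_year)
    totals = np.zeros((len(state_names), n_years))
    state_index = {name: i for i, name in enumerate(state_names)}
    for iyear in range(n_years):
        df = seds_by_year[iyear]
        for name, value in zip(df['State'].str.strip(), df[fuel_column]):
            totals[state_index[name], iyear] += value
    return totals

# Hypothetical two-year example with three states in the ANSI list
example_seds = {
    0: pd.DataFrame({'State': [' Alabama', 'Alaska'], 'Coal': [1.0, 2.0]}),
    1: pd.DataFrame({'State': ['Alabama', 'Alaska '], 'Coal': [3.0, 4.0]}),
}
example_states = ['Alabama', 'Alaska', 'Arizona']
coal_totals = seds_state_totals(example_seds, example_states, 'Coal')
print(coal_totals)
```

With the notebook's variables, a call such as `seds_state_totals(SEDS_ind, list(State_ANSI['name']), 'Coal')` would be the equivalent of the coal lines in the industrial cell, assuming `State_ANSI['name']` holds the state names in array order.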


#### Step 2.4. Read In GHGRP Subpart C and D Data (Industrial Proxy)

##### Step 2.4.1 Read in Subpart C and D data

In [None]:
# Read in Subpart D and C facility lists, find the subpart C facilities that were not in subpart D list
# and then merge with subpart D list to create a complete facility information array

#facility level data for Subpart C
GHGRP_all = pd.read_csv(GHGRP_subCfacility_inputfile) 
#filter for methane emissions only
GHGRP_all = GHGRP_all[GHGRP_all['C_SUBPART_LEVEL_INFORMATION.GHG_GAS_NAME'] == 'Methane']
GHGRP_all = GHGRP_all.drop(columns = ['C_SUBPART_LEVEL_INFORMATION.FACILITY_NAME'])
GHGRP_all.reset_index(inplace=True, drop=True)

#facility level data for subpart D
GHGRP_elec = pd.read_csv(GHGRP_subDfacility_inputfile) 
#filter for methane emissions only
GHGRP_elec = GHGRP_elec[GHGRP_elec['D_SUBPART_LEVEL_INFORMATION.GHG_NAME'] == 'Methane']
GHGRP_elec = GHGRP_elec.drop(columns = ['D_SUBPART_LEVEL_INFORMATION.FACILITY_NAME'])
GHGRP_elec.reset_index(inplace=True, drop=True)
GHGRP_elec_fac = np.array(GHGRP_elec['D_SUBPART_LEVEL_INFORMATION.FACILITY_ID'])
GHGRP_elec_fac = np.unique(GHGRP_elec_fac)

GHGRP_comb_noloc = dict()

#make a list of facilities that report to subpart C that are not in the subpart D facility list
for iyear in np.arange(0,num_years):
    temp = list()
    for ifacility in np.arange(len(GHGRP_all)):       
        if GHGRP_all.loc[ifacility,'C_SUBPART_LEVEL_INFORMATION.REPORTING_YEAR'] == year_range[iyear]:
            if GHGRP_all.loc[ifacility,'C_SUBPART_LEVEL_INFORMATION.FACILITY_ID'] not in GHGRP_elec_fac:
                temp.append(GHGRP_all.loc[ifacility])
    GHGRP_comb_noloc[iyear] = pd.DataFrame(temp)

GHGRP_comb_noloc[0].head(1)

# Read Facility Info file that contains lat and lon for GHGRP facilities
#extract the reporting facilities for the most recent year
GHGRP_facloc = pd.read_csv(GHGRP_subDfacility_loc_inputfile)
sort = GHGRP_facloc.sort_values(by=['V_GHG_EMITTER_FACILITIES.YEAR'])
filter1 = sort.drop_duplicates(subset = 'V_GHG_EMITTER_FACILITIES.FACILITY_ID' , keep = 'last')
filter2 = filter1.drop(columns=['V_GHG_EMITTER_FACILITIES.YEAR','V_GHG_EMITTER_FACILITIES.STATE'])
Fac_rename = filter2.rename(columns={'V_GHG_EMITTER_FACILITIES.FACILITY_ID': 'FACILITY_ID'})

#merge the missing subpart C facility list with the Subpart D facility list to get lat and lon values for all facilities
GHGRP_combfac = dict()
for iyear in np.arange(num_years):
    Comb_rename = GHGRP_comb_noloc[iyear].rename(columns={'C_SUBPART_LEVEL_INFORMATION.FACILITY_ID': 'FACILITY_ID'})
    temp = Comb_rename.merge(Fac_rename, on = 'FACILITY_ID')
    GHGRP_combfac[iyear] = temp

#Check that the merge did not add or drop facilities
for iyear in np.arange(num_years):
    if  GHGRP_combfac[iyear].shape[0] != GHGRP_comb_noloc[iyear].shape[0]:
        print('Dataframe size discrepancy')

display(GHGRP_combfac[0])#.head(1)
del Fac_rename, GHGRP_all, GHGRP_elec, GHGRP_elec_fac,GHGRP_facloc,filter1,filter2
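
The per-row facility loop and the location merge above can also be written with boolean masks. The following is a minimal sketch on toy data, not the notebook's own method: the short column names, the toy values, and the helper `subpart_c_only` are all hypothetical stand-ins for the `C_SUBPART_LEVEL_INFORMATION.*` and `V_GHG_EMITTER_FACILITIES.*` columns.

```python
import pandas as pd

# Toy stand-ins for the Subpart C records, the Subpart D facility IDs, and the
# facility location table (short column names are illustrative only).
sub_c = pd.DataFrame({
    'FACILITY_ID': [1, 2, 3, 1],
    'REPORTING_YEAR': [2012, 2012, 2012, 2013],
    'GHG_QUANTITY': [5.0, 7.0, 2.0, 6.0],
})
sub_d_ids = [2]
fac_loc = pd.DataFrame({
    'FACILITY_ID': [1, 2, 3],
    'LATITUDE': [40.0, 35.0, 30.0],
    'LONGITUDE': [-100.0, -90.0, -80.0],
})

def subpart_c_only(df, elec_ids, year):
    """Rows for one reporting year whose facility is not in the Subpart D list."""
    keep = (df['REPORTING_YEAR'] == year) & ~df['FACILITY_ID'].isin(elec_ids)
    return df.loc[keep].reset_index(drop=True)

comb_2012 = subpart_c_only(sub_c, sub_d_ids, 2012)
# validate='many_to_one' raises if the location table repeats a FACILITY_ID;
# how='left' keeps facilities without a location record (lat/lon become NaN).
comb_2012_loc = comb_2012.merge(fac_loc, on='FACILITY_ID', how='left',
                                validate='many_to_one')
```

With a left merge the row count is preserved by construction, so any facility that lacks coordinates shows up as NaN rather than silently disappearing.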

##### Step 2.4.2 Allocate facility level methane emissions to the CONUS grid (0.01x0.01)

In [None]:
# Make a 0.1x0.1 gridded map of GHGRP facility-level emissions
# also record the emissions that are not within the CONUS grid 

ghgrp_emi_array = np.zeros([area_map.shape[0],area_map.shape[1], num_years])
ghgrp_emi_array_nongrid = np.zeros([num_years])

for iyear in np.arange(num_years):
    n_plants = 0
    for ifacility in np.arange(len(GHGRP_combfac[iyear])):
        #Filter inside domain
        if GHGRP_combfac[iyear].loc[ifacility,'V_GHG_EMITTER_FACILITIES.LONGITUDE'] > Lon_left \
         and GHGRP_combfac[iyear].loc[ifacility,'V_GHG_EMITTER_FACILITIES.LONGITUDE'] < Lon_right \
         and GHGRP_combfac[iyear].loc[ifacility,'V_GHG_EMITTER_FACILITIES.LATITUDE'] > Lat_low \
         and GHGRP_combfac[iyear].loc[ifacility,'V_GHG_EMITTER_FACILITIES.LATITUDE'] < Lat_up:
            #find the corresponding plant ilon and ilat, record the emissions at that location
            ilat = int((GHGRP_combfac[iyear].loc[ifacility,'V_GHG_EMITTER_FACILITIES.LATITUDE'] - Lat_low)/Res_01)
            ilon = int((GHGRP_combfac[iyear].loc[ifacility,'V_GHG_EMITTER_FACILITIES.LONGITUDE'] - Lon_left)/Res_01)
            ghgrp_emi_array[ilat,ilon,iyear] += GHGRP_combfac[iyear].loc[ifacility,'C_SUBPART_LEVEL_INFORMATION.GHG_QUANTITY']
            n_plants += 1
        else:
            ghgrp_emi_array_nongrid[iyear] += GHGRP_combfac[iyear].loc[ifacility,'C_SUBPART_LEVEL_INFORMATION.GHG_QUANTITY']
    print (year_range_str[iyear]+' Facilities: ', n_plants)
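
The facility loop above bins point emissions into grid cells and tracks the off-grid remainder. A vectorized sketch of the same idea is shown below, under stated assumptions: `grid_points` and its arguments are hypothetical names, the grid origin and resolution are toy values, and points are treated as inside when their computed indices fall within the array (the notebook instead uses strict inequalities on the domain bounds).

```python
import numpy as np

def grid_points(lat, lon, values, lat_min, lon_min, res, shape):
    """Sum point values into grid cells; return (grid, total outside grid)."""
    lat, lon, values = map(np.asarray, (lat, lon, values))
    ilat = np.floor((lat - lat_min) / res).astype(int)
    ilon = np.floor((lon - lon_min) / res).astype(int)
    inside = (ilat >= 0) & (ilat < shape[0]) & (ilon >= 0) & (ilon < shape[1])
    grid = np.zeros(shape)
    # np.add.at accumulates correctly when several points share a cell
    np.add.at(grid, (ilat[inside], ilon[inside]), values[inside])
    return grid, values[~inside].sum()

grid, off_grid = grid_points(lat=[20.5, 20.5, 60.0],
                             lon=[-129.5, -129.5, -100.0],
                             values=[1.0, 2.0, 4.0],
                             lat_min=20.0, lon_min=-130.0, res=1.0,
                             shape=(30, 70))
```

Here the first two points share a cell and the third lies north of the toy domain, so it is counted in `off_grid`.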
            

#### Step 2.5 Read in county-level residential wood thruput (from NEI)

In [None]:
#load NEI county-level residential wood combustion throughput
names = pd.read_excel(NEI_resi_wood_inputfile, usecols = "A,D,F:G", header = 0)
colnames = names.columns.values
nei_rwc_thruput = pd.read_excel(NEI_resi_wood_inputfile, usecols = "A,D,F:G", \
                                converters = {"StateAndCountyFIPSCode": str},names = colnames)
nei_rwc_thruput.rename(columns={'StateAndCountyFIPSCode':'FIPS','SourceClassificationCode':'SCC'},inplace=True)
#remove wax logs - per recommendation from Rich Mason
#also select all data in units of TON
nei_rwc_thruput = nei_rwc_thruput.loc[nei_rwc_thruput['SCC'] != 2104009000]
nei_rwc_thruput = nei_rwc_thruput.loc[nei_rwc_thruput['ThroughputUnit'] == 'TON']
nei_rwc_thruput.reset_index(inplace=True, drop=True)
display(nei_rwc_thruput)

In [None]:
county_rwc_thruput = np.zeros([len(State_ANSI),len(County_ANSI),num_years])

for irow in np.arange(0, len(nei_rwc_thruput)):
    county_str = nei_rwc_thruput['FIPS'][irow][2:5].lstrip("0")
    state_str = int(nei_rwc_thruput['FIPS'][irow][0:2].lstrip("0"))
    matchstate = np.where(state_str == State_ANSI['ansi'])[0]
    matchcounty = np.where((County_ANSI['State']==int(state_str)) &\
                           (County_ANSI['County'] ==int(county_str)))[0]
    if len(matchcounty) != 1:
        #county code not found in County_ANSI: remap known cases to the code used there
        if state_str ==2 and county_str == '63': #if AK, just assign to nearest approx. region
            county_str = '20'
        elif state_str ==2 and county_str == '66': #if AK, just assign to nearest approx. region
            county_str = '261'
        elif state_str ==2 and county_str == '158': #if AK, just assign to nearest approx. region
            county_str = '60'
        elif state_str ==46 and county_str == '102': #Shannon County, SD was renamed in 2015 (new FIPS county code: 102, old: 113)
            county_str = '113'
        #repeat the county lookup with the remapped code (previously the remapped rows were never added)
        matchcounty = np.where((County_ANSI['State']==int(state_str)) &\
                               (County_ANSI['County'] ==int(county_str)))[0]
    #add the throughput only where a unique state/county match exists; other rows are skipped
    if len(matchstate) > 0 and len(matchcounty) == 1:
        county_rwc_thruput[matchstate[0],matchcounty[0],:] += nei_rwc_thruput['Throughput'][irow]
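
The FIPS handling above could be isolated in a small helper. This is only an illustrative sketch: `split_fips` and `COUNTY_REMAP` are hypothetical names, and the single remap entry simply mirrors the South Dakota case from the cell above.

```python
def split_fips(fips):
    """Split a state+county FIPS code into integer (state, county) codes."""
    fips = str(fips).zfill(5)  # restore a leading zero dropped by numeric parsing
    return int(fips[:2]), int(fips[2:5])

# Hypothetical remapping table: (state, reported county code) -> code in the lookup table
COUNTY_REMAP = {(46, 102): 113}

state, county = split_fips('46102')
county = COUNTY_REMAP.get((state, county), county)
```

Converting with `int` directly avoids the empty string that `lstrip("0")` would produce for a county code of `000`.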

#### Step 2.6 Read in gridded population density data and calculate state-level populations, regrid to 0.1x0.1 degrees

In [None]:
#Read population density map
pop_den_map = data_load_fn.load_pop_den_map(pop_map_inputfile)

-----------
## Step 3. Read in and Format US EPA GHGI Emissions
----------

#### Step 3.1. Read in the GHGI data (in kt)

In [None]:
# Read stationary combustion emissions (units = kt)
# For electricity, industrial, commercial, residential, and total sources.
# Total emissions include U.S. territories, while the sum of combustion sources does not. 

names = pd.read_csv(EPA_stat_inputfile,  skiprows = 2, header = 0, nrows = 1)
colnames = names.columns.values
EPA_emi_statcomb_CH4 = pd.read_csv(EPA_stat_inputfile, skiprows = 3, names = colnames, nrows = 26)
EPA_emi_statcomb_CH4 = EPA_emi_statcomb_CH4.fillna('')
EPA_emi_statcomb_CH4 = EPA_emi_statcomb_CH4.drop(columns = [str(n) for n in range(1990, start_year,1)])
EPA_emi_statcomb_CH4.reset_index(inplace=True, drop=True)
EPA_statcom_total = EPA_emi_statcomb_CH4[EPA_emi_statcomb_CH4['Sector/Fuel Type'] == 'Total']

##DEBUG## print('EPA GHGI Emissions (kt)')
##DEBUG## display(EPA_emi_statcomb_CH4)
##DEBUG## display(EPA_statcom_total)

display(EPA_emi_statcomb_CH4)

#### Step 3.2. Split Emissions into Gridding Groups (each Group will have the same proxy applied during the state allocation/gridding)

In [None]:
start_year_idx = EPA_emi_statcomb_CH4.columns.get_loc(str(start_year))
end_year_idx = EPA_emi_statcomb_CH4.columns.get_loc(str(end_year))+1
ghgi_stat_groups = ghgi_stat_map['GHGI_Emi_Group'].unique()
sum_emi = np.zeros([num_years])

DEBUG =1

for igroup in np.arange(0,len(ghgi_stat_groups)): #loop through all groups, finding the GHGI sources in that group and summing emissions for each year
    ##DEBUG## print(ghgi_stat_groups[igroup])
    vars()[ghgi_stat_groups[igroup]] = np.zeros([num_years])
    source_temp = ghgi_stat_map.loc[ghgi_stat_map['GHGI_Emi_Group'] == ghgi_stat_groups[igroup], 'GHGI_Source']
    pattern_temp  = '|'.join(source_temp) 
    ##DEBUG## display(pattern_temp)
    if 'elec' in ghgi_stat_groups[igroup]:
        isector = EPA_emi_statcomb_CH4.index[EPA_emi_statcomb_CH4['Sector/Fuel Type'].str.contains('Electric Power')][0]            
    elif 'ind' in ghgi_stat_groups[igroup]: 
        isector = EPA_emi_statcomb_CH4.index[EPA_emi_statcomb_CH4['Sector/Fuel Type'].str.contains('Industrial')][0]
    elif 'com' in ghgi_stat_groups[igroup]:    
        isector = EPA_emi_statcomb_CH4.index[EPA_emi_statcomb_CH4['Sector/Fuel Type'].str.contains('Commercial/Institutional')][0]
    elif 'res' in ghgi_stat_groups[igroup]:    
        isector = EPA_emi_statcomb_CH4.index[EPA_emi_statcomb_CH4['Sector/Fuel Type'].str.contains('Residential')][0]  
    elif 'not' in ghgi_stat_groups[igroup]:
        isector = EPA_emi_statcomb_CH4.index[EPA_emi_statcomb_CH4['Sector/Fuel Type'].str.contains('U.S. Territories')][0] 
    EPA_emi_stat_temp = EPA_emi_statcomb_CH4.loc[isector+1:isector+5,] 
    emi_temp = EPA_emi_stat_temp[EPA_emi_stat_temp['Sector/Fuel Type'].str.contains(pattern_temp)]
    ##DEBUG## display(emi_temp)
    vars()[ghgi_stat_groups[igroup]][:] = emi_temp.iloc[:,start_year_idx:end_year_idx].sum()
    ##DEBUG## display(vars()[ghgi_stat_groups[igroup]][:])
        
        
#Check against total summary emissions 
print('QA/QC #1: Check Processing Emission Sum against GHGI Summary Emissions')
for iyear in np.arange(0,num_years): 
    for igroup in np.arange(0,len(ghgi_stat_groups)):
        sum_emi[iyear] += vars()[ghgi_stat_groups[igroup]][iyear]
        
    summary_emi = EPA_statcom_total.iloc[0,iyear+1]  
    #Check 1 - make sure that the sums from all the regions equal the totals reported
    diff1 = abs(sum_emi[iyear] - summary_emi)/((sum_emi[iyear] + summary_emi)/2)
    if DEBUG ==1:
        print(summary_emi)
        print(sum_emi[iyear])
    if diff1 < 0.0001:
        print('Year ', year_range[iyear],': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear],': FAIL -- Difference = ', diff1*100,'%') 
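
The QA/QC checks in this notebook all compare a calculated total with the GHGI total using a symmetric relative difference. A minimal sketch of that check as a reusable function follows; `relative_difference`, `qa_message`, and the default tolerance are hypothetical and not part of the notebook's helper modules.

```python
def relative_difference(a, b):
    """Symmetric relative difference |a - b| / mean(a, b); 0 when the mean is 0."""
    mean = (a + b) / 2
    return 0.0 if mean == 0 else abs(a - b) / mean

def qa_message(year, calc, reference, tol=1e-4):
    """Build a PASS/FAIL line, reporting the difference as a percentage."""
    diff = relative_difference(calc, reference)
    status = 'PASS' if diff < tol else 'FAIL'
    return f'Year {year}: {status}, difference = {diff * 100:.4f}%'

print(qa_message(2012, 100.0, 100.0))
print(qa_message(2012, 99.0, 101.0))
```

Keeping the tolerance and the printed percentage in one place avoids a threshold and its message drifting apart.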

--------------
## Step 4. Grid Data
-------------

#### Step 4.1. Allocate emissions

##### Step 4.1.1 Assign the Appropriate Proxy Variable Names (state & grid)

In [None]:
# The names on the *left* need to match the 'Stationary_ProxyMapping' 'State_Proxy_Group' names 
# (these are initialized in Step 2). 
# The names on the *right* are the variable names used to calculate the proxies in this code.
# Names on the right need to match those from the code in Step 2

#national --> state proxies (state x year X month)
state_indu_coal = sedsind_coal_state
state_indu_wood = sedsind_wood_state
state_indu_oil= sedsind_oil_state
state_indu_gas= sedsind_gas_state
state_resi_coal = sedsres_coal_state
state_resi_wood = sedsres_wood_state
state_resi_oil = sedsres_oil_state
state_resi_gas = sedsres_gas_state
state_comm_coal = sedscom_coal_state
state_comm_wood= sedscom_wood_state
state_comm_oil= sedscom_oil_state
state_comm_gas= sedscom_gas_state

#state --> county proxies (state X county X year (X month))
county_resi_wood = county_rwc_thruput

#national --> grid proxies (0.1x0.1)
Map_elec_wood = arp_wood_array 
Map_elec_wood_nongrid = arp_wood_array_nongrid
Map_elec_coal = arp_coal_array
Map_elec_coal_nongrid = arp_coal_array_nongrid 
Map_elec_oil = arp_oil_array
Map_elec_oil_nongrid = arp_oil_array_nongrid
Map_elec_gas = arp_gas_array
Map_elec_gas_nongrid = arp_gas_array_nongrid

#state --> grid proxies (0.01x0.01)
Map_indu = ghgrp_emi_array
Map_indu_nongrid = ghgrp_emi_array_nongrid
Map_population = np.zeros([area_map.shape[0], area_map.shape[1], num_years])

for iyear in np.arange(0,num_years):
    Map_population[:,:,iyear] = pop_den_map*area_map
    
    

# remove variables to clear space for larger arrays 
#del sedsind_coal_state,sedsind_wood_state,sedsind_oil_state,sedsind_gas_state,sedsres_coal_state,sedsres_wood_state
#del sedsres_oil_state,sedsres_gas_state,sedscom_coal_state,sedscom_wood_state,sedscom_oil_state,sedscom_gas_state
#del arp_wood_array,arp_wood_array_nongrid,arp_coal_array,arp_coal_array_nongrid,arp_oil_array,arp_oil_array_nongrid
#del arp_gas_array,arp_gas_array_nongrid,ghgrp_emi_array,ghgrp_emi_array_nongrid,pop_den_map
#del county_rwc_thruput
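
The name matching described above depends on looking up variables by string through `vars()`. One alternative, sketched here purely as an illustration, keeps the proxies in an explicit dictionary so that a misspelled mapping-file name fails with a clear message. The `proxies` dictionary, `get_proxy`, and the toy array shapes are hypothetical.

```python
import numpy as np

num_states, num_years, num_months = 3, 2, 12
# Placeholder proxy arrays; in the notebook these would be the SEDS-based arrays.
proxies = {
    'state_indu_coal': np.ones((num_states, num_years, num_months)),
    'state_resi_wood': np.full((num_states, num_years, num_months), 2.0),
}

def get_proxy(name):
    """Look up a proxy by its mapping-file name, with a clear error if missing."""
    try:
        return proxies[name]
    except KeyError:
        raise KeyError(f"No proxy array registered for '{name}'") from None
```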

##### Step 4.1.2 Allocate National EPA Emissions to the State-Level

In [None]:
# Calculate state-level emissions for commercial, residential, and industrial sectors
# Emissions in kt
# State data = national GHGI emissions * state proxy/national total


# Note that national emissions are retained for groups that do not have state proxies (identified in the mapping file)
# and are gridded in the next step
DEBUG =1

# Make placeholder emission arrays for each group
for igroup in np.arange(0,len(proxy_stat_map)):
    vars()['State_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(State_ANSI),num_years,num_months])
    vars()['NonState_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([num_years])
        
#Loop over years
for iyear in np.arange(num_years):
    #Loop over states
    for istate in np.arange(len(State_ANSI)):
        for igroup in np.arange(0,len(proxy_stat_map)):    
            if proxy_stat_map.loc[igroup,'State_Proxy_Group'] != '-' and proxy_stat_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
                if proxy_stat_map.loc[igroup,'State_Month_Flag'] ==1:
                    for imonth in np.arange(0,num_months):
                        vars()['State_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear,imonth] = vars()[proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][iyear]* \
                            data_fn.safe_div(vars()[proxy_stat_map.loc[igroup,'State_Proxy_Group']][istate,iyear,imonth], np.sum(vars()[proxy_stat_map.loc[igroup,'State_Proxy_Group']][:,iyear,:]))   
                else:
                    for imonth in np.arange(0,num_months):
                        vars()['State_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear,imonth] = (1/12) * vars()[proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                            data_fn.safe_div(vars()[proxy_stat_map.loc[igroup,'State_Proxy_Group']][istate,iyear], np.sum(vars()[proxy_stat_map.loc[igroup,'State_Proxy_Group']][:,iyear]))
            else:
                vars()['NonState_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][iyear] = vars()[proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][iyear]
                
# Check sum of all gridded emissions + emissions not included in state allocation
print('QA/QC #1: Check weighted emissions against GHGI')   
for iyear in np.arange(0,num_years):
    summary_emi = EPA_statcom_total.iloc[0,iyear+1] 
    calc_emi = 0
    for igroup in np.arange(0,len(proxy_stat_map)):
        calc_emi +=  np.sum(vars()['State_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][:,iyear,:])+\
            vars()['NonState_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][iyear] #np.sum(Emissions[:,iyear]) + Emissions_nongrid[iyear] + Emissions_nonstate[iyear]
    if DEBUG ==1:
        print(summary_emi)
        print(calc_emi)
    diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if diff < 0.0002:
        print('Year ', year_range[iyear], ': PASS, difference < 0.02%')
    else:
        print('Year ', year_range[iyear], ': FAIL -- Difference = ', diff*100,'%')
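
The allocation rule stated in the cell above (state data = national GHGI emissions * state proxy / national total) can be written without explicit loops. The sketch below assumes a monthly proxy of shape `[state, year, month]`; `allocate_to_states` is a hypothetical name, and `np.divide` with `where` stands in for `data_fn.safe_div`, whose implementation is not shown here.

```python
import numpy as np

def allocate_to_states(national, proxy):
    """national: [year]; proxy: [state, year, month] -> emissions [state, year, month]."""
    totals = proxy.sum(axis=(0, 2))                      # national proxy total per year
    share = np.divide(proxy, totals[None, :, None],
                      out=np.zeros_like(proxy, dtype=float),
                      where=totals[None, :, None] != 0)  # zero share if no proxy data
    return national[None, :, None] * share

proxy = np.array([[[1.0, 1.0]], [[3.0, 3.0]]])   # 2 states, 1 year, 2 months
state_emi = allocate_to_states(np.array([80.0]), proxy)
```

Because the shares sum to one within each year, the state totals add back up to the national value, which is what the QA/QC check above verifies.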

In [None]:
del state_indu_coal,state_indu_wood ,state_indu_oil,state_indu_gas,state_resi_coal,state_resi_wood,state_resi_oil
del state_resi_gas,state_comm_coal,state_comm_wood,state_comm_oil,state_comm_gas, pop_den_map

##### Step 4.1.3 Allocate emissions to the county level

In [None]:
# Calculate county-level emissions (kt)
# Emissions in kt
# County data = state emissions * county proxy /state total

# If there are emissions in a state but no proxy data (e.g., wood thruput) available in the entire state, 
# emissions are allocated within that state by relative county areas 

DEBUG = 1

# Note that national emissions are retained for groups that do not have state or county proxies (identified in the mapping file)
# and are gridded in the next step

# Make placeholder emission arrays for each group
for igroup in np.arange(0,len(proxy_stat_map)):
    #if proxy_rice_map.loc[igroup,'State_Month_Flag'] ==1:
    vars()['County_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']] = \
            np.zeros([len(State_ANSI),len(County_ANSI),num_years,num_months])
    #else:
    #    vars()['State_'+proxy_rice_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(State_ANSI),num_years])
    vars()['NonCounty_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([num_years])
        
#Loop over years
for iyear in np.arange(0,num_years):
    running_sum = np.zeros([num_years])
    
    for igroup in np.arange(0,len(proxy_stat_map)): 
        #print(igroup)
        if proxy_stat_map.loc[igroup,'County_Proxy_Group'] != '-' and proxy_stat_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
            #print(proxy_stat_map.loc[igroup,'County_Proxy_Group'])
            
            for icounty in np.arange(0,len(County_ANSI)):
                istate = np.where(State_ANSI['ansi']==County_ANSI['State'][icounty])[0][0]
                state_ansi = State_ANSI['ansi'][istate]
                #print(icounty, istate)            
                emi_temp = vars()['State_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear,:]
                frac_temp = data_fn.safe_div(vars()[proxy_stat_map.loc[igroup,'County_Proxy_Group']][istate,icounty,iyear], \
                            np.sum(vars()[proxy_stat_map.loc[igroup,'County_Proxy_Group']][istate,:,iyear]))
                #print(np.sum(emi_temp))
                for imonth in np.arange(0,num_months):
                    if np.sum(emi_temp) > 0 and frac_temp > 0:
                        vars()['County_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][istate,icounty,iyear,imonth] = emi_temp[imonth] * frac_temp
                    elif np.sum(emi_temp) > 0 and np.sum(vars()[proxy_stat_map.loc[igroup,'County_Proxy_Group']][istate,:,iyear]) == 0:
                        frac_temp = data_fn.safe_div(County_ANSI.loc[icounty,'Area'],np.sum(County_ANSI['Area'][County_ANSI['State'] == state_ansi]))
                        vars()['County_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][istate,icounty,iyear,imonth] = emi_temp[imonth] * frac_temp  
        
        else: #add data not allocated to county
            if proxy_stat_map.loc[igroup,'State_Proxy_Group'] != '-' and proxy_stat_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
                running_sum[iyear] += (np.sum(vars()['State_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][:,iyear,:]))
            else:
                running_sum[iyear] += vars()[proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][iyear]

                
    vars()['NonCounty_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][iyear] = running_sum[iyear]

# Check sum of all gridded emissions + emissions not included in state allocation
print('QA/QC #1: Check weighted emissions against GHGI')   
for iyear in np.arange(0,num_years):
    summary_emi = EPA_statcom_total.iloc[0,iyear+1] 
    calc_emi = 0
    for igroup in np.arange(0,len(proxy_stat_map)):
        calc_emi +=  np.sum(vars()['County_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,:])
    # the non-county total was stored under the last group's name in the allocation loop above
    calc_emi += vars()['NonCounty_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][iyear]
    
    if DEBUG ==1:
        print(summary_emi)
        print(calc_emi)
        #print(running_sum)
    diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if diff < 0.0001:
        print('Year ', year_range[iyear], ': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear], ': FAIL -- Difference = ', diff*100,'%')
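
The county rule above (county data = state emissions * county proxy / state total, with county area as the fallback when a state has no proxy data) reduces to computing one share vector per state. A minimal sketch for a single state follows; `county_shares` is a hypothetical helper and it assumes the county areas sum to a positive value.

```python
import numpy as np

def county_shares(proxy, area):
    """Fraction of a state's emissions per county; uses area if the proxy is all zero."""
    proxy = np.asarray(proxy, dtype=float)
    area = np.asarray(area, dtype=float)
    weights = proxy if proxy.sum() > 0 else area
    return weights / weights.sum()

shares_proxy = county_shares([10.0, 30.0, 0.0], area=[1.0, 1.0, 2.0])
shares_area = county_shares([0.0, 0.0, 0.0], area=[1.0, 1.0, 2.0])
```

Multiplying either share vector by the state's monthly emissions gives county emissions that sum back to the state total.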

##### Step 4.1.4 Allocate emissions to the CONUS region (0.1x0.1)

In [None]:
# To speed up the code, this notebook does not loop through each county, but instead loops through
# each lat/lon value in the CONUS region. Emissions are allocated based on the fraction of 
# the proxy that is in each grid cell relative to the total in that county. 
# Since the code is not using county masks, the sum of each proxy for each county/state pair
# must first be calculated. 
# This chunk calculates the county totals for each county-level proxy group


#for each group that was allocated to county level,...
#For each grid box that falls within the continental US geographic bounds, keep a running sum of grid proxy to calculate 
# the total proxy value within each state and county. 
for igroup in np.arange(0,len(proxy_stat_map)):
    if proxy_stat_map.loc[igroup,'County_Proxy_Group'] != '-' and \
            proxy_stat_map.loc[igroup,'County_Proxy_Group'] != 'county_not_mapped':
        vars()[proxy_stat_map.loc[igroup,'Proxy_Group']+'_countysum'] = np.zeros([len(State_ANSI),len(County_ANSI),num_years])
        print(proxy_stat_map.loc[igroup,'Proxy_Group'])
        for ilat in np.arange(0, len(lat001)):
            print(ilat, 'of',len(lat001))
            for ilon in np.arange(0, len(lon001)):
                if state_ANSI_map[ilat,ilon] > 0: #only includes CONUS region
                    istate = np.where(State_ANSI['ansi']==state_ANSI_map[ilat,ilon])[0][0]
                    icounty = np.where((County_ANSI['State']==state_ANSI_map[ilat,ilon]) & \
                                    (County_ANSI['County']==county_ANSI_map[ilat,ilon]))[0][0]
                    #Area_sum[istate,icounty] += area_map[ilat,ilon]
                    #for iyear in np.arange(0, num_years):
                    vars()[proxy_stat_map.loc[igroup,'Proxy_Group']+'_countysum'][istate,icounty,:] += \
                        vars()[proxy_stat_map.loc[igroup,'Proxy_Group']][ilat,ilon,:]
            print(np.sum(vars()[proxy_stat_map.loc[igroup,'Proxy_Group']+'_countysum'][:,:,0]))
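
The running county sums above are built with a Python loop over every grid cell. If each state/county pair were first encoded as a single integer label per cell, the same totals could be obtained in one call. The sketch below assumes such a label map exists (with 0 meaning outside the domain); `sum_by_label` and the toy arrays are hypothetical.

```python
import numpy as np

def sum_by_label(label_map, values, n_labels):
    """Sum a 2D field over each integer label (label 0 = outside the domain)."""
    return np.bincount(label_map.ravel(), weights=values.ravel(),
                       minlength=n_labels)

labels = np.array([[0, 1, 1],
                   [2, 2, 1]])
field = np.array([[9.0, 1.0, 2.0],
                  [3.0, 4.0, 5.0]])
totals = sum_by_label(labels, field, n_labels=3)
```

For a proxy with a year dimension, the call would be repeated once per year slice.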
                

In [None]:
# Allocate State-Level emissions (kt) onto a 0.1x0.1 grid using gridcell level 'Proxy_Groups'

#Define emission arrays
#Emissions_array = np.zeros([area_map.shape[0],area_map.shape[1],num_years,num_months])
Emissions_array_01 = np.zeros([len(Lat_01),len(Lon_01),num_years, num_months])
Emissions_nongrid = np.zeros([num_years])
month_days = np.zeros([num_years, num_months])

DEBUG=1 
# For each year, (2a) distribute county-level emissions onto a grid using proxies defined above ....
# To speed up the code, masks are used rather than looping individually through each lat/lon (for state gridding only). 
# In this case, a mask of 1's is made for the grid cells that match the ANSI values for a given state
# The masked values are set to zero, remaining values = 1. 
# For each year, (2b), if emission groups have been previously allocated to the state-level, then allocate to grid
# AK and HI and territories are removed from the analysis at this stage. 
# The final emissions allocated to the grid are at 0.01x0.01 degree resolution, as required to calculate accurate 'mask'
# arrays for each state. 
# Emission arrays are re-gridded to 0.1x0.1 degrees as looping through monthly high-resolution
# grids was prohibitively slow
# (2c) For emission groups that were not first allocated to states, national emissions for those groups are gridded
# based on the relevant gridded proxy arrays (0.1x0.1 resolution). These emissions are at 0.1x0.1 degrees resolution. 
# (2d) Record 'not mapped' emission groups in the 'non-grid' array


print('**QA/QC Check: Sum of national gridded emissions vs. GHGI national emissions')
running_sum = np.zeros([len(proxy_stat_map),num_years])
#running_sum2 = np.zeros([len(proxy_stat_map),num_years])
for igroup in np.arange(len(proxy_stat_map)):
    vars()['Ext_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])


#first calculate the number of days in each month and year (to weight proxy)
for iyear in np.arange(0,num_years):
    if year_range[iyear]==2012 or year_range[iyear]==2016:
        #year_days = np.sum(month_day_leap)
        month_days[iyear,:] = month_day_leap
    else:
        #year_days = np.sum(month_day_nonleap)
        month_days[iyear,:] = month_day_nonleap 
    #running_count = 0
    
#1. Step through each gridding group
for igroup in np.arange(0,len(proxy_stat_map)):
    print(igroup, 'of',len(proxy_stat_map))
    # 1. weight proxy by the number of days in each month (depending on whether proxy has month res or not)
    proxy_temp = vars()[proxy_stat_map.loc[igroup,'Proxy_Group']].copy()
    proxy_temp_nongrid = vars()[proxy_stat_map.loc[igroup,'Proxy_Group']+'_nongrid'].copy()
    if proxy_stat_map.loc[igroup,'Grid_Month_Flag'] ==1:
        for iyear in np.arange(0,num_years):
            for imonth in np.arange(0, num_months):
                proxy_temp[:,:,iyear,imonth] *= month_days[iyear,imonth]
                proxy_temp_nongrid[iyear,imonth] *= month_days[iyear,imonth]
    else:
        for iyear in np.arange(0,num_years):
            proxy_temp[:,:,iyear] *= np.sum(month_days[iyear,:])
            proxy_temp_nongrid[iyear] *= np.sum(month_days[iyear,:])
        ##DEBUG## print("group " + str(igroup) +' of '+ str(len(proxy_stat_map)))
        
        #2a. first check if allocated to county level, step through each county...
    if proxy_stat_map.loc[igroup,'County_Proxy_Group'] != '-' and proxy_stat_map.loc[igroup,'County_Proxy_Group'] != 'county_not_mapped':
        
        ##****
        #proxy_temp = Map_animal_area_rank
        #proxy_temp_nongrid = Map_animal_area_rank_nongrid
        #calculated in script above
        proxy_temp = vars()[proxy_stat_map.loc[igroup,'Proxy_Group']].copy()
        proxy_temp_sum = vars()[proxy_stat_map.loc[igroup,'Proxy_Group']+'_countysum'].copy()
        emi_temp = np.zeros([len(lat001),len(lon001),num_years])
        #area_map_sum = Area_sum
        
        for ilat in np.arange(0,len(lat001)):
            print(ilat, 'of',len(lat001))
            for ilon in np.arange(0,len(lon001)):
                if state_ANSI_map[ilat,ilon] > 0:
                    istate = np.where(State_ANSI['ansi']==state_ANSI_map[ilat,ilon])[0][0]
                    icounty = np.where((County_ANSI['State']==state_ANSI_map[ilat,ilon]) & \
                                    (County_ANSI['County']==county_ANSI_map[ilat,ilon]))[0][0]
                    county_temp = vars()['County_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][istate,icounty,:,:]
                    if np.sum(county_temp) > 0:
                        for iyear in np.arange(0,num_years):
                            if np.sum(proxy_temp_sum[istate,icounty,iyear]) >0: # if there is proxy data in the county, allocate by that proxy in each grid cell relative to county sum
                                weighted_array = data_fn.safe_div(proxy_temp[ilat,ilon,iyear],\
                                                          proxy_temp_sum[istate,icounty,iyear]) #counts at grid cell/counts in county
                                emi_temp[ilat,ilon,iyear] += np.sum(county_temp[iyear,:])*weighted_array
                                running_sum[igroup,iyear] += np.sum(weighted_array*county_temp[iyear,:])
                                
                            elif proxy_temp_sum[istate,icounty,iyear] == 0: # if no proxy data in county, #FLAG## use relative area as proxy
                                print('check',istate,icounty)

            print(running_sum[igroup,0])
        for iyear in np.arange(0,num_years):
            emi_01 = data_fn.regrid001_to_01(emi_temp[:,:,iyear],Lat_01, Lon_01)
            for imonth in np.arange(0,num_months):
                Emissions_array_01[:,:,iyear,imonth]+= (1/12)*emi_01
            vars()['Ext_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += emi_01
            Emissions_nongrid[iyear] += np.sum(vars()['County_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear])- np.sum(emi_01) 
            
            print(Emissions_nongrid[iyear])
            print(igroup, np.sum(Emissions_array_01[:,:,iyear,:]))
                
        
    #2b.if instead allocated to state-level, Step through each state (if group was previously allocated to state level)
    elif proxy_stat_map.loc[igroup,'State_Proxy_Group'] != '-' and proxy_stat_map.loc[igroup,'State_Proxy_Group'] != 'state_not_mapped':
        for istate in np.arange(0,len(State_ANSI)):
            mask_state = np.ma.ones(np.shape(state_ANSI_map))
            mask_state = np.ma.masked_where(state_ANSI_map != State_ANSI['ansi'][istate], mask_state)
            mask_state = np.ma.filled(mask_state,0)   
            ##DEBUG## print("state " + str(istate) +' of '+ str(len(State_ANSI)))
            state_temp = vars()['State_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][istate,:,:]
            for iyear in np.arange(0,num_years):
                if State_ANSI['abbr'][istate] not in {'AK','HI'} and istate < 51 : 
                    if proxy_stat_map.loc[igroup, 'Grid_Month_Flag'] == 1:
                        for imonth in np.arange(0,num_months):
                            if np.sum(mask_state*proxy_temp[:,:,iyear,imonth]) > 0:
                                # if state is on grid and proxy for that state is non-zero
                                weighted_array = data_fn.safe_div(mask_state*proxy_temp[:,:,iyear,imonth], np.sum(mask_state*proxy_temp[:,:,iyear,imonth]))
                                weighted_array_01 = data_fn.regrid001_to_01(weighted_array, Lat_01, Lon_01)
                                emi_temp = state_temp[iyear,imonth]*weighted_array_01
                                Emissions_array_01[:,:,iyear,imonth] += emi_temp
                                vars()['Ext_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += emi_temp
                            else:
                                #for imonth in np.arange(0,num_months):
                                Emissions_nongrid[iyear] += state_temp[iyear,imonth]
                    
                    else:
                        if np.sum(mask_state*proxy_temp[:,:,iyear]) > 0 and State_ANSI['abbr'][istate] not in {'AK','HI'} and istate < 51: 
                            weighted_array = data_fn.safe_div(mask_state*proxy_temp[:,:,iyear], np.sum(mask_state*proxy_temp[:,:,iyear]))
                            weighted_array_01 = data_fn.regrid001_to_01(weighted_array, Lat_01, Lon_01)
                            for imonth in np.arange(0,num_months):
                                emi_temp = state_temp[iyear,imonth]*weighted_array_01
                                Emissions_array_01[:,:,iyear,imonth] += emi_temp
                                vars()['Ext_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += emi_temp
                        else: 
                            #for imonth in np.arange(0,num_months):
                            Emissions_nongrid[iyear] += np.sum(state_temp[iyear,:])
                else: 
                        #for imonth in np.arange(0,num_months):
                        Emissions_nongrid[iyear] += np.sum(state_temp[iyear,:])
                ##DEBUG## running_count += np.sum(vars()['State_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear,:])
                
                ##DEBUG## print(running_count)
                ##DEBUG## print(np.sum(Emissions_array_01[:,:,iyear,:]) +np.sum(Emissions_nongrid[iyear,:]))
        print(igroup, np.sum(Emissions_array_01[:,:,iyear,:]))
        print(Emissions_nongrid[iyear])
                
    #2c. if instead emissions are not allocated to state or county, allocate national total to grid here
    elif proxy_stat_map.loc[igroup,'State_Proxy_Group'] == '-':
        nat_temp = vars()[proxy_stat_map.loc[igroup,'GHGI_Emi_Group']]
        for iyear in np.arange(0,num_years):
            if proxy_stat_map.loc[igroup, 'Grid_Month_Flag'] == 1: 
                temp_sum = np.sum(vars()[proxy_stat_map.loc[igroup,'Proxy_Group']][:,:,iyear,:])+np.sum(vars()[proxy_stat_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear,:])
                for imonth in np.arange(0, num_months):
                    emi_temp = nat_temp[iyear] * data_fn.safe_div(vars()[proxy_stat_map.loc[igroup,'Proxy_Group']][:,:,iyear,imonth], temp_sum)
                    Emissions_array_01[:,:,iyear,imonth] += emi_temp
                    vars()['Ext_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += emi_temp
                    Emissions_nongrid[iyear] += nat_temp[iyear] * \
                        data_fn.safe_div(vars()[proxy_stat_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear,imonth], temp_sum)
            else:
                temp_sum = np.sum(vars()[proxy_stat_map.loc[igroup,'Proxy_Group']][:,:,iyear])+np.sum(vars()[proxy_stat_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear])
                for imonth in np.arange(0,num_months):
                    emi_temp = (1/12) * nat_temp[iyear] * data_fn.safe_div(vars()[proxy_stat_map.loc[igroup,'Proxy_Group']][:,:,iyear], temp_sum)
                    Emissions_array_01[:,:,iyear,imonth] += emi_temp
                    vars()['Ext_'+proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += emi_temp
                    Emissions_nongrid[iyear] += (1/12) * nat_temp[iyear] *\
                        data_fn.safe_div(vars()[proxy_stat_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear], temp_sum)
            ##DEBUG## running_count += vars()[proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][iyear]
        print(igroup, np.sum(Emissions_array_01[:,:,iyear,:]))
        print(Emissions_nongrid[iyear])
                
    #2d. this is the case that GHGI emissions are not mapped (e.g., specified outside of CONUS in the GHGI)
    elif proxy_stat_map.loc[igroup,'Proxy_Group'] == 'Map_not_mapped':    
        for iyear in np.arange(0,num_years):
            #for imonth in np.arange(0,num_months):
            Emissions_nongrid[iyear] += vars()[proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][iyear]
            ##DEBUG## running_count += vars()[proxy_stat_map.loc[igroup,'GHGI_Emi_Group']][iyear]
        ##DEBUG## print(running_count)
        ##DEBUG## print(np.sum(Emissions_array_01[:,:,iyear,:]) +np.sum(Emissions_nongrid[iyear,:]))
        print(igroup, np.sum(Emissions_array_01[:,:,iyear,:]))
        print(Emissions_nongrid[iyear])
            

for iyear in np.arange(0,num_years):    
    calc_emi = np.sum(Emissions_array_01[:,:,iyear,:]) + np.sum(Emissions_nongrid[iyear]) 
    summary_emi = EPA_statcom_total.iloc[0,iyear+1] 
    emi_diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if DEBUG ==1:
        print(calc_emi)
        print(summary_emi)
    if abs(emi_diff) < 0.0001:
        print('Year '+ year_range_str[iyear]+': Difference < 0.01%: PASS')
    else: 
        print('Year '+ year_range_str[iyear]+': Difference > 0.01%: FAIL, diff: '+str(emi_diff))
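
The allocation and closure check in the cell above can be illustrated on a toy case. This is a minimal sketch, not the notebook's implementation: `safe_div` here is a stand-in that assumes `data_fn.safe_div` returns zero for a zero denominator, and the cell keys, proxy values, and `national_kt` are made up for illustration. Plain dictionaries replace the notebook's NumPy arrays so that the arithmetic is easy to follow.

```python
def safe_div(numerator, denominator):
    """Stand-in for data_fn.safe_div (assumed behavior): 0.0 when dividing by zero."""
    return numerator / denominator if denominator != 0 else 0.0


def allocate_national_total(national_total, gridded_proxy, nongrid_proxy):
    """Split a national total between grid cells and an off-grid remainder.

    Each share is proportional to its proxy value relative to the sum of the
    gridded and non-gridded proxy values.
    """
    proxy_sum = sum(gridded_proxy.values()) + nongrid_proxy
    gridded = {cell: national_total * safe_div(value, proxy_sum)
               for cell, value in gridded_proxy.items()}
    nongrid = national_total * safe_div(nongrid_proxy, proxy_sum)
    return gridded, nongrid


def relative_difference(reference, calculated):
    """Symmetric relative difference: |a - b| divided by the mean of a and b.

    Undefined when both inputs are zero; that case is not handled here.
    """
    return abs(reference - calculated) / ((reference + calculated) / 2)


# Hypothetical inputs: one national total (kt) and proxy values for three cells
national_kt = 12.5
proxy = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 0.0}

gridded_kt, nongrid_kt = allocate_national_total(national_kt, proxy, nongrid_proxy=1.0)

# Closure check: gridded plus non-gridded emissions should recover the national total
recovered_kt = sum(gridded_kt.values()) + nongrid_kt
emi_diff = relative_difference(national_kt, recovered_kt)
print('PASS' if emi_diff < 0.0001 else 'FAIL', emi_diff)
```

Because the same `proxy_sum` normalizes the gridded and non-gridded shares, the two parts sum back to the national total, which is what the 0.01% threshold tests.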

#### Step 4.1.4 Save gridded emissions (kt)

In [None]:
#save gridded emissions for each gridding group - for extension

#Initialize file
data_IO_fn.initialize_netCDF(grid_emi_outputfile, netCDF_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)

unique_groups = np.unique(proxy_stat_map['GHGI_Emi_Group'])
unique_groups = unique_groups[unique_groups != 'Emi_not_mapped']

nc_out = Dataset(grid_emi_outputfile, 'r+', format='NETCDF4')
#nc_out.createDimension('state', len(State_ANSI))

for igroup in np.arange(0,len(unique_groups)):
    print('Ext_'+unique_groups[igroup])
    if len(np.shape(vars()['Ext_'+unique_groups[igroup]])) ==4:
        ghgi_temp = np.sum(vars()['Ext_'+unique_groups[igroup]],axis=3) #sum month data if data is monthly
    else:
        ghgi_temp = vars()['Ext_'+unique_groups[igroup]]

    # Write data to netCDF
    data_out = nc_out.createVariable('Ext_'+unique_groups[igroup], 'f8', ('lat', 'lon','year'), zlib=True)
    data_out[:,:,:] = ghgi_temp[:,:,:]

#save nongrid data to calculate non-grid fraction extension
data_out = nc_out.createVariable('Emissions_nongrid', 'f8', ('year'), zlib=True)  
data_out[:] = Emissions_nongrid[:]
nc_out.close()

#Confirm file location
print('** SUCCESS **')
print("Gridded emissions (kt) written to file: {}{}".format(os.getcwd(), grid_emi_outputfile))
print(' ')

del data_out, ghgi_temp, nc_out
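
Each variable written above has dimensions `('lat', 'lon', 'year')`, so any array that still carries a month axis has to be summed over that axis first. The sketch below shows that step in isolation. `to_annual` is a hypothetical helper, not a function from the notebook, and the array shapes are arbitrary.

```python
import numpy as np


def to_annual(group_array):
    """Return a (lat, lon, year) array, summing the month axis if one is present."""
    group_array = np.asarray(group_array)
    if group_array.ndim == 4:
        # (lat, lon, year, month) -> (lat, lon, year)
        return group_array.sum(axis=3)
    if group_array.ndim == 3:
        return group_array
    raise ValueError('expected a 3-D or 4-D array, got {} dimensions'.format(group_array.ndim))


# Hypothetical monthly array: 2 x 3 cells, 4 years, 12 months, 1 kt per month
monthly = np.ones((2, 3, 4, 12))
annual = to_annual(monthly)
print(annual.shape, annual[0, 0, 0])
```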

#### Step 4.2 Calculate Gridded Emission Fluxes (molec./cm2/s) (0.1x0.1)

In [None]:
#Convert emissions to emission flux
# conversion: kt emissions to molec/cm2/s flux

DEBUG=1

Flux_array_01 = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Flux_array_01_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])
print('**QA/QC Check: Sum of national gridded emissions vs. GHGI national emissions')
  
for iyear in np.arange(0,num_years):
    calc_emi = 0
    if year_range[iyear]==2012 or year_range[iyear]==2016:
        year_days = np.sum(month_day_leap)
        month_days = month_day_leap
    else:
        year_days = np.sum(month_day_nonleap)
        month_days = month_day_nonleap
        
    for imonth in np.arange(0,num_months):
        conversion_factor_01 = 10**9 * Avogadro / float(Molarch4 *month_days[imonth] * 24 * 60 *60) / area_matrix_01
        conv_factor2 = month_days[imonth]/year_days
        Flux_array_01[:,:,iyear,imonth] = Emissions_array_01[:,:,iyear,imonth]*conversion_factor_01
        Flux_array_01_annual[:,:,iyear] += Flux_array_01[:,:,iyear,imonth]*conv_factor2
        calc_emi += np.sum(Flux_array_01[:,:,iyear,imonth]/conversion_factor_01)
    #annual conversion factor (not used in the mass check, which relies on the monthly factors above)
    conversion_factor_annual = 10**9 * Avogadro / float(Molarch4 *year_days * 24 * 60 *60) / area_matrix_01
    calc_emi += np.sum(Emissions_nongrid[iyear])
    summary_emi = EPA_statcom_total.iloc[0,iyear+1] 
    emi_diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if DEBUG==1:
        print(calc_emi)
        print(summary_emi)
    if abs(emi_diff) < 0.0001:
        print('Year '+ year_range_str[iyear]+': Difference < 0.01%: PASS')
    else: 
        print('Year '+ year_range_str[iyear]+': Difference > 0.01%: FAIL, diff: '+str(emi_diff))
        
Flux_Emissions_Total_annual = Flux_array_01_annual
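
The unit conversion above can be checked by hand for a single grid cell. The sketch below assumes values for the constants (the notebook defines `Avogadro` and `Molarch4` elsewhere, and their exact values are not shown here), uses a made-up cell area and made-up monthly emissions, and takes month lengths from the standard `calendar` module instead of the notebook's `month_day_leap` and `month_day_nonleap` arrays.

```python
import calendar

AVOGADRO = 6.02214076e23     # molecules per mole
MOLAR_MASS_CH4 = 16.04       # g/mol; assumed value, the notebook uses its own Molarch4
SECONDS_PER_DAY = 24 * 60 * 60


def kt_to_flux(emissions_kt, n_days, area_cm2):
    """Convert kt emitted over n_days from an area in cm2 into molec./cm2/s."""
    grams = emissions_kt * 1e9                      # 1 kt = 1e9 g
    molecules = grams / MOLAR_MASS_CH4 * AVOGADRO
    return molecules / (n_days * SECONDS_PER_DAY) / area_cm2


year = 2016
# calendar.monthrange returns (weekday of first day, number of days in month)
month_days = [calendar.monthrange(year, month)[1] for month in range(1, 13)]
year_days = sum(month_days)

area_cm2 = 1.0e12                                   # hypothetical cell area (100 km2)
monthly_kt = [0.002 + 0.0001 * m for m in range(12)]  # hypothetical monthly emissions

# Monthly fluxes, then the day-weighted annual mean flux
monthly_flux = [kt_to_flux(e, d, area_cm2) for e, d in zip(monthly_kt, month_days)]
annual_flux = sum(f * d / year_days for f, d in zip(monthly_flux, month_days))

# The same annual flux computed directly from the annual total
annual_flux_direct = kt_to_flux(sum(monthly_kt), year_days, area_cm2)
print(annual_flux, annual_flux_direct)
```

Weighting each monthly flux by `month_days / year_days` cancels the month length in the monthly conversion factor, so the weighted mean matches the flux computed from the annual total, up to floating-point rounding.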

-------------
## Step 5. Write netCDF
------------

In [None]:
# monthly data
#Initialize file
data_IO_fn.initialize_netCDF(gridded_month_outputfile, netCDF_description_m, 1, year_range, loc_dimensions, Lat_01, Lon_01)

# Write data to netCDF
nc_out = Dataset(gridded_month_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:,:] = Flux_array_01
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded stationary combustion fluxes written to file: {}{}".format(os.getcwd(), gridded_month_outputfile))

# yearly data
#Initialize file
data_IO_fn.initialize_netCDF(gridded_outputfile, netCDF_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)

# Write data to netCDF
nc_out = Dataset(gridded_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Total_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded stationary combustion fluxes written to file: {}{}".format(os.getcwd(), gridded_outputfile))

----------
## Step 6. Plot Gridded Data
---------

#### Step 6.1 Plot Annual Emission Fluxes

In [None]:
#Plot Annual Data
scale_max = 10
save_flag = 0
save_outfile = ''
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Total_annual, Lat_01, Lon_01, year_range, title_str, scale_max,save_flag,save_outfile)

#### Step 6.2 Plot Difference between first and last inventory year

In [None]:
# Plot difference between last and first year
save_flag = 0
save_outfile = ''
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Total_annual, Lat_01, Lon_01, year_range, title_diff_str,save_flag,save_outfile)

In [None]:
ct = datetime.datetime.now() 
ft = ct.timestamp() 
time_elapsed = (ft-it)/(60*60)
print('Time to run: '+str(time_elapsed)+' hours')
print('** GEPA_1A_Stationary_Combustion: COMPLETE **')