# Gridded EPA Methane Inventory
## Category: Post Meter Emissions

***
#### Authors: 
Erin E. McDuffie
#### Date Last Updated: 
see Step 0
#### Notebook Purpose
This notebook calculates gridded (0.1°x0.1°) annual emission fluxes of methane (molecules CH4/cm2/s) from post meter activities in the CONUS region for the years 2012-2018.
Emissions are calculated from the 2022 version of the GHGI (which extends to 2020).
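As a sketch of the unit conversion behind these fluxes: the mass in a grid cell is converted to molecules and divided by the cell area (cm2) and the number of seconds in that year, so leap years have slightly lower fluxes. This assumes the gridded emissions are in kilotonnes CH4 per year (the input unit is an assumption here). The constants match those defined in Step 0, and the function name is hypothetical.

```python
import calendar

AVOGADRO = 6.02214129e23  # molecules/mol (same value as Step 0)
MOLAR_CH4 = 16.04         # g/mol

def annual_flux(emi_kt, area_cm2, year):
    """Convert annual CH4 emissions in a grid cell (assumed kt/yr) to molecules CH4/cm2/s."""
    days = 366 if calendar.isleap(year) else 365
    seconds = days * 24 * 3600
    grams = emi_kt * 1e9                      # 1 kt = 1e9 g
    molecules = grams / MOLAR_CH4 * AVOGADRO
    return molecules / (area_cm2 * seconds)

# Usage: 1 kt/yr spread over a hypothetical 1e12 cm2 cell in a leap year and a non-leap year
flux_2012 = annual_flux(1.0, 1e12, 2012)
flux_2013 = annual_flux(1.0, 1e12, 2013)
```

The function also works elementwise on NumPy arrays of emissions and cell areas.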
#### Summary & Notes 
The national EPA GHGI emissions data are read in from the GHGI Post Meter workbook. Residential and commercial emissions use the same proxy datasets as the residential and commercial customer meter emissions in the GEPA distribution segment: data are first allocated to each state using EIA customer counts and then to the grid using gridded population. Industrial and EGU (electric generating unit) emissions are allocated using the same proxies as the industrial and EGU stationary combustion sources. Industrial emissions are allocated to each state using EIA SEDS data and then to the grid using GHGRP facility-level information. EGU emissions are allocated directly to the grid using EPA facility-level Acid Rain Program (ARP) data. CNG vehicle emissions are allocated to each state using CNG vehicle counts from MOVES and then to the grid using population. Total emissions are converted to annual emission fluxes (molec./cm2/s) and written to final netCDFs in the '/code/Final_Gridded_Data/Supplement' folder. 
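The two-step allocation described above (national total to states by a state-level proxy, then each state's share to its grid cells by a gridded proxy) can be sketched as follows. This is an illustrative sketch, not the notebook's implementation: the function and argument names are hypothetical, and in this version the emissions of a state with no gridded proxy weight are simply dropped.

```python
import numpy as np

def allocate_state_then_grid(national_total, state_ids, state_proxy, grid_state_ids, grid_proxy):
    """Split a national total among states by state_proxy, then among each state's
    grid cells by grid_proxy. States with no proxy weight on the grid are dropped."""
    state_emi = national_total * state_proxy / state_proxy.sum()
    gridded = np.zeros(grid_proxy.shape)
    for sid, emi in zip(state_ids, state_emi):
        in_state = grid_state_ids == sid
        weight = grid_proxy[in_state].sum()
        if weight > 0:
            gridded[in_state] = emi * grid_proxy[in_state] / weight
    return gridded

# Usage with toy data: two states (IDs 1 and 2) with customer counts 3:1 on a 2x2 grid
state_ids = np.array([1, 2])
customer_counts = np.array([3.0, 1.0])
grid_state_ids = np.array([[1, 1], [2, 2]])
population = np.array([[1.0, 3.0], [2.0, 2.0]])
result = allocate_state_then_grid(100.0, state_ids, customer_counts, grid_state_ids, population)
```

State 1 receives 75 of the 100 units and splits them 1:3 by population; state 2 receives 25 and splits them evenly.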
***

--------------
## Step 0. Set-Up Notebook Modules, Functions, and Local Parameters and Constants
_____

In [None]:
#Confirm working directory
import os
import time
modtime = os.path.getmtime('./Ext_PostMeter.ipynb')
modificationTime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(modtime))
print("This file was last modified on: ", modificationTime)
print('')
print("The directory we are working in is {}" .format(os.getcwd()))

In [None]:
# Include plots within notebook
%matplotlib inline

In [None]:
# Import base modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import re
import datetime
from copy import copy

# Import additional modules
# Load plotting package Basemap 
# Must also specify project library path (unique to each user)
#os.environ["PROJ_LIB"] = "C:/Users//Anaconda3/Library/share/proj"
from mpl_toolkits.basemap import Basemap

# Load netCDF (for manipulating netCDF file types)
from netCDF4 import Dataset

# Set up ticker
import matplotlib.ticker as ticker

#add path for the global function module (file)
import sys
module_path = os.path.abspath(os.path.join('../../Global_Functions/'))
if module_path not in sys.path:
    sys.path.append(module_path)

# Load user-defined global functions (modules)
import data_load_functions as data_load_fn
import data_functions as data_fn
import data_IO_functions as data_IO_fn
import data_plot_functions as data_plot_fn

In [None]:
#INPUT Files
# Assign global file names
global_filenames = data_load_fn.load_global_file_names()
State_ANSI_inputfile = global_filenames[0]
#County_ANSI_inputfile = global_filenames[1]
pop_map_inputfile = global_filenames[2]
Grid_area01_inputfile = global_filenames[3]
Grid_area001_inputfile = global_filenames[4]
Grid_state001_ansi_inputfile = global_filenames[5]
#Grid_county001_ansi_inputfile = global_filenames[6]

# Specify names of inputs files used in this notebook
EPA_post_meter_inputfile = "../../Global_InputData/GHGI/Ch3_Energy/Copy of NG_Post-Meter_1990-2020.xlsx"

#Activity Data
EPA_ARP_inputfile = '../../GEPA_Combustion_Stationary/InputData/ARP_Data/EPA_ARP_FacilityEmissions.csv'
EIA_SEDS_indconsump_inputfile = "../../GEPA_Combustion_Stationary/InputData/EIA_SEDS/Industrial/sum_btu_ind_"

#GHGRP Data (reporting format changed in 2015)
GHGRP_subCfacility_inputfile = "../../GEPA_Combustion_Stationary/InputData/GHGRP/GHGRP_SubpartCEmissions.csv" #subpart C facility IDs and emissions (locations not available)
GHGRP_subDfacility_inputfile = "../../GEPA_Combustion_Stationary/InputData/GHGRP/GHGRP_SubpartDEmissions.csv" #subpart D facility IDs and emissions 
GHGRP_subDfacility_loc_inputfile = "../../GEPA_Combustion_Stationary/InputData/GHGRP/GHGRP_FacilityInfo.csv" #subpart D facility info (for all years, with ID & lat and lons)

#EIA Data
EIA_Residential_CC_inputfile = '../../GEPA_Gas_Distribution/InputData/EIA_CustomerCounts/NG_CONS_NUM_A_EPG0_VN3_COUNT_A.xls'
EIA_Commercial_CC_inputfile = '../../GEPA_Gas_Distribution/InputData/EIA_CustomerCounts/NG_CONS_NUM_A_EPG0_VN5_COUNT_A.xls'

#Moves Data
moves_inputfile = './InputData/CNG_Vehicle_By_State.csv'

#Proxy Data file
PM_Mapping_inputfile = "./InputData/PostMeter_ProxyMapping.xlsx"

#Specify names of gridded output files
gridded_outputfile = '../../Final_Gridded_Data/Supplement/EPA_v2_Supp_PostMeter.nc'
netCDF_description = 'Supplement to the Gridded EPA Inventory - Post Meter Emissions - IPCC Source Category 1B2b'
title_str = "EPA methane emissions from post meter"
title_diff_str = "Emissions from post meter difference: 2018-2012"

#output gridded proxy data
grid_emi_outputfile = '../../Final_Gridded_Data/Extension/v2_input_data/PostMeter_Grid_Emi.nc'

In [None]:
# Define local variables
start_year = 2012  #First year in emission timeseries
end_year = 2018    #Last year in emission timeseries
year_range = [*range(start_year, end_year+1,1)] #List of emission years
year_range_str=[str(i) for i in year_range]
num_years = len(year_range)

# Define constants
Avogadro   = 6.02214129 * 10**(23)  #molecules/mol
Molarch4   = 16.04                  #g/mol
Res01      = 0.1                    # degrees
Res_01     = 0.01                   # degrees
tg_scale   = 0.001                  #Tg scale number [New file allows for the exclusion of the territories] 


# Continental US Lat/Lon Limits (for netCDF files)
Lon_left = -130       #deg
Lon_right = -60       #deg
Lat_low  = 20         #deg
Lat_up  = 55          #deg

loc_dimensions = [Lat_low, Lat_up, Lon_left, Lon_right]
ilat_start = int((90+Lat_low)/Res01) #1100:1450 (continental US range)
ilat_end = int((90+Lat_up)/Res01)
ilon_start = abs(int((-180-Lon_left)/Res01)) #500:1200 (continental US range)
ilon_end = abs(int((-180-Lon_right)/Res01))

# Number of days in each month
month_day_leap  = [  31,  29,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]
month_day_nonleap = [  31,  28,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]

# Month arrays
month_range_str = ['January','February','March','April','May','June','July','August','September','October','November','December']
num_months = len(month_range_str)
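The index bounds above imply a global 0.1° grid whose first row is at 90°S and first column at 180°W. A small sketch of the same arithmetic follows. It uses `round` instead of `int`, so that a quotient that should be whole but lands just below an integer (for example, `0.7/0.1` evaluates to `6.999999999999999`) is not truncated. The function name is hypothetical.

```python
def conus_index_bounds(lat_low, lat_up, lon_left, lon_right, res):
    """Return (ilat_start, ilat_end, ilon_start, ilon_end) on a global grid whose
    first row is at 90S and first column at 180W."""
    ilat_start = round((90 + lat_low) / res)
    ilat_end = round((90 + lat_up) / res)
    ilon_start = round((180 + lon_left) / res)
    ilon_end = round((180 + lon_right) / res)
    return ilat_start, ilat_end, ilon_start, ilon_end

bounds = conus_index_bounds(20, 55, -130, -60, 0.1)
print(bounds)  # (1100, 1450, 500, 1200)
```

For the CONUS limits used here, the result matches the 1100:1450 and 500:1200 ranges noted in the comments above.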

In [None]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999;

In [None]:
# Track run time
ct = datetime.datetime.now() 
it = ct.timestamp() 
print("current time:", ct) 

____
## Step 1. Load in State and County ANSI data and Area Maps
_____

In [None]:
# State-level ANSI Data
#Read the state ANSI file array
State_ANSI, name_dict, abbr_dict = data_load_fn.load_state_ansi('../'+State_ANSI_inputfile)[0:3]
#QA: number of states
print('Read input file: '+ f"{State_ANSI_inputfile}")
print('Total "States" found: ' + '%.0f' % len(State_ANSI))
print(' ')



#County ANSI Data
#Includes State ANSI number, county ANSI number, county name, and country area (square miles)
#County_ANSI = pd.read_csv(County_ANSI_inputfile,encoding='latin-1')

#QA: number of counties
#print ('Read input file: ' + f"{'../'+County_ANSI_inputfile}")
#print('Total "Counties" found (include PR): ' + '%.0f' % len(County_ANSI))
#print(' ')

#Create a placeholder array for county data
#county_array = np.zeros([len(County_ANSI),3])

#Populate array with State ANSI number (0), county ANSI number (1), and county area (2)
#for icounty in np.arange(0,len(County_ANSI)):
#    county_array[icounty,0] = int(County_ANSI.values[icounty,0])
#    county_array[icounty,1] = int(County_ANSI.values[icounty,1])
#    county_array[icounty,2] = County_ANSI.values[icounty,3]

# 0.01 x0.01 degree Data
# State ANSI IDs and grid cell area (m2) maps
state_ANSI_map = data_load_fn.load_state_ansi_map('../'+Grid_state001_ansi_inputfile)
state_ANSI_map = state_ANSI_map.astype('int32')
#county_ANSI_map = data_load_fn.load_county_ansi_map(Grid_county001_ansi_inputfile)
#county_ANSI_map = county_ANSI_map.astype('int32')
area_map, lat001, lon001 = data_load_fn.load_area_map_001('../'+Grid_area001_inputfile)

# 0.1 x0.1 degree data
# grid cell area and state ANSI maps
area_map01, Lat01, Lon01 = data_load_fn.load_area_map_01('../'+Grid_area01_inputfile)[0:3]
#Select relevant Continental 0.1 x0.1 domain
Lat_01 = Lat01[ilat_start:ilat_end]
Lon_01 = Lon01[ilon_start:ilon_end]
area_matrix_01 = data_fn.regrid001_to_01(area_map, Lat_01, Lon_01)
area_matrix_01 *= 10000  #convert from m2 to cm2

state_ANSI_map_01 = data_fn.regrid001_to_01(state_ANSI_map, Lat_01, Lon_01)

# Print time
ct = datetime.datetime.now() 
print("current time:", ct) 
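`data_fn.regrid001_to_01` lives in the shared Global_Functions module and is not shown here. For additive quantities such as cell area, one way to aggregate a 0.01° array onto the 0.1° grid is to sum non-overlapping 10x10 blocks, assuming the two grids share cell edges. This is an illustrative sketch, not the module's implementation. A categorical map such as `state_ANSI_map` cannot be summed and would need a different rule.

```python
import numpy as np

def block_sum(fine, factor=10):
    """Sum non-overlapping factor x factor blocks of a 2-D array (e.g. 0.01 deg -> 0.1 deg)."""
    nlat, nlon = fine.shape
    if nlat % factor or nlon % factor:
        raise ValueError("array dimensions must be multiples of factor")
    return fine.reshape(nlat // factor, factor, nlon // factor, factor).sum(axis=(1, 3))

# Usage: a toy 20 x 30 array of 0.01-degree cell areas becomes a 2 x 3 array of 0.1-degree areas
fine_area = np.ones((20, 30))
coarse_area = block_sum(fine_area)
print(coarse_area.shape)  # (2, 3)
```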

---------------------------------------------
## Step 2. Read in and Format Proxy Data
--------------------------------

#### Step 2.1 Read In Proxy Mapping File & Make Proxy Arrays

In [None]:
#load GHGI Mapping Groups
names = pd.read_excel(PM_Mapping_inputfile, sheet_name = "GHGI Map - PM", usecols = "A:B",skiprows = 1, header = 0)
colnames = names.columns.values
ghgi_pm_map = pd.read_excel(PM_Mapping_inputfile, sheet_name = "GHGI Map - PM", usecols = "A:B", skiprows = 1, names = colnames)
#drop rows with no data and remove the parentheses from the GHGI source names
ghgi_pm_map = ghgi_pm_map[ghgi_pm_map['GHGI_Emi_Group'] != 'na']
ghgi_pm_map = ghgi_pm_map[ghgi_pm_map['GHGI_Emi_Group'].notna()].copy()
ghgi_pm_map['GHGI_Source']= ghgi_pm_map['GHGI_Source'].str.replace("(","", regex=False)
ghgi_pm_map['GHGI_Source']= ghgi_pm_map['GHGI_Source'].str.replace(")","", regex=False)
ghgi_pm_map.reset_index(inplace=True, drop=True)
display(ghgi_pm_map)

#load emission group - proxy map
names = pd.read_excel(PM_Mapping_inputfile, sheet_name = "Proxy Map - PM", usecols = "A:G",skiprows = 1, header = 0)
colnames = names.columns.values
proxy_pm_map = pd.read_excel(PM_Mapping_inputfile, sheet_name = "Proxy Map - PM", usecols = "A:G", skiprows = 1, names = colnames)
display((proxy_pm_map))

#create empty proxy and emission group arrays (add months for proxy variables that have monthly data)
for igroup in np.arange(0,len(proxy_pm_map)):
    if proxy_pm_map.loc[igroup, 'Grid_Month_Flag'] ==0:
        vars()[proxy_pm_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        vars()[proxy_pm_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years])
    else:
        vars()[proxy_pm_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
        vars()[proxy_pm_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years,num_months])
        
    vars()[proxy_pm_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([num_years])
    
    if proxy_pm_map.loc[igroup,'State_Proxy_Group'] != '-':
        if proxy_pm_map.loc[igroup,'State_Month_Flag'] == 0:
            vars()[proxy_pm_map.loc[igroup,'State_Proxy_Group']] = np.zeros([len(State_ANSI),num_years])
        else:
            vars()[proxy_pm_map.loc[igroup,'State_Proxy_Group']] = np.zeros([len(State_ANSI),num_years,num_months])
    else:
        continue # do not make state proxy variable if no variable assigned in mapping file
        
    # NOTE: the county proxy arrays require County_ANSI, which is commented out in Step 1
    if proxy_pm_map.loc[igroup,'County_Proxy_Group'] != '-':
        if proxy_pm_map.loc[igroup,'County_Month_Flag'] == 0:
            vars()[proxy_pm_map.loc[igroup,'County_Proxy_Group']] = np.zeros([len(State_ANSI),len(County_ANSI),num_years])
        else:
            vars()[proxy_pm_map.loc[igroup,'County_Proxy_Group']] = np.zeros([len(State_ANSI),len(County_ANSI),num_years,num_months])
    else:
        continue # do not make county proxy variable if no variable assigned in mapping file

        
emi_group_names = np.unique(ghgi_pm_map['GHGI_Emi_Group'])

print('QA/QC: Is the number of emission groups the same for the proxy and emissions tabs?')
if (len(emi_group_names) == len(np.unique(proxy_pm_map['GHGI_Emi_Group']))):
    print('PASS')
else:
    print('FAIL')
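The cell above creates variables dynamically through `vars()`. This works at notebook top level, but it hides the variable names from readers and linters. Below is a sketch of the same array set-up stored in a dictionary instead, using the mapping-table column names from the cell above. State and county arrays are omitted for brevity, and the group names in the usage example are hypothetical.

```python
import numpy as np
import pandas as pd

def make_proxy_arrays(proxy_map, nlat, nlon, num_years, num_months=12):
    """Return a dict of zero-filled arrays keyed by the names in the proxy mapping table."""
    arrays = {}
    for _, row in proxy_map.iterrows():
        if row['Grid_Month_Flag'] == 0:
            shape = (nlat, nlon, num_years)
        else:
            shape = (nlat, nlon, num_years, num_months)
        arrays[row['Proxy_Group']] = np.zeros(shape)
        arrays[row['Proxy_Group'] + '_nongrid'] = np.zeros(shape[2:])  # drop the lat/lon dimensions
        arrays[row['GHGI_Emi_Group']] = np.zeros(num_years)
    return arrays

# Usage with a toy mapping table (group names are hypothetical)
toy_map = pd.DataFrame({
    'GHGI_Emi_Group': ['Emi_Res', 'Emi_EGU'],
    'Proxy_Group': ['Res_Proxy', 'EGU_Proxy'],
    'Grid_Month_Flag': [0, 1],
})
proxies = make_proxy_arrays(toy_map, nlat=35, nlon=70, num_years=7)
```

Downstream code would then read `proxies['Res_Proxy']` rather than a bare variable name.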

#### Step 2.2 Read In and Format EPA Acid Rain Program (ARP) Electric Power Emissions (Electric Energy Proxy)

##### Step 2.2.1 Read in EPA power plant facility information and standardize unit and fuel types

In [None]:
# Read EPA ARP data for individual power plants.
# The unit type and fuel type are used to assign a CH4 emission factor, which is applied to heat input.
#https://ampd.epa.gov/ampd/

fields = ['State', ' Year',' Month', ' Facility Name',' Facility Latitude',' Facility Longitude',' Unit Type', \
          ' Fuel Type (Primary)', ' Heat Input (MMBtu)']
ARP_Raw = pd.read_csv(EPA_ARP_inputfile, usecols = fields, index_col=False, na_filter = False)

# Make a dictionary of per-year dataframes. With na_filter=False, empty cells are read in as
# empty strings, so convert heat input to numeric (otherwise values in scientific notation are
# read in as strings) and set empty or unparseable values to 0
ARP_facilities = dict()
for iyear in np.arange(num_years):
    temp = ARP_Raw[ARP_Raw[' Year'] == year_range[iyear]].copy()
    temp.reset_index(inplace=True, drop=True)
    temp[' Heat Input (MMBtu)'] = pd.to_numeric(temp[' Heat Input (MMBtu)'], errors='coerce').fillna(0)
    ARP_facilities[iyear] = temp


# Clean up and standardize the Unit Type labels
# Work on a copy of each dataframe to avoid a SettingWithCopyWarning

# For each year, check the unit types and report a clean string version in a new column
for iyear in np.arange(num_years):
    temp = ARP_facilities[iyear].copy()

    for ifacility in np.arange(len(temp)):
        unit = temp.loc[ifacility,' Unit Type'].lower()
        if 'combustion turbine' in unit:
            temp.loc[ifacility,'Unit_clean'] = 'combustion turbine'
        elif 'combined cycle' in unit:
            temp.loc[ifacility,'Unit_clean'] = 'combined cycle'
        elif 'wet bottom' in unit:
            temp.loc[ifacility,'Unit_clean'] = 'wet bottom'
        elif 'dry bottom' in unit:
            temp.loc[ifacility,'Unit_clean'] = 'dry bottom'
        elif 'bubbling' in unit:
            temp.loc[ifacility,'Unit_clean'] = 'bubbling'
        else:
            temp.loc[ifacility,'Unit_clean'] = unit

    ARP_facilities[iyear] = temp

# Clean up and standardize the fuel type labels
# Consolidate the primary fuel types into Gas, Coal, Oil, and Wood fuel categories
gas_fuels = ['Pipeline Natural Gas', 'Natural Gas', 'Other Gas', 'Other Gas, Pipeline Natural Gas',
             'Natural Gas, Pipeline Natural Gas', 'Process Gas']
coal_fuels = ['Petroleum Coke', 'Coal', 'Coal, Pipeline Natural Gas', 'Coal, Natural Gas', 'Coal, Wood',
              'Coal, Coal Refuse', 'Coal Refuse', 'Other Solid Fuel']
oil_fuels = ['Other Oil', 'Diesel Oil', 'Diesel Oil, Residual Oil', 'Diesel Oil, Pipeline Natural Gas',
             'Residual Oil', 'Residual Oil, Pipeline Natural Gas']
wood_fuels = ['Wood', 'Other Solid Fuel, Wood']

for iyear in np.arange(num_years):
    # Work on a copy to avoid a SettingWithCopyWarning
    temp = ARP_facilities[iyear].copy()
    for ifacility in np.arange(len(temp)):
        fuel = temp.loc[ifacility,' Fuel Type (Primary)']
        if fuel in gas_fuels:
            temp.loc[ifacility,'Fuel_clean'] = 'Gas'
        elif fuel in coal_fuels:
            temp.loc[ifacility,'Fuel_clean'] = 'Coal'
        elif fuel in oil_fuels:
            temp.loc[ifacility,'Fuel_clean'] = 'Oil'
        elif fuel in wood_fuels:
            temp.loc[ifacility,'Fuel_clean'] = 'Wood'
        elif fuel != '':
            # Print any non-empty fuel type that is not assigned to a category
            print(fuel)
    ARP_facilities[iyear] = temp

del ARP_Raw
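The row-by-row fuel classification can also be done with a dictionary lookup and `Series.map`, which avoids the Python loop over facilities. A sketch using the same fuel groupings as the cell above; the function name and the toy labels in the usage example (including 'Landfill Gas') are hypothetical.

```python
import pandas as pd

# Fuel-type groupings copied from the cell above
FUEL_GROUPS = {
    'Gas': ['Pipeline Natural Gas', 'Natural Gas', 'Other Gas', 'Other Gas, Pipeline Natural Gas',
            'Natural Gas, Pipeline Natural Gas', 'Process Gas'],
    'Coal': ['Petroleum Coke', 'Coal', 'Coal, Pipeline Natural Gas', 'Coal, Natural Gas', 'Coal, Wood',
             'Coal, Coal Refuse', 'Coal Refuse', 'Other Solid Fuel'],
    'Oil': ['Other Oil', 'Diesel Oil', 'Diesel Oil, Residual Oil', 'Diesel Oil, Pipeline Natural Gas',
            'Residual Oil', 'Residual Oil, Pipeline Natural Gas'],
    'Wood': ['Wood', 'Other Solid Fuel, Wood'],
}
# Invert to a {fuel label: category} lookup
FUEL_LOOKUP = {fuel: cat for cat, fuels in FUEL_GROUPS.items() for fuel in fuels}

def clean_fuel(df, col=' Fuel Type (Primary)'):
    """Add a Fuel_clean column (NaN if unmapped); return non-empty labels with no category."""
    df['Fuel_clean'] = df[col].map(FUEL_LOOKUP)
    unmapped = df.loc[df['Fuel_clean'].isna() & (df[col] != ''), col].unique()
    return list(unmapped)

toy = pd.DataFrame({' Fuel Type (Primary)': ['Natural Gas', 'Coal Refuse', 'Residual Oil', 'Wood',
                                             'Landfill Gas', '']})
unmapped = clean_fuel(toy)
```

Returning the unmapped labels, rather than printing them, makes it easy to assert that no new ARP fuel label has slipped through.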

##### Step 2.2.2 Add the CH4 emission factor (kg gas/TJ energy input) to the data dictionary, then calculate facility-level methane emissions

In [None]:
# CH4 factor is based on the unit type and fuel used
# From 'Acid Rain Prog - Unit-level Fuel+Technology' file in InputData Folder

for iyear in np.arange(num_years):
    ARP_facilities[iyear].loc[:,'CH4_f'] = 0.0

    for ifacility in np.arange(len(ARP_facilities[iyear])):
        # Gas: combined cycle or turbine= 3.7, 
        # Gas: others (Assume stoker, tangentially-fired, dry bottom, & wet bottom are boilers) = 1
        if ARP_facilities[iyear].loc[ifacility,'Fuel_clean'] == 'Gas':
            if ARP_facilities[iyear].loc[ifacility,'Unit_clean'] == 'combined cycle' \
             or re.search('turbine',ARP_facilities[iyear].loc[ifacility,'Unit_clean'].lower()) != None:
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 3.7     
            else:
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 1
        # Coal: tangentially-fired, dry bottom = 0.7
        # Coal: wet bottom = 0.9
        # Coal: Cyclone boiler = 0.2
        # Coal: others (boilers, combined cycle) = 1
        elif ARP_facilities[iyear].loc[ifacility,'Fuel_clean'] == 'Coal':
            if ARP_facilities[iyear].loc[ifacility,'Unit_clean'] == 'tangentially-fired' \
             or ARP_facilities[iyear].loc[ifacility,'Unit_clean'] == 'dry bottom':
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 0.7
            elif ARP_facilities[iyear].loc[ifacility,'Unit_clean'] == 'wet bottom':   
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 0.9
            elif ARP_facilities[iyear].loc[ifacility,'Unit_clean'] == 'cyclone boiler':   
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 0.2
            else:
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 1   
        # Wood: assume all are recovery boilers = 1
        elif ARP_facilities[iyear].loc[ifacility,'Fuel_clean'] == 'Wood':
            ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 1   
        # Oil: Residual Oil, or Residual Oil with Pipeline Natural Gas = 0.8
        # Oil: Others = 0.9
        elif ARP_facilities[iyear].loc[ifacility,'Fuel_clean'] == 'Oil':
            if ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Residual Oil' \
             or ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] == 'Residual Oil, Pipeline Natural Gas':
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 0.8
            else:
                ARP_facilities[iyear].loc[ifacility,'CH4_f'] = 0.9
        else:
            if ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'] != '':
                print(ARP_facilities[iyear].loc[ifacility,' Fuel Type (Primary)'])
    

#Calculate the methane emissions at each facility (emission factor x heat input).
#Note the mixed units: the factor is in kg/TJ while heat input is in MMBtu.
for iyear in np.arange(num_years):
    ARP_facilities[iyear]['CH4_flux'] = ARP_facilities[iyear]['CH4_f'] * ARP_facilities[iyear][' Heat Input (MMBtu)']
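The nested `if`/`elif` factor assignment can also be expressed with `np.select`, which evaluates all rows at once and picks the first matching condition per row, mirroring the order of the branches above. A sketch using the factors listed in the comments of the cell above; the function name and toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

RESIDUAL_OIL = ['Residual Oil', 'Residual Oil, Pipeline Natural Gas']

def ch4_factor(df):
    """Vectorized CH4 emission factor (kg/TJ) from Fuel_clean, Unit_clean and primary fuel type."""
    fuel = df['Fuel_clean']
    unit = df['Unit_clean'].fillna('').str.lower()
    primary = df[' Fuel Type (Primary)']
    conditions = [
        (fuel == 'Gas') & ((unit == 'combined cycle') | unit.str.contains('turbine', regex=False)),
        fuel == 'Gas',
        (fuel == 'Coal') & unit.isin(['tangentially-fired', 'dry bottom']),
        (fuel == 'Coal') & (unit == 'wet bottom'),
        (fuel == 'Coal') & (unit == 'cyclone boiler'),
        fuel == 'Coal',
        fuel == 'Wood',
        (fuel == 'Oil') & primary.isin(RESIDUAL_OIL),
        fuel == 'Oil',
    ]
    choices = [3.7, 1.0, 0.7, 0.9, 0.2, 1.0, 1.0, 0.8, 0.9]
    # np.select uses the first True condition for each row; unmatched rows get 0
    return np.select(conditions, choices, default=0.0)

toy = pd.DataFrame({
    'Fuel_clean': ['Gas', 'Gas', 'Gas', 'Coal', 'Coal', 'Oil', 'Oil', None],
    'Unit_clean': ['combined cycle', 'combustion turbine', 'dry bottom', 'dry bottom',
                   'cyclone boiler', 'other boiler', 'other boiler', 'other boiler'],
    ' Fuel Type (Primary)': ['Natural Gas', 'Natural Gas', 'Natural Gas', 'Coal', 'Coal',
                             'Residual Oil', 'Diesel Oil', ''],
    ' Heat Input (MMBtu)': [10.0] * 8,
})
toy['CH4_f'] = ch4_factor(toy)
toy['CH4_flux'] = toy['CH4_f'] * toy[' Heat Input (MMBtu)']
```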

##### Step 2.2.3 Allocate gas-fueled facility emissions to the 0.1x0.1 grid (monthly and annual)

In [None]:
arp_gas_array_mon = np.zeros([len(Lat_01),len(Lon_01),num_years, num_months])
arp_gas_array_mon_nongrid = np.zeros([num_years,num_months])
arp_gas_array = np.zeros([len(Lat_01),len(Lon_01),num_years,])
arp_gas_array_nongrid = np.zeros([num_years])


var = 'CH4_flux'
for iyear in np.arange(num_years):
    for ifacility in np.arange(len(ARP_facilities[iyear])):
        #Only gas-fueled units are added to the proxy
        if ARP_facilities[iyear].loc[ifacility, 'Fuel_clean'] != 'Gas':
            continue
        imon = int(ARP_facilities[iyear].loc[ifacility,' Month'] - 1)
        fac_lat = ARP_facilities[iyear].loc[ifacility,' Facility Latitude']
        fac_lon = ARP_facilities[iyear].loc[ifacility,' Facility Longitude']
        #Filter inside Continental US domain
        if Lon_left < fac_lon < Lon_right and Lat_low < fac_lat < Lat_up:
            #Find the index values of each facility lat and lon within the Continental US grid
            ilat = int((fac_lat - Lat_low)/Res01)
            ilon = int((fac_lon - Lon_left)/Res01)
            arp_gas_array_mon[ilat,ilon,iyear,imon] += ARP_facilities[iyear].loc[ifacility,var]
        else:
            #Record emissions from facilities outside the domain
            arp_gas_array_mon_nongrid[iyear,imon] += ARP_facilities[iyear].loc[ifacility,var]

#Annual totals are the sum over months
arp_gas_array = np.sum(arp_gas_array_mon, axis=3)
arp_gas_array_nongrid = np.sum(arp_gas_array_mon_nongrid, axis=1)
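The per-facility gridding loop can also be written with NumPy array operations. Below is a sketch, assuming the facility coordinates and values are already numeric arrays. `np.add.at` accumulates values for facilities that fall in the same cell, whereas plain fancy-indexed `+=` would keep only one of the repeated indices. Indices are also clipped to the grid edge, as a guard against floating-point rounding right at the upper boundary. The function name is hypothetical.

```python
import numpy as np

def grid_points(lat, lon, values, lat_low, lat_up, lon_left, lon_right, res):
    """Sum point values onto a regular lat/lon grid; also return the total outside the domain."""
    nlat = int(round((lat_up - lat_low) / res))
    nlon = int(round((lon_right - lon_left) / res))
    grid = np.zeros((nlat, nlon))
    inside = (lon > lon_left) & (lon < lon_right) & (lat > lat_low) & (lat < lat_up)
    ilat = np.minimum(((lat[inside] - lat_low) / res).astype(int), nlat - 1)
    ilon = np.minimum(((lon[inside] - lon_left) / res).astype(int), nlon - 1)
    np.add.at(grid, (ilat, ilon), values[inside])
    return grid, values[~inside].sum()

# Usage: two facilities share a cell, one is elsewhere in CONUS, one is outside the domain
lat = np.array([30.05, 30.05, 45.05, 10.0])
lon = np.array([-100.05, -100.05, -79.95, -100.0])
vals = np.array([1.0, 2.0, 5.0, 7.0])
grid, outside = grid_points(lat, lon, vals, 20, 55, -130, -60, 0.1)
```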
                       

#### Step 2.3 Read In and Format EIA SEDS (State Energy Data System) Energy Consumption Data (Industrial Proxy)

##### Step 2.3.1 Read In EIA SEDS Data

In [None]:

# Read state-level energy consumption data (Industrial)
SEDS_ind = dict()
for iyear in np.arange(0, num_years):
    SEDS_ind[iyear] = pd.read_csv(EIA_SEDS_indconsump_inputfile+year_range_str[iyear]+'.csv',nrows=51)
    SEDS_ind[iyear]['ASCI'] = 0
    for istate in np.arange(len(SEDS_ind[iyear])):
        SEDS_ind[iyear].loc[istate,'ASCI'] = name_dict[SEDS_ind[iyear].loc[istate,'State'].strip()]


##### Step 2.3.2 Allocate industrial natural gas consumption (BTU) to the state level

In [None]:
#Record state-level industrial natural gas consumption (BTU) from the SEDS data (state x year)

sedsind_gas_state = np.zeros([len(State_ANSI), num_years])

for iyear in np.arange(num_years):
    for istate in np.arange(len(SEDS_ind[iyear])):
        state_str = SEDS_ind[iyear].loc[istate,'State'].strip()
        matchstate = np.where(state_str == State_ANSI['name'])[0][0]
        sedsind_gas_state[matchstate,iyear] += SEDS_ind[iyear].loc[istate,'Natural Gas']
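`sedsind_gas_state` holds absolute BTU values per state and year. If state shares are needed instead (for example, to split a national industrial total across states), each year can be normalized as sketched below. An all-zero year is left at zero rather than producing NaNs. The function name is hypothetical.

```python
import numpy as np

def state_shares(state_by_year):
    """Normalize a (state x year) array so each year's column sums to 1; all-zero years stay 0."""
    state_by_year = np.asarray(state_by_year, dtype=float)
    totals = state_by_year.sum(axis=0, keepdims=True)
    return np.divide(state_by_year, totals, out=np.zeros_like(state_by_year), where=totals > 0)

shares = state_shares([[3.0, 0.0, 2.0],
                       [1.0, 0.0, 2.0]])
```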

#### Step 2.4 Read In GHGRP Subpart C and D Data (Industrial Proxy)

##### Step 2.4.1 Read in Subpart C and D data

In [None]:
# Read in Subpart D and C facility lists, find the subpart C facilities that were not in subpart D list
# and then merge with subpart D list to create a complete facility information array

#facility level data for Subpart C
GHGRP_all = pd.read_csv(GHGRP_subCfacility_inputfile) 
#filter for methane emissions only
GHGRP_all = GHGRP_all[GHGRP_all['C_SUBPART_LEVEL_INFORMATION.GHG_GAS_NAME'] == 'Methane']
GHGRP_all = GHGRP_all.drop(columns = ['C_SUBPART_LEVEL_INFORMATION.FACILITY_NAME'])
GHGRP_all.reset_index(inplace=True, drop=True)

#facility level data for subpart D
GHGRP_elec = pd.read_csv(GHGRP_subDfacility_inputfile) 
#filter for methane emissions only
GHGRP_elec = GHGRP_elec[GHGRP_elec['D_SUBPART_LEVEL_INFORMATION.GHG_NAME'] == 'Methane']
GHGRP_elec = GHGRP_elec.drop(columns = ['D_SUBPART_LEVEL_INFORMATION.FACILITY_NAME'])
GHGRP_elec.reset_index(inplace=True, drop=True)
GHGRP_elec_fac = np.array(GHGRP_elec['D_SUBPART_LEVEL_INFORMATION.FACILITY_ID'])
GHGRP_elec_fac = np.unique(GHGRP_elec_fac)

GHGRP_comb_noloc = dict()

#make a list of facilities that report to subpart C that are not in the subpart D facility list
for iyear in np.arange(0,num_years):
    temp = list()
    for ifacility in np.arange(len(GHGRP_all)):       
        if GHGRP_all.loc[ifacility,'C_SUBPART_LEVEL_INFORMATION.REPORTING_YEAR'] == year_range[iyear]:
            if GHGRP_all.loc[ifacility,'C_SUBPART_LEVEL_INFORMATION.FACILITY_ID'] not in GHGRP_elec_fac:
                temp.append(GHGRP_all.loc[ifacility])
    GHGRP_comb_noloc[iyear] = pd.DataFrame(temp)

GHGRP_comb_noloc[0].head(1)

# Read Facility Info file that contains lat and lon for GHGRP facilities
#extract the reporting facilities for the most recent year
GHGRP_facloc = pd.read_csv(GHGRP_subDfacility_loc_inputfile)
sort = GHGRP_facloc.sort_values(by=['V_GHG_EMITTER_FACILITIES.YEAR'])
filter1 = sort.drop_duplicates(subset = 'V_GHG_EMITTER_FACILITIES.FACILITY_ID' , keep = 'last')
filter2 = filter1.drop(columns=['V_GHG_EMITTER_FACILITIES.YEAR','V_GHG_EMITTER_FACILITIES.STATE'])
Fac_rename = filter2.rename(columns={'V_GHG_EMITTER_FACILITIES.FACILITY_ID': 'FACILITY_ID'})

#merge the missing subpart C facility list with the Subpart D facility list to get lat and lon values for all facilities
GHGRP_combfac = dict()
for iyear in np.arange(num_years):
    Comb_rename = GHGRP_comb_noloc[iyear].rename(columns={'C_SUBPART_LEVEL_INFORMATION.FACILITY_ID': 'FACILITY_ID'})
    temp = Comb_rename.merge(Fac_rename, on = 'FACILITY_ID')
    GHGRP_combfac[iyear] = temp

#Check that no repeats were added
for iyear in np.arange(num_years):
    if  GHGRP_combfac[iyear].shape[0] != GHGRP_comb_noloc[iyear].shape[0]:
        print('Dataframe size discrepancy')

display(GHGRP_combfac[0])#.head(1)
del Fac_rename, GHGRP_all, GHGRP_elec, GHGRP_elec_fac,GHGRP_facloc,filter1,filter2
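The nested loop above checks each subpart C row against the subpart D facility list one at a time. Below is a vectorized sketch using `isin` and a left merge. `validate` raises if a facility ID appears more than once in the location table, and `indicator=True` flags facilities that have no location. The cell above instead uses an inner merge, which drops those facilities and only prints a size discrepancy. The column names and data are simplified, hypothetical stand-ins.

```python
import pandas as pd

# Toy stand-ins for the subpart C emissions, subpart D facility IDs, and facility location table
sub_c = pd.DataFrame({'FACILITY_ID': [1, 2, 3, 4],
                      'REPORTING_YEAR': [2012, 2012, 2012, 2013],
                      'GHG_QUANTITY': [0.1, 0.2, 0.3, 0.4]})
sub_d_ids = [2]
fac_loc = pd.DataFrame({'FACILITY_ID': [1, 4], 'LATITUDE': [40.0, 35.0], 'LONGITUDE': [-100.0, -90.0]})

def c_only_with_location(sub_c, sub_d_ids, fac_loc, year):
    """Subpart C rows for one year whose facility is not in subpart D, joined to locations."""
    rows = sub_c[(sub_c['REPORTING_YEAR'] == year) & ~sub_c['FACILITY_ID'].isin(sub_d_ids)]
    # validate raises MergeError if a facility appears more than once in fac_loc
    return rows.merge(fac_loc, on='FACILITY_ID', how='left', validate='many_to_one', indicator=True)

merged = c_only_with_location(sub_c, sub_d_ids, fac_loc, 2012)
no_location = merged.loc[merged['_merge'] == 'left_only', 'FACILITY_ID'].tolist()
```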

##### Step 2.4.2 Allocate facility level methane emissions to the CONUS grid (0.01x0.01)

In [None]:
# Make a 0.01x0.01 gridded map of GHGRP facility-level emissions
# also record the emissions that are not within the CONUS grid 

#ghgrp_emi_array = np.zeros([num_years,area_map.shape[0],area_map.shape[1]])
ghgrp_emi_array = np.zeros([area_map.shape[0],area_map.shape[1], num_years])
ghgrp_emi_array_nongrid = np.zeros([num_years])

for iyear in np.arange(num_years):
    n_plants = 0
    for ifacility in np.arange(len(GHGRP_combfac[iyear])):
        #Filter inside domain
        if GHGRP_combfac[iyear].loc[ifacility,'V_GHG_EMITTER_FACILITIES.LONGITUDE'] > Lon_left \
         and GHGRP_combfac[iyear].loc[ifacility,'V_GHG_EMITTER_FACILITIES.LONGITUDE'] < Lon_right \
         and GHGRP_combfac[iyear].loc[ifacility,'V_GHG_EMITTER_FACILITIES.LATITUDE'] > Lat_low \
         and GHGRP_combfac[iyear].loc[ifacility,'V_GHG_EMITTER_FACILITIES.LATITUDE'] < Lat_up:
            #find the corresponding plant ilon and ilat, record the emissions at that location
            ilat = int((GHGRP_combfac[iyear].loc[ifacility,'V_GHG_EMITTER_FACILITIES.LATITUDE'] - Lat_low)/Res_01)
            ilon = int((GHGRP_combfac[iyear].loc[ifacility,'V_GHG_EMITTER_FACILITIES.LONGITUDE'] - Lon_left)/Res_01)
            ghgrp_emi_array[ilat,ilon,iyear] += GHGRP_combfac[iyear].loc[ifacility,'C_SUBPART_LEVEL_INFORMATION.GHG_QUANTITY']
            n_plants += 1
        else:
            ghgrp_emi_array_nongrid[iyear] += GHGRP_combfac[iyear].loc[ifacility,'C_SUBPART_LEVEL_INFORMATION.GHG_QUANTITY']
    print (year_range_str[iyear]+' Facilities: ', n_plants)
            

#### Step 2.5 State-level EIA Customer Counts

In [None]:
# Read company level EIA data, but only load the totals for all companies across each state
EIA_ResCounts = pd.read_excel(EIA_Residential_CC_inputfile, sheet_name = 'Data 1', skiprows=2)
EIA_ComCounts = pd.read_excel(EIA_Commercial_CC_inputfile, sheet_name = 'Data 1', skiprows=2)
    
#convert the time stamp to year
EIA_ResCounts['Date'] = EIA_ResCounts['Date'].astype(str)
EIA_ComCounts['Date'] = EIA_ComCounts['Date'].astype(str)
EIA_ResCounts['Date'] = [EIA_ResCounts['Date'][i][0:4] for i in np.arange(len(EIA_ResCounts))]   #extract the year
EIA_ComCounts['Date'] = [EIA_ComCounts['Date'][i][0:4] for i in np.arange(len(EIA_ComCounts))]   #extract the year
#transpose and reset the column and row indexes
EIA_ResCounts = EIA_ResCounts.T
EIA_ComCounts = EIA_ComCounts.T
EIA_ResCounts.columns = EIA_ResCounts.iloc[0]
EIA_ComCounts.columns = EIA_ComCounts.iloc[0]
EIA_ResCounts = EIA_ResCounts.drop(EIA_ResCounts.index[[0,1]])
EIA_ComCounts = EIA_ComCounts.drop(EIA_ComCounts.index[[0,1]])
#extract the state names and format as abbreviations
Names_res = EIA_ResCounts.index.values.tolist()
Names_com = EIA_ComCounts.index.values.tolist()
EIA_ResCounts.reset_index(drop=True,inplace=True)
EIA_ComCounts.reset_index(drop=True,inplace=True)
EIA_ResCounts['State'] = Names_res
EIA_ComCounts['State'] = Names_com
    
# Replace the EIA state names with state abbreviations
# (the loop skips the last 6 entries of State_ANSI)
for istate in np.arange(0,len(State_ANSI)-6):
    match_state = np.where(EIA_ResCounts['State'].str.contains(State_ANSI['name'][istate]))[0][0]
    EIA_ResCounts.loc[match_state,'State'] = State_ANSI['abbr'][istate]
    match_state = np.where(EIA_ComCounts['State'].str.contains(State_ANSI['name'][istate]))[0][0]
    EIA_ComCounts.loc[match_state,'State'] = State_ANSI['abbr'][istate]

# Initialize state-level arrays of residential and commercial customer counts (state x year)
res_counts = np.zeros((len(State_ANSI),num_years))
com_counts = np.zeros((len(State_ANSI),num_years))

# For each EIA row, add the residential and commercial customer counts for each year
# to the matching state

for ifacility in np.arange(0,len(EIA_ResCounts)):
    match_state1 = np.where(State_ANSI['abbr'] == EIA_ResCounts['State'][ifacility])[0][0]
    match_state2 = np.where(State_ANSI['abbr'] == EIA_ComCounts['State'][ifacility])[0][0]

    for iyear in np.arange(num_years):
        res_counts[match_state1][iyear] += EIA_ResCounts[str(iyear+start_year)][ifacility]
        com_counts[match_state2][iyear] += EIA_ComCounts[str(iyear+start_year)][ifacility]

# Replace missing (NaN) customer counts with zero
res_counts = np.nan_to_num(res_counts)
com_counts = np.nan_to_num(com_counts)

for iyear in np.arange(0,num_years):
    print(year_range_str[iyear]+' Total Counts: ')
    print(' Residential Customers: '+str(res_counts.sum(axis=0)[iyear]))
    print(' Commercial Customers: '+str(com_counts.sum(axis=0)[iyear]))

### Step 2.6. MOVES CNG Vehicle Counts

In [None]:
#Data were pulled from the MOVES model on CNG vehicle counts by state for the years 1990, 1999, and 2020. 
# Data are interpolated between years

#Initialize arrays: full 1990-2020 timeseries (31 years) and the inventory-year subset
MOVES_cng_full = np.zeros([len(State_ANSI),2020-1990+1])
MOVES_cng = np.zeros([len(State_ANSI),num_years])

moves = pd.read_csv(moves_inputfile, sep=',')
# Column indices in MOVES_cng_full, which starts in 1990
idx_1990 = 1990-1990
idx_1999 = 1999-1990
idx_2020 = 2020-1990

moves = moves[['yearID','stateAbbr','VehCount_ERG']]

for istate in np.arange(0,len(State_ANSI)):
    #print(istate)
    subset = moves[moves['stateAbbr'] == State_ANSI['abbr'][istate]]
    if len(subset) > 1:
        #display(subset)
        # .sum() turns the one-row selection into a scalar (and gives 0 if a year is missing)
        MOVES_cng_full[istate,idx_1990] = subset.loc[subset['yearID'] == 1990, 'VehCount_ERG'].sum()
        MOVES_cng_full[istate,idx_1999] = subset.loc[subset['yearID'] == 1999, 'VehCount_ERG'].sum()
        MOVES_cng_full[istate,idx_2020] = subset.loc[subset['yearID'] == 2020, 'VehCount_ERG'].sum()
    
# Fill in data gaps (interpolate to fill zeros between years and extend most historical value back to 1990)
for istate in np.arange(0,len(MOVES_cng_full)):
    temp = MOVES_cng_full[istate][:] 
    if MOVES_cng_full[istate][0] ==0:
        index = np.argmax(temp > 0)    
        temp[0:index] = temp[index]      #extend earliest available value back to 1990 if no 1990 data available
    temp = pd.Series(temp)       
    temp.replace(0,np.nan, inplace=True)
    temp = temp.interpolate().values      #interpolate to fill missing years
    MOVES_cng_full[istate][:] = temp            #reassign to original array

# Replace any remaining NaNs (states without MOVES data) with zero
MOVES_cng_full = np.nan_to_num(MOVES_cng_full)
    
#extract desired timeseries
MOVES_cng[:,:] = MOVES_cng_full[:,start_year-1990:end_year-1990+1]
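The MOVES gap-filling above linearly interpolates between the 1990, 1999, and 2020 anchor years and holds the earliest value constant before the first anchor. A compact sketch of the same idea with `np.interp`, which also holds the end values constant outside the anchor range, is shown below. The counts are made up, and unlike the notebook, this sketch treats a zero count as a real observation rather than as a gap.

```python
import numpy as np

def fill_from_anchor_years(anchor_years, anchor_values, first_year, last_year):
    """Linearly interpolate anchor-year values onto every year in [first_year, last_year].

    np.interp holds the first/last anchor value constant outside the anchor range.
    """
    years = np.arange(first_year, last_year + 1)
    values = np.interp(years, anchor_years, anchor_values)
    return years, values

# Hypothetical CNG vehicle counts for one state at the MOVES anchor years
anchor_years = [1990, 1999, 2020]
anchor_counts = [100.0, 190.0, 400.0]

years, counts = fill_from_anchor_years(anchor_years, anchor_counts, 2012, 2018)
for year, count in zip(years, counts):
    print(year, count)
```

Between 1999 and 2020 the toy counts grow by 10 vehicles per year, so every inventory year falls on that straight line.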

### Step 2.7 Read in gridded population density data and calculate state-level populations, regrid to 0.1x0.1 degrees

In [None]:
#Read population density map
pop_den_map = data_load_fn.load_pop_den_map('../'+pop_map_inputfile)

-----------
## Step 3. Read in and Format US EPA GHGI Emissions
----------

In [None]:
#NG CH4 in units of metric tonnes, converted to kt
names = pd.read_excel(EPA_post_meter_inputfile, sheet_name = "Post-Meter Summary", usecols = "A:AD", skiprows = 1, header = 0, nrows = 1)
colnames = names.columns.values
EPA_emi_postm_CH4 = pd.read_excel(EPA_post_meter_inputfile, sheet_name = "Post-Meter Summary", usecols = "A:AD", skiprows = 1, names = colnames, nrows = 6)
EPA_emi_postm_CH4.rename(columns={'Year':'Source'},inplace=True)
EPA_emi_postm_CH4 = EPA_emi_postm_CH4.drop(columns = [n for n in range(1990, start_year,1)])
EPA_emi_postm_CH4.iloc[:,1:] = EPA_emi_postm_CH4.iloc[:,1:]/1000 #convert from metric tons to kt
EPA_emi_postm_total = EPA_emi_postm_CH4[EPA_emi_postm_CH4['Source'] == 'Post-Meter Total']

display(EPA_emi_postm_CH4)
display(EPA_emi_postm_total)


#### 3.2. Split Emissions into Gridding Groups (each Group will have the same proxy applied during the state allocation/gridding)

In [None]:
#in units of kt

start_year_idx = EPA_emi_postm_CH4.columns.get_loc((start_year))
end_year_idx = EPA_emi_postm_CH4.columns.get_loc((end_year))+1
ghgi_pm_groups = ghgi_pm_map['GHGI_Emi_Group'].unique()
sum_emi = np.zeros([num_years])

DEBUG =1

for igroup in np.arange(0,len(ghgi_pm_groups)): #loop through all groups, finding the GHGI sources in that group and summing emissions for that region, year
    #print(igroup)
    vars()[ghgi_pm_groups[igroup]] = np.zeros([num_years])
    source_temp = ghgi_pm_map.loc[ghgi_pm_map['GHGI_Emi_Group'] == ghgi_pm_groups[igroup], 'GHGI_Source']
    pattern_temp  = '|'.join(source_temp) 
    #print(pattern_temp)
    emi_temp = EPA_emi_postm_CH4[EPA_emi_postm_CH4['Source'].str.contains(pattern_temp)]
    vars()[ghgi_pm_groups[igroup]][:] = np.where(emi_temp.iloc[:,start_year_idx:end_year_idx] =='',[0],emi_temp.iloc[:,start_year_idx:end_year_idx]).sum(axis=0)
    #print(emi_temp)    
        
#Check against total summary emissions 
print('QA/QC #1: Check Processing Emission Sum against GHGI Summary Emissions')
for iyear in np.arange(0,num_years): 
    for igroup in np.arange(0,len(ghgi_pm_groups)):
        sum_emi[iyear] += vars()[ghgi_pm_groups[igroup]][iyear]
        
    summary_emi = EPA_emi_postm_total.iloc[0,iyear+1]  
    #Check 1 - make sure that the sums from all the regions equal the totals reported
    diff1 = abs(sum_emi[iyear] - summary_emi)/((sum_emi[iyear] + summary_emi)/2)
    if DEBUG ==1:
        print(summary_emi)
        print(sum_emi[iyear])
    if diff1 < 0.0001:
        print('Year ', year_range[iyear],': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear],': FAIL (check Post-Meter Summary tab): ', diff1*100,'%') 
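Step 3.2 builds each gridding group by joining GHGI source names into a regular expression and summing the matching rows, then checks the group sums against the 'Post-Meter Total' row using a symmetric relative difference with a 0.01% tolerance. A minimal sketch with a toy table follows; the source and group names are hypothetical. `re.escape` is used here so that source names containing regex metacharacters (such as parentheses) are matched literally, which the notebook's plain `'|'.join(...)` pattern does not guarantee.

```python
import re
import numpy as np
import pandas as pd

def sum_group(emi, sources):
    """Sum the year columns of rows whose 'Source' contains any of `sources` (literal match)."""
    pattern = "|".join(re.escape(s) for s in sources)
    rows = emi[emi["Source"].str.contains(pattern, regex=True)]
    # Empty strings (missing values in the workbook) become NaN, then 0
    values = rows.drop(columns="Source").apply(pd.to_numeric, errors="coerce").fillna(0.0)
    return values.sum(axis=0).to_numpy()

def rel_diff(a, b):
    """Symmetric relative difference |a - b| / mean(a, b); defined as 0 when both are 0."""
    mean = (a + b) / 2
    return 0.0 if mean == 0 else abs(a - b) / mean

# Toy GHGI table (kt); "" marks a missing value
emi = pd.DataFrame({
    "Source": ["Residential", "Commercial", "CNG Vehicles (Light)", "Post-Meter Total"],
    2012: [10.0, 5.0, "", 15.0],
    2013: [11.0, 5.0, 3.0, 19.0],
})
groups = {"Emi_Res": ["Residential"], "Emi_Com": ["Commercial"],
          "Emi_CNG": ["CNG Vehicles (Light)"]}

group_sums = {g: sum_group(emi, s) for g, s in groups.items()}
calc = np.sum(list(group_sums.values()), axis=0)
summary = emi.loc[emi["Source"] == "Post-Meter Total"].iloc[0, 1:].to_numpy(dtype=float)

for year, c, s in zip([2012, 2013], calc, summary):
    diff = rel_diff(c, s)
    print(year, "PASS" if diff < 1e-4 else f"FAIL, diff = {diff * 100:.4f}%")
```

Using the mean of the two values as the denominator makes the check symmetric, so it does not matter which total is treated as the reference.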

--------------
## Step 4. Grid Data
-------------

#### Step 4.1. Allocate emissions

##### Step 4.1.1 Assign the Appropriate Proxy Variable Names (state & grid)

In [None]:
# The names on the *left* must match the 'State_Proxy_Group' (state proxies) and 'Proxy_Group' (grid proxies)
# names in the proxy mapping table (proxy_pm_map, initialized in Step 2).
# The names on the *right* are the variable names used to calculate the proxies in this code;
# they must match those from the code in Step 2.

#national --> state proxies (state x year)
state_indu_gas= sedsind_gas_state
state_residential = res_counts
state_commercial = com_counts
state_cng_vehicles = MOVES_cng

#national --> grid proxies (0.1x0.1)
Map_elec_gas = arp_gas_array
Map_elec_gas_nongrid = arp_gas_array_nongrid

#state --> grid proxies (0.01x0.01)
Map_indu = ghgrp_emi_array
Map_indu_nongrid = ghgrp_emi_array_nongrid
Map_population = np.zeros([area_map.shape[0], area_map.shape[1], num_years])

for iyear in np.arange(0,num_years):
    Map_population[:,:,iyear] = pop_den_map*area_map



##### Step 4.1.2 Allocate National EPA Emissions to the State-Level

In [None]:
# Calculate state-level emissions (kt) for each gridding group:
# State data = national GHGI emissions * state proxy/national total
#
# National emissions are retained for groups that do not have state proxies (identified in the mapping file)
# and are gridded in the next step
DEBUG =1

# Make placeholder emission arrays for each group
for igroup in np.arange(0,len(proxy_pm_map)):
    vars()['State_'+proxy_pm_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(State_ANSI),num_years])
    vars()['NonState_'+proxy_pm_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([num_years])
        
#Loop over years
for iyear in np.arange(num_years):
    #Loop over states
    for istate in np.arange(len(State_ANSI)):
        for igroup in np.arange(0,len(proxy_pm_map)):    
            if proxy_pm_map.loc[igroup,'State_Proxy_Group'] != '-' and proxy_pm_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
                vars()['State_'+proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear] =  vars()[proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                    data_fn.safe_div(vars()[proxy_pm_map.loc[igroup,'State_Proxy_Group']][istate,iyear], np.sum(vars()[proxy_pm_map.loc[igroup,'State_Proxy_Group']][:,iyear]))
            else:
                vars()['NonState_'+proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][iyear] = vars()[proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][iyear]
                
# Check sum of all gridded emissions + emissions not included in state allocation
print('QA/QC #2: Check state-allocated emissions against GHGI')   
for iyear in np.arange(0,num_years):
    summary_emi = EPA_emi_postm_total.iloc[0,iyear+1] 
    calc_emi = 0
    for igroup in np.arange(0,len(proxy_pm_map)):
        calc_emi +=  np.sum(vars()['State_'+proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][:,iyear])+\
            vars()['NonState_'+proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][iyear] #np.sum(Emissions[:,iyear]) + Emissions_nongrid[iyear] + Emissions_nonstate[iyear]
    if DEBUG ==1:
        print(summary_emi)
        print(calc_emi)
    diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if diff < 0.0001:
        print('Year ', year_range[iyear], ': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear], ': FAIL -- Difference = ', diff*100,'%')
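Step 4.1.2 splits each national total across states in proportion to the state proxy for that year. A sketch of the same arithmetic, vectorized over states and years, is below; `safe_div` here is a hypothetical stand-in for `data_fn.safe_div`, assumed to return 0 where the denominator is 0. The toy second year has an all-zero proxy, which shows how emissions can silently drop out of the allocation; that is the situation the QA/QC check above is designed to catch.

```python
import numpy as np

def safe_div(num, den):
    """Element-wise num/den returning 0 where den == 0 (hypothetical stand-in for data_fn.safe_div)."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    out = np.zeros(np.broadcast(num, den).shape)
    np.divide(num, den, out=out, where=(den != 0))
    return out

def allocate_to_states(national, proxy):
    """Split national emissions (n_years,) across states using proxy (n_states, n_years)."""
    shares = safe_div(proxy, proxy.sum(axis=0, keepdims=True))  # each year's shares sum to 1 (or 0)
    return shares * national[np.newaxis, :]

national_kt = np.array([10.0, 20.0])      # toy national totals for two years
proxy = np.array([[1.0, 0.0],             # toy customer counts: state A
                  [3.0, 0.0]])            # state B (second year has no proxy data)
state_kt = allocate_to_states(national_kt, proxy)
print(state_kt)
print("allocated:", state_kt.sum(axis=0), "national:", national_kt)
```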

##### Step 4.1.3 Allocate Emissions to the CONUS Region (0.1x0.1)

In [None]:
# Allocate State-Level emissions (kt) onto a 0.1x0.1 grid using gridcell level 'Proxy_Groups'

#Define emission arrays
#Emissions_array = np.zeros([area_map.shape[0],area_map.shape[1],num_years,num_months])
Emissions_array_01 = np.zeros([len(Lat_01),len(Lon_01),num_years])
Emissions_nongrid = np.zeros([num_years])

DEBUG=1 
# To speed up the code, masks are used rather than looping individually through each lat/lon.
# For each state, a mask is made with 1 in grid cells whose ANSI value matches that state and 0 elsewhere.
# For each year:
# (2a) If an emission group was previously allocated to the state level, the state emissions are distributed
#      over the 0.01x0.01 degree proxy inside the state mask (this resolution is needed for accurate state masks).
#      The resulting weights are regridded to 0.1x0.1 degrees, because looping through high-resolution
#      grids was prohibitively slow. AK, HI, territories, and states with no proxy signal are excluded at
#      this stage, and their emissions are recorded in the 'non-grid' array.
# (2b) For emission groups that were not first allocated to states, national emissions are gridded directly
#      using the relevant gridded proxy arrays (0.1x0.1 resolution).
# (2c) Emission groups that are 'not mapped' are recorded in the 'non-grid' array.


print('**QA/QC Check: Sum of national gridded emissions vs. GHGI national emissions')
#make emission group array to save later (0.1x0.1 degrees)
for igroup in np.arange(len(proxy_pm_map)):
    vars()['Ext_'+proxy_pm_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])

import calendar
for iyear in np.arange(0,num_years):
    print(iyear, 'year of',num_years)
    if calendar.isleap(int(year_range[iyear])):
        year_days = np.sum(month_day_leap)
        month_days = month_day_leap
    else:
        year_days = np.sum(month_day_nonleap)
        month_days = month_day_nonleap 
    running_count = 0
    
    #1. Step through each gridding group
    for igroup in np.arange(0,len(proxy_pm_map)):
        print(igroup, 'group of',len(proxy_pm_map))

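        # 1. Weight the proxy by the number of days in each month (monthly proxies) or in the year (annual proxies).
        #    This scales the proxy arrays in place; for annual proxies the constant factor cancels when normalized below.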
        proxy_temp = vars()[proxy_pm_map.loc[igroup,'Proxy_Group']]
        proxy_temp_nongrid = vars()[proxy_pm_map.loc[igroup,'Proxy_Group']+'_nongrid']
        if proxy_pm_map.loc[igroup,'Grid_Month_Flag'] ==1:
            for imonth in np.arange(0, num_months):
                proxy_temp[:,:,iyear,imonth] *= month_days[imonth]
                proxy_temp_nongrid[iyear,imonth] *= month_days[imonth]
        else:
            proxy_temp[:,:,iyear] *= np.sum(month_days)
            proxy_temp_nongrid[iyear] *= np.sum(month_days)
        ##DEBUG## print("group " + str(igroup) +' of '+ str(len(proxy_pm_map)))
                        
        
        #2a. If the group was previously allocated to the state level, step through each state
        if proxy_pm_map.loc[igroup,'State_Proxy_Group'] != '-' and proxy_pm_map.loc[igroup,'State_Proxy_Group'] != 'state_not_mapped':
            for istate in np.arange(0,len(State_ANSI)):
                mask_state = np.ma.ones(np.shape(state_ANSI_map))
                mask_state = np.ma.masked_where(state_ANSI_map != State_ANSI['ansi'][istate], mask_state)
                mask_state = np.ma.filled(mask_state,0)   
                if np.sum(mask_state*proxy_temp[:,:,iyear]) > 0 and State_ANSI['abbr'][istate] not in {'AK','HI'} and istate < 51: 
                    weighted_array = data_fn.safe_div(mask_state*proxy_temp[:,:,iyear], np.sum(mask_state*proxy_temp[:,:,iyear]))
                    weighted_array_01 = data_fn.regrid001_to_01(weighted_array, Lat_01, Lon_01)
                    #for imonth in np.arange(0,num_months):
                    grid_emi = vars()['State_'+proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear]*weighted_array_01
                    Emissions_array_01[:,:,iyear] += grid_emi
                    vars()['Ext_'+proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += grid_emi
                else: 
                    #for imonth in np.arange(0,num_months):
                    Emissions_nongrid[iyear] += vars()['State_'+proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear]
                ##DEBUG## running_count += np.sum(vars()['State_'+proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear,:])
                
                ##DEBUG## print(running_count)
                ##DEBUG## print(np.sum(Emissions_array_01[:,:,iyear,:]) +np.sum(Emissions_nongrid[iyear,:]))
            #print(igroup, np.sum(Emissions_array_01[:,:,iyear]))
            #print(Emissions_nongrid[iyear])
                
        #2b. If emissions were not allocated to states, allocate the national total to the grid here
        elif proxy_pm_map.loc[igroup,'State_Proxy_Group'] == '-':
            temp_sum = np.sum(vars()[proxy_pm_map.loc[igroup,'Proxy_Group']][:,:,iyear])+np.sum(vars()[proxy_pm_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear])
            #for imonth in np.arange(0,num_months):
            grid_emi = vars()[proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                data_fn.safe_div(vars()[proxy_pm_map.loc[igroup,'Proxy_Group']][:,:,iyear], temp_sum)
            Emissions_array_01[:,:,iyear] += grid_emi
            vars()['Ext_'+proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += grid_emi
            Emissions_nongrid[iyear] += vars()[proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][iyear] *\
                data_fn.safe_div(vars()[proxy_pm_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear], temp_sum)
            ##DEBUG## running_count += vars()[proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][iyear]
            #print(igroup, np.sum(Emissions_array_01[:,:,iyear]))
            #print(Emissions_nongrid[iyear])
                
        #2c. GHGI emissions that are not mapped (e.g., specified outside of CONUS in the GHGI) are added to the non-grid array
        elif proxy_pm_map.loc[igroup,'Proxy_Group'] == 'Map_not_mapped':    
            #for imonth in np.arange(0,num_months):
            Emissions_nongrid[iyear] += vars()[proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][iyear]
            ##DEBUG## running_count += vars()[proxy_pm_map.loc[igroup,'GHGI_Emi_Group']][iyear]
        ##DEBUG## print(running_count)
        ##DEBUG## print(np.sum(Emissions_array_01[:,:,iyear,:]) +np.sum(Emissions_nongrid[iyear,:]))
            #print(igroup, np.sum(Emissions_array_01[:,:,iyear]))
            #print(Emissions_nongrid[iyear])
            
    #Emissions_array_01[:,:,iyear,:] += data_fn.regrid001_to_01(Emissions_array[:,:,iyear,:], Lat_01, Lon_01) #covert to 10x10km
    calc_emi = np.sum(Emissions_array_01[:,:,iyear]) + np.sum(Emissions_nongrid[iyear]) 
    summary_emi = EPA_emi_postm_total.iloc[0,iyear+1] 
    emi_diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if DEBUG ==1:
        print(calc_emi)
        print(summary_emi)
    if abs(emi_diff) < 0.0001:
        print('Year '+ year_range_str[iyear]+': Difference < 0.01%: PASS')
    else: 
        print('Year '+ year_range_str[iyear]+': Difference > 0.01%: FAIL, diff: '+str(emi_diff))
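Step 4.1.3 distributes each state's emissions over the 0.01x0.01 degree proxy inside a state mask, then aggregates the weights to 0.1x0.1 degrees. The sketch below assumes `data_fn.regrid001_to_01` behaves like a sum over non-overlapping 10x10 blocks of fine cells aligned with the coarse grid; that is an assumption, not the notebook's implementation. `block_sum` and `grid_state_emissions` are hypothetical helpers, and the toy 20x20 grid is split into two "states".

```python
import numpy as np

def block_sum(fine, factor=10):
    """Aggregate a fine grid to a coarse grid by summing non-overlapping factor x factor blocks."""
    ny, nx = fine.shape
    if ny % factor or nx % factor:
        raise ValueError("grid shape must be divisible by factor")
    return fine.reshape(ny // factor, factor, nx // factor, factor).sum(axis=(1, 3))

def grid_state_emissions(state_emi, state_id_map, state_id, proxy, factor=10):
    """Distribute one state's emissions over its cells in proportion to `proxy`, then coarsen.

    Returns None when the state has no proxy signal (the notebook books these as non-gridded).
    """
    mask = (state_id_map == state_id).astype(float)   # 1 inside the state, 0 elsewhere
    weighted = mask * proxy
    total = weighted.sum()
    if total <= 0:
        return None
    return state_emi * block_sum(weighted / total, factor)

# Toy 20x20 fine grid: columns 0-9 belong to state 1, columns 10-19 to state 2
state_ids = np.tile(np.where(np.arange(20) < 10, 1, 2), (20, 1))
proxy = np.ones((20, 20))                               # uniform toy proxy (e.g., population)

coarse = grid_state_emissions(5.0, state_ids, 1, proxy)
print(coarse)                                           # 2x2 coarse grid
print(grid_state_emissions(5.0, state_ids, 3, proxy))   # state 3 is absent, so None
```

Normalizing before coarsening keeps the state total exact regardless of how the fine cells are grouped, which is why the grid sum plus the non-grid array can be checked against the GHGI total.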

##### Step 4.1.4 Save Gridded Emissions (kt)

In [None]:
#save gridded emissions for each gridding group - for extension

#Initialize file
data_IO_fn.initialize_netCDF(grid_emi_outputfile, netCDF_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)

# Convert to a list so positional indexing works after filtering out unmapped groups
unique_groups = proxy_pm_map.loc[proxy_pm_map['GHGI_Emi_Group'] != 'Emi_not_mapped', 'GHGI_Emi_Group'].tolist()

nc_out = Dataset(grid_emi_outputfile, 'r+', format='NETCDF4')
#nc_out.createDimension('state', len(State_ANSI))

for igroup in np.arange(0,len(unique_groups)):
    print('Ext_'+unique_groups[igroup])
    #print(len(np.shape(vars()['Ext_'+unique_groups[igroup]])))
    if len(np.shape(vars()['Ext_'+unique_groups[igroup]])) ==4:
        ghgi_temp = np.sum(vars()['Ext_'+unique_groups[igroup]],axis=3) #sum monthly data to annual
    else:
        ghgi_temp = vars()['Ext_'+unique_groups[igroup]][:,:,:]
        print(np.shape(ghgi_temp))
        print(np.sum(ghgi_temp[:,:,0]))

    # Write data to netCDF
    data_out = nc_out.createVariable('Ext_'+unique_groups[igroup], 'f8', ('lat', 'lon','year'), zlib=True)
    data_out[:,:,:] = ghgi_temp[:,:,:]

#save nongrid data to calculate non-grid fraction extension
data_out = nc_out.createVariable('Emissions_nongrid', 'f8', ('year',), zlib=True)
data_out[:] = Emissions_nongrid[:]
nc_out.close()

#Confirm file location
print('** SUCCESS **')
print("Gridded emissions (kt) written to file: {}".format(os.path.abspath(grid_emi_outputfile)))
print(' ')

del data_out, ghgi_temp, nc_out

#### Step 4.2. Calculate Gridded Emission Fluxes (molec./cm2/s) (0.1x0.1)

In [None]:
#Convert emissions to emission flux
# conversion: kt emissions to molec/cm2/s flux

Flux_array_01_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])
print('**QA/QC Check: Sum of national gridded emissions vs. GHGI national emissions')
  
import calendar
for iyear in np.arange(0,num_years):
    calc_emi = 0
    if calendar.isleap(int(year_range[iyear])):
        year_days = np.sum(month_day_leap)
    else:
        year_days = np.sum(month_day_nonleap)

    conversion_factor_01 = 10**9 * Avogadro / float(Molarch4 *year_days * 24 * 60 *60) / area_matrix_01
    Flux_array_01_annual[:,:,iyear] = Emissions_array_01[:,:,iyear]*conversion_factor_01
    #convert back to mass to check
    calc_emi = np.sum(Flux_array_01_annual[:,:,iyear]/conversion_factor_01)+np.sum(Emissions_nongrid[iyear])
    summary_emi = EPA_emi_postm_total.iloc[0,iyear+1]
    emi_diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if DEBUG==1:
        print(calc_emi)
        print(summary_emi)
    if abs(emi_diff) < 0.0001:
        print('Year '+ year_range_str[iyear]+': Difference < 0.01%: PASS')
    else: 
        print('Year '+ year_range_str[iyear]+': Difference > 0.01%: FAIL, diff: '+str(emi_diff))
        
Flux_Emissions_Total_annual = Flux_array_01_annual
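Step 4.2 converts kt/year to molecules/cm2/s with the factor 10^9 g/kt × N_A / (M_CH4 × seconds in the year) / cell area. A sketch with explicit constants follows; the molar mass (16.04 g/mol), the Earth radius, and the spherical cell-area formula are assumptions here, since the notebook's `Molarch4` and `area_matrix_01` are defined elsewhere. `calendar.isleap` selects 365 or 366 days.

```python
import calendar
import numpy as np

AVOGADRO = 6.02214076e23   # molecules per mol (exact SI value)
M_CH4 = 16.04              # g/mol, assumed molar mass of CH4
R_EARTH_CM = 6.371e8       # cm, assumed mean Earth radius

def cell_area_cm2(lat_south, dlat, dlon):
    """Area of a lat/lon cell on a sphere: R^2 * dlon * (sin(lat_north) - sin(lat_south))."""
    lat_s = np.radians(lat_south)
    lat_n = np.radians(lat_south + dlat)
    return R_EARTH_CM**2 * np.radians(dlon) * (np.sin(lat_n) - np.sin(lat_s))

def seconds_in_year(year):
    return (366 if calendar.isleap(year) else 365) * 24 * 60 * 60

def kt_to_flux(emi_kt, area_cm2, year):
    """kt CH4 per year -> molecules CH4 / cm2 / s (1 kt = 1e9 g)."""
    return emi_kt * 1e9 * AVOGADRO / (M_CH4 * seconds_in_year(year)) / area_cm2

def flux_to_kt(flux, area_cm2, year):
    """Inverse of kt_to_flux, used for the mass-conservation QA/QC check."""
    return flux * area_cm2 * M_CH4 * seconds_in_year(year) / (1e9 * AVOGADRO)

area = cell_area_cm2(40.0, 0.1, 0.1)   # one 0.1x0.1 degree cell spanning 40-40.1 N
flux = kt_to_flux(1.0, area, 2016)     # 1 kt emitted in a leap year
print(f"cell area = {area:.3e} cm2, flux = {flux:.3e} molec/cm2/s")
print("round trip (kt):", flux_to_kt(flux, area, 2016))
```

Because the conversion is a single multiplicative factor per cell and year, dividing the flux by the same factor recovers the mass exactly, which is what the QA/QC loop above relies on.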

-------------
## Step 5. Write netCDF
------------

In [None]:
# yearly data
#Initialize file
data_IO_fn.initialize_netCDF(gridded_outputfile, netCDF_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)

# Write data to netCDF
nc_out = Dataset(gridded_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Total_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded post meter emissions written to file: {}".format(os.path.abspath(gridded_outputfile)))

----------
## Step 6. Plot Gridded Data
---------

#### Step 6.1. Plot Annual Emission Fluxes

In [None]:
#Plot Annual Data
scale_max = 5
save_flag = 0
save_outfile = ''
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Total_annual, Lat_01, Lon_01, year_range, title_str,scale_max,save_flag,save_outfile)

#### Step 6.2 Plot Difference between first and last inventory year

In [None]:
# Plot difference between last and first year
save_flag = 0
save_outfile = ''
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Total_annual, Lat_01, Lon_01, year_range, title_diff_str,save_flag,save_outfile)

In [None]:
ct = datetime.datetime.now() 
ft = ct.timestamp() 
time_elapsed = (ft-it)/(60*60)
print('Time to run: '+str(time_elapsed)+' hours')
print('** GEPA_Post_Meter_Supplement: COMPLETE **')