# Gridded EPA Methane Inventory
## Category: 1B2a Petroleum Systems

***
#### Authors: 
Joannes D. Maasakkers, Erin E. McDuffie
#### Date Last Updated: 
See Step 0
#### Notebook Purpose: 
This notebook calculates and reports annual and monthly gridded methane emission fluxes (molec./cm2/s) from Petroleum Systems (production, transport, and refining segments) in the CONUS region for 2012-2018.
#### Summary & Notes:
National-level EPA GHGI Petroleum Systems emissions are read in from the GHGI Petroleum Systems workbook. Emissions are then distributed onto a 0.1 x 0.1 degree grid by emission group. The activity/proxy data used to spatially distribute each group's emissions include well locations and production levels from Enverus (DI and Prism), Greenhouse Gas Reporting Program (GHGRP) refinery emissions, and BOEM GOADS and BSEE platform emissions and location data for Federal Offshore emissions. Emissions are calculated by month, largely based on whether a well was producing in a given month (e.g., from Enverus/BOEM). Proxy data that are only available annually are allocated evenly across months. Both monthly and annual emission fluxes (molec./cm2/s) are written to final netCDFs in the ‘/code/Final_Gridded_Data/’ folder. Individual data files are written for total petroleum systems, as well as for the exploration & production, oil transport, and refining segments.
***
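
As a rough illustration of the flux units reported by this notebook, the sketch below converts an emission mass in kt of CH4 into molec./cm2/s for a single grid cell. The helper `kt_to_flux` and the example values are hypothetical and are not part of the notebook's modules; the constants match those defined in Step 0.

```python
# Constants with the same values as those defined in Step 0
AVOGADRO = 6.02214129e23   # molecules/mol
MOLAR_CH4 = 16.04          # g/mol

def kt_to_flux(emis_kt, area_cm2, n_days):
    """Convert an emission mass (kt CH4) emitted over n_days to molec./cm2/s.

    Hypothetical helper for illustration only.
    """
    grams = emis_kt * 1e9                      # kt -> g
    molecules = grams / MOLAR_CH4 * AVOGADRO   # g -> mol -> molecules
    seconds = n_days * 24 * 60 * 60            # days -> s
    return molecules / area_cm2 / seconds

# Example: 1 kt emitted over a 365-day year from a 100 km2 (1e12 cm2) cell
flux = kt_to_flux(1.0, 1e12, 365)
print(f"{flux:.3e} molec./cm2/s")
```

For monthly fluxes, `n_days` would be the number of days in the month (taking leap years into account) rather than the length of the year.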

-------
## Step 0. Set-Up Notebook Modules, Functions, and Local Parameters and Constants
_____

In [None]:
#Confirm working directory
import os
import time
modtime = os.path.getmtime('./1B2a_Petroleum_Systems.ipynb')
modificationTime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(modtime))
print("This file was last modified on: ", modificationTime)
print('')
print("The directory we are working in is {}" .format(os.getcwd()))

In [None]:
## Include plots within notebook
%matplotlib inline

In [None]:
# Import base modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import re
import pyodbc
import PyPDF2 as pypdf
import tabula as tb
from datetime import datetime
from copy import copy
import shapefile as shp

# Import additional modules
from mpl_toolkits.basemap import Basemap

# Load netCDF (for manipulating netCDF file types)
from netCDF4 import Dataset

# Set up ticker
import matplotlib.ticker as ticker

#add path for the global function module (file)
import sys
module_path = os.path.abspath(os.path.join('../Global_Functions/'))
if module_path not in sys.path:
    sys.path.append(module_path)

# Load functions
import data_load_functions as data_load_fn
import data_functions as data_fn
import data_IO_functions as data_IO_fn
import data_plot_functions as data_plot_fn

In [None]:
## SPECIFY RECALS ##

# Specify which sections to re-calculate or load from previously saved arrays
# for time saving purposes
#0 = load from saved files, 1 = re-calculate

# 1) ReCalc Enverus Production Data?
ReCalc_Enverus =1

# 2) ReCalc Offshore GOADS Data
ReCalc_GOADS = 0

# 3) Re-Calc NEI Indiana and Illinois data?
ReCalc_NEI = 0

In [None]:
#INPUT Files
# Assign global file names
global_filenames = data_load_fn.load_global_file_names()
State_ANSI_inputfile = global_filenames[0]
#County_ANSI_inputfile = global_filenames[1]
#pop_map_inputfile = global_filenames[2]
Grid_area01_inputfile = global_filenames[3]
Grid_area001_inputfile = global_filenames[4]
Grid_state001_ansi_inputfile = global_filenames[5]
#Grid_county001_ansi_inputfile = global_filenames[6]
globalinputlocation = global_filenames[0][0:20]
print(globalinputlocation)

# EPA Inventory Data
EPA_Petr_inputfile = globalinputlocation+'GHGI/Ch3_Energy/PetroleumSystems_1990-2018_2020-04-11.xlsx'

#proxy mapping file
Petr_Mapping_inputfile = './InputData/Petroleum_ProxyMapping.xlsx'

#NEI grid reference
NEI_grid_ref_inputfile = globalinputlocation+'Gridded/NEI_Reference_Grid_LCC_to_WGS84_latlon.shp'

#ERG/NEI Spatial Surrogate Data
ERG_NEI_inputloc = globalinputlocation+'NEI/ERG_ILINData/CONUS_SA_FILES_'
ERG_NEI_inputloc_2018 = globalinputlocation+'NEI/ERG_ILINData/IL_IN_ALLOCATED_WELL_LEVEL_DATA_2018_2019/IL_IN_WELL_LEVEL_DATA.accdb'

#ERG Processed Well Count and Production Notebook
Enverus_WellCounts_inputfile = globalinputlocation+'Enverus/Enverus DrillingInfo Processing - Well Counts_2021-03-17.xlsx'
Enverus_WellProd_inputfile = globalinputlocation+'Enverus/Enverus DrillingInfo Processing - Well Prod_2021-03-17.xlsx'

#Activity Data
Enverus_Prism_inputdata_2019 = globalinputlocation+ 'Enverus/Production/prism_monthly_2019_110221.csv'
Enverus_Prism_inputdata_2018 = globalinputlocation+ 'Enverus/Production/prism_monthly_2018_110221.csv'
Enverus_Prism_inputdata_2017 = globalinputlocation+ 'Enverus/Production/prism_monthly_2017_110221.csv'
Enverus_Prism_inputdata_2016 = globalinputlocation+ 'Enverus/Production/prism_monthly_2016_110221.csv'
Enverus_Prism_inputdata_2015 = globalinputlocation+ 'Enverus/Production/prism_monthly_2015_110221.csv'
Enverus_Prism_inputdata_2014 = globalinputlocation+ 'Enverus/Production/prism_monthly_2014_110221.csv'
Enverus_Prism_inputdata_2013 = globalinputlocation+ 'Enverus/Production/prism_monthly_2013_110221.csv'
Enverus_Prism_inputdata_2012 = globalinputlocation+ 'Enverus/Production/prism_monthly_2012_110221.csv'

Enverus_DI_inputdata_2019 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2019_102621.csv'
Enverus_DI_inputdata_2018 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2018_102621.csv'
Enverus_DI_inputdata_2017 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2017_102621.csv'
Enverus_DI_inputdata_2016 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2016_102621.csv'
Enverus_DI_inputdata_2015 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2015_102621.csv'
Enverus_DI_inputdata_2014 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2014_102621.csv'
Enverus_DI_inputdata_2013 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2013_102621.csv'
Enverus_DI_inputdata_2012 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2012_102621.csv'

# Offshore GOADS Data
GOADS_11_inputfile = globalinputlocation+'BOEM/2011_Gulfwide_Platform_Inventory.accdb'
GOADS_14_inputfile = globalinputlocation+'BOEM/2014_Gulfwide_Platform_Inventory_20161102.accdb'
GOADS_17_inputfile = globalinputlocation+'BOEM/2017_Gulfwide_Platform_Inventory_20190705_CAP_GHG.accdb'
ERG_GOADSEmissions_inputfile = globalinputlocation+'BOEM/BOEM GEI Emissions Data_EmissionSource_2020-03-11.xlsx'
#BSEE Pacific Data
BSEE_platformloc_inputdata = "./InputData/platlocpacdelimit.txt"
BSEE_platformmaster_inputdata = "./InputData/platmastpacdelimit.txt"
BSEE_prod_2012_inputdata = "./InputData/ogor2012pacdelimit.txt"
BSEE_prod_2013_inputdata = "./InputData/ogor2013pacdelimit.txt"
BSEE_prod_2014_inputdata = "./InputData/ogor2014pacdelimit.txt"
BSEE_prod_2015_inputdata = "./InputData/ogor2015pacdelimit.txt"
BSEE_prod_2016_inputdata = "./InputData/ogor2016pacdelimit.txt"
BSEE_prod_2017_inputdata = "./InputData/ogor2017pacdelimit.txt"
BSEE_prod_2018_inputdata = "./InputData/ogor2018pacdelimit.txt"

#GHGRP data
GHGRP_facility_inputfile = './InputData/ghgrp_facility_info.CSV'
ghgrp_refinery_inputfile = './InputData/SubpartY_PetrRefinery_Emissions.CSV' 


#OUTPUT FILES
#Total Petroleum Systems
gridded_outputfile = '../Final_Gridded_Data/EPA_v2_1B2a_Petroleum_Systems.nc'
gridded_monthly_outputfile = '../Final_Gridded_Data/EPA_v2_1B2a_Petroleum_Systems_Monthly.nc'
netCDF_description = 'Gridded EPA Inventory - Total Petroleum Systems Emissions - IPCC Source Category 1B2a'
netCDF_description_m = 'Gridded EPA Inventory - Total Monthly Petroleum Systems Emissions - IPCC Source Category 1B2a'
title_str = "EPA methane emissions from petroleum systems"
title_diff_str = "Emissions from petroleum systems difference: 2018-2012"

#Exploration
gridded_expl_outputfile = '../Final_Gridded_Data/EPA_v2_1B2a_Petroleum_Systems_Exploration.nc'
gridded_monthly_expl_outputfile = '../Final_Gridded_Data/EPA_v2_1B2a_Petroleum_Systems_Exploration_Monthly.nc'
netCDF_description_expl = 'Gridded EPA Inventory - Petroleum Systems Emissions - IPCC Source Category 1B2a - Exploration'
netCDF_description_expl_m = 'Gridded EPA Inventory - Monthly Petroleum Systems Emissions - IPCC Source Category 1B2a - Exploration'
title_expl_str = "EPA methane emissions from exploration"
title_diff_expl_str = "Emissions from exploration difference: 2018-2012"

#Production
gridded_prod_outputfile = '../Final_Gridded_Data/EPA_v2_1B2a_Petroleum_Systems_Production.nc'
gridded_monthly_prod_outputfile = '../Final_Gridded_Data/EPA_v2_1B2a_Petroleum_Systems_Production_Monthly.nc'
netCDF_description_prod = 'Gridded EPA Inventory - Petroleum Systems Emissions - IPCC Source Category 1B2a - Production'
netCDF_description_prod_m = 'Gridded EPA Inventory - Monthly Petroleum Systems Emissions - IPCC Source Category 1B2a - Production'
title_prod_str = "EPA methane emissions from production"
title_diff_prod_str = "Emissions from production difference: 2018-2012"

#Oil Transport
gridded_trans_outputfile = '../Final_Gridded_Data/EPA_v2_1B2a_Petroleum_Systems_Transport.nc'
gridded_monthly_trans_outputfile = '../Final_Gridded_Data/EPA_v2_1B2a_Petroleum_Systems_Transport_Monthly.nc'
netCDF_description_trans = 'Gridded EPA Inventory - Petroleum Systems Emissions - IPCC Source Category 1B2a - Oil Transport'
netCDF_description_trans_m = 'Gridded EPA Inventory - Monthly Petroleum Systems Emissions - IPCC Source Category 1B2a - Oil Transport'
title_trans_str = "EPA methane emissions from oil transport"
title_diff_trans_str = "Emissions from oil transport difference: 2018-2012"

#Oil Refining
gridded_ref_outputfile = '../Final_Gridded_Data/EPA_v2_1B2a_Petroleum_Systems_Refining.nc'
gridded_monthly_ref_outputfile = '../Final_Gridded_Data/EPA_v2_1B2a_Petroleum_Systems_Refining_Monthly.nc'
netCDF_description_ref = 'Gridded EPA Inventory - Petroleum Systems Emissions - IPCC Source Category 1B2a - Refining'
netCDF_description_ref_m = 'Gridded EPA Inventory - Monthly Petroleum Systems Emissions - IPCC Source Category 1B2a - Refining'
title_ref_str = "EPA methane emissions from refining"
title_diff_ref_str = "Emissions from refining difference: 2018-2012"

#output gridded proxy data
grid_emi_outputfile = '../Final_Gridded_Data/Extension/v2_input_data/Petroleum_Grid_Emi.nc'
grid_emi_prod_outputfile = '../Final_Gridded_Data/Extension/v2_input_data/Petr_Production_Grid_Emi.nc'
grid_emi_trans_outputfile = '../Final_Gridded_Data/Extension/v2_input_data/Petr_Transport_Grid_Emi.nc'
grid_emi_ref_outputfile = '../Final_Gridded_Data/Extension/v2_input_data/Petr_Refining_Grid_Emi.nc'

In [None]:
# Define local variables
start_year = 2012  #First year in emission timeseries
end_year = 2018    #Last year in emission timeseries
year_range = [*range(start_year, end_year+1,1)] #List of emission years
year_range_str=[str(i) for i in year_range]
num_years = len(year_range)

# Define constants
Avogadro   = 6.02214129 * 10**(23)  #molecules/mol
Molarch4   = 16.04                  #g/mol
Res01      = 0.1                    # degrees

# Continental US Lat/Lon Limits (for netCDF files)
Lon_left = -130       #deg
Lon_right = -60       #deg
Lat_low  = 20         #deg
Lat_up  = 55          #deg
loc_dimensions = [Lat_low, Lat_up, Lon_left, Lon_right]

ilat_start = int((90+Lat_low)/Res01) #1100:1450 (continental US range)
ilat_end = int((90+Lat_up)/Res01)
ilon_start = abs(int((-180-Lon_left)/Res01)) #500:1200 (continental US range)
ilon_end = abs(int((-180-Lon_right)/Res01))

# Number of days in each month
month_day_leap  = [  31,  29,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]
month_day_nonleap = [  31,  28,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]
month_tag = ['01','02','03','04','05','06','07','08','09','10','11','12']
month_dict = {'January':1, 'February':2,'March':3,'April':4,'May':5,'June':6, 'July':7,'August':8,'September':9,'October':10,\
             'November':11,'December':12}

# Month arrays
month_range_str = ['January','February','March','April','May','June','July','August','September','October','November','December']
num_months = len(month_range_str)
num_regions = 7

In [None]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999;
//prevent auto-scrolling

In [None]:
# Track run time
ct = datetime.now() 
it = ct.timestamp() 
print("current time:", ct) 

____
## Step 1. Load in State ANSI data, NEMS definitions, and Area Maps
_____
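
This step aggregates the 0.01 degree grid-cell area map to 0.1 degrees with `data_fn.regrid001_to_01`, whose implementation is not shown in this notebook. As a sketch of one way such an aggregation could work, assuming each 0.1 degree cell covers exactly a 10 x 10 block of 0.01 degree cells, the block sum can be written with a reshape. The function name `block_sum` and the toy array are hypothetical.

```python
import numpy as np

def block_sum(fine, factor=10):
    """Sum non-overlapping factor x factor blocks of a 2-D array."""
    nlat, nlon = fine.shape
    if nlat % factor or nlon % factor:
        raise ValueError("array shape must be divisible by factor")
    blocks = fine.reshape(nlat // factor, factor, nlon // factor, factor)
    return blocks.sum(axis=(1, 3))

fine_area = np.ones((20, 30))         # toy fine-resolution "area" map (m2)
coarse_area = block_sum(fine_area)    # coarse map, shape (2, 3)
coarse_area_cm2 = coarse_area * 1e4   # m2 -> cm2, as done in this step
```

Summing (rather than averaging) is the appropriate choice for an extensive quantity such as area, because the total over the domain is preserved.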

In [None]:
# State-level ANSI Data
#Read the state ANSI file array
State_ANSI, name_dict, abbr_dict = data_load_fn.load_state_ansi(State_ANSI_inputfile)[0:3]
#QA: number of states
print('Read input file: '+ f"{State_ANSI_inputfile}")
print('Total "States" found: ' + '%.0f' % len(State_ANSI))
print(' ')

#County ANSI Data
#Includes State ANSI number, county ANSI number, county name, and county area (square miles)
#pd_counties = pd.read_csv(County_ANSI_inputfile,encoding='latin-1')

#QA: number of counties
#print ('Read input file: ' + f"{County_ANSI_inputfile}")
#print('Total "Counties" found (include PR): ' + '%.0f' % len(pd_counties))
#print(' ')

#Create a placeholder array for county data
#county_array = np.zeros([len(pd_counties),3])

#Populate array with State ANSI number (0), county ANSI number (1), and county area (2)
#for icounty in np.arange(0,len(pd_counties)):
#    county_array[icounty,0] = int(pd_counties.values[icounty,0])
#    county_array[icounty,1] = int(pd_counties.values[icounty,1])
#    county_array[icounty,2] = pd_counties.values[icounty,3]
    
# 0.01 x0.01 degree Data
# State ANSI IDs and grid cell area (m2) maps
state_ANSI_map = data_load_fn.load_state_ansi_map(Grid_state001_ansi_inputfile)
#county_ANSI_map = data_load_fn.load_county_ansi_map(Grid_county001_ansi_inputfile)
#county_ANSI_map = county_ANSI_map.astype('int32')
area_map, lat001, lon001 = data_load_fn.load_area_map_001(Grid_area001_inputfile)

# 0.1 x0.1 degree data
# grid cell area and state ANSI maps
Lat01, Lon01 = data_load_fn.load_area_map_01(Grid_area01_inputfile)[1:3]
#Select relevant Continental 0.1 x0.1 domain
Lat_01 = Lat01[ilat_start:ilat_end]
Lon_01 = Lon01[ilon_start:ilon_end]
area_matrix_01 = data_fn.regrid001_to_01(area_map, Lat_01, Lon_01)
area_matrix_01 *= 10000  #convert from m2 to cm2
#state_ANSI_map_01 = data_fn.regrid001_to_01(state_ANSI_map, Lat_01, Lon_01)

# Print time
ct = datetime.now() 
print("current time:", ct) 

-------------
## Step 2: Read-in and Format Proxy Data
-------------

### Step 2.1 Read In Proxy Mapping File & Make Proxy Arrays
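
The cell below creates one zero-filled array per proxy group by assigning names into `vars()`. As an alternative sketch, the same arrays can be held in a dictionary keyed by group name, which avoids creating variables dynamically. The DataFrame contents, group names, and grid dimensions here are made up for illustration; only the column names follow the mapping file.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a 'Proxy Map' sheet; the group names are hypothetical
proxy_map = pd.DataFrame({
    "GHGI_Emi_Group": ["Emi_A", "Emi_B"],
    "Proxy_Group": ["Map_A", "Map_B"],
    "Month_Flag": [1, 0],
})
n_lat, n_lon, n_years, n_months = 4, 5, 7, 12

proxy_arrays = {}
for row in proxy_map.itertuples(index=False):
    shape = (n_lat, n_lon, n_years)
    if row.Month_Flag == 1:
        shape += (n_months,)          # add a month axis for monthly proxies
    proxy_arrays[row.Proxy_Group] = np.zeros(shape)
    # off-grid totals keep only the time axes (year, and month if present)
    proxy_arrays[row.Proxy_Group + "_nongrid"] = np.zeros(shape[2:])
```

An array is then retrieved with, for example, `proxy_arrays["Map_A"]`.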

In [None]:
#load GHGI Mapping Groups
names = pd.read_excel(Petr_Mapping_inputfile, sheet_name = "GHGI Map - E&P", usecols = "A:B",skiprows = 1, header = 0)
colnames = names.columns.values
ghgi_prod_map = pd.read_excel(Petr_Mapping_inputfile, sheet_name = "GHGI Map - E&P", usecols = "A:B", skiprows = 2, names = colnames)
#drop rows with no data, remove the parentheses and ""
ghgi_prod_map = ghgi_prod_map[ghgi_prod_map['GHGI_Emi_Group'] != 'na']
ghgi_prod_map = ghgi_prod_map[ghgi_prod_map['GHGI_Emi_Group'].notna()]
ghgi_prod_map['GHGI_Source']= ghgi_prod_map['GHGI_Source'].str.replace("(","- ",regex=False)
ghgi_prod_map['GHGI_Source']= ghgi_prod_map['GHGI_Source'].str.replace(")","",regex=False)
ghgi_prod_map['GHGI_Source']= ghgi_prod_map['GHGI_Source'].str.replace('"',"",regex=False)
ghgi_prod_map.reset_index(inplace=True, drop=True)
display(ghgi_prod_map)

#load emission group - proxy map
names = pd.read_excel(Petr_Mapping_inputfile, sheet_name = "Proxy Map - E&P", usecols = "A:C",skiprows = 1, header = 0)
colnames = names.columns.values
proxy_prod_map = pd.read_excel(Petr_Mapping_inputfile, sheet_name = "Proxy Map - E&P", usecols = "A:C", skiprows = 1, names = colnames)
display((proxy_prod_map))

#create empty proxy and emission group arrays (add months for proxy variables that have monthly data)
for igroup in np.arange(0,len(proxy_prod_map)):
    if proxy_prod_map.loc[igroup, 'Month_Flag'] == 1:
        vars()[proxy_prod_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
        vars()[proxy_prod_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years,num_months])
        vars()[ghgi_prod_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
    else:
        vars()[proxy_prod_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        vars()[proxy_prod_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years])
        vars()[ghgi_prod_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        
#Transport
#load GHGI Mapping Groups
names = pd.read_excel(Petr_Mapping_inputfile, sheet_name = "GHGI Map - Trans", usecols = "A:B",skiprows = 1, header = 0)
colnames = names.columns.values
ghgi_trans_map = pd.read_excel(Petr_Mapping_inputfile, sheet_name = "GHGI Map - Trans", usecols = "A:B", skiprows = 2, names = colnames)
#drop rows with no data, remove the parentheses and ""
ghgi_trans_map = ghgi_trans_map[ghgi_trans_map['GHGI_Emi_Group'] != 'na']
ghgi_trans_map = ghgi_trans_map[ghgi_trans_map['GHGI_Emi_Group'].notna()]
ghgi_trans_map['GHGI_Source']= ghgi_trans_map['GHGI_Source'].str.replace("(","- ",regex=False)
ghgi_trans_map['GHGI_Source']= ghgi_trans_map['GHGI_Source'].str.replace(")","",regex=False)
ghgi_trans_map['GHGI_Source']= ghgi_trans_map['GHGI_Source'].str.replace('"',"",regex=False)
ghgi_trans_map.reset_index(inplace=True, drop=True)
display(ghgi_trans_map)

#load emission group - proxy map
names = pd.read_excel(Petr_Mapping_inputfile, sheet_name = "Proxy Map - Trans", usecols = "A:C",skiprows = 1, header = 0)
colnames = names.columns.values
proxy_trans_map = pd.read_excel(Petr_Mapping_inputfile, sheet_name = "Proxy Map - Trans", usecols = "A:C", skiprows = 1, names = colnames)
display(proxy_trans_map)

#create empty proxy and emission group arrays (add months for proxy variables that have monthly data)
for igroup in np.arange(0,len(proxy_trans_map)):
    if proxy_trans_map.loc[igroup, 'Month_Flag'] == 1:
        vars()[proxy_trans_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
        vars()[proxy_trans_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years,num_months])
        vars()[ghgi_trans_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
    else:
        vars()[proxy_trans_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        vars()[proxy_trans_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years])
        vars()[ghgi_trans_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        
#Refining
#load GHGI Mapping Groups
names = pd.read_excel(Petr_Mapping_inputfile, sheet_name = "GHGI Map - Ref", usecols = "A:B",skiprows = 1, header = 0)
colnames = names.columns.values
ghgi_ref_map = pd.read_excel(Petr_Mapping_inputfile, sheet_name = "GHGI Map - Ref", usecols = "A:B", skiprows = 2, names = colnames)
#drop rows with no data, remove the parentheses and ""
ghgi_ref_map = ghgi_ref_map[ghgi_ref_map['GHGI_Emi_Group'] != 'na']
ghgi_ref_map = ghgi_ref_map[ghgi_ref_map['GHGI_Emi_Group'].notna()]
ghgi_ref_map['GHGI_Source']= ghgi_ref_map['GHGI_Source'].str.replace("(","- ",regex=False)
ghgi_ref_map['GHGI_Source']= ghgi_ref_map['GHGI_Source'].str.replace(")","",regex=False)
ghgi_ref_map['GHGI_Source']= ghgi_ref_map['GHGI_Source'].str.replace('"',"",regex=False)
ghgi_ref_map.reset_index(inplace=True, drop=True)
display(ghgi_ref_map)

#load emission group - proxy map
names = pd.read_excel(Petr_Mapping_inputfile, sheet_name = "Proxy Map - Ref", usecols = "A:C",skiprows = 1, header = 0)
colnames = names.columns.values
proxy_ref_map = pd.read_excel(Petr_Mapping_inputfile, sheet_name = "Proxy Map - Ref", usecols = "A:C", skiprows = 1, names = colnames)
display(proxy_ref_map)

#create empty proxy and emission group arrays (add months for proxy variables that have monthly data)
for igroup in np.arange(0,len(proxy_ref_map)):
    if proxy_ref_map.loc[igroup, 'Month_Flag'] == 1:
        vars()[proxy_ref_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
        vars()[proxy_ref_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years,num_months])
        vars()[ghgi_ref_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
    else:
        vars()[proxy_ref_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        vars()[proxy_ref_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years])
        vars()[ghgi_ref_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        
#create empty arrays that will be used as part of calculations, but will be renamed or combined before the final mapping
#Enverus state pacific
Map_StatePacOffshore =  np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])


### Step 2.2 Read In GHGRP Data
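
The cell below loops over facilities, converts each latitude/longitude to a 0.1 degree grid index, and accumulates emissions either on the grid or in an off-grid total. A vectorized sketch of the same binning for a single year is shown here, using `np.add.at` so that facilities sharing a cell are summed. The facility coordinates and emissions are made-up values.

```python
import numpy as np

lat_low, lat_up, lon_left, lon_right, res = 20, 55, -130, -60, 0.1
n_lat = int(round((lat_up - lat_low) / res))      # 350 rows
n_lon = int(round((lon_right - lon_left) / res))  # 700 columns

# Toy facilities (hypothetical): latitude, longitude, emissions in metric tonnes
lat = np.array([29.75, 29.76, 61.2])
lon = np.array([-95.35, -95.36, -149.9])
emis_t = np.array([1000.0, 500.0, 200.0])

# Same strict bounds test as the loop in the cell below
on_grid = (lon > lon_left) & (lon < lon_right) & (lat > lat_low) & (lat < lat_up)
ilat = ((lat[on_grid] - lat_low) / res).astype(int)
ilon = ((lon[on_grid] - lon_left) / res).astype(int)

grid_kt = np.zeros((n_lat, n_lon))
np.add.at(grid_kt, (ilat, ilon), emis_t[on_grid] / 1000)  # tonnes -> kt
offgrid_kt = emis_t[~on_grid].sum() / 1000
```

`np.add.at` is used instead of `grid_kt[ilat, ilon] += ...` because the latter applies only one update when the same index pair appears more than once.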

In [None]:
# GHGRP Emissions from Refineries (subpart Y, units: metric tonnes, converted to kt)
# Also read in GHGRP facility location information and match facilities in the emissions dataset
# with facility location information in the facilities dataset, based on matching facility IDs.

# Make Map_GHGRPRefineries (will be assigned to proxy map variable later)
Map_GHGRPRefineries = np.zeros([len(Lat_01),len(Lon_01),num_years])
Map_GHGRPRefineries_nongrid = np.zeros([num_years])

facility_info = pd.read_csv(GHGRP_facility_inputfile)
ghgrp_facility_emissions = pd.read_csv(ghgrp_refinery_inputfile)
#filter for methane only
ghgrp_facility_emissions = ghgrp_facility_emissions[ghgrp_facility_emissions['Y_SUBPART_LEVEL_INFORMATION.GHG_NAME']=='Methane']
ghgrp_facility_emissions.reset_index(drop=True,inplace=True)

ghgrp_facility_emissions['Lat'] = 0
ghgrp_facility_emissions['Lon'] = 0

#find facility lat and lon based on matching facility ID
for ifacility in np.arange(0,len(ghgrp_facility_emissions)):
    ilocation = np.where(facility_info['V_GHG_EMITTER_FACILITIES.FACILITY_ID'] == ghgrp_facility_emissions['Y_SUBPART_LEVEL_INFORMATION.FACILITY_ID'][ifacility])[0][0]
    ghgrp_facility_emissions.loc[ifacility,'Lat'] = facility_info['V_GHG_EMITTER_FACILITIES.LATITUDE'][ilocation]
    ghgrp_facility_emissions.loc[ifacility,'Lon'] = facility_info['V_GHG_EMITTER_FACILITIES.LONGITUDE'][ilocation]

for iyear in np.arange(0,num_years):
    temp_data = ghgrp_facility_emissions[ghgrp_facility_emissions['Y_SUBPART_LEVEL_INFORMATION.REPORTING_YEAR']==year_range[iyear]]
    temp_data.reset_index(drop=True,inplace=True)
    for ifacility in np.arange(0,len(temp_data)):
        if temp_data['Lon'][ifacility] > Lon_left and temp_data['Lon'][ifacility] < Lon_right \
            and temp_data['Lat'][ifacility] > Lat_low and temp_data['Lat'][ifacility] < Lat_up:
            ilat = int((temp_data['Lat'][ifacility] - Lat_low)/Res01)
            ilon = int((temp_data['Lon'][ifacility] - Lon_left)/Res01)
            Map_GHGRPRefineries[ilat,ilon,iyear] += temp_data['Y_SUBPART_LEVEL_INFORMATION.GHG_QUANTITY'][ifacility]/1000
        else:
            Map_GHGRPRefineries_nongrid[iyear] += temp_data['Y_SUBPART_LEVEL_INFORMATION.GHG_QUANTITY'][ifacility]/1000
    print('Year: ',year_range[iyear])    
    print('Total refinery emissions on grid (kt): ',np.sum(Map_GHGRPRefineries[:,:,iyear]))
    print('Total refinery emissions offgrid (kt): ',np.sum(Map_GHGRPRefineries_nongrid[iyear]))

### Step 2.3. Read In GOADS Data

#### Step 2.3.1. Initialize arrays

In [None]:
#initialize GOADS maps array (will be assigned to proxy map variable later)
Map_GOADSmajor_emissions = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Map_GOADSminor_emissions = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])

#### Step 2.3.2. 2011 Data
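
The cell below cleans the BOEM lease numbers with a chain of `str.replace` calls and converts emissions from short tons to Tg. A compact sketch of the same idea is shown here, assuming the individual corrections are meant to apply to whole lease numbers rather than to substrings. The helper `clean_lease`, the subset of corrections, and the toy data are illustrative only.

```python
import pandas as pd

SHORT_TON_TO_TG = 9.0718474e-7  # 1 short ton = 907.18474 kg = 9.0718474e-7 Tg

# Subset of the whole-value corrections used in the cell below
LEASE_FIXES = {"G1477": "G01477", "G73": "00073", "G605": "00605"}

def clean_lease(series):
    """Strip 'OCS', dashes, and spaces, then apply whole-value corrections."""
    cleaned = series.str.replace(r"OCS|-| ", "", regex=True)
    return cleaned.replace(LEASE_FIXES)

toy = pd.DataFrame({
    "lease": ["OCS-G 1477", "OCS-G73", "G05761"],
    "short_tons": [10.0, 2.0, 1.0],
})
toy["lease"] = clean_lease(toy["lease"])
toy["Emis_tg"] = SHORT_TON_TO_TG * toy["short_tons"]
```

Note that `Series.replace` with a dictionary matches entire values, whereas `Series.str.replace` matches substrings, so the two approaches differ if a short pattern such as `G73` occurs inside a longer lease number.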

In [None]:
#Read in GOADS data for 2011 (applied to 2012, the closest inventory year). GOADS data are also used for 2014 and 2017; missing years are interpolated between them
# goal: populating Map_FedGOM_Offshore to allocate federal offshore GOM emissions (state GOM allocated with Enverus)

#Only run if need to save new file (takes a few hours to run)
if ReCalc_GOADS ==1:
    ## 2011
    # Read In and Format 2011 BOEM Data
    driver_str = r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ='+GOADS_11_inputfile+';'
    conn = pyodbc.connect(driver_str)
    GOADS_locations = pd.read_sql("SELECT * FROM tblPointER", conn)
    GOADS_emissions = pd.read_sql("SELECT * FROM tblPointEM", conn)
    conn.close()

    # Format Location Data
    GOADS_locations = GOADS_locations[["strStateFacilityIdentifier","strEmissionReleasePointID","dblXCoordinate",\
                                   "dblYCoordinate"]]
    #Create platform-by-platform file
    GOADS_locations_Unique = pd.DataFrame({'strStateFacilityIdentifier':GOADS_locations['strStateFacilityIdentifier'].unique()})
    GOADS_locations_Unique['lon'] = 0.0
    GOADS_locations_Unique['lat'] = 0.0
    GOADS_locations_Unique['strEmissionReleasePointID'] = ''

    for iplatform in np.arange(len(GOADS_locations_Unique)):
        match_platform = np.where(GOADS_locations['strStateFacilityIdentifier'] == GOADS_locations_Unique['strStateFacilityIdentifier'][iplatform])[0][0]
        GOADS_locations_Unique.loc[iplatform,'lon',] = GOADS_locations['dblXCoordinate'][match_platform]
        GOADS_locations_Unique.loc[iplatform,'lat',] = GOADS_locations['dblYCoordinate'][match_platform]
        GOADS_locations_Unique.loc[iplatform,'strEmissionReleasePointID'] = GOADS_locations['strEmissionReleasePointID'][match_platform][:3]

    GOADS_locations_Unique.reset_index(inplace=True, drop=True)

    #Format Emissions Data (clean lease data string)
    GOADS_emissions = GOADS_emissions[["strStateFacilityIdentifier","strPollutantCode","dblEmissionNumericValue","BOEM-MONTH",
                                  "BOEM-LEASE_NUM","BOEM-COMPLEX_ID"]]
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('OCS','')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('-','')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace(' ','')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G1477','G01477')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G73','00073')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G605','00605')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G72','00072')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G599','00599')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G7155','G07155')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G2357','G02357')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G4921','G04921')
    GOADS_emissions['Emis_tg'] = 0.0
    GOADS_emissions['Emis_tg'] = 9.0718474E-7 * GOADS_emissions['dblEmissionNumericValue'] #convert short tons to Tg
    GOADS_emissions = GOADS_emissions[GOADS_emissions['strPollutantCode'] == 'CH4']
    GOADS_emissions.reset_index(inplace=True, drop=True)

    #display(GOADS_emissions)

    # Use ERG Preprocessed data to determine if major or minor and oil or gas
    ERG_complex_crosswalk = pd.read_excel(ERG_GOADSEmissions_inputfile, sheet_name = "Complex Emissions by Source", usecols = "AJ:AM", nrows = 11143)
    #display(ERG_complex_crosswalk)

    # add data to map array, for the closest year to 2011
    year_diff = [abs(x - 2011) for x in year_range]
    iyear = year_diff.index(min(year_diff))

    #assign oil vs gas by lease/complex ID
    GOADS_emissions['LEASE_TYPE'] =''
    GOADS_emissions['MAJOR_STRUC'] =''
    for istruc in np.arange(0,len(GOADS_emissions)):
        imatch = np.where(np.logical_and(ERG_complex_crosswalk['BOEM COMPLEX ID.2']==int(GOADS_emissions['BOEM-COMPLEX_ID'][istruc]),\
                            ERG_complex_crosswalk['Year.2'] == 2011))
        if np.size(imatch) >0:
            imatch = imatch[0][0]
            GOADS_emissions.loc[istruc,'LEASE_TYPE'] = ERG_complex_crosswalk['Oil Gas Defn FINAL.1'][imatch]
            GOADS_emissions.loc[istruc,'MAJOR_STRUC'] = ERG_complex_crosswalk['Major / Minor.1'][imatch]
        else:
            print(istruc, GOADS_emissions['BOEM-COMPLEX_ID'][istruc])

        # for all oil platforms, match the platform to the emissions
        if GOADS_emissions['LEASE_TYPE'][istruc] =='Oil':
            match_platform = np.where(GOADS_locations_Unique.strStateFacilityIdentifier==GOADS_emissions['strStateFacilityIdentifier'][istruc])[0][0]
            ilat = int((GOADS_locations_Unique['lat'][match_platform] - Lat_low)/Res01)
            ilon = int((GOADS_locations_Unique['lon'][match_platform] - Lon_left)/Res01)
            imonth = GOADS_emissions['BOEM-MONTH'][istruc]-1 #BOEM-MONTH is 1-12; array index is 0-11
            if GOADS_emissions['MAJOR_STRUC'][istruc] =='Major':
                Map_GOADSmajor_emissions[ilat,ilon,iyear,imonth] += GOADS_emissions['Emis_tg'][istruc]
            else:
                Map_GOADSminor_emissions[ilat,ilon,iyear,imonth] += GOADS_emissions['Emis_tg'][istruc]
            
            
    # sum complexes and emissions for diagnostic
    majcplx = GOADS_emissions[(GOADS_emissions['MAJOR_STRUC']=='Major')]
    majcplx = majcplx[majcplx['LEASE_TYPE'] =='Oil']
    num_majcplx = majcplx['BOEM-COMPLEX_ID'].unique()
    mincplx = GOADS_emissions[GOADS_emissions['MAJOR_STRUC']=='Minor']
    mincplx = mincplx[mincplx['LEASE_TYPE'] =='Oil']
    num_mincplx = mincplx['BOEM-COMPLEX_ID'].unique()           
    del GOADS_emissions
    print('Number of Major Oil Complexes: ',(np.size(num_majcplx)))
    print('Emissions (Tg): ',np.sum(Map_GOADSmajor_emissions[:,:,iyear,:]))
    print('Number of Minor Oil Complexes: ',(np.size(num_mincplx)))
    print('Emissions (Tg): ',np.sum(Map_GOADSminor_emissions[:,:,iyear,:]))

#### Step 2.3.3. 2014 Data
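
Each GOADS inventory year is written to the slot of the closest year in the emission time series, and the 2014 data report months by name rather than by number. The sketch below isolates those two index calculations; the helper name `nearest_year_index` is hypothetical, while `year_range` and `month_dict` mirror the values defined in Step 0.

```python
year_range = list(range(2012, 2019))
month_names = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
month_dict = {name: i for i, name in enumerate(month_names, start=1)}

def nearest_year_index(inventory_year, years):
    """Return the index of the entry in years closest to inventory_year."""
    diffs = [abs(y - inventory_year) for y in years]
    return diffs.index(min(diffs))   # first match wins on ties

iyear_2011 = nearest_year_index(2011, year_range)  # index of 2012
iyear_2014 = nearest_year_index(2014, year_range)  # index of 2014
imonth = month_dict["March"] - 1                   # zero-based month index
```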

In [None]:
## 2014

#Only run if need to save new file (takes a few hours to run)
if ReCalc_GOADS ==1:
    #Read In and Format 2014 BOEM Data
    driver_str = r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ='+GOADS_14_inputfile+';'
    conn = pyodbc.connect(driver_str)
    GOADS_emissions = pd.read_sql("SELECT * FROM 2014_Gulfwide_Platform_20161102", conn)
    conn.close()

    GOADS_emissions = GOADS_emissions[["PLATFORM_ID","X_COORDINATE","Y_COORDINATE","POLLUTANT_CODE","EMISSIONS_VALUE","MONTH",\
                                  "LEASE_NUMBER","COMPLEX_ID"]]
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('OCS','')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('-','')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace(' ','')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G1477','G01477')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G73','00073')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G605','00605')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G72','00072')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G599','00599')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G7155','G07155')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G2357','G02357')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G4921','G04921')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO2839','G02839')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO5761','G05761')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO0026','00026')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO3194','G03194')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G1034','G01034')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G0456','G00456')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G0060','G00060')
    GOADS_emissions['Emis_tg'] = 0.0
    GOADS_emissions['Emis_tg'] = 9.0718474E-7 * GOADS_emissions['EMISSIONS_VALUE'] #convert short tons to Tg
    GOADS_emissions = GOADS_emissions[GOADS_emissions['POLLUTANT_CODE'] == 'CH4']
    GOADS_emissions.reset_index(inplace=True, drop=True)

    #assign oil vs gas by lease/complex ID
    # add data to map array, for the closest year to 2014
    year_diff = [abs(x - 2014) for x in year_range]
    iyear = year_diff.index(min(year_diff))
    GOADS_emissions['LEASE_TYPE'] =''
    GOADS_emissions['MAJOR_STRUC'] =''
    for istruc in np.arange(0,len(GOADS_emissions)):
        imatch = np.where(np.logical_and(ERG_complex_crosswalk['BOEM COMPLEX ID.2']==int(GOADS_emissions['COMPLEX_ID'][istruc]),\
                            ERG_complex_crosswalk['Year.2'] == 2014))
        if np.size(imatch) >0:
            imatch = imatch[0][0]
            GOADS_emissions.loc[istruc,'LEASE_TYPE'] = ERG_complex_crosswalk['Oil Gas Defn FINAL.1'][imatch]
            GOADS_emissions.loc[istruc,'MAJOR_STRUC'] = ERG_complex_crosswalk['Major / Minor.1'][imatch]
        else:
            print(istruc, GOADS_emissions['COMPLEX_ID'][istruc])

        if GOADS_emissions['LEASE_TYPE'][istruc] =='Oil':
            #then for all oil platforms, match the platform to the emissions
            ilat = int((GOADS_emissions['Y_COORDINATE'][istruc] - Lat_low)/Res01)
            ilon = int((GOADS_emissions['X_COORDINATE'][istruc] - Lon_left)/Res01)
            month_str = GOADS_emissions['MONTH'][istruc]             
            imonth = month_dict[GOADS_emissions['MONTH'][istruc]]-1 #dict is 1-12, not 0-11
            if GOADS_emissions['MAJOR_STRUC'][istruc] =='Major':
                Map_GOADSmajor_emissions[ilat,ilon,iyear,imonth] += GOADS_emissions['Emis_tg'][istruc]
            else:
                Map_GOADSminor_emissions[ilat,ilon,iyear,imonth] += GOADS_emissions['Emis_tg'][istruc]

    # sum complexes and emissions for diagnostic
    majcplx = GOADS_emissions[(GOADS_emissions['MAJOR_STRUC']=='Major')]
    majcplx = majcplx[majcplx['LEASE_TYPE'] =='Oil']
    num_majcplx = majcplx['COMPLEX_ID'].unique()
    mincplx = GOADS_emissions[GOADS_emissions['MAJOR_STRUC']=='Minor']
    mincplx = mincplx[mincplx['LEASE_TYPE'] =='Oil']
    num_mincplx = mincplx['COMPLEX_ID'].unique()         
    del GOADS_emissions
    print('Number of Major Oil Complexes: ',(np.size(num_majcplx)))
    print('Emissions (Tg): ',np.sum(Map_GOADSmajor_emissions[:,:,iyear,:]))
    print('Number of Minor Oil Complexes: ',(np.size(num_mincplx)))
    print('Emissions (Tg): ',np.sum(Map_GOADSminor_emissions[:,:,iyear,:]))
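
The cells above convert each platform's coordinates to grid indices with `int((coord - edge)/Res01)` and, unlike the BSEE cell further down, do not check that the point falls inside the grid. Because `int()` truncates toward zero, a coordinate just below the lower edge would be folded into index 0, and a more negative index would silently wrap around in NumPy. A minimal sketch of a bounds-checked alternative follows; the grid edges, resolution, and array sizes are illustrative assumptions rather than the notebook's actual values, and `grid_index` is a hypothetical helper.

```python
import math

def grid_index(lat, lon, lat_low, lon_left, res, n_lat, n_lon):
    """Return (ilat, ilon) for a point, or None if it lies outside the grid.

    Hypothetical helper: math.floor (rather than int) keeps points just
    below the lower edge from being folded into row/column 0.
    """
    ilat = math.floor((lat - lat_low) / res)
    ilon = math.floor((lon - lon_left) / res)
    if 0 <= ilat < n_lat and 0 <= ilon < n_lon:
        return ilat, ilon
    return None

# Illustrative grid only (assumed values, not taken from the notebook)
LAT_LOW, LON_LEFT, RES = 20.0, -130.0, 0.1
N_LAT, N_LON = 350, 700

inside = grid_index(28.55, -90.25, LAT_LOW, LON_LEFT, RES, N_LAT, N_LON)
outside = grid_index(19.95, -90.25, LAT_LOW, LON_LEFT, RES, N_LAT, N_LON)
print(inside, outside)
```

A caller could skip (and optionally log) any platform for which the helper returns `None` instead of adding its emissions to the map.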

#### 2.2.4. 2017 Data

In [None]:
## 2017
#Only run if need to save new file (takes a few hours to run)
if ReCalc_GOADS ==1:
    #Read In and Format 2017 BOEM Data
    driver_str = r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ='+GOADS_17_inputfile+';'
    conn = pyodbc.connect(driver_str)
    GOADS_emissions = pd.read_sql("SELECT * FROM 2017_Gulfwide_Platform_20190705_CAP_GHG", conn)
    conn.close()

    GOADS_emissions = GOADS_emissions[["PLATFORM_ID","X_COORDINATE","Y_COORDINATE","POLLUTANT_CODE","EMISSIONS_VALUE","Month",\
                                   "LEASE_NUMBER","COMPLEX_ID"]]
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('OCS','')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('-','')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace(' ','')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G1477','G01477')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G73','00073')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G605','00605')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G72','00072')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G599','00599')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G7155','G07155')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G2357','G02357')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G4921','G04921')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO2839','G02839')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO2893','G02893')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO5761','G05761')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO0026','00026')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO3194','G03194')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G1034','G01034')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G0456','G00456')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G0060','G00060')
    GOADS_emissions['Emis_tg'] = 0.0
    GOADS_emissions['Emis_tg'] = 9.0718474E-7 * GOADS_emissions['EMISSIONS_VALUE'] #convert short tons to Tg
    GOADS_emissions = GOADS_emissions[GOADS_emissions['POLLUTANT_CODE'] == 'CH4']
    GOADS_emissions.reset_index(inplace=True, drop=True)

    #assign oil vs gas by lease/complex ID
    # add data to map array, for the closest year to 2017
    year_diff = [abs(x - 2017) for x in year_range]
    iyear = year_diff.index(min(year_diff))
    GOADS_emissions['LEASE_TYPE'] =''
    GOADS_emissions['MAJOR_STRUC'] =''
    for istruc in np.arange(0,len(GOADS_emissions)):
        imatch = np.where(np.logical_and(ERG_complex_crosswalk['BOEM COMPLEX ID.2']==int(GOADS_emissions['COMPLEX_ID'][istruc]),\
                            ERG_complex_crosswalk['Year.2'] == 2017))
        if np.size(imatch) >0:
            imatch = imatch[0][0]
            GOADS_emissions.loc[istruc,'LEASE_TYPE'] = ERG_complex_crosswalk['Oil Gas Defn FINAL.1'][imatch]
            GOADS_emissions.loc[istruc,'MAJOR_STRUC'] = ERG_complex_crosswalk['Major / Minor.1'][imatch]
        else:
            print(istruc, GOADS_emissions["COMPLEX_ID"][istruc])

        if GOADS_emissions['LEASE_TYPE'][istruc] =='Oil':
            #then for all oil platforms, match the platform to the emissions
            ilat = int((GOADS_emissions['Y_COORDINATE'][istruc] - Lat_low)/Res01)
            ilon = int((GOADS_emissions['X_COORDINATE'][istruc] - Lon_left)/Res01)
            imonth = month_dict[GOADS_emissions['Month'][istruc]]-1 #dict is 1-12, not 0-11
            if GOADS_emissions['MAJOR_STRUC'][istruc] =='Major':
                Map_GOADSmajor_emissions[ilat,ilon,iyear,imonth] += GOADS_emissions['Emis_tg'][istruc]
            else:
                Map_GOADSminor_emissions[ilat,ilon,iyear,imonth] += GOADS_emissions['Emis_tg'][istruc]

    # sum complexes and emissions for diagnostic
    majcplx = GOADS_emissions[(GOADS_emissions['MAJOR_STRUC']=='Major')]
    majcplx = majcplx[majcplx['LEASE_TYPE'] =='Oil']
    num_majcplx = majcplx["COMPLEX_ID"].unique()
    mincplx = GOADS_emissions[GOADS_emissions['MAJOR_STRUC']=='Minor']
    mincplx = mincplx[mincplx['LEASE_TYPE'] =='Oil']
    num_mincplx = mincplx["COMPLEX_ID"].unique()          
    del GOADS_emissions
    print('Number of Major Oil Complexes: ',(np.size(num_majcplx)))
    print('Emissions (Tg): ',np.sum(Map_GOADSmajor_emissions[:,:,iyear,:]))
    print('Number of Minor Oil Complexes: ',(np.size(num_mincplx)))
    print('Emissions (Tg): ',np.sum(Map_GOADSminor_emissions[:,:,iyear,:]))
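
The 2014 and 2017 cells repeat nearly the same chain of `str.replace` calls to clean `LEASE_NUMBER`. One way to keep the two cells in sync would be to hold the substitutions in a single ordered list and apply them through a shared helper. The sketch below is a hypothetical refactoring (the names `LEASE_FIXES` and `normalize_lease` are not in the notebook) and lists only a subset of the substitutions for brevity.

```python
# Ordered (old, new) substitutions; a subset of those used in the cells above.
# Order matters: the prefix and separators are stripped before the ID fixes.
LEASE_FIXES = [
    ("OCS", ""),
    ("-", ""),
    (" ", ""),
    ("G1477", "G01477"),
    ("GO2839", "G02839"),
]

def normalize_lease(raw, fixes=LEASE_FIXES):
    """Apply each literal substring substitution in order (hypothetical helper)."""
    cleaned = raw
    for old, new in fixes:
        cleaned = cleaned.replace(old, new)
    return cleaned

examples = ["OCS-G 1477", "OCS-GO2839", "G05761"]
cleaned = [normalize_lease(lease) for lease in examples]
print(cleaned)
```

With pandas, and assuming the column has no missing values, the helper could be applied as `GOADS_emissions['LEASE_NUMBER'].map(normalize_lease)`. As in the original chain, each substitution matches substrings, so the order of the list should be preserved.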

#### 2.2.5. Interpolate GOADS Data

In [None]:
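# 2011 data applied to 2012
# 2014 data applied to 2013-2015
# 2017 data applied to 2016 onward
# (the hard-coded year indices below assume year_range spans 2012-2018, i.e., index 2 = 2014 and index 5 = 2017)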

if ReCalc_GOADS ==1:
    Map_GOADSmajor_emissions[:,:,1,:] = Map_GOADSmajor_emissions[:,:,2,:]
    Map_GOADSmajor_emissions[:,:,2,:] = Map_GOADSmajor_emissions[:,:,2,:]
    Map_GOADSmajor_emissions[:,:,3,:] = Map_GOADSmajor_emissions[:,:,2,:]
    Map_GOADSmajor_emissions[:,:,4,:] = Map_GOADSmajor_emissions[:,:,5,:]
    Map_GOADSmajor_emissions[:,:,6,:] = Map_GOADSmajor_emissions[:,:,5,:]
    
    Map_GOADSminor_emissions[:,:,1,:] = Map_GOADSminor_emissions[:,:,2,:]
    Map_GOADSminor_emissions[:,:,2,:] = Map_GOADSminor_emissions[:,:,2,:]
    Map_GOADSminor_emissions[:,:,3,:] = Map_GOADSminor_emissions[:,:,2,:]
    Map_GOADSminor_emissions[:,:,4,:] = Map_GOADSminor_emissions[:,:,5,:]
    Map_GOADSminor_emissions[:,:,6,:] = Map_GOADSminor_emissions[:,:,5,:]
    
    np.save('./IntermediateOutputs/GOADSmajor_oil_tempoutput', Map_GOADSmajor_emissions)
    np.save('./IntermediateOutputs/GOADSminor_oil_tempoutput', Map_GOADSminor_emissions)
else:
    Map_GOADSmajor_emissions = np.load('./IntermediateOutputs/GOADSmajor_oil_tempoutput.npy')
    Map_GOADSminor_emissions = np.load('./IntermediateOutputs/GOADSminor_oil_tempoutput.npy')
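
The hard-coded index assignments above amount to a nearest-inventory-year rule. A small sketch of that rule is shown here, assuming GOADS inventories for 2011, 2014, and 2017 (per the cell comments) and a 2012-2018 notebook period; `nearest_inventory_year` is a hypothetical helper, not part of the notebook.

```python
def nearest_inventory_year(year, inventory_years):
    """Return the inventory year closest to `year` (ties go to the earlier year)."""
    return min(inventory_years, key=lambda inv: (abs(inv - year), inv))

inventory_years = [2011, 2014, 2017]   # GOADS inventory years, per the cell comments
year_range = list(range(2012, 2019))   # assumed notebook period, 2012-2018

source_year = {yr: nearest_inventory_year(yr, inventory_years) for yr in year_range}
for yr, src in source_year.items():
    print(yr, "<-", src)
```

Deriving the mapping this way, rather than typing indices by hand, would let the fill step adapt if the notebook's year range were extended.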

### Step 2.3 Read In BSEE Data (for Federal Offshore Pacific platforms)

In [None]:
#0) Initialize empty array
Map_BSEEOffshore = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])


#1) Read in platform location information
#header information from: https://www.data.bsee.gov/Main/HtmlPage.aspx?page=pacplatformLocations
plat_loc = pd.read_csv(BSEE_platformloc_inputdata, sep=",", header=None)
plat_loc.columns = ["DISTRICT", "COMPLEX_ID", "STRUCT_NUMBER", "AREA_CODE","BLOCK","STRUCT_NAME","NS_DIST","NS_CODE",\
"EW_DIST","EW_CODE","X","Y","LON","LAT"]
#display(plat_loc)

#2) Read in platform master information
#header information from: https://www.data.bsee.gov/Main/HtmlPage.aspx?page=pacplatformMasters
plat_master = pd.read_csv(BSEE_platformmaster_inputdata, sep=",", header=None)
plat_master.columns = ["COMPLEX_ID", "ABANDON_FLAG", "ALOC_FLAG", "ATTEND_FLAG","COND_PROD_FLAG","SHORE_DIST",\
                       "DRILL_FLAG","FIRED_VESSEL_FLAG","GAS_PROD_FLAG","GAS_FLARE_FLAG","MMS_NUM","MANNED_FLAG",\
                       "MAJOR_COMPLEX_FLAG","LEASE_NUMBER","LAST_REV_DATE","LAST_MTR_FLAG","INJEC_CODE","HELIPORT_FLAG",\
                      "WORKOVER_FLAG","WATER_PROD_FLAG","DEPTH","TANK_GUAGE_FLAG","SUL_PROD_FLAG","SUBDIST_CODE","STORE_TANK_FLAG",\
                      "RIG_COUNT","QTR_CODE","PROD_EQMT_FLAG","PROD_FLAG","POWER_SOURCE","POWER_GEN_FLAG","OIL_PROD_FLAG","GAS_SALE",\
                      "FIELD_NAME","DISTRICT","CRANE_CODE","COMP_FLAG","COMGL_PROD_FLAG","BED_COUNT","AREA_CODE","BLOCK","METER_PROVER"]

#3) Add lease number to location array
plat_loc['LEASE'] = ''
for iplatform in np.arange(0,len(plat_loc)):
    imatch = np.where(plat_master['COMPLEX_ID'] == plat_loc['COMPLEX_ID'][iplatform])[0][0]
    plat_loc.loc[iplatform,'LEASE'] = plat_master['LEASE_NUMBER'][imatch]

#4) Find the production data for each platform
#header information from: 
for iyear in np.arange(0,num_years):
    bsee_temp = plat_loc.copy()
    for imonth in np.arange(0,num_months):
        bsee_temp['OIL'+month_tag[imonth]] = 0
        bsee_temp['GAS'+month_tag[imonth]] = 0
    
    OGOR_PAC = pd.read_csv(vars()['BSEE_prod_'+year_range_str[iyear]+'_inputdata'], sep=",", header=None)
    OGOR_PAC.columns = ["LEASE", "COMP_NAME", "PROD_DATE", "PROD_DAYS","PROD_CODE","MONTH_OILVOL",\
                       "MONTH_GASVOL","MONTH_WATERVOL","WELL_API","WELL_STATUS","AREA_BLOCK","OPER_NUM",\
                       "OPER_NAME","FIELD","INJ_VOL","PROD_INTV","FIRST_PROD_DATE","UNIT_NUM"]
    for iplatform in np.arange(0,len(plat_loc)):
        imatch = np.where(OGOR_PAC['LEASE'] == plat_loc['LEASE'][iplatform])[0]
        temp = OGOR_PAC.iloc[imatch,:]
        temp.reset_index(drop=True,inplace=True)
        for idx in np.arange(0,len(temp)):
            if temp.loc[idx,'PROD_DAYS'] >0 :
                month_str = str(temp.loc[idx,'PROD_DATE'])[4:6]
                month_loc = 'OIL'+month_str
                bsee_temp.loc[iplatform,month_loc] += temp.loc[idx,'MONTH_OILVOL']
                month_loc = 'GAS'+month_str
                bsee_temp.loc[iplatform,month_loc] += temp.loc[idx,'MONTH_GASVOL']
    bsee_temp['CUM_GAS'] = bsee_temp.loc[:,bsee_temp.columns.str.contains('GAS')].sum(1)
    bsee_temp['CUM_OIL'] = bsee_temp.loc[:,bsee_temp.columns.str.contains('OIL')].sum(1)     
    vars()['BSEE_platform_prod_'+year_range_str[iyear]] = bsee_temp.copy()


#4b) Correct the production data for platforms with the same lease number 
# (evenly divide production between all platforms on a given lease)
for iyear in np.arange(0,num_years):
    bsee_temp = vars()['BSEE_platform_prod_'+year_range_str[iyear]].copy()
    unique_lease = np.unique(bsee_temp['LEASE'])
    for ilease in np.arange(0,len(unique_lease)):
        imatch = np.where(bsee_temp['LEASE']== unique_lease[ilease])[0]
        temp = bsee_temp[bsee_temp['LEASE']== unique_lease[ilease]]
        num_lease = np.shape(temp)[0]
        bsee_temp.loc[imatch,bsee_temp.columns.str.contains('OIL')] /= num_lease
        bsee_temp.loc[imatch,bsee_temp.columns.str.contains('GAS')] /= num_lease
    vars()['BSEE_platform_prod_'+year_range_str[iyear]] = bsee_temp.copy()    

print('Annual Total Pacific Offshore Oil Prod:')
#5) Make Map of Offshore Pacific Oil production for oil platforms (Annual GOR <= 100)
for iyear in np.arange(0,num_years):
    bsee_temp = vars()['BSEE_platform_prod_'+year_range_str[iyear]].copy()
    for iplatform in np.arange(0,len(bsee_temp)):
        if bsee_temp['LON'][iplatform] > Lon_left and bsee_temp['LON'][iplatform] < Lon_right \
            and bsee_temp['LAT'][iplatform] > Lat_low and bsee_temp['LAT'][iplatform] < Lat_up:
            #find index of lon and lat
            ilat = int((bsee_temp['LAT'][iplatform] - Lat_low)/Res01)
            ilon = int((bsee_temp['LON'][iplatform] - Lon_left)/Res01)
            if ((data_fn.safe_div(bsee_temp['CUM_GAS'][iplatform],float(bsee_temp['CUM_OIL'][iplatform]))) <= 100):
                # if oil well, 
                for imonth in np.arange(0,num_months):
                    #count production in map only for months where there is oil production (emissions ~ when production is occurring)
                    prod_str = 'OIL'+month_tag[imonth]  
                    Map_BSEEOffshore[ilat,ilon,iyear,imonth] += bsee_temp[prod_str][iplatform] # production from oil complexes only

    print('Year ' +year_range_str[iyear]+': ', np.sum(Map_BSEEOffshore[:,:,iyear,:]))
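
The lease correction step above divides lease-level production evenly among the platforms that share a lease by looping over unique leases. The same operation can be sketched with a `groupby` transform. The toy table, lease names, and volumes below are made up for illustration, and only two monthly columns are shown.

```python
import pandas as pd

# Toy platform table: lease-level production has been copied onto every
# platform on that lease, as in the matching step above (made-up values).
platforms = pd.DataFrame({
    "LEASE": ["P001", "P001", "P002"],
    "OIL01": [100.0, 100.0, 50.0],
    "OIL02": [80.0, 80.0, 0.0],
})

prod_cols = [c for c in platforms.columns if c.startswith("OIL")]

# Number of platforms sharing each row's lease, aligned to the original index
n_on_lease = platforms.groupby("LEASE")["LEASE"].transform("size")

# Divide each platform's production by the platform count on its lease
split = platforms.copy()
split[prod_cols] = platforms[prod_cols].div(n_on_lease, axis=0)
print(split)
```

After the split, summing the platforms on a lease recovers the lease total, which is the property the even allocation is meant to preserve.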

### Step 2.4 Well and Production Data (from Enverus)

#### Step 2.4.1 Read In Raw Data

In [None]:
#Read In and Format the Prism and DI data 
# 1. Read Data
# 2. Drop unused columns, rename columns to match between DI and Prism
# 3. Combine DI and Prism into one data array
# 4. Calculate annual cumulative production totals
# 5. Save the data as a year-specific variable

#Based on ERG's logic, active wells are determined based on their production levels and not producing status

for iyear in np.arange(0,num_years):
    
    #DI data
    DI_data = pd.read_csv(vars()['Enverus_DI_inputdata_' +year_range_str[iyear]])
    DI_data = DI_data.drop(columns=['ENTITY_ID','API_UWI','OPERATOR_COMPANY_NAME','AAPG_FULL_ERG',\
                           'FIELD','RESERVOIR','LAST_PROD_DATE','DRILL_TYPE','CUM_GAS','CUM_OIL','CUM_WATER'])
    DI_data.rename({'WELL_COUNT_ID':'WELL_COUNT','DI_BASIN':'BASIN','NEMS_REGION_ERG':'NEMS_REGION',\
                    'SURFACE_LATITUDE_WGS84':'LATITUDE','SURFACE_LONGITUDE_WGS84':'LONGITUDE','MONTHLY_WATER_01':'WATERPROD_01',\
                   'MONTHLY_WATER_02':'WATERPROD_02','MONTHLY_WATER_03':'WATERPROD_03','MONTHLY_WATER_04':'WATERPROD_04',\
                   'MONTHLY_WATER_05':'WATERPROD_05','MONTHLY_WATER_06':'WATERPROD_06','MONTHLY_WATER_07':'WATERPROD_07',\
                   'MONTHLY_WATER_08':'WATERPROD_08','MONTHLY_WATER_09':'WATERPROD_09','MONTHLY_WATER_10':'WATERPROD_10',\
                   'MONTHLY_WATER_11':'WATERPROD_11','MONTHLY_WATER_12':'WATERPROD_12','MONTHLY_OIL_01':'OILPROD_01',\
                   'MONTHLY_OIL_02':'OILPROD_02','MONTHLY_OIL_03':'OILPROD_03','MONTHLY_OIL_04':'OILPROD_04',\
                   'MONTHLY_OIL_05':'OILPROD_05','MONTHLY_OIL_06':'OILPROD_06','MONTHLY_OIL_07':'OILPROD_07',\
                   'MONTHLY_OIL_08':'OILPROD_08','MONTHLY_OIL_09':'OILPROD_09','MONTHLY_OIL_10':'OILPROD_10',\
                   'MONTHLY_OIL_11':'OILPROD_11','MONTHLY_OIL_12':'OILPROD_12','MONTHLY_GAS_01':'GASPROD_01',\
                   'MONTHLY_GAS_02':'GASPROD_02','MONTHLY_GAS_03':'GASPROD_03','MONTHLY_GAS_04':'GASPROD_04',\
                   'MONTHLY_GAS_05':'GASPROD_05','MONTHLY_GAS_06':'GASPROD_06','MONTHLY_GAS_07':'GASPROD_07',\
                   'MONTHLY_GAS_08':'GASPROD_08','MONTHLY_GAS_09':'GASPROD_09','MONTHLY_GAS_10':'GASPROD_10',\
                   'MONTHLY_GAS_11':'GASPROD_11','MONTHLY_GAS_12':'GASPROD_12'},axis=1, inplace=True)
    
    DI_data['WELL_COUNT'] = 1

    #Prism Data
    Prism_data = pd.read_csv(vars()['Enverus_Prism_inputdata_'+year_range_str[iyear]])
    Prism_data = Prism_data.drop(columns=['WELLID','API_UWI','RSOPERATOR','TRAJECTORY','FIELD','RSREGION','FORMATION',\
                                         'TOTALFLUIDPUMPED_BBL','SPUDDATE'])
    Prism_data.rename({'RSBASIN':'BASIN','COMPLETIONDATE':'COMPLETION_DATE','SPUDDATE':'SPUD_DATE','FIRSTPRODDATE':'FIRST_PROD_DATE',\
                     'OILGRAVITY_API':'OIL_GRAVITY','WATERPROD_BBL_01':'WATERPROD_01',\
                    'WATERPROD_BBL_02':'WATERPROD_02','WATERPROD_BBL_03':'WATERPROD_03','WATERPROD_BBL_04':'WATERPROD_04',\
                   'WATERPROD_BBL_05':'WATERPROD_05','WATERPROD_BBL_06':'WATERPROD_06','WATERPROD_BBL_07':'WATERPROD_07',\
                   'WATERPROD_BBL_08':'WATERPROD_08','WATERPROD_BBL_09':'WATERPROD_09','WATERPROD_BBL_10':'WATERPROD_10',\
                   'WATERPROD_BBL_11':'WATERPROD_11','WATERPROD_BBL_12':'WATERPROD_12','LIQUIDSPROD_BBL_01':'OILPROD_01',\
                   'LIQUIDSPROD_BBL_02':'OILPROD_02','LIQUIDSPROD_BBL_03':'OILPROD_03','LIQUIDSPROD_BBL_04':'OILPROD_04',\
                   'LIQUIDSPROD_BBL_05':'OILPROD_05','LIQUIDSPROD_BBL_06':'OILPROD_06','LIQUIDSPROD_BBL_07':'OILPROD_07',\
                   'LIQUIDSPROD_BBL_08':'OILPROD_08','LIQUIDSPROD_BBL_09':'OILPROD_09','LIQUIDSPROD_BBL_10':'OILPROD_10',\
                   'LIQUIDSPROD_BBL_11':'OILPROD_11','LIQUIDSPROD_BBL_12':'OILPROD_12','GASPROD_MCF_01':'GASPROD_01',\
                   'GASPROD_MCF_02':'GASPROD_02','GASPROD_MCF_03':'GASPROD_03','GASPROD_MCF_04':'GASPROD_04',\
                   'GASPROD_MCF_05':'GASPROD_05','GASPROD_MCF_06':'GASPROD_06','GASPROD_MCF_07':'GASPROD_07',\
                   'GASPROD_MCF_08':'GASPROD_08','GASPROD_MCF_09':'GASPROD_09','GASPROD_MCF_10':'GASPROD_10',\
                   'GASPROD_MCF_11':'GASPROD_11','GASPROD_MCF_12':'GASPROD_12','RSWELLSTATUS':'PRODUCING_STATUS'},axis=1,inplace=True)
    #
    Prism_data['WELL_COUNT'] = 1
    
    #combine into one array with common column names, replace nans with zeros, and sum annual production
    Enverus_data = pd.concat([DI_data,Prism_data], ignore_index=True)
    Enverus_data.loc[:,Enverus_data.columns.str.contains('GASPROD_')] = Enverus_data.loc[:,Enverus_data.columns.str.contains('GASPROD_')].fillna(0)
    Enverus_data.loc[:,Enverus_data.columns.str.contains('OILPROD_')] = Enverus_data.loc[:,Enverus_data.columns.str.contains('OILPROD_')].fillna(0)
    Enverus_data.loc[:,Enverus_data.columns.str.contains('WATERPROD_')] = Enverus_data.loc[:,Enverus_data.columns.str.contains('WATERPROD_')].fillna(0)

    #Calculate cumulative annual production totals for Gas, Oil, Water
    Enverus_data['CUM_GAS'] = Enverus_data.loc[:,Enverus_data.columns.str.contains('GASPROD_')].sum(1)
    Enverus_data['CUM_OIL'] = Enverus_data.loc[:,Enverus_data.columns.str.contains('OILPROD_')].sum(1)
    Enverus_data['CUM_WATER'] = Enverus_data.loc[:,Enverus_data.columns.str.contains('WATERPROD_')].sum(1)
    
    Enverus_data['NEMS_CODE'] = 0 #Enverus_data['NEMS_REGION'].map(NEMS_dict) Not needed here; NEMS codes are not used in the petroleum analysis
    
    #save out the data for that year
    vars()['Enverus_data_'+year_range_str[iyear]] = Enverus_data.copy()
    print('Load Complete: Year '+year_range_str[iyear])
    
    del DI_data #save memory space 
    
    #define default values for a new row in this table (to be used later during data corrections)
    default = {'WELL_COUNT': 0, 'STATE':'','COUNTY':'','BASIN':'','AAPG_CODE_ERG':'UNK','NEMS_REGION':'UNK','NEMS_CODE':99,\
               'LATITUDE':0,'LONGITUDE':0,'PRODUCING_STATUS':'','RESERVOIR_TYPE':'','COMPLETION_DATE':'','SPUD_DATE':'',\
               'FIRST_PROD_DATE':'','HF':'', 'OFFSHORE':'','OIL_GRAVITY':'','GOR':-99,'GOR_QUAL':'','PROD_FLAG':'',\
               'OILPROD_01':0, 'GASPROD_01':0, 'WATERPROD_01':0,'OILPROD_02':0, 'GASPROD_02':0, 'WATERPROD_02':0,\
          'OILPROD_03':0, 'GASPROD_03':0, 'WATERPROD_03':0,'OILPROD_04':0, 'GASPROD_04':0, 'WATERPROD_04':0,\
          'OILPROD_05':0, 'GASPROD_05':0, 'WATERPROD_05':0,'OILPROD_06':0, 'GASPROD_06':0, 'WATERPROD_06':0,\
          'OILPROD_07':0, 'GASPROD_07':0, 'WATERPROD_07':0,'OILPROD_08':0, 'GASPROD_08':0, 'WATERPROD_08':0,\
          'OILPROD_09':0, 'GASPROD_09':0, 'WATERPROD_09':0,'OILPROD_10':0, 'GASPROD_10':0, 'WATERPROD_10':0,\
          'OILPROD_11':0, 'GASPROD_11':0, 'WATERPROD_11':0,'OILPROD_12':0, 'GASPROD_12':0, 'WATERPROD_12':0}
    
display(Enverus_data)
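
The two long rename dictionaries above follow a regular pattern (source prefix, two-digit month, target prefix), so they could be generated rather than typed out. The sketch below builds only the monthly production entries; the non-monthly renames (e.g., `RSBASIN` to `BASIN`) would still need to be added by hand, and the variable names here are illustrative.

```python
# Two-digit month tags, '01' through '12'
month_tags = [f"{m:02d}" for m in range(1, 13)]

# (source prefix, target prefix) pairs, following the column names used above
di_prefixes = [("MONTHLY_WATER", "WATERPROD"), ("MONTHLY_OIL", "OILPROD"), ("MONTHLY_GAS", "GASPROD")]
prism_prefixes = [("WATERPROD_BBL", "WATERPROD"), ("LIQUIDSPROD_BBL", "OILPROD"), ("GASPROD_MCF", "GASPROD")]

def monthly_rename_map(prefix_pairs, tags=month_tags):
    """Build {source_column: target_column} for every prefix pair and month."""
    return {f"{src}_{mm}": f"{dst}_{mm}" for src, dst in prefix_pairs for mm in tags}

di_rename = monthly_rename_map(di_prefixes)
prism_rename = monthly_rename_map(prism_prefixes)
print(len(di_rename), di_rename["MONTHLY_OIL_07"], prism_rename["LIQUIDSPROD_BBL_12"])
```

Either dictionary could then be merged with the remaining one-off renames and passed to `DataFrame.rename(columns=...)`.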

#### Step 2.4.2 Correct Enverus Data for Select States (following ERG procedure)

In [None]:
# 1) Read In Coverage Table from State Well Counts File from ERG
# (specifies the first year with bad data and which years need to be corrected; 
# all years including and after the first bad year of data need to be corrected)

ERG_StateWellCounts_FirstBadDataYear = pd.read_excel(Enverus_WellCounts_inputfile, sheet_name = "2021 - Coverage", usecols = "A:B", skiprows = 2, nrows = 40)
ERG_StateWellCounts_FirstBadDataYear['date'] = pd.to_datetime(ERG_StateWellCounts_FirstBadDataYear['Date to USE'], errors = 'coerce')
ERG_StateWellCounts_FirstBadDataYear['year'] = pd.DatetimeIndex(ERG_StateWellCounts_FirstBadDataYear['date']).year.fillna(end_year+100).astype(int)

# 2) Loop through each state and year in Enverus to determine whether the data for that particular year need to 
# be corrected. At the moment, the only correction ERG makes is to use the prior year of data if there
# are no new Enverus data reported for that state. If a particular state is not included for any years in the Enverus
# dataset, then a row of default values is added to the Enverus table for that year. 

for istate in np.arange(0,len(State_ANSI)):
    correctdata =0
    state_str = State_ANSI['abbr'][istate]
    firstbadyear = ERG_StateWellCounts_FirstBadDataYear['year'][ERG_StateWellCounts_FirstBadDataYear['State'] == state_str].values
    if firstbadyear.size  == 0:
        firstbadyear = end_year+5 #if state isn't included in correction list, don't correct any data
    
    for iyear in np.arange(0,num_years):
        enverus_data_temp= vars()['Enverus_data_'+year_range_str[iyear]].copy()
        state_list = np.unique(enverus_data_temp['STATE'])
        if state_str in state_list:
            inlist =1
        else:
            inlist = 0
        if inlist ==1 or correctdata==1: #if the state is included in Enverus data, or had data for at least one good year
            #if first year, correctdata will be zero, but inlist will also be zero if no Enverus data
            #check to see whether corrections are necessary for the given year/state
            if year_range[iyear] == (firstbadyear-1):
                print(state_str,year_range[iyear],'last good year')
                # This is the last year of good data. Do not correct the data, but save it
                # so that it can be used for all following years for that state
                temp_data = enverus_data_temp[enverus_data_temp['STATE'] == state_str]
                lastgoodyear = year_range_str[iyear]
                correctdata=1
            elif year_range[iyear] >= firstbadyear: 
                #correct data for all years equal to and after the first bad year (remove old data first if necessary)
                if inlist == 1:
                    enverus_data_temp = enverus_data_temp[enverus_data_temp['STATE'] != state_str]
                enverus_data_temp = pd.concat([enverus_data_temp,temp_data],ignore_index=True)
                print(state_str +' data for ' +year_range_str[iyear] +' were corrected with '+lastgoodyear+' data')
            else:
                no_corrections =1
                
        if inlist==0 and correctdata==0:
        #if there is no Enverus data for a given state, and there was no good data, add a row with default values
            temp_row = {'STATE':state_str}
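            # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
            enverus_data_temp = pd.concat([enverus_data_temp, pd.DataFrame([{**default,**temp_row}])], ignore_index=True)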
            print(state_str +' has no Enverus data in the year ' +year_range_str[iyear]+', default values set')
            
        #resave that year of Enverus data
        enverus_data_temp.reset_index(drop=True,inplace=True)
        vars()['Enverus_data_'+year_range_str[iyear]] = enverus_data_temp.copy()
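
The correction loop above carries the last good year of a state's data forward into every later year. A compact sketch of that idea on toy tables follows. It assumes the yearly tables are held in a dictionary keyed by year (the notebook instead uses `vars()`), that the year before the first bad year is present, and that `carry_forward_state` is a hypothetical helper.

```python
import pandas as pd

def carry_forward_state(tables, state, first_bad_year):
    """Replace `state` rows in all years >= first_bad_year with the rows
    from the last good year (first_bad_year - 1). Returns a new dict."""
    last_good = tables[first_bad_year - 1]
    good_rows = last_good[last_good["STATE"] == state]
    corrected = {}
    for year, df in tables.items():
        if year >= first_bad_year:
            # drop any stale rows for the state, then append the good rows
            df = pd.concat([df[df["STATE"] != state], good_rows], ignore_index=True)
        corrected[year] = df
    return corrected

# Toy data (made up): OK is missing in 2017 and unreliable in 2018
tables = {
    2016: pd.DataFrame({"STATE": ["TX", "OK"], "WELL_COUNT": [1, 1]}),
    2017: pd.DataFrame({"STATE": ["TX"], "WELL_COUNT": [1]}),
    2018: pd.DataFrame({"STATE": ["TX", "OK"], "WELL_COUNT": [1, 5]}),
}
fixed = carry_forward_state(tables, "OK", first_bad_year=2017)
print(fixed[2018])
```

Keeping the tables in a dictionary rather than in dynamically named variables also makes it easier to test the correction in isolation.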

### Step 2.5 Convert Enverus Well and Production Arrays into Gridded Location Arrays

In [None]:
# clear variables
del ERG_StateWellCounts_FirstBadDataYear
del Prism_data
del colnames
del names
del temp_data

In [None]:
# Make monthly gridded arrays (maps) of well data (a well is counted only in months in which it has oil production)
# Includes Oil Wells and Production onshore in the CONUS region
# source emissions are related to the presence of a well and its production status (no emission if no production)

#Define well location/production arrays 
Map_EnvAllwell = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvOilProd = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvBasin220 = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months]) 
Map_EnvBasin360 = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvBasin395 = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvBasin430 = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvBasinOther = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvHFOilWell = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvConvOilWell = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvHFComp = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvConvComp = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvOilWellDrilled = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvStateGOMOffshore = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvStatePacOffshore = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
#nongrid
Map_EnvAllwell_nongrid = np.zeros([num_years, num_months])
Map_EnvOilProd_nongrid = np.zeros([num_years, num_months])
Map_EnvBasin220_nongrid = np.zeros([num_years, num_months]) 
Map_EnvBasin360_nongrid = np.zeros([num_years, num_months])
Map_EnvBasin395_nongrid = np.zeros([num_years, num_months])
Map_EnvBasin430_nongrid = np.zeros([num_years, num_months])
Map_EnvBasinOther_nongrid = np.zeros([num_years, num_months])
Map_EnvHFOilWell_nongrid = np.zeros([num_years, num_months])
Map_EnvConvOilWell_nongrid = np.zeros([num_years, num_months])
Map_EnvHFComp_nongrid = np.zeros([num_years, num_months])
Map_EnvConvComp_nongrid = np.zeros([num_years, num_months])
Map_EnvOilWellDrilled_nongrid = np.zeros([num_years, num_months])
Map_EnvStateGOMOffshore_nongrid = np.zeros([num_years, num_months])
Map_EnvStatePacOffshore_nongrid = np.zeros([num_years, num_months])

if ReCalc_Enverus ==1:
    for iyear in np.arange(0,num_years):
        enverus_data_temp = vars()['Enverus_data_'+year_range_str[iyear]].copy()
        nocompdate = 0 #record the number of wells that don't have reported completion dates (but have production in that given year)
        nodrill = 0 #record the number of wells that don't have drilling information
        nooffshore = 0
        
        #loop through each row (e.g., well) in the Enverus dataset (for both onshore and offshore oil wells)
        # This will not include wells that have zero oil production in a given year, but is consistent with the GHGI approach.
        list_onshore_wells = enverus_data_temp.index[enverus_data_temp.loc[:,'OFFSHORE'] == 'N'].tolist()
        list_offshore_wells = enverus_data_temp.index[enverus_data_temp.loc[:,'OFFSHORE'] == 'Y'].tolist()
        list_oil_wells = enverus_data_temp.index[enverus_data_temp.loc[:,'CUM_OIL'] > 0].tolist()
        #find onshore oil wells based on common list elements...
        list1_as_set = set(list_onshore_wells)
        intersection = list1_as_set.intersection(list_oil_wells)
        list_onshore_oil_wells = list(intersection)
        #find offshore oil wells based on common list elements...
        list1_as_set = set(list_offshore_wells)
        intersection = list1_as_set.intersection(list_oil_wells)
        list_offshore_oil_wells = list(intersection)
    
        # for onshore oil wells... 
        for iwell in list_onshore_oil_wells:
            #Check if location is within CONUS
            if enverus_data_temp['LONGITUDE'][iwell] > Lon_left and enverus_data_temp['LONGITUDE'][iwell] < Lon_right \
                and enverus_data_temp['LATITUDE'][iwell] > Lat_low and enverus_data_temp['LATITUDE'][iwell] < Lat_up:
                #find index of lon and lat, and NEMS region
                ilat = int((enverus_data_temp['LATITUDE'][iwell] - Lat_low)/Res01)
                ilon = int((enverus_data_temp['LONGITUDE'][iwell] - Lon_left)/Res01)
            
                if ((data_fn.safe_div(enverus_data_temp['CUM_GAS'][iwell],float(enverus_data_temp['CUM_OIL'][iwell]))) <= 100 and \
                    ((enverus_data_temp['GOR_QUAL'][iwell] =='Liq only') or (enverus_data_temp['GOR_QUAL'][iwell] =='Liq+Gas'))):
                    # if oil well, 
                    for imonth in np.arange(0,num_months):
                        #count wells in map only for months where there is oil production (emissions ~ when production is occurring)
                        prod_str = 'OILPROD_'+month_tag[imonth]  
                        if enverus_data_temp[prod_str][iwell] >0:
                            Map_EnvAllwell[ilat,ilon,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell] #includes oil wells only
                            Map_EnvOilProd[ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell] # oil production from oil wells only

                            #save basin-specific production levels for onshore oil wells
                            if enverus_data_temp['AAPG_CODE_ERG'][iwell] =='220':
                                Map_EnvBasin220[ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            elif enverus_data_temp['AAPG_CODE_ERG'][iwell] =='360':
                                Map_EnvBasin360[ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            elif enverus_data_temp['AAPG_CODE_ERG'][iwell] =='395':
                                Map_EnvBasin395[ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            elif enverus_data_temp['AAPG_CODE_ERG'][iwell] =='430':
                                Map_EnvBasin430[ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            else: 
                                Map_EnvBasinOther[ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell]
                        
                            if enverus_data_temp['HF'][iwell] == 'Y':
                            #is it an HF well or not?
                                Map_EnvHFOilWell[ilat,ilon,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell] #oil HF wells
                            else:     
                                Map_EnvConvOilWell[ilat,ilon,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell] #oil conventional wells
                                            
                    if isinstance(enverus_data_temp['COMPLETION_DATE'][iwell],float):
                    #if oil well (onshore), regardless of whether the well this month is producing, 
                    # determine whether the given well was completed this year, and if so, assign it to the correct month,
                    # if not completed in the current year, then don't add completion year (assume it was captured already in previous year loop)
                    # if completion date is NaN, do not record anywhere (may undercount). Will also undercount if well completed in
                    # one year but does not start producing until the next. 
                        if np.isnan(enverus_data_temp['COMPLETION_DATE'][iwell]):
                            nocompdate = nocompdate +1
                    else:
                        month = enverus_data_temp['COMPLETION_DATE'][iwell][5:7] #extract the month
                        year = enverus_data_temp['COMPLETION_DATE'][iwell][0:4] #extract year
                        if year_range_str[iyear] == year:
                        # if completed in the current year, add to the correct month map
                            for imonth in np.arange(0, num_months):
                                if month_tag[imonth] == month:
                                    if enverus_data_temp['HF'][iwell] == 'Y':
                                        Map_EnvHFComp[ilat,ilon,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell] #includes completions from HF oil wells completed in the current year
                                    else:
                                        #print('here, non-HF')
                                        Map_EnvConvComp[ilat,ilon,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell] #includes completions from conventional oil wells completed in the current year
                
                    if isinstance(enverus_data_temp['SPUD_DATE'][iwell],float):
                    #if oil well (onshore), regardless of whether the well this month is producing, 
                    # determine whether the given well was drilled this year, and if so, assign it to the correct *YEAR* 
                    # assign based on SPUD Date, unless Null, then check to see if producing in the current year
                    # NOTE: the National inventory looks for first production date in the next year to see if drilled this year. 
                    # This logic is too difficult to implement here, so only counted if first_prod_date is in current year
                        if np.isnan(enverus_data_temp['SPUD_DATE'][iwell]):
                            if isinstance(enverus_data_temp['FIRST_PROD_DATE'][iwell],float):
                                if np.isnan(enverus_data_temp['FIRST_PROD_DATE'][iwell]):
                                    nodrill += 1
                            else:
                                year = enverus_data_temp['FIRST_PROD_DATE'][iwell][0:4] #extract year
                                if year_range_str[iyear] == year:
                                    Map_EnvOilWellDrilled[ilat,ilon,iyear,:] += enverus_data_temp['WELL_COUNT'][iwell]
                    else:
                        year = enverus_data_temp['SPUD_DATE'][iwell][0:4] #extract year
                        if year_range_str[iyear] == year:
                        # if drilled (spudded) in the current year, add to all months of that year
                            Map_EnvOilWellDrilled[ilat,ilon,iyear,:] += enverus_data_temp['WELL_COUNT'][iwell]
                
            #if not in CONUS grid, still count those wells in non-grid arrays (does not include offshore, dealt with next)
            # same logic sequence as above
            else:
                #inems = enverus_data_temp['NEMS_CODE'][iwell].astype(int) 
                if ((data_fn.safe_div(enverus_data_temp['CUM_GAS'][iwell],float(enverus_data_temp['CUM_OIL'][iwell]))) <= 100 and \
                    ((enverus_data_temp['GOR_QUAL'][iwell] =='Liq only') or (enverus_data_temp['GOR_QUAL'][iwell] =='Liq+Gas'))):
                    for imonth in np.arange(0,num_months):
                    #count wells only for months where there is oil production (emissions ~ when production is occurring)
                        prod_str = 'OILPROD_'+month_tag[imonth]  
                        if enverus_data_temp[prod_str][iwell] >0:
                        #check if an oil well
                            Map_EnvAllwell_nongrid[iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell] #includes oil wells only
                            Map_EnvOilProd_nongrid[iyear,imonth] += enverus_data_temp[prod_str][iwell]
                        
                            #save basin-specific oil production levels for onshore oil wells
                            if enverus_data_temp['AAPG_CODE_ERG'][iwell] =='220':
                                Map_EnvBasin220_nongrid[iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            elif enverus_data_temp['AAPG_CODE_ERG'][iwell] =='360':
                                Map_EnvBasin360_nongrid[iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            elif enverus_data_temp['AAPG_CODE_ERG'][iwell] =='395':
                                Map_EnvBasin395_nongrid[iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            elif enverus_data_temp['AAPG_CODE_ERG'][iwell] =='430':
                                Map_EnvBasin430_nongrid[iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            else: 
                                Map_EnvBasinOther_nongrid[iyear,imonth] += enverus_data_temp[prod_str][iwell]
                                
                            if enverus_data_temp['HF'][iwell] == 'Y':
                                Map_EnvHFOilWell_nongrid[iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell]
                            else:     
                                Map_EnvConvOilWell_nongrid[iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell]
                                         
                    if isinstance(enverus_data_temp['COMPLETION_DATE'][iwell],float): 
                        if np.isnan(enverus_data_temp['COMPLETION_DATE'][iwell]):
                            nocompdate = nocompdate +1
                    else:
                        month = enverus_data_temp['COMPLETION_DATE'][iwell][5:7] #extract the month
                        year = enverus_data_temp['COMPLETION_DATE'][iwell][0:4]
                        if year_range_str[iyear] == year:
                            for imonth in np.arange(0, num_months):
                                if month_tag[imonth] == month:
                                    if enverus_data_temp['HF'][iwell] == 'Y':
                                        Map_EnvHFComp_nongrid[iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell]
                                    else:
                                        Map_EnvConvComp_nongrid[iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell]
            
                    if isinstance(enverus_data_temp['SPUD_DATE'][iwell],float):
                        if np.isnan(enverus_data_temp['SPUD_DATE'][iwell]):
                            if isinstance(enverus_data_temp['FIRST_PROD_DATE'][iwell],float):
                                if np.isnan(enverus_data_temp['FIRST_PROD_DATE'][iwell]):
                                    nodrill += 1
                            else:
                                year = enverus_data_temp['FIRST_PROD_DATE'][iwell][0:4] #extract year
                                if year_range_str[iyear] == year:
                                    Map_EnvOilWellDrilled_nongrid[iyear,:] += enverus_data_temp['WELL_COUNT'][iwell]
                    else:
                        year = enverus_data_temp['SPUD_DATE'][iwell][0:4] #extract year
                        if year_range_str[iyear] == year:
                        # if drilled (spudded) in the current year, add to all months of that year
                            Map_EnvOilWellDrilled_nongrid[iyear,:] += enverus_data_temp['WELL_COUNT'][iwell]      
            
                    
        #for offshore oil well locations... 
        # EPA State GOM offshore emissions will be allocated based on Enverus production for
        # offshore emissions in GOM states (AL, LA, TX, etc). 
        # Offshore emissions (in NGOM region) are not included in the ERG well count nor here. 
        # Federal offshore emissions are allocated later based on BOEM GOADS platform emissions
        for iwell in list_offshore_oil_wells:

            #Check if location is on grid
            if enverus_data_temp['LONGITUDE'][iwell] > Lon_left and enverus_data_temp['LONGITUDE'][iwell] < Lon_right \
                and enverus_data_temp['LATITUDE'][iwell] > Lat_low and enverus_data_temp['LATITUDE'][iwell] < Lat_up:
                #Set ilon and ilat
                ilat = int((enverus_data_temp['LATITUDE'][iwell] - Lat_low)/Res01)
                ilon = int((enverus_data_temp['LONGITUDE'][iwell] - Lon_left)/Res01)
                
                #figure out how to deal with this ....
                # check if oil well (offshore), using the same GOR criteria as for onshore wells
                if ((data_fn.safe_div(enverus_data_temp['CUM_GAS'][iwell],float(enverus_data_temp['CUM_OIL'][iwell]))) <= 100 and \
                    ((enverus_data_temp['GOR_QUAL'][iwell] =='Liq only') or (enverus_data_temp['GOR_QUAL'][iwell] =='Liq+Gas'))):
                    if enverus_data_temp['STATE'][iwell] in {'AL','FL','LA','MS','TX'}:
                        for imonth in np.arange(0,num_months):
                        #add production to map only for months where there is oil production (emissions ~ when production is occurring)
                            prod_str = 'OILPROD_'+month_tag[imonth]  
                            if enverus_data_temp[prod_str][iwell] >0:
                                Map_EnvStateGOMOffshore[ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell]
                    elif enverus_data_temp['STATE'][iwell] in {'CA'}:
                        for imonth in np.arange(0,num_months):
                        #add production to map only for months where there is oil production (emissions ~ when production is occurring)
                            prod_str = 'OILPROD_'+month_tag[imonth]  
                            if enverus_data_temp[prod_str][iwell] >0:
                                Map_EnvStatePacOffshore[ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell]
            else:
                nooffshore +=1
                #print("Error - No offshore outside of the domain")#display(EPA_emi_prod_NG_CH4)           
                

        print('Enverus data not included in this analysis:')
        print('Year: '+year_range_str[iyear])
        print('Wells without drilling information (no Spud or Production data): ',nodrill)
        print('Wells without completion dates: ',nocompdate)
        print('Wells offshore and outside of grid domain: ',nooffshore)

    #save current status of datafiles
    np.savez('./IntermediateOutputs/Oil_EnvAllWell_tempout', x=Map_EnvAllwell, y=Map_EnvAllwell_nongrid)
    np.savez('./IntermediateOutputs/Oil_EnvOilProd_tempout', x=Map_EnvOilProd, y=Map_EnvOilProd_nongrid)
    np.savez('./IntermediateOutputs/Oil_EnvBasin220_tempout', x=Map_EnvBasin220, y=Map_EnvBasin220_nongrid)
    np.savez('./IntermediateOutputs/Oil_EnvBasin360_tempout', x=Map_EnvBasin360, y=Map_EnvBasin360_nongrid)
    np.savez('./IntermediateOutputs/Oil_EnvBasin395_tempout', x=Map_EnvBasin395, y=Map_EnvBasin395_nongrid)
    np.savez('./IntermediateOutputs/Oil_EnvBasin430_tempout', x=Map_EnvBasin430, y=Map_EnvBasin430_nongrid)
    np.savez('./IntermediateOutputs/Oil_EnvBasinOther_tempout', x=Map_EnvBasinOther, y=Map_EnvBasinOther_nongrid)
    np.savez('./IntermediateOutputs/Oil_EnvHFOilWell_tempout', x=Map_EnvHFOilWell, y=Map_EnvHFOilWell_nongrid)
    np.savez('./IntermediateOutputs/Oil_EnvConvOilWell_tempout', x=Map_EnvConvOilWell, y=Map_EnvConvOilWell_nongrid)
    np.savez('./IntermediateOutputs/Oil_EnvHFComp_tempout', x=Map_EnvHFComp, y=Map_EnvHFComp_nongrid)
    np.savez('./IntermediateOutputs/Oil_EnvConvComp_tempout', x=Map_EnvConvComp, y=Map_EnvConvComp_nongrid)
    np.savez('./IntermediateOutputs/Oil_EnvOilWellDrilled_tempout', x=Map_EnvOilWellDrilled, y=Map_EnvOilWellDrilled_nongrid)
    np.savez('./IntermediateOutputs/Oil_EnvStateGOMOffshore_tempout', x=Map_EnvStateGOMOffshore, y=Map_EnvStateGOMOffshore_nongrid)
    np.savez('./IntermediateOutputs/Oil_EnvStatePacOffshore_tempout', x=Map_EnvStatePacOffshore, y=Map_EnvStatePacOffshore_nongrid)

else:
    #load previously saved files
    npzfile = np.load('./IntermediateOutputs/Oil_EnvAllWell_tempout.npz')
    Map_EnvAllwell = npzfile['x']
    Map_EnvAllwell_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Oil_EnvOilProd_tempout.npz')
    Map_EnvOilProd = npzfile['x']
    Map_EnvOilProd_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Oil_EnvBasin220_tempout.npz')
    Map_EnvBasin220 = npzfile['x']
    Map_EnvBasin220_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Oil_EnvBasin360_tempout.npz')
    Map_EnvBasin360 = npzfile['x']
    Map_EnvBasin360_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Oil_EnvBasin430_tempout.npz')
    Map_EnvBasin430 = npzfile['x']
    Map_EnvBasin430_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Oil_EnvBasinOther_tempout.npz')
    Map_EnvBasinOther = npzfile['x']
    Map_EnvBasinOther_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Oil_EnvBasin395_tempout.npz')
    Map_EnvBasin395 = npzfile['x']
    Map_EnvBasin395_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Oil_EnvHFOilWell_tempout.npz')
    Map_EnvHFOilWell = npzfile['x']
    Map_EnvHFOilWell_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Oil_EnvConvOilWell_tempout.npz')
    Map_EnvConvOilWell = npzfile['x']
    Map_EnvConvOilWell_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Oil_EnvHFComp_tempout.npz')
    Map_EnvHFComp = npzfile['x']
    Map_EnvHFComp_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Oil_EnvConvComp_tempout.npz')
    Map_EnvConvComp = npzfile['x']
    Map_EnvConvComp_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Oil_EnvOilWellDrilled_tempout.npz')
    Map_EnvOilWellDrilled = npzfile['x']
    Map_EnvOilWellDrilled_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Oil_EnvStateGOMOffshore_tempout.npz')
    Map_EnvStateGOMOffshore = npzfile['x']
    Map_EnvStateGOMOffshore_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Oil_EnvStatePacOffshore_tempout.npz')
    Map_EnvStatePacOffshore = npzfile['x']
    Map_EnvStatePacOffshore_nongrid = npzfile['y']
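
The well loops above convert each latitude/longitude to a grid index with `int((value - lower_bound)/Res01)` and accumulate one well at a time. A minimal vectorized sketch of the same binning is shown below. The grid bounds are assumed placeholder values (the notebook sets its own in Step 0), and `grid_values` is a hypothetical helper, not a function used elsewhere in this notebook.

```python
import numpy as np

# Assumed grid parameters for illustration only; the notebook defines its own in Step 0
Lat_low, Lat_up = 20.0, 55.0
Lon_left, Lon_right = -130.0, -60.0
Res01 = 0.1

def grid_values(lats, lons, values):
    """Accumulate `values` onto the 0.1-degree grid; return the grid and the off-grid count."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    values = np.asarray(values, dtype=float)
    nlat = int(round((Lat_up - Lat_low) / Res01))
    nlon = int(round((Lon_right - Lon_left) / Res01))
    grid = np.zeros((nlat, nlon))
    # same strict bounds check as the per-well loop
    on_grid = (lons > Lon_left) & (lons < Lon_right) & (lats > Lat_low) & (lats < Lat_up)
    # truncate to cell indices; clip guards against floating-point round-up at the upper edge
    ilat = np.minimum(((lats[on_grid] - Lat_low) / Res01).astype(int), nlat - 1)
    ilon = np.minimum(((lons[on_grid] - Lon_left) / Res01).astype(int), nlon - 1)
    # np.add.at accumulates correctly when several wells fall in the same cell
    np.add.at(grid, (ilat, ilon), values[on_grid])
    return grid, int((~on_grid).sum())
```

`np.add.at` is used instead of `grid[ilat, ilon] += values` because the latter only applies one update per repeated index, which would undercount cells that contain more than one well.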

### Step 2.6 Oil Gravities [not included]

### Step 2.7 Correct Missing IL/IN Data

In [None]:
# General Process
# 1. Read the state-level well and production statistics from the GHGI (these contain corrected IL and IN data)
# 2. Read in the relevant NEI data (from both file formats) and place onto GEPA grid (including reprojection of NEI data)
# 3. Scale the NEI proxy maps to the corresponding state level values from Step 1.
# 4. Calculate the lease condensate proxy for IL/IN using the same method as the Enverus data
# 5. Place the scaled NEI grid data on the appropriate Enverus proxy grids. 
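
Step 3 of the process above keeps the spatial pattern of the NEI proxy but forces its state total to match the GHGI value. The scaling code itself is not part of this section, so the following is only a sketch that assumes simple proportional scaling; `scale_to_state_total` is a hypothetical helper name, and returning zeros for an empty proxy is an assumed convention.

```python
import numpy as np

def scale_to_state_total(proxy_map, state_total):
    """Rescale a 2-D proxy map so that it sums to `state_total`."""
    proxy_map = np.asarray(proxy_map, dtype=float)
    proxy_sum = proxy_map.sum()
    if proxy_sum == 0:
        # assumed behaviour: nothing to distribute if the proxy is empty
        return np.zeros_like(proxy_map)
    # each cell keeps its share of the proxy total
    return proxy_map * (state_total / proxy_sum)
```

For example, a proxy of `[[1, 3], [0, 4]]` scaled to a state total of 16 becomes `[[2, 6], [0, 8]]`: the relative weights are unchanged and only the absolute level moves.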

#### Step 2.7.1 Read in GHGI State-Level Well Statistics for IL/IN

##### Step 2.7.1.1 Well Counts

In [None]:
#1. Read in National Well Statistics for IL/IN (from ERG Wells Processing Workbook)
# These are used to scale the NEI proxies to GHGI totals so that IL and IN are weighted consistently with
# their relative weights in the GHGI (e.g., consistent well counts in IL/IN relative to the national total).
# There may be absolute differences in the NEI due to different data processing. 
# In other words, we want to take the relative spatial information from the NEI, but not the absolute values
# Use the well count data for 2016, 2017, 2018, and 2019 - corrected by ERG 

Env_ILIN_wells = pd.read_excel(Enverus_WellCounts_inputfile, sheet_name = "2020 PR - State", skiprows = 4)
Env_ILIN_wells = Env_ILIN_wells.drop(columns = ['Category','WELLCOUNT_16', 'WELLCOUNT_17','WELLCOUNT_18'])
Env_ILIN_wells.rename(columns={Env_ILIN_wells.columns[Env_ILIN_wells.columns.get_loc('WELLCOUNT_16_ERG')]:'WELLCOUNT_16'}, inplace=True)
Env_ILIN_wells.rename(columns={Env_ILIN_wells.columns[Env_ILIN_wells.columns.get_loc('WELLCOUNT_17_ERG')]:'WELLCOUNT_17'}, inplace=True)
Env_ILIN_wells.rename(columns={Env_ILIN_wells.columns[Env_ILIN_wells.columns.get_loc('WELLCOUNT_18_ERG')]:'WELLCOUNT_18'}, inplace=True)
Env_ILIN_wells = Env_ILIN_wells.fillna(0)
Env_ILIN_wells = Env_ILIN_wells[(Env_ILIN_wells['STATE']=='IL') | (Env_ILIN_wells['STATE']=='IN')]
Env_ILIN_wells.reset_index(inplace=True, drop=True)
Env_ILIN_wells['NEMS'] = 0 #IN and IL are both in the north east region

#2 Calculate Well Counts of Each Well Type for each NEMS region
# ERG Query codes
# 1 = Non-Associated Gas Wells, #2 = Oil Wells
# 3 = Associated Gas Wells (not included in total well counts)
# 4  = Gas Wells (non-associated) with Hydraulic Fracturing
# 5 = Gas Well Completions with Hydraulic Fracturing
# 6 = Oil Wells with Hydraulic Fracturing, 
# 7 = Oil well completions with hydraulic fracturing
# 8 = All Gas Well Completions, 
# 9 All Oil well completions
# 10a = Gas Wells Drilled, #10 b = Oil Wells Drilled
# 10c = Dry Wells Drilled

Well_Allwell_ILIN = np.zeros([2,num_years]) #all oil wells
Well_HFOilWell_ILIN = np.zeros([2,num_years]) #HF oil wells
Well_ConvOilWell_ILIN = np.zeros([2, num_years]) # conventional oil wells (all - HF)
Well_HFComp_ILIN = np.zeros([2, num_years]) #HF oil well completions
Well_ConvComp_ILIN = np.zeros([2, num_years]) # oil conventional well completions (all - HF)
Well_AllComp_ILIN = np.zeros([2, num_years]) #all oil well completions

Well_Gaswell_drilled_ILIN = np.zeros([2, num_years]) # gas wells drilled
Well_Oilwell_drilled_ILIN = np.zeros([2, num_years]) # will end up being corrected total oil wells drilled (including fraction of dry wells)
Well_Drywell_drilled_ILIN = np.zeros([2, num_years]) # dry wells drilled

# 1) Get all well count data for non-HF wells and completions
start_year_idx = Env_ILIN_wells.columns.get_loc('WELLCOUNT_'+str(start_year)[2:4])
end_year_idx = Env_ILIN_wells.columns.get_loc('WELLCOUNT_'+str(end_year)[2:4])+1

for idx in np.arange(0,len(Env_ILIN_wells)):
    if Env_ILIN_wells['STATE'][idx] == 'IL':
        istate =0
    else:
        istate =1

    if Env_ILIN_wells['QUERY_NMBR'][idx] ==2:
        Well_Allwell_ILIN[istate,] = Well_Allwell_ILIN[istate,]+Env_ILIN_wells.iloc[idx,start_year_idx:end_year_idx]
    elif Env_ILIN_wells['QUERY_NMBR'][idx] ==6:
        Well_HFOilWell_ILIN[istate,] = Well_HFOilWell_ILIN[istate,]+Env_ILIN_wells.iloc[idx,start_year_idx:end_year_idx]
    elif Env_ILIN_wells['QUERY_NMBR'][idx] ==7:
        Well_HFComp_ILIN[istate,] = Well_HFComp_ILIN[istate,]+Env_ILIN_wells.iloc[idx,start_year_idx:end_year_idx]
    elif Env_ILIN_wells['QUERY_NMBR'][idx] ==9:   
        Well_AllComp_ILIN[istate,] = Well_AllComp_ILIN[istate,]+Env_ILIN_wells.iloc[idx,start_year_idx:end_year_idx]
    elif Env_ILIN_wells['QUERY_NMBR'][idx] =='10a':   
        Well_Gaswell_drilled_ILIN[istate,] = Well_Gaswell_drilled_ILIN[istate,]+Env_ILIN_wells.iloc[idx,start_year_idx:end_year_idx]
    elif Env_ILIN_wells['QUERY_NMBR'][idx] =='10c':
        Well_Drywell_drilled_ILIN[istate,] = Well_Drywell_drilled_ILIN[istate,]+Env_ILIN_wells.iloc[idx,start_year_idx:end_year_idx]
    elif Env_ILIN_wells['QUERY_NMBR'][idx] =='10b':
        Well_Oilwell_drilled_ILIN[istate,] = Well_Oilwell_drilled_ILIN[istate,]+Env_ILIN_wells.iloc[idx,start_year_idx:end_year_idx]

# Calculate conventional well counts and completions (all oil wells - HF oil wells)
Well_ConvOilWell_ILIN = Well_Allwell_ILIN - Well_HFOilWell_ILIN
Well_ConvComp_ILIN = Well_AllComp_ILIN - Well_HFComp_ILIN

# Calculate total number of oil wells drilled, adding the oil share of dry wells (dry wells * oil / (gas + oil))
for istate in np.arange(0,2):
    for iyear in np.arange(0,num_years):
        Well_Oilwell_drilled_ILIN[istate,iyear] = Well_Oilwell_drilled_ILIN[istate,iyear] + Well_Drywell_drilled_ILIN[istate,iyear] \
                                        * (data_fn.safe_div(Well_Oilwell_drilled_ILIN[istate,iyear],\
                                        (Well_Gaswell_drilled_ILIN[istate,iyear]+Well_Oilwell_drilled_ILIN[istate,iyear])))

print('IL/IN GHGI total counts')
for iyear in np.arange(0,num_years):
    print('Year: ', year_range_str[iyear])
    #Print final well counts ** ADD IN QA/QC with final wells notebook later **
    print('All oil wells:     ',np.sum(Well_Allwell_ILIN[:,iyear]))
    print('Oil wells Conv:    ',np.sum(Well_ConvOilWell_ILIN[:,iyear]))
    print('Oil wells HF:      ',np.sum(Well_HFOilWell_ILIN[:,iyear]))
    print('All oil well comp: ',np.sum(Well_AllComp_ILIN[:,iyear]))
    print('Oil HF comp:       ',np.sum(Well_HFComp_ILIN[:,iyear]))
    print('Oil Conv comp:     ',np.sum(Well_ConvComp_ILIN[:,iyear]))
    print('Wells Drilled:     ',np.sum(Well_Oilwell_drilled_ILIN[:,iyear]))
    print(' ')
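
The dry-well correction in the cell above assigns dry wells to oil in proportion to the oil share of successfully drilled wells. A small worked sketch follows. `safe_div` here is a hypothetical stand-in for `data_fn.safe_div`, assumed to return 0 when the denominator is 0; the actual helper may behave differently.

```python
def safe_div(x, y):
    """Hypothetical stand-in for data_fn.safe_div (assumed to return 0 on division by zero)."""
    return x / y if y != 0 else 0.0

def corrected_oil_wells_drilled(oil, gas, dry):
    """Oil wells drilled plus the oil-weighted share of dry wells."""
    return oil + dry * safe_div(oil, oil + gas)

# 30 oil wells, 10 gas wells, 8 dry wells: oil share is 30/40 = 0.75,
# so 6 of the 8 dry wells are attributed to oil
print(corrected_oil_wells_drilled(30, 10, 8))
```

With no oil or gas wells drilled in a state and year, the guarded division leaves the dry wells unallocated rather than raising an error.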

##### Step 2.7.1.2 Well Production

In [None]:
# ERG Processed Well Production Data (from Prism/Enverus)
# Liquids (oil) production from wells in each NEMS region, state, and basin (SUMOFLIQ columns)

# Gas production (SUMOFGAS columns) is dropped below; only liquids production from oil wells (QRY = 2) is used

# Use the production data for 2016, 2017, and 2018 as corrected by ERG
Env_ILIN_wellsprod = pd.read_excel(Enverus_WellProd_inputfile, sheet_name = "GHG_DATA_AAPG_MAR19", skiprows = 1)

#drop gas production data (SUMOFGAS columns); only liquids production is needed here
match = np.where(Env_ILIN_wellsprod.columns.str.contains('SUMOFGAS'))[0][:]
Env_ILIN_wellsprod = Env_ILIN_wellsprod.drop(Env_ILIN_wellsprod.columns[match], axis=1)

#replace with ERG recalculations
Env_ILIN_wellsprod = Env_ILIN_wellsprod.drop(columns = ['SUMOFLIQ_16', 'SUMOFLIQ_17','SUMOFLIQ_18',])
Env_ILIN_wellsprod.rename(columns={Env_ILIN_wellsprod.columns[Env_ILIN_wellsprod.columns.get_loc('SUMOFLIQ_16_ERG')]:'SUMOFLIQ_16'}, inplace=True)
Env_ILIN_wellsprod.rename(columns={Env_ILIN_wellsprod.columns[Env_ILIN_wellsprod.columns.get_loc('SUMOFLIQ_17_ERG')]:'SUMOFLIQ_17'}, inplace=True)
Env_ILIN_wellsprod.rename(columns={Env_ILIN_wellsprod.columns[Env_ILIN_wellsprod.columns.get_loc('SUMOFLIQ_18_ERG')]:'SUMOFLIQ_18'}, inplace=True)
Env_ILIN_wellsprod = Env_ILIN_wellsprod.fillna(0)
Env_ILIN_wellsprod = Env_ILIN_wellsprod[(Env_ILIN_wellsprod['STATE']=='IL') | (Env_ILIN_wellsprod['STATE']=='IN')]
Env_ILIN_wellsprod.reset_index(inplace=True, drop=True)
#display(Env_ILIN_wellsprod)

#Env_ILIN_wellsprod['NEMS'] = 0 #all data are in northeast region

# Extract the liquids production data from oil wells (QRY = 2)
# and assign to the 'other basin' array based on the reported state and AAPG Code as determined by ERG in the workbook
Wellprod_other_ILIN = np.zeros([2, num_years])

start_year_idx = Env_ILIN_wellsprod.columns.get_loc('SUMOFLIQ_'+str(start_year)[2:4])
end_year_idx = Env_ILIN_wellsprod.columns.get_loc('SUMOFLIQ_'+str(end_year)[2:4])+1

for idx in np.arange(0,len(Env_ILIN_wellsprod)):
    if Env_ILIN_wellsprod['STATE'][idx] == 'IL':
        istate =0
    else:
        istate =1
    # accumulate production for both states (IL and IN)
    if Env_ILIN_wellsprod['QUERY_NMBR'][idx] ==2: # production from oil wells
        if Env_ILIN_wellsprod['AAPG_CODE_ERG'][idx] != 220 and Env_ILIN_wellsprod['AAPG_CODE_ERG'][idx] != 395 and Env_ILIN_wellsprod['AAPG_CODE_ERG'][idx] != 430: 
            Wellprod_other_ILIN[istate,] = Wellprod_other_ILIN[istate,] + Env_ILIN_wellsprod.iloc[idx,start_year_idx:end_year_idx]
                
#Print final production totals ** ADD IN QA/QC with final wells notebook later **
print('IL/IN GHGI total production')
for iyear in np.arange(0,num_years):
    print('Year: ', year_range_str[iyear])
    print('Other Basin Production: ',np.sum(Wellprod_other_ILIN[:,iyear]))
    #print(' ')

#### Step 2.7.2 Read In/Format NEI Values

##### Step 2.7.2.1 Read in all data prior to 2018 (text file format)

In [None]:
#1 Read in relevant files by year (for all years before 2018 [2018 read from different file type])
# Data are in a text file format where each row of data contains the surrogate code, FIPS code, column and row location
# (on the NEI CONUS1 grid), and the absolute, fractional, and running sum of data (e.g., counts or production) in the
# given FIPS region. 
# The absolute data are placed onto the GEPA grid by using an NEI reference map shapefile to map the data location
# from the NEI CONUS grid cell indexes to the corresponding latitude and longitude values in the GEPA grid. 
### Note - the 2016 data from the NEI is on a non-standard grid where lat/lons are unknown. Can change later if needed, or
# can interpolate between years if more accurate

NEI_files = ['/USA_695_NOFILL.txt', '/USA_685_NOFILL.txt', '/USA_694_NOFILL.txt', '/USA_681_NOFILL.txt']
data_names = ['map_NEI_oil_wells', 'map_NEI_oil_completions','map_NEI_oil_production','map_NEI_oil_drilledwells']

for ivar in np.arange(0,len(data_names)):
    vars()[data_names[ivar]] = np.zeros([2,len(Lat_01),len(Lon_01),num_years])

# only recalc the data if required (set in Step 0)
if ReCalc_NEI ==1:
    
    #read in the NEI grid reference shapefile (contains the lat/lons of each NEI coordinate)
    shape = shp.Reader(NEI_grid_ref_inputfile)

    #make the map arrays of absolute values (counts and mcf)
    for ivar in np.arange(0,len(data_names)):
        for iyear in np.arange(0,num_years):
            if year_range_str[iyear] == '2012':
                year = '2011'
            elif year_range_str[iyear] == '2013' or year_range_str[iyear] == '2014' or year_range_str[iyear] == '2015':
                year = '2014'
            elif year_range_str[iyear] == '2016' or year_range_str[iyear] == '2017':
                year = '2017'
            elif year_range_str[iyear] == '2018':
                continue
            else:
                print('NEI DATA MISSING FOR YEAR ',year_range_str[iyear])
                continue #skip years that have no mapped NEI file
            path = ERG_NEI_inputloc+year+NEI_files[ivar]
            data_temp = pd.read_csv(path, sep='\t', skiprows = 25)
            data_temp = data_temp.drop(["!"], axis=1)
            data_temp.columns = ['Code','FIPS','COL','ROW','Frac','Abs','FIPS_Total','FIPS_Running_Sum']
            data_temp['Lat'] = np.zeros([len(data_temp)])
            data_temp['Lon'] = np.zeros([len(data_temp)])
            colmin = 1332
            colmax=0
            rowmin = 1548
            rowmax=0
            counter =0
        
            #Create the boundary box
            for idx in np.arange(0,len(data_temp)):
                if str(data_temp['FIPS'][idx]).startswith('17') or str(data_temp['FIPS'][idx]).startswith('18'):
                    icol = data_temp['COL'][idx]
                    irow = data_temp['ROW'][idx]
                    if icol > colmax:
                        colmax =icol
                    if icol < colmin:
                        colmin = icol
                    if irow > rowmax:
                        rowmax = irow
                    if irow < rowmin:
                        rowmin  = irow
            
            #Extract the relevant indices from the NEI reference shapefile
            array_temp = np.zeros([4,((colmax+1-colmin)*(rowmax+1-rowmin))]) #make an array to save col, row, lat, lon
            idx=0
            for rec in shape.iterRecords():
                if (int(rec['cellid'][0:4]) <= colmax and int(rec['cellid'][0:4]) >= colmin) \
                    and (int(rec['cellid'][5:]) <= rowmax and int(rec['cellid'][5:]) >= rowmin):
                        array_temp[0,idx] = int(rec['cellid'][0:4])   #column index
                        array_temp[1,idx] = int(rec['cellid'][5:])    #row index
                        array_temp[2,idx] = rec['Latitude']           #latitude
                        array_temp[3,idx] = rec['Longitude']          #longitude
                        idx +=1
    
            #Use this array to locate and assign the lat lon values to the NEI datafile and then place onto grid
            for idx in np.arange(0,len(data_temp)):
                if str(data_temp['FIPS'][idx]).startswith('17') or str(data_temp['FIPS'][idx]).startswith('18'):
                    icol = data_temp['COL'][idx]
                    irow = data_temp['ROW'][idx]
                    match = np.where((icol == array_temp[0,:]) & (irow == array_temp[1,:]))[0][0]
                    data_temp.loc[idx,'Lat'] = array_temp[2,match]
                    data_temp.loc[idx,'Lon'] = array_temp[3,match]
                    ilat = int((data_temp['Lat'][idx] - Lat_low)/Res01)
                    ilon = int((data_temp['Lon'][idx] - Lon_left)/Res01)
                    if str(data_temp['FIPS'][idx]).startswith('17'):
                        vars()[data_names[ivar]][0,ilat,ilon,iyear] += data_temp.loc[idx,'Abs']
                    else:
                        vars()[data_names[ivar]][1,ilat,ilon,iyear] += data_temp.loc[idx,'Abs']

    np.save('./IntermediateOutputs/NEI_oilwell_tempoutput', map_NEI_oil_wells)
    np.save('./IntermediateOutputs/NEI_oilcomp_tempoutput', map_NEI_oil_completions)
    np.save('./IntermediateOutputs/NEI_oilprod_tempoutput', map_NEI_oil_production)
    np.save('./IntermediateOutputs/NEI_oildrill_tempoutput', map_NEI_oil_drilledwells)

else:
    map_NEI_oil_wells = np.load('./IntermediateOutputs/NEI_oilwell_tempoutput.npy')
    map_NEI_oil_completions = np.load('./IntermediateOutputs/NEI_oilcomp_tempoutput.npy')
    map_NEI_oil_production = np.load('./IntermediateOutputs/NEI_oilprod_tempoutput.npy')
    map_NEI_oil_drilledwells = np.load('./IntermediateOutputs/NEI_oildrill_tempoutput.npy')
            
            
print('IL/IN NEI totals')
for iyear in np.arange(0,num_years):
    print('Year: ', year_range_str[iyear])
    print('Oil wells (Conv + HF):         ',np.sum(map_NEI_oil_wells[:,:,:,iyear]))
    print('All oil well comp (Conv + HF): ',np.sum(map_NEI_oil_completions[:,:,:,iyear]))
    print('Oil production:                ',np.sum(map_NEI_oil_production[:,:,:,iyear]))
    print('Wells Drilled:                 ',np.sum(map_NEI_oil_drilledwells[:,:,:,iyear]))
    print(' ')
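
The `if`/`elif` chain in the cell above maps each inventory year to the NEI vintage used as its proxy (2011 for 2012, 2014 for 2013-2015, 2017 for 2016-2017), with 2018 handled separately. The same mapping can be sketched as a lookup table. `NEI_YEAR_FOR` and `nei_year` are hypothetical names introduced here for illustration.

```python
# Inventory year -> NEI data year, as used in the cell above
NEI_YEAR_FOR = {
    '2012': '2011',
    '2013': '2014',
    '2014': '2014',
    '2015': '2014',
    '2016': '2017',
    '2017': '2017',
}

def nei_year(inventory_year):
    """Return the NEI data year for an inventory year, or None if no text-file data is mapped."""
    return NEI_YEAR_FOR.get(str(inventory_year))

for yr in ['2012', '2015', '2018']:
    print(yr, '->', nei_year(yr))
```

Returning `None` for unmapped years (including 2018, which is read from the Access database in the next step) lets the caller skip the year explicitly instead of reusing a stale value.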

##### Step 2.7.2.2 Read in 2018 data (MS Access data)

In [None]:
#Read in 2018 NEI data from different datafile format
    
if ReCalc_NEI ==1:    
    #Read in the data
    driver_str = r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ='+ERG_NEI_inputloc_2018+';'
    conn = pyodbc.connect(driver_str)
    NEI_2018_ILIN_wells = pd.read_sql("SELECT * FROM 2018_IL_IN_WELLS", conn)
    conn.close()

    data_temp = NEI_2018_ILIN_wells[(NEI_2018_ILIN_wells['ACTIVE_WELL_FLAG'] ==1) & \
                                    (NEI_2018_ILIN_wells['WELL_TYPE'] == 'OIL')].copy() #copy so the in-place edits below act on an independent DataFrame
    data_temp.reset_index(inplace=True, drop=True)
    data_temp.fillna("",inplace=True)

    #find 2018 index
    year_diff = [abs(x - 2018) for x in year_range]
    iyear = year_diff.index(min(year_diff))

    # place data on map for each state (for active wells, production, completions, and drilled wells)
    for iwell in np.arange(0,len(data_temp)):
        ilat = int((data_temp['LATITUDE'][iwell] - Lat_low)/Res01)
        ilon = int((data_temp['LONGITUDE'][iwell] - Lon_left)/Res01)
        if str(data_temp['FIPS_CODE'][iwell]).startswith('17'):# or str(data_temp['FIPS_CODE'][iwell]).startswith('18'):
            istate = 0
        else:
            istate =1
        map_NEI_oil_wells[istate,ilat,ilon,iyear] += 1
        map_NEI_oil_production[istate,ilat,ilon,iyear] += data_temp.loc[iwell,'SUM_LIQ']
        if '2018' in data_temp['COMPLETION_DATE'][iwell]:
            map_NEI_oil_completions[istate,ilat,ilon,iyear] += 1
        if '2018' in data_temp['SPUD_DATE'][iwell]:
            map_NEI_oil_drilledwells[istate,ilat,ilon,iyear] += 1

    np.save('./IntermediateOutputs/NEI_oilwell_tempoutput', map_NEI_oil_wells)
    np.save('./IntermediateOutputs/NEI_oilcomp_tempoutput', map_NEI_oil_completions)
    np.save('./IntermediateOutputs/NEI_oilprod_tempoutput', map_NEI_oil_production)
    np.save('./IntermediateOutputs/NEI_oildrill_tempoutput', map_NEI_oil_drilledwells)
else:
    map_NEI_oil_wells = np.load('./IntermediateOutputs/NEI_oilwell_tempoutput.npy')
    map_NEI_oil_completions = np.load('./IntermediateOutputs/NEI_oilcomp_tempoutput.npy')
    map_NEI_oil_production = np.load('./IntermediateOutputs/NEI_oilprod_tempoutput.npy')
    map_NEI_oil_drilledwells = np.load('./IntermediateOutputs/NEI_oildrill_tempoutput.npy')
    
print('IL/IN NEI totals')
for iyear in np.arange(0,num_years):
    print('Year: ', year_range_str[iyear])
    print('Oil wells (Conv + HF):         ',np.sum(map_NEI_oil_wells[:,:,:,iyear]))
    print('All oil well comp (Conv + HF): ',np.sum(map_NEI_oil_completions[:,:,:,iyear]))
    print('Oil production:                ',np.sum(map_NEI_oil_production[:,:,:,iyear]))
    print('Wells Drilled:                 ',np.sum(map_NEI_oil_drilledwells[:,:,:,iyear]))
    print(' ')

#display(data_temp)
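
The cell above places each well on the grid by converting its latitude and longitude to integer cell indices. A minimal sketch of that conversion is given here, assuming a hypothetical grid extent (the real `Lat_low`, `Lon_left`, and `Res01` are set earlier in the notebook). It swaps `int()` for `math.floor()` and adds a bounds check, because `int()` truncates toward zero and would place a point just outside the lower grid edge into row or column 0.

```python
import math

def grid_index(lat, lon, lat_low, lon_left, res, n_lat, n_lon):
    """Return (ilat, ilon) for a point, or None if it falls outside the grid."""
    ilat = math.floor((lat - lat_low) / res)
    ilon = math.floor((lon - lon_left) / res)
    if 0 <= ilat < n_lat and 0 <= ilon < n_lon:
        return ilat, ilon
    return None

# Hypothetical grid definition (not taken from the notebook): 0.1 degree cells
LAT_LOW, LON_LEFT, RES = 20.0, -130.0, 0.1
N_LAT, N_LON = 350, 700

inside = grid_index(38.63, -88.95, LAT_LOW, LON_LEFT, RES, N_LAT, N_LON)
north_of_grid = grid_index(61.2, -149.9, LAT_LOW, LON_LEFT, RES, N_LAT, N_LON)
south_of_grid = grid_index(19.95, -100.0, LAT_LOW, LON_LEFT, RES, N_LAT, N_LON)
print(inside, north_of_grid, south_of_grid)
```

Points that return `None` can be counted separately rather than silently wrapped to the wrong cell.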

#### Step 2.7.4 Scale NEI absolute values to GHGI data

In [None]:
# Scale the absolute NEI data by the corresponding GHGI counts so that the IL/IN data are not over- or under-weighted
# relative to the IL/IN activity data used in the GHGI.
# Without the scaling, the national emissions would likely be overallocated to these two states, as the NEI well and 
# production counts are higher than those used for these states in the current GHGI.

#make extra required arrays (HF and Conv will have the same spatial distribution as all oil wells/completions)
map_NEI_oil_wells_conv = map_NEI_oil_wells.copy()
map_NEI_oil_wells_HF = map_NEI_oil_wells.copy()
map_NEI_oil_completions_conv = map_NEI_oil_completions.copy()
map_NEI_oil_completions_HF = map_NEI_oil_completions.copy()

#if ReCalc_NEI ==1:

print('QA/QC: Check that NEI data is scaled to GHGI activity data')
for iyear in np.arange(0,num_years):
    # ratio = sum(GHGI)/ sum(NEI)
    
    #1) conventional oil wells (same spatial distribution as all NEI oil wells)
    ratio_temp = data_fn.safe_div(np.sum(Well_ConvOilWell_ILIN[:,iyear]),np.sum(map_NEI_oil_wells[:,:,:,iyear]))
    map_NEI_oil_wells_conv[:,:,:,iyear] *= ratio_temp
    
    #2) HF oil wells (same spatial distribution as all NEI oil wells)
    ratio_temp = data_fn.safe_div(np.sum(Well_HFOilWell_ILIN[:,iyear]),np.sum(map_NEI_oil_wells[:,:,:,iyear]))
    map_NEI_oil_wells_HF[:,:,:,iyear] *= ratio_temp
    
    # 3) all oil wells
    ratio_temp = data_fn.safe_div(np.sum(Well_Allwell_ILIN[:,iyear]),np.sum(map_NEI_oil_wells[:,:,:,iyear]))
    map_NEI_oil_wells[:,:,:,iyear] *= ratio_temp
    
    #4) Conv oil well completions (same spatial distribution as all oil well completions)
    ratio_temp = data_fn.safe_div(np.sum(Well_ConvComp_ILIN[:,iyear]),np.sum(map_NEI_oil_completions[:,:,:,iyear]))
    map_NEI_oil_completions_conv[:,:,:,iyear] *= ratio_temp
    
    #5) HF oil well completions (same spatial distribution as all oil well completions)
    ratio_temp = data_fn.safe_div(np.sum(Well_HFComp_ILIN[:,iyear]),np.sum(map_NEI_oil_completions[:,:,:,iyear]))
    map_NEI_oil_completions_HF[:,:,:,iyear] *= ratio_temp
    
    #6) all oil well completions
    ratio_temp = data_fn.safe_div(np.sum(Well_AllComp_ILIN[:,iyear]),np.sum(map_NEI_oil_completions[:,:,:,iyear]))
    map_NEI_oil_completions[:,:,:,iyear] *= ratio_temp
    
    #7) oil wells drilled
    ratio_temp = data_fn.safe_div(np.sum(Well_Oilwell_drilled_ILIN[:,iyear]),np.sum(map_NEI_oil_drilledwells[:,:,:,iyear]))
    if pd.isna(ratio_temp):
        ratio_temp = 0    #if there is no GHGI data, but there is NEI data, scale to zero counts
    map_NEI_oil_drilledwells[:,:,:,iyear] *= ratio_temp
    
    #8) oil production volumes
    ratio_temp = data_fn.safe_div(np.sum(Wellprod_other_ILIN[:,iyear]),np.sum(map_NEI_oil_production[:,:,:,iyear]))
    if pd.isna(ratio_temp):
        ratio_temp = 0     #if there is no GHGI data, but there is NEI data, scale to zero counts
    map_NEI_oil_production[:,:,:,iyear] *= ratio_temp
    
    diff1 = (np.sum(Well_Allwell_ILIN[:,iyear]) - np.sum(map_NEI_oil_wells[:,:,:,iyear])) +\
            (np.sum(Well_ConvOilWell_ILIN[:,iyear]) - np.sum(map_NEI_oil_wells_conv[:,:,:,iyear])) +\
            (np.sum(Well_HFOilWell_ILIN[:,iyear]) - np.sum(map_NEI_oil_wells_HF[:,:,:,iyear])) + \
            (np.sum(Well_AllComp_ILIN[:,iyear]) - np.sum(map_NEI_oil_completions[:,:,:,iyear])) +\
            (np.sum(Well_ConvComp_ILIN[:,iyear]) - np.sum(map_NEI_oil_completions_conv[:,:,:,iyear])) +\
            (np.sum(Well_HFComp_ILIN[:,iyear]) - np.sum(map_NEI_oil_completions_HF[:,:,:,iyear])) + \
            (np.sum(Well_Oilwell_drilled_ILIN[:,iyear]) - np.sum(map_NEI_oil_drilledwells[:,:,:,iyear])) + \
            (np.sum(Wellprod_other_ILIN[:,iyear]) - np.sum(map_NEI_oil_production[:,:,:,iyear]))
    
    if abs(diff1) < 1e-12:
        print('Year ', year_range_str[iyear],":","PASS")
    else:
        print('Year ', year_range_str[iyear],":","CHECK", diff1)
    
    print('Oil wells (Conv + HF):         ',np.sum(map_NEI_oil_wells[:,:,:,iyear]))
    print('Oil wells (Conv):              ',np.sum(map_NEI_oil_wells_conv[:,:,:,iyear]))
    print('Oil wells (HF):                ',np.sum(map_NEI_oil_wells_HF[:,:,:,iyear]))
    print('All oil well comp (Conv + HF): ',np.sum(map_NEI_oil_completions[:,:,:,iyear]))
    print('All oil well comp (Conv):      ',np.sum(map_NEI_oil_completions_conv[:,:,:,iyear]))
    print('All oil well comp (HF):        ',np.sum(map_NEI_oil_completions_HF[:,:,:,iyear]))
    print('Oil production:                ',np.sum(map_NEI_oil_production[:,:,:,iyear]))
    print('Wells Drilled:                 ',np.sum(map_NEI_oil_drilledwells[:,:,:,iyear]))
    print(' ')
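
The scaling above keeps the NEI spatial pattern but forces each map to sum to the matching GHGI activity total. A minimal sketch of that idea on a toy 2x2 grid follows. The helper `scale_to_total` and all numbers are hypothetical, and the zero-sum handling is an assumption about the intended behavior rather than a copy of `data_fn.safe_div`.

```python
import numpy as np

def scale_to_total(proxy, target_total):
    """Rescale `proxy` so it sums to `target_total`; an all-zero proxy stays zero."""
    proxy = np.asarray(proxy, dtype=float)
    proxy_total = proxy.sum()
    if proxy_total == 0:
        return np.zeros_like(proxy)
    return proxy * (target_total / proxy_total)

nei_wells = np.array([[120.0, 30.0], [0.0, 50.0]])  # hypothetical NEI well counts
ghgi_wells = 150.0                                   # hypothetical GHGI well count

scaled = scale_to_total(nei_wells, ghgi_wells)
print(scaled.sum())            # matches the GHGI total
print(scaled / scaled.sum())   # spatial fractions are unchanged
```

When checking the result, comparing each scaled map against its own GHGI total keeps offsetting differences from cancelling, which a single sum of signed differences cannot guarantee.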


#### Step 2.7.5 Add the NEI data to the relevant Enverus Proxy Maps

In [None]:
# Add maps to relevant Enverus maps
# add absolute values to the Enverus maps above (then the weighted calculations below can remain unchanged)
# The same values are assigned to each month (i.e., no temporal resolution is applied to IL or IN data)
# NOTE: Proxy maps need to be reloaded if this code is run more than once

for iyear in np.arange(0,num_years):
    for imonth in np.arange(0,num_months):
        Map_EnvAllwell[:,:,iyear,imonth] += (1/12)*(map_NEI_oil_wells[0,:,:,iyear]+map_NEI_oil_wells[1,:,:,iyear])
        Map_EnvOilProd[:,:,iyear,imonth] += (1/12)*(map_NEI_oil_production[0,:,:,iyear]+map_NEI_oil_production[1,:,:,iyear])
        Map_EnvBasinOther[:,:,iyear,imonth] += (1/12)*(map_NEI_oil_production[0,:,:,iyear]+map_NEI_oil_production[1,:,:,iyear])
        Map_EnvHFOilWell[:,:,iyear,imonth] += (1/12)*(map_NEI_oil_wells_HF[0,:,:,iyear]+map_NEI_oil_wells_HF[1,:,:,iyear])
        Map_EnvConvOilWell[:,:,iyear,imonth] += (1/12)*(map_NEI_oil_wells_conv[0,:,:,iyear]+map_NEI_oil_wells_conv[1,:,:,iyear])
        Map_EnvHFComp[:,:,iyear,imonth] += (1/12)*(map_NEI_oil_completions_HF[0,:,:,iyear]+map_NEI_oil_completions_HF[1,:,:,iyear])
        Map_EnvConvComp[:,:,iyear,imonth] += (1/12)*(map_NEI_oil_completions_conv[0,:,:,iyear]+map_NEI_oil_completions_conv[1,:,:,iyear])
        Map_EnvOilWellDrilled[:,:,iyear,imonth] += (1/12)*(map_NEI_oil_drilledwells[0,:,:,iyear]+map_NEI_oil_drilledwells[1,:,:,iyear])
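
The loop above spreads each annual IL/IN value evenly over the twelve months. The same allocation can be sketched with NumPy broadcasting, assuming toy array sizes and hypothetical variable names. Because the update uses `+=`, running it twice doubles the contribution, which is the reason for the reload note in the cell.

```python
import numpy as np

n_lat, n_lon, n_years, n_months = 2, 3, 2, 12

# Hypothetical monthly proxy map (lat, lon, year, month) and annual map (lat, lon, year)
monthly_map = np.zeros((n_lat, n_lon, n_years, n_months))
annual_map = np.arange(n_lat * n_lon * n_years, dtype=float).reshape(n_lat, n_lon, n_years)

# The added trailing axis lets the annual values broadcast across the month dimension
monthly_map += annual_map[:, :, :, np.newaxis] / n_months

# Summing over months recovers the annual values
print(np.allclose(monthly_map.sum(axis=3), annual_map))
```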


----------------
## Step 3. Read In EPA GHGI Data
---------------

### Step 3.1. Production and Exploration Emissions

In [None]:
# Emissions are in units of MT (= 1x10-6 Tg)

names = pd.read_excel(EPA_Petr_inputfile, sheet_name = "Production_CH4 (MT)", usecols = "A:AE", skiprows = 3, header = 0, nrows = 1)
colnames = names.columns.values
EPA_emi_prod_Petr = pd.read_excel(EPA_Petr_inputfile, sheet_name = "Production_CH4 (MT)", usecols = "A:AE", skiprows = 5, names = colnames, nrows = 126)
EPA_emi_prod_Petr= EPA_emi_prod_Petr.drop(columns = ['Emission\nSource No.'])
EPA_emi_prod_Petr.rename(columns={EPA_emi_prod_Petr.columns[0]:'Source'}, inplace=True)
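# Use literal (non-regex) replacement so the result does not depend on the pandas default for `regex`
EPA_emi_prod_Petr['Source']= EPA_emi_prod_Petr['Source'].str.replace("(", "- ", regex=False)
EPA_emi_prod_Petr['Source']= EPA_emi_prod_Petr['Source'].str.replace(")", "", regex=False)
EPA_emi_prod_Petr['Source']= EPA_emi_prod_Petr['Source'].str.replace('"', "", regex=False)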
EPA_emi_prod_Petr = EPA_emi_prod_Petr.fillna('')
EPA_emi_prod_Petr = EPA_emi_prod_Petr.drop(columns = [*range(1990, start_year,1)])
EPA_emi_prod_Petr.reset_index(inplace=True, drop=True)
display(EPA_emi_prod_Petr)
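
The cell above strips parentheses and quotes from the source names. A likely reason (an inference, not stated in the notebook) is that the names are later used as regular-expression patterns in Step 3.5, where unescaped parentheses would be read as groups. The sketch below illustrates this with the standard library only; the source name and the helper `clean_source_name` are hypothetical.

```python
import re

def clean_source_name(name):
    """Replace '(' with '- ' and drop ')' and '"' using literal replacement."""
    return name.replace("(", "- ").replace(")", "").replace('"', "")

raw = 'Oil Tanks (Large w/ Flares)'   # hypothetical GHGI source name
cleaned = clean_source_name(raw)
print(cleaned)

# Used as a regex, the raw name does not match itself: the parentheses form a group
raw_matches_itself = re.search(raw, raw) is not None
cleaned_matches_itself = re.search(cleaned, cleaned) is not None
print(raw_matches_itself, cleaned_matches_itself)
```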

### Step 3.2. Read in Petroleum Transport 

In [None]:
# Emissions are in units of MT (= 1x10-6 Tg)

names = pd.read_excel(EPA_Petr_inputfile, sheet_name = "Transportation Emissions", usecols = "A:AG", skiprows = 32, header = 0, nrows = 1)
colnames = names.columns.values
EPA_emi_trans_Petr = pd.read_excel(EPA_Petr_inputfile, sheet_name = "Transportation Emissions", usecols = "A:AG", skiprows = 34, names = colnames, nrows = 20)
EPA_emi_trans_Petr= EPA_emi_trans_Petr.drop(columns = ['Emission\nSource No.', 'Unnamed: 2', 'Emission Units'])
EPA_emi_trans_Petr.rename(columns={EPA_emi_trans_Petr.columns[0]:'Source'}, inplace=True)
EPA_emi_trans_Petr = EPA_emi_trans_Petr.fillna('')
EPA_emi_trans_Petr = EPA_emi_trans_Petr.drop(columns = [*range(1990, start_year,1)])
EPA_emi_trans_Petr.reset_index(inplace=True, drop=True)
display(EPA_emi_trans_Petr)


### Step 3.3. Read in Petroleum Refining 

In [None]:
# Emissions are in units of MT (= 1x10-6 Tg)

names = pd.read_excel(EPA_Petr_inputfile, sheet_name = "Refinery Emissions", usecols = "A:AG", skiprows = 7, header = 0, nrows = 1)
colnames = names.columns.values
EPA_emi_ref_Petr = pd.read_excel(EPA_Petr_inputfile, sheet_name = "Refinery Emissions", usecols = "A:AG", skiprows = 8, names = colnames, nrows = 29)
EPA_emi_ref_Petr= EPA_emi_ref_Petr.drop(columns = ['Emission\nSource No.', 'Scaling Factor for 1990-2009 ','Units'])
EPA_emi_ref_Petr.rename(columns={EPA_emi_ref_Petr.columns[0]:'Source'}, inplace=True)
EPA_emi_ref_Petr = EPA_emi_ref_Petr.fillna('')
EPA_emi_ref_Petr = EPA_emi_ref_Petr.drop(columns = [*range(1990, start_year,1)])
EPA_emi_ref_Petr.reset_index(inplace=True, drop=True)
display(EPA_emi_ref_Petr)

### Step 3.4. Read in Total Petroleum Emissions

In [None]:
# Read in total production + exploration emissions (with methane reductions accounted for)
# data are in kt

names = pd.read_excel(EPA_Petr_inputfile, sheet_name = "CH4 Summary", usecols = "A:AD", skiprows = 4, header = 0, nrows = 1)
colnames = names.columns.values
EPA_emi_total_Petr = pd.read_excel(EPA_Petr_inputfile, sheet_name = "CH4 Summary", usecols = "A:AD", skiprows = 19, names = colnames, nrows = 5)
EPA_emi_total_Petr.rename(columns={EPA_emi_total_Petr.columns[0]:'Source'}, inplace=True)
EPA_emi_total_Petr = EPA_emi_total_Petr.drop(columns = [*range(1990, start_year,1)])
EPA_emi_total_Petr.reset_index(inplace=True, drop=True)
display(EPA_emi_total_Petr)


### Step 3.5. Split Emissions into Scaling Groups (from Petroleum_ProxyMapping.xlsx)

In [None]:
start_year_idx = EPA_emi_prod_Petr.columns.get_loc(start_year)
end_year_idx = EPA_emi_prod_Petr.columns.get_loc(end_year)+1
ghgi_prod_groups = ghgi_prod_map['GHGI_Emi_Group'].unique()
ghgi_trans_groups = ghgi_trans_map['GHGI_Emi_Group'].unique()
ghgi_ref_groups = ghgi_ref_map['GHGI_Emi_Group'].unique()

for igroup in np.arange(0,len(ghgi_prod_groups)):
    vars()[ghgi_prod_groups[igroup]] = np.zeros(num_years)
    source_temp = ghgi_prod_map.loc[ghgi_prod_map['GHGI_Emi_Group'] == ghgi_prod_groups[igroup], 'GHGI_Source']
    pattern_temp  = '|'.join(source_temp) 
    emi_temp = EPA_emi_prod_Petr[EPA_emi_prod_Petr['Source'].str.contains(pattern_temp)]
    vars()[ghgi_prod_groups[igroup]][:] = np.where(emi_temp.iloc[:,start_year_idx:] =='',[0],emi_temp.iloc[:,start_year_idx:]).sum(axis=0)/float(1000)
    
for igroup in np.arange(0,len(ghgi_trans_groups)):
    vars()[ghgi_trans_groups[igroup]] = np.zeros(num_years)
    source_temp = ghgi_trans_map.loc[ghgi_trans_map['GHGI_Emi_Group'] == ghgi_trans_groups[igroup], 'GHGI_Source']
    pattern_temp  = '|'.join(source_temp) 
    emi_temp = EPA_emi_trans_Petr[EPA_emi_trans_Petr['Source'].str.contains(pattern_temp)]
    vars()[ghgi_trans_groups[igroup]][:] = np.where(emi_temp.iloc[:,start_year_idx:] =='',[0],emi_temp.iloc[:,start_year_idx:]).sum(axis=0)/float(1000)
    
for igroup in np.arange(0,len(ghgi_ref_groups)):
    vars()[ghgi_ref_groups[igroup]] = np.zeros(num_years)
    source_temp = ghgi_ref_map.loc[ghgi_ref_map['GHGI_Emi_Group'] == ghgi_ref_groups[igroup], 'GHGI_Source']
    pattern_temp  = '|'.join(source_temp) 
    emi_temp = EPA_emi_ref_Petr[EPA_emi_ref_Petr['Source'].str.contains(pattern_temp)]
    vars()[ghgi_ref_groups[igroup]][:] = np.where(emi_temp.iloc[:,start_year_idx:] =='',[0],emi_temp.iloc[:,start_year_idx:]).sum(axis=0)/float(1000)

    
print('QA/QC: Check Production, Transport, Refining Emission Sum against GHGI Summary Emissions')
for iyear in np.arange(0,num_years): 
    sum_emi = 0
    for igroup in np.arange(0,len(ghgi_prod_groups)):
        sum_emi += vars()[ghgi_prod_groups[igroup]][iyear]
    for igroup in np.arange(0,len(ghgi_trans_groups)):
        sum_emi += vars()[ghgi_trans_groups[igroup]][iyear]
    for igroup in np.arange(0,len(ghgi_ref_groups)):
        sum_emi += vars()[ghgi_ref_groups[igroup]][iyear]
        
    summary_emi = EPA_emi_total_Petr.iloc[0,iyear+1]+EPA_emi_total_Petr.iloc[1,iyear+1] +EPA_emi_total_Petr.iloc[2,iyear+1]+\
                    EPA_emi_total_Petr.iloc[3,iyear+1]
    #Check 1 - make sure that the sum over all emission groups equals the total reported in the summary tab
    diff1 = abs(sum_emi - summary_emi)/((sum_emi + summary_emi)/2)   #relative difference (fraction)
    print(summary_emi)
    print(sum_emi)
    if diff1 < 0.0001:
        print('Year ', year_range[iyear],': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear],': FAIL (check Production & summary tabs): ', diff1*100,'%') 
        
## Note: The numbers will not be exactly the same due to conversions and rounding in the Transport sector (between the 
## Transportation Emissions tab and the CH4 summary tab). This is not an error, just a difference. 
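
The grouping and QA logic in this step can be sketched on a small scale: sum the sources that belong to each emission group, convert metric tons to kt, and compare the grand total with the summary value using a relative difference. All names and numbers below are hypothetical. Unlike the cell above, the sketch escapes each source name with `re.escape` before joining, which is a swapped-in safeguard rather than the notebook's method.

```python
import re

# Hypothetical emissions for one year, in metric tons
emissions_mt = {
    "Oil Tanks - Large": 40000.0,
    "Oil Tanks - Small": 10000.0,
    "Pneumatic Devices": 25000.0,
}
# Hypothetical mapping of emission groups to the sources they contain
group_sources = {
    "Emi_Tanks": ["Oil Tanks - Large", "Oil Tanks - Small"],
    "Emi_Pneumatics": ["Pneumatic Devices"],
}

def group_totals_kt(emissions, groups):
    """Sum emissions per group and convert metric tons to kt."""
    totals = {}
    for group, sources in groups.items():
        pattern = "|".join(re.escape(s) for s in sources)
        total_mt = sum(v for name, v in emissions.items() if re.search(pattern, name))
        totals[group] = total_mt / 1000.0
    return totals

def relative_difference(a, b):
    """Symmetric relative difference, returned as a fraction (not a percentage)."""
    return abs(a - b) / ((a + b) / 2)

totals = group_totals_kt(emissions_mt, group_sources)
summary_kt = 75.0   # hypothetical summary-tab value in kt
rel = relative_difference(sum(totals.values()), summary_kt)
print(totals)
print(f"relative difference: {rel * 100:.4f}%")
```

Multiplying the fraction by 100 before printing keeps the reported number consistent with the `%` label.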

----------------
## Step 4. Grid Data (using spatial proxies)
---------------

### Step 4.1. Calculate the monthly weighted proxy arrays

#### Step 4.1.1 Assign the Appropriate Proxy Variable Names

In [None]:
# The names on the left need to match the 'Petroleum_ProxyMapping' 'Proxy_Group' names (these are initialized in Step 2). 
# The names on the right are the variable names used to calculate the proxies in this code.

#Production segment
Map_Allwell = Map_EnvAllwell
Map_OilProd = Map_EnvOilProd
Map_Basin220 = Map_EnvBasin220
Map_Basin360 = Map_EnvBasin360
Map_Basin395 = Map_EnvBasin395
Map_Basin430 = Map_EnvBasin430
Map_BasinOther = Map_EnvBasinOther
Map_HFOilWell = Map_EnvHFOilWell
Map_ConvOilWell = Map_EnvConvOilWell
Map_HFComp = Map_EnvHFComp
Map_ConvComp = Map_EnvConvComp
Map_OilWellDrilled = Map_EnvOilWellDrilled
Map_StateGOMOffshore = Map_EnvStateGOMOffshore
Map_StatePacOffshore = Map_EnvStatePacOffshore
#nongrid
Map_Allwell_nongrid = Map_EnvAllwell_nongrid
Map_OilProd_nongrid = Map_EnvOilProd_nongrid
Map_Basin220_nongrid = Map_EnvBasin220_nongrid
Map_Basin360_nongrid = Map_EnvBasin360_nongrid
Map_Basin395_nongrid = Map_EnvBasin395_nongrid
Map_Basin430_nongrid = Map_EnvBasin430_nongrid
Map_BasinOther_nongrid = Map_EnvBasinOther_nongrid
Map_HFOilWell_nongrid = Map_EnvHFOilWell_nongrid
Map_ConvOilWell_nongrid = Map_EnvConvOilWell_nongrid
Map_HFComp_nongrid = Map_EnvHFComp_nongrid
Map_ConvComp_nongrid = Map_EnvConvComp_nongrid
Map_OilWellDrilled_nongrid = Map_EnvOilWellDrilled_nongrid
Map_StateGOMOffshore_nongrid = Map_EnvStateGOMOffshore_nongrid
Map_StatePacOffshore_nongrid = Map_EnvStatePacOffshore_nongrid
#Offshore
Map_FedGOMOffshoreMajor = Map_GOADSmajor_emissions
Map_FedGOMOffshoreMinor = Map_GOADSminor_emissions
Map_FedGOMOffshore_Both = Map_FedGOMOffshoreMajor + Map_FedGOMOffshoreMinor
Map_FedGOMOffshore_Both_nongrid = Map_FedGOMOffshoreMajor_nongrid + Map_FedGOMOffshoreMinor_nongrid
Map_FedPacOffshore = Map_BSEEOffshore
Map_PacOffshore = Map_FedPacOffshore+ Map_EnvStatePacOffshore #Pacific map includes state and federal production

#Transport
Map_TransRefining = Map_GHGRPRefineries
Map_TransRefining_nongrid = Map_GHGRPRefineries_nongrid
Map_TransOnshore = Map_OilProd 
Map_TransOnshore_nongrid = Map_OilProd_nongrid 
Map_TransOffshore = Map_StateGOMOffshore #need to switch this later on
Map_TransOffshore_nongrid = Map_StateGOMOffshore_nongrid #need to switch this later on

#Refining
Map_Refineries = Map_GHGRPRefineries
Map_Refineries_nongrid = Map_GHGRPRefineries_nongrid
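
Note that the plain assignments above bind a second name to the same NumPy array; they do not copy data. Any in-place operation applied later through one name (for example `*=` or slice assignment) is therefore visible through the other name as well. The sketch below demonstrates this with a small stand-in array. Whether any array is actually weighted twice under two names depends on the `Proxy_Group` entries in the mapping file, which are not shown here.

```python
import numpy as np

env_map = np.array([1.0, 3.0])     # stand-in for an Enverus proxy map

alias = env_map                    # same array object, as with a plain assignment
alias *= 2.0                       # in-place update is visible through both names
print(env_map)                     # [2. 6.]

independent = env_map.copy()       # separate buffer
independent /= independent.sum()   # normalizing the copy leaves env_map untouched
print(env_map)                     # [2. 6.]

print(np.shares_memory(alias, env_map), np.shares_memory(independent, env_map))
```

Using `.copy()` for a proxy that is reused under several names would keep each name's weighting independent, at the cost of extra memory.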

#### Step 4.1.2 Calculate weighted arrays

In [None]:
# Calculate weighting arrays
# Find the fraction of wells (or oil production) in each grid cell, relative to the total well counts (or oil production) (on and off grid)
# also weight by the number of days in each month


for iyear in np.arange(0,num_years):
    if year_range[iyear]==2012 or year_range[iyear]==2016:
        year_days = np.sum(month_day_leap)
        month_days = month_day_leap
    else:
        year_days = np.sum(month_day_nonleap)
        month_days = month_day_nonleap      
    
    #Production
    print('Prod. Proxy Arrays: ', year_range[iyear])
    for isource in np.arange(0,len(proxy_prod_map)): 
        if proxy_prod_map.loc[isource, 'Month_Flag'] == 1:
            for imonth in np.arange(0, num_months):
                #first weight by the number of days in each month (weighted map for month = month map * number of days in each month)
                vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear,imonth] *= month_days[imonth]
                vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,imonth] *= month_days[imonth]
            #then normalize
            temp_sum = float(np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear,:]) + \
                             np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:]))
            vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear,:] = \
                data_fn.safe_div(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear,:], temp_sum)
            vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:] = \
                data_fn.safe_div(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:], temp_sum)
            proxy_sum = np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear,:])+np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:])
            if proxy_sum >1.0001 or proxy_sum <0.9999:
                print('Check ', proxy_prod_map.loc[isource,'Proxy_Group'], ': ', proxy_sum)
            else:
                print('PASS')
        else:
            vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear] *= np.sum(month_days)
            vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear] *= np.sum(month_days)  
            temp_sum = float(np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear]) + np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear]))
            vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear] = data_fn.safe_div(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear], temp_sum)
            vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear] = data_fn.safe_div(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear], temp_sum)
            proxy_sum = np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear])+np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear])
            if proxy_sum >1.0001 or proxy_sum <0.9999:
                print('Check', proxy_prod_map.loc[isource,'Proxy_Group'], ': ', proxy_sum)
            else:
                print('PASS')
    
    #Transport    
    print('Transport Proxy Arrays: ', year_range[iyear])
    for isource in np.arange(0,len(proxy_trans_map)): 
        if proxy_trans_map.loc[isource, 'Month_Flag'] == 1:
            for imonth in np.arange(0, num_months):
                #first weight by the number of days in each month (weighted map for month = month map * number of days in each month)
                vars()[proxy_trans_map.loc[isource,'Proxy_Group']][:,:,iyear,imonth] *= month_days[imonth]
                vars()[proxy_trans_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,imonth] *= month_days[imonth]
            #then normalize
            temp_sum = float(np.sum(vars()[proxy_trans_map.loc[isource,'Proxy_Group']][:,:,iyear,:]) + \
                    np.sum(vars()[proxy_trans_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:]))
            vars()[proxy_trans_map.loc[isource,'Proxy_Group']][:,:,iyear,:] = \
                data_fn.safe_div(vars()[proxy_trans_map.loc[isource,'Proxy_Group']][:,:,iyear,:], temp_sum)
            vars()[proxy_trans_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:] = \
                data_fn.safe_div(vars()[proxy_trans_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:], temp_sum)
            proxy_sum = np.sum(vars()[proxy_trans_map.loc[isource,'Proxy_Group']][:,:,iyear,:])+np.sum(vars()[proxy_trans_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:])
            if proxy_sum >1.0001 or proxy_sum <0.9999:
                print('Check ', proxy_trans_map.loc[isource,'Proxy_Group'], ': ', proxy_sum)
            else:
                print('PASS')
        else:
            vars()[proxy_trans_map.loc[isource,'Proxy_Group']][:,:,iyear] *= np.sum(month_days)
            vars()[proxy_trans_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear] *= np.sum(month_days)  
            temp_sum = float(np.sum(vars()[proxy_trans_map.loc[isource,'Proxy_Group']][:,:,iyear]) + np.sum(vars()[proxy_trans_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear]))
            vars()[proxy_trans_map.loc[isource,'Proxy_Group']][:,:,iyear] = data_fn.safe_div(vars()[proxy_trans_map.loc[isource,'Proxy_Group']][:,:,iyear], temp_sum)
            vars()[proxy_trans_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear] = data_fn.safe_div(vars()[proxy_trans_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear], temp_sum)
            proxy_sum = np.sum(vars()[proxy_trans_map.loc[isource,'Proxy_Group']][:,:,iyear])+np.sum(vars()[proxy_trans_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear])
            if proxy_sum >1.0001 or proxy_sum <0.9999:
                print('Check', proxy_trans_map.loc[isource,'Proxy_Group'], ': ', proxy_sum)
            else:
                print('PASS')
                
    #refining
    print('Refining Proxy Arrays: ', year_range[iyear])
    for isource in np.arange(0,len(proxy_ref_map)): 
        if proxy_ref_map.loc[isource, 'Month_Flag'] == 1:
            for imonth in np.arange(0, num_months):
                #first weight by the number of days in each month (weighted map for month = month map * number of days in each month)
                vars()[proxy_ref_map.loc[isource,'Proxy_Group']][:,:,iyear,imonth] *= month_days[imonth]
                vars()[proxy_ref_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,imonth] *= month_days[imonth]
            #then normalize
            temp_sum = float(np.sum(vars()[proxy_ref_map.loc[isource,'Proxy_Group']][:,:,iyear,:]) + \
                         np.sum(vars()[proxy_ref_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:]))
            vars()[proxy_ref_map.loc[isource,'Proxy_Group']][:,:,iyear,:] = \
                data_fn.safe_div(vars()[proxy_ref_map.loc[isource,'Proxy_Group']][:,:,iyear,:], temp_sum)
            vars()[proxy_ref_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:] = \
                data_fn.safe_div(vars()[proxy_ref_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:], temp_sum)
            proxy_sum = np.sum(vars()[proxy_ref_map.loc[isource,'Proxy_Group']][:,:,iyear,:])+np.sum(vars()[proxy_ref_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:])
            if proxy_sum >1.0001 or proxy_sum <0.9999:
                print('Check ', proxy_ref_map.loc[isource,'Proxy_Group'], ': ', proxy_sum)
            else:
                print('PASS')

        else:
            vars()[proxy_ref_map.loc[isource,'Proxy_Group']][:,:,iyear] *= np.sum(month_days)
            vars()[proxy_ref_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear] *= np.sum(month_days)  
            temp_sum = float(np.sum(vars()[proxy_ref_map.loc[isource,'Proxy_Group']][:,:,iyear]) + np.sum(vars()[proxy_ref_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear]))
            vars()[proxy_ref_map.loc[isource,'Proxy_Group']][:,:,iyear] = data_fn.safe_div(vars()[proxy_ref_map.loc[isource,'Proxy_Group']][:,:,iyear], temp_sum)
            vars()[proxy_ref_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear] = data_fn.safe_div(vars()[proxy_ref_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear], temp_sum)
            proxy_sum = np.sum(vars()[proxy_ref_map.loc[isource,'Proxy_Group']][:,:,iyear])+np.sum(vars()[proxy_ref_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear])
            if proxy_sum >1.0001 or proxy_sum <0.9999:
                print('Check', proxy_ref_map.loc[isource,'Proxy_Group'], ': ', proxy_sum)
            else:
                print('PASS')

    
    #calculate average map weighted by well counts and production
    Map_Both[:,:,iyear,:] = 0.5 * (Map_Allwell[:,:,iyear,:]+Map_OilProd[:,:,iyear,:])  
    Map_Both_nongrid[iyear,:] = 0.5 * (Map_Allwell_nongrid[iyear,:]+Map_OilProd_nongrid[iyear,:])  
    #set 'not-mapped' array to 1 so that the emissions will be included in the calculated total
    Map_not_mapped[:,:,iyear,:] = 1
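
The weighting in this step multiplies each monthly proxy value by the number of days in that month and then normalizes so that the year sums to 1. A minimal one-dimensional sketch follows, in which a 12-element array stands in for the gridded plus off-grid totals. It takes the month lengths from the standard `calendar` module instead of the notebook's `month_day_leap` and `month_day_nonleap` lists, so leap years are handled without a hard-coded year list; the helper name is hypothetical.

```python
import calendar
import numpy as np

def day_weighted_fractions(monthly_activity, year):
    """Weight 12 monthly values by days in month and normalize to sum to 1."""
    days = np.array([calendar.monthrange(year, m)[1] for m in range(1, 13)], dtype=float)
    weighted = np.asarray(monthly_activity, dtype=float) * days
    total = weighted.sum()
    return weighted / total if total > 0 else np.zeros(12)

flat_activity = np.ones(12)   # hypothetical: the same activity in every month
w2016 = day_weighted_fractions(flat_activity, 2016)   # leap year
w2017 = day_weighted_fractions(flat_activity, 2017)   # non-leap year

print(w2016.sum(), w2017.sum())
print(w2016[1] * 366, w2017[1] * 365)   # February day counts recovered from the weights
```

With constant activity, each month's fraction equals its share of the days in the year, so February receives slightly more weight in a leap year.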

### Step 4.2. Allocate to CONUS 0.1x0.1 grid

In [None]:
# For each of the production, transport, and refining segments...
# 1) make flux array with correct dimensions
# 2) weight monthly data by days in month (or year)
# 3) calculate flux as Flux = GHGI emissions * Map

Emissions = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Emissions_expl = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Emissions_prod = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Emissions_trans = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Emissions_ref = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Emissions_expl_nongrid = np.zeros([num_years,num_months])
Emissions_prod_nongrid = np.zeros([num_years,num_months])
Emissions_trans_nongrid = np.zeros([num_years,num_months])
Emissions_ref_nongrid = np.zeros([num_years,num_months])
Emi_not_mapped_sum = np.zeros(num_years)
DEBUG=1
if DEBUG==1:
    total_sum = np.zeros(num_years)
    proxy_val= np.zeros(num_years)
    ghgi_val= np.zeros(num_years)
    
#Production
for igroup in np.arange(0,len(proxy_prod_map)):
    if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
        if proxy_prod_map.loc[igroup, 'Month_Flag'] == 1:
            vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
            vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'] = np.zeros([num_years,num_months])
            for iyear in np.arange(0,num_years):
                vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,:] += \
                vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear] * vars()[proxy_prod_map.loc[igroup,'Proxy_Group']][:,:,iyear,:]
                vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,:] += vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear] * vars()[proxy_prod_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear,:]
            
                for imonth in np.arange(0,num_months):
                    if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_OilWellExp' or \
                        proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_ConvCompExp' or \
                        proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_HFCompExp' or \
                        proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_OilWellDrilledExp':
                        Emissions[:,:,iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]
                        Emissions_expl_nongrid[iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,imonth]
                        Emissions_expl[:,:,iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]
                    else:
                        Emissions[:,:,iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]
                        Emissions_prod_nongrid[iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,imonth]
                        Emissions_prod[:,:,iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]

                if DEBUG==1:
                    proxy_val[iyear] = np.sum(vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,:])+\
                                     np.sum(vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,:])
                    ghgi_val[iyear] = np.sum(vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear])
                    total_sum[iyear] += proxy_val[iyear]

        else:
            vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
            vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'] = np.zeros([num_years])
            for iyear in np.arange(0,num_years):
                vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear] * vars()[proxy_prod_map.loc[igroup,'Proxy_Group']][:,:,iyear]
                vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear] += vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear] * vars()[proxy_prod_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear]
                if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_OilWellExp' or \
                    proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_ConvCompExp' or \
                    proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_HFCompExp' or \
                    proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_OilWellDrilledExp':
                    #annual proxy: spread evenly over all months (np.newaxis broadcasts the 2-D map across the month axis)
                    Emissions[:,:,iyear,:] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,np.newaxis]/num_months
                    Emissions_expl[:,:,iyear,:] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,np.newaxis]/num_months
                    Emissions_expl_nongrid[iyear,:] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear]/num_months
                else:
                    Emissions[:,:,iyear,:] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,np.newaxis]/num_months
                    Emissions_prod[:,:,iyear,:] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,np.newaxis]/num_months
                    Emissions_prod_nongrid[iyear,:] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear]/num_months
 
                if DEBUG==1:
                    proxy_val[iyear] = np.sum(vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear])+\
                             np.sum(vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear])
                    ghgi_val[iyear] = np.sum(vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear])
                    total_sum[iyear] += proxy_val[iyear]

        if DEBUG==1:
            #these two variables should be the same if code is working properly
            print(igroup, proxy_val[:])
            print(igroup, ghgi_val[:])

    else:
        ###NOTE: currently all non-mapped emissions are in the production segment
        for iyear in np.arange(0,num_years):
            Emissions_prod_nongrid[iyear,:] += Emi_not_mapped[iyear]/num_months #distribute evenly over each month

# Transport
for igroup in np.arange(0,len(proxy_trans_map)):
    if proxy_trans_map.loc[igroup, 'Month_Flag'] == 1:
        vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
        vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'] = np.zeros([num_years,num_months])
        for iyear in np.arange(0,num_years):
            vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,:] += vars()[proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][iyear] * vars()[proxy_trans_map.loc[igroup,'Proxy_Group']][:,:,iyear,:]
            vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,:] += vars()[proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][iyear] * vars()[proxy_trans_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear,:]
            for imonth in np.arange(0,num_months):
                if proxy_trans_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
                    Emissions[:,:,iyear,imonth] += vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]
                    Emissions_trans_nongrid[iyear,imonth] += vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,imonth]
                    Emissions_trans[:,:,iyear,imonth] += vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]

    else:
        vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'] = np.zeros([num_years])
        for iyear in np.arange(0,num_years):
            vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += vars()[proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][iyear] * vars()[proxy_trans_map.loc[igroup,'Proxy_Group']][:,:,iyear]
            vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear] += vars()[proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][iyear] * vars()[proxy_trans_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear]
            for imonth in np.arange(0,num_months):
                if proxy_trans_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
                    Emissions[:,:,iyear,imonth] += (vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear]/num_months) #distribute emissions evenly over each month
                    Emissions_trans_nongrid[iyear,imonth] += vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear]/num_months
                    Emissions_trans[:,:,iyear,imonth] += (vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear]/num_months) #distribute emissions evenly over each month

# Refining
for igroup in np.arange(0,len(proxy_ref_map)):
    if proxy_ref_map.loc[igroup, 'Month_Flag'] == 1:
        vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
        vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'] = np.zeros([num_years,num_months])
        for iyear in np.arange(0,num_years):
            vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,:] += vars()[proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][iyear] * vars()[proxy_ref_map.loc[igroup,'Proxy_Group']][:,:,iyear,:]
            vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,:] += vars()[proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][iyear] * vars()[proxy_ref_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear,:]
            for imonth in np.arange(0,num_months):
                if proxy_ref_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
                    Emissions[:,:,iyear,imonth] += vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]
                    Emissions_ref_nongrid[iyear,imonth] += vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear, imonth]
                    Emissions_ref[:,:,iyear,imonth] += vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]

    else:
        vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'] = np.zeros([num_years])
        for iyear in np.arange(0,num_years):
            vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += vars()[proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][iyear] * vars()[proxy_ref_map.loc[igroup,'Proxy_Group']][:,:,iyear]
            vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear] += vars()[proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][iyear] * vars()[proxy_ref_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear]
            for imonth in np.arange(0,num_months):
                if proxy_ref_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
                    Emissions[:,:,iyear,imonth] += (vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear]/num_months)
                    Emissions_ref_nongrid[iyear,imonth] += vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear]/num_months
                    Emissions_ref[:,:,iyear,imonth] += (vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear]/num_months)


# QA/QC gridded emissions
# Check sum of all gridded emissions + emissions not included in gridding (e.g., AK), and other non-gridded areas
print('QA/QC #1: Check weighted emissions against GHGI')   
for iyear in np.arange(0,num_years):
    calc_emi = 0
    summary_emi = EPA_emi_total_Petr.iloc[0,iyear+1]+EPA_emi_total_Petr.iloc[1,iyear+1] +EPA_emi_total_Petr.iloc[2,iyear+1]+\
    EPA_emi_total_Petr.iloc[3,iyear+1]
    
    for igroup in np.arange(0,len(proxy_prod_map)):
        if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
            if proxy_prod_map.loc[igroup, 'Month_Flag'] == 1:
                calc_emi += np.sum(vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,:])
            else:
                calc_emi += np.sum(vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear])
    for igroup in np.arange(0,len(proxy_trans_map)):
        if proxy_trans_map.loc[igroup, 'Month_Flag'] == 1:
            calc_emi += np.sum(vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,:])
        else:
            calc_emi += np.sum(vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear])
    for igroup in np.arange(0,len(proxy_ref_map)):
        if proxy_ref_map.loc[igroup, 'Month_Flag'] == 1:
            calc_emi += np.sum(vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,:])
        else:
            calc_emi += np.sum(vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear])           
    
    calc_emi += np.sum(Emissions_expl_nongrid[iyear,:]) +np.sum(Emissions_prod_nongrid[iyear,:])+\
                np.sum(Emissions_trans_nongrid[iyear,:])+np.sum(Emissions_ref_nongrid[iyear,:])
    
    diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    #check two
    calc_emi2 =  np.sum(Emissions_prod[:,:,iyear,:]) + np.sum(Emissions_trans[:,:,iyear,:]) +\
                 np.sum(Emissions_ref[:,:,iyear,:])+np.sum(Emissions_expl[:,:,iyear,:])+\
                 np.sum(Emissions_expl_nongrid[iyear,:])+np.sum(Emissions_prod_nongrid[iyear,:])+\
                 np.sum(Emissions_trans_nongrid[iyear,:])+np.sum(Emissions_ref_nongrid[iyear,:])
    if DEBUG==1:
        print(calc_emi)
        print(calc_emi2)
        print(summary_emi)
    if diff < 0.0001:
        print('Year ', year_range[iyear], ': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear], ': FAIL -- Difference = ', diff*100,'%')
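
The even monthly allocation and the QA/QC #1 closure test above can be illustrated with a small, self-contained sketch. The array sizes, the random proxy, the national total, and the helper names `allocate_evenly` and `relative_difference` are hypothetical and chosen only for illustration; only the 0.01% tolerance is taken from the check above.

```python
import numpy as np

def allocate_evenly(annual_grid, num_months=12):
    """Spread an annual [lat, lon] grid evenly over a new trailing month axis."""
    # np.newaxis adds the month axis so the result has shape [lat, lon, month]
    return np.repeat(annual_grid[:, :, np.newaxis] / num_months, num_months, axis=2)

def relative_difference(reference, calculated):
    """Symmetric relative difference, in the same form as the QA/QC checks."""
    return abs(reference - calculated) / ((reference + calculated) / 2)

# Hypothetical inputs: a national total (kt) and a normalized 4x5 proxy grid
rng = np.random.default_rng(0)
proxy = rng.random((4, 5))
proxy /= proxy.sum()               # proxy fractions sum to 1
national_total_kt = 1234.5

annual_grid = national_total_kt * proxy
monthly_grid = allocate_evenly(annual_grid)

# Closure test: the allocated monthly grid should sum back to the national total
diff = relative_difference(national_total_kt, monthly_grid.sum())
status = 'PASS' if diff < 0.0001 else 'FAIL'
print(status, diff)
```

Because the proxy is normalized before it is multiplied by the national total, the allocation conserves mass up to floating-point rounding, which is what the 0.01% tolerance is meant to confirm.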

#### Step 4.2.2 Save gridded emissions (kt)

In [None]:
#save gridded emissions for each gridding group - for extension

#Initialize file
data_IO_fn.initialize_netCDF(grid_emi_outputfile, netCDF_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)

unique_groups2 = (np.unique(proxy_prod_map['GHGI_Emi_Group']))
unique_groups2 = list(unique_groups2[unique_groups2 != 'Emi_not_mapped'])
unique_groups3 = list(np.unique(proxy_trans_map['GHGI_Emi_Group']))
unique_groups4 = list(np.unique(proxy_ref_map['GHGI_Emi_Group']))
unique_groups = unique_groups2+unique_groups3+unique_groups4
print(unique_groups2)

nc_out = Dataset(grid_emi_outputfile, 'r+', format='NETCDF4')

for igroup in np.arange(0,len(unique_groups)):
    print('Ext_'+unique_groups[igroup])
    if len(np.shape(vars()['Ext_'+unique_groups[igroup]])) ==4:
        ghgi_temp = np.sum(vars()['Ext_'+unique_groups[igroup]],axis=3) #sum month data if data is monthly
    else:
        ghgi_temp = vars()['Ext_'+unique_groups[igroup]]

    # Write data to netCDF
    data_out = nc_out.createVariable('Ext_'+unique_groups[igroup], 'f8', ('lat', 'lon','year'), zlib=True)
    data_out[:,:,:] = ghgi_temp[:,:,:]

#save nongrid data (summed over months) to calculate non-grid fraction extension
# NOTE: dimensions are passed as one-element tuples; ('year') without a trailing comma is just a string
data_out = nc_out.createVariable('Emissions_expl_nongrid', 'f8', ('year',), zlib=True)  
data_out[:] = np.sum(Emissions_expl_nongrid[:,:],axis=1)

data_out = nc_out.createVariable('Emissions_prod_nongrid', 'f8', ('year',), zlib=True)  
data_out[:] = np.sum(Emissions_prod_nongrid[:,:],axis=1)

data_out = nc_out.createVariable('Emissions_trans_nongrid', 'f8', ('year',), zlib=True)  
data_out[:] = np.sum(Emissions_trans_nongrid[:,:],axis=1)

data_out = nc_out.createVariable('Emissions_ref_nongrid', 'f8', ('year',), zlib=True)  
data_out[:] = np.sum(Emissions_ref_nongrid[:,:],axis=1)

nc_out.close()

#Confirm file location
print('** SUCCESS **')
print("Gridded emissions (kt) written to file: {}" .format(os.getcwd())+grid_emi_outputfile)
print(' ')

del data_out, ghgi_temp, nc_out
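
The per-group loop above sums monthly arrays over the month axis before writing, so every saved variable has dimensions (lat, lon, year). A minimal sketch of that step follows; the helper name `to_annual` and the array sizes are hypothetical and used only for illustration.

```python
import numpy as np

def to_annual(ext):
    """Return a [lat, lon, year] array from either annual or monthly input."""
    if ext.ndim == 4:
        # Monthly data [lat, lon, year, month]: sum over the month axis
        return ext.sum(axis=3)
    if ext.ndim == 3:
        # Already annual [lat, lon, year]: return unchanged
        return ext
    raise ValueError(f'expected 3 or 4 dimensions, got {ext.ndim}')

monthly = np.ones((2, 3, 7, 12))   # hypothetical monthly group (kt)
annual = np.full((2, 3, 7), 5.0)   # hypothetical annual group (kt)

print(to_annual(monthly).shape, to_annual(monthly)[0, 0, 0])
print(to_annual(annual).shape)
```

Raising an error for unexpected ranks, rather than silently passing the array through, is a design choice of this sketch and not something the notebook does.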


### Step 4.3 Calculate Gridded Fluxes (molec/s/cm2)

In [None]:
# Calculate fluxes (molec./cm2/s)
# Convert gridded emissions (kt) to emission fluxes (molec./cm2/s)
DEBUG = 1

### NOTE: Individual Flux arrays are not summing correctly - but are not reported anywhere

#Initialize arrays
check_sum = np.zeros([num_years])
check_sum_annual = np.zeros([num_years])
check_sum_annual2 = np.zeros([num_years])
check_sum_annual3= np.zeros([num_years])
check_sum_annual4 = np.zeros([num_years])
Flux_Emissions_Total = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Flux_Emissions_Total_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])
Flux_Emissions_Expl = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Flux_Emissions_Expl_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])
Flux_Emissions_Prod = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Flux_Emissions_Prod_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])
Flux_Emissions_Trans = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Flux_Emissions_Trans_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])
Flux_Emissions_Ref = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Flux_Emissions_Ref_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])

for igroup in np.arange(0,len(proxy_prod_map)):
    vars()['Flux_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_annual'] = np.zeros([len(Lat_01),len(Lon_01),num_years])
for igroup in np.arange(0,len(proxy_trans_map)):
    vars()['Flux_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']+'_annual'] = np.zeros([len(Lat_01),len(Lon_01),num_years])
for igroup in np.arange(0,len(proxy_ref_map)):
    vars()['Flux_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']+'_annual'] = np.zeros([len(Lat_01),len(Lon_01),num_years])


#Calculate fluxes
for iyear in np.arange(0,num_years):
    if year_range[iyear]==2012 or year_range[iyear]==2016:
        year_days = np.sum(month_day_leap)
        month_days = month_day_leap
    else:
        year_days = np.sum(month_day_nonleap)
        month_days = month_day_nonleap 
    
    # calculate fluxes for annual data  (=kt * grams/kt *molec/mol *mol/g *s^-1 * cm^-2)
    conversion_factor_annual = 10**9 * Avogadro / float(Molarch4 * np.sum(month_days) * 24 * 60 *60) / area_matrix_01
    for igroup in np.arange(0,len(proxy_prod_map)):
        if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
            if proxy_prod_map.loc[igroup, 'Month_Flag'] == 0:
                vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] *= conversion_factor_annual
                vars()['Flux_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_annual'][:,:,iyear] = vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear]
            
    for igroup in np.arange(0,len(proxy_trans_map)):
        if proxy_trans_map.loc[igroup, 'Month_Flag'] == 0:
            vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] *= conversion_factor_annual
            vars()['Flux_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']+'_annual'][:,:,iyear] = vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear]
    for igroup in np.arange(0,len(proxy_ref_map)):
        if proxy_ref_map.loc[igroup, 'Month_Flag'] == 0:
            vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] *= conversion_factor_annual
            vars()['Flux_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']+'_annual'][:,:,iyear] = vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear]
            
    for imonth in np.arange(0, num_months):
        conversion_factor_month = 10**9 * Avogadro / float(Molarch4 * month_days[imonth] * 24 * 60 *60) / area_matrix_01
        conv_factor2 = month_days[imonth]/year_days
        Flux_Emissions_Total[:,:,iyear,imonth] = Emissions[:,:,iyear,imonth]*conversion_factor_month
        Flux_Emissions_Total_annual[:,:,iyear] += Flux_Emissions_Total[:,:,iyear,imonth]*conv_factor2
        Flux_Emissions_Prod[:,:,iyear,imonth] = Emissions_prod[:,:,iyear,imonth]*conversion_factor_month
        Flux_Emissions_Prod_annual[:,:,iyear] += Flux_Emissions_Prod[:,:,iyear,imonth]*conv_factor2
        Flux_Emissions_Expl[:,:,iyear,imonth] = Emissions_expl[:,:,iyear,imonth]*conversion_factor_month
        Flux_Emissions_Expl_annual[:,:,iyear] += Flux_Emissions_Expl[:,:,iyear,imonth]*conv_factor2
        Flux_Emissions_Trans[:,:,iyear,imonth] = Emissions_trans[:,:,iyear,imonth]*conversion_factor_month
        Flux_Emissions_Trans_annual[:,:,iyear] += Flux_Emissions_Trans[:,:,iyear,imonth]*conv_factor2
        Flux_Emissions_Ref[:,:,iyear,imonth] = Emissions_ref[:,:,iyear,imonth]*conversion_factor_month
        Flux_Emissions_Ref_annual[:,:,iyear] += Flux_Emissions_Ref[:,:,iyear,imonth]*conv_factor2
        for igroup in np.arange(0,len(proxy_prod_map)):
            if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
                if proxy_prod_map.loc[igroup, 'Month_Flag'] == 1:
                    vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth] *= conversion_factor_month
                    vars()['Flux_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_annual'][:,:,iyear] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]*conv_factor2
        for igroup in np.arange(0,len(proxy_trans_map)):
            if proxy_trans_map.loc[igroup, 'Month_Flag'] == 1:
                vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth] *= conversion_factor_month
                vars()['Flux_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']+'_annual'][:,:,iyear] += vars()['Ext_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]*conv_factor2
        for igroup in np.arange(0,len(proxy_ref_map)):
            if proxy_ref_map.loc[igroup, 'Month_Flag'] == 1:
                vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth] *= conversion_factor_month
                vars()['Flux_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']+'_annual'][:,:,iyear] += vars()['Ext_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]*conv_factor2
        
    
        check_sum[iyear] += np.sum(Flux_Emissions_Total[:,:,iyear,imonth]/conversion_factor_month)
    check_sum_annual[iyear] += np.sum(Flux_Emissions_Total_annual[:,:,iyear]/conversion_factor_annual)
    check_sum_annual2[iyear] += np.sum(Flux_Emissions_Expl_annual[:,:,iyear]/conversion_factor_annual)
    check_sum_annual2[iyear] += np.sum(Flux_Emissions_Prod_annual[:,:,iyear]/conversion_factor_annual)
    check_sum_annual3[iyear] += np.sum(Flux_Emissions_Trans_annual[:,:,iyear]/conversion_factor_annual)
    check_sum_annual4[iyear] += np.sum(Flux_Emissions_Ref_annual[:,:,iyear]/conversion_factor_annual)

print(' ')
print('QA/QC #2: Check final gridded fluxes against GHGI')  
# for the sum, check the converted annual emissions (convert back from flux) plus all the non-gridded emissions
for iyear in np.arange(0,num_years):
    if year_range[iyear]==2012 or year_range[iyear]==2016:
        year_days = np.sum(month_day_leap)
        month_days = month_day_leap
    else:
        year_days = np.sum(month_day_nonleap)
        month_days = month_day_nonleap 

    conversion_factor_annual = 10**9 * Avogadro / float(Molarch4 * np.sum(month_days) * 24 * 60 *60) / area_matrix_01
    
    calc_emi = check_sum_annual[iyear] + np.sum(Emissions_expl_nongrid[iyear,:]) +\
                np.sum(Emissions_prod_nongrid[iyear,:]) +np.sum(Emissions_trans_nongrid[iyear,:]) +np.sum(Emissions_ref_nongrid[iyear,:]) 
    calc_emi2 = check_sum_annual2[iyear] + check_sum_annual3[iyear] +check_sum_annual4[iyear] +\
                 np.sum(Emissions_expl_nongrid[iyear,:]) +\
                np.sum(Emissions_prod_nongrid[iyear,:]) +np.sum(Emissions_trans_nongrid[iyear,:]) +np.sum(Emissions_ref_nongrid[iyear,:]) 
    calc_emi3 = 0
    for igroup in np.arange(0,len(proxy_prod_map)):
        if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
            calc_emi3 += np.sum(vars()['Flux_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_annual'][:,:,iyear]/conversion_factor_annual)
    for igroup in np.arange(0,len(proxy_trans_map)):
        calc_emi3 += np.sum(vars()['Flux_'+proxy_trans_map.loc[igroup,'GHGI_Emi_Group']+'_annual'][:,:,iyear]/conversion_factor_annual)
    for igroup in np.arange(0,len(proxy_ref_map)):
        calc_emi3 += np.sum(vars()['Flux_'+proxy_ref_map.loc[igroup,'GHGI_Emi_Group']+'_annual'][:,:,iyear]/conversion_factor_annual)          
    calc_emi3+=np.sum(Emissions_expl_nongrid[iyear,:]) +\
                np.sum(Emissions_prod_nongrid[iyear,:]) +np.sum(Emissions_trans_nongrid[iyear,:]) +np.sum(Emissions_ref_nongrid[iyear,:]) 

    
    summary_emi = EPA_emi_total_Petr.iloc[0,iyear+1]+EPA_emi_total_Petr.iloc[1,iyear+1]+EPA_emi_total_Petr.iloc[2,iyear+1]+\
                                      EPA_emi_total_Petr.iloc[3,iyear+1]
    if DEBUG ==1:
        print(calc_emi)
        print(calc_emi2)
        print(calc_emi3)
        print(summary_emi)
    
    diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if diff < 0.0001:
        print('Year ', year_range[iyear], ': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear], ': FAIL -- Difference = ', diff*100,'%')
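
The unit conversion and the day-weighted annual average used above can be checked with a short sketch. It assumes a molar mass of 16.04 g/mol for CH4 (the notebook defines `Molarch4` elsewhere, and its value is not shown here), a single hypothetical grid-cell area, and made-up monthly emissions; the helper `kt_to_flux` is illustrative only. It uses `calendar.monthrange` to obtain the days per month, which also handles leap years without listing them by hand.

```python
import calendar
import numpy as np

AVOGADRO = 6.02214076e23   # molec/mol
MOLAR_CH4 = 16.04          # g/mol (assumed value; the notebook defines Molarch4 elsewhere)

def kt_to_flux(emis_kt, days, area_cm2):
    """Convert kt emitted over `days` days to a flux in molec/cm2/s."""
    seconds = days * 24 * 60 * 60
    # kt * (1e9 g/kt) * (molec/mol) / (g/mol) / s / cm2
    return emis_kt * 1e9 * AVOGADRO / (MOLAR_CH4 * seconds * area_cm2)

year = 2016
area_cm2 = 1.0e12  # hypothetical grid-cell area (100 km2)
month_days = np.array([calendar.monthrange(year, m)[1] for m in range(1, 13)])
year_days = month_days.sum()

monthly_kt = np.linspace(1.0, 2.0, 12)   # hypothetical monthly emissions (kt)
monthly_flux = kt_to_flux(monthly_kt, month_days, area_cm2)

# Day-weighted mean of the monthly fluxes ...
annual_flux_weighted = np.sum(monthly_flux * month_days / year_days)
# ... should equal the flux computed directly from the annual total
annual_flux_direct = kt_to_flux(monthly_kt.sum(), year_days, area_cm2)

print(annual_flux_weighted, annual_flux_direct)
```

Weighting each monthly flux by `month_days/year_days` cancels the per-month seconds in the conversion factor, which is why dividing the annual flux by the annual conversion factor recovers the yearly emission total in kt.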
 

-------------
## Step 5. Write gridded (0.1°x0.1°) data to netCDF files.
-------------

In [None]:
# Initialize and write netCDF files (flux units in molec/s/cm2)

data_IO_fn.initialize_netCDF(gridded_outputfile, netCDF_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)
data_IO_fn.initialize_netCDF(gridded_monthly_outputfile, netCDF_description_m, 1, year_range, loc_dimensions, Lat_01, Lon_01)

# Write the Data to netCDF
nc_out = Dataset(gridded_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Total_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded annual petroleum system fluxes written to file: {}" .format(os.getcwd())+gridded_outputfile)
print('')

nc_out = Dataset(gridded_monthly_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:,:] = Flux_Emissions_Total
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded monthly petroleum system fluxes written to file: {}" .format(os.getcwd())+gridded_monthly_outputfile)
print('')


#Write Exploration Data
data_IO_fn.initialize_netCDF(gridded_expl_outputfile, netCDF_description_expl, 0, year_range, loc_dimensions, Lat_01, Lon_01)
data_IO_fn.initialize_netCDF(gridded_monthly_expl_outputfile, netCDF_description_expl_m, 1, year_range, loc_dimensions, Lat_01, Lon_01)

# Write the Data to netCDF
nc_out = Dataset(gridded_expl_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Expl_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded annual exploration fluxes written to file: {}" .format(os.getcwd())+gridded_expl_outputfile)
print('')

nc_out = Dataset(gridded_monthly_expl_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:,:] = Flux_Emissions_Expl
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded monthly exploration fluxes written to file: {}" .format(os.getcwd())+gridded_monthly_expl_outputfile)
print('')


#Write Production Data
data_IO_fn.initialize_netCDF(gridded_prod_outputfile, netCDF_description_prod, 0, year_range, loc_dimensions, Lat_01, Lon_01)
data_IO_fn.initialize_netCDF(gridded_monthly_prod_outputfile, netCDF_description_prod_m, 1, year_range, loc_dimensions, Lat_01, Lon_01)

# Write the Data to netCDF
nc_out = Dataset(gridded_prod_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Prod_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded annual production fluxes written to file: {}" .format(os.getcwd())+gridded_prod_outputfile)
print('')

nc_out = Dataset(gridded_monthly_prod_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:,:] = Flux_Emissions_Prod
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded monthly production fluxes written to file: {}" .format(os.getcwd())+gridded_monthly_prod_outputfile)
print('')



#Write Transport Data
data_IO_fn.initialize_netCDF(gridded_trans_outputfile, netCDF_description_trans, 0, year_range, loc_dimensions, Lat_01, Lon_01)
data_IO_fn.initialize_netCDF(gridded_monthly_trans_outputfile, netCDF_description_trans_m, 1, year_range, loc_dimensions, Lat_01, Lon_01)

# Write the Data to netCDF
nc_out = Dataset(gridded_trans_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Trans_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded annual transport fluxes written to file: {}" .format(os.getcwd())+gridded_trans_outputfile)
print('')

nc_out = Dataset(gridded_monthly_trans_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:,:] = Flux_Emissions_Trans
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded monthly transport fluxes written to file: {}" .format(os.getcwd())+gridded_monthly_trans_outputfile)
print('')


#Write Refining Data
data_IO_fn.initialize_netCDF(gridded_ref_outputfile, netCDF_description_ref, 0, year_range, loc_dimensions, Lat_01, Lon_01)
data_IO_fn.initialize_netCDF(gridded_monthly_ref_outputfile, netCDF_description_ref_m, 1, year_range, loc_dimensions, Lat_01, Lon_01)

# Write the Data to netCDF
nc_out = Dataset(gridded_ref_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Ref_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded annual refining fluxes written to file: {}" .format(os.getcwd())+gridded_ref_outputfile)
print('')

nc_out = Dataset(gridded_monthly_ref_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:,:] = Flux_Emissions_Ref
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded monthly refining fluxes written to file: {}" .format(os.getcwd())+gridded_monthly_ref_outputfile)
print('')

-------------
## Step 6. Plot Data
-------------

#### 6.1 Plot Annual Emission Fluxes

##### 6.1.1 Total Petroleum System Emissions

In [None]:
# Plot annual emissions for each year (function converts from molec/cm2/s to Mg/year/km2)
scale_max = 10
save_flag = 0
save_outfile = ''
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Total_annual, Lat_01, Lon_01, year_range, title_str, scale_max, save_flag, save_outfile)
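
The plotting helper is described as converting fluxes from molec/cm2/s to Mg/year/km2. Its implementation lives in `data_plot_fn` and is not shown here, so the following is only a sketch of what such a conversion could look like, assuming a CH4 molar mass of 16.04 g/mol and a 365-day year by default; the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23   # molec/mol
MOLAR_CH4 = 16.04          # g/mol (assumed value)

def flux_to_mg_per_km2_per_year(flux_molec_cm2_s, year_days=365):
    """Convert a flux in molec/cm2/s to Mg/km2/year."""
    seconds_per_year = year_days * 24 * 60 * 60
    # molec/cm2/s -> g/cm2/year
    grams_per_cm2_year = flux_molec_cm2_s / AVOGADRO * MOLAR_CH4 * seconds_per_year
    # g -> Mg (1e-6) and per cm2 -> per km2 (1e10)
    return grams_per_cm2_year * 1e-6 * 1e10

example = flux_to_mg_per_km2_per_year(1.0e12)
print(example)
```

With these assumptions, a flux of 1e12 molec/cm2/s corresponds to roughly 8.4 Mg/km2/year, which gives a sense of scale for the `scale_max = 10` setting used in the maps.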

##### 6.1.2 Exploration/Production Emissions

In [None]:
# Plot annual emissions for each year (function converts from molec/cm2/s to Mg/year/km2)
scale_max = 10
save_flag = 0
save_outfile = ''
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Expl_annual, Lat_01, Lon_01, year_range, title_expl_str, scale_max,save_flag,save_outfile)

scale_max = 10
save_flag = 0
save_outfile = ''
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Prod_annual, Lat_01, Lon_01, year_range, title_prod_str, scale_max,save_flag,save_outfile)

##### 6.1.3 Transport Emissions

In [None]:
# Plot annual emissions for each year (function converts from molec/cm2/s to Mg/year/km2)
scale_max = 10
save_flag = 0
save_outfile = ''
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Trans_annual, Lat_01, Lon_01, year_range, title_trans_str, scale_max,save_flag, save_outfile)

##### 6.1.4 Refining Emissions

In [None]:
# Plot annual emissions for each year (function converts from molec/cm2/s to Mg/year/km2)
scale_max = 10
save_flag = 0
save_outfile = ''
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Ref_annual, Lat_01, Lon_01, year_range, title_ref_str, scale_max,save_flag, save_outfile)

#### 6.2 Plot Difference Between First and Last Inventory Year

##### 6.2.1 Total Petroleum System Emissions

In [None]:
# Plot difference between last and first year
save_flag = 0
save_outfile = ''
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Total_annual, Lat_01, Lon_01, year_range, title_diff_str,save_flag, save_outfile)

##### 6.2.2 Exploration/Production Emissions

In [None]:
# Plot difference between last and first year
save_flag = 0
save_outfile = ''
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Expl_annual, Lat_01, Lon_01, year_range, title_diff_expl_str,save_flag, save_outfile)

# Plot difference between last and first year
save_flag = 0
save_outfile = ''
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Prod_annual, Lat_01, Lon_01, year_range, title_diff_prod_str,save_flag, save_outfile)

##### 6.2.3 Transport Emissions

In [None]:
# Plot difference between last and first year
save_flag = 0
save_outfile = ''
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Trans_annual, Lat_01, Lon_01, year_range, title_diff_trans_str,save_flag, save_outfile)

##### 6.2.4 Refining Emissions

In [None]:
# Plot difference between last and first year
save_flag = 0
save_outfile = ''
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Ref_annual, Lat_01, Lon_01, year_range, title_diff_ref_str,save_flag, save_outfile)

#### 6.3 Plot Key Proxy Data

In [None]:
#Map (well location) heatmap

# Activity_Map = 0.1x0.1 map of activity data (counts or absolute units)
# Plot_Frac    = 0 or 1 (0= plot activity data in absolute counts, 1= plot fractional activity data)
# Lat          = 0.1 degree Lat values (select range)
# Lon          = 0.1 degree Lon values (select range)
# year_range   = array of inventory years
# title_str2   = title of map
# legend_str   = title of legend
# scale_max    = maximum of color scale

Map_output = np.zeros([len(Lat_01),len(Lon_01),num_years])
for iyear in np.arange(0,num_years):
    for imonth in np.arange(0,num_months):    
        Map_output[:,:,iyear] += Map_Allwell[:,:,iyear,imonth]  

            
Activity_Map = Map_output
Plot_Frac = 1
Lat = Lat_01
Lon = Lon_01
year_range = year_range
title_str2 = "Proxy - Oil Well Locations"
legend_str = "Annual Fraction of National Well Population"
scale_max = 0.001

for iyear in np.arange(6,7): # plots only index 6 (the last inventory year); use np.arange(0,len(year_range)) to plot all years
    my_cmap = copy(plt.cm.get_cmap('rainbow',lut=3000))
    my_cmap._init()
    slopen = 200
    alphas_slope = np.abs(np.linspace(0, 1.0, slopen))
    alphas_stable = np.ones(3003-slopen)
    alphas = np.concatenate((alphas_slope, alphas_stable))
    my_cmap._lut[:,-1] = alphas
    my_cmap.set_under('gray', alpha=0)
    
    Lon_cor = Lon[50:632]-0.05
    Lat_cor = Lat[43:300]-0.05
    
    xpoints = Lon_cor
    ypoints = Lat_cor
    yp,xp = np.meshgrid(ypoints,xpoints)
    
    if np.shape(Activity_Map)[0] == len(year_range):
        if Plot_Frac ==1:
            zp = Activity_Map[iyear,43:300,50:632]/np.sum(Activity_Map[iyear,:,:])
        else:
            zp = Activity_Map[iyear,43:300,50:632]
    elif np.shape(Activity_Map)[2] == len(year_range):
        if Plot_Frac ==1:
            zp = Activity_Map[43:300,50:632,iyear]/np.sum(Activity_Map[:,:,iyear])
        else: 
            zp = Activity_Map[43:300,50:632,iyear]
    #zp = zp/float(10**6 * Avogadro) * (year_days * 24 * 60 * 60) * Molarch4 * float(1e10)
    
    fig, ax = plt.subplots(dpi=300)
    m = Basemap(llcrnrlon=xp.min(), llcrnrlat=yp.min(), urcrnrlon=xp.max(),
                urcrnrlat=yp.max(), projection='merc', resolution='h', area_thresh=5000)
    m.drawmapboundary(fill_color='Azure')
    m.fillcontinents(color='FloralWhite', lake_color='Azure',zorder=1)
    m.drawcoastlines(linewidth=0.5,zorder=3)
    m.drawstates(linewidth=0.25,zorder=3)
    m.drawcountries(linewidth=0.5,zorder=3)

    xpi,ypi = m(xp,yp)
    plot = m.pcolor(xpi,ypi,zp.transpose(), cmap=my_cmap, vmin=10**-15, vmax=scale_max, snap=True,zorder=2)
    #plot = m.scatter(xpi,ypi,s=20,c=zp.transpose(),cmap=my_cmap,zorder=2,vmin = 10**-15,snap = True,vmax = scale_max)
    cb = m.colorbar(plot, location = "bottom", pad = "1%")        
    tick_locator = ticker.MaxNLocator(nbins=5)
    cb.locator = tick_locator
    cb.update_ticks()
    
    cb.ax.set_xlabel(legend_str,fontsize=10)
    cb.ax.tick_params(labelsize=10)
    Titlestring = str(year_range[iyear])+' '+title_str2
    plt.title(Titlestring, fontsize=14);
    plt.show();
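
The colormap above fades in its lowest 200 colors by writing to the private `_lut` table. As an alternative sketch, and not the method used in this notebook, the same kind of alpha ramp can be built with public Matplotlib calls by sampling the colormap into an RGBA array and wrapping it in a `ListedColormap`. The colormap name `rainbow_fade` is made up for this example.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

n_colors = 3000
slopen = 200   # number of low-end colors that fade from transparent to opaque

base = plt.get_cmap('rainbow', n_colors)
rgba = base(np.linspace(0.0, 1.0, n_colors))       # [n_colors, 4] array of RGBA values
rgba[:slopen, 3] = np.linspace(0.0, 1.0, slopen)   # alpha ramp over the lowest colors

my_cmap = ListedColormap(rgba, name='rainbow_fade')
my_cmap.set_under('gray', alpha=0)   # values below vmin are fully transparent

print(my_cmap(0.0), my_cmap(1.0))
```

The resulting `my_cmap` can be passed to `pcolor` or `scatter` through the `cmap` argument in the same way as the colormap built above.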

In [None]:
# Plot Refinery Locations

# Activity_Map = 0.1x0.1 map of activity data (counts or absolute units)
# Plot_Frac    = 0 or 1 (0= plot activity data in absolute counts, 1= plot fractional activity data)
# Lat          = 0.1 degree Lat values (select range)
# Lon          = 0.1 degree Lon values (select range)
# year_range   = array of inventory years
# title_str2   = title of map
# legend_str   = title of legend
# scale_max    = maximum of color scale

Activity_Map = Map_Refineries
Plot_Frac = 1
Lat = Lat_01
Lon = Lon_01
year_range = year_range
title_str2 = "Proxy - Petroleum Refinery Emissions"
legend_str = "Annual Fraction of National Refinery Emissions"
scale_max = 0.05

for iyear in np.arange(0,len(year_range)): 
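    # Build a 'rainbow' colormap whose lowest entries fade from transparent to opaque,
    # so near-zero values do not hide the basemap. This relies on matplotlib's private
    # _init()/_lut attributes.
    my_cmap = copy(plt.cm.get_cmap('rainbow',lut=3000))
    my_cmap._init()
    slopen = 200  # number of colormap entries over which alpha ramps from 0 to 1
    alphas_slope = np.abs(np.linspace(0, 1.0, slopen))
    # _lut has lut+3 rows (3000 colors plus the under, over, and bad entries)
    alphas_stable = np.ones(3003-slopen)
    alphas = np.concatenate((alphas_slope, alphas_stable))
    my_cmap._lut[:,-1] = alphas
    my_cmap.set_under('gray', alpha=0)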
    
    Lon_cor = Lon[50:632]-0.05
    Lat_cor = Lat[43:300]-0.05
    
    xpoints = Lon_cor
    ypoints = Lat_cor
    yp,xp = np.meshgrid(ypoints,xpoints)
    
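    if np.shape(Activity_Map)[0] == len(year_range):
        # Array layout is (year, lat, lon)
        if Plot_Frac == 1:
            zp = Activity_Map[iyear,43:300,50:632]/np.sum(Activity_Map[iyear,:,:])
        else:
            zp = Activity_Map[iyear,43:300,50:632]
    elif np.shape(Activity_Map)[2] == len(year_range):
        # Array layout is (lat, lon, year)
        if Plot_Frac == 1:
            zp = Activity_Map[43:300,50:632,iyear]/np.sum(Activity_Map[:,:,iyear])
        else:
            zp = Activity_Map[43:300,50:632,iyear]
    else:
        # Without this guard, zp would be undefined or stale from a previous cell
        raise ValueError('Activity_Map has no axis matching len(year_range)')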
    
    fig, ax = plt.subplots(dpi=300)
    m = Basemap(llcrnrlon=xp.min(), llcrnrlat=yp.min(), urcrnrlon=xp.max(),
                urcrnrlat=yp.max(), projection='merc', resolution='h', area_thresh=5000)
    m.drawmapboundary(fill_color='Azure')
    m.fillcontinents(color='FloralWhite', lake_color='Azure',zorder=1)
    m.drawcoastlines(linewidth=0.5,zorder=3)
    m.drawstates(linewidth=0.25,zorder=3)
    m.drawcountries(linewidth=0.5,zorder=3)
        
    
    xpi,ypi = m(xp,yp)
    plot = m.scatter(xpi,ypi,s=20,c=zp.transpose(),cmap=my_cmap,zorder=2,vmin = 10**-15,snap = True,vmax = scale_max)
    cb = m.colorbar(plot, location = "bottom", pad = "1%")        
    tick_locator = ticker.MaxNLocator(nbins=5)
    cb.locator = tick_locator
    cb.update_ticks()
    
    cb.ax.set_xlabel(legend_str,fontsize=10)
    cb.ax.tick_params(labelsize=10)
    Titlestring = str(year_range[iyear])+' '+title_str2
    plt.title(Titlestring, fontsize=14);
    plt.show();
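
The plotting cells above repeat the same logic for picking one year out of `Activity_Map`, whichever axis holds the years, and optionally normalizing it by the national total. The sketch below shows one way that step could be factored into a helper. It is illustrative only and not part of the original workflow: the function name `year_slice` and its parameters (`n_years`, `lat_slice`, `lon_slice`, `as_fraction`) are hypothetical, and the zero-total handling is an added assumption.

```python
import numpy as np

def year_slice(activity_map, iyear, n_years, lat_slice, lon_slice, as_fraction=True):
    """Return the 2-D (lat, lon) plotting window for one inventory year.

    Hypothetical helper: accepts either a (year, lat, lon) or a
    (lat, lon, year) array, checking the first axis first, as the cells above do.
    """
    arr = np.asarray(activity_map, dtype=float)
    if arr.ndim != 3:
        raise ValueError("activity_map must be 3-D")
    if arr.shape[0] == n_years:
        full = arr[iyear, :, :]      # layout (year, lat, lon)
    elif arr.shape[2] == n_years:
        full = arr[:, :, iyear]      # layout (lat, lon, year)
    else:
        raise ValueError("no axis of activity_map matches n_years")
    window = full[lat_slice, lon_slice]
    if not as_fraction:
        return window
    total = full.sum()               # normalize by the full-domain total
    if total == 0:
        # Assumption: an all-zero year is returned as zeros rather than NaN
        return np.zeros_like(window)
    return window / total

# Toy example: 2 years on a 4x5 grid (not real inventory data)
demo = np.zeros((2, 4, 5))
demo[0, 1, 2] = 3.0
demo[0, 3, 4] = 1.0
frac = year_slice(demo, 0, 2, slice(0, 3), slice(0, 5))
```

In the refinery cell, the equivalent call would pass `len(year_range)` as `n_years` and `slice(43, 300)`, `slice(50, 632)` as the latitude and longitude windows.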

In [None]:
ct = datetime.now() 
ft = ct.timestamp() 
time_elapsed = (ft-it)/(60*60)
print('Time to run: '+str(time_elapsed)+' hours')
print('** GEPA_1B2a_Petroleum_Systems: COMPLETE **')