# Gridded EPA Methane Inventory
## Category: 1B2b Natural Gas Production

***
#### Authors: 
Joannes D. Maasakkers, Erin E. McDuffie
#### Date Last Updated: 
see Step 0
#### Notebook Purpose: 
This notebook calculates and reports annual and monthly gridded methane emission fluxes (molec./cm2/s) from the exploration and production segments of natural gas systems in the CONUS region for 2012-2018. 
#### Summary & Notes:
EPA GHGI gas exploration and production emissions are read in from the GHGI Natural Gas Systems workbook at the NEMS region level where available (and at the national level otherwise). Emissions are then distributed onto a 0.1°x0.1° grid as a function of emission group. The activity/proxy data used to spatially distribute emissions from each group include well locations and production levels from Enverus (DI and Prism), EIA state-level lease condensate production data, NEI 4 km grid data (for the states of IL and IN), and BOEM GOADS platform emissions and location data for Federal Offshore emissions. Emissions are calculated by month, largely determined by whether a well was producing in a given month (from Enverus). Some proxy data are only available annually and are allocated evenly across months. Both monthly and annual emission fluxes (molec./cm2/s) are written to final netCDF files in the ‘/code/Final_Gridded_Data/’ folder.
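
The flux units above imply a conversion from mass per year to molecules per unit area per second. The following is a minimal sketch of that conversion, assuming emissions are expressed in Tg per year and grid-cell area in cm2; the function name `tg_per_year_to_flux` is hypothetical and the notebook's own conversion code may differ in detail.

```python
import calendar
import numpy as np

AVOGADRO = 6.02214129e23   # molecules/mol (same value as defined in Step 0)
MOLAR_CH4 = 16.04          # g/mol (same value as defined in Step 0)

def tg_per_year_to_flux(emi_tg, area_cm2, year):
    """Convert annual emissions (Tg/yr, assumed unit) to a flux in molec./cm2/s."""
    days = 366 if calendar.isleap(year) else 365
    seconds = days * 24 * 3600
    molecules = emi_tg * 1e12 / MOLAR_CH4 * AVOGADRO   # Tg -> g -> mol -> molecules
    return molecules / area_cm2 / seconds

# Toy input: 1e-6 Tg (one tonne) of CH4 emitted during 2012 from a 100 km2 (1e12 cm2) cell
flux = tg_per_year_to_flux(np.array([1e-6]), np.array([1e12]), 2012)
print(flux)
```

Because the number of seconds depends on the year length, the same annual mass gives a slightly lower flux in a leap year than in a non-leap year.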
***

-------
## Step 0. Set-Up Notebook Modules, Functions, and Local Parameters and Constants
_____
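The CONUS index ranges defined in this step select a window from a global 0.1° grid. The sketch below reproduces those ranges under the assumption that the global grid starts at 90°S and 180°W; the helper names `lat_index` and `lon_index` are hypothetical, and `round` is used here instead of `int` only to guard against floating-point truncation.

```python
RES = 0.1                      # grid resolution, degrees
LAT_LOW, LAT_UP = 20, 55       # CONUS latitude limits, degrees
LON_LEFT, LON_RIGHT = -130, -60  # CONUS longitude limits, degrees

def lat_index(lat, res=RES):
    """Row index of a latitude on a global grid whose first row is at 90S (assumption)."""
    return round((lat + 90) / res)

def lon_index(lon, res=RES):
    """Column index of a longitude on a global grid whose first column is at 180W (assumption)."""
    return round((lon + 180) / res)

ilat_start, ilat_end = lat_index(LAT_LOW), lat_index(LAT_UP)
ilon_start, ilon_end = lon_index(LON_LEFT), lon_index(LON_RIGHT)
n_lat = ilat_end - ilat_start
n_lon = ilon_end - ilon_start
print(ilat_start, ilat_end, ilon_start, ilon_end)  # 1100 1450 500 1200
print(n_lat, n_lon)                                # 350 700
```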

In [None]:
#Confirm working directory & print last update time
import os
import time
modtime = os.path.getmtime('./1B2b_Natural_Gas_Systems_Production.ipynb')
modificationTime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(modtime))
print("This file was last modified on: ", modificationTime)
print('')
print("The directory we are working in is {}" .format(os.getcwd()))

In [None]:
## Include plots within notebook
%matplotlib inline

In [None]:
# Import base modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import re
import pyodbc
import PyPDF2 as pypdf
import tabula as tb
import shapefile as shp
from datetime import datetime
from copy import copy
import pyproj 

# Import additional modules
from mpl_toolkits.basemap import Basemap

# Load netCDF (for manipulating netCDF file types)
from netCDF4 import Dataset

# Set up ticker
import matplotlib.ticker as ticker

#add path for the global function module (file)
import sys
module_path = os.path.abspath(os.path.join('../Global_Functions/'))
#print(module_path)
if module_path not in sys.path:
    sys.path.append(module_path)

# Load functions
import data_load_functions as data_load_fn
import data_functions as data_fn
import data_IO_functions as data_IO_fn
import data_plot_functions as data_plot_fn

In [None]:
## SPECIFY RECALS ##

# Specify which sections to re-calculate or load from previously saved arrays
# for time saving purposes
#0 = load from saved files, 1 = re-calculate

# 1) ReCalc Enverus Production Data?
ReCalc_Enverus = 1

# 2) ReCalc Offshore GOADS Data
ReCalc_GOADS = 0

# 3) Re-Calc NEI Indiana and Illinois data?
ReCalc_NEI = 0

# 4) ReCalc Lease Condensate Data?
ReCalc_Condensates = 0

In [None]:
#INPUT Files
# Assign global file names
global_filenames = data_load_fn.load_global_file_names()
State_ANSI_inputfile = global_filenames[0]
#County_ANSI_inputfile = global_filenames[1]
#pop_map_inputfile = global_filenames[2]
Grid_area01_inputfile = global_filenames[3]
Grid_area001_inputfile = global_filenames[4]
#Grid_state001_ansi_inputfile = global_filenames[5]
#Grid_county001_ansi_inputfile = global_filenames[6]
globalinputlocation = global_filenames[0][0:20]
print(globalinputlocation)

# EPA Inventory Data
EPA_NG_prod_inputfile = globalinputlocation+'GHGI/Ch3_Energy/NaturalGasSystems_1990-2018_GHGI_2020-04-11.xlsx'

#proxy mapping file
NG_Mapping_inputfile = './InputData/NaturalGas_Production_ProxyMapping.xlsx'

#NEI grid reference
NEI_grid_ref_inputfile = globalinputlocation+'Gridded/NEI_Reference_Grid_LCC_to_WGS84_latlon.shp'

#ERG/NEI Spatial Surrogate Data
ERG_NEI_inputloc = globalinputlocation+'NEI/ERG_ILINData/CONUS_SA_FILES_'
ERG_NEI_inputloc_2018 = globalinputlocation+'NEI/ERG_ILINData/IL_IN_ALLOCATED_WELL_LEVEL_DATA_2018_2019/IL_IN_WELL_LEVEL_DATA.accdb'

#ERG Processed Well Count and Production Notebook
Enverus_WellCounts_inputfile = globalinputlocation+'Enverus/Enverus DrillingInfo Processing - Well Counts_2021-03-17.xlsx'
Enverus_WellProd_inputfile = globalinputlocation+'Enverus/Enverus DrillingInfo Processing - Well Prod_2021-03-17.xlsx'

#Activity Data
Enverus_Prism_inputdata_2019 = globalinputlocation+ 'Enverus/Production/prism_monthly_2019_110221.csv'
Enverus_Prism_inputdata_2018 = globalinputlocation+ 'Enverus/Production/prism_monthly_2018_110221.csv'
Enverus_Prism_inputdata_2017 = globalinputlocation+ 'Enverus/Production/prism_monthly_2017_110221.csv'
Enverus_Prism_inputdata_2016 = globalinputlocation+ 'Enverus/Production/prism_monthly_2016_110221.csv'
Enverus_Prism_inputdata_2015 = globalinputlocation+ 'Enverus/Production/prism_monthly_2015_110221.csv'
Enverus_Prism_inputdata_2014 = globalinputlocation+ 'Enverus/Production/prism_monthly_2014_110221.csv'
Enverus_Prism_inputdata_2013 = globalinputlocation+ 'Enverus/Production/prism_monthly_2013_110221.csv'
Enverus_Prism_inputdata_2012 = globalinputlocation+ 'Enverus/Production/prism_monthly_2012_110221.csv'

Enverus_DI_inputdata_2019 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2019_102621.csv'
Enverus_DI_inputdata_2018 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2018_102621.csv'
Enverus_DI_inputdata_2017 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2017_102621.csv'
Enverus_DI_inputdata_2016 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2016_102621.csv'
Enverus_DI_inputdata_2015 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2015_102621.csv'
Enverus_DI_inputdata_2014 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2014_102621.csv'
Enverus_DI_inputdata_2013 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2013_102621.csv'
Enverus_DI_inputdata_2012 = globalinputlocation+ 'Enverus/Production/didsk_monthly_2012_102621.csv'

Enverus_NG_GBstations_inputfile = globalinputlocation+ 'Enverus/Midstream/Gathering_CompressorStations_CONUS_onshore_wgs84.xls'
Enverus_NG_GBpipeline_inputfile = globalinputlocation+ 'Enverus/Midstream/Gathering_pipelines_CONUS_onshore_WGS84_01x01.xls'
AKHI_pipelines_shp = globalinputlocation+ 'Enverus/Midstream/Gathering_pipelines_AKHI_wgs84.shp'
CONUS_pipelines_shp = globalinputlocation+ 'Enverus/Midstream/Gathering_pipelines_CONUS_onshore_wgs84.shp'

# Lease Condensate Data
EIA_Condensate_inputfile = './InputData/NG_PROD_LC_S1_A.xls'

# Offshore GOADS Data
GOADS_11_inputfile = globalinputlocation+'BOEM/2011_Gulfwide_Platform_Inventory.accdb'
GOADS_14_inputfile = globalinputlocation+'BOEM/2014_Gulfwide_Platform_Inventory_20161102.accdb'
GOADS_17_inputfile = globalinputlocation+'BOEM/2017_Gulfwide_Platform_Inventory_20190705_CAP_GHG.accdb'
ERG_GOADSEmissions_inputfile = globalinputlocation+'BOEM/BOEM GEI Emissions Data_EmissionSource_2020-03-11.xlsx'

#OUTPUT FILES
gridded_prod_outputfile = '../Final_Gridded_Data/EPA_v2_1B2b_Natural_Gas_Production.nc'
gridded_prod_monthly_outputfile = '../Final_Gridded_Data/EPA_v2_1B2b_Natural_Gas_Production_Month.nc'
netCDF_prod_description = 'Gridded EPA Inventory - Gas Production Emissions - IPCC Source Category 1B2b'
title_prod_str = "EPA methane emissions from gas production"
title_prod_diff_str = "Emissions from gas production difference: 2018-2012"

gridded_expl_outputfile = '../Final_Gridded_Data/EPA_v2_1B2b_Natural_Gas_Exploration.nc'
gridded_expl_monthly_outputfile = '../Final_Gridded_Data/EPA_v2_1B2b_Natural_Gas_Exploration_Month.nc'
netCDF_expl_description = 'Gridded EPA Inventory - Gas Exploration Emissions - IPCC Source Category 1B2b'
title_expl_str = "EPA methane emissions from gas exploration"
title_expl_diff_str = "Emissions from gas exploration difference: 2018-2012"


#output gridded proxy data
grid_emi_outputfile = '../Final_Gridded_Data/Extension/v2_input_data/NG_Production_Grid_Emi.nc'

In [None]:
# Define local variables
start_year = 2012  #First year in emission timeseries
end_year = 2018    #Last year in emission timeseries
year_range = [*range(start_year, end_year+1,1)] #List of emission years
year_range_str=[str(i) for i in year_range]
num_years = len(year_range)

# Define constants
Avogadro   = 6.02214129 * 10**(23)  #molecules/mol
Molarch4   = 16.04                  #g/mol
Res01      = 0.1                    # degrees

# Continental US Lat/Lon Limits (for netCDF files)
Lon_left = -130       #deg
Lon_right = -60       #deg
Lat_low  = 20         #deg
Lat_up  = 55          #deg
loc_dimensions = [Lat_low, Lat_up, Lon_left, Lon_right]

ilat_start = int((90+Lat_low)/Res01) #1100:1450 (continental US range)
ilat_end = int((90+Lat_up)/Res01)
ilon_start = abs(int((-180-Lon_left)/Res01)) #500:1200 (continental US range)
ilon_end = abs(int((-180-Lon_right)/Res01))

# Number of days in each month
month_day_leap  = [  31,  29,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]
month_day_nonleap = [  31,  28,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]
month_tag = ['01','02','03','04','05','06','07','08','09','10','11','12']
month_dict = {'January':1, 'February':2,'March':3,'April':4,'May':5,'June':6, 'July':7,'August':8,'September':9,'October':10,\
             'November':11,'December':12}

# Month arrays
month_range_str = ['January','February','March','April','May','June','July','August','September','October','November','December']
num_months = len(month_range_str)
num_regions = 7

In [None]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999;
//prevent auto-scrolling

In [None]:
# Track run time
ct = datetime.now() 
it = ct.timestamp() 
print("current time:", ct) 

____
## Step 1. Load in State ANSI data, NEMS definitions, and Area Maps
_____
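This step aggregates the 0.01° area map to 0.1° and converts it from m2 to cm2. The implementation of `data_fn.regrid001_to_01` is not shown in this notebook, so the following is only a stand-in sketch that assumes the regrid sums each non-overlapping 10x10 block of fine cells; `block_sum` is a hypothetical helper.

```python
import numpy as np

def block_sum(fine, factor=10):
    """Sum non-overlapping factor x factor blocks of a 2-D array."""
    n_lat, n_lon = fine.shape
    if n_lat % factor or n_lon % factor:
        raise ValueError("array shape must be a multiple of the block factor")
    # Split each axis into (coarse index, position within block) and sum within blocks
    return fine.reshape(n_lat // factor, factor, n_lon // factor, factor).sum(axis=(1, 3))

fine_area_m2 = np.ones((20, 30))                   # toy 0.01-degree cell areas, m2
coarse_area_cm2 = block_sum(fine_area_m2) * 1e4    # 1 m2 = 1e4 cm2
print(coarse_area_cm2.shape)
```

Summing is the natural choice for an extensive quantity such as area; an intensive quantity would call for an average instead.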

In [None]:
# State-level ANSI Data
#Read the state ANSI file array
State_ANSI, name_dict, abbr_dict = data_load_fn.load_state_ansi(State_ANSI_inputfile)[0:3]
#QA: number of states
print('Read input file: '+ f"{State_ANSI_inputfile}")
print('Total "States" found: ' + '%.0f' % len(State_ANSI))
print(' ')

# 0.01 x0.01 degree Data
# State ANSI IDs and grid cell area (m2) maps
#state_ANSI_map = data_load_fn.load_state_ansi_map(Grid_state001_ansi_inputfile)
area_map, lat001, lon001 = data_load_fn.load_area_map_001(Grid_area001_inputfile)

# 0.1 x0.1 degree data
# grid cell area and state ANSI maps
Lat01, Lon01 = data_load_fn.load_area_map_01(Grid_area01_inputfile)[1:3]
#Select relevant Continental 0.1 x0.1 domain
Lat_01 = Lat01[ilat_start:ilat_end]
Lon_01 = Lon01[ilon_start:ilon_end]
area_matrix_01 = data_fn.regrid001_to_01(area_map, Lat_01, Lon_01)
area_matrix_01 *= 10000  #convert from m2 to cm2
#state_ANSI_map_01 = data_fn.regrid001_to_01(state_ANSI_map, Lat_01, Lon_01)
del area_map, lat001, lon001, global_filenames

# Print time
ct = datetime.now() 
print("current time:", ct) 

In [None]:
#Make NEMS State classifications
# Treat NM and TX separately since these states cover multiple NEMS regions

#0 = NE, 1 = MC, 2 = RM, 3 = SW, 4 = WC, 5 = GC, 6 = offshore
NEMS_State = pd.read_excel(EPA_NG_prod_inputfile, sheet_name = "Drivers Production", usecols = "B:H", skiprows = 5, nrows = 39)
NEMS_State = NEMS_State.fillna(0)
NM_idx = NEMS_State.index[NEMS_State['State'].str.contains('New Mexico')].tolist()
TX_idx = NEMS_State.index[NEMS_State['State'].str.contains('Texas')].tolist()
idx = NM_idx+TX_idx
NEMS_State= NEMS_State.drop(NEMS_State.index[idx])
NEMS_State.reset_index(drop=True,inplace=True)
#print(NEMS_State)

NEMS_State['NEMS'] = 0
NEMS_State['Ansi'] = 0

for istate in np.arange(len(NEMS_State)):
    if NEMS_State['NE'][istate] == 1:
        NEMS_State.loc[istate,'NEMS'] = 0
        NEMS_State.loc[istate,'NEMS_Region'] = 'North East'
    elif NEMS_State['MC'][istate] == 1:
        NEMS_State.loc[istate,'NEMS'] = 1
        NEMS_State.loc[istate,'NEMS_Region'] = 'Midcontinent'
    elif NEMS_State['RM'][istate] == 1:
        NEMS_State.loc[istate,'NEMS'] = 2
        NEMS_State.loc[istate,'NEMS_Region'] = 'Rocky Mountain'
    elif NEMS_State['SW'][istate] == 1:
        NEMS_State.loc[istate,'NEMS'] = 3
        NEMS_State.loc[istate,'NEMS_Region'] = 'South West'
    elif NEMS_State['WC'][istate] == 1:
        NEMS_State.loc[istate,'NEMS'] = 4
        NEMS_State.loc[istate,'NEMS_Region'] = 'West Coast'
    elif NEMS_State['GC'][istate] == 1:
        NEMS_State.loc[istate,'NEMS'] = 5
        NEMS_State.loc[istate,'NEMS_Region'] = 'Gulf Coast'
    else:
        print('Error for', NEMS_State['State'][istate])
    NEMS_State.loc[istate,'Ansi'] = list(abbr_dict.keys())[list(abbr_dict.values()).index(name_dict[NEMS_State['State'][istate]])]
    
print(NEMS_State)

NEMS_dict = {'North East':0, 'Midcontinent':1,'Rocky Mountain':2,'South West':3,'West Coast':4,'Gulf Coast':5}


-------------
## Step 2: Read-in and Format Proxy Data
-------------

### Step 2.1 Read In Proxy Mapping File & Make Proxy Arrays
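The proxy arrays created in this step take one of four shapes, depending on whether the proxy has monthly data and whether it is resolved by NEMS region. As an illustration, the sketch below builds the same shapes but keeps them in a dictionary keyed by name rather than creating variables dynamically; the group names and the helper `empty_proxy_array` are hypothetical.

```python
import numpy as np

def empty_proxy_array(n_lat, n_lon, n_years, n_months=12, n_nems=6,
                      monthly=False, by_nems=False):
    """Zero array shaped ([nems,] lat, lon, year[, month])."""
    shape = [n_lat, n_lon, n_years]
    if by_nems:
        shape.insert(0, n_nems)   # leading NEMS-region axis
    if monthly:
        shape.append(n_months)    # trailing month axis
    return np.zeros(shape)

# Hypothetical proxy groups mapped to (Month_Flag, NEMS_Data)
flags = {
    "Map_ExampleWells": (1, 1),
    "Map_ExampleOffshore": (1, 0),
    "Map_ExampleAnnual": (0, 0),
}
proxies = {
    name: empty_proxy_array(4, 5, 7, monthly=bool(m), by_nems=bool(n))
    for name, (m, n) in flags.items()
}
for name, arr in proxies.items():
    print(name, arr.shape)
```

A dictionary makes it easy to loop over all proxies and to check which names exist, at the cost of writing `proxies[name]` instead of a bare variable name.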

In [None]:
#load GHGI Mapping Groups
names = pd.read_excel(NG_Mapping_inputfile, sheet_name = "GHGI Map - E&P", usecols = "A:B",skiprows = 1, header = 0)
colnames = names.columns.values
ghgi_prod_map = pd.read_excel(NG_Mapping_inputfile, sheet_name = "GHGI Map - E&P", usecols = "A:B", skiprows = 2, names = colnames)
#drop rows with no data, remove the parentheses and ""
ghgi_prod_map = ghgi_prod_map[ghgi_prod_map['GHGI_Emi_Group'] != 'na']
ghgi_prod_map = ghgi_prod_map[ghgi_prod_map['GHGI_Emi_Group'].notna()]
# regex=False makes these literal replacements regardless of the pandas version's default
ghgi_prod_map['GHGI_Source']= ghgi_prod_map['GHGI_Source'].str.replace("(","", regex=False)
ghgi_prod_map['GHGI_Source']= ghgi_prod_map['GHGI_Source'].str.replace(")","", regex=False)
ghgi_prod_map['GHGI_Source']= ghgi_prod_map['GHGI_Source'].str.replace("[","", regex=False)
ghgi_prod_map['GHGI_Source']= ghgi_prod_map['GHGI_Source'].str.replace("]","", regex=False)
ghgi_prod_map['GHGI_Source']= ghgi_prod_map['GHGI_Source'].str.replace('"',"", regex=False)
ghgi_prod_map.reset_index(inplace=True, drop=True)
display(ghgi_prod_map)

#load emission group - proxy map
names = pd.read_excel(NG_Mapping_inputfile, sheet_name = "Proxy Map - E&P", usecols = "A:D",skiprows = 1, header = 0)
colnames = names.columns.values
proxy_prod_map = pd.read_excel(NG_Mapping_inputfile, sheet_name = "Proxy Map - E&P", usecols = "A:D", skiprows = 1, names = colnames)
display((proxy_prod_map))

#create empty proxy and emission group arrays (add months for proxy variables that have monthly data)
# Each row of proxy_prod_map pairs one emission group with its proxy, so both array names
# are taken from that row (ghgi_prod_map has one row per GHGI source, not per group)
for igroup in np.arange(0,len(proxy_prod_map)):
    #print(igroup)
    if proxy_prod_map.loc[igroup, 'Month_Flag'] == 1:
        if proxy_prod_map.loc[igroup,'NEMS_Data'] ==1:
            vars()[proxy_prod_map.loc[igroup,'Proxy_Group']] = np.zeros([num_regions-1,len(Lat_01),len(Lon_01),num_years,num_months])
            vars()[proxy_prod_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_regions-1,num_years,num_months])
            vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([num_regions-1,len(Lat_01),len(Lon_01),num_years,num_months])
        else:
            vars()[proxy_prod_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
            vars()[proxy_prod_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years,num_months])
            vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])

    elif proxy_prod_map.loc[igroup, 'Month_Flag'] == 0:
        if proxy_prod_map.loc[igroup,'NEMS_Data'] ==1:
            vars()[proxy_prod_map.loc[igroup,'Proxy_Group']] = np.zeros([num_regions-1,len(Lat_01),len(Lon_01),num_years])
            vars()[proxy_prod_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_regions-1,num_years])
            vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([num_regions-1,len(Lat_01),len(Lon_01),num_years])
        else:
            vars()[proxy_prod_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
            vars()[proxy_prod_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years])
            vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        

emi_group_names = np.unique(ghgi_prod_map['GHGI_Emi_Group'])
#print(emi_group_names)
#print(np.unique(proxy_prod_map['GHGI_Emi_Group']))
print('QA/QC: Is the number of emission groups the same for the proxy and emissions tabs?')
if (len(emi_group_names) == len(np.unique(proxy_prod_map['GHGI_Emi_Group']))):
    print('PASS')
else:
    print('FAIL')

#### 2.1.1. State Condensate Data
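Several EIA series in this step are identified by a state name combined with a sub-region string (for example an east/west split or a district number). One pitfall when combining two such conditions is that Python's `and` applied to two non-empty strings simply returns the second string. The sketch below shows this and one way to require both parts with a single regular expression; the labels are made-up examples in the style of the EIA column names, and `matches` is a hypothetical helper.

```python
import re

# Made-up labels in the style of the EIA lease condensate column names
labels = [
    "New Mexico--East Lease Condensate",
    "New Mexico--West Lease Condensate",
    "Texas--RRC District 1 Lease Condensate",
    "Texas--RRC District 10 Lease Condensate",
    "Texas--State Offshore Lease Condensate",
]

# `a and b` with two non-empty strings evaluates to b, so the state name is dropped
combined = "New Mexico" and "--East"
print(combined)  # --East

def matches(label, state, sub_pattern):
    """True if the label contains the state name followed later by the sub-pattern."""
    return re.search(re.escape(state) + ".*(?:" + sub_pattern + ")", label) is not None

nm_east = [s for s in labels if matches(s, "New Mexico", "--East")]
# The trailing space in 'District 1 ' keeps 'District 10' from matching
tx_gulf = [s for s in labels if matches(s, "Texas", "District 1 |District 2")]
print(nm_east)
print(tx_gulf)
```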

In [None]:
#Read in the EIA condensate file (A1) and then format for later use

## 1) Read the condensate file
state_condensates = pd.read_excel(EIA_Condensate_inputfile, skiprows=2, sheet_name = 'Data 1')
state_condensates['Date'] = state_condensates['Date'].astype(str)
state_condensates['Date'] = [state_condensates['Date'][i][0:4] for i in np.arange(len(state_condensates))]   #extract the year
state_condensates = state_condensates.T
state_condensates.columns = state_condensates.iloc[0]
state_condensates = state_condensates.drop(state_condensates.index[[0]])
Names_cond = state_condensates.index.values.tolist()
state_condensates.reset_index(drop=True,inplace=True)
state_condensates['State'] = Names_cond
state_condensates['NEMS'] = 0
state_condensates = state_condensates.fillna(0)
#print(state_condensates)

# 2) drop extra regions that are not state-specific
idx1 = state_condensates.index[state_condensates['State'].str.contains('Calif--')].tolist()
idx2 = state_condensates.index[state_condensates['State'].str.contains('California--State')].tolist()
idx3 = state_condensates.index[state_condensates['State'].str.contains('Louisiana--')].tolist()
idx4 = state_condensates.index[state_condensates['State'].str.contains('New Mexico ')].tolist()
idx5 = state_condensates.index[state_condensates['State'].str.contains('Texas ')].tolist()
idx6 = state_condensates.index[state_condensates['State'].str.contains('Federal Offshore')].tolist()
idx7 = state_condensates.index[state_condensates['State'].str.contains('Lower 48')].tolist()
idx8 = state_condensates.index[state_condensates['State'].str.contains('Utah and Wyoming')].tolist()
idx = idx1+idx2+idx3+idx4+idx5+idx6+idx7+idx8
state_condensates= state_condensates.drop(state_condensates.index[idx])
state_condensates.reset_index(drop=True,inplace=True)


# 3) add State names and NEMS regions to each entry
for istate in np.arange(0,len(State_ANSI)):          
        if state_condensates['State'].str.contains(State_ANSI['name'][istate]).any() and \
            State_ANSI['name'][istate] != 'New Mexico' and State_ANSI['name'][istate] != 'Texas':
            match_state = np.where(state_condensates['State'].str.contains(State_ANSI['name'][istate]))[0][:]
            nems_match = np.where(NEMS_State['State']==State_ANSI['name'][istate])[0][0]
            state_condensates.loc[match_state,'State'] = State_ANSI['name'][istate]
            state_condensates.loc[match_state,'NEMS'] = NEMS_State['NEMS'][nems_match] 
        elif State_ANSI['name'][istate] == 'New Mexico':
            # Require both the state name and the sub-region in the series label
            #SouthWest
            match_state = np.where(state_condensates['State'].str.contains(State_ANSI['name'][istate] + '.*--East'))[0][:]
            state_condensates.loc[match_state,'State'] = State_ANSI['name'][istate]
            state_condensates.loc[match_state,'NEMS'] = 3
            #Rocky Mountain
            match_state = np.where(state_condensates['State'].str.contains(State_ANSI['name'][istate] + '.*--West'))[0][:]
            state_condensates.loc[match_state,'State'] = State_ANSI['name'][istate]
            state_condensates.loc[match_state,'NEMS'] = 2
        elif State_ANSI['name'][istate] == 'Texas':
            #0 = NE, 1 = MC, 2 = RM, 3 = SW, 4 = WC, 5 = GC
            # Each pattern requires the state name followed by one of the listed districts
            #Gulf Coast (the trailing space in 'District 1 ' excludes District 10)
            match_str = ['District 1 ','District 2','District 3','District 4','District 5','District 6']
            pattern = State_ANSI['name'][istate] + '.*(?:' + '|'.join(match_str) + ')'
            match_state = np.where(state_condensates['State'].str.contains(pattern))[0][:]
            state_condensates.loc[match_state,'State'] = State_ANSI['name'][istate]
            state_condensates.loc[match_state,'NEMS'] = 5
            #SouthWest
            match_str = ['District 7','District 8','District 9']
            pattern = State_ANSI['name'][istate] + '.*(?:' + '|'.join(match_str) + ')'
            match_state = np.where(state_condensates['State'].str.contains(pattern))[0][:]
            state_condensates.loc[match_state,'State'] = State_ANSI['name'][istate]
            state_condensates.loc[match_state,'NEMS'] = 3
            #Mid-Continent
            match_str = ['District 10']
            pattern = State_ANSI['name'][istate] + '.*(?:' + '|'.join(match_str) + ')'
            match_state = np.where(state_condensates['State'].str.contains(pattern))[0][:]
            state_condensates.loc[match_state,'State'] = State_ANSI['name'][istate]
            state_condensates.loc[match_state,'NEMS'] = 1
            #Other Offshore - delete
            idx = state_condensates.index[state_condensates['State'].str.contains(State_ANSI['name'][istate] + '.*Offshore')].tolist()
            state_condensates= state_condensates.drop(state_condensates.index[idx])
            state_condensates.reset_index(drop=True,inplace=True)


#3.5) Calculate Misc. barrels
Sum_LeaseC = np.sum(state_condensates.iloc[1:,:-2]) #sum all years of data (not including regions)
Misc_LeaseC = state_condensates.iloc[0,:-2] - Sum_LeaseC[:]
#Drop extra rows
idx1 = state_condensates.index[state_condensates['State'].str.contains('Gulf')].tolist()
idx1 += state_condensates.index[state_condensates['State'].str.contains('U.S')].tolist()
state_condensates= state_condensates.drop(state_condensates.index[idx1])

NE_factor = 0.57
MC_factor = 0.14
RM_factor = 0.21
WC_factor = 0.07

state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*RM_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'Arizona'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '2'
state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*NE_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'Illinois'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '0'
state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*NE_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'Indiana'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '0'
state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*NE_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'Maryland'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '0'
state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*MC_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'Missouri'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '1'
state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*MC_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'Nebraska'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '1'
state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*RM_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'Nevada'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '2'
state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*NE_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'New York'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '0'
state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*NE_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'Ohio'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '0'
state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*WC_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'Oregon'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '4'
state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*NE_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'Pennsylvania'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '0'
state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*RM_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'South Dakota'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '2'
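# NOTE: 'Arizona' is also added at the top of this list, so its miscellaneous share enters state_condensates twice
state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*RM_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'Arizona'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '2'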
state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*NE_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'Tennessee'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '0'
state_condensates.loc[state_condensates.shape[0]+1] = Misc_LeaseC*NE_factor
state_condensates.loc[state_condensates.shape[0],'State'] = 'Virginia'
state_condensates.loc[state_condensates.shape[0],'NEMS'] = '0'
state_condensates.reset_index(drop=True,inplace=True)

#print(state_condensates)

# 4) Reformat into time series of condensate values as a function of state and NEMS region 
start_year_idx = state_condensates.columns.get_loc(str(start_year))
end_year_idx = state_condensates.columns.get_loc(str(end_year))+1

#Create condensates array
state_cond_prod = np.zeros([num_regions, len(State_ANSI), num_years])

for irow in np.arange(0,len(state_condensates)):
    istate = np.where(State_ANSI['name'] == state_condensates['State'][irow])[0][0]
    inems = int(state_condensates['NEMS'][irow])

    # Accumulate rather than overwrite: several rows (e.g. Texas districts) can map to the same state and NEMS region
    state_cond_prod[inems,istate,:] = state_cond_prod[inems,istate,:] + state_condensates.iloc[irow,start_year_idx:end_year_idx].values

print('QA/QC: Check that all data is accounted for in the reformatted array')
diff = np.sum(state_condensates.iloc[:,start_year_idx:end_year_idx].values)-np.sum(state_cond_prod)
if diff ==0:
    print('PASS')
else:
    print('FAIL: Check condensate array')

### 2.2. Read-In GOADS Emissions & Make Map Array

#### 2.2.0. Initialize Arrays
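The GOADS maps initialized here are filled by converting each platform's latitude and longitude to a 0.1° grid index and adding its emissions to that cell. The sketch below illustrates that placement in vectorized form, with an explicit bounds check; the function `add_points_to_grid` and the sample coordinates are hypothetical. Note that `np.floor` is used so that points just outside the lower domain edge are rejected rather than truncated into row or column 0.

```python
import numpy as np

LAT_LOW, LAT_UP, LON_LEFT, LON_RIGHT, RES = 20, 55, -130, -60, 0.1
n_lat = round((LAT_UP - LAT_LOW) / RES)      # 350 rows
n_lon = round((LON_RIGHT - LON_LEFT) / RES)  # 700 columns

def add_points_to_grid(grid, lats, lons, values):
    """Add point values to the grid cells that contain them; return the number skipped."""
    lats, lons, values = (np.asarray(a, dtype=float) for a in (lats, lons, values))
    ilat = np.floor((lats - LAT_LOW) / RES).astype(int)
    ilon = np.floor((lons - LON_LEFT) / RES).astype(int)
    inside = (ilat >= 0) & (ilat < grid.shape[0]) & (ilon >= 0) & (ilon < grid.shape[1])
    # np.add.at accumulates correctly when several points fall in the same cell
    np.add.at(grid, (ilat[inside], ilon[inside]), values[inside])
    return int((~inside).sum())

grid = np.zeros((n_lat, n_lon))
# Two made-up points in the same cell plus one south of the domain
skipped = add_points_to_grid(grid, [28.05, 28.05, 10.0], [-90.55, -90.55, -90.0], [1.0, 2.0, 5.0])
print(grid[80, 394], skipped)  # 3.0 1
```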

In [None]:
#initialize GOADS maps array (will be assigned to proxy map variable later)
Map_GOADSmajor_emissions = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Map_GOADSminor_emissions = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])

#### 2.2.1. 2011 Data
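GOADS inventories exist only for selected years, and this step stores the 2011 inventory in the slot of the closest emission year (2012); the remaining years are to be interpolated. The interpolation code itself is not part of this cell, so the following is only a sketch that assumes linear interpolation between the bracketing inventory years and holds the nearest inventory constant outside that range. The helper `interp_weights` is hypothetical.

```python
def interp_weights(year, anchors):
    """Weights on the anchor years for a linear interpolation to `year` (assumed scheme).

    Years outside the anchor range take the nearest anchor with weight 1.
    """
    anchors = sorted(anchors)
    if year <= anchors[0]:
        return {anchors[0]: 1.0}
    if year >= anchors[-1]:
        return {anchors[-1]: 1.0}
    for lo, hi in zip(anchors, anchors[1:]):
        if lo <= year < hi:
            w_hi = (year - lo) / (hi - lo)
            return {lo: 1.0 - w_hi, hi: w_hi}

# Emission-year slots that hold inventory data: 2012 (from the 2011 inventory), 2014, 2017
anchor_years = [2012, 2014, 2017]
for yr in range(2012, 2019):
    print(yr, interp_weights(yr, anchor_years))
```

Under these assumptions, 2013 is the mean of the 2012 and 2014 maps and 2018 reuses the 2017 map unchanged.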

In [None]:
# Read in the 2011 GOADS data (applied to 2012, the closest emission year). The 2014 and 2017 data are read in separately; missing years are interpolated.
# goal: populating Map_FedGOM_Offshore to allocate federal offshore GOM emissions (state GOM allocated with Enverus)

#Only run if need to save new file (takes a few hours to run)
if ReCalc_GOADS ==1:
    ## 2011
    # Read In and Format 2011 BOEM Data
    driver_str = r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ='+GOADS_11_inputfile+';'
    conn = pyodbc.connect(driver_str)
    GOADS_locations = pd.read_sql("SELECT * FROM tblPointER", conn)
    GOADS_emissions = pd.read_sql("SELECT * FROM tblPointEM", conn)
    conn.close()

    # Format Location Data
    GOADS_locations = GOADS_locations[["strStateFacilityIdentifier","strEmissionReleasePointID","dblXCoordinate","dblYCoordinate"]]
    #Create platform-by-platform file
    GOADS_locations_Unique = pd.DataFrame({'strStateFacilityIdentifier':GOADS_locations['strStateFacilityIdentifier'].unique()})
    GOADS_locations_Unique['lon'] = 0.0
    GOADS_locations_Unique['lat'] = 0.0
    GOADS_locations_Unique['strEmissionReleasePointID'] = ''

    for iplatform in np.arange(len(GOADS_locations_Unique)):
        match_platform = np.where(GOADS_locations['strStateFacilityIdentifier'] == GOADS_locations_Unique['strStateFacilityIdentifier'][iplatform])[0][0]
        GOADS_locations_Unique.loc[iplatform,'lon'] = GOADS_locations['dblXCoordinate'][match_platform]
        GOADS_locations_Unique.loc[iplatform,'lat'] = GOADS_locations['dblYCoordinate'][match_platform]
        GOADS_locations_Unique.loc[iplatform,'strEmissionReleasePointID'] = GOADS_locations['strEmissionReleasePointID'][match_platform][:3]

    GOADS_locations_Unique.reset_index(inplace=True, drop=True)
    #display(GOADS_locations_Unique)

    #print(GOADS_emissions.columns)
    #Format Emissions Data (clean lease data string)
    GOADS_emissions = GOADS_emissions[["strStateFacilityIdentifier","strPollutantCode","dblEmissionNumericValue","BOEM-MONTH",
                                  "BOEM-LEASE_NUM","BOEM-COMPLEX_ID"]]
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('OCS','')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('-','')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace(' ','')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G1477','G01477')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G73','00073')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G605','00605')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G72','00072')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G599','00599')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G7155','G07155')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G2357','G02357')
    GOADS_emissions['BOEM-LEASE_NUM'] = GOADS_emissions['BOEM-LEASE_NUM'].str.replace('G4921','G04921')
    GOADS_emissions['Emis_tg'] = 9.0718474E-7 * GOADS_emissions['dblEmissionNumericValue'] #convert short tons to Tg (1 short ton = 907.18474 kg)
    GOADS_emissions = GOADS_emissions[GOADS_emissions['strPollutantCode'] == 'CH4']
    GOADS_emissions.reset_index(inplace=True, drop=True)

    #display(GOADS_emissions)

    # Use ERG Preprocessed data to determine if major or minor and oil or gas
    ERG_complex_crosswalk = pd.read_excel(ERG_GOADSEmissions_inputfile, sheet_name = "Complex Emissions by Source", usecols = "AJ:AM", nrows = 11143)

    # add data to map array, for the closest year to 2011
    year_diff = [abs(x - 2011) for x in year_range]
    iyear = year_diff.index(min(year_diff))

    #assign oil vs gas by lease/complex ID
    GOADS_emissions['LEASE_TYPE'] =''
    GOADS_emissions['MAJOR_STRUC'] =''
    for istruc in np.arange(0,len(GOADS_emissions)):
        imatch = np.where(np.logical_and(ERG_complex_crosswalk['BOEM COMPLEX ID.2']==int(GOADS_emissions['BOEM-COMPLEX_ID'][istruc]),\
                            ERG_complex_crosswalk['Year.2'] == 2011))
        if np.size(imatch) >0:
            imatch = imatch[0][0]
            GOADS_emissions.loc[istruc,'LEASE_TYPE'] = ERG_complex_crosswalk['Oil Gas Defn FINAL.1'][imatch]
            GOADS_emissions.loc[istruc,'MAJOR_STRUC'] = ERG_complex_crosswalk['Major / Minor.1'][imatch]
        else:
            print(istruc, GOADS_emissions['BOEM-COMPLEX_ID'][istruc])

        # for all gas platforms, match the platform to the emissions
        if GOADS_emissions['LEASE_TYPE'][istruc] =='Gas':
            match_platform = np.where(GOADS_locations_Unique.strStateFacilityIdentifier==GOADS_emissions['strStateFacilityIdentifier'][istruc])[0][0]
            ilat = int((GOADS_locations_Unique['lat'][match_platform] - Lat_low)/Res01)
            ilon = int((GOADS_locations_Unique['lon'][match_platform] - Lon_left)/Res01)
            imonth = GOADS_emissions['BOEM-MONTH'][istruc]-1 #BOEM-MONTH is 1-12; the array month index is 0-11
            if GOADS_emissions['MAJOR_STRUC'][istruc] =='Major':
                Map_GOADSmajor_emissions[ilat,ilon,iyear,imonth] += GOADS_emissions['Emis_tg'][istruc]
            else:
                Map_GOADSminor_emissions[ilat,ilon,iyear,imonth] += GOADS_emissions['Emis_tg'][istruc]
            
            
    # sum complexes and emissions for diagnostic
    majcplx = GOADS_emissions[(GOADS_emissions['MAJOR_STRUC']=='Major')]
    majcplx = majcplx[majcplx['LEASE_TYPE'] =='Gas']
    num_majcplx = majcplx['BOEM-COMPLEX_ID'].unique()
    #print(np.shape(num_majcplx))
    mincplx = GOADS_emissions[GOADS_emissions['MAJOR_STRUC']=='Minor']
    mincplx = mincplx[mincplx['LEASE_TYPE'] =='Gas']
    num_mincplx = mincplx['BOEM-COMPLEX_ID'].unique()
    #print(np.size(num_mincplx))            
    del GOADS_emissions
    print('Number of Major Gas Complexes: ',(np.size(num_majcplx)))
    print('Emissions (Tg): ',np.sum(Map_GOADSmajor_emissions[:,:,iyear,:]))
    print('Number of Minor Gas Complexes: ',(np.size(num_mincplx)))
    print('Emissions (Tg): ',np.sum(Map_GOADSminor_emissions[:,:,iyear,:]))
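
The `ilat`/`ilon` lookups in the cell above place each platform on the 0.1° grid by offsetting its coordinates from the grid's lower-left corner and dividing by the resolution. The sketch below shows that mapping with an explicit bounds check. The grid limits and the `grid_index` helper are hypothetical stand-ins (the notebook defines `Lat_low`, `Lon_left`, and `Res01` elsewhere), and `np.floor` replaces `int()` only so that a point just below the lower edge is not truncated into row or column 0.

```python
import numpy as np

# Hypothetical grid definition; the notebook sets its own Lat_low, Lon_left, Res01
LAT_LOW, LAT_UP = 20.0, 55.0
LON_LEFT, LON_RIGHT = -130.0, -60.0
RES = 0.1

def grid_index(lat, lon):
    """Return (ilat, ilon) of the grid cell containing the point, or None if outside the grid."""
    if not (LAT_LOW <= lat < LAT_UP and LON_LEFT <= lon < LON_RIGHT):
        return None
    ilat = int(np.floor((lat - LAT_LOW) / RES))
    ilon = int(np.floor((lon - LON_LEFT) / RES))
    return ilat, ilon

# Example: a platform in the Gulf of Mexico
cell = grid_index(28.25, -90.25)
```

Returning `None` for out-of-domain points lets the caller skip them instead of raising an `IndexError` or writing into the wrong cell.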

#### 2.2.2. 2014 Data

In [None]:
## 2014

if ReCalc_GOADS ==1:
    #Read In and Format 2014 BOEM Data
    driver_str = r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ='+GOADS_14_inputfile+';'
    conn = pyodbc.connect(driver_str)
    GOADS_emissions = pd.read_sql("SELECT * FROM 2014_Gulfwide_Platform_20161102", conn)
    conn.close()

    GOADS_emissions = GOADS_emissions[["PLATFORM_ID","X_COORDINATE","Y_COORDINATE","POLLUTANT_CODE","EMISSIONS_VALUE","MONTH",\
                                  "LEASE_NUMBER","COMPLEX_ID"]]
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('OCS','')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('-','')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace(' ','')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G1477','G01477')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G73','00073')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G605','00605')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G72','00072')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G599','00599')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G7155','G07155')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G2357','G02357')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G4921','G04921')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO2839','G02839')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO5761','G05761')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO0026','00026')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO3194','G03194')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G1034','G01034')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G0456','G00456')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G0060','G00060')
    GOADS_emissions['Emis_tg'] = 0.0
    GOADS_emissions['Emis_tg'] = 9.0718474E-7 * GOADS_emissions['EMISSIONS_VALUE'] #convert short tons to Tg
    GOADS_emissions = GOADS_emissions[GOADS_emissions['POLLUTANT_CODE'] == 'CH4']
    GOADS_emissions.reset_index(inplace=True, drop=True)

    #assign oil vs gas by lease/complex ID
    # add data to map array, for the closest year to 2014
    year_diff = [abs(x - 2014) for x in year_range]
    iyear = year_diff.index(min(year_diff))
    GOADS_emissions['LEASE_TYPE'] =''
    GOADS_emissions['MAJOR_STRUC'] =''
    for istruc in np.arange(0,len(GOADS_emissions)):
        imatch = np.where(np.logical_and(ERG_complex_crosswalk['BOEM COMPLEX ID.2']==int(GOADS_emissions['COMPLEX_ID'][istruc]),\
                            ERG_complex_crosswalk['Year.2'] == 2014))
        if np.size(imatch) >0:
            imatch = imatch[0][0]
            GOADS_emissions.loc[istruc,'LEASE_TYPE'] = ERG_complex_crosswalk['Oil Gas Defn FINAL.1'][imatch]
            GOADS_emissions.loc[istruc,'MAJOR_STRUC'] = ERG_complex_crosswalk['Major / Minor.1'][imatch]
        else:
            print(istruc, GOADS_emissions['COMPLEX_ID'][istruc])
    #display(GOADS_emissions)

    #for iplatform in np.arange(len(GOADS_emissions)):
        if GOADS_emissions['LEASE_TYPE'][istruc] =='Gas':
            #then for all gas platforms, match the platform to the emissions
            ilat = int((GOADS_emissions['Y_COORDINATE'][istruc] - Lat_low)/Res01)
            ilon = int((GOADS_emissions['X_COORDINATE'][istruc] - Lon_left)/Res01)
            imonth = month_dict[GOADS_emissions['MONTH'][istruc]]-1 #month_dict values are 1-12; array index is 0-11
            if GOADS_emissions['MAJOR_STRUC'][istruc] =='Major':
                Map_GOADSmajor_emissions[ilat,ilon,iyear,imonth] += GOADS_emissions['Emis_tg'][istruc]
            else:
                Map_GOADSminor_emissions[ilat,ilon,iyear,imonth] += GOADS_emissions['Emis_tg'][istruc]

    # sum complexes and emissions for diagnostic
    majcplx = GOADS_emissions[(GOADS_emissions['MAJOR_STRUC']=='Major')]
    majcplx = majcplx[majcplx['LEASE_TYPE'] =='Gas']
    num_majcplx = majcplx['COMPLEX_ID'].unique()
    #print(np.shape(num_majcplx))
    mincplx = GOADS_emissions[GOADS_emissions['MAJOR_STRUC']=='Minor']
    mincplx = mincplx[mincplx['LEASE_TYPE'] =='Gas']
    num_mincplx = mincplx['COMPLEX_ID'].unique()
    #print(np.size(num_mincplx))            
    del GOADS_emissions
    print('Number of Major Gas Complexes: ',(np.size(num_majcplx)))
    print('Emissions (Tg): ',np.sum(Map_GOADSmajor_emissions[:,:,iyear,:]))
    print('Number of Minor Gas Complexes: ',(np.size(num_mincplx)))
    print('Emissions (Tg): ',np.sum(Map_GOADSminor_emissions[:,:,iyear,:]))
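
The lease-number clean-up in the cell above applies a fixed sequence of literal substring replacements. One way to keep that sequence easier to maintain is to hold it in an ordered list and loop over it, as sketched below. The list is abridged, and the `clean_lease_numbers` helper and demo values are hypothetical. `regex=False` is passed so that each pattern is treated as a literal string regardless of the pandas version.

```python
import pandas as pd

# Abridged, ordered list of (old, new) literal replacements; order matters
LEASE_FIXES = [
    ('OCS', ''),
    ('-', ''),
    (' ', ''),
    ('G1477', 'G01477'),
    ('GO2839', 'G02839'),
]

def clean_lease_numbers(series, fixes=LEASE_FIXES):
    """Apply each literal replacement in turn to a Series of lease numbers."""
    out = series.astype(str)
    for old, new in fixes:
        out = out.str.replace(old, new, regex=False)
    return out

demo = pd.Series(['OCS-G 1477', 'OCS-GO2839', 'G12345'])
cleaned = clean_lease_numbers(demo)
```

Because these are substring replacements, a short pattern can also match inside a longer lease number, so the order of the list (and a spot check of the result) still matters.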

#### 2.2.3. 2017 Data

In [None]:
## 2017
if ReCalc_GOADS ==1:
    #Read In and Format 2017 BOEM Data
    driver_str = r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ='+GOADS_17_inputfile+';'
    conn = pyodbc.connect(driver_str)
    GOADS_emissions = pd.read_sql("SELECT * FROM 2017_Gulfwide_Platform_20190705_CAP_GHG", conn)
    conn.close()

    GOADS_emissions = GOADS_emissions[["PLATFORM_ID","X_COORDINATE","Y_COORDINATE","POLLUTANT_CODE","EMISSIONS_VALUE","Month",\
                                   "LEASE_NUMBER","COMPLEX_ID"]]
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('OCS','')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('-','')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace(' ','')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G1477','G01477')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G73','00073')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G605','00605')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G72','00072')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G599','00599')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G7155','G07155')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G2357','G02357')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G4921','G04921')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO2839','G02839')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO2893','G02893')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO5761','G05761')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO0026','00026')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('GO3194','G03194')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G1034','G01034')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G0456','G00456')
    GOADS_emissions['LEASE_NUMBER'] = GOADS_emissions['LEASE_NUMBER'].str.replace('G0060','G00060')
    GOADS_emissions['Emis_tg'] = 0.0
    GOADS_emissions['Emis_tg'] = 9.0718474E-7 * GOADS_emissions['EMISSIONS_VALUE'] #convert short tons to Tg
    GOADS_emissions = GOADS_emissions[GOADS_emissions['POLLUTANT_CODE'] == 'CH4']
    GOADS_emissions.reset_index(inplace=True, drop=True)

    #assign oil vs gas by lease/complex ID
    # add data to map array, for the closest year to 2017
    year_diff = [abs(x - 2017) for x in year_range]
    iyear = year_diff.index(min(year_diff))
    GOADS_emissions['LEASE_TYPE'] =''
    GOADS_emissions['MAJOR_STRUC'] =''
    for istruc in np.arange(0,len(GOADS_emissions)):
        imatch = np.where(np.logical_and(ERG_complex_crosswalk['BOEM COMPLEX ID.2']==int(GOADS_emissions['COMPLEX_ID'][istruc]),\
                            ERG_complex_crosswalk['Year.2'] == 2017))
        if np.size(imatch) >0:
            imatch = imatch[0][0]
            GOADS_emissions.loc[istruc,'LEASE_TYPE'] = ERG_complex_crosswalk['Oil Gas Defn FINAL.1'][imatch]
            GOADS_emissions.loc[istruc,'MAJOR_STRUC'] = ERG_complex_crosswalk['Major / Minor.1'][imatch]
        else:
            print(istruc, GOADS_emissions["COMPLEX_ID"][istruc])
    #display(GOADS_emissions)

    #for iplatform in np.arange(len(GOADS_emissions)):
        if GOADS_emissions['LEASE_TYPE'][istruc] =='Gas':
            #then for all gas platforms, match the platform to the emissions
            ilat = int((GOADS_emissions['Y_COORDINATE'][istruc] - Lat_low)/Res01)
            ilon = int((GOADS_emissions['X_COORDINATE'][istruc] - Lon_left)/Res01)
            imonth = month_dict[GOADS_emissions['Month'][istruc]]-1 #dict is 1-12, not 0-11
            if GOADS_emissions['MAJOR_STRUC'][istruc] =='Major':
                Map_GOADSmajor_emissions[ilat,ilon,iyear,imonth] += GOADS_emissions['Emis_tg'][istruc]
            else:
                Map_GOADSminor_emissions[ilat,ilon,iyear,imonth] += GOADS_emissions['Emis_tg'][istruc]

    # sum complexes and emissions for diagnostic
    majcplx = GOADS_emissions[(GOADS_emissions['MAJOR_STRUC']=='Major')]
    majcplx = majcplx[majcplx['LEASE_TYPE'] =='Gas']
    num_majcplx = majcplx["COMPLEX_ID"].unique()
    #print(np.shape(num_majcplx))
    mincplx = GOADS_emissions[GOADS_emissions['MAJOR_STRUC']=='Minor']
    mincplx = mincplx[mincplx['LEASE_TYPE'] =='Gas']
    num_mincplx = mincplx["COMPLEX_ID"].unique()
    #print(np.size(num_mincplx))            
    print('Number of Major Gas Complexes: ',(np.size(num_majcplx)))
    print('Emissions (Tg): ',np.sum(Map_GOADSmajor_emissions[:,:,iyear,:]))
    print('Number of Minor Gas Complexes: ',(np.size(num_mincplx)))
    print('Emissions (Tg): ',np.sum(Map_GOADSminor_emissions[:,:,iyear,:]))
    #clean (remove unused arrays)
    del GOADS_emissions, majcplx, mincplx
    del ERG_complex_crosswalk, GOADS_locations, GOADS_locations_Unique
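
The three GOADS cells tag each emission record with a lease type and a major/minor flag by searching the ERG crosswalk row by row. The same lookup can be expressed as a single left join, sketched below on made-up data. The column names (`complex_id`, `year`, `lease_type`, `major_minor`) and the `attach_crosswalk` helper are hypothetical simplifications of the crosswalk columns used above. Unmatched records come back as `NaN` here, not as empty strings.

```python
import pandas as pd

# Made-up stand-ins for the GOADS emission records and the ERG crosswalk
emissions = pd.DataFrame({'COMPLEX_ID': ['101', '102', '103'],
                          'Emis_tg': [1.0e-6, 2.0e-6, 3.0e-6]})
crosswalk = pd.DataFrame({'complex_id': [101, 102, 101],
                          'year': [2017, 2017, 2014],
                          'lease_type': ['Gas', 'Oil', 'Oil'],
                          'major_minor': ['Major', 'Minor', 'Minor']})

def attach_crosswalk(emis, xwalk, year):
    """Left-join the crosswalk rows for one inventory year onto the emission records."""
    # Keep the first crosswalk row per complex, mirroring the imatch[0][0] choice in the loop
    xw = xwalk[xwalk['year'] == year].drop_duplicates('complex_id')
    out = emis.assign(complex_id=emis['COMPLEX_ID'].astype(int))
    return out.merge(xw[['complex_id', 'lease_type', 'major_minor']],
                     on='complex_id', how='left')

tagged = attach_crosswalk(emissions, crosswalk, 2017)
unmatched = tagged[tagged['lease_type'].isna()]  # records to report, like the print() in the loop
```

A join of this kind avoids the per-row `np.where` search, which can matter when the emission table has many thousands of rows.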

#### 2.2.4. Interpolate Data & Save Data

In [None]:
#interpolate and save resulting proxy maps, unless loading from previous calculation

if ReCalc_GOADS ==1:
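    # Nearest-inventory-year fill (year index 0 = 2012, ..., 6 = 2018):
    # 2011 data are applied to 2012 (index 0, filled in Step 2.2.1)
    # 2014 data (index 2) are applied to 2013-2015
    # 2017 data (index 5) are applied to 2016 onward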
    Map_GOADSmajor_emissions[:,:,1,:] = Map_GOADSmajor_emissions[:,:,2,:]
    Map_GOADSmajor_emissions[:,:,2,:] = Map_GOADSmajor_emissions[:,:,2,:]
    Map_GOADSmajor_emissions[:,:,3,:] = Map_GOADSmajor_emissions[:,:,2,:]
    Map_GOADSmajor_emissions[:,:,4,:] = Map_GOADSmajor_emissions[:,:,5,:]
    Map_GOADSmajor_emissions[:,:,6,:] = Map_GOADSmajor_emissions[:,:,5,:]
    
    Map_GOADSminor_emissions[:,:,1,:] = Map_GOADSminor_emissions[:,:,2,:]
    Map_GOADSminor_emissions[:,:,2,:] = Map_GOADSminor_emissions[:,:,2,:]
    Map_GOADSminor_emissions[:,:,3,:] = Map_GOADSminor_emissions[:,:,2,:]
    Map_GOADSminor_emissions[:,:,4,:] = Map_GOADSminor_emissions[:,:,5,:]
    Map_GOADSminor_emissions[:,:,6,:] = Map_GOADSminor_emissions[:,:,5,:]
    
    np.save('./IntermediateOutputs/GOADSmajor_gas_tempoutput', Map_GOADSmajor_emissions)
    np.save('./IntermediateOutputs/GOADSminor_gas_tempoutput', Map_GOADSminor_emissions)
else:
    Map_GOADSmajor_emissions = np.load('./IntermediateOutputs/GOADSmajor_gas_tempoutput.npy')
    Map_GOADSminor_emissions = np.load('./IntermediateOutputs/GOADSminor_gas_tempoutput.npy')
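
The fill above copies each GOADS inventory (2011, 2014, 2017) into the neighbouring years using hard-coded indices. A sketch of the same nearest-inventory-year rule without hard-coded indices follows. The small demo array, the `slot` dictionary, and the `nearest_inventory_year` helper are hypothetical, and the 2012-2018 year range is taken from the notebook description.

```python
import numpy as np

year_range = list(range(2012, 2019))   # 2012-2018, as stated in the notebook header
inventory_years = [2011, 2014, 2017]   # GOADS inventory years

def nearest_inventory_year(year, inventories=inventory_years):
    """Return the inventory year closest to the given year."""
    return min(inventories, key=lambda y: abs(y - year))

# Index of the year slot where each inventory was stored (closest year in year_range)
slot = {inv: int(np.argmin([abs(y - inv) for y in year_range])) for inv in inventory_years}

# Demo array with dimensions (lat, lon, year, month); each inventory gets a marker value
emis = np.zeros((2, 2, len(year_range), 12))
for inv, value in zip(inventory_years, (1.0, 2.0, 3.0)):
    emis[:, :, slot[inv], :] = value

# Fill every remaining year from the slot of its nearest inventory year
for iyear, year in enumerate(year_range):
    src = slot[nearest_inventory_year(year)]
    if src != iyear:
        emis[:, :, iyear, :] = emis[:, :, src, :]
```

With this year range there are no ties, so the result matches the assignments in the cell above: 2012 keeps the 2011 data, 2013-2015 use 2014, and 2016-2018 use 2017.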

### 2.3. Well and Production Data (from Enverus)

#### 2.3.1. Read In & Combine Each Year of Prism & DI Monthly Data (from Enverus)

In [None]:
# Data come from two Enverus products: Drilling Info (DI) and Prism
# Two datasets are used because Prism does not include all states,
# so the remaining states, and those with better DI coverage, are taken from DI

#Only re-do this section if data need to be re-calculated (ReCalc_Enverus ==1)

In [None]:
#Read In and Format the Prism and DI data 
# 1. Read Data
# 2. Drop unused columns, rename columns to match between DI and Prism
# 3. Combine DI and Prism into one data array
# 4. Calculate annual cumulative production totals
# 5. Save the data as a year-specific variable

#Based on ERG's logic, active wells are determined based on their production levels and not producing status


for iyear in np.arange(0,num_years):
    
    #DI data
    DI_data = pd.read_csv(vars()['Enverus_DI_inputdata_' +year_range_str[iyear]])
    DI_data = DI_data.drop(columns=['ENTITY_ID','API_UWI','OPERATOR_COMPANY_NAME','AAPG_FULL_ERG',\
                           'FIELD','RESERVOIR','LAST_PROD_DATE','DRILL_TYPE','CUM_GAS','CUM_OIL','CUM_WATER'])
    DI_data.rename({'WELL_COUNT_ID':'WELL_COUNT','DI_BASIN':'BASIN','NEMS_REGION_ERG':'NEMS_REGION',\
                    'SURFACE_LATITUDE_WGS84':'LATITUDE','SURFACE_LONGITUDE_WGS84':'LONGITUDE','MONTHLY_WATER_01':'WATERPROD_01',\
                   'MONTHLY_WATER_02':'WATERPROD_02','MONTHLY_WATER_03':'WATERPROD_03','MONTHLY_WATER_04':'WATERPROD_04',\
                   'MONTHLY_WATER_05':'WATERPROD_05','MONTHLY_WATER_06':'WATERPROD_06','MONTHLY_WATER_07':'WATERPROD_07',\
                   'MONTHLY_WATER_08':'WATERPROD_08','MONTHLY_WATER_09':'WATERPROD_09','MONTHLY_WATER_10':'WATERPROD_10',\
                   'MONTHLY_WATER_11':'WATERPROD_11','MONTHLY_WATER_12':'WATERPROD_12','MONTHLY_OIL_01':'OILPROD_01',\
                   'MONTHLY_OIL_02':'OILPROD_02','MONTHLY_OIL_03':'OILPROD_03','MONTHLY_OIL_04':'OILPROD_04',\
                   'MONTHLY_OIL_05':'OILPROD_05','MONTHLY_OIL_06':'OILPROD_06','MONTHLY_OIL_07':'OILPROD_07',\
                   'MONTHLY_OIL_08':'OILPROD_08','MONTHLY_OIL_09':'OILPROD_09','MONTHLY_OIL_10':'OILPROD_10',\
                   'MONTHLY_OIL_11':'OILPROD_11','MONTHLY_OIL_12':'OILPROD_12','MONTHLY_GAS_01':'GASPROD_01',\
                   'MONTHLY_GAS_02':'GASPROD_02','MONTHLY_GAS_03':'GASPROD_03','MONTHLY_GAS_04':'GASPROD_04',\
                   'MONTHLY_GAS_05':'GASPROD_05','MONTHLY_GAS_06':'GASPROD_06','MONTHLY_GAS_07':'GASPROD_07',\
                   'MONTHLY_GAS_08':'GASPROD_08','MONTHLY_GAS_09':'GASPROD_09','MONTHLY_GAS_10':'GASPROD_10',\
                   'MONTHLY_GAS_11':'GASPROD_11','MONTHLY_GAS_12':'GASPROD_12'},axis=1, inplace=True)
    DI_data['WELL_COUNT'] = 1
    
    

    #Prism Data
    Prism_data = pd.read_csv(vars()['Enverus_Prism_inputdata_'+year_range_str[iyear]])
    Prism_data = Prism_data.drop(columns=['WELLID','API_UWI','RSOPERATOR','TRAJECTORY','FIELD','RSREGION','FORMATION',\
                                         'TOTALFLUIDPUMPED_BBL','SPUDDATE'])
    Prism_data.rename({'RSBASIN':'BASIN','COMPLETIONDATE':'COMPLETION_DATE','SPUDDATE':'SPUD_DATE','FIRSTPRODDATE':'FIRST_PROD_DATE',\
                     'OILGRAVITY_API':'OIL_GRAVITY','WATERPROD_BBL_01':'WATERPROD_01',\
                    'WATERPROD_BBL_02':'WATERPROD_02','WATERPROD_BBL_03':'WATERPROD_03','WATERPROD_BBL_04':'WATERPROD_04',\
                   'WATERPROD_BBL_05':'WATERPROD_05','WATERPROD_BBL_06':'WATERPROD_06','WATERPROD_BBL_07':'WATERPROD_07',\
                   'WATERPROD_BBL_08':'WATERPROD_08','WATERPROD_BBL_09':'WATERPROD_09','WATERPROD_BBL_10':'WATERPROD_10',\
                   'WATERPROD_BBL_11':'WATERPROD_11','WATERPROD_BBL_12':'WATERPROD_12','LIQUIDSPROD_BBL_01':'OILPROD_01',\
                   'LIQUIDSPROD_BBL_02':'OILPROD_02','LIQUIDSPROD_BBL_03':'OILPROD_03','LIQUIDSPROD_BBL_04':'OILPROD_04',\
                   'LIQUIDSPROD_BBL_05':'OILPROD_05','LIQUIDSPROD_BBL_06':'OILPROD_06','LIQUIDSPROD_BBL_07':'OILPROD_07',\
                   'LIQUIDSPROD_BBL_08':'OILPROD_08','LIQUIDSPROD_BBL_09':'OILPROD_09','LIQUIDSPROD_BBL_10':'OILPROD_10',\
                   'LIQUIDSPROD_BBL_11':'OILPROD_11','LIQUIDSPROD_BBL_12':'OILPROD_12','GASPROD_MCF_01':'GASPROD_01',\
                   'GASPROD_MCF_02':'GASPROD_02','GASPROD_MCF_03':'GASPROD_03','GASPROD_MCF_04':'GASPROD_04',\
                   'GASPROD_MCF_05':'GASPROD_05','GASPROD_MCF_06':'GASPROD_06','GASPROD_MCF_07':'GASPROD_07',\
                   'GASPROD_MCF_08':'GASPROD_08','GASPROD_MCF_09':'GASPROD_09','GASPROD_MCF_10':'GASPROD_10',\
                   'GASPROD_MCF_11':'GASPROD_11','GASPROD_MCF_12':'GASPROD_12','RSWELLSTATUS':'PRODUCING_STATUS'},axis=1,inplace=True)
    #
    #Prism_data = Prism_data[Prism_data['PRODUCING_STATUS'] == 'PRODUCING']
    Prism_data['WELL_COUNT'] = 1
    #print(Prism_data)
    
    #combine into one array with common column names, replace nans with zeros, and sum annual production
    Enverus_data = pd.concat([DI_data,Prism_data], ignore_index=True)
    Enverus_data.loc[:,Enverus_data.columns.str.contains('GASPROD_')] = Enverus_data.loc[:,Enverus_data.columns.str.contains('GASPROD_')].fillna(0)
    Enverus_data.loc[:,Enverus_data.columns.str.contains('OILPROD_')] = Enverus_data.loc[:,Enverus_data.columns.str.contains('OILPROD_')].fillna(0)
    Enverus_data.loc[:,Enverus_data.columns.str.contains('WATERPROD_')] = Enverus_data.loc[:,Enverus_data.columns.str.contains('WATERPROD_')].fillna(0)

    #Calculate cumulative annual production totals for Gas, Oil, Water
    Enverus_data['CUM_GAS'] = Enverus_data.loc[:,Enverus_data.columns.str.contains('GASPROD_')].sum(1)
    Enverus_data['CUM_OIL'] = Enverus_data.loc[:,Enverus_data.columns.str.contains('OILPROD_')].sum(1)
    Enverus_data['CUM_WATER'] = Enverus_data.loc[:,Enverus_data.columns.str.contains('WATERPROD_')].sum(1)
    
    Enverus_data['NEMS_CODE'] = Enverus_data['NEMS_REGION'].map(NEMS_dict)
    
    #save out the data for that year
    vars()['Enverus_data_'+year_range_str[iyear]] = Enverus_data.copy()
    print('Load Complete: Year '+year_range_str[iyear])
    
    del DI_data #save memory space 
    
    #define default values for a new row in this table (to be used later during data corrections)
    default = {'WELL_COUNT': 0, 'STATE':'','COUNTY':'','BASIN':'','AAPG_CODE_ERG':'UNK','NEMS_REGION':'UNK','NEMS_CODE':99,\
               'LATITUDE':0,'LONGITUDE':0,'PRODUCING_STATUS':'','RESERVOIR_TYPE':'','COMPLETION_DATE':'','SPUD_DATE':'',\
               'FIRST_PROD_DATE':'','HF':'', 'OFFSHORE':'','OIL_GRAVITY':'','GOR':-99,'GOR_QUAL':'','PROD_FLAG':'',\
               'OILPROD_01':0, 'GASPROD_01':0, 'WATERPROD_01':0,'OILPROD_02':0, 'GASPROD_02':0, 'WATERPROD_02':0,\
          'OILPROD_03':0, 'GASPROD_03':0, 'WATERPROD_03':0,'OILPROD_04':0, 'GASPROD_04':0, 'WATERPROD_04':0,\
          'OILPROD_05':0, 'GASPROD_05':0, 'WATERPROD_05':0,'OILPROD_06':0, 'GASPROD_06':0, 'WATERPROD_06':0,\
          'OILPROD_07':0, 'GASPROD_07':0, 'WATERPROD_07':0,'OILPROD_08':0, 'GASPROD_08':0, 'WATERPROD_08':0,\
          'OILPROD_09':0, 'GASPROD_09':0, 'WATERPROD_09':0,'OILPROD_10':0, 'GASPROD_10':0, 'WATERPROD_10':0,\
          'OILPROD_11':0, 'GASPROD_11':0, 'WATERPROD_11':0,'OILPROD_12':0, 'GASPROD_12':0, 'WATERPROD_12':0}
    
display(Enverus_data)
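
The long rename dictionaries in the cell above follow a regular month-by-fluid pattern, so they can also be generated instead of typed out. The sketch below builds the DI-style mapping with a comprehension and then computes an annual gas total on a made-up two-well table. The demo data and the names `di_rename` and `gas_cols` are hypothetical.

```python
import pandas as pd

MONTHS = [f'{m:02d}' for m in range(1, 13)]

# MONTHLY_GAS_01 -> GASPROD_01, MONTHLY_OIL_01 -> OILPROD_01, MONTHLY_WATER_01 -> WATERPROD_01, ...
di_rename = {f'MONTHLY_{fluid}_{mm}': f'{fluid}PROD_{mm}'
             for fluid in ('GAS', 'OIL', 'WATER') for mm in MONTHS}

# Made-up table with two wells and a few monthly columns
demo = pd.DataFrame({'MONTHLY_GAS_01': [10.0, None],
                     'MONTHLY_GAS_02': [5.0, 2.0],
                     'MONTHLY_OIL_01': [1.0, None]})
demo = demo.rename(columns=di_rename)  # keys without a matching column are ignored

# Replace missing monthly values with zero, then sum to an annual total
gas_cols = [c for c in demo.columns if c.startswith('GASPROD_')]
demo[gas_cols] = demo[gas_cols].fillna(0)
demo['CUM_GAS'] = demo[gas_cols].sum(axis=1)
```

Selecting columns by the `GASPROD_` prefix mirrors the `str.contains('GASPROD_')` filter used in the cell, so the annual totals are computed the same way.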

In [None]:
# Correct the NEMS Code for missing NEMS_REGIONS
# Note OFFSHORE regions will have NaN as NEMS_Code

for iyear in np.arange(0,num_years):
    enverus_data_temp = vars()['Enverus_data_'+year_range_str[iyear]].copy()
    list_well = enverus_data_temp.index[pd.isna(enverus_data_temp.loc[:,'NEMS_REGION'])].tolist()
    if np.size(list_well) > 0:
        for irow in list_well: 
            match_state = np.where(NEMS_State['Ansi']==enverus_data_temp['STATE'][irow])[0][0]
            enverus_data_temp.loc[irow,'NEMS_CODE'] = NEMS_State['NEMS'][match_state].astype(int)
    vars()['Enverus_data_'+year_range_str[iyear]] = enverus_data_temp.copy()
#print(NEMS_State)
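
The correction above fills `NEMS_CODE` from the state lookup table one row at a time. A vectorised sketch of the same idea is shown below. The `state_to_nems` codes and the demo table are made up for illustration and are not the values in `NEMS_State`.

```python
import pandas as pd

# Made-up wells table; rows 1 and 2 have no NEMS region
wells = pd.DataFrame({'STATE': ['TX', 'PA', 'TX'],
                      'NEMS_REGION': ['Gulf Coast', None, None],
                      'NEMS_CODE': [2.0, float('nan'), float('nan')]})

# Hypothetical state -> NEMS code lookup (illustrative values only)
state_to_nems = {'TX': 2, 'PA': 0}

# Fill the code only where the region is missing
missing = wells['NEMS_REGION'].isna()
wells.loc[missing, 'NEMS_CODE'] = wells.loc[missing, 'STATE'].map(state_to_nems)
```

States absent from the lookup would remain `NaN` after the `map`, which makes them easy to find, whereas the indexed `[0][0]` lookup in the loop raises an error for an unknown state.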


#### Step 2.3.2 - Correct Enverus Data for Select States

In [None]:
# 1) Read In Coverage Table from State Well Counts File from ERG
# (specifies the first year with bad data and which years need to be corrected; 
# all years including and after the first bad year of data need to be corrected)

ERG_StateWellCounts_FirstBadDataYear = pd.read_excel(Enverus_WellCounts_inputfile, sheet_name = "2021 - Coverage", usecols = "A:B", skiprows = 2, nrows = 40)
ERG_StateWellCounts_FirstBadDataYear['date'] = pd.to_datetime(ERG_StateWellCounts_FirstBadDataYear['Date to USE'], errors = 'coerce')
ERG_StateWellCounts_FirstBadDataYear['year'] = pd.DatetimeIndex(ERG_StateWellCounts_FirstBadDataYear['date']).year.fillna(end_year+100).astype(int)

# 2) Loop through each state and year in Enverus to determine whether the data for that particular year need to 
# be corrected. At the moment, the only correction ERG makes to the data is to use the prior year of data if there
# are no new Enverus data reported for that state. If a particular state is not included for any years in the Enverus
# dataset, then a row of zeros is added to the Enverus table for that year. 

for istate in np.arange(0,len(State_ANSI)):
    correctdata =0
    state_str = State_ANSI['abbr'][istate]
    firstbadyear = ERG_StateWellCounts_FirstBadDataYear['year'][ERG_StateWellCounts_FirstBadDataYear['State'] == state_str].values
    if firstbadyear.size  == 0:
        firstbadyear = end_year+5 #if state isn't included in correction list, don't correct any data
    
    for iyear in np.arange(0,num_years):
        enverus_data_temp= vars()['Enverus_data_'+year_range_str[iyear]].copy()
        state_list = np.unique(enverus_data_temp['STATE'])
        if state_str in state_list:
            inlist =1
        else:
            inlist = 0
        if inlist ==1 or correctdata==1: #if the state is included in Enverus data, or had data for at least one good year
            #if first year, correctdata will be zero, but inlist will also be zero if no Enverus data
            #check to see whether corrections are necessary for the given year/state
            if year_range[iyear] == (firstbadyear-1):
                print(state_str,year_range[iyear],'last good year')
                # This is the last year of good data. Do not correct the data, but save it
                # so that it can be used for all following years for that state
                temp_data = enverus_data_temp[enverus_data_temp['STATE'] == state_str]
                lastgoodyear = year_range_str[iyear]
                correctdata=1
            elif year_range[iyear] >= firstbadyear: 
                print(state_str,year_range[iyear])
                #correct data for all years equal to and after the first bad year (remove old data first if necessary)
                if inlist == 1:
                    enverus_data_temp = enverus_data_temp[enverus_data_temp['STATE'] != state_str]
                enverus_data_temp = pd.concat([enverus_data_temp,temp_data],ignore_index=True)
                print(state_str +' data for ' +year_range_str[iyear] +' were corrected with '+lastgoodyear+' data')
            else:
                no_corrections =1
                
        if inlist==0 and correctdata==0:
        #if there is no Enverus data for a given state, and there was no good data, add a row with default values
            temp_row = {'STATE':state_str}
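            # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
            enverus_data_temp = pd.concat([enverus_data_temp, pd.DataFrame([{**default,**temp_row}])], ignore_index=True)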
            print(state_str +' has no Enverus data in the year ' +year_range_str[iyear]+', default values set')
            
        #resave that year of Enverus data
        enverus_data_temp.reset_index(drop=True,inplace=True)
        vars()['Enverus_data_'+year_range_str[iyear]] = enverus_data_temp.copy()
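
The correction logic above reuses a state's last good year of data for every later year. The sketch below isolates that carry-forward step on a made-up dictionary of yearly tables. The `carry_forward_state` helper and the dictionary-of-DataFrames layout are hypothetical; the notebook itself stores each year in a separate `Enverus_data_<year>` variable.

```python
import pandas as pd

def carry_forward_state(frames, state, first_bad_year):
    """Replace a state's rows in every year >= first_bad_year with its rows from the last good year."""
    last_good = first_bad_year - 1
    if last_good not in frames:
        return dict(frames)  # nothing to carry forward
    good_rows = frames[last_good][frames[last_good]['STATE'] == state]
    out = dict(frames)
    for year in sorted(frames):
        if year >= first_bad_year:
            # Drop any (bad) rows reported for the state, then append the last good rows
            kept = frames[year][frames[year]['STATE'] != state]
            out[year] = pd.concat([kept, good_rows], ignore_index=True)
    return out

# Made-up example: TX data are considered bad from 2017 onward
frames = {
    2016: pd.DataFrame({'STATE': ['TX', 'OK'], 'WELL_COUNT': [5, 3]}),
    2017: pd.DataFrame({'STATE': ['TX', 'OK'], 'WELL_COUNT': [1, 4]}),
    2018: pd.DataFrame({'STATE': ['OK'], 'WELL_COUNT': [6]}),
}
corrected = carry_forward_state(frames, 'TX', 2017)
```

In this example the 2017 and 2018 tables both end up with the 2016 TX rows, while the OK rows in each year are left unchanged.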

### Step 2.4. - Calculate Fractional Monthly Condensate Arrays 
(EIA condensate production (bbl) relative to producing Enverus gas wells by month in each state and region)

In [None]:
#Calculate the number of wells from the EIA Condensate Data Set (converted to monthly counts) relative to the number of
# producing non-associated gas wells each month in the Enverus dataset (for each state and nems region)

#1 ) Initialize arrays
cond_env_well_count = np.zeros([len(State_ANSI),num_regions,num_years,num_months])
cond_month_wellfrac = np.zeros([len(State_ANSI),num_regions,num_years,num_months])
cond_weighted_well_count = np.zeros([len(State_ANSI),num_regions,num_years,num_months])

if ReCalc_Condensates == 1:
    # 2) Calculate the number of producing non-associated gas wells (onshore) each month
    # (NA classification based on annual production, then producing well if production > 0 in the given month)
    for iyear in np.arange(0,num_years):  
        enverus_data_temp = vars()['Enverus_data_'+year_range_str[iyear]].copy()
        list_well1 = enverus_data_temp.index[enverus_data_temp.loc[:,'OFFSHORE'] == 'N'].tolist()
        list_well2 = enverus_data_temp.index[enverus_data_temp.loc[:,'CUM_GAS'] > 0 ].tolist()
        list1_as_set = set(list_well1)
        intersection = list1_as_set.intersection(list_well2) #find common elements between both lists
        list_well_total = list(intersection)
        for iwell in list_well_total:
            if ((data_fn.safe_div(enverus_data_temp['CUM_GAS'][iwell],float(enverus_data_temp['CUM_OIL'][iwell]))) > 100 or \
                    enverus_data_temp['GOR_QUAL'][iwell] =='Gas only'):
                inems = enverus_data_temp['NEMS_CODE'][iwell].astype(int)
                #if inems < 0:
                #    print(iwell)
                istate = np.where(State_ANSI['abbr'] == enverus_data_temp['STATE'][iwell])[0][0]
                for imonth in np.arange(0,num_months):
                    prod_str = 'GASPROD_'+month_tag[imonth]  
                    if enverus_data_temp[prod_str][iwell] >0:
                        cond_env_well_count[istate,inems,iyear,imonth] = \
                                cond_env_well_count[istate,inems,iyear,imonth] + enverus_data_temp['WELL_COUNT'][iwell]

    # 3) Normalize the annual state condensate production by the Enverus well counts
    for iyear in np.arange(0,num_years):    
        for imonth in np.arange(0, num_months):
            # cond_month_wellfrac = annual state condensate production /
            #   (number of producing Enverus NA gas wells, summed over all months of that year)
            # The same value is stored for every month of the year (no days-in-month weighting is applied here)
            for istate in np.arange(0,len(State_ANSI)):
                for inems in np.arange(0,num_regions):
                    if state_cond_prod[inems,istate,iyear] > 0:
                        cond_month_wellfrac[istate,inems,iyear,imonth] = \
                        data_fn.safe_div((state_cond_prod[inems,istate,iyear]),float(np.sum(cond_env_well_count[istate,inems,iyear,:])))
    #this calculates the condensate production volume per well for each state, region, year, and month              
    np.save('./IntermediateOutputs/Condensates_wellcount_tempoutput', cond_env_well_count)
    np.save('./IntermediateOutputs/Condensates_wellfrac_tempoutput', cond_month_wellfrac)
    np.save('./IntermediateOutputs/Condensates_stateprod_tempoutput', state_cond_prod)

else:
    cond_env_well_count = np.load('./IntermediateOutputs/Condensates_wellcount_tempoutput.npy')
    cond_month_wellfrac = np.load('./IntermediateOutputs/Condensates_wellfrac_tempoutput.npy')
    state_cond_prod = np.load('./IntermediateOutputs/Condensates_stateprod_tempoutput.npy')
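
The normalization in Step 2.4 divides a state's annual condensate production by the total number of producing well-months in Enverus for that state, region, and year. A small numeric sketch of that ratio follows. The `safe_div` function is a hypothetical stand-in for `data_fn.safe_div` (assumed here to return 0 when the denominator is 0), and all numbers are made up.

```python
import numpy as np

def safe_div(num, den):
    """Hypothetical stand-in for data_fn.safe_div: return 0 when the denominator is 0."""
    return num / den if den != 0 else 0.0

# Made-up inputs for one state/region/year
annual_cond_prod = 1200.0  # annual condensate production (bbl)
monthly_well_count = np.array([10, 10, 10, 10, 10, 10, 0, 0, 0, 0, 0, 0], dtype=float)

# Production per producing well-month (the quantity stored in cond_month_wellfrac)
prod_per_well_month = safe_div(annual_cond_prod, float(monthly_well_count.sum()))

# Weighting each month's well count by this ratio recovers the annual total
monthly_alloc = prod_per_well_month * monthly_well_count
```

Under these assumptions, months with no producing wells receive no condensate, and the monthly allocations sum back to the annual state total.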

### Step 2.5 Convert Enverus Well and Production Arrays, and Condensate Array into Gridded Location Arrays

In [None]:
# clear variables
del ERG_StateWellCounts_FirstBadDataYear
del Prism_data
del colnames
del names
del state_condensates
del temp_data

In [None]:
# Make Annual gridded arrays (maps) of well data (a well will be counted every month if there is any production that year)
# Includes NA Gas Wells and Production onshore in the CONUS region
# source emissions are related to the presence of a well and its production status (no emission if no production)
# Details: ERG does not include a well in the national count if there is no (cumulative) oil or gas production from that well.
# Wells are not considered active for a given year if there is no production data that year
# This may cause wells that are completed but not yet producing to be dropped from the national count. 
# ERG has developed their own logic to determine if a well is an HF well or not and that result is included in the 
# HF variable in this dataset. This method does not rely on the Enverus well 'Producing Status'
# Well Type (e.g., non-associated gas well) is determined based on annual production GOR at that well (CUM_GAS / CUM_OIL), 
# but the presence of a well will only be included in maps in months where monthly gas prod > 0


#Define well location/production arrays  
Map_EnvAllwell = np.zeros([num_regions-1,len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvNonAssocProd = np.zeros([num_regions-1,len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvBasin220 = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months]) 
Map_EnvBasin395 = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvBasin430 = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvBasinOther = np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvLeaseC = np.zeros([num_regions-1,len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvNonAssoc_HF = np.zeros([num_regions-1,len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvNonAssoc_Conv = np.zeros([num_regions-1,len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvCoalBed = np.zeros([num_regions-1,len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvNonAssocExp_HF_comp = np.zeros([num_regions-1,len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvNonAssocExp_Conv_comp = np.zeros([num_regions-1,len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvGasWellExp_drilled = np.zeros([num_regions-1, len(Lat_01), len(Lon_01),num_years, num_months])
Map_EnvStateGOM_Offshore=  np.zeros([len(Lat_01), len(Lon_01),num_years, num_months])
#nongrid
Map_EnvAllwell_nongrid = np.zeros([num_regions-1,num_years, num_months])
Map_EnvNonAssocProd_nongrid = np.zeros([num_regions-1,num_years, num_months])
Map_EnvBasin220_nongrid = np.zeros([num_years, num_months]) 
Map_EnvBasin395_nongrid = np.zeros([num_years, num_months])
Map_EnvBasin430_nongrid = np.zeros([num_years, num_months])
Map_EnvBasinOther_nongrid = np.zeros([num_years, num_months])
Map_EnvLeaseC_nongrid = np.zeros([num_regions-1,num_years, num_months])
Map_EnvNonAssoc_HF_nongrid = np.zeros([num_regions-1,num_years, num_months])
Map_EnvNonAssoc_Conv_nongrid = np.zeros([num_regions-1,num_years, num_months])
Map_EnvCoalBed_nongrid = np.zeros([num_regions-1,num_years, num_months])
Map_EnvNonAssocExp_HF_comp_nongrid = np.zeros([num_regions-1,num_years, num_months])
Map_EnvNonAssocExp_Conv_comp_nongrid = np.zeros([num_regions-1,num_years, num_months])
Map_EnvGasWellExp_drilled_nongrid = np.zeros([num_regions-1,num_years, num_months])
Map_EnvStateGOM_Offshore_nongrid = np.zeros([num_years, num_months])
#ReCalc_Enverus=1 #*****
if ReCalc_Enverus ==1:
    
    for iyear in np.arange(0,num_years):
        enverus_data_temp = vars()['Enverus_data_'+year_range_str[iyear]].copy()
        nocompdate = 0 #record the number of wells that don't have reported completion dates (but have production in that given year)
        nodrill = 0 #record the number of wells that don't have drilling information
        nooffshore = 0
        
        #loop through each row (i.e., well) in the Enverus dataset (for both onshore and offshore gas wells)
        # This will not include wells that have zero gas production in a given year, but is consistent with the GHGI approach.
        list_onshore_wells = enverus_data_temp.index[enverus_data_temp.loc[:,'OFFSHORE'] == 'N'].tolist()
        list_offshore_wells = enverus_data_temp.index[enverus_data_temp.loc[:,'OFFSHORE'] == 'Y'].tolist()
        list_gas_wells = enverus_data_temp.index[enverus_data_temp.loc[:,'CUM_GAS'] > 0].tolist()
        #find onshore gas wells based on common list elements...
        list1_as_set = set(list_onshore_wells)
        intersection = list1_as_set.intersection(list_gas_wells)
        list_onshore_gas_wells = list(intersection)
        #find offshore gas wells based on common list elements...
        list1_as_set = set(list_offshore_wells)
        intersection = list1_as_set.intersection(list_gas_wells)
        list_offshore_gas_wells = list(intersection)
    
        # for onshore gas wells... 
        for iwell in list_onshore_gas_wells:
            #Check if location is within CONUS
            if enverus_data_temp['LONGITUDE'][iwell] > Lon_left and enverus_data_temp['LONGITUDE'][iwell] < Lon_right \
                and enverus_data_temp['LATITUDE'][iwell] > Lat_low and enverus_data_temp['LATITUDE'][iwell] < Lat_up:
                #find index of lon and lat, and NEMS region
                ilat = int((enverus_data_temp['LATITUDE'][iwell] - Lat_low)/Res01)
                ilon = int((enverus_data_temp['LONGITUDE'][iwell] - Lon_left)/Res01)
                inems = enverus_data_temp['NEMS_CODE'][iwell].astype(int)
            
                if ((data_fn.safe_div(enverus_data_temp['CUM_GAS'][iwell],float(enverus_data_temp['CUM_OIL'][iwell]))) > 100 or \
                    enverus_data_temp['GOR_QUAL'][iwell] =='Gas only'):
                    # if non-associated gas well, 
                    for imonth in np.arange(0,num_months):
                    #count wells in map only for months where there is gas production (emissions ~ when production is occurring)
                        prod_str = 'GASPROD_'+month_tag[imonth]  
                        if enverus_data_temp[prod_str][iwell] >0:
                            Map_EnvAllwell[inems,ilat,ilon,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell] #includes non-assoc. wells only
                            Map_EnvNonAssocProd[inems,ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell] # production from non-assoc. gas wells only
                        
                            #save basin-specific production levels for onshore non-associated gas wells
                            if enverus_data_temp['AAPG_CODE_ERG'][iwell] =='220':
                                Map_EnvBasin220[ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            elif enverus_data_temp['AAPG_CODE_ERG'][iwell] =='395':
                                Map_EnvBasin395[ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            elif enverus_data_temp['AAPG_CODE_ERG'][iwell] =='430':
                                Map_EnvBasin430[ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            else: 
                                Map_EnvBasinOther[ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell]
                        
                            #Add condensate wells
                            istate = np.where(State_ANSI['abbr'] == enverus_data_temp['STATE'][iwell])[0][0]
                            Map_EnvLeaseC[inems,ilat,ilon,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell]*cond_month_wellfrac[istate,inems,iyear,imonth]
                            #print(cond_month_wellfrac[iyear,imonth,istate,inems])
                            #[mi,abbr_dict[wells_all_prod['STATE'][i]],nemsi]
                        
                            if enverus_data_temp['HF'][iwell] == 'Y':
                            #is it an HF well or not?
                                Map_EnvNonAssoc_HF[inems,ilat,ilon,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell] #non-assoc. HF wells
                            else:     
                                Map_EnvNonAssoc_Conv[inems,ilat,ilon,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell] #non-assoc. conventional wells
                    
                        prod_str = 'WATERPROD_'+month_tag[imonth]  
                        if enverus_data_temp[prod_str][iwell] >0 and \
                            (enverus_data_temp['BASIN'][iwell] =='POWDER RIVER' or enverus_data_temp['BASIN'][iwell] =='BLACK WARRIOR'):
                        # save produced water volumes from non-associated gas wells in the powder river and black warrior basins
                            Map_EnvCoalBed[inems,ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell]
                        
                    if isinstance(enverus_data_temp['COMPLETION_DATE'][iwell],float):
                    #if Non-associated gas well (onshore), regardless of whether the well this month is producing, 
                    # determine whether the given well was completed this year, and if so, assign it to the correct month,
                    # if not completed in the current year, then don't add completion year (assume it was captured already in previous year loop)
                    # if completion date is NaN, do not record anywhere (may undercount). Will also undercount if well completed in
                    # one year but does not start producing until the next. 
                        if np.isnan(enverus_data_temp['COMPLETION_DATE'][iwell]):
                            nocompdate = nocompdate +1
                    else:
                        month = enverus_data_temp['COMPLETION_DATE'][iwell][5:7] #extract the month
                        #print(month)
                        year = enverus_data_temp['COMPLETION_DATE'][iwell][0:4] #extract year
                        #print(year)
                        if year_range_str[iyear] == year:
                        # if completed in the current year, add to the correct month map
                        #print('here')
                            for imonth in np.arange(0, num_months):
                                if month_tag[imonth] == month:
                                    #print('here, month')
                                    if enverus_data_temp['HF'][iwell] == 'Y':
                                        #print('here, HF')
                                        Map_EnvNonAssocExp_HF_comp[inems,ilat,ilon,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell] #includes completions from non-associated HF gas wells that were producing in the same year
                                    else:
                                        #print('here, non-HF')
                                        Map_EnvNonAssocExp_Conv_comp[inems,ilat,ilon,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell] #includes completions from non-associated conventional wells that were producing in the same year
                
                    if isinstance(enverus_data_temp['SPUD_DATE'][iwell],float):
                    #if Non-associated gas well (onshore), regardless of whether the well this month is producing, 
                    # determine whether the given well was drilled this year, and if so, assign it to the correct *YEAR* 
                    # assign based on SPUD Date, unless Null, then check to see if producing in the current year
                    # NOTE: the National inventory looks for first production date in the next year to see if drilled this year. 
                    # This logic is too difficult to implement here, so only counted if first_prod_date is in current year
                        if np.isnan(enverus_data_temp['SPUD_DATE'][iwell]):
                            if isinstance(enverus_data_temp['FIRST_PROD_DATE'][iwell],float):
                                if np.isnan(enverus_data_temp['FIRST_PROD_DATE'][iwell]):
                                    nodrill += 1
                            else:
                                year = enverus_data_temp['FIRST_PROD_DATE'][iwell][0:4] #extract year
                                if year_range_str[iyear] == year:
                                    Map_EnvGasWellExp_drilled[inems,ilat,ilon,iyear,:] += enverus_data_temp['WELL_COUNT'][iwell]
                    else:
                        year = enverus_data_temp['SPUD_DATE'][iwell][0:4] #extract year
                        #print(year)
                        if year_range_str[iyear] == year:
                        # if drilled (spudded) in the current year, add the well to every month of that year
                            Map_EnvGasWellExp_drilled[inems,ilat,ilon,iyear,:] += enverus_data_temp['WELL_COUNT'][iwell]
                
            #if not in continental US grid, still count those wells in non-grid arrays (does not include offshore, dealt with next)
            # same logic sequence as above
            else:
                inems = enverus_data_temp['NEMS_CODE'][iwell].astype(int) 
                if ((data_fn.safe_div(enverus_data_temp['CUM_GAS'][iwell],float(enverus_data_temp['CUM_OIL'][iwell]))) > 100 or \
                    enverus_data_temp['GOR_QUAL'][iwell] =='Gas only'):
                    for imonth in np.arange(0,num_months):
                    #count wells in map only for months where there is gas production (emissions ~ when production is occurring)
                        prod_str = 'GASPROD_'+month_tag[imonth]  
                        if enverus_data_temp[prod_str][iwell] >0:
                        #non-assoc. gas well with production in this month
                            Map_EnvAllwell_nongrid[inems,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell] #includes non-assoc. wells only
                            Map_EnvNonAssocProd_nongrid[inems,iyear,imonth] += enverus_data_temp[prod_str][iwell]
                        
                            #save basin-specific production levels for onshore non-associated gas wells
                            if enverus_data_temp['AAPG_CODE_ERG'][iwell] =='220':
                                Map_EnvBasin220_nongrid[iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            elif enverus_data_temp['AAPG_CODE_ERG'][iwell] =='395':
                                Map_EnvBasin395_nongrid[iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            elif enverus_data_temp['AAPG_CODE_ERG'][iwell] =='430':
                                Map_EnvBasin430_nongrid[iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            else: 
                                Map_EnvBasinOther_nongrid[iyear,imonth] += enverus_data_temp[prod_str][iwell]
                            
                            istate = np.where(State_ANSI['abbr'] == enverus_data_temp['STATE'][iwell])[0][0]
                            Map_EnvLeaseC_nongrid[inems,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell]*cond_month_wellfrac[istate,inems,iyear,imonth]

                            if enverus_data_temp['HF'][iwell] == 'Y':
                                Map_EnvNonAssoc_HF_nongrid[inems,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell]
                            else:     
                                Map_EnvNonAssoc_Conv_nongrid[inems,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell]
                 
                        prod_str = 'WATERPROD_'+month_tag[imonth]  
                        if enverus_data_temp[prod_str][iwell] >0 and \
                            (enverus_data_temp['BASIN'][iwell] =='POWDER RIVER' or enverus_data_temp['BASIN'][iwell] =='BLACK WARRIOR'):
                        # save produced water volumes from non-associated gas wells in the powder river and black warrior basins
                            Map_EnvCoalBed_nongrid[inems,iyear,imonth] += enverus_data_temp[prod_str][iwell]
                        
                    if isinstance(enverus_data_temp['COMPLETION_DATE'][iwell],float): 
                        if np.isnan(enverus_data_temp['COMPLETION_DATE'][iwell]):
                            nocompdate = nocompdate +1
                    else:
                        month = enverus_data_temp['COMPLETION_DATE'][iwell][5:7] #extract the month
                        year = enverus_data_temp['COMPLETION_DATE'][iwell][0:4]
                        if year_range_str[iyear] == year:
                            for imonth in np.arange(0, num_months):
                                if month_tag[imonth] == month:
                                    if enverus_data_temp['HF'][iwell] == 'Y':
                                        Map_EnvNonAssocExp_HF_comp_nongrid[inems,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell]
                                    else:
                                        Map_EnvNonAssocExp_Conv_comp_nongrid[inems,iyear,imonth] += enverus_data_temp['WELL_COUNT'][iwell]
            
                    if isinstance(enverus_data_temp['SPUD_DATE'][iwell],float):
                        if np.isnan(enverus_data_temp['SPUD_DATE'][iwell]):
                            if isinstance(enverus_data_temp['FIRST_PROD_DATE'][iwell],float):
                                if np.isnan(enverus_data_temp['FIRST_PROD_DATE'][iwell]):
                                    nodrill += 1
                            else:
                                year = enverus_data_temp['FIRST_PROD_DATE'][iwell][0:4] #extract year
                                if year_range_str[iyear] == year:
                                    Map_EnvGasWellExp_drilled_nongrid[inems,iyear,:] += enverus_data_temp['WELL_COUNT'][iwell]
                    else:
                        year = enverus_data_temp['SPUD_DATE'][iwell][0:4] #extract year
                        #print(year)
                        if year_range_str[iyear] == year:
                        # if drilled (spudded) in the current year, add the well to every month of that year
                            Map_EnvGasWellExp_drilled_nongrid[inems,iyear,:] += enverus_data_temp['WELL_COUNT'][iwell]      
            
                    
        #for offshore gas well locations... 
        # EPA State GOM offshore emissions will be allocated based on Enverus production for
        # offshore emissions in GOM states (AL, LA, TX, etc). 
        # Offshore emissions (in NGOM region) are not included in the ERG well count nor here. 
        # Federal offshore emissions are allocated later based on BOEM GOADS platform emissions
        for iwell in list_offshore_gas_wells:

            #Check if location is on grid
            if enverus_data_temp['LONGITUDE'][iwell] > Lon_left and enverus_data_temp['LONGITUDE'][iwell] < Lon_right \
                and enverus_data_temp['LATITUDE'][iwell] > Lat_low and enverus_data_temp['LATITUDE'][iwell] < Lat_up:
                #Set ilon and ilat
                ilat = int((enverus_data_temp['LATITUDE'][iwell] - Lat_low)/Res01)
                ilon = int((enverus_data_temp['LONGITUDE'][iwell] - Lon_left)/Res01)
                
                #figure out how to deal with this ....
                # check if non-associated gas well (offshore)
                #if enverus_data_temp['CUM_GAS'][iwell] > 0:
                if ((data_fn.safe_div(enverus_data_temp['CUM_GAS'][iwell],float(enverus_data_temp['CUM_OIL'][iwell]))) > 100 or \
                    enverus_data_temp['GOR_QUAL'][iwell] =='Gas only'):
                    if enverus_data_temp['STATE'][iwell] in {'AL','FL','LA','MS','TX'}:
                        for imonth in np.arange(0,num_months):
                        #count wells in map only for months where there is gas production (emissions ~ when production is occurring)
                            prod_str = 'GASPROD_'+month_tag[imonth]  
                            if enverus_data_temp[prod_str][iwell] >0:
                                Map_EnvStateGOM_Offshore[ilat,ilon,iyear,imonth] += enverus_data_temp[prod_str][iwell]
            else:
                nooffshore +=1
                #print("Error - No offshore outside of the domain")#display(EPA_emi_prod_NG_CH4)           
                

        print('Enverus data not included in this analysis:')
        print('Year: ',year_range_str[iyear])
        print('Wells without drilling information (no Spud or Production data): ',nodrill)
        print('Wells without completion dates: ',nocompdate)
        print('Wells offshore and outside of grid domain: ',nooffshore)
    
    #save current status of datafiles
    #Define well location/production arrays
    np.savez('./IntermediateOutputs/Gas_EnvAllWell_tempout', x=Map_EnvAllwell, y=Map_EnvAllwell_nongrid)
    np.savez('./IntermediateOutputs/Gas_EnvNonAssocProd_tempout', x=Map_EnvNonAssocProd, y=Map_EnvNonAssocProd_nongrid)
    np.savez('./IntermediateOutputs/Gas_EnvBasin220_tempout', x=Map_EnvBasin220, y=Map_EnvBasin220_nongrid)
    np.savez('./IntermediateOutputs/Gas_EnvBasin395_tempout', x=Map_EnvBasin395, y=Map_EnvBasin395_nongrid)
    np.savez('./IntermediateOutputs/Gas_EnvBasin430_tempout', x=Map_EnvBasin430, y=Map_EnvBasin430_nongrid)
    np.savez('./IntermediateOutputs/Gas_EnvBasinOther_tempout', x=Map_EnvBasinOther, y=Map_EnvBasinOther_nongrid)
    np.savez('./IntermediateOutputs/Gas_EnvLeaseC_tempout', x=Map_EnvLeaseC, y=Map_EnvLeaseC_nongrid)
    np.savez('./IntermediateOutputs/Gas_EnvNonAssoc_HF_tempout', x=Map_EnvNonAssoc_HF, y=Map_EnvNonAssoc_HF_nongrid)
    np.savez('./IntermediateOutputs/Gas_EnvNonAssoc_Conv_tempout', x=Map_EnvNonAssoc_Conv, y=Map_EnvNonAssoc_Conv_nongrid)
    np.savez('./IntermediateOutputs/Gas_EnvCoalBed_tempout', x=Map_EnvCoalBed, y=Map_EnvCoalBed_nongrid)
    np.savez('./IntermediateOutputs/Gas_EnvNonAssocExp_HF_comp_tempout', x=Map_EnvNonAssocExp_HF_comp, y=Map_EnvNonAssocExp_HF_comp_nongrid)
    np.savez('./IntermediateOutputs/Gas_EnvNonAssocExp_Conv_comp_tempout', x=Map_EnvNonAssocExp_Conv_comp, y=Map_EnvNonAssocExp_Conv_comp_nongrid)
    np.savez('./IntermediateOutputs/Gas_EnvGasWellExp_drilled_tempout', x=Map_EnvGasWellExp_drilled, y=Map_EnvGasWellExp_drilled_nongrid)
    np.savez('./IntermediateOutputs/Gas_EnvStateGOM_Offshore_tempout', x=Map_EnvStateGOM_Offshore, y=Map_EnvStateGOM_Offshore_nongrid)
else:
    #load previously saved files
    npzfile = np.load('./IntermediateOutputs/Gas_EnvAllWell_tempout.npz')
    Map_EnvAllwell = npzfile['x']
    Map_EnvAllwell_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Gas_EnvNonAssocProd_tempout.npz')
    Map_EnvNonAssocProd = npzfile['x']
    Map_EnvNonAssocProd_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Gas_EnvBasin220_tempout.npz')
    Map_EnvBasin220 = npzfile['x']
    Map_EnvBasin220_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Gas_EnvBasin395_tempout.npz')
    Map_EnvBasin395 = npzfile['x']
    Map_EnvBasin395_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Gas_EnvBasin430_tempout.npz')
    Map_EnvBasin430 = npzfile['x']
    Map_EnvBasin430_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Gas_EnvBasinOther_tempout.npz')
    Map_EnvBasinOther = npzfile['x']
    Map_EnvBasinOther_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Gas_EnvLeaseC_tempout.npz')
    Map_EnvLeaseC = npzfile['x']
    Map_EnvLeaseC_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Gas_EnvNonAssoc_HF_tempout.npz')
    Map_EnvNonAssoc_HF = npzfile['x']
    Map_EnvNonAssoc_HF_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Gas_EnvNonAssoc_Conv_tempout.npz')
    Map_EnvNonAssoc_Conv = npzfile['x']
    Map_EnvNonAssoc_Conv_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Gas_EnvCoalBed_tempout.npz')
    Map_EnvCoalBed = npzfile['x']
    Map_EnvCoalBed_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Gas_EnvNonAssocExp_HF_comp_tempout.npz')
    Map_EnvNonAssocExp_HF_comp = npzfile['x']
    Map_EnvNonAssocExp_HF_comp_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Gas_EnvNonAssocExp_Conv_comp_tempout.npz')
    Map_EnvNonAssocExp_Conv_comp = npzfile['x']
    Map_EnvNonAssocExp_Conv_comp_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Gas_EnvGasWellExp_drilled_tempout.npz')
    Map_EnvGasWellExp_drilled = npzfile['x']
    Map_EnvGasWellExp_drilled_nongrid = npzfile['y']
    npzfile = np.load('./IntermediateOutputs/Gas_EnvStateGOM_Offshore_tempout.npz')
    Map_EnvStateGOM_Offshore = npzfile['x']
    Map_EnvStateGOM_Offshore_nongrid = npzfile['y']
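
The per-well loop in the cell above converts each well's latitude and longitude into 0.1° grid indices and accumulates well counts or production in the matching cell, while wells that fall outside the domain are tallied separately. A minimal vectorized sketch of that binning idea follows. The grid origin, resolution, and array sizes (`LON_LEFT`, `LAT_LOW`, `RES`, `N_LAT`, `N_LON`) and the function name `bin_wells` are hypothetical placeholders rather than the notebook's values, and the notebook itself uses an explicit loop, not `np.add.at`.

```python
import numpy as np

# Hypothetical grid definition (assumed for illustration; not the notebook's values)
LON_LEFT, LAT_LOW, RES = -130.0, 20.0, 0.1
N_LAT, N_LON = 400, 700

def bin_wells(lats, lons, weights):
    """Accumulate per-well weights (counts or production) onto a regular grid.

    Returns the gridded array and the summed weight of wells outside the grid.
    """
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Index of the grid cell containing each well
    ilat = np.floor((lats - LAT_LOW) / RES).astype(int)
    ilon = np.floor((lons - LON_LEFT) / RES).astype(int)

    # Keep only wells that land inside the grid
    on_grid = (ilat >= 0) & (ilat < N_LAT) & (ilon >= 0) & (ilon < N_LON)

    grid = np.zeros((N_LAT, N_LON))
    # np.add.at accumulates correctly when several wells share one cell
    np.add.at(grid, (ilat[on_grid], ilon[on_grid]), weights[on_grid])

    off_grid_total = weights[~on_grid].sum()
    return grid, off_grid_total
```

`np.add.at` is used here because plain fancy-index assignment (`grid[ilat, ilon] += w`) would apply only one update per repeated cell index, which would undercount cells containing more than one well.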

### Step 2.6 Correct Missing IL/IN Data

In [None]:
# General Process
# 1. Read the GHGI well and production statistics from the GHGI (contain corrected IL and IN data)
# 2. Read in the relevant NEI data (from both file formats) and place onto GEPA grid (including reproj of NEI data)
# 3. Scale the NEI proxy maps to the corresponding state level values from Step 1.
# 4. Calculate the lease condensate proxy for IL/IN using the same method as the Enverus data
# 5. Place the scaled NEI grid data on the appropriate Enverus proxy grids. 
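
The scaling in step 3 of the process above keeps the spatial pattern of an NEI proxy map while replacing its absolute magnitude with a GHGI state total. A minimal sketch of proportional scaling is given below. The function name `scale_proxy_to_total` and the zero-sum handling are assumptions made for illustration; the cells in this notebook that perform the scaling may differ in detail.

```python
import numpy as np

def scale_proxy_to_total(proxy_map, state_total):
    """Rescale a gridded proxy so it sums to state_total, preserving its spatial pattern."""
    proxy_map = np.asarray(proxy_map, dtype=float)
    proxy_sum = proxy_map.sum()
    if proxy_sum == 0:
        # Assumed behavior: with no spatial information, return zeros instead of dividing by zero
        return np.zeros_like(proxy_map)
    return proxy_map * (state_total / proxy_sum)
```

For example, a proxy of `[[1, 3], [0, 4]]` scaled to a state total of 16 gives `[[2, 6], [0, 8]]`: each cell keeps its share of the total, and only the absolute level changes.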

#### Step 2.6.1 Read in GHGI State-Level Well Statistics for IL/IN

##### Step 2.6.1.1 Well Counts

In [None]:
#1. Read in National Well Statistics for IL/IN (from ERG Wells Processing Workbook)
# These are used to scale the NEI proxies to GHGI totals so that IL and IN carry the same weight
# as in the GHGI (e.g., well counts in IL/IN consistent with the national total).
# Absolute values in the NEI may differ because of different data processing. 
# In other words, we want to take the relative spatial information from the NEI, but not the absolute values
# Use the well count data for 2016, 2017, 2018, and 2019 - corrected by ERG 

Env_ILIN_wells = pd.read_excel(Enverus_WellCounts_inputfile, sheet_name = "2020 PR - State", skiprows = 4)
Env_ILIN_wells = Env_ILIN_wells.drop(columns = ['Category','WELLCOUNT_16', 'WELLCOUNT_17','WELLCOUNT_18'])
Env_ILIN_wells.rename(columns={Env_ILIN_wells.columns[Env_ILIN_wells.columns.get_loc('WELLCOUNT_16_ERG')]:'WELLCOUNT_16'}, inplace=True)
Env_ILIN_wells.rename(columns={Env_ILIN_wells.columns[Env_ILIN_wells.columns.get_loc('WELLCOUNT_17_ERG')]:'WELLCOUNT_17'}, inplace=True)
Env_ILIN_wells.rename(columns={Env_ILIN_wells.columns[Env_ILIN_wells.columns.get_loc('WELLCOUNT_18_ERG')]:'WELLCOUNT_18'}, inplace=True)
#Env_ILIN_wells.rename(columns={Env_ILIN_wells.columns[Env_ILIN_wells.columns.get_loc('WELLCOUNT_19_ERG')]:'WELLCOUNT_19'}, inplace=True)
Env_ILIN_wells = Env_ILIN_wells.fillna(0)
Env_ILIN_wells = Env_ILIN_wells[(Env_ILIN_wells['STATE']=='IL') | (Env_ILIN_wells['STATE']=='IN')]
Env_ILIN_wells.reset_index(inplace=True, drop=True)
Env_ILIN_wells['NEMS'] = 0 #IN and IL are both in the north east region
#display(Env_ILIN_wells)

#2 Calculate Well Counts of Each Well Type for each NEMS region
# ERG Query codes
# 1 = Non-Associated Gas Wells, #2 = Oil Wells
# 3 = Associated Gas Wells (not included in total well counts)
# 4  = Gas Wells (non-associated) with Hydraulic Fracturing
# 5 = Gas Well Completions with Hydraulic Fracturing
# 6 = Oil Wells with Hydraulic Fracturing, 
# 7 = Oil well completions with hydraulic fracturing
# 8 = All Gas Well Completions, 
# 9 All Oil well completions
# 10a = Gas Wells Drilled, #10 b = Oil Wells Drilled
# 10c = Dry Wells Drilled

Well_NonAssoc_ILIN = np.zeros([2,num_years]) #all NA wells
Well_NonAssoc_HF_ILIN = np.zeros([2,num_years]) #HF NA wells
Well_NonAssoc_Conv_ILIN = np.zeros([2, num_years]) #NA conventional wells (all NA - HF NA)
Well_NonAssoc_HF_comp_ILIN = np.zeros([2, num_years]) #HF NA well completions
Well_NonAssoc_Conv_comp_ILIN = np.zeros([2, num_years]) #NA conventional well completions (all NA - HF NA)
Well_Allwell_gas_comp_ILIN = np.zeros([2, num_years]) #all NA well completions

Well_Gaswell_drilled_ILIN = np.zeros([2, num_years]) #will end up being corrected total gas wells drilled (including fraction of dry wells)
Well_Oilwell_drilled_ILIN = np.zeros([2, num_years]) # oil wells drilled
Well_Drywell_drilled_ILIN = np.zeros([2, num_years]) # dry wells drilled

#Well_CoalBed_ILIN = np.zeros([num_regions, len(State_ANSI), num_years]) #***NO COAL BED METHANE WELL DATA - UNCLEAR WHERE GHGI PRODUCED WATER INPUTS ARE FROM

# 1) Get all well count data for non-HF wells and completions
start_year_idx = Env_ILIN_wells.columns.get_loc('WELLCOUNT_'+str(start_year)[2:4])
end_year_idx = Env_ILIN_wells.columns.get_loc('WELLCOUNT_'+str(end_year)[2:4])+1

for idx in np.arange(0,len(Env_ILIN_wells)):
    if Env_ILIN_wells['STATE'][idx] == 'IL':
        istate =0
    else:
        istate =1

    if Env_ILIN_wells['QUERY_NMBR'][idx] ==1:
        Well_NonAssoc_ILIN[istate,] = Well_NonAssoc_ILIN[istate,]+Env_ILIN_wells.iloc[idx,start_year_idx:end_year_idx]
    elif Env_ILIN_wells['QUERY_NMBR'][idx] ==4:
        Well_NonAssoc_HF_ILIN[istate,] = Well_NonAssoc_HF_ILIN[istate,]+Env_ILIN_wells.iloc[idx,start_year_idx:end_year_idx]
    elif Env_ILIN_wells['QUERY_NMBR'][idx] ==5:
        Well_NonAssoc_HF_comp_ILIN[istate,] = Well_NonAssoc_HF_comp_ILIN[istate,]+Env_ILIN_wells.iloc[idx,start_year_idx:end_year_idx]
    elif Env_ILIN_wells['QUERY_NMBR'][idx] ==8:   
        Well_Allwell_gas_comp_ILIN[istate,] = Well_Allwell_gas_comp_ILIN[istate,]+Env_ILIN_wells.iloc[idx,start_year_idx:end_year_idx]
    elif Env_ILIN_wells['QUERY_NMBR'][idx] =='10a':   
        Well_Gaswell_drilled_ILIN[istate,] = Well_Gaswell_drilled_ILIN[istate,]+Env_ILIN_wells.iloc[idx,start_year_idx:end_year_idx]
    elif Env_ILIN_wells['QUERY_NMBR'][idx] =='10c':
        Well_Drywell_drilled_ILIN[istate,] = Well_Drywell_drilled_ILIN[istate,]+Env_ILIN_wells.iloc[idx,start_year_idx:end_year_idx]
    elif Env_ILIN_wells['QUERY_NMBR'][idx] =='10b':
        Well_Oilwell_drilled_ILIN[istate,] = Well_Oilwell_drilled_ILIN[istate,]+Env_ILIN_wells.iloc[idx,start_year_idx:end_year_idx]

# Calculate Conventional well counts and completions (All gas wells - HF gas wells)
Well_NonAssoc_Conv_ILIN = Well_NonAssoc_ILIN - Well_NonAssoc_HF_ILIN
Well_NonAssoc_Conv_comp_ILIN = Well_Allwell_gas_comp_ILIN - Well_NonAssoc_HF_comp_ILIN

# Calculate the total number of gas wells drilled, adding the share of dry wells attributed to gas (dry wells * gas/(gas+oil))
for istate in np.arange(0,2):
    for iyear in np.arange(0,num_years):
        Well_Gaswell_drilled_ILIN[istate,iyear] = Well_Gaswell_drilled_ILIN[istate,iyear] + Well_Drywell_drilled_ILIN[istate,iyear] \
                                        * (data_fn.safe_div(Well_Gaswell_drilled_ILIN[istate,iyear],\
                                        (Well_Gaswell_drilled_ILIN[istate,iyear]+Well_Oilwell_drilled_ILIN[istate,iyear])))

print('IL/IN GHGI total counts')
for iyear in np.arange(0,num_years):
    print('Year: ', year_range_str[iyear])
    #Print final well counts ** ADD IN QA/QC with final wells notebook later **
    print('Non Associated:       ',np.sum(Well_NonAssoc_ILIN[:,iyear]))
    print('Non Associated Conv:  ',np.sum(Well_NonAssoc_Conv_ILIN[:,iyear]))
    print('Non Associated HF:    ',np.sum(Well_NonAssoc_HF_ILIN[:,iyear]))
    print('All well gas comp:    ',np.sum(Well_Allwell_gas_comp_ILIN[:,iyear]))
    print('Non Associated HF comp: ',np.sum(Well_NonAssoc_HF_comp_ILIN[:,iyear]))
    print('Non Associated Conv comp: ',np.sum(Well_NonAssoc_Conv_comp_ILIN[:,iyear]))
    print('Wells Drilled:        ',np.sum(Well_Gaswell_drilled_ILIN[:,iyear]))
    print(' ')
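
The dry-well correction in the cell above assigns dry wells to gas in proportion to the gas share of successfully drilled wells: corrected gas wells = gas + dry × gas / (gas + oil). A small worked sketch of that arithmetic is shown below. The local `safe_div` is a stand-in that returns 0 for a zero denominator; this is an assumption about how `data_fn.safe_div` behaves, since its definition is not shown here, and `corrected_gas_wells_drilled` is a hypothetical helper name.

```python
def safe_div(numerator, denominator):
    """Stand-in for data_fn.safe_div (assumed to return 0 when the denominator is 0)."""
    return numerator / denominator if denominator != 0 else 0.0

def corrected_gas_wells_drilled(gas, oil, dry):
    """Add the gas-attributed share of dry wells to the count of gas wells drilled."""
    gas_share = safe_div(gas, gas + oil)  # fraction of productive wells that are gas wells
    return gas + dry * gas_share
```

With 30 gas wells, 10 oil wells, and 8 dry wells, the gas share is 0.75, so 6 dry wells are attributed to gas and the corrected total is 36.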

##### Step 2.6.1.2 Well Production

In [None]:
# ERG Processed Well Production Data (from Prism/Enverus)
# Gas produced from wells in each NEMS region, state, and Basin (units of MCF (gas))

# Includes Gas production from NA gas wells (DOES NOT CURRENTLY INCLUDE GAS PRODUCTION FROM OIL WELLS)

# Use the well production data for 2016, 2017, and 2018 - corrected by ERG 
Env_ILIN_wellsprod = pd.read_excel(Enverus_WellProd_inputfile, sheet_name = "GHG_DATA_AAPG_MAR19", skiprows = 1)

#drop oil production data
match = np.where(Env_ILIN_wellsprod.columns.str.contains('SUMOFLIQ'))[0][:]
Env_ILIN_wellsprod = Env_ILIN_wellsprod.drop(Env_ILIN_wellsprod.columns[match], axis=1)

#replace with ERG recalculations
Env_ILIN_wellsprod = Env_ILIN_wellsprod.drop(columns = ['SUMOFGAS_16', 'SUMOFGAS_17','SUMOFGAS_18',])
Env_ILIN_wellsprod.rename(columns={Env_ILIN_wellsprod.columns[Env_ILIN_wellsprod.columns.get_loc('SUMOFGAS_16_ERG')]:'SUMOFGAS_16'}, inplace=True)
Env_ILIN_wellsprod.rename(columns={Env_ILIN_wellsprod.columns[Env_ILIN_wellsprod.columns.get_loc('SUMOFGAS_17_ERG')]:'SUMOFGAS_17'}, inplace=True)
Env_ILIN_wellsprod.rename(columns={Env_ILIN_wellsprod.columns[Env_ILIN_wellsprod.columns.get_loc('SUMOFGAS_18_ERG')]:'SUMOFGAS_18'}, inplace=True)
Env_ILIN_wellsprod = Env_ILIN_wellsprod.fillna(0)
Env_ILIN_wellsprod = Env_ILIN_wellsprod[(Env_ILIN_wellsprod['STATE']=='IL') | (Env_ILIN_wellsprod['STATE']=='IN')]
Env_ILIN_wellsprod.reset_index(inplace=True, drop=True)

Env_ILIN_wellsprod['NEMS'] = 0 #all data are in northeast region

# Extract the gas production data from non-associated gas wells (QRY = 1); gas produced from oil wells (QRY = 2) is not included
# and assign to the "other basin" array based on the reported state and AAPG Code as determined by ERG in the workbook
Wellprod_other_ILIN = np.zeros([2, num_years])

start_year_idx = Env_ILIN_wellsprod.columns.get_loc('SUMOFGAS_'+str(start_year)[2:4])
end_year_idx = Env_ILIN_wellsprod.columns.get_loc('SUMOFGAS_'+str(end_year)[2:4])+1

for idx in np.arange(0,len(Env_ILIN_wellsprod)):
    #inems = Wellprod_other_ILIN['NEMS'][idx]
    #if inems !=6:
    if Env_ILIN_wellsprod['STATE'][idx] == 'IL':
        istate =0
    else:
        istate =1
    # accumulate production for both states (this check sits outside the else branch so that IL rows are not skipped)
    if Env_ILIN_wellsprod['QUERY_NMBR'][idx] ==1: #or Env_all_wellsprod['QUERY_NMBR'][idx] ==2:
        if Env_ILIN_wellsprod['AAPG_CODE_ERG'][idx] != 220 and Env_ILIN_wellsprod['AAPG_CODE_ERG'][idx] != 395 and Env_ILIN_wellsprod['AAPG_CODE_ERG'][idx] != 430: 
            Wellprod_other_ILIN[istate,] = Wellprod_other_ILIN[istate,] + Env_ILIN_wellsprod.iloc[idx,start_year_idx:end_year_idx]
                
#Print final well counts ** ADD IN QA/QC with final wells notebook later **
print('IL/IN GHGI total production')
for iyear in np.arange(0,num_years):
    print('Year: ', year_range_str[iyear])
    print('Other Basin Production: ',np.sum(Wellprod_other_ILIN[:,iyear]))
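
The row-by-row accumulation in the cell above sums non-associated gas production (query 1) by state for every basin other than AAPG codes 220, 395, and 430. The same aggregation can be sketched with a boolean mask and `groupby`, as shown below. The function name `other_basin_production` and its arguments are hypothetical, and the sketch assumes the AAPG codes are stored as integers, as in the comparison used in the cell above.

```python
import pandas as pd

def other_basin_production(df, year_cols, named_basins=(220, 395, 430)):
    """Sum gas production by state for query-1 rows outside the named basins."""
    # Keep non-associated gas rows (QUERY_NMBR == 1) that are not in a named basin
    mask = (df['QUERY_NMBR'] == 1) & (~df['AAPG_CODE_ERG'].isin(named_basins))
    # One row per state, one column per year
    return df.loc[mask].groupby('STATE')[year_cols].sum()
```

The result is indexed by state abbreviation, so `result.loc['IL']` and `result.loc['IN']` would correspond to rows 0 and 1 of `Wellprod_other_ILIN`, provided both states have at least one qualifying row.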

#### Step 2.6.2 Read In/Format NEI Values

##### Step 2.6.2.1 Read in all data prior to 2018 (text file format)

In [None]:
#1 Read in relevant files by year (for all years before 2018 [2018 read from different file type])
# Data are in a text file format where each row of data contains the surrogate code, FIPS code, column and row location
# (on the NEI CONUS1 grid), and the absolute, fractional, and running sum of data (e.g., counts or production) in the
# given FIPS region. 
# The absolute data are placed onto the GEPA grid by using an NEI reference map shapefile to map the data location
# from the NEI CONUS grid cell indexes to the corresponding latitude and longitude values in the GEPA grid. 
### Note - the 2016 data from the NEI is on a non-standard grid where lat/lons are unknown. Can change later if needed, or
# can interpolate between years if more accurate

NEI_files = ['/USA_698_NOFILL.txt', '/USA_678_NOFILL.txt', '/USA_696_NOFILL.txt', '/USA_671_NOFILL.txt']
data_names = ['map_NEI_gas_wells', 'map_NEI_gas_completions','map_NEI_gas_production','map_NEI_gas_drilledwells']

for ivar in np.arange(0,len(data_names)):
    vars()[data_names[ivar]] = np.zeros([2,len(Lat_01),len(Lon_01),num_years])

# only recalc the data if required (set in Step 0)
if ReCalc_NEI ==1:
    
    #read in the NEI grid reference shapefile (contains the lat/lons of each NEI coordinate)
    shape = shp.Reader(NEI_grid_ref_inputfile)

    #make the map arrays of absolute values (counts and mcf)
    for ivar in np.arange(0,len(data_names)):
        for iyear in np.arange(0,num_years):
            if year_range_str[iyear] == '2012':
                year = '2011'
            elif year_range_str[iyear] == '2013' or year_range_str[iyear] == '2014' or year_range_str[iyear] == '2015':
                year = '2014'
            elif year_range_str[iyear] == '2016' or year_range_str[iyear] == '2017':
                year = '2017'
            elif year_range_str[iyear] == '2018':
                continue
            else:
                print('NEI DATA MISSING FOR YEAR ',year_range_str[iyear])
                continue    #no NEI vintage assigned to this year, so skip the file read
            path = ERG_NEI_inputloc+year+NEI_files[ivar]
            data_temp = pd.read_csv(path, sep='\t', skiprows = 25)
            data_temp = data_temp.drop(["!"], axis=1)
            data_temp.columns = ['Code','FIPS','COL','ROW','Frac','Abs','FIPS_Total','FIPS_Running_Sum']
            data_temp['Lat'] = np.zeros([len(data_temp)])
            data_temp['Lon'] = np.zeros([len(data_temp)])
            colmin = 1332
            colmax=0
            rowmin = 1548
            rowmax=0
            counter =0
        
            #Create the boundary box
            for idx in np.arange(0,len(data_temp)):
                if str(data_temp['FIPS'][idx]).startswith('17') or str(data_temp['FIPS'][idx]).startswith('18'):
                    icol = data_temp['COL'][idx]
                    irow = data_temp['ROW'][idx]
                    if icol > colmax:
                        colmax =icol
                    if icol < colmin:
                        colmin = icol
                    if irow > rowmax:
                        rowmax = irow
                    if irow < rowmin:
                        rowmin  = irow
            
            #Extract the relevant indices from the NEI reference shapefile
            array_temp = np.zeros([4,((colmax+1-colmin)*(rowmax+1-rowmin))]) #make an array to save col, row, lat, lon
            idx=0
            for rec in shape.iterRecords():
                if (int(rec['cellid'][0:4]) <= colmax and int(rec['cellid'][0:4]) >= colmin) \
                    and (int(rec['cellid'][5:]) <= rowmax and int(rec['cellid'][5:]) >= rowmin):
                        array_temp[0,idx] = int(rec['cellid'][0:4])   #column index
                        array_temp[1,idx] = int(rec['cellid'][5:])    #row index
                        array_temp[2,idx] = rec['Latitude']           #latitude
                        array_temp[3,idx] = rec['Longitude']          #longitude
                        idx +=1
                        #print(idx,int(rec['cellid'][0:4]),int(rec['cellid'][5:]))
    
            #Use this array to locate and assign the lat lon values to the NEI datafile and then place onto grid
            for idx in np.arange(0,len(data_temp)):
                if str(data_temp['FIPS'][idx]).startswith('17') or str(data_temp['FIPS'][idx]).startswith('18'):
                    icol = data_temp['COL'][idx]
                    irow = data_temp['ROW'][idx]
                    match = np.where((icol == array_temp[0,:]) & (irow == array_temp[1,:]))[0][0]
                    #print(match)
                    data_temp.loc[idx,'Lat'] = array_temp[2,match]
                    data_temp.loc[idx,'Lon'] = array_temp[3,match]
                    ilat = int((data_temp['Lat'][idx] - Lat_low)/Res01)
                    ilon = int((data_temp['Lon'][idx] - Lon_left)/Res01)
                    if str(data_temp['FIPS'][idx]).startswith('17'):
                        vars()[data_names[ivar]][0,ilat,ilon,iyear] += data_temp.loc[idx,'Abs']
                    else:
                        vars()[data_names[ivar]][1,ilat,ilon,iyear] += data_temp.loc[idx,'Abs']

    np.save('./IntermediateOutputs/NEI_gaswell_tempoutput', map_NEI_gas_wells)
    np.save('./IntermediateOutputs/NEI_gascomp_tempoutput', map_NEI_gas_completions)
    np.save('./IntermediateOutputs/NEI_gasprod_tempoutput', map_NEI_gas_production)
    np.save('./IntermediateOutputs/NEI_gasdrill_tempoutput', map_NEI_gas_drilledwells)

else:
    map_NEI_gas_wells = np.load('./IntermediateOutputs/NEI_gaswell_tempoutput.npy')
    map_NEI_gas_completions = np.load('./IntermediateOutputs/NEI_gascomp_tempoutput.npy')
    map_NEI_gas_production = np.load('./IntermediateOutputs/NEI_gasprod_tempoutput.npy')
    map_NEI_gas_drilledwells = np.load('./IntermediateOutputs/NEI_gasdrill_tempoutput.npy')
            
            
print('IL/IN NEI totals')
for iyear in np.arange(0,num_years):
    print('Year: ', year_range_str[iyear])
    print('Non Associated (Conv + HF):    ',np.sum(map_NEI_gas_wells[:,:,:,iyear]))
    print('All well gas comp (Conv + HF): ',np.sum(map_NEI_gas_completions[:,:,:,iyear]))
    print('Gas production:                ',np.sum(map_NEI_gas_production[:,:,:,iyear]))
    print('Wells Drilled:                 ',np.sum(map_NEI_gas_drilledwells[:,:,:,iyear]))
    print(' ')

##### Step 2.6.2.2 Read in 2018 data (MS Access data)

In [None]:
#Read in 2018 NEI data from different datafile format
    
if ReCalc_NEI ==1:    
    #Read in the data
    driver_str = r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ='+ERG_NEI_inputloc_2018+';'
    conn = pyodbc.connect(driver_str)
    NEI_2018_ILIN_wells = pd.read_sql("SELECT * FROM 2018_IL_IN_WELLS", conn)
    conn.close()

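    # take a copy so that the in-place edits below do not operate on a view of the original dataframe
    data_temp = NEI_2018_ILIN_wells[(NEI_2018_ILIN_wells['ACTIVE_WELL_FLAG'] ==1) & (NEI_2018_ILIN_wells['WELL_TYPE'] == 'GAS')].copy()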
    data_temp.reset_index(inplace=True, drop=True)
    data_temp.fillna("",inplace=True)

    #find 2018 index
    year_diff = [abs(x - 2018) for x in year_range]
    iyear = year_diff.index(min(year_diff))

    # place data on map for each state (for active wells, production, completions, and drilled wells)
    for iwell in np.arange(0,len(data_temp)):
        ilat = int((data_temp['LATITUDE'][iwell] - Lat_low)/Res01)
        ilon = int((data_temp['LONGITUDE'][iwell] - Lon_left)/Res01)
        if str(data_temp['FIPS_CODE'][iwell]).startswith('17'):# or str(data_temp['FIPS_CODE'][iwell]).startswith('18'):
            istate = 0
        else:
            istate =1
        map_NEI_gas_wells[istate,ilat,ilon,iyear] += 1
        map_NEI_gas_production[istate,ilat,ilon,iyear] += data_temp.loc[iwell,'SUM_GAS']
        if '2018' in data_temp['COMPLETION_DATE'][iwell]:
            map_NEI_gas_completions[istate,ilat,ilon,iyear] += 1
        if '2018' in data_temp['SPUD_DATE'][iwell]:
            map_NEI_gas_drilledwells[istate,ilat,ilon,iyear] += 1

    np.save('./IntermediateOutputs/NEI_gaswell_tempoutput', map_NEI_gas_wells)
    np.save('./IntermediateOutputs/NEI_gascomp_tempoutput', map_NEI_gas_completions)
    np.save('./IntermediateOutputs/NEI_gasprod_tempoutput', map_NEI_gas_production)
    np.save('./IntermediateOutputs/NEI_gasdrill_tempoutput', map_NEI_gas_drilledwells)
else:
    map_NEI_gas_wells = np.load('./IntermediateOutputs/NEI_gaswell_tempoutput.npy')
    map_NEI_gas_completions = np.load('./IntermediateOutputs/NEI_gascomp_tempoutput.npy')
    map_NEI_gas_production = np.load('./IntermediateOutputs/NEI_gasprod_tempoutput.npy')
    map_NEI_gas_drilledwells = np.load('./IntermediateOutputs/NEI_gasdrill_tempoutput.npy')
    
print('IL/IN NEI totals')
for iyear in np.arange(0,num_years):
    print('Year: ', year_range_str[iyear])
    print('Non Associated (Conv + HF):    ',np.sum(map_NEI_gas_wells[:,:,:,iyear]))
    print('All well gas comp (Conv + HF): ',np.sum(map_NEI_gas_completions[:,:,:,iyear]))
    print('Gas production:                ',np.sum(map_NEI_gas_production[:,:,:,iyear]))
    print('Wells Drilled:                 ',np.sum(map_NEI_gas_drilledwells[:,:,:,iyear]))
    print(' ')


#### Step 2.6.4 Scale NEI absolute values to GHGI data
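
The scaling in this step multiplies every NEI grid cell by the ratio sum(GHGI) / sum(NEI), so that the spatial pattern comes from the NEI while the total matches the GHGI activity data. A minimal sketch with made-up numbers is shown below. The `safe_div` helper here is a stand-in written for this example; the behavior of the notebook's `data_fn.safe_div` is not shown in this section and may differ (for example, in what it returns for a zero denominator).

```python
import numpy as np

def safe_div(numerator, denominator):
    """Stand-in helper: return 0 instead of dividing by zero (assumed behavior)."""
    return numerator / denominator if denominator != 0 else 0.0

# Hypothetical NEI well counts for one year, indexed [state, lat, lon]
nei_wells = np.array([[[4.0, 0.0], [6.0, 0.0]],
                      [[0.0, 10.0], [0.0, 0.0]]])
ghgi_total = 5.0   # hypothetical GHGI well count for IL + IN

ratio = safe_div(ghgi_total, nei_wells.sum())   # sum(GHGI) / sum(NEI)
scaled = nei_wells * ratio                      # same spatial pattern, GHGI total

# Closure check: the scaled map should reproduce the GHGI total
closes = abs(scaled.sum() - ghgi_total) < 1e-12
```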

In [None]:
# Scale the absolute NEI data by the corresponding GHGI counts so that the IL/IN data are not over- or under-weighted
# relative to the IL/IN activity data used in the GHGI.
# Without the scaling, the national emissions would likely be overallocated to these two states, as the NEI well and 
# production counts are higher than those used for these states in the current GHGI

#make extra required arrays (HF and Conv will have the same spatial distribution as all gas wells/completions)
map_NEI_gas_wells_conv = map_NEI_gas_wells.copy()
map_NEI_gas_wells_HF = map_NEI_gas_wells.copy()
map_NEI_gas_completions_conv = map_NEI_gas_completions.copy()
map_NEI_gas_completions_HF = map_NEI_gas_completions.copy()

#if ReCalc_NEI ==1:

print('QA/QC: Check that NEI data is scaled to GHGI activity data')
for iyear in np.arange(0,num_years):
    # ratio = sum(GHGI)/ sum(NEI)
    
    #1) conventional gas wells (same spatial distribution as all NEI gas wells)
    ratio_temp = data_fn.safe_div(np.sum(Well_NonAssoc_Conv_ILIN[:,iyear]),np.sum(map_NEI_gas_wells[:,:,:,iyear]))
    map_NEI_gas_wells_conv[:,:,:,iyear] *= ratio_temp
    
    #2) HF gas wells (same spatial distribution as all NEI gas wells)
    ratio_temp = data_fn.safe_div(np.sum(Well_NonAssoc_HF_ILIN[:,iyear]),np.sum(map_NEI_gas_wells[:,:,:,iyear]))
    map_NEI_gas_wells_HF[:,:,:,iyear] *= ratio_temp
    
    # 3) all gas wells
    ratio_temp = data_fn.safe_div(np.sum(Well_NonAssoc_ILIN[:,iyear]),np.sum(map_NEI_gas_wells[:,:,:,iyear]))
    map_NEI_gas_wells[:,:,:,iyear] *= ratio_temp
    
    #4) Conv gas well completions (same spatial distribution as all gas well completions)
    ratio_temp = data_fn.safe_div(np.sum(Well_NonAssoc_Conv_comp_ILIN[:,iyear]),np.sum(map_NEI_gas_completions[:,:,:,iyear]))
    map_NEI_gas_completions_conv[:,:,:,iyear] *= ratio_temp
    
    #5) HF gas well completions (same spatial distribution as all gas well completions)
    ratio_temp = data_fn.safe_div(np.sum(Well_NonAssoc_HF_comp_ILIN[:,iyear]),np.sum(map_NEI_gas_completions[:,:,:,iyear]))
    map_NEI_gas_completions_HF[:,:,:,iyear] *= ratio_temp
    
    #6) all gas well completions
    ratio_temp = data_fn.safe_div(np.sum(Well_Allwell_gas_comp_ILIN[:,iyear]),np.sum(map_NEI_gas_completions[:,:,:,iyear]))
    map_NEI_gas_completions[:,:,:,iyear] *= ratio_temp
    
    #7) gas wells drilled
    ratio_temp = data_fn.safe_div(np.sum(Well_Gaswell_drilled_ILIN[:,iyear]),np.sum(map_NEI_gas_drilledwells[:,:,:,iyear]))
    if pd.isna(ratio_temp):
        ratio_temp = 0    #if there is no GHGI data, but there is NEI data, scale to zero counts
    map_NEI_gas_drilledwells[:,:,:,iyear] *= ratio_temp
    #print(np.sum(map_NEI_gas_drilledwells[:,:,:,iyear]))
    
    #8) Gas production volumes
    ratio_temp = data_fn.safe_div(np.sum(Wellprod_other_ILIN[:,iyear]),np.sum(map_NEI_gas_production[:,:,:,iyear]))
    if pd.isna(ratio_temp):
        ratio_temp = 0     #if there is no GHGI data, but there is NEI data, scale to zero counts
    map_NEI_gas_production[:,:,:,iyear] *= ratio_temp
    #print(ratio_temp)
    
    diff1 = (np.sum(Well_NonAssoc_ILIN[:,iyear]) - np.sum(map_NEI_gas_wells[:,:,:,iyear])) +\
            (np.sum(Well_NonAssoc_Conv_ILIN[:,iyear]) - np.sum(map_NEI_gas_wells_conv[:,:,:,iyear])) +\
            (np.sum(Well_NonAssoc_HF_ILIN[:,iyear]) - np.sum(map_NEI_gas_wells_HF[:,:,:,iyear])) + \
            (np.sum(Well_Allwell_gas_comp_ILIN[:,iyear]) - np.sum(map_NEI_gas_completions[:,:,:,iyear])) +\
            (np.sum(Well_NonAssoc_Conv_comp_ILIN[:,iyear]) - np.sum(map_NEI_gas_completions_conv[:,:,:,iyear])) +\
            (np.sum(Well_NonAssoc_HF_comp_ILIN[:,iyear]) - np.sum(map_NEI_gas_completions_HF[:,:,:,iyear])) + \
            (np.sum(Well_Gaswell_drilled_ILIN[:,iyear]) - np.sum(map_NEI_gas_drilledwells[:,:,:,iyear])) + \
            (np.sum(Wellprod_other_ILIN[:,iyear]) - np.sum(map_NEI_gas_production[:,:,:,iyear]))
    
    if abs(diff1) < 1e-12:
        print('Year ', year_range_str[iyear],":","PASS")
    else:
        print('Year ', year_range_str[iyear],":","CHECK", diff1)
    
    print('Non Associated (Conv + HF):    ',np.sum(map_NEI_gas_wells[:,:,:,iyear]))
    print('Non Associated (Conv):         ',np.sum(map_NEI_gas_wells_conv[:,:,:,iyear]))
    print('Non Associated (HF):           ',np.sum(map_NEI_gas_wells_HF[:,:,:,iyear]))
    print('All well gas comp (Conv + HF): ',np.sum(map_NEI_gas_completions[:,:,:,iyear]))
    print('All well gas comp (Conv):      ',np.sum(map_NEI_gas_completions_conv[:,:,:,iyear]))
    print('All well gas comp (HF):        ',np.sum(map_NEI_gas_completions_HF[:,:,:,iyear]))
    print('Gas production:                ',np.sum(map_NEI_gas_production[:,:,:,iyear]))
    print('Wells Drilled:                 ',np.sum(map_NEI_gas_drilledwells[:,:,:,iyear]))
    print(' ')

#### Step 2.6.5 Correct Lease Condensate Data for IL/IN
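
In this step the state lease condensate total is spread over the gridded wells as (condensate per well) × (wells in each cell). A small sketch of that arithmetic for a single state and year is given below; the well counts and the state total are invented for illustration. The closure check compares absolute differences, so that both over- and under-allocation are flagged.

```python
import numpy as np

wells = np.array([[2.0, 0.0],
                  [1.0, 1.0]])      # hypothetical scaled well counts, [lat, lon]
state_condensate = 100.0            # hypothetical state lease condensate total

total_wells = wells.sum()
cond_per_well = state_condensate / total_wells if total_wells > 0 else 0.0
condensate_map = wells * cond_per_well

# Two-sided closure check against the state total
diff = condensate_map.sum() - state_condensate
status = 'PASS' if abs(diff) < 1e-9 else 'CHECK'
print(status)
```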

In [None]:
#Calculate the lease condensate production in IL/IN
# based on number of wells * condensate/well ratio. 
# Number of wells have already been scaled to GHGI totals, so no further scaling is required here

map_NEI_LeaseC = np.zeros([len(Lat_01),len(Lon_01),num_years])
inems = 0 #both IL/IN in northeast region

print('Check condensate production levels match corrected EIA data')
for iyear in np.arange(0,num_years):
    #Calc IL condensate (from condensate/well * wells)
    istate = np.where(State_ANSI['abbr'] == 'IL')[0][0]
    cond_per_well = data_fn.safe_div((state_cond_prod[inems,istate,iyear]),float(np.sum(map_NEI_gas_wells[0,:,:,iyear])))
    map_NEI_LeaseC[:,:,iyear] += map_NEI_gas_wells[0,:,:,iyear]*cond_per_well
    #Calc IN condensate (from condensate/well * wells)
    istate = np.where(State_ANSI['abbr'] == 'IN')[0][0]
    cond_per_well = data_fn.safe_div((state_cond_prod[inems,istate,iyear]),float(np.sum(map_NEI_gas_wells[1,:,:,iyear])))
    map_NEI_LeaseC[:,:,iyear] += map_NEI_gas_wells[1,:,:,iyear]*cond_per_well

    diff = np.sum(map_NEI_LeaseC[:,:,iyear])-np.sum(state_cond_prod[inems,13:15,iyear])
    if abs(diff) < 1e-12:
        print('Year', year_range_str[iyear], ': PASS')
    else:
        print('Year', year_range_str[iyear], ': CHECK', diff)

#### Step 2.6.6 Add the NEI data to the relevant Enverus Proxy Maps
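
Because the IL/IN data carry no monthly information, each annual value is split evenly, with 1/12 added to every month. The sketch below shows the same allocation written with NumPy broadcasting instead of an explicit year/month loop. Array sizes and values are arbitrary placeholders, and the array names are not the notebook's.

```python
import numpy as np

num_years, num_months = 2, 12
rng = np.random.default_rng(0)
annual = rng.random((3, 4, num_years))               # hypothetical annual map, [lat, lon, year]
monthly = np.zeros((3, 4, num_years, num_months))    # proxy map, [lat, lon, year, month]

# A trailing axis of length 1 broadcasts across all 12 months
monthly += annual[..., np.newaxis] / num_months

# Summing over months should recover the annual values
recovered = monthly.sum(axis=-1)
```

As the note in the cell below points out, an in-place `+=` like this is not idempotent, so the proxy maps have to be reloaded before the cell is rerun.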

In [None]:
#3. add maps to relevant Enverus maps
# add absolute values to the Enverus maps above (then the weighted calculations below can remain unchanged)
# The same values are assigned to each month (e.g., no temporal resolution is applied to IL or IN data)
# NOTE: Proxy maps need to be reloaded if this code is run more than once
inems =0
for iyear in np.arange(0,num_years):
    for imonth in np.arange(0,num_months):
        Map_EnvAllwell[inems,:,:,iyear,imonth] += (1/12)*(map_NEI_gas_wells[0,:,:,iyear]+map_NEI_gas_wells[1,:,:,iyear])
        Map_EnvNonAssocProd[inems,:,:,iyear,imonth] += (1/12)*(map_NEI_gas_production[0,:,:,iyear]+map_NEI_gas_production[1,:,:,iyear])
        Map_EnvBasinOther[:,:,iyear,imonth] += (1/12)*(map_NEI_gas_production[0,:,:,iyear]+map_NEI_gas_production[1,:,:,iyear])
        Map_EnvLeaseC[inems,:,:,iyear,imonth] += (1/12)*(map_NEI_LeaseC[:,:,iyear])
        Map_EnvNonAssoc_HF[inems,:,:,iyear,imonth] += (1/12)*(map_NEI_gas_wells_HF[0,:,:,iyear]+map_NEI_gas_wells_HF[1,:,:,iyear])
        Map_EnvNonAssoc_Conv[inems,:,:,iyear,imonth] += (1/12)*(map_NEI_gas_wells_conv[0,:,:,iyear]+map_NEI_gas_wells_conv[1,:,:,iyear])
        Map_EnvNonAssocExp_HF_comp[inems,:,:,iyear,imonth] += (1/12)*(map_NEI_gas_completions_HF[0,:,:,iyear]+map_NEI_gas_completions_HF[1,:,:,iyear])
        Map_EnvNonAssocExp_Conv_comp[inems,:,:,iyear,imonth] += (1/12)*(map_NEI_gas_completions_conv[0,:,:,iyear]+map_NEI_gas_completions_conv[1,:,:,iyear])
        Map_EnvGasWellExp_drilled[inems,:,:,iyear,imonth] += (1/12)*(map_NEI_gas_drilledwells[0,:,:,iyear]+map_NEI_gas_drilledwells[1,:,:,iyear])

#### Step 2.7. Save Map_EnvAllwell for use in Transmission notebook

In [None]:
#First combine all NEMS and monthly data into national annual well counts (both on and off grid)
fileloc = '../GEPA_Gas_Transmission/InputData/Map_Enverus_NAGasWellLocations_ongrid.nc'
Map_output = np.zeros([len(Lat_01),len(Lon_01),num_years])
Map_output_nongrid = np.zeros([num_years])

for iyear in np.arange(0,num_years):
    for imonth in np.arange(0,num_months):
        for inems in np.arange(0,num_regions-1):
            Map_output[:,:,iyear] += Map_EnvAllwell[inems,:,:,iyear,imonth]#Map_EnvAllwell[inems,ilat,ilon,iyear,imonth]   
            Map_output_nongrid[iyear] += Map_EnvAllwell_nongrid[inems,iyear,imonth]

data_IO_fn.initialize_netCDF(fileloc, 'Non-Associated Gas Well Locations', 0, year_range, loc_dimensions, Lat_01, Lon_01)

# Write the OnGrid Data to netCDF
nc_out = Dataset(fileloc, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Map_output
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded Non-Associated Gas Wells saved to file: {}".format(fileloc))
print('')

#Write the Off-grid data to a csv
outfile = pd.DataFrame(Map_output_nongrid)
outfile.to_csv('../GEPA_Gas_Transmission/InputData/Map_Enverus_NAGasWellLocations_offgrid.csv')

#### Step 2.8. Make G&B Station Counts Map
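
The station map is built by binning each station's latitude/longitude into a 0.1° cell and counting stations that fall outside the domain separately. A vectorized sketch of that binning is shown below, as an alternative to looping over rows. The grid bounds and coordinates are hypothetical. `np.add.at` is used because a plain fancy-indexed `+=` would count a cell only once when several stations share it.

```python
import numpy as np

# Hypothetical domain (the notebook sets these in Step 0)
LAT_LOW, LAT_UP, LON_LEFT, LON_RIGHT, RES = 20.0, 55.0, -130.0, -60.0, 0.1
n_lat = int(round((LAT_UP - LAT_LOW) / RES))
n_lon = int(round((LON_RIGHT - LON_LEFT) / RES))

# Hypothetical station coordinates (the third one lies outside the domain)
lats = np.array([35.21, 35.23, 64.80, 41.05])
lons = np.array([-101.83, -101.87, -147.70, -95.53])

on_grid = (lons > LON_LEFT) & (lons < LON_RIGHT) & (lats > LAT_LOW) & (lats < LAT_UP)
ilat = ((lats[on_grid] - LAT_LOW) / RES).astype(int)
ilon = ((lons[on_grid] - LON_LEFT) / RES).astype(int)

station_map = np.zeros((n_lat, n_lon))
np.add.at(station_map, (ilat, ilon), 1)      # accumulates correctly for repeated indices
off_grid_count = int((~on_grid).sum())
```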

In [None]:
# 1) Read In Enverus Gathering Compressor Station Data (pre-processed in ArcMap)
# 2) Make a proxy map with compressor station locations

Env_GathStation_loc = pd.read_excel(Enverus_NG_GBstations_inputfile, usecols= "AE,AH,AI,AJ", header = 0)
Map_EnvGB_stations = np.zeros([len(Lat_01),len(Lon_01),num_years]) #data represent a snapshot in time that is applied to entire timeseries
Map_EnvGB_stations_nongrid = np.zeros([num_years])

for istation in np.arange(0,len(Env_GathStation_loc)):
    if Env_GathStation_loc['Longitude'][istation] > Lon_left and Env_GathStation_loc['Longitude'][istation] < Lon_right \
        and Env_GathStation_loc['Latitude'][istation] > Lat_low and Env_GathStation_loc['Latitude'][istation] < Lat_up:
        ilat = int((Env_GathStation_loc['Latitude'][istation] - Lat_low)/Res01)
        ilon = int((Env_GathStation_loc['Longitude'][istation] - Lon_left)/Res01)
        Map_EnvGB_stations[ilat,ilon,0] += 1
    else:
        Map_EnvGB_stations_nongrid[0] += 1
    #print('Year: ',year_range[iyear])    
print('Total Gathering Compressor Stations on grid: ',np.sum(Map_EnvGB_stations[:,:,0]))
print('Total Gathering Compressor Stations off grid: ',np.sum(Map_EnvGB_stations_nongrid))

#apply the same proxy to all years
for iyear in np.arange(1,num_years):
    Map_EnvGB_stations[:,:,iyear] = Map_EnvGB_stations[:,:,0]
    Map_EnvGB_stations_nongrid[iyear] = Map_EnvGB_stations_nongrid[0]

#### Step 2.9 Read In G&B Pipeline Miles
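
Part of this step is to estimate the share of gathering pipeline mileage that lies in AK and HI, since that share of the national emissions cannot be placed on the CONUS grid. The arithmetic can be sketched as follows. All numbers are invented, and the variable names are chosen for this example (the notebook stores the AK+HI share in `CONUS_ratio`).

```python
# Hypothetical mileages; the notebook sums the MILES field of two shapefiles
conus_miles = 380_000.0
akhi_miles = 20_000.0

# Share of total pipeline mileage located outside the CONUS grid
akhi_fraction = akhi_miles / (conus_miles + akhi_miles)

national_gb_pipeline_emis = 200.0                 # hypothetical national total (kt)
not_mapped = national_gb_pipeline_emis * akhi_fraction
mapped = national_gb_pipeline_emis - not_mapped   # amount left to distribute on the grid
```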

In [None]:
# 1) Read In Enverus Gathering Pipeline Data (pre-processed in ArcMap)
# 2) Make a proxy map with the length of pipeline in each grid cell
# 3) Calculate the ratio of GB infrastructure in AK & HI compared to the national onshore total

#Step 1)
Env_GathPipelines_loc = pd.read_excel(Enverus_NG_GBpipeline_inputfile, usecols= "C:G", header = 0)
Map_EnvGB_pipelines = np.zeros([len(Lat_01),len(Lon_01),num_years]) #data represent a snapshot in time that is applied to entire timeseries
Map_EnvGB_pipelines_nongrid = np.zeros([num_years])

#allocation is based on the relative pipeline length in each grid cell (pre-processed in ArcGIS)
# Note that the sum mileage in each grid cell != original dataset mileage due to changes when data was projected

# Step 2)
for iloc in np.arange(0,len(Env_GathPipelines_loc)):
    if Env_GathPipelines_loc['Longitude'][iloc] > Lon_left and Env_GathPipelines_loc['Longitude'][iloc] < Lon_right \
        and Env_GathPipelines_loc['Latitude'][iloc] > Lat_low and Env_GathPipelines_loc['Latitude'][iloc] < Lat_up:
        ilat = int((Env_GathPipelines_loc['Latitude'][iloc] - Lat_low)/Res01)
        ilon = int((Env_GathPipelines_loc['Longitude'][iloc] - Lon_left)/Res01)
        Map_EnvGB_pipelines[ilat,ilon,0] += Env_GathPipelines_loc['SUM_Shape_'][iloc]
    else:
        Map_EnvGB_pipelines_nongrid[0] += Env_GathPipelines_loc['SUM_Shape_'][iloc]
    #print('Year: ',year_range[iyear])    
print('Total Gathering Pipeline length on grid: ',np.sum(Map_EnvGB_pipelines[:,:,0]))
print('Total Gathering Pipeline length off grid: ',np.sum(Map_EnvGB_pipelines_nongrid))

#apply the same proxy to all years
for iyear in np.arange(1,num_years):
    Map_EnvGB_pipelines[:,:,iyear] = Map_EnvGB_pipelines[:,:,0]
    Map_EnvGB_pipelines_nongrid[iyear] = Map_EnvGB_pipelines_nongrid[0]
    
#Step 3)
#1. Open Gathering_pipelines_wgs.shp
#2. sum the miles field
#3. Open Gathering_pipelines_AKHI_wgs84.shp
#4. sum the miles field
#5. Ratio the AKHI miles / (conus + AKHI miles)
#6. Apply this ratio and subtract from the GB pipeline fields (save this fraction as 'not_mapped')

#There are no Gathering Compressor Stations in AK or HI so no subtractions necessary for the remaining G&B emissions

shape = shp.Reader(AKHI_pipelines_shp)
AKHI_miles = 0
for rec in shape.iterRecords():
    AKHI_miles += rec['MILES']
#print(AKHI_miles)

shape = shp.Reader(CONUS_pipelines_shp)
CONUS_miles = 0
for rec in shape.iterRecords():
    CONUS_miles += rec["MILES"]
#print(CONUS_miles)

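#apply this fraction and subtract from the GB pipeline emissions in step 4
# NOTE: despite its name, CONUS_ratio is the AK+HI share of the total (CONUS + AK + HI) pipeline miles
CONUS_ratio = AKHI_miles/(CONUS_miles+AKHI_miles)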
#print(CONUS_ratio)

----------------
## Step 3. Read In EPA Data
---------------

#### Step 3.1 Natural Gas Systems

In [None]:
# Read In EPA Production/Exploration Emissions by NEMS Region
# Note that these emissions do not include potential emissions reductions due to GasSTAR
# Therefore, relevant sub-sector emissions in each region are scaled equally to account for the 
# national level GasSTAR reductions (the same way as done in the national GHGI)

# Emissions are in units of Mg (= 1x10-6 Tg)

names = pd.read_excel(EPA_NG_prod_inputfile, sheet_name = "Production Sector _ Emissions", usecols = "A:AG", skiprows = 3, header = 0, nrows = 1)
colnames = names.columns.values
EPA_emi_prod_NG_CH4 = pd.read_excel(EPA_NG_prod_inputfile, sheet_name = "Production Sector _ Emissions", usecols = "A:AG", skiprows = 3, names = colnames, nrows = 460)
EPA_emi_prod_NG_CH4= EPA_emi_prod_NG_CH4.drop(columns = ['Unnamed: 0', 'Unnamed: 3'])
EPA_emi_prod_NG_CH4.rename(columns={EPA_emi_prod_NG_CH4.columns[0]:'Region'}, inplace=True)
EPA_emi_prod_NG_CH4.rename(columns={EPA_emi_prod_NG_CH4.columns[1]:'Source'}, inplace=True)
EPA_emi_prod_NG_CH4 = EPA_emi_prod_NG_CH4.fillna('')
EPA_emi_prod_NG_CH4 = EPA_emi_prod_NG_CH4.drop(columns = [*range(1990, start_year,1)])
# strip parentheses and square brackets from the source names
# (regex=True is set explicitly because the default for str.replace differs between pandas versions)
EPA_emi_prod_NG_CH4['Source']= EPA_emi_prod_NG_CH4['Source'].str.replace(r"[()\[\]]", "", regex=True)
EPA_emi_prod_NG_CH4.reset_index(inplace=True, drop=True)
region_idx_NG_CH4 = EPA_emi_prod_NG_CH4.index[EPA_emi_prod_NG_CH4['Region']!=''].tolist()  #find the index at the start of each region

print('EPA GHGI Emissions w/out Reductions (Mg)')
display(EPA_emi_prod_NG_CH4)



##### Step 3.1.2. Read in Total NG Emissions (Production + Exploration) (kt)

In [None]:
# Read in total production + exploration emissions (with methane reductions accounted for)
# data are in kt

names = pd.read_excel(EPA_NG_prod_inputfile, sheet_name = "SUMMARY CH4", usecols = "A:AD", skiprows = 10, header = 0, nrows = 1)
colnames = names.columns.values
EPA_emi_total_NG_CH4 = pd.read_excel(EPA_NG_prod_inputfile, sheet_name = "SUMMARY CH4", usecols = "A:AD", skiprows = 17, names = colnames, nrows = 5)
EPA_emi_total_NG_CH4.rename(columns={EPA_emi_total_NG_CH4.columns[0]:'Source'}, inplace=True)
EPA_emi_total_NG_CH4 = EPA_emi_total_NG_CH4.drop(columns = [*range(1990, start_year,1)])
EPA_emi_total_NG_CH4.reset_index(inplace=True, drop=True)

print("EPA GHGI Emissions with Reductions (kt)")
display(EPA_emi_total_NG_CH4)

##### Step 3.1.3 Read in and Format NG GasSTAR Reductions (kt)

In [None]:
# Read in and format Gas STAR reductions data (units of Mg, converted here to kt)
# For NG CH4, current reductions include those for Gas Engines, Compressor Starts, and 'Other'

# get column names from top of spreadsheet
col_range = 'A:AG'
names = pd.read_excel(EPA_NG_prod_inputfile, sheet_name = "Gas STAR Reductions", usecols = col_range, skiprows = 5, header = 0, nrows = 1)
colnames = names.columns.values

# Load full Gas STAR page and save required reductions
EPA_Gas_STAR_NG_CH4 = pd.read_excel(EPA_NG_prod_inputfile, sheet_name = "Gas STAR Reductions", usecols = col_range, skiprows = 6, names = colnames, nrows = 50)
EPA_Gas_STAR_NG_CH4 = EPA_Gas_STAR_NG_CH4.fillna('')

EPA_emi_red_NG_CH4 = EPA_Gas_STAR_NG_CH4[EPA_Gas_STAR_NG_CH4['Unnamed: 0'].str.contains('Gas Engines|Compressor Starts|Reduction: Other|Scaling factor')]
EPA_emi_red_NG_CH4= EPA_emi_red_NG_CH4.drop(columns = ['Unnamed: 0', 'Unnamed: 1', 'Unnamed: 3'])
EPA_emi_red_NG_CH4 = EPA_emi_red_NG_CH4.drop(columns = [*range(1990, start_year,1)])
EPA_emi_red_NG_CH4.reset_index(inplace=True, drop = True)
print('EPA GHGI Gas STAR Reductions (row 0-2 in Mg, row 3 in %):')
display(EPA_emi_red_NG_CH4)

#Calculate Reductions for Non-Associated Gas production sources
Emi_red_NonAssoc_NG_CH4 = EPA_emi_red_NG_CH4[EPA_emi_red_NG_CH4['Source'].str.contains('Gas Engines|Compressor Starts')]
start_year_idx = Emi_red_NonAssoc_NG_CH4.columns.get_loc(start_year)
Emi_red_NonAssoc_NG_CH4 = Emi_red_NonAssoc_NG_CH4.iloc[:,start_year_idx:].sum(axis=0)/float(1000) #convert to kt
Emi_red_NonAssoc_total_NG_CH4 = Emi_red_NonAssoc_NG_CH4
print('EPA GHGI Non-Assoc. Reductions (kt):')
display(Emi_red_NonAssoc_total_NG_CH4)

#Calculate reduction for 'Other' production sources
Emi_red_Other_NG_CH4 = EPA_emi_red_NG_CH4.iloc[2,start_year_idx:]/float(1000) #convert Mg to kt
Emi_red_scale_NG_CH4 = EPA_emi_red_NG_CH4.iloc[3,start_year_idx:] #read in scaling factor for 'Other' reductions
Emi_red_Other_total_NG_CH4 = Emi_red_Other_NG_CH4*Emi_red_scale_NG_CH4   #scale 'Other' reductions
print('EPA GHGI Other Reductions (kt):')
display(Emi_red_Other_total_NG_CH4)



##### Step 3.1.4. Read In and Format NG Regulation Reductions (kt)
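
Once the reductions are read in, they are applied as a single national fraction: the share of the national total that remains after reductions is computed per year, and every region is multiplied by that share. A minimal pandas sketch of this idea is shown below. The table layout, region names, and values are hypothetical and much simpler than the GHGI workbook.

```python
import pandas as pd

# Hypothetical regional emissions (Mg); the last row is the national total
emis = pd.DataFrame({2012: [60.0, 40.0, 100.0],
                     2013: [30.0, 70.0, 100.0]},
                    index=['Region A', 'Region B', 'Total'])
reduction = pd.Series({2012: 20.0, 2013: 10.0})   # hypothetical national reductions (Mg)

# Fraction of national emissions remaining after reductions, per year
remaining_frac = (emis.loc['Total'] - reduction) / emis.loc['Total']

# Multiplying a DataFrame by a Series aligns the Series index with the columns (years),
# so every region receives the same fractional reduction
corrected = emis * remaining_frac
```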

In [None]:
# Read in and format Regulation reductions data (units of Mg, converted here to kt)
# Then scale EPA national emissions (for all NEMS regions) by the appropriate reductions
# For NG CH4, current reductions include those for 'Dehydrator Vents'

# get column names from top of spreadsheet
col_range = 'A:AG'
names = pd.read_excel(EPA_NG_prod_inputfile, sheet_name = "Regulations Reductions", usecols = col_range, skiprows = 5, header = 0, nrows = 1)
colnames = names.columns.values

# Load full Reduction Regulations page and save required reductions
EPA_RegRed_NG_CH4 = pd.read_excel(EPA_NG_prod_inputfile, sheet_name = "Regulations Reductions", usecols = col_range, skiprows = 7, names = colnames, nrows = 50)
EPA_RegRed_NG_CH4 = EPA_RegRed_NG_CH4.fillna('')

EPA_emi_regred_NG_CH4 = EPA_RegRed_NG_CH4[EPA_RegRed_NG_CH4['Source'].str.contains('Dehydrator Vents')]
EPA_emi_regred_NG_CH4= EPA_emi_regred_NG_CH4.drop(columns = ['Unnamed: 0', 'Unnamed: 1', 'Unnamed: 3'])
EPA_emi_regred_NG_CH4 = EPA_emi_regred_NG_CH4.drop(columns = [*range(1990, start_year,1)])
EPA_emi_regred_NG_CH4.reset_index(inplace=True, drop = True)
print('EPA GHGI Gas Regulation Reductions (Mg):')
display(EPA_emi_regred_NG_CH4)

# FOR THE 2021 INVENTORY ONLY - Remove 2019 Reg Reductions due to small error in Inventory Workbook (delete this line in future iterations)
#EPA_emi_regred_NG_CH4.loc[0,2019] = 0
##

# Calculate Reductions for Non-Associated Gas production sources
# Add reduction regulations to the Gas STAR reductions
start_year_idx = EPA_emi_regred_NG_CH4.columns.get_loc(start_year)
Emi_red_NonAssoc_total_NG_CH4 += (EPA_emi_regred_NG_CH4.iloc[:,start_year_idx:].sum(axis=0)/float(1000)) #convert Mg to kt
print('EPA GHGI TOTAL Non-Assoc. Reductions (kt):')
print(Emi_red_NonAssoc_total_NG_CH4)


In [None]:
# Apply GasSTAR and Regulation reductions to emissions from 
# Gas Engines, Compressor Starts, and Dehydrator Vents. 
# The fraction of total national emissions remaining (after reductions) is 
# calculated and applied to each NEMS region (so that each region has the same fractional reduction)
#NOTE: Other production sector emission reductions will be applied to emissions at a later step. 
#units in Mg

print('Corrected Regional and Total Emissions (Mg)')
#correct gas engine emissions (find the fractional reduction of the national total 
# and apply that reduction fraction to each NEMS region)
emi_temp = EPA_emi_prod_NG_CH4[EPA_emi_prod_NG_CH4['Source'] == 'Gas Engines']
red_temp = EPA_emi_red_NG_CH4[EPA_emi_red_NG_CH4['Source'].str.contains('Gas Engines')]
red_frac = (emi_temp.iloc[-1,2:] - red_temp.iloc[0,1:])/emi_temp.iloc[-1,2:]
emi_update = emi_temp.iloc[:,2:] * red_frac
for iyear in np.arange(0,num_years):
    EPA_emi_prod_NG_CH4.loc[EPA_emi_prod_NG_CH4['Source']=='Gas Engines',year_range[iyear]] = emi_update.loc[:,year_range[iyear]]
display(EPA_emi_prod_NG_CH4[EPA_emi_prod_NG_CH4['Source'] == 'Gas Engines'])

#correct compressor start emissions
emi_temp = EPA_emi_prod_NG_CH4[EPA_emi_prod_NG_CH4['Source'] == 'Compressor Starts']
red_temp = EPA_emi_red_NG_CH4[EPA_emi_red_NG_CH4['Source'].str.contains('Compressor Starts')]
red_frac = (emi_temp.iloc[-1,2:] - red_temp.iloc[0,1:])/emi_temp.iloc[-1,2:]
emi_update = emi_temp.iloc[:,2:] * red_frac
for iyear in np.arange(0,num_years):
    EPA_emi_prod_NG_CH4.loc[EPA_emi_prod_NG_CH4['Source']=='Compressor Starts',year_range[iyear]] = emi_update.loc[:,year_range[iyear]]
display(EPA_emi_prod_NG_CH4[EPA_emi_prod_NG_CH4['Source'] == 'Compressor Starts'])

#Subtract Dehydrator vent regulation reductions
emi_temp = EPA_emi_prod_NG_CH4[EPA_emi_prod_NG_CH4['Source'] == 'Dehydrator Vents']
red_temp = EPA_emi_regred_NG_CH4[EPA_emi_regred_NG_CH4['Source'].str.contains('Dehydrator Vents')]
red_frac = (emi_temp.iloc[-1,2:] - red_temp.iloc[0,1:])/emi_temp.iloc[-1,2:]
emi_update = emi_temp.iloc[:,2:] * red_frac
for iyear in np.arange(0,num_years):
    EPA_emi_prod_NG_CH4.loc[EPA_emi_prod_NG_CH4['Source']=='Dehydrator Vents',year_range[iyear]] = emi_update.loc[:,year_range[iyear]]
display(EPA_emi_prod_NG_CH4[EPA_emi_prod_NG_CH4['Source'] == 'Dehydrator Vents'])

##### Step 3.1.5. Split Emissions into Gridding Groups (each Group will have the same proxy applied during the gridding)
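
Each gridding group is defined by a list of GHGI source names. The cell below joins those names into one regular-expression pattern, selects the matching rows, and sums them by year. A simplified sketch of that selection is shown here. The table, the group definition, and the values are invented, and `re.escape` is an addition for this example (the notebook instead strips brackets and parentheses from the source names in Step 3.1).

```python
import re
import pandas as pd

# Hypothetical emissions table (Mg) for one region
emis = pd.DataFrame({
    'Source': ['Gas Engines', 'Compressor Starts', 'Dehydrator Vents', 'Gas Engines'],
    2012: [1000.0, 500.0, 250.0, 2000.0],
})
group_sources = ['Gas Engines', 'Compressor Starts']   # hypothetical group definition

# One alternation pattern that matches any source in the group
pattern = '|'.join(re.escape(s) for s in group_sources)
in_group = emis['Source'].str.contains(pattern)

group_total_kt = emis.loc[in_group, 2012].sum() / 1000.0   # convert Mg to kt
```

Note that `str.contains` matches substrings, so a short source name can also select longer names that contain it.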

In [None]:
# Final Emissions in Units of kt
# Use mapping proxy and source files to split the NEMS-specific GHGI emissions (note some regions don't have NEMS emissions)
region_idx = region_idx_NG_CH4
EPA_emi_prod = EPA_emi_prod_NG_CH4
start_year_idx = EPA_emi_prod.columns.get_loc(start_year)
end_year_idx = EPA_emi_prod.columns.get_loc(end_year)+1
red_frac = np.zeros(num_years)
sum_emi_prod = np.zeros(num_years)
sum_emi_expl = np.zeros(num_years)
sum_emi2 = np.zeros(num_years)

ghgi_prod_groups = ghgi_prod_map['GHGI_Emi_Group'].unique()

DEBUG = 1

for igroup in np.arange(0,len(ghgi_prod_groups)): #loop through all groups, finding the GHGI sources in that group and summing emissions for that region, year
    if proxy_prod_map.loc[proxy_prod_map['GHGI_Emi_Group'] == ghgi_prod_groups[igroup], 'NEMS_Data'].values ==1: 
        vars()[ghgi_prod_groups[igroup]] = np.zeros([num_regions-1,num_years])
        source_temp = ghgi_prod_map.loc[ghgi_prod_map['GHGI_Emi_Group'] == ghgi_prod_groups[igroup], 'GHGI_Source']
        pattern_temp  = '|'.join(source_temp) 
        for iregion in np.arange(0,len(region_idx)-1):
            EPA_emi_prod_region = EPA_emi_prod_NG_CH4.loc[region_idx[iregion]:region_idx[iregion+1],] 
            if iregion < num_regions-1:
                emi_temp = EPA_emi_prod_region[EPA_emi_prod_region['Source'].str.contains(pattern_temp)]
                vars()[ghgi_prod_groups[igroup]][iregion,:] = np.where(emi_temp.iloc[:,start_year_idx:] =='',[0],emi_temp.iloc[:,start_year_idx:]).sum(axis=0)/float(1000) #convert Mg to kt
    elif proxy_prod_map.loc[proxy_prod_map['GHGI_Emi_Group'] == ghgi_prod_groups[igroup], 'NEMS_Data'].values ==2: #indicates offshore region
        vars()[ghgi_prod_groups[igroup]] = np.zeros([num_years])
        source_temp = ghgi_prod_map.loc[ghgi_prod_map['GHGI_Emi_Group'] == ghgi_prod_groups[igroup], 'GHGI_Source']
        pattern_temp  = '|'.join(source_temp) 
        EPA_emi_prod_region = EPA_emi_prod_NG_CH4.loc[region_idx[num_regions-1]:region_idx[num_regions],] #offshore region
        EPA_emi_prod_region.reset_index(inplace=True, drop=True)
        mjr_idx = EPA_emi_prod_region.index[EPA_emi_prod_region['Source']=='Major Complexes'].values[0]
        min_idx = EPA_emi_prod_region.index[EPA_emi_prod_region['Source']=='Minor Complexes'].values[0]
        flr_idx = EPA_emi_prod_region.index[EPA_emi_prod_region['Source']=='GOM Federal Waters Flaring'].values[0]
        sta_idx = EPA_emi_prod_region.index[EPA_emi_prod_region['Source']=='Offshore GOM State Waters'].values[0]
        ak_idx = EPA_emi_prod_region.index[EPA_emi_prod_region['Source']=='Offshore Alaska State Waters'].values[0]
        if 'Major' in ghgi_prod_groups[igroup]:
            EPA_emi_prod_region = EPA_emi_prod_region.iloc[mjr_idx+1:min_idx,:] #skip the first row with zero data in this region
        elif 'Minor' in ghgi_prod_groups[igroup]:
            EPA_emi_prod_region = EPA_emi_prod_region.iloc[min_idx+1:flr_idx,:] #skip the first row with zero data in this region
        elif 'Offshore_Both' in ghgi_prod_groups[igroup]:
            EPA_emi_prod_region = EPA_emi_prod_region.iloc[flr_idx:flr_idx+1,:]
        elif 'StateGOM' in ghgi_prod_groups[igroup]:
            EPA_emi_prod_region = EPA_emi_prod_region.iloc[sta_idx+1:ak_idx,:]
        else:
            EPA_emi_prod_region = EPA_emi_prod_region.iloc[ak_idx:-1,:]
        EPA_emi_prod_region.reset_index(inplace=True, drop=True)
        emi_temp = EPA_emi_prod_region[EPA_emi_prod_region['Source'].str.contains(pattern_temp)]
        vars()[ghgi_prod_groups[igroup]][:] = np.where(emi_temp.iloc[:,start_year_idx:] =='',[0],emi_temp.iloc[:,start_year_idx:]).sum(axis=0)/float(1000) #convert Mg to kt
        
    else: #for non-offshore groups that don't have NEMS data...
        vars()[ghgi_prod_groups[igroup]] = np.zeros([num_years])
        source_temp = ghgi_prod_map.loc[ghgi_prod_map['GHGI_Emi_Group'] == ghgi_prod_groups[igroup], 'GHGI_Source']
        pattern_temp  = '|'.join(source_temp)
        EPA_emi_prod_region = EPA_emi_prod.loc[region_idx[num_regions]+1:,] 
        EPA_emi_prod_region.reset_index(inplace=True, drop=True)
        if 'Emi_Gath' in ghgi_prod_groups[igroup]:
            gb_idx = EPA_emi_prod_region.index[EPA_emi_prod_region['Source']=='Gathering and Boosting'].values[0]
            EPA_emi_prod_region = EPA_emi_prod_region.iloc[gb_idx+2:,:] #skip the first two rows with zero data in this region
            EPA_emi_prod_region.reset_index(inplace=True, drop=True) 
        emi_temp = EPA_emi_prod_region[EPA_emi_prod_region['Source'].str.contains(pattern_temp)]
        vars()[ghgi_prod_groups[igroup]][:] = np.where(emi_temp.iloc[:,start_year_idx:] =='',[0],emi_temp.iloc[:,start_year_idx:]).sum(axis=0)/float(1000) #convert Mg to kt


#at this point, only specific reductions have been applied, not the general 'other' category (so need to add back into total) 
print('QA/QC #1: Check Exploration and Production Emission Sum against GHGI Summary Emissions (pre-reductions)')
for iyear in np.arange(0,num_years): 
    for igroup in np.arange(0,len(ghgi_prod_groups)):
        if proxy_prod_map.loc[proxy_prod_map['GHGI_Emi_Group'] == ghgi_prod_groups[igroup], 'NEMS_Data'].values ==1:
            if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpWell' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpHFComp' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpConvComp' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_GasWellExpDrilled': 
                sum_emi_expl[iyear] += np.sum(vars()[ghgi_prod_groups[igroup]][:,iyear])
            else:
                sum_emi_prod[iyear] += np.sum(vars()[ghgi_prod_groups[igroup]][:,iyear])
        else:
            if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpWell' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpHFComp' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpConvComp' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_GasWellExpDrilled':
                sum_emi_expl[iyear] += vars()[ghgi_prod_groups[igroup]][iyear]
            else:
                sum_emi_prod[iyear] += vars()[ghgi_prod_groups[igroup]][iyear]
        
    summary_emi = EPA_emi_total_NG_CH4.iloc[0,iyear+1]+EPA_emi_total_NG_CH4.iloc[1,iyear+1] \
                + Emi_red_Other_total_NG_CH4.iloc[iyear] #+Emi_red_NonAssoc_total_NG_CH4.iloc[iyear]
    #Check 1 - make sure that the sums from all the regions equal the totals reported
    diff1 = abs((sum_emi_prod[iyear]+sum_emi_expl[iyear]) - summary_emi)/(((sum_emi_expl[iyear]+sum_emi_prod[iyear]) + summary_emi)/2)
    if DEBUG==1:
        print(summary_emi)
        print(sum_emi_prod[iyear]+sum_emi_expl[iyear])
    if diff1 < 0.0001:
        print('Year ', year_range[iyear],': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear],': FAIL (check Production & summary tabs): ', diff1*100,'%') 
        
        
#Apply 'Other' reductions and re-check against final national inventory values
#reductions should only be applied to Production segment NOT Exploration
print('QA/QC #2: Check Exploration and Production Emission Sum against GHGI Summary Emissions (post-reductions)')
for iyear in np.arange(0,num_years):
    red_frac[iyear] = data_fn.safe_div((sum_emi_prod[iyear] - Emi_red_Other_total_NG_CH4.iloc[iyear]), sum_emi_prod[iyear])
    for igroup in np.arange(0,len(ghgi_prod_groups)):
        if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpWell' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpHFComp' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpConvComp' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_GasWellExpDrilled':
            continue
        else:
            if proxy_prod_map.loc[proxy_prod_map['GHGI_Emi_Group'] == ghgi_prod_groups[igroup], 'NEMS_Data'].values ==1:
                vars()[ghgi_prod_groups[igroup]][:,iyear] *= red_frac[iyear]
                sum_emi2[iyear] += np.sum(vars()[ghgi_prod_groups[igroup]][:,iyear])
            else:
                vars()[ghgi_prod_groups[igroup]][iyear] *= red_frac[iyear]
                sum_emi2[iyear] += vars()[ghgi_prod_groups[igroup]][iyear]
    summary_emi = EPA_emi_total_NG_CH4.iloc[0,iyear+1]+EPA_emi_total_NG_CH4.iloc[1,iyear+1]
    diff1 = abs((sum_emi2[iyear]+sum_emi_expl[iyear]) - summary_emi)/(((sum_emi2[iyear]+sum_emi_expl[iyear]) + summary_emi)/2)
    if DEBUG==1:
        print(summary_emi)
        print(sum_emi2[iyear]+sum_emi_expl[iyear])
    if diff1 < 0.0001:
        print('Year ', year_range[iyear],': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear],': FAIL (check Production & summary tabs): ', diff1*100,'%') 
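
Both QA/QC checks above compare a calculated total against the reported GHGI total using the same symmetric relative difference, |a - b| / ((a + b) / 2), with a pass threshold of 0.0001 (0.01%). As a minimal sketch, that comparison could be factored into small helpers. The function names `relative_difference` and `qa_message` are hypothetical and are not part of the notebook's `data_fn` module; returning 0 when both totals are zero is also an assumption.

```python
def relative_difference(a, b):
    """Symmetric relative difference: |a - b| divided by the mean of a and b."""
    mean = (a + b) / 2.0
    if mean == 0:
        return 0.0  # assumption: two zero totals are treated as identical
    return abs(a - b) / mean


def qa_message(year, calculated, reported, tolerance=1e-4):
    """Build a PASS/FAIL message, reporting the difference in percent."""
    diff = relative_difference(calculated, reported)
    if diff < tolerance:
        return f"Year {year}: PASS, difference < {tolerance * 100:g}%"
    return f"Year {year}: FAIL, difference = {diff * 100:.4f}%"


print(qa_message(2012, 100.0, 100.001))
print(qa_message(2013, 100.0, 101.0))
```

Converting the fraction to percent inside one helper keeps the printed value consistent with the `%` label in every check.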


----------------
## Step 4. Grid Data (using spatial proxies)
---------------

### Step 4.1. Calculate the monthly and regional weighted arrays
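
The weighting in this step multiplies each monthly proxy by the number of days in the month and then divides by the combined on-grid and off-grid total, so that all weights for a year sum to 1. The sketch below illustrates this on a small random array. The function name `normalize_monthly_proxy` is hypothetical, and returning zeros for an empty proxy is an assumption about how a zero total might be handled. It also uses `calendar.monthrange` to obtain days per month, as an alternative to hard-coding the leap years 2012 and 2016.

```python
import calendar
import numpy as np


def normalize_monthly_proxy(on_grid, off_grid, year):
    """Day-weight and normalize a monthly proxy.

    on_grid:  array of shape (lat, lon, month)
    off_grid: array of shape (month,)
    Returns (on_grid_weights, off_grid_weights) that together sum to 1.
    """
    days = np.array(
        [calendar.monthrange(year, m)[1] for m in range(1, 13)], dtype=float
    )
    on_weighted = on_grid * days    # broadcasts over the trailing month axis
    off_weighted = off_grid * days
    total = on_weighted.sum() + off_weighted.sum()
    if total == 0:
        # assumption: an empty proxy yields all-zero weights
        return np.zeros_like(on_weighted), np.zeros_like(off_weighted)
    return on_weighted / total, off_weighted / total


rng = np.random.default_rng(0)
on_grid = rng.random((4, 5, 12))   # toy proxy on a 4 x 5 grid
off_grid = rng.random(12)          # toy proxy located outside the grid
on_norm, off_norm = normalize_monthly_proxy(on_grid, off_grid, 2016)
print(on_norm.sum() + off_norm.sum())
```

Because the function returns new arrays rather than modifying its inputs, re-running it does not apply the day weighting twice.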

#### Step 4.1.1 Assign the Appropriate Proxy Variable Names

In [None]:
# The names on the *left* need to match the 'NaturalGas_Production_ProxyMapping' 'Proxy_Group' names 
# (these are initialized in Step 2). 
# The names on the *right* are the variable names used to calculate the proxies in this code.
# Names on the right need to match those from the code in Step 2.5

#Production segment
Map_Allwell = Map_EnvAllwell
Map_NonAssocProd = Map_EnvNonAssocProd
Map_Basin220 = Map_EnvBasin220
Map_Basin395 = Map_EnvBasin395
Map_Basin430 = Map_EnvBasin430
Map_BasinOther = Map_EnvBasinOther
Map_LeaseC = Map_EnvLeaseC
Map_NonAssocHF = Map_EnvNonAssoc_HF
Map_NonAssocConv = Map_EnvNonAssoc_Conv
Map_CoalBed = Map_EnvCoalBed
Map_NonAssocExpHFComp = Map_EnvNonAssocExp_HF_comp
Map_NonAssocExpConvComp = Map_EnvNonAssocExp_Conv_comp
Map_GasWellExpDrilled = Map_EnvGasWellExp_drilled
Map_StateGOMoffshore = Map_EnvStateGOM_Offshore
#nongrid
Map_Allwell_nongrid = Map_EnvAllwell_nongrid
Map_NonAssocProd_nongrid = Map_EnvNonAssocProd_nongrid
Map_Basin220_nongrid = Map_EnvBasin220_nongrid
Map_Basin395_nongrid = Map_EnvBasin395_nongrid
Map_Basin430_nongrid = Map_EnvBasin430_nongrid
Map_BasinOther_nongrid = Map_EnvBasinOther_nongrid
Map_LeaseC_nongrid = Map_EnvLeaseC_nongrid
Map_NonAssocHF_nongrid = Map_EnvNonAssoc_HF_nongrid
Map_NonAssocConv_nongrid = Map_EnvNonAssoc_Conv_nongrid
Map_CoalBed_nongrid = Map_EnvCoalBed_nongrid
Map_NonAssocExpHFComp_nongrid = Map_EnvNonAssocExp_HF_comp_nongrid
Map_NonAssocExpConvComp_nongrid = Map_EnvNonAssocExp_Conv_comp_nongrid
Map_GasWellExpDrilled_nongrid = Map_EnvGasWellExp_drilled_nongrid
Map_StateGOMoffshore_nongrid = Map_EnvStateGOM_Offshore_nongrid
#Offshore
Map_FedGOMOffshoreMajor = Map_GOADSmajor_emissions
Map_FedGOMOffshoreMinor = Map_GOADSminor_emissions
Map_FedGOMOffshore_Both = Map_FedGOMOffshoreMajor + Map_FedGOMOffshoreMinor
Map_FedGOMOffshore_Both_nongrid = Map_FedGOMOffshoreMajor_nongrid + Map_FedGOMOffshoreMinor_nongrid

#Gathering and Boosting
Map_GatheringStations          = Map_EnvGB_stations
Map_GatheringStations_nongrid  = Map_EnvGB_stations_nongrid
Map_GatheringPipelines         = Map_EnvGB_pipelines
Map_GatheringPipelines_nongrid = Map_EnvGB_pipelines_nongrid



In [None]:
# Calculate weighting arrays
# Find the fraction of wells (or gas production) in each grid cell, relative to the total well counts (or gas prod) (on and off grid)
# also weight by the number of days in each month
# Note condensate array is already calculated previously in Step 2.4

for iyear in np.arange(0,num_years):
    if year_range[iyear]==2012 or year_range[iyear]==2016:
        year_days = np.sum(month_day_leap)
        month_days = month_day_leap
    else:
        year_days = np.sum(month_day_nonleap)
        month_days = month_day_nonleap  
    
    #For each Proxy in List:
    #Step 1a: weighted proxy ongrid = ongrid proxy * days each month
    #Step 1b: weighted proxy offgrid = offgrid proxy * days each month
    #Step 2a: normalized weighted proxy ongrid = weighted proxy in each grid cell / (sum weighted proxy ongrid + weighted proxy offgrid)
    #Step 2b: normalized weighted proxy offgrid = weighted proxy offgrid / (sum weighted proxy ongrid + weighted proxy offgrid)
    print('Prod. Proxy Arrays: ', year_range[iyear])
    for isource in np.arange(0,len(proxy_prod_map)): 
        #print(proxy_prod_map.loc[isource, 'Proxy_Group'])
        if proxy_prod_map.loc[isource, 'Month_Flag'] == 1:            
            #first weight by the number of days in each month
            #then normalize within each NEMS region (or country wide)
            if proxy_prod_map.loc[isource, 'NEMS_Data'] == 1:
                for imonth in np.arange(0, num_months):
                    #first weight by the number of days in each month (weighted map for month = month map * number of days in each month)
                    vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,:,iyear,imonth] *= month_days[imonth]
                    vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][:,iyear,imonth] *= month_days[imonth]
                for inems in np.arange(0,6): #loop over the NEMS regions
                    temp_sum = float(np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][inems,:,:,iyear,:]) + \
                             np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][inems,iyear,:]))
                    vars()[proxy_prod_map.loc[isource,'Proxy_Group']][inems,:,:,iyear,:] = \
                            data_fn.safe_div(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][inems,:,:,iyear,:], temp_sum)
                    vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][inems,iyear,:] = \
                            data_fn.safe_div(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][inems,iyear,:], temp_sum)
                    proxy_sum = np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][inems,:,:,iyear,:])+\
                        np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][inems,iyear,:])
                    #DEBUG# print(proxy_sum)
                    if (proxy_sum >1.0001 or proxy_sum <0.9999) and proxy_prod_map.loc[isource,'Proxy_Group'] != 'Map_CoalBed':
                        print('Check ', proxy_prod_map.loc[isource,'Proxy_Group'], 'NEMS ', inems, ': ', proxy_sum)
                    #DEBUG# else:
                    #DEBUG#    print('Pass: ', proxy_prod_map.loc[isource,'Proxy_Group'], ' ', proxy_sum)
                        
            elif proxy_prod_map.loc[isource, 'NEMS_Data'] == 0 or proxy_prod_map.loc[isource, 'NEMS_Data'] == 2:
                for imonth in np.arange(0, num_months):
                    vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear,imonth] *= month_days[imonth]
                    vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,imonth] *= month_days[imonth]
                temp_sum = float(np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear,:]) + \
                        np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:])) 
                vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear,:] = \
                        data_fn.safe_div(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear,:], temp_sum)
                vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:] = \
                        data_fn.safe_div(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:], temp_sum)
                proxy_sum = np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear,:])+np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear,:])
                if proxy_sum >1.0001 or proxy_sum <0.9999:
                    print('Check ', proxy_prod_map.loc[isource,'Proxy_Group'],': ', proxy_sum)   
                #DEBUG# else:
                #DEBUG#     print('Pass: ', proxy_prod_map.loc[isource,'Proxy_Group'], ' ', proxy_sum)

    
        else: #annual proxies
            if proxy_prod_map.loc[isource, 'NEMS_Data'] == 0:
                vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear] *= np.sum(month_days)
                #DEBUG# print(np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear]))
                vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear] *= np.sum(month_days)
                temp_sum = float(np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear]) + \
                    np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear]))
                vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear] = \
                        data_fn.safe_div(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear], temp_sum)
                vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear] = \
                        data_fn.safe_div(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear], temp_sum)
                proxy_sum = np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']][:,:,iyear])+np.sum(vars()[proxy_prod_map.loc[isource,'Proxy_Group']+'_nongrid'][iyear])
                if proxy_sum >1.0001 or proxy_sum <0.9999:
                    print('Check ', proxy_prod_map.loc[isource,'Proxy_Group'],': ', proxy_sum)   
                #DEBUG# else:
                #DEBUG#     print('Pass: ', proxy_prod_map.loc[isource,'Proxy_Group'], ' ', proxy_sum)
                  

In [None]:
# Correct any necessary Proxy Arrays (those that equal 0 where emissions > 0)
# As of the current GEPA version, Map_NonAssocExpHFComp is the only array that needs fixing (NEMS region index 4 only)
# These corrections can be identified by checking whether there are GHGI emissions for an emissions group
# in a given NEMS region and year while the corresponding Map_Proxy = 0 for that same region/time period

# example test (uncomment the following lines): 
#inems = 4
#iyear = 5
#print(np.sum(Emi_NonAssocExpHFComp[inems,iyear]))
#print(np.sum(Map_NonAssocExpHFComp[inems,:,:,iyear,:]))
# if Emi > 0 and Map = 0, need to correct the Map array with data from a different year

Map_NonAssocExpHFComp[4,:,:,3,:] = Map_NonAssocExpHFComp[4,:,:,2,:]
Map_NonAssocExpHFComp[4,:,:,4,:] = Map_NonAssocExpHFComp[4,:,:,2,:]
Map_NonAssocExpHFComp[4,:,:,5,:] = Map_NonAssocExpHFComp[4,:,:,2,:]
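
The manual test described in the comments above (emissions > 0 while the proxy sums to 0) can be automated across all regions and years. The following is a sketch under stated assumptions: the helper name `find_missing_proxies` is hypothetical, the array shapes follow the `(region, year)` and `(region, lat, lon, year, month)` layouts used above, and the off-grid (`_nongrid`) part of the proxy is ignored for simplicity.

```python
import numpy as np


def find_missing_proxies(emissions, proxy):
    """Return (region, year) index pairs with emissions > 0 but an all-zero proxy.

    emissions: array of shape (region, year)
    proxy:     array of shape (region, lat, lon, year, month)
    """
    proxy_sum = proxy.sum(axis=(1, 2, 4))  # collapse lat, lon, month
    mask = (emissions > 0) & (proxy_sum == 0)
    return [tuple(int(i) for i in idx) for idx in np.argwhere(mask)]


# Toy data: 2 regions, 3 years, 2 x 2 grid, 12 months
emissions = np.ones((2, 3))
proxy = np.ones((2, 2, 2, 3, 12))
proxy[1, :, :, 2, :] = 0.0  # region 1 has no proxy data in year index 2

print(find_missing_proxies(emissions, proxy))
```

Each returned pair identifies a slice that would need proxy data borrowed from another year, as done for `Map_NonAssocExpHFComp` above.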

### Step 4.2. Weight the National Data to Grid, then Calculate 0.1x0.1 degree flux maps
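
For groups with NEMS-level emissions, gridding amounts to multiplying each region's emission total by that region's normalized proxy weights and summing over regions. The sketch below shows this with `np.einsum` on toy data. The array sizes and emission values are made up, and the notebook itself performs the equivalent operation with explicit loops over regions and years.

```python
import numpy as np

num_regions, nlat, nlon, nmonth = 3, 4, 5, 12
rng = np.random.default_rng(1)

# Hypothetical proxy weights: each region's weights sum to 1 over lat, lon, and month
weights = rng.random((num_regions, nlat, nlon, nmonth))
weights /= weights.sum(axis=(1, 2, 3), keepdims=True)

# Hypothetical regional emission totals for one year (kt)
regional_emissions_kt = np.array([10.0, 0.5, 4.0])

# Gridded emissions (kt): sum over regions of emissions[r] * weights[r, lat, lon, month]
gridded_kt = np.einsum("r,rijm->ijm", regional_emissions_kt, weights)

print(gridded_kt.shape, gridded_kt.sum())
```

Because each region's weights sum to 1, the gridded total equals the sum of the regional totals, which is the property the QA/QC check in this step verifies.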

In [None]:
# For production segment...
# 1) make flux array with correct dimensions
# 2) weight monthly data by days in month (or year)
# 3) calculate flux as Flux = GHGI emissions * Proxy Map

DEBUG=1

Emissions_prod = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Emissions_prod_nongrid = np.zeros([num_years,num_months])
Emissions_expl = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Emissions_expl_nongrid = np.zeros([num_years,num_months])
Emi_not_mapped_sum = np.zeros(num_years)
CONUS_red= np.zeros(num_years)

if DEBUG==1:
    total_sum = np.zeros(num_years)
    proxy_val= np.zeros(num_years)
    ghgi_val= np.zeros(num_years)

# loop through each emissions group, gridded emissions = national emissions * proxy
for igroup in np.arange(0,len(proxy_prod_map)):
    #first check whether the Emi group was created (i.e., that the given proxy is used)
    # if the given proxy is listed in the excel mapping file, but is not actually used to grid, skip and move to next proxy 
    if str(proxy_prod_map.loc[igroup,'GHGI_Emi_Group']) not in locals():
        continue 
    print(proxy_prod_map.loc[igroup, 'Proxy_Group'])
    #deal with groups where proxies have monthly data
    if proxy_prod_map.loc[igroup, 'Month_Flag'] == 1:
        vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
        vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'] = np.zeros([num_years,num_months])
        #print(np.shape())
        for iyear in np.arange(0,num_years):
            #deal with groups where proxies have month data and national emissions are by NEMS group
            if proxy_prod_map.loc[igroup, 'NEMS_Data'] == 1:
                for inems in np.arange(0,6):
                    vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,:] += \
                        vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][inems,iyear] * \
                        vars()[proxy_prod_map.loc[igroup,'Proxy_Group']][inems,:,:,iyear,:]
                    vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,:] += \
                        vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][inems,iyear] * \
                        vars()[proxy_prod_map.loc[igroup,'Proxy_Group']+'_nongrid'][inems,iyear,:]
                    #print(np.sum(vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][inems,iyear]))                 
                for imonth in np.arange(0,num_months):
                    if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
                        if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpWell' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpHFComp' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpConvComp' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_GasWellExpDrilled': 
                            #print('group:',proxy_prod_map.loc[igroup,'GHGI_Emi_Group'])
                            Emissions_expl[:,:,iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]
                            Emissions_expl_nongrid[iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,imonth]
                        else:
                            Emissions_prod[:,:,iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]
                            Emissions_prod_nongrid[iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,imonth]

                if DEBUG==1:
                    proxy_val[iyear] = np.sum(vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,:])+\
                                 np.sum(vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,:])
                    ghgi_val[iyear] = np.sum(vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,iyear])
                    total_sum[iyear] += proxy_val[iyear]
            
            #deal with groups where proxies have month data but emissions are national totals (e.g., NEMS_Data = 0 or 2)
            elif proxy_prod_map.loc[igroup, 'NEMS_Data'] == 0 or proxy_prod_map.loc[igroup, 'NEMS_Data'] == 2:
                vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,:] += \
                    vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                    vars()[proxy_prod_map.loc[igroup,'Proxy_Group']][:,:,iyear,:]
                vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,:] += \
                      vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                      vars()[proxy_prod_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear,:]
                                
                for imonth in np.arange(0,num_months):
                    if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
                        if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpWell' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpHFComp' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpConvComp' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_GasWellExpDrilled': 
                            #print('group:',proxy_prod_map.loc[igroup,'GHGI_Emi_Group'])
                            Emissions_expl[:,:,iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]
                            Emissions_expl_nongrid[iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,imonth]
                        else:
                            Emissions_prod[:,:,iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]
                            Emissions_prod_nongrid[iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,imonth]

                if DEBUG==1:
                    proxy_val[iyear] = np.sum(vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,:])+\
                             np.sum(vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear,:])
                    ghgi_val[iyear] = np.sum(vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear])
                    total_sum[iyear] += proxy_val[iyear]
                                   
    #deal with groups where proxies don't have month data
    else:
        vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'] = np.zeros([num_years])
        for iyear in np.arange(0,num_years):
            if proxy_prod_map.loc[igroup,'Proxy_Group'] == 'Map_GatheringPipelines':
                 #If gathpipelines, remove AK/HI fraction of emissions, and save other fraction as not_mapped emissions
                vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += \
                     (vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear] -\
                     (vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear]* CONUS_ratio)) * \
                     vars()[proxy_prod_map.loc[igroup,'Proxy_Group']][:,:,iyear]
                CONUS_red[iyear] += vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear]* CONUS_ratio
            else:
                vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += \
                    vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                    vars()[proxy_prod_map.loc[igroup,'Proxy_Group']][:,:,iyear]
            vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear] += \
                    vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                    vars()[proxy_prod_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear]
            if DEBUG==1:
                proxy_val[iyear] = np.sum(vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear])+\
                         np.sum(vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear])
                ghgi_val[iyear] = np.sum(vars()[proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][iyear])
                total_sum[iyear] += proxy_val[iyear]
                        
            for imonth in np.arange(0,num_months):
                if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
                    if proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpWell' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpHFComp' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_NonAssocExpConvComp' or \
                            proxy_prod_map.loc[igroup,'GHGI_Emi_Group'] == 'Emi_GasWellExpDrilled': 
                        #print('group:',proxy_prod_map.loc[igroup,'GHGI_Emi_Group'])
                        Emissions_expl[:,:,iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear]/num_months
                        Emissions_expl_nongrid[iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear]/num_months
                    else:
                        Emissions_prod[:,:,iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear]/num_months
                        Emissions_prod_nongrid[iyear,imonth] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_nongrid'][iyear]/num_months

    if DEBUG==1:
        print(igroup, proxy_val[:])
        print(igroup, ghgi_val[:])
    
#sum all the not_mapped emissions with the added AK/HI contributions (this is prod only)
for iyear in np.arange(0,num_years): 
    Emi_not_mapped_sum[iyear] = Emi_not_mapped[iyear] + CONUS_red[iyear]
    Emissions_prod_nongrid[iyear,:] += (1/12)*Emi_not_mapped_sum[iyear]
    
if DEBUG==1:
    print(CONUS_red, Emi_not_mapped) #print values for all years

# QA/QC gridded emissions
# Check sum of all gridded emissions + emissions not included in gridding (e.g., AK), and other non-gridded areas
print('QA/QC #1: Check weighted emissions against GHGI')   
for iyear in np.arange(0,num_years):
    calc_emi = 0
    summary_emi = EPA_emi_total_NG_CH4.iloc[0,iyear+1]+EPA_emi_total_NG_CH4.iloc[1,iyear+1]
    for igroup in np.arange(0,len(proxy_prod_map)):
        if proxy_prod_map.loc[igroup, 'Month_Flag'] == 1:
            calc_emi += np.sum(vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,:])
        else:
            calc_emi += np.sum(vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear])
                
    calc_emi += np.sum(Emissions_expl_nongrid[iyear,:]) +np.sum(Emissions_prod_nongrid[iyear,:])
    if DEBUG==1:
        print(summary_emi)
        print(calc_emi)
    diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if diff < 0.0001:
        print('Year ', year_range[iyear], ': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear], ': FAIL -- Difference = ', diff*100,'%')

#### Step 4.2.2 Save gridded emissions (kt)

In [None]:
#save gridded emissions for each gridding group - for extension

#Initialize file
data_IO_fn.initialize_netCDF(grid_emi_outputfile, netCDF_prod_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)

unique_groups = np.unique(proxy_prod_map['GHGI_Emi_Group'])
unique_groups = unique_groups[unique_groups != 'Emi_not_mapped']

nc_out = Dataset(grid_emi_outputfile, 'r+', format='NETCDF4')

for igroup in np.arange(0,len(unique_groups)):
    print('Ext_'+unique_groups[igroup])
    if len(np.shape(vars()['Ext_'+unique_groups[igroup]])) ==4:
        ghgi_temp = np.sum(vars()['Ext_'+unique_groups[igroup]],axis=3) #sum month data if data is monthly
    else:
        ghgi_temp = vars()['Ext_'+unique_groups[igroup]]

    # Write data to netCDF
    data_out = nc_out.createVariable('Ext_'+unique_groups[igroup], 'f8', ('lat', 'lon','year'), zlib=True)
    data_out[:,:,:] = ghgi_temp[:,:,:]

#save nongrid data to calculate non-grid fraction extension
data_out = nc_out.createVariable('Emissions_nongrid', 'f8', ('year'), zlib=True)  
data_out[:] = np.sum(Emissions_prod_nongrid[:,:],axis=1)+np.sum(Emissions_expl_nongrid[:,:],axis=1)
nc_out.close()

#Confirm file location
print('** SUCCESS **')
print("Gridded emissions (kt) written to file: {}" .format(os.getcwd())+grid_emi_outputfile)
print(' ')

del data_out, ghgi_temp, nc_out

### Step 4.3 Calculate Gridded Fluxes (molec/s/cm2)
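
The conversion in this step follows flux = kt × 10^9 g/kt × (1/M) mol/g × N_A molec/mol ÷ seconds ÷ area (cm2). As a minimal sketch for a single grid cell, the example below converts monthly masses to monthly fluxes and then forms the annual flux as the day-weighted mean of the monthly fluxes. The helper name `kt_to_flux`, the molar mass value, and the cell area are assumptions for illustration; the notebook uses its own `Avogadro`, `Molarch4`, and `area_matrix_01` values.

```python
import numpy as np

AVOGADRO = 6.02214076e23   # molec/mol
MOLAR_MASS_CH4 = 16.04     # g/mol (assumed value)


def kt_to_flux(emissions_kt, days, area_cm2):
    """Convert an emitted mass (kt) over `days` days to a flux in molec/cm2/s."""
    seconds = days * 24 * 60 * 60
    return emissions_kt * 1e9 * AVOGADRO / (MOLAR_MASS_CH4 * seconds) / area_cm2


month_days = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31], dtype=float)
area_cm2 = 1.0e12            # hypothetical cell area (10 km x 10 km)
monthly_kt = np.full(12, 1.0)  # hypothetical 1 kt emitted in each month

monthly_flux = kt_to_flux(monthly_kt, month_days, area_cm2)

# Annual flux as the day-weighted mean of the monthly fluxes
annual_flux = np.sum(monthly_flux * month_days / month_days.sum())

# The same quantity computed directly from the annual mass and the days in the year
annual_flux_direct = kt_to_flux(monthly_kt.sum(), month_days.sum(), area_cm2)

print(annual_flux, annual_flux_direct)
```

Weighting by days per month (rather than averaging the twelve monthly fluxes equally) is what makes the annual flux consistent with the annual mass total.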

In [None]:
#Step 2 -- Calculate fluxes (molec./s/cm2)

DEBUG =1

#Initialize arrays
check_sum = np.zeros([num_years])
check_sum_annual = np.zeros([num_years])
Flux_Emissions_Total_prod = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Flux_Emissions_Total_prod_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])
Flux_Emissions_Total_expl = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
Flux_Emissions_Total_expl_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])
for igroup in np.arange(0,len(proxy_prod_map)):
    vars()['Flux_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_annual'] = np.zeros([len(Lat_01),len(Lon_01),num_years])


#Calculate fluxes
for iyear in np.arange(0,num_years):
    if year_range[iyear]==2012 or year_range[iyear]==2016:
        year_days = np.sum(month_day_leap)
        month_days = month_day_leap
    else:
        year_days = np.sum(month_day_nonleap)
        month_days = month_day_nonleap 
    
    # calculate fluxes for each emissions group and national sum  (=kt * grams/kt *molec/mol *mol/g *s^-1 * cm^-2)
    conversion_factor_annual = 10**9 * Avogadro / float(Molarch4 * np.sum(month_days) * 24 * 60 *60) / area_matrix_01
    for igroup in np.arange(0,len(proxy_prod_map)):
        #first check whether the Flux group was created (i.e., that the given proxy is used)
        # if the given proxy is listed in the excel mapping file, but is not actually used to grid, skip and move to next proxy 
        if str('Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']) not in locals():
            continue
        
        if proxy_prod_map.loc[igroup, 'Month_Flag'] == 0:
            vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] *= conversion_factor_annual
            vars()['Flux_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_annual'][:,:,iyear] = vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear]
        if proxy_prod_map.loc[igroup, 'Month_Flag'] == 1:
            for imonth in np.arange(0,num_months):
                conversion_factor_month = 10**9 * Avogadro / float(Molarch4 * month_days[imonth] * 24 * 60 *60) / area_matrix_01
                conv_factor2 = month_days[imonth]/year_days
                vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth] *= conversion_factor_month
                vars()['Flux_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']+'_annual'][:,:,iyear] += vars()['Ext_'+proxy_prod_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear,imonth]*conv_factor2        
    
    #calculate national total flux    
    for imonth in np.arange(0, num_months):
        conversion_factor_month = 10**9 * Avogadro / float(Molarch4 * month_days[imonth] * 24 * 60 *60) / area_matrix_01
        conv_factor2 = month_days[imonth]/year_days
        Flux_Emissions_Total_prod[:,:,iyear,imonth] = Emissions_prod[:,:,iyear,imonth]*conversion_factor_month
        Flux_Emissions_Total_prod_annual[:,:,iyear] += Flux_Emissions_Total_prod[:,:,iyear,imonth]*conv_factor2
        Flux_Emissions_Total_expl[:,:,iyear,imonth] = Emissions_expl[:,:,iyear,imonth]*conversion_factor_month
        Flux_Emissions_Total_expl_annual[:,:,iyear] += Flux_Emissions_Total_expl[:,:,iyear,imonth]*conv_factor2

         
        #calculate the monthly running flux totals and convert from flux back to mass (also calc annual sum)    
        check_sum[iyear] += np.sum(Flux_Emissions_Total_prod[:,:,iyear,imonth]/conversion_factor_month) +\
                            np.sum(Flux_Emissions_Total_expl[:,:,iyear,imonth]/conversion_factor_month)
    check_sum_annual[iyear] += np.sum(Flux_Emissions_Total_prod_annual[:,:,iyear]/conversion_factor_annual) +\
                                np.sum(Flux_Emissions_Total_expl_annual[:,:,iyear]/conversion_factor_annual)

print(' ')
print('QA/QC #2: Check final gridded fluxes against GHGI')  
# for the sum, check the converted annual emissions (convert back from flux) plus all the non-gridded emissions
for iyear in np.arange(0,num_years):
    calc_emi = check_sum_annual[iyear] + np.sum(Emissions_prod_nongrid[iyear,:])+np.sum(Emissions_expl_nongrid[iyear,:]) #Emi_not_mapped_sum[iyear] +
    summary_emi = EPA_emi_total_NG_CH4.iloc[0,iyear+1]+EPA_emi_total_NG_CH4.iloc[1,iyear+1]
    if DEBUG==1:
        print('Calculated total (gridded + non-gridded): ', calc_emi)
        print('GHGI summary total: ', summary_emi)
    
    diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if diff < 0.0001:
        print('Year ', year_range[iyear], ': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear], ': FAIL -- Difference = ', diff*100,'%')

-------------
## Step 5. Write gridded (0.1°x0.1°) data to netCDF files.
-------------
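
Before writing, it can help to confirm that each flux array has the layout the output variable expects. The sketch below assumes the (lat, lon, year) and (lat, lon, year, month) ordering used when the flux arrays were allocated in Step 4.3. `check_flux_shape` is a hypothetical helper, not part of `data_IO_fn`, and the small grid sizes are placeholders.

```python
import numpy as np

def check_flux_shape(flux, n_lat, n_lon, n_years, n_months=None):
    """Raise ValueError if flux is not laid out as (lat, lon, year[, month])."""
    if n_months is None:
        expected = (n_lat, n_lon, n_years)
    else:
        expected = (n_lat, n_lon, n_years, n_months)
    if flux.shape != expected:
        raise ValueError(f"expected shape {expected}, got {flux.shape}")
    if not np.all(np.isfinite(flux)):
        raise ValueError("flux array contains NaN or infinite values")
    return True

# Placeholder arrays standing in for the annual and monthly flux grids
annual = np.zeros((4, 5, 7))
monthly = np.zeros((4, 5, 7, 12))
ok_annual = check_flux_shape(annual, 4, 5, 7)
ok_monthly = check_flux_shape(monthly, 4, 5, 7, 12)
```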

In [None]:
# Initialize netCDF files
#exploration
data_IO_fn.initialize_netCDF(gridded_expl_outputfile, netCDF_expl_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)
data_IO_fn.initialize_netCDF(gridded_expl_monthly_outputfile, netCDF_expl_description, 1, year_range, loc_dimensions, Lat_01, Lon_01)

# Write the Data to netCDF
nc_out = Dataset(gridded_expl_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Total_expl_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded annual exploration fluxes written to file: {}" .format(os.getcwd())+gridded_expl_outputfile)
print('')

nc_out = Dataset(gridded_expl_monthly_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:,:] = Flux_Emissions_Total_expl
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded monthly exploration fluxes written to file: {}" .format(os.getcwd())+gridded_expl_monthly_outputfile)
print('')

#production
data_IO_fn.initialize_netCDF(gridded_prod_outputfile, netCDF_prod_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)
data_IO_fn.initialize_netCDF(gridded_prod_monthly_outputfile, netCDF_prod_description, 1, year_range, loc_dimensions, Lat_01, Lon_01)

# Write the Data to netCDF
nc_out = Dataset(gridded_prod_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Total_prod_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded annual production fluxes written to file: {}" .format(os.getcwd())+gridded_prod_outputfile)
print('')

nc_out = Dataset(gridded_prod_monthly_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:,:] = Flux_Emissions_Total_prod
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded monthly production fluxes written to file: {}" .format(os.getcwd())+gridded_prod_monthly_outputfile)
print('')

-------------
## Step 6. Plot Data
-------------

#### 6.1 Plot Annual Emission Fluxes

In [None]:
# Plot annual emissions for each year
#exploration
scale_max = 10
save_flag = 0
save_outfile = ''
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Total_expl_annual, Lat_01, Lon_01, year_range, title_expl_str, scale_max,save_flag,save_outfile)

#production
scale_max = 10
save_flag = 0
save_outfile = ''
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Total_prod_annual, Lat_01, Lon_01, year_range, title_prod_str, scale_max,save_flag,save_outfile)

#### 6.2 Plot Difference Between First and Last Inventory Year

In [None]:
# Plot difference between last and first year
#exploration
save_flag = 0
save_outfile = ''
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Total_expl_annual, Lat_01, Lon_01, year_range, title_expl_diff_str,save_flag,save_outfile)

#production
save_flag = 0
save_outfile = ''
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Total_prod_annual, Lat_01, Lon_01, year_range, title_prod_diff_str,save_flag,save_outfile)

#### 6.3 Plot Key Proxy Data
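
The proxy map in this step is shown as each grid cell's share of the national total for a given year. Below is a minimal sketch of that normalization, assuming a (lat, lon, year) activity array. `activity_fraction` and the toy data are illustrative and are not names from the notebook; the zero-total guard is an addition of this sketch.

```python
import numpy as np

def activity_fraction(activity_map, iyear):
    """Return the (lat, lon) slice for year index iyear divided by that year's total."""
    year_slice = activity_map[:, :, iyear]
    total = year_slice.sum()
    if total == 0:
        # No activity in this year: return zeros instead of dividing by zero
        return np.zeros_like(year_slice, dtype=float)
    return year_slice / total

# Toy activity data with shape (2 lat, 2 lon, 2 years); the second year is empty
activity = np.array([[[2.0, 0.0], [1.0, 0.0]],
                     [[1.0, 0.0], [0.0, 0.0]]])
frac_first = activity_fraction(activity, 0)
frac_second = activity_fraction(activity, 1)
```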

In [None]:
#Map (well location) heatmap

# Activity_Map = 0.1x0.1 map of activity data (counts or absolute units)
# Plot_Frac    = 0 or 1 (0= plot activity data in absolute counts, 1= plot fractional activity data)
# Lat          = 0.1 degree Lat values (select range)
# Lon          = 0.1 degree Lon values (select range)
# year_range   = array of inventory years
# title_str    = title of map
# legend_str   = title of legend
# scale_max    = maximum of color scale

map_output = np.zeros([len(Lat_01),len(Lon_01),num_years])
for iyear in np.arange(0,num_years):
    for imonth in np.arange(0,num_months):
        for inems in np.arange(0,num_regions-1):
            map_output[:,:,iyear] += Map_EnvLeaseC[inems,:,:,iyear,imonth]  

            
Activity_Map = map_output
Plot_Frac = 1
Lat = Lat_01
Lon = Lon_01
# year_range (array of inventory years) is already defined above
title_str2 = "Proxy - Non-Associated Gas Well Locations"
legend_str = "Annual Fraction of National Well Population"
scale_max = 0.001

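for iyear in np.arange(0,1): # plots the first inventory year only; use len(year_range) to plot every year
    # Build a 'rainbow' colormap whose lowest values fade in from transparent
    my_cmap = copy(plt.cm.get_cmap('rainbow',lut=3000))
    my_cmap._init()  # create the lookup table (3000 colors + 3 entries for under/over/bad)
    slopen = 200
    alphas_slope = np.abs(np.linspace(0, 1.0, slopen))  # alpha ramps from 0 to 1 over the first 200 entries
    alphas_stable = np.ones(3003-slopen)                # remaining entries are fully opaque
    alphas = np.concatenate((alphas_slope, alphas_stable))
    my_cmap._lut[:,-1] = alphas
    my_cmap.set_under('gray', alpha=0)  # values below vmin are drawn fully transparent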
    
    # Subset to the plotting window and shift by half a grid cell (0.05 degrees)
    Lon_cor = Lon[50:632]-0.05
    Lat_cor = Lat[43:300]-0.05
    
    xpoints = Lon_cor
    ypoints = Lat_cor
    yp,xp = np.meshgrid(ypoints,xpoints)
    
    if np.shape(Activity_Map)[0] == len(year_range):
        if Plot_Frac ==1:
            zp = Activity_Map[iyear,43:300,50:632]/np.sum(Activity_Map[iyear,:,:])
        else:
            zp = Activity_Map[iyear,43:300,50:632]
    elif np.shape(Activity_Map)[2] == len(year_range):
        if Plot_Frac ==1:
            zp = Activity_Map[43:300,50:632,iyear]/np.sum(Activity_Map[:,:,iyear])
        else: 
            zp = Activity_Map[43:300,50:632,iyear]
    #zp = zp/float(10**6 * Avogadro) * (year_days * 24 * 60 * 60) * Molarch4 * float(1e10)
    
    fig, ax = plt.subplots(dpi=300)
    m = Basemap(llcrnrlon=xp.min(), llcrnrlat=yp.min(), urcrnrlon=xp.max(),
                urcrnrlat=yp.max(), projection='merc', resolution='h', area_thresh=5000)
    m.drawmapboundary(fill_color='Azure')
    m.fillcontinents(color='FloralWhite', lake_color='Azure',zorder=1)
    m.drawcoastlines(linewidth=0.5,zorder=3)
    m.drawstates(linewidth=0.25,zorder=3)
    m.drawcountries(linewidth=0.5,zorder=3)
        
        #if Plot_Frac == 1:
        #    scale_max 
    
    xpi,ypi = m(xp,yp)
    plot = m.pcolor(xpi,ypi,zp.transpose(), cmap=my_cmap, vmin=10**-15, vmax=scale_max, snap=True,zorder=2)
    #plot = m.scatter(xpi,ypi,s=20,c=zp.transpose(),cmap=my_cmap,zorder=2,vmin = 10**-15,snap = True,vmax = scale_max)
    cb = m.colorbar(plot, location = "bottom", pad = "1%")        
    tick_locator = ticker.MaxNLocator(nbins=5)
    cb.locator = tick_locator
    cb.update_ticks()
    
    cb.ax.set_xlabel(legend_str,fontsize=10)
    cb.ax.tick_params(labelsize=10)
    Titlestring = str(year_range[iyear])+' '+title_str2
    plt.title(Titlestring, fontsize=14);
    plt.show();

In [None]:
ct = datetime.now() 
ft = ct.timestamp() 
time_elapsed = (ft-it)/(60*60)
print('Time to run: '+str(time_elapsed)+' hours')
print('** 1B2b_Natural_Gas_Systems_ProductionExploration: COMPLETE **')