# Gridded EPA Methane Inventory
## Category: 1B1a Coal Mines (Active)

***
#### Authors: 
Joannes D. Maasakkers, Erin E. McDuffie
#### Date Last Updated: 
see Step 0
#### Notebook Purpose: 
This Notebook calculates and reports annual gridded (0.1°x0.1°) methane emission fluxes (molec./cm2/s) from active coal mining (surface and underground) in the CONUS region between 2012-2018.    
#### Summary & Notes:
EPA GHGI active coal mining emissions from underground (mining, post-mining, recovered and used) and surface (mining and post-mining) activities are read in at the state level from the GHGI workbook. For all sources, national total emissions are first allocated to states based on the state-level emissions. Net underground mining emissions are taken as the sum of underground mining emissions and the amount of methane recovered and used. State-level net underground mining emissions are then allocated to a 0.01°x0.01° grid using high-resolution maps of underground mine locations and emissions. Mine-level emissions are taken from the GHGRP (subpart FF) where available and are otherwise estimated from the coal production at each active underground mine relative to total state production (from the EIA and the Mine Safety and Health Administration [MSHA]), weighted by the basin-level in situ methane content of coal in states with mines in multiple basins. State-level emissions from all active surface mine activities are allocated to a 0.01°x0.01° grid using high-resolution maps of surface mine locations and emissions, as estimated from the coal production at each active surface mine relative to the total state production (from EIA and MSHA), also weighted by the basin-level in situ methane content of coal in states with mines in multiple basins. All emissions are then re-gridded to a 0.1°x0.1° grid and converted to emission fluxes. Annual emission fluxes (molec./cm2/s) are written to final netCDFs in the ‘/code/Final_Gridded_Data/’ folder.
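
The final conversion from an annual mass in a grid cell to an emission flux can be sketched as follows. This is a minimal illustration, not the notebook's own conversion routine (which is not shown in this section): the function name `tg_per_year_to_flux` and the example cell are hypothetical, while the constants repeat the values set in Step 0.

```python
import calendar

AVOGADRO = 6.02214129e23   # molecules/mol (same value as in Step 0)
MOLAR_MASS_CH4 = 16.04     # g/mol (same value as in Step 0)

def tg_per_year_to_flux(emi_tg, area_cm2, year):
    """Convert an annual emission (Tg CH4/yr) in one grid cell to molec./cm2/s."""
    days = 366 if calendar.isleap(year) else 365
    seconds = days * 24 * 3600
    # Tg -> g -> mol -> molecules
    molecules = emi_tg * 1e12 / MOLAR_MASS_CH4 * AVOGADRO
    return molecules / area_cm2 / seconds

# Hypothetical cell: 0.001 Tg/yr over roughly 100 km2 (1e12 cm2) in 2012, a leap year
flux = tg_per_year_to_flux(0.001, 1.0e12, 2012)
print(f"{flux:.3e} molec./cm2/s")
```

Dividing by the number of seconds in the specific year keeps leap years consistent with the `month_day_leap` and `month_day_nonleap` arrays defined in Step 0.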
***

-------
## Step 0. Set-Up Notebook Modules, Functions, and Local Parameters and Constants
-------

In [None]:
#Confirm working directory
import os
import time
modtime = os.path.getmtime('./v2_1B1a_Coal.ipynb')
modificationTime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(modtime))
print("This file was last modified on: ", modificationTime)
print('')
print("The directory we are working in is {}" .format(os.getcwd()))

In [None]:
## Include plots within notebook
%matplotlib inline

In [None]:
# Import base modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import re
import datetime
from copy import copy

# Import additional modules
# Load plotting package Basemap 
from mpl_toolkits.basemap import Basemap

# Load netCDF (for manipulating netCDF file types)
from netCDF4 import Dataset

# Set up ticker
import matplotlib.ticker as ticker

#add path for the global function module (file)
import sys
module_path = os.path.abspath(os.path.join('../Global_Functions/'))
if module_path not in sys.path:
    sys.path.append(module_path)

# Load Tabula (for reading tables from PDFs)
import tabula as tb   
    
# Load user-defined global functions (modules)
import data_load_functions as data_load_fn
import data_functions as data_fn
import data_IO_functions as data_IO_fn
import data_plot_functions as data_plot_fn

In [None]:
#INPUT Files
# Assign global file names
global_filenames = data_load_fn.load_global_file_names()
State_ANSI_inputfile = global_filenames[0]
#County_ANSI_inputfile = global_filenames[1]
pop_map_inputfile = global_filenames[2]
Grid_area01_inputfile = global_filenames[3]
Grid_area001_inputfile = global_filenames[4]
Grid_state001_ansi_inputfile = global_filenames[5]
#Grid_county001_ansi_inputfile = global_filenames[6]
globalinputlocation = global_filenames[0][0:20]
print(globalinputlocation)

# Specify names of inputs files used in this notebook
#EPA Data
EPA_coal_inputfile = '../Global_InputData/GHGI/Ch3_Energy/Coal1990-2018_022020.xlsm'

#Proxy Data file
Coal_Mapping_inputfile = "./InputData/Coal_ProxyMapping.xlsx"

#Activity Data
EIA_mine_inputfile = "./InputData/EIA/coalpublic"
Mine_loc_inputfile = "../Global_InputData/MSHA/Mines.txt"
Corrected_surf_mine_inputfile = "./InputData/Updated_Loc.csv"
Corrected_ug_mine_inputfile = "./InputData/Updated_Loc_ug.csv"

#OUTPUT FILES
gridded_outputfile = '../Final_Gridded_Data/EPA_v2_1B1a_Coal.nc'
gridded_surf_outputfile = '../Final_Gridded_Data/EPA_v2_1B1a_Surface_Coal.nc'
gridded_und_outputfile = '../Final_Gridded_Data/EPA_v2_1B1a_Underground_Coal.nc'
netCDF_description = 'Gridded EPA Inventory - Coal Mine Emissions - IPCC Source Category 1B1a'
netCDF_surf_description = 'Gridded EPA Inventory - Surface Coal Mining Emissions - IPCC Source Category 1B1a'
netCDF_und_description = 'Gridded EPA Inventory - Underground Coal Mining Emissions - IPCC Source Category 1B1a'
title_str = "EPA methane emissions from coal mines"
title_diff_str = "Emissions from coal mines difference: 2018-2012"

#output gridded proxy data
grid_emi_outputfile = '../Final_Gridded_Data/Extension/v2_input_data/Coal_Grid_Emi.nc'

In [None]:
# Define local variables
start_year = 2012  #First year in emission timeseries
end_year = 2018    #Last year in emission timeseries
year_range = [*range(start_year, end_year+1,1)] #List of emission years
year_range_str=[str(i) for i in year_range]
num_years = len(year_range)

# Define constants
Avogadro   = 6.02214129 * 10**(23)  #molecules/mol
Molarch4   = 16.04                  #g/mol
Res01      = 0.1                    # degrees
Res_01     = 0.01
tg_scale   = 0.001                  #Tg scale number [New file allows for the exclusion of the territories] 

# Million cubic ft (mmcf) to Tg conversion factor - Source: EPA spreadsheet, 'CM Emissions Summary' tab
mmcf_to_tg = 51921

# Continental US Lat/Lon Limits (for netCDF files)
Lon_left = -130       #deg
Lon_right = -60       #deg
Lat_low  = 20         #deg
Lat_up  = 55          #deg
loc_dimensions = [Lat_low, Lat_up, Lon_left, Lon_right]

ilat_start = int((90+Lat_low)/Res01) #1100:1450 (continental US range)
ilat_end = int((90+Lat_up)/Res01)
ilon_start = abs(int((-180-Lon_left)/Res01)) #500:1200 (continental US range)
ilon_end = abs(int((-180-Lon_right)/Res01))

# Number of days in each month
month_day_leap  = [  31,  29,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]
month_day_nonleap = [  31,  28,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]

# Month arrays
month_range_str = ['January','February','March','April','May','June','July','August','September','October','November','December']
num_months = len(month_range_str)

In [None]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999;

In [None]:
# Track run time
ct = datetime.datetime.now() 
it = ct.timestamp() 
print("current time:", ct) 

____
## Step 1. Load in State ANSI data and Area Maps
_____
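
The cell below aggregates 0.01° maps onto the 0.1° grid with `data_fn.regrid001_to_01`, whose implementation lives in the global functions module and is not shown here. As a rough sketch of what such an aggregation could look like for an extensive quantity (area or emissions), assuming the fine grid aligns exactly with the coarse grid, 10x10 blocks can be summed with a reshape. The helper name `block_sum` is hypothetical.

```python
import numpy as np

def block_sum(fine, factor=10):
    """Sum non-overlapping factor x factor blocks of a 2-D array."""
    nlat, nlon = fine.shape
    if nlat % factor or nlon % factor:
        raise ValueError("array dimensions must be multiples of the factor")
    # Split each axis into (coarse index, position within block), then sum within blocks
    return fine.reshape(nlat // factor, factor, nlon // factor, factor).sum(axis=(1, 3))

fine = np.ones((20, 30))    # toy 0.01-degree field with one unit per cell
coarse = block_sum(fine)    # toy 0.1-degree field; each cell collects 100 fine cells
print(coarse.shape, coarse.sum())
```

Summing conserves the total, which is the property that matters when emissions are moved between grids.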

In [None]:
# State-level ANSI Data
#Read the state ANSI file array
State_ANSI, name_dict = data_load_fn.load_state_ansi(State_ANSI_inputfile)[0:2]
#QA: number of states
print('Read input file: '+ f"{State_ANSI_inputfile}")
print('Total "States" found: ' + '%.0f' % len(State_ANSI))
print(' ')

# 0.01 x0.01 degree Data
# State ANSI IDs and grid cell area (m2) maps
state_ANSI_map = data_load_fn.load_state_ansi_map(Grid_state001_ansi_inputfile)
state_ANSI_map = state_ANSI_map.astype('int32')
#county_ANSI_map = data_load_fn.load_county_ansi_map(Grid_county001_ansi_inputfile)
#county_ANSI_map = county_ANSI_map.astype('int32')
area_map, lat001, lon001 = data_load_fn.load_area_map_001(Grid_area001_inputfile)

# 0.1 x0.1 degree data
# grid cell area and state and county ANSI maps
area_map01, Lat01, Lon01 = data_load_fn.load_area_map_01(Grid_area01_inputfile)[0:3]
#Select relevant Continental 0.1 x0.1 domain
Lat_01 = Lat01[ilat_start:ilat_end]
Lon_01 = Lon01[ilon_start:ilon_end]
area_matrix_01 = data_fn.regrid001_to_01(area_map, Lat_01, Lon_01)
area_matrix_01 *= 10000  #convert from m2 to cm2

state_ANSI_map_01 = data_fn.regrid001_to_01(state_ANSI_map, Lat_01, Lon_01)

# Print time
ct = datetime.datetime.now() 
print("current time:", ct) 

-------------
## Step 2: Read-in and Format Proxy Data
-------------

#### Step 2.1 Read In Proxy Mapping File & Make Proxy Arrays

In [None]:
#load GHGI Mapping Groups
names = pd.read_excel(Coal_Mapping_inputfile, sheet_name = "GHGI Map - Coal", usecols = "A:B",skiprows = 1, header = 0)
colnames = names.columns.values
ghgi_coal_map = pd.read_excel(Coal_Mapping_inputfile, sheet_name = "GHGI Map - Coal", usecols = "A:B", skiprows = 1, names = colnames)
#drop rows with no data, remove the parentheses and ""
ghgi_coal_map = ghgi_coal_map[ghgi_coal_map['GHGI_Emi_Group'] != 'na']
ghgi_coal_map = ghgi_coal_map[ghgi_coal_map['GHGI_Emi_Group'].notna()]
ghgi_coal_map['GHGI_Source']= ghgi_coal_map['GHGI_Source'].str.replace("(","",regex=False)
ghgi_coal_map['GHGI_Source']= ghgi_coal_map['GHGI_Source'].str.replace(")","",regex=False)
ghgi_coal_map.reset_index(inplace=True, drop=True)
display(ghgi_coal_map)

#load emission group - proxy map
names = pd.read_excel(Coal_Mapping_inputfile, sheet_name = "Proxy Map - Coal", usecols = "A:E",skiprows = 1, header = 0)
colnames = names.columns.values
proxy_coal_map = pd.read_excel(Coal_Mapping_inputfile, sheet_name = "Proxy Map - Coal", usecols = "A:E", skiprows = 1, names = colnames)
display((proxy_coal_map))

#create empty proxy and emission group arrays (add months for proxy variables that have monthly data)
for igroup in np.arange(0,len(proxy_coal_map)):
    if proxy_coal_map.loc[igroup, 'Grid_Month_Flag'] ==0:
        vars()[proxy_coal_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        vars()[proxy_coal_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years])
    else:
        vars()[proxy_coal_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
        vars()[proxy_coal_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years,num_months])
        
    vars()[proxy_coal_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([num_years])
    
    if proxy_coal_map.loc[igroup,'State_Proxy_Group'] != '-':
        if proxy_coal_map.loc[igroup,'State_Month_Flag'] == 0:
            vars()[proxy_coal_map.loc[igroup,'State_Proxy_Group']] = np.zeros([len(State_ANSI),num_years])
            #print('here')
        else:
            vars()[proxy_coal_map.loc[igroup,'State_Proxy_Group']] = np.zeros([len(State_ANSI),num_years,num_months])
    else:
        continue # do not make state proxy variable if no variable assigned in mapping file
        
emi_group_names = np.unique(ghgi_coal_map['GHGI_Emi_Group'])

print('QA/QC: Is the number of emission groups the same for the proxy and emissions tabs?')
if (len(emi_group_names) == len(np.unique(proxy_coal_map['GHGI_Emi_Group']))):
    print('PASS')
else:
    print('FAIL')
    print(emi_group_names)

#### Step 2.2. Read in EPA State-Level Data

In [None]:
#Read in State-Level Data from InvDB tab in the Inventory workbook (Tg == 1000 kt)

names = pd.read_excel(EPA_coal_inputfile, sheet_name = "InvDB", usecols = "A:AJ",skiprows = 15, header = 0)
colnames = names.columns.values
EPA_emi_coal_CH4 = pd.read_excel(EPA_coal_inputfile, sheet_name = "InvDB", usecols = "A:AJ", skiprows = 15, nrows = 140,names = colnames)
EPA_emi_coal_CH4 = EPA_emi_coal_CH4.fillna('')
EPA_emi_coal_CH4 = EPA_emi_coal_CH4.drop(columns = [n for n in range(1990, start_year,1)])
EPA_emi_coal_CH4 = EPA_emi_coal_CH4.drop(columns = ['Sector','Source','Subsource','Fuel','GHG'])
EPA_emi_coal_CH4.rename(columns={'Subref':'Source'},inplace=True)
EPA_emi_coal_CH4['Source']= EPA_emi_coal_CH4['Source'].str.replace("(","",regex=False)
EPA_emi_coal_CH4['Source']= EPA_emi_coal_CH4['Source'].str.replace(")","",regex=False)
EPA_emi_coal_CH4.reset_index(inplace=True, drop=True)


display(EPA_emi_coal_CH4)

# Make State_arrays for net underground emissions, post-mining underground, post-mining surface, and surface emissions
state_under = np.zeros([len(State_ANSI),num_years])
state_post_under = np.zeros([len(State_ANSI),num_years])
state_post_surf = np.zeros([len(State_ANSI),num_years])
state_suface = np.zeros([len(State_ANSI),num_years])

for istate in np.arange(0, len(State_ANSI)):
    temp = EPA_emi_coal_CH4[EPA_emi_coal_CH4['State'] == State_ANSI['abbr'][istate]]
    if len(temp) > 0:
        for iyear in np.arange(0, num_years):
            state_under[istate,iyear] = float(temp.loc[temp['Source']=='Underground Liberated',year_range[iyear]])+\
                                        float(temp.loc[temp['Source']=='Underground Recovered &Used',year_range[iyear]])
            state_post_under[istate,iyear] = float(temp.loc[temp['Source']=='Post-Mining Underground',year_range[iyear]])
            state_post_surf[istate,iyear] = float(temp.loc[temp['Source']=='Post-Mining Surface',year_range[iyear]])
            state_suface[istate,iyear] = float(temp.loc[temp['Source']=='Surface Mining',year_range[iyear]])
            

#### Step 2.3. Read mine-level Data

##### Step 2.3.1 Read in EPA Data
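
Mine-level net emissions in the GHGI workbook are volumes (mmcf) and are divided by `mmcf_to_tg` to obtain Tg. The short sketch below repeats that conversion and derives the methane density that the factor implies; the function name `volume_to_tg` is hypothetical, and the density is plain arithmetic on the factor from Step 0 rather than a value quoted from the workbook.

```python
MMCF_PER_TG = 51921  # mmcf of CH4 per Tg, the mmcf_to_tg value set in Step 0

def volume_to_tg(volume_mmcf):
    """Convert a methane volume in million cubic feet to a mass in Tg."""
    return volume_mmcf / MMCF_PER_TG

# Density implied by the factor: 1 Tg = 1e12 g spread over 51921e6 cubic feet
implied_density_g_per_ft3 = 1e12 / (MMCF_PER_TG * 1e6)

print(volume_to_tg(1000.0), "Tg for a hypothetical 1000 mmcf mine")
print(round(implied_density_g_per_ft3, 2), "g CH4 per cubic foot")
```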

In [None]:
# 1) Read-in Mine level emissions data (from GHGI workbook) 
# emissions in Tg

for iyear in np.arange(0,num_years):
    year_name = 'UG-'+year_range_str[iyear]
    start_row = 3
    if year_range[iyear]==2012:
        start_row = 2
        num_rows = 117
    elif year_range[iyear] ==2013:
        num_rows = 204
    elif year_range[iyear] ==2014:
        num_rows = 178
    elif year_range[iyear] ==2015:
        num_rows = 219
    elif year_range[iyear] ==2016:
        num_rows = 162
    elif year_range[iyear] ==2017:
        num_rows = 161
    elif year_range[iyear] ==2018:
        num_rows = 163
    names = pd.read_excel(EPA_coal_inputfile, sheet_name = year_name, usecols = "A,D,G,J",skiprows = start_row, header = 0)
    colnames = names.columns.values
    EPA_emi_mine = pd.read_excel(EPA_coal_inputfile, sheet_name = year_name, usecols = "A,D,G,J", skiprows = start_row, nrows = num_rows,names = colnames)
    EPA_emi_mine = EPA_emi_mine.fillna('')
    EPA_emi_mine.rename(columns={EPA_emi_mine.columns[3]: 'Net Emissions Tg'},inplace=True)
    EPA_emi_mine['Net Emissions Tg'] /= mmcf_to_tg #convert from mmcf to Tg
    EPA_emi_mine.reset_index(inplace=True, drop=True)
    vars()['EPA_mine_emi_'+year_range_str[iyear]] = EPA_emi_mine


##### Step 2.3.2 Read in EIA Data

In [None]:
#2 ) Read in EIA Data
for iyear in np.arange(0,num_years):
    if year_range[iyear]==2012:
        cols= "A,B,D,G,M"
    else:
        cols = "A,B,D,G,N"
    filename = EIA_mine_inputfile+year_range_str[iyear]+'.xls'
    names = pd.read_excel(filename, sheet_name = 'Hist_Coal_Prod', usecols = cols,skiprows = 3, header = 0)
    colnames = names.columns.values
    EIA_mine_prod = pd.read_excel(filename, sheet_name = 'Hist_Coal_Prod', usecols = cols, skiprows = 3, names = colnames)
    EIA_mine_prod = EIA_mine_prod.fillna('')

    EIA_mine_prod.rename(columns={EIA_mine_prod.columns[4]: 'Production Short Tons'},inplace=True)
    EIA_mine_prod.rename(columns={EIA_mine_prod.columns[1]: 'MSHA'},inplace=True)
    EIA_mine_prod.reset_index(inplace=True, drop=True)
    vars()['EIA_mine_prod'+year_range_str[iyear]] = EIA_mine_prod

    display(vars()['EIA_mine_prod'+year_range_str[iyear]])

##### Step 2.3.3 Make Mine Reference Array with Mine-Level ID, Production, and Emissions Data
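
The cell below builds the mine reference table row by row. For comparison, the same idea (one row per unique MSHA ID, plus one production column per year) can be sketched with `pd.concat`, `drop_duplicates`, and `Series.map`. The yearly tables here are made-up toy data, the dictionary `eia_by_year` is a hypothetical stand-in for the `EIA_mine_prod<year>` variables, and the sketch assumes each MSHA ID appears at most once per year.

```python
import pandas as pd

# Hypothetical yearly EIA tables (column names follow the notebook, values are made up)
eia_by_year = {
    2012: pd.DataFrame({"MSHA": [101, 102],
                        "Mine Type": ["Surface", "Underground"],
                        "Mine State": ["Wyoming", "Alabama"],
                        "Production Short Tons": [500.0, 200.0]}),
    2013: pd.DataFrame({"MSHA": [102, 103],
                        "Mine Type": ["Underground", "Surface"],
                        "Mine State": ["Alabama", "Montana"],
                        "Production Short Tons": [250.0, 100.0]}),
}

id_cols = ["MSHA", "Mine Type", "Mine State"]

# One row per unique mine, keeping the first year in which it appears
mine_ref = (pd.concat([df[id_cols] for df in eia_by_year.values()], ignore_index=True)
              .drop_duplicates(subset="MSHA", keep="first")
              .reset_index(drop=True))

# One production column per year; mines absent in a given year get 0.0
for year, df in eia_by_year.items():
    prod = df.set_index("MSHA")["Production Short Tons"]
    mine_ref[f"prod_{year}"] = mine_ref["MSHA"].map(prod).fillna(0.0)

print(mine_ref)
```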

In [None]:
# Make mine reference dataframe for EIA mines (make a list of all unique mines)

# Start with 2012 data
iyear = 0
mine_ref = pd.DataFrame(data=vars()['EIA_mine_prod'+year_range_str[iyear]]['MSHA'])
mine_ref['Mine Type'] = vars()['EIA_mine_prod'+year_range_str[iyear]]['Mine Type']
mine_ref['Mine State'] = vars()['EIA_mine_prod'+year_range_str[iyear]]['Mine State']

# Fill out dataframe with unique mines for the rest of years
for iyear in np.arange(0,num_years):
    temp_df = vars()['EIA_mine_prod'+year_range_str[iyear]]
    for imine in np.arange(0,len(temp_df)):
        if len(mine_ref.loc[mine_ref['MSHA']==temp_df['MSHA'][imine]])==0: # if mine not already in dataframe
            new_mine = pd.DataFrame([[temp_df['MSHA'][imine],temp_df['Mine Type'][imine],temp_df['Mine State'][imine]]], columns=['MSHA','Mine Type','Mine State']) # add mine entry to mine_ref
            mine_ref=pd.concat([mine_ref,new_mine],ignore_index=True) # DataFrame.append was removed in pandas 2.0
mine_ref.reset_index(inplace=True, drop=True)

# Add production and emissions for each mine for each year
for iyear in np.arange(0,num_years):
    temp_eia_df = vars()['EIA_mine_prod'+year_range_str[iyear]]
    temp_epa_df = vars()['EPA_mine_emi_'+year_range_str[iyear]] 
    mine_ref['prod_'+year_range_str[iyear]]=0.0 #add production columns for each year
    mine_ref['emi_'+year_range_str[iyear]]=0.0 #add emissions columns for each year
    
    for imine in np.arange(0, len(temp_eia_df)): #add production values
        match_mine = np.where(mine_ref['MSHA']==temp_eia_df['MSHA'][imine])[0][0]   
        mine_ref.at[match_mine, 'prod_'+year_range_str[iyear]] = temp_eia_df['Production Short Tons'][imine]
        
        
    for imine in np.arange(0, len(temp_epa_df)): #add emi values
        if len(np.where(mine_ref['MSHA']==temp_epa_df['MSHA Mine ID'][imine])[0])!=0:
            match_mine = np.where(mine_ref['MSHA']==temp_epa_df['MSHA Mine ID'][imine])[0][0]            
            mine_ref.at[match_mine, 'emi_'+year_range_str[iyear]] = temp_epa_df['Net Emissions Tg'][imine]
                           
display(mine_ref)

#### Step 2.4 Add Mine Location Information (from MSHA)

##### Step 2.4.1 Read in MSHA Data
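
The location lookup in the cell below matches each mine to the MSHA table one at a time and forces longitudes to be negative (western hemisphere). A compact sketch of the same logic using a left merge is shown here. The toy tables and the name `located` are hypothetical; mines without a match keep a latitude and longitude of 0, mirroring the placeholder value used in the notebook.

```python
import pandas as pd

# Toy stand-ins for mine_ref and the MSHA location table (values are made up)
mine_ref = pd.DataFrame({"MSHA": [101, 102, 103]})
msha = pd.DataFrame({"MINE_ID": [101, 102],
                     "LATITUDE": [43.7, 33.5],
                     "LONGITUDE": [105.3, -87.2]})  # signs are inconsistent in the source

# Left merge keeps every mine; unmatched mines get NaN coordinates
located = mine_ref.merge(msha.drop_duplicates(subset="MINE_ID"), how="left",
                         left_on="MSHA", right_on="MINE_ID").drop(columns="MINE_ID")

located["LAT"] = located.pop("LATITUDE").fillna(0.0)
lon = located.pop("LONGITUDE").abs()
located["LON"] = (-lon).fillna(0.0)  # all longitudes made negative, missing ones set to 0

print(located)
```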

In [None]:
# Read datafiles with accurate lat/lon - Source: msha.gov
MSHA_Locations = pd.read_csv(Mine_loc_inputfile,sep="|",encoding= 'unicode_escape',usecols=["MINE_ID","LATITUDE","LONGITUDE"])
MSHA_Locations.dropna(axis=0,inplace=True)
MSHA_Locations.reset_index(inplace=True)

# Add lat/lon columns to mine_ref
mine_ref['LAT']=0.0
mine_ref['LON']=0.0

counter=0
for imine in np.arange(0,len(mine_ref)):
    if len(MSHA_Locations[MSHA_Locations['MINE_ID']==mine_ref['MSHA'][imine]])!=0:
        match_mine = np.where(mine_ref['MSHA'][imine]==MSHA_Locations['MINE_ID'])[0][0]
        mine_ref.at[imine,'LAT'] = MSHA_Locations['LATITUDE'][match_mine]
        if MSHA_Locations['LONGITUDE'][match_mine] <0:
            mine_ref.at[imine,'LON'] = MSHA_Locations['LONGITUDE'][match_mine] 
        else:
            mine_ref.at[imine,'LON'] = -MSHA_Locations['LONGITUDE'][match_mine] #MSHA longitudes must be made negative
        counter += 1

display(mine_ref.head(20))

##### Step 2.4.2. Format/Correct Location Information

In [None]:
# Throw out mines without location as we won't allocate emissions to them
to_delete=[]
for imine in np.arange(0,len(mine_ref)):
    if mine_ref['LAT'][imine]==0:
        to_delete.append(imine)

mine_ref.drop(to_delete,inplace=True)
mine_ref.reset_index(inplace=True,drop=True)
        
# Print used mines and production percentage captured with MSHA locations
print('Used mines: ', len(mine_ref))
print('')
print('QA/QC: Check that all mines that have production data also have location data')
for iyear in np.arange(0, num_years):
    if np.sum(mine_ref['prod_'+year_range_str[iyear]][mine_ref['LAT']>0])/np.sum(mine_ref['prod_'+year_range_str[iyear]]) ==1:
        print(year_range_str[iyear], ': PASS')
    else:
        print(year_range_str[iyear], 'FAIL - Check')     

##### Step 2.4.3. Correct mine locations based on checking them on Google Maps

In [None]:
#surface mines
Updated_Loc = pd.read_csv(Corrected_surf_mine_inputfile,usecols=[0,1,2])
Updated_Loc[['correct_lat', 'correct_lng']] = Updated_Loc['correct_lat'].str.split(' ',expand = True)
Updated_Loc['msha_change'] = Updated_Loc['msha_change'].replace(np.nan,0)
Updated_Loc['correct_lat'] = Updated_Loc['correct_lat'].replace(np.nan,0)
Updated_Loc['correct_lng'] = Updated_Loc['correct_lng'].replace(np.nan,0)
Updated_Loc['msha_change'] = Updated_Loc['msha_change'].astype(int)

#underground mines
Updated_Loc_ug = pd.read_csv(Corrected_ug_mine_inputfile,usecols=[0,1,2])
Updated_Loc_ug[['correct_lat', 'correct_lng']] = Updated_Loc_ug['correct_lat'].str.split(' ',expand = True)
Updated_Loc_ug['msha_change'] = Updated_Loc_ug['msha_change'].replace(np.nan,0)
Updated_Loc_ug['correct_lat'] = Updated_Loc_ug['correct_lat'].replace(np.nan,0)
Updated_Loc_ug['correct_lng'] = Updated_Loc_ug['correct_lng'].replace(np.nan,0)
Updated_Loc_ug['msha_change'] = Updated_Loc_ug['msha_change'].astype(int)

#surface mines
for imine in np.arange(0,len(Updated_Loc)):
    if Updated_Loc['msha_change'][imine] >0:
        match_change = np.where(mine_ref['MSHA'] == Updated_Loc['msha_change'][imine])[0][0]
        mine_ref.at[match_change,'LAT'] = Updated_Loc['correct_lat'][imine]
        mine_ref.at[match_change,'LON']= Updated_Loc['correct_lng'][imine]
        
#underground mines
for imine in np.arange(len(Updated_Loc_ug)):
    if Updated_Loc_ug['msha_change'][imine] >0:
        match_change = np.where(mine_ref['MSHA'] == Updated_Loc_ug['msha_change'][imine])[0][0]
        mine_ref.at[match_change,'LAT'] = Updated_Loc_ug['correct_lat'][imine]
        mine_ref.at[match_change,'LON']= Updated_Loc_ug['correct_lng'][imine]

#### Step 2.5 Calculate Mine Emissions (based on GHGRP/EPA mine emissions and estimated from mine Production)
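
For mines without GHGRP data, the allocation in the cell below amounts to: mine emissions = state emissions x (basin weight x mine production) / sum over all mines in the state of (basin weight x mine production). The sketch below expresses that formula with a lookup table instead of repeated branches. The function name `allocate_state_emissions` and the toy inputs are hypothetical; the weights are the underground values used in the cell below, and regions not listed are assumed to carry a weight of 1.

```python
import numpy as np

# Underground basin weights as they appear in the cell below
UNDERGROUND_WEIGHTS = {
    "Kentucky (East)": 61.4,
    "Kentucky (West)": 64.3,
    "West Virginia (Northern)": 138.4,
    "West Virginia (Southern)": 136.8,
}

def allocate_state_emissions(state_emi, regions, production, weights):
    """Split a state total across mines in proportion to weighted production."""
    w = np.array([weights.get(region, 1.0) for region in regions])
    weighted = w * np.asarray(production, dtype=float)
    total = weighted.sum()
    if total == 0:
        return np.zeros_like(weighted)  # no production reported, nothing to allocate
    return state_emi * weighted / total

# Three hypothetical Kentucky mines sharing a state total of 0.02 Tg
regions = ["Kentucky (East)", "Kentucky (West)", "Kentucky (West)"]
production = [100.0, 50.0, 150.0]
mine_emi = allocate_state_emissions(0.02, regions, production, UNDERGROUND_WEIGHTS)
print(mine_emi, mine_emi.sum())
```

Because the weights enter both the numerator and the state total, the mine-level values always sum back to the state emission. The sketch does not cover mines with GHGRP-reported emissions, which the notebook handles separately.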

In [None]:
# The proxies used to take state-level emissions down to the grid-cell level are estimates of the relative emissions
# from each mine. Emissions are either taken from the GHGRP (read in above) or estimated based on the relative
# amount of coal production at each mine. 
# Step 1. Emissions are estimated for each mine
#     1A. State-level production levels are calculated
#     1B. Mine-level emissions are calculated (as state emissions weighted by coal production at each mine relative to state total)
# Step 2. Where available, GHGRP emissions for the mine are used for underground mines. 
# Step 3. Emissions are placed onto a grid

#Step 1. Emissions are estimated at each mine using state-level emissions * mine production/state production
# Step 1.A - Calculate State-based totals for both underground and surface mines

State_underground_production = np.zeros([len(State_ANSI),num_years])
State_surface_production = np.zeros([len(State_ANSI),num_years])
EIA_und = np.zeros([len(mine_ref),num_years])
EIA_und_post = np.zeros([len(mine_ref),num_years])
EIA_sur = np.zeros([len(mine_ref),num_years])
EIA_sur_post = np.zeros([len(mine_ref),num_years])
map_und      = np.zeros([len(lat001),len(lon001),num_years])
map_und_post = np.zeros([len(lat001),len(lon001),num_years])
map_sur      = np.zeros([len(lat001),len(lon001),num_years])
map_sur_post = np.zeros([len(lat001),len(lon001),num_years])
map_und_nongrid      = np.zeros([num_years])
map_und_post_nongrid = np.zeros([num_years])
map_sur_nongrid      = np.zeros([num_years])
map_sur_post_nongrid = np.zeros([num_years])

for iyear in np.arange(0, num_years):
    #Weight Kentucky & West Virginia based on basin
    for imine in np.arange(0,len(mine_ref)):
        if mine_ref['Mine Type'][imine] == 'Underground':  
            if mine_ref['Mine State'][imine] == 'Kentucky (East)':
                istate = np.where(State_ANSI==name_dict['Kentucky'])[0][0]
                State_underground_production[istate,iyear] += 61.4*float(mine_ref['prod_'+year_range_str[iyear]][imine])
            elif mine_ref['Mine State'][imine] == 'Kentucky (West)':
                istate = np.where(State_ANSI==name_dict['Kentucky'])[0][0]
                State_underground_production[istate,iyear] += 64.3*float(mine_ref['prod_'+year_range_str[iyear]][imine])
            elif mine_ref['Mine State'][imine] == 'West Virginia (Northern)':
                istate = np.where(State_ANSI==name_dict['West Virginia'])[0][0]
                State_underground_production[istate,iyear] += 138.4*float(mine_ref['prod_'+year_range_str[iyear]][imine])
            elif mine_ref['Mine State'][imine] == 'West Virginia (Southern)':
                istate = np.where(State_ANSI==name_dict['West Virginia'])[0][0]
                State_underground_production[istate,iyear] += 136.8*float(mine_ref['prod_'+year_range_str[iyear]][imine])
            elif (mine_ref['Mine State'][imine] == 'Pennsylvania (Bituminous)') or (mine_ref['Mine State'][imine] == 'Pennsylvania (Anthracite)'):
                istate = np.where(State_ANSI==name_dict['Pennsylvania'])[0][0]
                State_underground_production[istate,iyear] += float(mine_ref['prod_'+year_range_str[iyear]][imine])

            elif mine_ref['Mine State'][imine] == 'Refuse Recovery':
                continue
            else:
                istate = np.where(State_ANSI==name_dict[mine_ref['Mine State'][imine]])[0][0]
                State_underground_production[istate,iyear] += float(mine_ref['prod_'+year_range_str[iyear]][imine])
                
        elif mine_ref['Mine Type'][imine] == 'Surface':  
            if mine_ref['Mine State'][imine] == 'Kentucky (East)':
                istate = np.where(State_ANSI==name_dict['Kentucky'])[0][0]
                State_surface_production[istate,iyear] += 24.9*float(mine_ref['prod_'+year_range_str[iyear]][imine])
            elif mine_ref['Mine State'][imine] == 'Kentucky (West)':
                istate = np.where(State_ANSI==name_dict['Kentucky'])[0][0]
                State_surface_production[istate,iyear] += 34.3*float(mine_ref['prod_'+year_range_str[iyear]][imine])
            elif mine_ref['Mine State'][imine] == 'West Virginia (Northern)':
                istate = np.where(State_ANSI==name_dict['West Virginia'])[0][0]
                State_surface_production[istate,iyear] += 59.5*float(mine_ref['prod_'+year_range_str[iyear]][imine])
            elif mine_ref['Mine State'][imine] == 'West Virginia (Southern)':
                istate = np.where(State_ANSI==name_dict['West Virginia'])[0][0]
                State_surface_production[istate,iyear] += 24.9*float(mine_ref['prod_'+year_range_str[iyear]][imine])
            elif (mine_ref['Mine State'][imine] == 'Pennsylvania (Bituminous)') or (mine_ref['Mine State'][imine] == 'Pennsylvania (Anthracite)'):
                istate = np.where(State_ANSI==name_dict['Pennsylvania'])[0][0]
                State_surface_production[istate,iyear] += float(mine_ref['prod_'+year_range_str[iyear]][imine])
            elif mine_ref['Mine State'][imine] == 'Refuse Recovery':
                continue
            else:
                istate = np.where(State_ANSI==name_dict[mine_ref['Mine State'][imine]])[0][0]
                State_surface_production[istate,iyear] += float(mine_ref['prod_'+year_range_str[iyear]][imine])

# Step 1.B - Calculate Mine-based emissions
# Calculate mine-by-mine emissions based on production (for all surface mines and underground mines where GHGRP unavailable)

for iyear in np.arange(0, num_years):
    #Weight Kentucky & West Virginia based on basin
    for imine in np.arange(0,len(mine_ref)):
        if mine_ref['Mine Type'][imine] == 'Underground':
            if mine_ref['Mine State'][imine] == 'Kentucky (East)':
                istate = np.where(State_ANSI==name_dict['Kentucky'])[0][0]
                EIA_und_post[imine,iyear] = state_post_under[istate,iyear]*61.4*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_underground_production[istate,iyear])
                if mine_ref['emi_'+year_range_str[iyear]][imine] == 0:
                    EIA_und[imine,iyear] = state_under[istate,iyear]*61.4*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_underground_production[istate,iyear])

            elif mine_ref['Mine State'][imine] == 'Kentucky (West)':
                istate = np.where(State_ANSI==name_dict['Kentucky'])[0][0]
                EIA_und_post[imine,iyear] = state_post_under[istate,iyear]*64.3*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_underground_production[istate,iyear])
                if mine_ref['emi_'+year_range_str[iyear]][imine] == 0:
                    EIA_und[imine,iyear] = state_under[istate,iyear]*64.3*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_underground_production[istate,iyear])

            elif mine_ref['Mine State'][imine] == 'West Virginia (Northern)':
                istate = np.where(State_ANSI==name_dict['West Virginia'])[0][0]
                EIA_und_post[imine,iyear] = state_post_under[istate,iyear]*138.4*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_underground_production[istate,iyear])
                if mine_ref['emi_'+year_range_str[iyear]][imine] == 0:
                    EIA_und[imine,iyear] = state_under[istate,iyear]*138.4*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_underground_production[istate,iyear])            

            elif mine_ref['Mine State'][imine] == 'West Virginia (Southern)':
                istate = np.where(State_ANSI==name_dict['West Virginia'])[0][0]
                EIA_und_post[imine,iyear] = state_post_under[istate,iyear]*136.8*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_underground_production[istate,iyear])
                if mine_ref['emi_'+year_range_str[iyear]][imine] == 0:
                    EIA_und[imine,iyear] = state_under[istate,iyear]*136.8*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_underground_production[istate,iyear])            

            elif mine_ref['Mine State'][imine] == 'Pennsylvania (Bituminous)' or mine_ref['Mine State'][imine] == 'Pennsylvania (Anthracite)':
                istate = np.where(State_ANSI==name_dict['Pennsylvania'])[0][0]
                EIA_und_post[imine,iyear] = state_post_under[istate,iyear]*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_underground_production[istate,iyear])
                if mine_ref['emi_'+year_range_str[iyear]][imine] == 0:
                    EIA_und[imine,iyear] = state_under[istate,iyear]*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_underground_production[istate,iyear])            

            elif mine_ref['Mine State'][imine] == 'Refuse Recovery':
                continue
            else:
                istate = np.where(State_ANSI==name_dict[mine_ref['Mine State'][imine]])[0][0]
                EIA_und_post[imine,iyear] = state_post_under[istate,iyear]*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_underground_production[istate,iyear])
                if mine_ref['emi_'+year_range_str[iyear]][imine] == 0:
                    EIA_und[imine,iyear] = state_under[istate,iyear]*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_underground_production[istate,iyear])            


        elif mine_ref['Mine Type'][imine] == 'Surface':  
            if mine_ref['Mine State'][imine] == 'Kentucky (East)':
                istate = np.where(State_ANSI==name_dict['Kentucky'])[0][0]
                EIA_sur_post[imine,iyear] = state_post_surf[istate,iyear]*24.9*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_surface_production[istate,iyear])
                EIA_sur[imine,iyear] =      state_suface[istate,iyear]*24.9*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_surface_production[istate,iyear])            
            elif mine_ref['Mine State'][imine] == 'Kentucky (West)':
                istate = np.where(State_ANSI==name_dict['Kentucky'])[0][0]
                EIA_sur_post[imine,iyear] = state_post_surf[istate,iyear]*34.3*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_surface_production[istate,iyear])
                EIA_sur[imine,iyear] =      state_suface[istate,iyear]*34.3*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_surface_production[istate,iyear])           
            elif mine_ref['Mine State'][imine] == 'West Virginia (Northern)':
                istate = np.where(State_ANSI==name_dict['West Virginia'])[0][0]
                EIA_sur_post[imine,iyear] = state_post_surf[istate,iyear]*59.5*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_surface_production[istate,iyear])
                EIA_sur[imine,iyear] =      state_suface[istate,iyear]*59.5*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_surface_production[istate,iyear])            
            elif mine_ref['Mine State'][imine] == 'West Virginia (Southern)':
                istate = np.where(State_ANSI==name_dict['West Virginia'])[0][0]
                EIA_sur_post[imine,iyear] = state_post_surf[istate,iyear]*24.9*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_surface_production[istate,iyear])
                EIA_sur[imine,iyear] =      state_suface[istate,iyear]*24.9*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_surface_production[istate,iyear])            
            elif mine_ref['Mine State'][imine] == 'Pennsylvania (Bituminous)' or mine_ref['Mine State'][imine] == 'Pennsylvania (Anthracite)':
                istate = np.where(State_ANSI==name_dict['Pennsylvania'])[0][0]
                EIA_sur_post[imine,iyear] = state_post_surf[istate,iyear]*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_surface_production[istate,iyear])
                EIA_sur[imine,iyear] = state_suface[istate,iyear]*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_surface_production[istate,iyear])            
            elif mine_ref['Mine State'][imine] == 'Refuse Recovery':
                continue
            else:
                istate = np.where(State_ANSI==name_dict[mine_ref['Mine State'][imine]])[0][0]
                EIA_sur_post[imine,iyear] = state_post_surf[istate,iyear]*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_surface_production[istate,iyear])
                EIA_sur[imine,iyear] = state_suface[istate,iyear]*data_fn.safe_div(mine_ref['prod_'+year_range_str[iyear]][imine],State_surface_production[istate,iyear])   

# Step 2. Replace estimated emissions with GHGRP emissions where available
# (first scale the estimates to the national total minus the GHGRP total, so that estimated emissions are not over- or under-weighted relative to GHGRP)
for iyear in np.arange(0, num_years):
    EIA_und[:,iyear] = EIA_und[:,iyear]*(state_under[:,iyear].sum()-np.array(mine_ref['emi_'+year_range_str[iyear]]).sum())/float(EIA_und[:,iyear].sum())
    
    #Add in GHGRP underground mine emissions
    array_with_GHGRP = np.array(mine_ref['emi_'+year_range_str[iyear]])
    EIA_und[array_with_GHGRP>0,iyear] = array_with_GHGRP[array_with_GHGRP>0]
    
# Replace nan values as zero
EIA_und = np.nan_to_num(EIA_und, copy=True)
EIA_und_post = np.nan_to_num(EIA_und_post, copy=True)
EIA_sur = np.nan_to_num(EIA_sur, copy=True)
EIA_sur_post = np.nan_to_num(EIA_sur_post, copy=True)


#Step 3. Place data onto grid (0.01 x0.01)
for imine in np.arange(0,len(mine_ref)):
    if mine_ref['LON'][imine] > Lon_left and mine_ref['LON'][imine] < Lon_right and \
        mine_ref['LAT'][imine] > Lat_low and mine_ref['LAT'][imine] < Lat_up:
        ilat = int((mine_ref['LAT'][imine] - Lat_low)/Res_01)
        ilon = int((mine_ref['LON'][imine] - Lon_left)/Res_01)
        for iyear in np.arange(0, num_years):
            map_und[ilat,ilon,iyear] += EIA_und[imine,iyear]
            map_und_post[ilat,ilon,iyear] += EIA_und_post[imine,iyear]
            map_sur[ilat,ilon,iyear] += EIA_sur[imine,iyear]
            map_sur_post[ilat,ilon,iyear] += EIA_sur_post[imine,iyear]
    else:
        for iyear in np.arange(0, num_years):
            map_und_nongrid[iyear] += EIA_und[imine,iyear]
            map_und_post_nongrid[iyear] += EIA_und_post[imine,iyear]
            map_sur_nongrid[iyear] += EIA_sur[imine,iyear]
            map_sur_post_nongrid[iyear] += EIA_sur_post[imine,iyear]
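
The gridding step above converts each mine's latitude and longitude into a grid-cell index and accumulates off-grid mines separately. A minimal standalone sketch of that idea follows. The function name, the tiny 2 x 3 domain, and the sample values are hypothetical and serve only to illustrate the indexing; this is not the notebook's own implementation.

```python
import numpy as np

def bin_points_to_grid(lats, lons, values, lat_low, lon_left, res, n_lat, n_lon):
    """Accumulate point values onto a regular grid.

    Returns (grid, off_grid_total). Points outside the grid are summed
    into off_grid_total so that no emissions are silently dropped.
    """
    grid = np.zeros((n_lat, n_lon))
    off_grid = 0.0
    for lat, lon, val in zip(lats, lons, values):
        # Index of the cell whose lower-left corner lies at or below the point
        ilat = int(np.floor((lat - lat_low) / res))
        ilon = int(np.floor((lon - lon_left) / res))
        if 0 <= ilat < n_lat and 0 <= ilon < n_lon:
            grid[ilat, ilon] += val
        else:
            off_grid += val
    return grid, off_grid

# Hypothetical 2 x 3 domain at 1-degree resolution (illustrative values only)
grid, off_grid = bin_points_to_grid(
    lats=[30.5, 30.7, 31.2, 45.0],
    lons=[-100.5, -100.1, -98.3, -100.0],
    values=[1.0, 2.0, 4.0, 8.0],
    lat_low=30.0, lon_left=-101.0, res=1.0, n_lat=2, n_lon=3)
```

The first two points share a cell and are summed there, while the last point falls outside the domain and is counted in the off-grid total, so the gridded and off-grid sums together equal the input total.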
    

-----------
## Step 3. Read in and Format US EPA GHGI Emissions
----------

In [None]:
#Read in Data from InvDB tab in the Inventory workbook (Tg == 1000 kt)

names = pd.read_excel(EPA_coal_inputfile, sheet_name = "InvDB", usecols = "A:AJ",skiprows = 15, header = 0)
colnames = names.columns.values
EPA_emi_coal_CH4 = pd.read_excel(EPA_coal_inputfile, sheet_name = "InvDB", usecols = "A:AJ", skiprows = 15, nrows = 140,names = colnames)
EPA_emi_coal_CH4 = EPA_emi_coal_CH4.fillna('')
EPA_emi_coal_CH4 = EPA_emi_coal_CH4.drop(columns = [n for n in range(1990, start_year,1)])
EPA_emi_coal_CH4 = EPA_emi_coal_CH4.drop(columns = ['Sector','Source','Subsource','Fuel','GHG'])
EPA_emi_coal_CH4.rename(columns={'Subref':'Source'},inplace=True)
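# regex=True is set explicitly because newer pandas versions default to literal (non-regex) matching
EPA_emi_coal_CH4['Source']= EPA_emi_coal_CH4['Source'].str.replace(r"\(","", regex=True)
EPA_emi_coal_CH4['Source']= EPA_emi_coal_CH4['Source'].str.replace(r"\)","", regex=True)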
EPA_emi_coal_CH4.reset_index(inplace=True, drop=True)

#calculate national total from state values
temp = EPA_emi_coal_CH4.sum(axis=0)
EPA_emi_coal_CH4 = EPA_emi_coal_CH4.append(temp, ignore_index=True)
EPA_emi_coal_CH4.iloc[-1,0] = 'Total'
EPA_emi_coal_CH4.iloc[-1,1] = ''
EPA_emi_coal_total = EPA_emi_coal_CH4[EPA_emi_coal_CH4['Source'] == 'Total']

display(EPA_emi_coal_total)
#display(EPA_emi_coal_CH4)

#### 3.2. Split Emissions into Gridding Groups

In [None]:
#split GHG emissions into gridding groups, based on Coal Proxy Mapping file

DEBUG =1
start_year_idx = EPA_emi_coal_CH4.columns.get_loc((start_year))
end_year_idx = EPA_emi_coal_CH4.columns.get_loc((end_year))+1
ghgi_coal_groups = ghgi_coal_map['GHGI_Emi_Group'].unique()
sum_emi = np.zeros([num_years])

for igroup in np.arange(0,len(ghgi_coal_groups)): #loop through all groups, finding the GHGI sources in each group and summing emissions by year
    ##DEBUG## print(ghgi_coal_groups[igroup])
    vars()[ghgi_coal_groups[igroup]] = np.zeros([num_years])
    source_temp = ghgi_coal_map.loc[ghgi_coal_map['GHGI_Emi_Group'] == ghgi_coal_groups[igroup], 'GHGI_Source']
    pattern_temp  = '|'.join(source_temp) 
    #print(pattern_temp) 
    emi_temp =EPA_emi_coal_CH4[EPA_emi_coal_CH4['Source'].str.contains(pattern_temp)]
    #display(emi_temp)
    vars()[ghgi_coal_groups[igroup]][:] = emi_temp.iloc[:,start_year_idx:].sum()
        
        
#Check against total summary emissions 
print('QA/QC #1: Check Processing Emission Sum against GHGI Summary Emissions')
for iyear in np.arange(0,num_years): 
    for igroup in np.arange(0,len(ghgi_coal_groups)):
        sum_emi[iyear] += vars()[ghgi_coal_groups[igroup]][iyear]
        
    summary_emi = EPA_emi_coal_total.iloc[0,iyear+2]  
    #Check 1 - make sure that the sums from all the regions equal the totals reported
    diff1 = abs(sum_emi[iyear] - summary_emi)/((sum_emi[iyear] + summary_emi)/2)
    if DEBUG==1:
        print(summary_emi)
        print(sum_emi[iyear])
    if diff1 < 0.0001:
        print('Year ', year_range[iyear],': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear],': FAIL (check Production & summary tabs): ', diff1*100,'%') 
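
The QA/QC checks in this notebook all compare a calculated total with the reported GHGI total using a symmetric relative difference (the absolute difference divided by the mean of the two values). The sketch below isolates that test in a small helper. The function names and the default tolerance are assumptions made for illustration and are not part of the notebook's `data_fn` module.

```python
def relative_difference(a, b):
    """Symmetric relative difference: |a - b| divided by the mean of a and b."""
    if a == b:
        return 0.0
    mean = (a + b) / 2.0
    if mean == 0:
        # Values differ but average to zero: treat as an unbounded difference
        return float('inf')
    return abs(a - b) / abs(mean)

def qa_check(calculated, reported, tolerance=1e-4):
    """Return (passed, difference_in_percent) for a calculated vs. reported total."""
    diff = relative_difference(calculated, reported)
    return diff < tolerance, diff * 100.0

# Illustrative values only
passed_close, pct_close = qa_check(100.001, 100.0)
passed_far, pct_far = qa_check(101.0, 100.0)
```

Returning the difference already multiplied by 100 keeps the printed number consistent with a trailing percent sign in the report message.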

--------------
## Step 4. Grid Data
-------------

#### Step 4.1. Allocate emissions

##### Step 4.1.1 Assign the Appropriate Proxy Variable Names (state & grid)

In [None]:
# The names on the *left* need to match the 'Coal_ProxyMapping' 'State_Proxy_Group' names 
# (these are initialized in Step 2). 
# The names on the *right* are the variable names used to calculate the proxies in this code.
# Names on the right need to match those from the code in Step 2

#national --> state proxies (state x year )
State_Under = state_under
State_Post_Under = state_post_under
State_Post_Surf = state_post_surf
State_Surf = state_suface

#state --> grid proxies (0.01x0.01)
Map_Under = map_und
Map_Post_Under = map_und_post
Map_Surface = map_sur
Map_Post_Surface = map_sur_post
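
The assignments above exist so that proxy names read from the mapping file can later be resolved to arrays through `vars()`. As a hedged alternative sketch, the same name-to-array lookup can be done with an explicit dictionary, which fails with a clear message when a mapping-file name has no matching array. The registry contents, array shapes, and the `get_proxy` helper are hypothetical.

```python
import numpy as np

# Hypothetical state proxies (2 states x 3 years); values are illustrative only
proxy_registry = {
    "State_Under": np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    "State_Surf": np.array([[0.5, 0.5, 0.5], [1.5, 1.5, 1.5]]),
}

def get_proxy(name, registry):
    """Look up a proxy array by the name used in the mapping file."""
    try:
        return registry[name]
    except KeyError:
        raise KeyError(f"Proxy '{name}' is not defined; check the mapping file") from None

under = get_proxy("State_Under", proxy_registry)
```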


##### Step 4.1.2 Allocate National EPA Emissions to the State-Level

In [None]:
# Calculate state-level emissions
# Emissions in Tg
# State data = national GHGI emissions * state proxy/national total


# Note that national emissions are retained for groups that do not have state proxies (identified in the mapping file)
# and are gridded in the next step

# Make placeholder emission arrays for each group
for igroup in np.arange(0,len(proxy_coal_map)):
    vars()['State_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(State_ANSI),num_years])
    vars()['NonState_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([num_years])
        
#Loop over years
for iyear in np.arange(num_years):
    #Loop over states
    for istate in np.arange(len(State_ANSI)):
        for igroup in np.arange(0,len(proxy_coal_map)):    
            if proxy_coal_map.loc[igroup,'State_Proxy_Group'] != '-' and proxy_coal_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
                vars()['State_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear] += vars()[proxy_coal_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                        data_fn.safe_div(vars()[proxy_coal_map.loc[igroup,'State_Proxy_Group']][istate,iyear], np.sum(vars()[proxy_coal_map.loc[igroup,'State_Proxy_Group']][:,iyear]))
            else:
                vars()['NonState_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']][iyear] = vars()[proxy_coal_map.loc[igroup,'GHGI_Emi_Group']][iyear]
                
# Check sum of all gridded emissions + emissions not included in state allocation
print('QA/QC #2: Check weighted emissions against GHGI')   
for iyear in np.arange(0,num_years):
    summary_emi = EPA_emi_coal_total.iloc[0,iyear+2] 
    calc_emi = 0
    for igroup in np.arange(0,len(proxy_coal_map)):
        calc_emi +=  np.sum(vars()['State_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']][:,iyear])+\
            vars()['NonState_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']][iyear] 
    if DEBUG ==1:
        print(summary_emi)
        print(calc_emi)
    diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if diff < 0.0002:
        print('Year ', year_range[iyear], ': PASS, difference < 0.02%')
    else:
        print('Year ', year_range[iyear], ': FAIL -- Difference = ', diff*100,'%')
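
The state allocation above follows the rule stated in the cell comments: state emissions = national GHGI emissions * state proxy / national proxy total. A minimal sketch of that proportional split for a single year and group is shown below. The function name and the choice to return unallocated emissions when the proxy sums to zero are assumptions of this sketch, not the behaviour of `data_fn.safe_div`.

```python
import numpy as np

def allocate_by_proxy(national_total, state_proxy):
    """Split a national total across states in proportion to a proxy.

    Returns (state_emissions, unallocated). If the proxy sums to zero,
    nothing can be allocated and the full total is returned as unallocated.
    """
    proxy = np.asarray(state_proxy, dtype=float)
    proxy_sum = proxy.sum()
    if proxy_sum == 0:
        return np.zeros_like(proxy), national_total
    return national_total * proxy / proxy_sum, 0.0

# Illustrative values only: four states, one year
state_emi, unallocated = allocate_by_proxy(10.0, [1.0, 3.0, 0.0, 6.0])
empty_emi, empty_unallocated = allocate_by_proxy(5.0, [0.0, 0.0])
```

Because the weights sum to one whenever the proxy total is non-zero, the state values add back up to the national total, which is the property the QA/QC check verifies.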

##### Step 4.1.3 Allocate emissions to the CONUS region (0.1x0.1)

In [None]:
# Allocate state-level emissions (Tg) onto a 0.01x0.01 grid using grid-cell-level 'Proxy_Groups', then regrid to 0.1x0.1

DEBUG =1
#Define emission arrays
Emissions_array_01 = np.zeros([len(Lat_01),len(Lon_01),num_years])
Emissions_surf_array_01 = np.zeros([len(Lat_01),len(Lon_01),num_years])
Emissions_und_array_01 = np.zeros([len(Lat_01),len(Lon_01),num_years])
Emissions_array_001 = np.zeros([len(lat001),len(lon001),num_years])
Emissions_surf_array_001= np.zeros([len(lat001),len(lon001),num_years])
Emissions_und_array_001 = np.zeros([len(lat001),len(lon001),num_years]) 
Emissions_nongrid = np.zeros([num_years])
Emissions_surf_nongrid = np.zeros([num_years])
Emissions_und_nongrid = np.zeros([num_years])


# For each year, (2a) distribute state-level emissions onto a grid using proxies defined above ....
# To speed up the code, masks are used rather than looping individually through each lat/lon. 
# In this case, a mask of 1's is made for the grid cells that match the ANSI values for a given state
# The masked values are set to zero, remaining values = 1. 
# AK and HI and territories are removed from the analysis at this stage. 
# The emissions allocated to each state are at 0.01x0.01 degree resolution, as required to calculate accurate 'mask'
# arrays for each state. 
# (2b - not applicable here) For emission groups that were not first allocated to states, national emissions for those groups are gridded
# based on the relevant gridded proxy arrays (0.1x0.1 resolution). These emissions are at 0.1x0.1 degrees resolution. 
# (2c - not applicable here) - record 'not mapped' emission groups in the 'non-grid' array

print('**QA/QC Check: Sum of national gridded emissions vs. GHGI national emissions')
running_sum = np.zeros([len(proxy_coal_map),num_years])

for igroup in np.arange(0,len(proxy_coal_map)):
    proxy_temp = vars()[proxy_coal_map.loc[igroup,'Proxy_Group']]
    proxy_temp_nongrid = vars()[proxy_coal_map.loc[igroup,'Proxy_Group']+'_nongrid']
    vars()['Ext_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']+'_01'] = np.zeros([len(lat001),len(lon001),num_years])

    #2a. Step through each state (if group was previously allocated to state level)
    if proxy_coal_map.loc[igroup,'State_Proxy_Group'] != '-' and \
        proxy_coal_map.loc[igroup,'State_Proxy_Group'] != 'state_not_mapped':
        print('Group:',igroup,'of ',len(proxy_coal_map))
        for istate in np.arange(0,len(State_ANSI)):
            #print(igroup,istate)
            
            if State_ANSI['abbr'][istate] not in {'AK','HI'} and istate < 51:
                mask_state = np.ma.ones(np.shape(state_ANSI_map))
                mask_state = np.ma.masked_where(state_ANSI_map != State_ANSI['ansi'][istate], mask_state)
                mask_state = np.ma.filled(mask_state,0) 
                for iyear in np.arange(0,num_years):
                    emi_temp = vars()['State_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear]
                    #print(emi_temp)
                    if np.sum(mask_state*proxy_temp[:,:,iyear]) > 0 and emi_temp > 0: 
                    # if state is on grid and proxy for that state is non-zero
                        weighted_array = data_fn.safe_div(mask_state*proxy_temp[:,:,iyear], \
                                            np.sum(mask_state*proxy_temp[:,:,iyear]))
                        #weighted_array_01 = data_fn.regrid001_to_01(weighted_array, Lat_01, Lon_01)
                        #print(np.sum(weighted_array))
                        Emissions_array_001[:,:,iyear] += emi_temp*weighted_array#_01
                        running_sum[igroup,iyear] += np.sum(emi_temp*weighted_array)
                        if 'Surf' in 'State_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']:
                            Emissions_surf_array_001[:,:,iyear] += emi_temp*weighted_array
                        else:
                            Emissions_und_array_001[:,:,iyear] += emi_temp*weighted_array
                        vars()['Ext_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']+'_01'][:,:,iyear]+=emi_temp*weighted_array
                    else:
                        #for imonth in np.arange(0,num_months):
                        Emissions_nongrid[iyear] += emi_temp
                        running_sum[igroup,iyear] += np.sum(emi_temp)
                        if 'Surf' in 'State_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']:
                            Emissions_surf_nongrid[iyear] += emi_temp
                        else:
                            Emissions_und_nongrid[iyear] += emi_temp
                #print(running_sum[igroup,iyear])
            

            else:
            #    if proxy_coal_map.loc[igroup, 'Urban_Rural_Flag'] ==2:
                for iyear in np.arange(0, num_years):
                    Emissions_nongrid[iyear] += np.sum(vars()['State_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear])
                    running_sum[igroup,iyear] += np.sum(vars()['State_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear])    
                    if 'Surf' in 'State_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']:
                        Emissions_surf_nongrid[iyear] += np.sum(vars()['State_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear])
                    else:
                        Emissions_und_nongrid[iyear] += np.sum(vars()['State_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear])
                    
for igroup in np.arange(0,len(proxy_coal_map)):
    vars()['Ext_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
    
for iyear in np.arange(0, num_years):    
    Emissions_array_01[:,:,iyear] = data_fn.regrid001_to_01(Emissions_array_001[:,:,iyear], Lat_01, Lon_01)
    Emissions_surf_array_01[:,:,iyear] = data_fn.regrid001_to_01(Emissions_surf_array_001[:,:,iyear], Lat_01, Lon_01)
    Emissions_und_array_01[:,:,iyear] = data_fn.regrid001_to_01(Emissions_und_array_001[:,:,iyear], Lat_01, Lon_01)
    #Emissions_array_01[:,:,iyear] += Emissions_array_01_temp[:,:,iyear]
    calc_emi = np.sum(Emissions_array_01[:,:,iyear]) + np.sum(Emissions_nongrid[iyear]) 
    summary_emi = EPA_emi_coal_total.iloc[0,iyear+2]
    calc_emi3 = 0
    for igroup in np.arange(0,len(proxy_coal_map)):
        vars()['Ext_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] = data_fn.regrid001_to_01(vars()['Ext_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']+'_01'][:,:,iyear], Lat_01, Lon_01)
        calc_emi3 += np.sum(vars()['Ext_'+proxy_coal_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear])
    calc_emi3 += np.sum(Emissions_nongrid[iyear])
    emi_diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    #check two
    calc_emi2 = np.sum(Emissions_surf_array_01[:,:,iyear]) + np.sum(Emissions_surf_nongrid[iyear]) +\
                np.sum(Emissions_und_array_01[:,:,iyear]) + np.sum(Emissions_und_nongrid[iyear]) 
    emi_diff2 = abs(calc_emi2-calc_emi)/((calc_emi2+calc_emi)/2)
    if DEBUG==1:
        print(calc_emi)
        print(calc_emi2)
        print(calc_emi3)
        print(summary_emi)
    if abs(emi_diff) < 0.0001:
        print('Year '+ year_range_str[iyear]+': Difference < 0.01%: PASS')
    else: 
        print('Year '+ year_range_str[iyear]+': Difference > 0.01%: FAIL, diff: '+str(emi_diff))
        
ct = datetime.datetime.now() 
print("current time:", ct)
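
The cell above distributes each state's emissions over the fine grid cells belonging to that state, weighted by the gridded proxy, and then aggregates the fine grid to the coarse grid. The sketch below shows both steps on a toy 4 x 4 grid. The implementation of `data_fn.regrid001_to_01` is not shown in this notebook, so the block-sum used here is an assumption about its behaviour; all function names and values are hypothetical.

```python
import numpy as np

def distribute_in_state(state_emission, proxy, state_id_map, state_id):
    """Spread one state's emissions over its grid cells in proportion to a proxy.

    Returns (gridded, nongrid). If the proxy is zero everywhere in the state,
    the emissions cannot be placed and are returned as nongrid.
    """
    weights = np.where(state_id_map == state_id, proxy, 0.0)
    total = weights.sum()
    if total <= 0:
        return np.zeros_like(proxy, dtype=float), state_emission
    return state_emission * weights / total, 0.0

def block_sum(fine, factor):
    """Aggregate a fine grid to a coarser one by summing factor x factor blocks."""
    n_lat, n_lon = fine.shape
    return fine.reshape(n_lat // factor, factor, n_lon // factor, factor).sum(axis=(1, 3))

# Toy example: the left half of the grid is state 1, the right half is state 2
state_id_map = np.array([[1, 1, 2, 2]] * 4)
proxy = np.arange(16, dtype=float).reshape(4, 4)
gridded, nongrid = distribute_in_state(52.0, proxy, state_id_map, state_id=1)
coarse = block_sum(gridded, factor=2)
```

Summing blocks (rather than averaging) conserves mass, so the coarse-grid total equals the fine-grid total.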

##### Step 4.1.4 Save gridded emissions (kt)

In [None]:
#save gridded emissions for each gridding group - for extension

#Initialize file
data_IO_fn.initialize_netCDF(grid_emi_outputfile, netCDF_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)

unique_groups = np.unique(proxy_coal_map['GHGI_Emi_Group'])
unique_groups = unique_groups[unique_groups != 'Emi_not_mapped']

nc_out = Dataset(grid_emi_outputfile, 'r+', format='NETCDF4')

for igroup in np.arange(0,len(unique_groups)):
    print('Ext_'+unique_groups[igroup])
    if len(np.shape(vars()['Ext_'+unique_groups[igroup]])) ==4:
        ghgi_temp = np.sum(vars()['Ext_'+unique_groups[igroup]],axis=3) #sum month data if data is monthly
    else:
        ghgi_temp = vars()['Ext_'+unique_groups[igroup]]

    # Write data to netCDF
    data_out = nc_out.createVariable('Ext_'+unique_groups[igroup], 'f8', ('lat', 'lon','year'), zlib=True)
    data_out[:,:,:] = ghgi_temp[:,:,:]

#save nongrid data to calculate non-grid fraction extension
data_out = nc_out.createVariable('Emissions_nongrid', 'f8', ('year',), zlib=True)  
data_out[:] = Emissions_surf_nongrid[:]+Emissions_und_nongrid[:]
nc_out.close()

#Confirm file location
print('** SUCCESS **')
print("Gridded emissions (kt) written to file: {}" .format(os.getcwd())+grid_emi_outputfile)
print(' ')

del data_out, ghgi_temp, nc_out

#### Step 4.2. Calculate Gridded Emission Fluxes (molec./cm2/s) (0.1x0.1)

In [None]:
#Convert emissions to emission flux
# conversion: Tg emissions to molec/cm2/s flux

Flux_array_01_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])
Flux_surf_array_01_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])
Flux_und_array_01_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])
print('**QA/QC Check: Sum of national gridded emissions vs. GHGI national emissions')
  
for iyear in np.arange(0,num_years):
    calc_emi = 0
    if year_range[iyear]==2012 or year_range[iyear]==2016:
        year_days = np.sum(month_day_leap)
    else:
        year_days = np.sum(month_day_nonleap)

    conversion_factor_01 = 10**12 * Avogadro / float(Molarch4 *year_days * 24 * 60 *60) / area_matrix_01
    Flux_array_01_annual[:,:,iyear] = Emissions_array_01[:,:,iyear]*conversion_factor_01
    Flux_surf_array_01_annual[:,:,iyear] = Emissions_surf_array_01[:,:,iyear]*conversion_factor_01
    Flux_und_array_01_annual[:,:,iyear] = Emissions_und_array_01[:,:,iyear]*conversion_factor_01
    #convert back to mass to check
    conversion_factor_annual = 10**12 * Avogadro / float(Molarch4 *year_days * 24 * 60 *60) / area_matrix_01
    calc_emi = np.sum(Flux_array_01_annual[:,:,iyear]/conversion_factor_annual)+np.sum(Emissions_nongrid[iyear])
    calc_emi2 = np.sum(Flux_surf_array_01_annual[:,:,iyear]/conversion_factor_annual)+np.sum(Emissions_surf_nongrid[iyear]) +\
                np.sum(Flux_und_array_01_annual[:,:,iyear]/conversion_factor_annual)+np.sum(Emissions_und_nongrid[iyear])
    summary_emi = EPA_emi_coal_total.iloc[0,iyear+2]
    emi_diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if DEBUG==1:
        print(calc_emi)
        print(calc_emi2)
        print(summary_emi)
    if abs(emi_diff) < 0.0001:
        print('Year '+ year_range_str[iyear]+': Difference < 0.01%: PASS')
    else: 
        print('Year '+ year_range_str[iyear]+': Difference > 0.01%: FAIL, diff: '+str(emi_diff))
        
Flux_Emissions_Total_annual = Flux_array_01_annual
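
The conversion above turns an annual mass (Tg) into a flux (molec./cm2/s) by converting mass to molecules, then dividing by the seconds in the year and the grid-cell area. A scalar sketch of the same arithmetic, together with its inverse for the mass check, is given below. The constant values, function names, and the use of `calendar.isleap` in place of the hard-coded leap years are assumptions of this sketch; the notebook defines `Avogadro`, `Molarch4`, and `area_matrix_01` elsewhere, and the area is assumed here to be in cm2.

```python
import calendar

AVOGADRO = 6.02214076e23   # molecules per mole
MOLAR_MASS_CH4 = 16.04     # g per mole (assumed value)
GRAMS_PER_TG = 1e12

def seconds_in_year(year):
    """Number of seconds in a calendar year, accounting for leap years."""
    days = 366 if calendar.isleap(year) else 365
    return days * 24 * 60 * 60

def tg_to_flux(emission_tg, area_cm2, year):
    """Convert an annual emission (Tg) in one grid cell to molec./cm2/s."""
    molecules = emission_tg * GRAMS_PER_TG * AVOGADRO / MOLAR_MASS_CH4
    return molecules / seconds_in_year(year) / area_cm2

def flux_to_tg(flux, area_cm2, year):
    """Inverse of tg_to_flux, used to check that mass is conserved."""
    molecules = flux * area_cm2 * seconds_in_year(year)
    return molecules * MOLAR_MASS_CH4 / AVOGADRO / GRAMS_PER_TG

# Illustrative values only: 0.001 Tg in a cell of 1e12 cm2
flux_2016 = tg_to_flux(0.001, 1e12, 2016)
flux_2017 = tg_to_flux(0.001, 1e12, 2017)
round_trip = flux_to_tg(flux_2016, 1e12, 2016)
```

For the same annual mass, the leap-year flux is slightly lower because the emissions are spread over one additional day.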

-------------
## Step 5. Write netCDF
------------

In [None]:
# yearly data
#Initialize file
data_IO_fn.initialize_netCDF(gridded_outputfile, netCDF_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)

# Write data to netCDF
nc_out = Dataset(gridded_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Total_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded total coal mine fluxes written to file: {}" .format(os.getcwd())+gridded_outputfile)


# yearly data - surface emissions ONLY (post-mining and venting)
#Initialize file
data_IO_fn.initialize_netCDF(gridded_surf_outputfile, netCDF_surf_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)

# Write data to netCDF
nc_out = Dataset(gridded_surf_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_surf_array_01_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded surface coal mine fluxes written to file: {}" .format(os.getcwd())+gridded_surf_outputfile)


# yearly data - Underground emissions ONLY (post mining + venting)
#Initialize file
data_IO_fn.initialize_netCDF(gridded_und_outputfile, netCDF_und_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)

# Write data to netCDF
nc_out = Dataset(gridded_und_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_und_array_01_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded underground coal mine fluxes written to file: {}" .format(os.getcwd())+gridded_und_outputfile)

----------
## Step 6. Plot Gridded Data
---------

#### Step 6.1. Plot Annual Emission Fluxes

In [None]:
#Plot Annual Data
scale_max = 10
save_flag =0
save_outfile =''
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Total_annual, Lat_01, Lon_01, year_range, title_str,scale_max,save_flag,save_outfile)

In [None]:
#Plot Annual Data - Surface Emissions Only
scale_max = 10
save_flag =0
save_outfile =''
data_plot_fn.plot_annual_emission_flux_map(Flux_surf_array_01_annual, Lat_01, Lon_01, year_range, title_str,scale_max,save_flag,save_outfile)

In [None]:
#Plot Annual Data - Underground Emissions Only
scale_max = 10
save_flag =0
save_outfile =''
data_plot_fn.plot_annual_emission_flux_map(Flux_und_array_01_annual, Lat_01, Lon_01, year_range, title_str,scale_max,save_flag,save_outfile)

#### Step 6.2. Plot Difference Between First and Last Inventory Year

In [None]:
# Plot difference between last and first year
save_flag =0
save_outfile = ''
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Total_annual, Lat_01, Lon_01, year_range, title_diff_str,save_flag,save_outfile)

In [None]:
# Plot difference between last and first year - Surface Emissions Only
save_flag =0
save_outfile = ''
data_plot_fn.plot_diff_emission_flux_map(Flux_surf_array_01_annual, Lat_01, Lon_01, year_range, title_diff_str,save_flag,save_outfile)

In [None]:
# Plot difference between last and first year - Underground Emissions Only
save_flag =0
save_outfile = ''
data_plot_fn.plot_diff_emission_flux_map(Flux_und_array_01_annual, Lat_01, Lon_01, year_range, title_diff_str,save_flag,save_outfile)

In [None]:
ct = datetime.datetime.now() 
ft = ct.timestamp() 
time_elapsed = (ft-it)/(60*60)
print('Time to run: '+str(time_elapsed)+' hours')
print('** GEPA_1B1a_Coal: COMPLETE **')