# Gridded EPA Methane Inventory
## Category: 5B1 Composting

***
#### Authors: 
Joannes D. Maasakkers, Candice F. Z. Chen, Erin E. McDuffie
#### Date Last Updated: 
See Step 0
#### Notebook Purpose: 
This notebook calculates gridded (0.1°x0.1°) annual emission fluxes of methane (molecules CH4/cm2/s) from composting facilities in the CONUS region for the years 2012-2018. 
#### Summary & Notes:
National EPA GHGI emissions from composting facilities are read in from the published GHGI Chapter 7 data tables. National emissions are then allocated to the state level in proportion to EPA state-level composting emission estimates, which are based on the tonnes of garbage composted in each state (recycling amounts are converted to composting). State-level emissions are then distributed onto a 0.01°x0.01° grid using a map of composting facility locations (aggregated from four facility-level datasets) or, for states with two or fewer facilities, gridded population data (U.S. Census). Data are then re-gridded to 0.1°x0.1° and converted to fluxes (molecules CH4/cm2/s). Annual emission fluxes (molec./cm2/s) are written to final netCDFs in the '/code/Final_Gridded_Data/' folder.

######

## Step 0. Set-Up Notebook Modules, Functions, and Local Parameters and Constants
_____

In [None]:
#Confirm working directory
import os
import time
modtime = os.path.getmtime('./5B1_Composting.ipynb')
modificationTime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(modtime))
print("This file was last modified on: ", modificationTime)
print('')
print("The directory we are working in is {}".format(os.getcwd()))

In [None]:
## Include plots within notebook
%matplotlib inline

In [None]:
# Import base modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import re
import datetime
from copy import copy

# Import additional modules
# Load plotting package Basemap 
# (The PROJ library path must also be specified; it is unique to each user)
# os.environ["PROJ_LIB"] = r'C:\Users\candicechen\Anaconda3\pkgs\basemap-1.2.2-py38haf86b8b_0\Library\share'
from mpl_toolkits.basemap import Basemap

# Load netCDF (for manipulating netCDF file types)
from netCDF4 import Dataset

# Set up ticker
import matplotlib.ticker as ticker

#add path for the global function module (file)
import sys
module_path = os.path.abspath(os.path.join('../Global_Functions/'))
#print(module_path)
if module_path not in sys.path:
    sys.path.append(module_path)

# Load functions
import data_load_functions as data_load_fn
import data_functions as data_fn
import data_IO_functions as data_IO_fn
import data_plot_functions as data_plot_fn

In [None]:
#INPUT Files
# Assign global file names
global_filenames = data_load_fn.load_global_file_names()
State_ANSI_inputfile = global_filenames[0]
#County_ANSI_inputfile = global_filenames[1]
pop_map_inputfile = global_filenames[2]
Grid_area01_inputfile = global_filenames[3]
Grid_area001_inputfile = global_filenames[4]
Grid_state001_ansi_inputfile = global_filenames[5]
#Grid_county001_ansi_inputfile = global_filenames[6]

# Specify names of inputs files used in this notebook
# EPA National emissions
EPA_comp_inputfile = '../Global_InputData/GHGI/Ch7_Waste/Table 7-19.csv'

#Proxy Map
Comp_Mapping_inputfile = './InputData/Composting_ProxyMapping.xlsx'

#Activity Data (facility level)
EPA_compost_facility_inputfile = './InputData/Composting Facilities.csv'
compost_council_facility_inputfile = "./InputData/Ad_compostingcouncil.txt"
biocycle_facilitylocs_inputfile = "./InputData/biocycle_locs_clean.csv"
FRS_inputfile = "../Global_InputData/FRS/national_single/NATIONAL_SINGLE.csv"
#State level
stateofgarbage_inputfile = "./InputData/Shin_State-of-Garbage_2014_Table3.csv"
epa_state_composting_inputfile = "./InputData/Appendix F Waste Sector Estimates.xlsx"

#Specify names of intermediate files
EPA_facility_loc_intfile = "./Intermediate_Data/EPA_compost_locs.csv"
compost_council_facility_intfile = "./Intermediate_Data/compost_council_facilitylocs_new.csv"

#Specify names of gridded output files
gridded_outputfile = '../Final_Gridded_Data/EPA_v2_5B1_Composting.nc'
netCDF_description = 'Gridded EPA Inventory - Composting Emissions - IPCC Source Category 5B1'
title_str = "EPA methane emissions from composting"
title_diff_str = "Emissions from composting difference: 2018-2012"

#output gridded proxy data
grid_emi_outputfile = '../Final_Gridded_Data/Extension/v2_input_data/Composting_Grid_Emi.nc'

In [None]:
# Define local variables
start_year = 2012  #First year in emission timeseries
end_year = 2018    #Last year in emission timeseries
year_range = [*range(start_year, end_year+1,1)] #List of emission years
year_range_str=[str(i) for i in year_range]
num_years = len(year_range)

# Define constants
Avogadro   = 6.02214129 * 10**(23)  #molecules/mol
Molarch4   = 16.04                  #g/mol
Res01      = 0.1                    # degrees
Res_01     = 0.01                   # degrees

# Continental US Lat/Lon Limits (for netCDF files)
Lon_left = -130       #deg
Lon_right = -60       #deg
Lat_low  = 20         #deg
Lat_up  = 55          #deg
loc_dimensions = [Lat_low, Lat_up, Lon_left, Lon_right]

ilat_start = int((90+Lat_low)/Res01) #1100:1450 (continental US range)
ilat_end = int((90+Lat_up)/Res01)
ilon_start = abs(int((-180-Lon_left)/Res01)) #500:1200 (continental US range)
ilon_end = abs(int((-180-Lon_right)/Res01))

# Number of days in each month
month_day_leap  = [  31,  29,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]
month_day_nonleap = [  31,  28,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]

# Month arrays (num_months is needed in Step 2.1 when a proxy group has monthly data)
month_range_str = ['January','February','March','April','May','June','July','August','September','October','November','December']
num_months = len(month_range_str)

In [None]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999;
//prevent auto-scrolling

In [None]:
# Track run time
ct = datetime.datetime.now() 
it = ct.timestamp() 
print("current time:", ct) 
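The summary states that gridded emissions are converted to fluxes (molecules CH4/cm2/s) using the constants defined above. A minimal sketch of that unit conversion follows, assuming per-cell emissions in kt/yr and cell areas in cm2 (as in `area_matrix_01`). The function name `kt_per_year_to_flux` is hypothetical, and it uses `calendar.isleap` rather than listing leap years by hand.

```python
import calendar

import numpy as np

AVOGADRO = 6.02214129e23  # molecules/mol (same value as Avogadro in Step 0)
MOLAR_CH4 = 16.04         # g/mol (same value as Molarch4 in Step 0)


def kt_per_year_to_flux(emi_kt, area_cm2, year):
    """Convert annual CH4 emissions per grid cell (kt/yr) to a flux in molecules CH4/cm2/s.

    Hypothetical helper for illustration; not part of data_fn.
    """
    days = 366 if calendar.isleap(year) else 365
    seconds = days * 24 * 3600
    grams = np.asarray(emi_kt, dtype=float) * 1e9      # 1 kt = 1e9 g
    molecules = grams / MOLAR_CH4 * AVOGADRO           # g -> mol -> molecules
    return molecules / (np.asarray(area_cm2, dtype=float) * seconds)


# Example: 1 kt/yr emitted from a 1 km2 (1e10 cm2) cell in a non-leap year
print(kt_per_year_to_flux(1.0, 1e10, 2018))  # roughly 1.19e14 molecules/cm2/s
```

Because the helper works on NumPy arrays, the same call converts a whole (lat, lon) emission map when given a matching area map.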

____
## Step 1. Load in State ANSI data and Area Maps
_____

In [None]:
#Read the state ANSI file array
State_ANSI, name_dict = data_load_fn.load_state_ansi(State_ANSI_inputfile)[0:2]
#QA: number of states
print('Read input file: '+ f"{State_ANSI_inputfile}")
print('Total "States" found: ' + '%.0f' % len(State_ANSI))
print(' ')

# Load/Format Gridded Data Maps (0.01x0.01 and 0.1x0.1)
# 0.01 x0.01 degree Data
# State ANSI IDs and grid cell area (m2) maps
state_ANSI_map = data_load_fn.load_state_ansi_map(Grid_state001_ansi_inputfile)
area_map, lat001, lon001 = data_load_fn.load_area_map_001(Grid_area001_inputfile)

# 0.1 x0.1 degree data
# grid cell area and state ANSI maps
Lat01, Lon01 = data_load_fn.load_area_map_01(Grid_area01_inputfile)[1:3]
#Select relevant Continental US 0.1 x0.1 domain
Lat_01 = Lat01[ilat_start:ilat_end]
Lon_01 = Lon01[ilon_start:ilon_end]
area_matrix_01 = data_fn.regrid001_to_01(area_map, Lat_01, Lon_01)
area_matrix_01 *= 10000 #convert m2 to cm2
state_ANSI_map_01 = data_fn.regrid001_to_01(state_ANSI_map, Lat_01, Lon_01)

# Print time
ct = datetime.datetime.now() 
print("current time:", ct)
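Step 1 relies on `data_fn.regrid001_to_01`, whose implementation is not shown in this notebook. For additive quantities such as cell area or emissions, one common way to go from 0.01° to 0.1° is to sum each 10 x 10 block of fine cells. The sketch below assumes the fine grid is exactly aligned with the coarse grid (the CONUS domain above spans 35° x 70°, i.e., 3500 x 7000 cells at 0.01°). It is not necessarily how `regrid001_to_01` works, and it does not apply to categorical maps such as state ANSI codes.

```python
import numpy as np


def block_sum_regrid(fine, factor=10):
    """Aggregate an additive 2-D field to a grid `factor` times coarser by summing
    each factor x factor block (e.g., 0.01 degree -> 0.1 degree with factor=10)."""
    fine = np.asarray(fine, dtype=float)
    ny, nx = fine.shape
    if ny % factor or nx % factor:
        raise ValueError("fine-grid dimensions must be multiples of factor")
    # Split each axis into (coarse index, position within block), then sum within blocks
    return fine.reshape(ny // factor, factor, nx // factor, factor).sum(axis=(1, 3))


fine_area = np.ones((20, 30))            # toy 0.01-degree field
coarse_area = block_sum_regrid(fine_area)
print(coarse_area.shape)                 # (2, 3)
```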

-------------
## Step 2: Read-in and Format Proxy Data
-------------

### Step 2.1 Read In Proxy Mapping File & Make Proxy Arrays

In [None]:
#load GHGI Mapping Groups
names = pd.read_excel(Comp_Mapping_inputfile, sheet_name = "GHGI Map - Comp", usecols = "A:B",skiprows = 1, header = 0)
colnames = names.columns.values
ghgi_comp_map = pd.read_excel(Comp_Mapping_inputfile, sheet_name = "GHGI Map - Comp", usecols = "A:B", skiprows = 1, names = colnames)
#drop rows with no data and remove parentheses from the source names
ghgi_comp_map = ghgi_comp_map[ghgi_comp_map['GHGI_Emi_Group'] != 'na']
ghgi_comp_map = ghgi_comp_map[ghgi_comp_map['GHGI_Emi_Group'].notna()]
ghgi_comp_map['GHGI_Source']= ghgi_comp_map['GHGI_Source'].str.replace("(", "", regex=False)
ghgi_comp_map['GHGI_Source']= ghgi_comp_map['GHGI_Source'].str.replace(")", "", regex=False)
ghgi_comp_map.reset_index(inplace=True, drop=True)
display(ghgi_comp_map)

#load emission group - proxy map
names = pd.read_excel(Comp_Mapping_inputfile, sheet_name = "Proxy Map - Comp", usecols = "A:E",skiprows = 1, header = 0)
colnames = names.columns.values
proxy_comp_map = pd.read_excel(Comp_Mapping_inputfile, sheet_name = "Proxy Map - Comp", usecols = "A:E", skiprows = 1, names = colnames)
display((proxy_comp_map))

#create empty proxy and emission group arrays (add months for proxy variables that have monthly data)
for igroup in np.arange(0,len(proxy_comp_map)):
    if proxy_comp_map.loc[igroup, 'Grid_Month_Flag'] ==0:
        vars()[proxy_comp_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        vars()[proxy_comp_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years])
    else:
        vars()[proxy_comp_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
        vars()[proxy_comp_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years,num_months])
        
    vars()[proxy_comp_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([num_years])
    
    if proxy_comp_map.loc[igroup,'State_Proxy_Group'] != '-':
        if proxy_comp_map.loc[igroup,'State_Month_Flag'] == 0:
            vars()[proxy_comp_map.loc[igroup,'State_Proxy_Group']] = np.zeros([len(State_ANSI),num_years])
        else:
            vars()[proxy_comp_map.loc[igroup,'State_Proxy_Group']] = np.zeros([len(State_ANSI),num_years,num_months])
    else:
        continue # do not make state proxy variable if no variable assigned in mapping file
        
#emi_group_names = np.unique(proxy_comp_map['GHGI_Emi_Group'])
emi_group_names = np.unique(ghgi_comp_map['GHGI_Emi_Group'])
print('QA/QC: Is the number of emission groups the same for the proxy and emissions tabs?')
if (len(emi_group_names) == len(np.unique(proxy_comp_map['GHGI_Emi_Group']))):
    print('PASS')
else:
    print('FAIL')
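Step 2.1 creates each proxy and emission array as a global variable through `vars()`. A dictionary keyed by the same names is an alternative that keeps the arrays together and avoids relying on `vars()` writing to the notebook namespace. A minimal sketch, assuming annual (non-monthly) proxies only and the mapping-file column names used above; the emission group name `Emi_Composting` in the example is a hypothetical placeholder.

```python
import numpy as np
import pandas as pd


def make_proxy_arrays(proxy_map, n_lat, n_lon, n_years, n_states):
    """Return zero-filled arrays keyed by the names in the proxy mapping table
    (annual proxies only; monthly flags are not handled in this sketch)."""
    arrays = {}
    for _, row in proxy_map.iterrows():
        arrays[row['Proxy_Group']] = np.zeros((n_lat, n_lon, n_years))
        arrays[row['Proxy_Group'] + '_nongrid'] = np.zeros(n_years)
        arrays[row['GHGI_Emi_Group']] = np.zeros(n_years)
        if row['State_Proxy_Group'] != '-':   # no state proxy when marked '-'
            arrays[row['State_Proxy_Group']] = np.zeros((n_states, n_years))
    return arrays


example_map = pd.DataFrame({
    'GHGI_Emi_Group': ['Emi_Composting'],          # hypothetical group name
    'Proxy_Group': ['Map_Facility_Population'],
    'State_Proxy_Group': ['State_WasteComposted'],
})
proxies = make_proxy_arrays(example_map, n_lat=4, n_lon=5, n_years=7, n_states=3)
print(sorted(proxies))
```

Later steps would then read `proxies['State_WasteComposted']` instead of `vars()['State_WasteComposted']`.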
    

#### 2.2 Gridded Population Data

In [None]:
#Read population map
pop_den_map = data_load_fn.load_pop_den_map(pop_map_inputfile)

#### 2.3 EPA Composting Locations Data

In [None]:
# Read EPA Composting Facility Location Information (2018 data file)
# data from https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7BBEC8068F-2F89-429D-B5DC-FD8147C17101%7D

# Load in the first 10 columns of data with location information, drop duplicate entries based on location and name
# Note: Some of the duplicates removed by name could be the same company with multiple sites. 
# These might be worth keeping in a future inventory
EPA_facility_info = pd.read_csv(EPA_compost_facility_inputfile,usecols = [0,1,2,3,4,5,6,7,8,9], encoding='latin-1')
EPA_facility_info = EPA_facility_info.drop_duplicates(subset=['XCoord','YCoord','Name'],ignore_index=True)

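#Convert coordinates (meters) to lat/lon (degrees)
# Assign as plain arrays: setting .loc[:, cols] with a DataFrame aligns on column labels and would insert NaNs
faclat, faclon = data_fn.meters2degrees(EPA_facility_info.loc[:,'XCoord'], EPA_facility_info.loc[:,'YCoord'])
EPA_facility_info['XCoord'] = np.asarray(faclon)
EPA_facility_info['YCoord'] = np.asarray(faclat)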

#Rename columns
cols = list(EPA_facility_info.columns)
cols[0],cols[1] = cols[1],cols[0]
EPA_facility_info = EPA_facility_info[cols].rename(columns={'XCoord': 'lon', 'YCoord': 'lat'})

# Remove more possible duplicates: flag sites that are within 0.0025 degrees (<~250 m) of another site
# (this distance threshold could be tuned further)
# Write the formatted location data to an intermediate csv file
EPA_facility_info['Dupl'] = 0
for index in np.arange(len(EPA_facility_info)):
    for index2 in np.arange(index+1,len(EPA_facility_info)):
        dist = np.sqrt((EPA_facility_info.loc[index,'lon']-EPA_facility_info.loc[index2,'lon'])**2+(EPA_facility_info.loc[index,'lat']-EPA_facility_info.loc[index2,'lat'])**2)
        if dist < 0.0025:
            EPA_facility_info.loc[index,'Dupl'] = 1

# Remove duplicates
EPA_facility_info = EPA_facility_info[EPA_facility_info['Dupl'] == 0]
EPA_facility_info.reset_index(inplace=True, drop=True)
EPA_facility_info.to_csv(EPA_facility_loc_intfile)
print('Locations Found: ', len(EPA_facility_info))
EPA_facility_info.head(1)

# Print time
ct = datetime.datetime.now() 
print("current time:", ct)
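The nested loop above compares every pair of EPA sites, which scales as O(n²) in Python-level iterations. The same rule (flag site i if any later site j > i lies within 0.0025 degrees, using planar lon/lat distance) can be sketched with NumPy broadcasting. This is an illustrative alternative with a hypothetical function name, not the notebook's method; it builds an n x n distance matrix, so memory grows with n², which is manageable for a few thousand sites.

```python
import numpy as np


def flag_near_duplicates(lon, lat, threshold=0.0025):
    """Mark site i as a duplicate if any later site j > i is within `threshold`
    degrees (planar lon/lat distance), mirroring the nested loop above."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    dist = np.hypot(lon[:, None] - lon[None, :], lat[:, None] - lat[None, :])
    later = np.triu(np.ones(dist.shape, dtype=bool), k=1)  # True only where j > i
    return np.any((dist < threshold) & later, axis=1)


lon = [-100.0, -100.001, -90.0]
lat = [40.0, 40.0, 35.0]
print(flag_near_duplicates(lon, lat))  # [ True False False]
```

Applied to this step, the filter would read `EPA_facility_info[~flag_near_duplicates(EPA_facility_info['lon'], EPA_facility_info['lat'])]`.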

#### 2.4 Read in and Format the U.S. Composting Council Facility Locations

In [None]:
# Read Compost Council facility information text file (includes web-based code and requires reformatting)
# Data are broken up into lines using the \\n new line character
# Search each line of strings for the pattern matching lat, lon locations,
# record those data, and save an intermediate file containing the newly formatted lat/lon data (after
# dropping duplicate location data)

with open (compost_council_facility_inputfile, "r") as myfile:
    compost_council_web=myfile.read()
    
compost_council_split = re.split('n', compost_council_web)
#print ('Lines found:     ', len(compost_council_split))

compost_council_facilitylocs = []
for iline in np.arange(len(compost_council_split)):
    temp = re.search(r'[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)', compost_council_split[iline])
    if temp is not None:
        if len(temp.group(0)) > 8:
            temp2 = re.split(',',temp.group(0))
            compost_council_facilitylocs.append(temp2)
#print ('Locations found: ', len(compost_council_facilitylocs))

compost_council_facilitylocs = pd.DataFrame(compost_council_facilitylocs, columns = ['lat','lon']).astype('float')
compost_council_facilitylocs = compost_council_facilitylocs.drop_duplicates(ignore_index=True)
print ('Compost Council locations:  ', len(compost_council_facilitylocs))
compost_council_facilitylocs.to_csv(compost_council_facility_intfile)
compost_council_facilitylocs.head()

# Remove Composting Council facilities that are within 0.025 degrees (~2-3 km) of EPA composting locations
compost_council_facilitylocs['Dupl'] = 0

# Check each Composting Council location against each EPA facility location, flag it if 
# within 0.025 degrees, and then remove flagged sites from the Compost Council list. 
for index_cc in np.arange(len(compost_council_facilitylocs)):
    for index_epa in np.arange(len(EPA_facility_info)):
        dist = np.sqrt((EPA_facility_info.loc[index_epa,'lon']-compost_council_facilitylocs.loc[index_cc,'lon'])**2\
                       +(EPA_facility_info.loc[index_epa,'lat']-compost_council_facilitylocs.loc[index_cc,'lat'])**2)
        if dist < 0.025:
            compost_council_facilitylocs.loc[index_cc,'Dupl'] = 1

print ('Duplicates to be removed: ', compost_council_facilitylocs['Dupl'].sum())
compost_council_facilitylocs = compost_council_facilitylocs[compost_council_facilitylocs['Dupl'] == 0]
compost_council_facilitylocs.reset_index(inplace=True, drop=True)

print ('Total Facilities: ', len(compost_council_facilitylocs)+len(EPA_facility_info))
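The coordinate pattern used above accepts an optional comma and single-digit numbers, so it relies on the `len(temp.group(0)) > 8` filter to skip false matches. A stricter alternative (an assumption about the text layout, not the notebook's method) is to require decimal numbers separated by a comma and to check lat/lon ranges explicitly. The sketch below uses `re.findall`, so it also captures several pairs on one line; the function name and sample text are made up for illustration.

```python
import re

# Signed decimal latitude, optional spaces, comma, optional spaces, signed decimal longitude
COORD_PAIR = re.compile(r'([-+]?\d{1,2}\.\d+)\s*,\s*([-+]?\d{1,3}\.\d+)')


def extract_coordinate_pairs(text):
    """Return (lat, lon) float pairs found in `text`, keeping only valid ranges."""
    pairs = []
    for lat_str, lon_str in COORD_PAIR.findall(text):
        lat, lon = float(lat_str), float(lon_str)
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            pairs.append((lat, lon))
    return pairs


sample = 'marker(40.7128,-74.0060); other text 12, 34; pos: 34.05, -118.25'
print(extract_coordinate_pairs(sample))  # [(40.7128, -74.006), (34.05, -118.25)]
```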

#### 2.5 Read in and Format Biocycle locations

In [None]:
# Read in the locations for the Biocycle facilities

# Read in data, then find duplicates within 0.025 degrees of the EPA and
# Compost Council facility locations. Remove the duplicates from the biocycle data.

#Database from findacomposter.com, geocoded in MATLAB. This dataset has not been updated from the original version
biocycle_facility_locs = pd.read_csv(biocycle_facilitylocs_inputfile)
biocycle_facility_locs = biocycle_facility_locs[biocycle_facility_locs['lat'] > 0]
biocycle_facility_locs.reset_index(inplace=True, drop=True)

print ('Biocycle locations: ', len(biocycle_facility_locs))
biocycle_facility_locs.head(1)

# Find and remove duplicates
biocycle_facility_locs['Dupl'] = 0

for index_bc in np.arange(len(biocycle_facility_locs)):
    for index_epa in np.arange(len(EPA_facility_info)):
        dist = np.sqrt((biocycle_facility_locs.loc[index_bc,'lon']-EPA_facility_info.loc[index_epa,'lon'])**2\
                       +(biocycle_facility_locs.loc[index_bc,'lat']-EPA_facility_info.loc[index_epa,'lat'])**2)
        if dist < 0.025:
            biocycle_facility_locs.loc[index_bc,'Dupl'] = 1

for index_bc in np.arange(len(biocycle_facility_locs)):    
    for index_cc in np.arange(len(compost_council_facilitylocs)):
        dist = np.sqrt((biocycle_facility_locs.loc[index_bc,'lon']-compost_council_facilitylocs.loc[index_cc,'lon'])**2\
                       +(biocycle_facility_locs.loc[index_bc,'lat']-compost_council_facilitylocs.loc[index_cc,'lat'])**2)
        if dist < 0.025:
            biocycle_facility_locs.loc[index_bc,'Dupl'] = 1

print ('Duplicates to be removed: ', biocycle_facility_locs['Dupl'].sum())
biocycle_facility_locs = biocycle_facility_locs[biocycle_facility_locs['Dupl'] == 0]
biocycle_facility_locs.reset_index(inplace=True, drop=True)

print ('Total Facilities: ', len(EPA_facility_info)+len(compost_council_facilitylocs)+len(biocycle_facility_locs))
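The cross-dataset checks in Steps 2.4 to 2.6 all apply one rule: drop a site from the current dataset if any site in an earlier dataset lies within 0.025 degrees. A vectorized sketch of that rule with NumPy broadcasting follows; `near_any` is a hypothetical helper name, and distances are planar in degrees, as in the loops above.

```python
import numpy as np


def near_any(lon_a, lat_a, lon_b, lat_b, threshold=0.025):
    """For each site in set A, return True if any site in set B lies within
    `threshold` degrees (planar lon/lat distance)."""
    lon_a, lat_a = np.asarray(lon_a, dtype=float), np.asarray(lat_a, dtype=float)
    lon_b, lat_b = np.asarray(lon_b, dtype=float), np.asarray(lat_b, dtype=float)
    if lon_b.size == 0:
        return np.zeros(lon_a.shape, dtype=bool)
    # (n_a, n_b) matrix of pairwise distances
    dist = np.hypot(lon_a[:, None] - lon_b[None, :], lat_a[:, None] - lat_b[None, :])
    return (dist < threshold).any(axis=1)


# Possible use for the BioCycle step (flag against EPA and Compost Council sites):
# dupl = (near_any(biocycle_facility_locs['lon'], biocycle_facility_locs['lat'],
#                  EPA_facility_info['lon'], EPA_facility_info['lat'])
#         | near_any(biocycle_facility_locs['lon'], biocycle_facility_locs['lat'],
#                    compost_council_facilitylocs['lon'], compost_council_facilitylocs['lat']))
print(near_any([-100.0, -90.0], [40.0, 35.0], [-100.01], [40.01]))  # [ True False]
```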

#### 2.6 Read in and Format EPA Federal Registry Service (FRS) Facility Locations

In [None]:
# Read in the data and format the FRS composting facility data
# Remove duplicates by comparing locations to EPA, CC, and biocycle facility locations

FRS_facility_locs = pd.read_csv(FRS_inputfile, usecols = [2,3,5,7,8,10,17,20,21,26,27,28,31,32,34,35,36],low_memory=False)
FRS_facility_locs.fillna(0, inplace = True)
FRS_facility_locs = FRS_facility_locs[FRS_facility_locs['LATITUDE83'] > 0]
FRS_facility_locs = FRS_facility_locs[FRS_facility_locs['NAICS_CODES'] != 0]
FRS_facility_locs.reset_index(inplace=True, drop=True)

FRS_facility_locs['Comp_Flag'] = 0
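for i in np.arange(len(FRS_facility_locs)):
    # NAICS 562219 = Other Nonhazardous Waste Treatment and Disposal (includes composting); str() guards non-text entries
    if re.search('562219', str(FRS_facility_locs.loc[i,'NAICS_CODES'])) is not None:
        FRS_facility_locs.loc[i,'Comp_Flag'] = 1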
FRS_facility_locs = FRS_facility_locs[FRS_facility_locs['Comp_Flag'] == 1]
FRS_facility_locs = FRS_facility_locs[FRS_facility_locs['COLLECT_DESC'] != 'INTERPOLATION-OTHER']
FRS_facility_locs.reset_index(inplace=True, drop=True)

FRS_facility_locs = FRS_facility_locs.drop_duplicates(subset=['LATITUDE83','LONGITUDE83'],ignore_index=True)

# remove duplicates within FRS dataset based on two facilities with similar location
FRS_facility_locs['Dupl'] = 0
for index in np.arange(len(FRS_facility_locs)):
    for index2 in np.arange(index+1,len(FRS_facility_locs)):
        dist = np.sqrt((FRS_facility_locs.loc[index,'LONGITUDE83']-FRS_facility_locs.loc[index2,'LONGITUDE83'])**2+(FRS_facility_locs.loc[index,'LATITUDE83']-FRS_facility_locs.loc[index2,'LATITUDE83'])**2)
        if dist < 0.0025:
            FRS_facility_locs.loc[index,'Dupl'] = 1
FRS_facility_locs = FRS_facility_locs[FRS_facility_locs['Dupl'] == 0]
FRS_facility_locs.reset_index(inplace=True, drop=True)


# Remove duplicates with other dataset
for index_FRS in np.arange(len(FRS_facility_locs)):
    for index_bc in np.arange(len(biocycle_facility_locs)):
        dist = np.sqrt((biocycle_facility_locs.loc[index_bc,'lon']-FRS_facility_locs.loc[index_FRS,'LONGITUDE83'])**2+(biocycle_facility_locs.loc[index_bc,'lat']-FRS_facility_locs.loc[index_FRS,'LATITUDE83'])**2)
        if dist < 0.025:
            FRS_facility_locs.loc[index_FRS,'Dupl'] = 1
            
for index_FRS in np.arange(len(FRS_facility_locs)):
    for index_cc in np.arange(len(compost_council_facilitylocs)):
        dist = np.sqrt((compost_council_facilitylocs.loc[index_cc,'lon']-FRS_facility_locs.loc[index_FRS,'LONGITUDE83'])**2+(compost_council_facilitylocs.loc[index_cc,'lat']-FRS_facility_locs.loc[index_FRS,'LATITUDE83'])**2)
        if dist < 0.025:
            FRS_facility_locs.loc[index_FRS,'Dupl'] = 1

for index_FRS in np.arange(len(FRS_facility_locs)):
    for index_epa in np.arange(len(EPA_facility_info)):
        dist = np.sqrt((EPA_facility_info.loc[index_epa,'lon']-FRS_facility_locs.loc[index_FRS,'LONGITUDE83'])**2+(EPA_facility_info.loc[index_epa,'lat']-FRS_facility_locs.loc[index_FRS,'LATITUDE83'])**2)
        if dist < 0.025:
            FRS_facility_locs.loc[index_FRS,'Dupl'] = 1
            
print ('FRS locations: ', len(FRS_facility_locs))
print ('Duplicates to be removed: ', FRS_facility_locs['Dupl'].sum())
FRS_facility_locs = FRS_facility_locs[FRS_facility_locs['Dupl'] == 0]
FRS_facility_locs.reset_index(inplace=True, drop=True)

print ('Total Facilities: ', len(EPA_facility_info)+len(compost_council_facilitylocs)+len(biocycle_facility_locs)+len(FRS_facility_locs))
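The FRS loop above keeps facilities whose `NAICS_CODES` entry contains 562219. The same filter can be expressed with pandas string methods in one step. This sketch assumes the codes are stored as text (possibly comma-separated lists) and casts everything to `str` first so numeric or filled-in `0` values do not raise; the function name and example table are made up.

```python
import pandas as pd


def select_naics(df, code='562219', column='NAICS_CODES'):
    """Keep rows whose NAICS code entry contains `code` as a substring."""
    codes = df[column].astype(str)
    return df[codes.str.contains(code, regex=False)].reset_index(drop=True)


example = pd.DataFrame({
    'NAICS_CODES': ['562219', '221320, 562219', 0, '562211'],
    'LATITUDE83': [40.1, 41.2, 0.0, 39.9],
})
print(select_naics(example)['LATITUDE83'].tolist())  # [40.1, 41.2]
```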

#### Step 2.7. Make a gridded Map of Composting Facility Locations

In [None]:
#Place facilities on the 0.01x0.01 degree map (open question: is this high resolution necessary?)
comp_facility_map = np.zeros([len(lat001),len(lon001)])
comp_facility_nongrid = 0

# EPA facilities
# If on continental US map, add to comp_facilities map
for ifacility in np.arange(len(EPA_facility_info)):
    if EPA_facility_info.loc[ifacility,'lon'] > Lon_left and \
     EPA_facility_info.loc[ifacility,'lon'] < Lon_right and \
     EPA_facility_info.loc[ifacility,'lat'] > Lat_low and \
     EPA_facility_info.loc[ifacility,'lat'] < Lat_up:
        ilat = int((EPA_facility_info.loc[ifacility,'lat'] - Lat_low)/Res_01)
        ilon = int((EPA_facility_info.loc[ifacility,'lon'] - Lon_left)/Res_01)
        comp_facility_map[ilat,ilon] += 1
    else:
        comp_facility_nongrid +=1
        
# Biocycle facilities
# If on continental US map, add to comp_facilities map
for ifacility in np.arange(len(biocycle_facility_locs)):
    #Check if on the grid
    if biocycle_facility_locs.loc[ifacility,'lon'] > Lon_left and \
     biocycle_facility_locs.loc[ifacility,'lon'] < Lon_right and \
     biocycle_facility_locs.loc[ifacility,'lat'] > Lat_low and \
     biocycle_facility_locs.loc[ifacility,'lat'] < Lat_up:
        ilat = int((biocycle_facility_locs.loc[ifacility,'lat'] - Lat_low)/Res_01)
        ilon = int((biocycle_facility_locs.loc[ifacility,'lon'] - Lon_left)/Res_01)
        comp_facility_map[ilat,ilon] += 1
    else:
        comp_facility_nongrid += 1

# Composting Council facilities
# If on continental US map, add to comp_facilities map
for ifacility in np.arange(len(compost_council_facilitylocs)):
    if compost_council_facilitylocs.loc[ifacility,'lon'] > Lon_left and \
     compost_council_facilitylocs.loc[ifacility,'lon'] < Lon_right and \
     compost_council_facilitylocs.loc[ifacility,'lat'] > Lat_low and \
     compost_council_facilitylocs.loc[ifacility,'lat'] < Lat_up:
        ilat = int((compost_council_facilitylocs.loc[ifacility,'lat'] - Lat_low)/Res_01)
        ilon = int((compost_council_facilitylocs.loc[ifacility,'lon'] - Lon_left)/Res_01)
        comp_facility_map[ilat,ilon] += 1
    else:
        comp_facility_nongrid +=1

## FRS facilities
## If on continental US map, add to comp_facilities map
for ifacility in np.arange(len(FRS_facility_locs)):
    #Check if on the grid
    if FRS_facility_locs.loc[ifacility,'LONGITUDE83'] > Lon_left and \
     FRS_facility_locs.loc[ifacility,'LONGITUDE83'] < Lon_right and \
     FRS_facility_locs.loc[ifacility,'LATITUDE83'] > Lat_low and \
     FRS_facility_locs.loc[ifacility,'LATITUDE83'] < Lat_up:
        ilat = int((FRS_facility_locs.loc[ifacility,'LATITUDE83'] - Lat_low)/Res_01)
        ilon = int((FRS_facility_locs.loc[ifacility,'LONGITUDE83'] - Lon_left)/Res_01)
        comp_facility_map[ilat,ilon] += 1
    else:
        comp_facility_nongrid +=1
        
print('Gridded Number of Facilities: ', np.sum(comp_facility_map))
print('Nongrid Number of Facilities: ', comp_facility_nongrid)
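Step 2.7 repeats the same bounds check and index calculation for four datasets. A single helper can handle any set of points; the sketch below (hypothetical name `grid_point_counts`) uses the same strict bounds and `int` truncation as the loops above, clips indices at the upper edge as a guard against floating-point round-off, and uses `np.add.at` so that several facilities in one cell are all counted.

```python
import numpy as np


def grid_point_counts(lat, lon, lat_low, lat_up, lon_left, lon_right, res):
    """Count points per cell of a regular lat/lon grid.

    Returns (count_map, n_off_grid); rows index latitude from lat_low,
    columns index longitude from lon_left.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    n_lat = int(round((lat_up - lat_low) / res))
    n_lon = int(round((lon_right - lon_left) / res))
    on_grid = (lon > lon_left) & (lon < lon_right) & (lat > lat_low) & (lat < lat_up)
    # Same truncation as int((lat - Lat_low)/Res_01) above, clipped to stay in bounds
    ilat = np.minimum(((lat[on_grid] - lat_low) / res).astype(int), n_lat - 1)
    ilon = np.minimum(((lon[on_grid] - lon_left) / res).astype(int), n_lon - 1)
    counts = np.zeros((n_lat, n_lon))
    np.add.at(counts, (ilat, ilon), 1)  # unbuffered add: repeated cells accumulate
    return counts, int((~on_grid).sum())


# Small toy domain (not the full CONUS grid): two points in one cell, one off-grid point
counts, n_off = grid_point_counts([40.005, 40.005, 10.0], [-100.005, -100.005, -100.0],
                                  39.0, 41.0, -101.0, -99.0, 0.01)
print(counts.shape, counts.sum(), counts.max(), n_off)  # (200, 200) 2.0 2.0 1
```

Summing the returned maps (and off-grid counts) from the four datasets would reproduce `comp_facility_map` and `comp_facility_nongrid`.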

#### Step 2.8 Merge Facility data and Population Data (by state)

In [None]:
# To take the emissions from the state to the gridded level, Step 4 will use a combination of facility locations (for 
# states with more than two facilities) and population density (for states with two or fewer facilities).
# In this case, it is OK to mix variables of different units in a single gridded matrix, because the data
# used to allocate emissions within a single state will be either facility locations or population, not both. 

#Initialize array
facility_pop_map = np.zeros([len(lat001),len(lon001),num_years])

# 1) Create a mask for a given state
# 2) Calculate the number of facilities within that state
# 3) If there are two or fewer facilities in the state, use population data for that state
# 4) If there are more than two facilities in the state, use facility location information for that state
for istate in np.arange(0, len(State_ANSI)):
    mask_state = np.ma.ones(np.shape(state_ANSI_map))
    mask_state = np.ma.masked_where(state_ANSI_map != State_ANSI['ansi'][istate], mask_state)
    mask_state = np.ma.filled(mask_state,0)
    num_state_facilities = np.sum(mask_state*comp_facility_map)
    #print(num_state_facilities)
    if num_state_facilities <= 2:
        facility_pop_map[:,:,0] += mask_state*pop_den_map*area_map
        #print(np.sum(mask_state*pop_den_map*area_map))
    else:
        facility_pop_map[:,:,0] += mask_state*comp_facility_map
        #print(np.sum(mask_state*comp_facility_map))
    #print('State', State_ANSI['abbr'][istate], 'of', len(State_ANSI))   

#fill in remaining years (proxy held constant over time)
for iyear in np.arange(0, num_years):
    facility_pop_map[:,:,iyear] = facility_pop_map[:,:,0]
    print('Year',year_range_str[iyear],'proxy sum:',np.sum(facility_pop_map[:,:,iyear]))
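The per-state rule above can also be written with boolean indexing, which touches only the cells of each state instead of multiplying full-size mask arrays. A minimal sketch, assuming `pop_map` already holds people per cell (density x area, as in `pop_den_map*area_map`); `build_facility_population_proxy` is a hypothetical name and the toy arrays are made up.

```python
import numpy as np


def build_facility_population_proxy(state_map, state_ids, facility_map, pop_map, min_facilities=2):
    """Per state: use facility counts if the state has more than `min_facilities`
    facilities, otherwise fall back to the population in each cell."""
    proxy = np.zeros(facility_map.shape)
    for sid in state_ids:
        in_state = state_map == sid
        if facility_map[in_state].sum() <= min_facilities:
            proxy[in_state] = pop_map[in_state]
        else:
            proxy[in_state] = facility_map[in_state]
    return proxy


state_map = np.array([[1, 1], [2, 2]])
facility_map = np.array([[3.0, 0.0], [1.0, 0.0]])   # state 1 has 3 facilities, state 2 has 1
pop_map = np.array([[10.0, 20.0], [30.0, 40.0]])
print(build_facility_population_proxy(state_map, [1, 2], facility_map, pop_map))
```

State 1 keeps its facility counts, while state 2 (one facility) falls back to population.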

#### Step 2.9. Read In State-Level Recycling & Composting Data

In [None]:
# Option: use EPA state-level estimates for composting (based on Shin et al., 2014 and more recent sources)

# State emissions data from the GHGI workbook are used as the state-level proxy here
# (EPA methane emissions by state; Step 4.1.2 uses each state's fraction of the national total)

compost_state = np.zeros([len(State_ANSI), num_years])

#read in the data file
EPA_stateComp_Emissions = pd.read_excel(epa_state_composting_inputfile,skiprows=2, sheet_name = 'Composting (F-4) ')
EPA_stateComp_Emissions.dropna(axis=0,inplace=True)
EPA_stateComp_Emissions.rename(columns={EPA_stateComp_Emissions.columns[0]:'State'}, inplace=True)

EPA_stateComp_Emissions = EPA_stateComp_Emissions.drop(columns = [*range(1990, start_year,1)])
EPA_stateComp_Emissions.reset_index(inplace=True, drop=True)

#make state array
for iyear in np.arange(0, num_years):
    for istate in np.arange(0, len(EPA_stateComp_Emissions)):
        #print(EPA_stateComp_Emissions['State'][istate])
        match_state = np.where(EPA_stateComp_Emissions['State'][istate].strip() == State_ANSI['name'])[0][0]
        #print(match_state)
        compost_state[match_state,iyear] = EPA_stateComp_Emissions.loc[istate,year_range[iyear]]#/(25*1e-3) #convert from MMT CO2e to kt

    print('Emissions fraction:', year_range_str[iyear],np.sum(compost_state[:,iyear]))

-----------
## Step 3. Read in and Format US EPA GHGI Emissions
----------

In [None]:
#Read in Table 7-19:  CH4 and N2O Emissions from Composting (kt) 
#Just one number for each year 
EPA_emis_kt =pd.read_csv(EPA_comp_inputfile, skiprows=2, nrows=1) #,usecols = [24,25,26,27,28,29,30]
EPA_emis_kt = EPA_emis_kt.drop(['Unnamed: 0'], axis=1)
EPA_emis_kt.rename(columns={EPA_emis_kt.columns[0]:'Source'}, inplace=True)
EPA_emis_kt = EPA_emis_kt.fillna('')
temprange = [*range(1990, start_year,1)]
dropnames=[str(i) for i in temprange]
EPA_emis_kt = EPA_emis_kt.drop(columns = dropnames)
print('EPA GHGI National CH4 Emissions (kt):')
display(EPA_emis_kt)
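Step 3.2 sums the GHGI source rows belonging to each emission group and then compares the result with the reported total using a symmetric relative difference. A compact sketch of both pieces, assuming a table with a 'Source' column plus one column per year, as `EPA_emis_kt` has after formatting. Note the design difference: this version matches source names exactly with `merge`, whereas the notebook uses a regular-expression `str.contains` match, which also catches partial names. The function names, group name, and table values are made up.

```python
import pandas as pd


def sum_emission_groups(emis, mapping, years):
    """Sum GHGI source rows into emission groups (exact source-name matching)."""
    merged = emis.merge(mapping, left_on='Source', right_on='GHGI_Source', how='inner')
    return merged.groupby('GHGI_Emi_Group')[years].sum()


def relative_difference(a, b):
    """Symmetric relative difference |a - b| / ((a + b) / 2); 0 when both are 0."""
    mean = (a + b) / 2
    return 0.0 if mean == 0 else abs(a - b) / mean


emis = pd.DataFrame({'Source': ['CH4', 'N2O'], '2012': [1.0, 2.0], '2013': [3.0, 4.0]})
mapping = pd.DataFrame({'GHGI_Source': ['CH4'], 'GHGI_Emi_Group': ['Emi_Composting']})  # hypothetical group
groups = sum_emission_groups(emis, mapping, ['2012', '2013'])
print(groups)
print(relative_difference(groups['2012'].sum(), 1.0) < 1e-4)  # True
```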

#### 3.2. Split Emissions into Gridding Groups (each Group will have the same proxy applied during the state allocation/gridding)

In [None]:
#split emissions into scaling groups
# In this case, data are only available for total emissions

DEBUG =1

start_year_idx = EPA_emis_kt.columns.get_loc(str(start_year))
end_year_idx = EPA_emis_kt.columns.get_loc(str(end_year))+1
ghgi_comp_groups = ghgi_comp_map['GHGI_Emi_Group'].unique()
sum_emi = np.zeros([num_years])


for igroup in np.arange(0,len(ghgi_comp_groups)): #loop through all groups, finding the GHGI sources in each group and summing their emissions for each year
    ##DEBUG## print(ghgi_comp_groups[igroup])
    vars()[ghgi_comp_groups[igroup]] = np.zeros([num_years])
    source_temp = ghgi_comp_map.loc[ghgi_comp_map['GHGI_Emi_Group'] == ghgi_comp_groups[igroup], 'GHGI_Source']
    pattern_temp  = '|'.join(source_temp) 
    emi_temp = EPA_emis_kt[EPA_emis_kt['Source'].str.contains(pattern_temp)]
    ##DEBUG## display(emi_temp)
    vars()[ghgi_comp_groups[igroup]][:] = emi_temp.iloc[:,start_year_idx:end_year_idx].sum()
    ##DEBUG## display(vars()[ghgi_comp_groups[igroup]][:])
        
        
#Check against total summary emissions 
print('QA/QC #1: Check Processing Emission Sum against GHGI Summary Emissions')
for iyear in np.arange(0,num_years): 
    for igroup in np.arange(0,len(ghgi_comp_groups)):
        sum_emi[iyear] += vars()[ghgi_comp_groups[igroup]][iyear]
        
    summary_emi = EPA_emis_kt.iloc[0,iyear+1]  
    #Check 1 - make sure that the sum over all emission groups equals the reported total
    diff1 = abs(sum_emi[iyear] - summary_emi)/((sum_emi[iyear] + summary_emi)/2)
    if DEBUG ==1:
        print(summary_emi)
        print(sum_emi[iyear])
    if diff1 < 0.0001:
        print('Year ', year_range[iyear],': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear],': FAIL (check Production & summary tabs): ', diff1*100,'%') 

-----
## Step 4. Grid Emissions
-----

#### Step 4.1. Allocate emissions

##### Step 4.1.1 Assign the Appropriate Proxy Variable Names (state & grid)

In [None]:
# The names on the *left* need to match the 'Composting_ProxyMapping' 'State_Proxy_Group' and 'Proxy_Group' names 
# (these are initialized in Step 2). 
# The names on the *right* are the variable names used to calculate the proxies in this code.
# Names on the right need to match those from the code in Step 2

#national --> state proxies (state x year (X month))
State_WasteComposted = compost_state

#state --> grid proxies (0.01x0.01)
Map_Facility_Population = facility_pop_map

# remove variables to clear space for larger arrays 
del facility_pop_map,comp_facility_map,pop_den_map

##### Step 4.1.2 Allocate National EPA Emissions to the State-Level

In [None]:
# Calculate state-level emissions (in kt)
# State data = national GHGI emissions * state proxy/national total

DEBUG =1

# Note that national emissions are retained for groups that do not have state proxies (identified in the mapping file)
# and are gridded in the next step (not applicable to composting)

# Make placeholder emission arrays for each group
for igroup in np.arange(0,len(proxy_comp_map)):
    vars()['State_'+proxy_comp_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(State_ANSI),num_years])
    vars()['NonState_'+proxy_comp_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([num_years])
        
#Loop over years
for iyear in np.arange(num_years):
    #Loop over states
    for istate in np.arange(len(State_ANSI)):
        for igroup in np.arange(0,len(proxy_comp_map)):    
            if proxy_comp_map.loc[igroup,'State_Proxy_Group'] != '-' and \
                proxy_comp_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
                vars()['State_'+proxy_comp_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear] = vars()[proxy_comp_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                            data_fn.safe_div(vars()[proxy_comp_map.loc[igroup,'State_Proxy_Group']][istate,iyear], np.sum(vars()[proxy_comp_map.loc[igroup,'State_Proxy_Group']][:,iyear]))
            else:
                vars()['NonState_'+proxy_comp_map.loc[igroup,'GHGI_Emi_Group']][iyear] = vars()[proxy_comp_map.loc[igroup,'GHGI_Emi_Group']][iyear]
                
# Check sum of all gridded emissions + emissions not included in state allocation
print('QA/QC #1: Check weighted emissions against GHGI')   
for iyear in np.arange(0,num_years):
    summary_emi = EPA_emis_kt.iloc[0,iyear+1]   
    calc_emi = 0
    for igroup in np.arange(0,len(proxy_comp_map)):
        calc_emi +=  np.sum(vars()['State_'+proxy_comp_map.loc[igroup,'GHGI_Emi_Group']][:,iyear])+\
            vars()['NonState_'+proxy_comp_map.loc[igroup,'GHGI_Emi_Group']][iyear] #np.sum(Emissions[:,iyear]) + Emissions_nongrid[iyear] + Emissions_nonstate[iyear]
    if DEBUG ==1:
        print(summary_emi)
        print(calc_emi)
    diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if diff < 0.0001:
        print('Year ', year_range[iyear], ': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear], ': FAIL -- Difference = ', diff*100,'%')
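The state allocation and the QA/QC check above can be sketched outside the notebook as follows. This is a minimal illustration under stated assumptions, not the notebook's implementation: `safe_div`, `allocate_to_states`, and `relative_difference` are hypothetical helpers (the notebook uses `data_fn.safe_div` and arrays named through `vars()`), and the toy national totals and proxy values are made up.

```python
import numpy as np

def safe_div(num, den):
    """Divide num by den, returning 0 where den == 0.
    Hypothetical stand-in for the notebook's data_fn.safe_div."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    out = np.zeros(np.broadcast(num, den).shape)
    np.divide(num, den, out=out, where=den != 0)
    return out

def allocate_to_states(national_kt, state_proxy):
    """Split national emissions (num_years,) across states in proportion
    to a state-level proxy array (num_states, num_years)."""
    national_kt = np.asarray(national_kt, dtype=float)
    state_proxy = np.asarray(state_proxy, dtype=float)
    totals = state_proxy.sum(axis=0)        # national proxy total for each year
    shares = safe_div(state_proxy, totals)  # each state's fraction (0 if no proxy that year)
    return shares * national_kt             # broadcast the national total over states

def relative_difference(a, b):
    """Symmetric relative difference |a - b| / mean(a, b), as used in the QA/QC checks."""
    return abs(a - b) / ((a + b) / 2)

# Toy example: 3 states, 2 years (values are illustrative only)
national = np.array([100.0, 120.0])
proxy = np.array([[1.0, 0.0],
                  [3.0, 2.0],
                  [0.0, 2.0]])
state_emi = allocate_to_states(national, proxy)
for iyear, total in enumerate(national):
    diff = relative_difference(total, state_emi[:, iyear].sum())
    print(iyear, 'PASS' if diff < 1e-4 else 'FAIL')  # 1e-4 corresponds to 0.01%
```

Each state's share is its proxy value divided by the national proxy total for that year, so the shares sum to 1 whenever any state has a non-zero proxy. If the proxy total is 0 in some year, every share is 0 and that year's national emissions are lost, which is exactly the case the QA/QC check is meant to flag. Note that the relative difference is undefined when both totals are 0.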

##### 4.1.3 Allocate emissions to the CONUS region (0.1x0.1)

In [None]:
# Allocate State-Level emissions (kt) onto a 0.1x0.1 grid using gridcell level 'Proxy_Groups'

DEBUG = 1

#Define emission arrays
#Emissions_array = np.zeros([area_map.shape[0],area_map.shape[1],num_years,num_months])
Emissions_array_01 = np.zeros([len(Lat_01),len(Lon_01),num_years])
Emissions_nongrid = np.zeros([num_years])

# For each year, (2a) distribute state-level emissions onto a grid using the proxies defined above.
# To speed up the code, masks are used rather than looping individually through each lat/lon.
# For a given state, the mask is 1 in grid cells whose ANSI value matches that state and 0 elsewhere.
# AK, HI, and territories are removed from the analysis at this stage; their emissions, and those of any
# state with no gridded proxy, are recorded in the 'non-grid' array.
# State masks and proxy weights are computed at 0.01x0.01 degree resolution, as required for accurate state
# boundaries. The weighted arrays are then re-gridded to 0.1x0.1 degrees before emissions are applied, because
# accumulating emissions on the high-resolution grid was prohibitively slow.
# (2b - not applicable here) For emission groups that were not first allocated to states, national emissions for those groups are gridded
# based on the relevant gridded proxy arrays (0.1x0.1 resolution). These emissions are at 0.1x0.1 degrees resolution.
# (2c - not applicable here) Record 'not mapped' emission groups in the 'non-grid' array.

print('**QA/QC Check: Sum of national gridded emissions vs. GHGI national emissions')
for igroup in np.arange(len(proxy_comp_map)):
    vars()['Ext_'+proxy_comp_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
    
for iyear in np.arange(0,num_years):
    if year_range[iyear]==2012 or year_range[iyear]==2016:
        year_days = np.sum(month_day_leap)
        #month_days = month_day_leap
    else:
        year_days = np.sum(month_day_nonleap)
        #month_days = month_day_nonleap 
    running_count = 0
    calc_emi = 0
    
    #1. Step through each gridding group
    for igroup in np.arange(0,len(proxy_comp_map)):
        ## 1. Load the gridded proxy for this group (annual resolution, so no weighting by days per month is applied)
        print(igroup,'of',len(proxy_comp_map))
        proxy_temp = vars()[proxy_comp_map.loc[igroup,'Proxy_Group']]
        proxy_temp_nongrid = vars()[proxy_comp_map.loc[igroup,'Proxy_Group']+'_nongrid']
        
        #2a. Step through each state (if group was previously allocated to state level)
        if proxy_comp_map.loc[igroup,'State_Proxy_Group'] != '-' and proxy_comp_map.loc[igroup,'State_Proxy_Group'] != 'state_not_mapped':
            for istate in np.arange(0,len(State_ANSI)):
                mask_state = np.ma.ones(np.shape(state_ANSI_map))
                mask_state = np.ma.masked_where(state_ANSI_map != State_ANSI['ansi'][istate], mask_state)
                mask_state = np.ma.filled(mask_state,0)   
                ##DEBUG## print("state " + str(istate) +' of '+ str(len(State_ANSI)))
                if np.sum(mask_state*proxy_temp[:,:,iyear]) > 0 and State_ANSI['abbr'][istate] not in {'AK','HI'} and istate < 51: 
                    weighted_array = data_fn.safe_div(mask_state*proxy_temp[:,:,iyear], np.sum(mask_state*proxy_temp[:,:,iyear]))
                    weighted_array_01 = data_fn.regrid001_to_01(weighted_array, Lat_01, Lon_01)
                    #for imonth in np.arange(0,num_months):
                    emi_temp = vars()['State_'+proxy_comp_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear]*weighted_array_01
                    Emissions_array_01[:,:,iyear] += emi_temp
                    vars()['Ext_'+proxy_comp_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += emi_temp
                        
                else: 
                    #for imonth in np.arange(0,num_months):
                    Emissions_nongrid[iyear] += vars()['State_'+proxy_comp_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear]
                ##DEBUG## running_count += np.sum(vars()['State_'+proxy_comp_map.loc[igroup,'GHGI_Emi_Group']][istate,iyear,:])
                
                ##DEBUG## print(running_count)
                ##DEBUG## print(np.sum(Emissions_array_01[:,:,iyear,:]) +np.sum(Emissions_nongrid[iyear,:]))
         

    #Emissions_array_01[:,:,iyear,:] += data_fn.regrid001_to_01(Emissions_array[:,:,iyear,:], Lat_01, Lon_01) #covert to 10x10km
    for igroup in np.arange(0,len(proxy_comp_map)):
        calc_emi += np.sum(vars()['Ext_'+proxy_comp_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear])
    calc_emi += np.sum(Emissions_nongrid[iyear]) 
    #calc_emi = np.sum(Emissions_array_01[:,:,iyear]) + np.sum(Emissions_nongrid[iyear]) 
    summary_emi = EPA_emis_kt.iloc[0,iyear+1] 
    emi_diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if DEBUG ==1:
        print(calc_emi)
        print(summary_emi)
    if abs(emi_diff) < 0.0001:
        print('Year '+ year_range_str[iyear]+': Difference < 0.01%: PASS')
    else: 
        print('Year '+ year_range_str[iyear]+': Difference > 0.01%: FAIL, diff: '+str(emi_diff))
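The masking and re-gridding step above can be sketched on a toy grid. This is a hedged illustration, not the notebook's code: `regrid_block_sum` is a hypothetical stand-in for `data_fn.regrid001_to_01` that assumes the 0.01 degree grid is aligned with the 0.1 degree grid (so each coarse cell is the sum of a 10x10 block of fine cells), and `grid_state_emissions` is a hypothetical helper. The mask is built with a boolean comparison, which gives the same 1/0 array as the `np.ma` masking in the notebook.

```python
import numpy as np

def regrid_block_sum(fine, factor=10):
    """Sum non-overlapping factor x factor blocks (e.g. 0.01 deg -> 0.1 deg).
    Hypothetical stand-in for data_fn.regrid001_to_01; assumes aligned grids."""
    nlat, nlon = fine.shape
    if nlat % factor or nlon % factor:
        raise ValueError('fine grid shape must be divisible by factor')
    return fine.reshape(nlat // factor, factor, nlon // factor, factor).sum(axis=(1, 3))

def grid_state_emissions(state_kt, state_id, state_map, proxy, factor=10):
    """Distribute one state's emissions over its grid cells in proportion to the proxy.
    Returns None if the state has no proxy (caller books it as non-gridded)."""
    mask = (state_map == state_id).astype(float)  # 1 inside the state, 0 elsewhere
    masked_proxy = mask * proxy
    total = masked_proxy.sum()
    if total <= 0:
        return None
    weights = masked_proxy / total                # weights sum to 1 over the state
    return state_kt * regrid_block_sum(weights, factor)

# Toy 20x20 fine grid: left half is state 1, right half is state 2
state_map = np.ones((20, 20), dtype=int)
state_map[:, 10:] = 2
proxy = np.zeros((20, 20))
proxy[:, :10] = 1.0                               # proxy only present in state 1

gridded = grid_state_emissions(50.0, 1, state_map, proxy)
nongrid = 0.0
result_state2 = grid_state_emissions(30.0, 2, state_map, proxy)
if result_state2 is None:
    nongrid += 30.0                               # mirrors the notebook's Emissions_nongrid
print(gridded)
print(gridded.sum() + nongrid)                    # gridded + non-grid equals the state totals
```

Because the weights are normalized over each state before re-gridding, summing the coarse grid plus the non-grid term recovers the state totals, which is what the QA/QC check in the cell above compares against the GHGI.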

#### Step 4.1.4 Save gridded emissions (kt)

In [None]:
#save gridded emissions for each gridding group - for extension

#Initialize file
data_IO_fn.initialize_netCDF(grid_emi_outputfile, netCDF_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)

unique_groups = np.unique(proxy_comp_map['GHGI_Emi_Group'])
unique_groups = unique_groups[unique_groups != 'Emi_not_mapped']

nc_out = Dataset(grid_emi_outputfile, 'r+', format='NETCDF4')
#nc_out.createDimension('state', len(State_ANSI))

for igroup in np.arange(0,len(unique_groups)):
    print('Ext_'+unique_groups[igroup])
    if len(np.shape(vars()['Ext_'+unique_groups[igroup]])) ==4:
        ghgi_temp = np.sum(vars()['Ext_'+unique_groups[igroup]],axis=3) #sum month data if data is monthly
    else:
        ghgi_temp = vars()['Ext_'+unique_groups[igroup]]

    # Write data to netCDF
    data_out = nc_out.createVariable('Ext_'+unique_groups[igroup], 'f8', ('lat', 'lon','year'), zlib=True)
    data_out[:,:,:] = ghgi_temp[:,:,:]

#save nongrid data to calculate non-grid fraction extension
data_out = nc_out.createVariable('Emissions_nongrid', 'f8', ('year',), zlib=True)  
data_out[:] = Emissions_nongrid[:]
nc_out.close()

#Confirm file location
print('** SUCCESS **')
print("Gridded emissions (kt) written to file: {}" .format(os.getcwd())+grid_emi_outputfile)
print(' ')

del data_out, ghgi_temp, nc_out

#### 4.2 Calculate Gridded Emission Fluxes (molec./cm2/s) (0.1x0.1)

In [None]:
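#Convert emissions to emission flux
# conversion: kt emissions to molec/cm2/s flux
# flux = emissions (kt) * 1e9 (g/kt) * Avogadro (molec/mol) / (Molarch4 (g/mol) * seconds per year) / grid cell area (cm2)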
DEBUG =1

Flux_array_01_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])
print('**QA/QC Check: Sum of national gridded emissions vs. GHGI national emissions')
  
for iyear in np.arange(0,num_years):
    if year_range[iyear]==2012 or year_range[iyear]==2016:
        year_days = np.sum(month_day_leap)
    else:
        year_days = np.sum(month_day_nonleap)
        
    conversion_factor_01 = 10**9 * Avogadro / float(Molarch4 *year_days * 24 * 60 *60) / area_matrix_01
    Flux_array_01_annual[:,:,iyear] += Emissions_array_01[:,:,iyear]*conversion_factor_01
    
    #convert back to mass (kt) to check, reusing the same conversion factor
    calc_emi = np.sum(Flux_array_01_annual[:,:,iyear]/conversion_factor_01)+np.sum(Emissions_nongrid[iyear])
    summary_emi = EPA_emis_kt.iloc[0,iyear+1] 
    emi_diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if DEBUG ==1:
        print(calc_emi)
        print(summary_emi)
    if abs(emi_diff) < 0.0001:
        print('Year '+ year_range_str[iyear]+': Difference < 0.01%: PASS')
    else: 
        print('Year '+ year_range_str[iyear]+': Difference > 0.01%: FAIL, diff: '+str(emi_diff))
        
Flux_Emissions_Total_annual = Flux_array_01_annual
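The kt-to-flux conversion in the cell above can be checked with a small worked example. This is a sketch under assumptions: `kt_to_flux` is a hypothetical helper, the molar mass of 16.04 g/mol is an assumed value standing in for the notebook's `Molarch4`, and the leap-year test uses `calendar.isleap` instead of the notebook's hard-coded 2012/2016 check (which is only valid for the 2012 - 2018 range).

```python
import calendar
import numpy as np

AVOGADRO = 6.02214076e23  # molecules per mol
M_CH4 = 16.04             # g/mol; assumed value, the notebook defines its own Molarch4

def kt_to_flux(emi_kt, area_cm2, year):
    """Convert annual emissions (kt per grid cell) to a mean flux (molec/cm2/s).
    Returns the flux and the conversion factor so the result can be inverted."""
    days = 366 if calendar.isleap(year) else 365
    seconds = days * 24 * 60 * 60
    factor = 1e9 * AVOGADRO / (M_CH4 * seconds) / area_cm2  # kt -> g -> mol -> molec, per s per cm2
    return emi_kt * factor, factor

emi_kt = np.array([[1.0, 0.5],
                   [0.0, 2.0]])
area_cm2 = np.full((2, 2), 1e12)  # roughly a 10 km x 10 km cell
flux_2016, f_2016 = kt_to_flux(emi_kt, area_cm2, 2016)
flux_2018, f_2018 = kt_to_flux(emi_kt, area_cm2, 2018)

# Round trip, as in the notebook's QA/QC check: divide the flux by the same factor
back_to_kt = (flux_2018 / f_2018).sum()
print(back_to_kt)
```

Returning the factor alongside the flux keeps the QA/QC round trip exact: dividing by the same factor recovers the gridded kt total, and the only year-to-year difference in the factor is the number of days (a 2016 factor is 365/366 of a 2018 factor).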

-------------
## Step 5. Write netCDF
------------

In [None]:
#initialize netCDF file
data_IO_fn.initialize_netCDF(gridded_outputfile, netCDF_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)

# Write data to netCDF
nc_out = Dataset(gridded_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Total_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded composting fluxes written to file: {}" .format(os.getcwd())+gridded_outputfile)

--------------
## Step 6. Plot Gridded Data
----------------

In [None]:
#Plot Annual Data
scale_max = 10
save_flag =0
save_outfile = ''
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Total_annual, Lat_01, Lon_01, year_range, title_str, scale_max,save_flag,save_outfile)

In [None]:
# Plot difference between last and first year
save_flag =0
save_outfile = ''
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Total_annual, Lat_01, Lon_01, year_range, title_diff_str, save_flag, save_outfile)

In [None]:
ct = datetime.datetime.now() 
ft = ct.timestamp() 
time_elapsed = (ft-it)/(60*60)
print('Time to run: '+str(time_elapsed)+' hours')
print('** GEPA_5B1_Composting: COMPLETE **')