# Gridded EPA Methane Inventory
## Category: 5A1 Landfills

***
#### Authors: 
Erin E. McDuffie, Bram Maasakkers, Candice Chen
#### Date Last Updated: 
see Step 0
#### Notebook Purpose: 
This Notebook calculates and reports annual gridded (0.1°x0.1°) methane emission fluxes (molec./cm2/s) from Landfills (total, industrial, and municipal solid waste) in the CONUS region between 2012-2018. 
#### Summary & Notes:
The national EPA GHGI emissions from MSW and industrial landfills are read in from the publicly available EPA GHG Inventory Waste annex files. Emissions are available as national totals (for the entire time series). State-level allocations are also available from the 2021 State GHG Inventory for two industrial sectors (pulp and paper manufacturing and food and beverage manufacturing) within the industrial waste category. National industrial waste emissions are allocated to the state level (for each subgroup) using these relative state-level emissions data. State-level emissions for each subgroup are then allocated to the 0.01°x0.01° CONUS grid using gridded data of facility-level emissions for each subgroup. Data are then re-gridded to 0.1°x0.1°. National MSW landfill emissions are allocated directly to the 0.1°x0.1° CONUS grid using relative facility-level emissions for MSW landfills. All data are then converted to fluxes (molecules CH4/cm2/s). Annual emission fluxes (molecules CH4/cm2/s) for total landfills, MSW landfills, and industrial landfills are written to final netCDFs in the ‘/code/Final_Gridded_Data/’ folder. 
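
The final unit conversion described above can be sketched as follows. This is a minimal illustration, assuming annual emissions are held in kt CH4 and grid-cell areas in cm2. The function name `kt_to_flux` and the choice of a 365- or 366-day year are assumptions made for this example, not necessarily what the notebook does later.

```python
import calendar
import numpy as np

AVOGADRO = 6.02214129e23   # molecules/mol (same value as defined in Step 0)
MOLAR_CH4 = 16.04          # g/mol (same value as defined in Step 0)

def kt_to_flux(emis_kt, area_cm2, year):
    """Convert annual emissions (kt CH4) to a flux (molecules CH4/cm2/s).

    Hypothetical helper for illustration only.
    """
    days = 366 if calendar.isleap(year) else 365
    seconds = days * 24 * 3600
    grams = np.asarray(emis_kt, dtype=float) * 1e9          # 1 kt = 1e9 g
    molecules = grams / MOLAR_CH4 * AVOGADRO                 # g -> mol -> molecules
    return molecules / (np.asarray(area_cm2, dtype=float) * seconds)

# Two hypothetical grid cells of 1e12 cm2 each, one empty and one emitting 1 kt/yr
flux = kt_to_flux(np.array([[0.0, 1.0]]), np.array([[1e12, 1e12]]), 2012)
```

Dividing by the per-cell area means the same emission mass gives a slightly larger flux at higher latitudes, where 0.1° cells are smaller.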
***

-------
## Step 0. Set-Up Notebook Modules, Functions, and Local Parameters and Constants
_____

In [None]:
#Confirm working directory & print last update time
import os
import time
modtime = os.path.getmtime('./5A1_Landfills.ipynb')
modificationTime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(modtime))
print("This file was last modified on: ", modificationTime)
print('')
print("The directory we are working in is {}" .format(os.getcwd()))

In [None]:
## Include plots within notebook
%matplotlib inline

In [None]:
# Import base modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import re
import pyodbc
import PyPDF2 as pypdf
import tabula as tb
import shapefile as shp
from datetime import datetime
from copy import copy
from scipy.interpolate import interp1d
import geopy
from geopy.geocoders import Nominatim

# Import additional modules
# Load plotting package Basemap
# (must also specify the project library path, which is unique to each user)
from mpl_toolkits.basemap import Basemap

# Load netCDF (for manipulating netCDF file types)
from netCDF4 import Dataset

# Set up ticker
#import matplotlib.ticker as ticker

#add path for the global function module (file)
import sys
module_path = os.path.abspath(os.path.join('../Global_Functions/'))
#print(module_path)
if module_path not in sys.path:
    sys.path.append(module_path)

# Load functions
import data_load_functions as data_load_fn
import data_functions as data_fn
import data_IO_functions as data_IO_fn
import data_plot_functions as data_plot_fn

In [None]:
#INPUT Files
# Assign global file names
global_filenames = data_load_fn.load_global_file_names()
State_ANSI_inputfile = global_filenames[0]
#County_ANSI_inputfile = global_filenames[1]
#pop_map_inputfile = global_filenames[2]
Grid_area01_inputfile = global_filenames[3]
Grid_area001_inputfile = global_filenames[4]
Grid_state001_ansi_inputfile = global_filenames[5]
#Grid_county001_ansi_inputfile = global_filenames[6]
globalinputlocation = global_filenames[0][0:20]
print(globalinputlocation)

# EPA Inventory Data
EPA_landfill_inputfile = globalinputlocation+'GHGI/Ch7_Waste/Table 7-4.csv'

#proxy mapping file
Landfills_Mapping_inputfile = './InputData/Landfills_ProxyMapping.xlsx'

# GHGRP Data
ghgrp_emi_hh_inputfile = './InputData/ghgrp_subpart_hh.csv'
ghgrp_facility_hh_inputfile = './InputData/SubpartHH_MSWlandfills_Facilities.csv'
ghgrp_emi_tt_inputfile = './InputData/ghgrp_subpart_tt.csv'
ghgrp_facility_tt_inputfile = './InputData/SubpartTT_INDlandfills_Facilities.csv'

EPA_nonreporting_msw_inputfile = './InputData/Non-Reporting_LF_DB_2020_1.12.2021.xlsx'

FRS_inputfile = globalinputlocation+'FRS/national_single/NATIONAL_SINGLE.csv'

EPA_IndState_inputfile = './InputData/IND_LF_State_inv_04.20.2021.xlsx'
FoodBeverage_inputdata = './InputData/ExcessFoodPublic_USTer_2020_R9/ExcelTables/Food Manufacturers and Processors.xlsx'
Mills_OnLine_inputdata = './InputData/Mills_OnLine.xlsx'

#OUTPUT FILES
gridded_outputfile = '../Final_Gridded_Data/EPA_v2_5A1_Landfills.nc'
gridded_msw_outputfile = '../Final_Gridded_Data/EPA_v2_5A1_Landfills_MSW.nc'
gridded_ind_outputfile = '../Final_Gridded_Data/EPA_v2_5A1_Landfills_Industrial.nc'

netCDF_description = 'Gridded EPA Inventory - Landfill Emissions - IPCC Source Category 5A1'
netCDF_msw_description = 'Gridded EPA Inventory - Landfill Emissions - IPCC Source Category 5A1 - Municipal Solid Waste (MSW)'
netCDF_ind_description = 'Gridded EPA Inventory - Landfill Emissions - IPCC Source Category 5A1 - Industrial'

title_str = "EPA methane emissions from landfills"
title_str_msw = "EPA methane emissions from MSW landfills"
title_str_ind = "EPA methane emissions from Industrial landfills"
title_diff_str = "Emissions from landfills difference: 2018-2012"
title_diff_str_msw = "Emissions from MSW landfills difference: 2018-2012"
title_diff_str_ind = "Emissions from industrial landfills difference: 2018-2012"

#output gridded proxy data
grid_emi_outputfile = '../Final_Gridded_Data/Extension/v2_input_data/Landfills_Grid_Emi.nc'

In [None]:
# Define local variables
start_year = 2012  #First year in emission timeseries
end_year = 2018    #Last year in emission timeseries
year_range = [*range(start_year, end_year+1,1)] #List of emission years
year_range_str=[str(i) for i in year_range]
num_years = len(year_range)
num_inv_years = len([*range(1990, end_year+1,1)]) #Number of inventory years (1990 through end_year)

# Define constants
Avogadro   = 6.02214129 * 10**(23)  #molecules/mol
Molarch4   = 16.04                  #g/mol
Res01      = 0.1                    # degrees
Res_01     = 0.01                   # degrees
hrs_to_yrs = 8760                   #number of hours in a year
g_to_mt    = 1*10**(-6)             # grams to metric ton

# Continental US Lat/Lon Limits (for netCDF files)
Lon_left = -130       #deg
Lon_right = -60       #deg
Lat_low  = 20         #deg
Lat_up  = 55          #deg
loc_dimensions = [Lat_low, Lat_up, Lon_left, Lon_right]

ilat_start = int((90+Lat_low)/Res01) #1100:1450 (continental US range)
ilat_end = int((90+Lat_up)/Res01)
ilon_start = abs(int((-180-Lon_left)/Res01)) #500:1200 (continental US range)
ilon_end = abs(int((-180-Lon_right)/Res01))

# Number of days in each month
month_day_leap  = [  31,  29,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]
month_day_nonleap = [  31,  28,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]
month_tag = ['01','02','03','04','05','06','07','08','09','10','11','12']
month_dict = {'January':1, 'February':2,'March':3,'April':4,'May':5,'June':6, 'July':7,'August':8,'September':9,'October':10,\
             'November':11,'December':12}

# Month arrays
month_range_str = ['January','February','March','April','May','June','July','August','September','October','November','December']
num_months = len(month_range_str)


In [None]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999;
//prevent auto-scrolling

In [None]:
# Track run time
ct = datetime.now() 
it = ct.timestamp() 
print("current time:", ct) 

____
## Step 1. Load in State ANSI data, and Area Maps
_____

In [None]:
# State-level ANSI Data
#Read the state ANSI file array
State_ANSI, name_dict, abbr_dict = data_load_fn.load_state_ansi(State_ANSI_inputfile)[0:3]
#QA: number of states
print('Read input file: '+ f"{State_ANSI_inputfile}")
print('Total "States" found: ' + '%.0f' % len(State_ANSI))
print(' ')

# 0.01 x0.01 degree Data
# State ANSI IDs and grid cell area (m2) maps
state_ANSI_map = data_load_fn.load_state_ansi_map(Grid_state001_ansi_inputfile)
area_map, lat001, lon001 = data_load_fn.load_area_map_001(Grid_area001_inputfile)

# 0.1 x0.1 degree data
# grid cell area and state ANSI maps
Lat01, Lon01 = data_load_fn.load_area_map_01(Grid_area01_inputfile)[1:3]
#Select relevant Continental 0.1 x0.1 domain
Lat_01 = Lat01[ilat_start:ilat_end]
Lon_01 = Lon01[ilon_start:ilon_end]
area_matrix_01 = data_fn.regrid001_to_01(area_map, Lat_01, Lon_01)
area_matrix_01 *= 10000  #convert from m2 to cm2
#state_ANSI_map_01 = data_fn.regrid001_to_01(state_ANSI_map, Lat_01, Lon_01)
del area_map#, lat001, lon001, global_filenames

# Print time
ct = datetime.now() 
print("current time:", ct) 
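
The helper `data_fn.regrid001_to_01` lives in the Global_Functions module and is not shown in this notebook. As a rough sketch of the idea, assuming it aggregates an extensive quantity such as cell area by summing each 10x10 block of 0.01° cells into one 0.1° cell, the operation could look like this. `block_sum` is a hypothetical name, and the real helper also takes latitude and longitude arrays.

```python
import numpy as np

def block_sum(fine, factor=10):
    """Sum non-overlapping factor x factor blocks of a 2-D array.

    Hypothetical stand-in for a 0.01-degree to 0.1-degree regridding step.
    """
    n_lat, n_lon = fine.shape
    if n_lat % factor or n_lon % factor:
        raise ValueError("array dimensions must be multiples of factor")
    # Split each axis into (coarse index, position within block), then sum the block axes
    blocks = fine.reshape(n_lat // factor, factor, n_lon // factor, factor)
    return blocks.sum(axis=(1, 3))

fine_area = np.ones((20, 30))        # hypothetical 0.01-degree cell areas
coarse_area = block_sum(fine_area)   # one value per 0.1-degree cell
```

Summing (rather than averaging) conserves the total, which is the property needed for areas and emission masses.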

-------------
## Step 2: Read-in and Format Proxy Data
-------------

### Step 2.1 Read In Proxy Mapping File & Make Proxy Arrays

In [None]:
#load GHGI Mapping Groups
names = pd.read_excel(Landfills_Mapping_inputfile, sheet_name = "GHGI Map - Landfills", usecols = "A:B",skiprows = 1, header = 0)
colnames = names.columns.values
ghgi_landfill_map = pd.read_excel(Landfills_Mapping_inputfile, sheet_name = "GHGI Map - Landfills", usecols = "A:B", skiprows = 2, names = colnames)
#drop rows with no data, remove the parentheses and '+' from the source names
ghgi_landfill_map = ghgi_landfill_map[ghgi_landfill_map['GHGI_Emi_Group'] != 'na']
ghgi_landfill_map = ghgi_landfill_map[ghgi_landfill_map['GHGI_Emi_Group'].notna()]
ghgi_landfill_map = ghgi_landfill_map[ghgi_landfill_map['GHGI_Emi_Group'] != '-']
#regex=False treats each character literally (a bare "+" is not a valid regular expression)
ghgi_landfill_map['GHGI_Source']= ghgi_landfill_map['GHGI_Source'].str.replace("(","",regex=False)
ghgi_landfill_map['GHGI_Source']= ghgi_landfill_map['GHGI_Source'].str.replace(")","",regex=False)
ghgi_landfill_map['GHGI_Source']= ghgi_landfill_map['GHGI_Source'].str.replace("+","",regex=False)
ghgi_landfill_map.reset_index(inplace=True, drop=True)
display(ghgi_landfill_map)

#load emission group - proxy map
names = pd.read_excel(Landfills_Mapping_inputfile, sheet_name = "Proxy Map - Landfills", usecols = "A:F",skiprows = 1, header = 0)
colnames = names.columns.values
proxy_landfill_map = pd.read_excel(Landfills_Mapping_inputfile, sheet_name = "Proxy Map - Landfills", usecols = "A:F", skiprows = 1, names = colnames)
display((proxy_landfill_map))

#create empty proxy and emission group arrays (for state and months, where needed)
for igroup in np.arange(0,len(proxy_landfill_map)):
    if proxy_landfill_map.loc[igroup, 'Grid_Month_Flag'] ==0:
        if proxy_landfill_map.loc[igroup, 'Grid_SubGroup_Flag'] >= 1:
            vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']] = np.zeros([2,len(Lat_01),len(Lon_01),num_years])
            vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years])
        else:
            vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
            vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years])
    else:
        vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
        vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years,num_months])
        
    vars()[proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([num_years])
    
    if proxy_landfill_map.loc[igroup,'State_Proxy_Group'] != '-':
        if proxy_landfill_map.loc[igroup, 'State_SubGroup_Flag'] >= 1:
            vars()[proxy_landfill_map.loc[igroup,'State_Proxy_Group']] = np.zeros([2,len(State_ANSI),num_years])
        else:
            vars()[proxy_landfill_map.loc[igroup,'State_Proxy_Group']] = np.zeros([len(State_ANSI),num_years])
    else:
        continue # do not make state proxy variable if no variable assigned in mapping file    
        
emi_group_names = np.unique(ghgi_landfill_map['GHGI_Emi_Group'])

print('QA/QC: Is the number of emission groups the same for the proxy and emissions tabs?')
if (len(emi_group_names) == len(np.unique(proxy_landfill_map['GHGI_Emi_Group']))):
    print('PASS')
else:
    print('FAIL')
    print(emi_group_names)
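
The array shapes chosen by the loop above follow a small set of rules driven by the mapping-file flags. The sketch below restates those rules as a function and stores the arrays in a dictionary instead of creating variables through `vars()`. The function name `proxy_shape`, the dictionary approach, and the two example rows are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def proxy_shape(month_flag, subgroup_flag, n_lat, n_lon, n_years,
                n_months=12, n_subgroups=2):
    """Return the proxy array shape implied by the mapping-file flags."""
    if month_flag != 0:
        return (n_lat, n_lon, n_years, n_months)      # monthly proxy
    if subgroup_flag >= 1:
        return (n_subgroups, n_lat, n_lon, n_years)   # annual proxy with subgroups
    return (n_lat, n_lon, n_years)                    # plain annual proxy

# Hypothetical mapping rows: (proxy name, Grid_Month_Flag, Grid_SubGroup_Flag)
rows = [("Map_MSW", 0, 0), ("Map_IND", 0, 1)]
proxies = {name: np.zeros(proxy_shape(m, s, 350, 700, 7)) for name, m, s in rows}
```

A dictionary keyed by proxy name keeps every array reachable by the same string used in the mapping file, without writing into the notebook namespace.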

#### Step 2.2 Read In MSW Landfill Proxy Emissions Data

In [None]:
# Read in GHGRP HH emissions for each reporting year, get lat/lons by matching to facilities
# Place HH emissions onto a map
## Read in Non-Reporting spreadsheet
# Calculate the emissions for each non-reporting landfill (= WIP ind/(WIP total) * GHGRP emissions for that year)
# remove any of these landfills that are still in the GHGRP (not yet off-ramped) in the given year
# add emissions to map

# This method applies an 11% factor to all years (rather than the 9% factor for 2012-2016 emissions, as done in the national
# GHGI). However, we remove landfills that are still in the GHGRP in the earlier years, so this should help balance out the
# overestimation of emissions from non-reporting landfills in earlier years (assuming that the increase from
# 9% to 11% is largely due to the off-ramping of GHGRP reporting facilities).

##### Step 2.2.1 Read In GHGRP Subpart HH Data

In [None]:
#Read in GHGRP Subpart HH Emissions and place onto CONUS grid

#a) Read in the GHGRP facility data
facility_info = pd.read_csv(ghgrp_facility_hh_inputfile)
facility_emis = pd.read_csv(ghgrp_emi_hh_inputfile)

#filter emissions data for methane only (in metric tonnes CH4) and for years of interest
facility_emis = facility_emis[facility_emis['HH_SUBPART_LEVEL_INFORMATION.GHG_NAME'] == 'METHANE']
facility_emis = facility_emis[facility_emis['HH_SUBPART_LEVEL_INFORMATION.REPORTING_YEAR'].isin(year_range)]
facility_info = facility_info[facility_info['V_GHG_EMITTER_FACILITIES.YEAR'].isin(year_range)]
facility_info.reset_index(inplace=True, drop=True)
facility_emis.reset_index(inplace=True, drop=True)

#rename common columns and merge into one dataframe
facility_info.rename(columns={'V_GHG_EMITTER_FACILITIES.YEAR':'Year', \
                             'V_GHG_EMITTER_FACILITIES.FACILITY_ID':'Facility_ID', \
                             'V_GHG_EMITTER_FACILITIES.LONGITUDE':'LONGITUDE',
                             'V_GHG_EMITTER_FACILITIES.LATITUDE':'LATITUDE'},inplace=True)
facility_emis.rename(columns={'HH_SUBPART_LEVEL_INFORMATION.REPORTING_YEAR':'Year', \
                              'HH_SUBPART_LEVEL_INFORMATION.FACILITY_ID':'Facility_ID'},inplace=True)
ghgrp_msw = pd.merge(facility_info, facility_emis)
ghgrp_msw['emis_kt_tot'] = ghgrp_msw['HH_SUBPART_LEVEL_INFORMATION.GHG_QUANTITY']/1e3 #convert metric tonnes to kt


# place ghgrp emissions onto 0.1x0.1 grid and calculate national totals and non-reporting totals
# According to the GHGI national methodology, emissions from non-reporting MSW facilities
# contribute an additional 9% of emissions in 2012-2016 and 11% after 2016. These factors
# were derived based on an analysis comparing waste in place (WIP) totals from GHGRP and
# non-emissions reporting landfills.

#initialize array
map_msw_emis = np.zeros([len(Lat_01),len(Lon_01),num_years])
map_msw_emis_nongrid = np.zeros([num_years])
ghgrp_total_emis = np.zeros(num_years)
nonrepoting_total_emis = np.zeros(num_years)

for iyear in np.arange(0, num_years):
    ghgrp_temp = ghgrp_msw[ghgrp_msw['Year'] == year_range[iyear]]
    ghgrp_temp.reset_index(inplace=True, drop=True)
    #display(ghgrp_temp)
    for ifacility in np.arange(0, len(ghgrp_temp)):
        if ghgrp_temp['LONGITUDE'][ifacility] > Lon_left and \
            ghgrp_temp['LONGITUDE'][ifacility] < Lon_right and \
            ghgrp_temp['LATITUDE'][ifacility] > Lat_low and \
            ghgrp_temp['LATITUDE'][ifacility] < Lat_up:

            ilat = int((ghgrp_temp['LATITUDE'][ifacility] - Lat_low)/Res01)
            ilon = int((ghgrp_temp['LONGITUDE'][ifacility] - Lon_left)/Res01)
            map_msw_emis[ilat,ilon,iyear] += ghgrp_temp['emis_kt_tot'][ifacility]
        else:
            map_msw_emis_nongrid[iyear] += ghgrp_temp['emis_kt_tot'][ifacility]

for iyear in np.arange(0, num_years):
    if year_range[iyear] <= 2016:
        factor = 0.09
    else:
        factor = 0.11
    ghgrp_total_emis[iyear] = np.sum(map_msw_emis[:,:,iyear])+map_msw_emis_nongrid[iyear]
    nonrepoting_total_emis[iyear] = ghgrp_total_emis[iyear]*factor
    print('Year:',year_range[iyear])
    print('GHGRP Emissions (kt):',ghgrp_total_emis[iyear])
    print('Non-Reporting Emissions (kt):',nonrepoting_total_emis[iyear])
    print('')
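
The facility loop above places each point source into a 0.1° cell and keeps a separate tally for facilities outside the domain. A vectorized sketch of the same idea is shown here, assuming the CONUS bounds from Step 0. The function name `grid_point_emissions` and the example coordinates are hypothetical.

```python
import numpy as np

LAT_LOW, LAT_UP, LON_LEFT, LON_RIGHT, RES = 20, 55, -130, -60, 0.1

def grid_point_emissions(lat, lon, emis):
    """Accumulate point emissions on the CONUS grid; return (grid, off-grid total)."""
    lat, lon, emis = (np.asarray(a, dtype=float) for a in (lat, lon, emis))
    n_lat = int(round((LAT_UP - LAT_LOW) / RES))
    n_lon = int(round((LON_RIGHT - LON_LEFT) / RES))
    grid = np.zeros((n_lat, n_lon))
    inside = (lat > LAT_LOW) & (lat < LAT_UP) & (lon > LON_LEFT) & (lon < LON_RIGHT)
    ilat = ((lat[inside] - LAT_LOW) / RES).astype(int)
    ilon = ((lon[inside] - LON_LEFT) / RES).astype(int)
    # np.add.at accumulates repeated indices, so co-located facilities are summed
    np.add.at(grid, (ilat, ilon), emis[inside])
    return grid, emis[~inside].sum()

# Two hypothetical facilities in the same cell and one outside the domain
grid, off_grid = grid_point_emissions([40.05, 40.05, 64.0],
                                      [-100.05, -100.05, -150.0],
                                      [1.0, 2.0, 5.0])
```

`np.add.at` is used because a plain `grid[ilat, ilon] += values` keeps only one contribution when the same cell index appears more than once.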


##### Step 2.2.2 Read In EPA Non-Reporting Landfills Data

In [None]:
# Read in Non-Reporting MSW Landfill Information and Estimate Emissions (based on waste in place)

EPA_nr_msw = pd.read_excel(EPA_nonreporting_msw_inputfile, sheet_name = "LandfillComp", usecols = "A:B,D,F,AP,BL:BM,BR:BS", skiprows = 5)#, nrows = 140)#,names = colnames)
EPA_nr_msw.rename(columns={'Landfill Name (as listed in Original Instance source)':'Name'},inplace=True)
EPA_nr_msw.rename(columns={'HH Off-Ramped, Last Year Reported':'Last GHGRP Year'},inplace=True)
EPA_nr_msw.rename(columns={'Avg. Est. Total WIP (MT)':'WIP_MT'},inplace=True)
EPA_nr_msw.rename(columns={'LMOP Lat':'LAT','LMOP Long':'LON'},inplace=True)

EPA_nr_msw['WBJ City'] = EPA_nr_msw['WBJ City'].astype('string')
EPA_nr_msw['WBJ Location'] = EPA_nr_msw['WBJ Location'].astype(str)
EPA_nr_msw['Full_Address'] = EPA_nr_msw["WBJ Location"]+' '+EPA_nr_msw["WBJ City"]+' '+EPA_nr_msw["State"]
EPA_nr_msw['Partial_Address'] = EPA_nr_msw["WBJ City"]+' '+EPA_nr_msw["State"]
EPA_nr_msw.fillna('NaN', inplace=True)
EPA_nr_msw = EPA_nr_msw[EPA_nr_msw['WIP_MT']>0]
EPA_nr_msw.reset_index(inplace=True, drop=True)
#display(EPA_nr_msw)
print('Total Non-Reporting Landfills:',len(EPA_nr_msw))

#Separate Landfills with and without location information
#These are the landfills from the Waste Business Journal (WBJ) that have limited location information
#.copy() makes each subset an independent frame, so later column assignments do not act on a view
EPA_nr_msw_noloc = EPA_nr_msw[(EPA_nr_msw['LAT'] ==0) & (EPA_nr_msw['LON'] ==0)].copy()
EPA_nr_msw_noloc.reset_index(inplace=True, drop=True)

EPA_nr_msw_loc = EPA_nr_msw[(EPA_nr_msw['LAT'] !=0) & (EPA_nr_msw['LON'] !=0)].copy()
EPA_nr_msw_loc.reset_index(inplace=True, drop=True)

##### Step 2.2.3 Read in FRS Landfills dataset

In [None]:
#Read in FRS data to get location information for landfills with missing location information 
##This comes from the EPA Facility Registry Service (FRS); the original file (NATIONAL_SINGLE.csv) is > 1.5 GB
   
FRS_facility_locs = pd.read_csv(FRS_inputfile, usecols = [2,3,5,7,8,10,17,20,21,26,27,28,31,32,34,35,36],low_memory=False)
FRS_facility_locs.fillna(0, inplace = True)
FRS_facility_locs = FRS_facility_locs[FRS_facility_locs['LATITUDE83'] > 0]
FRS_facility_locs = FRS_facility_locs[FRS_facility_locs['NAICS_CODES'] != 0]
FRS_facility_locs.reset_index(inplace=True, drop=True)


FRS_facility_locs = FRS_facility_locs[(FRS_facility_locs['NAICS_CODES']=='562212')]
print('Total landfills: ',len(FRS_facility_locs))
FRS_facility_locs.reset_index(inplace=True, drop=True)

FRS_facility_locs['CITY_NAME'] = FRS_facility_locs['CITY_NAME'].replace(0,'NaN')
FRS_facility_locs.reset_index(inplace=True, drop=True)

print('Landfills selected: ', len(FRS_facility_locs))
FRS_facility_locs.head(5)
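
Because the FRS file is large, one option is to read it in chunks and keep only landfill rows as they stream past. The sketch below assumes the column names used in the cell above and uses a small in-memory CSV in place of NATIONAL_SINGLE.csv. The function name `read_landfills`, the chunk size, and the demo rows are hypothetical, and missing coordinates are dropped rather than filled with 0 as the notebook does.

```python
import io
import pandas as pd

def read_landfills(source, naics="562212", chunksize=100_000):
    """Stream a large facility CSV and keep rows with the given NAICS code."""
    columns = ["PRIMARY_NAME", "STATE_CODE", "NAICS_CODES", "LATITUDE83", "LONGITUDE83"]
    kept = []
    for chunk in pd.read_csv(source, usecols=columns, dtype=str, chunksize=chunksize):
        chunk = chunk[chunk["NAICS_CODES"] == naics]
        kept.append(chunk.dropna(subset=["LATITUDE83", "LONGITUDE83"]))
    landfills = pd.concat(kept, ignore_index=True)
    for col in ("LATITUDE83", "LONGITUDE83"):
        landfills[col] = landfills[col].astype(float)
    return landfills

# Hypothetical three-row stand-in for the national file
demo = io.StringIO(
    "PRIMARY_NAME,STATE_CODE,NAICS_CODES,LATITUDE83,LONGITUDE83\n"
    "ALPHA LANDFILL,MI,562212,42.5,-84.8\n"
    "BETA FOODS,MI,311999,42.1,-84.1\n"
    "GAMMA LANDFILL,OH,562212,,\n"
)
landfills = read_landfills(demo, chunksize=2)
```

Reading every column as a string keeps the NAICS comparison exact, since the notebook also compares against the string `'562212'`.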

##### Step 2.2.4 Find Locations by matching EPA Non-Reporting Landfills (without locations) to FRS

In [None]:
# Loop through the Non-Reporting MSW Landfill records that don't have locations, to 
# try to find matches in the FRS dataset
# Note that there are more landfills in the FRS dataset than in the GHGRP + Non-Reporting dataset
# In the previous GEPA, all FRS landfills were used. In the GEPA v2, we only use those
# landfills identified and used to estimate national emissions in the GHGI (i.e., 
# GHGRP + Non-Reporting landfills)

EPA_nr_msw_noloc.loc[:,'found'] = 0

for ifacility in np.arange(0,len(EPA_nr_msw_noloc)):
    #first try matching by state and exact name of landfill
    state_temp = EPA_nr_msw_noloc.loc[ifacility,'State']
    imatch = np.where((FRS_facility_locs['PRIMARY_NAME'].str.contains(EPA_nr_msw_noloc.loc[ifacility,'Name'].upper()))\
                      & (FRS_facility_locs['STATE_CODE']==state_temp))[0]
    if len(imatch) == 1:
        EPA_nr_msw_noloc.loc[ifacility,'LAT'] = FRS_facility_locs.loc[imatch[0],'LATITUDE83']
        EPA_nr_msw_noloc.loc[ifacility,'LON'] = FRS_facility_locs.loc[imatch[0],'LONGITUDE83']
        EPA_nr_msw_noloc.loc[ifacility,'found'] = 1
    elif len(imatch) > 1:
        #if name and state match more than one entry, use the one with the higher accuracy
        # or the first entry if the accuracy values are the same
        FRS_temp = FRS_facility_locs.loc[imatch,:].copy()
        # argmax returns the first row holding the maximum ACCURACY_VALUE, so ties
        # resolve to the first entry
        ibest = imatch[int(np.argmax(FRS_temp['ACCURACY_VALUE'].to_numpy()))]
        EPA_nr_msw_noloc.loc[ifacility,'LAT'] = FRS_facility_locs.loc[ibest,'LATITUDE83']
        EPA_nr_msw_noloc.loc[ifacility,'LON'] = FRS_facility_locs.loc[ibest,'LONGITUDE83']
        EPA_nr_msw_noloc.loc[ifacility,'found'] = 1

    #next try matching based on any of the words in name and state and city
    if EPA_nr_msw_noloc.loc[ifacility,'found'] ==0:
        # strip punctuation from each token, drop generic words and empty tokens, then escape
        # the rest so the '|'-joined pattern cannot contain an empty alternative
        string_temp = [re.sub(r'[()&/]', '', x) for x in EPA_nr_msw_noloc.loc[ifacility,'Name'].upper().split()]
        string_temp = [x for x in string_temp if x and x not in {'LANDFILL', 'SANITARY','CITY','TOWN','OF'}]
        string_temp = '|'.join(re.escape(x) for x in string_temp)
        #print(string_temp)
        city_temp = EPA_nr_msw_noloc.loc[ifacility,'WBJ City'].upper()
        imatch = np.where((FRS_facility_locs['PRIMARY_NAME'].str.contains(string_temp))\
                      & (FRS_facility_locs['STATE_CODE']==state_temp) & (FRS_facility_locs['CITY_NAME']==city_temp))[0]
        #print(imatch)
        if len(imatch) == 1:
            EPA_nr_msw_noloc.loc[ifacility,'LAT'] = FRS_facility_locs.loc[imatch[0],'LATITUDE83']
            EPA_nr_msw_noloc.loc[ifacility,'LON'] = FRS_facility_locs.loc[imatch[0],'LONGITUDE83']
            EPA_nr_msw_noloc.loc[ifacility,'found'] = 1
        elif len(imatch) >1:
            #if name and state match more than one entry, use the one with the higher accuracy
            # or the first entry if the accuracy values are the same
            FRS_temp = FRS_facility_locs.loc[imatch,:].copy()
            # argmax returns the first row holding the maximum ACCURACY_VALUE, so ties
            # resolve to the first entry
            ibest = imatch[int(np.argmax(FRS_temp['ACCURACY_VALUE'].to_numpy()))]
            EPA_nr_msw_noloc.loc[ifacility,'LAT'] = FRS_facility_locs.loc[ibest,'LATITUDE83']
            EPA_nr_msw_noloc.loc[ifacility,'LON'] = FRS_facility_locs.loc[ibest,'LONGITUDE83']
            EPA_nr_msw_noloc.loc[ifacility,'found'] = 1
    
    #next try matching based on state and city
    if EPA_nr_msw_noloc.loc[ifacility,'found'] ==0:
        string_temp = [x for x in EPA_nr_msw_noloc.loc[ifacility,'WBJ Location'].upper().split() \
                       if x not in {'ROAD', 'RD','HWY','HIGHWAY'}]
        #string_temp = EPA_nr_msw_noloc.loc[ifacility,'WBJ Location'].upper().split()
        string_temp = '|'.join(string_temp)
        string_temp = string_temp.replace("(","")
        string_temp = string_temp.replace(")","")
        city_temp = EPA_nr_msw_noloc.loc[ifacility,'WBJ City'].upper()
        imatch = np.where((FRS_facility_locs['STATE_CODE']==state_temp) & (FRS_facility_locs['CITY_NAME']==city_temp))[0]
        if len(imatch) == 1:
            EPA_nr_msw_noloc.loc[ifacility,'LAT'] = FRS_facility_locs.loc[imatch[0],'LATITUDE83']
            EPA_nr_msw_noloc.loc[ifacility,'LON'] = FRS_facility_locs.loc[imatch[0],'LONGITUDE83']
            EPA_nr_msw_noloc.loc[ifacility,'found'] = 1
        elif len(imatch) >1:
            #if city and state match more than one entry, use the one that has some matching address
            FRS_temp = FRS_facility_locs.loc[imatch,:].copy()
            # na=False keeps rows without a text address from being counted as matches
            new_match = np.where(FRS_temp['LOCATION_ADDRESS'].str.contains(string_temp, na=False))[0]
            if len(new_match) >= 1:
                EPA_nr_msw_noloc.loc[ifacility,'LAT'] = FRS_facility_locs.loc[imatch[new_match[0]],'LATITUDE83']
                EPA_nr_msw_noloc.loc[ifacility,'LON'] = FRS_facility_locs.loc[imatch[new_match[0]],'LONGITUDE83']
                EPA_nr_msw_noloc.loc[ifacility,'found'] = 1
            
    if EPA_nr_msw_noloc.loc[ifacility,'found'] ==0:
        #check if state matches and city and name have any matches
        city_temp = EPA_nr_msw_noloc.loc[ifacility,'WBJ City'].upper()
        imatch = np.where((FRS_facility_locs['PRIMARY_NAME'].str.contains(city_temp))\
                      & (FRS_facility_locs['STATE_CODE']==state_temp))[0]
        if len(imatch) == 1:
            EPA_nr_msw_noloc.loc[ifacility,'LAT'] = FRS_facility_locs.loc[imatch[0],'LATITUDE83']
            EPA_nr_msw_noloc.loc[ifacility,'LON'] = FRS_facility_locs.loc[imatch[0],'LONGITUDE83']
            EPA_nr_msw_noloc.loc[ifacility,'found'] = 1
        elif len(imatch)>1:
            #no good matches in this case; fall through to the name-based check below
            pass
    if EPA_nr_msw_noloc.loc[ifacility,'found'] ==0:
        #check based on state and any matches between names
        # strip punctuation from each token, drop generic words and empty tokens, then escape
        # the rest so the '|'-joined pattern cannot contain an empty alternative
        string_temp = [re.sub(r'[()&/]', '', x) for x in EPA_nr_msw_noloc.loc[ifacility,'Name'].upper().split()]
        string_temp = [x for x in string_temp \
                       if x and x not in {'LANDFILL', 'SANITARY','COUNTY','CITY','TOWN','OF','LF','WASTE'}]
        string_temp = '|'.join(re.escape(x) for x in string_temp)
        #print(string_temp, state_temp)
        city_temp = EPA_nr_msw_noloc.loc[ifacility,'WBJ City'].upper()
        imatch = np.where((FRS_facility_locs['PRIMARY_NAME'].str.contains(string_temp))\
                      & (FRS_facility_locs['STATE_CODE']==state_temp))[0]
        #print(imatch)
        if len(imatch) == 1:
            EPA_nr_msw_noloc.loc[ifacility,'LAT'] = FRS_facility_locs.loc[imatch[0],'LATITUDE83']
            EPA_nr_msw_noloc.loc[ifacility,'LON'] = FRS_facility_locs.loc[imatch[0],'LONGITUDE83']
            EPA_nr_msw_noloc.loc[ifacility,'found'] = 1
        elif len(imatch) >1:
            #no good matches
            continue
print('Landfills still without a location after FRS matching:',len(EPA_nr_msw_noloc[EPA_nr_msw_noloc['found']==0]))
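
The name-based matching tiers above all build a regular expression from the words in a landfill name. A compact sketch of that step is given below, assuming generic words should be ignored and punctuation removed before matching. The names `STOP_WORDS`, `name_pattern`, and `match_by_name`, as well as the example facilities, are hypothetical.

```python
import re
import pandas as pd

STOP_WORDS = {"LANDFILL", "SANITARY", "COUNTY", "CITY", "TOWN", "OF", "LF", "WASTE"}

def name_pattern(name):
    """Build an 'any of these words' pattern, or None if no useful word remains."""
    tokens = [re.sub(r"[()&/]", "", t) for t in name.upper().split()]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]
    return "|".join(re.escape(t) for t in tokens) if tokens else None

def match_by_name(frs, name, state):
    """Return FRS rows in the given state whose name shares a word with `name`."""
    pattern = name_pattern(name)
    if pattern is None:
        return frs.iloc[0:0]
    hit = frs["PRIMARY_NAME"].str.contains(pattern, regex=True, na=False)
    return frs[hit & (frs["STATE_CODE"] == state)]

# Hypothetical FRS excerpt
frs = pd.DataFrame({
    "PRIMARY_NAME": ["SMITH & SONS LANDFILL", "JONES SANITARY LANDFILL", "SMITH CREEK LF"],
    "STATE_CODE": ["MI", "MI", "OH"],
})
matches = match_by_name(frs, "Smith (North) Landfill", "MI")
```

Dropping empty tokens matters: a pattern such as `A||B` contains an empty alternative, which matches every string.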



##### Step 2.2.5 Find Locations by geocoding remaining EPA Non-Reporting Landfills (without locations)

In [None]:
## TRY GEOCODING
# Try Geocoding to convert facility addresses into lat/lon values. 
# This uses the free OpenStreetMap Nominatim API (not as good as Google Maps, but free)
# Locations are only needed for facilities where found = 0
# If this does not work with 'Run All', try running this cell on its own

#locator = Nominatim(user_agent="myGeocoder")
#print(locator)
# Set the default timeout before creating the geocoder; geopy reads the default at construction
geopy.geocoders.options.default_timeout = None
geolocator = Nominatim(user_agent="myGeocode")
print(geolocator.timeout)

#Examples: #print((location.latitude, location.longitude))
#location = locator.geocode("Champ de Mars, Paris, France")
#location = geolocator.geocode("1726 N Cochran Ave Charlotte MI 48813")

#food_beverage_facilities_locs['geo_match'] = 0
for ifacility in np.arange(0,len(EPA_nr_msw_noloc)):
    if EPA_nr_msw_noloc.loc[ifacility,'found'] ==0:
        location = geolocator.geocode(EPA_nr_msw_noloc['Full_Address'][ifacility])
        if location is None:
            continue
        else:
            EPA_nr_msw_noloc.loc[ifacility,'LAT'] = location.latitude
            EPA_nr_msw_noloc.loc[ifacility,'LON'] = location.longitude
            EPA_nr_msw_noloc.loc[ifacility,'found']=1
            
print('First Try - Percentage found:',(np.sum(EPA_nr_msw_noloc['found']))/len(EPA_nr_msw_noloc))
print('Missing Count',len(EPA_nr_msw_noloc[EPA_nr_msw_noloc['found']==0]))

for ifacility in np.arange(0,len(EPA_nr_msw_noloc)):
    if EPA_nr_msw_noloc.loc[ifacility,'found'] ==0:
        #if still no match, remove the address portion and just allocate based on city, state
        location = geolocator.geocode(EPA_nr_msw_noloc['Partial_Address'][ifacility])
        if location is None:
            continue
        else:
            EPA_nr_msw_noloc.loc[ifacility,'LAT'] = location.latitude
            EPA_nr_msw_noloc.loc[ifacility,'LON'] = location.longitude
            EPA_nr_msw_noloc.loc[ifacility,'found']=1
            
print('Second Try - Percentage found:',(np.sum(EPA_nr_msw_noloc['found']))/len(EPA_nr_msw_noloc))
print('Missing Count',len(EPA_nr_msw_noloc[EPA_nr_msw_noloc['found']==0]))

#recombine into a single dataframe
EPA_nr_msw_final = pd.concat([EPA_nr_msw_loc, EPA_nr_msw_noloc])  #DataFrame.append was removed in pandas 2.0
EPA_nr_msw_final.reset_index(inplace=True, drop=True)
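
The two geocoding passes above amount to trying the full address first and falling back to city and state. That logic can be sketched independently of any particular geocoding service, as below. The function `geocode_with_fallback`, the stub `fake_geocode`, and the coordinates are hypothetical and exist only so the example runs without network access.

```python
from types import SimpleNamespace

def geocode_with_fallback(queries, geocode):
    """Try each query in order; return (lat, lon) from the first hit, else None."""
    for query in queries:
        location = geocode(query)
        if location is not None:
            return location.latitude, location.longitude
    return None

def fake_geocode(query):
    """Hypothetical stand-in for a real geocoder call (no network access)."""
    known = {"Charlotte MI": SimpleNamespace(latitude=42.56, longitude=-84.83)}
    return known.get(query)

# The full address is unknown to the stub, so the city/state query is used
result = geocode_with_fallback(["1 Unknown Rd Charlotte MI", "Charlotte MI"], fake_geocode)
```

With geopy, the `geocode` argument could be `geolocator.geocode` wrapped in `geopy.extra.rate_limiter.RateLimiter` (for example with `min_delay_seconds=1`), which spaces out requests to the public Nominatim service.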

##### Step 2.2.6 Calculate Emissions from Non-Reporting Landfills (those w/locations only)

In [None]:
# In this step, emissions for all non-reporting landfills are recalculated, removing
# the landfills with missing locations from the analysis. 

print('QA/QC: report final landfill values to be placed on CONUS grid')
for iyear in np.arange(0, num_years):
    ghgrp_temp = ghgrp_msw[ghgrp_msw['Year'] == year_range[iyear]]
    ghgrp_temp.reset_index(inplace=True, drop=True)
    ghgrp_count = len(ghgrp_temp)

    EPA_nr_msw_final['emis_'+year_range_str[iyear]] = 0
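    # keep landfills that never reported (0) or whose last GHGRP reporting year precedes this year,
    # so facilities still reporting to the GHGRP in this year are not counted twice
    imatch =np.where(((EPA_nr_msw_final['Last GHGRP Year'] == 0) | \
                                   (EPA_nr_msw_final['Last GHGRP Year'] < year_range[iyear])) & \
                                   (EPA_nr_msw_final['LAT'] != 0))[0]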

    WIP_sum = np.sum(EPA_nr_msw_final.loc[imatch,'WIP_MT'])
    EPA_nr_msw_final.loc[imatch,'emis_'+year_range_str[iyear]] = nonrepoting_total_emis[iyear]*\
                          EPA_nr_msw_final.loc[imatch,'WIP_MT']/WIP_sum   

    print('Year:', year_range[iyear])
    print('Total Landfills (counts):                          ',ghgrp_count+len(EPA_nr_msw_final.iloc[imatch,0]))
    print('Total Landfill Emissions (kt):                     ',ghgrp_total_emis[iyear]+nonrepoting_total_emis[iyear])
    print('Total GHGRP Landfills (counts):                    ',ghgrp_count)
    print('Total GHGRP Landfill Emissions (kt):               ',ghgrp_total_emis[iyear] )
    print('Total Non-Reporting Landfills w/location (counts): ',len(EPA_nr_msw_final.iloc[imatch,0]))
    print('Total Non-Reporting Landfills w/location Emis (kt):',np.sum(EPA_nr_msw_final.loc[imatch,'emis_'+year_range_str[iyear]]))
    print('')
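
The allocation in this step distributes the national non-reporting total across landfills in proportion to waste in place (WIP). A small sketch follows, assuming a landfill counts as non-reporting in a given year if it never reported to the GHGRP (value 0) or if its last GHGRP reporting year is earlier than that year. The function name `allocate_by_wip` and the example values are hypothetical.

```python
import numpy as np
import pandas as pd

def allocate_by_wip(landfills, total_emis_kt, year):
    """Split total_emis_kt across eligible landfills in proportion to WIP_MT."""
    off_ramped = (landfills["Last GHGRP Year"] == 0) | (landfills["Last GHGRP Year"] < year)
    located = landfills["LAT"] != 0
    use = off_ramped & located
    emis = pd.Series(0.0, index=landfills.index)
    wip_sum = landfills.loc[use, "WIP_MT"].sum()
    if wip_sum > 0:
        emis.loc[use] = total_emis_kt * landfills.loc[use, "WIP_MT"] / wip_sum
    return emis

# Hypothetical landfills: the third still reports in 2015, the fourth has no location
landfills = pd.DataFrame({
    "WIP_MT": [100, 300, 600, 50],
    "Last GHGRP Year": [0, 2013, 2017, 0],
    "LAT": [40.0, 41.0, 42.0, 0.0],
})
emis_2015 = allocate_by_wip(landfills, total_emis_kt=10.0, year=2015)
```

Normalizing by the WIP sum of the eligible landfills only means the full non-reporting total is preserved even when some landfills are excluded.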


##### Step 2.2.7 Add Non-Reporting MSW Landfill Emissions to CONUS grid

In [None]:
#Place data on CONUS grid

for iyear in np.arange(0, num_years):
    for ifacility in np.arange(0, len(EPA_nr_msw_final)):
        if EPA_nr_msw_final['LON'][ifacility] > Lon_left and \
            EPA_nr_msw_final['LON'][ifacility] < Lon_right and \
            EPA_nr_msw_final['LAT'][ifacility] > Lat_low and \
            EPA_nr_msw_final['LAT'][ifacility] < Lat_up:

            ilat = int((EPA_nr_msw_final['LAT'][ifacility] - Lat_low)/Res01)
            ilon = int((EPA_nr_msw_final['LON'][ifacility] - Lon_left)/Res01)
            map_msw_emis[ilat,ilon,iyear] += EPA_nr_msw_final['emis_'+year_range_str[iyear]][ifacility]
        else:
            map_msw_emis_nongrid[iyear] += EPA_nr_msw_final['emis_'+year_range_str[iyear]][ifacility]
    print(year_range[iyear],'Emissions (kt):',np.sum(map_msw_emis[:,:,iyear])+map_msw_emis_nongrid[iyear])
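
The facility loop above can also be expressed without an explicit Python loop. The sketch below bins hypothetical points onto a small grid with `np.add.at`, which accumulates correctly when several facilities fall in the same cell. The grid bounds, resolution, and facility values are illustrative assumptions, not the notebook's CONUS definitions.

```python
import numpy as np

# Hypothetical grid definition (not the notebook's CONUS bounds)
lon_left, lon_right = -100.0, -99.0
lat_low, lat_up = 30.0, 31.0
res = 0.1
nlat = int(round((lat_up - lat_low) / res))
nlon = int(round((lon_right - lon_left) / res))

# Hypothetical facilities: two share a grid cell, one lies off the grid
lon = np.array([-99.95, -99.93, -99.25, -120.0])
lat = np.array([30.05, 30.02, 30.75, 45.0])
emis = np.array([1.0, 2.0, 4.0, 8.0])

# Same bounds test as the loop version
on_grid = (lon > lon_left) & (lon < lon_right) & (lat > lat_low) & (lat < lat_up)
ilat = ((lat[on_grid] - lat_low) / res).astype(int)
ilon = ((lon[on_grid] - lon_left) / res).astype(int)

grid = np.zeros((nlat, nlon))
np.add.at(grid, (ilat, ilon), emis[on_grid])   # unbuffered add handles repeated indices
off_grid_total = emis[~on_grid].sum()
```

A plain `grid[ilat, ilon] += ...` would silently keep only one contribution per repeated index, which is why `np.add.at` is used here.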
    

#### Step 2.3 Industrial Landfill Proxy Data

##### Step 2.3.1 Read In State-Level Industrial Landfill Proxy Emissions Data - for both subgroups

In [None]:
#Step 1 - Read in the state-level emissions for industrial landfills from pulp & paper and food & beverage
# Place emissions onto state array, with an extra dimension to account for these two ind. subgroups categories

names = pd.read_excel(EPA_IndState_inputfile, sheet_name = "P&P State Emissions", usecols = "B:AF", skiprows = 5, header = 0, nrows = 1)
colnames = names.columns.values
EPA_state_ind_pp = pd.read_excel(EPA_IndState_inputfile, sheet_name = "P&P State Emissions", usecols = "B:AF", skiprows = 5, names = colnames)
EPA_state_ind_pp = EPA_state_ind_pp.drop(columns = [*range(1990, start_year,1)])
EPA_state_ind_pp.reset_index(inplace=True, drop=True)

names = pd.read_excel(EPA_IndState_inputfile, sheet_name = "F&B State Emissions", usecols = "B:AF", skiprows = 5, header = 0, nrows = 1)
colnames = names.columns.values
EPA_state_ind_fb = pd.read_excel(EPA_IndState_inputfile, sheet_name = "F&B State Emissions", usecols = "B:AF", skiprows = 5, names = colnames)
EPA_state_ind_fb = EPA_state_ind_fb.drop(columns = [*range(1990, start_year,1)])
EPA_state_ind_fb.reset_index(inplace=True, drop=True)

#place in state x year array
state_ind_pp = np.zeros([len(State_ANSI),num_years])
state_ind_fb = np.zeros([len(State_ANSI),num_years])

for istate in np.arange(0,len(State_ANSI)):
    imatch = np.where(State_ANSI['abbr'][istate] == EPA_state_ind_pp['State'])[0]
    imatch2 = np.where(State_ANSI['abbr'][istate] == EPA_state_ind_fb['State'])[0]
    if len(imatch) > 0 and len(imatch2) > 0: 
        for iyear in np.arange(0, num_years):
            #use the first matching row so that a scalar (not a one-element Series) is assigned
            state_ind_pp[istate,iyear] = EPA_state_ind_pp.loc[imatch[0],year_range[iyear]]*1e3 #mmt to kt
            state_ind_fb[istate,iyear] = EPA_state_ind_fb.loc[imatch2[0],year_range[iyear]]*1e3 #mmt to kt

state_ind_emis = np.zeros([2,len(State_ANSI),num_years])
state_ind_emis[0,:,:] = state_ind_pp
state_ind_emis[1,:,:] = state_ind_fb
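
The state-by-year arrays above can also be built by reindexing the state table against the master state list. The sketch below uses a hypothetical three-state list and made-up MMT values; states absent from the table are filled with zero, matching the zero-initialized arrays in the cell above.

```python
import pandas as pd

# Hypothetical master list of states and a state emissions table (MMT)
state_abbr = ['AL', 'AK', 'AZ']
years = [2012, 2013]
table = pd.DataFrame({
    'State': ['AZ', 'AL'],
    2012: [0.125, 0.5],
    2013: [0.75, 0.25],
})

# Align rows to the master state order, fill missing states with 0, convert MMT to kt
state_emis_kt = (table.set_index('State')
                      .reindex(state_abbr)[years]
                      .fillna(0.0) * 1e3).to_numpy()
```

The resulting array has shape (number of states, number of years), with rows in the same order as the master list.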

##### Step 2.3.2. Create Pulp & Paper Manufacturing - grid proxy

##### Step 2.3.2.1 Read in pulp and paper industrial landfill GHGRP emissions

In [None]:
#Read in GHGRP Subpart TT Emissions, extract Pulp & Paper NAICS codes only and place onto CONUS grid

#a) Read in the GHGRP facility data
facility_info = pd.read_csv(ghgrp_facility_tt_inputfile)
facility_emis = pd.read_csv(ghgrp_emi_tt_inputfile)

#filter emissions data for methane only (in metric tonnes CH4) and for years of interest
facility_emis = facility_emis[facility_emis['TT_SUBPART_GHG_INFO.GHG_NAME'] == 'METHANE']
facility_emis = facility_emis[facility_emis['TT_SUBPART_GHG_INFO.REPORTING_YEAR'].isin(year_range)]
facility_info = facility_info[facility_info['V_GHG_EMITTER_FACILITIES.YEAR'].isin(year_range)]
facility_info.reset_index(inplace=True, drop=True)
facility_emis.reset_index(inplace=True, drop=True)

#rename common columns and merge into one dataframe
facility_info.rename(columns={'V_GHG_EMITTER_FACILITIES.YEAR':'Year', \
                             'V_GHG_EMITTER_FACILITIES.FACILITY_ID':'Facility_ID', \
                             'V_GHG_EMITTER_FACILITIES.LONGITUDE':'LONGITUDE',
                             'V_GHG_EMITTER_FACILITIES.LATITUDE':'LATITUDE',
                             'V_GHG_EMITTER_FACILITIES.PRIMARY_NAICS_CODE':'NAICS_CODE',
                             'V_GHG_EMITTER_FACILITIES.COUNTY':'COUNTY',
                             'V_GHG_EMITTER_FACILITIES.CITY':'CITY',
                             'V_GHG_EMITTER_FACILITIES.STATE':'STATE'},inplace=True)
facility_emis.rename(columns={'TT_SUBPART_GHG_INFO.REPORTING_YEAR':'Year', \
                              'TT_SUBPART_GHG_INFO.FACILITY_ID':'Facility_ID'},inplace=True)
ghgrp_ind = pd.merge(facility_info, facility_emis)
ghgrp_ind['emis_kt_tot'] = ghgrp_ind['TT_SUBPART_GHG_INFO.GHG_QUANTITY']/1e3 #convert metric tonnes to kt
ghgrp_ind['COUNTY'] = ghgrp_ind['COUNTY'].str.upper()
ghgrp_ind['CITY'] = ghgrp_ind['CITY'].str.upper()
ghgrp_ind['COUNTY'] = ghgrp_ind['COUNTY'].str.replace("COUNTY","")

ghgrp_ind['NAICS_CODE'] = ghgrp_ind['NAICS_CODE'].astype(str)
ghgrp_ind = ghgrp_ind[(ghgrp_ind['NAICS_CODE'].str.startswith('321')) | (ghgrp_ind['NAICS_CODE'].str.startswith('322'))]
ghgrp_ind.reset_index(inplace=True, drop=True)


##### Step 2.3.2.2 Read in pulp and paper mill counts and find emissions/locations by matching to GHGRP facilities

In [None]:
# Read in Mills Online Database
# This is the activity database used to apportion national emissions in the GHGI down to the state level
# Emissions are assumed to be proportional to the number of mills in each state
# Mills are matched to the GHGRP dataset based on the county and state of each mill
# If a GHGRP facility is not in the Mills dataset, the GHGRP facility is added to the full list of facilities

names = pd.read_excel(Mills_OnLine_inputdata, skiprows = 3, header = 0, nrows = 1)
colnames = names.columns.values
mills_locs = pd.read_excel(Mills_OnLine_inputdata, skiprows = 3, names = colnames)
mills_locs = mills_locs[mills_locs['Pulp and Paper Mill'] == 'Yes']
mills_locs.loc[:,'State_abbr'] = ''
mills_locs['County'] = mills_locs['County'].astype(str)
mills_locs.reset_index(inplace=True, drop=True)

mills_locs.loc[:,'ghgrp_match'] = 0
mills_locs.loc[:,'Lat'] = 0
mills_locs.loc[:,'Lon'] = 0

for imill in np.arange(0, len(mills_locs)):
    mills_locs.loc[imill,'State_abbr'] = State_ANSI['abbr'][np.where(State_ANSI['name'] == mills_locs.loc[imill,'State'])[0][0]]
num_mills = len(mills_locs)
for iyear in np.arange(0, num_years):
    mills_locs.loc[:,'emi_kt_'+year_range_str[iyear]] = 0

ghgrp_ind['found']=0

for iyear in np.arange(0, num_years):
    for imill in np.arange(0,num_mills):
        imatch = np.where((ghgrp_ind['Year'] == year_range[iyear]) & \
                          (ghgrp_ind['STATE'] == mills_locs.loc[imill,'State_abbr']) & \
                          (ghgrp_ind['COUNTY'].str.contains(mills_locs.loc[imill,'County'].upper())))[0]
        if len(imatch)==1:
            mills_locs.loc[imill,'ghgrp_match'] = 1
            mills_locs.loc[imill,'Lat'] = ghgrp_ind.loc[imatch[0],'LATITUDE']
            mills_locs.loc[imill,'Lon'] = ghgrp_ind.loc[imatch[0],'LONGITUDE']
            mills_locs.loc[imill,'emi_kt_'+year_range_str[iyear]] = ghgrp_ind.loc[imatch[0],'emis_kt_tot']
            ghgrp_ind.loc[imatch[0],'found'] = 1
        if len(imatch) > 1:
            new_match = np.where((ghgrp_ind['Year'] == year_range[iyear]) & \
                                 (ghgrp_ind['STATE'] == mills_locs.loc[imill,'State_abbr']) &\
                                 (ghgrp_ind['COUNTY'].str.contains(mills_locs.loc[imill,'County'].upper())) &\
                             (ghgrp_ind['CITY'].str.contains(mills_locs.loc[imill,'City'].upper())))[0]
            #print(imill, new_match)
            if len(new_match)>0:
                mills_locs.loc[imill,'ghgrp_match'] = 1
                mills_locs.loc[imill,'Lat'] = ghgrp_ind.loc[new_match[0],'LATITUDE']
                mills_locs.loc[imill,'Lon'] = ghgrp_ind.loc[new_match[0],'LONGITUDE']
                mills_locs.loc[imill,'emi_kt_'+year_range_str[iyear]] = ghgrp_ind.loc[new_match[0],'emis_kt_tot']
                ghgrp_ind.loc[new_match[0],'found'] = 1
            
        else:
            continue
        #need to find location and allocate emissions
    print('Found (%) Year',year_range[iyear],':',100*np.sum(mills_locs['ghgrp_match']/len(mills_locs)))
    
    
#add additional GHGRP facilities (to capture all GHGRP emissions - deviates from GHGI methods)
# for each extra facility in the ghgrp dataset, append an extra row, then fill in the emissions values for each year
for ifacility in np.arange(0,len(ghgrp_ind)):
    if ghgrp_ind.loc[ifacility,'found'] ==0:
        #print(ifacility)
        facility_id = ghgrp_ind.loc[ifacility,'Facility_ID']
        df2 = {'State_abbr':ghgrp_ind.loc[ifacility,'STATE'], 'City':ghgrp_ind.loc[ifacility,'CITY'],\
               'County':ghgrp_ind.loc[ifacility,'COUNTY'],'ghgrp_match': 1, 'Lat': ghgrp_ind.loc[ifacility,'LATITUDE'], \
               'Lon':ghgrp_ind.loc[ifacility,'LONGITUDE']}
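        #DataFrame.append was removed in pandas 2.0; pd.concat adds the new row in the same way
        mills_locs = pd.concat([mills_locs, pd.DataFrame([df2])], ignore_index = True)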
        for iyear in np.arange(0, num_years):
            ighgrp = np.where((ghgrp_ind['Year'] == year_range[iyear]) & (ghgrp_ind['Facility_ID'] == facility_id))[0]
            #print(ighgrp)
            if len(ighgrp)==1:
                ghgrp_ind.loc[ighgrp[0],'found']=1
                mills_locs.loc[len(mills_locs)-1,'emi_kt_'+year_range_str[iyear]] = ghgrp_ind.loc[ighgrp[0],'emis_kt_tot']
            else:
                mills_locs.loc[len(mills_locs)-1,'emi_kt_'+year_range_str[iyear]] = 0
        #display(mills_locs)
       

##### Step 2.3.2.3 Try finding missing locations by matching to the FRS dataset

In [None]:
#2) find locations of remaining mills - by comparing to FRS. NAICS Codes starting with 321 and 322

#Read in FRS data to get location information for landfills with missing location information 
 
FRS_facility_locs = pd.read_csv(FRS_inputfile, usecols = [2,3,5,7,8,10,17,20,21,26,27,28,31,32,34,35,36],low_memory=False)
FRS_facility_locs.fillna(0, inplace = True)
FRS_facility_locs = FRS_facility_locs[FRS_facility_locs['LATITUDE83'] > 0]
FRS_facility_locs = FRS_facility_locs[FRS_facility_locs['NAICS_CODES'] != 0]
FRS_facility_locs.reset_index(inplace=True, drop=True)

FRS_facility_locs = FRS_facility_locs[(FRS_facility_locs['NAICS_CODES'].str.startswith('321')) | (FRS_facility_locs['NAICS_CODES'].str.startswith('322'))]

print('Total FRS Pulp & Paper Facilities: ',len(FRS_facility_locs))
FRS_facility_locs.reset_index(inplace=True, drop=True)

FRS_facility_locs['CITY_NAME'] = FRS_facility_locs['CITY_NAME'].replace(0,'NaN')
FRS_facility_locs.reset_index(inplace=True, drop=True)


#try to match mills database with FRS database, based on state and city - 
# only need to find locations for where 'ghgrp_match' = 0
mills_locs['FRS_match'] = 0
for ifacility in np.arange(0, len(mills_locs)):
    if mills_locs.loc[ifacility,'ghgrp_match']==0:
        imatch = np.where((mills_locs.loc[ifacility,'State_abbr'] == FRS_facility_locs['STATE_CODE']) &\
                         ((mills_locs.loc[ifacility,'City'].upper() == FRS_facility_locs['CITY_NAME'])))[0]
        if len(imatch)==1:
            mills_locs.loc[ifacility,'Lat'] = FRS_facility_locs.loc[imatch[0],'LATITUDE83']
            mills_locs.loc[ifacility,'Lon'] = FRS_facility_locs.loc[imatch[0],'LONGITUDE83']
            mills_locs.loc[ifacility,'FRS_match'] = 1
        elif len(imatch)>1:
            FRS_temp = FRS_facility_locs.loc[imatch,:]
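            #select the FRS record with the largest ACCURACY_VALUE among the candidate matches
            # (np.where(np.max(...)) only tested a scalar and could never return an index other than 0)
            new_match = np.where(FRS_temp['ACCURACY_VALUE'] == np.max(FRS_temp['ACCURACY_VALUE']))[0]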
            if len(new_match) >0:
                mills_locs.loc[ifacility,'Lat'] = FRS_facility_locs.loc[imatch[new_match[0]],'LATITUDE83']
                mills_locs.loc[ifacility,'Lon'] = FRS_facility_locs.loc[imatch[new_match[0]],'LONGITUDE83']
                mills_locs.loc[ifacility,'FRS_match'] = 1
        else:
            continue

print('Not Found:',len(mills_locs)-(np.sum(mills_locs.loc[:,'FRS_match'])+np.sum(mills_locs.loc[:,'ghgrp_match'])), 'of',len(mills_locs))
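
When several FRS records share a state and city, a single record has to be chosen. The sketch below selects the candidate with the largest `ACCURACY_VALUE` using `idxmax` on a hypothetical candidate table. Whether larger or smaller values indicate a better location should be confirmed against the FRS documentation, so the selection rule here is an assumption.

```python
import pandas as pd

# Hypothetical FRS candidates that all matched the same state and city
candidates = pd.DataFrame({
    'LATITUDE83': [33.10, 33.25, 33.40],
    'LONGITUDE83': [-86.90, -86.75, -86.60],
    'ACCURACY_VALUE': [30.0, 150.0, 50.0],
})

# idxmax returns the index label of the row holding the maximum value
best_label = candidates['ACCURACY_VALUE'].idxmax()
best_lat = candidates.loc[best_label, 'LATITUDE83']
best_lon = candidates.loc[best_label, 'LONGITUDE83']
```

Swapping `idxmax` for `idxmin` would select the smallest value instead, if that turns out to be the better criterion.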

##### Step 2.3.2.4 Assign/Calculate emissions for non-reporting facilities

In [None]:
# Find the difference in emissions between the GHGI and available GHGRP facilities (for each year)
# assign this emissions difference uniformly to all remaining non-ghgrp mills 

print('Check Sum Against National Emissions')
for iyear in np.arange(0, num_years):
    sum_emis = np.sum(mills_locs.loc[:,'emi_kt_'+year_range_str[iyear]])
    epa_emis = np.sum(state_ind_emis[0,:,iyear]) #sum epa pulp and paper data across all states
    num_facility_missing = np.sum(mills_locs.loc[:,'FRS_match'])#len(mills_locs)-np.sum(mills_locs.loc[:,'ghgrp_match'])+
    emis_diff = epa_emis - sum_emis
    missing_fac_emis = emis_diff/num_facility_missing
    #print(missing_fac_emis)
    for ifacility in np.arange(0, len(mills_locs)):
        if mills_locs.loc[ifacility,'FRS_match']==1 and mills_locs.loc[ifacility,'ghgrp_match']==0:
            mills_locs.loc[ifacility,'emi_kt_'+year_range_str[iyear]] = missing_fac_emis
    diff_emis = (np.sum(mills_locs['emi_kt_'+year_range_str[iyear]])- epa_emis)/((np.sum(mills_locs['emi_kt_'+year_range_str[iyear]])+epa_emis)/2)
    if abs(diff_emis) <= 0.0001: #two-sided check: a large negative difference should not pass
        print('Year',year_range[iyear],': PASS')
    else:
        print('Year',year_range[iyear],': CHECK')                                                                         
    #display(mills_locs)
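
The residual allocation and closing check can be sketched with hypothetical numbers. The check uses the symmetric relative difference between the facility sum and the national total; taking its absolute value makes the test two-sided. All values below are illustrative, not inventory data.

```python
import numpy as np

# Hypothetical values (kt): national total and emissions already reported by GHGRP facilities
epa_emis = 50.0
ghgrp_emis = np.array([12.0, 8.0, 5.0])
n_unreported = 5   # located facilities without reported emissions

# Spread the remainder uniformly over the unreported facilities
emis_diff = epa_emis - ghgrp_emis.sum()
per_facility = emis_diff / n_unreported
all_emis = np.concatenate([ghgrp_emis, np.full(n_unreported, per_facility)])

# Symmetric relative difference between the facility sum and the national total
rel_diff = (all_emis.sum() - epa_emis) / ((all_emis.sum() + epa_emis) / 2)
status = 'PASS' if abs(rel_diff) <= 0.0001 else 'CHECK'
```

If the reported GHGRP emissions exceeded the national total, `emis_diff` would be negative and each unreported facility would receive a negative value, so that case is worth checking separately.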

##### Step 2.3.2.5 Place calculated pulp and paper emissions on the CONUS grid

In [None]:
# Place pulp and Paper manufacturing emissions onto map

#Since Pulp & Paper is a sub-group of the IND landfills emission group, this proxy has dimensions of
# (subgroup x lat x lon x years), where subgroup = 2. This emissions group is first allocated to the state-
# level, so the gridded proxy is at 0.01x0.01 degree resolution 

#initialize arrays and reset to zero
map_ind_emis = np.zeros([2,len(lat001),len(lon001),num_years])
map_ind_emis_nongrid = np.zeros([2,num_years])

map_ind_emis[0,:,:,:] = 0
map_ind_emis_nongrid[0,:]=0

for ifacility in np.arange(0, len(mills_locs)):
    if mills_locs['Lon'][ifacility] > Lon_left and \
        mills_locs['Lon'][ifacility] < Lon_right and \
        mills_locs['Lat'][ifacility] > Lat_low and \
        mills_locs['Lat'][ifacility] < Lat_up:

        ilat = int((mills_locs['Lat'][ifacility] - Lat_low)/Res_01)
        ilon = int((mills_locs['Lon'][ifacility] - Lon_left)/Res_01)
        for iyear in np.arange(0, num_years):
            map_ind_emis[0,ilat,ilon,iyear] += mills_locs['emi_kt_'+year_range_str[iyear]][ifacility]
    elif mills_locs.loc[ifacility,'State_abbr'] in (['AK','HI']):
        #this pulls out AK/HI contributions without also pulling values from facilities where we couldn't find the location
        for iyear in np.arange(0, num_years):
            map_ind_emis_nongrid[0,iyear] += mills_locs['emi_kt_'+year_range_str[iyear]][ifacility]

print('Annual Pulp & Paper Manufacturing Emissions')
for iyear in np.arange(0, num_years):
    print('Year:',year_range[iyear])
    print('P&P Emissions (kt) ongrid:', np.sum(map_ind_emis[0,:,:,iyear]))
    print('P&P Emissions (kt) offgrid:', np.sum(map_ind_emis_nongrid[0,iyear]))

##### Step 2.3.3 Industrial Landfills - Food and Beverage Manufacturing

##### Step 2.3.3.1 Read in GHGRP Subpart TT emissions

In [None]:
#Read in GHGRP Subpart TT Emissions, extract Food & Beverage NAICS codes only and place onto CONUS grid

#a) Read in the GHGRP facility data
facility_info = pd.read_csv(ghgrp_facility_tt_inputfile)
facility_emis = pd.read_csv(ghgrp_emi_tt_inputfile)

#filter emissions data for methane only (in metric tonnes CH4) and for years of interest
facility_emis = facility_emis[facility_emis['TT_SUBPART_GHG_INFO.GHG_NAME'] == 'METHANE']
facility_emis = facility_emis[facility_emis['TT_SUBPART_GHG_INFO.REPORTING_YEAR'].isin(year_range)]
facility_info = facility_info[facility_info['V_GHG_EMITTER_FACILITIES.YEAR'].isin(year_range)]
facility_info.reset_index(inplace=True, drop=True)
facility_emis.reset_index(inplace=True, drop=True)

#rename common columns and merge into one dataframe
facility_info.rename(columns={'V_GHG_EMITTER_FACILITIES.YEAR':'Year', \
                             'V_GHG_EMITTER_FACILITIES.FACILITY_ID':'Facility_ID', \
                             'V_GHG_EMITTER_FACILITIES.LONGITUDE':'LONGITUDE',
                             'V_GHG_EMITTER_FACILITIES.LATITUDE':'LATITUDE',
                             'V_GHG_EMITTER_FACILITIES.PRIMARY_NAICS_CODE':'NAICS_CODE',
                             'V_GHG_EMITTER_FACILITIES.COUNTY':'COUNTY',
                             'V_GHG_EMITTER_FACILITIES.CITY':'CITY',
                             'V_GHG_EMITTER_FACILITIES.STATE':'STATE'},inplace=True)
facility_emis.rename(columns={'TT_SUBPART_GHG_INFO.REPORTING_YEAR':'Year', \
                              'TT_SUBPART_GHG_INFO.FACILITY_ID':'Facility_ID'},inplace=True)
ghgrp_ind = pd.merge(facility_info, facility_emis)
ghgrp_ind['emis_kt_tot'] = ghgrp_ind['TT_SUBPART_GHG_INFO.GHG_QUANTITY']/1e3 #convert metric tonnes to kt
ghgrp_ind['COUNTY'] = ghgrp_ind['COUNTY'].str.upper()
ghgrp_ind['CITY'] = ghgrp_ind['CITY'].str.upper()
ghgrp_ind['COUNTY'] = ghgrp_ind['COUNTY'].str.replace("COUNTY","")

ghgrp_ind['NAICS_CODE'] = ghgrp_ind['NAICS_CODE'].astype(str)
fb_naics = ['311612','311421','311513','312140','311611','311615','311225','311613','311710','311221','311224','311314','311313']
ghgrp_ind = ghgrp_ind[(ghgrp_ind['NAICS_CODE'].str.contains('|'.join(fb_naics)))]
ghgrp_ind.reset_index(inplace=True, drop=True)


##### Step 2.3.3.2 Read in List of Food & Beverage Manufacturing Facilities & Match to the GHGRP Facilities

In [None]:
# Read in and Estimate Emissions for Food & Beverage Manufacturing Facilities
# This is the list of facilities that are assumed to contribute to the industrial food & beverage waste
# in the national GHGI. The average excess food waste from these facilities is used to allocate national
# GHGI emissions down to the state-level in the official state GHGI
# By allocating gridded emissions to these same facilities (weighted by relative excess food waste), we are 
# assuming that these facilities landfill their waste at sites within 10km of the manufacturing facility. 
# There is no national database of industrial landfills, so this is the best assumption for now, but could be 
# improved upon as more information becomes available on where industrial landfills are located. 

# Step 1 - Read in Data
# Read in EPA Food Opportunities Map F&B Manufacturing Data
# Data are filtered based on NAICS codes that are referenced in Appendix F of the GHGI State Methodology Report
# These are the food & beverage manufacturing sources contributing to industrial waste emissions

names = pd.read_excel(FoodBeverage_inputdata, sheet_name = "Data", usecols = "B:J",skiprows = 0, header = 0)
colnames = names.columns.values
food_beverage_facilities = pd.read_excel(FoodBeverage_inputdata, sheet_name = "Data", usecols = "B:J", skiprows = 0, names = colnames)

#filter for relevant NAICS codes (based on GHGI)
food_beverage_facilities = (food_beverage_facilities[food_beverage_facilities['NAICS_CODE'].isin([311612,311421,311513,312140,311611,311615,\
                                                                             311225,311613,311710,311221,311224,311314,311313])])
#remove the facilities that don't report excess food waste
food_beverage_facilities = food_beverage_facilities[~np.isnan(food_beverage_facilities['EXCESSFOOD_TONYEAR_LOWEST'])]
food_beverage_facilities.reset_index(inplace=True, drop=True)
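#assign the filled columns back (in-place fillna on a selected column is not reliable in newer pandas versions)
food_beverage_facilities['ADDRESS'] = food_beverage_facilities['ADDRESS'].fillna('')
food_beverage_facilities['CITY'] = food_beverage_facilities['CITY'].fillna('')
food_beverage_facilities['COUNTY'] = food_beverage_facilities['COUNTY'].fillna('')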
food_beverage_facilities['Full_Address'] = ''
food_beverage_facilities["Full_Address"] = food_beverage_facilities["ADDRESS"].astype(str) + ' '+\
                                         food_beverage_facilities["CITY"].astype(str) +' '+\
                                        food_beverage_facilities["COUNTY"].astype(str)+' '+\
                                        food_beverage_facilities["STATE"].astype(str)+' '+\
                                        food_beverage_facilities["ZIP_CODE"].astype(str)
food_beverage_facilities['Partial_Address'] = ''
food_beverage_facilities['Partial_Address'] = food_beverage_facilities["CITY"].astype(str) +' '+\
                                        food_beverage_facilities["COUNTY"].astype(str)+' '+\
                                        food_beverage_facilities["STATE"].astype(str)+' '+\
                                        food_beverage_facilities["ZIP_CODE"].astype(str)
#create copy to store location information
food_beverage_facilities_locs = food_beverage_facilities.copy()
food_beverage_facilities_locs['Lat'] = 0
food_beverage_facilities_locs['Lon'] = 0
#Calculate the average food waste (tons) at each facility
food_beverage_facilities_locs['Avg_Waste_Tons'] = 0
food_beverage_facilities_locs['Avg_Waste_Frac'] = 0
for ifacility in np.arange(0, len(food_beverage_facilities_locs)):
    food_beverage_facilities_locs.loc[ifacility,'Avg_Waste_Tons'] = \
        np.mean([food_beverage_facilities_locs.loc[ifacility,'EXCESSFOOD_TONYEAR_LOWEST'],\
                 food_beverage_facilities_locs.loc[ifacility,'EXCESSFOOD_TONYEAR_HIGHEST']])

    
# Step 2 - try to match facilities to ghgrp based on county and city. 
# If there is no match with a GHGRP facility, add it to the full list of facilities
for iyear in np.arange(0, num_years):
    food_beverage_facilities_locs.loc[:,'emi_kt_'+year_range_str[iyear]] = 0

food_beverage_facilities_locs['ghgrp_match'] = 0
ghgrp_ind['found']=0

for iyear in np.arange(0, num_years):
    for ifacility in np.arange(0,len(food_beverage_facilities_locs)):
        imatch = np.where((ghgrp_ind['Year'] == year_range[iyear]) & \
                          (ghgrp_ind['STATE'] == food_beverage_facilities_locs.loc[ifacility,'STATE']) & \
                          (ghgrp_ind['COUNTY'].str.contains(food_beverage_facilities_locs.loc[ifacility,'COUNTY'].upper())))[0]
        if len(imatch)==1:
            food_beverage_facilities_locs.loc[ifacility,'ghgrp_match'] = 1
            food_beverage_facilities_locs.loc[ifacility,'Lat'] = ghgrp_ind.loc[imatch[0],'LATITUDE']
            food_beverage_facilities_locs.loc[ifacility,'Lon'] = ghgrp_ind.loc[imatch[0],'LONGITUDE']
            food_beverage_facilities_locs.loc[ifacility,'emi_kt_'+year_range_str[iyear]] = ghgrp_ind.loc[imatch[0],'emis_kt_tot']
            ghgrp_ind.loc[imatch[0],'found'] = 1
        if len(imatch) > 1:
            new_match = np.where((ghgrp_ind['Year'] == year_range[iyear]) & \
                                 (ghgrp_ind['STATE'] == food_beverage_facilities_locs.loc[ifacility,'STATE']) &\
                                 (ghgrp_ind['COUNTY'].str.contains(food_beverage_facilities_locs.loc[ifacility,'COUNTY'].upper())) &\
                             (ghgrp_ind['CITY'].str.contains(food_beverage_facilities_locs.loc[ifacility,'CITY'].upper())))[0]
            if len(new_match)>0:
                food_beverage_facilities_locs.loc[ifacility,'ghgrp_match'] = 1
                food_beverage_facilities_locs.loc[ifacility,'Lat'] = ghgrp_ind.loc[new_match[0],'LATITUDE']
                food_beverage_facilities_locs.loc[ifacility,'Lon'] = ghgrp_ind.loc[new_match[0],'LONGITUDE']
                food_beverage_facilities_locs.loc[ifacility,'emi_kt_'+year_range_str[iyear]] = ghgrp_ind.loc[new_match[0],'emis_kt_tot']
                ghgrp_ind.loc[new_match[0],'found'] = 1
            
        else:
            continue
    print('Found (%) Year',year_range[iyear],':',100*np.sum(food_beverage_facilities_locs['ghgrp_match']/len(food_beverage_facilities_locs)))
    
# Step 3 - add additional GHGRP facilities (to capture all GHGRP emissions - deviates from GHGI methods)
# for each extra facility in the ghgrp dataset, append an extra row, then fill in the emissions values for each year
for ifacility in np.arange(0,len(ghgrp_ind)):
    if ghgrp_ind.loc[ifacility,'found'] ==0:
        facility_id = ghgrp_ind.loc[ifacility,'Facility_ID']
        df2 = {'NAICS_CODE':ghgrp_ind.loc[ifacility,'NAICS_CODE'],'STATE':ghgrp_ind.loc[ifacility,'STATE'], 'CITY':ghgrp_ind.loc[ifacility,'CITY'],\
               'COUNTY':ghgrp_ind.loc[ifacility,'COUNTY'],'ghgrp_match': 1, 'Lat': ghgrp_ind.loc[ifacility,'LATITUDE'], \
               'Lon':ghgrp_ind.loc[ifacility,'LONGITUDE']}
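        #DataFrame.append was removed in pandas 2.0; pd.concat adds the new row in the same way
        food_beverage_facilities_locs = pd.concat([food_beverage_facilities_locs, pd.DataFrame([df2])], ignore_index = True)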
        for iyear in np.arange(0, num_years):
            ighgrp = np.where((ghgrp_ind['Year'] == year_range[iyear]) & (ghgrp_ind['Facility_ID'] == facility_id))[0]
            #print(ighgrp)
            if len(ighgrp)==1:
                ghgrp_ind.loc[ighgrp[0],'found']=1
                food_beverage_facilities_locs.loc[len(food_beverage_facilities_locs)-1,'emi_kt_'+year_range_str[iyear]] = ghgrp_ind.loc[ighgrp[0],'emis_kt_tot']
            else:
                food_beverage_facilities_locs.loc[len(food_beverage_facilities_locs)-1,'emi_kt_'+year_range_str[iyear]] = 0

##### Step 2.3.3.3 Find Missing Locations by Matching to the FRS database

In [None]:
#Step 4 - Try to match remaining facilities by comparing to FRS. F&B NAICS codes

# 4a) Read in FRS data to get location information for landfills with missing location information 
FRS_facility_locs = pd.read_csv(FRS_inputfile, usecols = [2,3,5,7,8,10,17,20,21,26,27,28,31,32,34,35,36],low_memory=False)
FRS_facility_locs.fillna(0, inplace = True)
FRS_facility_locs = FRS_facility_locs[FRS_facility_locs['LATITUDE83'] > 0]
FRS_facility_locs = FRS_facility_locs[FRS_facility_locs['NAICS_CODES'] != 0]
FRS_facility_locs.reset_index(inplace=True, drop=True)

#fb_naics = ['311612','311421','311513','312140','311611','311615','311225','311613','311710','311221','311224','311314','311313']
ghgrp_ind = ghgrp_ind[(ghgrp_ind['NAICS_CODE'].str.contains('|'.join(fb_naics)))]

FRS_facility_locs = FRS_facility_locs[(FRS_facility_locs['NAICS_CODES'].str.contains('|'.join(fb_naics)))]
print('Total FRS Food & Beverage Facilities: ',len(FRS_facility_locs))
FRS_facility_locs.reset_index(inplace=True, drop=True)

FRS_facility_locs['CITY_NAME'] = FRS_facility_locs['CITY_NAME'].replace(0,'NaN')


# 4b - try to match the food & beverage facility list with the FRS database, based on state and city - 
# only need to find locations for where 'ghgrp_match' = 0
food_beverage_facilities_locs['FRS_match'] = 0
for ifacility in np.arange(0, len(food_beverage_facilities_locs)):
    if food_beverage_facilities_locs.loc[ifacility,'ghgrp_match']==0:
        imatch = np.where((food_beverage_facilities_locs.loc[ifacility,'STATE'] == FRS_facility_locs['STATE_CODE']) &\
                         ((food_beverage_facilities_locs.loc[ifacility,'CITY'].upper() == FRS_facility_locs['CITY_NAME'])))[0]
        if len(imatch)==1:
            food_beverage_facilities_locs.loc[ifacility,'Lat'] = FRS_facility_locs.loc[imatch[0],'LATITUDE83']
            food_beverage_facilities_locs.loc[ifacility,'Lon'] = FRS_facility_locs.loc[imatch[0],'LONGITUDE83']
            food_beverage_facilities_locs.loc[ifacility,'FRS_match'] = 1
        elif len(imatch)>1:
            FRS_temp = FRS_facility_locs.loc[imatch,:]
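            #select the FRS record with the largest ACCURACY_VALUE among the candidate matches
            # (np.where(np.max(...)) only tested a scalar and could never return an index other than 0)
            new_match = np.where(FRS_temp['ACCURACY_VALUE'] == np.max(FRS_temp['ACCURACY_VALUE']))[0]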
            if len(new_match) >0:
                food_beverage_facilities_locs.loc[ifacility,'Lat'] = FRS_facility_locs.loc[imatch[new_match[0]],'LATITUDE83']
                food_beverage_facilities_locs.loc[ifacility,'Lon'] = FRS_facility_locs.loc[imatch[new_match[0]],'LONGITUDE83']
                food_beverage_facilities_locs.loc[ifacility,'FRS_match'] = 1
        else:
            continue

print('Not Found:',len(food_beverage_facilities_locs)-(np.sum(food_beverage_facilities_locs.loc[:,'FRS_match'])+np.sum(food_beverage_facilities_locs.loc[:,'ghgrp_match'])), 'of',len(food_beverage_facilities_locs))

##### Step 2.3.3.4 Find Missing Locations by Geocoding Addresses

In [None]:
# Step 5 - Try Geocoding to convert facility addresses into lat/lon values. 
# This uses the free OpenStreetMap Nominatim API via geopy (less complete than Google Maps, but free)
# only need to get locations for facilities where ghgrp_match and frs_match = 0

#set the default timeout before creating the geocoder so that the new instance picks it up
geopy.geocoders.options.default_timeout = None
geolocator = Nominatim(user_agent="myGeocode")
print(geolocator.timeout)

#Examples: #print((location.latitude, location.longitude))
#location = geolocator.geocode("Champ de Mars, Paris, France")
#location = geolocator.geocode("1726 N Cochran Ave Charlotte MI 48813")

food_beverage_facilities_locs['geo_match'] = 0
for ifacility in np.arange(0,len(food_beverage_facilities_locs)):
    if food_beverage_facilities_locs.loc[ifacility,'FRS_match'] ==0 and food_beverage_facilities_locs.loc[ifacility,'ghgrp_match'] == 0:
        location = geolocator.geocode(food_beverage_facilities_locs['Full_Address'][ifacility])
        if location is None:
            continue
        else:
            food_beverage_facilities_locs.loc[ifacility,'Lat'] = location.latitude
            food_beverage_facilities_locs.loc[ifacility,'Lon'] = location.longitude
            food_beverage_facilities_locs.loc[ifacility,'geo_match']=1
            
print('First Try - Percentage found:',100*(np.sum(food_beverage_facilities_locs['ghgrp_match'])+\
                                        np.sum(food_beverage_facilities_locs['FRS_match'])+\
                                        np.sum(food_beverage_facilities_locs['geo_match']))/len(food_beverage_facilities_locs))

for ifacility in np.arange(0,len(food_beverage_facilities_locs)):
    if food_beverage_facilities_locs.loc[ifacility,'FRS_match'] ==0 and food_beverage_facilities_locs.loc[ifacility,'ghgrp_match'] == 0 and \
        food_beverage_facilities_locs.loc[ifacility,'geo_match']==0:
        #remove suite information from the address and try again 
        address_temp = food_beverage_facilities_locs['Full_Address'][ifacility]
        address_temp = address_temp.replace('Ste','')
        address_temp = address_temp.replace('Apt','')
        address_temp = address_temp.replace('Unit','')
        address_temp = address_temp.replace('Bldg','')
        location = geolocator.geocode(address_temp)
        if location is None:
            #if still no match, remove the address portion and just allocate based on city, state, county, zip
            #address_temp = food_beverage_facilities_locs.loc[ifacility,"Partial_Address"]
            address_temp = food_beverage_facilities_locs.loc[ifacility,"CITY"] +' '+\
                                        food_beverage_facilities_locs.loc[ifacility,"COUNTY"]+' '+\
                                        food_beverage_facilities_locs.loc[ifacility,"STATE"]+' '+\
                                        str(food_beverage_facilities_locs.loc[ifacility,"ZIP_CODE"])
            location2 = geolocator.geocode(address_temp)
            if location2 is None:
                #print(ifacility,address_temp)
                continue
            else:
                #count -= 1
                food_beverage_facilities_locs.loc[ifacility,'Lat'] = location2.latitude
                food_beverage_facilities_locs.loc[ifacility,'Lon'] = location2.longitude
                food_beverage_facilities_locs.loc[ifacility,'geo_match']=1
        else:
            #count -= 1
            food_beverage_facilities_locs.loc[ifacility,'Lat'] = location.latitude
            food_beverage_facilities_locs.loc[ifacility,'Lon'] = location.longitude
            food_beverage_facilities_locs.loc[ifacility,'geo_match']=1
            
print('Second Try - Percentage found:',100*(np.sum(food_beverage_facilities_locs['ghgrp_match'])+\
                                        np.sum(food_beverage_facilities_locs['FRS_match'])+\
                                        np.sum(food_beverage_facilities_locs['geo_match']))/len(food_beverage_facilities_locs))
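
The retry logic above follows a fallback pattern: the full address first, then the address with suite tokens removed, then the city/county/state/ZIP string. The sketch below passes the geocoder in as a callable so the pattern can be exercised without network access. `geocode_with_fallback` and `fake_geocode` are hypothetical helpers, not part of geopy, and the coordinates are illustrative; in the notebook the callable would be `geolocator.geocode`.

```python
from collections import namedtuple

# Minimal stand-in for the object returned by a geocoder (has latitude/longitude)
Location = namedtuple('Location', ['latitude', 'longitude'])

def geocode_with_fallback(geocode, full_address, partial_address):
    """Try progressively coarser address strings; return (lat, lon) or None."""
    # Same suite tokens the notebook strips; a plain substring replace is an approximation
    stripped = full_address
    for token in ('Ste', 'Apt', 'Unit', 'Bldg'):
        stripped = stripped.replace(token, '')
    for query in (full_address, stripped, partial_address):
        location = geocode(query)
        if location is not None:
            return (location.latitude, location.longitude)
    return None

# Stand-in geocoder that only recognizes the coarse city-level string
def fake_geocode(query):
    known = {'Charlotte Eaton MI 48813': Location(42.56, -84.84)}  # illustrative coordinates
    return known.get(query)

result = geocode_with_fallback(
    fake_geocode,
    '1726 N Cochran Ave Ste 4 Charlotte Eaton MI 48813',
    'Charlotte Eaton MI 48813',
)
missing = geocode_with_fallback(fake_geocode, 'nowhere', 'also nowhere')
```

Note that a plain substring replace of tokens such as `Ste` also alters words that contain them (for example street or city names), so a word-boundary match may be preferable in practice.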

##### Step 2.3.3.5 Calculate the Average Waste produced by each facility and the fraction of the total

In [None]:
# Step 6. Calculate the fractional waste contributions for all non-matching facilities
# need to calculate emissions for those where emissions are not already calculated (ghgrp_match = 0) 
# and where locations have been found (frs_match and geo_match =1)

# a) first calculate the national total average waste (for all non-matching facilities)
total_avg_waste = 0
for ifacility in np.arange(0, len(food_beverage_facilities_locs)):
    if (food_beverage_facilities_locs.loc[ifacility,'FRS_match'] ==1 or \
        food_beverage_facilities_locs.loc[ifacility,'geo_match'] == 1) and \
        food_beverage_facilities_locs.loc[ifacility,'ghgrp_match']==0:
        #if food_beverage_facilities_locs.loc[ifacility,'ghgrp_match']==0:
        total_avg_waste += food_beverage_facilities_locs.loc[ifacility,'Avg_Waste_Tons']

# b) then calculate the fractional contribution
for ifacility in np.arange(0, len(food_beverage_facilities_locs)):
    if (food_beverage_facilities_locs.loc[ifacility,'FRS_match'] ==1 or \
        food_beverage_facilities_locs.loc[ifacility,'geo_match'] == 1) and \
        food_beverage_facilities_locs.loc[ifacility,'ghgrp_match']==0:
        food_beverage_facilities_locs.loc[ifacility,'Avg_Waste_Frac'] = \
                                    food_beverage_facilities_locs.loc[ifacility,'Avg_Waste_Tons']/total_avg_waste
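
The two loops above can also be written with a boolean mask. The following is a minimal sketch, not the notebook's own code: the `facilities` DataFrame is a hypothetical stand-in for `food_beverage_facilities_locs`, and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for food_beverage_facilities_locs (values are made up)
facilities = pd.DataFrame({
    "ghgrp_match":    [1, 0, 0, 0],
    "FRS_match":      [0, 1, 0, 0],
    "geo_match":      [0, 0, 1, 0],
    "Avg_Waste_Tons": [500.0, 100.0, 300.0, 50.0],
})
facilities["Avg_Waste_Frac"] = 0.0

# Facilities that were located (FRS or geocoder) but have no GHGRP emissions
needs_alloc = ((facilities["FRS_match"] == 1) | (facilities["geo_match"] == 1)) & \
              (facilities["ghgrp_match"] == 0)

# a) national total average waste over those facilities
total_avg_waste = facilities.loc[needs_alloc, "Avg_Waste_Tons"].sum()

# b) fractional contribution of each of those facilities
facilities.loc[needs_alloc, "Avg_Waste_Frac"] = \
    facilities.loc[needs_alloc, "Avg_Waste_Tons"] / total_avg_waste

print(facilities)
```

Facilities that match GHGRP, or that could not be located at all, keep a fraction of zero, so the fractions of the remaining facilities sum to one.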
     

##### Step 2.3.3.6 Calculate the emissions at all non-GHGRP facilities

In [None]:
# Step 7). Find the difference in emissions between the GHGI and available GHGRP facilities (for each year)
# assign this emissions difference to all remaining non-ghgrp facilities based on the relative average excess food waste 
print('Check Sum Against National Emissions')
for iyear in np.arange(0,num_years):
    sum_emis = np.sum(food_beverage_facilities_locs.loc[:,'emi_kt_'+year_range_str[iyear]])
    epa_emis = np.sum(state_ind_emis[1,:,iyear]) #sum epa food & beverage data across all states
    num_facility_missing = len(food_beverage_facilities_locs) - \
                            (np.sum(food_beverage_facilities_locs.loc[:,'geo_match'])+\
                            np.sum(food_beverage_facilities_locs.loc[:,'FRS_match'])+ \
                            np.sum(food_beverage_facilities_locs.loc[:,'ghgrp_match']))
    #print(num_facility_missing)
    emis_diff = epa_emis - sum_emis
    #missing_fac_emis = emis_diff/num_facility_missing
    #print(epa_emis, sum_emis, emis_diff)
    for ifacility in np.arange(0, len(food_beverage_facilities_locs)):
        if (food_beverage_facilities_locs.loc[ifacility,'FRS_match'] ==1 or \
        food_beverage_facilities_locs.loc[ifacility,'geo_match'] == 1) and \
        food_beverage_facilities_locs.loc[ifacility,'ghgrp_match']==0:
            food_beverage_facilities_locs.loc[ifacility,'emi_kt_'+year_range_str[iyear]] = \
                                emis_diff*food_beverage_facilities_locs.loc[ifacility,'Avg_Waste_Frac']
    

    diff_emis = (np.sum(food_beverage_facilities_locs['emi_kt_'+year_range_str[iyear]])- epa_emis)/((np.sum(food_beverage_facilities_locs['emi_kt_'+year_range_str[iyear]])+epa_emis)/2)
    #print(diff_emis)
    if abs(diff_emis) <= 0.0001:
        print('Year',year_range[iyear],': PASS')
        print(np.sum(food_beverage_facilities_locs['emi_kt_'+year_range_str[iyear]]))
        print(epa_emis)
    else:
        print('Year',year_range[iyear],': CHECK')                                                                         
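
The residual allocation in this step reduces to a few lines of arithmetic for a single year. The sketch below uses made-up numbers and hypothetical variable names (`ghgrp_emis`, `waste_frac`, `alloc_emis`); it only illustrates the logic and is not a replacement for the cell above.

```python
import numpy as np

# Made-up numbers for a single year (kt CH4)
epa_emis = 100.0                       # national GHGI total for the subgroup
ghgrp_emis = np.array([30.0, 20.0])    # facilities with GHGRP-reported emissions
waste_frac = np.array([0.25, 0.75])    # Avg_Waste_Frac of the remaining located facilities

emis_diff = epa_emis - ghgrp_emis.sum()   # emissions not covered by GHGRP reporters
alloc_emis = emis_diff * waste_frac       # split by relative average waste

# Closure check: allocated + reported emissions should reproduce the GHGI total
total = ghgrp_emis.sum() + alloc_emis.sum()
rel_diff = (total - epa_emis) / ((total + epa_emis) / 2)
print(alloc_emis, 'PASS' if abs(rel_diff) <= 0.0001 else 'CHECK')
```

If the GHGRP sum ever exceeded the GHGI total, `emis_diff` would be negative and so would the allocated values. The cell above does not appear to guard against that case, so the printed sums are worth inspecting.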


##### Step 2.3.3.7 Place Food & Beverage Facility Emissions onto CONUS Grid

In [None]:
# Step 8 - Place calculated F&B emissions onto map

#ensure values are reset to zero
map_ind_emis[1,:,:,:] = 0
map_ind_emis_nongrid[1,:]=0  

for ifacility in np.arange(0, len(food_beverage_facilities_locs)):
    if food_beverage_facilities_locs['Lon'][ifacility] > Lon_left and \
        food_beverage_facilities_locs['Lon'][ifacility] < Lon_right and \
        food_beverage_facilities_locs['Lat'][ifacility] > Lat_low and \
        food_beverage_facilities_locs['Lat'][ifacility] < Lat_up:

        ilat = int((food_beverage_facilities_locs['Lat'][ifacility] - Lat_low)/Res_01)
        ilon = int((food_beverage_facilities_locs['Lon'][ifacility] - Lon_left)/Res_01)
        for iyear in np.arange(0, num_years):
            map_ind_emis[1,ilat,ilon,iyear] += food_beverage_facilities_locs['emi_kt_'+year_range_str[iyear]][ifacility]
    elif food_beverage_facilities_locs.loc[ifacility,'STATE'] in (['AK','HI']):
        #this pulls out AK/HI contributions without also pulling values from facilities where we couldn't find the location
        for iyear in np.arange(0, num_years):
            map_ind_emis_nongrid[1,iyear] += food_beverage_facilities_locs['emi_kt_'+year_range_str[iyear]][ifacility]

print('Annual Food & Beverage Manufacturing Emissions')
for iyear in np.arange(0, num_years):
    print('Year:',year_range[iyear])
    print('F&B Emissions (kt) ongrid:', np.sum(map_ind_emis[1,:,:,iyear]))
    print('F&B Emissions (kt) offgrid:', np.sum(map_ind_emis_nongrid[1,iyear]))
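
The facility-to-grid step converts each latitude/longitude pair into integer grid indices and accumulates emissions there. Below is a vectorized sketch of that idea for one year. The domain bounds, resolution, and facility values are assumptions chosen for illustration; the notebook defines `Lon_left`, `Lon_right`, `Lat_low`, `Lat_up`, and `Res_01` in earlier steps.

```python
import numpy as np

# Assumed domain bounds and resolution (hypothetical values)
lon_left, lon_right, lat_low, lat_up, res = -130.0, -60.0, 20.0, 60.0, 0.1
n_lat = int(round((lat_up - lat_low) / res))
n_lon = int(round((lon_right - lon_left) / res))

# Made-up facilities; the last one lies outside the domain
lat = np.array([40.05, 40.05, 35.55, 61.2])
lon = np.array([-100.05, -100.05, -80.35, -149.9])
emis = np.array([1.0, 2.0, 3.0, 4.0])   # kt

# Keep only facilities strictly inside the domain, as in the loop above
on_grid = (lon > lon_left) & (lon < lon_right) & (lat > lat_low) & (lat < lat_up)
ilat = ((lat[on_grid] - lat_low) / res).astype(int)
ilon = ((lon[on_grid] - lon_left) / res).astype(int)

grid = np.zeros((n_lat, n_lon))
# np.add.at accumulates correctly when several facilities share a grid cell
np.add.at(grid, (ilat, ilon), emis[on_grid])

print('on-grid total (kt):', grid.sum())
```

`np.add.at` is used instead of `grid[ilat, ilon] += ...` because plain fancy-index assignment applies only one update per repeated index, which would silently drop emissions from co-located facilities.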

-----------
## Step 3. Read in and Format US EPA GHGI Emissions
----------

In [None]:
#Read in the emissions data from the GHGI main inventory report (in kt)

EPA_emi_landfill_CH4 = pd.read_csv(EPA_landfill_inputfile,skiprows=2,encoding= 'unicode_escape', header=0,nrows=8)
EPA_emi_landfill_CH4 = EPA_emi_landfill_CH4.fillna('')
EPA_emi_landfill_CH4 = EPA_emi_landfill_CH4.drop(columns = [str(n) for n in range(1990, start_year,1)])
EPA_emi_landfill_CH4 = EPA_emi_landfill_CH4.drop(['Unnamed: 0'], axis=1)
EPA_emi_landfill_CH4.rename(columns={EPA_emi_landfill_CH4.columns[0]:'Source'}, inplace=True)
EPA_emi_landfill_CH4 = EPA_emi_landfill_CH4.apply(lambda x: x.str.replace(',',''))
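# regex=True is set explicitly: pandas >= 2.0 treats the pattern as a literal string by default
EPA_emi_landfill_CH4 = EPA_emi_landfill_CH4.apply(lambda x: x.str.replace(r"\)","",regex=True))
EPA_emi_landfill_CH4 = EPA_emi_landfill_CH4.apply(lambda x: x.str.replace(r"\(","",regex=True))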
EPA_emi_landfill_CH4.iloc[:,1:] = EPA_emi_landfill_CH4.iloc[:,1:].apply(pd.to_numeric,errors='coerce')
EPA_emi_landfill_CH4.reset_index(inplace=True, drop=True)

temp = EPA_emi_landfill_CH4.iloc[:,1:].sum(axis=1)

EPA_emi_landfill_total = EPA_emi_landfill_CH4[EPA_emi_landfill_CH4['Source'] == 'Total']
EPA_emi_landfill_total.reset_index(inplace=True, drop=True)
print(type(EPA_emi_landfill_total['2012']))
print('EPA GHGI National CH4 Emissions (kt):')
display(EPA_emi_landfill_total)

display(EPA_emi_landfill_CH4)
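
The string clean-up above (removing thousands separators and parentheses, then converting to numbers) can be tried in isolation on a small table. This is a sketch with made-up rows in the style of a GHGI table, not real inventory values; the `raw` and `clean` names are hypothetical.

```python
import pandas as pd

# Made-up rows in the style of a GHGI table (not real inventory values)
raw = pd.DataFrame({
    "Source": ["MSW Landfills", "Industrial Landfills", "Recovered"],
    "2012": ["4,500", "600", "(1,200)"],
    "2013": ["4,400", "610", "(1,250)"],
})

year_cols = [c for c in raw.columns if c != "Source"]
clean = raw.copy()
for col in year_cols:
    # Strip commas and parentheses in one pass, then convert to numbers;
    # anything that still fails to parse becomes NaN
    clean[col] = pd.to_numeric(
        clean[col].str.replace(r"[,()]", "", regex=True), errors="coerce"
    )

print(clean)
print(clean.dtypes)
```

Like the cell above, this sketch drops parentheses without changing the sign of the value. If parentheses mark negative entries in the input file, that convention would need separate handling.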

#### Step 3.2. Split Emissions into Gridding Groups

In [None]:
#split GHG emissions into gridding groups, based on the landfill GHGI mapping (ghgi_landfill_map)

DEBUG =1
start_year_idx = EPA_emi_landfill_CH4.columns.get_loc(str(start_year))
end_year_idx = EPA_emi_landfill_CH4.columns.get_loc(str(end_year))+1
ghgi_landfill_groups = ghgi_landfill_map['GHGI_Emi_Group'].unique()
sum_emi = np.zeros([num_years])

for igroup in np.arange(0,len(ghgi_landfill_groups)): #loop through all groups, finding the GHGI sources in that group and summing emissions for each year
    ##DEBUG## print(ghgi_landfill_groups[igroup])
    vars()[ghgi_landfill_groups[igroup]] = np.zeros([num_years])
    source_temp = ghgi_landfill_map.loc[ghgi_landfill_map['GHGI_Emi_Group'] == ghgi_landfill_groups[igroup], 'GHGI_Source']
    pattern_temp  = '|'.join(source_temp) 
    #print(pattern_temp) 
    emi_temp =EPA_emi_landfill_CH4[EPA_emi_landfill_CH4['Source'].str.contains(pattern_temp)]
    vars()[ghgi_landfill_groups[igroup]][:] = emi_temp.iloc[:,start_year_idx:end_year_idx].sum() #limit to start_year..end_year
        
        
#Check against total summary emissions 
print('QA/QC #1: Check Processing Emission Sum against GHGI Summary Emissions')
for iyear in np.arange(0,num_years): 
    for igroup in np.arange(0,len(ghgi_landfill_groups)):
        if iyear ==0:
            vars()[ghgi_landfill_groups[igroup]][iyear] -= 0.5  ##NOTE: correct rounding error so sum of emissions = reported total emissions
        sum_emi[iyear] += vars()[ghgi_landfill_groups[igroup]][iyear]
        
    summary_emi = EPA_emi_landfill_total.iloc[0,iyear+1]  
    #Check 1 - make sure that the sums from all the regions equal the totals reported
    diff1 = abs(sum_emi[iyear] - summary_emi)/((sum_emi[iyear] + summary_emi)/2)
    if DEBUG==1:
        print(summary_emi)
        print(sum_emi[iyear])
    if diff1 < 0.0001:
        print('Year ', year_range[iyear],': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear],': FAIL -- Difference = ', diff1*100,'%')
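
The same PASS/FAIL test (absolute difference divided by the mean of the two values, compared against 0.0001) recurs in every QA/QC step of this notebook. A small helper could express it once. The function names below are hypothetical and are not part of `data_fn`; this is a sketch of the metric only.

```python
def relative_difference(calc, ref):
    """Symmetric relative difference: |calc - ref| divided by the mean of the two."""
    mean = (calc + ref) / 2
    if mean == 0:
        # Both zero counts as agreement; otherwise the ratio is undefined
        return 0.0 if calc == ref else float("inf")
    return abs(calc - ref) / mean

def qa_check(year, calc, ref, tol=0.0001):
    """Return a PASS/FAIL message; tol=0.0001 corresponds to 0.01%."""
    diff = relative_difference(calc, ref)
    status = "PASS" if diff < tol else "FAIL"
    return f"Year {year}: {status}, difference = {diff * 100:.4f}%"

# Made-up totals (kt)
print(qa_check(2012, 5000.2, 5000.0))
print(qa_check(2013, 5100.0, 5000.0))
```

Reporting `diff * 100` keeps the printed number consistent with the `%` label, since the raw metric is a fraction.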

--------------
## Step 4. Grid Data
-------------

#### Step 4.1. Allocate emissions

##### Step 4.1.1 Assign the Appropriate Proxy Variable Names (state & grid)

In [None]:
# The names on the *left* need to match the 'ProxyMapping' 'State_Proxy_Group' names 
# (these are initialized in Step 2). 
# The names on the *right* are the variable names used to calculate the proxies in this code.
# Names on the right need to match those from the code in Step 2

#state proxies are in dimensions (subgroup x state x year)
# subgroup 0 = pulp and paper, subgroup 1 = food and beverage
State_Ind_Landfills = state_ind_emis

#state --> grid (0.01) proxies (subgroup x lat x lon x year OR lat x lon x year)
Map_Emi_Ind_Landfills = map_ind_emis
Map_Emi_MSW_Landfills = map_msw_emis
Map_Emi_Ind_Landfills_nongrid = map_ind_emis_nongrid
Map_Emi_MSW_Landfills_nongrid = map_msw_emis_nongrid

##### Step 4.1.2. Allocate to the State level

In [None]:
# Calculate state-level emissions
# Emissions in kt
# State data = national GHGI emissions * state proxy/national total

# Note that national emissions are retained for groups that do not have state proxies (identified in the mapping file)
# and are gridded in the next step
DEBUG = 1

# Make placeholder emission arrays for each group
# State SubGroup flag == 1 indicates that the proxy data contains information for two sub groups (e.g., P&P and F&B)
for igroup in np.arange(0,len(proxy_landfill_map)):
    if proxy_landfill_map.loc[igroup,'State_SubGroup_Flag'] ==1:
        vars()['State_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([2,len(State_ANSI),num_years])
        vars()['NonState_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([2,num_years])
    else:
        vars()['State_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(State_ANSI),num_years])
        vars()['NonState_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([num_years])
        
#Loop over years
for iyear in np.arange(num_years):
    #Loop over states
    for istate in np.arange(len(State_ANSI)):
        for igroup in np.arange(0,len(proxy_landfill_map)):    
            if proxy_landfill_map.loc[igroup,'State_Proxy_Group'] != '-' and proxy_landfill_map.loc[igroup,'GHGI_Emi_Group'] != 'Emi_not_mapped':
                if proxy_landfill_map.loc[igroup,'State_SubGroup_Flag'] ==1:
                    for isubgroup in np.arange(0,2):
                        vars()['State_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][isubgroup,istate,iyear] = \
                                vars()[proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                        data_fn.safe_div(vars()[proxy_landfill_map.loc[igroup,'State_Proxy_Group']][isubgroup,istate,iyear], \
                                         np.sum(vars()[proxy_landfill_map.loc[igroup,'State_Proxy_Group']][:,:,iyear]))
                    
            else:
                vars()['NonState_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][iyear] = vars()[proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][iyear]
                
# Check sum of all gridded emissions + emissions not included in state allocation
print('QA/QC #1: Check weighted emissions against GHGI')   
for iyear in np.arange(0,num_years):
    summary_emi = EPA_emi_landfill_total.iloc[0,iyear+1] 
    calc_emi = 0
    for igroup in np.arange(0,len(proxy_landfill_map)):
        if proxy_landfill_map.loc[igroup,'State_SubGroup_Flag'] ==1:
            calc_emi +=  np.sum(vars()['State_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear])+\
                        np.sum(vars()['NonState_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][:,iyear])
        else:
            calc_emi +=  np.sum(vars()['State_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][:,iyear])
            calc_emi += vars()['NonState_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][iyear] #np.sum(Emissions[:,iyear]) + Emissions_nongrid[iyear] + Emissions_nonstate[iyear]
    if DEBUG ==1:
        print(summary_emi)
        print(calc_emi)
    diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if diff < 0.0001:
        print('Year ', year_range[iyear], ': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear], ': FAIL -- Difference = ', diff*100,'%')
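
The state allocation rule (state emissions = national GHGI emissions × state proxy / national proxy total) can be illustrated on a small array. The sketch below uses made-up inputs with dimensions (subgroup x state x year). It assumes that `data_fn.safe_div` returns zero where the denominator is zero, which is not confirmed by the code shown here.

```python
import numpy as np

# Made-up inputs: 2 subgroups x 3 states x 2 years of state proxy emissions
state_proxy = np.array([
    [[1.0, 2.0], [3.0, 2.0], [0.0, 0.0]],   # subgroup 0 (e.g., pulp and paper)
    [[2.0, 1.0], [4.0, 5.0], [0.0, 0.0]],   # subgroup 1 (e.g., food and beverage)
])
national_emis = np.array([50.0, 40.0])       # kt per year

# Proxy total across both subgroups and all states, per year
proxy_total = state_proxy.sum(axis=(0, 1))

# Share of each (subgroup, state) in the national proxy total;
# years with a zero proxy total are left at zero instead of dividing by zero
share = np.divide(state_proxy, proxy_total,
                  out=np.zeros_like(state_proxy), where=proxy_total != 0)

state_emis = national_emis * share           # kt, same shape as state_proxy

print(state_emis.sum(axis=(0, 1)))           # should reproduce national_emis
```

Because the proxy total is taken over both subgroups, the national industrial total is split between subgroups and states in a single step, which matches the denominator used in the loop above.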

##### Step 4.1.3. Allocate emissions to the CONUS region (0.1x0.1)

In [None]:
# Allocate State-Level emissions (kt) onto a 0.1x0.1 grid using gridcell level 'Proxy_Groups'

DEBUG =1
#Define emission arrays
Emissions_array_01 = np.zeros([len(Lat_01),len(Lon_01),num_years])
Emissions_array_01_temp = np.zeros([len(Lat_01),len(Lon_01),num_years])
Emissions_array_001 = np.zeros([len(lat001),len(lon001),num_years])

Emissions_array_01_msw = np.zeros([len(Lat_01),len(Lon_01),num_years])
Emissions_array_01_temp_msw = np.zeros([len(Lat_01),len(Lon_01),num_years])
Emissions_array_001_msw = np.zeros([len(lat001),len(lon001),num_years])

Emissions_array_01_ind = np.zeros([len(Lat_01),len(Lon_01),num_years])
Emissions_array_01_temp_ind = np.zeros([len(Lat_01),len(Lon_01),num_years])
Emissions_array_001_ind = np.zeros([len(lat001),len(lon001),num_years])

Emissions_nongrid = np.zeros([num_years])

# For each year, (2a) distribute state-level emissions onto a grid using proxies defined above ....
# To speed up the code, masks are used rather than looping individually through each lat/lon. 
# In this case, a mask is built for each state: grid cells whose ANSI value matches the state are set to 1
# and all other grid cells are set to 0. 
# AK and HI and territories are removed from the analysis at this stage. 
# The emissions allocated to each state are at 0.01x0.01 degree resolution, as required to calculate accurate 'mask'
# arrays for each state. 
# (2b) For emission groups that were not first allocated to states, national emissions for those groups are gridded
# based on the relevant gridded proxy arrays (0.1x0.1 resolution). These emissions are at 0.1x0.1 degrees resolution. 
# (2c) - record 'not mapped' emission groups in the 'non-grid' array (not relevant here)

print('**QA/QC Check: Sum of national gridded emissions vs. GHGI national emissions')
running_sum = np.zeros([len(proxy_landfill_map),num_years])

for igroup in np.arange(0,len(proxy_landfill_map)):
    print(igroup, 'of', len(proxy_landfill_map))
    proxy_temp = vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']]
    proxy_temp_nongrid = vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']+'_nongrid']
    vars()['Ext_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']+'_01'] = np.zeros([len(lat001),len(lon001),num_years])
    vars()['Ext_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']+'_temp'] = np.zeros([len(Lat_01),len(Lon_01),num_years])

    #2a. Step through each state (if group was previously allocated to state level)
    if proxy_landfill_map.loc[igroup,'State_Proxy_Group'] != '-' and \
        proxy_landfill_map.loc[igroup,'State_Proxy_Group'] != 'state_not_mapped':
        for istate in np.arange(0,len(State_ANSI)):
            if State_ANSI['abbr'][istate] not in {'AK','HI'} and istate < 51:
                mask_state = np.ma.ones(np.shape(state_ANSI_map))
                mask_state = np.ma.masked_where(state_ANSI_map != State_ANSI['ansi'][istate], mask_state)
                mask_state = np.ma.filled(mask_state,0) 
                if proxy_landfill_map.loc[igroup, 'Grid_SubGroup_Flag'] ==1:
                    if proxy_landfill_map.loc[igroup, 'State_SubGroup_Flag'] ==1:
                        for iyear in np.arange(0, num_years):
                            for isubgroup in np.arange(0,2):
                                emi_temp = vars()['State_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][isubgroup,istate,iyear]
                                if np.sum(mask_state*proxy_temp[isubgroup,:,:,iyear]) > 0 and emi_temp > 0: 
                                    # if state is on grid and proxy for that state is non-zero
                                    weighted_array = data_fn.safe_div(mask_state*proxy_temp[isubgroup,:,:,iyear], \
                                                                  np.sum(mask_state*proxy_temp[isubgroup,:,:,iyear]))
                                    if 'MSW' in proxy_landfill_map.loc[igroup, 'GHGI_Emi_Group']:
                                        Emissions_array_001_msw[:,:,iyear] += emi_temp*weighted_array
                                    elif 'Ind' in proxy_landfill_map.loc[igroup, 'GHGI_Emi_Group']:
                                        Emissions_array_001_ind[:,:,iyear] += emi_temp*weighted_array
                                    Emissions_array_001[:,:,iyear] += emi_temp*weighted_array#_01
                                    vars()['Ext_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']+'_01'][:,:,iyear]+=emi_temp*weighted_array
                                    running_sum[igroup,iyear] += np.sum(emi_temp*weighted_array)
                                else:
                                    Emissions_nongrid[iyear] += emi_temp
                                    running_sum[igroup,iyear] += np.sum(emi_temp)
            
            else:
                if proxy_landfill_map.loc[igroup, 'State_SubGroup_Flag'] ==1:
                    for iyear in np.arange(0, num_years):
                        Emissions_nongrid[iyear] += np.sum(vars()['State_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][:,istate,iyear])
                        running_sum[igroup,iyear] += np.sum(vars()['State_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][:,istate,iyear])    
         
    #2b. if emissions were not allocated to state, allocate national total to grid here (these are in 0.1x0.1 resolution)
    elif proxy_landfill_map.loc[igroup,'State_Proxy_Group'] == '-':
        for iyear in np.arange(0,num_years):
            temp_sum = np.sum(vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']][:,:,iyear])+np.sum(vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear])
            emi_temp = vars()[proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                       data_fn.safe_div(vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']][:,:,iyear], temp_sum)
            Emissions_array_01_temp[:,:,iyear] += emi_temp
            vars()['Ext_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']+'_temp'][:,:,iyear] += emi_temp
            if 'MSW' in proxy_landfill_map.loc[igroup, 'GHGI_Emi_Group']:
                emi_temp = vars()[proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                       data_fn.safe_div(vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']][:,:,iyear], temp_sum)
                Emissions_array_01_temp_msw[:,:,iyear] += emi_temp
            elif 'Ind' in proxy_landfill_map.loc[igroup, 'GHGI_Emi_Group']:
                emi_temp = vars()[proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                       data_fn.safe_div(vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']][:,:,iyear], temp_sum)
                Emissions_array_01_temp_ind[:,:,iyear]+=emi_temp
            
            Emissions_nongrid[iyear] += vars()[proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][iyear] *\
                        data_fn.safe_div(vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear], temp_sum)
            ##DEBUG## running_count += vars()[proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][iyear]
            running_sum[igroup,iyear] += np.sum(vars()[proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                       data_fn.safe_div(vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']][:,:,iyear], temp_sum)) + \
                        (vars()[proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][iyear] *\
                        data_fn.safe_div(vars()[proxy_landfill_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear], temp_sum))    

    #2c. this is the case that GHGI emissions are not mapped (e.g., specified outside of CONUS in the GHGI)
    elif proxy_landfill_map.loc[igroup,'Proxy_Group'] == 'Map_not_mapped':  
        for iyear in np.arange(0, num_years):
            Emissions_nongrid[iyear] += vars()[proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][iyear]
            running_sum[igroup,iyear] += vars()[proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][iyear] 

    print()
for igroup in np.arange(0, len(proxy_landfill_map)):
    vars()['Ext_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])

for iyear in np.arange(0, num_years):    
    Emissions_array_01[:,:,iyear] = data_fn.regrid001_to_01(Emissions_array_001[:,:,iyear], Lat_01, Lon_01)
    Emissions_array_01[:,:,iyear] += Emissions_array_01_temp[:,:,iyear]
    Emissions_array_01_msw[:,:,iyear] = data_fn.regrid001_to_01(Emissions_array_001_msw[:,:,iyear], Lat_01, Lon_01)
    Emissions_array_01_msw[:,:,iyear] += Emissions_array_01_temp_msw[:,:,iyear]
    Emissions_array_01_ind[:,:,iyear] = data_fn.regrid001_to_01(Emissions_array_001_ind[:,:,iyear], Lat_01, Lon_01)
    Emissions_array_01_ind[:,:,iyear] += Emissions_array_01_temp_ind[:,:,iyear]
    calc_emi = np.sum(Emissions_array_01[:,:,iyear]) + np.sum(Emissions_nongrid[iyear]) 
    calc_emi2 = np.sum(Emissions_array_01_msw[:,:,iyear]) +np.sum(Emissions_array_01_ind[:,:,iyear])+ np.sum(Emissions_nongrid[iyear]) 
    calc_emi3 = 0
    for igroup in np.arange(0, len(proxy_landfill_map)):
        vars()['Ext_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] = data_fn.regrid001_to_01(vars()['Ext_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']+'_01'][:,:,iyear], Lat_01, Lon_01)
        vars()['Ext_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += vars()['Ext_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']+'_temp'][:,:,iyear]
        calc_emi3 += np.sum(vars()['Ext_'+proxy_landfill_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear])
    calc_emi3 += np.sum(Emissions_nongrid[iyear])
    summary_emi = EPA_emi_landfill_total.iloc[0,iyear+1] 
    emi_diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if DEBUG==1:
        print(calc_emi)
        print(calc_emi2)
        print(calc_emi3)
        print(summary_emi)
    if abs(emi_diff) < 0.0001:
        print('Year '+ year_range_str[iyear]+': Difference < 0.01%: PASS')
    else: 
        print('Year '+ year_range_str[iyear]+': Difference > 0.01%: FAIL, diff: '+str(emi_diff))
        
ct = datetime.now() 
print("current time:", ct)

#del Emissions_array_001, Emissions_array_001_msw, Emissions_array_001_ind
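
Step 2a above weights each state's emissions by the masked proxy on the fine grid and then sums the fine grid onto the coarse grid. The sketch below reproduces that sequence on a toy 20 x 20 grid. The `regrid_block_sum` helper is a hypothetical block-sum written for this example; the actual implementation of `data_fn.regrid001_to_01` is not shown here and may differ.

```python
import numpy as np

def regrid_block_sum(fine, factor=10):
    """Sum factor x factor blocks of a fine grid onto a coarser grid (mass-conserving)."""
    n_lat, n_lon = fine.shape
    assert n_lat % factor == 0 and n_lon % factor == 0
    return fine.reshape(n_lat // factor, factor, n_lon // factor, factor).sum(axis=(1, 3))

# Made-up 20 x 20 fine grid covering two "states" (codes 1 and 2)
state_map = np.ones((20, 20), dtype=int)
state_map[:, 10:] = 2
proxy = np.zeros((20, 20))
proxy[3, 4] = 1.0      # state 1
proxy[15, 2] = 3.0     # state 1
proxy[8, 17] = 2.0     # state 2
state_emis = {1: 8.0, 2: 5.0}                 # kt, made-up

fine_emis = np.zeros_like(proxy)
for code, emi in state_emis.items():
    mask = (state_map == code).astype(float)  # 1 inside the state, 0 elsewhere
    masked_proxy = mask * proxy
    if masked_proxy.sum() > 0:                # state has proxy data on the grid
        fine_emis += emi * masked_proxy / masked_proxy.sum()

coarse_emis = regrid_block_sum(fine_emis)     # 2 x 2 coarse grid
print(coarse_emis)
```

Summing blocks (not averaging them) keeps the national total unchanged by the regridding, which is the property the QA/QC check in this step relies on.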

##### Step 4.1.4. Save gridded emissions (kt)

In [None]:
#save gridded emissions for each gridding group - for extension

#Initialize file
data_IO_fn.initialize_netCDF(grid_emi_outputfile, netCDF_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)

unique_groups = np.unique(proxy_landfill_map['GHGI_Emi_Group'])
unique_groups = unique_groups[unique_groups != 'Emi_not_mapped']

nc_out = Dataset(grid_emi_outputfile, 'r+', format='NETCDF4')

for igroup in np.arange(0,len(unique_groups)):
    print('Ext_'+unique_groups[igroup])
    if len(np.shape(vars()['Ext_'+unique_groups[igroup]])) ==4:
        ghgi_temp = np.sum(vars()['Ext_'+unique_groups[igroup]],axis=3) #sum month data if data is monthly
    else:
        ghgi_temp = vars()['Ext_'+unique_groups[igroup]]

    # Write data to netCDF
    data_out = nc_out.createVariable('Ext_'+unique_groups[igroup], 'f8', ('lat', 'lon','year'), zlib=True)
    data_out[:,:,:] = ghgi_temp[:,:,:]

#save nongrid data to calculate non-grid fraction extension
data_out = nc_out.createVariable('Emissions_nongrid', 'f8', ('year'), zlib=True)  
data_out[:] = Emissions_nongrid[:]
nc_out.close()

#Confirm file location
print('** SUCCESS **')
print("Gridded emissions (kt) written to file: {}" .format(os.getcwd())+grid_emi_outputfile)
print(' ')

del data_out, ghgi_temp, nc_out

#### Step 4.2. Calculate Gridded Emission Fluxes (molec./cm2/s) (0.1x0.1)

In [None]:
#Convert emissions to emission flux
# convert kt to molec/cm2/s

Flux_array_01_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])
Flux_array_01_annual_msw = np.zeros([len(Lat_01),len(Lon_01),num_years])
Flux_array_01_annual_ind = np.zeros([len(Lat_01),len(Lon_01),num_years])
print('**QA/QC Check: Sum of national gridded emissions vs. GHGI national emissions')
  
for iyear in np.arange(0,num_years):
    calc_emi = 0
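    #leap-year test (Gregorian rule), so the check also holds if the year range is extended
    if year_range[iyear] % 4 == 0 and (year_range[iyear] % 100 != 0 or year_range[iyear] % 400 == 0):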
        year_days = np.sum(month_day_leap)
    else:
        year_days = np.sum(month_day_nonleap)
        
    conversion_factor_01 = 10**9 * Avogadro / float(Molarch4 *year_days * 24 * 60 *60) / area_matrix_01
    Flux_array_01_annual[:,:,iyear] = Emissions_array_01[:,:,iyear]*conversion_factor_01
    Flux_array_01_annual_msw[:,:,iyear] = Emissions_array_01_msw[:,:,iyear]*conversion_factor_01
    Flux_array_01_annual_ind[:,:,iyear] = Emissions_array_01_ind[:,:,iyear]*conversion_factor_01
    #convert back to mass to check
    conversion_factor_annual = 10**9 * Avogadro / float(Molarch4 *year_days * 24 * 60 *60) / area_matrix_01
    calc_emi = np.sum(Flux_array_01_annual[:,:,iyear]/conversion_factor_annual)+np.sum(Emissions_nongrid[iyear])
    calc_emi2 = np.sum(Flux_array_01_annual_msw[:,:,iyear]/conversion_factor_annual)+\
                np.sum(Flux_array_01_annual_ind[:,:,iyear]/conversion_factor_annual)+np.sum(Emissions_nongrid[iyear])
    summary_emi = EPA_emi_landfill_total.iloc[0,iyear+1] 
    emi_diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if DEBUG==1:
        print(calc_emi)
        print(calc_emi2)
        print(summary_emi)
    if abs(emi_diff) < 0.0001:
        print('Year '+ year_range_str[iyear]+': Difference < 0.01%: PASS')
    else: 
        print('Year '+ year_range_str[iyear]+': Difference > 0.01%: FAIL, diff: '+str(emi_diff))
        
Flux_Emissions_Total_annual = Flux_array_01_annual
Flux_Emissions_Total_annual_msw = Flux_array_01_annual_msw
Flux_Emissions_Total_annual_ind = Flux_array_01_annual_ind
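
The conversion factor used above combines four steps: kt to grams (×10⁹), grams to moles (÷ molar mass), moles to molecules (× Avogadro's number), and division by the seconds in the year and the grid-cell area in cm². The sketch below spells those steps out. The molar mass is an assumed value (the notebook uses its own `Molarch4`), and the emission and area values are made up.

```python
import calendar
import numpy as np

AVOGADRO = 6.02214076e23      # molecules per mole
MOLAR_MASS_CH4 = 16.04        # g/mol (assumed value; the notebook uses its own Molarch4)

def kt_to_flux(emis_kt, area_cm2, year):
    """Convert annual emissions (kt) per grid cell to a flux in molecules/cm2/s."""
    days = 366 if calendar.isleap(year) else 365
    seconds = days * 24 * 60 * 60
    grams = emis_kt * 1e9                           # 1 kt = 1e9 g
    molecules = grams / MOLAR_MASS_CH4 * AVOGADRO   # g -> mol -> molecules
    return molecules / seconds / area_cm2

# Made-up example: 1 kt in a grid cell of 100 km2 (= 1e12 cm2) during 2013
emis = np.array([[1.0]])
area = np.array([[1.0e12]])
flux = kt_to_flux(emis, area, 2013)

# Round trip back to kt, mirroring the QA/QC step above
seconds_2013 = 365 * 24 * 60 * 60
back_kt = flux * area * seconds_2013 * MOLAR_MASS_CH4 / AVOGADRO / 1e9
print(flux, back_kt)
```

For the same annual mass, a leap year gives a slightly lower flux because the emissions are spread over one more day.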

-------------
## Step 5. Write netCDF
------------

In [None]:
# yearly data

#Total
#Initialize file
data_IO_fn.initialize_netCDF(gridded_outputfile, netCDF_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)
# Write data to netCDF
nc_out = Dataset(gridded_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Total_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded landfill fluxes written to file: {}" .format(os.getcwd())+gridded_outputfile)

#MSW Landfills
# yearly data
#Initialize file
data_IO_fn.initialize_netCDF(gridded_msw_outputfile, netCDF_msw_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)
# Write data to netCDF
nc_out = Dataset(gridded_msw_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Total_annual_msw
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded MSW landfill fluxes written to file: {}" .format(os.getcwd())+gridded_msw_outputfile)

#Industrial Landfills
# yearly data
#Initialize file
data_IO_fn.initialize_netCDF(gridded_ind_outputfile, netCDF_ind_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)
# Write data to netCDF
nc_out = Dataset(gridded_ind_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Total_annual_ind
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded industrial landfill fluxes written to file: {}" .format(os.getcwd())+gridded_ind_outputfile)

----------
## Step 6. Plot Gridded Data
---------

#### Step 6.1. Plot Annual Emission Fluxes

In [None]:
#Plot Annual Data
#Total
scale_max = 10
save_flag = 0
save_fig = ''
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Total_annual, Lat_01, Lon_01, year_range, title_str,scale_max,save_flag,save_fig)

# MSW
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Total_annual_msw, Lat_01, Lon_01, year_range, title_str_msw,scale_max,save_flag,save_fig)

#IND
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Total_annual_ind, Lat_01, Lon_01, year_range, title_str_ind,scale_max,save_flag,save_fig)


#### Step 6.2 Plot Difference between first and last inventory year

In [None]:
# Plot difference between last and first year

#Total
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Total_annual, Lat_01, Lon_01, year_range, title_diff_str,save_flag,save_fig)

#MSW
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Total_annual_msw, Lat_01, Lon_01, year_range, title_diff_str_msw,save_flag,save_fig)

#IND
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Total_annual_ind, Lat_01, Lon_01, year_range, title_diff_str_ind,save_flag,save_fig)

In [None]:
ct = datetime.now() 
ft = ct.timestamp() 
time_elapsed = (ft-it)/(60*60)
print('Time to run: '+str(time_elapsed)+' hours')
print('** GEPA_5A1_Landfills: COMPLETE **')