## This notebook calculates the impacts of each food item for every country that can potentially produce it, specific to a consumption country and month
- For details, please refer to the manuscript titled 'Methodology and optimization tool for a personalized low environmental impact and healthful diet specific to country and season', published in the Journal of Industrial Ecology (2021).

In [1]:
# PACKAGES
import numpy as np
import pandas as pd
import os
import math
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)

In [2]:
# input files from other python scripts
data_dir = os.path.join('..','nb1','Input')
data_trade = os.path.join('..','nb1','master','Trade')

### User input required to run this notebook (from User_Inputs.ipynb)
- location = country (as a cntry_rev key)
    - cntry_rev = pd.read_pickle('trandict.p')
    - cntry_rev = cntry_rev.T.to_dict()[0]
    - cntry = {v:k for k,v in cntry_rev.items()}
- month = month (as a mnths key)
    - mnths = {'jan':'January','feb':'February','mar':'March','apr':'April','may':'May','jun':'June',
    'jul':'July','aug':'August','sep':'September','oct':'October','nov':'November','dec':'December'}
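
As a minimal sketch of how these lookups fit together, the block below builds a toy stand-in for `trandict.p` and inverts it. The one-row layout of the toy dataframe is an assumption inferred from the `.T.to_dict()[0]` call, and the two country entries are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_pickle('trandict.p'): one row (index 0),
# one column per country code. The real file's contents are not shown here.
toy = pd.DataFrame([{'CH': 'Switzerland', 'ES': 'Spain'}])

cntry_rev = toy.T.to_dict()[0]                # country code -> country name
cntry = {v: k for k, v in cntry_rev.items()}  # country name -> country code

mnths = {'jan': 'January', 'feb': 'February'}  # truncated for the example

location = cntry['Switzerland']  # a cntry_rev key
month = 'jan'                    # a mnths key
```

With these two values in hand, `location` and `month` can be passed on to the functions defined later in the notebook.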

### How to run the notebook: 
- The last function (calc_impacts) calls all other functions in this notebook, so it is the only one you need to call. 
- The code: df = calc_impacts(location,month) will generate a dataframe with the food items, their nutrient content, and the total impacts of each food item for a specific country and month, for each of the countries from which it is normally sourced (based on FAO trade data). You can skip the incorporation of trade data and seasonality by simply taking the impacts from the 'totalGHG_all' column.

### Relevant Columns and Description of Dataframe Content:

- Food Name : This is the name of the food item.
- totalGHG_all : These are the total impacts (all life cycle stages) for the food item depending on which country it is produced in (including transport to consumption country). This includes all possible countries that can produce this item during this month.
- seasonal_kgCO2_updated, proc_elec, proc_heat : each column is a list of the impacts associated with producing and/or processing one gram of that food item in the country it was produced in (list of countries in countriesthatproduceit_season_GHG column) during that month.
- home_cooked, storage_frozen, storage_refrig : each column is a list of the impacts of the corresponding life cycle stage in the country of consumption. Since all impacts for these stages are assumed to occur in the country of consumption, the values are the same for every production country.
- transport: a list of the impacts to transport the food item from each possible country of production to the country of consumption based on the assumptions in the transport section below.
- seasonal_landbio_updated: column that includes a list of tuples of all countries and the associated impacts to produce that food item during that month.
- trade_impacts_GHG, trade_impacts_BIO: These columns represent the countries that would potentially be providing the food item based on FAO trade data when considering all the countries that could possibly produce it. 
- fishlocation : list of the countries which could potentially supply the fish based on tridge.com. This could easily be edited to include more or fewer countries.
- tradename: This is the name used to match with FAO trade data.
- Processing_kWh_gram_elec,Processing_kWh_gram_steam : The values associated with energy requirements for either electrical or heat processing for 1 gram of a particular food item.
- percentimpactall: The percentage contribution of each life cycle stage to the final impact, based on the lowest-impact source of the food item

## Input Data/Files to Build the Country and Month Specific Database

In [3]:
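# this is a function to find necessary input files regardless of where they are stored.
def findfiles(filename,sheet):
    # walks the current working directory; if several files share the name,
    # the last one found is returned
    top = os.getcwd()
    file = None
    for root, dirs, files in os.walk(top):
        for name in files:
            if name==filename:
                try:
                    file = pd.read_pickle(os.path.join(root,filename))
                except Exception:
                    # not a pickle: fall back to reading it as an Excel workbook
                    if len(sheet)>0:
                        file = pd.read_excel(os.path.join(root,filename),sheet_name = sheet)
                    else:
                        file = pd.read_excel(os.path.join(root,filename))
    if file is None:
        raise FileNotFoundError('%s not found under %s'%(filename,top))
    return file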

In [4]:
# Import necessary files
countrycodes = findfiles('For7b.xlsx','')
transport = findfiles('transport_new.p','')
newseasonality = findfiles('for7optseasonality.p','')
dfimport = findfiles('to_build_db.p','')
### other_impacts 'GHG' column has been randomly edited by +/- 5% of the original value to protect distribution of licensed data
other_impacts = findfiles('otherimpacts.xlsx','')
other_impacts.reset_index(inplace=True,drop=True)
processedfoodresidencetime = findfiles('storage_information.xlsx','processedfoodresidencetime')
processedfoodresidencetime=processedfoodresidencetime.fillna(0)
storagedisplayenergy = findfiles('storage_information.xlsx','displaycaseenergyuse')
storagelength = findfiles('storage_information.xlsx','freshfoodstoragetime')
longtermstorage = findfiles('storage_information.xlsx', 'refridgeratedstorage')

## Transport Life Cycle Stage

#### this is code to match country names between databases

In [5]:
countrydict = dict(zip(countrycodes['name'],countrycodes['iso 3166_2']))
clean_dict = {k: v for k, v in countrydict.items() if pd.Series(v).notna().all()}
countrydict2 = {'Antigua and Barbuda':'Antigua & Barbuda',
    'Bolivia, Plurinational State of':'Bolivia','Bosnia and Herzegovina':'Bosnia & Herzegovin', 'Bouvet Island' :'Bouvet I.',
     'British Indian Ocean Territory':'British Indian Ocea','Brunei Darussalam':'Brunei','Cayman Islands':'Cayman Is.',
      'Central African Republic':'Central African Rep','Christmas Island': 'Christmas I.', 'Cocos (Keeling) Islands':'Cocos Is.',
    'Congo, Democratic Republic of the':'Congo, DRC','Cook Islands':'Cook Is.',"Cote d'Ivoire":"Cote d'Ivory",
          'Falkland Islands (Malvinas)':'Falkland Is.','Faroe Islands':'Faroe Is.',
    'French Southern Territories':'French Southern & A','Heard Island and McDonald Islands':'Heard I. & McDonald',
     'Iran (Islamic Republic of)':'Iran',"Korea, Democratic People's Republic of":'North Korea',
    'Korea, Republic of':'South Korea',"Lao People's Democratic Republic":'Laos',
    'Macedonia, the Former Yugoslav Republic of':'Macedonia','Marshall Islands':'Marshall Is.',
    'Micronesia, Federated States of':'Micronesia','Moldova, Republic of':'Moldova','Pitcairn':'Pitcairn Is.',
    'Russian Federation':'Russia','Sao Tome and Principe':'Sao Tome & Principe',
      'Solomon Islands':'Solomon Is.','South Georgia and the South Sandwich Islands':'South Georgia & the',
    'Svalbard and Jan Mayen':'Svalbard','Saint Kitts and Nevis': 'St. Kitts & Nevis', 'Saint Lucia': 'St. Lucia',
 'Saint Pierre and Miquelon': 'St. Pierre & Miquel', 'Saint Vincent and the Grenadines': 'St. Vincent & the G',
      'Saint Helena':'St. Helena','Syrian Arab Republic':'Syria','Taiwan, Province of China':'Taiwan',
 'Tanzania, United Republic Of':'Tanzania','Gambia':'The Gambia','Bahamas':'The Bahamas',
    'Trinidad and Tobago':'Trinidad & Tobago','Turks and Caicos Islands':'Turks & Caicos Is.',
  'United Arab Emirates': 'United Arab Emirate', 'Viet Nam': 'Vietnam', 'Virgin Islands, British': 'Virgin Is.',
   'Wallis and Futuna':'Wallis & Futuna',  'United States Minor Outlying Islands': 'Virgin Islands, U.S.',           
    'Palestinian Territory, Occupied':'West Bank'}
revcountrydict2 = {y:x for x,y in countrydict2.items()}
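# assign the result back rather than using inplace=True on a column selection,
# which newer pandas versions warn about and may not apply
countrycodes['name'] = countrycodes['name'].replace(countrydict2)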

#### Build the transportation database with distances from each country and match country abbreviations

In [6]:
transdict = dict(zip(countrycodes['iso 3166_3'],countrycodes['shortcut']))

In [7]:
# this is the country that produces the food item
transport['country_eco_start']=transport['iso1'].map(transdict)
# this is the country at which the food item will be consumed
transport['country_eco_end']=transport['iso2'].map(transdict)
transport[['capitalport1','capitalport2','roaddistance']] = transport[['capitalport1','capitalport2','roaddistance']].fillna(0)

In [8]:
# factor to go from straight line to road distance
transport['roaddistance_km_new'] = transport['roaddistance_km_new']*1.2

In [159]:
# dictionary of all possible countries included in the optimization
trandict = pd.Series(transport['country1'].values,transport['country_eco_start'].values).to_dict()

In [161]:
# this was done to remove countries that were in the transport file but were not considered in the food impact
# databases. Not necessary to run this code.
def removecountries():
    notfound = []
    for k,v in trandict.items():
        try:
            direc = '/home/walkerch/Optimization_Tool/Output'
            sub = pd.read_pickle(os.path.join(direc,k+'_trade.p'))
            sub.to_pickle(os.path.join(data_trade,k+'_trade.p'))
        except:
            notfound.append((k,v))
    return notfound
#notfound = removecountries()
#trandict = {k:v for k,v in trandict.items() if k not in [i[0] for i in notfound]}
#np.save(os.path.join(data_dir,'trandict.npy'),trandict)

# the countries listed below fail during database building and were therefore removed as options in the dropdown until further investigation
#cntry_rev = pd.read_pickle('trandict.p')
#cntry_rev = cntry_rev.T.to_dict()[0]
#cntry = {v:k for k,v in cntry_rev.items()}
#nocountries = ['Antigua and Barbuda', 'Burundi', 'Burkina Faso', 'Bahamas, The', 'Belize', 'Bermuda',
#    'Barbados','Central African Republic', 'Cook Islands','Comoros', 'Djibouti', 'Dominica', 'Fiji',  'Faroe Islands', 'Guinea', 'Gambia, The', 'Grenada', 
#    'Greenland',  'Guyana','Kiribati','Saint Kitts and Nevis', 'Saint Lucia',  'Monaco','Madagascar',  'Maldives',
#    'Mali', 'Mauritania', 'Montserrat',  'Malawi','New Caledonia', 'Papua New Guinea', 'French Polynesia', 
 #   'Rwanda', 'Solomon Islands', 'Sierra Leone',    'Sao Tome and Principe', 'Suriname', 'Seychelles', 'Tonga', 'Uganda', 
#    'Saint Vincent and the Grenadines', 'Vanuatu', 'Bhutan', 'Afghanistan']
#dropdown = {}
#for k,v in cntry.items():
#    if k not in nocountries:
#        dropdown[k]=v
#import pickle
#with open('dropdown.pickle', 'wb') as handle:
#    pickle.dump(dropdown, handle, protocol=pickle.HIGHEST_PROTOCOL)
#with open('dropdown.pickle', 'rb') as handle:
#    country_dropdown_new = pickle.load(handle)

#### Functions to calculate the distances and impacts depending on food item and location of food source to location of consumption

In [11]:
def transportimpactsfly(countries,location):
    # this function calculates distances and impacts if the food item is flown.
    # inputs: a list of countries the food item can come from and the country where
    # it will be consumed (ex. countries = ['ES','US','CA'],location = 'CH').
    editedtransport = transport[(transport['country_eco_start']==location)]
    transportimpactslist = []
    for i in countries:
        sub = editedtransport[(editedtransport['country_eco_end']==i)]
        try:  
            # if the countries are on different continents, this will have a value of the distance
            # to fly from production country to consumption country (center points of each), and
            # will calculate the impact
            dist_plane = (sub['flightdistance_km_new'].values[0]*
                         other_impacts[other_impacts['key']==
                        "('cutoff35', 'dab659574eb0acdc7894d874752b3b90')"]['GHG'].values[0]/1000000)         
            # if the countries are on the same continent, this value will be 0, and thus have no flying
            # but will rather take into account impacts from road transport.
            if dist_plane == 0:
                dist_road = (sub['roaddistance_km_new'].values[0]*
                    other_impacts[other_impacts['key']==
                        "('cutoff35', 'b9986f3a64dc89380350a4be59f85da1')"]['GHG'].values[0]/1000000)
                transportimpactslist.append(dist_road)       
            else:
                transportimpactslist.append(dist_plane)
        except IndexError:
            impacts = 0
            transportimpactslist.append(impacts)
    return transportimpactslist

In [12]:
def transportfrozen(countries,location):
    # this function calculates distances and impacts if the food item is frozen.
    # inputs: a list of countries the food item can come from and the country where
    # it will be consumed (ex. countries = ['ES','US','CA'],location = 'CH').
    editedtransport = transport[(transport['country_eco_start']==location)]
    transportimpactslist = []
    for i in countries:
        sub = editedtransport[editedtransport['country_eco_end']==i]
        try:
            # if the production country is a different continent than the consumption country, 
            # it is assumed to come by ship. This calculates the distance and impact via ship 
            #(as frozen),and the road distance to get to the initial and final ports from the capital city.
            if sub['roaddistance_km_new'].values[0]==0:  
                dist_ship = (sub['seadistance'].values[0]*
                         other_impacts[other_impacts['key']==
                        "('cutoff35', '44821b59b3166727b65c7b2fc63daab1')"]['GHG'].values[0]/1000000)
                dist_road = ((sub['capitalport1'].values[0]+sub['capitalport2'].values[0])*
                         other_impacts[other_impacts['key']==
                        "('cutoff35', 'd2f13d15af29946b73584e52978f4520')"]['GHG'].values[0]/1000000)

                impacts = dist_ship+dist_road
                transportimpactslist.append(impacts)
            # if the production country is the same continent, it was assumed no ships were necessary 
            # and only frozen road impacts were considered.
            else:
                dist_road2 = (sub['roaddistance_km_new'].values[0]*
                         other_impacts[other_impacts['key']==
                        "('cutoff35', 'd2f13d15af29946b73584e52978f4520')"]['GHG'].values[0]/1000000)
                transportimpactslist.append(dist_road2)
        except IndexError:
                impacts = 0
                transportimpactslist.append(impacts)
    return transportimpactslist

In [13]:
def transportimpacts2(countries,location): 
    # this function calculates distances and impacts if the food item is refrigerated.
    # inputs: a list of countries the food item can come from and the country where
    # it will be consumed (ex. countries = ['ES','US','CA'],location = 'CH').
    editedtransport = transport[(transport['country_eco_start']==location)]
    transportimpactslist = []
    for i in countries:
        sub = editedtransport[(editedtransport['country_eco_end']==i)]
        try:
            # if the production country is a different continent than the consumption country, 
            # it is assumed to come by ship. This calculates the distance and impact via ship 
            # (as refrigerated), and the road distance to get to the initial and final ports from the capital city.
            if sub['roaddistance_km_new'].values[0]==0:  
                dist_ship = (sub['seadistance'].values[0]*
                         other_impacts[other_impacts['key']==
                        "('cutoff35', '61441303ba5832f6f371d58ba9bfc7c0')"]['GHG'].values[0]/1000000)
                dist_road = ((sub['capitalport1'].values[0]+sub['capitalport2'].values[0])*
                         other_impacts[other_impacts['key']==
                        "('cutoff35', 'b9986f3a64dc89380350a4be59f85da1')"]['GHG'].values[0]/1000000)

                impacts = dist_ship+dist_road
                #transportimpactslist.append((i,impacts,'if')) # this was to check
                transportimpactslist.append(impacts)
            # if the production country is the same continent, it was assumed no ships were necessary 
            # and only refrigerated road impacts were considered.
            else:
                dist_road2 = (sub['roaddistance_km_new'].values[0]*
                         other_impacts[other_impacts['key']==
                        "('cutoff35', 'b9986f3a64dc89380350a4be59f85da1')"]['GHG'].values[0]/1000000)
                transportimpactslist.append(dist_road2)
        except IndexError:
                impacts = 0
                transportimpactslist.append(impacts)
    return transportimpactslist
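
The frozen and refrigerated transport functions above share one calculation: a distance in km multiplied by an emission factor and divided by 1,000,000. A minimal sketch of that arithmetic follows, assuming the factors are expressed per tonne-kilometre (the division by 1,000,000 suggests this, but the notebook does not state it). The helper name, distances, and factor values are all hypothetical.

```python
def transport_impact_per_gram(road_km, sea_km, port_km, road_factor, sea_factor):
    """Per-gram transport impact, mirroring the ship/road branch logic above.

    road_km == 0 is taken to mean the two countries are on different
    continents, so the item travels by ship plus road legs to the ports.
    Factors are assumed to be per tonne-km (hypothetical values).
    """
    grams_per_tonne = 1_000_000
    if road_km == 0:
        ship = sea_km * sea_factor / grams_per_tonne
        road = port_km * road_factor / grams_per_tonne
        return ship + road
    return road_km * road_factor / grams_per_tonne

# Same continent: 1,200 km by road only.
same_continent = transport_impact_per_gram(1200, 0, 0, road_factor=0.2, sea_factor=0.02)
# Different continents: 10,000 km by sea plus 300 km of road to and from the ports.
overseas = transport_impact_per_gram(0, 10000, 300, road_factor=0.2, sea_factor=0.02)
```

Passing the tables and factors as arguments, rather than reading module-level dataframes, is only a choice made to keep the sketch self-contained.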

## Seasonality of Fresh Fruits and Vegetables
- which countries grow which crops during which months for each fresh fruit and vegetable
- this is only applicable to fresh fruits and vegetables. It does not apply to processed items (canned, frozen, or dried), items capable of longer term storage (grains, nuts, processed food items), or items not influenced by seasonality (fish, meats and meat/dairy products). Fish capture could be seasonal, but this was not included.
- this was based on the Pfister water demand schedule. **This work could be greatly improved by incorporating country-specific fruit and vegetable seasonality rather than depending on the water demand schedule**

In [14]:
newseasonality['crop'] = newseasonality['crop_jan']
## add mushrooms to always be in season (can be grown year round)
newseasonality.loc[newseasonality['crop']=='mushroom',:'CI_dec'] = 1

In [15]:
mnths = ['jan','feb','mar','apr','may','jun','jul','aug','sep','oct','nov','dec']
# shift by two months to account for harvest time rather than simply water demand. Crops
# demanding water won't be available immediately, but rather have a delay until harvest time.
mnths_change = ['mar','apr','may','jun','jul','aug','sep','oct','nov','dec','jan','feb'] 
seasonality = pd.DataFrame()
for i,j in zip(mnths,mnths_change):
    sub = newseasonality.filter(regex=i)
    sub.columns = [col.replace('_%s'%i,'_%s'%j)for col in sub.columns]
    seasonality = pd.concat([seasonality,sub],axis=1)
seasonality['crop'] = newseasonality['crop']
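
To illustrate the two-month shift, here is a small sketch on a toy table. The column names follow the `<country>_<month>` pattern assumed from the code above, and the values are made up.

```python
import pandas as pd

# Toy water-demand table; values are illustrative only.
toy = pd.DataFrame({'crop': ['tomato'], 'CH_jan': [1], 'CH_feb': [0]})

shift = {'jan': 'mar', 'feb': 'apr'}  # subset of the mnths -> mnths_change pairing
parts = []
for old, new in shift.items():
    sub = toy.filter(regex=old)  # columns whose name contains the old month
    sub.columns = [col.replace('_%s' % old, '_%s' % new) for col in sub.columns]
    parts.append(sub)
shifted = pd.concat(parts, axis=1)
shifted['crop'] = toy['crop']
```

Water demand recorded for January now appears under March, which is how the notebook approximates harvest time.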

In [16]:
# make these products available year round due to storage but only in Switzerland 
# because of trade restrictions and because they are stored long term (Stoessel)
listofcolumns = seasonality.filter(regex='CH').columns.tolist()
mask = (seasonality['crop'].str.contains('carrot'))|(seasonality['crop'].str.contains('onion'))|\
        (seasonality['crop'].str.contains('^apple'))|(seasonality['crop'].str.contains('^potato'))|\
        (seasonality['crop'].str.contains('kiwi'))
# .loc (not .at) is needed here because the selection is a boolean mask, not a single cell
seasonality.loc[mask,listofcolumns] = 1

In [17]:
seasonalitydict = seasonality.set_index('crop').T.to_dict('dict')

In [18]:
def seasonalavailability(location,month):
    # this function sees what crops are in season in the given month in what countries,
    # and matches the regional ecoinvent impacts with the countries that produce those crops.
    # if ecoinvent country specific impact data is available for a country, that is used directly.
    # if ecoinvent country specific impact data is not available, either RoW or Global impacts
    # were used for that country.
    # final output is a column that includes all countries that produce the items for that month,
    # and the impacts associated with production in that country for climate change and biodiversity loss.  
    
    
    df = dfimport.copy() # df that has impacts based on ecoinvent and biodiversity loss for each crop in it (copied so dfimport is not modified).
    # find seasonal availability for all fresh fruits and vegetables and make a new column 
    # with the production impacts and locations
    
    # this isolates fresh fruits and vegetables, as seasonality is only important for non-processed items.
    maskf_v = (df['Group'].isin(['DGC','DGR','FAT']))&(~df['Food Name'].str.contains('canned|frozen|pickled|dried'))
    seasonalitydictmonth = {k1:{k2:v2 for k2,v2 in v1.items() if ((month in k2)&(v2==1)) } for k1,v1 in seasonalitydict.items()}
    df['inseason'] = df[maskf_v]['root *'].map(seasonalitydictmonth) # the countries that produce it that month for fresh foods
    
    # this is all other products in which seasonality doesn't matter.
    maskother = ~maskf_v
    countryproddict = {k1:{k2:v2 for k2,v2 in v1.items() if (v2==1) } for k1,v1 in seasonalitydict.items()}   
    df['inseason2'] = df[maskother]['root *'].map(countryproddict) # all countries that can produce it for processed foods
    
    def countries(row1):
        try:
            c = list(set([i[0] for i in row1]))
        except: 
            c = 'no global production'
        return c
     
    df['inseason'] = df['inseason'].fillna(df['inseason2']) # combines inseason columns into a dictionary of all country sources
                                                            # of that food item for that month. Does not include non-crop items.
    df = df.drop(columns='inseason2')
    
    def breakapart(row1,row2):
        # take the countries this product is produced in during this month, and find the overlapping regional impacts from ecoinvent
        try:
            c= [i.split('_')[0] for i in row1]
            newlist = list(set(c).intersection([i[0] for i in row2]))
            newsubsea1 = list(set(c)-set(newlist))
            for item in row2:
                if item[0] in ['RoW','GLO']:
                    generic = [(i,item[1]) for i in newsubsea1]
                else: pass
            b = [item for item in row2 if item[0] in newlist]+generic
        except:
            c='Not in season anywhere'
            newlist = 'no overlap with regional production data'
            b = 'Not in season anywhere'
        return b
    df['seasonal_kgCO2_updated'] = df.apply(lambda row:breakapart(row['inseason'],\
                                                            row['regional_kgCO2_gram']),axis=1)
    df['seasonal_landbio_updated'] = df.apply(lambda row:breakapart(row['inseason'],\
                                                            row['regional_landbio_pergram']),axis=1)
    df.loc[maskother,'seasonal_kgCO2_updated'] = np.where(df.loc[maskother,'seasonal_kgCO2_updated']=='Not in season anywhere',
                    df.loc[maskother,'seasonal_kgCO2'],df.loc[maskother,'seasonal_kgCO2_updated'] )
    
    df.loc[maskother,'seasonal_landbio_updated'] = np.where(df.loc[maskother,'seasonal_landbio_updated']=='Not in season anywhere',
        df.loc[maskother,'regional_landbio_pergram'],df.loc[maskother,'seasonal_landbio_updated'] )   
    
    df['countriesthatproduceit_season_GHG'] = df.apply(lambda row:countries(row['seasonal_kgCO2_updated']),axis=1) 
    df['countriesthatproduceit_season_BIO'] = df.apply(lambda row:countries(row['seasonal_landbio_updated']),axis=1)
    return df
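
The matching step inside `breakapart` can be sketched in isolation as follows. This is a simplified reading of the logic, not the function itself: the helper name and the impact values are hypothetical, and inputs that are missing (NaN) are not handled here.

```python
def match_regional_impacts(in_season, regional):
    """Pair each in-season country with a production impact.

    in_season: keys such as 'CH_jan' (country code, then month).
    regional:  (region, impact) tuples; 'RoW' or 'GLO' acts as the fallback.
    """
    producers = {key.split('_')[0] for key in in_season}
    specific = [(c, v) for c, v in regional if c in producers]
    covered = {c for c, _ in specific}
    fallback = [v for c, v in regional if c in ('RoW', 'GLO')]
    if not fallback:
        # mirrors the value the original returns when no generic entry exists
        return 'Not in season anywhere'
    generic = [(c, fallback[-1]) for c in sorted(producers - covered)]
    return specific + generic

result = match_regional_impacts(
    ['CH_jan', 'ES_jan', 'IT_jan'],
    [('CH', 0.0004), ('RoW', 0.0009)],  # hypothetical impacts per gram
)
```

Countries with their own regional entry keep it, and every other in-season country receives the generic RoW or GLO value.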

## Processing and Home Cooking Life Cycle Stages
####  electricity (regional) and steam impacts

In [19]:
def elecprocessingimpacts(countries,location,elec):
    # this function requires a list of countries the food item can come from and the country  
    # where it will be consumed as inputs (ex. countries = ['ES','US','CA'],location = 'CH'), as well 
    # as the electricity needed to process one gram of the food item (elec, in kWh/gram).
    proc_elec_impacts_loc = []
    if any(location not in s for s in countries):
        # calculates impacts for each production country based on the required electricity use to process
        for i in countries:
            try:
                prodelecimpacts = elec*other_impacts[(other_impacts['country']==i)&
                    (other_impacts['name']=='market for electricity, medium voltage')]['GHG'].values[0]
                proc_elec_impacts_loc.append(prodelecimpacts)
            # if no data available for production country, defaults to RoW electricity impacts
            except IndexError:
                prodelecimpacts = elec*other_impacts[(other_impacts['country']=='RoW')&
                    (other_impacts['name']=='market for electricity, medium voltage')]['GHG'].values[0]
                proc_elec_impacts_loc.append(prodelecimpacts)
    else:
        # calculates impacts for processing in home country
        prodelecimpact=elec*other_impacts[(other_impacts['country']==location)&
                                          (other_impacts['unit']=='kilowatt hour')]['GHG'].values[0]
        proc_elec_impacts_loc.append(prodelecimpact)
    proc_elec_impacts_loc = [0 if math.isnan(x) else x for x in proc_elec_impacts_loc]
    return proc_elec_impacts_loc

In [20]:
def steamprocessingimpacts(countries,prodsteam):
    # this function requires a list of countries the food item can come from
    # (ex. countries = ['ES','US','CA']) and the heat needed to process one gram of
    # the food item (prodsteam). As steam production is not regionalized, only one
    # impact value (RoW) is used for all countries.
    prodsteamimpact = prodsteam*other_impacts[(other_impacts['name'].str.contains('steam',na=False))
                             &(other_impacts['country']=='RoW')]['GHG'].values[0]/3.6
    steamimpacts = [prodsteamimpact]*len(countries)
    steamimpacts = [0 if math.isnan(x) else x for x in steamimpacts]
    return steamimpacts

In [21]:
def homecookingbeans(countries,location):
    # this function requires a list of countries from which each food item can come from, and  
    # where it will be consumed as inputs (ex. countries = ['ES','US','CA'],location = 'CH').
    # This function only calculates impacts to cook dried beans in the consumption country
    
    cookingenergy = 0.0011625 #kWh/gram
    testimpact = other_impacts[(other_impacts['country']==location)&\
                                    (other_impacts['unit'].str.contains('kilowatt hour'))]['GHG'].values[0]
    
    driedbeans1 = cookingenergy*testimpact
    driedbeans2 = [driedbeans1]*len(countries)
    driedbeans = [0 if math.isnan(x) else x for x in driedbeans2]
    return driedbeans

In [22]:
def homecookingvegetablesmeats(countries,location):
    # this function requires a list of countries from which each food item can come from, and  
    # where it will be consumed as inputs (ex. countries = ['ES','US','CA'],location = 'CH').
    # This function calculates impacts to cook other foods in the consumption country.      
    
    #Energy Use for Cooking and Other Stages in the Life Cycle of Food
    mincooking = 0.000356481 # kWh/gram
    maxcooking = 0.000939815 # kWh/gram
    avgcooking = np.mean([mincooking,maxcooking])
    impact = other_impacts[(other_impacts['country']==location)&\
                                    (other_impacts['unit'].str.contains('kilowatt hour'))]['GHG'].values[0]   
    test1 = avgcooking*impact
    test2 = [test1]*len(countries)
    homecooked = [0 if math.isnan(x) else x for x in test2]  
    return homecooked
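
As a worked example of the home cooking calculation, the sketch below multiplies the average cooking energy by an electricity emission factor. The energy bounds come from the function above; the emission factor is a hypothetical placeholder for the `other_impacts` lookup.

```python
import numpy as np

mincooking = 0.000356481  # kWh/gram, from the function above
maxcooking = 0.000939815  # kWh/gram, from the function above
avgcooking = np.mean([mincooking, maxcooking])

grid_factor = 0.1  # hypothetical emission factor per kWh in the consumption country
countries = ['ES', 'US', 'CA']

# One identical value per candidate production country, because cooking
# always happens in the country of consumption.
homecooked = [avgcooking * grid_factor] * len(countries)
```

The list is repeated only so that it lines up element by element with the per-country production and transport lists.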

## Storage Life Cycle Stage Impacts

In [23]:
def storagefunccooled(countries,location,month):
    # this function requires a list of countries from which each food item can come from, and  
    # where it will be consumed as inputs (ex. countries = ['ES','US','CA'],location = 'CH').
    # This function only calculates refrigerated storage impacts in the consumption country  for
    # milk, cheese, and fresh meat items based on Stoessel
    
    length = len(countries)
    meat_cheese_milk = []
    for i in ['freshmeat','cheese','milk']:
        hoursstored = (processedfoodresidencetime[processedfoodresidencetime['product']==i]['warehouse_h'].values[0]
        +processedfoodresidencetime[processedfoodresidencetime['product']==i]['store_h'].values[0])
        elec = np.mean(storagedisplayenergy[storagedisplayenergy['product'].str.contains(i)]['energy_kWh_kg_h'])/1000# change to per gram
        total = hoursstored*elec # total electricity (kWh/gram) to store and display the cooled food
        cooledstoragedisplay = []
        try:
            # the lookup sits inside the try block so a missing country actually triggers the RoW fallback
            impacts = other_impacts[(other_impacts['country']==location)&
                        (other_impacts['name']=='market for electricity, medium voltage')]['GHG'].values[0]
        except IndexError:
            impacts = other_impacts[(other_impacts['country']=='RoW')&
                        (other_impacts['name']=='market for electricity, medium voltage')]['GHG'].values[0]
        storageimpacts = total*impacts
        cooledstoragedisplay.append(storageimpacts)
        cooledstorage = cooledstoragedisplay*length
        meat_cheese_milk.append(cooledstorage)

    return meat_cheese_milk

In [24]:
def storagefuncfrozen(countries,location,month):
    # this function requires a list of countries from which each food item can come from, and  
    # where it will be consumed as inputs (ex. countries = ['ES','US','CA'],location = 'CH').
    # This function only calculates frozen storage impacts in the consumption country  for
    # frozen food items based on Stoessel
    
    length = len(countries)
    hoursstored = 10*30*24 # 10 months, 30 days per month, 24 hours per day
    elec = np.mean(longtermstorage[longtermstorage['product'].str.contains('frozen')]['energyperkgfoodperhour_kWh'])/1000# change to per gram
    total = hoursstored*elec # total electricity (kWh/gram) to store frozen food long term  
    
    hoursstoreddisplay = 120 #hrs
    elecdisplay = np.mean(storagedisplayenergy[storagedisplayenergy['product'].str.contains('frozen')]['energy_kWh_kg_h'])/1000
    totaldisplay = hoursstoreddisplay*elecdisplay
    
    
    frozenstoragelong = []
    frozenstoragedisplay = []
    try:
        storageimpacts = total*other_impacts[(other_impacts['country']==location)&
                        (other_impacts['name']=='market for electricity, medium voltage')]['GHG'].values[0]
        frozenstoragelong.append(storageimpacts)
        storageimpacts2 = totaldisplay*other_impacts[(other_impacts['country']==location)&
                        (other_impacts['name']=='market for electricity, medium voltage')]['GHG'].values[0]  
        frozenstoragedisplay.append(storageimpacts2)    
        
    except IndexError:
            storageimpacts = total*other_impacts[(other_impacts['country']=='RoW')&
                        (other_impacts['name']=='market for electricity, medium voltage')]['GHG'].values[0]
            frozenstoragelong.append(storageimpacts)
            storageimpacts2 = totaldisplay*other_impacts[(other_impacts['country']=='RoW')&
                        (other_impacts['name']=='market for electricity, medium voltage')]['GHG'].values[0]
            frozenstoragedisplay.append(storageimpacts2)
    frozenstoragelong = frozenstoragelong*length
    frozenstoragedisplay = frozenstoragedisplay*length
    totalstorage = np.array(frozenstoragedisplay)+np.array(frozenstoragelong)
    totalstorage = totalstorage.tolist()
    return totalstorage
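
The frozen storage arithmetic can be followed step by step in the sketch below. The storage durations come from the function above; the energy intensities and the emission factor are hypothetical placeholders for the values read from `storage_information.xlsx` and `other_impacts`.

```python
hours_long_term = 10 * 30 * 24  # 7,200 h in long-term frozen storage
hours_display = 120             # h in the retail display case

# Hypothetical energy intensities in kWh per kg per hour.
long_term_kwh_kg_h = 0.00002
display_kwh_kg_h = 0.0005

# Convert to kWh per gram over the whole storage period.
energy_long = hours_long_term * long_term_kwh_kg_h / 1000
energy_display = hours_display * display_kwh_kg_h / 1000

grid_factor = 0.1  # hypothetical emission factor per kWh
total_storage_impact = (energy_long + energy_display) * grid_factor
```

Because storage is assumed to happen in the country of consumption, this single value is repeated for every candidate production country.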

### Adding life cycle stage impacts and their contribution to the total impact

In [25]:
def sumimpacts(base,trans,elec,heat,sto_fro,sto_ref,home):
    try:
        add =   [(n,m+t+e+h+l+d+hc) for (n,m,t,e,h,l,d,hc) in zip(
           [i[0] for i in base],[i[1] for i in base],[i for i in trans],[i for i in elec],
            [i for i in heat],[i for i in sto_fro],[i for i in sto_ref],[i for i in home])]
        return add
    except:return 'Not in season'

In [26]:
def percentimpacts(base,trans,elec,heat,sto_fro,sto_ref,home):
    # for each producing country: the total impact followed by (stage label, % of total) pairs
    try:
        add = []
        for b,t,e,h,l,d,hc in zip(base,trans,elec,heat,sto_fro,sto_ref,home):
            n,m = b[0],b[1]
            total = m+t+e+h+l+d+hc
            add.append((total,('prod_%s'%n,round((m/total)*100)),
                ('trans',round((t/total)*100)),('elecprod',round((e/total)*100)),
                ('steamprod',round((h/total)*100)),('homecooking',round((hc/total)*100)),
                ('longfreez',round((l/total)*100)),('display',round((d/total)*100))))
        return add
    except Exception:return 'Not in season'
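
The shape of the `percentimpacts` output is easier to see on a toy row. The sketch below is a compact restatement of the function with invented inputs. It also shows that a total of zero ends in the same `'Not in season'` fallback, because dividing plain Python floats by zero raises an exception.

```python
def percentimpacts(base, trans, elec, heat, sto_fro, sto_ref, home):
    # compact restatement of the function above
    try:
        out = []
        for b, t, e, h, l, d, hc in zip(base, trans, elec, heat, sto_fro, sto_ref, home):
            total = b[1] + t + e + h + l + d + hc
            pct = lambda x: round(x / total * 100)
            out.append((total, ('prod_%s' % b[0], pct(b[1])), ('trans', pct(t)),
                        ('elecprod', pct(e)), ('steamprod', pct(h)), ('homecooking', pct(hc)),
                        ('longfreez', pct(l)), ('display', pct(d))))
        return out
    except Exception:
        return 'Not in season'

# invented values: production 3.0, transport 1.0, home cooking 1.0, everything else 0
shares = percentimpacts([('CH', 3.0)], [1.0], [0.0], [0.0], [0.0], [0.0], [1.0])
print(shares[0][0])   # 5.0
print(shares[0][1:4]) # (('prod_CH', 60), ('trans', 20), ('elecprod', 0))

# all-zero impacts: the division by zero is caught and reported as out of season
empty = percentimpacts([('CH', 0.0)], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0])
print(empty)  # Not in season
```

Since each share is rounded separately, the percentages in a row need not sum to exactly 100.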

### These are functions to improve visualization, searching, labeling, etc. Not necessary for calculating impacts

In [27]:
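def addtradedata(column1,column2,column3,column4,location):
    # column1: trade partners as tuples whose first element is the country;
    # column2: (country, impact) tuples. column3 and column4 are accepted but not used.
    # Keeps only the impacts of countries that trade with the consumption country,
    # plus the consumption country itself.
    try:
        tr = column1 +[(location,0)]
        newlist = list(set([i[0] for i in column2]).intersection([i[0] for i in tr]))
        tr = [item for item in column2 if item[0] in newlist]
        return tr
    # no usable trade data (e.g. column1 is NaN after the left merge): keep all producing countries
    except Exception:
        tr = column2
        return tr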
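
The filtering done by `addtradedata` can be illustrated with a small stand-alone sketch. `filter_to_trade_partners` is a hypothetical helper that mirrors the logic above without the unused arguments. The layout of the trade tuples (partner country first) and all values are assumptions made for the example.

```python
def filter_to_trade_partners(trade, impacts, location):
    """Hypothetical helper mirroring the logic of addtradedata."""
    try:
        partners = {t[0] for t in trade} | {location}
    except TypeError:
        # trade is not a list, for example NaN when the merge found no match
        return impacts
    return [item for item in impacts if item[0] in partners]

# invented impacts for three producing countries
impacts = [('CH', 1.5), ('ES', 1.25), ('BR', 3.0)]
# invented trade partners of the consumption country
trade = [('ES', 100), ('IT', 50)]

traded = filter_to_trade_partners(trade, impacts, 'CH')
print(traded)  # [('CH', 1.5), ('ES', 1.25)]

# with no trade record, every producing country is kept
untraded = filter_to_trade_partners(float('nan'), impacts, 'CH')
print(untraded)  # [('CH', 1.5), ('ES', 1.25), ('BR', 3.0)]
```

The consumption country is always retained, so domestic production stays available even when it does not appear in the trade records.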

In [28]:
# country with the lowest impact in row1; falls back to row2, then to 0
def minimpcou(row1,row2):
    try:return sorted([(j,i) for i,j in row1])[0][1]
    except:
        try:
            return sorted([(j,i) for i,j in row2])[0][1]
        except:return 0
# lowest impact value in row1; falls back to row2, then to 0
def minimp(row1,row2):
    try:return sorted([(j,i) for i,j in row1])[0][0]
    except:
        try:return sorted([(j,i) for i,j in row2])[0][0]
        except:return 0
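
The two functions above share one pattern: take the minimum over the first list, fall back to the second list, and finally to 0. A minimal sketch of that pattern, using a hypothetical helper `lowest` and invented values, could look like this.

```python
def lowest(pairs, fallback):
    """Hypothetical helper: (country, value) with the lowest value, or (0, 0)."""
    for candidate in (pairs, fallback):
        try:
            # compare on (value, country) so ties are broken by country code
            value, country = min((v, c) for c, v in candidate)
            return country, value
        except (TypeError, ValueError):
            # empty list, non-iterable value, or entries that are not pairs
            continue
    return 0, 0

best = lowest([('CH', 1.5), ('ES', 1.25)], [])
print(best)  # ('ES', 1.25)

# an empty primary list falls through to the fallback list
from_fallback = lowest([], [('FR', 2.0)])
print(from_fallback)  # ('FR', 2.0)

# nothing usable in either argument
nothing = lowest('Not in season', float('nan'))
print(nothing)  # (0, 0)
```

Returning the country and the value together avoids sorting the same list twice, at the cost of a different return shape from `minimp` and `minimpcou`.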

In [29]:
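# value paired with country column2 in column1; if that country is absent, the mean of all values;
# 0 if column1 cannot be read at all
def getothervalue(column1,column2):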
    try:return [i[1] for i in column1 if column2 in i][0]
    except: 
        try:return np.mean([i[1] for i in column1])
        except: return 0

In [30]:
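def biocalc(column1,column2,column3):
    # column1: (country, GHG impact) tuples; column3: (country, biodiversity impact) tuples.
    # column2 is accepted but not used.
    newlist1 = [i[0] for i in column1] # countries present in column1
    newlist2 = [i for i in column3 if i[0] in newlist1] # keep only those countries
    # drop entries that contain a NaN
    newlist3 = [i for i in newlist2 if not any(isinstance(n, float) and math.isnan(n) for n in i)]
    return newlist3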
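
To make the two filtering steps of `biocalc` concrete, the sketch below restates them in a hypothetical helper, `drop_nan_and_untraded`, and runs it on invented values. One country is missing from the GHG list and one has a NaN biodiversity value.

```python
import math

def drop_nan_and_untraded(ghg_pairs, bio_pairs):
    """Hypothetical restatement of biocalc without its unused second argument."""
    traded = [p[0] for p in ghg_pairs]
    kept = [p for p in bio_pairs if p[0] in traded]
    return [p for p in kept
            if not any(isinstance(n, float) and math.isnan(n) for n in p)]

# invented inputs
ghg_pairs = [('CH', 1.5), ('ES', 1.25)]
bio_pairs = [('CH', 0.25), ('ES', float('nan')), ('BR', 0.5)]

bio_traded = drop_nan_and_untraded(ghg_pairs, bio_pairs)
print(bio_traded)  # [('CH', 0.25)]
```

`BR` is removed because it is absent from the GHG list, and `ES` because its value is NaN. The `isinstance` check is needed because `math.isnan` raises a `TypeError` on the country string.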

In [31]:
# A FUNCTION TO EDIT THE STRINGS SHOWN IN THE DATABASE. THIS IS SIMPLY FOR BETTER VISUALIZATION.
def newcolumn(row1,row2):
    if row2 =='FAT':
        keep = ['frozen', 'dessicated','canned']
        name = [i for i in row1.split(', ')][:1]
        other = [i for i in row1.split(', ') if any(i.startswith(s) for s in keep)]
        final = ', '.join(name+other)
        #name = [i for i in row1.replace(',','').split() if i in keep]#[0]
        return final
    
    if row2 in ('DGR','DGC','DFR','DFC','DAR','DAC'):
        keep = ['frozen', 'boiled in unsalted water','canned','baby','steamed','mature','pickled','fried','baked',\
                'cherry','red','white/mooli','acorn','butternut','spaghetti','green','bulbs','dried','sugar-snap'\
               ,'microwaved','flesh']
        name = [i for i in row1.split(', ')][:1]
        other = [i for i in row1.split(', ') if any(i.startswith(s) for s in keep)]
        final = ', '.join(name+other)
        return final        
    if row2 in ('DBR','DBC'):
        keep = ['frozen', 'boiled','canned','dried']
        name = [i for i in row1.split(', ')][:2][::-1]
        name = [name[0]+' '+name[1]]
        other = [i for i in row1.split(', ') if any(i.startswith(s) for s in keep)]
        final = ', '.join(name+other)
        return final      
    if row2 in ('BAE','BAH','BAK','BLS','BLM','BLH','BLF','BN','BJC','BNV','BAV'):
        try:
            
            return ([i for i in row1.split(', ')][1]+' '+[i for i in row1.split(', ')][0]+' '+\
                   ', '.join([i for i in row1.split(', ')][2:]))
        except IndexError:
            try:
                return ([i for i in row1.split(', ')][1]+' '+[i for i in row1.split(', ')][0])
            except IndexError:
                return ([i for i in row1.split(', ')][0])           
    if row2 == 'MI':
        return row1.replace('_avg','')
   
    else: 
        return row1
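
The vegetable and fat branches of `newcolumn` follow one rule: keep the first comma-separated descriptor, then append every descriptor that starts with one of the `keep` words. The sketch below isolates that rule in a hypothetical helper, `shorten`. The food names are invented for illustration.

```python
def shorten(food_name, keep):
    """Hypothetical helper: first descriptor plus any descriptor starting with a keep word."""
    parts = food_name.split(', ')
    return ', '.join(parts[:1] + [p for p in parts if p.startswith(tuple(keep))])

keep = ['frozen', 'boiled in unsalted water', 'canned', 'cherry']

short1 = shorten('tomatoes, cherry, raw', keep)
print(short1)  # tomatoes, cherry

short2 = shorten('carrots, old, raw', keep)
print(short2)  # carrots

# startswith matches prefixes, so 'canned in brine' is kept by the 'canned' keyword
short3 = shorten('sweetcorn, canned in brine, drained', keep)
print(short3)  # sweetcorn, canned in brine
```

Because the match is prefix-based and also runs over the first descriptor, a name whose first part begins with a `keep` word would have that part repeated in the output.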
        

### Function only used in the case of Swiss diet

In [32]:
# function only used in the Swiss case to force choosing Swiss products even when lower-impact items
# are available.
def editforCH(df):
    mask = (df['Food Name'].str.contains('carrot'))|(df['Food Name'].str.contains('onion'))|\
              (df['Food Name'].str.contains('^apple'))|(df['Food Name'].str.contains('^potato'))|\
                (df['Food Name'].str.contains('cheese'))|\
            ((df['Food Name'].str.contains('milk'))&(~df['Food Name'].str.contains('soy')))

    for i in df[mask].index.tolist():
        try:
            df.at[i,'optimization_value_GHG_1_trade'] = [j[1] for j in df.loc[i,'trade_impacts_GHG'] if j[0]=='CH'][0]
            df.at[i,'optimization_country_GHG_1_trade'] = [j[0] for j in df.loc[i,'trade_impacts_GHG'] if j[0]=='CH'][0]
            df.at[i,'optimization_value_BIO_1_trade'] = [j[1] for j in df.loc[i,'trade_impacts_BIO'] if j[0]=='CH'][0]
            df.at[i,'optimization_country_BIO_1_trade'] = [j[0] for j in df.loc[i,'trade_impacts_BIO'] if j[0]=='CH'][0]
        except:
            try:
                df.at[i,'optimization_value_GHG_1_trade'] = min([(j[1],j[0]) for j in df.loc[i,'trade_impacts_GHG']])[0]
                df.at[i,'optimization_country_GHG_1_trade'] = min([(j[1],j[0]) for j in df.loc[i,'trade_impacts_GHG']])[1]
                df.at[i,'optimization_value_BIO_1_trade'] = min([(j[1],j[0]) for j in df.loc[i,'trade_impacts_BIO']])[0]
                df.at[i,'optimization_country_BIO_1_trade'] = min([(j[1],j[0]) for j in df.loc[i,'trade_impacts_BIO']])[1]        
            except:
                df.at[i,'optimization_value_GHG_1_trade'] = 0
                df.at[i,'optimization_country_GHG_1_trade'] = 0
                df.at[i,'optimization_value_BIO_1_trade'] = 0
                df.at[i,'optimization_country_BIO_1_trade'] = 0                
           
   
    return df
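
The mask in `editforCH` mixes anchored patterns (`'^apple'`, `'^potato'`) with unanchored ones (`'carrot'`, `'cheese'`). `Series.str.contains` treats its pattern as a regular expression by default, so the effect of the anchor can be sketched with the standard `re` module on a few invented food names.

```python
import re

# invented food names
names = ['apples, eating, raw', 'pineapple, raw',
         'potatoes, old, boiled', 'sweet potato, baked']

# '^' restricts the match to the start of the name
anchored = [n for n in names
            if re.search('^apple', n) or re.search('^potato', n)]
print(anchored)  # ['apples, eating, raw', 'potatoes, old, boiled']

# without the anchor, any name containing the word matches
unanchored = [n for n in names
              if re.search('apple', n) or re.search('potato', n)]
print(len(unanchored))  # 4
```

The anchor keeps pineapple and sweet potato out of the selection, while unanchored patterns such as `'cheese'` match the word anywhere in the name.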

## Function below uses all functions above to build the food item, nutrient, impact database that is pulled into the diet optimization tool

In [107]:
def calc_impacts(location,month):
    
########   BUILD COUNTRY/MONTH DATABASE  ###################################################################    

######################################     seasonality   ##################################################

    # run the seasonality function to see which countries produce which food items and when.
    
    df = seasonalavailability(location,month)
    df['Food Item'] = df.apply(lambda row:newcolumn(row['Food Name'],row['Group']),axis=1)

######################################     new columns to be populated      ##########################    

    for i in ['home_cooked','transport','storage_frozen','storage_refrig','proc_elec','proc_heat','totalGHG_all',
              'percentimpacts_all','optimization_country_GHG_1_all','optimization_value_GHG_1_all',
              'optimization_country_BIO_1_all','optimization_value_BIO_1_all','trade_impacts_GHG','trade_impacts_BIO',
              'optimization_value_GHG_1_trade','optimization_country_GHG_1_trade','bio_GHGopt_value1',
              'optimization_value_BIO_1_trade','optimization_country_BIO_1_trade','GHG_bioopt_value1']:
        df[i]=0
        df[i]=df[i].astype(object)
        
######################################     home cooking impacts  ###########################################

    # see Diet_Optimization_Tool.ipynb for what the group codes mean
    # these impacts are based on the electricity mix of the country of consumption

    # isolate food items that require home cooking for dried beans
    maskdriedbeans = (df['Food Name'].str.contains('dried'))&((df['Group'].str.startswith('DBR'))|\
        (df['Group'].str.startswith('DFR')))&(~df['Food Name'].str.contains('canned'))#maskdriedbeans
    boiledathomebean =  (df['Food Name'].str.contains('beans|lentils'))&(df['Food Name'].str.contains('boiled'))#boiledathomebean
    homecooking = np.logical_or(maskdriedbeans,boiledathomebean)
    
    # isolate other food items that would require cooking prior to consumption
    cookedgrains = (df['Group'].str.startswith('AC'))|(df['Group'].str.startswith('DA'))|\
        (df['Group'].str.startswith('AA'))|(df['Group'].str.startswith('AD'))
    cookedmeatandfish = ((df['Group'].str.startswith('M'))&(df['Group'].str.endswith('C')))|\
          ((df['Group'].str.startswith('J'))&(df['Group'].str.endswith('C')))
    cookedstarch = df['Group']=='DAC'
    vegproducts = df['Group']=='VEG'
    vegcooked = df['Group']=='DGC' # cookedveggies
    cookedeggs = (df['Group'].str.startswith('CA'))
    cookedall = pd.Series(np.any((vegproducts,vegcooked,cookedgrains,cookedeggs,cookedmeatandfish,cookedstarch),axis=0))
    
    # calculate the cooking impacts and add them to the home_cooked column
    df.at[homecooking,'home_cooked'] = df[homecooking].apply(lambda row:homecookingbeans(\
                [i[0] for i in row['seasonal_kgCO2_updated']],location),axis=1) # add impacts for cooking beans
    df.at[cookedall,'home_cooked'] = df[cookedall].apply(lambda row:homecookingvegetablesmeats(\
                [i[0] for i in row['seasonal_kgCO2_updated']],location),axis=1)  # add impacts for cooking everything else
    df.at[~(homecooking|cookedall),'home_cooked'] = df[(~(homecooking|cookedall))]\
        .apply(lambda row:[0]*(len([i[0] for i in row['seasonal_kgCO2_updated']])),axis=1) # add 0 for things that don't need to be cooked

######################################     transport impacts   ############################################
    
    # isolate meat, fish and frozen items to capture impacts from frozen transport
    meat = [1035,1058,1166,1501,1527,1540,1553,1562,1570,867,977] # roots for meat products
    frozenalways = df['Food Name'].str.contains('frozen') # frozen food products
    frozenfortransport = df['root *'].isin(meat)&(~df['Food Name'].str.contains('canned'))
    frozen = np.logical_or(frozenalways,frozenfortransport)

    # isolate canned food items to incorporate additional impacts due to higher weight
    cannedprocessed = df['Food Name'].str.contains('canned|pickled')

    # isolate fresh food items that will likely be flown because they are delicate
    maskflown = (df['root *'].isin(['raspberry','blueberry','strawberry','lettuce','spinach',
                    'papaya','asparagus','tropicalnes','mango'])& df['Food Name'].str.contains('raw')&
                  ~df['Food Name'].str.contains('frozen|canned'))

    # all remaining food items
    remaining = ~pd.Series(np.any((cannedprocessed,frozen,maskflown),axis=0))

    # calculate transport impacts zip1: masks above, zip2:functions to calculate transport impacts
    # see separate functions above for details of calculation methods.
    for mask,function in zip([maskflown,frozen,cannedprocessed,remaining],\
                 [transportimpactsfly,transportfrozen,transportimpacts2,transportimpacts2]):
        df.at[mask,'transport'] = df.loc[mask].apply(lambda row:
                                function([i[0] for i in row['seasonal_kgCO2_updated']],location),axis=1)

    # double transport impacts for canned items to account for weight of can and added water
    df.at[cannedprocessed,'transport'] = df.loc[cannedprocessed].apply(lambda row:([i*2 for i in row['transport']]),axis=1)
    
######################################     storage impacts   ############################################
    
    # isolate food items that require refrigerated or frozen storage (frozen items are above)
    meatstorage = df['root *'].isin(meat)&(~df['Food Name'].str.contains('frozen|canned|dried')) #refrigerated
    milk = (df['root *']==882)&(~df['Food Name'].str.contains('cheese|butter')) #refrigerated
    cheese = (df['root *']==882)&(df['Food Name'].str.contains('cheese|butter')) #refrigerated

    # calculate impacts for frozen storage
        ### this currently assumes the maximum storage time. A variable storage time could be added,
        ### based on how long after harvest the item is consumed.
    df.at[frozenalways,'storage_frozen'] = df[frozenalways].apply(
        lambda row:storagefuncfrozen([i[0] for i in row['seasonal_kgCO2_updated']],location,month),axis=1)
    # zero values for anything not requiring frozen storage
    df.at[~frozenalways,'storage_frozen'] = df[~frozenalways].apply(
        lambda row:[0]*(len([i[0] for i in row['seasonal_kgCO2_updated']])),axis=1)

    # calculate impacts for refrigerated storage
    for prod,num in zip([meatstorage,milk,cheese],[0,1,2]):
        df.at[prod,'storage_refrig'] = df.loc[prod].apply(lambda row:storagefunccooled\
        ([i[0] for i in row['seasonal_kgCO2']],location,month)[num],axis=1)
    # zero values for anything not requiring refrigeration
    df.at[~pd.Series(np.any((meatstorage,milk,cheese),axis=0)),'storage_refrig'] = \
            df.loc[~pd.Series(np.any((meatstorage,milk,cheese),axis=0))].\
            apply(lambda row:[0]*(len([i[0] for i in row['seasonal_kgCO2_updated']])),axis=1)

    
######################################     processing impacts   ############################################    
    
    # isolate all food items undergoing processing
    mask1 = df['Processing_kWh_gram_elec'].notnull()
    mask2 = df['Processing_kWh_gram_steam'].notnull()
    
    # calculate impacts for all items requiring processing, and add zero for those not requiring it.
        ## impacts are based on the electricity mix of the country in which the product was grown and processed
    df.at[mask1,'proc_elec'] = df[mask1].apply(lambda row:elecprocessingimpacts([i[0] for i in row['seasonal_kgCO2_updated']],\
                                location,row['Processing_kWh_gram_elec']),axis=1)
    df.at[~mask1,'proc_elec'] = df[~mask1].apply(lambda row:[0]*len([i[0] for i in row['seasonal_kgCO2_updated']]),axis=1)
    df.at[mask2,'proc_heat'] = (df[mask2]['Processing_kWh_gram_steam']*other_impacts[(other_impacts['name'].str.contains('steam',na=False))
                    &(other_impacts['country']=='RoW')]['GHG'].values[0]/3.6)
    df.at[mask2,'proc_heat']=df[mask2].apply(lambda row:[row['proc_heat']]*len([t[0] for t in row['seasonal_kgCO2_updated']]),axis=1)
    df.at[~mask2,'proc_heat'] = df[~mask2].apply(lambda row:[0]*len([i[0] for i in row['seasonal_kgCO2_updated']]),axis=1)
    

##############################         add all life cycle stage impacts   #####################################        
    
    df['totalGHG_all'] = df.apply(lambda row:sumimpacts(row['seasonal_kgCO2_updated'],row['transport'],
                    row['proc_elec'],row['proc_heat'],row['storage_frozen'],row['storage_refrig'],
                   row['home_cooked']),axis=1)
    df['percentimpacts_all'] = df.apply(lambda row:percentimpacts(row['seasonal_kgCO2_updated'],row['transport'],
                    row['proc_elec'],row['proc_heat'],row['storage_frozen'],row['storage_refrig'],
                   row['home_cooked']),axis=1)
    

######################################     incorporate trade   ############################################
    
    # totalGHG_all is impacts from all producing countries.
    # trade_impacts_GHG is impacts from only countries trading with the country of consumption 
    # for that particular food item

    trade = findfiles(location+'_trade.p','')
    newdf = pd.merge(df,trade,how = 'left', right_on = 'Food', left_on = 'tradename')
    newdf['trade_impacts_GHG'] = newdf.apply(lambda row:addtradedata(row['Trade'],row['totalGHG_all'],row['Food Name'],\
                                                                row['Group'],location),axis=1)
    mask = newdf['trade_impacts_GHG'].str.len()==0
    newdf.at[mask,'trade_impacts_GHG']=newdf[mask]['totalGHG_all']
    

###################################### find country with lowest impact ############################################    

    newdf['seasonal_landbio_updated']=newdf.apply(lambda \
        row:[i for i in row['seasonal_landbio_updated'] if not any(isinstance(n, float) and math.isnan(n) for n in i)],axis=1)
    
    newdf['optimization_value_GHG_1_trade'] = newdf.apply(lambda row:minimp(row['trade_impacts_GHG'],row['totalGHG_all']),axis=1)
    newdf['optimization_country_GHG_1_trade'] = newdf.apply(lambda row:minimpcou(row['trade_impacts_GHG'],row['totalGHG_all']),axis=1)
    newdf['bio_GHGopt_value1'] = newdf.apply(lambda row:\
            getothervalue(row['seasonal_landbio_updated'],row['optimization_country_GHG_1_trade']),axis=1)
    
    newdf['trade_impacts_BIO'] = newdf.apply(lambda row:biocalc(row['trade_impacts_GHG'],\
           row['optimization_country_GHG_1_trade'],row['seasonal_landbio_updated']),axis=1)
    
    newdf['optimization_country_BIO_1_trade'] = newdf.apply(lambda row:minimpcou(row['trade_impacts_BIO'],\
                                                    row['seasonal_landbio_updated']),axis=1)
    
    newdf['optimization_value_BIO_1_trade'] = newdf.apply(lambda row:minimp(row['trade_impacts_BIO'],\
                                                    row['seasonal_landbio_updated']),axis=1)
    newdf['GHG_bioopt_value1'] = newdf.apply(lambda row:\
            getothervalue(row['totalGHG_all'],row['optimization_country_BIO_1_trade']),axis=1)
    
    # this was only done for Switzerland to match with known food availabilities (see function above)
    if location == 'CH':
        newdf = editforCH(newdf)

    
######### formatting for User_Input file.
    # copy of the food name, with 'dried,' removed for canned items
    newdf['FoodItem_2'] = newdf['Food Name'].map(lambda x: x.replace('dried,','') if 'canned' in x else x)
    return newdf

In [70]:
# to run just this notebook to get the dataframe with impacts specific to a country and month:
#df = calc_impacts('CH','aug')