## Food for thought: how to eat sustainably

### Note to reader
This is a very rough draft. The order and layout are nowhere near final; some visualizations will be made more compact using subplots and interactivity, and some will be drastically altered or left out entirely.


### Introduction

In the face of growing concerns about environmental sustainability, understanding the impact of our food choices has become paramount. This data story aims to scientifically examine the sustainability of various food sources, taking into account crucial factors like CO2 emissions, water usage, and the distinction between local and global production. By presenting this information concisely and incorporating charts and plots, our goal is to provide accessible insights that empower individuals to make informed decisions for more sustainable food consumption practices.

A significant debate centers around whether local or global food production is more environmentally preferable. Advocates of the local perspective argue that transportation emissions significantly contribute to the carbon footprint of our food. They stress the importance of reducing the distance between farm and table to enhance sustainability. Conversely, proponents of the global perspective emphasize evaluating the environmental impact of different food types, such as animal products versus plant-based alternatives. Metrics like calories or protein produced per unit of emissions are considered crucial in this evaluation.

To comprehensively assess the sustainability of food sources, this data story will analyze a range of common options. Environmental impact will be examined across multiple dimensions, including CO2 emissions, water usage, and the balance between local and non-local production.

### Dataset and preprocessing

The dataset we used can be downloaded from https://www.science.org/doi/10.1126/science.aaq0216, where it is called 'aaq0216_datas1.xls'; the code below reads it as 'dataset1.xls'. It is an xls file containing multiple Excel sheets, and it covers variables such as land use (in m²), greenhouse gas emissions (in kg CO₂-eq), eutrophication (in kg PO₄³⁻-eq) and freshwater use (in liters) for 43 different agricultural products.

To look at the global totals of greenhouse gas emissions for each product, for example, we can use the following dataframe:

In [46]:
import plotly.graph_objs as go
import plotly.express as px
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
from plotly.subplots import make_subplots
from ipywidgets import interact, interactive, fixed, interact_manual
from ipywidgets import GridspecLayout
import ipywidgets as widgets

layout = go.Layout(
        font=dict(
        family='Lato, "Helvetica Neue", Helvetica, Arial, "Liberation Sans", sans-serif',
        ),
        title=dict(font=dict(size=24)),
        newshape_label_padding=8,
        margin_pad=5,
        legend=dict(font=dict(size=14)),
    )

df_ghg = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A, F:L")
df_ghg['Total'] = df_ghg[['LUC', 'Feed', 'Farm', 'Processing', 'Transport', 'Packging', 'Retail']].sum(axis=1)
df_ghg.head()


# helper functions 

def get_traceindex(category, fig):
    for i, trace in enumerate(fig.data):
        if trace.name == category:
            return i
    raise ValueError(f'No trace with name "{category}" found in figure')

Here, we can see that the greenhouse gas emissions are divided into different stages of the production and distribution of the product. We added another column to see the total amount of greenhouse gas emissions, as that is often the most important metric to compare.
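
The total can be reproduced without the spreadsheet. The sketch below sums per-stage emissions into a total for each product, mirroring the `sum(axis=1)` call above. `add_total` is a hypothetical helper, and the product names and numbers are made-up placeholders rather than values from the dataset.

```python
# Supply-chain stages, spelled as in the notebook (including the "Packging" column name).
STAGES = ['LUC', 'Feed', 'Farm', 'Processing', 'Transport', 'Packging', 'Retail']


def add_total(row, stages=STAGES):
    """Return a copy of `row` with a 'Total' key holding the sum over all stages."""
    total = sum(row[stage] for stage in stages)
    return {**row, 'Total': total}


# Placeholder values for illustration only; they are NOT taken from the dataset.
example_rows = [
    {'Product': 'Product A', 'LUC': 1.0, 'Feed': 0.0, 'Farm': 2.0, 'Processing': 0.5,
     'Transport': 0.25, 'Packging': 0.25, 'Retail': 0.0},
    {'Product': 'Product B', 'LUC': 4.0, 'Feed': 2.0, 'Farm': 8.0, 'Processing': 1.0,
     'Transport': 0.5, 'Packging': 0.25, 'Retail': 0.25},
]

rows_with_total = [add_total(row) for row in example_rows]
for row in rows_with_total:
    print(row['Product'], row['Total'])
```

Keeping the stage list in one place means the same names can be reused when reshaping the data for the stacked bar chart later on.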

### Brief description of arguments (only for draft)
Debate 1: Emissions vs Other Factors

Perspective 1: Greenhouse Gas Emissions
- Argument 1: Emphasizing the Carbon Footprint
  - This argument highlights the importance of considering greenhouse gas emissions as the primary sustainability factor when evaluating food sources.
- Argument 2: Comparative Emissions Analysis
  - This argument suggests comparing the emissions produced by different food sources to identify the most environmentally friendly options.
- Visualization 1: Amount of freshwater withdrawals per nutritional unit
  - This visualization illustrates the disparity in water usage per nutritional unit across various food sources, highlighting the substantial water footprint associated with nuts and cheese compared to plant-based alternatives. It supports the argument that emissions should be a primary consideration when evaluating sustainability.

Perspective 2: Multiple Sustainability Factors
- Argument 1: Holistic Sustainability Assessment
  - This argument advocates for considering factors beyond emissions, such as land use, in order to assess the overall sustainability of food sources.
- Argument 2: Environmental Impact Analysis
  - This argument emphasizes the need for a comprehensive evaluation of food production's broader environmental consequences.
- The plan is for Visualisation 3 to support this perspective.

Debate 2: Global vs Local Sourcing

Perspective 1: Importance of Local Production
- Argument 1: Emissions from Food Transportation
  - This argument highlights the significant emissions generated during the transportation of food and advocates for reducing them through local sourcing.
- Argument 2: Total Emissions Reduction
  - This argument asserts that prioritizing local production can contribute to a substantial reduction in the overall emissions associated with food consumption.
- Visualization 4: Scatter plot of global production and greenhouse gas emissions
  - This visualization demonstrates the correlation between high global production and lower emissions per kilogram, supporting the argument that producing food locally can help reduce emissions.
- Visualization 5: Bar chart of transport emissions per product
  - This visualization highlights the significant CO2 costs associated with food transport, supporting the argument that sourcing locally can reduce emissions. However, it may require further refinement to enhance its effectiveness.

Perspective 2: Food Type Matters
- Argument 1: Evaluation of Food Choices
  - This argument emphasizes the importance of considering the type of food consumed, particularly distinguishing between animal-based and plant-based products.
- Argument 2: Transport Emissions Comparison
  - This argument suggests evaluating the emissions associated with food transport in order to make informed choices about sourcing.
- Visualization 2: Sunburst figures of vegan vs non-vegan emissions
  - These figures visually depict the stark contrast in land use and greenhouse gas emissions between vegan and non-vegan products. By showcasing the impact of food choices on multiple sustainability factors, it supports the argument that evaluating the type of food consumed is essential.
- Visualization 6: Animated bar chart of transport emissions per product compared to other types of emissions
  - This visualization demonstrates the proportion of greenhouse gas emissions attributed to transport for each food type, emphasizing that transport emissions constitute a small portion of the total emissions. It supports the argument that the type of food consumed is more crucial than the sourcing location.

Visualizations:
- Visualization 1: Amount of freshwater withdrawals per nutritional unit
- Visualization 2: Sunburst figures of vegan vs non-vegan emissions
- Visualization 3: Currently aimed at exploration (not linked to specific arguments)
- Visualization 4: Scatter plot of global production and greenhouse gas emissions
- Visualization 5: Bar chart of transport emissions per product
- Visualization 6: Animated bar chart of transport emissions per product compared to other types of emissions

We have focused until now on working with visualizations and exploring the data; hence, the various perspectives have gotten less attention so far. We'll follow the data where it leads us.

#### Visualisations

##### Visualisation 1: bar charts of the amount of freshwater used for each product, measured in liters per NU (nutritional unit) or FU (functional unit)

* Certain food groups, like animal products, require much more freshwater to produce than other groups like vegetables, per nutritional unit.
* Thus, it is more efficient to produce certain food groups than others in terms of freshwater.
* This supports the perspective that different food groups have varying environmental impact, and our food choices could greatly assist in environmental sustainability.

In [3]:
df_land = pd.read_excel('dataset1.xls', sheet_name=1, skiprows=2,nrows=43)
colormap = {
    'Wheat & Rye (Bread)':'darkblue',
    'Maize (Meal)':'darkblue',
    'Barley (Beer)':'darkblue',
    'Oatmeal':'darkblue',
    'Rice':'darkblue',
    'Potatoes':'darkblue',
    'Cassava':'darkblue',
    'Cane Sugar':'magenta',
    'Beet Sugar':'magenta',
    'Other Pulses':'brown',
    'Peas':'brown',
    'Nuts':'brown',
    'Groundnuts':'brown',
    'Soymilk':'purple',
    'Tofu':'purple',
    'Soybean Oil':'black',
    'Palm Oil':'black',
    'Sunflower Oil':'black',
    'Rapeseed Oil':'black',
    'Olive Oil':'black',
    'Tomatoes':'green',
    'Onions & Leeks':'green',
    'Root Vegetables':'green',
    'Brassicas':'green',
    'Other Vegetables':'green',
    'Citrus Fruit':'orange',
    'Bananas':'orange',
    'Apples':'orange',
    'Berries & Grapes':'orange',
    'Wine':'orange',
    'Other Fruit':'orange',
    'Coffee':'gold',
    'Dark Chocolate':'gold',
    'Bovine Meat (beef herd)':'red',
    'Bovine Meat (dairy herd)':'red',
    'Lamb & Mutton':'red',
    'Pig Meat':'red',
    'Poultry Meat':'red',
    'Milk':'grey',
    'Cheese':'grey',
    'Eggs':'grey',
    'Fish (farmed)':'blue',
    'Crustaceans (farmed)':'blue'
}
fig = px.histogram(df_land, x='Product', y='Mean.5', color='Product', color_discrete_map=colormap)
fig.update_layout(
    font=dict(
        family='Lato, "Helvetica Neue", Helvetica, Arial, "Liberation Sans", sans-serif',
        size=13,
    ),
    yaxis=dict(title='Average Freshwater Withdrawals (L/NU)'),
    xaxis=dict(title='Products'),
    title='Freshwater Withdrawals (L/NU)',
    showlegend=False,
)
values = ['starchy', 'sugars', 'legume', 'vegan alt', 'oils', 'vegetables', 'fruits','proc nuts', 'meat', 'animal prod', 'fish']
colors = ['darkblue', 'magenta', 'brown', 'purple', 'black', 'green', 'orange','gold', 'red', 'grey', 'blue']


for i in range(len(values)):
    fig.add_annotation(
        x=1.05, y=0.85 - i*0.10,
        xref='paper', yref='paper',
        text=values[i],
        showarrow=True,
        arrowcolor='white',
        font=dict(color='white'),
        borderwidth=1,
        bgcolor= colors[i]
    )
fig.show()

#### fig1: Amount of freshwater withdrawals per nutritional unit

Figure 1 shows the liters of freshwater used per nutritional unit for each product. Nuts and cheese stand out: they use a lot of water relative to the nutrition they provide. Most meat substitutes are made from wheat, soy and various kinds of fungi (https://www.milieucentraal.nl/eten-en-drinken/milieubewust-eten/vleesvervangers/). The figure shows a substantial difference between the water used for the different kinds of meat and the water used for wheat and soymilk, which points to a large gap in water use between meat and plant-based products.
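
To compare whole food groups rather than individual bars, the per-product values can be averaged per group. The following is a minimal sketch: the `GROUPS` mapping is a hypothetical, partial mapping based on the colour groups used in the chart, `mean_per_group` is a hypothetical helper, and the numbers are placeholders rather than values from the dataset.

```python
from collections import defaultdict

# Hypothetical product-to-group mapping; extend it to cover all 43 products.
GROUPS = {
    'Cheese': 'animal products',
    'Milk': 'animal products',
    'Tomatoes': 'vegetables',
    'Brassicas': 'vegetables',
}


def mean_per_group(values, groups=GROUPS):
    """Average a {product: value} mapping within each food group."""
    buckets = defaultdict(list)
    for product, value in values.items():
        buckets[groups[product]].append(value)
    return {group: sum(vals) / len(vals) for group, vals in buckets.items()}


# Placeholder liters-per-nutritional-unit figures, for illustration only.
placeholder_water = {'Cheese': 30.0, 'Milk': 10.0, 'Tomatoes': 4.0, 'Brassicas': 2.0}
group_means = mean_per_group(placeholder_water)
print(group_means)
```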

In [4]:
import plotly.graph_objs as go
import plotly.express as px
import pandas as pd

df_land = pd.read_excel('dataset1.xls', sheet_name=0, skiprows=2,nrows=43)

fig = px.histogram(df_land, x='Product', y='Mean.5', color='Product', color_discrete_map=colormap)
fig.update_layout(
    font=dict(
        family='Lato, "Helvetica Neue", Helvetica, Arial, "Liberation Sans", sans-serif',
        size=13,
    ),
    xaxis=dict(title='Products'),
    yaxis=dict(title='Average Freshwater Withdrawals (L/FU)'),
    title = 'Freshwater Withdrawals (L/FU)',
    showlegend=False
)
values = ['starchy', 'sugars', 'legume', 'vegan alt', 'oils', 'vegetables', 'fruits','proc nuts', 'meat', 'animal prod', 'fish']
colors = ['darkblue', 'magenta', 'brown', 'purple', 'black', 'green', 'orange','gold', 'red', 'grey', 'blue']


for i in range(len(values)):
    fig.add_annotation(
        x=1.05, y=1 - i*0.10,
        xref='paper', yref='paper',
        text=values[i],
        showarrow=True,
        arrowcolor='white',
        font=dict(color='white'),
        borderwidth=1,
        bgcolor= colors[i]
    )
fig.show()
# Not sure whether we should use this one; it doesn't say very much and is fairly unclear in my opinion (Daan)

#### fig1.5: Amount of freshwater withdrawals per functional unit

In [5]:
#Mauro
non_vegan_products = ['Barley (Beer)','Cane Sugar','Milk','Cheese','Eggs','Fish (farmed)','Crustaceans (farmed)','Bovine Meat (beef herd)','Bovine Meat (dairy herd)','Lamb & Mutton','Pig Meat','Poultry Meat']
meat_products = ['Bovine Meat (beef herd)', 'Bovine Meat (dairy herd)', 'Lamb & Mutton', 'Pig Meat','Poultry Meat','Fish (farmed)','Crustaceans (farmed)']
dairy_products = ['Milk','Cheese','Eggs']
other_non_vegan_products = ['Cane Sugar','Crustaceans (farmed)','Barley (Beer)']

df_all_products = pd.read_excel("dataset1.xls", sheet_name="Results - Retail Weight", skiprows=2,nrows=43, index_col=None, na_values=["NA"])

#non vegan products
non_vegan = df_all_products["Product"].isin(non_vegan_products)
meat = df_all_products["Product"].isin(meat_products)
dairy = df_all_products["Product"].isin(dairy_products)
other_non_vegan = df_all_products["Product"].isin(other_non_vegan_products)

df_non_vegan_products = df_all_products[non_vegan]
df_meat_products = df_all_products[meat]
df_dairy_products = df_all_products[dairy]
df_other_non_vegan_products = df_all_products[other_non_vegan]

#non vegan
df_non_vegan_products_land_use = df_non_vegan_products.iloc[:, [0, 3]]
df_non_vegan_products_ghg_2013 = df_non_vegan_products.iloc[:, [0, 9]]
df_non_vegan_products_ghg_2007 = df_non_vegan_products.iloc[:, [0, 15]]
df_non_vegan_products_acid = df_non_vegan_products.iloc[:, [0, 21]]
df_non_vegan_products_eutro = df_non_vegan_products.iloc[:, [0, 27]]
df_non_vegan_products_fresh_water_withdraw = df_non_vegan_products.iloc[:, [0, 33]]
df_non_vegan_products_stress_water_use = df_non_vegan_products.iloc[:, [0, 39]]

#meat products
df_meat_products_land_use = df_meat_products.iloc[:, [0, 3]]
df_meat_products_ghg_2013 = df_meat_products.iloc[:, [0, 9]]
df_meat_products_ghg_2007 = df_meat_products.iloc[:, [0, 15]]
df_meat_products_acid = df_meat_products.iloc[:, [0, 21]]
df_meat_products_eutro = df_meat_products.iloc[:, [0, 27]]
df_meat_products_fresh_water_withdraw = df_meat_products.iloc[:, [0, 33]]
df_meat_products_stress_water_use = df_meat_products.iloc[:, [0, 39]]
#dairy
df_dairy_products_land_use = df_dairy_products.iloc[:, [0, 3]]
df_dairy_products_ghg_2013 = df_dairy_products.iloc[:, [0, 9]]
df_dairy_products_ghg_2007 = df_dairy_products.iloc[:, [0, 15]]
df_dairy_products_acid = df_dairy_products.iloc[:, [0, 21]]
df_dairy_products_eutro = df_dairy_products.iloc[:, [0, 27]]
df_dairy_products_fresh_water_withdraw = df_dairy_products.iloc[:, [0, 33]]
df_dairy_products_stress_water_use = df_dairy_products.iloc[:, [0, 39]]

#other non vegan
df_other_non_vegan_products_land_use = df_other_non_vegan_products.iloc[:, [0, 3]]
df_other_non_vegan_products_ghg_2013 = df_other_non_vegan_products.iloc[:, [0, 9]]
df_other_non_vegan_products_ghg_2007 = df_other_non_vegan_products.iloc[:, [0, 15]]
df_other_non_vegan_products_acid = df_other_non_vegan_products.iloc[:, [0, 21]]
df_other_non_vegan_products_eutro = df_other_non_vegan_products.iloc[:, [0, 27]]
df_other_non_vegan_products_fresh_water_withdraw = df_other_non_vegan_products.iloc[:, [0, 33]]
df_other_non_vegan_products_stress_water_use = df_other_non_vegan_products.iloc[:, [0, 39]]


#vegan products
grains_products = ['Wheat & Rye (Bread)', 'Maize (Meal)', 'Oatmeal', 'Rice']
vegetables_products = ['Potatoes', 'Cassava', 'Tomatoes', 'Onions & Leeks', 'Root Vegetables', 'Brassicas', 'Other Vegetables']
fruits_products = ['Citrus Fruit', 'Bananas', 'Apples', 'Berries & Grapes', 'Other Fruit']
legumes_products = ['Other Pulses', 'Peas', 'Nuts', 'Groundnuts']
plant_based_alternatives_products = ['Soymilk', 'Tofu']
oils_products = ['Soybean Oil', 'Palm Oil', 'Sunflower Oil', 'Rapeseed Oil', 'Olive Oil']
others_products = ['Coffee', 'Dark Chocolate']

vegan = ~df_all_products['Product'].isin(non_vegan_products)
grains = df_all_products["Product"].isin(grains_products)
vegetables = df_all_products["Product"].isin(vegetables_products)
fruits = df_all_products["Product"].isin(fruits_products)
legumes = df_all_products["Product"].isin(legumes_products)
plant_based_alternatives = df_all_products["Product"].isin(plant_based_alternatives_products)
oils = df_all_products["Product"].isin(oils_products)
others = df_all_products["Product"].isin(others_products)


df_vegan_products = df_all_products[vegan]
df_grains_products = df_all_products[grains]
df_vegetables_products = df_all_products[vegetables]
df_fruits_products = df_all_products[fruits]
df_legumes_products = df_all_products[legumes]
df_plant_based_alternatives_products = df_all_products[plant_based_alternatives]
df_oils_products = df_all_products[oils]
df_others_products = df_all_products[others]

#vegan
df_vegan_products_land_use = df_vegan_products.iloc[:, [0, 3]]
df_vegan_products_ghg_2013 = df_vegan_products.iloc[:, [0, 9]]
df_vegan_products_ghg_2007 = df_vegan_products.iloc[:, [0, 15]]
df_vegan_products_acid = df_vegan_products.iloc[:, [0, 21]]
df_vegan_products_eutro = df_vegan_products.iloc[:, [0, 27]]
df_vegan_products_fresh_water_withdraw = df_vegan_products.iloc[:, [0, 33]]
df_vegan_products_stress_water_use = df_vegan_products.iloc[:, [0, 39]]





##### Visualisation 2: sunburst charts that show the division of the ecological footprint between vegan and non-vegan products
* Overall, meat is the biggest contributor to a large ecological footprint
* Lamb & mutton is the biggest contributor overall
* The non-vegan products have an ecological footprint that is about ten times larger than that of the vegan products


In [6]:
# Define the data for the bar plots
categories = ['Land use', 'GHG emission 2013', 'GHG emission 2007', 'Acidification', 'Eutrophication', 'Fresh Water Withdrawal', 'Stress Water Use']
vegan_values = [df_vegan_products_land_use["Mean"].sum(),
                df_vegan_products_ghg_2013["Mean.1"].sum(),
                df_vegan_products_ghg_2007["Mean.2"].sum(),
                df_vegan_products_acid["Mean.3"].sum(),
                df_vegan_products_eutro["Mean.4"].sum(),
                df_vegan_products_fresh_water_withdraw["Mean.5"].sum(),
                df_vegan_products_stress_water_use["Mean.6"].sum()]
non_vegan_values = [df_non_vegan_products_land_use["Mean"].sum(),
                    df_non_vegan_products_ghg_2013["Mean.1"].sum(),
                    df_non_vegan_products_ghg_2007["Mean.2"].sum(),
                    df_non_vegan_products_acid["Mean.3"].sum(),
                    df_non_vegan_products_eutro["Mean.4"].sum(),
                    df_non_vegan_products_fresh_water_withdraw["Mean.5"].sum(),
                    df_non_vegan_products_stress_water_use["Mean.6"].sum()]
yaxvalues = ['(m2/FU)', '(kg CO2eq/FU)', '(kg CO2eq/FU)', '(g SO2eq/FU)', '(g PO43-eq/FU)', '(L/FU)', '(L/FU)']

# function to create sunbursts    
def create_sunburst_figure(title, values):
    colors = ['darkgreen', 'darkred'] * 2  
    fig = go.Figure(go.Sunburst(
        labels=labels,
        parents=parents,
        values=values,
        marker=dict(colors=colors),
    ))
    
    fig.update_layout(title=title)
    return fig

labels = ["", "Non-vegan", "Vegan"] + list(df_non_vegan_products["Product"]) + list(df_vegan_products["Product"])
parents = ["", "", ""] + ["Non-vegan"] * len(df_non_vegan_products) + ["Vegan"] * len(df_vegan_products)

# Creating the sunbursts
# Land use
values = [0] * 3 + list(df_non_vegan_products_land_use["Mean"]) + list(df_vegan_products_land_use["Mean"])
data1 = create_sunburst_figure("Land use", values)
# GHG emission 2013
values = [0] * 3 + list(df_non_vegan_products_ghg_2013["Mean.1"]) + list(df_vegan_products_ghg_2013["Mean.1"])
data2 = create_sunburst_figure("GHG emission 2013", values)
# GHG emission 2007
values = [0] * 3 + list(df_non_vegan_products_ghg_2007["Mean.2"]) + list(df_vegan_products_ghg_2007["Mean.2"])
data3 = create_sunburst_figure("GHG emission 2007", values)
# Acidification
values = [0] * 3 + list(df_non_vegan_products_acid["Mean.3"]) + list(df_vegan_products_acid["Mean.3"])
data4 = create_sunburst_figure("Acidification", values)
# Eutrophication
values = [0] * 3 + list(df_non_vegan_products_eutro["Mean.4"]) + list(df_vegan_products_eutro["Mean.4"])
data5 = create_sunburst_figure("Eutrophication", values)
# Fresh Water Withdrawal
values = [0] * 3 + list(df_non_vegan_products_fresh_water_withdraw["Mean.5"]) + list(df_vegan_products_fresh_water_withdraw["Mean.5"])
data6 = create_sunburst_figure("Fresh Water Withdrawal", values)
# Stress Water Use
values = [0] * 3 + list(df_non_vegan_products_stress_water_use["Mean.6"]) + list(df_vegan_products_stress_water_use["Mean.6"])
data7 = create_sunburst_figure("Stress Water Use", values)

# Creating the figure
specs_list1 = []
specs_list2 = []

for i in range(len(categories)):
    specs_list1.append({'type': 'bar'})
for i in range(len(categories)):
    specs_list2.append({"type": "sunburst"})
specs_list = [specs_list2, specs_list1]

fig = make_subplots(rows=2, cols=len(categories), specs=specs_list, vertical_spacing=0.2)

# Creating and adding the bar plots to the figure
for category, vegan_value, non_vegan_value in zip(categories, vegan_values, non_vegan_values):
    fig.add_trace(go.Bar(
        x=['Vegan', 'Non-vegan'],
        y=[vegan_value, non_vegan_value],
        name=category,
        marker_color=['darkgreen', 'darkred']
    ), row=2, col=categories.index(category) + 1)

    fig.update_xaxes(title_text=category, row=2, col=categories.index(category) + 1)
    fig.update_yaxes(title_text=yaxvalues[categories.index(category)], row=2, col=categories.index(category) + 1)

# Adding sunbursts to the figure
fig.add_trace(data1.data[0], row=1, col=1)
fig.add_trace(data2.data[0], row=1, col=2)
fig.add_trace(data3.data[0], row=1, col=3)
fig.add_trace(data4.data[0], row=1, col=4)
fig.add_trace(data5.data[0], row=1, col=5)
fig.add_trace(data6.data[0], row=1, col=6)
fig.add_trace(data7.data[0], row=1, col=7)

fig.update_layout(width = 2250, height=800, showlegend=False)
fig.show()

#### fig2: Sunburst figures of vegan vs non-vegan emissions

These figures show the significant difference in land use and greenhouse gas emissions between vegan and non-vegan products. In all cases, square meters per functional unit is used for land use, and kg CO2 equivalent per functional unit is used for GHG emissions.

The figures are meant to support the perspective that the type of food matters more than the sourcing; we need to evaluate what it is we eat, not necessarily where it’s from.
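
A sunburst trace is described by three parallel lists: `labels`, `parents` and `values`. The sketch below builds them for a two-level vegan/non-vegan hierarchy in the same layout the notebook uses. `build_sunburst_inputs` is a hypothetical helper, and the numbers in the usage line are placeholders, not dataset values.

```python
def build_sunburst_inputs(non_vegan, vegan):
    """Build parallel labels/parents/values lists for a two-level sunburst.

    `non_vegan` and `vegan` map product names to a metric value. The root and
    the two group nodes get a value of 0, so (under Plotly's default
    branchvalues='remainder') each group's size comes from its children.
    """
    labels = ['', 'Non-vegan', 'Vegan'] + list(non_vegan) + list(vegan)
    parents = ['', '', ''] + ['Non-vegan'] * len(non_vegan) + ['Vegan'] * len(vegan)
    values = [0, 0, 0] + list(non_vegan.values()) + list(vegan.values())
    return labels, parents, values


# Placeholder values for illustration only.
labels, parents, values = build_sunburst_inputs(
    {'Cheese': 3.0, 'Milk': 1.0},
    {'Tofu': 0.5},
)
for label, parent, value in zip(labels, parents, values):
    print(repr(label), repr(parent), value)
```

The three lists must stay the same length and in the same order, which is why they are built together in one function.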

##### Visualisation 3: interactive bar chart with dropdown menu
* The interactive bar chart provides a comprehensive view of the important aspects of the ecological footprint of various products
* The chart's interactivity allows users to dynamically visualize and compare the ecological footprint of various products across different metrics.
* Within each category, users can further explore specific aspects by selecting from a dropdown menu.


In [52]:
df_land = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A, C:E")
df_land['Total'] = df_land[['Arable', 'Fallow', 'Perm Past']].sum(axis=1)

df_ghg = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A, F:L")
df_ghg['Total'] = df_ghg[['LUC', 'Feed', 'Farm', 'Processing', 'Transport', 'Packging', 'Retail']].sum(axis=1)

df_eutr = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A, N")
df_eutr.rename(columns={"Total.1": "Total"}, inplace=True)

df_fresh = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A, O")
df_fresh.rename(columns={"Total.2": "Total"}, inplace=True)

# Combine the dataframes into a single dataframe
df_combined = pd.concat([df_land['Total'], df_ghg['Total'], df_eutr['Total'], df_fresh['Total']], axis=1)
df_combined.columns = ['Land Use', 'GHG Emissions', 'Eutrophication', 'Freshwater Use']

# Define custom colors for GHG Emissions
color_mapping = {'low': 'lightsteelblue', 'medium': 'mediumseagreen', 'high': 'salmon'}
df_combined[df_combined.columns] = df_combined[df_combined.columns].apply(lambda x: pd.qcut(x, q=3, labels=['low', 'medium', 'high']))
colors = [color_mapping[category] for category in df_combined['GHG Emissions']]

# Create Parallel categories plot
dimensions = [
    go.parcats.Dimension(values=df_combined['Land Use'], label='Land Use', categoryorder='array', categoryarray=['low', 'medium', 'high']),
    go.parcats.Dimension(values=df_combined['Eutrophication'], label='Eutrophication', categoryorder='array', categoryarray=['low', 'medium', 'high']),
    go.parcats.Dimension(values=df_combined['Freshwater Use'], label='Freshwater Use', categoryorder='array', categoryarray=['low', 'medium', 'high']),
    go.parcats.Dimension(values=df_combined['GHG Emissions'], label='GHG Emissions', categoryorder='array', categoryarray=['low', 'medium', 'high'])
]

parcats_trace = go.Parcats(dimensions=dimensions, line={'color': colors})

fig = go.Figure(data=parcats_trace)
fig.update_layout(title='Analysis of Environmental Factors')
fig.show()


#### fig3: Parallel categories plot
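
The plot above first bins every metric into 'low', 'medium' and 'high' terciles with `pd.qcut`. The sketch below illustrates the same idea in plain Python so it runs without the dataset. It is a simplified stand-in and is not guaranteed to match `pd.qcut` in every edge case (for example with many tied values); `quantile` and `tercile_labels` are hypothetical helpers and the input numbers are placeholders.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] + frac * (sorted_vals[upper] - sorted_vals[lower])


def tercile_labels(values, labels=('low', 'medium', 'high')):
    """Label each value by the tercile it falls into (upper edges inclusive)."""
    ordered = sorted(values)
    first_cut = quantile(ordered, 1 / 3)
    second_cut = quantile(ordered, 2 / 3)
    result = []
    for value in values:
        if value <= first_cut:
            result.append(labels[0])
        elif value <= second_cut:
            result.append(labels[1])
        else:
            result.append(labels[2])
    return result


# Placeholder metric values, for illustration only.
binned = tercile_labels([50, 10, 40, 20, 60, 30])
print(binned)
```

Because the bins are relative to the products in the dataset, 'low' here means low compared with the other products, not low in absolute terms.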

##### Visualisation 4: Emissions vs Global production

In [8]:
standard_discrete = px.colors.qualitative.T10

df_per_product = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=70, usecols="A, F:L")
df_per_product = df_per_product.rename(columns={'LUC': 'Land Use Change', 'Packging': 'Packaging'})

df = df_per_product[0:43].copy()
df['Total'] = df.iloc[:, 1:].sum(axis=1)
# display(df)
df = df.sort_values(by='Total')

# display(df.Product)

# Reshape the DataFrame into a "long" format
df_melted = pd.melt(df, id_vars='Product', value_vars=['Land Use Change', 'Feed', 'Farm', 'Processing', 'Transport', 'Packaging', 'Retail'], var_name='Stage', value_name='Emissions')


In [9]:
df_cat = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A:B, F:L")
df_cat = df_cat.rename(columns={'LUC': 'Land Use Change', 'Packging': 'Packaging', "Food and Waste ('000 t, 2009-11 avg.)": 'Amount produced'})
df_cat['Total emissions'] = df_cat.iloc[:, 2:].sum(axis=1)
df_cat = pd.concat([df_cat, pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="M:P")], axis=1)
df_cat = df_cat.rename(columns={'Total': 'Acidification', 'Total.1': 'Eutrophication', 'Total.2': 'Freshwater', 'Total.3': 'Stress-weighted water usage'})





# assigns categories
df_cat['Type'] = 'Crops'
df_cat.loc[33:37, 'Type'] = 'Meat'
df_cat.loc[38:40, 'Type'] = 'Dairy'
df_cat.loc[41:42, 'Type'] = 'Seafood'


highlighted = ['Olive Oil', 'Bovine Meat (beef herd)', 'Cheese', 'Lamb & Mutton', 'Dark Chocolate', 'Fish (farmed)', 'Eggs', 'Poultry Meat', 'Milk', 'Potatoes', 'Rice', 'Soymilk', 'Nuts', 'Bananas', 'Apples']


# creates subcategory for labels
df_cat['labeled'] = df_cat['Product'].where(df_cat['Product'].isin(highlighted))



fig = px.scatter(
    df_cat, 
    x="Amount produced",
    y="Total emissions",
    log_x=True,
    height=500,
    text='labeled',
    # log_y=True,
    # size="Total impact",
    size_max=60,
    hover_data=['Product'],
    title="GHG impact and global production"
    )

df_filtered = df_cat[df_cat['Product'].isin(highlighted)]

annotation= [{
    'x': np.log10(df_cat.loc[df_cat['Product'] == name, 'Amount produced'].iloc[0]),
    'y': df_cat.loc[df_cat['Product'] == name, 'Total emissions'].iloc[0],
    'text': name,  # text
    'showarrow': True,  # would you want to see arrow
} for name in highlighted]


updatemenus = [
    {
        "buttons": [
            {
                "label": col,
                "method": "update",
                "args": [
                    {"y": [df_cat[col]]},
                    {"yaxis": {"title": {"text": col}}}
                ],
            }
            for col in list(df_cat.loc[:, 'Total emissions': 'Stress-weighted water usage'].columns)
        ],
        "x": 0,
        "y": 1.2,
    }
]





fig.update_layout(updatemenus=updatemenus, title_x=0.5, yaxis_title='Emissions (kg CO2eq / kg)', xaxis_title='Global production (thousand tonnes)')





fig.update_traces(marker={
        "color": pd.Categorical(df_cat["Type"]).codes,
        "colorscale": [(0,"green"), (0.33,"orange"), (0.66,"red"), (1,"blue")]
    },
    marker_size=10,
    marker_opacity=df_cat['labeled'].notnull().map({True: 0.8, False: 0.35}).values,
    textposition="top center",
    hoverinfo="name+x+y",
    textfont={'size': 13}
)





def add_label(name, df, figure):
    figure.add_annotation(
        x=np.log10(df.loc[df['Product'] == name, 'Amount produced'].iloc[0]),
        y=df.loc[df['Product'] == name, 'Total emissions'].iloc[0],
        text=name,
        showarrow=True,
        yshift=5
    )


def add_labels(names, df, figure):
    for name in names:
        add_label(name, df, figure)
# add_labels(highlighted, df_cat, fig)

display(list(df_cat.loc[:, 'Total emissions': 'Stress-weighted water usage'].columns))

fig.show()

# f = df_cat.loc[:, 'Acidification': 'Total emissions']

# display(f'{f.columns[1]}')
for col in df_cat.loc[:, 'Acidification': 'Total emissions']:
    print(df_cat[col])



['Total emissions',
 'Acidification',
 'Eutrophication',
 'Freshwater',
 'Stress-weighted water usage']

#### fig4: Scatter plot of global production (in thousand tonnes) and greenhouse gas emissions (in kg CO2eq / kg)

This scatter plot suggests that products with high global production tend to have low emissions per kg. The figure is meant to support the perspective that it matters a lot whether food is produced locally or globally: the emissions produced during transport are significant, and a major factor in the total emissions of the food we consume.
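
The strength of this relationship can be checked numerically instead of being read off the plot. The sketch below computes a Pearson correlation coefficient between log-scaled production and emissions; a value close to -1 would indicate a strong negative relationship. `pearson` is a hypothetical helper, and the data are placeholders, not values from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder data, NOT from the dataset:
# production in thousand tonnes, emissions in kg CO2eq per kg.
production = [100, 1_000, 10_000, 100_000]
emissions = [8.0, 6.0, 4.0, 2.0]

# The x-axis of the scatter plot is logarithmic, so correlate against log10(production).
r = pearson([math.log10(p) for p in production], emissions)
print(round(r, 3))
```

Running the same calculation on the real `Amount produced` and `Total emissions` columns would show whether the pattern in the plot is as strong as it looks.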

##### Visualisation 5: Transport emissions

In [10]:
df_transport = df.sort_values(by='Transport')

fig1 = px.bar(df_transport,
    y='Product',
    x='Transport',
    height=800,
    title="Transport emissions per product")

fig1.show()

#### fig5: Bar chart of transport emissions per product
This chart is meant to show that there are significant CO2 costs associated with transport, supporting the perspective that it matters a lot whether food is produced locally or globally. However, this obviously needs some work.

##### Visualisation 6: Some perspective

In [60]:
# updatemenus = [dict(
#     type='buttons',
#     showactive=True,
#     xanchor='right',
#     yanchor='top',
#     x=1,
#     y=1.09,
#     pad=dict(l=20, r=20),
#     buttons=[dict(
#         label='Higlight Transport',
#         method='restyle',
#         args2=[{'marker.opacity': 0.1}, [i for i in range(len(fig.data)) if i != get_traceindex('Transport', fig)]],
#         args=[{'marker.opacity': 1}, list(range(len(fig.data)))]
#     )]
# )]
#  buttons=[dict(
#         label='Play',
#         method='animate',
#         args=[None, dict(frame=dict(duration=100), transition=dict(duration=200))]
#     )]

print(sns.color_palette("tab10").as_hex())
# Create a horizontal stacked bar plot using plotly.express
fig = px.bar(df_melted,
             y='Product',
             x='Emissions',
             color='Stage',
             height=800,
             color_discrete_sequence=standard_discrete,
             title="Greenhouse gas emissions per food type over the supply chain")


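# One frame per opacity step: every stage fades out except 'Transport', which stays fully opaque
transport_index = get_traceindex('Transport', fig)
fade_steps = [1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
frames = [
    go.Frame(data=[
        go.Bar(marker=dict(opacity=1 if i == transport_index else opacity))
        for i in range(len(fig.data))
    ])
    for opacity in fade_steps
]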

updatemenus = [dict(
    type='buttons',
    showactive=True,
    xanchor='right',
    yanchor='top',
    x=1,
    y=1.09,
    direction="left",
    buttons=[
    dict(
        label='Play',
        method='animate',
        args=[None, dict(frame=dict(duration=5), transition=dict(duration=100))]
    ),
    dict(
        label='Reset',
        method='restyle',
        args=[{'marker.opacity': 1}, list(range(len(fig.data)))]    )
]
)]




fig.update_traces()

fig.update_layout(barmode='relative')

fig.add_scatter(y=df['Product'], x=df['Total'], mode='markers', name='Total', marker=dict(symbol='line-ns-open', color="black", size=7))

fig.update_layout(layout,        
        legend=dict(
        orientation="h",
        yanchor="top",
        y=1.065,
        xanchor="left",
        x=-0.06),
        updatemenus=updatemenus
)


# fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 2000



fig.frames = frames


# fig.updatemenus[0].buttons[0].args[1] = dict(frame=dict(duration=1000), transition=dict(duration=500))


fig.show()




['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']


#### fig6: Animated bar chart of transport emissions per product compared to other types of emissions

This figure shows what part of the greenhouse gas emissions for each food type is caused by transport of the product. The figure is meant to show that transport is a tiny portion of the total greenhouse gas emissions, and supports the perspective that the type of food matters more than the sourcing; we need to evaluate what it is we eat, not necessarily where it’s from.

A pie chart more concretely shows the relative (total) CO2 impact of transport:

In [13]:
df2 = df_per_product.iloc[55]
display(df2)
fig = px.pie(df2, values=df2.values[1:], names=df2.index[1:], hole=.3, height=600, title='Percentage of total GHG emissions by part of supply chain')
fig.update_traces(textposition='outside', textinfo='label+percent', marker=dict(colors=standard_discrete, line=dict(color='#000000', width=2)), showlegend=False)
fig.show()

Product            Food Total (M ha; Gg; km3)
Land Use Change                    2379469.67
Feed                              1098394.985
Farm                              7463342.423
Processing                         604297.515
Transport                          801403.803
Packaging                           626870.68
Retail                             394202.635
Name: 55, dtype: object

#### fig6.5: Pie chart of the total greenhouse gas emissions divided into each part of the supply chain
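
The percentages in the pie chart can be cross-checked by hand. The sketch below copies the per-stage totals from the `df2` output printed above and computes each stage's share of the overall total; the variable names are illustrative. With these figures, transport comes to roughly 6% of the total.

```python
# Global GHG totals per supply-chain stage, copied from the `df2` output above.
stage_totals = {
    'Land Use Change': 2379469.67,
    'Feed': 1098394.985,
    'Farm': 7463342.423,
    'Processing': 604297.515,
    'Transport': 801403.803,
    'Packaging': 626870.68,
    'Retail': 394202.635,
}

grand_total = sum(stage_totals.values())
shares = {stage: value / grand_total for stage, value in stage_totals.items()}

# Print stages from largest to smallest share.
for stage, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f'{stage:<16} {share:6.1%}')
```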

### Reflection

To be written following the feedback session.

### Work distribution