# Food for thought: how to eat sustainably

Student names: Ardjano Mark 14713926, Daan Huisman 14650797, Ivo de Brouwer 11045841, Mauro Dieters 14533391

Team number: I2

In [1]:
# Load image from link
url = 'https://cdn.cbs.nl/images/3743526a303256676975626e61366c7a383931695a513d3d/900x450.jpg'

# Display image from URL with smaller size and subtitle
from IPython.display import Image, display

# Set the desired image width and height
width = 600
height = 300

# Set the subtitle text
subtitle = "© ANP / Robin Utrecht"

# Create an Image instance with the URL
image = Image(url=url, width=width, height=height)

# Display the image and subtitle
display(image)
print(subtitle)

© ANP / Robin Utrecht


## Introduction

Our world is changing. Humans have dominated this planet and its resources since the Agricultural Revolution about 12,000 years ago, when hunter-gatherers exchanged their nomadic lifestyles for permanent settlements and farming. Since then, our population and our resource exploitation have increased precipitously. The downside of this development is becoming ever more apparent.

The reports from the United Nations’ Intergovernmental Panel on Climate Change are increasingly alarming. Their 2021 report found that human activity is changing the Earth’s climate in “unprecedented” ways, with some of the changes now inevitable and “irreversible”[1].  It would seem drastic change at both the societal and individual level is an immediate necessity. 

A significant piece of the puzzle is food. Recent estimates of the contribution of food to worldwide greenhouse gas emissions range from one-quarter to one-third[2]. Moreover, agriculture accounts for two-thirds of our freshwater use and about half of the world’s habitable land[3]. It also causes severe acidification and eutrophication, the drastic implications of which are evidenced by the ongoing ‘nitrogen crisis’ in the Netherlands.

As is the case with most complex issues, there is a wide range of perspectives on how we should think about the sustainability of food. This makes deciding how—and what—to eat a daunting task for those concerned about minimizing impact on the planet.

In this data story, we visually explore the data to discuss the following questions:

-	The relative impact of food transportation on greenhouse gas emissions. Should we focus on locally sourcing our food, or would it be better to focus on other things?

-	Impacts of agriculture are not limited to greenhouse gas emissions: how much attention should we pay to land use, freshwater use, acidification and eutrophication? Can we kill two birds with one stone by choosing products that have low impact across all these dimensions, or are we left with difficult tradeoffs?

-	Finally, what are some concrete steps we can take when choosing what to eat, in order to minimize impact on the planet?

We hope this helps you make informed decisions about the food you eat. We encourage you to interact with the visualizations to aid the process of discovery.


## Dataset and Preprocessing

All visualizations are based on a dataset compiled by Poore and Nemecek [3].  The dataset describes various environmental impacts of 40 different agricultural goods around the world, sampled from 38,700 farms. The data was derived from a comprehensive meta-analysis of 1530 studies from 2000 to 2016, of which 570 were eventually included based on methodological criteria. 

Per agricultural product (e.g. rice, tofu, palm oil, pig meat) five environmental indicators are covered: land use, freshwater use (weighted by scarcity), greenhouse gas emissions, eutrophying emissions and acidifying emissions. For greenhouse gas emissions, more fine-grained detail is available for the source of the emissions along the supply chain (e.g. from farming, processing or transport).

The dataset is downloadable as an xls file containing multiple Excel sheets.

To look at fine-grained data about greenhouse gas emissions for each product, for example, we can look at the following dataframe:

In [2]:
import plotly.graph_objs as go
import plotly.express as px
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
from plotly.subplots import make_subplots
from ipywidgets import interact, interactive, fixed, interact_manual
from ipywidgets import GridspecLayout
import ipywidgets as widgets

# Shared layout settings, applied to figures via fig.update_layout(layout, ...)
layout = go.Layout(
    font=dict(
        family='Lato, "Helvetica Neue", Helvetica, Arial, "Liberation Sans", sans-serif',
    ),
    title=dict(font=dict(size=24)),
    newshape_label_padding=8,
    margin_pad=5,
    legend=dict(font=dict(size=14)),
)

df_ghg = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A, F:L")
df_ghg['Total'] = df_ghg[['LUC', 'Feed', 'Farm', 'Processing', 'Transport', 'Packging', 'Retail']].sum(axis=1)
display(df_ghg.head())


# helper functions

def get_traceindex(category, fig):
    """Return the index of the trace in `fig` whose name equals `category`."""
    for i, trace in enumerate(fig.data):
        if trace.name == category:
            return i
    raise ValueError(f'No trace with name "{category}" found in figure')

Unnamed: 0,Product,LUC,Feed,Farm,Processing,Transport,Packging,Retail,Total
0,Wheat & Rye (Bread),0.1,0.0,0.847,0.217,0.129,0.09,0.058,1.441
1,Maize (Meal),0.315,0.0,0.475,0.052,0.06,0.06,0.026,0.988
2,Barley (Beer),0.009,0.0,0.176,0.128,0.035,0.497,0.264,1.109
3,Oatmeal,0.001,0.0,1.37,0.042,0.067,0.066,0.029,1.575
4,Rice,-0.022,0.0,3.553,0.065,0.096,0.084,0.063,3.839


Here, we can see that the greenhouse gas emissions are divided into different stages of the production and distribution of the product. 

To prepare the data for making the visualizations, we made a couple of changes/additions:
-	We added a ‘Total’ column for emissions (as seen above), summing greenhouse gas emissions across the supply chain.
-	We split the data categorically in various ways, e.g. into vegan and non-vegan, into crops, seafood, meat and dairy, and into even more fine-grained food categories.
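
As a minimal sketch of these two steps, the snippet below rebuilds the five rows printed above and applies the same operations. The `Group` column and its row boundary are hypothetical and only illustrate the mechanism; in the notebook the categories are assigned by row position in the full 43-row table.

```python
import pandas as pd

# First five rows of the emissions table, copied from the output above
stages = ['LUC', 'Feed', 'Farm', 'Processing', 'Transport', 'Packging', 'Retail']
sample = pd.DataFrame(
    [
        ['Wheat & Rye (Bread)', 0.1, 0.0, 0.847, 0.217, 0.129, 0.09, 0.058],
        ['Maize (Meal)', 0.315, 0.0, 0.475, 0.052, 0.06, 0.06, 0.026],
        ['Barley (Beer)', 0.009, 0.0, 0.176, 0.128, 0.035, 0.497, 0.264],
        ['Oatmeal', 0.001, 0.0, 1.37, 0.042, 0.067, 0.066, 0.029],
        ['Rice', -0.022, 0.0, 3.553, 0.065, 0.096, 0.084, 0.063],
    ],
    columns=['Product'] + stages,
)

# Step 1: total emissions per product, summed over all supply-chain stages
sample['Total'] = sample[stages].sum(axis=1)

# Step 2: label rows by position (hypothetical split, for illustration only).
# Note that .loc slices include both endpoints, so 3:4 covers two rows.
sample['Group'] = 'Group A'
sample.loc[3:4, 'Group'] = 'Group B'

print(sample[['Product', 'Total', 'Group']])
```

Because the labels depend on row order, this approach only works as long as the table is not sorted or filtered before the categories are assigned.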


## Locally sourcing food: yay or nay?

Globalization through trade and technology has made food from across the globe readily available to us. Of every 100 kilograms of food produced, 17 are transported internationally, increasing to 50 kg for nuts and 56 kg for oils[3]. High-income countries in particular have the resources to import their food from far away.

By producing food locally, we can minimise the environmental impact associated with transportation and decrease greenhouse gas emissions. Proponents of local food systems argue it can also foster community resilience, support small-scale farmers, and promote food security. Focussing our efforts on local food production is seen by many as a crucial step towards a more sustainable and resilient future.

Just how big an impact can eating locally make? We ran the numbers.

### Transport emissions

In [3]:
df_land = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A, C:E")
df_land['Total'] = df_land[['Arable', 'Fallow', 'Perm Past']].sum(axis=1)

df_ghg = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A, F:L")
df_ghg['Total'] = df_ghg[['LUC', 'Feed', 'Farm', 'Processing', 'Transport', 'Packging', 'Retail']].sum(axis=1)

df_acid = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A, M")

df_eutr = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A, N")
df_eutr.rename(columns={"Total.1": "Total"}, inplace=True)

df_fresh = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A, O")
df_fresh.rename(columns={"Total.2": "Total"}, inplace=True)

df_stress = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A, P")
df_stress.rename(columns={"Total.3": "Total"}, inplace=True)

crop_color = '#35A85A'
meat_color = '#ff1a0d'
dairy_color = '#fd9a04'
seafood_color = '#1092d1'

df_transport =  pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A, J")

# assigns categories
df_transport['Type'] = 'Crops'
df_transport.loc[33:37, 'Type'] = 'Meat'
df_transport.loc[38:40, 'Type'] = 'Dairy'
df_transport.loc[41:42, 'Type'] = 'Seafood'

df_transport = df_transport.sort_values(by='Transport')
df_transport = df_transport.set_index('Product')

fig1 = px.bar(df_transport,
    y=df_transport.index,
    x='Transport',
    orientation='h',
    color='Type',
        color_discrete_map={
        'Crops': crop_color,
        'Meat': meat_color,
        'Dairy': dairy_color,
        'Seafood': seafood_color
    },
    height=800,
    title="Transport emissions per product")

fig1.update_layout(yaxis={'categoryorder':'array', 'categoryarray':df_transport.index}, xaxis_title='Transport emissions (kg CO<sub>2</sub> eq. / kg)')
fig1.show()

> *Figure 1: Bar chart of transport emissions. It shows the transport emissions for each product, and we can clearly observe that various products can have very different transport emissions. Hover over each bar to see the amount of emissions.*

Transport emissions vary substantially between products. In general, transporting meat seems to cause more emissions than transporting dairy or seafood. The variation among crops is especially large: cane sugar and beet sugar top the list in transport emissions per kg, while crops like maize and barley are at the bottom.

It seems that we could make a difference by locally sourcing products with high transport emissions, paying particular attention to where our sugar, meat and oils come from.


### The bigger picture

Of course, transport is only one part of the supply chain. How big a role does it play in total food emissions?

Broadly, the emissions involved in food production are categorized as follows from start to finish:

1.	Land use change: emissions from burning, and the net change in carbon storage. Can take a negative value when more carbon is sequestered in the new vegetation than previously.
2.	Farm: for example emissions from fertilizers, farm machinery, methane emissions from cows or rice.
3.	Animal feed: emissions from producing and processing crops into feed for livestock.
4.	Processing: emissions from processing the raw products into food items.
5.	Transport: all transport emissions of food items.
6.	Retail: emissions from e.g. refrigeration, heating.
7.	Packaging: emissions from production of packaging materials.

Adding it all up, we get the following picture *(make sure to press the 'Play' button on the upper right!)*:



In [4]:
standard_discrete = px.colors.qualitative.T10

df_per_product = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=70, usecols="A, F:L")
df_per_product = df_per_product.rename(columns={'LUC': 'Land Use Change', 'Packging': 'Packaging'})

df = df_per_product[0:43].copy()
df['Total'] = df.iloc[:, 1:].sum(axis=1)
df = df.sort_values(by='Total')


# Reshape the DataFrame into a "long" format
df_melted = pd.melt(df, id_vars='Product', value_vars=['Land Use Change', 'Feed', 'Farm', 'Processing', 'Transport', 'Packaging', 'Retail'], var_name='Stage', value_name='Emissions')


df_melted['Percentage'] = 100 * df_melted['Emissions'] / df_melted.groupby('Product')['Emissions'].transform('sum')

# Create a horizontal stacked bar plot using plotly.express
fig = px.bar(df_melted, 
             y='Product',
             x='Emissions',
             color='Stage',
             hover_data=['Percentage', 'Stage'],
             height=800,
             color_discrete_sequence=standard_discrete,
             title="Greenhouse gas emissions per food type over the supply chain")

frames = [go.Frame(data=[go.Bar(marker=dict(opacity=opacity)) if i != get_traceindex('Transport', fig) else go.Bar(marker=dict(opacity=1)) for i in range(len(fig.data))]) for opacity in [1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]]

updatemenus = [dict(
    type='buttons',
    showactive=True,
    xanchor='right',
    yanchor='top',
    x=1,
    y=1.09,
    direction="left",
    buttons=[
    dict(
        label='Play',
        method='animate',
        args=[None, dict(frame=dict(duration=5), transition=dict(duration=150))]
    ),
    dict(
        label='Reset',
        method='restyle',
        args=[{'marker.opacity': 1}]    )
]
)]

fig.update_traces(hovertemplate='<b>%{y}</b><br>Stage: %{customdata[1]}<br>%{x:.2f} kg CO2 eq.<br>%{customdata[0]:.2f}% of total<extra></extra>')



fig.update_layout(barmode='relative')

fig.update_layout(layout,        
        legend=dict(
        orientation="h",
        yanchor="top",
        y=1.065,
        xanchor="left",
        x=-0.06),
        updatemenus=updatemenus
)

# Attach the animation frames that fade out every stage except 'Transport'
fig.frames = frames

fig.show()

> *Figure 2: Animated bar chart of transport emissions per product compared to other types of emissions. Hover over each part to see the relative contribution of the stage to the total emissions. The figure is meant to show that transport is a tiny portion of the total greenhouse gas emissions, and supports the perspective that the type of food matters more than the sourcing; we need to evaluate what it is we eat, not necessarily where it’s from.*

Seeing the entire picture of the supply chain reveals that transport is a relatively small factor compared to the other parts of the supply chain. In the most polluting products, like beef, cheese and dark chocolate, transport doesn’t account for more than 2% of total emissions. 

If we look further down the list *(hover to check for yourself!)*, we see that for some products transport does make up a fairly large portion of total emissions: 43% for bananas, 41% for beet sugar, 29% for cane sugar and 20% for berries and grapes. For these products, a lot of ground can be gained by sourcing locally.
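
The percentages in the hover labels are simply each stage's emissions divided by the product's total. A minimal sketch of that calculation, using three rows from the table printed in the preprocessing section rather than the full dataset (the column name `Transport share (%)` is our own):

```python
import pandas as pd

# Transport and total emissions (kg CO2 eq. per kg) for three products,
# copied from the table output in the preprocessing section
rows = pd.DataFrame({
    'Product': ['Wheat & Rye (Bread)', 'Maize (Meal)', 'Rice'],
    'Transport': [0.129, 0.060, 0.096],
    'Total': [1.441, 0.988, 3.839],
})

# Share of transport in the total footprint of each product
rows['Transport share (%)'] = 100 * rows['Transport'] / rows['Total']

print(rows.round(2))
```

For these three products the share comes out at roughly 9% for bread, 6% for maize and 2.5% for rice. The same products can therefore look similar in absolute transport emissions while differing a lot in how much transport matters for their total footprint.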

The bulk of emissions seems to be caused by other stages. Let's delve into that a bit further.

### Where do the emissions come from?

In [5]:
df2 = df_per_product.iloc[55]
fig = px.pie(df2, values=df2.values[1:], names=df2.index[1:], hole=.3, height=600, title='Percentage of total GHG emissions by part of supply chain')
fig.update_traces(textposition='outside', textinfo='label+percent', marker=dict(colors=standard_discrete, line=dict(color='#000000', width=2)), showlegend=False)
fig.show()

> *Figure 3: Pie chart of the total greenhouse gas emissions divided into each part of the supply chain. Transport emissions are just a small part of the emissions caused by the entire supply chain.*

When considering the total emissions over the entire supply chain, transport is only fourth on the list. More than half of all emissions from food come from farming, 18% from land use change and 8% from the production of animal feed. While sourcing food locally is of course still a good idea, especially for products like cane and beet sugar, it seems that we need to think about more than the local-global dimension.

But if producing food locally is not *the* answer, what is?

## How can we reduce our ecological footprint?

### It's not just CO<sub>2</sub>

When we think about the dangers of global warming and the devastation it could cause, we often focus on greenhouse gas emissions. However, our planet is aching in more than one way. Here are some other categories that measure impact:

-	Land Use: besides the increase in greenhouse gas emissions, the problems associated with land use include habitat loss and soil degradation. Forests and wildlands that used to cover large parts of the world have disappeared.

-	Acidification: acidic substances like nitrogen oxides from fertilizers increase acidity in the soil and water bodies, which can have harmful effects on aquatic life, forests and crops.

-	Eutrophication: when too many nutrients (like nitrogen and phosphorus) enter a body of water, this leads to excessive growth of algae. These algae blooms deplete the oxygen in the water, leading to fish death and other negative impacts.

-	Freshwater use: especially as global temperatures rise, the demand for water is higher than ever.

-	Stress-weighted freshwater use: weights water use according to water stress in a particular region. This simply accounts for scarcity.

This is a lot to take into account when deciding what to eat—perhaps too much. One thing that could help simplify the problem is seeing whether foods that are bad in some impact categories are also bad in others: if this is the case, our choices are simpler. This is what we aim to discover in the following section.

In [6]:
# Combine the dataframes into a single dataframe
df_combined = pd.concat([df_land['Total'], df_ghg['Total'], df_acid['Total'], df_eutr['Total'], df_fresh['Total'], df_stress['Total']], axis=1)
df_combined.columns = ['Land Use', 'GHG Emissions', 'Acidification', 'Eutrophication', 'Freshwater Use', 'Stress-weighted Water Use']

# Bin every impact column into terciles: low, medium, high
df_combined[df_combined.columns] = df_combined[df_combined.columns].apply(lambda x: pd.qcut(x, q=3, labels=['low', 'medium', 'high']))

# assigns categories (0 = Crops, 1 = Seafood, 2 = Dairy, 3 = Meat)
df_combined['Type'] = 0
df_combined.loc[33:37, 'Type'] = 3
df_combined.loc[38:40, 'Type'] = 2
df_combined.loc[41:42, 'Type'] = 1

df_combined['Class'] = 'Vegan'
df_combined.loc[33:42, 'Class'] = 'Non-vegan'

# Color scale positions correspond to the Type codes above
colorscale = [[0, crop_color], [0.33, seafood_color], [0.66, dairy_color], [1, meat_color]]
colors = df_combined['Type']

highlighted = ['Olive Oil', 'Bovine Meat (beef herd)', 'Cheese', 'Lamb & Mutton', 'Dark Chocolate', 'Fish (farmed)', 'Eggs', 'Poultry Meat', 'Milk', 'Potatoes', 'Rice', 'Soymilk', 'Nuts', 'Bananas', 'Apples']


# Create Parallel categories plot
dimensions = [
    go.parcats.Dimension(values=df_combined['Class'], label='Vegan', categoryorder='array', categoryarray=['Vegan', 'Non-vegan'], ticktext=['Yes', 'No']),
    go.parcats.Dimension(values=df_combined['Type'], label='Type', categoryorder='array', categoryarray=[0, 1, 2, 3], ticktext=['Crops', 'Seafood', 'Dairy', 'Meat']),
    go.parcats.Dimension(values=df_combined['Land Use'], label='Land Use', categoryorder='array', categoryarray=['low', 'medium', 'high']),
    go.parcats.Dimension(values=df_combined['Acidification'], label='Acidification', categoryorder='array', categoryarray=['low', 'medium', 'high']),
    go.parcats.Dimension(values=df_combined['Eutrophication'], label='Eutrophication', categoryorder='array', categoryarray=['low', 'medium', 'high']),
    go.parcats.Dimension(values=df_combined['Freshwater Use'], label='Freshwater Use', categoryorder='array', categoryarray=['low', 'medium', 'high']),
    go.parcats.Dimension(values=df_combined['Stress-weighted Water Use'], label='Stress-weighted', categoryorder='array', categoryarray=['low', 'medium', 'high']),
    go.parcats.Dimension(values=df_combined['GHG Emissions'], label='GHG Emissions', categoryorder='array', categoryarray=['low', 'medium', 'high'])
]

parcats_trace = go.Parcats(dimensions=dimensions, line={'color': colors, 'colorscale':colorscale})

fig = go.Figure(data=parcats_trace)
fig.update_layout(title='Analysis of Environmental Factors')
fig.show()


> *Figure 4: Parallel categories plot of vegan vs. non-vegan emissions. This plot provides strong evidence that vegan products have a lower environmental impact than non-vegan products in a variety of factors. It also shows that those factors may be correlated, as products that score low on one type of emission typically score low on all other environmental factors as well.*
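
The 'low', 'medium' and 'high' bands in this plot come from splitting each environmental factor into three equally sized groups with `pd.qcut`. The sketch below shows the mechanism on hypothetical scores for six unnamed products; the numbers are illustrative and not taken from the dataset.

```python
import pandas as pd

# Hypothetical impact scores for six products (not taken from the dataset)
scores = pd.Series(
    [0.4, 1.0, 1.6, 3.8, 21.0, 60.0],
    index=['A', 'B', 'C', 'D', 'E', 'F'],
)

# Split into terciles: each label receives one third of the products
bands = pd.qcut(scores, q=3, labels=['low', 'medium', 'high'])

print(pd.DataFrame({'score': scores, 'band': bands}))
```

Because the bands are based on rank rather than on fixed thresholds, 'high' means high relative to the other products in the table. A product scoring 21 and one scoring 60 land in the same band, so the plot shows ordering across factors, not the size of the gaps.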

In the introduction, one of the questions we raised was *Can we kill two birds with one stone by choosing products that have low impact across all these dimensions, or are we left with difficult tradeoffs?* 

The potential correlation between environmental factors could indeed imply certain products have low emissions across the board, which means we could heavily mitigate environmental impact by focusing on those products. 

Let's consider the following heat map displaying correlations between environmental factors:

In [12]:
df_cat = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="A:B, F:L")
df_cat = df_cat.rename(columns={'LUC': 'Land Use Change', 'Packging': 'Packaging', "Food and Waste ('000 t, 2009-11 avg.)": 'Amount produced'})
df_cat['GHG emissions'] = df_cat.iloc[:, 2:].sum(axis=1)
df_cat['Land use'] = df_land['Total']
df_cat = pd.concat([df_cat, pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2, nrows=43, usecols="M:P")], axis=1)
df_cat = df_cat.rename(columns={'Total': 'Acidification', 'Total.1': 'Eutrophication', 'Total.2': 'Freshwater', 'Total.3': 'Stress-weighted water usage'})

# assigns categories
df_cat['Type'] = 'Crops'
df_cat.loc[33:37, 'Type'] = 'Meat'
df_cat.loc[38:40, 'Type'] = 'Dairy'
df_cat.loc[41:42, 'Type'] = 'Seafood'


highlighted = ['Crustaceans (farmed)','Olive Oil', 'Bovine Meat (beef herd)', 'Cheese', 'Lamb & Mutton', 'Dark Chocolate', 'Fish (farmed)', 'Eggs', 'Poultry Meat', 'Milk', 'Potatoes', 'Milk', 'Rice', 'Soymilk', 'Nuts', 'Bananas', 'Apples']


# creates subcategory for labels
df_cat['labeled'] = df_cat['Product'].where(df_cat['Product'].isin(highlighted))

correlations = df_cat.loc[:, 'GHG emissions': 'Stress-weighted water usage']


corr = correlations.corr(method='pearson')


fig = px.imshow(corr, text_auto='.2f', aspect='auto', color_continuous_scale='viridis_r', title='Pearson correlations between categories')
fig.update_xaxes(side="top")
fig.show()

> *Figure 5: Heat map showing the correlations between various environmental factors. Here we can clearly see strong correlation between greenhouse gas emissions, acidification and eutrophication, but not necessarily with freshwater or stress-weighted water usage.*

From this analysis, we can conclude that reducing greenhouse gas emissions may also contribute to mitigating acidification and eutrophication, as these factors are strongly correlated. However, it suggests that addressing freshwater usage and stress-weighted water usage may require a different approach. By understanding these correlations, we can make more informed decisions and prioritize products that have a small environmental impact, ultimately working towards a more sustainable and balanced food production system.
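
For readers who want to see what the heat map values mean, the sketch below computes a Pearson correlation by hand and compares it with `DataFrame.corr`. The three columns hold hypothetical values for five products, chosen only for illustration and not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical values for five products (not taken from the dataset)
toy = pd.DataFrame({
    'GHG emissions': [1.0, 2.0, 4.0, 10.0, 60.0],
    'Acidification': [5.0, 9.0, 20.0, 45.0, 300.0],
    'Freshwater': [600.0, 50.0, 2000.0, 300.0, 1500.0],
})

# Pairwise Pearson correlations, as used for the heat map
corr = toy.corr(method='pearson')

# Pearson's r by hand: covariance divided by the product of the spreads
x = toy['GHG emissions'] - toy['GHG emissions'].mean()
y = toy['Acidification'] - toy['Acidification'].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

print(corr.round(2))
print('manual r:', round(r_manual, 4))
```

The coefficient measures how closely two factors follow a straight-line relationship, on a scale from -1 to 1. It says nothing about the absolute size of either impact, which is why it is read here alongside the per-product charts.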

## The type of food we consume is the most important factor

As we delve into the data, it becomes evident that different food choices have varying ecological footprints, looking at factors such as greenhouse gas emissions, land use, freshwater consumption, and more. In this section, we explore the significance of these choices and present visual data that shows that the type of food we consume might be the most important factor in reducing our emissions.

### Some food groups are thirstier than others

In the previous section, we've observed that freshwater and stress-weighted water usage are not strongly correlated with the other environmental factors. Water usage is still an important factor, so let's explore the data and see which products are the 'thirstiest'.

In [8]:
df_land = pd.read_excel('dataset1.xls', sheet_name=2, skiprows=2,nrows=43)

names = ['starchy', 'sugars', 'legume', 'vegan alt', 'oils', 'vegetables', 'fruits', 'proc nuts', 'meat', 'animal prod', 'fish']
colors = standard_discrete

# Row ranges (start, stop) of each food group in the product table, in the order of `names`
group_slices = [(0, 7), (7, 10), (10, 13), (13, 15), (15, 20), (20, 25),
                (25, 31), (31, 33), (33, 38), (38, 41), (41, None)]

# Product names per group, with freshwater withdrawals ('Total.2')
types = [list(df_land['Product'][start:stop]) for start, stop in group_slices]
values = [list(df_land['Total.2'][start:stop]) for start, stop in group_slices]

# The same groups, with stress-weighted water use ('Total.3')
types1 = [list(df_land['Product'][start:stop]) for start, stop in group_slices]
values1 = [list(df_land['Total.3'][start:stop]) for start, stop in group_slices]

# One color per food group, in the order of `names`
group_colors = [colors[0], colors[1], colors[2], colors[3], colors[8], crop_color,
                colors[5], colors[6], meat_color, dairy_color, seafood_color]

# Traces 0-10: freshwater withdrawals, visible by default
freshwater_traces = [
    go.Bar(x=x, y=y, name=name, marker=dict(color=color), width=0.7)
    for x, y, name, color in zip(types, values, names, group_colors)
]

# Traces 11-21: stress-weighted water use, hidden until selected in the dropdown
stress_traces = [
    go.Bar(x=x, y=y, name=name, marker=dict(color=color), width=0.7, visible=False)
    for x, y, name, color in zip(types1, values1, names, group_colors)
]

# Dropdown that switches between the two sets of traces
update_menus = [
    dict(
        buttons=[
            dict(
                label="Total fresh water",
                method="update",
                args=[
                    {"visible": [True] * 11 + [False] * 11},
                    {'title.text': 'Total Freshwater Usage (L)'}
                ]
            ),
            dict(
                label="Stress-weighted water use",
                method="update",
                args=[
                    {"visible": [False] * 11 + [True] * 11},
                    {'title.text': 'Total Stress-weighted Water Usage (L eq.)'}
                ]
            )
        ],
        active=0,
        y=1.2,
        x=0.1
    )
]

layout1 = go.Layout(
    title='Average Freshwater Withdrawals (L/NU)',
    title_x=0.5,
    barmode='group',
    legend=dict(
        title='Legend',
        x=1.1,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0.7)',
        bordercolor='black',
        borderwidth=1,
        font=dict(
            family='Arial',
            size=12,
            color='black'
        )
    ),
    updatemenus=update_menus,
)

data = freshwater_traces + stress_traces
fig = go.Figure(data=data, layout=layout1)
fig.show()

> *Figure 6: Freshwater withdrawals per nutritional unit. The figure shows how many liters of water are used per nutritional unit for each product. Nuts and cheese use a lot of water relative to the nutrition they provide. Most meat substitutes are made from wheat, soy and different kinds of fungi[4]. The figure shows a substantial difference between the water used for the different kinds of meat and the water used for wheat and soymilk, which points to a large gap in water use between meat and plant-based products.*

It becomes evident that certain food choices place a heavier burden on water resources than others. These findings highlight the critical role of consumer choices in reducing water consumption and fostering sustainability. By opting for water-efficient alternatives, we can contribute to conserving water resources and building a more environmentally conscious future.

### Vegan is better

The data presented underscores the significant ecological impact of meat consumption. Among the various meat options, lamb and mutton have the highest ecological footprint. The non-vegan products, in general, exhibit an ecological footprint several times larger than that of vegan products.

In [9]:
#Mauro
non_vegan_products = ['Barley (Beer)','Cane Sugar','Milk','Cheese','Eggs','Fish (farmed)','Crustaceans (farmed)','Bovine Meat (beef herd)','Bovine Meat (dairy herd)','Lamb & Mutton','Pig Meat','Poultry Meat']
meat_products = ['Bovine Meat (beef herd)', 'Bovine Meat (dairy herd)', 'Lamb & Mutton', 'Pig Meat','Poultry Meat','Fish (farmed)','Crustaceans (farmed)']
dairy_products = ['Milk','Cheese','Eggs']
other_non_vegan_products = ['Cane Sugar','Crustaceans (farmed)','Barley (Beer)']

df_all_products = pd.read_excel("dataset1.xls", sheet_name="Results - Retail Weight", skiprows=2,nrows=43, index_col=None, na_values=["NA"])

#non vegan products
non_vegan = df_all_products["Product"].isin(non_vegan_products)
meat = df_all_products["Product"].isin(meat_products)
dairy = df_all_products["Product"].isin(dairy_products)
other_non_vegan = df_all_products["Product"].isin(other_non_vegan_products)

df_non_vegan_products = df_all_products[non_vegan]
df_meat_products = df_all_products[meat]
df_dairy_products = df_all_products[dairy]
df_other_non_vegan_products = df_all_products[other_non_vegan]

#non vegan
df_non_vegan_products_land_use = df_non_vegan_products.iloc[:, [0, 3]]
df_non_vegan_products_ghg_2013 = df_non_vegan_products.iloc[:, [0, 9]]
df_non_vegan_products_ghg_2007 = df_non_vegan_products.iloc[:, [0, 15]]
df_non_vegan_products_acid = df_non_vegan_products.iloc[:, [0, 21]]
df_non_vegan_products_eutro = df_non_vegan_products.iloc[:, [0, 27]]
df_non_vegan_products_fresh_water_withdraw = df_non_vegan_products.iloc[:, [0, 33]]
df_non_vegan_products_stress_water_use = df_non_vegan_products.iloc[:, [0, 39]]

#meat products
df_meat_products_land_use = df_meat_products.iloc[:, [0, 3]]
df_meat_products_ghg_2013 = df_meat_products.iloc[:, [0, 9]]
df_meat_products_ghg_2007 = df_meat_products.iloc[:, [0, 15]]
df_meat_products_acid = df_meat_products.iloc[:, [0, 21]]
df_meat_products_eutro = df_meat_products.iloc[:, [0, 27]]
df_meat_products_fresh_water_withdraw = df_meat_products.iloc[:, [0, 33]]
df_meat_products_stress_water_use = df_meat_products.iloc[:, [0, 39]]
#dairy
df_dairy_products_land_use = df_dairy_products.iloc[:, [0, 3]]
df_dairy_products_ghg_2013 = df_dairy_products.iloc[:, [0, 9]]
df_dairy_products_ghg_2007 = df_dairy_products.iloc[:, [0, 15]]
df_dairy_products_acid = df_dairy_products.iloc[:, [0, 21]]
df_dairy_products_eutro = df_dairy_products.iloc[:, [0, 27]]
df_dairy_products_fresh_water_withdraw = df_dairy_products.iloc[:, [0, 33]]
df_dairy_products_stress_water_use = df_dairy_products.iloc[:, [0, 39]]

#other non vegan
df_other_non_vegan_products_land_use = df_other_non_vegan_products.iloc[:, [0, 3]]
df_other_non_vegan_products_ghg_2013 = df_other_non_vegan_products.iloc[:, [0, 9]]
df_other_non_vegan_products_ghg_2007 = df_other_non_vegan_products.iloc[:, [0, 15]]
df_other_non_vegan_products_acid = df_other_non_vegan_products.iloc[:, [0, 21]]
df_other_non_vegan_products_eutro = df_other_non_vegan_products.iloc[:, [0, 27]]
df_other_non_vegan_products_fresh_water_withdraw = df_other_non_vegan_products.iloc[:, [0, 33]]
df_other_non_vegan_products_stress_water_use = df_other_non_vegan_products.iloc[:, [0, 39]]


#vegan products
grains_products = ['Wheat & Rye (Bread)', 'Maize (Meal)', 'Oatmeal', 'Rice']
vegetables_products = ['Potatoes', 'Cassava', 'Tomatoes', 'Onions & Leeks', 'Root Vegetables', 'Brassicas', 'Other Vegetables']
fruits_products = ['Citrus Fruit', 'Bananas', 'Apples', 'Berries & Grapes', 'Other Fruit']
legumes_products = ['Other Pulses', 'Peas', 'Nuts', 'Groundnuts']
plant_based_alternatives_products = ['Soymilk', 'Tofu']
oils_products = ['Soybean Oil', 'Sunflower Oil', 'Olive Oil']
others_products = ['Coffee', 'Dark Chocolate']

vegan = ~df_all_products['Product'].isin(non_vegan_products)
grains = df_all_products["Product"].isin(grains_products)
vegetables = df_all_products["Product"].isin(vegetables_products)
fruits = df_all_products["Product"].isin(fruits_products)
legumes = df_all_products["Product"].isin(legumes_products)
plant_based_alternatives = df_all_products["Product"].isin(plant_based_alternatives_products)
oils = df_all_products["Product"].isin(oils_products)
others = df_all_products["Product"].isin(others_products)


df_vegan_products = df_all_products[vegan]
df_grains_products = df_all_products[grains]
df_vegetables_products = df_all_products[vegetables]
df_fruits_products = df_all_products[fruits]
df_legumes_products = df_all_products[legumes]
df_plant_based_alternatives_products = df_all_products[plant_based_alternatives]
df_oils_products = df_all_products[oils]
df_others_products = df_all_products[others]

#vegan
df_vegan_products_land_use = df_vegan_products.iloc[:, [0, 3]]
df_vegan_products_ghg_2013 = df_vegan_products.iloc[:, [0, 9]]
df_vegan_products_ghg_2007 = df_vegan_products.iloc[:, [0, 15]]
df_vegan_products_acid = df_vegan_products.iloc[:, [0, 21]]
df_vegan_products_eutro = df_vegan_products.iloc[:, [0, 27]]
df_vegan_products_fresh_water_withdraw = df_vegan_products.iloc[:, [0, 33]]
df_vegan_products_stress_water_use = df_vegan_products.iloc[:, [0, 39]]

# Define the data for the bar plots.
# Each list has one entry per category, in the same order as `categories`.
# The IPCC 2007 GHG column ("Mean.2") is left out so that values and labels stay aligned.
categories = ['Land use', 'GHG emission', 'Acidification', 'Eutrophication', 'Fresh Water Withdrawal', 'Stress Water Use']
vegan_values = [df_vegan_products_land_use["Mean"].sum(),
                df_vegan_products_ghg_2013["Mean.1"].sum(),
                df_vegan_products_acid["Mean.3"].sum(),
                df_vegan_products_eutro["Mean.4"].sum(),
                df_vegan_products_fresh_water_withdraw["Mean.5"].sum(),
                df_vegan_products_stress_water_use["Mean.6"].sum()]
non_vegan_values = [df_non_vegan_products_land_use["Mean"].sum(),
                    df_non_vegan_products_ghg_2013["Mean.1"].sum(),
                    df_non_vegan_products_acid["Mean.3"].sum(),
                    df_non_vegan_products_eutro["Mean.4"].sum(),
                    df_non_vegan_products_fresh_water_withdraw["Mean.5"].sum(),
                    df_non_vegan_products_stress_water_use["Mean.6"].sum()]
yaxvalues = ['(m2/FU)', '(kg CO2eq/FU)', '(g SO2eq/FU)', '(g PO43-eq/FU)', '(L/FU)', '(L/FU)']


labels = ["", "Non-vegan", "Vegan"] + list(df_non_vegan_products["Product"]) + list(df_vegan_products["Product"])
parents = ["", "", ""] + ["Non-vegan"] * len(df_non_vegan_products) + ["Vegan"] * len(df_vegan_products)

# Creating the figure
specs_list1 = []
for i in range(3):
    specs_list1.append({'type': 'bar'})
specs_list = [specs_list1, specs_list1]

fig = make_subplots(rows=2, cols=3, specs=specs_list, vertical_spacing=0.2)

# Creating and adding the bar plots to the figure (three per row, filled left to right)
for idx, (category, vegan_value, non_vegan_value) in enumerate(zip(categories, vegan_values, non_vegan_values)):
    row, col = idx // 3 + 1, idx % 3 + 1
    fig.add_trace(go.Bar(
        x=['Vegan', 'Non-vegan'],
        y=[vegan_value, non_vegan_value],
        name=category,
        marker_color=['#35A85A', '#ff1a0f']
    ), row=row, col=col)
    fig.update_xaxes(title_text=category, row=row, col=col)
    fig.update_yaxes(title_text=yaxvalues[idx], row=row, col=col)

fig.update_layout(width = 1200, height=800,showlegend=False, title='Vegan vs Non-Vegan', title_x=0.5)
fig.show()

> *Figure 7: Bar plots of vegan vs. non-vegan environmental impact. These figures show the significant difference in land use and greenhouse gas emissions between vegan and non-vegan products. Land use is given in square meters per functional unit (FU) and GHG emissions in kg CO2 equivalent per functional unit. The figures are meant to support the perspective that the type of food matters more than the sourcing; we need to evaluate what it is we eat, not necessarily where it’s from.*
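
The bars in Figure 7 are sums of the per-product means, so the totals also depend on how many products fall into each group. As a complementary check, the group averages can be compared as well. The following is a minimal sketch of that idea: the product names and numbers are made-up placeholders (not values from the dataset), and `group_summary` is a hypothetical helper that does not appear elsewhere in this notebook.

```python
from statistics import mean

# Hypothetical land-use values (m2 per functional unit); NOT taken from the dataset.
land_use = {
    "Product A": ("Non-vegan", 40.0),
    "Product B": ("Non-vegan", 20.0),
    "Product C": ("Vegan", 2.0),
    "Product D": ("Vegan", 4.0),
    "Product E": ("Vegan", 6.0),
}


def group_summary(values):
    """Return {group: (total, average)} for an iterable of (group, value) pairs."""
    grouped = {}
    for group, value in values:
        grouped.setdefault(group, []).append(value)
    return {group: (sum(vals), mean(vals)) for group, vals in grouped.items()}


summary = group_summary(land_use.values())

# Ratio of non-vegan to vegan impact, once by group total and once by group average
total_ratio = summary["Non-vegan"][0] / summary["Vegan"][0]
average_ratio = summary["Non-vegan"][1] / summary["Vegan"][1]

print(f"ratio of totals:   {total_ratio:.1f}")
print(f"ratio of averages: {average_ratio:.1f}")
```

With unequal group sizes the two ratios differ, which is why it can be useful to report both.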

## Explore the data

We highly encourage you to explore the data yourself to see what products have the highest environmental impact. The dropdown menu in the following scatter plot will let you swap freely between various environmental factors.

In [10]:
fig = px.scatter(
    df_cat, 
    x="Amount produced",
    y="GHG emissions",
    log_x=True,
    height=500,
    text='labeled',
    # log_y=True,
    # size="Total impact",
    size_max=60,
    hover_data=['Product'],
    title="GHG impact and global production"
    )

df_filtered = df_cat[df_cat['Product'].isin(highlighted)]

annotation= [{
    'x': np.log10(df_cat.loc[df_cat['Product'] == name, 'Amount produced'].iloc[0]),
    'y': df_cat.loc[df_cat['Product'] == name, 'GHG emissions'].iloc[0],
    'text': name,  # text
    'showarrow': True,  # would you want to see arrow
} for name in highlighted]


updatemenus = [
    {
        "buttons": [
            {
                "label": col,
                "method": "update",
                "args": [
                    {"y": [df_cat[col]]},
                    {"yaxis": {"title": {"text": col}}}
                ],
            }
            for col in list(df_cat.loc[:, 'GHG emissions': 'Stress-weighted water usage'].columns)
        ],
        "x": 0,
        "y": 1.2,
    }
]


fig.update_layout(updatemenus=updatemenus, title_x=0.5, yaxis_title='Emissions (kg CO2eq / kg)', xaxis_title='Global production (tonnes)')

fig.update_traces(marker={
        "color": pd.Categorical(df_cat["Type"]).codes,
        "colorscale": [(0,crop_color), (0.33,dairy_color), (0.66,meat_color), (1,seafood_color)]
    },
    marker_size=10,
    marker_opacity=df_cat['labeled'].notnull().map({True: 0.8, False: 0.35}).values,
    textposition="top center",
    hoverinfo="name+x+y",
    hovertemplate='Amount produced=%{x}<br>Total=%{y}<br>Product=%{customdata[0]}<extra></extra>',
    textfont={'size': 12}
)

# adds custom legend
types = ['Crops', 'Seafood', 'Dairy', 'Meat']
colors = [crop_color, seafood_color, dairy_color, meat_color]
for t, color in zip(types,colors):
    fig.add_trace(
        go.Scatter(
            x=[None], y=[None],
            mode='markers',
            marker=dict(
                size=10,
                color=color
            ),
            showlegend=True,
            name=t
        )
    )

fig.show()

> *Figure 8: Scatter plot of global production (in tonnes) and various environmental factors (in kg eq / kg, or L / kg for water usage). The dropdown menu allows the user to view different environmental factors on the y-axis.*
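
The dropdown in Figure 8 works by giving each button a new set of y-values plus a matching axis title. The sketch below isolates that mechanism using plain dictionaries, so it runs without Plotly installed. `build_dropdown` is a hypothetical helper and the metric values are placeholders, not figures from the dataset.

```python
def build_dropdown(columns, x=0, y=1.2):
    """Build a Plotly-style `updatemenus` list that swaps the y-axis data.

    `columns` maps a button label to the sequence of y-values for that metric.
    """
    buttons = [
        {
            "label": label,
            "method": "update",
            "args": [
                {"y": [list(values)]},                  # data update: new y-values
                {"yaxis": {"title": {"text": label}}},  # layout update: axis title
            ],
        }
        for label, values in columns.items()
    ]
    return [{"buttons": buttons, "x": x, "y": y}]


# Hypothetical metrics for three products (placeholder numbers)
metrics = {
    "GHG emissions": [1.0, 2.0, 3.0],
    "Land use": [4.0, 5.0, 6.0],
}

updatemenus = build_dropdown(metrics)
print([button["label"] for button in updatemenus[0]["buttons"]])
```

The returned list has the same shape as the `updatemenus` variable passed to `fig.update_layout(updatemenus=...)` in the cell above.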

## Conclusion

Conclusion WIP

## Reflection

Following the feedback session, we improved the legend of the first plot as well as the readability.
We made sure to use a consistent style and color scheme for all figures. 
For each plot, we made sure it relates to an argument that supports one of the perspectives described in the introduction.
Furthermore, we limited every dropdown menu to at most 5 options, and limited the number of variables shown in each individual plot.
Aside from this, we continued working on our data story, adding supporting text to each visualisation and improving some minor visual details as discussed during the feedback session.

## Work Distribution

#### Ardjano Mark 14713926
Visualisation 1-5, 8, came up with the perspectives, animation
#### Daan Huisman 14650797
Visualisation 6, plot styles, making sure the visualisations look good and consistent
#### Ivo de Brouwer 11045841
Visualisation 4, 8, introduction/dataset and preprocessing/reflection sections, supporting text under figures
#### Mauro Dieters 14533391
Visualisation 7, helped with most other visualisations

---

Every team member helped to some extent with all visualisations as well as editing the supporting text.

## References

[1] https://www.theguardian.com/environment/2021/aug/10/code-red-for-humanity-what-the-papers-say-about-the-ipcc-report-on-the-climate-crisis

[2] https://ourworldindata.org/greenhouse-gas-emissions-food

[3] https://www.science.org/doi/10.1126/science.aaq0216

[4] https://www.milieucentraal.nl/eten-en-drinken/milieubewust-eten/vleesvervangers/

## Appendix

Generative AI (ChatGPT/Bing Chat with GPT-4) was used to facilitate the creation of this document, as shown in the table below.

| Reasons of Usage | In which parts? | Which prompts were used? |
| ------------------------ | --------------------------------- | -------------------------------------------- |
| Brainstorm research questions and identify keywords for further search | The entire project framing | "Give keywords about the current debate in climate change with brief explanations" |
| Improve writing clarity and enhance readability | All sections | "Edit the following text to make it more clear. Do not alter the meaning." |
| Enhance readability | All sections | "Revise the paragraph to improve readability." |
| Ensure grammatical accuracy |  All sections | "Correct any grammatical errors in the text." |
| Provide alternative phrasing | Descriptions of the perspectives | "Suggest alternative phrases for better clarity." |
| Optimize sentence structure | All sections | "Restructure the sentence for better flow." |
| Condense lengthy sentences | All sections | "Simplify the following sentences without losing important information."|

> *Table 1: Usage of generative AI to facilitate the creation of this document.*