In [24]:
import pandas as pd
import numpy as np
import plotly as py
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings('ignore')

# Palm Oil: Debunking Myths and Exploring Solutions

Student names: Maywand Hashemi, Joren Melissen, Mikail van Dartel and Aylwin Gall
Team number: I5

## Introduction

With the emergence of more and more palm oil-free products, it seems that palm oil is almost a thing of the past. Spurred on by horror stories of orangutans without trees and burning trees giving way to rows of oil palm trees, more and more people are opting for these products. This seems a very reasonable choice. However, it could be that the solution to this problem takes a more counter-intuitive approach: choose palm oil instead.

Using various charts, we will try to present relevant data in as clear a way as possible, so that you can learn more about these two sides of the discussion and make your own choice on which option is best.

The first perspective is that we must stop the production of palm oil, because it leads to deforestation and emits more CO2 per kg of product than other vegetable oils. The second perspective is that palm oil is actually more efficient in land use, and replacing it with other oils might make deforestation worse. Thus the problem lies with the global demand for vegetable oils, not with palm oil in particular.

In this analysis, we will concentrate exclusively on Malaysia and Indonesia, as these two countries are the predominant producers of palm oil globally. Together, they account for nearly 85% of the world's palm oil supply, making them the most significant players in the industry.

In [25]:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px

# Load the dataset
df = pd.read_csv('../Datasets/palm-oil-production.csv')

# Filter data for the year 2021
df_2021 = df.loc[df['Year'] == 2021]

# Sort by production to get the top 5 entries
df_2021_top5 = df_2021.sort_values(by='Production', ascending=False).head(5)

# Calculate the total production of the remaining entries
others_production = df_2021.sort_values(by='Production', ascending=False).iloc[5:]['Production'].sum()

# Create a new DataFrame for the 'Other' category
df_others = pd.DataFrame({'Entity': ['Other'], 'Production': [others_production]})

# Concatenate the top 5 entries with the 'Other' category
df_2021_final = pd.concat([df_2021_top5, df_others], ignore_index=True)

# Define labels and values for the donut chart
labels = df_2021_final['Entity']
values = df_2021_final['Production']

# Define colors for the producers using provided palette
colors = px.colors.qualitative.T10

# Create the donut chart using Plotly
fig = go.Figure(data=[go.Pie(
    labels=labels, 
    values=values, 
    hole=.7,  # Set the hole size for the donut chart
    textinfo='label+percent',
    textposition='outside',
    insidetextorientation='radial',
    marker=dict(colors=colors)
)])

fig.update_layout(
    title_text='Palm oil production in 2021',
    showlegend=False,
    height=600 
)

# Show the plot
fig.show()


## Dataset and Preprocessing

### Datasets

The following datasets were used:

https://ourworldindata.org/grapher/oil-yield-by-crop?tab=table

https://www.kaggle.com/datasets/karnikakapoor/global-forest-data-2001-2022

https://ourworldindata.org/palm-oil

https://ourworldindata.org/grapher/vegetable-oil-production

### Preprocessing

We filter the global data within the separate code cells. Most of the data is restricted to Indonesia and Malaysia and to the years 2000-2023.
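
A minimal sketch of restricting a long-format table to Indonesia and Malaysia and the years 2000-2023 might look as follows. The column names `Entity`, `Year` and `Production` follow the production dataset loaded in the introduction; the tiny in-memory table and its values are illustrative placeholders, not real data.

```python
import pandas as pd

# Illustrative placeholder rows (not real production figures)
df = pd.DataFrame({
    'Entity': ['Indonesia', 'Malaysia', 'Thailand', 'Indonesia', 'Malaysia'],
    'Year': [1999, 2000, 2005, 2010, 2023],
    'Production': [1.0, 2.0, 3.0, 4.0, 5.0],
})

countries = ['Indonesia', 'Malaysia']

# Keep only the two countries and the years 2000-2023 (inclusive)
mask = df['Entity'].isin(countries) & df['Year'].between(2000, 2023)
filtered = df.loc[mask].reset_index(drop=True)

print(filtered)
```

Building the mask once keeps the country and year conditions in a single place, which makes it easier to reuse the same selection across cells.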

## Perspective 1: We must stop the production of palm oil

### Argument 1: It leads to deforestation

As the following graphs show, palm oil production in Indonesia and Malaysia has been steadily increasing over the years, with total production in 2023 more than three times that of 2000.

In [49]:
import pandas as pd
import plotly.express as px

# Load the data from the provided CSV file
file_path = '../Datasets/palm-oil-production.csv'
data = pd.read_csv(file_path)

# Aggregate data to get the total production for each country per year
total_production_by_year = data.groupby(['Entity', 'Year'])['Production'].sum().reset_index()

# Sort the data by Year to ensure correct order in the animation
total_production_by_year = total_production_by_year.sort_values(by='Year')

# Custom color scale on a normalized 0-1 range; range_color below maps it onto 0 to 50 million
color_scale = [
    (0.0, "rgb(255,182,193)"),  # Light Pink
    (0.2, "rgb(255,105,180)"),  # Hot Pink
    (0.4, "rgb(255,0,0)"),      # Red
    (0.6, "rgb(139,0,0)"),      # Dark Red
    (0.8, "rgb(105,0,0)"),      # Darker Red
    (1.0, "rgb(50,0,0)")        # Very Dark Red
]

# Create a choropleth map using Plotly with animation
fig = px.choropleth(
    total_production_by_year,
    locations="Entity",
    locationmode="country names",
    color="Production",
    hover_name="Entity",
    animation_frame="Year",
    color_continuous_scale=color_scale,
    range_color=(0, 50e6),
    title="Palm Oil Production by Country per Year"
)

# Update the layout to focus on Southeast Asia
fig.update_geos(
    projection_scale=4,  # This controls the zoom level
    center={"lat": 2.5, "lon": 115},  # Center the map roughly around Indonesia/Malaysia
    visible=False  # Hide the frame
)

fig.update_layout(
    # height=600,
    # width=800
)

# Show the map
fig.show()


In [35]:
import pandas as pd
import plotly.graph_objs as go

# Load palm oil production data
palm = pd.read_csv('../Datasets/Oils/archive/cleaned/palm.csv')

# Filter data for Indonesia and Malaysia and years >= 2000
df_Indo = palm[(palm['Country'] == 'Indonesia') & (palm['Year'] >= 2000)]
df_Malaysia = palm[(palm['Country'] == 'Malaysia') & (palm['Year'] >= 2000)]

# Extract production values and years
xi = df_Indo['Production'].values
xm = df_Malaysia['Production'].values
years = df_Indo['Year'].values  # Both Indo and Malaysia have the same years

# Calculate total production
total_production = xi + xm

# Create traces for Plotly
trace_indo = go.Scatter(x=years, y=xi, mode='lines', name='Indonesia')
trace_malaysia = go.Scatter(x=years, y=xm, mode='lines', name='Malaysia')
trace_total = go.Scatter(x=years, y=total_production, mode='lines', name='Indonesia + Malaysia')

# Layout
layout = go.Layout(
    title='Palm Oil Production in Indonesia and Malaysia (2000-2023)',
    xaxis=dict(title='Year'),
    yaxis=dict(title='Production'),
    legend=dict(x=0.1, y=1.1, orientation='h')
)

# Create figure
fig = go.Figure(data=[trace_indo, trace_malaysia, trace_total], layout=layout)

# Show plot
fig.show()
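
The element-wise sum in the cell above relies on both countries covering exactly the same years in the same order. A minimal sketch of a more defensive variant is to pivot on `Year` first, so that the addition is aligned by year rather than by position. The column names `Country`, `Year` and `Production` follow the cell above; the values are illustrative placeholders, not real data.

```python
import pandas as pd

# Illustrative placeholder rows (not real production figures);
# Malaysia is deliberately missing the year 2002
palm = pd.DataFrame({
    'Country': ['Indonesia', 'Indonesia', 'Indonesia', 'Malaysia', 'Malaysia'],
    'Year': [2000, 2001, 2002, 2000, 2001],
    'Production': [10.0, 12.0, 15.0, 8.0, 9.0],
})

# One row per year, one column per country
wide = palm.pivot(index='Year', columns='Country', values='Production')

# Sum by year; skipna=False leaves NaN where either country has no data
wide['Total'] = wide[['Indonesia', 'Malaysia']].sum(axis=1, skipna=False)

print(wide)
```

With this layout a missing year shows up as a gap in the total instead of silently shifting one country's values against the other's.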


At the same time, as the next graph shows, cumulative tree cover loss has also been steadily increasing since 2001.

In [28]:
import pandas as pd
import plotly.graph_objs as go

# Load the dataset
df = pd.read_csv('../Datasets/Deforestation/Subnational 1 tree cover loss.csv')

# Define the years to plot
years = [f'tc_loss_ha_{year}' for year in range(2001, 2023)]

# Mapping dictionary from Indonesian to English names for Kalimantan provinces
province_mapping = {
    'Kalimantan Barat': 'West Kalimantan',
    'Kalimantan Tengah': 'Central Kalimantan',
    'Kalimantan Timur': 'East Kalimantan',
    'Kalimantan Selatan': 'South Kalimantan'
}

# Create an empty DataFrame to store totals
totals_df = pd.DataFrame(columns=['year', 'cumulative_tc_loss_ha'])

# Loop through each province
for province_indo, province_eng in province_mapping.items():
    # Filter the DataFrame for the current province and threshold == 0
    province_df = df[(df['subnational1'] == province_indo) & (df['threshold'] == 0)]
    
    # Keep only relevant columns
    columns_to_keep = ['subnational1'] + years
    province_df = province_df[columns_to_keep]
    
    # Melt the DataFrame to long format
    province_df_long = province_df.melt(id_vars='subnational1', var_name='year', value_name='tc_loss_ha')
    
    # Extract year from the column name and convert to integer
    province_df_long['year'] = province_df_long['year'].str.extract(r'(\d+)').astype(int)
    
    # Sort the DataFrame by year
    province_df_long = province_df_long.sort_values(by='year')
    
    # Calculate cumulative tree loss
    province_df_long['cumulative_tc_loss_ha'] = province_df_long['tc_loss_ha'].cumsum()
    
    # Append to totals_df
    totals_df = pd.concat([totals_df, province_df_long[['year', 'cumulative_tc_loss_ha']]], ignore_index=True)

# Calculate the total cumulative tree loss across all provinces
totals_df = totals_df.groupby('year')['cumulative_tc_loss_ha'].sum().reset_index()

# Initialize Figure using Plotly
fig = go.Figure()

# Plot cumulative tree cover loss for Sarawak
sarawak_df = df[(df['subnational1'] == 'Sarawak') & (df['threshold'] == 0)]
sarawak_df = sarawak_df[years]
sarawak_df_long = sarawak_df.melt(var_name='year', value_name='tc_loss_ha')
sarawak_df_long['year'] = sarawak_df_long['year'].str.extract(r'(\d+)').astype(int)
sarawak_df_long = sarawak_df_long.sort_values(by='year')
sarawak_df_long['cumulative_tc_loss_ha'] = sarawak_df_long['tc_loss_ha'].cumsum()

fig.add_trace(go.Scatter(x=sarawak_df_long['year'], 
                         y=sarawak_df_long['cumulative_tc_loss_ha'], 
                         mode='lines', 
                         name='Sarawak'))

# Plot cumulative tree cover loss for Sabah
sabah_df = df[(df['subnational1'] == 'Sabah') & (df['threshold'] == 0)]
sabah_df = sabah_df[years]
sabah_df_long = sabah_df.melt(var_name='year', value_name='tc_loss_ha')
sabah_df_long['year'] = sabah_df_long['year'].str.extract(r'(\d+)').astype(int)
sabah_df_long = sabah_df_long.sort_values(by='year')
sabah_df_long['cumulative_tc_loss_ha'] = sabah_df_long['tc_loss_ha'].cumsum()

fig.add_trace(go.Scatter(x=sabah_df_long['year'], 
                         y=sabah_df_long['cumulative_tc_loss_ha'], 
                         mode='lines', 
                         name='Sabah'))

# Combine Sarawak and Sabah into 'East Malaysia'
east_malaysia_df = pd.DataFrame()
east_malaysia_df['year'] = sarawak_df_long['year']
east_malaysia_df['tc_loss_ha'] = sarawak_df_long['tc_loss_ha'] + sabah_df_long['tc_loss_ha']
east_malaysia_df['cumulative_tc_loss_ha'] = sarawak_df_long['cumulative_tc_loss_ha'] + sabah_df_long['cumulative_tc_loss_ha']

fig.add_trace(go.Scatter(x=east_malaysia_df['year'], 
                         y=east_malaysia_df['cumulative_tc_loss_ha'], 
                         mode='lines', 
                         name='East Malaysia'))

# Plot cumulative tree cover loss for Kalimantan provinces
for province_indo, province_eng in province_mapping.items():
    # Filter the DataFrame for the current province and threshold == 0
    province_df = df[(df['subnational1'] == province_indo) & (df['threshold'] == 0)]
    
    # Keep only relevant columns
    province_df = province_df[years]
    
    # Melt the DataFrame to long format
    province_df_long = province_df.melt(var_name='year', value_name='tc_loss_ha')
    province_df_long['year'] = province_df_long['year'].str.extract(r'(\d+)').astype(int)
    province_df_long = province_df_long.sort_values(by='year')
    
    # Calculate cumulative tree loss
    province_df_long['cumulative_tc_loss_ha'] = province_df_long['tc_loss_ha'].cumsum()
    
    # Add trace for the current province
    fig.add_trace(go.Scatter(x=province_df_long['year'], 
                             y=province_df_long['cumulative_tc_loss_ha'], 
                             mode='lines', 
                             name=province_eng))

# Add trace for total cumulative tree cover loss in Kalimantan
fig.add_trace(go.Scatter(x=totals_df['year'], 
                         y=totals_df['cumulative_tc_loss_ha'], 
                         mode='lines', 
                         name='Total for Kalimantan'))

# Update layout
fig.update_layout(title='Cumulative tree cover loss on Borneo per country and province (2001-2022)',
                  xaxis_title='Year',
                  yaxis_title='Cumulative Tree Cover Loss (ha)',
                  legend_title='Province',
                  hovermode='x unified',
                  template='plotly_white')

# Show the plot
fig.show()
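
The per-province reshaping in the cell above can also be expressed in one pass. A minimal sketch, assuming the same column layout (`subnational1`, `threshold`, and one `tc_loss_ha_<year>` column per year); the values are illustrative placeholders, and only two years are shown.

```python
import pandas as pd

# Illustrative placeholder rows (not real tree cover loss figures)
df = pd.DataFrame({
    'subnational1': ['Sabah', 'Sarawak', 'Sabah'],
    'threshold': [0, 0, 30],
    'tc_loss_ha_2001': [100.0, 200.0, 90.0],
    'tc_loss_ha_2002': [150.0, 250.0, 120.0],
})

loss_cols = [c for c in df.columns if c.startswith('tc_loss_ha_')]

# Wide to long: one row per (province, year), threshold == 0 only
long_df = (
    df[df['threshold'] == 0]
    .melt(id_vars='subnational1', value_vars=loss_cols,
          var_name='year', value_name='tc_loss_ha')
)
long_df['year'] = long_df['year'].str.extract(r'(\d+)$', expand=False).astype(int)
long_df = long_df.sort_values(['subnational1', 'year'])

# Running total within each province
long_df['cumulative_tc_loss_ha'] = (
    long_df.groupby('subnational1')['tc_loss_ha'].cumsum()
)

print(long_df)
```

Grouping by province before taking the cumulative sum avoids repeating the melt and sort steps for every province.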


There seems to be a strong negative correlation between palm oil production and total tree cover. This can be seen in the next graph, which contains the data for Indonesia.

In [None]:
import pandas as pd
import plotly.graph_objects as go
from scipy import interpolate
from plotly.subplots import make_subplots

# Load palm oil production data
palm = pd.read_csv('../Datasets/Oils/archive/cleaned/palm.csv')

# Filter data for Indonesia and years >= 2000
df_Indo = palm[(palm['Country'] == 'Indonesia') & (palm['Year'] >= 2000)]

# Extract production values and years
xi = df_Indo['Production'].values
years_palm = df_Indo['Year'].values

# Load the dataset for Kalimantan forest cover
df = pd.read_csv('../Datasets/Deforestation/Subnational 1 tree cover loss.csv')

# Filter for Kalimantan provinces and threshold == 0
provinces = ['Kalimantan Barat', 'Kalimantan Tengah', 'Kalimantan Timur', 'Kalimantan Selatan']
kalimantan_df = df[(df['subnational1'].isin(provinces)) & (df['threshold'] == 0)]

# Calculate initial tree cover in 2000
initial_extent_2000 = kalimantan_df['extent_2000_ha'].sum()

# Calculate tree cover gain spread over 22 years
gain_spread = (kalimantan_df['gain_2000-2020_ha'].sum() / 22)

# Calculate net annual tree cover loss (loss minus evenly spread gain) from 2001 to 2022
years_tree_cover = range(2001, 2023)
cumulative_losses = []
for year in years_tree_cover:
    loss_column = f'tc_loss_ha_{year}'
    cumulative_loss = (kalimantan_df[loss_column].sum() - gain_spread)
    cumulative_losses.append(cumulative_loss)

# Calculate total tree cover per year
annual_tree_cover = []
for i, year in enumerate(years_tree_cover):
    total_tree_cover = initial_extent_2000 - sum(cumulative_losses[:i + 1])
    annual_tree_cover.append(total_tree_cover)

# Interpolate palm oil production to match the years_tree_cover
interp_func = interpolate.interp1d(years_palm, xi, kind='linear', fill_value='extrapolate')
xi_interpolated = interp_func(list(years_tree_cover))

# Create subplots with two y-axes
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add total tree cover trace
fig.add_trace(go.Scatter(x=list(years_tree_cover), y=annual_tree_cover,
                    mode='lines',
                    name='Total Tree Cover (Kalimantan)',
                    line=dict(color='blue')),
                    secondary_y=False)

# Add palm oil production trace
fig.add_trace(go.Scatter(x=list(years_tree_cover), y=xi_interpolated,
                    mode='lines',
                    name='Palm Oil Production (Indonesia)',
                    line=dict(color='green')),
                    secondary_y=True)

# Update layout
fig.update_layout(title='Palm Oil Production vs Total Tree Cover in Kalimantan (2000-2022)',
                  xaxis_title='Year',
                  plot_bgcolor='white')

# Set y-axis titles
fig.update_yaxes(title_text="Total Tree Cover (ha)", secondary_y=False)
fig.update_yaxes(title_text="Palm Oil Production (tonnes)", secondary_y=True)

# Show figure
fig.show()


In [48]:
import pandas as pd
import plotly.graph_objs as go

# Load the dataset for tree cover loss
df = pd.read_csv('../Datasets/Deforestation/Subnational 1 tree cover loss.csv')
# Define the years to plot
years = [f'tc_loss_ha_{year}' for year in range(2001, 2023)]

# Mapping dictionary from Indonesian to English names for Kalimantan provinces
province_mapping = {
    'Kalimantan Barat': 'West Kalimantan',
    'Kalimantan Tengah': 'Central Kalimantan',
    'Kalimantan Timur': 'East Kalimantan',
    'Kalimantan Selatan': 'South Kalimantan'
}

# Create an empty DataFrame to store totals
totals_df = pd.DataFrame(columns=['year', 'cumulative_tc_loss_ha'])

# Loop through each province
for province_indo, province_eng in province_mapping.items():
    # Filter the DataFrame for the current province and threshold == 0
    province_df = df[(df['subnational1'] == province_indo) & (df['threshold'] == 0)]
    
    # Keep only relevant columns
    columns_to_keep = ['subnational1'] + years
    province_df = province_df[columns_to_keep]
    
    # Melt the DataFrame to long format
    province_df_long = province_df.melt(id_vars='subnational1', var_name='year', value_name='tc_loss_ha')
    
    # Extract year from the column name and convert to integer
    province_df_long['year'] = province_df_long['year'].str.extract(r'(\d+)').astype(int)
    
    # Sort the DataFrame by year
    province_df_long = province_df_long.sort_values(by='year')
    
    # Calculate cumulative tree loss
    province_df_long['cumulative_tc_loss_ha'] = province_df_long['tc_loss_ha'].cumsum()
    
    # Append to totals_df
    totals_df = pd.concat([totals_df, province_df_long[['year', 'cumulative_tc_loss_ha']]], ignore_index=True)

# Calculate the total cumulative tree loss across all provinces
totals_df = totals_df.groupby('year')['cumulative_tc_loss_ha'].sum().reset_index()

# Load palm oil production data
palm = pd.read_csv('../Datasets/Oils/archive/cleaned/palm.csv')

# Filter data for Indonesia and years >= 2000
df_Indo = palm[(palm['Country'] == 'Indonesia') & (palm['Year'] >= 2000)]

# Ensure both datasets are aligned by year
merged_df = pd.merge(totals_df, df_Indo[['Year', 'Production']], left_on='year', right_on='Year')

# Create 3D line plot
trace_3d = go.Scatter3d(
    x=merged_df['year'],
    y=merged_df['cumulative_tc_loss_ha'],
    z=merged_df['Production'],
    mode='lines+markers',
    line=dict(
        color='blue',
        width=3  # Increase the line width
    )
)

layout_3d = go.Layout(
    title='Palm oil production vs. cumulative tree cover loss in Kalimantan (2001-2022)',
    scene=dict(
        xaxis=dict(title='Year'),
        yaxis=dict(title='Cumulative Tree Cover Loss (ha)'),
        zaxis=dict(title='Palm Oil Production')
    ),
    # width=800,  # Increase the width of the plot
    # height=500   # Increase the height of the plot
)

fig3 = go.Figure(data=[trace_3d], layout=layout_3d)

# Show 3D plot
fig3.show()


In [30]:
import pandas as pd

# Load palm oil production data
palm = pd.read_csv('../Datasets/Oils/archive/cleaned/palm.csv')

# Filter data for Indonesia and years >= 2000
df_Indo = palm[(palm['Country'] == 'Indonesia') & (palm['Year'] >= 2000)]

# Extract production values
xi = df_Indo['Production'].values

# Load the dataset for Kalimantan forest cover
df = pd.read_csv('../Datasets/Deforestation/Subnational 1 tree cover loss.csv')

# Filter for Kalimantan provinces and threshold == 0
provinces = ['Kalimantan Barat', 'Kalimantan Tengah', 'Kalimantan Timur', 'Kalimantan Selatan']
kalimantan_df = df[(df['subnational1'].isin(provinces)) & (df['threshold'] == 0)]

# Calculate initial tree cover in 2000
initial_extent_2000 = kalimantan_df['extent_2000_ha'].sum()

# Calculate tree cover gain spread over 22 years
gain_spread = (kalimantan_df['gain_2000-2020_ha'].sum() / 22)

# Calculate net annual tree cover loss (loss minus evenly spread gain) from 2001 to 2022
years = range(2001, 2023)
cumulative_losses = []
for year in years:
    loss_column = f'tc_loss_ha_{year}'
    cumulative_loss = (kalimantan_df[loss_column].sum() - gain_spread)
    cumulative_losses.append(cumulative_loss)

# Calculate total tree cover per year
annual_tree_cover = []
for i, year in enumerate(years):
    total_tree_cover = initial_extent_2000 - sum(cumulative_losses[:i + 1])
    annual_tree_cover.append(total_tree_cover)

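# Calculate correlation between palm oil production and annual tree cover.
# Both series are indexed by year so that Series.corr pairs matching years
# (production starts in 2000, the tree cover series in 2001).
production_by_year = df_Indo.set_index('Year')['Production']
tree_cover_by_year = pd.Series(annual_tree_cover, index=list(years))
correlation = production_by_year.corr(tree_cover_by_year)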

print(f"Pearson correlation between Palm Oil Production in Indonesia and Total Tree Cover in Kalimantan: {correlation:.2f}")


Pearson correlation between Palm Oil Production in Indonesia and Total Tree Cover in Kalimantan: -0.99
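
The tree cover series behind this correlation is reconstructed as the 2000 extent minus the running sum of net annual loss, where the net loss is the annual loss minus the 2000-2020 gain spread evenly over 22 years. A minimal vectorized sketch of that bookkeeping, using illustrative placeholder numbers and only three years:

```python
import numpy as np

# Illustrative placeholder values (not real figures), in hectares
initial_extent_2000 = 1000.0
total_gain = 66.0                            # stands in for the summed gain column
annual_loss = np.array([30.0, 40.0, 50.0])   # stands in for summed tc_loss_ha_<year>

# Spread the gain evenly over 22 years, as in the cell above
gain_spread = total_gain / 22

# Net loss per year, then remaining tree cover after each year
net_annual_loss = annual_loss - gain_spread
annual_tree_cover = initial_extent_2000 - np.cumsum(net_annual_loss)

print(annual_tree_cover)
```

Spreading the gain evenly is a simplifying assumption of the analysis, since the dataset only reports gain as a single total.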


From this data it is evident that the production of palm oil is a major factor in the more than 10 million hectares of tree cover loss in the Kalimantan provinces of Indonesia alone.

### Argument 2: Palm oil production emits more CO2 than other vegetable oils.

From the graph below, it is clear that palm oil emits the most CO2 per kg of product of all the oils shown, in some cases more than double the amount. This is mainly due to land use change: after deforestation the peat soil is exposed, allowing oxygen in, which leads to oxidation of the soil (_Manning, F. C. et al., 2019_).

In [32]:
import pandas as pd
import plotly.graph_objs as go
import plotly.express as px

# Load your dataset
fp_raw = pd.read_csv('../Datasets/Food_Production.csv')

# Filter for oils
oils = fp_raw[fp_raw['Food product'].str.contains('Oil')].copy()  # copy so the column assignment below does not act on a view

# Specify the order of oils
oil_order = ['Sunflower Oil', 'Rapeseed Oil', 'Olive Oil', 'Soybean Oil', 'Palm Oil']

# Set the order in the DataFrame
oils['Food product'] = pd.Categorical(oils['Food product'], categories=oil_order, ordered=True)
oils = oils.sort_values(by='Food product')

# Select relevant columns for plotting
data2 = oils.iloc[:, [0, 1, 2, 3, 4, 5]]  # Assuming columns 1 to 5 are the stages of emissions

# Define colors for each stage based on provided color palette
colors = px.colors.qualitative.T10[:len(data2.columns[1:])]

# Create traces for each stage
traces = []
for i, stage in enumerate(data2.columns[1:]):
    trace = go.Bar(
        y=data2['Food product'],
        x=data2[stage],
        name=stage,
        orientation='h',
        marker=dict(color=colors[i])
    )
    traces.append(trace)

# Layout settings
layout = go.Layout(
    title='Greenhouse gas emissions across stages of oil production lifecycle',
    barmode='stack',
    yaxis=dict(title='Oil Type'),
    xaxis=dict(title='Emissions [kg CO2-equivalents per kg product]'),
    margin=dict(l=200),  # Adjust left margin to accommodate food product labels
    height=600,  # Adjust height as needed
    hovermode='closest',
    showlegend=True
)

# Create figure object
fig = go.Figure(data=traces, layout=layout)

# Display the plot
fig.show()


## Perspective 2: The problem is with global demand for vegetable oil, not palm oil

The solution seems straightforward: stop using palm oil. However, the solution might be a little more counter-intuitive. In fact, palm oil has some environmental benefits, especially its high yield per hectare. Looking at the chart below, it's clear that other oils require much more land to produce the same amount, with the next best already using 4 times as much area. So, replacing palm oil with other types could lead to more deforestation and all the problems that come with it.

In [33]:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px

# Load the dataset
oils_df = pd.read_csv('../Datasets/Oils/oil-yield-by-crop.csv')

# Filter for the entity 'World'
world_df = oils_df[oils_df['Entity'] == 'World']

# Filter for the year 2021
df_2021 = world_df.loc[world_df['Year'] == 2021]

# Drop unnecessary columns
df_2021_clean = df_2021.drop(columns=['Entity', 'Code', 'Year'])

# Calculate the mean for each type of oil yield for the year 2021, ignoring NaNs
mean_columns_2021 = df_2021_clean.mean()

# Sort the mean values in descending order
mean_columns_2021_sorted = mean_columns_2021.sort_values(ascending=False)

# Define colors from Plotly's T10 qualitative color palette
colors = px.colors.qualitative.T10[:len(mean_columns_2021_sorted)]

# Create a bar plot using Plotly
fig = go.Figure()

fig.add_trace(go.Bar(
    x=mean_columns_2021_sorted.values,
    y=mean_columns_2021_sorted.index,
    orientation='h',  # horizontal bar chart
    marker_color=colors,  # assign colors to bars
))

fig.update_layout(
    title='Global mean oil yield per hectare in 2021 per crop',
    xaxis_title='Mean yield per hectare (tonnes)',
    yaxis_title='Type of oil',
    height=600,
    margin=dict(l=0, r=0, t=30, b=0),  # adjust margins for better appearance
)

# Show the plot
fig.show()


In fact, as you can see in the following chart, almost 40 per cent of all vegetable oil in the world comes from the oil palm. Should this be replaced by any other oil, the total land area on which vegetable oil crops are grown would be at least doubled.

In [34]:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px  # needed for the T10 color palette used below

# Load the dataset
oils_df = pd.read_csv('../Datasets/vegetable-oil-production.csv')

# Filter for the entity 'World'
world_df = oils_df[oils_df['Entity'] == 'World']

# Filter for the year 2021
df_2021 = world_df.loc[world_df['Year'] == 2021]

# Drop unnecessary columns
df_2021_clean = df_2021.drop(columns=['Entity', 'Code', 'Year'])

# Collapse the 2021 'World' row into one production value per oil type, ignoring NaNs
mean_columns_2021 = df_2021_clean.mean()

# Sort the mean values in descending order
mean_columns_2021_sorted = mean_columns_2021.sort_values(ascending=False)

# Combine the lowest 5 categories into 'Other'
top_categories = mean_columns_2021_sorted[:-5]
other_category = mean_columns_2021_sorted[-5:].sum()
mean_combined = pd.concat([top_categories, pd.Series({'Other': other_category})])

# Prepare the data for the Plotly chart
labels = mean_combined.index.tolist()
values = mean_combined.values.tolist()
colors = px.colors.qualitative.T10

# Create the donut chart using Plotly
fig = go.Figure(data=[go.Pie(
    labels=labels, 
    values=values, 
    hole=.7,  # Set the hole size for the donut chart
    textinfo='label+percent',
    textposition='outside',
    insidetextorientation='radial',
    marker=dict(colors=colors)
    )])

fig.update_layout(
    title_text='Total vegetable oil production in 2021',
    showlegend=False,
    height=600 
)

# Show the plot
fig.show()
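
The land-use argument rests on simple arithmetic: the area needed equals production divided by yield per hectare. A minimal sketch of that calculation follows; the function name `replacement_area_ratio` and all numbers are hypothetical placeholders chosen for illustration, not values from the datasets.

```python
def replacement_area_ratio(palm_yield: float, other_yield: float) -> float:
    """Factor by which cropland grows when palm oil output is produced
    by another crop instead (area = production / yield)."""
    if palm_yield <= 0 or other_yield <= 0:
        raise ValueError("yields must be positive")
    return palm_yield / other_yield


# Hypothetical yields in tonnes of oil per hectare (placeholders)
palm_yield = 3.0
other_yield = 0.75

production = 120.0  # hypothetical tonnes of oil to be replaced
palm_area = production / palm_yield
other_area = production / other_yield

print(palm_area, other_area, replacement_area_ratio(palm_yield, other_yield))
```

Because the production term cancels, the ratio depends only on the two yields, so it can be read directly from a yield-per-hectare chart.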



What is the real solution to the problem? The issue lies not just with palm oil but with the global demand for vegetable oil in general. To address this, we need to consider reducing the consumption of all oils. This approach would alleviate the pressure on ecosystems currently threatened by extensive oil production. Reducing oil consumption can lead to decreased deforestation, less habitat destruction, and lower greenhouse gas emissions associated with oil cultivation and processing. By tackling the root cause, our overall demand for vegetable oils, we can make significant strides toward more sustainable consumption patterns and a healthier planet.

## Reflections

The feedback from our peers and the TA helped us improve a number of things. First of all, we reduced the number of line and bar charts to get a more diverse set of visualisations. This lets us visualise the data in a better way, so the readers of our story get a better understanding of it. Secondly, we added another argument to the second perspective, which gives it more weight and credibility. Lastly, we made a number of small improvements thanks to the feedback: we added a map and an interactive plot, and fixed the 'hide-input' tags to make the GitHub page neater.

## Work distribution

Aylwin:

- All arguments and perspectives
- Visualisations:
    - Palm oil production in 2021
    - Cumulative tree cover loss on Borneo per country and province (2001-2022)
    - Global mean oil yield per hectare in 2021 per crop
    - Total vegetable oil production in 2021
    - Some polishing on a number of others

Joren:

- Visualisations
    - Greenhouse gas emissions across stages of oil production lifecycle

Mikail:

- Visualisations
    - Palm Oil Production in Indonesia and Malaysia (2000-2023)
- Fixed hide-input tags

Maywand:

- Visualisations
    - 3D Scatter Plot of Palm Oil Production and Tree Cover Loss
    - Palm Oil Production by Country per Year

## References

_Manning, F. C., Kho, L. K., Hill, T. C., Cornulier, T., & Teh, Y. A. (2019). Carbon emissions from oil palm plantations on peat soil. Frontiers in Forests and Global Change, 2, 37._