## Draft Data Story
20-06-23
- Maurits van der Does Willebois
- Luc Buijs
- Lucas Woudstra
- Finn Govers


## Introduction

Climate change is one of the most pressing debates in today's society. Most scientists have concluded that, unless we act quickly, humanity is in serious danger from its consequences. We decided to look at the root of this problem: more specifically, which industrial sector is currently the biggest threat to our society. There are a few obvious contenders. Many may think the biggest source of emissions is fossil fuels or, more generally, the fuel industry. We will try to see whether this is accurate by comparing the emissions of different industries over the years. We have found two different perspectives:

Our first perspective on the topic is that it is necessary to switch from fossil fuels to
renewable energy sources. Fossil fuels have a massive impact on climate change due to their large
greenhouse gas emissions. Immediate action is required to find greener fuels and energy sources
with lower greenhouse gas emissions. This would help slow down climate change drastically.

Our second perspective is that the biggest problem isn’t the energy sector, but something
entirely different. For instance, a while ago it came to light how big an impact the meat
industry has on CO2 emissions across the world, because of how many acres of land
cows in particular need to be fed and kept. Examples like this lead us to believe that the
fossil fuel industry probably isn’t the biggest threat behind the rapid climate change
happening right now.

## Dataset and Preprocessing

We did not preprocess the dataset "Energy data 1990 - 2020.csv" in any way.

However, we did preprocess the dataset "historical_emissions.csv" and saved the result as "historical_emissions (cleanest).csv". The preprocessing of this dataset went as follows:

We removed all columns that only had one unique value. We also merged all rows that contained data about the same year but about different countries by summing up their CO2 emissions, so that we had data about global CO2 emissions instead of the emissions of individual countries. Furthermore, we reorganised the data in a way that resembles a transposition. The original data had a separate column for every year, plus a single sector column whose value in each row named the sector that the row described. After our transformation, the dataset has a separate column for every sector, plus a single year column whose value in each row gives the year that the row describes. Lastly, we merged the sectors "Energy" and "Electricity/Heat" by summing up their CO2 emissions, because both belong to the energy production sector, which is the sector that we are interested in. We did all this preprocessing with the code displayed below:

In [1]:
"""
CODE USED FOR PREPROCESSING:


import pandas as pd

df2 = pd.read_csv('historical_emissions.csv')

# Years covered by the dataset (1990 up to and including 2019)
X = range(1990, 2020)
sectors = list(df2['Sector'].unique())

# Global emissions per sector: one list with a value for every year
sector_dict = {}

for sector in sectors:
    # 'Electricity/Heat' gets no column of its own: it is merged into 'Energy'
    if sector != 'Electricity/Heat':
        sector_dict[sector] = []

        for i in X:
            # Sum over all countries for this sector and year
            total_sector_emission = df2[df2['Sector'] == sector][f"{i}"].sum()

            if sector == 'Energy':
                total_sector_emission += df2[df2['Sector'] == 'Electricity/Heat'][f"{i}"].sum()

            sector_dict[sector].append(total_sector_emission)

# Build one row per year: [year, sector 1, sector 2, ...]
frame = []

for i in range(len(X)):
    temp_lst = [X[i]]

    for key in sector_dict:
        temp_lst.append(sector_dict[key][i])

    frame.append(temp_lst)

sectors.remove('Electricity/Heat')

df_vis_2 = pd.DataFrame(
             frame,
             columns = ['Year'] + sectors)

df_vis_2.to_csv('historical_emissions (cleanest).csv')
"""

print('', end='')
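
The reshaping described above can also be sketched more compactly with pandas' `groupby` and a transpose. This is only an illustrative sketch on a tiny made-up table, not the code that produced our cleaned file: the country names and all numbers are hypothetical, and we only assume that the layout (one `Sector` column and one column per year) mirrors `historical_emissions.csv`.

```python
import pandas as pd

# Hypothetical miniature of the raw layout: one row per country and sector,
# one column per year (all values are made up for illustration).
raw = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Sector': ['Energy', 'Electricity/Heat', 'Agriculture'] * 2,
    '1990': [10.0, 4.0, 3.0, 20.0, 6.0, 5.0],
    '1991': [11.0, 5.0, 3.0, 21.0, 7.0, 6.0],
})

year_cols = ['1990', '1991']

# Sum over countries per sector, then transpose so that years become rows.
per_sector = raw.groupby('Sector')[year_cols].sum().T

# Merge 'Electricity/Heat' into 'Energy' by summing their emissions.
per_sector['Energy'] = per_sector['Energy'] + per_sector['Electricity/Heat']
per_sector = per_sector.drop(columns='Electricity/Heat')

# Turn the year index into a regular integer 'Year' column.
per_sector.index = per_sector.index.astype(int)
tidy = per_sector.rename_axis('Year').reset_index()
tidy.columns.name = None

print(tidy)
```

The result has one row per year and one column per sector, which is the same shape as our cleaned dataset.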

## Use of Generative AI
We used ChatGPT for the following things:

- Generating code that produced figure 4, based on a self-made figure made with matplotlib.pyplot.
- Generating code that produced figure 3, based on the relevant data from our datasets.
- Solving an issue with GitHub that prevented us from displaying figures.

## Link to GitHub Online Repository
https://github.com/Mau13-dot/informatievisu/blob/main/Project/Data_Story.ipynb

## Load Data

In [2]:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import matplotlib.pyplot as plt
import numpy as np
import kaleido

df = pd.read_csv('historical_emissions (cleanest).csv')
df1 = pd.read_csv('Energy data 1990 - 2020.csv')

# Plot Data
## First Perspective

In [3]:
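# Calculate the average MtCO₂e over all sector columns per year
# (the 'Year' column and the saved index column are excluded from the mean)
sector_cols = [c for c in df.columns
               if c != 'Year' and not c.startswith('Unnamed') and c != 'Average MtCO₂e']
df['Average MtCO₂e'] = df[sector_cols].mean(axis=1)

# Create the plot using Plotly
fig = px.line(df, x='Year', y='Average MtCO₂e', title='Average MtCO₂e for All Sectors per Year')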

# Add a caption annotation
caption_text = "Fig.1. This graph shows the average MtCO₂e over all sectors per year, summed over all countries"
fig.add_annotation(
    x=0.5,  # x-coordinate for the annotation (0.5 = center of x-axis)
    y=-0.24,  # y-coordinate for the annotation (negative value to position it below the graph)
    xref="paper",
    yref="paper",
    text=caption_text,
    showarrow=False,
    font=dict(
        size=13,
        color="black"))

fig.write_image("figure1.png")

![Plotly Figure](figure1.png)

### Argument 1
Figure 1 shows an increasing trend in the CO2 emissions per year of all countries combined. The most shocking aspect of this graph is that the trend does not flatten: total CO2 emissions keep rising at roughly the same rate, with no sign of slowing down. This shows the urgency of developing more efficient, renewable and greener energy sources to reduce total emissions.
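
One way to check a claim like "rising at the same rate" is to fit a straight line to the first and second half of the period and compare the slopes. The sketch below does this on made-up, perfectly linear values; the numbers are hypothetical and are not taken from our dataset.

```python
import numpy as np

# Hypothetical yearly averages in MtCO₂e (made-up, exactly linear growth).
years = np.arange(1990, 2000)
values = 100.0 + 2.0 * (years - 1990)

# Use years since the start as x-values to keep the fit well conditioned.
t = years - years[0]

# Fit a straight line to each half of the period and compare the slopes.
half = len(t) // 2
slope_first = np.polyfit(t[:half], values[:half], 1)[0]
slope_second = np.polyfit(t[half:], values[half:], 1)[0]

print(slope_first, slope_second)
```

If the two slopes are close, the growth rate has stayed roughly constant. A clearly smaller second slope would indicate that growth is slowing down.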


In [4]:
# Assuming the sector columns start from the fourth column;
# the derived 'Average MtCO₂e' column is not a sector, so it is left out
sectors = [c for c in df.columns[3:] if c != 'Average MtCO₂e']

fig = go.Figure()

# Add one line per sector
for sector in sectors:
    fig.add_trace(go.Scatter(x=df['Year'], y=df[sector], mode='markers+lines', name=sector))

fig.update_layout(title='Average MtCO₂e per Year for each Sector',
                  xaxis_title='Year',
                  yaxis_title='MtCO₂e')

# Add a caption annotation
caption_text = "Fig.2. This graph shows the average MtCO₂e for each Sector per year"
fig.add_annotation(
    x=0.5,  # x-coordinate for the annotation (0.5 = center of x-axis)
    y=-0.24,  # y-coordinate for the annotation (negative value to position it below the graph)
    xref="paper",
    yref="paper",
    text=caption_text,
    showarrow=False,
    font=dict(
        size=13,
        color="black"
    )
)


fig.write_image("figure2.png")

![Plotly Figure](figure2.png)

### Argument 2
Figure 2 shows the average emissions of each sector per year, with every sector given a different color. The energy sector has the highest average CO2 emissions per year, followed by the electricity/heat sector. This graph alone already suggests that the energy sector is the biggest cause of CO2 emissions. (Furthermore, comparing the energy sector with the electricity/heat sector, their similar growth trends suggest that the two sectors could be correlated.)


In [5]:
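# Assuming the sector columns start from the fourth column;
# the derived 'Average MtCO₂e' column is not a sector, so it is left out
sectors = [c for c in df.columns[3:] if c != 'Average MtCO₂e']
average_values_total = df[sectors].mean().values  # Use .values to convert to a numpy array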

fig = go.Figure(data=[go.Pie(labels=sectors, values=average_values_total, hole=0.5)])

fig.update_layout(title='Average MtCO₂e of Each Sector')

# Add a caption annotation
caption_text = "Fig.3. This pie chart shows the percentage of MtCO₂e for each Sector from 1990 till 2019"
fig.add_annotation(
    x=0.5,  # x-coordinate for the annotation (0.5 = center of x-axis)
    y=-0.24,  # y-coordinate for the annotation (negative value to position it below the graph)
    xref="paper",
    yref="paper",
    text=caption_text,
    showarrow=False,
    font=dict(
        size=13,
        color="black"
    )
)


fig.write_image("figure3.png")


![Plotly Figure](figure3.png)

### Argument 3
Figure 3 displays a donut chart that shows each sector's contribution to CO2 emissions. Again, energy is the biggest polluter, being responsible for almost half of the total CO2 emissions of all sectors combined. Based on this chart, if the world wants to emit less CO2, it should start by searching for greener options in the energy sector.
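
The percentages in a chart like this are each sector's mean emissions divided by the sum over all sectors. A minimal sketch with made-up numbers (the sector values below are hypothetical, not our data):

```python
import pandas as pd

# Hypothetical mean emissions per sector in MtCO₂e (made-up numbers).
mean_emissions = pd.Series({'Energy': 480.0, 'Agriculture': 240.0, 'Waste': 80.0})

# Share of each sector as a percentage of the total over all sectors.
share = mean_emissions / mean_emissions.sum() * 100

# Name of the sector with the largest share.
largest = share.idxmax()

print(share.round(1))
print(largest)
```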



## Second Perspective

In [6]:
renewable_electricity = df1['Share of renewables in electricity production (%)']

# Everything that is not renewable counts as non-green production
df1['Production that is not green (%)'] = 100 - renewable_electricity

# Average over all countries per year (numeric columns only)
average_df = df1.groupby('Year').mean(numeric_only=True).reset_index()

fig = px.area(average_df, x='Year', y=['Share of renewables in electricity production (%)', 'Production that is not green (%)'],
              labels={'value': '%', 'variable': 'Energy Type'},
              title='Percentages of non-renewable vs renewable energy being produced')


fig.update_layout(legend_title_text='Energy Type')


# Add a caption annotation
caption_text = "Fig.4. This figure displays the fact that the percentage of renewable energy being produced is already growing"
fig.add_annotation(
    x=0.5,  # x-coordinate for the annotation (0.5 = center of x-axis)
    y=-0.24,  # y-coordinate for the annotation (negative value to position it below the graph)
    xref="paper",
    yref="paper",
    text=caption_text,
    showarrow=False,
    font=dict(
        size=13,
        color="black"
    )
)
fig.write_image("figure4.png")


### Argument 1
Figure 4 displays an area chart that gives us a good look at how the energy sector is already improving its use of renewable energy sources: over the last 10 years, the renewable share of the energy produced has grown by almost 10%. This may not seem like a lot, but the trend itself is also rising, meaning that the growth gets larger each year, so one can be optimistic about the future of humanity. This shows that the energy sector is already working hard at finding greener sources, so to really make a difference we might need to look at other sectors, where this kind of development is not yet in full motion.

In [7]:
X = np.array(range(1990, 2021))

Y_prod = np.array(
    [df1[df1['Year'] == x]['Total energy production (Mtoe)'].sum()
    for x in X]
)
Y_cons = np.array(
    [df1[df1['Year'] == x]['Total energy consumption (Mtoe)'].sum()
    for x in X]
)

# Express consumption and surplus as percentages of total production,
# so that the two stacked areas add up to 100%
cons = Y_cons / Y_prod * 100
diff = (Y_prod - Y_cons) / Y_prod * 100
    
fig = go.Figure()
fig.add_trace(go.Scatter(
    name='Energy Need',
    x=X, y=cons,
    hoverinfo='x+y',
    mode='lines',
    line=dict(width=0.5, color='green'),
    stackgroup='one' # define stack group
))
fig.add_trace(go.Scatter(
    name='Energy Surplus',
    x=X, y=diff,
    hoverinfo='x+y',
    mode='lines',
    line=dict(width=0.5, color='red'),
    stackgroup='one'
))

fig.update_xaxes(title_text="Year")
fig.update_yaxes(title_text="Percentage of Produced Energy")
fig.update_layout(yaxis_range=(0, 100), height=500, width=800,
                  title='Share of Needed and Unutilized Energy of Total Energy Production')

caption_text = 'Fig.5. This area plot shows that we need almost all of the energy that we produce.'
fig.add_annotation(
    x=0.5,  # x-coordinate for the annotation (0.5 = center of x-axis)
    y=-0.24,  # y-coordinate for the annotation (negative value to position it below the graph)
    xref="paper",
    yref="paper",
    text=caption_text,
    showarrow=False,
    font=dict(
        size=13,
        color="black"
    )
)

fig.write_image('figure5.png')

### Argument 2
There is a reason that the energy sector is so big: we need energy for a lot of things and cannot easily use less of it, which means that we should be careful about limiting how much energy we can produce. The only CO2 emissions from energy that we should try to decrease are the emissions that result from unnecessary energy production. However, there is only a small amount of energy that we produce but don't use, as seen in figure 5. This means that the energy sector causes only a small amount of unnecessary CO2 emissions: much less than one would initially expect.
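
As a small worked example of the shares in figure 5: when both the consumed and the unused energy are expressed as a percentage of total production, the two shares add up to 100. The totals below are made-up numbers, used only to illustrate the arithmetic.

```python
import numpy as np

# Hypothetical yearly totals in Mtoe (made-up numbers).
production = np.array([100.0, 120.0, 150.0])
consumption = np.array([95.0, 114.0, 141.0])

# Share of production that is consumed, and share that is left over.
used_share = consumption / production * 100
surplus_share = (production - consumption) / production * 100

print(used_share)
print(surplus_share)
```

In this made-up case at most 6% of the production is left unused.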

In [8]:
last_years = 10

x = np.array(df['Year'])[-last_years:]
sectors = list(df.columns)

# Make dict
energy_rest_dict = {
    'Year': x,
    'Energy': np.array(df['Energy'])[-last_years:],
    'Rest': np.zeros(len(x[-last_years:]))
}

# Fill 'Rest' category
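for sector in sectors:
    # Skip non-sector columns: the year, the saved index and the derived average
    if sector not in ('Year', 'Energy', 'Average MtCO₂e') and not sector.startswith('Unnamed'):
        energy_rest_dict['Rest'] += np.array(df[sector])[-last_years:]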
        
# Find derivatives
for key in energy_rest_dict:
    if key != 'Year':
        y = np.array(energy_rest_dict[key])
        
        # Normalize
        y /= max(y)

        dy = np.zeros(y.shape)
        dy[0:-1] = np.diff(y)/np.diff(x)
        dy[-1] = (y[-1] - y[-2])/(x[-1] - x[-2])
        
        energy_rest_dict[key] = dy
        
frame = []

for i in range(len(x)):
    temp_lst = []
    
    for key in energy_rest_dict:
        temp_lst.append(energy_rest_dict[key][i])
    
    frame.append(temp_lst)
    
df_vis_4 = pd.DataFrame(
             frame,
             columns = ['Year', 'Energy', 'Average of All Other Sectors'])

df_vis_4 = df_vis_4.melt(id_vars='Year', value_vars=['Energy', 'Average of All Other Sectors'])

fig = px.line(df_vis_4, x='Year' , y='value' , color='variable')

fig.update_xaxes(title_text="Year")
fig.update_yaxes(title_text="Growth of CO2 Emissions")
fig.update_layout(height=500, width=800,
                  title='Growth of CO2 Emissions for Different Sectors in the Last 10 Years')

caption_text = "Fig.6. This line chart shows that the energy sector is \
already slowing its CO2 emissions faster than other sectors."
fig.add_annotation(
    x=0.5,  # x-coordinate for the annotation (0.5 = center of x-axis)
    y=-0.2,  # y-coordinate for the annotation (negative value to position it below the graph)
    xref="paper",
    yref="paper",
    text=caption_text,
    showarrow=False,
    font=dict(
        size=13,
        color="black"
    )
)

fig.write_image('figure6.png')

### Argument 3
The energy sector may be responsible for most CO2 emissions because of the sheer size of the sector, but it is already making big steps in reducing its CO2 emissions. In fact, the energy sector is reducing its CO2 emissions faster than most other sectors, as seen in figure 6. In the last ten years, it has been rare for the energy sector to have faster growing CO2 emissions than the average of all other sectors. The energy sector even shows a downward trend in its CO2 emission growth, whereas the average of all other sectors shows an upward trend. It seems that the energy sector is already making great progress on its CO2 emissions, and that we should in fact focus on the other sectors to stimulate a reduction in their CO2 emission growth.
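
The growth values in figure 6 are finite differences of a series that has first been normalised by its maximum. A minimal sketch of that calculation on made-up numbers (the emission values are hypothetical):

```python
import numpy as np

# Hypothetical emissions of one sector in MtCO₂e (made-up numbers).
years = np.arange(2010, 2015)
energy = np.array([80.0, 90.0, 96.0, 99.0, 100.0])

# Normalise by the maximum so that sectors of different size are comparable.
normalised = energy / energy.max()

# Forward finite difference: change in normalised emissions per year.
growth = np.diff(normalised) / np.diff(years)

print(growth)
```

Here the emissions still rise every year, but the growth values shrink, which is what a downward trend in emission growth looks like.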