## Introduction
This study analyzes Spain's energy generation during 2015 using two extensive datasets. One dataset contains hourly weather information, such as wind speed and temperature, from five Spanish cities. The other contains the hourly energy generation from various sources, such as nuclear, solar, and wind energy. This dataset was originally used to predict energy generation and the associated costs, which are included in the dataset as well.

To make the two datasets easier to work with, some preprocessing was needed.
The datasets have been merged on their timestamp, with the weather features averaged across the five cities. Irrelevant columns have been dropped to streamline the data. Since weather patterns follow an annual cycle, only the first 8,760 rows (24 hours × 365 days, corresponding to the first year) are used in the analysis. The specific changes can be found in the code cell below.

In [25]:
import plotly.graph_objs as go
import plotly.express as px
import pandas as pd
import myst_nb

energy_df = pd.read_csv('energy_dataset.csv').drop(['generation hydro pumped storage aggregated', 'generation marine', 'forecast wind offshore eday ahead'], axis=1)

li = range(13)
weather_df = pd.read_csv('weather_features.csv', usecols=li).drop('city_name', axis=1)
weather_df = weather_df.groupby('time').mean()

df = weather_df.merge(energy_df, on='time')[:8760]
# Strip the UTC offset (everything from '+' onward) from each timestamp string.
df['time'] = [time.split('+')[0] for time in df['time']]

With the data prepared, we can gain insights into Spain's energy generation and the viability of the different energy sources. Visualizing the data lets us compare the efficiency and consistency of the different energy production methods.

Comparing and evaluating these methods is of great importance in a world that faces the challenges of accelerated global warming every day. Countries such as Spain need to expect that combating the climate crisis will entail moving away from fossil fuels and toward sustainable energy sources. The findings of this study can inform policymakers and guide them toward the best investments for generating energy while preserving energy security.

We will provide two data stories centering around wind energy, one of the most promising sustainable energy sources.

A story about the consistency of wind energy, emphasizing the energy source’s predictability, reliability, its established contribution, and the influence of wind velocities on the energy price.

A story about the drawbacks of wind energy, highlighting the lack of consistency and the way other energy sources compensate for this within Spain.

Before we dive into our stories, the donut chart below gives a general sense of Spain's total energy output by displaying the average contribution of the different energy sources.

In [26]:
# Total energy generated per source over the year (sum of the hourly values).
sum_biomass = df['generation biomass'].sum()
sum_coal = df['generation fossil brown coal/lignite'].sum()
sum_gas = df['generation fossil gas'].sum()
sum_hard_coal = df['generation fossil hard coal'].sum()
sum_oil = df['generation fossil oil'].sum()
sum_hydro = df['generation hydro pumped storage consumption'].sum() + df['generation hydro run-of-river and poundage'].sum() + df['generation hydro water reservoir'].sum()
sum_nuclear = df['generation nuclear'].sum()
sum_solar = df['generation solar'].sum()
sum_waste = df['generation waste'].sum()
sum_wind = df['generation wind onshore'].sum()
# Small contributors are folded into a single 'Other' slice.
sum_other = df['generation other'].sum() + df['generation other renewable'].sum() + sum_waste + sum_oil + sum_coal + sum_biomass


labels = ['Gas', 'Hard coal', 'Hydro', 'Nuclear', 'Other', 'Solar', 'Wind']
values = [sum_gas, sum_hard_coal, sum_hydro, sum_nuclear, sum_other, sum_solar, sum_wind]
layout = go.Layout(
    height=600,
    title='Overview of energy generated in Spain, 2015',
    showlegend=False,
    hovermode=False,

)

fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.8, textinfo='label + percent', marker=dict(colors=px.colors.qualitative.T10), textposition='outside')], layout=layout)

fig.show()


This plot gives an overview of the percentage share of each energy source in the energy generated in Spain in 2015.

## A story about the benefits of wind energy

In [27]:
# Pro
color_wind = '#D7008A'
color_solar = 'orange'
color_hydro = 'blue'

fig = go.Figure()
fig.add_trace(go.Violin(y=df['generation wind onshore'], meanline=dict(color=color_wind), name='Wind energy generated', marker_color=color_wind))
fig.add_trace(go.Violin(y=df['generation solar']*df['generation wind onshore'].mean()/df['generation solar'].mean(), meanline=dict(color=color_solar), name='Solar energy generated', marker_color=color_solar))
fig.add_trace(go.Violin(y= df['generation hydro water reservoir'] * df['generation wind onshore'].mean() / df['generation hydro water reservoir'].mean(), name='Hydro energy generated', meanline=dict(color=color_hydro), marker_color=color_hydro))
fig.update_layout(
    hovermode=False,
    showlegend=False,
    title='Distribution of wind, solar and hydro energy scaled to the same average',
    xaxis=go.layout.XAxis(title='Different sustainable energy generation methods'),
    yaxis=go.layout.YAxis(title='Energy generated (MW)'),
)
fig.show()


As the donut chart shows, three sustainable energy sources are used in Spain: wind, solar and hydro.
This violin plot examines these three energy sources, which are shown on the x-axis. The y-axis represents the generated energy in megawatts (MW). The solar and hydro series have been rescaled to share the same mean as wind energy, which allows a comparison of how each energy source is distributed around the mean line. The distribution of wind energy appears compact in contrast to solar energy, which fluctuates strongly. The spread of hydro energy resembles that of wind energy, albeit with a higher peak and a broader base.

These findings highlight that among the three sustainable energy generation methods analyzed, wind energy demonstrates the highest consistency.
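
The rescaling behind the violin plot can be sketched in isolation. The snippet below uses small made-up series rather than the dataset, and the helper names `scale_to_mean` and `coefficient_of_variation` are assumptions introduced for illustration. It shows that multiplying a series by `target_mean / series.mean()` aligns the means while leaving the relative spread unchanged, which is why the shapes of the violins remain comparable.

```python
import pandas as pd


def scale_to_mean(series: pd.Series, target_mean: float) -> pd.Series:
    """Rescale a series so that its mean equals target_mean."""
    return series * target_mean / series.mean()


def coefficient_of_variation(series: pd.Series) -> float:
    """Standard deviation relative to the mean (unitless)."""
    return series.std() / series.mean()


# Made-up hourly values in MW, not taken from the dataset.
wind = pd.Series([4000.0, 5000.0, 6000.0, 5000.0])
solar = pd.Series([0.0, 1000.0, 3000.0, 0.0])

solar_scaled = scale_to_mean(solar, wind.mean())

print("wind mean:", wind.mean())
print("scaled solar mean:", solar_scaled.mean())
print("solar CV before scaling:", coefficient_of_variation(solar))
print("solar CV after scaling:", coefficient_of_variation(solar_scaled))
print("wind CV:", coefficient_of_variation(wind))
```

A lower coefficient of variation corresponds to a more compact violin, so the same helper could be applied to the real columns to put a number on the visual impression.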

In [28]:
# Pro
smoothed_values_wind = df['generation wind onshore'].rolling(365, min_periods=1, center=True).mean()
smoothed_values_solar = df['generation solar'].rolling(365, min_periods=1, center=True).mean()
smoothed_values_hydro = df['generation hydro water reservoir'].rolling(365, min_periods=1, center=True).mean()

fig = go.Figure()
fig.add_trace(go.Scatter(x=df['time'], y=smoothed_values_solar, name='Solar', mode='lines', stackgroup='one', line=dict(color=color_solar)))
fig.add_trace(go.Scatter(x=df['time'], y=smoothed_values_hydro, name='Hydro', mode='lines', stackgroup='one', line=dict(color=color_hydro)))
fig.add_trace(go.Scatter(x=df['time'], y=smoothed_values_wind, name='Wind', mode='lines', stackgroup='one', line=dict(color=color_wind)))



fig.update_layout(title='Energy contribution of wind, hydro and solar', hovermode=False, xaxis=go.layout.XAxis(title='Date'), yaxis=go.layout.YAxis(title='Energy generated (MW)'))
fig.show()


In the previous graph, wind energy demonstrated the highest level of consistency compared to hydro and solar energy.
This stacked area chart displays the smoothed energy contribution of the same three energy sources. The x-axis represents the dates, while the y-axis displays the generated energy in megawatts. The area representing wind energy is notably the largest, indicating that wind energy is the most significant contributor among these sustainable sources.

In [29]:
# Pro
fig = go.Figure()

# Scatter plot for solar energy
fig.add_trace(go.Scatter(
    x=df['forecast solar day ahead'],
    y=df['generation solar'],
    mode='markers',
    name='Solar Energy',
    marker=dict(color=color_solar)
))

fig.add_trace(go.Scatter(
    x=df['forecast wind onshore day ahead'],
    y=df['generation wind onshore'],
    mode='markers',
    visible=False,
    name='Wind Energy',
    marker=dict(color=color_wind)
))
fig.update_layout(
    title='Forecast versus generated solar energy',
    xaxis=go.layout.XAxis(title='Forecast energy day ahead (MW)'),
    yaxis=go.layout.YAxis(title='Energy generated (MW)'),
    updatemenus=[
        dict(
            active=0,
            # Each button toggles trace visibility and updates the figure title.
            buttons=[
                dict(label="Solar",
                     method="update",
                     args=[{"visible": [True, False]},
                           {"title": "Forecast versus generated solar energy"}]),
                dict(label="Wind",
                     method="update",
                     args=[{"visible": [False, True]},
                           {"title": "Forecast versus generated wind energy"}]),
                dict(label="Both",
                     method="update",
                     args=[{"visible": [True, True]},
                           {"title": "Forecast versus generated solar and wind energy"}]),
            ],
        )
    ])

fig.show()

The three energy sources mentioned earlier are weather-dependent. As mentioned in the introduction, the dataset was initially used to predict the amount of energy generated from wind and solar sources one day in advance. This scatter plot depicts the relationship between the predicted and actual energy values. The x-axis represents the forecasted energy, while the y-axis represents the actual generated energy, both measured in megawatts. The dropdown menu switches between solar energy, wind energy, or both.

In the scatter plot, the wind energy data points form a linear pattern with only a few outliers, suggesting a strong correlation between the forecasted and actual generated wind energy. The solar energy data points, on the other hand, are more scattered. The forecast therefore tracks wind energy more closely than solar energy, which means that wind energy is more predictable. This predictability is important for managing energy security.
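
The visual impression of a "tighter" or "more scattered" cloud can be quantified with the Pearson correlation coefficient. The sketch below is illustrative only: it uses synthetic series with small and large forecast errors instead of the dataset, so the printed values say nothing about the real wind or solar forecasts.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-ins for a day-ahead forecast and two "actual" series (MW).
forecast = pd.Series(rng.uniform(1000, 10000, size=500))
actual_close = forecast + rng.normal(0, 200, size=500)   # small forecast error
actual_noisy = forecast + rng.normal(0, 3000, size=500)  # large forecast error

# Series.corr computes the Pearson correlation by default.
r_close = forecast.corr(actual_close)
r_noisy = forecast.corr(actual_noisy)

print(f"small forecast error: r = {r_close:.3f}")
print(f"large forecast error: r = {r_noisy:.3f}")
```

On the notebook's data, the same call could be applied to the forecast and generation columns, for example `df['forecast wind onshore day ahead'].corr(df['generation wind onshore'])`.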


In [30]:
# Pro
df['date'] = pd.to_datetime(df['time'])

# One animation frame per month; the marker color encodes the actual energy price.
fig = px.scatter(df, x=df['wind_speed'], y=df['generation wind onshore'], color=df['price actual'], opacity=1, animation_frame=df['date'].dt.month_name())
# Slow the animation down to 3 seconds per frame.
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 3000
fig.update_layout(title='Monthly comparison of the effect of wind speed on generated wind energy and energy price', xaxis_title="Wind speed (m/s)", yaxis_title='Wind energy generated (MW)')

fig.show()

Not only is wind energy predictable, it is also more reliable than it might appear.

This interactive scatter plot illustrates the relationship between wind speed, generated wind energy, and energy prices. The x-axis represents wind speed in meters per second, while the y-axis displays the generated wind energy in megawatts. The color of each dot indicates the energy price at that moment. The slider at the bottom steps through the months of 2015.
In most months, there is a slight positive correlation between wind speed and generated wind energy, as well as a negative correlation between wind speed and energy prices. During June and July, however, both correlations are reversed.

These findings suggest that wind energy is generated even at low wind speeds, which contradicts the common assumption that low wind speeds yield no wind energy. This supports the reliability of wind-generated energy.
In addition, higher wind speeds are associated with a significant reduction in overall energy prices.
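
The month-by-month correlations described above can be computed rather than read off the animation. The following is a minimal sketch on synthetic data: the column names mirror the notebook, but the values are made up and the two-month range is an arbitrary choice for brevity.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic hourly data for January and February 2015 (values are made up).
time = pd.date_range("2015-01-01", "2015-02-28 23:00", freq="h")
wind_speed = rng.uniform(0, 10, size=len(time))
generation = 500 * wind_speed + rng.normal(0, 1000, size=len(time))

sample = pd.DataFrame({
    "date": time,
    "wind_speed": wind_speed,
    "generation wind onshore": generation,
})

# Pearson correlation between wind speed and generation, one value per month.
monthly_r = {
    month: group["wind_speed"].corr(group["generation wind onshore"])
    for month, group in sample.groupby(sample["date"].dt.month_name(), sort=False)
}

for month, r in monthly_r.items():
    print(f"{month}: r = {r:.2f}")
```

Replacing `"generation wind onshore"` with `"price actual"` on the real data frame would give the corresponding monthly correlation between wind speed and price.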


### Conclusion
Based on the analyzed graphs, wind energy demonstrates consistency, a substantial contribution to total energy generation, predictability, generation even at low wind speeds, and cost reduction potential.
Together, these properties make wind energy the first method to consider as we move away from excessive fossil fuel use.

## A story about the drawbacks of wind energy



In [31]:
# Con
color_nuclear = '#008A00'
fig = go.Figure()
fig.add_trace(go.Box(y=df['generation wind onshore'], name='Wind energy generated', marker_color=color_wind))
fig.add_trace(go.Box(y=df['generation nuclear']*df['generation wind onshore'].mean()/df['generation nuclear'].mean(), name='Nuclear energy generated', marker_color=color_nuclear))
fig.update_layout(
    hovermode=False,
    showlegend=False,
    title='Distribution of wind and nuclear energy scaled to the same average',
    xaxis=go.layout.XAxis(title='Different energy generation methods'),
    yaxis=go.layout.YAxis(title='Energy generated (MW)'),
)
fig.show()


This box plot provides insight into the consistency of two energy generation methods. The x-axis shows the two methods, while the y-axis displays their generated energy in megawatts, with nuclear energy rescaled to the same mean as wind energy.

The box for wind energy shows a wide spread, accompanied by outliers at the upper end. The box for nuclear energy, by contrast, is compact, with a small interquartile range. This stark contrast indicates a notably higher level of consistency for nuclear energy than for wind energy. Compared to nuclear energy, wind energy is therefore inconsistent.
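
The interquartile range that determines the height of each box can also be computed directly. This is a small sketch on made-up values with equal means; the helper name `interquartile_range` is an assumption introduced for illustration.

```python
import pandas as pd


def interquartile_range(series: pd.Series) -> float:
    """Distance between the 75th and 25th percentile (the height of the box)."""
    return series.quantile(0.75) - series.quantile(0.25)


# Made-up values in MW with the same mean (5000), not taken from the dataset.
wide = pd.Series([2000.0, 4000.0, 5000.0, 6000.0, 8000.0])
narrow = pd.Series([4900.0, 4950.0, 5000.0, 5050.0, 5100.0])

print("IQR of the wide series:", interquartile_range(wide))
print("IQR of the narrow series:", interquartile_range(narrow))
```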

In [32]:
# Con
color_coal = '#003366'
values_wind = df['generation wind onshore']
total = df['total load actual']
values_coal = df['generation fossil hard coal']
values_nuclear = df['generation nuclear']

def share_of_load(values):
    # Fraction of the actual total load; shares outside (0, 1), including NaN, are set to 0.
    ratio = values / total
    return ratio.where((ratio > 0) & (ratio < 1), 0)

df_2 = pd.DataFrame({
    'wind': share_of_load(values_wind),
    'coal': share_of_load(values_coal),
    'nuclear': share_of_load(values_nuclear),
})

smoothed_values_wind = df_2['wind'].rolling(365, min_periods=1, center=True).mean()
smoothed_values_coal = df_2['coal'].rolling(365, min_periods=1, center=True).mean()
smoothed_values_nuclear = df_2['nuclear'].rolling(365, min_periods=1, center=True).mean()

fig = go.Figure()
fig.add_trace(go.Scatter(x=df['time'], y=smoothed_values_wind, name='Percentage wind energy', line=dict(color=color_wind)))
fig.add_trace(go.Scatter(x=df['time'], y=smoothed_values_coal, name='Percentage coal energy', line=dict(color=color_coal)))

fig.update_layout(
    title='Percentage contribution of total energy used',
    hovermode=False,
    xaxis=go.layout.XAxis(title='Dates'),
    yaxis=go.layout.YAxis(
        title='Percentage of total energy contribution',
        tickformat=',.0%',
    ),
)
fig.show()

As the previous box plot showed, wind energy is not as consistent as other energy sources. The chart above visualizes the consequences of this inconsistency.
This line plot illustrates the percentage contribution of wind and coal energy to the total energy used throughout 2015. The x-axis represents the dates, while the y-axis displays each source's share of the total.


In the graph, the lines representing wind and coal energy are almost mirror images of each other along a horizontal axis: when the share of wind drops, the share of coal rises. This indicates that wind energy is not a reliable energy source and that its lack of consistency is compensated by burning coal, which increases Spain's usage of unsustainable and polluting energy sources.
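
One way to check such a mirrored pattern numerically is to correlate the two smoothed share series. The sketch below is an assumption-laden illustration: the shares are synthetic and constructed so that coal fills the gap left by wind, and only the smoothing call mirrors the notebook's `rolling(365, min_periods=1, center=True).mean()`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
hours = np.arange(2000)

# Synthetic shares of total load: coal is built to move against wind.
wind_share = pd.Series(
    0.20 + 0.05 * np.sin(hours / 150) + rng.normal(0, 0.02, size=hours.size)
)
coal_share = 0.35 - wind_share + rng.normal(0, 0.02, size=hours.size)

# Same smoothing as in the notebook cell above.
smooth_wind = wind_share.rolling(365, min_periods=1, center=True).mean()
smooth_coal = coal_share.rolling(365, min_periods=1, center=True).mean()

r = smooth_wind.corr(smooth_coal)
print(f"correlation between smoothed wind and coal shares: r = {r:.2f}")
```

A strongly negative value on the real data would be consistent with the mirrored lines, although a correlation alone does not establish that coal is dispatched in response to wind.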

### Conclusion
Based on the analyzed graphs, wind energy is inconsistent in comparison to other energy sources. This inconsistency makes Spain reliant on other energy sources such as coal. These drawbacks of wind energy are detrimental to reaching climate goals, which is why countries such as Spain should consider investing in different energy generation methods.


### Reflection

The initial version of our draft, which was submitted to the teaching assistant, was deemed nearly complete and ready for final delivery. However, during our presentation at the tutorial session, we received valuable feedback on several aspects:

Color choice:
Martijn Wiegmans suggested that we could enhance the representation of energy sources by employing more appropriate colors. For instance, he recommended using blue to signify hydro energy.

Coherence in storytelling:
Our narrative lacked coherence, as pointed out by our teaching assistant. To address this, it was suggested that we incorporate more connecting words or phrases between the graphs, enabling a smoother flow of our story.

Placement of charts:
We received feedback regarding the placement of the donut chart, which seemed abrupt and disconnected from the rest of the presentation. To rectify this, it was advised that we introduce and refer to the donut chart in the introduction, establishing its relevance within the context.

Conclusion for each story:
Another important piece of feedback highlighted the absence of a concluding section for each story. As a result, the stories ended abruptly without a proper summary. To improve this, we were advised to provide a conclusion based on the arguments presented in each graph, effectively wrapping up the narrative.

We want to thank Martijn Wiegmans for the feedback and his guidance during this project.

### Work distribution

During this project we chose to work at Science Park, a location of the University of Amsterdam. For every session we planned, we made sure every group member could be present. This means that each individual has spent an equal amount of time on this project. During these sessions we helped each other with all aspects, including writing, coding and styling. This contributed to a pleasant working experience for all of us.

### Appendix
| Generative AI usage | Reason for usage | In which parts | Which prompts were used |
|---------------------|------------------|----------------|-------------------------|
| ChatGPT with GPT 3.5 | Finding synonyms | Introduction and captions of graphs | "Give academic synonyms for distribution" |
| ChatGPT with GPT 3.5 | Finding a color scheme | In the graphs | "Give the complementary color of #D7008A" |