# <u><b> Study of climate change in Europe between 1980 and 2019</b></u>

## <u>Table of Contents</u>

#### &rarr; [1.Introduction](#introduction)
#### &rarr; [2.Dataframe Pre-Processing](#dpp)
#### &rarr; [3.Exploratory Data Analysis](#EDA)


## <u>1.Introduction<a id="introduction"></a></u>

### <u>1.1.Objectives and Problem Statement</u>

#### <u>Objectives</u>

Climate change refers to the long-term alteration of Earth's climate patterns, characterized by changes in temperature, precipitation, and other climatic variables. While climate fluctuations have occurred naturally over geological timescales, the current climate change is primarily driven by human activities, particularly the emission of greenhouse gases (GHGs) such as carbon dioxide and methane. These emissions result from various human activities, including burning fossil fuels for energy, deforestation, industrial processes, and agriculture.

The consequences of climate change are far-reaching and affect many aspects of the environment and society. However, some people remain skeptical that the climate has actually changed in recent decades. In this notebook, we present a simple visualization of the temperature changes that occurred in Europe between 1980 and 2019.

#### <u>Problem statement</u>

We will simply try to answer the question: 

<span style="font-size: 16px;"> **Is there really climate change in Europe?**</span>

### <u>1.2.Where do the data come from?</u>

This data package contains hourly radiation and temperature data for Europe, aggregated by Renewables.ninja from the NASA MERRA-2 reanalysis. Each country's value is a population-weighted mean across all MERRA-2 grid cells within that country. The data file is available at https://data.open-power-system-data.org/weather_data/
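
To make the population-weighted mean concrete, here is a minimal sketch with made-up numbers. The grid-cell temperatures and populations are hypothetical and are not taken from MERRA-2 or Renewables.ninja; the sketch only illustrates that densely populated cells weigh more in a country's value.

```python
import numpy as np

# Hypothetical temperatures (°C) of three grid cells inside one country
cell_temperature = np.array([2.0, 5.0, 8.0])
# Hypothetical population living in each grid cell (used as weights)
cell_population = np.array([1_000, 3_000, 6_000])

# Population-weighted mean: sum(T_i * P_i) / sum(P_i)
weighted_mean = np.average(cell_temperature, weights=cell_population)
# Plain (unweighted) mean, for comparison
simple_mean = cell_temperature.mean()

print(f"population-weighted: {weighted_mean:.2f} °C, unweighted: {simple_mean:.2f} °C")
```

In this toy case the warmest cell is also the most populated, so the weighted value ends up above the plain mean.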

### <u>1.3.Which countries are included in this data file?</u>

This CSV file includes the following countries:

<table>
  <tr>
    <td><code>AT</code>: Austria</td>
    <td><code>BE</code>: Belgium</td>
  </tr>
  <tr>
    <td><code>BG</code>: Bulgaria</td>
    <td><code>CH</code>: Switzerland</td>
  </tr>
  <tr>
    <td><code>CZ</code>: Czech Republic</td>
    <td><code>DE</code>: Germany</td>
  </tr>
  <tr>
    <td><code>DK</code>: Denmark</td>
    <td><code>EE</code>: Estonia</td>
  </tr>
  <tr>
    <td><code>ES</code>: Spain</td>
    <td><code>FI</code>: Finland</td>
  </tr>
  <tr>
    <td><code>FR</code>: France</td>
    <td><code>GB</code>: United Kingdom</td>
  </tr>
  <tr>
    <td><code>GR</code>: Greece</td>
    <td><code>HR</code>: Croatia</td>
  </tr>
  <tr>
    <td><code>HU</code>: Hungary</td>
    <td><code>IE</code>: Ireland</td>
  </tr>
  <tr>
    <td><code>IT</code>: Italy</td>
    <td><code>LT</code>: Lithuania</td>
  </tr>
  <tr>
    <td><code>LU</code>: Luxembourg</td>
    <td><code>LV</code>: Latvia</td>
  </tr>
  <tr>
    <td><code>NL</code>: Netherlands</td>
    <td><code>NO</code>: Norway</td>
  </tr>
  <tr>
    <td><code>PL</code>: Poland</td>
    <td><code>PT</code>: Portugal</td>
  </tr>
  <tr>
    <td><code>RO</code>: Romania</td>
    <td><code>SE</code>: Sweden</td>
  </tr>
  <tr>
    <td><code>SI</code>: Slovenia</td>
    <td><code>SK</code>: Slovakia</td>
  </tr>
</table>

## <u>2.Dataframe pre-processing.<a id="dpp"></a></u>

### <u>2.1.Dataframe</u>

The DataFrame contains hourly meteorological data, with one row per Coordinated Universal Time (UTC) timestamp. The columns of the DataFrame are as follows:

1. `utc_timestamp`: This column contains the UTC timestamp of each hourly record.

2. `XX_temperature`: This column contains the temperature at each timestamp, expressed in degrees Celsius (°C).

3. `XX_radiation_direct_horizontal`: This column contains data for direct solar radiation recorded at each timestamp. It measures the amount of direct solar radiation reaching the horizontal surface of the Earth, typically expressed in watts per square meter (W/m²).

4. `XX_radiation_diffuse_horizontal`: This column contains data for diffuse solar radiation recorded at each timestamp. Diffuse solar radiation comes from solar radiation scattered in the atmosphere and reaching the horizontal surface of the Earth. It is also typically expressed in watts per square meter (W/m²).

In these column names, `XX` stands for the two-letter code of each country present in the DataFrame (for example, `AT_temperature` for Austria).

This DataFrame has already been cleaned and therefore contains no missing (null or NaN) values and no duplicates.
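
As a hedged illustration, the "no missing values, no duplicates" claim can be checked with a few pandas calls. The sketch below runs on a three-row miniature table named `toy` (a hypothetical stand-in for the full file); the same calls would apply to the loaded DataFrame.

```python
import pandas as pd

# Hypothetical miniature of the weather table, used only for illustration
toy = pd.DataFrame({
    "utc_timestamp": [
        "1980-01-01T00:00:00Z",
        "1980-01-01T01:00:00Z",
        "1980-01-01T02:00:00Z",
    ],
    "AT_temperature": [-3.64, -3.803, -3.969],
    "BE_temperature": [-0.72, -1.165, -1.434],
})

# Total number of missing values (None/NaN) over the whole table
n_missing = int(toy.isna().sum().sum())
# Number of fully duplicated rows
n_duplicate_rows = int(toy.duplicated().sum())
# Number of repeated timestamps (each hour should appear only once)
n_duplicate_timestamps = int(toy["utc_timestamp"].duplicated().sum())

print(n_missing, n_duplicate_rows, n_duplicate_timestamps)
```

All three counts being zero is what a cleaned table should report.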


<details>
<summary><b>Click to toggle code</b></summary>

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px

df = pd.read_csv("data/weather_data.csv")
df.head(3)
```
</details>
    


| | utc_timestamp | AT_temperature | AT_radiation_direct_horizontal | AT_radiation_diffuse_horizontal | BE_temperature | BE_radiation_direct_horizontal | BE_radiation_diffuse_horizontal | BG_temperature | BG_radiation_direct_horizontal | BG_radiation_diffuse_horizontal | ... | RO_radiation_diffuse_horizontal | SE_temperature | SE_radiation_direct_horizontal | SE_radiation_diffuse_horizontal | SI_temperature | SI_radiation_direct_horizontal | SI_radiation_diffuse_horizontal | SK_temperature | SK_radiation_direct_horizontal | SK_radiation_diffuse_horizontal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1980-01-01T00:00:00Z | -3.64 | 0.0 | 0.0 | -0.72 | 0.0 | 0.0 | 4.664 | 0.0 | 0.0 | ... | 0.0 | -3.945 | 0.0 | 0.0 | -3.055 | 0.0 | 0.0 | -4.648 | 0.0 | 0.0 |
| 1 | 1980-01-01T01:00:00Z | -3.803 | 0.0 | 0.0 | -1.165 | 0.0 | 0.0 | 4.052 | 0.0 | 0.0 | ... | 0.0 | -4.053 | 0.0 | 0.0 | -3.272 | 0.0 | 0.0 | -4.554 | 0.0 | 0.0 |
| 2 | 1980-01-01T02:00:00Z | -3.969 | 0.0 | 0.0 | -1.434 | 0.0 | 0.0 | 3.581 | 0.0 | 0.0 | ... | 0.0 | -4.129 | 0.0 | 0.0 | -3.639 | 0.0 | 0.0 | -4.455 | 0.0 | 0.0 |



In this instance, our focus will be solely on the temperature data for each country.

### <u>2.2.Reorganization of the DataFrame.</u>

We reorganize the DataFrame to keep only the timestamp and the temperature column of each country.
<details>
<summary><b>Click to toggle code</b></summary>

```python
# Creating a list with the column names
df_cols = df.columns.tolist()

# We want to keep only the temperatures of the countries, so we will filter this list
temp = "temperature"
temp_cols = [col for col in df_cols if temp in col]

# Creating the new DataFrame with the date and temperatures of the countries
data = pd.concat([df['utc_timestamp'], df[temp_cols]], axis=1)

# Renaming country columns
data.columns = data.columns.str.replace('_temperature', '')

data.head(3)
```
</details>

| | utc_timestamp | AT | BE | BG | CH | CZ | DE | DK | EE | ES | ... | LU | LV | NL | NO | PL | PT | RO | SE | SI | SK |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1980-01-01T00:00:00Z | -3.64 | -0.72 | 4.664 | -6.287 | -3.422 | -1.261 | -1.87 | -7.06 | 8.066 | ... | -5.15 | -7.166 | 2.382 | -7.038 | -3.721 | 12.862 | -0.031 | -3.945 | -3.055 | -4.648 |
| 1 | 1980-01-01T01:00:00Z | -3.803 | -1.165 | 4.052 | -6.602 | -3.36 | -1.414 | -1.914 | -7.341 | 7.96 | ... | -5.333 | -7.371 | 2.236 | -6.941 | -3.806 | 12.757 | 0.311 | -4.053 | -3.272 | -4.554 |
| 2 | 1980-01-01T02:00:00Z | -3.969 | -1.434 | 3.581 | -6.981 | -3.429 | -1.571 | -1.976 | -7.591 | 8.008 | ... | -5.167 | -7.342 | 2.086 | -6.856 | -3.868 | 12.674 | 0.568 | -4.129 | -3.639 | -4.455 |


#### <u>2.2.1.Reorganization of the DataFrame's time dimension.</u>

<u>1. Evolution of the annual temperature.</u>

First, we want to study the overall evolution of the temperature in European countries over the years. To do this, we will create a new DataFrame containing the average temperatures of each country per year.

<details>
<summary><b>Click to toggle code</b></summary>

```python

# Copy the DataFrame 'data' to a new DataFrame 'data_year' to work with a copy and avoid modifying the original data
data_year = data.copy()

# Convert the 'utc_timestamp' column to datetime objects using pd.to_datetime()
# Then, format the datetime objects into strings with the format '%Y', which represents the four-digit year (e.g., '2023')
# This extracts only the year from the 'utc_timestamp' column
data_year.utc_timestamp = pd.to_datetime(data_year['utc_timestamp']).dt.strftime('%Y')

# Group by the year strings now stored in 'utc_timestamp' and calculate the mean of each group
data_year = data_year.groupby('utc_timestamp').mean()

# Reset the index of the DataFrame after the grouping operation
data_year.reset_index(inplace=True)

# Rename the 'utc_timestamp' column to 'year' in the 'data_year' DataFrame
# This provides a more appropriate name for the column as it now contains only the year of the record
data_year.rename(columns={'utc_timestamp': 'year'}, inplace=True)

data_year.head(3)
```
</details>

| | year | AT | BE | BG | CH | CZ | DE | DK | EE | ES | ... | LU | LV | NL | NO | PL | PT | RO | SE | SI | SK |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1980 | 6.049147 | 8.791302 | 9.432581 | 6.060315 | 5.839449 | 7.232458 | 7.28784 | 4.338776 | 14.352198 | ... | 7.386867 | 4.340944 | 9.069256 | 3.287483 | 5.939743 | 15.095769 | 8.171396 | 4.74233 | 7.502274 | 5.928217 |
| 1 | 1981 | 7.191993 | 9.008968 | 10.226017 | 6.552329 | 6.988352 | 7.816293 | 7.301089 | 4.853085 | 15.010856 | ... | 7.960409 | 5.428035 | 9.152883 | 2.879921 | 7.094175 | 15.725073 | 9.209093 | 4.774986 | 8.664291 | 7.279068 |
| 2 | 1982 | 7.566535 | 9.54383 | 10.110494 | 7.526256 | 7.626532 | 8.655136 | 8.001437 | 4.991037 | 14.793416 | ... | 8.801837 | 5.497602 | 9.680261 | 3.936079 | 7.839661 | 15.149457 | 9.628713 | 5.578134 | 8.954765 | 7.707852 |


<u>2. Evolution of the annual temperature for each season.</u>

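Second, we want to observe the seasonal evolution of the temperature for each country. Seasons are approximated here by calendar quarters: January to March is labelled Winter, April to June Spring, July to September Summer, and October to December Autumn.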
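
The quarter-based labels used in this section do not match meteorological seasons, where winter runs from December to February. As an alternative sketch (not the method used in this notebook), a month-to-season mapping could look like the following. The names `ts`, `month_to_season` and `season_year` are hypothetical, and counting December toward the following year's winter is an assumption.

```python
import pandas as pd

# Hypothetical timestamps: the first day of each month of 1980 (UTC)
ts = pd.Series(pd.date_range("1980-01-01", periods=12, freq="MS", tz="UTC"))

# Meteorological seasons: Dec-Feb, Mar-May, Jun-Aug, Sep-Nov
month_to_season = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}
season = ts.dt.month.map(month_to_season)

# Assumption: December is counted with the winter of the following year
season_year = ts.dt.year + (ts.dt.month == 12).astype(int)

print(pd.DataFrame({"timestamp": ts, "season": season, "season_year": season_year}))
```

Grouping by `season_year` and `season` would then keep each winter contiguous instead of splitting it across two calendar years.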

<details>
<summary><b>Click to toggle code</b></summary>

```python

# Copy the DataFrame 'data' to a new DataFrame 'df_s' to work with a copy and avoid modifying the original data
df_s = data.copy()

# Convert the 'utc_timestamp' column to datetime objects using pd.to_datetime()
df_s['utc_timestamp'] = pd.to_datetime(df_s['utc_timestamp'])

# Extract the year from the 'utc_timestamp' column and create a new 'year' column
df_s['year'] = df_s['utc_timestamp'].dt.year

# Extract the quarter (season) from the 'utc_timestamp' column and create a new 'season' column
df_s['season'] = df_s['utc_timestamp'].dt.quarter

# Define a dictionary with season numbers as keys and corresponding season names as values
season_names = {1: 'Winter', 2: 'Spring', 3: 'Summer', 4: 'Autumn'}

# Map the season numbers to their corresponding season names using the dictionary
df_s['season'] = df_s['season'].map(season_names)

# Drop the raw timestamp, group by 'year' and 'season', then calculate the mean of each group (average temperatures)
data_season = df_s.drop(columns='utc_timestamp').groupby(['year', 'season']).mean()

data_season.head(4)
```
</details>

| year | season | AT | BE | BG | CH | CZ | DE | DK | EE | ES | FI | ... | LU | LV | NL | NO | PL | PT | RO | SE | SI | SK |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1980 | Autumn | 1.488466 | 5.064023 | 6.504269 | 1.579658 | 1.41605 | 3.120929 | 4.851288 | 1.155783 | 10.688797 | -2.572383 | ... | 2.919924 | 1.357816 | 6.045992 | -1.619904 | 2.095689 | 12.635912 | 4.236633 | 0.55829 | 2.556412 | 1.045207 |
| 1980 | Spring | 9.025762 | 11.204824 | 12.947861 | 8.279474 | 9.256128 | 10.193806 | 9.966512 | 8.701015 | 15.299509 | 8.430994 | ... | 10.208624 | 9.162796 | 11.226821 | 8.334772 | 9.877262 | 15.762358 | 12.657973 | 9.287961 | 10.569586 | 10.092748 |
| 1980 | Summer | 15.345997 | 15.961112 | 19.093919 | 14.696428 | 14.791049 | 15.512826 | 15.289546 | 14.804468 | 22.684515 | 13.851111 | ... | 15.088076 | 14.670868 | 16.169462 | 12.674218 | 15.156883 | 21.061553 | 18.013053 | 14.501887 | 16.922412 | 15.352824 |
| 1980 | Winter | -1.715685 | 2.897418 | -0.889715 | -0.359963 | -2.155192 | 0.05646 | -1.017142 | -7.386194 | 8.684665 | -9.270083 | ... | 1.295304 | -7.908439 | 2.789949 | -6.288379 | -3.429906 | 10.884728 | -2.286985 | -5.440089 | -0.088481 | -2.82782 |


## <u>3.Exploratory Data Analysis.<a id="EDA"></a></u>

### <u>3.1.Exploratory Data Analysis (EDA) of annual climate change.</u>

First, we will focus on the annual temperature evolution in Europe. We will provide some key insights and a dynamic data visualization.

#### <u>3.1.1.Climate Change</u>

We start by comparing the average annual temperature in 1980 and in 2019 for each of the countries present in the DataFrame. To do so, we create a new table containing the temperature difference between these two years.

<details>
<summary><b>Click to toggle code</b></summary>

```python

# Calculate the average temperature for each country in 1980 and 2019
temp_mean_1980 = data_year.iloc[0][1:]
temp_mean_2019 = data_year.iloc[-1][1:]

# Calculate the difference between 2019 and 1980 for each country
temperature_diff = temp_mean_2019 - temp_mean_1980

# Create a new DataFrame with Temperature difference
country_temp_diff = pd.DataFrame({'Temperature difference': temperature_diff})

country_temp_diff
```
</details>

| Country | Temperature difference |
|---|---|
| AT | 3.149491 |
| BE | 2.185264 |
| BG | 3.284148 |
| CH | 2.541605 |
| CZ | 3.823105 |
| DE | 3.028974 |
| DK | 2.371045 |
| EE | 2.667369 |
| ES | 1.29545 |
| FI | 2.050201 |


We can observe that, in every country, the average annual temperature was higher in 2019 than in 1980.

We would also like to know the country with the highest temperature increase in Europe and the one with the lowest.

<details>
<summary><b>Click to toggle code</b></summary>

```python

# Ensure the "Temperature difference" column is float
country_temp_diff['Temperature difference'] = pd.to_numeric(country_temp_diff['Temperature difference'])

# Find the country with the largest temperature difference
country_max_diff = country_temp_diff.loc[country_temp_diff['Temperature difference'].idxmax()]

# Find the country with the smallest temperature difference
country_min_diff = country_temp_diff.loc[country_temp_diff['Temperature difference'].idxmin()]

print("Country with the largest temperature difference:")
print(country_max_diff)

print("\nCountry with the smallest temperature difference:")
print(country_min_diff)
```
</details>

<u>Country with the largest temperature difference:</u>  
*Temperature difference:*   **4.0322**  
*Name:* **PL**  

<u>Country with the smallest temperature difference:</u>  
*Temperature difference:*    **0.345844**  
*Name:* **PT**  

The country with the highest temperature increase between 1980 and 2019 is **Poland** with an increase of **4.0322°C**.


The country with the lowest temperature increase between 1980 and 2019 is **Portugal** with an increase of **0.345844°C**.

 
 
Next, we would like to know the average increase in mean annual temperature across Europe between 1980 and 2019. For that, we take the `country_temp_diff` DataFrame and calculate the unweighted average across all countries:

<details>
<summary><b>Click to toggle code</b></summary>

```python
#Calculate the average temperature increase for the European countries
average_EU_temp_increase = country_temp_diff.mean()
print(average_EU_temp_increase)
```
</details>

Temperature difference    2.606518
dtype: float64


**Averaged over the countries in our dataset (assuming they are representative of Europe), the mean annual temperature was 2.606518°C higher in 2019 than in 1980.**
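
A difference between two single years is sensitive to year-to-year variability. As a hedged complement, the sketch below fits a least-squares line through all annual means and reports the slope per decade. It runs on synthetic data with a built-in trend of 0.05 °C per year; the column name `XX` and all values are hypothetical, not results from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic annual means for one hypothetical country 'XX':
# 8.0 °C in 1980, rising by exactly 0.05 °C per year
years = np.arange(1980, 2020)
toy = pd.DataFrame({"year": years, "XX": 8.0 + 0.05 * (years - 1980)})

# Degree-1 least-squares fit; x is shifted to start at 0 for numerical stability
slope, intercept = np.polyfit(toy["year"] - 1980, toy["XX"], deg=1)
trend_per_decade = slope * 10  # °C per decade

# Endpoint difference, as computed in this section, for comparison
endpoint_diff = toy["XX"].iloc[-1] - toy["XX"].iloc[0]

print(f"trend: {trend_per_decade:.2f} °C/decade, 2019 minus 1980: {endpoint_diff:.2f} °C")
```

On noise-free data both measures agree; on real annual means the fitted slope uses all 40 years and is less affected by an unusually cold or warm first or last year.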

#### <u>3.1.2.Data visualization of climate changes in European countries.</u>

To represent these changes, we have created an interactive graph that lets you explore the data dynamically.

You can hide or show a country by clicking on its name in the legend. The thick black line shows the European average temperature.


<details>
<summary><b>Click to toggle code</b></summary>

```python
import plotly.graph_objects as go

# Calculate the average temperature of Europe for each year
# ('year' holds strings, so it is excluded from the row-wise mean)
data_year['Average Temperature'] = data_year.drop(columns='year').mean(axis=1)

# Create the interactive figure
fig = px.line(
    data_year,
    x='year',
    y=[
        'AT', 'BE', 'BG', 'CH',
        'CZ', 'DE', 'DK', 'EE',
        'ES', 'FI', 'FR', 'GB',
        'GR', 'HR', 'HU', 'IE',
        'IT', 'LT', 'LU', 'LV',
        'NL', 'NO', 'PL', 'PT',
        'RO', 'SE', 'SI', 'SK'
    ],
    labels={
        'value': 'Average Temperature (°C)',
        'variable': 'Country'
    },
    title='Average Temperature in European Countries from 1980 to 2019',
    template='plotly_white')

# Create a separate trace for the average temperature of Europe
trace_europe = go.Scatter(
    x=data_year['year'],
    y=data_year['Average Temperature'],
    mode='lines',
    name='Average Temperature of Europe',
    line=dict(width=4, color='black')
)

# Add the Europe trace to the figure
fig.add_trace(trace_europe)

fig.update_xaxes(dtick=5)  # Show one x-axis tick every 5 years
fig.show()
```
</details>

We also provide the overall evolution of annual temperature in Europe:

<details>
<summary><b>Click to toggle code</b></summary>

```python
# Create the line plot 
fig = px.line(
    data_year,
    x='year',
    y='Average Temperature',
    labels={
        'Average Temperature': 'Average Temperature (°C)',
        'year': 'Year'
    },
    title='Average Temperature in Europe from 1980 to 2019',
    template='plotly_white'
)

fig.update_xaxes(dtick=5)  # Show one x-axis tick every 5 years
fig.show()
```
</details>
