# Session 18: Pandas practice

Power market data from Spain.

## Data description

- **datetime_utc**: The date and time in UTC format.
- **spot_price**: The spot price of electricity at the given datetime in euros per megawatt-hour (€/MWh).
- **gen_ccgt**: Generation from Combined Cycle Gas Turbine (CCGT) plants in megawatt-hours (MWh).
- **gen_coal**: Generation from coal-fired power plants in megawatt-hours (MWh).
- **gen_hydro**: Generation from hydroelectric power plants in megawatt-hours (MWh).
- **gen_nuclear**: Generation from nuclear power plants in megawatt-hours (MWh).
- **gen_solar_pv**: Generation from solar photovoltaic (PV) power plants in megawatt-hours (MWh).
- **gen_solar_th**: Generation from solar thermal power plants in megawatt-hours (MWh).
- **gen_total**: Total electricity generation from all sources in megawatt-hours (MWh).
- **gen_wind**: Generation from wind power plants in megawatt-hours (MWh).
- **demand_total**: Total electricity demand in megawatt-hours (MWh).
- **year**: The year of the datetime.
- **month**: The month of the datetime.
- **day**: The day of the datetime.
- **hour**: The hour of the datetime.
- **weekday**: The day of the week (0 = Monday, 1 = Tuesday, ..., 6 = Sunday).
- **is_weekend**: A binary indicator of whether the date is a weekend (1 = Yes, 0 = No).
- **is_holiday**: A binary indicator of whether the date is a holiday (True = Yes, False = No).

In [7]:
import pandas as pd

energy = pd.read_csv('C:/Users/SABIO/Documents/GitHub/IE-University/CSV_FILES/df_final.csv')

energy.head()

Unnamed: 0,datetime_utc,spot_price,gen_ccgt,gen_coal,gen_hydro,gen_nuclear,gen_solar_pv,gen_solar_th,gen_total,gen_wind,demand_total,year,month,day,hour,weekday,is_weekend,is_holiday
0,2023-10-31 23:00:00+00:00,16.75,,292.0,5008.4,4304.5,51.4,194.1,23136.4,9951.7,21757.3,2023,10,31,23,1,0,False
1,2023-11-01 00:00:00+00:00,12.52,,291.0,3989.8,4305.5,51.4,194.1,21856.6,10426.5,20661.0,2023,11,1,0,2,0,True
2,2023-11-01 01:00:00+00:00,4.99,,180.0,3709.6,4306.5,51.2,194.1,21339.8,10832.7,19951.5,2023,11,1,1,2,0,True
3,2023-11-01 02:00:00+00:00,4.3,,,3433.1,4307.5,51.2,194.1,20789.1,10736.3,19426.9,2023,11,1,2,2,0,True
4,2023-11-01 03:00:00+00:00,4.3,,,3286.3,4307.5,51.2,194.1,20574.3,10669.2,19100.8,2023,11,1,3,2,0,True


## Exercise 1

What's the initial and final datetime in the dataset?

In [10]:
print(energy['datetime_utc'].min())
print(energy['datetime_utc'].max())

energy['datetime_utc'].agg(['min', 'max'])

2022-12-31 23:00:00+00:00
2024-11-29 23:00:00+00:00


min    2022-12-31 23:00:00+00:00
max    2024-11-29 23:00:00+00:00
Name: datetime_utc, dtype: object
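
The `dtype: object` in the output indicates that `datetime_utc` is still stored as text, so `min` and `max` compare strings. That happens to give the right answer for ISO-formatted timestamps, but parsing the column first is the safer habit. A minimal sketch, using a made-up three-row sample in place of `energy`:

```python
import pandas as pd

# Toy stand-in for the `datetime_utc` column (illustrative values only).
sample = pd.Series(
    [
        "2023-10-31 23:00:00+00:00",
        "2022-12-31 23:00:00+00:00",
        "2024-11-29 23:00:00+00:00",
    ],
    name="datetime_utc",
)

# Parse the strings into timezone-aware timestamps before comparing them.
parsed = pd.to_datetime(sample, utc=True)

# agg with a list of function names returns one labelled value per function.
bounds = parsed.agg(["min", "max"])
print(bounds)
```

On the real data the equivalent would be `pd.to_datetime(energy['datetime_utc'], utc=True).agg(['min', 'max'])`.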

## Exercise 2

What's the average spot price in the dataset? And the median?

In [9]:
print(energy['spot_price'].mean())
print(energy['spot_price'].median())

energy['spot_price'].agg(['mean', 'median'])

73.29295784169125
80.1


mean      73.292958
median    80.100000
Name: spot_price, dtype: float64

## Exercise 2.5

What's the cause of NaN values in the dataset?

In [9]:
energy.isna().mean()

datetime_utc    0.000000
spot_price      0.000000
gen_ccgt        0.477692
gen_coal        0.845379
gen_hydro       0.000000
gen_nuclear     0.000000
gen_solar_pv    0.000000
gen_solar_th    0.015425
gen_total       0.000000
gen_wind        0.000000
demand_total    0.000000
year            0.000000
month           0.000000
day             0.000000
hour            0.000000
weekday         0.000000
is_weekend      0.000000
is_holiday      0.000000
dtype: float64
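
These shares say how much is missing, not why. One way to start looking for a cause is to check whether the gaps cluster in time. The helper below is a sketch: `missing_share_by` is a name introduced here, and the toy frame is invented for illustration.

```python
import numpy as np
import pandas as pd

def missing_share_by(df, key, columns):
    """Return the fraction of NaN values in `columns` for each value of `key`."""
    return df[columns].isna().groupby(df[key]).mean()

# Toy data (illustrative values only, not taken from the dataset).
toy = pd.DataFrame({
    "year": [2023, 2023, 2024, 2024],
    "gen_coal": [292.0, np.nan, np.nan, np.nan],
    "gen_ccgt": [np.nan, 1500.0, 1200.0, 900.0],
})

shares = missing_share_by(toy, "year", ["gen_coal", "gen_ccgt"])
print(shares)
```

On the notebook's data this could be called as `missing_share_by(energy, 'month', ['gen_ccgt', 'gen_coal', 'gen_solar_th'])`. If the missing share follows periods when a technology is likely offline, that would suggest "no generation" is recorded as NaN rather than 0. That is a hypothesis to test, not something the data description states.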

## Exercise 3

What's the yearly evolution of the average spot price? And the monthly evolution?

In [10]:
energy.groupby('year')['spot_price'].mean()

year
2022     0.000000
2023    87.059305
2024    58.303823
Name: spot_price, dtype: float64
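
The 2022 average of 0.0 deserves a sanity check. Exercise 1 shows the data starting at 2022-12-31 23:00 UTC, so 2022 presumably contributes very few rows. Reporting the row count next to the mean makes that visible. A sketch on invented data:

```python
import pandas as pd

# Toy data (illustrative values only): a single stray row in 2022, which is an
# assumption about what the real dataset looks like.
toy = pd.DataFrame({
    "year": [2022, 2023, 2023, 2024, 2024],
    "spot_price": [0.0, 80.0, 100.0, 50.0, 70.0],
})

# Report each yearly mean together with the number of rows behind it.
summary = toy.groupby("year")["spot_price"].agg(["mean", "count"])
print(summary)
```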

In [None]:
# Double square brackets keep the selection as a DataFrame
# (with as_index=False the result is a DataFrame either way)
energy.groupby(['year', 'month'], as_index=False)[['spot_price']].mean().head(5)

Unnamed: 0,year,month,spot_price
0,2022,12,0.0
1,2023,1,67.899667
2,2023,2,133.520586
3,2023,3,91.91939
4,2023,4,74.160818


## Exercise 4

Calculate the gap between the `demand_total` and the `gen_total` for each row. What's the average gap?

In [15]:
energy['gap'] = energy['demand_total'] - energy['gen_total']

energy['gap'].mean()

-2089.132866273353

## Exercise 5

What's the correlation between the spot price and the total generation? And the demand? And the gap?

In [21]:
energy[['spot_price', 'gen_total', 'demand_total', 'gap']].corr()

Unnamed: 0,spot_price,gen_total,demand_total,gap
spot_price,1.0,-0.198832,0.190751,0.596665
gen_total,-0.198832,1.0,0.82229,-0.666328
demand_total,0.190751,0.82229,1.0,-0.123584
gap,0.596665,-0.666328,-0.123584,1.0


## Exercise 6

On average, is the spot price in Spain higher on weekends?

In [25]:
energy.groupby('is_weekend')[['spot_price']].mean()

is_weekend,spot_price
0,79.102809
1,58.728261


## Exercise 7

Knowing that the average nuclear power plant in Spain has a capacity of about 1000 MW, how many nuclear power plants does Spain have?

In [27]:
round(energy['gen_nuclear'].max() / 1000)

7

## Exercise 8

When does demand peak: in summer or in winter?

In [14]:
energy.head(3)

Unnamed: 0,datetime_utc,spot_price,gen_ccgt,gen_coal,gen_hydro,gen_nuclear,gen_solar_pv,gen_solar_th,gen_total,gen_wind,demand_total,year,month,day,hour,weekday,is_weekend,is_holiday
0,2023-10-31 23:00:00+00:00,16.75,,292.0,5008.4,4304.5,51.4,194.1,23136.4,9951.7,21757.3,2023,10,31,23,1,0,False
1,2023-11-01 00:00:00+00:00,12.52,,291.0,3989.8,4305.5,51.4,194.1,21856.6,10426.5,20661.0,2023,11,1,0,2,0,True
2,2023-11-01 01:00:00+00:00,4.99,,180.0,3709.6,4306.5,51.2,194.1,21339.8,10832.7,19951.5,2023,11,1,1,2,0,True


In [19]:
# Months 6-8 = summer
# Months 12-2 = winter
import numpy as np

# Nested np.where: summer, else winter, else other
energy['season'] = np.where(
    energy['month'].isin([6, 7, 8]), 'summer',
    np.where(energy['month'].isin([12, 1, 2]), 'winter', 'other'),
)

energy.head(3)



Unnamed: 0,datetime_utc,spot_price,gen_ccgt,gen_coal,gen_hydro,gen_nuclear,gen_solar_pv,gen_solar_th,gen_total,gen_wind,demand_total,year,month,day,hour,weekday,is_weekend,is_holiday,season
0,2023-10-31 23:00:00+00:00,16.75,,292.0,5008.4,4304.5,51.4,194.1,23136.4,9951.7,21757.3,2023,10,31,23,1,0,False,other
1,2023-11-01 00:00:00+00:00,12.52,,291.0,3989.8,4305.5,51.4,194.1,21856.6,10426.5,20661.0,2023,11,1,0,2,0,True,other
2,2023-11-01 01:00:00+00:00,4.99,,180.0,3709.6,4306.5,51.2,194.1,21339.8,10832.7,19951.5,2023,11,1,1,2,0,True,other


In [25]:
energy.groupby('month')['demand_total'].sum().sort_values(ascending = False)

month
7     40032668.9
1     39366932.3
8     39038530.7
3     36821241.5
2     36196584.6
10    35618722.0
9     35254757.2
11    35063044.7
6     34690918.8
5     34382635.2
4     32905392.9
12    18681684.6
Name: demand_total, dtype: float64

In [27]:
energy[energy['season'].isin(['summer', 'winter'])].groupby('season')['demand_total'].sum().reset_index()

Unnamed: 0,season,demand_total
0,summer,113762118.4
1,winter,94245201.5
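
Summing demand favours whichever month or season has more rows. The dataset ends in November 2024 (Exercise 1), so December, and with it winter, is covered by fewer hours than summer. Comparing the average and the maximum hourly demand avoids that bias. The sketch below uses invented data; `demand_by_season` is a name introduced here.

```python
import numpy as np
import pandas as pd

def demand_by_season(df):
    """Mean and max hourly demand for the summer and winter rows of `df`."""
    season = np.select(
        [df["month"].isin([6, 7, 8]), df["month"].isin([12, 1, 2])],
        ["summer", "winter"],
        default="other",
    )
    out = df.assign(season=season)
    out = out[out["season"] != "other"]
    return out.groupby("season")["demand_total"].agg(["mean", "max"])

# Toy data (illustrative values only): summer has more rows than winter.
toy = pd.DataFrame({
    "month": [7, 7, 8, 1, 4],
    "demand_total": [30000.0, 32000.0, 31000.0, 35000.0, 25000.0],
})

result = demand_by_season(toy)
print(result)
```

In the toy frame summer has the larger total but winter has the higher mean and peak, which is exactly the case where a sum would mislead.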


## Exercise 9

Calculate, for each date, the difference between the maximum and the minimum demand. What's the average difference? Which month has the highest difference?

In [45]:
energy['date'] = pd.to_datetime(energy['datetime_utc']).dt.date

energy_max_min = energy.groupby('date')['demand_total'].agg(['max', 'min'])

energy_max_min['difference'] = energy_max_min['max'] - energy_max_min['min']



print(energy_max_min['difference'].mean())

energy_max_min.sort_values('difference', ascending = False)

8778.647714285715


date,max,min,difference
2023-01-30,35882.4,21047.5,14834.9
2024-01-08,35293.6,21084.2,14209.4
2023-02-06,34818.7,20668.2,14150.5
2023-01-23,34983.2,21153.0,13830.2
2023-02-27,34506.7,20892.9,13613.8
...,...,...,...
2024-01-31,24830.5,24830.5,0.0
2023-02-28,26092.0,26092.0,0.0
2024-02-29,24984.0,24984.0,0.0
2023-01-31,26069.0,26069.0,0.0
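
The sorted table does not yet say which month has the highest difference, and the zero-difference rows at the bottom look like dates with a single observation (max equals min). One possible follow-up keeps only dates with more than one reading and averages the daily range per calendar month. This is a sketch on invented data; `monthly_mean_daily_range` and its `min_obs` parameter are names introduced here.

```python
import pandas as pd

def monthly_mean_daily_range(df, min_obs=2):
    """Average daily (max - min) demand per calendar month.

    Dates with fewer than `min_obs` rows are dropped, since a single
    observation always yields a range of 0.
    """
    ts = pd.to_datetime(df["datetime_utc"], utc=True)
    daily = (
        df.assign(date=ts.dt.date, month=ts.dt.month)
        .groupby(["month", "date"])["demand_total"]
        .agg(["max", "min", "count"])
    )
    daily = daily[daily["count"] >= min_obs]
    daily["difference"] = daily["max"] - daily["min"]
    return daily.groupby(level="month")["difference"].mean()

# Toy data (illustrative values only).
toy = pd.DataFrame({
    "datetime_utc": [
        "2023-01-30 03:00:00+00:00", "2023-01-30 19:00:00+00:00",
        "2023-01-31 23:00:00+00:00",  # single reading: excluded
        "2023-04-10 03:00:00+00:00", "2023-04-10 19:00:00+00:00",
    ],
    "demand_total": [21000.0, 35000.0, 26000.0, 20000.0, 28000.0],
})

by_month = monthly_mean_daily_range(toy)
print(by_month.idxmax())
```

`idxmax()` on the result returns the month number with the largest average daily swing.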


## Exercise 10

Does the spot price correlate with the demand difference?

In [52]:
energy_max_min = energy.groupby('date')[['demand_total', 'spot_price']].agg({
    'demand_total':['max', 'min'], 
    'spot_price':['sum']})

energy_max_min.columns = ['demand_max', 'demand_min', 'spot_price_sum']

energy_max_min['difference'] = energy_max_min['demand_max'] - energy_max_min['demand_min']

energy_max_min.corr()

Unnamed: 0,demand_max,demand_min,spot_price_sum,difference
demand_max,1.0,0.683777,0.413955,0.860352
demand_min,0.683777,1.0,0.193148,0.216367
spot_price_sum,0.413955,0.193148,1.0,0.418947
difference,0.860352,0.216367,0.418947,1.0


## Exercise 11

Which month has had the day with the highest spot price?

In [53]:
energy.head(3)

Unnamed: 0,datetime_utc,spot_price,gen_ccgt,gen_coal,gen_hydro,gen_nuclear,gen_solar_pv,gen_solar_th,gen_total,gen_wind,demand_total,year,month,day,hour,weekday,is_weekend,is_holiday,season,date
0,2023-10-31 23:00:00+00:00,16.75,,292.0,5008.4,4304.5,51.4,194.1,23136.4,9951.7,21757.3,2023,10,31,23,1,0,False,other,2023-10-31
1,2023-11-01 00:00:00+00:00,12.52,,291.0,3989.8,4305.5,51.4,194.1,21856.6,10426.5,20661.0,2023,11,1,0,2,0,True,other,2023-11-01
2,2023-11-01 01:00:00+00:00,4.99,,180.0,3709.6,4306.5,51.2,194.1,21339.8,10832.7,19951.5,2023,11,1,1,2,0,True,other,2023-11-01


In [58]:
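# Caution: the final .max() is column-wise, so month, day and spot_price
# in the result can come from different rows
energy.groupby(['month', 'day'])['spot_price'].sum().sort_values(ascending = False).reset_index().max()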

month           12.00
day             31.00
spot_price    6247.41
dtype: float64
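
`DataFrame.max()` returns each column's maximum independently, so month 12, day 31 and 6247.41 need not describe the same day. Grouping by `month` and `day` alone also merges the same calendar day across years. A sketch of one alternative on invented data uses `idxmax`, which returns the group label of the highest value. It takes the daily mean as the measure; whether "highest spot price" should mean the daily mean, the daily sum or the hourly maximum is an interpretation.

```python
import pandas as pd

# Toy data (illustrative values only).
toy = pd.DataFrame({
    "year": [2023, 2023, 2023, 2024, 2024],
    "month": [2, 2, 12, 2, 2],
    "day": [6, 6, 31, 6, 6],
    "spot_price": [150.0, 170.0, 10.0, 60.0, 80.0],
})

# Daily mean price, keeping year so 2023-02-06 and 2024-02-06 stay separate.
daily = toy.groupby(["year", "month", "day"])["spot_price"].mean()

# idxmax returns the (year, month, day) label of the highest daily value.
best_year, best_month, best_day = daily.idxmax()
print(best_year, best_month, best_day, daily.max())
```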

## Exercise 12

Are we using coal and gas power plants to cover the demand peaks when the renewable sources are not enough?

In [63]:
energy[['demand_total', 'gen_ccgt', 'gen_coal', 'gen_hydro', 'gen_nuclear', 'gen_solar_pv', 'gen_solar_th', 'gen_total', 'gen_wind']].corr()

Unnamed: 0,demand_total,gen_ccgt,gen_coal,gen_hydro,gen_nuclear,gen_solar_pv,gen_solar_th,gen_total,gen_wind
demand_total,1.0,0.177118,0.491688,0.151421,0.251486,0.440687,0.35597,0.82229,0.061869
gen_ccgt,0.177118,1.0,0.283612,-0.049059,0.176534,-0.185872,-0.116084,0.030451,-0.373806
gen_coal,0.491688,0.283612,1.0,0.236793,0.115786,0.187118,-0.109379,0.35523,-0.209502
gen_hydro,0.151421,-0.049059,0.236793,1.0,-0.180258,-0.413091,-0.390626,-0.077831,0.069065
gen_nuclear,0.251486,0.176534,0.115786,-0.180258,1.0,-0.091144,-0.050308,0.119518,-0.086043
gen_solar_pv,0.440687,-0.185872,0.187118,-0.413091,-0.091144,1.0,0.787957,0.680565,-0.292676
gen_solar_th,0.35597,-0.116084,-0.109379,-0.390626,-0.050308,0.787957,1.0,0.536712,-0.312395
gen_total,0.82229,0.030451,0.35523,-0.077831,0.119518,0.680565,0.536712,1.0,0.254945
gen_wind,0.061869,-0.373806,-0.209502,0.069065,-0.086043,-0.292676,-0.312395,0.254945,1.0
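
Two caveats apply when reading this matrix. First, `corr()` excludes NaN values pairwise, and Exercise 2.5 shows `gen_ccgt` and `gen_coal` missing in roughly 48% and 85% of rows, so their coefficients rest on a subset of hours. Second, correlation with total demand does not isolate the hours when renewables fall short. A possible refinement compares gas and coal output with residual demand, meaning demand minus renewable generation. The sketch below rests on assumptions not stated in the source: hydro is counted as renewable, and a missing value is treated as zero generation. `thermal_vs_residual` and the toy data are invented for illustration.

```python
import numpy as np
import pandas as pd

RENEWABLES = ["gen_wind", "gen_solar_pv", "gen_solar_th", "gen_hydro"]
THERMAL = ["gen_ccgt", "gen_coal"]

def thermal_vs_residual(df):
    """Correlation between gas+coal output and demand not met by renewables."""
    # Assumption: a missing value means no generation in that hour.
    filled = df[RENEWABLES + THERMAL].fillna(0)
    residual_demand = df["demand_total"] - filled[RENEWABLES].sum(axis=1)
    thermal = filled[THERMAL].sum(axis=1)
    return thermal.corr(residual_demand)

# Toy data (illustrative values only): thermal output rises as renewables fall.
toy = pd.DataFrame({
    "demand_total": [20000.0, 25000.0, 30000.0, 30000.0],
    "gen_wind": [10000.0, 8000.0, 3000.0, 2000.0],
    "gen_solar_pv": [5000.0, 4000.0, 1000.0, 0.0],
    "gen_solar_th": [500.0, 400.0, np.nan, np.nan],
    "gen_hydro": [3000.0, 3000.0, 4000.0, 4000.0],
    "gen_ccgt": [np.nan, 2000.0, 9000.0, 11000.0],
    "gen_coal": [np.nan, np.nan, 300.0, 500.0],
})

r = thermal_vs_residual(toy)
print(round(r, 3))
```

A strongly positive value on the real data would be consistent with gas and coal filling the gap left by renewables, though a correlation alone does not establish that.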
