# Session 18: Pandas practice

Power market data from Spain.

## Data description

- **datetime_utc**: The date and time in UTC format.
- **spot_price**: The spot price of electricity at the given datetime in euros per megawatt-hour (€/MWh).
- **gen_ccgt**: Generation from Combined Cycle Gas Turbine (CCGT) plants in megawatt-hours (MWh).
- **gen_coal**: Generation from coal-fired power plants in megawatt-hours (MWh).
- **gen_hydro**: Generation from hydroelectric power plants in megawatt-hours (MWh).
- **gen_nuclear**: Generation from nuclear power plants in megawatt-hours (MWh).
- **gen_solar_pv**: Generation from solar photovoltaic (PV) power plants in megawatt-hours (MWh).
- **gen_solar_th**: Generation from solar thermal power plants in megawatt-hours (MWh).
- **gen_total**: Total electricity generation from all sources in megawatt-hours (MWh).
- **gen_wind**: Generation from wind power plants in megawatt-hours (MWh).
- **demand_total**: Total electricity demand in megawatt-hours (MWh).
- **year**: The year of the datetime.
- **month**: The month of the datetime.
- **day**: The day of the datetime.
- **hour**: The hour of the datetime.
- **weekday**: The day of the week (0 = Monday, 1 = Tuesday, ..., 6 = Sunday).
- **is_weekend**: A binary indicator of whether the date is a weekend (1 = Yes, 0 = No).
- **is_holiday**: A binary indicator of whether the date is a holiday (True = Yes, False = No).

In [1]:
import pandas as pd

energy = pd.read_csv('C:/Users/SLO/Documents/GitHub/IE-University/CSV_FILES/df_final.csv')

energy.head()

Unnamed: 0,datetime_utc,spot_price,gen_ccgt,gen_coal,gen_hydro,gen_nuclear,gen_solar_pv,gen_solar_th,gen_total,gen_wind,demand_total,year,month,day,hour,weekday,is_weekend,is_holiday
0,2023-10-31 23:00:00+00:00,16.75,,292.0,5008.4,4304.5,51.4,194.1,23136.4,9951.7,21757.3,2023,10,31,23,1,0,False
1,2023-11-01 00:00:00+00:00,12.52,,291.0,3989.8,4305.5,51.4,194.1,21856.6,10426.5,20661.0,2023,11,1,0,2,0,True
2,2023-11-01 01:00:00+00:00,4.99,,180.0,3709.6,4306.5,51.2,194.1,21339.8,10832.7,19951.5,2023,11,1,1,2,0,True
3,2023-11-01 02:00:00+00:00,4.3,,,3433.1,4307.5,51.2,194.1,20789.1,10736.3,19426.9,2023,11,1,2,2,0,True
4,2023-11-01 03:00:00+00:00,4.3,,,3286.3,4307.5,51.2,194.1,20574.3,10669.2,19100.8,2023,11,1,3,2,0,True


## Exercise 1

What's the initial and final datetime in the dataset?

In [3]:
print(energy['datetime_utc'].min())
print(energy['datetime_utc'].max())

2022-12-31 23:00:00+00:00
2024-11-29 23:00:00+00:00
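
Taking the minimum and maximum of `datetime_utc` as plain strings works here only because ISO 8601 timestamps sort in chronological order. A minimal sketch of a more robust variant, assuming a three-row stand-in for the CSV (the name `sample` is hypothetical), parses the column first so that real instants are compared:

```python
import pandas as pd

# Stand-in for the CSV: three timestamps in the same text format as the dataset.
sample = pd.DataFrame({
    "datetime_utc": [
        "2023-10-31 23:00:00+00:00",
        "2022-12-31 23:00:00+00:00",
        "2024-11-29 23:00:00+00:00",
    ]
})

# Parse to timezone-aware timestamps so min/max compare instants, not strings.
timestamps = pd.to_datetime(sample["datetime_utc"], utc=True)

first, last = timestamps.min(), timestamps.max()
print("initial:", first)
print("final:  ", last)
print("span:   ", last - first)  # a Timedelta, only available after parsing
```

Parsing also makes the `.dt` accessor available, which later exercises rely on.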


## Exercise 2

What's the average spot price in the dataset? And the median?

In [4]:
print(energy['spot_price'].agg(['median', 'mean']))

median    80.100000
mean      73.292958
Name: spot_price, dtype: float64


## Exercise 2.5

What's the cause of NaN values in the dataset?

In [5]:
energy.isna().mean()

datetime_utc    0.000000
spot_price      0.000000
gen_ccgt        0.477692
gen_coal        0.845379
gen_hydro       0.000000
gen_nuclear     0.000000
gen_solar_pv    0.000000
gen_solar_th    0.015425
gen_total       0.000000
gen_wind        0.000000
demand_total    0.000000
year            0.000000
month           0.000000
day             0.000000
hour            0.000000
weekday         0.000000
is_weekend      0.000000
is_holiday      0.000000
dtype: float64

## Exercise 3

What's the yearly evolution of the average spot price? And the monthly evolution?

In [6]:
energy.groupby('year')['spot_price'].mean()

year
2022     0.000000
2023    87.059305
2024    58.303823
Name: spot_price, dtype: float64

In [9]:
energy.groupby(['year', 'month'])['spot_price'].mean().head()

year  month
2022  12         0.000000
2023  1         67.899667
      2        133.520586
      3         91.919390
      4         74.160818
Name: spot_price, dtype: float64

## Exercise 4

Calculate the gap between `demand_total` and `gen_total` for each row. What's the average gap?

In [12]:
energy['gap'] = energy['demand_total'] - energy['gen_total']

energy['gap'].mean()

-2089.132866273353

## Exercise 5

What's the correlation between the spot price and the total generation? And the demand? And the gap?

In [14]:
energy[['spot_price', 'demand_total']].corr()

Unnamed: 0,spot_price,demand_total
spot_price,1.0,0.190751
demand_total,0.190751,1.0
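
The cell above covers only `demand_total`, while the question also asks about `gen_total` and the gap. A minimal sketch of computing all three correlations at once, assuming a toy frame with made-up values in place of `energy`:

```python
import pandas as pd

# Toy frame with the same column names as the dataset (values are made up).
toy = pd.DataFrame({
    "spot_price":   [10.0, 20.0, 30.0, 40.0],
    "gen_total":    [200.0, 210.0, 250.0, 240.0],
    "demand_total": [190.0, 220.0, 240.0, 260.0],
})
toy["gap"] = toy["demand_total"] - toy["gen_total"]

# Pearson correlation of each candidate column with the spot price.
candidates = ["gen_total", "demand_total", "gap"]
corr_with_price = toy[candidates].corrwith(toy["spot_price"])
print(corr_with_price)
```

`corrwith` returns one coefficient per column, which is easier to read than a full correlation matrix when only one target column matters.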


## Exercise 6

On average, in Spain, is the spot price higher during the weekends?

In [16]:
energy.groupby('is_weekend', as_index = False)['spot_price'].mean()

Unnamed: 0,is_weekend,spot_price
0,0,79.102809
1,1,58.728261


## Exercise 7

Knowing that the average nuclear power plant in Spain has a capacity of 1000 MW, how many nuclear power plants are there in Spain?

In [17]:
energy.head()

Unnamed: 0,datetime_utc,spot_price,gen_ccgt,gen_coal,gen_hydro,gen_nuclear,gen_solar_pv,gen_solar_th,gen_total,gen_wind,demand_total,year,month,day,hour,weekday,is_weekend,is_holiday,gap
0,2023-10-31 23:00:00+00:00,16.75,,292.0,5008.4,4304.5,51.4,194.1,23136.4,9951.7,21757.3,2023,10,31,23,1,0,False,-1379.1
1,2023-11-01 00:00:00+00:00,12.52,,291.0,3989.8,4305.5,51.4,194.1,21856.6,10426.5,20661.0,2023,11,1,0,2,0,True,-1195.6
2,2023-11-01 01:00:00+00:00,4.99,,180.0,3709.6,4306.5,51.2,194.1,21339.8,10832.7,19951.5,2023,11,1,1,2,0,True,-1388.3
3,2023-11-01 02:00:00+00:00,4.3,,,3433.1,4307.5,51.2,194.1,20789.1,10736.3,19426.9,2023,11,1,2,2,0,True,-1362.2
4,2023-11-01 03:00:00+00:00,4.3,,,3286.3,4307.5,51.2,194.1,20574.3,10669.2,19100.8,2023,11,1,3,2,0,True,-1473.5


In [19]:
energy['gen_nuclear'].max() / 1000

7.1395

## Exercise 8

When does demand peak: in summer or in winter?

In [27]:
import numpy as np

energy['season'] = np.where(energy['month'].isin([6, 7, 8]), 'summer', 
                            np.where(energy['month'].isin([12, 1, 2]), 'winter', 
                                     np.where(energy['month'].isin([3, 4, 5]), 'spring', 'fall')))

energy.groupby('season')['demand_total'].mean()

season
fall      25050.017475
spring    24330.280346
summer    26592.360542
winter    27035.341796
Name: demand_total, dtype: float64

## Exercise 9

Calculate, for each date, the difference between the maximum and the minimum demand. What's the average difference? Which month has the highest difference?

In [32]:
energy['date'] = pd.to_datetime(energy['datetime_utc']).dt.date


energy['demand_max'] = energy.groupby('date')['demand_total'].transform('max')
energy['demand_min'] = energy.groupby('date')['demand_total'].transform('min')


energy['diff'] = energy['demand_max'] - energy['demand_min']

energy.head()

Unnamed: 0,datetime_utc,spot_price,gen_ccgt,gen_coal,gen_hydro,gen_nuclear,gen_solar_pv,gen_solar_th,gen_total,gen_wind,...,hour,weekday,is_weekend,is_holiday,gap,season,date,demand_max,demand_min,diff
0,2023-10-31 23:00:00+00:00,16.75,,292.0,5008.4,4304.5,51.4,194.1,23136.4,9951.7,...,23,1,0,False,-1379.1,fall,2023-10-31,21757.3,21757.3,0.0
1,2023-11-01 00:00:00+00:00,12.52,,291.0,3989.8,4305.5,51.4,194.1,21856.6,10426.5,...,0,2,0,True,-1195.6,fall,2023-11-01,25426.0,18891.6,6534.4
2,2023-11-01 01:00:00+00:00,4.99,,180.0,3709.6,4306.5,51.2,194.1,21339.8,10832.7,...,1,2,0,True,-1388.3,fall,2023-11-01,25426.0,18891.6,6534.4
3,2023-11-01 02:00:00+00:00,4.3,,,3433.1,4307.5,51.2,194.1,20789.1,10736.3,...,2,2,0,True,-1362.2,fall,2023-11-01,25426.0,18891.6,6534.4
4,2023-11-01 03:00:00+00:00,4.3,,,3286.3,4307.5,51.2,194.1,20574.3,10669.2,...,3,2,0,True,-1473.5,fall,2023-11-01,25426.0,18891.6,6534.4


date
2022-12-31     19646.3
2023-01-01    469530.0
2023-01-02    544595.4
2023-01-03    623130.6
2023-01-04    628311.8
                ...   
2024-11-25    637375.7
2024-11-26    648493.9
2024-11-27    653901.4
2024-11-28    655460.9
2024-11-29    641768.5
Name: demand_total, Length: 700, dtype: float64

8778.647714285715


date,max,min,difference
2023-01-30,35882.4,21047.5,14834.9
2024-01-08,35293.6,21084.2,14209.4
2023-02-06,34818.7,20668.2,14150.5
2023-01-23,34983.2,21153.0,13830.2
2023-02-27,34506.7,20892.9,13613.8
...,...,...,...
2024-01-31,24830.5,24830.5,0.0
2023-02-28,26092.0,26092.0,0.0
2024-02-29,24984.0,24984.0,0.0
2023-01-31,26069.0,26069.0,0.0


9032.163003933138


Unnamed: 0,datetime_utc,spot_price,gen_ccgt,gen_coal,gen_hydro,gen_nuclear,gen_solar_pv,gen_solar_th,gen_total,gen_wind,...,is_holiday,gap,season,date,min_day_demand,max_day_demand,diff_demand,renewable_gen,renewable_gap,thermal_gen
0,2023-10-31 23:00:00+00:00,16.75,,292.0,5008.4,4304.5,51.4,194.1,23136.4,9951.7,...,False,-1379.1,fall,2023-10-31,21757.3,21757.3,0.0,15205.6,6551.7,
1,2023-11-01 00:00:00+00:00,12.52,,291.0,3989.8,4305.5,51.4,194.1,21856.6,10426.5,...,True,-1195.6,fall,2023-11-01,18891.6,25426.0,6534.4,14661.8,5999.2,
2,2023-11-01 01:00:00+00:00,4.99,,180.0,3709.6,4306.5,51.2,194.1,21339.8,10832.7,...,True,-1388.3,fall,2023-11-01,18891.6,25426.0,6534.4,14787.6,5163.9,
3,2023-11-01 02:00:00+00:00,4.3,,,3433.1,4307.5,51.2,194.1,20789.1,10736.3,...,True,-1362.2,fall,2023-11-01,18891.6,25426.0,6534.4,14414.7,5012.2,
4,2023-11-01 03:00:00+00:00,4.3,,,3286.3,4307.5,51.2,194.1,20574.3,10669.2,...,True,-1473.5,fall,2023-11-01,18891.6,25426.0,6534.4,14200.8,4900.0,
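
The cells that produced the last few outputs are not shown. One possible way to get a per-date demand range, its overall average, and the month with the largest average range is sketched below. The data is made up, so the numbers will not match the outputs above, and names such as `daily` and `monthly_range` are hypothetical.

```python
import pandas as pd

# Toy hourly data (made-up values): two days in January, one in February.
toy = pd.DataFrame({
    "datetime_utc": pd.to_datetime([
        "2023-01-30 03:00", "2023-01-30 19:00",
        "2023-01-31 03:00", "2023-01-31 19:00",
        "2023-02-01 03:00", "2023-02-01 19:00",
    ], utc=True),
    "demand_total": [21000.0, 35000.0, 22000.0, 30000.0, 23000.0, 29000.0],
})
toy["date"] = toy["datetime_utc"].dt.date

# One row per date with its maximum and minimum demand.
daily = toy.groupby("date")["demand_total"].agg(["max", "min"])
daily["difference"] = daily["max"] - daily["min"]

# Average daily range over the whole period.
average_difference = daily["difference"].mean()
print(average_difference)

# Average daily range per month, and the month where it is largest.
daily["month"] = pd.to_datetime(daily.index).month
monthly_range = daily.groupby("month")["difference"].mean()
print(monthly_range.idxmax())
```

Aggregating to one row per date before averaging gives every date the same weight. Averaging a `transform` column over the hourly rows instead weights each date by its number of rows, which may explain why two different averages appear above.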


## Exercise 10

Does the spot price correlate with the demand difference?

Unnamed: 0,demand_max,demand_min,spot_price_sum,difference
demand_max,1.0,0.683777,0.413955,0.860352
demand_min,0.683777,1.0,0.193148,0.216367
spot_price_sum,0.413955,0.193148,1.0,0.418947
difference,0.860352,0.216367,0.418947,1.0


Unnamed: 0,spot_price,diff_demand
spot_price,1.0,0.240926
diff_demand,0.240926,1.0
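
The code behind the two correlation tables is not shown. A minimal sketch of a per-date version, assuming made-up data and using the daily mean price (the first table appears to use a daily sum instead):

```python
import pandas as pd

# Toy hourly data (made-up values) for three dates.
toy = pd.DataFrame({
    "date":         ["d1", "d1", "d2", "d2", "d3", "d3"],
    "spot_price":   [50.0, 70.0, 80.0, 100.0, 20.0, 30.0],
    "demand_total": [20000.0, 26000.0, 21000.0, 30000.0, 22000.0, 24000.0],
})

# Named aggregation: one row per date.
daily = toy.groupby("date").agg(
    spot_price_mean=("spot_price", "mean"),
    demand_max=("demand_total", "max"),
    demand_min=("demand_total", "min"),
)
daily["difference"] = daily["demand_max"] - daily["demand_min"]

# Pearson correlation between the daily mean price and the daily demand range.
r = daily["spot_price_mean"].corr(daily["difference"])
print(round(r, 3))
```

Correlating at the daily level avoids repeating the same `difference` value on every hourly row of a date, which is what happens when the hourly `spot_price` is correlated against a broadcast column.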


## Exercise 11

Which month has had the day with the highest spot price?

Unnamed: 0,datetime_utc,spot_price,gen_ccgt,gen_coal,gen_hydro,gen_nuclear,gen_solar_pv,gen_solar_th,gen_total,gen_wind,demand_total,year,month,day,hour,weekday,is_weekend,is_holiday,season,date
0,2023-10-31 23:00:00+00:00,16.75,,292.0,5008.4,4304.5,51.4,194.1,23136.4,9951.7,21757.3,2023,10,31,23,1,0,False,other,2023-10-31
1,2023-11-01 00:00:00+00:00,12.52,,291.0,3989.8,4305.5,51.4,194.1,21856.6,10426.5,20661.0,2023,11,1,0,2,0,True,other,2023-11-01
2,2023-11-01 01:00:00+00:00,4.99,,180.0,3709.6,4306.5,51.2,194.1,21339.8,10832.7,19951.5,2023,11,1,1,2,0,True,other,2023-11-01


month           12.00
day             31.00
spot_price    6247.41
dtype: float64

1
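
The `max()` output above appears to be a column-wise maximum, so `month 12` and `day 31` are just the largest values in those columns rather than the date of the price peak. A minimal sketch that recovers the actual day with `idxmax`, assuming made-up data:

```python
import pandas as pd

# Toy hourly data (made-up values): two hours on each of two days.
toy = pd.DataFrame({
    "year":       [2023, 2023, 2023, 2023],
    "month":      [1, 1, 2, 2],
    "day":        [5, 5, 14, 14],
    "spot_price": [100.0, 120.0, 150.0, 160.0],
})

# Total price per calendar day. A daily mean ranks days the same way only
# when every day has the same number of hourly rows.
daily_price = toy.groupby(["year", "month", "day"])["spot_price"].sum()

# idxmax returns the (year, month, day) label of the highest-priced day.
year, month, day = daily_price.idxmax()
print(month)
```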

## Exercise 12

Are we using coal and gas power plants to cover the demand peaks when the renewable sources are not enough?

Unnamed: 0,demand_total,gen_ccgt,gen_coal,gen_hydro,gen_nuclear,gen_solar_pv,gen_solar_th,gen_total,gen_wind
demand_total,1.0,0.177118,0.491688,0.151421,0.251486,0.440687,0.35597,0.82229,0.061869
gen_ccgt,0.177118,1.0,0.283612,-0.049059,0.176534,-0.185872,-0.116084,0.030451,-0.373806
gen_coal,0.491688,0.283612,1.0,0.236793,0.115786,0.187118,-0.109379,0.35523,-0.209502
gen_hydro,0.151421,-0.049059,0.236793,1.0,-0.180258,-0.413091,-0.390626,-0.077831,0.069065
gen_nuclear,0.251486,0.176534,0.115786,-0.180258,1.0,-0.091144,-0.050308,0.119518,-0.086043
gen_solar_pv,0.440687,-0.185872,0.187118,-0.413091,-0.091144,1.0,0.787957,0.680565,-0.292676
gen_solar_th,0.35597,-0.116084,-0.109379,-0.390626,-0.050308,0.787957,1.0,0.536712,-0.312395
gen_total,0.82229,0.030451,0.35523,-0.077831,0.119518,0.680565,0.536712,1.0,0.254945
gen_wind,0.061869,-0.373806,-0.209502,0.069065,-0.086043,-0.292676,-0.312395,0.254945,1.0


Unnamed: 0,renewable_gap,thermal_gen
renewable_gap,1.0,0.834513
thermal_gen,0.834513,1.0
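
The cell defining `renewable_gap` and `thermal_gen` is not shown. The definitions below are an assumption, but they reproduce the `renewable_gen`, `renewable_gap` and `thermal_gen` values of the first two rows in the earlier `head()` output:

```python
import pandas as pd

# First two rows of the dataset, copied from the head() output.
rows = pd.DataFrame({
    "gen_ccgt":     [float("nan"), float("nan")],
    "gen_coal":     [292.0, 291.0],
    "gen_hydro":    [5008.4, 3989.8],
    "gen_solar_pv": [51.4, 51.4],
    "gen_solar_th": [194.1, 194.1],
    "gen_wind":     [9951.7, 10426.5],
    "demand_total": [21757.3, 20661.0],
})

# Assumed definition: renewables = hydro + solar PV + solar thermal + wind.
renewable_cols = ["gen_hydro", "gen_solar_pv", "gen_solar_th", "gen_wind"]
rows["renewable_gen"] = rows[renewable_cols].sum(axis=1)

# Demand left uncovered by renewables.
rows["renewable_gap"] = rows["demand_total"] - rows["renewable_gen"]

# Plain addition propagates NaN, so thermal_gen is missing whenever gen_ccgt is.
rows["thermal_gen"] = rows["gen_ccgt"] + rows["gen_coal"]

print(rows[["renewable_gen", "renewable_gap", "thermal_gen"]].round(1))
```

Because `gen_ccgt` is missing in roughly 48% of rows and `gen_coal` in roughly 85%, a correlation involving `thermal_gen` defined this way is computed only on the rows where both are present.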
