# Session 18: Pandas practice

Power market data from Spain.

## Data description

- **datetime_utc**: The date and time in UTC format.
- **spot_price**: The spot price of electricity at the given datetime in euros per megawatt-hour (€/MWh).
- **gen_ccgt**: Generation from Combined Cycle Gas Turbine (CCGT) plants in megawatt-hours (MWh).
- **gen_coal**: Generation from coal-fired power plants in megawatt-hours (MWh).
- **gen_hydro**: Generation from hydroelectric power plants in megawatt-hours (MWh).
- **gen_nuclear**: Generation from nuclear power plants in megawatt-hours (MWh).
- **gen_solar_pv**: Generation from solar photovoltaic (PV) power plants in megawatt-hours (MWh).
- **gen_solar_th**: Generation from solar thermal power plants in megawatt-hours (MWh).
- **gen_total**: Total electricity generation from all sources in megawatt-hours (MWh).
- **gen_wind**: Generation from wind power plants in megawatt-hours (MWh).
- **demand_total**: Total electricity demand in megawatt-hours (MWh).
- **year**: The year of the datetime.
- **month**: The month of the datetime.
- **day**: The day of the datetime.
- **hour**: The hour of the datetime.
- **weekday**: The day of the week (0 = Monday, 1 = Tuesday, ..., 6 = Sunday).
- **is_weekend**: A binary indicator of whether the date is a weekend (1 = Yes, 0 = No).
- **is_holiday**: A Boolean indicator of whether the date is a holiday (True = Yes, False = No).

In [2]:
import pandas as pd

energy = pd.read_csv('df_final.csv')

energy.head()

FileNotFoundError: [Errno 2] No such file or directory: 'df_final.csv'

## Exercise 1

What's the initial and final datetime in the dataset?

In [10]:
# Earliest and latest timestamp, one at a time
print(energy['datetime_utc'].min())
print(energy['datetime_utc'].max())

# Both statistics in a single call
energy['datetime_utc'].agg(['min', 'max'])

2022-12-31 23:00:00+00:00
2024-11-29 23:00:00+00:00


min    2022-12-31 23:00:00+00:00
max    2024-11-29 23:00:00+00:00
Name: datetime_utc, dtype: object
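
The `dtype: object` in the output suggests that `datetime_utc` is stored as strings, so `min` and `max` compare text rather than timestamps. For ISO-formatted strings in a single time zone the two orderings agree, but parsing the column makes the comparison explicitly chronological. A minimal sketch on a toy frame (the values below are illustrative, not taken from `df_final.csv`):

```python
import pandas as pd

# Toy stand-in for the real dataset (illustrative values only)
toy = pd.DataFrame({
    'datetime_utc': [
        '2023-01-01 00:00:00+00:00',
        '2022-12-31 23:00:00+00:00',
        '2023-01-01 01:00:00+00:00',
    ]
})

# Parse the strings into timezone-aware timestamps
toy['datetime_utc'] = pd.to_datetime(toy['datetime_utc'], utc=True)

# agg returns both statistics in a single Series
time_span = toy['datetime_utc'].agg(['min', 'max'])
print(time_span)
```

Once the column is parsed, the `.dt` accessor also becomes available for extracting dates, months, or hours.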

## Exercise 2

What's the average spot price in the dataset? And the median?

In [9]:
# Mean and median, one at a time
print(energy['spot_price'].mean())
print(energy['spot_price'].median())

# Both statistics in a single call
energy['spot_price'].agg(['mean', 'median'])

73.29295784169125
80.1


mean      73.292958
median    80.100000
Name: spot_price, dtype: float64

## Exercise 2.5

What's the cause of NaN values in the dataset?

In [12]:
energy.isna().mean()

datetime_utc    0.000000
spot_price      0.000000
gen_ccgt        0.477692
gen_coal        0.845379
gen_hydro       0.000000
gen_nuclear     0.000000
gen_solar_pv    0.000000
gen_solar_th    0.015425
gen_total       0.000000
gen_wind        0.000000
demand_total    0.000000
year            0.000000
month           0.000000
day             0.000000
hour            0.000000
weekday         0.000000
is_weekend      0.000000
is_holiday      0.000000
dtype: float64
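
The shares above show which columns contain NaN values, but not when they occur. One way to look for a cause is to check whether the missing values cluster in particular periods. The sketch below does this on a toy frame with made-up values; whether NaN in the real data means "no generation" or "not reported" is an open question that this check can only hint at.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset (illustrative values only)
toy = pd.DataFrame({
    'year':  [2023, 2023, 2023, 2024, 2024, 2024],
    'month': [1, 1, 2, 1, 1, 2],
    'gen_coal': [300.0, np.nan, 250.0, np.nan, np.nan, np.nan],
})

# Share of missing gen_coal values per (year, month)
nan_share = (
    toy.assign(missing=toy['gen_coal'].isna())
       .groupby(['year', 'month'])['missing']
       .mean()
)
print(nan_share)
```

If the share jumps to 1.0 from some month onward, the column probably stopped being reported or the plants stopped running; if it is scattered, the NaN values are more likely tied to individual hours.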

## Exercise 3

What's the yearly evolution of the average spot price? And the monthly evolution?

In [13]:
energy.groupby('year')['spot_price'].mean()

year
2022     0.000000
2023    87.059305
2024    58.303823
Name: spot_price, dtype: float64
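
The 2022 average of 0.0 looks odd next to the other years. Since the dataset starts at 2022-12-31 23:00 UTC, that year can only contain the final hour of December, so its "average" rests on very few observations. Reporting the row count next to the mean makes this visible. A minimal sketch on toy data (illustrative values only):

```python
import pandas as pd

# Toy stand-in for the real dataset (illustrative values only)
toy = pd.DataFrame({
    'year': [2022, 2023, 2023, 2024],
    'spot_price': [0.0, 80.0, 100.0, 60.0],
})

# Mean price per year together with the number of rows behind it
yearly = toy.groupby('year')['spot_price'].agg(['mean', 'count'])
print(yearly)
```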

In [None]:
energy.groupby(['year', 'month'], as_index = False)[['spot_price']].mean().head(5) # The extra square brackets select a DataFrame instead of a Series

| | year | month | spot_price |
|---|---|---|---|
| 0 | 2022 | 12 | 0.0 |
| 1 | 2023 | 1 | 67.899667 |
| 2 | 2023 | 2 | 133.520586 |
| 3 | 2023 | 3 | 91.91939 |
| 4 | 2023 | 4 | 74.160818 |


## Exercise 4

Calculate the gap between the `demand_total` and the `gen_total` for each row. What's the average gap?

In [15]:
energy['gap'] = energy['demand_total'] - energy['gen_total']

energy['gap'].mean()

-2089.132866273353

## Exercise 5

What's the correlation between the spot price and the total generation? And the demand? And the gap?

In [21]:
energy[['spot_price', 'gen_total', 'demand_total', 'gap']].corr()

| | spot_price | gen_total | demand_total | gap |
|---|---|---|---|---|
| spot_price | 1.0 | -0.198832 | 0.190751 | 0.596665 |
| gen_total | -0.198832 | 1.0 | 0.82229 | -0.666328 |
| demand_total | 0.190751 | 0.82229 | 1.0 | -0.123584 |
| gap | 0.596665 | -0.666328 | -0.123584 | 1.0 |


## Exercise 6

On average, in Spain, is the spot price higher during the weekends?

In [25]:
energy.groupby('is_weekend')[['spot_price']].mean()

| is_weekend | spot_price |
|---|---|
| 0 | 79.102809 |
| 1 | 58.728261 |


## Exercise 7

Knowing that the average nuclear power plant in Spain has a capacity of about 1000 MW, how many nuclear power plants does Spain have?

In [27]:
round(energy['gen_nuclear'].max() / 1000)

7

## Exercise 8

When does demand peak: in summer or in winter?

In [1]:
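# Months 6-8 = summer
# Months 12-2 = winter

# Average demand per calendar month, to compare the summer and winter months
energy.groupby('month')['demand_total'].mean()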

NameError: name 'energy' is not defined
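
The summer/winter split noted in the cell comments can be turned into a grouping key by mapping each month to a season. The sketch below assumes that mapping and uses a toy frame. The demand values are invented, so the result says nothing about the real answer.

```python
import pandas as pd

# Toy stand-in for the real dataset (illustrative values only)
toy = pd.DataFrame({
    'month': [1, 2, 6, 7, 8, 12, 4],
    'demand_total': [30000.0, 29000.0, 27000.0, 31000.0,
                     28000.0, 32000.0, 24000.0],
})

# Months 6-8 = summer, months 12-2 = winter
season_of_month = {12: 'winter', 1: 'winter', 2: 'winter',
                   6: 'summer', 7: 'summer', 8: 'summer'}

# Months outside the mapping become NaN, and groupby drops them by default
toy['season'] = toy['month'].map(season_of_month)

# Average and peak demand per season
season_demand = toy.groupby('season')['demand_total'].agg(['mean', 'max'])
print(season_demand)
```

Comparing both `mean` and `max` separates "which season has higher demand on average" from "which season contains the single highest peak", which need not be the same.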

## Exercise 9

Calculate, for each date, the difference between the maximum and the minimum demand. What's the average difference? Which month has the highest difference?

In [29]:
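# Calendar date of each timestamp
energy['date'] = pd.to_datetime(energy['datetime_utc']).dt.date

# transform broadcasts each date's minimum back onto every row of that date
energy.groupby('date').transform('min')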

| | datetime_utc | spot_price | gen_ccgt | gen_coal | gen_hydro | gen_nuclear | gen_solar_pv | gen_solar_th | gen_total | gen_wind | demand_total | year | month | day | hour | weekday | is_weekend | is_holiday | gap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-10-31 23:00:00+00:00 | 16.75 | NaN | 292.0 | 5008.4 | 4304.5 | 51.4 | 194.1 | 23136.4 | 9951.7 | 21757.3 | 2023 | 10 | 31 | 23 | 1 | 0 | False | -1379.1 |
| 1 | 2023-11-01 00:00:00+00:00 | 0.01 | NaN | 180.0 | 3159.7 | 3281.1 | 51.2 | 194.1 | 20574.3 | 10426.5 | 18891.6 | 2023 | 11 | 1 | 0 | 2 | 0 | True | -6052.2 |
| 2 | 2023-11-01 00:00:00+00:00 | 0.01 | NaN | 180.0 | 3159.7 | 3281.1 | 51.2 | 194.1 | 20574.3 | 10426.5 | 18891.6 | 2023 | 11 | 1 | 0 | 2 | 0 | True | -6052.2 |
| 3 | 2023-11-01 00:00:00+00:00 | 0.01 | NaN | 180.0 | 3159.7 | 3281.1 | 51.2 | 194.1 | 20574.3 | 10426.5 | 18891.6 | 2023 | 11 | 1 | 0 | 2 | 0 | True | -6052.2 |
| 4 | 2023-11-01 00:00:00+00:00 | 0.01 | NaN | 180.0 | 3159.7 | 3281.1 | 51.2 | 194.1 | 20574.3 | 10426.5 | 18891.6 | 2023 | 11 | 1 | 0 | 2 | 0 | True | -6052.2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16267 | 2024-10-30 00:00:00+00:00 | 40.00 | 68.0 | 350.0 | 2950.9 | 6048.0 | 55.7 | 81.5 | 21508.3 | 7062.3 | 21161.6 | 2024 | 10 | 30 | 0 | 2 | 0 | False | -4362.4 |
| 16268 | 2024-10-30 00:00:00+00:00 | 40.00 | 68.0 | 350.0 | 2950.9 | 6048.0 | 55.7 | 81.5 | 21508.3 | 7062.3 | 21161.6 | 2024 | 10 | 30 | 0 | 2 | 0 | False | -4362.4 |
| 16269 | 2024-10-30 00:00:00+00:00 | 40.00 | 68.0 | 350.0 | 2950.9 | 6048.0 | 55.7 | 81.5 | 21508.3 | 7062.3 | 21161.6 | 2024 | 10 | 30 | 0 | 2 | 0 | False | -4362.4 |
| 16270 | 2024-10-30 00:00:00+00:00 | 40.00 | 68.0 | 350.0 | 2950.9 | 6048.0 | 55.7 | 81.5 | 21508.3 | 7062.3 | 21161.6 | 2024 | 10 | 30 | 0 | 2 | 0 | False | -4362.4 |
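
The `transform` call above repeats each date's minimum on every hourly row. The exercise only needs one row per date, which an aggregation gives directly. A minimal sketch on a toy frame (illustrative values only; grouping by calendar month across years is one possible reading of "which month"):

```python
import pandas as pd

# Toy stand-in for the real dataset (illustrative values only)
toy = pd.DataFrame({
    'datetime_utc': pd.to_datetime([
        '2023-01-01 03:00', '2023-01-01 19:00',
        '2023-01-02 03:00', '2023-01-02 19:00',
        '2023-02-01 03:00', '2023-02-01 19:00',
    ], utc=True),
    'demand_total': [20000.0, 28000.0, 21000.0, 27000.0, 22000.0, 32000.0],
})
toy['date'] = toy['datetime_utc'].dt.date
toy['month'] = toy['datetime_utc'].dt.month

# One row per date: minimum and maximum demand, then their difference
daily = toy.groupby(['month', 'date'])['demand_total'].agg(['min', 'max'])
daily['demand_range'] = daily['max'] - daily['min']

# Average daily range over the whole sample
average_range = daily['demand_range'].mean()

# Average daily range per calendar month, and the month where it is largest
monthly_range = daily.groupby('month')['demand_range'].mean()
widest_month = monthly_range.idxmax()

print(average_range, widest_month)
```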


## Exercise 10

Does the spot price correlate with the demand difference?
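
One possible approach, assuming "demand difference" means the daily max-minus-min demand from Exercise 9: aggregate both the price and the demand to one row per date, then correlate the two daily series. The sketch uses a toy frame with invented values, so the resulting coefficient is not the real answer.

```python
import pandas as pd

# Toy stand-in for the real dataset (illustrative values only)
toy = pd.DataFrame({
    'date': ['2023-01-01', '2023-01-01',
             '2023-01-02', '2023-01-02',
             '2023-01-03', '2023-01-03'],
    'spot_price': [50.0, 70.0, 40.0, 50.0, 80.0, 100.0],
    'demand_total': [20000.0, 28000.0, 21000.0, 27000.0, 22000.0, 32000.0],
})

# Named aggregation: one row per date with the columns needed below
daily = toy.groupby('date').agg(
    spot_price_mean=('spot_price', 'mean'),
    demand_max=('demand_total', 'max'),
    demand_min=('demand_total', 'min'),
)
daily['demand_range'] = daily['demand_max'] - daily['demand_min']

# Pearson correlation between daily average price and daily demand range
price_range_corr = daily['spot_price_mean'].corr(daily['demand_range'])
print(price_range_corr)
```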

## Exercise 11

Which month has had the day with the highest spot price?
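
A sketch of one reading of the question, where "the day with the highest spot price" is taken to mean the day with the highest daily average price (using the highest single hourly price instead would only change the aggregation). The toy values are illustrative:

```python
import pandas as pd

# Toy stand-in for the real dataset (illustrative values only)
toy = pd.DataFrame({
    'year': [2023, 2023, 2023, 2023, 2024],
    'month': [1, 2, 2, 2, 1],
    'day': [15, 3, 3, 4, 20],
    'spot_price': [90.0, 140.0, 160.0, 120.0, 70.0],
})

# Daily average price, indexed by (year, month, day)
daily_price = toy.groupby(['year', 'month', 'day'])['spot_price'].mean()

# idxmax returns the index label of the maximum, here a (year, month, day) tuple
peak_year, peak_month, peak_day = daily_price.idxmax()
print(peak_year, peak_month, peak_day)
```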

## Exercise 12

Are we using coal and gas power plants to cover the demand peaks when the renewable sources are not enough?
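
One way to approach this is to compute the residual demand (demand not covered by renewables) and check how closely coal plus gas generation follows it. The sketch below rests on two assumptions that are not stated in the dataset description: that wind, solar PV, solar thermal, and hydro together count as "renewable sources", and that NaN in `gen_ccgt` and `gen_coal` can be treated as zero generation. The toy values are invented.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset (illustrative values only)
toy = pd.DataFrame({
    'demand_total': [25000.0, 30000.0, 28000.0, 22000.0],
    'gen_wind':     [10000.0, 3000.0, 5000.0, 12000.0],
    'gen_solar_pv': [5000.0, 1000.0, 2000.0, 6000.0],
    'gen_solar_th': [500.0, 100.0, 200.0, 600.0],
    'gen_hydro':    [3000.0, 3000.0, 3000.0, 3000.0],
    'gen_ccgt':     [1000.0, 9000.0, 6000.0, np.nan],
    'gen_coal':     [np.nan, 500.0, 300.0, np.nan],
})

# Assumption: these four columns make up the renewable generation
renewable_cols = ['gen_wind', 'gen_solar_pv', 'gen_solar_th', 'gen_hydro']
toy['residual_demand'] = toy['demand_total'] - toy[renewable_cols].sum(axis=1)

# Assumption: NaN means no generation; sum(axis=1) skips NaN by default
toy['gen_thermal'] = toy[['gen_ccgt', 'gen_coal']].sum(axis=1)

# A strongly positive value would be consistent with thermal plants
# filling the gap left by renewables
thermal_vs_residual = toy['gen_thermal'].corr(toy['residual_demand'])
print(thermal_vs_residual)
```

A high correlation would support the idea but not prove it, since imports, nuclear output, and storage also shape how the residual demand is met.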