## 📊 Data Required for the Project

### ⚡ Electricity Demand & Time Features
1. **Electricity Demand** ✅  
2. **Day** ✅  
3. **Month** ✅  
4. **Hour** ✅  
5. **Year** ✅  
6. **Day of Week** ✅  
7. **Is Holiday** ✅  
---
### 🌤️ Weather Features 
8. **Temperature** ✅  
9. **Solar Radiation** ✅  
10. **Wind Speed** ✅ 
11. **Cloud Coverage** ✅  
---
### ♻️ Carbon Intensity & Energy Mix
12. **% of Energy from Renewables, Low Carbon and Zero Carbon Sources** ✅ 
13. **Total Generated** ✅  
14. **Generation by Source** ✅ 
15. **Carbon Intensity** ✅
---
### 💸 Price
16. **Price per MWh** ✅ 

In [1]:
# Standard libraries
import os
import glob

# Third-party libraries
import pandas as pd

# Local libraries
from utils import data_check as ck
from utils import data_cleaning as dc

# Enable auto-reload for modules during development
%load_ext autoreload
%autoreload 2

In [680]:
file_path = "../data"

### Electricity Demand

In [681]:
demand_path = os.path.join(file_path, "electricity_demand")

#### UK

In [682]:
# Exploring the data
uk_demand_path = os.path.join(demand_path, "uk")

In [683]:
# Read the files in the directory
file_list = glob.glob(os.path.join(uk_demand_path, 'demanddata_*.csv'))

# Initialize an empty dictionary to store the dataframes
uk_demand = {}

# Loop through the files and read them into a dictionary
for file in file_list:

    # Get the file name without the directory path
    file_name = os.path.basename(file)

    # Extract the year from the file name
    year = file_name.split('_')[1].split('.')[0]

    # Read the file
    df = pd.read_csv(file)
    uk_demand[year] = df

    print(f'demanddata_{year} loaded')


demanddata_2019 loaded
demanddata_2020 loaded
demanddata_2021 loaded
demanddata_2022 loaded
demanddata_2023 loaded
demanddata_2024 loaded
demanddata_2025 loaded


In [684]:
uk_demand['2019'].head()

Unnamed: 0,SETTLEMENT_DATE,SETTLEMENT_PERIOD,ND,TSD,ENGLAND_WALES_DEMAND,EMBEDDED_WIND_GENERATION,EMBEDDED_WIND_CAPACITY,EMBEDDED_SOLAR_GENERATION,EMBEDDED_SOLAR_CAPACITY,NON_BM_STOR,...,IFA_FLOW,IFA2_FLOW,BRITNED_FLOW,MOYLE_FLOW,EAST_WEST_FLOW,NEMO_FLOW,NSL_FLOW,ELECLINK_FLOW,VIKING_FLOW,GREENLINK_FLOW
0,01-JAN-2019,1,23808,25291,22393,2912,5918,0,12663,0,...,1553,0,176,-455,-250,0,0,0,0,0
1,01-JAN-2019,2,24402,25720,22962,2761,5918,0,12663,0,...,1554,0,194,-455,-236,0,0,0,0,0
2,01-JAN-2019,3,24147,25495,22689,2709,5918,0,12663,0,...,1505,0,581,-410,-311,0,0,0,0,0
3,01-JAN-2019,4,23197,24590,21849,2638,5918,0,12663,0,...,1503,0,600,-450,-315,0,0,0,0,0
4,01-JAN-2019,5,22316,24346,20979,2557,5918,0,12663,0,...,1503,0,675,-442,-463,0,0,0,0,0


##### 📘 Columns Description

1. **SETTLEMENT_DATE**  
   The calendar date corresponding to the data.

2. **SETTLEMENT_PERIOD**  
   Each day is divided into 48 half-hour periods. Some days may have 46 or 50 periods due to daylight saving changes.

3. **ND (National Demand)**  
   The total electricity demand across the entire UK (excluding some special cases).

4. **TSD (Transmission System Demand)**  
   National Demand **plus** electricity used by:
   - Power stations themselves (they need power to operate),
   - Pumped storage (pumping water uphill to store energy),
   - Exports to other countries via interconnectors.

5. **ENGLAND_WALES_DEMAND**  
   Similar to ND, but only includes electricity demand in **England and Wales**, excluding Scotland and Northern Ireland.

6. **EMBEDDED_WIND_GENERATION**  
   Estimated electricity generated by small wind farms that are embedded in the distribution network and have no transmission-system metering. These farms reduce grid demand but cannot be measured directly, so this is a modeled estimate.

7. **EMBEDDED_WIND_CAPACITY**  
   Theoretical maximum electricity these small wind farms could generate under ideal wind conditions.

8. **EMBEDDED_SOLAR_GENERATION**  
   Estimated electricity generated by small solar installations embedded in the distribution network. As with wind, this estimate helps model grid demand more accurately.

9. **EMBEDDED_SOLAR_CAPACITY**  
   The maximum output solar panels could produce under full sunlight.

10. **NON_BM_STOR**  
   Electricity from reserves or generators that are not part of the main Balancing Mechanism (STOR stands for Short Term Operating Reserve). It acts as a backup to balance supply and demand and includes:
   - Extra capacity ready to dispatch,
   - Demand reduction agreements.

11. **PUMPED_STORAGE_USAGE**
   Electricity used to **pump water uphill** in hydro storage plants (like charging a battery). Usually a **negative value**, as it represents electricity consumption for future generation.

12. **Interconnector Flows**  
   Cables connecting the UK grid to other countries. They allow for imports and exports of electricity.  
   Includes:
   - **IFA_FLOW** → France
   - **IFA2_FLOW** → France
   - **MOYLE_FLOW** → Northern Ireland
   - **EAST_WEST_FLOW** → Republic of Ireland
   - **NEMO_FLOW** → Belgium
   - **NSL_FLOW** → Norway (North Sea Link)
   - **BRITNED_FLOW** → Netherlands
   - **ELECLINK_FLOW** → France
   - **VIKING_FLOW** → Denmark (Viking Link)
   - **GREENLINK_FLOW** → Republic of Ireland

   📌 *Positive values = Imports into the UK*  
   📌 *Negative values = Exports from the UK*
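
Because a settlement period counts the half-hours elapsed since local midnight, one way to place every row on a continuous time axis is to convert the date and period into a UTC timestamp. The sketch below is only an illustration: `period_start_utc` is a hypothetical helper (not part of this project's `utils` package), and it assumes the date column has already been parsed to datetime and that the `Europe/London` time zone applies.

```python
import pandas as pd

def period_start_utc(dates: pd.Series, periods: pd.Series) -> pd.Series:
    """Return the UTC start time of each settlement period (hypothetical helper)."""
    # Local midnight is never skipped or repeated in the UK,
    # because the clock changes happen at 01:00 or 02:00.
    local_midnight = dates.dt.tz_localize("Europe/London")
    # Add the elapsed half-hours in UTC so 46- and 50-period days stay continuous.
    elapsed = pd.to_timedelta((periods - 1) * 30, unit="min")
    return local_midnight.dt.tz_convert("UTC") + elapsed

# Toy data around the 2019 spring clock change
example = pd.DataFrame({
    "settlement_date": pd.to_datetime(["2019-03-31"] * 3 + ["2019-04-01"]),
    "settlement_period": [1, 2, 3, 1],
})
example["start_utc"] = period_start_utc(
    example["settlement_date"], example["settlement_period"]
)
print(example)
```

With this representation a 46-period day simply spans 23 hours and a 50-period day 25 hours, so no renumbering is needed if the downstream model works on timestamps rather than period numbers.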


In [685]:
# Snake case columns
for key, value in uk_demand.items():
    # Rename the columns
    uk_demand[key] = dc.snake(value)

In [686]:
uk_demand['2019'].head()

Unnamed: 0,settlement_date,settlement_period,nd,tsd,england_wales_demand,embedded_wind_generation,embedded_wind_capacity,embedded_solar_generation,embedded_solar_capacity,non_bm_stor,...,ifa_flow,ifa2_flow,britned_flow,moyle_flow,east_west_flow,nemo_flow,nsl_flow,eleclink_flow,viking_flow,greenlink_flow
0,01-JAN-2019,1,23808,25291,22393,2912,5918,0,12663,0,...,1553,0,176,-455,-250,0,0,0,0,0
1,01-JAN-2019,2,24402,25720,22962,2761,5918,0,12663,0,...,1554,0,194,-455,-236,0,0,0,0,0
2,01-JAN-2019,3,24147,25495,22689,2709,5918,0,12663,0,...,1505,0,581,-410,-311,0,0,0,0,0
3,01-JAN-2019,4,23197,24590,21849,2638,5918,0,12663,0,...,1503,0,600,-450,-315,0,0,0,0,0
4,01-JAN-2019,5,22316,24346,20979,2557,5918,0,12663,0,...,1503,0,675,-442,-463,0,0,0,0,0


In [687]:
# Extracting necessary columns
uk_demand_clean = dc.extract_columns(uk_demand,["settlement_date", "settlement_period", "nd", "tsd"])

Columns ['settlement_date', 'settlement_period', 'nd', 'tsd'] extracted from data 2019
Columns ['settlement_date', 'settlement_period', 'nd', 'tsd'] extracted from data 2020
Columns ['settlement_date', 'settlement_period', 'nd', 'tsd'] extracted from data 2021
Columns ['settlement_date', 'settlement_period', 'nd', 'tsd'] extracted from data 2022
Columns ['settlement_date', 'settlement_period', 'nd', 'tsd'] extracted from data 2023
Columns ['settlement_date', 'settlement_period', 'nd', 'tsd'] extracted from data 2024
Columns ['settlement_date', 'settlement_period', 'nd', 'tsd'] extracted from data 2025


In [688]:
# Check the 2019 data types, missing values, and duplicates
ck.check(uk_demand_clean['2019'])

Number of columns: 4 and rows: 17520

Data types:
settlement_date      object
settlement_period     int64
nd                    int64
tsd                   int64
dtype: object

Unique values count:
settlement_date        365
settlement_period       50
nd                   12402
tsd                  12327
dtype: int64

These columns appear to be categorical (less than 20 unique values):
Index([], dtype='object')

Unique value count for categorical columns:

Count of null values:
settlement_date      0
settlement_period    0
nd                   0
tsd                  0
dtype: int64

Count of missing() values:
settlement_date      0
settlement_period    0
nd                   0
tsd                  0
dtype: int64

Count of duplicated values:
0


In [689]:
# Convert settlement_date to datetime format
for key, value in uk_demand_clean.items():
    uk_demand_clean[key] = dc.convert_to_datetime(value, columns=["settlement_date"])

In [690]:
ck.check(uk_demand_clean['2019'])

Number of columns: 4 and rows: 17520

Data types:
settlement_date      datetime64[ns]
settlement_period             int64
nd                            int64
tsd                           int64
dtype: object

Unique values count:
settlement_date        365
settlement_period       50
nd                   12402
tsd                  12327
dtype: int64

These columns appear to be categorical (less than 20 unique values):
Index([], dtype='object')

Unique value count for categorical columns:

Count of null values:
settlement_date      0
settlement_period    0
nd                   0
tsd                  0
dtype: int64

Count of missing() values:
settlement_date      0
settlement_period    0
nd                   0
tsd                  0
dtype: int64

Count of duplicated values:
0


In [691]:
# Find the dates of the daylight saving time changes for every year in the dataset
daylight_saving_time = ck.check_settlement_period(uk_demand_clean)

for year, periods in daylight_saving_time.items():
    print(f"Year: {year}, DST change dates:")
    for period_type, dates in periods.items():
        print(f"  {period_type}: {dates}")

Year: 2019, DST change dates:
  46_periods: DatetimeIndex(['2019-03-31'], dtype='datetime64[ns]', name='settlement_date', freq=None)
  50_periods: DatetimeIndex(['2019-10-27'], dtype='datetime64[ns]', name='settlement_date', freq=None)
Year: 2020, DST change dates:
  46_periods: DatetimeIndex(['2020-03-29'], dtype='datetime64[ns]', name='settlement_date', freq=None)
  50_periods: DatetimeIndex(['2020-10-25'], dtype='datetime64[ns]', name='settlement_date', freq=None)
Year: 2021, DST change dates:
  46_periods: DatetimeIndex(['2021-03-28'], dtype='datetime64[ns]', name='settlement_date', freq=None)
  50_periods: DatetimeIndex(['2021-10-31'], dtype='datetime64[ns]', name='settlement_date', freq=None)
Year: 2022, DST change dates:
  46_periods: DatetimeIndex(['2022-03-27'], dtype='datetime64[ns]', name='settlement_date', freq=None)
  50_periods: DatetimeIndex(['2022-10-30'], dtype='datetime64[ns]', name='settlement_date', freq=None)
Year: 2023, DST change dates:
  46_periods: DatetimeInde

These dates match the start and end of daylight saving time (British Summer Time) in the UK:

- **2019**: Sun, Mar 31 – Sun, Oct 27  
- **2020**: Sun, Mar 29 – Sun, Oct 25  
- **2021**: Sun, Mar 28 – Sun, Oct 31  
- **2022**: Sun, Mar 27 – Sun, Oct 30  
- **2023**: Sun, Mar 26 – Sun, Oct 29  
- **2024**: Sun, Mar 31 – Sun, Oct 27  
- **2025**: Data ends on Mar 23, before Daylight Saving Time begins.
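
The check above relies on `ck.check_settlement_period`, whose implementation is not shown in this notebook. As a rough sketch of the idea, days with a clock change can be found by counting rows per date and keeping the dates with 46 or 50 periods. The function name `find_irregular_days` and the toy data are assumptions made for illustration.

```python
import pandas as pd

def find_irregular_days(df: pd.DataFrame) -> dict:
    """Map '46_periods' / '50_periods' to the dates with that many rows (hypothetical helper)."""
    periods_per_day = df.groupby("settlement_date").size()
    return {
        f"{n}_periods": periods_per_day[periods_per_day == n].index
        for n in (46, 50)
    }

# Toy data: one short day, one normal day and one long day
toy = pd.concat([
    pd.DataFrame({
        "settlement_date": pd.Timestamp(day),
        "settlement_period": range(1, n + 1),
    })
    for day, n in [("2019-03-31", 46), ("2019-04-01", 48), ("2019-10-27", 50)]
])

irregular = find_irregular_days(toy)
print(irregular)
```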

In [692]:
# Adjust the periods for daylight saving time
for key, value in uk_demand_clean.items():
    # Bring days with 46 or 50 settlement periods to 48
    uk_demand_clean[key] = dc.adjust_dst_periods(value) 

Adjusting periods for date: 2019-03-31 00:00:00
New legth for date 2019-03-31 00:00:00 is 48
Adjusting periods for date: 2019-10-27 00:00:00
New legth for date 2019-10-27 00:00:00 is 48
Adjusting periods for date: 2020-03-29 00:00:00
New legth for date 2020-03-29 00:00:00 is 48
Adjusting periods for date: 2020-10-25 00:00:00
New legth for date 2020-10-25 00:00:00 is 48
Adjusting periods for date: 2021-03-28 00:00:00
New legth for date 2021-03-28 00:00:00 is 48
Adjusting periods for date: 2021-10-31 00:00:00
New legth for date 2021-10-31 00:00:00 is 48
Adjusting periods for date: 2022-03-27 00:00:00
New legth for date 2022-03-27 00:00:00 is 48
Adjusting periods for date: 2022-10-30 00:00:00
New legth for date 2022-10-30 00:00:00 is 48
Adjusting periods for date: 2023-03-26 00:00:00
New legth for date 2023-03-26 00:00:00 is 48
Adjusting periods for date: 2023-10-29 00:00:00
New legth for date 2023-10-29 00:00:00 is 48
Adjusting periods for date: 2024-03-31 00:00:00
New legth for date 202
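
The implementation of `dc.adjust_dst_periods` is not shown in this notebook. Purely as an illustration, one way to bring a 46- or 50-period day to 48 periods is to renumber the periods around the clock change, interpolate the skipped hour in spring, and drop the repeated hour in autumn. The name `to_48_periods`, the use of linear interpolation, and the choice to drop (rather than average) the repeated hour are all assumptions of this sketch.

```python
import pandas as pd

VALUE_COLUMNS = ["nd", "tsd"]  # assumed demand columns

def to_48_periods(day: pd.DataFrame) -> pd.DataFrame:
    """Return one day's rows renumbered to periods 1..48 (hypothetical helper)."""
    day = day.sort_values("settlement_period")
    period = day["settlement_period"]
    if len(day) == 46:
        # Spring: 01:00-02:00 does not exist, so periods 3..46 correspond to 5..48.
        day = day.assign(settlement_period=period.where(period <= 2, period + 2))
        full_day = pd.Index(range(1, 49), name="settlement_period")
        day = day.set_index("settlement_period").reindex(full_day)
        day["settlement_date"] = day["settlement_date"].ffill()
        # Fill the two inserted periods by linear interpolation.
        day[VALUE_COLUMNS] = day[VALUE_COLUMNS].interpolate()
        return day.reset_index()
    if len(day) == 50:
        # Autumn: 01:00-02:00 occurs twice; drop the second pass (periods 5 and 6).
        day = day[~period.isin([5, 6])]
        period = day["settlement_period"]
        day = day.assign(settlement_period=period.where(period <= 4, period - 2))
    return day.reset_index(drop=True)

spring = pd.DataFrame({
    "settlement_date": pd.Timestamp("2019-03-31"),
    "settlement_period": range(1, 47),
    "nd": range(100, 146),
    "tsd": range(200, 246),
})
autumn = pd.DataFrame({
    "settlement_date": pd.Timestamp("2019-10-27"),
    "settlement_period": range(1, 51),
    "nd": range(100, 150),
    "tsd": range(200, 250),
})
fixed_spring = to_48_periods(spring)
fixed_autumn = to_48_periods(autumn)
print(len(fixed_spring), len(fixed_autumn))
```

Applied per date (for example with `groupby("settlement_date")`), this keeps every day at 48 rows, at the cost of inventing two values in spring and discarding two in autumn.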

In [693]:
ck.check(uk_demand_clean['2019'])

Number of columns: 4 and rows: 17520

Data types:
settlement_date      datetime64[ns]
settlement_period             int64
nd                            int64
tsd                           int64
dtype: object

Unique values count:
settlement_date        365
settlement_period       48
nd                   12399
tsd                  12324
dtype: int64

These columns appear to be categorical (less than 20 unique values):
Index([], dtype='object')

Unique value count for categorical columns:

Count of null values:
settlement_date      0
settlement_period    0
nd                   0
tsd                  0
dtype: int64

Count of missing() values:
settlement_date      0
settlement_period    0
nd                   0
tsd                  0
dtype: int64

Count of duplicated values:
0


In [694]:
# Check that all the DST days were adjusted
daylight_saving_time = ck.check_settlement_period(uk_demand_clean)

for year, periods in daylight_saving_time.items():
    print(f"Year: {year}, DST change dates:")
    for period_type, dates in periods.items():
        print(f"  {period_type}: {dates}")

Year: 2019, DST change dates:
  46_periods: DatetimeIndex([], dtype='datetime64[ns]', name='settlement_date', freq=None)
  50_periods: DatetimeIndex([], dtype='datetime64[ns]', name='settlement_date', freq=None)
Year: 2020, DST change dates:
  46_periods: DatetimeIndex([], dtype='datetime64[ns]', name='settlement_date', freq=None)
  50_periods: DatetimeIndex([], dtype='datetime64[ns]', name='settlement_date', freq=None)
Year: 2021, DST change dates:
  46_periods: DatetimeIndex([], dtype='datetime64[ns]', name='settlement_date', freq=None)
  50_periods: DatetimeIndex([], dtype='datetime64[ns]', name='settlement_date', freq=None)
Year: 2022, DST change dates:
  46_periods: DatetimeIndex([], dtype='datetime64[ns]', name='settlement_date', freq=None)
  50_periods: DatetimeIndex([], dtype='datetime64[ns]', name='settlement_date', freq=None)
Year: 2023, DST change dates:
  46_periods: DatetimeIndex([], dtype='datetime64[ns]', name='settlement_date', freq=None)
  50_periods: DatetimeIndex([],

In [695]:
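# Adding bank holidays to the dataset
# UK bank holidays from 2019 to 2026 in DD-MM-YYYY format (the yearly files end on 23-03-2025).
# Note: the 2022 bank holidays on 02-06-2022 (moved Spring bank holiday) and 19-09-2022 (state funeral) are not in this list.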
bank_holidays = ['01-01-2019', '19-04-2019', '22-04-2019', '06-05-2019', '27-05-2019',
                '26-08-2019', '25-12-2019', '26-12-2019', '01-01-2020', '10-04-2020',
                '13-04-2020', '08-05-2020', '25-05-2020', '31-08-2020', '25-12-2020',
                '28-12-2020', '01-01-2021', '02-04-2021', '05-04-2021', '03-05-2021',
                '31-05-2021', '30-08-2021', '27-12-2021', '28-12-2021','03-01-2022', 
                '15-04-2022', '18-04-2022', '02-05-2022', '03-06-2022', '29-08-2022',
                '26-12-2022', '27-12-2022', '02-01-2023', '07-04-2023', '10-04-2023', 
                '01-05-2023', '08-05-2023', '29-05-2023', '28-08-2023', '25-12-2023',
                '26-12-2023', '01-01-2024', '29-03-2024', '01-04-2024', '06-05-2024',
                '27-05-2024', '26-08-2024', '25-12-2024', '26-12-2024', '01-01-2025',
                '18-04-2025', '21-04-2025', '05-05-2025', '26-05-2025', '25-08-2025',
                '25-12-2025', '26-12-2025', '01-01-2026', '03-04-2026', '06-04-2026',
                '04-05-2026', '25-05-2026', '31-08-2026', '25-12-2026', '28-12-2026']

for key, value in uk_demand_clean.items():
    uk_demand_clean[key] = dc.add_holiday_column(value, bank_holidays) 

School holiday dates could also be incorporated, but they vary across the country, so they would be more useful for a model of a specific region.
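
For reference, a holiday flag like the one added by `dc.add_holiday_column` can be sketched in a few lines of pandas. This is not the project's actual helper, whose code is not shown here; `flag_bank_holidays` is a hypothetical name, and the sketch assumes the holiday strings use the same `DD-MM-YYYY` format as the `bank_holidays` list.

```python
import pandas as pd

def flag_bank_holidays(df: pd.DataFrame, holidays: list[str]) -> pd.DataFrame:
    """Return a copy of df with a boolean is_bank_holiday column (hypothetical helper)."""
    # Parse the DD-MM-YYYY strings explicitly to avoid day/month ambiguity.
    holiday_dates = pd.to_datetime(holidays, format="%d-%m-%Y")
    return df.assign(is_bank_holiday=df["settlement_date"].isin(holiday_dates))

toy = pd.DataFrame({"settlement_date": pd.to_datetime(["2019-01-01", "2019-01-02"])})
flagged = flag_bank_holidays(toy, ["01-01-2019", "19-04-2019"])
print(flagged)
```

The same pattern would extend to a regional school-holiday flag by passing a different list of dates.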

In [696]:
uk_demand_clean['2019'].assign(
    settlement_date=uk_demand_clean['2019']['settlement_date'].dt.strftime('%Y-%m-%d')
)

Unnamed: 0,settlement_date,settlement_period,nd,tsd,is_bank_holiday
0,2019-01-01,1,23808,25291,True
1,2019-01-01,2,24402,25720,True
2,2019-01-01,3,24147,25495,True
3,2019-01-01,4,23197,24590,True
4,2019-01-01,5,22316,24346,True
...,...,...,...,...,...
17517,2019-12-31,46,27530,28817,False
17518,2019-12-31,47,26765,27941,False
17519,2019-12-31,48,26257,27230,False
17520,2019-03-31,3,25165,26586,False


In [697]:
# Let's check the data for 01 Apr 2024, a bank holiday
uk_demand_clean['2024'][uk_demand_clean['2024']['settlement_date'] == '2024-04-01'].assign(
    settlement_date=lambda d: d['settlement_date'].dt.strftime('%Y-%m-%d')
)

Unnamed: 0,settlement_date,settlement_period,nd,tsd,is_bank_holiday
4366,2024-04-01,1,20610,22969,True
4367,2024-04-01,2,20179,22661,True
4368,2024-04-01,3,20079,22235,True
4369,2024-04-01,4,20504,22695,True
4370,2024-04-01,5,20323,22694,True
4371,2024-04-01,6,20082,22498,True
4372,2024-04-01,7,19849,22160,True
4373,2024-04-01,8,19580,21892,True
4374,2024-04-01,9,19328,21553,True
4375,2024-04-01,10,19315,21456,True


In [698]:
# Merging the dataframes into one
uk_demand_merged = pd.concat(uk_demand_clean.values(), ignore_index=True)

uk_demand_merged.assign(
    settlement_date=uk_demand_merged['settlement_date'].dt.strftime('%Y-%m-%d')
)

Unnamed: 0,settlement_date,settlement_period,nd,tsd,is_bank_holiday
0,2019-01-01,1,23808,25291,True
1,2019-01-01,2,24402,25720,True
2,2019-01-01,3,24147,25495,True
3,2019-01-01,4,23197,24590,True
4,2019-01-01,5,22316,24346,True
...,...,...,...,...,...
109147,2025-03-23,44,27583,29683,False
109148,2025-03-23,45,26280,28380,False
109149,2025-03-23,46,24970,27038,False
109150,2025-03-23,47,23841,26124,False


In [699]:
# Ordering the dataframe by date and settlement period
uk_demand_merged = uk_demand_merged.sort_values(by=["settlement_date", "settlement_period"])
uk_demand_merged = uk_demand_merged.reset_index(drop=True)

In [700]:
# Check the data types, missing values, and duplicates
ck.check(uk_demand_merged)

Number of columns: 5 and rows: 109152

Data types:
settlement_date      datetime64[ns]
settlement_period             int64
nd                            int64
tsd                           int64
is_bank_holiday                bool
dtype: object

Unique values count:
settlement_date       2274
settlement_period       48
nd                   25987
tsd                  25152
is_bank_holiday          2
dtype: int64

These columns appear to be categorical (less than 20 unique values):
Index(['is_bank_holiday'], dtype='object')

Unique value count for categorical columns:

is_bank_holiday
False    106752
True       2400
Name: count, dtype: int64

Count of null values:
settlement_date      0
settlement_period    0
nd                   0
tsd                  0
is_bank_holiday      0
dtype: int64

Count of missing() values:
settlement_date      0
settlement_period    0
nd                   0
tsd                  0
is_bank_holiday      0
dtype: int64

Count of duplicated values:
0


In [701]:
# Saving the data to a CSV file
uk_demand_merged = dc.drop_nan_rows(uk_demand_merged)
uk_demand_merged.to_csv(os.path.join(uk_demand_path, "uk_demand_merged.csv"), index=False)
print("Data saved to uk_demand_merged.csv")

Number of rows dropped due to NaN values: 0
Data saved to uk_demand_merged.csv


#### Demand Data Update

In [702]:
# Load the latest demand data update

file = os.path.join(uk_demand_path, 'demanddataupdate.csv')

# Read the file
df = pd.read_csv(file)
print('Update demand data loaded')

Update demand data loaded


In [703]:
# Snake case columns 
uk_demand_update = dc.snake(df)

In [704]:
uk_demand_update.head()

Unnamed: 0,settlement_date,settlement_period,nd,forecast_actual_indicator,tsd,england_wales_demand,embedded_wind_generation,embedded_wind_capacity,embedded_solar_generation,embedded_solar_capacity,...,ifa_flow,ifa2_flow,britned_flow,moyle_flow,east_west_flow,nemo_flow,nsl_flow,eleclink_flow,viking_flow,greenlink_flow
0,2025-03-01,1,26764,A,28817,24750,1444,6606,0,18720,...,1937,985,-263,0,0,-62,1397,996,-816,-258
1,2025-03-01,2,27403,A,29289,25324,1444,6606,0,18720,...,1943,992,-236,0,0,-50,1397,996,-835,-139
2,2025-03-01,3,26928,A,28662,24944,1445,6606,0,18720,...,1938,983,-3,-35,0,-115,1397,998,-719,-245
3,2025-03-01,4,26300,A,28172,24431,1445,6606,0,18720,...,1938,983,0,-6,0,-117,1397,997,-697,-390
4,2025-03-01,5,25852,A,28331,24083,1423,6606,0,18720,...,1998,991,-110,-118,0,-247,1397,997,-222,-507


In [705]:
uk_demand_merged.tail()

Unnamed: 0,settlement_date,settlement_period,nd,tsd,is_bank_holiday
109147,2025-03-23,44,27583,29683,False
109148,2025-03-23,45,26280,28380,False
109149,2025-03-23,46,24970,27038,False
109150,2025-03-23,47,23841,26124,False
109151,2025-03-23,48,23259,25542,False


In [706]:
# Extracting necessary columns (copy to avoid modifying a slice of uk_demand_update)
uk_demand_update_cleaned = uk_demand_update[["settlement_date", "settlement_period", "nd", "tsd", "forecast_actual_indicator"]].copy()

# Convert to datetime format
uk_demand_update_cleaned = dc.convert_to_datetime(uk_demand_update_cleaned, columns=["settlement_date"])

# Keep only actual values ("A") with TSD above 500
uk_demand_update_cleaned = uk_demand_update_cleaned[(uk_demand_update_cleaned["forecast_actual_indicator"] == "A") & (uk_demand_update_cleaned["tsd"] > 500)]
# Keep only the days after the last date already in uk_demand_merged
uk_demand_update_cleaned = uk_demand_update_cleaned[uk_demand_update_cleaned["settlement_date"] > uk_demand_merged["settlement_date"].max()]
uk_demand_update_cleaned = uk_demand_update_cleaned.drop(columns=["forecast_actual_indicator"])

In [707]:
uk_demand_update_cleaned.head()

Unnamed: 0,settlement_date,settlement_period,nd,tsd
1104,2025-03-24,1,23427,25925
1105,2025-03-24,2,24135,26637
1106,2025-03-24,3,23810,26474
1107,2025-03-24,4,23400,26030
1108,2025-03-24,5,22679,25352


In [708]:
uk_demand_update_cleaned.tail()

Unnamed: 0,settlement_date,settlement_period,nd,tsd
2208,2025-04-16,3,21133,22352
2209,2025-04-16,4,21151,22370
2210,2025-04-16,5,20492,22462
2211,2025-04-16,6,20165,22251
2212,2025-04-16,7,19664,23178


In [709]:
# Let's check for DST changes in the update
uk_demand_update_cleaned_dict = {"update": uk_demand_update_cleaned}
daylight_saving_time = ck.check_settlement_period(uk_demand_update_cleaned_dict)

for key, value in daylight_saving_time.items():
    for period_type, dates in value.items():
        print(f"  {period_type}: {dates}")

  46_periods: DatetimeIndex(['2025-03-30'], dtype='datetime64[ns]', name='settlement_date', freq=None)
  50_periods: DatetimeIndex([], dtype='datetime64[ns]', name='settlement_date', freq=None)


In [710]:
print(daylight_saving_time)

{'update': {'46_periods': DatetimeIndex(['2025-03-30'], dtype='datetime64[ns]', name='settlement_date', freq=None), '50_periods': DatetimeIndex([], dtype='datetime64[ns]', name='settlement_date', freq=None)}}


In [711]:
# Adjust the periods for daylight saving time
for key, value in uk_demand_update_cleaned_dict.items():
    uk_demand_update_cleaned = dc.adjust_dst_periods(value) 

Adjusting periods for date: 2025-03-30 00:00:00
New legth for date 2025-03-30 00:00:00 is 48


In [712]:
# Add holiday column
uk_demand_update_cleaned = dc.add_holiday_column(uk_demand_update_cleaned, bank_holidays) 

In [713]:
uk_demand_update_cleaned.head()

Unnamed: 0,settlement_date,settlement_period,nd,tsd,is_bank_holiday
0,2025-03-24,1,23427,25925,False
1,2025-03-24,2,24135,26637,False
2,2025-03-24,3,23810,26474,False
3,2025-03-24,4,23400,26030,False
4,2025-03-24,5,22679,25352,False


In [714]:
# Append the update, then order the dataframe by date and settlement period
uk_demand_merged_update = pd.concat([uk_demand_merged, uk_demand_update_cleaned], ignore_index=True)
uk_demand_merged_update = uk_demand_merged_update.sort_values(by=["settlement_date", "settlement_period"])
uk_demand_merged_update = uk_demand_merged_update.reset_index(drop=True)
ck.check(uk_demand_merged_update)

Number of columns: 5 and rows: 110263

Data types:
settlement_date      datetime64[ns]
settlement_period             int64
nd                            int64
tsd                           int64
is_bank_holiday                bool
dtype: object

Unique values count:
settlement_date       2298
settlement_period       48
nd                   26012
tsd                  25176
is_bank_holiday          2
dtype: int64

These columns appear to be categorical (less than 20 unique values):
Index(['is_bank_holiday'], dtype='object')

Unique value count for categorical columns:

is_bank_holiday
False    107863
True       2400
Name: count, dtype: int64

Count of null values:
settlement_date      0
settlement_period    0
nd                   0
tsd                  0
is_bank_holiday      0
dtype: int64

Count of missing() values:
settlement_date      0
settlement_period    0
nd                   0
tsd                  0
is_bank_holiday      0
dtype: int64

Count of duplicated values:
0


In [715]:
uk_demand_merged_update.head()

Unnamed: 0,settlement_date,settlement_period,nd,tsd,is_bank_holiday
0,2019-01-01,1,23808,25291,True
1,2019-01-01,2,24402,25720,True
2,2019-01-01,3,24147,25495,True
3,2019-01-01,4,23197,24590,True
4,2019-01-01,5,22316,24346,True


In [716]:
uk_demand_merged_update.tail()

Unnamed: 0,settlement_date,settlement_period,nd,tsd,is_bank_holiday
110258,2025-04-16,3,21133,22352,False
110259,2025-04-16,4,21151,22370,False
110260,2025-04-16,5,20492,22462,False
110261,2025-04-16,6,20165,22251,False
110262,2025-04-16,7,19664,23178,False


In [717]:
# Creating shifted columns
uk_demand_merged_update_shifted = uk_demand_merged_update.copy()

In [718]:
# Adding rolling and lag features
# Window sizes are listed in the original call order so the resulting column order is unchanged
rolling_windows = {"hours": [3, 1, 12, 6], "days": [1, 3, 2, 5], "weeks": [1, 2]}

for window_type, window_sizes in rolling_windows.items():
    for window_size in window_sizes:
        uk_demand_merged_update_shifted = dc.create_rolling_features(
            uk_demand_merged_update_shifted, columns=["nd", "tsd"], type=window_type, window_size=window_size
        )

# Only the 2-week lag is enabled. Lags for hours (3, 1, 12, 6), days (1, 3, 2, 5) and weeks (1)
# are currently disabled; call dc.create_lag_features with those window sizes to restore them.
uk_demand_merged_update_shifted = dc.create_lag_features(uk_demand_merged_update_shifted, columns=['nd', 'tsd'], type='weeks', window_size=2)

uk_demand_merged_update_shifted = uk_demand_merged_update_shifted.dropna()
uk_demand_merged_update_shifted = uk_demand_merged_update_shifted.drop(columns=["tsd"])
uk_demand_merged_update_shifted.head()

Unnamed: 0,settlement_date,settlement_period,nd,is_bank_holiday,nd_rolling_3_hours_for_2_week,nd_rolling_std_3_hours_for_2_week,tsd_rolling_3_hours_for_2_week,tsd_rolling_std_3_hours_for_2_week,nd_rolling_1_hours_for_2_week,nd_rolling_std_1_hours_for_2_week,...,nd_rolling_1_weeks_for_2_week,nd_rolling_std_1_weeks_for_2_week,tsd_rolling_1_weeks_for_2_week,tsd_rolling_std_1_weeks_for_2_week,nd_rolling_2_weeks_for_2_week,nd_rolling_std_2_weeks_for_2_week,tsd_rolling_2_weeks_for_2_week,tsd_rolling_std_2_weeks_for_2_week,nd_lag_2_weeks,tsd_lag_2_weeks
1343,2019-01-28,48,29402,False,31266.833333,3993.244867,32476.166667,3857.737338,27109.5,1038.739862,...,34441.866071,7540.631246,35916.738095,7509.941719,34254.348214,7313.410507,35839.275298,7124.685777,26375.0,27827.0
1344,2019-01-29,1,29041,False,29438.5,3355.971677,30706.0,3293.081718,26155.0,311.126984,...,34445.883929,7535.721617,35920.354167,7505.451409,34257.513393,7309.341824,35842.162202,7120.797253,25935.0,27231.0
1345,2019-01-29,2,29560,False,28091.666667,2461.10444,29388.666667,2497.164045,26135.0,282.842712,...,34449.660714,7531.323251,35923.958333,7501.127611,34260.389881,7305.837058,35844.889881,7117.264295,26335.0,27553.0
1346,2019-01-29,3,29424,False,27070.833333,1581.604428,28402.0,1543.215345,26152.5,258.093975,...,34453.020833,7527.274291,35927.681548,7496.65157,34263.102679,7302.413821,35847.967262,7113.228453,25970.0,27563.0
1347,2019-01-29,4,28844,False,26287.833333,859.88916,27755.833333,601.221562,25619.0,496.38896,...,34456.220238,7523.127655,35932.565476,7490.567257,34266.184524,7298.172404,35852.212798,7107.34809,25268.0,27443.0
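
The column names (for example `nd_rolling_1_hours_for_2_week`) and the first surviving row index (1343 = 671 + 672) suggest that each rolling statistic is computed on the series shifted by two weeks, i.e. 14 × 48 = 672 half-hour periods. The code of `dc.create_rolling_features` and `dc.create_lag_features` is not shown, so the sketch below is only an assumption about how such features could be built; `add_lagged_features` and its output column names are hypothetical.

```python
import pandas as pd

PERIODS_PER_DAY = 48
LAG = 14 * PERIODS_PER_DAY  # two weeks of half-hour periods

def add_lagged_features(df: pd.DataFrame, column: str, window: int) -> pd.DataFrame:
    """Add a 2-week lag plus rolling mean/std ending 2 weeks before each row (hypothetical helper)."""
    lagged = df[column].shift(LAG)
    return df.assign(**{
        f"{column}_lag": lagged,
        f"{column}_roll_mean": lagged.rolling(window).mean(),
        f"{column}_roll_std": lagged.rolling(window).std(),
    })

# Toy series long enough to produce a few non-null rows
toy = pd.DataFrame({"nd": range(LAG + 10)})
features = add_lagged_features(toy, "nd", window=2)  # 2 periods = 1 hour
print(features.dropna().head())
```

Shifting before rolling means every feature uses only values that are at least two weeks old, which avoids leaking recent demand into a forecast made two weeks ahead. It also explains why the first rows are dropped by `dropna()`.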


In [719]:
ck.check(uk_demand_merged_update_shifted)

Number of columns: 46 and rows: 108920

Data types:
settlement_date                        datetime64[ns]
settlement_period                               int64
nd                                              int64
is_bank_holiday                                  bool
nd_rolling_3_hours_for_2_week                 float64
nd_rolling_std_3_hours_for_2_week             float64
tsd_rolling_3_hours_for_2_week                float64
tsd_rolling_std_3_hours_for_2_week            float64
nd_rolling_1_hours_for_2_week                 float64
nd_rolling_std_1_hours_for_2_week             float64
tsd_rolling_1_hours_for_2_week                float64
tsd_rolling_std_1_hours_for_2_week            float64
nd_rolling_12_hours_for_2_week                float64
nd_rolling_std_12_hours_for_2_week            float64
tsd_rolling_12_hours_for_2_week               float64
tsd_rolling_std_12_hours_for_2_week           float64
nd_rolling_6_hours_for_2_week                 float64
nd_rolling_std_6_hours_for_2_w

In [720]:
uk_demand_merged_update_shifted.tail(20)

Unnamed: 0,settlement_date,settlement_period,nd,is_bank_holiday,nd_rolling_3_hours_for_2_week,nd_rolling_std_3_hours_for_2_week,tsd_rolling_3_hours_for_2_week,tsd_rolling_std_3_hours_for_2_week,nd_rolling_1_hours_for_2_week,nd_rolling_std_1_hours_for_2_week,...,nd_rolling_1_weeks_for_2_week,nd_rolling_std_1_weeks_for_2_week,tsd_rolling_1_weeks_for_2_week,tsd_rolling_std_1_weeks_for_2_week,nd_rolling_2_weeks_for_2_week,nd_rolling_std_2_weeks_for_2_week,tsd_rolling_2_weeks_for_2_week,tsd_rolling_std_2_weeks_for_2_week,nd_lag_2_weeks,tsd_lag_2_weeks
110243,2025-04-15,36,31947,False,22300.5,3271.108971,23360.833333,3230.474294,25912.0,1350.573952,...,24638.327381,4466.421747,26937.020833,4515.708146,25983.0625,4538.264288,28187.18006,4561.142893,26867.0,27946.0
110244,2025-04-15,37,31876,False,23955.0,3465.476187,24943.833333,3352.063389,27644.0,1098.843938,...,24615.21131,4426.387432,26908.604167,4471.543087,25970.623512,4520.045795,28171.752976,4540.378634,28421.0,29112.0
110245,2025-04-15,38,32052,False,25717.666667,3344.759702,26602.666667,3244.17778,29175.5,1067.024133,...,24596.449405,4390.20479,26884.502976,4429.593409,25959.56994,4501.200105,28157.38244,4517.387358,29930.0,30571.0
110246,2025-04-15,39,32267,False,27411.333333,3030.759553,28229.0,2861.349891,30541.5,864.791593,...,24581.252976,4358.437191,26864.657738,4392.793989,25949.755952,4482.610598,28145.453869,4497.370008,31153.0,31696.0
110247,2025-04-15,40,31980,False,28793.666667,2545.584543,29578.833333,2339.057196,31293.5,198.697006,...,24569.127976,4333.54444,26848.303571,4362.796557,25941.416667,4467.137725,28135.172619,4480.328857,31434.0,32115.0
110248,2025-04-15,41,31690,False,29949.0,1962.259412,30689.333333,1849.349904,31661.5,321.733585,...,24560.970238,4317.097389,26836.297619,4341.018218,25934.946429,4455.336546,28126.924107,4466.781468,31889.0,32696.0
110249,2025-04-15,42,31345,False,30661.0,1274.991922,31375.333333,1314.171628,31514.0,530.330086,...,24553.651786,4303.795644,26825.446429,4323.296289,25929.212798,4446.133267,28119.651786,4456.363018,31139.0,32062.0
110250,2025-04-15,43,29582,False,30886.166667,848.120609,31636.5,847.135113,30455.5,966.61497,...,24546.642857,4293.330171,26814.886905,4309.475098,25923.424107,4438.57637,28112.363095,4448.145764,29772.0,30679.0
110251,2025-04-15,44,28128,False,30646.5,1270.407927,31434.0,1217.172297,29132.0,905.09668,...,24541.0625,4286.956248,26805.625,4300.624244,25918.401786,4433.747902,28105.977679,4443.266062,28492.0,29356.0
110252,2025-04-15,45,26445,False,29887.5,2036.610395,30702.166667,2057.865828,27545.5,1338.553137,...,24534.127976,4281.71848,26794.535714,4294.495902,25912.973214,4430.672369,28099.247024,4441.044706,26599.0,27305.0


In [721]:
null_dem = pd.DataFrame(uk_demand_merged_update_shifted.isnull().sum())
null_dem.head(50)

Unnamed: 0,0
settlement_date,0
settlement_period,0
nd,0
is_bank_holiday,0
nd_rolling_3_hours_for_2_week,0
nd_rolling_std_3_hours_for_2_week,0
tsd_rolling_3_hours_for_2_week,0
tsd_rolling_std_3_hours_for_2_week,0
nd_rolling_1_hours_for_2_week,0
nd_rolling_std_1_hours_for_2_week,0


In [722]:
# Saving the data to a CSV file
uk_demand_merged_update = dc.drop_nan_rows(uk_demand_merged_update)
uk_demand_merged_update.to_csv(os.path.join(uk_demand_path, "uk_demand_merged_update.csv"), index=False)
print("Data saved to uk_demand_merged_update.csv")

Number of rows dropped due to NaN values: 0
Data saved to uk_demand_merged_update.csv


### Carbon Intensity and Generation Mix

In [723]:
carbon_path = os.path.join(file_path, "carbon_intensity")

#### UK

In [724]:
# Exploring the data
uk_carbon_path = os.path.join(carbon_path, "uk")

# Read the files in the directory
file_list = glob.glob(os.path.join(uk_carbon_path, 'df_fuel*.csv'))

# Initialize an empty dictionary to store the dataframes
carbon = {}

# Loop through the files and read them into a dictionary
for file in file_list:

    # Read the file (a single df_fuel*.csv is expected; a later match would overwrite an earlier one)
    df = pd.read_csv(file)
    carbon['uk'] = df

    print('carbon intensity loaded')

carbon intensity loaded


In [725]:
carbon['uk'].head()

Unnamed: 0,DATETIME,GAS,COAL,NUCLEAR,WIND,WIND_EMB,HYDRO,IMPORTS,BIOMASS,OTHER,...,IMPORTS_perc,BIOMASS_perc,OTHER_perc,SOLAR_perc,STORAGE_perc,GENERATION_perc,LOW_CARBON_perc,ZERO_CARBON_perc,RENEWABLE_perc,FOSSIL_perc
0,2009-01-01T00:00:00,8369.0,15037.0,7099.0,237.0,59.0,246,2524.0,0.0,0.0,...,7.5,0.0,0.0,0.0,0.0,100.0,22.8,24.5,1.6,69.7
1,2009-01-01T00:30:00,8501.0,15095.0,7087.0,218.0,54.0,245,2502.0,0.0,0.0,...,7.4,0.0,0.0,0.0,0.0,100.0,22.6,24.2,1.5,70.0
2,2009-01-01T01:00:00,8475.0,15087.0,7074.0,196.0,49.0,246,2472.0,0.0,0.0,...,7.4,0.0,0.0,0.0,0.0,100.0,22.5,24.2,1.5,70.1
3,2009-01-01T01:30:00,8320.0,15031.0,7064.0,181.0,45.0,246,2445.0,0.0,0.0,...,7.3,0.0,0.0,0.0,0.0,100.0,22.6,24.3,1.4,70.1
4,2009-01-01T02:00:00,8295.0,14999.0,7052.0,167.0,42.0,246,2370.0,0.0,0.0,...,7.1,0.0,0.0,0.0,0.0,100.0,22.6,24.3,1.4,70.2


##### Dataset Column Descriptions

1. **DATETIME**: 
   - The date and time of each record, in ISO 8601 format (`YYYY-MM-DDTHH:MM:SS`) at half-hourly intervals. This column identifies the half-hour that each row covers.

2. **GAS**: 
   - The amount of electricity generated from gas-powered sources, usually measured in megawatt-hours (MWh).

3. **COAL**: 
   - The amount of electricity generated from coal-powered sources, typically measured in megawatt-hours (MWh).

4. **NUCLEAR**: 
   - The amount of electricity generated from nuclear energy, typically measured in megawatt-hours (MWh).

5. **WIND**: 
   - The amount of electricity generated from wind power, typically measured in megawatt-hours (MWh).

6. **WIND_EMB**: 
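   - Estimated electricity generated by embedded wind farms, which are connected to local distribution networks rather than the transmission grid. Their output reduces the demand seen on the transmission system but is hard to measure directly, so this is a modeled estimate.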

7. **HYDRO**: 
   - The amount of electricity generated from hydroelectric power, typically measured in megawatt-hours (MWh).

8. **IMPORTS**: 
   - The amount of electricity imported from other regions or countries, typically measured in megawatt-hours (MWh).

9. **BIOMASS**: 
   - The amount of electricity generated from biomass, typically measured in megawatt-hours (MWh).

10. **OTHER**: 
   - The amount of electricity generated from other non-renewable or non-typical sources, typically measured in megawatt-hours (MWh).

11. **SOLAR**: 
   - The amount of electricity generated from solar power, typically measured in megawatt-hours (MWh).

12. **STORAGE**: 
   - The amount of electricity supplied from storage technologies (such as pumped hydro or batteries), typically measured in megawatt-hours (MWh).

13. **GENERATION**: 
   - The total electricity generation across all sources in the dataset, typically measured in megawatt-hours (MWh).

14. **CARBON_INTENSITY**: 
   - The amount of carbon emissions associated with the electricity generated, typically measured in grams of CO2 per kilowatt-hour (gCO2/kWh). This helps assess the environmental impact of electricity generation.

15. **LOW_CARBON**: 
   - The amount of electricity generated from low-carbon sources, typically measured in megawatt-hours (MWh). In the sample rows, this matches wind, solar, and hydro plus nuclear and biomass.

16. **ZERO_CARBON**: 
   - The amount of electricity generated from zero-carbon sources (such as wind, solar, hydro, and nuclear), typically measured in megawatt-hours (MWh).

17. **RENEWABLE**: 
   - The amount of electricity generated from renewable sources (such as wind, solar, and hydro), typically measured in megawatt-hours (MWh). In the sample rows, biomass is not counted here.

18. **FOSSIL**: 
   - The amount of electricity generated from fossil fuels (such as gas, coal, and oil), typically measured in megawatt-hours (MWh).

19. **GAS_perc**: 
   - The percentage of total electricity generation that comes from gas-powered sources. This is calculated as `(GAS / GENERATION) * 100`.

20. **COAL_perc**: 
   - The percentage of total electricity generation that comes from coal-powered sources. This is calculated as `(COAL / GENERATION) * 100`.

21. **NUCLEAR_perc**: 
   - The percentage of total electricity generation that comes from nuclear power. This is calculated as `(NUCLEAR / GENERATION) * 100`.

22. **WIND_perc**: 
   - The percentage of total electricity generation that comes from wind power. This is calculated as `(WIND / GENERATION) * 100`.

23. **WIND_EMB_perc**: 
   - The percentage of total electricity generation that comes from embedded wind farms. This is calculated as `(WIND_EMB / GENERATION) * 100`.

24. **HYDRO_perc**: 
   - The percentage of total electricity generation that comes from hydroelectric power. This is calculated as `(HYDRO / GENERATION) * 100`.

25. **IMPORTS_perc**: 
   - The percentage of total electricity generation that comes from imported electricity. This is calculated as `(IMPORTS / GENERATION) * 100`.

26. **BIOMASS_perc**: 
   - The percentage of total electricity generation that comes from biomass. This is calculated as `(BIOMASS / GENERATION) * 100`.

27. **OTHER_perc**: 
   - The percentage of total electricity generation that comes from other non-renewable or non-typical sources. This is calculated as `(OTHER / GENERATION) * 100`.

28. **SOLAR_perc**: 
   - The percentage of total electricity generation that comes from solar power. This is calculated as `(SOLAR / GENERATION) * 100`.

29. **STORAGE_perc**: 
   - The percentage of total electricity generation that comes from electricity storage. This is calculated as `(STORAGE / GENERATION) * 100`.

30. **GENERATION_perc**: 
   - Total generation expressed as a percentage of itself. It is `100.0` in every sample row, so the column carries no information and is dropped during cleaning.

31. **LOW_CARBON_perc**: 
   - The percentage of total electricity generation that comes from low-carbon sources. This is calculated as `(LOW_CARBON / GENERATION) * 100`.

32. **ZERO_CARBON_perc**: 
   - The percentage of total electricity generation that comes from zero-carbon sources. This is calculated as `(ZERO_CARBON / GENERATION) * 100`.

33. **RENEWABLE_perc**: 
   - The percentage of total electricity generation that comes from renewable sources. This is calculated as `(RENEWABLE / GENERATION) * 100`.

34. **FOSSIL_perc**: 
   - The percentage of total electricity generation that comes from fossil fuel sources. This is calculated as `(FOSSIL / GENERATION) * 100`.
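
The `*_perc` formulas above can be checked with a small sketch. The `GAS` and `COAL` values come from the first two sample rows; the `GENERATION` totals are illustrative assumptions because that column is hidden in the truncated preview, and `add_share` is a hypothetical helper, not part of `utils`.

```python
import pandas as pd

# GAS and COAL are copied from the sample rows; GENERATION is an assumed total.
sample = pd.DataFrame(
    {
        "GAS": [8369.0, 8501.0],
        "COAL": [15037.0, 15095.0],
        "GENERATION": [33650.0, 33800.0],
    }
)


def add_share(frame: pd.DataFrame, source: str, total: str = "GENERATION") -> pd.DataFrame:
    """Return a copy with `<source>_perc_check` = source / total * 100 (1 decimal place)."""
    out = frame.copy()
    out[f"{source}_perc_check"] = (out[source] / out[total] * 100).round(1)
    return out


checked = add_share(add_share(sample, "GAS"), "COAL")
print(checked)
```

Comparing a `*_perc_check` column against the published `*_perc` column is a quick way to confirm which total the shares are based on.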


In [726]:
# Snake case columns
for key, value in carbon.items():
    # Rename the columns
    carbon[key] = dc.snake(value)

In [727]:
# Convert the datetime column to a datetime dtype
for key, value in carbon.items():
    # Parse the datetime column
    carbon[key] = dc.convert_to_datetime(value, columns=["datetime"])
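
The implementations of `dc.snake` and `dc.convert_to_datetime` are not shown in this notebook. A minimal sketch of what the two steps might look like, assuming plain pandas and a hypothetical `to_snake` helper:

```python
import re

import pandas as pd


def to_snake(name: str) -> str:
    """Lower-case a column name and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


# Column names follow the raw file; the values are a two-row sample.
raw = pd.DataFrame(
    {
        "DATETIME": ["2009-01-01T00:00:00", "2009-01-01T00:30:00"],
        "WIND_EMB": [59.0, 54.0],
        "GAS_perc": [24.9, 25.2],
    }
)

clean = raw.rename(columns=to_snake)                    # DATETIME -> datetime, GAS_perc -> gas_perc
clean["datetime"] = pd.to_datetime(clean["datetime"])   # ISO 8601 strings -> timestamps
print(clean.dtypes)
```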

In [728]:
ck.check(carbon['uk'])

Number of columns: 34 and rows: 285570

Data types:
datetime            datetime64[ns]
gas                        float64
coal                       float64
nuclear                    float64
wind                       float64
wind_emb                   float64
hydro                        int64
imports                    float64
biomass                    float64
other                      float64
solar                      float64
storage                      int64
generation                 float64
carbon_intensity           float64
low_carbon                 float64
zero_carbon                float64
renewable                  float64
fossil                     float64
gas_perc                   float64
coal_perc                  float64
nuclear_perc               float64
wind_perc                  float64
wind_emb_perc              float64
hydro_perc                 float64
imports_perc               float64
biomass_perc               float64
other_perc                 float64
sol

In [729]:
carbon['uk'].head()

Unnamed: 0,datetime,gas,coal,nuclear,wind,wind_emb,hydro,imports,biomass,other,...,imports_perc,biomass_perc,other_perc,solar_perc,storage_perc,generation_perc,low_carbon_perc,zero_carbon_perc,renewable_perc,fossil_perc
0,2009-01-01 00:00:00,8369.0,15037.0,7099.0,237.0,59.0,246,2524.0,0.0,0.0,...,7.5,0.0,0.0,0.0,0.0,100.0,22.8,24.5,1.6,69.7
1,2009-01-01 00:30:00,8501.0,15095.0,7087.0,218.0,54.0,245,2502.0,0.0,0.0,...,7.4,0.0,0.0,0.0,0.0,100.0,22.6,24.2,1.5,70.0
2,2009-01-01 01:00:00,8475.0,15087.0,7074.0,196.0,49.0,246,2472.0,0.0,0.0,...,7.4,0.0,0.0,0.0,0.0,100.0,22.5,24.2,1.5,70.1
3,2009-01-01 01:30:00,8320.0,15031.0,7064.0,181.0,45.0,246,2445.0,0.0,0.0,...,7.3,0.0,0.0,0.0,0.0,100.0,22.6,24.3,1.4,70.1
4,2009-01-01 02:00:00,8295.0,14999.0,7052.0,167.0,42.0,246,2370.0,0.0,0.0,...,7.1,0.0,0.0,0.0,0.0,100.0,22.6,24.3,1.4,70.2


In [730]:
# Dropping the generation_perc column
carbon['uk'] = carbon['uk'].drop(columns=["generation_perc"])

In [731]:
carbon['uk'].head()

Unnamed: 0,datetime,gas,coal,nuclear,wind,wind_emb,hydro,imports,biomass,other,...,hydro_perc,imports_perc,biomass_perc,other_perc,solar_perc,storage_perc,low_carbon_perc,zero_carbon_perc,renewable_perc,fossil_perc
0,2009-01-01 00:00:00,8369.0,15037.0,7099.0,237.0,59.0,246,2524.0,0.0,0.0,...,0.7,7.5,0.0,0.0,0.0,0.0,22.8,24.5,1.6,69.7
1,2009-01-01 00:30:00,8501.0,15095.0,7087.0,218.0,54.0,245,2502.0,0.0,0.0,...,0.7,7.4,0.0,0.0,0.0,0.0,22.6,24.2,1.5,70.0
2,2009-01-01 01:00:00,8475.0,15087.0,7074.0,196.0,49.0,246,2472.0,0.0,0.0,...,0.7,7.4,0.0,0.0,0.0,0.0,22.5,24.2,1.5,70.1
3,2009-01-01 01:30:00,8320.0,15031.0,7064.0,181.0,45.0,246,2445.0,0.0,0.0,...,0.7,7.3,0.0,0.0,0.0,0.0,22.6,24.3,1.4,70.1
4,2009-01-01 02:00:00,8295.0,14999.0,7052.0,167.0,42.0,246,2370.0,0.0,0.0,...,0.7,7.1,0.0,0.0,0.0,0.0,22.6,24.3,1.4,70.2


In [732]:
# Extracting settlement period and settlement date from datetime
carbon['uk'] = dc.extract_settlement_period_and_date(carbon['uk'], "datetime")
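
`dc.extract_settlement_period_and_date` is not shown here. Judging by its output (00:00 maps to period 1, 00:30 to period 2), it probably numbers the half-hours of each day from 1 to 48. A sketch under that assumption, with `split_settlement` as a hypothetical stand-in:

```python
import pandas as pd


def split_settlement(frame: pd.DataFrame, column: str = "datetime") -> pd.DataFrame:
    """Replace a half-hourly timestamp with settlement_date and settlement_period (1-48)."""
    out = frame.copy()
    ts = out[column]
    out["settlement_date"] = ts.dt.normalize()                          # midnight of the same day
    out["settlement_period"] = ts.dt.hour * 2 + ts.dt.minute // 30 + 1  # 00:00 -> 1, 23:30 -> 48
    return out.drop(columns=[column])


# The first two rows mirror the sample data; the third is illustrative.
sample = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2009-01-01 00:00:00", "2009-01-01 00:30:00", "2009-01-01 23:30:00"]
        ),
        "gas": [8369.0, 8501.0, 8000.0],
    }
)

result = split_settlement(sample)
print(result)
```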

In [733]:
carbon['uk'].head()

Unnamed: 0,gas,coal,nuclear,wind,wind_emb,hydro,imports,biomass,other,solar,...,biomass_perc,other_perc,solar_perc,storage_perc,low_carbon_perc,zero_carbon_perc,renewable_perc,fossil_perc,settlement_date,settlement_period
0,8369.0,15037.0,7099.0,237.0,59.0,246,2524.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,22.8,24.5,1.6,69.7,2009-01-01,1
1,8501.0,15095.0,7087.0,218.0,54.0,245,2502.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,22.6,24.2,1.5,70.0,2009-01-01,2
2,8475.0,15087.0,7074.0,196.0,49.0,246,2472.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,22.5,24.2,1.5,70.1,2009-01-01,3
3,8320.0,15031.0,7064.0,181.0,45.0,246,2445.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,22.6,24.3,1.4,70.1,2009-01-01,4
4,8295.0,14999.0,7052.0,167.0,42.0,246,2370.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,22.6,24.3,1.4,70.2,2009-01-01,5


In [734]:
carbon['uk'] = dc.create_carbon_columns(carbon['uk'])
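
`dc.create_carbon_columns` adds four columns: `low_vs_fossil`, `zero_vs_fossil`, `renewable_vs_fossil`, and `green_score`. The three ratio columns appear to be each source divided by `fossil`; the sketch below reproduces them for one row of raw values that appears later in this notebook. The zero-fossil handling is an assumption, `add_fossil_ratios` is a hypothetical helper, and `green_score` is left out because its formula cannot be inferred from the output.

```python
import numpy as np
import pandas as pd

# New column name -> numerator column (the denominator is always `fossil`).
RATIO_SOURCES = {
    "low_vs_fossil": "low_carbon",
    "zero_vs_fossil": "zero_carbon",
    "renewable_vs_fossil": "renewable",
}


def add_fossil_ratios(frame: pd.DataFrame) -> pd.DataFrame:
    """Add source-to-fossil ratio columns; rows with zero fossil output become NaN."""
    out = frame.copy()
    fossil = out["fossil"].replace(0, np.nan)  # assumption: avoid division by zero
    for new_col, source in RATIO_SOURCES.items():
        out[new_col] = out[source] / fossil
    return out


# First row: values from the notebook; second row: illustrative zero-fossil case.
sample = pd.DataFrame(
    {
        "low_carbon": [7245.0, 100.0],
        "zero_carbon": [7043.0, 90.0],
        "renewable": [1558.0, 80.0],
        "fossil": [32907.0, 0.0],
    }
)

ratios = add_fossil_ratios(sample)
print(ratios.round(6))
```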

In [735]:
ck.check(carbon['uk'])

Number of columns: 38 and rows: 285570

Data types:
gas                           float64
coal                          float64
nuclear                       float64
wind                          float64
wind_emb                      float64
hydro                           int64
imports                       float64
biomass                       float64
other                         float64
solar                         float64
storage                         int64
generation                    float64
carbon_intensity              float64
low_carbon                    float64
zero_carbon                   float64
renewable                     float64
fossil                        float64
gas_perc                      float64
coal_perc                     float64
nuclear_perc                  float64
wind_perc                     float64
wind_emb_perc                 float64
hydro_perc                    float64
imports_perc                  float64
biomass_perc                  float6

In [736]:
carbon['uk'].tail()

Unnamed: 0,gas,coal,nuclear,wind,wind_emb,hydro,imports,biomass,other,solar,...,low_carbon_perc,zero_carbon_perc,renewable_perc,fossil_perc,settlement_date,settlement_period,low_vs_fossil,zero_vs_fossil,renewable_vs_fossil,green_score
285565,4952.0,0.0,4417.0,13805.0,3656.0,191,4080.0,1152.0,1100.0,1151.0,...,70.6,71.9,54.5,14.4,2025-04-16,14,4.921648,3.718296,3.797052,210.647887
285566,5400.0,0.0,4415.0,14010.0,3760.0,198,4796.0,1154.0,906.0,1800.0,...,69.5,71.4,54.2,14.8,2025-04-16,15,4.692037,3.448704,3.660741,216.575342
285567,5546.0,0.0,4410.0,13745.0,3810.0,192,4872.0,1163.0,894.0,2632.0,...,69.6,70.7,54.7,14.9,2025-04-16,16,4.679409,3.30815,3.67454,224.342466
285568,5540.0,0.0,4414.0,13574.0,3841.0,142,5134.0,1160.0,677.0,3448.0,...,70.1,71.1,55.4,14.6,2025-04-16,17,4.797653,3.272563,3.791516,243.171429
285569,4926.0,0.0,4409.0,13699.0,3907.0,141,5152.0,1155.0,670.0,4257.0,...,71.9,73.0,57.4,12.9,2025-04-16,18,5.596427,3.704629,4.46691,285.015873


In [737]:
# Saving the data to a CSV file
carbon['uk'].to_csv(os.path.join(uk_carbon_path, "carbon_and_mix.csv"), index=False)
print("Data saved to carbon_and_mix.csv")

Data saved to carbon_and_mix.csv


In [738]:
carbon_uk_shifted = carbon['uk'].copy()

# Columns that receive rolling and lag features
feature_cols = ["gas", "coal", "nuclear", "wind", "wind_emb", "hydro", "imports", "biomass", "other", "solar", "storage", "generation", "carbon_intensity", "low_carbon", "zero_carbon", "renewable", "fossil", "low_vs_fossil", "zero_vs_fossil", "renewable_vs_fossil", "green_score"]

# Rolling windows, in the order the features are created
rolling_windows = [
    ("hours", 3), ("hours", 1), ("hours", 12), ("hours", 6),
    ("days", 1), ("days", 3), ("days", 2), ("days", 5),
    ("weeks", 1), ("weeks", 2),
]

# Adding rolling features
for window_type, window_size in rolling_windows:
    carbon_uk_shifted = dc.create_rolling_features(carbon_uk_shifted, columns=feature_cols, type=window_type, window_size=window_size)

# Adding lag features: only the 2-week lag is enabled; the shorter lags
# (1, 3, 6, 12 hours; 1, 2, 3, 5 days; 1 week) are currently disabled
carbon_uk_shifted = dc.create_lag_features(carbon_uk_shifted, columns=feature_cols, type='weeks', window_size=2)
carbon_uk_shifted = carbon_uk_shifted.dropna()

# Keep wind, solar and carbon_intensity as raw columns; drop the other raw inputs and all percentage columns
keep_raw = {"wind", "solar", "carbon_intensity"}
carbon_uk_shifted = carbon_uk_shifted.drop(columns=[col for col in feature_cols if col not in keep_raw])
carbon_uk_shifted = carbon_uk_shifted.drop(columns=[col for col in carbon_uk_shifted.columns if 'perc' in col])
carbon_uk_shifted.head()

Unnamed: 0,wind,solar,carbon_intensity,settlement_date,settlement_period,gas_rolling_3_hours_for_2_week,gas_rolling_std_3_hours_for_2_week,coal_rolling_3_hours_for_2_week,coal_rolling_std_3_hours_for_2_week,nuclear_rolling_3_hours_for_2_week,...,generation_lag_2_weeks,carbon_intensity_lag_2_weeks,low_carbon_lag_2_weeks,zero_carbon_lag_2_weeks,renewable_lag_2_weeks,fossil_lag_2_weeks,low_vs_fossil_lag_2_weeks,zero_vs_fossil_lag_2_weeks,renewable_vs_fossil_lag_2_weeks,green_score_lag_2_weeks
1343,263.0,0.0,603.0,2009-01-28,48,14130.5,2303.563131,22147.166667,740.779432,5802.333333,...,40152.0,607.0,7245.0,7043.0,1558.0,32907.0,0.220166,0.214027,0.047346,1.331137
1344,252.0,0.0,600.0,2009-01-29,1,13166.833333,1574.608068,21855.333333,715.162825,5769.5,...,40316.0,609.0,7156.0,6960.0,1532.0,33159.0,0.215809,0.209898,0.046202,1.29064
1345,277.0,0.0,595.0,2009-01-29,2,12549.0,867.494323,21572.5,603.689573,5725.833333,...,40364.0,608.0,7113.0,6913.0,1550.0,33251.0,0.213918,0.207904,0.046615,1.317434
1346,288.0,0.0,596.0,2009-01-29,3,12277.333333,400.798037,21307.5,456.750588,5682.666667,...,40472.0,605.0,7076.0,6882.0,1510.0,33396.0,0.211882,0.206073,0.045215,1.279339
1347,305.0,0.0,600.0,2009-01-29,4,12172.166667,222.247085,21080.666667,313.728333,5625.833333,...,40071.0,606.0,7005.0,6811.0,1512.0,33066.0,0.211849,0.205982,0.045727,1.280528
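
The `_for_2_week` suffix and the first surviving index (1343 = 672 + 672 - 1) suggest that each rolling feature is computed on values shifted back by two weeks (672 half-hour periods). The demand preview supports this: `nd_rolling_1_hours_for_2_week` of 29175.5 is the mean of two consecutive `nd_lag_2_weeks` values, 28421 and 29930. The sketch below illustrates that assumed behaviour on a synthetic series; `rolling_and_lag` is a hypothetical stand-in, not the code in `dc.create_rolling_features` or `dc.create_lag_features`.

```python
import pandas as pd

PERIODS_PER_HOUR = 2                       # half-hourly settlement periods
HORIZON = 14 * 24 * PERIODS_PER_HOUR       # 2 weeks = 672 periods


def rolling_and_lag(series: pd.Series, window_hours: int, horizon: int = HORIZON) -> pd.DataFrame:
    """Lag a series by `horizon` periods, then take rolling mean/std over the lagged values."""
    window = window_hours * PERIODS_PER_HOUR
    lagged = series.shift(horizon)
    return pd.DataFrame(
        {
            "lag": lagged,
            "rolling_mean": lagged.rolling(window).mean(),
            "rolling_std": lagged.rolling(window).std(),  # sample std (ddof=1), the pandas default
        }
    )


# Synthetic ramp 0, 1, 2, ... just long enough to produce a few valid rows.
values = pd.Series(range(HORIZON + 10), dtype="float64")
features = rolling_and_lag(values, window_hours=1)
print(features.iloc[HORIZON - 1 : HORIZON + 3])
```

Shifting before rolling means every feature at time t only uses data that is at least two weeks old, which is what a two-week-ahead forecast would have available.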


In [739]:
ck.check(carbon_uk_shifted)

Number of columns: 446 and rows: 284227

Data types:
wind                                      float64
solar                                     float64
carbon_intensity                          float64
settlement_date                    datetime64[ns]
settlement_period                           int32
                                        ...      
fossil_lag_2_weeks                        float64
low_vs_fossil_lag_2_weeks                 float64
zero_vs_fossil_lag_2_weeks                float64
renewable_vs_fossil_lag_2_weeks           float64
green_score_lag_2_weeks                   float64
Length: 446, dtype: object

Unique values count:
wind                                16062
solar                                9352
carbon_intensity                      616
settlement_date                      5923
settlement_period                      48
                                    ...  
fossil_lag_2_weeks                  39875
low_vs_fossil_lag_2_weeks          283860
zero_vs_foss

In [740]:
null_gen = pd.DataFrame(carbon_uk_shifted.isnull().sum())
null_gen.head(50)

Unnamed: 0,0
wind,0
solar,0
carbon_intensity,0
settlement_date,0
settlement_period,0
gas_rolling_3_hours_for_2_week,0
gas_rolling_std_3_hours_for_2_week,0
coal_rolling_3_hours_for_2_week,0
coal_rolling_std_3_hours_for_2_week,0
nuclear_rolling_3_hours_for_2_week,0


In [741]:
data_uk_mergedshifted = pd.merge(uk_demand_merged_update_shifted, carbon_uk_shifted, how="left", on=["settlement_date", "settlement_period"])
data_uk_mergedshifted = data_uk_mergedshifted.sort_values(by=["settlement_date", "settlement_period"])
data_uk_mergedshifted = data_uk_mergedshifted.reset_index(drop=True)
data_uk_mergedshifted.head()

Unnamed: 0,settlement_date,settlement_period,nd,is_bank_holiday,nd_rolling_3_hours_for_2_week,nd_rolling_std_3_hours_for_2_week,tsd_rolling_3_hours_for_2_week,tsd_rolling_std_3_hours_for_2_week,nd_rolling_1_hours_for_2_week,nd_rolling_std_1_hours_for_2_week,...,generation_lag_2_weeks,carbon_intensity_lag_2_weeks,low_carbon_lag_2_weeks,zero_carbon_lag_2_weeks,renewable_lag_2_weeks,fossil_lag_2_weeks,low_vs_fossil_lag_2_weeks,zero_vs_fossil_lag_2_weeks,renewable_vs_fossil_lag_2_weeks,green_score_lag_2_weeks
0,2019-01-28,48,29402,False,31266.833333,3993.244867,32476.166667,3857.737338,27109.5,1038.739862,...,28716.0,135.0,18664.0,15446.0,12171.0,8307.0,2.24678,1.859396,1.46515,73.755556
1,2019-01-29,1,29041,False,29438.5,3355.971677,30706.0,3293.081718,26155.0,311.126984,...,28488.0,131.0,18804.0,15568.0,12313.0,8081.0,2.32694,1.926494,1.523698,76.916031
2,2019-01-29,2,29560,False,28091.666667,2461.10444,29388.666667,2497.164045,26135.0,282.842712,...,28817.0,135.0,18763.0,15459.0,12255.0,8505.0,2.206114,1.817637,1.440917,73.785185
3,2019-01-29,3,29424,False,27070.833333,1581.604428,28402.0,1543.215345,26152.5,258.093975,...,28835.0,138.0,18623.0,15258.0,12106.0,8842.0,2.106198,1.725628,1.369147,70.630435
4,2019-01-29,4,28844,False,26287.833333,859.88916,27755.833333,601.221562,25619.0,496.38896,...,28693.0,139.0,18497.0,15091.0,11972.0,8877.0,2.083699,1.700011,1.348654,68.985612
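
A left merge keeps every demand row even when no carbon row matches, so it can be worth checking for unmatched or duplicated keys. The sketch below uses the `validate` and `indicator` options of `pd.merge` on toy frames; the frame names and the missing third carbon row are illustrative assumptions.

```python
import pandas as pd

keys = ["settlement_date", "settlement_period"]

# Toy demand frame: three settlement periods.
demand_sample = pd.DataFrame(
    {
        "settlement_date": pd.to_datetime(["2019-01-28", "2019-01-29", "2019-01-29"]),
        "settlement_period": [48, 1, 2],
        "nd": [29402, 29041, 29560],
    }
)

# Toy carbon frame: the third period is deliberately missing.
carbon_sample = pd.DataFrame(
    {
        "settlement_date": pd.to_datetime(["2019-01-28", "2019-01-29"]),
        "settlement_period": [48, 1],
        "carbon_intensity_lag_2_weeks": [135.0, 131.0],
    }
)

merged = pd.merge(
    demand_sample,
    carbon_sample,
    how="left",
    on=keys,
    validate="one_to_one",  # raises MergeError if a key is duplicated on either side
    indicator=True,         # adds a _merge column: 'both' or 'left_only'
)

unmatched = merged[merged["_merge"] == "left_only"]
print(f"Unmatched demand rows: {len(unmatched)}")
```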


In [742]:
data_uk_merged = pd.merge(uk_demand_merged_update, carbon['uk'], how="left", on=["settlement_date", "settlement_period"])
data_uk_merged = data_uk_merged.sort_values(by=["settlement_date", "settlement_period"])
data_uk_merged = data_uk_merged.reset_index(drop=True)
data_uk_merged.head()

Unnamed: 0,settlement_date,settlement_period,nd,tsd,is_bank_holiday,gas,coal,nuclear,wind,wind_emb,...,solar_perc,storage_perc,low_carbon_perc,zero_carbon_perc,renewable_perc,fossil_perc,low_vs_fossil,zero_vs_fossil,renewable_vs_fossil,green_score
0,2019-01-01,1,23808,25291,True,5853.0,0.0,6924.0,9236.0,2031.0,...,0.0,0.0,72.0,70.4,42.8,21.4,3.356911,2.830173,1.994191,98.255319
1,2019-01-01,2,24402,25720,True,6292.0,0.0,6838.0,9297.0,1997.0,...,0.0,0.1,70.7,69.1,42.3,22.7,3.112524,2.629847,1.860458,93.909091
2,2019-01-01,3,24147,25495,True,5719.0,0.0,6834.0,9356.0,1959.0,...,0.0,0.0,71.3,70.9,42.6,20.9,3.417905,2.895961,2.043539,96.453608
3,2019-01-01,4,23197,24590,True,5020.0,0.0,6830.0,9135.0,1914.0,...,0.0,0.0,72.8,72.8,43.2,19.0,3.838446,3.253586,2.274303,101.5
4,2019-01-01,5,22316,24346,True,4964.0,0.0,6827.0,8912.0,1865.0,...,0.0,0.0,72.4,72.7,42.5,18.9,3.822925,3.242143,2.242546,97.934066


In [743]:
data_uk_merged.tail()

Unnamed: 0,settlement_date,settlement_period,nd,tsd,is_bank_holiday,gas,coal,nuclear,wind,wind_emb,...,solar_perc,storage_perc,low_carbon_perc,zero_carbon_perc,renewable_perc,fossil_perc,low_vs_fossil,zero_vs_fossil,renewable_vs_fossil,green_score
110258,2025-04-16,3,21133,22352,False,3012.0,0.0,4419.0,8773.0,2502.0,...,0.0,0.0,66.7,73.5,44.7,11.9,5.627158,4.409695,3.77324,141.5
110259,2025-04-16,4,21151,22370,False,2968.0,0.0,4418.0,9140.0,2536.0,...,0.0,0.0,68.4,75.6,46.4,11.7,5.841981,4.597372,3.963275,157.586207
110260,2025-04-16,5,20492,22462,False,3439.0,0.0,4418.0,10053.0,2571.0,...,0.0,0.0,69.1,74.6,48.0,13.0,5.310555,4.226229,3.689154,162.145161
110261,2025-04-16,6,20165,22251,False,3058.0,0.0,4422.0,11202.0,2662.0,...,0.0,0.0,72.1,78.3,51.5,11.3,6.378352,5.129823,4.554284,215.423077
110262,2025-04-16,7,19664,23178,False,3207.0,0.0,4413.0,12048.0,2792.0,...,0.0,0.0,73.8,78.5,53.7,11.6,6.385719,5.152479,4.647022,227.320755


In [744]:
ck.check(data_uk_merged)

Number of columns: 41 and rows: 110263

Data types:
settlement_date        datetime64[ns]
settlement_period               int64
nd                              int64
tsd                             int64
is_bank_holiday                  bool
gas                           float64
coal                          float64
nuclear                       float64
wind                          float64
wind_emb                      float64
hydro                           int64
imports                       float64
biomass                       float64
other                         float64
solar                         float64
storage                         int64
generation                    float64
carbon_intensity              float64
low_carbon                    float64
zero_carbon                   float64
renewable                     float64
fossil                        float64
gas_perc                      float64
coal_perc                     float64
nuclear_perc                  float6

In [745]:
data_uk_merged_1 = dc.add_time(data_uk_merged, "settlement_date")
data_uk_merged_1.head(48)

Unnamed: 0,settlement_date,settlement_period,nd,tsd,is_bank_holiday,gas,coal,nuclear,wind,wind_emb,...,solar_perc,storage_perc,low_carbon_perc,zero_carbon_perc,renewable_perc,fossil_perc,low_vs_fossil,zero_vs_fossil,renewable_vs_fossil,green_score
0,2019-01-01 00:00:00,1,23808,25291,True,5853.0,0.0,6924.0,9236.0,2031.0,...,0.0,0.0,72.0,70.4,42.8,21.4,3.356911,2.830173,1.994191,98.255319
1,2019-01-01 00:30:00,2,24402,25720,True,6292.0,0.0,6838.0,9297.0,1997.0,...,0.0,0.1,70.7,69.1,42.3,22.7,3.112524,2.629847,1.860458,93.909091
2,2019-01-01 01:00:00,3,24147,25495,True,5719.0,0.0,6834.0,9356.0,1959.0,...,0.0,0.0,71.3,70.9,42.6,20.9,3.417905,2.895961,2.043539,96.453608
3,2019-01-01 01:30:00,4,23197,24590,True,5020.0,0.0,6830.0,9135.0,1914.0,...,0.0,0.0,72.8,72.8,43.2,19.0,3.838446,3.253586,2.274303,101.5
4,2019-01-01 02:00:00,5,22316,24346,True,4964.0,0.0,6827.0,8912.0,1865.0,...,0.0,0.0,72.4,72.7,42.5,18.9,3.822925,3.242143,2.242546,97.934066
5,2019-01-01 02:30:00,6,21987,24414,True,5113.0,0.0,6832.0,8966.0,1815.0,...,0.0,0.0,71.9,72.3,42.2,19.4,3.710346,3.156464,2.17524,96.408602
6,2019-01-01 03:00:00,7,20958,23695,True,5148.0,0.0,6814.0,8722.0,1763.0,...,0.0,0.0,72.5,71.8,42.1,20.0,3.620047,3.080614,2.099456,96.911111
7,2019-01-01 03:30:00,8,20640,23398,True,4729.0,0.0,6802.0,8705.0,1710.0,...,0.0,0.0,73.6,73.2,42.7,18.7,3.930006,3.354409,2.277649,102.411765
8,2019-01-01 04:00:00,9,20331,23080,True,4910.0,0.0,6784.0,8575.0,1664.0,...,0.0,0.0,73.1,72.5,42.2,19.5,3.754175,3.209572,2.166802,97.443182
9,2019-01-01 04:30:00,10,20095,22835,True,4915.0,0.0,6772.0,8329.0,1628.0,...,0.0,0.0,72.7,72.1,41.5,19.7,3.692167,3.155036,2.108444,92.544444
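
The output above shows `settlement_date` gaining a time of day that follows `settlement_period` (period 1 at 00:00, period 2 at 00:30). The implementation of `dc.add_time` is not shown in this notebook, so the following is only a minimal sketch of that mapping. `add_time_sketch` and its `period_col` parameter are hypothetical names, and the sketch assumes every period is exactly 30 minutes long.

```python
import pandas as pd

def add_time_sketch(df, date_col, period_col="settlement_period"):
    """Return a copy whose date column also carries the period's start time."""
    out = df.copy()
    # Period 1 starts at 00:00; each later period starts 30 minutes after the previous one
    offset = pd.to_timedelta((out[period_col] - 1) * 30, unit="min")
    out[date_col] = pd.to_datetime(out[date_col]) + offset
    return out

# Small illustrative frame (made-up rows, not the project data)
demo = pd.DataFrame({
    "settlement_date": ["2019-01-01", "2019-01-01", "2019-01-01"],
    "settlement_period": [1, 2, 48],
})
demo_timed = add_time_sketch(demo, "settlement_date")
print(demo_timed)
```

On clock-change days a settlement day has 46 or 50 periods, so a fixed 30-minute offset like this one would need extra care there.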


In [746]:
data_uk_merged_1.tail(48)

Unnamed: 0,settlement_date,settlement_period,nd,tsd,is_bank_holiday,gas,coal,nuclear,wind,wind_emb,...,solar_perc,storage_perc,low_carbon_perc,zero_carbon_perc,renewable_perc,fossil_perc,low_vs_fossil,zero_vs_fossil,renewable_vs_fossil,green_score
110215,2025-04-15 03:30:00,8,21260,22209,False,4268.0,0.0,4443.0,4073.0,489.0,...,0.0,0.0,51.1,54.7,20.1,18.6,2.74508,2.007498,1.081068,35.417391
110216,2025-04-15 04:00:00,9,21302,22172,False,4910.0,0.0,4451.0,3742.0,422.0,...,0.0,0.0,50.3,51.0,18.4,21.3,2.35947,1.684318,0.863747,30.672131
110217,2025-04-15 04:30:00,10,21015,21935,False,5794.0,0.0,4446.0,3208.0,391.0,...,0.0,0.0,46.8,46.3,15.7,24.6,1.903693,1.335865,0.636003,23.588235
110218,2025-04-15 05:00:00,11,21480,22326,False,7784.0,0.0,4447.0,2746.0,391.0,...,0.0,0.0,41.8,40.2,12.9,30.4,1.373715,0.944373,0.423304,17.1625
110219,2025-04-15 05:30:00,12,22020,22940,False,9946.0,0.0,4445.0,2450.0,406.0,...,0.2,0.0,37.9,35.0,11.1,36.0,1.052182,0.709833,0.307963,13.844444
110220,2025-04-15 06:00:00,13,24006,24965,False,11959.0,0.0,4448.0,2070.0,409.0,...,0.7,0.0,34.1,30.7,9.6,39.6,0.861945,0.562672,0.241492,11.571429
110221,2025-04-15 06:30:00,14,25743,26940,False,12839.0,0.0,4447.0,2079.0,380.0,...,1.5,0.0,33.4,29.3,10.0,40.7,0.822182,0.524807,0.244723,12.75
110222,2025-04-15 07:00:00,15,27898,29342,False,12838.0,0.0,4449.0,1992.0,347.0,...,2.6,1.3,34.6,30.0,11.8,39.2,0.881757,0.555071,0.301683,14.42132
110223,2025-04-15 07:30:00,16,29076,30547,False,12982.0,0.0,4451.0,2314.0,337.0,...,3.8,1.4,36.1,31.0,13.9,38.5,0.936296,0.578724,0.361655,18.704663
110224,2025-04-15 08:00:00,17,29909,31380,False,13388.0,0.0,4447.0,2422.0,332.0,...,5.3,1.4,36.5,30.5,15.2,38.4,0.948984,0.565282,0.396474,22.160622


In [747]:
# Saving the data to a CSV file
data_uk_merged_1 = dc.drop_nan_rows(data_uk_merged_1)
data_uk_merged_1.to_csv(os.path.join(file_path, "data_uk_merged_generation_demand.csv"), index=False)
print("Data saved to data_uk_merged_generation_demand.csv")

Number of rows dropped due to NaN values: 0
Data saved to data_uk_merged_generation_demand.csv


### Weather data

In [2]:
weather_path = os.path.join(file_path, "weather")

NameError: name 'file_path' is not defined

#### UK

- latitude: 54.727592
- longitude: -2.8458557
- elevation: 71.0
- utc_offset_seconds: 0
- timezone: GMT
- timezone_abbreviation: GMT


latitude,longitude,elevation,utc_offset_seconds,timezone,timezone_abbreviation
54.760002,-2.7,71.0,3600,Europe/London,GMT+1


In [749]:
# Exploring the data
uk_weather_path = os.path.join(weather_path, "uk")

# Read the files in the directory
file_list = glob.glob(os.path.join(uk_weather_path, 'open-meteo-54.73N2.85W71m.csv'))

# Initialize an empty dictionary to store the dataframes
weather = {}

# Loop through the files and read them into a dictionary
for file in file_list:

    # Read the file
    df = pd.read_csv(file)
    weather['uk'] = df

    print('Weather data loaded')

Weather data loaded


In [750]:
weather['uk'].head()

Unnamed: 0,time,temperature_2m (°C),relative_humidity_2m (%),apparent_temperature (°C),precipitation (mm),cloud_cover (%),cloud_cover_low (%),cloud_cover_mid (%),wind_speed_10m (km/h),wind_speed_100m (km/h),...,sunshine_duration (s),direct_radiation (W/m²),shortwave_radiation (W/m²),diffuse_radiation (W/m²),shortwave_radiation_instant (W/m²),diffuse_radiation_instant (W/m²),global_tilted_irradiance_instant (W/m²),global_tilted_irradiance (W/m²),direct_normal_irradiance (W/m²),terrestrial_radiation (W/m²)
0,2019-01-01T00:00,9.7,87,5.4,0.0,99,99,0,25.3,41.9,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2019-01-01T01:00,9.8,87,5.3,0.0,99,99,0,26.9,43.4,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2019-01-01T02:00,9.8,89,5.6,0.0,100,100,1,25.6,41.8,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2019-01-01T03:00,9.9,89,6.0,0.1,100,100,4,23.4,38.6,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2019-01-01T04:00,9.8,92,6.2,0.0,59,59,0,22.0,35.7,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [751]:
# Snake case columns
for key, value in weather.items():
    # Rename the columns
    weather[key] = dc.snake(value)
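
For reference, the renaming that `dc.snake` appears to apply (for example `temperature_2m (°C)` becoming `temperature_2m_°c` in the next output) can be sketched as below. This is an assumption based on the before and after column names only. `to_snake` is a hypothetical helper, not the project's code.

```python
import re

import pandas as pd

def to_snake(name):
    """Lowercase a column name, drop parentheses and join words with underscores."""
    name = name.strip().lower()
    name = re.sub(r"[()]", "", name)   # "(°c)" -> "°c"
    name = re.sub(r"\s+", "_", name)   # whitespace -> "_"
    return name

# Illustrative frame using three of the raw Open-Meteo style headers
raw = pd.DataFrame(columns=["time", "temperature_2m (°C)", "wind_speed_10m (km/h)"])
renamed = raw.rename(columns=to_snake)
print(list(renamed.columns))
```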

In [752]:
# Convert the "time" column to datetime
for key, value in weather.items():
    # Parse the timestamp strings into datetime values
    weather[key] = dc.convert_to_datetime(value, columns=["time"])

In [753]:
weather['uk'].head(48)

Unnamed: 0,time,temperature_2m_°c,relative_humidity_2m_%,apparent_temperature_°c,precipitation_mm,cloud_cover_%,cloud_cover_low_%,cloud_cover_mid_%,wind_speed_10m_km/h,wind_speed_100m_km/h,...,sunshine_duration_s,direct_radiation_w/m²,shortwave_radiation_w/m²,diffuse_radiation_w/m²,shortwave_radiation_instant_w/m²,diffuse_radiation_instant_w/m²,global_tilted_irradiance_instant_w/m²,global_tilted_irradiance_w/m²,direct_normal_irradiance_w/m²,terrestrial_radiation_w/m²
0,2019-01-01 00:00:00,9.7,87,5.4,0.0,99,99,0,25.3,41.9,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2019-01-01 01:00:00,9.8,87,5.3,0.0,99,99,0,26.9,43.4,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2019-01-01 02:00:00,9.8,89,5.6,0.0,100,100,1,25.6,41.8,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2019-01-01 03:00:00,9.9,89,6.0,0.1,100,100,4,23.4,38.6,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2019-01-01 04:00:00,9.8,92,6.2,0.0,59,59,0,22.0,35.7,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,2019-01-01 05:00:00,9.1,94,5.7,0.0,99,99,0,20.4,34.1,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,2019-01-01 06:00:00,8.5,80,4.3,0.0,3,3,0,21.2,34.4,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,2019-01-01 07:00:00,6.4,79,2.2,0.0,1,1,0,17.8,32.1,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,2019-01-01 08:00:00,5.0,85,1.3,0.0,2,2,0,13.7,27.1,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,2019-01-01 09:00:00,4.1,90,0.8,0.0,1,1,0,10.9,23.4,...,0.0,1.0,2.0,1.0,7.1,3.5,1.2,2.0,19.0,12.7


In [754]:
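# The timestamps are timezone-naive; treat them as UTC (the first metadata block above lists utc_offset_seconds: 0),
# then convert to UK local time so that GMT/BST changes are reflected
weather['uk']["time"] = weather['uk']["time"].dt.tz_localize('UTC')
weather['uk']["time_london"] = weather['uk']["time"].dt.tz_convert('Europe/London')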
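
As a quick illustration of the two calls above, the sketch below localizes two naive timestamps as UTC and converts them to `Europe/London`. The timestamps are chosen for illustration: one falls in winter (GMT) and one in summer (BST).

```python
import pandas as pd

# Naive timestamps, assumed here to be UTC
naive = pd.Series(pd.to_datetime(["2019-01-01 12:00", "2019-07-28 03:00"]))

# Attach the UTC timezone without shifting the clock time
utc = naive.dt.tz_localize("UTC")

# Convert to UK local time; the offset depends on the date
london = utc.dt.tz_convert("Europe/London")
print(london)
```

The winter value keeps a +00:00 offset, while the summer value moves one hour forward to +01:00, which is the pattern visible in the `time_london` column.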

In [755]:
weather['uk'].head(5000)

Unnamed: 0,time,temperature_2m_°c,relative_humidity_2m_%,apparent_temperature_°c,precipitation_mm,cloud_cover_%,cloud_cover_low_%,cloud_cover_mid_%,wind_speed_10m_km/h,wind_speed_100m_km/h,...,direct_radiation_w/m²,shortwave_radiation_w/m²,diffuse_radiation_w/m²,shortwave_radiation_instant_w/m²,diffuse_radiation_instant_w/m²,global_tilted_irradiance_instant_w/m²,global_tilted_irradiance_w/m²,direct_normal_irradiance_w/m²,terrestrial_radiation_w/m²,time_london
0,2019-01-01 00:00:00+00:00,9.7,87,5.4,0.0,99,99,0,25.3,41.9,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019-01-01 00:00:00+00:00
1,2019-01-01 01:00:00+00:00,9.8,87,5.3,0.0,99,99,0,26.9,43.4,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019-01-01 01:00:00+00:00
2,2019-01-01 02:00:00+00:00,9.8,89,5.6,0.0,100,100,1,25.6,41.8,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019-01-01 02:00:00+00:00
3,2019-01-01 03:00:00+00:00,9.9,89,6.0,0.1,100,100,4,23.4,38.6,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019-01-01 03:00:00+00:00
4,2019-01-01 04:00:00+00:00,9.8,92,6.2,0.0,59,59,0,22.0,35.7,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019-01-01 04:00:00+00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,2019-07-28 03:00:00+00:00,15.2,96,15.3,0.7,100,100,99,9.5,16.8,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019-07-28 04:00:00+01:00
4996,2019-07-28 04:00:00+00:00,14.9,97,14.9,0.5,100,100,74,9.3,16.3,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019-07-28 05:00:00+01:00
4997,2019-07-28 05:00:00+00:00,14.8,97,14.8,0.5,100,100,53,9.8,14.7,...,0.0,4.0,4.0,9.9,9.9,4.3,4.0,0.0,45.3,2019-07-28 06:00:00+01:00
4998,2019-07-28 06:00:00+00:00,15.2,97,15.6,0.1,100,100,19,7.9,9.5,...,1.0,20.0,19.0,29.2,27.7,29.2,20.0,6.5,203.4,2019-07-28 07:00:00+01:00


In [756]:
weather['uk'].tail()

Unnamed: 0,time,temperature_2m_°c,relative_humidity_2m_%,apparent_temperature_°c,precipitation_mm,cloud_cover_%,cloud_cover_low_%,cloud_cover_mid_%,wind_speed_10m_km/h,wind_speed_100m_km/h,...,direct_radiation_w/m²,shortwave_radiation_w/m²,diffuse_radiation_w/m²,shortwave_radiation_instant_w/m²,diffuse_radiation_instant_w/m²,global_tilted_irradiance_instant_w/m²,global_tilted_irradiance_w/m²,direct_normal_irradiance_w/m²,terrestrial_radiation_w/m²,time_london
54763,2025-03-31 19:00:00+00:00,10.2,93,9.6,0.0,100,27,2,2.9,10.0,...,2.0,14.0,12.0,0.0,0.0,0.0,14.0,22.9,60.0,2025-03-31 20:00:00+01:00
54764,2025-03-31 20:00:00+00:00,9.6,93,8.8,0.0,100,25,0,2.7,9.8,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2025-03-31 21:00:00+01:00
54765,2025-03-31 21:00:00+00:00,8.5,96,7.2,0.0,100,15,0,4.8,10.8,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2025-03-31 22:00:00+01:00
54766,2025-03-31 22:00:00+00:00,7.6,96,5.9,0.0,100,16,2,6.3,13.7,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2025-03-31 23:00:00+01:00
54767,2025-03-31 23:00:00+00:00,7.8,93,5.9,0.0,100,6,0,6.9,15.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2025-04-01 00:00:00+01:00


In [757]:
# Extracting necessary columns
weather = dc.extract_columns(weather, [
    "time_london", "temperature_2m_°c", "relative_humidity_2m_%", "apparent_temperature_°c",
    "precipitation_mm", "cloud_cover_%", "wind_speed_10m_km/h", "wind_speed_100m_km/h",
    "wind_direction_10m_°", "wind_direction_100m_°", "global_tilted_irradiance_w/m²",
    "direct_radiation_w/m²", "terrestrial_radiation_w/m²", "shortwave_radiation_w/m²",
    "direct_normal_irradiance_w/m²", "diffuse_radiation_w/m²",
])

Columns ['time_london', 'temperature_2m_°c', 'relative_humidity_2m_%', 'apparent_temperature_°c', 'precipitation_mm', 'cloud_cover_%', 'wind_speed_10m_km/h', 'wind_speed_100m_km/h', 'wind_direction_10m_°', 'wind_direction_100m_°', 'global_tilted_irradiance_w/m²', 'direct_radiation_w/m²', 'terrestrial_radiation_w/m²', 'shortwave_radiation_w/m²', 'direct_normal_irradiance_w/m²', 'diffuse_radiation_w/m²'] extracted from data uk


In [758]:
# Check the data types, missing values, and duplicates
ck.check(weather['uk'])

Number of columns: 16 and rows: 54768

Data types:
time_london                      datetime64[ns, Europe/London]
temperature_2m_°c                                      float64
relative_humidity_2m_%                                   int64
apparent_temperature_°c                                float64
precipitation_mm                                       float64
cloud_cover_%                                            int64
wind_speed_10m_km/h                                    float64
wind_speed_100m_km/h                                   float64
wind_direction_10m_°                                     int64
wind_direction_100m_°                                    int64
global_tilted_irradiance_w/m²                          float64
direct_radiation_w/m²                                  float64
terrestrial_radiation_w/m²                             float64
shortwave_radiation_w/m²                               float64
direct_normal_irradiance_w/m²                          float64
diff

In [759]:
weather['uk'].tail(100)

Unnamed: 0,time_london,temperature_2m_°c,relative_humidity_2m_%,apparent_temperature_°c,precipitation_mm,cloud_cover_%,wind_speed_10m_km/h,wind_speed_100m_km/h,wind_direction_10m_°,wind_direction_100m_°,global_tilted_irradiance_w/m²,direct_radiation_w/m²,terrestrial_radiation_w/m²,shortwave_radiation_w/m²,direct_normal_irradiance_w/m²,diffuse_radiation_w/m²
54668,2025-03-27 20:00:00+00:00,10.2,88,4.7,0.3,100,34.6,52.0,207,211,0.0,0.0,0.0,0.0,0.0,0.0
54669,2025-03-27 21:00:00+00:00,10.0,92,5.3,2.4,100,30.3,48.2,220,222,0.0,0.0,0.0,0.0,0.0,0.0
54670,2025-03-27 22:00:00+00:00,9.9,91,5.4,2.9,100,28.0,45.6,227,230,0.0,0.0,0.0,0.0,0.0,0.0
54671,2025-03-27 23:00:00+00:00,9.5,92,5.1,1.2,100,26.9,44.1,227,230,0.0,0.0,0.0,0.0,0.0,0.0
54672,2025-03-28 00:00:00+00:00,9.1,92,5.2,0.6,100,22.4,37.1,231,234,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
54763,2025-03-31 20:00:00+01:00,10.2,93,9.6,0.0,100,2.9,10.0,60,70,14.0,2.0,60.0,14.0,22.9,12.0
54764,2025-03-31 21:00:00+01:00,9.6,93,8.8,0.0,100,2.7,9.8,113,104,0.0,0.0,0.0,0.0,0.0,0.0
54765,2025-03-31 22:00:00+01:00,8.5,96,7.2,0.0,100,4.8,10.8,149,136,0.0,0.0,0.0,0.0,0.0,0.0
54766,2025-03-31 23:00:00+01:00,7.6,96,5.9,0.0,100,6.3,13.7,167,150,0.0,0.0,0.0,0.0,0.0,0.0


In [760]:
weather['uk'] = dc.check_time_increase_in_weather(weather['uk'], "time_london")

🕑 Added missing row for: 2019-03-31 01:00:00 → interpolated values
⚠️ Removed duplicate for time: 2019-10-27 01:00:00 → averaged values
🕑 Added missing row for: 2020-03-29 01:00:00 → interpolated values
⚠️ Removed duplicate for time: 2020-10-25 01:00:00 → averaged values
🕑 Added missing row for: 2021-03-28 01:00:00 → interpolated values
⚠️ Removed duplicate for time: 2021-10-31 01:00:00 → averaged values
🕑 Added missing row for: 2022-03-27 01:00:00 → interpolated values
⚠️ Removed duplicate for time: 2022-10-30 01:00:00 → averaged values
🕑 Added missing row for: 2023-03-26 01:00:00 → interpolated values
⚠️ Removed duplicate for time: 2023-10-29 01:00:00 → averaged values
🕑 Added missing row for: 2024-03-31 01:00:00 → interpolated values
⚠️ Removed duplicate for time: 2024-10-27 01:00:00 → averaged values
🕑 Added missing row for: 2025-03-30 01:00:00 → interpolated values
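
The log above follows from the clock changes: once local times lose their UTC offset, each spring change leaves a missing 01:00 and each autumn change a repeated 01:00. The sketch below shows one way to repair both, assuming duplicates are averaged and gaps are interpolated as the messages suggest. `fix_local_hours` is a hypothetical helper and not the code of `dc.check_time_increase_in_weather`. The temperatures are made-up values.

```python
import pandas as pd

def fix_local_hours(df, col):
    """Average repeated local hours, then interpolate any skipped ones."""
    # Autumn change: two rows share the same wall-clock hour, so take their mean
    out = df.groupby(col).mean(numeric_only=True)
    # Spring change: rebuild a gap-free hourly index and interpolate the new row
    full = pd.date_range(out.index.min(), out.index.max(), freq="60min")
    out = out.reindex(full).interpolate(method="time")
    return out.rename_axis(col).reset_index()

# Three UTC hours across the 2019 spring change, viewed as naive London time
spring_utc = pd.date_range("2019-03-31 00:00", periods=3, freq="60min", tz="UTC")
spring = pd.DataFrame({
    "time_london": spring_utc.tz_convert("Europe/London").tz_localize(None),
    "temp": [4.0, 6.0, 7.0],
})
spring_fixed = fix_local_hours(spring, "time_london")

# Three UTC hours across the 2019 autumn change, viewed as naive London time
autumn_utc = pd.date_range("2019-10-27 00:00", periods=3, freq="60min", tz="UTC")
autumn = pd.DataFrame({
    "time_london": autumn_utc.tz_convert("Europe/London").tz_localize(None),
    "temp": [10.0, 12.0, 13.0],
})
autumn_fixed = fix_local_hours(autumn, "time_london")

print(spring_fixed)
print(autumn_fixed)
```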


In [761]:
weather['uk'].tail(100)

Unnamed: 0,time_london,temperature_2m_°c,relative_humidity_2m_%,apparent_temperature_°c,precipitation_mm,cloud_cover_%,wind_speed_10m_km/h,wind_speed_100m_km/h,wind_direction_10m_°,wind_direction_100m_°,global_tilted_irradiance_w/m²,direct_radiation_w/m²,terrestrial_radiation_w/m²,shortwave_radiation_w/m²,direct_normal_irradiance_w/m²,diffuse_radiation_w/m²
54669,2025-03-27 21:00:00,10.0,92.0,5.3,2.4,100.0,30.3,48.2,220.0,222.0,0.0,0.0,0.0,0.0,0.0,0.0
54670,2025-03-27 22:00:00,9.9,91.0,5.4,2.9,100.0,28.0,45.6,227.0,230.0,0.0,0.0,0.0,0.0,0.0,0.0
54671,2025-03-27 23:00:00,9.5,92.0,5.1,1.2,100.0,26.9,44.1,227.0,230.0,0.0,0.0,0.0,0.0,0.0,0.0
54672,2025-03-28 00:00:00,9.1,92.0,5.2,0.6,100.0,22.4,37.1,231.0,234.0,0.0,0.0,0.0,0.0,0.0,0.0
54673,2025-03-28 01:00:00,9.2,91.0,5.2,0.6,100.0,24.0,36.6,235.0,238.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
54764,2025-03-31 20:00:00,10.2,93.0,9.6,0.0,100.0,2.9,10.0,60.0,70.0,14.0,2.0,60.0,14.0,22.9,12.0
54765,2025-03-31 21:00:00,9.6,93.0,8.8,0.0,100.0,2.7,9.8,113.0,104.0,0.0,0.0,0.0,0.0,0.0,0.0
54766,2025-03-31 22:00:00,8.5,96.0,7.2,0.0,100.0,4.8,10.8,149.0,136.0,0.0,0.0,0.0,0.0,0.0,0.0
54767,2025-03-31 23:00:00,7.6,96.0,5.9,0.0,100.0,6.3,13.7,167.0,150.0,0.0,0.0,0.0,0.0,0.0,0.0


In [762]:
# The data is hourly, so we forward-fill it to get one value for each 30-minute settlement period.
# resample does not extend beyond the last timestamp, so a point is added after the last time value
# to produce the final half-hour row. That extra point is removed after the resample.
weather['uk'] = dc.resample(weather['uk'], col='time_london')
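
The comments above describe the approach. A minimal sketch of it follows, assuming a plain forward fill with a temporary extra row at the end. `upsample_half_hourly` is a hypothetical name and `dc.resample` may differ in detail.

```python
import pandas as pd

def upsample_half_hourly(df, col):
    """Upsample hourly rows to 30-minute rows by forward-filling."""
    out = df.set_index(col).sort_index()
    # Temporary row one hour past the end, so the last real hour also gets its :30 row
    sentinel = out.iloc[[-1]].copy()
    sentinel.index = sentinel.index + pd.Timedelta(hours=1)
    out = pd.concat([out, sentinel])
    # Each hourly value is repeated for the following half hour
    out = out.resample("30min").ffill()
    # Remove the temporary row again
    out = out.iloc[:-1]
    return out.rename_axis(col).reset_index()

# Two illustrative hourly rows
hourly = pd.DataFrame({
    "time_london": pd.to_datetime(["2019-01-01 00:00", "2019-01-01 01:00"]),
    "temp": [9.7, 9.8],
})
half_hourly = upsample_half_hourly(hourly, "time_london")
print(half_hourly)
```

Two hourly rows become four half-hourly rows. This matches the row count in the check that follows: 109538 is exactly twice the hourly count after the clock-change fixes.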

In [763]:
weather['uk'].head(48)

Unnamed: 0,time_london,temperature_2m_°c,relative_humidity_2m_%,apparent_temperature_°c,precipitation_mm,cloud_cover_%,wind_speed_10m_km/h,wind_speed_100m_km/h,wind_direction_10m_°,wind_direction_100m_°,global_tilted_irradiance_w/m²,direct_radiation_w/m²,terrestrial_radiation_w/m²,shortwave_radiation_w/m²,direct_normal_irradiance_w/m²,diffuse_radiation_w/m²
0,2019-01-01 00:00:00,9.7,87.0,5.4,0.0,99.0,25.3,41.9,264.0,266.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2019-01-01 00:30:00,9.7,87.0,5.4,0.0,99.0,25.3,41.9,264.0,266.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2019-01-01 01:00:00,9.8,87.0,5.3,0.0,99.0,26.9,43.4,262.0,265.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2019-01-01 01:30:00,9.8,87.0,5.3,0.0,99.0,26.9,43.4,262.0,265.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2019-01-01 02:00:00,9.8,89.0,5.6,0.0,100.0,25.6,41.8,266.0,269.0,0.0,0.0,0.0,0.0,0.0,0.0
5,2019-01-01 02:30:00,9.8,89.0,5.6,0.0,100.0,25.6,41.8,266.0,269.0,0.0,0.0,0.0,0.0,0.0,0.0
6,2019-01-01 03:00:00,9.9,89.0,6.0,0.1,100.0,23.4,38.6,272.0,274.0,0.0,0.0,0.0,0.0,0.0,0.0
7,2019-01-01 03:30:00,9.9,89.0,6.0,0.1,100.0,23.4,38.6,272.0,274.0,0.0,0.0,0.0,0.0,0.0,0.0
8,2019-01-01 04:00:00,9.8,92.0,6.2,0.0,59.0,22.0,35.7,280.0,282.0,0.0,0.0,0.0,0.0,0.0,0.0
9,2019-01-01 04:30:00,9.8,92.0,6.2,0.0,59.0,22.0,35.7,280.0,282.0,0.0,0.0,0.0,0.0,0.0,0.0


In [764]:
ck.check(weather['uk'])

Number of columns: 16 and rows: 109538

Data types:
time_london                      datetime64[ns]
temperature_2m_°c                       float64
relative_humidity_2m_%                  float64
apparent_temperature_°c                 float64
precipitation_mm                        float64
cloud_cover_%                           float64
wind_speed_10m_km/h                     float64
wind_speed_100m_km/h                    float64
wind_direction_10m_°                    float64
wind_direction_100m_°                   float64
global_tilted_irradiance_w/m²           float64
direct_radiation_w/m²                   float64
terrestrial_radiation_w/m²              float64
shortwave_radiation_w/m²                float64
direct_normal_irradiance_w/m²           float64
diffuse_radiation_w/m²                  float64
dtype: object

Unique values count:
time_london                      109538
temperature_2m_°c                   357
relative_humidity_2m_%               81
apparent_temperature_°c 

In [765]:

weather['uk'] = weather['uk'].rename(columns={"time_london": "time"})

In [766]:
ck.check(weather['uk'])

Number of columns: 16 and rows: 109538

Data types:
time                             datetime64[ns]
temperature_2m_°c                       float64
relative_humidity_2m_%                  float64
apparent_temperature_°c                 float64
precipitation_mm                        float64
cloud_cover_%                           float64
wind_speed_10m_km/h                     float64
wind_speed_100m_km/h                    float64
wind_direction_10m_°                    float64
wind_direction_100m_°                   float64
global_tilted_irradiance_w/m²           float64
direct_radiation_w/m²                   float64
terrestrial_radiation_w/m²              float64
shortwave_radiation_w/m²                float64
direct_normal_irradiance_w/m²           float64
diffuse_radiation_w/m²                  float64
dtype: object

Unique values count:
time                             109538
temperature_2m_°c                   357
relative_humidity_2m_%               81
apparent_temperature_°c 

In [767]:
# Saving the data to a CSV file
weather['uk'].to_csv(os.path.join(uk_weather_path, "uk_weather_fix.csv"), index=False)
print("Data saved to uk_weather_fix.csv")

Data saved to uk_weather_fix.csv


In [768]:
# Extracting settlement period and settlement date from datetime
weather['uk'] = dc.extract_settlement_period_and_date(weather['uk'], "time")
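
The next output shows 00:00 mapped to period 1 and 00:30 to period 2. A sketch of that derivation is given below, assuming 48 periods per day numbered from 1. `split_settlement_fields` is a hypothetical helper and `dc.extract_settlement_period_and_date` may be implemented differently.

```python
import pandas as pd

def split_settlement_fields(df, col):
    """Replace a timestamp column with settlement_date and settlement_period."""
    out = df.copy()
    ts = out[col]
    # Midnight of the same day, kept as a datetime value
    out["settlement_date"] = ts.dt.normalize()
    # Two periods per hour, numbered from 1
    out["settlement_period"] = (ts.dt.hour * 2 + ts.dt.minute // 30 + 1).astype("int32")
    return out.drop(columns=[col])

# Illustrative timestamps only
stamps = pd.DataFrame({
    "time": pd.to_datetime(["2019-01-01 00:00", "2019-01-01 00:30", "2019-01-01 23:30"]),
})
settled = split_settlement_fields(stamps, "time")
print(settled)
```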

In [769]:
weather['uk'].head()

Unnamed: 0,temperature_2m_°c,relative_humidity_2m_%,apparent_temperature_°c,precipitation_mm,cloud_cover_%,wind_speed_10m_km/h,wind_speed_100m_km/h,wind_direction_10m_°,wind_direction_100m_°,global_tilted_irradiance_w/m²,direct_radiation_w/m²,terrestrial_radiation_w/m²,shortwave_radiation_w/m²,direct_normal_irradiance_w/m²,diffuse_radiation_w/m²,settlement_date,settlement_period
0,9.7,87.0,5.4,0.0,99.0,25.3,41.9,264.0,266.0,0.0,0.0,0.0,0.0,0.0,0.0,2019-01-01,1
1,9.7,87.0,5.4,0.0,99.0,25.3,41.9,264.0,266.0,0.0,0.0,0.0,0.0,0.0,0.0,2019-01-01,2
2,9.8,87.0,5.3,0.0,99.0,26.9,43.4,262.0,265.0,0.0,0.0,0.0,0.0,0.0,0.0,2019-01-01,3
3,9.8,87.0,5.3,0.0,99.0,26.9,43.4,262.0,265.0,0.0,0.0,0.0,0.0,0.0,0.0,2019-01-01,4
4,9.8,89.0,5.6,0.0,100.0,25.6,41.8,266.0,269.0,0.0,0.0,0.0,0.0,0.0,0.0,2019-01-01,5


In [770]:
ck.check(weather['uk'])

Number of columns: 17 and rows: 109538

Data types:
temperature_2m_°c                       float64
relative_humidity_2m_%                  float64
apparent_temperature_°c                 float64
precipitation_mm                        float64
cloud_cover_%                           float64
wind_speed_10m_km/h                     float64
wind_speed_100m_km/h                    float64
wind_direction_10m_°                    float64
wind_direction_100m_°                   float64
global_tilted_irradiance_w/m²           float64
direct_radiation_w/m²                   float64
terrestrial_radiation_w/m²              float64
shortwave_radiation_w/m²                float64
direct_normal_irradiance_w/m²           float64
diffuse_radiation_w/m²                  float64
settlement_date                  datetime64[ns]
settlement_period                         int32
dtype: object

Unique values count:
temperature_2m_°c                  357
relative_humidity_2m_%              81
apparent_temperatu

In [771]:
weather_uk_shifted = weather['uk'].copy()

# Base weather columns used to build the rolling and lag features
weather_cols = [
    "temperature_2m_°c", "relative_humidity_2m_%", "apparent_temperature_°c", "precipitation_mm",
    "cloud_cover_%", "wind_speed_10m_km/h", "wind_speed_100m_km/h", "wind_direction_10m_°",
    "wind_direction_100m_°", "global_tilted_irradiance_w/m²", "direct_radiation_w/m²",
    "terrestrial_radiation_w/m²", "shortwave_radiation_w/m²", "direct_normal_irradiance_w/m²",
    "diffuse_radiation_w/m²",
]

# Adding rolling features: (unit, window size) pairs, applied in this order
rolling_windows = [
    ("hours", 3), ("hours", 1), ("hours", 12), ("hours", 6),
    ("days", 1), ("days", 3), ("days", 2), ("days", 5),
    ("weeks", 1), ("weeks", 2),
]
for unit, size in rolling_windows:
    weather_uk_shifted = dc.create_rolling_features(
        weather_uk_shifted, columns=weather_cols, type=unit, window_size=size
    )

# Adding lag features: only the 2-week lag is enabled.
# The other lags (hours: 3, 1, 12, 6; days: 1, 3, 2, 5; weeks: 1) are disabled.
weather_uk_shifted = dc.create_lag_features(weather_uk_shifted, columns=weather_cols, type='weeks', window_size=2)

# Drop the rows left incomplete by the windows and lags, then drop the raw weather columns
weather_uk_shifted = weather_uk_shifted.dropna()
weather_uk_shifted = weather_uk_shifted.drop(columns=weather_cols)
weather_uk_shifted.head()

Unnamed: 0,settlement_date,settlement_period,temperature_2m_°c_rolling_3_hours_for_2_week,temperature_2m_°c_rolling_std_3_hours_for_2_week,relative_humidity_2m_%_rolling_3_hours_for_2_week,relative_humidity_2m_%_rolling_std_3_hours_for_2_week,apparent_temperature_°c_rolling_3_hours_for_2_week,apparent_temperature_°c_rolling_std_3_hours_for_2_week,precipitation_mm_rolling_3_hours_for_2_week,precipitation_mm_rolling_std_3_hours_for_2_week,...,wind_speed_10m_km/h_lag_2_weeks,wind_speed_100m_km/h_lag_2_weeks,wind_direction_10m_°_lag_2_weeks,wind_direction_100m_°_lag_2_weeks,global_tilted_irradiance_w/m²_lag_2_weeks,direct_radiation_w/m²_lag_2_weeks,terrestrial_radiation_w/m²_lag_2_weeks,shortwave_radiation_w/m²_lag_2_weeks,direct_normal_irradiance_w/m²_lag_2_weeks,diffuse_radiation_w/m²_lag_2_weeks
1343,2019-01-28,48,6.833333,0.508593,89.0,0.894427,3.4,0.473286,0.033333,0.05164,...,16.0,27.6,247.0,253.0,0.0,0.0,0.0,0.0,0.0,0.0
1344,2019-01-29,1,7.066667,0.480278,88.666667,0.816497,3.6,0.419524,0.033333,0.05164,...,17.9,30.6,245.0,249.0,0.0,0.0,0.0,0.0,0.0,0.0
1345,2019-01-29,2,7.3,0.268328,88.333333,0.516398,3.8,0.178885,0.033333,0.05164,...,17.9,30.6,245.0,249.0,0.0,0.0,0.0,0.0,0.0,0.0
1346,2019-01-29,3,7.416667,0.263944,88.5,0.547723,3.85,0.151658,0.016667,0.040825,...,18.8,31.8,244.0,248.0,0.0,0.0,0.0,0.0,0.0,0.0
1347,2019-01-29,4,7.533333,0.18619,88.666667,0.516398,3.9,0.089443,0.0,0.0,...,18.8,31.8,244.0,248.0,0.0,0.0,0.0,0.0,0.0,0.0
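
The `_lag_2_weeks` columns above come from the local helper `dc.create_lag_features`, whose implementation is not shown in this notebook. As a minimal sketch of the general idea, and assuming exactly 48 half-hourly rows per day with no gaps, a week-based lag can be expressed as a row shift. The function name `add_week_lag` and the toy column are hypothetical and are not part of `utils.data_cleaning`.

```python
import pandas as pd

PERIODS_PER_DAY = 48  # assumption: one row per half-hourly settlement period


def add_week_lag(df: pd.DataFrame, columns: list[str], weeks: int) -> pd.DataFrame:
    """Return a copy of df with a `<col>_lag_<weeks>_weeks` column per input column."""
    out = df.copy()
    rows = weeks * 7 * PERIODS_PER_DAY  # number of rows that make up the lag
    for col in columns:
        out[f"{col}_lag_{weeks}_weeks"] = out[col].shift(rows)
    return out


# Toy frame: three weeks of half-hourly rows, with a counter standing in for a weather value
n_rows = 3 * 7 * PERIODS_PER_DAY
toy = pd.DataFrame({"temperature": range(n_rows)})
lagged = add_week_lag(toy, ["temperature"], weeks=2)

# The first two weeks have no history to look back on, so they are NaN
print(lagged["temperature_lag_2_weeks"].isna().sum())
```

A row shift is only correct when the index is complete and sorted; if periods are missing, lagging on a timestamp key would be the safer choice.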


In [772]:
ck.check(weather_uk_shifted)

Number of columns: 317 and rows: 108195

Data types:
settlement_date                                      datetime64[ns]
settlement_period                                             int32
temperature_2m_°c_rolling_3_hours_for_2_week                float64
temperature_2m_°c_rolling_std_3_hours_for_2_week            float64
relative_humidity_2m_%_rolling_3_hours_for_2_week           float64
                                                          ...      
direct_radiation_w/m²_lag_2_weeks                           float64
terrestrial_radiation_w/m²_lag_2_weeks                      float64
shortwave_radiation_w/m²_lag_2_weeks                        float64
direct_normal_irradiance_w/m²_lag_2_weeks                   float64
diffuse_radiation_w/m²_lag_2_weeks                          float64
Length: 317, dtype: object

Unique values count:
settlement_date                                        2256
settlement_period                                        48
temperature_2m_°c_rolling_3_ho

In [773]:
weather_uk_shifted.tail()

Unnamed: 0,settlement_date,settlement_period,temperature_2m_°c_rolling_3_hours_for_2_week,temperature_2m_°c_rolling_std_3_hours_for_2_week,relative_humidity_2m_%_rolling_3_hours_for_2_week,relative_humidity_2m_%_rolling_std_3_hours_for_2_week,apparent_temperature_°c_rolling_3_hours_for_2_week,apparent_temperature_°c_rolling_std_3_hours_for_2_week,precipitation_mm_rolling_3_hours_for_2_week,precipitation_mm_rolling_std_3_hours_for_2_week,...,wind_speed_10m_km/h_lag_2_weeks,wind_speed_100m_km/h_lag_2_weeks,wind_direction_10m_°_lag_2_weeks,wind_direction_100m_°_lag_2_weeks,global_tilted_irradiance_w/m²_lag_2_weeks,direct_radiation_w/m²_lag_2_weeks,terrestrial_radiation_w/m²_lag_2_weeks,shortwave_radiation_w/m²_lag_2_weeks,direct_normal_irradiance_w/m²_lag_2_weeks,diffuse_radiation_w/m²_lag_2_weeks
109533,2025-03-31,46,3.2,1.209959,83.0,4.732864,-0.7,1.388524,0.0,0.0,...,13.2,26.9,130.0,130.0,0.0,0.0,0.0,0.0,0.0,0.0
109534,2025-03-31,47,2.6,1.268069,85.0,4.195235,-1.366667,1.426417,0.0,0.0,...,12.5,26.1,134.0,134.0,0.0,0.0,0.0,0.0,0.0,0.0
109535,2025-03-31,48,2.0,0.942338,87.0,1.788854,-2.033333,1.036661,0.0,0.0,...,12.5,26.1,134.0,134.0,0.0,0.0,0.0,0.0,0.0,0.0
109536,2025-04-01,1,1.55,0.956556,88.0,2.097618,-2.516667,1.022578,0.0,0.0,...,12.3,26.0,136.0,135.0,0.0,0.0,0.0,0.0,0.0,0.0
109537,2025-04-01,2,1.1,0.675278,89.0,1.788854,-3.0,0.675278,0.0,0.0,...,12.3,26.0,136.0,135.0,0.0,0.0,0.0,0.0,0.0,0.0


In [774]:
null_w = pd.DataFrame(weather_uk_shifted.isnull().sum())
null_w.head(500)

Unnamed: 0,0
settlement_date,0
settlement_period,0
temperature_2m_°c_rolling_3_hours_for_2_week,0
temperature_2m_°c_rolling_std_3_hours_for_2_week,0
relative_humidity_2m_%_rolling_3_hours_for_2_week,0
...,...
direct_radiation_w/m²_lag_2_weeks,0
terrestrial_radiation_w/m²_lag_2_weeks,0
shortwave_radiation_w/m²_lag_2_weeks,0
direct_normal_irradiance_w/m²_lag_2_weeks,0


In [775]:
# Left-join the lagged weather features onto the lagged demand data
merge_keys = ["settlement_date", "settlement_period"]
data_uk_mergedshifted_full = pd.merge(data_uk_mergedshifted, weather_uk_shifted, how="left", on=merge_keys)
data_uk_mergedshifted_full = data_uk_mergedshifted_full.sort_values(by=merge_keys)
data_uk_mergedshifted_full = data_uk_mergedshifted_full.reset_index(drop=True)
data_uk_mergedshifted_full["is_bank_holiday"] = data_uk_mergedshifted_full["is_bank_holiday"].astype(int)
# Drop rows with any missing value, including demand rows without a weather match
data_uk_mergedshifted_full = data_uk_mergedshifted_full.dropna()
data_uk_mergedshifted_full.head()

Unnamed: 0,settlement_date,settlement_period,nd,is_bank_holiday,nd_rolling_3_hours_for_2_week,nd_rolling_std_3_hours_for_2_week,tsd_rolling_3_hours_for_2_week,tsd_rolling_std_3_hours_for_2_week,nd_rolling_1_hours_for_2_week,nd_rolling_std_1_hours_for_2_week,...,wind_speed_10m_km/h_lag_2_weeks,wind_speed_100m_km/h_lag_2_weeks,wind_direction_10m_°_lag_2_weeks,wind_direction_100m_°_lag_2_weeks,global_tilted_irradiance_w/m²_lag_2_weeks,direct_radiation_w/m²_lag_2_weeks,terrestrial_radiation_w/m²_lag_2_weeks,shortwave_radiation_w/m²_lag_2_weeks,direct_normal_irradiance_w/m²_lag_2_weeks,diffuse_radiation_w/m²_lag_2_weeks
0,2019-01-28,48,29402,0,31266.833333,3993.244867,32476.166667,3857.737338,27109.5,1038.739862,...,16.0,27.6,247.0,253.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2019-01-29,1,29041,0,29438.5,3355.971677,30706.0,3293.081718,26155.0,311.126984,...,17.9,30.6,245.0,249.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2019-01-29,2,29560,0,28091.666667,2461.10444,29388.666667,2497.164045,26135.0,282.842712,...,17.9,30.6,245.0,249.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2019-01-29,3,29424,0,27070.833333,1581.604428,28402.0,1543.215345,26152.5,258.093975,...,18.8,31.8,244.0,248.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2019-01-29,4,28844,0,26287.833333,859.88916,27755.833333,601.221562,25619.0,496.38896,...,18.8,31.8,244.0,248.0,0.0,0.0,0.0,0.0,0.0,0.0
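
Because the left join is followed by `dropna()`, demand rows without a weather match disappear silently. A small sketch of how the match rate could be inspected before dropping, using the `validate` and `indicator` options of `DataFrame.merge`; the toy frames and their values are hypothetical stand-ins for the real tables.

```python
import pandas as pd

keys = ["settlement_date", "settlement_period"]

# Hypothetical toy frames standing in for the demand and weather tables
demand = pd.DataFrame({
    "settlement_date": pd.to_datetime(["2019-01-28", "2019-01-29", "2019-01-29"]),
    "settlement_period": [48, 1, 2],
    "nd": [29402, 29041, 29560],
})
weather = pd.DataFrame({
    "settlement_date": pd.to_datetime(["2019-01-28", "2019-01-29"]),
    "settlement_period": [48, 1],
    "temperature": [6.8, 7.1],
})

# validate raises MergeError if the keys are duplicated on either side;
# indicator adds a _merge column recording where each row came from
merged = demand.merge(weather, how="left", on=keys,
                      validate="one_to_one", indicator=True)

unmatched = int((merged["_merge"] == "left_only").sum())
print(f"{unmatched} demand rows have no weather match")

merged = merged.drop(columns="_merge")
```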


In [776]:
null_t = pd.DataFrame(data_uk_mergedshifted_full.isnull().sum())
null_t.head(1000)

Unnamed: 0,0
settlement_date,0
settlement_period,0
nd,0
is_bank_holiday,0
nd_rolling_3_hours_for_2_week,0
...,...
direct_radiation_w/m²_lag_2_weeks,0
terrestrial_radiation_w/m²_lag_2_weeks,0
shortwave_radiation_w/m²_lag_2_weeks,0
direct_normal_irradiance_w/m²_lag_2_weeks,0


In [777]:
data_uk_mergedshifted_full.tail(200)

Unnamed: 0,settlement_date,settlement_period,nd,is_bank_holiday,nd_rolling_3_hours_for_2_week,nd_rolling_std_3_hours_for_2_week,tsd_rolling_3_hours_for_2_week,tsd_rolling_std_3_hours_for_2_week,nd_rolling_1_hours_for_2_week,nd_rolling_std_1_hours_for_2_week,...,wind_speed_10m_km/h_lag_2_weeks,wind_speed_100m_km/h_lag_2_weeks,wind_direction_10m_°_lag_2_weeks,wind_direction_100m_°_lag_2_weeks,global_tilted_irradiance_w/m²_lag_2_weeks,direct_radiation_w/m²_lag_2_weeks,terrestrial_radiation_w/m²_lag_2_weeks,shortwave_radiation_w/m²_lag_2_weeks,direct_normal_irradiance_w/m²_lag_2_weeks,diffuse_radiation_w/m²_lag_2_weeks
107995,2025-03-27,43,28524,0,38032.500000,2029.596290,40140.833333,2022.289734,35812.5,1373.908476,...,4.2,12.1,59.0,55.0,0.0,0.0,0.0,0.0,0.0,0.0
107996,2025-03-27,44,26652,0,36936.166667,2391.082718,39045.666667,2388.061194,34214.5,886.004797,...,4.2,12.1,59.0,55.0,0.0,0.0,0.0,0.0,0.0,0.0
107997,2025-03-27,45,25227,0,35623.666667,2626.154806,37733.333333,2625.825559,32778.0,1145.512986,...,1.2,6.8,129.0,78.0,0.0,0.0,0.0,0.0,0.0,0.0
107998,2025-03-27,46,23684,0,34128.666667,2892.813002,36239.333333,2893.686622,30974.5,1405.021174,...,1.2,6.8,129.0,78.0,0.0,0.0,0.0,0.0,0.0,0.0
107999,2025-03-27,47,22077,0,32585.166667,3125.135096,34735.333333,3062.956720,29165.0,1153.998267,...,2.4,2.0,222.0,180.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
108190,2025-03-31,46,25947,0,32717.000000,3367.281337,34480.166667,3309.498416,29023.0,1329.360749,...,13.2,26.9,130.0,130.0,0.0,0.0,0.0,0.0,0.0,0.0
108191,2025-03-31,47,24338,0,30944.166667,3433.649220,32782.666667,3299.584257,27244.5,1185.818072,...,12.5,26.1,134.0,134.0,0.0,0.0,0.0,0.0,0.0,0.0
108192,2025-03-31,48,23406,0,29324.333333,3236.505255,31220.000000,3064.040861,26009.5,560.735677,...,12.5,26.1,134.0,134.0,0.0,0.0,0.0,0.0,0.0,0.0
108193,2025-04-01,1,22871,0,27981.666667,2535.912906,29986.500000,2388.486697,25746.5,188.797511,...,12.3,26.0,136.0,135.0,0.0,0.0,0.0,0.0,0.0,0.0


In [778]:
data_uk_mergedshifted_full.loc[data_uk_mergedshifted_full['is_bank_holiday'].isnull(), 'is_bank_holiday']

Series([], Name: is_bank_holiday, dtype: int64)

In [779]:
data_uk_mergedshifted_full.isnull().sum()

settlement_date                              0
settlement_period                            0
nd                                           0
is_bank_holiday                              0
nd_rolling_3_hours_for_2_week                0
                                            ..
direct_radiation_w/m²_lag_2_weeks            0
terrestrial_radiation_w/m²_lag_2_weeks       0
shortwave_radiation_w/m²_lag_2_weeks         0
direct_normal_irradiance_w/m²_lag_2_weeks    0
diffuse_radiation_w/m²_lag_2_weeks           0
Length: 805, dtype: int64
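
With 805 columns, the printed null counts are truncated, so a column containing nulls could be hidden in the elided middle. One possible way to make the check explicit is to keep only the columns whose count is non-zero. This is a sketch on a toy frame; `null_report` is a hypothetical helper and not part of `utils.data_check`.

```python
import numpy as np
import pandas as pd


def null_report(df: pd.DataFrame) -> pd.Series:
    """Return null counts for columns that contain at least one null, largest first."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [1, 2, 3],
    "c": [np.nan, np.nan, 1.0],
})

report = null_report(toy)
print(report)

# An empty report means no column has missing values
assert null_report(toy.dropna()).empty
```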

In [780]:
full_data_uk_merged = pd.merge(data_uk_merged, weather['uk'], how="left", on=["settlement_date", "settlement_period"])
full_data_uk_merged.head()

Unnamed: 0,settlement_date,settlement_period,nd,tsd,is_bank_holiday,gas,coal,nuclear,wind,wind_emb,...,wind_speed_10m_km/h,wind_speed_100m_km/h,wind_direction_10m_°,wind_direction_100m_°,global_tilted_irradiance_w/m²,direct_radiation_w/m²,terrestrial_radiation_w/m²,shortwave_radiation_w/m²,direct_normal_irradiance_w/m²,diffuse_radiation_w/m²
0,2019-01-01,1,23808,25291,True,5853.0,0.0,6924.0,9236.0,2031.0,...,25.3,41.9,264.0,266.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2019-01-01,2,24402,25720,True,6292.0,0.0,6838.0,9297.0,1997.0,...,25.3,41.9,264.0,266.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2019-01-01,3,24147,25495,True,5719.0,0.0,6834.0,9356.0,1959.0,...,26.9,43.4,262.0,265.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2019-01-01,4,23197,24590,True,5020.0,0.0,6830.0,9135.0,1914.0,...,26.9,43.4,262.0,265.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2019-01-01,5,22316,24346,True,4964.0,0.0,6827.0,8912.0,1865.0,...,25.6,41.8,266.0,269.0,0.0,0.0,0.0,0.0,0.0,0.0


In [781]:
ck.check(full_data_uk_merged)

Number of columns: 56 and rows: 110263

Data types:
settlement_date                  datetime64[ns]
settlement_period                         int64
nd                                        int64
tsd                                       int64
is_bank_holiday                            bool
gas                                     float64
coal                                    float64
nuclear                                 float64
wind                                    float64
wind_emb                                float64
hydro                                     int64
imports                                 float64
biomass                                 float64
other                                   float64
solar                                   float64
storage                                   int64
generation                              float64
carbon_intensity                        float64
low_carbon                              float64
zero_carbon                         

In [782]:
full_data_uk_merged["is_bank_holiday"] = full_data_uk_merged["is_bank_holiday"].astype(int)
ck.check(full_data_uk_merged)

Number of columns: 56 and rows: 110263

Data types:
settlement_date                  datetime64[ns]
settlement_period                         int64
nd                                        int64
tsd                                       int64
is_bank_holiday                           int64
gas                                     float64
coal                                    float64
nuclear                                 float64
wind                                    float64
wind_emb                                float64
hydro                                     int64
imports                                 float64
biomass                                 float64
other                                   float64
solar                                   float64
storage                                   int64
generation                              float64
carbon_intensity                        float64
low_carbon                              float64
zero_carbon                         

In [783]:
# Expand settlement_date into datetime features (year, day, month, day_of_week, season)
data_uk_mergedshifted_full = dc.preprocess_datetime(data_uk_mergedshifted_full, "settlement_date")

In [784]:
# Split settlement_date into separate year, day, month, day_of_week and season columns
full_data_uk_merged = dc.preprocess_datetime(full_data_uk_merged, "settlement_date")
full_data_uk_merged.head(48)

Unnamed: 0,settlement_date,settlement_period,nd,tsd,is_bank_holiday,gas,coal,nuclear,wind,wind_emb,...,direct_radiation_w/m²,terrestrial_radiation_w/m²,shortwave_radiation_w/m²,direct_normal_irradiance_w/m²,diffuse_radiation_w/m²,year,day,month,day_of_week,season
0,2019-01-01 00:00:00,1,23808,25291,1,5853.0,0.0,6924.0,9236.0,2031.0,...,0.0,0.0,0.0,0.0,0.0,2019,1,1,2,Winter
1,2019-01-01 00:30:00,2,24402,25720,1,6292.0,0.0,6838.0,9297.0,1997.0,...,0.0,0.0,0.0,0.0,0.0,2019,1,1,2,Winter
2,2019-01-01 01:00:00,3,24147,25495,1,5719.0,0.0,6834.0,9356.0,1959.0,...,0.0,0.0,0.0,0.0,0.0,2019,1,1,2,Winter
3,2019-01-01 01:30:00,4,23197,24590,1,5020.0,0.0,6830.0,9135.0,1914.0,...,0.0,0.0,0.0,0.0,0.0,2019,1,1,2,Winter
4,2019-01-01 02:00:00,5,22316,24346,1,4964.0,0.0,6827.0,8912.0,1865.0,...,0.0,0.0,0.0,0.0,0.0,2019,1,1,2,Winter
5,2019-01-01 02:30:00,6,21987,24414,1,5113.0,0.0,6832.0,8966.0,1815.0,...,0.0,0.0,0.0,0.0,0.0,2019,1,1,2,Winter
6,2019-01-01 03:00:00,7,20958,23695,1,5148.0,0.0,6814.0,8722.0,1763.0,...,0.0,0.0,0.0,0.0,0.0,2019,1,1,2,Winter
7,2019-01-01 03:30:00,8,20640,23398,1,4729.0,0.0,6802.0,8705.0,1710.0,...,0.0,0.0,0.0,0.0,0.0,2019,1,1,2,Winter
8,2019-01-01 04:00:00,9,20331,23080,1,4910.0,0.0,6784.0,8575.0,1664.0,...,0.0,0.0,0.0,0.0,0.0,2019,1,1,2,Winter
9,2019-01-01 04:30:00,10,20095,22835,1,4915.0,0.0,6772.0,8329.0,1628.0,...,0.0,0.0,0.0,0.0,0.0,2019,1,1,2,Winter
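
The output above suggests that `dc.preprocess_datetime` turns the date plus settlement period into a half-hourly timestamp and derives calendar columns from it. Its source is not shown here, so the following is only a sketch that reproduces the visible values under stated assumptions: period 1 starts at 00:00, `day_of_week` is 1 for Monday, and seasons follow the meteorological convention. `add_datetime_features` and `SEASONS` are hypothetical names.

```python
import pandas as pd

# Assumption: meteorological seasons for the northern hemisphere
SEASONS = {12: "Winter", 1: "Winter", 2: "Winter",
           3: "Spring", 4: "Spring", 5: "Spring",
           6: "Summer", 7: "Summer", 8: "Summer",
           9: "Autumn", 10: "Autumn", 11: "Autumn"}


def add_datetime_features(df, date_col="settlement_date", period_col="settlement_period"):
    """Return a copy of df with a half-hourly timestamp and calendar columns."""
    out = df.copy()
    # Period 1 covers 00:00-00:30, so the offset is (period - 1) * 30 minutes
    offset = pd.to_timedelta((out[period_col] - 1) * 30, unit="min")
    out[date_col] = pd.to_datetime(out[date_col]) + offset
    out["year"] = out[date_col].dt.year
    out["day"] = out[date_col].dt.day
    out["month"] = out[date_col].dt.month
    out["day_of_week"] = out[date_col].dt.dayofweek + 1  # Monday = 1
    out["season"] = out["month"].map(SEASONS)
    return out


toy = pd.DataFrame({
    "settlement_date": ["2019-01-01", "2019-01-01", "2019-01-28"],
    "settlement_period": [1, 2, 48],
})
features = add_datetime_features(toy)
print(features)
```

Days with a clock change do not have exactly 48 periods, so a fixed 30-minute offset is an approximation on those dates.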


In [785]:

data_uk_mergedshifted_full = dc.one_hot_encode(data_uk_mergedshifted_full, 'season')
data_uk_mergedshifted_full.head()

Unnamed: 0,settlement_date,settlement_period,nd,is_bank_holiday,nd_rolling_3_hours_for_2_week,nd_rolling_std_3_hours_for_2_week,tsd_rolling_3_hours_for_2_week,tsd_rolling_std_3_hours_for_2_week,nd_rolling_1_hours_for_2_week,nd_rolling_std_1_hours_for_2_week,...,direct_normal_irradiance_w/m²_lag_2_weeks,diffuse_radiation_w/m²_lag_2_weeks,year,day,month,day_of_week,season_Autumn,season_Spring,season_Summer,season_Winter
0,2019-01-28 23:30:00,48,29402,0,31266.833333,3993.244867,32476.166667,3857.737338,27109.5,1038.739862,...,0.0,0.0,2019,28,1,1,0,0,0,1
1,2019-01-29 00:00:00,1,29041,0,29438.5,3355.971677,30706.0,3293.081718,26155.0,311.126984,...,0.0,0.0,2019,29,1,2,0,0,0,1
2,2019-01-29 00:30:00,2,29560,0,28091.666667,2461.10444,29388.666667,2497.164045,26135.0,282.842712,...,0.0,0.0,2019,29,1,2,0,0,0,1
3,2019-01-29 01:00:00,3,29424,0,27070.833333,1581.604428,28402.0,1543.215345,26152.5,258.093975,...,0.0,0.0,2019,29,1,2,0,0,0,1
4,2019-01-29 01:30:00,4,28844,0,26287.833333,859.88916,27755.833333,601.221562,25619.0,496.38896,...,0.0,0.0,2019,29,1,2,0,0,0,1


In [786]:
full_data_uk_merged = dc.one_hot_encode(full_data_uk_merged, 'season')
full_data_uk_merged.head()

Unnamed: 0,settlement_date,settlement_period,nd,tsd,is_bank_holiday,gas,coal,nuclear,wind,wind_emb,...,direct_normal_irradiance_w/m²,diffuse_radiation_w/m²,year,day,month,day_of_week,season_Autumn,season_Spring,season_Summer,season_Winter
0,2019-01-01 00:00:00,1,23808,25291,1,5853.0,0.0,6924.0,9236.0,2031.0,...,0.0,0.0,2019,1,1,2,0,0,0,1
1,2019-01-01 00:30:00,2,24402,25720,1,6292.0,0.0,6838.0,9297.0,1997.0,...,0.0,0.0,2019,1,1,2,0,0,0,1
2,2019-01-01 01:00:00,3,24147,25495,1,5719.0,0.0,6834.0,9356.0,1959.0,...,0.0,0.0,2019,1,1,2,0,0,0,1
3,2019-01-01 01:30:00,4,23197,24590,1,5020.0,0.0,6830.0,9135.0,1914.0,...,0.0,0.0,2019,1,1,2,0,0,0,1
4,2019-01-01 02:00:00,5,22316,24346,1,4964.0,0.0,6827.0,8912.0,1865.0,...,0.0,0.0,2019,1,1,2,0,0,0,1
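
The four `season_*` indicator columns can be reproduced with `pandas.get_dummies`. Whether `dc.one_hot_encode` works this way is an assumption, since its source is not shown. The sketch below declares the categories up front so that all four columns appear even when a slice of the data contains a single season.

```python
import pandas as pd

SEASON_ORDER = ["Autumn", "Spring", "Summer", "Winter"]

# Toy frame containing only winter rows
toy = pd.DataFrame({"nd": [23808, 24402], "season": ["Winter", "Winter"]})

# Declaring the categories keeps unused seasons as all-zero columns
toy["season"] = pd.Categorical(toy["season"], categories=SEASON_ORDER)
encoded = pd.get_dummies(toy, columns=["season"], dtype=int)

print(encoded.columns.tolist())
```

Keeping all four indicators is convenient for tree-based models; for linear models one of them is usually dropped to avoid perfectly collinear columns.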


In [787]:
ck.check(full_data_uk_merged)

Number of columns: 64 and rows: 110263

Data types:
settlement_date      datetime64[ns]
settlement_period             int64
nd                            int64
tsd                           int64
is_bank_holiday               int64
                          ...      
day_of_week                   int64
season_Autumn                 int64
season_Spring                 int64
season_Summer                 int64
season_Winter                 int64
Length: 64, dtype: object

Unique values count:
settlement_date      110263
settlement_period        48
nd                    26012
tsd                   25176
is_bank_holiday           2
                      ...  
day_of_week               7
season_Autumn             2
season_Spring             2
season_Summer             2
season_Winter             2
Length: 64, dtype: int64

These columns appear to be categorical (less than 20 unique values):
Index(['is_bank_holiday', 'year', 'month', 'day_of_week', 'season_Autumn',
       'season_Spring', 's

### Saving Data

In [788]:
# Saving the data to a CSV file
#full_data_uk_merged.to_csv(os.path.join(file_path, "full_data_uk_merged_without_price.csv"), index=False)
#print("Data saved to full_data_uk_merged_without_price.csv")

In [789]:

data_uk_mergedshifted_full.to_csv(os.path.join(file_path, "full_data_uk_merged_without_price_ml_2_week.csv"), index=False)
print("Data saved to full_data_uk_merged_without_price_ml_2_week.csv")


Data saved to full_data_uk_merged_without_price_ml_2_week.csv
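
CSV does not store dtypes, so `settlement_date` comes back as plain strings unless it is parsed on load. A minimal round-trip sketch on a toy frame, written to a temporary directory rather than the project's `../data` folder:

```python
import os
import tempfile

import pandas as pd

toy = pd.DataFrame({
    "settlement_date": pd.to_datetime(["2019-01-28 23:30:00", "2019-01-29 00:00:00"]),
    "settlement_period": [48, 1],
    "nd": [29402, 29041],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.csv")
    toy.to_csv(path, index=False)
    # parse_dates restores the datetime dtype that CSV cannot store
    reloaded = pd.read_csv(path, parse_dates=["settlement_date"])

print(reloaded.dtypes)
```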


In [790]:
data_uk_mergedshifted_full.head()

Unnamed: 0,settlement_date,settlement_period,nd,is_bank_holiday,nd_rolling_3_hours_for_2_week,nd_rolling_std_3_hours_for_2_week,tsd_rolling_3_hours_for_2_week,tsd_rolling_std_3_hours_for_2_week,nd_rolling_1_hours_for_2_week,nd_rolling_std_1_hours_for_2_week,...,direct_normal_irradiance_w/m²_lag_2_weeks,diffuse_radiation_w/m²_lag_2_weeks,year,day,month,day_of_week,season_Autumn,season_Spring,season_Summer,season_Winter
0,2019-01-28 23:30:00,48,29402,0,31266.833333,3993.244867,32476.166667,3857.737338,27109.5,1038.739862,...,0.0,0.0,2019,28,1,1,0,0,0,1
1,2019-01-29 00:00:00,1,29041,0,29438.5,3355.971677,30706.0,3293.081718,26155.0,311.126984,...,0.0,0.0,2019,29,1,2,0,0,0,1
2,2019-01-29 00:30:00,2,29560,0,28091.666667,2461.10444,29388.666667,2497.164045,26135.0,282.842712,...,0.0,0.0,2019,29,1,2,0,0,0,1
3,2019-01-29 01:00:00,3,29424,0,27070.833333,1581.604428,28402.0,1543.215345,26152.5,258.093975,...,0.0,0.0,2019,29,1,2,0,0,0,1
4,2019-01-29 01:30:00,4,28844,0,26287.833333,859.88916,27755.833333,601.221562,25619.0,496.38896,...,0.0,0.0,2019,29,1,2,0,0,0,1


In [791]:
null_al = pd.DataFrame(data_uk_mergedshifted_full.isnull().sum())
null_al.head(1000)

Unnamed: 0,0
settlement_date,0
settlement_period,0
nd,0
is_bank_holiday,0
nd_rolling_3_hours_for_2_week,0
...,...
day_of_week,0
season_Autumn,0
season_Spring,0
season_Summer,0
