# **Forecasting the Impact of Climate Change on Great Britain’s Offshore Wind Generation Using Deep Learning**

### Group 3:
**Candidate Codes:** (KXBS7, PNXF3, NQMJ9, NCJB4, MMVQ8, KVQS4)

## **Table of Contents**

##### 1. Introduction
##### 2. Data Sources
##### 3. Data Extraction
##### 4. Data Preprocessing
##### 5. Baseline Modelling
##### 6. Model Development and Evaluation
##### 7. Hyperparameter Tuning
##### 8. Model Application

## **Introduction**
#### Climate change is exerting a significant influence on global weather patterns, as evidenced by the increasing frequency of heatwaves, wildfires, and other extreme weather events. As renewable energy has become central to decarbonization strategies within the power sector, it is critical to understand the long-term impacts of climate change on the performance of renewable energy assets, particularly given that changes in climate patterns can directly affect their energy output.

#### The trajectory of future climate conditions depends largely on present-day energy policies and the rate of technological advancement. Several climate models have been developed to project future scenarios based on varying assumptions. Among these, the Representative Concentration Pathways (RCPs), developed by the climate science community, provide standardized scenarios that relate different policy choices to levels of radiative forcing.

#### The aim of this study is to evaluate the impacts of climate change on offshore wind generation in Great Britain under different RCP scenarios. This study focuses on offshore wind in particular as it constitutes a major component of the renewable energy mix in Great Britain. Offshore wind farms have experienced rapid development due to the region’s favourable wind conditions and the advantages offshore sites offer in terms of consistent wind speeds.

#### A deep learning approach is applied in conducting this study under two specific scenarios:

1. #### **Representative Concentration Pathway (RCP) 8.5: a high-emission, business-as-usual scenario (equivalent to shared socio-economic pathway SSP5-8.5)**
##### Often called the "fossil-fuelled development" (business-as-usual) pathway, this scenario is characterized by little to no climate mitigation effort, increased use of carbon-intensive fossil fuels driven by rapid economic growth and energy demand, and continually increasing greenhouse gas (GHG) emissions, leading to a radiative forcing of 8.5 W/m² by 2100 (hence the "8.5" in the scenario name) and 4 degrees or more of warming by 2100, which is incompatible with the Paris Agreement.

2. #### **Representative Concentration Pathway (RCP) 2.6: a low-emission, best-case scenario (equivalent to shared socio-economic pathway SSP1-2.6).**
##### Often called the "sustainability" pathway, this scenario is characterized by strong climate policies, a reduction in the carbon intensity of fuels and a sharp decline in greenhouse gas (GHG) emissions, leading to a radiative forcing of 2.6 W/m² by 2100 (hence the "2.6" in the scenario name) and 1.5-2 degrees of warming by 2100, consistent with the Paris Agreement.

#### The findings will enable policymakers and stakeholders to better evaluate the resilience of the United Kingdom’s offshore wind strategy and to identify necessary contingencies to maintain long-term energy security.

___


### **Project Aim**
To investigate the impact of climate change on Great Britain's offshore wind generation by developing a deep learning-based forecasting model under the RCP 2.6 and 8.5 scenarios. This analysis is conducted at the wind farm level to extract granular and actionable insights.

### **Objectives**
1. **Data Collection and Preprocessing:**  
Gather and preprocess granular wind farm-level data on offshore wind generation, historical meteorological variables, and future climate projections.

2. **Exploratory Data Analysis:**  
Analyze wind farm-specific data to identify key factors (local wind speeds, seasonal trends, and climate variations) influencing offshore wind generation.

3. **Model Development:**  
Develop a deep learning model capable of forecasting wind farm-level generation based on historical wind generation and climate data.

4. **Scenario-Based Forecasting:**  
Forecast future wind generation at individual wind farm sites under RCP 8.5 and RCP 2.6 climate scenarios using climate data projections.

5. **Impact Analysis:**  
Analyze the impact of climate change on individual wind farms, examining how different climate scenarios affect regional performance and the overall capacity of the offshore wind sector in Great Britain.

## **Methodology**
The approach this study takes is to train a deep learning model to predict wind generation based on historical wind generation and historical climate data. Then, the model is used to predict future wind generation under the RCP 2.6 and 8.5 scenarios using projected climate data for the respective scenarios.

The image below shows the detailed methodology.

![Methodology](Methodology.png)

## **Data Sources**
1. **Elexon** - Wind generation data per wind farm, from 2019 - 2024
2. **Climate Data Store, ERA5** - Historical climate data, 2019 - 2024
3. **Climate Data Store, CMIP5, CMIP6** - Projected climate data under RCP 2.6 and RCP 8.5 scenarios, 2025 - 2045

## **Data Extraction**


### **Historical Climate Data Extraction**

The historical climate data is extracted from the ERA5 reanalysis dataset on the Climate Data Store (CDS).

Data is extracted for 2019-2024, as that is the period for which historical wind generation data is extracted.

Note that some wind farms are registered as multiple Balancing Mechanism Units (BMUs).

In [17]:
# Import the required packages 
# If needed, first install the packages using "!pip install package_name" (replace package_name with the name of the package)
import cdsapi
import numpy as np
import pandas as pd
import requests
import csv
from io import StringIO
from datetime import datetime, timedelta
import os

In [18]:
# Import the csv file containing all wind farm BMUs & locations
bmu_data_analysed = pd.read_csv("https://raw.githubusercontent.com/OkeMoyo/BENV0148_Group_3/main/Analysis%20of%20BMU%20not%20collecting.csv",
encoding="ISO-8859-1")

In [19]:
# Check the number of wind farms in the data
print("There are ", len(bmu_data_analysed), "total BMUs in the dataset.")

# Inspect the data
bmu_data_analysed.head()

There are  66 total BMUs in the dataset.


Unnamed: 0.2,Unnamed: 0.1,Common Name,Settlement BMU ID,Data Retrieved?,REPD ID (New),Unnamed: 0,Ref ID,Operator (or Applicant),Site Name,Technology Type,...,Development Status (short),County,Region,Country,X-coordinate,Y-coordinate,Planning Permission Granted,Under Construction,Operational,Data Retrieved?.1
0,0,Burbo Bank Offshore Wind Farm,E_BURBO,YES,2539,2061,2539,Orsted (formerly Dong Energy),Burbo Bank Extension (Burbo Bank 2),Wind Offshore,...,Operational,Offshore,Offshore,England,315815,398892,10/12/2014,10/06/2016,27/04/2017,
1,0,Burbo Bank Offshore Wind Farm,T_BRBEO-1,YES,2539,2061,2539,Orsted (formerly Dong Energy),Burbo Bank Extension (Burbo Bank 2),Wind Offshore,...,Operational,Offshore,Offshore,England,315815,398892,10/12/2014,10/06/2016,27/04/2017,
2,2,Dudgeon Offshore Wind Farm,T_DDGNO-1,YES,2538,2060,2538,Statoil / Statkraft,Dudgeon East,Wind Offshore,...,Operational,Offshore,Offshore,England,575000,361000,06/07/2012,17/03/2016,15/10/2017,
3,2,Dudgeon Offshore Wind Farm,T_DDGNO-2,YES,2538,2060,2538,Statoil / Statkraft,Dudgeon East,Wind Offshore,...,Operational,Offshore,Offshore,England,575000,361000,06/07/2012,17/03/2016,15/10/2017,
4,2,Dudgeon Offshore Wind Farm,T_DDGNO-3,YES,2538,2060,2538,Statoil / Statkraft,Dudgeon East,Wind Offshore,...,Operational,Offshore,Offshore,England,575000,361000,06/07/2012,17/03/2016,15/10/2017,


In [20]:
# Filter out the wind farms whose data were successfully extracted
bmu_data_filtered = bmu_data_analysed[bmu_data_analysed["Data Retrieved?"] == "YES"]
print(len(bmu_data_filtered), "of", len(bmu_data_analysed),"BMUs' data were successfully extracted from Elexon.")
bmu_data_filtered.head()

53 of 66 BMUs' data were successfully extracted from Elexon.


Unnamed: 0.2,Unnamed: 0.1,Common Name,Settlement BMU ID,Data Retrieved?,REPD ID (New),Unnamed: 0,Ref ID,Operator (or Applicant),Site Name,Technology Type,...,Development Status (short),County,Region,Country,X-coordinate,Y-coordinate,Planning Permission Granted,Under Construction,Operational,Data Retrieved?.1
0,0,Burbo Bank Offshore Wind Farm,E_BURBO,YES,2539,2061,2539,Orsted (formerly Dong Energy),Burbo Bank Extension (Burbo Bank 2),Wind Offshore,...,Operational,Offshore,Offshore,England,315815,398892,10/12/2014,10/06/2016,27/04/2017,
1,0,Burbo Bank Offshore Wind Farm,T_BRBEO-1,YES,2539,2061,2539,Orsted (formerly Dong Energy),Burbo Bank Extension (Burbo Bank 2),Wind Offshore,...,Operational,Offshore,Offshore,England,315815,398892,10/12/2014,10/06/2016,27/04/2017,
2,2,Dudgeon Offshore Wind Farm,T_DDGNO-1,YES,2538,2060,2538,Statoil / Statkraft,Dudgeon East,Wind Offshore,...,Operational,Offshore,Offshore,England,575000,361000,06/07/2012,17/03/2016,15/10/2017,
3,2,Dudgeon Offshore Wind Farm,T_DDGNO-2,YES,2538,2060,2538,Statoil / Statkraft,Dudgeon East,Wind Offshore,...,Operational,Offshore,Offshore,England,575000,361000,06/07/2012,17/03/2016,15/10/2017,
4,2,Dudgeon Offshore Wind Farm,T_DDGNO-3,YES,2538,2060,2538,Statoil / Statkraft,Dudgeon East,Wind Offshore,...,Operational,Offshore,Offshore,England,575000,361000,06/07/2012,17/03/2016,15/10/2017,


In [21]:
# The datasets contain 53 BMUs, but how many of them are unique wind farms?
unique_names = bmu_data_filtered["Common Name"].unique()
print("There are", len(unique_names), "unique wind farms in the dataset")

There are 21 unique wind farms in the dataset


In [22]:
# Inspect the array of unique wind farms
unique_names

array(['Burbo Bank Offshore Wind Farm', 'Dudgeon Offshore Wind Farm',
       'Galloper Offshore Wind Farm',
       'Greater Gabbard Offshore Wind Farm',
       'Gwynt y Mor Offshore Wind Farm', 'Humber Offshore Wind Farm',
       'Lincs Offshore Wind Farm', 'London Array Wind Farm',
       'Ormonde Offshore Wind Farm', 'Race Bank Offshore Wind Farm',
       'Rampion Offshore Wind Farm', 'Sheringham Shoals Wind Farm',
       'Thanet Offshore Wind Farm', 'Walney Offshore Wind Farm',
       'Westermost Rough Wind Farm',
       'West of Duddon Sands Offshore Wind Farm',
       'Aberdeen Offshore Wind Farm', 'Beatrice Offshore Wind Farm',
       'East Anglia Offshore Wind Farm', 'Hornsea Offshore Wind Farm',
       'Triton Knoll Offshore Wind Farm'], dtype=object)

There are 21 unique offshore wind farms in the dataset for which data could be extracted from Elexon. This volume of data is acceptable for this project, as not all wind farms participate in the UK electricity balancing market.

The x and y coordinates of the wind farm locations in the csv file are in the British National Grid (OSGB36) coordinate system, representing eastings (X) and northings (Y), respectively.

In [None]:
# Extract the x and y coordinates for each unique wind farm in the filtered dataset

for name in unique_names:
    # Get the first occurrence of this Common Name
    row = bmu_data_filtered[bmu_data_filtered["Common Name"] == name].iloc[0]
    
    # Extract coordinates
    x_coord = row['X-coordinate']
    y_coord = row['Y-coordinate']
    
    # Print the result
    print(f"{name} is located at: X = {x_coord} eastings, Y = {y_coord} northings")

To extract the historical climate data for each unique wind farm, the coordinates are transformed from the BNG system to the WGS84 system for compatibility with the requirements of the climate data store. A function is created to generate bounding boxes specifying the location of the wind farms in the WGS84 coordinate system to enable data extraction from the climate data store.

In [12]:
# bounding box function that creates bounding box depending on windfarm location:
from pyproj import Transformer
# Create transformer: BNG (EPSG:27700) → WGS84 (EPSG:4326)
transformer = Transformer.from_crs("epsg:27700", "epsg:4326", always_xy=True)

def create_bounding_box(x_coord, y_coord,resolution):
   
    # Transform to (lon, lat)
    lon, lat = transformer.transform(x_coord, y_coord)

    # Create a box centered on the point, one grid cell wide at the given resolution (in degrees)
    # Return box as [North, West, South, East]
    return [
        lat + (resolution/2),  # North
        lon - (resolution/2),  # West
        lat - (resolution/2),  # South
        lon + (resolution/2)   # East
    ]
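The box arithmetic in `create_bounding_box` can be checked independently of the coordinate transformation. The sketch below assumes the point is already expressed in WGS84 longitude and latitude; `box_from_lonlat` is a hypothetical helper introduced only for illustration, and the point used is not a real wind farm location.

```python
def box_from_lonlat(lon, lat, resolution):
    """Return [North, West, South, East] for a square box centered on (lon, lat)."""
    half = resolution / 2
    return [lat + half, lon - half, lat - half, lon + half]

# Illustrative point with a 0.25-degree box
box = box_from_lonlat(lon=-2.0, lat=57.0, resolution=0.25)
north, west, south, east = box
print(box)  # [57.125, -2.125, 56.875, -1.875]
```

The [North, West, South, East] ordering matches the list returned by the function above, which is what is passed as the `area` entry of the request.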

References for spatial resolution used above:

https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation#heading-Spatialgrid

https://confluence.ecmwf.int/display/CKB/ERA5%3A+What+is+the+spatial+reference


In [None]:
# Extract the historical climate data (2019-2024) for each wind farm using the climate data store api and the bounding box function
# Change all file paths below to your preferred directory

# Directory in which to store the downloaded data
output_dir = "C:/Users/Oke/Documents/UCL MSc ESDA/ESDA_Term 2/BENV0148 Advanced Machine Learning/BENV0148 Coursework/ERA5 Historical Data"

# Create the output directory if it does not already exist
os.makedirs(output_dir, exist_ok=True)

# Dataset name
dataset = "reanalysis-era5-single-levels"

#Create client object
client = cdsapi.Client()

# Base request template
base_request = {
    "product_type": ["reanalysis"],
    "variable": [
        "10m_u_component_of_wind",
        "10m_v_component_of_wind",
        "2m_dewpoint_temperature",
        "2m_temperature",
        "surface_pressure"
    ],
    "year": ["2019", "2020", "2021", "2022", "2023", "2024"],
    "month": [
        "01", "02", "03", "04", "05", "06",
        "07", "08", "09", "10", "11", "12"
    ],
    "day": [
        "01", "02", "03", "04", "05", "06",
        "07", "08", "09", "10", "11", "12",
        "13", "14", "15", "16", "17", "18",
        "19", "20", "21", "22", "23", "24",
        "25", "26", "27", "28", "29", "30", "31"
    ],
    "time": [
        "00:00", "01:00", "02:00", "03:00", "04:00", "05:00",
        "06:00", "07:00", "08:00", "09:00", "10:00", "11:00",
        "12:00", "13:00", "14:00", "15:00", "16:00", "17:00",
        "18:00", "19:00", "20:00", "21:00", "22:00", "23:00"
    ],
    "format": "grib" 
}

# Process each unique wind farm
for farm_name in unique_names:
    # Get the first row with this farm name
    row = bmu_data_filtered[bmu_data_filtered['Common Name'] == farm_name].iloc[0]
    
    # Extract coordinates
    x_coord = row['X-coordinate']
    y_coord = row['Y-coordinate']
    
    # Skip if coordinates are missing
    if pd.isna(x_coord) or pd.isna(y_coord):
        print(f"Skipping {farm_name} due to missing coordinates")
        continue
    
    print(f"Processing {farm_name} at coordinates ({x_coord}, {y_coord})")
    
    # Create a bounding box at ERA5 native resolution
    area_box = create_bounding_box(x_coord, y_coord, 0.28125) # ERA5 resolution is ~0.28125 degrees
    print(f"  ERA5 bounding box: {area_box}")
    
    # Create a custom request for this farm
    farm_request = base_request.copy()
    farm_request["area"] = area_box
    
    # Set output filename - clean up farm name for file safety
    safe_farm_name = ''.join(c if c.isalnum() else '_' for c in farm_name)
    target = os.path.join(output_dir, f"{safe_farm_name}.grib")
    
    try:
        print(f"Downloading ERA5 data for {farm_name}...")
        client.retrieve(dataset, farm_request).download(target)
        print(f"Successfully downloaded data to {target}")
    except Exception as e:
        print(f"Error downloading data for {farm_name}: {str(e)}")

print("Processing complete!")
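Each per-farm request in the cell above is built from `base_request` with `dict.copy()`, which is a shallow copy. Adding the top-level `area` key is safe, but mutating a nested list (for example, appending to `variable`) would also change the template. A minimal sketch of the difference, using a cut-down hypothetical template rather than the full request:

```python
import copy

# Cut-down stand-in for base_request (hypothetical, for illustration only)
template = {"variable": ["10m_u_component_of_wind"], "year": ["2019"]}

# Shallow copy: new top-level keys stay local, nested lists are shared
shallow = template.copy()
shallow["area"] = [57.1, -2.2, 56.9, -1.9]
shallow["variable"].append("2m_temperature")   # this also changes template

# Deep copy: nested lists are duplicated, so the template is untouched
deep = copy.deepcopy(template)
deep["variable"].append("surface_pressure")
```

If a later version of the loop needs to alter the variable list per farm, `copy.deepcopy` would be the safer choice.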


### **Projected Climate Data Extraction**

##### 1. Projected climate data from the CMIP6 dataset on CDS is extracted at monthly resolution.
##### 2. For projecting future wind power generation, upsampling techniques may be used to convert the monthly projected climate data to daily data, to match the resolution of the daily historic climate data used.
##### 3. For consistency, the extraction follows the same procedure as for the historic climate data. The only differences are in the base request template, which targets the CMIP6 dataset (projected climate data) instead of ERA5 (historic climate data) and covers the years 2025-2045 at monthly resolution.
##### 4. Since not all climate projection models cover all the regions where GB offshore wind farms are located, a loop scans the candidate models for each wind farm until it finds one with data available for that wind farm's location.
##### 5. The same cell below is used to extract data for both scenarios (ssp1-2.6 and ssp5-8.5). To switch between the two scenarios, the relevant "experiment" line at the top of the cell is uncommented as needed.
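One possible form of the upsampling mentioned in point 2 is linear interpolation in time between monthly values. The sketch below is an assumption about how this could be done, not necessarily the method used in this study, and the values are made up for illustration.

```python
import pandas as pd

# Made-up monthly values stamped at the start of each month
monthly = pd.Series(
    [5.0, 7.0, 6.0],
    index=pd.to_datetime(["2025-01-01", "2025-02-01", "2025-03-01"]),
    name="wind_speed",
)

# Insert the missing daily timestamps as NaN, then interpolate by elapsed time
daily = monthly.asfreq("D").interpolate(method="time")
print(daily.head())
```

Interpolation of this kind produces a smooth daily series; it does not restore the day-to-day variability that is absent from monthly means.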

In [None]:
# UNCOMMENT EITHER OF THE LINES BELOW TO SWITCH BETWEEN SSP126 AND SSP585 DATA

experiment = 'ssp126'
#experiment = 'ssp585'

# Directory in which to store the data; the sub-folder follows the selected experiment
output_dir = os.path.join(r"C:\Users\LGA\Desktop\UCL-ESDA coursework\TERM 2\BENV0148 - Advanced Machine Learning for Energy Systems\Coursework\Data\gitclone\BENV0148_Group_3\Data\projected_climate_data", experiment)
os.makedirs(output_dir, exist_ok=True)
dataset = 'projections-cmip6'

base_request1 = {
                        'format': 'netcdf',
                        'temporal_resolution': 'monthly',
                        'experiment': experiment,
                        'model': 'HadGEM3-GC31-LL',
                        'ensemble_member': 'r1i1p1f1',
                         'variable': [
                                    'near_surface_air_temperature',
                                    'eastward_near_surface_wind',
                                    'northward_near_surface_wind',
                                    'surface_air_pressure',
                                    'specific_humidity'
                                    ],
                        'year': [str(y) for y in range(2025, 2046)],
                        'month': [f'{m:02d}' for m in range(1, 13)],
                    }
c = cdsapi.Client()


# Try a few models and ensemble members
models = ['HadGEM3-GC31-LL', 'CNRM-CM6-1', 'EC-Earth3', 'MIROC6']
ensemble_members = ['r1i1p1f1', 'r2i1p1f1', 'r3i1p1f1']


unique_names = bmu_data_filtered["Common Name"].unique()
for farm_name in unique_names:
    # Get the first row with this farm name
    row = bmu_data_filtered[bmu_data_filtered['Common Name'] == farm_name].iloc[0]
    
    # Extract coordinates
    x_coord = row['X-coordinate']
    y_coord = row['Y-coordinate']

    #Extract BMU-ID
    bmu_id = row['Settlement BMU ID']
    
    # Skip if coordinates are missing
    if pd.isna(x_coord) or pd.isna(y_coord):
        print(f"Skipping {farm_name} due to missing coordinates")
        continue
    
    print(f"Processing {farm_name} at coordinates ({x_coord}, {y_coord})")
    
    # Create a bounding box one CMIP6 grid cell wide
    area_box = create_bounding_box(x_coord, y_coord, 1.25) # CMIP6 grid spacing is taken as 1.25 degrees
    
    # Create a custom request for this farm
    farm_request = base_request1.copy()
    farm_request["area"] = area_box
    
    # Set output filename - clean up farm name for file safety
    safe_farm_name = ''.join(c if c.isalnum() else '_' for c in farm_name)
    target = os.path.join(output_dir, f"{safe_farm_name}.nc")

    #For the given wind farm, try all models and ensemble members until one works
    
    success = False # set a flag to track if a compatible model+ensemble combination is found
    # Loop through models and ensemble members
    for model in models:
        for ensemble in ensemble_members:
            farm_request["model"] = model
            farm_request["ensemble_member"] = ensemble

            try:
                print(f"Trying {model} | {ensemble} for {farm_name}")
                c.retrieve(dataset, farm_request).download(target)
                print(f"✅ Success: {model} | {ensemble}")
                success = True
            except Exception as e:
                print(f"❌ Failed: {model} | {ensemble}")
                print(f"Error downloading data for {farm_name}: {str(e)}")

            if success:
                break #if successful model+ensemble combination is found, move to next wind farm
        if success:  #if successful model+ensemble combination is found, move to next wind farm
            break
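The nested loop with a `success` flag above can also be written as a small function that returns as soon as one combination works. The sketch below is a possible restructuring, not the notebook's code: `first_working_combination` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the CDS download so the example runs without network access.

```python
from itertools import product

def first_working_combination(models, ensemble_members, fetch):
    """Return the first (model, ensemble) pair for which fetch() succeeds, else None."""
    # product() iterates models in the outer position, like the nested loop
    for model, ensemble in product(models, ensemble_members):
        try:
            fetch(model, ensemble)
            return model, ensemble
        except Exception as exc:
            print(f"Failed: {model} | {ensemble}: {exc}")
    return None

# Stand-in for the download: only one combination is "available"
def fake_fetch(model, ensemble):
    if (model, ensemble) != ("CNRM-CM6-1", "r2i1p1f1"):
        raise RuntimeError("no data for this combination")

models = ["HadGEM3-GC31-LL", "CNRM-CM6-1", "EC-Earth3", "MIROC6"]
ensemble_members = ["r1i1p1f1", "r2i1p1f1", "r3i1p1f1"]

result = first_working_combination(models, ensemble_members, fake_fetch)
none_found = first_working_combination(["MIROC6"], ["r3i1p1f1"], fake_fetch)
```

Returning `None` when nothing works makes it easy to record which wind farms ended up without any projected data.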


### DATA PRE-PROCESSING

#### Historical Climate data
The code below reads the yearly GRIB files and concatenates the historic climate data for each wind farm along the time dimension.

In [None]:
import xarray as xr
from collections import defaultdict

base_path = r"C:\Users\LGA\Desktop\UCL-ESDA coursework\TERM 2\BENV0148 - Advanced Machine Learning for Energy Systems\Coursework\Data\gitclone\BENV0148_Group_3\Data\historic_climate_data"

# Dictionary to store each windfarm's datasets
windfarm_data = defaultdict(list)

for year_folder in sorted(os.listdir(base_path)):
    year_path = os.path.join(base_path, year_folder)
    if os.path.isdir(year_path):
        for file in os.listdir(year_path):
            if file.endswith(".grib") and not file.endswith(".grib.5b7b6.idx"):

                file_path = os.path.join(year_path, file)
                print(f"📂 Reading file: {file_path}")  # Log the file being read
                try:
                    ds = xr.open_dataset(file_path, engine="cfgrib")
                    year = int(year_folder.split('_')[-1])  # Extract year
                    ds = ds.assign_coords(Year=year)
                    windfarm_data[file].append(ds)
                except Exception as e:
                    print(f"Could not read {file_path}: {e}")

# Concatenate each windfarm’s datasets
final_windfarm_datasets = {
    name.replace('.grib', '').replace('.grb', ''): xr.concat(datasets, dim="time")
    for name, datasets in windfarm_data.items()
}


#### Converting to CSVs
The cell below converts the GRIB files to CSVs for easy reading. There is no need to run it if the CSVs are already available in the Git repository.

In [None]:
import xarray as xr

base_path = r"C:\Users\LGA\Desktop\UCL-ESDA coursework\TERM 2\BENV0148 - Advanced Machine Learning for Energy Systems\Coursework\Data\gitclone\BENV0148_Group_3\Data\historic_climate_data" # CHANGE THIS TO YOUR FOLDER

for year_folder in sorted(os.listdir(base_path)):
    year_path = os.path.join(base_path, year_folder)

    if os.path.isdir(year_path):
        for file in os.listdir(year_path):
            if file.endswith(".grib") and not file.endswith(".grib.idx"):
                file_path = os.path.join(year_path, file)
                csv_path = file_path.replace(".grib", ".csv")

                if os.path.exists(csv_path):
                    print(f"🔁 Skipping (already converted): {csv_path}")
                    continue

                print(f"📂 Reading: {file_path}")
                try:
                    ds = xr.open_dataset(file_path, engine="cfgrib")
                    df = ds.to_dataframe().reset_index()
                    df.to_csv(csv_path, index=False)
                    print(f"✅ Saved CSV: {csv_path}")
                except Exception as e:
                    print(f"❌ Failed: {file_path}\n   Reason: {e}")


#### Concatenating the CSVs for each wind farm across the years

For each windfarm, we produce a single dataframe of historic climate data from 2019-2023

In [None]:
from collections import defaultdict

base_path = r"C:\Users\LGA\Desktop\UCL-ESDA coursework\TERM 2\BENV0148 - Advanced Machine Learning for Energy Systems\Coursework\Data\gitclone\BENV0148_Group_3\Data\historic_climate_data" # CHANGE THIS TO YOUR FOLDER

# Dictionary to collect dataframes by windfarm
windfarm_csv_data = defaultdict(list)

for year_folder in sorted(os.listdir(base_path)):
    year_path = os.path.join(base_path, year_folder)

    if os.path.isdir(year_path):
        for file in os.listdir(year_path):
            if file.endswith(".csv"):
                file_path = os.path.join(year_path, file)
                windfarm_name = file.replace(".csv", "")

                try:
                    df = pd.read_csv(file_path)
                    df["Year"] = year_folder.split("_")[-1]  # Add year column
                    windfarm_csv_data[windfarm_name].append(df)
                    print(f"📥 Loaded: {file_path}")
                except Exception as e:
                    print(f"❌ Failed to read {file_path}: {e}")

# Now concatenate all years into one dataframe per windfarm
final_windfarm_dfs = {}

for windfarm_name, dfs in windfarm_csv_data.items():
    final_df = pd.concat(dfs, ignore_index=True)
    final_windfarm_dfs[windfarm_name] = final_df
    print(f"✅ Combined data for: {windfarm_name} ({len(final_df)} rows)")


In [None]:
windfarm_names = list(final_windfarm_dfs.keys())
print(windfarm_names)


['Aberdeen_Offshore_Wind_Farm', 'Beatrice_Offshore_Wind_Farm', 'Burbo_Bank_Offshore_Wind_Farm', 'Dudgeon_Offshore_Wind_Farm', 'East_Anglia_Offshore_Wind_Farm', 'Galloper_Offshore_Wind_Farm', 'Greater_Gabbard_Offshore_Wind_Farm', 'Gwynt_y_Mor_Offshore_Wind_Farm', 'Hornsea_Offshore_Wind_Farm', 'Humber_Offshore_Wind_Farm', 'Lincs_Offshore_Wind_Farm', 'London_Array_Wind_Farm', 'Ormonde_Offshore_Wind_Farm', 'Race_Bank_Offshore_Wind_Farm', 'Rampion_Offshore_Wind_Farm', 'Sheringham_Shoals_Wind_Farm', 'Thanet_Offshore_Wind_Farm', 'Triton_Knoll_Offshore_Wind_Farm', 'Walney_Offshore_Wind_Farm', 'Westermost_Rough_Wind_Farm', 'West_of_Duddon_Sands_Offshore_Wind_Farm']


##### Access example - Aberdeen

In [None]:


# Access data for one windfarm
aberdeen_df = final_windfarm_dfs["Aberdeen_Offshore_Wind_Farm"]
aberdeen_df.head(10)




Unnamed: 0,time,latitude,longitude,number,step,surface,valid_time,u10,v10,d2m,t2m,sp,Year
0,2019-01-01 00:00:00,57.256,-2.214,0,0 days,0.0,2019-01-01 00:00:00,7.651512,-4.67992,275.18616,280.91525,101140.03,2019
1,2019-01-01 00:00:00,57.256,-1.963,0,0 days,0.0,2019-01-01 00:00:00,7.86599,-4.702084,275.77222,281.01923,101912.41,2019
2,2019-01-01 00:00:00,57.005,-2.214,0,0 days,0.0,2019-01-01 00:00:00,5.542198,-4.630261,275.3109,281.83804,101401.375,2019
3,2019-01-01 00:00:00,57.005,-1.963,0,0 days,0.0,2019-01-01 00:00:00,5.726341,-4.917637,275.9405,282.00885,102243.47,2019
4,2019-01-01 01:00:00,57.256,-2.214,0,0 days,0.0,2019-01-01 01:00:00,7.496879,-4.67298,275.10327,280.48312,101241.66,2019
5,2019-01-01 01:00:00,57.256,-1.963,0,0 days,0.0,2019-01-01 01:00:00,7.951285,-4.55059,275.5833,280.6814,102010.16,2019
6,2019-01-01 01:00:00,57.005,-2.214,0,0 days,0.0,2019-01-01 01:00:00,5.359855,-3.996467,274.82648,280.88406,101507.06,2019
7,2019-01-01 01:00:00,57.005,-1.963,0,0 days,0.0,2019-01-01 01:00:00,5.824698,-4.053016,275.30615,281.14014,102346.125,2019
8,2019-01-01 02:00:00,57.256,-2.214,0,0 days,0.0,2019-01-01 02:00:00,7.111711,-4.728118,275.13495,280.26086,101357.94,2019
9,2019-01-01 02:00:00,57.256,-1.963,0,0 days,0.0,2019-01-01 02:00:00,7.642045,-4.60559,275.66974,280.54553,102119.875,2019


#### Grouping historic climate data by hour for each windfarm

As seen above, each timestamp of a given wind farm has four rows of climate data, one for each of the four grid points within the bounding box. We therefore group by hour and average the climate variables spatially across these grid points.


In [None]:
hourly_windfarm_dfs = {}

for windfarm_name, df in final_windfarm_dfs.items():
    print(f"⏱️ Aggregating hourly for: {windfarm_name}")

    # Ensure 'time' is in datetime format
    df['time'] = pd.to_datetime(df['time'])

    # Round down to the hour (e.g., 2019-01-01 01:43 → 2019-01-01 01:00)
    df['hour'] = df['time'].dt.floor('H')

    # Group by each hour and take the mean of numeric columns
    hourly_df = df.groupby('hour').mean(numeric_only=True).reset_index()

    # Store result
    hourly_windfarm_dfs[windfarm_name] = hourly_df

    print(f"✅ Aggregated {windfarm_name}: {len(hourly_df)} hourly records")
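As a worked check of the spatial averaging, the sketch below applies the same group-and-mean step to the four Aberdeen grid points at 2019-01-01 00:00 from the table shown earlier. Only a subset of the columns is reproduced here.

```python
import pandas as pd

# Four grid points for one timestamp, values taken from the Aberdeen table above
grid_points = pd.DataFrame({
    "time": pd.to_datetime(["2019-01-01 00:00:00"] * 4),
    "latitude": [57.256, 57.256, 57.005, 57.005],
    "longitude": [-2.214, -1.963, -2.214, -1.963],
    "u10": [7.651512, 7.86599, 5.542198, 5.726341],
    "v10": [-4.67992, -4.702084, -4.630261, -4.917637],
})

# One row per timestamp, each numeric column averaged over the grid points
hourly = grid_points.groupby("time").mean(numeric_only=True).reset_index()
print(hourly)
```

The averaged `u10` is about 6.6965 and the averaged coordinates are (57.1305, -2.0885), which agrees with the first row of the aggregated Aberdeen data.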


##### Example - accessing Aberdeen historical concatenated data

In [44]:
# List available windfarms
print(list(hourly_windfarm_dfs.keys()))

# Access data for one windfarm
aberdeen_df = hourly_windfarm_dfs["Aberdeen_Offshore_Wind_Farm"]
aberdeen_df.head()


['Aberdeen_Offshore_Wind_Farm', 'Beatrice_Offshore_Wind_Farm', 'Burbo_Bank_Offshore_Wind_Farm', 'Dudgeon_Offshore_Wind_Farm', 'East_Anglia_Offshore_Wind_Farm', 'Galloper_Offshore_Wind_Farm', 'Greater_Gabbard_Offshore_Wind_Farm', 'Gwynt_y_Mor_Offshore_Wind_Farm', 'Hornsea_Offshore_Wind_Farm', 'Humber_Offshore_Wind_Farm', 'Lincs_Offshore_Wind_Farm', 'London_Array_Wind_Farm', 'Ormonde_Offshore_Wind_Farm', 'Race_Bank_Offshore_Wind_Farm', 'Rampion_Offshore_Wind_Farm', 'Sheringham_Shoals_Wind_Farm', 'Thanet_Offshore_Wind_Farm', 'Triton_Knoll_Offshore_Wind_Farm', 'Walney_Offshore_Wind_Farm', 'Westermost_Rough_Wind_Farm', 'West_of_Duddon_Sands_Offshore_Wind_Farm']


Unnamed: 0,hour,latitude,longitude,number,surface,u10,v10,d2m,t2m,sp
0,2019-01-01 00:00:00,57.1305,-2.0885,0.0,0.0,6.69651,-4.732475,275.552445,281.445342,101674.32125
1,2019-01-01 01:00:00,57.1305,-2.0885,0.0,0.0,6.658179,-4.318263,275.2048,280.79718,101776.25125
2,2019-01-01 02:00:00,57.1305,-2.0885,0.0,0.0,6.554337,-4.067016,275.217807,280.521988,101890.31375
3,2019-01-01 03:00:00,57.1305,-2.0885,0.0,0.0,6.623036,-3.949683,275.29097,280.336948,101971.28875
4,2019-01-01 04:00:00,57.1305,-2.0885,0.0,0.0,6.699568,-4.152366,275.330078,280.151777,102042.3125


#### Historic wind power generation Data processing
1) The data is aggregated from half-hourly to hourly by rounding each settlement period's end time up to the hour and summing the generation, so the periods ending at 00:30 and 01:00 are both assigned to 01:00.
2) The wind power generated by all the BMUs of a given wind farm is summed for each timestamp.
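The rounding rule in step 1 can be illustrated on a toy series using only the standard library. The readings below are made up, and `ceil_to_hour` is a hypothetical helper that mirrors what a ceiling to the hour does.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def ceil_to_hour(ts):
    """Round a timestamp up to the next full hour (unchanged if already on the hour)."""
    floored = ts.replace(minute=0, second=0, microsecond=0)
    return floored if floored == ts else floored + timedelta(hours=1)

# Made-up half-hourly readings: (settlement period end time, generation)
half_hourly = [
    (datetime(2019, 2, 1, 0, 30), 1.0),
    (datetime(2019, 2, 1, 1, 0), 2.0),
    (datetime(2019, 2, 1, 1, 30), 3.0),
    (datetime(2019, 2, 1, 2, 0), 4.0),
]

# Sum the two settlement periods that end within each hour
hourly_totals = defaultdict(float)
for end_time, quantity in half_hourly:
    hourly_totals[ceil_to_hour(end_time)] += quantity
```

Here the hour ending 01:00 receives 1.0 + 2.0 and the hour ending 02:00 receives 3.0 + 4.0, so each hourly value is labelled by the end of the hour it covers.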

In [None]:
from collections import defaultdict
import re

# Path to your wind generation CSVs
gen_folder = r"C:\Users\LGA\Desktop\UCL-ESDA coursework\TERM 2\BENV0148 - Advanced Machine Learning for Energy Systems\Coursework\Data\gitclone\BENV0148_Group_3\Data\wind_gen_BMU_data"

# Step 1: Prepare a defaultdict to collect dataframes per windfarm
hourly_generation_dfs = defaultdict(list)

# Normalize filenames: strip BMU ID suffix after _T_ or _E_
def normalize_windfarm_name(filename):
    base = re.split(r'_T_|_E_', filename)[0]
    return base.strip().replace(" ", "_")

# Step 2: Read and process each CSV
for file in os.listdir(gen_folder):
    if file.endswith(".csv"):
        file_path = os.path.join(gen_folder, file)
        windfarm_key = normalize_windfarm_name(file)

        try:
            df = pd.read_csv(file_path)
            df["halfHourEndTime"] = pd.to_datetime(df["halfHourEndTime"])

            # Ceil timestamps so 00:30 + 01:00 → 01:00
            df["hour"] = df["halfHourEndTime"].dt.ceil("H")

            # Sum only the "quantity" column
            hourly_df = df.groupby("hour", as_index=False)["quantity"].sum()

            # Optionally keep other info like BMU_ID if needed (commented for now)
            # hourly_df["BMU_ID"] = df.groupby("hour")["BMU_ID"].first().values

            hourly_generation_dfs[windfarm_key].append(hourly_df)

            print(f"✅ Processed {file} under key '{windfarm_key}'")

        except Exception as e:
            print(f"❌ Failed to process {file}: {e}")

# Step 3: Combine BMUs for each windfarm
final_generation_dfs = {}

for windfarm_key, dfs in hourly_generation_dfs.items():
    combined_df = pd.concat(dfs).groupby("hour", as_index=False).sum()
    final_generation_dfs[windfarm_key] = combined_df
    print(f"📦 Final combined generation for: {windfarm_key}")
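The grouping of BMUs into wind farms depends entirely on the file-name rule in `normalize_windfarm_name`. The sketch below repeats that function and applies it to hypothetical file names, assumed here to follow a `<Common Name>_<BMU ID>.csv` pattern; the actual file names in the data folder may differ.

```python
import re

def normalize_windfarm_name(filename):
    # Same rule as in the cell above: keep the part before "_T_" or "_E_"
    base = re.split(r'_T_|_E_', filename)[0]
    return base.strip().replace(" ", "_")

# Hypothetical file names for two BMUs of the same wind farm
examples = [
    "Burbo Bank Offshore Wind Farm_E_BURBO.csv",
    "Burbo Bank Offshore Wind Farm_T_BRBEO-1.csv",
]
keys = {normalize_windfarm_name(name) for name in examples}

# A name without either marker is returned whole, extension included
unmatched = normalize_windfarm_name("Some Wind Farm.csv")
```

Both example files map to the same key, so their generation is summed together. A file whose name contains neither marker keeps its `.csv` suffix in the key and would not match any climate data key in the later merge.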


#### Example - access historic windfarm generation for Aberdeen

In [46]:
final_generation_dfs["Aberdeen_Offshore_Wind_Farm"].head(10)  # Example to check the data for one windfarm

Unnamed: 0,hour,quantity
0,2019-02-01 01:00:00,3.666
1,2019-02-01 02:00:00,4.424
2,2019-02-01 03:00:00,7.314
3,2019-02-01 04:00:00,20.394
4,2019-02-01 05:00:00,24.833
5,2019-02-01 06:00:00,6.744
6,2019-02-01 07:00:00,7.788
7,2019-02-01 08:00:00,5.234
8,2019-02-01 09:00:00,5.109
9,2019-02-01 10:00:00,2.318


### PRODUCING THE FINAL MERGED DATASET
So far we have:

1) hourly_windfarm_dfs: a dictionary of historic hourly climate data for all offshore wind farms for the years 2019-2023

2) final_generation_dfs: a dictionary of historic hourly wind generation data for all offshore wind farms for the years 2019-2023

We want to produce a final "merged_data" dictionary that combines the two for each wind farm, giving a combined climate and generation dataset for modelling or analysis.

In [None]:
merged_data = {}

for windfarm_key, climate_df in hourly_windfarm_dfs.items():
    # Some windfarms may not have matching generation data
    if windfarm_key in final_generation_dfs:
        gen_df = final_generation_dfs[windfarm_key]
        
        merged_df = pd.merge(climate_df, gen_df, on="hour", how="inner")
        merged_data[windfarm_key] = merged_df
        print(f"🔗 Merged: {windfarm_key} ({len(merged_df)} rows)")
    else:
        print(f"⚠️ No generation data for: {windfarm_key}")
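Because the merge above uses `how="inner"`, hours present in only one of the two tables are dropped silently. A minimal sketch of how the dropped rows could be counted, using made-up frames and pandas' `indicator` option; this audit step is a suggestion and is not part of the notebook.

```python
import pandas as pd

climate = pd.DataFrame({
    "hour": pd.to_datetime(["2019-02-01 00:00", "2019-02-01 01:00", "2019-02-01 02:00"]),
    "u10": [3.1, 3.3, 3.2],
})
generation = pd.DataFrame({
    "hour": pd.to_datetime(["2019-02-01 01:00", "2019-02-01 02:00", "2019-02-01 03:00"]),
    "quantity": [3.666, 4.424, 7.314],
})

# Inner merge keeps only hours present in both tables
inner = pd.merge(climate, generation, on="hour", how="inner")

# Outer merge with indicator=True labels where each row came from
audit = pd.merge(climate, generation, on="hour", how="outer", indicator=True)
counts = audit["_merge"].value_counts()
print(counts)
```

In this toy case two hours match, one hour has climate data only and one has generation data only, so the inner merge returns two rows.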


#### Example - Accessing merged data for Aberdeen offshore windfarm

In [48]:
merged_data["Aberdeen_Offshore_Wind_Farm"].head(10)  # Example to check the data for one windfarm

Unnamed: 0,hour,latitude,longitude,number,surface,u10,v10,d2m,t2m,sp,quantity
0,2019-02-01 01:00:00,57.1305,-2.0885,0.0,0.0,3.347781,-1.712159,270.595155,272.952145,98817.3675,3.666
1,2019-02-01 02:00:00,57.1305,-2.0885,0.0,0.0,3.270191,-1.573564,270.255047,272.9292,98824.22625,4.424
2,2019-02-01 03:00:00,57.1305,-2.0885,0.0,0.0,3.253275,-1.333238,269.973372,272.89429,98839.03,7.314
3,2019-02-01 04:00:00,57.1305,-2.0885,0.0,0.0,3.394168,-1.27239,270.094055,273.110963,98855.1875,20.394
4,2019-02-01 05:00:00,57.1305,-2.0885,0.0,0.0,3.441805,-1.288301,270.861803,273.754715,98840.80375,24.833
5,2019-02-01 06:00:00,57.1305,-2.0885,0.0,0.0,3.371131,-1.281039,271.457212,273.948363,98876.45875,6.744
6,2019-02-01 07:00:00,57.1305,-2.0885,0.0,0.0,3.147674,-1.248965,271.934835,274.146568,98878.2175,7.788
7,2019-02-01 08:00:00,57.1305,-2.0885,0.0,0.0,2.852993,-1.166969,271.980537,274.180465,98955.70875,5.234
8,2019-02-01 09:00:00,57.1305,-2.0885,0.0,0.0,2.699424,-1.00167,272.09562,274.414578,98999.29,5.109
9,2019-02-01 10:00:00,57.1305,-2.0885,0.0,0.0,1.982153,-0.623916,271.787375,274.48492,99032.7975,2.318
