## Yield Predictions for German Offshore Tenders 2024 Project Overview
### Python Programming in Energy Science II - Group 5
---
- [PDF Instructions](./data/PPES-SoSe2024_StudentProject.pdf)
- [Github Repository](https://github.com/boujuan/StudentProject-Yield-Predictions-Offshore)

**AUTHORS:**
- [Jiah Ryu](mailto:jiah.ryu@uni-oldenburg.de)
- [Julia Guimaraes Zimmer](mailto:julia.guimaraes.zimmer@uni-oldenburg.de)
- [Pascal Pflüger](mailto:pascal.pflueger@uni-oldenburg.de)
- [Juan Manuel Boullosa Novo](mailto:juan.manuel.boullosa.novo@uni-oldenburg.de)

**DATA:**
- Floating Lidar Measurements from two positions = `'data/measurements/*.nc'`
- Long-term reference model data (ERA5): 1990-2023 = `'data/reanalysis/*.csv'`
- Turbine coordinates of existing wind farms = `'data/turbine-info/coordinates/existing/*.csv'`
- Turbine coordinates of planned wind farms = `'data/turbine-info/coordinates/planned_future/*.csv'`
- Turbine coordinates of wind farms planned to be in operation before 2030 = `'data/turbine-info/coordinates/planned_in_operation_before_2030/*.csv'`
- Turbine coordinates planned in Netherlands = `'data/turbine-info/coordinates/planned_netherlands/*.csv'`
- Geometric turbine coordinates for the areas of interest N-9.1/N-9.2/N-9.3 (not optimized – see Task 12) = `'data/turbine-info/coordinates/area_of_interest/*.csv'`
- Shapefiles of wind farm areas, the countries Denmark, Germany and the Netherlands = `'data/shapefiles/.../*'`
- Thrust and power curves of wind turbines = `'data/turbine-info/power_curves/*.csv'`

---

**TASKS:**
To estimate the short-term wind climate of the three areas of interest, we first look at the planned area. Furthermore, we explore the structure given in the NC files and choose the data we need for further analytics:

1. Import libraries, set up file paths, and open datasets
2. Set turbine design:
   - Hub height
   - Rotor diameter
   - Model, etc.
3. Plot the field of interest together with the lidar measurement buoy positions
4. Explore the structure and variables inside the NC files
5. Decide which data to use for further analytics
6. Select variables of interest
7. Create a dataframe out of them
8. Check for data gaps, NaN values, and duplicated data
9. Filter incorrect data points
10. Select data for only one year, so that buoy 2 and buoy 6 have the same length
11. Interpolate both buoy datasets from the 140 m and 200 m measurement heights to the hub height of 150 m
12. Create a new dataframe for the interpolated data
13. Export it to a CSV file for further use
14. Compare Buoy 2 and 6 to ensure that the data processing didn’t go wrong
15. Plot time series of the wind speed for both met masts
16. Calculate the monthly and annual wind statistics for the wind speed and the wind direction for both met masts
17. Group the data by hour of the day and calculate the mean wind speed and wind direction for each hour
18. Plot the diurnal wind speed and wind direction for both buoys
19. Perform some checks on the grouping
20. Plot wind roses for both buoys
21. Plot a Weibull distribution of the wind speed
22. Calculate the annual power production of one turbine, one field, and the entire farm
23. Plot a power curve 

The correlation should be very high, since the buoy 2 data were corrected with data from buoy 6. The comparison therefore also serves as a test of the preceding steps: if the correlation is not high (R^2 below roughly 0.9), something probably went wrong earlier in the processing.

**REMARK:** Based on `01_Data_and_Windfield_Overview.ipynb`, we decided to work with `buoy_6_measured` and `buoy_2_correlated_with_6`, because buoy 2 had many data gaps (it did not cover a complete measured year).

---

#### Booleans to decide what to plot/compute:

In [None]:
check_environment = 0
plot_wind_farm_data = 0
plot_wind_farm_data_zoomed = 0
netcdf_explore = 0
era5_analyze = 1
foxes_analyze = 1

#### Libraries Import:

In [None]:
if check_environment:
    from checkenv_requirements import check_and_install_packages
    packages_to_check = ['numpy', 'pandas', 'netCDF4', 'matplotlib', 'cartopy']

In [None]:
# import cartopy.crs as ccrs
# import cartopy.feature as cfeature
import matplotlib.pyplot as plt  # needed for the scatter and time-series plots below
# import netCDF4 as nc
# import numpy as np
# import os
import pandas as pd  # needed to build and check the buoy dataframes
# import xarray as xr
import glob
# from cartopy.io.shapereader import Reader
# from matplotlib.projections.polar import PolarAxes
# from scipy.interpolate import interp1d
# from scipy.integrate import quad
from scipy.stats import linregress  # needed for the buoy 2 vs. buoy 6 regression
# from scipy.stats import weibull_min
# from sklearn.linear_model import LinearRegression
# from sklearn.metrics import mean_absolute_error, mean_squared_error
# from sklearn.model_selection import train_test_split
# from sklearn.preprocessing import StandardScaler
# from windrose import WindroseAxes

# Custom libraries
import data_loading
import plotting
import netcdf_exploration
import data_analysis
import era5_analysis
import foxes_analysis

#### File Paths:

In [None]:
# Base paths
measurements_path = 'data/measurements/'
turbine_info_path = 'data/turbine-info/coordinates/'
turbine_power_curves_path = 'data/turbine-info/power_curves/'
shapefiles_path = 'data/shapefiles/'
era5_path = 'data/reanalysis/'

# Buoy NetCDF files
bouy6_path = f'{measurements_path}2023-11-06_Buoy6_BSH_N-9.nc'
bouy2_path = f'{measurements_path}2023-11-09_Buoy2_BSH_N-9.nc'
# Windfarm layout base paths
turbines_existing_path = f'{turbine_info_path}existing/'
turbines_planned_future_path = f'{turbine_info_path}planned_future/'
turbines_planned_in_operation_before_2030_path = f'{turbine_info_path}planned_in_operation_before_2030/'
turbines_planned_netherlands_path = f'{turbine_info_path}planned_netherlands/'
turbines_area_of_interest_path = f'{turbine_info_path}area_of_interest/'
# Countries Shapefiles paths
shapefiles_DEU_path = f'{shapefiles_path}DEU/DEU_adm1.shp'
shapefiles_DNK_path = f'{shapefiles_path}DNK/gadm36_DNK_1.shp'
shapefiles_NLD_path = f'{shapefiles_path}NLD/gadm36_NLD_1.shp'

# Wind field layout files
file_N9_1 = f'{turbines_area_of_interest_path}layout-N-9.1.geom.csv'
file_N9_2 = f'{turbines_area_of_interest_path}layout-N-9.2.geom.csv'
file_N9_3 = f'{turbines_area_of_interest_path}layout-N-9.3.geom.csv'

# Existing turbines
existing_files = glob.glob(f'{turbines_existing_path}*.csv')
# Planned future turbines
planned_future_files = glob.glob(f'{turbines_planned_future_path}*.csv')
# Turbines planned to be in operation before 2030
planned_before_2030_files = glob.glob(f'{turbines_planned_in_operation_before_2030_path}*.csv')
# Planned turbines in the Netherlands
planned_netherlands_files = glob.glob(f'{turbines_planned_netherlands_path}*.csv')


#### 0. Data Loading:

In [None]:
# Load NetCDF buoy datasets
xrbuoy6, xrbuoy2, buoy2_file, buoy6_file = data_loading.datasets(bouy6_path, bouy2_path)

# Load Wind field layout CSV data
data_N9_1, data_N9_2, data_N9_3 = data_loading.csv_files(file_N9_1, file_N9_2, file_N9_3)

# Load other windfarm data
existing_data = data_loading.other_windfarm_data(existing_files)
planned_future_data = data_loading.other_windfarm_data(planned_future_files)
planned_before_2030_data = data_loading.other_windfarm_data(planned_before_2030_files)
planned_netherlands_data = data_loading.other_windfarm_data(planned_netherlands_files)

other_wind_farm_data = existing_data + planned_future_data + planned_before_2030_data + planned_netherlands_data

#### 1. Set the Turbine Design: 
- Reference turbine: International Energy Agency (IEA) 15 MW offshore wind turbine
- Turbine name: IEA-15MW-D240-H150
- Rotor diameter: 240 meters
- Hub height: 150 meters

The measurement height closest to the hub height is therefore 140 m.


#### 2. Plot the field of interest together with the lidar measurement buoy positions:

In [None]:
if plot_wind_farm_data:
    plotting.plot_wind_farms_and_buoys(shapefiles_path, data_N9_1, data_N9_2, data_N9_3, other_wind_farm_data)

In [None]:
if plot_wind_farm_data_zoomed:
    plotting.plot_wind_farms_and_buoys_zoomed(data_N9_1, data_N9_2, data_N9_3)

#### 3. Explore the structure and variables inside the two NetCDF files:

In [None]:
if netcdf_explore:
    netcdf_exploration.overview(buoy2_file)

In [None]:
if netcdf_explore:
    netcdf_exploration.topgroup_variables(buoy2_file,'ZX_LIDAR_WLBZ_2')

In [None]:
if netcdf_explore:
    netcdf_exploration.topgroup_variables(buoy2_file, 'ZX_LIDAR_WLBZ_6_MCP')

In [None]:
if netcdf_explore:
    netcdf_exploration.sub_groups(buoy2_file, 'METEO_WLBZ_2')

#### 4. Decide which Data we use for further analytics:

In [None]:
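# Set variables from netcdf files
time2 = xrbuoy2.variables['time'][:]
windspeed_mcp_buoy2 = buoy2_file.groups['ZX_LIDAR_WLBZ_6_MCP'].variables['wind_speed'][:]
windspeed2 = buoy2_file.groups['ZX_LIDAR_WLBZ_2'].variables['wind_speed'][:]
# NOTE: the wind direction arrays are used in step 7 but are not read anywhere else in this notebook.
# The variable name 'wind_from_direction' is an ASSUMPTION (CF standard name); verify it with netcdf_exploration.
winddirection_buoy_2 = buoy2_file.groups['ZX_LIDAR_WLBZ_2'].variables['wind_from_direction'][:]

time6 = xrbuoy6.variables['time'][:]
windspeed_mcp_buoy6 = buoy6_file.groups['ZX_LIDAR_WLBZ_2_MCP'].variables['wind_speed'][:]
windspeed6 = buoy6_file.groups['ZX_LIDAR_WLBZ_6'].variables['wind_speed'][:]
winddirection_buoy_6 = buoy6_file.groups['ZX_LIDAR_WLBZ_6'].variables['wind_from_direction'][:]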

In [None]:
plotting.plot_buoy_data(time2, windspeed2, time6, windspeed6, windspeed_mcp_buoy2, windspeed_mcp_buoy6)

#### 6. Select variables of interest
- Heights for buoy 6 (m): 14, 42, 94, 140, 200, 250
- Indices of these heights: 0, 1, 2, 3, 4, 5

From here on we only work with the measurements at 140 m and 200 m, which we interpolate to the hub height of 150 m.
That is why we only extract, for example, `windspeed2[:, 0, 0, 3]` and `winddirection_buoy_2[:, 0, 0, 3]` (140 m) together with the corresponding slices at index 4 (200 m): indices 3 and 4 are the two heights of interest.
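
As a minimal sketch, the two indices can also be looked up from the height list instead of being hard-coded. The height list is taken from the values above and is assumed to be sorted in ascending order; the helper name `bracketing_indices` is hypothetical and not part of the project code.

```python
import numpy as np

def bracketing_indices(heights, target_height):
    """Return the indices of the two heights directly below and above target_height."""
    heights = np.asarray(heights, dtype=float)
    # searchsorted gives the position of the first height >= target_height
    upper = int(np.searchsorted(heights, target_height))
    if upper == 0 or upper == len(heights):
        raise ValueError("target height lies outside the measured range")
    return upper - 1, upper

heights_buoy6 = [14, 42, 94, 140, 200, 250]  # measurement heights in metres
lower_idx, upper_idx = bracketing_indices(heights_buoy6, 150)
print(lower_idx, upper_idx)  # 3 4
```

For a hub height of 150 m this returns the indices of the 140 m and 200 m levels, matching the slices used in the next step.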

#### 7. Create a dataframe out of the variables of interest

In [None]:
def create_buoy_dataframes(time, windspeed_140, winddirection_140, windspeed_200, winddirection_200):
    # Convert time variables to pandas datetime (though it's already in datetime64[ns] format)
    time = pd.to_datetime(time, unit='ns', origin='unix')
    
    # Create a single DataFrame for the buoy with measurements at 140m and 200m heights
    df_buoy = pd.DataFrame({
        'time': time,
        'wind_speed_140m': windspeed_140,
        'wind_direction_140m': winddirection_140,
        'wind_speed_200m': windspeed_200,
        'wind_direction_200m': winddirection_200
    }).set_index('time')
    
    return df_buoy

df_buoy_2 = create_buoy_dataframes(
    time2,
    windspeed2[:, 0, 0, 3],
    winddirection_buoy_2[:, 0, 0, 3],
    windspeed2[:, 0, 0, 4],
    winddirection_buoy_2[:, 0, 0, 4]
)

df_buoy_6 = create_buoy_dataframes(
    time6,
    windspeed6[:, 0, 0, 3],
    winddirection_buoy_6[:, 0, 0, 3],
    windspeed6[:, 0, 0, 4],
    winddirection_buoy_6[:, 0, 0, 4]
)

#close the files! 
buoy6_file.close()
buoy2_file.close() 


#### 8. Check for data gaps, NaN values, and duplicated data

In [None]:
def check_data_gaps(dataframe):
    
    dataframe.index = dataframe.index.floor('s')  # Truncate microseconds

    # Round timestamps to the nearest 10 minutes
    dataframe.index = dataframe.index.round('10min')
    
    # Generate a complete time range based on the data frequency
    full_time_range = pd.date_range(start=dataframe.index.min(), end=dataframe.index.max(), freq='10min')
    dataframe = dataframe.reindex(full_time_range)
    
    missing_data = dataframe[dataframe.isnull().any(axis=1)]
    
    total_expected = len(full_time_range)
    total_actual = len(dataframe.dropna())
    availability = (total_actual / total_expected) * 100
    
    print(f"Data Availability is {availability:.2f}%")
    
    if not missing_data.empty:
        print("Missing time periods are:")
        print(missing_data.index)
    else:
        print("No data gaps are found.")
    
    return missing_data

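# Reports rows with identical values; rows that are entirely NaN also count as duplicates of each other.
# Note: the function returns the duplicated rows and does not modify the input dataframe.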
def drop_duplicates(dataframe):
    # Identify duplicate rows
    duplicates = dataframe[dataframe.duplicated(keep=False)]
    
    # Drop duplicates
    no_duplicate_data = dataframe.drop_duplicates()
    
    # Calculate data availability
    total_expected = len(dataframe)
    total_actual = len(no_duplicate_data)
    availability = (total_actual / total_expected) * 100
    
    print(f"Data Availability is {availability:.2f}%")
    
    if not duplicates.empty:
        # Set display options to show the entire DataFrame
        #with pd.option_context('display.max_rows', None, 'display.max_columns', None):
        print(duplicates)
    else:
        print("No duplicates are found.")
    
    return duplicates

def explore_and_prefilter_df(dataframe):
    check_data_gaps(dataframe)
    drop_duplicates(dataframe)

explore_and_prefilter_df(df_buoy_2)
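
One point worth illustrating: `DataFrame.duplicated()` compares the values in each row, not the timestamps in the index. The sketch below uses made-up numbers (not the buoy data) to show the difference between value duplicates and repeated timestamps, assuming that repeated timestamps are the case that matters for a time series.

```python
import pandas as pd

# Synthetic 10-minute series with one repeated timestamp (illustrative values only)
index = pd.to_datetime([
    "2023-01-01 00:00", "2023-01-01 00:10",
    "2023-01-01 00:10", "2023-01-01 00:20",
])
df = pd.DataFrame({"wind_speed_140m": [8.0, 8.5, 9.1, 8.0]}, index=index)

# Row-wise check: flags rows whose values repeat, regardless of timestamp
value_duplicates = df.duplicated(keep=False)

# Index check: flags rows whose timestamp repeats
timestamp_duplicates = df.index.duplicated(keep=False)

print(value_duplicates.tolist())      # [True, False, False, True]
print(timestamp_duplicates.tolist())  # [False, True, True, False]

# Keep the first record for every timestamp
df_unique = df[~df.index.duplicated(keep="first")]
```

Here two different timestamps happen to share the same wind speed and are flagged by the row-wise check, while the repeated timestamp is only caught by the index check.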

#### 9. Filter incorrect data points
#### 10. Select data for only one year, so that buoy 2 and buoy 6 have the same length

In [None]:
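def replace_nan_and_select_1yr(dataframe):
    #dataframe = dataframe.fillna(dataframe.mean())
    # Fill NaN values with the last valid observation
    dataframe = dataframe.ffill()
    # Keep one year of 10-minute samples: 365 days * 144 samples/day = 52560
    dataframe = dataframe.iloc[:52560]
    return dataframe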

filtered_buoy2 = replace_nan_and_select_1yr(df_buoy_2)
#filtered_buoy2

In [None]:
explore_and_prefilter_df(df_buoy_6)
filtered_buoy6 = replace_nan_and_select_1yr(df_buoy_6)
#filtered_buoy6
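
The value 52560 corresponds to 365 days of 10-minute samples. The positional slice only covers exactly one year if the index is a gap-free 10-minute series; as a hedged alternative, the sketch below selects the year by timestamp instead. The data are synthetic and the start date is arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic 10-minute index covering a bit more than one year (illustrative only)
index = pd.date_range("2022-03-03", periods=57000, freq="10min")
df = pd.DataFrame({"wind_speed_140m": np.ones(len(index))}, index=index)

samples_per_year = 365 * 24 * 6  # 6 samples per hour -> 52560
by_position = df.iloc[:samples_per_year]

# Time-based alternative: one year starting at the first timestamp
start = df.index[0]
end = start + pd.Timedelta(days=365) - pd.Timedelta(minutes=10)
by_time = df.loc[start:end]

print(samples_per_year, len(by_position), len(by_time))
```

On a complete series both selections are identical; with missing or repeated timestamps only the time-based slice still spans exactly 365 days.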

#### 11. Interpolate

We use the filtered data to interpolate linearly to the selected hub height of 150 m. Wind speed usually increases non-linearly with height (commonly described by a logarithmic or power-law profile), but the curvature of the profile is small at large heights. Between 140 m and 200 m we therefore assume a linear relationship between the two measurement points.
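
To get a feeling for how much the linear assumption matters, the following sketch compares linear interpolation with a power-law interpolation whose shear exponent is fitted to the same two heights. The wind speeds are made-up example values, and both helper functions are hypothetical illustrations rather than project code.

```python
import math

def linear_interp(v1, v2, h1, h2, h):
    """Linear interpolation of wind speed between heights h1 and h2."""
    return v1 + (h - h1) / (h2 - h1) * (v2 - v1)

def power_law_interp(v1, v2, h1, h2, h):
    """Power-law interpolation: fit the shear exponent from the two heights."""
    alpha = math.log(v2 / v1) / math.log(h2 / h1)
    return v1 * (h / h1) ** alpha

# Made-up example speeds in m/s at 140 m and 200 m
v140, v200 = 10.0, 10.5
v150_linear = linear_interp(v140, v200, 140, 200, 150)
v150_power = power_law_interp(v140, v200, 140, 200, 150)
print(round(v150_linear, 3), round(v150_power, 3))
```

For these example values the two estimates differ by roughly 0.01 m/s, which supports treating the profile as linear over this narrow height range.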

#### 12. Create a new dataframe for the interpolated data

In [None]:
def interpolate_arrays(array1, array2, height1, height2, target_height):
    # Interpolation factor: (x - x1) / (x2 - x1)
    factor = (target_height - height1) / (height2 - height1)
    # Linear interpolation: y1 + factor * (y2 - y1)
    interpolated_array = array1 + factor * (array2 - array1)

    return interpolated_array

ws6_150m = interpolate_arrays(filtered_buoy6['wind_speed_140m'], filtered_buoy6['wind_speed_200m'], 140, 200, 150)
wd6_150m = interpolate_arrays(filtered_buoy6['wind_direction_140m'], filtered_buoy6['wind_direction_200m'], 140, 200, 150)
ws2_150m = interpolate_arrays(filtered_buoy2['wind_speed_140m'], filtered_buoy2['wind_speed_200m'], 140, 200, 150)
wd2_150m = interpolate_arrays(filtered_buoy2['wind_direction_140m'], filtered_buoy2['wind_direction_200m'], 140, 200, 150)

# Create a new dataframe with the interpolated arrays
df_interpol_height = pd.DataFrame({
    'ws6_150m': ws6_150m,
    'wd6_150m': wd6_150m,
    'ws2_150m': ws2_150m,
    'wd2_150m': wd2_150m
})
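
One caveat of applying plain linear interpolation to wind direction is the wrap-around at 360°: interpolating between 350° and 10° linearly gives a value near 293° instead of one close to north. The sketch below shows a possible alternative that interpolates along the shortest arc. The function `interpolate_direction` is a hypothetical helper, not part of the project code.

```python
import numpy as np

def interpolate_direction(dir1_deg, dir2_deg, height1, height2, target_height):
    """Interpolate wind direction along the shortest arc between two heights."""
    dir1 = np.asarray(dir1_deg, dtype=float)
    dir2 = np.asarray(dir2_deg, dtype=float)
    factor = (target_height - height1) / (height2 - height1)
    # Signed angular difference wrapped into [-180, 180)
    delta = (dir2 - dir1 + 180.0) % 360.0 - 180.0
    return (dir1 + factor * delta) % 360.0

# Plain linear interpolation across the 0/360 boundary
naive = 350.0 + (150 - 140) / (200 - 140) * (10.0 - 350.0)
# Shortest-arc interpolation for the same pair of directions
circular = interpolate_direction(350.0, 10.0, 140, 200, 150)
print(round(naive, 1), round(float(circular), 1))  # 293.3 353.3
```

The function also accepts NumPy arrays or pandas Series, so it could be used in place of `interpolate_arrays` for the two wind direction columns if the wrap-around turns out to matter for the data.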

#### 13. Export the new dataframe and save it for 03_Short_Term_Wind_Climate

In [None]:
df_interpol_height.to_csv('interpolated_ws_and_wd_for_150_m.csv', index=True)
df_interpol_height


#### 14. Compare Buoy 2 and 6 to ensure that the data processing didn’t go wrong
- R^2 = 1: Indicates a perfect fit, meaning that the regression line explains 100% of the variance in the dependent variable.
- 0.9 ≤ R^2 < 1: Indicates an excellent fit, suggesting that the model explains a very high proportion of the variance.
- 0.7 ≤ R^2 < 0.9: Indicates a good fit, suggesting that the model explains a substantial proportion of the variance.
- 0.5 ≤ R^2 < 0.7: Indicates a moderate fit, meaning the model explains a reasonable amount of the variance, but there is still significant unexplained variance.
- R^2 < 0.5: Indicates a poor fit, suggesting that the model does not explain much of the variance in the dependent variable.
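
As a small illustration of where such a value comes from, the sketch below computes R^2 for two synthetic wind speed series. For a simple linear regression, R^2 equals the squared Pearson correlation coefficient, so `np.corrcoef` is used here; this is the same quantity as `r_value**2` from `linregress`. The data are randomly generated and only stand in for the buoy series.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic wind speeds: a reference series and a noisy copy (illustrative only)
ws_reference = 10.0 * rng.weibull(2.0, size=1000)
ws_correlated = ws_reference + rng.normal(0.0, 0.5, size=1000)

# Squared Pearson correlation = R^2 of a simple linear regression
r_value = np.corrcoef(ws_reference, ws_correlated)[0, 1]
r_squared = r_value ** 2
print(f"R^2 = {r_squared:.3f}")
```

With this low noise level the result falls into the "excellent fit" band of the list above.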

In [None]:
def plot_scatter_with_regression(x, y, xlabel, ylabel, title):
    # Perform linear regression
    slope, intercept, r_value, p_value, std_err = linregress(x, y)

    # Create scatter plot
    plt.figure(figsize=(10, 6))
    plt.scatter(x, y, label='Data points') #s=1 change scatter point size 

    # Plot regression line
    plt.plot(x, slope * x + intercept, color='red', label=f'Regression line (R^2 = {r_value**2:.2f})')

    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    plt.legend()
    plt.show()

# Scatter plot for wind speed for 52560 ten-minute intervals = one year
plot_scatter_with_regression(df_interpol_height['ws6_150m'], df_interpol_height['ws2_150m'], 'ws6_150m', 'ws2_150m', 'Wind Speed Comparison at 150m') 
# Scatter plot for wind direction
plot_scatter_with_regression(df_interpol_height['wd6_150m'], df_interpol_height['wd2_150m'], 'wd6_150m', 'wd2_150m', 'Wind Direction Comparison at 150m')


#### 15. Plot time series of the wind speed for both met masts

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 3), sharey=True, sharex=True)

axes[0].plot(filtered_buoy6.index, ws6_150m)
axes[0].set_title('Wind Speed for interpolated 150 m at Buoy 6')
axes[0].set_xlabel('Time')
axes[0].set_ylabel('Wind Speed (m/s)')

axes[1].plot(filtered_buoy2.index, ws2_150m)
axes[1].set_title('Wind Speed for interpolated 150 m at Buoy 2')
axes[1].set_xlabel('Time')
axes[1].set_ylabel('Wind Speed (m/s)')


# Adjust layout to prevent overlap
plt.tight_layout()

# Display the plots
plt.show()

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 3), sharey=True, sharex=True)

axes[0].plot(filtered_buoy6.index[300:600], ws6_150m[300:600])
axes[0].set_title('Wind Speed for interpolated 150 m at Buoy 6')
axes[0].set_xlabel('Time')
axes[0].set_ylabel('Wind Speed (m/s)')
axes[0].tick_params(axis='x', rotation=45)  

axes[1].plot(filtered_buoy2.index[300:600], ws2_150m[300:600])
axes[1].set_title('Wind Speed for interpolated 150 m at Buoy 2')
axes[1].set_xlabel('Time')
axes[1].set_ylabel('Wind Speed (m/s)')
axes[1].tick_params(axis='x', rotation=45)  


# Adjust layout to prevent overlap
plt.tight_layout()

# Display the plots
plt.show()

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 3), sharey=True, sharex=True)

# Plotting with dots using scatter
axes[0].scatter(filtered_buoy6.index[300:600], ws6_150m[300:600], marker='+')
axes[0].set_title('Wind Speed for interpolated 150 m at Buoy 6')
axes[0].set_xlabel('Time')
axes[0].set_ylabel('Wind Speed (m/s)')
axes[0].tick_params(axis='x', rotation=45)  

axes[1].scatter(filtered_buoy2.index[300:600], ws2_150m[300:600], marker='+')
axes[1].set_title('Wind Speed for interpolated 150 m at Buoy 2')
axes[1].set_xlabel('Time')
axes[1].set_ylabel('Wind Speed (m/s)')
axes[1].tick_params(axis='x', rotation=45)  

# Adjust layout to prevent overlap
plt.tight_layout()

# Display the plots
plt.show()

In [None]:
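# Close all datasets again!

xrbuoy6.close()
xrbuoy2.close()

# The NetCDF files were already closed after step 7; closing them a second time can raise an error,
# so close them only if they are still open (assumes netCDF4.Dataset objects)
for nc_file in (buoy2_file, buoy6_file):
    if nc_file.isopen():
        nc_file.close()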