<h1 style="text-align: center;" markdown="1">Predicting Power Output in Large-Scale Wave Energy Farms</h1> 
<h2 style="text-align: center;" markdown="2">Optimisation of large wave farms using a multi strategy evolutionary framework</h2>
<h3 style="text-align: center;" markdown="3">Kevin Obote (ADM: 190696)</h3>

## Table of Contents
1. [Introduction](#introduction)
2. [Methodology, Results and Discussion](#methodology-results-and-discussion)
   1. [Data Description](#data-description)
   2. [Exploratory Data Analysis (EDA)](#exploratory-data-analysis-eda)
   3. [Data Cleaning/Pre-treatment](#data-cleaning-pre-treatment)
   4. [Predictive Data Analytics](#predictive-data-analytics)
3. [Conclusion](#conclusion)
4. [References](#references)

---

## Introduction

### Background
Wave energy is a rapidly advancing renewable energy source that harnesses the power of ocean waves to generate electricity. It holds great promise in addressing global challenges such as climate change and energy security. However, optimizing the energy output of large wave farms is a complex problem. Hydrodynamic interactions between wave energy converters (WECs) mean that each converter's output depends on the positions of the others, which makes the calculations computationally expensive. Developing efficient and accurate models to predict the power output of wave farms is therefore crucial for the advancement of this technology.

### Research Problem
The primary research problem addressed in this project is the optimization of energy output in large-scale wave farms. The dataset used in this study consists of configurations involving 49 and 100 WECs, along with their power outputs and related variables. The challenge lies in accurately predicting the total power output of the wave farm based on these configurations. This requires overcoming the computational difficulties associated with the interactions between multiple WECs, which impact the overall efficiency of the wave farm.

### Objectives
1. **Develop a Predictive Model**: Create a machine learning model to accurately estimate the total power output of large-scale wave farms based on WEC configurations.
2. **Analyze Key Features**: Identify and analyze the most significant features that influence the power output of wave farms.
3. **Optimize Model Performance**: Implement various machine learning techniques and evaluate their performance to ensure the model's accuracy and efficiency.

### Hypotheses
1. **WEC Configuration Impact**: The configuration of WECs significantly impacts the total power output of a wave farm.
2. **Machine Learning Efficiency**: A well-trained machine learning model can accurately predict the power output of a wave farm, reducing the need for complex hydrodynamic calculations.

## Methodology, Results and Discussion

### Data Description
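The dataset used in this project was created to develop a surrogate model for predicting the total power output of large wave farms. It contains roughly 63,600 instances spread over four files, covering 49-WEC and 100-WEC configurations under Perth and Sydney wave scenarios. The 49-WEC files have 149 columns each and the 100-WEC files have 302, as the `df.info()` output in Step 2 shows.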
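
The column counts can be checked with simple arithmetic. The sketch below assumes each file stores three values per converter (an X coordinate, a Y coordinate and an individual power value) plus two farm-level columns (the q-factor and the total power). That layout is an assumption and is not verified here, and the helper name `expected_columns` is hypothetical.

```python
def expected_columns(n_wecs: int) -> int:
    """Column count under the assumed layout: 3 per WEC plus 2 farm-level."""
    per_wec = 3      # X, Y and individual power (assumed layout)
    farm_level = 2   # q-factor and total power (assumed layout)
    return per_wec * n_wecs + farm_level


for n in (49, 100):
    print(f"{n} WECs -> {expected_columns(n)} columns")
```

Under this assumption the formula gives 149 columns for 49 WECs and 302 for 100 WECs.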

#### Source of Data
- **Creators**: Mehdi Neshat, Bradley Alexander, Nataliia Sergiienko, Markus Wagner
- **Published**: 2023
- **License**: Creative Commons Attribution 4.0 International (CC BY 4.0)
- **DOI**: [10.24432/C5GG7Q](https://doi.org/10.24432/C5GG7Q)

#### Period Collected
- **Year and Month/Day**: Data was donated on September 16, 2023.

#### How it was Collected
The dataset was derived from a study published at the 2020 GECCO conference (Neshat et al., 2020), which used a multi-strategy evolutionary framework to optimize large wave farms.

#### Under What Conditions it was Collected
The data collection involved extensive simulations using the Phoenix HPC service at the University of Adelaide to account for hydrodynamic interactions between WECs.

#### Variables
| Variable Name | Role    | Type    | Description                              | Units | Missing Values |
| ------------- | ------- | ------- | ---------------------------------------- | ----- | --------------- |
| X1            | Feature | Real    | X-coordinate of the 1st WEC              | -     | No              |
| Y1            | Feature | Real    | Y-coordinate of the 1st WEC              | -     | No              |
| ...           | ...     | ...     | ...                                      | ...   | ...             |
| Xn            | Feature | Real    | X-coordinate of the nth WEC              | -     | No              |
| Yn            | Feature | Real    | Y-coordinate of the nth WEC              | -     | No              |
| Power         | Target  | Real    | Total power output of the wave farm      | kW    | No              |
| Q-factor      | Feature | Real    | Hydrodynamic interaction factor          | -     | No              |
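
One way to turn the variable table into model inputs is to select the coordinate columns by name and keep the total power as the target. This is a minimal sketch on a toy frame. `X1`, `Y1` and `Total_Power` match the column names shown in Step 2, while `Power1`, `Power2`, `qW` and the helper `split_features_target` are assumptions used only for illustration.

```python
import re

import pandas as pd


def split_features_target(df, target="Total_Power"):
    """Return (coordinate features, target) from a WEC layout frame."""
    # Keep only columns named X<number> or Y<number>
    coord_cols = [c for c in df.columns if re.fullmatch(r"[XY]\d+", c)]
    return df[coord_cols], df[target]


# Toy data standing in for one of the real CSV files (values are made up)
toy = pd.DataFrame({
    "X1": [0.0, 200.0], "Y1": [0.0, 50.0],
    "X2": [100.0, 400.0], "Y2": [37.5, 80.0],
    "Power1": [1.0, 1.2], "Power2": [0.9, 1.1],  # assumed column names
    "qW": [0.8, 0.85],                           # assumed column name
    "Total_Power": [1.9, 2.3],
})

X, y = split_features_target(toy)
print(list(X.columns))
print(y.name)
```

Leaving the per-converter power columns out of the features avoids leaking the target, since the total is presumably derived from them.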

### Exploratory Data Analysis (EDA)

#### Descriptive Analytics
We start with basic statistics and visualizations to understand the distribution and relationships within the data.
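
As an illustration of the kind of descriptive statistics meant here, the sketch below summarises a synthetic stand-in for one layout file. The data are randomly generated, not taken from the real dataset, and the relationship between `X1` and `Total_Power` is invented purely so that the correlation has something to show.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data (assumption: not the real WEC files)
toy = pd.DataFrame({
    "X1": rng.uniform(0, 1000, n),
    "Y1": rng.uniform(0, 1000, n),
})
toy["Total_Power"] = 1_000 + 0.5 * toy["X1"] + rng.normal(0, 50, n)

# Basic statistics of the target
summary = toy["Total_Power"].describe()

# Bin counts, i.e. the numbers a histogram plot would draw
counts, edges = np.histogram(toy["Total_Power"], bins=10)

# Linear correlation of each feature with the target
corr = toy.corr()["Total_Power"].drop("Total_Power")

print(summary)
print(counts)
print(corr)
```

The same three calls can be pointed at any of the real frames by swapping `toy` for, say, `perth_49`.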



## Conclusion

This project successfully developed a surrogate model to predict the power output of large-scale wave energy farms. Through data exploration, cleaning, and the application of machine learning techniques, the model achieved significant accuracy, providing valuable insights into optimizing wave farm configurations. Future work may involve exploring more advanced models and incorporating additional environmental variables to further enhance prediction accuracy.

## References

- Neshat, M., Alexander, B., Sergiienko, N., & Wagner, M. (2023). Large-scale Wave Energy Farm. UCI Machine Learning Repository. [DOI: 10.24432/C5GG7Q](https://doi.org/10.24432/C5GG7Q)
- Neshat, M., Alexander, B., Sergiienko, N., & Wagner, M. (2020). Optimisation of large wave farms using a multi-strategy evolutionary framework. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference, pp. 1150-1158.

## Step 1: Importing the required libraries

In [3]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
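
The scikit-learn imports above suggest a split, scale, fit and score workflow. The following is a minimal sketch of that workflow on synthetic data. It is not the notebook's actual modelling code, and the feature count, coefficients and hyperparameters are all assumptions chosen to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic "coordinates" and a made-up linear target with noise
X = rng.uniform(0, 1000, size=(300, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 10, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only to avoid leakage
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=20, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    pred = model.predict(X_test_s)
    scores[name] = {
        "mae": mean_absolute_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }
    print(name, scores[name])
```

Comparing a linear baseline with a tree ensemble on the same split gives a quick sense of whether the extra model complexity pays off.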



## Step 2: Load the datasets and display information about them

In [6]:
# Load the datasets
perth_49 = pd.read_csv('WEC_Perth_49.csv')
perth_100 = pd.read_csv('WEC_Perth_100.csv')
sydney_49 = pd.read_csv('WEC_Sydney_49.csv')
sydney_100 = pd.read_csv('WEC_Sydney_100.csv')

# Helper that prints an overview of one dataset
def wec_info(df, name):
    """Print dtypes, shape, summary statistics and the first rows of df."""
    print(f"\nInformation for {name}")
    print("-" * 40)
    # df.info() prints its own report and returns None,
    # so wrapping it in print() also emits a trailing "None"
    print(df.info())
    print("\nShape:", df.shape)
    print("\nStatistical Summary:")
    print(df.describe())
    print("\nFirst few rows:")
    print(df.head())

wec_info(perth_49, "WEC Perth 49")





Information for WEC Perth 49
----------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36043 entries, 0 to 36042
Columns: 149 entries, X1 to Total_Power
dtypes: float64(149)
memory usage: 41.0 MB
None

Shape: (36043, 149)

Statistical Summary:
                 X1            Y1            X2            Y2            X3  \
count  36043.000000  36043.000000  36043.000000  36043.000000  36043.000000   
mean     366.597060     18.709550    426.314033     51.085762    477.295590   
std      307.911246     44.043295    265.781316     90.151852    270.322011   
min        0.000000      0.000000      0.000000      0.000000      0.000000   
25%       65.770000      0.000000    200.000000      0.000000    289.950000   
50%      250.000000      0.000000    346.090000     37.520000    400.000000   
75%      600.000000      0.000000    745.980000     37.900000    689.800000   
max     1000.000000    885.590000   1000.000000    939.260000   1000.000000   

          

In [7]:
wec_info(sydney_49, "WEC Sydney 49")


Information for WEC Sydney 49
----------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17964 entries, 0 to 17963
Columns: 149 entries, X1 to Total_Power
dtypes: float64(149)
memory usage: 20.4 MB
None

Shape: (17964, 149)

Statistical Summary:
                 X1            Y1            X2            Y2            X3  \
count  17964.000000  17964.000000  17964.000000  17964.000000  17964.000000   
mean     138.863588      3.718730    142.885799     66.752149    148.172411   
std      167.910813     28.398116    166.600732     26.712075    166.469037   
min        0.000000      0.000000      0.000000      0.000000      0.000000   
25%        1.000000      0.000000      1.000000     51.000000      1.000000   
50%      198.000000      0.000000    195.960000     70.000000    192.360000   
75%      198.000000      1.000000    197.110000     75.650000    193.700000   
max     1000.000000    988.260000   1000.000000    989.650000   1000.000000   

         

In [8]:
wec_info(perth_100, "WEC Perth 100")



Information for WEC Perth 100
----------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7277 entries, 0 to 7276
Columns: 302 entries, X1 to Total_Power
dtypes: float64(302)
memory usage: 16.8 MB
None

Shape: (7277, 302)

Statistical Summary:
                X1           Y1           X2           Y2           X3  \
count  7277.000000  7277.000000  7277.000000  7277.000000  7277.000000   
mean    446.407261    15.870636   429.245818    44.186641   406.751613   
std     310.463546    77.125951   288.770531    46.994054   287.804577   
min       0.000000     0.000000     0.000000     0.000000     0.000000   
25%     200.000000     0.000000   146.230000    37.400000   103.000000   
50%     400.000000     0.000000   346.030000    37.480000   318.770000   
75%     600.000000     0.000000   546.090000    37.530000   489.870000   
max    1400.000000  1353.550000  1400.000000  1277.640000  1414.000000   


In [9]:
wec_info(sydney_100, "WEC Sydney 100")


Information for WEC Sydney 100
----------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2318 entries, 0 to 2317
Columns: 302 entries, X1 to Total_Power
dtypes: float64(302)
memory usage: 5.3 MB
None

Shape: (2318, 302)

Statistical Summary:
                X1           Y1           X2           Y2           X3  \
count  2318.000000  2318.000000  2318.000000  2318.000000  2318.000000   
mean    177.162584     8.159819   204.669676    64.119892   228.071639   
std     174.211383    52.395345   172.438092    79.224562   181.670898   
min       0.000000     0.000000     0.000000     0.000000     0.000000   
25%      48.000000     0.000000   100.000000    51.000000   192.570000   
50%     198.000000     0.000000   197.070000    72.520000   193.700000   
75%     198.000000     1.000000   201.000000    77.580000   250.000000   
max    1398.000000  1381.090000  1414.000000  1316.750000  1400.000000   

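
The four shapes reported above can be tallied to cross-check the instance count quoted in the data description. The sketch below hard-codes the `(rows, columns)` pairs copied from the outputs, so it runs without the CSV files. The dictionary keys are just labels derived from the file names.

```python
# (rows, columns) copied from the df.info() outputs above
shapes = {
    "WEC_Perth_49": (36043, 149),
    "WEC_Sydney_49": (17964, 149),
    "WEC_Perth_100": (7277, 302),
    "WEC_Sydney_100": (2318, 302),
}

total_rows = sum(rows for rows, _ in shapes.values())

# Group row counts by farm size (the number after the last underscore)
rows_by_size = {}
for name, (rows, _) in shapes.items():
    n_wecs = int(name.rsplit("_", 1)[1])
    rows_by_size[n_wecs] = rows_by_size.get(n_wecs, 0) + rows

print(total_rows)    # 63602
print(rows_by_size)  # {49: 54007, 100: 9595}
```

The total of 63,602 rows is consistent with the rounded figure of 63,600 instances, and the 49-WEC layouts account for most of the data.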

In [11]:
# Report columns with missing values, then fill any gaps with the column mean
def handle_null_values(df):
    """Print columns that contain nulls and impute them in place with column means."""
    print("\nChecking for null values:")
    null_values = df.isnull().sum()
    print(null_values[null_values > 0])

    # Fill null values with the mean of the respective column (no-op if none exist)
    df.fillna(df.mean(), inplace=True)

handle_null_values(perth_49)
handle_null_values(sydney_49)
handle_null_values(perth_100)
handle_null_values(sydney_100)



Checking for null values:
Series([], dtype: int64)

Checking for null values:
Series([], dtype: int64)

Checking for null values:
Series([], dtype: int64)

Checking for null values:
Series([], dtype: int64)
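
All four files report an empty series, so the mean fill never changes anything here. To show what the same approach would do if gaps were present, the sketch below applies it to a toy frame with two deliberately missing values. The data are made up, and mean imputation is simply the strategy used above, not necessarily a sensible choice for coordinate data.

```python
import numpy as np
import pandas as pd

# Toy frame with one gap per column (values are made up)
toy = pd.DataFrame({
    "X1": [0.0, np.nan, 200.0],
    "Total_Power": [10.0, 20.0, np.nan],
})

# Count nulls per column, keeping only columns that have any
null_counts = toy.isnull().sum()
print(null_counts[null_counts > 0])

# Replace each gap with the mean of its own column
filled = toy.fillna(toy.mean())
print(filled)
```

Each gap is replaced by the mean of the remaining values in its column: 100.0 for `X1` and 15.0 for `Total_Power`.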
