# Wind Turbine Pitch Prediction – Machine Learning Model Comparison

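This notebook loads the cleaned dataset (`region3_clean.csv`), splits it 70/15/15 into training, validation, and test sets, standardizes the input features, trains six regression models to predict blade pitch from wind speed, rotor speed, and power, and compares their performance using MAE, RMSE, and R².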

---

## Table of Contents

1. **Import Libraries**
2. **Load Dataset, Define Features and Split Data**
3. **Scale Inputs**
4. **Define Models**
5. **Metric Function**
6. **Train Models and Compute Metrics**
7. **Compare Results**


# 1. **Import Libraries**

In [1]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


# 2. **Load Dataset, Define Features and Split Data**

In [2]:
DATA_PATH = "../data/region3_clean.csv"

df = pd.read_csv(DATA_PATH)

FEATURES = ["wind_speed", "rotor_speed", "power"]
TARGET = "pitch"

X = df[FEATURES].values
y = df[TARGET].values

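# 70/15/15 split: hold out 30%, then halve the holdout into validation and test.
# random_state is fixed in both calls so the split is reproducible.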
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42, shuffle=True
)

X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, shuffle=True
)
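The two-stage split can be sanity-checked on synthetic data. The sketch below assumes 1,000 rows of random stand-in values (not the turbine dataset; `X_demo`, `y_demo`, and `split_sizes` are illustrative names) and confirms that holding out 30% and then halving the holdout gives a 70/15/15 partition.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: 1,000 rows, 3 features); not the real dataset.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
y_demo = rng.normal(size=1000)

# Stage 1: keep 70% for training, hold out 30%.
X_tr, X_hold, y_tr, y_hold = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, shuffle=True
)

# Stage 2: split the holdout evenly into validation and test.
X_va, X_te, y_va, y_te = train_test_split(
    X_hold, y_hold, test_size=0.5, random_state=42, shuffle=True
)

split_sizes = {"train": len(X_tr), "validation": len(X_va), "test": len(X_te)}
print(split_sizes)
```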


# 3. **Scale Inputs**

In [3]:
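# Fit the scaler on the training split only, so that validation and test
# statistics do not leak into preprocessing; then reuse it on the other splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled   = scaler.transform(X_val)
X_test_scaled  = scaler.transform(X_test)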
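Fitting the scaler on the training split alone keeps validation and test statistics out of preprocessing. One way to make that hard to get wrong is to bundle the scaler with an estimator. The following is a minimal sketch using scikit-learn's `make_pipeline` on synthetic stand-in data; it is not part of the original notebook, and all `*_demo` names are illustrative.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption); not the turbine dataset.
rng = np.random.default_rng(0)
X_tr_demo = rng.normal(loc=10.0, scale=3.0, size=(200, 3))
y_tr_demo = X_tr_demo @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.1, size=200)
X_new_demo = rng.normal(loc=10.0, scale=3.0, size=(50, 3))

# The pipeline fits the scaler on whatever is passed to fit() and applies
# the same fitted scaler inside predict().
pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
pipe.fit(X_tr_demo, y_tr_demo)
pred_demo = pipe.predict(X_new_demo)

# make_pipeline names each step after its lowercased class name.
fitted_scaler = pipe.named_steps["standardscaler"]
print(fitted_scaler.mean_)
```

With this pattern the unscaled arrays are passed to `fit` and `predict` directly, so there is no separate `*_scaled` copy to keep in sync.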


# 4. **Define Models**

In [4]:
# Candidate regressors; random_state is fixed wherever the estimator is stochastic.
models = {
    "LinearRegression": LinearRegression(),
    "RandomForest": RandomForestRegressor(
        n_estimators=300,
        random_state=42,
        n_jobs=-1
    ),
    "GradientBoosting": GradientBoostingRegressor(
        random_state=42
    ),
    "SVR": SVR(
        kernel="rbf",
        C=10.0,
        epsilon=0.1
    ),
    "KNN": KNeighborsRegressor(
        n_neighbors=5
    ),
    "MLP": MLPRegressor(
        hidden_layer_sizes=(64, 64),
        activation="relu",
        solver="adam",
        learning_rate_init=0.001,
        max_iter=500,
        random_state=42
    ),
}


# 5. **Metric Function**

In [5]:
def compute_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R²) for one set of predictions."""
    mae  = mean_absolute_error(y_true, y_pred)
    mse  = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)  # RMSE is in the same units as the target
    r2   = r2_score(y_true, y_pred)
    return mae, rmse, r2
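To make the three metrics concrete, the sketch below recomputes them by hand on a four-point toy example (made-up values, not turbine data) and checks that they agree with the scikit-learn functions used above. The `*_demo` and `*_manual` names are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy values (assumption): three exact predictions and one that is off by 2.
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_demo = np.array([1.0, 2.0, 3.0, 6.0])

residuals = y_true_demo - y_pred_demo

# MAE: mean of absolute residuals.
mae_manual = np.mean(np.abs(residuals))

# RMSE: square root of the mean squared residual.
rmse_manual = np.sqrt(np.mean(residuals ** 2))

# R²: 1 minus residual sum of squares over total sum of squares.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(mae_manual, rmse_manual, r2_manual)

# The library functions should agree with the manual formulas.
mae_lib = mean_absolute_error(y_true_demo, y_pred_demo)
rmse_lib = np.sqrt(mean_squared_error(y_true_demo, y_pred_demo))
r2_lib = r2_score(y_true_demo, y_pred_demo)
```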


# 6. **Train Models and Compute Metrics**

In [6]:
results = []

for name, model in models.items():
    print(f"Training {name}...")
    # Fit on the training split only; the same fitted model is then scored on all three splits.
    model.fit(X_train_scaled, y_train)

    for split_name, X_s, y_s in [
        ("Train", X_train_scaled, y_train),
        ("Validation", X_val_scaled, y_val),
        ("Test", X_test_scaled, y_test),
    ]:
        y_pred = model.predict(X_s)
        mae, rmse, r2 = compute_metrics(y_s, y_pred)
        results.append({
            "Model": name,
            "Split": split_name,
            "MAE": mae,
            "RMSE": rmse,
            "R2": r2,
        })

results_df = pd.DataFrame(results)
results_df


Training LinearRegression...
Training RandomForest...
Training GradientBoosting...
Training SVR...
Training KNN...
Training MLP...


| | Model | Split | MAE | RMSE | R2 |
|---|---|---|---|---|---|
| 0 | LinearRegression | Train | 0.570917 | 0.71379 | 0.951755 |
| 1 | LinearRegression | Validation | 0.607104 | 0.739467 | 0.944842 |
| 2 | LinearRegression | Test | 0.600522 | 0.739435 | 0.941808 |
| 3 | RandomForest | Train | 0.229736 | 0.436071 | 0.981994 |
| 4 | RandomForest | Validation | 0.595458 | 0.71887 | 0.947872 |
| 5 | RandomForest | Test | 0.586336 | 0.7373 | 0.942144 |
| 6 | GradientBoosting | Train | 0.441031 | 0.560225 | 0.970281 |
| 7 | GradientBoosting | Validation | 0.582687 | 0.709876 | 0.949168 |
| 8 | GradientBoosting | Test | 0.58489 | 0.729454 | 0.943369 |
| 9 | SVR | Train | 0.524572 | 0.755544 | 0.945945 |
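The gap between training and validation error gives a rough read on overfitting. The sketch below pivots the RMSE values of the three models whose Train and Validation rows both appear in the output above and computes that gap. `rmse_rows`, `wide`, and `gap` are illustrative names, and the numbers are copied from the table rather than recomputed.

```python
import pandas as pd

# RMSE values copied from the displayed results (rows 0, 1, 3, 4, 6, 7).
rmse_rows = pd.DataFrame(
    [
        ("LinearRegression", "Train", 0.71379),
        ("LinearRegression", "Validation", 0.739467),
        ("RandomForest", "Train", 0.436071),
        ("RandomForest", "Validation", 0.71887),
        ("GradientBoosting", "Train", 0.560225),
        ("GradientBoosting", "Validation", 0.709876),
    ],
    columns=["Model", "Split", "RMSE"],
)

# One row per model, one column per split.
wide = rmse_rows.pivot(index="Model", columns="Split", values="RMSE")

# Positive gap: the model does worse on unseen data than on its training data.
wide["Gap"] = wide["Validation"] - wide["Train"]
gap = wide["Gap"].sort_values(ascending=False)
print(gap.round(3))
```

On these rows the random forest has the largest gap (about 0.28) and linear regression the smallest (about 0.03), even though their validation RMSE values are close.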


# 7. **Compare Results**

In [7]:
test_results = (
    results_df[results_df["Split"] == "Test"]
    .sort_values(by="R2", ascending=False)
    .reset_index(drop=True)
)
test_results


| | Model | Split | MAE | RMSE | R2 |
|---|---|---|---|---|---|
| 0 | SVR | Test | 0.561609 | 0.682786 | 0.950383 |
| 1 | MLP | Test | 0.557732 | 0.693551 | 0.948806 |
| 2 | GradientBoosting | Test | 0.58489 | 0.729454 | 0.943369 |
| 3 | RandomForest | Test | 0.586336 | 0.7373 | 0.942144 |
| 4 | LinearRegression | Test | 0.600522 | 0.739435 | 0.941808 |
| 5 | KNN | Test | 0.595254 | 0.752733 | 0.939696 |
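The ranking above sorts by test-set score. If the ranking is used to choose a model, a common alternative is to choose on the validation split and keep the test split for reporting only. The following is a minimal sketch of that idea, not the notebook's method: `pick_by_validation` is a hypothetical helper, and the demo frame contains only the three models whose validation rows appear in the output of section 6.

```python
import pandas as pd


def pick_by_validation(results: pd.DataFrame, metric: str = "R2") -> pd.Series:
    """Hypothetical helper: return the Test row of the model with the best
    validation score. Assumes a higher value of `metric` is better."""
    val = results[results["Split"] == "Validation"]
    best_model = val.loc[val[metric].idxmax(), "Model"]
    test = results[(results["Split"] == "Test") & (results["Model"] == best_model)]
    return test.iloc[0]


# R² values copied from the results shown in section 6.
demo_results = pd.DataFrame(
    [
        ("LinearRegression", "Validation", 0.944842),
        ("LinearRegression", "Test", 0.941808),
        ("RandomForest", "Validation", 0.947872),
        ("RandomForest", "Test", 0.942144),
        ("GradientBoosting", "Validation", 0.949168),
        ("GradientBoosting", "Test", 0.943369),
    ],
    columns=["Model", "Split", "R2"],
)

chosen = pick_by_validation(demo_results)
print(chosen["Model"], chosen["R2"])
```

For error metrics such as MAE or RMSE, where lower is better, the helper would need `idxmin` instead of `idxmax`.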
