<img src="https://upload.wikimedia.org/wikipedia/commons/0/06/Imperial_College_London_new_logo.png" alt="Imperial Logo" width="400">

### **Course:** CIVE70111 Machine Learning
### Task 3 PV Plant Modelling and Machine Learning Pipeline

**Project:** Solar PV Plant 1 & Plant 2 — Cleaning, Modelling & Bias–Variance Analysis

**Date:** 09/12/2025  

<p align="right">
Created by: Michael Wong

## 📑 Table of Contents

### 📘 1. [Project Overview & Workflow](#project-overview)
### 🧮 2. [Mathematical Formulation](#mathematical-formulation)

---

## 🔧 Data Cleaning & Preparation

### 🗂️ 3. [Data Cleaning & Pipelines](#data-pipelines)
#### 3.1 🔧 [Shared Utilities & ML Helpers](#utilities-ml)
#### 3.2 🧼 [Plant 1 Cleaning Pipeline](#plant1-pipeline)
#### 3.3 🧼 [Plant 2 Cleaning Pipeline](#plant2-pipeline)

### 💾 4. [Export Cleaned Data to CSV (Plant 1 & 2)](#export-clean)

### 📅 5. [Daily Splitting per Inverter (Plant 1)](#daily-split)

---

## 🤖 Model Training Framework

### 🧠 6. [Per-Inverter Training: Combined & Per-Day](#per-inverter-training)

### 📊 7. [Global Model Comparison Across Inverters](#global-comparison)

### 🔍 8. [Bias–Variance Analysis & NN Diagnostics](#bias-variance)

### 📉 9. [Neural Network Training Visualisation (Plant 1)](#nn-visualisation)

### 🌅 10. [Time-of-Day Operational Plots (Plant 1)](#time-of-day-plots)

---

## 🌞 Plant 2 Modelling

### 🔁 11. [Plant 2 Inverter Training Loop](#plant2-train-loop)

### 🧠 12. [Neural Network Training Visualisation (Plant 2)](#plant2-nn-visualisation)

### 📈 13. [Bias–Variance Proxy Analysis (Plant 2)](#plant2-bias-variance)

### 📊 14. [Global Model Comparison & NN Diagnostics (Plant 2)](#plant2-model-comparison)

---

<a id="project-overview"></a>

# 1. Project Overview & Workflow

This project develops a complete machine-learning framework for modelling and analysing the behaviour of two solar PV plants (**Plant 1** and **Plant 2**) using SCADA inverter data and weather sensor measurements.

The workflow integrates **data cleaning**, **feature engineering**, **per-inverter modelling**, and **global performance diagnostics**, covering both technical modelling quality and operational behaviour.

---

## 🔄 End-to-End Workflow

### **1️⃣ Raw Data Ingestion**
- Import inverter generation data for **Plant 1** and **Plant 2**
- Import weather station measurements:
  - Irradiation  
  - Ambient temperature  
  - Module temperature  

### **2️⃣ Data Cleaning & Alignment**
- Fix timestamp inconsistencies (e.g., swapped day/month formats)
- Clean irradiation signals using daylight rules
- Remove physically impossible values:
  - Negative or zero power under sunlight  
  - Outliers in AC/DC power and yields  
- Merge weather + inverter datasets on aligned timestamps

### **3️⃣ Feature Engineering**
- Core ML features:
  - `IRRADIATION_CLEAN`
  - `AMBIENT_TEMPERATURE`
  - `MODULE_TEMPERATURE`
  - `DAILY_YIELD_CLEAN`
- Target variables:
  - `DC_CLEAN`
  - `AC_CLEAN`
- Optional:
  - Time-of-day cyclical encodings  
  - Day-of-week indicators  

### **4️⃣ Daily Splitting (Plant 1 & Plant 2)**
- Each inverter’s cleaned dataset is split into **one CSV per day**
- Used for per-day modelling (parallel training)

### **5️⃣ Per-Inverter Modelling Framework**
Each inverter is modelled independently using:

- Linear Regression  
- Ridge Regression  
- Lasso Regression  
- Random Forest  
- MLP Neural Network  

Two training regimes:

#### **Combined Training**
- All days concatenated into one dataset  
- One model per inverter

#### **Parallel Per-Day Training**
- One model per day  
- Captures **temporal variability** and **noise sensitivity**

### **6️⃣ Results & Metrics (per inverter)**
Stored in `_results.pkl` files:

- RMSE & MAE for DC and AC prediction  
- Per-day RMSE/MAE distributions  
- Neural network training diagnostics:
  - Loss curves  
  - Iterations  
  - Learning rate  
  - Momentum  
  - Total weights  
  - Training time  

### **7️⃣ Global Analysis Across All Inverters**
We aggregate all inverter results to analyse:

- Model performance consistency  
- Bias–variance behaviour  
- Stability across days  
- Neural network behaviour across plants

### **8️⃣ Visualisation & Diagnostics**

The project produces:

- Time-of-day operational plots  
- Combined vs parallel model comparison charts  
- Boxplots of RMSE/MAE distributions  
- Neural network loss curves (AC & DC)  
- Bias-variance planes  
- Diagnostic histograms and barplots for NN settings  

---

## 🎯 Project Goals

1. **Build a reproducible ML pipeline** for both plants  
2. **Evaluate multiple ML algorithms** under combined and per-day setups  
3. **Assess stability, robustness, and bias–variance tradeoffs**  
4. **Understand operational behaviour** using time-of-day analysis  
5. **Generate interpretable visual diagnostics** across all inverters  

---

## 📦 Output Summary

At the end of the pipeline, you obtain:

- Fully cleaned datasets  
- Daily inverter CSVs  
- Per-inverter model folders (plots + results)  
- Master results files for Plant 1 and Plant 2  
- Global bias–variance and model comparison plots  
- Neural network diagnostic plots  
- Operational time-of-day charts  

---

<a id="mathematical-formulation"></a>

## 2. Mathematical Formulation

### 2.1 Data Representation

For each inverter \( v \), we observe time-indexed measurements:

- Weather and operating features:
  \[
  \mathbf{x}_t = \big(\text{IRRADIATION}, \text{AMBIENT\_TEMPERATURE}, \text{MODULE\_TEMPERATURE}, \text{DAILY\_YIELD}, \dots \big)_t
  \]
- Targets:
  \[
  y^{(DC)}_t = \text{DC\_CLEAN}_t, \qquad
  y^{(AC)}_t = \text{AC\_CLEAN}_t
  \]

For a given inverter, stacking all samples gives:
\[
X \in \mathbb{R}^{N \times d}, \quad
\mathbf{y}^{(DC)}, \mathbf{y}^{(AC)} \in \mathbb{R}^N
\]

We also split by **day**:
\[
X^{(d)},\ \mathbf{y}^{(DC, d)},\ \mathbf{y}^{(AC, d)}
\]
for each calendar day \( d \).

---

### 2.2 Combined vs Parallel Training

**Combined training:** all days concatenated
\[
X^{(\text{comb})} = \bigcup_{d} X^{(d)}, \quad
\mathbf{y}^{(\text{comb})} = \bigcup_{d} \mathbf{y}^{(d)}
\]

A single model is trained on the full dataset.

**Parallel (per-day) training:**  
For each day \( d \), train a separate model and compute per-day errors:
\[
\text{RMSE}^{(d)} = \sqrt{\frac{1}{N_d} \sum_{t \in d} \big(y_t - \hat{y}_t\big)^2}
\]
We then summarise across days via the mean and standard deviation.

---

### 2.3 Models

**Linear Regression**
\[
\hat{y} = X \beta, \quad
\beta = \arg\min_\beta \|y - X\beta\|_2^2
\]

**Ridge Regression**
\[
\beta = \arg\min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2
\]

**Lasso**
\[
\beta = \arg\min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_1
\]

**Random Forest**  
Bagging of regression trees; the prediction is the average across trees:
\[
\hat{y} = \frac{1}{T} \sum_{t=1}^T f_t(X)
\]

**Neural Network (MLP)**  
A feed-forward network with ReLU hidden layers, trained by backpropagation to minimise MSE:

\[
\mathcal{L} = \frac{1}{N} \sum_{i=1}^N (y_i - \hat{y}_i)^2
\]

The training history is recorded in `loss_curve_` and used for convergence and cost-function plots.

---

### 2.4 Metrics

For each model \( m \), target type (DC/AC) and inverter:

- Root Mean Squared Error (RMSE)
  \[
  \text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^N (y_i - \hat{y}_i)^2}
  \]

- Mean Absolute Error (MAE)
  \[
  \text{MAE} = \frac{1}{N} \sum_{i=1}^N |y_i - \hat{y}_i|
  \]

These are computed both for:
- Combined training (all days merged)
- Parallel training (averaged across days)

---

### 2.5 Bias–Variance Proxies

We use the following **proxies**:

- **Bias proxy:** mean combined RMSE across inverters
  \[
  \text{BiasProxy}_m = \mathbb{E}[\text{RMSE}^{(\text{combined})}_m]
  \]

- **Variance proxy:** standard deviation of per-day RMSE across days and inverters
  \[
  \text{VarProxy}_m = \text{Std}\big(\text{RMSE}^{(d)}_m\big)
  \]

This allows us to plot each model in a “bias–variance plane” and visually compare regimes:
- High-bias / low-variance models (e.g. Linear, Ridge)
- Lower-bias / higher-variance models (e.g. Random Forest, Neural Network)


<a id="data-pipelines"></a>
<a id="utilities-ml"></a>

## 3. Data Cleaning & Pipelines

### 3.1 Shared Utilities & ML Helpers (`Utilities.py`)

This section defines:

- Raw file loading (Plant 1 & 2, weather)
- Time-feature engineering
- ML dataset builders
- Generic model training and evaluation
- Plant 1 cleaning pipeline
- Plant 2 cleaning pipeline
- Inverter experiment runner (combined + per-day)


In [7]:
# ================================================================
# Utilities.py  (Unified Utilities for Plant 1 & 2 pipelines)
# ================================================================
import matplotlib
matplotlib.use("Agg")      # MUST come before pyplot is imported

import matplotlib.pyplot as plt
import os
import numpy as np
import pandas as pd
import datetime as dt
import time
import glob

from sklearn.model_selection import train_test_split, learning_curve
from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    r2_score,
)
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor


# -------------------------------------------------------------------
#                       RAW FILE LOADING
# -------------------------------------------------------------------

def load_raw_files(folder: str):
    """Load the 4 CSV datasets from the folder."""
    plant_1 = pd.read_csv(os.path.join(folder, "Plant_1_Generation_Data_updated.csv"))
    weather_1 = pd.read_csv(os.path.join(folder, "Plant_1_Weather_Sensor_Data.csv"))
    plant_2 = pd.read_csv(os.path.join(folder, "Plant_2_Generation_Data.csv"))
    weather_2 = pd.read_csv(os.path.join(folder, "Plant_2_Weather_Sensor_Data.csv"))
    return plant_1, weather_1, plant_2, weather_2


# ===================================================================
#     ML + FEATURE ENGINEERING UTILITIES (Aligned with Cleaned Data)
# ===================================================================

def add_time_features(df: pd.DataFrame):
    """Add time-of-day + cyclical encodings to cleaned dataframe."""
    df = df.copy()
    df["HOUR"] = df.index.hour
    df["DAY_OF_WEEK"] = df.index.dayofweek

    df["HOUR_SIN"] = np.sin(2 * np.pi * df["HOUR"] / 24)
    df["HOUR_COS"] = np.cos(2 * np.pi * df["HOUR"] / 24)

    return df


def build_X_y(df_clean: pd.DataFrame, target_col: str):
    """
    Build feature matrix X and target y from CLEANED df_ps1 inverter data.
    """
    df = df_clean.copy()

    # remove nighttime zero-irradiation rows if predicting AC/DC
    if "IRRADIATION_CLEAN" in df.columns:
        df = df[df["IRRADIATION_CLEAN"] > 0]

    # add time features (index must be datetime)
    df = add_time_features(df)

    feature_cols = [
        c for c in [
            "IRRADIATION_CLEAN",
            "AMBIENT_TEMPERATURE",
            "MODULE_TEMPERATURE",
            "NUM_OPT",
            "NUM_SUBOPT",
            "HOUR", "DAY_OF_WEEK", "HOUR_SIN", "HOUR_COS"
        ] if c in df.columns
    ]

    X = df[feature_cols].values
    y = df[target_col].values

    return X, y, feature_cols


def get_models():
    return {
        "LinearRegression": LinearRegression(),
        "Ridge(alpha=1)": Ridge(alpha=1.0),
        "RandomForest(200)": RandomForestRegressor(
            n_estimators=200, random_state=42, n_jobs=-1
        ),
    }


def train_models(X_train, y_train):
    models = get_models()
    fitted = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        fitted[name] = model
    return fitted


def evaluate_models(fitted, X_train, y_train, X_test, y_test):
    results = []

    for name, model in fitted.items():
        pred_tr = model.predict(X_train)
        pred_te = model.predict(X_test)

        results.append({
            "model": name,
            "train_RMSE": mean_squared_error(y_train, pred_tr, squared=False),
            "test_RMSE": mean_squared_error(y_test, pred_te, squared=False),
            "train_MAE": mean_absolute_error(y_train, pred_tr),
            "test_MAE": mean_absolute_error(y_test, pred_te),
            "train_R2": r2_score(y_train, pred_tr),
            "test_R2": r2_score(y_test, pred_te),
        })

    return results


def plot_learning_curve(model, X, y, title):
    sizes, train_scores, test_scores = learning_curve(
        model,
        X,
        y,
        train_sizes=np.linspace(0.1, 1.0, 5),
        cv=5,
        scoring="neg_root_mean_squared_error",
        shuffle=True,
        random_state=42,
    )

    train_rmse = -np.mean(train_scores, axis=1)
    test_rmse = -np.mean(test_scores, axis=1)

    plt.figure()
    plt.plot(sizes, train_rmse, marker="o", label="Train RMSE")
    plt.plot(sizes, test_rmse, marker="s", label="Validation RMSE")
    plt.xlabel("Training Size")
    plt.ylabel("RMSE")
    plt.title(title)
    plt.grid(True)
    plt.legend()
    plt.tight_layout()
    plt.show()


def train_mlp(X_train, y_train):
    mlp = MLPRegressor(
        hidden_layer_sizes=(64, 64),
        learning_rate_init=0.001,
        max_iter=200,
        random_state=42
    )
    mlp.fit(X_train, y_train)
    return mlp


def plot_loss_curve(mlp_model, title):
    if not hasattr(mlp_model, "loss_curve_"):
        print("MLP model has no loss curve.")
        return

    plt.figure()
    plt.plot(mlp_model.loss_curve_)
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.title(title)
    plt.grid(True)
    plt.tight_layout()
    plt.show()


def compress_cleaned_15min(df: pd.DataFrame):
    """
    Safe 15-minute resampling for CLEANED inverter data.
    """
    df = df.copy()
    df = df.sort_values(df.index.name)

    # Resample to 15 min using first observation within window
    df_15 = df.resample("15T").first()

    # Forward-fill ONLY yield columns (if needed)
    for col in ["DAILY_YIELD_CLEAN", "TOTAL_YIELD_CLEAN"]:
        if col in df_15.columns:
            df_15[col] = df_15[col].fillna(method="ffill")

    return df_15


Code cell — Plant 1 pipeline, Plant 2 pipeline, training experiment

In [8]:
# ===============================================================
#                  PLANT 1 PIPELINE
# ===============================================================

def fix_plant_1_datetime(plant_1_raw: pd.DataFrame) -> pd.DataFrame:
    df = plant_1_raw.copy()

    start = pd.Timestamp('2020-05-15')
    end = pd.Timestamp('2020-06-18')

    df['parsed'] = pd.to_datetime(
        df['DATE_TIME'],
        format='%Y-%m-%d %H:%M:%S',
        errors='coerce'
    )

    invalid = df['parsed'].isna() | (~df['parsed'].between(start, end))

    df.loc[invalid, 'parsed'] = pd.to_datetime(
        df.loc[invalid, 'DATE_TIME'],
        format='%Y-%d-%m %H:%M:%S',
        errors='coerce'
    )

    df['DATE_TIME'] = df['parsed']
    return df.drop(columns=['parsed'])


def preprocess_plant_1(plant_1_df: pd.DataFrame):
    df = plant_1_df.copy()
    df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'])

    df = df.drop(columns=['day'], errors='ignore')
    df.set_index('DATE_TIME', inplace=True)

    print("Plant 1 missing values:\n", df.isnull().sum())
    print("Plant 1 shape:", df.shape)

    p1_gp = df.groupby('SOURCE_KEY')
    inv_1 = {sk: g for sk, g in p1_gp}
    source_key_1 = df['SOURCE_KEY'].unique().tolist()

    print("Number of inverters:", len(source_key_1))
    print("Source keys:", source_key_1)

    return df, inv_1, source_key_1


def check_missing_per_inverter(inv_1: dict):
    for sk, df in inv_1.items():
        print(f"\nInverter {sk} missing values:")
        print(df.isnull().sum())
        print("Shape:", df.shape)


def check_constancy(inv_1: dict, source_key_1: list):
    cols = ['DC_POWER', 'AC_POWER', 'DAILY_YIELD', 'TOTAL_YIELD']
    for sk in source_key_1:
        g = inv_1[sk].groupby('DATE_TIME')
        check = g[cols].nunique() == 1
        not_constant = (~check).sum()
        print(f"\nConstancy check for {sk}:")
        print(not_constant)


def aggregate_inverters(inv_1: dict) -> dict:
    agg_inv_1 = {}

    for sk, df in inv_1.items():
        agg_df = df.groupby('DATE_TIME').agg(
            PLANT_ID=('PLANT_ID', 'first'),
            SOURCE_KEY=('SOURCE_KEY', 'first'),
            DC_POWER=('DC_POWER', 'first'),
            AC_POWER=('AC_POWER', 'first'),
            DAILY_YIELD=('DAILY_YIELD', 'first'),
            TOTAL_YIELD=('TOTAL_YIELD', 'first'),
            NUM_OPT=('Operating_Condition', lambda x: (x == 'Optimal').sum()),
            NUM_SUBOPT=('Operating_Condition', lambda x: (x == 'Suboptimal').sum())
        ).reset_index()

        agg_inv_1[sk] = agg_df

    return agg_inv_1


def preprocess_weather_1(weather_1_raw: pd.DataFrame) -> pd.DataFrame:
    df = weather_1_raw.copy()
    df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'])

    df = df.drop(columns=['PLANT_ID', 'SOURCE_KEY'], errors='ignore')
    df.set_index('DATE_TIME', inplace=True)

    print("\nWeather missing values:\n", df.isnull().sum())
    return df


def clean_irradiation(weather_1_raw: pd.DataFrame) -> pd.DataFrame:
    df = weather_1_raw.copy()
    df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'])

    day_start = dt.time(6, 0)
    day_end = dt.time(18, 30)

    df['expected_day'] = df['DATE_TIME'].dt.time.between(day_start, day_end)

    df['IRRADIATION_CLEAN'] = df['IRRADIATION'].copy()
    df.loc[(~df['expected_day']) & (df['IRRADIATION'] > 0), 'IRRADIATION_CLEAN'] = 0
    df.loc[(df['expected_day']) & (df['IRRADIATION'] == 0), 'IRRADIATION_CLEAN'] = np.nan

    df['IRRADIATION_CLEAN'] = df['IRRADIATION_CLEAN'].interpolate()
    df['IRRADIATION_CLEAN'] = df['IRRADIATION_CLEAN'].fillna(0)

    df = df.set_index('DATE_TIME')
    return df.drop(columns=['SOURCE_KEY'], errors='ignore')


def print_time_differences(agg_inv_1: dict, s1_c: pd.DataFrame):
    for sk, df in agg_inv_1.items():
        df1 = df.set_index('DATE_TIME')
        diff_1_not_2 = df1.index.difference(s1_c.index)
        diff_2_not_1 = s1_c.index.difference(df1.index)

        print(f"\n{sk}:")
        print("  In inverter but not weather:", len(diff_1_not_2))
        print("  In weather but not inverter:", len(diff_2_not_1))


def join_inverter_weather(agg_inv_1: dict, s1_c: pd.DataFrame) -> dict:
    wea_inv_1 = {}
    s1_c_clean = s1_c.drop(columns=['PLANT_ID'], errors='ignore')

    for sk, df in agg_inv_1.items():
        df = df.set_index('DATE_TIME')
        join_df = df.join(s1_c_clean, how='inner')
        wea_inv_1[sk] = join_df

    return wea_inv_1


def clean_ac_dc(wea_inv_1: dict) -> dict:
    df_step_1 = {}

    for sk, df in wea_inv_1.items():
        df = df.copy()

        df['AC_CLEAN'] = df['AC_POWER'].copy()
        df['DC_CLEAN'] = df['DC_POWER'].copy()

        night = df['IRRADIATION_CLEAN'] == 0
        df.loc[night & (df['AC_CLEAN'] > 0), 'AC_CLEAN'] = 0
        df.loc[night & (df['DC_CLEAN'] > 0), 'DC_CLEAN'] = 0

        day = df['IRRADIATION_CLEAN'] > 0
        df.loc[day & (df['AC_CLEAN'] == 0), 'AC_CLEAN'] = np.nan
        df.loc[day & (df['DC_CLEAN'] == 0), 'DC_CLEAN'] = np.nan

        df['AC_CLEAN'] = df['AC_CLEAN'].interpolate().fillna(0)
        df['DC_CLEAN'] = df['DC_CLEAN'].interpolate().fillna(0)

        df_step_1[sk] = df

    return df_step_1


def clean_daily_yield(df_step_1: dict) -> dict:
    df_step_2 = {}

    for sk, df in df_step_1.items():
        df = df.copy()
        df.index = pd.to_datetime(df.index)

        df['DAILY_YIELD_CLEAN'] = df['DAILY_YIELD'].copy()

        all_days = np.unique(df.index.date)
        for d in all_days:
            day_mask = df.index.date == d
            df_day = df.loc[day_mask]

            irr_pos = df_day['IRRADIATION_CLEAN'] > 0

            if not irr_pos.any():
                df.loc[day_mask, 'DAILY_YIELD_CLEAN'] = 0
                continue

            t_start = df_day[irr_pos].index[0]
            t_end = df_day[irr_pos].index[-1]

            night = day_mask & (df.index < t_start)
            evening = day_mask & (df.index > t_end)
            mid = day_mask & (df.index >= t_start) & (df.index <= t_end)

            df.loc[night, 'DAILY_YIELD_CLEAN'] = 0
            df.loc[evening, 'DAILY_YIELD_CLEAN'] = df.at[t_end, 'DAILY_YIELD']

            vals = df.loc[mid, 'DAILY_YIELD_CLEAN'].values.astype(float)
            invalid = vals <= 0

            if len(vals) > 1:
                drops = np.diff(vals) < 0
                invalid[1:][drops] = True

            idx = df.loc[mid].index
            df.loc[idx[invalid], 'DAILY_YIELD_CLEAN'] = np.nan
            df.loc[idx, 'DAILY_YIELD_CLEAN'] = df.loc[idx, 'DAILY_YIELD_CLEAN'].interpolate()

            prev = df.at[idx[0], 'DAILY_YIELD_CLEAN']
            for t in idx[1:]:
                cur = df.at[t, 'DAILY_YIELD_CLEAN']
                if pd.isna(cur) or cur < prev:
                    df.at[t, 'DAILY_YIELD_CLEAN'] = prev
                else:
                    prev = cur

        df_step_2[sk] = df

    return df_step_2


def clean_total_yield(df_step_2: dict) -> dict:
    df_ps1 = {}

    for sk, df in df_step_2.items():
        df = df.copy()
        df.index = pd.to_datetime(df.index)

        df['TOTAL_YIELD_CLEAN'] = df['TOTAL_YIELD'].copy()

        ts = df.index
        for i in range(1, len(ts)):
            t_prev = ts[i - 1]
            t = ts[i]

            new_day = t.date() != t_prev.date()

            TY_prev = df.at[t_prev, 'TOTAL_YIELD_CLEAN']
            TY_now = df.at[t, 'TOTAL_YIELD']

            DY_prev = df.at[t_prev, 'DAILY_YIELD_CLEAN']
            DY_now = df.at[t, 'DAILY_YIELD_CLEAN']

            if new_day:
                df.at[t, 'TOTAL_YIELD_CLEAN'] = TY_prev
                continue

            expected = TY_prev + (DY_now - DY_prev)
            df.at[t, 'TOTAL_YIELD_CLEAN'] = expected if TY_now < TY_prev else TY_now

        cols = [
            'PLANT_ID', 'SOURCE_KEY',
            'AC_CLEAN', 'DC_CLEAN',
            'DAILY_YIELD_CLEAN', 'TOTAL_YIELD_CLEAN',
            'AMBIENT_TEMPERATURE', 'MODULE_TEMPERATURE',
            'IRRADIATION_CLEAN', 'NUM_OPT', 'NUM_SUBOPT'
        ]

        df_ps1[sk] = df[[c for c in cols if c in df.columns]]

    return df_ps1


def plot_inverter_cleaned(df_ps1: dict, source_key_1: list, idx: int = 0):
    sk = source_key_1[idx]
    df = df_ps1[sk]

    for column in df.columns:
        plt.figure()
        plt.plot(df.index, df[column])
        plt.title(f'{sk} — {column}')
        plt.xlabel('Time')
        plt.ylabel(column)
        plt.xticks(rotation=45)
        plt.tight_layout()


def run_plant_1_pipeline(folder: str):
    plant_1_raw, weather_1_raw, plant_2_raw, weather_2_raw = load_raw_files(folder)
    plant_1_fixed = fix_plant_1_datetime(plant_1_raw)
    plant_1_idx, inv_1, source_key_1 = preprocess_plant_1(plant_1_fixed)

    check_missing_per_inverter(inv_1)
    check_constancy(inv_1, source_key_1)

    agg_inv_1 = aggregate_inverters(inv_1)
    weather_1_idx = preprocess_weather_1(weather_1_raw)
    s1_c = clean_irradiation(weather_1_raw)

    print_time_differences(agg_inv_1, s1_c)
    wea_inv_1 = join_inverter_weather(agg_inv_1, s1_c)

    step1 = clean_ac_dc(wea_inv_1)
    step2 = clean_daily_yield(step1)
    df_ps1 = clean_total_yield(step2)

    return {
        "plant_1_indexed": plant_1_idx,
        "inv_1": inv_1,
        "agg_inv_1": agg_inv_1,
        "weather_1_indexed": weather_1_idx,
        "sensor_clean": s1_c,
        "wea_inv_1": wea_inv_1,
        "df_step_1": step1,
        "df_step_2": step2,
        "df_ps1": df_ps1,
        "source_key_1": source_key_1,
    }


# ===============================================================
#                  PLANT 2 PIPELINE
# ===============================================================

def preprocess_plant_2(plant_2_raw: pd.DataFrame):
    df = plant_2_raw.copy()
    df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"])
    df = df.drop(columns=["PLANT_ID"], errors="ignore")
    df.set_index("DATE_TIME", inplace=True)

    print("Plant 2 missing values:\n", df.isnull().sum())
    print("Plant 2 shape:", df.shape)

    p2_gp = df.groupby("SOURCE_KEY")
    inv_2 = {sk: g for sk, g in p2_gp}
    source_key_2 = df["SOURCE_KEY"].unique().tolist()

    print("Number of Plant 2 inverters:", len(source_key_2))
    print("Source keys:", source_key_2)

    return df, inv_2, source_key_2


def aggregate_inverters_2(inv_2: dict) -> dict:
    agg_inv_2 = {}

    for sk, df in inv_2.items():
        agg_df = df.groupby("DATE_TIME").agg(
            SOURCE_KEY=("SOURCE_KEY", "first"),
            DC_POWER=("DC_POWER", "first"),
            AC_POWER=("AC_POWER", "first"),
            DAILY_YIELD=("DAILY_YIELD", "first"),
            TOTAL_YIELD=("TOTAL_YIELD", "first"),
            NUM_OPT=("Operating_Condition", lambda x: (x == "Optimal").sum()),
            NUM_SUBOPT=("Operating_Condition", lambda x: (x == "Suboptimal").sum()),
        ).reset_index()

        agg_inv_2[sk] = agg_df

    return agg_inv_2


def clean_irradiation_2(weather_2_raw: pd.DataFrame) -> pd.DataFrame:
    df = weather_2_raw.copy()
    df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"])
    df = df.drop(columns=["PLANT_ID", "SOURCE_KEY"], errors="ignore")

    df["HOUR"] = df["DATE_TIME"].dt.hour
    df["EXPECTED_DAY"] = df["HOUR"].between(6, 18)

    df["IRRADIATION_CLEAN"] = df["IRRADIATION"].copy()

    df.loc[(~df["EXPECTED_DAY"]) & (df["IRRADIATION_CLEAN"] > 0), "IRRADIATION_CLEAN"] = 0
    df.loc[(df["EXPECTED_DAY"]) & (df["IRRADIATION_CLEAN"] == 0), "IRRADIATION_CLEAN"] = np.nan

    df["IRRADIATION_CLEAN"] = df["IRRADIATION_CLEAN"].interpolate().fillna(0)

    df = df.set_index("DATE_TIME")
    df = df.drop(columns=["IRRADIATION", "HOUR", "EXPECTED_DAY"], errors="ignore")

    return df


def join_inverter_weather_2(agg_inv_2: dict, s2_c: pd.DataFrame) -> dict:
    wea_inv_2 = {}

    for sk, df in agg_inv_2.items():
        df = df.set_index("DATE_TIME")
        join_df = df.join(s2_c, how="inner")
        wea_inv_2[sk] = join_df

    return wea_inv_2


def run_plant_2_pipeline(folder: str):
    plant_1_raw, weather_1_raw, plant_2_raw, weather_2_raw = load_raw_files(folder)

    plant_2_idx, inv_2, source_key_2 = preprocess_plant_2(plant_2_raw)
    agg_inv_2 = aggregate_inverters_2(inv_2)
    s2_c = clean_irradiation_2(weather_2_raw)
    wea_inv_2 = join_inverter_weather_2(agg_inv_2, s2_c)

    step1 = clean_ac_dc(wea_inv_2)
    step2 = clean_daily_yield(step1)
    df_ps2 = clean_total_yield(step2)

    return {
        "plant_2_indexed": plant_2_idx,
        "inv_2": inv_2,
        "agg_inv_2": agg_inv_2,
        "weather_2_clean": s2_c,
        "wea_inv_2": wea_inv_2,
        "df_step_1_p2": step1,
        "df_step_2_p2": step2,
        "df_ps2": df_ps2,
        "source_key_2": source_key_2,
    }


# Helper: Plant 2 plotting (for Script 2)
def plot_inverter_cleaned_plant2(df_ps2: dict, source_key_2: list, idx: int = 0):
    sk = source_key_2[idx]
    df = df_ps2[sk]

    for column in df.columns:
        plt.figure()
        plt.plot(df.index, df[column])
        plt.title(f'{sk} — {column}')
        plt.xlabel('Time')
        plt.ylabel(column)
        plt.xticks(rotation=45)
        plt.tight_layout()


# ===============================================================
#                  Training PIPELINE (run_inverter_experiment)
# ===============================================================

def run_inverter_experiment(
    inverter_id: str,
    daily_folder: str,
    start_date_str: str,
    end_date_str: str,
    verbose: bool = True,
    save_plots: bool = False,
    plot_folder: str | None = None,
):
    """
    Train multiple regression models for one inverter over multiple days.

    - Loads all daily CSV files in `daily_folder`
    - Filters rows between start_date_str and end_date_str (inclusive)
    - Fixes missing values via interpolation + ffill/bfill
    - Trains 5 models (Linear, Ridge, Lasso, RandomForest, NeuralNet)
      on the combined dataset (all days merged)
    - Trains the same 5 models per-day (per CSV) with a train/test split
    - Computes RMSE and MAE for DC_CLEAN and AC_CLEAN
    - Trains additional NeuralNets (on combined DC & AC) for diagnostics
      (iterations, learning rate, momentum, total weights, loss curve, time)
    - Builds cost/loss curves per model:

        results["loss_curves"]["dc"][model_name] → DC cost per iteration/tree
        results["loss_curves"]["ac"][model_name] → AC cost per iteration/tree

      For:
        - Linear / Ridge / Lasso: single-point MSE (list of length 1)
        - RandomForest: per-tree MSE (growing forest)
        - NeuralNet: MLPRegressor.loss_curve_ (per-iteration loss)

    Parameters
    ----------
    inverter_id : str
        ID of the inverter (used in printouts and plot filenames).
    daily_folder : str
        Folder containing daily CSVs for this inverter.
    start_date_str : str
        e.g. "2020-05-15"
    end_date_str : str
        e.g. "2020-06-17"
    verbose : bool
        If True, print progress and metrics.
    save_plots : bool
        If True, save all plots as PNG files into plot_folder.
    plot_folder : str | None
        Destination folder for plots if save_plots is True.

    Returns
    -------
    results : dict
        {
          "inverter_id": ...,
          "combined": {
              "dc": {model: {"rmse": float, "mae": float}},
              "ac": {model: {"rmse": float, "mae": float}},
              "predictions": {
                  "ModelName_DC": {"y_true": np.array, "y_pred": np.array},
                  "ModelName_AC": {"y_true": np.array, "y_pred": np.array},
              }
          },
          "parallel": {
              "days": [list_of_date_strings_with_enough_samples],
              "dc_rmse": {model: [rmse_per_day...]},
              "ac_rmse": {model: [rmse_per_day...]},
              "dc_mae":  {model: [mae_per_day...]},
              "ac_mae":  {model: [mae_per_day...]},
              "avg_dc_rmse": {model: float},
              "avg_ac_rmse": {model: float},
              "avg_dc_mae":  {model: float},
              "avg_ac_mae":  {model: float},
          },
          "loss_curves": {
              "dc": {model: [cost_values...]},
              "ac": {model: [cost_values...]},
          },
          "nn_diag": {
              "dc": {
                  "iterations": int,
                  "learning_rate": float,
                  "momentum": float,
                  "total_weights": int,
                  "train_time": float,
                  "loss_curve": list_of_floats
              },
              "ac": {
                  "iterations": int,
                  "learning_rate": float,
                  "momentum": float,
                  "total_weights": int,
                  "train_time": float,
                  "loss_curve": list_of_floats
              }
          }
        }
    """

    # ------------------------------------------------------------------
    # 0. CONFIG
    # ------------------------------------------------------------------
    start_date = pd.to_datetime(start_date_str)
    end_date = pd.to_datetime(end_date_str)

    features = [
        "IRRADIATION_CLEAN",
        "AMBIENT_TEMPERATURE",
        "MODULE_TEMPERATURE",
        "DAILY_YIELD_CLEAN",
    ]
    target_dc = "DC_CLEAN"
    target_ac = "AC_CLEAN"

    if save_plots and plot_folder is not None:
        os.makedirs(plot_folder, exist_ok=True)

    # ------------------------------------------------------------------
    # 1. HELPER: LOAD & PREPROCESS ONE CSV
    # ------------------------------------------------------------------
    def load_and_preprocess_csv(path):
        df = pd.read_csv(path)
        df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"])

        # Restrict to date range (defensive)
        df = df[(df["DATE_TIME"] >= start_date) & (df["DATE_TIME"] <= end_date)]
        if df.empty:
            return df

        # Interpolate numerics & fill NaNs
        df = df.interpolate(method="linear")
        df = df.fillna(method="bfill").fillna(method="ffill")

        return df

    # ------------------------------------------------------------------
    # 2. LOAD ALL DAILY FILES
    # ------------------------------------------------------------------
    csv_files = sorted(glob.glob(os.path.join(daily_folder, "*.csv")))

    if verbose:
        print(f"[{inverter_id}] Found {len(csv_files)} CSV files in {daily_folder}")

    daily_dfs = []
    day_labels = []

    for f in csv_files:
        df_day = load_and_preprocess_csv(f)
        if df_day.empty:
            continue

        day_date = df_day["DATE_TIME"].dt.date.iloc[0]
        day_labels.append(str(day_date))
        daily_dfs.append(df_day)

    if not daily_dfs:
        raise ValueError(
            f"[{inverter_id}] No daily data loaded after filtering. "
            f"Check folder and date range."
        )

    # Combined dataframe (all days)
    combined_df = pd.concat(daily_dfs, ignore_index=True)

    # ------------------------------------------------------------------
    # 3. FEATURES + TARGETS (COMBINED)
    # ------------------------------------------------------------------
    X_combined = combined_df[features]
    y_combined_dc = combined_df[target_dc]
    y_combined_ac = combined_df[target_ac]

    # Single train–test split (same indices for DC & AC)
    X_train, X_test, y_train_dc, y_test_dc = train_test_split(
        X_combined, y_combined_dc, test_size=0.2, shuffle=True, random_state=42
    )
    _, _, y_train_ac, y_test_ac = train_test_split(
        X_combined, y_combined_ac, test_size=0.2, shuffle=True, random_state=42
    )

    # ------------------------------------------------------------------
    # 4. DEFINE MODELS (5 TYPES)
    # ------------------------------------------------------------------
    models = {
        "Linear": LinearRegression(),
        "Ridge": Ridge(alpha=1.0),
        "Lasso": Lasso(alpha=0.0005, max_iter=10000, random_state=42),
        "RandomForest": RandomForestRegressor(
            n_estimators=300,
            max_depth=None,
            random_state=42,
            n_jobs=-1,
        ),
        "NeuralNet": MLPRegressor(
            hidden_layer_sizes=(64, 64),
            activation="relu",
            learning_rate_init=0.001,
            momentum=0.9,
            max_iter=2000,
            random_state=42,
        ),
    }

    # Helper: RF per-tree loss curve (MSE)
    def compute_rf_loss_curve(rf_model, X, y_true):
        """
        Compute cost per tree for a RandomForest by incrementally
        averaging trees and measuring MSE on X, y_true.
        """
        n_trees = len(rf_model.estimators_)
        if n_trees == 0:
            return []

        curves = []
        # running sum of predictions
        running_sum = None
        for i, tree in enumerate(rf_model.estimators_):
            pred_i = tree.predict(X)
            if running_sum is None:
                running_sum = pred_i
            else:
                running_sum += pred_i
            y_hat = running_sum / (i + 1)
            mse_i = mean_squared_error(y_true, y_hat)
            curves.append(mse_i)
        return curves

    # ------------------------------------------------------------------
    # 5. TRAIN + EVALUATE ON COMBINED DATA
    # ------------------------------------------------------------------
    combined_results_dc = {}      # model -> {"rmse": ..., "mae": ...}
    combined_results_ac = {}      # model -> {"rmse": ..., "mae": ...}
    combined_pred_store = {}      # "Model_DC/AC" -> {"y_true": arr, "y_pred": arr}

    # cost curves (loss_curves) per model / per target
    loss_curves_dc = {name: [] for name in models.keys()}
    loss_curves_ac = {name: [] for name in models.keys()}

    if verbose:
        print(f"\n[{inverter_id}] ================== COMBINED DATA TRAINING ==================")

    for name, model in models.items():
        # ---- DC ----
        mdl_dc = model
        mdl_dc.fit(X_train, y_train_dc)
        pred_dc = mdl_dc.predict(X_test)

        rmse_dc = np.sqrt(mean_squared_error(y_test_dc, pred_dc))
        mae_dc = mean_absolute_error(y_test_dc, pred_dc)
        mse_dc = mean_squared_error(y_test_dc, pred_dc)

        combined_results_dc[name] = {"rmse": rmse_dc, "mae": mae_dc}
        combined_pred_store[name + "_DC"] = {
            "y_true": y_test_dc.to_numpy(),
            "y_pred": pred_dc,
        }

        # default DC cost curve: single MSE value
        loss_curves_dc[name] = [mse_dc]

        # ---- AC ---- (fresh instance per target)
        if name == "Linear":
            mdl_ac = LinearRegression()
        elif name == "Ridge":
            mdl_ac = Ridge(alpha=1.0)
        elif name == "Lasso":
            mdl_ac = Lasso(alpha=0.0005, max_iter=10000, random_state=42)
        elif name == "RandomForest":
            mdl_ac = RandomForestRegressor(
                n_estimators=300,
                max_depth=None,
                random_state=42,
                n_jobs=-1,
            )
        else:  # NeuralNet
            mdl_ac = MLPRegressor(
                hidden_layer_sizes=(64, 64),
                activation="relu",
                learning_rate_init=0.001,
                momentum=0.9,
                max_iter=2000,
                random_state=42,
            )

        mdl_ac.fit(X_train, y_train_ac)
        pred_ac = mdl_ac.predict(X_test)

        rmse_ac = np.sqrt(mean_squared_error(y_test_ac, pred_ac))
        mae_ac = mean_absolute_error(y_test_ac, pred_ac)
        mse_ac = mean_squared_error(y_test_ac, pred_ac)

        combined_results_ac[name] = {"rmse": rmse_ac, "mae": mae_ac}
        combined_pred_store[name + "_AC"] = {
            "y_true": y_test_ac.to_numpy(),
            "y_pred": pred_ac,
        }

        # default AC cost curve: single MSE value
        loss_curves_ac[name] = [mse_ac]

        # For RandomForest, override single-point cost with per-tree curve
        if name == "RandomForest":
            loss_curves_dc[name] = compute_rf_loss_curve(mdl_dc, X_test, y_test_dc)
            loss_curves_ac[name] = compute_rf_loss_curve(mdl_ac, X_test, y_test_ac)

        if verbose:
            print(
                f"[{inverter_id}] {name:12s} | "
                f"DC  RMSE={rmse_dc:8.3f}, MAE={mae_dc:8.3f} | "
                f"AC  RMSE={rmse_ac:8.3f}, MAE={mae_ac:8.3f}"
            )

    if verbose:
        print(f"[{inverter_id}] ============================================================\n")

    # ------------------------------------------------------------------
    # 6. NEURAL NETWORK DIAGNOSTICS (COMBINED, DC & AC)
    # ------------------------------------------------------------------
    nn_diag = {"dc": {}, "ac": {}}

    # ---- DC diagnostics ----
    nn_dc = MLPRegressor(
        hidden_layer_sizes=(64, 64),
        activation="relu",
        learning_rate_init=0.001,
        momentum=0.9,
        max_iter=2000,
        random_state=42,
    )
    start_time_dc = time.time()
    nn_dc.fit(X_train, y_train_dc)
    end_time_dc = time.time()
    training_time_dc = end_time_dc - start_time_dc
    total_weights_dc = sum(w.size for w in nn_dc.coefs_)

    nn_diag["dc"] = {
        "iterations": nn_dc.n_iter_,
        "learning_rate": nn_dc.learning_rate_init,
        "momentum": nn_dc.momentum,
        "total_weights": int(total_weights_dc),
        "train_time": float(training_time_dc),
        "loss_curve": nn_dc.loss_curve_.copy(),
    }

    # ensure NeuralNet DC loss curve uses full iterative cost
    loss_curves_dc["NeuralNet"] = list(nn_dc.loss_curve_.copy())

    # ---- AC diagnostics ----
    nn_ac = MLPRegressor(
        hidden_layer_sizes=(64, 64),
        activation="relu",
        learning_rate_init=0.001,
        momentum=0.9,
        max_iter=2000,
        random_state=42,
    )
    start_time_ac = time.time()
    nn_ac.fit(X_train, y_train_ac)
    end_time_ac = time.time()
    training_time_ac = end_time_ac - start_time_ac
    total_weights_ac = sum(w.size for w in nn_ac.coefs_)

    nn_diag["ac"] = {
        "iterations": nn_ac.n_iter_,
        "learning_rate": nn_ac.learning_rate_init,
        "momentum": nn_ac.momentum,
        "total_weights": int(total_weights_ac),
        "train_time": float(training_time_ac),
        "loss_curve": nn_ac.loss_curve_.copy(),
    }

    # ensure NeuralNet AC loss curve uses full iterative cost
    loss_curves_ac["NeuralNet"] = list(nn_ac.loss_curve_.copy())

    if verbose:
        print(f"[{inverter_id}] ====== NEURAL NETWORK DIAGNOSTICS (COMBINED DC) ======")
        print(f"Iterations completed : {nn_diag['dc']['iterations']}")
        print(f"Learning rate (init) : {nn_diag['dc']['learning_rate']}")
        print(f"Momentum             : {nn_diag['dc']['momentum']}")
        print(f"Total weights        : {nn_diag['dc']['total_weights']}")
        print(f"Training time (sec)  : {nn_diag['dc']['train_time']:.4f}")
        print("--------------------------------------------------------------")
        print(f"[{inverter_id}] ====== NEURAL NETWORK DIAGNOSTICS (COMBINED AC) ======")
        print(f"Iterations completed : {nn_diag['ac']['iterations']}")
        print(f"Learning rate (init) : {nn_diag['ac']['learning_rate']}")
        print(f"Momentum             : {nn_diag['ac']['momentum']}")
        print(f"Total weights        : {nn_diag['ac']['total_weights']}")
        print(f"Training time (sec)  : {nn_diag['ac']['train_time']:.4f}")
        print("==============================================================\n")

    # ------------------------------------------------------------------
    # 7. PER-DAY (“PARALLEL”) TRAINING
    # ------------------------------------------------------------------
    parallel_rmse_dc = {name: [] for name in models.keys()}
    parallel_rmse_ac = {name: [] for name in models.keys()}
    parallel_mae_dc = {name: [] for name in models.keys()}
    parallel_mae_ac = {name: [] for name in models.keys()}

    valid_day_labels = []  # only days with enough samples

    if verbose:
        print(f"[{inverter_id}] =============== PER-DAY (“PARALLEL”) TRAINING ===============")

    for df_day, day_label in zip(daily_dfs, day_labels):
        # Ensure enough samples to split
        if len(df_day) < 3:
            if verbose:
                print(f"[{inverter_id}] Skipping {day_label}: not enough samples ({len(df_day)})")
            continue

        X_day = df_day[features]
        y_day_dc = df_day[target_dc]
        y_day_ac = df_day[target_ac]

        Xtr, Xte, ytr_dc, yte_dc = train_test_split(
            X_day, y_day_dc, test_size=0.2, shuffle=True, random_state=42
        )
        _, _, ytr_ac, yte_ac = train_test_split(
            X_day, y_day_ac, test_size=0.2, shuffle=True, random_state=42
        )

        valid_day_labels.append(day_label)

        for name in models.keys():
            # Fresh instance per day / target
            if name == "Linear":
                mdl_dc = LinearRegression()
                mdl_ac = LinearRegression()
            elif name == "Ridge":
                mdl_dc = Ridge(alpha=1.0)
                mdl_ac = Ridge(alpha=1.0)
            elif name == "Lasso":
                mdl_dc = Lasso(alpha=0.0005, max_iter=10000, random_state=42)
                mdl_ac = Lasso(alpha=0.0005, max_iter=10000, random_state=42)
            elif name == "RandomForest":
                mdl_dc = RandomForestRegressor(
                    n_estimators=300,
                    max_depth=None,
                    random_state=42,
                    n_jobs=-1,
                )
                mdl_ac = RandomForestRegressor(
                    n_estimators=300,
                    max_depth=None,
                    random_state=42,
                    n_jobs=-1,
                )
            else:  # NeuralNet
                mdl_dc = MLPRegressor(
                    hidden_layer_sizes=(64, 64),
                    activation="relu",
                    learning_rate_init=0.001,
                    momentum=0.9,
                    max_iter=2000,
                    random_state=42,
                )
                mdl_ac = MLPRegressor(
                    hidden_layer_sizes=(64, 64),
                    activation="relu",
                    learning_rate_init=0.001,
                    momentum=0.9,
                    max_iter=2000,
                    random_state=42,
                )

            # DC
            mdl_dc.fit(Xtr, ytr_dc)
            pred_dc = mdl_dc.predict(Xte)
            rmse_dc = np.sqrt(mean_squared_error(yte_dc, pred_dc))
            mae_dc = mean_absolute_error(yte_dc, pred_dc)
            parallel_rmse_dc[name].append(rmse_dc)
            parallel_mae_dc[name].append(mae_dc)

            # AC
            mdl_ac.fit(Xtr, ytr_ac)
            pred_ac = mdl_ac.predict(Xte)
            rmse_ac = np.sqrt(mean_squared_error(yte_ac, pred_ac))
            mae_ac = mean_absolute_error(yte_ac, pred_ac)
            parallel_rmse_ac[name].append(rmse_ac)
            parallel_mae_ac[name].append(mae_ac)

    # Average per-day metrics
    avg_parallel_rmse_dc = {
        name: float(np.mean(vals)) for name, vals in parallel_rmse_dc.items() if len(vals) > 0
    }
    avg_parallel_rmse_ac = {
        name: float(np.mean(vals)) for name, vals in parallel_rmse_ac.items() if len(vals) > 0
    }
    avg_parallel_mae_dc = {
        name: float(np.mean(vals)) for name, vals in parallel_mae_dc.items() if len(vals) > 0
    }
    avg_parallel_mae_ac = {
        name: float(np.mean(vals)) for name, vals in parallel_mae_ac.items() if len(vals) > 0
    }

    if verbose:
        print(f"\n[{inverter_id}] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====")
        for name in models.keys():
            if name in avg_parallel_rmse_dc:
                print(
                    f"{name:12s} | DC  RMSE={avg_parallel_rmse_dc[name]:8.3f}, "
                    f"MAE={avg_parallel_mae_dc[name]:8.3f} | "
                    f"AC  RMSE={avg_parallel_rmse_ac[name]:8.3f}, "
                    f"MAE={avg_parallel_mae_ac[name]:8.3f}"
                )
        print("===================================================================\n")

    # ------------------------------------------------------------------
    # 8. PACK RESULTS INTO A SINGLE DICTIONARY
    # ------------------------------------------------------------------
    results = {
        "inverter_id": inverter_id,
        "combined": {
            "dc": combined_results_dc,
            "ac": combined_results_ac,
            "predictions": combined_pred_store,
        },
        "parallel": {
            "days": valid_day_labels,
            "dc_rmse": parallel_rmse_dc,
            "ac_rmse": parallel_rmse_ac,
            "dc_mae": parallel_mae_dc,
            "ac_mae": parallel_mae_ac,
            "avg_dc_rmse": avg_parallel_rmse_dc,
            "avg_ac_rmse": avg_parallel_rmse_ac,
            "avg_dc_mae": avg_parallel_mae_dc,
            "avg_ac_mae": avg_parallel_mae_ac,
        },
        "loss_curves": {
            "dc": loss_curves_dc,
            "ac": loss_curves_ac,
        },
        "nn_diag": nn_diag,
    }

    # ------------------------------------------------------------------
    # 9. SAVE PLOTS (NO plt.show())
    # ------------------------------------------------------------------
    if save_plots and plot_folder is not None:
        # ---------- A. Combined predictions: Actual vs Predicted + Residuals ----------
        for key, vals in combined_pred_store.items():
            y_true = vals["y_true"]
            y_pred = vals["y_pred"]

            fig, ax = plt.subplots(1, 3, figsize=(18, 5))
            fig.suptitle(f"{inverter_id} — {key} — Combined Model")

            # Scatter: Actual vs Predicted
            ax[0].scatter(y_true, y_pred, alpha=0.7)
            mn = min(y_true.min(), y_pred.min())
            mx = max(y_true.max(), y_pred.max())
            ax[0].plot([mn, mx], [mn, mx], "r--")
            ax[0].set_title("Actual vs Predicted")
            ax[0].set_xlabel("Actual")
            ax[0].set_ylabel("Predicted")
            ax[0].grid(True)

            # Residuals vs Predicted
            residuals = y_true - y_pred
            ax[1].scatter(y_pred, residuals, alpha=0.6)
            ax[1].axhline(0, color="red", linestyle="--")
            ax[1].set_title("Residuals vs Predicted")
            ax[1].set_xlabel("Predicted")
            ax[1].set_ylabel("Residual")
            ax[1].grid(True)

            # Residual distribution
            ax[2].hist(residuals, bins=20, alpha=0.8)
            ax[2].set_title("Residual Distribution")
            ax[2].set_xlabel("Residual")
            ax[2].set_ylabel("Frequency")
            ax[2].grid(True)

            fname = f"{inverter_id}_combined_{key}_performance.png"
            fig.savefig(os.path.join(plot_folder, fname), dpi=150, bbox_inches="tight")
            plt.close(fig)

        # ---------- B. RMSE bar: Combined DC vs AC ----------
        labels = list(models.keys())
        rmse_dc_vals = [combined_results_dc[m]["rmse"] for m in labels]
        rmse_ac_vals = [combined_results_ac[m]["rmse"] for m in labels]

        x = np.arange(len(labels))
        width = 0.35

        fig, ax = plt.subplots(figsize=(10, 5))
        ax.bar(x - width / 2, rmse_dc_vals, width, label="DC RMSE")
        ax.bar(x + width / 2, rmse_ac_vals, width, label="AC RMSE")
        ax.set_xticks(x)
        ax.set_xticklabels(labels)
        ax.set_ylabel("RMSE")
        ax.set_title(f"{inverter_id} — Combined Model RMSE (DC vs AC)")
        ax.legend()
        ax.grid(axis="y")

        fname = f"{inverter_id}_combined_rmse_dc_vs_ac.png"
        fig.savefig(os.path.join(plot_folder, fname), dpi=150, bbox_inches="tight")
        plt.close(fig)

        # ---------- C. Neural Network Loss Curve (DC) ----------
        fig, ax = plt.subplots(figsize=(10, 5))
        loss_arr_dc = np.array(nn_diag["dc"]["loss_curve"])
        ax.plot(loss_arr_dc, label="Training Loss (DC)")

        ax.set_title(f"{inverter_id} — Neural Network Loss Curve (DC, Combined)")
        ax.set_xlabel("Epoch")
        ax.set_ylabel("Loss")
        ax.grid(True)
        ax.legend()

        fname = f"{inverter_id}_nn_loss_curve_DC.png"
        fig.savefig(os.path.join(plot_folder, fname), dpi=150, bbox_inches="tight")
        plt.close(fig)

        # ---------- C2. Neural Network Loss Curve (AC) ----------
        fig, ax = plt.subplots(figsize=(10, 5))
        loss_arr_ac = np.array(nn_diag["ac"]["loss_curve"])
        ax.plot(loss_arr_ac, label="Training Loss (AC)")

        ax.set_title(f"{inverter_id} — Neural Network Loss Curve (AC, Combined)")
        ax.set_xlabel("Epoch")
        ax.set_ylabel("Loss")
        ax.grid(True)
        ax.legend()

        fname = f"{inverter_id}_nn_loss_curve_AC.png"
        fig.savefig(os.path.join(plot_folder, fname), dpi=150, bbox_inches="tight")
        plt.close(fig)

        # ---------- D. NN diagnostic bars (use DC stats) ----------
        fig, axs = plt.subplots(2, 2, figsize=(14, 10))
        axs = axs.ravel()

        axs[0].bar(["Iterations (DC)"], [nn_diag["dc"]["iterations"]])
        axs[0].set_title("Training Iterations (DC)")
        axs[0].grid(axis="y")

        axs[1].bar(["Learning Rate (DC)"], [nn_diag["dc"]["learning_rate"]])
        axs[1].set_title("Learning Rate (DC)")
        axs[1].grid(axis="y")

        axs[2].bar(["Total Weights (DC)"], [nn_diag["dc"]["total_weights"]])
        axs[2].set_title("Model Size (Total Weights, DC)")
        axs[2].grid(axis="y")

        axs[3].bar(["Training Time (s, DC)"], [nn_diag["dc"]["train_time"]])
        axs[3].set_title("Training Time (DC)")
        axs[3].grid(axis="y")

        fig.suptitle(f"{inverter_id} — Neural Network Diagnostics (DC)", y=1.02)
        fig.tight_layout()

        fname = f"{inverter_id}_nn_diagnostics_DC.png"
        fig.savefig(os.path.join(plot_folder, fname), dpi=150, bbox_inches="tight")
        plt.close(fig)

        # ---------- E. Combined vs Average Parallel metrics (RMSE / MAE) ----------
        def plot_combined_vs_parallel(
            metric_combined_dc,
            metric_combined_ac,
            metric_parallel_dc,
            metric_parallel_ac,
            title_suffix,
            suffix_file,
        ):
            labels_loc = list(models.keys())
            x_loc = np.arange(len(labels_loc))
            width_loc = 0.18

            fig_loc, ax_loc = plt.subplots(figsize=(12, 6))

            c_dc = [metric_combined_dc[m] for m in labels_loc]
            c_ac = [metric_combined_ac[m] for m in labels_loc]
            p_dc = [metric_parallel_dc.get(m, np.nan) for m in labels_loc]
            p_ac = [metric_parallel_ac.get(m, np.nan) for m in labels_loc]

            ax_loc.bar(x_loc - 1.5 * width_loc, c_dc, width_loc, label="Combined DC")
            ax_loc.bar(x_loc - 0.5 * width_loc, c_ac, width_loc, label="Combined AC")
            ax_loc.bar(x_loc + 0.5 * width_loc, p_dc, width_loc, label="Avg Parallel DC")
            ax_loc.bar(x_loc + 1.5 * width_loc, p_ac, width_loc, label="Avg Parallel AC")

            ax_loc.set_xticks(x_loc)
            ax_loc.set_xticklabels(labels_loc)
            ax_loc.set_ylabel(title_suffix)
            ax_loc.set_title(f"{inverter_id} — Combined vs Average Parallel ({title_suffix})")
            ax_loc.legend()
            ax_loc.grid(axis="y")

            fig_loc.tight_layout()
            fig_loc.savefig(os.path.join(plot_folder, suffix_file),
                            dpi=150, bbox_inches="tight")
            plt.close(fig_loc)

        # RMSE comparison
        plot_combined_vs_parallel(
            {m: combined_results_dc[m]["rmse"] for m in models.keys()},
            {m: combined_results_ac[m]["rmse"] for m in models.keys()},
            avg_parallel_rmse_dc,
            avg_parallel_rmse_ac,
            "RMSE",
            f"{inverter_id}_combined_vs_parallel_RMSE.png",
        )

        # MAE comparison
        plot_combined_vs_parallel(
            {m: combined_results_dc[m]["mae"] for m in models.keys()},
            {m: combined_results_ac[m]["mae"] for m in models.keys()},
            avg_parallel_mae_dc,
            avg_parallel_mae_ac,
            "MAE",
            f"{inverter_id}_combined_vs_parallel_MAE.png",
        )

        # ---------- F. Per-day (“parallel”) RMSE over time ----------
        days_idx = np.arange(len(valid_day_labels))

        # DC
        fig, ax = plt.subplots(figsize=(14, 6))
        for name in models.keys():
            if len(parallel_rmse_dc[name]) == len(valid_day_labels):
                ax.plot(days_idx, parallel_rmse_dc[name], marker="o", label=name)
        ax.set_xticks(days_idx)
        ax.set_xticklabels(valid_day_labels, rotation=45, ha="right")
        ax.set_ylabel("RMSE (DC)")
        ax.set_xlabel("Day")
        ax.set_title(f"{inverter_id} — Per-Day RMSE (DC, Parallel Training)")
        ax.legend()
        ax.grid(True)
        fig.tight_layout()

        fname = f"{inverter_id}_per_day_rmse_DC.png"
        fig.savefig(os.path.join(plot_folder, fname), dpi=150, bbox_inches="tight")
        plt.close(fig)

        # AC
        fig, ax = plt.subplots(figsize=(14, 6))
        for name in models.keys():
            if len(parallel_rmse_ac[name]) == len(valid_day_labels):
                ax.plot(days_idx, parallel_rmse_ac[name], marker="o", label=name)
        ax.set_xticks(days_idx)
        ax.set_xticklabels(valid_day_labels, rotation=45, ha="right")
        ax.set_ylabel("RMSE (AC)")
        ax.set_xlabel("Day")
        ax.set_title(f"{inverter_id} — Per-Day RMSE (AC, Parallel Training)")
        ax.legend()
        ax.grid(True)
        fig.tight_layout()

        fname = f"{inverter_id}_per_day_rmse_AC.png"
        fig.savefig(os.path.join(plot_folder, fname), dpi=150, bbox_inches="tight")
        plt.close(fig)

    # ------------------------------------------------------------------
    # 10. RETURN RESULTS (your main script pickles per inverter)
    # ------------------------------------------------------------------
    return results



<a id="export-clean"></a>

## 4. Export Cleaned Data to CSV (Plant 1 & 2)

These cells:

- Run Plant 1 and Plant 2 pipelines using `run_plant_1_pipeline` and `run_plant_2_pipeline`.
- Export the cleaned, per-inverter data to CSV files for downstream steps.


Code cell — Plant 1 export (Script 3)

In [None]:
import os
import numpy as np
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt

# from Utilities import run_plant_1_pipeline, plot_inverter_cleaned

if __name__ == "__main__":


####################################################################################################################################################################################################
# Change Path
    # folder = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\In"  # <-- update to your actual path

    # # outfolder   = os.path.join(folder, "00_Comparison_Plots")

    # outfolder = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1"

    base = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data"

    # Input folder (In)
    folder = os.path.join(base, "In")

    # Output folder (00 Excel clean file\Plant 1)
    outfolder = os.path.join(base, "00 Excel clean file", "Plant 1")



# Add Path here
####################################################################################################################################################################################################

    os.makedirs(outfolder, exist_ok=True)

    results = run_plant_1_pipeline(folder)

    df_ps1 = results["df_ps1"]
    source_keys = results["source_key_1"]

    print(f"\nDetected {len(source_keys)} inverters: {source_keys}")

    print("\nSaving cleaned inverter CSV files...")

    for sk in source_keys:
        df = df_ps1[sk]
        outfile = os.path.join(outfolder, f"Plant1_{sk}_clean.csv")
        df.to_csv(outfile)
        print(f"Saved: {outfile}")

    exported_files = [
        f for f in os.listdir(outfolder)
        if f.startswith("Plant1_") and f.endswith("_clean.csv")
    ]

    print("\n-----------------------------------------")
    print("CSV EXPORT SUMMARY")
    print("-----------------------------------------")
    print("Expected: 22 CSV")
    print(f"Found:    {len(exported_files)} CSV\n")

    if len(exported_files) == 22:
        print("✅ SUCCESS — all 22 inverter CSV files exported correctly.")
    else:
        print("❌ ERROR — missing inverter CSV files!")
        print("Files found:", exported_files)

    plot_inverter_cleaned(
        df_ps1=df_ps1,
        source_key_1=source_keys,
        idx=0
    )
    plt.show()


Plant 1 missing values:
 PLANT_ID                   0
SOURCE_KEY                 0
DC_POWER                   0
AC_POWER                   0
DAILY_YIELD                0
TOTAL_YIELD                0
Operating_Condition    23098
dtype: int64
Plant 1 shape: (1021186, 7)
Number of inverters: 22
Source keys: ['1BY6WEcLGh8j5v7', '1IF53ai7Xc0U56Y', '3PZuoBAID5Wc2HD', '7JYdWkrLSPkdwr4', 'McdE0feGgRqW7Ca', 'VHMLBKoKgIrUVDU', 'WRmjgnKYAwPKWDb', 'ZnxXDlPa8U1GXgE', 'ZoEaEvLYb1n2sOq', 'adLQvlD726eNBSB', 'bvBOhCH3iADSZry', 'iCRJl6heRkivqQ3', 'ih0vzX44oOqAx2f', 'pkci93gMrogZuBj', 'rGa61gmuvPhdLxV', 'sjndEbLyjtCKgGv', 'uHbuxQJl8lW7ozc', 'wCURE6d3bPkepu2', 'z9Y9gH1T5YWrNuG', 'zBIq5rxdHJRwDNY', 'zVJPv84UY57bAof', 'YxYtjZvoooNbGkE']

Inverter 1BY6WEcLGh8j5v7 missing values:
PLANT_ID                  0
SOURCE_KEY                0
DC_POWER                  0
AC_POWER                  0
DAILY_YIELD               0
TOTAL_YIELD               0
Operating_Condition    1055
dtype: int64
Shape: (46702, 7)

Inver

  plt.show()


Code cell — Plant 2 export (Script 2)

In [None]:
import os
import numpy as np
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt

# from Utilities import run_plant_2_pipeline, plot_inverter_cleaned_plant2

if __name__ == "__main__":

###############################################################################################################################################################

# Change here 

    # folder = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\In"
    # outfolder = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 2"

    base = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data"

    # Input folder (In)
    folder = os.path.join(base, "In")

    # Output folder (00 Excel clean file\Plant 1)
    outfolder = os.path.join(base, "00 Excel clean file", "Plant 2")

###############################################################################################################################################################
    os.makedirs(outfolder, exist_ok=True)

    results = run_plant_2_pipeline(folder)

    df_ps2 = results["df_ps2"]
    source_keys = results["source_key_2"]

    print(f"\nDetected {len(source_keys)} Plant 2 inverters: {source_keys}")

    print("\nSaving cleaned Plant 2 inverter CSV files...")

    for sk in source_keys:
        df = df_ps2[sk]
        outfile = os.path.join(outfolder, f"Plant2_{sk}_clean.csv")
        df.to_csv(outfile)
        print(f"Saved: {outfile}")

    exported_files = [
        f for f in os.listdir(outfolder)
        if f.startswith("Plant2_") and f.endswith("_clean.csv")
    ]

    print("\n-----------------------------------------")
    print("PLANT 2 CSV EXPORT SUMMARY")
    print("-----------------------------------------")
    print(f"Expected: {len(source_keys)} CSV")
    print(f"Found:    {len(exported_files)} CSV\n")

    if len(exported_files) == len(source_keys):
        print("✅ SUCCESS — all cleaned Plant 2 inverter CSV files exported correctly.")
    else:
        print("❌ ERROR — missing inverter CSV files!")
        print("Files found:", exported_files)

    plot_inverter_cleaned_plant2(
        df_ps2=df_ps2,
        source_key_2=source_keys,
        idx=0
    )
    plt.show()


Plant 2 missing values:
 SOURCE_KEY             0
DC_POWER               0
AC_POWER               0
DAILY_YIELD            0
TOTAL_YIELD            0
Operating_Condition    0
dtype: int64
Plant 2 shape: (1421196, 6)
Number of Plant 2 inverters: 22
Source keys: ['4UPUqMRk7TRMgml', '81aHJ1q11NBPMrL', '9kRcWv60rDACzjR', 'Et9kgGMDl729KT4', 'IQ2d7wF4YD8zU1Q', 'LYwnQax7tkwH5Cb', 'LlT2YUhhzqhg5Sw', 'Mx2yZCDsyf6DPfv', 'NgDl19wMapZy17u', 'PeE6FRyGXUgsRhN', 'Qf4GUc1pJu5T6c6', 'Quc1TzYxW2pYoWX', 'V94E5Ben1TlhnDV', 'WcxssY2VbP4hApt', 'mqwcsP2rE7J0TFp', 'oZ35aAeoifZaQzV', 'oZZkBaNadn6DNKz', 'q49J1IKaHRwDQnt', 'rrq4fwE8jgrTyWY', 'vOuJvMaM2sgwLmb', 'xMbIugepa2P7lBB', 'xoJJ8DcxJEcupym']

Detected 22 Plant 2 inverters: ['4UPUqMRk7TRMgml', '81aHJ1q11NBPMrL', '9kRcWv60rDACzjR', 'Et9kgGMDl729KT4', 'IQ2d7wF4YD8zU1Q', 'LYwnQax7tkwH5Cb', 'LlT2YUhhzqhg5Sw', 'Mx2yZCDsyf6DPfv', 'NgDl19wMapZy17u', 'PeE6FRyGXUgsRhN', 'Qf4GUc1pJu5T6c6', 'Quc1TzYxW2pYoWX', 'V94E5Ben1TlhnDV', 'WcxssY2VbP4hApt', 'mqwcsP2rE7J0TFp', 'o

  plt.show()


<a id="daily-split"></a>

## 5. Daily Splitting per Inverter (Plant 1)

This step:

- Takes each `Plant1_<INV>_clean.csv`
- Filters to the modelling window
- Writes one CSV per day, per inverter:  
  `Daily_Inverter_Data/<INV_ID>/YYYY-MM-DD.csv`


In [None]:
import pandas as pd
import os

###############################################################################################################################################################
# Change here 

input_folder = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1" 
output_root  = input_folder + "/Daily_Inverter_Data"

###############################################################################################################################################################

os.makedirs(output_root, exist_ok=True)

inverters = [
    '1BY6WEcLGh8j5v7', '1IF53ai7Xc0U56Y', '3PZuoBAID5Wc2HD', '7JYdWkrLSPkdwr4',
    'McdE0feGgRqW7Ca', 'VHMLBKoKgIrUVDU', 'WRmjgnKYAwPKWDb', 'ZnxXDlPa8U1GXgE',
    'ZoEaEvLYb1n2sOq', 'adLQvlD726eNBSB', 'bvBOhCH3iADSZry', 'iCRJl6heRkivqQ3',
    'ih0vzX44oOqAx2f', 'pkci93gMrogZuBj', 'rGa61gmuvPhdLxV', 'sjndEbLyjtCKgGv',
    'uHbuxQJl8lW7ozc', 'wCURE6d3bPkepu2', 'z9Y9gH1T5YWrNuG', 'zBIq5rxdHJRwDNY',
    'zVJPv84UY57bAof', 'YxYtjZvoooNbGkE'
]

keep_cols = [
    "DATE_TIME",
    "IRRADIATION_CLEAN",
    "AMBIENT_TEMPERATURE",
    "MODULE_TEMPERATURE",
    "DAILY_YIELD_CLEAN",
    "DC_CLEAN",
    "AC_CLEAN"
]

start_date = pd.to_datetime("15/05/2020", dayfirst=True)
end_date   = pd.to_datetime("17/06/2020 23:59", dayfirst=True)

print(f"Processing {len(inverters)} inverters...\n")

for inv in inverters:
    filename = f"Plant1_{inv}_clean.csv"
    csv_path = os.path.join(input_folder, filename)

    if not os.path.exists(csv_path):
        print(f"⚠ WARNING: {filename} not found — skipping.")
        continue

    print(f"→ Processing inverter {inv}")

    df = pd.read_csv(csv_path)
    df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
    df = df[(df["DATE_TIME"] >= start_date) & (df["DATE_TIME"] <= end_date)]

    df["DATE"] = df["DATE_TIME"].dt.date
    df = df[keep_cols + ["DATE"]]

    inverter_folder = os.path.join(output_root, inv)
    os.makedirs(inverter_folder, exist_ok=True)

    for date, group in df.groupby("DATE"):
        group = group[keep_cols]
        outname = f"{date.strftime('%Y-%m-%d')}.csv"
        save_path = os.path.join(inverter_folder, outname)
        group.reset_index(drop=True).to_csv(save_path, index=False)
        print(f"   Saved {outname}")

print("\n✓ All 22 inverters processed successfully.")


  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


Processing 22 inverters...

→ Processing inverter 1BY6WEcLGh8j5v7
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter 1IF53ai7Xc0U56Y
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter 7JYdWkrLSPkdwr4
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter McdE0feGgRqW7Ca
   Saved 2020-05-15.csv
   Saved 202

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter VHMLBKoKgIrUVDU
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter ZoEaEvLYb1n2sOq
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter adLQvlD726eNBSB
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter iCRJl6heRkivqQ3
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter pkci93gMrogZuBj
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter uHbuxQJl8lW7ozc
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter z9Y9gH1T5YWrNuG
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter zVJPv84UY57bAof
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv

✓ All 22 inverters processed successfully.


<a id="per-inverter-training"></a>

## 6. Per-Inverter Training: Combined & Per-Day

This cell runs `run_inverter_experiment` for **all 22 inverters**, using the daily split:

- Trains Linear, Ridge, Lasso, RandomForest, NeuralNet
- Combined and per-day (parallel) setups
- Saves:
  - Per-inverter `.pkl` result files
  - Per-inverter plots
  - A master `ALL_INVERTER_RESULTS.pkl`


In [None]:
import os
import pickle
# from Utilities import run_inverter_experiment

###############################################################################################################################################################
# Change Here

DATA_BASE = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data"

BASE_DAILY_FOLDER = os.path.join(
    DATA_BASE,
    "00 Excel clean file",
    "Plant 1",
    "Daily_Inverter_Data",
)

SAVE_PLOTS_BASE = os.path.join(
    DATA_BASE,
    "01 Plant1_Inverter_Models",
)


###############################################################################################################################################################

os.makedirs(SAVE_PLOTS_BASE, exist_ok=True)

inverters = [
    '1BY6WEcLGh8j5v7', '1IF53ai7Xc0U56Y', '3PZuoBAID5Wc2HD', '7JYdWkrLSPkdwr4',
    'McdE0feGgRqW7Ca', 'VHMLBKoKgIrUVDU', 'WRmjgnKYAwPKWDb', 'ZnxXDlPa8U1GXgE',
    'ZoEaEvLYb1n2sOq', 'adLQvlD726eNBSB', 'bvBOhCH3iADSZry', 'iCRJl6heRkivqQ3',
    'ih0vzX44oOqAx2f', 'pkci93gMrogZuBj', 'rGa61gmuvPhdLxV', 'sjndEbLyjtCKgGv',
    'uHbuxQJl8lW7ozc', 'wCURE6d3bPkepu2', 'z9Y9gH1T5YWrNuG', 'zBIq5rxdHJRwDNY',
    'zVJPv84UY57bAof', 'YxYtjZvoooNbGkE'
]

all_results = {}

for inv in inverters:
    print(f"\n======================")
    print(f" TRAINING INVERTER: {inv}")
    print(f"======================\n")

    inverter_daily_path = os.path.join(BASE_DAILY_FOLDER, inv)
    inverter_plot_path = os.path.join(SAVE_PLOTS_BASE, inv)
    os.makedirs(inverter_plot_path, exist_ok=True)

    results = run_inverter_experiment(
        inverter_id=inv,
        daily_folder=inverter_daily_path,
        start_date_str="2020-05-15",
        end_date_str="2020-06-17",
        verbose=True,
        save_plots=True,
        plot_folder=inverter_plot_path
    )

    all_results[inv] = results

    results_file = os.path.join(SAVE_PLOTS_BASE, f"{inv}_results.pkl")
    with open(results_file, "wb") as f:
        pickle.dump(results, f)

    print(f"✓ Saved results for {inv}")
    print(f"✓ Plots saved to: {inverter_plot_path}")
    print("------------------------------------------------------")

master_save_path = os.path.join(SAVE_PLOTS_BASE, "ALL_INVERTER_RESULTS.pkl")
with open(master_save_path, "wb") as f:
    pickle.dump(all_results, f)

print("\n======================================================")
print("   FINISHED TRAINING ALL 22 INVERTERS — SUCCESS 🎉")
print("======================================================")



 TRAINING INVERTER: 1BY6WEcLGh8j5v7

[1BY6WEcLGh8j5v7] Found 34 CSV files in C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1\Daily_Inverter_Data\1BY6WEcLGh8j5v7


  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[1BY6WEcLGh8j5v7] Linear       | DC  RMSE= 640.219, MAE= 289.414 | AC  RMSE=  62.651, MAE=  28.532
[1BY6WEcLGh8j5v7] Ridge        | DC  RMSE= 638.183, MAE= 294.664 | AC  RMSE=  62.461, MAE=  29.032
[1BY6WEcLGh8j5v7] Lasso        | DC  RMSE= 640.218, MAE= 289.414 | AC  RMSE=  62.651, MAE=  28.532




[1BY6WEcLGh8j5v7] RandomForest | DC  RMSE= 633.565, MAE= 259.661 | AC  RMSE=  61.483, MAE=  25.281
[1BY6WEcLGh8j5v7] NeuralNet    | DC  RMSE=1125.345, MAE= 661.600 | AC  RMSE= 120.576, MAE=  76.789

Iterations completed : 159
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.5778
--------------------------------------------------------------
Iterations completed : 65
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.6867

[1BY6WEcLGh8j5v7] Skipping 2020-06-17: not enough samples (1)

[1BY6WEcLGh8j5v7] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 432.322, MAE= 243.864 | AC  RMSE=  42.309, MAE=  23.957
Ridge        | DC  RMSE= 637.510, MAE= 382.324 | AC  RMSE=  62.225, MAE=  37.404
Lasso        | DC  RMSE= 432.323, MAE= 243.864 | AC  RMSE=  42.309, MAE=  23.958
RandomForest | DC  RMSE= 410.735, MAE= 202.129 | AC  RMSE=  40.094, MAE=  19.744
Ne

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[1IF53ai7Xc0U56Y] Linear       | DC  RMSE= 512.283, MAE= 258.041 | AC  RMSE=  50.230, MAE=  25.565
[1IF53ai7Xc0U56Y] Ridge        | DC  RMSE= 513.131, MAE= 258.610 | AC  RMSE=  50.292, MAE=  25.592
[1IF53ai7Xc0U56Y] Lasso        | DC  RMSE= 512.282, MAE= 258.040 | AC  RMSE=  50.230, MAE=  25.565




[1IF53ai7Xc0U56Y] RandomForest | DC  RMSE= 473.567, MAE= 207.428 | AC  RMSE=  46.024, MAE=  20.182
[1IF53ai7Xc0U56Y] NeuralNet    | DC  RMSE=1228.579, MAE= 732.206 | AC  RMSE=  95.042, MAE=  59.583

Iterations completed : 149
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.3694
--------------------------------------------------------------
Iterations completed : 309
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.9859

[1IF53ai7Xc0U56Y] Skipping 2020-06-17: not enough samples (1)

[1IF53ai7Xc0U56Y] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 298.397, MAE= 183.777 | AC  RMSE=  29.326, MAE=  18.227
Ridge        | DC  RMSE= 580.618, MAE= 365.612 | AC  RMSE=  56.689, MAE=  35.788
Lasso        | DC  RMSE= 298.397, MAE= 183.777 | AC  RMSE=  29.326, MAE=  18.227
RandomForest | DC  RMSE= 384.532, MAE= 203.085 | AC  RMSE=  37.532, MAE=  19.794
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[3PZuoBAID5Wc2HD] Linear       | DC  RMSE= 555.209, MAE= 286.299 | AC  RMSE=  54.317, MAE=  28.259
[3PZuoBAID5Wc2HD] Ridge        | DC  RMSE= 557.047, MAE= 290.841 | AC  RMSE=  54.475, MAE=  28.675
[3PZuoBAID5Wc2HD] Lasso        | DC  RMSE= 555.209, MAE= 286.299 | AC  RMSE=  54.316, MAE=  28.259




[3PZuoBAID5Wc2HD] RandomForest | DC  RMSE= 536.878, MAE= 241.628 | AC  RMSE=  52.118, MAE=  23.539
[3PZuoBAID5Wc2HD] NeuralNet    | DC  RMSE=1243.500, MAE= 752.142 | AC  RMSE= 121.732, MAE=  78.048

Iterations completed : 142
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.6143
--------------------------------------------------------------
Iterations completed : 80
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.8905

[3PZuoBAID5Wc2HD] Skipping 2020-06-17: not enough samples (1)

[3PZuoBAID5Wc2HD] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 376.457, MAE= 222.498 | AC  RMSE=  36.888, MAE=  21.926
Ridge        | DC  RMSE= 618.351, MAE= 388.515 | AC  RMSE=  60.381, MAE=  38.031
Lasso        | DC  RMSE= 376.457, MAE= 222.498 | AC  RMSE=  36.888, MAE=  21.926
RandomForest | DC  RMSE= 453.210, MAE= 236.394 | AC  RMSE=  44.009, MAE=  22.908
Ne

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[7JYdWkrLSPkdwr4] Linear       | DC  RMSE= 435.458, MAE= 230.234 | AC  RMSE=  42.733, MAE=  22.812
[7JYdWkrLSPkdwr4] Ridge        | DC  RMSE= 452.186, MAE= 240.433 | AC  RMSE=  44.352, MAE=  23.813
[7JYdWkrLSPkdwr4] Lasso        | DC  RMSE= 435.459, MAE= 230.234 | AC  RMSE=  42.734, MAE=  22.812




[7JYdWkrLSPkdwr4] RandomForest | DC  RMSE= 404.299, MAE= 175.333 | AC  RMSE=  39.286, MAE=  17.035
[7JYdWkrLSPkdwr4] NeuralNet    | DC  RMSE=1015.203, MAE= 592.773 | AC  RMSE= 111.715, MAE=  71.192

Iterations completed : 254
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.3424
--------------------------------------------------------------
Iterations completed : 107
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.9219

[7JYdWkrLSPkdwr4] Skipping 2020-06-17: not enough samples (1)

[7JYdWkrLSPkdwr4] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 274.588, MAE= 165.459 | AC  RMSE=  26.947, MAE=  16.338
Ridge        | DC  RMSE= 558.763, MAE= 353.511 | AC  RMSE=  54.499, MAE=  34.576
Lasso        | DC  RMSE= 274.588, MAE= 165.459 | AC  RMSE=  26.947, MAE=  16.339
RandomForest | DC  RMSE= 384.100, MAE= 204.433 | AC  RMSE=  37.113, MAE=  19.749
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[McdE0feGgRqW7Ca] Linear       | DC  RMSE= 394.363, MAE= 232.647 | AC  RMSE=  38.735, MAE=  23.037
[McdE0feGgRqW7Ca] Ridge        | DC  RMSE= 401.529, MAE= 237.130 | AC  RMSE=  39.432, MAE=  23.471
[McdE0feGgRqW7Ca] Lasso        | DC  RMSE= 394.363, MAE= 232.647 | AC  RMSE=  38.735, MAE=  23.037




[McdE0feGgRqW7Ca] RandomForest | DC  RMSE= 407.080, MAE= 176.166 | AC  RMSE=  39.607, MAE=  17.097
[McdE0feGgRqW7Ca] NeuralNet    | DC  RMSE=1201.180, MAE= 725.128 | AC  RMSE= 154.322, MAE= 108.511

Iterations completed : 164
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.2909
--------------------------------------------------------------
Iterations completed : 202
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.8946

[McdE0feGgRqW7Ca] Skipping 2020-06-17: not enough samples (1)

[McdE0feGgRqW7Ca] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 374.068, MAE= 208.997 | AC  RMSE=  36.606, MAE=  20.565
Ridge        | DC  RMSE= 646.402, MAE= 391.743 | AC  RMSE=  63.115, MAE=  38.327
Lasso        | DC  RMSE= 374.067, MAE= 208.997 | AC  RMSE=  36.606, MAE=  20.566
RandomForest | DC  RMSE= 372.907, MAE= 190.922 | AC  RMSE=  36.403, MAE=  18.650
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[VHMLBKoKgIrUVDU] Linear       | DC  RMSE= 480.057, MAE= 254.530 | AC  RMSE=  47.060, MAE=  25.196
[VHMLBKoKgIrUVDU] Ridge        | DC  RMSE= 497.682, MAE= 268.269 | AC  RMSE=  48.766, MAE=  26.522
[VHMLBKoKgIrUVDU] Lasso        | DC  RMSE= 480.058, MAE= 254.531 | AC  RMSE=  47.061, MAE=  25.197




[VHMLBKoKgIrUVDU] RandomForest | DC  RMSE= 460.679, MAE= 201.939 | AC  RMSE=  44.699, MAE=  19.631
[VHMLBKoKgIrUVDU] NeuralNet    | DC  RMSE= 983.276, MAE= 566.988 | AC  RMSE= 120.755, MAE=  78.219

Iterations completed : 305
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.9329
--------------------------------------------------------------
Iterations completed : 85
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.8428

[VHMLBKoKgIrUVDU] Skipping 2020-06-17: not enough samples (1)

[VHMLBKoKgIrUVDU] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 310.181, MAE= 183.417 | AC  RMSE=  30.431, MAE=  18.103
Ridge        | DC  RMSE= 593.993, MAE= 374.193 | AC  RMSE=  57.963, MAE=  36.603
Lasso        | DC  RMSE= 310.181, MAE= 183.418 | AC  RMSE=  30.432, MAE=  18.104
RandomForest | DC  RMSE= 392.375, MAE= 210.263 | AC  RMSE=  38.068, MAE=  20.464
Ne

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[WRmjgnKYAwPKWDb] Linear       | DC  RMSE= 635.043, MAE= 315.315 | AC  RMSE=  62.068, MAE=  31.071
[WRmjgnKYAwPKWDb] Ridge        | DC  RMSE= 634.799, MAE= 319.921 | AC  RMSE=  62.028, MAE=  31.496
[WRmjgnKYAwPKWDb] Lasso        | DC  RMSE= 635.043, MAE= 315.315 | AC  RMSE=  62.067, MAE=  31.071




[WRmjgnKYAwPKWDb] RandomForest | DC  RMSE= 613.868, MAE= 263.139 | AC  RMSE=  59.731, MAE=  25.598
[WRmjgnKYAwPKWDb] NeuralNet    | DC  RMSE=1234.035, MAE= 755.753 | AC  RMSE= 125.337, MAE=  79.749

Iterations completed : 142
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.6410
--------------------------------------------------------------
Iterations completed : 63
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.7012

[WRmjgnKYAwPKWDb] Skipping 2020-06-17: not enough samples (1)

[WRmjgnKYAwPKWDb] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 395.838, MAE= 234.808 | AC  RMSE=  38.801, MAE=  23.125
Ridge        | DC  RMSE= 621.191, MAE= 393.014 | AC  RMSE=  60.667, MAE=  38.454
Lasso        | DC  RMSE= 395.838, MAE= 234.808 | AC  RMSE=  38.801, MAE=  23.125
RandomForest | DC  RMSE= 480.837, MAE= 252.339 | AC  RMSE=  46.708, MAE=  24.530
Ne

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[ZnxXDlPa8U1GXgE] Linear       | DC  RMSE= 491.807, MAE= 251.763 | AC  RMSE=  48.242, MAE=  24.948
[ZnxXDlPa8U1GXgE] Ridge        | DC  RMSE= 508.834, MAE= 264.638 | AC  RMSE=  49.887, MAE=  26.199
[ZnxXDlPa8U1GXgE] Lasso        | DC  RMSE= 491.808, MAE= 251.764 | AC  RMSE=  48.243, MAE=  24.949




[ZnxXDlPa8U1GXgE] RandomForest | DC  RMSE= 441.339, MAE= 197.046 | AC  RMSE=  42.902, MAE=  19.123
[ZnxXDlPa8U1GXgE] NeuralNet    | DC  RMSE=1256.312, MAE= 750.414 | AC  RMSE= 119.841, MAE=  77.321

Iterations completed : 102
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.9936
--------------------------------------------------------------
Iterations completed : 183
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.9439

[ZnxXDlPa8U1GXgE] Skipping 2020-06-17: not enough samples (1)

[ZnxXDlPa8U1GXgE] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 320.476, MAE= 189.484 | AC  RMSE=  31.370, MAE=  18.667
Ridge        | DC  RMSE= 609.114, MAE= 381.444 | AC  RMSE=  59.359, MAE=  37.274
Lasso        | DC  RMSE= 320.476, MAE= 189.484 | AC  RMSE=  31.370, MAE=  18.667
RandomForest | DC  RMSE= 370.336, MAE= 191.685 | AC  RMSE=  36.078, MAE=  18.689
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[ZoEaEvLYb1n2sOq] Linear       | DC  RMSE= 361.991, MAE= 209.717 | AC  RMSE=  35.652, MAE=  20.810
[ZoEaEvLYb1n2sOq] Ridge        | DC  RMSE= 362.182, MAE= 213.223 | AC  RMSE=  35.672, MAE=  21.129
[ZoEaEvLYb1n2sOq] Lasso        | DC  RMSE= 361.990, MAE= 209.717 | AC  RMSE=  35.651, MAE=  20.809




[ZoEaEvLYb1n2sOq] RandomForest | DC  RMSE= 304.118, MAE= 141.530 | AC  RMSE=  29.582, MAE=  13.803
[ZoEaEvLYb1n2sOq] NeuralNet    | DC  RMSE=1010.620, MAE= 620.919 | AC  RMSE= 104.605, MAE=  67.811

Iterations completed : 293
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.8398
--------------------------------------------------------------
Iterations completed : 130
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.3114

[ZoEaEvLYb1n2sOq] Skipping 2020-06-17: not enough samples (1)

[ZoEaEvLYb1n2sOq] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 255.477, MAE= 152.724 | AC  RMSE=  25.165, MAE=  15.116
Ridge        | DC  RMSE= 623.295, MAE= 389.833 | AC  RMSE=  60.815, MAE=  38.118
Lasso        | DC  RMSE= 255.478, MAE= 152.725 | AC  RMSE=  25.166, MAE=  15.117
RandomForest | DC  RMSE= 322.853, MAE= 168.993 | AC  RMSE=  31.427, MAE=  16.378
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[adLQvlD726eNBSB] Linear       | DC  RMSE= 529.717, MAE= 270.006 | AC  RMSE=  51.878, MAE=  26.713
[adLQvlD726eNBSB] Ridge        | DC  RMSE= 531.464, MAE= 274.419 | AC  RMSE=  52.029, MAE=  27.118
[adLQvlD726eNBSB] Lasso        | DC  RMSE= 529.717, MAE= 270.006 | AC  RMSE=  51.878, MAE=  26.712




[adLQvlD726eNBSB] RandomForest | DC  RMSE= 513.616, MAE= 232.845 | AC  RMSE=  49.906, MAE=  22.626
[adLQvlD726eNBSB] NeuralNet    | DC  RMSE=1266.032, MAE= 750.554 | AC  RMSE= 123.280, MAE=  78.321

Iterations completed : 149
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.6913
--------------------------------------------------------------
Iterations completed : 99
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.0237

[adLQvlD726eNBSB] Skipping 2020-06-17: not enough samples (1)

[adLQvlD726eNBSB] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 311.611, MAE= 189.659 | AC  RMSE=  30.606, MAE=  18.779
Ridge        | DC  RMSE= 591.154, MAE= 372.774 | AC  RMSE=  57.709, MAE=  36.475
Lasso        | DC  RMSE= 311.610, MAE= 189.659 | AC  RMSE=  30.606, MAE=  18.779
RandomForest | DC  RMSE= 401.458, MAE= 211.859 | AC  RMSE=  38.998, MAE=  20.488
Ne

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[bvBOhCH3iADSZry] Linear       | DC  RMSE= 513.512, MAE= 266.190 | AC  RMSE=  50.289, MAE=  26.232
[bvBOhCH3iADSZry] Ridge        | DC  RMSE= 516.654, MAE= 270.651 | AC  RMSE=  50.582, MAE=  26.667
[bvBOhCH3iADSZry] Lasso        | DC  RMSE= 513.512, MAE= 266.191 | AC  RMSE=  50.289, MAE=  26.232




[bvBOhCH3iADSZry] RandomForest | DC  RMSE= 538.761, MAE= 221.013 | AC  RMSE=  52.742, MAE=  21.559
[bvBOhCH3iADSZry] NeuralNet    | DC  RMSE=1076.594, MAE= 634.566 | AC  RMSE= 111.821, MAE=  71.441

Iterations completed : 199
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.1487
--------------------------------------------------------------
Iterations completed : 129
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.3211

[bvBOhCH3iADSZry] Skipping 2020-06-17: not enough samples (1)

[bvBOhCH3iADSZry] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 360.430, MAE= 205.439 | AC  RMSE=  35.301, MAE=  20.210
Ridge        | DC  RMSE= 597.596, MAE= 357.876 | AC  RMSE=  58.333, MAE=  35.030
Lasso        | DC  RMSE= 360.431, MAE= 205.439 | AC  RMSE=  35.301, MAE=  20.211
RandomForest | DC  RMSE= 364.611, MAE= 172.643 | AC  RMSE=  35.393, MAE=  16.749
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[iCRJl6heRkivqQ3] Linear       | DC  RMSE= 304.280, MAE= 185.234 | AC  RMSE=  30.112, MAE=  18.474
[iCRJl6heRkivqQ3] Ridge        | DC  RMSE= 321.718, MAE= 193.201 | AC  RMSE=  31.785, MAE=  19.209
[iCRJl6heRkivqQ3] Lasso        | DC  RMSE= 304.280, MAE= 185.234 | AC  RMSE=  30.112, MAE=  18.474




[iCRJl6heRkivqQ3] RandomForest | DC  RMSE= 252.200, MAE= 119.001 | AC  RMSE=  24.745, MAE=  11.655
[iCRJl6heRkivqQ3] NeuralNet    | DC  RMSE=1210.712, MAE= 753.639 | AC  RMSE= 105.843, MAE=  68.039

Iterations completed : 245
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.9786
--------------------------------------------------------------
Iterations completed : 208
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.3466

[iCRJl6heRkivqQ3] Skipping 2020-06-17: not enough samples (1)

[iCRJl6heRkivqQ3] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 267.080, MAE= 161.743 | AC  RMSE=  26.246, MAE=  15.955
Ridge        | DC  RMSE= 612.931, MAE= 380.264 | AC  RMSE=  59.781, MAE=  37.186
Lasso        | DC  RMSE= 267.080, MAE= 161.744 | AC  RMSE=  26.247, MAE=  15.955
RandomForest | DC  RMSE= 334.788, MAE= 174.717 | AC  RMSE=  32.528, MAE=  16.974
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[ih0vzX44oOqAx2f] Linear       | DC  RMSE= 438.215, MAE= 236.056 | AC  RMSE=  43.019, MAE=  23.391
[ih0vzX44oOqAx2f] Ridge        | DC  RMSE= 453.195, MAE= 246.847 | AC  RMSE=  44.463, MAE=  24.429
[ih0vzX44oOqAx2f] Lasso        | DC  RMSE= 438.216, MAE= 236.056 | AC  RMSE=  43.019, MAE=  23.392




[ih0vzX44oOqAx2f] RandomForest | DC  RMSE= 389.547, MAE= 181.785 | AC  RMSE=  37.826, MAE=  17.690
[ih0vzX44oOqAx2f] NeuralNet    | DC  RMSE=1163.536, MAE= 692.552 | AC  RMSE= 116.854, MAE=  74.865

Iterations completed : 130
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.5518
--------------------------------------------------------------
Iterations completed : 125
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.3821

[ih0vzX44oOqAx2f] Skipping 2020-06-17: not enough samples (1)

[ih0vzX44oOqAx2f] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 302.967, MAE= 182.615 | AC  RMSE=  29.636, MAE=  17.970
Ridge        | DC  RMSE= 584.193, MAE= 364.075 | AC  RMSE=  56.941, MAE=  35.565
Lasso        | DC  RMSE= 302.967, MAE= 182.614 | AC  RMSE=  29.635, MAE=  17.970
RandomForest | DC  RMSE= 348.174, MAE= 181.495 | AC  RMSE=  33.822, MAE=  17.560
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[pkci93gMrogZuBj] Linear       | DC  RMSE= 515.146, MAE= 268.890 | AC  RMSE=  50.425, MAE=  26.549
[pkci93gMrogZuBj] Ridge        | DC  RMSE= 521.301, MAE= 275.381 | AC  RMSE=  51.025, MAE=  27.186
[pkci93gMrogZuBj] Lasso        | DC  RMSE= 515.146, MAE= 268.890 | AC  RMSE=  50.424, MAE=  26.549




[pkci93gMrogZuBj] RandomForest | DC  RMSE= 514.760, MAE= 222.399 | AC  RMSE=  49.878, MAE=  21.617
[pkci93gMrogZuBj] NeuralNet    | DC  RMSE=1025.317, MAE= 614.817 | AC  RMSE= 122.881, MAE=  81.210

Iterations completed : 348
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 3.1711
--------------------------------------------------------------
Iterations completed : 152
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.1153

[pkci93gMrogZuBj] Skipping 2020-06-17: not enough samples (1)

[pkci93gMrogZuBj] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 375.561, MAE= 224.896 | AC  RMSE=  36.797, MAE=  22.120
Ridge        | DC  RMSE= 632.894, MAE= 405.762 | AC  RMSE=  61.763, MAE=  39.690
Lasso        | DC  RMSE= 375.561, MAE= 224.897 | AC  RMSE=  36.798, MAE=  22.121
RandomForest | DC  RMSE= 455.116, MAE= 234.796 | AC  RMSE=  44.437, MAE=  23.045
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[rGa61gmuvPhdLxV] Linear       | DC  RMSE= 539.042, MAE= 266.847 | AC  RMSE=  52.783, MAE=  26.387
[rGa61gmuvPhdLxV] Ridge        | DC  RMSE= 530.175, MAE= 263.460 | AC  RMSE=  51.929, MAE=  26.055
[rGa61gmuvPhdLxV] Lasso        | DC  RMSE= 539.041, MAE= 266.847 | AC  RMSE=  52.781, MAE=  26.386




[rGa61gmuvPhdLxV] RandomForest | DC  RMSE= 515.303, MAE= 201.132 | AC  RMSE=  49.899, MAE=  19.530
[rGa61gmuvPhdLxV] NeuralNet    | DC  RMSE=1157.907, MAE= 697.309 | AC  RMSE= 127.105, MAE=  83.603

Iterations completed : 159
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.3337
--------------------------------------------------------------
Iterations completed : 80
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.6387

[rGa61gmuvPhdLxV] Skipping 2020-06-17: not enough samples (1)

[rGa61gmuvPhdLxV] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 374.238, MAE= 219.146 | AC  RMSE=  36.627, MAE=  21.572
Ridge        | DC  RMSE= 659.049, MAE= 411.425 | AC  RMSE=  64.255, MAE=  40.212
Lasso        | DC  RMSE= 374.239, MAE= 219.147 | AC  RMSE=  36.628, MAE=  21.572
RandomForest | DC  RMSE= 467.927, MAE= 236.999 | AC  RMSE=  45.502, MAE=  22.985
Ne

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[sjndEbLyjtCKgGv] Linear       | DC  RMSE= 363.481, MAE= 221.476 | AC  RMSE=  35.777, MAE=  21.963
[sjndEbLyjtCKgGv] Ridge        | DC  RMSE= 364.917, MAE= 222.354 | AC  RMSE=  35.922, MAE=  22.036
[sjndEbLyjtCKgGv] Lasso        | DC  RMSE= 363.480, MAE= 221.475 | AC  RMSE=  35.776, MAE=  21.962




[sjndEbLyjtCKgGv] RandomForest | DC  RMSE= 374.071, MAE= 166.524 | AC  RMSE=  36.280, MAE=  16.277
[sjndEbLyjtCKgGv] NeuralNet    | DC  RMSE=1195.801, MAE= 721.980 | AC  RMSE= 111.932, MAE=  73.180

Iterations completed : 169
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.3085
--------------------------------------------------------------
Iterations completed : 150
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.0837

[sjndEbLyjtCKgGv] Skipping 2020-06-17: not enough samples (1)

[sjndEbLyjtCKgGv] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 338.560, MAE= 192.626 | AC  RMSE=  33.224, MAE=  19.026
Ridge        | DC  RMSE= 639.349, MAE= 391.554 | AC  RMSE=  62.459, MAE=  38.355
Lasso        | DC  RMSE= 338.560, MAE= 192.627 | AC  RMSE=  33.224, MAE=  19.026
RandomForest | DC  RMSE= 350.631, MAE= 179.126 | AC  RMSE=  33.990, MAE=  17.469
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[uHbuxQJl8lW7ozc] Linear       | DC  RMSE= 530.634, MAE= 272.426 | AC  RMSE=  51.912, MAE=  26.901
[uHbuxQJl8lW7ozc] Ridge        | DC  RMSE= 532.076, MAE= 276.410 | AC  RMSE=  52.057, MAE=  27.297
[uHbuxQJl8lW7ozc] Lasso        | DC  RMSE= 530.633, MAE= 272.426 | AC  RMSE=  51.912, MAE=  26.900




[uHbuxQJl8lW7ozc] RandomForest | DC  RMSE= 526.124, MAE= 223.546 | AC  RMSE=  51.189, MAE=  21.797
[uHbuxQJl8lW7ozc] NeuralNet    | DC  RMSE=1150.434, MAE= 737.658 | AC  RMSE= 114.756, MAE=  76.532

Iterations completed : 286
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.7906
--------------------------------------------------------------
Iterations completed : 158
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.4806

[uHbuxQJl8lW7ozc] Skipping 2020-06-17: not enough samples (1)

[uHbuxQJl8lW7ozc] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 384.746, MAE= 231.101 | AC  RMSE=  37.691, MAE=  22.720
Ridge        | DC  RMSE= 634.875, MAE= 404.473 | AC  RMSE=  61.961, MAE=  39.568
Lasso        | DC  RMSE= 384.746, MAE= 231.102 | AC  RMSE=  37.692, MAE=  22.721
RandomForest | DC  RMSE= 464.084, MAE= 244.171 | AC  RMSE=  45.315, MAE=  23.906
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[wCURE6d3bPkepu2] Linear       | DC  RMSE= 717.527, MAE= 266.182 | AC  RMSE=  70.175, MAE=  26.247
[wCURE6d3bPkepu2] Ridge        | DC  RMSE= 711.656, MAE= 266.718 | AC  RMSE=  69.608, MAE=  26.287
[wCURE6d3bPkepu2] Lasso        | DC  RMSE= 717.526, MAE= 266.182 | AC  RMSE=  70.174, MAE=  26.246




[wCURE6d3bPkepu2] RandomForest | DC  RMSE= 702.231, MAE= 230.019 | AC  RMSE=  68.489, MAE=  22.412
[wCURE6d3bPkepu2] NeuralNet    | DC  RMSE=1304.656, MAE= 803.552 | AC  RMSE= 132.437, MAE=  86.463

Iterations completed : 140
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.5298
--------------------------------------------------------------
Iterations completed : 66
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.6869

[wCURE6d3bPkepu2] Skipping 2020-06-17: not enough samples (1)

[wCURE6d3bPkepu2] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 390.621, MAE= 214.333 | AC  RMSE=  38.178, MAE=  21.033
Ridge        | DC  RMSE= 658.043, MAE= 393.798 | AC  RMSE=  64.190, MAE=  38.496
Lasso        | DC  RMSE= 390.620, MAE= 214.333 | AC  RMSE=  38.177, MAE=  21.032
RandomForest | DC  RMSE= 400.626, MAE= 195.026 | AC  RMSE=  39.183, MAE=  19.106
Ne

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[z9Y9gH1T5YWrNuG] Linear       | DC  RMSE= 620.915, MAE= 280.558 | AC  RMSE=  60.703, MAE=  27.665
[z9Y9gH1T5YWrNuG] Ridge        | DC  RMSE= 611.486, MAE= 278.907 | AC  RMSE=  59.795, MAE=  27.516
[z9Y9gH1T5YWrNuG] Lasso        | DC  RMSE= 620.914, MAE= 280.557 | AC  RMSE=  60.702, MAE=  27.665




[z9Y9gH1T5YWrNuG] RandomForest | DC  RMSE= 600.068, MAE= 241.562 | AC  RMSE=  58.179, MAE=  23.434
[z9Y9gH1T5YWrNuG] NeuralNet    | DC  RMSE=1205.431, MAE= 739.034 | AC  RMSE= 120.324, MAE=  81.422

Iterations completed : 125
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.9917
--------------------------------------------------------------
Iterations completed : 89
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.7996

[z9Y9gH1T5YWrNuG] Skipping 2020-06-17: not enough samples (1)

[z9Y9gH1T5YWrNuG] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 399.197, MAE= 229.444 | AC  RMSE=  39.056, MAE=  22.532
Ridge        | DC  RMSE= 624.505, MAE= 383.688 | AC  RMSE=  60.951, MAE=  37.516
Lasso        | DC  RMSE= 399.196, MAE= 229.445 | AC  RMSE=  39.056, MAE=  22.532
RandomForest | DC  RMSE= 416.340, MAE= 210.523 | AC  RMSE=  40.440, MAE=  20.400
Ne

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[zBIq5rxdHJRwDNY] Linear       | DC  RMSE= 422.115, MAE= 250.568 | AC  RMSE=  41.471, MAE=  24.728
[zBIq5rxdHJRwDNY] Ridge        | DC  RMSE= 435.244, MAE= 258.371 | AC  RMSE=  42.743, MAE=  25.521
[zBIq5rxdHJRwDNY] Lasso        | DC  RMSE= 422.115, MAE= 250.568 | AC  RMSE=  41.472, MAE=  24.729




[zBIq5rxdHJRwDNY] RandomForest | DC  RMSE= 322.583, MAE= 156.976 | AC  RMSE=  31.238, MAE=  15.219
[zBIq5rxdHJRwDNY] NeuralNet    | DC  RMSE=1176.974, MAE= 690.552 | AC  RMSE= 124.805, MAE=  80.297

Iterations completed : 174
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.5857
--------------------------------------------------------------
Iterations completed : 67
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.5363

[zBIq5rxdHJRwDNY] Skipping 2020-06-17: not enough samples (1)

[zBIq5rxdHJRwDNY] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 306.521, MAE= 182.354 | AC  RMSE=  30.062, MAE=  17.968
Ridge        | DC  RMSE= 624.294, MAE= 386.622 | AC  RMSE=  60.914, MAE=  37.833
Lasso        | DC  RMSE= 306.522, MAE= 182.354 | AC  RMSE=  30.063, MAE=  17.968
RandomForest | DC  RMSE= 353.827, MAE= 184.805 | AC  RMSE=  34.322, MAE=  17.925
Ne

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[zVJPv84UY57bAof] Linear       | DC  RMSE= 410.418, MAE= 229.959 | AC  RMSE=  40.343, MAE=  22.807
[zVJPv84UY57bAof] Ridge        | DC  RMSE= 414.297, MAE= 230.738 | AC  RMSE=  40.725, MAE=  22.903
[zVJPv84UY57bAof] Lasso        | DC  RMSE= 410.417, MAE= 229.958 | AC  RMSE=  40.342, MAE=  22.807




[zVJPv84UY57bAof] RandomForest | DC  RMSE= 351.066, MAE= 157.403 | AC  RMSE=  34.198, MAE=  15.277
[zVJPv84UY57bAof] NeuralNet    | DC  RMSE=1099.981, MAE= 635.867 | AC  RMSE= 130.471, MAE=  86.272

Iterations completed : 202
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.0717
--------------------------------------------------------------
Iterations completed : 160
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.6127

[zVJPv84UY57bAof] Skipping 2020-06-17: not enough samples (1)

[zVJPv84UY57bAof] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 322.753, MAE= 194.935 | AC  RMSE=  31.670, MAE=  19.226
Ridge        | DC  RMSE= 612.122, MAE= 386.367 | AC  RMSE=  59.725, MAE=  37.786
Lasso        | DC  RMSE= 322.753, MAE= 194.935 | AC  RMSE=  31.670, MAE=  19.226
RandomForest | DC  RMSE= 385.610, MAE= 204.179 | AC  RMSE=  37.475, MAE=  19.660
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[YxYtjZvoooNbGkE] Linear       | DC  RMSE= 347.029, MAE= 188.487 | AC  RMSE=  34.145, MAE=  18.734
[YxYtjZvoooNbGkE] Ridge        | DC  RMSE= 346.705, MAE= 193.004 | AC  RMSE=  34.106, MAE=  19.137
[YxYtjZvoooNbGkE] Lasso        | DC  RMSE= 347.028, MAE= 188.486 | AC  RMSE=  34.144, MAE=  18.733




[YxYtjZvoooNbGkE] RandomForest | DC  RMSE= 304.045, MAE= 141.076 | AC  RMSE=  29.679, MAE=  13.768
[YxYtjZvoooNbGkE] NeuralNet    | DC  RMSE=1176.496, MAE= 698.907 | AC  RMSE= 129.448, MAE=  83.949

Iterations completed : 134
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.5008
--------------------------------------------------------------
Iterations completed : 150
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.4966

[YxYtjZvoooNbGkE] Skipping 2020-06-17: not enough samples (1)

[YxYtjZvoooNbGkE] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE= 270.690, MAE= 160.113 | AC  RMSE=  26.600, MAE=  15.808
Ridge        | DC  RMSE= 619.084, MAE= 378.110 | AC  RMSE=  60.394, MAE=  36.965
Lasso        | DC  RMSE= 270.687, MAE= 160.111 | AC  RMSE=  26.601, MAE=  15.808
RandomForest | DC  RMSE= 357.427, MAE= 184.764 | AC  RMSE=  34.798, MAE=  17.965
N

<a id="global-comparison"></a>

## 7. Global Model Comparison Across Inverters

This cell:

- Loads all `<inverter>_results.pkl`
- Aggregates RMSE/MAE across inverters
- Compares:
  - Combined vs Parallel
  - DC vs AC
- Produces bar charts, box plots, and NN loss comparisons


In [None]:
import os
import pickle
import numpy as np
import matplotlib.pyplot as plt

# ============================================================
# CONFIG
# ============================================================

###############################################################################################################################################################

# Change Here
 
RESULTS_FOLDER = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\01 Plant1_Inverter_Models"
PLOTS_FOLDER   = os.path.join(RESULTS_FOLDER, "00 Training_Visualization_Plots")

###############################################################################################################################################################

os.makedirs(PLOTS_FOLDER, exist_ok=True)

# ============================================================
# LOAD ALL NN LOSS CURVES FROM PKL FILES
# ============================================================

loss_dc = {}   # inverter → loss array
loss_ac = {}   # inverter → loss array

for fname in os.listdir(RESULTS_FOLDER):
    if not fname.endswith("_results.pkl"):
        continue

    fpath = os.path.join(RESULTS_FOLDER, fname)
    with open(fpath, "rb") as f:
        res = pickle.load(f)

    inv_id = res.get("inverter_id", fname.replace("_results.pkl", ""))

    diag = res.get("nn_diag", {})

    if "dc" in diag and "loss_curve" in diag["dc"]:
        loss_dc[inv_id] = np.array(diag["dc"]["loss_curve"], dtype=float)

    if "ac" in diag and "loss_curve" in diag["ac"]:
        loss_ac[inv_id] = np.array(diag["ac"]["loss_curve"], dtype=float)

print(f"Loaded DC loss curves from {len(loss_dc)} inverters")
print(f"Loaded AC loss curves from {len(loss_ac)} inverters")


# ============================================================
# A1. PLOT ALL DC LOSS CURVES
# ============================================================

fig, ax = plt.subplots(figsize=(12, 6))
max_len_dc = max(len(v) for v in loss_dc.values())
all_dc = np.full((len(loss_dc), max_len_dc), np.nan)

for i, (inv, curve) in enumerate(loss_dc.items()):
    ax.plot(curve, alpha=0.3, label=inv)
    all_dc[i, :len(curve)] = curve

mean_dc = np.nanmean(all_dc, axis=0)
std_dc  = np.nanstd(all_dc, axis=0)

ax.plot(mean_dc, color="black", linewidth=2, label="Mean DC Loss")

ax.fill_between(
    np.arange(len(mean_dc)),
    mean_dc - std_dc,
    mean_dc + std_dc,
    alpha=0.15,
    label="±1 std"
)

ax.set_title("Neural Network DC Loss Curves — All Inverters")
ax.set_xlabel("Epoch")
ax.set_ylabel("Loss")
ax.grid(True)
ax.legend(fontsize=7, ncol=2)

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "DC_Loss_All.png"), dpi=150, bbox_inches="tight")
plt.close(fig)


# ============================================================
# A2. PLOT ALL AC LOSS CURVES
# ============================================================

fig, ax = plt.subplots(figsize=(12, 6))
max_len_ac = max(len(v) for v in loss_ac.values())
all_ac = np.full((len(loss_ac), max_len_ac), np.nan)

for i, (inv, curve) in enumerate(loss_ac.items()):
    ax.plot(curve, alpha=0.3, label=inv)
    all_ac[i, :len(curve)] = curve

mean_ac = np.nanmean(all_ac, axis=0)
std_ac  = np.nanstd(all_ac, axis=0)

ax.plot(mean_ac, color="black", linewidth=2, label="Mean AC Loss")
ax.fill_between(
    np.arange(len(mean_ac)),
    mean_ac - std_ac,
    mean_ac + std_ac,
    alpha=0.15
)

ax.set_title("Neural Network AC Loss Curves — All Inverters")
ax.set_xlabel("Epoch")
ax.set_ylabel("Loss")
ax.grid(True)
ax.legend(fontsize=7, ncol=2)

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "AC_Loss_All.png"), dpi=150, bbox_inches="tight")
plt.close(fig)


# ============================================================
# B. DC vs AC Mean Loss Comparison
# ============================================================

L = min(len(mean_dc), len(mean_ac))
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(mean_dc[:L], label="Mean DC Loss", linewidth=2)
ax.plot(mean_ac[:L], label="Mean AC Loss", linewidth=2)

ax.set_title("Mean Loss Comparison: DC vs AC")
ax.set_xlabel("Epoch")
ax.set_ylabel("Loss")
ax.grid(True)
ax.legend()

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "Mean_DC_vs_AC.png"), dpi=150, bbox_inches="tight")
plt.close(fig)


# ============================================================
# C. Convergence Speed (per inverter)
# ============================================================

def get_convergence_epoch(loss, tol=1e-4, patience=10):
    """Returns epoch where improvement slows down."""
    best = loss[0]
    count = 0
    for i in range(1, len(loss)):
        if loss[i] < best - tol:
            best = loss[i]
            count = 0
        else:
            count += 1
        if count >= patience:
            return i
    return len(loss)

conv_dc = {inv: get_convergence_epoch(curve) for inv, curve in loss_dc.items()}
conv_ac = {inv: get_convergence_epoch(curve) for inv, curve in loss_ac.items()}

# Bar plot
fig, ax = plt.subplots(figsize=(14, 6))
invs = list(conv_dc.keys())
ax.bar(invs, [conv_dc[i] for i in invs], alpha=0.6, label="DC")
ax.bar(invs, [conv_ac.get(i, np.nan) for i in invs], alpha=0.6, label="AC")

ax.set_title("Convergence Epoch per Inverter")
ax.set_ylabel("Epoch")
ax.set_xticklabels(invs, rotation=45, ha="right")
ax.grid(True)
ax.legend()

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "Convergence_Epochs.png"), dpi=150, bbox_inches="tight")
plt.close(fig)

print("\n✅ Training visualization complete.")
print(f"Plots saved in: {PLOTS_FOLDER}")


Loaded DC loss curves from 22 inverters
Loaded AC loss curves from 22 inverters


  ax.set_xticklabels(invs, rotation=45, ha="right")



✅ Training visualization complete.
Plots saved in: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\01 Plant1_Inverter_Models\00 Training_Visualization_Plots


<a id="bias-variance"></a>

## 8. Bias–Variance Analysis & NN Diagnostics

This cell:

- Uses the same `.pkl` files
- Computes bias and variance proxies:
  - Bias ≈ mean combined RMSE
  - Variance ≈ std of per-day RMSE
- Builds:
  - Bias–variance scatter (DC & AC)
  - Bar charts (bias vs variance)
  - NN loss curves (DC & AC)
- Prints a concise bias–variance diagnosis for each model


In [None]:
import os
import pickle
import numpy as np
import matplotlib.pyplot as plt

# ==============================================================
# CONFIG
# ==============================================================

###############################################################################################################################################################
# Change Here 

RESULTS_FOLDER = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\01 Plant1_Inverter_Models"
BIASVAR_FOLDER = os.path.join(RESULTS_FOLDER, "00 BiasVariance")

###############################################################################################################################################################

os.makedirs(BIASVAR_FOLDER, exist_ok=True)

MODELS = ["Linear", "Ridge", "Lasso", "RandomForest", "NeuralNet"]

# ==============================================================
# 1. LOAD ALL PER-INVERTER RESULTS (.pkl files)
# ==============================================================

inverter_results = {}   # inverter_id -> results dict

# Neural net loss curves per inverter
nn_loss_dc = {}         # inverter_id -> np.array loss curve (DC)
nn_loss_ac = {}         # inverter_id -> np.array loss curve (AC)

# Neural net diagnostics (per inverter)
nn_diag_dc = {
    "iterations": {},
    "learning_rate": {},
    "momentum": {},
    "total_weights": {},
    "train_time": {},
}
nn_diag_ac = {
    "iterations": {},
    "learning_rate": {},
    "momentum": {},
    "total_weights": {},
    "train_time": {},
}

for fname in os.listdir(RESULTS_FOLDER):
    if not fname.endswith("_results.pkl"):
        # skip master file or other pkl's
        continue

    fpath = os.path.join(RESULTS_FOLDER, fname)
    with open(fpath, "rb") as f:
        res = pickle.load(f)

    inverter_id = res.get("inverter_id", fname.replace("_results.pkl", ""))
    inverter_results[inverter_id] = res

    # Collect NN loss curves + diagnostics, if present
    diag = res.get("nn_diag", {})

    # Expected structure (newer version):
    # nn_diag = {
    #   "dc": {"iterations":..., "learning_rate":..., "momentum":...,
    #          "total_weights":..., "train_time":..., "loss_curve":[...]},
    #   "ac": {...}
    # }
    if isinstance(diag, dict) and "dc" in diag:
        dc_diag = diag["dc"]
        if "loss_curve" in dc_diag:
            nn_loss_dc[inverter_id] = np.array(dc_diag["loss_curve"], dtype=float)
        for key in nn_diag_dc.keys():
            if key in dc_diag:
                nn_diag_dc[key][inverter_id] = dc_diag[key]

    # AC side
    if isinstance(diag, dict) and "ac" in diag:
        ac_diag = diag["ac"]
        if "loss_curve" in ac_diag:
            nn_loss_ac[inverter_id] = np.array(ac_diag["loss_curve"], dtype=float)
        for key in nn_diag_ac.keys():
            if key in ac_diag:
                nn_diag_ac[key][inverter_id] = ac_diag[key]

print(f"Loaded {len(inverter_results)} inverter result files.")


# ==============================================================
# 2. COLLECT METRICS ACROSS INVERTERS
# ==============================================================

# Combined (all days merged)
combined_dc_rmse = {m: [] for m in MODELS}
combined_ac_rmse = {m: [] for m in MODELS}
combined_dc_mae  = {m: [] for m in MODELS}
combined_ac_mae  = {m: [] for m in MODELS}

# Parallel (average per-day metrics stored in avg_* fields)
parallel_dc_rmse = {m: [] for m in MODELS}
parallel_ac_rmse = {m: [] for m in MODELS}
parallel_dc_mae  = {m: [] for m in MODELS}
parallel_ac_mae  = {m: [] for m in MODELS}

# We also keep *per-inverter* per-model parallel RMSE lists
# for variance estimation
parallel_dc_rmse_full = {m: [] for m in MODELS}  # each element = list of per-day RMSE for that inverter
parallel_ac_rmse_full = {m: [] for m in MODELS}

for inv_id, res in inverter_results.items():
    comb = res["combined"]
    par  = res["parallel"]

    # --- combined ---
    dc_comb = comb["dc"]
    ac_comb = comb["ac"]

    for m in MODELS:
        if m in dc_comb:
            combined_dc_rmse[m].append(dc_comb[m]["rmse"])
            combined_dc_mae[m].append(dc_comb[m]["mae"])
        if m in ac_comb:
            combined_ac_rmse[m].append(ac_comb[m]["rmse"])
            combined_ac_mae[m].append(ac_comb[m]["mae"])

    # --- parallel (per-day info + averages) ---
    avg_dc_rmse = par.get("avg_dc_rmse", {})
    avg_ac_rmse = par.get("avg_ac_rmse", {})
    avg_dc_mae  = par.get("avg_dc_mae", {})
    avg_ac_mae  = par.get("avg_ac_mae", {})

    dc_rmse_days = par.get("dc_rmse", {})
    ac_rmse_days = par.get("ac_rmse", {})

    for m in MODELS:
        # parallel averages
        if m in avg_dc_rmse:
            parallel_dc_rmse[m].append(avg_dc_rmse[m])
        if m in avg_ac_rmse:
            parallel_ac_rmse[m].append(avg_ac_rmse[m])
        if m in avg_dc_mae:
            parallel_dc_mae[m].append(avg_dc_mae[m])
        if m in avg_ac_mae:
            parallel_ac_mae[m].append(avg_ac_mae[m])

        # full per-day RMSE lists (for variance proxy)
        if m in dc_rmse_days:
            # store the list for this inverter and model
            parallel_dc_rmse_full[m].append(dc_rmse_days[m])
        if m in ac_rmse_days:
            parallel_ac_rmse_full[m].append(ac_rmse_days[m])


# Helper to compute mean & std, ignoring empty lists
def mean_std(arr):
    if len(arr) == 0:
        return np.nan, np.nan
    return float(np.mean(arr)), float(np.std(arr))


# ==============================================================
# 3. BIAS–VARIANCE PROXIES
# ==============================================================

# "Bias" proxy: mean combined RMSE across inverters
# "Variance" proxy: std of *per-day* parallel RMSE across days & inverters

bias_proxy_dc = {}
bias_proxy_ac = {}
var_proxy_dc  = {}
var_proxy_ac  = {}

for m in MODELS:
    # bias proxies
    bias_proxy_dc[m] = mean_std(combined_dc_rmse[m])[0]
    bias_proxy_ac[m] = mean_std(combined_ac_rmse[m])[0]

    # variance proxies: flatten all per-day RMSE for this model
    all_dc_days = []
    for lst in parallel_dc_rmse_full[m]:
        all_dc_days.extend(lst)
    all_ac_days = []
    for lst in parallel_ac_rmse_full[m]:
        all_ac_days.extend(lst)

    var_proxy_dc[m] = float(np.std(all_dc_days)) if len(all_dc_days) > 0 else np.nan
    var_proxy_ac[m] = float(np.std(all_ac_days)) if len(all_ac_days) > 0 else np.nan


# ==============================================================
# 4. BIAS–VARIANCE SCATTER PLOTS (DC & AC)
# ==============================================================

labels = MODELS
x_dc = [var_proxy_dc[m] for m in labels]
y_dc = [bias_proxy_dc[m] for m in labels]

fig, ax = plt.subplots(figsize=(8, 6))
for i, m in enumerate(labels):
    ax.scatter(x_dc[i], y_dc[i])
    ax.text(x_dc[i] * 1.01, y_dc[i] * 1.01, m)

ax.set_xlabel("Variance proxy (std of per-day RMSE, DC)")
ax.set_ylabel("Bias proxy (mean combined RMSE, DC)")
ax.set_title("Bias–Variance Proxy Plane — DC")
ax.grid(True)

fig.tight_layout()
fig.savefig(os.path.join(BIASVAR_FOLDER, "bias_variance_scatter_DC.png"),
            dpi=150, bbox_inches="tight")
plt.close(fig)

# AC version
x_ac = [var_proxy_ac[m] for m in labels]
y_ac = [bias_proxy_ac[m] for m in labels]

fig, ax = plt.subplots(figsize=(8, 6))
for i, m in enumerate(labels):
    ax.scatter(x_ac[i], y_ac[i])
    ax.text(x_ac[i] * 1.01, y_ac[i] * 1.01, m)

ax.set_xlabel("Variance proxy (std of per-day RMSE, AC)")
ax.set_ylabel("Bias proxy (mean combined RMSE, AC)")
ax.set_title("Bias–Variance Proxy Plane — AC")
ax.grid(True)

fig.tight_layout()
fig.savefig(os.path.join(BIASVAR_FOLDER, "bias_variance_scatter_AC.png"),
            dpi=150, bbox_inches="tight")
plt.close(fig)


# ==============================================================
# 5. BAR PLOTS: BIAS & VARIANCE (DC & AC)
# ==============================================================

x = np.arange(len(labels))
width = 0.35

# DC
fig, ax = plt.subplots(figsize=(10, 5))
bias_dc_bar = [bias_proxy_dc[m] for m in labels]
var_dc_bar  = [var_proxy_dc[m] for m in labels]

ax.bar(x - width/2, bias_dc_bar, width, label="Bias proxy (mean combined RMSE)")
ax.bar(x + width/2, var_dc_bar,  width, label="Variance proxy (std per-day RMSE)")

ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_ylabel("Error / Std")
ax.set_title("Bias vs Variance proxy — DC")
ax.legend()
ax.grid(axis="y")

fig.tight_layout()
fig.savefig(os.path.join(BIASVAR_FOLDER, "bias_variance_bar_DC.png"),
            dpi=150, bbox_inches="tight")
plt.close(fig)

# AC
fig, ax = plt.subplots(figsize=(10, 5))
bias_ac_bar = [bias_proxy_ac[m] for m in labels]
var_ac_bar  = [var_proxy_ac[m] for m in labels]

ax.bar(x - width/2, bias_ac_bar, width, label="Bias proxy (mean combined RMSE)")
ax.bar(x + width/2, var_ac_bar,  width, label="Variance proxy (std per-day RMSE)")

ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_ylabel("Error / Std")
ax.set_title("Bias vs Variance proxy — AC")
ax.legend()
ax.grid(axis="y")

fig.tight_layout()
fig.savefig(os.path.join(BIASVAR_FOLDER, "bias_variance_bar_AC.png"),
            dpi=150, bbox_inches="tight")
plt.close(fig)


# ==============================================================
# 6. NEURAL NET LEARNING CURVES (COST vs ITERATION) — DC & AC
# ==============================================================

# ----- DC cost function -----
if len(nn_loss_dc) > 0:
    fig, ax = plt.subplots(figsize=(10, 6))

    max_len_dc = max(len(curve) for curve in nn_loss_dc.values())
    all_curves_dc = np.full((len(nn_loss_dc), max_len_dc), np.nan)

    for idx, (inv_id, curve) in enumerate(nn_loss_dc.items()):
        epochs = np.arange(len(curve))
        ax.plot(epochs, curve, alpha=0.3, label=inv_id)
        all_curves_dc[idx, :len(curve)] = curve

    mean_curve_dc = np.nanmean(all_curves_dc, axis=0)
    ax.plot(np.arange(len(mean_curve_dc)), mean_curve_dc,
            linewidth=2.5, label="Mean across inverters")

    ax.set_xlabel("Iteration / Epoch")
    ax.set_ylabel("Loss (Cost Function)")
    ax.set_title("Neural Network Training Loss — DC (Combined data)")
    ax.grid(True)
    ax.legend(loc="upper right", fontsize=8, ncol=2)

    fig.tight_layout()
    fig.savefig(os.path.join(BIASVAR_FOLDER, "nn_learning_curves_DC.png"),
                dpi=150, bbox_inches="tight")
    plt.close(fig)
else:
    print("No DC nn_diag loss_curve found; skipping DC cost-function plot.")

# ----- AC cost function -----
if len(nn_loss_ac) > 0:
    fig, ax = plt.subplots(figsize=(10, 6))

    max_len_ac = max(len(curve) for curve in nn_loss_ac.values())
    all_curves_ac = np.full((len(nn_loss_ac), max_len_ac), np.nan)

    for idx, (inv_id, curve) in enumerate(nn_loss_ac.items()):
        epochs = np.arange(len(curve))
        ax.plot(epochs, curve, alpha=0.3, label=inv_id)
        all_curves_ac[idx, :len(curve)] = curve

    mean_curve_ac = np.nanmean(all_curves_ac, axis=0)
    ax.plot(np.arange(len(mean_curve_ac)), mean_curve_ac,
            linewidth=2.5, label="Mean across inverters")

    ax.set_xlabel("Iteration / Epoch")
    ax.set_ylabel("Loss (Cost Function)")
    ax.set_title("Neural Network Training Loss — AC (Combined data)")
    ax.grid(True)
    ax.legend(loc="upper right", fontsize=8, ncol=2)

    fig.tight_layout()
    fig.savefig(os.path.join(BIASVAR_FOLDER, "nn_learning_curves_AC.png"),
                dpi=150, bbox_inches="tight")
    plt.close(fig)
else:
    print("No AC nn_diag loss_curve found; skipping AC cost-function plot.")

# ----- DC vs AC mean loss in a single figure -----
if len(nn_loss_dc) > 0 and len(nn_loss_ac) > 0:
    # Compute mean DC
    max_len_dc = max(len(c) for c in nn_loss_dc.values())
    dc_mat = np.full((len(nn_loss_dc), max_len_dc), np.nan)
    for i, c in enumerate(nn_loss_dc.values()):
        dc_mat[i, :len(c)] = c
    mean_dc = np.nanmean(dc_mat, axis=0)

    # Compute mean AC
    max_len_ac = max(len(c) for c in nn_loss_ac.values())
    ac_mat = np.full((len(nn_loss_ac), max_len_ac), np.nan)
    for i, c in enumerate(nn_loss_ac.values()):
        ac_mat[i, :len(c)] = c
    mean_ac = np.nanmean(ac_mat, axis=0)

    # Align lengths
    L = min(len(mean_dc), len(mean_ac))
    mean_dc = mean_dc[:L]
    mean_ac = mean_ac[:L]

    fig, ax = plt.subplots(figsize=(10, 6))
    epochs = np.arange(L)
    ax.plot(epochs, mean_dc, label="Mean DC Loss")
    ax.plot(epochs, mean_ac, label="Mean AC Loss")
    ax.set_xlabel("Iteration / Epoch")
    ax.set_ylabel("Loss")
    ax.set_title("Neural Net Cost Function Comparison — DC vs AC (mean over inverters)")
    ax.grid(True)
    ax.legend()

    fig.tight_layout()
    fig.savefig(os.path.join(BIASVAR_FOLDER, "nn_mean_loss_DC_vs_AC.png"),
                dpi=150, bbox_inches="tight")
    plt.close(fig)


# ==============================================================
# 7. PRINT SOME TEXTUAL BIAS–VARIANCE DIAGNOSIS
# ==============================================================

print("\n===== Bias–Variance Diagnosis (proxies) =====")
print("Model         | Bias_DC  | Var_DC   | Bias_AC  | Var_AC")
print("-------------------------------------------------------")
for m in MODELS:
    print(f"{m:12s} | "
          f"{bias_proxy_dc[m]:7.3f} | {var_proxy_dc[m]:7.3f} | "
          f"{bias_proxy_ac[m]:7.3f} | {var_proxy_ac[m]:7.3f}")
print("=======================================================")

print("\n✅ Bias–Variance analysis complete.")
print(f"All bias–variance and learning-curve plots saved in: {BIASVAR_FOLDER}")


Loaded 22 inverter result files.

===== Bias–Variance Diagnosis (proxies) =====
Model         | Bias_DC  | Var_DC   | Bias_AC  | Var_AC
-------------------------------------------------------
Linear       | 489.021 | 262.319 |  47.942 |  25.542
Ridge        | 493.475 | 267.692 |  48.370 |  26.076
Lasso        | 489.021 | 262.320 |  47.942 |  25.543
RandomForest | 462.717 | 225.921 |  44.985 |  21.968
NeuralNet    | 1159.451 | 871.630 | 120.267 |  28.125

✅ Bias–Variance analysis complete.
All bias–variance and learning-curve plots saved in: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\01 Plant1_Inverter_Models\00 BiasVariance


<a id="nn-visualisation"></a>

## 9. Neural Network Training Visualisation

This step focuses specifically on the neural networks:

- Plots all DC loss curves with mean ± std band
- Plots all AC loss curves with mean ± std band
- Compares mean DC vs mean AC loss
- Computes convergence epoch per inverter and plots them


In [None]:
import os
import pickle
import numpy as np
import matplotlib.pyplot as plt

# ============================================================
# CONFIG
# ============================================================

###############################################################################################################################################################

# Change here 

RESULTS_FOLDER = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\01 Plant1_Inverter_Models"
PLOTS_FOLDER   = os.path.join(RESULTS_FOLDER, "00 Training_Visualization_Plots")

###############################################################################################################################################################

os.makedirs(PLOTS_FOLDER, exist_ok=True)

# ============================================================
# LOAD ALL NN LOSS CURVES FROM PKL FILES
# ============================================================

loss_dc = {}   # inverter → loss array
loss_ac = {}   # inverter → loss array

for fname in os.listdir(RESULTS_FOLDER):
    if not fname.endswith("_results.pkl"):
        continue

    fpath = os.path.join(RESULTS_FOLDER, fname)
    with open(fpath, "rb") as f:
        res = pickle.load(f)

    inv_id = res.get("inverter_id", fname.replace("_results.pkl", ""))

    diag = res.get("nn_diag", {})

    if "dc" in diag and "loss_curve" in diag["dc"]:
        loss_dc[inv_id] = np.array(diag["dc"]["loss_curve"], dtype=float)

    if "ac" in diag and "loss_curve" in diag["ac"]:
        loss_ac[inv_id] = np.array(diag["ac"]["loss_curve"], dtype=float)

print(f"Loaded DC loss curves from {len(loss_dc)} inverters")
print(f"Loaded AC loss curves from {len(loss_ac)} inverters")


# ============================================================
# A1. PLOT ALL DC LOSS CURVES
# ============================================================

fig, ax = plt.subplots(figsize=(12, 6))
max_len_dc = max(len(v) for v in loss_dc.values())
all_dc = np.full((len(loss_dc), max_len_dc), np.nan)

for i, (inv, curve) in enumerate(loss_dc.items()):
    ax.plot(curve, alpha=0.3, label=inv)
    all_dc[i, :len(curve)] = curve

mean_dc = np.nanmean(all_dc, axis=0)
std_dc  = np.nanstd(all_dc, axis=0)

ax.plot(mean_dc, color="black", linewidth=2, label="Mean DC Loss")

ax.fill_between(
    np.arange(len(mean_dc)),
    mean_dc - std_dc,
    mean_dc + std_dc,
    alpha=0.15,
    label="±1 std"
)

ax.set_title("Neural Network DC Loss Curves — All Inverters")
ax.set_xlabel("Epoch")
ax.set_ylabel("Loss")
ax.grid(True)
ax.legend(fontsize=7, ncol=2)

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "DC_Loss_All.png"), dpi=150, bbox_inches="tight")
plt.close(fig)


# ============================================================
# A2. PLOT ALL AC LOSS CURVES
# ============================================================

fig, ax = plt.subplots(figsize=(12, 6))
max_len_ac = max(len(v) for v in loss_ac.values())
all_ac = np.full((len(loss_ac), max_len_ac), np.nan)

for i, (inv, curve) in enumerate(loss_ac.items()):
    ax.plot(curve, alpha=0.3, label=inv)
    all_ac[i, :len(curve)] = curve

mean_ac = np.nanmean(all_ac, axis=0)
std_ac  = np.nanstd(all_ac, axis=0)

ax.plot(mean_ac, color="black", linewidth=2, label="Mean AC Loss")
ax.fill_between(
    np.arange(len(mean_ac)),
    mean_ac - std_ac,
    mean_ac + std_ac,
    alpha=0.15
)

ax.set_title("Neural Network AC Loss Curves — All Inverters")
ax.set_xlabel("Epoch")
ax.set_ylabel("Loss")
ax.grid(True)
ax.legend(fontsize=7, ncol=2)

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "AC_Loss_All.png"), dpi=150, bbox_inches="tight")
plt.close(fig)


# ============================================================
# B. DC vs AC Mean Loss Comparison
# ============================================================

L = min(len(mean_dc), len(mean_ac))
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(mean_dc[:L], label="Mean DC Loss", linewidth=2)
ax.plot(mean_ac[:L], label="Mean AC Loss", linewidth=2)

ax.set_title("Mean Loss Comparison: DC vs AC")
ax.set_xlabel("Epoch")
ax.set_ylabel("Loss")
ax.grid(True)
ax.legend()

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "Mean_DC_vs_AC.png"), dpi=150, bbox_inches="tight")
plt.close(fig)


# ============================================================
# C. Convergence Speed (per inverter)
# ============================================================

def get_convergence_epoch(loss, tol=1e-4, patience=10):
    """Returns epoch where improvement slows down."""
    best = loss[0]
    count = 0
    for i in range(1, len(loss)):
        if loss[i] < best - tol:
            best = loss[i]
            count = 0
        else:
            count += 1
        if count >= patience:
            return i
    return len(loss)

conv_dc = {inv: get_convergence_epoch(curve) for inv, curve in loss_dc.items()}
conv_ac = {inv: get_convergence_epoch(curve) for inv, curve in loss_ac.items()}

# Bar plot
fig, ax = plt.subplots(figsize=(14, 6))
invs = list(conv_dc.keys())
ax.bar(invs, [conv_dc[i] for i in invs], alpha=0.6, label="DC")
ax.bar(invs, [conv_ac.get(i, np.nan) for i in invs], alpha=0.6, label="AC")

ax.set_title("Convergence Epoch per Inverter")
ax.set_ylabel("Epoch")
ax.set_xticklabels(invs, rotation=45, ha="right")
ax.grid(True)
ax.legend()

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "Convergence_Epochs.png"), dpi=150, bbox_inches="tight")
plt.close(fig)

print("\n✅ Training visualization complete.")
print(f"Plots saved in: {PLOTS_FOLDER}")


Loaded DC loss curves from 22 inverters
Loaded AC loss curves from 22 inverters


  ax.set_xticklabels(invs, rotation=45, ha="right")



✅ Training visualization complete.
Plots saved in: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\01 Plant1_Inverter_Models\00 Training_Visualization_Plots


<a id="time-of-day-plots"></a>

## 10. Time-of-Day Operational Plots (Plant 1)

Finally, we visualise operational behaviour:

- For each inverter:
  - Time-of-day on x-axis (0:00 → 23:45)
  - Overlay multiple days as separate curves
  - One subplot per variable:
    - Irradiation
    - Ambient temperature
    - Module temperature
    - Daily yield
    - DC power
    - AC power


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import os

# Folder where all inverter CSV files are stored
###############################################################################################################################################################

# Change Here

input_folder = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1"

###############################################################################################################################################################

# (Optional) Folder where plots will be saved
plot_output = input_folder + "/Plots"
os.makedirs(plot_output, exist_ok=True)

# 22 inverter IDs
inverters = [
    '1BY6WEcLGh8j5v7', '1IF53ai7Xc0U56Y', '3PZuoBAID5Wc2HD', '7JYdWkrLSPkdwr4',
    'McdE0feGgRqW7Ca', 'VHMLBKoKgIrUVDU', 'WRmjgnKYAwPKWDb', 'ZnxXDlPa8U1GXgE',
    'ZoEaEvLYb1n2sOq', 'adLQvlD726eNBSB', 'bvBOhCH3iADSZry', 'iCRJl6heRkivqQ3',
    'ih0vzX44oOqAx2f', 'pkci93gMrogZuBj', 'rGa61gmuvPhdLxV', 'sjndEbLyjtCKgGv',
    'uHbuxQJl8lW7ozc', 'wCURE6d3bPkepu2', 'z9Y9gH1T5YWrNuG', 'zBIq5rxdHJRwDNY',
    'zVJPv84UY57bAof', 'YxYtjZvoooNbGkE'
]

# Variables to plot
variables = [
    ("IRRADIATION_CLEAN", "Irradiation"),
    ("AMBIENT_TEMPERATURE", "Ambient Temp (°C)"),
    ("MODULE_TEMPERATURE", "Module Temp (°C)"),
    ("DAILY_YIELD_CLEAN", "Daily Yield"),
    ("DC_CLEAN", "DC Power"),
    ("AC_CLEAN", "AC Power"),
]

# Time tick labels (full day 00:00 → 23:00)
tick_minutes = [h * 60 for h in range(0, 24)]
tick_labels = [f"{h:02d}:00" for h in range(0, 24)]

# Date range filter
start_date = pd.to_datetime("15/05/2020", dayfirst=True)
end_date   = pd.to_datetime("17/06/2020 23:45", dayfirst=True)

# -------------------------------------------------------------------
# MAIN LOOP — PROCESS EACH INVERTER
# -------------------------------------------------------------------
for inv in inverters:
    csv_path = os.path.join(input_folder, f"Plant1_{inv}_clean.csv")

    if not os.path.exists(csv_path):
        print(f"⚠ Missing file skipped: {csv_path}")
        continue

    print(f"📈 Plotting inverter: {inv}")

    # Load data
    df = pd.read_csv(csv_path)

    # Convert to datetime
    df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)

    # Filter date range
    df = df[(df["DATE_TIME"] >= start_date) & (df["DATE_TIME"] <= end_date)]

    # Extract date + minutes from midnight
    df["DATE"] = df["DATE_TIME"].dt.date
    df["MINUTES"] = df["DATE_TIME"].dt.hour * 60 + df["DATE_TIME"].dt.minute

    # -------------------------------------------------------------
    # BUILD FIGURE
    # -------------------------------------------------------------
    fig = plt.figure(figsize=(22, 20))
    gs = gridspec.GridSpec(6, 7, figure=fig, width_ratios=[1,1,1,1,1,1,0.35])

    axes = []
    for i in range(6):
        ax = fig.add_subplot(gs[i, :6])
        axes.append(ax)

    legend_ax = fig.add_subplot(gs[:, 6])
    legend_ax.axis("off")

    # -------------------------------------------------------------
    # PLOT VARIABLES
    # -------------------------------------------------------------
    for ax, (col, ylabel) in zip(axes, variables):
        for date, group in df.groupby("DATE"):
            ax.plot(group["MINUTES"], group[col], label=str(date))
        ax.set_ylabel(ylabel)
        ax.grid(True)
        ax.set_xticks(tick_minutes)
        ax.set_xticklabels(tick_labels)

    axes[-1].set_xlabel("Time of Day")

    # -------------------------------------------------------------
    # LEGEND PANEL
    # -------------------------------------------------------------
    handles, labels = axes[0].get_legend_handles_labels()
    legend_ax.legend(
        handles,
        labels,
        title="Dates",
        loc="center",
        frameon=True,
        fontsize=8,
    )

    plt.tight_layout()

    # OPTIONAL: Save figure automatically
    plot_path = os.path.join(plot_output, f"{inv}_plot.png")
    plt.savefig(plot_path, dpi=200)

    print(f"   ✓ Saved plot: {plot_path}")

    # Show each inverter's plot
    plt.show()


📈 Plotting inverter: 1BY6WEcLGh8j5v7


  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\1BY6WEcLGh8j5v7_plot.png
📈 Plotting inverter: 1IF53ai7Xc0U56Y


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\1IF53ai7Xc0U56Y_plot.png
📈 Plotting inverter: 3PZuoBAID5Wc2HD


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\3PZuoBAID5Wc2HD_plot.png
📈 Plotting inverter: 7JYdWkrLSPkdwr4


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\7JYdWkrLSPkdwr4_plot.png
📈 Plotting inverter: McdE0feGgRqW7Ca


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\McdE0feGgRqW7Ca_plot.png
📈 Plotting inverter: VHMLBKoKgIrUVDU


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\VHMLBKoKgIrUVDU_plot.png
📈 Plotting inverter: WRmjgnKYAwPKWDb


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\WRmjgnKYAwPKWDb_plot.png
📈 Plotting inverter: ZnxXDlPa8U1GXgE


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\ZnxXDlPa8U1GXgE_plot.png
📈 Plotting inverter: ZoEaEvLYb1n2sOq


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\ZoEaEvLYb1n2sOq_plot.png
📈 Plotting inverter: adLQvlD726eNBSB


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\adLQvlD726eNBSB_plot.png
📈 Plotting inverter: bvBOhCH3iADSZry


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\bvBOhCH3iADSZry_plot.png
📈 Plotting inverter: iCRJl6heRkivqQ3


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\iCRJl6heRkivqQ3_plot.png
📈 Plotting inverter: ih0vzX44oOqAx2f


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\ih0vzX44oOqAx2f_plot.png
📈 Plotting inverter: pkci93gMrogZuBj


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\pkci93gMrogZuBj_plot.png
📈 Plotting inverter: rGa61gmuvPhdLxV


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\rGa61gmuvPhdLxV_plot.png
📈 Plotting inverter: sjndEbLyjtCKgGv


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\sjndEbLyjtCKgGv_plot.png
📈 Plotting inverter: uHbuxQJl8lW7ozc


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\uHbuxQJl8lW7ozc_plot.png
📈 Plotting inverter: wCURE6d3bPkepu2


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\wCURE6d3bPkepu2_plot.png
📈 Plotting inverter: z9Y9gH1T5YWrNuG


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\z9Y9gH1T5YWrNuG_plot.png
📈 Plotting inverter: zBIq5rxdHJRwDNY


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\zBIq5rxdHJRwDNY_plot.png
📈 Plotting inverter: zVJPv84UY57bAof


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\zVJPv84UY57bAof_plot.png
📈 Plotting inverter: YxYtjZvoooNbGkE


  plt.show()
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   ✓ Saved plot: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 1/Plots\YxYtjZvoooNbGkE_plot.png


  plt.show()


<a id="plant2-train-loop"></a>

## 11. Plant 2 Inverter Training Loop

In this step we train machine learning models for **each Plant 2 inverter** using
daily CSV files already generated for Plant 2:

- Loops over all 22 Plant 2 inverters
- Calls `run_inverter_experiment(...)` for each inverter
- Saves:
  - Per-inverter plots into dedicated folders
  - Per-inverter results into `{inverter}_results.pkl`
  - A master file `PLANT2_ALL_INVERTER_RESULTS.pkl` with all results


In [None]:
###############################################################################################################################################################

# Change Here 
input_folder = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 2" 
output_root  = input_folder + "/Daily_Inverter_Data"

###############################################################################################################################################################

os.makedirs(output_root, exist_ok=True)

inverters = [
    "4UPUqMRk7TRMgml", "9kRcWv60rDACzjR", "81aHJ1q11NBPMrL", "Et9kgGMDl729KT4",
    "IQ2d7wF4YD8zU1Q", "LlT2YUhhzqhg5Sw", "LYwnQax7tkwH5Cb", "mqwcsP2rE7J0TFp",
    "Mx2yZCDsyf6DPfv", "NgDl19wMapZy17u", "oZ35aAeoifZaQzV", "oZZkBaNadn6DNKz",
    "PeE6FRyGXUgsRhN", "q49J1IKaHRwDQnt", "Qf4GUc1pJu5T6c6", "Quc1TzYxW2pYoWX",
    "rrq4fwE8jgrTyWY", "V94E5Ben1TlhnDV", "vOuJvMaM2sgwLmb", "WcxssY2VbP4hApt",
    "xMbIugepa2P7lBB", "xoJJ8DcxJEcupym"
]

keep_cols = [
    "DATE_TIME",
    "IRRADIATION_CLEAN",
    "AMBIENT_TEMPERATURE",
    "MODULE_TEMPERATURE",
    "DAILY_YIELD_CLEAN",
    "DC_CLEAN",
    "AC_CLEAN"
]

start_date = pd.to_datetime("15/05/2020", dayfirst=True)
end_date   = pd.to_datetime("17/06/2020 23:59", dayfirst=True)

print(f"Processing {len(inverters)} inverters...\n")

for inv in inverters:
    filename = f"Plant2_{inv}_clean.csv"
    csv_path = os.path.join(input_folder, filename)

    if not os.path.exists(csv_path):
        print(f"⚠ WARNING: {filename} not found — skipping.")
        continue

    print(f"→ Processing inverter {inv}")

    df = pd.read_csv(csv_path)
    df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
    df = df[(df["DATE_TIME"] >= start_date) & (df["DATE_TIME"] <= end_date)]

    df["DATE"] = df["DATE_TIME"].dt.date
    df = df[keep_cols + ["DATE"]]

    inverter_folder = os.path.join(output_root, inv)
    os.makedirs(inverter_folder, exist_ok=True)

    for date, group in df.groupby("DATE"):
        group = group[keep_cols]
        outname = f"{date.strftime('%Y-%m-%d')}.csv"
        save_path = os.path.join(inverter_folder, outname)
        group.reset_index(drop=True).to_csv(save_path, index=False)
        print(f"   Saved {outname}")

print("\n✓ All 22 inverters processed successfully.")


Processing 22 inverters...

→ Processing inverter 4UPUqMRk7TRMgml
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv


  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter 9kRcWv60rDACzjR
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter Et9kgGMDl729KT4
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter LYwnQax7tkwH5Cb
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter mqwcsP2rE7J0TFp
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter Mx2yZCDsyf6DPfv
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 202

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter NgDl19wMapZy17u
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter oZ35aAeoifZaQzV
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 202

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter oZZkBaNadn6DNKz
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter q49J1IKaHRwDQnt
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter rrq4fwE8jgrTyWY
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter vOuJvMaM2sgwLmb
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
  

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter WcxssY2VbP4hApt
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter xMbIugepa2P7lBB
   Saved 2020-05-15.csv
   Saved 202

  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)
  df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], dayfirst=True)


   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
   Saved 2020-06-11.csv
   Saved 2020-06-12.csv
   Saved 2020-06-13.csv
   Saved 2020-06-14.csv
   Saved 2020-06-15.csv
   Saved 2020-06-16.csv
   Saved 2020-06-17.csv
→ Processing inverter xoJJ8DcxJEcupym
   Saved 2020-05-15.csv
   Saved 2020-05-16.csv
   Saved 2020-05-17.csv
   Saved 2020-05-18.csv
   Saved 2020-05-19.csv
   Saved 2020-05-20.csv
   Saved 2020-05-21.csv
   Saved 2020-05-22.csv
   Saved 2020-05-23.csv
   Saved 2020-05-24.csv
   Saved 2020-05-25.csv
   Saved 2020-05-26.csv
   Saved 2020-05-27.csv
   Saved 2020-05-28.csv
   Saved 2020-05-29.csv
   Saved 2020-05-30.csv
   Saved 2020-05-31.csv
   Saved 2020-06-01.csv
   Saved 2020-06-02.csv
   Saved 2020-06-03.csv
   Saved 2020-06-04.csv
   Saved 2020-06-05.csv
   Saved 2020-06-06.csv
   Saved 2020-06-07.csv
   Saved 2020-06-08.csv
   Saved 2020-06-09.csv
   Saved 2020-06-10.csv
  

In [None]:
import os
import pickle
# from Utilities import run_inverter_experiment   # your main training function

# ============================================================
# PATHS FOR PLANT 2 (TASK 3)
# ============================================================

###############################################################################################################################################################

# Change here

# BASE_DAILY_FOLDER = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 2\Daily_Inverter_Data"
# SAVE_PLOTS_BASE   = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\02 Plant2_Inverter_Models"


DATA_BASE = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data"

BASE_DAILY_FOLDER = os.path.join(
    DATA_BASE,
    "00 Excel clean file",
    "Plant 2",
    "Daily_Inverter_Data",
)

SAVE_PLOTS_BASE = os.path.join(
    DATA_BASE,
    "02 Plant2_Inverter_Models",
)


###############################################################################################################################################################

os.makedirs(SAVE_PLOTS_BASE, exist_ok=True)

# ============================================================
# 22 PLANT 2 INVERTERS (auto-detected from your folder)
# ============================================================
inverters = [
    "4UPUqMRk7TRMgml", "9kRcWv60rDACzjR", "81aHJ1q11NBPMrL", "Et9kgGMDl729KT4",
    "IQ2d7wF4YD8zU1Q", "LlT2YUhhzqhg5Sw", "LYwnQax7tkwH5Cb", "mqwcsP2rE7J0TFp",
    "Mx2yZCDsyf6DPfv", "NgDl19wMapZy17u", "oZ35aAeoifZaQzV", "oZZkBaNadn6DNKz",
    "PeE6FRyGXUgsRhN", "q49J1IKaHRwDQnt", "Qf4GUc1pJu5T6c6", "Quc1TzYxW2pYoWX",
    "rrq4fwE8jgrTyWY", "V94E5Ben1TlhnDV", "vOuJvMaM2sgwLmb", "WcxssY2VbP4hApt",
    "xMbIugepa2P7lBB", "xoJJ8DcxJEcupym"
]

# ============================================================
# TRAINING LOOP FOR ALL 22 INVERTERS — PLANT 2
# ============================================================
all_results = {}

for inv in inverters:

    print("\n===================================")
    print(f"   TRAINING PLANT 2 INVERTER: {inv}")
    print("===================================\n")

    # Daily folder for this inverter
    inverter_daily_path = os.path.join(BASE_DAILY_FOLDER, inv)

    # Create plot folder
    inverter_plot_path = os.path.join(SAVE_PLOTS_BASE, inv)
    os.makedirs(inverter_plot_path, exist_ok=True)

    # -------------------------------------------------------
    # RUN THE TRAINING EXPERIMENT FOR THIS INVERTER
    # -------------------------------------------------------
    results = run_inverter_experiment(
        inverter_id=inv,
        daily_folder=inverter_daily_path,
        start_date_str="2020-05-15",
        end_date_str="2020-06-17",
        verbose=True,
        save_plots=True,
        plot_folder=inverter_plot_path
    )

    # Store results in dictionary
    all_results[inv] = results

    # Save single inverter results
    results_file = os.path.join(SAVE_PLOTS_BASE, f"{inv}_results.pkl")
    with open(results_file, "wb") as f:
        pickle.dump(results, f)

    print(f"✔ Saved results → {results_file}")
    print(f"✔ Saved plots   → {inverter_plot_path}")
    print("-----------------------------------------------")

# ============================================================
# SAVE MASTER RESULTS FILE FOR PLANT 2
# ============================================================
master_save_path = os.path.join(SAVE_PLOTS_BASE, "PLANT2_ALL_INVERTER_RESULTS.pkl")
with open(master_save_path, "wb") as f:
    pickle.dump(all_results, f)

print("\n======================================================")
print("   FINISHED TRAINING ALL 22 PLANT 2 INVERTERS 🎉")
print(f"   MASTER RESULTS SAVED TO:\n   {master_save_path}")
print("======================================================")



   TRAINING PLANT 2 INVERTER: 4UPUqMRk7TRMgml

[4UPUqMRk7TRMgml] Found 34 CSV files in C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\00 Excel clean file\Plant 2\Daily_Inverter_Data\4UPUqMRk7TRMgml


  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[4UPUqMRk7TRMgml] Linear       | DC  RMSE= 147.113, MAE=  74.184 | AC  RMSE= 143.760, MAE=  72.715
[4UPUqMRk7TRMgml] Ridge        | DC  RMSE= 147.333, MAE=  75.221 | AC  RMSE= 143.975, MAE=  73.732
[4UPUqMRk7TRMgml] Lasso        | DC  RMSE= 147.113, MAE=  74.186 | AC  RMSE= 143.760, MAE=  72.717




[4UPUqMRk7TRMgml] RandomForest | DC  RMSE=  87.094, MAE=  33.150 | AC  RMSE=  85.084, MAE=  32.297
[4UPUqMRk7TRMgml] NeuralNet    | DC  RMSE= 170.278, MAE= 109.877 | AC  RMSE= 142.317, MAE=  81.725

Iterations completed : 264
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.3608
--------------------------------------------------------------
Iterations completed : 300
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.6832

[4UPUqMRk7TRMgml] Skipping 2020-06-17: not enough samples (1)

[4UPUqMRk7TRMgml] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  66.489, MAE=  42.024 | AC  RMSE=  65.104, MAE=  41.238
Ridge        | DC  RMSE=  77.320, MAE=  52.449 | AC  RMSE=  75.727, MAE=  51.438
Lasso        | DC  RMSE=  66.493, MAE=  42.012 | AC  RMSE=  65.107, MAE=  41.224
RandomForest | DC  RMSE=  47.708, MAE=  21.509 | AC  RMSE=  46.657, MAE=  21.103
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[9kRcWv60rDACzjR] Linear       | DC  RMSE= 160.653, MAE=  92.815 | AC  RMSE= 157.030, MAE=  90.879
[9kRcWv60rDACzjR] Ridge        | DC  RMSE= 159.919, MAE=  93.138 | AC  RMSE= 156.311, MAE=  91.212
[9kRcWv60rDACzjR] Lasso        | DC  RMSE= 160.652, MAE=  92.815 | AC  RMSE= 157.029, MAE=  90.879




[9kRcWv60rDACzjR] RandomForest | DC  RMSE=  87.467, MAE=  30.365 | AC  RMSE=  85.670, MAE=  29.829
[9kRcWv60rDACzjR] NeuralNet    | DC  RMSE= 144.730, MAE=  92.866 | AC  RMSE= 137.758, MAE=  81.153

Iterations completed : 291
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.5935
--------------------------------------------------------------
Iterations completed : 338
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 3.0636

[9kRcWv60rDACzjR] Skipping 2020-06-17: not enough samples (1)

[9kRcWv60rDACzjR] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  63.649, MAE=  45.581 | AC  RMSE=  62.373, MAE=  44.739
Ridge        | DC  RMSE=  76.417, MAE=  56.066 | AC  RMSE=  74.908, MAE=  55.000
Lasso        | DC  RMSE=  63.648, MAE=  45.580 | AC  RMSE=  62.371, MAE=  44.738
RandomForest | DC  RMSE=  38.094, MAE=  17.211 | AC  RMSE=  37.068, MAE=  16.820
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[81aHJ1q11NBPMrL] Linear       | DC  RMSE= 161.137, MAE=  95.009 | AC  RMSE= 157.474, MAE=  93.061
[81aHJ1q11NBPMrL] Ridge        | DC  RMSE= 160.350, MAE=  95.083 | AC  RMSE= 156.708, MAE=  93.163
[81aHJ1q11NBPMrL] Lasso        | DC  RMSE= 161.136, MAE=  95.009 | AC  RMSE= 157.473, MAE=  93.061




[81aHJ1q11NBPMrL] RandomForest | DC  RMSE= 130.392, MAE=  50.035 | AC  RMSE= 126.981, MAE=  48.783
[81aHJ1q11NBPMrL] NeuralNet    | DC  RMSE= 201.318, MAE= 142.327 | AC  RMSE= 160.306, MAE=  95.364

Iterations completed : 223
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.8118
--------------------------------------------------------------
Iterations completed : 417
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 4.5351

[81aHJ1q11NBPMrL] Skipping 2020-06-17: not enough samples (1)

[81aHJ1q11NBPMrL] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  74.662, MAE=  50.907 | AC  RMSE=  73.112, MAE=  49.894
Ridge        | DC  RMSE=  82.880, MAE=  58.477 | AC  RMSE=  81.173, MAE=  57.321
Lasso        | DC  RMSE=  74.661, MAE=  50.906 | AC  RMSE=  73.111, MAE=  49.893
RandomForest | DC  RMSE=  50.495, MAE=  20.312 | AC  RMSE=  49.190, MAE=  19.665
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[Et9kgGMDl729KT4] Linear       | DC  RMSE= 156.850, MAE=  98.825 | AC  RMSE= 153.469, MAE=  96.799
[Et9kgGMDl729KT4] Ridge        | DC  RMSE= 157.049, MAE=  99.122 | AC  RMSE= 153.664, MAE=  97.091
[Et9kgGMDl729KT4] Lasso        | DC  RMSE= 156.851, MAE=  98.826 | AC  RMSE= 153.470, MAE=  96.800




[Et9kgGMDl729KT4] RandomForest | DC  RMSE= 101.810, MAE=  37.502 | AC  RMSE=  98.965, MAE=  36.410
[Et9kgGMDl729KT4] NeuralNet    | DC  RMSE= 147.126, MAE=  93.184 | AC  RMSE= 137.310, MAE=  81.052

Iterations completed : 292
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 3.1200
--------------------------------------------------------------
Iterations completed : 254
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.3949

[Et9kgGMDl729KT4] Skipping 2020-06-17: not enough samples (1)

[Et9kgGMDl729KT4] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  92.425, MAE=  62.960 | AC  RMSE=  90.555, MAE=  61.719
Ridge        | DC  RMSE=  91.692, MAE=  65.834 | AC  RMSE=  89.886, MAE=  64.562
Lasso        | DC  RMSE=  92.426, MAE=  62.946 | AC  RMSE=  90.556, MAE=  61.705
RandomForest | DC  RMSE=  50.716, MAE=  21.451 | AC  RMSE=  49.644, MAE=  21.056
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[IQ2d7wF4YD8zU1Q] Linear       | DC  RMSE=  96.838, MAE=  40.686 | AC  RMSE=  94.711, MAE=  40.033
[IQ2d7wF4YD8zU1Q] Ridge        | DC  RMSE=  97.565, MAE=  42.268 | AC  RMSE=  95.406, MAE=  41.570
[IQ2d7wF4YD8zU1Q] Lasso        | DC  RMSE=  96.839, MAE=  40.687 | AC  RMSE=  94.712, MAE=  40.033




[IQ2d7wF4YD8zU1Q] RandomForest | DC  RMSE=  97.402, MAE=  32.565 | AC  RMSE=  94.620, MAE=  31.609
[IQ2d7wF4YD8zU1Q] NeuralNet    | DC  RMSE= 167.486, MAE= 107.610 | AC  RMSE= 155.445, MAE=  98.417

Iterations completed : 137
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.5593
--------------------------------------------------------------
Iterations completed : 109
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.4139

[IQ2d7wF4YD8zU1Q] Skipping 2020-06-17: not enough samples (1)

[IQ2d7wF4YD8zU1Q] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  41.623, MAE=  28.242 | AC  RMSE=  40.804, MAE=  27.768
Ridge        | DC  RMSE=  56.555, MAE=  40.593 | AC  RMSE=  55.402, MAE=  39.832
Lasso        | DC  RMSE=  41.575, MAE=  28.218 | AC  RMSE=  40.759, MAE=  27.748
RandomForest | DC  RMSE=  40.350, MAE=  19.291 | AC  RMSE=  39.509, MAE=  18.848
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[LlT2YUhhzqhg5Sw] Linear       | DC  RMSE= 153.812, MAE=  83.967 | AC  RMSE= 150.286, MAE=  82.231
[LlT2YUhhzqhg5Sw] Ridge        | DC  RMSE= 153.841, MAE=  84.432 | AC  RMSE= 150.313, MAE=  82.703
[LlT2YUhhzqhg5Sw] Lasso        | DC  RMSE= 153.812, MAE=  83.967 | AC  RMSE= 150.286, MAE=  82.231




[LlT2YUhhzqhg5Sw] RandomForest | DC  RMSE=  80.349, MAE=  29.062 | AC  RMSE=  78.314, MAE=  28.350
[LlT2YUhhzqhg5Sw] NeuralNet    | DC  RMSE= 224.068, MAE= 145.919 | AC  RMSE= 188.418, MAE= 130.059

Iterations completed : 65
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.5381
--------------------------------------------------------------
Iterations completed : 145
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.2675

[LlT2YUhhzqhg5Sw] Skipping 2020-06-17: not enough samples (1)

[LlT2YUhhzqhg5Sw] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  64.402, MAE=  45.484 | AC  RMSE=  63.120, MAE=  44.626
Ridge        | DC  RMSE=  75.079, MAE=  55.409 | AC  RMSE=  73.609, MAE=  54.377
Lasso        | DC  RMSE=  64.400, MAE=  45.483 | AC  RMSE=  63.119, MAE=  44.624
RandomForest | DC  RMSE=  38.608, MAE=  18.002 | AC  RMSE=  37.740, MAE=  17.592
Ne

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[LYwnQax7tkwH5Cb] Linear       | DC  RMSE= 167.866, MAE= 102.504 | AC  RMSE= 164.239, MAE= 100.484
[LYwnQax7tkwH5Cb] Ridge        | DC  RMSE= 167.849, MAE= 102.972 | AC  RMSE= 164.221, MAE= 100.939
[LYwnQax7tkwH5Cb] Lasso        | DC  RMSE= 167.866, MAE= 102.505 | AC  RMSE= 164.238, MAE= 100.484




[LYwnQax7tkwH5Cb] RandomForest | DC  RMSE=  92.606, MAE=  33.754 | AC  RMSE=  90.503, MAE=  32.958
[LYwnQax7tkwH5Cb] NeuralNet    | DC  RMSE= 186.254, MAE= 117.452 | AC  RMSE= 180.359, MAE= 114.532

Iterations completed : 182
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.5316
--------------------------------------------------------------
Iterations completed : 182
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.3591

[LYwnQax7tkwH5Cb] Skipping 2020-06-17: not enough samples (1)

[LYwnQax7tkwH5Cb] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  86.267, MAE=  59.176 | AC  RMSE=  84.616, MAE=  58.120
Ridge        | DC  RMSE=  91.636, MAE=  66.790 | AC  RMSE=  89.905, MAE=  65.557
Lasso        | DC  RMSE=  86.265, MAE=  59.176 | AC  RMSE=  84.614, MAE=  58.120
RandomForest | DC  RMSE=  44.200, MAE=  19.569 | AC  RMSE=  43.261, MAE=  19.155
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[mqwcsP2rE7J0TFp] Linear       | DC  RMSE=  98.615, MAE=  46.025 | AC  RMSE=  96.446, MAE=  45.248
[mqwcsP2rE7J0TFp] Ridge        | DC  RMSE=  99.245, MAE=  47.256 | AC  RMSE=  97.047, MAE=  46.438
[mqwcsP2rE7J0TFp] Lasso        | DC  RMSE=  98.615, MAE=  46.026 | AC  RMSE=  96.446, MAE=  45.248




[mqwcsP2rE7J0TFp] RandomForest | DC  RMSE=  94.577, MAE=  32.922 | AC  RMSE=  92.283, MAE=  32.068
[mqwcsP2rE7J0TFp] NeuralNet    | DC  RMSE= 167.724, MAE= 106.190 | AC  RMSE= 158.697, MAE= 103.079

Iterations completed : 125
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.9926
--------------------------------------------------------------
Iterations completed : 99
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.7214

[mqwcsP2rE7J0TFp] Skipping 2020-06-17: not enough samples (1)

[mqwcsP2rE7J0TFp] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  57.731, MAE=  35.282 | AC  RMSE=  56.568, MAE=  34.683
Ridge        | DC  RMSE=  69.837, MAE=  46.678 | AC  RMSE=  68.377, MAE=  45.785
Lasso        | DC  RMSE=  57.685, MAE=  35.257 | AC  RMSE=  56.522, MAE=  34.661
RandomForest | DC  RMSE=  47.412, MAE=  22.115 | AC  RMSE=  46.066, MAE=  21.580
Ne

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[Mx2yZCDsyf6DPfv] Linear       | DC  RMSE= 137.554, MAE=  62.222 | AC  RMSE= 134.412, MAE=  60.980
[Mx2yZCDsyf6DPfv] Ridge        | DC  RMSE= 138.199, MAE=  63.528 | AC  RMSE= 135.039, MAE=  62.267
[Mx2yZCDsyf6DPfv] Lasso        | DC  RMSE= 137.555, MAE=  62.223 | AC  RMSE= 134.412, MAE=  60.982




[Mx2yZCDsyf6DPfv] RandomForest | DC  RMSE= 111.713, MAE=  35.956 | AC  RMSE= 109.218, MAE=  35.241
[Mx2yZCDsyf6DPfv] NeuralNet    | DC  RMSE= 155.221, MAE=  96.115 | AC  RMSE= 152.421, MAE=  95.066

Iterations completed : 260
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.6052
--------------------------------------------------------------
Iterations completed : 260
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.9994

[Mx2yZCDsyf6DPfv] Skipping 2020-06-17: not enough samples (1)

[Mx2yZCDsyf6DPfv] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  57.323, MAE=  37.666 | AC  RMSE=  56.181, MAE=  36.993
Ridge        | DC  RMSE=  69.306, MAE=  48.769 | AC  RMSE=  67.884, MAE=  47.840
Lasso        | DC  RMSE=  57.321, MAE=  37.642 | AC  RMSE=  56.179, MAE=  36.968
RandomForest | DC  RMSE=  41.121, MAE=  18.593 | AC  RMSE=  40.062, MAE=  18.109
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[NgDl19wMapZy17u] Linear       | DC  RMSE= 109.386, MAE=  48.081 | AC  RMSE= 106.970, MAE=  47.238
[NgDl19wMapZy17u] Ridge        | DC  RMSE= 109.818, MAE=  49.280 | AC  RMSE= 107.379, MAE=  48.402
[NgDl19wMapZy17u] Lasso        | DC  RMSE= 109.386, MAE=  48.082 | AC  RMSE= 106.970, MAE=  47.239




[NgDl19wMapZy17u] RandomForest | DC  RMSE= 100.326, MAE=  35.232 | AC  RMSE=  98.309, MAE=  34.445
[NgDl19wMapZy17u] NeuralNet    | DC  RMSE= 179.741, MAE= 111.628 | AC  RMSE= 175.633, MAE= 109.842

Iterations completed : 131
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.9194
--------------------------------------------------------------
Iterations completed : 131
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.9110

[NgDl19wMapZy17u] Skipping 2020-06-17: not enough samples (1)

[NgDl19wMapZy17u] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  52.728, MAE=  33.243 | AC  RMSE=  51.655, MAE=  32.678
Ridge        | DC  RMSE=  65.226, MAE=  44.844 | AC  RMSE=  63.874, MAE=  43.991
Lasso        | DC  RMSE=  52.691, MAE=  33.218 | AC  RMSE=  51.619, MAE=  32.657
RandomForest | DC  RMSE=  44.124, MAE=  20.702 | AC  RMSE=  43.408, MAE=  20.528
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[oZ35aAeoifZaQzV] Linear       | DC  RMSE= 151.660, MAE=  79.058 | AC  RMSE= 148.204, MAE=  77.444
[oZ35aAeoifZaQzV] Ridge        | DC  RMSE= 152.046, MAE=  80.052 | AC  RMSE= 148.580, MAE=  78.413
[oZ35aAeoifZaQzV] Lasso        | DC  RMSE= 151.660, MAE=  79.060 | AC  RMSE= 148.204, MAE=  77.445




[oZ35aAeoifZaQzV] RandomForest | DC  RMSE= 118.312, MAE=  40.685 | AC  RMSE= 115.591, MAE=  39.801
[oZ35aAeoifZaQzV] NeuralNet    | DC  RMSE= 157.830, MAE=  92.385 | AC  RMSE= 164.607, MAE= 104.086

Iterations completed : 201
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.7490
--------------------------------------------------------------
Iterations completed : 175
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.3393

[oZ35aAeoifZaQzV] Skipping 2020-06-17: not enough samples (1)

[oZ35aAeoifZaQzV] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  64.341, MAE=  43.891 | AC  RMSE=  63.021, MAE=  43.089
Ridge        | DC  RMSE=  75.233, MAE=  54.000 | AC  RMSE=  73.681, MAE=  52.986
Lasso        | DC  RMSE=  64.338, MAE=  43.869 | AC  RMSE=  63.020, MAE=  43.068
RandomForest | DC  RMSE=  39.088, MAE=  18.397 | AC  RMSE=  38.173, MAE=  18.054
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[oZZkBaNadn6DNKz] Linear       | DC  RMSE= 171.807, MAE=  94.973 | AC  RMSE= 167.920, MAE=  93.005
[oZZkBaNadn6DNKz] Ridge        | DC  RMSE= 172.094, MAE=  95.649 | AC  RMSE= 168.200, MAE=  93.666
[oZZkBaNadn6DNKz] Lasso        | DC  RMSE= 171.807, MAE=  94.975 | AC  RMSE= 167.921, MAE=  93.007




[oZZkBaNadn6DNKz] RandomForest | DC  RMSE= 108.384, MAE=  38.645 | AC  RMSE= 105.220, MAE=  37.564
[oZZkBaNadn6DNKz] NeuralNet    | DC  RMSE= 173.783, MAE= 110.719 | AC  RMSE= 155.530, MAE=  86.352

Iterations completed : 211
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.7373
--------------------------------------------------------------
Iterations completed : 257
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.1257

[oZZkBaNadn6DNKz] Skipping 2020-06-17: not enough samples (1)

[oZZkBaNadn6DNKz] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  73.108, MAE=  51.330 | AC  RMSE=  71.589, MAE=  50.333
Ridge        | DC  RMSE=  84.354, MAE=  61.247 | AC  RMSE=  82.617, MAE=  60.069
Lasso        | DC  RMSE=  73.107, MAE=  51.330 | AC  RMSE=  71.588, MAE=  50.333
RandomForest | DC  RMSE=  46.697, MAE=  20.098 | AC  RMSE=  45.330, MAE=  19.509
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[PeE6FRyGXUgsRhN] Linear       | DC  RMSE= 151.195, MAE=  79.331 | AC  RMSE= 147.763, MAE=  77.701
[PeE6FRyGXUgsRhN] Ridge        | DC  RMSE= 151.410, MAE=  80.172 | AC  RMSE= 147.971, MAE=  78.531
[PeE6FRyGXUgsRhN] Lasso        | DC  RMSE= 151.196, MAE=  79.332 | AC  RMSE= 147.763, MAE=  77.703




[PeE6FRyGXUgsRhN] RandomForest | DC  RMSE=  93.470, MAE=  35.873 | AC  RMSE=  91.208, MAE=  34.871
[PeE6FRyGXUgsRhN] NeuralNet    | DC  RMSE= 139.585, MAE=  79.697 | AC  RMSE= 145.759, MAE=  88.092

Iterations completed : 281
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.4721
--------------------------------------------------------------
Iterations completed : 283
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.2329

[PeE6FRyGXUgsRhN] Skipping 2020-06-17: not enough samples (1)

[PeE6FRyGXUgsRhN] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  66.378, MAE=  43.648 | AC  RMSE=  65.015, MAE=  42.795
Ridge        | DC  RMSE=  77.411, MAE=  54.352 | AC  RMSE=  75.833, MAE=  53.288
Lasso        | DC  RMSE=  66.376, MAE=  43.647 | AC  RMSE=  65.014, MAE=  42.794
RandomForest | DC  RMSE=  49.432, MAE=  20.083 | AC  RMSE=  48.087, MAE=  19.520
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[q49J1IKaHRwDQnt] Linear       | DC  RMSE= 142.496, MAE=  82.925 | AC  RMSE= 139.387, MAE=  81.293
[q49J1IKaHRwDQnt] Ridge        | DC  RMSE= 142.027, MAE=  83.218 | AC  RMSE= 138.926, MAE=  81.598
[q49J1IKaHRwDQnt] Lasso        | DC  RMSE= 142.496, MAE=  82.925 | AC  RMSE= 139.386, MAE=  81.293




[q49J1IKaHRwDQnt] RandomForest | DC  RMSE=  91.622, MAE=  34.466 | AC  RMSE=  89.011, MAE=  33.447
[q49J1IKaHRwDQnt] NeuralNet    | DC  RMSE= 140.918, MAE=  83.805 | AC  RMSE= 141.967, MAE=  86.042

Iterations completed : 284
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.3254
--------------------------------------------------------------
Iterations completed : 288
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.2744

[q49J1IKaHRwDQnt] Skipping 2020-06-17: not enough samples (1)

[q49J1IKaHRwDQnt] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  78.048, MAE=  48.733 | AC  RMSE=  76.522, MAE=  47.859
Ridge        | DC  RMSE=  83.756, MAE=  57.099 | AC  RMSE=  82.175, MAE=  56.054
Lasso        | DC  RMSE=  78.047, MAE=  48.733 | AC  RMSE=  76.520, MAE=  47.858
RandomForest | DC  RMSE=  51.064, MAE=  20.621 | AC  RMSE=  49.891, MAE=  19.973
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[Qf4GUc1pJu5T6c6] Linear       | DC  RMSE= 146.103, MAE=  75.118 | AC  RMSE= 142.765, MAE=  73.542
[Qf4GUc1pJu5T6c6] Ridge        | DC  RMSE= 146.215, MAE=  75.860 | AC  RMSE= 142.875, MAE=  74.275
[Qf4GUc1pJu5T6c6] Lasso        | DC  RMSE= 146.103, MAE=  75.118 | AC  RMSE= 142.765, MAE=  73.543




[Qf4GUc1pJu5T6c6] RandomForest | DC  RMSE=  85.347, MAE=  30.158 | AC  RMSE=  83.307, MAE=  29.434
[Qf4GUc1pJu5T6c6] NeuralNet    | DC  RMSE= 199.179, MAE= 136.171 | AC  RMSE= 136.181, MAE=  80.726

Iterations completed : 83
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.8106
--------------------------------------------------------------
Iterations completed : 283
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.0176

[Qf4GUc1pJu5T6c6] Skipping 2020-06-17: not enough samples (1)

[Qf4GUc1pJu5T6c6] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  53.715, MAE=  36.767 | AC  RMSE=  52.615, MAE=  36.070
Ridge        | DC  RMSE=  66.532, MAE=  47.317 | AC  RMSE=  65.177, MAE=  46.426
Lasso        | DC  RMSE=  53.721, MAE=  36.762 | AC  RMSE=  52.620, MAE=  36.063
RandomForest | DC  RMSE=  37.897, MAE=  18.544 | AC  RMSE=  36.680, MAE=  17.946
Ne

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[Quc1TzYxW2pYoWX] Linear       | DC  RMSE= 130.937, MAE=  79.364 | AC  RMSE= 128.239, MAE=  77.826
[Quc1TzYxW2pYoWX] Ridge        | DC  RMSE= 131.142, MAE=  79.685 | AC  RMSE= 128.441, MAE=  78.140
[Quc1TzYxW2pYoWX] Lasso        | DC  RMSE= 130.937, MAE=  79.365 | AC  RMSE= 128.239, MAE=  77.827




[Quc1TzYxW2pYoWX] RandomForest | DC  RMSE=  89.871, MAE=  37.167 | AC  RMSE=  87.684, MAE=  36.238
[Quc1TzYxW2pYoWX] NeuralNet    | DC  RMSE= 139.204, MAE=  87.030 | AC  RMSE= 131.892, MAE=  82.073

Iterations completed : 225
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.8160
--------------------------------------------------------------
Iterations completed : 225
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.7370

[Quc1TzYxW2pYoWX] Skipping 2020-06-17: not enough samples (1)

[Quc1TzYxW2pYoWX] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  82.582, MAE=  56.238 | AC  RMSE=  80.946, MAE=  55.131
Ridge        | DC  RMSE=  85.424, MAE=  60.310 | AC  RMSE=  83.759, MAE=  59.158
Lasso        | DC  RMSE=  82.579, MAE=  56.235 | AC  RMSE=  80.943, MAE=  55.129
RandomForest | DC  RMSE=  45.607, MAE=  19.471 | AC  RMSE=  44.845, MAE=  19.145
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[rrq4fwE8jgrTyWY] Linear       | DC  RMSE= 169.761, MAE= 105.081 | AC  RMSE= 166.011, MAE= 102.928
[rrq4fwE8jgrTyWY] Ridge        | DC  RMSE= 169.307, MAE= 105.264 | AC  RMSE= 165.566, MAE= 103.117
[rrq4fwE8jgrTyWY] Lasso        | DC  RMSE= 169.760, MAE= 105.081 | AC  RMSE= 166.010, MAE= 102.928




[rrq4fwE8jgrTyWY] RandomForest | DC  RMSE= 105.280, MAE=  37.504 | AC  RMSE= 102.638, MAE=  36.698
[rrq4fwE8jgrTyWY] NeuralNet    | DC  RMSE= 151.279, MAE=  83.571 | AC  RMSE= 204.083, MAE= 135.432

Iterations completed : 378
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 3.2083
--------------------------------------------------------------
Iterations completed : 96
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.7687

[rrq4fwE8jgrTyWY] Skipping 2020-06-17: not enough samples (1)

[rrq4fwE8jgrTyWY] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  89.491, MAE=  61.448 | AC  RMSE=  87.710, MAE=  60.318
Ridge        | DC  RMSE=  95.485, MAE=  69.127 | AC  RMSE=  93.600, MAE=  67.807
Lasso        | DC  RMSE=  89.489, MAE=  61.447 | AC  RMSE=  87.708, MAE=  60.317
RandomForest | DC  RMSE=  46.006, MAE=  19.914 | AC  RMSE=  44.840, MAE=  19.435
Ne

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[V94E5Ben1TlhnDV] Linear       | DC  RMSE= 168.922, MAE=  87.032 | AC  RMSE= 165.061, MAE=  85.162
[V94E5Ben1TlhnDV] Ridge        | DC  RMSE= 169.147, MAE=  87.653 | AC  RMSE= 165.279, MAE=  85.789
[V94E5Ben1TlhnDV] Lasso        | DC  RMSE= 168.923, MAE=  87.033 | AC  RMSE= 165.061, MAE=  85.163




[V94E5Ben1TlhnDV] RandomForest | DC  RMSE=  96.374, MAE=  35.439 | AC  RMSE=  94.671, MAE=  34.745
[V94E5Ben1TlhnDV] NeuralNet    | DC  RMSE= 156.361, MAE=  85.639 | AC  RMSE= 178.586, MAE= 117.623

Iterations completed : 310
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.4147
--------------------------------------------------------------
Iterations completed : 266
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.0393

[V94E5Ben1TlhnDV] Skipping 2020-06-17: not enough samples (1)

[V94E5Ben1TlhnDV] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  66.860, MAE=  46.926 | AC  RMSE=  65.475, MAE=  45.986
Ridge        | DC  RMSE=  76.620, MAE=  55.248 | AC  RMSE=  75.030, MAE=  54.179
Lasso        | DC  RMSE=  66.859, MAE=  46.925 | AC  RMSE=  65.474, MAE=  45.985
RandomForest | DC  RMSE=  42.957, MAE=  18.617 | AC  RMSE=  41.938, MAE=  18.180
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[vOuJvMaM2sgwLmb] Linear       | DC  RMSE= 143.532, MAE=  81.825 | AC  RMSE= 140.195, MAE=  80.068
[vOuJvMaM2sgwLmb] Ridge        | DC  RMSE= 142.951, MAE=  81.723 | AC  RMSE= 139.630, MAE=  80.000
[vOuJvMaM2sgwLmb] Lasso        | DC  RMSE= 143.531, MAE=  81.825 | AC  RMSE= 140.194, MAE=  80.068




[vOuJvMaM2sgwLmb] RandomForest | DC  RMSE= 104.172, MAE=  35.904 | AC  RMSE= 101.350, MAE=  34.835
[vOuJvMaM2sgwLmb] NeuralNet    | DC  RMSE= 174.549, MAE= 112.275 | AC  RMSE= 183.953, MAE= 124.028

Iterations completed : 186
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.6260
--------------------------------------------------------------
Iterations completed : 191
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.6805

[vOuJvMaM2sgwLmb] Skipping 2020-06-17: not enough samples (1)

[vOuJvMaM2sgwLmb] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  67.019, MAE=  46.774 | AC  RMSE=  65.620, MAE=  45.855
Ridge        | DC  RMSE=  76.675, MAE=  55.739 | AC  RMSE=  75.055, MAE=  54.634
Lasso        | DC  RMSE=  67.017, MAE=  46.772 | AC  RMSE=  65.618, MAE=  45.853
RandomForest | DC  RMSE=  43.297, MAE=  18.893 | AC  RMSE=  42.314, MAE=  18.420
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[WcxssY2VbP4hApt] Linear       | DC  RMSE= 153.309, MAE=  93.406 | AC  RMSE= 149.812, MAE=  91.382
[WcxssY2VbP4hApt] Ridge        | DC  RMSE= 153.649, MAE=  93.930 | AC  RMSE= 150.142, MAE=  91.938
[WcxssY2VbP4hApt] Lasso        | DC  RMSE= 153.309, MAE=  93.406 | AC  RMSE= 149.812, MAE=  91.382




[WcxssY2VbP4hApt] RandomForest | DC  RMSE=  97.244, MAE=  35.625 | AC  RMSE=  94.984, MAE=  34.788
[WcxssY2VbP4hApt] NeuralNet    | DC  RMSE= 196.122, MAE= 130.819 | AC  RMSE= 128.577, MAE=  70.812

Iterations completed : 188
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.8775
--------------------------------------------------------------
Iterations completed : 507
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 4.1374

[WcxssY2VbP4hApt] Skipping 2020-06-17: not enough samples (1)

[WcxssY2VbP4hApt] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  63.797, MAE=  44.342 | AC  RMSE=  62.469, MAE=  43.443
Ridge        | DC  RMSE=  76.966, MAE=  55.719 | AC  RMSE=  75.366, MAE=  54.609
Lasso        | DC  RMSE=  63.796, MAE=  44.341 | AC  RMSE=  62.468, MAE=  43.442
RandomForest | DC  RMSE=  40.496, MAE=  18.605 | AC  RMSE=  39.543, MAE=  18.028
N

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[xMbIugepa2P7lBB] Linear       | DC  RMSE=  94.670, MAE=  39.108 | AC  RMSE=  92.601, MAE=  38.508
[xMbIugepa2P7lBB] Ridge        | DC  RMSE=  95.266, MAE=  40.477 | AC  RMSE=  93.167, MAE=  39.819
[xMbIugepa2P7lBB] Lasso        | DC  RMSE=  94.670, MAE=  39.108 | AC  RMSE=  92.601, MAE=  38.508




[xMbIugepa2P7lBB] RandomForest | DC  RMSE=  90.580, MAE=  30.880 | AC  RMSE=  88.742, MAE=  30.146
[xMbIugepa2P7lBB] NeuralNet    | DC  RMSE= 176.685, MAE= 115.083 | AC  RMSE= 164.977, MAE= 105.108

Iterations completed : 112
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.8914
--------------------------------------------------------------
Iterations completed : 98
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 0.7042

[xMbIugepa2P7lBB] Skipping 2020-06-17: not enough samples (1)

[xMbIugepa2P7lBB] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  39.881, MAE=  26.918 | AC  RMSE=  39.106, MAE=  26.483
Ridge        | DC  RMSE=  54.785, MAE=  38.917 | AC  RMSE=  53.696, MAE=  38.189
Lasso        | DC  RMSE=  39.839, MAE=  26.901 | AC  RMSE=  39.066, MAE=  26.468
RandomForest | DC  RMSE=  36.830, MAE=  17.405 | AC  RMSE=  35.615, MAE=  16.939
Ne

  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="ffill")
  df = df.fillna(method="bfill").fillna(method="


[xoJJ8DcxJEcupym] Linear       | DC  RMSE= 121.165, MAE=  75.521 | AC  RMSE= 118.758, MAE=  74.259
[xoJJ8DcxJEcupym] Ridge        | DC  RMSE= 121.266, MAE=  75.635 | AC  RMSE= 118.852, MAE=  74.369
[xoJJ8DcxJEcupym] Lasso        | DC  RMSE= 121.165, MAE=  75.521 | AC  RMSE= 118.758, MAE=  74.259




[xoJJ8DcxJEcupym] RandomForest | DC  RMSE=  87.380, MAE=  32.111 | AC  RMSE=  85.218, MAE=  31.465
[xoJJ8DcxJEcupym] NeuralNet    | DC  RMSE= 123.729, MAE=  81.555 | AC  RMSE= 110.704, MAE=  62.169

Iterations completed : 213
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 1.7944
--------------------------------------------------------------
Iterations completed : 335
Learning rate (init) : 0.001
Momentum             : 0.9
Total weights        : 4416
Training time (sec)  : 2.7134

[xoJJ8DcxJEcupym] Skipping 2020-06-17: not enough samples (1)

[xoJJ8DcxJEcupym] ===== AVERAGE PER-DAY (“PARALLEL”) RESULTS =====
Linear       | DC  RMSE=  69.456, MAE=  46.782 | AC  RMSE=  68.114, MAE=  45.887
Ridge        | DC  RMSE=  82.313, MAE=  56.683 | AC  RMSE=  80.722, MAE=  55.644
Lasso        | DC  RMSE=  69.455, MAE=  46.781 | AC  RMSE=  68.113, MAE=  45.886
RandomForest | DC  RMSE=  53.139, MAE=  21.355 | AC  RMSE=  51.883, MAE=  20.828
N

<a id="plant2-nn-visualisation"></a>

## 12. Neural Network Training Visualisation (Plant 2)

This step focuses on **neural network training dynamics** for Plant 2:

- Loads DC and AC loss curves from all `_results.pkl` files
- Plots all DC loss curves with mean ± std band
- Plots all AC loss curves with mean ± std band
- Compares mean DC vs mean AC loss
- Estimates convergence epoch per inverter and plots convergence speed


In [None]:
import os
import pickle
import numpy as np
import matplotlib.pyplot as plt

# ============================================================
# CONFIG  (PLANT 2 VERSION)
# ============================================================

###############################################################################################################################################################

# Change here 

RESULTS_FOLDER = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\02 Plant2_Inverter_Models"
PLOTS_FOLDER   = os.path.join(RESULTS_FOLDER, "00 Training_Visualization_Plots")

###############################################################################################################################################################

os.makedirs(PLOTS_FOLDER, exist_ok=True)

# ============================================================
# LOAD ALL NN LOSS CURVES FROM PKL FILES
# ============================================================

loss_dc = {}   # inverter → loss array
loss_ac = {}   # inverter → loss array

for fname in os.listdir(RESULTS_FOLDER):
    if not fname.endswith("_results.pkl"):
        continue

    fpath = os.path.join(RESULTS_FOLDER, fname)
    with open(fpath, "rb") as f:
        res = pickle.load(f)

    inv_id = res.get("inverter_id", fname.replace("_results.pkl", ""))
    diag   = res.get("nn_diag", {})

    # ----- DC -----
    if "dc" in diag and "loss_curve" in diag["dc"]:
        loss_dc[inv_id] = np.array(diag["dc"]["loss_curve"], dtype=float)

    # ----- AC -----
    if "ac" in diag and "loss_curve" in diag["ac"]:
        loss_ac[inv_id] = np.array(diag["ac"]["loss_curve"], dtype=float)

print(f"Loaded DC loss curves from {len(loss_dc)} inverters")
print(f"Loaded AC loss curves from {len(loss_ac)} inverters")

# ============================================================
# A1. PLOT ALL DC LOSS CURVES
# ============================================================

fig, ax = plt.subplots(figsize=(12, 6))
max_len_dc = max(len(v) for v in loss_dc.values())
all_dc = np.full((len(loss_dc), max_len_dc), np.nan)

for i, (inv, curve) in enumerate(loss_dc.items()):
    ax.plot(curve, alpha=0.3, label=inv)
    all_dc[i, :len(curve)] = curve

mean_dc = np.nanmean(all_dc, axis=0)
std_dc  = np.nanstd(all_dc, axis=0)

ax.plot(mean_dc, color="black", linewidth=2, label="Mean DC Loss")
ax.fill_between(
    np.arange(len(mean_dc)),
    mean_dc - std_dc,
    mean_dc + std_dc,
    alpha=0.15,
    label="±1 std"
)

ax.set_title("Neural Network DC Loss Curves — All Inverters (Plant 2)")
ax.set_xlabel("Epoch")
ax.set_ylabel("Loss")
ax.grid(True)
ax.legend(fontsize=7, ncol=2)

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "DC_Loss_All.png"), dpi=150, bbox_inches="tight")
plt.close(fig)

# ============================================================
# A2. PLOT ALL AC LOSS CURVES
# ============================================================

fig, ax = plt.subplots(figsize=(12, 6))
max_len_ac = max(len(v) for v in loss_ac.values())
all_ac = np.full((len(loss_ac), max_len_ac), np.nan)

for i, (inv, curve) in enumerate(loss_ac.items()):
    ax.plot(curve, alpha=0.3, label=inv)
    all_ac[i, :len(curve)] = curve

mean_ac = np.nanmean(all_ac, axis=0)
std_ac  = np.nanstd(all_ac, axis=0)

ax.plot(mean_ac, color="black", linewidth=2, label="Mean AC Loss")
ax.fill_between(
    np.arange(len(mean_ac)),
    mean_ac - std_ac,
    mean_ac + std_ac,
    alpha=0.15
)

ax.set_title("Neural Network AC Loss Curves — All Inverters (Plant 2)")
ax.set_xlabel("Epoch")
ax.set_ylabel("Loss")
ax.grid(True)
ax.legend(fontsize=7, ncol=2)

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "AC_Loss_All.png"), dpi=150, bbox_inches="tight")
plt.close(fig)

# ============================================================
# B. DC vs AC Mean Loss Comparison
# ============================================================

L = min(len(mean_dc), len(mean_ac))

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(mean_dc[:L], label="Mean DC Loss", linewidth=2)
ax.plot(mean_ac[:L], label="Mean AC Loss", linewidth=2)

ax.set_title("Mean Loss Comparison: DC vs AC — Plant 2")
ax.set_xlabel("Epoch")
ax.set_ylabel("Loss")
ax.grid(True)
ax.legend()

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "Mean_DC_vs_AC.png"), dpi=150, bbox_inches="tight")
plt.close(fig)

# ============================================================
# C. Convergence Speed (per inverter)
# ============================================================

def get_convergence_epoch(loss, tol=1e-4, patience=10):
    """Returns epoch where improvement slows down."""
    best = loss[0]
    count = 0
    for i in range(1, len(loss)):
        if loss[i] < best - tol:
            best = loss[i]
            count = 0
        else:
            count += 1
        if count >= patience:
            return i
    return len(loss)

conv_dc = {inv: get_convergence_epoch(curve) for inv, curve in loss_dc.items()}
conv_ac = {inv: get_convergence_epoch(curve) for inv, curve in loss_ac.items()}

fig, ax = plt.subplots(figsize=(14, 6))
invs = list(conv_dc.keys())

ax.bar(invs, [conv_dc[i] for i in invs], alpha=0.6, label="DC")
ax.bar(invs, [conv_ac.get(i, np.nan) for i in invs], alpha=0.6, label="AC")

ax.set_title("Convergence Epoch per Inverter — Plant 2")
ax.set_ylabel("Epoch")
ax.set_xticklabels(invs, rotation=45, ha="right")
ax.grid(True)
ax.legend()

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "Convergence_Epochs.png"), dpi=150, bbox_inches="tight")
plt.close(fig)

print("\n✅ Training visualization complete (Plant 2).")
print(f"Plots saved in: {PLOTS_FOLDER}")


Loaded DC loss curves from 22 inverters
Loaded AC loss curves from 22 inverters


  ax.set_xticklabels(invs, rotation=45, ha="right")



✅ Training visualization complete (Plant 2).
Plots saved in: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\02 Plant2_Inverter_Models\00 Training_Visualization_Plots


<a id="plant2-bias-variance"></a>

## 13. Bias–Variance Proxy Analysis (Plant 2)

Here we analyse **bias–variance behaviour** of all models on Plant 2:

- Loads all per-inverter `_results.pkl` files
- Extracts combined and per-day (parallel) RMSE/MAE for each model
- Computes:
  - Bias proxy = mean combined RMSE across inverters
  - Variance proxy = std of per-day RMSE across days & inverters
- Produces:
  - Bias–variance scatter plots (DC & AC)
  - Bias vs variance bar plots
  - NN loss curves (DC & AC) and mean DC vs AC
- Saves all plots under `00_BiasVariance` inside the Plant 2 models folder


In [None]:
import os
import pickle
import numpy as np
import matplotlib.pyplot as plt

# ==============================================================
# CONFIG  —  PLANT 2 (Using Plant-1 style trimming)
# ==============================================================

###############################################################################################################################################################

# Change here 

RESULTS_FOLDER = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\02 Plant2_Inverter_Models"

###############################################################################################################################################################

BIASVAR_FOLDER = os.path.join(RESULTS_FOLDER, "00_BiasVariance")


os.makedirs(BIASVAR_FOLDER, exist_ok=True)

MODELS = ["Linear", "Ridge", "Lasso", "RandomForest", "NeuralNet"]


# ==============================================================
# 1. LOAD ALL PER-INVERTER RESULTS (.pkl files)
# ==============================================================

inverter_results = {}

nn_loss_dc = {}
nn_loss_ac = {}

nn_diag_dc = {"iterations": {}, "learning_rate": {}, "momentum": {}, "total_weights": {}, "train_time": {}}
nn_diag_ac = {"iterations": {}, "learning_rate": {}, "momentum": {}, "total_weights": {}, "train_time": {}}

for fname in os.listdir(RESULTS_FOLDER):
    if not fname.endswith("_results.pkl"):
        continue

    fpath = os.path.join(RESULTS_FOLDER, fname)
    with open(fpath, "rb") as f:
        res = pickle.load(f)

    inverter_id = res.get("inverter_id", fname.replace("_results.pkl", ""))
    inverter_results[inverter_id] = res

    diag = res.get("nn_diag", {})

    # DC
    if isinstance(diag, dict) and "dc" in diag:
        dc_diag = diag["dc"]
        if "loss_curve" in dc_diag:
            nn_loss_dc[inverter_id] = np.array(dc_diag["loss_curve"], dtype=float)
        for key in nn_diag_dc.keys():
            if key in dc_diag:
                nn_diag_dc[key][inverter_id] = dc_diag[key]

    # AC
    if isinstance(diag, dict) and "ac" in diag:
        ac_diag = diag["ac"]
        if "loss_curve" in ac_diag:
            nn_loss_ac[inverter_id] = np.array(ac_diag["loss_curve"], dtype=float)
        for key in nn_diag_ac.keys():
            if key in ac_diag:
                nn_diag_ac[key][inverter_id] = ac_diag[key]

print(f"Loaded {len(inverter_results)} inverter result files from Plant 2.")


# ==============================================================
# 2. COLLECT METRICS
# ==============================================================

combined_dc_rmse = {m: [] for m in MODELS}
combined_ac_rmse = {m: [] for m in MODELS}
combined_dc_mae  = {m: [] for m in MODELS}
combined_ac_mae  = {m: [] for m in MODELS}

parallel_dc_rmse = {m: [] for m in MODELS}
parallel_ac_rmse = {m: [] for m in MODELS}
parallel_dc_mae  = {m: [] for m in MODELS}
parallel_ac_mae  = {m: [] for m in MODELS}

parallel_dc_rmse_full = {m: [] for m in MODELS}
parallel_ac_rmse_full = {m: [] for m in MODELS}

for inv_id, res in inverter_results.items():

    comb = res["combined"]
    par  = res["parallel"]

    dc_comb = comb["dc"]
    ac_comb = comb["ac"]

    for m in MODELS:
        if m in dc_comb:
            combined_dc_rmse[m].append(dc_comb[m]["rmse"])
            combined_dc_mae[m].append(dc_comb[m]["mae"])

        if m in ac_comb:
            combined_ac_rmse[m].append(ac_comb[m]["rmse"])
            combined_ac_mae[m].append(ac_comb[m]["mae"])

    avg_dc_rmse = par.get("avg_dc_rmse", {})
    avg_ac_rmse = par.get("avg_ac_rmse", {})
    avg_dc_mae  = par.get("avg_dc_mae", {})
    avg_ac_mae  = par.get("avg_ac_mae", {})

    dc_rmse_days = par.get("dc_rmse", {})
    ac_rmse_days = par.get("ac_rmse", {})

    for m in MODELS:

        if m in avg_dc_rmse:
            parallel_dc_rmse[m].append(avg_dc_rmse[m])

        if m in avg_ac_rmse:
            parallel_ac_rmse[m].append(avg_ac_rmse[m])

        if m in avg_dc_mae:
            parallel_dc_mae[m].append(avg_dc_mae[m])

        if m in avg_ac_mae:
            parallel_ac_mae[m].append(avg_ac_mae[m])

        if m in dc_rmse_days:
            parallel_dc_rmse_full[m].append(dc_rmse_days[m])

        if m in ac_rmse_days:
            parallel_ac_rmse_full[m].append(ac_rmse_days[m])


def mean_std(arr):
    if len(arr) == 0:
        return np.nan, np.nan
    return float(np.mean(arr)), float(np.std(arr))


# ==============================================================
# 3. BIAS–VARIANCE PROXIES
# ==============================================================

bias_proxy_dc = {}
bias_proxy_ac = {}
var_proxy_dc  = {}
var_proxy_ac  = {}

for m in MODELS:

    bias_proxy_dc[m] = mean_std(combined_dc_rmse[m])[0]
    bias_proxy_ac[m] = mean_std(combined_ac_rmse[m])[0]

    all_dc_days = []
    for lst in parallel_dc_rmse_full[m]:
        all_dc_days.extend(lst)

    all_ac_days = []
    for lst in parallel_ac_rmse_full[m]:
        all_ac_days.extend(lst)

    var_proxy_dc[m] = float(np.std(all_dc_days)) if all_dc_days else np.nan
    var_proxy_ac[m] = float(np.std(all_ac_days)) if all_ac_days else np.nan


# ==============================================================
# 4. SCATTER PLOTS
# ==============================================================

labels = MODELS

# DC scatter
x_dc = [var_proxy_dc[m] for m in labels]
y_dc = [bias_proxy_dc[m] for m in labels]

fig, ax = plt.subplots(figsize=(8, 6))
for i, m in enumerate(labels):
    ax.scatter(x_dc[i], y_dc[i])
    ax.text(x_dc[i]*1.01, y_dc[i]*1.01, m)

ax.set_xlabel("Variance proxy (std per-day RMSE, DC)")
ax.set_ylabel("Bias proxy (mean combined RMSE, DC)")
ax.set_title("Bias–Variance Proxy Plane — DC (Plant 2)")
ax.grid(True)

fig.tight_layout()
fig.savefig(os.path.join(BIASVAR_FOLDER, "bias_variance_scatter_DC.png"))
plt.close(fig)


# AC scatter
x_ac = [var_proxy_ac[m] for m in labels]
y_ac = [bias_proxy_ac[m] for m in labels]

fig, ax = plt.subplots(figsize=(8, 6))
for i, m in enumerate(labels):
    ax.scatter(x_ac[i], y_ac[i])
    ax.text(x_ac[i]*1.01, y_ac[i]*1.01, m)

ax.set_xlabel("Variance proxy (std per-day RMSE, AC)")
ax.set_ylabel("Bias proxy (mean combined RMSE, AC)")
ax.set_title("Bias–Variance Proxy Plane — AC (Plant 2)")
ax.grid(True)

fig.tight_layout()
fig.savefig(os.path.join(BIASVAR_FOLDER, "bias_variance_scatter_AC.png"))
plt.close(fig)


# ==============================================================
# 5. BAR PLOTS
# ==============================================================

x = np.arange(len(labels))
width = 0.35

# DC bars
fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(x - width/2, [bias_proxy_dc[m] for m in labels], width, label="Bias")
ax.bar(x + width/2, [var_proxy_dc[m] for m in labels], width, label="Variance")
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_title("Bias vs Variance — DC (Plant 2)")
ax.grid(axis="y")
ax.legend()

fig.tight_layout()
fig.savefig(os.path.join(BIASVAR_FOLDER, "bias_variance_bar_DC.png"))
plt.close(fig)

# AC bars
fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(x - width/2, [bias_proxy_ac[m] for m in labels], width, label="Bias")
ax.bar(x + width/2, [var_proxy_ac[m] for m in labels], width, label="Variance")
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_title("Bias vs Variance — AC (Plant 2)")
ax.grid(axis="y")
ax.legend()

fig.tight_layout()
fig.savefig(os.path.join(BIASVAR_FOLDER, "bias_variance_bar_AC.png"))
plt.close(fig)


# ==============================================================
# 6. NEURAL NETWORK LEARNING CURVES
# ==============================================================

# --- DC learning curves ---
if len(nn_loss_dc) > 0:

    fig, ax = plt.subplots(figsize=(10, 6))

    max_len_dc = max(len(c) for c in nn_loss_dc.values())
    all_dc = np.full((len(nn_loss_dc), max_len_dc), np.nan)

    for i, (inv, curve) in enumerate(nn_loss_dc.items()):
        ax.plot(np.arange(len(curve)), curve, alpha=0.3)
        all_dc[i, :len(curve)] = curve

    mean_dc_curve = np.nanmean(all_dc, axis=0)
    ax.plot(np.arange(len(mean_dc_curve)), mean_dc_curve, linewidth=2.5, label="Mean")

    ax.set_title("Neural Network Loss — DC (Plant 2)")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Loss")
    ax.grid(True)
    ax.legend()

    fig.tight_layout()
    fig.savefig(os.path.join(BIASVAR_FOLDER, "nn_learning_curves_DC.png"))
    plt.close(fig)


# --- AC learning curves ---
if len(nn_loss_ac) > 0:

    fig, ax = plt.subplots(figsize=(10, 6))

    max_len_ac = max(len(c) for c in nn_loss_ac.values())
    all_ac = np.full((len(nn_loss_ac), max_len_ac), np.nan)

    for i, (inv, curve) in enumerate(nn_loss_ac.items()):
        ax.plot(np.arange(len(curve)), curve, alpha=0.3)
        all_ac[i, :len(curve)] = curve

    mean_ac_curve = np.nanmean(all_ac, axis=0)
    ax.plot(np.arange(len(mean_ac_curve)), mean_ac_curve, linewidth=2.5, label="Mean")

    ax.set_title("Neural Network Loss — AC (Plant 2)")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Loss")
    ax.grid(True)
    ax.legend()

    fig.tight_layout()
    fig.savefig(os.path.join(BIASVAR_FOLDER, "nn_learning_curves_AC.png"))
    plt.close(fig)


# --- DC vs AC MEAN curves (Plant-1 style TRIMMING) ---
if len(nn_loss_dc) > 0 and len(nn_loss_ac) > 0:

    # DC mean
    max_len_dc = max(len(c) for c in nn_loss_dc.values())
    dc_mat = np.full((len(nn_loss_dc), max_len_dc), np.nan)
    for i, c in enumerate(nn_loss_dc.values()):
        dc_mat[i, :len(c)] = c
    mean_dc = np.nanmean(dc_mat, axis=0)

    # AC mean
    max_len_ac = max(len(c) for c in nn_loss_ac.values())
    ac_mat = np.full((len(nn_loss_ac), max_len_ac), np.nan)
    for i, c in enumerate(nn_loss_ac.values()):
        ac_mat[i, :len(c)] = c
    mean_ac = np.nanmean(ac_mat, axis=0)

    # Trim to same length
    L = min(len(mean_dc), len(mean_ac))
    mean_dc = mean_dc[:L]
    mean_ac = mean_ac[:L]

    fig, ax = plt.subplots(figsize=(10, 6))
    epochs = np.arange(L)

    ax.plot(epochs, mean_dc, label="Mean DC")
    ax.plot(epochs, mean_ac, label="Mean AC")

    ax.set_title("Mean NN Loss — DC vs AC (Plant 2)")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Loss")
    ax.grid(True)
    ax.legend()

    fig.tight_layout()
    fig.savefig(os.path.join(BIASVAR_FOLDER, "nn_mean_loss_DC_vs_AC.png"))
    plt.close(fig)


# ==============================================================
# DONE
# ==============================================================

print("\n==============================================")
print(" Bias–Variance analysis for Plant 2 completed.")
print(" Saved plots in:", BIASVAR_FOLDER)
print("==============================================")


Loaded 22 inverter result files from Plant 2.

 Bias–Variance analysis for Plant 2 completed.
 Saved plots in: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\02 Plant2_Inverter_Models\00_BiasVariance


<a id="plant2-model-comparison"></a>

## 14. Global Model Comparison & NN Diagnostics (Plant 2)

In this step we summarise and visualise **model performance across all Plant 2 inverters**:

- Loads all Plant 2 `_results.pkl` files
- Aggregates combined and per-day (parallel) RMSE/MAE for each model
- Prints summary tables (mean ± std) for:
  - Combined DC/AC RMSE & MAE
  - Per-day (parallel) DC/AC RMSE & MAE
- Produces comparison plots:
  - Combined DC vs AC average RMSE
  - Parallel DC vs AC average RMSE
  - Combined vs parallel RMSE per model
  - Boxplots of RMSE distributions across inverters
- Visualises NN cost functions:
  - All DC and AC loss curves + mean loss
  - Mean DC vs mean AC loss
- Plots NN diagnostics distributions and DC vs AC bar charts:
  - Iterations, learning rate, momentum, total weights, training time
- Saves everything under `00_Comparison_Plots` in the Plant 2 models folder


In [None]:
import os
import pickle
import numpy as np
import matplotlib.pyplot as plt

# ==============================================================
# CONFIG  – PLANT 2
# ==============================================================

###############################################################################################################################################################

# Change here

RESULTS_FOLDER = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\02 Plant2_Inverter_Models"
PLOTS_FOLDER   = os.path.join(RESULTS_FOLDER, "00_Comparison_Plots")

###############################################################################################################################################################


os.makedirs(PLOTS_FOLDER, exist_ok=True)

# We know we used these 5 models
MODELS = ["Linear", "Ridge", "Lasso", "RandomForest", "NeuralNet"]

# ==============================================================
# 1. LOAD ALL PER-INVERTER RESULTS
# ==============================================================

inverter_results = {}   # inverter_id -> results dict
nn_loss_dc       = {}   # inverter_id -> np.array loss curve (DC)
nn_loss_ac       = {}   # inverter_id -> np.array loss curve (AC)

# global NN diagnostics containers
nn_diag_dc = {
    "iterations": {},
    "learning_rate": {},
    "momentum": {},
    "total_weights": {},
    "train_time": {},
}
nn_diag_ac = {
    "iterations": {},
    "learning_rate": {},
    "momentum": {},
    "total_weights": {},
    "train_time": {},
}

for fname in os.listdir(RESULTS_FOLDER):
    if not fname.endswith("_results.pkl"):
        # skip master file or other pkl's
        continue

    fpath = os.path.join(RESULTS_FOLDER, fname)
    with open(fpath, "rb") as f:
        res = pickle.load(f)

    inverter_id = res.get("inverter_id", fname.replace("_results.pkl", ""))
    inverter_results[inverter_id] = res

    # Collect NN loss curves + diagnostics
    diag = res.get("nn_diag", {})

    # Expected structure:
    # nn_diag = {
    #   "dc": {"iterations":..., "learning_rate":..., "momentum":...,
    #          "total_weights":..., "train_time":..., "loss_curve":[...]},
    #   "ac": {... same keys ...}
    # }
    if "dc" in diag:
        dc_diag = diag["dc"]
        if "loss_curve" in dc_diag:
            nn_loss_dc[inverter_id] = np.array(dc_diag["loss_curve"], dtype=float)
        for key in nn_diag_dc.keys():
            if key in dc_diag:
                nn_diag_dc[key][inverter_id] = dc_diag[key]

    if "ac" in diag:
        ac_diag = diag["ac"]
        if "loss_curve" in ac_diag:
            nn_loss_ac[inverter_id] = np.array(ac_diag["loss_curve"], dtype=float)
        for key in nn_diag_ac.keys():
            if key in ac_diag:
                nn_diag_ac[key][inverter_id] = ac_diag[key]

print(f"Loaded {len(inverter_results)} Plant 2 inverter result files.")


# ==============================================================
# 2. COLLECT METRICS ACROSS INVERTERS
# ==============================================================

# Combined (all days merged)
combined_dc_rmse = {m: [] for m in MODELS}
combined_ac_rmse = {m: [] for m in MODELS}
combined_dc_mae  = {m: [] for m in MODELS}
combined_ac_mae  = {m: [] for m in MODELS}

# Parallel (average per-day metrics stored in avg_* fields)
parallel_dc_rmse = {m: [] for m in MODELS}
parallel_ac_rmse = {m: [] for m in MODELS}
parallel_dc_mae  = {m: [] for m in MODELS}
parallel_ac_mae  = {m: [] for m in MODELS}

for inv_id, res in inverter_results.items():
    comb = res["combined"]
    par  = res["parallel"]

    # --- combined ---
    dc_comb = comb["dc"]
    ac_comb = comb["ac"]

    for m in MODELS:
        if m in dc_comb:
            combined_dc_rmse[m].append(dc_comb[m]["rmse"])
            combined_dc_mae[m].append(dc_comb[m]["mae"])
        if m in ac_comb:
            combined_ac_rmse[m].append(ac_comb[m]["rmse"])
            combined_ac_mae[m].append(ac_comb[m]["mae"])

    # --- parallel (use avg_* dicts) ---
    avg_dc_rmse = par.get("avg_dc_rmse", {})
    avg_ac_rmse = par.get("avg_ac_rmse", {})
    avg_dc_mae  = par.get("avg_dc_mae", {})
    avg_ac_mae  = par.get("avg_ac_mae", {})

    for m in MODELS:
        if m in avg_dc_rmse:
            parallel_dc_rmse[m].append(avg_dc_rmse[m])
        if m in avg_ac_rmse:
            parallel_ac_rmse[m].append(avg_ac_rmse[m])
        if m in avg_dc_mae:
            parallel_dc_mae[m].append(avg_dc_mae[m])
        if m in avg_ac_mae:
            parallel_ac_mae[m].append(avg_ac_mae[m])

# Helper to compute mean & std, ignoring empty lists
def mean_std(arr):
    if len(arr) == 0:
        return np.nan, np.nan
    return float(np.mean(arr)), float(np.std(arr))


# ==============================================================
# 3. PRINT SUMMARY TABLES (COMBINED vs PARALLEL)
# ==============================================================

print("\n================== PLANT 2 — MODEL COMPARISON: COMBINED DATA ==================")
print("Model         | DC_RMSE(mean±std)    | AC_RMSE(mean±std)    | DC_MAE(mean±std)     | AC_MAE(mean±std)")
print("-----------------------------------------------------------------------------------------------")
for m in MODELS:
    dc_rmse_mean, dc_rmse_std = mean_std(combined_dc_rmse[m])
    ac_rmse_mean, ac_rmse_std = mean_std(combined_ac_rmse[m])
    dc_mae_mean,  dc_mae_std  = mean_std(combined_dc_mae[m])
    ac_mae_mean,  ac_mae_std  = mean_std(combined_ac_mae[m])

    print(f"{m:12s} | "
          f"{dc_rmse_mean:8.3f}±{dc_rmse_std:6.3f} | "
          f"{ac_rmse_mean:8.3f}±{ac_rmse_std:6.3f} | "
          f"{dc_mae_mean:8.3f}±{dc_mae_std:6.3f} | "
          f"{ac_mae_mean:8.3f}±{ac_mae_std:6.3f}")
print("=====================================================================\n")


print("================== PLANT 2 — MODEL COMPARISON: PARALLEL (PER-DAY AVG) ==================")
print("Model         | DC_RMSE(mean±std)    | AC_RMSE(mean±std)    | DC_MAE(mean±std)     | AC_MAE(mean±std)")
print("-----------------------------------------------------------------------------------------------")
for m in MODELS:
    dc_rmse_mean, dc_rmse_std = mean_std(parallel_dc_rmse[m])
    ac_rmse_mean, ac_rmse_std = mean_std(parallel_ac_rmse[m])
    dc_mae_mean,  dc_mae_std  = mean_std(parallel_dc_mae[m])
    ac_mae_mean,  ac_mae_std  = mean_std(parallel_ac_mae[m])

    print(f"{m:12s} | "
          f"{dc_rmse_mean:8.3f}±{dc_rmse_std:6.3f} | "
          f"{ac_rmse_mean:8.3f}±{ac_rmse_std:6.3f} | "
          f"{dc_mae_mean:8.3f}±{dc_mae_std:6.3f} | "
          f"{ac_mae_mean:8.3f}±{ac_mae_std:6.3f}")
print("==========================================================================\n")


# ==============================================================
# 4. PLOT: COMBINED DC vs AC (AVERAGE ACROSS INVERTERS)
# ==============================================================

labels = MODELS
x = np.arange(len(labels))
width = 0.35

avg_combined_dc_rmse = [mean_std(combined_dc_rmse[m])[0] for m in labels]
avg_combined_ac_rmse = [mean_std(combined_ac_rmse[m])[0] for m in labels]

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(x - width/2, avg_combined_dc_rmse, width, label="DC RMSE (Combined)")
ax.bar(x + width/2, avg_combined_ac_rmse, width, label="AC RMSE (Combined)")

ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_ylabel("RMSE")
ax.set_title("Plant 2 — Average Combined RMSE — DC vs AC across inverters")
ax.legend()
ax.grid(axis="y")

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "combined_avg_rmse_dc_vs_ac_P2.png"),
            dpi=150, bbox_inches="tight")
plt.close(fig)


# ==============================================================
# 5. PLOT: PARALLEL DC vs AC (AVERAGE PER-DAY ACROSS INVERTERS)
# ==============================================================

avg_parallel_dc_rmse = [mean_std(parallel_dc_rmse[m])[0] for m in labels]
avg_parallel_ac_rmse = [mean_std(parallel_ac_rmse[m])[0] for m in labels]

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(x - width/2, avg_parallel_dc_rmse, width, label="DC RMSE (Parallel avg)")
ax.bar(x + width/2, avg_parallel_ac_rmse, width, label="AC RMSE (Parallel avg)")

ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_ylabel("RMSE")
ax.set_title("Plant 2 — Average Per-Day Parallel RMSE — DC vs AC across inverters")
ax.legend()
ax.grid(axis="y")

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "parallel_avg_rmse_dc_vs_ac_P2.png"),
            dpi=150, bbox_inches="tight")
plt.close(fig)


# ==============================================================
# 6. PLOT: COMBINED vs PARALLEL (RMSE) PER MODEL
# ==============================================================

fig, ax = plt.subplots(figsize=(12, 6))
width = 0.18
x = np.arange(len(labels))

c_dc = [mean_std(combined_dc_rmse[m])[0] for m in labels]
c_ac = [mean_std(combined_ac_rmse[m])[0] for m in labels]
p_dc = [mean_std(parallel_dc_rmse[m])[0] for m in labels]
p_ac = [mean_std(parallel_ac_rmse[m])[0] for m in labels]

ax.bar(x - 1.5*width, c_dc, width, label="Combined DC")
ax.bar(x - 0.5*width, c_ac, width, label="Combined AC")
ax.bar(x + 0.5*width, p_dc, width, label="Parallel DC")
ax.bar(x + 1.5*width, p_ac, width, label="Parallel AC")

ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_ylabel("RMSE")
ax.set_title("Plant 2 — Combined vs Parallel RMSE — DC & AC")
ax.legend()
ax.grid(axis="y")

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "combined_vs_parallel_rmse_P2.png"),
            dpi=150, bbox_inches="tight")
plt.close(fig)


# ==============================================================
# 7. BOX PLOTS (DISTRIBUTION ACROSS INVERTERS)
# ==============================================================

# DC combined
fig, ax = plt.subplots(figsize=(10, 5))
data = [combined_dc_rmse[m] for m in labels]
ax.boxplot(data, labels=labels, showmeans=True)
ax.set_ylabel("RMSE")
ax.set_title("Plant 2 — Distribution of DC RMSE (Combined) across inverters")
ax.grid(axis="y")

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "boxplot_combined_dc_rmse_P2.png"),
            dpi=150, bbox_inches="tight")
plt.close(fig)

# AC combined
fig, ax = plt.subplots(figsize=(10, 5))
data = [combined_ac_rmse[m] for m in labels]
ax.boxplot(data, labels=labels, showmeans=True)
ax.set_ylabel("RMSE")
ax.set_title("Plant 2 — Distribution of AC RMSE (Combined) across inverters")
ax.grid(axis="y")

fig.tight_layout()
fig.savefig(os.path.join(PLOTS_FOLDER, "boxplot_combined_ac_rmse_P2.png"),
            dpi=150, bbox_inches="tight")
plt.close(fig)


# ==============================================================
# 8. COST FUNCTION PER ITERATION (NEURAL NET) — LOSS CURVES DC & AC
# ==============================================================

# ----- DC cost function -----
if len(nn_loss_dc) > 0:
    fig, ax = plt.subplots(figsize=(10, 6))

    max_len_dc = max(len(curve) for curve in nn_loss_dc.values())
    all_curves_dc = np.full((len(nn_loss_dc), max_len_dc), np.nan)

    for idx, (inv_id, curve) in enumerate(nn_loss_dc.items()):
        epochs = np.arange(len(curve))
        ax.plot(epochs, curve, alpha=0.3, label=inv_id)
        all_curves_dc[idx, :len(curve)] = curve

    mean_curve_dc = np.nanmean(all_curves_dc, axis=0)
    ax.plot(np.arange(len(mean_curve_dc)), mean_curve_dc,
            linewidth=2.5, label="Mean across inverters")

    ax.set_xlabel("Iteration / Epoch")
    ax.set_ylabel("Loss (Cost Function)")
    ax.set_title("Plant 2 — Neural Network Training Loss — DC (Combined data)")
    ax.grid(True)
    ax.legend(loc="upper right", fontsize=8, ncol=2)

    fig.tight_layout()
    fig.savefig(os.path.join(PLOTS_FOLDER, "nn_loss_curves_all_inverters_DC_P2.png"),
                dpi=150, bbox_inches="tight")
    plt.close(fig)
else:
    print("No DC nn_diag loss_curve found; skipping DC cost-function plot.")

# ----- AC cost function -----
if len(nn_loss_ac) > 0:
    fig, ax = plt.subplots(figsize=(10, 6))

    max_len_ac = max(len(curve) for curve in nn_loss_ac.values())
    all_curves_ac = np.full((len(nn_loss_ac), max_len_ac), np.nan)

    for idx, (inv_id, curve) in enumerate(nn_loss_ac.items()):
        epochs = np.arange(len(curve))
        ax.plot(epochs, curve, alpha=0.3, label=inv_id)
        all_curves_ac[idx, :len(curve)] = curve

    mean_curve_ac = np.nanmean(all_curves_ac, axis=0)
    ax.plot(np.arange(len(mean_curve_ac)), mean_curve_ac,
            linewidth=2.5, label="Mean across inverters")

    ax.set_xlabel("Iteration / Epoch")
    ax.set_ylabel("Loss (Cost Function)")
    ax.set_title("Plant 2 — Neural Network Training Loss — AC (Combined data)")
    ax.grid(True)
    ax.legend(loc="upper right", fontsize=8, ncol=2)

    fig.tight_layout()
    fig.savefig(os.path.join(PLOTS_FOLDER, "nn_loss_curves_all_inverters_AC_P2.png"),
                dpi=150, bbox_inches="tight")
    plt.close(fig)
else:
    print("No AC nn_diag loss_curve found; skipping AC cost-function plot.")

# ----- DC vs AC mean loss in a single figure -----
if len(nn_loss_dc) > 0 and len(nn_loss_ac) > 0:
    # Compute mean DC
    max_len_dc = max(len(c) for c in nn_loss_dc.values())
    dc_mat = np.full((len(nn_loss_dc), max_len_dc), np.nan)
    for i, c in enumerate(nn_loss_dc.values()):
        dc_mat[i, :len(c)] = c
    mean_dc = np.nanmean(dc_mat, axis=0)

    # Compute mean AC
    max_len_ac = max(len(c) for c in nn_loss_ac.values())
    ac_mat = np.full((len(nn_loss_ac), max_len_ac), np.nan)
    for i, c in enumerate(nn_loss_ac.values()):
        ac_mat[i, :len(c)] = c
    mean_ac = np.nanmean(ac_mat, axis=0)

    # Align lengths
    L = min(len(mean_dc), len(mean_ac))
    mean_dc = mean_dc[:L]
    mean_ac = mean_ac[:L]

    fig, ax = plt.subplots(figsize=(10, 6))
    epochs = np.arange(L)
    ax.plot(epochs, mean_dc, label="Mean DC Loss")
    ax.plot(epochs, mean_ac, label="Mean AC Loss")
    ax.set_xlabel("Iteration / Epoch")
    ax.set_ylabel("Loss")
    ax.set_title("Plant 2 — Neural Net Cost Function Comparison — DC vs AC (mean over inverters)")
    ax.grid(True)
    ax.legend()

    fig.tight_layout()
    fig.savefig(os.path.join(PLOTS_FOLDER, "nn_mean_loss_DC_vs_AC_P2.png"),
                dpi=150, bbox_inches="tight")
    plt.close(fig)


# ==============================================================
# 9. GLOBAL NN DIAGNOSTICS DISTRIBUTIONS (DC & AC)
# ==============================================================

def plot_nn_diag_histograms(diag_dict_dc, diag_dict_ac, key, pretty_name):
    """
    diag_dict_*[key] = {inverter_id -> value}
    """
    vals_dc = list(diag_dict_dc[key].values())
    vals_ac = list(diag_dict_ac[key].values())

    if len(vals_dc) == 0 and len(vals_ac) == 0:
        return

    fig, ax = plt.subplots(figsize=(10, 5))
    if len(vals_dc) > 0:
        ax.hist(vals_dc, bins=10, alpha=0.5, label="DC")
    if len(vals_ac) > 0:
        ax.hist(vals_ac, bins=10, alpha=0.5, label="AC")

    ax.set_xlabel(pretty_name)
    ax.set_ylabel("Count")
    ax.set_title(f"Plant 2 — Distribution of Neural Net {pretty_name} across inverters")
    ax.grid(True)
    ax.legend()

    fig.tight_layout()
    fname = f"nn_diag_hist_{key}_P2.png"
    fig.savefig(os.path.join(PLOTS_FOLDER, fname),
                dpi=150, bbox_inches="tight")
    plt.close(fig)


def plot_nn_diag_bar_means(diag_dict_dc, diag_dict_ac, key, pretty_name):
    vals_dc = list(diag_dict_dc[key].values())
    vals_ac = list(diag_dict_ac[key].values())

    if len(vals_dc) == 0 and len(vals_ac) == 0:
        return

    mean_dc, std_dc = mean_std(vals_dc)
    mean_ac, std_ac = mean_std(vals_ac)

    fig, ax = plt.subplots(figsize=(6, 5))
    x = np.arange(2)
    means = [mean_dc, mean_ac]
    stds  = [std_dc, std_ac]

    ax.bar(x, means, yerr=stds, capsize=5, tick_label=["DC", "AC"])
    ax.set_ylabel(pretty_name)
    ax.set_title(f"Plant 2 — Neural Net {pretty_name} — DC vs AC (mean ± std)")
    ax.grid(axis="y")

    fig.tight_layout()
    fname = f"nn_diag_bar_{key}_P2.png"
    fig.savefig(os.path.join(PLOTS_FOLDER, fname),
                dpi=150, bbox_inches="tight")
    plt.close(fig)


pretty_names = {
    "iterations": "Iterations",
    "learning_rate": "Learning Rate",
    "momentum": "Momentum",
    "total_weights": "Total Weights",
    "train_time": "Training Time (s)",
}

for k, nm in pretty_names.items():
    plot_nn_diag_histograms(nn_diag_dc, nn_diag_ac, k, nm)
    plot_nn_diag_bar_means(nn_diag_dc, nn_diag_ac, k, nm)


# ==============================================================
# 10. FINAL MESSAGE
# ==============================================================

print("\n✅ Plant 2 model comparison complete.")
print(f"All Plant 2 comparison plots saved in: {PLOTS_FOLDER}")


Loaded 22 Plant 2 inverter result files.

Model         | DC_RMSE(mean±std)    | AC_RMSE(mean±std)    | DC_MAE(mean±std)     | AC_MAE(mean±std)
-----------------------------------------------------------------------------------------------
Linear       |  142.517±23.621 |  139.341±23.058 |   78.048±19.144 |   76.490±18.688
Ridge        |  142.622±23.383 |  139.440±22.831 |   78.710±18.778 |   77.144±18.340
Lasso        |  142.517±23.620 |  139.341±23.058 |   78.049±19.144 |   76.491±18.688
RandomForest |   97.808±11.499 |   95.435±11.163 |   35.227± 4.327 |   34.365± 4.208
NeuralNet    |  166.962±23.942 |  156.158±22.131 |  105.542±19.585 |   96.947±18.826

Model         | DC_RMSE(mean±std)    | AC_RMSE(mean±std)    | DC_MAE(mean±std)     | AC_MAE(mean±std)
-----------------------------------------------------------------------------------------------
Linear       |   66.908±13.426 |   65.559±13.160 |   45.198± 9.519 |   44.350± 9.317
Ridge        |   76.886±10.187 |   75.339±10.005 | 

  ax.boxplot(data, labels=labels, showmeans=True)
  ax.boxplot(data, labels=labels, showmeans=True)



✅ Plant 2 model comparison complete.
All Plant 2 comparison plots saved in: C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\02 Plant2_Inverter_Models\00_Comparison_Plots
